Problems with Minirosetta v1.54

Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 60465 - Posted: 3 Apr 2009, 12:03:34 UTC - in response to Message 60461.  

Another WU with 99 successful decoys

ala_2he4_p40-1.ala.ppk_dock_random.xml_RANDOM12_BOUND_DOCK_9895_843_0

My preferred run time is 6 hours, but this one completed in less than 2. Either this is an extremely quick model or something odd occurred.


This looks normal; I am getting through them at a rate of about 1.17 minutes per model. If my calculations are correct, you are 0.02 minutes faster per model.
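As a rough sanity check on those numbers, here is a small sketch that works backwards from the stated rate (1.17 minutes per model, minus 0.02) and the 99 decoys to an implied run time. The function name is made up for illustration, and it assumes every decoy took about the same time.

```python
def implied_run_minutes(minutes_per_model: float, decoys: int) -> float:
    """Total run time implied by an average per-model rate."""
    return minutes_per_model * decoys

my_rate = 1.17                 # minutes per model, from the reply above
other_rate = my_rate - 0.02    # "0.02 minutes faster per model"
total = implied_run_minutes(other_rate, 99)

# Compare against the "less than 2 hours" reported for the 99-decoy task.
print(f"{total:.2f} minutes, under 2 hours: {total < 120}")
```

The implied total stays under 120 minutes, which is consistent with the task finishing in less than 2 hours rather than pointing to anything odd.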

ID: 60465 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5662
Credit: 5,701,869
RAC: 2,154
Message 60495 - Posted: 5 Apr 2009, 7:50:26 UTC

First error in a long time!
It ran to 100% and had a compute error at the end:
abinitio_nohomfrag_129_B_1o73A_SAVE_ALL_OUT_7581_8721_1
Exit status -1073741819 (0xc0000005)
CPU time 11314.84
Starting work on structure: _U9X3X_00001
# cpu_run_time_pref: 14400
Starting work on structure: _U9X3X_00002
Starting work on structure: _U9X3X_00003
Starting work on structure: _U9X3X_00004
Starting work on structure: _U9X3X_00005
Starting work on structure: _U9X3X_00006
Starting work on structure: _U9X3X_00007
Starting work on structure: _U9X3X_00008
Starting work on structure: _U9X3X_00009
Starting work on structure: _U9X3X_00010


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00587042 write attempt to address 0x34A2BAB7

Engaging BOINC Windows Runtime Debugger...
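
For anyone puzzled by the two forms of the exit status above: the negative decimal number and the hex code are the same 32-bit value, and 0xc0000005 is the Windows access violation code named in the exception record. A minimal sketch of the conversion (the helper name is just for illustration):

```python
def to_ntstatus(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code."""
    return f"0x{exit_status & 0xFFFFFFFF:08x}"

print(to_ntstatus(-1073741819))  # 0xc0000005
```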
ID: 60495 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,448,817
RAC: 14,577
Message 60514 - Posted: 6 Apr 2009, 12:47:14 UTC - in response to Message 60432.  

frb_1_8_bestfrag_hb_t313___IGNORE_THE_REST_1F9TA_5_9696_15_0

7 hours running (3hr default), no decoys, Validate Error.

I've been noticing these "frb" WUs are singularly unsuccessful. What are the stats on their successful completion? I'd say they were minimal.

Oh, I don't know...

frb_1_8_ecut_hb_t322___IGNORE_THE_REST_1VPMA_12_9712_12_0

# cpu_run_time_pref: 14400
CPU time 14099.2

Claimed credit 69.0173659142213
Granted credit 229.296476006251

No complaints here!!! :)

I spoke too soon...

frb_0_8_template_enriched_hb_t313___IGNORE_THE_REST_1CZ7A_7_9682_18_1

# cpu_run_time_pref: 14400
CPU time 17744.52 [1 decoy]

Claimed credit 86.8616680245843
Granted credit 9.36388194088631

:(
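
To put the two results above side by side, this sketch computes granted credit as a multiple of claimed credit. It only uses the figures quoted in this post; the helper name is hypothetical, and it says nothing about how the project actually calculates granted credit.

```python
def granted_ratio(claimed: float, granted: float) -> float:
    """Granted credit as a multiple of claimed credit."""
    return granted / claimed

# (task name, claimed credit, granted credit) as quoted in the post
tasks = [
    ("frb_1_8_ecut_hb_t322___IGNORE_THE_REST_1VPMA_12_9712_12_0",
     69.0173659142213, 229.296476006251),
    ("frb_0_8_template_enriched_hb_t313___IGNORE_THE_REST_1CZ7A_7_9682_18_1",
     86.8616680245843, 9.36388194088631),
]

for name, claimed, granted in tasks:
    print(f"{name}: granted/claimed = {granted_ratio(claimed, granted):.2f}")
```

The first task was granted roughly 3.3 times what it claimed, while the second, with a single decoy, was granted only about a tenth of its claim.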
ID: 60514 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60524 - Posted: 6 Apr 2009, 22:46:11 UTC

This one cut off after a clean exit of BOINC and a reboot to install a Microsoft fix. What wasn't clean was the restart: I forgot BOINC was in my Windows startup folder, so I ended up starting two copies. I then ended both, and 61 seconds after I started BOINC again, this task ended. There were no messages, just that it finished, but it should have run another couple of hours.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 60524 · Rating: 0
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 60525 - Posted: 7 Apr 2009, 2:32:23 UTC

Task 241419982 failed on Mac: see below. Oddly, it then went out to someone on a Linux machine and completed fine.

Watchdog active.
# cpu_run_time_pref: 14400
Hbond tripped.

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

ID: 60525 · Rating: 0
Klimax
Joined: 27 Apr 07
Posts: 38
Credit: 2,509,938
RAC: 4,060
Message 60526 - Posted: 7 Apr 2009, 4:45:16 UTC

Once again, a task has stopped crunching, showing "Accepted Energy:1.#QNAN" and "Accepted RMSD:1.#QQ".
It is at 39.50% complete; Model: 11, Step: 7788. I have now suspended the task.

I can create a dump file. Should I?

Or is it already fixed in the next version?
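
For context, "1.#QNAN" is how older Microsoft C runtimes print a quiet NaN (not-a-number) value. Once a value becomes NaN, arithmetic on it stays NaN and every ordered comparison with it is false, which is one plausible way a run can stop making progress. The sketch below is purely illustrative and is not Rosetta code; the function name and the acceptance rule are assumptions.

```python
import math

def accept_move(current_energy: float, proposed_energy: float) -> bool:
    """Hypothetical acceptance rule: take the move only if energy drops."""
    return proposed_energy < current_energy

nan = float("nan")

print(nan + 1.0)               # nan: arithmetic keeps the NaN
print(accept_move(nan, -5.0))  # False: comparisons with NaN are always False
print(accept_move(-5.0, nan))  # False
print(math.isnan(nan))         # True: an explicit check is needed to catch it
```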
ID: 60526 · Rating: 0
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 60528 - Posted: 7 Apr 2009, 9:00:35 UTC

Error with this one: 240746159


ERROR: in::file::boinc_wu_zip fragments_2hkv.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 108
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

ID: 60528 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60529 - Posted: 7 Apr 2009, 12:07:58 UTC

Klimax, why don't you go ahead and take a dump and email it to me, along with details on what you observed with it as it ran. I will forward it to the Project Team.
Rosetta Moderator: Mod.Sense
ID: 60529 · Rating: 0
jswolf19
Joined: 3 Apr 09
Posts: 3
Credit: 1,040,577
RAC: 0
Message 60530 - Posted: 7 Apr 2009, 13:21:23 UTC

I'm also having an issue with no progress. Rosetta Beta runs fine, but Rosetta Mini (1.54) never registers any progress even after clocking hours of CPU time (the process I just aborted had clocked almost 17 hours). I have an Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz (WinXP Professional SP3). It also won't switch out to free up a core for another BOINC (v6.4.7) process to run.
ID: 60530 · Rating: 0
Klimax
Joined: 27 Apr 07
Posts: 38
Credit: 2,509,938
RAC: 4,060
Message 60540 - Posted: 7 Apr 2009, 19:26:17 UTC - in response to Message 60529.  

Klimax, why don't you go ahead and take a dump and email it to me, along with details on what you observed with it as it ran. I will forward it to the Project Team.

Oops, didn't know :-(
Last time I reported it, I was told to let it finish and upload (IIRC).
Mail is being prepared.
ID: 60540 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60554 - Posted: 8 Apr 2009, 16:02:15 UTC
Last modified: 8 Apr 2009, 16:02:56 UTC

ID: 60554 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,846,030
RAC: 1,866
Message 60560 - Posted: 9 Apr 2009, 1:12:27 UTC - in response to Message 60554.  

I have this error:

https://boinc.bakerlab.org/rosetta/result.php?resultid=240721682

any tips?


Looks like you've hit one of the errors still in 1.54 because it's too uncommon to debug quickly. Let's hope your results for that workunit help them finally debug it.

Looking at the rest of the jobs your machine has been working on lately, I'd say that you have a lower frequency of errors than I do because you've set up your machine well for aiming at a high score (probably selecting Rosetta@home as your only BOINC project on that machine, selecting leave in memory, and running at 100% CPU usage). I'm deliberately choosing settings aimed at helping debug problems with the program: giving other BOINC projects enough computer time that Rosetta@home workunits are unlikely to complete without being interrupted to give workunits from other projects a turn, and running at 95% CPU usage, although with leave in memory selected. However, is there any good reason for maintaining such a long queue of jobs waiting for your machine to choose them next, and therefore delaying any work at the Rosetta@home end on your results?

I can't tell if you've tried a few other things I've also found good for getting a high score, such as:

1. Selecting a black screen, instead of the BOINC graphics, as your screen saver, and avoiding activating the BOINC graphics.

2. If you see the lockfile problem in your results, suspend all projects, reboot the machine to clear any lockfiles left behind by failed workunits, then resume the projects.

3. Running the machine 24 hours a day, except when shutting BOINC down for Windows updates or other updates, running antivirus programs, running antispyware programs, and any needed reboots.

4. If you happen to need some update that doesn't require a reboot, such as most Windows Defender updates, only tell BOINC to suspend all jobs while you install the update, instead of shutting it down completely; then resume the projects after the update completes.
ID: 60560 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 20
Message 60561 - Posted: 9 Apr 2009, 5:47:08 UTC

I'm interested to know how selecting a black screen instead of the BOINC graphics as your screen saver, and avoiding activating the BOINC graphics, helps increase a work unit's score. Thanks in advance.
Have a crunching good day!!
ID: 60561 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60562 - Posted: 9 Apr 2009, 6:43:02 UTC - in response to Message 60560.  
Last modified: 9 Apr 2009, 6:47:09 UTC

Thanks for the reply.

If you look at the scores I achieve for WUs on the host which produced the error, I must tell you 2 important things:
1. The computer originally had a Q6600@3200. On 7 Apr 09 I replaced this CPU with a Q9550@3600. So it is safe to say that credits from 6 Apr 09 and older represent the Q6600, and those from 8 Apr 09 and newer represent the Q9550.
2. I am crunching Rosetta@home on all 4 cores, with GPUGRID on my GTX260. So in reality I run 5 threads in BOINC.

Also:
Ad 1. I don't use the BOINC screen saver, only the Windows logo screen saver on my CRT NEC 2111SB.
Ad 2. I sometimes suspend to play some games...
Ad 3. I must shut down my PC at night because it is too loud for me, so it usually crunches from 10 a.m. to 11-12 p.m.
Ad 4. Rosetta@home is very GUI friendly because there is no slowdown in the interface. GPUGRID is a real horror in that respect...
Running at 100% CPU usage is also set.
The leave in memory option was not selected, but today I selected it. I will see what happens :)

Also, I work in 32-bit XP with 2x2 GB of CL4 DDR2 423 (846).
ID: 60562 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,507,800
RAC: 56,533
Message 60563 - Posted: 9 Apr 2009, 7:36:08 UTC - in response to Message 60561.  

I'm interested to know how selecting a black screen instead of the BOINC graphics as your screen saver, and avoiding activating the BOINC graphics, helps increase a work unit's score. Thanks in advance.

Just because your computer then doesn't have to do the computation for the graphics thread too.
ID: 60563 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 20
Message 60564 - Posted: 9 Apr 2009, 7:42:01 UTC - in response to Message 60563.  

I'm interested to know how selecting a black screen instead of the BOINC graphics as your screen saver, and avoiding activating the BOINC graphics, helps increase a work unit's score.

Just because your computer then doesn't have to do the computation for the graphics thread too.

OK thanks
Have a crunching good day!!
ID: 60564 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,846,030
RAC: 1,866
Message 60569 - Posted: 9 Apr 2009, 12:41:10 UTC - in response to Message 60561.  

I'm interested to know how selecting a black screen instead of the BOINC graphics as your screen saver, and avoiding activating the BOINC graphics, helps increase a work unit's score. Thanks in advance.


Selecting a black screen, which only needs to be calculated once, cuts down on CPU time needed to calculate the graphics, and lets more of what's available be used for the scientific calculations. Since Rosetta@home uses the number of decoys produced as a more important factor in calculating how much credit to give you than the CPU time required to do it, this is likely to increase the number of decoys your computer produces for that workunit, and therefore the resulting score.

Also, something involving the graphics seems to be able to trigger the lockfile problem for a workunit, with the results then returned marked as invalid and therefore worth a score of zero. Once a lockfile problem occurs, 1.54 seems to be unable to erase the lockfile from the slot used by that workunit, and therefore lets the problems spread to any 1.54 workunits run later in the same slot but before the next reboot. My results for Ralph@home indicate that the 1.58 now being tested there has kept this same problem, and therefore needs more testing before the 1.54 used at Rosetta@home is replaced with a newer version.
ID: 60569 · Rating: 0
Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 60570 - Posted: 9 Apr 2009, 12:54:21 UTC

Also, something involving the graphics seems to be able to trigger the lockfile problem for a workunit, with the results then returned marked as invalid


I turn the graphics on and off several times during the course of the day to check on the performance and I haven't encountered this lockfile problem for a long time now on both Rosetta and Ralph.

Having said that, Murphy's Law states 'watch this space'!!
ID: 60570 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1224
Credit: 13,846,030
RAC: 1,866
Message 60573 - Posted: 9 Apr 2009, 14:23:20 UTC - in response to Message 60570.  

Also, something involving the graphics seems to be able to trigger the lockfile problem for a workunit, with the results then returned marked as invalid


I turn the graphics on and off several times during the course of the day to check on the performance and I haven't encountered this lockfile problem for a long time now on both Rosetta and Ralph.

Having said that, Murphy's Law states 'watch this space'!!


The lockfile problem results could vary depending on which operating system version and which BOINC version are used; if so, my results could easily apply only when using BOINC 6.2.28 under Vista SP1. In other words, I suspect that results from just the two of us aren't enough; we need more people with access to other operating system versions and more versions of BOINC to test for graphics causing the lockfile problem and report the results, along with which operating system version and which BOINC version were used.
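
For anyone who wants to include that information in a report, a small sketch along these lines prints the operating system details using Python's standard `platform` module. The BOINC version is not detected automatically here; it is assumed you read it from the BOINC Manager and type it in. The function name is made up for illustration.

```python
import platform

def environment_report(boinc_version: str) -> str:
    """Build a two-line OS / BOINC summary to paste into a forum post."""
    os_line = f"OS: {platform.system()} {platform.release()} ({platform.version()})"
    boinc_line = f"BOINC: {boinc_version}"  # entered by hand, not detected
    return f"{os_line}\n{boinc_line}"

# Example with the BOINC version mentioned above.
print(environment_report("6.2.28"))
```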
ID: 60573 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60591 - Posted: 10 Apr 2009, 6:12:14 UTC

New error:

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338
BOINC:: Error reading and gzipping output datafile: default.out

For this task

The same task on an XP machine ran for a long time and only failed on validate. Which is kind of interesting: it almost seems as if my machine (OS X) tipped over on an assertion or parameter file error... So what is the difference between the OS platforms, guys?
ID: 60591 · Rating: 0