Message boards : Number crunching : Minirosetta v1.47 bug thread.
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 6,141 |
Quote: "Serious credit issue here: In different tasks I've had: 216878857: CPU time 10076.6, claimed credit 49.588655190211, granted credit 100.839750703433; 217129212: CPU time 12904.09, claimed credit 62.9250827866192, granted credit 47.1981949319233." It varies. I wouldn't worry about it. |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 6,141 |
Quote: "This makes 10 tasks in a day's time that have died with the 0xc error. COME ON! Later... Dec 24 22.15 UTC: system is stable and RAC is slowly returning to normal." I must've missed the apology elsewhere in the thread. I'm sure it was there somewhere. But maybe not. Literally a thankless task. |
Hugh Miller Joined: 2 Nov 05 Posts: 1 Credit: 37,808 RAC: 0 |
I'm running BOINC 6.4.5 and Rosetta Mini 1.47 on a machine with Win Vista Ultimate 64-bit SP1, a Core 2 Duo P8600 at 2.4 GHz, 4 GB RAM, and an NVIDIA GeForce 9200M GS chipset with 256 MB of dedicated graphics memory. The screensaver behaves erratically. Sometimes it presents the familiar screen; other times it just goes white with a spinning cursor. If I hit ESC to exit, I get an error box reading: "minirosetta_graphics_1.20_windows_x86_64.exe is not responding". I have to bail out of the screensaver manually at that point. |
Sid Celery Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 6,141 |
Happy New Year from this side of the Atlantic! Once people sober up, can you consider this scenario I've seen? I glanced at my BOINC Manager earlier this evening and had one long-running WU at nearly 5 hours on a 3-hour run-time. A couple of hours later I noticed it had dropped back massively to just 19 minutes in (still the first model). It's done this again a few times since. I upgraded to BOINC 6.4.5 a day or two before the Mini 1.47 WUs started coming through (mid-Dec), so I'm not sure which is responsible, but since the lockfile errors stopped crashing WUs out, there have been several instances of WUs taking a long time with nothing at all reported in the Manager's Messages tab, then finishing relatively early with no error message. Am I imagining this, or are others seeing the same thing? Without error messages I don't really know what to report, nor where to report it, but I'm sure it's happening. I believe it happened with this completed WU and is currently happening with this in-progress WU. Both are cc2_1_8_mammoth_mix_fa_cst_hb jobs, if that makes a difference. Any ideas? |
arminius Joined: 23 Sep 05 Posts: 8 Credit: 805,403 RAC: 0 |
|
arminius Joined: 23 Sep 05 Posts: 8 Credit: 805,403 RAC: 0 |
|
Greenshit Joined: 30 Jan 07 Posts: 3 Credit: 55,173 RAC: 0 |
|
Greenshit Joined: 30 Jan 07 Posts: 3 Credit: 55,173 RAC: 0 |
|
sslickerson Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
I've had a fairly consistent failure rate for the mini-Rosetta app on my 64-bit Vista computer for several months now (which is why it is rarely crunching here). I thought I saw some light at the end of the tunnel, so I attached again yesterday, only to find 3 more tasks that have failed. All have error code -1073741819 (0xc0000005). The workunits are: 218380490, 218380489, 218380488. I do hope project staff will look into these. I would really like to get back to ROSETTA on this machine, but I can't waste the cycles without a fix. I can run some RALPH WUs if that is needed to track it down. Also, all three WUs had messages reporting that the "Output file was missing" prior to failure. Edit added: Paul Buck mentioned a few posts ago that his failed tasks had possibly been suspended, and I know for a fact that the tasks that failed on my computer were indeed suspended and were not left in memory after the suspension. |
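The same failure shows up in this thread both as -1073741819 and as 0xc0000005. These are one 32-bit value read two ways: as a signed integer, and as the unsigned hex code that Windows documents as STATUS_ACCESS_VIOLATION. A minimal Python sketch of the conversion (the function names are hypothetical, just for illustration; this says nothing about what triggers the crash):

```python
# A 32-bit exit code can be printed as a signed integer or as unsigned hex.
# Both forms in the logs above are the same bit pattern.

ACCESS_VIOLATION = 0xC0000005  # Windows STATUS_ACCESS_VIOLATION


def to_hex_code(signed_code: int) -> str:
    """Mask a signed 32-bit exit code to its unsigned 0x-prefixed hex form."""
    return f"0x{signed_code & 0xFFFFFFFF:08x}"


def to_signed(unsigned_code: int) -> int:
    """Reinterpret an unsigned 32-bit code as a signed 32-bit integer."""
    if unsigned_code & 0x80000000:  # high bit set means negative when signed
        return unsigned_code - (1 << 32)
    return unsigned_code


if __name__ == "__main__":
    print(to_hex_code(-1073741819))
    print(to_signed(ACCESS_VIOLATION))
```

So anyone searching for either number is looking at the same error.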
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quote: "I've had a fairly consistent failure rate for the mini-Rosetta app on my 64bit Vista computer for several months now (hence the reason why it is rarely crunching here). I thought I saw some light at the end so I attached again yesterday only to find 3 more tasks that have failed. All have error code:" Quick question: are you overclocked at all? This looks like what I had when my OC speed was too high. I lowered it and all was OK. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
What's with this task and its credit? cc2_1_8_native_cen_cst_hb_t369__IGNORE_THE_REST_1RXQA_14_5863_202_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=218243427 I am running the CPU flat out and produced 4 decoys in 11679.33 seconds with a run-time setting of 14400 seconds, and it grants me UNDER the claimed credit: claimed credit 78.1755065660898, granted credit 32.0937916886001. That's just unbelievable. My frustration is rising again with the bad credit granted, the problems with downloads on your end, and the lousy credit for long-running tasks. It is like the project is at the bottom of a sine wave again. |
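To compare the credit complaints in this thread on a common footing, it can help to normalize claimed and granted credit per CPU hour. The figures below are copied from the posts above; the script does not model how the project actually computes granted credit, it only rescales the reported numbers (names like `per_cpu_hour` are hypothetical):

```python
# (result id, CPU seconds, claimed credit, granted credit), as posted in this thread.
TASKS = [
    (216878857, 10076.6, 49.588655190211, 100.839750703433),
    (217129212, 12904.09, 62.9250827866192, 47.1981949319233),
    (218243427, 11679.33, 78.1755065660898, 32.0937916886001),
]


def per_cpu_hour(credit: float, cpu_seconds: float) -> float:
    """Credit per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


if __name__ == "__main__":
    for result_id, cpu_s, claimed, granted in TASKS:
        print(
            f"{result_id}: claimed {per_cpu_hour(claimed, cpu_s):.2f}/h, "
            f"granted {per_cpu_hour(granted, cpu_s):.2f}/h, "
            f"granted/claimed {granted / claimed:.2f}"
        )
```

Run on these three results, granted credit swings from roughly double the claim to well under half of it, which is the variation both posters are describing.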
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Another lr5_score12 workunit that failed: 1/3/2009 9:34:50 AM|rosetta@home|Computation for task lr5_score12_rlbd_256b_IGNORE_THE_REST_DECOY_5559_1304_0 finished 1/3/2009 9:34:50 AM|rosetta@home|Output file lr5_score12_rlbd_256b_IGNORE_THE_REST_DECOY_5559_1304_0_0 for task lr5_score12_rlbd_256b_IGNORE_THE_REST_DECOY_5559_1304_0 absent https://boinc.bakerlab.org/rosetta/workunit.php?wuid=199023434 |
sslickerson Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
@greg_be No, I am running at stock speed. I lowered my runtime to 1 hour (so no switching of apps), and of the 4 MR tasks that have completed, all look like they will validate. Is there causation here? IDK, but I would be interested to know. It seems that in the 4 or 5 times I have come back to Rosetta with this setup (64-bit Vista), everything works well until the runtime is increased to more than 1 hour. Perhaps I will increase the runtime but switch to "leave app in memory" to see if there is any change... |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Interesting that Win64 acts up for you. You're only one version of BOINC Manager 'out of date', but updating may or may not help. Leaving the application in memory is something the group always recommends. I don't really have any other ideas at the moment. Could someone else look at his tasks and see if they have any ideas why he's crashing? |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Exit status -1073741819 (0xc0000005), unhandled exception detected, for: lr5_score12_rlbd_1who_IGNORE_THE_REST_DECOY_5559_986_0 and lr5_score12_rlbd_1mjc_IGNORE_THE_REST_DECOY_5559_534_0. Machine: AMD Turion Dual-Core RM-70 at stock speed (2.0 GHz), Windows Vista SP1 32-bit, BOINC 5.10.45 with 40% throttling. I hadn't seen any errors on this machine before this, since upgrading to minirosetta 1.45. On their second run, these tasks ran successfully on a Mac and had the same error on Windows Vista. Have a nice day, Path7. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Quote: "Interesting that Win64 acts up for you. You're only one version of BOINC Manager 'out of date', but that may or may not help. Leaving in memory, that's something the group always recommends. I don't really have any other ideas at the moment. Could someone else look at his tasks and see if they have any ideas why he's crashing?" BOINC 6.4.5 is now available, which suggests that a few people found problems in BOINC 6.4.0 and more recent versions. I notice that all three of those workunits were the lr5_score12 type, which a few other people have been reporting problems with. Note, though, that some other threads indicate Rosetta@home is likely to have problems supplying all the workunits that are requested for at least a few more hours. I've had problems with one of the lr5_score12 workunits lately, after six workunits in a row that completed successfully but weren't the lr5_score12 type. Choosing the leave-in-memory option helps, especially if you also raise the upper limit on how much hard drive space BOINC can use and, at least for 32-bit Vista SP1, the upper limit on what fraction of the swap space BOINC can use. Since then, another non-lr5_score12 workunit has completed successfully on my machine, and another lr5_score12 workunit is still running. I'm using 14-hour workunits, but with 32-bit Vista, the leave-in-memory option, and enough other projects to ensure switching to another workunit a few times before these workunits complete. My lr5_score12 workunit with an error gave an error message similar to yours, so I wouldn't be surprised if it's an error specific to that batch of workunits. If you'd like to increase the workunit time, I've found that there's a setting for how long workunits can run before BOINC decides whether to switch to another workunit, but I don't remember if Rosetta@home includes this in the settings you're allowed to change. I currently have it set to 2 hours between such decisions, though. |
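The settings mentioned above (leave applications in memory, the switch-between-applications interval, disk limit, swap limit) can also be set locally instead of through the website. A minimal sketch that writes them as a BOINC local preferences override; the tag names are our understanding of BOINC's global preferences file (`global_prefs_override.xml` in the BOINC data directory), so check them against the documentation for your BOINC version, and the 20 GB and 75% values are purely hypothetical examples:

```python
import xml.etree.ElementTree as ET


def build_prefs_override(leave_in_memory: bool,
                         switch_minutes: int,
                         disk_max_gb: float,
                         swap_max_pct: float) -> str:
    """Return XML for a local BOINC preferences override (assumed tag names)."""
    root = ET.Element("global_preferences")
    settings = {
        # "Leave applications in memory while suspended"
        "leave_apps_in_memory": "1" if leave_in_memory else "0",
        # "Switch between applications every N minutes"
        "cpu_scheduling_period_minutes": str(switch_minutes),
        # Upper limit on disk space BOINC may use
        "disk_max_used_gb": str(disk_max_gb),
        # Upper limit on the percentage of swap (virtual memory) BOINC may use
        "vm_max_used_pct": str(swap_max_pct),
    }
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 120 minutes mirrors the "2 hours between such decisions" above.
    print(build_prefs_override(True, 120, 20.0, 75.0))
```

Local overrides are handy for testing sslickerson's runtime theory on one machine without touching the web preferences shared by all hosts.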
Sharlee Joined: 8 Nov 05 Posts: 1 Credit: 86,487 RAC: 0 |
New error to report: I am running an i7 965 CPU with 6 GB of memory and Kaspersky antivirus. Is there anything I can do to fix this problem? 1/4/2009 5:45:01 AM|rosetta@home|Sending scheduler request: To fetch work. Requesting 84480 seconds of work, reporting 0 completed tasks 1/4/2009 5:45:11 AM|rosetta@home|Scheduler request completed: got 7 new tasks 1/4/2009 5:45:13 AM|rosetta@home|Started download of boinc_mfr_aaat01_03_05.200_v1_3.gz 1/4/2009 5:45:13 AM|rosetta@home|Started download of boinc_mfr_aaAT01_03_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|Finished download of boinc_mfr_aaat01_03_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|Finished download of boinc_mfr_aaAT01_03_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|Started download of boinc_mfr_aaat01_09_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|Started download of boinc_mfr_aaAT01_09_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|[error] MD5 check failed for boinc_mfr_aaat01_03_05.200_v1_3.gz 1/4/2009 5:45:22 AM|rosetta@home|[error] expected 9e156df4c561be65533ceb64059254ab, got a500261b0525281e82d9c3166980820c 1/4/2009 5:45:22 AM|rosetta@home|[error] Checksum or signature error for boinc_mfr_aaat01_03_05.200_v1_3.gz 1/4/2009 5:45:44 AM|rosetta@home|Finished download of boinc_mfr_aaat01_09_05.200_v1_3.gz 1/4/2009 5:45:44 AM|rosetta@home|Started download of AT01_.fasta 1/4/2009 5:45:45 AM|rosetta@home|Finished download of AT01_.fasta 1/4/2009 5:45:45 AM|rosetta@home|Started download of boinc_description_file.txt 1/4/2009 5:45:46 AM|rosetta@home|Finished download of boinc_description_file.txt 1/4/2009 5:45:46 AM|rosetta@home|Started download of AT01.pdb 1/4/2009 5:45:49 AM|rosetta@home|Finished download of AT01.pdb 1/4/2009 5:45:49 AM|rosetta@home|Started download of AT012.pdb 1/4/2009 5:45:51 AM|rosetta@home|Finished download of AT012.pdb 1/4/2009 5:45:53 AM|rosetta@home|Finished download of boinc_mfr_aaAT01_09_05.200_v1_3.gz 1/4/2009 5:45:53 AM|rosetta@home|[error] MD5 check failed for boinc_mfr_aaAT01_09_05.200_v1_3.gz 1/4/2009 5:45:53 AM|rosetta@home|[error] expected 01275336f54af3e7ff7d41ae314e4f73, got 7cbad1935a58db3fe90e367e4d2f7daf 1/4/2009 5:45:53 AM|rosetta@home|[error] Checksum or signature error for boinc_mfr_aaAT01_09_05.200_v1_3.gz |
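The "MD5 check failed ... expected X, got Y" lines mean the MD5 digest of the file on disk did not match the digest the project supplied. The log alone doesn't say why (a damaged or truncated transfer, or a server-side file that doesn't match its published checksum are both possible). If you want to check a downloaded file by hand, a minimal Python sketch (function names are hypothetical):

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path: Path, expected_md5: str) -> bool:
    """True if the file's MD5 matches the expected digest (case-insensitive)."""
    actual = md5_of_file(path)
    if actual != expected_md5.lower():
        print(f"MD5 check failed for {path.name}: expected {expected_md5}, got {actual}")
        return False
    return True
```

For example, `check_download(Path("boinc_mfr_aaat01_03_05.200_v1_3.gz"), "9e156df4c561be65533ceb64059254ab")` reproduces the comparison from the first error above, assuming the file is still in the working directory.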
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Quote: "New error to report:" If you run out of Rosetta@home workunits that haven't been completed and reported, you can select Rosetta@home in the Projects window of the Advanced view, click Reset project, and make BOINC download all those files again. |
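The same reset can be done from the command line with `boinccmd --project URL reset`. A reset throws away the project's tasks and files, so, as the post says, only do it once nothing is left to finish and report. A minimal Python sketch; the Rosetta@home URL is assumed from the links in this thread, and `dry_run` defaults to `True` so nothing runs unless you ask:

```python
import subprocess

# Assumed project URL, taken from the links posted in this thread.
ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def reset_command(project_url: str) -> list[str]:
    """Build the boinccmd invocation that resets a project."""
    return ["boinccmd", "--project", project_url, "reset"]


def reset_project(project_url: str, dry_run: bool = True) -> list[str]:
    """Return the reset command; execute it only when dry_run is False."""
    cmd = reset_command(project_url)
    if not dry_run:
        # Requires boinccmd on PATH and a running BOINC client.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(reset_project(ROSETTA_URL)))
```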
Matthias Lehmkuhl Joined: 20 Nov 05 Posts: 10 Credit: 2,448,323 RAC: 143 |
Also got one WU lr5_score12... with the error <message> - exit code -1073741819 (0xc0000005) </message> wuid=198929431 Matthias |
sslickerson Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
@greg_be & robertmiles Thanks for looking into this. I let Rosetta run last night with increased runtimes and left the application in memory, but I see that 1 WU did fail (218380754) for the same reason as before. Also of note, 20 failed with a client error while downloading: couldn't get input files, MD5 check failed (218580846, for instance). On this computer, I have set Rosetta to no new work and had to abort the remaining WUs. I really want to attach here, but the problems are far too severe at the moment. Perhaps I'll try again in 6 months, but I must say, this is getting a bit old... |
©2025 University of Washington
https://www.bakerlab.org