Message boards : Number crunching : Problems with version 5.96
Author | Message |
---|---|
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
"It is a shame that the t405 cancellation wasn't mentioned when it was done, I could have resumed Rosetta then instead of now! The news flow is certainly a bit stagnant, I appreciate CASP is a busy time, but a couple of lines on the news column on the front page would not take long." I'm sorry about that. I didn't think about posting it on the news because there were no more tasks queued by the time they were cancelled, but I should have. I'm still working on a fix for that particular protocol because we need to run similar jobs soon for another CASP target. I'll definitely post something up when we update the app with the fix. Likely within the next day or two. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 225 |
Yes, I saw the comments added, it is that which comes through the RSS feed. Good luck with the fix. Crunching Rosetta again now. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
The t434_1_NMRREF_1_t434_1_T0434_2QPWA_2JV0_hybridIGNORE_THE_REST_truncated_4104_8528_0 errored out: Incorrect function. (0x1) - exit code 1 (0x1) Peter |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I just got this. 6/25/2008 7:43:44 AM|rosetta@home|Output file for task t434_1_NMRREF_1_t434_1_T0434_2QPWA_2JV0_hybridIGNORE_THE_REST_truncated_4104_3025_0 absent Edit// just to add this. <core_client_version>5.10.30</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 21600 # random seed: 2591620 ERROR:: Exit from: .refold.cc line: 338 </stderr_txt> pete. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
|
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,343,833 RAC: 5,955 |
On about half of the jobs, when I reach around 95% completed, progress simply crawls. The time to completion stops changing, but the percentage increments extremely slowly. I assume the job is progressing, but I don't know. I guess this is normal, but sometimes, like today, it bugs me. I have two hyperthreaded Xeons (32-bit) and 8 GBytes RAM running Linux kernel 2.6.18-92.1.1.el5PAE on one machine and two Pentium III processors and 512 MBytes RAM running Linux kernel 2.6.9-67.0.15.ELsmp on the other machine. In each case, Rosetta runs up to about 96% complete in a relatively short period of time, and time remaining is usually on the order of 10 minutes. Right now, it has used up about 8 hours since getting to 96% complete (and it took only about three hours to get to 96%). This is time actually consumed by the process, not wall-clock time. I just wish the time remaining would more accurately reflect the time needed to complete. Rosetta is not the worst offender in this regard. Some projects have the time remaining actually increasing as the time consumed increases. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I've got three more of these t434_ all have failed on other hosts, what a waste. pete. |
Jack Shaftoe Joined: 30 Apr 06 Posts: 115 Credit: 1,307,916 RAC: 0 |
"I've got three more of these t434_ all have failed on other hosts, what a waste." Rosetta Beta 5.96 t434 is doing terribly on my hosts too. Ugh. I remember when I first joined 2 years ago - I could let my hosts go for weeks without checking on them. Now I feel the need to make sure they are ok twice a day, and will be suspending Rosetta while I leave for vacation this weekend! What a tragic shame... |
BrnmccO1 Joined: 26 Jun 07 Posts: 17 Credit: 578,825 RAC: 0 |
Have also had a rash of compute errors over the last two weeks. Mostly the aforementioned t405's and a few t434's as well. Here's a list of the failed WUs: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158023723 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=155316236 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=155266537 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=156920807 <-- had to manually abort, was 'stuck' https://boinc.bakerlab.org/rosetta/workunit.php?wuid=156498712 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158046502 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=157548537 <-- Mini Rosetta, was successful on someone else's computer though. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=155266298 <-- T409 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=156219608 <-- t405 had to manually abort Other than the recent troubles, things have been pretty good the past year for me, so I'll keep plugging away! Cheers, |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Well two more t434_ failed one after 3hrs,23min the other after 15min, more to go. pete. |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,907,583 RAC: 119 |
I am close to being out of here! I started crunching Rosetta because it would run for weeks without any attention, it sure wasn't because of the way low Boinc credit. Now I have many stuck jobs, have had to abort plenty of jobs and am running out of patience. Jim |
hedera Joined: 15 Jul 06 Posts: 76 Credit: 5,246,877 RAC: 991 |
I've only been back from vacation a few days and I've already got one of these: WU 173161638 This one seems to have completed correctly on another computer. --hedera Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
This is an odd one, the first host failed on it mine ran o.k. and finished! It's one of the t434. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=158008942 pete. |
Jean-David Beyer Joined: 2 Nov 05 Posts: 187 Credit: 6,343,833 RAC: 5,955 |
"I am close to being out of here! I started crunching Rosetta because it would run for weeks without any attention, it sure wasn't because of the way low Boinc credit." I have gotten a few "stuck jobs", if by that you mean some that get to 100% complete, time remaining: --, but still running for quite a while. I just assumed this was similar to those that run 2x or 3x longer for the last 4% than they took for the first 96%, so I let them continue to run for a while. They ultimately finished. I have not checked if they finished correctly or with an error. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 225 |
I had only just resumed Rosetta after the t405 problem, then straight away t434 strikes! Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
I'm getting a slew of those "Exit from: .refold.cc line: 338" errors. https://boinc.bakerlab.org/rosetta/result.php?resultid=173121168 https://boinc.bakerlab.org/rosetta/result.php?resultid=173123778 https://boinc.bakerlab.org/rosetta/result.php?resultid=173124103 https://boinc.bakerlab.org/rosetta/result.php?resultid=173128512 https://boinc.bakerlab.org/rosetta/result.php?resultid=173135656 https://boinc.bakerlab.org/rosetta/result.php?resultid=173166131 https://boinc.bakerlab.org/rosetta/result.php?resultid=173168057 |
(_KoDAk_) Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=173123569 Incorrect function. (0x1) - exit code 1 (0x1) ERROR:: Exit from: .refold.cc line: 338 https://boinc.bakerlab.org/rosetta/result.php?resultid=172523452 ERROR:: Unable to determine sequence length from pdb file https://boinc.bakerlab.org/rosetta/result.php?resultid=172522701 ERROR:: Unable to determine sequence length from pdb file |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=172358241 Client Error Compute error CPU Time: 15.85938 stderr out <core_client_version>6.2.6</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 2802237 ERROR:: Exit from: .loop_relax.cc line: 1863 </stderr_txt> ]]> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=172355994 FRA_t426_CASP8_2JASA_51_IGNORE_THE_RESTt426_51_mdT0421_2JASA_5.Cterm_0001_3821_590_0 Big debugger dump on this one. CPU time 2852.328 stderr out <core_client_version>6.2.6</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 2715239 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x00A4F68A read attempt to address 0x86842CA4 Wasted CPU time on this one and crashed my system. I was testing my OC speed when this one died... maybe that had something to do with it? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=172355994 Edit: It certainly does as I crashed a few other work units as well. Sorry folks. |
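
The stderr dumps quoted in this thread follow a recognizable pattern: an `exit code N (0x...)` message and, for most of the t434 failures, a line of the form `ERROR:: Exit from: .refold.cc line: 338`. As a rough illustration only, the Python sketch below shows one way such reports could be triaged by hand. The function names (`as_unsigned32`, `triage`, `tally`) are hypothetical and are not part of BOINC or Rosetta, and the regular expressions are assumptions based solely on the text pasted above.

```python
import re
from collections import Counter

# Assumed pattern, taken from lines quoted in the thread such as
# "ERROR:: Exit from: .refold.cc line: 338".
EXIT_FROM = re.compile(r"ERROR:: Exit from: \.?(\S+) line: (\d+)")
# Assumed pattern for "exit code 1 (0x1)" or
# "exit code -1073741819 (0xc0000005)".
EXIT_CODE = re.compile(r"exit code (-?\d+)")


def as_unsigned32(code: int) -> str:
    """Render a signed 32-bit exit code in its unsigned hex form."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def triage(stderr_text: str) -> dict:
    """Extract the exit code and reported source location from one report."""
    code = EXIT_CODE.search(stderr_text)
    where = EXIT_FROM.search(stderr_text)
    return {
        "exit_code": as_unsigned32(int(code.group(1))) if code else None,
        "location": f"{where.group(1)}:{where.group(2)}" if where else None,
    }


def tally(reports) -> Counter:
    """Count how many reports failed at each source location."""
    return Counter(triage(r)["location"] for r in reports)


if __name__ == "__main__":
    # Sample strings copied from posts in this thread.
    samples = [
        "Incorrect function. (0x1) - exit code 1 (0x1) "
        "ERROR:: Exit from: .refold.cc line: 338",
        "Incorrect function. (0x1) - exit code 1 (0x1) "
        "ERROR:: Exit from: .loop_relax.cc line: 1863",
        "- exit code -1073741819 (0xc0000005) Unhandled Exception Detected...",
    ]
    for s in samples:
        print(triage(s))
    print(tally(samples))
```

The masking step shows why the same crash appears as both `-1073741819` and `0xc0000005` in the report above: they are the signed and unsigned readings of one 32-bit value, which the dump itself labels as an access violation. Reports with no `Exit from` line, such as that crash, are tallied under `None` rather than guessed at.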
©2024 University of Washington
https://www.bakerlab.org