Message boards : Number crunching : Report Problems with Rosetta Version 5.07
Author | Message |
---|---|
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
OS = Linux 2.6.10 I looked at your host and I see as many errors for 5.01 as for 5.07. All failed WUs on your host completed successfully on another host. Almost all your errors have exit code 131; this may help the team figure out what's going on with your machine. |
Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
OS = Linux 2.6.10 Looking at your host's log, you seem to get SIGSEGV errors. Btw, do you have Leave-in-mem-when-preempted=YES? (I would try this first.) It looks as if WUs are restarted several times. Which Linux distro are you using (FC5?) Also see here for others having a similar problem. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
charmed Joined: 2 Nov 05 Posts: 11 Credit: 1,780,440 RAC: 0 |
This work unit failed as I was watching it: https://boinc.bakerlab.org/rosetta/result.php?resultid=19041999 Running Windows XP on an Athlon64 3200+ with 1 GB of memory. |
Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
I looked at your host and I see as many errors for 5.01 as for 5.07. All failed WUs on your host completed successfully on another host. Almost all your errors have exit code 131; this may help the team figure out what's going on with your machine. Many thanks for your quick response =) I am especially grateful for the detective work that produced the exit code. I will be sure to include this in further posts regarding this host. |
Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
Looking at your host's log, you seem to get SIGSEGV errors. Awesome response!!! I set the leave-in-mem-when-preempted to NO quite a while ago when I saw a message in technical news that said to do so. I will return that variable to YES immediately. The Linux distribution I am using is LinSpire 5.0 (build 5.0.59). Thank you very much for your response =) |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
Not a failure but a "suspicious" WU: https://boinc.bakerlab.org/rosetta/result.php?resultid=19037537 This one generated 1123 decoys in 8 hours. Each model started in Full Atom Relax mode in a somewhat "unfolded" state (only part of the amino acid chain was visible) and always had a high RMSD (about 50). After a few steps it quit and started a new model. |
Joined: 17 Sep 05 Posts: 116 Credit: 41,315 RAC: 0 |
I have suspended the following WU: FA_CASP6_t198__470_5745_0. After 2:13 h it is only at 1.04%, and the steps are increasing very slowly. Last entries in stdout.txt: CYCLES::number is 1 x total_residue: 69 initializing full atom coordinates BOINC :: [2006-05-04 11:46:11] :: checkpoint_decoys() :: saved decoy info :: attempted_decoys: 7 :: num_decoys: 7 :: farlx_stage: 10 dump_fullatom_pdb: farlxcheck starting score 357.328156 rms 4.70180273 starting full atom minimization [T/F OPT]Default FALSE value for [-infinite_loop] Should I let it run further or abort it? I don't know how long it will take; normally one WU takes 3 h. RAM usage is 200 MB now. |
tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
I have suspended the following WU: FA_CASP6_t198__470_5745_0 t198 is one of the bigger proteins: 235 amino acids. I'd let it run at least 4 hours before aborting. Better to abort only once it reaches 24 hours and the 300-credit claim limit for failed WUs. |
Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
I have suspended the following WU: FA_CASP6_t198__470_5745_0 I had one run over my 8 hour preference time, but then it completed. Just a huge protein, it seems! Now I have set my preference time to 12 hours. And I have learned to be patient! :) Regards, Bob P. |
Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Keep in mind that the 1.04% number is REALLY just telling you that it is still on model 1. Once it completes model 1, it will recompute the % completed and may determine that you're 60% done, or even 100% done, and end it. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Joined: 7 Oct 05 Posts: 16 Credit: 35,427 RAC: 0 |
Got this error on this WU, and here's the info: Result ID: 18984230; Name: HBLR_1.0_2tif_ROT_TRIALS_TRIE_CHECKPOINTS_482_214_0; Workunit: 15712387; Created: 3 May 2006 0:08:00 UTC; Sent: 3 May 2006 4:07:40 UTC; Received: 4 May 2006 16:13:06 UTC; Server state: Over; Outcome: Client error; Client state: Computing; Exit status: 1 (0x1); Computer ID: 12719; Report deadline: 17 May 2006 4:07:40 UTC; CPU time: 8127.546875; stderr out: <core_client_version>5.4.2</core_client_version> <message>Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 1065667 # cpu_run_time_pref: 14400 # cpu_run_time_pref: 14400 # cpu_run_time_pref: 14400 ERROR:: Exit at: .hbonds.cc line:293 </stderr_txt> Validate state: Invalid; Claimed credit: 15.0702212672248; Granted credit: 0; application version: 5.07. Thanks. Jeremy |
Joined: 23 Sep 05 Posts: 8 Credit: 808,475 RAC: 276 |
|
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=18872518 Swap space 1692.22 MB Total disk space 29.29 GB Free Disk Space 8.53 GB --- Use no more than 10 GB disk space Leave at least 0.01 GB disk space free Use no more than 50% of total disk space Write to disk at most every 60 seconds Use no more than 75% of total virtual memory ---- Generally, when I look at the memory usage on the machine itself, Rosetta is only claiming to use up around 20 megs. None of the partitions have less than 8 gigs free space - so did that WU really eat up the 8.52 gigs of HD space on the C: partition before erroring out? |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
5/2/2006 5:40:48 PM|rosetta@home|Aborting result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: 100308693.000000 > 100000000.000000 5/2/2006 5:40:48 PM|rosetta@home|Unrecoverable error for result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0 (Maximum disk usage exceeded) From the message log, I see that it's whining about going over 100 megs. Where did it get this value from? I can't see how it corresponds to the settings I've chosen for BOINC and Rosetta. |
Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0 |
2 WUs with coding errors? WU 15827716 <core_client_version>5.2.13</core_client_version> <message>Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 1048865 # cpu_run_time_pref: 14400 # cpu_run_time_pref: 14400 ERROR:: Exit at: .hbonds.cc line:293 </stderr_txt> WU 15757279 <core_client_version>5.2.13</core_client_version> <message>Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 1767551 # cpu_run_time_pref: 14400 # cpu_run_time_pref: 14400 |
Joined: 25 Nov 05 Posts: 129 Credit: 57,345 RAC: 0 |
Yep, it's a P4. The number of CPUs is only reported as 1, so it doesn't even support Hyper-Threading. |
Joined: 29 Nov 05 Posts: 12 Credit: 232,942 RAC: 2,082 |
Hi, I am running 5 BOINC projects, but here are the Rosetta lines of info. Looks like Rosetta has not worked in 4 days. Or so says the BOINC manager statistics tab. 5/5/2006 5:00:53 AM||Starting BOINC client version 5.2.13 for windows_intelx86 5/5/2006 5:00:53 AM||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3 5/5/2006 5:00:53 AM||Data directory: C:Program FilesBOINC 5/5/2006 5:00:54 AM||Processor: 1 GenuineIntel Intel(R) Celeron(TM) CPU 1400MHz 5/5/2006 5:00:54 AM||Memory: 382.52 MB physical, 728.66 MB virtual 5/5/2006 5:00:54 AM||Disk: 93.15 GB total, 12.41 GB free 5/5/2006 5:00:54 AM|rosetta@home|Computer ID: 78725; location: home; project prefs: default 5/5/2006 5:00:58 AM|rosetta@home|Deferring computation for result HBLR_1.0_1ogw_ROT_TRIALS_TRIE_CHECKPOINTS_482_2037_0 5/5/2006 7:30:59 AM|rosetta@home|Restarting result HBLR_1.0_1ogw_ROT_TRIALS_TRIE_CHECKPOINTS_482_2037_0 using rosetta version 507 5/5/2006 7:35:38 AM|rosetta@home|Unrecoverable error for result HBLR_1.0_1ogw_ROT_TRIALS_TRIE_CHECKPOINTS_482_2037_0 ( - exit code -529697949 (0xe06d7363)) 5/5/2006 7:35:38 AM|rosetta@home|Computation for result HBLR_1.0_1ogw_ROT_TRIALS_TRIE_CHECKPOINTS_482_2037_0 finished 5/5/2006 7:36:39 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 5/5/2006 7:36:39 AM|rosetta@home|Reason: To fetch work 5/5/2006 7:36:39 AM|rosetta@home|Requesting 8640 seconds of new work, and reporting 1 results 5/5/2006 7:36:48 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 5/5/2006 7:36:50 AM|rosetta@home|Started download of 1tul_.fasta.gz 5/5/2006 7:36:50 AM|rosetta@home|Started download of 1tul.pdb.gz 5/5/2006 7:36:52 AM|rosetta@home|Finished download of 1tul_.fasta.gz 5/5/2006 7:36:52 AM|rosetta@home|Throughput 476 bytes/sec 5/5/2006 7:36:52 AM|rosetta@home|Finished download of 1tul.pdb.gz 5/5/2006 7:36:52 AM|rosetta@home|Throughput 19178 bytes/sec 5/5/2006 7:36:52 AM|rosetta@home|Started download of 
1tul_.psipred_ss2.gz 5/5/2006 7:36:52 AM|rosetta@home|Started download of aa1tul_03_05.200_v1_3.gz 5/5/2006 7:36:54 AM|rosetta@home|Finished download of 1tul_.psipred_ss2.gz 5/5/2006 7:36:54 AM|rosetta@home|Throughput 4763 bytes/sec 5/5/2006 7:36:54 AM|rosetta@home|Started download of aa1tul_09_05.200_v1_3.gz 5/5/2006 7:38:35 AM|rosetta@home|Finished download of aa1tul_03_05.200_v1_3.gz 5/5/2006 7:38:35 AM|rosetta@home|Throughput 13194 bytes/sec 5/5/2006 7:38:35 AM|rosetta@home|Started download of alltopcodes.pdat.gz 5/5/2006 7:38:37 AM|rosetta@home|Finished download of alltopcodes.pdat.gz 5/5/2006 7:38:37 AM|rosetta@home|Throughput 7279 bytes/sec 5/5/2006 7:38:37 AM|rosetta@home|Started download of allbarcodes04.bar.gz 5/5/2006 7:38:43 AM|rosetta@home|Finished download of allbarcodes04.bar.gz 5/5/2006 7:38:43 AM|rosetta@home|Throughput 9792 bytes/sec 5/5/2006 7:39:38 AM|rosetta@home|Finished download of aa1tul_09_05.200_v1_3.gz 5/5/2006 7:39:38 AM|rosetta@home|Throughput 18857 bytes/sec 5/5/2006 7:39:39 AM||request_reschedule_cpus: files downloaded 5/5/2006 7:53:10 AM||request_reschedule_cpus: process exited 5/5/2006 9:39:03 AM|rosetta@home|Starting result JUMP_ALLBARCODE04_1tul__468_8868_0 using rosetta version 507 5/5/2006 9:39:25 AM|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODE04_1tul__468_8868_0 ( - exit code -164 (0xffffff5c)) 5/5/2006 9:39:25 AM||request_reschedule_cpus: process exited 5/5/2006 9:39:25 AM|rosetta@home|Computation for result JUMP_ALLBARCODE04_1tul__468_8868_0 finished 5/5/2006 9:40:28 AM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi 5/5/2006 9:40:28 AM|rosetta@home|Reason: To fetch work 5/5/2006 9:40:28 AM|rosetta@home|Requesting 8640 seconds of new work, and reporting 1 results 5/5/2006 9:40:44 AM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded 5/5/2006 9:40:46 AM|rosetta@home|Started download of 1hz6A.psipred_ss2.gz 5/5/2006 9:40:46 
AM|rosetta@home|Started download of aa1hz6A03_05.400_v1_3.gz 5/5/2006 9:40:49 AM|rosetta@home|Finished download of 1hz6A.psipred_ss2.gz 5/5/2006 9:40:49 AM|rosetta@home|Throughput 309 bytes/sec 5/5/2006 9:40:49 AM|rosetta@home|Started download of frags400.txt 5/5/2006 9:41:02 AM|rosetta@home|Finished download of frags400.txt 5/5/2006 9:41:02 AM|rosetta@home|Throughput 91 bytes/sec 5/5/2006 9:41:02 AM|rosetta@home|Started download of 1hz6.pdb.gz 5/5/2006 9:41:09 AM|rosetta@home|Finished download of 1hz6.pdb.gz 5/5/2006 9:41:09 AM|rosetta@home|Throughput 1475 bytes/sec 5/5/2006 9:41:09 AM|rosetta@home|Started download of aa1hz6A09_05.400_v1_3.gz 5/5/2006 9:41:47 AM||request_reschedule_cpus: files downloaded 5/5/2006 9:42:14 AM|rosetta@home|Finished download of aa1hz6A03_05.400_v1_3.gz 5/5/2006 9:42:14 AM|rosetta@home|Throughput 12589 bytes/sec 5/5/2006 9:42:14 AM|rosetta@home|Started download of 1hz6A.fasta 5/5/2006 9:42:16 AM|rosetta@home|Finished download of 1hz6A.fasta 5/5/2006 9:42:16 AM|rosetta@home|Throughput 48 bytes/sec 5/5/2006 9:43:22 AM|rosetta@home|Finished download of aa1hz6A09_05.400_v1_3.gz 5/5/2006 9:43:22 AM|rosetta@home|Throughput 21418 bytes/sec 5/5/2006 9:43:23 AM||request_reschedule_cpus: files downloaded cheers, Jonathan ![]() |
Robert Everly Joined: 8 Oct 05 Posts: 27 Credit: 665,094 RAC: 0 |
Two 5.07s died here. resultid=19196099 died with <core_client_version>5.4.3</core_client_version> <message>Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 1535558 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 ERROR:: Exit at: .dock_structure.cc line:401 </stderr_txt> and resultid=19101907 died with <core_client_version>5.4.3</core_client_version> <message>Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 3953224 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 ERROR:: Exit at: .hbonds.cc line:293 </stderr_txt> These are my first errors in a long time, so keep up the good work. |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi Rebirther and others seeing "1.04%" after 3 hours or so: please let those WUs run until they reach about 4 times your CPU run time preference. (If you haven't set a preference, our default is 3 hours, so let them run 12 hours.) If they run longer than that, the jobs should be aborted by the watchdog, but please post here if not! I have suspended the following WU: FA_CASP6_t198__470_5745_0 |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi Jon: thanks for posting. We definitely don't want Rosetta to be dysfunctional on your PC! Could you post a link to your failed work units here? In the BOINC Manager, you can hit "Your results" and it will give you the links. We are now beginning a big push on our test server, ralph, to track down the final set of bugs in rosetta@home. The app there is getting more debugging machinery added every few days. So if any users out there are seeing repeated failures on rosetta@home (there don't seem to be many; our error rates are low), please consider attaching your computer to ralph! Hi, |
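
Rhiju's rule of thumb above (let a slow work unit run until about 4 times the CPU run time preference, with a 3 hour default) is simple arithmetic. The sketch below only illustrates that calculation; the function name and the constants are made up for this example and are not part of BOINC or Rosetta. The 14400 and 21600 second values are the `cpu_run_time_pref` settings that appear in the stderr output posted in this thread.

```python
# Hypothetical helper: how long to wait before considering an abort,
# based on the "about 4 times your cpu run time preference" advice.
DEFAULT_RUN_TIME_PREF_S = 3 * 3600  # default preference mentioned in the post: 3 hours
PATIENCE_FACTOR = 4                 # "about 4 times", so treat this as approximate


def patience_limit_seconds(cpu_run_time_pref_s=None):
    """Return the suggested waiting time in seconds for a given preference."""
    if cpu_run_time_pref_s is None:
        cpu_run_time_pref_s = DEFAULT_RUN_TIME_PREF_S
    return PATIENCE_FACTOR * cpu_run_time_pref_s


if __name__ == "__main__":
    for pref in (None, 14400, 21600):
        hours = patience_limit_seconds(pref) / 3600
        print(f"cpu_run_time_pref={pref}: wait up to about {hours:.0f} h")
```

With no preference set this gives 12 hours, matching the figure in the post; a 4 hour preference gives 16 hours and a 6 hour preference gives 24 hours.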
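
Several posts in this thread paste BOINC message-log lines and stderr output by hand. A small script could pull out the recurring details: exit codes (shown as a signed decimal plus hex), the `Exit at:` source location, and the disk-limit numbers. This is a rough sketch only: the regular expressions are guessed from the lines quoted in this thread, not taken from any documented log format, and the function names are invented for illustration.

```python
import re

# Patterns guessed from log lines quoted in this thread (assumption:
# other client versions may word these messages differently).
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")
EXIT_AT_RE = re.compile(r"ERROR:: Exit at: \.?(\S+) line:(\d+)")
DISK_LIMIT_RE = re.compile(r"exceeded disk limit: ([\d.]+) > ([\d.]+)")


def to_unsigned32(code):
    """Map a signed 32-bit exit code to its unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def summarize(text):
    """Collect exit codes, exit locations and disk overruns from pasted log text."""
    exit_codes = [
        (int(dec), hex(to_unsigned32(int(dec))))
        for dec, _logged_hex in EXIT_CODE_RE.findall(text)
    ]
    exit_at = [(name, int(line)) for name, line in EXIT_AT_RE.findall(text)]
    disk_overruns = [
        (float(used), float(limit)) for used, limit in DISK_LIMIT_RE.findall(text)
    ]
    return {
        "exit_codes": exit_codes,
        "exit_at": exit_at,
        "disk_overruns": disk_overruns,
    }
```

Recomputing the hex form from the signed value shows that the two notations in the log are the same number: -529697949 and 0xe06d7363 differ by exactly 2^32, as do -164 and 0xffffff5c. As far as I know, 0xe06d7363 is the code the Microsoft C++ runtime uses for an unhandled C++ exception, which would fit a crash inside the Windows application, but that reading is an assumption and not something confirmed in this thread.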