Message boards : Number crunching : Problems with Rosetta version 5.89
Author | Message |
---|---|
AM Send message Joined: 15 Jul 06 Posts: 7 Credit: 535,652 RAC: 405 |
So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired. I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, and so yes, I would assume both that other WUs will require less memory, and that future releases will improve the memory footprint. It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out. Rosetta Moderator: Mod.Sense |
AM Send message Joined: 15 Jul 06 Posts: 7 Credit: 535,652 RAC: 405 |
So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired. Sorry, here it is below... 1cc8AWHS_ETABLE_SVM_TESTS-1cc8A-frags83__2452_282 |
M.L. Send message Joined: 21 Nov 06 Posts: 182 Credit: 180,462 RAC: 0 |
Can anyone explain...... Task ID 126999841 Name 1tul__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-9-S3-8--1tul_-vf__2434_88_0 Workunit 115452516 Created 15 Dec 2007 20:31:18 UTC Sent 15 Dec 2007 20:32:33 UTC Received 17 Dec 2007 16:47:46 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 510574 Report deadline 25 Dec 2007 20:32:33 UTC CPU time 14202.75 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 14400 # random seed: 2330093 sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range # cpu_run_time_pref: 14400 # cpu_run_time_pref: 14400 ====================================================== DONE :: 1 starting structures 14202.7 cpu seconds This process generated 42 decoys from 42 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... Am curious as to what the sin_cos_range ERROR means, seeing that Rosie seems happy to accept this task and gave credits. This is the second task I have had with the same comment; the other was under 5.82. Can anyone help? </stderr_txt> ]]> Validate state Valid Claimed credit 58.8835581491606 Granted credit 53.3430028257213 application version 5.89 |
Thomas Leibold Send message Joined: 30 Jul 06 Posts: 55 Credit: 19,627,164 RAC: 0 |
Task ID 126856692 Name 1lis_WHS_ETABLE_SVM_TESTS-1lis_-frags83__2453_41_0 Workunit 115320625 Created 15 Dec 2007 4:37:53 UTC Sent 15 Dec 2007 4:38:18 UTC Received 15 Dec 2007 19:55:18 UTC Server state Over Outcome Client error Client state Compute error Exit status 193 (0xc1) Computer ID 687330 Report deadline 25 Dec 2007 4:38:18 UTC CPU time 24587.680633 stderr out <core_client_version>5.10.21</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 28800 # random seed: 1054640 SIGILL: illegal instruction Stack trace (18 frames): [0x8da95f7] [0x8da43ec] [0xffffe500] [0x8c18d60] [0x8775f6f] [0x877651e] [0x877a4cf] [0x878b883] [0x879128d] [0x8cfb408] [0x8b45e94] [0x8b48f13] [0x80d8c55] [0x85f4d25] [0x8732b67] [0x8732c12] [0x8e0d944] [0x8048111] Exiting... </stderr_txt> ]]> Validate state Invalid Claimed credit 94.8036591472954 Granted credit 0 application version 5.89 This one died with an illegal instruction trap after more than 6.5 hours (I run with 8 hours per workunit), when it was almost finished. The computer has two Quad-Core Opteron 2346HE processors and 8GB of memory, running OpenSuSE 10.3 in 64-bit mode. No other errors have been reported from this fairly new system as far as I can tell. Team Helix |
marc zubrin Send message Joined: 13 Nov 07 Posts: 1 Credit: 6,392 RAC: 0 |
Please post any problems with rosetta 5.89 here. Thanks! I had 4 "client errors" in the last days among a dozen good with rosetta beta (version 5.89 I think). None of my other clients has had any error during this lapse of time (crunching about 400 credits per day). Are you sure there ain't anything wrong with either the software or the workunits you send us ?... marc zubrin |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
I have been alternating between Windows and Linux on 3 of my 6 machines for the last 4 days. Last nite the three that were running linux decided to stop working properly. I woke to find that the monitor program "gkrellm" showed NO CPU usage on either of my single cores or on one of both cpus on my dual core. Up until now these machines have run rosetta flawlessly, and that's what makes this strange. All are using 64b Boinc 5.10.21 (official). All have been working well since Nov 26th 2007. I have my run time pref set to 2 hours (was 1 hour until 5 days ago), so I have plenty of samples of good work already processed. The machines in question are: 1) hostid=692479 AMD64 2800 w/768M ram 2) hostid=692481 AMD64 3700 w/1G ram 3) hostid=692483 AMD64 X2 4800 w/2G ram Why all the machines running linux would decide to break, while the other three running Windows kept working is a mystery. I have a 1 day cache and switch OSes every 12 hours (around every 12) so after these last days there should be "similar" work for both WIN and LIN. Two of the three froze with the same job type, while one didn't. The one that had the unique job also only had the one error, while the two with the identical job types had numerous errors overnite (which is unusual as well). You can see what the results page looks like for my AMD64 3700 below: Here's what the results page for the AMD64 2800 looked like: All were/are running Rosetta Beta 5.89 at this time. Many SIGSEGV faults are showing up for those compute errors. Such as: process exited with code 193 (0xc1, -63) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 7200 # random seed: 2133824 No heartbeat from core client for 31 sec - exiting SIGSEGV: segmentation violation |
AM Send message Joined: 15 Jul 06 Posts: 7 Credit: 535,652 RAC: 405 |
Memory hog WU: 1b72__BOINC_TETHER_DURING_STAGE1_POSE_ABRELAX-1b72_-_2458_1145 |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Looking deeper into the issues addressed in my last post, I don't see any "wu" specific issues (i.e., it doesn't appear to be ONE particular wu). These are the compute errors from overnite. AMD64 2800 wuid=115793636 1npsAWHS_ETABLE_SVM_TESTS-1npsA-frags83__2455_980 AMD64 2800 wuid=115810033 2chf__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-7-S3-6--2chf_-vf__2445_13 AMD64 3700 wuid=115907962 2chf__BOINC_ABINITIO_VF-S25-9-S3-3--2chf_-vf__2450_102 AMD64 3700 wuid=115457762 1ubi__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-6-S3-11--1ubi_-vf__2435_66 This one errored for both parties. AMD64 3700 wuid=115881653 2vik__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-10-S3-11--2vik_-vf__2447_57 AMD64 3700 wuid=115795643 1shfAWHS_ETABLE_SVM_TESTS-1shfA-frags83__2455_984 AMD64 3700 wuid=115778690 1wit__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-16-S3-5--1wit_-vf__2442_38 AMD64 3700 wuid=115762923 1aiu_WHS_ETABLE_SVM_TESTS-1aiu_-frags83__2453_919 |
Dr Who Fan Send message Joined: 28 May 06 Posts: 79 Credit: 273,880 RAC: 243 |
This one crashed and burned almost immediately with a 161 exit error. Message highlighted in red below says to keep in memory... All tasks are kept in memory when preempted. w099_1_homologymodel_strictosidine_synthase_2352_99743_1 CPU time 1.402016 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 3900258 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 # random seed: 3900258 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 No heartbeat from core client for 31 sec - exiting # cpu_run_time_pref: 7200 No heartbeat from core client for 31 sec - exiting Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 1 starting structures 1.35194 cpu seconds This process generated 0 decoys from 0 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> <message> <file_xfer_error> <file_name>w099_1_homologymodel_strictosidine_synthase_2352_99743_1_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Validate state Invalid Claimed credit 0.00186677223308905 Granted credit 0 application version 5.89 |
Jon C Melusky Send message Joined: 29 Nov 05 Posts: 12 Credit: 193,820 RAC: 21 |
I've been getting errors with rosetta for years on my 384 ram box running xp home. Just thought it was normal. 1 out of every 5 Wu's fail. Spinhenge, MalariaControl, SETI, BOINCSIMAP, TANPAKU, and Leiden Classical all run fine with zero errors over the years. I wish Rosetta could actually meet or email those people in those other groups and ask them for help with virtual memory settings. Pretty much every week, I get a few virtual memory warnings down by the clock. Pretty much every week I get a few error windows (got one today) about Rosetta runtime. No big deal, you just click ok and they go away. They all say "client error", but I have no idea if I am the client or if Rosetta is the client or if Boinc is the client or something else ? Task ID Work unit ID Sent Time reported or deadline Server state Outcome Client state CPU time (sec) claimed credit granted credit 127686448 116086135 18 Dec 2007 23:38:15 UTC 28 Dec 2007 23:38:15 UTC In Progress Unknown New --- --- --- 127660442 116061746 18 Dec 2007 21:06:57 UTC 18 Dec 2007 23:38:14 UTC Over Client error Compute error 1.28 0.00 --- 126724871 115201235 14 Dec 2007 13:44:09 UTC 18 Dec 2007 21:06:57 UTC Over Client error Compute error 1.09 0.00 --- 126591603 115079314 13 Dec 2007 22:50:25 UTC 14 Dec 2007 18:50:18 UTC Over Success Done 6,776.48 12.64 8.76 126190225 114717930 12 Dec 2007 1:53:00 UTC 13 Dec 2007 22:50:25 UTC Over Client error Done 1.30 0.00 --- 126093431 114461562 11 Dec 2007 16:00:14 UTC 12 Dec 2007 1:53:00 UTC Over Client error Compute error 1.08 0.00 --- Jonathan |
Laurent BISSON Send message Joined: 15 Nov 07 Posts: 1 Credit: 238,951 RAC: 0 |
Hi, I've been in the Rosetta project since November this year, but BOINC Manager cannot fetch new work from the internet. What can I do? (I'm on a Macintosh Intel Core 2 Duo, Mac OS X 10.4.11.) Thanks for your help. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Laurent, here is a thread with a number of ideas on things to check if you are not getting new work. If you have further questions about getting work, please post them in that thread. Rosetta Moderator: Mod.Sense |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
1hz6A_BOINC_ABINITIO_VF-S25-9-S3-3--1hz6A-vf__2450_4023_0 hung at 0.470% on Intel iMac2 ( Mac OS X 10.4.11 ): aborting. |
Rhiju Volunteer moderator Send message Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hey everybody -- we've been listening, and we've been especially concerned regarding the "memory hogs". We think we've fixed this problem and are updating Rosetta@home! So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired. |
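
Several posts above paste raw stderr output, and the moderator asks for something concrete to compare against once a new release is out. As a rough aid, the sketch below tallies the error signatures that appear in this thread's stderr excerpts. It is only an illustration: the strings are copied from the posts above, while the labels, the `SIGNATURES` table, and the `summarize_stderr` helper are hypothetical names made up for this example and are not part of Rosetta or BOINC.

```python
import re
from collections import Counter

# Error signatures copied from the stderr excerpts posted in this thread.
# The labels (dictionary keys) are hypothetical names chosen for this sketch.
SIGNATURES = {
    "sin_cos_range": re.compile(r"sin_cos_range ERROR: (-?\d+(?:\.\d+)?) is outside"),
    "sigill": re.compile(r"SIGILL: illegal instruction"),
    "sigsegv": re.compile(r"SIGSEGV: segmentation violation"),
    "no_heartbeat": re.compile(r"No heartbeat from core client for \d+ sec"),
    "too_many_restarts": re.compile(r"Too many restarts with no progress"),
}


def summarize_stderr(text):
    """Count how often each known signature appears in a stderr dump."""
    counts = Counter()
    for label, pattern in SIGNATURES.items():
        hits = pattern.findall(text)
        if hits:
            counts[label] = len(hits)
    return counts


# Sample text assembled from fragments quoted in the posts above.
sample = (
    "# cpu_run_time_pref: 14400 # random seed: 2330093 "
    "sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range "
    "sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range "
    "No heartbeat from core client for 31 sec - exiting "
    "SIGSEGV: segmentation violation"
)
summary = summarize_stderr(sample)
print(dict(summary))
```

Pasting the stderr text of a failed task into `sample` gives a quick per-signature count that can be posted alongside the WU name. The counts only show which messages appeared; they say nothing about the underlying cause.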
©2024 University of Washington
https://www.bakerlab.org