Problems with Rosetta version 5.89

Message boards : Number crunching : Problems with Rosetta version 5.89

AM
Joined: 15 Jul 06
Posts: 7
Credit: 535,652
RAC: 405
Message 49750 - Posted: 17 Dec 2007, 14:00:17 UTC

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49753 - Posted: 17 Dec 2007, 16:43:44 UTC - in response to Message 49750.  

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.


I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, so yes, I would assume both that other WUs will require less memory and that future releases will improve the memory footprint.

It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out.
Rosetta Moderator: Mod.Sense
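On Linux hosts, one way to get concrete numbers to post next to the WU name is to read the kernel's `/proc/<pid>/status` file for the running task. The sketch below assumes a Linux machine and leaves finding the Rosetta process ID to the reader. The helper names `parse_memory_kb` and `read_memory_kb` are hypothetical, and the sample text uses made-up numbers only to show the format.

```python
# Sketch: pull memory figures (in kB) out of Linux /proc/<pid>/status text.
# Assumption: Linux only; the PID of the Rosetta task must be found separately.

def parse_memory_kb(status_text):
    """Return VmPeak, VmSize and VmRSS (in kB) found in /proc/<pid>/status text."""
    wanted = {"VmPeak", "VmSize", "VmRSS"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:\t  301000 kB"; keep the number only.
            result[key] = int(rest.split()[0])
    return result


def read_memory_kb(pid):
    """Read and parse the status file of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_memory_kb(f.read())


# Made-up sample text, only to show the format of the file.
sample = "Name:\texample\nVmPeak:\t  512000 kB\nVmSize:\t  498000 kB\nVmRSS:\t  301000 kB\n"
print(parse_memory_kb(sample))  # {'VmPeak': 512000, 'VmSize': 498000, 'VmRSS': 301000}
```

VmPeak is the peak virtual memory size and VmRSS the resident (physical) memory, so posting both alongside the WU name separates "virtual memory hog" from "RAM hog".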
AM
Joined: 15 Jul 06
Posts: 7
Credit: 535,652
RAC: 405
Message 49754 - Posted: 17 Dec 2007, 17:18:02 UTC - in response to Message 49753.  

Sorry, here it is below...

1cc8AWHS_ETABLE_SVM_TESTS-1cc8A-frags83__2452_282
M.L.
Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 49763 - Posted: 17 Dec 2007, 19:49:11 UTC

Can anyone explain......

Task ID 126999841
Name 1tul__BOINC_ABINITIO_VF_IGNORE_THE_REST-S25-9-S3-8--1tul_-vf__2434_88_0
Workunit 115452516
Created 15 Dec 2007 20:31:18 UTC
Sent 15 Dec 2007 20:32:33 UTC
Received 17 Dec 2007 16:47:46 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 25 Dec 2007 20:32:33 UTC
CPU time 14202.75
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
# random seed: 2330093
sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: -1.0408722 is outside of [-1,+1] sin and cos value legal range
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 14202.7 cpu seconds
This process generated 42 decoys from 42 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 58.8835581491606
Granted credit 53.3430028257213
application version 5.89

I am curious what the sin_cos_range ERROR means, since Rosie seems happy to accept this task and gave credit. This is the second task I have had with the same message; the other was under 5.82. Can anyone help?
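Read literally, the message says a value about to be used as a sine or cosine came out as -1.0408722, outside the legal range [-1, +1]. A common pattern in numerical code is to check such a value and clamp it before calling `acos`/`asin`, which would let the run continue; that would be consistent with this task still finishing and validating. The sketch below is a generic illustration under that assumption, not Rosetta's actual code; `clamp_sin_cos` is a hypothetical name borrowed from the log message.

```python
import math

def clamp_sin_cos(x):
    """Warn about and clamp a sine/cosine value to the legal range [-1, +1].

    Hypothetical helper named after the log message; not taken from Rosetta.
    """
    if x < -1.0 or x > 1.0:
        print(f"sin_cos_range ERROR: {x} is outside of [-1,+1] sin and cos value legal range")
        x = max(-1.0, min(1.0, x))
    return x

# math.acos(-1.0408722) on its own raises ValueError ("math domain error");
# after clamping to -1.0 it returns pi and the calculation can carry on.
angle = math.acos(clamp_sin_cos(-1.0408722))
print(angle)  # 3.141592653589793
```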
Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 49767 - Posted: 17 Dec 2007, 21:52:16 UTC

Task ID 126856692
Name 1lis_WHS_ETABLE_SVM_TESTS-1lis_-frags83__2453_41_0
Workunit 115320625
Created 15 Dec 2007 4:37:53 UTC
Sent 15 Dec 2007 4:38:18 UTC
Received 15 Dec 2007 19:55:18 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 193 (0xc1)
Computer ID 687330
Report deadline 25 Dec 2007 4:38:18 UTC
CPU time 24587.680633
stderr out
<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 28800
# random seed: 1054640
SIGILL: illegal instruction
Stack trace (18 frames):
[0x8da95f7]
[0x8da43ec]
[0xffffe500]
[0x8c18d60]
[0x8775f6f]
[0x877651e]
[0x877a4cf]
[0x878b883]
[0x879128d]
[0x8cfb408]
[0x8b45e94]
[0x8b48f13]
[0x80d8c55]
[0x85f4d25]
[0x8732b67]
[0x8732c12]
[0x8e0d944]
[0x8048111]

Exiting...

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 94.8036591472954
Granted credit 0
application version 5.89

This one died with an illegal instruction trap after more than 6.5 hours (I run with 8 hours per workunit), when it was almost finished. The computer has two Quad-Core Opteron 2346HE processors and 8GB of memory, running OpenSuSE 10.3 in 64-bit mode. As far as I can tell, no other errors have been reported from this fairly new system.
Team Helix
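The numbers in this report can be cross-checked with a little arithmetic. The three forms in "process exited with code 193 (0xc1, -63)" are the same byte: decimal, hexadecimal, and the byte read as a signed 8-bit integer (193 - 256 = -63). The CPU time also confirms the timing described above. `as_signed_byte` is a hypothetical helper for illustration.

```python
def as_signed_byte(code):
    """Interpret an exit code in 0..255 as a signed 8-bit value."""
    return code - 256 if code > 127 else code

exit_code = 193
print(hex(exit_code), as_signed_byte(exit_code))  # 0xc1 -63

cpu_seconds = 24587.680633   # "CPU time" from the task report
pref_seconds = 28800         # "# cpu_run_time_pref: 28800"
print(round(cpu_seconds / 3600, 2), pref_seconds / 3600)  # 6.83 8.0
```

So the task had used about 6.83 of its 8 preferred hours when it hit the SIGILL.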
marc zubrin
Joined: 13 Nov 07
Posts: 1
Credit: 6,392
RAC: 0
Message 49769 - Posted: 18 Dec 2007, 6:36:40 UTC - in response to Message 49676.  

Please post any problems with rosetta 5.89 here. Thanks!


I had 4 "client errors" over the last few days, among a dozen good results with Rosetta Beta
(version 5.89, I think).
None of my other clients has had any errors during this time (crunching about 400 credits per day).
Are you sure there isn't anything wrong with either the software or the workunits you send us?

marc zubrin
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49774 - Posted: 18 Dec 2007, 13:00:48 UTC
Last modified: 18 Dec 2007, 13:08:09 UTC

I have been alternating between Windows and Linux on 3 of my 6 machines for the last 4 days. Last night the three that were running Linux decided to stop working properly. I woke to find that the monitor program "gkrellm" showed NO CPU usage on either of my single-core machines, or on one of the two CPUs on my dual core. Up until now these machines have run Rosetta flawlessly, and that's what makes this strange.

All are using 64-bit BOINC 5.10.21 (official). All have been working well since Nov 26th 2007. My run time preference is set to 2 hours (it was 1 hour until 5 days ago), so I have plenty of samples of good work already processed.

The machines in question are:

1) hostid=692479 AMD64 2800 w/768M ram
2) hostid=692481 AMD64 3700 w/1G ram
3) hostid=692483 AMD64 X2 4800 w/2G ram

Why all the machines running Linux would decide to break, while the other three running Windows kept working, is a mystery. I have a 1-day cache and switch OSes about every 12 hours, so after these last few days there should be "similar" work for both Windows and Linux. Two of the three froze on the same job type, while one didn't. The one that had the unique job also had only one error, while the two with the identical job type had numerous errors overnight (which is unusual as well). You can see what the results page looks like for my AMD64 3700 below:



Here's what the results page for the AMD64 2800 looked like:



All were/are running Rosetta Beta 5.89 at this time. Many SIGSEGV faults are showing up in those compute errors, such as:

process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 7200
# random seed: 2133824
No heartbeat from core client for 31 sec - exiting
SIGSEGV: segmentation violation
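"No heartbeat from core client for 31 sec - exiting" means the science application stopped hearing from the BOINC core client for longer than its timeout and quit on its own, which suggests looking at why the client (or the whole machine) went quiet rather than only at the workunit. The sketch below is a generic heartbeat-timeout check, not BOINC's actual implementation; the class name `HeartbeatWatchdog` and the 30-second threshold (inferred from the "31 sec" in the log) are assumptions.

```python
import time

class HeartbeatWatchdog:
    """Generic heartbeat timeout check (hypothetical, not BOINC's code).

    The worker records when it last heard from its supervisor and gives up
    once no heartbeat has arrived for more than `timeout` seconds.
    """

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        """Call whenever a heartbeat message arrives."""
        self.last_beat = self.clock()

    def should_exit(self):
        """True if the supervisor has been silent longer than the timeout."""
        return self.clock() - self.last_beat > self.timeout


# A simulated clock makes the example run instantly.
now = [0.0]
dog = HeartbeatWatchdog(timeout=30.0, clock=lambda: now[0])
now[0] = 10.0
dog.beat()
now[0] = 41.0  # 31 seconds after the last heartbeat
if dog.should_exit():
    print(f"No heartbeat from core client for {int(now[0] - dog.last_beat)} sec - exiting")
```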
AM
Joined: 15 Jul 06
Posts: 7
Credit: 535,652
RAC: 405
Message 49776 - Posted: 18 Dec 2007, 13:25:03 UTC

Memory hog WU:

1b72__BOINC_TETHER_DURING_STAGE1_POSE_ABRELAX-1b72_-_2458_1145
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 49777 - Posted: 18 Dec 2007, 14:03:29 UTC
Last modified: 18 Dec 2007, 14:07:41 UTC

Dr Who Fan
Joined: 28 May 06
Posts: 79
Credit: 273,880
RAC: 243
Message 49782 - Posted: 18 Dec 2007, 18:56:57 UTC

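This one crashed and burned almost immediately with error code -161.
The message that was highlighted in red below ("Too many restarts with no progress. Keep application in memory while preempted.") says to keep the application in memory, but all my tasks are already kept in memory when preempted.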

w099_1_homologymodel_strictosidine_synthase_2352_99743_1

CPU time 1.402016
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3900258
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
# random seed: 3900258
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 1.35194 cpu seconds
This process generated 0 decoys from 0 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>w099_1_homologymodel_strictosidine_synthase_2352_99743_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 0.00186677223308905
Granted credit 0
application version 5.89
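The stderr above shows six "No heartbeat" exits in a row, followed by "Too many restarts with no progress", so the task apparently counts restarts that made no progress and gives up past some limit. The sketch below illustrates that general pattern only; `RestartGuard` is a hypothetical name and the limit of 5 is an assumption, not Rosetta's actual value.

```python
class RestartGuard:
    """Hypothetical restart limiter: give up after too many restarts
    that made no progress. The default limit of 5 is an assumption."""

    def __init__(self, max_restarts=5):
        self.max_restarts = max_restarts
        self.restarts_without_progress = 0
        self.last_progress = 0.0

    def on_restart(self, progress):
        """Record a restart; return False if the task should give up."""
        if progress > self.last_progress:
            # Work was done since the last restart, so start counting again.
            self.last_progress = progress
            self.restarts_without_progress = 0
        else:
            self.restarts_without_progress += 1
        return self.restarts_without_progress <= self.max_restarts


guard = RestartGuard(max_restarts=5)
for attempt in range(1, 8):
    if not guard.on_restart(progress=0.0):
        print(f"Too many restarts with no progress (attempt {attempt}).")
        break
```

With no progress ever recorded, this prints the message on attempt 6; any restart that shows new progress resets the count.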

Jon C Melusky
Joined: 29 Nov 05
Posts: 12
Credit: 193,820
RAC: 21
Message 49787 - Posted: 18 Dec 2007, 23:51:24 UTC

I've been getting errors with Rosetta for years on my 384 MB RAM box running XP Home. I just thought it was normal: about 1 out of every 5 WUs fails. Spinhenge, MalariaControl, SETI, BOINCSIMAP, TANPAKU, and Leiden Classical have all run fine with zero errors over the years. I wish the Rosetta team could meet or email the people in those other groups and ask them for help with virtual memory settings.

Pretty much every week I get a few virtual memory warnings down by the clock, and a few error windows about the Rosetta runtime (got one today). No big deal; you just click OK and they go away. They all say "client error", but I have no idea: am I the client, is Rosetta the client, is BOINC the client, or is it something else?

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
127686448 | 116086135 | 18 Dec 2007 23:38:15 UTC | 28 Dec 2007 23:38:15 UTC | In Progress | Unknown | New | --- | --- | ---
127660442 | 116061746 | 18 Dec 2007 21:06:57 UTC | 18 Dec 2007 23:38:14 UTC | Over | Client error | Compute error | 1.28 | 0.00 | ---
126724871 | 115201235 | 14 Dec 2007 13:44:09 UTC | 18 Dec 2007 21:06:57 UTC | Over | Client error | Compute error | 1.09 | 0.00 | ---
126591603 | 115079314 | 13 Dec 2007 22:50:25 UTC | 14 Dec 2007 18:50:18 UTC | Over | Success | Done | 6,776.48 | 12.64 | 8.76
126190225 | 114717930 | 12 Dec 2007 1:53:00 UTC | 13 Dec 2007 22:50:25 UTC | Over | Client error | Done | 1.30 | 0.00 | ---
126093431 | 114461562 | 11 Dec 2007 16:00:14 UTC | 12 Dec 2007 1:53:00 UTC | Over | Client error | Compute error | 1.08 | 0.00 | ---

Jonathan
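A quick tally of the task list above shows the pattern in this particular window: every failed task stopped after little more than a second of CPU time, i.e. at startup rather than after a long run. The snippet only re-counts the rows as posted; the in-progress task is left out because it has no outcome yet.

```python
# Rows copied from the task list above: (task ID, outcome, CPU time in seconds).
tasks = [
    (127660442, "Client error", 1.28),
    (126724871, "Client error", 1.09),
    (126591603, "Success", 6776.48),
    (126190225, "Client error", 1.30),
    (126093431, "Client error", 1.08),
]

errors = [t for t in tasks if t[1] == "Client error"]
print(f"{len(errors)} of {len(tasks)} reported tasks failed")  # 4 of 5 reported tasks failed
print("longest CPU time before an error:", max(t[2] for t in errors), "s")  # 1.3 s
```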
Laurent BISSON
Joined: 15 Nov 07
Posts: 1
Credit: 238,951
RAC: 0
Message 49808 - Posted: 19 Dec 2007, 20:04:21 UTC

Hi, I've been in the Rosetta project since November this year, but BOINC Manager cannot download new work from the internet. What can I do?

(I'm on an Intel Core 2 Duo Macintosh, Mac OS X 10.4.11.)

Thanks for the help.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49811 - Posted: 19 Dec 2007, 20:15:56 UTC

Laurent, here is a thread with a number of ideas on things to check if you are not getting new work. If you have further questions about getting work, please post them in that thread.
Rosetta Moderator: Mod.Sense
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 49812 - Posted: 19 Dec 2007, 20:24:17 UTC

1hz6A_BOINC_ABINITIO_VF-S25-9-S3-3--1hz6A-vf__2450_4023_0 hung at 0.470% on Intel iMac2 ( Mac OS X 10.4.11 ): aborting.



Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 49834 - Posted: 20 Dec 2007, 20:11:15 UTC - in response to Message 49753.  

Hey everybody -- we've been listening, and we've been especially concerned regarding the "memory hogs". We think we've fixed this problem and are updating Rosetta@home!




©2024 University of Washington
https://www.bakerlab.org