Problems with Rosetta version 5.80

AMD_is_logical

Send message
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 47812 - Posted: 17 Oct 2007, 16:55:51 UTC

This result says it is "invalid", even though the stderr.txt, the exit status, and the message log look perfectly normal.

https://boinc.bakerlab.org/rosetta/result.php?resultid=112607927
ID: 47812 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile MM Sihombing
Avatar

Send message
Joined: 22 May 06
Posts: 15
Credit: 1,424,082
RAC: 0
Message 47850 - Posted: 19 Oct 2007, 7:56:31 UTC

1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_5799_0

Compute error

Exit status -1073741819 (0xc0000005)
ID: 47850 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Xaak

Send message
Joined: 20 Mar 06
Posts: 17
Credit: 3,701,702
RAC: 0
Message 47862 - Posted: 19 Oct 2007, 14:40:38 UTC
Last modified: 19 Oct 2007, 14:41:20 UTC

The ridiculously high-credit WUs are still happening.

Latest example:
https://boinc.bakerlab.org/rosetta/result.php?resultid=113518482

Result ID 113518482
Name 1r69__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1r69_-frags83__2179_18461_1
Workunit 103081085
Created 18 Oct 2007 8:10:52 UTC
Sent 18 Oct 2007 8:11:00 UTC
Received 18 Oct 2007 17:48:39 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 567404
Report deadline 28 Oct 2007 8:11:00 UTC
CPU time 7703.15625
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3319170
======================================================
DONE :: 1 starting structures 7702.38 cpu seconds
This process generated 165 decoys from 165 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 34.169257800865
Granted credit 1238.8941257865
application version 5.80



XaaK



ID: 47862 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Luuklag

Send message
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 47867 - Posted: 19 Oct 2007, 16:58:23 UTC - in response to Message 47862.  

The ridiculously high-credit WUs are still happening.

Latest example:
https://boinc.bakerlab.org/rosetta/result.php?resultid=113518482

Result ID 113518482
Name 1r69__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1r69_-frags83__2179_18461_1
Workunit 103081085
Created 18 Oct 2007 8:10:52 UTC
Sent 18 Oct 2007 8:11:00 UTC
Received 18 Oct 2007 17:48:39 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 567404
Report deadline 28 Oct 2007 8:11:00 UTC
CPU time 7703.15625
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3319170
======================================================
DONE :: 1 starting structures 7702.38 cpu seconds
This process generated 165 decoys from 165 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 34.169257800865
Granted credit 1238.8941257865
application version 5.80




That one made an awful lot of decoys. If you divide the granted credit by the number of decoys, it comes to somewhere between 7 and 8 credits per decoy. The claimed credit is normal, because it is calculated from the CPU seconds spent. But this one made far more decoys than usual; I normally make 10 to 20 decoys in 3 hours on my single-core machine.
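
The per-decoy estimate can be checked with the figures from the quoted result. This is only a quick arithmetic sketch; the variable names are made up for illustration and say nothing about how the server actually grants credit.

```python
# Figures copied from the quoted result 113518482.
granted_credit = 1238.8941257865
claimed_credit = 34.169257800865
decoys = 165
cpu_seconds = 7703.15625

# Simple per-decoy averages.
granted_per_decoy = granted_credit / decoys
claimed_per_decoy = claimed_credit / decoys
seconds_per_decoy = cpu_seconds / decoys

print(f"granted credit per decoy: {granted_per_decoy:.2f}")
print(f"claimed credit per decoy: {claimed_per_decoy:.2f}")
print(f"CPU seconds per decoy:    {seconds_per_decoy:.1f}")
```

The granted figure works out to a little over 7.5 credits per decoy, while the claimed figure is only a fraction of one credit per decoy.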
ID: 47867 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
mikus

Send message
Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 47868 - Posted: 19 Oct 2007, 18:28:37 UTC

aborted beta - https://boinc.bakerlab.org/rosetta/result.php?resultid=112298830

Went to my computer (to make a connection), and saw (gkrellm) that one of the cores was idle. Boincmgr status showed two Rosetta WUs running. Top showed one of them using CPU, the other sitting there "stuck". Manually aborted the second.

I have plenty of memory; "leave work in memory" is specified. Judging by the CPU time accumulated by the "stuck" workunit, it had completed its quota of decoys, and was in the process of shutting down when it got "stuck". Dual core Linux 32-bit system, boinc 5.10.21. Rosetta tasks usually complete just fine.

The problem that "stuck" workunits cause is that boinc keeps track of the number of seconds given to tasks. As near as I can tell, my system spent so much wall clock time __not__ executing that "stuck" WU that its boinc calculated efficiency has now been severely reduced. I run off-line, and connect only occasionally. The lowered efficiency value means that for a while I will be given *less* work each time I connect, and will therefore have to connect more often. Not good.
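
To illustrate the concern, here is a deliberately simplified model. It is an assumption for illustration only, not the actual BOINC client calculation: efficiency is taken as CPU time divided by wall-clock time, and the amount of work requested is scaled by it. The function names and the example numbers are hypothetical.

```python
def efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of wall-clock time that was actually spent on the CPU (capped at 1)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return min(cpu_seconds / wall_seconds, 1.0)


def work_to_request(buffer_seconds: float, eff: float) -> float:
    """Hypothetical rule: ask for less work when measured efficiency is lower."""
    return buffer_seconds * eff


HOUR = 3600
buffer_seconds = 48 * HOUR  # made-up work buffer for an off-line host

# A task that ran 8 hours of CPU in about 8 hours of wall-clock time.
healthy = efficiency(8 * HOUR, 8.1 * HOUR)

# The same task, but left "stuck" for another 40 hours without using the CPU.
stuck = efficiency(8 * HOUR, 48 * HOUR)

print(f"healthy efficiency: {healthy:.2f}, request: {work_to_request(buffer_seconds, healthy) / HOUR:.1f} h")
print(f"stuck efficiency:   {stuck:.2f}, request: {work_to_request(buffer_seconds, stuck) / HOUR:.1f} h")
```

Under this toy model, a single long-stuck task drags the measured efficiency down sharply, which is the effect described above.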
.
ID: 47868 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Send message
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 47871 - Posted: 19 Oct 2007, 21:36:08 UTC

Mikus, please join the discussion on Linux preemption issues in this thread.
Rosetta Moderator: Mod.Sense
ID: 47871 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile (_KoDAk_)

Send message
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 47876 - Posted: 20 Oct 2007, 5:49:00 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=113112849
????
ID: 47876 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Dr Who Fan
Avatar

Send message
Joined: 28 May 06
Posts: 80
Credit: 273,880
RAC: 100
Message 47884 - Posted: 20 Oct 2007, 16:05:15 UTC

Another crunching error

https://boinc.bakerlab.org/rosetta/result.php?resultid=113450720
stderr out

CPU type AuthenticAMD
AMD Athlon(tm) 64 X2 Dual-Core Processor TK-53 [x86 Family 15 Model 104 Stepping 1]

CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_1052746_0

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2226295
==
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 20.426126431417
Granted credit 0
application version 5.69

ID: 47884 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
mikus

Send message
Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 47897 - Posted: 21 Oct 2007, 18:29:21 UTC - in response to Message 47871.  

Mikus, please join the discussion on Linux preemption issues in this thread.

I may do so -- but the reason I did not originally is that I believe that all of the recommendations in that thread were already in place on my system. As far as I can tell, the Rosetta workunit got "stuck" __after__ it had completed crunching. So to my mind there was no "task preemption" involved (only "task exit").

Also, if it were a preemption issue, I would expect other Rosetta tasks on my system to be failing in a similar fashion. But only that *beta* 5.80 has failed so far. I suspect the problem was triggered by something about that particular task. [Note: My 'Rosetta time to crunch' is 8 hours, meaning I run Rosetta applications (including 5.80) for a longer time than typical participants do.]
ID: 47897 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
svincent

Send message
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 47998 - Posted: 24 Oct 2007, 16:31:35 UTC

Work unit 104545488 stuck after 3 seconds: aborting. Mac OS X 10.4.19, Intel-based iMac, BOINC 5.10.20
ID: 47998 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .

Send message
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 48012 - Posted: 25 Oct 2007, 6:13:03 UTC

I just got these two units; it seems they both previously failed for the same user.

Is there a problem with the work units, the app, or their computer?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=104775454

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=104776777

This is from their results, same for both.

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
No main program specified
</message>
]]>

Pete.

ID: 48012 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Evan

Send message
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 48023 - Posted: 25 Oct 2007, 18:03:28 UTC

I suspect the computer. I have eight of his and am working on the third without problems.
ID: 48023 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Markus Schuhmacher

Send message
Joined: 29 May 06
Posts: 4
Credit: 1,455,542
RAC: 0
Message 48085 - Posted: 28 Oct 2007, 19:31:38 UTC

Since I updated the BOINC client to 5.10.23, the service has not crashed anymore and BOINC runs stably.
ID: 48085 · Rating: 1 · rate: Rate + / Rate - Report as offensive    Reply Quote
Keith T.
Avatar

Send message
Joined: 1 Mar 07
Posts: 58
Credit: 34,135
RAC: 0
Message 48149 - Posted: 30 Oct 2007, 13:25:30 UTC
Last modified: 30 Oct 2007, 13:28:09 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=115759805

cryb__BOINC_ABRELAX_SAVE_ALL_OUT-cryb_-_2227_32333_0

Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)

CPU time 1314.3125


stderr out <core_client_version>5.10.7</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2304098


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910E23 read attempt to address 0x00150586

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>
Validate state Invalid

I have seen this error before but it is rare. Last one was 16 Sep 2007 according to BoincView logs.

Keith T.
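
For anyone wondering how the negative exit status relates to the hex code in the log: both are the same 32-bit value, printed once as a signed integer and once as unsigned hex. A minimal sketch of the conversion (the variable names are just for illustration):

```python
exit_status = -1073741819  # signed value shown in the result page

# Reinterpret the signed 32-bit value as unsigned.
as_unsigned = exit_status & 0xFFFFFFFF
print(hex(as_unsigned))

# And back again: unsigned hex code to the signed value BOINC prints.
code = 0xC0000005
as_signed = code - (1 << 32) if code >= (1 << 31) else code
print(as_signed)
```

0xC0000005 is the Windows access-violation status, which matches the "Access Violation" lines in the stderr output above.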
ID: 48149 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
ziegenmelker

Send message
Joined: 26 Jul 06
Posts: 10
Credit: 26,061
RAC: 0
Message 48257 - Posted: 1 Nov 2007, 21:23:35 UTC
Last modified: 1 Nov 2007, 21:25:29 UTC

Some more:

5.80: SIGSEGV and '*** glibc detected *** corrupted double-linked list: 0x0a01aa28 ***', but valid and granted credits (32 for 4h ???)
5.80: process got signal 11 and 2 SIGSEGV: Invalid
5.69: process exited with code 193 (0xc1) and 3 SIGSEGV: Invalid
5.80: process exited with code 193 (0xc1) and 1 SIGSEGV: Invalid

I shortened the crunching time from 4 to 1 h.

5.80: *** glibc detected *** corrupted double-linked list: 0x097ea480 *** and 1 SIGSEGV: Valid
5.69: resultid=116839781: Valid
5.69: resultid=116896290 1 SIGSEGV: Valid

The '*** glibc detected *** corrupted double-linked list:' is an error in the app.
One of the last (valid) WUs got stuck, so I shut down boinc and restarted it, and the WU then finished successfully.
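
As a side note on the numbers in the list above: "signal 11" and "SIGSEGV" are the same thing, and 0xc1 is just 193 in hex. A quick check using Python's standard signal module (nothing here is specific to Rosetta or BOINC):

```python
import signal

# Signal number 11 is the segmentation-fault signal.
sig = signal.Signals(11)
print(sig.name)

# The exit code and the hex value in the messages are the same number.
print(hex(193))
```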

This host is doing work for Einstein(32Bit), ABC(64Bit), Seti(64Bit) and WCG(32Bit) without problems.

cu,
Michael

[edit]format[/edit]
ID: 48257 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
svincent

Send message
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 48273 - Posted: 2 Nov 2007, 18:27:31 UTC

Workunit 106397169 (trunc_cryb__BOINC_ABRELAX_-trunc_cryb_-_2238_78739_0) stuck at 0.751%. Intel iMac2: Mac OSX 10.4.10; Boinc 5.10.20. Aborting.
ID: 48273 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Trey

Send message
Joined: 3 Oct 06
Posts: 11
Credit: 110,142
RAC: 0
Message 48307 - Posted: 3 Nov 2007, 19:12:56 UTC

I had a problem with 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0. I had just reinstalled my computer with openSUSE 10.3 (upgrading from 10.1) a few hours earlier. However, the WUs run on the new O/S before and after the problem one seem OK.

2007-11-03 10:36:31 [---] Starting BOINC client version 5.10.21 for x86_64-pc-linux-gnu
2007-11-03 10:36:31 [---] log flags: task, file_xfer, sched_ops
2007-11-03 10:36:31 [---] Libraries: libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5
2007-11-03 10:36:31 [---] Executing as a daemon
2007-11-03 10:36:31 [---] Data directory: /home/trey/BOINC
2007-11-03 10:36:31 [---] Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ [Family 15 Model 43 Stepping 1]
2007-11-03 10:36:31 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
2007-11-03 10:36:31 [---] OS: Linux: 2.6.22.9-0.4-default
2007-11-03 10:36:31 [---] Memory: 1.97 GB physical, 4.01 GB virtual
2007-11-03 10:36:31 [---] Disk: 98.44 GB total, 89.08 GB free
2007-11-03 10:36:31 [---] Local time is UTC -5 hours
2007-11-03 10:36:31 [rosetta@home] URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 647624; location: home; project prefs: default
2007-11-03 10:36:31 [---] General prefs: from http://www.worldcommunitygrid.org/ (last modified 2007-10-28 21:44:36)
2007-11-03 10:36:31 [---] Host location: home
2007-11-03 10:36:31 [---] General prefs: no separate prefs for home; using your defaults
2007-11-03 10:36:31 [---] Reading preferences override file
2007-11-03 10:36:31 [---] Preferences limit memory usage when active to 1007.34MB
2007-11-03 10:36:31 [---] Preferences limit memory usage when idle to 1813.21MB
2007-11-03 10:36:31 [---] Preferences limit disk usage to 1.86GB
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 using rosetta version 569
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 using rosetta version 569
2007-11-03 11:23:31 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 finished
2007-11-03 11:23:31 [rosetta@home] Starting 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0
2007-11-03 11:23:31 [rosetta@home] Starting task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 using rosetta_beta version 580

2007-11-03 11:23:33 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Throughput 42065 bytes/sec
2007-11-03 12:04:45 [rosetta@home] Deferring communication for 1 min 0 sec
2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 (process exited with code 1 (0x1, -255))
2007-11-03 12:04:45 [rosetta@home] Computation for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 finished
2007-11-03 12:04:45 [rosetta@home] Output file 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0_0 for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 absent

2007-11-03 12:04:45 [rosetta@home] Starting 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0
2007-11-03 12:04:45 [rosetta@home] Starting task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 using rosetta_beta version 580
2007-11-03 13:02:50 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 finished
2007-11-03 13:02:50 [rosetta@home] Starting 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0
2007-11-03 13:02:50 [rosetta@home] Starting task 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0 using rosetta_beta version 580
2007-11-03 13:02:52 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Throughput 43435 bytes/sec
2007-11-03 14:33:06 [rosetta@home] Computation for task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 finished
2007-11-03 14:33:06 [rosetta@home] Starting 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0
2007-11-03 14:33:06 [rosetta@home] Starting task 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0 using rosetta_beta version 580
2007-11-03 14:33:09 [rosetta@home] [file_xfer] Started upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Finished upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Throughput 7927 bytes/sec
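
In a long log like this, the failing task is easy to miss. A small sketch that pulls out the "Unrecoverable error" lines, assuming the message format shown above; the function name and the regular expression are my own, not part of BOINC:

```python
import re

# Matches lines such as:
# 2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result <task> (process exited with code 1 (0x1, -255))
FAILURE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(.+?)\] "
    r"Reason: Unrecoverable error for result (\S+) "
    r"\(process exited with code (-?\d+)"
)


def find_failures(lines):
    """Return (timestamp, project, task, exit_code) for each failure line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            timestamp, project, task, code = match.groups()
            failures.append((timestamp, project, task, int(code)))
    return failures


sample = [
    "2007-11-03 12:04:45 [rosetta@home] Deferring communication for 1 min 0 sec",
    "2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result "
    "2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 "
    "(process exited with code 1 (0x1, -255))",
]

failures = find_failures(sample)
for timestamp, project, task, code in failures:
    print(timestamp, project, task, code)
```

To run it against a saved copy of the message log, pass the open file object to find_failures instead of the sample list.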
ID: 48307 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Eric

Send message
Joined: 20 Jan 06
Posts: 3
Credit: 47,910
RAC: 0
Message 48311 - Posted: 3 Nov 2007, 20:58:57 UTC

I have a computation error on 1n0u__TREEJUMP_ABRELAX_NOTOR-1n0u_-_BARCODE__2241_50881_1
This is a first for me.
ID: 48311 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
AMD_is_logical

Send message
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 48312 - Posted: 3 Nov 2007, 21:04:24 UTC

ID: 48312 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Trey

Send message
Joined: 3 Oct 06
Posts: 11
Credit: 110,142
RAC: 0
Message 48357 - Posted: 4 Nov 2007, 18:27:30 UTC
Last modified: 4 Nov 2007, 18:30:43 UTC

I've had another failure on a different computer (running Windoze this time):

1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8157_0



ID: 48357 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote

©2024 University of Washington
https://www.bakerlab.org