Problems with Rosetta version 5.80

Dr Who Fan
Joined: 28 May 06
Posts: 79
Credit: 273,880
RAC: 243
Message 47579 - Posted: 10 Oct 2007, 0:08:52 UTC

Several errors to report:
https://boinc.bakerlab.org/rosetta/result.php?resultid=110371301
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3975811
==
</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4190_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid

-------------

https://boinc.bakerlab.org/rosetta/result.php?resultid=110418515
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3948666
======================================================
DONE :: 1 starting structures 8134.02 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31335_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid

-------------


https://boinc.bakerlab.org/rosetta/result.php?resultid=110838918
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1657484
==
</stderr_txt>
]]>

Validate state Invalid
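
For anyone collecting these reports, the `<file_xfer_error>` entries can be pulled out of a pasted stderr block with a few lines of Python. This is only a sketch: the function name `file_xfer_errors` is made up for illustration, and it assumes the text looks like the pastes above. It uses a regular expression rather than an XML parser because the pasted text is not a complete XML document.

```python
import re

# Matches one <file_xfer_error> element as it appears in the pasted output.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def file_xfer_errors(stderr_out):
    """Return (file_name, error_code) pairs found in the pasted text."""
    return [(m.group("name"), int(m.group("code")))
            for m in XFER_ERROR.finditer(stderr_out)]


# Sample copied from the first result in this post.
sample = """
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4190_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
"""
print(file_xfer_errors(sample))
```

A result with no such element (like the third one above) simply gives an empty list.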


ziegenmelker
Joined: 26 Jul 06
Posts: 10
Credit: 26,061
RAC: 0
Message 47586 - Posted: 10 Oct 2007, 4:25:50 UTC - in response to Message 47579.  

According to this post I kill every 'sen15_RESAMPLE_BOINC_MFR_ABRELAX_...' WU.

cu,
Michael

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 47608 - Posted: 10 Oct 2007, 22:10:58 UTC
Last modified: 10 Oct 2007, 22:12:36 UTC

1bq9A_SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1bq9A-_BARCODE__2166_3994_0
It crashed after I opened the graphics window, about 1 hour and 11 minutes in, as it was initializing the second model.

result

Oliver
Joined: 11 Oct 07
Posts: 4
Credit: 525
RAC: 0
Message 47621 - Posted: 11 Oct 2007, 19:35:23 UTC

Hi all: We're trying to track down several sources of error. Workunits with the batch number 2155:

sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED


appear to be flawed. I've cancelled the job; you should also feel free to abort these jobs if you see them. There aren't that many. I just fixed the problem and sent out a similar job with ID 2163.

Thanks *very* much for posting!
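
To spot the affected tasks in a long task list, the batch number can be read out of the task name. The sketch below is an illustration only: it assumes, based on the names quoted in this thread, that a task name ends in `_<batch>_<number>_<suffix>`, and the helper names `batch_number` and `should_abort` are hypothetical.

```python
def batch_number(task_name):
    """Return the batch field, assuming names end in _<batch>_<number>_<suffix>."""
    parts = task_name.rsplit("_", 3)
    if len(parts) != 4 or not parts[1].isdigit():
        raise ValueError(f"unexpected task name: {task_name!r}")
    return parts[1]


def should_abort(task_name):
    """True for the flawed job: sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED, batch 2155."""
    return (task_name.startswith("sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_")
            and batch_number(task_name) == "2155")


# Task names as they appear in this thread.
tasks = [
    "sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0",
    "HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_65537_0",
]
for name in tasks:
    print(name, "-> abort" if should_abort(name) else "-> keep")
```

Checking both the name prefix and the batch number keeps the resent job (ID 2163) from being flagged.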

hugothehermit
Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 47666 - Posted: 13 Oct 2007, 0:49:41 UTC
Last modified: 13 Oct 2007, 0:54:36 UTC

HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_65537_0

Exit status = -1073741819 (0xc0000005)


Edit: Only a couple more edits and this might make sense :)
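
The two numbers in the exit status are the same 32-bit value: the negative one is the signed reading, the hex one the unsigned reading. A minimal Python sketch of the conversion (the function name `as_unsigned_hex` is just for illustration):

```python
def as_unsigned_hex(exit_status):
    """Format a signed 32-bit exit status as an unsigned 32-bit hex code."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{exit_status & 0xFFFFFFFF:08x}"


print(as_unsigned_hex(-1073741819))  # prints 0xc0000005
```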

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 47675 - Posted: 13 Oct 2007, 11:37:01 UTC
Last modified: 13 Oct 2007, 11:37:48 UTC

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168
Jmarks

ziegenmelker
Joined: 26 Jul 06
Posts: 10
Credit: 26,061
RAC: 0
Message 47678 - Posted: 13 Oct 2007, 12:18:48 UTC

A valid WU, but still with errors:

<core_client_version>5.10.8</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 3647667
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x8d7cf2f]
[0x8d77d1c]
[0xffffe500]
[0x8e024c7]
[0x8dd2715]
[0x8dd2481]
[0x83f9b8b]
[0x8de873f]
[0x8d79987]
[0x8d7afa5]
[0x8d73f9d]
[0x8e1487a]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
ERROR:: Exit from: fragments.cc line: 465
FILE_LOCK::unlock(): close failed.: Bad file descriptor
*** glibc detected *** double free or corruption (fasttop): 0x0909e348 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
*** glibc detected *** corrupted double-linked list: 0x09757f20 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
*** glibc detected *** corrupted double-linked list: 0x09511408 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 14211.6 cpu seconds
This process generated 19 decoys from 19 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

I really wonder about all these SIGSEGV errors. I don't think they are hardware related.
The "glibc detected *** corrupted double-linked list" messages should be caused by the app itself.
System: AMD 64 X2 4400, 2 GB RAM, stock clock, 64-bit OpenSUSE 10.2, glibc-2.5-25.
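
To compare logs like this across several results, one could count how often each kind of message shows up. The following is a rough Python sketch, not a diagnostic tool: the marker strings are simply copied from the log above and the list is certainly not exhaustive.

```python
from collections import Counter

# Substrings taken from the log above; illustrative, not a complete list.
MARKERS = ("SIGSEGV", "glibc detected", "ERROR::")


def count_markers(stderr_txt):
    """Count how many lines contain each marker substring."""
    counts = Counter()
    for line in stderr_txt.splitlines():
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


# The error lines from the log above.
log = """\
SIGSEGV: segmentation violation
ERROR:: Exit from: fragments.cc line: 465
*** glibc detected *** double free or corruption (fasttop): 0x0909e348 ***
*** glibc detected *** corrupted double-linked list: 0x09757f20 ***
*** glibc detected *** corrupted double-linked list: 0x09511408 ***
"""
print(count_markers(log))
```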

cu,
Michael

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 843
Message 47680 - Posted: 13 Oct 2007, 13:24:44 UTC - in response to Message 47675.  
Last modified: 13 Oct 2007, 13:26:35 UTC

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168


To the project people: please also look at the thread "4000 credit wus?", which discusses this issue. The following is one of those posts from Xaak (in the quotes), followed by what I found when I looked at that host.


Found another one:
https://boinc.bakerlab.org/rosetta/results.php?hostid=567404&offset=20

Maybe the credit/decoy ratio on these are way off?


I don't believe the owner knows anything about this, as the WUs making the huge claims all have similar names. They all start with mcr1, for example:

mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-short_mfr__2168_67176_0

On the other side of the coin, the same computer is getting a lot of stuck WUs that are being terminated by the Watchdog. Their names start with

STM0082_BOINC_MFR_ABRELAX_PICKED_

and each of these WUs gets just 20 credits.

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 47690 - Posted: 13 Oct 2007, 17:28:52 UTC - in response to Message 47680.  

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168


To the project people: please also look at the thread "4000 credit wus?", which discusses this issue. The following is one of those posts from Xaak (in the quotes), followed by what I found when I looked at that host.

That is the same thread my 2168 points to.
Jmarks

Christoph Jansen
Joined: 6 Jun 06
Posts: 248
Credit: 267,153
RAC: 0
Message 47692 - Posted: 13 Oct 2007, 18:23:29 UTC
Last modified: 13 Oct 2007, 18:27:58 UTC

The machines affected by this "multi-decoy" bug all seem to be Core2Quads. And, contrary to what the "4000 credit WU" thread implies, WU types other than mcr1 are also affected. See these two:

https://boinc.bakerlab.org/rosetta/result.php?resultid=111755669

https://boinc.bakerlab.org/rosetta/result.php?resultid=111753474
"I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." R.M. Nixon

M.L.
Joined: 21 Nov 06
Posts: 182
Credit: 180,462
RAC: 0
Message 47712 - Posted: 14 Oct 2007, 2:13:12 UTC

Result ID 111953379
Name HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_55153_0
Workunit 101758314
Created 11 Oct 2007 16:10:04 UTC
Sent 11 Oct 2007 16:12:47 UTC
Received 14 Oct 2007 2:04:59 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 510574
Report deadline 21 Oct 2007 16:12:47 UTC
CPU time 9247.8125
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1886848


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00B77A52 write attempt to address 0x1FF63C2C

Engaging BOINC Windows Runtime Debugger...



********************





googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,813,645
RAC: 3,531
Message 47725 - Posted: 14 Oct 2007, 19:01:45 UTC

Here's one that didn't work out:

Result ID 112315759
Name STM0082_BOINC_MFR_ABRELAX_PICKED_2175_1860_0
Workunit 102095676
Created 13 Oct 2007 5:23:55 UTC
Sent 13 Oct 2007 5:25:15 UTC
Received 14 Oct 2007 18:15:54 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 307276
Report deadline 23 Oct 2007 5:25:15 UTC
CPU time 3577.953125
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3485771
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 900 seconds
**********************************************************************
GZIP SILENT FILE: .aaSTM1.out

</stderr_txt>
]]>

Validate state Valid
Claimed credit 9.24557159531191
Granted credit 20
application version 5.80

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 47731 - Posted: 14 Oct 2007, 20:26:03 UTC

Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279

client error and compute error

CPU time 21566.46875
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3967126
======================================================
DONE :: 1 starting structures 21565.9 cpu seconds
This process generated 10 decoys from 10 attempts
======================================================
=================================================================================

From BOINC Manager (times are CEST, GMT+2):

10/13/2007 1:57:01 AM|rosetta@home|Computation for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 finished
10/13/2007 1:57:01 AM|rosetta@home|Output file sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0 for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 absent
10/13/2007 1:57:01 AM|rosetta@home|Starting CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0
10/13/2007 1:57:01 AM|rosetta@home|Starting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0 using rosetta version 569
10/13/2007 1:57:02 AM|rosetta@home|Deferring communication for 1 min 0 sec
10/13/2007 1:57:02 AM|rosetta@home|Reason: Unrecoverable error for result sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 (<file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)



BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

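The DONE summary in a log like the one above gives both the CPU seconds and the decoy count, so a rough seconds-per-decoy figure can be worked out when comparing work units (for example, the ones reported earlier in this thread as producing too many decoys). A small Python sketch, assuming the summary lines are worded exactly as in the pastes here; the function name `seconds_per_decoy` is hypothetical:

```python
import re

# Patterns follow the wording of the summary lines pasted in this thread.
DONE_RE = re.compile(r"DONE :: \d+ starting structures\s+([\d.]+) cpu seconds")
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def seconds_per_decoy(stderr_txt):
    """Return CPU seconds per decoy, or None if the summary is missing."""
    done = DONE_RE.search(stderr_txt)
    decoys = DECOY_RE.search(stderr_txt)
    if done is None or decoys is None or int(decoys.group(1)) == 0:
        return None
    return float(done.group(1)) / int(decoys.group(1))


# Summary lines copied from the log above.
summary = """\
DONE :: 1 starting structures 21565.9 cpu seconds
This process generated 10 decoys from 10 attempts
"""
print(seconds_per_decoy(summary))
```

For the log above this comes to roughly 2157 CPU seconds per decoy.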

Angus
Joined: 17 Sep 05
Posts: 412
Credit: 321,053
RAC: 0
Message 47736 - Posted: 14 Oct 2007, 21:46:28 UTC - in response to Message 47735.  

Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279


You don't read the forum very much do you? (just joking)

Might I point you to this post in this very same thread?

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3552&nowrap=true#47621


For some reason this strikes me as very funny.
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 47752 - Posted: 15 Oct 2007, 9:11:52 UTC - in response to Message 47735.  

Doh! It got lost in the clutter of the rest of my work. I don't look at my work unit list that much, so I let that one slip through. Thanks for the reminder.
I'll have to go hunt for the rest, if any.


Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279


You don't read the forum very much do you? (just joking)

Might I point you to this post in this very same thread?

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3552&nowrap=true#47621


Bletchley Park
Joined: 4 Oct 07
Posts: 4
Credit: 18,052
RAC: 0
Message 47755 - Posted: 15 Oct 2007, 10:50:18 UTC

Version 5.80 BETA

computation error

unknown software exception 0xc0000409 occurred in the application at 0x00c2ec4a
lc26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_16844

using a lot of system cpu time.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 47766 - Posted: 15 Oct 2007, 20:07:26 UTC

The scaling for this work unit is way off in the graphs:
mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_72362_0

The accepted energy, at -275 and lower, is dropping out of the bottom of its window. The RMSD is often off the left edge of its window. I can't get any screenshots of this for some reason, but you guys know what I mean.

Z3r0
Joined: 3 Aug 06
Posts: 2
Credit: 8,453
RAC: 0
Message 47804 - Posted: 17 Oct 2007, 8:59:03 UTC

I have BOINC 5.10.20 and have only been running Rosetta since last week.

It reported a computation error. I haven't got a log to report. I hibernate my laptop a lot; maybe this is the cause?

Z3r0
Joined: 3 Aug 06
Posts: 2
Credit: 8,453
RAC: 0
Message 47805 - Posted: 17 Oct 2007, 9:03:06 UTC

I have BOINC 5.10.20 and have only been running Rosetta since last week.

It reported a computation error. I haven't got a log to report. I hibernate my laptop a lot; maybe this is the cause?

2007-10-17 13:13:01 [rosetta@home] Deferring communication for 1 min 11 sec
2007-10-17 13:13:01 [rosetta@home] Reason: Unrecoverable error for result 1opd__SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1opd_-_BARCODE__2166_6199_0 (Incorrect function. (0x1) - exit code 1 (0x1))

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 47809 - Posted: 17 Oct 2007, 12:39:17 UTC - in response to Message 47805.  

I have BOINC 5.10.20 and have only been running Rosetta since last week.

It reported a computation error. I haven't got a log to report. I hibernate my laptop a lot; maybe this is the cause?

2007-10-17 13:13:01 [rosetta@home] Deferring communication for 1 min 11 sec
2007-10-17 13:13:01 [rosetta@home] Reason: Unrecoverable error for result 1opd__SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1opd_-_BARCODE__2166_6199_0 (Incorrect function. (0x1) - exit code 1 (0x1))


In your general preferences, make sure this option is set to "yes":
Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes')
Jmarks

©2024 University of Washington
https://www.bakerlab.org