Rosetta@Home Version 3.24

Message boards : Number crunching : Rosetta@Home Version 3.24

Yank
Joined: 18 Apr 06
Posts: 71
Credit: 1,752,514
RAC: 0
Message 72556 - Posted: 20 Mar 2012, 1:37:07 UTC

Out of about 60 work units, 5 have "computer error". They ran anywhere from 1,000 to 3,000 CPU seconds. Is this about normal for Rosetta? Not too concerned, just asking.


Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 72561 - Posted: 20 Mar 2012, 20:42:21 UTC

<message>
Maximum disk usage exceeded
</message>

It seems unlikely I'm the only one seeing this. My cpu preferred run time is currently 8 hours but this one maxed out in just over 2 hours.

CASP9_bq_benchmark_hybridization_run43_T0617_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45316_32_0


Best,
Snags

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 72564 - Posted: 21 Mar 2012, 14:03:28 UTC

Why are tasks with names similar to this
rb_03_20_29193_58732__t000__SAVE_ALL_OUT_IGNORE_THE_REST_45388_2181_0
crashing on my system?

I have not changed any settings and can run everything else just fine other than CASP9. I get the error:
Exit status -1073741819 (0xffffffffc0000005)
Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00E82D90 read attempt to address 0x1E5D5000

The wingman ran this just fine. He's running an Opteron CPU.
I hardly have any OC on my CPU, so is this just a touchy work unit or what?
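
For readers comparing the two numbers in the error above: the decimal exit status and the hexadecimal value in parentheses are the same bit pattern read two ways. A minimal sketch, assuming the status is a signed 32-bit integer that the log also prints sign-extended to 64 bits; `to_unsigned` is a hypothetical helper name, not part of BOINC or Rosetta.

```python
def to_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned integer of the given width."""
    return value & ((1 << bits) - 1)


exit_status = -1073741819  # decimal value reported in the task log

# 32-bit view: the Windows status code itself
print(hex(to_unsigned(exit_status, 32)))  # 0xc0000005

# 64-bit view: the sign-extended form shown in parentheses in the log
print(hex(to_unsigned(exit_status, 64)))  # 0xffffffffc0000005
```

0xC0000005 is the Windows status code for an access violation, which matches the "Reason: Access Violation (0xc0000005)" line in the same log.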

Dirk Broer
Joined: 16 Nov 05
Posts: 22
Credit: 3,404,708
RAC: 1,336
Message 72566 - Posted: 21 Mar 2012, 14:49:32 UTC

bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 3 5.69 4.151 0.009 5.717 4.151 0.007 18.3 7.3 0.984

ERROR: Fatal SOGFunc_Impl error.

?? Running on a P4-3200 (Prescott), WinXP 32-bit

Louis A. Hatton
Joined: 5 Oct 05
Posts: 2
Credit: 47,821
RAC: 0
Message 72568 - Posted: 21 Mar 2012, 16:51:57 UTC - in response to Message 72510.  

> Rosetta@Home has been updated to version 3.24. If you encounter any problems, please let us know. Thank you for your continued support.
>
> Among other things, this release includes support for symmetry in the hybrid protocol for comparative modeling.



P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 72573 - Posted: 22 Mar 2012, 6:24:02 UTC

Hi.

This is my first error with the new app; it ran for just over 3 hrs.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=449462958

CASP9_bq_benchmark_hybridization_run43_T0550_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45178_168_0


<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0xa858057]
[0xf7773400]
[0xa09e4cf]
[0xa09f0ec]
[0x9d2da83]
[0x9288587]
[0x8a3bbb0]
[0x93c09a0]
[0x93c336a]
[0x954a627]
[0x95b06f5]
[0x95adf25]
[0x80547ed]
[0xa8e7f78]
[0x8048131]

Exiting...

</stderr_txt>
]]>


P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 72579 - Posted: 23 Mar 2012, 2:34:41 UTC

Hi.

Another one errored; I lost over 3 hrs of work.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=449624823

CASP9_bq_benchmark_hybridization_run43_T0518_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45099_1038_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (23 frames):
[0xa858057]
[0xf77ec400]
[0xa55e3f3]
[0xa36da78]
[0xa55c6b1]
[0xa36d66d]
[0xa36d448]
[0x9b2b581]
[0x9a01564]
[0x9d26907]
[0x99193e3]
[0x9d729f0]
[0x9d6c74e]
[0x928aae8]
[0x8a3bbb0]
[0x93c09a0]
[0x93c336a]
[0x954a627]
[0x95b06f5]
[0x95adf25]
[0x80547ed]
[0xa8e7f78]
[0x8048131]

Exiting...

</stderr_txt>
]]>
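
The two SIGSEGV traces above can be compared frame by frame. The following is a minimal sketch that extracts the addresses from both traces and counts how many outermost frames they share. It assumes the addresses are comparable across the two tasks, which the thread does not confirm. `parse_frames` and `common_suffix_len` are hypothetical helper names.

```python
import re

# Frame addresses copied from the two stderr excerpts in this thread.
TRACE_A = """
[0xa858057] [0xf7773400] [0xa09e4cf] [0xa09f0ec] [0x9d2da83]
[0x9288587] [0x8a3bbb0] [0x93c09a0] [0x93c336a] [0x954a627]
[0x95b06f5] [0x95adf25] [0x80547ed] [0xa8e7f78] [0x8048131]
"""

TRACE_B = """
[0xa858057] [0xf77ec400] [0xa55e3f3] [0xa36da78] [0xa55c6b1]
[0xa36d66d] [0xa36d448] [0x9b2b581] [0x9a01564] [0x9d26907]
[0x99193e3] [0x9d729f0] [0x9d6c74e] [0x928aae8] [0x8a3bbb0]
[0x93c09a0] [0x93c336a] [0x954a627] [0x95b06f5] [0x95adf25]
[0x80547ed] [0xa8e7f78] [0x8048131]
"""


def parse_frames(text):
    """Return the hex addresses (without the 0x prefix) in order of appearance."""
    return re.findall(r"\[0x([0-9a-f]+)\]", text)


def common_suffix_len(a, b):
    """Count how many trailing (outermost) frames two traces have in common."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


frames_a = parse_frames(TRACE_A)
frames_b = parse_frames(TRACE_B)
shared = common_suffix_len(frames_a, frames_b)

print(len(frames_a), len(frames_b), shared)  # 15 23 9
print(frames_a[-shared:])  # the frames both traces end with
```

The nine shared outer frames are consistent with both tasks reaching the crash through the same outer call path and diverging deeper in. Turning the addresses into function names would need the matching binary with debug symbols, which is not available in this thread.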



Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 72581 - Posted: 23 Mar 2012, 19:47:55 UTC

What's with all the errors in CASP9 and rb_03_xxxxxx?
If a task name begins with one of these, it errors out on my system.
Bugs, bugs and more bugs.
I thought RALPH was supposed to find these problems and let you know before you release them here?!!

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 72630 - Posted: 31 Mar 2012, 2:18:41 UTC
Last modified: 31 Mar 2012, 2:30:27 UTC

Hi.

I've got two errored tasks: one ran 11 sec, the other 9 sec, same error.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=451344754

2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1325_0

=====================================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=451406648

2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1843_0

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: Unable to set up interface foldtree because there are no movable jumps
ERROR:: Exit from: src/protocols/docking/util.cc line: 289
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 72633 - Posted: 31 Mar 2012, 23:15:39 UTC - in response to Message 72581.  

> What's with all the errors in CASP9 and rb_03_xxxxxx?
> If a task name begins with one of these, it errors out on my system.
> Bugs, bugs and more bugs.
> I thought RALPH was supposed to find these problems and let you know before you release them here?!!

It seems that these tasks are sensitive to OC. Even the little bit I had going on was causing them to fail, so I turned it off.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,790,281
RAC: 4,437
Message 72634 - Posted: 1 Apr 2012, 6:56:19 UTC

495343739

Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev47790.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/if3dimer_design10monomer_fold_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 7200
Starting work on structure: _00002

</stderr_txt>
]]>

Validate state Invalid

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 72636 - Posted: 1 Apr 2012, 11:31:32 UTC

CASP9_bs_benchmark_hybridization_run45_T0581_2_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45708_285 Both copies failed.

On my Mac: Maximum disk usage exceeded after a cpu time of 13288.65,
nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range


On the second machine, a Windows machine on which the workunit failed much faster (1963.429 sec): Incorrect function. (0x1) - exit code 1 (0x1)
Hbond tripped: [2012- 4- 1 0:15:53:]
bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 1 7.725 4.214 1

ERROR: Fatal SOGFunc_Impl error.
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\SOGFunc_Impl.cc line: 181

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 72649 - Posted: 3 Apr 2012, 19:43:45 UTC

CASP9 tasks stink!
I can't run them, but my wingman can.
I shut down any overclocking I had on and they still die on me, similar to Snags' post.

I get this Exit status -1073741819 (0xffffffffc0000005) as the error.
I looked through all the gobbledygook error info and found something about an execution delay.

I bombed 3 tasks in a row in just over 24hrs now.
Gets kind of old.

With no remarks from the team about this problem why should I donate more time?
So I can crash more tasks?

DmGun
Joined: 21 Nov 10
Posts: 6
Credit: 706,645
RAC: 0
Message 72659 - Posted: 4 Apr 2012, 17:15:27 UTC

> "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it."

Is this already fixed?

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72662 - Posted: 4 Apr 2012, 18:24:33 UTC - in response to Message 72659.  

> > "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it."
>
> Is this already fixed?


Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions.

We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version.

It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.

DmGun
Joined: 21 Nov 10
Posts: 6
Credit: 706,645
RAC: 0
Message 72664 - Posted: 4 Apr 2012, 23:59:20 UTC - in response to Message 72662.  

> It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.


It is a pity. We'll have to go to F@H...

b1llyb0y
Joined: 16 May 11
Posts: 7
Credit: 4,142,933
RAC: 14
Message 72666 - Posted: 5 Apr 2012, 0:45:36 UTC - in response to Message 72662.  

Maybe you could solve the problem by not sending those particular work units to Macs?

> Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions.
>
> We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version.
>
> It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72667 - Posted: 5 Apr 2012, 2:25:41 UTC - in response to Message 72666.  

> Maybe you could solve the problem by not sending those particular work units to Macs?


Unfortunately, it's an application-compilation-level problem, rather than a workunit-level problem, so it affects all workunits.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 72668 - Posted: 5 Apr 2012, 7:24:43 UTC

Rocco,

Since you are reading this thread, I have a question that I cannot find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem.

What is going on? Also, the tasks with RB03 were having trouble on my system.

Rocco Moretti
Joined: 18 May 10
Posts: 66
Credit: 585,745
RAC: 0
Message 72674 - Posted: 5 Apr 2012, 16:39:36 UTC - in response to Message 72668.  

> Rocco,
>
> Since you are reading this thread, I have a question that I cannot find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem.
>
> What is going on? Also, the tasks with RB03 were having trouble on my system.


My understanding is that there is an edge case in some of the runs with the very new hybridize protocol (which are mainly being sent out as CASP9 and rb_ runs) that results in numerical instability and range errors in calculations for a substantial fraction of workunits for particular protein systems. The crashes were happening somewhat randomly, so it makes sense that the next person on the same workunit could complete it fine.

The issue should hopefully be fixed in the new version of Rosetta@home we are currently testing on Ralph@home.
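
The range errors reported earlier in this thread ("nan is outside of [-1,+1] sin and cos value legal range", "radius = -1.#IND") are what a bounds check reports once a NaN has entered a calculation. On the Windows builds of that era, `-1.#IND` is how the Microsoft C runtime printed an indeterminate NaN. A minimal sketch of the general behavior, not of Rosetta's actual code; `in_sin_cos_range` is a hypothetical stand-in for such a check.

```python
import math


def in_sin_cos_range(x):
    """Hypothetical range check: True only if x lies in [-1, +1]."""
    return -1.0 <= x <= 1.0


# One way a NaN can arise: an indeterminate form such as inf - inf.
bad = float("inf") - float("inf")

print(in_sin_cos_range(0.5))      # True
print(in_sin_cos_range(bad))      # False: every ordered comparison with NaN is false
print(math.isnan(bad * 2.0 + 1))  # True: NaN propagates through later arithmetic
```

Because a NaN fails every range check and contaminates every value computed from it, a single unstable step can surface as an error far from where it started.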
©2024 University of Washington
https://www.bakerlab.org