Rosetta@home

Rosetta@Home Version 3.24

cmiles

Joined: Jan 4 11
Posts: 9
ID: 407423
Credit: 0
RAC: 0
Message 72510 - Posted 14 Mar 2012 16:06:58 UTC

Rosetta@Home has been updated to version 3.24. If you encounter any problems, please let us know. Thank you for your continued support.

Among other things, this release includes support for symmetry in the hybrid protocol for comparative modeling.

Sysadm@Nbg

Joined: Mar 16 10
Posts: 1
ID: 373939
Credit: 323,557
RAC: 0
Message 72511 - Posted 14 Mar 2012 18:12:30 UTC

I have some problems uploading results.
I think this is related to the distribution of the new app (high network traffic?!)

Network monitoring on rah_status.php, like PrimeGrid's server_status.php, would be helpful...

TD Nickell

Joined: Jan 20 07
Posts: 10
ID: 142815
Credit: 3,810,259
RAC: 0
Message 72513 - Posted 14 Mar 2012 21:26:01 UTC

Same problem here. Work units won't upload.
____________

Andrii Muliar

Joined: Nov 10 05
Posts: 12
ID: 10952
Credit: 7,050,629
RAC: 89
Message 72514 - Posted 14 Mar 2012 22:27:51 UTC

Upload is very slow but it is working for me ("Retry Now").
____________

TD Nickell

Joined: Jan 20 07
Posts: 10
ID: 142815
Credit: 3,810,259
RAC: 0
Message 72515 - Posted 14 Mar 2012 23:16:10 UTC

Seems to be uploading okay now!
____________

cmiles

Joined: Jan 4 11
Posts: 9
ID: 407423
Credit: 0
RAC: 0
Message 72518 - Posted 15 Mar 2012 15:59:32 UTC

Thanks for your patience. We bog down a bit when new versions of the application are distributed. If you continue to experience slow uploads, please let us know.

pvh

Joined: Feb 7 10
Posts: 3
ID: 369324
Credit: 1,792,756
RAC: 0
Message 72519 - Posted 15 Mar 2012 16:51:51 UTC

I noticed that the 3.24 app did not have the execute bit set after download (in openSUSE 11.4, Boinc 7.0.18), which caused all WUs to fail. I have fixed this manually, but that should not be necessary of course...
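
For anyone hitting the same thing, here is a minimal sketch of restoring the execute bit, assuming a Unix-like system. The helper name `ensure_executable` is illustrative only and is not part of BOINC or Rosetta@home.

```python
import stat
from pathlib import Path

# Execute permission for owner, group, and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def ensure_executable(path: Path) -> bool:
    """Set the execute bits on `path` if the owner cannot execute it yet.

    Returns True when the mode was changed, False when nothing was needed.
    """
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & stat.S_IXUSR:
        return False  # owner execute bit already set; leave the file alone
    path.chmod(mode | EXEC_BITS)
    return True
```

Calling this on the downloaded application binary in the project directory should be roughly equivalent to running `chmod +x` on it by hand. The exact file name of the 3.24 binary is not given in this thread, so it has to be looked up locally.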

rochester new york

Joined: Jul 2 06
Posts: 2572
ID: 98229
Credit: 1,017,229
RAC: 1,281
Message 72520 - Posted 15 Mar 2012 16:55:42 UTC - in response to Message ID 72510.

Rosetta@Home has been updated to version 3.24. If you encounter any problems, please let us know. Thank you for your continued support.

Among other things, this release includes support for symmetry in the hybrid protocol for comparative modeling.


Plus, remind new people that the update is automatic and there is nothing they have to download.

In Memory of Kimsey M Fowler Sr

Joined: Mar 10 12
Posts: 26
ID: 445635
Credit: 19,311,858
RAC: 13,183
Message 72522 - Posted 15 Mar 2012 18:23:19 UTC - in response to Message ID 72519.

Please post details about correcting the execute bit. I built a new machine over the weekend for R@H and the WU's all failed. As a consequence BOINC/R@H will only give me 8 new work units per day, and those are completed in three hours... a lot of processing time is being wasted. Also wasted were many hours testing the computer trying to figure out why it couldn't get a WU done correctly.

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72523 - Posted 15 Mar 2012 21:24:12 UTC - in response to Message ID 72522.

pvh: I noticed that the 3.24 app did not have the execute bit set after download (in openSUSE 11.4, Boinc 7.0.18), which caused all WUs to fail.


There was nothing different done on our end, with respect to the executable bit, from any of the previous versions, so it's likely it's a Boinc 7 issue.

Note that the Boinc 7.0 series is currently still a development version, and people have reported a number of issues with Boinc 7 and R@h. As it's development code, we're really not supporting Boinc 7 at this point.

In Memory of Kimsey M Fowler Sr: I built a new machine over the weekend for R@H and the WU's all failed.


If you're referring to this machine, it looks like the issue is not a faulty execute bit, but rather the successful completion/Exit status 0/Client Error/missing application version issue that others have experienced. (See http://boinc.bakerlab.org/forum_thread.php?id=5914#72425) We're looking into it, but the best lead so far is that it's related to GPU settings. If it won't impact computing for other projects, try turning off GPU usage for that machine. (Rosetta@home itself does not use GPUs, although other boinc projects you're running might.)

In Memory of Kimsey M Fowler Sr

Joined: Mar 10 12
Posts: 26
ID: 445635
Credit: 19,311,858
RAC: 13,183
Message 72530 - Posted 16 Mar 2012 19:53:16 UTC - in response to Message ID 72523.


If you're referring to this machine, it looks like the issue is not a faulty execute bit, but rather the successful completion/Exit status 0/Client Error/missing application version issue that others have experienced. (See http://boinc.bakerlab.org/forum_thread.php?id=5914#72425) We're looking into it, but the best lead so far is that it's related to GPU settings. If it won't impact computing for other projects, try turning off GPU usage for that machine. (Rosetta@home itself does not use GPUs, although other boinc projects you're running might.)


Thanks for taking time to respond. Yes, that is the correct machine. I set the GPU Activity button to "Suspend GPU", and the last few days WU's appear to be completing normally. I wonder if, like myself, others having experienced a similar problem are running F@H on one or more GPU's and R@H on the CPU? I'm doing this on a second nearly identical machine (computer ID 1498519) without any problems, so I'm thinking that "Suspend GPU" is the ticket.

I am still dealing with the problem of being limited to eight new WU's per day. From poking around various forums it looks like BOINC may take several days to recognize that I can perform additional work units in the allotted time. I'm experimenting with a suggestion to accelerate that process by setting my preferences differently to indicate a connection to the internet every six days and request two days of work at the time even though the machine is always connected.

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72533 - Posted 17 Mar 2012 0:36:22 UTC - in response to Message ID 72530.

I set the GPU Activity button to "Suspend GPU", and the last few days WU's appear to be completing normally.


Interesting ... But I'm wondering why you think they're completing normally, as according to the task list for that computer (http://boinc.bakerlab.org/rosetta/results.php?hostid=1525425) everything for the past couple of days (at least from 16 Mar 15:00 UTC on back) seems to be still suffering from Client Errors.

In Memory of Kimsey M Fowler Sr

Joined: Mar 10 12
Posts: 26
ID: 445635
Credit: 19,311,858
RAC: 13,183
Message 72536 - Posted 17 Mar 2012 15:15:24 UTC - in response to Message ID 72533.

I set the GPU Activity button to "Suspend GPU", and the last few days WU's appear to be completing normally.


Interesting ... But I'm wondering why you think they're completing normally, as according to the task list for that computer (http://boinc.bakerlab.org/rosetta/results.php?hostid=1525425) everything for the past couple of days (at least from 16 Mar 15:00 UTC on back) seems to be still suffering from Client Errors.


Yep, I see that now. There were no reports of errors when the jobs completed and I saw credit was awarded. 'I assumed'.... shameful.

I'm going to uninstall and reinstall BOINC for a fresh attempt.

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72537 - Posted 17 Mar 2012 17:18:04 UTC

After updating to version 3.24:
- Tasks run for anywhere from two to seven hours (run time is set to 3:00)
- Granted credit is about 6 times lower
- Compute errors
http://boinc.bakerlab.org/rosetta/results.php?userid=402480
Restarting the project has not helped...

ArcSedna

Joined: Oct 23 11
Posts: 6
ID: 434280
Credit: 13,218,434
RAC: 1,784
Message 72540 - Posted 18 Mar 2012 0:19:13 UTC

Recently, "Granted credit" for Mac OS X clients has been relatively low compared to that for Windows clients.

Client #1 Mac OS X(10.7.3)
- Measured floating point speed 2840.51 million ops/sec
- Measured integer speed 4754.19 million ops/sec
- CPU Time (sec) 21,972.88
- Claimed Credit 96.57
- Granted Credit 14.61
- http://boinc.bakerlab.org/rosetta/workunit.php?wuid=448711523

Client #2 Windows 7
- Measured floating point speed 2271.54 million ops/sec
- Measured integer speed 6904.26 million ops/sec
- CPU Time (sec) 21,434.99
- Claimed Credit 113.82
- Granted Credit 92.04
- http://boinc.bakerlab.org/rosetta/workunit.php?wuid=448711904

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,834,811
RAC: 4,046
Message 72541 - Posted 18 Mar 2012 10:47:26 UTC

Your first example returned 7 decoys; that's a bit over 2 credits per decoy. The second example returned 44 decoys, which comes to about the same number of credits per decoy. So the credit granting is the same (2.09 credits/decoy). To reliably compare performance between the apps, the other variables have to be the same: same CPU, same memory configuration. It could be that the 2 hosts are not similar enough to compare them to each other.
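
The per-decoy arithmetic can be checked with a short sketch. The granted credit values are the ones posted for the two clients above, and the decoy counts are the 7 and 44 read from the task output; the function name `credit_per_decoy` is just for illustration.

```python
def credit_per_decoy(granted_credit: float, decoys: int) -> float:
    """Granted credit divided by the number of decoys the task returned."""
    return granted_credit / decoys


# Granted credit from the two posted tasks, decoy counts as read from the task output.
mac = credit_per_decoy(14.61, 7)
windows = credit_per_decoy(92.04, 44)

print(f"Mac OS X:  {mac:.2f} credits/decoy")
print(f"Windows 7: {windows:.2f} credits/decoy")
```

Both values round to 2.09, which is why the lower total on the Mac points at fewer decoys per task rather than at a different credit rate.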
____________

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72542 - Posted 18 Mar 2012 12:27:36 UTC
Last modified: 18 Mar 2012 12:28:11 UTC

transient, I see the same thing on OS X 10.7.3
http://boinc.bakerlab.org/rosetta/results.php?userid=402480
Compare with what it was two days ago: all calculated results dropped about sixfold.

m2a2b2

Joined: May 10 07
Posts: 2
ID: 175607
Credit: 327,835
RAC: 0
Message 72546 - Posted 18 Mar 2012 21:23:27 UTC

I am also experiencing the same results with MacOS X 10.6.8. All results have dropped to 20-25% of what they were for jobs completed prior to March 16.

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,834,811
RAC: 4,046
Message 72547 - Posted 18 Mar 2012 22:20:28 UTC - in response to Message ID 72542.

transient, I see the same thing on OS X 10.7.3
http://boinc.bakerlab.org/rosetta/results.php?userid=402480
Compare with what it was two days ago: all calculated results dropped about sixfold.


Comparing the same type of task on the same type of CPU, one a Mac (yours) and one a Windows 7 computer, I found these hosts. Mac: http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1372047, Windows: http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1336570. I looked up these tasks:

MAC:
======================================================
DONE :: 1 starting structures 10241 cpu seconds
This process generated 6 decoys from 6 attempts
======================================================
BOINC :: WS_max 4.51269e+08

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 61.2023891428566
Granted credit 12.5052457755095

Windows:
======================================================
DONE :: 1 starting structures 10707.2 cpu seconds
This process generated 28 decoys from 28 attempts
======================================================
BOINC :: WS_max 3.97976e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 66.716736771631
Granted credit 58.6144700013407


Looking at these 2 tasks, provided it is valid to compare them, performance on the mac is less.
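
To put a number on "less", here is a rough sketch that turns the two task reports above into CPU seconds per decoy and credits per decoy. The `TaskResult` class is a throwaway helper for this comparison and assumes the two tasks really are comparable.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    host: str
    cpu_seconds: float
    decoys: int
    granted_credit: float

    @property
    def seconds_per_decoy(self) -> float:
        return self.cpu_seconds / self.decoys

    @property
    def credit_per_decoy(self) -> float:
        return self.granted_credit / self.decoys


# Values copied from the two task reports quoted above.
mac_task = TaskResult("Mac", 10241.0, 6, 12.5052457755095)
windows_task = TaskResult("Windows", 10707.2, 28, 58.6144700013407)

for task in (mac_task, windows_task):
    print(f"{task.host}: {task.seconds_per_decoy:.0f} s/decoy, "
          f"{task.credit_per_decoy:.2f} credits/decoy")

# How many times more CPU time the Mac spends on each decoy.
slowdown = mac_task.seconds_per_decoy / windows_task.seconds_per_decoy
print(f"Mac needs about {slowdown:.1f}x the CPU time per decoy")
```

The credit per decoy is nearly identical on both hosts, while the Mac spends roughly four and a half times as much CPU time on each decoy.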
____________

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72549 - Posted 18 Mar 2012 23:35:26 UTC

It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it.

Note that the low performance is the direct cause of the variable runtimes. The R@h client will try to always produce at least one decoy. If execution slows down enough that a job takes 7 hours to produce the first decoy, that workunit will run for 7 hours, even if your runtime setting is 3 hours. But once that first decoy is produced, the client will only start on subsequent decoys if the estimated runtime falls under the run-time limit. So if the first decoy takes 2 hours to complete and your runtime is set for 3 hours, the client will stop early, rather than run for 4 hours.
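
The rule described above can be sketched as a small simulation. This is not the actual client code: it assumes every decoy takes the same time and that the estimate for the next decoy is simply the time of the previous one, and `simulate_workunit` is a made-up name.

```python
def simulate_workunit(decoy_seconds: int, runtime_pref_seconds: int) -> tuple[int, int]:
    """Return (decoys produced, total CPU seconds) for one workunit.

    Assumption: every decoy takes `decoy_seconds`, and the client estimates
    the next decoy to take exactly as long as the last one.
    """
    # The first decoy always runs to completion, however long it takes.
    elapsed = decoy_seconds
    decoys = 1
    # Start another decoy only if it is expected to finish within the preference.
    while elapsed + decoy_seconds <= runtime_pref_seconds:
        elapsed += decoy_seconds
        decoys += 1
    return decoys, elapsed


HOUR = 3600
print(simulate_workunit(7 * HOUR, 3 * HOUR))  # slow host: one decoy, runs 7 hours
print(simulate_workunit(2 * HOUR, 3 * HOUR))  # one decoy, stops after 2 hours
print(simulate_workunit(HOUR // 2, 3 * HOUR))  # fast host: fills the 3 hours
```

Under these assumptions the same 3-hour setting yields runtimes of 7, 2, and 3 hours, which matches the spread people are reporting.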

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72550 - Posted 18 Mar 2012 23:43:13 UTC

And why?
Before the update to the new version there was nothing like this. You can see the results for the previous couple of months.
Also, errors were very rare.
Another very bad bug: after restarting the client, many jobs start over from zero (Windows users complain about this too).
CASP9*** tasks are not stable: there is a big difference in computation time (from 2.5 to 7 hours), and progress resets after a restart.
Sorry for the bad English - Google Translate (((

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72551 - Posted 18 Mar 2012 23:48:09 UTC

I saw that you already answered ...

Snagletooth

Joined: Feb 22 07
Posts: 193
ID: 149031
Credit: 1,425,415
RAC: 236
Message 72552 - Posted 19 Mar 2012 0:02:53 UTC

Another "Maximum disk usage exceeded" error

CASP9_bj_benchmark_hybridization_run36_T0628_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_44571_2431_0

CPU time 16596.04
cpu run time pref is 28800

Lots of "sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range" in the stderr output
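
As a generic illustration of why a NaN trips a range check like this one (it is not Rosetta's code, and `in_sin_cos_range` is a hypothetical stand-in): every ordered comparison with NaN is false, so a NaN can never pass a test for [-1, +1], and once a NaN appears it propagates through later arithmetic.

```python
import math


def in_sin_cos_range(value: float) -> bool:
    """True only if value lies in [-1, +1]; any comparison with NaN is False."""
    return -1.0 <= value <= 1.0


nan = float("nan")
poisoned = 0.5 * nan + 1.0  # arithmetic on NaN yields NaN again

print(in_sin_cos_range(0.5))       # True
print(in_sin_cos_range(poisoned))  # False
print(math.isnan(poisoned))        # True
```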


Best,
Snags

Snagletooth

Joined: Feb 22 07
Posts: 193
ID: 149031
Credit: 1,425,415
RAC: 236
Message 72553 - Posted 19 Mar 2012 10:21:28 UTC

One more: CASP9_bj_benchmark_hybridization_run36_T0601_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_44523_2703_0


<message>
Maximum disk usage exceeded
</message>

CPU time 11739.44

Best,
Snags

Yank

Joined: Apr 18 06
Posts: 69
ID: 77735
Credit: 1,643,014
RAC: 0
Message 72556 - Posted 20 Mar 2012 1:37:07 UTC

Out of about 60 work units, 5 have "computer error". They ran anywhere from 1,000 to 3,000 CPU seconds. Is this about normal for Rosetta? Not too concerned, just asking.
____________

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,834,811
RAC: 4,046
Message 72557 - Posted 20 Mar 2012 9:04:42 UTC - in response to Message ID 72556.

Out of about 60 work units, 5 have "computer error". They ran anywhere from 1,000 to 3,000 CPU seconds. Is this about normal for Rosetta? Not too concerned, just asking.


And the names of the tasks begin with CASP9? Those WU's appear to be particularly bothersome. I don't worry too much about them.
Lately the error rate has gone up, especially with the CASP9 tasks. I hope this is not permanent.
____________

Snagletooth

Joined: Feb 22 07
Posts: 193
ID: 149031
Credit: 1,425,415
RAC: 236
Message 72561 - Posted 20 Mar 2012 20:42:21 UTC

<message>
Maximum disk usage exceeded
</message>

It seems unlikely I'm the only one seeing this. My cpu preferred run time is currently 8 hours but this one maxed out in just over 2 hours.

CASP9_bq_benchmark_hybridization_run43_T0617_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45316_32_0


Best,
Snags

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72564 - Posted 21 Mar 2012 14:03:28 UTC

Why are tasks with names similar to this
rb_03_20_29193_58732__t000__SAVE_ALL_OUT_IGNORE_THE_REST_45388_2181_0
crashing on my system?

I have not changed any settings and can run everything else just fine other than CASP9. I get the error:
Exit status -1073741819 (0xffffffffc0000005)
Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00E82D90 read attempt to address 0x1E5D5000

The wingman ran this just fine. He's running an Opteron CPU.
I hardly have any OC on my CPU, so is this just a touchy work unit or what?
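
The two numbers in the exit status line are the same 32-bit value written two ways, which a short sketch makes visible (the helper name `as_unsigned32` is only for illustration):

```python
def as_unsigned32(exit_status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned 32-bit value."""
    return exit_status & 0xFFFFFFFF


status = as_unsigned32(-1073741819)
print(f"0x{status:08X}")  # prints 0xC0000005
```

That is the same 0xc0000005 the exception record reports as an Access Violation.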

Dirk Broer

Joined: Nov 16 05
Posts: 16
ID: 12707
Credit: 1,198,630
RAC: 22
Message 72566 - Posted 21 Mar 2012 14:49:32 UTC

bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 3 5.69 4.151 0.009 5.717 4.151 0.007 18.3 7.3 0.984

ERROR: Fatal SOGFunc_Impl error.

?? Running on a P4-3200 (Prescott), WinXP 32-bit
____________

Louis A. Hatton

Joined: Oct 5 05
Posts: 2
ID: 2840
Credit: 47,821
RAC: 0
Message 72568 - Posted 21 Mar 2012 16:51:57 UTC - in response to Message ID 72510.

Rosetta@Home has been updated to version 3.24. If you encounter any problems, please let us know. Thank you for your continued support.

Among other things, this release includes support for symmetry in the hybrid protocol for comparative modeling.


____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 72573 - Posted 22 Mar 2012 6:24:02 UTC

Hi.

This is my first error with the new app, it ran for just over 3hrs.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=449462958

CASP9_bq_benchmark_hybridization_run43_T0550_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45178_168_0


<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0xa858057]
[0xf7773400]
[0xa09e4cf]
[0xa09f0ec]
[0x9d2da83]
[0x9288587]
[0x8a3bbb0]
[0x93c09a0]
[0x93c336a]
[0x954a627]
[0x95b06f5]
[0x95adf25]
[0x80547ed]
[0xa8e7f78]
[0x8048131]

Exiting...

</stderr_txt>
]]>
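
The three numbers in "process exited with code 193 (0xc1, -63)" appear to be the same byte shown as decimal, hex, and a signed 8-bit value. That reading is an assumption about how the message is formatted, and `as_signed` is a hypothetical helper, but the arithmetic is easy to check:

```python
def as_signed(value: int, bits: int = 8) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


exit_code = 193
print(hex(exit_code), as_signed(exit_code))  # prints: 0xc1 -63
```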

____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 72579 - Posted 23 Mar 2012 2:34:41 UTC

Hi.

Another one erred, lost over 3hrs work.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=449624823

CASP9_bq_benchmark_hybridization_run43_T0518_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45099_1038_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (23 frames):
[0xa858057]
[0xf77ec400]
[0xa55e3f3]
[0xa36da78]
[0xa55c6b1]
[0xa36d66d]
[0xa36d448]
[0x9b2b581]
[0x9a01564]
[0x9d26907]
[0x99193e3]
[0x9d729f0]
[0x9d6c74e]
[0x928aae8]
[0x8a3bbb0]
[0x93c09a0]
[0x93c336a]
[0x954a627]
[0x95b06f5]
[0x95adf25]
[0x80547ed]
[0xa8e7f78]
[0x8048131]

Exiting...

</stderr_txt>
]]>


____________


Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72581 - Posted 23 Mar 2012 19:47:55 UTC

What's with all the errors in CASP9 and rb_03_xxxxxx?
If it begins with one of these names it errors out on my system.
Bugs, bugs and more bugs.
Thought RALPH was supposed to find these problems and let you know before you release them here?!!

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 72630 - Posted 31 Mar 2012 2:18:41 UTC
Last modified: 31 Mar 2012 2:30:27 UTC

Hi.

I've got two errored tasks: one ran 11 sec, the other 9 sec, same error.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=451344754

2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1325_0

=====================================================================

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=451406648

2K1R_nonoe_broker_SAVE_ALL_OUT_45824_1843_0

BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: Unable to set up interface foldtree because there are no movable jumps
ERROR:: Exit from: src/protocols/docking/util.cc line: 289
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>
____________


Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72633 - Posted 31 Mar 2012 23:15:39 UTC - in response to Message ID 72581.

What's with all the errors in CASP9 and rb_03_xxxxxx?
If it begins with one of these names it errors out on my system.
Bugs, bugs and more bugs.
Thought RALPH was supposed to find these problems and let you know before you release them here?!!



Seems that these tasks are sensitive to OC. Even the little bit I had going on was causing them to fail. Turned it off.

[VENETO] boboviz

Joined: Dec 1 05
Posts: 556
ID: 25524
Credit: 1,559,760
RAC: 1,171
Message 72634 - Posted 1 Apr 2012 6:56:19 UTC

495343739

Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev47790.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/if3dimer_design10monomer_fold_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 7200
Starting work on structure: _00002

</stderr_txt>
]]>

Validate state Invalid
____________

Snagletooth

Joined: Feb 22 07
Posts: 193
ID: 149031
Credit: 1,425,415
RAC: 236
Message 72636 - Posted 1 Apr 2012 11:31:32 UTC

CASP9_bs_benchmark_hybridization_run45_T0581_2_C2_SAVE_ALL_OUT_IGNORE_THE_REST_45708_285 Both copies failed.

On my Mac: Maximum disk usage exceeded after a cpu time of 13288.65,
nan is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: nan is outside of [-1,+1] sin and cos value legal range


On the second machine, a Windows machine on which the workunit failed much faster (1963.429 sec): Incorrect function. (0x1) - exit code 1 (0x1)
Hbond tripped: [2012- 4- 1 0:15:53:]
bounds error (radius = -1.#IND, val = -1.#IND), def = SOGFUNC 1 7.725 4.214 1

ERROR: Fatal SOGFunc_Impl error.
ERROR:: Exit from: ..\..\..\src\core\scoring\constraints\SOGFunc_Impl.cc line: 181

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72649 - Posted 3 Apr 2012 19:43:45 UTC

CASP9 tasks stink!
I can't run them, but my wingman can.
I shut down any overclocking I had on and they still die on me, similar to Snagletooth's post.

I get this Exit status -1073741819 (0xffffffffc0000005) as the error.
I looked through all the gobbledygook error info and found something about an execution delay.

I bombed 3 tasks in a row in just over 24hrs now.
Gets kind of old.

With no remarks from the team about this problem why should I donate more time?
So I can crash more tasks?

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72659 - Posted 4 Apr 2012 17:15:27 UTC

> "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it."

Is this already fixed?

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72662 - Posted 4 Apr 2012 18:24:33 UTC - in response to Message ID 72659.

> "It looks like the performance of the Rosetta@home application dropped on Macs (we believe all Macs) with 3.24. We're aware of the issue and looking into ways of remedying it."

Is this already fixed?


Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions.

We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version.

It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.

DmGun

Joined: Nov 21 10
Posts: 6
ID: 402480
Credit: 706,645
RAC: 0
Message 72664 - Posted 4 Apr 2012 23:59:20 UTC - in response to Message ID 72662.

It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.


It is a pity. We'll have to go to the F@H...

b1llyb0y

Joined: May 16 11
Posts: 7
ID: 419514
Credit: 3,048,922
RAC: 854
Message 72666 - Posted 5 Apr 2012 0:45:36 UTC - in response to Message ID 72662.

Maybe you could solve the problem by not sending those particular work units to Macs?


Unfortunately, it probably isn't going to get resolved in the foreseeable future. The problem is there's a complex interaction between the Rosetta@home code base and a compiler bug which means that trying to compile with full optimizations just doesn't work (the compiler gets stuck in an infinite loop). While the bug has been fixed in the latest versions of the compiler, using them means that we lose compatibility with all but the most recent MacOS versions.

We've been banging on this multiple ways, and have tried *numerous* settings on multiple machines (you name it, we've probably tried it), and haven't been able to come up with anything which simultaneously allows compiling with optimizations and support for the full range of MacOS versions currently being used. We've made the decision to provide a working, albeit slower, R@h to all Mac users, rather than forcing everyone to update to the latest OS version.

It sucks, but unfortunately it's the position we're stuck with for the foreseeable future.

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72667 - Posted 5 Apr 2012 2:25:41 UTC - in response to Message ID 72666.

Maybe you could solve the problem by not sending those particular work units to Macs?


Unfortunately, it's an application-compilation-level problem, rather than a workunit-level problem, so it affects all workunits.

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72668 - Posted 5 Apr 2012 7:24:43 UTC

Rocco,

Since you are reading this thread, I have a question that I can not find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem.

What is going on? Also the tasks with RB03 were having troubles on my system.

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72674 - Posted 5 Apr 2012 16:39:36 UTC - in response to Message ID 72668.

Rocco,

Since you are reading this thread, I have a question that I can not find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem.

What is going on? Also the tasks with RB03 were having troubles on my system.


My understanding is that there is an edge case on some of the runs with the very new hybridize protocol (which are mainly being sent out as CASP9 and rb_ runs) which result in numerical instability and range errors in calculations for a substantial fraction of workunits for particular protein systems. The crashes were happening somewhat randomly, so it makes sense that the next person on the same workunit could complete it fine.

The issue should hopefully be fixed in the new version of Rosetta@home we are currently testing on Ralph@home.

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,969,735
RAC: 81
Message 72677 - Posted 5 Apr 2012 18:56:54 UTC - in response to Message ID 72674.

Rocco,

Since you are reading this thread, I have a question that I can not find an answer to. After shutting down an overclocking program I still have problems processing CASP9 tasks. The majority of them crash, but my wingmen can process the majority of my crashes without any problem.

What is going on? Also the tasks with RB03 were having troubles on my system.


My understanding is that there is an edge case on some of the runs with the very new hybridize protocol (which are mainly being sent out as CASP9 and rb_ runs) which result in numerical instability and range errors in calculations for a substantial fraction of workunits for particular protein systems. The crashes were happening somewhat randomly, so it makes sense that the next person on the same workunit could complete it fine.

The issue should hopefully be fixed in the new version of Rosetta@home we are currently testing on Ralph@home.


ok, thanks.
You talking about version 3.26 that is out now?

Rocco Moretti

Joined: May 18 10
Posts: 66
ID: 381114
Credit: 585,745
RAC: 0
Message 72681 - Posted 5 Apr 2012 20:39:25 UTC - in response to Message ID 72677.

You talking about version 3.26 that is out now?


Right. 3.26 should hopefully fix most of the CASP9/rb workunit-related issues.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 72686 - Posted 6 Apr 2012 0:35:00 UTC

Hi.

Got two more with errors: the first one ran for 47 min and did all 99 structures; the other ran for 9 sec, then erred.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=452618941

sh3_d310_design_010_relax_SAVE_ALL_OUT_46196_233_0

cpu_run_time_pref: 14400
======================================================
DONE :: 99 starting structures 2840.33 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
==========================================================

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=452649604

ab_11_29__optpps_T5311_optpps_03_09_35686_254232_0

Setting up checkpointing ...
Setting up graphics native ...
EFPCWLVEEFVVAEECSPCSNFRAKTTPECGPTGYVEKITCSSSKRNEFKSCRSALME
can not find a residue type that matches the residue PRO_p:pro_hydroxylated_case1at position 3

ERROR: core::util::switch_to_residue_type_set fails

ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

____________



Copyright © 2017 University of Washington
