Rosetta@home

Minirosetta 3.46

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75493 - Posted 26 Apr 2013 22:59:12 UTC

minirosetta has been updated to 3.46 to include recent developments in electron density and other scoring functions.
This update also fixes a bug in the density gradient calculations that drove the reference frame apart and occasionally caused the program to crash in long simulations.
Post problems related to the update here.

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 75501 - Posted 27 Apr 2013 12:08:45 UTC

Indeed Yifan, all the cryos finish without error. So thanks for your effort.
One thing I noticed, though, is that roughly 1/3 of these cryos run for a little more than 25000 seconds. The other 2/3 finish in 9000-10000 seconds, which seems normal for a Rosetta task.
____________
Greetings,
TJ.

fcbrants

Joined: Mar 25 13
Posts: 1
ID: 472655
Credit: 473,075
RAC: 0
Message 75509 - Posted 27 Apr 2013 18:57:33 UTC

I restarted the project last night & ran through a few WUs (stopped taking WUs 4/24 & restarted 4/27), working only on Rosetta. I checked the task list today (have not been watching it closely) & found an errored WU. It was sitting in the task list with "Computation Error" & I found its entry in the event log:

Computation for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 finished
Output file e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2_0 for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 absent

It soon spooled off of the task list & was gone.

Hope this helps, feel free to contact me if you need to do any troubleshooting.

My machine:

http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1606827

and results:

http://boinc.bakerlab.org/rosetta/results.php?hostid=1606827

and the offending WU:

http://boinc.bakerlab.org/rosetta/result.php?resultid=578113826

Franko

JAMES DORISIO

Joined: Dec 25 05
Posts: 14
ID: 43247
Credit: 56,315,530
RAC: 48,209
Message 75510 - Posted 27 Apr 2013 19:06:29 UTC

The applications page still shows Windows/x86 as 3.45, and I am getting Rosetta Mini 3.45 on Windows XP x86 32-bit computers.
Are you planning to upgrade this version also?
Thanks Jim.


JKitterman

Joined: Oct 21 05
Posts: 11
ID: 5963
Credit: 814,463
RAC: 0
Message 75511 - Posted 27 Apr 2013 19:24:44 UTC - in response to Message ID 75493.
Last modified: 27 Apr 2013 19:25:40 UTC

Link to Cryo that ran for over 25000 seconds
http://boinc.bakerlab.org/rosetta/result.php?resultid=577970219
It pretty much repeats the sin_cos_range Error the whole time as below.
I only checked three of them but they all appeared to have the same issue

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
ROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 75512 - Posted 27 Apr 2013 20:54:19 UTC - in response to Message ID 75509.

I restarted the project last night & ran through a few WUs (stopped taking WUs 4/24 & restarted 4/27), working only on Rosetta. I checked the task list today (have not been watching it closely) & found an errored WU. It was sitting in the task list with "Computation Error" & I found its entry in the event log:

Computation for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 finished
Output file e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2_0 for task e6-52-1-46_abinitio_SAVE_ALL_OUT_78312_744_2 absent

It soon spooled off of the task list & was gone.

Hope this helps, feel free to contact me if you need to do any troubleshooting.

My machine:

http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1606827

and results:

http://boinc.bakerlab.org/rosetta/results.php?hostid=1606827

and the offending WU:

http://boinc.bakerlab.org/rosetta/result.php?resultid=578113826

Franko


Franko's task output shows 9 seconds of CPU time and the following:

ERROR: ERROR: FragmentIO: could not open file start.200.9mers
ERROR:: Exit from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
std::cerr: Exception was thrown:


[ERROR] EXCN_utility_exit has been thrown from: ..\..\..\src\core\fragment\FragmentIO.cc line: 233
ERROR: ERROR: FragmentIO: could not open file start.200.9mers

____________
Rosetta Moderator: Mod.Sense

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75513 - Posted 27 Apr 2013 21:56:22 UTC
Last modified: 28 Apr 2013 0:26:04 UTC

Thanks guys! The cryo jobs use a different protocol, so they do run longer.
Let me take a look at the sin_cos_range error. That was the error I eventually saw with the bug in the 3.45 version. I'll check to see if there's anything else still causing the problem.
I think the Windows/x86 one is only for the graphics interface; the actual minirosetta program runs on the platform "Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU". I'll double check with DEK to make sure.
I'll also tell the user running the abinitio job to pay attention to their input files.

Yifan

PS: the cryo_bf... jobs are from earlier. I think the input files might have been screwed up already by earlier iterations using the old release.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75516 - Posted 28 Apr 2013 2:33:51 UTC

This one ran under the new app & had this error message 99 times by the look of it, I haven't counted them ;) you can if you like.

CASP9_fb_benchmark_hybridization_run54_T0613_0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_48029_1425_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524562110



ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93

======================================================
DONE :: 99 starting structures 1384.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

____________


chip

Joined: Oct 2 07
Posts: 1
ID: 209261
Credit: 100,133
RAC: 0
Message 75518 - Posted 28 Apr 2013 4:28:03 UTC
Last modified: 28 Apr 2013 5:01:12 UTC

3 instructions per cycle and 100% L2/L3 Hit Ratio on a Sandy Bridge CPU - nice performance for the 25000s cryo tasks!

P.S.: xxx.A_ and CASP9_ tasks have 1 IPC and 60%/40% L2/L3 Hit Ratio...

Brian Priebe

Joined: Nov 27 09
Posts: 13
ID: 360315
Credit: 17,713,026
RAC: 21,007
Message 75519 - Posted 28 Apr 2013 5:45:40 UTC

rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75520 - Posted 28 Apr 2013 6:01:07 UTC - in response to Message ID 75511.

Link to Cryo that ran for over 25000 seconds
http://boinc.bakerlab.org/rosetta/result.php?resultid=577970219
It pretty much repeats the sin_cos_range Error the whole time as below.
I only checked three of them but they all appeared to have the same issue

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<stderr_txt>
ROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range


A similar failed workunit for me:

http://boinc.bakerlab.org/rosetta/result.php?resultid=577969315

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75521 - Posted 28 Apr 2013 8:02:44 UTC
Last modified: 28 Apr 2013 8:04:51 UTC

Hi Yifan.

I had this one finish O.K. but showing this error message.

I also have another 3 cryo tasks that are running overtime and have only checkpointed once. I'll let them go to see what happens.


cryo_bh__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79122_932_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887806

# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
======================================================
DONE :: 52 starting structures 21197.3 cpu seconds
This process generated 52 decoys from 52 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
____________


Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75526 - Posted 28 Apr 2013 21:13:14 UTC - in response to Message ID 75516.

CASP9_fb is a really old batch of jobs. The symmetry definition I/O has changed since then, so the new executable is not expected to work on them any more. "com" defines the center of mass, and I believe the naming was changed to avoid confusion.
y

This one ran under the new app & had this error message 99 times by the look of it, I haven't counted them ;) you can if you like.

CASP9_fb_benchmark_hybridization_run54_T0613_0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_48029_1425_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524562110



ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93

======================================================
DONE :: 99 starting structures 1384.62 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75527 - Posted 28 Apr 2013 21:15:31 UTC - in response to Message ID 75519.

rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code.


I'm running a local test now, and it's been running for 20 min and is still going. Maybe there were some download errors that left the input files incomplete?

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75529 - Posted 28 Apr 2013 21:28:11 UTC - in response to Message ID 75521.

Hi Yifan.

I had this one finish O.K. but showing this error message.

I also have another 3 cryo tasks that are running overtime and have only checkpointed once. I'll let them go to see what happens.


cryo_bh__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79122_932_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887806

# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
======================================================
DONE :: 52 starting structures 21197.3 cpu seconds
This process generated 52 decoys from 52 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid


That comes from the same problem as the sin_cos_range error. I'm looking into it now. The bug in the gradient calculations makes the transformation matrix non-orthogonal, which is why arcsin (or arccos) gets an input greater than one in magnitude, and then some angles become NaN.
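
To illustrate the mechanism described here, the following is a minimal Python sketch, not Rosetta code. All names in it (unguarded_angle, clamped_angle, orthogonality_error, drifted) are hypothetical, and the drifted matrix is made-up data chosen only to show how a tiny loss of orthogonality pushes a sine-like entry past 1. Clamping is shown as one common guard against the symptom; it is an assumption for illustration and not the fix applied in minirosetta, where the gradient calculation itself is the root cause.

```python
import math

def unguarded_angle(x):
    """Mimic C's asin(): give NaN for out-of-range input instead of raising."""
    try:
        return math.asin(x)
    except ValueError:
        return float("nan")

def clamped_angle(x):
    """Clamp the input into [-1, 1] before calling asin (hides small drift)."""
    return math.asin(max(-1.0, min(1.0, x)))

def orthogonality_error(m):
    """Largest deviation of M^T M from the identity for a 3x3 nested list."""
    err = 0.0
    for i in range(3):
        for j in range(3):
            dot = sum(m[k][i] * m[k][j] for k in range(3))
            err = max(err, abs(dot - (1.0 if i == j else 0.0)))
    return err

# Hypothetical data: a 90-degree rotation about z whose entries drifted by 1e-6.
drifted = [[0.0, -1.000001, 0.0],
           [1.000001, 0.0, 0.0],
           [0.0, 0.0, 1.0]]

sin_like = drifted[1][0]  # should be exactly 1.0 for a true rotation

print("orthogonality error:", orthogonality_error(drifted))
print("unguarded angle:", unguarded_angle(sin_like))
print("clamped angle:", clamped_angle(sin_like))
```

A check like orthogonality_error can flag the drift early, whereas clamping alone only hides it, so a guard of this kind would not replace correcting the gradient code.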

bfromcolo

Joined: Apr 25 13
Posts: 1
ID: 474563
Credit: 194,854
RAC: 0
Message 75530 - Posted 28 Apr 2013 21:48:24 UTC

I am new at this, just got things running over the weekend. But I just aborted three of these that had made no progress on their ETA in hours; I assumed they were looping. Looking at my task log I see another 3 that went over 25000 seconds, instead of a more normal 10500 seconds. They had all rapidly gotten to 9 - 15 minutes remaining and then the ETA just stopped updating, or was decreasing at a very slow rate, like 1 sec every 5 min. Is there any point in allowing these to continue to run once they stop updating the ETA while continuing to consume CPU? When one finally does complete, is it returning anything useful?

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75531 - Posted 28 Apr 2013 22:00:51 UTC
Last modified: 28 Apr 2013 22:06:04 UTC

Hi Yifan.

These finished late last night; all 3 only checkpointed once, at around 1 hour.

The first 2 are from my i7 2700k, where the watchdog killed them at 10hrs by the look of it; the other is from my x6 1055T, again at 10hrs.


http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887852

cryo_bh__chain_N_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79121_294_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524887800

cryo_bg__chain_K_subrun_002_SAVE_ALL_OUT_IGNORE_THE_REST_79130_932_0


# cpu_run_time_pref: 21600
BOINC:: CPU time: 36269.9s, 14400s + 21600s[2013- 4-28 19:36:38:] :: BOINC
InternalDecoyCount: 11
======================================================
DONE :: 2 starting structures 36269.9 cpu seconds
This process generated 11 decoys from 11 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation

</stderr_txt>
]]>

Validate state Invalid

=============================================================

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524910872

cryo_bh__chain_e_l_subrun_003_SAVE_ALL_OUT_IGNORE_THE_REST_79150_107_0


# cpu_run_time_pref: 21600
BOINC:: CPU time: 36377.6s, 14400s + 21600s[2013- 4-28 21: 2:59:] :: BOINC
InternalDecoyCount: 8
======================================================
DONE :: 2 starting structures 36377.6 cpu seconds
This process generated 8 decoys from 8 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames):
[0xb2aef87]
[0xf77c0400]
[0xa6ce54c]
[0xa6e7659]
[0xa1648c7]
[0xa1f2dd2]
[0xa1f4df1]
[0x9d4d1a5]
[0x9f10187]
[0x9d56457]
[0x9d4265a]
[0x8925eca]
[0x8681018]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Valid
____________


robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75532 - Posted 29 Apr 2013 1:17:44 UTC

Yifan,

Two cryo workunits where most but not all of the decoys gave the sin and cos range error:

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

http://boinc.bakerlab.org/rosetta/result.php?resultid=578116849

http://boinc.bakerlab.org/rosetta/result.php?resultid=578212532


Another one where it has been nearly an hour since the last checkpoint. Not clear if this indicates a problem.

http://boinc.bakerlab.org/rosetta/result.php?resultid=578064578

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75533 - Posted 29 Apr 2013 1:26:18 UTC

Hi.

I can only speak for myself, but I'm seeing more invalid tasks & little or no checkpointing since the update.



____________


Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75535 - Posted 29 Apr 2013 2:02:59 UTC - in response to Message ID 75527.

rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code.


I'm running a local test now, and it's been running for 20 min and is still going. Maybe there were some download errors that left the input files incomplete?


OK, found the problem with this one. It comes from our robetta server using a parameter to randomly trigger a deprecated function. I just changed the server to disable that mechanism.

Brian Priebe

Joined: Nov 27 09
Posts: 13
ID: 360315
Credit: 17,713,026
RAC: 21,007
Message 75536 - Posted 29 Apr 2013 6:05:51 UTC - in response to Message ID 75493.

minirosetta is updated to 3.46 to include recent developments in electron density and other scoring functions.
Are there any particular minimum requirements for this version? Despite resetting the project, I am still getting only 3.45 jobs on a 6.x version of BOINC.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75537 - Posted 29 Apr 2013 7:51:51 UTC

Another one that erred after 28min.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=525210001

rb_04_24_37778_72771__t000__2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79247_1030_0

# cpu_run_time_pref: 21600
dof_atom1 atomno= 3 rsd= 2
atom1 atomno= 1 rsd= 2
atom2 atomno= 2 rsd= 2
atom3 atomno= 5 rsd= 2
atom4 atomno= 6 rsd= 2
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
SIGSEGV: segmentation violation
Stack trace (17 frames):
[0xb2aef87]
[0xf7703400]
[0xa166837]
[0xa1f3edc]
[0xa1f4e3c]
[0x996c8d6]
[0x996df60]
[0x89561af]
[0x867d35e]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>



____________


[VENETO] boboviz

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75538 - Posted 29 Apr 2013 10:05:23 UTC - in response to Message ID 75536.

Despite resetting the project, I am still getting only 3.45 jobs on a 6.x version of BOINC.


Same here, on a 7.0.x version of BOINC.
Only 3.45....
____________

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75540 - Posted 29 Apr 2013 16:53:46 UTC - in response to Message ID 75536.

minirosetta is updated to 3.46 to include recent developments in electron density and other scoring functions.
Are there any particular minimum requirements for this version? Despite resetting the project, I am still getting only 3.45 jobs on a 6.x version of BOINC.


The best I can tell, only the cryo workunits need the new features of 3.46.

Brian Priebe

Joined: Nov 27 09
Posts: 13
ID: 360315
Credit: 17,713,026
RAC: 21,007
Message 75541 - Posted 29 Apr 2013 17:47:54 UTC - in response to Message ID 75540.

The best I can tell, only the cryo workunits need the new features of 3.46.

Yet on BOINC 7.0.28 under Windows 7 (64-bit), all work units are delivered to run under Mini Rosetta app 3.46. Such is not the case under BOINC 6.12.33 running under Windows 2003 Server (32-bit): they all still run app 3.45. And of course the "cryo" series are still bombing out under 3.45.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75543 - Posted 29 Apr 2013 23:10:10 UTC - in response to Message ID 75541.

The best I can tell, only the cryo workunits need the new features of 3.46.

Yet on BOINC 7.0.28 under Windows 7 (64-bit), all work units are delivered to run under Mini Rosetta app 3.46. Such is not the case under BOINC 6.12.33 running under Windows 2003 Server (32-bit): they all still run app 3.45. And of course the "cryo" series are still bombing out under 3.45.



Hi.

For whatever reason the Windows 32-bit version hasn't been updated; see the app page.

Windows/x86 3.45 14 Nov 2012 19:40:42 UTC

____________


Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75545 - Posted 30 Apr 2013 0:52:00 UTC
Last modified: 30 Apr 2013 0:53:35 UTC

OK, I got it wrong earlier. The Windows/x86 version is for the actual application, not the graphics. For some reason that file didn't get updated the last time I ran the script. I just reran the update, and it looks ok now. I didn't even think that would be the problem. Sorry about the confusion.

Yury Naydenov

Joined: Jun 17 12
Posts: 3
ID: 453191
Credit: 1,780,713
RAC: 2,508
Message 75551 - Posted 30 Apr 2013 20:38:15 UTC
Last modified: 30 Apr 2013 20:42:20 UTC

cryo 100,853.60 CPU Time
cryo 101,225.50 CPU Time

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75558 - Posted 2 May 2013 22:49:21 UTC
Last modified: 2 May 2013 22:50:08 UTC

Hi.

I haven't had an error like this for a long time; it ran for over 5hrs.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=525740466

CASP9_bw_benchmark_hybridization_run49_T0534_2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_46345_6031_0

# cpu_run_time_pref: 21600

Client error___Compute error___18,861.34

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75567 - Posted 5 May 2013 21:53:35 UTC
Last modified: 5 May 2013 22:03:02 UTC

This thing had run for over 9hrs, 40min without a checkpoint, so when I restarted the rig this morning it went back to 0 and started again, so I've aborted it.


endo_ab_Pan927.run.12_SAVE_ALL_OUT_IGNORE_THE_REST_79957_24_0


http://boinc.bakerlab.org/rosetta/workunit.php?wuid=526157680

=============================

Also another one of these.

rb_04_24_37778_72771__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79247_1579_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=525205524

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>

<stderr_txt>
____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75577 - Posted 7 May 2013 7:51:58 UTC
Last modified: 7 May 2013 7:53:48 UTC

I'm getting sick of this: over 10hrs run time, then this. Not impressed at all.

If the watchdog killed it, why didn't it do so earlier?

cryo_bg__chain_d_l_subrun_003_SAVE_ALL_OUT_IGNORE_THE_REST_79148_309_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524912436


# cpu_run_time_pref: 21600
BOINC:: CPU time: 36259.9s, 14400s + 21600s[2013- 5- 7 14:35:29:] :: BOINC
InternalDecoyCount: 45
======================================================
DONE :: 2 starting structures 36259.9 cpu seconds
This process generated 45 decoys from 45 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 373.059569535114
Granted credit 0
application version 3.46
____________


Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75581 - Posted 7 May 2013 22:25:07 UTC

I've been debugging on my side for the last week on the same set of jobs. It runs a lot slower in debug mode, so I haven't been able to consistently reproduce the seg fault yet. My suspicion is that something is still not quite fixed in the gradient calculations.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75582 - Posted 8 May 2013 3:29:57 UTC - in response to Message ID 75577.

I'm getting sick of this: over 10hrs run time, then this. Not impressed at all.

If the watchdog killed it, why didn't it do so earlier?

cryo_bg__chain_d_l_subrun_003_SAVE_ALL_OUT_IGNORE_THE_REST_79148_309_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=524912436


Looks like you might get more useful results if you decrease the allowed runtime for your workunits enough that they won't run over 10 hours.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75586 - Posted 9 May 2013 6:00:52 UTC

I don't know what I've done to deserve these things, got'a love that credit for 8hrs work.

rb_05_07_37484_73467__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80583_498_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=526810531

Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
BOINC:: CPU time: 28814.8s, 14400s + 14400s[2013- 5- 9 15:34:10:] :: BOINC
InternalDecoyCount: 3
======================================================
DONE :: 2 starting structures 28814.8 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames):
[0xb2aef87]
[0xf7745400]
[0xa6ce54c]
[0xa6e7659]
[0xa1648c7]
[0xa1f2dd2]
[0xa1f4df1]
[0x9d4d1a5]
[0x9f10187]
[0x9d56457]
[0x9d4265a]
[0x8925eca]
[0x8681018]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Valid
Claimed credit 222.194343136291
Granted credit 7.33239552472893
application version 3.46

____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75591 - Posted 9 May 2013 22:44:54 UTC
Last modified: 9 May 2013 23:13:08 UTC

Something odd with these new tasks, showing 0 for everything but are valid.

Both ran for longer than the runtime shown; they went back when restarted & they both ran for over 7hrs.

idealdead2_test_abrelax_nohoms_1l9l_SAVE_ALL_OUT_80619_74_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=526963404

# cpu_run_time_pref: 14400
======================================================
DONE :: 0 starting structures 11822.6 cpu seconds
This process generated 0 decoys from 0 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 91.1614143589971
Granted credit 125.885524056059
application version 3.46

==============================================

idealdead2_test_abrelax_homs_1l9l_SAVE_ALL_OUT_80618_77_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=526964451

# cpu_run_time_pref: 14400
======================================================
DONE :: 0 starting structures 3917.32 cpu seconds
This process generated 0 decoys from 0 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 40.3030670421785
Granted credit 43.9948759779241
application version 3.46
____________


Chilean

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,238,180
RAC: 4,709
Message 75592 - Posted 10 May 2013 5:27:52 UTC

I got a bunch of WUs that failed after hours of crunching (from different machines):

http://boinc.bakerlab.org/rosetta/result.php?resultid=580361643
http://boinc.bakerlab.org/rosetta/result.php?resultid=580450964
http://boinc.bakerlab.org/rosetta/result.php?resultid=580286623
http://boinc.bakerlab.org/rosetta/result.php?resultid=580286646
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75593 - Posted 10 May 2013 5:58:20 UTC

Another one of these died, after 4 secs.

idealdead2_test_abrelax_nohoms_1jf8_SAVE_ALL_OUT_80619_456_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=527126051

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
DKKTIYFISTGNSARSQMAEGWGKEILGEGWNVYSAGIETHGVNPKAIEAMKEVDIDISNHTSDLIDNDILKQSDLVVTLCSDADNNCPILPPNVKKEHWGFDDPAGKEWSEFQRVRDEIKLAIEKFKLRX
can not find a residue type that matches the residue Kat position 131

ERROR: core::util::switch_to_residue_type_set fails

ERROR:: Exit from: src/core/util/SwitchResidueTypeSet.cc line: 143
std::cerr: Exception was thrown:


[ERROR] EXCN_utility_exit has been thrown from: src/core/util/SwitchResidueTypeSet.cc line: 143
ERROR: core::util::switch_to_residue_type_set fails



</stderr_txt>
]]>

____________


Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 75596 - Posted 10 May 2013 20:15:42 UTC

Thanks! I'll let the person running the idealdead2 jobs know about this. -y

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75599 - Posted 11 May 2013 2:27:52 UTC

A workunit that may have a problem:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=527210644
another idealdead2 workunit

Now at 15:51:54 elapsed, even though I've set 12 hours as the workunit length for my computers.

Showing 98.956% progress, slowly increasing.

Remaining time shown as 00:10:03 and NOT decreasing.

No error messages visible.

I'll let it go to at least 16 hours before deciding if I should abort it.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75600 - Posted 11 May 2013 3:49:12 UTC - in response to Message ID 75599.

A workunit that may have a problem:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=527210644
another idealdead2 workunit

Now at 15:51:54 elapsed, even though I've set 12 hours as the workunit length for my computers.

Showing 98.956% progress, slowly increasing.

Remaining time shown as 00:10:03 and NOT decreasing.

No error messages visible.

I'll let it go to at least 16 hours before deciding if I should abort it.


It finished in a little more than 16 hours, and was declared a success.

However, several dozen of these errors in the output:

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

Also around 20 NaN errors.
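
For anyone who wants exact counts rather than an estimate, the stderr text shown on a task's result page can be saved and tallied. The sketch below is only an illustration in Python: the function name tally_stderr is hypothetical, and the sample lines are copied from excerpts posted in this thread, not from this particular workunit.

```python
from collections import Counter

def tally_stderr(text):
    """Count sin_cos_range errors and lines whose last token is 'nan'."""
    counts = Counter()
    for line in text.splitlines():
        lowered = line.lower()
        if "sin_cos_range error" in lowered:
            counts["sin_cos_range"] += 1
        elif lowered.split()[-1:] == ["nan"]:
            # e.g. "THETA1 nan" from the AtomTree angle dump
            counts["nan_value"] += 1
    return counts

# Sample lines taken from the log excerpts quoted in this thread.
sample = """\
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
THETA1 nan
THETA3 nan
PHI2 0
"""

counts = tally_stderr(sample)
print("sin_cos_range errors:", counts["sin_cos_range"])
print("nan values:", counts["nan_value"])
```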

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75606 - Posted 12 May 2013 8:48:16 UTC

And another three of these. This one ran for 8hrs; my run time is now 4hrs, so why didn't it stop earlier?

The other 2 I aborted at over 6hrs because I couldn't see them finishing without getting an error & wasting more time anyway.

B.T.W. they ran non-stop for all that time, so I don't know why it's saying that there are 2 starting structures. I think it normally says 1?


rb_05_10_38828_73745__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80811_594_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=527333216

# cpu_run_time_pref: 14400
BOINC:: CPU time: 29192.9s, 14400s + 14400s[2013- 5-12 18:22:12:] :: BOINC
InternalDecoyCount: 12
======================================================
DONE :: 2 starting structures 29192.9 cpu seconds
This process generated 12 decoys from 12 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 298.87771452369
Granted credit 0
application version 3.46
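
The log lines quoted in this thread print the CPU time next to a sum such as "14400s + 21600s" or "14400s + 14400s", where the second term matches cpu_run_time_pref. One possible reading, an assumption drawn only from these excerpts and not from the Rosetta source, is that the watchdog allows the preferred run time plus a fixed 14400 s before ending a task. A small Python sketch of that arithmetic follows; GRACE_SECONDS and watchdog_cutoff are hypothetical names.

```python
# Assumption: the constant first term in the "14400s + <pref>s" log lines
# is a fixed allowance added on top of the run time preference.
GRACE_SECONDS = 14400

def watchdog_cutoff(cpu_run_time_pref, grace=GRACE_SECONDS):
    """CPU seconds after which the watchdog would end the task (assumed)."""
    return cpu_run_time_pref + grace

# (cpu_run_time_pref, CPU time reported in the log) pairs quoted in this thread.
observations = [(21600, 36259.9), (14400, 28814.8), (14400, 29192.9)]

for pref, observed in observations:
    cutoff = watchdog_cutoff(pref)
    print(f"pref {pref}s -> cutoff {cutoff}s ({cutoff / 3600:.0f} h), "
          f"observed {observed}s ({observed - cutoff:.1f}s past cutoff)")
```

Under that reading, a 4-hour preference gives a cutoff of 8 hours and a 6-hour preference gives 10 hours, which is in line with the run times reported above: each task stopped a few minutes past the computed cutoff.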
____________


P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75614 - Posted 14 May 2013 22:08:51 UTC

Still getting errors on these tasks, I'm aborting any of these I see.

rb_05_12_38289_72979__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_80960_200_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=527807465

# cpu_run_time_pref: 14400
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 nan
THETA3 nan
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780

ERROR: unknown atom_name: PRO NV
ERROR:: Exit from: src/core/chemical/ResidueType.cc line: 2016
SIGSEGV: segmentation violation
Stack trace (17 frames):
[0xb2aef87]
[0xf7735400]
[0xa166837]
[0xa1f3edc]
[0xa1f4e3c]
[0x996c8d6]
[0x996df60]
[0x89561af]
[0x867d35e]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>
____________


Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 75655 - Posted 23 May 2013 10:30:31 UTC

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.

James W

Joined: Nov 25 12
Posts: 11
ID: 463505
Credit: 232,676
RAC: 329
Message 75657 - Posted 24 May 2013 5:04:25 UTC

I've noticed over the last couple weeks that there have been several types of jobs I haven't seen before (some beginning with the "hyb" or "hybred," "cyto," etc.) These jobs are not setting checkpoints, even after crunching up to 11 hours or so (with checkpoint limited to no more than every 60 sec. in computer pref.) The jobs starting "rb_5_17" and other "dates" continue to have checkpoints as usual.

The problem, as noted in another recent thread, is that if I must shut down my system or reboot (such as for doing Windows updates, updating applications, etc.), or if I must close BOINC, I lose all the work in these "new" type jobs without checkpoints.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75658 - Posted 24 May 2013 12:31:51 UTC - in response to Message ID 75657.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75659 - Posted 24 May 2013 12:56:41 UTC - in response to Message ID 75658.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)


That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75660 - Posted 24 May 2013 15:13:57 UTC - in response to Message ID 75659.

That should work if you're allowed to set it that low. If I remember correctly, though, the minimum is now 3 hours.


My run time is 2 hours....

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75774 - Posted 19 Jun 2013 7:59:07 UTC

A lot of rb_number WUs cannot start graphics (and, I think, calculation).
There is a simple green line and 0 steps.
I kill these WUs.
____________

mikey

Joined: Jan 5 06
Posts: 1445
ID: 47185
Credit: 3,503,433
RAC: 0
Message 75778 - Posted 19 Jun 2013 15:18:52 UTC - in response to Message ID 75655.

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.


I am SOOOO glad I am almost outa here!! My cryo tasks are again taking 7 hours to finish, I just use the defaults, and I am getting 20 to 25 frickin credits for them. NOW the same thing is happening with the RB units, PATHETIC!!! My eb units are doing okay but it is a pain trying to keep 10 systems clear of all the bad units!! I AM trying to help but my rac is DECLINING and my work output is RISING, that is just NOT RIGHT!!!

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 75782 - Posted 21 Jun 2013 22:26:00 UTC - in response to Message ID 75774.

A lot of rb_number WUs cannot start graphics (and, I think, calculation).
There is a simple green line and 0 steps.
I kill these WUs.

I am running rb_06_21_39751_75850__t000__3_C1_SAVE_ALL_OUT_IGNORE_THE_REST_87610_60 with the default run time (3 hours) and the graphics are working perfectly.
____________
Have a crunching good day!!

mikey

Joined: Jan 5 06
Posts: 1445
ID: 47185
Credit: 3,503,433
RAC: 0
Message 75788 - Posted 23 Jun 2013 14:59:03 UTC

How is THIS my fault?
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=534659615

stderr out

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range

etc, etc, ETC!!!

The pc is:
CPU type AuthenticAMD
AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
Number of CPUs 6
Operating System Microsoft Windows 7
Ultimate x64 Edition, Service Pack 1, (06.01.7601.00)
Memory 12283.63 MB
Cache 512 KB
Swap space 24565.44 MB

That is SIX cpu's with ONLY five crunching, the other is being used to support the gpu, with some left over for whatever.

The unit took "CPU time 14335.48", ie 4 HOURS, and then it just errored out!! What kind of project is this right now?!!! LOTS of problems with the different kinds of units yet they are being released like this is a BETA project or something!! Rosetta is SUPPOSED to be about the SCIENCE, not releasing units to 'see if they work or not'!! I thought that's what the Beta Project was all about, testing the units PRIOR to them being released here!!!

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75790 - Posted 23 Jun 2013 20:32:29 UTC - in response to Message ID 75788.

How is THIS my fault?
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=534659615


Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75791 - Posted 24 Jun 2013 6:48:59 UTC

Had a couple of these tasks error out today; this message goes on for a few pages.

CASP9_fb_benchmark_hybridization_run54_T0534_1_C2_SAVE_ALL_OUT_IGNORE_THE_REST_47953_1846_1

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=534900423

ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
# cpu_run_time_pref: 21600


ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93

ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
======================================================
DONE :: 99 starting structures 1201 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
____________


[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75792 - Posted 24 Jun 2013 8:32:37 UTC - in response to Message ID 75790.
Last modified: 24 Jun 2013 8:33:18 UTC

Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.


I participate in the RALPH@home project, but sometimes I think that the possibility to test new versions/new code/etc. extensively is VERY underestimated.
And admins do not participate in the test forum.....
____________

mikey

Joined: Jan 5 06
Posts: 1445
ID: 47185
Credit: 3,503,433
RAC: 0
Message 75795 - Posted 24 Jun 2013 11:02:22 UTC - in response to Message ID 75792.

Rosetta@Home doesn't use beta testing - they use alpha testing (RALPH@Home) instead. Does this make them think that Rosetta@Home should do the beta testing instead? If so, the results are often worse than most of the alpha test BOINC projects I have my computers participate in.


I participate in the RALPH@home project, but sometimes I think that the possibility to test new versions/new code/etc. extensively is VERY underestimated.
And admins do not participate in the test forum.....


To be honest, until some of us screamed, yelled and started aborting all the cryo units recently, the Admins weren't HERE either!! I guess they 'are too busy' to waste their time seeing if what they designed actually works in the REAL WORLD!!

mikey

Joined: Jan 5 06
Posts: 1445
ID: 47185
Credit: 3,503,433
RAC: 0
Message 75796 - Posted 24 Jun 2013 13:28:58 UTC
Last modified: 24 Jun 2013 13:30:14 UTC

Here is ANOTHER "hyb-ab-bench" unit that just cost me SEVEN HOURS of crunching time and THEN errored out:
http://boinc.bakerlab.org/rosetta/result.php?resultid=588986073

The reason:
</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hyb_ab_bench_4aimA_SAVE_ALL_OUT_IGNORE_THE_REST_53960_1303_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

UPLOAD FAILURE----WTF are you telling me? Is Rosetta telling me that AFTER SEVEN HOURS of crunching a unit fails to upload and I will get NO CREDITS for it???!!!!!!!! WHERE is the Scientist who designed these things? Why is SOMEONE not here explaining what the heck is going on?!!!! This is JUST ONE of my pc's here, I have NOT checked the others, but it is NOT the same one as the last problem I posted about having problems with!!!

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 75799 - Posted 24 Jun 2013 19:07:39 UTC - in response to Message ID 75778.

These hyb_cb_bench_(etc) tasks are really poor credit tasks.
They complete OK, but give you only 20 credits for your work.


I am SOOOO glad I am almost outa here!! My cryo tasks are again taking 7 hours to finish, I just use the defaults, and I am getting 20 to 25 frickin credits for them. NOW the same thing is happening with the RB units, PATHETIC!!! My eb units are doing okay but it is a pain trying to keep 10 systems clear of all the bad units!! I AM trying to help but my rac is DECLINING and my work output is RISING, that is just NOT RIGHT!!!

It is indeed not right. The server code is obsolete and a cause of the low credit, for any WU. But they will not update it here unless it totally crashes and cannot be brought back to life... shame.
They are lucky that their research is quite important, otherwise...there are many many projects that need CPU-time.
____________
Greetings,
TJ.

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 75800 - Posted 24 Jun 2013 19:10:10 UTC - in response to Message ID 75658.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?
____________
Greetings,
TJ.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 75803 - Posted 25 Jun 2013 1:59:42 UTC - in response to Message ID 75800.
Last modified: 25 Jun 2013 2:02:30 UTC

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?


Rosetta@Home workunits are set up in usually 100 sections, called decoys. They try to run however many of these decoys they expect to finish in the target CPU run time, but can go over if the last one takes longer than expected.

I'm not sure if the shutdown code runs properly if the last decoy that was finished reported an error instead of a good answer.
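
To make the description above concrete, here is a minimal Python sketch of that kind of loop. It is not Rosetta's actual code: the function name run_decoys, the per-decoy costs and the rule of estimating the next decoy from the average so far are all assumptions for illustration. It only shows why a task always finishes at least one decoy and can overshoot the target when the last decoy is slow.

```python
def run_decoys(decoy_times, target_cpu_seconds, max_decoys=100):
    """Simulate one work unit.

    decoy_times: hypothetical CPU cost (seconds) of each decoy, in order.
    Returns (decoys completed, total CPU seconds used).
    """
    elapsed = 0.0
    completed = 0
    for cost in decoy_times[:max_decoys]:
        # Assumed heuristic: expect the next decoy to cost the average so far.
        average = elapsed / completed if completed else 0.0
        # Always run the first decoy; afterwards stop once the next one
        # is expected to push us past the target.
        if completed >= 1 and elapsed + average > target_cpu_seconds:
            break
        elapsed += cost
        completed += 1
    return completed, elapsed


# Quick decoys: stops close to (and under) a 1-hour target.
print(run_decoys([1000] * 100, 3600))
# One very slow decoy: still runs it, far past a 2-hour target.
print(run_decoys([20000], 7200))
# Last decoy much slower than expected: overshoots the target.
print(run_decoys([1000, 1000, 9000], 3600))
```

In this sketch the overshoot comes entirely from the last decoy costing more than the estimate, which is the situation described above.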

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 75805 - Posted 25 Jun 2013 8:09:19 UTC

Looks like I'm going to have to take the big hammer to some of these tasks, I'm not amused at all. My 6hr runtime ended up over 10hrs.

CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_46414_1348_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=534948991

Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_CASP9_bw_benchmark_hybridization_run49_T0606_1_C1_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 21600
BOINC:: CPU time: 36341.9s, 14400s + 21600s[2013- 6-25 17:58:54:] :: BOINC
InternalDecoyCount: 2
======================================================
DONE :: 2 starting structures 36341.9 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (21 frames):
[0xb2aef87]
[0xf777f400]
[0xa6ce54c]
[0xa6e7659]
[0xa1648c7]
[0xa1f2dd2]
[0xa1f4df1]
[0x9d4d1a5]
[0x9f10187]
[0x9d56457]
[0x9d4265a]
[0x8925eca]
[0x8681018]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

Validate state Valid
Claimed credit 280.45
Granted credit 11.25



____________


TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 75807 - Posted 26 Jun 2013 17:08:09 UTC - in response to Message ID 75803.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?


Rosetta@Home workunits are set up in usually 100 sections, called decoys. They try to run however many of these decoys they expect to finish in the target CPU run time, but can go over if the last one takes longer than expected.

I'm not sure if the shutdown code runs properly if the last decoy that was finished reported an error instead of a good answer.

Does that mean that when I set the runtime to e.g. 2 hours, a Rosetta WU will be finished within 2 hours?
I have not set anything there at the moment and WU's take around 5 hours to finish.
____________
Greetings,
TJ.

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,802,222
RAC: 17,340
Message 75808 - Posted 26 Jun 2013 18:04:57 UTC - in response to Message ID 75807.

Were these new jobs set up this way, or was this an oversight? I've already lost a number of hours/days of crunching because of this issue before I got an idea of what was happening. Is there a reasonable workaround for this problem? Thanks.


Target CPU run time: 1 hour..... :-)

I have seen this in the preferences but have no idea what it does or what it is for.
Can someone please explain this?


Rosetta@Home workunits are set up in usually 100 sections, called decoys. They try to run however many of these decoys they expect to finish in the target CPU run time, but can go over if the last one takes longer than expected.

I'm not sure if the shutdown code runs properly if the last decoy that was finished reported an error instead of a good answer.

Does that mean that when I set the runtime to e.g. 2 hours, a Rosetta WU will be finished within 2 hours?
I have not set anything there at the moment and WU's take around 5 hours to finish.


It's a preference rather than a hard limit. If the decoys are small/quick to run in comparison to your preference time then there's a good chance that Rosetta will be able to complete the task near your target time, but it has to complete a minimum of one decoy so if one decoy takes longer than the run time then you'll be over the time limit. I guess if the decoys are very variable in run-time then that'll also reduce Rosetta's prediction accuracy on run-time.
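
The stderr excerpts quoted in this thread contain lines like "BOINC:: CPU time: 36341.9s, 14400s + 21600s", where the last number matches cpu_run_time_pref. Below is a small Python sketch for comparing the reported CPU time with that sum. Reading the sum as a cutoff is an assumption on my part, not something the project states in this thread, and the helper name overrun is hypothetical.

```python
import re

# Matches e.g. "CPU time: 36341.9s, 14400s + 21600s"
LINE = re.compile(r"CPU time:\s*([\d.]+)s,\s*(\d+)s\s*\+\s*(\d+)s")


def overrun(log_line):
    """Return CPU seconds beyond (first + second) value, or None if no match."""
    match = LINE.search(log_line)
    if match is None:
        return None
    cpu_seconds = float(match.group(1))
    # Assumption: the two integers add up to a limit on CPU time.
    limit = int(match.group(2)) + int(match.group(3))
    return cpu_seconds - limit


line = "BOINC:: CPU time: 36341.9s, 14400s + 21600s[2013- 6-25 17:58:54:] :: BOINC"
print(round(overrun(line), 1))
```

For the three such lines quoted in this thread, the difference works out to roughly 393, 342 and 395 seconds, so each of those tasks stopped a few minutes after passing the sum.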
____________

TJ

Joined: Mar 29 09
Posts: 127
ID: 308421
Credit: 4,799,890
RAC: 0
Message 75812 - Posted 27 Jun 2013 20:24:06 UTC - in response to Message ID 75808.

It's a preference rather than a hard limit. If the decoys are small/quick to run in comparison to your preference time then there's a good chance that Rosetta will be able to complete the task near your target time, but it has to complete a minimum of one decoy so if one decoy takes longer than the run time then you'll be over the time limit. I guess if the decoys are very variable in run-time then that'll also reduce Rosetta's prediction accuracy on run-time.

Thank you. In that case I leave it as it is.

____________
Greetings,
TJ.

Link

Joined: May 4 07
Posts: 260
ID: 173059
Credit: 337,463
RAC: 237
Message 75834 - Posted 11 Jul 2013 9:59:49 UTC - in response to Message ID 75807.

Does that mean that when I set the runtime to e.g. 2 hours, a Rosetta WU will be finished within 2 hours?
I have not set anything there at the moment and WU's take around 5 hours to finish.

Also note the wording "Target CPU run time". The task will try to run no more than the set CPU time, but if your CPU has a lot of other stuff to do, the actual runtime might be a lot longer.
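
The difference between CPU time and elapsed (wall-clock) time can be seen with a few lines of Python. This is only an illustration using the standard time module, not anything from BOINC or Rosetta; the names measure and busy are made up, and the sleep merely stands in for periods when the task is not getting the CPU.

```python
import time


def measure(work, idle_seconds):
    """Run work(), then sit idle; return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    work()
    # Stand-in for time during which the task is not being run by the CPU.
    time.sleep(idle_seconds)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


def busy():
    # A small amount of real computation.
    sum(i * i for i in range(200_000))


cpu, wall = measure(busy, 0.2)
print(f"CPU time: {cpu:.3f}s, elapsed time: {wall:.3f}s")
```

The CPU figure counts only the computation, while the elapsed figure also includes the idle period, which is why a task can hit its CPU target long after the same amount of clock time has passed.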
____________
.

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75835 - Posted 11 Jul 2013 15:30:13 UTC

After 6 hours of crunch, error on 592222594:

# cpu_run_time_pref: 7200
BOINC:: CPU time: 21994.5s, 14400s + 7200s[2013- 7-11 16:58:11:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 21994.5 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>cryo_bb__t20s__SAVE_ALL_OUT_IGNORE_THE_REST_88799_4052_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75859 - Posted 21 Jul 2013 19:13:46 UTC

screensaver crashes with endo_ac_ wus....
____________

Nick Perry

Joined: Jul 19 13
Posts: 1
ID: 478837
Credit: 105,673
RAC: 77
Message 75894 - Posted 3 Aug 2013 9:54:55 UTC - in response to Message ID 75859.

screensaver crashes with endo_ac_ wus....

Same issue here. The Windows 7 system runs fine; on the XP system ALL endo units error out. NOT running the screensaver, just in BOINC manager.

Graphics fail after 2 to 5 minutes..

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 75919 - Posted 9 Aug 2013 8:35:41 UTC

597301386

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
[2013- 8- 9 10:33:41:] :: BOINC:: Initializing ... ok.
[2013- 8- 9 10:33:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
ERROR: Illegal value specified for option -run:protocol : abinitio

</stderr_txt>
]]>
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 76031 - Posted 5 Sep 2013 19:22:09 UTC - in response to Message ID 75859.

screensaver crashes with endo_ac_ wus....


Again....no fix?
____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 76060 - Posted 23 Sep 2013 7:53:07 UTC

605202808

After 68 minutes

Continuing computation from checkpoint: chk_NoTag_FastRelax__chk1_fa ... success!
dof_atom1 atomno= 3 rsd= 8
atom1 atomno= 1 rsd= 8
atom2 atomno= 2 rsd= 8
atom3 atomno= 5 rsd= 8
atom4 atomno= 6 rsd= 8
THETA1 nan
THETA3 1.02049
PHI2 0

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: src/core/kinematics/AtomTree.cc line: 780
SIGSEGV: segmentation violation
Stack trace (18 frames):
[0xb2aef87]
[0x85a400]
[0xa720abb]
[0xa166837]
[0xa1f3edc]
[0xa1f4e3c]
[0x996c8d6]
[0x996df60]
[0x89561af]
[0x867d35e]
[0x992d14f]
[0x9931429]
[0x9aebcad]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

____________

[VENETO] boboviz Profile

Joined: Dec 1 05
Posts: 545
ID: 25524
Credit: 1,510,213
RAC: 1,277
Message 76090 - Posted 2 Oct 2013 15:51:33 UTC

607525909
607525911


Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev54943.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_ac_t20s_reg_shift_6.0A_1pma_fit_INPUT_A0076-A0089_-3_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 7200
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 76091 - Posted 3 Oct 2013 1:06:47 UTC

I've had 7 of these fail one after the other, all the same way.


ab_t20s_reg_shift_4.1A_1pma_fit_INPUT_B0402-B0408_01_SAVE_ALL_OUT_IGNORE_THE_REST_99824_2_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=551792680

ERROR: Unable to open weights/patch file. None of (./)stage1 or (./)stage1.wts or minirosetta_database/scoring/weights/stage1 or minirosetta_database/scoring/weights/stage1.wts exist
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 2967
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (9 frames):
[0xb2aef87]
[0xb7720400]
[0x99704f6]
[0x9aebb07]
[0x9b4f815]
[0x9b4d045]
[0x8054950]
[0xb33f328]
[0x8048131]

Exiting...

</stderr_txt>
]]>

____________


Yury Naydenov

Joined: Jun 17 12
Posts: 3
ID: 453191
Credit: 1,780,713
RAC: 2,508
Message 76694 - Posted 6 May 2014 22:33:39 UTC - in response to Message ID 75551.

.

Yury Naydenov

Joined: Jun 17 12
Posts: 3
ID: 453191
Credit: 1,780,713
RAC: 2,508
Message 76695 - Posted 6 May 2014 22:35:31 UTC - in response to Message ID 75551.

.


Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC