Mini Rosetta Version 3.41.

Message boards : Number crunching : Mini Rosetta Version 3.41.

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74064 - Posted: 21 Oct 2012, 4:26:23 UTC

Another two, and it looks like the same error. You've got a bad batch.


rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_29_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490082071

rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_28_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490082070


Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 74064
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74068 - Posted: 21 Oct 2012, 23:37:19 UTC

Got a bucket full of the errors, same as others below.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081939

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081940

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081942

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081943

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=490081944

ID: 74068
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 74071 - Posted: 22 Oct 2012, 3:37:08 UTC

Looks like they all have that same prefix on the WU name and the wingmen are failing as well:
rb_10_19_34252_64405

I sent an email to DK to let him know there seems to be a bad path name in those.
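As a rough illustration of the "same prefix" observation, failing task names can be grouped by the part before the first double underscore. This is only a sketch: the idea that the batch prefix ends at the first `__` is an assumption based on the names posted in this thread, and the function names are made up for the example.

```python
from collections import defaultdict


def batch_prefix(task_name: str) -> str:
    """Return the text before the first '__' (assumed to identify the batch)."""
    return task_name.split("__", 1)[0]


def group_by_batch(task_names):
    """Map each assumed batch prefix to the task names that carry it."""
    groups = defaultdict(list)
    for name in task_names:
        groups[batch_prefix(name)].append(name)
    return dict(groups)


# Task names copied from the reports earlier in this thread.
failed = [
    "rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_29_0",
    "rb_10_19_34252_64405__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_62305_28_0",
]

for prefix, names in group_by_batch(failed).items():
    print(prefix, len(names))
```

If every failing task lands in one group, that points at the batch rather than at the individual hosts.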
Rosetta Moderator: Mod.Sense
ID: 74071
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74159 - Posted: 3 Nov 2012, 5:56:35 UTC

This one failed quickly, 11sec.


rb_11_02_34573_64567__t000__0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_62635_81_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=492457626


Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_11_02_34573_64567__t000__0_C2_robetta.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 74159
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74193 - Posted: 7 Nov 2012, 7:08:14 UTC

Another quickie 14sec.

rb_11_06_34628_64941__t000__0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_63295_515_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493214346


BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400

ERROR: can't open file: minirosetta_database//sampling/filtered.vall.dat.2006-05-05
ERROR:: Exit from: src/core/fragment/picking_old/vall/vall_io.cc line: 63
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 74193
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74210 - Posted: 8 Nov 2012, 21:21:28 UTC

All of the jobs I'm getting with zdock in the task name are failing quickly after start, for me and for my wingmen as well. Looking at the stderr output, there seem to be a few different failure modes: one gives an "incorrect function (0x1)" error, another says maximum disk usage exceeded (I have 5 GB set; BOINC/Rosetta usually uses ~1 GB), and yet another has an unhandled exception.
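For anyone sorting through a pile of these, here is a minimal sketch of bucketing stderr text by failure mode. The marker strings are taken from the stderr excerpts quoted in this thread; the labels and the function name are invented for the example, and it only covers the messages that were actually posted here.

```python
import re

# (label, pattern) pairs; labels are hypothetical, patterns come from
# stderr excerpts quoted in this thread.
PATTERNS = [
    ("missing_input", re.compile(r"ERROR: (can't open file|Cannot open PDB file)")),
    ("disk_limit", re.compile(r"Maximum disk usage exceeded")),
]


def classify(stderr_text: str) -> str:
    """Return the first matching failure label, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(stderr_text):
            return label
    return "unknown"


print(classify('ERROR: Cannot open PDB file "2HQS_ppk_b_start.pdb"'))
print(classify("<message>\nMaximum disk usage exceeded\n</message>"))
```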

Examples:

https://boinc.bakerlab.org/workunit.php?wuid=493514155
https://boinc.bakerlab.org/workunit.php?wuid=493512417
https://boinc.bakerlab.org/workunit.php?wuid=493474527

The one that did finish, finished far too quickly and didn't validate, heh:

https://boinc.bakerlab.org/workunit.php?wuid=493461620

ID: 74210
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74211 - Posted: 8 Nov 2012, 21:29:14 UTC

I've had these so far, and I'm sure there will be more. I have the same amount of space (Use at most: 4 GB disk space) on both rigs and never had a problem with them before.

Is that disk usage limit set in the input files? And why didn't they see this on Ralph?

============================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493404478

2HQS_zdock_2HQS_cluster_selectcst_c.3.23_SAVE_ALL_OUT_63681_1_0

Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev50262.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/2HQS_allinput2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: Cannot open PDB file "2HQS_ppk_b_start.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

==========================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493431352

1T6B_zdock_1T6B_cluster_selectcst_c.13.0_SAVE_ALL_OUT_63610_1_0

Exit status -177 (0xffffffffffffff4f)

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
===================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493429697

2OUL_zdock_2OUL_cluster_selectcst_c.0.41_SAVE_ALL_OUT_63657_1_0

Exit status -177 (0xffffffffffffff4f)

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
===============================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493366529

1XU1_zdock_1XU1_cluster_selectcst_c.0.50_SAVE_ALL_OUT_63619_1_0

# cpu_run_time_pref: 14400
======================================================
DONE :: 20 starting structures 1201 cpu seconds
This process generated 20 decoys from 20 attempts
======================================================
BOINC :: WS_max 8.81443e-280

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
===========================

ID: 74211
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74215 - Posted: 8 Nov 2012, 22:39:12 UTC

And another one failed!

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=493428654

2O8V_zdock_2O8V_cluster_selectcst_c.3.2_SAVE_ALL_OUT_63653_1_0

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 14400
======================================================
DONE :: 20 starting structures 1201 cpu seconds
This process generated 20 decoys from 20 attempts
======================================================
BOINC :: WS_max 0

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

ID: 74215
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 74219 - Posted: 9 Nov 2012, 2:19:10 UTC

It's all from the same series, probably just a small parameter that needs to be adjusted, heh.
ID: 74219
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74220 - Posted: 9 Nov 2012, 2:32:18 UTC

Hi.

I've increased the BOINC disk limit to 10 GB on both my rigs to see if that helps, as I've got a few of the zdock tasks in line to run on both rigs.



ID: 74220
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 74221 - Posted: 9 Nov 2012, 2:56:01 UTC

Well that didn't work, more failed tasks.

These 2 got Validate errors.

2B4J_zdock_2B4J_cluster_selectcst_c.1.5_SAVE_ALL_OUT_63634_2_1

1SYX_zdock_1SYX_cluster_selectcst_c.17.1_SAVE_ALL_OUT_63609_1_1


=========================================

This one has the same disk usage problem, even after my changes.

1R0R_zdock_1R0R_cluster_selectcst_c.7.22_SAVE_ALL_OUT_63602_2_0

Fri 09 Nov 2012 13:38:16 EST rosetta@home Aborting task 1R0R_zdock_1R0R_cluster_selectcst_c.7.22_SAVE_ALL_OUT_63602_2_0: exceeded disk limit: 323.93MB > 286.10MB


PS: If you're going to put these tasks out in the wild, make sure they run!

ID: 74221
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,816,664
RAC: 863
Message 74222 - Posted: 9 Nov 2012, 14:36:28 UTC - in response to Message 74220.  

Hi.

I've increased the BOINC disk limit to 10 GB on both my rigs to see if that helps, as I've got a few of the zdock tasks in line to run on both rigs.




I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task."
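To make the FAQ quote concrete, here is a small sketch of the comparison it describes, using the numbers from the "exceeded disk limit: 323.93MB > 286.10MB" log line posted above. Two things are assumptions, not facts from the project: that the client's "MB" means bytes divided by 1024*1024, and that the task's bound was 300,000,000 bytes (a value that happens to print as 286.10 under that convention).

```python
MIB = 1024 * 1024  # assumption: the log's "MB" is bytes / (1024 * 1024)


def to_mib(n_bytes: float) -> float:
    """Convert a byte count to the unit assumed for the client log."""
    return n_bytes / MIB


def exceeds_bound(used_bytes: float, rsc_disk_bound: float) -> bool:
    """True when a task's disk use is above its per-task bound."""
    return used_bytes > rsc_disk_bound


assumed_bound = 300_000_000      # hypothetical <rsc_disk_bound> value, in bytes
used = 323.93 * MIB              # usage reported in the log line

print(f"{to_mib(assumed_bound):.2f}MB")     # prints 286.10MB
print(exceeds_bound(used, assumed_bound))   # prints True
```

Because the bound travels with the task, raising the client-wide "Use at most" preference does not change the outcome of this comparison.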

Nothing has been run on ralph in a few weeks but if this is a simple typing error then it could have been caught by running a handful on an in-house computer before adding them to the rosetta queue. Perhaps this type of error doesn't happen frequently enough to warrant adding that step to the existing protocols.

The tasks appear to error out almost immediately so they don't waste much of our time and the bulk of them have probably already made their way through the system. There will be a few stragglers showing up over the next couple of weeks (dependent on users' settings) but not enough to justify trying to preemptively delete the bad workunits.

Best,
Snags
ID: 74222
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 74229 - Posted: 9 Nov 2012, 22:20:46 UTC - in response to Message 74222.  
Last modified: 9 Nov 2012, 22:21:56 UTC

I think Polian is right and this is a problem for the project to solve. According to the BOINC FAQ Service it happens when "the amount of disk space that the task uses exceeds the amount of space specified in the <rsc_disk_bound>n</rsc_disk_bound> amount given to the task."

I think it's time for them to increase that limit: the Rosetta DB takes more than half of it when extracted, and the input file for those WUs needs ~180 MB. Of course it can't work; you don't need big tests to figure that out, just a pocket calculator. But they probably just didn't think about such a simple thing. It can happen.

Milkyway WUs, for example, have a 15 MB limit while the WUs need less than 10 KB. So make it 3 GB and it should be enough for a while.
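The pocket-calculator sum, written out as a sketch. The 286.10 MB limit is the figure from the client log line posted earlier in the thread, the ~180 MB input size is the estimate above, and "more than half" is treated as a lower bound of exactly half the limit; none of these are official project numbers.

```python
def min_footprint_mb(limit_mb: float, input_mb: float) -> float:
    """Lower bound on disk use if the database alone takes half the limit."""
    db_mb_at_least = limit_mb / 2   # "more than half of it", so a lower bound
    return db_mb_at_least + input_mb


limit_mb = 286.10   # per-task limit seen in the client log
input_mb = 180.0    # estimated size of the zdock input data

needed = min_footprint_mb(limit_mb, input_mb)
print(f"{needed:.2f} MB needed at minimum vs {limit_mb:.2f} MB allowed")
print("fits" if needed <= limit_mb else "cannot fit")
```

Even with the most generous reading of the estimates, the minimum is above the limit, so the task has to be aborted before it can do any real work.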
.
ID: 74229
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 74234 - Posted: 10 Nov 2012, 5:45:52 UTC

ID: 74234
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 74238 - Posted: 10 Nov 2012, 12:52:18 UTC

My first zdock crashed apparently before it reached the allowed maximum disc space:

2SIC_zdock_2SIC_cluster_selectcst_c.15.5_SAVE_ALL_OUT_63690_1


.
ID: 74238
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,816,664
RAC: 863
Message 74259 - Posted: 12 Nov 2012, 12:32:08 UTC

More zdock problems:

Several ended quickly with client error/compute error

2PCC_zdock_2PCC_cluster_selectcst_c.1.53_SAVE_ALL_OUT_63659_5
1YVB_zdock_1YVB_cluster_selectcst_c.0.77_SAVE_ALL_OUT_63621_6
1FLE_zdock_1FLE_cluster_selectcst_c.5.12_SAVE_ALL_OUT_63540_7
Ended with exit status -177, maximum disk usage exceeded, a long stderr out and "SIGPIPE: write on a pipe with no reader". My wingman on the second task received exit status 196 on a Windows machine.
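A side note on the odd-looking hex value that accompanied exit status -177 in the earlier reports: it is just the same number shown as an unsigned 64-bit value (two's complement). A minimal sketch, with helper names made up for the example:

```python
def as_unsigned_hex(value: int, bits: int = 64) -> str:
    """Show a signed integer as the unsigned hex of its two's complement."""
    return hex(value & ((1 << bits) - 1))


def as_signed(value: int, bits: int = 64) -> int:
    """Interpret an unsigned value of the given width as a signed integer."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


print(as_unsigned_hex(-177))                    # prints 0xffffffffffffff4f
print(as_signed(int("ffffffffffffff4f", 16)))   # prints -177
```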

1WEJ_zdock_1WEJ_cluster_selectcst_c.7.6_SAVE_ALL_OUT_63679_7 both copies "process exited with code 1" and
ERROR: Cannot open PDB file "1WEJ_ppk_b_start.pdb"
ERROR:: Exit from: src/core/import_pose/import_pose.cc line: 198
BOINC:: Error reading and gzipping output datafile: default.out


Two more ended with validate errors and the odd, presumably tell-tale, 1201 cpu seconds

2ABZ_zdock_2ABZ_cluster_selectcst_c.4.7_SAVE_ALL_OUT_63630_6
2H7V_zdock_2H7V_cluster_selectcst_c.16.0_SAVE_ALL_OUT_63641_5

Best,
Snags




ID: 74259
shilei
Volunteer moderator
Project developer
Project scientist
Joined: 25 Aug 11
Posts: 5
Credit: 1,014,314
RAC: 0
Message 74267 - Posted: 13 Nov 2012, 22:06:05 UTC

Sincere apology for all the Zdock errors. The jobs failed due to one missing file in some of the zip files. I have downsized/withdrawn all the WUs. Sorry for the irresponsible submissions. Thanks for your contribution of WUs. It won't happen again.
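For illustration only, a check of this kind could look like the sketch below, which lists required files that are absent from an input zip. It is not the project's actual submission tooling; the helper name is invented, the required-file list is hypothetical, and `2HQS_other_input.pdb` is a placeholder name (only `2HQS_ppk_b_start.pdb` appears in the error logs in this thread).

```python
import io
import zipfile


def missing_members(zip_file, required):
    """Return the required file names that are not present in the archive."""
    with zipfile.ZipFile(zip_file) as archive:
        present = set(archive.namelist())
    return sorted(name for name in required if name not in present)


# Demo with an in-memory archive that lacks one required file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("2HQS_other_input.pdb", "placeholder")  # hypothetical member
buffer.seek(0)

required = ["2HQS_ppk_b_start.pdb", "2HQS_other_input.pdb"]
print(missing_members(buffer, required))   # prints ['2HQS_ppk_b_start.pdb']
```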
ID: 74267
FredJVerster
Joined: 25 Nov 11
Posts: 4
Credit: 132,655
RAC: 0
Message 74308 - Posted: 14 Nov 2012, 11:09:34 UTC - in response to Message 74267.  
Last modified: 14 Nov 2012, 11:15:33 UTC

Sincere apology for all the Zdock errors. The jobs failed due to one missing file in some of the zip files. I have downsized/withdrawn all the WUs. Sorry for the irresponsible submissions. Thanks for your contribution of WUs. It won't happen again.



I have MiniRosetta 3.43 tasks that show no progress after 4 hours.
I already aborted one, with no change. Should I abort all of them?
They seem to fail without using any CPU.
Knights Who Say Ni N!
ID: 74308
Umfriend
Joined: 22 Jun 11
Posts: 3
Credit: 12,052,815
RAC: 0
Message 74311 - Posted: 14 Nov 2012, 11:21:17 UTC - in response to Message 74308.  
Last modified: 14 Nov 2012, 11:26:41 UTC

I have MiniRosetta 3.43 tasks that show no progress after 4 hours.
I already aborted one, with no change. Should I abort all of them?
They seem to fail without using any CPU.
Same here. The 3.41s are running happily but the 3.43s are not. The graphics window pops up at the start of a WU, but no graphics are shown. On one line it says:
Stage: unknown [TABs TO RIGHT]No shared mem

I'll reset my project once the 3.41s are done and report back.
Umf.

Edit: Also, there is no CPU usage and just a 19 MB memory footprint. It's not doing anything.
ID: 74311
JoeyJoJo
Joined: 20 Jan 11
Posts: 2
Credit: 823,144
RAC: 0
Message 74313 - Posted: 14 Nov 2012, 11:28:35 UTC

Same issue in the Q&A thread https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6124
ID: 74313



©2024 University of Washington
https://www.bakerlab.org