Posts by Brian Priebe

1) Message boards : Number crunching : Constant Timeouts when Unpacking (Message 86713)
Posted 24 Jun 2017 by Brian Priebe
Post:
On one machine here, ever since detaching, all Rosetta WU's have failed with the same recurring error:

Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_d0bf94b.zip
No heartbeat from core client for 30 sec - exiting

Any ideas?
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81480)
Posted 17 Apr 2017 by Brian Priebe
Post:
Still have WU's stuck uploading. However, a different error sometimes shows up now:

17-Apr-2017 18:06:53 | rosetta@home | [error] Error reported by file upload server: [des_DS_160_fragments_fold_SAVE_ALL_OUT_470271_1422_0_0] locked by file_upload_handler PID=4156
17-Apr-2017 18:06:53 | rosetta@home | [error] Error reported by file upload server: [tj_3_6_junc_X_DHR55_DHR55_l3_t2_t3_4_v5c_fragments_abinitio_SAVE_ALL_OUT_474351_416_0_0] locked by file_upload_handler PID=23998
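For anyone wanting to tally these from a saved event log, here is a minimal sketch that pulls the timestamp, file name and handler PID out of lines shaped like the two above. The function name `parse_lock_error` is made up for illustration, and the "timestamp | project | message" layout is assumed from these two lines only, not taken from any BOINC specification.

```python
import re
from datetime import datetime

# Assumed message shape, copied from the two log lines quoted in this post.
LOCK_RE = re.compile(
    r"\[error\] Error reported by file upload server: "
    r"\[(?P<file>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>\d+)"
)

def parse_lock_error(line):
    """Return a dict for a 'locked by file_upload_handler' line, else None."""
    # Assumed layout: "<timestamp> | <project> | <message>"
    parts = line.split(" | ", 2)
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    match = LOCK_RE.search(message)
    if match is None:
        return None
    return {
        # %b expects English month abbreviations in the default C locale.
        "time": datetime.strptime(stamp.strip(), "%d-%b-%Y %H:%M:%S"),
        "project": project.strip(),
        "file": match.group("file"),
        "pid": int(match.group("pid")),
    }
```

Running it over every line of a log and counting the distinct `pid` values would show whether the same handler process keeps holding the lock or a new one does each time.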
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 81432)
Posted 14 Apr 2017 by Brian Priebe
Post:
I have two machines here with the same problem. 10 WU's stuck on uploading. They transmit between 48KB and 55KB then stop dead before going into the retry loop.
4) Message boards : Number crunching : Impossible for me to return results... (Message 78671)
Posted 1 Sep 2015 by Brian Priebe
Post:
...but it is now fixed

This seems to have been quite short-lived. I now get:
01-Sep-2015 16:42:35 | rosetta@home | Scheduler request completed: got 0 new tasks
01-Sep-2015 16:42:35 | rosetta@home | Project is temporarily shut down for maintenance

Also, it does not upload anything. All of the servers are showing disabled except for web pages.
5) Message boards : Number crunching : Cannot retrieve new work (Message 77291)
Posted 6 Aug 2014 by Brian Priebe
Post:
are you still seeing the error? I checked, we still have workunits in queue (so you should be getting some).


My machines are getting new work again.
6) Message boards : Number crunching : Cannot retrieve new work (Message 77256)
Posted 3 Aug 2014 by Brian Priebe
Post:
BOINC event log reports this afternoon:

"03-Aug-2014 15:08:34 | rosetta@home | Server can't open database".
7) Message boards : Number crunching : Minirosetta 3.46 (Message 75541)
Posted 29 Apr 2013 by Brian Priebe
Post:
The best I can tell, only the cryo workunits need the new features of 3.46.

Yet on BOINC 7.0.28 under Windows 7 (64-bit), all work units are delivered to run under Mini Rosetta app 3.46. Such is not the case under BOINC 6.12.33 running under Windows 2003 Server (32-bit): they all still run app 3.45. And of course the "cryo" series are still bombing out under 3.45.
8) Message boards : Number crunching : Minirosetta 3.46 (Message 75536)
Posted 29 Apr 2013 by Brian Priebe
Post:
minirosetta is updated to 3.46 to include recent developments in electron density and other scoring functions.
Are there any particular minimum requirements for this version? Despite resetting the project, I am still getting only 3.45 jobs on a 6.x version of BOINC.
9) Message boards : Number crunching : Minirosetta 3.46 (Message 75519)
Posted 28 Apr 2013 by Brian Priebe
Post:
rb_04_26_38593_73094__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_79295_1489_0 (task ID 578191167) died with exit status -1 in less than 12sec using the new code.
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 75497)
Posted 27 Apr 2013 by Brian Priebe
Post:
I just updated the code and restarted the jobs.
Thank you for trying to fix this...
11) Message boards : Number crunching : Mini Rosetta 3.45 (Message 75478)
Posted 25 Apr 2013 by Brian Priebe
Post:
Hi.

I'm seeing a few of these error out with the same problem.

CASP9_fc_benchmark_hybridization_run55_T0542_0_D4_SAVE_ALL_OUT_IGNORE_THE_REST_48152_899_0


CASP9_fc_benchmark_hybridization_run55_T0547_0_C2_SAVE_ALL_OUT_IGNORE_THE_REST_48158_899_0


ERROR: error in process_residue_request: 'com'
ERROR:: Exit from: src/core/conformation/symmetry/util.cc line: 93
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

For me too, 90% of the CASP9 series bomb on this same error.
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 75475)
Posted 25 Apr 2013 by Brian Priebe
Post:
Saying 'sorry' doesn't mean a darned thing IF NOTHING CHANGES!!!

I have to second this sentiment. About 35% of all work units I've returned in the last week have been aborted due to 'out of memory' errors. If this appalling record doesn't soon change, ROSETTA is history for me.
13) Message boards : Number crunching : minirosetta 2.16 (Message 68014)
Posted 10 Oct 2010 by Brian Priebe
Post:
Same error on WU 338667991 (http://boinc.bakerlab.org/rosetta/result.php?resultid=370646341). Wingman also had the same error.

Snippet from log:

Incorrect function. (0x1) - exit code 1 (0x1)
...
ERROR: bad line in file minirosetta_database\scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: ..\..\src\core\scoring\ScoreFunction.cc line: 204
14) Message boards : Number crunching : minirosetta 2.15 (Message 67997)
Posted 9 Oct 2010 by Brian Priebe
Post:
2.15 is the most problematic and buggy version of all that I've seen

Out of my most recent 100 WU's, 22% of them blew up on one or another of the errors already posted here. The "Unusual Termination" dialog box from MSVC seems to be becoming more frequent.
15) Message boards : Number crunching : minirosetta 2.15 (Message 67936)
Posted 2 Oct 2010 by Brian Priebe
Post:
They quickly rise to over 1.5 GB of memory use, at which point they are shut down and wait for memory to free up.
If you haven't already done so, you should review the BOINC memory limits set up in Advanced->Preferences->Disk and Memory Usage. A 4GB machine could be adequate to (barely) run two 1.5GB WU's at once. On a 2GB machine, it might be impossible though.
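The arithmetic behind that can be sketched roughly as follows. This is only an illustration: `max_concurrent_tasks` is a hypothetical helper, the 90% and 50% figures are assumed example values for the "use at most X% of memory" preferences (check what your own client is set to), and it ignores whatever the operating system and other programs need.

```python
import math

def max_concurrent_tasks(total_ram_gb, task_gb, mem_fraction):
    """How many tasks of task_gb fit in the share of RAM BOINC may use."""
    usable_gb = total_ram_gb * mem_fraction  # memory BOINC is allowed to take
    return math.floor(usable_gb / task_gb)

# Assumed example limits: 90% when the computer is idle, 50% when in use.
for ram_gb in (4, 2):
    for fraction in (0.9, 0.5):
        count = max_concurrent_tasks(ram_gb, 1.5, fraction)
        print(f"{ram_gb} GB RAM at {int(fraction * 100)}%: {count} x 1.5 GB WU")
```

Under those assumed limits a 4GB machine fits two 1.5GB WU's only at the 90% setting, and a 2GB machine never fits more than one.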
16) Message boards : Number crunching : minirosetta 2.15 (Message 67858)
Posted 29 Sep 2010 by Brian Priebe
Post:
I too am seeing an unusually high number of errors on 3 different machines (and 3 different operating systems) for Rosetta 2.15. 16 WU's in the last few days failed on various errors:

"The system cannot find the path specified. (0x3) - exit code 3 (0x3)"

"Reason: Access Violation (0xc0000005) at address 0x00581B5C write attempt to address 0x00000024"

"Incorrect function. (0x1) - exit code 1 (0x1)" (many different root causes per detailed error messages in the log. <ERROR: Error in traceback: pointer doesn't go anywhere!> occurred multiple times.)

"Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x759AB727"

©2024 University of Washington
https://www.bakerlab.org