Minirosetta 1.97

Message boards : Number crunching : Minirosetta 1.97

Michael Hoffmann
Joined: 5 Jun 08
Posts: 9
Credit: 1,307,108
RAC: 0
Message 63459 - Posted: 26 Sep 2009, 13:44:11 UTC

Had a validate error again (https://boinc.bakerlab.org/rosetta/result.php?resultid=283360235), though the WU ran through smoothly.
ID: 63459 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,822,907
RAC: 1,301
Message 63464 - Posted: 26 Sep 2009, 23:51:25 UTC

I'm seeing several workunits with names like histone_loopbuild_run1_* (sample 283470540) fail with a validate error after about 20 minutes on Mac OS X 10.6.1, but there's nothing in the log to hint at the problem.

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 2 starting structures 1201 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>



ID: 63464 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63466 - Posted: 27 Sep 2009, 4:41:38 UTC

I've had a few tasks bail out early with the new app when they could have run for hours and done more models. This one ran for only 32 minutes.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=258477161

histone_loopbuild_run1_14925_27036_0

# cpu_run_time_pref: 14400
======================================================
DONE :: 2 starting structures 1949.47 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================

Got credit by the way.

ID: 63466 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
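As a rough check of the figures quoted in the post above, the sketch below converts the reported CPU time to minutes and compares it with the `cpu_run_time_pref` value from the same log. The regular expressions and the `summarize_run` helper are illustrative assumptions, not part of the Rosetta or BOINC tooling.

```python
import re

# Excerpt copied from the stderr output quoted in the post above.
LOG = """\
# cpu_run_time_pref: 14400
DONE :: 2 starting structures 1949.47 cpu seconds
This process generated 2 decoys from 2 attempts
"""


def summarize_run(log: str) -> dict:
    """Hypothetical helper: pull the runtime preference and the CPU time used."""
    pref = int(re.search(r"cpu_run_time_pref:\s*(\d+)", log).group(1))
    used = float(re.search(r"([\d.]+) cpu seconds", log).group(1))
    return {
        "pref_seconds": pref,
        "used_seconds": used,
        "used_minutes": used / 60,
        "fraction_of_pref": used / pref,
    }


summary = summarize_run(LOG)
print(
    f"{summary['used_minutes']:.1f} min used, "
    f"{summary['fraction_of_pref']:.1%} of the "
    f"{summary['pref_seconds'] / 3600:.0f} h preference"
)
```

The task used about 32.5 minutes of CPU time, roughly 13.5% of the 4-hour preference, which agrees with the "32min" reported in the post.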
Gen_X_Accord
Joined: 5 Jun 06
Posts: 154
Credit: 279,018
RAC: 0
Message 63468 - Posted: 27 Sep 2009, 5:00:59 UTC

Seems like these "histone" ones are real trouble.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5081
ID: 63468 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
robertmiles
Joined: 16 Jun 08
Posts: 1226
Credit: 13,917,849
RAC: 2,833
Message 63469 - Posted: 27 Sep 2009, 5:05:45 UTC - in response to Message 63018.  

Too many tasks are stalling. I'm switching my resources to World Community Grid for a while, hoping this will be fixed.

Is it my imagination or are new versions of Rosetta@home getting less testing on Ralph before being released here? Maybe that would help to find the stability problems...


The last I looked, Ralph@home was still testing 1.95 and hadn't started 1.96 or 1.97. Typical of several versions lately. I've decided to stop Rosetta@home and Ralph@home participation on my computer with the least memory per processor until the memory leak problem is fixed, and also put off even starting them on my new laptop, but continue them on a third computer.

I've noticed one item that might have something to do with the memory leak: Windows Task Manager reports 485596 handles on the computer where I've participated the longest, 27185 on the second computer where I started participating, and 21283 on my new laptop. The Vista help file says very little about handles. If you search long enough, you'll eventually find a statement that ordinary users don't need to know what they are and should talk to a programmer or administrator about any problems with them. It does not offer any way to determine how many are attached to which program, or even say whether they are normally attached to programs. Ignore the statement that the proper name for them is object handles, since that appears to be the only time the help file even mentions object handles if you haven't installed any software with more specific information about them.
ID: 63469 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63474 - Posted: 27 Sep 2009, 6:58:17 UTC

Hi Robert.

The way I understand it, when mini 1.95 was released here there was a bit of a foul-up with some of the files the server sent out to us. The version number was then changed to 1.96, which caused some other problems, so it was changed again to 1.97 to go with all the new files. So the app here is really the same as the 1.95 that is on Ralph; only the version number is different. (Don't quote me ;)

Hope that helps.

ID: 63474 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Gen_X_Accord
Joined: 5 Jun 06
Posts: 154
Credit: 279,018
RAC: 0
Message 63480 - Posted: 27 Sep 2009, 11:46:07 UTC

I have had another "histone" work unit error out on me. I'm just going to abort any of them that get sent my way. I don't feel like wasting time running junk work units.
ID: 63480 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
bill Johnson@GMU
Joined: 5 Aug 09
Posts: 5
Credit: 1,356,008
RAC: 0
Message 63481 - Posted: 27 Sep 2009, 12:29:54 UTC - in response to Message 63480.  

I have had another "histone" work unit error out on me. I'm just going to abort any of them that get sent my way. I don't feel like wasting time running junk work units.


I’m doing the same thing. The histone units have been getting to around 4% done and then just stopping there. I can’t view any of the graphics for these work units, and even though their original work time is set at 4 hours, I had two that had spent 10 hours and were still only at 4%.
ID: 63481 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
TJ
Joined: 29 Mar 09
Posts: 127
Credit: 4,799,890
RAC: 0
Message 63497 - Posted: 28 Sep 2009, 14:33:23 UTC

At Mod.Sense's request,
This is one of 3 errors I got yesterday.

283639845 258715590 26 Sep 2009 22:00:57 UTC 27 Sep 2009 8:19:32 UTC Over Validate error Done 1,157.33 --- ---

The wingman also has a validate error for this.
Sorry I don't know how to make it "clickable".

Greetings,
TJ.
ID: 63497 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
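On the "clickable" question: the task and work unit links posted elsewhere in this thread follow two URL patterns, `result.php?resultid=...` and `workunit.php?wuid=...`. The sketch below builds both links from the IDs in the row above. It assumes the first number in that row is the task (result) ID and the second is the work unit ID, and that the forum accepts BBCode `[url]` tags; the helper names are hypothetical.

```python
# Base URL taken from the links already posted in this thread.
BASE = "https://boinc.bakerlab.org/rosetta"


def result_url(result_id: int) -> str:
    """Link to a single task (result) page."""
    return f"{BASE}/result.php?resultid={result_id}"


def workunit_url(wu_id: int) -> str:
    """Link to a work unit page, which lists every task sent for it."""
    return f"{BASE}/workunit.php?wuid={wu_id}"


def bbcode_link(url: str, label: str) -> str:
    """Wrap a URL in a BBCode [url] tag (assumed to be supported by the forum)."""
    return f"[url={url}]{label}[/url]"


# IDs copied from the row above (assumed order: task ID, work unit ID).
print(result_url(283639845))
print(workunit_url(258715590))
print(bbcode_link(workunit_url(258715590), "WU 258715590"))
```

The work unit link is usually the more useful one to post, since it shows the wingman's result alongside your own.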
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,958,559
RAC: 23,957
Message 63499 - Posted: 28 Sep 2009, 15:15:31 UTC - in response to Message 63497.  
Last modified: 28 Sep 2009, 15:20:58 UTC

At Mod.Sense request,
This is one of 3 errors I got yesterday.

283639845 258715590 26 Sep 2009 22:00:57 UTC 27 Sep 2009 8:19:32 UTC Over Validate error Done 1,157.33 --- ---

The wingman has also a Validate error for this.
Sorry I don't know how to make it "clickable".

This is histone_loopbuild_run1_14925_69353_1
Running on an Intel i7 920 Vista Ultimate 64 SP2

As stated, the wingman also produced an error on a Phenom II 955 running Windows 7 64 bit, though both were given credit for their short runtimes and 2 decoys processed.

There does seem to be something odd about this kind of job, but people aren't actually being penalised for it. It's more a problem for the project to work out.
ID: 63499 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63500 - Posted: 28 Sep 2009, 15:21:50 UTC

Sid, thanks for linking, and while you were editing, I was moving ;)
Rosetta Moderator: Mod.Sense
ID: 63500 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,958,559
RAC: 23,957
Message 63501 - Posted: 28 Sep 2009, 15:21:55 UTC
Last modified: 28 Sep 2009, 15:23:38 UTC

^ I know - spooky! Anyway...

I've come up with 2 errors in the last week, both coming from the same type of job.

sel_core_5.0_low50_beta_low200_start_hb_t374__IGNORE_THE_REST_14879_226_1
sel_core_5.0_low50_beta_low200_start_hb_t374__IGNORE_THE_REST_14879_799_1
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005A7230 read attempt to address 0x00000000

ID: 63501 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
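The exception line in the post above packs several facts into one string: the Windows status code (0xc0000005 is the access-violation status), the address of the faulting instruction, and the address the program tried to read. A read from 0x00000000 is the signature of a null pointer dereference. The sketch below pulls those fields apart; the regular expression is written against this one line only, and `parse_exception` is a hypothetical helper, not a BOINC or Rosetta function.

```python
import re

# Line copied from the post above.
LINE = (
    "Reason: Access Violation (0xc0000005) at address 0x005A7230 "
    "read attempt to address 0x00000000"
)

PATTERN = re.compile(
    r"Reason: (?P<name>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<pc>0x[0-9a-fA-F]+) "
    r"(?P<access>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_exception(line: str):
    """Return the fields of an 'Unhandled Exception Record' reason line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    target = int(match.group("target"), 16)
    return {
        "name": match.group("name"),
        "code": int(match.group("code"), 16),
        "pc": int(match.group("pc"), 16),        # address of the faulting instruction
        "access": match.group("access"),
        "target": target,                         # address the program tried to touch
        "null_dereference": target == 0,
    }


info = parse_exception(LINE)
print(info["name"], hex(info["pc"]), "null dereference:", info["null_dereference"])
```

Grouping failed tasks by the `pc` field is one way to tell whether different work units are crashing at the same place in the application.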
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63509 - Posted: 29 Sep 2009, 0:54:30 UTC

Just returned this one. It finished short of my runtime and, oddly, produced no models.

I've trimmed the text from the result.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=258992201

symm_lr8_seq_score12_ss_1.7_rlbd_1t2i_IGNORE_THE_REST_DECOY_14923_1529_0

Continuing computation from checkpoint: chk_NoTag_SequenceRelax__chk46_fa ... success!
ERROR: Could not find disulfide partner for residue 7
ERROR:: Exit from: src/core/scoring/disulfides/FullatomDisulfideEnergyContainer.cc line: 562
called boinc_finish

</stderr_txt>
]]>

Validate state Valid
Claimed credit 20.3091706964679
Granted credit 17.1487999873651
application version 1.97

ID: 63509 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 63522 - Posted: 29 Sep 2009, 21:25:55 UTC

I've had a number of "frb" WUs run out of disk space with multi-gigabyte stderr.txt files. Those files were full of "bounds error" statements.

example:
https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008
ID: 63522 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 63523 - Posted: 29 Sep 2009, 21:45:35 UTC
Last modified: 29 Sep 2009, 21:51:20 UTC

Why are tasks that failed on Ralph being let loose here? It's the same type of error as seen over there.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259185621

Tue 29 Sep 2009 19:31:59 EST|rosetta@home|Aborting task frb_0_8_mike_chosen_cst_oct09_hb_t313__IGNORE_THE_REST_1I5SA_2_14958_15_1: exceeded disk limit: 301.57MB > 286.10MB

EDIT: Just to add, I happened to see it running, and it was writing 1.3 MB to disk every 5 seconds.
ID: 63523 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
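The two figures in the post above (a 286.10 MB limit and 1.3 MB written every 5 seconds) are enough for a back-of-the-envelope estimate of how long such a task can run before it is aborted. The sketch assumes the write rate was constant from the start of the task and that "MB" in the client message means 2^20 bytes; neither is confirmed in the thread.

```python
# Figures copied from the post above.
LIMIT_MB = 286.10      # disk limit reported in the abort message
USED_MB = 301.57       # usage at the moment of the abort
WRITE_MB = 1.3         # observed amount written per interval
INTERVAL_S = 5         # observed interval in seconds

rate_mb_per_s = WRITE_MB / INTERVAL_S
seconds_to_limit = LIMIT_MB / rate_mb_per_s

# Assumption: MB here means 2**20 bytes. Under that reading the limit is
# very close to a round 300,000,000 bytes.
limit_bytes = LIMIT_MB * 2**20

print(f"write rate: {rate_mb_per_s:.2f} MB/s")
print(f"time to reach the limit: {seconds_to_limit:.0f} s "
      f"(about {seconds_to_limit / 60:.1f} min)")
print(f"limit in bytes: {limit_bytes:,.0f}")
print(f"overshoot at abort: {USED_MB - LIMIT_MB:.2f} MB")
```

At that rate the limit is reached in roughly 18 minutes, so raising the disk allowance would only postpone the abort as long as the task keeps writing error output.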
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,958,559
RAC: 23,957
Message 63524 - Posted: 29 Sep 2009, 22:59:40 UTC - in response to Message 63522.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008

I'm sure it's unrelated seeing as the wingman errored out with a more recent version (plus it's none of my business really) but why are you running Boinc 5.2.13? That can't be good, can it?
ID: 63524 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
288VKYUjwsXfAaTXn6SFJC4LVPRf
Joined: 16 Dec 05
Posts: 31
Credit: 153,110
RAC: 0
Message 63531 - Posted: 30 Sep 2009, 21:03:03 UTC - in response to Message 63524.  
Last modified: 30 Sep 2009, 21:04:34 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=284162008

I'm sure it's unrelated seeing as the wingman errored out with a more recent version (plus it's none of my business really) but why are you running Boinc 5.2.13? That can't be good, can it?


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259177853
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=259177853

Two WUs with the same problem, both frb_. Should I allow more disk space for those WUs?
ID: 63531 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,958,559
RAC: 23,957
Message 63614 - Posted: 5 Oct 2009, 16:32:11 UTC

Over the last week, these two errored out with
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffff4f)

<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
]]>

frb_0_8_mike_chosen_cst_oct09_hb_t290__IGNORE_THE_REST_1XO7A_3_14951_1_0
frb_0_8_mike_chosen_cst_oct09_hb_t293__IGNORE_THE_REST_1NV8A_16_14952_18_0

While the following three errored out with
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005A7230 read attempt to address 0x00000000

frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1TIQA_7_14969_28_0
frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1Y9KA_7_14969_28_0
frb_0_8_mike_chosen_cst_oct09_hb_t374__IGNORE_THE_REST_1Z4EA_4_14969_28_0
ID: 63614 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
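The exit status in the post above is shown twice, as -177 and as 0xffffff4f. These are the same 32-bit value read as a signed and as an unsigned integer. In the BOINC source, -177 is defined as ERR_RSC_LIMIT_EXCEEDED, which is consistent with the "Maximum disk usage exceeded" message. A minimal sketch of the conversion, with hypothetical helper names:

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-177)))   # the hex form shown in the task report
print(to_signed32(0xFFFFFF4F))    # the decimal form shown in the task report
```

The same conversion applies to any other negative BOINC exit status that is reported alongside its hexadecimal form.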
Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,958,559
RAC: 23,957
Message 63646 - Posted: 11 Oct 2009, 18:10:23 UTC

Over the last week I had no errors at all out of 150 WUs. Well done guys.
ID: 63646 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
banditwolf
Joined: 10 Jan 06
Posts: 28
Credit: 139,737
RAC: 0
Message 63653 - Posted: 13 Oct 2009, 2:17:14 UTC

Had this message:
10/12/2009 12:49:02 PM|rosetta@home|Task abinitio_withrelax_homfrag_129_B_1ctf__SAVE_ALL_OUT_15148_499_0 exited with a DLL initialization error.

It is still running, should finish in the next hour.
ID: 63653 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org