Minirosetta 2.00

Message boards : Number crunching : Minirosetta 2.00

Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,935,725
RAC: 9,966
Message 64216 - Posted: 25 Nov 2009, 15:48:04 UTC

Can I be clear on something:

The problems are with WUs with the name:

lr8_combine_smooth_torsion_it00_rama*

New WUs are coming down with the name:

lr5_combine_smooth_torsion_it00_redo*

Are these new ones ok? I think I aborted one by accident. That was wrong, wasn't it?
ID: 64216 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 64219 - Posted: 25 Nov 2009, 16:46:52 UTC - in response to Message 64216.  
Last modified: 25 Nov 2009, 16:49:25 UTC

Can I be clear on something:

The problems are with WUs with the name:

lr8_combine_smooth_torsion_it00_rama*

New WUs are coming down with the name:

lr5_combine_smooth_torsion_it00_redo*

Are these new ones ok? I think I aborted one by accident. That was wrong, wasn't it?


I've received no specific word, but it sounds very likely, yes. No biggie.

[edit]Yifan posted here confirming the rama batch had a problem in how it was created.
Rosetta Moderator: Mod.Sense
ID: 64219 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,935,725
RAC: 9,966
Message 64223 - Posted: 25 Nov 2009, 19:04:21 UTC

That's what I eventually realised before going into an abort-frenzy.

Ok, I think I'm clear and fully re-stocked now.
ID: 64223 · Rating: 0
bruce
Joined: 15 Sep 07
Posts: 10
Credit: 839,797
RAC: 0
Message 64228 - Posted: 26 Nov 2009, 0:11:52 UTC

I'm also seeing a considerable number of WUs with errors similar to those posted by others recently.

Here is an example of the messages on the client:
11/25/2009 1:10:59 PM rosetta@home Starting sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1
11/25/2009 1:11:00 PM rosetta@home Starting task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 using minirosetta version 200
11/25/2009 1:12:43 PM rosetta@home Computation for task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 finished
11/25/2009 1:12:43 PM rosetta@home Output file sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1_0 for task sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1 absent

Here are some examples from the results:
299750820
299623182
299623178
299623176
299623173
299623157
299621562
299618069
299540989
299540964
299524862
299522937
299509005
299508999
... and 11 more WUs downloaded today (Nov 25) errored with similar results.
24 downloaded yesterday (Nov 24) errored with similar results.

I'll monitor these boards for updates; until then I've suspended further WU downloads.


ID: 64228 · Rating: 0
darkpella
Joined: 27 Sep 05
Posts: 13
Credit: 66,840
RAC: 0
Message 64232 - Posted: 26 Nov 2009, 8:02:16 UTC - in response to Message 64228.  

I'm also seeing a considerable number of WUs with errors similar to those posted by others recently.

.....


Similar here with the following WUs:
299885240
299643164
299547442
298811740

stderr is slightly different though. stderr from my WUs is like:
<core_client_version>6.6.38</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009-11-24 9:22: 6:] :: BOINC:: Initializing ... ok.
[2009-11-24 9:22: 6:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev33769.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/yfsong_lr8_combine_smooth_torsion_it00_rama06_A.zip
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr8_1shf.out.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Fullatom mode ..
ERROR: Value of inactive option accessed: -score:dun08_dir


</stderr_txt>
]]>


while the one from some of bruce's WUs is like:
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2009-11-25 13:11: 0:] :: BOINC:: Initializing ... ok.
[2009-11-25 13:11: 0:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev33769.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/sel_core_1.0_low200_beta_low200_nostart.broker_corebuild.t313_.olange.boinc_files.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.

ERROR: res1 != res2
ERROR:: Exit from: ..\..\src\core\kinematics\FoldTree.cc line: 2342
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


</stderr_txt>
]]>
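
To compare the two stderr dumps above without reading the whole startup log, a minimal sketch that keeps only the lines starting with ERROR. The function name is hypothetical, and it assumes the stderr text has already been copied out of the result page as a plain string.

```python
def error_lines(stderr_txt):
    """Return the stripped lines of a stderr dump that begin with 'ERROR'."""
    return [
        line.strip()
        for line in stderr_txt.splitlines()
        if line.lstrip().startswith("ERROR")
    ]


# Tail of the first dump quoted above (the "rama" task).
rama_tail = """Watchdog active.
Fullatom mode ..
ERROR: Value of inactive option accessed: -score:dun08_dir
"""

# Tail of the second dump quoted above (the sel_core task), path line omitted.
sel_core_tail = """Watchdog active.

ERROR: res1 != res2
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
"""

print(error_lines(rama_tail))
print(error_lines(sel_core_tail))
```

Run on the two tails, this separates the "inactive option" failure from the "res1 != res2" failure, which is the difference being pointed out in this post.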

ID: 64232 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 64245 - Posted: 27 Nov 2009, 17:10:16 UTC
Last modified: 27 Nov 2009, 17:22:19 UTC

darkpella, all four of the tasks you linked are the known problem described here with "...rama..." in the name. These tasks were later corrected and reissued with "...redo..." in the name.
Rosetta Moderator: Mod.Sense
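
For anyone sorting through a long task list, a rough sketch of the rule described above: names containing the "rama" batch tag are the known-bad ones, and names containing the "redo" tag are the corrected reissues. The wildcard patterns and the labels are assumptions built from the names quoted in this thread, not an official list from the project.

```python
from fnmatch import fnmatchcase

# Patterns assumed from the WU names quoted in this thread.
KNOWN_BAD = "*combine_smooth_torsion_it00_rama*"
REISSUED = "*combine_smooth_torsion_it00_redo*"


def classify(wu_name):
    """Label a work unit name as 'known-bad', 'reissued' or 'other'."""
    if fnmatchcase(wu_name, KNOWN_BAD):
        return "known-bad"
    if fnmatchcase(wu_name, REISSUED):
        return "reissued"
    return "other"


# The first two names are made-up examples that follow the quoted prefixes.
for name in (
    "lr8_combine_smooth_torsion_it00_rama06_A_example_0",
    "lr5_combine_smooth_torsion_it00_redo_example_0",
    "sel_core_1.0_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16216_654_1",
):
    print(name, "->", classify(name))
```

Only tasks labelled "known-bad" would be candidates for aborting; the reissued ones should be left to run.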
ID: 64245 · Rating: 0
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 2
Message 64310 - Posted: 30 Nov 2009, 17:24:20 UTC

Two tasks failing on Windows 7

300429025 sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_1407_1
300429024 resa_sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16299_98_1

both with the error

ERROR: res1 != res2
ERROR:: Exit from: ..\..\src\core\kinematics\FoldTree.cc line: 2342
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
ID: 64310 · Rating: 0
Link
Joined: 4 May 07
Posts: 346
Credit: 382,349
RAC: 0
Message 64311 - Posted: 30 Nov 2009, 17:25:53 UTC
Last modified: 30 Nov 2009, 17:38:35 UTC

Recently I've been getting many -1073741819 (0xc0000005) errors:

300067343
300945493 WU: 273907970. My wingman got same error.
301117881
301138967
.
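
The two numbers in the error above are the same value: -1073741819 is the exit code read as a signed 32-bit integer, and 0xc0000005 is the same bit pattern read as unsigned. On Windows, 0xC0000005 is the access violation status code. A small sketch of the conversion (the function name is hypothetical):

```python
def to_unsigned_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))  # 0xc0000005
```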
ID: 64311 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,935,725
RAC: 9,966
Message 64323 - Posted: 1 Dec 2009, 15:28:57 UTC - in response to Message 64310.  

Two tasks failing on Windows 7

300429025 sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_1407_1
300429024 resa_sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_16299_98_1

both with the error

ERROR: res1 != res2
ERROR:: Exit from: ..\..\src\core\kinematics\FoldTree.cc line: 2342
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

My W7 laptop is error-free, but my Vista desktop had a few of the same errors:

sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_1075_0
sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_2182_1
sel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_9716_1

All other WUs are fine.
ID: 64323 · Rating: 0
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 2
Message 64345 - Posted: 3 Dec 2009, 0:58:45 UTC

Some more failures on Win 7, all with the res1 != res2 error

resa_sel_core_1.5_low200_beta_low200_nostart_hb_t331__IGNORE_THE_REST_16303_113_1
rsel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_7931_1
rsel_core_1.5_low200_beta_low200_nostart_hb_t313__IGNORE_THE_REST_15870_8110_1

and this one

sel_core_1.5_low200_beta_low200_nostart_hb_t297__IGNORE_THE_REST_15865_7946_0

which gave the same res1 != res2 error but ran for half an hour and returned an error status of success.

Again, it seems it's those tasks with t331 and t297 in their names that are causing problems.
ID: 64345 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 64368 - Posted: 4 Dec 2009, 2:57:19 UTC

Mod Sense, Here you go.

These are the only ones still in my list.

Credit was about normal, mostly get less than claimed anyway.

I don't think there were any double headers, as you call them; some may have restarted.
===============================================================
This one did 135 models. - CC_101.22 / GC_83.32

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=275075567
---------------------------------------------------------------
This did 112. - CC_101.69 / GC_81.83

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=274576093
---------------------------------------------------------------
This did 153. - CC_102.90 / GC_86.03

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=273886738
---------------------------------------------------------------
This did 116. - CC_103.40 / GC_85.25

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=273350890

All with mini 2.00; I've had some with older versions too.
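
To put numbers on "mostly get less than claimed", a quick sketch that works out the granted-to-claimed ratio and the granted credit per model for the four results listed above. It assumes CC means claimed credit and GC means granted credit; the variable and function names are made up for the example.

```python
# (models, claimed credit, granted credit) for the four results listed above,
# assuming CC = claimed credit and GC = granted credit.
RESULTS = [
    (135, 101.22, 83.32),
    (112, 101.69, 81.83),
    (153, 102.90, 86.03),
    (116, 103.40, 85.25),
]


def credit_stats(models, claimed, granted):
    """Return (granted / claimed, granted credit per model)."""
    return granted / claimed, granted / models


for models, claimed, granted in RESULTS:
    ratio, per_model = credit_stats(models, claimed, granted)
    print(f"{models} models: granted/claimed = {ratio:.2f}, "
          f"granted per model = {per_model:.2f}")
```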

ID: 64368 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1222
Credit: 13,719,365
RAC: 2,784
Message 64376 - Posted: 4 Dec 2009, 19:15:56 UTC

ID: 64376 · Rating: 0
aguiar@carrier.com.br
Joined: 19 Feb 06
Posts: 6
Credit: 367,089
RAC: 0
Message 64392 - Posted: 7 Dec 2009, 9:20:48 UTC

Good morning!

I have WU 3gbm_3g0l_0264_revert.php_dock_rmsd.xml__16270_181_1 now elapsed 13:25:10 with 0.789% progress. Should I let it go or delete it?

Thanks,

Valter Aguiar
Brazil.
ID: 64392 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,935,725
RAC: 9,966
Message 64395 - Posted: 7 Dec 2009, 13:41:47 UTC - in response to Message 64392.  

I have WU 3gbm_3g0l_0264_revert.php_dock_rmsd.xml__16270_181_1 now elapsed 13:25:10 with 0.789% progress. Should I let it go or delete it?

With a 3-hour default runtime the watchdog ought to have closed it down already, but if you click properties on that WU I would expect the CPU time is minimal, so something seems to have stalled with that one. I'd abort it and hope the next person that picks it up has more success with it.
ID: 64395 · Rating: 0
aguiar@carrier.com.br
Joined: 19 Feb 06
Posts: 6
Credit: 367,089
RAC: 0
Message 64396 - Posted: 7 Dec 2009, 14:19:21 UTC

Done, thanks. You were right, only 3 min of CPU time.

Valter.
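
The check used here (a long elapsed time against almost no CPU time) can be written down as a rough sketch. The 5% threshold is an arbitrary assumption for illustration, not a BOINC or Rosetta setting, and the function names are hypothetical.

```python
def hms_to_seconds(hms):
    """Convert an 'H:MM:SS' elapsed-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def looks_stalled(elapsed_s, cpu_s, min_ratio=0.05):
    """Flag a task whose CPU time is a tiny fraction of its elapsed time.

    min_ratio is an assumed cut-off chosen for this example only.
    """
    return elapsed_s > 0 and (cpu_s / elapsed_s) < min_ratio


elapsed = hms_to_seconds("13:25:10")   # elapsed time reported for the task
cpu = 3 * 60                           # roughly 3 minutes of CPU time
print(elapsed, looks_stalled(elapsed, cpu))
```

A task that fails this check may still be fine if the machine was busy with higher-priority work, so it is a prompt to look closer rather than a reason to abort by itself.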
ID: 64396 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 64397 - Posted: 7 Dec 2009, 14:42:16 UTC
Last modified: 7 Dec 2009, 14:47:50 UTC

...and so it becomes a question of whether your machine has something else going on at a higher priority that is causing BOINC not to get any CPU time? Or is there a problem with BOINC or the task?

All other things being equal, starting a new task would also be impacted by other activity on the system (assuming the other activity is still running). Is your next task running normally? (i.e. check properties or task manager and see how many actual CPU seconds it has now used).

[edit] I don't see this task in your results and off-hand, the naming doesn't look like a Rosetta task. Can you post a link?
Rosetta Moderator: Mod.Sense
ID: 64397 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1958
Credit: 37,935,725
RAC: 9,966
Message 64399 - Posted: 7 Dec 2009, 17:46:13 UTC - in response to Message 64397.  

...and so it becomes a question of whether your machine has something else going on at a higher priority that is causing BOINC not to get any CPU time? Or is there a problem with BOINC or the task?

All other things being equal, starting a new task would also be impacted by other activity on the system (assuming the other activity is still running). Is your next task running normally? (i.e. check properties or task manager and see how many actual CPU seconds it has now used).

[edit] I don't see this task in your results and off-hand, the naming doesn't look like a Rosetta task. Can you post a link?

It appears to be this one:
3gbm_3g0l_0264_revert.pdb_dock_rmsd.xml__16270_181_1

I've seen this kind of thing very occasionally, even while other WUs appear to be running fine. In this case Valter appears to have been the wingman where the original cruncher failed as well.
ID: 64399 · Rating: 0
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 2
Message 64404 - Posted: 8 Dec 2009, 3:15:26 UTC

I've had a couple of tasks with names like 3a9bB* fail on Windows 7. In both cases I had to abort them as no progress was being made, even though they weren't getting any CPU time. My wingman in both cases successfully completed the tasks, one on Mac OS X and the other on Win XP. The first one's reported above; the second is 271436170.
ID: 64404 · Rating: 0
AdeB
Joined: 12 Dec 06
Posts: 45
Credit: 4,428,086
RAC: 0
Message 64406 - Posted: 8 Dec 2009, 10:10:47 UTC

Validate errors in workunits with the name: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_*

- 1. ----------------------------------------------------------
Task: 303144429
Workunit: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_936_0
CPU time: 85.64598
stderr out:
...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Fullatom mode ..
# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

- 2. ----------------------------------------------------------
Task: 302775198
Workunit: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_508_1
CPU time: 75.6415
stderr out:
...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Fullatom mode ..
# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish


AdeB
ID: 64406 · Rating: 0
Yifan Song
Volunteer moderator
Project developer
Project scientist
Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 64414 - Posted: 9 Dec 2009, 0:23:35 UTC - in response to Message 64406.  

Validate errors in workunits with the name: mix_score13_hb_rlbd_1ttz__IGNORE_THE_RESTlr13_DECOY_16324_*
...
AdeB


Thanks!
There was a bug when we combined lr5, 8, 10 and 13 to make a large test. As a result, a few of the lr13 ones ended up with an input file that was too small and ran too fast for the validation server.
This should be fixed soon.
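
In the meantime, a rough sketch for spotting these short results in your own task list: it pulls the summary numbers out of the "DONE" block in stderr, using the line formats quoted in AdeB's post. The regular expressions and the function name are assumptions for this example; nothing here reflects how the validation server actually decides.

```python
import re

# Line formats assumed from the stderr excerpts quoted in this thread.
DONE_RE = re.compile(r"DONE :: (\d+) starting structures\s+(\d+) cpu seconds")
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def parse_summary(stderr_txt):
    """Return the summary numbers from a stderr dump, or None if absent."""
    done = DONE_RE.search(stderr_txt)
    decoys = DECOY_RE.search(stderr_txt)
    if done is None or decoys is None:
        return None
    return {
        "starting_structures": int(done.group(1)),
        "cpu_seconds": int(done.group(2)),
        "decoys": int(decoys.group(1)),
        "attempts": int(decoys.group(2)),
    }


sample = """# cpu_run_time_pref: 43200
======================================================
DONE :: 1 starting structures 1201 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
"""

print(parse_summary(sample))
```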
ID: 64414 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org