minirosetta 2.16

Message boards : Number crunching : minirosetta 2.16

Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 68004 - Posted: 9 Oct 2010, 20:14:32 UTC

This is reverting minirosetta to 2.14 due to the recent memory problem.
ID: 68004 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68009 - Posted: 9 Oct 2010, 22:33:56 UTC
Last modified: 9 Oct 2010, 22:46:14 UTC

These two both failed after 11sec with the new app!

Maybe something to do with the changeover?

EDIT// I have an oct_ task running O.K. now.


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338631824

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74504_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

==============================================================================

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338631880

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74521_0

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
ID: 68009 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68011 - Posted: 10 Oct 2010, 1:47:57 UTC
Last modified: 10 Oct 2010, 2:28:39 UTC

Another one with the same problem on a different rig, 14 sec this time.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338630619

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_74213_0

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
ID: 68011 · Rating: 0
Brian Priebe

Joined: 27 Nov 09
Posts: 16
Credit: 33,020,247
RAC: 0
Message 68014 - Posted: 10 Oct 2010, 5:40:04 UTC - in response to Message 68004.  
Last modified: 10 Oct 2010, 5:44:42 UTC

Same error on WU 338667991 (https://boinc.bakerlab.org/rosetta/result.php?resultid=370646341). My wingman also had the same error.

Snippet from log:

Incorrect function. (0x1) - exit code 1 (0x1)
...
ERROR: bad line in file minirosetta_database\scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: ..\..\src\core\scoring\ScoreFunction.cc line: 204
ID: 68014 · Rating: 0
Yifan Song
Volunteer moderator
Project developer
Project scientist

Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 68020 - Posted: 10 Oct 2010, 23:14:13 UTC

My apologies for the new errors. I forgot that those jobs are using an option that's associated with 2.15. I just cancelled those jobs.
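The "bad line" failures reported above fit this explanation if, as an assumption, the 2.15-associated option is the `Menv_smooth` score term, which the reverted 2.14 build does not recognise. The Python sketch below illustrates that failure mode with a toy `name weight` parser. The term sets, `parse_weights` and `WeightsError` are hypothetical; they are not Rosetta's `ScoreFunction` code, and real weights files support more than this toy format.

```python
# Hypothetical sketch: a toy "name weight" weights-file parser, NOT Rosetta's real code.
# The term sets are illustrative; the only assumption is that Menv_smooth is unknown to 2.14.
KNOWN_TERMS_2_14 = {"fa_atr", "fa_rep", "fa_sol"}
KNOWN_TERMS_2_15 = KNOWN_TERMS_2_14 | {"Menv_smooth"}


class WeightsError(ValueError):
    """Raised when a weights-file line cannot be understood by this build."""


def parse_weights(path, lines, known_terms):
    """Return {term: weight}, rejecting any line the given build cannot parse."""
    weights = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        if len(fields) != 2 or fields[0] not in known_terms:
            # Mirrors the shape of the "bad line in file ..." message in the logs.
            raise WeightsError(f"bad line in file {path}:{line}")
        try:
            weights[fields[0]] = float(fields[1])
        except ValueError:
            raise WeightsError(f"bad line in file {path}:{line}") from None
    return weights


if __name__ == "__main__":
    path = "minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts"
    wts = ["fa_atr 0.8", "Menv_smooth 0.5"]  # illustrative weights, not the real file
    print(parse_weights(path, wts, KNOWN_TERMS_2_15))  # newer term list accepts both lines
    try:
        parse_weights(path, wts, KNOWN_TERMS_2_14)
    except WeightsError as err:
        print("ERROR:", err)  # older term list rejects the Menv_smooth line
```

Because the rejection happens while reading the weights at startup, this model would fail within seconds on any host, regardless of hardware, which matches the 4 to 14 second failures on different rigs.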
ID: 68020 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68025 - Posted: 11 Oct 2010, 3:30:57 UTC

I got this one this morning; it must have been sent before you stopped them.

Same as others, ran for 11sec.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338812571

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_85437_0

Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: bad line in file minirosetta_database/scoring/weights/membrane_highres_Menv_smooth.wts:Menv_smooth 0.5
ERROR:: Exit from: src/core/scoring/ScoreFunction.cc line: 204
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

ID: 68025 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68026 - Posted: 11 Oct 2010, 6:49:02 UTC

Two more of the same old error, downloaded earlier today.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338799412

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_83642_0


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=338804158

mem_widd_run02_Menv_B_1c3w_SAVE_ALL_OUT_IGNORE_THE_REST_22293_84286_0

ID: 68026 · Rating: 0
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 68035 - Posted: 11 Oct 2010, 18:42:53 UTC - in response to Message 68030.  

The weights error is not an OC (overclocking) error; it is a program error.
If you get a Windows debugger dump with a 0xc error (I think that's the one), that could be due to OC, or just some problem with the task and your memory.
If you get a ton of memory errors all in a row, then your OC speed is too high.
On my machine I can push Rosie up to 3.0 GHz (this is as fast as I can go without crashing Google Earth); the official clock speed on my machine is 2.5 GHz.

Could those errors be related to OC? I contributed to the project in the past, have recently started contributing again, and have seen no errors on my side.

ID: 68035 · Rating: 0
mickey

Joined: 11 Jan 10
Posts: 2
Credit: 20,113
RAC: 0
Message 68067 - Posted: 13 Oct 2010, 12:32:54 UTC - in response to Message 68004.  
Last modified: 13 Oct 2010, 12:36:52 UTC

I have found one small issue in the x86_64 Linux app.
When workunits are started for the first time (i.e. from 0.000% progress) they work normally and the graphics are displayed correctly, but sometimes I find the graphics switching continuously from
Stage:
to
App suspended
without showing which stage is running. The % complete value keeps increasing, as do the step counter and the CPU time, but the top graphs ("Searching...", "Accepted", "Low Energy" and "Accepted Energy") are empty. I think this problem appears when the app is stopped (for a system reboot or any other reason), and to get the app running correctly again I have to reset the project, because new workunits also end up in this state.

Here is the stderr of one corrupted run at 17% done:
[2010-10-13  2: 3:34:] :: BOINC:: Initializing ... ok.
[2010-10-13  2: 3:34:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev38513.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_mem_widd_run03_centroid_A_2a9h_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 21600
[2010-10-13  9:55: 8:] :: BOINC:: Initializing ... ok.
[2010-10-13  9:55: 8:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev38513.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_mem_widd_run03_centroid_A_2a9h_yfsong.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage1 ... success! 
Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage2 ... success! 
Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_1 ... success! # cpu_run_time_pref: 21600

Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_2 ... success! 
Continuing computation from checkpoint: chk_S_00001_FragmentSampler__stage_3_iter1_3 ... success! 


I don't know if it is only a graphics issue or a real app problem, but I think it's strange behaviour.

Edit: my system config is
Fedora 13 x86_64
BOINC 6.10.45, installed from the Fedora repositories
other computer info here
ID: 68067 · Rating: 0
mickey

Joined: 11 Jan 10
Posts: 2
Credit: 20,113
RAC: 0
Message 68073 - Posted: 13 Oct 2010, 16:21:50 UTC - in response to Message 68067.  

OK guys, calm down. :)
I have just checked the graphics of the workunit I am crunching again, and they work now.

So I think what I posted before is only a display issue, not an application one.
ID: 68073 · Rating: 0
eruda

Joined: 6 Apr 06
Posts: 1
Credit: 6,508,987
RAC: 0
Message 68086 - Posted: 14 Oct 2010, 16:22:15 UTC
Last modified: 14 Oct 2010, 16:24:41 UTC

It still happens that some app instances seem to become corrupted suddenly, and the BOINC Manager fails to notice. I have a task that has been running for over 60 hours and is stuck at 16%, so I have no choice but to terminate it manually... I don't know much about the technical side; sorry for not being able to provide the error log.
ID: 68086 · Rating: 0
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 68112 - Posted: 16 Oct 2010, 22:20:37 UTC

NP_961412.1_boinc_boinc_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22319_804_1

The WU ended after only 1201 seconds with one model completed, despite a 43200-second run time preference, and received the outcome "validate error".

This is a resend (the previous copy got "no reply"), originally issued on the 6th, before 2.15 was rescinded, though obviously I crunched it with 2.16. A third copy has now been issued.


Snags
ID: 68112 · Rating: 0
Snowfall

Joined: 10 Dec 06
Posts: 2
Credit: 4,832,916
RAC: 99
Message 68117 - Posted: 17 Oct 2010, 18:55:53 UTC

My computer hung after about 5 and a half hours of computation (with about the same amount left to go); then, after a reset, when I restarted BOINC it suddenly uploaded the workunit back to the server instead of continuing it.

I logged in to see why it failed, but it seems that its outcome state is "Success". I don't think this is right. Is this supposed to happen?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=339761570
ID: 68117 · Rating: 0
Murasaki

Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 68119 - Posted: 18 Oct 2010, 6:34:21 UTC - in response to Message 68117.  

My computer hung after about 5 and a half hours of computation (with about the same amount left to go); then, after a reset, when I restarted BOINC it suddenly uploaded the workunit back to the server instead of continuing it.

I logged in to see why it failed, but it seems that its outcome state is "Success". I don't think this is right. Is this supposed to happen?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=339761570


The task log you linked to ends with an error, so no, that isn't the expected behaviour. However, you must have been able to return complete data for at least one model/decoy from the WU for the system to recognise that part of the upload as valid.


ERROR: ERROR: Unable to open silent_input file: 'chk_S_00014_FragmentSampler__rg_state.out'
ERROR:: Exit from: src/core/io/silent/SilentFileData.cc line: 86
called boinc_finish
# cpu_run_time_pref: 28800
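Murasaki's point, that a log ending in an error can still be reported as "Success", can be illustrated with a small Python sketch. It assumes, hypothetically, that the outcome depends only on whether at least one complete decoy came back; `TaskResult`, `outcome` and the "Error" label are illustrative names, not the actual Rosetta@home validator.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical summary of an uploaded result (illustrative fields only)."""
    decoys_completed: int  # complete models/decoys present in the upload
    log_ends_with_error: bool  # e.g. the "Unable to open silent_input file" exit


def outcome(result: TaskResult) -> str:
    """Assumed rule: any complete decoy makes the result usable, even after a later error."""
    # log_ends_with_error is deliberately ignored: under this assumption a late
    # failure (such as a missing checkpoint file on restart) does not discard
    # the models that were already finished.
    if result.decoys_completed >= 1:
        return "Success"
    return "Error"  # hypothetical label for a result with nothing usable


if __name__ == "__main__":
    # A run that finished some models, then failed to resume from a checkpoint.
    print(outcome(TaskResult(decoys_completed=3, log_ends_with_error=True)))
    # A run that crashed before completing any model.
    print(outcome(TaskResult(decoys_completed=0, log_ends_with_error=True)))
```

If the validator does work along these lines, the upload is judged by the usable models it contains rather than by how the run ended.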
ID: 68119 · Rating: 0
Snowfall

Joined: 10 Dec 06
Posts: 2
Credit: 4,832,916
RAC: 99
Message 68122 - Posted: 18 Oct 2010, 18:11:28 UTC - in response to Message 68119.  

Thank you for the quick reply. Yes, I did notice that part of the error log; that's why I was intrigued by the apparently successful outcome state. I never considered that it could be partly successful, though :).
ID: 68122 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68146 - Posted: 20 Oct 2010, 14:01:58 UTC

Linking to a post reporting that tasks with "lrlx" in the name seem to cause some problems.
Rosetta Moderator: Mod.Sense
ID: 68146 · Rating: 0
Pilgrim57

Joined: 31 Jul 08
Posts: 3
Credit: 1,965,851
RAC: 0
Message 68177 - Posted: 23 Oct 2010, 20:30:58 UTC

For over a week now I have had a lot of work units ending in a computation error just after starting, halfway through, or just before completion;
see task IDs 373614961, 373614960, 373614958, 372516143, 372515901, 370066639,
371496237, 371496257.

I have reduced the run time from 6 hours to 4, and now to the default of 3.
My PC is overclocked, but these errors seem to have started with 2.16.

Any help appreciated.
ID: 68177 · Rating: 0
Bikermatt

Joined: 12 Feb 10
Posts: 20
Credit: 10,552,445
RAC: 0
Message 68179 - Posted: 23 Oct 2010, 23:06:29 UTC

Has anyone else noticed PCS_ tasks running poorly on Linux? They are running longer than the default and producing fewer models than on my similarly equipped Win 7 box.
ID: 68179 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 68180 - Posted: 23 Oct 2010, 23:25:57 UTC

This one failed after 4sec.

PCS_2RN2_v1.frag_1-41_SAVE_ALL_OUT_22378_8_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=341585851

ERROR: ERROR: FragmentIO: could not open file boinc_aafrag_1-41_09_05.200_v1_3.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 68180 · Rating: 0
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 68182 - Posted: 23 Oct 2010, 23:39:04 UTC

PCS_2RN2_atensor.frag_1-100_SAVE_ALL_OUT_22378_16_0 died @ 3.5 seconds

ERROR: ERROR: FragmentIO: could not open file boinc_aafrag_1-100_09_05.200_v1_3.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


ID: 68182 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org