Validate errors

Khali
Joined: 10 Mar 14
Posts: 1
Credit: 1,704,409
RAC: 0
Message 76585 - Posted: 3 Apr 2014, 19:36:18 UTC

I am new to Rosetta and I am getting a lot of Invalid tasks reported as Validate errors. I did a test run of Rosetta a few weeks ago and about 30% of the tasks I ran then were Validate errors.

Some of my teammates said it might be because of my mild overclock. Since we were all crunching for a team event on another project, I had not had a chance to run any more Rosetta tasks until yesterday. I removed my overclock and tried again. Validate errors are not as prevalent as they were, but I am still getting them.

Here is Rosetta's explanation of a Validate error.

Validate error - The task was reported but could not be validated, typically because the output files were lost on the server.

I can only conclude that I am spending three hours plus on each task only to have 25 to 30 percent of them get lost on your server. That is not acceptable. This needs to be fixed ASAP.

Task ID 651174635
Name gr040214_ama1_longee_newhair377_relax_SAVE_ALL_OUT_157229_40_0
Workunit 591389920
Created 3 Apr 2014 11:32:15 UTC
Sent 3 Apr 2014 11:35:21 UTC
Received 3 Apr 2014 14:05:41 UTC
Server state Over
Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 1730267
Report deadline 13 Apr 2014 11:35:21 UTC
CPU time 2409.047
stderr out

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
[2014- 4- 3 8:22: 1:] :: BOINC:: Initializing ... ok.
[2014- 4- 3 8:22: 1:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/boinc.bakerlab.org_rosetta/minirosetta_3.48_windows_x86_64.exe -out:file:silent default.out -in:file:s 00001.pdb -frag3 00001.200.3mers -in:file:native 00001.pdb -frag9 00001.200.9mers -silent_gz 1 -ex2aro 1 -relax::default_repeats 15 -in:file:fullatom 1 -run:protocol relax -ex1 1 -in:file:boinc_wu_zip gr040214_ama1_longee_newhair377_data.zip -out:file:silent default.out -silent_gz -mute all -detect_disulf True -in:file:native 00001.pdb -in:file:fullatom -in:file:s 00001.pdb -nstruct 10000 -cpu_run_time 10800 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -run::rng mt19937 -constant_seed -jran 1434278
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_b14204d.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/gr040214_ama1_longee_newhair377_data.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
======================================================
DONE :: 99 starting structures 2408.23 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================
BOINC :: WS_max 4.39628e+008

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 25.3788504634809
Granted credit 0
application version 3.48
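
For anyone who wants to check their own tasks the same way, here is a minimal sketch that pulls the run summary out of a stderr dump like the one above. It assumes the summary lines always look like the ones in this single log; `summarize_stderr` is a made-up helper name, not part of BOINC or Rosetta.

```python
import re

# Summary lines copied from the stderr output above.
STDERR_TAIL = """\
DONE :: 99 starting structures 2408.23 cpu seconds
This process generated 99 decoys from 99 attempts
called boinc_finish
"""


def summarize_stderr(text):
    """Extract the run summary from stderr text (line format assumed from one log)."""
    done = re.search(
        r"DONE ::\s+(\d+) starting structures\s+([\d.]+) cpu seconds", text
    )
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    return {
        "structures": int(done.group(1)) if done else None,
        "cpu_seconds": float(done.group(2)) if done else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        # The worker logs this line when it shuts down normally.
        "finished_cleanly": "called boinc_finish" in text,
    }


summary = summarize_stderr(STDERR_TAIL)
print(summary)
```

If the summary shows decoys produced and a clean finish, as it does here, the client side of the task at least appears to have completed normally.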

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 76588 - Posted: 6 Apr 2014, 1:22:46 UTC - in response to Message 76585.  

I am getting them as well, and I would like to know why. Results are either valid or not.

Killersocke@rosetta
Joined: 13 Nov 06
Posts: 29
Credit: 2,579,125
RAC: 0
Message 76590 - Posted: 6 Apr 2014, 9:47:44 UTC

Exactly the same problem here with some of my tasks since 4th April.

regards

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 76591 - Posted: 6 Apr 2014, 9:56:19 UTC

Me too!

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 76593 - Posted: 6 Apr 2014, 15:57:01 UTC
Last modified: 6 Apr 2014, 16:02:44 UTC

one more

And this one (not mine)

Since Rosetta doesn't validate by comparing results against each other, I wonder if it could be a problem with the upload handler. If the message "Couldn't resolve host name" is caused by a network problem on the Rosetta site, BOINC server side tasks could be affected as well.

AFAIK, "finished upload" just means that the upload handler was able to store the file in a temporary storage place. The file then has to be moved, and if this move fails, the uploading host will probably not notice it (that's how HTTP uploads usually work).
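
To illustrate the failure mode described above, here is a minimal sketch of a two-step "store, then move" upload. It is not BOINC's actual upload handler; `handle_upload`, the directory layout, and the acknowledgement string are all assumptions made up for this example.

```python
import os
import shutil
import tempfile


def handle_upload(data, tmp_dir, final_dir, name):
    """Hypothetical two-step upload: store in temp, acknowledge, then move."""
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    # In this sketch the client is acknowledged once the temp file is written.
    ack = "finished upload"
    try:
        shutil.move(tmp_path, os.path.join(final_dir, name))
        stored = True
    except OSError:
        # The move failed after the acknowledgement; only the server knows.
        stored = False
    return ack, stored


with tempfile.TemporaryDirectory() as root:
    tmp_dir = os.path.join(root, "tmp")
    good_dir = os.path.join(root, "results")
    os.mkdir(tmp_dir)
    os.mkdir(good_dir)
    missing_dir = os.path.join(root, "missing")  # deliberately never created

    ok_result = handle_upload(b"decoys", tmp_dir, good_dir, "a.out")
    lost_result = handle_upload(b"decoys", tmp_dir, missing_dir, "b.out")

print(ok_result)
print(lost_result)
```

In both calls the uploader receives the same acknowledgement, even though the second file never reaches its final location. That would match a task being reported but having no output file to validate.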

Billy
Joined: 29 May 06
Posts: 13
Credit: 1,490,408
RAC: 471
Message 76595 - Posted: 7 Apr 2014, 12:37:04 UTC

Same here. Mac OS X 10.7.5, BOINC Manager 7.3.13.

Billy

Columbus
Joined: 2 Jan 10
Posts: 2
Credit: 24,743
RAC: 0
Message 76596 - Posted: 8 Apr 2014, 8:06:50 UTC
Last modified: 8 Apr 2014, 8:13:06 UTC

Yes, I noticed that too. This seems to be a recent problem; I didn't have any issues in the past.

I'm also wondering why my "tasks for user" page lists more active WUs than I have downloaded. At least, I can't find them on my computer or on the BOINC manager's task list. They are just sitting there on the web page doing nothing and when the deadline is reached, they are tagged as "over - no reply".

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 76597 - Posted: 8 Apr 2014, 13:09:16 UTC - in response to Message 76593.  

If the message "Couldn't resolve host name" is caused by a network problem on the Rosetta site, BOINC server side tasks could be affected as well.

"Couldn't resolve host name" is a DNS problem on your side. The Rosetta servers are very unlikely to use the same DNS server as you, if they need one at all (I doubt that).
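
A quick way to check name resolution from your own machine is sketched below. It is a minimal example using Python's standard `socket` module; `can_resolve` is a made-up helper name, and the check only tells you whether your local resolver can look up the name, nothing about the project servers themselves.

```python
import socket


def can_resolve(hostname):
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the local resolver cannot look up the name.
        return False


# Example call (needs network access): can_resolve("boinc.bakerlab.org")
```

If the call returns False for the project host while other host names resolve fine, the problem is more likely a temporary DNS hiccup than anything in the BOINC client.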

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 76598 - Posted: 8 Apr 2014, 13:11:02 UTC - in response to Message 76596.  

I'm also wondering why my "tasks for user" page lists more active WUs than I have downloaded. At least, I can't find them on my computer or on the BOINC manager's task list. They are just sitting there on the web page doing nothing and when the deadline is reached, they are tagged as "over - no reply".

Are the names of those tasks similar to those from this thread?

Columbus
Joined: 2 Jan 10
Posts: 2
Credit: 24,743
RAC: 0
Message 76603 - Posted: 9 Apr 2014, 6:42:24 UTC - in response to Message 76598.  

Are the names of those tasks similar to those from this thread?


No, they are more like these:

foldit_997258_0009_fold_SAVE_ALL_OUT_155429_717_0
yrssfrv2d3_8_fold_SAVE_ALL_OUT_155181_1806_0
gr033114_ama1_longee_try75_fold_SAVE_ALL_OUT_156790_259_0
MoltnTIA_fold_SAVE_ALL_OUT_152738_13068_1

So, they have relatively short names. But there's also this:


C3_1kr4_C2_1sgm_0006_trimer_C3_1kr4_C2_1sgm_0006_dimer_patchdock_split_08_140330_SAVE_ALL_OUT__156751_154_0

However, none of them have two consecutive dots in their names. Perhaps I should have posted this in the other thread.
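
The "two consecutive dots" check can be done mechanically instead of by eye. A minimal sketch over the task names listed above (the variable names are my own):

```python
# Task names copied from the list above.
names = [
    "foldit_997258_0009_fold_SAVE_ALL_OUT_155429_717_0",
    "yrssfrv2d3_8_fold_SAVE_ALL_OUT_155181_1806_0",
    "gr033114_ama1_longee_try75_fold_SAVE_ALL_OUT_156790_259_0",
    "MoltnTIA_fold_SAVE_ALL_OUT_152738_13068_1",
    "C3_1kr4_C2_1sgm_0006_trimer_C3_1kr4_C2_1sgm_0006_dimer_patchdock_split_08_140330_SAVE_ALL_OUT__156751_154_0",
]

# Keep only the names that contain two dots in a row.
with_double_dots = [name for name in names if ".." in name]
print(with_double_dots)
```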

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 76609 - Posted: 11 Apr 2014, 8:22:39 UTC
Last modified: 11 Apr 2014, 8:28:30 UTC

Nearly 20% validate errors lately; I guess I'll disable Rosetta until this is solved.

P.S.: My runtime prefs are set to 8 hours and the broken ones all ran over the full timespan. Maybe some results cannot handle that?

There are Windows and Linux results with this problem, so it is independent of the OS.

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 76610 - Posted: 11 Apr 2014, 14:49:49 UTC

Second delivery of one of my invalid results:

Outcome Client error
Validate state Invalid
Granted credit 300

BTW: My invalid WUs show granted credit too, not in the list but on the details page.

I'm even more confused now - are the results useful and recovered manually, or are they used to fill the trashcan?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 76612 - Posted: 11 Apr 2014, 18:40:52 UTC - in response to Message 76610.  


I'm even more confused now - are the results useful and recovered manually, or are they used to fill the trashcan?


Rosetta grants credit (in a nightly script) for failed tasks. The idea is that learning about failure is part of finding success, so it is of value.

Oftentimes there are specific models that do not process in a timely manner. So a given task may have many normal and successful models produced, and then the last one gets hung up and runs long. So, yes, there is also such a thing as partial success (BOINC does not really support the concept). And learning about what hangs up an algorithm is useful too.
Rosetta Moderator: Mod.Sense

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 76613 - Posted: 11 Apr 2014, 21:30:00 UTC - in response to Message 76612.  

... So, yes, there is also such a thing as partial success (BOINC does not really support the concept). And learning about what hangs up an algorithm is useful too.

That is very important information. If crashed results are used to improve the methods, they are not really a waste of energy.

Thanks :-)

Cesar Gil
Joined: 5 Apr 14
Posts: 1
Credit: 34,245
RAC: 0
Message 76638 - Posted: 20 Apr 2014, 20:09:40 UTC
Last modified: 20 Apr 2014, 20:10:00 UTC

If Rosetta grants credit for failed tasks, then I must conclude that they are something different from validate error tasks, since I have validate error tasks even from days ago and none of them receives credits.
Since the rate of validate error tasks is high, I guess I'll switch to helping other projects until this gets fixed. It is not reasonable that I see computing power systematically being given for no salvageable result.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 76639 - Posted: 20 Apr 2014, 22:56:24 UTC - in response to Message 76638.  

If Rosetta grants credit for failed tasks, then I must conclude that they are something different from validate error tasks, since I have validate error tasks even from days ago and none of them receives credits.
Since the rate of validate error tasks is high, I guess I'll switch to helping other projects until this gets fixed. It is not reasonable that I see computing power systematically being given for no salvageable result.


The granting of credit for errors is done daily, and when it occurs, the credit is NOT shown on the list of WUs. Looking at the WU details for one of your validate errors more than a day old shows that you were granted all of the credit you claimed:
https://boinc.bakerlab.org/rosetta/result.php?resultid=654447052
Rosetta Moderator: Mod.Sense

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 76644 - Posted: 21 Apr 2014, 22:38:30 UTC
Last modified: 21 Apr 2014, 22:39:02 UTC

Hi.

I'll add myself to the list of those getting validate errors now as well. I think this has only started for me since the servers had been moved.
It seems to be a mixture of task name types on both my rigs. Together with the trouble getting work that seemed to start around the same time, I'm just wondering if something got knocked about on one or more of the servers when they were moved.

just my 2c!

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 76651 - Posted: 25 Apr 2014, 12:11:35 UTC - in response to Message 76644.  

Hi.

I'll add myself to the list of those getting validate errors now as well. I think this has only started for me since the servers had been moved.
It seems to be a mixture of task name types on both my rigs. Together with the trouble getting work that seemed to start around the same time, I'm just wondering if something got knocked about on one or more of the servers when they were moved.

just my 2c!


Me too, and they are getting on my *******'s.

Not sure how many of the people who run this project care anymore.

Tex1954
Joined: 3 Apr 11
Posts: 9
Credit: 3,338,665
RAC: 1,687
Message 76653 - Posted: 25 Apr 2014, 20:35:02 UTC
Last modified: 25 Apr 2014, 20:38:17 UTC


Jozef J
Joined: 7 Jun 12
Posts: 3
Credit: 1,156,504
RAC: 0
Message 76654 - Posted: 25 Apr 2014, 21:06:07 UTC

I donated to your project, and I'm still waiting for some response or a donation badge.



©2024 University of Washington
https://www.bakerlab.org