validate errors

Message boards : Number crunching : validate errors

Robby1959
Joined: 10 May 07
Posts: 38
Credit: 9,298,741
RAC: 0
Message 78829 - Posted: 20 Sep 2015, 5:21:04 UTC

One of my machines produces validate errors. What could be the problem? The only other project running is GPUGrid.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 78831 - Posted: 20 Sep 2015, 6:18:32 UTC

Assuming this wasn't the case before the update:
It's not you, we've been getting lots of validate errors since the version update to 3.62.
GPUGrid, or any other project for that matter, wouldn't cause such errors.

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 78832 - Posted: 20 Sep 2015, 9:48:17 UTC

If you get any type of errors on your task:

Step 1: Add a quick note on the type of error and the task ID number in the Minirosetta 3.62 thread. That will help the scientists investigate the problem and, hopefully, find a solution sooner.

Step 2: Check the work unit details for the invalid task. If one user gets an error, the task is sent out to another user to try again. If both users had an error, then it is safe to say there is a problem with the work unit. If the other user succeeded where you failed, that could suggest a problem with your machine (though I would only make that assumption after several errors on tasks that then succeeded on other machines; one error may just be an isolated glitch).

One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer.
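Step 2 is effectively a small decision rule. Here is a minimal Python sketch of it, assuming you have copied each task's host and outcome off the work unit details page by hand. The `Task` record, the status strings and the threshold of three are hypothetical stand-ins (the advice above only says "several"), not anything the Rosetta@home site exposes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record of one copy of a work unit, copied from its details page."""
    host_id: int
    status: str  # assumed values: "valid", "error" or "pending"


def classify(my_host: int, tasks: list[Task]) -> str:
    """Classify one of my failed work units by how the other copies fared."""
    others = [t for t in tasks if t.host_id != my_host and t.status != "pending"]
    if not others:
        return "undetermined"          # nobody else has reported back yet
    if all(t.status == "error" for t in others):
        return "work_unit_problem"     # everyone failed: blame the work unit
    return "succeeded_elsewhere"       # someone else succeeded where I failed


def suspect_my_machine(verdicts: list[str], min_cases: int = 3) -> bool:
    """Only suspect the machine after several failures that succeeded elsewhere.

    min_cases=3 is an arbitrary stand-in for "several"; one case may be a glitch.
    """
    return verdicts.count("succeeded_elsewhere") >= min_cases


# Work unit 685776063 from this thread: both copies errored (host IDs made up).
print(classify(1, [Task(1, "error"), Task(2, "error")]))  # work_unit_problem
```

Keeping the per-work-unit verdict separate from the "is it my machine?" decision mirrors the advice not to blame your hardware after a single failure.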


One important point to remember with validate errors is that it is possible for 99% of your data to be usable by the scientists. The system will report a validate error if the final decoy information doesn't come back correctly. In this task you submitted 1,261 decoys, so it is possible that 1,260 of them came out fine (unfortunately, there is no way for us on the user side to tell from the error message alone whether the data we submit is usable).

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 78833 - Posted: 20 Sep 2015, 17:23:12 UTC - in response to Message 78832.  
Last modified: 20 Sep 2015, 17:25:46 UTC

One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer.

This is an "FFD*" task; they all end with validate errors. All of his validate errors are from those work units, so there's no problem on his end.
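"FFD*" is a shell-style wildcard over task names. As a minimal sketch, a cruncher could keep a personal list of name patterns for batches the forum reports as failing for everyone and check task names against it with Python's standard `fnmatch` module. The pattern list, the function name and the FFD-style sample name are hypothetical; only the "FFD*" pattern itself comes from the thread, and the second sample is a real task name posted later in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical personal list of batch-name patterns reported on the forum
# as failing for everyone; only "FFD*" comes from this thread.
KNOWN_BAD_PATTERNS = ("FFD*",)


def is_known_bad(task_name: str, patterns: tuple[str, ...] = KNOWN_BAD_PATTERNS) -> bool:
    """Return True if the task name matches any known-bad pattern.

    fnmatchcase is case-sensitive, so "ffd_..." would not match "FFD*".
    """
    return any(fnmatchcase(task_name, pattern) for pattern in patterns)


# A made-up FFD-style name, and a real task name posted later in the thread.
print(is_known_bad("FFD_example_task_0"))  # True
print(is_known_bad(
    "6H_TUBE8_AB_6H_TUBE_07.pdb_15_10_12_32_13_globalDocking_1_SAVE_ALL_OUT_309429_46_0"
))  # False
```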

Murasaki
Joined: 20 Apr 06
Posts: 303
Credit: 511,418
RAC: 0
Message 78834 - Posted: 20 Sep 2015, 18:14:34 UTC - in response to Message 78833.  
Last modified: 20 Sep 2015, 18:15:18 UTC

One of your recent errors was on task 756849504 and checking the associated work unit, 685776063, shows that the second user also had an error. In this case it is safe to assume the problem is with the task and not your computer.

This is an "FFD*" task; they all end with validate errors. All his validate errors are from those work units, so no problem on his end.


My advice explains what to do when validate errors occur in the future and I also pointed out the current issue is not with their machine. I don't see what was wrong with my reply...

Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 78836 - Posted: 20 Sep 2015, 22:03:53 UTC - in response to Message 78834.  

Nothing was wrong with your reply; I just added a link to the thread about those tasks. It was really meant as a hint for the OP to check at least the few most recent threads for known problems. That way it's often possible to get an answer faster than by starting a new thread.

Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 78891 - Posted: 8 Oct 2015, 23:28:15 UTC

Not even got this small problem sorted yet?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692377625

krypton
Volunteer moderator
Project developer
Project scientist
Joined: 16 Nov 11
Posts: 108
Credit: 2,164,309
RAC: 0
Message 78892 - Posted: 9 Oct 2015, 1:54:04 UTC - in response to Message 78891.  

Shooooot! Thanks for the report, we're working on canceling the jobs and fixing the bug.

Not even got this small problem sorted yet?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692377625


Rymorea
Joined: 28 Sep 15
Posts: 3
Credit: 53,241
RAC: 0
Message 78893 - Posted: 9 Oct 2015, 16:55:00 UTC

Last night I got 3 validate errors:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692514402

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692515434

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=692402873

I hope this helps find a solution.
Seti@home Classic account User ID 955 member since 8 Sep 1999 classic CPU time 539,770 hours


svincent
Joined: 30 Dec 05
Posts: 219
Credit: 11,805,838
RAC: 0
Message 79005 - Posted: 29 Oct 2015, 20:28:08 UTC

Task 768264227 gave Validate Error.

6H_TUBE8_AB_6H_TUBE_07.pdb_15_10_12_32_13_globalDocking_1_SAVE_ALL_OUT_309429_46_0

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 79168 - Posted: 9 Dec 2015, 0:21:15 UTC
Last modified: 9 Dec 2015, 0:22:38 UTC

rb_12_07_59935_105598_ab_stage0_t000___robetta_IGNORE_THE_REST_12_12_313657_76_0

99 decoys generated (I think that's where all WUs stop), first delivery, no restarts, outcome "invalid"

:-(

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 36
Message 79169 - Posted: 9 Dec 2015, 5:16:11 UTC - in response to Message 79168.  

rb_12_07_59935_105598_ab_stage0_t000___robetta_IGNORE_THE_REST_12_12_313657_76_0

99 decoys generated (I think that's where all WUs stop), first delivery, no restarts, outcome "invalid"

:-(


It was still counted towards the science and towards your points total. If you scroll to the bottom of the page you linked to, you'll see this:


Claimed credit 142.733357277178
Granted credit 142.733357277178


Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 79170 - Posted: 9 Dec 2015, 6:20:52 UTC - in response to Message 79169.  
Last modified: 9 Dec 2015, 6:22:10 UTC

It has the WU state "cancelled" now. One more from the same series had the same outcome, and it is cancelled too.

Credit was granted later on that first one. I guess someone checked the problem and granted it manually (thanks for that!), since the new one still shows 0 granted.

Luigi R.
Joined: 7 Feb 14
Posts: 39
Credit: 2,045,527
RAC: 0
Message 79189 - Posted: 11 Dec 2015, 21:01:27 UTC
Last modified: 11 Dec 2015, 21:02:16 UTC

App version: Rosetta Mini 3.67

I'm still getting some validate errors on two different i7 machines.

These 24 hours of computing have 0 granted credit... why???
https://boinc.bakerlab.org/rosetta/result.php?resultid=777361677
https://boinc.bakerlab.org/rosetta/result.php?resultid=777391503

Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 36
Message 79190 - Posted: 11 Dec 2015, 23:20:48 UTC - in response to Message 79189.  

App version: Rosetta Mini 3.67

I'm still getting some validate errors on two different i7 machines.

These 24 hours of computing have 0 granted credit... why???
https://boinc.bakerlab.org/rosetta/result.php?resultid=777361677
https://boinc.bakerlab.org/rosetta/result.php?resultid=777391503


Both of those WUs came from jobs that were cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally, if you wait a couple of days, the claimed credit will be granted on a future sweep of the validator program.

Situations like this are why David has been looking at tightening up the required 'average TAT' (turnaround time) for some jobs. It doesn't appear to me that the BOINC client/server supports any kind of 'remote kill switch' for a given WU once it's been downloaded, so there's no way to prevent such WUs from running needlessly once queued if the job they belong to gets cancelled.

Personally, to improve the average turnaround time of my boxes, and thus increase the chance that the crunching I'm doing actually contributes to someone's query results, I've tightened my queue settings (I basically don't queue any tasks) and set a more modest target runtime of 8 hours.

Luigi R.
Posts: 39
Credit: 2,045,527
RAC: 0
Message 79191 - Posted: 12 Dec 2015, 0:07:23 UTC - in response to Message 79190.  
Last modified: 12 Dec 2015, 0:15:23 UTC

Both of those WUs came from jobs that got cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally if you wait a couple of days the claimed credit will be granted on a future sweep of the validator program.

OK! I thought granted credit was set equal to claimed credit immediately when a WU failed, and that only the credit counter was updated after a couple of days. Glad to know I was wrong.

Situations like this are why David has been looking at tightening up the required 'average TAT' for some jobs. It doesn't appear to me that the BOINC client/server supports any kind of 'remote kill switch' for a given WU once it's been downloaded, so there's no way to prevent said WUs from needlessly running once queued if the job they belong to gets cancelled.

I've often seen work units get "cancelled by server" when they are no longer needed, or for other reasons.

Personally, to improve my average turnaround time for my boxes and thus increase the chance that the crunching I'm doing is actually going to contribute to someone's query results, I've actually tightened my queue settings (I basically don't queue any tasks) and my target runtime is a more modest 8 hours.

I usually run default WUs. I was doing some tests and my queue's settings were "min/max reserve of work" = "0.5/0.6 days".
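Timo's point about queue depth can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not how the BOINC client actually schedules work: it assumes a freshly downloaded task waits behind about one full buffer of already-queued work before it starts, then runs for the target runtime. The function name is hypothetical, and treating a "0.6 days" setting as the whole buffer is a simplification.

```python
def rough_turnaround_hours(buffer_days: float, target_runtime_hours: float) -> float:
    """Rough worst-case hours from download to report for one task.

    Assumption (not BOINC's real scheduler): the task first waits behind
    about `buffer_days` of queued work, then runs for the target runtime.
    """
    queue_wait_hours = buffer_days * 24.0  # days of queued work -> hours
    return queue_wait_hours + target_runtime_hours


# Timo's setup: essentially no queue and an 8-hour target runtime.
print(rough_turnaround_hours(0.0, 8.0))  # 8.0
# A 0.6-day buffer with the same 8-hour runtime.
print(round(rough_turnaround_hours(0.6, 8.0), 1))  # 22.4
```

Under these assumptions, most of the extra turnaround comes from the buffer rather than the runtime, which is why a near-empty queue shrinks the window in which a researcher can cancel a job while your copy is still waiting to run.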

Sid Celery
Joined: 11 Feb 08
Posts: 1981
Credit: 38,448,817
RAC: 14,577
Message 79192 - Posted: 12 Dec 2015, 2:02:02 UTC - in response to Message 79191.  

Both of those WUs came from jobs that got cancelled by the researcher (see here and here - note the 'errors' field shows the job was cancelled) after you received the WU in your queue. Generally if you wait a couple of days the claimed credit will be granted on a future sweep of the validator program.

OK! I thought granted credit was set equal to claimed credit immediately when a WU failed, and that only the credit counter was updated after a couple of days. Glad to know I was wrong.

And they've been credited already. It seems an overnight job mops up validate errors, so it's usually within 24 hours. It is disconcerting, I agree.



©2024 University of Washington
https://www.bakerlab.org