Lots of Validate Errors

Message boards : Number crunching : Lots of Validate Errors

Ace Casino
Joined: 16 Jul 07
Posts: 17
Credit: 11,587,856
RAC: 11,193
Message 79227 - Posted: 16 Dec 2015, 11:25:22 UTC

Please refer to the Title.
Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,641,936
RAC: 36
Message 79229 - Posted: 16 Dec 2015, 14:19:17 UTC - in response to Message 79227.  
Last modified: 16 Dec 2015, 14:23:01 UTC

Please refer to the Title.


I took a look through your task queue (which is huge, by the way) and noticed that the validate errors all relate to some new logic intended to tighten up the turnaround time of some Robetta jobs. Essentially, tasks were still being sent out *after* a given job had already found a sufficiently low energy state, simply because the job had originally queued x number of tasks. To remedy this, some logic was put in place that cancels a job's remaining tasks once a minimum has been found (i.e. a prediction has been made). It looks like your validate errors are due to that 'fix' cancelling the jobs before your work was reported back...

For example, look at your tasks 703912536 and 703912470. Notice that the 'errors:' value is 'Cancelled' for both. This is because the respective RB job had already been cancelled by the time your work was reported back.
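
The sequence described above can be sketched as a toy model. This is only an illustration of the race between a job being cancelled and a late result coming in, not the project's actual server code; the `Job` class, its `report` method, the job name, and the outcome strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Toy stand-in for a Robetta job that fans out many tasks (hypothetical)."""
    name: str
    cancelled: bool = False
    outcomes: dict = field(default_factory=dict)

    def report(self, task_id: int, found_minimum: bool) -> str:
        """Record a returned task and say how this model would treat it."""
        if self.cancelled:
            # The job already has its prediction, so a late result is
            # flagged instead of validated in this model.
            outcome = "validate error (job cancelled)"
        else:
            outcome = "valid"
            if found_minimum:
                # A prediction was made: cancel the job's remaining tasks.
                self.cancelled = True
        self.outcomes[task_id] = outcome
        return outcome


job = Job("rb_example")
print(job.report(1, found_minimum=False))  # fast host, job still open
print(job.report(2, found_minimum=True))   # this result completes the job
print(job.report(3, found_minimum=False))  # slow host reports afterwards
```

In this sketch the third task did nothing wrong; it was simply reported after the job had been closed, which is the situation a long turnaround time makes more likely.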

Work is now being done on a better solution, but one caveat is that some jobs will then only be sent to hosts with a historically short turnaround time.

A note about optimizing your contribution to the project:

Unless there is a specific reason why your work cache is so huge, I would recommend changing your client settings to cache much less work. Right now, work sent to your computer has an average turnaround time of 9.2 days. This means that if a researcher submits a query for a prediction, and your computer happens to be the one to stumble upon the minimum energy state for the given model, it won't actually get reported for more than a week, and in that case it may run into deadline/job expiry issues like the ones you've seen. Having a smaller cache will also mean the science moves forward faster! :)

I work in data science, and deal with slow database queries every day. It gets frustrating for me at work when I have to wait longer than a few minutes for a slow/massive query to finish - I can't fathom being a researcher and having to wait for an entire week for a query result. Don't be the reason researchers are waiting for a week for an answer :P

Please note I'm just another volunteer like yourself, so take my advice however you will.
Ace Casino
Joined: 16 Jul 07
Posts: 17
Credit: 11,587,856
RAC: 11,193
Message 79231 - Posted: 16 Dec 2015, 15:22:50 UTC

I appreciate your reply...but:

This is the only computer I have that is crunching 3 projects at the same time: SETI, Einstein & Rosetta.

The resource share of Rosetta is 2%. The work cache is: Maintain a 6 day cache with 3 day additional. That shouldn't be an unreasonable setting. It's been set up like this forever (as long as I can remember). If you were to look at my BOINC stats chart you would see I go up and down like a yo-yo because of how BOINC downloads the work. A week or so ago I had no Rosetta tasks cached. Then one day I look and I have 200 or so. My RAC goes up to 1500 or so... then slowly down to 70 or so.

If there is a problem with return times the Project Managers may need to increase or make an adjustment on their part. I'm sure there are people out there with 10/10 caches set up? Maybe someone can notify the Project Managers what is happening.

If this continues I'll just stop for a while. I've done that before with constant errors from Rosetta.

It's not about the credits...it's about wasting time and resources.

If I wanted credits I could just unleash my massive farm on the project......lol

Thanks again.
Link
Joined: 4 May 07
Posts: 352
Credit: 382,349
RAC: 0
Message 79232 - Posted: 16 Dec 2015, 16:35:22 UTC - in response to Message 79231.  

The resource share of Rosetta is 2%. The work cache is: Maintain a 6 day cache with 3 day additional. That shouldn't be an unreasonable setting.

With 3 active projects you actually don't need that large a cache. The more projects you run, the smaller your cache should be, as the risk of running out of work is also smaller. I'd suggest a maximum of something like 2 or 3 days; that should be completely sufficient.
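
As a rough back-of-the-envelope check, here is a minimal sketch of why the cache setting matters. It assumes, as a simplification, that the client may hold up to the minimum buffer plus the additional buffer of work ahead of a newly downloaded task. The function name and the model are illustrative only, and the additional-days value of 0 for the smaller caches is an assumption; real BOINC scheduling also depends on resource share, deadlines, and task run times.

```python
def max_queue_wait_days(min_buffer_days: float, additional_days: float) -> float:
    """Approximate upper bound on how long a new task waits in the cache.

    Simplifying assumption: the client holds up to
    min_buffer_days + additional_days of work ahead of the new task.
    """
    return min_buffer_days + additional_days


current = max_queue_wait_days(6, 3)  # the 6 day + 3 day setting quoted above
print(f"6 + 3 day cache: up to ~{current:g} days queued")

# Hypothetical smaller caches in the suggested range, with no additional days.
for days in (2, 3):
    wait = max_queue_wait_days(days, 0)
    print(f"{days} day cache: up to ~{wait:g} days queued")
```

Under this simplified model the current setting allows roughly 9 days of queued work, which is in the same range as the 9.2-day average turnaround mentioned earlier in the thread.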

As to your validate errors: as far as I have seen, all of them are rb_12* jobs. I have already reported this issue; there is a workaround for those tasks, and since then they validate fine for me. With your large cache it might of course take a while until you are free of those. BTW, as far as I understand this post, they still want those results back, so the work is not lost, even if it looks like that on the tasks page.

©2024 University of Washington
https://www.bakerlab.org