Valid WUs not meeting deadline are flagged incorrectly

Message boards : Number crunching : Valid WUs not meeting deadline are flagged incorrectly

Michael H.W. Weber
Joined: 18 Sep 05
Posts: 13
Credit: 6,672,462
RAC: 0
Message 93953 - Posted: 9 Apr 2020, 9:32:08 UTC

When checking my returned tasks today, I found two that were flagged as "Berechnungsfehler" (German for "computation error"). However, that is not what happened, as the following shows:

Looking into the details, I found that both tasks had been validated against an Apple Darwin machine that had itself produced a faulty result. So how can a result be validated against a faulty one?
Checking the send date of the tasks, the deadline, and the return time, it quickly became clear that my tasks had simply been returned to the server too late.

This, by the way, is no surprise when tasks that need 1.5 days to complete are given a deadline of only three days on machines that do not run 24/7, and when newly downloaded tasks have a SHORTER deadline than tasks already running (which makes BOINC switch between tasks).
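To put numbers on the deadline problem, here is a minimal Python sketch. It assumes that the "1.5 days for completion" is CPU time, that the machine computes only for a fixed number of hours per day, and that BOINC gives the task a fixed share of that time when it switches to tasks with earlier deadlines. The function names and the uptime figures are illustrative, not values taken from BOINC or Rosetta@home.

```python
# Hypothetical sketch: wall-clock time needed by a task on a machine that does
# not run 24/7 and shares its CPU with other tasks. Illustrative values only.

def wall_clock_days(cpu_days: float, hours_on_per_day: float, cpu_share: float = 1.0) -> float:
    """Return the wall-clock days needed to accumulate `cpu_days` of CPU time.

    hours_on_per_day: hours per day the machine is switched on and computing.
    cpu_share: fraction of that time given to this task (below 1.0 when
               BOINC switches to other tasks with earlier deadlines).
    """
    if not 0 < hours_on_per_day <= 24:
        raise ValueError("hours_on_per_day must be in (0, 24]")
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    effective_fraction = (hours_on_per_day / 24.0) * cpu_share
    return cpu_days / effective_fraction


def meets_deadline(cpu_days: float, deadline_days: float,
                   hours_on_per_day: float, cpu_share: float = 1.0) -> bool:
    """True if the task can finish within `deadline_days` of wall-clock time."""
    return wall_clock_days(cpu_days, hours_on_per_day, cpu_share) <= deadline_days


if __name__ == "__main__":
    # 1.5 CPU-days of work against a 3-day deadline, as described in the post.
    for hours_on, share in [(24, 1.0), (12, 1.0), (12, 0.5)]:
        days = wall_clock_days(1.5, hours_on, share)
        verdict = "on time" if meets_deadline(1.5, 3, hours_on, share) else "late"
        print(f"{hours_on:>2} h/day, share {share}: {days:.1f} days -> {verdict}")
```

Under these assumptions, a machine running 12 hours a day only just fits the three-day window, so any task switching pushes the task past its deadline.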

You should re-validate these tasks and make sure that error classification works correctly in the future. I suspect a general issue with validation between Windows 10 x64 Intel machines and the Apple machines.

Michael.

P.S.: Two more of my tasks will foreseeably miss the deadline on the same Alienware laptop. I will let them run, and you should consider implementing a grace period of a few days for tasks NOT returned within the deadline. Many distributed computing projects have implemented this, including our own (Yoyo@home, RNA World).
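The grace period suggested above amounts to a three-way classification of each reported result. The sketch below is hypothetical: `classify_report`, `ReportStatus`, the two-day grace value, and the example dates are illustrative choices, not BOINC server settings or Rosetta@home data.

```python
from datetime import datetime, timedelta
from enum import Enum


class ReportStatus(Enum):
    ON_TIME = "on time"
    LATE_WITHIN_GRACE = "late, but within the grace period"
    TIMED_OUT = "timed out"


def classify_report(deadline: datetime, reported: datetime, grace: timedelta) -> ReportStatus:
    """Classify a result by when it was reported relative to its deadline.

    Hypothetical policy: results reported up to `grace` after the deadline
    are still accepted instead of being treated as timed out.
    """
    if reported <= deadline:
        return ReportStatus.ON_TIME
    if reported <= deadline + grace:
        return ReportStatus.LATE_WITHIN_GRACE
    return ReportStatus.TIMED_OUT


if __name__ == "__main__":
    deadline = datetime(2020, 1, 1, 12, 0)    # made-up example deadline
    reported = deadline + timedelta(days=1)   # reported one day late
    print(classify_report(deadline, reported, timedelta(days=2)).value)
    print(classify_report(deadline, reported, timedelta(0)).value)
```

With a two-day grace period the late result is still accepted; with no grace period the same result counts as timed out.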
President of Rechenkraft.net e.V.

http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't.
CIA
Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 93971 - Posted: 9 Apr 2020, 13:05:24 UTC - in response to Message 93953.  

Rosetta 4.12 didn't work on older macOS machines, and the two WUs you showed as examples are exactly what happened when we tried to run them before 4.15 came out. They would fail within seconds; note the run times on both Darwin failures you listed.

I can't speak to your second problem, but if a work unit fails twice (for whatever reason), it is marked invalid. If enough of them come back invalid, the researchers know they might have an issue with something.
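The rule of thumb above can be sketched as a simple tally over the copies of a work unit that came back. This is a simplified model under stated assumptions: the limit of 2 mirrors "fails twice" rather than Rosetta@home's real setting, timeouts are counted as failures, and `workunit_state` is a hypothetical helper, not BOINC's actual server logic.

```python
from collections import Counter
from typing import Iterable


def workunit_state(outcomes: Iterable[str], error_limit: int = 2, quorum: int = 1) -> str:
    """Decide what happens to a work unit given the outcomes of its copies.

    outcomes: one of "success", "error" or "timeout" per copy sent out.
    error_limit: failures (errors plus timeouts) after which the WU is given up.
    quorum: successful results needed before the validator runs.
    All three are simplifying assumptions for illustration.
    """
    counts = Counter(outcomes)
    if counts["success"] >= quorum:
        return "ready for validation"
    if counts["error"] + counts["timeout"] >= error_limit:
        return "work unit failed"
    return "send another copy"
```

For example, `workunit_state(["error", "timeout"])` gives up on the work unit: one copy crashing and another missing its deadline already reach the assumed limit of two failures.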
Dayle
Joined: 6 Jan 14
Posts: 13
Credit: 792,486
RAC: 2,401
Message 93978 - Posted: 9 Apr 2020, 15:10:21 UTC - in response to Message 93971.  
Last modified: 9 Apr 2020, 15:11:54 UTC

I've started getting validation errors too.

I had none a few days ago; now four units have failed. All were set to run for 24 hours, although most were invalidated after about half that time.

https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200166
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200182
https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200238
https://boinc.bakerlab.org/rosetta/result.php?resultid=1142301309
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 94001 - Posted: 9 Apr 2020, 20:22:57 UTC - in response to Message 93978.  

Several of these show 600 completed models. That nice round number suggests it was set up as the maximum for the WU; having a maximum keeps the output files from getting too large. That would explain why they didn't run the full 24 hours: they completed the maximum number of models before then and ended. I'm not sure why the validation errors occurred, though. I see no indication of any problem in the rest of the output.
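The explanation above is that a task stops at whichever limit it reaches first: the requested run time or a per-WU model cap. A minimal sketch of that reasoning follows. The 600-model cap comes from the results above; the 72 seconds per model is a hypothetical figure, and real models do not all take the same time.

```python
from typing import Tuple


def models_and_runtime(target_hours: float, seconds_per_model: float,
                       max_models: int) -> Tuple[int, float]:
    """Return (models completed, hours run) for a task that stops at whichever
    limit it hits first: the requested run time or the per-WU model cap.

    Simplification: every model takes the same time, and a model is only
    started if it can finish within the target run time.
    """
    models_in_target = int(target_hours * 3600 // seconds_per_model)
    models = min(models_in_target, max_models)
    hours = models * seconds_per_model / 3600
    return models, hours
```

With a 24-hour target, `models_and_runtime(24, 72, 600)` returns `(600, 12.0)`: the cap is reached after 12 hours, which matches tasks ending at roughly half their requested run time.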
Rosetta Moderator: Mod.Sense

©2024 University of Washington
https://www.bakerlab.org