Message boards : Number crunching : Valid WUs not meeting deadline are flagged incorrectly
Author | Message |
---|---|
Michael H.W. Weber Joined: 18 Sep 05 Posts: 13 Credit: 6,672,462 RAC: 0 |
When checking my returned tasks today, I found two that were flagged as "Berechnungsfehler", meaning "computation invalid". However, that is not what happened. Looking into the details, both tasks were validated against an Apple Darwin machine which also generated a faulty result. So how can something be validated against something faulty? Comparing the sent date, the deadline and the return time, it quickly became apparent that my tasks were simply returned to the server too late. This, by the way, is no wonder when tasks requiring 1.5 days to complete are given a deadline of only three days on machines which do not run 24/7, and when newly downloaded tasks have a deadline that is SHORTER than that of tasks already running (which means that BOINC switches tasks). You should re-validate these tasks and make sure that future error classification works correctly. I actually suspect a general issue with validation between Windows 10 x64 Intel machines and the Apple stuff. Michael. P.S.: Another two of my tasks will foreseeably not meet the deadline on this same Alienware laptop of mine. I will let them run, and you should consider implementing a grace period of a few days for tasks NOT returned within the deadline. Many distributed computing projects have implemented this, including our own (Yoyo@home, RNA World). President of Rechenkraft.net e.V. http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't. |
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
Michael H.W. Weber wrote: "When checking my returned tasks today, I found two which were flagged as "Berechnungsfehler", meaning "computation invalid". However, that is not the case as follows:" Rosetta 4.12 didn't work on older MacOS machines, and the two WUs you showed as examples are exactly what happened when we tried to run them before 4.15 came out. They would fail in seconds; note the run times on both Darwin failures you listed. Can't speak for your second problem, but if the work units fail twice (for whatever reason) they are invalid. If enough of them come back invalid, the researchers know they might have an issue with something. |
Dayle Joined: 6 Jan 14 Posts: 13 Credit: 792,486 RAC: 2,401 |
I've started getting validation errors too. I had zero a few days ago; now I've had four units fail. All were instructed to run for 24 hours, although most were invalidated in half that time. https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200166 https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200182 https://boinc.bakerlab.org/rosetta/result.php?resultid=1143200238 https://boinc.bakerlab.org/rosetta/result.php?resultid=1142301309 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Several of these show 600 completed models. That nice round number sounds like it must have been set up as the max for the WU. Having a max keeps the output files from getting too huge. So that would explain why they didn't run a full 24 hours: they completed the max number of models before that, and ended. Not sure why they got validation errors, though. I see no indication of any problem in the rest of the output. Rosetta Moderator: Mod.Sense |
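
The deadline concern raised at the top of the thread (a task needing 1.5 days of computation, a three-day deadline, a machine that is not on 24/7) comes down to simple arithmetic. Here is a rough sketch of it in Python. The function names and the `grace_days` parameter are made up for illustration and are not part of BOINC or Rosetta; the sketch also assumes the task is the only one running, so it ignores the task switching Michael mentions.

```python
def wall_clock_days(compute_days: float, hours_on_per_day: float) -> float:
    """Calendar days needed to finish a task that needs `compute_days` of
    crunching on a machine that only crunches `hours_on_per_day` each day.
    Assumption: the task never shares the machine with another task."""
    if not 0 < hours_on_per_day <= 24:
        raise ValueError("hours_on_per_day must be in (0, 24]")
    return compute_days * 24 / hours_on_per_day


def meets_deadline(compute_days: float, hours_on_per_day: float,
                   deadline_days: float, grace_days: float = 0.0) -> bool:
    """True if the task comes back within the deadline plus an optional
    (hypothetical) grace period."""
    needed = wall_clock_days(compute_days, hours_on_per_day)
    return needed <= deadline_days + grace_days


if __name__ == "__main__":
    # The numbers from the thread: 1.5 days of work, 3-day deadline.
    for hours in (8, 12, 24):
        needed = wall_clock_days(1.5, hours)
        on_time = meets_deadline(1.5, hours, 3)
        print(f"{hours:>2} h/day: needs {needed} days, on time: {on_time}")
```

Under these assumptions the break-even point is 12 hours of crunching per day. A machine that is on 8 hours a day would need 4.5 days and miss a three-day deadline, but would make it with a two-day grace period.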
©2024 University of Washington
https://www.bakerlab.org