Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 679 |
"Validation error" No idea what it was. https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
"Validation error" The other task from the same workunit also failed. Therefore, an error in one or more of the input files is a likely cause, even though the stderr output of the two tasks said nothing very useful about just what the error was. |
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 679 |
Yes, but it was an Unhandled Exception error at the start. I used to get some of those related to my machine a while ago. Mine ran for a lot longer with no apparent errors. Just thought I'd report here. |
Falconet Send message Joined: 9 Mar 09 Posts: 354 Credit: 1,276,393 RAC: 679 |
"Validation error" Another one https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1131803173 Just "validation error". Nothing seemingly wrong on the log. |
sph Send message Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041 Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points. The other 50% of Rosetta tasks on this PC work as expected. All other PCs are fine. I have removed Rosetta from the PC and run other projects, which work as expected. Re-added Rosetta. Tasks worked well for 4 days, then reverted back to the above failure pattern. The PC has no detected issues. As Rosetta is working well on my other PCs, the error is obviously only shown under specific conditions. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
Issue with 50% of all Rosetta tasks on this PC: [snip] This looks like most of the points were based on the number of decoys completed, and NOT on the amount of CPU time used. You might check if this also holds for your other computers. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
Issue with 50% of all Rosetta tasks on this PC: Just had a look at those WUs on my systems, and there are some WUs that pay out considerably less Credit than others, but nowhere near as low as what yours are doing. And some of those with low Credit have produced many more Decoys than some of those with much higher Credit. The only difference i can see is that i've processed a lot more of them- more cores & threads in use & using the default processing time. The benchmarks on that system are OK, and the system isn't losing time to doing non-crunching work, so i can't think of any particular reason for such a variation in Credit granted (although i do recall that someone had a host several months back that was exhibiting similar odd Credit payouts, but i can't remember the result of that particular issue). The amount of Credit granted depends on the amount of work done- which is the number of Models completed. 2 WUs of the same type running on the same system running for the same length of time may complete a similar number of Models, but one may produce only 1 Decoy, the other may produce hundreds. But both should get similar amounts of Credit as they did similar amounts of work (number of Models completed), even though the number of Decoys produced is different. Processing a Task for a longer period will result in more Credit for that Task- but the Credit per hour will still be on par with processing it for a much shorter period of time. The only way to get more Credit per hour is more cores & threads, and/or higher clock speed and/or greater IPC (Instructions Per Clock). Grant Darwin NT |
sph Send message Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: Just had a look at those WUs on my systems, and there are some WUs that pay out considerably less Credit than others, but nowhere near as low as what yours are doing. Hi Grant, +1 on all feedback. This is an older-gen PC that has been able to contribute at the level expected of this generation of PC. The latest optimisation of the WU seems to have introduced issues peculiar to this PC config. It is a Linux VirtualBox VM on a Windows host, whereas an identical PC running Linux on the host is running fine. If the Admins cannot track the issue, I may just format the host as Linux and be done with it. Some issues don't justify the time required to debug. Just hoping others may have also seen similar issues. EDITS: fix typo and format |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. The lines in the output like `BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC` come from the watchdog ending the tasks 10 hours after their target 4-hour run time. It’s odd that they validate as successful under those circumstances. That machine is running the 32-bit Rosetta application, which I suspect doesn’t get much testing these days. Perhaps there’s a bug in the application itself, or some compatibility issue with the OS environment, or even something strange going on with the virtualisation. Hard to say. |
sph Send message Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. The lines in the output like [snip] Hi Brian. Saw the same message (and unusual error messages in other WUs) and the unexpected successful completion... hence my hunch on an error in the app. Didn't think of the 32-bit angle, thanks for highlighting this aspect. This may definitely be a contributing factor. Looks more and more like a format for this machine, but I won't be able to schedule this for 2-3 weeks, so will continue to tinker with it till then. |
sph Send message Joined: 27 Mar 20 Posts: 7 Credit: 17,359,964 RAC: 0 |
Issue with 50% of all Rosetta tasks on this PC: Further information on this issue: If I abort these tasks after 8 hours, credit is awarded at the expected level for the work completed. I can only assume the aborted tasks would otherwise have resulted in the low credit level, but based on the current trend for this PC, this is a safe assumption. The credit is not awarded immediately, but is awarded before the task is completed by another PC. |
Ross Parlette Send message Joined: 10 Nov 05 Posts: 32 Credit: 2,165,044 RAC: 0 |
I've only had a handful of tasks for the last few days, only two today. Am I missing something? Ross |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 677 |
I've only had a handful of tasks for the last few days, only two today. Am I missing something? That's been normal for about a week. The server status page indicates that few tasks are ready to send, but many are in progress. In other words, the number of user requests for tasks greatly exceeds the number of tasks created. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
I've only had a handful of tasks for the last few days, only two today. Am I missing something? There hasn't been any new work available for around 5 days now*. Every so often you might be lucky enough to pick up a resend when some other system misses its deadline & the Work Unit is re-issued. *I know of one person that did actually get allocated a new Work Unit, but it errored out as it wasn't there to be downloaded. Grant Darwin NT |
Kissagogo27 Send message Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 922 |
Sometimes we can get some WUs, like yesterday: 10 on 1 Nov 2020, 21:25:35 UTC |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276494 and https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276019 blew up and crashed BOINC and made a mess of my system. First one: "file name too long" and out-of-memory errors. Second one: Status Access Violation, probably because of the first one. |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
Phantom tasks? These two tasks are supposedly "In progress", but I cannot find them in Boinc Manager. Updating the project does nothing. drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0 rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0 These aren't even in the BOINC project folder. |
Brian Nixon Send message Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Phantom tasks? I’ve seen this happen recently, too. Don’t know what was going on. I assume they just timed out and the server resent them to a different host. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1735 Credit: 18,532,940 RAC: 14,716 |
Phantom tasks? This can happen when there are network issues- Rosetta has allocated you the work during a Scheduler request, but for some reason your system didn't get that Scheduler reply, so you didn't download the work. The Task list for your Rosetta account shows you have the work, but there is no indication of it on your system. BOINC does support reissuing of missing Tasks, but it has to be enabled by the project. However, due to the significant Scheduler server overhead in doing such work, it is usually disabled by projects. It is possible to manually recover such Tasks, but it involves a lot of mucking around, with excellent attention to timing required. If you had hundreds of them, it'd be worth giving it a go. For just a couple, i wouldn't bother. They'll time out and be resent. Ghost Task recovery procedure. Grant Darwin NT |
Tomcat雄猫 Send message Joined: 20 Dec 14 Posts: 180 Credit: 5,386,173 RAC: 0 |
This can happen when there are network issues Welcome to my daily life. That and power issues. That explains it, I'll just not bother. It's only two tasks. Thanks. |
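
Grant's point earlier in the thread, that Credit is best compared per hour rather than per Task, can be made concrete with the figures sph reported (about 50000 seconds of run time for 6.5 to 45 points). This is only a rough sketch: `credit_per_hour` is a hypothetical helper, not a BOINC or Rosetta function, and it uses reported run time as a stand-in for work done.

```python
def credit_per_hour(credit, cpu_seconds):
    """Credit granted per hour of CPU time (hypothetical helper)."""
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    return credit / (cpu_seconds / 3600.0)


# Figures reported by sph for the affected host.
for credit in (6.5, 45.0):
    rate = credit_per_hour(credit, 50000)
    print(f"{credit} credits in 50000 s -> {rate:.2f} credits/hour")
```

Running the same calculation on Tasks from a healthy host gives a baseline to compare against, which is what robertmiles suggested checking.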
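
Brian Nixon's reading of the watchdog line can be checked with a little arithmetic. The sketch below is not part of Rosetta or BOINC: the function name `parse_cpu_time_line` and the regular expression are assumptions based only on the single log line quoted in this thread, and the real format may vary between application versions.

```python
import re

# Pattern inferred from the one line quoted in this thread (an assumption,
# not a documented log format).
CPU_TIME_RE = re.compile(
    r"CPU time:\s*([\d.]+)s,\s*([\d.]+)s\s*\+\s*([\d.]+)s"
)


def parse_cpu_time_line(line):
    """Return (cpu_seconds, first_term, second_term), or None if no match."""
    m = CPU_TIME_RE.search(line)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())


line = "BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC"
cpu, first, second = parse_cpu_time_line(line)
limit = first + second  # 36000 s + 14400 s
print(f"CPU time {cpu / 3600:.2f} h, limit {limit / 3600:.1f} h, "
      f"over limit: {cpu > limit}")
```

Here 14400 s corresponds to the 4-hour target and 36000 s to the 10 extra hours Brian mentions. Which term is which is inferred from his post, not from BOINC documentation.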
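
The "phantom" Tasks discussed above are Tasks the server lists as in progress for a host but that the client never received. Assuming the Task names have been copied by hand from the website's Task list and from BOINC Manager into two lists (nothing here queries the server or the client), spotting them is a set difference. `find_ghost_tasks` is a hypothetical name used for illustration.

```python
def find_ghost_tasks(server_in_progress, local_tasks):
    """Names the server lists as in progress that are absent locally, sorted."""
    return sorted(set(server_in_progress) - set(local_tasks))


# The two Task names Tomcat reported as "In progress" on the website.
server_in_progress = [
    "drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0",
    "rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0",
]
local_tasks = []  # neither Task appeared in BOINC Manager

for name in find_ghost_tasks(server_in_progress, local_tasks):
    print("ghost:", name)
```

With only a couple of names in the result, Grant's advice applies: let them time out and be resent.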
©2025 University of Washington
https://www.bakerlab.org