Problems and Technical Issues with Rosetta@home

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 679
Message 99097 - Posted: 22 Sep 2020, 21:20:52 UTC

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236
ID: 99097 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 677
Message 99112 - Posted: 23 Sep 2020, 0:42:18 UTC - in response to Message 99097.  

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236

The other task from the same workunit also failed. Therefore, an error in one or more of the input files is a likely cause, even though the stderr output of the two tasks said nothing very useful about just what the error was.
ID: 99112 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 679
Message 99114 - Posted: 23 Sep 2020, 8:36:57 UTC - in response to Message 99112.  

Yes, but it was an Unhandled Exception error at the start. I used to get some of those related to my machine a while ago.


Mine ran for a lot longer with no apparent errors.

Just thought I'd report here.
ID: 99114 · Rating: 0
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 679
Message 99127 - Posted: 25 Sep 2020, 19:23:40 UTC - in response to Message 99097.  

"Validation error"

No idea what it was.


https://boinc.bakerlab.org/rosetta/result.php?resultid=1263056236



Another one https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1131803173

Just "validation error". Nothing seemingly wrong in the log.
ID: 99127 · Rating: 0
sph
Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99234 - Posted: 4 Oct 2020, 2:43:38 UTC

Issue with 50% of all Rosetta tasks on this PC:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points. The other 50% of Rosetta tasks on this PC work as expected.
All other PCs are fine.
I have removed Rosetta from the PC and run other projects, which work as expected.
Re-added Rosetta. Tasks worked well for 4 days, then reverted to the above failure pattern.
The PC has no detected issues.
As Rosetta is working well on my other PCs, the error obviously only shows up under specific conditions.
ID: 99234 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 677
Message 99236 - Posted: 4 Oct 2020, 3:02:37 UTC - in response to Message 99234.  
Last modified: 4 Oct 2020, 3:04:02 UTC

Issue with 50% of all Rosetta tasks on this PC.:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points. The other 50% of Rosetta tasks on this PC work as expected.

[snip]

This looks like most of the points were based on the number of decoys completed, and NOT on the amount of CPU time used.

You might check if this also holds for your other computers.
ID: 99236 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 99238 - Posted: 4 Oct 2020, 4:15:38 UTC - in response to Message 99234.  

Just had a look at those WUs on my systems, and there are some WUs that pay out considerably less Credit than others, but nowhere near as low as what yours are doing.
And some of those with low Credit have produced many more Decoys than some of those with much higher Credit.
The only difference i can see is that i've processed a lot more of them- more cores & threads in use & using the default processing time.

The benchmarks on that system are OK, and the system isn't losing time to doing non-crunching work, so i can't think of any particular reason for such a variation in Credit granted (although i do recall that someone had a host several months back that was exhibiting similar odd Credit payouts, but i can't remember the result of that particular issue).




The amount of Credit granted depends on the amount of work done- which is the number of Models completed.
2 WUs of the same type running on the same system running for the same length of time may complete a similar number of Models, but one may produce only 1 Decoy, the other may produce hundreds. But both should get similar amounts of Credit as they did similar amounts of work (number of Models completed), even though the number of Decoys produced is different.

Processing a Task for a longer period will result in more Credit for that Task- but the Credit per hour will still be on par with processing it for a much shorter period of time. The only way to get more Credit per hour is more cores & threads, and/or higher clock speed and/or greater IPC (Instructions Per Clock).
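
To put rough numbers on the Credit-per-hour point, here is a minimal sketch in Python. It uses only the figures reported earlier in the thread (about 50,000 seconds of run time for 6.5 to 45 points). The function name `credit_per_hour` is made up for illustration and is not part of BOINC or Rosetta.

```python
def credit_per_hour(credit: float, run_time_seconds: float) -> float:
    """Return the credit granted per hour of run time."""
    if run_time_seconds <= 0:
        raise ValueError("run_time_seconds must be positive")
    return credit / (run_time_seconds / 3600.0)


# Figures reported in the thread for the affected host.
low = credit_per_hour(6.5, 50_000)
high = credit_per_hour(45.0, 50_000)
print(f"{low:.2f} to {high:.2f} credit per hour")
```

Comparing this rate between a suspect Task and a normal Task of the same type on the same host is a quick way to tell whether a low payout is just a short run or a genuinely poor rate.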
Grant
Darwin NT
ID: 99238 · Rating: 0
sph
Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99240 - Posted: 4 Oct 2020, 10:39:41 UTC - in response to Message 99238.  
Last modified: 4 Oct 2020, 10:42:31 UTC

Hi Grant
+1 on all feedback.
This is an older-gen PC that has been able to contribute at the level expected of its generation. The latest optimisation of the WU seems to have introduced issues peculiar to this PC config. It is a Linux VirtualBox VM on a Windows host, whereas an identical PC running Linux on the host is running fine.
If the Admins cannot track down the issue, I may just format the host as Linux and be done with it. Some issues don't justify the time required to debug.
Just hoping others may have also seen similar issues.

EDITS: fix typo and format
ID: 99240 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99241 - Posted: 4 Oct 2020, 11:30:10 UTC - in response to Message 99234.  

Issue with 50% of all Rosetta tasks on this PC.:
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=4975041
Tasks will run for 50000 seconds yet only yield 6.5 points to 45 points.
There’s a problem somewhere that’s causing those tasks to get stuck without performing much useful work. The lines in the output like
BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC
come from the watchdog ending the tasks 10 hours after their target 4-hour run time. It’s odd that they validate as successful under those circumstances.

That machine is running the 32-bit Rosetta application, which I suspect doesn’t get much testing these days. Perhaps there’s a bug in the application itself, or some compatibility issue with the OS environment, or even something strange going on with the virtualisation. Hard to say.
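
For anyone who wants to scan their own stderr output for the same pattern, here is a small Python sketch that pulls the three numbers out of the line quoted above. Reading the two trailing figures as a 10-hour allowance (36000s) plus the 4-hour target (14400s) follows the explanation in this post; the regular expression and the name `parse_cpu_time_line` are assumptions for illustration, not anything shipped with Rosetta.

```python
import re

# Matches the line quoted above: total CPU time, then two further figures in seconds.
LINE_RE = re.compile(
    r"BOINC:: CPU time: (?P<cpu>[\d.]+)s, (?P<grace>\d+)s \+ (?P<target>\d+)s"
)


def parse_cpu_time_line(line: str) -> tuple[float, int, int]:
    """Return (cpu_seconds, grace_seconds, target_seconds) from a CPU time line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a 'BOINC:: CPU time' line")
    return float(match["cpu"]), int(match["grace"]), int(match["target"])


line = "BOINC:: CPU time: 50422.3s, 36000s + 14400s[2020-10- 4 18:25: 5:] :: BOINC"
cpu, grace, target = parse_cpu_time_line(line)
limit = grace + target  # 36000 + 14400 = 50400 seconds
print(f"CPU time {cpu}s, limit {limit}s, over limit: {cpu >= limit}")
```

A task whose CPU time sits just above the sum of the two figures, as here, fits the picture of a task that was stopped by the watchdog and did not finish on its own.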
ID: 99241 · Rating: 0
sph
Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99243 - Posted: 4 Oct 2020, 12:18:28 UTC - in response to Message 99241.  
Last modified: 4 Oct 2020, 12:21:28 UTC

Hi Brian
Saw the same message (and unusual error messages in other WUs) and the unexpected successful completion... hence my hunch on an error in the app.
Didn't think of the 32-bit angle, thanks for highlighting this aspect. This may definitely be a contributing factor.

Looks more and more like a format for this machine, but I won't be able to schedule this for 2-3 weeks... so will continue to tinker with it till then.
ID: 99243 · Rating: 0
sph
Joined: 27 Mar 20
Posts: 7
Credit: 17,359,964
RAC: 0
Message 99244 - Posted: 5 Oct 2020, 2:56:09 UTC - in response to Message 99234.  

Further information on this issue:
If I abort these tasks after 8 hours, credit is awarded at the level expected for the work completed. I can only assume the aborted tasks would otherwise have ended up at the low credit level, but based on the current trend for this PC, this is a safe assumption.
The credit is not awarded immediately, but it is awarded before the task is completed by another PC.
ID: 99244 · Rating: 0
Ross Parlette
Joined: 10 Nov 05
Posts: 32
Credit: 2,165,044
RAC: 0
Message 99484 - Posted: 1 Nov 2020, 23:21:44 UTC

I've only had a handful of tasks for the last few days, only two today. Am I missing something?

Ross
ID: 99484 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 677
Message 99485 - Posted: 1 Nov 2020, 23:27:48 UTC - in response to Message 99484.  

I've only had a handful of tasks for the last few days, only two today. Am I missing something?

Ross

That's been normal for about a week.

The server status page indicates that few tasks are ready to send, but many are in progress.

In other words, the number of user requests for tasks greatly exceeds the number of tasks created.
ID: 99485 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 99499 - Posted: 2 Nov 2020, 9:13:16 UTC - in response to Message 99484.  

I've only had a handful of tasks for the last few days, only two today. Am I missing something?
There hasn't been any new work available for around 5 days now*.
Every so often you might be lucky enough to pick up a resend when some other system misses its deadline & the Work Unit is re-issued.




*I know of one person that did actually get allocated a new Work Unit, but it errored out as it wasn't there to be downloaded.
Grant
Darwin NT
ID: 99499 · Rating: 0
Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,981,693
RAC: 922
Message 99500 - Posted: 2 Nov 2020, 10:31:53 UTC

Sometimes we can still get some WUs, like yesterday: 10 on 1 Nov 2020, 21:25:35 UTC.
ID: 99500 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 99547 - Posted: 3 Nov 2020, 19:37:53 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276494

and

https://boinc.bakerlab.org/rosetta/result.php?resultid=1287276019

blew up and crashed BOINC and made a mess of my system.

First one: "file name too long" and out-of-memory errors.
Second one: Status Access Violation, probably because of the first one.
ID: 99547 · Rating: 0
Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 99596 - Posted: 7 Nov 2020, 0:29:17 UTC
Last modified: 7 Nov 2020, 0:29:37 UTC

Phantom tasks?
These two tasks are supposedly "In progress", but I cannot find them in BOINC Manager. Updating the project does nothing.
drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0
rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0
These aren't even in the BOINC project folder.
ID: 99596 · Rating: 0
Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99597 - Posted: 7 Nov 2020, 1:10:48 UTC - in response to Message 99596.  

Phantom tasks?
I’ve seen this happen recently, too. Don’t know what was going on. I assume they just timed out and the server resent them to a different host.
ID: 99597 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1735
Credit: 18,532,940
RAC: 14,716
Message 99598 - Posted: 7 Nov 2020, 1:40:47 UTC - in response to Message 99596.  
Last modified: 7 Nov 2020, 1:42:53 UTC

Phantom tasks?
These two tasks are supposedly "In progress", but I cannot find them in Boinc Manager. Updating the project does nothing.
drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0
rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0
These aren't even in the BOINC project folder.

This can happen when there are network issues- Rosetta has allocated you the work during a Scheduler request, but for some reason your system didn't get that Scheduler reply, so you didn't download the work. The Task list for your Rosetta account shows you have the work, but there is no indication of it on your system.

BOINC does support reissuing of missing Tasks, but it has to be enabled by the project. However, due to the significant Scheduler server overhead in doing such work, it is usually disabled by projects.


It is possible to manually recover such Tasks, but it involves a lot of mucking around, with excellent attention to timing required. If you had hundreds of them, it'd be worth giving it a go. For just a couple, i wouldn't bother. They'll time out and be resent.

Ghost Task recovery procedure.
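
Spotting ghosts in the first place comes down to comparing two lists: the Task names your account page shows as "In progress" and the Task names your client actually holds. A minimal Python sketch of that comparison follows. How you collect the two lists is up to you (copying them by hand is fine). The function name `find_ghost_tasks` and the third Task name are made up for the example and are not part of BOINC.

```python
def find_ghost_tasks(server_in_progress, local_tasks):
    """Return names the server lists as in progress that the client does not have."""
    local = set(local_tasks)
    return [name for name in server_in_progress if name not in local]


# The first two names are the ones reported in this thread; the third is a
# made-up placeholder for a Task that did arrive on the client.
server_in_progress = [
    "drhicks1_derroids_torricks_fd2_SAVE_ALL_OUT_IGNORE_THE_REST_4za6sf5o_1021338_3_0",
    "rb_11_04_43158_42385__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1021346_50_0",
    "hypothetical_task_present_locally_0",
]
local_tasks = ["hypothetical_task_present_locally_0"]

ghosts = find_ghost_tasks(server_in_progress, local_tasks)
for name in ghosts:
    print("ghost:", name)
```

This only identifies the missing Tasks. It does nothing to recover them.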
Grant
Darwin NT
ID: 99598 · Rating: 0
Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 99599 - Posted: 7 Nov 2020, 1:53:10 UTC - in response to Message 99598.  
Last modified: 7 Nov 2020, 1:56:41 UTC

This can happen when there are network issues

Welcome to my daily life. That and power issues.
That explains it, I'll just not bother. It's only two tasks.
Thanks.
ID: 99599 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org