Problems and Technical Issues with Rosetta@home

dagamier
Joined: 12 Dec 05
Posts: 8
Credit: 2,942,707
RAC: 1,311
Message 112288 - Posted: 26 Mar 2025, 17:42:23 UTC

Why is it that in the Tasks tab, all of my work units show that they will be late (Completion before deadline) and already are late based on the deadline column? I'm on a 12 core Mac and all of my other projects burn through units, but the Rosetta one consistently has them all run late. I've even tried tweaking settings to give it a higher priority, but they're still consistently late.

Bill F
Joined: 29 Jan 08
Posts: 54
Credit: 1,778,862
RAC: 1,111
Message 112289 - Posted: 26 Mar 2025, 18:10:29 UTC - in response to Message 112288.  

Why is it that in the Tasks tab, all of my work units show that they will be late (Completion before deadline) and already are late based on the deadline column? I'm on a 12 core Mac and all of my other projects burn through units, but the Rosetta one consistently has them all run late. I've even tried tweaking settings to give it a higher priority, but they're still consistently late.



The Tasks that your computer has completed have a very long compute time. Under your Rosetta account, look at your Rosetta preferences and see what you have set for "Target CPU run time". The standard is 8 hours.

Sid Celery
Joined: 11 Feb 08
Posts: 2292
Credit: 43,161,777
RAC: 25,399
Message 112290 - Posted: 26 Mar 2025, 18:48:42 UTC - in response to Message 112289.  

Why is it that in the Tasks tab, all of my work units show that they will be late (Completion before deadline) and already are late based on the deadline column? I'm on a 12 core Mac and all of my other projects burn through units, but the Rosetta one consistently has them all run late. I've even tried tweaking settings to give it a higher priority, but they're still consistently late.

The Tasks that your computer has completed have a very long compute time. Under your Rosetta account, look at your Rosetta preferences and see what you have set for "Target CPU run time". The standard is 8 hours.

Yes, it's definitely that.
It looks like the Targeted CPU runtime is set at "1 day" (24hrs) while it's also taking 30hrs of 'wall clock time' to complete that 24hrs of CPU runtime.

While I've just argued that runtimes below the 8hrs that Boinc schedules for should be increased to an explicit 8hrs, runtimes that are more than 8hrs and, importantly, are missing deadlines do need to be reduced, but not necessarily all the way down to 8hrs.
I run all my tasks <successfully> with a 12hr runtime, so I'd reduce to that level first and I think it will work even if the wall clock time requires 15hrs to complete them.
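
A rough sketch of that arithmetic, assuming wall-clock time scales linearly with the Target CPU runtime (that linearity is an assumption, not something measured; the 30hrs-per-24hrs ratio is the one observed above, and the function name is only illustrative):

```python
def projected_wall_hours(target_cpu_hours, observed_cpu_hours=24.0, observed_wall_hours=30.0):
    """Scale a target CPU runtime by the observed wall-clock/CPU ratio."""
    ratio = observed_wall_hours / observed_cpu_hours  # 30 / 24 = 1.25 for this host
    return target_cpu_hours * ratio


# Compare the current 24hr target with the 12hr and 8hr alternatives.
for target in (24, 12, 8):
    wall = projected_wall_hours(target)
    print(f"{target}hr CPU target -> roughly {wall:.1f}hrs wall clock")
```

Under that assumption a 12hr target needs roughly 15hrs of wall-clock time per task.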

The downside is that more tasks will be required to run Rosetta for the same throughput at a time when they're few and far between, but it's other users who are the far bigger problem, as described in my earlier post this evening.

Bill Swisher
Joined: 10 Jun 13
Posts: 59
Credit: 46,885,830
RAC: 134,589
Message 112291 - Posted: 26 Mar 2025, 21:02:41 UTC - in response to Message 112287.  
Last modified: 26 Mar 2025, 21:03:29 UTC

I currently have my buffer set to store at least 0.15 and up to 0.25 additional days of work...
...
just a 0.1 plus 0.1 cache size and 50% CPUs...
...
3 hours (Rosetta Beta I think) should be set explicitly at 8h

Me jumping in late, per normal. I realize there are strong opinions about the buffer size. But I have mine defined as 3 days +.01 for one simple reason...
I'm often away from the computers for 4 days at a time and I have had them run out of work. I firmly believe that idle jiffies are the devil's playground.
Having said that, I keep mine, don't even need to take my socks off so I can count all 8* of them, at 100% CPU 24X7. I've also gone in and removed all the limitations on run-time, what they want is what they get. Thanks to information provided by the wise folks here I'm controlling which projects get priority via other means.
*2 will go back to Alaska, where I live and where there are already 4 running, in a little less than 2 weeks, and 2 of them will go into summer hibernation here in Arizona until around Halloween, when I return.

Sid Celery
Joined: 11 Feb 08
Posts: 2292
Credit: 43,161,777
RAC: 25,399
Message 112292 - Posted: 27 Mar 2025, 3:29:43 UTC - in response to Message 112291.  

I currently have my buffer set to store at least 0.15 and up to 0.25 additional days of work...
...
just a 0.1 plus 0.1 cache size and 50% CPUs...
...
3 hours (Rosetta Beta I think) should be set explicitly at 8h

Me jumping in late, per normal. I realize there are strong opinions about the buffer size. But I have mine defined as 3 days +.01 for one simple reason...
I'm often away from the computers for 4 days at a time and I have had them run out of work. I firmly believe that idle jiffies are the devil's playground.
Having said that, I keep mine, don't even need to take my socks off so I can count all 8* of them, at 100% CPU 24X7. I've also gone in and removed all the limitations on run-time, what they want is what they get. Thanks to information provided by the wise folks here I'm controlling which projects get priority via other means.
*2 will go back to Alaska, where I live and where there are already 4 running, in a little less than 2 weeks, and 2 of them will go into summer hibernation here in Arizona until around Halloween, when I return.

Sorry, but this really makes no sense at all.
If deadlines are 3 days and you fill your buffer to 3 days and they have to run for the default 8 hours after downloading then *ALL* will miss deadline by design.
They don't for you, for one reason and one reason alone.
The default runtime for these Rosetta Beta tasks, which are the majority, turns out to be 3hrs, not 8hrs, so while Boinc *thinks* it's filling your cache to 3 days at the point of download, it only turns out to be ~1.125 days.
You must see this.

My small cache thing is just me. You can have larger if you want. That's not the problem, up to a total of, say, 2.5 days (+8hrs runtime = < 3days).
If they ever get round to fixing this 3hr default issue, you'll find out pretty quickly.
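
To put numbers on that, here is a minimal sketch. It assumes a simplified model in which a freshly downloaded task waits behind a completely full buffer before it gets to run; the function names are made up for illustration and are not part of Boinc.

```python
DEADLINE_DAYS = 3.0
ESTIMATED_HOURS = 8.0  # runtime Boinc assumes for each task before it starts
ACTUAL_HOURS = 3.0     # runtime the Rosetta Beta tasks actually settle on


def real_cache_days(buffer_days, estimated_hours=ESTIMATED_HOURS, actual_hours=ACTUAL_HOURS):
    """Work actually held when the buffer was filled using the 8hr estimate."""
    return buffer_days * actual_hours / estimated_hours


def finish_days(buffer_days, runtime_hours):
    """Days after download at which a task at the back of a full buffer completes."""
    return buffer_days + runtime_hours / 24.0


print(real_cache_days(3.0))                    # a nominal 3-day buffer in real terms
print(finish_days(3.0, 8.0) > DEADLINE_DAYS)   # 3 days of real 8hr work: misses deadline
print(finish_days(2.5, 8.0) < DEADLINE_DAYS)   # 2.5 days plus 8hrs: fits inside 3 days
```

The first line gives the ~1.125 days mentioned above; the other two show why a genuine 3-day buffer of 8hr tasks would run past a 3-day deadline while 2.5 days would not.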

The fact you're away for 4 days at a time shouldn't be relevant. Boinc polls for tasks several times a day. It will only fail if there are no tasks to grab, in which case that'll apply if you're there or not.

Sid Celery
Joined: 11 Feb 08
Posts: 2292
Credit: 43,161,777
RAC: 25,399
Message 112293 - Posted: 27 Mar 2025, 3:39:24 UTC - in response to Message 112287.  
Last modified: 27 Mar 2025, 3:46:38 UTC

I'm sure everyone already knows that boinc-process is down again - being Wednesday. I'd estimate about 10hrs ago.
Crossing fingers that it may come back early again, like it did last week when it returned on Thursday am (UTC) rather than Friday

Edit: weird thing, but the assimilation backlog seems to have cleared down to zero at about the same time as validators went down. No idea what that's about

It's happened again.
Early Thursday am UTC and boinc-process is back for validating (still 128k backlog atm).
And Assimilators are still running (zero backlog)

Plus, In progress tasks have jumped ~40k from 106k to 148k
Still 0 ready to send, but I've somehow already got a full cache here

Positive news right now at least. I'll look again in the morning.

Edit: In progress just updated to 162k with 4674 showing ready to send.
Edit 2: And validation backlog down to just 11k
Could everything really be fixed, or is it just a false dawn?

Bill Swisher
Joined: 10 Jun 13
Posts: 59
Credit: 46,885,830
RAC: 134,589
Message 112294 - Posted: 27 Mar 2025, 5:02:29 UTC - in response to Message 112292.  

Consider me confused. Here are the properties of the last Rosetta that I downloaded, hopefully the formatting isn't terrible, it says it wants 8 hours of processing time.

Application Rosetta Beta 6.06
Name 11mrredo_11_hallucinated_138_30_SAVE_ALL_OUT_3013178_65
State Ready to start
Received Wed 26 Mar 2025 10:07:34 AM MST
Report deadline Sat 29 Mar 2025 10:07:34 AM MST
Estimated computation size 80,000 GFLOPs
Executable rosetta_beta_6.06_x86_64-pc-linux-gnu

I looked at another computer and it's pretty much a mirror of this also. In the meantime this computer is running WCG jobs that appear to have been downloaded 3 days ago.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1804
Credit: 18,534,891
RAC: 2
Message 112295 - Posted: 27 Mar 2025, 7:56:00 UTC - in response to Message 112287.  

Edit: weird thing, but the assimilation backlog seems to have cleared down to zero at about the same time as validators went down. No idea what that's about
The Assimilators clear Tasks (move them from here to the main science database) once they have been Validated. No more Validation, no more new work for the Assimilators to do.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1804
Credit: 18,534,891
RAC: 2
Message 112296 - Posted: 27 Mar 2025, 8:09:55 UTC - in response to Message 112288.  

Why is it that in the Tasks tab, all of my work units show that they will be late (Completion before deadline) and already are late based on the deadline column? I'm on a 12 core Mac and all of my other projects burn through units, but the Rosetta one consistently has them all run late. I've even tried tweaking settings to give it a higher priority, but they're still consistently late.
Because it takes you 30 hours to do only 24 hours of work.
eg
Run time 1  days  6 hours 11 min 55 sec
CPU time         23 hours 59 min 22 sec

And it takes you 3.5 days to return work, when the deadlines are only 3 days.
When running more than one project, there is no need for a large cache; 0.1 days plus 0.01 additional days is plenty.

You should also figure out why it's taking you so long to process a Task: either your system is busy doing other computationally intensive work as well as BOINC, or you have set "Use at most 100 % of CPU time" in your Usage Limits settings to something less than 100% (this is an option that really should be removed). Set it to 100% and be done with it. If you have cooling issues with your system, fix them. If it's a laptop, then just limit the number of cores/threads BOINC can use.
On a lightly used system, the difference between CPU time and Run time shouldn't be much more than 5-10mins for a Target time of 24hrs.
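
For what it's worth, the size of the gap can be worked out from the two figures quoted above. A small sketch using only Python's standard library (the variable names are illustrative):

```python
from datetime import timedelta

# Figures from the example task above.
run_time = timedelta(days=1, hours=6, minutes=11, seconds=55)  # wall-clock time
cpu_time = timedelta(hours=23, minutes=59, seconds=22)         # CPU time actually used

# Dividing one timedelta by another gives a plain float ratio.
efficiency = cpu_time / run_time
overhead = run_time - cpu_time

print(f"CPU efficiency: {efficiency:.1%}")
print(f"Time not spent computing: {overhead}")
```

That works out to a bit under 80% efficiency, with a little over 6 hours per task spent waiting rather than crunching, compared with the 5-10 minutes expected on a lightly used system.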
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1804
Credit: 18,534,891
RAC: 2
Message 112297 - Posted: 27 Mar 2025, 8:17:57 UTC - in response to Message 112294.  

Consider me confused. Here are the properties of the last Rosetta that I downloaded, hopefully the formatting isn't terrible, it says it wants 8 hours of processing time.
It is hard coded by the project, regardless of how long it may actually run, and regardless of what your Target CPU time is.

Years back, due to problems with the initial Estimated completion times, people were getting smashed with 1,000s of Tasks they had no hope of doing in time. One of the suggestions was to set the initial Estimated completion time to the project's default value (which is 8 hours). The other suggestion (and the preferred option) was to set it to the project's default value only if no Target CPU time was set by the account holder. If they had set a Target CPU time, then that time would be used for their initial Estimated completion time.
Unfortunately, they went with the first option.
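
The two suggestions can be sketched as follows. This is only an illustration of the logic as described here, not the project's actual server code; the function and constant names are hypothetical.

```python
from typing import Optional

PROJECT_DEFAULT_HOURS = 8.0  # the project's default value mentioned above


def estimate_option_one(target_cpu_hours: Optional[float]) -> float:
    """First suggestion (the one adopted): always use the project default."""
    return PROJECT_DEFAULT_HOURS


def estimate_option_two(target_cpu_hours: Optional[float]) -> float:
    """Preferred suggestion: use the account's Target CPU time when one is set."""
    if target_cpu_hours is None:
        return PROJECT_DEFAULT_HOURS
    return target_cpu_hours


# An account with a 24hr target, and one with no preference set.
for target in (24.0, None):
    print(target, estimate_option_one(target), estimate_option_two(target))
```

With a 24hr target, the first option still estimates 8 hours while the second would estimate 24, which is why the first leads to fetching more work than the host can finish in time.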
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1804
Credit: 18,534,891
RAC: 2
Message 112298 - Posted: 27 Mar 2025, 8:23:38 UTC - in response to Message 112293.  

Edit: In progress just updated to 162k with 4674 showing ready to send.
Edit 2: And validation backlog down to just 11k
Could everything really be fixed, or is it just a false dawn?
I'm thinking false dawn.
The Assimilators appeared to do OK early on as the Validation backlog was cleared. And the amount of work In progress climbed nicely as there were Tasks Ready to Send- for a while.
But now the In progress work appears to have plateaued and is dropping again as the Tasks Ready to Send is pretty much back to 0 again, and the assimilator backlog continues to grow, again.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1804
Credit: 18,534,891
RAC: 2
Message 112300 - Posted: 27 Mar 2025, 11:19:01 UTC

Ready to send is still pretty much 0, In progress continues to decline, Assimilator backlog continues to grow.
It's still borked.
Grant
Darwin NT

Matthias Lehmkuhl
Joined: 20 Nov 05
Posts: 13
Credit: 2,573,510
RAC: 1,555
Message 112301 - Posted: 27 Mar 2025, 11:25:34 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running
Matthias


Sid Celery
Joined: 11 Feb 08
Posts: 2292
Credit: 43,161,777
RAC: 25,399
Message 112302 - Posted: 27 Mar 2025, 11:36:16 UTC - in response to Message 112294.  

Consider me confused. Here are the properties of the last Rosetta that I downloaded, hopefully the formatting isn't terrible, it says it wants 8 hours of processing time.

Application Rosetta Beta 6.06
Name 11mrredo_11_hallucinated_138_30_SAVE_ALL_OUT_3013178_65
State Ready to start
Received Wed 26 Mar 2025 10:07:34 AM MST
Report deadline Sat 29 Mar 2025 10:07:34 AM MST
Estimated computation size 80,000 GFLOPs
Executable rosetta_beta_6.06_x86_64-pc-linux-gnu

I looked at another computer and it's pretty much a mirror of this also. In the meantime this computer is running WCG jobs that appear to have been downloaded 3 days ago.

What Grant said is right.
Before the task is started the runtime shown to Boinc is hard-coded to be 8hrs. That's what you're showing us above.
But the moment the task starts running, the Rosetta Beta task's own internal runtime, which is set to 3hrs (mistakenly, imo), takes over.
Look at a running task to confirm that - as it progresses the remaining runtime veers toward 3hrs.
That's the <only> reason you don't miss deadlines.
Which is why I've suggested the changes to Target CPU runtime that I have.
Your tasks all run too short, you grab more tasks than you should, and if it wasn't for the large queue we currently have, which is an anomaly tbh, we'd all run out much quicker than we should.

Sid Celery
Joined: 11 Feb 08
Posts: 2292
Credit: 43,161,777
RAC: 25,399
Message 112303 - Posted: 27 Mar 2025, 11:43:03 UTC - in response to Message 112298.  

Edit: In progress just updated to 162k with 4674 showing ready to send.
Edit 2: And validation backlog down to just 11k
Could everything really be fixed, or is it just a false dawn?
I'm thinking false dawn.
The Assimilators appeared to do OK early on as the Validation backlog was cleared. And the amount of work In progress climbed nicely as there were Tasks Ready to Send- for a while.
But now the In progress work appears to have plateaued and is dropping again as the Tasks Ready to Send is pretty much back to 0 again, and the assimilator backlog continues to grow, again.

I'm now agreeing with you.
In progress has dropped back to currently 133k, Assimilator backlog increasing again and ready to send dropping nearer to zero <sigh>
It's the hope that kills you...
Thanks for explaining Assimilators - I never really understood that. More the project's problem than ours, so I'll stop caring about that one and leave it to them.
Except to the extent that it indicates an ongoing problem everywhere.
Nice while it lasted (a few hours)

Lem Novantotto
Joined: 13 Sep 23
Posts: 7
Credit: 1,713,174
RAC: 39,093
Message 112304 - Posted: 27 Mar 2025, 11:53:56 UTC - in response to Message 112301.  
Last modified: 27 Mar 2025, 12:30:54 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running


I have the exact same issue with 2 of my 3 computers (all running Linux).
Oddly enough, the third one is reporting its work without any problem.

All my three computers are behind the same router.
--
Bye, Lem

Jean-David Beyer
Joined: 2 Nov 05
Posts: 217
Credit: 7,399,159
RAC: 7,462
Message 112305 - Posted: 27 Mar 2025, 12:25:25 UTC - in response to Message 112301.  
Last modified: 27 Mar 2025, 12:29:16 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running


Me too. Running Linux.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2054
Credit: 10,773,622
RAC: 11,574
Message 112306 - Posted: 27 Mar 2025, 16:51:29 UTC

C'mon, Rosetta guys, there are over 9 million WUs to crunch.
Open the floodgates and let it flow

Greg_BE
Joined: 30 May 06
Posts: 5726
Credit: 5,966,803
RAC: 1,731
Message 112307 - Posted: 27 Mar 2025, 17:30:46 UTC - in response to Message 112306.  

9 mill? I don't see that. Be sure to subtract what is going to their AI system.
We just used up 1.33+ million tasks
With the current total of users with credit, that's barely 9 tasks processed per user.

And as for the system... well, what else is new? Seriously... SOS, if you know what I mean, and not the distress signal.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2054
Credit: 10,773,622
RAC: 11,574
Message 112308 - Posted: 27 Mar 2025, 17:34:27 UTC - in response to Message 112307.  

9 mill? I don't see that

Home page.

Be sure to subtract what is going to their AI system.

Their AI system is internal, so, no, you can't see the queue.



©2025 University of Washington
https://www.bakerlab.org