Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1807
Credit: 18,534,891
RAC: 2
Message 112298 - Posted: 27 Mar 2025, 8:23:38 UTC - in response to Message 112293.  

Edit: In progress just updated to 162k with 4674 showing ready to send.
Edit 2: And validation backlog down to just 11k
Could everything really be fixed, or is it just a false dawn?
I'm thinking false dawn.
The Assimilators appeared to do OK early on as the Validation backlog was cleared. And the amount of work In progress climbed nicely as there were Tasks Ready to Send- for a while.
But now the In progress work appears to have plateaued and is dropping again as the Tasks Ready to Send is pretty much back to 0 again, and the assimilator backlog continues to grow, again.
Grant
Darwin NT
ID: 112298 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1807
Credit: 18,534,891
RAC: 2
Message 112300 - Posted: 27 Mar 2025, 11:19:01 UTC

Ready to send is still pretty much 0, In progress continues to decline, Assimilator backlog continues to grow.
It's still borked.
Grant
Darwin NT
ID: 112300 · Rating: 0
Matthias Lehmkuhl
Joined: 20 Nov 05
Posts: 13
Credit: 2,573,510
RAC: 1,276
Message 112301 - Posted: 27 Mar 2025, 11:25:34 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running.
Matthias

ID: 112301 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2297
Credit: 43,239,371
RAC: 26,458
Message 112302 - Posted: 27 Mar 2025, 11:36:16 UTC - in response to Message 112294.  

Consider me confused. Here are the properties of the last Rosetta that I downloaded, hopefully the formatting isn't terrible, it says it wants 8 hours of processing time.

Application Rosetta Beta 6.06
Name 11mrredo_11_hallucinated_138_30_SAVE_ALL_OUT_3013178_65
State Ready to start
Received Wed 26 Mar 2025 10:07:34 AM MST
Report deadline Sat 29 Mar 2025 10:07:34 AM MST
Estimated computation size 80,000 GFLOPs
Executable rosetta_beta_6.06_x86_64-pc-linux-gnu

I looked at another computer and it's pretty much a mirror of this one. In the meantime this computer is running WCG jobs that appear to have been downloaded 3 days ago.

What Grant said is right.
Before a task starts, the runtime reported to BOINC is hard-coded to 8hrs. That's what you're showing us above.
But the moment the task starts running the Rosetta Beta task's own internal runtime, mistakenly imo, is set to 3hrs and takes over.
Look at a running task to confirm that - as it progresses the remaining runtime veers toward 3hrs.
That's the <only> reason you don't miss deadlines.
Which is why I make the suggested changes to Target CPU runtime that I have.
Your tasks all run too short, you grab more tasks than you should, and if it wasn't for the large queue we currently have, which is an anomaly tbh, we'd all run out much quicker than we should.
ID: 112302 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2297
Credit: 43,239,371
RAC: 26,458
Message 112303 - Posted: 27 Mar 2025, 11:43:03 UTC - in response to Message 112298.  

Edit: In progress just updated to 162k with 4674 showing ready to send.
Edit 2: And validation backlog down to just 11k
Could everything really be fixed, or is it just a false dawn?
I'm thinking false dawn.
The Assimilators appeared to do OK early on as the Validation backlog was cleared. And the amount of work In progress climbed nicely as there were Tasks Ready to Send- for a while.
But now the In progress work appears to have plateaued and is dropping again as the Tasks Ready to Send is pretty much back to 0 again, and the assimilator backlog continues to grow, again.

I'm now agreeing with you.
In progress has dropped back to currently 133k, Assimilator backlog increasing again and ready to send dropping nearer to zero <sigh>
It's the hope that kills you...
Thanks for explaining Assimilators - I never really understood that. More the project's problem than ours, so I'll stop caring about that one and leave it to them.
Except to the extent that it indicates an ongoing problem everywhere.
Nice while it lasted (a few hours)
ID: 112303 · Rating: 0
Lem Novantotto
Joined: 13 Sep 23
Posts: 7
Credit: 1,793,775
RAC: 37,364
Message 112304 - Posted: 27 Mar 2025, 11:53:56 UTC - in response to Message 112301.  
Last modified: 27 Mar 2025, 12:30:54 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running.


I have the exact same issue with 2 of my 3 computers (all running Linux).
Oddly enough, the third one is reporting its work without any problem.

All three of my computers are behind the same router.
--
Bye, Lem
ID: 112304 · Rating: 0
Jean-David Beyer
Joined: 2 Nov 05
Posts: 217
Credit: 7,413,302
RAC: 7,054
Message 112305 - Posted: 27 Mar 2025, 12:25:25 UTC - in response to Message 112301.  
Last modified: 27 Mar 2025, 12:29:16 UTC

I can't report my finished and uploaded work
message: Rosetta@home 27.03.2025 12:19:56 (CET) Server error: feeder not running

The project status page says the feeder is running.


Me too. Running Linux.
ID: 112305 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2054
Credit: 10,806,245
RAC: 12,028
Message 112306 - Posted: 27 Mar 2025, 16:51:29 UTC

C'mon, Rosetta guys, there are over 9 million WUs to crunch.
Open the floodgates and let it flow.
ID: 112306 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5740
Credit: 5,970,829
RAC: 1,701
Message 112307 - Posted: 27 Mar 2025, 17:30:46 UTC - in response to Message 112306.  

9 mill? I don't see that. Be sure to subtract what is going to their AI system.
We just used up 1.33+ million tasks.
Spread across the current total of users with credit, that is barely 9 tasks processed per user.

As for the system... well, what else is new? Seriously... SOS, if you know what I mean, and not the distress signal.
ID: 112307 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2054
Credit: 10,806,245
RAC: 12,028
Message 112308 - Posted: 27 Mar 2025, 17:34:27 UTC - in response to Message 112307.  

9 mill? I don't see that

Home page.

Be sure to subtract what is going to their AI system.

Their AI system is internal, so, no, you can't see the queue.
ID: 112308 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5740
Credit: 5,970,829
RAC: 1,701
Message 112309 - Posted: 27 Mar 2025, 17:34:51 UTC

Linux guys, I don't think your OS is the issue. If you can't get into the RAH feeder, the OS never comes into it: as far as I know the feeder sends out tasks for both OS types, and only once you're inside does it figure out which OS is asking and send work.
I'm on Windows and I get the same error as you. So it's not the OS.
ID: 112309 · Rating: 0
Lem Novantotto
Joined: 13 Sep 23
Posts: 7
Credit: 1,793,775
RAC: 37,364
Message 112310 - Posted: 28 Mar 2025, 0:04:04 UTC - in response to Message 112309.  

Linux guys, I don't think your OS is the issue. If you can't get into the RAH feeder, the OS never comes into it: as far as I know the feeder sends out tasks for both OS types, and only once you're inside does it figure out which OS is asking and send work.
I'm on Windows and I get the same error as you. So it's not the OS.


Surely not. One of my machines is reporting and getting work flawlessly. It has the same OS, same version, same software, same everything as the others.
Even the connection is the same: the OK machine has a wired connection to one of the KO machines, and the latter is wirelessly connected to the router.

I don't know how to configure the logs or what to look for... After some trial and error, this is what I see when connecting (I don't know if these logs are meaningful):

OK machine:
ven 28 mar 2025, 00:49:20 | | [work_fetch] No project chosen for work fetch
ven 28 mar 2025, 00:50:20 | Rosetta@home | rsc type 0 last MC limit time 345351.303849 total buf 371520.000000
ven 28 mar 2025, 00:50:20 | | [work_fetch] ------- start work fetch state -------
ven 28 mar 2025, 00:50:20 | | [work_fetch] target work buffer: 25920.00 + 345600.00 sec
ven 28 mar 2025, 00:50:20 | | [work_fetch] --- project states ---
ven 28 mar 2025, 00:50:20 | Rosetta@home | [work_fetch] REC 3611.800 prio -1.927 can request work
ven 28 mar 2025, 00:50:20 | | [work_fetch] --- state for CPU ---
ven 28 mar 2025, 00:50:20 | | [work_fetch] shortfall 66825.57 nidle 0.00 saturated 346899.60 busy 204796.13
ven 28 mar 2025, 00:50:20 | Rosetta@home | [work_fetch] share 1.000
ven 28 mar 2025, 00:50:20 | | [work_fetch] ------- end work fetch state -------
ven 28 mar 2025, 00:50:20 | Rosetta@home | choose_project: scanning
ven 28 mar 2025, 00:50:20 | Rosetta@home | can fetch CPU
ven 28 mar 2025, 00:50:20 | | [work_fetch] No project chosen for work fetch



KO machine:
ven 28 mar 2025, 00:44:17 | | [work_fetch] No project chosen for work fetch
ven 28 mar 2025, 00:45:18 | Rosetta@home | rsc type 0 last MC limit time 73122.227706 total buf 302400.000000
ven 28 mar 2025, 00:45:18 | Rosetta@home | rsc type 1 last MC limit time 0.000000 total buf 302400.000000
ven 28 mar 2025, 00:45:18 | | [work_fetch] ------- start work fetch state -------
ven 28 mar 2025, 00:45:18 | | [work_fetch] target work buffer: 95040.00 + 207360.00 sec
ven 28 mar 2025, 00:45:18 | | [work_fetch] --- project states ---
ven 28 mar 2025, 00:45:18 | Rosetta@home | [work_fetch] REC 8430.172 prio -0.017 can't request work: scheduler RPC backoff (2573.28 sec)
ven 28 mar 2025, 00:45:18 | | [work_fetch] --- state for CPU ---
ven 28 mar 2025, 00:45:18 | | [work_fetch] shortfall 2598075.15 nidle 0.00 saturated 74184.51 busy 28873.15
ven 28 mar 2025, 00:45:18 | Rosetta@home | [work_fetch] share 0.000
ven 28 mar 2025, 00:45:18 | | [work_fetch] --- state for AMD/ATI GPU ---
ven 28 mar 2025, 00:45:18 | | [work_fetch] shortfall 302400.00 nidle 1.00 saturated 0.00 busy 0.00
ven 28 mar 2025, 00:45:18 | Rosetta@home | [work_fetch] share 0.000
ven 28 mar 2025, 00:45:18 | | [work_fetch] ------- end work fetch state -------
ven 28 mar 2025, 00:45:18 | Rosetta@home | choose_project: scanning
ven 28 mar 2025, 00:45:18 | Rosetta@home | skip: scheduler RPC backoff
ven 28 mar 2025, 00:45:18 | | [work_fetch] No project chosen for work fetch

I have bolded some differences.
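
For anyone comparing excerpts like these by eye: the lines that differ are the project-state line ("can request work" versus "can't request work: scheduler RPC backoff") and the two-part "target work buffer" line. A minimal Python sketch that pulls both out of a pasted log is below. The function name `summarize` and the regular expressions are assumptions written against the exact line shapes shown above, not anything shipped with BOINC, and other client versions may format these lines differently.

```python
import re

# Per-project line inside a [work_fetch] dump, e.g.
# "... | Rosetta@home | [work_fetch] REC 3611.800 prio -1.927 can request work"
PROJECT_STATE = re.compile(
    r"\|\s*(?P<project>[^|]+?)\s*\|\s*\[work_fetch\] REC (?P<rec>[\d.]+) "
    r"prio (?P<prio>-?[\d.]+) (?P<state>.+)$"
)
# Buffer line, e.g. "target work buffer: 25920.00 + 345600.00 sec"
BUFFER = re.compile(
    r"target work buffer: (?P<first>[\d.]+) \+ (?P<second>[\d.]+) sec"
)

SECONDS_PER_DAY = 86400


def summarize(log_text):
    """Return the project, its fetch state, and the buffer sizes in days."""
    summary = {"project": None, "state": None, "buffer_days": None}
    for line in log_text.splitlines():
        match = PROJECT_STATE.search(line)
        if match:
            summary["project"] = match.group("project")
            summary["state"] = match.group("state").strip()
        match = BUFFER.search(line)
        if match:
            summary["buffer_days"] = (
                float(match.group("first")) / SECONDS_PER_DAY,
                float(match.group("second")) / SECONDS_PER_DAY,
            )
    return summary
```

Calling `summarize()` on each excerpt and comparing the `state` values side by side makes the backoff on the KO machine stand out without reading every line.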

Please HELP!
--
Bye, Lem
ID: 112310 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2297
Credit: 43,239,371
RAC: 26,458
Message 112311 - Posted: 28 Mar 2025, 0:08:39 UTC - in response to Message 112309.  

Linux guys, I don't think your OS is the issue. If you can't get into the RAH feeder, the OS never comes into it: as far as I know the feeder sends out tasks for both OS types, and only once you're inside does it figure out which OS is asking and send work.
I'm on Windows and I get the same error as you. So it's not the OS.

I didn't understand the report earlier as I couldn't see the issue on my main PC (W10) or my laptop (W11).
Then today I couldn't see it at work either (W10).

But I've just returned to where I stay in London and the PC I keep here (W10) has been reporting it for 17hrs.
I don't understand why but finally I'm seeing what some of you guys have been reporting.
Very odd. No idea what the solution is, but I'm going to do a reboot now after a Windows update (clue?)
ID: 112311 · Rating: 0
Lem Novantotto
Joined: 13 Sep 23
Posts: 7
Credit: 1,793,775
RAC: 37,364
Message 112312 - Posted: 28 Mar 2025, 0:13:30 UTC - in response to Message 112310.  

And:

OK machine:
ven 28 mar 2025, 01:11:45 | Rosetta@home | update requested by user
ven 28 mar 2025, 01:11:48 | Rosetta@home | Sending scheduler request: Requested by user.
ven 28 mar 2025, 01:11:48 | Rosetta@home | Requesting new tasks for CPU
ven 28 mar 2025, 01:11:50 | Rosetta@home | Scheduler request completed: got 1 new tasks
ven 28 mar 2025, 01:11:50 | Rosetta@home | Project requested delay of 31 seconds
ven 28 mar 2025, 01:11:52 | Rosetta@home | Started download of 13mrredo_13_hallucinated_97_232.zip
ven 28 mar 2025, 01:11:52 | Rosetta@home | Started download of 13mrredo_13_hallucinated_97_232.flags
ven 28 mar 2025, 01:11:53 | Rosetta@home | Finished download of 13mrredo_13_hallucinated_97_232.zip (3886 bytes)
ven 28 mar 2025, 01:11:53 | Rosetta@home | Finished download of 13mrredo_13_hallucinated_97_232.flags (1018 bytes)



KO machine:
ven 28 mar 2025, 01:09:38 | Rosetta@home | update requested by user
ven 28 mar 2025, 01:09:40 | Rosetta@home | Sending scheduler request: Requested by user.
ven 28 mar 2025, 01:09:40 | Rosetta@home | Reporting 88 completed tasks
ven 28 mar 2025, 01:09:40 | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU
ven 28 mar 2025, 01:09:42 | Rosetta@home | Scheduler request completed: got 0 new tasks
ven 28 mar 2025, 01:09:42 | Rosetta@home | Server error: feeder not running
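
The two exchanges above boil down to three facts per scheduler request: how many tasks were reported, how many new tasks came back, and whether the server returned an error. A small Python sketch that extracts those from a pasted event log follows. The function name `scheduler_outcome` is made up for illustration, and the patterns are assumptions matched to the message wording shown in this thread only.

```python
import re

# Patterns taken from the event-log wording quoted in this thread.
REPORTING = re.compile(r"Reporting (\d+) completed tasks")
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")
SERVER_ERROR = re.compile(r"Server error: (.+)$")


def scheduler_outcome(log_text):
    """Collect reported/new task counts and any server errors from a log."""
    outcome = {"reported": 0, "new_tasks": None, "server_errors": []}
    for line in log_text.splitlines():
        match = REPORTING.search(line)
        if match:
            outcome["reported"] = int(match.group(1))
        match = GOT_TASKS.search(line)
        if match:
            outcome["new_tasks"] = int(match.group(1))
        match = SERVER_ERROR.search(line)
        if match:
            outcome["server_errors"].append(match.group(1).strip())
    return outcome
```

On the KO excerpt this would show 88 tasks waiting to be reported, zero new tasks, and one "feeder not running" error, which is the combination several posters describe.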

--
Bye, Lem
ID: 112312 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2297
Credit: 43,239,371
RAC: 26,458
Message 112313 - Posted: 28 Mar 2025, 0:42:48 UTC - in response to Message 112311.  

Very odd. No idea what the solution is, but I'm going to do a reboot now after a Windows update (clue?)

That didn't fix anything.
I'm now out of ideas - if anyone wants to suggest something, go right ahead. I'm all ears
ID: 112313 · Rating: 0
WezH
Joined: 6 Apr 20
Posts: 6
Credit: 3,439,817
RAC: 98,111
Message 112314 - Posted: 28 Mar 2025, 5:34:08 UTC

I disabled IPv6 on my Win10 and Linux hosts; all tasks were then reported, with no "feeder not running" error. I didn't get any new tasks, but maybe the project is out of tasks?

Try disabling IPv6 and see if that helps.
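
Before changing system settings, it may help to see which address families a hostname resolves to from the affected machine. The Python sketch below uses the standard `socket.getaddrinfo` call; the helper names `classify` and `address_families`, and the port 443, are assumptions chosen for illustration. It only reports what the resolver returns and says nothing about which path the BOINC client actually uses.

```python
import socket

# Map the address-family constants to readable labels.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def classify(infos):
    """Return the sorted IP families present in a getaddrinfo() result."""
    return sorted({FAMILY_NAMES[info[0]] for info in infos if info[0] in FAMILY_NAMES})


def address_families(host, port=443):
    """Resolve `host` and report whether it yields IPv4, IPv6, or both."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return classify(infos)


# Example (requires network access, so it is left commented out):
# print(address_families("boinc.bakerlab.org"))
```

If a machine that shows the error lists both families while a working machine lists only IPv4, that would be consistent with the workaround described here, though it would not prove the cause.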
ID: 112314 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1807
Credit: 18,534,891
RAC: 2
Message 112315 - Posted: 28 Mar 2025, 7:34:53 UTC - in response to Message 112314.  

I disabled IPv6 on my Win10 and Linux hosts; all tasks were then reported, with no "feeder not running" error. I didn't get any new tasks, but maybe the project is out of tasks?

Try disabling IPv6 and see if that helps.
We suspect there are issues with the bwsrv1 host, which is responsible for the Feeder (along with other processes).
Whether you get any work when you request it is just the luck of the draw, as most of the time there is no work Ready to Send. From 19:20 to 22:30 UTC yesterday there were a few thousand ready to send, and from 06:00 to 07:00 UTC today there were a few thousand more. Other than those times, it's been between 6 and zero ready to send.
Grant
Darwin NT
ID: 112315 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5740
Credit: 5,970,829
RAC: 1,701
Message 112317 - Posted: 28 Mar 2025, 7:56:00 UTC - in response to Message 112311.  

Linux guys, I don't think your OS is the issue. If you can't get into the RAH feeder, the OS never comes into it: as far as I know the feeder sends out tasks for both OS types, and only once you're inside does it figure out which OS is asking and send work.
I'm on Windows and I get the same error as you. So it's not the OS.

I didn't understand the report earlier as I couldn't see the issue on my main PC (W10) or my laptop (W11).
Then today I couldn't see it at work either (W10).

But I've just returned to where I stay in London and the PC I keep here (W10) has been reporting it for 17hrs.
I don't understand why but finally I'm seeing what some of you guys have been reporting.
Very odd. No idea what the solution is, but I'm going to do a reboot now after a Windows update (clue?)

Sid - it's inside Baker Lab's systems. I have been getting feeder errors for a few days now. Not surprising given how they have kind of orphaned this project.

Grant has some more detailed information in this thread https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=112315
ID: 112317 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5740
Credit: 5,970,829
RAC: 1,701
Message 112318 - Posted: 28 Mar 2025, 7:59:13 UTC

Times are shown in CET (Central European Time, GMT+1).

3/28/2025 8:57:05 AM | Rosetta@home | update requested by user
3/28/2025 8:57:05 AM | Rosetta@home | Sending scheduler request: Requested by user.
3/28/2025 8:57:05 AM | Rosetta@home | Reporting 5 completed tasks
3/28/2025 8:57:05 AM | Rosetta@home | Not requesting tasks: don't need (CPU: ; NVIDIA GPU: )
3/28/2025 8:57:07 AM | Rosetta@home | Scheduler request completed
3/28/2025 8:57:07 AM | Rosetta@home | Server error: feeder not running
3/28/2025 8:57:07 AM | Rosetta@home | Project requested delay of 3600 seconds
ID: 112318 · Rating: 0
Lem Novantotto
Joined: 13 Sep 23
Posts: 7
Credit: 1,793,775
RAC: 37,364
Message 112319 - Posted: 28 Mar 2025, 9:00:08 UTC - in response to Message 112314.  

I disabled IPv6 on my Win10 and Linux hosts; all tasks were then reported, with no "feeder not running" error. I didn't get any new tasks, but maybe the project is out of tasks?

Try disabling IPv6 and see if that helps.


Thanks a lot! It worked.

And that is obviously the difference between my systems that I had forgotten about: the working one was configured to use only IPv4!
Thanks again.
--
Bye, Lem
ID: 112319 · Rating: 0


©2025 University of Washington
https://www.bakerlab.org