Problems and Technical Issues with Rosetta@home

Skillz

Send message
Joined: 24 May 17
Posts: 3
Credit: 5,914,356
RAC: 1,026
Message 97687 - Posted: 27 Jun 2020, 16:16:23 UTC

I am trying to attach two new computers to the project, but they fail every time.

When I attempt to add them I get a "project failed to attach" error, and the logs claim it can't reach the project servers.

I can visit the rosetta@home web site using a browser on both computers I'm trying to attach so they're not blocked with a firewall or anything.

Those BOINC instances are attached to other projects and I can get work from those other projects.
ID: 97687 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Brian Nixon

Send message
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 97692 - Posted: 27 Jun 2020, 17:25:46 UTC - in response to Message 97686.  

EricM wrote:
Below is a shot
Link is broken; it redirects to a login page


I don't know of any way to see the status of the current work unit other than to let the screensaver start
Advanced view > Tasks tab
ID: 97692 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97695 - Posted: 27 Jun 2020, 18:14:48 UTC - in response to Message 97680.  

EricM,

Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.

As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.


Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from

Universe 0
LHC 0
Rosetta 1

to

Universe 1
LHC 5
Rosetta 25

I would expect to immediately see 1 Universe to every 5 LHC to every 25 Rosetta tasks running, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect.
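
For reference, the 1 : 5 : 25 weighting above works out to roughly 3.2%, 16.1% and 80.6% of the machine. A minimal sketch of that arithmetic follows; it assumes resource share is read as a long-run fraction of computing rather than a count of tasks running at any instant, and the function name is made up for illustration.

```python
def expected_fractions(shares):
    """Return each project's share of the total as a fraction between 0 and 1."""
    total = sum(shares.values())
    if total == 0:
        raise ValueError("at least one project needs a non-zero share")
    return {name: weight / total for name, weight in shares.items()}


# Weights quoted in the post above.
shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
for name, fraction in expected_fractions(shares).items():
    print(f"{name}: {fraction:.1%}")
```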
ID: 97695 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
EHM-1
Avatar

Send message
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97696 - Posted: 27 Jun 2020, 18:25:12 UTC - in response to Message 97692.  

Thanks for letting me know about the screenshot problem, Brian. I wish I could edit my post. Posting below via imgur.
There are no Rosetta tasks in my task listing.



system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97696 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
ProDigit

Send message
Joined: 6 Dec 18
Posts: 27
Credit: 2,718,346
RAC: 0
Message 97698 - Posted: 27 Jun 2020, 18:33:24 UTC

12 CPU WUs hogging up my PC, using only 1 cpu core.
I will for the time being disconnect from this project until the issue is resolved.
ID: 97698 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97699 - Posted: 27 Jun 2020, 18:58:35 UTC - in response to Message 97698.  

12 CPU WUs hogging up my PC, using only 1 cpu core.
I will for the time being disconnect from this project until the issue is resolved.


What issue? I've got 6 computers running Rosetta - two of them with 24 cores each. All cores utilised as normal. What's happening on yours? Are there tasks that say running but doing no calculations?
ID: 97699 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97700 - Posted: 27 Jun 2020, 19:00:34 UTC - in response to Message 97684.  
Last modified: 27 Jun 2020, 19:02:05 UTC

EricM,
Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.
As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.

Hi Robert, and thanks for the info. I added the second project at your suggestion, thanks for that as well.
But the Rosetta pause occurred before that, and has several other times in the past couple months. So I still wonder why it's not finishing the current task.
Eric


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.
ID: 97700 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
EHM-1
Avatar

Send message
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97701 - Posted: 27 Jun 2020, 19:31:26 UTC - in response to Message 97700.  


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.

Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in.
Eric

system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97701 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mr P Hucker
Avatar

Send message
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 97703 - Posted: 27 Jun 2020, 19:42:32 UTC - in response to Message 97701.  
Last modified: 27 Jun 2020, 19:43:39 UTC


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.

Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in.
Eric


That setting will still annoy you every 17 hours. I think it means to change every 17 hours, not 17 hours since the last task started. Hence I changed mine to a year.
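
The "17 hours" figure is just the 1,000-minute setting converted to hours, and a year expressed in minutes is the kind of value meant by "a huge number". A small sketch of the conversion, assuming the setting is entered in minutes as in the posts above (the helper name is illustrative only):

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def minutes_to_hours(minutes):
    """Convert a switch interval given in minutes to hours."""
    return minutes / MINUTES_PER_HOUR


print(round(minutes_to_hours(1000), 1))  # the 1,000-minute setting, in hours
print(365 * MINUTES_PER_DAY)             # one 365-day year, in minutes
```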

And it doesn't apply if you:
Restart the machine.
Pause tasks to play a game etc.
Have another project go into high priority panic mode due to a late task.

I mainly changed mine because I run LHC, and their tasks don't checkpoint very well and can sometimes get corrupted or at least lose a lot of work.

But also I detest having hundreds of half done work units - especially when I see one with 1 second (!) left to go, which it doesn't get round to doing for a whole day! Boinc programmers aren't right in the head.
ID: 97703 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Brian Nixon

Send message
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 97704 - Posted: 27 Jun 2020, 19:51:48 UTC - in response to Message 97696.  

@EricM:

It seems that BOINC is having trouble deciding how much work to download for your computer. As I understand it, that decision is based in part on what BOINC has seen the computer complete in the past. Does the machine have an irregular usage pattern? (Powered off frequently/irregularly? Variable amount of other work being done while BOINC is running?) The missed deadlines and high average turnaround time (2.81 days: only just within the 3-day deadline) may be contributing.

Check all your Computing preferences. Post values/screenshots here, and we might spot something amiss. Regarding pauses: do you have any restrictions in your Daily schedules settings?

To try to get some tasks, try increasing "Store at least N days of work". Do it in steps: add around 0.4 (slightly more than one 8-hour task time), save, wait a couple of minutes for BOINC to contact the server, and see if it downloads some tasks. If not, repeat. As soon as you get some tasks, reduce N again to maybe 0.3 (slightly less than one task) to avoid your machine getting flooded with work it cannot complete until BOINC learns better how long each task will take. Set "Store up to an additional" to 0, as at this stage the last thing you need is BOINC using poor estimates to opportunistically download even more work.
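
The 0.4 and 0.3 figures come from comparing the buffer, in days, with one 8-hour task. A rough sketch of that comparison, assuming a single task at a time and ignoring core count and how many hours a day the machine is actually on (the function name is made up for illustration):

```python
HOURS_PER_DAY = 24


def tasks_covered(buffer_days, task_hours=8.0):
    """How many task-lengths of work a buffer of `buffer_days` represents."""
    return buffer_days * HOURS_PER_DAY / task_hours


for days in (0.3, 0.4):
    print(days, "days of work is about", round(tasks_covered(days), 2), "tasks")
```

A value above 1 means the buffer asks for more than one task's worth of work; a value below 1 means less.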
ID: 97704 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97705 - Posted: 27 Jun 2020, 19:58:32 UTC - in response to Message 97686.  

EricM,

If you're using the Simple View, click on View near the top of the window, then Advanced View to show more information.

When you want to go back to Simple View, click on View, then Simple View.

In Advanced View, click on Tasks to see a list of all the tasks currently on your computer. Some will show as Running, some as Waiting to run (started but not currently running; waiting for its next turn for CPU time), and some as Ready to start. There are also a few less common conditions you don't see as often.

Those in the Running condition will have time advancing in the Elapsed column, though not always every second. They should have time decreasing in the Remaining column, but it can increase instead if the initial estimate of the run time was too low.

The Deadline column shows when the task must be finished and returned to avoid problems.

For about one day before the deadline and for some time after the deadline, any task that finishes will upload its outputs and report the finish automatically. Any tasks finishing earlier than that may or may not wait. If you need to speed up an upload, click on Transfers, then some line for a file, then Retry now. This starts an attempt to upload all of the files going to the same BOINC project as the file you clicked on. The Status column shows whether the upload was blocked (usually temporarily).

Generally, if your BOINC Manager contacts the project server for any reason, it will then try to do any uploads and reports that are waiting.

If you need to speed up reporting a finished task, click on Projects, then the BOINC project the task is for, then Update. It should then try to report all finished tasks for that project, except any that are still waiting to finish their uploads.

To see the main event log, click on Tools, then Event Log. The main event log should then appear on the screen until you click Close at the bottom corner.

Your "no longer usable" messages indicate that the task is far enough past its deadline that another task from the same workunit has been sent to another user, and that user has sent back the upload files and reported the task as finished. The server no longer needs anything from your task and will not give you any credit for it.

Your no tasks sent message indicates that either there are no tasks available to send you, or the server has decided that your computer is not reliable enough to be worth sending any tasks for a while.

Your Project requested delay message indicates how long your BOINC Manager should wait before trying again. This is to prevent overly frequent requests from blocking access to the server for other users.

Your overdue messages indicate that the tasks are past their deadlines, far enough that you are unlikely to get any credit for returning them.

There is also a separate log file for each task.

You might check if you have Task Manager installed. I often use it to show problems with too many tasks trying to run at once, or not having enough memory to keep all of the tasks running.
ID: 97705 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97706 - Posted: 27 Jun 2020, 20:12:43 UTC - in response to Message 97687.  
Last modified: 27 Jun 2020, 20:13:04 UTC

[H]Skillz,

Are you using this link when you try to attach?

https://boinc.bakerlab.org/rosetta/

Note the https instead of the previous http.

If not, delete what's currently in the Project URL box, and put this link there instead, before clicking Next.
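
If the old http address is saved somewhere (a note, a script), the only change needed is the scheme. A minimal sketch using Python's standard urllib.parse; the function name is made up for illustration, and it assumes the input already includes a scheme and host:

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return `url` with its scheme switched to https; other parts are kept."""
    parts = urlsplit(url.strip())
    if not parts.netloc:
        raise ValueError("expected a full URL such as http://host/path/")
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))


print(force_https("http://boinc.bakerlab.org/rosetta/"))
```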

If this doesn't make it work, give us more details about what version of BOINC you are using under what version of what operating system (most users use the Windows operating system).

When you enter a message, then click on Post Reply, you seldom need to enter it again. Try waiting about one minute for the server to show the message first.
ID: 97706 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97707 - Posted: 27 Jun 2020, 20:21:52 UTC - in response to Message 97695.  

Peter Hucker,

[snip]

Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from

Universe 0
LHC 0
Rosetta 1

to

Universe 1
LHC 5
Rosetta 25

I would expect to immediately see 1 Universe to every 5 LHC to every 25 Rosetta tasks running, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect.


That would interfere with the way it recovers from times when one of the projects has no tasks available to send.

Instead, it looks back over the last few weeks, and tries to get tasks from whichever project would move it toward the new weighting.
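
A toy model of that lookback behaviour, for illustration only. It is not BOINC's actual scheduler: it assumes the client simply keeps a tally of work done per project and always runs whichever project is furthest below its target share. The history of 100 Rosetta-only units is invented for the example.

```python
def pick_next(done, shares):
    """Pick the project furthest below its target share of all work so far."""
    total_share = sum(shares.values())
    total_done = sum(done.values()) + 1  # include the unit about to be run

    def deficit(project):
        target = shares[project] / total_share * total_done
        return target - done[project]

    return max(shares, key=deficit)


def simulate(done, shares, steps):
    """Run `steps` units of work, returning the order projects were picked."""
    done = dict(done)  # work on a copy so the caller's history is untouched
    order = []
    for _ in range(steps):
        project = pick_next(done, shares)
        done[project] += 1
        order.append(project)
    return order


# Hypothetical history: only Rosetta ran under the old 0 / 0 / 1 weighting.
history = {"Universe": 0, "LHC": 0, "Rosetta": 100}
new_shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
print(simulate(history, new_shares, 10))
```

In this toy model the first picks all go to LHC, because it has the largest shortfall against its new target, which resembles the behaviour Peter described.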
ID: 97707 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97708 - Posted: 27 Jun 2020, 20:31:13 UTC - in response to Message 97698.  

12 CPU WUs hogging up my PC, using only 1 cpu core.
I will for the time being disconnect from this project until the issue is resolved.

That probably indicates that you have told BOINC Manager that it can use only one CPU core.

In Advanced view, click on Options, then Computing preferences. Adjust the fraction of the available CPU cores (shown as CPUs here), possibly adding 1% to the fraction you really want, to keep roundoff from causing problems.

After adjusting this, click on Save at the bottom of that window.
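
The roundoff concern can be sketched as follows. The rule used here (multiply, truncate to a whole number, never go below 1) is an assumption made for illustration and is not confirmed behaviour of the BOINC client; the function name is hypothetical.

```python
import math


def usable_cpus(total_cpus, percent):
    """Whole CPUs allowed, ASSUMING the product is truncated and at least 1 is used."""
    return max(1, math.floor(total_cpus * percent / 100))


print(usable_cpus(3, 66.66))  # just under two thirds of 3 cores
print(usable_cpus(3, 67))     # nudged up by a fraction of a percent
print(usable_cpus(4, 50))     # half of a quad-core
```

Under that assumption, a percentage that falls a hair short of a whole core loses the core entirely, which is why a slightly higher value is the safer entry.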
ID: 97708 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97709 - Posted: 27 Jun 2020, 20:35:39 UTC - in response to Message 97701.  
Last modified: 27 Jun 2020, 20:54:53 UTC


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.

Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in.
Eric

If you use a huge value here, DON'T add project CPDN or project RNA World without first setting a smaller value instead. Both of these projects have tasks that run for months, so the huge value will keep any other projects from getting turns for CPU use.

Note that if you have enough main memory and have the setting to keep tasks in main memory even when not running, you won't be using the disk checkpoints except when BOINC restarts. The project applications need to be able to adjust any timeouts when restarting from pauses, unless the timeouts are based on time when running rather than clock time.
ID: 97709 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
EHM-1
Avatar

Send message
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97710 - Posted: 27 Jun 2020, 21:19:51 UTC - in response to Message 97704.  
Last modified: 27 Jun 2020, 21:28:47 UTC

@EricM:

It seems that BOINC is having trouble deciding how much work to download for your computer. As I understand it, that decision is based in part on what BOINC has seen the computer complete in the past. Does the machine have an irregular usage pattern? (Powered off frequently/irregularly? Variable amount of other work being done while BOINC is running?) The missed deadlines and high average turnaround time (2.81 days: only just within the 3-day deadline) may be contributing.

Check all your Computing preferences. Post values/screenshots here, and we might spot something amiss. Regarding pauses: do you have any restrictions in your Daily schedules settings?

To try to get some tasks, try increasing "Store at least N days of work". Do it in steps: add around 0.4 (slightly more than one 8-hour task time), save, wait a couple of minutes for BOINC to contact the server, and see if it downloads some tasks. If not, repeat. As soon as you get some tasks, reduce N again to maybe 0.3 (slightly less than one task) to avoid your machine getting flooded with work it cannot complete until BOINC learns better how long each task will take. Set "Store up to an additional" to 0, as at this stage the last thing you need is BOINC using poor estimates to opportunistically download even more work.

Thanks to the three of you for your input!

@Peter, your 17-hour comment addresses a question I almost posed. It would be good of the Lords of BOINC to make clear in the settings if the application switching interval setting refers to elapsed time or task run-time.

@Brian, I was planning to implement your suggestions, but first waited to see how BOINC would react to my having suspended World Community Grid / OpenPandemics (my only other active project). As I suspected might occur, Rosetta started processing again at screensaver invocation, and I noted that BOINC had fetched four new Rosetta units. Of course I don't know if that was prompted by the suspension and/or the result of my having changed the application switching interval, or mere coincidence.
Given this, would you say I should still change my work storage settings?

To answer your questions:

  • My computer usage would be irregular seen over a couple decades, but is more regular the past few years. Typically, the computer is on 15 hours a day. Sometimes I'm working with it for much of the day, during which BOINC does not run. Other times, the screensaver might come on 5-10 times per day and run for an hour each time. I have the screensaver set to come on after ten minutes, then I think the comp sleeps after an hour. But, depending on how far back we go -- I ran SETI@home for 20 years -- there are times when the computer is off for a few days or a few weeks, and there was a ten-year era when it would often be running only on weekends.
  • See my computing preferences screenshot below.


@Robert, I think your comment regarding deadlines and the server determining reliability of my computer could explain at least part of the reason for Rosetta being stalled the past few days. I was away Sun-Tue, so maybe the server gave up on the partially completed work units on my machine? If the cause is something like that, it makes me think that Rosetta is less tolerant than SETI was in this regard.
Eric



system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97710 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile robertmiles

Send message
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 97715 - Posted: 27 Jun 2020, 21:41:04 UTC - in response to Message 97710.  
Last modified: 27 Jun 2020, 22:13:52 UTC

EricM,

@Robert, I think your comment regarding deadlines and the server determining reliability of my computer could explain at least part of the reason for Rosetta being stalled the past few days. I was away Sun-Tue, so maybe the server gave up on the partially completed work units on my machine? If the cause is something like that, it makes me think that Rosetta is less tolerant than SETI was in this regard.
Eric

It's likely that the server gave up on the partially completed tasks. This may make it think your computer is unreliable at returning results fast enough.

Rosetta@home NEEDS results from previous tasks to generate most of the next round of workunits. Also, it considers COVID-19 work urgent. This means short deadlines are likely.

SETI recently decided to pause their last few years of work to analyze the results, and think about what to try next. In other words, they do not consider their work urgent, so long deadlines are likely.

Also, you might try changing your settings to use 50% (or 51% to avoid roundoff error) of the CPUs, but 100% of the CPU time. This will make it finish each task faster, although sometimes with fewer tasks in progress.

That would increase your chances of returning tasks on time.
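
A rough way to see why the CPU time setting matters for deadlines: a sketch assuming throttling stretches wall-clock time in direct proportion, an 8-hour task as mentioned earlier in the thread, and about 5 hours of BOINC running per day (the low end of Eric's description). The 50% figure is hypothetical, since Eric's actual settings are not shown in the text, and the function names are made up for illustration.

```python
def wall_hours(cpu_hours, cpu_time_percent):
    """Wall-clock hours of BOINC run time needed for `cpu_hours` of CPU work."""
    if not 0 < cpu_time_percent <= 100:
        raise ValueError("cpu_time_percent must be in (0, 100]")
    return cpu_hours * 100 / cpu_time_percent


def days_to_finish(cpu_hours, cpu_time_percent, run_hours_per_day):
    """Calendar days to finish one task if BOINC runs `run_hours_per_day` hours daily."""
    return wall_hours(cpu_hours, cpu_time_percent) / run_hours_per_day


for pct in (100, 50):
    print(pct, "% CPU time:", days_to_finish(8, pct, 5), "days")
```

Under these assumptions an 8-hour task fits inside a 3-day deadline at 100% CPU time but not at 50%.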
ID: 97715 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1725
Credit: 18,382,444
RAC: 19,446
Message 97716 - Posted: 27 Jun 2020, 21:54:50 UTC - in response to Message 97695.  

When the user changes the weighting, it should have immediate effect.
It does, as you yourself noted.
Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any.


It takes time for changes to settle down, as BOINC now has to balance out the debts and credits between projects to match the new settings. That takes time, i.e. the time necessary to process the work to produce the Credit to match the new Resource share settings.
Grant
Darwin NT
ID: 97716 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Stevie G

Send message
Joined: 15 Dec 18
Posts: 107
Credit: 865,910
RAC: 814
Message 97717 - Posted: 27 Jun 2020, 22:09:21 UTC - in response to Message 97676.  

Actually, today Rosetta disappeared from my project list. When I try to log on, it says it is unable.
This is a drag.

With no response from Rosetta, I added World Community Grid.

Steven Gaber
Oldsmar, FL
ID: 97717 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Grant (SSSF)

Send message
Joined: 28 Mar 20
Posts: 1725
Credit: 18,382,444
RAC: 19,446
Message 97718 - Posted: 27 Jun 2020, 22:09:53 UTC - in response to Message 97710.  

Typically, the computer is on 15 hours a day. Sometimes I'm working with it for much of the day, during which BOINC does not run.
Why?
Rosetta (like Seti) applications are set to run at Idle priority (the lowest level). Any other running programme of similar priority will get equal CPU resources. If its priority is higher, then Rosetta applications will slow and even stop to allow the higher priority application to use the CPU resources.
If there is an application that is affected by having BOINC doing work in the background, you can use the Exclusive applications option to stop BOINC when just that particular application is running.

Back in the days of single core or just hyperthreaded systems, yeah you often needed to stop BOINC to allow other programmes to run OK. But with multi core/thread systems, unless the programme you are running is heavily multi threaded it just isn't necessary anymore.
Grant
Darwin NT
ID: 97718 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote

©2024 University of Washington
https://www.bakerlab.org