Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Skillz Joined: 24 May 17 Posts: 3 Credit: 5,914,356 RAC: 421 |
I am trying to attach two new computers to the project, but they fail every time. When I try to add them I get a "project failed to attach" error, and looking at the logs, it's claiming it can't reach the project servers. I can visit the Rosetta@home web site using a browser on both computers I'm trying to attach, so they're not blocked by a firewall or anything. Those BOINC instances are attached to other projects and I can get work from those other projects. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
EricM wrote: "Below is a shot" Link is broken; it redirects to a login page. "I don't know of any way to see the status of the current work unit other than to let the screensaver start" Advanced view > Tasks tab. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
EricM, Boinc needs to reprogram the scheduler so the project weight works properly. In particular, if you change the weighting, it takes days to actually do what you asked. For example, I changed from Universe 0, LHC 0, Rosetta 1 to Universe 1, LHC 5, Rosetta 25. I would expect to immediately see 1 Universe task running for every 5 LHC tasks and every 25 Rosetta tasks, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect. |
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
|
ProDigit Joined: 6 Dec 18 Posts: 27 Credit: 2,718,346 RAC: 0 |
12 CPU WUs hogging up my PC, using only 1 cpu core. I will for the time being disconnect from this project until the issue is resolved. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
"12 CPU WUs hogging up my PC, using only 1 cpu core." What issue? I've got 6 computers running Rosetta - two of them with 24 cores each. All cores utilised as normal. What's happening on yours? Are there tasks that say running but doing no calculations? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
EricM, I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished. |
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in. Eric system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 4,044 |
That setting will still annoy you every 17 hours. I think it means to change every 17 hours, not 17 hours since the last task started. Hence I changed mine to a year. And it doesn't apply if you restart the machine, pause tasks to play a game etc., or have another project go into high priority panic mode due to a late task. I mainly changed mine because I run LHC, and their tasks don't checkpoint very well and can sometimes get corrupted or at least lose a lot of work. But also I detest having hundreds of half-done work units - especially when I see one with 1 second (!) left to go, which it doesn't get round to doing for a whole day! Boinc programmers aren't right in the head. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
@EricM: It seems that BOINC is having trouble deciding how much work to download for your computer. As I understand it, that decision is based in part on what BOINC has seen the computer complete in the past. Does the machine have an irregular usage pattern? (Powered off frequently/irregularly? Variable amount of other work being done while BOINC is running?) The missed deadlines and high average turnaround time (2.81 days: only just within the 3-day deadline) may be contributing. Check all your Computing preferences. Post values/screenshots here, and we might spot something amiss. Regarding pauses: do you have any restrictions in your Daily schedules settings? To try to get some tasks, try increasing Store at least N days of work. Do it in steps: add around 0.4 (slightly more than one 8-hour task time), save, wait a couple of minutes for BOINC to contact the server, and see if it downloads some tasks. If not, repeat. As soon as you get some tasks, reduce N again to maybe 0.3 (slightly less than one task) to avoid your machine getting flooded with work it cannot complete until BOINC learns better how long each task will take. Set Store up to an additional N days of work to 0, as at this stage the last thing you need is BOINC using poor estimates to opportunistically download even more work. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
EricM, If you're using the Simple View, click on View near the top of the window, then Advanced View to show more information. When you want to go back to Simple View, click on View, then Simple View. In Advanced View, click on Tasks to see a list of all the tasks currently on your computer. Some will show as Running, some as Waiting to run (started but not currently running; waiting for its next turn for CPU time), and some as Ready to start. There are also a few less common conditions you don't see as often. Those in the Running condition will have time advancing in the Elapsed column, not always every second. They should have time decreasing in the Remaining column, but it can be increasing instead if the initial guess at how long it will run was too low. The Deadline column shows when the task must be finished and returned to avoid problems. For about one day before the deadline and for some time after the deadline, any task that finishes will upload its outputs and report the finish automatically. Any tasks finishing earlier than that may or may not wait. If you need to speed up an upload, click on Transfers, then some line for a file, then Retry now. This starts an attempt to upload all of the files going to the same BOINC project as the file you clicked on. The Status column shows whether the upload was blocked (usually temporarily). Generally, if your BOINC Manager contacts the project server for any reason, it will then try to do any uploads and reports that are waiting. If you need to speed up reporting a finished task, click on Projects, then the BOINC project the task is for, then Update. It should then try to report all finished tasks for that project, except any that are still waiting to finish their uploads. To see the main event log, click on Tools, then Event Log. The main event log should then appear on the screen until you click Close at the bottom corner. 
Your no longer usable messages indicate that the task is far enough past its deadline that another task from the same workunit has been sent to another user, and that user has sent back the upload files and reported the task as finished, so the server no longer needs anything from your task and will not give you any credit for it. Your no tasks sent message indicates that either there are no tasks available to send you, or the server has decided that your computer is not reliable enough to be worth sending any tasks for a while. Your Project requested delay message indicates how long your BOINC Manager should wait before trying again. This is to prevent overly frequent requests from blocking access to the server for other users. Your overdue messages indicate that the tasks are past their deadline, enough that you are unlikely to get any credit for returning them. There is also a separate log file for each task. You might check if you have Task Manager installed. I often use it to show problems with too many tasks trying to run at once, or not having enough memory to keep all of the tasks running. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
[H]Skillz, Are you using this link when you try to attach? https://boinc.bakerlab.org/rosetta/ Note the https instead of the previous http. If not, delete what's currently in the Project URL box, and put this link there instead, before clicking Next. If this doesn't make it work, give us more details about what version of BOINC you are using under what version of what operating system (most users use the Windows operating system). When you enter a message, then click on Post Reply, you seldom need to enter it again. Try waiting about one minute for the server to show the message first. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Peter Hucker, [snip] "Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from" [snip] That would interfere with the way it recovers from times when one of the projects has no tasks available to send. Instead, it looks back over the last few weeks, and tries to get tasks from whichever project would move it toward the new weighting. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
"12 CPU WUs hogging up my PC, using only 1 cpu core." That probably indicates that you have told BOINC Manager that it can use only one CPU core. In Advanced view, click on Options, then Computing preferences. Adjust the fraction of the available CPU cores (shown as CPUs here), possibly adding 1% to the fraction you really want, to keep roundoff from causing problems. After adjusting this, click on Save at the bottom of that window. |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
If you use a huge value here, DON'T add project CPDN or project RNA World without first setting a smaller value instead. Both of these projects have tasks that run for months, so the huge value will keep any other projects from getting turns for CPU use. Note that if you have enough main memory and have the setting to keep tasks in main memory even when not running, you won't be using the disk checkpoints except when BOINC restarts. The project applications need to be able to adjust any timeouts when restarting from pauses, unless the timeouts are based on time when running rather than clock time. |
EHM-1 Joined: 21 Mar 20 Posts: 23 Credit: 183,782 RAC: 0 |
@EricM: Thanks to the three of you for your input! @Peter, your 17-hour comment addresses a question I almost posed. It would be good of the Lords of BOINC to make clear in the settings if the application switching interval setting refers to elapsed time or task run-time. @Brian, I was planning to implement your suggestions, but first waited to see how BOINC would react to my having suspended World Community Grid / OpenPandemics (my only other active project). As I suspected might occur, Rosetta started processing again at screensaver invocation, and I noted that BOINC had fetched four new Rosetta units. Of course I don't know if that was prompted by the suspension and/or the result of my having changed the application switching interval, or mere coincidence. Given this, would you say I should still change my work storage settings? To answer your questions:
system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM |
robertmiles Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
EricM, "@Robert, I think your comment regarding deadlines and the server determining reliability of my computer could explain at least part of the reason for Rosetta being stalled the past few days. I was away Sun-Tue, so maybe the server gave up on the partially completed work units on my machine? If the cause is something like that, it makes me think that Rosetta is less tolerant than SETI was in this regard." It's likely that the server gave up on the partially completed tasks. This may make it think your computer is unreliable at returning results fast enough. Rosetta@home NEEDS results from previous tasks to generate most of the next round of workunits. Also, it considers COVID-19 work urgent. This means short deadlines are likely. SETI recently decided to pause their last few years of work to analyze the results, and think about what to try next. In other words, they do not consider their work urgent, so long deadlines are likely. Also, you might try changing your settings to use 50% (or 51% to avoid roundoff error) of the CPUs, but 100% of the CPU time. This will make it finish each task faster, although sometimes with fewer tasks in progress. That would increase your chances of returning tasks on time. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"When the user changes the weighting, it should have immediate effect." It does, as you yourself noted. "Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any." It takes time for changes to settle down, as it now has to balance out the new debts & credits between projects to match the new settings. That takes time to do, i.e. the time necessary to process the work to produce the Credit to match the new Resource share settings. Grant Darwin NT |
Stevie G Joined: 15 Dec 18 Posts: 108 Credit: 866,895 RAC: 389 |
Actually, today Rosetta disappeared from my project list. When I try to log on, it says it is unable. This is a drag. With no response from Rosetta, I added World Community Grid. Steven Gaber Oldsmar, FL |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"Typically, the computer is on 15 hours a day. Sometimes I'm working with it for much of the day, during which BOINC does not run." Why? Rosetta (like Seti) applications are set to run at Idle priority (the lowest level). Any other running programme of similar priority will get equal CPU resources. If its priority is higher, then Rosetta applications will slow & even stop to allow the higher priority application to use the CPU resources. If there is an application that is affected by having BOINC doing work in the background, you can use the Exclusive applications option to stop BOINC when just that particular application is running. Back in the days of single core or just hyperthreaded systems, yeah you often needed to stop BOINC to allow other programmes to run OK. But with multi core/thread systems, unless the programme you are running is heavily multi threaded it just isn't necessary anymore. Grant Darwin NT |
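
Two pieces of arithmetic come up in the thread: the resource shares Universe 1, LHC 5, Rosetta 25, and the advice to set 51% rather than 50% of the CPUs to avoid roundoff. The Python sketch below works through both. It is illustrative only: the function names `share_fractions` and `usable_cpus` are made up for this example, and the rule that the percentage is truncated to a whole number of cores (with a minimum of one) is an assumption about the behaviour described above, not BOINC's actual code.

```python
from math import floor


def share_fractions(shares):
    """Return each project's long-run fraction of computing time.

    `shares` maps a project name to its resource share, e.g. the
    Universe 1 / LHC 5 / Rosetta 25 weighting quoted in the thread.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: value / total for name, value in shares.items()}


def usable_cpus(total_cpus, percent):
    """Whole number of CPUs available for a given percentage setting.

    ASSUMPTION: the percentage is truncated to a whole number of
    cores and never drops below one core.
    """
    return max(1, floor(total_cpus * percent / 100))


shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
fractions = share_fractions(shares)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of computing time in the long run")

# A percentage that falls just short of a whole core loses that core
# under truncation, which is why adding 1% can help.
for cores, percent in [(4, 50), (4, 51), (12, 33.33), (12, 34)]:
    print(f"{cores} cores at {percent}% -> {usable_cpus(cores, percent)} usable")
```

These fractions are long-run targets. As noted above, the client balances work over time, so the mix of running tasks need not match them immediately after the shares are changed.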
©2024 University of Washington
https://www.bakerlab.org