Problems and Technical Issues with Rosetta@home

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97680 - Posted: 27 Jun 2020, 13:15:04 UTC - in response to Message 97676.  
Last modified: 27 Jun 2020, 13:18:20 UTC

EricM,

Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.

As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.
ID: 97680 · Rating: 0

SolidAir79
Joined: 5 May 20
Posts: 4
Credit: 1,928,188
RAC: 2
Message 97681 - Posted: 27 Jun 2020, 13:31:38 UTC

Getting some errors on a Windows machine, all with a similar stderr message:
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_06_25_30642_30047_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 2 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_06_25_30642_30047_ab_t000__robetta.zip -frag3 rb_06_25_30642_30047_ab_t000__robetta.200.3mers.index.gz -fragA rb_06_25_30642_30047_ab_t000__robetta.200.12mers.index.gz -fragB rb_06_25_30642_30047_ab_t000__robetta.200.3mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2765475
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:\cygwin64\home\boinc\4.17\Rosetta\main\source\src\core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>

Regards Alan
ID: 97681 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97682 - Posted: 27 Jun 2020, 13:37:58 UTC
Last modified: 27 Jun 2020, 13:40:48 UTC

SolidAir79,

That looks like an error in one of the input files for the workunit.

If so, you can't fix the problem, and all other users who get copies of that workunit or any other workunit using that input file will have it crash the same way.
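
For context on the `-nan(ind)` in that message: a NaN (not-a-number) value never satisfies a range comparison, so any check of the kind the log reports will reject it. The sketch below is plain Python, not Rosetta's actual code; `chi_in_range` is a hypothetical name used only for illustration.

```python
import math

def chi_in_range(chi_degrees):
    """Return True when the angle lies in [-180, 180]; NaN always fails."""
    return -180.0 <= chi_degrees <= 180.0

bad_chi = float("nan")        # stands in for the -nan(ind) value in the log
print(chi_in_range(179.5))    # an ordinary angle passes the check
print(chi_in_range(bad_chi))  # NaN compares False against every bound
print(math.isnan(bad_chi))    # explicit NaN test
```

Whether the NaN came from the input data or from an earlier calculation can't be told from the log alone.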
ID: 97682 · Rating: 0

SolidAir79
Joined: 5 May 20
Posts: 4
Credit: 1,928,188
RAC: 2
Message 97683 - Posted: 27 Jun 2020, 13:43:01 UTC - in response to Message 97682.  

Okay, thanks. Must have a bad batch!
ID: 97683 · Rating: 0

EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97684 - Posted: 27 Jun 2020, 14:09:21 UTC - in response to Message 97680.  

EricM,
Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.
As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.

Hi Robert, and thanks for the info. I added the second project at your suggestion, thanks for that as well.
But the Rosetta pause occurred before that, and has several other times in the past couple months. So I still wonder why it's not finishing the current task.
Eric

system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97684 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97685 - Posted: 27 Jun 2020, 14:29:33 UTC

EricM,

Looks like you need to give more details about not finishing the current task. If it includes leaving a CPU core idle, that's a problem. If it's switching to running another task instead, that's normal.
ID: 97685 · Rating: 0

EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97686 - Posted: 27 Jun 2020, 15:21:09 UTC - in response to Message 97685.  

EricM,

Looks like you need to give more details about not finishing the current task. If it includes leaving a CPU core idle, that's a problem. If it's switching to running another task instead, that's normal.

Below is a shot of the project properties in BOINC. I don't know of any way to see the status of the current work unit other than to let the screensaver start running and read it from the BOINC status messages that run before the project screensaver does. From the event log, here are the most recent Rosetta-related messages that contain anything other than "no tasks sent/project requested delay":

6/24/2020 7:16:42 PM | Rosetta@home | Result hbnet_surface_design3_0.7_SAVE_ALL_OUT_IGNORE_THE_REST_3yb2cw0d_949347_1_0 is no longer usable
6/24/2020 7:16:42 PM | Rosetta@home | Result miniprotein_relax2_COVID_SAVE_ALL_OUT_IGNORE_THE_REST_9db4ko0a_949448_2_0 is no longer usable
6/24/2020 7:16:42 PM | Rosetta@home | No tasks sent
6/24/2020 7:16:42 PM | Rosetta@home | Project requested delay of 31 seconds

and the most recent Rosetta-related messages prior to the above:

6/23/2020 3:51:03 PM | Rosetta@home | Task hbnet_surface_design3_0.7_SAVE_ALL_OUT_IGNORE_THE_REST_3yb2cw0d_949347_1_0 is 1.74 days overdue; you may not get credit for it.  Consider aborting it.
6/23/2020 3:51:03 PM | Rosetta@home | Task miniprotein_relax2_COVID_SAVE_ALL_OUT_IGNORE_THE_REST_9db4ko0a_949448_2_0 is 0.70 days overdue; you may not get credit for it.  Consider aborting it.
6/23/2020 3:51:03 PM | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID 3864355; resource share 100
6/23/2020 3:51:03 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8050566; resource share 100





system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97686 · Rating: 0

Skillz
Joined: 24 May 17
Posts: 3
Credit: 2,748,881
RAC: 0
Message 97687 - Posted: 27 Jun 2020, 16:16:23 UTC

I am trying to attach two new computers to the project, but they fail every time.

When I attempt to add them, I get a "project failed to attach" message, and looking at the logs, it's claiming it can't reach the project servers.

I can visit the Rosetta@home web site using a browser on both computers I'm trying to attach, so they're not blocked by a firewall or anything.

Those BOINC instances are attached to other projects and I can get work from those other projects.
ID: 97687 · Rating: 0

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 97692 - Posted: 27 Jun 2020, 17:25:46 UTC - in response to Message 97686.  

EricM wrote:
Below is a shot

Link is broken; it redirects to a login page.

EricM wrote:
I don't know of any way to see the status of the current work unit other than to let the screensaver start

Advanced view > Tasks tab
ID: 97692 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1479
Credit: 7,256,432
RAC: 10,716
Message 97695 - Posted: 27 Jun 2020, 18:14:48 UTC - in response to Message 97680.  

EricM,

Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.

As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.


Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from

Universe 0
LHC 0
Rosetta 1

to

Universe 1
LHC 5
Rosetta 25

I would expect to immediately see 1 Universe to every 5 LHC to every 25 Rosetta tasks running, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect.
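
To make the expected ratio concrete, here is a minimal sketch that turns the weights above into long-run fractions of run time. It only does the arithmetic; `expected_fractions` is a hypothetical helper, not part of BOINC, and it says nothing about how quickly the client converges on these fractions.

```python
def expected_fractions(shares):
    """Convert project weights into the fraction of run time each should get."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in shares.items()}

new_shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
fractions = expected_fractions(new_shares)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of run time")
```

With weights of 1, 5 and 25 the total is 31, so Rosetta's long-run target is 25/31, a little over 80% of run time.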
ID: 97695 · Rating: 0

EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97696 - Posted: 27 Jun 2020, 18:25:12 UTC - in response to Message 97692.  

Thanks for letting me know about the screenshot problem, Brian. I wish I could edit my post. Posting below via imgur.
There are no Rosetta tasks in my task listing.



system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97696 · Rating: 0

ProDigit
Joined: 6 Dec 18
Posts: 27
Credit: 2,716,044
RAC: 0
Message 97698 - Posted: 27 Jun 2020, 18:33:24 UTC

12 CPU WUs hogging up my PC, using only 1 cpu core.
I will for the time being disconnect from this project until the issue is resolved.
ID: 97698 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1479
Credit: 7,256,432
RAC: 10,716
Message 97699 - Posted: 27 Jun 2020, 18:58:35 UTC - in response to Message 97698.  

12 CPU WUs hogging up my PC, using only 1 cpu core.
I will for the time being disconnect from this project until the issue is resolved.


What issue? I've got 6 computers running Rosetta - two of them with 24 cores each. All cores utilised as normal. What's happening on yours? Are there tasks that say running but doing no calculations?
ID: 97699 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1479
Credit: 7,256,432
RAC: 10,716
Message 97700 - Posted: 27 Jun 2020, 19:00:34 UTC - in response to Message 97684.  
Last modified: 27 Jun 2020, 19:02:05 UTC

EricM,
Rosetta@home currently has so many new users that it's not keeping up with the demand for tasks.
As for being paused mid-task while another BOINC project runs, that's normal if you have more than one BOINC project providing tasks. Tasks close to their deadlines get higher priority to run, and tasks for the other project catch up on run time later.

Hi Robert, and thanks for the info. I added the second project at your suggestion, thanks for that as well.
But the Rosetta pause occurred before that, and has several other times in the past couple months. So I still wonder why it's not finishing the current task.
Eric


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.
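
For anyone who prefers a file to the preferences dialog, here is a rough sketch of the same change. As far as I know, the client stores this setting as `cpu_scheduling_period_minutes` and reads local overrides from a `global_prefs_override.xml` file in the BOINC data directory; treat both names as assumptions to check against your client version. `override_xml` is a hypothetical helper that only builds the text and does not write any file.

```python
import xml.etree.ElementTree as ET

MINUTES_PER_YEAR = 365 * 24 * 60  # "a year", expressed in minutes

def override_xml(prefs):
    """Build a <global_preferences> fragment from a dict of tag -> number."""
    root = ET.Element("global_preferences")
    for tag, value in prefs.items():
        ET.SubElement(root, tag).text = f"{float(value):.6f}"
    return ET.tostring(root, encoding="unicode")

# Assumed tag name for the "switch between applications" setting.
xml_text = override_xml({"cpu_scheduling_period_minutes": MINUTES_PER_YEAR})
print(xml_text)
```

Setting the value through the Manager's preferences dialog, as described above, should have the same effect without touching any file.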
ID: 97700 · Rating: 0

EHM-1
Joined: 21 Mar 20
Posts: 23
Credit: 183,782
RAC: 0
Message 97701 - Posted: 27 Jun 2020, 19:31:26 UTC - in response to Message 97700.  


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.

Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in.
Eric

system: up-to-date Windows 10, Intel quad-core 3.6 GHz processor, 8 GB RAM
ID: 97701 · Rating: 0

Mr P Hucker
Joined: 12 Aug 06
Posts: 1479
Credit: 7,256,432
RAC: 10,716
Message 97703 - Posted: 27 Jun 2020, 19:42:32 UTC - in response to Message 97701.  
Last modified: 27 Jun 2020, 19:43:39 UTC


I turned that nonsense off. Go into Boinc's properties and change the "switch between applications" to a huge number. I set mine to a year. I do not want stuff changing before it's finished.

Thanks for the tip, Peter. I switched it from 120 minutes to 1,000 for now to see what happens, double what I've observed as the process time for a Rosetta task. And now I'm suspending my second project temporarily to see if Rosetta resumes the task when the screensaver kicks in.
Eric


That setting will still annoy you every 17 hours. I think it means to change every 17 hours, not 17 hours since the last task started. Hence I changed mine to a year.

And it doesn't apply if you:
Restart the machine.
Pause tasks to play a game etc.
Have another project go into high priority panic mode due to a late task.

I mainly changed mine because I run LHC, and their tasks don't checkpoint very well and can sometimes get corrupted or at least lose a lot of work.

But also I detest having hundreds of half done work units - especially when I see one with 1 second (!) left to go, which it doesn't get round to doing for a whole day! Boinc programmers aren't right in the head.
ID: 97703 · Rating: 0

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 97704 - Posted: 27 Jun 2020, 19:51:48 UTC - in response to Message 97696.  

@EricM:

It seems that BOINC is having trouble deciding how much work to download for your computer. As I understand it, that decision is based in part on what BOINC has seen the computer complete in the past. Does the machine have an irregular usage pattern? (Powered off frequently/irregularly? Variable amount of other work being done while BOINC is running?) The missed deadlines and high average turnaround time (2.81 days: only just within the 3-day deadline) may be contributing.

Check all your Computing preferences. Post values/screenshots here, and we might spot something amiss. Regarding pauses: do you have any restrictions in your "Daily schedules" settings?

To try to get some tasks, try increasing "Store at least N days of work". Do it in steps: add around 0.4 (slightly more than one 8-hour task time), save, wait a couple of minutes for BOINC to contact the server, and see if it downloads some tasks. If not, repeat. As soon as you get some tasks, reduce N again to maybe 0.3 (slightly less than one task) to avoid your machine getting flooded with work it cannot complete until BOINC learns better how long each task will take. Set "Store up to an additional" to 0, as at this stage the last thing you need is BOINC using poor estimates to opportunistically download even more work.
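
The arithmetic behind those numbers, as a small sketch: one 8-hour task is a third of a day, so 0.4 is a little more than one task and 0.3 a little less. The starting value of 0.1 days and the 2-day cap are assumptions for illustration only; `task_days` and `buffer_steps` are hypothetical helpers, not BOINC functions.

```python
def task_days(task_hours=8.0):
    """Length of one task expressed in days."""
    return task_hours / 24.0

def buffer_steps(start_days, step_days=0.4, max_days=2.0):
    """List the successive 'Store at least N days' values to try, up to a cap."""
    value = start_days
    steps = []
    while value + step_days <= max_days + 1e-9:
        value = round(value + step_days, 2)
        steps.append(value)
    return steps

print(f"one 8-hour task = {task_days():.3f} days")
print("values to try, one at a time:", buffer_steps(0.1))
```

Stop stepping as soon as tasks arrive, then drop back to about 0.3 as described above.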
ID: 97704 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97705 - Posted: 27 Jun 2020, 19:58:32 UTC - in response to Message 97686.  

EricM,

If you're using the Simple View, click on View near the top of the window, then Advanced View to show more information.

When you want to go back to Simple View, click on View, then Simple View.

In Advanced View, click on Tasks to see a list of all the tasks currently on your computer. Some will show as Running, some as Waiting to run (started but not currently running; waiting for its next turn for CPU time), and some as Ready to start. There are also a few less common conditions you don't see as often.

Those in the Running condition will have time advancing in the Elapsed column, though not always every second. They should have time decreasing in the Remaining column, but it can increase instead if the initial estimate of the run time was too low.

The Deadline column shows when the task must be finished and returned to avoid problems.

For about one day before the deadline and for some time after the deadline, any task that finishes will upload its outputs and report the finish automatically. Any task finishing earlier than that may or may not wait. If you need to speed up an upload, click on Transfers, then some line for a file, then Retry now. This starts an attempt to upload all of the files going to the same BOINC project as the file you clicked on. The Status column shows whether the upload was blocked (usually temporarily).

Generally, if your BOINC Manager contacts the project server for any reason, it will then try to do any uploads and reports that are waiting.

If you need to speed up reporting a finished task, click on Projects, then the BOINC project the task is for, then Update. It should then try to report all finished tasks for that project, except any that are still waiting to finish their uploads.

To see the main event log, click on Tools, then Event Log. The main event log should then appear on the screen until you click Close at the bottom corner.

Your "no longer usable" messages indicate that the task is far enough past its deadline that another task from the same workunit has been sent to another user, and that user has sent back the upload files and reported the task as finished, so the server no longer needs anything from your task and will not give you any credit for it.

Your no tasks sent message indicates that either there are no tasks available to send you, or the server has decided that your computer is not reliable enough to be worth sending any tasks for a while.

Your Project requested delay message indicates how long your BOINC Manager should wait before trying again. This is to prevent overly frequent requests from blocking access to the server for other users.

Your overdue messages indicate that the tasks are past their deadlines, enough that you are unlikely to get any credit for returning them.
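
The four message types described above can be told apart mechanically. This is a minimal sketch that assumes the `time | project | message` layout seen in the event log lines quoted earlier in the thread; `classify` and its category names are hypothetical, and the matching is based only on those sample lines.

```python
import re

def classify(line):
    """Split one event log line and label the kind of message it carries."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        return None  # not in the expected "time | project | message" layout
    timestamp, project, message = parts
    if "is no longer usable" in message:
        kind = "no_longer_usable"
    elif message == "No tasks sent":
        kind = "no_tasks_sent"
    elif message.startswith("Project requested delay"):
        kind = "requested_delay"
    elif "overdue" in message:
        kind = "overdue"
    else:
        kind = "other"
    entry = {"time": timestamp, "project": project, "kind": kind}
    match = re.search(r"is ([\d.]+) days overdue", message)
    if match:
        entry["days_overdue"] = float(match.group(1))
    return entry

sample = "6/24/2020 7:16:42 PM | Rosetta@home | Project requested delay of 31 seconds"
print(classify(sample))
```

Running it over a saved copy of the log gives a quick count of how often each kind of message appears.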

There is also a separate log file for each task.

You might check if you have Task Manager installed. I often use it to show problems with too many tasks trying to run at once, or not having enough memory to keep all of the tasks running.
ID: 97705 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97706 - Posted: 27 Jun 2020, 20:12:43 UTC - in response to Message 97687.  
Last modified: 27 Jun 2020, 20:13:04 UTC

[H]Skillz,

Are you using this link when you try to attach?

https://boinc.bakerlab.org/rosetta/

Note the https instead of the previous http.

If not, delete what's currently in the Project URL box, and put this link there instead, before clicking Next.

If this doesn't make it work, give us more details about what version of BOINC you are using under what version of what operating system (most users use the Windows operating system).

When you enter a message, then click on Post Reply, you seldom need to enter it again. Try waiting about one minute for the server to show the message first.
ID: 97706 · Rating: 0

robertmiles
Joined: 16 Jun 08
Posts: 1217
Credit: 13,359,660
RAC: 238
Message 97707 - Posted: 27 Jun 2020, 20:21:52 UTC - in response to Message 97695.  

Peter Hucker,

[snip]

Boinc needs to reprogram the scheduler so the project weight works properly. In particular if you change the weighting, it takes days to actually do what you asked. For example, I changed from

Universe 0
LHC 0
Rosetta 1

to

Universe 1
LHC 5
Rosetta 25

I would expect to immediately see 1 Universe to every 5 LHC to every 25 Rosetta tasks running, but I didn't, not for 3 days. Boinc went utterly mental and ran almost exclusively LHC, presumably doing some weird lookback over the last week and seeing it hadn't done any. When the user changes the weighting, it should have immediate effect.


That would interfere with the way it recovers from times when one of the projects has no tasks available to send.

Instead, it looks back over the last few weeks, and tries to get tasks from whichever project would move it toward the new weighting.
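
A toy model of that catch-up behaviour, to show why LHC could dominate for a while after the change. This is not BOINC's actual scheduler: `pick_next` and `simulate` are hypothetical, the 100 hours of Rosetta-only history is an assumed starting point, and the rule (run whichever project is furthest below its target share of total run time) is a deliberate simplification.

```python
def pick_next(shares, done):
    """Choose the project furthest below its target share of total run time."""
    total_share = sum(shares.values())
    total_done = sum(done.values())
    def deficit(name):
        return shares[name] / total_share * total_done - done[name]
    return max(shares, key=deficit)

def simulate(shares, history, steps, slice_hours=1.0):
    """Run the toy rule for a number of time slices and record the choices."""
    done = dict(history)
    order = []
    for _ in range(steps):
        name = pick_next(shares, done)
        done[name] += slice_hours
        order.append(name)
    return order

shares = {"Universe": 1, "LHC": 5, "Rosetta": 25}
history = {"Universe": 0.0, "LHC": 0.0, "Rosetta": 100.0}  # assumed past run time
order = simulate(shares, history, steps=20)
print(order)
```

In this toy model, Rosetta starts well above its new target share, so it is skipped until the other projects have caught up.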
ID: 97707 · Rating: 0



©2022 University of Washington
https://www.bakerlab.org