Problems and Technical Issues with Rosetta@home

Kissagogo27

Joined: 31 Mar 20
Posts: 86
Credit: 2,981,693
RAC: 1,845
Message 100599 - Posted: 10 Feb 2021, 15:13:06 UTC

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100600 - Posted: 10 Feb 2021, 19:14:26 UTC
Last modified: 10 Feb 2021, 19:19:57 UTC

Server Status page shows the Transitioners are down. Someone needs to give things a nudge.

Tasks ready to send          1
Transitioner backlog (hours) 8.98    (usually zero, or very close to it).

Grant
Darwin NT

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 100601 - Posted: 10 Feb 2021, 19:47:35 UTC - in response to Message 100600.  

Server Status page shows the Transitioners are down. Someone needs to give things a nudge.

Just when I reattach a Ryzen 3900X, things fall apart. Back to OPN.

Sid Celery
Joined: 11 Feb 08
Posts: 2145
Credit: 41,550,899
RAC: 8,846
Message 100602 - Posted: 10 Feb 2021, 21:14:36 UTC - in response to Message 100599.  

and no validation too

https://munin.kiska.pw/munin/rosetta-day.html

No indication of that on the server page, but you're right - several here are still waiting after about 6 hrs.

Hammer
Joined: 7 Mar 07
Posts: 4
Credit: 15,451,390
RAC: 0
Message 100603 - Posted: 11 Feb 2021, 2:16:46 UTC

Always wondered what a transitioner did.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100605 - Posted: 11 Feb 2021, 2:50:31 UTC - in response to Message 100603.  

Always wondered what a transitioner did.
A lot.


From the Seti@home website server page.
transitioner: Handles state transitions of workunits and results. Basically, the transitioners keep track of the results in progress and makes sure they properly move down the pipeline. It is always asking the questions: Is this workunit ready to send out? Has this result been received yet? Is this a valid result? Can we delete it now?
Basically it moves the Task from one state to another: ready to send, sent and awaiting a result, result received. Is the result valid? If so, move it to the science database & delete it after a set time period. If not, send out another copy, then check its result. Has it timed out? Send another one.
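As a rough illustration of those state changes, here is a minimal Python sketch. The state names, event names and transition table are assumptions made up for this example and are not taken from the BOINC server code; the real transitioner works against the project database and handles many more cases.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical task states, named after the description above."""
    READY_TO_SEND = auto()
    IN_PROGRESS = auto()
    RECEIVED = auto()
    VALID = auto()
    DELETED = auto()


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (State.READY_TO_SEND, "sent"): State.IN_PROGRESS,
    (State.IN_PROGRESS, "result_received"): State.RECEIVED,
    (State.IN_PROGRESS, "timed_out"): State.READY_TO_SEND,  # send another copy
    (State.RECEIVED, "valid"): State.VALID,
    (State.RECEIVED, "invalid"): State.READY_TO_SEND,       # send another copy
    (State.VALID, "retention_expired"): State.DELETED,
}


def step(state, event):
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.READY_TO_SEND):
    """Apply a sequence of events to a task and return its final state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(run(["sent", "result_received", "valid", "retention_expired"]))
    print(run(["sent", "timed_out"]))
```

When the transitioners are down, nothing applies these steps, which is why returned results sit unvalidated and no new Tasks become ready to send.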
Grant
Darwin NT

Garry Heather
Joined: 23 Nov 20
Posts: 10
Credit: 362,743
RAC: 0
Message 100612 - Posted: 11 Feb 2021, 16:36:47 UTC

I see the Scheduler on bwsrv1 has gone AWOL now. I hope it sends us a postcard.

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 100613 - Posted: 11 Feb 2021, 23:10:50 UTC

Looks like none of you bothered to look at a weather report for Seattle, WA, USA.

I did, and found that today's weather includes times above freezing and times below, with snow expected.

That means a lot of ice on paths to and from the building with the server, so delays fixing any problems are likely.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100614 - Posted: 12 Feb 2021, 2:18:05 UTC - in response to Message 100613.  

Looks like none of you bothered to look at a weather report for Seattle, WA, USA.

I did, and found that today's weather includes times above freezing and times below, with snow expected.

That means a lot of ice on paths to and from the building with the server, so delays fixing any problems are likely.
That's why remote management is such a wonderful thing.

And it looks like they were successful. "Project is down for maintenance" is what I got when I first checked in this morning, but now the web site is back up & work is flowing again.
Thanks to whoever it was that got it working again.
Grant
Darwin NT

Dave
Joined: 10 May 09
Posts: 3
Credit: 109,605
RAC: 0
Message 100630 - Posted: 17 Feb 2021, 16:45:02 UTC

Two tasks downloaded on my Fairphone 2. Under the tasks view they both say download complete 0.000%.

BOINC version is 7.16.16 from the BOINC site, as I understand this version is not available from Google Play.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,214,786
RAC: 932
Message 100631 - Posted: 18 Feb 2021, 0:31:17 UTC - in response to Message 100630.  
Last modified: 18 Feb 2021, 0:32:56 UTC

Two tasks downloaded on my Fairphone 2. Under the tasks view they both say download complete 0.000%.

BOINC version is 7.16.16 from the BOINC site, as I understand this version is not available from Google Play.


"David Anderson:
Version 7.16.16 of the BOINC Android client has been released. This is the first new Android version in over 4 years, and is a major rewrite of the GUI. Thanks for Vitalii Koshura, Tal Regev, and Isira Seneviratne for their work on this.

The new version is available from the BOINC web site and (for Amazon Fire tablets) from the Amazon app store. It's not on the Google play store because of new restrictions imposed by Google; hopefully this will be resolved in a future version."

I personally don't update every time a new update gets released. I let others try it out, figure out how it actually works and list the things it does differently; then, if I either don't care about the new things or like them, I will update. Sometimes, though, the older versions just work differently enough to make me keep them.

Another reason not to update right away is that the Projects need to implement any necessary compatibility changes too. Some Projects are waaaay behind in their version of the BOINC server-side software.

lohphat
Joined: 22 Apr 06
Posts: 5
Credit: 4,965,549
RAC: 0
Message 100632 - Posted: 18 Feb 2021, 7:31:10 UTC

I have two failed work units in the last batch. One also failed again with another user's attempt on the same platform (windows_x86_64)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1197364660

However this one failed on windows_x86_64 but succeeded on aarch64-unknown-linux-gnu

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1197364006

Do these types of asymmetrical failures indicate platform bugs, as opposed to tasks which fail on all platforms?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100637 - Posted: 18 Feb 2021, 23:21:21 UTC - in response to Message 100632.  
Last modified: 18 Feb 2021, 23:22:58 UTC

Do these types of asymmetrical failures indicate platform bugs, as opposed to tasks which fail on all platforms?
Sometimes/maybe.
If the Tasks only ever fail on one particular platform and only ever complete on another, then you can put it down to the application. But a Task is started with a random seed value each time it is run, so even if you were to run the same Task 50 times on the very same system, it may error out on some occasions and not others, all due to the different initial value used.
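To illustrate the seed effect, here is a toy Python sketch (not Rosetta code). The function run_task, the failure_rate parameter and the idea of a single rare failing path are all hypothetical; the only point is that the same seed always gives the same outcome, while different seeds may or may not hit the failing path.

```python
import random


def run_task(seed, steps=1000, failure_rate=0.001):
    """Toy stand-in for a Task: returns True on success, False on error.

    Each step draws a random number from a generator seeded with `seed`;
    a draw below `failure_rate` represents wandering into a rare code
    path that triggers an error.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < failure_rate:
            return False
    return True


def outcomes(seeds):
    """Run the toy Task once per seed and collect the results."""
    return {seed: run_task(seed) for seed in seeds}


if __name__ == "__main__":
    results = outcomes(range(50))
    failed = [seed for seed, ok in results.items() if not ok]
    print(f"{len(failed)} of {len(results)} runs failed; failing seeds: {failed}")
```

So a failure on one host followed by a success on another does not by itself prove a platform bug; you would want to see the pattern hold over many Tasks.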
Grant
Darwin NT

Kissagogo27
Joined: 31 Mar 20
Posts: 86
Credit: 2,981,693
RAC: 1,845
Message 100639 - Posted: 20 Feb 2021, 9:32:46 UTC

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol dock_and_relax.xml @flags_21ffc515 -in:file:silent drhicks1_fd_21ffc515_egg_140_3229_348_1_000000036_0001_PJS-I-23D_xtl_ROSETTA_relax_super2_SAVE_ALL_OUT_IGNORE_THE_REST_1aa1aa1a.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type score_jump -out:file:silent default.out -in:file:boinc_wu_zip drhicks1_fd_21ffc515_egg_140_3229_348_1_000000036_0001_PJS-I-23D_xtl_ROSETTA_relax_super2_SAVE_ALL_OUT_IGNORE_THE_REST_1aa1aa1a.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3919319
Using database: database_357d5d93529_n_methylminirosetta_database

ERROR: Assertion `active( key )` failed.
ERROR:: Exit from: C:cygwin64homeboinc4.17Rosettamainsourcesrcutility/keys/SmallKeyVector.hh line: 548
02:56:26 (948): called boinc_finish(0)

</stderr_txt>
]]>


from this WU https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1198052234

Michael E.@ team Carl Sagan
Joined: 5 Apr 08
Posts: 16
Credit: 1,947,553
RAC: 128
Message 100671 - Posted: 1 Mar 2021, 4:11:44 UTC

I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167

It never begins processing. It stays in the "Ready to Start" state. Tasks from other projects process just fine. I have used BOINC for two+ decades but never saw this happen before.

The work unit is Rosetta version 4.20, BOINC is at Version 7.16.11, and it is a Windows 10 system with a GPU. The Options > Computing Preferences are set at 50% of CPUs (6). There are no work units in the Transfers tab.

Should I abort it and get some new Rosetta work units? Or abort and reset the Rosetta project?

Anyone ever seen this?

Bryn Mawr
Joined: 26 Dec 18
Posts: 403
Credit: 12,294,748
RAC: 3,791
Message 100672 - Posted: 1 Mar 2021, 4:30:46 UTC - in response to Message 100671.  

I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167

It never begins processing. It stays in the "Ready to Start" state. Tasks from other projects process just fine. I have used BOINC for two+ decades but never saw this happen before.

The work unit is Rosetta version 4.20, BOINC is at Version 7.16.11, and it is a Windows 10 system with a GPU. The Options > Computing Preferences are set at 50% of CPUs (6). There are no work units in the Transfers tab.

Should I abort it and get some new Rosetta work units? Or abort and reset the Rosetta project?

Anyone ever seen this?


Without seeing what, for example, Milky Way is doing on that machine at the same time it’s impossible to say. You need to look at the full picture, not just one project.

As an example, if one of the other projects has had an off day and fallen behind on its resource share then it will suspend processing on Rosetta, leaving all WUs as Ready to Start, until the other project has caught up.
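A very simplified way to picture this, sketched in Python. This is not BOINC's actual scheduling algorithm; the function next_project and its "deficit" calculation are assumptions for illustration, and the project names and numbers in the usage example are made up.

```python
def next_project(shares, cpu_seconds):
    """Pick the project that is furthest behind its resource share.

    shares      -- mapping of project name to resource share (any units)
    cpu_seconds -- mapping of project name to recent CPU time received
    """
    total_share = sum(shares.values())
    total_cpu = sum(cpu_seconds.get(name, 0.0) for name in shares)

    def deficit(name):
        # Fraction of CPU time the project should get, minus what it got.
        target = shares[name] / total_share
        actual = cpu_seconds.get(name, 0.0) / total_cpu if total_cpu else 0.0
        return target - actual

    return max(shares, key=deficit)


if __name__ == "__main__":
    shares = {"Rosetta": 100, "Milkyway": 100}
    recent = {"Rosetta": 9000.0, "Milkyway": 1000.0}
    # Milkyway got 10% of the time against a 50% share, so it goes first.
    print(next_project(shares, recent))  # picks "Milkyway"
```

In this toy model Rosetta work would stay at Ready to Start until the other project's share of recent CPU time has caught up.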

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100673 - Posted: 1 Mar 2021, 4:46:55 UTC - in response to Message 100672.  

I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167

It never begins processing. It stays in the "Ready to Start" state. Tasks from other projects process just fine. I have used BOINC for two+ decades but never saw this happen before.

The work unit is Rosetta version 4.20, BOINC is at Version 7.16.11, and it is a Windows 10 system with a GPU. The Options > Computing Preferences are set at 50% of CPUs (6). There are no work units in the Transfers tab.

Should I abort it and get some new Rosetta work units? Or abort and reset the Rosetta project?

Anyone ever seen this?


Without seeing what, for example, Milky Way is doing on that machine at the same time it’s impossible to say. You need to look at the full picture, not just one project.

As an example, if one of the other projects has had an off day and fallen behind on its resource share then it will suspend processing on Rosetta, leaving all WUs as Ready to Start, until the other project has caught up.
I would suggest setting your cache to 0 as you are signed up to a dozen projects, almost half of them active.
The smaller the cache, the sooner the system can meet your resource share settings; with that many projects I'd suggest you'd be looking at weeks. With even a small cache, it will take months.
Preferences,
When and how BOINC uses your computer Computing preferences, Computing, Other
           Store at least 0.00 days of work
Store up to an additional 0.01 days of work
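If you use local preferences, the same two cache values can, as far as I know, also be set in the client's global_prefs_override.xml file through the work_buf_min_days and work_buf_additional_days elements. The Python sketch below only builds that XML fragment as a string; the function name cache_override_xml is made up for the example, and where the file lives depends on your BOINC data directory, so check your own install before writing anything.

```python
import xml.etree.ElementTree as ET


def cache_override_xml(min_days=0.0, additional_days=0.01):
    """Build a preferences-override fragment holding the two cache values.

    The element names are the ones I understand the BOINC client to read;
    treat them as an assumption and compare with your existing file.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:.2f}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Prints the fragment; it does not touch any BOINC files.
    print(cache_override_xml())
```

Setting the values through the Manager as described above does the same job and is the safer route.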

I would also run the benchmarks on that system- it is showing the default values, and as they are used when it comes to allocating work (as well as allocating Credit for work done) it is probably impacting on what work is done & when.
On the BOINC manager, Tools, Run CPU benchmarks.
Grant
Darwin NT

Michael E.@ team Carl Sagan
Joined: 5 Apr 08
Posts: 16
Credit: 1,947,553
RAC: 128
Message 100678 - Posted: 1 Mar 2021, 16:40:52 UTC - in response to Message 100673.  

Without seeing what, for example, Milky Way is doing on that machine at the same time it’s impossible to say. You need to look at the full picture, not just one project.

As an example, if one of the other projects has had an off day and fallen behind on its resource share then it will suspend processing on Rosetta, leaving all WUs as Ready to Start, until the other project has caught up.


Sorry for the incomplete info! No other CPU tasks are running from other projects. The other 3 projects on this PC do not allow new tasks (No New Tasks selected). Thanks for the questions!

I would suggest setting your cache to 0 as you are signed up to a dozen projects, almost half of them active.
The smaller the cache, the sooner the system can meet your resource share settings; with that many projects I'd suggest you'd be looking at weeks. With even a small cache, it will take months.
Preferences,
When and how BOINC uses your computer Computing preferences, Computing, Other
Store at least 0.00 days of work
Store up to an additional 0.01 days of work

I would also run the benchmarks on that system- it is showing the default values, and as they are used when it comes to allocating work (as well as allocating Credit for work done) it is probably impacting on what work is done & when.
On the BOINC manager, Tools, Run CPU benchmarks.


No other active CPU tasks. Four total projects on this PC.

CPU benchmarks result:
3/1/2021 11:04:04 AM | | Running CPU benchmarks
3/1/2021 11:04:05 AM | | Suspending computation - CPU benchmarks in progress
3/1/2021 11:04:36 AM | | Benchmark results:
3/1/2021 11:04:36 AM | | Number of CPUs: 3
3/1/2021 11:04:36 AM | | 4742 floating point MIPS (Whetstone) per CPU
3/1/2021 11:04:36 AM | | 13780 integer MIPS (Dhrystone) per CPU
3/1/2021 11:04:37 AM | | Resuming computation
3/1/2021 11:12:48 AM | | General prefs: from http://einstein.phys.uwm.edu/ (last modified ---)
3/1/2021 11:12:48 AM | | Computer location: home
3/1/2021 11:12:48 AM | | General prefs: using separate prefs for home
3/1/2021 11:12:48 AM | | Reading preferences override file
3/1/2021 11:12:48 AM | | Preferences:
3/1/2021 11:12:48 AM | | max memory usage when active: 2428.71 MB
3/1/2021 11:12:48 AM | | max memory usage when idle: 8095.70 MB
3/1/2021 11:12:48 AM | | max disk usage: 8.00 GB
3/1/2021 11:12:48 AM | | max CPUs used: 3
3/1/2021 11:12:48 AM | | suspend work if non-BOINC CPU load exceeds 35%
3/1/2021 11:12:48 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
. . .
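For anyone wanting to compare benchmark figures between hosts, here is a small Python sketch that pulls the Whetstone and Dhrystone numbers out of event log lines formatted like the ones above. The function name parse_benchmarks and the regular expression are my own assumptions based only on this log excerpt; other client versions may format these lines differently.

```python
import re

# Matches e.g. "4742 floating point MIPS (Whetstone) per CPU".
BENCHMARK_RE = re.compile(
    r"(\d+) (?:floating point|integer) MIPS \((Whetstone|Dhrystone)\) per CPU"
)


def parse_benchmarks(lines):
    """Return {"Whetstone": int, "Dhrystone": int} for the lines that match."""
    results = {}
    for line in lines:
        match = BENCHMARK_RE.search(line)
        if match:
            value, name = match.groups()
            results[name] = int(value)
    return results


if __name__ == "__main__":
    log = [
        "3/1/2021 11:04:36 AM | | Number of CPUs: 3",
        "3/1/2021 11:04:36 AM | | 4742 floating point MIPS (Whetstone) per CPU",
        "3/1/2021 11:04:36 AM | | 13780 integer MIPS (Dhrystone) per CPU",
    ]
    print(parse_benchmarks(log))
```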

Good suggestion to run the benchmarks. Yes, I use the Advanced View and I use Local Pref's. I removed Einstein a few days ago so not sure why it appeared in the benchmarks.

I changed the cache for now but do not see why that matters. Cache was previously set for 1 day.

I exited and restarted BOINC. I just enabled Rosetta to download new tasks and it downloaded 2 tasks. I will let them finish - all 3 are running.

I had an issue with GPUGrid a few weeks ago and had to remove BOINC (and its ProgramData directory) completely. Not sure that is related.

Anyhow, not sure why it is fixed (maybe reducing cache?) but it is working OK now.

Thanks!

mrhastyrib
Joined: 18 Feb 21
Posts: 90
Credit: 2,541,890
RAC: 0
Message 100680 - Posted: 2 Mar 2021, 7:16:26 UTC
Last modified: 2 Mar 2021, 7:36:24 UTC

Hello to all. I recently joined Rosetta@home with three computers. Things were fine until a few days ago. The wireless connection was interrupted over the weekend for one of the three, causing some of the tasks to time out for processing start (I am not sure if any of this history is relevant to the problem). I reconnected the wireless, and cleared out the task queue, and it filled up with new tasks. Since then, none of the new tasks will start. They all just sit at "Ready to start." Eventually, the new tasks abort for not starting by the deadline.

I have been fiddling with the settings and ran a CPU benchmark, but nothing helps. I even deleted the program and reinstalled it.

The other two computers continue to operate normally. All three computers are operating on Linux Mint.

I tried to search for information about this problem; there is little that I could find other than it seems to be something that others encounter because of conflicts with other projects. I am on Rosetta@home only.

Any guidance towards a solution would be appreciated.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1729
Credit: 18,451,410
RAC: 17,781
Message 100682 - Posted: 2 Mar 2021, 10:15:46 UTC - in response to Message 100680.  

Any guidance towards a solution would be appreciated.
Are you using Web based preferences, or settings in the BOINC Manager?
If the Web based settings, in the Manager menu, select Options, Computing preferences and make sure it shows that the Web based preferences are being used. If local, make sure a value hasn't been set that stops BOINC from running.

With your computing preferences, what "Usage limits" & "When to suspend" values do you have?
Ideally-
Usage limits	
Use at most 100 % of the CPUs
Use at most 100 % of CPU time

When to suspend	
           Suspend when computer is on battery	
               Suspend when computer is in use	
 Suspend GPU computing when computer is in use	
   'In use' means mouse/keyboard input in last 3 minutes
  Suspend when no mouse/keyboard input in last --- minutes
     Suspend when non-BOINC CPU usage is above --- %
                          Compute only between ---
If it's set to suspend at any time, check that nothing going on on that system meets any of those settings' values, e.g. some system or other process using CPU time, which would stop the Tasks from starting.
Check that something isn't hogging system RAM, and hitting the limits that stop BOINC from processing work.
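Put as logic, the checks above amount to something like the following Python sketch. The class and field names (Prefs, HostStatus, suspend_reasons) are hypothetical and only mirror the preference labels listed above; this is not how the BOINC client itself is implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prefs:
    """Hypothetical subset of the 'When to suspend' preferences."""
    suspend_on_battery: bool = True
    suspend_if_in_use: bool = False
    in_use_minutes: float = 3.0                # 'in use' = input in last N minutes
    max_other_cpu_pct: Optional[float] = None  # None means no limit is set


@dataclass
class HostStatus:
    """Hypothetical snapshot of what the machine is doing right now."""
    on_battery: bool
    minutes_since_input: float
    other_cpu_pct: float


def suspend_reasons(prefs: Prefs, host: HostStatus) -> List[str]:
    """List every preference that would currently keep Tasks from running."""
    reasons = []
    if prefs.suspend_on_battery and host.on_battery:
        reasons.append("on battery")
    if prefs.suspend_if_in_use and host.minutes_since_input < prefs.in_use_minutes:
        reasons.append("computer in use")
    if (prefs.max_other_cpu_pct is not None
            and host.other_cpu_pct > prefs.max_other_cpu_pct):
        reasons.append("non-BOINC CPU usage too high")
    return reasons


if __name__ == "__main__":
    prefs = Prefs(max_other_cpu_pct=35.0)
    busy = HostStatus(on_battery=False, minutes_since_input=60.0, other_cpu_pct=50.0)
    print(suspend_reasons(prefs, busy))
```

An empty list means none of these settings is holding the Tasks back, so the cause lies elsewhere.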

In the BOINC Manager, you can select one of the Tasks ready to start, Suspend it, then Resume it a few seconds later & see if that kick starts things.


And even with Rosetta as your only project, given the very short deadlines, no cache (or an extremely small one, e.g. 0.1 + 0.01) is the best way to go.
Grant
Darwin NT



©2024 University of Washington
https://www.bakerlab.org