Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Kissagogo27 Send message Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
Server Status page shows the Transitioners are down. Someone needs to give things a nudge. Tasks ready to send: 1. Transitioner backlog (hours): 8.98 (usually zero, or very close to it). Grant Darwin NT |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
"Server Status page shows the Transitioners are down. Someone needs to give things a nudge." Just when I reattach a Ryzen 3900X, things fall apart. Back to OPN. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2146 Credit: 41,570,180 RAC: 8,210 |
"and no validation too" No indication of that on the server page, but you're right: several here are still waiting after @6hrs. |
Hammer Send message Joined: 7 Mar 07 Posts: 4 Credit: 15,451,390 RAC: 0 |
Always wondered what a transitioner did. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"Always wondered what a transitioner did." A lot. From the Seti@home website server page: "transitioner: Handles state transitions of workunits and results. Basically, the transitioners keep track of the results in progress and makes sure they properly move down the pipeline. It is always asking the questions: Is this workunit ready to send out? Has this result been received yet? Is this a valid result? Can we delete it now?" Basically it moves the Task from one state to another: ready to send; sent & awaiting a result; result received. Is the result valid? If so, move it to the science database & delete it after a set time period. If not, send out another copy, then check its result. Has it timed out? Send another one. Grant Darwin NT |
Garry Heather Send message Joined: 23 Nov 20 Posts: 10 Credit: 362,743 RAC: 0 |
I see the Scheduler on bwsrv1 has gone AWOL now. I hope it sends us a postcard. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 826 |
Looks like none of you bothered to look at a weather report for Seattle, WA, USA. I did, and found that today's weather includes times above freezing and times below, with snow expected. That means a lot of ice on paths to and from the building with the server, so delays fixing any problems are likely. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"Looks like none of you bothered to look at a weather report for Seattle, WA, USA." That's why remote management is such a wonderful thing. And it looks like they were successful. "Project is down for maintenance" is what I got when I first checked in this morning, but now the web site is back up & work is flowing again. Thanks to whoever it was that got it working again. Grant Darwin NT |
Dave Send message Joined: 10 May 09 Posts: 3 Credit: 109,605 RAC: 0 |
Two tasks downloaded on my Fairphone 2. Under the tasks view they both say download complete 0.000%. BOINC version is 7.16.16 from the BOINC site, as I understand this version is not available from Google Play. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,217,610 RAC: 822 |
"Two tasks downloaded on my Fairphone 2. Under the tasks view they both say download complete 0.000%" "David Anderson: Version 7.16.16 of the BOINC Android client has been released. This is the first new Android version in over 4 years, and is a major rewrite of the GUI. Thanks for Vitalii Koshura, Tal Regev, and Isira Seneviratne for their work on this. The new version is available from the BOINC web site and (for Amazon Fire tablets) from the Amazon app store. It's not on the Google play store because of new restrictions imposed by Google; hopefully this will be resolved in a future version." I personally don't update every time a new update gets released. I let others try it out, figure out how it actually works and list the things it does differently; then, if I either don't care about the new things or like them, I will update. Sometimes, though, the older versions work differently enough to make me keep them. Another reason not to update right away is that the Projects need to implement any necessary compatibility changes too. Some Projects are waaaay behind in their version of the BOINC server-side software. |
lohphat Send message Joined: 22 Apr 06 Posts: 5 Credit: 4,965,549 RAC: 0 |
I have two failed work units in the last batch. One also failed again with another user's attempt on the same platform (windows_x86_64): https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1197364660 However, this one failed on windows_x86_64 but succeeded on aarch64-unknown-linux-gnu: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1197364006 Do these types of asymmetrical failures indicate platform bugs, as opposed to tasks which fail on all platforms? |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"Do these types of asymmetrical failures indicate platform bugs, as opposed to tasks which fail on all platforms?" Sometimes/maybe. If the Tasks only ever fail on one particular platform & only ever complete on another, then you can put it down to the application. But when a Task is run it is started with a random seed value, so even if you were to run the same task 50 times on the very same system, it may error out on some occasions and not others, all due to the different initial value used. Grant Darwin NT |
Kissagogo27 Send message Joined: 31 Mar 20 Posts: 86 Credit: 2,981,693 RAC: 1,241 |
<core_client_version>7.16.11</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol dock_and_relax.xml @flags_21ffc515 -in:file:silent drhicks1_fd_21ffc515_egg_140_3229_348_1_000000036_0001_PJS-I-23D_xtl_ROSETTA_relax_super2_SAVE_ALL_OUT_IGNORE_THE_REST_1aa1aa1a.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type score_jump -out:file:silent default.out -in:file:boinc_wu_zip drhicks1_fd_21ffc515_egg_140_3229_348_1_000000036_0001_PJS-I-23D_xtl_ROSETTA_relax_super2_SAVE_ALL_OUT_IGNORE_THE_REST_1aa1aa1a.zip -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3919319 Using database: database_357d5d93529_n_methylminirosetta_database ERROR: Assertion `active( key )` failed. ERROR:: Exit from: C:cygwin64homeboinc4.17Rosettamainsourcesrcutility/keys/SmallKeyVector.hh line: 548 02:56:26 (948): called boinc_finish(0) </stderr_txt> ]]> from this WU https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1198052234 |
Michael E.@ team Carl Sagan Send message Joined: 5 Apr 08 Posts: 16 Credit: 1,947,553 RAC: 86 |
I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167 It never begins processing. It stays in the "Ready to Start" state. Tasks from other projects process just fine. I have used BOINC for two+ decades but never saw this happen before. The work unit is Rosetta version 4.20, BOINC is at Version 7.16.11, and it is a Windows 10 system with a GPU. The Options > Computing Preferences are set at 50% of CPUs (6). There are no work units in the Transfers tab. Should I abort it and get some new Rosetta work units? Or abort and reset the Rosetta project? Has anyone ever seen this? |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 404 Credit: 12,294,748 RAC: 2,551 |
"I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167" Without seeing what, for example, Milky Way is doing on that machine at the same time, it’s impossible to say. You need to look at the full picture, not just one project. As an example, if one of the other projects has had an off day and fallen behind on its resource share, then BOINC will suspend processing on Rosetta, leaving all WUs as Ready to Start, until the other project has caught up. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"I downloaded this work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1202147167" I would suggest setting your cache to 0, as you are signed up to a dozen projects, almost half of them active. The smaller the cache, the sooner the system can meet your resource share settings. With that many projects I'd suggest you'd be looking at weeks; with even a small cache, it will take months. In Preferences, under "When and how BOINC uses your computer": Computing preferences, Computing, Other: Store at least 0.00 days of work; Store up to an additional 0.01 days of work. I would also run the benchmarks on that system. It is showing the default values, and as they are used when it comes to allocating work (as well as allocating Credit for work done) it is probably impacting on what work is done & when. In the BOINC Manager: Tools, Run CPU benchmarks. Grant Darwin NT |
Michael E.@ team Carl Sagan Send message Joined: 5 Apr 08 Posts: 16 Credit: 1,947,553 RAC: 86 |
"Without seeing what, for example, Milky Way is doing on that machine at the same time it’s impossible to say. You need to look at the full picture, not just one project." Sorry for the incomplete info! No other CPU tasks are running from other projects. The other 3 projects on this PC do not allow new tasks (No New Tasks selected). Thanks for the questions! "I would suggest setting your cache to 0 as you are signed up to a dozen projects, almost half of them active." No other active CPU tasks. Four total projects on this PC. CPU benchmarks result: 3/1/2021 11:04:04 AM \| \| Running CPU benchmarks 3/1/2021 11:04:05 AM \| \| Suspending computation - CPU benchmarks in progress 3/1/2021 11:04:36 AM \| \| Benchmark results: 3/1/2021 11:04:36 AM \| \| Number of CPUs: 3 3/1/2021 11:04:36 AM \| \| 4742 floating point MIPS (Whetstone) per CPU 3/1/2021 11:04:36 AM \| \| 13780 integer MIPS (Dhrystone) per CPU 3/1/2021 11:04:37 AM \| \| Resuming computation 3/1/2021 11:12:48 AM \| \| General prefs: from http://einstein.phys.uwm.edu/ (last modified ---) 3/1/2021 11:12:48 AM \| \| Computer location: home 3/1/2021 11:12:48 AM \| \| General prefs: using separate prefs for home 3/1/2021 11:12:48 AM \| \| Reading preferences override file 3/1/2021 11:12:48 AM \| \| Preferences: 3/1/2021 11:12:48 AM \| \| max memory usage when active: 2428.71 MB 3/1/2021 11:12:48 AM \| \| max memory usage when idle: 8095.70 MB 3/1/2021 11:12:48 AM \| \| max disk usage: 8.00 GB 3/1/2021 11:12:48 AM \| \| max CPUs used: 3 3/1/2021 11:12:48 AM \| \| suspend work if non-BOINC CPU load exceeds 35% 3/1/2021 11:12:48 AM \| \| (to change preferences, visit a project web site or select Preferences in the Manager) . . . Good suggestion to run the benchmarks. Yes, I use the Advanced View and I use local prefs. I removed Einstein a few days ago, so I am not sure why it appeared in the benchmark log. I changed the cache for now but do not see why that matters. Cache was previously set for 1 day. I exited and restarted BOINC. I just enabled Rosetta to download new tasks and it downloaded 2 tasks. I will let them finish; all 3 are running. I had an issue with GPUGrid a few weeks ago and had to remove BOINC (and its ProgramData directory) completely. Not sure that is related. Anyhow, I am not sure why it is fixed (maybe reducing the cache?) but it is working OK now. Thanks! |
mrhastyrib Send message Joined: 18 Feb 21 Posts: 90 Credit: 2,541,890 RAC: 0 |
Hello to all. I recently joined Rosetta@home with three computers. Things were fine until a few days ago. The wireless connection was interrupted over the weekend for one of the three, causing some of the tasks to time out before they started processing (I am not sure if any of this history is relevant to the problem). I reconnected the wireless and cleared out the task queue, and it filled up with new tasks. Since then, none of the new tasks will start. They all just sit at "Ready to start." Eventually, the new tasks abort for not starting by the deadline. I have been fiddling with the settings and ran a CPU benchmark, but nothing helps. I even deleted the program and reinstalled it. The other two computers continue to operate normally. All three computers are running Linux Mint. I tried to search for information about this problem; there is little that I could find, other than that it seems to be something others encounter because of conflicts with other projects. I am on Rosetta@home only. Any guidance towards a solution would be appreciated. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1734 Credit: 18,532,940 RAC: 17,945 |
"Any guidance towards a solution would be appreciated." Are you using Web based preferences, or settings in the BOINC Manager? If the Web based settings, in the Manager menu select Options, Computing preferences and make sure it shows that the Web based preferences are being used. If local, make sure a value hasn't been set that stops BOINC from running. With your computing preferences, what "Usage limits" & "When to suspend" values do you have? Ideally: Usage limits: Use at most 100 % of the CPUs; Use at most 100 % of CPU time. When to suspend: Suspend when computer is on battery; Suspend when computer is in use; Suspend GPU computing when computer is in use; 'In use' means mouse/keyboard input in last 3 minutes; Suspend when no mouse/keyboard input in last --- minutes; Suspend when non-BOINC CPU usage is above --- %; Compute only between ---. If it's set to suspend at any time, check that nothing going on on that system meets any of those settings' values, e.g. some system or other process using CPU time and stopping the Tasks from starting. Check that something isn't hogging system RAM and hitting the limits that stop BOINC from processing work. In the BOINC Manager, you can select one of the Tasks that is ready to start, Suspend it, then Resume it a few seconds later & see if that kick-starts things. And even with Rosetta as your only project, with the very short deadlines no cache (or an extremely small one, e.g. 0.1 + 0.01) is the best way to go. Grant Darwin NT |
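
The "When to suspend" checks in the last reply can be read as a short list of conditions, any one of which keeps tasks from running. The Python below is only a rough sketch of that reasoning, not BOINC's code or API: the names `SuspendPrefs`, `HostState` and `suspend_reasons` are made up for illustration, and only three of the listed settings are modelled (battery, computer in use, non-BOINC CPU usage). The 35% threshold in the example is the value shown in the event log posted earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuspendPrefs:
    """Hypothetical model of some 'When to suspend' settings (not BOINC's own structure)."""
    suspend_on_battery: bool = True
    suspend_when_in_use: bool = False
    in_use_window_min: float = 3.0  # 'in use' = mouse/keyboard input in the last 3 minutes
    max_non_boinc_cpu_pct: Optional[float] = None  # None = field left blank ('---')


@dataclass
class HostState:
    """Hypothetical snapshot of the machine at the moment a task could start."""
    on_battery: bool
    minutes_since_input: float
    non_boinc_cpu_pct: float


def suspend_reasons(prefs: SuspendPrefs, host: HostState) -> List[str]:
    """Return every modelled setting that would currently keep tasks from running."""
    reasons = []
    if prefs.suspend_on_battery and host.on_battery:
        reasons.append("computer is on battery")
    if prefs.suspend_when_in_use and host.minutes_since_input < prefs.in_use_window_min:
        reasons.append("computer is in use")
    if (prefs.max_non_boinc_cpu_pct is not None
            and host.non_boinc_cpu_pct > prefs.max_non_boinc_cpu_pct):
        reasons.append("non-BOINC CPU usage is above the limit")
    return reasons


if __name__ == "__main__":
    prefs = SuspendPrefs(max_non_boinc_cpu_pct=35.0)
    busy = HostState(on_battery=False, minutes_since_input=10.0, non_boinc_cpu_pct=40.0)
    print(suspend_reasons(prefs, busy))
```

An empty list means none of the modelled settings is the cause, which points towards the other suggestions in the thread (RAM limits, cache size, or suspending and resuming a task).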
©2024 University of Washington
https://www.bakerlab.org