Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
For me, what it means is that if I am going to be away from my systems for, say, a few days or longer, I need to have another project ready to do work. I had been pushing work toward Rosetta over the past few months, reaching a RAC of 24K or so, but if I am not going to be able to monitor things, that is going to drop back. 1) For whatever reason, reporting gets stalled periodically and requires a manual 'push' -- which then reports 5, 10 or more work units and downloads work units -- this does not happen with any other project. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
@BarryAZ, it would be helpful if you could give some details on what getting stalled means. Are you saying completed work does not get reported? That no new work is requested? Or both? What do the messages in the event log say? Are any of the tasks running "high priority" at the time? Rosetta Moderator: Mod.Sense |
Meddling-Monk Send message Joined: 30 Mar 13 Posts: 1 Credit: 2,004,255 RAC: 0 |
Since today (March 8th), around noon UTC, I have been experiencing numerous "compute errors". Almost 80% of the tasks terminate with this error after running for varying lengths of time. I have encountered this issue lately. :: edit :: The following statement is now false: just after typing this response, my first computation error occurred with an older BOINC client. I have uninstalled the latest BOINC client and rolled back to a previous one. No issues now. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
Both -- I will need to see if I can get a piece of the event log. And no, there are no 'high priority' tasks running. One other piece of information -- when I encounter this, on the status panel for projects, Rosetta reports a 'communication deferred' with a LONG wait cycle -- 10 hours or more. Note, this happens on several different workstations and it is a relatively recent occurrence. If I am checking the workstations regularly it isn't that big a deal. But there are times workstations run for days on 'autopilot' and in that case, the Rosetta jobs don't report nor do I get new work. I noticed that yesterday when I checked -- there were no CPU tasks left to run since I had other projects set with 'no new work'. I've changed that for now. @BarryAZ, it would be helpful if you could give some details on what getting stalled means. Are you saying completed work does not get reported? That no new work is requested? or both? What do the messages in the event log say? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So, is there anything more here than the server being very low on WUs available to issue earlier in the week? A host hits the scheduler, receives no new work when requested, goes into a lengthy backoff, and if the cache of work on the machine does not outlast the backoff, then the machine goes much of a day without work? I'm not sure when such lengthy backoffs came into BOINC Manager, but it would seem that keeping a 2 or 3 day cache of work would help weather periods when the server is backlogged or out of work. Rosetta Moderator: Mod.Sense |
Jim1348 Send message Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
So, is there anything more here than the server was very low on WUs available to issue earlier in the week? A host hits the scheduler, receives no new work when requested, goes in to lengthy backoff, and if the cache of work on the machine does not exceed the backoff, then the machine goes much of a day without work? Very good thought, but the problem was closer to home for me. I had used an app_config.xml to limit the number of ATLAS tasks to 4 at a time. But unfortunately the BOINC scheduler does not know about app_configs, and so it downloaded enough ATLAS tasks for six cores. That left no room for more Rosetta work units, and one core was left empty. I have changed the app_config, will work off the excess ATLAS tasks, and will be back to Rosetta shortly. I have seen this before, but it seems I have to learn it again each time. |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
It seems that uploads are stalled -- multiple systems for me. |
HW&JC Send message Joined: 2 May 08 Posts: 21 Credit: 7,897,606 RAC: 754 |
It seems that uploads are stalled -- multiple systems for me. The new database I saw was 270 MB in size. I expect every single person has been downloading it over the last day or so. It should ease off shortly. Except the server is down for maintenance at this precise moment. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1980 Credit: 9,197,551 RAC: 2,407 |
On my 6 core Amd: Rosetta Mini for Android is not available for your type of computer. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2082 Credit: 40,621,050 RAC: 4,944 |
On my 6 core Amd: Yup, I saw that yesterday too |
Exidor Send message Joined: 6 Jun 08 Posts: 1 Credit: 144,204 RAC: 0 |
I have been trying to get work from Rosetta. Installed new BOINC. Got a few jobs at first but now nothing is downloading. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There have been some issues with large numbers of new hosts coming in to the project. But, at present, it looks like there are plenty of tasks at-the-ready on the server. So, I suspect the automatic retries done in the BOINC Manager have already received some new work. Rosetta Moderator: Mod.Sense |
Mark Kramer Send message Joined: 25 Jun 10 Posts: 5 Credit: 74,534 RAC: 0 |
Mine seems to have been locking up since Friday 4/22/16. I run both SETI and Rosetta, but each time it has locked up, it has been stuck on Rosetta. It's running on an XP machine, so I don't know if that's the issue. System logs don't show anything other than events ceasing to happen. I've already uninstalled, completely wiped the BOINC folder, then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself while I've been using the computer, despite my preferences. Since I haven't seen any other entries on the BOINC/SETI/Rosetta boards, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while, since the issues with it turning itself on while the computer is in use, as well as it locking up, have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in, so I posted it in the top 2 threads concerning technical problems. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
You've broken something, I'm getting this message now on all my Ubuntu rigs. Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Sending scheduler request: Requested by user. Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Reporting 6 completed tasks, requesting new tasks for CPU Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Scheduler request completed: got 0 new tasks Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | No work sent Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Rosetta Mini for Android is not available for your type of computer. |
googloo Send message Joined: 15 Sep 06 Posts: 133 Credit: 22,505,528 RAC: 1,794 |
You've broken something, I'm getting this message now on all my Ubuntu rigs. Me too, on my Windows 7 machine. |
amgthis Send message Joined: 25 Mar 06 Posts: 81 Credit: 203,879,282 RAC: 0 |
All of my pred5csxxxxx w/u's terminate after 2-3 hrs. but show no error. Is this by design and OK? I don't want to dump good w/u's for no reason. good luck with the server work. /mike |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
If there are no errors, it should be fine. |
Dr. Merkwürdigliebe Send message Joined: 5 Dec 10 Posts: 81 Credit: 2,657,273 RAC: 0 |
If there are no errors, it should be fine. There are errors... Example |
BarryAZ Send message Joined: 27 Dec 05 Posts: 153 Credit: 30,843,285 RAC: 0 |
I have had a persistent problem reporting work. What happens on my systems is that the reporting gets a 'server offline' message from the project. OK, that happens when the server is busy -- after all, the work units uploading and downloading can be quite large. What is different with Rosetta, though, is that when I get one of those messages, my workstation immediately shifts to 'deferred reporting for 24 hours'. On all the other projects, when a report runs into a 'server busy or offline' response, there is a progressive retry cycle starting from a retry in 10 minutes or so and going up to around 4 hours. With Rosetta, and only Rosetta, I get the 24-hour deferral. When I go to the workstations to manually push the reports, they go through every time. What that means for me is that when I know I won't be available to manually push reports at least once during the day, I essentially end up having to shift to another project for those systems. That's what I do during a vacation away from the systems. Again, this only happens with Rosetta and not with any of the other many projects I run cycles for. |
sinspin Send message Joined: 30 Jan 06 Posts: 29 Credit: 6,574,585 RAC: 0 |
You've broken something, I'm getting this message now on all my Ubuntu rigs. Same here, on my Win7 Ultimate!! 30.04.2016 11:19:55 | rosetta@home | Sending scheduler request: To fetch work. 30.04.2016 11:19:55 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU 30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks 30.04.2016 11:19:58 | rosetta@home | No work sent 30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer. |
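
To put rough numbers on the backoff discussion in this thread, here is a minimal Python sketch. It is an illustration only and is not BOINC's actual retry algorithm: the doubling rule, the function names `retry_delays` and `idle_time`, and the default values are assumptions chosen to match the figures posters mention (retries starting around 10 minutes and growing to about 4 hours, versus a flat 24-hour deferral).

```python
from datetime import timedelta


def retry_delays(first=timedelta(minutes=10), cap=timedelta(hours=4), attempts=6):
    """Return a doubling retry schedule capped at `cap`.

    Hypothetical model of a "progressive retry cycle"; the real client
    may use different growth and randomization.
    """
    delays = []
    delay = first
    for _ in range(attempts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= 2                      # double the wait after each failure
    return delays


def idle_time(cached_work, backoff):
    """Time a host sits idle if its cached work runs out before the backoff ends."""
    return max(backoff - cached_work, timedelta(0))


if __name__ == "__main__":
    print([int(d.total_seconds() // 60) for d in retry_delays()])
    # A host with 10 hours of cached work and a 24-hour deferral
    print(idle_time(timedelta(hours=10), timedelta(hours=24)))
    # A host with a 2-day cache rides out the same deferral
    print(idle_time(timedelta(days=2), timedelta(hours=24)))
```

Under these assumptions, a cache that is longer than the longest deferral leaves no idle gap, which is the reasoning behind the suggestion to keep a 2 or 3 day cache of work.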
©2024 University of Washington
https://www.bakerlab.org