Problems and Technical Issues with Rosetta@home

BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 79747 - Posted: 9 Mar 2016, 19:56:37 UTC - in response to Message 79746.  

For me, what it means is that if I am going to be away from systems for say a few days or longer, I need to have another project ready to do work. I had been pushing work toward Rosetta over the past few months, getting a RAC of 24K or so, but if I am not going to be able to monitor things, that is going to drop back.


1) For whatever reason, reporting gets stalled periodically and requires a manual 'push' -- which then reports 5, 10 or more work units and downloads work units -- this does not happen with any other project.

I have seen that too. I attributed it to the fact that I was coming off a CPDN work unit that took 8 or 9 days to complete, and the BOINC scheduler had not picked up on that yet. But another thought is that the BOINC server version on Rosetta is rather old (it is said), and may not work so well with the latest BOINC clients; I am using 7.6.22 or 7.6.29, depending on the machine.

I have not seen the errors yet, but just started again a week ago, with 40 successes thus far (24 hour runs).


ID: 79747 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79748 - Posted: 9 Mar 2016, 20:30:56 UTC

@BarryAZ, it would be helpful if you could give some details on what getting stalled means. Are you saying completed work does not get reported? That no new work is requested? or both? What do the messages in the event log say?

Are any of the tasks running "high priority" at the time?
Rosetta Moderator: Mod.Sense
ID: 79748 · Rating: 0
Meddling-Monk

Joined: 30 Mar 13
Posts: 1
Credit: 2,004,255
RAC: 0
Message 79749 - Posted: 10 Mar 2016, 2:02:04 UTC - in response to Message 79722.  
Last modified: 10 Mar 2016, 2:03:41 UTC

Since today (March 8th), around noon UTC, I am experiencing numerous "compute errors". Almost 80% of the tasks are terminated after various time lengths, showing this error.
Has anyone else had the same experience?


I have encountered this issue lately.

:: edit :: The following statement is now false: just after typing this response, my first computation error occurred with an older BOINC client.

I have uninstalled the latest Boinc Client and rolled back to a previous one. No issues now.
ID: 79749 · Rating: 0
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 79751 - Posted: 10 Mar 2016, 5:00:33 UTC - in response to Message 79748.  

Both -- I will need to see if I can get a piece of the event log.

And no, there are no 'high priority' tasks running.

One other piece of information -- when I encounter this, on the status panel for projects, Rosetta reports a 'communication deferred' with a LONG wait cycle -- 10 hours or more. Note, this happens on several different workstations and it is a relatively recent occurrence. If I am checking the workstations regularly it isn't that big a deal. But there are times workstations run for days on 'autopilot' and in that case, the Rosetta jobs don't report nor do I get new work.

I noticed that yesterday when I checked -- there were no CPU tasks left to run since I had other projects set with 'no new work'. I've changed that for now.


@BarryAZ, it would be helpful if you could give some details on what getting stalled means. Are you saying completed work does not get reported? That no new work is requested? or both? What do the messages in the event log say?

Are any of the tasks running "high priority" at the time?


ID: 79751 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79752 - Posted: 10 Mar 2016, 5:41:36 UTC

So, is there anything more here than the server being very low on WUs available to issue earlier in the week? A host hits the scheduler, receives no new work when requested, goes into a lengthy backoff, and if the cache of work on the machine does not exceed the backoff, then the machine goes much of a day without work?
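
The cache-versus-backoff reasoning above can be put as simple arithmetic. This is only a rough sketch: the function name and the sample numbers are made up for illustration, and it assumes no new work arrives until the backoff expires.

```python
def idle_hours(cache_hours: float, backoff_hours: float) -> float:
    """Hours a host sits without work if nothing new arrives before the backoff ends."""
    # If the cached work outlasts the backoff, the host never runs dry.
    return max(0.0, backoff_hours - cache_hours)


# Hypothetical cache sizes, all measured against a 24-hour backoff.
for cache in (6.0, 24.0, 72.0):
    print(f"cache={cache:.0f} h, backoff=24 h -> idle={idle_hours(cache, 24.0):.0f} h")
```

With a 6-hour cache and a 24-hour backoff the host would sit idle for 18 hours, which matches the "much of a day without work" scenario.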

I'm not sure when such lengthy backoffs came into BOINC Manager, but it would seem that keeping a 2 or 3 day cache of work would help weather periods when the server is backlogged or out of work.
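
For anyone who prefers to set the cache size in a file rather than through the Manager, here is a minimal sketch that builds a local preferences override. It assumes the BOINC client reads `work_buf_min_days` and `work_buf_additional_days` from a `global_prefs_override.xml` file in its data directory; the helper name and the 2 + 1 day split are illustrative choices, not a project recommendation.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days: float, extra_days: float) -> str:
    """Return XML text for a preferences override that sets the work cache size."""
    root = ET.Element("global_preferences")
    # "Store at least N days of work"
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    # "Store up to an additional N days of work"
    ET.SubElement(root, "work_buf_additional_days").text = f"{extra_days:g}"
    return ET.tostring(root, encoding="unicode")


# Assumed example: roughly a 3 day cache in total.
print(build_cache_override(2, 1))
```

The printed text would need to be saved as `global_prefs_override.xml` and the client told to re-read its local preferences before it takes effect.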
Rosetta Moderator: Mod.Sense
ID: 79752 · Rating: 0
Jim1348

Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 79753 - Posted: 11 Mar 2016, 2:10:29 UTC - in response to Message 79752.  

So, is there anything more here than the server being very low on WUs available to issue earlier in the week? A host hits the scheduler, receives no new work when requested, goes into a lengthy backoff, and if the cache of work on the machine does not exceed the backoff, then the machine goes much of a day without work?

Very good thought, but the problem was closer to home for me. I had used an app_config.xml to limit the number of ATLAS tasks to 4 at a time. But unfortunately the BOINC scheduler does not know about app_configs, and so it downloaded enough ATLAS tasks for six cores. That left no room for more Rosetta work units, and one core was left empty. I have changed the app_config, will work off the excess ATLAS tasks, and will be back to Rosetta shortly. I have seen this before, but have to learn it again each time, it seems.
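
For reference, a limit like the one described above would look roughly like the file this sketch generates. The app name "ATLAS" is an assumption here (the real short name has to be taken from the project's own application list), and the helper function is purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int) -> str:
    """Return app_config.xml text that caps how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed short name of the application
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


print(build_app_config("ATLAS", 4))
```

Note that `max_concurrent` only limits how many tasks run at the same time on the client. As the post says, it does not change how much work gets requested and downloaded.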

ID: 79753 · Rating: 0
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 79827 - Posted: 31 Mar 2016, 21:58:53 UTC

It seems that uploads are stalled -- multiple systems for me.


ID: 79827 · Rating: 0
HWJC

Joined: 2 May 08
Posts: 21
Credit: 8,000,737
RAC: 2,005
Message 79830 - Posted: 1 Apr 2016, 17:51:36 UTC - in response to Message 79827.  

It seems that uploads are stalled -- multiple systems for me.

The new database I saw was 270 MB in size. I expect every single person has been downloading it over the last day or so. It should ease off shortly. Except that the server is down for maintenance at this precise moment.
ID: 79830 · Rating: 0
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 1996
Credit: 9,678,094
RAC: 8,128
Message 79837 - Posted: 2 Apr 2016, 21:00:10 UTC

On my 6-core AMD:
Rosetta Mini for Android is not available for your type of computer.

ID: 79837 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2128
Credit: 41,307,917
RAC: 10,728
Message 79838 - Posted: 3 Apr 2016, 22:25:12 UTC - in response to Message 79837.  

On my 6-core AMD:
Rosetta Mini for Android is not available for your type of computer.

Yup, I saw that yesterday too
ID: 79838 · Rating: 0
Exidor

Joined: 6 Jun 08
Posts: 1
Credit: 144,204
RAC: 0
Message 79902 - Posted: 21 Apr 2016, 8:50:14 UTC

I have been trying to get work from Rosetta. Installed new BOINC. Got a few jobs at first but now nothing is downloading.
ID: 79902 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 79903 - Posted: 21 Apr 2016, 13:36:16 UTC

There have been some issues with large numbers of new hosts coming in to the project. But, at present, it looks like there are plenty of tasks at-the-ready on the server. So, I suspect the automatic retries done in the BOINC Manager have already received some new work.
Rosetta Moderator: Mod.Sense
ID: 79903 · Rating: 0
Mark Kramer

Joined: 25 Jun 10
Posts: 5
Credit: 74,534
RAC: 0
Message 79917 - Posted: 24 Apr 2016, 1:08:05 UTC
Last modified: 24 Apr 2016, 1:16:44 UTC

Mine seems to have been locking up since Friday 4/22/16. I run both SETI and Rosetta, but each time it's locked up, it's been stuck on Rosetta. It's running on an XP machine so I don't know if that's the issue. System logs don't show anything other than that events stop happening. I've already uninstalled, completely wiped the BOINC folder, and then reinstalled. I have noticed in Task Manager that Minirosetta has recently been starting itself while I've been using the computer, despite my preferences.

Since I haven't seen any other entries on the BOINC/SETI/Rosetta board, I presume that it is this computer or XP. As it stands, I'm going to remove BOINC for a while, since the issues with it turning itself on while the computer is in use, as well as it locking up, have become too much to ignore. However, if this is happening to anyone else and they worked around it, feel free to post. I also wasn't sure which thread to post this in, so I posted it in the top 2 threads concerning technical problems.
ID: 79917 · Rating: 0
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 79965 - Posted: 29 Apr 2016, 2:25:36 UTC

You've broken something, I'm getting this message now on all my Ubuntu rigs.

Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Sending scheduler request: Requested by user.
Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Reporting 6 completed tasks, requesting new tasks for CPU
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Scheduler request completed: got 0 new tasks
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | No work sent
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Rosetta Mini for Android is not available for your type of computer.
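
If this needs checking across several machines, a small script can scan saved event log text for the "No work sent" pattern. This is a sketch only: it assumes the `timestamp | project | message` layout shown in the lines above, and the helper names are made up for illustration.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one event log line on the ' | ' separators seen in the excerpt above."""
    timestamp, project, message = (part.strip() for part in line.split(" | ", 2))
    return LogEntry(timestamp, project, message)


def got_no_work(lines) -> bool:
    """True if any entry reports that the scheduler sent no work."""
    return any(parse_line(l).message == "No work sent" for l in lines if l.strip())


# Lines copied from the log excerpt above.
SAMPLE = [
    "Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Reporting 6 completed tasks, requesting new tasks for CPU",
    "Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Scheduler request completed: got 0 new tasks",
    "Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | No work sent",
]

print(got_no_work(SAMPLE))
```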


ID: 79965 · Rating: 0
googloo
Joined: 15 Sep 06
Posts: 133
Credit: 22,748,351
RAC: 4,297
Message 79970 - Posted: 29 Apr 2016, 14:15:15 UTC - in response to Message 79965.  
Last modified: 29 Apr 2016, 14:18:46 UTC

You've broken something, I'm getting this message now on all my Ubuntu rigs.

Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Sending scheduler request: Requested by user.
Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Reporting 6 completed tasks, requesting new tasks for CPU
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Scheduler request completed: got 0 new tasks
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | No work sent
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Rosetta Mini for Android is not available for your type of computer.



Me too, on my Windows 7 machine.
ID: 79970 · Rating: 0
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 79971 - Posted: 29 Apr 2016, 15:35:22 UTC

All of my pred5csxxxxx w/u's terminate after 2-3 hrs. but show no error.
Is this by design and OK?

I don't want to dump good w/u's for no reason.

good luck with the server work.

/mike
ID: 79971 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 79973 - Posted: 29 Apr 2016, 21:17:56 UTC

If there are no errors, it should be fine.
ID: 79973 · Rating: 0
Dr. Merkwürdigliebe

Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 79974 - Posted: 29 Apr 2016, 21:52:33 UTC - in response to Message 79973.  
Last modified: 29 Apr 2016, 21:54:51 UTC

If there are no errors, it should be fine.


There are errors...

Example



ERROR: Error in simple_cycpep_predict app! The imported native pose has a different number of residues than the sequence provided.
ERROR:: Exit from: src/protocols/cyclic_peptide_predict/SimpleCycpepPredictApplication.cc line: 751
[0x4485e82]
[0x480914]
[0x48925c]
[0x4025d4]
[0x46df0cb]
[0x400429]
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
ID: 79974 · Rating: 0
BarryAZ

Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 79977 - Posted: 30 Apr 2016, 4:51:31 UTC

I have had a persistent problem reporting work. What happens on my systems is that the reporting gets a 'server offline' message from the project. OK, that happens when the server is busy -- after all, the work units being uploaded and downloaded can be quite large.

What is different with Rosetta though is that when I get one of those reports, my workstation immediately shifts to 'deferred reporting for 24 hours'.

On all the other projects when a report runs into a 'server busy or offline' response, there is a progressive retry cycle starting from a try back in 10 minutes or so up to around 4 hours.

With Rosetta, and only Rosetta, I get the 24 hours defer.
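
To make the contrast concrete, here is a sketch of the kind of progressive retry cycle described above, starting at 10 minutes and capped at around 4 hours. The doubling rule is an assumption chosen for illustration; it is not taken from the BOINC client source, which may randomize or grow its delays differently.

```python
def retry_schedule(first_minutes: int = 10, cap_minutes: int = 240, attempts: int = 8):
    """Return the wait (in minutes) before each retry, doubling up to a cap."""
    delay = first_minutes
    schedule = []
    for _ in range(attempts):
        schedule.append(delay)
        # Assumed growth rule: double each time, never exceeding the cap.
        delay = min(delay * 2, cap_minutes)
    return schedule


print(retry_schedule())
```

Under these assumptions the waits are 10, 20, 40, 80, 160 and then 240 minutes for every later attempt, so a host left alone would retry several times within the first day instead of waiting out a single 24 hour deferral.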

When I go to the workstations to manually push the reports -- they happen every time.

What that means for me is that when I know I won't be available to manually push reports at least once during the day, I essentially end up having to shift to another project for those systems. That's what I do during a vacation away from the systems.

Again, this only happens with Rosetta and not with any of the other many projects I run cycles for.


ID: 79977 · Rating: 0
sinspin

Joined: 30 Jan 06
Posts: 29
Credit: 6,574,585
RAC: 0
Message 79979 - Posted: 30 Apr 2016, 7:26:47 UTC - in response to Message 79965.  

You've broken something, I'm getting this message now on all my Ubuntu rigs.

Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Sending scheduler request: Requested by user.
Fri 29 Apr 2016 12:13:59 AEST | rosetta@home | Reporting 6 completed tasks, requesting new tasks for CPU
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Scheduler request completed: got 0 new tasks
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | No work sent
Fri 29 Apr 2016 12:14:04 AEST | rosetta@home | Rosetta Mini for Android is not available for your type of computer.


Same here, on my Win7 Ultimate!!

30.04.2016 11:19:55 | rosetta@home | Sending scheduler request: To fetch work.
30.04.2016 11:19:55 | rosetta@home | Requesting new tasks for CPU and NVIDIA GPU and Intel GPU
30.04.2016 11:19:58 | rosetta@home | Scheduler request completed: got 0 new tasks
30.04.2016 11:19:58 | rosetta@home | No work sent
30.04.2016 11:19:58 | rosetta@home | Rosetta Mini for Android is not available for your type of computer.

ID: 79979 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org