Rosetta Problems?

Message boards : Rosetta@home Science : Rosetta Problems?

F. Prefect
Joined: 7 Nov 05
Posts: 35
Credit: 114,312
RAC: 0
Message 3018 - Posted: 12 Nov 2005, 21:10:57 UTC

About 3 hours ago I noticed that the Rosetta program I have running on 4 machines seemed to just stop processing jobs. The jobs are shown as running in the BOINC manager, but the % completed remains constant and the time to completion is increasing rather than decreasing. Einstein@Home jobs continue to run normally. If it were just one machine I could understand it, but 4? I have made no changes in preferences.

Very spooky

F. Prefect

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 3025 - Posted: 12 Nov 2005, 22:13:54 UTC - in response to Message 3018.  

Linux install by any chance?

I'm seeing the exact same thing on exactly one system, which is running RedHat Fedora Core 4 and BOINC 5.2.6, and it was the 4.78 version of Rosetta that kept "wedging".

If it is Linux, next time you see this, try running ps -ax and see what shows in the STAT column. I was getting 'S' (sleeping), when it should be 'R' for a running process.
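The STAT check described above can be scripted. What follows is only a rough sketch, assuming Linux-style `ps -ax` output with a STAT column; the search string "rosetta" and the helper names are illustrative assumptions, since the thread does not give the actual executable name. Keep in mind that a healthy process can also show 'S' briefly, so a single sample is only a hint.

```python
import subprocess


def current_ps_output():
    """Run `ps -ax` and return its text output (Linux procps layout assumed)."""
    return subprocess.run(
        ["ps", "-ax"], capture_output=True, text=True, check=True
    ).stdout


def find_states(ps_output, name_fragment):
    """Return (pid, stat, command) for each line whose command contains name_fragment."""
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    stat_col = header.index("STAT")  # raises ValueError if ps prints no STAT column
    cmd_col = len(header) - 1        # COMMAND is the last column and may contain spaces
    matches = []
    for line in lines[1:]:
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue  # skip short or malformed lines
        if name_fragment.lower() in fields[cmd_col].lower():
            matches.append((int(fields[0]), fields[stat_col], fields[cmd_col]))
    return matches


def looks_running(stat):
    """True when the state code starts with 'R' (running or runnable)."""
    return stat.startswith("R")
```

Calling `find_states(current_ps_output(), "rosetta")` a few times, some seconds apart, and passing each returned state to `looks_running` gives a quick picture of whether the job ever leaves the sleeping state.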

This thread describes the problem I'm having.

F. Prefect
Joined: 7 Nov 05
Posts: 35
Credit: 114,312
RAC: 0
Message 3026 - Posted: 12 Nov 2005, 22:31:50 UTC - in response to Message 3025.  

No, I was running Redhat Linux (7.1 I believe) on dual boot with Windows 98, but now two of the machines are running Windows 98 and the other 2, XP. It doesn't seem to matter whether I am connected or not, and I can switch over to Einstein@Home and those jobs run normally. I know it was working this morning because I uploaded 4 or 5 results that were completed then. Now the BOINC manager shows the jobs running normally, but the percentage completed remains unchanged and the time to completion slowly increases.

F. Prefect

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 3030 - Posted: 12 Nov 2005, 23:33:52 UTC

Try restarting the BOINC manager and see if that helps, but before you do, save the stdout.txt files from the slot directories where the jobs are running and email them to me if you can: dekim at u.washington.edu. It's odd that it is happening on all 4 at the same time. If restarting doesn't help, cancel the workunits and see if the next WUs run okay.
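For anyone who wants to grab those files in one go before restarting, here is a minimal sketch. It assumes the slot directories sit under a `slots/` folder inside the BOINC data directory, and that you pass that directory in yourself; the function name and the backup file naming are hypothetical, not part of BOINC.

```python
import shutil
from pathlib import Path


def save_slot_stdout(boinc_dir, backup_dir):
    """Copy every slots/<n>/stdout.txt under boinc_dir into backup_dir.

    Returns the list of copied files. The slots/ layout is an assumption
    about the local BOINC installation; adjust the path if yours differs.
    """
    boinc_dir = Path(boinc_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for src in sorted(boinc_dir.glob("slots/*/stdout.txt")):
        # Prefix with the slot name so files from different slots do not collide.
        dest = backup_dir / f"slot_{src.parent.name}_stdout.txt"
        shutil.copy2(src, dest)
        saved.append(dest)
    return saved
```

Run it with the BOINC data directory and any empty folder as the backup location, then attach the copied files to the email.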

Is anyone else out there having similar problems on multiple machines?

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 3044 - Posted: 13 Nov 2005, 2:33:46 UTC - in response to Message 3030.  

As I hinted, I'm having a similar problem, but it's only on my one Linux system. If I catch it at it again, do you want me to send the same set of files to you?

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 3058 - Posted: 13 Nov 2005, 3:53:10 UTC

Yes, send me the stdout.txt file.

©2024 University of Washington
https://www.bakerlab.org