Report Problems with Rosetta Version 5.22

Message boards : Number crunching : Report Problems with Rosetta Version 5.22

[B^S] Dr. Bill Skiba
Joined: 26 Oct 05
Posts: 5
Credit: 238,426
RAC: 0
Message 19241 - Posted: 24 Jun 2006, 21:52:31 UTC

Just aborted this work unit.

https://boinc.bakerlab.org/rosetta/result.php?resultid=25006316

Stuck at 1hr 7min - suspended and resumed several times to no avail. The next Rosetta work unit seems to be running normally.

rosetta 5.22
windows 2K
athlon xp 2500 barton

ID: 19241 · Rating: 0
Clare Jarvis
Joined: 14 Dec 05
Posts: 8
Credit: 874,698
RAC: 0
Message 19338 - Posted: 27 Jun 2006, 1:48:14 UTC

I have been having similar problems. I cannot leave Rosetta alone or it simply hangs. But if I visit and hit "Update" every day, then I get much better production. Is this a problem with Rosetta or with BOINC? It is very frustrating. I wish the statistics page showed the start time and date of each run along with the deadline.


ID: 19338 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 19418 - Posted: 28 Jun 2006, 14:38:25 UTC
Last modified: 28 Jun 2006, 15:09:05 UTC

I have occasionally had the problem of stalled/hanging Rosettas (somewhere in the middle, not at 0%, 1%, or 100% progress) for ages now, on Red Hat EL 4.1. I'm now using BCC 5.4.9, attached to 7 projects; Rosetta's share is ~20%. The computer runs for months between reboots, without graphics.

The symptoms are that the Rosetta app seems to be running, but the CPU time does not increase. Recently I've noticed that BCC cannot even run benchmarks when this happens. IIRC, previously, if BCC was able to switch to another app, it got 0 CPU cycles (because Rosetta was consuming all of them) and did not increment its time. Usually the only way to overcome this problem was to manually restart BCC. This way the Rosettas were able to continue and finish. (Whether correctly? I can now see a few (5) "process exited with code 131 (0x83)" messages since March in the logs.)
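A side note on the 131 (0x83) exit code: below is a minimal Python sketch, assuming the code follows the common Unix shell convention of 128 plus a signal number. Whether BOINC reports exit codes this way is an assumption and is not confirmed anywhere in this thread; decode_exit_code is a hypothetical helper name.

```python
import signal


def decode_exit_code(code):
    """Interpret an exit code under the assumed '128 + signal number' convention.

    Returns (signal_number, signal_name), or None if the code is outside
    the 129..255 range that this convention covers.
    """
    if code <= 128 or code >= 256:
        return None
    signum = code - 128
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not defined on this platform.
        name = "unknown"
    return signum, name


if __name__ == "__main__":
    print(hex(131), decode_exit_code(131))
```

Under that assumption, 131 would correspond to signal 3, which is SIGQUIT on Linux. It would not by itself explain why the process stalled.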

This time, a week ago, I made a few snapshots of the suspended Rosetta 5.22 result t312__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom010__711_1635_0 and reported them in the "Rosetta WU's stall on RedHat Fedora" thread. It is stuck at 28.80% (2:43:29 CPU time), maybe for a day already. I'll try restarting BCC to see whether anything new shows up in the files in its slot/3/ dir. Then I'll abort and report; it's past the deadline anyway...

Yes, it restarted happily, CPU time jumped from 2:43:29 to 1:43:29 and is incrementing, but progress stayed at 28.80% and does not move. Aborting...

<core_client_version>5.4.9</core_client_version>
<message>
aborted by user
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# random seed: 1940641
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (14 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x8621564]
[0x87f229b]
[0x873b844]
[0x873d0af]
[0x85a95e9]
[0x85b190a]
[0x83d6c9f]
[0x86022d3]
[0x84740c8]
[0x88c41e4]
[0x8048111]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x88e5473]
[0x88b6601]
[0x88b8029]
[0x805fdd8]
[0x83d75de]
[0x83d90a0]
[0x83d8f89]
[0x83d72ca]
[0x88cb7ef]
[0x885bff0]
[0x8865f65]
[0x88f771a]

Exiting...
SIGSEGV: segmentation violation
Stack trace (14 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x853664c]
[0x854a184]
[0x830867c]
[0x8308fdf]
[0x86c4a6a]
[0x86c6f15]
[0x83d6f08]
[0x86022d3]
[0x84740c8]
[0x88c41e4]
[0x8048111]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
ERROR:: Exit at: fragments.cc line:459
FILE_LOCK::unlock(): close failed.: Bad file descriptor

</stderr_txt>

Peter
ID: 19418 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19422 - Posted: 28 Jun 2006, 16:21:28 UTC - in response to Message 19338.  

But if I visit and hit "Update" every day then I get much better production. Is this a problem with Rosetta or with Boinc.


BOINC is responsible for contacting the projects that it needs to get work from. Performing an update wouldn't have much to do with a hung work unit. Are you saying that you end up without work? Or are you saying that your existing WUs are not ending properly?

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 19422 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 19423 - Posted: 28 Jun 2006, 16:24:06 UTC

Pepo: I'm not clear how long you observed the running of the WU after restarting it. But the progress % does not change very frequently and this is normal. Here is some relevant information on the subject. Perhaps you are saying you let it run for over an hour with no progress... that would be another matter. But, if not, that portion of what you are describing is probably normal and does not require your intervention to abort.
ID: 19423 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 19425 - Posted: 28 Jun 2006, 16:39:14 UTC - in response to Message 19423.  
Last modified: 28 Jun 2006, 16:40:16 UTC

Pepo: I'm not clear how long you observed the running of the WU after restarting it. But the progress % does not change very frequently and this is normal. Here is some relevant information on the subject. Perhaps you are saying you let it run for over an hour with no progress... that would be another matter. But, if not, that portion of what you are describing is probably normal and does not require your intervention to abort.

Yes, I've read the FAQ. If you look at the "Rosetta WU's stall on RedHat Fedora" thread I mentioned, Rosetta was hung for more than a day; I could look into the logs to tell exactly.

I usually check the machine once every day or two (because of Rosetta :-) and restart BOINC if this happens. And it has been happening for a long time already, I'm pretty sure for a few months.
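The manual check described here (look whether the CPU time is still increasing) could be automated roughly as follows. This is only a sketch for Linux, assuming the process ID of the Rosetta app is already known; parse_cpu_ticks, cpu_seconds, and is_stalled are hypothetical helper names, and the interval and threshold values are arbitrary. It only catches the "CPU time does not increase" symptom, not a work unit that burns CPU while its progress stays fixed.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_seconds(pid):
    """CPU time consumed so far by the process, in seconds (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_cpu_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")


def is_stalled(pid, interval=60.0, min_delta=0.5):
    """True if the process gained less than min_delta CPU seconds over interval."""
    before = cpu_seconds(pid)
    time.sleep(interval)
    return cpu_seconds(pid) - before < min_delta
```

A cron job or small loop could call is_stalled(pid) and log the result. Restarting BOINC automatically on a positive result is a separate decision and is left out here on purpose.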

Peter
ID: 19425 · Rating: 0
Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 19432 - Posted: 28 Jun 2006, 19:24:05 UTC - in response to Message 19425.  

Pepo: I'm not clear how long you observed the running of the WU after restarting it.

[...]the Rosetta was hung for at least more than a day, I could look into the logs to tell exactly.

I'm sorry, Feet1st, I did not read carefully enough. I aborted the result 20 minutes after restarting it.

Peter
ID: 19432 · Rating: 0
©2024 University of Washington
https://www.bakerlab.org