Report Problems with Rosetta Version 5.25

Message boards : Number crunching : Report Problems with Rosetta Version 5.25

NJMHoffmann

Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 22533 - Posted: 16 Aug 2006, 12:43:20 UTC - in response to Message 22529.  

[quote]I'm having the same problems with Rosetta@Home "hanging" (it shows "running" but the CPU is at 0%).[/quote]

You don't happen to be running the BOINC alpha 5.5.10, do you?

Norbert

Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 22550 - Posted: 16 Aug 2006, 14:22:11 UTC

No, just the regular Rosetta@Home.

meshmar
Joined: 1 Apr 06
Posts: 26
Credit: 176,432
RAC: 0
Message 22556 - Posted: 16 Aug 2006, 15:29:02 UTC - in response to Message 22529.  

[quote]Hi,

I'm having the same problems with Rosetta@Home "hanging" (it shows "running" but the CPU is at 0%). It usually happens when the unit is between 23% and 26% complete. The same thing happens with World Community Grid as well, but I know that's a different project. All other projects work fine. I'm on a P4C 2.6 GHz with 512 MB RAM running Xubuntu. Nothing is overclocked. These are the workunits I know are "stuck":

FRA_t370_CASPR_hom001_6_t370_4_2a2jA_IGNORE_THE_REST_223_1078_61_0
FRA_t322_CASPR_hom001_6_t322_3_1u1zA_IGNORE_THE_REST_17_1079_65_1

There are many more that I've had to abort over the weeks, but my log only goes back so far.[/quote]


FRA_t322_CASPR_hom001_6_t322_3_1u1zA_IGNORE_THE_REST_341_1079_74_0 did the same on one of my systems: 20.153% progress, status "running", CPU idle. Shutting down BOINC and then restarting it caused Rosetta to start running a second WU and to show FRA_t322_CASPR_hom001_6_t322_3_1u1zA_IGNORE_THE_REST_341_1079_74_0 as preempted ...

Running BOINC 5.4.11

Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 22563 - Posted: 16 Aug 2006, 16:21:49 UTC

So... what does that mean? Sorry if I'm being stupid, but is there anything that can be done? I've tried everything I could, including resetting the project. Nothing works. If I have to keep aborting workunits like this, I'm afraid I'm not much use to the project and I'll probably have to quit. :-(

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 22564 - Posted: 16 Aug 2006, 17:31:45 UTC

MonsterTruck,
I see you are running Linux and BOINC 5.4.9. How do you have your general preferences set for running while the PC is in use, and for leaving applications in memory while preempted?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 22574 - Posted: 16 Aug 2006, 19:30:13 UTC
Last modified: 16 Aug 2006, 19:31:35 UTC

Feet1st: I have BOINC set to run 24/7 (which it does; my PC is up 24/7), and the unit is set to be removed from memory when it switches to another project. The interval is 1 hour, then it moves to the next project, and so on.

NJMHoffmann
Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 22576 - Posted: 16 Aug 2006, 19:52:45 UTC - in response to Message 22574.  
Last modified: 16 Aug 2006, 19:53:30 UTC

[quote]Feet1st: I have BOINC set to run 24/7 (which it does, my PC is up 24/7), the unit is set to be removed from memory when it switches to another project. Interval is 1 hour, then it moves to the next project and so on.[/quote]

Rosetta still :-( has WUs with checkpoints more than 1 hour apart. So you should either:
- leave applications in memory,
- use a bigger interval, or
- use a BOINC version that waits for a checkpoint before switching (e.g. 5.5.13).

Norbert
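
Norbert's point can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not how the BOINC client or Rosetta actually schedules work: the task writes a checkpoint every `checkpoint_every` minutes of computation, gets `switch_every` minutes per turn, and, if it is removed from memory at a switch, restarts from its last checkpoint. The function name, its parameters and the 90-minute checkpoint spacing are hypothetical. It models lost work only; it does not reproduce the 0% CPU hang itself.

```python
def minutes_of_progress(slices: int, checkpoint_every: int, switch_every: int,
                        leave_in_memory: bool,
                        wait_for_checkpoint: bool = False) -> int:
    """Toy model (hypothetical): minutes of work retained after `slices` turns.

    All times are whole minutes of CPU work on the Rosetta task.
    """
    progress = 0  # minutes of work computed so far
    for _ in range(slices):
        if wait_for_checkpoint:
            # Assumed client behaviour: run at least one full turn, then keep
            # going until the next checkpoint boundary before switching.
            progress = ((progress + switch_every + checkpoint_every - 1)
                        // checkpoint_every) * checkpoint_every
        else:
            progress += switch_every
        if not leave_in_memory:
            # Removed from memory: everything since the last checkpoint is lost.
            progress = (progress // checkpoint_every) * checkpoint_every
    return progress


if __name__ == "__main__":
    cp = 90  # hypothetical checkpoint spacing, longer than a 60-minute turn
    print(minutes_of_progress(10, cp, 60, leave_in_memory=False))   # 0
    print(minutes_of_progress(10, cp, 60, leave_in_memory=True))    # 600
    print(minutes_of_progress(10, cp, 120, leave_in_memory=False))  # 900
    print(minutes_of_progress(10, cp, 60, leave_in_memory=False,
                              wait_for_checkpoint=True))            # 900
```

In this model, a 60-minute turn with checkpoints 90 minutes apart never keeps any work once the app is removed from memory, while each of the three remedies in the list lets work accumulate.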

Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 22673 - Posted: 17 Aug 2006, 13:17:41 UTC

Oh, I see...thanks! :-D Hopefully that'll fix it.

Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 728
Message 22807 - Posted: 17 Aug 2006, 21:31:00 UTC

I have been reporting this problem for over a week, but nothing has come of it yet; they are still mucking around with the crediting system, I suppose.

I have both Windows and Linux machines. The Windows ones are not having the problem (only 2 WUs have stuck), but the Linux machines virtually can't process a WU. Both are Opteron machines. I changed the preferences from 90-minute swap times to 60 minutes, but this has made no difference. WUs stop doing anything and the CPU goes idle, but the BOINC Manager still says they are running. It will not switch to another project (I have more than just Rosetta on each computer) despite the preference setting; it stays locked to the WU and does nothing.

Suspending/resuming does nothing, and restarting BOINC does nothing. A machine reboot is the only way to restart the unit, which usually then errors out. The current WUs on my AMD Opteron 275 machine have all failed with computation errors or been aborted by me (about 10, I think, over the last 2 days); none have been successful.

I am not using the latest 5.5.9 or whatever the current version is; I am using the previous version or versions. All my other projects are having no problems. I have preferences set to remove apps from memory because of the other projects I am running across all my machines.

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 22825 - Posted: 17 Aug 2006, 22:13:30 UTC

Sorry, I forgot to report this WU:

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28166866

Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=32544227

It crashed when I opened the graphics, as I remember.


[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]


BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 22834 - Posted: 17 Aug 2006, 22:41:35 UTC

Conan: Does the problem on your Linux machines go away if you switch to "leave app in memory"? If so, you could change the mix of projects on those two machines so that Rosetta runs on only one machine, with "leave app in memory" set, and limit the number of other projects running on the second machine.

i.e. instead of two machines each giving Rosetta 25%, have one machine giving Rosetta 50%.

And keep mentioning the "app in memory" problem every so often, until it gets fixed.



TCU Computer Science
Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 22898 - Posted: 18 Aug 2006, 3:35:20 UTC - in response to Message 22834.  
Last modified: 18 Aug 2006, 3:44:41 UTC

[quote]Does the problem on your Linux machines go away if you switch to "leave app in memory"?[/quote]

I leave the app in memory and still have the problem, but it doesn't occur very often. Boinc Manager says Rosetta is running, but the CPU Time is not increasing and the CPU is idle. Usually when the problem occurs and I stop the Boinc process, the Rosetta process remains in the process list. I have to manually kill it or reboot the machine.

I have seen the problem on Mac OS X and Linux (CentOS 4.3) but never under Windows. It has occurred on machines running only Rosetta and on machines running Rosetta and Einstein, but it only affects the Rosetta app.


Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 728
Message 22902 - Posted: 18 Aug 2006, 5:04:54 UTC - in response to Message 22834.  

[quote]Conan: Does the problem on your Linux machines go away if you switch to "leave app in memory"? If so.. you can always change the mix of the projects on those two machines so that Rosetta is on only one machine, with the "leave app in memory" and limit the number of other projects running on the second machine.

i.e. instead of two machines giving Rosetta 25% - have one machine giving Rosetta 50%.

And keep mentioning the "app in memory" problem every so often, until it gets fixed.[/quote]


Thanks, BennyRop, for your reply. It's impractical to change the project mix at this stage. Leaving the app in memory caused problems on one of my Windows machines that also runs Rosetta, and the preferences are not computer-specific, so a change applies to all machines. With about 5 projects on 3 of my machines, leaving apps in memory (I was leaving all apps in memory at one stage) caused problems. I could try it again, though. The problem does not affect my Windows machine, only the Linux machines. Rosetta is the only app affected; Einstein, Cp, QMC and Predictor are all OK. Ralph is not affected as much as Rosetta.

MikeMarsUK
Joined: 15 Jan 06
Posts: 121
Credit: 2,637,872
RAC: 0
Message 22947 - Posted: 18 Aug 2006, 8:43:27 UTC

I believe there is an <something>_override.xml file in the boinc directory which can be edited to change the settings on one specific machine (i.e., 10 machines would use the web settings, and one would use the settings from the override file).

But I've never looked into it myself (no need), so can't do more than point you in the general direction.


Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 22950 - Posted: 18 Aug 2006, 8:48:55 UTC - in response to Message 22947.  

[quote]I believe there is an <something>_override.xml file in the boinc directory which can be edited to change the settings on one specific machine (i.e., 10 machines would use the web settings, and one would use the settings from the override file).

But I've never looked into it myself (no need), so can't do more than point you in the general direction.[/quote]

Details of the global_prefs_override.xml file can be found here: http://boinc.berkeley.edu/prefs_override.php

It needs BOINC 5.4 or later.
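
For anyone who wants to try this, here is a minimal Python sketch that generates an override file for one machine. It assumes the `global_preferences`, `leave_apps_in_memory` and `cpu_scheduling_period_minutes` element names used by BOINC's global preferences; check the page linked above for what your 5.4.x client actually supports. The `build_override` helper is hypothetical. We also assume the file goes in the BOINC data directory, that only the preferences it lists override the web settings, and that the client reads it when it starts.

```python
import xml.etree.ElementTree as ET


def build_override(leave_in_memory: bool, switch_minutes: int) -> str:
    """Return XML for a hypothetical per-machine global_prefs_override.xml.

    Element names are assumed from BOINC's global preferences; verify them
    against the documentation for your client version.
    """
    root = ET.Element("global_preferences")
    # 1 = keep preempted tasks in memory, 0 = remove them
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    # Minutes to run one project before switching to the next
    ET.SubElement(root, "cpu_scheduling_period_minutes").text = str(switch_minutes)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Redirect this output to global_prefs_override.xml in the BOINC data directory.
    print(build_override(leave_in_memory=True, switch_minutes=120))
```

Because only this one machine gets the file, the other machines keep using the web settings, which is the split Conan needs.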

BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 23548 - Posted: 19 Aug 2006, 20:05:18 UTC

I'm the second one to error out on these two WUs:

1c9oA_BOINC_BACKBONE_HN_PENALTY_ABRELAX_SAVE_ALL_OUT__1175_75
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=29000987
We both get errors at line 401 ("Yanlış işlev." is the Turkish-language Windows message for "Incorrect function."):

<core_client_version>5.4.9</core_client_version>
<message>
Yanlış işlev. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1690956
# cpu_run_time_pref: 10800
ERROR:: Exit at: .dock_structure.cc line:401

FRA_t367_CASPR_hom001_6_t367_4_1wolA_IGNORE_THE_REST_568_1076_12
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=28126307
and these two both error out at line 1860 of a different module:

<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 2331689
# cpu_run_time_pref: 10800
ERROR:: Exit at: .pack.cc line:1860


BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 23579 - Posted: 19 Aug 2006, 21:11:41 UTC

Here's another:
1tit__BOINC_BACKBONE_HN_PENALTY_ABRELAX_SAVE_ALL_OUT__1175_470
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=29074487

<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 1660561
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 7200
ERROR:: Exit at: .dock_structure.cc line:401



Tino Ruiz
Joined: 12 Oct 05
Posts: 13
Credit: 397,392
RAC: 0
Message 23608 - Posted: 20 Aug 2006, 2:39:52 UTC

Great news guys: it appears that the "CPU hanging issue" has disappeared for me. I've set the interval to 120 minutes (2 hours), and that seems to have worked so far. :-)

BennyRop
Posts: 555
Credit: 140,800
RAC: 0
Message 23609 - Posted: 20 Aug 2006, 2:52:38 UTC

Congratulations, MonsterTruck.. :)

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 23627 - Posted: 20 Aug 2006, 6:32:58 UTC




©2024 University of Washington
https://www.bakerlab.org