Problems with (X)ubuntu 7.10?

Message boards : Number crunching : Problems with (X)ubuntu 7.10?


Harrison Neal

Joined: 30 Jul 07
Posts: 8
Credit: 133,501
RAC: 0
Message 48008 - Posted: 25 Oct 2007, 2:36:48 UTC

Hello All-

It seems like some older computers I have that were running Xubuntu 7.04 and Rosetta@home tasks peacefully are choking on Xubuntu 7.10 and Rosetta@home. Granted, this is OLD hardware, which is why I'm running sanity checks on this hardware as I type (badblocks, MemTest86, etc.), but the time stamps on the tasks that have failed or otherwise acted quirky all seem to be after the installation of Xubuntu 7.10 (otherwise I'd finish the sanity checks). It also seems suspicious that only the Rosetta@home project is giving these computers grief - other projects, such as World Community Grid, aren't having the same problems as Rosetta@home (or at least I haven't seen them yet).

It should be noted that all the computers have 4 partitions - the first for swap, the second for the root directory, the third for the home directory, and the fourth for BOINC files (/var/lib/boinc-client). The reason being that I prefer to freshly install new versions of OSes as opposed to doing an in-place upgrade, and the above partition layout makes that easier (for me, anyway). I don't see how this would definitely be a problem (perhaps files from the older BOINC conflicting...?), but, heck, I've seen stranger things before.

I've also reset the Rosetta@home project on all these computers as well. I don't believe a task has finished/failed after the reset, but, assuming the hardware checks come back clean, I'll let them run and see what happens.

The computers that seem to be griping over Rosetta@home on Xubuntu 7.10 (which has BOINC 5.10.8; Xubuntu 7.04 had BOINC 5.4.11) are as follows:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=625082

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=625047

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=617847

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=622444

I also just got another computer and put Xubuntu 7.10 on it with no previous installation on the hard drive. It seems to be working perfectly fine thus far. It can be found here: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=642904

If there is a known issue or something I've done that goes against a "proper" method, please mention it.

Thanks for your help in advance,
-Harrison N.
Harrison Neal

Joined: 30 Jul 07
Posts: 8
Credit: 133,501
RAC: 0
Message 48009 - Posted: 25 Oct 2007, 2:46:42 UTC

Sure enough, I find something that might explain the situation after I open my big mouth. Oops.

However, there are still these results, which may or may not be related to another post which describes Rosetta@home having problems when Rosetta@home is not suspended in memory (I'm including these because they don't have exit code 193):

https://boinc.bakerlab.org/rosetta/result.php?resultid=115098768

https://boinc.bakerlab.org/rosetta/result.php?resultid=115061214

https://boinc.bakerlab.org/rosetta/result.php?resultid=114287567

https://boinc.bakerlab.org/rosetta/result.php?resultid=113947212

https://boinc.bakerlab.org/rosetta/result.php?resultid=114999850

https://boinc.bakerlab.org/rosetta/result.php?resultid=115032344

https://boinc.bakerlab.org/rosetta/result.php?resultid=115068071
FluffyChicken

Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 48046 - Posted: 26 Oct 2007, 20:43:05 UTC

I used Gutsy Gibbon (7.10) throughout its development cycle, updating regularly (150+ updates every few days) and going through a couple of distribution upgrades on the road from 7.04 to 7.10 final, and I can say that there should be no problems.

BUT there are two things you need to understand:

a) The 5.10.x versions of BOINC have a new memory feature with separate limits for when the computer is in use and when it is not in use; this is the most likely cause. Go and alter your settings, either as local settings or via your online preferences.

b) The latest Ubuntu releases have a system where, if a process is niced to run at low priority, its load will not cause the CPU multiplier (and so the clock speed) to be raised.
So, for example, a Pentium-M 1.7GHz would run at 600MHz if BOINC was left to run normally, no matter what BOINC settings you use, rather than the 1700MHz it should run at.
You have to override this setting (if you need to know how, just ask).
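
The behaviour in (b) can be checked with a rough Python sketch like the one below. It assumes the clock is being scaled by the kernel's ondemand cpufreq governor and that the governor exposes its ignore_nice_load tunable at the path shown; the thread does not say which mechanism Ubuntu 7.10 actually uses, so the path and the helper names are assumptions, not a confirmed recipe.

```python
from pathlib import Path

# ASSUMPTION: location of the ondemand governor's tunable. The exact path
# depends on the kernel version and is not given anywhere in the thread.
DEFAULT_TUNABLE = Path(
    "/sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load"
)


def nice_load_ignored(tunable=DEFAULT_TUNABLE):
    """Return True or False for the tunable's value, or None if it is absent."""
    tunable = Path(tunable)
    if not tunable.is_file():
        # Governor not in use, or no frequency scaling on this machine.
        return None
    return tunable.read_text().strip() == "1"


def explain(ignored):
    """Turn the result of nice_load_ignored() into a short message."""
    if ignored is None:
        return "tunable not found; this check does not apply here"
    if ignored:
        return "niced load is ignored; BOINC alone will not raise the clock"
    return "niced load counts; BOINC can raise the clock"


if __name__ == "__main__":
    print(explain(nice_load_ignored()))
```

If the tunable exists and reads 1, writing 0 to it (as root) would be the override, again only under the assumption that the ondemand governor is in charge.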
Team mauisun.org
Tribaal

Joined: 6 Feb 06
Posts: 80
Credit: 2,754,607
RAC: 0
Message 48047 - Posted: 26 Oct 2007, 21:01:33 UTC

I for one would love to know how :)

- trib'
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,242,482
RAC: 124
Message 48050 - Posted: 27 Oct 2007, 0:25:08 UTC - in response to Message 48047.  

I for one would love to know how :)


Not sure about Ubuntu, but in RedHat Linux, look for a file called:
/etc/sysconfig/cpuspeed

Change the line that says:
IGNORE_NICE=1

to:
IGNORE_NICE=0
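
The same edit can be scripted. This is a minimal Python sketch that rewrites a KEY=value line in the text of a shell-style config file such as the one above; the function name is made up for illustration, and it only transforms a string, so reading and writing /etc/sysconfig/cpuspeed (as root) is left to the caller.

```python
import re


def set_shell_var(config_text, key, value):
    """Return config_text with `key=...` set to `key=value`.

    An existing assignment is replaced in place; if the key is absent,
    a new line is appended. Comments and other lines are left untouched.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(config_text):
        # A callable replacement avoids backslash interpretation in `value`.
        return pattern.sub(lambda match: new_line, config_text)
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return f"{config_text}{separator}{new_line}\n"


if __name__ == "__main__":
    sample = "# cpuspeed settings\nIGNORE_NICE=1\n"
    print(set_shell_var(sample, "IGNORE_NICE", "0"))
```

The service that reads the file presumably has to be restarted before the change takes effect.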

Harrison Neal

Joined: 30 Jul 07
Posts: 8
Credit: 133,501
RAC: 0
Message 48062 - Posted: 27 Oct 2007, 18:02:19 UTC

I personally have never tried to throttle the CPU, but, of the computers I have that support speed throttling through /sys/devices/system/cpu/cpu#/cpufreq, the governor is currently at "performance", and the current frequency is equal to the maximum frequency, so my interpretation is that they are all running as fast as they can. (First off, is that a correct interpretation [given the fact that some don't have a cpufreq folder]?)
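
For anyone who wants to run the same check on several machines, here is a small Python sketch. It reads the standard cpufreq sysfs files (scaling_governor, scaling_cur_freq, cpuinfo_max_freq) and returns None when a CPU has no cpufreq folder; the function name and the returned dictionary keys are made up for illustration.

```python
from pathlib import Path


def read_cpufreq(cpu_dir):
    """Read governor and frequencies (in kHz) from one cpu#/cpufreq folder.

    Returns None when the folder is missing, as on machines whose kernel
    has no frequency-scaling support for that CPU.
    """
    cpufreq = Path(cpu_dir) / "cpufreq"
    if not cpufreq.is_dir():
        return None
    governor = (cpufreq / "scaling_governor").read_text().strip()
    current = int((cpufreq / "scaling_cur_freq").read_text())
    maximum = int((cpufreq / "cpuinfo_max_freq").read_text())
    return {
        "governor": governor,
        "cur_khz": current,
        "max_khz": maximum,
        "at_max": current >= maximum,
    }


if __name__ == "__main__":
    for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        print(cpu_dir.name, read_cpufreq(cpu_dir) or "no cpufreq folder")
```

A result with governor "performance" and at_max True matches the situation described above; a None result just means the kernel is not scaling that CPU's clock at all.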

Also, in the "Preemption Failures on Linux" thread ( https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3649 ), I confirmed that toggling that setting solved all my problems, or so it seemed.

It seems like the only problem I've seen after changing the leave applications in memory setting (as in I saw the problem this morning) is isolated to a single computer, and this same problem also happened before I changed that setting: A Rosetta@Home task simply stalls at 100%, BOINC reports it as "Waiting to run", but BOINC will neither run it nor send the results. It should be noted that Rosetta@Home is still in memory on this computer. I also never saw this happen with Xubuntu 7.04, and it was crunching Rosetta@Home on Xubuntu 7.04 for about 3 weeks. It's been attempting to crunch Rosetta@Home on Xubuntu 7.10 for about a week, and this particular problem has occurred twice.

The only difference between this computer and the rest of the computers is that, of the minimum requirements, this particular computer only meets the one for disk space; it's running an AMD K6 at 350MHz and is maxed out with 192MB RAM. Obviously, since this is below the minimum requirements, I understand there is a chance I simply would have to say the computer is too old to crunch Rosetta@Home properly, but, on the off chance that this computer shouldn't be encountering this problem (or, put another way, since I haven't seen this problem in a previous version of the BOINC software), I'm posting this.

The Computer:

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=617847

Task that failed before turning on leave applications in memory setting; aborted:

https://boinc.bakerlab.org/rosetta/result.php?resultid=114100598

Task that is still stuck "waiting to run":

https://boinc.bakerlab.org/rosetta/result.php?resultid=115401178

It might be worth mentioning that both tasks begin with "mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_".

If there is anything I would need to do to help determine why it is that Rosetta@Home is simply stalling, please post it.

Thanks,
-HN
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,242,482
RAC: 124
Message 48088 - Posted: 29 Oct 2007, 1:46:36 UTC - in response to Message 48062.  

I cannot say if 192MB is enough RAM for the Rosetta app to finish up and write the results to a file. Perhaps that's why it's stuck at 100%? See if it will finish in runlevel 2.

I have personally been displeased with BOINC 5.10.x and beyond, so I still run 5.8.16. If it allows your older computer to crunch Rosetta properly, then maybe a BOINC downgrade is appropriate. I would not go earlier than 5.8.x however.

Yes, if you have selected the performance governor, then CPUs are at max speed. No further changes are required.
Harrison Neal

Joined: 30 Jul 07
Posts: 8
Credit: 133,501
RAC: 0
Message 48089 - Posted: 29 Oct 2007, 2:20:22 UTC

After two updates and two restarts, the task stuck at 100% finally coughed up a segmentation violation with exit status 139 (along with mentioning that the Rosetta core seemed "stuck").
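
Exit status 139 is consistent with the segmentation violation: by the usual shell convention, a status above 128 means the process was killed by signal number (status - 128), and 139 - 128 = 11, which is SIGSEGV on Linux. A small Python sketch of that arithmetic follows; the helper name is made up for illustration, and BOINC may also report exit codes under conventions of its own that this does not cover.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```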

I'll try putting it in runlevel 2 tonight and watch it for a few days, to see if it'll hang again. If that doesn't work, I'll take your advice and try putting BOINC 5.8 on it.

And, also - unfortunately, this computer has a hard-wired limit of 192MB RAM... I couldn't put more in it if I tried (and I have).

Thanks, -HN
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,242,482
RAC: 124
Message 48138 - Posted: 30 Oct 2007, 1:29:36 UTC - in response to Message 48089.  

...exit status 139...


That's the typical error I've seen and many have posted about in this thread:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3649

If it doesn't work in runlevel 2 after restoring, then just let that WU die and move on. I think you could still run Rosetta on this machine as long as you keep it lean.
Harrison Neal

Joined: 30 Jul 07
Posts: 8
Credit: 133,501
RAC: 0
Message 48172 - Posted: 31 Oct 2007, 5:17:26 UTC

As it turns out, *buntu's default runlevel is 2, and just relies on a recovery mode entry in Grub to avoid launching unnecessary "services", among other things.

Nonetheless, I'm now experimenting with the BOINC version on that computer, to see if it'll behave like it used to.

And, just to clarify - the tasks that this computer has choked up on had exit code 139, while that thread mentioned exit code 193 (unless any of the dead result links in fact had exit code 139...).

-HN
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,242,482
RAC: 124
Message 48181 - Posted: 31 Oct 2007, 13:44:14 UTC - in response to Message 48172.  

You're right; 193 is not 139. My dyslexia kicked in. :)

Good luck with the upgraded BOINC version. It's not perfect but it's the best I've found so far, old and new.




©2024 University of Washington
https://www.bakerlab.org