Running BOINC in LTSP 4.2 with custom SMP kernel

Message boards : Number crunching : Running BOINC in LTSP 4.2 with custom SMP kernel


Dr_Strangelove
Joined: 2 May 08
Posts: 2
Credit: 118,118
RAC: 0
Message 52878 - Posted: 6 May 2008, 7:48:15 UTC
Last modified: 6 May 2008, 8:11:29 UTC

Thanks BitSpit for the helpful links and info on building a custom SMP kernel for LTSP!!

http://wiki.ltsp.org/twiki/bin/view/Ltsp/Build-LTSP-42
http://wiki.ltsp.org/twiki/bin/view/Ltsp/CustomLtspKernels


I actually made it through building a custom kernel!! <pats himself on the back :) >
I've been using a number of different Linux distros for several years (mostly Red Hat/Fedora) but had NEVER built a custom kernel before. My LTSP server is currently running Fedora 8. I see that Fedora 9 (when it's released) will have LTSP built in, so I'm looking forward to trying that... Anyway.

After reading BitSpit's howto from the https://muzso.hu/node/3978 page, I went to kernel.org and got the source for 2.6.17.8, since I saw that it was one of the kernels the LTSP 4.2 install had downloaded by default. Then I got the LTSP 4.2 ltsp_kernel_kit.

Following the howto instructions helped me get started. When I first got into menuconfig I was somewhat overwhelmed by the number of choices and was not sure what to select. Then I found a MAJOR shortcut: in menuconfig's first menu, second choice from the bottom of the list, is "Load an Alternate Configuration File". Looking in the ltsp_kernel_kit directory, I found the vanilla config-2.6.17.8-ltsp-1 file. Since I knew the stock 2.6.17.8 LTSP kernel booted just fine in single-core mode, I loaded it as a baseline config. It was then a fairly simple matter of selecting my P4/Xeon CPU type, SMP, and Hyperthreading (more on that below). I then saved my modified config and continued following the howto. The kernel compiled with gcc 4 with no errors on the very first try! However, I did go back a couple of times after that and turned off just about everything I did not need for my hardware.
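For anyone who would rather script the "load the baseline, then switch on SMP" step than click through menuconfig, here is a minimal sketch. It assumes a 2.6.17-era i386 tree, GNU sed (as on Fedora), and the Kconfig symbols CONFIG_SMP, CONFIG_SCHED_SMT (the Hyperthreading-aware scheduler) and CONFIG_NR_CPUS. The `set_kconfig` helper is hypothetical, not part of the kernel kit. It only edits lines in a copy of the baseline config, so you would still run `make oldconfig` afterwards to resolve dependent options and pick the P4/Xeon CPU type in menuconfig.

```shell
#!/bin/sh
# Hypothetical helper: set one Kconfig symbol in a .config-style file.
# $1 = config file, $2 = symbol without the CONFIG_ prefix, $3 = y, m or a number
set_kconfig() {
    file=$1
    sym=CONFIG_$2
    val=$3
    if grep -q -e "^$sym=" -e "^# $sym is not set" "$file"; then
        # Rewrite the existing line, whether it was set or marked "is not set"
        sed -i -e "s/^$sym=.*/$sym=$val/" -e "s/^# $sym is not set\$/$sym=$val/" "$file"
    else
        # Symbol not mentioned at all: append it
        echo "$sym=$val" >> "$file"
    fi
}

# Example (the kit path is a placeholder; run from the top of the kernel tree):
#   cp /path/to/ltsp_kernel_kit/config-2.6.17.8-ltsp-1 .config
#   set_kconfig .config SMP y
#   set_kconfig .config SCHED_SMT y
#   set_kconfig .config NR_CPUS 4
#   make oldconfig
```

Editing the file and then letting `make oldconfig` fill in the rest keeps the known-good LTSP baseline intact apart from the options you deliberately changed.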

I'm not sure yet, but I think I missed something in the steps where I copied the initramfs from the original LTSP tree. However, I do have a kernel that boots, and dmesg shows 4 CPUs!!
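Besides dmesg, a quick way to confirm what the new kernel sees is to count entries in /proc/cpuinfo. This sketch (the `cpu_summary` name is hypothetical) counts logical CPUs from the `processor` lines and physical packages from distinct `physical id` values, so a dual Xeon with Hyperthreading on should show 4 logical and 2 physical. It assumes the usual Linux cpuinfo layout and falls back to 1 package when the `physical id` field is missing, which can happen on some kernels and CPUs.

```shell
#!/bin/sh
# Summarize CPUs from a cpuinfo-format file (default: /proc/cpuinfo).
cpu_summary() {
    info=${1:-/proc/cpuinfo}
    # One "processor" line per logical CPU, Hyperthreading siblings included
    logical=$(grep -c '^processor' "$info")
    # Distinct "physical id" values = physical CPU packages
    physical=$(grep '^physical id' "$info" | sort -u | wc -l | tr -d ' ')
    # Field missing (some kernels/CPUs omit it): assume a single package
    if [ "$physical" -eq 0 ]; then
        physical=1
    fi
    echo "logical=$logical physical=$physical"
}

# On a client, just run: cpu_summary
```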

I haven't gotten BOINC to run as a local app on my LTSP clients yet, but I do get them all up to a bash shell. I'm fighting an ssh issue that may be related to a mistake in the initramfs copy steps. I can get a bash shell on the client but cannot telnet or ssh from the server into the client, although I can ssh from the client's bash shell into the server. I still need to do some more reading before I can say what the problem might be.

As far as Hyperthreading goes: I have been running SETI@home WUs in BOINC with Hyperthreading enabled on the FC8 server (same dual-Xeon motherboard as the client hosts) for a few months now. I must have read some of the same threads BitSpit did about HT performance. I have tried running with HT both enabled and disabled. All I can say is that with HT disabled, the BOINC benchmarks show about 1240 floating-point MIPS (Whetstone) / 2790 integer MIPS (Dhrystone) per CPU, with a CPU count of 2. With HT enabled, I actually get slightly less Whetstone (1080) and much less Dhrystone (1650, roughly 60% of the HT-off figure). However, it shows 4 CPUs.

So...

HT off: 1240 / 2798 x 2 CPUs = 2480 Whetstone / 5596 Dhrystone ??
HT on: 1080 / 1650 x 4 CPUs = 4320 Whetstone / 6600 Dhrystone ??

Correct me if I'm wrong, but HT enabled seems MUCH faster according to the BOINC Manager benchmarks?? I have been running mostly SETI@home work units, so I don't yet know much about which Rosetta relies on more: integer or floating-point power. Either way, the benchmarks seem to say to enable HT if you have it. I do know SETI loads one WU per CPU.
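The totals above are just the per-CPU benchmark multiplied by the CPU count BOINC reports. A trivial shell check of that arithmetic, using the figures from the two lines above (the `total` helper is only for illustration):

```shell
#!/bin/sh
# Aggregate benchmark = per-CPU score x number of CPUs BOINC reports.
# Integer shell arithmetic is enough for whole-number MIPS figures.
total() {
    echo $(( $1 * $2 ))
}

echo "HT off: $(total 1240 2) Whetstone / $(total 2798 2) Dhrystone"
echo "HT on:  $(total 1080 4) Whetstone / $(total 1650 4) Dhrystone"
# prints:
# HT off: 2480 Whetstone / 5596 Dhrystone
# HT on:  4320 Whetstone / 6600 Dhrystone
```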

I'm sure I'll have a few dozen LTSP ssh and local apps questions when I'm not quite so sleepy ;)

Again, thanks BitSpit!!
"Gentlemen, you can't fight in here! This is the War Room!!"
President Merkin Muffley
BitSpit
Joined: 5 Nov 05
Posts: 33
Credit: 4,147,344
RAC: 0
Message 52879 - Posted: 6 May 2008, 12:03:17 UTC - in response to Message 52878.  
Last modified: 6 May 2008, 12:05:34 UTC

After reading BitSpit's howto from the https://muzso.hu/node/3978 page.


Not my howto. Can't take credit for someone else's work.

However I did go back a couple times after that and turned off just about everything that I did not need for my hardware.


I remember trying that when I started compiling kernels. It didn't work well. I think the only thing I could safely leave out was sound.

I have not got boinc to run as a local app on my ltsp clients yet however I do get them all up to a bash shell. I'm fighting an ssh issue that maybe related to a mistake in the copy initramfs steps. I can get a bash shell but cannot telnet or ssh from the server into the client.


See http://wiki.ltsp.org/twiki/bin/view/Ltsp/LocalApps for ssh help.

If you're interested in trying some of this diskless, netbooting Linux madness and don't feel like compiling your own SMP kernel, you can download mine at http://splicedcollective.org/software/2.6.17.8-ltsp-p3-smp.zip. It's the same as the stock LTSP 4.2 kernel except that it has 4-core SMP support and requires at least a P3.

A couple of additional useful notes. First, BOINC has a known flaw in its netcode. Whenever it does a DNS lookup, it halts all other network connections. That tends to be a problem when you're running over the network. The workaround is changing the DNS timeout. Open up /opt/ltsp/i386/etc/rc.sysinit and go to the section labeled "Setup the resolv.conf file." Add this line:

echo "options timeout:2">>/tmp/resolv.conf

This changes the timeout from the default 5 seconds to 2 seconds. You can play around with the timeout and attempts (default 2) options, but the important thing is to keep the total time spent resolving DNS at 25 seconds or less. Any longer and tasks WILL start crashing with the message "No heartbeat from core for 31 seconds". In my experience, once that starts happening, BOINC will start idling cores, erroring out the work queue, and possibly locking up the whole system, requiring a reboot.
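To get a feel for why the defaults can blow the 25-second budget, here is a back-of-the-envelope sketch. It assumes, as a simplification, that a failed lookup waits the full timeout on every nameserver for every attempt; the real resolver schedules its retries differently, so treat the numbers as ballpark only. resolv.conf uses at most 3 nameserver lines. `dns_worst_case` is a hypothetical helper.

```shell
#!/bin/sh
# Rough estimate of how long one failed lookup can block, in seconds.
# Simplifying assumption: every attempt waits the full timeout on every
# nameserver. The real resolver schedules retries differently.
dns_worst_case() {
    timeout=$1      # "options timeout:N" (default 5)
    attempts=$2     # "options attempts:N" (default 2)
    nameservers=$3  # number of nameserver lines (at most 3 are used)
    echo $(( timeout * attempts * nameservers ))
}

echo "defaults, 3 servers:  $(dns_worst_case 5 2 3)s"   # 30s, over the 25s budget
echo "timeout:2, 3 servers: $(dns_worst_case 2 2 3)s"   # 12s
```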

Second, while we're in resolv.conf, let's talk about setting the DNS servers. By default, LTSP uses your LTSP server as the DNS server. I don't like that. Also, it does not pick up any additional servers through DHCP. To change the servers, you'll need to edit rc.sysinit in the same resolv.conf section. To specify your own DNS servers, just append nameserver lines to /tmp/resolv.conf with some echo statements. For example, to use OpenDNS:

echo "nameserver 208.67.222.222" >>/tmp/resolv.conf
echo "nameserver 208.67.220.220" >>/tmp/resolv.conf

To disable the default behavior of using the LTSP server for DNS, comment out or delete this line:

echo "nameserver ${DNS_SERVER}" >>/tmp/resolv.conf
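Putting the DNS pieces together, the edited part of the resolv.conf section might look roughly like this minimal sketch. It is wrapped in a hypothetical `write_dns_settings` function that takes the output file as a parameter, so it can be tried on a scratch file first; in rc.sysinit you would keep whatever else the stock section writes and use /tmp/resolv.conf.

```shell
#!/bin/sh
# Hypothetical wrapper for the DNS lines of the "Setup the resolv.conf file"
# section. Only DNS-related lines are shown; the stock section may write more.
write_dns_settings() {
    out=${1:-/tmp/resolv.conf}

    # Default LTSP behavior (LTSP server as DNS), disabled:
    # echo "nameserver ${DNS_SERVER}" >> "$out"

    # Explicit DNS servers (the OpenDNS example)
    echo "nameserver 208.67.222.222" >> "$out"
    echo "nameserver 208.67.220.220" >> "$out"

    # Shorter per-server timeout; attempts stays at its default of 2
    echo "options timeout:2" >> "$out"
}
```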

Finally, you can do some fun things using a screen script. On my setup, I have created one that handles a multi-project setup: I can change a system's config file and choose what it will run (currently either BOINC or distributed.net). It also lets me set a flag for the system to either shut down or reboot when the current program ends.
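A launcher along those lines could look something like the sketch below. This is not the script described above, just an illustration of the idea: the config file format (`PROJECT=` and `ON_EXIT=` keys), the default config path, and the `RUN=1` switch are all made up, and `boinc` and `dnetc` are assumed client binary names whose paths and options depend on your install.

```shell
#!/bin/sh
# Hypothetical config file format, one KEY=value per line:
#   PROJECT=boinc   or   PROJECT=dnetc
#   ON_EXIT=none    or   shutdown   or   reboot

# Print the value of KEY ($2) from config file $1; the last match wins
conf_get() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Map PROJECT to the client command (binary names are assumptions)
project_command() {
    case $(conf_get "$1" PROJECT) in
        boinc) echo "boinc" ;;
        dnetc) echo "dnetc" ;;
        *)     echo "" ;;
    esac
}

# Map ON_EXIT to what to do after the client exits
exit_command() {
    case $(conf_get "$1" ON_EXIT) in
        shutdown) echo "shutdown -h now" ;;
        reboot)   echo "reboot" ;;
        *)        echo "" ;;
    esac
}

# Only act when RUN=1, so the functions can be sourced and tested safely
if [ "${RUN:-0}" = 1 ]; then
    conf=${1:-/etc/cruncher.conf}   # hypothetical default path
    cmd=$(project_command "$conf")
    # Run the client in the foreground until it exits
    [ -n "$cmd" ] && $cmd
    # Config is read again here, so the flag can be changed while it runs
    after=$(exit_command "$conf")
    [ -n "$after" ] && $after
fi
```

On a client you might start it detached with something like `screen -dmS crunch env RUN=1 sh crunch.sh /etc/cruncher.conf` (`crunch.sh` is a made-up name) and reattach later with `screen -r crunch`.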
Dr_Strangelove
Joined: 2 May 08
Posts: 2
Credit: 118,118
RAC: 0
Message 52889 - Posted: 7 May 2008, 0:23:52 UTC - in response to Message 52879.  
Last modified: 7 May 2008, 0:39:36 UTC

Howdy..
Not my howto. Can't take credit for someone else's work.


Told you I was sleepy :) What I meant to say was: after reading this howto and BitSpit's helpful links... blah blah, etc., etc.

However I did go back a couple times after that and turned off just about everything that I did not need for my hardware.

I remember trying that when I started compiling kernels. It didn't work well. I think the only thing I could safely leave out was sound.


There were a whole bunch of network card drivers that I was never going to use for this project. I religiously use either Intel or 3Com NICs, so I took out almost everything else. My motherboards have onboard Intel PRO/1000 (e1000) gigabit NICs with the PXE boot option, so that was really all I needed. They are also server boards, so there is no sound hardware. Gone. The kernel had choices for a gazillion video cards. The boards have just a basic 8 MB ATI Rage 128 GPU built in, so I removed all but a few ATI modules for video, leaving the basic VESA display support. I left in much of the USB support to keep USB sticks working; I have a couple of diagnostic tools on bootable thumb drives that have been very useful. Gone are all the Bluetooth drivers, the V4L (Video for Linux) stuff, and the DVB TV tuner card drivers. Also gone is much of the RAID support; if I need any RAID, it will be on the server and not the clients. Removing all this stuff also really cut down on the compile time for the kernel and all the modules.
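To double-check a trimmed config like this before a long compile, here is a small sketch that reports whether each option is built in (y), a module (m), or off (n). The symbol names in the example (CONFIG_E1000 for the Intel PRO/1000 driver, CONFIG_FB_ATY128 and CONFIG_FB_VESA for the Rage 128 and VESA framebuffers, CONFIG_USB_STORAGE, CONFIG_SOUND, CONFIG_BT, CONFIG_VIDEO_DEV) are assumed 2.6.17 names, so confirm them in menuconfig's help screens; `kconfig_state` is a hypothetical helper.

```shell
#!/bin/sh
# Print y (built in), m (module) or n (not set or absent) for each symbol.
# $1 = .config file, remaining args = symbols without the CONFIG_ prefix
kconfig_state() {
    cfg=$1
    shift
    for sym in "$@"; do
        val=$(sed -n "s/^CONFIG_$sym=//p" "$cfg")
        echo "$sym=${val:-n}"
    done
}

# Example on the client kernel's .config (symbol names assumed for 2.6.17):
#   kconfig_state .config E1000 FB_ATY128 FB_VESA USB_STORAGE SOUND BT VIDEO_DEV
```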

I have started reading the http://wiki.ltsp.org/twiki/bin/view/Ltsp/LocalApps link for local apps/ssh help.

If you're interested in trying some of this diskless, netbooting Linux madness and don't feel like compiling your own SMP kernel, you can download mine at http://splicedcollective.org/software/2.6.17.8-ltsp-p3-smp.zip. It's the same as the stock LTSP 4.2 kernel except that it has 4-core SMP support and requires at least a P3.


Thanks again for linking to your kernel. I may try that at some point.

A couple of additional useful notes. First, BOINC has a known flaw in its netcode. Whenever it does a DNS lookup, it halts all other network connections. That tends to be a problem when you're running over the network. The workaround is changing the DNS timeout. Open up /opt/ltsp/i386/etc/rc.sysinit and go to the section labeled "Setup the resolv.conf file." Add this line:

echo "options timeout:2">>/tmp/resolv.conf

This changes the timeout from the default 5 seconds to 2 seconds. You can play around with the timeout and attempts (default 2) options, but the important thing is to keep the total time spent resolving DNS at 25 seconds or less. Any longer and tasks WILL start crashing with the message "No heartbeat from core for 31 seconds". In my experience, once that starts happening, BOINC will start idling cores, erroring out the work queue, and possibly locking up the whole system, requiring a reboot.


COOL! I was wondering what was going on there. I have had the LTSP server and one other P4 machine chewing on BOINC WUs, and I have seen this behavior a couple of times. A complete reboot was usually not required; I just did a 'service network restart', which would usually fix it.

Second, while we're in resolv.conf, let's talk about setting the DNS servers. By default, LTSP uses your LTSP server as the DNS server. I don't like that. Also, it does not pick up any additional servers through DHCP. To change the servers, you'll need to edit rc.sysinit in the same resolv.conf section. To specify your own DNS servers, just append nameserver lines to /tmp/resolv.conf with some echo statements. For example, to use OpenDNS:

echo "nameserver 208.67.222.222" >>/tmp/resolv.conf
echo "nameserver 208.67.220.220" >>/tmp/resolv.conf

To disable the default behavior of using the LTSP server for DNS, comment out or delete this line:

echo "nameserver ${DNS_SERVER}" >>/tmp/resolv.conf


Since I hadn't gotten BOINC running on the clients yet, I hadn't seen this one. Makes sense.

Finally, you can do some fun things using a screen script. On my setup, I have created one that handles a multi-project setup: I can change a system's config file and choose what it will run (currently either BOINC or distributed.net). It also lets me set a flag for the system to either shut down or reboot when the current program ends.


Cool. I'm also looking at ways to start up (Wake-on-LAN) and shut down the clients remotely from the server, since they will be in the basement where it's cooler and fan noise is not an issue. I'm also building this system to learn more about how HPC clustering works, with a render farm for 3D graphics in the back of my mind. It all depends on how I decide to boot up the "nodes".

My plan for these 8 motherboards is for two to be configured in an HPA cluster mode, and the other six as dedicated LTSP BOINC clients or Beowulf HPC compute nodes.

I'll do much more reading and fiddling. I do appreciate the great advice!

Thanks : )




©2024 University of Washington
https://www.bakerlab.org