Rosetta@home

BOINC over a cluster



Message boards : Number crunching : BOINC over a cluster

Dragokatzov

Joined: Oct 5 05
Posts: 25
ID: 2728
Credit: 2,058,638
RAC: 1,265
Message 7731 - Posted 27 Dec 2005 17:10:05 UTC

I see over at the Folding@home forums there's a sticky with LOTS of information on setting up a diskless stack. I was wondering, would it be possible to run BOINC with Rosetta on one of these? How many CPUs does BOINC support anyway? Thanks for your information!
____________
Victory is the ONLY option!

Honza

Joined: Sep 18 05
Posts: 48
ID: 434
Credit: 173,517
RAC: 0
Message 7732 - Posted 27 Dec 2005 17:28:31 UTC

It is up to you how many CPUs BOINC can use.
Just go to the General preferences ( http://boinc.bakerlab.org/rosetta/prefs.php?subset=global ) and set "On multiprocessors, use at most ... processors".
It should be fine with 8 if you get a quad dual-core Opteron machine, for example.
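For a host you can't easily manage through the website, newer clients can also take the CPU count locally. A minimal sketch, assuming a client recent enough to read a cc_config.xml file in its data directory (file name and option as I understand them - check your client's documentation):

    <cc_config>
      <options>
        <ncpus>8</ncpus>   <!-- act as if the host has 8 CPUs; -1 means use the real count -->
      </options>
    </cc_config>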

When running a farm, I would suggest using BOINCView ( http://boincview.amanheis.de/ ) to monitor and control the machines.

You can run BOINC over a network, from a RAM drive or even from a flash disk [only CPDN is not suitable for this, since it is quite demanding on HD space and I/O].
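For the RAM drive case on Linux, a minimal sketch of the idea (the paths and the client binary name are only placeholders and will differ on your system):

    # create a RAM-backed filesystem and run BOINC from it
    mkdir -p /mnt/boinc-ram
    mount -t tmpfs -o size=512m tmpfs /mnt/boinc-ram
    cp -a /opt/boinc/. /mnt/boinc-ram/      # seed it from an existing BOINC directory
    cd /mnt/boinc-ram && ./boinc_client &   # note: everything here is lost at power-off

Remember to stop the client and copy the directory back to disk before shutting down, or the work in progress is gone.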
____________

Bill Michael

Joined: Oct 25 05
Posts: 573
ID: 6600
Credit: 24,429
RAC: 0
Message 7735 - Posted 27 Dec 2005 18:04:25 UTC

One warning is that you need a separate installation of BOINC for each host, and each needs its own access to the internet. If your "stack" appears to the OS as a single computer with multiple processors, all is well; if each node boots individually, you have multiple "hosts", and you'll have to do a bit more work to make BOINC work correctly.

____________

J D K

Joined: Sep 23 05
Posts: 168
ID: 858
Credit: 101,266
RAC: 0
Message 7745 - Posted 27 Dec 2005 20:12:41 UTC

I have seen 64....
____________
BOINC Wiki

John McLeod VII

Joined: Sep 17 05
Posts: 108
ID: 314
Credit: 157,618
RAC: 12
Message 8141 - Posted 1 Jan 2006 22:08:04 UTC

There is nothing that I have seen in the code that limits the number of CPUs. However, the communication between the BOINC daemon and the project applications is through shared memory - and AFAIK clusters do not do shared memory, thus requiring a separate installation for each host.
____________


BOINC WIKI

River~~

Joined: Dec 15 05
Posts: 752
ID: 37802
Credit: 132,982
RAC: 0
Message 8169 - Posted 2 Jan 2006 9:56:51 UTC - in response to Message ID 8141.

There is nothing that I have seen in the code that limits the number of CPUs. However, the communication between the BOINC daemon and the project applications is through shared memory - and AFAIK clusters do not do shared memory, thus requiring a separate installation for each host.


In principle either a software hack (*) or a hardware hack could provide shared memory across a cluster. But nobody has done one as far as I know because you would not want to anyway.

With many CPUs sharing the same memory you reach a bottleneck where they are fighting one another for access to that memory. Even if memory speeds can be pushed a little, the speed of light limits how far the memory can be from the chip at a given clock speed. With a 1 GHz memory bus, for example, light travels about 1 foot (30 cm) in one clock cycle, and because of the ask/fetch round trip that limits the memory to within 6 in (15 cm) - or really 4 in (10 cm), as the signals travel at 'only' about 2/3 the speed of light.
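Putting rough numbers on that back-of-the-envelope limit:

    1 GHz bus  ->  one cycle = 1 ns
    distance light covers in 1 ns:    3.0e8 m/s * 1e-9 s ~= 0.3 m   (about 1 ft)
    ask + fetch must fit in a cycle:  0.3 m / 2           = 0.15 m  (about 6 in)
    signals at ~2/3 of c:             0.15 m * 2/3        = 0.10 m  (about 4 in)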

The consequence is that two real CPUs will not give quite twice the crunch of a single CPU; by the time you get to four CPUs in one box the effect should be noticeable, even if they are real CPUs in separate sockets.

If you got to (say) 64 CPUs the whole thing would be memory bound and adding more CPUs would just slow it down. The CPUs would spend more time haggling over memory access than doing real work. Well before we see the 64-CPU motherboard, my guess is we will see motherboards with several groups of 2-4 CPUs, each group having its own RAM...

River~~

(*) the software hack is to associate a file with the shared memory area and have every member of the cluster use the same file. If the word 'performance' has just walked across your mind, you have just independently figured out why you wouldn't want to do it...

River~~

Joined: Dec 15 05
Posts: 752
ID: 37802
Credit: 132,982
RAC: 0
Message 8170 - Posted 2 Jan 2006 10:17:26 UTC - in response to Message ID 7732.

You can run BOINC over a network, from a RAM drive or even from a flash disk.


Ouch!

Please don't use a flash disk directly.

You only have a few tens of thousands of rewrites on a flash disk before you wear out the memory. The flash mechanism is fine for photos and for passing files from box to box, but not for uses where the file is repeatedly rewritten. Remember it was developed from the write-once-read-many (WORM) memories used for BIOSes and so on. It is just not designed for this job.

Reading a flash memory does not wear it out, writing does.

If you want to run from a flash drive, figure out how to make your flash disk copy itself onto a ramdisk before BOINC starts and copy itself back after it stops. This means, of course, that a power failure wipes out all work that hasn't been reported, as it is all on the ramdisk, but on Rosetta we're used to poor checkpointing anyway ;-(.
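A minimal sketch of such a wrapper on Linux (all paths and the client binary name are only placeholders):

    #!/bin/sh
    FLASH=/mnt/flash/boinc     # BOINC tree stored on the flash disk
    RAM=/mnt/ramdisk/boinc     # working copy on a tmpfs ramdisk

    mkdir -p "$RAM"
    cp -a "$FLASH/." "$RAM/"          # flash -> ramdisk, once, at startup
    ( cd "$RAM" && ./boinc_client )   # run until BOINC is stopped
    cp -a "$RAM/." "$FLASH/"          # a single write pass back to flash at shutdown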

Linux users might be able to achieve a similar effect by setting 'async' in fstab - but check whether the kernel really does defer all writes until umount. Personally I would still want to do the copy to ramdisk before and after, because I haven't looked at the kernel code.
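For reference, an illustrative fstab entry along those lines (the device, mount point and filesystem are made up; noatime at least avoids the extra access-time writes):

    /dev/sda1   /mnt/flash   ext2   defaults,async,noatime   0   0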

If you are running Rosetta it is less of a problem than running Einstein - Rosetta only does about 10 checkpoints per WU, while Einstein (the Albert app) does 1 checkpoint every few seconds. Rosetta would take only about a thousand WUs to wear out the memory under your slot directories. That is three WUs a day for a year.
Albert could wear it out in a few dozen WUs...
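The arithmetic behind those figures, taking roughly 10,000 rewrites as the wear limit:

    10,000 rewrites / 10 checkpoints per WU ~= 1,000 WU
    1,000 WU at 3 WU per day                ~= 333 days, i.e. about a year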

____________

PCZ

Joined: Sep 16 05
Posts: 26
ID: 61
Credit: 2,024,330
RAC: 0
Message 8228 - Posted 3 Jan 2006 1:03:04 UTC
Last modified: 3 Jan 2006 1:19:27 UTC

What you have seen described as a diskless stack are PXE-boot nodes.

I and many others use these.
Basically you have one server which runs DHCP, TFTP and NFS.
The nodes are just bare motherboards without any hard drives.
The nodes boot up over the network and load a kernel via TFTP.
The OS is then mounted as an NFS share from the server.
There are other NFS shares set up for the various DC projects that you want to run.

So the nodes have no hard drives and all their files are on a central server.
However, once booted they are individual entities; each node runs its own copy of BOINC.
You control them via telnet or SSH.
Each one of the diskless nodes is in fact a Linux PC, just without a screen or hard drives.

Folks use these clusters because the node cost is very low: just motherboard, RAM, CPU and PSU.
Adding extra nodes only takes seconds; all that is required to add a node is a quick edit of dhcpd.conf to add its MAC address, plus some NFS exports for the new node.
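To make that concrete, a sketch of what adding one node might look like (all hostnames, addresses and paths below are made up):

    # /etc/dhcpd.conf - one stanza per diskless node
    host node01 {
        hardware ethernet 00:11:22:33:44:55;   # MAC address of the new motherboard
        fixed-address 192.168.0.101;
        next-server 192.168.0.1;               # the TFTP server
        filename "pxelinux.0";                 # PXE boot loader fetched via TFTP
    }

    # /etc/exports - root filesystem and project share for the node
    /srv/diskless/node01   192.168.0.101(rw,no_root_squash,sync)
    /srv/boinc/node01      192.168.0.101(rw,no_root_squash,sync)

After editing, re-export the shares (exportfs -ra) and restart dhcpd so it picks up the new host entry.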


These PXE stacks are not true clusters, in the sense that the nodes are not slaves to the central server; they are individual PCs using resources provided by a server but not controlled by it.



____________

raukondil

Joined: Jan 6 06
Posts: 4
ID: 47691
Credit: 78,328
RAC: 0
Message 8828 - Posted 12 Jan 2006 9:27:59 UTC

Does anyone have experience using openMosix with BOINC?
I've read in some forums that this is not possible, but I'm not quite sure...
I would be happy to know if it is possible...

@DEV-team:
Do I have to change the code of the BOINC manager or just the project code?

Answers are welcome.

Greetings, David

Paul D. Buck

Joined: Sep 17 05
Posts: 815
ID: 269
Credit: 1,757,755
RAC: 2,519
Message 8836 - Posted 12 Jan 2006 12:13:53 UTC

The current structure and design of BOINC does not support this concept.

Long term there is some vague plan to maybe do something along these lines. But the demand for this type of hosting is not really there. And, with BOINC running on a per-node basis, it simulates a very loosely coupled cluster ...

For example, I have 9 computers, all running BOINC; each computer manages its own work, but the main account records all the information from my "cluster".

Also, I manage the group of computers with BOINCView as a "cluster console" ...

So, without development work on BOINC it is not currently possible. The biggest stumbling block is the use of shared memory segments to send messages between the BOINC Daemon and the Science Applications.

Changes would have to include modifications to the science applications as well, or a 'stub' daemon with shared memory that would run on each cluster node, linked to a BOINC daemon/manager on the cluster controller by some other mechanism.

Bottom line, probably not worth the effort as there is no particular gain to be had.
____________

dcdc

Joined: Nov 3 05
Posts: 1484
ID: 8948
Credit: 18,124,908
RAC: 12,350
Message 8857 - Posted 12 Jan 2006 16:54:56 UTC - in response to Message ID 8170.

You can run BOINC over a network, from a RAM drive or even from a flash disk.


Ouch!

Please don't use a flash disk directly.

You only have a few tens of thousands of rewrites on a flash disk before you wear out the memory. The flash mechanism is fine for photos and for passing files from box to box, but not for uses where the file is repeatedly rewritten. Remember it was developed from the write-once-read-many (WORM) memories used for BIOSes and so on. It is just not designed for this job.

Reading a flash memory does not wear it out, writing does.

If you want to run from a flash drive, figure out how to make your flash disk copy itself onto a ramdisk before BOINC starts and copy itself back after it stops. This means, of course, that a power failure wipes out all work that hasn't been reported, as it is all on the ramdisk, but on Rosetta we're used to poor checkpointing anyway ;-(.



Just for info: you can do this using the Windows XP Embedded (XPe) Enhanced Write Filter (by overwriting a couple of files from XPe onto XP). There's loads of info on it at www.mp3car.com. I'm working on one at the moment but am having I/O problems with my CF-to-IDE converter...

____________

River~~

Joined: Dec 15 05
Posts: 752
ID: 37802
Credit: 132,982
RAC: 0
Message 8876 - Posted 12 Jan 2006 19:57:07 UTC - in response to Message ID 8228.
Last modified: 12 Jan 2006 19:57:52 UTC

What you have seen described as a diskless stack are PXE-boot nodes.

I and many others use these.
Basically you have one server which runs DHCP, TFTP and NFS.
The nodes are just bare motherboards without any hard drives.
The nodes boot up over the network and load a kernel via TFTP.
The OS is then mounted as an NFS share from the server.
There are other NFS shares set up for the various DC projects that you want to run.


Sounds interesting. I've already got Linux going on my LAN, but so far with several separate boxes and fixed IP addresses. Can you recommend a good link to get me going on DHCP, TFTP, NFS & PXE, none of which I've used before?

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?

As well as learning the software, I also need to know things like how to work out how much power is needed (really the same question as how many motherboards I can run off one power supply if there are no extra hard drives on the extra boards) and similar practical points.

And thank you *very* much to Dragokatzov for asking this question - it's given me something to think about instead of going out and buying another 5 second-hand boxes.

River~~

BennyRop

Joined: Dec 17 05
Posts: 555
ID: 38837
Credit: 140,800
RAC: 0
Message 8901 - Posted 13 Jan 2006 4:16:32 UTC

I've seen the descriptions of how to set up "headless crunchers" for FaH, but opted for replacing my work system with a dual-core CPU instead of experimenting with headless crunchers.
Don't some of the other BOINC projects have write-ups on how to set up "headless crunchers" that PXE-boot and run their client? Then all you should need to do is match the headless machines to Rosetta's requirements and change the client used.
____________

Paul D. Buck

Joined: Sep 17 05
Posts: 815
ID: 269
Credit: 1,757,755
RAC: 2,519
Message 8931 - Posted 13 Jan 2006 12:26:37 UTC - in response to Message ID 8876.

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?

Since he has the most recent experience, perhaps River~~ might do it ...
____________

spacemeat

Joined: Dec 15 05
Posts: 1
ID: 37892
Credit: 961,743
RAC: 0
Message 9346 - Posted 19 Jan 2006 13:43:56 UTC - in response to Message ID 8876.

Sounds interesting. I've already got Linux going on my LAN, but so far with several separate boxes and fixed IP addresses. Can you recommend a good link to get me going on DHCP, TFTP, NFS & PXE, none of which I've used before?

Or even better, would someone like to write an article for the Wiki on getting from a typical Linux distro up to a diskless stack of this kind?


http://www.gentoo.org/doc/en/diskless-howto.xml

TPR_Mojo

Joined: Sep 20 05
Posts: 4
ID: 672
Credit: 684,947
RAC: 0
Message 9697 - Posted 24 Jan 2006 10:48:38 UTC

Rather than look into a HOW-TO, just download a preconfigured Linux distro designed for this. I use K12LTSP from K12LTSP.org and it runs diskless nodes fine on BOINC projects right out of the box.

Configuration help can be found at my team's site http://forums.teamphoenixrising.net
____________

networkman

Joined: Jan 19 06
Posts: 1
ID: 52195
Credit: 251,782
RAC: 0
Message 9759 - Posted 25 Jan 2006 3:30:39 UTC

Hmm.. interesting..

____________
"Beer is proof that God loves us and wants us to be happy." - Benjamin Franklin
---
