BOINC server farm idea / survey

Message boards : Number crunching : BOINC server farm idea / survey

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 47303 - Posted: 2 Oct 2007, 1:06:50 UTC

I wanted to take a little census of the population here. I've been studying some internet services available from Amazon. In a nutshell, they let you use Amazon disk space (highly distributed, highly available) to serve as many copies of files as you like (over Amazon's global high-speed internet bandwidth), and use Amazon servers to perform as much computing and web serving as you like (again, highly available).

It all comes with a price, of course, but they've set it up so you only pay for what you use. For about $US20/week you can keep a single machine instance running; here are the machine specs provided:
1.7 GHz x86 processor, 1.75 GB of RAM, 160 GB of local disk, and 250 Mb/s of network bandwidth.

For another $US20/week you can buy about 100 GB of server bandwidth from the Amazon storage service. I, for one, believe many individuals and teams would be willing to make such donations... if it were easy to do.
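For concreteness, the quoted prices work out as follows. This is only a back-of-the-envelope sketch using the two figures above; actual Amazon billing granularity may differ:

```python
# Back-of-the-envelope sketch of the quoted Amazon prices:
# $20/week for a machine instance, $20/week for ~100 GB of bandwidth.
instance_per_week = 20.00
bandwidth_per_week = 20.00   # buys roughly 100 GB of transfer

weekly_total = instance_per_week + bandwidth_per_week   # $40/week
yearly_total = weekly_total * 52                        # $2,080/year
```

So a single sponsored instance plus 100 GB/week of downloads is on the order of $2,000 a year, which is the kind of sum a team could plausibly split.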

I see the number of BOINC projects increasing over time. Does it really make any sense to have them ALL purchase hardware and bandwidth sufficient to meet the PEAK demand on their servers?

These Amazon services could be set up in a manner that offloads much of the projects' server burden onto the Amazon network of servers. And if all of the machine instances were shared, rather than dedicated to a specific project, then you should be able to make a big dent in the startup and expansion costs for all BOINC projects pretty quickly.

For example, right now SETI is looking for hardware (and financial) donations, and has a history of blowing out power and servers on their host campus. Rosetta just purchased a rather expensive SAN file server to keep up with its growing number of participants. These upgrades might not be required if it were easy for participants to sponsor a server for a week, or to donate 100 GB of download bandwidth to their favorite project.

Using these Amazon services would:
1) reduce floor space at project site
2) reduce power required & electricity costs at project site
3) reduce climate control costs at project site
4) allow even small BOINC projects to scale on demand
5) reduce internet bandwidth requirements of project
6) provide more consistent response time to participant machines
7) make it easy for participants to help directly sponsor some of the project costs
8) allow the various BOINC projects to pool their resources to essentially make a virtual large server complex, which they all share as needed to run their projects.
9) let project staff focus on a limited hardware configuration (the basic science components), while the rest of the scaling and high availability is maintained by Amazon.

I'm curious to hear your comments about the idea, and any info you might have on other approaches like this that people and projects have already attempted in the past.

I know there are a number of hurdles to cross about controls and making sure everyone gets their fair share of resources etc. But I want to first focus on the merits of the idea itself. I think there are some basic limits and controls that could be put in place to assure fairness.

Another idea is to set things up so that teams could buy the server time to crunch projects, but that is a topic for another thread :)
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 47303
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 47304 - Posted: 2 Oct 2007, 1:16:01 UTC
Last modified: 2 Oct 2007, 1:16:41 UTC

Whoops, I almost forgot. For those that want more specifics on this Amazon stuff, here are some links:
Amazon internet storage space (called S3)
Amazon host machines (called EC2)

I'm not trying to promote Amazon, but the service offerings and the "pay only for what you use" model seem like a great fit to me.
ID: 47304
Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 47323 - Posted: 2 Oct 2007, 13:49:03 UTC

Maybe it is just the accountant in me, but the cost for a machine with these specs (1.7 GHz, 1.75 GB of RAM, etc.) comes to $1,040 a year (20.00 * 52 = 1,040).

This is without a monitor, and you need to connect through another PC, so corporate, personal, or project bandwidth costs must still be maintained. A quad-core 2.8 GHz PC with 4 GB of memory only costs $650.00 without the monitor. Add $85.00 a year for electricity and you still save $305.00 in the first year (650.00 + 85.00 = 735.00; 1,040.00 - 735.00 = 305.00).

So for less cost you get four times the processing power, and after a year you get to keep using it. The only cost for the rest of the PC's estimated life, at least another three years, is the $85.00 a year for electricity.
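The comparison above can be written out as a small Python sketch. The $20/week rental, $650 purchase, and $85/yr electricity figures come from this thread; the 4-year machine life is Jmarks' estimate:

```python
# Rental-vs-ownership comparison using the figures from this thread.
# Assumed: $20/week rental, $650 PC purchase, $85/yr electricity.
RENTAL_PER_YEAR = 20.00 * 52          # $1,040.00
PC_PURCHASE = 650.00
ELECTRICITY_PER_YEAR = 85.00

def total_costs(years):
    """Return (rental_total, ownership_total) after the given years."""
    rented = RENTAL_PER_YEAR * years
    owned = PC_PURCHASE + ELECTRICITY_PER_YEAR * years
    return rented, owned

# First-year saving from owning: $1,040.00 - $735.00 = $305.00
first_year_saving = total_costs(1)[0] - total_costs(1)[1]
```

Over the estimated four-year life the gap widens: `total_costs(4)` gives $4,160 rented versus $990 owned.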
Jmarks
ID: 47323
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 47329 - Posted: 2 Oct 2007, 16:48:54 UTC

No, the idea is that much of the project bandwidth would be offloaded to the Amazon internet storage, so that cost would make up the majority of the expense, not the $1,000/yr for a server.

You've not assigned any value to the maintenance and monitoring of that server. And if the hardware an instance runs on crashes, you just fire up a new one; it is Amazon that has to buy the replacement.

My thought was that if you get a collection of 5 or 10 servers running 24/7, you might be able to offload some 30% of the work done by Rosetta, SETI and Einstein combined. And each project would be much more scalable than it is at present.

You could also easily add more servers for brief periods of peak demand. For example, you could fire up 3 extra servers for a day, at a cost of under $10, and shut them down once the peak load has been served.
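Using the $20/week instance rate quoted earlier, the burst cost works out like this. This is a rough sketch; prorating the weekly rate to a daily one is an assumption about how the billing would apply:

```python
# Rough burst-cost estimate, assuming the $20/week instance rate
# quoted earlier prorates to a daily rate (an assumption).
DAILY_RATE = 20.00 / 7        # ~ $2.86 per instance-day
extra_servers = 3
burst_days = 1

burst_cost = DAILY_RATE * extra_servers * burst_days   # ~ $8.57
```

That lands just under the $10 figure above, which is the point: short bursts of extra capacity cost pocket change compared with owning standby hardware.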

So the economy of scale shows up when you start to picture the idle server time across every BOINC project. When you combine things, you can squeeze out a significant portion of that idle time and use it for other projects.

In the end, at present, each project has to build up infrastructure to handle their peak demand, which probably only exists for a few days per month.

I'm sure the Amazon specs and pricing will change with time. Right now they only offer one fixed configuration. In the future, I'm sure they will offer servers with higher bandwidth and/or higher clock speeds.
ID: 47329
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 47336 - Posted: 2 Oct 2007, 19:59:12 UTC

It would be even better if there were distributed servers for BOINC.

I know when we did FaD (before Rosetta@Home), all results, bandwidth, etc. were distributed via servers around the globe (three or four, IIRC). A client would normally try the closest one and, if that was down, try the others.
This allowed the main project servers to go down for updates or repairs while results could still come in...
The project could then collect the results when it liked.

Worked well.
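The mirror-failover scheme described above amounts to walking a preference-ordered list and falling back on failure. Here is a hypothetical sketch; the mirror names are made up, and in practice `probe` would be a short-timeout HTTP request:

```python
# Hypothetical sketch of the FaD-style mirror failover: try mirrors
# in order of preference (closest first), fall back when one is down.
# The mirror hostnames below are invented for illustration.
MIRRORS = [
    "http://mirror-eu.fad.example.org",    # closest (hypothetical)
    "http://mirror-us.fad.example.org",
    "http://mirror-asia.fad.example.org",
]

def first_available(mirrors, probe):
    """Return the first mirror for which probe(url) does not raise OSError."""
    for url in mirrors:
        try:
            probe(url)        # e.g. a HEAD request with a short timeout
            return url
        except OSError:
            continue          # mirror unreachable; try the next one
    raise RuntimeError("all mirrors are down")
```

The project servers can then pull completed results from whichever mirror accepted them, which is what let FaD take its main servers down without losing uploads.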


The other distributed servers were just paid-for hosts, so it's not really a new idea.
Team mauisun.org
ID: 47336
Natronomonas

Joined: 11 Jan 06
Posts: 38
Credit: 536,978
RAC: 0
Message 47354 - Posted: 3 Oct 2007, 12:36:34 UTC - in response to Message 47336.  

It would be even better if there were distributed servers for BOINC.

I know when we did FaD (before Rosetta@Home), all results, bandwidth, etc. were distributed via servers around the globe (three or four, IIRC). A client would normally try the closest one and, if that was down, try the others.


It would be nice, but since there are now many BOINC projects, I guess the redundancy is in the project diversity rather than within any one project; as long as people are running at least two projects, they should see pretty high uptime (it's unlikely two of the big projects would go down at once).

I guess the cost/benefit of maintaining the extra servers wasn't worth it.

I'm sure the BOINC project people have looked at options like Amazon. However, they may be getting their power at reduced rates, or the equipment tax-free, or there may be any number of other factors that skew the economics. Still, for the smaller projects especially, it might be a good way to 'test the waters'.
Crunching Rosetta as a member of the Whirlpool BOINC Teams
ID: 47354




©2024 University of Washington
https://www.bakerlab.org