Have any idle server bandwidth?

Message boards : Number crunching : Have any idle server bandwidth?



Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52113 - Posted: 25 Mar 2008, 16:23:31 UTC
Last modified: 25 Mar 2008, 16:25:40 UTC

I've devised a means of extending the volunteer computing concept. Rather than just using the idle time on your home computer, you can now make use of idle bandwidth on an internet server.

It has been incredible how much bandwidth prices have dropped this past year. I have a shared hosting service on a 2-year subscription that provided the following:
1GB of disk space
5GB/month of download bandwidth.

Without any increase in price or even a contract renewal hitting yet, they extended my shared host server to:
150GB of disk space
1,500GB/month of bandwidth!
I feel like I've gone from the kiddie pool to the ocean! That's 1.5TB of bandwidth!

...anyway, it takes about 10TB per month to run Rosetta. What I've done is set up a rather basic set of PHP programs that can perform the role of file downloads and take that burden off the project's servers.

I'd like to get a few other people with similar unused bandwidth on existing servers together to form a group of systems that share the download work. If you have >100GB/month of surplus bandwidth and >1GB of storage available on a publicly addressable server, please send me a PM. We can specifically meter the usage if needed, but I don't think it is practical to try to use hourly metered services; there is too much overhead involved in assuring you don't exceed the limits and in determining which servers still have bandwidth available.

If you are interested in getting a shared hosting service for other purposes, and could contribute a portion of it to the effort, I know of some available for as little as $3.00USD/month that have 1TB/month of bandwidth.

If you have a server behind a firewall, or would like to host the bandwidth used by your BOINC team, I also have PHP code that allows you to use such a server to dramatically reduce the internet bandwidth required to run the machines participating in the project on your local LAN. Each file is downloaded only once for all of your local machines. If you are interested in further details, please send me a PM.

My goal is to assemble contributions of 10TB/month of bandwidth capacity. This won't all be used right away, but it will then free up the project servers to accommodate another 70 TFLOPS of client computing power!

I'm setting aside a portion of my available capacity for other projects, and to leave some capacity to divide the work amongst the resulting group of servers. But I'll commit to 1TB per month of downloads.

Bandwidth committed to date: 1TB/month. GOAL: 10TB.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 52113
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52128 - Posted: 26 Mar 2008, 13:17:35 UTC - in response to Message 52113.  
Last modified: 26 Mar 2008, 13:25:35 UTC

ok, i'm pretty certain i'm missing something...

1TB @ $3/mo * 10 = 10TB @ $30/mo = 70Tflops for Rosie?

i know i can't be right, because for 70tflops, wouldn't The Project be able to come up with $30/mo on their own?

anyway, i don't currently lease any shared host / server bandwidth, but i'd be willing to "donate" $3/mo to get some for you...

just in time for CASP!


...anyway, it takes about 10TB per month to run Rosetta.

...I know of some available for as little as $3.00USD/month that have 1TB/month of bandwidth.

But will then free up the project servers to accommodate another 70 TFLOPS of client computing power!
ID: 52128
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52131 - Posted: 26 Mar 2008, 14:52:19 UTC

All I am saying is that the project currently uses roughly 10TB per month to support the hosts that are currently producing the 70TFLOPs. I don't mean to imply that the servers I am looking for would be contributing any TFLOPs, they'd just be supporting the infrastructure required to do so... you still have to recruit more hosts to get the TFLOPs. ...yes, just in time for CASP, that is the idea.

I'm sure they can come up with $30/month (actually only $10 would do it; you get more than double the bandwidth when you double the $3 minimum account). In fact, they would just add another server to the 3 they already use. But it would still be within the UW campus, running through the same power substation, the same ISP, etc.

Global diversity is better. For example: it is better to serve files to Europe from a server in Europe. And if the existing 3 Rosetta servers are hitting capacity limits, then adding a 4th will just try to push more out a pipe that is already full.

It takes some time to coordinate a globally diverse pool of systems. I've devised a way for me to take that time off their hands. Getting such a pool set up will also provide a working environment for corporate eyes to see and study and provide them a way to run BOINC that affords them the controls that they require. Thus bringing more uses to Rosetta! Seeing is believing.

I appreciate the offer to provide funds rather than servers. I would be willing to handle the logistics of setting it up and to perform admin of it. In order to assure that such a server is around for the foreseeable future, I would probably structure it with a group of 5 or 10 people funding it at $25 per year. That way the server is not dependent on funding from any one person.

I would prefer not to solicit money; it potentially complicates things... but would this avenue of providing funding rather than bandwidth be more viable for others as well?
ID: 52131
The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 52132 - Posted: 26 Mar 2008, 15:13:55 UTC - in response to Message 52131.  
Last modified: 26 Mar 2008, 15:42:59 UTC

Understood that servers sought are to support infrastructure, not "crunch" for tflops.

No "hands-on experience" with this, and as far as i'm concerned, you're a trusted member and i feel comfortable donating $25/yr to this experiment, to assist you with obtaining the necessary resources and seeing what happens.
ID: 52132
AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 52137 - Posted: 26 Mar 2008, 23:49:11 UTC

My first thought is that the Rosetta team needs to be able to certify that the files being downloaded haven't been tampered with. Any server they use must be controlled by either themselves or someone they trust.

My second thought is that some of these low cost bandwidth providers may use a business model where they promise huge bandwidth knowing that 99% of their users won't use anywhere near that much, then they throttle or cut off the 1% of users who do try to use the bandwidth. I've seen that sort of thing in the past.
ID: 52137
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52143 - Posted: 27 Mar 2008, 9:22:54 UTC

I don't see what this is all about. What files are you talking about here? Not wu's obviously. If you are talking about stats stuff, then most teams hold their own copies for their team stats anyway.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 52143
dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,569,893
RAC: 58,602
Message 52146 - Posted: 27 Mar 2008, 12:18:37 UTC - in response to Message 52143.  

I don't see what this is all about. What files are you talking about here? Not wu's obviously. If you are talking about stats stuff, then most teams hold their own copies for their team stats anyway.

The Rosetta files - the servers would act as a proxy and only download those files from the main R@H servers if they don't already have a local copy. The proxy would then have a local copy of almost all of the required files and so there would be minimal requests to the main R@H servers and therefore minimal load.
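A minimal sketch of such a pull-through cache, in Python for brevity (Feet1st's actual code is PHP, and the upstream URL and cache path here are purely illustrative):

```python
import os
import urllib.request

# Illustrative locations only; not the project's real layout.
UPSTREAM = "https://boinc.bakerlab.org/rosetta/download"
CACHE_DIR = "/var/cache/rosetta-proxy"

def fetch(filename, cache_dir=CACHE_DIR, upstream=UPSTREAM):
    """Serve a file from the local cache, touching the project
    server only on a cache miss."""
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):              # cache miss: one upstream download
        os.makedirs(cache_dir, exist_ok=True)
        with urllib.request.urlopen(f"{upstream}/{filename}") as r:
            data = r.read()
        with open(path, "wb") as f:           # later requests are served locally
            f.write(data)
    with open(path, "rb") as f:
        return f.read()
```

Since every host in a batch needs the same handful of files, almost all requests after the first are cache hits.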
ID: 52146
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52148 - Posted: 27 Mar 2008, 14:38:04 UTC
Last modified: 27 Mar 2008, 14:41:43 UTC

Hi all, yes dcdc has used a prototype of my full-blown server. It actually handles the scheduler requests as though it were a project. It does so by passing the requests through to the Rosetta server, and then handling any file downloads that result. And this is how a corporate user may be afforded the controls and bandwidth reductions that might allow them to deploy Rosetta across an enterprise.

The idea here is somewhat simpler. When your host is assigned tasks, it is sent a list of files it must download to complete the task. In the case of Rosetta, all of these files are the same for an entire large set of tasks. None of them are unique to your host. What is unique is just the WU name and the command-line parameter that sets the random number seed, which defines exactly which models you will be crunching. But the command line is not in the files; it is in the scheduler reply (along with your runtime preference). The scheduling would still all be done directly by the project as it is now, without my full-blown proxy in between.

So your client is sent a list of files to get and a list of servers that host them. The client then goes down the list, and if one server fails to serve the file, for almost any reason, it tries the next server in the list. This is how the BOINC clients and servers recover after outages, and if a server is too busy to complete your request, the client tries the next one: a very simplistic method of load balancing.
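That walk down the mirror list can be sketched like this (a simplified illustration, not the BOINC client's actual code; the timeout value and error handling are assumptions):

```python
import urllib.request
import urllib.error

def download_with_failover(filename, mirrors):
    """Try each mirror in order: if one server fails for almost any
    reason, fall through to the next one in the list."""
    last_error = None
    for base in mirrors:
        try:
            with urllib.request.urlopen(f"{base}/{filename}", timeout=30) as r:
                return r.read()
        except (urllib.error.URLError, OSError) as e:
            last_error = e          # server down or overloaded: try the next
    raise RuntimeError(f"all mirrors failed for {filename}") from last_error
```

Putting a volunteer server at the head of the list therefore costs nothing in reliability: if it is down, the client simply falls back to the project's own servers.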

So, the idea here is just to add another server to the top of the list of servers that can serve the file. This is done simply by adding a line in a project configuration file in the Baker lab, so they would have to trust the system before doing so. That new server will then distribute all of the load across a field of volunteer servers. This central control allows better optimization and scattering of the load. And yet, if that central hub should go offline for any reason, the client machine will just proceed down the list to the next server (which would be back to one of the original project servers). So, overall availability is enhanced.

As the BOINC client requests a file, it can process a redirect from the file server. So my central host would send redirects to enforce limited bandwidths, and to balance the load across the field, in addition to serving some of the files itself. This redirect load is why I reserved a portion of my host's bandwidth.
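One way the hub's weighted redirect choice might look (a sketch; the mirror names and bandwidth figures are made up):

```python
import random

# Illustrative mirror pool: (base_url, contributed bandwidth in GB/month).
MIRRORS = [
    ("http://mirror-a.example.net/rosetta", 1000),
    ("http://mirror-b.example.net/rosetta", 500),
    ("http://mirror-c.example.net/rosetta", 100),
]

def pick_mirror(filename, mirrors=MIRRORS, rng=random):
    """Choose a redirect target with probability proportional to each
    server's contributed bandwidth, so load matches capacity."""
    total = sum(bw for _, bw in mirrors)
    r = rng.uniform(0, total)
    for base, bw in mirrors:
        r -= bw
        if r <= 0:
            return f"{base}/{filename}"
    return f"{mirrors[-1][0]}/{filename}"   # numeric safety fallback
```

The hub answers each download request with an HTTP redirect to the chosen mirror, which is far cheaper than serving the bytes itself.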

Yes, authentication of the files sent is essential, and in fact it is already performed by your client. The files are sent with digital signatures that assure none of the files have been tampered with, and also confirm that the signature was produced by the project. So any tampering with, or corruption of, the file or the signature will be detected at the client.
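The verify-before-trust flow has roughly this shape. Note this is a standard-library stand-in using an HMAC; BOINC really uses public-key (RSA) signatures made with the project's private key, so only the shape of the check carries over, not the cryptography:

```python
import hashlib
import hmac

# Purely illustrative key; in the real scheme the client holds only the
# project's PUBLIC key and cannot produce signatures itself.
PROJECT_KEY = b"illustrative-key"

def sign(data: bytes) -> str:
    return hmac.new(PROJECT_KEY, data, hashlib.sha256).hexdigest()

def verify_download(data: bytes, signature: str) -> bool:
    """Reject the file if either its contents or its signature
    were altered in transit from an untrusted mirror."""
    return hmac.compare_digest(sign(data), signature)
```

This is why untrusted volunteer mirrors are safe to use: a mirror can refuse to serve a file, but it cannot forge one that the client will accept.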

As for the limits of shared hosting services, yes I totally agree that the published bandwidths may not actually be achievable. This is why I would much prefer to use a pool of about a dozen servers, and distribute work to each in proportion to the contributed bandwidth. If I accept contributions and purchase a single server with the money, I have not fully achieved the desired diversity.
ID: 52148
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52151 - Posted: 27 Mar 2008, 16:47:01 UTC

It would seem that there is a design issue if the same files are being sent over and over again to the same host. That aside, has the project a current problem, or is this a solution looking for a problem?
ID: 52151
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52152 - Posted: 27 Mar 2008, 18:08:17 UTC

I didn't say they were sending the same file to the same host. I meant that when they create a batch of 10,000 WUs or whatever, the roughly 6 or 8 files specific to that batch are required by all hosts that process any of the 10,000 WUs.

Having said that, statistics from my full proxy server show a given host may request the same file several times in the same month. This relates to how BOINC works. Once no uncompleted WU on your machine requires a file any longer, it is removed. Then when you get more work down the road, it may be from a batch that needs the file again, and so it is downloaded again. When it's the 3MB file, it's certainly sub-optimal to download it more than once. On the other hand, keeping it around on your host would consume more disk space.

BOINC offers other approaches to handling files, but using them would require more CPU time on every scheduler request. I think a compromise solution is also available: set an expiration date on the files. That way they don't have to be reported on each RPC, and the server doesn't have to determine whether a delete request for every given file should be included in the scheduler reply.
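The expiration-date compromise could be as simple as the following sketch (the cache representation is illustrative; real BOINC file-info records differ):

```python
import time

def expired(cache, now=None):
    """Given {filename: expires_at_epoch}, return the files whose
    expiration date has passed. The client can delete these on its own,
    without the server sending per-file delete requests on every RPC."""
    now = time.time() if now is None else now
    return [name for name, expires_at in cache.items() if expires_at <= now]
```

Batch files would get an expiry a little past the batch's deadline, so hosts crunching the batch keep them, and everyone else drops them automatically.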

I'm not positive how much of the campus internet capacity is being consumed by, or is available to, Rosetta. Given the prices I am seeing now for high-capacity shared hosts, perhaps 10TB/month ain't what it used to be. I will have to wait for the project team to comment to be sure.
ID: 52152
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52153 - Posted: 27 Mar 2008, 19:05:19 UTC
Last modified: 27 Mar 2008, 19:14:35 UTC

I think there are a couple of issues. One is that, as you say, the savings you are seeing on bandwidth are being seen by everyone, including, and probably more so, academic institutions, who always get good deals anyway.

Using a torrent server has been suggested several times if there is a real problem, but none of the projects have considered that worthwhile.

The file deletion/disk space issue has the same trend. I have 160GB drives in both of my big quads; these are mostly empty space, but I doubt I could economically buy disks that small any more. There are now several 1TB drives around, and the price per GB is falling through the floor. If I had to store a 3MB file for a month extra, or a 3GB file for a year...

The biggest problem I've seen facing the projects (and I crunch a lot of them, and am active on a number of their boards/support groups) is database throughput.
ID: 52153
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52166 - Posted: 28 Mar 2008, 18:08:49 UTC
Last modified: 28 Mar 2008, 18:17:07 UTC

Here's something I've been thinking about, which may be of interest and worthy of pursuit.

When I grab the XML stats files for a project for my team's stats, I download megagobs of data when I'm only interested in the small fraction of it pertaining to my team. My team is not the largest team in the world, but it is nicely placed in the world top 50, with over 100,000,000 credit from our little population. Even so, most of the data I grab is junked.

Perhaps if there were a reliable server network that filtered the XML on a per-team basis, such that a simple HTTP query to it was sure of getting a reply with current data, small team stats sites would be able to reduce their overheads.

Small teams on small servers, with lowly connections and perhaps not the sharpest hardware, would benefit enormously, as they would not have to download many GB and then process it.
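The per-team filtering itself is cheap once it runs server-side; a sketch (the element names are illustrative, and real BOINC stats dumps differ in detail):

```python
import xml.etree.ElementTree as ET

def team_slice(stats_xml: str, team_id: str) -> str:
    """Keep only the <user> records belonging to one team, discarding
    the bulk of a project-wide stats dump before it crosses the wire."""
    root = ET.fromstring(stats_xml)
    out = ET.Element("users")
    for user in root.iter("user"):
        if user.findtext("teamid") == team_id:
            out.append(user)
    return ET.tostring(out, encoding="unicode")
```

A stats mirror running this once per dump could answer each team's query with kilobytes instead of the full multi-gigabyte export.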

I don't really need it, but if it existed, I'd use it.

Fewer people hitting the bandwidth at the projects. More stats sites attracting crunchers to BOINC?
ID: 52166
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 52175 - Posted: 29 Mar 2008, 21:36:18 UTC

Feet1st
I think the integration of BitTorrent into the client would be better for server load and bandwidth reduction (hey, the TV 'on demand' systems in the UK all do it, so why not BOINC). I remember it was going to be an opt-in option... but I've not read anything BOINC-related for quite some months now, so I don't know if they have got around to adding it yet.

adrian
Isn't that what the RPC calls to the stats are for?
You can just request what you need instead of getting all of it.




Team mauisun.org
ID: 52175
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52176 - Posted: 29 Mar 2008, 23:07:56 UTC

To my knowledge, BitTorrent has not yet been implemented by BOINC, but it is still on the board. And in order for it to save any bandwidth, you have to have the identical file being downloaded by multiple hosts at the same time. I'm still not clear on what the odds of that happening actually are.

And I've not seen much documentation describing the details of BitTorrent, but it would seem to me that your machine would have to permit an incoming internet connection, and almost no one runs BOINC on such a server. And if it doesn't work that way, then you would have to send the file to a central site where it gets re-sent to one specific requesting client. So now the "improved" download of the file travels across the internet twice. Seems rather wasteful. Perhaps someone can enlighten me on the mechanics of the dataflows of BitTorrent.

The advantage the TV folks have is the relatively small number of channels people can watch (as compared to all the files in a project's download directory), and the fact that the data (the TV signal) is sent in realtime, i.e. you don't have to begin the program again when a new person turns on their TV set (except perhaps for the on-demand type of movies). Are you saying the movies use some sort of peer-to-peer system? That would seem to quickly lend itself to people stealing the movies.
ID: 52176
dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,569,893
RAC: 58,602
Message 52177 - Posted: 29 Mar 2008, 23:50:24 UTC

This was my suggestion back in '06:

http://boinc.berkeley.edu/dev/forum_thread.php?id=1366
ID: 52177
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52179 - Posted: 30 Mar 2008, 8:07:04 UTC

@Fluffy Chicken.

I use the RPCs as well as the XML files. The RPC interface is not as standard as the XML, however. In this thread on the BOINC site I provide an example of such. There are also many items that are directly available in the XML files but not included in any of the RPCs.

In this thread is an example of access to the RPCs actually being suspended, albeit through lack of knowledge more than anything else.

I would like a one-stop reliable network for good raw stats data that small teams could easily integrate into their existing websites without the need to parse hundreds of megabytes of what is essentially junk to them.

I am not a terribly competitive person; that is why my portfolio of projects is so broad and features some notoriously bad "payer" projects. Within my own team, however, I can see there are several members who simply crunch the high-paying projects, and the stats pages on our site are the most popular ones. I think if this could be easily developed, more people might be drawn to those teams, and to BOINC in general: a win-win situation.
ID: 52179
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 52180 - Posted: 30 Mar 2008, 9:09:06 UTC

lol, I've got some reading to do to catch up on happenings, but they will add needed RPC calls if you want them. I remember some were added at the request of the TeamDOC chaps (if they are still going). Not sure what additional information you want, but TeamDOC managed to get pretty much every stat I could think of, quickly and easily, for a team, excl. known problem servers. ( www.boincteams.com but I've not visited there for a while either )
I think the problem is projects just not keeping up with server developments, and that will be a problem using either XML or RPC calls.

What you say would be nice, but then weren't the stats-aggregating sites supposed to be for that purpose? e.g. boinc.netsoft-online.com, and it has (had) RPC calls, though from a quick look the site seems to be individual-based.
ID: 52180
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 52181 - Posted: 30 Mar 2008, 9:21:43 UTC - in response to Message 52176.  

To my knowledge, BitTorrent has not yet been implemented by BOINC, but it is still on the board. And in order for it to save any bandwidth, you have to have the identical file being downloaded by multiple hosts at the same time. I'm still not clear on what the odds of that happening actually are.

And I've not seen much documentation describing the details of BitTorrent, but it would seem to me that your machine would have to permit an incoming internet connection, and almost no one runs BOINC on such a server. And if it doesn't work that way, then you would have to send the file to a central site where it gets re-sent to one specific requesting client. So now the "improved" download of the file travels across the internet twice. Seems rather wasteful. Perhaps someone can enlighten me on the mechanics of the dataflows of BitTorrent.

The advantage the TV folks have is the relatively small number of channels people can watch (as compared to all the files in a project's download directory), and the fact that the data (the TV signal) is sent in realtime, i.e. you don't have to begin the program again when a new person turns on their TV set (except perhaps for the on-demand type of movies). Are you saying the movies use some sort of peer-to-peer system? That would seem to quickly lend itself to people stealing the movies.



Bittorrent works by a central server having all the files (aka seeder, here it would be r@h).
Everyone then gets the files from here, until a portion of it is downloaded, they then connect to all the other users downloading the files and grab from as many as possible to make there download faster, they then let others get the part files from them. This, once running, drops the original seeders file transfers a lot. All new files will still need to come from r@h till they are at at least one other server. 'Tasks' would still need to come from the r@h servers, but the support files would not, e.g. the main application, the other standard initial files and the other files you keep downloading over and over again that are part of the tasks (but actually the same file).

The other situation this helps is on LANs, since it works just the same in a LAN setup; it's even possible to request from the LAN first before going to others.

---

The TV on-demand is not actual VoD-style, but shows that have been aired and then put on a site for download; the major UK TV people do this with some naff Kontiki software for their download service. This is P2P software where you download, say, Doctor Who, and then when others download the same programme they grab parts of the file from everyone else who has downloaded Doctor Who (in a similar way to how BitTorrent works). I also believe many of the US TV companies use Kontiki as well for their 'on demand' downloads.

It's not video streaming, though that is possible in the same way.
ID: 52181
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 52182 - Posted: 30 Mar 2008, 9:30:49 UTC

P.S. I'm not saying what you're trying to do is not a good idea.

It's just that I'm not sure the R@H servers actually need it; I've not seen them have any problems.

It would be useful for small 'home run' projects; BitTorrent-style transfers would just be better, as they could accommodate all.

this is useful though:

If you have a server behind a firewall, or would like to host the bandwidth used by your BOINC team, I also have PHP code that allows you to use such a server to dramatically reduce the internet bandwidth required to run the machines participating in the project on your local LAN. Each file is downloaded only once for all of your local machines. If you are interested in further details, please send me a PM.

...and something you should pursue, though if BitTorrent were implemented properly everyone could do it without thinking.


The BitTorrent protocol (though think of it as the concept, not how it is actually implemented):
http://en.wikipedia.org/wiki/BitTorrent_(protocol)
ID: 52182
adrianxw
Joined: 18 Sep 05
Posts: 652
Credit: 11,662,550
RAC: 1,151
Message 52183 - Posted: 30 Mar 2008, 11:24:28 UTC
Last modified: 30 Mar 2008, 11:27:34 UTC

but they will add needed RPC calls if you want them

They may well do so, but at what priority level? The question I asked about the existing RPC has been there since last December and nobody has answered it.

I am subscribed to the dev and stats mailing lists. Dev is really busy with BOINC 6 right now; I can't imagine anyone dallying with RPCs at the moment. It would also take a good deal of work. The existing RPCs are okay for getting team and user info, but not hosts, and there are a lot more hosts than either of the other categories.

I looked over boincteams when it started up. It never seems to have much traffic (apart from team ads!). I think it is a good idea.
ID: 52183




©2024 University of Washington
https://www.bakerlab.org