Am I causing the bandwidth problems? Are you?

Message boards : Number crunching : Am I causing the bandwidth problems? Are you?

Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 78888 - Posted: 8 Oct 2015, 2:02:55 UTC

Notwithstanding explanatory commentary from the actual scientists, right now I feel that the most reliable way to contribute to rosetta@home is to run ONLY the rb projects. At least I have a good chance of completing those work units and thus providing useful results. How many of you have reached the same conclusion?

That means I routinely abort non-rb projects before they can waste any running time--but even if I catch them during the original download, it seems the data gets downloaded anyway. (I'm pretty sure the data downloads to client machines are much larger than the uploaded results. Most of the data is presumably discarded after use and the uploads are almost negligible.) If bandwidth is their bottleneck resource, then people like me are probably making a bad situation worse...

Just one specific example of the problem: Two FKRP units managed to get started on one of my machines yesterday. I looked at their statuses at the end of the day when I had to shut down the machine. As confirmed the next morning, one of them lost at least 20 minutes of work since the last checkpoint, and the other lost 15 minutes. Those losses were actually on the small side for my observations of FKRP work units, but in general each time I turn off a machine about 15 minutes of rb work is lost. (Yes, sleeping the machines is an option, but there are reasons I rarely use that approach.)

There are various constructive suggestions, but I doubt it is worth discussing them here, since I would wager that most of them are out of the scope of rosetta@home. The most definitive solution would be P2P network caching of the bulk of the data, where the BOINC clients would first try to find a copy of the data from some other client that already has it... (That's based on a belief that a lot of the data is analysed several times in different ways.)
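
A rough sketch of the peer-first lookup described above, for illustration only. Nothing here is a BOINC or rosetta@home API: `fetch_with_peer_cache`, the dict-based "peers", and the `fetch_from_server` callback are all hypothetical names, and the sketch assumes input files can be identified by a SHA-256 hash so that a copy from an untrusted peer can be verified before use.

```python
import hashlib
from typing import Callable, Iterable, Mapping, Tuple


def fetch_with_peer_cache(
    file_hash: str,
    peers: Iterable[Mapping[str, bytes]],
    fetch_from_server: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (data, source), trying peers before the project server.

    Each peer is modelled as a mapping from file hash to file bytes
    (a hypothetical stand-in for a real network lookup).
    """
    for peer in peers:
        data = peer.get(file_hash)
        # Accept a peer copy only if its content matches the requested hash.
        if data is not None and hashlib.sha256(data).hexdigest() == file_hash:
            return data, "peer"
    # No peer had a valid copy, so pay the server bandwidth cost.
    return fetch_from_server(file_hash), "server"
```

In this sketch the server is contacted only when no peer holds a valid copy, which is where any bandwidth saving would come from. Whether it would help in practice depends on how often the same input data is actually reused across work units, which is only a belief on my part.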

A less definitive approach that should be within the scope of the rosetta@home staff would be to slack up on the deadlines. Way up, if you ask me. I strongly believe they could save a lot of bandwidth with a little patience.
ID: 78888 · Rating: 0
Profile Timo
Joined: 9 Jan 12
Posts: 185
Credit: 45,644,940
RAC: 71
Message 78889 - Posted: 8 Oct 2015, 4:18:33 UTC

I have 4 different machines of different configurations all crunching Rosetta work of various types, and I have not had any WUs failing consistently for a while now. (The last one was a known issue with a certain type of work unit, which was subsequently pulled from the project by the project admins.)

I'm not sure why you would abort all WUs that aren't 'rb_'. Indeed, this seems like a waste of bandwidth and could ultimately lead to an unreported result that would otherwise have been the lowest energy structure for that given simulation (once a certain WU has failed twice it is not sent out again, even if both of those failures were people force-aborting said WU).

I'd encourage anyone to not abort WUs. If you have issues with runtimes and checkpoints, maybe look at decreasing your target runtime as an alternative.
ID: 78889 · Rating: 0
Profile shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 78894 - Posted: 11 Oct 2015, 1:56:10 UTC

Well, obviously part of the problem is that I don't care that much about rosetta@home and do not want to spend a lot of time or energy figuring it out. I intended that comment to be a new thread on the topic, but failed. Not caring, I can't see that it's worth the effort, but... I'll go ahead and try again to start it as a new thread and attempt to clarify the new version to also address your [Timo's] comment.
ID: 78894 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org