Discussion- Work Unit Distribution

Message boards : Rosetta@home Science : Discussion- Work Unit Distribution

1 · 2 · Next

tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15257 - Posted: 2 May 2006, 10:06:28 UTC
Last modified: 2 May 2006, 10:10:15 UTC

(2) After going through the code today, I think we can reduce the memory requirements for the larger proteins by at least 25%; I hope to make progress on this front this week. In addition to easing the burden on lower memory machines, this may help to reduce some of the remaining low frequency errors.


Great news! Rhiju mentioned here that one can define a minimum memory requirement for certain WUs. To ease this problem further, I recommend sending out large WUs only to machines with >=512 MB RAM. I think there are enough machines out there which meet these specs.
ID: 15257 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15271 - Posted: 2 May 2006, 12:15:47 UTC - in response to Message 15257.  
Last modified: 2 May 2006, 12:24:25 UTC

... To ease this problem further, I recommend sending out large WUs only to machines with >=512 MB RAM. I think there are enough machines out there which meet these specs.


While a number of people keep bringing up this idea, there is currently no way to do this. It is not as simple as it seems. The requirement is for 512 MB per CPU, so a system that has 1024 MB but 4 CPUs is not really compatible with the large work units. And since the number of CPUs used is user-adjustable, the decision to send a work unit to a particular machine would have to be made on the fly each time a work unit is sent. The server load from this extra sorting would not be trivial.
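The per-CPU arithmetic behind this objection can be sketched in a few lines (a toy model; the function names and structure are invented for illustration and this is not actual BOINC scheduler code):

```python
# Illustrative sketch of the per-CPU memory constraint described above.
# Names are hypothetical; this is not actual BOINC scheduler code.

def per_cpu_ram_mb(total_ram_mb: int, cpus_in_use: int) -> float:
    """RAM available to each concurrently running task."""
    return total_ram_mb / cpus_in_use

def can_run_large_wu(total_ram_mb: int, cpus_in_use: int,
                     requirement_mb: int = 512) -> bool:
    # The requirement applies per CPU, not in total: a 1024 MB machine
    # running 4 tasks at once offers each task only 256 MB.
    return per_cpu_ram_mb(total_ram_mb, cpus_in_use) >= requirement_mb

print(can_run_large_wu(1024, 1))  # True: a single task gets all 1024 MB
print(can_run_large_wu(1024, 4))  # False: each task gets only 256 MB
```

Because the number of CPUs in use is a user preference, the right-hand side of this check can change between scheduler requests, which is the on-the-fly decision the moderator refers to.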

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15271 · Rating: 0
Cureseekers~Kristof
Joined: 5 Nov 05
Posts: 80
Credit: 689,603
RAC: 0
Message 15288 - Posted: 2 May 2006, 13:54:06 UTC

Isn't it easier to make a new option in your preferences?
"Allow large workunits (uses more memory)"


Member of Dutch Power Cows
ID: 15288 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 15294 - Posted: 2 May 2006, 14:26:26 UTC - in response to Message 15288.  

Isn't it easier to make a new option in your preferences?
"Allow large workunits (uses more memory)"


The problem is that bigger proteins and/or more complex algorithms (e.g. full atom relax) require more memory.

A small PC with 256 MB RAM, receiving a CASP WU which requires 250 MB to run, will start swapping to disk, degrading performance and upsetting its owner.

So it needs to be handled by the BOINC server code, which would only send big WUs to hosts capable (>512 MB, 24/7, leave-in-mem=yes) and willing (preference BigWU=yes) to handle them. But the BOINC server code is developed by Berkeley University and so far doesn't offer such capabilities (which aren't needed by most other BOINC projects; projects like F@H do it, but don't use BOINC).


Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 15294 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 15295 - Posted: 2 May 2006, 14:44:42 UTC - in response to Message 15294.  

But the BOINC server code is developed by Berkeley University and so far doesn't offer such capabilities (which aren't needed by most other BOINC projects; projects like F@H do it, but don't use BOINC).

Seems like this capability might be needed by new projects in the future, so it would appear prudent for the BOINC developers to look into it. Hopefully someone has contacted them with this need/request.

Regards,
Bob P.
ID: 15295 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15298 - Posted: 2 May 2006, 14:55:25 UTC - in response to Message 15295.  

But the BOINC server code is developed by Berkeley University and so far doesn't offer such capabilities (which aren't needed by most other BOINC projects; projects like F@H do it, but don't use BOINC).

Seems like this capability might be needed by new projects in the future, so it would appear prudent for the BOINC developers to look into it. Hopefully someone has contacted them with this need/request.

It has been suggested to the BOINC developers. Moreover, Rom is helping the project, and he has seen and responded to these forums on this subject. Rom is a BOINC developer, so they know about the issue.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15298 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15307 - Posted: 2 May 2006, 16:48:59 UTC - in response to Message 15271.  

... To ease this problem further, I recommend sending out large WUs only to machines with >=512 MB RAM. I think there are enough machines out there which meet these specs.


While a number of people keep bringing up this idea, there is currently no way to do this. It is not as simple as it seems. The requirement is for 512 MB per CPU, so a system that has 1024 MB but 4 CPUs is not really compatible with the large work units. And since the number of CPUs used is user-adjustable, the decision to send a work unit to a particular machine would have to be made on the fly each time a work unit is sent. The server load from this extra sorting would not be trivial.

Does that mean what Rhiju tried did not work because of the server load? Or does it mean even a requirement of >=512 MB RAM won't solve the problem, since a 4-CPU machine might still not handle multiple large WUs? If the latter is the case, requiring 512 MB RAM is still a _start_ which might not solve but _ease_ the problem a bit. It is still better to send those WUs out to machines with >=512 MB RAM than to send them out randomly, even if the WUs may cause problems on multi-CPU hosts with 512 MB RAM.
ID: 15307 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15313 - Posted: 2 May 2006, 18:52:26 UTC - in response to Message 15307.  

...
Does that mean what Rhiju tried did not work because of the server load? Or does it mean even a requirement of >=512 MB RAM won't solve the problem, since a 4-CPU machine might still not handle multiple large WUs? If the latter is the case, requiring 512 MB RAM is still a _start_ which might not solve but _ease_ the problem a bit. It is still better to send those WUs out to machines with >=512 MB RAM than to send them out randomly, even if the WUs may cause problems on multi-CPU hosts with 512 MB RAM.



The pertinent element here is this -

There is currently no way to do this.

Until the BOINC system is changed to accommodate this sort of feature, it is very likely not going to happen. As far as I am aware, Rhiju was not testing for this capability, either here or at Ralph.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15313 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15328 - Posted: 2 May 2006, 19:53:14 UTC - in response to Message 15313.  

The pertinent element here is this -

There is currently no way to do this.

Until the BOINC system is changed to accommodate this sort of feature, it is very likely not going to happen. As far as I am aware, Rhiju was not testing for this capability, either here or at Ralph.


But he said he did, here:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1493#14807

In any case I've cancelled the workunits and sent them out again with a minimum memory requirement of 300 MB.


ID: 15328 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15333 - Posted: 2 May 2006, 20:45:35 UTC - in response to Message 15328.  

The pertinent element here is this -

There is currently no way to do this.

Until the BOINC system is changed to accommodate this sort of feature, it is very likely not going to happen. As far as I am aware, Rhiju was not testing for this capability, either here or at Ralph.


But he said he did, here:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1493#14807

In any case I've cancelled the workunits and sent them out again with a minimum memory requirement of 300 MB.


That is not the same thing you are asking for. He adjusted the Work Unit to limit the memory usage. He did not specify that the Work Units only be sent to systems with a specific memory size.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15333 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15345 - Posted: 2 May 2006, 21:48:54 UTC - in response to Message 15333.  
Last modified: 2 May 2006, 21:49:19 UTC

In any case I've cancelled the workunits and sent them out again with a minimum memory requirement of 300 MB.

That is not the same thing you are asking for. He adjusted the Work Unit to limit the memory usage. He did not specify that the Work Units only be sent to systems with a specific memory size.


OK, I'll shut up. He should have written "with a maximum memory requirement", and there would have been no confusion on my side.
ID: 15345 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15357 - Posted: 2 May 2006, 23:01:56 UTC
Last modified: 3 May 2006, 12:22:02 UTC

Discussion moved from another thread
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15357 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15418 - Posted: 3 May 2006, 18:08:02 UTC

Sorry to be so insistent, but I asked on the BOINC mailing list and was told that it is possible to define memory requirements for different WUs.

See here how to generate work with minimum requirements and here for a description of the available limitations. There is for example:

rsc_memory_bound: A bound on the virtual memory working set size. The workunit will only be sent to hosts with at least this much available RAM. If this bound is exceeded, the application will be aborted.

This sounds pretty much like the feature which is needed to send out larger WUs only to hosts with >=512 MB RAM, doesn't it?
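A minimal sketch of what this would look like, assuming the `rsc_memory_bound` field quoted above. Only that field name comes from the BOINC documentation; the workunit names, sizes, and helper function are invented for illustration, and the workunit is modeled as a plain dict rather than a call to the real work-creation tools:

```python
# Hypothetical sketch: tagging workunits with a memory bound at creation
# time. Only the field name rsc_memory_bound comes from the BOINC docs
# quoted above; everything else is made up for illustration.

MB = 1024 * 1024

def make_wu(name: str, memory_bound_mb: int) -> dict:
    # In a real project this value would be passed to the work-creation
    # tools; here we just build a dict describing the workunit.
    return {"name": name, "rsc_memory_bound": memory_bound_mb * MB}

small = make_wu("casp_small_0001", 128)  # can go to low-memory hosts
large = make_wu("casp_large_0001", 512)  # only hosts with >= 512 MB RAM

print(large["rsc_memory_bound"] // MB)  # 512
```

The point of setting the bound per workunit is that large and small targets can coexist in the same queue, each with its own eligibility threshold.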

ID: 15418 · Rating: 1
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15419 - Posted: 3 May 2006, 18:29:41 UTC - in response to Message 15418.  
Last modified: 3 May 2006, 18:31:59 UTC

... If this bound is exceeded, the application will be aborted.

This sounds pretty much like the feature which is needed to send out larger WUs only to hosts with >=512 MB RAM, doesn't it?

No, actually it looks more like a way to ensure an increase in the error rate.

It is of course up to the project to determine what they will or will not do in this area. But frankly, I would not recommend any action that would increase work unit errors for the users. What you describe will only abort the workunit after it is on the user's system, as indicated by the phrase "the application will be aborted".

So the user gets to download the Work Unit, and then have it fail when the client sees the host has insufficient memory. Frankly, I do not see the issue you are trying to address here. The Work Units have already been adjusted to run with less memory. A few users have mentioned the size of the Work Units, but they do run on those systems.

The project has a volume of work it needs to process. To exclude an entire class of machines from that work against their will seems unreasonable and counterproductive. The setting you mention is NOT user-selectable; it is a hard setting for the Work Unit at the server. If it were not, that would be different. But it does not seem to fit the user-controlled "opt in/out" feature discussed here in any way, beyond excluding a large portion of the user base from any access to the more interesting work ahead.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15419 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15424 - Posted: 3 May 2006, 19:06:43 UTC - in response to Message 15419.  

... If this bound is exceeded, the application will be aborted.

This sounds pretty much like the feature which is needed to send out larger WUs only to hosts with >=512 MB RAM, doesn't it?

No, actually it looks more like a way to ensure an increase in the error rate.

It is of course up to the project to determine what they will or will not do in this area. But frankly, I would not recommend any action that would increase work unit errors for the users. What you describe will only abort the workunit after it is on the user's system, as indicated by the phrase "the application will be aborted".

So the user gets to download the Work Unit, and then have it fail when the client sees the host has insufficient memory. Frankly, I do not see the issue you are trying to address here. The Work Units have already been adjusted to run with less memory. A few users have mentioned the size of the Work Units, but they do run on those systems.

The project has a volume of work it needs to process. To exclude an entire class of machines from that work against their will seems unreasonable and counterproductive. The setting you mention is NOT user-selectable; it is a hard setting for the Work Unit at the server. If it were not, that would be different. But it does not seem to fit the user-controlled "opt in/out" feature discussed here in any way, beyond excluding a large portion of the user base from any access to the more interesting work ahead.


I think you misunderstood the description. There are two concepts involved:

1.) "The workunit will only be sent to hosts with at least this much available RAM."

This means that if a host asks for work, the scheduler checks its RAM specification and sends out only appropriate WUs. If you are below the big-WU specs, you get the small ones. This is the intended behaviour, isn't it?

2.) "If this bound is exceeded, the application will be aborted."

This means that if the WU exceeds even the high specs, it gets aborted on the client. It just means you should not underestimate the memory requirements. But if you define the memory requirements liberally and generously, that should be no problem. It is always a good idea to abort a WU if it exceeds a very high safety margin. If I got WUs which use more than 512 MB RAM, I really would appreciate it if they got auto-aborted (they would crash or cripple half of the comps either way).

If used wisely, it would allow sending out very small WUs even to comps with 128 MB RAM, while at the same time using the full potential of high-spec comps for really big WUs. Of course, all that should be properly tested on Ralph.

If used correctly, I can only imagine a drop in the error rate, more options for the project (huge WUs for big comps), and more TFlops since 128 MB machines can contribute as well.
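The two concepts in this post can be sketched as two independent checks (a toy model of the behavior described in the quoted BOINC docs; the function names are illustrative, not actual BOINC code):

```python
# Toy model of the two behaviors described above:
# (1) server side: only send a WU to hosts with enough RAM;
# (2) client side: abort if the running task exceeds its declared bound.
# Function names are illustrative, not actual BOINC code.

def scheduler_sends(host_ram_mb: int, wu_bound_mb: int) -> bool:
    """Check (1): matching happens before the WU is ever downloaded."""
    return host_ram_mb >= wu_bound_mb

def client_aborts(working_set_mb: int, wu_bound_mb: int) -> bool:
    """Check (2): a safety net that fires only past the declared bound."""
    return working_set_mb > wu_bound_mb

# A 256 MB host never receives a 512 MB-bound WU in the first place...
print(scheduler_sends(256, 512))  # False
# ...and with a generous bound, the abort is only a last resort.
print(client_aborts(480, 512))    # False
print(client_aborts(600, 512))    # True
```

On this reading, the abort in check (2) only fires when the bound was underestimated, which is the argument above for setting it generously.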
ID: 15424 · Rating: 1
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15425 - Posted: 3 May 2006, 20:04:23 UTC

I think that we're all on the same page. We want WUs that are suitable for the host requesting work.

I believe part of the problem is that regardless of the actual protein size, there are still wide variances in the amount of RAM required for it to run WELL. It's not going to crater your machine if it doesn't run well; it's just going to be doing a lot of page faulting, which means lots of disk activity.

Even a "simple" protein can end up having very deep paths of possible fits of the amino acids. And depending on how it plays out, the current approach Rosetta takes may deduce that out of 100 of the next possible orientations, only 5 are viable... or they may be testing a more exhaustive approach, and decide they wish to pursue ALL of the 100 possible next orientations, and the resulting paths downstream from there.

In short, "it depends", and ALSO (almost MORE importantly) a LARGE degree of what it depends upon is the Rosetta algorithms being used... which are changing and improving over time.

Over on the Ralph boards, there was a mention of using a trickle approach. THIS would be the ultimate. It would allow you to do a single download and crunch the same WU for weeks at a time. Your client would occasionally send up progress reports on the models it has crunched, and then proceed to try some more.

This will require significant code changes, and so it's doubtful this would be done during CASP.

Tralala, I think your suggestion has been heard. Indeed, it has come up several times before in various ways (I'm sure I could come up with at least three threads if you would like to see them). It's not as simple as throwing a switch and declaring one WU a 512 MB WU and another a 256 MB WU.

Some of the confusion is that there were some changes made to economize on how memory is used. The result is that ALL WUs take a little less memory than they used to (at least that's my understanding of what was done). They still don't have classes of WUs and hosts and the ability to match them. Even if the BOINC infrastructure has the exact capability that's required, it doesn't mean that R@H WUs are appropriate to tag that way, nor that the project can just flip a switch on a server and suddenly everything works that way.

They will do what they can to lighten the requirements imposed on the client, so they can get more TFLOPS by supporting clients with less memory. But at present the guidelines are in place for a reason, and much of what they anticipate is yet to come in more complex WUs and new approaches to navigating them.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15425 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15430 - Posted: 3 May 2006, 20:29:04 UTC - in response to Message 15425.  

It's not as simple as throwing a switch and declaring one WU a 512 MB WU and another a 256 MB WU.


I think it is, but it seems the mechanism was just not known to the project staff.
Of course WUs on the same protein vary, but you can still distinguish between big and small. You could just assign the n biggest proteins to hosts with 512 MB or more RAM and the remaining ones to everyone. That is not a perfect match, but it helps, and it allows you to send out big WUs without inflicting problems on old hosts.

I'm sure there are more details to work out, and perhaps there is something Rosetta-specific which does not allow this to be used. But so far I have read everywhere: yes, we would like to do that, but unfortunately BOINC does not allow it. I just wanted to make sure the project staff knows that there actually is already such a mechanism in BOINC. It is there and it is used by Einstein, SETI, etc. It works and could be tested on Ralph. If they decide for other reasons not to use it, fine with me.
ID: 15430 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 15439 - Posted: 3 May 2006, 21:17:28 UTC

tralala, I just had a look at BOINC's source:

sched_send.C

It doesn't seem to do what we need, i.e. dynamically match WUs to host capabilities / preferences.
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 15439 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15441 - Posted: 3 May 2006, 21:32:18 UTC - in response to Message 15439.  
Last modified: 3 May 2006, 21:32:52 UTC

tralala, I just had a look at BOINC's source:

sched_send.C

It doesn't seem to do what we need, i.e. dynamically match WUs to host capabilities / preferences.


I can't read C, but this is what I got from David Anderson as a reply to my question on boinc_alpha:

The tools for creating WUs (script and function) allow you to specify resource requirements on a per-WU basis; see http://boinc.berkeley.edu/tools_work.php

(this discussion belongs in boinc_projects or boinc_dev, not boinc_alpha)

-- David

Joachim Rang wrote:
> Hi,
>
> For some projects WU requirements differ. It would be helpful if one
> could specify _per WU_ the requirements of the host computer, such as
> RAM, CPU Type etc. Is such a feature planned?
>
> Kind regards
> Joachim
>
> _______________________________________________
> boinc_alpha mailing list
> boinc_alpha@ssl.berkeley.edu
> http://www.ssl.berkeley.edu/mailman/listinfo/boinc_alpha

ID: 15441 · Rating: 0
Divide Overflow
Joined: 17 Sep 05
Posts: 82
Credit: 921,382
RAC: 0
Message 15589 - Posted: 5 May 2006, 19:58:11 UTC

It still does not preemptively match the WU to the host's capabilities at the scheduling level. The large-memory WUs would go out to everybody and would error out on all the hosts with insufficient RAM. That is not an acceptable solution.
ID: 15589 · Rating: -1



©2024 University of Washington
https://www.bakerlab.org