BOINC thread: Questions re scheduler & v5.8.x

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37403 - Posted: 4 Mar 2007, 14:06:23 UTC

Q1

I am not clear just what the scheduler does when it refuses to give me work because of not having enough memory, in the case where there is a variety of sizes of work waiting to be issued.

Does it

a) issue jobs strictly in order from a single queue (so that I am refused work if the next job happens to be too big for my machine, and so that a big machine might be given a small-memory job)

b) try to issue the next job in the queue, but look for a smaller one if need be (so that I get work if there is any that fits, but so that a big machine might still be given a small job)

c) issue the largest job that will fit on my machine (so that big machines do not take all the small jobs, leaving the small machines without work)

Clearly, from a user perspective, (c) is what we want, and if not (c) then at least (b). My impression (on insufficient evidence to be certain as yet) is that the scheduler is operating as in (a).
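
To make the difference concrete, here is a rough sketch of (b) and (c) as selection rules over a list of queued jobs (the names are invented for illustration, not the real scheduler structures):

    #include <vector>

    // Invented stand-in for a queued job; mem_bound is its declared RAM need.
    struct Job {
        int id;
        long long mem_bound;
    };

    // (b): take the first job in queue order that fits this host.
    int pick_first_fit(const std::vector<Job>& queue, long long host_ram) {
        for (const Job& j : queue)
            if (j.mem_bound <= host_ram) return j.id;
        return -1;  // no work issued
    }

    // (c): take the largest job that fits, so big hosts don't use up the small jobs.
    int pick_largest_fit(const std::vector<Job>& queue, long long host_ram) {
        int best = -1;
        long long best_size = -1;
        for (const Job& j : queue) {
            if (j.mem_bound <= host_ram && j.mem_bound > best_size) {
                best = j.id;
                best_size = j.mem_bound;
            }
        }
        return best;
    }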

Q2.

Is it possible for users & projects to set things up so that a job asks for less memory when no graphics are needed (it could be that the user is running a Linux command-line-only box, like me, or a Windows box in service mode with no shared graphics)?

My suspicion is that I am being denied work that my box could run perfectly well, simply because I do not have enough RAM to run graphics; but that is not an issue, because I don't want to run graphics anyway.
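
To illustrate what I am asking for, a minimal sketch (the struct and field names are hypothetical, not Rosetta's or BOINC's actual ones) of a job that advertises two memory bounds, so the scheduler could test the smaller one against a host that will never run graphics:

    #include <cstdint>

    // Hypothetical per-job memory bounds. BOINC's real workunit carries a single
    // rsc_memory_bound, so if that bound is padded to cover graphics, a
    // no-graphics host can be refused unnecessarily.
    struct MemoryBounds {
        int64_t science_only;    // RAM needed by the command-line science app alone
        int64_t with_graphics;   // RAM needed if the screensaver graphics also run
    };

    // Feasibility test the scheduler could apply if it knew whether the host
    // ever runs graphics.
    bool fits(const MemoryBounds& wu, int64_t usable_ram, bool host_runs_graphics) {
        int64_t need = host_runs_graphics ? wu.with_graphics : wu.science_only;
        return need <= usable_ram;
    }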

Does anyone know how the new code works and/or what the design intentions were?
River~~
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 37436 - Posted: 4 Mar 2007, 22:27:32 UTC
Last modified: 4 Mar 2007, 22:31:31 UTC

If my memory serves me correctly, it is closer to d) none of the above.

1) A clean scheduler will issue results randomly from the available pool.

2) If a result is deemed unsuitable for a host, it is prioritized so that work which has already failed to be sent to some host will be attempted before a clean result is attempted.

The server will generally attempt about 50 results per RPC. If none are suitable then you get the no work message.
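
In rough code terms, that behaviour looks something like this (illustrative only; the real scheduler works against shared memory and the database, and these names are made up):

    #include <algorithm>
    #include <random>
    #include <vector>

    // Illustrative stand-ins, not the real BOINC scheduler structures.
    struct Result {
        int id;
        long long mem_bound;    // declared RAM requirement, bytes
        int infeasible_count;   // > 0 once some host has had to refuse this result
    };

    struct Host { long long usable_ram; };

    // Stand-in for the real per-result feasibility checks (RAM, disk, deadline...).
    bool feasible_for(const Result& r, const Host& h) {
        return r.mem_bound <= h.usable_ram;
    }

    // One scheduler RPC: candidates come out of the pool in random order, but
    // anything that has already been refused elsewhere jumps the queue; about
    // 50 are tried before giving up.
    int pick_result(std::vector<Result> pool, const Host& host) {
        std::shuffle(pool.begin(), pool.end(), std::mt19937{std::random_device{}()});
        std::stable_partition(pool.begin(), pool.end(),
                              [](const Result& r) { return r.infeasible_count > 0; });
        int attempts = 0;
        for (const Result& r : pool) {
            if (++attempts > 50) break;
            if (feasible_for(r, host)) return r.id;  // send this one
        }
        return -1;  // host gets the "no work" message
    }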

This may be outdated information. It was accurate prior to the implementation of locality scheduling. Locality and HR schedulers work differently.

edit: The design goal is to get bad workunits out of the system. The prioritization is done by incrementing a variable called infeasible_count associated with the workunit. Once that reaches a project-settable threshold, the workunit is automatically cancelled.
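
Something along these lines (again a sketch; the threshold name is invented, standing in for whatever option the project actually configures):

    // Sketch of the bookkeeping described above, not the actual server code.
    const int MAX_INFEASIBLE = 100;   // stand-in for the project-settable threshold

    struct WorkUnit {
        int infeasible_count = 0;
        bool cancelled = false;
    };

    // Called each time a scheduler RPC finds the workunit unsuitable for the
    // requesting host: bump the counter, and give up on the workunit entirely
    // once it has been refused too many times.
    void note_infeasible(WorkUnit& wu) {
        if (++wu.infeasible_count >= MAX_INFEASIBLE) {
            wu.cancelled = true;
        }
    }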
BOINC WIKI

BOINCing since 2002/12/8
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37457 - Posted: 5 Mar 2007, 6:47:04 UTC - in response to Message 37436.  

Thanks for replying John.

If my memory serves me correctly, it is closer to d) none of the above.

1) A clean scheduler will issue results randomly from the available pool.

2) If a result is deemed unsuitable for a host, it is prioritized so that work which has already failed to be sent to some host will be attempted before a clean result is attempted.

The server will generally attempt about 50 results per RPC. If none are suitable then you get the no work message.

This may be outdated information. It was accurate prior to the implementation of locality scheduling. Locality and HR schedulers work differently.

edit: The design goal is to get bad workunits out of the system. The prioritization is done by incrementing a variable called infeasible_count associated with the workunit. Once that reaches a project-settable threshold, the workunit is automatically cancelled.


This is a counter-productive goal, plausible as it sounds at first sight.

The effect is that once 50 prioritised WUs are in the system, work will only be issued to large boxes.

It does have the advantage that large machines are more likely to get large work.

Keeping to the spirit of this design, a better approach would be for the prioritisation of previously-rejected work not to be applied where, earlier in the same RPC, a workunit has already been rejected. That way the scheduler is not setting itself up to fail. The first offering would be prioritised, but if that were too large then the other 49 random offerings would come from the general pool, not from the already-too-big sub-pool.
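
Expressed as a tweak to the kind of selection loop sketched above (purely illustrative; candidate work is assumed to arrive with the previously-refused items first):

    #include <vector>

    // Illustrative only.
    struct Candidate {
        int id;
        bool prioritised;        // true if some host has already refused this work
        long long mem_bound;     // declared RAM requirement, bytes
    };

    // Offer the prioritised work first, but as soon as one offering is too big
    // for this host, stop drawing from the already-too-big sub-pool and use the
    // general pool for the remaining attempts.
    int pick_result(const std::vector<Candidate>& ordered, long long usable_ram) {
        bool had_rejection = false;
        int attempts = 0;
        for (const Candidate& c : ordered) {
            if (had_rejection && c.prioritised) continue;  // skip the rest of that sub-pool
            if (++attempts > 50) break;
            if (c.mem_bound <= usable_ram) return c.id;
            had_rejection = true;
        }
        return -1;
    }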

I still feel that my (c) would be even better - keeping sub-pools of work of different sizes and issuing the largest suitable work each time. However, that may be more work than the change is worth, and may have knock-on effects on other scheduling decisions that I don't know about.
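
Again only a sketch of the idea, with invented names:

    #include <map>
    #include <vector>

    // Sub-pools keyed by declared memory bound; each host is served from the
    // largest pool it can handle. Not the real scheduler, just the shape of it.
    class SizedPools {
    public:
        void add(long long mem_bound, int result_id) {
            pools_[mem_bound].push_back(result_id);
        }

        // Take a result from the largest pool that fits in usable_ram, or -1.
        int take_largest_fit(long long usable_ram) {
            // upper_bound gives the first pool strictly larger than usable_ram;
            // walk back from there over the pools that do fit, largest first.
            for (auto it = pools_.upper_bound(usable_ram); it != pools_.begin(); ) {
                --it;
                if (!it->second.empty()) {
                    int id = it->second.back();
                    it->second.pop_back();
                    return id;
                }
            }
            return -1;
        }

    private:
        std::map<long long, std::vector<int>> pools_;  // mem_bound -> queued result ids
    };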

If you think there is any merit in either of these suggestions, please feel free to pass them on to the BOINC lists.

R~~
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 37501 - Posted: 5 Mar 2007, 23:02:23 UTC

You are correct, a better way to do this may be needed for projects with multiple classes of tasks. The other way should still be good for projects where there is less variation in task complexity. I have sent an email to the developers' list pointing to this thread.
BOINC WIKI

BOINCing since 2002/12/8
John McLeod VII
Joined: 17 Sep 05
Posts: 108
Credit: 195,137
RAC: 0
Message 37752 - Posted: 13 Mar 2007, 12:51:57 UTC - in response to Message 37501.  

You are correct, a better way to do this may be needed for projects with multiple classes of tasks. The other way should still be good for projects where there is less variation in task complexity. I have sent an email to the developers' list pointing to this thread.

It might be a good idea to have a WHERE clause on the query that finds work, covering such things as the time between connections, RAM size, and the HD space allowed. However, that is not exactly my area of expertise.
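
Something in this direction, perhaps (the column names follow BOINC's usual naming, e.g. rsc_memory_bound and rsc_disk_bound, but the actual query the scheduler runs may be quite different):

    // Hedged sketch: push the feasibility tests into the SQL that fetches
    // candidate results, so obviously unsuitable work is never pulled for this
    // host in the first place.
    const char* CANDIDATE_QUERY =
        "SELECT r.id FROM result r JOIN workunit wu ON r.workunitid = wu.id "
        "WHERE r.server_state = 2 "          /* unsent */
        "  AND wu.rsc_memory_bound <= ? "    /* host's usable RAM */
        "  AND wu.rsc_disk_bound   <= ? "    /* disk space the host allows */
        "  AND wu.delay_bound      >= ? "    /* host's interval between connections */
        "LIMIT 50";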


BOINC WIKI
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 37756 - Posted: 13 Mar 2007, 13:29:08 UTC

...Or perhaps a second query, with more specific criteria, runs only if the first turns up no available work. That way, if the work queue has many high-memory tasks at the front of the line, a machine that does not meet their requirements could still locate available work, without getting the message that there is work on the server but the machine doesn't have enough memory. It would simply get the highest-priority task among those it is able to crunch.
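
A sketch of the two-pass idea (illustrative only, not the real scheduler; the query callback stands in for however the server actually fetches candidate results, and rsc_memory_bound follows BOINC's usual column naming):

    #include <functional>
    #include <string>
    #include <vector>

    // Two-pass lookup: normal query first, restricted query as a fallback.
    std::vector<int> find_work_for_host(
            const std::function<std::vector<int>(const std::string&)>& query,
            long long usable_ram) {
        // First pass: the normal lookup, highest-priority work first.
        std::vector<int> ids = query("");
        if (!ids.empty()) return ids;

        // Second pass, only when the first finds nothing suitable: restrict the
        // lookup to work this host can hold, so it gets the best task it can
        // crunch instead of a "not enough memory" refusal.
        return query("AND wu.rsc_memory_bound <= " + std::to_string(usable_ram));
    }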

Conversely, it would be good if high-memory systems tried to get high-memory tasks first, or at least had a higher than average tendency to get them. That would tend to prevent the first problem from occurring.

Unfortunately, Rosetta's high-memory tasks are just now on the threshold where one may be supported but two may be too large for a dual-processor machine. So the second thought here could lead to idle time that would be avoided by getting a low-memory task and a high-memory task at the same time.

Not a trivial task, but it seems like avoiding the "work is available but..." message should be fairly doable.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
