System requirements????

Message boards : Number crunching : System requirements????



P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 50081 - Posted: 26 Dec 2007, 21:26:28 UTC


Each specific task can be resent as many as 3 times. Normally the first person it is sent to completes it and returns it within the deadline, so it is not sent again. If the first person fails for any reason, including missing the deadline, then it is resent. Often you will see that the second person who gets the task completes it normally. In addition to a completed result, this also gives the Project Team information about whether specific operating systems are having problems that are not seen on others. So tasks are often resent, but only to a very limited degree.


I don't have a problem with this; maybe I wasn't clear in what I asked.

A number of people are having problems with getting tasks that need more memory than they have, and so the tasks are failing. The hosts that got them before me had 256 or 512 MB of RAM, and they errored out; then the tasks were sent to me.

Why are they getting them in the first place?

Is the system failing to see the hosts memory?

pete.





Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 50084 - Posted: 26 Dec 2007, 22:45:39 UTC

I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory.

I have seen in the past where the "short list" of available work gets overpopulated with high memory tasks and so normal memory hosts aren't able to get work. It's frustrating all around, because there are normal tasks out among the 20,000 available... but the BOINC server code doesn't work to keep any on the short list of tasks it keeps in shared memory. It only searches the first x tasks for one your machine can process, and if it doesn't find any, it gives up. That's why such problems are most common after an outage of the project server, because the server is getting hit with so many requests... and I think it will send low memory tasks to high memory hosts. It just searches down the list until it reaches its search limit or finds something you can crunch. So these high memory tasks tend to float to the top of the list, because many hosts are running past them in the list looking for work. Once the search limit is filled with high memory tasks, no one can get normal memory work until a high memory host comes along and pulls some of those tasks off.
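The starvation effect described above can be illustrated with a toy simulation. This is a minimal sketch, not BOINC's feeder or scheduler code: the short-list size (`SLOTS`), the scan limit (`SCAN_LIMIT`), the first-fit policy, and representing each task only by its memory bound are all assumptions made for illustration. The 800000000-byte value is an assumed byte figure behind the "763 MB" tasks mentioned in this thread.

```python
from collections import deque

MB = 1024 * 1024
SLOTS = 5        # hypothetical size of the shared-memory "short list"
SCAN_LIMIT = 5   # hypothetical number of slots the scheduler examines per request


def refill(short_list, backlog):
    """Feeder (simplified): top up the short list from the backlog, oldest first."""
    while len(short_list) < SLOTS and backlog:
        short_list.append(backlog.popleft())


def request_work(short_list, host_usable_bytes):
    """Scheduler (simplified): hand out the first task within SCAN_LIMIT that fits."""
    for i, task_bound in enumerate(short_list[:SCAN_LIMIT]):
        if task_bound <= host_usable_bytes:
            return short_list.pop(i)
    return None  # "no work sent", even though fitting tasks remain in the backlog


# Backlog alternates low-memory (100000000 bytes) and high-memory (assumed 800000000 bytes) tasks.
backlog = deque([100_000_000, 800_000_000] * 10)
short_list = []
refill(short_list, backlog)

low_host = 128 * MB  # e.g. 256 MB installed with a 50% memory preference
served = 0
while request_work(short_list, low_host) is not None:
    served += 1
    refill(short_list, backlog)

print(served, short_list)
print(sum(1 for t in backlog if t == 100_000_000), "low-memory tasks still waiting")
```

In this model the low-memory host is served 5 tasks, after which every short-list slot holds a high-memory task while 5 low-memory tasks still sit in the backlog; only a host with enough memory to take one of the high-memory tasks would unblock the list.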

This is a BOINC issue and has been discussed on the BOINC boards. But I don't know if there is a plan to enhance the BOINC server's scheduler to better handle multiple types of tasks.
Rosetta Moderator: Mod.Sense
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50085 - Posted: 26 Dec 2007, 22:49:01 UTC - in response to Message 50084.  

I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory.

I have seen in the past where the "short list" of available work gets overpopulated with high memory tasks and so normal memory hosts aren't able to get work. It's frustrating all around, because there are normal tasks out among the 20,000 available... but the BOINC server code doesn't work to keep any on the short list of tasks it keeps in shared memory. It only searches the first x tasks for one your machine can process, and if it doesn't find any, it gives up. That's why such problems are most common after an outage of the project server, because the server is getting hit with so many requests... and I think it will send low memory tasks to high memory hosts. It just searches down the list until it reaches its search limit or finds something you can crunch. So these high memory tasks tend to float to the top of the list, because many hosts are running past them in the list looking for work. Once the search limit is filled with high memory tasks, no one can get normal memory work until a high memory host comes along and pulls some of those tasks off.

This is a BOINC issue and has been discussed on the BOINC boards. But I don't know if there is a plan to enhance the BOINC server's scheduler to better handle multiple types of tasks.


Seems a simple solution for this: give it a max memory.
Ingleside

Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 50186 - Posted: 30 Dec 2007, 16:17:09 UTC - in response to Message 50084.  
Last modified: 30 Dec 2007, 16:23:01 UTC

I see what you mean about that specific work unit where the first host got an error about maximum memory exceeded. But I had thought that others were having problems getting any work sent to them. I'm not clear why the BOINC server code would schedule work for a client that won't be able to run it. Nor am I clear on why a 256MB client would get a workunit and fail on it, and others would get messages that no work was sent due to their allowed memory.

There are at least 2 types of Rosetta WUs: "low"-memory WUs and "high"-memory WUs. The "high"-memory WUs seem to be marked at 763 MB or so now, meaning you'll need at least 768 MB of installed memory, and you'll need to increase BOINC's memory preference to 99% if you've only got 768 MB installed.

The "low"-memory WUs are incorrectly marked as needing 96 MB of memory. This is too low, since in practice they can use around 120 MB or so of "real" memory, probably more if the screensaver is running. This means a computer with 256 MB of memory and a 50% memory preference will be assigned the WU, since 96 MB < 127 MB. But during crunching, Rosetta@home uses more than 127 MB, and the WU is correctly aborted by BOINC.
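The gap between what the server checks and what the client enforces can be shown with simple arithmetic. This sketch assumes "MB" means 1024 × 1024 bytes, a host reporting exactly 256 MB (so the limit comes out at 128 MB rather than the 127 MB quoted), a 50% memory preference, and a hypothetical real working set of 130 MB; `server_assigns` and `client_aborts` are illustrative names, not BOINC functions.

```python
MB = 1024 * 1024


def usable_bytes(host_ram_bytes, ram_pct):
    """Memory BOINC may use on this host, given the user's memory preference (percent)."""
    return host_ram_bytes * ram_pct / 100


def server_assigns(rsc_memory_bound, host_ram_bytes, ram_pct):
    """Scheduler side: the task is sent only if its declared bound fits."""
    return rsc_memory_bound <= usable_bytes(host_ram_bytes, ram_pct)


def client_aborts(working_set_bytes, host_ram_bytes, ram_pct):
    """Client side: the task is aborted if its real memory use exceeds the limit."""
    return working_set_bytes > usable_bytes(host_ram_bytes, ram_pct)


declared = 100_000_000  # rsc_memory_bound of a "low"-memory WU (about 95.37 MB)
actual = 130 * MB       # hypothetical real working set while crunching
host = 256 * MB

print(server_assigns(declared, host, 50))  # True: 95.37 MB fits under 128 MB
print(client_aborts(actual, host, 50))     # True: 130 MB exceeds 128 MB
```

Both checks pass their own test, which is exactly the failure mode: the task is sent because the declared bound is small, then aborted because the real usage is not.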

Whatever "extra" is assigned in the pagefile isn't a problem; the problem is that the "low"-memory WUs use more than 96 MB of "real" memory. Rosetta@home mis-configuring their "low"-memory WUs is a Rosetta@home problem, and has nothing to do with BOINC.

Seasonal Attribution made the same mistake, setting the memory requirement to 256 MB while in reality it uses 430 MB, or 480 MB if it displays the screensaver. Still, they did write on their web page that the requirement was 1 GB...

I have seen in the past where the "short list" of available work gets overpopulated with high memory tasks and so normal memory hosts aren't able to get work. It's frustrating all around, because there are normal tasks out among the 20,000 available... but the BOINC server code doesn't work to keep any on the short list of tasks it keeps in shared memory. It only searches the first x tasks for one your machine can process, and if it doesn't find any, it gives up. That's why such problems are most common after an outage of the project server, because the server is getting hit with so many requests... and I think it will send low memory tasks to high memory hosts. It just searches down the list until it reaches its search limit or finds something you can crunch. So these high memory tasks tend to float to the top of the list, because many hosts are running past them in the list looking for work. Once the search limit is filled with high memory tasks, no one can get normal memory work until a high memory host comes along and pulls some of those tasks off.

This is a BOINC issue and has been discussed on the BOINC boards. But I don't know if there is a plan to enhance the BOINC server's scheduler to better handle multiple types of tasks.

Yes, it is a current weakness in BOINC that low-memory tasks are assigned to computers with lots of memory, and the Feeder doesn't keep a portion of low-memory tasks available if the same application has WUs with different memory requirements. Not sure if there are any current plans to change this, and even if there are, it can take many months before it's changed, unless Rosetta@home programs the changes themselves...

But the Feeder can be configured to keep a certain number of Tasks available, as long as the Tasks are for different applications...

So a work-around would be to use the same actual Rosetta application, but duplicate it and call one application "low-memory" and the other "high-memory"; as long as any "low-memory" WUs are being generated, the Feeder will have some available.
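The idea behind the work-around, giving each application its own share of the short list, can be sketched as follows. This is a simplified model, not the real Feeder: it assumes the slots are split evenly between applications, and the application names `rosetta_high_mem` and `rosetta_low_mem`, the slot count, and the backlog sizes are hypothetical. How the real Feeder is configured to do this is not shown.

```python
from collections import deque

SLOTS = 6  # hypothetical number of shared-memory slots


def fill_slots(queues, slots=SLOTS):
    """Reserve an equal share of the slots for each application (simplified model)."""
    share = slots // len(queues)
    short_list = []
    for app, backlog in queues.items():
        for _ in range(min(share, len(backlog))):
            short_list.append((app, backlog.popleft()))
    return short_list


queues = {
    "rosetta_high_mem": deque([800_000_000] * 100),  # hypothetical app names and sizes
    "rosetta_low_mem": deque([100_000_000] * 5),
}
short_list = fill_slots(queues)
low_slots = [t for t in short_list if t[0] == "rosetta_low_mem"]
print(len(short_list), len(low_slots))  # 6 3
```

Even though the high-memory backlog is 20 times larger, half of the short list stays available to low-memory hosts, which is the point of splitting one application into two.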

Now, this won't fix the other problem, that computers with lots of memory still grab low-memory WUs. To fix this problem, some changes must be made. Still, I would guess a small customization of the Rosetta@home scheduler doing something like this would work:
"if BOINC usable memory >= 1 GB, set computer to 'only crunch high-memory-application' except if none available".
"if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available".
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 50188 - Posted: 30 Dec 2007, 17:55:07 UTC - in response to Message 50186.  

Rosetta@home mis-configuring their "low"-memory WUs is a Rosetta@home problem, and has nothing to do with BOINC.


Where do you see the memory requirement assigned to a task? Is that in one of the xml files? I've been reviewing TCP traces of the client interactions with the project, and I don't recall seeing that in the data.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50189 - Posted: 30 Dec 2007, 18:23:25 UTC - in response to Message 50186.  
Last modified: 30 Dec 2007, 18:24:44 UTC

"if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available".


Here is also something wrong! Here you say that systems with little memory can crunch tasks that require large amounts of memory; that ain't gonna work.

I would suggest making the scheduler see 3 types of WUs and 3 types of machines. (The multiple-core issue has to be sorted out first, so BOINC sees memory per core and not total memory.)

machines with less than 512 MB of memory installed
machines with 512 to 1024 MB of memory installed
and machines with 1024 MB or more of memory installed

and then make it like this:
give all machines WUs that fit within the above rules.
if there are too many WUs with the "less than 512" specification, then allow "512 to 1024 MB" machines to crunch those WUs
if there are not enough "1024 or more" WUs, let the "1024 MB or more" PCs crunch "512 to 1024 MB" tasks.

In this way I think the tasks get divided better, and we can make optimal use of the resources we have.

[EDIT]
The numbers mentioned above can be changed to optimize the spread of WUs.
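The three-tier proposal can be sketched as a host classification by memory per core plus a step-down fallback. This is a simplification: the fallback here triggers when the host's own tier has no unsent WUs, rather than on the "too many" / "not enough" conditions described above, and `TIER_EDGES`, `host_tier`, and `pick_tier` are hypothetical names.

```python
MB = 1024 * 1024

# Lower edges of the three proposed tiers, applied per core (adjustable, as the post notes).
TIER_EDGES = [0, 512 * MB, 1024 * MB]


def host_tier(total_ram_bytes, n_cores):
    """Return 0, 1 or 2 based on memory per core rather than total memory."""
    per_core = total_ram_bytes / n_cores
    return max(i for i, edge in enumerate(TIER_EDGES) if per_core >= edge)


def pick_tier(tier, available):
    """Try the host's own tier first, then step down one tier at a time.
    available[i] is the number of unsent WUs in tier i."""
    for t in range(tier, -1, -1):
        if available[t] > 0:
            return t
    return None


print(host_tier(2048 * MB, 2))   # 2: 1024 MB per core
print(host_tier(2048 * MB, 4))   # 1: 512 MB per core
print(pick_tier(2, [5, 5, 0]))   # 1: no top-tier WUs, so step down
print(pick_tier(0, [0, 3, 3]))   # None: small hosts never step up
```

Counting memory per core means a 2 GB quad-core lands in the middle tier, which reflects the multiple-core caveat in the proposal.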
Ingleside

Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 50210 - Posted: 31 Dec 2007, 13:27:37 UTC - in response to Message 50188.  

Where do you see the memory requirement assigned to a task? Is that in one of the xml files? I've been reviewing TCP traces of the client interactions with the project, and I don't recall seeing that in the data.

Like other things, it's listed in client_state.xml, marked <rsc_memory_bound>.

If you haven't connected to the Scheduling-server since being assigned the Task(s), it's also listed in sched_reply_project-url.xml
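A quick way to pull the bound out of client_state.xml is to parse the `<workunit>` entries. The sketch below uses a made-up, heavily trimmed sample (the WU name `example_wu_1` is hypothetical, and a real file has many more elements); to read a real file you would use `ET.parse(path).getroot()` with your own BOINC data-directory path, and it assumes the file is well-formed XML.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<client_state>
  <workunit>
    <name>example_wu_1</name>
    <app_name>rosetta</app_name>
    <rsc_memory_bound>100000000.000000</rsc_memory_bound>
  </workunit>
</client_state>"""


def memory_bounds(xml_text):
    """Map each workunit name to its rsc_memory_bound in bytes."""
    root = ET.fromstring(xml_text)
    return {
        wu.findtext("name"): float(wu.findtext("rsc_memory_bound"))
        for wu in root.iter("workunit")
    }


bounds = memory_bounds(SAMPLE)
for name, b in bounds.items():
    print(name, b, round(b / (1024 * 1024), 2))  # bytes and MB (1024 x 1024)
```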

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Ingleside

Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 50215 - Posted: 31 Dec 2007, 14:20:55 UTC - in response to Message 50189.  

"if BOINC usable memory < 1 GB, set computer to 'only crunch low-memory-application' except if none available".


Here is also something wrong! Here you say that systems with little memory can crunch tasks that require large amounts of memory; that ain't gonna work.

Not sure if you've ever run World Community Grid; there, users can choose to run only one or a few of the available applications, and if they don't select all applications, they can also choose "if no work is available for the selected application(s), send any other work".

But even if a user has set it to only accept one type of task, BOINC still does the normal checks on whether the computer has enough memory, free disk space, and so on to handle the task.


This means that if there aren't any "low-memory" tasks available, a computer with less than 763 MB of BOINC-usable memory still won't get the "high-memory" tasks.

The only mistake I did make is that, unless I misremember, the BOINC default is to use at most 90% of memory when idle, so setting the memory limit to 1 GB will disallow a large portion of usable computers. Setting the limit to 900 MB or so would be better.
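The arithmetic behind the 900 MB suggestion, assuming the 90% idle default recalled above (treat it as an assumption) and "MB" meaning 1024 × 1024 bytes:

```python
GB = 1024 ** 3
MB = 1024 * 1024


def usable(host_ram_bytes, pct):
    """BOINC-usable memory for a given memory preference (percent)."""
    return host_ram_bytes * pct / 100


idle_pct = 90  # assumed default, as recalled above
u = usable(1 * GB, idle_pct)
print(round(u / MB, 1))  # 921.6
print(u >= 1 * GB)       # False: a 1 GB threshold excludes a 1 GB host
print(u >= 900 * MB)     # True: a 900 MB threshold includes it
```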

I would suggest making the scheduler see 3 types of WUs and 3 types of machines. (The multiple-core issue has to be sorted out first, so BOINC sees memory per core and not total memory.)

machines with less than 512 MB of memory installed
machines with 512 to 1024 MB of memory installed
and machines with 1024 MB or more of memory installed

and then make it like this:
give all machines WUs that fit within the above rules.
if there are too many WUs with the "less than 512" specification, then allow "512 to 1024 MB" machines to crunch those WUs
if there are not enough "1024 or more" WUs, let the "1024 MB or more" PCs crunch "512 to 1024 MB" tasks.

In this way I think the tasks get divided better, and we can make optimal use of the resources we have.

[EDIT]
The numbers mentioned above can be changed to optimize the spread of WUs.

As long as Rosetta@home, AFAIK, only has 95.37 MB and 763 MB tasks, a finer split wouldn't change anything.

Still, 512 MB of installed memory is a fairly popular choice. Going by "The Computational and Storage Potential of Volunteer Computing", which covers active SETI@home computers in February 2006, figure 10 shows that roughly 86.5% had at least 512 MB, roughly 50.8% had at least 1 GB, and roughly 12.2% had at least 2 GB. Note, this is CPU power, not number of computers.

Going by my own, very unofficial, and potentially very wrong data from February 2007, similar SETI@home data for active computers is:
4.65% more than 2 GB
24.69% at least 2 GB
29.28% more than 1 GB
62.08% at least 1 GB
68.15% more than 512 MB
88.08% at least 512 MB

So my guess is that in February 2008, at least 30% of CPU power will have at least 2 GB of memory, and maybe 75% at least 1 GB.
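The cumulative figures above can be differenced to see how much sits at exactly each common size, which is what makes 512 MB and 1 GB "popular choices". This only re-arranges the poster's own (self-described unofficial) February 2007 percentages; no new data is involved.

```python
# Cumulative shares (percent) as listed above.
at_least = {"2 GB": 24.69, "1 GB": 62.08, "512 MB": 88.08}
more_than = {"2 GB": 4.65, "1 GB": 29.28, "512 MB": 68.15}

# Share at exactly each size = "at least" minus "more than".
exactly = {size: round(at_least[size] - more_than[size], 2) for size in at_least}
between_1_and_2_gb = round(more_than["1 GB"] - at_least["2 GB"], 2)
between_512_and_1_gb = round(more_than["512 MB"] - at_least["1 GB"], 2)
below_512 = round(100 - at_least["512 MB"], 2)

print(exactly)  # {'2 GB': 20.04, '1 GB': 32.8, '512 MB': 19.93}
print(between_1_and_2_gb, between_512_and_1_gb, below_512)  # 4.59 6.07 11.92
```

Roughly a fifth of the total sits at exactly 512 MB, so a memory bound just above that value would cut off a noticeable share.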

As long as Rosetta@home's "low"-memory WUs use 130 MB or so, squeezing in under 512 MB will be less and less important as users upgrade to faster computers with more and more memory.


How well this data fits Rosetta@home is another matter, but Rosetta@home can take a look at their database. It is also possible to wade through the stats dumps to gather info...


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50219 - Posted: 31 Dec 2007, 15:44:33 UTC

Therefore we should ask astro if he can gather some data like that with his programs, and find out what is really needed, etc.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 50224 - Posted: 31 Dec 2007, 16:48:00 UTC - in response to Message 50210.  

Like other things, it's listed in client_state.xml, marked <rsc_memory_bound>.

If you haven't connected to the Scheduling-server since being assigned the Task(s), it's also listed in sched_reply_project-url.xml


Thanks. Now I wonder what "rsc" stands for?

I found this in the BOINC wiki: Memory Management.

I've got 3 machines, all with 1 GB or more and 2 CPUs. So all are capable of running the "high memory" tasks, which used to be limited to something just shy of 512 MB. It appears now that perhaps there are higher memory tasks than that (based on the messages people are reporting). Reviewing Task Manager in Windows, I see one active Rosetta task using over 180 MB for "Mem Usage" on each of these machines. I reviewed the client state files on all of these systems, and the rsc_memory_bound for all WUs is 100000000, which I presume is in bytes, and that is how you get to the 96 MB number you mentioned.

Reading the above wiki link, it is unclear if these changes have actually been implemented. But it says that if the task exceeds the bound at any time, it will be aborted. It also says that on the server side, as the work unit is created, this is "an estimate", yet the client side sees it as a hard limit. So it seems to contradict itself.

In any case, my 3 Windows machines are all running these tasks and all have exceeded the 96 MB bound. I'm using BOINC 5.10.20. So I take it that the enforcement and methods outlined in the wiki are not implemented in that version. Does anyone know specifics about whether this has been implemented and, if so, in which BOINC version?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Ingleside

Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 50235 - Posted: 1 Jan 2008, 16:57:45 UTC - in response to Message 50224.  
Last modified: 1 Jan 2008, 17:11:07 UTC

Thanks. Now I wonder what "rsc" stands for?

Hmm, no direct idea, but I would guess something like ReSourCe...

I found this in the BOINC wiki: Memory Management.

I've got 3 machines, all with 1 GB or more and 2 CPUs. So all are capable of running the "high memory" tasks, which used to be limited to something just shy of 512 MB. It appears now that perhaps there are higher memory tasks than that (based on the messages people are reporting). Reviewing Task Manager in Windows, I see one active Rosetta task using over 180 MB for "Mem Usage" on each of these machines. I reviewed the client state files on all of these systems, and the rsc_memory_bound for all WUs is 100000000, which I presume is in bytes, and that is how you get to the 96 MB number you mentioned.

Yes, it's bytes. Because "Mega" is (wrongly) used here to mean 1024 × 1024 bytes, 100000000 bytes becomes 95.37 MB.

Taking a quick look: I've got one running task that is marked as 95.37 MB, but according to Task Manager it has peaked at 203.8 MB, more than double what is specified in the WU...
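The conversion and the ratio quoted above, assuming "MB" means 1024 × 1024 bytes and treating Task Manager's peak figure as being in the same unit:

```python
MIB = 1024 * 1024

bound_bytes = 100_000_000          # rsc_memory_bound from client_state.xml
bound_mib = bound_bytes / MIB
print(round(bound_mib, 2))         # 95.37

peak_mib = 203.8                   # Task Manager peak quoted above
print(round(peak_mib / bound_mib, 2))  # 2.14, i.e. more than double the declared bound
```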

Reading the above wiki link, it is unclear if these changes have actually been implemented. But it says that if the task exceeds the bound at any time, it will be aborted. It also says that on the server side, as the work unit is created, this is "an estimate", yet the client side sees it as a hard limit. So it seems to contradict itself.

In any case, my 3 Windows machines are all running these tasks and all have exceeded the 96 MB bound. I'm using BOINC 5.10.20. So I take it that the enforcement and methods outlined in the wiki are not implemented in that version. Does anyone know specifics about whether this has been implemented and, if so, in which BOINC version?

Each WU has 5 different limits:
<rsc_fpops_est>
<rsc_fpops_bound>
<rsc_memory_bound>
<rsc_disk_bound>
<delay_bound>

<delay_bound> is used by the Scheduling-server when a Task is assigned:
report_deadline = now + delay_bound
It is also used by the client, to try to return all Tasks by their deadline. Newer clients are better at this than older clients.

<rsc_fpops_est>, together with <duration_correction_factor>, some other parameters, and <delay_bound>, is used by the Scheduling-server to decide whether a Task can be sent or not. Note, for some reason it was decided that a client with no tasks in a project can still get 1 task, even if it can't meet <delay_bound>.
On the client, it is used to estimate the remaining CPU time.
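The deadline formula and the send/don't-send decision can be sketched like this. It is deliberately simplified: the runtime estimate uses only `rsc_fpops_est`, the host's benchmarked speed `p_fpops`, and the duration correction factor, and it ignores the "other parameters" (such as work already queued) that the real scheduler also weighs. All numbers are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def report_deadline(now, delay_bound_s):
    """report_deadline = now + delay_bound, as described above."""
    return now + timedelta(seconds=delay_bound_s)


def estimated_runtime_s(rsc_fpops_est, p_fpops, duration_correction_factor):
    """Simplified estimate: flops needed / flops per second, scaled by the DCF."""
    return rsc_fpops_est / p_fpops * duration_correction_factor


def can_send(rsc_fpops_est, p_fpops, dcf, delay_bound_s, tasks_in_project):
    if tasks_in_project == 0:
        return True  # the exception noted above: a host with no tasks may still get 1
    return estimated_runtime_s(rsc_fpops_est, p_fpops, dcf) <= delay_bound_s


now = datetime(2008, 1, 1, tzinfo=timezone.utc)
print(report_deadline(now, 7 * 86400))      # 2008-01-08 00:00:00+00:00
print(estimated_runtime_s(1e14, 2e9, 1.5))  # 75000.0
print(can_send(1e14, 2e9, 1.5, 50_000, 3))  # False: 75000 s > 50000 s
print(can_send(1e14, 2e9, 1.5, 50_000, 0))  # True: first-task exception
```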

<rsc_disk_bound> is enforced by the Scheduling-server: if a computer doesn't have enough BOINC-usable disk space, the task will never be assigned.
On the client, if the result file(s) exceed <rsc_disk_bound>, the task gets aborted. Not sure if this also includes any other temporary files in its "Slots" directory or not...
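Both sides of the disk check fit in a few lines. Summing the result files is an assumption (the post itself is unsure exactly which files count), and the function names and sizes are hypothetical.

```python
MB = 1024 * 1024


def server_sends(rsc_disk_bound, boinc_usable_disk_bytes):
    """Scheduler side: never assign a task whose disk bound exceeds usable disk space."""
    return rsc_disk_bound <= boinc_usable_disk_bytes


def client_aborts_for_disk(output_file_sizes, rsc_disk_bound):
    """Client side: abort when the result files together exceed the bound (assumed sum)."""
    return sum(output_file_sizes) > rsc_disk_bound


print(server_sends(150 * MB, 100 * MB))                      # False
print(client_aborts_for_disk([60 * MB, 50 * MB], 100 * MB))  # True
```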

<rsc_fpops_bound> is client-side only. Since the client doesn't really know how many flops a computer has used on a Task, in reality <rsc_fpops_bound> is a maximum CPU-time limit:
if current_cpu_time > rsc_fpops_bound / p_fpops => abort_task
This means that anyone who runs an "optimized" BOINC client, or uses some other method to artificially increase p_fpops, has a bigger chance of hitting this limit...
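The formula above shows why an inflated benchmark hurts: the allowed CPU time is the bound divided by `p_fpops`, so doubling the measured speed halves the time a task may run. The bound and speeds below are hypothetical.

```python
def max_cpu_time_s(rsc_fpops_bound, p_fpops):
    """Effective CPU-time limit implied by the flops bound."""
    return rsc_fpops_bound / p_fpops


def should_abort(current_cpu_time_s, rsc_fpops_bound, p_fpops):
    """if current_cpu_time > rsc_fpops_bound / p_fpops => abort_task"""
    return current_cpu_time_s > max_cpu_time_s(rsc_fpops_bound, p_fpops)


bound = 4e14     # hypothetical rsc_fpops_bound
honest = 2e9     # hypothetical benchmark from a stock client
inflated = 4e9   # same machine, benchmark doubled by an "optimized" client

print(max_cpu_time_s(bound, honest))            # 200000.0
print(max_cpu_time_s(bound, inflated))          # 100000.0
print(should_abort(150_000, bound, honest))     # False
print(should_abort(150_000, bound, inflated))   # True
```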


<rsc_memory_bound> is the problematic one...
The Scheduling-server has also enforced this limit for a very long time, but with v5.8.xx this was slightly changed:
if pre-v5.8.xx client:
if rsc_memory_bound > m_nbytes => don't assign task

if v5.8.xx or later client:
if rsc_memory_bound > m_nbytes * max(ram_max_used_busy_pct, ram_max_used_idle_pct) / 100 => don't assign task.

Client-side, on the other hand, <rsc_memory_bound> has never been enforced; not even v6.1.x uses it.

But v5.8.xx and later clients do enforce the 2 limits ram_max_used_busy_pct and ram_max_used_idle_pct, meaning that if a Task uses more memory than the larger of these 2 limits, it will be aborted.
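The version-dependent server check and the client-side abort described above can be written side by side. This is a sketch following the rules as stated in this post, not BOINC source code; `client_version` is modelled as a `(major, minor)` tuple, and 800000000 bytes is an assumed byte value behind the "763 MB" figure.

```python
MB = 1024 * 1024


def server_assigns(rsc_memory_bound, m_nbytes, busy_pct, idle_pct, client_version):
    """Scheduler-side memory check, as described above."""
    if client_version < (5, 8):
        limit = m_nbytes  # pre-5.8: compared against total installed memory
    else:
        limit = m_nbytes * max(busy_pct, idle_pct) / 100
    return rsc_memory_bound <= limit


def client_aborts(working_set_bytes, m_nbytes, busy_pct, idle_pct):
    """Client side (v5.8+): abort on real usage above the larger RAM limit.
    rsc_memory_bound itself is not consulted."""
    return working_set_bytes > m_nbytes * max(busy_pct, idle_pct) / 100


host = 768 * MB
high_bound = 800_000_000  # assumed byte value behind "763 MB"
print(server_assigns(high_bound, host, 50, 90, (5, 4)))   # True
print(server_assigns(high_bound, host, 50, 90, (5, 10)))  # False
print(client_aborts(200 * MB, 256 * MB, 50, 60))          # True
```

The same 768 MB host gets the high-memory task from an old client but not from a v5.8+ client with a 90% limit, which illustrates how the change tightened the server-side check.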

Since the memory limit is taken care of by the 2 RAM-usage parameters, it was likely decided that it doesn't matter whether <rsc_memory_bound> is exceeded or not.

Also, BOINC supports applications that vary their memory usage depending on available memory and memory preferences, meaning <rsc_memory_bound> can be set low so all computers can get the task, and on low-memory computers this limit isn't exceeded. On computers with lots of memory, on the other hand, maybe 3x more memory than <rsc_memory_bound> is being used. Due to this, I would guess it's unlikely <rsc_memory_bound> will ever be enforced by the client.


While client-side enforcement of <rsc_fpops_bound> and <rsc_disk_bound> has been included since pre-v3.xx, and the Scheduling-server has enforced <rsc_memory_bound> and <rsc_disk_bound> since 2004, the big weakness is that the number of CPUs is not taken into consideration on either the client or the server.

The only, very limited, support is that a Task that has already started can be paused if memory usage gets too high; but rather than trying to start multiple "high-memory" tasks, it would be better to run 1 "high-memory" and 1 "low-memory" task instead...
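Choosing tasks per CPU against one shared memory budget, as suggested above, can be illustrated with a brute-force search over task mixes. This is not anything BOINC does; the per-task memory figures (an assumed 800000000 bytes for "high-memory", 130 MB for "low-memory") and the 90%-of-1-GB budget are hypothetical, and brute force is only reasonable for a few CPUs and task types.

```python
from itertools import combinations_with_replacement

MB = 1024 * 1024


def best_mix(task_mem, n_cpus, budget_bytes):
    """Pick one task type per CPU so the total memory fits the budget,
    preferring the mix that uses the most memory without exceeding it."""
    best, best_total = None, -1
    for combo in combinations_with_replacement(sorted(task_mem), n_cpus):
        total = sum(task_mem[name] for name in combo)
        if best_total < total <= budget_bytes:
            best, best_total = combo, total
    return best


task_mem = {"high-memory": 800_000_000, "low-memory": 130 * MB}
budget = 1024 * MB * 90 / 100  # 1 GB host at a 90% limit
print(best_mix(task_mem, 2, budget))  # ('high-memory', 'low-memory')
```

Two high-memory tasks (about 1.6 GB) do not fit, so the search settles on one high-memory and one low-memory task, the combination recommended in the post.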
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
Luuklag

Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 50238 - Posted: 1 Jan 2008, 19:02:50 UTC - in response to Message 50235.  


While client-side enforcement of <rsc_fpops_bound> and <rsc_disk_bound> has been included since pre-v3.xx, and the Scheduling-server has enforced <rsc_memory_bound> and <rsc_disk_bound> since 2004, the big weakness is that the number of CPUs is not taken into consideration on either the client or the server.

The only, very limited, support is that a Task that has already started can be paused if memory usage gets too high; but rather than trying to start multiple "high-memory" tasks, it would be better to run 1 "high-memory" and 1 "low-memory" task instead...



So if I get this right, they have to rewrite the largest part of the BOINC program to work with multiple CPUs when checking memory, etc...




©2024 University of Washington
https://www.bakerlab.org