Leave in Memory?

Message boards : Number crunching : Leave in Memory?

Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 9050 - Posted: 15 Jan 2006, 0:23:56 UTC

Ok. The honeymoon is over. When is the problem that forces users to leave the application in memory going to be fixed? It was brought to the foreground over 4 months ago.

ID: 9050 · Rating: 0
Ethan
Volunteer moderator
Joined: 22 Aug 05
Posts: 286
Credit: 9,304,700
RAC: 0
Message 9055 - Posted: 15 Jan 2006, 0:58:03 UTC - in response to Message 9050.  
Last modified: 15 Jan 2006, 0:58:37 UTC

I don't think quoting a time is going to help anything. They have a list of issues that need to be worked on, and they prioritize their work on what is at the top of the list at any given time.

I operate half a dozen machines with 128 MB of RAM, and dozens with 256 MB, none of which has shown the least bit of trouble leaving the application in memory. This holds even when they are running all our business applications and physical memory is full (in some cases using >60% of the scratch space).

I'm not suggesting the bug should be forgotten, but there are issues that prevent people from participating (freezing at 1%, max time exceeded, bandwidth) that are currently being worked on.

-Ethan
ID: 9055 · Rating: 2
nasher
Joined: 5 Nov 05
Posts: 98
Credit: 618,288
RAC: 0
Message 9063 - Posted: 15 Jan 2006, 2:31:43 UTC

Yes, I would like it if you didn't have to leave it in memory, but as it happens I had that option selected even before I knew it was important for this project.

Yeah, there are lots of issues out there, and Leave in memory is one of them...

Personally, I don't think it's a top priority.

Also, it may take time to figure out WHY it errors when it swaps... more alpha/beta testing.
ID: 9063 · Rating: 0
Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 9075 - Posted: 15 Jan 2006, 8:46:48 UTC - in response to Message 9055.  


ID: 9075 · Rating: 0
Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 9104 - Posted: 15 Jan 2006, 20:32:22 UTC - in response to Message 9075.  


ID: 9104 · Rating: 1
David Stites
Joined: 17 Sep 05
Posts: 25
Credit: 1,837,114
RAC: 0
Message 9112 - Posted: 15 Jan 2006, 22:54:28 UTC - in response to Message 9050.  

Ok. The honeymoon is over. When is the problem that forces users to leave the application in memory going to be fixed? It was brought to the foreground over 4 months ago.

I don't leave mine in memory and I run Rosetta just fine.
David Stites
Mount Vernon, WA USA

ID: 9112 · Rating: 0
Webmaster Yoda
Joined: 17 Sep 05
Posts: 161
Credit: 162,253
RAC: 0
Message 9118 - Posted: 16 Jan 2006, 2:15:09 UTC - in response to Message 9112.  
Last modified: 16 Jan 2006, 2:15:44 UTC

I don't leave mine in memory and I run Rosetta just fine.


If just fine means it completes SOME work units OK, I guess you're right.

Your Athlon X2 (host id 765) has nearly as many WUs with client errors as it has successes - I wonder how many of those are due to you not leaving them in memory.

*** Join BOINC@Australia today ***
ID: 9118 · Rating: 0
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 9135 - Posted: 16 Jan 2006, 11:12:19 UTC - in response to Message 9118.  

I don't leave mine in memory and I run Rosetta just fine.


If just fine means it completes SOME work units OK, I guess you're right.

Your Athlon X2 (host id 765) has nearly as many WUs with client errors as it has successes - I wonder how many of those are due to you not leaving them in memory.

Yep, I am also seeing this problem. Unfortunately, leaving applications in memory will not help me: the host usually won't get back to the work unit before a restart anyway.
BOINC WIKI

BOINCing since 2002/12/8
ID: 9135 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 9145 - Posted: 16 Jan 2006, 18:12:18 UTC
Last modified: 16 Jan 2006, 18:14:13 UTC

Just as feedback, for consideration: I didn't have the "Leave in memory" option enabled until just now, and so far have had 2 errors in 50 WUs on my XP test machine:

host128426

On the other hand, I've had 2 errors in 4 WUs on a Linux box so far (which has marginal, under-spec'd hardware compared with the R@H recommended hardware, i.e. only 256 MB RAM, but apart from BOINC it's mostly idle), but the WUs themselves are probably at fault, because they failed on several other PCs as well:

err wu1
err wu2

Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 9145 · Rating: 0
David Stites
Joined: 17 Sep 05
Posts: 25
Credit: 1,837,114
RAC: 0
Message 9151 - Posted: 16 Jan 2006, 20:05:35 UTC - in response to Message 9118.  

I don't leave mine in memory and I run Rosetta just fine.


If just fine means it completes SOME work units OK, I guess you're right.

Your Athlon X2 (host id 765) has nearly as many WUs with client errors as it has successes - I wonder how many of those are due to you not leaving them in memory.

Probably none.
David Stites
Mount Vernon, WA USA

ID: 9151 · Rating: 0
Mike Gelvin
Joined: 7 Oct 05
Posts: 65
Credit: 10,612,039
RAC: 0
Message 10784 - Posted: 15 Feb 2006, 17:03:51 UTC


ID: 10784 · Rating: 0
Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 10785 - Posted: 15 Feb 2006, 17:18:07 UTC
Last modified: 15 Feb 2006, 17:22:26 UTC

David Baker has just posted that the new client is to be released "later this week" in this post.

Of course, what this upcoming release addresses remains to be seen. :)

EDIT: found another post with more info.
ID: 10785 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 10796 - Posted: 16 Feb 2006, 1:35:45 UTC - in response to Message 10784.  
Last modified: 16 Feb 2006, 1:41:38 UTC

ID: 10796 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 10832 - Posted: 17 Feb 2006, 2:44:15 UTC

I wonder how exactly the process of "removing app from memory" is handled by BOINC and the science app. I guess Rosetta would lose some of its temporary results since its last "checkpoint" (writing temporary results to disk every x minutes or y progress?) when it's pre-empted and removed from memory. It probably won't be able to save its work up to the last second of computation.

In short, I've had zero trouble with "leave apps in memory when pre-empted" (even with marginal hosts, as outlined above; by the way, that Linux box with 256 MB RAM has over 110 processes running) and wonder how much overhead I'd pay by returning to "removing apps".

I know I could look at the source of an open-source science app like SETI, or ask this in the BOINC forums, but... I thought I'd ask here too :-)
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 10832 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 10833 - Posted: 17 Feb 2006, 3:20:25 UTC - in response to Message 10832.  
Last modified: 17 Feb 2006, 3:26:09 UTC

I wonder how exactly the process of "removing app from memory" is handled by BOINC and the science app. I guess Rosetta would lose some of its temporary results since its last "checkpoint" (writing temporary results to disk every x minutes or y progress?) when it's pre-empted and removed from memory. It probably won't be able to save its work up to the last second of computation...


You are correct. If the application is removed from memory during an application swap, it will lose the work performed since the last checkpoint. In the case of Rosetta, a checkpoint occurs each time the percentage advances. Since it nominally takes 90-120 min between 10% checkpoints, if you do not keep the application in memory and you set the swap interval to less than the time it takes your machine to reach a 10% mark, the work units can appear to be "hung". This is why the recommended configuration is to set "Keep applications in memory" to "YES". As added protection, it is also recommended to set a value of nominally 120 min between application swaps. This is of course a "belt and suspenders" approach to the issue; either of these settings alone has been shown to reduce the problem.
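To make the "hung" case concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: a task needs a fixed number of CPU minutes, writes a checkpoint every `checkpoint_every` minutes of progress, and gets `run_slice` CPU minutes each time it is scheduled. The function name and parameters are illustrative only and are not part of BOINC or Rosetta.

```python
def slices_to_finish(total_min, checkpoint_every, run_slice,
                     keep_in_memory, max_slices=1000):
    """Return how many scheduling slices a task needs to finish, or None
    if it never finishes within max_slices (i.e. it looks "hung").

    Hypothetical model, not BOINC's actual scheduler: progress is measured
    in CPU minutes, a checkpoint is written every `checkpoint_every` minutes
    of progress, and each slice gives the task `run_slice` CPU minutes.
    """
    progress = 0  # work held by the running (or suspended) process
    for slice_no in range(1, max_slices + 1):
        progress += run_slice
        if progress >= total_min:
            return slice_no
        if not keep_in_memory:
            # Removed from memory: the task restarts from the last
            # checkpoint it reached, so everything after it is discarded.
            progress = (progress // checkpoint_every) * checkpoint_every
        # If kept in memory, the suspended process resumes where it stopped.
    return None
```

With a 100-minute checkpoint interval and 60-minute slices, a 1000-minute task never finishes unless it is kept in memory (then it needs 17 slices); raising the slice to 120 minutes lets it advance one checkpoint per slice and finish in 10, which mirrors the "belt and suspenders" advice above.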

It should be noted that ALL BOINC projects suffer from a loss of CPU cycles if the applications are not kept in memory. Any work that is not saved at a checkpoint before a swap is lost when the swap occurs. On applications like Rosetta and Climate Prediction this loss is significant (15 min to over an hour). On projects like Predictor, SETI and Einstein the loss is smaller, but it is still there (usually 60 seconds with the default setting for writing to disk). People who wish to squeeze every cycle out of their machines usually keep applications in memory for this reason.
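The same reasoning gives a back-of-the-envelope estimate of CPU time thrown away per day. The sketch below assumes, hypothetically, that each swap lands at a uniformly random point between checkpoints (so on average half a checkpoint interval is lost), that the swap interval is longer than the checkpoint interval, and that restart overhead is negligible. The function name is illustrative, not a BOINC setting.

```python
def daily_loss_minutes(checkpoint_interval_min, swap_interval_min,
                       keep_in_memory):
    """Rough CPU minutes lost per 24 h of run time to application swaps.

    Assumptions (hypothetical model): every swap removes the app from
    memory unless keep_in_memory is True, and the swap point is uniformly
    distributed between checkpoints, so the average loss per swap is half
    a checkpoint interval.
    """
    if keep_in_memory:
        return 0.0  # a suspended process keeps its unsaved work
    if swap_interval_min <= checkpoint_interval_min:
        raise ValueError("no checkpoint may be reached before a swap; "
                         "the task can appear hung")
    swaps_per_day = (24 * 60) / swap_interval_min
    return swaps_per_day * checkpoint_interval_min / 2
```

For example, a one-minute disk-write interval with hourly swaps comes to about 12 minutes a day, while a hypothetical 30-minute checkpoint interval with 2-hour swaps comes to about 3 hours of CPU time a day.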

It is important to note that the memory we are talking about is not actual RAM but in fact virtual memory (on disk), so the actual impact of this is not that significant. In addition, you can adjust how much memory is used by the application in this regard.

With your permission I would like to add your question and the answer to the FAQs, and credit it to you and your team.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 10833 · Rating: 0
Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10847 - Posted: 17 Feb 2006, 14:44:39 UTC
Last modified: 17 Feb 2006, 14:48:15 UTC

It is important to note that the memory we are talking about is not actual RAM but in fact virtual memory (on disk), so the actual impact of this is not that significant. In addition, you can adjust how much memory is used by the application in this regard.


Note that it is indeed actual RAM; only the system can decide to move it to swap space.

... and then ... some apps (e.g. simap) end with no 'finished' file when moved to swap ...

Date Host Project ID Message
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2825 Result 200601277.018731_1 exited with zero status but no 'finished' file
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2826 If this happens repeatedly you may need to reset the project.
2/17/2006 10:48:50 AM crobertp.cp3 --- 2827 request_reschedule_cpus: process exited
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2828 Restarting result 200601277.018731_1 using simap version 507

Leaving RAM filled with stopped tasks is not a very good idea...
Ideally, all apps should checkpoint and then either exit or suspend in RAM.

What about reboots / power losses? Is RAM kept across reboots?
I believe it is not!!!
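The "checkpoint, then exit" idea can be sketched as follows. This is a generic Python illustration, not BOINC code: real BOINC applications go through the BOINC runtime library rather than a bare signal handler, and the checkpoint format, the file name, and the use of SIGTERM as the stop request are all assumptions. The checkpoint is written to a temporary file and then renamed, so a crash during the write does not leave a half-written checkpoint.

```python
import json
import os
import signal
import tempfile


def save_checkpoint(path, state):
    """Write `state` as JSON to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk
    os.replace(tmp_path, path)  # atomic rename on POSIX


def load_checkpoint(path):
    """Resume from the last checkpoint, or start fresh if there is none."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=10):
    """Do `n_steps` units of placeholder work, checkpointing periodically
    and once more before returning when a stop request arrives."""
    stop_requested = False

    def on_stop(signum, frame):
        nonlocal stop_requested
        stop_requested = True  # finish the current step, then checkpoint

    signal.signal(signal.SIGTERM, on_stop)  # hypothetical stop request
    state = load_checkpoint(path)
    while state["step"] < n_steps and not stop_requested:
        state["total"] += state["step"]  # stand-in for real science work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # no completed step is lost on exit
    return state
```

Resuming after a stop, reboot, or power loss is then just calling `run` again with the same path: it picks up at the last saved step instead of relying on anything kept in RAM.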




ID: 10847 · Rating: 0
Moderator9
Volunteer moderator
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 10870 - Posted: 18 Feb 2006, 5:27:28 UTC - in response to Message 10847.  
Last modified: 18 Feb 2006, 5:30:16 UTC

It is important to note that the memory we are talking about is not actual RAM but in fact virtual memory (on disk), so the actual impact of this is not that significant. In addition, you can adjust how much memory is used by the application in this regard.


Note that it is indeed actual RAM; only the system can decide to move it to swap space.

... and then ... some apps (e.g. simap) end with no 'finished' file when moved to swap ...

Date Host Project ID Message
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2825 Result 200601277.018731_1 exited with zero status but no 'finished' file
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2826 If this happens repeatedly you may need to reset the project.
2/17/2006 10:48:50 AM crobertp.cp3 --- 2827 request_reschedule_cpus: process exited
2/17/2006 10:48:50 AM crobertp.cp3 boincsimap 2828 Restarting result 200601277.018731_1 using simap version 507

Leaving RAM filled with stopped tasks is not a very good idea...
Ideally, all apps should checkpoint and then either exit or suspend in RAM.

What about reboots / power losses? Is RAM kept across reboots?
I believe it is not!!!



A reboot or system crash is hardly comparable to an application swap. As for RAM usage: if the actual RAM is required by another application, the BOINC storage is placed in virtual memory on disk. If the system does not require it to be moved, it stays in RAM. So the RAM is available if the system needs it.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 10870 · Rating: 0
Dimitris Hatzopoulos
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 10907 - Posted: 18 Feb 2006, 21:30:58 UTC - in response to Message 10833.  

With your permission I would like to add your question and the answer to the FAQs, and credit it to you and your team.


Sure, feel free to add/edit this Q (or any other post of mine) to the FAQ. Thx.

Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 10907 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org