Problems with Minirosetta v1.54

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60927 - Posted: 30 Apr 2009, 16:19:58 UTC

Robert, thanks for the comments. I have plenty of memory, but for 1/3 of the day I actually use it for a number of work applications and with the new increase in memory used by mini, I'm testing to see if BOINC is the cause of some sluggish behavior on my machine. Indeed it seems to be the case.

Yes, by HT, I meant hyperthreaded. But I believe setting number of CPUs to one on a machine configured with HT active would cut my credit roughly in half. I'd think that the other analysis you've read is comparing a machine with HT enabled running 2 tasks at a time, with the same machine with HT disabled running 1. Since my HT is enabled, running 2 tasks is the only way to break even. But yes, one option would be to disable HT, then I'd be focusing all the resource on one task at a time, and not have the desire to support memory enough for two tasks.

I was just trying to point out that 6.6.20 seems to be removing tasks from memory in some cases, even when configured to leave tasks in memory. And this can lead to cancelled WUs such as I reported. I wasn't limiting memory on my prior version of BOINC, so am unsure if this is new behavior or not.

I just saw another task suspended waiting for memory, but this time it remained in the task list. Could be BOINC saw it had 3 hours invested in it and didn't want to throw it away. I believe the tasks that are getting removed are actually only running for a couple of minutes.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 466
Message 60929 - Posted: 30 Apr 2009, 21:15:55 UTC - in response to Message 60922.  

Hello!
I'm having difficulty uploading my last crunched file with version 1.54. I repeatedly get

I'm getting the same type of messages too:
5/1/2009 8:51:54 AM rosetta@home Temporarily failed upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VYHA_4_11644_1_0_0: HTTP error
5/1/2009 8:51:54 AM rosetta@home Backing off 2 hr 52 min 32 sec on upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VYHA_4_11644_1_0_0
5/1/2009 8:51:54 AM rosetta@home Started upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1FXWF_6_11644_1_0_0
5/1/2009 8:51:56 AM Internet access OK - project servers may be temporarily down.
5/1/2009 8:51:59 AM rosetta@home Finished upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1FXWF_6_11644_1_0_0
5/1/2009 8:52:53 AM Project communication failed: attempting access to reference site
5/1/2009 8:52:53 AM rosetta@home Temporarily failed upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VJGA_4_11644_1_0_0: HTTP error
5/1/2009 8:52:53 AM rosetta@home Backing off 12 min 18 sec on upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VJGA_4_11644_1_0_0
5/1/2009 8:52:55 AM Internet access OK - project servers may be temporarily down.
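
The backoff intervals in the log above vary a lot (almost 3 hours for one file, about 12 minutes for another). A minimal Python sketch for pulling those intervals out of saved message-log lines so they can be compared in seconds. The regular expression is an assumption based only on the two "Backing off" lines shown here; other BOINC versions may word the message differently, and `backoff_seconds` is a made-up helper name, not part of BOINC.

```python
import re

# Assumed format, taken from the log lines above:
#   "Backing off 2 hr 52 min 32 sec on upload of <file>"
#   "Backing off 12 min 18 sec on upload of <file>"
_BACKOFF = re.compile(r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec")


def backoff_seconds(line):
    """Return the backoff in seconds, or None if the line is not a backoff message."""
    match = _BACKOFF.search(line)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# File names shortened to <file>; the timings are the ones from the log.
sample_log = [
    "5/1/2009 8:51:54 AM rosetta@home Backing off 2 hr 52 min 32 sec on upload of <file>",
    "5/1/2009 8:51:56 AM Internet access OK - project servers may be temporarily down.",
    "5/1/2009 8:52:53 AM rosetta@home Backing off 12 min 18 sec on upload of <file>",
]

for entry in sample_log:
    secs = backoff_seconds(entry)
    if secs is not None:
        print(secs, "seconds until the next retry")
```

The client retries each file on its own once its interval expires, so the longest value gives a rough idea of how long the Transfers tab may look stuck.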

Should I abort these transfers? I will wait for further instructions before I do anything to these.
Have a crunching good day!!
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 60930 - Posted: 30 Apr 2009, 21:26:28 UTC - in response to Message 60927.  

Robert, thanks for the comments. I have plenty of memory, but for 1/3 of the day I actually use it for a number of work applications and with the new increase in memory used by mini, I'm testing to see if BOINC is the cause of some sluggish behavior on my machine. Indeed it seems to be the case.

Yes, by HT, I meant hyperthreaded. But I believe setting number of CPUs to one on a machine configured with HT active would cut my credit roughly in half. I'd think that the other analysis you've read is comparing a machine with HT enabled running 2 tasks at a time, with the same machine with HT disabled running 1. Since my HT is enabled, running 2 tasks is the only way to break even. But yes, one option would be to disable HT, then I'd be focusing all the resource on one task at a time, and not have the desire to support memory enough for two tasks.

I was just trying to point out that 6.6.20 seems to be removing tasks from memory in some cases, even when configured to leave tasks in memory. And this can lead to cancelled WUs such as I reported. I wasn't limiting memory on my prior version of BOINC, so am unsure if this is new behavior or not.

I just saw another task suspended waiting for memory, but this time it remained in the task list. Could be BOINC saw it had 3 hours invested in it and didn't want to throw it away. I believe the tasks that are getting removed are actually only running for a couple of minutes.


Do you have enough free disk space to let BOINC increase the swap space it can use? BOINC uses that space to store partly completed work so it can resume where it was interrupted. With enough swap, BOINC could simply switch to projects with lower memory requirements while you need the memory for something else; for example, the POEM@HOME project requires less memory, but helps an earlier step in medical research. The suspended tasks then move off the list of currently running tasks, but can move back onto it later and resume at the point of interruption, instead of being dropped entirely. Such tasks will need to go back to their last checkpoint if you reboot for any reason, though.

If you prefer to run mainly Rosetta@home, just keep the share of CPU time assigned to these lower-memory projects below the share of time during which you actually need BOINC to run with lower memory requirements.

Also, ensuring that there is enough swap space for all the projects BOINC tries to keep running at once lets you suspend all BOINC projects at once if you need to run something even more demanding. It seems that the default swap space allowance isn't good enough if you attach to enough BOINC projects at once and even one of them is as memory-hungry as Rosetta@home.

http://boinc.fzk.de/poem/

Also, turning off one of a pair of hyperthreaded CPUs shouldn't cause you to get only half the credits, since it then allows you to run the other one at full speed, instead of at barely more than half the full speed. It would, however, give you only half the credits if you actually had two fully independent CPU cores instead of a hyperthreaded pair, or if you use an older version of BOINC that isn't aware that it needs to keep track of CPU core sharing between hyperthreaded pairs.
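
To put rough numbers on that argument, here is a small Python sketch. The 0.55 figure is purely an assumed stand-in for "barely more than half the full speed"; it is not a measurement, and real hyperthreading gains vary by CPU and workload.

```python
def relative_throughput(tasks, speed_per_task):
    """Total work per unit time, where 1.0 means one task running at full speed."""
    return tasks * speed_per_task


# ASSUMPTION: each task on a shared hyperthreaded pair runs at 55% of full speed.
HT_SHARED_SPEED = 0.55

ht_two_tasks = relative_throughput(2, HT_SHARED_SPEED)  # both logical CPUs busy
ht_one_task = relative_throughput(1, 1.0)               # one logical CPU left idle

real_two_tasks = relative_throughput(2, 1.0)            # two independent cores
real_one_task = relative_throughput(1, 1.0)

print("HT pair, 1 task vs 2 tasks:  ", round(ht_one_task / ht_two_tasks, 2))
print("Real cores, 1 task vs 2 tasks:", round(real_one_task / real_two_tasks, 2))
```

Under that assumption, dropping to one task on a hyperthreaded pair keeps roughly nine tenths of the throughput, while on two independent cores it keeps exactly half. That only holds if the single task really does speed up when its sibling is idle.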

If your main concern is credits for helping medical research and you happen to have one of the newer graphics boards GPUGRID can use (mainly recent Nvidia cards), consider adding GPUGRID to your list of BOINC projects. It will require switching to the newest version of BOINC I've read about, but then can run workunits on your graphics card instead of on your CPUs. Shouldn't interfere with your regular computer use if it isn't graphics-intensive.

http://www.gpugrid.net/

Also, check whether that web site I gave mentions how much memory your machine can handle and what it would cost. I spent only about $50 (US) to reach the maximum amount this computer can use, though I installed the new, faster memory myself.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60931 - Posted: 30 Apr 2009, 21:32:24 UTC

Speedy, no, don't abort them. I'm sure the problem with uploads must be related to the current problems with getting credit issued. When the back end file system is having problems, everything is having problems to some degree or another.
Rosetta Moderator: Mod.Sense
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 466
Message 60932 - Posted: 30 Apr 2009, 21:40:26 UTC - in response to Message 60931.  

Speedy, no, don't abort them. I'm sure the problem with uploads must be related to the current problems with getting credit issued. When the back end file system is having problems, everything is having problems to some degree or another.

Thank you. All my results that need to be uploaded have just been uploaded. All is good at my end. Thank you for your continued hard work.
Have a crunching good day!!
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60933 - Posted: 30 Apr 2009, 21:42:18 UTC

Robert, yes, I've had all the same thoughts, and have plenty of disk allowed to BOINC, and to my swap file. But I'm finding that BOINC isn't smart enough to realize which projects require less memory. It cycles through all the work you currently have for the project it wants to repay debt to, and only after it gets about 2 minutes into every single downloaded Rosetta task will it try to run a 10MB WCG rice task. But if I don't happen to have any WCG work, it isn't smart enough to think about getting some rather than leaving a CPU idle.

I'd love if it were smart enough to run one Rosetta and one rice during the day when I'm using the machine, and then run dual Rosetta tasks at night when my machine is idle and I allow more memory to BOINC. But it's just not smart enough to do so without major manual adjustments.

I could keep a larger cache of work, and therefore help ensure I always have something from each project, but then it would cycle through 10 Rosetta tasks, running each for 2 minutes, rather than just 6.

Hopefully with all the discussion on the client work fetch policies, something will shake out that will work better.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60934 - Posted: 30 Apr 2009, 21:48:23 UTC

Guess I'd never noticed BOINC allows you to configure the amount of swap space (I thought you meant size of Win page file). It was set to 75%, and Win task manager shows my "commit charge" to be 1477M/3397M. So does that mean my swap file is 3.4GB? And so BOINC is allowed over 2GB of swap space, but my entire system hasn't reached that much.
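
A quick Python check of the arithmetic behind those figures. It assumes BOINC's 75% is applied to the 3397M limit that Task Manager reports; that limit is normally physical RAM plus page file rather than the page file alone, and whether BOINC uses the same base is not confirmed here.

```python
# Figures quoted from Task Manager and the BOINC preference above.
commit_total_mb = 1477   # "commit charge" currently in use, whole system
commit_limit_mb = 3397   # commit charge limit reported by Task Manager
boinc_swap_pct = 75      # BOINC "use at most N% of swap" setting

# ASSUMPTION: the percentage is taken against the commit limit.
boinc_allowance_mb = commit_limit_mb * boinc_swap_pct / 100
headroom_mb = boinc_allowance_mb - commit_total_mb

print("BOINC swap allowance:", boinc_allowance_mb, "MB")
print("Allowance minus current whole-system commit:", headroom_mb, "MB")
```

So on these numbers the allowance works out to a bit over 2.5 GB, about 1 GB more than the whole system is committing right now.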
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 60935 - Posted: 30 Apr 2009, 21:52:27 UTC - in response to Message 60929.  

Hello!
I'm having difficulty uploading my last crunched file with version 1.54. I repeatedly get

I'm getting the same type of messages too:
5/1/2009 8:51:54 AM rosetta@home Temporarily failed upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VYHA_4_11644_1_0_0: HTTP error
...
5/1/2009 8:52:53 AM Project communication failed: attempting access to reference site
5/1/2009 8:52:53 AM rosetta@home Temporarily failed upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VJGA_4_11644_1_0_0: HTTP error
5/1/2009 8:52:53 AM rosetta@home Backing off 12 min 18 sec on upload of less_careful_inward0.05_ub0.2_lb0.07_maxdist20_chunk_0_8_hb_t297__IGNORE_THE_REST_1VJGA_4_11644_1_0_0
5/1/2009 8:52:55 AM Internet access OK - project servers may be temporarily down.

Should I abort these transfers? I will wait for further instructions before I do anything to these.


If the transfers aren't too close to their deadlines, I'd just let BOINC keep trying. I've had workunits upload successfully after getting similar messages for days, when router problems kept me from reaching the internet at all for several days. However, it's occasionally useful in such circumstances to first start viewing the Rosetta@home web site to make sure the connection is open, then without closing your browser, start the BOINC manager program if it isn't already running, click on Advanced View if the simplified view appears first, then click on the Transfers tab, click on Advanced, then click on Do network communication in order to make it retry the communications while your connection to the internet is still open.

For some BOINC projects, even returning the results after their deadlines is useful, if you manage to return the results before anyone else does for the same workunit. Not all BOINC projects allow this, though.
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,806,125
RAC: 3,336
Message 60948 - Posted: 2 May 2009, 9:55:40 UTC - in response to Message 60934.  

Guess I'd never noticed BOINC allows you to configure the amount of swap space (I thought you meant size of Win page file). It was set to 75%, and Win task manager shows my "commit charge" to be 1477M/3397M. So does that mean my swap file is 3.4GB? And so BOINC is allowed over 2GB of swap space, but my entire system hasn't reached that much.


At least some versions of Windows automatically expand the swap space when usage comes close to the amount already provided, which can happen if BOINC is allowed to use a large enough fraction of it. I'd expect "page file" to be the Windows name for what some people call the swap file.

I've set up my machines to start up with the swap file size already set to 30 GB, with no sign of coming close to that limit. That doesn't allow any further expansion, but should keep the disk head from needing to move very far when going from one place in the swap file to another.

I have seen signs that BOINC divides the available swap space equally among either the active slots or all the enabled BOINC projects before deciding how much to give to each workunit, and does not adjust this based on how much memory each BOINC project is expected to require. For that reason, if you have enough free disk space, allowing both the swap file and the disk space for each workunit to be significantly more than the average required is helpful for the applications with high requirements, such as minirosetta.
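
To illustrate why an equal split would hurt the memory-hungry applications, here is a toy Python model. It is only a sketch of the behaviour I think I have seen, not of anything BOINC documents, and every number in it is made up for the example.

```python
def equal_share_mb(total_swap_mb, slots):
    """Swap each slot would get if the total were simply divided evenly."""
    return total_swap_mb / slots


def tasks_over_share(needs_mb, total_swap_mb):
    """Names of tasks whose need exceeds an equal share of the total."""
    share = equal_share_mb(total_swap_mb, len(needs_mb))
    return [name for name, need in needs_mb.items() if need > share]


# HYPOTHETICAL needs: one large task and one small one.
needs = {"large task": 800, "small task": 10}

# 1000 MB covers the combined need (810 MB), but an even split gives each
# task only 500 MB, so the large task does not fit.
print(tasks_over_share(needs, 1000))

# Doubling the total makes the even share big enough for both.
print(tasks_over_share(needs, 2000))
```

That is the reason for allowing significantly more than the average: the total has to be large enough that an even share still covers the biggest application.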