Rosetta needs 6675.72 MB RAM: is the restriction really needed?

Kissagogo27
Joined: 31 Mar 20
Posts: 69
Credit: 1,737,845
RAC: 1,858
Message 101891 - Posted: 23 May 2021, 8:46:06 UTC

not yet, see it in 12 hours ...
ID: 101891 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 101896 - Posted: 23 May 2021, 20:04:09 UTC - in response to Message 101890.  

The original figure was 7*10^9 bytes - 7 followed by 9 zeros.
Divide by 1024 twice to convert to MB = 6675.72MB

653095368 converts to 622.84MB RAM
525204451 converts to 500.87MB RAM
hi, i've got seven of "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_" WU per 2GB computer

That's just what we need to hear. You're back in business.
Sorry it took so long.
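
For anyone who wants to check the conversion quoted above, here is a minimal Python sketch. The helper name bytes_to_mb is made up for illustration; it simply divides by 1024 twice, as described.

```python
def bytes_to_mb(n_bytes: float) -> float:
    """Convert a byte count to megabytes by dividing by 1024 twice."""
    return n_bytes / 1024 / 1024


# The three byte values mentioned in this thread.
for value in (7 * 10**9, 653095368, 525204451):
    print(f"{value} bytes = {bytes_to_mb(value):.2f} MB")
```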
ID: 101896 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1331
Credit: 13,624,788
RAC: 7
Message 101899 - Posted: 24 May 2021, 7:23:47 UTC - in response to Message 101896.  

The original figure was 7*10^9 bytes - 7 followed by 9 zeros.
Divide by 1024 twice to convert to MB = 6675.72MB

653095368 converts to 622.84MB RAM
525204451 converts to 500.87MB RAM
hi, i've got seven of "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_" WU per 2GB computer

That's just what we need to hear. You're back in business.
Sorry it took so long.
Just had a look at one of my systems and all Tasks bar one are presently pre_helical_bundles_round1_ type, and out of all of them, only two have the reduced memory/disk values.
So it looks like the large value Tasks are still well & truly in the majority at this stage.
Grant
Darwin NT
ID: 101899 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 101911 - Posted: 24 May 2021, 22:49:55 UTC - in response to Message 101899.  

The original figure was 7*10^9 bytes - 7 followed by 9 zeros.
Divide by 1024 twice to convert to MB = 6675.72MB

653095368 converts to 622.84MB RAM
525204451 converts to 500.87MB RAM
hi, i've got seven of "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_" WU per 2GB computer

That's just what we need to hear. You're back in business.
Sorry it took so long.
Just had a look at one of my systems and all Tasks bar one are presently pre_helical_bundles_round1_ type, and out of all of them, only two have the reduced memory/disk values.
So it looks like the large value Tasks are still well & truly in the majority at this stage.

Urghh, yes. Only 7 out of 50 here atm with reduced RAM settings, though some other task-types are beginning to come down now.
Which is likely to explain why In Progress has dropped again to 425k.
Very frustrating because we were looking close to a long-term solution for a little while
ID: 101911 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 101913 - Posted: 25 May 2021, 1:03:23 UTC - in response to Message 101899.  

The original figure was 7*10^9 bytes - 7 followed by 9 zeros.
Divide by 1024 twice to convert to MB = 6675.72MB

653095368 converts to 622.84MB RAM
525204451 converts to 500.87MB RAM
hi, i've got seven of "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_" WU per 2GB computer
That's just what we need to hear. You're back in business.
Sorry it took so long.
Just had a look at one of my systems and all Tasks bar one are presently pre_helical_bundles_round1_ type, and out of all of them, only two have the reduced memory/disk values.
So it looks like the large value Tasks are still well & truly in the majority at this stage.

I've fed back that the reduction in the RAM setting seems to have been very successful: I'm not aware of any crashes occurring as a result, and hosts with only 2GB RAM available have been able to download and run the tasks successfully. I've asked for more tasks to be modified to call for less RAM, on the assumption that only a small sample has had its setting changed so far, to confirm they run ok.
And also to set it up as a daily-running job if that's what needs to be done.
And I've added that the same change can be tried on other task types as well, if possible.
ID: 101913 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 101917 - Posted: 25 May 2021, 20:12:05 UTC - in response to Message 101913.  

The original figure was 7*10^9 bytes - 7 followed by 9 zeros.
Divide by 1024 twice to convert to MB = 6675.72MB

653095368 converts to 622.84MB RAM
525204451 converts to 500.87MB RAM
hi, i've got seven of "pre_helical_bundles_round1_attempt1_SAVE_ALL_OUT_IGNORE_THE_REST_" WU per 2GB computer
That's just what we need to hear. You're back in business.
Sorry it took so long.
Just had a look at one of my systems and all Tasks bar one are presently pre_helical_bundles_round1_ type, and out of all of them, only two have the reduced memory/disk values.
So it looks like the large value Tasks are still well & truly in the majority at this stage.

I've fed back that the reduction in the RAM setting seems to have been very successful: I'm not aware of any crashes occurring as a result, and hosts with only 2GB RAM available have been able to download and run the tasks successfully. I've asked for more tasks to be modified to call for less RAM, on the assumption that only a small sample has had its setting changed so far, to confirm they run ok.
And also to set it up as a daily-running job if that's what needs to be done.
And I've added that the same change can be tried on other task types as well, if possible.

Once a technical issue is resolved, the balance of pre_helical_bundles tasks will be amended. Hopefully won't be too long.

In the meantime, my main PC seems to have had an 'episode' today and crashed out my entire Rosetta cache, to be replaced by 240+ WCG tasks, so I won't have much idea what's going on for a little while
ID: 101917 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1331
Credit: 13,624,788
RAC: 7
Message 101927 - Posted: 26 May 2021, 9:01:19 UTC - in response to Message 101917.  

In the meantime, my main PC seems to have had an 'episode' today and crashed out my entire Rosetta cache, to be replaced by 240+ WCG tasks, so I won't have much idea what's going on for a little while
Something rather odd happened there- it went to start a Task, but a file was missing from the folder.

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
couldn't start app: Input file database_357d5d93529_n_methyl.zip missing or invalid: file missing</message>
]]>



Then a whole bunch of failed downloads when it tried to download a copy of the missing file, but couldn't find it.

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
</message>
]]>
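
If you have a pile of these to sift through, something like the Python sketch below could pull the file name and error code out of a file_xfer_error fragment. It only handles the fragment exactly as pasted above; the function name summarise_xfer_error is made up for illustration and is not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the error output above.
FRAGMENT = """<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>"""


def summarise_xfer_error(fragment: str) -> dict:
    """Return the file name, numeric error code and message from one fragment."""
    root = ET.fromstring(fragment)
    code_text = (root.findtext("error_code") or "").strip()
    # The code field looks like "-120 (description)"; keep only the number.
    numeric_code = int(code_text.split()[0]) if code_text else None
    return {
        "file_name": (root.findtext("file_name") or "").strip(),
        "error_code": numeric_code,
        "error_message": (root.findtext("error_message") or "").strip(),
    }


print(summarise_xfer_error(FRAGMENT))
```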

Grant
Darwin NT
ID: 101927 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 101934 - Posted: 26 May 2021, 23:57:55 UTC - in response to Message 101927.  

In the meantime, my main PC seems to have had an 'episode' today and crashed out my entire Rosetta cache, to be replaced by 240+ WCG tasks, so I won't have much idea what's going on for a little while
Something rather odd happened there- it went to start a Task, but a file was missing from the folder.

Is that one of my error messages?
Yeah, I saw that in my event log too. I think it's the main Rosetta database file that gets downloaded - used with everything.
It happened a 2nd time too. It crashes all the tasks but not the PC itself. Main problem is it results in a 24hr backoff that I have to interrupt.
I strongly suspect it's caused by my overclock running very high temps.
It isn't happening on any of my other devices, so I'm pretty sure it's specific to this one host. Ugh...
ID: 101934 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1331
Credit: 13,624,788
RAC: 7
Message 101952 - Posted: 28 May 2021, 20:48:54 UTC
Last modified: 28 May 2021, 21:15:54 UTC

In progress and Successes last 24hours numbers are the lowest they've been in over a week. Over the last day or so it's been all pre_helical_bundles_ Tasks.

I just had a look at what's on my system, and as near as i can tell all of them have the improved memory requirement values, but they all still have the extreme storage requirement values. So i suspect the cause for the drop in work being done (at least for now) is due to those storage values alone.


A quick check shows 2.8MB as the largest amount of disk space used by a Task in my Task list at this time, but the required value is still set at <rsc_disk_bound>9000000000.000000</rsc_disk_bound> (roughly 8.4GB). Given that the most storage space used by Rosetta on my 6c/12t system, which still has multiple versions of Rosetta & Mini Rosetta on it, has been 2.5GB, i think 2.75GB would be more than enough (certainly no more than 3GB).
That will let most people get work without needing to change their default Computing preferences, and not run into actual lack of space issues.
(Edit- if the value only needs to reflect the storage space required by the Task itself and not Rosetta as a whole, then 100MB would still be an excessive requirement given the actual amounts used, but still way better than 2.75GB, which is still way better than 8+GB).
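
To put numbers on that gap, here is a rough Python sketch that reads the rsc_disk_bound value quoted above and converts it to GB by dividing by 1024 three times. The 2.5GB and 2.75GB figures are the ones from this post; the function name disk_bound_gb is just for illustration.

```python
import re

# The element as quoted from the Task's properties above.
LINE = "<rsc_disk_bound>9000000000.000000</rsc_disk_bound>"


def disk_bound_gb(xml_line: str) -> float:
    """Extract the rsc_disk_bound value (bytes) and convert it to GB."""
    match = re.search(r"<rsc_disk_bound>([0-9.]+)</rsc_disk_bound>", xml_line)
    if match is None:
        raise ValueError("no rsc_disk_bound element found")
    return float(match.group(1)) / 1024**3


bound = disk_bound_gb(LINE)
peak_used_gb = 2.5    # most disk Rosetta has used on the system described above
suggested_gb = 2.75   # the alternative bound suggested above

print(f"required: {bound:.2f} GB")
print(f"that is {bound / peak_used_gb:.1f}x the peak usage "
      f"and {bound / suggested_gb:.1f}x the suggested bound")
```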

For those with extreme core count systems (24+) hopefully the owners will be smart enough to realise they'll need more resources than people with considerably fewer cores/threads if they want to use all of them, and adjust their Computing preference settings accordingly.
Grant
Darwin NT
ID: 101952 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1331
Credit: 13,624,788
RAC: 7
Message 101954 - Posted: 29 May 2021, 9:37:56 UTC - in response to Message 101952.  
Last modified: 29 May 2021, 9:42:25 UTC

In progress and Successes last 24hours numbers are the lowest they've been in over a week. Over the last day or so it's been all pre_helical_bundles_ Tasks.

I just had a look at what's on my system, and as near as i can tell all of them have the improved memory requirement values, but they all still have the extreme storage requirement values. So i suspect the cause for the drop in work being done (at least for now) is due to those storage values alone.
Looks like that is the case.
Several new batches of work have come through with less than half the required disk space value of the pre_helical_bundles_ Tasks, and work In progress is on the rise again.
Grant
Darwin NT
ID: 101954 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 102036 - Posted: 8 Jun 2021, 22:53:58 UTC - in response to Message 101934.  
Last modified: 8 Jun 2021, 23:02:43 UTC

In the meantime, my main PC seems to have had an 'episode' today and crashed out my entire Rosetta cache, to be replaced by 240+ WCG tasks, so I won't have much idea what's going on for a little while
Something rather odd happened there- it went to start a Task, but a file was missing from the folder.

Is that one of my error messages?
Yeah, I saw that in my event log too. I think it's the main Rosetta database file that gets downloaded - used with everything.
It happened a 2nd time too. It crashes all the tasks but not the PC itself. Main problem is it results in a 24hr backoff that I have to interrupt.
I strongly suspect it's caused by my overclock running very high temps.
It isn't happening on any of my other devices, so I'm pretty sure it's specific to this one host. Ugh...

Hi. I'm back.
So, I stopped to dismantle and clean out my whole computer case, fans etc, put it back together and... I lost all video output.
Monitor appeared to be working - reporting no signal being received.
I tried another monitor from my other PC, which has been out of commission for a few weeks (I never got round to looking at it), and got the same error on that monitor.
I feared something terminal had happened to my very old graphics card - a GTX750 from 2013.

Into the repair shop while I was due to be working away for a few days. My home one and my other one.
First report back - nothing wrong with either of them. So I took over both of my monitors before going away - could be something to do with them.

Both returned now.
5800X Ryzen - nothing wrong with it, nor the graphics card, nor the monitor, nor the DVI-D cable. No idea why anything went wrong.
i3-8350K - nothing wrong with it, nor the monitor (no graphics card - on-board UHD630 graphics). But, HDMI cable failed - no wonder the second monitor wouldn't work either.
Cheapest repair I ever had - the guy refused to take any money from me (but I gave him something anyway - I don't agree with freebies).

I'd asked if he had a second-hand graphics card I could swap in - something between a GTX1050 & GTX1650
He's given me a card and told me to try it to see if it's any good. He didn't know what it was (and neither did I) until I checked out the serial number on it. Turns out it's a Radeon 260X
Better than the GTX750 I've got, but not quite good enough to swap out, so I'll be trying it on the i3 as it's miles better than that one's onboard graphics.
All being well with installing it at the end of the week I'll pay for it, then ask him to look out for a decent 1050-1650 and give me a call if one turns up.
I don't have any great need for a graphics card but neither do I like the look of the pricing nor availability on a new one if this GTX750 does fail on me.

Also, if I'm forced to go back to using a crappy laptop for any length of time, like I have recently, I may have to shoot myself. I despise them
ID: 102036 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 102037 - Posted: 8 Jun 2021, 23:19:11 UTC - in response to Message 102036.  

i3-8350K - nothing wrong with it, nor the monitor (no graphics card - on-board UHD630 graphics). But, HDMI cable failed

I've just remembered why this came up in this thread.

Video output was lost but that PC was running, so I brought it home to check it out - just never got round to it.
With a new monitor cable, I booted it up tonight and, as I originally suspected, I hadn't allocated sufficient RAM nor disk space to Boinc to run Rosetta tasks. And WCG tasks came down to start work straight away, which it's doing.
A quick change of settings in the way we've all now learned and it's grabbed its first Rosetta tasks since the end of March, having run its backup project in most of that time.
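
For anyone who'd rather script that change than click through the Manager, here is a hedged Python sketch that builds the contents of a local preferences override. Assumptions: the element names are the ones I believe BOINC reads from global_prefs_override.xml in its data directory, the numbers are placeholders and not recommendations, and build_override is a made-up helper. Check the BOINC documentation before relying on any of it.

```python
import xml.etree.ElementTree as ET


def build_override(disk_max_used_gb: float, disk_max_used_pct: float,
                   ram_busy_pct: float, ram_idle_pct: float) -> str:
    """Build override XML text raising the disk and RAM limits."""
    root = ET.Element("global_preferences")
    settings = (
        ("disk_max_used_gb", disk_max_used_gb),     # assumed tag name
        ("disk_max_used_pct", disk_max_used_pct),   # assumed tag name
        ("ram_max_used_busy_pct", ram_busy_pct),    # assumed tag name
        ("ram_max_used_idle_pct", ram_idle_pct),    # assumed tag name
    )
    for tag, value in settings:
        ET.SubElement(root, tag).text = f"{value:.6f}"
    return ET.tostring(root, encoding="unicode")


# Placeholder values only; pick ones that suit the host.
xml_text = build_override(20.0, 90.0, 75.0, 90.0)
print(xml_text)
```

The sketch only prints the XML. Saving it into the BOINC data directory and getting the client to re-read its preferences is left to the reader.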

Confirming what I've suspected all along that non-techie users, or people who weren't bothered what they ran as long as their machines were working on something, may not have found a solution or not cared enough to change what had always worked before, so all those hosts may be permanently lost unless resource demands reduce on the server side by only asking tasks to demand what they require.

Anyway, one more host has returned.
ID: 102037 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1884
Credit: 6,014,214
RAC: 62
Message 102046 - Posted: 9 Jun 2021, 23:15:56 UTC - in response to Message 102037.  

i3-8350K - nothing wrong with it, nor the monitor (no graphics card - on-board UHD630 graphics). But, HDMI cable failed

I've just remembered why this came up in this thread.

Video output was lost but that PC was running, so I brought it home to check it out - just never got round to it.
With a new monitor cable, I booted it up tonight and, as I originally suspected, I hadn't allocated sufficient RAM nor disk space to Boinc to run Rosetta tasks. And WCG tasks came down to start work straight away, which it's doing.
A quick change of settings in the way we've all now learned and it's grabbed its first Rosetta tasks since the end of March, having run its backup project in most of that time.

Confirming what I've suspected all along that non-techie users, or people who weren't bothered what they ran as long as their machines were working on something, may not have found a solution or not cared enough to change what had always worked before, so all those hosts may be permanently lost unless resource demands reduce on the server side by only asking tasks to demand what they require.

Anyway, one more host has returned.


Next time install VNC on each pc so you can remote in from that crappy laptop and at least see if things are working, there are other ways too but that works for me.
ID: 102046 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 102060 - Posted: 11 Jun 2021, 23:11:43 UTC - in response to Message 102046.  

i3-8350K - nothing wrong with it, nor the monitor (no graphics card - on-board UHD630 graphics). But, HDMI cable failed

I've just remembered why this came up in this thread.

[...]

Anyway, one more host has returned.

Next time install VNC on each pc so you can remote in from that crappy laptop and at least see if things are working, there are other ways too but that works for me.

Sounds like a good idea tbf, now I've looked up what it is.
Chances of me actually doing it, zero...
ID: 102060 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1331
Credit: 13,624,788
RAC: 7
Message 102061 - Posted: 12 Jun 2021, 0:36:52 UTC - in response to Message 102037.  
Last modified: 12 Jun 2021, 0:43:55 UTC

..., as I originally suspected, I hadn't allocated sufficient RAM nor disk space to Boinc to run Rosetta tasks. And WCG tasks came down to start work straight away, which it's doing.
A quick change of settings in the way we've all now learned and it's grabbed its first Rosetta tasks since the end of March, having run its backup project in most of that time.

Confirming what I've suspected all along that non-techie users, or people who weren't bothered what they ran as long as their machines were working on something, may not have found a solution or not cared enough to change what had always worked before, so all those hosts may be permanently lost unless resource demands reduce on the server side by only asking tasks to demand what they require.
It's looking like that is the case.
I've got just the odd pre_helical_bundles_ Task now, and the ones i've seen all have the reduced RAM requirements (from their previous extreme values).
But they're still way more than the Tasks ever need, and that is the case for the other Task types as well. And with the disk requirements still way more than has ever been used, it looks like the project has pretty much lost all of the more RAM or disk space limited systems. These are systems that are capable of doing the work, but due to the excessive Task requirement values the project has gone from 550k systems down to around 440k, which means it has lost roughly 20% of its compute resources.

I've never had more than 2.5GB of disk space being used for Rosetta. Apart from the RB Tasks i've never noticed any other Task type use more than 1GB of RAM (Some using 800MB, many using 600MB, 400MB and even only 200MB). But because of the high configuration values for Tasks that don't actually use anywhere near those amounts, the project is down 20% of its compute capacity.
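
The 20% figure is just the drop from 550k to 440k systems expressed as a share of the starting count. A tiny Python sketch of that arithmetic (percent_drop is a made-up helper name):

```python
def percent_drop(before: float, after: float) -> float:
    """Percentage decrease from 'before' to 'after'."""
    return (before - after) / before * 100


# System counts quoted in this post.
print(f"{percent_drop(550_000, 440_000):.0f}% fewer systems")
```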

Edit- I suspect it's the disk values having the biggest impact now the RAM requirements have been reduced from their previous high, but not knowing what the default RAM & disk space values are for BOINC makes it pretty much impossible for that to be anything but a wild arse guess.
Grant
Darwin NT
ID: 102061 · Rating: 0
mikey
Joined: 5 Jan 06
Posts: 1884
Credit: 6,014,214
RAC: 62
Message 102067 - Posted: 12 Jun 2021, 21:36:08 UTC - in response to Message 102060.  

i3-8350K - nothing wrong with it, nor the monitor (no graphics card - on-board UHD630 graphics). But, HDMI cable failed

I've just remembered why this came up in this thread.

[...]

Anyway, one more host has returned.

Next time install VNC on each pc so you can remote in from that crappy laptop and at least see if things are working, there are other ways too but that works for me.


Sounds like a good idea tbf, now I've looked up what it is.
Chances of me actually doing it, zero...


I do it to cut down on the monitors, keyboards and mice in my computer room, it also means fewer times getting up to go check that pc over there
ID: 102067 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1833
Credit: 33,801,752
RAC: 8,094
Message 102072 - Posted: 14 Jun 2021, 20:44:27 UTC - in response to Message 102061.  

Edit- I suspect it's the disk values having the biggest impact now the RAM requirements have been reduced from their previous high, but not knowing what the default RAM & disk space values are for BOINC makes it pretty much impossible for that to be anything but a wild arse guess.

Pretty sure you're right though.
Don't remember whether I mentioned this before, but some time last year the Rosetta admin reorganised where the downloaded files were placed and called from on our local machines, reducing the storage space required by Rosetta (and bandwidth on downloads, no doubt) which caused some of us to reduce our allocation for disk space, which subsequently made it worse when the recent files started calling for lots more. That's certainly where I fell down anyway.

The changes that've been made up to now are just to RAM and that proxy I was using of WiP doesn't seem to be reflecting that any more.
Maybe it's the Disk demands that are stopping it now. I'm wondering whether to dredge the subject up at this late stage or let it ride now people are used to it
ID: 102072 · Rating: 0
Administrator
Joined: 23 Oct 14
Posts: 1
Credit: 31,591
RAC: 0
Message 102164 - Posted: 3 Jul 2021, 14:21:03 UTC
Last modified: 3 Jul 2021, 14:21:35 UTC

good job!!
ID: 102164 · Rating: 0



©2022 University of Washington
https://www.bakerlab.org