Setting for maximum HD space

BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 15615 - Posted: 6 May 2006, 17:55:46 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=18872518
Swap space 1692.22 MB
Total disk space 29.29 GB
Free Disk Space 8.53 GB
---
Use no more than 10 GB disk space
Leave at least 0.01 GB disk space free
Use no more than 50% of total disk space
Write to disk at most every 60 seconds
Use no more than 75% of total virtual memory

----

Generally, when I look at the memory usage on the machine itself, Rosetta is only claiming to use up around 20 megs. None of the partitions have less than 8 gigs free space - so did that WU really eat up the 8.52 gigs of HD space on the C: partition before erroring out?
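Where the 8.52 GB figure likely comes from: as far as I understand the BOINC client, each of the three disk preferences acts as a separate cap on BOINC as a whole, and the tightest one wins; they are not multiplied together. The Python sketch below works under that assumption, using the numbers from the result page above (29.29 GB total, 8.53 GB free) and the preferences listed. The function name and the `boinc_usage_gb` parameter (space BOINC already occupies) are hypothetical, chosen for illustration.

```python
def allowed_disk_gb(total_gb, free_gb, boinc_usage_gb,
                    max_used_gb, min_free_gb, max_used_pct):
    """Hypothetical sketch of BOINC's overall disk allowance, in GB.

    Assumes the client takes the minimum of three independent caps.
    """
    cap_absolute = max_used_gb                     # "Use no more than X GB"
    cap_percent = total_gb * max_used_pct / 100.0  # "Use no more than X% of total"
    # "Leave at least X GB free": BOINC may use what it already holds
    # plus the current free space, minus the reserve.
    cap_min_free = boinc_usage_gb + free_gb - min_free_gb
    return min(cap_absolute, cap_percent, cap_min_free)


# Numbers from the result page; BOINC's own usage taken as ~0 for simplicity.
limit = allowed_disk_gb(total_gb=29.29, free_gb=8.53, boinc_usage_gb=0.0,
                        max_used_gb=10, min_free_gb=0.01, max_used_pct=50)
print(f"{limit:.2f} GB")  # 8.52 GB: the "leave free" rule is the tightest cap
```

Under this assumption, the allowance is a client-wide figure shared by everything BOINC runs. On its own, it would not explain a 100,000,000-byte cap on a single workunit.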

----
5/2/2006 5:40:48 PM|rosetta@home|Aborting result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: 100308693.000000 > 100000000.000000
5/2/2006 5:40:48 PM|rosetta@home|Unrecoverable error for result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0 (Maximum disk usage exceeded)

From the message log, I see that it's whining about going over 100 megs. Where did it get this value from, since I can't see that representing the settings I've chosen for Boinc&Rosetta.
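The abort message itself carries both numbers: the bytes the task had on disk and the limit it was checked against. The Python sketch below pulls them out so they can be compared with the preferences. It assumes the message format shown in the log above; the regex and function name are illustrative and not part of BOINC.

```python
import re

# Matches the tail of a client message such as:
# "...: exceeded disk limit: 100308693.000000 > 100000000.000000"
DISK_LIMIT_RE = re.compile(r"exceeded disk limit: ([\d.]+) > ([\d.]+)")


def parse_disk_limit(line):
    """Return (used_bytes, limit_bytes) from an abort message, or None."""
    m = DISK_LIMIT_RE.search(line)
    if m is None:
        return None
    used, limit = (float(x) for x in m.groups())
    return used, limit


line = ("5/2/2006 5:40:48 PM|rosetta@home|Aborting result "
        "JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: "
        "100308693.000000 > 100000000.000000")
used, limit = parse_disk_limit(line)
print(f"over by {used - limit:,.0f} bytes")  # over by 308,693 bytes
```

The limit in the message is far smaller than any of the general preferences listed above, which suggests it comes from somewhere other than those settings.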
____________
Is this a change in the way Rosetta handles HD space, or have I just set the Boinc settings wrong for allowing Rosetta to use all but 10 megs of my free hard drive space? I just don't see how 8.6 gigs (what I thought I'd set it up to use at the max) and 100 megs (what Rosetta picked as the max HD space to use) are equal. After all, it's eating up more RAM than 100 megs.
ID: 15615
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 3,752,853
RAC: 1,732
Message 15654 - Posted: 7 May 2006, 16:00:23 UTC - in response to Message 15615.  

> Is this a change in the way Rosetta handles HD space, or have I just set the Boinc settings wrong for allowing Rosetta to use all but 10 megs of my free hard drive space?


I only have 2c to offer today. But I would just point out that the BOINC controls you have are for all projects. And then these are further divided based on the resource share. So, if R@H were 75% resource share, your maximum would only be 3/4 of your preference. ...still doesn't explain the 100 million bytes number listed.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15654
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 15668 - Posted: 7 May 2006, 23:07:39 UTC

I've only signed up for Rosetta - and have it set for 100%. 50% of 100% of 10 Gigs should be 5 Gigs, not 100 megs.

Is the memory usage code that was just released looking at HD space or RAM? That is, is this a bug in which the client is looking at the wrong type of memory space used?


ID: 15668
mnb

Joined: 15 Dec 05
Posts: 51
Credit: 64,820
RAC: 0
Message 15671 - Posted: 8 May 2006, 0:58:04 UTC
Last modified: 8 May 2006, 0:59:34 UTC

I had the same problem with this result: 19321664

07/05/2006 20:37:09|rosetta@home|Aborting result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0: exceeded disk limit: 101466227.000000 > 100000000.000000
07/05/2006 20:37:09|rosetta@home|Unrecoverable error for result JUMP_CLOSE_CHAINBREAK_ALLBARCODE_1q7sA_SAVE_ALL_OUT_472_6802_0 (Maximum disk usage exceeded)


06/05/2006 21:47:34||Memory: 1023.48 MB physical, 1.90 GB virtual
06/05/2006 21:47:34||Disk: 8.79 GB total, 3.68 GB free
-----
Use no more than 1 GB disk space
Leave at least 1 GB disk space free
Use no more than 50% of total disk space
Write to disk at most every 120 seconds
Use no more than 50% of total virtual memory


running Rosetta and SIMAP currently.
ID: 15671
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 3,752,853
RAC: 1,732
Message 15672 - Posted: 8 May 2006, 2:06:11 UTC
Last modified: 8 May 2006, 2:13:14 UTC

Perhaps the disk space required for the more frequent checkpointing is taking more than expected in some cases??

I've added a post on Ralph asking them to have a look at this.
ID: 15672
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15683 - Posted: 8 May 2006, 11:56:53 UTC - in response to Message 15654.  

> > Is this a change in the way Rosetta handles HD space, or have I just set the Boinc settings wrong for allowing Rosetta to use all but 10 megs of my free hard drive space?
>
> I only have 2c to offer today. But I would just point out that the BOINC controls you have are for all projects. And then these are further divided based on the resource share. So, if R@H were 75% resource share, your maximum would only be 3/4 of your preference. ...still doesn't explain the 100 million bytes number listed.

The disk usage is separate from the resource share. Resource share only applies to CPU usage.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 15683
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 15962 - Posted: 11 May 2006, 21:25:55 UTC - in response to Message 15615.  

> 5/2/2006 5:40:48 PM|rosetta@home|Aborting result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0: exceeded disk limit: 100308693.000000 > 100000000.000000
> 5/2/2006 5:40:48 PM|rosetta@home|Unrecoverable error for result JUMPTEST_CLOSECHAINBREAKS_1tul__469_2429_0 (Maximum disk usage exceeded)
>
> From the message log, I see that it's whining about going over 100 megs. Where did it get this value from, since I can't see that representing the settings I've chosen for Boinc&Rosetta.


I have some similar WUs that errored out complaining about 100MB of HD being exceeded:
One with 5.07: https://boinc.bakerlab.org/rosetta/result.php?resultid=19606413
One with 5.12: https://boinc.bakerlab.org/rosetta/result.php?resultid=19714283

There was plenty of HD space, and BOINC is set to use up to 100 Giga-Bytes. I have no idea where the 100 MByte limit came from.

I have seen the stdout.txt file grow to tens of MBytes, although I wasn't watching while these two WUs were crunching.
ID: 15962
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15976 - Posted: 12 May 2006, 0:14:23 UTC - in response to Message 15962.  

> I have some similar WUs that errored out complaining about 100MB of HD being exceeded:
> One with 5.07: https://boinc.bakerlab.org/rosetta/result.php?resultid=19606413
> One with 5.12: https://boinc.bakerlab.org/rosetta/result.php?resultid=19714283
>
> There was plenty of HD space, and BOINC is set to use up to 100 Giga-Bytes. I have no idea where the 100 MByte limit came from.
>
> I have seen the stdout.txt file grow to tens of MBytes, although I wasn't watching while these two WUs were crunching.


I think there is a limit on the size of your error file. I would have to look this up, but 100MB sounds right.

ID: 15976
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 15980 - Posted: 12 May 2006, 1:15:09 UTC

If that's the source of the error message, it would be nice for the error to state which file is greater than 100 megs (wait... 100,000,000 bytes is not 100 megs!).
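On the units: 100,000,000 bytes is exactly 100 MB in decimal (SI) terms, but only about 95.37 MiB in the binary units that many programs report as "MB". A quick Python check:

```python
BYTES = 100_000_000

mb = BYTES / 1000**2    # decimal megabytes (SI)
mib = BYTES / 1024**2   # binary mebibytes, often shown as "MB" by file managers

print(f"{mb:.2f} MB = {mib:.2f} MiB")  # 100.00 MB = 95.37 MiB
```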
ID: 15980
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 16002 - Posted: 12 May 2006, 3:37:27 UTC - in response to Message 15976.  

> I think there is a limit on the size of your error file. I would have to look this up, but 100MB sounds right.


Well, stderr.txt was only a few lines (you can see it listed in the result links). I guess the 100,000,000 Byte limit applies to stdout.txt as well. I looked at the .xml files and it looks like the limit is specified by the WU itself. There seems to be an "rsc_disk_bound" value that's set to 100000000.
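To check the per-workunit bound on your own machine, the Python sketch below reads `rsc_disk_bound` values out of BOINC's state XML. It assumes each `<workunit>` element has `<name>` and `<rsc_disk_bound>` children, as in client_state.xml in the BOINC data directory. The sample is a trimmed, made-up excerpt: the workunit name `EXAMPLE_WU_1` is hypothetical, and only the 100000000 bound comes from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed excerpt; a real client_state.xml has many more fields.
SAMPLE = """
<client_state>
  <workunit>
    <name>EXAMPLE_WU_1</name>
    <app_name>rosetta</app_name>
    <rsc_disk_bound>100000000.000000</rsc_disk_bound>
  </workunit>
</client_state>
"""


def disk_bounds(xml_text):
    """Map workunit name -> rsc_disk_bound in bytes."""
    root = ET.fromstring(xml_text.strip())
    bounds = {}
    for wu in root.iter("workunit"):
        name = wu.findtext("name")
        bound = wu.findtext("rsc_disk_bound")
        if name and bound:
            bounds[name] = float(bound)
    return bounds


for name, bound in disk_bounds(SAMPLE).items():
    print(f"{name}: {bound / 1e6:.0f} MB")  # EXAMPLE_WU_1: 100 MB
```

On a real install, you would pass in the contents of your own client_state.xml instead of `SAMPLE`.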
ID: 16002
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 3,752,853
RAC: 1,732
Message 16062 - Posted: 12 May 2006, 15:37:29 UTC - in response to Message 15980.  

> If that's the source of the error message, it would be nice for the error to state which file is greater than 100 megs, (wait.. 100,000,000 bytes is not 100 megs!)

I believe the WU limit is for, well, the WU. So no one file throws it over the limit. It is all of them collectively. And I was thinking perhaps the checkpoint files count as well, and since that's the new player here, that was why I brought it up. Perhaps the checkpoint data is what's throwing it over the size limit.
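To illustrate the "all files collectively" idea, the Python sketch below adds up every file under a task's working directory and compares the total with a per-workunit bound. The function names are hypothetical. I don't know exactly which files the client counts (the slot directory only, or project input files as well), so treat this as a rough check, not a reproduction of BOINC's own accounting.

```python
import os


def dir_usage_bytes(path):
    """Total size of all regular files under path, recursively (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for fname in files:
            fpath = os.path.join(root, fname)
            if os.path.isfile(fpath) and not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total


def over_bound(path, bound_bytes=100_000_000):
    """True if the files under path together exceed bound_bytes."""
    return dir_usage_bytes(path) > bound_bytes
```

For example, `over_bound("/path/to/BOINC/slots/0")` (an example path) would flag a slot whose stdout.txt, checkpoints, and other files together pass 100,000,000 bytes, even if no single file does.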

AMD: did the stdout look like good info? Or more like a loop of repeating messages? Or, actually, back to the project folks, if these results are meaningful and it was just a large number of models produced or a large protein, then perhaps there are cases where the disk space limit needs to be increased. If the results were caused by some sort of loop, then obviously a fix is needed.
ID: 16062
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 16083 - Posted: 12 May 2006, 19:15:01 UTC

> AMD: did the stdout look like good info? Or more like a loop of repeating messages?


It turns out I still have a window open where I did some file listings, and scrolling back I see the stdout.txt file for one of those WUs at 70MB. So it was definitely the stdout.txt file that caused the problem. Unfortunately, I didn't notice at the time and the file is long gone.

I did take a snapshot of some large stdout.txt files. One was 20MB at 5 hours of crunching. (I use a 10 hour crunch time.) It contained mostly:

...
set_omega:: move not allowed: 81
set_phi:: move not allowed: 81
set_psi:: move not allowed: 81
set_omega:: move not allowed: 81
set_phi:: move not allowed: 81
set_psi:: move not allowed: 81
set_omega:: move not allowed: 81
set_phi:: move not allowed: 81
set_psi:: move not allowed: 81
set_omega:: move not allowed: 81
set_phi:: move not allowed: 81
set_psi:: move not allowed: 81
...

And so on.
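Feet1st's question (good info, or a loop of repeating messages?) can be answered mechanically by counting how often each line occurs in stdout.txt. The Python sketch below does this with `collections.Counter`. The function names are illustrative, and the sample is the excerpt above.

```python
from collections import Counter


def top_lines(text, n=3):
    """Return the n most frequent non-empty lines with their counts."""
    lines = (ln.strip() for ln in text.splitlines())
    return Counter(ln for ln in lines if ln).most_common(n)


def repeat_ratio(text):
    """Fraction of non-empty lines that duplicate an earlier line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return 1 - len(set(lines)) / len(lines)


sample = """set_omega:: move not allowed: 81
set_phi:: move not allowed: 81
set_psi:: move not allowed: 81
""" * 4

for line, count in top_lines(sample):
    print(count, line)
print(f"{repeat_ratio(sample):.0%} of lines are repeats")  # 75% of lines are repeats
```

A ratio close to 100% on a multi-megabyte file points to a loop rather than useful output.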


Some other stdout.txt files, around 8MB in size, contained stuff like:

...
Searching for dat file: ./1tul.dat
Searching for dat file: ./1tul.dat
WARNING!! .dat file not found!
Looking for fasta file: ./1tul_.fasta
[T/F OPT]Default FALSE value for [-find_disulf]
[T/F OPT]Default FALSE value for [-fix_disulf]
[T/F OPT]New TRUE value for [-n]
[T/F OPT]New TRUE value for [-n]
[STR OPT]New value for [-n] 1tul.pdb.
[T/F OPT]Default FALSE value for [-use_native_centroid]
WARNING:: end of pdb file reached: angle, secstruct, & res info not found
Looking for dssp file: 1tul.dssp
dssp file not found
calculating secondary structure from torsion angles
fragment file: ./aa1tul_03_05.200_v1_3.gz
Total Residue 102
frag size: 3 frags/residue: 200
fragment file: ./aa1tul_09_05.200_v1_3.gz
Total Residue 102
frag size: 9 frags/residue: 200
generating 1mer library from 3mer library
[T/F OPT]Default FALSE value for [-ssblocks]
[T/F OPT]Default FALSE value for [-check_homs]
[T/F OPT]New TRUE value for [-barcode_mode]
[INT OPT]New value for [-barcode_mode] 3
[T/F OPT]Default FALSE value for [-increment_barcode]
[T/F OPT]New TRUE value for [-barcode_file]
[STR OPT]New value for [-barcode_file] allbarcodes09.bar.
Feature: PERMUTE
Flavor 0.0394769
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 5 fval1= 13 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 36 fval1= 43 fval2= 63 fval3= 72
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 56 fval1= 61 fval2= 5 fval3= 13
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 5 fval1= 13 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 94 fval1= 102 fval2= 77 fval3= 84
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 77 fval1= 84 fval2= 18 fval3= 25
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 18 fval1= 25 fval2= 36 fval3= 43
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 36 fval1= 43 fval2= 63 fval3= 72
...

And so on, with those 7 lines repeated with minor variations.
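Because this second file repeats its block "with minor variations", exact-line counting undercounts the repetition. One rough workaround is to replace every standalone number with a placeholder before counting, which assumes the variations are only in the numbers. The Python sketch below does this; the regex and names are illustrative, and the sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

# Standalone numbers (ints, decimals, exponents); digits inside words
# such as "fval1" are left alone by the lookbehind.
NUMBER_RE = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def line_templates(text):
    """Count lines after replacing each standalone number with '#'."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line:
            counts[NUMBER_RE.sub("#", line)] += 1
    return counts


sample = """Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 56 fval1= 61 fval2= 94 fval3= 102
barcode_cst: torsion= 17 residue= 0 cval= - ival= 1 fval= 56 fval1= 61 fval2= 5 fval3= 13
Flavor 6e-05
barcode_cst: torsion= 17 residue= 0 cval= - ival= 2 fval= 94 fval1= 102 fval2= 5 fval3= 13"""

for template, count in line_templates(sample).most_common():
    print(count, template)
```

Here the five lines collapse to two templates, which makes the repeating block stand out even though no two barcode lines are identical.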

Those WUs completed, though.
ID: 16083




©2020 University of Washington
https://www.bakerlab.org