Problems and Technical Issues with Rosetta@home

Jean-David Beyer
Joined: 2 Nov 05
Posts: 196
Credit: 6,613,600
RAC: 5,541
Message 106010 - Posted: 24 Apr 2022, 19:24:37 UTC - in response to Message 105989.  

I run Linux and have never run out of disk space (because spinning hard drives are now so big and so cheap). I have about 512 GBytes of SSD, and two 4-Terabyte spinning hard drives.
But on Linux, you can find out how your disk space is being used very easily. Here is what is in my /var/lib/boinc directory and everything under it. To keep from boring you, I printed out only the first 24 lines. The numbers are in 1024-byte blocks. Right now, I have only universe and rosetta tasks running on my machine. So I seem to be using about 2.37 GigaBytes of disk space in a partition of about 500 GigaBytes. When I have a lot of ClimatePrediction tasks and WCG tasks, I use a lot more, but even then, I come nowhere close to using it all.
[/var/lib/boinc]$ du . | sort -nr | head -n 24
2373204	.
2282044	./projects
1763172	./projects/boinc.bakerlab.org_rosetta
996448	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database
996448	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl
454540	./projects/www.worldcommunitygrid.org
310248	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/chemical
273928	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/chemical/pdb_components
243200	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/sampling
236248	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring
191412	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions
190452	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/rotamer
91416	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/rotamer/ncaa_rotlibs
86112	./slots
84812	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/rotamer/ncaa_rotlibs/ncaa_rotamer_libraries
58676	./projects/climateprediction.net
53652	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions/rama
51688	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions/mhc_epitope
45804	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/rotamer/ncaa_rotlibs/ncaa_rotamer_libraries/n_methyl_amino_acid
39948	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions/P_AA_pp
39672	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions/P_AA_pp/shapovalov
37292	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/score_functions/P_AA_pp/shapovalov/2.5deg
34520	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/chemical/residue_type_sets
32532	./projects/boinc.bakerlab.org_rosetta/database_357d5d93529_n_methyl/minirosetta_database/scoring/motif
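Raw 1024-byte block counts are hard to read at a glance. As a minimal sketch of the same pipeline wrapped in a reusable function, with sizes converted to MiB: the function name `top_dirs` and its arguments are made up for illustration, and it assumes `du` separates size and path with a tab, as it does in the listing above.

```shell
# top_dirs DIR [N]: print the N largest directories under DIR (default 24),
# largest first, with sizes converted from 1 KiB blocks to MiB.
# Hypothetical helper, not part of BOINC.
top_dirs() {
    dir=${1:-.}
    n=${2:-24}
    du -k "$dir" 2>/dev/null | sort -nr | head -n "$n" |
        awk -F '\t' '{ printf "%.1f MiB\t%s\n", $1 / 1024, $2 }'
}
```

Run as `top_dirs /var/lib/boinc 24` to get the equivalent of the listing above.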


As far as .tmp files are concerned, there are very few: here are all of them:
[/var/lib/boinc]$ du -a | grep tmp
484	./slots/0/data0.tmp
0	./slots/0/data1.tmp
0	./slots/0/data2.tmp
12	./slots/0/error.tmp
228	./slots/1/data0.tmp
0	./slots/1/data1.tmp
0	./slots/1/data2.tmp
8	./slots/1/error.tmp
4	./slots/2/rosetta_tmp.txt
512	./slots/3/data0.tmp
0	./slots/3/data1.tmp
0	./slots/3/data2.tmp
12	./slots/3/error.tmp
4	./slots/4/rosetta_tmp.txt
4	./slots/5/rosetta_tmp.txt
4	./slots/6/rosetta_tmp.txt
4	./slots/7/rosetta_tmp.txt
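
Note that `grep tmp` matches any path containing the string "tmp", which is why `rosetta_tmp.txt` appears in the list above. A stricter sketch, assuming only files whose names end in `.tmp` are wanted (the function name `tmp_usage` is made up for illustration):

```shell
# tmp_usage DIR: list regular files under DIR whose names end in ".tmp",
# with sizes in 1 KiB blocks, largest first.
# Hypothetical helper, not part of BOINC.
tmp_usage() {
    find "${1:-.}" -type f -name '*.tmp' -exec du -k {} + | sort -nr
}
```

With the slots directory shown above, `tmp_usage /var/lib/boinc` would list the `data*.tmp` and `error.tmp` files but skip the `rosetta_tmp.txt` entries.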

ID: 106010 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106011 - Posted: 24 Apr 2022, 19:27:16 UTC - in response to Message 106010.  
Last modified: 24 Apr 2022, 19:27:45 UTC

I run Linux and have never run out of disk space (because spinning hard drives are now so big and so cheap).
You forgot "and slow". I've banned spinning disks from anything boinc related in my house. I have 7 PCs running Boinc and it's difficult to control them all when one is sat waiting on a disk! The only things rust spinners are used for is backups, TV/Film storage, and security cameras.
ID: 106011 · Rating: 0
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 106012 - Posted: 24 Apr 2022, 20:30:00 UTC

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
and then what will we do for `entertainment`
ID: 106012 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 106013 - Posted: 24 Apr 2022, 21:02:29 UTC - in response to Message 106012.  

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
and then what will we do for `entertainment`

I am wondering that myself. The pythons are from a single researcher, and I don't know if there will be more.
Maybe it is just a one-shot experiment?

Since they never tell us anything, planning is not possible.
ID: 106013 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106014 - Posted: 24 Apr 2022, 21:09:20 UTC - in response to Message 106012.  

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
and then what will we do for `entertainment`
Play with WCG. If they ever work out how to move a server from one building to another. Another delay until 9th May....
ID: 106014 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,417,319
RAC: 20,286
Message 106016 - Posted: 24 Apr 2022, 21:25:01 UTC - in response to Message 106012.  

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
Given that the most In progress for them was a bit over 21,000, they tend to average around 15,000 or less, and that there are presently only 10,500 In progress, I think it will be a long, long, long time before they get cleared due to the very minuscule number of systems that are actually processing them.
Grant
Darwin NT
ID: 106016 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106017 - Posted: 24 Apr 2022, 21:39:54 UTC - in response to Message 106016.  
Last modified: 24 Apr 2022, 21:41:18 UTC

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
Given that the most In progress for them was a bit over 21,000, they tend to average around 15,000 or less, and that there are presently only 10,500 In progress, I think it will be a long, long, long time before they get cleared due to the very minuscule number of systems that are actually processing them.
I make that four months. Depends how soon you want to panic. Anyway why panic when there's about 50 projects to play with? I'm off doing Milkyway (DP cards), Cosmology (CPUs), and Folding (SP cards) just now.
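
The thread does not show the working behind the four-month figure. One way to land near it, as a back-of-the-envelope sketch: treat throughput as tasks in progress divided by average turnaround. The 1,000,000 remaining and 10,500 in progress come from the posts above; the 1.25-day average turnaround is purely an assumption picked for illustration, not a measured value, and the function name `days_to_clear` is made up.

```shell
# days_to_clear REMAINING IN_PROGRESS TURNAROUND_DAYS
# Estimated days to drain the queue if IN_PROGRESS tasks are always being
# worked on and each takes TURNAROUND_DAYS on average (an assumption).
days_to_clear() {
    awk -v r="$1" -v p="$2" -v t="$3" 'BEGIN { printf "%.0f\n", r / (p / t) }'
}

days_to_clear 1000000 10500 1.25   # prints 119
```

That is 8,400 tasks a day, so about 119 days, or a little under four months. A longer turnaround or fewer hosts stretches the estimate proportionally.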
ID: 106017 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 106018 - Posted: 24 Apr 2022, 21:41:32 UTC - in response to Message 106017.  

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
Given that the most In progress for them was a bit over 21,000, they tend to average around 15,000 or less, and that there are presently only 10,500 In progress, I think it will be a long, long, long time before they get cleared due to the very minuscule number of systems that are actually processing them.
I make that four months. Depends how soon you want to panic.



And then we get to have fun with the buggy stuff.
ID: 106018 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106019 - Posted: 24 Apr 2022, 21:50:48 UTC - in response to Message 106018.  

And then we get to have fun with the buggy stuff.
I prefer dune buggies.
ID: 106019 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 106020 - Posted: 24 Apr 2022, 22:17:00 UTC - in response to Message 106018.  

Is it time to panic ?
There are less than a million tasks left on the front page . . .
Does this mean we may run out of pythons sometime this year :-)
Given that the most In progress for them was a bit over 21,000, they tend to average around 15,000 or less, and that there are presently only 10,500 In progress, I think it will be a long, long, long time before they get cleared due to the very minuscule number of systems that are actually processing them.
I make that four months. Depends how soon you want to panic.



And then we get to have fun with the buggy stuff.

I prefer both to the situation at Predictor@Home. They lost the two members of their project team who knew how to create useful new workunits (probably because they graduated). For several months, they kept the project running by repeatedly raising the number of times a workunit could fail before no more tasks would be sent out for it. Some of the remaining workunits failed over 30 times before the professor in charge decided it was not worthwhile to let the project continue, and it shut down.
ID: 106020 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 106021 - Posted: 24 Apr 2022, 23:03:48 UTC - in response to Message 106018.  

And then we get to have fun with the buggy stuff.

This IS the buggy stuff. That is one reason I am concerned we may not get more.
They did not bother to fix it, so it may be good enough for what they need it for.
ID: 106021 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106022 - Posted: 24 Apr 2022, 23:20:20 UTC - in response to Message 106020.  

I prefer both to the situation at Predictor@Home. They lost the two members of their project team who knew how to create useful new workunits (probably because they graduated). For several months, they kept the project running by repeatedly raising the number of times a workunit could fail before no more tasks would be sent out for it. Some of the remaining workunits failed over 30 times before the professor in charge decided it was not worthwhile to let the project continue, and it shut down.
ROFL, Wikipedia says "Though it was quite successful, a "disagreement" between the project administration and the user base caused a mass exodus of participating users"
ID: 106022 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106023 - Posted: 24 Apr 2022, 23:21:07 UTC - in response to Message 106021.  

And then we get to have fun with the buggy stuff.

This IS the buggy stuff. That is one reason I am concerned we may not get more.
They did not bother to fix it, so it may be good enough for what they need it for.
I'm not concerned. If all future tasks are made with 4.2, things will work properly again. Python is a shit programming language and Oracle is a shit virtual machine.
ID: 106023 · Rating: 0
Jean-David Beyer
Joined: 2 Nov 05
Posts: 196
Credit: 6,613,600
RAC: 5,541
Message 106024 - Posted: 25 Apr 2022, 2:16:31 UTC - in response to Message 106011.  

I do not know how slow my spinning disks are. True, these 7200 rpm SATA hard drives are not as fast as the 10,000 rpm SCSI/320 LVD hard drives on a former machine, but Boinc does not do all that much disk IO as to slow me down much. All my BOINC stuff is on one of those spinning hard drives. I note that Boinc homework assignments are severely compute-limited, so disk IO is just a small part of the work load. I use half my cores on Boinc stuff that runs mainly at nice level 19. Since the machine is doing little else at the moment, note that the machine is running about 50% computing, about 50% idle, and no time waiting for IO. More subjectively, the disk IO light blinks a very very short blink with about a 5-second interval; i.e., hardly any disk IO. The machine is running 5 rosetta and 3 universe jobs at the moment.
top - 22:01:25 up 4 days, 13:20,  1 user,  load average: 8.00, 8.13, 8.31
Tasks: 454 total,   9 running, 444 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.4 us,  0.1 sy, 49.7 ni, 49.7 id,  0.0 wa,  0.1 hi,  0.0 si,  0.0 st
MiB Mem :  63902.1 total,   2784.7 free,   6276.7 used,  54840.8 buff/cache
MiB Swap:  15992.0 total,  15987.0 free,      5.0 used.  56835.9 avail Mem
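
The figures to read in the `%Cpu(s)` line are `ni` (time spent on niced processes, which is where BOINC tasks at nice 19 show up), `id` (idle), and `wa` (waiting for I/O). A small sketch that pulls those three fields out of a line in the format shown above; the function name `cpu_summary` is made up for illustration, and the field layout is assumed to match this version of top:

```shell
# cpu_summary: read top output on stdin and print the niced, idle and
# I/O-wait percentages found on the "%Cpu(s):" line.
# Hypothetical helper; assumes "value label," pairs as in the output above.
cpu_summary() {
    awk 'index($0, "%Cpu(s):") == 1 {
        for (i = 2; i < NF; i++) {
            label = $(i + 1)
            sub(/,$/, "", label)          # labels carry a trailing comma
            if (label == "ni") ni = $i
            if (label == "id") id = $i
            if (label == "wa") wa = $i
        }
        printf "niced=%s%% idle=%s%% iowait=%s%%\n", ni, id, wa
    }'
}
```

Fed the `%Cpu(s)` line above, this prints `niced=49.7% idle=49.7% iowait=0.0%`. On a live system it could be used as `top -bn1 | cpu_summary`.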

ID: 106024 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,653
Message 106025 - Posted: 25 Apr 2022, 2:24:16 UTC - in response to Message 106022.  

I prefer both to the situation at Predictor@Home. They lost the two members of their project team who knew how to create useful new workunits (probably because they graduated). For several months, they kept the project running by repeatedly raising the number of times a workunit could fail before no more tasks would be sent out for it. Some of the remaining workunits failed over 30 times before the professor in charge decided it was not worthwhile to let the project continue, and it shut down.
ROFL, Wikipedia says "Though it was quite successful, a "disagreement" between the project administration and the user base caused a mass exodus of participating users"

I'd expect a user base to disagree a lot and start exiting once every task started failing.

What I wrote came from the professor in charge.
ID: 106025 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,417,319
RAC: 20,286
Message 106026 - Posted: 25 Apr 2022, 3:07:36 UTC - in response to Message 106024.  

but Boinc does not do all that much disk IO as to slow me down much.
It depends on the application.
In the case of Rosetta, the Rosetta 4.20 Tasks don't require much disk I/O; however, the Python Tasks require massive amounts of disk I/O when starting up & ending. And apparently they also require quite a bit during processing. The more cores & threads a system has & uses, the higher the disk I/O requirements will be.
Grant
Darwin NT
ID: 106026 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2141
Credit: 41,538,222
RAC: 10,691
Message 106027 - Posted: 25 Apr 2022, 5:53:00 UTC - in response to Message 106009.  

Ok, I've just used Windows Disk cleanup and ensured storage sense is enabled and freed up a few Gb, but that's on a PC that isn't running VBox
I'll give that a go when I get back to my main PC tomorrow evening
I run the Windows disk cleanup (including system files) then run treesize which shows me what folders are using the most, so I can manually remove stuff I don't want anymore. Last time I reduced the stuff on my disk by a third.

I only bother doing this when the line changes from blue to red in windows explorer.

Looking at this message was a reminder to do all this.
No new .tmp files, freed up a few GB here too, grabbed TreeSize, but it's not telling me anything I expect to find useful, so I removed it again.
I've got BoincTasks but hadn't set it up to run at startup, which I've now done. Yes, very useful in finding tasks that are very far behind in CPU time compared to Elapsed time.
More useful when running VBox tasks compared to running plain Rosetta tasks - I'll keep this going now.
All good, ta
ID: 106027 · Rating: 0
Paddles
Joined: 15 Mar 15
Posts: 11
Credit: 5,434,545
RAC: 2,362
Message 106028 - Posted: 25 Apr 2022, 12:22:52 UTC - in response to Message 106007.  

Update: The first task to be postponed reached the end of its one day postponement, and now appears to be computing successfully (in VBox 6.1.34). Haven't tried reverting to previous version to see what happens, but whatever the problem was it seems to have resolved.


I may have spoken too soon. The tasks were running for exceptionally long times (18-26 hours) - although unlike the normal "not doing anything" vbox tasks, they were showing significant CPU time utilised (rather than the tasks that "run" for 18 hours but have only consumed 10-20 seconds of CPU). I shut down BOINC, rolled VBox back to version 6.1.12 (BOINC recommended version, not 6.1.32 which I had been running), restarted, and all the vbox tasks came up with computation errors.

Oh well, will see what happens with the next tasks to run.
ID: 106028 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106029 - Posted: 25 Apr 2022, 16:37:56 UTC - in response to Message 106024.  

I do not know how slow my spinning disks are. True, these 7200 rpm SATA hard drives are not as fast as the 10,000 rpm SCSI/320 LVD hard drives on a former machine, but Boinc does not do all that much disk IO as to slow me down much. All my BOINC stuff is on one of those spinning hard drives. I note that Boinc homework assignments are severely compute-limited, so disk IO is just a small part of the work load. I use half my cores on Boinc stuff that runs mainly at nice level 19. Since the machine is doing little else at the moment, note that the machine is running about 50% computing, about 50% idle, and no time waiting for IO. More subjectively, the disk IO light blinks a very very short blink with about a 5-second interval; i.e., hardly any disk IO. The machine is running 5 rosetta and 3 universe jobs at the moment.
Try 24 cores running virtualbox. 2GB disk read and 2GB disk write to start each one, followed by many checkpoints.
ID: 106029 · Rating: 0
Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 8,091
Message 106030 - Posted: 25 Apr 2022, 16:39:01 UTC - in response to Message 106025.  

I'd expect a user base to disagree a lot and start exiting once every task started failing.
I'm not that arrogant. I keep trying to help a project in difficulty.
ID: 106030 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org