Problems and Technical Issues with Rosetta@home

trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 103709 - Posted: 4 Dec 2021, 17:44:06 UTC - in response to Message 103708.  
Last modified: 4 Dec 2021, 18:08:53 UTC

No wonder I struggled, but it did 2 WUs. But for my fault-finding glitches it could have completed them all.
The VM box controls were far from intuitive.
A warning would have helped, anyway.
Maybe I should stop all RAH work now, as this seems to be the default?
By the way, what is the minimum RAM needed for this work?
I probably won't upgrade, as other work seems OK, apart from LHC.

17+ hrs of completed work that has been missed from validation, 3 completed units:

Rosetta@home 03-12-2021 09:52 03:53:41 (03:47:29) 03-12-2021 09:56 aaas-SAR-VAL_pp-NMVAL-SUGA_pp_12_2559723_1_0 97.35 Reported: OK * 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
Rosetta@home 03-12-2021 05:57 07:03:18 (06:52:27) 03-12-2021 05:57 aaap-PIP_pp-mNMPHE_pp-TIC-AMACBEN3_pp_0_2502770_1_0 97.44 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
(203) 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
Rosetta@home 02-12-2021 21:15 06:44:38 (06:22:05) 02-12-2021 21:17 aaas-PHE_pp-mTIC_pp-NMVAL-mSUGA_1_2517870_1_0 94.43 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-IUPTMBB
ID: 103709 · Rating: 0

Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103710 - Posted: 4 Dec 2021, 18:03:23 UTC - in response to Message 103709.  

The current VM work is using about 3 GB for the work units starting with 'aa'. The 'boinc_cages_IL' is running at about 6 GB. You should be able to run camb_boinc2docker VMs from Cosmology@Home, as those only use 2 GB. You just need to set your preferences to 1 or 2 for Max # CPUs so it only assigns one or two cores to each work unit and VM.
It's probably best if you just stick to conventional BOINC work units, as they behave better and share resources because they run at a lower priority. VirtualBox tasks run at normal priority.
ID: 103710 · Rating: 0

trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 103711 - Posted: 4 Dec 2021, 18:17:09 UTC - in response to Message 103710.  

Yes, I did some CAH units OK recently, my first time using a VM.
I am annoyed about what happened on Rosetta though, and I can't be the first.
Also, a message to stop trying would be a good idea, rather than wasting effort,
as even the apparently good units were ignored in the end!
ID: 103711 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 195
Credit: 6,613,600
RAC: 6,755
Message 103712 - Posted: 4 Dec 2021, 19:59:27 UTC - in response to Message 103705.  

Jean - I thought you might be on to something. But it was a fluke.
I put <name> in app_config and I set the project_concurrent to 2 and then to 1, but that is being ignored.
Still running 3.
I guess RAH will do what it wants to do no matter what commands you give it, short of cutting resource share which looks like the only way to get it to 2 tasks and maybe at 25% to get it to 1.


Why does it work for me and not for you?
ID: 103712 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,378,164
RAC: 20,578
Message 103713 - Posted: 4 Dec 2021, 20:37:00 UTC - in response to Message 103699.  

Its windblows 7, opteron16 that has gone funky
I thort I had it fixed, but today its back on python only work ,
I still haven't seen any mention of what your BOINC disk settings actually are.
Use no more than ? GB
  Leave at least ? GB free
Use no more than ? % of total




11 at once and it is getting the disk space moan again, except even after all that clear out, its got worse !!!??
11 Python Tasks will require roughly 88GB of disk space.
Grant
Darwin NT
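
The disk estimate above can be sketched as simple arithmetic. This is only a rough illustration: the 8 GB per task figure is an assumption derived from "11 tasks, roughly 88 GB", and the function names are made up for this example.

```python
# Assumption: each Python (VirtualBox) task needs roughly 8 GB of disk,
# the per-task figure implied by "11 Python Tasks ~ 88GB" in the post above.
GB_PER_PYTHON_TASK = 8


def disk_needed_gb(concurrent_tasks, gb_per_task=GB_PER_PYTHON_TASK):
    """Approximate disk space needed to hold this many tasks at once."""
    return concurrent_tasks * gb_per_task


def max_tasks_for_disk(available_gb, gb_per_task=GB_PER_PYTHON_TASK):
    """How many tasks fit into the disk space BOINC is allowed to use."""
    return int(available_gb // gb_per_task)


print(disk_needed_gb(11))      # disk needed for 11 tasks
print(max_tasks_for_disk(50))  # tasks that fit in a hypothetical 50 GB allowance
```

Working backwards like this gives a quick idea of how large the BOINC disk allowance has to be before the client stops complaining about space.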
ID: 103713 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103714 - Posted: 4 Dec 2021, 23:19:27 UTC - in response to Message 103712.  

Jean - I thought you might be on to something. But it was a fluke.
I put <name> in app_config and I set the project_concurrent to 2 and then to 1, but that is being ignored.
Still running 3.
I guess RAH will do what it wants to do no matter what commands you give it, short of cutting resource share which looks like the only way to get it to 2 tasks and maybe at 25% to get it to 1.


Why does it work for me and not for you?



That is the ultimate question.
Talk me through it again...you had all that directory stuff in the text, but I don't have that.
Whats the plain text version of all that?

I have project name (boinc agrees), project_max_concurrent (no disagreement there), but it ignores those.
I just aborted a stuck task and BOINC took 2 pythons to start.
ID: 103714 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103715 - Posted: 4 Dec 2021, 23:22:51 UTC - in response to Message 103710.  

The current VM work is using about 3 GB for the work units starting with 'aa'. The 'boinc_cages_IL' is running at about 6 GB. You should be able to run camb_boinc2docker VMs from Cosmology@Home, as those only use 2 GB. You just need to set your preferences to 1 or 2 for Max # CPUs so it only assigns one or two cores to each work unit and VM.
It's probably best if you just stick to conventional BOINC work units, as they behave better and share resources because they run at a lower priority. VirtualBox tasks run at normal priority.



Correction: Cages is 7,629.39 MB (I've been getting those quite a bit lately).
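
To see what those memory figures mean for a given machine, here is a minimal sketch that estimates how many VM tasks of one kind fit into a RAM budget. The per-VM sizes are the rough observations from this thread; the dictionary, the function name and the 16 GB budget are assumptions made only for illustration.

```python
# Per-VM memory figures as reported in the thread (rough observations).
VM_RAM_MB = {
    "aa": 3 * 1024,             # "about 3 GB" for work units starting with 'aa'
    "boinc_cages_IL": 7629.39,  # figure from the correction above
}


def vms_that_fit(ram_budget_mb, kind):
    """Number of VMs of the given kind that fit into the RAM budget."""
    return int(ram_budget_mb // VM_RAM_MB[kind])


budget_mb = 16 * 1024  # hypothetical: 16 GB made available to BOINC
print(vms_that_fit(budget_mb, "aa"))
print(vms_that_fit(budget_mb, "boinc_cages_IL"))
```

This ignores whatever the host and other projects need, so the real limit is likely to be lower.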
ID: 103715 · Rating: 0

.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 103716 - Posted: 5 Dec 2021, 1:31:47 UTC - in response to Message 103713.  
Last modified: 5 Dec 2021, 1:54:08 UTC

Its windblows 7, opteron16 that has gone funky
I thort I had it fixed, but today its back on python only work ,
I still haven't seen any mention of what your BOINC disk settings actually are.
Use no more than ? GB
  Leave at least ? GB free
Use no more than ? % of total

11 at once and it is getting the disk space moan again, except even after all that clear out, its got worse !!!??
11 Python Tasks will require roughly 88GB of disk space.

It was in Message 103670, posted 2 Dec 2021, 21:51:18 UTC. I had gone as far as unticking all the disk space boxes to give it unlimited use of the disk.
At the moment I have 89 GB free on drive C, and Rosetta is using 98 GB {greedy hog}.
And this afternoon I reduced my work unit cache down to 0.1 + 0.1 just to see what happens [it was 1 + 0.5].
In the long run it works, so whatever it is moaning about, I can live with it.
ID: 103716 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 195
Credit: 6,613,600
RAC: 6,755
Message 103717 - Posted: 5 Dec 2021, 4:49:17 UTC - in response to Message 103714.  


Why does it work for me and not for you?


That is the ultimate question.
Talk me through it again...you had all that directory stuff in the text, but I don't have that.
Whats the plain text version of all that?

I have project name (boinc agrees), project_max_concurrent (no disagreement there), but it ignores those.


Are you putting them in the right app_config.xml file?
I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory?

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>

I just aborted a stuck task and BOINC took 2 pythons to start.


I hope this is what you want.

cat /etc/redhat-release: Red Hat Enterprise Linux release 8.5 (Ootpa)
uname -r: 4.18.0-348.2.1.el8_5.x86_64
rpm -q boinc-client: boinc-client-7.16.11-3.el8.x86_64

# Two terabyte hard drive. [part of /etc/fstab]
UUID=90309ec8-b1d3-4438-b983-f7ab121421a8 /D3P1 ext4 defaults 1 2
UUID=9bea9d6e-2f0d-4636-ac83-7fb9c0b2e108 /D3P2 ext4 defaults 1 2
UUID=8d57a006-8363-4dd0-abe4-d8f77fc15182 /var/lib/boinc ext4 defaults 0 0 <---<<<
UUID=840e6522-89ff-4b81-9efa-33d97df3fb1e /home/guest xfs defaults 0 0
UUID=04a403c2-6199-4936-9ab7-1fe2ed25377e /D3P6 xfs defaults 0 0
UUID=c667796a-a283-4db1-9bb9-7f6ef0de9982 /D3P7 xfs defaults 0 0


Disk: Boinc will use the most restrictive of these settings;

Use no more than 110 GBytes
Leave at least 0.5 GBytes free
Use no more than 85% of total.
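
The "most restrictive of these settings" rule can be sketched as taking the minimum of three limits. This is a simplified model written for illustration, not the BOINC client's actual code; the function name and the partition numbers in the example are hypothetical.

```python
def boinc_disk_allowance_gb(total_gb, free_gb, boinc_used_gb,
                            max_used_gb, min_free_gb, max_used_pct):
    """Upper bound on BOINC's total disk usage under the three preferences.

    Simplified model (an assumption, not the client's real implementation):
    each preference yields a cap on total BOINC usage; the smallest cap wins.
    """
    limits = [
        max_used_gb,                            # "Use no more than X GB"
        boinc_used_gb + free_gb - min_free_gb,  # "Leave at least X GB free"
        total_gb * max_used_pct / 100.0,        # "Use no more than X % of total"
    ]
    return max(0.0, min(limits))


# Preferences from the post (110 GB, 0.5 GB free, 85%) applied to a
# hypothetical 117 GB partition with 93 GB free and 18 GB already used by BOINC.
print(boinc_disk_allowance_gb(117, 93, 18, 110, 0.5, 85))
```

With these example numbers the percentage limit is the tightest one, so raising the "no more than 110 GB" value alone would change nothing.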

Memory [I have about 64 GBytes RAM]
When computer is in use, use at most 80%
When computer is not in use, use at most 90%
Leave non GPU tasks in memory when tasks are suspended

Page swap file: use at most 50% [100% = 16 Gigabytes] [1.5 Megabytes used]

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# ls -l
total 739232
        82 Jul  1  15:29  app_config.xml  <---<<<
      4096 May 11  2020  database_357d5d93529_n_methyl
 507570722 Nov 14  2020  database_357d5d93529_n_methyl.zip
         0 Nov 14  2020  database_357d5d93529_n_methyl.zip.is_bad
    352308 Nov 14  2020  LiberationSans-Regular.ttf
 125232600 Nov 14  2020  rosetta_4.20_x86_64-pc-linux-gnu
 123794008 Nov 14  2020  rosetta_graphics_4.20_x86_64-pc-linux-gnu

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml 
<app_config>
   <project_max_concurrent>3</project_max_concurrent>
</app_config>

top - 23:33:34 up 23:55,  1 user,  load average: 8.53, 8.55, 8.62
Tasks: 453 total,  10 running, 442 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.3 sy, 49.5 ni, 49.7 id,  0.0 wa,  0.1 hi,  0.1 si,  0.0 st
MiB Mem :  63902.2 total,   1888.6 free,   9855.8 used,  52157.7 buff/cache
MiB Swap:  15992.0 total,  15990.5 free,      1.5 used.  53269.4 avail Mem 

Boinc processes running; n.b.: I have no Rosetta tasks at the moment
    PID    PPID USER      PR  NI S    RES  %MEM  %CPU  P     TIME+ COMMAND                                              
  11368   11311 boinc     39  19 T   1.4g   2.2   0.0  2   1260:36 /var/lib/boinc/projects/climateprediction.net/hadam+ 
  11370   11310 boinc     39  19 R   1.3g   2.1  99.3  6   1334:37 /var/lib/boinc/projects/climateprediction.net/hadam+ 
  11378   11376 boinc     39  19 R   1.3g   2.1  99.2  7   1343:57 /var/lib/boinc/projects/climateprediction.net/hadam+ 
  11374   11309 boinc     39  19 R   1.3g   2.1  99.3  2   1268:12 /var/lib/boinc/projects/climateprediction.net/hadam+ 
  89853    2604 boinc     39  19 R 759984   1.2  99.3  4  68:29.20 ../../projects/www.worldcommunitygrid.org/wcgrid_ar+ 
  72940    2604 boinc     39  19 R 758500   1.2  99.0  1 350:46.26 ../../projects/www.worldcommunitygrid.org/wcgrid_ar+ 
  90073    2604 boinc     39  19 R 153240   0.2  99.3  5  64:58.59 ../../projects/www.worldcommunitygrid.org/wcgrid_op+ 
  93694    2604 boinc     39  19 R 113052   0.2  98.9  0  11:48.64 ../../projects/www.worldcommunitygrid.org/wcgrid_op+ 
  91188    2604 boinc     39  19 R  73000   0.1  99.2 11  50:54.74 ../../projects/www.worldcommunitygrid.org/wcgrid_mc+ 
   2604       1 boinc     30  10 S  37876   0.1   0.7 10   5521:29 /usr/bin/boinc                                       
  11309    2604 boinc     39  19 S  18564   0.0   0.0 10   1:17.60 ../../projects/climateprediction.net/hadam4_8.52_i6+ 
  11310    2604 boinc     39  19 S  18468   0.0   0.0 10   1:19.52 ../../projects/climateprediction.net/hadam4_8.52_i6+ 
  11376    2604 boinc     39  19 S  17536   0.0   0.1 14   0:43.15 ../../projects/climateprediction.net/hadam4_8.52_i6+ 
  11311    2604 boinc     39  19 S  17084   0.0   0.0 10   0:27.48 ../../projects/climateprediction.net/hadam4_8.52_i6+ 


ID: 103717 · Rating: 0

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,378,164
RAC: 20,578
Message 103718 - Posted: 5 Dec 2021, 4:57:45 UTC - in response to Message 103716.  
Last modified: 5 Dec 2021, 5:02:38 UTC

I had gone as far as unticking all the disk space boxes to give it unlimited use of the disk
The boxes aren't tickable; they require values. And whichever of the three values is the most restrictive determines what disk space is actually available, regardless of the other two.




While people are now able to get & do more Python Tasks, I can see many more people leaving the project anyway.
Rosetta 4.20 Tasks weren't exactly high payers, with roughly 340 Credits for 8 hours of work (depending on the system). For the same time frame, Python Tasks only pay out around 130, a factor of roughly 2.6 lower.
Grant
Darwin NT
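
The credit comparison above works out as follows. The credit figures are the approximate values quoted in the post; the helper name is made up for this sketch.

```python
HOURS = 8              # run time used for both task types in the post
ROSETTA_CREDITS = 340  # approximate credit for a Rosetta 4.20 task
PYTHON_CREDITS = 130   # approximate credit for a Python task


def credits_per_hour(credits, hours=HOURS):
    """Average credit earned per hour of run time."""
    return credits / hours


ratio = ROSETTA_CREDITS / PYTHON_CREDITS
print(credits_per_hour(ROSETTA_CREDITS))  # Rosetta 4.20 credits per hour
print(credits_per_hour(PYTHON_CREDITS))   # Python task credits per hour
print(round(ratio, 2))                    # how many times more Rosetta 4.20 pays
```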
ID: 103718 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103721 - Posted: 5 Dec 2021, 9:02:51 UTC - in response to Message 103717.  
Last modified: 5 Dec 2021, 9:06:36 UTC


Why does it work for me and not for you?


That is the ultimate question.
Talk me through it again...you had all that directory stuff in the text, but I don't have that.
Whats the plain text version of all that?

I have project name (boinc agrees), project_max_concurrent (no disagreement there), but it ignores those.


Are you putting them in the right app_config.xml file?
I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory?

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>

------------------------



Yep..for me its in boinc data/projects/boinc.bakerlab.org_rosetta.
<name>rosetta python projects</name>
<app_config>
<project_max_concurrent>1</project_max_concurrent> (it could be 2 but not 3)
</app_config>
And its back to 3 at a time again.

Resource share is at 100%. I'm thinking I will have to bring it back to 50 again.
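
For comparison, here is a hedged sketch that generates a well-formed app_config.xml in which every setting sits inside the single <app_config> root element. As far as I understand the BOINC client configuration documentation, a per-application limit goes in an <app> block containing <name> and <max_concurrent>, and <name> must be the application's short name. The short name used below, "rosetta_python_projects", is a guess for illustration only, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_max_concurrent, app_name=None, app_max_concurrent=None):
    """Return app_config.xml text with all settings inside one <app_config> root."""
    root = ET.Element("app_config")
    if app_name is not None and app_max_concurrent is not None:
        # Optional per-application limit; <name> lives inside <app>, not at top level.
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(app_max_concurrent)
    ET.SubElement(root, "project_max_concurrent").text = str(project_max_concurrent)
    if hasattr(ET, "indent"):  # pretty-printing helper, available from Python 3.9
        ET.indent(root, space="   ")
    return ET.tostring(root, encoding="unicode")


# Project-wide limit only, like the working file quoted above.
print(build_app_config(1))

# Project-wide limit plus a per-app limit (app short name is an assumption).
print(build_app_config(2, app_name="rosetta_python_projects", app_max_concurrent=1))
```

The output would be saved as app_config.xml in the project directory; the client normally has to re-read its config files or be restarted before the change takes effect.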
ID: 103721 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103722 - Posted: 5 Dec 2021, 11:20:36 UTC - in response to Message 103721.  
Last modified: 5 Dec 2021, 11:25:04 UTC


Why does it work for me and not for you?


That is the ultimate question.
Talk me through it again...you had all that directory stuff in the text, but I don't have that.
Whats the plain text version of all that?

I have project name (boinc agrees), project_max_concurrent (no disagreement there), but it ignores those.


Are you putting them in the right app_config.xml file?
I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory?

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>



------------------------



Yep..for me its in boinc data/projects/boinc.bakerlab.org_rosetta.
<name>rosetta python projects</name>
<app_config>
<project_max_concurrent>1</project_max_concurrent> (it could be 2 but not 3)
</app_config>
And its back to 3 at a time again.

Resource share is at 100%. I'm thinking I will have to bring it back to 50 again.


Oh man... how dumb can I be? I forgot to change the type from text to XML!
SMH

Now that it is an XML file, it works. Geez.
That's what happens when you're in a rush in the middle of the night.
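
A small check like the following could catch this kind of mistake early. It is only a sketch under assumptions: it presumes the problem file ended up with a name other than app_config.xml (for example a hidden .txt extension), and the function name is made up. It also flags content that is not well-formed XML, such as a <name> element placed outside <app_config>.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_app_config(path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    path = Path(path)
    problems = []
    # The client looks for a file named exactly app_config.xml.
    if path.name != "app_config.xml":
        problems.append(f"file is named {path.name!r}, expected 'app_config.xml'")
    try:
        root = ET.parse(path).getroot()
    except (OSError, ET.ParseError) as exc:
        problems.append(f"cannot read or parse as XML: {exc}")
        return problems
    if root.tag != "app_config":
        problems.append(f"root element is <{root.tag}>, expected <app_config>")
    return problems
```

Calling check_app_config on the file in the project directory and printing the returned list gives a quick sanity check before restarting the client.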
ID: 103722 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 195
Credit: 6,613,600
RAC: 6,755
Message 103726 - Posted: 5 Dec 2021, 15:21:51 UTC - in response to Message 103722.  

Oh man... how dumb can I be? I forgot to change the type from text to XML!
SMH

Now that it is an XML file, it works. Geez.
That's what happens when you're in a rush in the middle of the night.


That is not dumb. That is just part of being human, and you do not even need to be forgiven for that. There are too many people who seem to have lost their humanity. It seems to me that they end up in government and upper management of large corporations.

I do not know your age, but I am more than three score and ten and I can tell you it gets worse with age.

So do not insult yourself. Forgive yourself if you must.
ID: 103726 · Rating: 0

Jean-David Beyer
Joined: 2 Nov 05
Posts: 195
Credit: 6,613,600
RAC: 6,755
Message 103727 - Posted: 5 Dec 2021, 15:54:27 UTC - in response to Message 103722.  


Are you putting them in the right app_config.xml file?
I.e., the one in the ..../projects/boinc.bakerlab.org_rosetta directory?

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]# cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>



------------------------



Yep..for me its in boinc data/projects/boinc.bakerlab.org_rosetta.
<name>rosetta python projects</name>
<app_config>
<project_max_concurrent>1</project_max_concurrent> (it could be 2 but not 3)
</app_config>
And its back to 3 at a time again.

Resource share is at 100%. I'm thinking I will have to bring it back to 50 again.

I do not even have a directory named boinc data/projects/boinc.bakerlab.org_rosetta
                                          ^
                           ?????  >>>-----| I cannot believe this is a space.
Filesystem            1K-blocks      Used Available Use% Mounted on
/dev/sdb3             122908728  18785860  97856396  17% /var/lib/boinc

My system is running Red Hat Enterprise Linux release 8.5 (Ootpa) and all its BOINC stuff is in /var/lib/boinc. That is a partition all its own on one of my spinning hard drives. It has only the following directories:
[/var/lib/boinc]$ ls -l
drwx------.  2 root  root     4096 Jul  1 15:22   lost+found
drwxrwx--x.  2 boinc boinc    4096 Dec  5 09:46   notices
drwxrwx--x.  6 boinc boinc    4096 Nov 27  2020   projects
drwxrwx--x. 12 boinc boinc    4096 Dec  4 11:56   slots

ID: 103727 · Rating: 0

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103728 - Posted: 5 Dec 2021, 17:12:15 UTC - in response to Message 103726.  

but I am more than three score and ten and I can tell you it gets worse with age.

I am more than XLV and less than LV.
ID: 103728 · Rating: 0

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103730 - Posted: 5 Dec 2021, 18:02:50 UTC

For the "Vm job unmanageable errors", you can try this fix:
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=14839&postid=103729#103729

It will probably work as well as going back to VirtualBox 5.2.44, which is easy in Windows, but not so easy in Ubuntu.
ID: 103730 · Rating: 0

trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 103731 - Posted: 5 Dec 2021, 19:05:30 UTC - in response to Message 103728.  

I'm LXXVIII and feeling it a little on here atm..:/
ID: 103731 · Rating: 0

Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103732 - Posted: 5 Dec 2021, 19:19:01 UTC - in response to Message 103730.  

Rosetta is already using that wrapper version, vboxwrapper_26203_windows_x86_64.exe, on Windows.
ID: 103732 · Rating: 0

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103733 - Posted: 5 Dec 2021, 21:42:04 UTC - in response to Message 103732.  
Last modified: 5 Dec 2021, 21:49:02 UTC

Rosetta is already using that wrapper version, vboxwrapper_26203_windows_x86_64.exe, on Windows.

I haven't tried it on Windows yet. But maybe you could check the file size to see if they are the same?
(Or maybe Rosetta fixed it themselves? That would be the best fix.)

And they don't seem to have a fixed version for Linux on that website, so I will be using the one from LHC, which is different than the Rosetta one.

PS - The ones you have finished state:
<stderr_txt>
2021-12-05 09:51:08 (26084): Detected: vboxwrapper 26202

So maybe they just changed it?
ID: 103733 · Rating: 0

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103734 - Posted: 5 Dec 2021, 22:00:49 UTC - in response to Message 103731.  

I'm LXXVIII and feeling it a little on here atm..:/

That is an outstanding age. You have acquired a lot of experience by now.
ID: 103734 · Rating: 0


©2024 University of Washington
https://www.bakerlab.org