Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103684 - Posted: 3 Dec 2021, 19:53:33 UTC - in response to Message 103683.  

Python tasks failing

I don't see that you have VirtualBox installed.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6157362

But you are better off with VBox 5.2.44 anyway. Version 6.1 gives "Vm job unmanageable" suspensions.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2
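
For anyone who wants to script the version check, here is a minimal sketch in Python. It assumes the version string has the shape printed by `VBoxManage --version` (digits, dots, then an `r` revision suffix); the function names and the sample strings are made up for illustration, and the "5.2 is fine, 6.1 suspends" rule is only what is reported in this thread, not an official compatibility list.

```python
import re


def vbox_branch(version_string):
    """Return (major, minor) parsed from a string such as '5.2.44r139111'."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def reported_ok_for_rosetta_python(version_string):
    """True for the 5.2 branch, which this thread reports as free of the
    'Vm job unmanageable' suspensions. An observation, not a guarantee."""
    return vbox_branch(version_string) == (5, 2)


if __name__ == "__main__":
    # Sample strings are illustrative only.
    for sample in ("5.2.44r139111", "6.1.30r148432"):
        print(sample, reported_ok_for_rosetta_python(sample))
```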
ID: 103684
Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103686 - Posted: 3 Dec 2021, 20:09:46 UTC - in response to Message 103684.  

Are you thinking of a different project that shows VirtualBox on the computer details page? I don't see it on either of the computer details pages for mine, nor on the two I checked listed under you.
VirtualBox is working on both my computers, but I have been sticking to 6.1 since it is supported by VirtualBox. Support was dropped for the earlier versions. I just don't load up my computers to 100 percent processor usage, nor juggle too many concurrent VM tasks. These Python / Rosettas are brutal, creating almost 8 GB images.

I just can't figure out why the one computer is having problems now as it was working with the previous python related tasks.
ID: 103686
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103687 - Posted: 3 Dec 2021, 20:18:55 UTC - in response to Message 103686.  
Last modified: 3 Dec 2021, 20:20:42 UTC

Probably so. I usually see VirtualBox listed on most projects.

The memory requirement for downloading the new pythons is now down to 3 GB, and the amount required to run is less than that.
But the .vdi images in the slots are still large. Maybe they will be reduced eventually.

Have you checked BOINC Manager for memory and disk usage allowed? It may not be enough.
ID: 103687
Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103688 - Posted: 3 Dec 2021, 21:03:38 UTC - in response to Message 103687.  

It's using Rosetta preferences
RAM is set to 75% both when in use and when not in use, so it can use 9 out of 12 GB. That seems correct as it has 3 tasks. It doesn't keep non-running tasks in memory; that box is unchecked.

I think it is something inside the VM. I kind of got spoiled with the LHC ATLAS tasks and being able to see the second and third terminals: one for tasks and one showing top.
ID: 103688
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103689 - Posted: 3 Dec 2021, 21:26:47 UTC - in response to Message 103688.  
Last modified: 3 Dec 2021, 21:40:26 UTC

I kind of got spoiled with the LHC ATLAS tasks and being able to see the second and third terminals: one for tasks and one showing top.

While the VBox version won't affect your ability to download, LHC is the only project that uses VBox 6.1 without the suspensions, from what I have seen at any rate.
That is apparently because they use a different wrapper, which I think they compile themselves. At least it is different.

But that is why I went to Win10. It allows the use of VBox 5.2.44, whereas Ubuntu 20.04.3 allows only 6.1.
I haven't had a suspension yet in Win10, though it has been running only a day. But I would normally get several in that time with VBox 6.1.

Unfortunately, it does not solve the "0 CPU" error, where a work unit uses very little (less than 1%) CPU power, and goes on forever, or else times out.
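
A rough way to spot such a "0 CPU" task is to compare CPU time with elapsed time. The sketch below is Python and rests on a few assumptions: times are given as `HH:MM:SS` strings, the 1% threshold is taken from the description above, and the function names are hypothetical. As a sanity check it uses one pair of figures from the task list posted later in this thread (elapsed 00:59:04, CPU 00:54:55), where the listed percentage appears to be this same ratio.

```python
def hms_to_seconds(text):
    """Convert 'HH:MM:SS' to seconds. Strings with a day prefix are not handled."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency_pct(elapsed, cpu):
    """CPU time as a percentage of elapsed (wall-clock) time."""
    elapsed_s = hms_to_seconds(elapsed)
    if elapsed_s == 0:
        return 0.0
    return 100.0 * hms_to_seconds(cpu) / elapsed_s


def looks_stalled(elapsed, cpu, threshold_pct=1.0):
    """True when a task has run for some time but used under threshold_pct CPU."""
    return hms_to_seconds(elapsed) > 0 and cpu_efficiency_pct(elapsed, cpu) < threshold_pct


if __name__ == "__main__":
    print(round(cpu_efficiency_pct("00:59:04", "00:54:55"), 2))
    # Hypothetical stalled task: two hours elapsed, thirty seconds of CPU.
    print(looks_stalled("02:00:00", "00:00:30"))
```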
ID: 103689
mmonnin
Joined: 2 Jun 16
Posts: 61
Credit: 25,390,629
RAC: 31,778
Message 103692 - Posted: 4 Dec 2021, 0:17:40 UTC - in response to Message 103684.  

Python tasks failing

I don't see that you have VirtualBox installed.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6157362

But you are better off with VBox 5.2.44 anyway. Version 6.1 gives "Vm job unmanageable" suspensions.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2


I have 6.x and never have this issue with LHC, but about half have these issues at Rosetta. Plenty of space and memory. Rosetta has never had an efficient app.
ID: 103692
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103694 - Posted: 4 Dec 2021, 0:26:45 UTC - in response to Message 103692.  

Python tasks failing

I don't see that you have VirtualBox installed.
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=6157362

But you are better off with VBox 5.2.44 anyway. Version 6.1 gives "Vm job unmanageable" suspensions.
https://www.virtualbox.org/wiki/Download_Old_Builds_5_2


I have 6.x and never have this issue with LHC, but about half have these issues at Rosetta. Plenty of space and memory. Rosetta has never had an efficient app.



I run LHC ATLAS and I had to downgrade to run RAH Python.
Python is new to RAH's style of computing on PCs.
I've been with this project since almost the beginning and they have never deviated from their base program.
They always have bugs, that's a given. We saw that here, quite a few things went wrong before they got a stable working project.
It's been the same with some of the projects they put out on normal Rosetta.
It's just one of those things that we have to deal with.

As far as the two versions of Vbox, I don't see any difference in the way ATLAS runs on 6 or on 5.
So I will just stick with 5 until a newer version of 6 comes out that may make the errors go away or maybe not.
But it really doesn't seem to make any difference on any of the other 2 Vbox projects I run.
So just downgrade to 5 if you want to run Python.
ID: 103694
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103695 - Posted: 4 Dec 2021, 0:35:20 UTC - in response to Message 103683.  

Python tasks failing
Error posted inside VM is "Intel MKL FATAL ERROR: Error on loading function mkl_lapack_ps_mc3_dsytrf_l_small."
Stops right after it gives workunit name.

Computer is an old Intel i7-920. I get no CPU usage after the hang.
I have removed and reattached the project since December 1, when this first started. I thought maybe it was bad files.

https://boinc.bakerlab.org/rosetta/results.php?hostid=6157362



You might want to research that error. I found quite a few things about it, but it is way over my head to understand.
It's quite technical stuff that comes back in the search results.
https://www.google.com/search?client=firefox-b-d&q=%22Intel+MKL+FATAL+ERROR%3A+Error+on+loading+function+mkl_lapack_ps_mc3_dsytrf_l_small
ID: 103695
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 103696 - Posted: 4 Dec 2021, 1:21:26 UTC
Last modified: 4 Dec 2021, 1:47:57 UTC

At the moment the only projects I am running on that computer are Moo on the GPU and Rosetta on the CPU
I have not had many disk space messages today, though some rosetta 42 work has found its way here, which eased the problem
I decided not to install any more programs to delete the disk junk coz they would take up more space on the disk :)
though I know I can uninstall them later
So after the usual Microsoft `disk cleanup` and system files I had a good uninstall of everything I don't need, had a play with the digital chainsaw
and deleted everything that doesn't have to be on the disk, including everything from the documents and download folders. That got me 12GB back
even that didn't get rid of the "disk space" message, though the demands were less.
the thing that finally shut it up was reducing the virtual memory size on the disk coz it was holding 39GB to ransom and not using it; my account page still shows it as - Swap space 32784.33 MB
it's got 32GB RAM in it and Windows automatically creates a page file 1 1/2 times the size of fitted RAM, give or take a bit
{having remembered the fun I had with win98se all those years ago with running out of memory when it only had 756MB in it to start with}
But you didn't need gigabytes of memory just to boot the thing back then.
so having read up on how it's done these days, I chopped it down to a tenth of what it was using
and now have 106GB free disk space even with several greedy python tasks running
I will just have to keep an eye on it and see what happens
.................
Just been to check on it
it last had a disk space moan ten hours ago so that seems to be it for now
Einstein does use the most disk space at 330MB [suspended]
Rosetta [we will all have much the same] is using 49GB
So, yes, Rosetta is the disk hog
ID: 103696
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103697 - Posted: 4 Dec 2021, 10:15:10 UTC - in response to Message 103696.  

That's interesting... I had a look at my RAH folder and it's 30.7 GB in size, and in compressed form it is 13.7 GB (size on disk), with 5,316 files and 424 folders. I have a smaller drive than you, and I am running 7 BOINC projects and FAH plus Facebook and Firefox with many tabs, yet I don't get a disk space error.

I am beginning to think RAH is having issues with Win7. I use Win 10.
Just a thought.
Which one of your systems is having issues?
ID: 103697
Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103698 - Posted: 4 Dec 2021, 11:55:36 UTC - in response to Message 103695.  

I aborted all the newer python jobs that started with 'aa'. I got a single 'boinc_cages_IL' job so I kept that one. That one runs fine. I set the computer to not receive VM jobs from Rosetta.
ID: 103698
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 103699 - Posted: 4 Dec 2021, 12:56:28 UTC

It's Windows 7, opteron16, that has gone funky
I thought I had it fixed, but today it's back on python-only work,
11 at once, and it is getting the disk space moan again, except even after all that clear-out, it's got worse !!!??

04/12/2021 10:50:17 | Rosetta@home | Message from server: rosetta python projects needs 16200.20MB more disk space. You currently have 2873.28 MB available and it needs 19073.49 MB.
04/12/2021 12:19:11 | Rosetta@home | Message from server: rosetta python projects needs 16255.39MB more disk space. You currently have 2818.09 MB available and it needs 19073.49 MB.

right now, as far as the OS on drive C goes, it's got 91GB of disk space free
even with the 11 pythons running
so just for interest I have set it off on a full 5 pass disk check to see if it finds anything
funny old world . . . .
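
One possible explanation for the mismatch between the server message and the 91GB the OS reports is that BOINC works from its own disk allowance rather than raw free space. The Python sketch below does two things: it parses the server message quoted above and checks the arithmetic, and it models the allowance as the tightest of three disk preferences (a cap in GB, a minimum free space, and a percentage of the total disk). That model is an assumption about how the client behaves, the function names are hypothetical, and the preference values in the example are invented; only the 91GB free and 49GB used figures come from this thread.

```python
import re

# Pattern for the server message quoted in this thread.
MESSAGE_RE = re.compile(
    r"needs (?P<short>[\d.]+)\s*MB more disk space\. "
    r"You currently have (?P<have>[\d.]+) MB available "
    r"and it needs (?P<need>[\d.]+) MB"
)


def parse_disk_message(line):
    """Return the three MB figures from the message, or None if it does not match."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


def boinc_disk_headroom_gb(total_gb, free_gb, boinc_used_gb,
                           max_used_gb, min_free_gb, max_used_pct):
    """Assumed model: the tightest of the three disk preferences wins."""
    limits = (
        max_used_gb - boinc_used_gb,                      # "use no more than X GB"
        free_gb - min_free_gb,                            # "leave at least X GB free"
        total_gb * max_used_pct / 100.0 - boinc_used_gb,  # "use no more than X% of total"
    )
    return max(0.0, min(limits))


if __name__ == "__main__":
    line = ("04/12/2021 10:50:17 | Rosetta@home | Message from server: rosetta python "
            "projects needs 16200.20MB more disk space. You currently have 2873.28 MB "
            "available and it needs 19073.49 MB.")
    figures = parse_disk_message(line)
    print(figures["need"] - figures["have"])  # close to the reported shortfall

    # Invented preferences: a 52 GB cap leaves little headroom despite 91 GB free.
    print(boinc_disk_headroom_gb(total_gb=500, free_gb=91, boinc_used_gb=49,
                                 max_used_gb=52, min_free_gb=1, max_used_pct=90))
```

If the model holds, raising the disk cap in the computing preferences would matter more than freeing further space on the drive.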
ID: 103699
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,780,807
RAC: 5,492
Message 103700 - Posted: 4 Dec 2021, 14:29:04 UTC

Now my pcs have "got 0 new tasks" of python wus, but in the queue there are over 5000 wus...
ID: 103700
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 103701 - Posted: 4 Dec 2021, 14:31:58 UTC - in response to Message 103700.  

Now my pcs have "got 0 new tasks" of python wus, but in the queue there are over 5000 wus...

I received 5 tasks just now.
ID: 103701
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103703 - Posted: 4 Dec 2021, 16:12:44 UTC - in response to Message 103701.  

They have a mix of the old (8 GB) and new (3 GB) pythons.
The old ones are "boinc_cages_IL", and the new ones are "aaxx-xxx".

So maybe we will finish off the big ones at some point.
ID: 103703
trevG
Joined: 5 Nov 13
Posts: 9
Credit: 687,475
RAC: 0
Message 103704 - Posted: 4 Dec 2021, 16:48:14 UTC - in response to Message 103703.  

I've been trying to run these Pythons starting with aa.. and made some progress after finding that Windows Update had altered my VM 'manual start up' setting in Services. No warning or anything useful. I had previously struggled with LHC due to this.
Checking the operation showed that BoincMgr was freezing, and the same with the EFMER version, which I prefer [for visibility and function], but it has been less good at accessing BOINC lately because of password issues.
I looked into the VM box settings and saw that the RAM allocation was warning that the max setting would cause system lag. The problem was that I couldn't modify the setting for a while, until it suddenly went live. It seemed to reset back easily, though.
In trying to sort this out, I lost half a dozen WUs after completing two OK, but on restarting I no longer get units, even though I had increased my disk allocation a lot (as pointed out by others) prior to the two successful runs.
I waited 24 hours after the failures to see if finishing other work improved the downloads, but no change.
I wonder if aborted work has led to blacklisting?
Annoying, as I lost 4 days of GPUGrid work in the process and spent hours sorting out the VM, which is pretty tricky to use.
Any thoughts, Maestros? I never had issues over the years with the old RAH clients.

World Community Grid 03-12-2021 16:03 02:20:28 (01:44:22) 03-12-2021 16:04 MCM1_0185708_4424_2 74.30 Reported: OK + 7.61 Mapping Cancer Markers DESKTOP-

Rosetta@home 03-12-2021 14:50 00:00:00 (00:00:00) 03-12-2021 14:51 aaam-SAR_pp-mPRO_pp-PIP-AMACBEN2_pp_12_2570219_1_0 0.00 Aborted (203) 1.03 rosetta python projects (vbox64) DESKTOP

*** GPUGRID 03-12-2021 14:39 *RUN TIME 02d,01:17:02 (02d,01:55:03) 03-12-2021 14:41 e7s224_e1s376p0f362-ADRIA_BanditGPCR_APJ_b0-0-1-RND6911_1 0.956C + 1NV 100.00 Aborted (203) 2.19 New version of ACEMD (cuda1121) DESKTOP-

Rosetta@home 03-12-2021 14:30 00:00:27 (00:00:00) 03-12-2021 14:31 aagb-PRO_pp-SAR-ACPenC13T-mB3PHG_pp_11_2697622_1_0 0.00 **Reported: Computation error (1,) 1.03 rosetta python projects (vbox64) DESKTOP-
World Community Grid 03-12-2021 12:59 02:37:38 (02:27:14) 03-12-2021 13:01
World Community Grid 03-12-2021 11:07 03:03:09 (02:36:18) 03-12-2021 11:09 OPN1_0095285_00301_0 85.34 Reported: OK + 7.21 OpenPandemics - COVID 19 DESKTOP-

Rosetta@home 03-12-2021 10:54 00:59:04 (00:54:55) 03-12-2021 10:56 aaam-mNMVAL_pp-FPR-mPHE-AMACBEN2_pp_4_2496370_1_0 92.97 ** Reported: Computation error (0,) 1.03 rosetta python projects (vbox64) DESKTOP-
Rosetta@home 03-12-2021 09:52 03:53:41 (03:47:29) 03-12-2021 09:56 aaas-SAR-VAL_pp-NMVAL-SUGA_pp_12_2559723_1_0 97.35 Reported: OK * 1.03 rosetta python projects (vbox64) DESKTOP-

Rosetta@home 03-12-2021 05:57 07:03:18 (06:52:27) 03-12-2021 05:57 aaap-PIP_pp-mNMPHE_pp-TIC-AMACBEN3_pp_0_2502770_1_0 97.44 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP-

Rosetta@home 02-12-2021 22:06 00:00:00 (00:00:00) 02-12-2021 22:08 aagb-mNMVAL-mPHE-GPN-B3PHG_pp_12_2632874_1_0 0.00 **Aborted (203) 1.03 rosetta python projects (vbox64) DESKTOP-I

Rosetta@home 02-12-2021 21:15 06:44:38 (06:22:05) 02-12-2021 21:17 aaas-PHE_pp-mTIC_pp-NMVAL-mSUGA_1_2517870_1_0 94.43 Reported: OK + 1.03 rosetta python projects (vbox64) DESKTOP
ID: 103704
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 103705 - Posted: 4 Dec 2021, 17:02:53 UTC - in response to Message 103632.  
Last modified: 4 Dec 2021, 17:04:19 UTC

Well, I use only the <project_max_concurrent> not <max_concurrent>,
and then only in the app_config.xml file in the

/var/lib/boinc/projects/boinc.bakerlab.org_rosetta directory.

[/var/lib/boinc/projects/boinc.bakerlab.org_rosetta]$ cat app_config.xml
<app_config>
<project_max_concurrent>3</project_max_concurrent>
</app_config>
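
For comparison, a file that also caps a single application can be generated rather than typed by hand. This is a minimal Python sketch: `<app>`, `<name>`, `<max_concurrent>` and `<project_max_concurrent>` are standard app_config.xml elements, but the application short name `rosetta_python_projects` used here is an assumption and should be checked against the `<app>` entries in client_state.xml. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(project_limit, app_limits):
    """Build app_config.xml text with a project-wide cap and per-app caps.

    app_limits maps an application short name to its max_concurrent value.
    """
    root = ET.Element("app_config")
    for app_name, limit in app_limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(limit)
    ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The app name below is an assumed value, not confirmed in this thread.
    print(build_app_config(3, {"rosetta_python_projects": 2}))
```

The client only picks up changes to this file after a restart or after "Read config files" is chosen in BOINC Manager, and the limits apply to running tasks, not to how many are downloaded.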


So? How's that working out on Python? That might be the solution to limit them.

I have no idea.

Mon 29 Nov 2021 01:31:22 AM EST | Rosetta@home | Message from server: VirtualBox is not installed

I do not have VirtualBox, so I cannot run them.



Jean - I thought you might be on to something. But it was a fluke.
I put <name> in app_config and I set project_max_concurrent to 2 and then to 1, but that is being ignored.
Still running 3.
I guess RAH will do what it wants to do no matter what commands you give it, short of cutting resource share, which looks like the only way to get it to 2 tasks, and maybe to 25% to get it to 1.
That is because they still want to use/reserve 7629 MB per task, which times 3 is 22,887 MB, or 90% of my memory.

That is with the boinc_cages_IL tasks.
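
The memory arithmetic in this thread can be written down as a small budget calculation. This Python sketch assumes that the only constraint is the "use at most X% of RAM" preference and that each VM task reserves a fixed amount; whether the client really schedules this way is an assumption, the function name is hypothetical, and 1 GB is treated as 1024 MB. The inputs are the figures quoted by posters here: 12 GB at 75%, roughly 3 GB for the newer tasks, and 7629 MB for the boinc_cages_IL tasks.

```python
def vm_tasks_that_fit(total_ram_mb, ram_pct_allowed, per_task_mb):
    """Number of fixed-size VM tasks that fit inside the allowed share of RAM."""
    allowed_mb = total_ram_mb * ram_pct_allowed / 100
    return int(allowed_mb // per_task_mb)


if __name__ == "__main__":
    twelve_gb = 12 * 1024
    print(vm_tasks_that_fit(twelve_gb, 75, 3 * 1024))  # newer, smaller tasks
    print(vm_tasks_that_fit(twelve_gb, 75, 7629))      # older boinc_cages_IL tasks
    print(3 * 7629)                                    # total reserved by three old tasks
```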
ID: 103705
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2002
Credit: 9,780,807
RAC: 5,492
Message 103706 - Posted: 4 Dec 2021, 17:05:44 UTC - in response to Message 103701.  

Now my pcs have "got 0 new tasks" of python wus, but in the queue there are over 5000 wus...

I received 5 tasks just now.


Uh, I cannot understand.
In the PC profiles the Python WUs are "disabled" (skip), but I didn't change this option.
ID: 103706
Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103707 - Posted: 4 Dec 2021, 17:19:48 UTC - in response to Message 103706.  

You got Blacklisted
ID: 103707
Jonathan
Joined: 4 Oct 17
Posts: 43
Credit: 1,337,472
RAC: 0
Message 103708 - Posted: 4 Dec 2021, 17:21:13 UTC - in response to Message 103704.  

trevG, if your computer has only 4 GB of RAM, you don't have enough to run the VM tasks.
ID: 103708

©2024 University of Washington
https://www.bakerlab.org