Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103454 - Posted: 21 Nov 2021, 5:03:19 UTC - in response to Message 103453.  

> Instead of running this task, BOINC keeps downloading new Rosetta tasks and running those.

You probably have an app_config.xml file, with a project_max_concurrent entry. Get rid of it. BOINC has a bug.
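If you are not sure whether you have such an entry, here is a minimal Python sketch that parses an app_config.xml and reports any project_max_concurrent value. It assumes the file sits in the Rosetta@home folder under the projects folder of your BOINC data directory (the exact path varies by OS, so it is passed as an argument); the script name check_app_config.py and the function names are hypothetical, not part of BOINC.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional


def find_project_max_concurrent(xml_text: str) -> Optional[int]:
    """Return the project_max_concurrent value from app_config.xml text, or None."""
    root = ET.fromstring(xml_text)
    # project_max_concurrent is a direct child of the <app_config> root element.
    node = root.find("project_max_concurrent")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


def main() -> None:
    if len(sys.argv) < 2:
        print("usage: check_app_config.py PATH_TO_app_config.xml")
        return
    # Assumed location: the Rosetta@home folder inside BOINC's projects folder.
    with open(sys.argv[1], encoding="utf-8") as f:
        value = find_project_max_concurrent(f.read())
    if value is None:
        print("No project_max_concurrent entry found.")
    else:
        print(f"Found project_max_concurrent = {value}; consider removing it.")


if __name__ == "__main__":
    main()
```

After deleting the entry, BOINC has to re-read its config files (or be restarted) before the change takes effect.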
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1725
Credit: 18,382,444
RAC: 19,446
Message 103455 - Posted: 21 Nov 2021, 5:13:11 UTC - in response to Message 103454.  

> > Instead of running this task, BOINC keeps downloading new Rosetta tasks and running those.
>
> You probably have an app_config.xml file, with a project_max_concurrent entry. Get rid of it. BOINC has a bug.
Nah, that's where it keeps downloading more work over & over again ignoring deadlines & cache settings.
I suspect this one will be a Task that in the BOINC Manager will show as running, but if you look in Task Manager it will show 0 CPU usage. Some fail to start, while others start but never actually finish.
Grant
Darwin NT
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103456 - Posted: 21 Nov 2021, 5:17:46 UTC - in response to Message 103455.  
Last modified: 21 Nov 2021, 5:34:43 UTC

> Nah, that's where it keeps downloading more work over & over again ignoring deadlines & cache settings.
> I suspect this one will be a Task that in the BOINC Manager will show as running, but if you look in Task Manager it will show 0 CPU usage. Some fail to start, while others start but never actually finish.

That is the app_config bug. The 0 CPU usage bug is not actually 0 CPU, just very low usage. The task still occupies a slot, and you won't get another task to replace it until you abort it.

But I have seen Rosetta download too many pythons even without either problem, so it is not all that reliable at the best of times. Just yesterday I had to set NNW (No New Work) after it had downloaded three days and nine hours of work, even though I had set the buffer to 1 + 1.5 days. I did not see any 0-resource ones, and I no longer have an app_config either.

It may just be that the runtime estimates are off, though they look OK at eight hours, which is actually a bit long: they usually run six hours or less on my machine. But who knows what value BOINC is actually using; it may not be the one displayed.
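A rough way to compare a download against the buffer setting is to express the queue as days of work per CPU slot. This is a minimal sketch, assuming each slot runs one task at a time and that the displayed estimate is the one in use (which, as said above, may not be the case). It is not BOINC's actual work-fetch algorithm, and queued_days is a hypothetical name.

```python
def queued_days(n_tasks: int, est_hours_per_task: float, slots: int) -> float:
    """Days of queued work, assuming `slots` tasks run side by side."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return n_tasks * est_hours_per_task / (slots * 24.0)


# Figures from the post: buffer of 1 + 1.5 days, download of 3 days 9 hours.
buffer_days = 1.0 + 1.5
downloaded_days = 3 + 9 / 24
print(f"buffer {buffer_days} d, downloaded {downloaded_days:.3f} d, "
      f"ratio {downloaded_days / buffer_days:.2f}")

# Hypothetical queue: 40 tasks at the displayed 8 h each, on 10 slots.
print(f"{queued_days(40, 8.0, 10):.2f} days queued")
```

With a longer-than-real estimate (8 h shown, about 6 h actual), BOINC should fetch fewer tasks, not more, which is why the displayed value may not be the one driving work fetch.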
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 103457 - Posted: 21 Nov 2021, 9:45:49 UTC - in response to Message 103450.  

> Could it be that the Python tasks are only for teaching their younger students how to create tasks?
>
> The only thing that I recall is that one of their experimenters was using them, and I don't know for what purpose.
> It was not exactly a ringing endorsement that they would be widely used as a basis for new work, as we might hope.
>
> If anyone can find a mention of a use, maybe they could post it.

I believe this is what you are referring to.
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103458 - Posted: 21 Nov 2021, 13:52:27 UTC - in response to Message 103457.  
Last modified: 21 Nov 2021, 14:01:22 UTC

Thanks. That looks much more promising.
I have ordered more memory, and will put another large machine on it if it is stable. But 128 GB may cause problems. You have to be careful.

PS: I know I will still have at least the "Vm job unmanageable" problem, which prevents any further work units from downloading until it times out or you reboot. So I am going to try an automatic reboot.
https://askubuntu.com/questions/13730/how-can-i-schedule-a-nightly-reboot

But the first method shown there did not work. Editing the system crontab with

sudo gedit /etc/crontab

and adding this line, which reboots the machine as root at 06:00 every day,

# m h dom mon dow user  command
00 6 * * * root reboot

looks more promising.

And I am hoping that they will solve the 0 CPU jobs, whenever that will be.
StarCastle
Joined: 25 Apr 20
Posts: 7
Credit: 1,025,975
RAC: 183
Message 103460 - Posted: 22 Nov 2021, 2:35:27 UTC

Even though I have the BOINC client set to use up to 6 cores, the Rosetta Python project 1.03 tasks only ever run a single thread, which reduces the amount of work the system can do.

Is this a config issue?
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103461 - Posted: 22 Nov 2021, 2:38:43 UTC - in response to Message 103460.  
Last modified: 22 Nov 2021, 2:39:12 UTC

No, they are single-threaded. Each work unit occupies one virtual core.
StarCastle
Joined: 25 Apr 20
Posts: 7
Credit: 1,025,975
RAC: 183
Message 103462 - Posted: 22 Nov 2021, 2:41:46 UTC - in response to Message 103461.  

But I have 6 virtual cores available, so why are there not 6 tasks running at the same time?

I don't have this issue with other projects, so it is not a system issue.
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 103463 - Posted: 22 Nov 2021, 2:44:44 UTC - in response to Message 103460.  

> Even though I have the BOINC client set to use up to 6 cores, the Rosetta Python project 1.03 tasks only ever run a single thread, which reduces the amount of work the system can do.
>
> Is this a config issue?

No, it is due to the very large amount of memory each Python task reserves: 7.45 GB. They seldom actually use more than 100 MB, but it is the reservation that counts.

Computers with only 16 GB of memory cannot fit two of those tasks plus the other programs needed to keep BOINC running.
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103464 - Posted: 22 Nov 2021, 2:44:57 UTC - in response to Message 103462.  
Last modified: 22 Nov 2021, 2:47:37 UTC

Each of the python work units (the only ones available at the moment) requires 8 GB of memory. They don't actually use that much, but that much must be available before they will be downloaded.
Your machines have 16 GB each, so they could run only two at most. Do you have BOINC set to use 100% of the memory?
(As robertmiles said, the OS takes some too. I am not sure whether BOINC takes that into account.)
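The arithmetic behind this can be sketched as follows: the number of Python tasks that can start at once is capped both by the CPU slots and by how many per-task memory reservations fit into the memory BOINC may use. The 7.45 GB figure is the one quoted above; the OS reserve, the function name, and the simple floor division are assumptions for illustration, since BOINC's real scheduler weighs more than this.

```python
import math


def max_concurrent_tasks(total_ram_gb: float, boinc_fraction: float,
                         per_task_gb: float, cpu_slots: int,
                         os_reserve_gb: float = 0.0) -> int:
    """Tasks that fit at once, given memory reservations and CPU slots.

    boinc_fraction: share of RAM BOINC may use (1.0 = 100%).
    os_reserve_gb: memory assumed to be unavailable to BOINC tasks.
    """
    usable = total_ram_gb * boinc_fraction - os_reserve_gb
    by_memory = max(0, math.floor(usable / per_task_gb))
    return min(cpu_slots, by_memory)


# A 16 GB, 6-core machine with BOINC allowed 100% of memory:
print(max_concurrent_tasks(16, 1.0, 7.45, 6))  # 2
# Same machine, assuming about 2 GB is taken by the OS and other programs:
print(max_concurrent_tasks(16, 1.0, 7.45, 6, os_reserve_gb=2))  # 1
```

Either way the memory limit, not the 6 cores, decides how many run.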
StarCastle
Joined: 25 Apr 20
Posts: 7
Credit: 1,025,975
RAC: 183
Message 103465 - Posted: 22 Nov 2021, 2:46:58 UTC - in response to Message 103464.  

OK, so that is the issue, I understand.

Thanks a million for the feedback.

Time to add more memory!
WR-HW95
Joined: 5 Jan 06
Posts: 2
Credit: 8,086,818
RAC: 0
Message 103468 - Posted: 22 Nov 2021, 15:29:20 UTC

So... I started crunching R@h again after a long break.
What should the running time be for Python 1.03 (vbox64) units?
At the moment I have 3 of those running on an AMD 5900X, and they are now at 99.996% after 57 hours.
2 more units are "Postponed:VM environment needed to be cleaned up."
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103469 - Posted: 22 Nov 2021, 16:15:41 UTC - in response to Message 103468.  

> At the moment I have 3 of those running on an AMD 5900X, and they are now at 99.996% after 57 hours.
> 2 more units are "Postponed:VM environment needed to be cleaned up."

Congratulations. You have managed to hit both of the problems right off the bat.

You could have aborted the 99.996% ones in the first five minutes, when they would show less than 1% CPU usage (I use BoincTasks to check that).
And to fix the postponed ones, just reboot. Otherwise, you have to wait about a day for them to restart.

You have made much more progress than most.
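The check described here (abort a task still showing under 1% CPU after the first five minutes) comes down to comparing CPU time with elapsed time. A minimal sketch, assuming you read both numbers yourself from BoincTasks or another monitor; the thresholds come from this post, the function name is hypothetical, and it assumes a single-threaded task like these Python ones.

```python
def looks_stuck(cpu_seconds: float, elapsed_seconds: float,
                min_elapsed: float = 5 * 60,
                max_cpu_fraction: float = 0.01) -> bool:
    """Flag a single-threaded task whose CPU use is under 1% after 5 minutes."""
    if elapsed_seconds < min_elapsed:
        return False  # too early to judge
    return cpu_seconds / elapsed_seconds < max_cpu_fraction


# Hypothetical readings: (CPU seconds, elapsed seconds)
print(looks_stuck(2.0, 600))    # True: about 0.3% CPU after 10 minutes
print(looks_stuck(540.0, 600))  # False: 90% CPU
print(looks_stuck(1.0, 120))    # False: only 2 minutes elapsed
```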
.clair.
Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 103472 - Posted: 22 Nov 2021, 18:16:57 UTC - in response to Message 103465.  

> OK, so that is the issue, I understand.
> Thanks a million for the feedback.
> Time to add more memory!

Let's see now.
The Python RAM calculator says 12 CPUs times 8 GB each [allowing a bit], so it seems you need to fit about 100 GB to be safe running full on.
And that's no joke.
Well, the way RPP tasks hog RAM, it's a sad joke the project needs to do something about,
and this is only the hundredth (or so) post to 'hint' at it.
OK folks, I'll go back to banging my head on the wall . . . .
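Run the other way, the same back-of-the-envelope arithmetic gives the memory needed to keep every CPU busy with Python tasks. A minimal sketch: the 12 CPUs and 8 GB per task are from the post, while the 4 GB of headroom standing in for "allowing a bit" is an assumption, as is the function name.

```python
def ram_needed_gb(cpu_slots: int, per_task_gb: float = 8.0,
                  headroom_gb: float = 4.0) -> float:
    """RAM for one task per CPU slot, plus assumed headroom for the OS and BOINC."""
    return cpu_slots * per_task_gb + headroom_gb


print(ram_needed_gb(12))  # 100.0
```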
BachiHD
Joined: 24 Sep 21
Posts: 1
Credit: 58,922
RAC: 9
Message 103474 - Posted: 22 Nov 2021, 20:56:48 UTC
Last modified: 22 Nov 2021, 20:57:06 UTC

I just encountered an issue. My BOINC client cannot receive any Rosetta@home tasks.
The log says:

22/11/2021 20:40:55 | Rosetta@home | Sending scheduler request: To fetch work.
22/11/2021 20:40:55 | Rosetta@home | Requesting new tasks for CPU
22/11/2021 20:40:57 | Rosetta@home | Scheduler request completed: got 0 new tasks
22/11/2021 20:40:57 | Rosetta@home | No tasks sent
22/11/2021 20:40:57 | Rosetta@home | Message from server: VirtualBox is not installed
22/11/2021 20:40:57 | Rosetta@home | Project requested delay of 31 seconds


After that I installed VBox but it still outputs this error.
I also tried to reset the project but it doesn't make any difference.

What can I do now?
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 2,014
Message 103475 - Posted: 22 Nov 2021, 21:29:57 UTC - in response to Message 103474.  

> I just encountered an issue. My BOINC client cannot receive any Rosetta@home tasks.
> The log says:
>
> 22/11/2021 20:40:55 | Rosetta@home | Sending scheduler request: To fetch work.
> 22/11/2021 20:40:55 | Rosetta@home | Requesting new tasks for CPU
> 22/11/2021 20:40:57 | Rosetta@home | Scheduler request completed: got 0 new tasks
> 22/11/2021 20:40:57 | Rosetta@home | No tasks sent
> 22/11/2021 20:40:57 | Rosetta@home | Message from server: VirtualBox is not installed
> 22/11/2021 20:40:57 | Rosetta@home | Project requested delay of 31 seconds
>
> After that I installed VBox but it still outputs this error.
> I also tried to reset the project but it doesn't make any difference.
>
> What can I do now?

What version of Vbox? A version recent enough to include vbox64 is required for the Python tasks.

How much memory does your computer have? 12 GB is about the least that will run Python tasks, and then only if you have enabled virtualization in the BIOS settings.

Under Options, then Computing preferences, you may have to allow BOINC to use a higher fraction of the memory and the disk space.

There's an occasional task with much smaller requirements, but people try to download them much more often than they become available.

Also, you may have to restart or reboot your operating system (probably Windows or Linux) before such changes will take effect.
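On a Linux machine, the prerequisites mentioned in this post can be checked roughly as below. This is a minimal sketch under stated assumptions, not how BOINC itself detects VirtualBox: it looks for VBoxManage on the PATH, reads MemTotal from /proc/meminfo, and looks for the vmx (Intel) or svm (AMD) flag in /proc/cpuinfo. That flag shows the CPU supports virtualization; depending on the kernel it may still be listed when virtualization is switched off in the BIOS, so treat it as a hint only. The function names are hypothetical.

```python
import shutil
from typing import Optional


def has_virtualization_flag(cpuinfo_text: str) -> bool:
    """True if a 'flags' line in /proc/cpuinfo lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def mem_total_gib(meminfo_text: str) -> Optional[float]:
    """Parse MemTotal from /proc/meminfo (listed in kB, really KiB) as GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    return None


def main() -> None:
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
        with open("/proc/meminfo") as f:
            meminfo = f.read()
    except OSError:
        print("This sketch only works on Linux (it needs /proc).")
        return
    total = mem_total_gib(meminfo)
    print("VBoxManage on PATH:", shutil.which("VBoxManage") is not None)
    print("CPU virtualization flag:", has_virtualization_flag(cpuinfo))
    print("Memory (GiB):", None if total is None else round(total, 1))


if __name__ == "__main__":
    main()
```

On Windows, Task Manager's Performance tab shows whether virtualization is enabled for the CPU.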
Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,276,393
RAC: 2,018
Message 103476 - Posted: 22 Nov 2021, 21:30:43 UTC - in response to Message 103474.  

Did you restart Windows? If not, did you restart BOINC?
It will likely be fixed if you restart Windows.
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103479 - Posted: 22 Nov 2021, 21:43:59 UTC - in response to Message 103472.  

> The Python RAM calculator says 12 CPUs times 8 GB each [allowing a bit], so it seems you need to fit about 100 GB to be safe running full on.
> And that's no joke.
> Well, the way RPP tasks hog RAM, it's a sad joke the project needs to do something about,
> and this is only the hundredth (or so) post to 'hint' at it.

Never has so much been asked to do so little for so few.
G.L.I.S.
Joined: 25 Dec 08
Posts: 26
Credit: 2,393,746
RAC: 3,266
Message 103491 - Posted: 23 Nov 2021, 11:03:02 UTC - in response to Message 103474.  
Last modified: 23 Nov 2021, 11:22:50 UTC

> I just encountered an issue. My BOINC client cannot receive any Rosetta@home tasks.
> The log says:
>
> 22/11/2021 20:40:55 | Rosetta@home | Sending scheduler request: To fetch work.
> 22/11/2021 20:40:55 | Rosetta@home | Requesting new tasks for CPU
> 22/11/2021 20:40:57 | Rosetta@home | Scheduler request completed: got 0 new tasks
> 22/11/2021 20:40:57 | Rosetta@home | No tasks sent
> 22/11/2021 20:40:57 | Rosetta@home | Message from server: VirtualBox is not installed
> 22/11/2021 20:40:57 | Rosetta@home | Project requested delay of 31 seconds
>
> After that I installed VBox but it still outputs this error.
> I also tried to reset the project but it doesn't make any difference.
>
> What can I do now?
VirtualBox is not compatible with, and does not work in, Windows if Microsoft Hyper-V is enabled.
First of all, uninstall all Hyper-V components from your system.

Windows will restart; then try again.
If BOINC Manager still gives you the same warning, open Windows PowerShell with administrator rights and type (or copy and paste):

bcdedit /set hypervisorlaunchtype off

and press Enter. Then type 'exit'.

Restart Windows, and BOINC should now be able to download the WUs.
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 103493 - Posted: 23 Nov 2021, 14:49:03 UTC
Last modified: 23 Nov 2021, 14:54:30 UTC

I am starting to get some server aborts (202) on the regular Rosettas, so they must have some bad ones.
Also I see some pythons eliminated that way too, which is good. Hopefully they are finding and getting rid of the "0 CPU" tasks.

I still get "Vm job unmanageable" suspensions on the pythons, but that may be beyond their power to easily correct for a while.
My automatic reboot cron job (as posted previously) takes care of it by rebooting daily.



©2024 University of Washington
https://www.bakerlab.org