Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1316
Credit: 5,992,753
RAC: 11,525
Message 103911 - Posted: 26 Dec 2021, 15:30:45 UTC - in response to Message 103885.  

Most work requests get the `no tasks sent` message, with no reason given why.

Do you have any of the "Vm job unmanageable" ones on your machine? That will prevent any more from downloading.
You need to reboot to fix it. Or find another project.

Luckily, I have never had any `unmanageable` jobs or anything like that,
and I only have 5 error tasks: one of them was `cancelled by server`, and another three were the `one-minute wonders` that run for hours and do nothing, which I aborted.
I did reboot the computer earlier today anyway, just in case something had gone funky.
I just had a look at the server status: only three R4.2 jobs in the queue.


I got a "the VB environment needs cleaning up", which paused a task. I went into VBox itself and removed some images that LHC had left in there cluttering it up, then rebooted the computer. Everything is fine now.
Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1316
Credit: 5,992,753
RAC: 11,525
Message 103912 - Posted: 26 Dec 2021, 15:31:43 UTC - in response to Message 103894.  

I had gone as far as to untick all the disk space boxes to give it unlimited use of the disk.
The boxes aren't tickable; they require values. And a value in any one of the options overrides the values in the other two when it comes to how much disk space is actually available.

They are quite tickable: there are checkboxes to the left of each value box which turn off the corresponding limit.

I was referring to the web-based settings.
If you've only got one system, local settings are OK. With more than one, web-based settings make life much easier.

Web-based settings also have the same checkboxes for disk and network usage limits as local settings do, at least in the web-based BOINC settings here on the Rosetta server.
Hmm.
That's new.
Clicked on Edit and up come the check boxes with the value boxes next to them.


What are these boxes? I just use local settings, even though I have 7 computers, because they're all different. Pythons run ok here.
Falconet
Joined: 9 Mar 09
Posts: 343
Credit: 988,736
RAC: 107
Message 103913 - Posted: 26 Dec 2021, 16:42:18 UTC
Last modified: 26 Dec 2021, 16:43:41 UTC

Looks like a small batch of Zika/West Nile stuff. Around 400,000 tasks.
That's the 3rd or 4th batch of work on those viruses in recent times.
.clair.
Joined: 2 Jan 07
Posts: 185
Credit: 23,193,732
RAC: 1,125
Message 103917 - Posted: 27 Dec 2021, 0:06:42 UTC - in response to Message 103912.  
Last modified: 27 Dec 2021, 0:09:48 UTC

What are these boxes? I just use local settings, even though I have 7 computers, because they're all different. Pythons run ok here.

I was setting the disk usage in BOINC Manager as large as possible (click the `Options` menu, then `Computing preferences`, then the `Disk and memory` tab) to try to get rid of some messages about low disk space.
It did not work, even with 200+ GB of disk space shown as "Free, available to BOINC".
I still sometimes get the following, and this is only ten minutes after a reboot:
-----------------
26/12/2021 16:31:29 Rosetta@home Message from server : Rosetta needs 1907.35MB more disk space. You currently have 0.00 MB available and it needs 1907.35 MB.
26/12/2021 16:31:29 Rosetta@home Message from server : rosetta python projects needs 19073.49MB more disk space. You currently have 0.00 MB available and it needs 19073.49 MB.
--------------------
And that is with the following on the `Disk` tab of BOINC Manager:
Used by BOINC: 123.97 GB
Free, available to BOINC: 226.19 GB
Used by other programs: 114.99 GB
-------------
I have given up caring about those messages so long as rosetta works.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1366
Credit: 13,624,788
RAC: 0
Message 103918 - Posted: 27 Dec 2021, 2:14:51 UTC - in response to Message 103917.  
Last modified: 27 Dec 2021, 2:19:52 UTC

I was setting the disk usage in BOINC Manager as large as possible (click the `Options` menu, then `Computing preferences`, then the `Disk and memory` tab) to try to get rid of some messages about low disk space.
It did not work, even with 200+ GB of disk space shown as "Free, available to BOINC".
It doesn't matter how large the "Use no more than" value is, if the "Leave at least" & "Use no more than % of total" result in a lower amount being available.
As it says at the top of those options, the most restrictive setting is the one that is used.
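Grant's point can be made concrete with a little arithmetic. This is a simplified model, not BOINC's actual client code: it assumes each enabled limit (ticked checkbox) caps the total disk space BOINC may occupy, that the smallest cap wins, and that what is still available to BOINC is that cap minus what BOINC already uses. The function name, parameter names and example figures are hypothetical.

```python
from typing import Optional


def boinc_disk_available_gb(
    total_gb: float,
    free_gb: float,
    boinc_used_gb: float,
    max_used_gb: Optional[float] = None,   # "Use no more than X GB"
    min_free_gb: Optional[float] = None,   # "Leave at least X GB free"
    max_used_pct: Optional[float] = None,  # "Use no more than X% of total"
) -> float:
    """Simplified model (assumption): the most restrictive enabled limit wins.

    A limit passed as None stands for an unticked checkbox (limit disabled).
    Returns the GB still available to BOINC, never less than zero.
    """
    # With no limits at all, BOINC could grow into all currently free space.
    caps = [boinc_used_gb + free_gb]
    if max_used_gb is not None:
        caps.append(max_used_gb)
    if max_used_pct is not None:
        caps.append(total_gb * max_used_pct / 100.0)
    if min_free_gb is not None:
        # BOINC may only grow until min_free_gb of free space would remain.
        caps.append(boinc_used_gb + free_gb - min_free_gb)
    return max(0.0, min(caps) - boinc_used_gb)


if __name__ == "__main__":
    # Hypothetical 500 GB disk, BOINC already using 124 GB, 200 GB free.
    # A huge "use no more than" value doesn't help if the % cap is lower.
    print(boinc_disk_available_gb(500, 200, 124, max_used_gb=1000, min_free_gb=10, max_used_pct=20))  # -> 0.0
    print(boinc_disk_available_gb(500, 200, 124, max_used_gb=1000, min_free_gb=10, max_used_pct=50))  # -> 126.0
```

In the first call the 20% cap (100 GB) is already below what BOINC uses, so nothing is left for new work even though 200 GB of the disk is free; raising only the GB limit changes nothing.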



I have given up caring about those messages so long as rosetta works.
When it comes to Python, you're aborting more than you actually process. So I wouldn't consider that to be working.
Grant
Darwin NT
.clair.
Joined: 2 Jan 07
Posts: 185
Credit: 23,193,732
RAC: 1,125
Message 103920 - Posted: 27 Dec 2021, 12:15:41 UTC - in response to Message 103918.  

I have given up caring about those messages so long as rosetta works.
When it comes to Python, you're aborting more than you actually process. So I wouldn't consider that to be working.

That is true.
I don't want to abort them; I am OK with running anything that will do something useful.
I take it you have looked at my returned work units.
The problem is that the ones I abort are the `one-minute wonders` that have only a few seconds of CPU time after several hours of elapsed time, and it seems pointless to keep running them.
One of mine had run for 23 hours before I gave up on it,
and in another thread one was reported to have run for over three days without finishing.
What do you do with them?
Do they ever finish and produce useful work?
Any idea whether the overrun watchdog is working with Python work units? It does not look like it from what I can see.
Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,775,525
RAC: 1,490
Message 103923 - Posted: 27 Dec 2021, 15:11:02 UTC - in response to Message 103920.  
Last modified: 27 Dec 2021, 15:13:22 UTC

I've been out of the loop for a little while. Did they recently fix the RAM requirements for the vBox tasks? I'm running 7 Rosetta Python tasks + 1 WCG ARP on 16 GB of RAM.

I'm not having issues, for once.
Tomcat雄猫
Joined: 20 Dec 14
Posts: 171
Credit: 4,775,525
RAC: 1,490
Message 103924 - Posted: 27 Dec 2021, 15:18:06 UTC - in response to Message 103920.  
Last modified: 27 Dec 2021, 15:19:37 UTC


The problem is that the ones I abort are the `one-minute wonders` that have only a few seconds of CPU time after several hours of elapsed time, and it seems pointless to keep running them.


I've only seen that type of task once; I aborted it after over 20 hours. CPU time was ridiculously low, a few minutes at most, IIRC. Far more often, I see tasks that claim to be running, but with the timer stuck at 0:00 and no results for days. Those tasks get unstuck after relaunching BOINC.

I haven't had any issues in a while, knock on wood.
robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,234,159
RAC: 1,115
Message 103927 - Posted: 27 Dec 2021, 19:17:51 UTC - in response to Message 103923.  

I've been out of the loop for a little while. Did they recently fix the RAM requirements for the vBox tasks? I'm running 7 Rosetta Python tasks + 1 WCG ARP on 16 GB of RAM.

I'm not having issues, for once.

It appears to be PARTIALLY fixed, at least under Windows 10. The same amount of free memory is required to START a task, but the amount of memory the task reserves after it starts is usually much less than before.
Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1316
Credit: 5,992,753
RAC: 11,525
Message 103928 - Posted: 27 Dec 2021, 19:18:44 UTC - in response to Message 103923.  

I've been out of the loop for a little while. Did they recently fix the RAM requirements for the vBox tasks? I'm running 7 Rosetta Python tasks + 1 WCG ARP on 16 GB of RAM.

I'm not having issues, for once.


They seem to have, except that two of my machines (quad-core, 8 GB) refuse to run more than 2. They don't indicate a shortage of RAM; the tasks just sit there waiting.
Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,350
Message 103933 - Posted: 28 Dec 2021, 17:54:15 UTC - in response to Message 103885.  

Do you have any of the "Vm job unmanageable" ones on your machine? That will prevent any more from downloading.
You need to reboot to fix it. Or find another project.

Luckily, I have never had any `unmanageable` jobs or anything like that,

That is because you are on Windows. BOINC has a pre-made VBox wrapper for that, which is probably what the pythons use:
https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables
That avoids the COM interface, which causes the problem.

However, they don't have a pre-made Linux wrapper that avoids the problem.

It also helps to run VirtualBox 5.x.x rather than 6.x.x, which also avoids the COM interface.
Since you are on Win7, that is probably what you are using.
dcdc
Joined: 3 Nov 05
Posts: 1828
Credit: 107,051,352
RAC: 8,739
Message 103934 - Posted: 28 Dec 2021, 18:16:20 UTC

How do you use vboxwrapper on Windows? I can't find any good instructions on how to implement it.

D
Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,350
Message 103935 - Posted: 28 Dec 2021, 18:33:03 UTC - in response to Message 103934.  
Last modified: 28 Dec 2021, 18:52:17 UTC

How do you use vboxwrapper on Windows? I can't find any good instructions on how to implement it.

Beats me. As far as I know, the project uses it when they compile their stuff.
I think LHC did their own wrapper, and fixed the problem for Linux; I run CMS on it without the problem.

PS - I tried substituting the wrapper from LHC (vboxwrapper_26196_x86_64-pc-linux-gnu) for the wrapper here.
But BOINC does a checksum and rejects it. It uses only the python version here on Rosetta.
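BOINC records a checksum for the files it downloads (an `md5_cksum` value in each file's description), which is likely why a substituted wrapper with the expected name still gets rejected. The sketch below only illustrates the idea of comparing a file's MD5 digest with an expected value; the function names and the stand-in file are hypothetical, and the real client also verifies code signatures on executables, which this does not model.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_md5: str) -> bool:
    """True only if the file on disk hashes to the expected digest."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wrapper = Path(tmp) / "example_wrapper"  # hypothetical stand-in file
        wrapper.write_bytes(b"abc")
        print(matches_expected(wrapper, "900150983cd24fb0d6963f7d28e17f72"))  # True
        wrapper.write_bytes(b"abd")  # different contents, same file name
        print(matches_expected(wrapper, "900150983cd24fb0d6963f7d28e17f72"))  # False
```

The file name plays no part in the check: any change to the contents changes the digest, so a drop-in replacement from another project fails even if it is renamed to match.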

PPS - I followed the instructions given on Cosmology, which also had the problem to some extent (but less than here it seems).
http://www.cosmologyathome.org/forum_thread.php?id=7769#22921
Maybe it works differently there, or on a different version of BOINC, but not here and now.
dcdc
Joined: 3 Nov 05
Posts: 1828
Credit: 107,051,352
RAC: 8,739
Message 103936 - Posted: 28 Dec 2021, 18:49:25 UTC

There's a version for download here:

https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables
Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,350
Message 103937 - Posted: 28 Dec 2021, 18:54:43 UTC - in response to Message 103936.  

There's a version for download here:

https://boinc.berkeley.edu/trac/wiki/VboxApps#Premadevboxwrapperexecutables

Yes, but the Linux versions state:
The following uses the COM interface; not recommended.

The version they give for linux (vboxwrapper_26198_x86_64-pc-linux-gnu) is the same version as used here.
Greg_BE
Joined: 30 May 06
Posts: 5573
Credit: 5,560,753
RAC: 558
Message 103940 - Posted: 29 Dec 2021, 18:55:46 UTC
Last modified: 29 Dec 2021, 19:02:19 UTC

WTH? I got flagged for VM errors?
So is 5.x generating VM errors or what?
This is getting stupid.
I need 5.x for Quchem, or at least that is the theory, but I need 6 for here?

Switching to 6 and abandoning Quchem until I get more RAM, because another person who also runs this project and Quchem can run it under 6, and apparently 5 kicks up errors here. And I get all kinds of errors in Quchem that seem related to memory.
Craziness!

Yeah, I know... I'm pushing the machine too far for now. But 2 of the memory sticks came from one of my setups that I upgraded, and I offered the old motherboard and CPU to another person here in Europe. So I guess 24 GB is not enough memory for what I want to run across projects.

I wish this project would give you an automated heads-up if you kick up too many VM errors, but then that is too advanced for this project.
Jim1348
Joined: 19 Jan 06
Posts: 877
Credit: 51,526,729
RAC: 1,350
Message 103941 - Posted: 29 Dec 2021, 19:15:35 UTC - in response to Message 103940.  

I need 5.x for Quchem, or at least that is the theory, but I need 6 for here?

VBox 5.2.44 is working fine for me with Win10. But I have 48 GB memory, and am running only 7 work units on a Ryzen 3600.
https://boinc.bakerlab.org/rosetta/results.php?hostid=6146985&offset=0&show_names=0&state=4&appid=
Greg_BE
Joined: 30 May 06
Posts: 5573
Credit: 5,560,753
RAC: 558
Message 103942 - Posted: 29 Dec 2021, 23:17:39 UTC - in response to Message 103941.  

I need 5.x for Quchem, or at least that is the theory, but I need 6 for here?

VBox 5.2.44 is working fine for me with Win10. But I have 48 GB memory, and am running only 7 work units on a Ryzen 3600.
https://boinc.bakerlab.org/rosetta/results.php?hostid=6146985&offset=0&show_names=0&state=4&appid=



Yeah, I am going up towards your limit in RAM after New Year's.
I've got 24, but I am going to drop the 2 x 4 and go with 2 x 16 along with the 2 x 8 that I already have.
But remember, I run a lot of different stuff all at the same time.

Since I got erased from Python, it seems I have to catch up again, so it's running 8 Python, then 4 WCG MCM and 2 SiDock, plus Einstein, PrimeGrid and FAH. That sucks up 77% of my total memory.
Einstein, PrimeGrid and FAH are on the GPU.

Tuillo is running 6 both here and on Quchem with no problems, but he isn't maxing out his machine.
den777
Joined: 29 Apr 13
Posts: 1
Credit: 1,535,943
RAC: 0
Message 103945 - Posted: 30 Dec 2021, 10:13:48 UTC

Recently I had to abort tasks that were not using any CPU and had shown no progress for over a day.
The virtual machine console looks like this: [screenshot]

So, you are pushing out tasks with obvious errors without even a minimal check that they can start at all?
gbayler
Joined: 10 Apr 20
Posts: 14
Credit: 3,069,484
RAC: 0
Message 103946 - Posted: 30 Dec 2021, 11:50:48 UTC

I have 3 WUs/tasks running longer than any other tasks I have seen before; they don't seem to terminate. Their progress asymptotically approaches 100%, but, as it seems, never reaches it.

These are the WUs in question:

https://boinc.bakerlab.org/rosetta/result.php?resultid=1462247667 progress: 99.986% elapsed: 2d 23:19:00 CPU time: 00:19:44
https://boinc.bakerlab.org/rosetta/result.php?resultid=1462512698 progress: 99.929% elapsed: 2d 10:03:00 CPU time: 00:15:56
https://boinc.bakerlab.org/rosetta/result.php?resultid=1462518266 progress: 99.822% elapsed: 2d 02:42:00 CPU time: 00:13:54

Do I have to manually abort such WUs?
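The pattern described here and by .clair. (days of elapsed time, only minutes of CPU time) can be spotted mechanically by comparing CPU time with elapsed time. This is a hypothetical heuristic, not a BOINC or Rosetta@home feature and not the project's watchdog: the 12-hour and 5% thresholds are arbitrary assumptions, and the inputs are simply the figures listed above. A low ratio is a hint that a task may be stuck, not proof.

```python
import re
from dataclasses import dataclass

# Matches "HH:MM:SS" or "Nd HH:MM:SS" as shown in the task list above.
_DURATION = re.compile(r"^\s*(?:(\d+)d\s+)?(\d+):(\d{2}):(\d{2})\s*$")


def parse_duration(text: str) -> int:
    """Parse 'HH:MM:SS' or 'Nd HH:MM:SS' into seconds."""
    m = _DURATION.match(text)
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return ((int(days or 0) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)


@dataclass
class TaskSnapshot:
    result_id: int
    elapsed: str
    cpu_time: str

    def cpu_ratio(self) -> float:
        """Fraction of wall-clock time actually spent on the CPU."""
        elapsed_s = parse_duration(self.elapsed)
        return parse_duration(self.cpu_time) / elapsed_s if elapsed_s else 0.0


def looks_stalled(task: TaskSnapshot, min_elapsed_h: float = 12.0, max_ratio: float = 0.05) -> bool:
    """Hypothetical rule: running a long time while using almost no CPU."""
    return parse_duration(task.elapsed) >= min_elapsed_h * 3600 and task.cpu_ratio() < max_ratio


if __name__ == "__main__":
    tasks = [
        TaskSnapshot(1462247667, "2d 23:19:00", "00:19:44"),
        TaskSnapshot(1462512698, "2d 10:03:00", "00:15:56"),
        TaskSnapshot(1462518266, "2d 02:42:00", "00:13:54"),
    ]
    for t in tasks:
        print(t.result_id, f"{t.cpu_ratio():.2%}", looks_stalled(t))
```

For the three tasks above, CPU time comes to under 0.5% of elapsed time, so all three would be flagged under these assumed thresholds.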

Best regards,

Günther



©2022 University of Washington
https://www.bakerlab.org