Problems and Technical Issues with Rosetta@home

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104067 - Posted: 7 Jan 2022, 20:04:36 UTC - in response to Message 104066.  
Last modified: 7 Jan 2022, 20:22:22 UTC

Well so far my i5 has worked perfectly, my Ryzen got banned, and my old Xeons I just noticed have spent 24 hours running python tasks with a total of 13 minutes CPU time. I wondered why they felt cold to the touch. There's something terribly wrong with these WUs.

These are the two Xeons, I'm in the process of aborting the tasks, if anyone can look and interpret the outputs. https://boinc.bakerlab.org/rosetta/results.php?hostid=6169682 https://boinc.bakerlab.org/rosetta/results.php?hostid=6169697 Make sure you look at the right ones, the ones aborted just now, not the ones aborted yesterday (that was something else when I was trying to set things up).

Here is a dodgy one, many errors, please interpret: https://boinc.bakerlab.org/rosetta/result.php?resultid=1463541284

It includes many of these lines:

Hypervisor System Log:
24:11:34.575288 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={MachineWrap} aText={The object functionality is limited}, preserve=false aResultDetail=0"

I have asked over in the main Boinc forum too, https://boinc.berkeley.edu/dev/forum_thread.php?id=14532
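
For anyone reading the hypervisor log line above: the hex value is an ordinary Windows HRESULT, and its bit fields can be split out mechanically. A minimal sketch (the helper name `decode_hresult` is made up for illustration, it is not part of Boinc or VirtualBox) showing that 0x80070005 is the Win32 "access denied" error wrapped as an HRESULT:

```python
def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its standard bit fields."""
    return {
        "failure": bool((hr >> 31) & 0x1),  # bit 31: severity (1 = failure)
        "facility": (hr >> 16) & 0x7FF,     # bits 16-26: facility code
        "code": hr & 0xFFFF,                # bits 0-15: error code
    }


fields = decode_hresult(0x80070005)
# Facility 7 is FACILITY_WIN32 and code 5 is ERROR_ACCESS_DENIED.
print(fields)
```

This only says what kind of failure was reported, not why VirtualBox ran into it.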

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104068 - Posted: 7 Jan 2022, 21:37:14 UTC - in response to Message 104067.  

I've asked in the LHC forum, since they use vbox on almost all tasks and might know what the problem is: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5781

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104069 - Posted: 7 Jan 2022, 23:00:47 UTC - in response to Message 104068.  

I've asked in the LHC forum, since they use vbox on almost all tasks and might know what the problem is: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5781

Don't confuse vbox (which handles 32-bit work) with vbox64 (which handles 64-bit work).

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104072 - Posted: 7 Jan 2022, 23:28:31 UTC - in response to Message 104069.  
Last modified: 7 Jan 2022, 23:48:15 UTC

I assume everyone is on vbox64 by now?

LHC will be, and they seem to use the same wrapper as Rosetta.

I'm not sure what it is you're trying to tell me. I only installed one piece of software, VirtualBox, from the Oracle site, the same version that Boinc issues. Are you telling me there are two halves, and Rosetta uses a different one from LHC? My i5, which does python OK, has vboxheadless and virtualbox interface listed as running in the Windows task manager, with no mention of 32 or 64 bit.

After following the advice from the LHC forum, I am no further forwards. My old xeons don't do any CPU time, my Ryzen (I think, can't check as it's now banned) computes but is not validated, and my i5 runs them perfectly. Same version of everything on all of them.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104074 - Posted: 7 Jan 2022, 23:47:11 UTC - in response to Message 104072.  
Last modified: 7 Jan 2022, 23:56:40 UTC

RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone.

Virtualbox (at least the latest versions) has two parts, the vbox part for 32-bit work and the vbox64 part for 64-bit work.

Rosetta, and probably also LHC, use the vbox64 part. I don't participate in LHC, so I haven't seen what they use.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104075 - Posted: 7 Jan 2022, 23:48:51 UTC - in response to Message 104074.  
Last modified: 7 Jan 2022, 23:49:28 UTC

RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone.
But LHC and Rosetta are 64 bit?

And how does RNA world work, do you have to download an old 32 bit version?

Greg_BE
Joined: 30 May 06
Posts: 5550
Credit: 5,554,708
RAC: 58
Message 104077 - Posted: 8 Jan 2022, 0:04:15 UTC - in response to Message 104067.  
Last modified: 8 Jan 2022, 0:06:01 UTC

Well so far my i5 has worked perfectly, my Ryzen got banned, and my old Xeons I just noticed have spent 24 hours running python tasks with a total of 13 minutes CPU time. I wondered why they felt cold to the touch. There's something terribly wrong with these WUs.

These are the two Xeons, I'm in the process of aborting the tasks, if anyone can look and interpret the outputs. https://boinc.bakerlab.org/rosetta/results.php?hostid=6169682 https://boinc.bakerlab.org/rosetta/results.php?hostid=6169697 Make sure you look at the right ones, the ones aborted just now, not the ones aborted yesterday (that was something else when I was trying to set things up).

Here is a dodgy one, many errors, please interpret: https://boinc.bakerlab.org/rosetta/result.php?resultid=1463541284

It includes many of these lines:

Hypervisor System Log:
24:11:34.575288 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={MachineWrap} aText={The object functionality is limited}, preserve=false aResultDetail=0"

I have asked over in the main Boinc forum too, https://boinc.berkeley.edu/dev/forum_thread.php?id=14532



It was chugging along just fine and then blew up with access denied? That's weird.
Did Windows all of a sudden block it, or did it run into a fault with the data?
That it ran 24 hours is really odd. These finish in 4 hours or less.
A quick look at the "object functionality is limited" statement says something went wrong in Vbox.
If that happens repeatedly, then you need to remove Vbox and reinstall it.

Again, it's very late in the EU, so I will have to dig into it more later.
Maybe our two experts can help you more.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104078 - Posted: 8 Jan 2022, 0:12:10 UTC - in response to Message 104077.  

The Xeons never chugged along just fine. The CPU usage in Windows task manager was virtually zero, as was the number in brackets in Boinc showing the actual CPU usage. If left alone, the task gradually moved to 100% over 24 hours, but used virtually no CPU (I believe it was about 13 minutes CPU time over 24 hours). On the working i5, I see full CPU usage almost immediately.
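
To put a number on that, here is a quick sketch of the CPU time to wall-clock ratio, taking the "about 13 minutes over 24 hours" figure at face value (the function name is only for illustration):

```python
def cpu_efficiency(cpu_minutes: float, elapsed_hours: float) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    return cpu_minutes / (elapsed_hours * 60)


xeon = cpu_efficiency(cpu_minutes=13, elapsed_hours=24)
print(f"{xeon:.1%}")
```

That works out to under 1%, so the VM was sitting idle for nearly the whole run rather than computing slowly.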

If it's something in windows, do you have an idea what? Can I check some folder permissions?

Vbox has been reinstalled already, didn't help. I tried 5.2 and 6.1. Both with the extension pack.

I am also in the EU, in the UK, the founder of the world as we know it :-)

When I say EU I mean geographically, since we told you lot to go forth and multiply :-P

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104079 - Posted: 8 Jan 2022, 0:13:45 UTC - in response to Message 104074.  

RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone.

Virtualbox (at least the latest versions) has two parts, the vbox part for 32-bit work and the vbox64 part for 64-bit work.

Rosetta, and probably also LHC, use the vbox64 part. I don't participate in LHC, so I haven't seen what they use.
LHC's CMS simulations use the same wrapper as Rosetta, so I assume both are 64 bit. LHC are always bang up to date with everything.

Still not sure what you thought I was confusing.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104080 - Posted: 8 Jan 2022, 0:19:41 UTC - in response to Message 104075.  
Last modified: 8 Jan 2022, 0:20:22 UTC

RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone.
But LHC and Rosetta are 64 bit?

Probably.

And how does RNA world work, do you have to download an old 32 bit version?

Their application is written to run in 32-bit mode.

They haven't updated their application since VirtualBox would only do 32-bit mode. With so few workunits left, they don't plan to.

The recent versions of VirtualBox can do both 32-bit and 64-bit, so downloading the 32-bit application from RNA World is enough.

It's not easy to make 32-bit applications run in 64-bit mode. You have to recompile them in 64-bit mode, but that's usually not enough.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104081 - Posted: 8 Jan 2022, 0:27:10 UTC - in response to Message 104078.  
Last modified: 8 Jan 2022, 0:27:48 UTC

The Xeons never chugged along just fine. The CPU usage in Windows task manager was virtually zero, as was the number in brackets in Boinc showing the actual CPU usage. If left alone, the task gradually moved to 100% over 24 hours, but used virtually no CPU (I believe it was about 13 minutes CPU time over 24 hours). On the working i5, I see full CPU usage almost immediately.

If it's something in windows, do you have an idea what? Can I check some folder permissions?

[snip]

It appears to be something in the workunits, not in the folders. I haven't spotted just what.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104082 - Posted: 8 Jan 2022, 0:46:57 UTC - in response to Message 104081.  

It appears to be something in the workunits, not in the folders. I haven't spotted just what.
What I don't understand is why it makes it either work or not on different machines.

Ryzen 9 3900XT - no.
i5-8600K - yes.
Xeon X5650 - no.

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104083 - Posted: 8 Jan 2022, 0:51:08 UTC - in response to Message 104082.  

It appears to be something in the workunits, not in the folders. I haven't spotted just what.
What I don't understand is why it makes it either work or not on different machines.

Ryzen 9 3900XT - no.
i5-8600K - yes.
Xeon X5650 - no.

It might mean that those three CPUs have slightly different instruction sets, and the workunits were not written to use only instructions that are available on all three of them.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104084 - Posted: 8 Jan 2022, 1:03:36 UTC - in response to Message 104083.  

It appears to be something in the workunits, not in the folders. I haven't spotted just what.
What I don't understand is why it makes it either work or not on different machines.

Ryzen 9 3900XT - no.
i5-8600K - yes.
Xeon X5650 - no.

It might mean that those three CPUs have slightly different instruction sets, and the workunits were not written to use only instructions that are available on all three of them.
Possible. The Ryzen is much newer though. Are there instructions AMD have omitted that older Intels had?

robertmiles
Joined: 16 Jun 08
Posts: 1194
Credit: 13,218,848
RAC: 69
Message 104085 - Posted: 8 Jan 2022, 1:24:22 UTC - in response to Message 104084.  

It appears to be something in the workunits, not in the folders. I haven't spotted just what.
What I don't understand is why it makes it either work or not on different machines.

Ryzen 9 3900XT - no.
i5-8600K - yes.
Xeon X5650 - no.

It might mean that those three CPUs have slightly different instruction sets, and the workunits were not written to use only instructions that are available on all three of them.
Possible. The Ryzen is much newer though. Are there instructions AMD have omitted that older Intels had?

Probably, if Intel put patents on these instructions.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104086 - Posted: 8 Jan 2022, 2:07:08 UTC - in response to Message 104085.  
Last modified: 8 Jan 2022, 2:20:45 UTC

Probably, if Intel put patents on these instructions.
That can't happen very often, or AMD and Intel CPUs would be totally incompatible.

My AMD Ryzen 9 3900XT has, according to Boinc:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw ibs skinit wdt tce topx page1gb r

My Intel i5 8600K has, according to Boinc:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle

Intel only: dts acpi ss tm vmx smx tm2 pbe fsgsbase bmi1 hle
AMD only: svm sse4a osvw ibs skinit wdt tce topx page1gb r
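
The two "only" lists above can be reproduced mechanically. A minimal sketch, assuming the two feature strings are exactly what Boinc reported. Both look cut off (the AMD one ends in a lone "r"), so this only compares the flags that were actually shown. The variable names are just for illustration.

```python
# Feature strings copied from the Boinc host details quoted above.
amd = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe "
    "popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw ibs skinit "
    "wdt tce topx page1gb r"
).split()
intel = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 "
    "sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 "
    "pbe fsgsbase bmi1 hle"
).split()

amd_set, intel_set = set(amd), set(intel)

# List comprehensions keep the order in which Boinc printed the flags.
intel_only = [flag for flag in intel if flag not in amd_set]
amd_only = [flag for flag in amd if flag not in intel_set]

print("Intel only:", " ".join(intel_only))
print("AMD only:", " ".join(amd_only))
```

Note that vmx and svm are simply each vendor's name for hardware virtualization support, so that particular difference is expected. Neither list says which instructions the python tasks actually need, so this narrows the question rather than answering it.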

It's a miracle anything works at all. [shakes head]Capitalists....[/shakes head]

This is an interesting read:
https://itigic.com/x86-on-intel-and-amd-why-cant-anyone-else-make-cpus/

Stevie G
Joined: 15 Dec 18
Posts: 73
Credit: 253,361
RAC: 3
Message 104088 - Posted: 8 Jan 2022, 6:04:47 UTC - in response to Message 104063.  

Lucky you, I have chronic fatigue :-(

Epstein-Barr syndrome?

SGaber

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104090 - Posted: 8 Jan 2022, 17:36:42 UTC - in response to Message 104088.  
Last modified: 8 Jan 2022, 17:36:56 UTC

Lucky you, I have chronic fatigue :-(


Epstein-Barr syndrome?

SGaber
I've had it since before I was her toyboy.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104091 - Posted: 8 Jan 2022, 17:47:50 UTC - in response to Message 104090.  
Last modified: 8 Jan 2022, 17:52:22 UTC

From LHC:

7 GB is rather huge for this kind of usecase and it appears that the virtual disk has only 1 GB space left.
LHC@home usually allows up to 20 GB capacity.
Part of the error output on one of my machines failing a Rosetta VB Python task:

2022-01-07 22:37:22 (9144):
Command: VBoxManage -q showhdinfo "C:ProgramDataBOINCslots/vm_image.vdi"
Exit Code: 0
Output:
UUID: ef35dff9-d482-48f8-9519-fef6c1b23a3b
Parent UUID: base
State: created
Type: normal (base)
Location: C:ProgramDataBOINCslotsvm_image.vdi
Storage format: VDI
Format variant: dynamic default
Capacity: 8192 MBytes
Size on disk: 7115 MBytes
Encryption: disabled
Property: AllocationBlockSize=1048576
Also, I'm wondering if "dynamic default" means it should auto-grow?

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1278
Credit: 5,835,886
RAC: 2,367
Message 104092 - Posted: 8 Jan 2022, 18:58:16 UTC - in response to Message 104091.  

Also, I'm wondering if "dynamic default" means it should auto-grow?
It doesn't. LHC tell me capacity is the upper limit. I think perhaps Rosetta need to check this.
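
The "only 1 GB left" estimate can be checked from the showhdinfo output quoted above. A rough sketch, assuming the remaining room is simply Capacity minus Size on disk. The helper name `mbytes` is made up, and only the two relevant lines of the output are pasted in.

```python
import re

# The two relevant lines from the VBoxManage showhdinfo output above.
showhdinfo_output = """\
Capacity: 8192 MBytes
Size on disk: 7115 MBytes
"""


def mbytes(field: str, text: str) -> int:
    """Return the integer MBytes value reported for the given field."""
    pattern = rf"^{re.escape(field)}:\s+(\d+) MBytes"
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError(f"{field!r} not found in output")
    return int(match.group(1))


capacity = mbytes("Capacity", showhdinfo_output)
on_disk = mbytes("Size on disk", showhdinfo_output)
headroom = capacity - on_disk

print(f"{headroom} MBytes before the image reaches its capacity")
```

A dynamic image grows on the host as the guest writes to it, but never past Capacity, which fits what LHC said. Size on disk is the size of the image file on the host, so treat the result as an estimate of the room left rather than the exact free space inside the VM.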
©2022 University of Washington
https://www.bakerlab.org