Problems and Technical Issues with Rosetta@home

kotenok2000
Joined: 22 Feb 11
Posts: 109
Credit: 94,002
RAC: 13
Message 106226 - Posted: 16 May 2022, 18:46:20 UTC - in response to Message 106225.  

Sometimes the VM's OS hangs on "Spectre V2 mitigation: LFENCE not serializing. Switching to generic retpoline" while booting.
I was able to reboot the VM from the VirtualBox Manager and it worked.

robertmiles
Joined: 16 Jun 08
Posts: 1176
Credit: 13,195,130
RAC: 3,074
Message 106227 - Posted: 16 May 2022, 19:23:33 UTC - in response to Message 106225.  

I don't get this....currently 29 seconds processing on 4hrs and 12 minutes run time.
CPU is at .19%. Progress is .006% every 2 seconds.
Again an aam task. Some work, some don't.
ABORTED
That's 3,
The other two completed ok on the wingmen.
No idea what the difference is. One was linux and the other was windows.

2022-05-16 15:41:54 (2912): VM state change detected. (old = 'running', new = 'paused')
2022-05-16 15:46:33 (2912): VM state change detected. (old = 'paused', new = 'running')

[snip]

The important lines were:

2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'
2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'

Basically, the emulated operating system encountered an error and didn't report what it was. It then started waiting for another command, but the task didn't have any, so it did almost nothing for a few hours.

Check your event log for a line similar to this one, near its beginning:

5/14/2022 11:59:47 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2

If avx and avx2 do not appear in this line for your computer, expect very few, if any, of the Rosetta Python tasks to work correctly.

If they do appear, most of them will work, but an occasional few won't. It's a known problem and the project shows very little sign of planning to fix it.

In other words, the Rosetta VM tasks aren't set up to work correctly on many of the older CPUs.
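
The event-log check described above can be sketched in Python. This is only a minimal sketch under stated assumptions, not anything the project or BOINC ships: the function names `parse_features` and `missing_features` are hypothetical, and it assumes the log line contains the `Processor features:` marker followed by space-separated flag names, as in the sample line above.

```python
# Hypothetical helpers: check a BOINC event-log line for required CPU flags.
# Assumption: flags are space-separated after the "Processor features:" marker.

MARKER = "Processor features:"


def parse_features(log_line: str) -> set[str]:
    """Return the set of flag names listed after the marker (empty if absent)."""
    _, found, tail = log_line.partition(MARKER)
    if not found:
        return set()
    return set(tail.split())


def missing_features(log_line: str, required=("avx", "avx2")) -> list[str]:
    """Return the required flags that do NOT appear in the log line, sorted."""
    present = parse_features(log_line)
    return sorted(flag for flag in required if flag not in present)


if __name__ == "__main__":
    line = ("5/16/2022 8:41:38 AM | | Processor features: fpu sse sse2 "
            "fma avx avx2 bmi1 smep bmi2")
    missing = missing_features(line)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("avx and avx2 are both listed")
```

Matching whole tokens rather than substrings matters here, since a plain substring search for `avx` would also match `avx2`. An empty result only says the flags are listed; as noted above, an occasional task may still fail.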

kotenok2000
Joined: 22 Feb 11
Posts: 109
Credit: 94,002
RAC: 13
Message 106228 - Posted: 16 May 2022, 19:24:52 UTC - in response to Message 106227.  

It happened once on ryzen 3 3100

Greg_BE
Joined: 30 May 06
Posts: 5502
Credit: 5,476,728
RAC: 1,985
Message 106229 - Posted: 16 May 2022, 22:32:43 UTC - in response to Message 106227.  
Last modified: 16 May 2022, 22:34:49 UTC

I don't get this....currently 29 seconds processing on 4hrs and 12 minutes run time.
CPU is at .19%. Progress is .006% every 2 seconds.
Again an aam task. Some work, some don't.
ABORTED
That's 3,
The other two completed ok on the wingmen.
No idea what the difference is. One was linux and the other was windows.

2022-05-16 15:41:54 (2912): VM state change detected. (old = 'running', new = 'paused')
2022-05-16 15:46:33 (2912): VM state change detected. (old = 'paused', new = 'running')

[snip]

The important lines were:

2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'
2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'

Basically, the emulated operating system encountered an error and didn't report what it was. It then started waiting for another command, but the task didn't have any, so it did almost nothing for a few hours.

Check your event log for a line similar to this one, near its beginning:

5/14/2022 11:59:47 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx smx tm2 pbe fsgsbase bmi1 hle smep bmi2

If avx and avx2 do not appear in this line for your computer, expect very few, if any, of the Rosetta Python tasks to work correctly.

If they do appear, most of them will work, but an occasional few won't. It's a known problem and the project shows very little sign of planning to fix it.

In other words, the Rosetta VM tasks aren't set up to work correctly on many of the older CPUs.


OLDER? Really? Ryzen 3700x here, barely 2 years old in my system.
That line does not show up in stderr.
At the start of the BOINC Manager log you get this:
5/16/2022 8:41:38 AM | | Processor: 16 AuthenticAMD AMD Ryzen 7 3700X 8-Core Processor [Family 23 Model 113 Stepping 0]
5/16/2022 8:41:38 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw ibs skinit wdt tce topx page1gb rdtscp fsgsbase bmi1 smep bmi2

Project doesn't give a S--- about failures. As long as they get the data somehow from someone, and if it's just one task somewhere that dies...oh well.

Was just checking it was not my machine. Your explanation of the process is very interesting. Thanks

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106231 - Posted: 17 May 2022, 8:22:14 UTC - in response to Message 106223.  

I have two Rosetta Python tasks running and three waiting to run on my Intel i5, 12 GB RAM, Windows 11.
Tullio
That's a better CPU than my i5, but has less RAM. You need a tonne of RAM to run Pythons. With 16GB mine will only run 4 or 5. Which I'll leave it at because the other core or 2 can help the Intel and AMD GPUs.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106232 - Posted: 17 May 2022, 8:29:15 UTC - in response to Message 106224.  

I've probably spent 400 in upgrades and replacements over the last 2 years.
I keep spending almost £100 on yet another GPU. I now have 9 + possibly one more if I win the auction and can repair it.

I accidentally got water drops on my MOBO while trying to disassemble a custom refillable AIO cooler after the pump broke down. So that cooked the MOBO a bit and then I decided to upgrade the CPU to go with a new future-proof MOBO.
A long long time ago I made my own water cooling system, well it was a put together select parts kinda thing, but I actually water cooled the PSU. A couple of mishaps - a lot of water under the northbridge. MB was fine after drying out! PSU was not fine when it got filled with water, there were lots of sparks and many components didn't recover.

I lost money at an incompetent repair center. Bought a new cooler (PC Mag EU cooler of the year) after researching it deeply. New center rebuilt the PC with this cooler.
Waiting for the incompetence mention....

Still using my 1050 that I bought eons ago (in computer terms) and a secondhand 1080 I got for a steal from a graphics company offloading them (from their server room). I also changed out my power supply for a digital power supply.
Not even sure what that means. How can electricity be digital?

My case was too small for the new setup so I had to get a new case.
Just do what I do and use a bookshelf.

Now this custom setup can handle anything I throw at it. Pythons and videos at the same time along with PrimeGrid on GPU and so on. It was worth the money, that's why I am pissed off when a task locks up my system. I burn a ton of power with this thing, I don't need buggy tasks.
I've never seen a system run VB tasks and have a usable interface, how did you do that? Nothing gets maxed out, but it just sticks for half a second every so often. I thought it was AMD being rubbish at VTX or whatever it is, but my i5 is the same.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106233 - Posted: 17 May 2022, 8:30:37 UTC - in response to Message 106225.  

I don't get this....currently 29 seconds processing on 4hrs and 12 minutes run time.
CPU is at .19%. Progress is .006% every 2 seconds.
Again an aam task. Some work, some don't.
Approximately what proportion of tasks does this happen with?

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106234 - Posted: 17 May 2022, 8:31:22 UTC - in response to Message 106226.  

Sometimes the VM's OS hangs on "Spectre V2 mitigation: LFENCE not serializing. Switching to generic retpoline" while booting.
I was able to reboot the VM from the VirtualBox Manager and it worked.
Sounds like made up words from a scifi film.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106235 - Posted: 17 May 2022, 8:32:36 UTC - in response to Message 106227.  

The important lines were:

2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'
2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'

Basically, the emulated operating system encountered an error and didn't report what it was. It then started waiting for another command, but the task didn't have any, so it did almost nothing for a few hours.
Have the programmers not heard of something called "timeout"?
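
For what it's worth, the two status lines quoted above already carry enough to spot a stall from outside the VM: 13.97 s of CPU time over 6000.9 s elapsed is about 0.23% utilisation. The following is only a rough sketch of that check, not how the BOINC wrapper actually behaves. The function names and the 5% threshold are assumptions made up for illustration, and it assumes the log uses the `Status Report:` wording shown in the quoted lines.

```python
# Hypothetical stall check based on the "Status Report" lines in stderr.
# Assumption: a task using under 5% CPU over its elapsed time is stalled.
import re

STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def cpu_efficiency(log_text: str):
    """Return CPU time / elapsed time from the latest status lines, or None."""
    values = {}
    for name, number in STATUS_RE.findall(log_text):
        values[name] = float(number)  # later reports overwrite earlier ones
    elapsed = values.get("Elapsed Time")
    cpu = values.get("CPU Time")
    if not elapsed or cpu is None:
        return None  # missing data or zero elapsed time
    return cpu / elapsed


def looks_stalled(log_text: str, threshold: float = 0.05) -> bool:
    """True when the reported CPU usage falls below the assumed threshold."""
    ratio = cpu_efficiency(log_text)
    return ratio is not None and ratio < threshold


if __name__ == "__main__":
    sample = (
        "2022-05-16 17:21:17 (2912): Status Report: Elapsed Time: '6000.901087'\n"
        "2022-05-16 17:21:17 (2912): Status Report: CPU Time: '13.968750'\n"
    )
    print(f"efficiency: {cpu_efficiency(sample):.4%}, stalled: {looks_stalled(sample)}")
```

A real timeout would also want a grace period at startup, since a freshly booted VM legitimately shows low CPU time for a while.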

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106236 - Posted: 17 May 2022, 8:34:32 UTC - in response to Message 106229.  
Last modified: 17 May 2022, 8:35:32 UTC

OLDER? Really? Ryzen 3700x here, barely 2 years old in my system.
ROTFPMSL at your computer being insulted.

Project doesn't give a S--- about failures. As long as they get the data somehow from someone, and if it's just one task somewhere that dies...oh well.
There's a lot less pythons in the queue than there was. Either we've crunched them way faster than I thought we would, or they've been deleting some, or many have failed. Perhaps the next batch will have improvements.

Greg_BE
Joined: 30 May 06
Posts: 5502
Credit: 5,476,728
RAC: 1,985
Message 106238 - Posted: 17 May 2022, 19:07:48 UTC - in response to Message 106236.  

OLDER? Really? Ryzen 3700x here, barely 2 years old in my system.
ROTFPMSL at your computer being insulted. <-- yeah, don't you know microchips have feelings?

Project doesn't give a S--- about failures. As long as they get the data somehow from someone, and if it's just one task somewhere that dies...oh well.
There's a lot less pythons in the queue than there was. Either we've crunched them way faster than I thought we would, or they've been deleting some, or many have failed. Perhaps the next batch will have improvements.
<-- What is the source of these Python tasks, and has anybody ever seen the output from them?

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106239 - Posted: 18 May 2022, 13:49:58 UTC
Last modified: 18 May 2022, 13:50:56 UTC

The machine I'm trying to run python on keeps getting banned, despite completing most of them successfully. Is there ever an end to problems here?

And yes I know they have feelings, that's why I buy "broken" GPUs on Ebay and try to get them to do something. Actually that's probably cruel as the poor things thought they'd retired.

Greg_BE
Joined: 30 May 06
Posts: 5502
Credit: 5,476,728
RAC: 1,985
Message 106240 - Posted: 18 May 2022, 18:10:27 UTC - in response to Message 106239.  

The machine I'm trying to run python on keeps getting banned, despite completing most of them successfully. Is there ever an end to problems here?

And yes I know they have feelings, that's why I buy "broken" GPUs on Ebay and try to get them to do something. Actually that's probably cruel as the poor things thought they'd retired.



Probably your GPUs are complaining loudly and the RAH server is taking sympathy.
Maybe you insulted it too many times?
Or you're just lucky 13.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106241 - Posted: 19 May 2022, 5:43:05 UTC

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

robertmiles
Joined: 16 Jun 08
Posts: 1176
Credit: 13,195,130
RAC: 3,074
Message 106244 - Posted: 19 May 2022, 14:14:53 UTC - in response to Message 106241.  

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106245 - Posted: 19 May 2022, 14:22:03 UTC - in response to Message 106244.  

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.
Forever is not a fixed time. It could be 5 times longer than normal for example.

Greg_BE
Joined: 30 May 06
Posts: 5502
Credit: 5,476,728
RAC: 1,985
Message 106247 - Posted: 19 May 2022, 18:44:55 UTC - in response to Message 106245.  
Last modified: 19 May 2022, 18:47:07 UTC

I've lost a Radeon 6990 and a cockatiel in the last couple of days, I'm not happy. Things should be made to last forever.

Note that it would then take forever to determine if they actually last forever or not.
Forever is not a fixed time. It could be 5 times longer than normal for example.



You know those McDonald's trackers for your table? (well here in EU we have them)
I looked at the underside last night: made in Thailand, assembled in China.
And since most stuff these days is or was made in China, it's cheap and throwaway.
The GPU mfg's would not be in business if their cards lasted forever.
Cars used to last forever, but the "forever" went away in the 80s I think when we switched over to make things as cheap as possible and charge regular price to make more profit and get the consumer to buy more.

Peter Hucker of the Scottish Boinc Team
Joined: 12 Aug 06
Posts: 1187
Credit: 5,647,435
RAC: 1,441
Message 106248 - Posted: 20 May 2022, 5:56:26 UTC - in response to Message 106247.  

You know those McDonald's trackers for your table? (well here in EU we have them)
I looked at the underside last night: made in Thailand, assembled in China.
And since most stuff these days is or was made in China, it's cheap and throwaway.
The GPU mfg's would not be in business if their cards lasted forever.
Cars used to last forever, but the "forever" went away in the 80s I think when we switched over to make things as cheap as possible and charge regular price to make more profit and get the consumer to buy more.
Damn, closed browser after checking preview thinking I'd posted it, so the following is shorter as I'm lazy.

What's a McD tracker? I'm in the UK but rarely go there. Cars last me 20 years now, used to be 10. GPUs get replaced for the latest game. If the old one keeps value, those gamers have more money to buy the new one. If something breaks you don't buy from the same make again and write a nasty review, so making shit quality stuff harms your company.

raymond
Joined: 27 Apr 20
Posts: 1
Credit: 224,307
RAC: 27
Message 106249 - Posted: 20 May 2022, 22:57:35 UTC

Why am I getting a notice "Waiting to contact project servers"?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1332
Credit: 13,624,788
RAC: 5
Message 106250 - Posted: 21 May 2022, 0:54:11 UTC - in response to Message 106249.  

Why am I getting a notice "Waiting to contact project servers"?
No idea.
In the Advanced view of BOINC Manager, select the Projects tab, select Rosetta & click on Update.
Then check in Tools, Event log & see what messages are there.
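
The same two steps can probably be done from a script with `boinccmd`, the command-line tool that ships with the BOINC client. This is a hedged sketch rather than a tested recipe: the helper names are hypothetical, it assumes `boinccmd` is on your PATH, and the project URL has to be copied from your own Projects tab. Depending on the setup, `boinccmd` may also need to be run from the BOINC data directory or given a password before it can talk to the client.

```python
# Hypothetical wrapper around boinccmd: force a project update, then read the log.
# Assumptions: boinccmd is on PATH and is allowed to talk to the local client.
import subprocess


def update_args(project_url: str) -> list[str]:
    """Command line that asks the client to contact the project now."""
    return ["boinccmd", "--project", project_url, "update"]


def messages_args() -> list[str]:
    """Command line that prints the client's event-log messages."""
    return ["boinccmd", "--get_messages"]


def update_and_read_log(project_url: str) -> str:
    """Trigger the update, then return the event log as text."""
    subprocess.run(update_args(project_url), check=True)
    result = subprocess.run(
        messages_args(), check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder URL: replace with the one shown for Rosetta in your client.
    print(update_and_read_log("https://example.org/project/"))
```

Whatever the log says after the update attempt is usually the quickest clue to why the client is still waiting.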
Grant
Darwin NT

©2022 University of Washington
https://www.bakerlab.org