Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,739,521
RAC: 1,936
Message 105535 - Posted: 19 Mar 2022, 13:17:10 UTC - in response to Message 105533.  

An additional comment: the Pythons are pretty much faultless now. I have not had any error out on me.
That is from a much larger database not within Baker lab.

The 4.2 stuff, it seems the RB is ok, that's external work. The non RB can be hit and miss. For instance this group: preetham_gen_ etc. had one specific bug.

It's annoying to run something that errors out. In the past they would hear about it via the mod and pull it or fix it. Now we have to burn through all of the tasks, and they take what works and toss the rest, I guess.
Stupid, but that's how they work now. Thing is, 4.2 tasks are not that frequent anymore, and when they do appear, they are consumed quickly. I think I ran about 12 or 16 of these in total. They all errored out quickly, so not a lot of wasted time. Annoying? Yes. But that's just part of the package now.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105536 - Posted: 19 Mar 2022, 13:24:30 UTC - in response to Message 105535.  
Last modified: 19 Mar 2022, 13:24:59 UTC

An additional comment: the Pythons are pretty much faultless now. I have not had any error out on me.
I have 7 computers and, despite trying every version of VirtualBox and stopping AVG messing with virtual machines (which fixed LHC), I can only run Pythons on 2 of the 7. The others just don't process: wall time passes, but CPU time stops at about 20 seconds.

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105537 - Posted: 19 Mar 2022, 13:35:03 UTC - in response to Message 105536.  
Last modified: 19 Mar 2022, 13:39:17 UTC

Can you open the VirtualBox GUI and press Show to attach a GUI screen to the running VM, then look at what it writes to the screen?
Then you can detach the GUI from the VM and switch to the next one.
You can post screenshots to imgur.
You can make screenshots with Win+PrintScreen.
They are saved to C:\Users\[username]\Pictures\Screenshots
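
For anyone who would rather not click through each VM in the GUI, the same check can be sketched from the command line. This is only a sketch under assumptions: it assumes `VBoxManage` is on the PATH and is run by the same account that owns the BOINC VMs, and the function names and output file names are made up for illustration. It parses the output of `VBoxManage list runningvms` and builds a `controlvm ... screenshotpng` command for each running VM.

```python
import re
import subprocess

# One line of `VBoxManage list runningvms` looks like:
#   "some vm name" {01234567-89ab-cdef-0123-456789abcdef}
_VM_LINE = re.compile(r'^"(?P<name>.*)"\s+\{(?P<uuid>[0-9a-fA-F-]+)\}\s*$')


def parse_running_vms(listing):
    """Return (name, uuid) pairs from the text of `VBoxManage list runningvms`."""
    vms = []
    for line in listing.splitlines():
        match = _VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def screenshot_command(uuid, png_path, vboxmanage="VBoxManage"):
    """Build the command that saves a VM's current screen to a PNG file."""
    return [vboxmanage, "controlvm", uuid, "screenshotpng", png_path]


def capture_all(vboxmanage="VBoxManage"):
    """Save one screenshot per running VM (hypothetical helper, not called here)."""
    listing = subprocess.run(
        [vboxmanage, "list", "runningvms"],
        capture_output=True, text=True, check=True,
    ).stdout
    for index, (_name, uuid) in enumerate(parse_running_vms(listing)):
        # File name pattern is an arbitrary choice for this sketch.
        subprocess.run(screenshot_command(uuid, f"vm_{index}.png", vboxmanage), check=True)
```

Calling `capture_all()` on a machine with stuck tasks should leave one PNG per running VM in the current folder, which can then be uploaded the same way as a normal screenshot.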

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105538 - Posted: 19 Mar 2022, 14:40:25 UTC - in response to Message 105537.  
Last modified: 19 Mar 2022, 14:46:14 UTC

Can you open the VirtualBox GUI and press Show to attach a GUI screen to the running VM, then look at what it writes to the screen?
Then you can detach the GUI from the VM and switch to the next one.
You can post screenshots to imgur.
You can make screenshots with Win+PrintScreen.
They are saved to C:\Users\[username]\Pictures\Screenshots

https://imgur.com/a/ENKUzDk
Does that Intel error mean my CPU doesn't support the command being used? I can do LHC VB stuff OK on it. Is Rosetta using CPU extensions that this one doesn't have? It's a Xeon X5650. The two which DO run Python OK are the newest two.

Mind you, I looked up that error and similar ones I found suggest incorrect libraries, but how is that possible on some of my machines and not others? Aren't the libraries inside the VM?

zxcvbob
Joined: 4 Jan 06
Posts: 8
Credit: 830,878
RAC: 0
Message 105539 - Posted: 19 Mar 2022, 14:48:52 UTC

I have one work-unit that has been running for days. I thought it was stuck yesterday morning at 99.1%, but it was still climbing *very* slowly. Today it is at 99.9+%; maybe it will be finished by tomorrow. It was due yesterday so I don't know if it will even be accepted. It's a virtual-box one. I'm trying now to figure out how to post a screen shot. Apparently I can't just add an attachment. And I can't copy text from the Boinc Manager. It's one of the aagb-mNMPHE ones.

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105541 - Posted: 19 Mar 2022, 14:53:17 UTC - in response to Message 105539.  
Last modified: 19 Mar 2022, 14:53:34 UTC

Drag and drop it onto imgur.com and use the img or url tag.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105542 - Posted: 19 Mar 2022, 14:56:29 UTC - in response to Message 105539.  

I have one work-unit that has been running for days. I thought it was stuck yesterday morning at 99.1%, but it was still climbing *very* slowly. Today it is at 99.9+%; maybe it will be finished by tomorrow. It was due yesterday so I don't know if it will even be accepted. It's a virtual-box one. I'm trying now to figure out how to post a screen shot. Apparently I can't just add an attachment. And I can't copy text from the Boinc Manager. It's one of the aagb-mNMPHE ones.
That percentage sometimes is wrong. Is it actually using your CPU? Check in task manager. Mine appear to run in Boinc Manager, but they don't do any calculations and the CPU is idle. Boinctasks is better, it shows CPU usage.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,739,521
RAC: 1,936
Message 105543 - Posted: 19 Mar 2022, 18:39:33 UTC - in response to Message 105538.  
Last modified: 19 Mar 2022, 18:49:00 UTC

Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368
It might answer some of the questions

Maybe go look at Github or post there and see what the experts say.
I cannot find anything on your specific error message.

Greg_BE
Joined: 30 May 06
Posts: 5664
Credit: 5,739,521
RAC: 1,936
Message 105544 - Posted: 19 Mar 2022, 18:52:44 UTC - in response to Message 105542.  

I have one work-unit that has been running for days. I thought it was stuck yesterday morning at 99.1%, but it was still climbing *very* slowly. Today it is at 99.9+%; maybe it will be finished by tomorrow. It was due yesterday so I don't know if it will even be accepted. It's a virtual-box one. I'm trying now to figure out how to post a screen shot. Apparently I can't just add an attachment. And I can't copy text from the Boinc Manager. It's one of the aagb-mNMPHE ones.
That percentage sometimes is wrong. Is it actually using your CPU? Check in task manager. Mine appear to run in Boinc Manager, but they don't do any calculations and the CPU is idle. Boinctasks is better, it shows CPU usage.


I've had a non-VBox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed.
You have to check the CPU % for that task. I use BoincTasks to do that.
If you see a task at 0.10% CPU or lower in that program, the % complete is not going up like the other tasks, and you're at 99% with a few "minutes" to go that still haven't advanced after 5 minutes, it's stuck and you will have to abort.
Anything over 12 hrs run time is suspect.
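
The rule of thumb above can be written down as a small check. This is a rough sketch only: the `TaskSample` fields and function names are invented for illustration, and the numbers would have to be read off BoincTasks or Task Manager by hand, since nothing here talks to BOINC itself.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """One observation of a task; field names are made up for this sketch."""
    wall_seconds: float      # elapsed run time when the sample was taken
    cpu_percent: float       # CPU usage of the task, in percent
    progress_percent: float  # "% complete" as displayed


def is_stuck(older, newer, min_gap_seconds=300, idle_cpu_percent=0.10):
    """True if, across at least five minutes, progress did not move and CPU is near zero."""
    if newer.wall_seconds - older.wall_seconds < min_gap_seconds:
        return False  # samples too close together to judge
    no_progress = newer.progress_percent <= older.progress_percent
    idle = newer.cpu_percent <= idle_cpu_percent
    return no_progress and idle


def is_suspect(sample, max_hours=12):
    """True if the task has run longer than the 12-hour rule of thumb."""
    return sample.wall_seconds > max_hours * 3600
```

A task that is still at 99.91% five minutes later with 0.05% CPU would be flagged by `is_stuck`; one that is busy at 95% CPU would not, even if the displayed percentage has not changed.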

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105546 - Posted: 19 Mar 2022, 19:29:12 UTC - in response to Message 105543.  

Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368
It might answer some of the questions

Maybe go look at Github or post there and see what the experts say.
I cannot find anything on your specific error message.
I was hoping kotenok was going to come up with something, they asked me to post it.

Is there a github Rosetta group? The word Rosetta seems to refer to many other things in there.

Your link didn't help me, I don't understand the technical stuff, but it made me laugh: "I upgraded Numpy and Pandas, when I run python with django and apache"!!

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105547 - Posted: 19 Mar 2022, 19:33:16 UTC - in response to Message 105544.  

I've had a non-VBox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed.
You have to check the CPU % for that task. I use BoincTasks to do that.
If you see a task at 0.10% CPU or lower in that program, the % complete is not going up like the other tasks, and you're at 99% with a few "minutes" to go that still haven't advanced after 5 minutes, it's stuck and you will have to abort.
Anything over 12 hrs run time is suspect.
ATLAS does non-vbox tasks? I thought only sixtrack did that.

I don't have the % column switched on; I've got too many columns as it is to fit on the screen (it's on my right, on two 1280*1024 monitors above each other). I just glance at the time in brackets (actual core time) occasionally, to make sure it's appropriate for the number of cores the task should be using. I only get VB doing nothing at all, so the time in brackets never gets to a minute, or Universe on my phone sticking near the end, in which case the time in brackets stops moving. If I see the wall time a few hours over the CPU time, I abort it. They know about the problem and don't know how to fix it: random tasks fail on random phones but work on others.

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105548 - Posted: 19 Mar 2022, 19:35:21 UTC - in response to Message 105547.  

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105550 - Posted: 19 Mar 2022, 19:49:47 UTC - in response to Message 105548.  
Last modified: 19 Mar 2022, 19:50:02 UTC

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395
Ugh, Linux.

Any idea on my VB error?

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105551 - Posted: 19 Mar 2022, 19:52:45 UTC - in response to Message 105550.  

Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image?
You can open an issue on the AIMNet GitHub page.
https://github.com/aiqm/aimnet/issues

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105552 - Posted: 19 Mar 2022, 19:59:34 UTC - in response to Message 105551.  

Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image?
Yes they do

You can open an issue on the AIMNet GitHub page.
https://github.com/aiqm/aimnet/issues
But it never goes wrong on two of my machines. I'm not sure what to blame or what details to give them. My two newest machines work:
Ryzen 9 3900XT
i5 8600K

My older 5 machines do not:
Core 2 Quad Q8400
i3 M350
Xeon X5650 (dual)
Another Xeon X5650 (dual)
Pentium N3700
ID: 105552 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
kotenok2000
Avatar

Send message
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105553 - Posted: 19 Mar 2022, 20:14:49 UTC - in response to Message 105552.  

Maybe some file got corrupted?
Try to calculate a checksum with 7-Zip.
https://imgur.com/a/R3UDk5o

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105554 - Posted: 19 Mar 2022, 20:52:57 UTC - in response to Message 105553.  
Last modified: 19 Mar 2022, 21:02:21 UTC

Maybe some file got corrupted?
Try to calculate a checksum with 7-Zip.
https://imgur.com/a/R3UDk5o
My 7zip doesn't have the CRC option at the bottom of the menu. I tried upgrading to the latest version, but it's still not there. Do you have a paid version?

I tried Windows 11 zip, but that only gives a CRC32 of 087E2283 - is that enough to check? Where do I find what it should be? (OOPS! That's on a good machine, checking a faulty machine now....)

Actually forget all that. I've detached and reattached to Rosetta so many times to test it, there's no way those 5 machines always corrupted it.

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105555 - Posted: 19 Mar 2022, 21:01:13 UTC - in response to Message 105554.  

Open 7-Zip File Manager.
On the 7-Zip window, switch to the Tools menu and then select the Options button.
On the Options page, switch to the 7-Zip tab and then check the CRC SHA option.
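
If the 7-Zip menu entry still doesn't show up, the same checksums can be computed with a few lines of Python. This is just a sketch: the function name is made up, and the path to the .vdi file has to be supplied by whoever runs it. It reads the file in chunks, so a large disk image doesn't have to fit in memory, and returns both the CRC32 (in the same 8-digit hex form mentioned above) and a SHA-256.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1024 * 1024):
    """Return (crc32_hex, sha256_hex) for the file at `path`, read in chunks."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC32 carried across chunks
            sha.update(chunk)
    return f"{crc:08X}", sha.hexdigest()
```

The thread doesn't give a reference value for the image, so the practical use is to run it on a good machine and a bad machine and compare the two results. Matching values would suggest the file itself is not the problem.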

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,987,845
RAC: 10,024
Message 105556 - Posted: 19 Mar 2022, 21:02:54 UTC - in response to Message 105555.  

I've detached and reattached to Rosetta so many times to test it, there's no way those 5 machines always corrupted it.

kotenok2000
Joined: 22 Feb 11
Posts: 238
Credit: 381,511
RAC: 2,750
Message 105557 - Posted: 19 Mar 2022, 21:04:05 UTC - in response to Message 105555.  

Does the bad machine have the VM VirtualBox Extension Pack installed? And if it doesn't, do the good machines have it installed?



©2024 University of Washington
https://www.bakerlab.org