Problems and Technical Issues with Rosetta@home

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105543 - Posted: 19 Mar 2022, 18:39:33 UTC - in response to Message 105538.  
Last modified: 19 Mar 2022, 18:49:00 UTC

Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368
It might answer some of your questions.

Maybe go look at GitHub or post there and see what the experts say.
I cannot find anything on your specific error message.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105544 - Posted: 19 Mar 2022, 18:52:44 UTC - in response to Message 105542.  

I have one work-unit that has been running for days. I thought it was stuck yesterday morning at 99.1%, but it was still climbing *very* slowly. Today it is at 99.9+%; maybe it will be finished by tomorrow. It was due yesterday so I don't know if it will even be accepted. It's a virtual-box one. I'm trying now to figure out how to post a screen shot. Apparently I can't just add an attachment. And I can't copy text from the Boinc Manager. It's one of the aagb-mNMPHE ones.
That percentage sometimes is wrong. Is it actually using your CPU? Check in task manager. Mine appear to run in Boinc Manager, but they don't do any calculations and the CPU is idle. Boinctasks is better, it shows CPU usage.


I've had a non-Vbox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed.
You have to check the CPU % for that task. I use BoincTasks to do that.
If that program shows a task at 0.10% CPU or lower, the % complete is not going up like the other tasks, and you're at 99% with a few "minutes" to go that never advance after 5 minutes, it's stuck and you will have to abort it.
Anything over 12 hrs run time is suspect.
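
The rule of thumb above can be sketched in Python. This is only an illustration, not a BOINC or BoincTasks API: the `TaskSample` structure and both function names are hypothetical, and the numbers (0.10% CPU, 99% complete, two samples about 5 minutes apart, 12 hours) are taken from the post.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """One observation of a task (hypothetical structure, not a BOINC API)."""
    cpu_percent: float       # CPU usage of the task, in percent
    progress_percent: float  # "% complete" as shown by the manager
    run_time_hours: float    # elapsed run time so far


def looks_stuck(earlier: TaskSample, later: TaskSample,
                cpu_threshold: float = 0.10) -> bool:
    """Apply the rule of thumb to two samples taken about 5 minutes apart."""
    idle = later.cpu_percent <= cpu_threshold
    no_progress = later.progress_percent <= earlier.progress_percent
    near_end = later.progress_percent >= 99.0
    return idle and no_progress and near_end


def is_suspect(sample: TaskSample, max_hours: float = 12.0) -> bool:
    """Flag any task whose run time exceeds the 12-hour guideline."""
    return sample.run_time_hours > max_hours
```

The samples would have to be read by hand from BoincTasks or a task manager; the sketch only encodes the decision, not the data collection.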

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105546 - Posted: 19 Mar 2022, 19:29:12 UTC - in response to Message 105543.  

Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368
It might answer some of your questions.

Maybe go look at GitHub or post there and see what the experts say.
I cannot find anything on your specific error message.
I was hoping kotenok was going to come up with something, they asked me to post it.

Is there a github Rosetta group? The word Rosetta seems to refer to many other things in there.

Your link didn't help me, I don't understand the technical stuff, but it made me laugh: "I upgraded Numpy and Pandas, when I run python with django and apache"!!

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105547 - Posted: 19 Mar 2022, 19:33:16 UTC - in response to Message 105544.  

I've had a non-Vbox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed.
You have to check the CPU % for that task. I use BoincTasks to do that.
If that program shows a task at 0.10% CPU or lower, the % complete is not going up like the other tasks, and you're at 99% with a few "minutes" to go that never advance after 5 minutes, it's stuck and you will have to abort it.
Anything over 12 hrs run time is suspect.
ATLAS does non-vbox tasks? I thought only sixtrack did that.

I don't have the % column switched on; I've got too many columns as it is to fit on the screen (it's on my right, on two 1280*1024 monitors above each other). I just glance at the time in brackets (actual core time) occasionally, to make sure it's in line with the number of cores the task should be using. The only failures I get are VB doing nothing at all, so the time in brackets never gets to a minute, or Universe on my phone sticking near the end, in which case the time in brackets stops moving. If I see the wall time a few hours over the CPU time, I abort it. They know about the problem and don't know how to fix it: random tasks fail on random phones but work on others.
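
The wall-time versus CPU-time check described above could be written roughly like this. It is a sketch only: the function names are made up, and the 2-hour default for "a few hours" is an assumption, not a figure from the post.

```python
def should_abort(wall_seconds: float, cpu_seconds: float,
                 max_gap_hours: float = 2.0) -> bool:
    """True when wall time is more than max_gap_hours ahead of CPU time.

    max_gap_hours is an assumed value standing in for "a few hours".
    """
    gap_hours = (wall_seconds - cpu_seconds) / 3600.0
    return gap_hours > max_gap_hours


def core_utilisation(wall_seconds: float, cpu_seconds: float,
                     cores: int = 1) -> float:
    """Fraction of the expected core time actually used (1.0 = fully busy)."""
    if wall_seconds <= 0 or cores <= 0:
        return 0.0
    return cpu_seconds / (wall_seconds * cores)
```

For a multi-core task the CPU time can exceed the wall time, so the gap goes negative and `should_abort` returns False; `core_utilisation` is the better check in that case.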

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105548 - Posted: 19 Mar 2022, 19:35:21 UTC - in response to Message 105547.  

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105550 - Posted: 19 Mar 2022, 19:49:47 UTC - in response to Message 105548.  
Last modified: 19 Mar 2022, 19:50:02 UTC

https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395
Ugh, Linux.

Any idea on my VB error?

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105551 - Posted: 19 Mar 2022, 19:52:45 UTC - in response to Message 105550.  

Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image?
You can open an issue on the AIMNet GitHub page.
https://github.com/aiqm/aimnet/issues

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105552 - Posted: 19 Mar 2022, 19:59:34 UTC - in response to Message 105551.  

Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image?
Yes they do

You can open an issue on the AIMNet GitHub page.
https://github.com/aiqm/aimnet/issues
But it never goes wrong on two of my machines. I'm not sure what to blame or what details to give them. My two newest machines work:
Ryzen 9 3900XT
i5 8600K

My older 5 machines do not:
Core 2 Quad Q8400
i3 M350
Xeon X5650 (dual)
Another Xeon X5650 (dual)
Pentium N3700

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105553 - Posted: 19 Mar 2022, 20:14:49 UTC - in response to Message 105552.  

Maybe some file got corrupted?
Try calculating a checksum with 7-Zip.
https://imgur.com/a/R3UDk5o

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105554 - Posted: 19 Mar 2022, 20:52:57 UTC - in response to Message 105553.  
Last modified: 19 Mar 2022, 21:02:21 UTC

Maybe some file got corrupted?
Try calculating a checksum with 7-Zip.
https://imgur.com/a/R3UDk5o
My 7zip doesn't have the CRC option at the bottom of the menu. I tried upgrading to the latest version, but it's still not there. Do you have a paid version?

I tried Windows 11 zip, but that only gives a CRC32 of 087E2283 - is that enough to check? Where do I find what it should be? (OOPS! That's on a good machine, checking a faulty machine now....)

Actually forget all that. I've detached and reattached to Rosetta so many times to test it, there's no way those 5 machines always corrupted it.

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105555 - Posted: 19 Mar 2022, 21:01:13 UTC - in response to Message 105554.  

Open 7-Zip File Manager.
On the 7-Zip window, switch to the Tools menu and then select the Options button.
On the Options page, switch to the 7-Zip tab and then check the CRC SHA option.
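
If the 7-Zip menu entry still does not show up, the same kind of checksum can be computed with Python's standard library. This is a minimal sketch; the function name `file_checksums` is made up for illustration, and the location of the image on your machine is not something the thread states.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (CRC32 as 8 hex digits, SHA-256 hex digest) for a file."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte .vdi does not have to fit in RAM.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            sha.update(chunk)
    return f"{crc & 0xFFFFFFFF:08X}", sha.hexdigest()
```

Run it against the same file (for example AIMNet_vm_v2.vdi) on a good machine and on a bad one and compare the two results. A matching CRC32 is a reasonable sign the file is not corrupted, and a matching SHA-256 is stronger evidence.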

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105556 - Posted: 19 Mar 2022, 21:02:54 UTC - in response to Message 105555.  

I've detached and reattached to Rosetta so many times to test it, there's no way those 5 machines always corrupted it.

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105557 - Posted: 19 Mar 2022, 21:04:05 UTC - in response to Message 105555.  

Does the bad machine have the VM VirtualBox Extension Pack installed? And if it doesn't, do the good machines have it installed?

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105558 - Posted: 19 Mar 2022, 21:06:22 UTC - in response to Message 105557.  
Last modified: 19 Mar 2022, 21:07:27 UTC

All machines are currently running V5 with extensions (as cosmology at home fails with V6). They have previously had V6 with extensions. I always put the relevant extension pack on.

From the 7 computers I have, it would appear older machines fail. Do you or anyone else in here have an older machine it works ok on?

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105559 - Posted: 19 Mar 2022, 23:10:57 UTC - in response to Message 105547.  

My bad...ATLAS is Vbox
But maybe Vbox isn't the issue; maybe it's the underlying program structure of the project/task?
Have you tried ATLAS or any other Vbox projects, to rule out Vbox problems?
If they also fail, then it's Vbox. If just python fails, then it's the python code that does not work.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105560 - Posted: 19 Mar 2022, 23:12:38 UTC - in response to Message 105559.  
Last modified: 19 Mar 2022, 23:16:28 UTC

My bad...ATLAS is Vbox
But maybe Vbox isn't the issue; maybe it's the underlying program structure of the project/task?
Have you tried ATLAS or any other Vbox projects, to rule out Vbox problems?
If they also fail, then it's Vbox. If just python fails, then it's the python code that does not work.
Atlas (and CMS and Theory) work on 6 of 7 machines (after telling AVG antivirus to not scan virtual machines). Python only works on 2 of 7 machines. All I can assume is the Python code only likes certain processors, in my case it happens to be the two newest ones. We need a list of which CPUs are returning good results, I wonder if this can be seen somewhere on Rosetta stats? This page https://boinc.bakerlab.org/rosetta/top_hosts.php has the information, but you'd have to look through and find a host with a processor you're interested in, then see if they happen to have done python tasks. If they haven't, you don't know if they've refused to do them or have failed to do them.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 105561 - Posted: 19 Mar 2022, 23:40:18 UTC

On one host you aborted every python task that ever came to it.
Let at least one of the "faulty" hosts run python and then post the stderr output here before you abort.
Maybe we can see something in the output text of that file that could give us a clue.

It is quite possible that since python is being used on such a complex structure, it's just too much for your old processors to handle.

Let me see your stderr from each failed machine, and then if neither I nor anyone else sees anything, we might suggest you talk to the experts at GitHub.

The computermezle or however he spells his name is an expert at the GitHub site.
He is on here from time to time.

If Vbox works with ATLAS then I think it's pretty fair to say python is the problem.

I've ruled out 64 vs 32 bit (python being 64 and your machines 32) since you're running win64.

If it's not going to be an easy thing to see, the simple thing I would suggest is to disable python on your older machines, keep them attached for 4.2 and let them run other less complex projects.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105562 - Posted: 20 Mar 2022, 0:14:30 UTC - in response to Message 105561.  
Last modified: 20 Mar 2022, 0:14:49 UTC

On one host you aborted every python task that ever came to it.
Let at least one of the "faulty" hosts run python and then post the stderr output here before you abort.
Maybe we can see something in the output text of that file that could give us a clue.

It is quite possible that since python is being used on such a complex structure, it's just too much for your old processors to handle.

Let me see your stderr from each failed machine, and then if neither I nor anyone else sees anything, we might suggest you talk to the experts at GitHub.

The computermezle or however he spells his name is an expert at the GitHub site.
He is on here from time to time.

If Vbox works with ATLAS then I think it's pretty fair to say python is the problem.

I've ruled out 64 vs 32 bit (python being 64 and your machines 32) since you're running win64.

If it's not going to be an easy thing to see, the simple thing I would suggest is to disable python on your older machines, keep them attached for 4.2 and let them run other less complex projects.
How long until I get a stderr.txt? If I let them run to the end they take days.

I do currently have python disabled on the 5 bad machines, it's just I can't do much Rosetta work that way with 5/7 of my machines waiting for 4.2 work.

kotenok2000
Joined: 22 Feb 11
Posts: 272
Credit: 507,897
RAC: 334
Message 105563 - Posted: 20 Mar 2022, 0:28:45 UTC - in response to Message 105562.  

When you find a VM that is stuck with that error, open Storage in the VM settings and look at where the hard drive image is located.
The stderr file should be in the same slot folder.
https://imgur.com/a/KQhB1dd
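
As an alternative to clicking through the VM settings, a short script can list the stderr files directly. This is a rough sketch and rests on an assumption about the layout: that each running task has a numbered folder under a `slots` directory inside the BOINC data directory, with a `stderr.txt` in it. The function names are made up, and the data directory has to be supplied by you.

```python
from pathlib import Path


def find_stderr_files(boinc_data_dir):
    """List stderr.txt files under <data dir>/slots/<n>/ (assumed layout)."""
    slots = Path(boinc_data_dir) / "slots"
    return sorted(slots.glob("*/stderr.txt"))


def tail(path, line_count=20):
    """Return the last line_count lines of a text file."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-line_count:]
```

Calling `find_stderr_files` with your own BOINC data directory and then `tail` on each result gives something short enough to paste into a post.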

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 9,863
Message 105564 - Posted: 20 Mar 2022, 1:07:19 UTC - in response to Message 105563.  
Last modified: 20 Mar 2022, 1:09:19 UTC

When you find a VM that is stuck with that error, open Storage in the VM settings and look at where the hard drive image is located.
The stderr file should be in the same slot folder.
https://imgur.com/a/KQhB1dd
This is one that has been running for 6 minutes (on an SSD) and has only done 20 seconds of processing. I'll leave it running in case you need to see more:

2022-03-20 00:53:25 (43240): Detected: vboxwrapper 26202
2022-03-20 00:53:25 (43240): Detected: BOINC client v7.19.0
2022-03-20 00:53:27 (43240): Detected: VirtualBox VboxManage Interface (Version: 5.2.44)
2022-03-20 00:53:27 (43240): Feature: Checkpoint interval offset (186 seconds)
2022-03-20 00:53:27 (43240): Detected: Minimum checkpoint interval (600.000000 seconds)
2022-03-20 00:53:29 (43240): Create VM. (boinc_923b5ddf89fefc4a, slot#0)
2022-03-20 00:53:29 (43240): Setting Memory Size for VM. (6144MB)
2022-03-20 00:53:29 (43240): Setting CPU Count for VM. (1)
2022-03-20 00:53:30 (43240): Setting Chipset Options for VM.
2022-03-20 00:53:30 (43240): Setting Boot Options for VM.
2022-03-20 00:53:30 (43240): Setting Network Configuration for NAT.
2022-03-20 00:53:31 (43240): Disabling VM Network Access.
2022-03-20 00:53:31 (43240): Disabling USB Support for VM.
2022-03-20 00:53:31 (43240): Disabling COM Port Support for VM.
2022-03-20 00:53:31 (43240): Disabling LPT Port Support for VM.
2022-03-20 00:53:32 (43240): Disabling Audio Support for VM.
2022-03-20 00:53:32 (43240): Disabling Clipboard Support for VM.
2022-03-20 00:53:32 (43240): Disabling Drag and Drop Support for VM.
2022-03-20 00:53:33 (43240): Adding storage controller(s) to VM.
2022-03-20 00:53:33 (43240): Adding virtual disk drive to VM. (vm_image.vdi)
2022-03-20 00:53:33 (43240): Adding VirtualBox Guest Additions to VM.
2022-03-20 00:53:33 (43240): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2022-03-20 00:53:34 (43240): Enabling shared directory for VM.
2022-03-20 00:53:34 (43240): Starting VM using VBoxManage interface. (boinc_923b5ddf89fefc4a, slot#0)
2022-03-20 00:53:42 (43240): Successfully started VM. (PID = '42088')
2022-03-20 00:53:42 (43240): Reporting VM Process ID to BOINC.
2022-03-20 00:53:42 (43240): Guest Log: BIOS: VirtualBox 5.2.44
2022-03-20 00:53:42 (43240): Guest Log: CPUID EDX: 0x078bfbff
2022-03-20 00:53:42 (43240): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2022-03-20 00:53:42 (43240): VM state change detected. (old = 'poweredoff', new = 'running')
2022-03-20 00:53:42 (43240): Preference change detected
2022-03-20 00:53:42 (43240): Setting CPU throttle for VM. (100%)
2022-03-20 00:53:42 (43240): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 30 seconds) or (Vbox_job.xml: 600 seconds))
2022-03-20 00:53:44 (43240): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-03-20 00:53:44 (43240): Guest Log: BIOS: Booting from Hard Disk...
2022-03-20 00:53:49 (43240): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-03-20 00:53:49 (43240): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-03-20 00:53:54 (43240): Guest Log: VBoxService 5.2.42 r137960 (verbosity: 0) linux.amd64 (May 13 2020 21:45:13) release log
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000224 main     Log opened 2022-03-20T00:53:52.350889000Z
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000407 main     OS Product: Linux
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000460 main     OS Release: 4.19.0-14-amd64
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000518 main     OS Version: #1 SMP Debian 4.19.171-2 (2021-01-30)
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000569 main     Executable: /opt/VBoxGuestAdditions-5.2.42/sbin/VBoxService
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000570 main     Process ID: 539
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000571 main     Package type: LINUX_64BITS_GENERIC
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.002363 main     5.2.42 r137960 started. Verbose level = 0
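
Since the question in this thread is whether the python tasks only like certain processors, the `CPUID EDX: 0x078bfbff` line in the log can be decoded as a starting point. The sketch below assumes that value is the standard x86 CPUID leaf 1 EDX register as seen by the guest, which the log itself does not confirm. The bit positions are the standard ones for that register; the function name is made up.

```python
# Bit positions in EDX for CPUID leaf 1 (standard x86 feature flags).
EDX_FEATURES = {
    0: "fpu", 4: "tsc", 5: "msr", 6: "pae", 8: "cx8", 9: "apic",
    15: "cmov", 19: "clflush", 23: "mmx", 24: "fxsr", 25: "sse",
    26: "sse2", 28: "htt",
}


def decode_edx(value):
    """Map each known feature name to True/False for the given EDX value."""
    return {name: bool((value >> bit) & 1)
            for bit, name in EDX_FEATURES.items()}


flags = decode_edx(0x078BFBFF)  # value copied from the log above
```

Under that assumption the guest sees SSE and SSE2 but not the HTT flag. Newer extensions such as AVX are reported in other registers, so this value alone can neither confirm nor rule out a missing instruction set as the reason the older machines fail.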

©2024 University of Washington
https://www.bakerlab.org