Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368 It might answer some of your questions. Maybe look on GitHub, or post there and see what the experts say. I cannot find anything on your specific error message. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Quote: "I have one work-unit that has been running for days. I thought it was stuck yesterday morning at 99.1%, but it was still climbing *very* slowly. Today it is at 99.9+%; maybe it will be finished by tomorrow. It was due yesterday, so I don't know if it will even be accepted. It's a VirtualBox one. I'm trying now to figure out how to post a screenshot. Apparently I can't just add an attachment, and I can't copy text from the BOINC Manager. It's one of the aagb-mNMPHE ones." Quote: "That percentage is sometimes wrong. Is it actually using your CPU? Check in Task Manager. Mine appear to run in BOINC Manager, but they don't do any calculations and the CPU is idle. BoincTasks is better; it shows CPU usage." I've had a non-VBox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed. You have to check the CPU % for that task; I use BoincTasks to do that. If that program shows a task using 0.10% CPU or less, its % complete doesn't go up like the other tasks, and it sits at 99% with a few "minutes" to go that never advance after 5 minutes, it's stuck and you will have to abort it. Anything over 12 hours of run time is suspect. |
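The rule of thumb above can be written as a small check. This is a minimal sketch, assuming you note down a task's elapsed time, % complete and CPU % (for example from BoincTasks) at two moments at least 5 minutes of run time apart; `TaskSample` and `classify` are hypothetical helpers, not part of BOINC or BoincTasks. The thresholds (0.10% CPU, 5 minutes, 12 hours) are the ones from the post.

```python
from dataclasses import dataclass

@dataclass
class TaskSample:
    """One hand-noted observation of a task (hypothetical structure)."""
    elapsed_s: float     # elapsed run time shown by the manager, in seconds
    progress_pct: float  # "% complete" shown by the manager
    cpu_pct: float       # CPU usage of the task, in percent

CPU_IDLE_PCT = 0.10            # ".10% or lower"
STALL_WINDOW_S = 5 * 60        # no progress for 5 minutes
SUSPECT_RUNTIME_S = 12 * 3600  # "anything over 12 hrs run time is suspect"

def classify(earlier: TaskSample, later: TaskSample) -> str:
    """Return 'stuck', 'suspect' or 'ok' for two samples of the same task."""
    if later.elapsed_s - earlier.elapsed_s < STALL_WINDOW_S:
        raise ValueError("samples must be at least 5 minutes of run time apart")
    no_progress = later.progress_pct <= earlier.progress_pct
    idle = later.cpu_pct <= CPU_IDLE_PCT
    if idle and no_progress:
        return "stuck"    # candidate for aborting
    if later.elapsed_s > SUSPECT_RUNTIME_S:
        return "suspect"  # still moving, but worth watching
    return "ok"
```

For example, two samples 400 seconds apart that both show 99.91% complete and 0.05% CPU classify as "stuck".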
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "Have a look here: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-FATAL-ERROR-Cannot-load-lt-mkl-loader-gt/m-p/1244368" I was hoping kotenok would come up with something; they asked me to post it. Is there a GitHub Rosetta group? The word Rosetta seems to refer to many other things there. Your link didn't help me, as I don't understand the technical stuff, but it made me laugh: "I upgraded Numpy and Pandas, when I run python with django and apache"!! |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "I've had a non-VBox task from LHC ATLAS stall at 99.91% and not advance. It ran for 1.5 days before I noticed." ATLAS does non-VBox tasks? I thought only SixTrack did that. I don't have the % column switched on; I've got too many columns to fit on the screen as it is (it's on my right, on two 1280x1024 monitors stacked one above the other). I just glance occasionally at the time in brackets (actual core time) to make sure it fits the number of cores the task should be using. The only problems I get are VB tasks doing nothing at all, so the bracketed time never reaches a minute, or Universe tasks on my phone sticking near the end, in which case the bracketed time stops moving; if I see the wall time a few hours over the CPU time, I abort it. They know about the problem and don't know how to fix it: random tasks fail on random phones but work on others. |
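The wall-time versus CPU-time check described above can be sketched as follows, assuming you read both times from the manager. `cpu_time_lagging` is a hypothetical helper, and the 3-hour slack is an assumed stand-in for "a few hours". Dividing by the core count lets the same check work for multi-core tasks, whose CPU time should grow roughly `cores` times faster than wall time.

```python
def cpu_time_lagging(wall_s: float, cpu_s: float, cores: int = 1, slack_h: float = 3.0) -> bool:
    """True if CPU time trails wall time by more than slack_h hours per core."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    # A healthy task on `cores` cores accumulates about wall_s * cores of CPU time.
    shortfall_s = wall_s * cores - cpu_s
    # Convert the shortfall back into wall-clock hours spent idling per core.
    return shortfall_s / cores > slack_h * 3600
```

A VM that has run for 6 hours but logged only 20 seconds of CPU time is flagged, while a 4-core task with 7.5 hours of CPU time after 2 hours of wall time is not.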
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395 |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4395" Ugh, Linux. Any idea on my VB error? |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image? You can make an issue on the AIMNet GitHub page: https://github.com/aiqm/aimnet/issues |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "Do the VirtualBox apps use the AIMNet_vm_v2.vdi hard drive image?" Yes, they do. Quote: "You can make an issue on the AIMNet GitHub page." But it never goes wrong on two of my machines, so I'm not sure what to blame or what details to give them. My two newest machines work: Ryzen 9 3900XT; i5 8600K. My older 5 machines do not: Core 2 Quad Q8400; i3 M350; Xeon X5650 (dual); another Xeon X5650 (dual); Pentium N3700. |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "Maybe some file got corrupted?" My 7-Zip doesn't have the CRC option at the bottom of the menu. I tried upgrading to the latest version, but it's still not there. Do you have a paid version? I tried the Windows 11 zip tool, but that only gives a CRC32 of 087E2283; is that enough to check? Where do I find what it should be? (OOPS! That's on a good machine; checking a faulty machine now....) Actually, forget all that. I've detached and reattached to Rosetta so many times to test it that there's no way those 5 machines always corrupted it. |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
Open 7-Zip File Manager. In the 7-Zip window, open the Tools menu and select Options. On the Options page, switch to the 7-Zip tab and check the CRC SHA option. |
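If the 7-Zip menu entry still won't appear, a short script can produce comparable checksums. This is a sketch, assuming Python 3 is installed; `file_checksums` is a hypothetical helper. `zlib.crc32` computes the standard CRC-32, so its output should be comparable with the CRC32 value other tools report. Running it on `AIMNet_vm_v2.vdi` on a good machine and on a bad one, then comparing the two results, would show whether the copies actually differ.

```python
import hashlib
import zlib

def file_checksums(path, chunk_size=1 << 20):
    """Return (CRC32 as 8 uppercase hex digits, SHA-256 hex digest) of a file."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so multi-gigabyte disk images don't need to fit in RAM.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            sha.update(chunk)
    return f"{crc & 0xFFFFFFFF:08X}", sha.hexdigest()
```

Usage: `print(file_checksums("AIMNet_vm_v2.vdi"))`, run from the folder that holds the image (the path here is only a placeholder).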
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I've detached and reattached to Rosetta so many times to test it that there's no way those 5 machines always corrupted it. |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
Do the bad machines have the Oracle VM VirtualBox Extension Pack installed? If not, do the good machines have it installed? |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
All machines are currently running VirtualBox 5 with the extension pack (as Cosmology@Home fails with version 6). They have previously had version 6 with the extension pack. I always install the matching extension pack. From the 7 computers I have, it appears that the older machines fail. Do you or anyone else here have an older machine it works OK on? |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
My bad... ATLAS is VBox. But maybe VBox isn't the issue; maybe it's the underlying program structure of the project/task? Have you tried ATLAS or any other VBox projects to rule out VBox problems? If they also fail, then it's VBox. If only Python fails, then it's the Python code that doesn't work. |
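The elimination test above boils down to a per-host rule. This is a minimal sketch; `likely_culprit` is a hypothetical helper, and the host names and results are made-up placeholders, not data from this thread.

```python
def likely_culprit(other_vbox_ok: bool, python_ok: bool) -> str:
    """Apply the elimination rule to one host's observations."""
    if not other_vbox_ok:
        return "VirtualBox"      # ATLAS/CMS/Theory-style VBox tasks fail too
    if not python_ok:
        return "Python app"      # only the Rosetta Python tasks fail
    return "no fault found"

# Hypothetical observations: (other VBox projects OK?, Rosetta Python OK?)
hosts = {
    "host_a": (True, False),
    "host_b": (True, True),
    "host_c": (False, False),
}
verdicts = {name: likely_culprit(*obs) for name, obs in hosts.items()}
```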
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "My bad... ATLAS is VBox." ATLAS (and CMS and Theory) work on 6 of 7 machines (after telling AVG antivirus not to scan virtual machines). Python only works on 2 of 7 machines. All I can assume is that the Python code only likes certain processors; in my case it happens to be the two newest ones. We need a list of which CPUs are returning good results. I wonder if this can be seen somewhere in the Rosetta stats? This page https://boinc.bakerlab.org/rosetta/top_hosts.php has the information, but you'd have to look through it, find a host with a processor you're interested in, then see whether it happens to have done Python tasks. If it hasn't, you don't know whether it refused to do them or failed to do them. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
On one host, you aborted every Python task that ever came to it. Let at least one of the "faulty" hosts run a Python task, and post the stderr output here before you abort it. Maybe we can see something in the output of that file that gives us a clue. It is quite possible that, since Python is being used on such a complex structure, it's just too much for your old processors to handle. Let me see your stderr from each failed machine; if neither I nor anyone else sees anything, then we might suggest you talk to the experts on GitHub. The computermezle, or however he spells his name, is an expert on the GitHub site. He is on here from time to time. If VBox works with ATLAS, then I think it's fair to say Python is the problem. I've ruled out 64- vs 32-bit (Python being 64-bit and your machines 32-bit) since you're running Win64. If it's not going to be an easy thing to find, the simplest thing I'd suggest is to disable Python on your older machines, keep them attached for 4.2 work, and let them run other, less complex projects. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "On one host, you aborted every Python task that ever came to it." How long until I get a stderr.txt? If I let them run to the end, they take days. I do currently have Python disabled on the 5 bad machines; it's just that I can't do much Rosetta work that way, with 5/7 of my machines waiting for 4.2 work. |
kotenok2000 Joined: 22 Feb 11 Posts: 272 Credit: 507,897 RAC: 334 |
When you find a VM that is stuck with that error, open Storage in the VM's settings and look at where the hard drive image is located. stderr.txt should be in the same slot folder. https://imgur.com/a/KQhB1dd |
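On a machine with many slots, a script can list the candidate stderr.txt files. This is a sketch, assuming the default Windows BOINC data directory `C:\ProgramData\BOINC` (adjust it if BOINC was installed elsewhere) and assuming the slot running the VM holds its disk image as `vm_image.vdi`, the name the vboxwrapper log in the next post reports. `find_slot_stderr` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: default BOINC data directory on Windows.
DEFAULT_BOINC_DATA = Path(r"C:\ProgramData\BOINC")

def find_slot_stderr(data_dir=DEFAULT_BOINC_DATA, image_name="vm_image.vdi"):
    """Return stderr.txt paths of slot folders that also contain the VM disk image."""
    slots_dir = Path(data_dir) / "slots"
    if not slots_dir.is_dir():
        return []
    hits = []
    for slot in sorted(p for p in slots_dir.iterdir() if p.is_dir()):
        stderr = slot / "stderr.txt"
        # Only VM slots carry the disk image next to their stderr.txt.
        if stderr.is_file() and (slot / image_name).exists():
            hits.append(stderr)
    return hits
```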
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Quote: "When you find a VM that is stuck with that error" This is one that has been running for 6 minutes (on an SSD) and has only done 20 seconds of processing. I'll leave it running in case you need to see more:
2022-03-20 00:53:25 (43240): Detected: vboxwrapper 26202
2022-03-20 00:53:25 (43240): Detected: BOINC client v7.19.0
2022-03-20 00:53:27 (43240): Detected: VirtualBox VboxManage Interface (Version: 5.2.44)
2022-03-20 00:53:27 (43240): Feature: Checkpoint interval offset (186 seconds)
2022-03-20 00:53:27 (43240): Detected: Minimum checkpoint interval (600.000000 seconds)
2022-03-20 00:53:29 (43240): Create VM. (boinc_923b5ddf89fefc4a, slot#0)
2022-03-20 00:53:29 (43240): Setting Memory Size for VM. (6144MB)
2022-03-20 00:53:29 (43240): Setting CPU Count for VM. (1)
2022-03-20 00:53:30 (43240): Setting Chipset Options for VM.
2022-03-20 00:53:30 (43240): Setting Boot Options for VM.
2022-03-20 00:53:30 (43240): Setting Network Configuration for NAT.
2022-03-20 00:53:31 (43240): Disabling VM Network Access.
2022-03-20 00:53:31 (43240): Disabling USB Support for VM.
2022-03-20 00:53:31 (43240): Disabling COM Port Support for VM.
2022-03-20 00:53:31 (43240): Disabling LPT Port Support for VM.
2022-03-20 00:53:32 (43240): Disabling Audio Support for VM.
2022-03-20 00:53:32 (43240): Disabling Clipboard Support for VM.
2022-03-20 00:53:32 (43240): Disabling Drag and Drop Support for VM.
2022-03-20 00:53:33 (43240): Adding storage controller(s) to VM.
2022-03-20 00:53:33 (43240): Adding virtual disk drive to VM. (vm_image.vdi)
2022-03-20 00:53:33 (43240): Adding VirtualBox Guest Additions to VM.
2022-03-20 00:53:33 (43240): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2022-03-20 00:53:34 (43240): Enabling shared directory for VM.
2022-03-20 00:53:34 (43240): Starting VM using VBoxManage interface. (boinc_923b5ddf89fefc4a, slot#0)
2022-03-20 00:53:42 (43240): Successfully started VM. (PID = '42088')
2022-03-20 00:53:42 (43240): Reporting VM Process ID to BOINC.
2022-03-20 00:53:42 (43240): Guest Log: BIOS: VirtualBox 5.2.44
2022-03-20 00:53:42 (43240): Guest Log: CPUID EDX: 0x078bfbff
2022-03-20 00:53:42 (43240): Guest Log: BIOS: ata0-0: PCHS=16383/16/63 LCHS=1024/255/63
2022-03-20 00:53:42 (43240): VM state change detected. (old = 'poweredoff', new = 'running')
2022-03-20 00:53:42 (43240): Preference change detected
2022-03-20 00:53:42 (43240): Setting CPU throttle for VM. (100%)
2022-03-20 00:53:42 (43240): Setting checkpoint interval to 600 seconds. (Higher value of (Preference: 30 seconds) or (Vbox_job.xml: 600 seconds))
2022-03-20 00:53:44 (43240): Guest Log: BIOS: Boot : bseqnr=1, bootseq=0032
2022-03-20 00:53:44 (43240): Guest Log: BIOS: Booting from Hard Disk...
2022-03-20 00:53:49 (43240): Guest Log: vgdrvHeartbeatInit: Setting up heartbeat to trigger every 2000 milliseconds
2022-03-20 00:53:49 (43240): Guest Log: vboxguest: misc device minor 58, IRQ 20, I/O port d020, MMIO at 00000000f0400000 (size 0x400000)
2022-03-20 00:53:54 (43240): Guest Log: VBoxService 5.2.42 r137960 (verbosity: 0) linux.amd64 (May 13 2020 21:45:13) release log
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000224 main Log opened 2022-03-20T00:53:52.350889000Z
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000407 main OS Product: Linux
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000460 main OS Release: 4.19.0-14-amd64
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000518 main OS Version: #1 SMP Debian 4.19.171-2 (2021-01-30)
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000569 main Executable: /opt/VBoxGuestAdditions-5.2.42/sbin/VBoxService
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000570 main Process ID: 539
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.000571 main Package type: LINUX_64BITS_GENERIC
2022-03-20 00:53:54 (43240): Guest Log: 00:00:00.002363 main 5.2.42 r137960 started. Verbose level = 0 |
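To see when a VM last logged anything, the vboxwrapper log can be split into timestamped entries and the age of the newest entry checked. This is a sketch, assuming the `YYYY-MM-DD HH:MM:SS (pid): message` format shown above; `parse_log` and `last_activity` are hypothetical helpers. Because each message is taken to run up to the next timestamp, it also copes with a log pasted as one long line, as forum posts often end up.

```python
import re
from datetime import datetime

# One vboxwrapper entry: "YYYY-MM-DD HH:MM:SS (pid): message". The message runs
# up to the next entry's timestamp (or the end of the text).
ENTRY_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*?)"
    r"(?=\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \(\d+\):|\s*\Z)",
    re.S,
)

def parse_log(text):
    """Return a list of (timestamp, wrapper_pid, message) tuples."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        message = " ".join(m.group(3).split())  # collapse line breaks inside a message
        entries.append((ts, int(m.group(2)), message))
    return entries

def last_activity(entries, now):
    """Return (seconds since the newest entry, that entry's message)."""
    if not entries:
        raise ValueError("no log entries found")
    ts, _pid, message = entries[-1]
    return (now - ts).total_seconds(), message
```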
©2024 University of Washington
https://www.bakerlab.org