Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I could do, but I've decided to stick to Gridcoin as I've worked out it pays for the hardware in 1.5 years of crunching, so not to be sneezed at." I've had a glance at it, but never really dug into it at all. After 1.5 years. Slow but pretty good investment. I killed a CPU after a year of hard flat rate OC. So I don't do that anymore, that was an expensive lesson that even Gridcoin could not pay in 1.5 years. 2 places with a bunch of tech time. New MOBO and new CPU (that hurt). Since I want to run a bunch of projects I just dropped $150 (approx.) for 2 sticks of 16GB of memory, since python is such a memory hog and RAH does not allow us to control the number of cores used. 2 sticks of 4GB have been with me since they were some of the largest memory on the market many moons ago. I know expense. Now if GC could pay for my electric, then that might be something to look at. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I don't OC because of crashes and data corruption. Never killed a CPU though, I thought they lasted forever. But then I never experimented too harshly with overvoltage, which sounds a very nasty thing to do to a chip. I don't actually see the point, Intel/AMD test these chips thoroughly to see what speed they can go at reliably. I'm sure they know what they're doing. There are ways of getting cheap electricity, some of which are legal :-) Discounts for direct debit, paperless billing, dual fuel, read your own meter, etc. And choosing a supplier that's 30% cheaper. Or using night time rates. Or installing solar panels and taking absurd government subsidies. Just connected two dual Xeons (I needed a proprietary cable to make the stupid things boot up), then fixed the same duplicated ID cloning problem I had before. They total 48 cores, and they're taking pythons :-) |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
It's just a pity you can't use LHC with gridcoin. The reason being the stupid creditnew system screws up with multicore tasks, and is very easy to cheat, so people were managing 10x the coins they were due and taking money from the rest of us. LHC refuse to fix it and say it's Boinc's fault, which I agree with. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
If it's not my Vbox version, the only other difference is AMD vs Intel. Do AMDs run the Rosetta Python ok? They have different virtualization technology. And by ok I mean check if they are validated on the server, since they appeared alright on my end. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Shouldn't be. Nobody else has complained about Intel. I'm an AMD user, so really can't be of any help. Again, via our view all the tasks on your computer have disappeared. If I can find some time tomorrow [Friday] (EU time) I will try to dig into validate errors. Everything from my system is chugging along just fine. You know there is one other thing you can check...look at the task itself and see if it was sent to another computer and what that computer got out of it. Valid or invalid. If you both got invalid, then there is something wrong with the data. If you're #1 and invalid and then you look at it again and #2 is valid, then there is something wrong with your data. You are completing the tasks, but get validation inconclusive? Can you copy the readout from Stderr output on the task page if it is anything other than something like this: <core_client_version>7.16.20</core_client_version> <![CDATA[ <stderr_txt> command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting -parser:protocol pdblite_boinc_998_10_tfirst--fuse--predictor_v11_boinc_fix--fuse--tslp_design_v1_boinc_fix_tyr.xml @tau_site_altern_row2_V_gggraft_bcov_flags -in:file:silent tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.silent -in:file:silent_struct_type binary -silent_gz -mute all -silent_read_through_errors true -out:file:silent_struct_type binary -out:file:silent default.out -in:file:boinc_wu_zip tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.zip @tau_site_altern_row2_V_gggraft_bcov_v1_xaa_SAVE_ALL_OUT_IGNORE_THE_REST_2oa4rj8j.flags -nstruct 100 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2406449 Using database: database_357d5d93529_n_methylminirosetta_database 
====================================================== DONE :: 100 starting structures 11805.6 cpu seconds This process generated 100 decoys from 100 attempts ====================================================== BOINC :: WS_max 6.34278e+08 12:22:14 (16996): called boinc_finish(0) </stderr_txt> ]]> This was a valid task....I haven't had any invalids in so long.... |
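For anyone comparing a suspect task against a known-good one, the summary lines at the end of a stderr readout like the one above can be pulled out mechanically. This is only a rough sketch: the function name `summarize_stderr` is made up for illustration, and the patterns are based solely on the wording of the sample output in this post, so other application versions may print something different.

```python
import re


def summarize_stderr(text: str) -> dict:
    """Best-effort extraction of summary fields from a Rosetta stderr dump.

    The patterns mirror the sample readout quoted in this thread; they are
    an assumption, not a documented output format.
    """
    cpu = re.search(r"([\d.]+) cpu seconds", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    finish = re.search(r"called boinc_finish\((-?\d+)\)", text)
    return {
        "cpu_seconds": float(cpu.group(1)) if cpu else None,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        "exit_code": int(finish.group(1)) if finish else None,
    }


# Tail of the valid task's readout, copied from the post above.
sample = (
    "DONE :: 100 starting structures 11805.6 cpu seconds "
    "This process generated 100 decoys from 100 attempts "
    "BOINC :: WS_max 6.34278e+08 12:22:14 (16996): called boinc_finish(0)"
)
summary = summarize_stderr(sample)
print(summary)
```

A missing field comes back as `None`, which is itself a hint that the task did not reach the normal end of its run.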
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
"Shouldn't be." Have you misread my post? I'm having problems with AMD, and not Intel. If your AMDs are working fine, this shows AMD is ok. Rosetta recommend the latest Vbox. So if it isn't either, I can't think why my Ryzen has a problem. They all completed successfully here, but failed to validate on the server. "Again, via our view all the tasks on your computer have disappeared." Don't know why that happened. You can see the totals of consecutive validations, but not individual tasks, see https://boinc.bakerlab.org/rosetta/host_app_versions.php?hostid=6167614 under rosetta python projects - "Number of tasks completed 89", "Consecutive valid tasks 0". "If I can find some time tomorrow [Friday] (EU time) I will try to dig into validate errors." I would assume if there was something wrong with the data, I was very unlucky. Assuming the grcpool admin resets my computer, I should be lucky next time. "You are completing the tasks, but get validation inconclusive?" Can't get to such things on my machine, due to grcpool owning the account. |
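The wingman check suggested earlier in the thread (compare your result with what the second computer returned for the same workunit) boils down to a small decision table. A minimal sketch, assuming the two outcomes are read off the task page by hand; the function name `diagnose` and the outcome strings are hypothetical labels, not actual BOINC server states.

```python
def diagnose(my_result: str, wingman_result: str) -> str:
    """Apply the rule of thumb from the thread to one workunit.

    Both arguments are free-text labels ("valid" / "invalid" / anything else)
    chosen for this sketch; they are not official BOINC status codes.
    """
    mine = my_result.strip().lower()
    other = wingman_result.strip().lower()
    if mine == "valid":
        return "no problem on this host"
    if mine == "invalid" and other == "invalid":
        return "suspect the workunit data"
    if mine == "invalid" and other == "valid":
        return "suspect this host"
    return "inconclusive: wait for another result"


print(diagnose("invalid", "valid"))
```

One workunit proves little either way; a host with 89 completed tasks and 0 consecutive valid ones points much more firmly at the host than at bad luck with the data.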
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
"I'm having problems with AMD, and not Intel" That is interesting, because I was working with the Intel vs AMD idea. Except I have more problems with my Intel Xeon cruncher than my AMD Opteron, so pop goes another theory as to why this stuff happens. I have tried 5xx and 6xx Vbox and it seemed to make no difference to my problems. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here. "I'm having problems with AMD, and not Intel" |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here." "I'm having problems with AMD, and not Intel" Why would Intel process the data any differently than AMD? Data is data, a program is a program. Or is Intel garbling the data? And Peter, I am responding late at night and reading fast, so I might misread some details of your post. Sorry. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Anything using VirtualBox, like Rosetta's Python, or anything from LHC, requires hardware virtualisation, which is done differently with Intel and AMD. I can't find any info on what is different other than it's not very significant, but there may be something that causes a bug in one and not the other. But my Intels all work as far as I know (haven't had a validation from the Xeons yet) and my AMD doesn't, which is the opposite of what you get, so perhaps it's nothing to do with AMD/Intel. I do notice however that if I have VirtualBox on all the AMD's cores, the Windows interface slows to a crawl, and I've not seen that with an Intel, so something is different. "So far I've only proved an Intel i5 works, and an AMD Ryzen 9 doesn't. I have 4 old Intel Xeons (X5650, 3 years older than yours) running overnight on python, I'll find out tomorrow if they work and post here." "I'm having problems with AMD, and not Intel" "And Peter, I am responding late at night and reading fast, so I might misread some details of your post. Sorry." I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too. |
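To make the Intel/AMD difference concrete: on Linux, the CPU flags line in `/proc/cpuinfo` lists `vmx` when Intel VT-x is present and `svm` when AMD-V is present. The hosts discussed here run Windows, so the following is purely an illustrative sketch for Linux crunchers; the function name `hardware_virt_flag` and the sample flag strings are made up for the example.

```python
from typing import Optional


def hardware_virt_flag(cpuinfo_text: str) -> Optional[str]:
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V), or None if neither is listed.

    Expects text in the style of Linux /proc/cpuinfo. A missing flag can also
    mean virtualisation is switched off in the BIOS/UEFI.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            _, _, rest = line.partition(":")
            flags = set(rest.split())
            if "vmx" in flags:
                return "vmx"
            if "svm" in flags:
                return "svm"
    return None


# Hypothetical, heavily shortened flag lines for illustration only.
intel_like = "model name\t: Some Intel CPU\nflags\t\t: fpu vme sse2 vmx aes"
amd_like = "model name\t: Some AMD CPU\nflags\t\t: fpu vme sse2 svm aes"
print(hardware_virt_flag(intel_like), hardware_virt_flag(amd_like))
```

On a real Linux host the input would be the contents of `/proc/cpuinfo`. This only confirms that the feature is exposed to the operating system; it says nothing about whether a given VirtualBox task will validate.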
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too." Not me, 1am is my limit. Then I am off to bed and need the 8 hour recharge. 7.5 is the minimum. But I have a very physically demanding job. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Lucky you, I have chronic fatigue :-( "I have trouble sleeping, so my hours are weird, I'm possibly half dozed off sometimes too." |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
"Why would Intel process the data any differently than AMD?" This isn't any data, this is `Python` data, and it will wot funky stuff it wants. [That is a skit on the M&S adverts on UK TV.] |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
Well so far my i5 has worked perfectly, my Ryzen got banned, and my old Xeons I just noticed have spent 24 hours running python tasks with a total of 13 minutes CPU time. I wondered why they felt cold to the touch. There's something terribly wrong with these WUs. These are the two Xeons, I'm in the process of aborting the tasks, if anyone can look and interpret the outputs. https://boinc.bakerlab.org/rosetta/results.php?hostid=6169682 https://boinc.bakerlab.org/rosetta/results.php?hostid=6169697 Make sure you look at the right ones, the ones aborted just now, not the ones aborted yesterday (that was something else when I was trying to set things up). Here is a dodgy one, many errors, please interpret: https://boinc.bakerlab.org/rosetta/result.php?resultid=1463541284 It includes many of these lines: Hypervisor System Log: 24:11:34.575288 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={MachineWrap} aText={The object functionality is limited}, preserve=false aResultDetail=0" I have asked over in the main Boinc forum too, https://boinc.berkeley.edu/dev/forum_thread.php?id=14532 |
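Since the result page "includes many of these lines", it can help to count the COM errors by type before posting them elsewhere. A rough sketch only: the pattern is derived from the single log line quoted in this post, so treat the field layout as an assumption, and `count_com_errors` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern modelled on the one Hypervisor System Log line quoted above.
COM_ERROR = re.compile(
    r"ERROR \[COM\]: aRC=(?P<rc>\w+) \((?P<code>0x[0-9a-fA-F]+)\)"
    r".*?aComponent=\{(?P<component>[^}]*)\}"
    r"\s+aText=\{(?P<text>[^}]*)\}"
)


def count_com_errors(log_text: str) -> Counter:
    """Count COM errors keyed by (result code name, hex code, component, text)."""
    counts = Counter()
    for m in COM_ERROR.finditer(log_text):
        counts[(m["rc"], m["code"], m["component"], m["text"])] += 1
    return counts


line = (
    "24:11:34.575288 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) "
    "aIID={85cd948e-a71f-4289-281e-0ca7ad48cd89} aComponent={MachineWrap} "
    "aText={The object functionality is limited}, preserve=false aResultDetail=0"
)
# Two copies stand in for the "many of these lines" on the result page.
counts = count_com_errors(line + "\n" + line)
for key, n in counts.items():
    print(n, key)
```

If every error collapses into one key, as here, the log is repeating a single failure rather than reporting several different ones.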
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I've asked in the LHC forum, since they use vbox on almost all tasks and might know what the problem is: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5781 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,014 |
I've asked in the LHC forum, since they use vbox on almost all tasks and might know what the problem is: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=5781 Don't confuse vbox (which handles 32-bit work) with vbox64 (which handles 64-bit work). |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
I assume everyone is on vbox64 by now? LHC will be, and they seem to use the same wrapper as Rosetta. I'm not sure what it is you're trying to tell me. I only installed one piece of software, VirtualBox, from the Oracle site, same version that Boinc issues. Are you telling me there are two halves and Rosetta uses a different one from LHC? My i5, which does python ok, has vboxheadless and virtualbox interface listed in the Windows task manager as running, no mention of 32 or 64 bit. After following the advice from the LHC forum, I am no further forwards. My old Xeons don't do any CPU time, my Ryzen (I think, can't check as it's now banned) computes but is not validated, and my i5 runs them perfectly. Same version of everything on all of them. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,014 |
RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone. VirtualBox (at least the latest versions) has two parts, the vbox part for 32-bit work and the vbox64 part for 64-bit work. Rosetta, and probably also LHC, use the vbox64 part. I don't participate in LHC, so I haven't seen what they use. |
Mr P Hucker Send message Joined: 12 Aug 06 Posts: 1600 Credit: 12,116,986 RAC: 9,863 |
"RNA World is still on vbox, but they're down to 19 unfinished workunits. So, not everyone." But LHC and Rosetta are 64 bit? And how does RNA World work, do you have to download an old 32 bit version? |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"Well so far my i5 has worked perfectly, my Ryzen got banned, and my old Xeons I just noticed have spent 24 hours running python tasks with a total of 13 minutes CPU time. I wondered why they felt cold to the touch. There's something terribly wrong with these WUs." It was chugging along just fine and then blows up with access denied? That's weird. Did Windows all of a sudden block it, or did it run into a fault with the data? That it ran 24 hours is really odd. These finish in 4 hours or less. A quick look at the object statement says something went wrong in Vbox. If that happens repeatedly, then you need to remove Vbox and reinstall it. Again, it's very late in the EU, so I will have to dig into it more later. Maybe our two experts can help you more. |
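The "24 hours running with 13 minutes CPU time" symptom can be turned into a quick stall check by comparing CPU time with wall-clock time. A minimal sketch: the 5% threshold is an arbitrary assumption chosen for illustration, the helper names are hypothetical, and treating the reported totals as if they belonged to one task is a simplification.

```python
def cpu_efficiency(cpu_seconds: float, elapsed_seconds: float) -> float:
    """Fraction of wall-clock time actually spent computing."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def looks_stalled(cpu_seconds: float, elapsed_seconds: float,
                  threshold: float = 0.05) -> bool:
    """Flag a task whose CPU share falls below the (assumed) threshold."""
    return cpu_efficiency(cpu_seconds, elapsed_seconds) < threshold


# Figures from the thread: about 13 minutes of CPU time over 24 hours.
ratio = cpu_efficiency(13 * 60, 24 * 3600)
print(f"{ratio:.2%}", looks_stalled(13 * 60, 24 * 3600))
```

At under 1% the virtual machine is effectively idle, which fits the COM "access denied" errors better than it fits slow hardware.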
©2024 University of Washington
https://www.bakerlab.org