Message boards : Number crunching : Current issues with 7+ boinc client
Author | Message |
---|---|
mmstick Send message Joined: 4 Dec 12 Posts: 8 Credit: 606,792 RAC: 0 |
I had no idea this was a problem. I've been crunching with my Radeon HD 7950 in World Community Grid and POEM@Home while doing Rosetta@home tasks and never had a single problem with invalidated or errored work units; Using BOINC v7 as well. Wrong, don't try to look at my stuff; I don't run this project on anything but an old laptop. I aborted all tasks about a week ago on my desktop after I switched completely to World Community Grid because it demands my entire CPU to keep my GPU fed (note my RAC; a high-end desktop CPU would not have a low RAC). The compute error was caused by restarting the client abruptly. All work units passed successfully until I switched. I merely ran this project for one day on my high-end desktop (FX-8120@4GHz+HD7950 in POEM/WCG). Not a single task failed. This has nothing to do with AMD GPUs as far as I am concerned, nor do I see why it would be involved with NVIDIA GPUs. |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
I had no idea this was a problem. I've been crunching with my Radeon HD 7950 in World Community Grid and POEM@Home while doing Rosetta@home tasks and never had a single problem with invalidated or errored work units; Using BOINC v7 as well. There are many different system configurations. Yours apparently isn't affected by the bug, but most, if not ALL, of the systems suffering from this bug have a GPU involved. It shouldn't really matter anyway; it's a Rosetta@home problem, not a cruncher problem. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
I think I've said this before, and I'm going to say it again. :) I thought they had said they could not use the gpu because of the way the processing is done? |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
I had no idea this was a problem. I've been crunching with my Radeon HD 7950 in World Community Grid and POEM@Home while doing Rosetta@home tasks and never had a single problem with invalidated or errored work units; Using BOINC v7 as well. So you ran Poem, WCG and Rosetta at the same time and had no problems? I have a large handful of pc's and can run Rosetta ONLY on those that are NOT also running gpu projects. Any time I try to run Rosetta units on a pc that is ALSO crunching with the gpu the Rosetta units fail, no matter whether it is an AMD or an Nvidia gpu in the machine. |
mmstick Send message Joined: 4 Dec 12 Posts: 8 Credit: 606,792 RAC: 0 |
Yes, I was running POEM and WCG GPU projects and Rosetta@home at the same time. Not a single error. |
microchip Send message Joined: 10 Nov 10 Posts: 10 Credit: 2,242,206 RAC: 1,430 |
Yes, I was running POEM and WCG GPU projects and Rosetta@home at the same time. Not a single error. Same here on Linux with BOINC 6.12.34. Everything runs fine even with GPU crunching. Team Belgium |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
Yes, I was running POEM and WCG GPU projects and Rosetta@home at the same time. Not a single error. I guess we can now see why the issue hasn't been fixed yet, because it is only seen by some people, not everyone! That is like taking your car to the mechanic because it is making a noise and when you get there it stops! |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,888,112 RAC: 10,542 |
One more fresh example of 100% error rate at R@H https://boinc.bakerlab.org/rosetta/results.php?hostid=1582894&offset=20 A Kepler GPU as well (this time the mobile version, in a notebook). The errors started on 21.12.12 after a video driver update. With driver version 306.97, R@H ran relatively normally. After the update to the 310.70 NV driver: 100% error rate at the validation stage. |
JAMES DORISIO Send message Joined: 25 Dec 05 Posts: 15 Credit: 200,879,781 RAC: 36,784 |
I just upgraded another computer to Ubuntu linux 12.04 amd64, nvidia driver 310.14, Boinc 7.0.27, all downloaded from the Ubuntu repository. I have the run time preference set at 12 hours. https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1485068 This computer was running Ubuntu 10.04 amd64, nvidia driver 304.**, Boinc 6.10.17. All hardware remained the same; it was and still is running GPU work from GPUgrid and WCG on a GTS450. The only change was the upgrade to Ubuntu 12.04, along with the new versions of Boinc and the nvidia drivers that came with it. Before the upgrade it ran with no errors; after the upgrade it has produced 3 errors out of 3 work units. I have stopped new tasks from Rosetta@home for now. It is successfully completing human proteome folding phase 2 work units from WCG, which use Rosetta software. The computer below is also affected by this bug. See message 74598 https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1579123 I have 3 more computers to upgrade, but it looks like they will not be able to run here if I do. For now I will hold off. It would be nice if someone from the Rosetta staff could post here so we know they are looking into this. Thanks Jim |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
I just upgraded another computer to Ubuntu linux 12.04 amd64, nvidia driver 310.14, Boinc 7.0.27, all downloaded from the Ubuntu repository. I have the run time preference set at 12 hours. The guy that started this thread is a 'Rosetta guy', but he hasn't come back very much!! MERRY CHRISTMAS!! |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,888,112 RAC: 10,542 |
One more fresh example of 100% error rate at R@H If I turn off the Kepler GPU in the notebook BIOS (switch to integrated graphics), the R@H errors are gone. Turn it on, and all WUs fail after validation. Revert to the old (v. 306.97) video driver, and WUs finish OK again. So now the source of the problem is clear: some sort of conflict between R@H and the latest NV video drivers (and the drivers must be active, not just installed on the system). |
tanstaafl9999 Send message Joined: 8 Mar 12 Posts: 2 Credit: 1,688,827 RAC: 0 |
Just to add to the confusion: I stopped doing Rosetta WU's (the 3.45 ones) several weeks ago because of a 100% failure rate. After all the comments here about the possible connection to GPU crunching, I decided to (temporarily) stop doing GPU work units and see if that would let me do Rosetta WU's without problems. Without any GPU WU's, I successfully completed several Rosetta WU's. So I decided to start doing GPU work units again to see if that would cause the Rosetta WU's to start failing. After a couple of days running both GPU and Rosetta WU's, I've only had one Rosetta WU fail... (Before, I was getting a 100% failure rate when I was running Rosetta and GPU work units together.) I'm using the exact same AMD video drivers and I haven't made any hardware changes to my computer recently. It seems to be working OK at the moment, so I'll keep running GPU and Rosetta work units for the time being and hope it keeps on working. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
One more fresh example of 100% error rate at R@H And NOT just Nvidia drivers though, I have a handful of AMD cards and those machines get nothing but errors too. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
Just to add to the confusion: GOOD LUCK and I hope it keeps working for you! |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,888,112 RAC: 10,542 |
Please add one or two examples of the same error on a computer with ATI / AMD cards (and not NV+ATI cards in the same computer) to our collection. I have seen several dozen computers with this problem, and all of them had an nVidia card installed. In 3 of them, replacing or turning off the NV card solved the problem (just stopping crunching on it was not enough). |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
I am crunching for other projects right now so can't do that, but NONE of my machines with AMD cards ALSO have Nvidia cards so I am the example. On my list of pc's, you have to click on All Computers, only my Servers do not have AMD cards in them, they are too old and do not even have pci-e slots in them. They have standalone gpu's, but not ones that can crunch. Sorry I do not see any workunits listed under any of my pc's though. |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 25,888,112 RAC: 10,542 |
2 mikey Yes, I checked your computer list before my post. But R@H does not show any information about installed GPUs (due to very old server-side BOINC code), which is why I asked about the NV card. And yes, no workunits are listed because you stopped crunching R@H long ago (at least a few months ago) and all the WUs you completed have already been removed from the database. So if you have a little time, you could connect one or two machines back (or allow new work if they are just in NNT mode) that have only an AMD card and where you saw the bug before, and let R@H work for, say, half a day before switching back to your main projects. Or set the target CPU time to 1 hr and run for just a few hours, to minimize the loss of resources. I am just curious (and this may also help the project programmers pinpoint the cause of the error) whether it is exactly the same bug on hosts with AMD cards, or something else. On "problem" computers with NV cards it looks like this: 1. Not just a very large number of errors, but all (100%) WUs fail with no exceptions. 2. The error appears only after WU validation on the server; there are no visible errors on the client side while the WU runs or in the WU logs. 3. Information about the Rosetta version is missing from the task logs. In addition, physically removing the NV card (or turning it off in the BIOS on laptops) stops these errors (not sure whether in all or just in some cases; too few statistics). |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
2 mikey Give me a couple of days, I am busy with family at the moment AND almost at a mini goal on a cpu project, then I will connect one or two and see what happens. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
Mad Max I have added two pc's to Rosetta...the first has NO gpu but is running Boinc version 7.0.40. The second is also running Boinc 7.0.40 but DOES have an AMD gpu in it and it is crunching for Collatz. We will see what happens from here! |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,152,433 RAC: 4,296 |
Mad Max I have added two pc's to Rosetta...the first has NO gpu but is running Boinc version 7.0.40. The second is also running Boinc 7.0.40 but DOES have an AMD gpu in it and it is crunching for Collatz. [update] BOTH units completed and got credits! I guess I will bring more pc's over. |