Client Errors

Message boards : Number crunching : Client Errors

wbblakemore

Joined: 18 Dec 07
Posts: 33
Credit: 4,181
RAC: 0
Message 72992 - Posted: 4 May 2012, 22:55:57 UTC

OK .... we're coming up on three months since this thread was opened back in mid-February. At this point, I'm tempted to just write Rosetta off as a bad idea.

How about it, support people? Are we any closer to a fix for this problem?
ID: 72992
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,756,248
RAC: 13,174
Message 72995 - Posted: 5 May 2012, 10:49:57 UTC - in response to Message 72992.  

OK .... we're coming up on three months since this thread was opened back in mid-February. At this point, I'm tempted to just write Rosetta off as a bad idea.

How about it, support people? Are we any closer to a fix for this problem?


I AGREE, since Ralph works WHY have they not discussed moving everything over there and at least doing SOMETHING worthwhile?!!! For a project like Rosetta and as big as they say they are and to get the funding they do this is PATHETIC!!!
ID: 72995
(retired account)

Joined: 4 May 12
Posts: 5
Credit: 200,841
RAC: 0
Message 72996 - Posted: 5 May 2012, 12:24:32 UTC - in response to Message 72714.  

Is there anyone with our problem that does not have all of the following attributes:

1) one or more NVIDIA GPU's, &
2) running Win7 64-bit, &
3) Intel I7 processor ?



Joined yesterday to participate in the Pentathlon, but all work units so far have failed with client errors.

1) yes
2) yes
3) no, only an AMD Phenom II X6

Stopped Einstein GPU units, but no effect, still client errors. Since I lack the time for tweaking at the moment, I will try to participate only with my subnotebook, which is unfortunately a lot weaker.

System specs for the record:
CPU: AMD Phenom II X6 1090T @ 3.20GHz (stock speed)
RAM: 8GB
GPU: NVIDIA GeForce GTX 560 Ti (2048MB) driver: 285.62
OS: Win7 Prof. x64 Edition
BOINC: 7.0.25 (64bit)

Regards
ID: 72996
A.M.

Joined: 13 Jun 06
Posts: 12
Credit: 954,586
RAC: 0
Message 72998 - Posted: 5 May 2012, 18:02:38 UTC

I've been getting some good WUs... still a lot of errors, although not of the type seen previously.

Most of what I'm seeing right now seems to be memory Access Violations.
ID: 72998
(retired account)

Joined: 4 May 12
Posts: 5
Credit: 200,841
RAC: 0
Message 73008 - Posted: 6 May 2012, 17:54:32 UTC - in response to Message 72996.  

Joined yesterday to participate in the Pentathlon, but all work units so far have failed with client errors.


Footnote: astonishingly enough I have accumulated credits without getting granted credits... ?
ID: 73008
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73010 - Posted: 7 May 2012, 1:34:28 UTC

R@h awards credit, even for tasks that end with computation errors. This credit is done on a daily basis, and is not reflected on the work units display of granted credit. You have to look at each specific task's details to see the granted credit.

Welcome aboard!

I see your 6 CPU system is having consistent client errors. What BOINC version are you running on that machine?
Rosetta Moderator: Mod.Sense
ID: 73010
woland

Joined: 17 Dec 05
Posts: 5
Credit: 124,792
RAC: 0
Message 73012 - Posted: 7 May 2012, 9:23:43 UTC

Guys, I'm sorry to say this, but this is really embarrassing. I'm also a software developer, I also have to deal with user-reported bugs, and I cannot imagine having a bug reported 3 months ago, with tons of data to reproduce the issue, and no answer. There's a bug in the validation code - where else could it be? Results are calculated correctly but marked as invalid because of the CUDA information in them. How long can it take to debug the validation code and handle the uncaught parsing exception, or whatever it is...
ID: 73012
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,756,248
RAC: 13,174
Message 73013 - Posted: 7 May 2012, 10:39:47 UTC - in response to Message 73012.  

Guys, I'm sorry to say this, but this is really embarrassing. I'm also a software developer, I also have to deal with user-reported bugs, and I cannot imagine having a bug reported 3 months ago, with tons of data to reproduce the issue, and no answer. There's a bug in the validation code - where else could it be? Results are calculated correctly but marked as invalid because of the CUDA information in them. How long can it take to debug the validation code and handle the uncaught parsing exception, or whatever it is...


I am not sure they care enough yet! Sure they care some but as long as people are still sending back units Rosetta is turning out the research, it is just NOT as helpful as it should be. When their workunits dry up, probably not anytime soon as they are STILL doing Challenges even now, then they will say 'sorry we missed it, it was a bug in a couple of lines of code and should be fixed now, yada, yada, yada'! I wish we had the power to write some of their sponsors and put a bug in their ear about the problems! I don't say that to be mean, I LIKE Rosetta!!! Rosetta is getting to be like the GSA movies, 'PARTY ON DUDE, the money is rolling in, who cares that we have very little results to show for it!' I just don't think they care about the crunchers right now, too many OTHER things going on!! Many years ago Seti had a problem, they stopped sending out units instead of overloading the server to send out a unit, get it back as bad and having to send it out again and again and again!! As bad as Seti could, and can, be, it STILL did some things to perfection!!!
ID: 73013
In Memory of Kimsey M Fowler Sr

Joined: 10 Mar 12
Posts: 26
Credit: 39,033,222
RAC: 0
Message 73031 - Posted: 9 May 2012, 18:59:29 UTC - in response to Message 73013.  

Guys, I'm sorry to say this, but this is really embarrassing. I'm also a software developer, I also have to deal with user-reported bugs, and I cannot imagine having a bug reported 3 months ago, with tons of data to reproduce the issue, and no answer. There's a bug in the validation code - where else could it be?


I'm just shaking my head in frustration about this too... I'm a former software engineer from an SEI CMM Level 5 software design organization in Seattle. I was dumbstruck by an e-mail back from Rosetta staff yesterday saying that no further effort will be expended to determine why the Rosetta servers are rejecting WUs. One can assume that the staff doesn't see this problem as widespread enough to make it worth their time to look into. Yesterday's post by David Baker suggests he is thrilled with the computing power available to the project at the present time.

Might I suggest that those of you experiencing this problem consider donating your computing resources to folding@home at Stanford Medical School. Their software is self-contained, stable, doesn't run under BOINC middleware, computes on your choice of CPU and/or GPU, has an excellent working simulation of the protein molecule that can be manipulated with the mouse to view/rotate/enlarge/etc, lots of interesting information that's easily accessible about each protein you're folding and why it is important, and is the world's largest computing network.
ID: 73031
wbblakemore

Joined: 18 Dec 07
Posts: 33
Credit: 4,181
RAC: 0
Message 73032 - Posted: 9 May 2012, 20:13:39 UTC - in response to Message 73031.  



I'm just shaking my head in frustration about this too... I'm a former software engineer from an SEI CMM Level 5 software design organization in Seattle. I was dumbstruck by an e-mail back from Rosetta staff yesterday saying that no further effort will be expended to determine why the Rosetta servers are rejecting WUs. One can assume that the staff doesn't see this problem as widespread enough to make it worth their time to look into. Yesterday's post by David Baker suggests he is thrilled with the computing power available to the project at the present time.

Might I suggest that those of you experiencing this problem consider donating your computing resources to folding@home at Stanford Medical School. Their software is self-contained, stable, doesn't run under BOINC middleware, computes on your choice of CPU and/or GPU, has an excellent working simulation of the protein molecule that can be manipulated with the mouse to view/rotate/enlarge/etc, lots of interesting information that's easily accessible about each protein you're folding and why it is important, and is the world's largest computing network.


Thanks for the update. Words simply fail me when I try to express my contempt for support staff that can't be bothered with actually providing support.

My best regards to all those valiant users who tried to assist in dealing with this issue. You're good people who have earned my respect. I'm outta here ...
ID: 73032
woland

Joined: 17 Dec 05
Posts: 5
Credit: 124,792
RAC: 0
Message 73033 - Posted: 9 May 2012, 20:38:14 UTC - in response to Message 73031.  

I was dumbstruck by an e-mail back from Rosetta staff yesterday saying that no further effort will be expended to determine why the Rosetta servers are rejecting WUs.

Please tell me that this is a joke...
ID: 73033
Sky King

Joined: 28 Feb 12
Posts: 11
Credit: 15,912
RAC: 0
Message 73034 - Posted: 9 May 2012, 20:51:19 UTC - in response to Message 73031.  
Last modified: 9 May 2012, 20:53:00 UTC

Might I suggest that those of you experiencing this problem consider donating your computing resources to folding@home at Stanford Medical School. Their software is self-contained, stable, doesn't run under BOINC middleware, computes on your choice of CPU and/or GPU, has an excellent working simulation of the protein molecule that can be manipulated with the mouse to view/rotate/enlarge/etc, lots of interesting information that's easily accessible about each protein you're folding and why it is important, and is the world's largest computing network.


If people want some help/advice on F@H, there are probably some here who can provide a lot of insight and help in getting started. I myself am a 10 million point F@H contributor, and have been at the very bleeding edge of beta'ing the newest Windows SMP, Windows GPU, and linux VM appliance clients.

I don't want to use up a lot of R@H's forum space touting a "competitor" but here are my observations about F@H. Running a basic F@H client as a service in the background of your PC is very straightforward and involves very little interaction from you, the user. Install it, fire it up, let it run, and check on your stats every week or so, and you're good to go, and maybe churn out 2,000 PPD. However, as you move up the performance curve to optimized SMP or GPU configs and you're trying to squeeze out every last bonus point, the workload increases quite a bit. That was kind of the downside of F@H for me, I was squeezing every last point out of my i7 and getting big bonuses. (Under an initiative called "-bigadv", users of higher end i7s and up can opt in to a bonus program where you get time-sensitive bonus points for returning very large, complex units quickly--like 3 day deadlines.) In fact, I couldn't run the GPU client because the i7 needs every spare cycle in order to make the deadlines and thus be bonus eligible... and the bonuses for this scientifically urgent work were way more points than my ATI 4850 could churn.

After over a year of being deeply involved in profiling the performance optimization of the i7 using both Windows native SMP clients and VMware linux appliances, suddenly I got hammered by Stanford about 3 months ago... 8 core i7's are no longer bonus eligible, you have to be running at least 16 cores on the same WU or you can't make the deadlines and you lose all your points. Not even a high overclock can get you home on 8 cores. Suddenly I was going from 20,000 PPD on my i7 alone to a max of about 3 or 4,000 PPD.

Feeling somewhat abandoned, but committed to protein folding, I decided to bail out of the huge workload of super-optimized F@H and opted into the simple life of BOINC-managed folding at R@H. What could be easier, this will be great!

Of course, I got 8 WUs a day and couldn't figure out why my CPU was never busy, and when I investigated, I found the client error issue, hence my bump of this thread a few months ago.

So I am right back where I started... Big ass CPU, big ass GPU, and feeling like no one is all that jazzed about my willingness to contribute it all to optimized folding. R@H doesn't want my cycles because I don't want to have to downgrade back to my ATI 4850 card and abandon my brand new NV 560.

So, I can pull my new nvidia 560 out and run R@H... or I can leave it in and run pretty "stock" F@H clients easily for about 2K PPD, or I can put in a lot of work and run a carefully managed, optimized F@H config and maybe get 6K PPD in return for the huge fan noise, heat, and power cost associated with running the CPU and both GPU cores at 100%.

I haven't decided what to do. For now, I have the i7 on BOINC with all my cycles going to the World Community Grid. I was staying on BOINC in the hopes that R@H would be fixed, but I will probably wait for a long weekend with some down time and switch my iron back to F@H.

But my point is, if people here need help with F@H, there are some pretty experienced folders here.
ID: 73034
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,756,248
RAC: 13,174
Message 73035 - Posted: 10 May 2012, 10:21:50 UTC - in response to Message 73034.  

Might I suggest that those of you experiencing this problem consider donating your computing resources to folding@home at Stanford Medical School. Their software is self-contained, stable, doesn't run under BOINC middleware, computes on your choice of CPU and/or GPU, has an excellent working simulation of the protein molecule that can be manipulated with the mouse to view/rotate/enlarge/etc, lots of interesting information that's easily accessible about each protein you're folding and why it is important, and is the world's largest computing network.


If people want some help/advice on F@H, there are probably some here who can provide a lot of insight and help in getting started. I myself am a 10 million point F@H contributor, and have been at the very bleeding edge of beta'ing the newest Windows SMP, Windows GPU, and linux VM appliance clients.

I don't want to use up a lot of R@H's forum space touting a "competitor" but here are my observations about F@H. Running a basic F@H client as a service in the background of your PC is very straightforward and involves very little interaction from you, the user. Install it, fire it up, let it run, and check on your stats every week or so, and you're good to go, and maybe churn out 2,000 PPD. However, as you move up the performance curve to optimized SMP or GPU configs and you're trying to squeeze out every last bonus point, the workload increases quite a bit. That was kind of the downside of F@H for me, I was squeezing every last point out of my i7 and getting big bonuses. (Under an initiative called "-bigadv", users of higher end i7s and up can opt in to a bonus program where you get time-sensitive bonus points for returning very large, complex units quickly--like 3 day deadlines.) In fact, I couldn't run the GPU client because the i7 needs every spare cycle in order to make the deadlines and thus be bonus eligible... and the bonuses for this scientifically urgent work were way more points than my ATI 4850 could churn.

After over a year of being deeply involved in profiling the performance optimization of the i7 using both Windows native SMP clients and VMware linux appliances, suddenly I got hammered by Stanford about 3 months ago... 8 core i7's are no longer bonus eligible, you have to be running at least 16 cores on the same WU or you can't make the deadlines and you lose all your points. Not even a high overclock can get you home on 8 cores. Suddenly I was going from 20,000 PPD on my i7 alone to a max of about 3 or 4,000 PPD.

Feeling somewhat abandoned, but committed to protein folding, I decided to bail out of the huge workload of super-optimized F@H and opted into the simple life of BOINC-managed folding at R@H. What could be easier, this will be great!

Of course, I got 8 WUs a day and couldn't figure out why my CPU was never busy, and when I investigated, I found the client error issue, hence my bump of this thread a few months ago.

So I am right back where I started... Big ass CPU, big ass GPU, and feeling like no one is all that jazzed about my willingness to contribute it all to optimized folding. R@H doesn't want my cycles because I don't want to have to downgrade back to my ATI 4850 card and abandon my brand new NV 560.

So, I can pull my new nvidia 560 out and run R@H... or I can leave it in and run pretty "stock" F@H clients easily for about 2K PPD, or I can put in a lot of work and run a carefully managed, optimized F@H config and maybe get 6K PPD in return for the huge fan noise, heat, and power cost associated with running the CPU and both GPU cores at 100%.

I haven't decided what to do. For now, I have the i7 on BOINC with all my cycles going to the World Community Grid. I was staying on BOINC in the hopes that R@H would be fixed, but I will probably wait for a long weekend with some down time and switch my iron back to F@H.

But my point is, if people here need help with F@H, there are some pretty experienced folders here.


I think Poem does folding type work too but under Boinc.
ID: 73035
mikey
Joined: 5 Jan 06
Posts: 1894
Credit: 8,756,248
RAC: 13,174
Message 73036 - Posted: 10 May 2012, 10:27:35 UTC - in response to Message 73031.  

Guys, I'm sorry to say this, but this is really embarrassing. I'm also a software developer, I also have to deal with user-reported bugs, and I cannot imagine having a bug reported 3 months ago, with tons of data to reproduce the issue, and no answer. There's a bug in the validation code - where else could it be?


I'm just shaking my head in frustration about this too... I'm a former software engineer from an SEI CMM Level 5 software design organization in Seattle. I was dumbstruck by an e-mail back from Rosetta staff yesterday saying that no further effort will be expended to determine why the Rosetta servers are rejecting WUs. One can assume that the staff doesn't see this problem as widespread enough to make it worth their time to look into. Yesterday's post by David Baker suggests he is thrilled with the computing power available to the project at the present time.


I agree with the total frustration being expressed above! I have over 1 million Rosetta credits and will not earn a single one more, as Rosetta does NOT CARE anymore!! Rosetta, YOU are a selfish project with your sights set so low that you are at present unable, but more likely unwilling, to make YOUR project work alongside other projects as Boinc itself is DESIGNED TO DO!!! In the future I expect to see Rosetta on the trash pile of projects that could have been something, but instead died off!
ID: 73036
Lance Stringham

Joined: 8 Oct 06
Posts: 3
Credit: 38,575,303
RAC: 8,363
Message 73237 - Posted: 6 Jun 2012, 9:53:48 UTC

Any progress on resolving this server validation bug with clients who have gpus installed? I'm still being affected by it and there has been no new information in this thread for a while. I really am starting to lose my patience with this problem.

Thank you.
ID: 73237
woland

Joined: 17 Dec 05
Posts: 5
Credit: 124,792
RAC: 0
Message 73238 - Posted: 6 Jun 2012, 9:58:54 UTC

No, they simply don't care. Sorry Rosetta, I've already left you for Poem. If you don't care - why should I?
ID: 73238
peristalsis

Joined: 29 Mar 09
Posts: 8
Credit: 2,421,694
RAC: 0
Message 73263 - Posted: 9 Jun 2012, 9:43:18 UTC

I take a look at my Boinc messages this morning to see how things are going. I see a lot of errors with Rossmann2x3. Check here and see all of my errors duplicated by another machine. It's a relief that it is a problem with Rosetta and not my machine. It is not an enjoyable experience knowing I've wasted some of my bandwidth allowance on processing crap coding. Aborted the remaining Rossmann2x3 unit. Calm down, it's not important, life is not perfect. Just blowing off steam...p
ID: 73263
The-Real-Link

Joined: 27 Dec 10
Posts: 6
Credit: 2,676,652
RAC: 0
Message 73279 - Posted: 13 Jun 2012, 21:44:19 UTC
Last modified: 13 Jun 2012, 21:46:37 UTC

Hey guys, same problem here for my E5645 config. Now it's interesting, I was able to run with my old E5620 system for months without any issues at all, and then they started failing. Oddly enough though, I can't even get these new processors to complete a valid unit at all.

I let the project stay detached for a good week or so and then it did seem to fix itself by downloading my preferred workload (several days) as it queued up a few dozen units. They all appeared to be crunched successfully and uploaded, yet, on my stats page there are pages of "over" and "client errors" shown.

Also running an EVGA board, EVGA GTX 680, with Windows 7 x64. I wouldn't mind crunching for this project but I simply can't get any work.

Despite my log saying the work is successful and that the project was also successfully uploaded, my work queue is stuck at 8 per day (which is odd because that would be true with my old E5620s but not my E5645s) - I'd imagine I should be seeing a minimum of 12 units per day. I turn work in and yet don't see any more than the 8 come back when I should see a doubling if I understand it right. Any help is appreciated.

Sorry for the rambling, just frustrated.
ID: 73279
Sid Celery

Joined: 11 Feb 08
Posts: 1981
Credit: 38,388,808
RAC: 11,641
Message 73280 - Posted: 14 Jun 2012, 1:48:31 UTC - in response to Message 73279.  

Hey guys, same problem here for my E5645 config. Now it's interesting, I was able to run with my old E5620 system for months without any issues at all, and then they started failing. Oddly enough though, I can't even get these new processors to complete a valid unit at all.

I let the project stay detached for a good week or so and then it did seem to fix itself by downloading my preferred workload (several days) as it queued up a few dozen units. They all appeared to be crunched successfully and uploaded, yet, on my stats page there are pages of "over" and "client errors" shown.

Also running an EVGA board, EVGA GTX 680, with Windows 7 x64. I wouldn't mind crunching for this project but I simply can't get any work.

Despite my log saying the work is successful and that the project was also successfully uploaded, my work queue is stuck at 8 per day (which is odd because that would be true with my old E5620s but not my E5645s) - I'd imagine I should be seeing a minimum of 12 units per day. I turn work in and yet don't see any more than the 8 come back when I should see a doubling if I understand it right. Any help is appreciated.

Sorry for the rambling, just frustrated.

Urgh... Usually I can see something obvious, but the spec of your machine looks high (more than mine anyway) - no idea why yours aren't validating when they seem to complete successfully.

Take a look at this message and see if you can spot anything in your Boinc manager settings that might be a problem. If you can't then it's a real mystery. I doubt it's anything to do with your 1 hour run setting :(
ID: 73280
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 73281 - Posted: 14 Jun 2012, 3:01:18 UTC
Last modified: 14 Jun 2012, 3:09:47 UTC

The most obvious thing that follows the recent pattern is that you are running the newer version of BOINC Manager:
<core_client_version>7.0.25</core_client_version>

...which is the topic in this thread.
Rosetta Moderator: Mod.Sense
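
For anyone who wants to check their own task output for the same pattern, here is a minimal sketch in Python. It assumes the task's output text contains a `<core_client_version>` tag like the one quoted above. The helper names are made up for illustration and are not part of BOINC, and the 7.0.0 cutoff is only an assumption standing in for "the newer version", since this thread names 7.0.25 and nothing more precise.

```python
import re

# Hypothetical helpers for illustration only; these are not BOINC APIs.
VERSION_TAG = re.compile(
    r"<core_client_version>\s*(\d+)\.(\d+)\.(\d+)\s*</core_client_version>"
)

# Assumed cutoff for "newer" clients; the thread itself only mentions 7.0.25.
ASSUMED_NEW_CLIENT = (7, 0, 0)


def core_client_version(task_output):
    """Return (major, minor, release) from the tag, or None if it is absent."""
    match = VERSION_TAG.search(task_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_newer_client(task_output):
    """True when the reported client version is at or above the assumed cutoff."""
    version = core_client_version(task_output)
    return version is not None and version >= ASSUMED_NEW_CLIENT


sample = "<core_client_version>7.0.25</core_client_version>"
print(core_client_version(sample))
print(is_newer_client(sample))
```

Comparing the versions as tuples of integers avoids the string-comparison trap where "7.0.9" would sort after "7.0.25".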
ID: 73281



©2024 University of Washington
https://www.bakerlab.org