No credit!

Message boards : Number crunching : No credit!

Pixiebot
Joined: 6 Nov 05
Posts: 50
Credit: 60,515
RAC: 0
Message 7246 - Posted: 22 Dec 2005, 20:31:37 UTC
Last modified: 22 Dec 2005, 20:34:50 UTC

Windows 98SE. Rosetta 4.81.

The job below took at least 1 hour 40 minutes, ran to 100%, the job returned as valid, yet no credit. Why?

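Sent: 22 Dec 2005 15:01:49 UTC | Reported: 22 Dec 2005 20:20:10 UTC | Server state: Over | Outcome: Success | Client state: Done | CPU time (sec): 0.00 | Claimed credit: 0.00 | Granted credit: 0.00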

computer summary

Just checked a Windows ME machine, same thing.

Is there any point in running 9X machines?

I know the minimum required is XP but these machines have run well up to now, until Rosetta 4.81 that is :(

Oh and I'm now seeing errors on all my machines.... Not good.

When will these bad jobs flush out of the system?




Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 7249 - Posted: 22 Dec 2005, 20:40:17 UTC
Last modified: 22 Dec 2005, 20:41:28 UTC

Well, you were issued 0.001 for a WU that took 0 secs here :)

as well as the one you noted: here


The problem is that you say it took 100 mins, but the server shows it taking 0 secs.

Pixiebot
Joined: 6 Nov 05
Posts: 50
Credit: 60,515
RAC: 0
Message 7251 - Posted: 22 Dec 2005, 20:42:34 UTC

One can hardly call that credit! However, yes you are right, the job took longer than the credit granted.

Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,595,050
RAC: 2,731
Message 7252 - Posted: 22 Dec 2005, 20:49:56 UTC - in response to Message 7246.  
Last modified: 22 Dec 2005, 20:50:31 UTC

Just checked a Windows ME machine, same thing.

Is there any point in running 9X machines?


I didn't realize Rosetta's minimum was Win2000/XP until today. Some projects go back as far as Win95, so BOINC itself will handle back that far. However, the "not reporting CPU time" is pretty common on Win9x, it's just not a big deal at other projects because of the quorum. You request 0 credits, so you're the "low" value that's thrown out, and all of you get the "middle" value. Here, with no quorum and "you get what you ask for"...
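
A rough sketch of the quorum behaviour described above, assuming a quorum of at least three results where the highest and lowest claims are discarded and the rest averaged. The function names and the sample claims are hypothetical; this is not the actual BOINC validator code.

```python
def granted_credit_quorum(claims):
    """Drop the highest and lowest claim, then average what remains."""
    if len(claims) < 3:
        raise ValueError("need at least three claims to trim high and low")
    trimmed = sorted(claims)[1:-1]
    return sum(trimmed) / len(trimmed)


def granted_credit_no_quorum(claim):
    """With no quorum, the host is simply granted whatever it claimed."""
    return claim


# Hypothetical claims: a Win9x host that reported 0 CPU time claims 0.0.
claims = [0.0, 12.5, 14.0]
quorum_grant = granted_credit_quorum(claims)      # the 0.0 claim is thrown out
solo_grant = granted_credit_no_quorum(claims[0])  # the 0.0 claim is all you get
```

Under a quorum the zero claim is harmless because it is the discarded "low" value; without one, it becomes the granted credit.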

Once flops-counting is implemented, it shouldn't matter, as that will replace the benchmark*time approach. No idea when that will be, however.
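
To illustrate why a missing CPU time wipes out the claim under the benchmark*time approach, here is a minimal sketch. It assumes only that the claim is proportional to reported CPU seconds multiplied by a benchmark score; the scaling constant and the benchmark value are made-up placeholders, not BOINC's real formula.

```python
# Hypothetical scaling constant; the real conversion factor is not given here.
CREDIT_PER_BENCHMARK_SECOND = 1.0e-6


def claimed_credit(cpu_seconds, benchmark_score):
    """Claim is proportional to reported CPU time times the benchmark score."""
    return cpu_seconds * benchmark_score * CREDIT_PER_BENCHMARK_SECOND


benchmark = 1500.0  # hypothetical benchmark score for the host

# 100 minutes of real work, correctly reported:
normal_claim = claimed_credit(100 * 60, benchmark)

# Same work, but the OS reported 0 CPU seconds to the client:
zero_claim = claimed_credit(0.0, benchmark)
```

However long the job really ran, a reported CPU time of zero makes the product zero, which is why the benchmark score cannot rescue the claim.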

Andrew
Joined: 19 Sep 05
Posts: 162
Credit: 105,512
RAC: 0
Message 7253 - Posted: 22 Dec 2005, 20:57:39 UTC - in response to Message 7252.  
Last modified: 22 Dec 2005, 20:58:00 UTC

Once flops-counting is implemented, it shouldn't matter, as that will replace the benchmark*time approach. No idea when that will be, however.


So unless you want to crunch for no credit... suspend until a new client is released that addresses this issue :(

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7261 - Posted: 22 Dec 2005, 21:51:20 UTC - in response to Message 7246.  
Last modified: 22 Dec 2005, 21:52:22 UTC

Windows 98SE. Rosetta 4.81.... no credit
....Just checked a Windows ME machine, same thing.

Is there any point in running 9X machines?


This ME box has had credit from the 4.81 app. I think you have just been hit worse than most with the batches of bad WU that are coming round just now.

If you or Bill ask I can boot up my Win98SE box again to test 4.81 (it last ran with app 4.80) -- currently it's booted into Linux.

River~~

Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,595,050
RAC: 2,731
Message 7267 - Posted: 22 Dec 2005, 22:04:37 UTC - in response to Message 7261.  

This ME box has had credit from the 4.81 app. I think you have just been hit worse than most with the batches of bad WU that are coming round just now.

If you or Bill ask I can boot up my Win98SE box again to test 4.81 (it last ran with app 4.80) -- currently it's booted into Linux.


River, I'm not sure that it would tell us anything. The "not reporting the correct amount of CPU time" issue has been known on Win95/Win98/WinME since the beginning of BOINC, and it comes down to the OS not being "true multitasking". Sometimes it works - sometimes it doesn't. ONE fix I've heard recommended is to either set "leave applications in memory when preempted" to _no_ (just the opposite of the normal Rosetta recommendation), or to run only a single project, so it doesn't get switched out. I don't think either guarantees success, but they improve your odds.

I really doubt it's a 4.80 vs 4.81 issue. If anything, 4.81 probably checkpoints _more_, but I don't see why that would cause the problem.

Pixiebot
Joined: 6 Nov 05
Posts: 50
Credit: 60,515
RAC: 0
Message 7275 - Posted: 22 Dec 2005, 22:33:57 UTC
Last modified: 22 Dec 2005, 23:02:22 UTC

Some jobs run ok, others don't.

No jobs on my 9X machines were reported as valid with 0 time/credit prior to 4.81; I've just checked them all.

All instances have happened on or since the 21st December, when 4.81 was sent out.

17 instances of valid jobs with 0 time/credit (total) on 5 different boxes since version 4.81 is not coincidence.

These are not the bad batch jobs River, I've had them too.



EDIT I have found 1 valid job 0 time/credit prior to 4.81. So I take back some of what I said above, but it is happening a lot more frequently (16 instances since 4.81).






River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7398 - Posted: 23 Dec 2005, 19:39:52 UTC - in response to Message 7275.  


17 instances of valid jobs 0 time/credit (total) on 5 different boxes, since version 4.81 is not coincidence.

These are not the bad batch jobs River, I've had them too.

EDIT I have found 1 valid job 0 time/credit prior to 4.81. So I take back some of what I said above, but it is happening a lot more frequently (16 instances since 4.81).


Thanks for the figures - I agree a sudden increase on that scale is likely to be causal given that it remains after excluding the bad batches.

It seems then that the difference is one of degree. Not unseen before, but significantly more frequent now. How unreliable it has to get before you take your box to another project is an area where different people will have different feelings.

Myself, I'd already got fed up with win-98 on another project, which is why that box dual boots with Linux - but to go there would be off-topic... It is an alternative to going to another project tho ;-)

River~~

Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,595,050
RAC: 2,731
Message 7407 - Posted: 23 Dec 2005, 20:16:14 UTC - in response to Message 7275.  
Last modified: 23 Dec 2005, 20:18:04 UTC

17 instances of valid jobs 0 time/credit (total) on 5 different boxes, since version 4.81 is not coincidence.

These are not the bad batch jobs River, I've had them too.


Did you notice if the 0-credit jobs _followed_ a bad-batch error job? I posted a comment somewhere around here that on SETI, they saw that a "-6" error on one WU would cause the next WU to have 0 CPU time.

Edit:: Looking here at one of your Win98 boxes, this actually does seem to be a strong possibility...

Pixiebot
Joined: 6 Nov 05
Posts: 50
Credit: 60,515
RAC: 0
Message 7414 - Posted: 23 Dec 2005, 20:53:31 UTC - in response to Message 7398.  
Last modified: 23 Dec 2005, 20:54:02 UTC


17 instances of valid jobs 0 time/credit (total) on 5 different boxes, since version 4.81 is not coincidence.

These are not the bad batch jobs River, I've had them too.

EDIT I have found 1 valid job 0 time/credit prior to 4.81. So I take back some of what I said above, but it is happening a lot more frequently (16 instances since 4.81).


Thanks for the figures - I agree a sudden increase on that scale is likely to be causal given that it remains after excluding the bad batches.

It seems then that the difference is one of degree. Not unseen before, but significantly more frequent now. How unreliable it has to get before you take your box to another project is an area where different people will have different feelings.

Myself, I'd already got fed up with win-98 on another project, which is why that box dual boots with Linux - but to go there would be off-topic... It is an alternative to going to another project tho ;-)

River~~



I've already gone! After 5 weeks or so with very smooth running, I was dismayed at all these things seemingly going wrong at once. Seemed like I was beta testing and I don't recollect signing up for that.

Is the science valid on these zero time units? Personal credit isn't the only thing that drives me to run projects, but good science and full credit is the least we should expect I would have thought. Seems at least one of these things isn't happening here at the moment.

I was also trying to troubleshoot a teammate's inability to return valid work, then these bad units entered the mix, and hey ho, what's the point....

I was gobsmacked to read in another thread that there is as yet no in-house testing of work about to be sent out, and to make changes with no testing and before the holidays seems ludicrous, and dare I say a touch amateurish.


But, I would just like to say thanks to all the volunteers here who have helped me during this brief stay.



Onward to a more stable and Windows 9X friendly project methinks.


Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 7423 - Posted: 23 Dec 2005, 21:15:37 UTC - in response to Message 7414.  


I was gobsmacked to read in another thread that there is as yet no in-house testing of work about to be sent out, and to make changes with no testing and before the holidays seems ludicrous, and dare I say a touch amateurish.

But, I would just like to say thanks to all the volunteers here who have helped me during this brief stay.

Onward to a more stable and Windows 9X friendly project methinks.


Everyone has to do what they are comfortable with, and I hope you will give us another chance once things have stabilized.

We do have in-house testing, but it has obviously been demonstrated to be inadequate. Distributed computing challenges code in ways that can be unanticipated. I think one of the things that makes Rosetta@home interesting is that code and algorithms will be constantly evolving. This probably makes it more likely that we will send out faulty work units. I think this is just going to be that kind of project. In my experience on the Rosetta project before it moved to distributed computing, I know that David Baker will always want to be updating the code with new ideas. Because of this especially, we will definitely need to implement a more rigorous test suite, and I wouldn't blame anyone for waiting to sign back up until that happens.

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 7425 - Posted: 23 Dec 2005, 21:23:33 UTC - in response to Message 7414.  


I've already gone!

In which case it is good of you to hang around the forums and let us know why you left. Thank you.

After 5 weeks or so with very smooth running, I was dismayed at all these things seemingly going wrong at once. Seemed like I was beta testing and I don't recollect signing up for that.

True. That is one of the issues which led me to suggest a Rosetta Alpha project, where geeks like myself could head off new releases of WU before they hit the majority of participants.

I found it encouraging that within the day there were postings from two paid team members endorsing the idea. Personally I find it easier to accept the team's apologies precisely because they take on practical suggestions about how problem X can be avoided in future.

Is the science valid on these zero time units?

Yes

The zero time problem arises because of poor communication between the client and the application. The client is the part of the software that is common to any BOINC project, the application is the science-specific part. They run as separate processes so that they can be compiled separately, and so you can change one without changing the other. The Win95-98-ME family of operating systems is particularly poor at inter-process communication.

In contrast, the science is all contained within the app. There the problems of inter-process communication do not arise. There will be bigger differences between Linux and winXP than between winME and winXP as far as the science output is concerned.

The work you did, whether credited or not, is no less likely to be valid than stuff run on winXP or Linux.

River~~

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 7495 - Posted: 24 Dec 2005, 7:43:18 UTC - in response to Message 7261.  

Windows 98SE. Rosetta 4.81.... no credit
....Just checked a Windows ME machine, same thing.

Is there any point in running 9X machines?


This ME box has had credit from the 4.81 app. I think you have just been hit worse than most with the batches of bad WU that are coming round just now.

If you or Bill ask I can boot up my Win98SE box again to test 4.81 (it last ran with app 4.80) -- currently it's booted into Linux.

River~~


I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours "wallclock time"), presumably as a result of 4.81

In the grand scheme of things, the zero credits don't make the slightest bit of difference; the main thing is that the Rosetta team get back meaningful results.

What I'll probably do is wait it out till the new year, and see if Flops-Counting is on their radar. If not, I'll reinstall them as Win2K.

Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,595,050
RAC: 2,731
Message 7501 - Posted: 24 Dec 2005, 8:00:45 UTC - in response to Message 7495.  

I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours "wallclock time"), presumably as a result of 4.81


dg, can you check and see if the "zero credit" results followed a "short WU" error result? I'm thinking this is not a 4.81 problem so much as a "if an error occurred, the next one won't have the CPU clock running" problem.

dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 7793 - Posted: 28 Dec 2005, 6:50:03 UTC - in response to Message 7501.  
Last modified: 28 Dec 2005, 7:03:55 UTC

I wouldn't burn too much time on that. My 4 win98 boxes are all starting to show 0 credits for results that took reasonable time (i.e. several hours "wallclock time"), presumably as a result of 4.81


dg, can you check and see if the "zero credit" results followed a "short WU" error result? I'm thinking this is not a 4.81 problem so much as a "if an error occurred, the next one won't have the CPU clock running" problem.


In at least one case, a "zero credit" definitely did follow an error result. It took a bit of poking through the host messages, and the results, but this WU errored on host 65292 immediately prior to this WU getting zero credits, same system: 65292, which is one of my 98 se boxen.

Something else that is food for thought: Boincview has a "CPU efficiency" display, and on some (but not all) of my 98 se boxen that shows zero, even though they're working 100% on Rosetta.

Do you want me to try to find a few more cases of this?

Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,595,050
RAC: 2,731
Message 7796 - Posted: 28 Dec 2005, 7:47:00 UTC - in response to Message 7793.  

Do you want me to try to find a few more cases of this?


No... that makes "several", I think it's time to (if it matters) do the search using SQL on the server side.
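
For what such a server-side search might look like, here is a sketch using Python's built-in sqlite3 against an in-memory table. The table name, column names, outcome strings, and sample rows are all hypothetical stand-ins, not the project's actual database schema. The query flags successful results with zero CPU time whose immediately preceding result on the same host was an error.

```python
import sqlite3


def find_zero_cpu_after_error(conn):
    """Return (result id, host id) for zero-CPU successes that directly follow an error."""
    query = """
        SELECT r.id, r.hostid
        FROM result AS r
        WHERE r.outcome = 'success'
          AND r.cpu_time = 0
          AND (
              -- outcome of the most recent earlier result on the same host
              SELECT p.outcome
              FROM result AS p
              WHERE p.hostid = r.hostid
                AND p.received_time < r.received_time
              ORDER BY p.received_time DESC
              LIMIT 1
          ) = 'error'
        ORDER BY r.hostid, r.received_time
    """
    return conn.execute(query).fetchall()


# Hypothetical schema and made-up sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    "id INTEGER PRIMARY KEY, hostid INTEGER, "
    "received_time INTEGER, outcome TEXT, cpu_time REAL)"
)
rows = [
    (1, 1, 100, "success", 5400.0),
    (2, 1, 200, "error", 3.0),
    (3, 1, 300, "success", 0.0),     # zero CPU right after an error
    (4, 1, 400, "success", 6100.0),
    (5, 2, 150, "success", 0.0),     # zero CPU, but no earlier result
    (6, 2, 250, "success", 5900.0),
    (7, 2, 350, "success", 0.0),     # zero CPU, but previous was a success
]
conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?)", rows)

suspects = find_zero_cpu_after_error(conn)
```

Comparing the count of flagged results with the total number of zero-CPU successes would show how much of the problem the "follows an error" theory actually explains.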




©2024 University of Washington
https://www.bakerlab.org