TOO MANY ERROR MESSAGES WHY?

Message boards : Rosetta@home Science : TOO MANY ERROR MESSAGES WHY?

John

Joined: 24 Oct 06
Posts: 2
Credit: 1,863,033
RAC: 0
Message 31298 - Posted: 17 Nov 2006, 14:25:26 UTC

I HAVE RUN THE ROSETTA PROJECT CONTINUOUSLY FOR A MONTH AND I HAVE TOO MANY ERROR MESSAGES: 'Client error Compute error'. BEFORE I JOINED ROSETTA I WAS RUNNING SETI. I ALSO HAD SOME 'Client error Compute error' MESSAGES THERE, BUT NOT THAT MANY... I DIDN'T CHANGE ANYTHING ON MY PC, SO WHY DO I GET SO MANY 'Client error Compute error' MESSAGES? CAN ANYONE HELP ME?

Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 31299 - Posted: 17 Nov 2006, 14:54:25 UTC

First, can you stop shouting (in internet culture, capitals are used to "shout")? Text is much easier to read if it's correctly capitalized.

Now to your problem: Rosetta is probably a bit more sensitive to computer problems than SETI - it's doing a more complex set of math, simply put.

However, there is also the problem that Rosetta can have "bad workunits", i.e. work-units that themselves cause Rosetta to crash, so you could just be "unlucky".

In at least one case, the other computer attempting to calculate the same work-unit also got an error, which indicates that the workunit itself is "broken".

Here are some examples where two computers failed on the same task:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=42022337
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41972301

Here are some where the second attempt succeeds:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41705331
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41704823
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=41703070
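The reasoning above can be written down as a rough rule of thumb: if every host that tried a work-unit errored out, suspect the work-unit; if another host succeeded where one failed, suspect that host (or just bad luck). The sketch below is a hypothetical illustration, not a BOINC or Rosetta tool: the `classify` function and its list-of-outcomes input are assumptions, and you would fill the list in by hand from a workunit page like the ones linked above.

```python
from typing import List


def classify(outcomes: List[str]) -> str:
    """Hypothetical rule of thumb for one work-unit.

    `outcomes` holds one entry per attempt, either "success" or "error",
    copied by hand from the work-unit page (an assumed input format).
    """
    if not outcomes:
        return "no results yet"
    errors = outcomes.count("error")
    if len(outcomes) >= 2 and errors == len(outcomes):
        # Every host failed: the work-unit itself is probably "broken".
        return "likely bad work-unit"
    if 0 < errors < len(outcomes):
        # Another host succeeded where one failed: suspect that host, or bad luck.
        return "probably host-specific or bad luck"
    if errors == 1:
        # A single failed attempt with nothing to compare against.
        return "single failure, not enough data"
    return "all attempts succeeded"


if __name__ == "__main__":
    print(classify(["error", "error"]))    # pattern of the first two links
    print(classify(["error", "success"]))  # pattern of the last three links
```

A single failed attempt proves little either way, which is why the sketch refuses to call it.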

--
Mats

River~~

Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 31305 - Posted: 17 Nov 2006, 15:59:33 UTC

first, can i agree with mats about not shouting. in my opinion, if for any reason you have difficulty using the shift key (disability etc), it is more friendly to others to use all lower case instead of all capitals. that is what i am doing here, deliberately, to make the point.

of course, we can see it is your first post here, and accept that it takes time for newcomers to get to know 'how to eat pie with a fork' online.

Second, again I agree with Mats about this project having more errors than (say) SETI or Einstein. It is important to know that this project is doing two things at once a lot of the time. One thing it is doing is making new predictions that (hopefully!) will be useful to other biochem researchers (both pure research and medical). The other is testing out new ways of crunching these predictions: sometimes going into areas where no programs exist at all, and sometimes trying to improve the speed or accuracy of existing programs.

These aims mean that this project is always running reasonably newly written code. We do not have the luxury (as SETI and Einstein do) of running the same app for half a year at a time, changing only the input data. Those projects are each part of an experiment that involves collecting gigabytes of similar data and then sifting it (for aliens or for gravy waves). As far as SETI or Einstein are concerned, the IT is a tool towards their main aim (aliens or wavy gravy).

Here, developing the IT is an end in itself: to develop better tools for other groups to use. Until recently, for example, an older version of the Rosetta code was in use on the World Community Grid, where it ran bug-free for months, being an old, tested version that was doing production work on some interesting science.

For that reason, on this project you almost always get credit for results that error out - because when you develop new code catching the bugs is part of the science.

Now that you understand better what we are about, you may choose to stay and enjoy being part of pushing forward the IT boundaries (accepting the higher error rate). Or you may choose to go to other kinds of project -- all of which have their own legitimate aims -- so that you can enjoy a lower error rate. The choice is yours.

The only mistake would be to stay and hope the errors will go away - they won't, this is not that kind of project. Errors come in clusters (and we seem to be in the middle of a cluster now) but they will never go away for good.

If you choose to go elsewhere, then thank you for the 3.5k credits you have donated to this project - not only have you given Rosetta a fair test, but the work you have crunched, including the results that went wrong, has genuinely helped the project.

If you choose to stay, we thank you for your willingness to understand the different needs of this project now that you have experienced those needs in action.

So, welcome, or go well; as the case may be.
River~~
Don

Joined: 28 Oct 06
Posts: 2
Credit: 294,270
RAC: 0
Message 31314 - Posted: 17 Nov 2006, 17:17:00 UTC - in response to Message 31305.  
Last modified: 17 Nov 2006, 17:23:51 UTC

I am experiencing a lot of errors as well, but only on two of my four machines.

7 of 13 results were errors on a P4 3.0 GHz Prescott on an Intel board, not overclocked.

4 of 16 results were errors on a P4 3.0 GHz Northwood @ 3.3 GHz on an Abit board.

No errors in 16 results on an Athlon 64 3700+ @ 2.8 GHz on a DFI board.

No errors in 16 results on a PIII 1 GHz on an Intel board.

Funny that the hyperthreaded processors are the ones that are throwing errors.
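Putting these figures side by side makes the pattern easier to see. The short Python sketch below only does the arithmetic on the four machines listed above; the labels and the hyperthreading flags come from the post, while pooling them into HT and non-HT groups is an illustration, not a statistical test. With counts this small, a batch of bad work-units landing on one machine could produce the same picture.

```python
# (machine, errors, total results, hyperthreaded?) as listed in the post
machines = [
    ("P4 3.0 GHz Prescott", 7, 13, True),
    ("P4 3.0 GHz Northwood @ 3.3 GHz", 4, 16, True),
    ("Athlon 64 3700+ @ 2.8 GHz", 0, 16, False),
    ("PIII 1 GHz", 0, 16, False),
]


def error_rate(errors: int, total: int) -> float:
    """Fraction of results that errored (0.0 if there are no results)."""
    return errors / total if total else 0.0


def pooled(ht: bool) -> tuple:
    """Sum errors and totals over the HT (or non-HT) machines."""
    errs = sum(e for _, e, _, h in machines if h == ht)
    tot = sum(t for _, _, t, h in machines if h == ht)
    return errs, tot


for label, errors, total, _ in machines:
    print(f"{label}: {errors}/{total} = {error_rate(errors, total):.0%}")

for name, flag in (("HT machines", True), ("Non-HT machines", False)):
    errs, tot = pooled(flag)
    print(f"{name}: {errs}/{tot} = {error_rate(errs, tot):.0%}")
```

The last two lines printed are "HT machines: 11/29 = 38%" and "Non-HT machines: 0/32 = 0%".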
River~~

Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 31317 - Posted: 17 Nov 2006, 19:24:05 UTC - in response to Message 31314.  

Are you getting the same kind of work on all the machines?

In the short term, this kind of pattern can arise on this project simply because the work tends to go out in batches, so one machine can be hit by a load of errors while another gets none.

If you are seeing a lot of errors on the HT boxes and none on the others on the same types of work unit (i.e. the words in the WU name are the same, only the numbers differ), then that would be a valuable clue, and I'd ask you to post details in the Problem with Rosetta Version blah thread.
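One way to make that comparison is to strip the numbers from each work-unit name, so that names differing only by their numbers fall into the same "type", and then count errors per type on each machine. The sketch below assumes you have copied host, work-unit name, and outcome by hand into a list; the example names are made up, and dropping digits is only an approximation of "the words in the WU name are the same".

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# (host, work-unit name, errored?) copied by hand; an assumed format.
Result = Tuple[str, str, bool]


def wu_type(name: str) -> str:
    """Keep only the words of a work-unit name, dropping the numbers.

    Rough heuristic: digits inside a word also act as separators.
    """
    words = [w for w in re.split(r"[_\W\d]+", name) if w]
    return "_".join(words)


def errors_by_type(results: List[Result]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(host, work-unit type): (errors, total)}."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for host, name, errored in results:
        key = (host, wu_type(name))
        counts[key][0] += int(errored)
        counts[key][1] += 1
    return {key: (errs, total) for key, (errs, total) in counts.items()}


# Made-up example names, not real Rosetta work-units.
sample = [
    ("HT box", "demoA_run_1001", True),
    ("HT box", "demoA_run_1002", True),
    ("non-HT box", "demoA_run_1003", False),
    ("non-HT box", "demoB_run_2001", False),
]
for (host, kind), (errs, total) in sorted(errors_by_type(sample).items()):
    print(f"{host:<10} {kind:<10} {errs}/{total} errors")
```

Only rows where both kinds of machine ran the same type are a fair comparison; a type seen on one machine alone says more about batching than about hyperthreading.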

I won't post a link, as the right thread changes from time to time, but just put

problem rosetta version 5.40

into the search box on this page and it will take you there (alter the number, of course, if you are on a different version of Rosetta).

R~~
sslickerson

Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 31336 - Posted: 18 Nov 2006, 1:34:27 UTC - in response to Message 31317.  
Last modified: 18 Nov 2006, 1:36:22 UTC

Hi River

I started the Problems with Rosetta Version 5.40 thread the other day. For those of you who are reading this now and have questions or concerns about the current application errors, please click here.

Thanks!
Tim

Edit: Spelling and other stuff that will drive me crazy if I don't fix it.



River~~

Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 31370 - Posted: 18 Nov 2006, 11:36:23 UTC - in response to Message 31336.  

I won't post a link, as the right thread changes from time to time, ...

...
I started the Problems with Rosetta Version 5.40 thread the other day. For those of you reading this now and have questions or concerns about the current application errors please ...
...


I'd emphasise that Tim's links refer specifically to v5.40, which is current at the time of writing but may not be by the time you read this. So please check the version number, and if it is not 5.40 then please use the search box to find the right thread.

Thanks everyone
River~~




©2024 University of Washington
https://www.bakerlab.org