Report Problems with Rosetta Version 5.13

senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16406 - Posted: 16 May 2006, 19:48:01 UTC
Last modified: 16 May 2006, 19:52:43 UTC

I see. Thank you to all of those who answered my question. I did not think to look in the FAQ because I had never seen this problem hence it was not "frequently" happening. I also didn't know about adjusting the time preferences either! Good info on the FAQ

***Edit****

I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? When I first started crunching for Rosetta, the project crunched three models. The first two in low resolution and the third in high resolution. In that case, the progress was often stuck at 70%.

Is the same logic followed here with the CASP workunits? Is the time to completion not adjusted until the first model is completed?

David Emigh

Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 16407 - Posted: 16 May 2006, 20:00:01 UTC - in response to Message 16402.  

I am running another BOINC application. Let's see what happens there.

I am not running Rosetta. I cannot take more frustration for the moment. Nor can I take more aggravation.

I tried everything you suggested: all but the MAC. So I will stop being the odd person out and leave the project to those who can actually run it and process applications without the humongous quantity of errors I got.

I seriously doubt the conflicts will be solved.

Since I am a CASP 7 observer, I will keep track of Team Rosetta's progress. I wish you all success.

Jose,

I can truly empathize with the frustration you are feeling. If you scroll back through the old posts in this thread, you can see that, at one time, I had a string of at least 41 consecutive errors!

But that is in the past now. I have nearly twenty consecutive SUCCESSFUL workunits with the exact same machine and setup.

I have a last ditch proposal for you: Don't run Rosetta at all when you need to use the computer, but do let it run overnight. Let the errors fall where they may. Perhaps when you get through a certain group of WU's, the error situation will resolve itself.

Before you tuck the computer in for an evening of number crunching, disable the screen saver. Power the machine down completely. You might go so far (as I did) as to unplug the power supply so that the 5v standby power is interrupted long enough for the capacitors to discharge. Do a cold restart, get the BOINC client running and attached to Rosetta, turn off the monitor and go to bed.

You might wake up to a page full of errors, but amidst those errors there may be some successes. My experience was that once the successes resumed, they were continuous.

If all else fails, you might consider attaching to Ralph@home.

I personally place a tremendous value in the science of the Rosetta project. I hope that you can continue to be a part of it in some way.
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16408 - Posted: 16 May 2006, 20:26:30 UTC - in response to Message 16406.  
Last modified: 16 May 2006, 20:33:04 UTC

I see. Thank you to all of those who answered my question. I did not think to look in the FAQ because I had never seen this problem hence it was not "frequently" happening. I also didn't know about adjusting the time preferences either! Good info on the FAQ

***Edit****

I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model? When I first started crunching for Rosetta, the project crunched three models. The first two in low resolution and the third in high resolution. In that case, the progress was often stuck at 70%.

Is the same logic followed here with the CASP workunits? Is the time to completion not adjusted until the first model is completed?

The time actually rises during processing. When the percent complete changes, the time will jump down to a lower value. From there it will rise until the percent complete changes again.

It is like it is running backwards for a while then resets itself.
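A rough way to see why the estimate climbs and then drops is sketched below. It assumes (this is an illustration, not BOINC's actual estimator) that remaining time is projected as elapsed time multiplied by the fraction of work left divided by the fraction done. If the fraction done only changes when a model finishes, the projection grows with every second of elapsed time until the next update. The function name and the sample numbers are made up for the example.

```python
def remaining_estimate(elapsed_s: float, fraction_done: float) -> float:
    """Naive remaining-time projection from elapsed time and reported progress.

    Hypothetical helper for illustration; not taken from the BOINC client.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Progress is assumed to be reported only when a model finishes, so
# fraction_done stays flat while elapsed time keeps growing.
samples = [
    (600, 0.10),   # model 1 has just finished
    (900, 0.10),   # still inside model 2: the estimate rises
    (1150, 0.10),  # ... and keeps rising
    (1200, 0.20),  # model 2 finishes: the estimate jumps down
]

estimates = [remaining_estimate(t, f) for t, f in samples]

for (t, f), est in zip(samples, estimates):
    print(f"elapsed={t:5d}s  done={f:.0%}  remaining~{est:7.0f}s")
```

With these made-up samples the projection rises across the first three rows and falls on the fourth, which is the "running backwards, then resetting" pattern described above.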

In terms of the way they run, there is no difference between a CASP work unit and any other work unit. The only difference is that for CASP we do not know the structure in advance. The goal of CASP is to see if we can figure out the structure. The normal work units are used to develop the methods for figuring out the structures. So in those cases we know what the computer should be looking for, and the idea is to see if the software is good enough to find it.

You might want to take a look at This thread. There is a lot of information that explains all of this in far more detail.

Moderator9
ROSETTA@home FAQ
Moderator Contact

NJMHoffmann

Joined: 17 Dec 05
Posts: 45
Credit: 45,891
RAC: 0
Message 16411 - Posted: 16 May 2006, 22:38:36 UTC - in response to Message 16406.  

I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model?

The time to completion decreases only when the percent complete is recalculated. For now this is done only when a model is finished. I would like a recalculation at every checkpoint (that should be possible).

Norbert

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16413 - Posted: 16 May 2006, 22:51:07 UTC - in response to Message 16411.  

I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model?

The time to completion decreases only when the percent complete is recalculated. For now this is done only when a model is finished. I would like a recalculation at every checkpoint (that should be possible).

Norbert

I have submitted a request for just that feature. I would like the whole number of the percentage to represent the percent complete, the 1/10th to be the checkpoint, and the 1/1000 to be the location in the model as it is now.

Such that 15.459 would represent 15% complete, checkpoint 4, position 59. That way we can see the checkpointing. If a problem occurs the project can see the position, and we have a rough idea what percent of the work unit is complete.
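The proposed display format can be sketched as a small encode/decode pair. This is only an illustration of the numbering idea in this post, not anything the Rosetta application implements; the `Progress` type and both function names are hypothetical. It works on strings and integers rather than floats so that a value like 15.459 round-trips exactly.

```python
from typing import NamedTuple


class Progress(NamedTuple):
    percent: int     # whole-number percent complete, 0-100
    checkpoint: int  # checkpoint counter, 0-9 (the tenths digit)
    position: int    # position within the model, 0-99 (last two digits)


def encode(p: Progress) -> str:
    """Pack the three fields into the proposed 'PP.CXX' display string."""
    if not (0 <= p.percent <= 100 and 0 <= p.checkpoint <= 9 and 0 <= p.position <= 99):
        raise ValueError("field out of range for the PP.CXX format")
    return f"{p.percent}.{p.checkpoint}{p.position:02d}"


def decode(text: str) -> Progress:
    """Split a 'PP.CXX' display string back into its three fields."""
    whole, _, frac = text.partition(".")
    frac = frac.ljust(3, "0")[:3]  # pad missing digits with zeros
    return Progress(int(whole), int(frac[0]), int(frac[1:]))


example = decode("15.459")
print(example)          # percent 15, checkpoint 4, position 59
print(encode(example))  # back to the display string
```

One consequence of using a single digit for the checkpoint is that only ten checkpoints can be told apart before the digit repeats; how that should wrap is left open here.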

We will have to see what they come up with. I would also like to see a checkpoint message in the messages tab of BOINC monitor.

All of this would allow people to manage the shutdown of their system to best advantage.

Moderator9
ROSETTA@home FAQ
Moderator Contact

Seth Aaronson

Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16415 - Posted: 17 May 2006, 0:30:19 UTC

I have a totally new error now:

5/16/2006 5:26:38 PM|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_12872_0 (Incorrect function. (0x1) - exit code 1 (0x1))

I still have to Ctrl-Alt-Del, end the process, and decline to debug the program.
What does this new error mean?
-Seth
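For anyone collecting these errors to report them, the message-log lines quoted in this thread can be pulled apart with a short script. The pattern below is a sketch inferred only from the "Unrecoverable error" lines posted here (timestamp, project, and message separated by `|`); it is not an official BOINC log specification, and the function name is hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the log lines quoted in this thread (an assumption).
LINE_RE = re.compile(
    r"^(?P<timestamp>[^|]+)\|(?P<project>[^|]+)\|"
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\((?P<reason>.*?) ?- exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)$"
)


def parse_error_line(line: str) -> Optional[dict]:
    """Return the fields of an 'Unrecoverable error' line, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])  # decimal exit code as an integer
    return fields


line = (
    "5/16/2006 5:26:38 PM|rosetta@home|Unrecoverable error for result "
    "T0283_FACONTACTS_hom006_508_12872_0 (Incorrect function. (0x1) - exit code 1 (0x1))"
)
parsed = parse_error_line(line)
print(parsed["result"], parsed["code"], parsed["reason"])
```

Lines that do not follow this shape simply return None, so the script can be run over a whole message log without special handling.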

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16416 - Posted: 17 May 2006, 0:58:42 UTC - in response to Message 16415.  

I have a totally new error now:

5/16/2006 5:26:38 PM|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_12872_0 (Incorrect function. (0x1) - exit code 1 (0x1))

I still have to Ctrl-Alt-Del, end the process, and decline to debug the program.
What does this new error mean?
-Seth

It found something in the work unit it did not like. Rhiju or Bin will have to take a look to answer your question.
Moderator9
ROSETTA@home FAQ
Moderator Contact

Seth Aaronson

Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16428 - Posted: 17 May 2006, 4:45:03 UTC - in response to Message 16416.  

I have a totally new error now:

5/16/2006 5:26:38 PM|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_12872_0 (Incorrect function. (0x1) - exit code 1 (0x1))

I still have to Ctrl-Alt-Del, end the process, and decline to debug the program.
What does this new error mean?
-Seth

It found something in the work unit it did not like. Rhiju or Bin will have to take a look to answer your question.


Thanks. I wonder what they'll find.

It also now looks like BOINC manager has downloaded version 5.16:
5/16/2006 7:27:21 PM|rosetta@home|Finished download of file rosetta_5.16_windows_intelx86.exe

It still seems to freeze my machine. This is now my latest error message:

5/16/2006 9:22:21 PM|rosetta@home|Unrecoverable error for result TEST_HOMOLOG_ABRELAX_hom003_1fna__503_50195_0 (Incorrect function. (0x1) - exit code 1 (0x1))

-Seth

[B^S] sTrey

Joined: 25 Sep 05
Posts: 16
Credit: 15,524
RAC: 0
Message 16430 - Posted: 17 May 2006, 6:39:33 UTC
Last modified: 17 May 2006, 7:09:02 UTC

My box hung during the 5 minutes the Rosetta screensaver was allowed to run (at 82.93% done, model 8 step 339873, 3 hr 24 min 28 sec CPU). The display was frozen on the Rosetta screensaver image and I had to power-cycle the box. The BOINC log shows:
5/16/2006 21:52:03 rosetta not responding to screensaver, exiting
5/16/2006 21:52:09 Unrecoverable error for result HOMOLOG_ABRELAX_hom003_t283__505_33632_0 ( - exit code -1 (0xffffffff))

It then went on to crunch CPDN, supposedly.
The above was preceded by a Windows application event log error 1000, timestamp 21:51:57:
Faulting application rosetta_5.13_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_5.13_windows_intelx86.exe, version 0.0.0.0, fault address 0x0056b66e.

Result link

I do have an ATI graphics card (Radeon 9000, circa 2003), but I've been running BOINC with this hw for over a year without this happening, and drivers are up to date.

I'm not willing to crunch 5.16 with only 1 GB of memory; I did it with Ralph and it's way too greedy. I will wait until more memory arrives, but I'm concerned about this error because it effectively crashed my system.

Simon Walker

Joined: 17 Oct 05
Posts: 3
Credit: 459,592
RAC: 0
Message 16431 - Posted: 17 May 2006, 7:04:55 UTC

Well, I woke up this AM to new Rosetta errors. I was wondering: is it at all possible for a Rosetta process being worked on by one CPU (in a dual core) to have the other CPU come along and try to run it as well, resulting in a crash?

Just a query.

Anyway, BOINC says this about the crash:

17/05/2006 07:43:36|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom006_508_10773_0 ( - exit code -1073741811 (0xc000000d))

On the FX-53 showing these problems, the messages are:

16/05/2006 07:50:40|rosetta@home|Unrecoverable error for result JUMP_ALLBARCODES_ANTIPARALLEL_1tul__SAVE_ALL_OUT_491_30163_0 ( - exit code -1073741811 (0xc000000d))


16/05/2006 16:40:43|rosetta@home|Unrecoverable error for result T0283_FACONTACTS_hom003_508_8903_0 ( - exit code -1073741811 (0xc000000d))


17/05/2006 07:55:34|rosetta@home|Unrecoverable error for result HOMOLOG_ABRELAX_hom003_t283__505_32419_0 ( - exit code -1073741811 (0xc000000d))


Both of these machines were running unattended, and when the monitors were powered up the error messages were present.
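The negative exit codes and the hex values in parentheses are the same 32-bit number shown two ways: once as a signed integer and once as unsigned hex. The sketch below does the conversion in both directions; the helper names are hypothetical. As a side note, 0xC000000D matches the Windows NTSTATUS constant STATUS_INVALID_PARAMETER, though nothing in this thread confirms that this is what the application actually hit.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


# Exit codes copied from the log lines quoted in this thread.
for code in (-1073741811, -1, 1):
    print(f"{code:>12d} -> {to_unsigned32(code):#010x}")
```

This is why "exit code -1073741811" and "0xc000000d" always appear together in the log, as do "-1" and "0xffffffff".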

The Dual core processor result listing : https://boinc.bakerlab.org/rosetta/results.php?hostid=145422

The FX-53 result listing : https://boinc.bakerlab.org/rosetta/results.php?hostid=193752

If you look at the FX-53 result listing you will be able to see that of the 12 completed units, 6 failed (only 1 of which was credited).

What's going wrong?
Active PC's

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=145422

https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=193752

Results: https://boinc.bakerlab.org/rosetta/results.php?userid=5150

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16435 - Posted: 17 May 2006, 12:00:32 UTC - in response to Message 16431.  

Well, I woke up this AM to new Rosetta errors. I was wondering: is it at all possible for a Rosetta process being worked on by one CPU (in a dual core) to have the other CPU come along and try to run it as well, resulting in a crash?

Just a query.

...If you look at the FX-53 result listing you will be able to see that of the 12 completed units, 6 failed (only 1 of which was credited)

What's going wrong?

Please see This FAQ concerning the credits. As for the dual core issue, anything is possible, but it is very unlikely that both cores would run the same Work unit.
Moderator9
ROSETTA@home FAQ
Moderator Contact

Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16452 - Posted: 17 May 2006, 17:03:57 UTC - in response to Message 16435.  
Last modified: 17 May 2006, 17:04:12 UTC

I was wondering: is it at all possible for a Rosetta process being worked on by one CPU (in a dual core) to have the other CPU come along and try to run it as well, resulting in a crash?


I run two dual core systems. BOINC assigns the work, and will not assign the same WU to two different processes. You may see two different WUs running at the same time. This is the power of the dual core. But BOINC is designed to ensure no such conflicts occur.

As for credit, when the "client state" still shows "computing", and yet you've reported the WU, and have credit claimed, it just means that the daily process to grant credit for client errors hasn't been run yet. You should see credit tomorrow.

Your machine seems to be successfully crunching several (indeed one had 99) models before the errors. So, even though the last model had a problem, you are still producing valuable results and will be granted credit... in 24hrs.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 16457 - Posted: 17 May 2006, 18:02:24 UTC - in response to Message 16413.  

I am still a little confused. Does the time to completion decrease after each checkpoint or does it decrease after each completion of a model?

The time to completion decreases only when the percent complete is recalculated. For now this is done only when a model is finished. I would like a recalculation at every checkpoint (that should be possible).

Norbert

I have submitted a request for just that feature. I would like the whole number of the percentage to represent the percent complete, the 1/10th to be the checkpoint, and the 1/1000 to be the location in the model as it is now.

Such that 15.459 would represent 15% complete, checkpoint 4, position 59. That way we can see the checkpointing. If a problem occurs the project can see the position, and we have a rough idea what percent of the work unit is complete.

We will have to see what they come up with. I would also like to see a checkpoint message in the messages tab of BOINC monitor.

All of this would allow people to manage the shutdown of their system to best advantage.




Thank you! The checkpoint button is a great idea!


Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16469 - Posted: 17 May 2006, 19:13:18 UTC - in response to Message 16413.  

I would also like to see a checkpoint message in the messages tab of BOINC monitor.

All of this would allow people to manage the shutdown of their system to best advantage.

That might get to be a lot of messages. I always prefer fewer messages, but if you could tell by the % complete when a checkpoint occurs, then you wouldn't even have to bring up the graphic to tell. Perhaps a field in the graphic for the CPU time of the last checkpoint? That might help people more readily see the reality of what checkpoints are, and what work they're throwing away. And yet it would still not add any more math to show, like a count-up of time since the last checkpoint.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16483 - Posted: 17 May 2006, 21:03:53 UTC - in response to Message 16469.  

I would also like to see a checkpoint message in the messages tab of BOINC monitor.

All of this would allow people to manage the shutdown of their system to best advantage.

That might get to be a lot of messages. I always prefer fewer messages, but if you could tell by the % complete when a checkpoint occurs, then you wouldn't even have to bring up the graphic to tell. Perhaps a field in the graphic for the CPU time of the last checkpoint? That might help people more readily see the reality of what checkpoints are, and what work they're throwing away. And yet it would still not add any more math to show, like a count-up of time since the last checkpoint.

What they are doing will work for the widest possible audience, and will satisfy the broadest possible range of tastes. A single message line in the messages tab every 20-30 min is not going to tax anyone's machine.
Moderator9
ROSETTA@home FAQ
Moderator Contact

Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16488 - Posted: 17 May 2006, 22:14:58 UTC - in response to Message 16407.  
Last modified: 17 May 2006, 22:17:30 UTC


I personally place a tremendous value in the science of the Rosetta project. I hope that you can continue to be a part of it in some way.


I did too. But I tried what you proposed: one more error, a phantom WU, and a WU that is so slow a slug can outrace it. This after I had gotten the computer to use more CPU for Rosetta (around 90% then; now it is barely 19%).

And what I have been reading is that 5.16 has not been the panacea promised. Oh, I am running 5.16.

So no more tasks after this one.

G-d knows I tried.

P.S. The only thing I have not tried is a human sacrifice, and I doubt Moderator nine would volunteer. (Lame attempt at a joke.)

“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16499 - Posted: 18 May 2006, 0:29:00 UTC - in response to Message 16488.  

...P.S. The only thing I have not tried is a human sacrifice, and I doubt Moderator nine would volunteer. (Lame attempt at a joke.)

How do you think I got the job?
Moderator9
ROSETTA@home FAQ
Moderator Contact

Ian

Joined: 14 Apr 06
Posts: 29
Credit: 335,780
RAC: 825
Message 16503 - Posted: 18 May 2006, 0:42:34 UTC

Failure from Tuesday - only just noticed it.

https://boinc.bakerlab.org/rosetta/result.php?resultid=20513809

One from Monday

https://boinc.bakerlab.org/rosetta/result.php?resultid=20390954

And looking at my list, there's one for 5.16 that seems to have happened before my 5.13 WUs finished (posted on the 5.16 thread in a mo).

My results list seems to be out of chronological order.
Ian Cundell, St Albans, UK

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16505 - Posted: 18 May 2006, 0:46:37 UTC - in response to Message 16503.  

...My results list seems to be out of chronological order.

That happens on a few of the projects.
Moderator9
ROSETTA@home FAQ
Moderator Contact

David Emigh

Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 16508 - Posted: 18 May 2006, 3:14:55 UTC - in response to Message 16488.  


I personally place a tremendous value in the science of the Rosetta project. I hope that you can continue to be a part of it in some way.


I did too. But I tried what you proposed: one more error, a phantom WU, and a WU that is so slow a slug can outrace it. This after I had gotten the computer to use more CPU for Rosetta (around 90% then; now it is barely 19%).

And what I have been reading is that 5.16 has not been the panacea promised. Oh, I am running 5.16.

So no more tasks after this one.

G-d knows I tried.

P.S. The only thing I have not tried is a human sacrifice, and I doubt Moderator nine would volunteer. (Lame attempt at a joke.)

I wish you the best success at whatever BOINC projects you are able to run =)

Please consider Ralph@home as one possibility. If the particular configuration of your computer produces a lot of errors in the Rosetta app, I can only imagine that the full error codes available in the Ralph app would be of tremendous value.

I also run SETI@home and SZTAKI. Perhaps I will see you on the message boards at those projects... Okay, probably NOT at the SZTAKI site, unless we both learn Hungarian :p
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!



©2024 University of Washington
https://www.bakerlab.org