Rosetta@home

Report Problems with Rosetta Version 5.22

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18196 - Posted 8 Jun 2006 23:06:02 UTC

This thread is for reporting problems with Rosetta Version 5.22.
____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18275 - Posted 9 Jun 2006 13:15:48 UTC - in response to Message ID 18196.

My computer is freezing on one of these two WUs - 19746278 or 19737574. It's happened several times today. I'd come into the room, note that the graphics weren't animated, the steps weren't incrementing, and the clock would be stopped. I'd move the mouse (I have it configured to work on the project only when I'm not using the machine), and would get a notice that there was an error and would I like to report to Microsoft. Next time, I'll get a screenshot.

Bandit's Mom


____________

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18277 - Posted 9 Jun 2006 13:35:25 UTC - in response to Message ID 18275.

My computer is freezing on one of these two WUs - 19746278 or 19737574. It's happened several times today. I'd come into the room, note that the graphics weren't animated, the steps weren't incrementing, and the clock would be stopped. I'd move the mouse (I have it configured to work on the project only when I'm not using the machine), and would get a notice that there was an error and would I like to report to Microsoft. Next time, I'll get a screenshot.

Bandit's Mom


It is possible that there is a background task running (disk defrag, virus check, etc.) that is preventing your system from becoming fully idle. This, of course, would also prevent BOINC from processing any work.
____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18283 - Posted 9 Jun 2006 14:36:52 UTC - in response to Message ID 18277.


It is possible that there is a background task running (disk defrag, virus check, etc.) that is preventing your system from becoming fully idle. This, of course, would also prevent BOINC from processing any work.


Nope - the only other things I have open are Word (x2), Excel (x2), Reference Manager, and a Mah Jongg game program.

Here's the text of the error message: "rosetta_5.22_windows_intelx86.exe has encountered a problem and needs to close. We are sorry for the inconvenience."

Bandit's Mom

____________

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18285 - Posted 9 Jun 2006 15:04:12 UTC - in response to Message ID 18283.
Last modified: 9 Jun 2006 15:16:49 UTC


It is possible that there is a background task running (disk defrag, virus check, etc.) that is preventing your system from becoming fully idle. This, of course, would also prevent BOINC from processing any work.


Nope - the only other things I have open are Word (x2), Excel (x2), Reference Manager, and a Mah Jongg game program.

Here's the text of the error message: "rosetta_5.22_windows_intelx86.exe has encountered a problem and needs to close. We are sorry for the inconvenience."

Bandit's Mom

There was a problem with the screen saver on some Windows systems with version 5.16. This was supposed to be fixed in the new release. Have you tried running BOINC with the screen saver turned off?

In order to assist you, you will either have to provide a link to the reported results in your Stats list, or make your computer visible. Currently your computers are hidden, so I cannot look up any of your results to see the actual errors.

You can make your computers visible from your preferences without risk to your computer security. If you want to see the kind of information others might see, you can click on any other user in the forums, and then click on the link to view their computers.

Of course the system will allow you to see more information on your own systems than it would reveal to others.


____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Ricardo

Joined: Dec 9 05
Posts: 26
ID: 33394
Credit: 24,039
RAC: 0
Message 18303 - Posted 9 Jun 2006 17:36:50 UTC

I get the following report with the new 5.22:

Result ID 23395831
Name t306__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom001__656_21236_0
Workunit 19711334
Created 8 Jun 2006 22:11:53 UTC
Sent 8 Jun 2006 23:44:52 UTC
Received 9 Jun 2006 11:22:33 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 246538
Report deadline 15 Jun 2006 23:44:52 UTC
CPU time 21089.578125
stderr out <core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>


Validate state Valid
Claimed credit 71.993372528454
Granted credit 71.993372528454
application version 5.22

In another post I have seen that someone also reported that the Watchdog has shut down the process.

Regards,
Ricardo

____________

rbpeake Profile

Joined: Sep 25 05
Posts: 168
ID: 1036
Credit: 246,593
RAC: 0
Message 18306 - Posted 9 Jun 2006 17:56:25 UTC - in response to Message ID 18303.
Last modified: 9 Jun 2006 17:57:22 UTC

I get the following report with the new 5.22:

Result ID 23395831
Name t306__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom001__656_21236_0
Workunit 19711334
Created 8 Jun 2006 22:11:53 UTC
Sent 8 Jun 2006 23:44:52 UTC
Received 9 Jun 2006 11:22:33 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 246538
Report deadline 15 Jun 2006 23:44:52 UTC
CPU time 21089.578125
stderr out <core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>


Validate state Valid
Claimed credit 71.993372528454
Granted credit 71.993372528454
application version 5.22

In another post I have seen that someone also reported that the Watchdog has shut down the process.

Regards,
Ricardo

This is a normal shutdown for a successfully completed workunit.

The note regarding the watchdog is just to identify that now that the work unit has finished, the watchdog function is being closed down as well.
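
As a rough illustration of the point above, the completion markers in a result's stderr text can be checked mechanically. This is only a sketch: the function name `summarize_stderr` and the dictionary keys are made up for illustration, and the marker strings are taken from the result quoted in this thread, not from any documented Rosetta or BOINC output format.

```python
import re


def summarize_stderr(stderr_txt: str) -> dict:
    """Pull the completion markers out of a result's stderr text.

    Hypothetical helper: the marker strings are copied from the
    result posted in this thread and may differ in other versions.
    """
    match = re.search(r"generated (\d+) decoys from (\d+) attempts", stderr_txt)
    return {
        # The "# DONE ::" line appears when the run finished its models.
        "done": "# DONE ::" in stderr_txt,
        # The watchdog line is just the helper thread closing down.
        "watchdog_shutdown": "Watchdog shutting down" in stderr_txt,
        "decoys": int(match.group(1)) if match else None,
        "attempts": int(match.group(2)) if match else None,
    }


# stderr text from the result quoted above (Result ID 23395831).
SAMPLE = """\
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
"""

summary = summarize_stderr(SAMPLE)
print(summary)
```

A result that shows both the "DONE" line and the watchdog line, together with "Outcome Success" and exit status 0, matches the normal shutdown described here.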
____________
Regards,
Bob P.

Ricardo

Joined: Dec 9 05
Posts: 26
ID: 33394
Credit: 24,039
RAC: 0
Message 18307 - Posted 9 Jun 2006 18:25:22 UTC - in response to Message ID 18306.

I get the following report with the new 5.22:

Result ID 23395831
Name t306__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom001__656_21236_0
Workunit 19711334
Created 8 Jun 2006 22:11:53 UTC
Sent 8 Jun 2006 23:44:52 UTC
Received 9 Jun 2006 11:22:33 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 246538
Report deadline 15 Jun 2006 23:44:52 UTC
CPU time 21089.578125
stderr out <core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>


Validate state Valid
Claimed credit 71.993372528454
Granted credit 71.993372528454
application version 5.22

In another post I have seen that someone also reported that the Watchdog has shut down the process.

Regards,
Ricardo

This is a normal shutdown for a successfully completed workunit.

The note regarding the watchdog is just to identify that now that the work unit has finished, the watchdog function is being closed down as well.



Noted, and thanks, Bob, for your clarification.

Regards
Ricardo
____________

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18312 - Posted 9 Jun 2006 19:20:42 UTC - in response to Message ID 18285.

I've turned the screensaver off on the Control Panel - is there something I should do in my BOINC preferences? (Color me ignorant, and what I read isn't staying with me at the moment.) You should be able to "see" my computer now, unless there's something else I should do. "... provide a link to the reported results in your Stats list ..." Not sure how to do this.

Bandit's Mom



There was a problem with the screen saver on some Windows systems with version 5.16. This was supposed to be fixed in the new release. Have you tried running BOINC with the screen saver turned off?

In order to assist you, you will either have to provide a link to the reported results in your Stats list, or make your computer visible. Currently your computers are hidden, so I cannot look up any of your results to see the actual errors.

You can make your computers visible from your preferences without risk to your computer security. If you want to see the kind of information others might see, you can click on any other user in the forums, and then click on the link to view their computers.

Of course the system will allow you to see more information on your own systems than it would reveal to others.



____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18315 - Posted 9 Jun 2006 19:32:10 UTC - in response to Message ID 18312.
Last modified: 9 Jun 2006 19:32:36 UTC

...You should be able to "see" my computer now, unless there's something else I should do.


They still show "hidden". In your Rosetta preferences, select YES for the question "Should Rosetta@home show your computers on its web site". More details here.

____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18316 - Posted 9 Jun 2006 19:48:15 UTC - in response to Message ID 18315.

It's selected as "yes." Maybe it needed time to implement in the system. Maybe you could try again?

Bandit's Mom



They still show "hidden". In your Rosetta preferences, select YES for the question "Should Rosetta@home show your computers on its web site".


____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18318 - Posted 9 Jun 2006 19:52:40 UTC - in response to Message ID 18316.

It's selected as "yes." Maybe it needed time to implement in the system. Maybe you could try again?

Did you hit the "Update preferences" button at the bottom of the screen? I just looked again and it still shows as "hidden".

____________

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18319 - Posted 9 Jun 2006 20:06:48 UTC - in response to Message ID 18318.

I went back to look and it was saved as "yes," but hit the "Update" button again, just for giggles and grins. Maybe it didn't take the first time around.

Bandit's Mom


Did you hit the "Update preferences" button at the bottom of the screen? I just looked again and it still shows as "hidden".


____________

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18321 - Posted 9 Jun 2006 20:33:45 UTC - in response to Message ID 18319.

I went back to look and it was saved as "yes," but hit the "Update" button again, just for giggles and grins. Maybe it didn't take the first time around.

Bandit's Mom


Did you hit the "Update preferences" button at the bottom of the screen? I just looked again and it still shows as "hidden".


Looking at your two errors, one is not well described, and in fact looks like a normal work unit. The latest has a fountain of error data to examine. At first glance it does seem to be related to the screen saver issue (-107 error), but the window that was in the foreground when the system terminated the work unit was Explorer. This could just be an artifact of the way Windows uses Explorer, but is there any chance you had left your browser loaded when the problem occurred?

My best guess right now is that if you set your screen saver to something other than BOINC or turn it off (from what I read you already did that) the problem may go away. If it does could you please let us know? We are trying to fix that particular issue, and the programmers thought they had it under control. We need to know if they do not.

____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18324 - Posted 9 Jun 2006 20:50:45 UTC - in response to Message ID 18321.
Last modified: 9 Jun 2006 20:51:25 UTC

It's not only possible, it's probable that IE was loaded. With the BOINC screensaver off, I'm not certain that I would be able to tell that there was a problem as quickly, but am willing to give it a go.

I'm going to switch my computer back to "hidden."

Thanks for your help.

Bandit's Mom




... but is there any chance you had left your browser loaded when the problem occurred?

My best guess right now is that if you set your screen saver to something other than BOINC or turn it off (from what I read you already did that) the problem may go away. If it does could you please let us know? We are trying to fix that particular issue, and the programmers thought they had it under control. We need to know if they do not.


____________

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18326 - Posted 9 Jun 2006 21:01:27 UTC - in response to Message ID 18324.

...I'm going to switch my computer back to "hidden."

Thanks for your help.

Bandit's Mom
...

While there is no requirement that you leave your computer visible, there is no real risk to you in doing so. It is almost impossible for people to assist you with a problem unless they can look at the work unit results for your computer. It is your choice, however.
____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

Alan Roberts

Joined: Jun 7 06
Posts: 56
ID: 93009
Credit: 1,210,119
RAC: 1,392
Message 18327 - Posted 9 Jun 2006 21:03:27 UTC - in response to Message ID 18196.

Hello Moderator9

I pulled a client error on Work Unit 19677012 (the result is at http://boinc.bakerlab.org/rosetta/result.php?resultid=23357465), with exit code -1. Client messages for the error began with, "rosetta not responding to screensaver, exiting"

Then again on WU 19684873 (result link http://boinc.bakerlab.org/rosetta/result.php?resultid=23366061). Exit code was again -1, and the client-side messages once again report, "rosetta not responding to screensaver, exiting".

The first was a 5.16 client, the second was with a 5.22. I'm not sure the screensaver bug is completely dead yet?

Cheers,
Alan


____________

Moderator9
Forum moderator
Project administrator

Joined: Jan 22 06
Posts: 1014
ID: 53254
Credit: 0
RAC: 0
Message 18381 - Posted 10 Jun 2006 15:24:23 UTC
Last modified: 10 Jun 2006 15:28:03 UTC

I have sent a message to the project team regarding the possibility that the screen saver may still be causing problems for a few of you. If you report a problem that seems related to the screen saver, please be certain to make your computers visible in your preferences to assist them in examining the problem. Also if you could provide a link to any workunit results that may have been produced during the time you noticed the problem that would be helpful.

For those of you who may not know how to create a link -


    1) Go to the place you want the link to take the user and copy the "HTTP" address from the browser address field at the top of your browser window.
    2) Open the post in which you wish to place the link and type "[_url=" (leaving out the "_" which I have added here to disable the command) at the point in the message where you want the link to appear.
    3) Paste the link address after the "=" with no spaces and add a "]" to the end of the address.
    4) Type any text you may want to identify the link.
    5) Complete the link command by typing "[/url]".



So if I wanted to create a link to the Rosetta FAQs it would look like this -
(Except you would leave out the "_", which I added for display purposes)

[_url=http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669]Rosetta FAQs[/url]

And it would look like this in the final message.

Rosetta FAQs
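
The steps above can also be sketched in code. This is a minimal illustration only: the helper name `bbcode_link` is hypothetical, and it assumes the forum accepts the plain `[url=...]text[/url]` form shown above with no spaces inside the address.

```python
def bbcode_link(url: str, text: str) -> str:
    """Build a forum link tag of the form [url=ADDRESS]TEXT[/url].

    Hypothetical helper following the manual steps above: the
    address goes directly after "=" with no spaces, then "]",
    then the link text, then the closing "[/url]".
    """
    url = url.strip()
    if not url or any(ch.isspace() for ch in url):
        raise ValueError("the link address must be non-empty and contain no spaces")
    return f"[url={url}]{text}[/url]"


faq_link = bbcode_link(
    "http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669",
    "Rosetta FAQs",
)
print(faq_link)
```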

Thank you.


____________
Moderator9
ROSETTA@home FAQ
Moderator Contact

RWIoffice

Joined: Jun 7 06
Posts: 4
ID: 93159
Credit: 37,344
RAC: 0
Message 18391 - Posted 10 Jun 2006 17:11:36 UTC

Possible problem with a t299_CASP7 work unit (link to WU). Was a happy camper, then at step 370K+ on Model 6 my CPU dropped from 100% to nothing, and graphics display showed no progress. Didn't write down the stuck step number, sorry.

Suspending that task released the next waiting task, which drove the CPU back to full load. I then suspended task #2 and resumed #1, but it still didn't seem to grab any CPU.

For lack of knowing any better (new user), I shut down BOINC and restarted it, which I think I understand to mean that the task resumes at the previous checkpoint (model boundary)? It is now running again.

What is accepted practice if it hangs up again, please? Do I wait for some watchdog abort, or manually abort it? I don't really care about credits, I'll take whatever action provides the best feedback about the failure. Thanks!
____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 18399 - Posted 10 Jun 2006 18:30:36 UTC - in response to Message ID 18391.
Last modified: 10 Jun 2006 18:32:17 UTC

see above
____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 18400 - Posted 10 Jun 2006 18:31:43 UTC - in response to Message ID 18399.
Last modified: 10 Jun 2006 18:33:08 UTC


What is accepted practice if it hangs up again, please? Do I wait for some watchdog abort, or manually abort it? I don't really care about credits, I'll take whatever action provides the best feedback about the failure. Thanks!


Wait and don't abort. It will finish at most an hour after "completion". Rosetta waits for the watchdog to shut down. This is something that was introduced in 5.19 for better debugging, then reported as a problem over at RALPH and supposedly fixed in 5.22. It is very good that you report this here.
If you happen to observe this again, please check whether the graphics show 100% as well or something lower, and take a screenshot of the graphics window in its "idling" state.
____________

Astro
Avatar

Joined: Oct 2 05
Posts: 987
ID: 2322
Credit: 500,253
RAC: 0
Message 18405 - Posted 10 Jun 2006 19:27:14 UTC
Last modified: 10 Jun 2006 19:27:26 UTC

The Fatal Windows Error Bug is still with us, I'm afraid. wuid=19791659

Result ID 23483927
Name t309__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__661_7645_0
Workunit 19791659
Created 9 Jun 2006 11:23:04 UTC
Sent 9 Jun 2006 12:59:31 UTC
Received 10 Jun 2006 19:23:54 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 16 Jun 2006 12:59:31 UTC
CPU time 28426.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1655106

</stderr_txt>


Validate state Invalid
Claimed credit 109.225790694355
Granted credit 0
application version 5.22
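
For readers wondering how the two forms of the exit status relate: the negative decimal number and the hex value are the same 32-bit pattern, read as signed and unsigned respectively. The sketch below shows the conversion; the function name `to_unsigned32` is made up for illustration. As far as I know, 0xC000000D is the Windows NTSTATUS value STATUS_INVALID_PARAMETER, but that alone says nothing about the underlying cause of this bug.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


exit_status = -1073741811  # as shown in the result page above
as_hex = f"0x{to_unsigned32(exit_status):08x}"
print(exit_status, "->", as_hex)
```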

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 18409 - Posted 10 Jun 2006 20:30:57 UTC - in response to Message ID 18327.
Last modified: 10 Jun 2006 20:34:36 UTC

Hi Alan:

Thanks for reporting. There seem to be numerous little issues with the screensaver, and we've been trying to track them down one-by-one over on the test project, ralph. But I haven't seen a lot of problems like the one you describe -- has it happened in previous work units before this double batch? I wonder if something went haywire with the core boinc application -- you may need to restart.

Hello Moderator9

I pulled a client error on Work Unit 19677012 (the result is at http://boinc.bakerlab.org/rosetta/result.php?resultid=23357465), with exit code -1. Client messages for the error began with, "rosetta not responding to screensaver, exiting"

Then again on WU 19684873 (result link http://boinc.bakerlab.org/rosetta/result.php?resultid=23366061). Exit code was again -1, and the client-side messages once again report, "rosetta not responding to screensaver, exiting".

The first was a 5.16 client, the second was with a 5.22. I'm not sure the screensaver bug is completely dead yet?

Cheers,
Alan



____________

Rhiju
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 18410 - Posted 10 Jun 2006 20:37:36 UTC - in response to Message ID 18405.

Hi mmciastro... yes, we know it's still there. You might be happy to know that the error -1073741811 (0xc000000d) is currently number one on our list of things to kill. It's been the most common error for a while, but Rom only now has come up with a hypothesis for what the cause might be. He has just put in some extra debugging stuff on ralph to track it down -- maybe that will let us unravel this puzzle!

The Fatal Windows Error Bug is still with us, I'm afraid. wuid=19791659

Result ID 23483927
Name t309__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__661_7645_0
Workunit 19791659
Created 9 Jun 2006 11:23:04 UTC
Sent 9 Jun 2006 12:59:31 UTC
Received 10 Jun 2006 19:23:54 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 16 Jun 2006 12:59:31 UTC
CPU time 28426.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1655106

</stderr_txt>


Validate state Invalid
Claimed credit 109.225790694355
Granted credit 0
application version 5.22


____________

Astro
Avatar

Joined: Oct 2 05
Posts: 987
ID: 2322
Credit: 500,253
RAC: 0
Message 18411 - Posted 10 Jun 2006 20:54:32 UTC - in response to Message ID 18410.

Hi mmciastro... yes, we know it's still there.

It's only a problem for certain video cards, .net, whatever it is. If regular users who get this turn OFF the screensaver, they'll never see it until it's fixed. I'm in direct communication with Rom on this bug, just FYI. I happen to have a machine that regularly gets this error (lucky me, and I guess, lucky Rosetta/Rom).

tony

Alan Roberts

Joined: Jun 7 06
Posts: 56
ID: 93009
Credit: 1,210,119
RAC: 1,392
Message 18444 - Posted 11 Jun 2006 4:01:01 UTC - in response to Message ID 18409.

Hi Alan:

Thanks for reporting. There seem to be numerous little issues with the screensaver, and we've been trying to track them down one-by-one over on the test project, ralph. But I haven't seen a lot of problems like the one you describe -- has it happened in previous work units before this double batch? I wonder if something went haywire with the core boinc application -- you may need to restart.



Rhiju,

Work units before the failures and after were completed based on a look at my results. I may have pulled a boinc restart somewhere in there ...

I'm pitching this as an employee contribution/team-effort project at one of my customer sites, and the three of us who are the test cases have been grabbing our volunteer minutes here and there getting our sample desktops running, to demonstrate safety (at least lack of harm and impact on the "real work") and stability.

When I saw the comments about screen saver issues on the forum and noticed my failures, I may have restarted boinc in a quick-and-dirty quest for a fix.

I set the test machines to not use the screen saver over the weekend, but if there is a better procedure for providing debugging information (i.e., "run the screensaver and do the following if you get another error"), please let me know.

Cheers,
Alan

____________

RWIoffice

Joined: Jun 7 06
Posts: 4
ID: 93159
Credit: 37,344
RAC: 0
Message 18465 - Posted 11 Jun 2006 14:56:18 UTC

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.

____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 18473 - Posted 11 Jun 2006 18:46:15 UTC - in response to Message ID 18465.

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.



This is related to a bug report on Ralph. The behaviour is exactly the same. It was supposed to be fixed in 5.22; obviously it is not.
____________

Craig Miller

Joined: Jun 5 06
Posts: 1
ID: 89284
Credit: 116,933
RAC: 56
Message 18482 - Posted 11 Jun 2006 23:29:23 UTC

I am having a problem running Rosetta. I attach to Rosetta using BOINC manager, and receive the notice of a successful attachment. When I look at BOINC manager it shows Rosetta running, while Einstein and SETI are suspended. But when I come back several hours later Rosetta is not present, either in Projects or Tasks. When I look at the messages they seem to show Rosetta being loaded and started, but then it ends with "Detaching from project", as shown below.

-------------
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project


When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?

Craig Miller

____________

Dogbytes Profile
Avatar

Joined: Dec 4 05
Posts: 37
ID: 29870
Credit: 207,563
RAC: 0
Message 18483 - Posted 12 Jun 2006 0:17:55 UTC
Last modified: 12 Jun 2006 0:31:24 UTC

The below linked WU crunched for over 2 hours, and yet was stuck at 0.00%. I aborted the unit because it appeared to be completely hung up, and stopped crunching. I would be interested to know what the problem was...

Aborted 5.22 WU link

Ian

Joined: Apr 14 06
Posts: 29
ID: 76277
Credit: 25,245
RAC: 0
Message 18484 - Posted 12 Jun 2006 0:50:05 UTC

Here's one from Saturday:

http://boinc.bakerlab.org/rosetta/result.php?resultid=23575087

And one from Friday:

http://boinc.bakerlab.org/rosetta/result.php?resultid=23484615

The only two errors for quite a while.
____________
Ian Cundell, St Albans, UK

Dogbytes Profile
Avatar

Joined: Dec 4 05
Posts: 37
ID: 29870
Credit: 207,563
RAC: 0
Message 18485 - Posted 12 Jun 2006 1:12:54 UTC

The WU linked below has a severe memory leak: it was using >275 Megs of memory, bringing the host's commit charge to nearly 600 Megs. WU was aborted by user.

Aborted WU

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18487 - Posted 12 Jun 2006 1:57:59 UTC - in response to Message ID 18482.
Last modified: 12 Jun 2006 1:59:43 UTC

When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?


Rosetta's servers were just upgraded to support BAM last week. But it looks like BAM did something, not Rosetta.
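
One way to see this from the client messages is to split each line on the "|" separator and look at what came immediately before the detach. The sketch below does that for the log posted earlier in the thread. It assumes the `timestamp|project|message` layout seen in that paste; the function name `parse_line` is made up for illustration and is not part of BOINC.

```python
LOG = """\
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
"""


def parse_line(line: str) -> tuple:
    """Split one message line into (timestamp, project, message).

    Assumes the pipe-separated layout from the pasted log; an empty
    project field means the message came from the client itself.
    """
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]

# Find the detach and list the client-level messages leading up to it.
detach_index = next(
    i for i, (_, _, msg) in enumerate(entries) if msg == "Detaching from project"
)
preceding = [msg for _, project, msg in entries[:detach_index] if project == ""]
print(preceding)
```

In this log, the reset and detach follow within seconds of a successful account manager contact, which is consistent with the detach being requested through the account manager rather than by the project.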

____________

scsimodo Profile

Joined: Sep 17 05
Posts: 93
ID: 349
Credit: 270,232
RAC: 0
Message 18525 - Posted 12 Jun 2006 17:52:24 UTC

Had a few WUs crashing when hitting the "show graphics" button. The window popped up, closed immediately and trashed the WU. The WUs are:

WU1
WU2
WU3
WU4

Host list is unhidden, host is a Mac Mini Core Duo, 1.66 GHz, 2 GB RAM. Please drop a short notice when I can hide my hosts again...

What's strange is: hitting the "show graphics" button a few minutes before worked perfectly, seems to be a random problem...



Billy

Joined: May 29 06
Posts: 6
ID: 85245
Credit: 7,278
RAC: 0
Message 18569 - Posted 13 Jun 2006 14:12:12 UTC - in response to Message ID 18196.
Last modified: 13 Jun 2006 14:12:54 UTC

I had a work unit processing at about 80% complete and it seemed to be going normally. I suspended the project (as well as Einstein and Seti) and quit Boinc. I shutdown the computer and restarted. When Boinc started again, it reported this work unit as complete and uploaded it. Either it was stuck before or isn't actually complete. I had a similar thing happen a couple of days ago and it also reported work units complete even though the completion times were unusually short.

http://boinc.bakerlab.org/rosetta/result.php?resultid=23946621

iMac Core Duo, Rosetta version 5.22
____________

Stwato

Joined: Jan 11 06
Posts: 150
ID: 49485
Credit: 147,060
RAC: 0
Message 18570 - Posted 13 Jun 2006 14:44:02 UTC
Last modified: 13 Jun 2006 14:44:25 UTC

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it is missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB memory and ~70 MB virtual but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like me to screen shot it please let me know the best dimensions for posting into the forum as I have nowhere to upload it to.

Cheers
Stwato
____________

Keith Akins

Joined: Oct 22 05
Posts: 176
ID: 6022
Credit: 71,779
RAC: 0
Message 18572 - Posted 13 Jun 2006 16:04:07 UTC

After patting myself on the back for so many successful WU's, I get the following error on this unit:

6/12/2006 11:05:25 PM|rosetta@home|Unrecoverable error for result t306__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_hom001__680_902_0 ( - exit code -1073741811 (0xc000000d))

This could be a v5.22, BOINC 5.4.9 or a conflict when checking mail with Mozilla Thunderbird.

Win XP Home Service Pack 2

Mozilla Firefox/Thunderbird combo.

Computers are visible and BOINC 5.4.9 should be debug reporting.

Ignore the Linux Computer as mine is a dual booter.
____________

Snake Doctor Profile
Avatar

Joined: Sep 17 05
Posts: 180
ID: 88
Credit: 3,018,835
RAC: 2,841
Message 18580 - Posted 13 Jun 2006 17:57:34 UTC - in response to Message ID 18570.

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's as if some of it is missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB of memory and ~70 MB virtual but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like me to screen shot it please let me know the best dimensions for posting into the forum as I have nowhere to upload it to.

Cheers
Stwato

This is a known problem with a few of the processing techniques being used. Not all the work units use the same processing approach. In some cases they only look at parts of the protein structure, and that somehow affects the display.
____________

We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 660
ID: 14
Credit: 838,217
RAC: 46
Message 18581 - Posted 13 Jun 2006 18:12:31 UTC

See Rhiju's post describing "Jumping"

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=1453#15060

TCU Computer Science

Joined: Dec 7 05
Posts: 28
ID: 32027
Credit: 12,616,170
RAC: 116
Message 18612 - Posted 14 Jun 2006 3:21:02 UTC

rosetta 5.22
WU Name: t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom004__666_16529_0
running on Mac OS 10.4.6

BOINC Manager Tasks tab shows CPU Time stuck at 01:30:40 and 15%
top command shows TIME = 28:53:41 and climbing

stopped and restarted BOINC
CPU Time reverted to 01:13:00 and 15% but no longer stuck

Symptoms are identical to my post for ralph 5.18

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18614 - Posted 14 Jun 2006 4:10:16 UTC

Tony, do you have JP's EMail address? The French guy who always needs new Rosetta .exe EMailed? Could you ask him to see if he can help with this post by French person on Q&A boards? The main parts that translated properly were that he's poor, alone and in a wheelchair. Perhaps JP can read between the lines better than the translation website.
____________

Astro
Avatar

Joined: Oct 2 05
Posts: 987
ID: 2322
Credit: 500,253
RAC: 0
Message 18625 - Posted 14 Jun 2006 10:27:59 UTC - in response to Message ID 18614.

Tony, do you have JP's EMail address? The French guy who always needs new Rosetta .exe EMailed? Could you ask him to see if he can help with this post by French person on Q&A boards? The main parts that translated properly were that he's poor, alone and in a wheelchair. Perhaps JP can read between the lines better than the translation website.

Mail sent

Stwato

Joined: Jan 11 06
Posts: 150
ID: 49485
Credit: 147,060
RAC: 0
Message 18626 - Posted 14 Jun 2006 10:48:49 UTC

Another question quickly, efficiently and comprehensively answered!
Thanks a lot guys.

Stwato

Ian

Joined: Apr 14 06
Posts: 29
ID: 76277
Credit: 25,245
RAC: 0
Message 18676 - Posted 15 Jun 2006 1:09:09 UTC

Another for you. Result for WU 20318986

Hope these are useful.
____________
Ian Cundell, St Albans, UK

Robert Everly

Joined: Oct 8 05
Posts: 27
ID: 3289
Credit: 665,094
RAC: 0
Message 18758 - Posted 16 Jun 2006 3:02:58 UTC
Last modified: 16 Jun 2006 3:04:18 UTC

Here's one that went crazy. resultid 23976562

On this host hostid 214416

This host does nothing but crunch.

Another host did complete the WU successfully.

<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
X_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors

*snipped a lot of lines*

allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0061B538 read attempt to address 0x790C3DE3

Engaging BOINC Windows Runtime Debugger...


____________

Martin P.

Joined: May 26 06
Posts: 38
ID: 84658
Credit: 162,945
RAC: 16
Message 18775 - Posted 16 Jun 2006 8:11:37 UTC
Last modified: 16 Jun 2006 8:12:29 UTC

Problems with download of WUs: either no work or heavily overcommitted.

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=1703

Hi,

I run SETI@Home, Einstein and Rosetta. Rosetta is set to 20%. The problem is, that once Rosetta has finished all WUs it never downloads any new WUs. Even when the long_term_debt is highly positive (e.g. 30,000 and bigger) it does not download any WUs. The only way to force download is to pause other projects, but in this case it downloads so many WUs that the computer is overcommitted for many days.

I currently run at "Contact server every 3 days". Even when setting this to 0.3 days before suspending the other projects and resetting it after the download it still downloads too many WUs.

This is what I tried:
1. Set "Contact server every 3 days" to 0.3 days.
2. Set SETI@Home and Einstein to "No new work"
3. Suspend SETI@Home and Einstein
4. Rosetta downloads some WUs
5. Set SETI@Home and Einstein to "Allow new work"
6. Restart SETI@Home and Einstein
7. Set "Contact server every xx days" back to 3 days.
8. Now Rosetta downloads even more WUs, which should not happen since SETI and Einstein are both active -> computer is overcommitted.

Is there a solution to this problem? Resetting long_term_debt to 0.0 on all projects does not help either.


The client errors are there because I have other projects running and therefore manually aborted these work units so that the other projects get their share as well. Otherwise Rosetta would have taken over my computers exclusively for several days.

I followed your advice and let it run for several days without any interference. I did not get any new work for 5 days, but tonight it downloaded 30 work units and is overcommitted again!

Obviously the scheduling of Rosetta does not work at all.
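
To put rough numbers on "overcommitted", here is a minimal back-of-the-envelope sketch. It is not BOINC's actual scheduling algorithm. The 30 work units and the 20% share come from the post above; the 4 CPU hours per work unit and the 7-day deadline are assumptions chosen only for illustration, and `is_overcommitted` is a hypothetical helper.

```python
def is_overcommitted(num_wus, hours_per_wu, resource_share, days_to_deadline):
    """Return True if the queued work cannot finish within the project's share.

    resource_share is the fraction of CPU time given to the project (0.0-1.0).
    This ignores multiple CPUs, downtime, and everything else a real
    scheduler considers.
    """
    hours_needed = num_wus * hours_per_wu
    hours_available = resource_share * 24 * days_to_deadline
    return hours_needed > hours_available


# 30 WUs and a 20% share are from the post; 4 h/WU and 7 days are assumed.
needed = 30 * 4.0
available = 0.20 * 24 * 7
print(f"needed: {needed:.1f} h, available at 20% share: {available:.1f} h")
print("overcommitted:", is_overcommitted(30, 4.0, 0.20, 7))
```

Under these assumed numbers the queue needs several times more CPU hours than a 20% share provides before the deadline, which matches the symptom described.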


____________

Ian

Joined: Apr 14 06
Posts: 29
ID: 76277
Credit: 25,245
RAC: 0
Message 18833 - Posted 17 Jun 2006 0:16:35 UTC

Blimey. Whole flurry of errors. All today (well, yesterday - 16 June). Had nothing like this for weeks.

http://boinc.bakerlab.org/rosetta/result.php?resultid=24427279

http://boinc.bakerlab.org/rosetta/result.php?resultid=24460877

http://boinc.bakerlab.org/rosetta/result.php?resultid=24463664

http://boinc.bakerlab.org/rosetta/result.php?resultid=24495408

http://boinc.bakerlab.org/rosetta/result.php?resultid=24513042
____________
Ian Cundell, St Albans, UK

Lee Carre

Joined: Oct 6 05
Posts: 96
ID: 2884
Credit: 79,331
RAC: 0
Message 18835 - Posted 17 Jun 2006 2:40:05 UTC
Last modified: 17 Jun 2006 2:40:41 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=24571715

I was viewing the graphics window at the time it failed, in case that makes a difference.
____________
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins

Tigher Profile

Joined: Jun 16 06
Posts: 5
ID: 95326
Credit: 5,814
RAC: 0
Message 18842 - Posted 17 Jun 2006 8:49:03 UTC
Last modified: 17 Jun 2006 8:54:05 UTC

I have just joined the project. On one PC, of the 9 WUs it has been sent, it has successfully processed 5 but errored out on 4.

from my log:

17/06/2006 04:25:04 Unrecoverable error for result t299__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_cterm2_nohelix3_hom001__681_83011_0 ( - exit code -1073741819 (0xc0000005))

Clues or advice?


A different unit to that above but some debug info to help devs.
http://boinc.bakerlab.org/rosetta/result.php?resultid=24466158

____________

Jimi@0wned.org.uk

Joined: Mar 10 06
Posts: 29
ID: 64757
Credit: 335,252
RAC: 0
Message 18844 - Posted 17 Jun 2006 11:27:51 UTC

First error ever on this machine (31,000 credit):

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=19785224

stderr out

<core_client_version>5.5.0</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3706611
# cpu_run_time_pref: 14400
# cpu_run_time_pref: 14400
ERROR:: Exit at: .\dock_structure.cc line:401

</stderr_txt>

btw [BOINCUK]Tigher, (0xc0000005) is usually a memory error, in my experience.
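
For anyone decoding these numbers: BOINC logs the Windows exit code as a signed 32-bit integer followed by the same value in hex. A minimal sketch of the conversion follows. The two named codes are standard Windows NTSTATUS values (0xC0000005 is an access violation, which fits the "memory error" description); the helper name `decode_exit_code` and the small lookup table are assumptions for illustration, not a BOINC API.

```python
# Standard Windows NTSTATUS values that appear in this thread.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def decode_exit_code(signed_code):
    """Convert a signed 32-bit exit code to (hex string, status name)."""
    unsigned = signed_code & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x}", name


for code in (-1073741819, -1073741811, -1):
    print(code, *decode_exit_code(code))
```

Codes that are not in the table, such as -1 or the BOINC-level -107, are reported as "unknown" here because they are not NTSTATUS values.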

Tigher Profile

Joined: Jun 16 06
Posts: 5
ID: 95326
Credit: 5,814
RAC: 0
Message 18855 - Posted 17 Jun 2006 15:22:10 UTC - in response to Message ID 18844.



btw [BOINCUK]Tigher, (0xc0000005) is usually a memory error, in my experience.



Gulp! Hmmm thanks.
____________

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18857 - Posted 17 Jun 2006 16:14:38 UTC - in response to Message ID 18855.
Last modified: 17 Jun 2006 16:14:54 UTC

Another problem - looks the same from this end as the other ones I had.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=20802010

When I leave from working on the computer, I'll exit IE to see if that helps.

Bandit's Mom
____________

Leonard Kevin Mcguire Jr. Profile

Joined: Jun 13 06
Posts: 29
ID: 94664
Credit: 10,529
RAC: 0
Message 18861 - Posted 17 Jun 2006 18:47:32 UTC
Last modified: 17 Jun 2006 18:48:14 UTC

http://boinc.bakerlab.org/rosetta/hosts_user.php?userid=94664

I have been accumulating computation errors lately.
____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 18881 - Posted 18 Jun 2006 9:01:30 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=24638007

This WU created about three good models with energy minima between -200 and -300. Then it failed to produce any more good models, with each succeeding model completing within minutes and always with the same energy minimum of about -30. Watching the graphics showed a stretched protein where no folding was achieved. I "aborted" the model the soft way with 6 restarts of BOINC (to prevent sending out the same WU).

I have seen WUs like this in the past. Perhaps there is a pattern.
____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18892 - Posted 18 Jun 2006 16:44:27 UTC - in response to Message ID 18881.

This WU created about three good models with energy minima between -200 and -300. Then it failed to produce any more good models, with each succeeding model completing within minutes and always with the same energy minimum of about -30.

I for one have been HOPING to see WUs that would act like that. If you knew that a -300 was possible, and you are sitting at a -30, there are cases where it might be SMART to bail on this one and invest the time in pursuing something with more potential.

I don't know that this is what happened in your case; I'll leave that for the project team to assess. I just wanted to point out that it is the TYPE of thing that I think we'll see more of as the algorithm gets smarter.
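
The idea of bailing out early can be sketched in a few lines. This is purely an illustrative heuristic and not how Rosetta works: the function `should_abandon` and the 50% threshold are assumptions, and the only facts taken from the thread are that lower (more negative) energy is better and the example values of -300 and -30.

```python
def should_abandon(current_energy, best_energy_so_far, min_fraction=0.5):
    """Decide whether a model is too far from the best energy seen so far.

    Energies are negative when good, so a model is abandoned when it has
    reached less than `min_fraction` of the best (most negative) energy.
    """
    if best_energy_so_far >= 0:
        return False  # no good reference model yet, keep going
    return current_energy > best_energy_so_far * min_fraction


print(should_abandon(-30, -300))   # far from the best model seen
print(should_abandon(-250, -300))  # close to the best model seen
```

A real implementation would also need to decide how long to let a model run before applying the test, since every model starts out far from its final energy.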
____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 18903 - Posted 18 Jun 2006 20:31:41 UTC - in response to Message ID 18892.

This WU created about three good models with energy minima between -200 and -300. Then it failed to produce any more good models, with each succeeding model completing within minutes and always with the same energy minimum of about -30.

I for one have been HOPING to see WUs that would act like that. If you knew that a -300 was possible, and you are sitting at a -30, there are cases where it might be SMART to bail on this one and invest the time in pursuing something with more potential.

I don't know that this is what happened in your case; I'll leave that for the project team to assess. I just wanted to point out that it is the TYPE of thing that I think we'll see more of as the algorithm gets smarter.


I agree! Using previous results for a "pruning" decision is an idea that has crossed my mind for a long time. I do a bit of chess engine programming, and these engines do a lot of "pruning" in positions where one side is simply too far behind to have any chance of reaching the current score with any move. However, in the case reported it was almost certainly something different, since the models finished one after another within a few minutes without really folding the protein (it was stretched in the graphics) and always with the same score. In the end I had over 150 models, of which only three had not been "aborted".

____________

Carlos_Pfitzner Profile
Avatar

Joined: Dec 22 05
Posts: 71
ID: 42027
Credit: 138,867
RAC: 0
Message 18913 - Posted 19 Jun 2006 3:40:47 UTC

stuck at 74.101% Rosetta 5.22 Windows 0.0000% of CPU usage
Thus, aborted by hand after 3 hours of IDLE time!
http://boinc.bakerlab.org/rosetta/result.php?resultid=24659040

Thanks

____________

rriggs

Joined: Jun 5 06
Posts: 5
ID: 88342
Credit: 48,672
RAC: 0
Message 18931 - Posted 19 Jun 2006 14:09:07 UTC

For the past week or so I've been getting 2-3 crashes per day. The failed work units show up as "Compute Error" with no credit. Do I need to report this? Or will the appropriate party see these errors and be able to deal with them on their own?

____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18934 - Posted 19 Jun 2006 15:22:36 UTC - in response to Message ID 18931.

Do I need to report this? Or will the appropriate party see these errors and be able to deal with them on their own?


It is "HELPFUL" if you report them. It gives the opportunity to ask you questions about your computing environment so they might learn more about the system that's seeing the failure. It is not "required".

Credit for failed WUs is issued once the daily credit run is made. You will see this when you display the WU details... not on the WU listing. Like this one for example.

It looks like most of them were ended by the "watchdog". One was a -107 error (which is something that's been under review for a while already).

The watchdog is trying to assure your computer doesn't get stuck in an unexpected loop on a work unit. If it notices no progress on a work unit in 5 restarts, then it ends it. Do you restart this computer frequently? Or have a number of other projects running in BOINC?

If you would, go to your General Preferences, and let us know what you have set for "Switch between applications every...minutes", and for "Leave applications in memory while preempted?". And is Rosetta your only BOINC project?
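
The watchdog rule as described above (end the work unit after 5 restarts without progress) can be sketched as follows. This is an illustration of the rule as stated in this post, not Rosetta's actual watchdog code; `should_end_work_unit` and the idea of passing a list of progress values recorded at each restart are assumptions.

```python
MAX_STALLED_RESTARTS = 5  # the figure quoted in the post above


def should_end_work_unit(progress_at_each_restart, limit=MAX_STALLED_RESTARTS):
    """Return True once `limit` consecutive restarts show no progress.

    progress_at_each_restart: percent complete recorded at each restart,
    oldest first.
    """
    stalled = 0
    pairs = zip(progress_at_each_restart, progress_at_each_restart[1:])
    for previous, current in pairs:
        if current > previous:
            stalled = 0  # progress was made, reset the counter
        else:
            stalled += 1
            if stalled >= limit:
                return True
    return False


print(should_end_work_unit([10, 10, 10, 10, 10, 10]))  # five stalled restarts
print(should_end_work_unit([10, 10, 10, 12, 12, 12]))  # progress in between
```

Under this rule, a machine that is restarted often between checkpoints could trip the limit even though nothing is wrong, which is why the restart frequency matters.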
____________

anders n Profile

Joined: Sep 19 05
Posts: 403
ID: 578
Credit: 537,904
RAC: 0
Message 18939 - Posted 19 Jun 2006 15:58:22 UTC

A crash.

http://boinc.bakerlab.org/rosetta/result.php?resultid=24876847

It happened when I was shutting down the graphics window.

Anders n
____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18940 - Posted 19 Jun 2006 17:26:32 UTC - in response to Message ID 18934.

It looks like most of them were ended by the "watchdog". One was a -107 error (which is something that's been under review for a while already).


Correction, I misread that "watchdog is shutting down" message (again!). I keep thinking this message indicates that the watchdog is shutting down the WU, not just ending itself as a normal end of processing a WU.

Most of their errors were -107s.

____________

rriggs

Joined: Jun 5 06
Posts: 5
ID: 88342
Credit: 48,672
RAC: 0
Message 18978 - Posted 20 Jun 2006 15:14:25 UTC - in response to Message ID 18934.


The watchdog is trying to assure your computer doesn't get stuck in an unexpected loop on a work unit. If it notices no progress on a work unit in 5 restarts, then it ends it. Do you restart this computer frequently? Or have a number of other projects running in BOINC?

If you would, go to your General Preferences, and let us know what you have set for "Switch between applications every...minutes", and for "Leave applications in memory while preempted?". And is Rosetta your only BOINC project?


I'll try to answer your questions here:

Machine is rarely restarted, once every 2-3 days.

This is the only project I have under BOINC. No other background/SETI type applications are installed.

I'm not sure where this "General Preferences" dialog is you're referring to. I don't see anything like this in BOINC.

I am an accomplished C++/Java/.NET developer w/ Visual Studio installed on this box if you need me to grab a stack trace, I'd be happy to next time!

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 18983 - Posted 20 Jun 2006 15:50:45 UTC - in response to Message ID 18978.

I'm not sure where this "General Preferences" dialog is you're referring to. I don't see anything like this in BOINC.


Now that you are viewing this message board, click the "Participants" link in the heading of the screen. In the "Preferences" section, click the link for "view or edit" of General preferences. Any changes made there require BOINC to update to the project to take effect. This is done from the projects tab of BOINC, select Rosetta, then click the update button.

____________

Bandit Profile

Joined: May 21 06
Posts: 12
ID: 83715
Credit: 197,197
RAC: 0
Message 18998 - Posted 20 Jun 2006 17:54:18 UTC - in response to Message ID 18196.

In followup to Message ID 18855, as long as I don't have IE running, I don't seem to have any BOINC problems. If I leave IE on, I have intermittent BOINC crashes. For me, it does not seem to be the screensaver at this time.

Bandit's Mom
____________

andrewsi

Joined: Jun 19 06
Posts: 1
ID: 95888
Credit: 908,495
RAC: 515
Message 19008 - Posted 20 Jun 2006 19:16:58 UTC
Last modified: 20 Jun 2006 19:19:37 UTC

Ran into a compute error with 5.22.

6/20/2006 12:12:35 PM|rosetta@home|Unrecoverable error for result t304__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom001__691_17229_0 ( - exit code -1 (0xffffffff)).

Looks like it was: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=21222160

What other information should I provide?

____________

rriggs

Joined: Jun 5 06
Posts: 5
ID: 88342
Credit: 48,672
RAC: 0
Message 19009 - Posted 20 Jun 2006 19:20:55 UTC - in response to Message ID 18983.


Now that you are viewing this message board, click the "Participants" link in the heading of the screen. In the "Preferences" section, click the link for "view or edit" of General preferences. Any changes made there require BOINC to update to the project to take effect. This is done from the projects tab of BOINC, select Rosetta, then click the update button.


You didn't say what these 'should be' so I'm just reporting what they currently are and not changing anything:

work on batteries: no
work while in use: no
idle: 3 mins
hours: (no restrictions)
leave in memory: no
switch between: 60 mins
multiprocessors: 0 processors (although I have two of them!?)
use at most: 100 percent of CPU


____________

tralala

Joined: Apr 8 06
Posts: 376
ID: 73828
Credit: 581,806
RAC: 1
Message 19012 - Posted 20 Jun 2006 20:09:31 UTC
Last modified: 20 Jun 2006 20:15:08 UTC

When I restarted my computer I lost over an hour on this WU. At restart it went back to 0% after running about an hour on my fast Athlon 64 @2.44 GHz. Obviously no checkpoint occurred during this time. I know t296 is very big, but no checkpoint within an hour is not good (since one hour is the default switch time of BOINC).
____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 19016 - Posted 20 Jun 2006 21:05:18 UTC

rriggs, if you are actively using that computer much at all... your settings are preventing you from getting much work done. You see you've told BOINC to wait until you've not used your computer for 3 minutes before it runs. Then it starts running. When you return to use your computer, you've told it to remove the applications from memory, and so any work it has performed since the last checkpoint will be lost. Since Rosetta typically checkpoints no more than every 20 minutes, if you have left your computer for 15 minutes, then you've crunched for 12 minutes (after waiting for the 3 minute delay before it starts) and then when you use your computer again, you are throwing away the 12 minutes of work. And so you later have to redo that 12 minutes of work.

I don't know how aggressively you intend to crunch. But you can preserve the work done (12 min. in my example) to continue on it later by setting the "leave in memory" setting to YES. You've got 2GB of memory, so that gives you plenty of room. Also, it just keeps it in virtual memory, not actually the physical memory of the machine. So, changing this setting will preserve these short work periods, and not impact your computer use. By keeping applications in memory, you would only lose bits of work when you actually turn off the computer.
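
The arithmetic in the example above can be written out as a small sketch. The 3-minute idle delay, 15-minute absence, and 20-minute checkpoint spacing are the figures from this post; the function `work_lost_on_return` is hypothetical, and it makes the simplifying assumption that checkpoints land exactly every 20 minutes of crunching, which real work units will not do.

```python
def work_lost_on_return(away_minutes, idle_delay_minutes,
                        checkpoint_interval_minutes, leave_in_memory):
    """Minutes of crunching thrown away when the user returns.

    Assumes crunching starts after the idle delay and that a checkpoint is
    written exactly every `checkpoint_interval_minutes` of crunching.
    """
    crunched = max(0, away_minutes - idle_delay_minutes)
    if leave_in_memory:
        return 0  # the task stays in memory and simply resumes later
    # Only the work since the most recent checkpoint is lost.
    return crunched % checkpoint_interval_minutes


print(work_lost_on_return(15, 3, 20, leave_in_memory=False))
print(work_lost_on_return(15, 3, 20, leave_in_memory=True))
```

With "leave in memory" set to no, every short break costs whatever was crunched since the last checkpoint; with it set to yes, the same break costs nothing.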

Now, you also have a dual-core CPU. So you could be crunching 2 work units at the same time. But you have set BOINC to only use one. You can set the "On multiprocessors, use at most" setting to 2 and use both of them. I'm not positive what it does when you have that set to zero.

It would be even more aggressive to crunch while your computer is in use. I take it you've got 2GB of memory because you have some pretty intense applications you wish to use. So, your current setting of NOT working while your computer is in use should probably remain. But, just FYI, I run with half as much memory and run it all the time, and there is no noticeable effect on my running applications.

Having said all of that... your errors are mostly the -107 errors. Looks like you get either a -107 or a -1 about 10% of the time. I'm not sure, perhaps leaving in memory will reduce your chances of hitting the -107 errors. But otherwise, I don't believe the above will resolve the problem you are having with erroring work units. They are already working on a fix for the -107 errors. There are a number of people hitting that more frequently lately.
____________

Winkle

Joined: May 22 06
Posts: 88
ID: 83983
Credit: 1,093,044
RAC: 499
Message 19037 - Posted 21 Jun 2006 7:42:53 UTC

I have t307__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_hom001__714_20997_0 using rosetta version 5.22 and it has been running now for 24 hrs. It has been stuck on 100% for at least the last hour I have been watching it. Mem usage of Rosetta was 88M and is now 94M after 30 mins. Now 97M and climbing.
CPU usage doesn't change when I suspend the task from the BOINC manager.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=20861564

The show graphics screen says...
68.601% complete
CPU time: 24 hr 0 min
Stage: Ab initio + relax
Model 116 step 0
Accepted Enrgy 44.55485

Nothing is changing on the screen. The protein looks like a single zig-zag line

Target CPU time is set to 8 hrs.

Do I abort ?

Winkle

Joined: May 22 06
Posts: 88
ID: 83983
Credit: 1,093,044
RAC: 499
Message 19041 - Posted 21 Jun 2006 7:59:09 UTC - in response to Message ID 19037.

I have t307__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_hom001__714_20997_0 using rosetta version 5.22 and it has been running now for 24 hrs. It has been stuck on 100% for at least the last hour I have been watching it. Mem usage of Rosetta was 88M and is now 94M after 30 mins. Now 97M and climbing.
CPU usage doesn't change when I suspend the task from the BOINC manager.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=20861564

The show graphics screen says...
68.601% complete
CPU time: 24 hr 0 min
Stage: Ab initio + relax
Model 116 step 0
Accepted Enrgy 44.55485

Nothing is changing on the screen. The protein looks like a single zig-zag line

Target CPU time is set to 8 hrs.

Do I abort ?


I ended up aborting it... The machine became unworkable.
I have reported it in another thread.

Ian

Joined: Apr 14 06
Posts: 29
ID: 76277
Credit: 25,245
RAC: 0
Message 19044 - Posted 21 Jun 2006 8:14:24 UTC

Another:

http://boinc.bakerlab.org/rosetta/result.php?resultid=25008379 (WU 21172214)

<core_client_version>5.2.13</core_client_version>
<message>process exited with code 131 (0x83)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3248719
SIGBUS: bus error

Ooo-er. That doesn't sound healthy.
____________
Ian Cundell, St Albans, UK

Bober [B@P] Profile
Avatar

Joined: Jun 12 06
Posts: 3
ID: 94620
Credit: 5,113
RAC: 0
Message 19053 - Posted 21 Jun 2006 12:15:28 UTC - in response to Message ID 19044.
Last modified: 21 Jun 2006 12:17:42 UTC

Recently I've had -107 errors:
http://boinc.bakerlab.org/rosetta/result.php?resultid=24946846
http://boinc.bakerlab.org/rosetta/result.php?resultid=24946856

I've just started crunching for Rosetta. I don't use any screensaver.
The same error has just occurred on my Ralph, but with the 5.24 app.

What can I do to avoid those errors?
____________

rriggs

Joined: Jun 5 06
Posts: 5
ID: 88342
Credit: 48,672
RAC: 0
Message 19060 - Posted 21 Jun 2006 14:38:07 UTC - in response to Message ID 19016.
Last modified: 21 Jun 2006 14:46:22 UTC

rriggs, if you are actively using that computer much at all... your settings are preventing you from getting much work done. You see you've told BOINC to wait until you've not used your computer for 3 minutes before it runs. Then it starts running. When you return to use your computer, you've told it to remove the applications from memory, and so any work it has performed since the last checkpoint will be lost. Since Rosetta typically checkpoints no more than every 20 minutes, if you have left your computer for 15 minutes, then you've crunched for 12 minutes (after waiting for the 3 minute delay before it starts) and then when you use your computer again, you are throwing away the 12 minutes of work. And so you later have to redo that 12 minutes of work.

Now, you also have a dual-core CPU. So you could be crunching 2 work units at the same time. But you have set BOINC to only use one. You can set the "On multiprocessors, use at most" setting to 2 and use both of them. I'm not positive what it does when you have that set to zero.


I never even saw this page, let alone adjusted the settings so these are the defaults. Perhaps the setup process should either pick better defaults or bring this page to my attention so I would have found it sooner?

I guess when I installed I just picked Activity|Run Always and Activity|Network always available, so it has been running non-stop! This may potentially invalidate your hypothesis about why I'm getting -107 errors since the app is never leaving memory.

ps. It had crashed by the time I came in this morning, so I tried to debug it, but my machine locked up launching the debugger. I will try again tomorrow!

____________

Feet1st Profile
Avatar

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 19062 - Posted 21 Jun 2006 14:45:45 UTC - in response to Message ID 19060.
Last modified: 21 Jun 2006 15:26:31 UTC

Perhaps the setup process should either pick better defaults or bring this page to my attention so I would have found it sooner?

My task manager has always shown them both crunching at 50% CPU.

What do you think?


OK, my apologies. As I said I wasn't certain what it does when CPUs is set to zero. So that isn't an issue. If you've got 2 WUs running at 50% CPU each then you are fully crunching... when your computer is not in use.

I would still suggest you set the leave in memory to YES. Save all the work done during coffee breaks and during meetings or conference calls or whatever pulls you away from the computer.

As for changing the setup process, unfortunately that is not something Rosetta could change. It would be changed by the BOINC folks. So you would have to take up that suggestion on the BOINC boards.

Every time your PC is idle for 3 minutes, BOINC will start crunching... then when you come back and use the computer, BOINC suspends... and removes from memory, that was the thought. ...except since you've not said to "run based on preferences"... it's actually running all the time, regardless of whether other applications are in use?
____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com

rriggs

Joined: Jun 5 06
Posts: 5
ID: 88342
Credit: 48,672
RAC: 0
Message 19063 - Posted 21 Jun 2006 14:47:56 UTC - in response to Message ID 19062.

OK, my apologies. As I said I wasn't certain what it does when CPUs is set to zero. So that isn't an issue. If you've got 2 WUs running at 50% CPU each then you are fully crunching... when your computer is not in use.


Oops. I edited my post while you were replying. Please recheck it now!

____________

Charles Dennett

Joined: Sep 27 05
Posts: 88
ID: 1447
Credit: 877,343
RAC: 0
Message 19064 - Posted 21 Jun 2006 14:48:52 UTC

Before leaving for work this morning I checked my Linux box. CPU was at 0%. The ps command showed the boinc and rosetta processes there but doing nothing. Looked like it had stopped just a short while after starting a new WU. I stopped and restarted boinc and the WU took off normally. It just finished and reported. Here's the WU:

http://boinc.bakerlab.org/rosetta/result.php?resultid=25090497

Charlie
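
For anyone who wants to confirm that a process really is "there but doing nothing", here is a minimal Linux-only sketch that reads the CPU time counters from /proc twice and compares them. The helper names are hypothetical, the pid has to be looked up yourself (for example with ps), and the 60-second wait is an arbitrary choice.

```python
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    fields = stat_line[stat_line.rfind(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_ticks(pid):
    """Read the current CPU tick count for a running process."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def looks_stalled(pid, wait_seconds=60):
    """True if the process used no CPU time over wait_seconds."""
    before = cpu_ticks(pid)
    time.sleep(wait_seconds)
    return cpu_ticks(pid) == before
```

Calling `looks_stalled(pid)` on the rosetta process would return True if its CPU time did not move during the wait, which matches the 0% CPU symptom described above.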

____________

Feet1st

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 19065 - Posted 21 Jun 2006 15:23:11 UTC - in response to Message ID 19053.

Recently I've had -107 errors:
http://boinc.bakerlab.org/rosetta/result.php?resultid=24946846
http://boinc.bakerlab.org/rosetta/result.php?resultid=24946856

I've just started crunching for Rosetta. I don't use any screensaver.
The same error has just occurred on my Ralph, but with the 5.24 app.

What can I do to avoid those errors?


Lukasz, you've already done what you can (so far as I know). One of your results reported a lot of useful information that will help analyze the problem.

Your computer time is still helping the project, and you are still getting credit for all the time crunching, so do not be deterred. Running on Ralph records additional diagnostic information back to the project. Hopefully they can determine the root cause soon. I see you have 2 out of 5 of your WUs failed, and another that you aborted for some reason.
____________

Bober [B@P]

Joined: Jun 12 06
Posts: 3
ID: 94620
Credit: 5,113
RAC: 0
Message 19069 - Posted 21 Jun 2006 15:48:54 UTC - in response to Message ID 19065.
Last modified: 21 Jun 2006 15:53:12 UTC


I see you have 2 out of 5 of your WUs failed, and another that you aborted for some reason.


The reason is that I thought it was my computer's fault and I didn't want it to spoil more WUs. I have to admit that the PC was overclocked a bit and it is very hot today, so I had to change some settings. But don't worry, I'm far from being discouraged. I will crunch again for Rosetta soon :)

Thank you for the reply!
____________

Charles Dennett

Joined: Sep 27 05
Posts: 88
ID: 1447
Credit: 877,343
RAC: 0
Message 19080 - Posted 21 Jun 2006 20:45:20 UTC - in response to Message ID 19064.

Before leaving for work this morning I checked my Linux box. CPU was at 0%. The ps command showed the boinc and rosetta processes there but doing nothing. Looked like it had stopped just a short while after starting a new WU. I stopped and restarted boinc and the WU took off normally. It just finished and reported. Here's the WU:

http://boinc.bakerlab.org/rosetta/result.php?resultid=25090497

Charlie


This might be a problem on my end. Came home from work to find the machine in the same state. Checked STDOUT from boinc (I redirect it to a file) and both this morning and this afternoon it complained about network problems. However, this afternoon restarting boinc didn't work. It was trying to download new work but the network problems were preventing it. Boinc kept shutting down. I also could not get out to the net in my web browser. So, I reset my router. It's either a problem with my router or the cable connection is messing up. Hard to tell which at this point, but this past weekend the router was hung so badly I had to do a hard reset and reconfigure it from scratch. The cable company's network status page shows some problems in some surrounding areas but not my particular area. Time to use that gift card from Best Buy!

Charlie
____________

TCU Computer Science

Joined: Dec 7 05
Posts: 28
ID: 32027
Credit: 12,616,170
RAC: 116
Message 19102 - Posted 22 Jun 2006 3:49:26 UTC - in response to Message ID 18612.

rosetta 5.22
WU Name: t316__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_secondhalf_hom019__726_329
running on Mac OS 10.4.6

BOINC Manager Tasks tab shows CPU Time stuck at 03:21:43 and 35.5%
top command shows TIME = 37:51:05 and climbing

stopped and restarted BOINC
CPU Time reverted to 02:50:49 and 35.5% but no longer stuck

This is on a G5 crunching only for rosetta.
The two previous instances of this problem occurred on a G4 crunching rosetta + ralph + einstein.
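
For what it's worth, the size of the jump back can be worked out from the two CPU Time readings. A small sketch (the helper names are just for illustration):

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' CPU time string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def rollback(before_restart, after_restart):
    """Return the CPU time lost across a restart as (minutes, seconds)."""
    return divmod(to_seconds(before_restart) - to_seconds(after_restart), 60)


print(rollback("03:21:43", "02:50:49"))  # (30, 54)
```

That is roughly 31 minutes, which would be consistent with resuming from an earlier checkpoint, though that is only a guess from the numbers.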

[B^S] Dr. Bill Skiba

Joined: Oct 26 05
Posts: 5
ID: 6918
Credit: 100,454
RAC: 1
Message 19241 - Posted 24 Jun 2006 21:52:31 UTC

Just aborted this work unit.

http://boinc.bakerlab.org/rosetta/result.php?resultid=25006316

Stuck at 1hr 7min - suspended and resumed several times to no avail. The next Rosetta work unit seems to be running normally.

rosetta 5.22
windows 2K
athlon xp 2500 barton
____________

Clare Jarvis

Joined: Dec 14 05
Posts: 8
ID: 37042
Credit: 781,656
RAC: 210
Message 19338 - Posted 27 Jun 2006 1:48:14 UTC

I have been having similar problems. I cannot leave Rosetta alone or it simply hangs. But if I visit and hit "Update" every day then I get much better production. Is this a problem with Rosetta or with Boinc? It is very frustrating. I wish the statistics page had the start time and date of each run along with the deadline.


____________

Pepo

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 75,237
RAC: 18
Message 19418 - Posted 28 Jun 2006 14:38:25 UTC
Last modified: 28 Jun 2006 15:09:05 UTC

I have (occasionally) had the problem of stalled/hanging Rosettas (somewhere in the middle, not at 0%, 1% or 100% progress) for ages already, on Red Hat EL 4.1. I am now using BCC 5.4.9, attached to 7 projects; Rosetta's share is ~20%. The computer runs for months between reboots, without graphics.

The symptoms are that the Rosetta app seems to be running, but the CPU time does not increase. Recently I've noticed that BCC is not even able to run benchmarks when this happens. IIRC, previously, if BCC was able to switch to another app, that app got 0 CPU cycles (because Rosetta was consuming all of them) and did not increment its time. Usually the only way to overcome this problem was to manually restart BCC. This way the Rosettas were able to continue and finish. (Whether correctly? I can now see a few (5) "process exited with code 131 (0x83)" messages in the logs since March.)
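
One possible reading of code 131 (0x83), assuming the client reports exit statuses the way Unix shells do (128 + signal number). That convention is an assumption here and has not been confirmed for BOINC; the helper name is made up for illustration.

```python
import signal


def decode_exit_code(code):
    """Describe an exit code under the shell convention 128 + signal number.

    This convention is an assumption here; BOINC may use the value differently.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"   # signal number not defined on this platform
        return f"{code} (0x{code:02x}): possibly signal {signum} ({name})"
    return f"{code} (0x{code:02x}): ordinary exit status"


print(decode_exit_code(131))
```

Under that assumption 131 would correspond to signal 3, which on Linux is SIGQUIT. Whether that is what actually ended those results cannot be told from the log message alone.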

This time, a week ago, I made a few snapshots of the suspended rosetta 5.22 result t312__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom010__711_1635_0 and reported them in the "Rosetta WU's stall on RedHat Fedora" thread. It is stuck at 28.80% (2:43:29 CPU time), maybe for a day already. I'll try to restart BCC and see whether anything new comes into the files in its slot/3/ dir. And then abort and report; it's now past the deadline anyway...

Yes, it restarted happily, CPU time jumped from 2:43:29 to 1:43:29 and is incrementing, but progress stayed at 28.80% and does not move. Aborting...

<core_client_version>5.4.9</core_client_version>
<message>
aborted by user
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# random seed: 1940641
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (14 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x8621564]
[0x87f229b]
[0x873b844]
[0x873d0af]
[0x85a95e9]
[0x85b190a]
[0x83d6c9f]
[0x86022d3]
[0x84740c8]
[0x88c41e4]
[0x8048111]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x88e5473]
[0x88b6601]
[0x88b8029]
[0x805fdd8]
[0x83d75de]
[0x83d90a0]
[0x83d8f89]
[0x83d72ca]
[0x88cb7ef]
[0x885bff0]
[0x8865f65]
[0x88f771a]

Exiting...
SIGSEGV: segmentation violation
Stack trace (14 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x853664c]
[0x854a184]
[0x830867c]
[0x8308fdf]
[0x86c4a6a]
[0x86c6f15]
[0x83d6f08]
[0x86022d3]
[0x84740c8]
[0x88c41e4]
[0x8048111]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
ERROR:: Exit at: fragments.cc line:459
FILE_LOCK::unlock(): close failed.: Bad file descriptor

</stderr_txt>

Peter

Feet1st

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 19422 - Posted 28 Jun 2006 16:21:28 UTC - in response to Message ID 19338.

But if I visit and hit "Update" every day then I get much better production. Is this a problem with Rosetta or with Boinc?


BOINC is responsible for contacting the projects that it needs to get work from. Performing an update wouldn't have much to do with a hung work unit. Are you saying you end up without work? Or are you saying that your existing WUs are not ending properly?

____________

Feet1st

Joined: Dec 30 05
Posts: 1725
ID: 44890
Credit: 843,377
RAC: 108
Message 19423 - Posted 28 Jun 2006 16:24:06 UTC

Pepo: I'm not clear how long you observed the running of the WU after restarting it. But the progress % does not change very frequently and this is normal. Here is some relevant information on the subject. Perhaps you are saying you let it run for over an hour with no progress... that would be another matter. But, if not, that portion of what you are describing is probably normal and does not require your intervention to abort.
____________

Pepo

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 75,237
RAC: 18
Message 19425 - Posted 28 Jun 2006 16:39:14 UTC - in response to Message ID 19423.
Last modified: 28 Jun 2006 16:40:16 UTC

Pepo: I'm not clear how long you observed the running of the WU after restarting it. But the progress % does not change very frequently and this is normal. Here is some relevant information on the subject. Perhaps you are saying you let it run for over an hour with no progress... that would be another matter. But, if not, that portion of what you are describing is probably normal and does not require your intervention to abort.

Yes, I've read the FAQ. If you look at the Rosetta WU's stall on RedHat Fedora thread I mentioned, the Rosetta was hung for at least more than a day, I could look into the logs to tell exactly.

I usually check the machine once every day or two (because of Rosetta :-) and restart Boinc if this happens. It has been happening for a long time already; I'm pretty sure for a few months.

Peter

Pepo

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 75,237
RAC: 18
Message 19432 - Posted 28 Jun 2006 19:24:05 UTC - in response to Message ID 19425.

Pepo: I'm not clear how long you observed the running of the WU after restarting it.

[...]the Rosetta was hung for at least more than a day, I could look into the logs to tell exactly.

I'm sorry, Feet1st, I did not read carefully enough. I aborted the result 20 minutes after restarting it.

Peter


Copyright © 2010 University of Washington

Last Modified: 3 Dec 2007 20:36:19 UTC