My computer is freezing on one of these two WUs - 19746278 or 19737574. It's happened several times today. I'd come into the room, note that the graphics weren't animated, the steps weren't incrementing, and the clock would be stopped. I'd move the mouse (I have it configured to work on the project only when I'm not using the machine), and would get a notice that there was an error and would I like to report to Microsoft. Next time, I'll get a screenshot.
Bandit's Mom
____________
ID: 18275 | Rating: 0 | rate:
/
Moderator9 Forum moderator Project administrator Joined: Jan 22 06 Posts: 1014 ID: 53254 Credit: 0 RAC: 0
It is possible that there is a background task running (disk defrag, virus check, etc.) that is preventing your system from becoming fully idle. This, of course, would also prevent BOINC from processing any work.
____________
Moderator9
ROSETTA@home FAQ
Moderator Contact
Nope - the only other things I have open are Word (x2), Excel (x2), Reference Manager, and a Mah Jongg game program.
Here's the text of the error message: "rosetta_5.22_windows_intelx86.exe has encountered a problem and needs to close. We are sorry for the inconvenience."
Bandit's Mom
____________
ID: 18283 | Rating: 0 | rate:
/
Moderator9 Forum moderator Project administrator Joined: Jan 22 06 Posts: 1014 ID: 53254 Credit: 0 RAC: 0
There was a problem with the screen saver on some Windows systems with version 5.16. This was supposed to be fixed in the new release. Have you tried running BOINC with the screen saver turned off?
In order to assist you, you will have to either provide a link to the reported results in your Stats list or make your computer visible. Currently your computers are hidden, so I cannot look up any of your results to see the actual errors.
You can make your computers visible from your preferences without risk to your computer security. If you want to see the kind of information others might see, you can click on any other user in the forums, and then click on the link to view their computers.
Of course the system will allow you to see more information on your own systems than it would reveal to others.
____________
Result ID 23395831
Name t306__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom001__656_21236_0
Workunit 19711334
Created 8 Jun 2006 22:11:53 UTC
Sent 8 Jun 2006 23:44:52 UTC
Received 9 Jun 2006 11:22:33 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 246538
Report deadline 15 Jun 2006 23:44:52 UTC
CPU time 21089.578125
stderr out <core_client_version>5.3.12.tx36</core_client_version>
<stderr_txt>
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts
Validate state Valid
Claimed credit 71.993372528454
Granted credit 71.993372528454
application version 5.22
In another post I have seen that someone also reported that the watchdog is shutting down the process.
Regards,
Ricardo
This is a normal shutdown for a successfully completed workunit.
The note regarding the watchdog is just to identify that now that the work unit has finished, the watchdog function is being closed down as well.
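For anyone who wants to sanity-check a result like the one above, here is a minimal sketch that pulls the "# key: value" lines out of the stderr text and compares the reported CPU time with cpu_run_time_pref. The function name and the regular expression are my own illustration, not part of BOINC or Rosetta, and the idea that the run ends once another model would not fit in the remaining time is only an assumption.

```python
import re

# stderr text copied from the result above
STDERR = """\
# random seed: 1831515
# cpu_run_time_pref: 21600
# DONE :: 1 starting structures built 21 (nstruct) times
# This process generated 21 decoys from 21 attempts
"""

def parse_numeric_settings(text):
    """Return {'name': int} for lines shaped like '# name: 123' (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"^#\s*([\w ]+?):\s*(\d+)\s*$", line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

settings = parse_numeric_settings(STDERR)
cpu_time = 21089.578125          # "CPU time" field of the result, in seconds
decoys = 21                      # from "generated 21 decoys from 21 attempts"

seconds_per_decoy = cpu_time / decoys
seconds_left = settings["cpu_run_time_pref"] - cpu_time

print(settings)
print(seconds_per_decoy, seconds_left)
```

With these numbers the time left under the preference is smaller than the average time per decoy, which would be consistent with a normal finish under the assumption stated above.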
____________
Regards,
Bob P.
I've turned the screensaver off on the Control Panel - is there something I should do in my BOINC preferences? (Color me ignorant, and what I read isn't staying with me at the moment.) You should be able to "see" my computer now, unless there's something else I should do. "... provide a link to the reported results in your Stats list ..." Not sure how to do this.
Bandit's Mom
...You should be able to "see" my computer now, unless there's something else I should do.
They still show "hidden". In your Rosetta preferences, select YES for the question "Should Rosetta@home show your computers on its web site". More details here.
____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com
It's selected as "yes." Maybe it needed time to implement in the system. Maybe you could try again?
Did you hit the "Update preferences" button at the bottom of the screen? I just looked again and it still shows as "hidden".
____________
I went back to look and it was saved as "yes," but hit the "Update" button again, just for giggles and grins. Maybe it didn't take the first time around.
Bandit's Mom
____________
ID: 18319 | Rating: 0 | rate:
/
Moderator9 Forum moderator Project administrator Joined: Jan 22 06 Posts: 1014 ID: 53254 Credit: 0 RAC: 0
Looking at your two errors, one is not well described, and in fact looks like a normal work unit. The latest has a fountain of error data to examine, and at first glance it does seem to be related to the screen saver issue (-107 error), but the window that was in the foreground when the system terminated the work unit was Explorer. This could just be an artifact of the way Windows uses Explorer, but is there any chance you had left your browser loaded when the problem occurred?
My best guess right now is that if you set your screen saver to something other than BOINC or turn it off (from what I read you already did that) the problem may go away. If it does could you please let us know? We are trying to fix that particular issue, and the programmers thought they had it under control. We need to know if they do not.
____________
It's not only possible, it's probable that IE was loaded. With the BOINC screensaver off, I'm not certain that I would be able to tell that there was a problem as quickly, but am willing to give it a go.
I'm going to switch my computer back to "hidden."
Thanks for your help.
Bandit's Mom
____________
ID: 18324 | Rating: 0 | rate:
/
Moderator9 Forum moderator Project administrator Joined: Jan 22 06 Posts: 1014 ID: 53254 Credit: 0 RAC: 0
...I'm going to switch my computer back to "hidden."
Thanks for your help.
Bandit's Mom
...
While there is no requirement that you leave your computer visible, there is no real risk to you in doing so. If there is a problem, it is almost impossible for people to assist you unless they can look at the work unit results for your computer. It is your choice, however.
____________
ID: 18326 | Rating: 0 | rate:
/
Alan Roberts Joined: Jun 7 06 Posts: 56 ID: 93009 Credit: 1,210,119 RAC: 1,392
Hello Moderator9
I pulled a client error on Work Unit 19677012 (the result is at http://boinc.bakerlab.org/rosetta/result.php?resultid=23357465), with exit code -1. Client messages for the error began with, "rosetta not responding to screensaver, exiting"
Then again on WU 19684873 (result link http://boinc.bakerlab.org/rosetta/result.php?resultid=23366061). Exit code was again -1, client-side messages once again report, "rosetta not responding to screensaver, exiting".
The first was a 5.16 client, the second was with a 5.22. I'm not sure the screensaver bug is completely dead yet?
Cheers,
Alan
____________
ID: 18327 | Rating: 0 | rate:
/
Moderator9 Forum moderator Project administrator Joined: Jan 22 06 Posts: 1014 ID: 53254 Credit: 0 RAC: 0
I have sent a message to the project team regarding the possibility that the screen saver may still be causing problems for a few of you. If you report a problem that seems related to the screen saver, please be certain to make your computers visible in your preferences to assist them in examining the problem. Also if you could provide a link to any workunit results that may have been produced during the time you noticed the problem that would be helpful.
For those of you who may not know how to create a link -
1) Go to the place you want the link to take the user and copy the "HTTP" address from the browser address field at the top of your browser window.
2) Open the post in which you wish to place the link and type "[_url=" (leaving out the "_" which I have added here to disable the command) at the point in the message where you want the link to appear.
3) Paste the link address after the "=" with no spaces and add a "]" to the end of the address.
4) Type any text you may want to identify the link.
5) Complete the link command by typing "[/url]".
So if I wanted to create a link to the Rosetta FAQs it would look like this -
(Except you would leave out the "_", which I added for display purposes)
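The five steps above can also be sketched in a few lines of Python, in case you want to build several links at once. The helper name bbcode_link is made up for this illustration, and the address used is simply one of the result links already posted in this thread.

```python
def bbcode_link(address, text):
    """Build a forum link tag: [url=ADDRESS]TEXT[/url] (hypothetical helper)."""
    # Step 3: the address follows "=" with no spaces, then a closing "]".
    # Steps 4 and 5: the visible text, then the closing [/url] tag.
    return "[url=" + address.strip() + "]" + text + "[/url]"

link = bbcode_link(
    "http://boinc.bakerlab.org/rosetta/result.php?resultid=23357465",
    "Result 23357465",
)
print(link)
```

Paste the printed text into your post wherever the link should appear.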
Possible problem with a t299_CASP7 work unit (link to WU). Was a happy camper, then at step 370K+ on Model 6 my CPU dropped from 100% to nothing, and graphics display showed no progress. Didn't write down the stuck step number, sorry.
Suspend on that task released the waiting next, which drove the CPU back to full load. Suspended task #2 and resumed #1, but it still didn't seem to grab any CPU.
For lack of knowing any better (new user), I shutdown BOINC and restarted, which I think I understand to mean that the task resumes at the previous checkpoint (model boundary)? It is now running again.
What is accepted practice if it hangs up again, please? Do I wait for some watchdog abort, or manually abort it? I don't really care about credits, I'll take whatever action provides the best feedback about the failure. Thanks!
____________
Wait and don't abort. It will finish within an hour at most after "completion". Rosetta waits for the watchdog to shut down. This was introduced in 5.19 for better debugging; the problem was reported over at RALPH and supposedly fixed in 5.22. It is very good that you report this here.
If you happen to observe this again please check whether the graphics show 100% as well or something lower and make a screenshot from the graphics window in "idling" state.
____________
The Fatal Windows Error Bug is still with us, I'm afraid. wuid=19791659
Result ID 23483927
Name t309__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__661_7645_0
Workunit 19791659
Created 9 Jun 2006 11:23:04 UTC
Sent 9 Jun 2006 12:59:31 UTC
Received 10 Jun 2006 19:23:54 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 16 Jun 2006 12:59:31 UTC
CPU time 28426.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1655106
</stderr_txt>
Validate state Invalid
Claimed credit 109.225790694355
Granted credit 0
application version 5.22
ID: 18405 | Rating: 0 | rate:
/
Rhiju Forum moderator Project administrator Project developer Project scientist Joined: Jan 8 06 Posts: 223 ID: 48256 Credit: 3,546 RAC: 0
Hi Alan:
Thanks for reporting. There seem to be numerous little issues with the screensaver, and we've been trying to track them down one-by-one over on the test project, ralph. But I haven't seen a lot of problems like the one you describe -- has it happened in previous work units before this double batch? I wonder if something went haywire with the core boinc application -- you may need to restart.
____________
ID: 18409 | Rating: 0 | rate:
/
Rhiju Forum moderator Project administrator Project developer Project scientist Joined: Jan 8 06 Posts: 223 ID: 48256 Credit: 3,546 RAC: 0
Hi mmciastro... yes, we know it's still there. You might be happy to know that the error -1073741811 (0xc000000d) is currently number one on our list of things to kill. It's been the most common error for a while, but Rom only now has come up with a hypothesis for what the cause might be. He has just put in some extra debugging stuff on ralph to track it down -- maybe that will let us unravel this puzzle!
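A note on reading these exit codes: the negative number and the hex value are the same 32-bit quantity, shown once as a signed integer and once as unsigned hex. The sketch below shows the conversion. The small lookup table uses the standard Windows NTSTATUS names for the two codes that appear in this thread; the function names are only for illustration.

```python
# Standard Windows NTSTATUS names for the two codes seen in this thread.
KNOWN_STATUS = {
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF

def describe(exit_code):
    value = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(value, "unknown")
    return "%d (0x%08x) %s" % (exit_code, value, name)

print(describe(-1073741811))
print(describe(-1073741819))
```

The name only says what kind of failure Windows reported; it does not by itself say what in the application caused it.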
It's only a problem for certain video cards, .net, whatever it is. If regular users who get this turn OFF the screensaver, they'll never see it until it's fixed. I'm in direct communication with Rom on this bug, just FYI. I happen to have a machine that regularly gets this error (lucky me, and I guess, lucky Rosetta/Rom).
tony
ID: 18411 | Rating: 0 | rate:
/
Alan Roberts Joined: Jun 7 06 Posts: 56 ID: 93009 Credit: 1,210,119 RAC: 1,392
Rhiju,
Work units before the failures and after were completed based on a look at my results. I may have pulled a boinc restart somewhere in there ...
I'm pitching this as an employee contribution/team-effort project at one of my customer sites, and the three of us who are the test cases have been grabbing our volunteer minutes here and there getting our sample desktops running, to demonstrate safety (at least lack of harm and impact on the "real work") and stability.
When I saw the comments about screen saver issues on the forum and noticed my failures, I may have restarted boinc in a quick-and-dirty quest for a fix.
I set the test machines to not use the screen saver over the weekend, but if there is a better procedure for providing debugging information (i.e., "run the screensaver and do the following if you get another error"), please let me know.
Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.
I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."
Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.
Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.
I forced an update so the result would be available prior to my posting this report.
Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.
This is related to a bug report on Ralph. The behaviour is exactly the same. It was supposed to be fixed in 5.22; obviously it is not.
____________
ID: 18473 | Rating: 0 | rate:
/
Craig Miller Joined: Jun 5 06 Posts: 1 ID: 89284 Credit: 116,933 RAC: 56
I am having a problem running Rosetta. I attach to Rosetta using BOINC Manager and receive the notice of a successful attachment. When I look at BOINC Manager it shows Rosetta running, while Einstein and SETI are suspended. But when I come back several hours later Rosetta is not present, either in Projects or Tasks. When I look at the messages they seem to show Rosetta being loaded and started, but then it ends with "Detaching from project", shown below.
-------------
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.
The below linked WU crunched for over 2 hours, and yet was stuck at 0.00%. I aborted the unit because it appeared to be completely hung up, and stopped crunching. I would be interested to know what the problem was...
The below linked WU has severe memory leakage...using >275Megs of CPU memory bringing the hosts commit charge to nearly 600Megs. WU was aborted by user.
What could be causing this problem?
Rosetta's servers were just upgraded to support BAM last week. But it looks like BAM did something, not Rosetta.
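To see why BAM looks like the trigger, it helps to line up the timestamps in the messages posted above. This is a rough sketch that splits each message line on "|" (the three-field layout is taken from the pasted log, not from any BOINC documentation) and measures the time between the account manager contact and the detach. The helper names are hypothetical.

```python
from datetime import datetime

# Message lines copied from the post above; fields are separated by "|".
LOG = """\
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
"""

def parse_line(line):
    """Split one message line into (timestamp, project, message)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%d-%b-%y %H:%M:%S")
    return when, project, message

def first_time(events, fragment):
    """Timestamp of the first message containing the given text, or None."""
    for when, _project, message in events:
        if fragment in message:
            return when
    return None

events = [parse_line(line) for line in LOG.splitlines() if line.strip()]
contact = first_time(events, "Contacting account manager")
detach = first_time(events, "Detaching from project")
gap_seconds = (detach - contact).total_seconds()
print(gap_seconds)
```

The reset and detach follow the account manager contact within a few seconds, while the task itself had been running for several minutes before that, which fits the guess that the detach came from the account manager side.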
____________
Host list is unhidden, host is a Mac Mini Core Duo, 1,66Ghz, 2GB RAM. Please drop a short notice when I can hide my hosts again...
What's strange is: hitting the "show graphics" button a few minutes before worked perfectly, seems to be a random problem...
ID: 18525 | Rating: 0 | rate:
/
Billy Joined: May 29 06 Posts: 6 ID: 85245 Credit: 7,278 RAC: 0
I had a work unit processing at about 80% complete and it seemed to be going normally. I suspended the project (as well as Einstein and Seti) and quit Boinc. I shutdown the computer and restarted. When Boinc started again, it reported this work unit as complete and uploaded it. Either it was stuck before or isn't actually complete. I had a similar thing happen a couple of days ago and it also reported work units complete even though the completion times were unusually short.
This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it is missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20Mb memory and ~70Mb virtual but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).
If you would like me to screen shot it please let me know the best dimensions for posting into the forum as I have nowhere to upload it to.
Cheers
Stwato
____________
ID: 18570 | Rating: 0 | rate:
/
Keith Akins Joined: Oct 22 05 Posts: 176 ID: 6022 Credit: 71,779 RAC: 0
After patting myself on the back for so many successful WU's, I get the following error on this unit:
6/12/2006 11:05:25 PM|rosetta@home|Unrecoverable error for result t306__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_hom001__680_902_0 ( - exit code -1073741811 (0xc000000d))
This could be a v5.22, BOINC 5.4.9 or a conflict when checking mail with Mozilla Thunderbird.
Win XP Home Service Pack 2
Mozilla Firefox/Thunderbird combo.
Computers are visible and BOINC 5.4.9 should be debug reporting.
Ignore the Linux Computer as mine is a dual booter.
____________
This is a known problem with a few of the processing techniques being used. Not all the work units are using the same processing approach. In some cases they are only looking at parts of the protein structure and that somehow affects the display.
____________
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 18580 | Rating: 0 | rate:
/
David E K Forum moderator Project administrator Project developer Project scientist Joined: Jul 1 05 Posts: 660 ID: 14 Credit: 838,217 RAC: 46
Tony, do you have JP's EMail address? The French guy who always needs new Rosetta .exe EMailed? Could you ask him to see if he can help with this post by French person on Q&A boards? The main parts that translated properly were that he's poor, alone and in a wheelchair. Perhaps JP can read between the lines better than the translation website.
____________
<core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
X_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1450 res: 95 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1438 res: 95 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
*snipped a lot of lines*
allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1567 res: 103 atom: 4 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0061B538 read attempt to address 0x790C3DE3
Engaging BOINC Windows Runtime Debugger...
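Since the stderr output above is mostly the same warning repeated, a short script can boil it down to which residue/atom pairs are involved and how often. The sketch below is only an illustration: the regular expression is written against the lines as they appear in this post, the sample is a handful of those lines rather than the full log, and the function name is made up.

```python
import re
from collections import Counter

# A few warning lines copied from the stderr output above.
SAMPLE = """\
allatom: 1434 res: 95 atom: 1 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1437 res: 95 atom: 3 has more than MAX_NB neighbors
increase MAX_NB in param.cc
allatom: 1550 res: 103 atom: 5 has more than MAX_NB neighbors
"""

WARNING = re.compile(
    r"allatom:\s*(\d+)\s+res:\s*(\d+)\s+atom:\s*(\d+)\s+has more than MAX_NB neighbors"
)

def count_warnings(text):
    """Count MAX_NB warnings per (residue, atom) pair."""
    counts = Counter()
    for match in WARNING.finditer(text):
        residue, atom = int(match.group(2)), int(match.group(3))
        counts[(residue, atom)] += 1
    return counts

counts = count_warnings(SAMPLE)
residues = sorted({residue for residue, _atom in counts})
print(counts)
print(residues)
```

Run against the full stderr text, a summary like this is much easier to post than the raw lines, and it shows at a glance which residues the warnings cluster around before the access violation.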
____________
ID: 18758 | Rating: 0 | rate:
/
Martin P. Joined: May 26 06 Posts: 38 ID: 84658 Credit: 162,945 RAC: 16
Problems with download of WUs: either no work or heavily overcommitted.
I run SETI@Home, Einstein and Rosetta. Rosetta is set to 20%. The problem is, that once Rosetta has finished all WUs it never downloads any new WUs. Even when the long_term_debt is highly positive (e.g. 30,000 and bigger) it does not download any WUs. The only way to force download is to pause other projects, but in this case it downloads so many WUs that the computer is overcommitted for many days.
I currently run at "Contact server every 3 days". Even when setting this to 0.3 days before suspending the other projects and resetting it after the download it still downloads too many WUs.
This is what I tried:
1. Set "Contact server every 3 days" to 0.3 days.
2. Set SETI@Home and Einstein to "No new work"
3. Suspend SETI@Home and Einstein
4. Rosetta downloads some WUs
5. Set SETI@Home and Einstein to "Allow new work"
6. Restart SETI@Home and Einstein
7. Set "Contact server every xx days" back to 3 days.
8. Now Rosetta downloads even more WUs, which should not happen since SETI and Einstein are both active -> computer is overcommitted.
Is there a solution to this problem? Resetting long_term_debt to 0.0 on all projects does not help either.
The client errors are there because I have other projects running and therefore manually aborted these Work-Units so that the other project get their share as well. Otherwise Rosetta would have taken over my computers exclusively for several days.
I followed your advice and let it run for several days without any interference. I did not get any new work for 5 days, but tonight it downloaded 30 work units and is overcommitted again!
Obviously the scheduling of Rosetta does not work at all.
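To put a rough number on "overcommitted", here is a back-of-the-envelope estimate of how long a queue takes to clear at a given resource share. It is a simplification, not the actual BOINC scheduler: it assumes a single CPU running around the clock, and the 8 hours per work unit is an assumed run time (a cpu_run_time_pref of 28800 seconds appears in another result in this thread). The 7-day deadline is taken from the sent and deadline dates of a result posted earlier. The function name is hypothetical.

```python
def days_to_clear(num_work_units, cpu_hours_each, resource_share, cpus=1):
    """Wall-clock days needed to finish a queue at a fixed share of the CPU.

    Simplified model: the project gets exactly resource_share of each CPU.
    """
    total_cpu_hours = num_work_units * cpu_hours_each
    wall_hours = total_cpu_hours / (resource_share * cpus)
    return wall_hours / 24.0

queued = 30            # work units downloaded, from the post above
hours_each = 8.0       # ASSUMED run time per work unit
share = 0.20           # Rosetta's share, from the post above
deadline_days = 7.0    # sent 8 Jun, report deadline 15 Jun in an earlier result

needed = days_to_clear(queued, hours_each, share)
print(needed, needed > deadline_days)
```

Under these assumptions the queue would need roughly 50 days at a 20% share, far more than a 7-day deadline allows, so the client would have to run Rosetta nearly exclusively to finish in time. Different run time preferences or more CPUs change the figure, but not the general picture.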
I was viewing the graphics window at the time it failed, in case that makes a difference.
____________
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins
This WU created about three good models with energy minima between -200 and -300. Then it failed to produce any more good models, with each succeeding model completing within minutes and always with the same energy minimum of about -30. Watching the graphics showed a stretched protein where no folding was achieved. I "aborted" the WU the soft way with 6 restarts of BOINC (to prevent the same WU from being sent out again).
I have seen WUs like this in the past. Perhaps there is a pattern.
____________
This WU created about three good models with energy minima between -200 and -300. Then it failed to produce any more good models, with each succeeding model completing within minutes and always with the same energy minimum of about -30.
I for one have been HOPING to see WUs that would act like that. If you knew that a -300 was possible, and you are sitting at a -30, there are cases where it might be SMART to bail on this one and invest the time in pursuing something with more potential.
I don't know that this is what happened in your case; I'll leave that for the project team to assess. I just wanted to point out that it is the TYPE of thing that I think we'll see more of as the algorithm gets smarter.
____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com
I agree! Using previous results for "pruning" decisions is an idea that crossed my mind a long time ago. I'm a bit into chess engine programming, and in these engines a lot of "pruning" is done in positions where one side is too far behind to have any chance of reaching the current score with any move. However, in the case reported it was most certainly something different, since the models finished one after another within a few minutes without really folding the protein (it was stretched in the graphics) and always with the same score. In the end I had over 150 models, of which only three had not been "aborted".
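To make the "pruning" idea concrete, here is a minimal sketch of what such a bail-out rule might look like. This is only an illustration, not how Rosetta actually decides anything: the function name `should_abandon` and the `margin` and `patience` parameters are made up for the example, and it assumes that a lower energy is a better score.

```python
def should_abandon(best_energy, recent_energies, margin=100.0, patience=5):
    """Return True when the last `patience` models all scored far worse than the best.

    Hypothetical rule for illustration only; `margin` and `patience` are
    made-up tuning knobs, not Rosetta settings.
    """
    if len(recent_energies) < patience:
        # Not enough evidence yet to give up on this work unit.
        return False
    tail = recent_energies[-patience:]
    # Lower energy is better, so "far worse" means higher than best + margin.
    return all(energy > best_energy + margin for energy in tail)
```

With a best model at -300 and a run of models stuck at -30, a rule like this would give up after five models instead of producing 150 of them.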
For the past week or so I've been getting 2-3 crashes per day. The failed work units show up as "Compute Error" with no credit. Do I need to report this? Or will the appropriate party see these errors and be able to deal with them on their own?
Do I need to report this? Or will the appropriate party see these errors and be able to deal with them on their own?
It is "HELPFUL" if you report them. It gives the opportunity to ask you questions about your computing environment so they might learn more about the system that's seeing the failure. It is not "required".
Credit for failed WUs is issued once the daily credit run is made. You will see this when you display the WU details... not on the WU listing. Like this one for example.
It looks like most of them were ended by the "watchdog". One was a -107 error (which is something that's been under review for a while already).
The watchdog is trying to assure your computer doesn't get stuck in an unexpected loop on a work unit. If it notices no progress on a work unit in 5 restarts, then it ends it. Do you restart this computer frequently? Or have a number of other projects running in BOINC?
If you would, go to your General Preferences, and let us know what you have set for "Switch between applications every...minutes", and for "Leave applications in memory while preempted?". And is Rosetta your only BOINC project?
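For anyone curious, the watchdog rule as described above (end the work unit if no progress is seen over 5 restarts) could be sketched roughly like this. It is only a guess at the logic based on that description, not the actual Rosetta code; the class name and the reading of "5 restarts" as 5 consecutive restarts without progress are assumptions.

```python
class RestartWatchdog:
    """Hypothetical sketch: give up on a WU after N consecutive restarts with no progress."""

    def __init__(self, max_stalled_restarts=5):
        self.max_stalled_restarts = max_stalled_restarts
        self.last_progress = None   # progress fraction seen at the previous restart
        self.stalled_restarts = 0   # consecutive restarts without any progress

    def on_restart(self, progress):
        """Record the progress seen at a restart; return True if the WU should be ended."""
        if self.last_progress is not None and progress <= self.last_progress:
            self.stalled_restarts += 1
        else:
            # First restart, or progress was made since the last one: start counting again.
            self.stalled_restarts = 0
        self.last_progress = progress
        return self.stalled_restarts >= self.max_stalled_restarts
```

Under this reading, a machine that is restarted often, or that switches between many projects without leaving applications in memory, would hit the limit sooner. That is why the questions above matter.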
____________
ID: 18934
anders n Joined: Sep 19 05 Posts: 403 ID: 578 Credit: 537,904 RAC: 0
It looks like most of them were ended by the "watchdog". One was a -107 error (which is something that's been under review for a while already).
Correction, I misread that "watchdog is shutting down" message (again!). I keep thinking this message indicates that the watchdog is shutting down the WU, not just ending itself as a normal end of processing a WU.
Most of their errors were -107s.
____________
The watchdog is trying to assure your computer doesn't get stuck in an unexpected loop on a work unit. If it notices no progress on a work unit in 5 restarts, then it ends it. Do you restart this computer frequently? Or have a number of other projects running in BOINC?
If you would, go to your General Preferences, and let us know what you have set for "Switch between applications every...minutes", and for "Leave applications in memory while preempted?". And is Rosetta your only BOINC project?
I'll try to answer your questions here:
Machine is rarely restarted, once every 2-3 days.
This is the only project I have under BOINC. No other background/SETI type applications are installed.
I'm not sure where this "General Preferences" dialog is you're referring to. I don't see anything like this in BOINC.
I am an accomplished C++/Java/.NET developer with Visual Studio installed on this box, so if you need me to grab a stack trace, I'd be happy to next time!
I'm not sure where this "General Preferences" dialog is you're referring to. I don't see anything like this in BOINC.
Now that you are viewing this message board, click the "Participants" link in the heading of the screen. In the "Preferences" section, click the link for "view or edit" of General preferences. Any changes made there require BOINC to update to the project to take effect. This is done from the projects tab of BOINC, select Rosetta, then click the update button.
____________
In follow-up to Message ID 18855: as long as I don't have IE running, I don't seem to have any BOINC problems. If I leave IE on, I have intermittent BOINC crashes. For me, it does not seem to be the screensaver at this time.
Now that you are viewing this message board, click the "Participants" link in the heading of the screen. In the "Preferences" section, click the link for "view or edit" of General preferences. Any changes made there require BOINC to update to the project to take effect. This is done from the projects tab of BOINC, select Rosetta, then click the update button.
You didn't say what these 'should be' so I'm just reporting what they currently are and not changing anything:
work on batteries: no
work while in use: no
idle: 3 mins
hours: (no restrictions)
leave in memory: no
switch between: 60 mins
multiprocessors: 0 processors (although I have two of them!?)
use at most: 100 percent of CPU
When I restarted my computer I lost over an hour on this WU. At restart it went back to 0% after running for about an hour on my fast Athlon 64 @2.44 GHz. Obviously no checkpoint occurred during this time. I know t296 is very big, but no checkpoint within an hour is not good (since one hour is the default switch time of BOINC).
____________
rriggs, if you are actively using that computer much at all... your settings are preventing you from getting much work done. You see you've told BOINC to wait until you've not used your computer for 3 minutes before it runs. Then it starts running. When you return to use your computer, you've told it to remove the applications from memory, and so any work it has performed since the last checkpoint will be lost. Since Rosetta typically checkpoints no more than every 20 minutes, if you have left your computer for 15 minutes, then you've crunched for 12 minutes (after waiting for the 3 minute delay before it starts) and then when you use your computer again, you are throwing away the 12 minutes of work. And so you later have to redo that 12 minutes of work.
I don't know how aggressively you intend to crunch. But you can preserve the work done (12 min. in my example) and continue on it later by setting the "leave in memory" setting to YES. You've got 2GB of memory, so that gives you plenty of room. Also, it just keeps it in virtual memory, not actually in the physical memory of the machine. So changing this setting will preserve these short work periods and not impact your computer use. By keeping applications in memory, you would only lose bits of work when you actually turn off the computer.
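The arithmetic in the example above can be sketched as a small function. This is only an illustration under simplifying assumptions: the function name is made up, and it pretends a checkpoint lands exactly every `checkpoint_interval` minutes of crunching, whereas in practice checkpoints may be less frequent, so the real loss can be larger.

```python
def work_lost_minutes(away_minutes, idle_delay=3, checkpoint_interval=20,
                      leave_in_memory=False):
    """Estimate the minutes of crunching thrown away when the user returns.

    Hypothetical helper for illustration; assumes checkpoints land exactly
    every `checkpoint_interval` minutes of crunching.
    """
    # BOINC only starts after the machine has been idle for `idle_delay` minutes.
    crunched = max(0, away_minutes - idle_delay)
    if leave_in_memory:
        # The task is only suspended, so nothing since the last checkpoint is lost.
        return 0
    # Everything since the most recent checkpoint has to be redone.
    return crunched % checkpoint_interval
```

For a 15 minute break this gives 12 minutes lost with "leave in memory" set to NO, and nothing lost with it set to YES.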
Now, you also have a dual-core CPU. So you could be crunching 2 work units at the same time. But you have set BOINC to only use one. You can set the "On multiprocessors, use at most" setting to 2 and use both of them. I'm not positive what it does when you have that set to zero.
It would be even more aggressive to crunch while your computer is in use. I take it you've got 2GB of memory because you have some pretty intense applications you wish to use. So your current setting of NOT working while your computer is in use should probably remain. But, just FYI, I run with half as much memory and run it all the time, and there is no noticeable effect on my running applications.
Having said all of that... your errors are mostly the -107 errors. Looks like you get either a -107 or a -1 about 10% of the time. I'm not sure, perhaps leaving in memory will reduce your chances of hitting the -107 errors. But otherwise, I don't believe the above will resolve the problem you are having with erroring work units. They are already working on a fix for the -107 errors. There are a number of people hitting that more frequently lately.
____________
I have t307__CASP7_ABRELAX_SAVE_ALL_OUT_BARCODE_hom001__714_20997_0 using Rosetta version 5.22, and it has been running now for 24 hrs. It has been stuck on 100% for at least the last hour I have been watching it. Mem usage of Rosetta was 88M and is now 94M after 30 mins. Now 97M and climbing.
CPU usage doesn't change when I suspend the task from the BOINC manager.
rriggs, if you are actively using that computer much at all... your settings are preventing you from getting much work done. You see you've told BOINC to wait until you've not used your computer for 3 minutes before it runs. Then it starts running. When you return to use your computer, you've told it to remove the applications from memory, and so any work it has performed since the last checkpoint will be lost. Since Rosetta typically checkpoints no more than every 20 minutes, if you have left your computer for 15 minutes, then you've crunched for 12 minutes (after waiting for the 3 minute delay before it starts) and then when you use your computer again, you are throwing away the 12 minutes of work. And so you later have to redo that 12 minutes of work.
Now, you also have a dual-core CPU. So you could be crunching 2 work units at the same time. But you have set BOINC to only use one. You can set the "On multiprocessors, use at most" setting to 2 and use both of them. I'm not positive what it does when you have that set to zero.
I never even saw this page, let alone adjusted the settings so these are the defaults. Perhaps the setup process should either pick better defaults or bring this page to my attention so I would have found it sooner?
I guess when I installed I just picked Activity|Run Always and Activity|Network always available, so it has been running non-stop! This may potentially invalidate your hypothesis about why I'm getting -107 errors since the app is never leaving memory.
ps. It was crashed this morning when I came in, so I tried to debug it, but my machine locked up launching the debugger. I will try again tomorrow!
Perhaps the setup process should either pick better defaults or bring this page to my attention so I would have found it sooner?
My task manager has always shown them both crunching at 50% CPU.
What do you think?
OK, my apologies. As I said I wasn't certain what it does when CPUs is set to zero. So that isn't an issue. If you've got 2 WUs running at 50% CPU each then you are fully crunching... when your computer is not in use.
I would still suggest you set the leave in memory to YES. Save all the work done during coffee breaks and during meetings or conference calls or whatever pulls you away from the computer.
As for changing the setup process, unfortunately that is not something Rosetta could change. It would be changed by the BOINC folks. So you would have to take up that suggestion on the BOINC boards.
Every time your PC is idle for 3 minutes, BOINC will start crunching... then when you come back and use the computer, BOINC suspends... and removes from memory, that was the thought. ...except since you've not said to "run based on preferences"... it's actually running all the time, regardless of whether other applications are in use?
____________
OK, my apologies. As I said I wasn't certain what it does when CPUs is set to zero. So that isn't an issue. If you've got 2 WUs running at 50% CPU each then you are fully crunching... when your computer is not in use.
Oops. I edited my post while you were replying. Please recheck it now!
Before leaving for work this morning I checked my Linux box. CPU was at 0%. The ps command showed the boinc and rosetta processes there but doing nothing. It looked like it had stopped just a short while after starting a new WU. I stopped and restarted boinc and the WU took off normally. It just finished and reported. Here's the WU:
I've just started crunching for Rosetta. I don't use any screensaver.
The same error has just occurred for me on Ralph, but with the 5.24 app.
What can I do to avoid those errors?
Lukasz, you've already done what you can (so far as I know). One of your results reported a lot of useful information that will help analyze the problem.
Your computer time is still helping the project, and you are still getting credit for all the time spent crunching, so do not be deterred. Running on Ralph sends additional diagnostic information back to the project. Hopefully they can determine the root cause soon. I see that 2 out of 5 of your WUs failed, and another that you aborted for some reason.
____________
I see you have 2 out of 5 of your WUs failed, and another that you aborted for some reason.
The reason is I thought that it's my computer's fault and I didn't want it to spoil more WUs. I have to admit that PC was overclocked a bit and it is very hot today, so I had to change some settings. But don't worry I'm far from being discouraged. I will crunch again for Rosetta soon:)
Before leaving for work this morning I checked my Linux box. CPU was at 0%. The ps command showed the boinc and rosetta processes there but doing nothing. It looked like it had stopped just a short while after starting a new WU. I stopped and restarted boinc and the WU took off normally. It just finished and reported. Here's the WU:
This might be a problem on my end. I came home from work to find the machine in the same state. I checked STDOUT from boinc (I redirect it to a file), and both this morning and this afternoon it complained about network problems. However, this afternoon restarting boinc didn't work. It was trying to download new work but the network problems were preventing it. Boinc kept shutting down. I also could not get out to the net in my web browser. So I reset my router. It's either a problem with my router or the cable connection is messing up. Hard to tell which at this point, but this past weekend the router was hung so badly that I had to do a hard reset and reconfigure it from scratch. The cable company's network status page shows some problems in some surrounding areas but not in my particular area. Time to use that gift card from Best Buy!
I have been having similar problems. I cannot leave Rosetta alone or it simply hangs. But if I visit and hit "Update" every day then I get much better production. Is this a problem with Rosetta or with Boinc? It is very frustrating.
I wish the statistics page had the start time and date of each run along with the deadline.
I have occasionally had the problem of stalled/hanging Rosetta tasks (somewhere in the middle, not at 0%, 1% or 100% progress) for ages already, on Red Hat EL 4.1. I am now using BCC 5.4.9, attached to 7 projects, and Rosetta's share is ~20%. The computer runs for months between reboots, without graphics.
The symptoms are that the Rosetta app seems to be running, but the CPU time does not increase. Recently I've noticed that even BCC is not able to run benchmarks when this happens. IIRC, previously, if BCC was able to switch to another app, that app got 0 CPU cycles (because Rosetta was consuming all of them) and did not increment its time. Usually the only way to overcome this problem was to manually restart BCC. This way the Rosettas were able to continue and finish. (Whether correctly? I can now see a few (5) "process exited with code 131 (0x83)" messages since March in the logs.)
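One way to spot this symptom without watching the machine by hand is to sample the task's CPU time twice and compare it with the wall-clock time that passed. The sketch below is only a rough idea, assuming a Linux box where /proc/<pid>/stat is available; the function names and the 1% threshold are my own choices, and a task that BOINC has legitimately preempted would look "stalled" to it as well.

```python
def cpu_seconds_from_stat(stat_line, ticks_per_second=100):
    """Extract user + system CPU time (in seconds) from one /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split after the last ')' before counting fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # After the command name, utime and stime (fields 14 and 15 of the
    # full line) sit at indexes 11 and 12.
    utime, stime = int(fields[11]), int(fields[12])
    return (utime + stime) / ticks_per_second


def looks_stalled(cpu_before, cpu_after, wall_elapsed, min_fraction=0.01):
    """True if the process used less than `min_fraction` of the elapsed wall time."""
    if wall_elapsed <= 0:
        return False
    return (cpu_after - cpu_before) / wall_elapsed < min_fraction
```

In practice you would read /proc/<pid>/stat for the rosetta process, wait a few minutes, read it again, and pass both values to looks_stalled(). The real tick rate can be obtained with os.sysconf("SC_CLK_TCK") rather than assuming 100.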
This time, a week ago, I made a few snapshots of the suspended Rosetta 5.22 result t312__CASP7_JUMPRELAX_SAVE_ALL_OUT_BARCODE_hom010__711_1635_0 and reported them in the "Rosetta WU's stall on RedHat Fedora" thread. It is stuck at 28.80% (2:43:29 CPU time), maybe for a day already. I'll try restarting BCC to see if anything new comes into the files in its slot/3/ dir. And then abort and report; it's now past the deadline anyway...
Yes, it restarted happily, CPU time jumped from 2:43:29 to 1:43:29 and is incrementing, but progress stayed at 28.80% and does not move. Aborting...
<core_client_version>5.4.9</core_client_version>
<message>
aborted by user
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# random seed: 1940641
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (14 frames):
[0x884cb9f]
[0x8864cfc]
[0x88cade8]
[0x8621564]
[0x87f229b]
[0x873b844]
[0x873d0af]
[0x85a95e9]
[0x85b190a]
[0x83d6c9f]
[0x86022d3]
[0x84740c8]
[0x88c41e4]
[0x8048111]
Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 21600
ERROR:: Exit at: fragments.cc line:459
FILE_LOCK::unlock(): close failed.: Bad file descriptor
But if I visit and hit "Update" every day then I get much better production. Is this a problem with Rosetta or with Boinc.
BOINC is responsible for contacting the projects that it needs to get work from. Performing an update wouldn't have much to do with a hung work unit. Are you saying you end up without work? Or are you saying that your existing WUs are not ending properly?
____________
Pepo: I'm not clear how long you observed the running of the WU after restarting it. But the progress % does not change very frequently and this is normal. Here is some relevant information on the subject. Perhaps you are saying you let it run for over an hour with no progress... that would be another matter. But, if not, that portion of what you are describing is probably normal and does not require your intervention to abort.
____________
Yes, I've read the FAQ. If you look at the "Rosetta WU's stall on RedHat Fedora" thread I mentioned, Rosetta had been hung for more than a day at least; I could look into the logs to tell exactly.
I usually check the machine once every day or two (because of Rosetta :-) and restart Boinc if this happens. And it has been happening for a long time already. I'm pretty sure it has been a few months.