Message boards : Number crunching : Report Problems with Rosetta Version 5.25
Author | Message |
---|---|
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
ok, thanks, that's plausible too of course |
RodEllery Joined: 17 Oct 05 Posts: 1 Credit: 22,181 RAC: 0 |
Just had about 8 of the following reported. Each one causes the Visual C Runtime to abort with a message saying it was told to close in an unusual way. Each one crashed within 90 seconds of startup and needed a click on the error box to flag the Computation error and move on to the next WU. Result ID: 225319; Name: t353_LOOPRELAX_hom006_S_00001_0001449_0_1030_13_0; Workunit: 200825; Created: 18 Jul 2006 23:09:41 UTC; Sent: 18 Jul 2006 23:11:42 UTC; Received: 18 Jul 2006 23:14:44 UTC; Server state: Over; Outcome: Client error; Client state: Computing; Exit status: 3 (0x3); Computer ID: 913; Report deadline: 22 Jul 2006 23:11:42 UTC; CPU time: 30.784266; stderr out: <core_client_version>5.4.9</core_client_version> <message> The system cannot find the path specified. (0x3) - exit code 3 (0x3) </message> <stderr_txt> # random seed: 2949918 </stderr_txt>; Validate state: Invalid; Claimed credit: 0.059213259129471; Granted credit: 0; application version: 5.25. Member of UK BOINC Team |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
http://ralph.bakerlab.org/workunit.php?wuid=200815 http://ralph.bakerlab.org/workunit.php?wuid=200827 As those who received the same WUs after you have this problem too, it looks like a workunit configuration error, maybe RALPH and Rosetta WUs got mixed on the server somehow. Feet1st should look into it, it probably was an attempt to include the RALPH results into CASP7 |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Happy to help any way that I can, but I'm just a participant like everyone else. I got the same error this morning with a Ralph WU, for what it is worth. I've reported it on the Ralph boards. It also crunched about 90 seconds of reported CPU, just like yours. Mine was also a "T353 LOOPRELAX". I don't believe it is possible for Ralph and Rosetta to mix their WUs. But we're all studying the same proteins. If you review the entirety of the WU name, the links Ananas provides take you to the very similar, but not identical, WUs being tested on Ralph. The project team will look into these failing WUs (as they always do), determine the cause and test on Ralph if needed. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
Oops, sorry. You sometimes provide information that made me think you're part of the developer team :-) I doubt that you're mad about this misunderstanding ;-) |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Mod9 hasn't been around recently, so I've tried to step up a notch to try and help out. I really just read the available information, and then apply it to questions as they come up. Most posts have already been discussed elsewhere, so if you can keep up with the posts for a while, it only takes a month or so before you find you can post meaningful responses and references to quite a number of questions. I sometimes confuse the issue by my use of phrases like "what we're working on". I simply mean all of us collectively. I'm off topic, but wanted to point out that others could follow the same path and become very helpful to project operations. If we're (there I go again!) going to get to 150 TFLOPs, we're gonna need more than just machines. I'm very pleased to see you feel my posts are informative. That's my whole goal. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Many of us offer help when we can. Feet1st has been doing a really good job about getting to questions first, and posting a response lately. As for problems: I've been running F@H on my Athlon 64 x2 3800+ at work for at least the last year. It's running win2k pro SP4, and had 512Megs. I stopped F@H's two running services and put them in manual mode, and started up Boinc. It downloaded a few jobs, and started crunching. The system became incredibly slow (using 768megs on a 512Meg system will do that..) so I shut the system down, upgraded to 1 Gig, and turned the system back on. Everything was fine.. I got in the next day, and I couldn't use any tcp/ip apps. They all hung while trying to access the internet. After getting the critical work related items taken care of, I started Boinc again, and didn't have any problems accessing the internet up until I ran off at 10:30 or 11pm when Boinc was claiming it had about an hour to go before finishing the 24 hour WUs. I get in today, Boinc manager is still running. Cpu core 1 and 2 aren't being used at all. And the internet is inaccessible again. Shut things down, reboot, and the internet is back. Boinc is able to start - and starts crunching away - with 15 mins to go on the 2ea 24 hour WUs that were supposed to have finished right after I left last night. The WUs were finished and supposedly uploaded the results; although I haven't seen proof from the page listing my results for this system. It's back to F@H. (Perhaps I'll try two of the Win98 machines for Rosetta, instead.) Hardware: DFI nF4-DAGF motherboard; crucial Ballistix 3200 cl2 ram (2 ea 256Megs to start.. 2ea 512 Megs now). Using nForce Networking controller driver version 4.8.2.0 (and 3 version 1.x sub components). Any explanation for why Boinc/Rosetta keeps killing my internet connection - plus kills itself off when I leave the shop? 
:) (No screensaver, no power saver mode, hibernation off, no shutting off any hardware if the system doesn't see me using the keyboard/mouse.) |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
"The WUs were finished and supposedly uploaded the results; although I haven't seen proof from the page listing my results for this system." I've noticed the WU list doesn't seem to be reflected in your report until the WUs complete the "ready to report" stage with another project update. Is that what you mean? "Any explanation for why Boinc/Rosetta keeps killing my internet connection - plus kills itself off when I leave the shop? :) (No screensaver, no power saver mode, hibernation off, no shutting off any hardware if the system doesn't see me using the keyboard/mouse.)" Same thing happens to me sometimes. I've always chalked it up to Windows TCP stack problems... COULD be BOINC, I suppose. I may try setting my network hours on the PC I have the most problems with and see if this affects the frequency of TCP problems. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
I let it update again, and the credits instantly appeared after not being there when I checked just before running the Boinc manager and clicking update. (I leave Boinc and Rosetta alone on my single core home machine.) There are only a few changes that were made to this system: disabled F@H's two services so CPU usage drops to 0; started Boinc manager and watched Rosetta kick in, download WUs and use both cores; swapped RAM. If I don't have any problem by tomorrow, then Boinc and Rosetta are the only difference left between months of non-stop 24/7 crunching with no tcp/ip stack problems, two days of non-stop problems, and now. |
TCU Computer Science Joined: 7 Dec 05 Posts: 28 Credit: 12,861,977 RAC: 0 |
Another stuck work unit: Mac OS X 10.4.7 BOINC 5.4.9 wuid=25113559 The Messages tab shows the entry Thu Jul 20 08:28:59 2006|rosetta@home|Starting task FRA_t370_CASP7_hom001_4_t370_4_1g76A_IGNORE_THE_REST_46_1010_22_0 using rosetta version 525 followed by a few lines about uploading the previous result and reporting task completion. Then nothing for 24 hours. The Tasks tab shows CPU Time stuck at 00:00:01 top command shows over 24 hours accumulated and rising. Had to stop and restart BOINC. |
calvin Joined: 13 May 06 Posts: 3 Credit: 448,453 RAC: 0 |
Maybe a problem, but I am one retiring after 17000 credits on my one little old computer. |
calvin Joined: 13 May 06 Posts: 3 Credit: 448,453 RAC: 0 |
I see it's 22000 credits. I was proud to get my RAC up to about 375 on my one little Dell. Then I had a power failure at my home for a few hours. Jobs resumed, but I now receive only about 65% as many credits when I complete the jobs. I don't understand how the credit thing works, and I understand credits have no value, but I don't like the lower RAC. Therefore, I am finishing the jobs that have been downloaded, and will give my computer a rest. By the way, no jobs were late. Carry on. Any explanation? |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
"I see it's 22000 credits. I was proud to get my RAC up to about 375 on my one little Dell. Then I had a power failure at my home for a few hours. Jobs resumed, but I now receive only about 65% as many credits when I complete the jobs. I don't understand how the credit thing works, and I understand credits have no value, but I don't like the lower RAC. Therefore, I am finishing the jobs that have been downloaded, and will give my computer a rest. By the way, no jobs were late. Carry on." Hi cal. Try running the benchmarks. They update from time to time, and if you have bad luck they update while you are using the computer for something else. Anders n |
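To illustrate why a bad benchmark run could explain the drop calvin describes, here is a minimal sketch. It assumes the benchmark-based claimed-credit rule that BOINC clients of this period are generally described as using: 100 credits per CPU day on a reference host scoring 1000 on both the Whetstone and Dhrystone benchmarks. The benchmark scores below are made up for illustration, and `claimed_credit` is a hypothetical helper, not part of BOINC.

```python
SECONDS_PER_DAY = 86_400.0


def claimed_credit(cpu_seconds, whetstone_mflops, dhrystone_mips):
    """Claimed credit under the ASSUMED benchmark-based rule:
    100 credits per CPU day on a host benchmarking 1000/1000."""
    avg_benchmark = (whetstone_mflops + dhrystone_mips) / 2.0
    return (cpu_seconds / SECONDS_PER_DAY) * (avg_benchmark / 1000.0) * 100.0


# Hypothetical scores: a normal run versus a run taken while the
# machine was busy, which depresses both scores to 65% of normal.
normal = claimed_credit(3600, whetstone_mflops=1500, dhrystone_mips=3000)
busy = claimed_credit(3600, whetstone_mflops=1500 * 0.65, dhrystone_mips=3000 * 0.65)

ratio = busy / normal
print(f"credit per CPU hour: normal={normal:.3f}, after bad benchmark={busy:.3f}")
print(f"ratio: {ratio:.2f}")
```

Under that assumption the claim scales linearly with the stored benchmark scores, so the same work claims proportionally less until the benchmarks are rerun on an idle machine.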
[DPC]Division_Brabant~OldButNotSoWise Joined: 23 Jan 06 Posts: 42 Credit: 371,797 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=30090209 After a little over 2 hours, the client stopped processing it (the other thread was still crunching). I waited 10 minutes and then aborted it. |
Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0 |
You are lucky. My last 15 work units have errored out, with the longest having run 12 to 13 minutes. :( |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
18 WUs per day (7 hosts) since 5.25 was released and no stuck or hung WUs. Just thought you should hear the other side of the story. dag --Finding aliens is cool, but understanding the structure of proteins is useful. |
mnb Joined: 15 Dec 05 Posts: 51 Credit: 69,458 RAC: 0 |
|
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
Maybe the fan is dirty? You had the problem before btw., look at this result : https://boinc.bakerlab.org/rosetta/result.php?resultid=29948473 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 5.5.0 Dump Timestamp : 07/26/06 18:30:50 # cpu_run_time_pref: 43200 # cpu_run_time_pref: 43200 # DONE :: 1 starting structures built 12 (nstruct) times # This process generated 12 decoys from 12 attempts # 0 starting pdbs were skipped (sorry, the double linefeeds are a forum script bug) It crashed and then it restarted for some reason - with a valid result. Virus scanner running on BOINC directory (should be excluded)? Windows Indexing service (one of the worst things Microsoft ever had enabled by default)? |
kevint Joined: 8 Oct 05 Posts: 84 Credit: 2,530,451 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=30090209 Same problem here on a couple of my dual cores. It seems to have started today or sometime yesterday. One core keeps processing while the other WU stays at 0%; I have not timed how long it hangs there. It seems that when I restart BOINC or abort the hung WU, the problem goes away. Another thing I have noticed during the past few days is that the WUs are taking longer to crunch. I have lots of VERY SLOW machines, so I have my WU time set to one hour for these guys. In the past they would crunch for about 3-4 hours with the 1 hour setting; now it looks like they may take 10-12 hours or longer to complete. I don't mind the longer WUs; the problem is that I had my cache set for 4 days, so many of these WUs will have to be aborted because they will not finish before the deadline. SETI.USA |
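The cache arithmetic behind kevint's concern can be sketched roughly as follows. This assumes the client sized a 4-day cache using the requested 1-hour runtime, that tasks run one after another per core, and that every task shares one deadline measured from now. The 7-day deadline and the function name `count_late_tasks` are hypothetical choices for illustration only.

```python
def count_late_tasks(n_tasks, actual_hours, deadline_hours, cores=1):
    """Count queued tasks that would finish after the deadline when run
    back to back, `cores` at a time (a simplified FIFO model)."""
    late = 0
    for i in range(n_tasks):
        # Task i starts once i // cores earlier batches have finished.
        finish_hour = (i // cores + 1) * actual_hours
        if finish_hour > deadline_hours:
            late += 1
    return late


cache_days = 4          # from the post
requested_hours = 1     # runtime preference from the post
actual_hours = 10       # lower end of the observed 10-12 hours
deadline_hours = 7 * 24  # ASSUMED deadline, not stated in the thread

# Work fetched on the assumption that each task takes the requested hour.
n_tasks = int(cache_days * 24 / requested_hours)

late = count_late_tasks(n_tasks, actual_hours, deadline_hours)
print(f"{late} of {n_tasks} cached tasks would miss the assumed deadline")
```

The point of the sketch is only that a tenfold runtime overrun turns a 4-day cache into roughly 40 days of work per core, so most of the queue cannot finish in time regardless of the exact deadline.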
Ananas Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
It might be a workunit configuration problem too. One of my crashed results has been returned by a second box now and it had a 0xc0000005 error too : https://boinc.bakerlab.org/rosetta/workunit.php?wuid=25836810 Sometimes they survive the second attempt with a reduced target runtime. |
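For anyone comparing the exit codes quoted in this thread, here is a small sketch of a decoder. The two meanings are standard Windows codes: 0x3 is the Win32 error "The system cannot find the path specified" (as the stderr excerpt earlier in the thread also states), and 0xC0000005 is the NTSTATUS access-violation code. The function name `describe_exit_status` and the table are illustrative, not part of BOINC.

```python
# Codes mentioned in this thread; extend the table as needed.
KNOWN_CODES = {
    0x00000003: "ERROR_PATH_NOT_FOUND (the system cannot find the path specified)",
    0xC0000005: "STATUS_ACCESS_VIOLATION (invalid memory access)",
}


def describe_exit_status(status):
    """Return a readable description of a process exit status.

    Accepts either the unsigned value (e.g. 0xC0000005) or the signed
    32-bit form in which the same value is sometimes displayed
    (e.g. -1073741819).
    """
    code = status & 0xFFFFFFFF  # normalise signed 32-bit values
    meaning = KNOWN_CODES.get(code, "unknown code")
    return f"0x{code:08x}: {meaning}"


for status in (3, 0xC0000005, -1073741819):
    print(describe_exit_status(status))
```

A path-not-found code on several hosts for the same workunit points toward a configuration problem, while an access violation is a crash inside the application and can have several causes, so the two kinds of failure reported here are not necessarily related.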
©2024 University of Washington
https://www.bakerlab.org