Report Problems with Rosetta Version 5.25

Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 20456 - Posted: 18 Jul 2006, 11:32:09 UTC

ok, thanks, that's plausible too of course
ID: 20456 · Rating: 0
RodEllery
Joined: 17 Oct 05
Posts: 1
Credit: 22,181
RAC: 0
Message 20550 - Posted: 18 Jul 2006, 23:48:32 UTC

Just had about 8 of the following reported. Each one causes the Visual C Runtime to abort with a message saying it was told to close in an unusual way.
Each one crashed within 90 seconds of startup and needed a click on the error box to flag the computation error and move on to the next WU.

Result ID 225319
Name t353_LOOPRELAX_hom006_S_00001_0001449_0_1030_13_0
Workunit 200825
Created 18 Jul 2006 23:09:41 UTC
Sent 18 Jul 2006 23:11:42 UTC
Received 18 Jul 2006 23:14:44 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status 3 (0x3)
Computer ID 913
Report deadline 22 Jul 2006 23:11:42 UTC
CPU time 30.784266
stderr out <core_client_version>5.4.9</core_client_version>
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# random seed: 2949918

</stderr_txt>


Validate state Invalid
Claimed credit 0.059213259129471
Granted credit 0
application version 5.25

Member of UK BOINC Team
ID: 20550 · Rating: 0
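
For anyone comparing several failed results like the one above, here is a minimal Python sketch that pulls the interesting fields out of a copied "stderr out" block. The function name `parse_stderr_out` is made up for illustration and is not part of BOINC. One hedged observation: exit code 3 is also what the Microsoft C runtime returns after abort(), so the "cannot find the path" text is likely just Windows' generic description of error number 3 and not evidence of a real missing path.

```python
import re

def parse_stderr_out(blob: str) -> dict:
    """Extract client version, message, stderr text and exit code from a 'stderr out' field."""
    fields = {}
    for tag in ("core_client_version", "message", "stderr_txt"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", blob, re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    # The message ends with "exit code N (0xN)"; keep N as an integer.
    code = re.search(r"exit code (-?\d+) \(0x[0-9a-fA-F]+\)", fields["message"] or "")
    fields["exit_code"] = int(code.group(1)) if code else None
    return fields

# Text copied from the result posted above.
sample = """<core_client_version>5.4.9</core_client_version>
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# random seed: 2949918

</stderr_txt>"""

report = parse_stderr_out(sample)
print(report["exit_code"], report["stderr_txt"])
```

Running the same function over each failed result makes it easy to check whether they all stop right after the random seed line, as this one does.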
Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 20609 - Posted: 19 Jul 2006, 9:20:41 UTC
Last modified: 19 Jul 2006, 9:21:05 UTC

http://ralph.bakerlab.org/workunit.php?wuid=200815

http://ralph.bakerlab.org/workunit.php?wuid=200827

Those who received the same WUs after you have this problem too, so it looks like a workunit configuration error. Maybe RALPH and Rosetta WUs got mixed on the server somehow.

Feet1st should look into it; it was probably an attempt to include the RALPH results in CASP7.
ID: 20609 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 20639 - Posted: 19 Jul 2006, 14:47:56 UTC

Happy to help any way that I can, but I'm just a participant like everyone else. For what it is worth, I got the same error this morning with a Ralph WU, and I've reported it on the Ralph boards. It also crunched for about 90 seconds of reported CPU time, just like yours. Mine was also a "T353 LOOPRELAX".

I don't believe it is possible for Ralph and Rosetta to mix their WUs, but we're all studying the same proteins. If you review the entirety of the WU name, the links Ananas provides take you to very similar, but not identical, WUs being tested on Ralph. The project team will look into these failing WUs (as they always do), determine the cause, and test on Ralph if needed.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 20639 · Rating: 0
Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 20702 - Posted: 20 Jul 2006, 6:37:48 UTC

Oops, sorry. You sometimes provide information that made me think you're part of the developer team :-) I doubt that you're mad about this misunderstanding ;-)
ID: 20702 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 20732 - Posted: 20 Jul 2006, 16:06:30 UTC

Mod9 hasn't been around recently, so I've tried to step up a notch to try and help out. I really just read the available information, and then apply it to questions as they come up. Most posts have already been discussed elsewhere, so if you can keep up with the posts for a while, it only takes a month or so before you find you can post meaningful responses and references to quite a number of questions.

I sometimes confuse the issue by my use of phrases like "what we're working on". I simply mean all of us collectively.

I'm off topic, but wanted to point out that others could follow the same path and become very helpful to project operations. If we're (there I go again!) going to get to 150 TFLOPs, we're gonna need more than just machines.

I'm very pleased to see you feel my posts are informative. That's my whole goal.
ID: 20732 · Rating: 1
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 20781 - Posted: 21 Jul 2006, 0:06:20 UTC

Many of us offer help when we can. Feet1st has been doing a really good job about getting to questions first, and posting a response lately.


As for problems:
I've been running F@H on my Athlon 64 x2 3800+ at work for at least the last year. It's running win2k pro SP4, and had 512Megs. I stopped F@H's two running services and put them in manual mode, and started up Boinc. It downloaded a few jobs, and started crunching. The system became incredibly slow (using 768megs on a 512Meg system will do that..) so I shut the system down, upgraded to 1 Gig, and turned the system back on. Everything was fine..

I got in the next day, and I couldn't use any tcp/ip apps. They all hung while trying to access the internet. After getting the critical work related items taken care of, I started Boinc again, and didn't have any problems accessing the internet up until I ran off at 10:30 or 11pm when Boinc was claiming it had about an hour to go before finishing the 24 hour WUs. I get in today, Boinc manager is still running. Cpu core 1 and 2 aren't being used at all. And the internet is inaccessible again.

Shut things down, reboot, and the internet is back. Boinc is able to start - and starts crunching away - with 15 mins to go on the 2ea 24 hour WUs that were supposed to have finished right after I left last night.

The WUs were finished and supposedly uploaded the results; although I haven't seen proof from the page listing my results for this system.

It's back to F@H. (Perhaps I'll try two of the Win98 machines for Rosetta, instead.)

Hardware: DFI nF4-DAGF motherboard; crucial Ballistix 3200 cl2 ram (2 ea 256Megs to start.. 2ea 512 Megs now). Using nForce Networking controller driver version 4.8.2.0 (and 3 version 1.x sub components).

Any explanation for why Boinc/Rosetta keeps killing my internet connection - plus kills itself off when I leave the shop? :) (No screensaver, no power saver mode, hibernation off, no shutting off any hardware if the system doesn't see me using the keyboard/mouse.)


ID: 20781 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 20788 - Posted: 21 Jul 2006, 3:25:08 UTC - in response to Message 20781.  
Last modified: 21 Jul 2006, 3:25:54 UTC

The WUs were finished and supposedly uploaded the results; although I haven't seen proof from the page listing my results for this system.

I've noticed the WU list doesn't seem to reflect in your report until the WUs complete the "ready to report" stage with another project update. Is that what you mean?

Any explanation for why Boinc/Rosetta keeps killing my internet connection - plus kills itself off when I leave the shop? :) (No screensaver, no power saver mode, hibernation off, no shutting off any hardware if the system doesn't see me using the keyboard/mouse.)

Same thing happens to me sometimes. I've always chalked it up to Windows TCP stack problems... COULD be BOINC, I suppose. I may try setting my network hours on the PC I have the most problems with and see if this affects the frequency of TCP problems.
ID: 20788 · Rating: 0
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 20798 - Posted: 21 Jul 2006, 6:34:22 UTC

I let it update again, and the credits instantly appeared after not being there when I checked just before running the Boinc manager and clicking update. (I leave Boinc and Rosetta alone on my single core home machine.)

There are only a few changes that were made to this system: disabled F@H's two services so CPU usage drops to 0; started Boinc manager and watched Rosetta kick in, download WUs and use both cores; swapped RAM.

If I don't have any problem by tomorrow, then Boinc and Rosetta are the only difference left between months of non stop 24/7 crunching with no tcp/ip stack problems - two days of non stop problems - and now.
ID: 20798 · Rating: 0
TCU Computer Science
Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 20832 - Posted: 21 Jul 2006, 15:50:52 UTC - in response to Message 20421.  

Another stuck work unit:

Mac OS X 10.4.7
BOINC 5.4.9
wuid=25113559

The Messages tab shows the entry

Thu Jul 20 08:28:59 2006|rosetta@home|Starting task FRA_t370_CASP7_hom001_4_t370_4_1g76A_IGNORE_THE_REST_46_1010_22_0 using rosetta version 525

followed by a few lines about uploading the previous result and reporting task completion. Then nothing for 24 hours.

The Tasks tab shows CPU Time stuck at 00:00:01

top command shows over 24 hours accumulated and rising.

Had to stop and restart BOINC.
ID: 20832 · Rating: 0
calvin
Joined: 13 May 06
Posts: 3
Credit: 448,453
RAC: 0
Message 20896 - Posted: 22 Jul 2006, 17:42:43 UTC

Maybe a problem, but I am one retiring after 17000 credits on my one little old computer.
ID: 20896 · Rating: 0
calvin
Joined: 13 May 06
Posts: 3
Credit: 448,453
RAC: 0
Message 20897 - Posted: 22 Jul 2006, 17:50:21 UTC

I see it's 22000 credits. I was proud to get my RAC up to about 375 on my one little Dell. Then I had a power failure at my home for a few hours. Jobs resumed, but I now receive only about 65% as many credits when I complete the jobs. I don't understand how the credit thing works; I understand credits have no value, but I don't like the lower RAC. Therefore, I am finishing the jobs that have been downloaded, and will give my computer a rest. By the way, no jobs were late. Carry on.
Any explanation?
ID: 20897 · Rating: 0
anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 20904 - Posted: 22 Jul 2006, 18:31:07 UTC - in response to Message 20897.  

I see it's 22000 credits. I was proud to get my RAC up to about 375 on my one little Dell. Then I had a power failure at my home for a few hours. Jobs resumed, but I now receive only about 65% as many credits when I complete the jobs. I don't understand how the credit thing works; I understand credits have no value, but I don't like the lower RAC. Therefore, I am finishing the jobs that have been downloaded, and will give my computer a rest. By the way, no jobs were late. Carry on.
Any explanation?



Hi cal,

Try running benchmarks. They update from time to time, and if you have bad luck they update when you are using the computer for something else.

Anders n
ID: 20904 · Rating: 0
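
To illustrate why a bad benchmark run could explain a drop to about 65%, here is a minimal Python sketch. It assumes the benchmark-based claimed-credit formula commonly described for BOINC clients of this period (credit proportional to CPU time times the average of the Whetstone and Dhrystone scores, with 100 credits per day on a machine scoring 1000 on both). The formula as written here and the benchmark numbers are assumptions for illustration, not values taken from calvin's machine.

```python
SECONDS_PER_DAY = 86400

def claimed_credit(cpu_seconds: float, whetstone_mflops: float, dhrystone_mips: float) -> float:
    """Claimed credit under the assumed benchmark-based formula."""
    # Average the two benchmark scores, then scale so that a score of 1000
    # earns 100 credits per full day of CPU time.
    average_score = (whetstone_mflops + dhrystone_mips) / 2
    return cpu_seconds / SECONDS_PER_DAY * 100 * average_score / 1000

# Hypothetical scores: a clean benchmark run versus one taken while the PC was busy.
clean = claimed_credit(3600, whetstone_mflops=1000, dhrystone_mips=2000)
busy = claimed_credit(3600, whetstone_mflops=650, dhrystone_mips=1300)
print(f"credit ratio after the bad benchmark: {busy / clean:.2f}")
```

Under this assumption the same hour of crunching claims proportionally less credit until the benchmarks are run again on an idle machine.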
[DPC]Division_Brabant~OldButNotSoWise
Joined: 23 Jan 06
Posts: 42
Credit: 371,797
RAC: 0
Message 21256 - Posted: 27 Jul 2006, 8:45:09 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=30090209

After two hours and something, the client stopped processing it (the other thread was still crunching). I waited 10 minutes and then aborted it.
ID: 21256 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 21268 - Posted: 27 Jul 2006, 15:32:19 UTC

You are lucky. My last 15 work units have errored out, with the longest running 12 to 13 minutes. :(
ID: 21268 · Rating: 0
dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 21275 - Posted: 27 Jul 2006, 18:23:32 UTC

18 WUs per day (7 hosts) since 5.25 was released and no stuck or hung WUs.

Just thought you should hear the other side of the story.
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 21275 · Rating: 0
mnb
Joined: 15 Dec 05
Posts: 51
Credit: 69,458
RAC: 0
Message 21352 - Posted: 28 Jul 2006, 22:49:42 UTC

Oh krap. My problems started again.

30091341
30173274
30195679
30388054
30389022

Anybody have any idea what's going on?
Last time my memtest was ok.


list of my results
ID: 21352 · Rating: 0
Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 21354 - Posted: 28 Jul 2006, 23:18:09 UTC
Last modified: 28 Jul 2006, 23:21:57 UTC

Maybe the fan is dirty?

You had the problem before btw., look at this result :

https://boinc.bakerlab.org/rosetta/result.php?resultid=29948473

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 5.5.0


Dump Timestamp    : 07/26/06 18:30:50
# cpu_run_time_pref: 43200
# cpu_run_time_pref: 43200
# DONE ::     1 starting structures built        12 (nstruct) times
# This process generated     12 decoys from      12 attempts
#                             0 starting pdbs were skipped


(sorry, the double linefeeds are a forum script bug)

It crashed and then it restarted for some reason - with a valid result.

Virus scanner running on BOINC directory (should be excluded)?
Windows Indexing service (one of the worst things Microsoft ever had enabled by default)?
ID: 21354 · Rating: 1
kevint
Joined: 8 Oct 05
Posts: 84
Credit: 2,530,451
RAC: 0
Message 21355 - Posted: 29 Jul 2006, 0:31:13 UTC - in response to Message 21256.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=30090209

After two hours and something, the client stopped processing it (the other thread was still crunching). I waited 10 minutes and then aborted it.



Same problem here on a couple of my dual cores - seems to have started today or sometime yesterday.
One core keeps processing, the other WU stays at 0% - have not timed how long it hangs there. It seems that when I restart BOINC or abort the hung WU, the problem goes away.

Another thing I have noticed happening during the past few days is that the WUs are taking longer to crunch. I have lots of VERY SLOW machines, so I have my WU time set to one hour for these guys. In the past they would crunch for about 3-4 hours with the 1 hour setting; now it looks like they may take 10-12 hours or longer to complete. I don't mind the longer WUs. The problem is I had my cache set for 4 days - many of these WUs will have to be aborted because they will not finish before the deadline.

SETI.USA


ID: 21355 · Rating: 0
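
The cache-versus-deadline problem described above can be estimated with a little arithmetic. This is a rough Python sketch only: it assumes every queued task takes the same time, all tasks share one deadline, and the machine crunches around the clock. The function name and the numbers in the example are hypothetical.

```python
def tasks_missing_deadline(queued_tasks: int, hours_per_task: float,
                           cores: int, hours_until_deadline: float) -> int:
    """Count queued tasks that cannot finish before a shared deadline."""
    # Each core runs tasks back to back; only whole tasks finished in time count.
    finished_per_core = int(hours_until_deadline // hours_per_task)
    return max(0, queued_tasks - finished_per_core * cores)

# Hypothetical single-core box with 30 tasks queued and one week to the deadline.
print(tasks_missing_deadline(30, hours_per_task=4, cores=1, hours_until_deadline=168))
print(tasks_missing_deadline(30, hours_per_task=12, cores=1, hours_until_deadline=168))
```

The point of the comparison is that a cache sized for 3-4 hour tasks can overshoot the deadline once the same tasks start taking 10-12 hours.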
Ananas
Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 21377 - Posted: 29 Jul 2006, 13:14:46 UTC
Last modified: 29 Jul 2006, 13:16:12 UTC

It might be a workunit configuration problem too. One of my crashed results has been returned by a second box now and it had a 0xc0000005 error too :

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=25836810

Sometimes they survive the second attempt with a reduced target runtime.
ID: 21377 · Rating: 1
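
For reference, 0xc0000005 is the Windows status code for an access violation. Result pages sometimes show the same value as a negative decimal number, so here is a small Python sketch for converting between the two forms. The lookup table is illustrative and far from complete, and the function name is made up for this example.

```python
# A few well-known Windows NTSTATUS values; illustrative, not exhaustive.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}

def describe_exit_code(code: int) -> str:
    """Show an exit code as unsigned 32-bit hex, with a name when it is known."""
    # Negative decimal exit codes are the same 32 bits read as a signed number.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"

print(describe_exit_code(-1073741819))
print(describe_exit_code(3))
```

An access violation on two different hosts for the same workunit fits the workunit-side explanation better than a hardware fault on one machine, though it does not prove it.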



©2024 University of Washington
https://www.bakerlab.org