Posts by murky

1) Message boards : Number crunching : Dodgy wu's? (Message 35289)
Posted 22 Jan 2007 by murky
Post:
[quote]I have received 3 wu's with names like PSH_0134_looprlx_GP120_OD1_115_136_0694_1506_27 (the numbers vary but the general format is the same). All have crashed after about 20 seconds. All have been resent, and crashed at the second host as well.[/quote]

The same situation here on one machine. & WU's like this:

PSH_0134_looprlx_GP120_OD1_115_136_0723_1506_14

They were returned after 17 seconds!!!!
2) Message boards : Number crunching : Many Problems (Message 29117)
Posted 10 Oct 2006 by murky
Post:
murky[/quote]
If memtest runs without error I may give Rosetta another try. I'm having no problems on the other system.

memtest86 ran for 21 hours before I shut it down. 0 errors.
This system will go back to F@H for Team Helix and the Pentium 4 box will work on Rosetta.

Thanks again for the input.
murky
3) Message boards : Number crunching : Many Problems (Message 29082)
Posted 9 Oct 2006 by murky
Post:
Jack is the person who first set up the graphics/screensaver in the Rosetta@home program.

So it sounds like bad jobs (compilations). I think it may be something about not including the manifest file; well, that's what a search on Google says, since I also get the side-by-side errors on my computers. Though I've not had any bad work?


Thanks to everyone for their input. Prime95 ran with no errors for 17 hours and that is good enough for me :) I have just started memtest86 from a bootable CD and will give it a long run of all tests as this is not a conditioning exercise. I will look further into this SideBySide, version 5.2,
Symbolic Name: "MSG_SXS_Function_Call_Fail"
If memtest runs without error I may give Rosetta another try. I'm having no problems on the other system.
murky
4) Message boards : Number crunching : Many Problems (Message 29062)
Posted 8 Oct 2006 by murky
Post:
[quote]BennyRop:

A second possibility is software problems. Does HiJackThis! show any programs running that you don't recognize? The error log from the 386 second error result showed a number of applications running from the C:\cygwin directory and from an F: drive. Error result (Can you temporarily disable them?)[/quote]

BennyRop: with regard to the c:\cygwin...etc from the resultid=41043803......
Looking through all that information, I was able to determine that it is not from my C drive, nor is the reference to the F drive. Those are at the Baker Labs! There is a reference to a "jack schonbrun". I Googled that name and it is associated with Baker Labs and Rosetta. There is a reference to f:\rtm\vctools\crt_bld: now I am starting to wonder if this is a part of my problem. I studied my Windows event logs and have 3 occurrences of errors.
Quote: Source: Side by Side, Type: error, Event ID: 59
Resolve Partial Assembly failed for Microsoft VC80CRT. (I see a reference to vctools and crt in the f: directory above.)
Continuing:
Resolve Partial Assembly failed for Microsoft VC80CRT: Reference error message. The referenced assembly is not installed on your system.
Explanation: A component or manifest could not be activated.
Possible causes include: The component or manifest depends on another program or a component that is not installed.
The manifest contains XML content that is not valid.
The user does not have the correct permissions.
I may be way out in left field but I think there is some connection between the errors in the event log and the failed tasks.
But this is way over my head :)
Regards....murky
5) Message boards : Number crunching : Many Problems (Message 29059)
Posted 8 Oct 2006 by murky
Post:
[quote]The BSOD on its own would say there was a problem. This (the Page_Fault error) could well be bad memory OR a bad hard drive (among many, many other things ;-([/quote]

FluffyChicken, dcdc and BennyRop: thanks

I will try to address all the advice in one reply :)
I am running Prime95 now....2 1/2 hours.
I will run chkdsk after I stop Prime95, reseat the memory etc.
I thought that there might be a few bad WUs but I don't see very many people remarking on this so I assume my system has a problem.
I was using the console version for F@H, advanced methods and large work units.
I do not use the software to overclock....ASUS AI was uninstalled after I uninstalled BOINC but was only in the Start menu and was always closed.
Thanks for the link to Western Digital. I will definitely check that out after this run of Prime95.
This system has been dedicated to F@H since I built it in early spring. No surfing, no email, no virus, or adware. Using a firewall behind a router.
I will have to look into the cygwin directory and the F drive reference.
At this time I can not find a cygwin directory on either PC.
There are only 3 drives: C:, D: (CD-RW) and E: (CD, read-only).
I have to get back to Talladega for the end of the race :)
murky
6) Message boards : Number crunching : Many Problems (Message 29049)
Posted 8 Oct 2006 by murky
Post:
http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=313507
;-)

As a temporary measure you could reduce your run times so that you lose less work if it does error out on you.


Thanks for the heads up on the computer id url.
In that vein I will provide a link to the 4 Tasks that errored out.

http://boinc.bakerlab.org/rosetta/result.php?resultid=41043803
http://boinc.bakerlab.org/rosetta/result.php?resultid=40834653
http://boinc.bakerlab.org/rosetta/result.php?resultid=40668898
http://boinc.bakerlab.org/rosetta/result.php?resultid=40606247
If someone can find a common thread in these that would indicate why I'm having these problems that would be great.
Thanks.....murky / Bob


7) Message boards : Number crunching : Many Problems (Message 29047)
Posted 8 Oct 2006 by murky
Post:
I have the asus 6200 (cheap) Pcix video card. Have you tried turning off the graphics? I see the 0x1 error and one was terminated by the watchdog timer.

Thanks for all your thoughts on this problem. The current task has just over an hour to completion so I'll run the diagnostics for the next 36 hours or so. I occasionally turn on the graphics to see what is happening but seldom for more than a couple of minutes. The card is also a low-priced PCI-Express card using the NVidia GeForce 7300GS chipset. The CPU runtime is currently 8 hours and some tasks complete without a glitch and some don't. Memory was at 2.6 V and I raised it to 2.75 V but there was no change in stability. Is there a way I can provide a link to the work this PC has done (its id is 313507) and gain a little more insight? I will see what happens after I try memtest and Prime95. If they show stability I will give Rosetta another try.
Thanks....murky / Bob
8) Message boards : Number crunching : Many Problems (Message 29038)
Posted 8 Oct 2006 by murky
Post:
[quote]Eek, I'm not sure what to say.
Thanks for the reply.
The BIOS settings for the memory are what Auto would choose if I were running the memory at an FSB of 263. If I use Auto at 200 MHz it sets them faster, to 2.5, 3, 3, 7.
I have raised the RAM voltage and CPU voltage....all the OC'ing tricks but without success. As I stated, Prime95 and memtest86 were solid when the box was built and with an FSB of 220 MHz. I will run both tests again for at least 24 hours when the current task finishes. It's at 70% now.
murky / Bob

From the BSOD errors, I'd have to guess your BIOS settings are good enough for general work, but when tasked by Rosetta your memory or something goofs up. Try setting BIOS timings back to auto. Have you run memtest86+ and Prime95?

tony

9) Message boards : Number crunching : Many Problems (Message 29036)
Posted 8 Oct 2006 by murky
Post:
It appears that I will have to cease work on Rosetta tasks as the PC described below is not doing well with Rosetta. This project is the only work assigned to the PC. Rosetta is version 5.25 and BOINC is version 5.4.11.
These are the messages that I was able to retrieve; I don’t know if they are of any help in diagnosing why this PC is causing problems. If any other information should be provided to try to resolve these problems please advise me.

This PC is an AMD Athlon 64 3700+
NOT OVERCLOCKED
Motherboard is ASUS A8N-E
RAM is 1 Gigabyte OCZ PC4200 (263 MHz) running at 200 MHz
I am using very relaxed timings of 2.5, 4, 4, and 8.
The hard drive is a Western Digital Raptor (SATA)
Processor temperature is around 43 degrees C.
There have also been several instances of BSOD.

One being “PAGE_FAULT_IN_NONPAGED_AREA”
Stop: 0x00000050
Win32.sys – address BF8028A7 base at BF800000, DateStamp 43446a58
There have been several instances of the system locking up and having to reboot to recover.
This PC has worked on Folding at Home for several months with no problems.
I have set BOINC to not take any new work and will run memtest86 and Prime95 to see if they indicate problems. None had been indicated by these programs when the PC was first built and run at an FSB of 220 MHz.
I see no point in messing up the project with Client and Compute errors. My Intel Pentium 4 at 2.4 GHz is working well / stable.

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0086009A write attempt to address 0x11B8E760

Engaging BOINC Windows Runtime Debugger

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0060AA0A read attempt to address 0x8A34AED4

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004C9C46 read attempt to address 0xA15A7174
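
For anyone comparing several failed results, the three exception records above can be pulled apart mechanically. The following is only a minimal Python sketch: the regular expression is inferred from the three "Reason:" lines quoted in this post, not from any BOINC documentation, and the names `REASON`, `parse_exception_records` and `LOG` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the "Reason:" lines quoted in this post (an assumption,
# not an official BOINC log format).
REASON = re.compile(
    r"Reason: (?P<kind>[^(]+?) \((?P<code>0x[0-9A-Fa-f]+)\) "
    r"at address (?P<pc>0x[0-9A-Fa-f]+) "
    r"(?P<access>read|write) attempt to address (?P<target>0x[0-9A-Fa-f]+)"
)


def parse_exception_records(text):
    """Return one dict per 'Reason:' line found in text."""
    return [m.groupdict() for m in REASON.finditer(text)]


# The three records exactly as posted above.
LOG = """\
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0086009A write attempt to address 0x11B8E760

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0060AA0A read attempt to address 0x8A34AED4

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004C9C46 read attempt to address 0xA15A7174
"""

records = parse_exception_records(LOG)
by_access = Counter(r["access"] for r in records)   # reads vs. writes
distinct_pcs = {r["pc"] for r in records}           # faulting instruction addresses

if __name__ == "__main__":
    for r in records:
        print(r["kind"], r["code"], r["access"], "at", r["pc"], "->", r["target"])
    print(dict(by_access), "distinct faulting addresses:", len(distinct_pcs))
```

In these three records the faulting address is different each time, so they do not point at a single instruction in the application.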

stderr out
<core_client_version>5.4.11</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3084267
# cpu_run_time_pref: 28800
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 952.206 for 3600 seconds
**********************************************************************
GZIP SILENT FILE: .xx1vie.out
# cpu_run_time_pref: 28800
ERROR:: Exit at: .initialize.cc line:1618
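
The numbers in this stderr output are easier to read in hours. Below is a small Python sketch under two assumptions: that `cpu_run_time_pref` is given in seconds (28800 s would be the 8 hour run time mentioned earlier in the thread), and that the watchdog line always has the wording shown here. The function name `summarize_stderr` and the `STDERR` sample are hypothetical.

```python
import re

# Lines copied from the stderr output above (rows of asterisks omitted).
STDERR = """\
# random seed: 3084267
# cpu_run_time_pref: 28800
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 952.206 for 3600 seconds
GZIP SILENT FILE: .xx1vie.out
# cpu_run_time_pref: 28800
ERROR:: Exit at: .initialize.cc line:1618
"""


def summarize_stderr(text):
    """Extract the run-time preference and watchdog details, if present."""
    pref = re.search(r"# cpu_run_time_pref: (\d+)", text)
    stuck = re.search(r"Stuck at score (-?[\d.]+) for (\d+) seconds", text)
    return {
        # Assumes the preference is expressed in seconds.
        "run_time_pref_hours": int(pref.group(1)) / 3600 if pref else None,
        "watchdog_fired": "Watchdog is ending the run" in text,
        "stuck_score": float(stuck.group(1)) if stuck else None,
        "stuck_hours": int(stuck.group(2)) / 3600 if stuck else None,
    }


summary = summarize_stderr(STDERR)

if __name__ == "__main__":
    print(summary)
```

Read this way, the task had an 8 hour target run time and the watchdog ended it after the score had not changed for one hour.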





