Unrecoverable error


AuthorMessage
Profile UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 4627 - Posted: 28 Nov 2005, 21:59:31 UTC

Have just altered my settings to tell BOINC to switch every 10 hrs; that way I can still crunch Rosetta without any errors when switching. It is currently impossible for me to keep the work units in memory.

Hope this problem with checkpointing/switching is found soon.
Join us in Chat (see the forum) Click the Sig


Join UBT
ID: 4627 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 4634 - Posted: 28 Nov 2005, 23:10:36 UTC - in response to Message 4627.  

Have just altered my settings to tell BOINC to switch every 10 hrs; that way I can still crunch Rosetta without any errors when switching. It is currently impossible for me to keep the work units in memory.

Hope this problem with checkpointing/switching is found soon.


Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical ram (chip) to the swap file, and left there for the duration.

Are you short of swap space or something?
ID: 4634 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 4654 - Posted: 29 Nov 2005, 7:24:00 UTC - in response to Message 4634.  

Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical ram (chip) to the swap file, and left there for the duration.

Are you short of swap space or something?


I don't run XP at all; I have Win 2000. I also run multiple projects, so there wouldn't be enough memory available to hold all of them in memory.
ID: 4654 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 4663 - Posted: 29 Nov 2005, 9:55:52 UTC - in response to Message 4654.  

Why is it not possible to keep them in memory? You're running Win XP, so within a few moments of being suspended, the WU will be flushed from physical ram (chip) to the swap file, and left there for the duration.

Are you short of swap space or something?


I don't run XP at all; I have Win 2000. I also run multiple projects, so there wouldn't be enough memory available to hold all of them in memory.


You still don't get it, do you?

PAUSED APPLICATIONS CONSUME SWAP SPACE, THEY DO NOT CONSUME PHYSICAL RAM!



That picture shows what happened when I purposely "overloaded" this system. Note: Windows 98SE and 512 MB of RAM.

As Process Explorer shows in that image, I have paused in ram the following assortment: 3 rosetta WU's, 2 predictor WU's, one Einstein, I'm not sure who the Sixtrack belongs to (WCG?) and 7 seti at homes.

Not a bad haul, you will admit. System Monitor explains how I can do this, and could keep this up indefinitely. Ignore the red graph; the blue one is the one that counts. It's swap file in use, i.e. data that has been removed from chip and placed on disk in the swap file. That's going up at a pretty good rate because, guess what, the good old vmem system is hauling stuff out of RAM and parking it on disk. The yellow one at the bottom is available physical memory, which is (of course) pretty much at zero.

OK, so the bottom line is this. Even though you only have 256 MB of RAM in that system of yours, it'll still do the same thing. As apps get paused, they'll page out to disk. Win98 can do it, so Win 2K sure as hell can do it too.

Have you actually tried setting that switch on your config page?
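
To put rough numbers on the paging argument above, here is a minimal sketch. It assumes the simplest possible model: whatever does not fit in physical RAM ends up in the swap file. The per-task footprints are hypothetical values chosen for illustration; only the 512 MB figure and the task counts come from the post, and the function name is made up.

```python
def split_resident_and_swapped(ram_mb, task_footprints_mb):
    """Return (resident_mb, swapped_mb) for a set of tasks kept in memory.

    Simplified model: anything that does not fit in physical RAM is assumed
    to be paged out to the swap file. The OS's own memory use is ignored.
    """
    total = sum(task_footprints_mb)
    resident = min(total, ram_mb)
    swapped = total - resident
    return resident, swapped


# Hypothetical footprints in MB: 3 Rosetta, 2 Predictor, 1 Einstein,
# 1 Sixtrack, 7 SETI. These sizes are assumptions, not measurements.
footprints = [100] * 3 + [60] * 2 + [50] + [30] + [30] * 7

resident, swapped = split_resident_and_swapped(512, footprints)
print(f"total {sum(footprints)} MB, resident {resident} MB, swapped {swapped} MB")
```

Under this model the limit on how many paused tasks can be kept is swap file size rather than installed RAM, which is the point being made. It says nothing about how sluggish the machine feels while paging.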
ID: 4663 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile FZB

Joined: 17 Sep 05
Posts: 84
Credit: 4,899,261
RAC: 738
Message 4666 - Posted: 29 Nov 2005, 10:57:52 UTC - in response to Message 4663.  

As Process Explorer shows in that image, I have paused in ram the following assortment: 3 rosetta WU's, 2 predictor WU's, one Einstein, I'm not sure who the Sixtrack belongs to (WCG?) and 7 seti at homes.


sixtrack belongs to LHC@home if you run that, sixtrack is also used in some benchmarks.

--
Florian
www.domplatz1.de
ID: 4666 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 4719 - Posted: 29 Nov 2005, 19:43:24 UTC

Dgnuff, I know perfectly well how BOINC works and I know what swap space is, so there is no need to stick something in bold and CAPS at me.
ID: 4719 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Spectre
Joined: 1 Nov 05
Posts: 20
Credit: 177,671
RAC: 0
Message 4744 - Posted: 29 Nov 2005, 23:33:22 UTC

@Bill:

Tried everything, but still getting one error after another with workunits. Switched to LHC and have done 13 workunits now with NO errors at all....will go another dozen and switch back to Rosetta and see what happens.

Thanks,
Spectre



ID: 4744 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 4757 - Posted: 30 Nov 2005, 3:14:29 UTC - in response to Message 4719.  
Last modified: 30 Nov 2005, 3:26:20 UTC

Dgnuff, I know perfectly well how BOINC works and I know what swap space is, so there is no need to stick something in bold and CAPS at me.


You also state:

I don't run XP at all; I have Win 2000. I also run multiple projects, so there wouldn't be enough memory available to hold all of them in memory.


Of exactly what sort of memory do you not have enough available?

Can't be chip, because when you run out of chip, we both know that it winds up on disk. See my overload job for an example of this in action. Do you have a small hard disk that is limiting swap space?

ID: 4757 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 4775 - Posted: 30 Nov 2005, 9:57:01 UTC - in response to Message 4757.  

It just doesn't work when left in memory. I know that because I held tasks in memory a long time ago and the computer just sulked. My easy option is to switch every 10 hrs: all the WUs are getting done with no errors, and all get done before their deadline on this setting, so that's not an issue for me. When I finally upgrade my comp I will leave them in memory, but for now I won't.
ID: 4775 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile hob.
Joined: 4 Nov 05
Posts: 64
Credit: 250,683
RAC: 0
Message 5049 - Posted: 3 Dec 2005, 18:36:17 UTC

boink ver 5.6.2

rosetta ver 4.80

i have been getting these errors too, on one of 5 machines running rosetta......the other 4 are not having problems. i have now stopped rosetta and restarted FaD (which runs ok on this machine)

so far 18 of 27 jobs have failed with this error :-

03/12/2005 15:43:00|rosetta@home|Temporarily failed upload of 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_49156_0_0: can't resolve hostname
03/12/2005 15:43:00|rosetta@home|Backing off 2 hours, 20 minutes, and 55 seconds on upload of file 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_49156_0_0
03/12/2005 15:52:52|rosetta@home|Started upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0
03/12/2005 15:53:09||Couldn't resolve hostname [boinc.bakerlab.org]
03/12/2005 15:53:10|rosetta@home|Temporarily failed upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0: can't resolve hostname
03/12/2005 15:53:10|rosetta@home|Backing off 1 hours, 14 minutes, and 19 seconds on upload of file 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0
03/12/2005 16:07:16|rosetta@home|Unrecoverable error for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_63326_0 ( - exit code -1073741819 (0xc0000005))
03/12/2005 16:07:16|rosetta@home|Too many backoffs - fetching master file
03/12/2005 16:07:16||request_reschedule_cpus: process exited
03/12/2005 16:07:16|rosetta@home|Deferring communication with project for 13 hours, 42 minutes, and 34 seconds
03/12/2005 16:07:16|rosetta@home|Computation for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_63326_0 finished
03/12/2005 16:07:16|rosetta@home|Starting result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 using rosetta version 480
03/12/2005 16:23:59|rosetta@home|Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 ( - exit code -1073741819 (0xc0000005))
03/12/2005 16:23:59|rosetta@home|Too many backoffs - fetching master file
03/12/2005 16:23:59||request_reschedule_cpus: process exited
03/12/2005 16:23:59|rosetta@home|Computation for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_63399_0 finished
03/12/2005 16:23:59|rosetta@home|Starting result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 using rosetta version 480
03/12/2005 16:24:03|rosetta@home|Deferring communication with project for 13 hours, 25 minutes, and 47 seconds
03/12/2005 16:45:24|rosetta@home|Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 ( - exit code -1073741819 (0xc0000005))
03/12/2005 16:45:24|rosetta@home|Too many backoffs - fetching master file
03/12/2005 16:45:24||request_reschedule_cpus: process exited
03/12/2005 16:45:24|rosetta@home|Deferring communication with project for 13 hours, 4 minutes, and 26 seconds
03/12/2005 16:45:24|rosetta@home|Computation for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_63333_0 finished
03/12/2005 16:45:24|rosetta@home|Starting result 1dcj__abrelax_rand_len10_jit02_omega_sim_filters_63289_0 using rosetta version 480
03/12/2005 17:07:30|rosetta@home|Started upload of 1n0u__abrelaxmode_random_length05_jitter02_46340_1_0
03/12/2005 17:07:47||Couldn't resolve hostname [boinc.bakerlab.org]
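
A log like the one above is easier to reason about once the failures are tallied by exit code. The following is a small sketch that assumes the pipe-separated `date time|project|message` layout seen in these lines; the function name and the regular expression are illustrative and not part of BOINC.

```python
import re
from collections import Counter

# Matches the "Unrecoverable error" messages as they appear in the log above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def summarize_errors(log_text):
    """Count unrecoverable errors per exit code in a BOINC message log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Expected layout: "date time|project|message"; skip anything else.
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue
        match = ERROR_RE.search(parts[2])
        if match:
            counts[int(match.group("code"))] += 1
    return counts
```

Pasting the message log into `summarize_errors(...)` returns a `Counter` keyed by the signed exit code, which makes it quick to see whether one failure mode dominates. The upload and DNS lines are ignored by this sketch and would need their own pattern.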
46 years dc so far

join team FaDbeens
join us

ID: 5049 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,496,233
RAC: 702
Message 5062 - Posted: 3 Dec 2005, 22:31:29 UTC - in response to Message 5049.  

boink ver 5.6.2

rosetta ver 4.80

i have been getting these errors too, on one of 5 machines running rosetta......the other 4 are not having problems. i have now stopped rosetta and restarted FaD (which runs ok on this machine)


You don't say if you have "Leave in memory" set to "Yes", but assuming you do, I can only throw in a few things. V5.6.2 isn't a valid BOINC version. The current release is V5.2.13, and I would definitely recommend anyone having connection problems upgrade to it.

You're currently seeing two problems, a failure to reliably connect with the project, and the results erroring out. I would address one of those at a time, _unless_ you are overclocked. If you are, I would definitely do some testing of temperatures and memory stability. (Actually, checking temps would be a good idea even if not overclocked...) Never having run FaD, I have no idea how sensitive it is to stability issues. I know the various BOINC projects have varying sensitivity, with Rosetta and SETI being at the top of the list. Almost any 'glitch' will cause problems for them, as they rely on extreme accuracy of the calculations.

ID: 5062 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile hob.
Joined: 4 Nov 05
Posts: 64
Credit: 250,683
RAC: 0
Message 5063 - Posted: 3 Dec 2005, 22:59:42 UTC - in response to Message 5062.  

boink ver 5.6.2

rosetta ver 4.80

i have been getting these errors too, on one of 5 machines running rosetta......the other 4 are not having problems. i have now stopped rosetta and restarted FaD (which runs ok on this machine)


You don't say if you have "Leave in memory" set to "Yes", but assuming you do, I can only throw in a few things. V5.6.2 isn't a valid BOINC version. The current release is V5.2.13, and I would definitely recommend anyone having connection problems upgrade to it.

You're currently seeing two problems, a failure to reliably connect with the project, and the results erroring out. I would address one of those at a time, _unless_ you are overclocked. If you are, I would definitely do some testing of temperatures and memory stability. (Actually, checking temps would be a good idea even if not overclocked...) Never having run FaD, I have no idea how sensitive it is to stability issues. I know the various BOINC projects have varying sensitivity, with Rosetta and SETI being at the top of the list. Almost any 'glitch' will cause problems for them, as they rely on extreme accuracy of the calculations.


if i go to "help/about" in boink manager it tells me i have ver 5.2.6

quote ..."failure to connect to the project"

that's because i only have a 52k modem and an off peak package...so i can't stay "always on"

it is not overclocked ...however cooling it is a problem as it's the top unit in this rack



"Leave in memory" set to Yes

windows xp pro service pack 1

ID: 5063 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Morphy375
Joined: 2 Nov 05
Posts: 86
Credit: 1,629,758
RAC: 0
Message 5064 - Posted: 3 Dec 2005, 23:41:43 UTC

"Couldn't resolve hostname [boinc.bakerlab.org]"

No nameserver available. DNS misconfigured?
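
A quick way to test the DNS theory outside of BOINC is to ask the operating system's resolver directly. This is a minimal sketch using Python's standard `socket.getaddrinfo`; the helper name `can_resolve` and the injectable `resolver` parameter are assumptions made for illustration and easy testing.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a DNS lookup failure."""
    try:
        # Port 80 is only a placeholder; the lookup itself is what matters.
        resolver(hostname, 80)
        return True
    except socket.gaierror:
        return False
```

Calling `can_resolve("boinc.bakerlab.org")` while the modem is connected and again while it is offline would show whether the "can't resolve hostname" messages simply follow the dial-up state or point at a misconfigured nameserver.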
Teddies....
ID: 5064 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vester
Joined: 2 Nov 05
Posts: 257
Credit: 3,282,306
RAC: 14,727
Message 5067 - Posted: 4 Dec 2005, 0:03:42 UTC
Last modified: 4 Dec 2005, 0:04:10 UTC

Hob, the current BOINC client is version 5.2.13. It helped me. Download from this page.
ID: 5067 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Plum Ugly

Joined: 3 Nov 05
Posts: 24
Credit: 2,005,763
RAC: 0
Message 5078 - Posted: 4 Dec 2005, 3:17:46 UTC
Last modified: 4 Dec 2005, 3:19:22 UTC

Seems I'm getting nothing but errors too. Changed the drive, CPU, memory, everything but the motherboard to see if this changed anything. No change. Not overclocked and running the newest version.

2005-12-03 21:10:18 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 ( - exit code -164 (0xffffff5c))
2005-12-03 21:10:18 [---] request_reschedule_cpus: process exited
2005-12-03 21:10:18 [rosetta@home] Computation for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 finished
2005-12-03 21:10:18 [rosetta@home] Starting result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82056_0 using rosetta version 480

2005-12-03 21:00:51 [rosetta@home] Unrecoverable error for result 1di2__abrelax_rand_len10_jit02_omega_sim_filters_82001_0 ( - exit code -1073741819 (0xc0000005))
2005-12-03 21:03:53 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_54889_1 ( - exit code -164 (0xffffff5c))
2005-12-03 21:04:53 [rosetta@home] Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82041_0 ( - exit code -1073741819 (0xc0000005))
2005-12-03 21:06:03 [rosetta@home] Unrecoverable error for result 1ogw__abrelax_rand_len10_jit02_omega_sim_filters_82048_0 ( - exit code -1073741819 (0xc0000005))
2005-12-03 21:10:18 [rosetta@home] Unrecoverable error for result 1dtj__abrelax_rand_len10_jit02_omega_sim_filters_82016_0 ( - exit code -164 (0xffffff5c))

from the results page;

<core_client_version>5.2.13</core_client_version>
<message> - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# =====================================
# random seed: 1518321
# =====================================

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x77F51D24 write attempt to address 0x00000000

1: 12/03/05 21:06:03



</stderr_txt>
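
The negative exit codes and the hex values in these messages are the same 32-bit number read two ways, signed and unsigned. A short sketch of the conversion follows; the lookup table holds only the one status that the stderr output above names, and the function names are illustrative.

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


# Only the status named in the stderr output above; not a complete table.
KNOWN_STATUS = {0xC0000005: "Access Violation"}


def describe_exit_code(exit_code):
    """Format an exit code as hex plus a name, if the name is known."""
    unsigned = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_code(-1073741819))
print(describe_exit_code(-164))
```

The first call reproduces the `0xc0000005` shown in the log, and the second reproduces `0xffffff5c`. What -164 means is not stated anywhere in the thread, so the sketch leaves it as unknown.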
ID: 5078 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Vester
Joined: 2 Nov 05
Posts: 257
Credit: 3,282,306
RAC: 14,727
Message 5083 - Posted: 4 Dec 2005, 4:25:50 UTC

I would check to see that Windows XP settings for cache are "system managed size" or cache is liberally allocated, about 1.5 GB. Also, run cache in the partition on which the OS is installed unless you can run it on a secondary hard drive (optimal).



Tank once commented:
I have to agree with THINK about stress causing these access violations. I have found that a hastily assembled machine is much more likely to produce these errors. It may just be down to the order in which drivers are loaded during setup. I am not convinced that this accounts for all of the occurrences but a rebuild and reinstall usually sorts things out. IMHO.

They were referring to heat stress.
ID: 5083 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dogbytes
Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 5321 - Posted: 6 Dec 2005, 22:49:14 UTC - in response to Message 3843.  

Looks like my Rosetta WU crunches are erroring out since Nov 19th.

Mac G5 dual OS X 10.3.9, with BOINC Superbench menubar 4.44


I've got a PowerMac G5 2.5 running OS X 10.4.3 and I got the same thing with the Super Bench Manager.
I uninstalled that client and downloaded the current OS X client and I'm getting the same thing. The WU keeps crunching and crunching for 7 or more hours, then fails with a client error.
<core_client_version>4.44</core_client_version>
<message>Maximum CPU time exceeded
</message>
<stderr_txt>
# =====================================
# random seed: 824341
# =====================================
</stderr_txt>

If I can't get this fixed, I'll have to move this host to another project.
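
One possible reading of "Maximum CPU time exceeded", offered as an assumption about BOINC internals rather than anything stated in this thread: the client derives a CPU-time ceiling by dividing the workunit's floating-point operation bound (`rsc_fpops_bound`) by the host's benchmarked speed, and aborts a result that runs past it. The sketch below only shows that arithmetic. Both input numbers are hypothetical and are not taken from any Rosetta workunit.

```python
def max_cpu_seconds(rsc_fpops_bound, benchmark_flops):
    """CPU-time ceiling under the assumed rule: bound divided by host speed."""
    return rsc_fpops_bound / benchmark_flops


# Hypothetical values, for illustration only.
bound = 9e13          # assumed upper bound on floating-point operations
benchmark = 2.5e9     # assumed benchmark result in operations per second

limit = max_cpu_seconds(bound, benchmark)
print(f"limit: {limit / 3600:.1f} hours")
```

If that reading is right, a host whose benchmark score is high relative to how fast the science application actually runs on it would hit the ceiling before finishing, which would fit a result that crunches for many hours and then errors out. Treat this as a hypothesis to check against the project's own documentation.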
ID: 5321 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dogbytes
Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 5322 - Posted: 6 Dec 2005, 23:02:58 UTC
Last modified: 6 Dec 2005, 23:06:36 UTC

I just joined Rosetta a few days ago. My PCs are OK with R@H, but my PowerMac G5 2.5 runs the WUs for about the same time as a 600MHz Celeron, then errors out. I was using a Superbench 4.44 client. I then uninstalled it, trashing numerous SETI work units which were hung up with Berkeley's servers BS, and installed the current Mac client. The same thing is happening.

I don't do mixed projects; I crunch only one project exclusively at a time. If I can't get this fixed soon, I'll migrate all my hosts to some other project. My Mac is my pride and joy, if it can't be in on the project, nothing will. I was looking forward to the change from Seti, I'd heard a lot of good things about this project, but this is not working.
ID: 5322 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Tern
Joined: 25 Oct 05
Posts: 575
Credit: 4,496,233
RAC: 702
Message 5327 - Posted: 6 Dec 2005, 23:56:33 UTC - in response to Message 5322.  

I don't do mixed projects; I crunch only one project exclusively at a time. If I can't get this fixed soon, I'll migrate all my hosts to some other project. My Mac is my pride and joy, if it can't be in on the project, nothing will. I was looking forward to the change from Seti, I'd heard a lot of good things about this project, but this is not working.


I will only do work for projects that _have_ a Mac client, but I do work on multiple projects at a time, so if a project is having a Mac problem for a while, I'll leave the PC going for them...

I'm running Rosetta "heavy" on the PC, and "some" on the Mac Mini. Kind of playing with numbers trying to get Rosetta and Predictor over 10000, and Einstein over SETI... The only problem I've had is certain WUs that run "extremely" long times on the Mini, while others seem to just be a touch slower on the Mini than they should be, based on work downloaded the same day to the PC. (Now, I did give up on the iBook G3...) I haven't seen _any_ WUs that errored out, so I don't know what to tell you on that.

If you want to "pull the plug" on Rosetta for now, that's up to you, but they are currently looking at the Mac app, wanting to make it better/faster. So don't give up completely, but come back and check it out again later.

Out of curiosity, no offense intended - does it work for you to do one-project work? I know PoorBoy does that too, putting all of his (considerable!) power towards one project until he's happy with his rank, then moving to the next. I play games with resource shares and which computers are attached to which projects, but I almost always have one computer _somewhere_ doing any of the projects I'm interested in - if nothing else, it keeps me from falling quite so far before I get back to making that one "#1" again.

ID: 5327 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dogbytes
Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 5329 - Posted: 7 Dec 2005, 0:19:46 UTC
Last modified: 7 Dec 2005, 0:24:09 UTC

Now that someone has said that they're working on a functional client, I'll stick around and bide my time.
I'm like Poorboy. I find it rather frustrating at this point in time that Macs are still treated like unwanted, redheaded stepchildren within the community. So, I'll hang loose for a while. I guess that I'm having the adult version of a temper tantrum.

What's even worse is having to trash WUs!
ID: 5329 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org