Client Errors

Message boards : Number crunching : Client Errors

Bok
Joined: 17 Sep 05
Posts: 54
Credit: 3,514,973
RAC: 0
Message 800 - Posted: 30 Sep 2005, 3:17:36 UTC - in response to Message 793.  
Last modified: 30 Sep 2005, 3:18:21 UTC

I got quite a few at around the same time on a new install on an AMD 3800+.

I reset the project and they started working OK.

I guess there was a bad batch out there?

Bok

Free-DC

Stats for all projects

Custom Stats

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 809 - Posted: 30 Sep 2005, 13:37:36 UTC

I don't know ... :(

I looked again today, and I have not had one yet and I have 60 some work units on the two computers. I guess I should count my blessings ...

Jeff
Joined: 21 Sep 05
Posts: 20
Credit: 380,889
RAC: 0
Message 811 - Posted: 30 Sep 2005, 13:40:16 UTC

Hmmm... I have Rosetta running on 3 AMD 3800 X2 systems and 4 other various systems, and none of these kinds of errors yet. WinXP SP2 is running on all of them, and they are all running Rosetta exclusively in BOINC 24/7. Half of them also have the DIMES project running.

Strange problem here for some folks...


Jeff's Computer Farm

FZB
Joined: 17 Sep 05
Posts: 84
Credit: 4,899,675
RAC: 756
Message 818 - Posted: 30 Sep 2005, 15:01:40 UTC

You don't get the error when you run it exclusively... it happens (on some systems) when switching from Rosetta to something else (actually, it seems to happen after switching...)
--
Florian
www.domplatz1.de

rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 821 - Posted: 30 Sep 2005, 15:20:47 UTC - in response to Message 818.  

You don't get the error when you run it exclusively... it happens (on some systems) when switching from Rosetta to something else (actually, it seems to happen after switching...)


That explains why I have had no problems. After hearing DB's plea for more computational power, I dedicated one machine to exclusively running R@home. :)
Regards,
Bob P.

Jeff
Joined: 21 Sep 05
Posts: 20
Credit: 380,889
RAC: 0
Message 826 - Posted: 30 Sep 2005, 17:07:04 UTC - in response to Message 818.  

You don't get the error when you run it exclusively... it happens (on some systems) when switching from Rosetta to something else (actually, it seems to happen after switching...)


;o) Makes sense then in my case.

Jeff's Computer Farm

JaRski-S60R
Joined: 24 Sep 05
Posts: 4
Credit: 608,548
RAC: 0
Message 835 - Posted: 30 Sep 2005, 20:33:17 UTC - in response to Message 818.  

You don't get the error when you run it exclusively... it happens (on some systems) when switching from Rosetta to something else (actually, it seems to happen after switching...)


But I often get an error when a WU is 100% done (using BOINC v4.72, with other projects running too). So far none successful (9) :-(

Neil Woodvine
Joined: 16 Sep 05
Posts: 3
Credit: 30,708
RAC: 0
Message 856 - Posted: 1 Oct 2005, 5:29:19 UTC

Had something similar on my P4 3.2 GHz HT box. I noticed about 2 hours ago that one of the WUs had been stuck at 1% for 4 days and the other for a day and a half =/.

I reset the project on the box just in case it was a bad batch of WUs, and it happily crunched away for 2 hours, then ran the 4-day benchmark and errored out the two WUs it had been running.

I've only seen this problem on my HT box; all the "single" CPUs are suspending the WU and coming back fine. Maybe it's a problem with suspending 2 running Rosetta WUs at the same time? (HT/dual CPUs)

UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 860 - Posted: 1 Oct 2005, 8:29:39 UTC

Going to give it one last go, and if it fails I will stop allowing new work until something is sorted out.
Join us in Chat (see the forum) Click the Sig


Join UBT

FZB
Joined: 17 Sep 05
Posts: 84
Credit: 4,899,675
RAC: 756
Message 867 - Posted: 1 Oct 2005, 9:56:17 UTC - in response to Message 835.  

But I often get an error when a WU is 100% done (using BOINC v4.72, with other projects running too). So far none successful (9) :-(


What was your exit code? Might it be a different error? The one I see while/after switching is 0xc0000005.

--
Florian
www.domplatz1.de
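Aside on the exit code: the BOINC log prints it as a signed 32-bit decimal followed by the same value in hex (for example "exit code -1073741819 (0xc0000005)" later in this thread), so those two numbers are one and the same code. On Windows, 0xC0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION, meaning the application tried to access memory it was not allowed to. A minimal Python sketch of the conversion, handy for checking whether two reports in this thread are the same error; the helper names and the one-entry lookup table are illustrative, not part of BOINC:

```python
# Known Windows NTSTATUS value; the table is illustrative and can be extended.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_hex(exit_code: int) -> str:
    """Render a signed 32-bit exit code as 8-digit lowercase hex, as the log shows it."""
    # Masking with 0xFFFFFFFF maps a negative value to its unsigned 32-bit form.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


def describe(exit_code: int) -> str:
    """Return 'decimal = hex (name)', using 'unknown' for codes not in the table."""
    name = NTSTATUS_NAMES.get(exit_code & 0xFFFFFFFF, "unknown")
    return f"{exit_code} = {to_hex(exit_code)} ({name})"


print(describe(-1073741819))  # -1073741819 = 0xc0000005 (STATUS_ACCESS_VIOLATION)
```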

JaRski-S60R
Joined: 24 Sep 05
Posts: 4
Credit: 608,548
RAC: 0
Message 868 - Posted: 1 Oct 2005, 12:29:48 UTC - in response to Message 867.  

But I often get an error when a WU is 100% done (using BOINC v4.72, with other projects running too). So far none successful (9) :-(


What was your exit code? Might it be a different error? The one I see while/after switching is 0xc0000005.


Sorry, I didn't write it down, and I've had to restart my PC since then, so the messages were cleared. But I'll give an update soon: 1 WU is at 83.33% complete. I'm keeping my settings the same for the moment (thus allowing ALL projects to run), but I did switch to "leave in memory" (I have 1 GB of RAM), so let's just wait :-)

JaRski-S60R
Joined: 24 Sep 05
Posts: 4
Credit: 608,548
RAC: 0
Message 869 - Posted: 1 Oct 2005, 14:16:28 UTC - in response to Message 868.  

Mmm... that's funky :-S
It resumed, but now it's been stuck at the same % (83.33, that is) for over an hour or so :-(
I'll keep you updated, because I noticed this before: it eventually resumed with the earlier WU. Just waiting to see what the resulting error message will be.

JaRski-S60R
Joined: 24 Sep 05
Posts: 4
Credit: 608,548
RAC: 0
Message 870 - Posted: 1 Oct 2005, 14:18:30 UTC - in response to Message 869.  

ROFL, omg... as soon as I wrote my last message I checked BOINC again, and it was running and showed 91.97%.

rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 873 - Posted: 1 Oct 2005, 15:34:05 UTC - in response to Message 869.  
Last modified: 1 Oct 2005, 15:36:52 UTC

Mmm... that's funky :-S
It resumed, but now it's been stuck at the same % (83.33, that is) for over an hour or so :-(
I'll keep you updated, because I noticed this before: it eventually resumed with the earlier WU. Just waiting to see what the resulting error message will be.


I had the same stuck percentage (or approximately the same). R@h was only using about 0.290 Mb of RAM on it at the time; in other words, next to nothing. So R@h seemed to be "lost in space" or something.

I suspended the wu, started the next, unsuspended the wu (the program continued on with the new one), and turned in for the night. When I checked this morning, both wu had been successfully completed.

Very mysterious! It was as though the wu had lost its way in molecular space somewhere....but then found its way back again!
Regards,
Bob P.

UBT - Halifax--lad
Joined: 17 Sep 05
Posts: 157
Credit: 2,687
RAC: 0
Message 876 - Posted: 1 Oct 2005, 16:21:01 UTC

I'm holding back from this project for now; I just had another failed WU.

01/10/2005 16:03:48|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_no_cst_21829_0 ( - exit code -1073741819 (0xc0000005))

I'll wait for the problems to be sorted out.
Join us in Chat (see the forum) Click the Sig


Join UBT

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 878 - Posted: 1 Oct 2005, 18:18:32 UTC

Interestingly enough, I had one work unit with client errors. Issued 3 times: 2 client errors and one success... more to be puzzled about...

JimB
Joined: 17 Sep 05
Posts: 19
Credit: 228,111
RAC: 0
Message 879 - Posted: 1 Oct 2005, 18:54:51 UTC

Almost the same here - issued 3 times, one success, one error, one still out. *Right now* I seem to get errors only every 5 days when benchmarks run, only on rosetta wu's.

2005-09-30 14:56:17 [---] Suspending computation and network activity - running CPU benchmarks
2005-09-30 14:56:17 [rosetta@home] Pausing result 1btn__abrelax_no_cst_14765_1 (removed from memory)
2005-09-30 14:56:17 [rosetta@home] Pausing result 1btn__abrelax_no_cst_13275_1 (removed from memory)
2005-09-30 14:56:18 [rosetta@home] Unrecoverable error for result 1btn__abrelax_no_cst_14765_1 ( - exit code -1073741819 (0xc0000005))
2005-09-30 14:56:18 [rosetta@home] Unrecoverable error for result 1btn__abrelax_no_cst_13275_1 ( - exit code -1073741819 (0xc0000005))
2005-09-30 14:56:18 [---] request_reschedule_cpus: process exited
2005-09-30 14:56:19 [---] Running CPU benchmarks



"Be all that you can be...considering." Harold Green
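To check whether one's own errors follow the pattern in the log above (a result paused "(removed from memory)" for benchmarks, then reported as an "Unrecoverable error" a second later), one could scan the message log for exactly that sequence. A minimal Python sketch, assuming the `YYYY-MM-DD HH:MM:SS [project] message` layout of the excerpt above; the pipe-delimited layout quoted in Message 876 would need a different pattern. The function name and the 5-second window are arbitrary, hypothetical choices, not anything BOINC defines:

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "YYYY-MM-DD HH:MM:SS [project] message", as in the excerpt above.
LINE_RE = re.compile(r"^(\S+ \S+) \[([^\]]*)\] (.*)$")
PAUSE_RE = re.compile(r"Pausing result (\S+) \(removed from memory\)")
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+) \( - exit code (-?\d+)")


def errors_after_unload(log_lines, window_s=5):
    """Hypothetical helper: return (result, exit_code) pairs for results that hit
    an unrecoverable error within window_s seconds of being removed from memory."""
    unloaded_at = {}  # result name -> time of its last "removed from memory" pause
    hits = []
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines in any other format
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(3)
        pause = PAUSE_RE.search(msg)
        if pause:
            unloaded_at[pause.group(1)] = ts
            continue
        err = ERROR_RE.search(msg)
        if err and err.group(1) in unloaded_at:
            if ts - unloaded_at[err.group(1)] <= timedelta(seconds=window_s):
                hits.append((err.group(1), int(err.group(2))))
    return hits


# Three lines taken from the log excerpt above.
SAMPLE = [
    "2005-09-30 14:56:17 [---] Suspending computation and network activity - running CPU benchmarks",
    "2005-09-30 14:56:17 [rosetta@home] Pausing result 1btn__abrelax_no_cst_14765_1 (removed from memory)",
    "2005-09-30 14:56:18 [rosetta@home] Unrecoverable error for result 1btn__abrelax_no_cst_14765_1 ( - exit code -1073741819 (0xc0000005))",
]

for result, code in errors_after_unload(SAMPLE):
    print(f"{result}: exit code {code} shortly after unload")
```

Errors that occur without a preceding unload are deliberately ignored, so the output isolates the unload-then-crash pattern discussed in this thread rather than every client error.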

FZB
Joined: 17 Sep 05
Posts: 84
Credit: 4,899,675
RAC: 756
Message 888 - Posted: 1 Oct 2005, 23:17:35 UTC - in response to Message 856.  
Last modified: 1 Oct 2005, 23:18:09 UTC

I've only seen this problem on my HT box; all the "single" CPUs are suspending the WU and coming back fine. Maybe it's a problem with suspending 2 running Rosetta WUs at the same time? (HT/dual CPUs)


https://boinc.bakerlab.org/rosetta/workunit.php?wuid=103170

While I see this on my two boxes (both multi-processor/multi-core) when I don't have "leave in memory" turned on, the WU above, which I returned successfully, was also returned by a Pentium M with 1 CPU, so this does not seem to be an issue exclusive to multi-processor machines.

--
Florian
www.domplatz1.de

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 889 - Posted: 1 Oct 2005, 23:37:18 UTC

Ah Ha!

I forgot to check the logs... yes, mine was the 0xc0000005 error, with the benchmarks noted as aborted because tasks were active.

So ...

Tentative conclusion: Rosetta@Home does not like to be stopped...

Ugh ...

Joe
Joined: 26 Sep 05
Posts: 3
Credit: 433,580
RAC: 631
Message 898 - Posted: 2 Oct 2005, 7:46:04 UTC

I also get a client error every time the application is removed from memory to run benchmarks.



©2024 University of Washington
https://www.bakerlab.org