Miscellaneous Work Unit Errors

loren

Joined: 10 Oct 05
Posts: 3
Credit: 2,449,762
RAC: 0
Message 11782 - Posted: 8 Mar 2006, 14:55:57 UTC - in response to Message 11781.  

I have also received a computational error on each of the last three mornings. Is there any information I can collect that will help fix the problem?
Loren
Profile Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 11795 - Posted: 8 Mar 2006, 21:16:05 UTC

Exit status 1 (0x1)
https://boinc.bakerlab.org/rosetta/result.php?resultid=12918099
Rosetta_4.82 Windows XP
David Baker
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 17 Sep 05
Posts: 705
Credit: 559,847
RAC: 0
Message 11802 - Posted: 9 Mar 2006, 2:06:14 UTC - in response to Message 11774.  

The following three WUs:

HOMSdt_homDB002_1dtj__340_124_0
HOMSdt_homDB002_1dtj__352_271_0
HOMSdt_homDB004_1dtj__352_669_0

exited with error status 1 after about 30 seconds of run time on my Linux computer as well as on several other computers. Since three out of three units of this particular type have failed on this computer, which usually has almost no errors, I believe this is a WU-specific error which may need investigating.


we are looking into this--thanks.

divyab

Joined: 20 Oct 05
Posts: 6
Credit: 0
RAC: 0
Message 11804 - Posted: 9 Mar 2006, 3:00:40 UTC

We have found the problem, and are resubmitting the jobs with a fix. There are still a few workunits with the following prefix out there that you can expect to fail very quickly:

HOMSdt_homDB0??_1dtj

this should not happen with the next batch.
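
For anyone who wants to check their own queue against that prefix, here is a minimal sketch in Python. It assumes the "??" in the prefix stands for any two characters, treats it as a shell-style wildcard, and appends "*" so the pattern can match full work unit names. The function name `is_affected` and the sample list are only for illustration and are not part of BOINC or Rosetta.

```python
from fnmatch import fnmatchcase

# Prefix quoted in the post above. "?" is read as "any single character",
# which is an assumption about what the post means.
BAD_PREFIX = "HOMSdt_homDB0??_1dtj"


def is_affected(wu_name: str) -> bool:
    """Return True if the work unit name starts with the affected prefix."""
    # The trailing "*" is added here so the rest of the name can follow.
    return fnmatchcase(wu_name, BAD_PREFIX + "*")


# Sample names copied from this thread.
names = [
    "HOMSdt_homDB002_1dtj__340_124_0",
    "HOMSdt_homDB027_1dtj__352_1364_1",
    "HOMSti_homDB005_1tif__352_1849_0",
]
affected = [n for n in names if is_affected(n)]
print(affected)
```

Only the two HOMSdt names match; the HOMSti one belongs to a different batch.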

Lee Carre

Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 11805 - Posted: 9 Mar 2006, 3:00:58 UTC
Last modified: 9 Mar 2006, 3:03:05 UTC

3 more:

HOMSdt_homDB009_1dtj__352_458_1 - exited after 124.2 seconds
HOMSdt_homDB003_1dtj__352_40_2 - exited after 129.9 seconds
HOMSdt_homDB027_1dtj__352_1364_1 - exited after 47.58 seconds


all returned the following error:
Incorrect function. (0x1) - exit code 1 (0x1)
my hosts are pretty stable, i have very few errors normally, and i'm using the latest recommended version of BOINC (v5.2.13)

the first 2 occurred on the same host, and the 3rd on a different host

it seems that those units have failed on all systems they've been sent to (see the WU page for each) which wouldn't surprise me if "incorrect function" means that there was something wrong with the input

[edit]oops, just seen the previous post, sorry, nevermind lol[/edit]
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins
Osku87

Joined: 1 Nov 05
Posts: 17
Credit: 280,268
RAC: 0
Message 11825 - Posted: 9 Mar 2006, 17:28:14 UTC

One more with another computer.

Result

You can find another workunit with an error in there, but it was only aborted because its deadline passed.
Profile Nightlord

Joined: 6 Dec 05
Posts: 5
Credit: 1,635,379
RAC: 0
Message 11882 - Posted: 11 Mar 2006, 13:05:23 UTC

WU 4385833

"Work unit error - check skipped". Never seen that before. It was issued to me before the other cruncher returned their result, which was subsequently validated here with no error.

Reviewing the summary for the WU here, it says the unit was cancelled. This must have been after the original cruncher returned his result, but before I returned mine.

Are there more units that have been cancelled after issuing to crunchers and if so how many? Can they be identified so we can abort them please.
kics_ee

Joined: 27 Feb 06
Posts: 2
Credit: 1,531
RAC: 0
Message 11935 - Posted: 12 Mar 2006, 13:33:20 UTC

on my first computer: (amd64 optimized BOINC manager)
2006.03.12. 13:46:01|rosetta@home|Pausing result HOMSti_homDB005_1tif__352_1849_0 (removed from memory)
2006.03.12. 13:46:02|rosetta@home|Unrecoverable error for result HOMSti_homDB005_1tif__352_1849_0 ( - exit code -1073741819 (0xc0000005))

on my second computer i had more faults...
i would like to know: why?
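
The negative number and the hex value in that log line are the same 32-bit code written two ways. The sketch below shows the conversion in Python by masking to 32 bits. The lookup table is deliberately tiny: the names come from the Windows NTSTATUS definitions, the second entry is included only as another illustration, and the helper names `to_unsigned32` and `describe` are made up for this example.

```python
# A very small lookup table of Windows NTSTATUS names; anything not listed
# is reported as "unknown" rather than guessed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe(code: int) -> str:
    """Format an exit code as decimal, hex, and a name if one is known."""
    unsigned = to_unsigned32(code)
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"{code} = 0x{unsigned:08x} ({name})"


print(describe(-1073741819))
```

So -1073741819 is 0xc0000005, which Windows uses for an access violation. That tells you what kind of crash it was, not why it happened.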
Profile anders n

Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 11937 - Posted: 12 Mar 2006, 13:59:17 UTC - in response to Message 11935.  
Last modified: 12 Mar 2006, 13:59:44 UTC

on my first computer: (amd64 optimized BOINC manager)
2006.03.12. 13:46:01|rosetta@home|Pausing result HOMSti_homDB005_1tif__352_1849_0 (removed from memory)
2006.03.12. 13:46:02|rosetta@home|Unrecoverable error for result HOMSti_homDB005_1tif__352_1849_0 ( - exit code -1073741819 (0xc0000005))

on my second computer i had more faults...
i would like to know: why?


Read the FAQ about keeping work in memory.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669#10374

Anders n
Profile Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 11939 - Posted: 12 Mar 2006, 14:25:18 UTC
Last modified: 12 Mar 2006, 14:28:48 UTC

Exit status 1 (0x1)
https://boinc.bakerlab.org/rosetta/result.php?resultid=13276339
Exit status 128 (0x80)
https://boinc.bakerlab.org/rosetta/result.php?resultid=13322658
Rosetta 4.82 Windows XP for both above
Knorr

Joined: 18 Feb 06
Posts: 21
Credit: 373,953
RAC: 0
Message 11940 - Posted: 12 Mar 2006, 16:17:47 UTC

According to the Results page, I should have received this WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9568873

But it's not showing in my BOINC manager.

I can see the WU had timed out on the previous host. Perhaps an error with the WU?

What should I do?

- Knorr
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 11944 - Posted: 12 Mar 2006, 17:15:54 UTC - in response to Message 11940.  

According to the Results page, I should have received this WU: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=9568873

But it's not showing in my BOINC manager.

I can see the WU had timed out on the previous host. Perhaps an error with the WU?

What should I do?

- Knorr


It could be a bad work unit. "Ghost" work units do occur on all the projects for a variety of reasons. The most common is a failure to download properly. If it is stuck in your queue, abort it manually; if it is only appearing in your stats pages, you need do nothing.

Moderator9
ROSETTA@home FAQ
Moderator Contact
Profile Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 11971 - Posted: 13 Mar 2006, 6:21:12 UTC
Last modified: 13 Mar 2006, 6:32:37 UTC

Exit status -164 (0xffffff5c)
https://boinc.bakerlab.org/rosetta/result.php?resultid=13334669
Rosetta 4.82 Windows XP
kics_ee

Joined: 27 Feb 06
Posts: 2
Credit: 1,531
RAC: 0
Message 11976 - Posted: 13 Mar 2006, 12:16:39 UTC - in response to Message 11937.  


Read the FAQ about keeping work in memory.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669#10374

Anders n



thx... i care about further
Profile Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 11980 - Posted: 13 Mar 2006, 15:08:40 UTC

Exit status 131 (0x83)
https://boinc.bakerlab.org/rosetta/result.php?resultid=13559020
https://boinc.bakerlab.org/rosetta/result.php?resultid=13575348
Rosetta 4.81 Linux for both
Lee Carre

Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 11992 - Posted: 13 Mar 2006, 19:59:54 UTC
Last modified: 13 Mar 2006, 20:02:05 UTC

result FA_RLXac_hom029_1acf__359_123_0 produced: exit code -1073741811 (0xc000000d) after 1 minute and 2 seconds

i have leave apps in memory set to yes, running on winXP with an Intel CPU
Want to search the BOINC Wiki, BOINCstats, or various BOINC forums from within firefox? Try the BOINC related Firefox Search Plugins
mikus

Joined: 7 Nov 05
Posts: 58
Credit: 700,115
RAC: 0
Message 12013 - Posted: 14 Mar 2006, 19:10:08 UTC - in response to Message 11804.  
Last modified: 14 Mar 2006, 19:37:01 UTC

We have found the problem, and are resubmitting the jobs with a fix. There are still a few workunits with the following prefix out there that you can expect to fail very quickly:

HOMSdt_homDB0??_1dtj

this should not happen with the next batch.

Apparently some of these are still circulating. Within jobs downloaded on Mar 8:

https://boinc.bakerlab.org/rosetta/result.php?resultid=12969634
https://boinc.bakerlab.org/rosetta/result.php?resultid=12969645
.
Profile UBT - Timbo

Joined: 25 Sep 05
Posts: 20
Credit: 2,275,059
RAC: 0
Message 12052 - Posted: 15 Mar 2006, 11:33:11 UTC - in response to Message 10953.  
Last modified: 15 Mar 2006, 11:33:53 UTC

Report all Work Unit errors on this thread that are NOT -

    "1% Hang"
    "Max Time Exceeded"
    or other "stuck" or "hung" work units





15/03/2006 00:31:28|rosetta@home|Unrecoverable error for result FA_RLXcc_hom003_1cc8A_359_158_0 ( - exit code -164 (0xffffff5c))
15/03/2006 03:04:02|rosetta@home|Unrecoverable error for result FA_RLXbq_hom005_1bq9A_359_158_0 ( - exit code -1073741819 (0xc0000005))
15/03/2006 11:26:32|rosetta@home|Unrecoverable error for result FA_RLXbk_hom002_1bk2__359_459_0 ( - exit code -164 (0xffffff5c))
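
When a host collects several of these lines, it can help to tally them by exit code before reporting. The following is a rough Python sketch that assumes the message text always looks like the three lines above; the regular expression and the function name `parse_error_line` are written for this example and are not taken from BOINC.

```python
import re
from collections import Counter

# Matches the "Unrecoverable error" message as it appears in the lines above.
# The exact wording is assumed from those samples only.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def parse_error_line(line: str):
    """Return (result_name, exit_code, hex_text), or None if no match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    return m.group("result"), int(m.group("code")), m.group("hex")


log_lines = [
    "15/03/2006 00:31:28|rosetta@home|Unrecoverable error for result FA_RLXcc_hom003_1cc8A_359_158_0 ( - exit code -164 (0xffffff5c))",
    "15/03/2006 03:04:02|rosetta@home|Unrecoverable error for result FA_RLXbq_hom005_1bq9A_359_158_0 ( - exit code -1073741819 (0xc0000005))",
    "15/03/2006 11:26:32|rosetta@home|Unrecoverable error for result FA_RLXbk_hom002_1bk2__359_459_0 ( - exit code -164 (0xffffff5c))",
]

# Keep only the lines that matched, then count how often each code occurs.
parsed = [p for p in map(parse_error_line, log_lines) if p is not None]
counts = Counter(code for _, code, _ in parsed)
for code, n in counts.most_common():
    print(code, n)
```

For the three lines above this gives two results with exit code -164 and one with -1073741819.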



Not too happy about getting these errors - but grateful to the project if they can fix it so that all WU's are good and can return useful results.

regards,

Tim



Profile Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 12053 - Posted: 15 Mar 2006, 13:05:34 UTC

*** glibc detected *** corrupted double-linked list: 0x08e1b4d0 ***
https://boinc.bakerlab.org/rosetta/result.php?resultid=13717091
Rosetta 4.81 Linux
Profile KWSN Sir Clark

Joined: 18 Sep 05
Posts: 46
Credit: 387,432
RAC: 0
Message 12116 - Posted: 17 Mar 2006, 0:19:26 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=13963817

	

<core_client_version>5.2.13</core_client_version>
<message> - exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# random seed: 2575106
# cpu_run_time_pref: 7200
No heartbeat from core client for 31 sec - exiting

***UNHANDLED EXCEPTION****
Reason: Access Violation (0xc0000005) at address 0x7C911E58 read attempt to address 0x3EF89F50

1: 03/16/06 23:50:49



</stderr_txt>
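
The "Reason:" line in that stderr output carries the exception code, the faulting address, and the address being accessed. A small Python sketch for pulling those fields out is shown here. It assumes the line always has the shape seen above; the pattern and the name `parse_exception` are illustrative and do not come from BOINC or Rosetta.

```python
import re

# Pattern written against the single "Reason:" line shown above; other
# exception types may be formatted differently.
AV_RE = re.compile(
    r"Reason: (?P<reason>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<pc>0x[0-9a-fA-F]+) "
    r"(?P<access>\w+) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_exception(stderr_text: str):
    """Extract the exception fields from stderr text, or return None."""
    m = AV_RE.search(stderr_text)
    if m is None:
        return None
    return {
        "reason": m.group("reason"),
        "code": int(m.group("code"), 16),      # exception code
        "pc": int(m.group("pc"), 16),          # address of the faulting instruction
        "access": m.group("access"),           # kind of access, e.g. "read"
        "target": int(m.group("target"), 16),  # address that was accessed
    }


sample = (
    "Reason: Access Violation (0xc0000005) at address 0x7C911E58 "
    "read attempt to address 0x3EF89F50"
)
info = parse_exception(sample)
print(info)
```

Including these fields in a report makes it easier to see whether different failed results crashed at the same address.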


Running :-
Rosetta 4.82,
BOINC 5.2.13
Win XP,
1.5MB RAM,
set to remain in memory,
CPU time setting: 2 hrs
all other projects suspended
(I was trying to run Rosetta work down to make more time for other projects - I'm not quitting, just taking a holiday :D )

Judging by the CPU time taken it quit just as it was finishing or thereabouts.



©2024 University of Washington
https://www.bakerlab.org