Miscellaneous Work Unit Errors

Message boards : Number crunching : Miscellaneous Work Unit Errors

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 12247 - Posted: 19 Mar 2006, 7:51:46 UTC
Last modified: 19 Mar 2006, 7:52:42 UTC

Rosetta WU crash due to Ralph testing.

Reported here: http://ralph.bakerlab.org/forum_thread.php?id=4#914

And before anybody wants to post that Rosetta requires to stay in memory while preempted, let me show you David Kim's answer to that question:

http://ralph.bakerlab.org/forum_thread.php?id=4#407


[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 12272 - Posted: 19 Mar 2006, 16:51:00 UTC - in response to Message 12247.  
Last modified: 19 Mar 2006, 16:55:32 UTC

Happened again!

Rosetta WU crash due to Ralph testing.

Reported here: http://ralph.bakerlab.org/forum_thread.php?id=4#918

Moderator 8 or 9 or whatever, will you please create a thread, where we, who are testing WU's over at Ralph can report our crashed Rosetta WU's due to Ralph testing.

And I'm changing back to keep WU's in memory while preempted, until the devs have fixed this bug.

And I really hope they'll grant me the claimed credit for the crashed Rosetta WU's.




[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 12275 - Posted: 19 Mar 2006, 17:06:18 UTC - in response to Message 12274.  

For those who may be interested, Rom has posted information about Rosetta Work Unit errors and the status of the ongoing work to fix the bugs in Rosetta/Ralph on his "Blog".


Yes, but he doesn't mention the "stay in memory or crash" bug.


[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 17 Sep 05
Posts: 18
Credit: 40,071
RAC: 0
Message 12277 - Posted: 19 Mar 2006, 17:10:22 UTC

Ummmmm, that is the bug, or rather the manifestation of the bug that you are seeing.

The bug I fixed is the stay in memory or crash bug.

----- Rom
----- Rom
My Blog

Rom Walton (BOINC)
Volunteer moderator
Project developer
Joined: 17 Sep 05
Posts: 18
Credit: 40,071
RAC: 0
Message 12304 - Posted: 20 Mar 2006, 0:16:27 UTC - in response to Message 12279.  

Ummmmm, that is the bug, or rather the manifestation of the bug that you are seeing.

The bug I fixed is the stay in memory or crash bug.

----- Rom


Rom,

The problem is many people running Ralph are also running Rosetta. The latest Ralph app has not yet been deployed in Rosetta. So when people remove the application to run Ralph, this impacts their Rosetta work adversely.

What they really need is guidance on how to run Ralph under these conditions.


I wish I had a good answer for ya, all I can say is this issue will become a thing of the past over the next day or two.


----- Rom
My Blog

Mike
Joined: 21 Dec 05
Posts: 9
Credit: 35,252
RAC: 0
Message 12323 - Posted: 20 Mar 2006, 10:43:59 UTC

Hi. My first post. I seem to have got past the 1% bug (turned off all screen savers). I have now noticed a 'Client error' problem when checking my results. Almost 50% have this error. Any ideas?

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 12326 - Posted: 20 Mar 2006, 10:56:16 UTC - in response to Message 12323.  

Hi. My first post. I seem to have got past the 1% bug (turned off all screen savers). I have now noticed a 'Client error' problem when checking my results. Almost 50% have this error. Any ideas?


Read the FAQ about keeping work in memory.

https://boinc.bakerlab.org/rosetta/forum_thread.php?id=669#10374

Anders n

Mike
Joined: 21 Dec 05
Posts: 9
Credit: 35,252
RAC: 0
Message 12327 - Posted: 20 Mar 2006, 11:11:46 UTC
Last modified: 20 Mar 2006, 11:13:20 UTC

Hi All. My first post. I need to know why 50% of my results have a client error ??

STE\/E
Joined: 17 Sep 05
Posts: 125
Credit: 4,103,208
RAC: 0
Message 12359 - Posted: 20 Mar 2006, 23:14:48 UTC
Last modified: 21 Mar 2006, 0:04:29 UTC

I have a weird one going right now on 1 Computer. I have my Preferences set to run the WU's in 2 Hours & most of the WU's do run in that amount of Time give or take 10-15 Minutes.

But this one has been running for 4:22:30 now & only showing 38.71% done, at least it is showing some progress so I guess I'll just have to see what happens with it ... :)

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 12364 - Posted: 21 Mar 2006, 0:32:25 UTC - in response to Message 12359.  
Last modified: 21 Mar 2006, 0:33:56 UTC

I have a weird one going right now on 1 Computer. I have my Preferences set to run the WU's in 2 Hours & most of the WU's do run in that amount of Time give or take 10-15 Minutes.

But this one has been running for 4:22:30 now & only showing 38.71% done, at least it is showing some progress so I guess I'll just have to see what happens with it ... :)


Poorboy, I had one running for hours and with a very slow almost invisible progress, and when I opened the graphic to see if it was dead, I saw a huge protein! And I could see it moved rapidly and the steps incremented very slowly, so I guess it had to run through a lot of foldings, as it was so big. I should have taken a screendump of it, but I didn't. But it was the biggest, I've ever seen.

So maybe it's one of those giants, you have got?


[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]

Fuzzy Hollynoodles
Joined: 7 Oct 05
Posts: 234
Credit: 15,020
RAC: 0
Message 12365 - Posted: 21 Mar 2006, 0:38:34 UTC - in response to Message 12304.  

Ummmmm, that is the bug, or rather the manifestation of the bug that you are seeing.

The bug I fixed is the stay in memory or crash bug.

----- Rom


Rom,

The problem is many people running Ralph are also running Rosetta. The latest Ralph app has not yet been deployed in Rosetta. So when people remove the application to run Ralph, this impacts their Rosetta work adversely.

What they really need is guidance on how to run Ralph under these conditions.


I wish I had a good answer for ya, all I can say is this issue will become a thing of the past over the next day or two.




Thanks Rom. I'm sure you'll let us know as soon you have found out something. :-)

Anyway, I've changed my settings back to stay in memory while preempted, and I won't change it before you give us a signal that it's ok to do.


[b]"I'm trying to maintain a shred of dignity in this world." - Me[/b]

nairb
Joined: 8 Dec 05
Posts: 17
Credit: 990,147
RAC: 0
Message 12477 - Posted: 22 Mar 2006, 1:26:12 UTC

been getting more wu failing at the end of processing. A reboot did not solve it. The machine is only crunching rosy.

3/22/06 12:54:34 AM|rosetta@home|Unrecoverable error for result FA_RLXig_hom019_1ig5A_360_492_0 ( - exit code -1073741819 (0xc0000005))

2006-03-20 02:37:31 [rosetta@home] Unrecoverable error for result HB_BARCODE_30_1pgx__351_7954_0 ( - exit code -1073741819 (0xc0000005))

2006-03-19 06:05:28 [rosetta@home] Unrecoverable error for result FA_RLXig_hom010_1ig5A_360_230_0 ( - exit code -1073741819 (0xc0000005))

2006-03-20 20:22:06 [rosetta@home] Unrecoverable error for result HB_BARCODE_30_1iibA_351_11198_0 ( - exit code -1073741819 (0xc0000005))

etc.....

Might check the machine with other projects. It has run seti before without any problems.
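
For anyone comparing the numbers in these log lines: the negative exit code and the hex value in parentheses are the same 32-bit pattern, printed once as a signed integer and once as unsigned hex. The following is a minimal sketch of that conversion. The function names and the small lookup table are illustrative assumptions, not part of BOINC or Rosetta; the two names in the table are the standard Windows NTSTATUS names for those values. It only decodes the number and says nothing about what caused the failures reported here.

```python
# Windows exit codes are 32-bit values; the log prints the same bits
# first as a signed integer and then as unsigned hex.

# Hypothetical lookup table holding only the two NTSTATUS values that
# appear in this thread.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code the way the log does, plus a name if known."""
    value = to_unsigned32(exit_code)
    name = KNOWN_NTSTATUS.get(value, "unknown")
    return f"{exit_code} = 0x{value:08x} ({name})"


if __name__ == "__main__":
    for code in (-1073741819, -1073741811, -164):
        print(describe(code))
```

The third value, -164 (0xffffff5c), is not in the table and is reported as unknown by this sketch.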

Kevin
Joined: 15 Jan 06
Posts: 21
Credit: 109,496
RAC: 0
Message 12541 - Posted: 23 Mar 2006, 1:39:03 UTC

I had a unit fail with this:
<core_client_version>5.3.12.tx36</core_client_version>
<message>The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>

Any clue as to what caused this error? I think it may have occurred when I restarted my computer and Rosetta failed to quit, so Windows just ended the process. Rosetta fails to quit occasionally on both XP and OS X and I end up with an error similar to this.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=11697219

darioml
Joined: 9 Feb 06
Posts: 1
Credit: 220,203
RAC: 0
Message 12562 - Posted: 23 Mar 2006, 10:01:09 UTC

Hello.

I'm using BOINC 5.2.13 in Windows XP SP2 running Rosetta@home 4.82 and SETI@home 4.18 projects.

I changed my settings back to stay in memory while preempted, as well as the work unit time fixed to 2 hours, but still most of the Rosetta WUs give errors :(

For example these ones this morning:

3/23/2006 9:59:05 AM|rosetta@home|Unrecoverable error for result FA_RLXsc_hom010_1scjB_361_277_0 ( - exit code -1073741811 (0xc000000d))
3/23/2006 9:59:05 AM|rosetta@home|Unrecoverable error for result FA_RLXop_hom028_1opd__361_296_0 ( - exit code -1073741811 (0xc000000d))

I don't have ANY problems with SETI, except that sometimes the scheduler doesn't respond, but nothing related with the calculation.

When will this bug be fixed? When it crashes, sometimes the whole BOINC crashes and XP shows the window to report the problem to Microsoft...

Thanks,

Dar

Mike
Joined: 21 Dec 05
Posts: 9
Credit: 35,252
RAC: 0
Message 12564 - Posted: 23 Mar 2006, 11:10:59 UTC

Hi. Try turning off all screen savers. I did this 6 days ago with no problems since.

Bob Guy
Joined: 7 Oct 05
Posts: 39
Credit: 24,895
RAC: 0
Message 12601 - Posted: 24 Mar 2006, 4:23:16 UTC

Had this error:

3/23/2006 6:36:47 PM|rosetta@home|Unrecoverable error for result FA_RLXvi_hom027_2vik__362_83_0 ( - exit code -1073741819 (0xc0000005))


Leave in memory = yes

This WU was never interrupted - it ran from start to failure without being paused. I was actually eating dinner when this occurred so the computer was otherwise idle.

I NEVER use the screensaver or viewed the graphics for this WU.

Runs with SETI, SETI Beta, Einstein, Predictor and QAH - no problems with any of the other projects.

Other R@H WUs complete normally.

System is not overclocked and the temps are mid-range - I don't think the CPU is working very hard.

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 12632 - Posted: 24 Mar 2006, 18:16:16 UTC
Last modified: 24 Mar 2006, 18:21:39 UTC


https://boinc.bakerlab.org/rosetta/result.php?resultid=14511478

15 hours in a slot on an unused laptop overnight - 1.3 hrs cpu time accumulated, ~25% progress.

But, this may be progress of a sort as it didn't hang at 1%!
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

Los Alcoholicos~Megaflix
Joined: 10 Nov 05
Posts: 24
Credit: 77,199
RAC: 0
Message 12675 - Posted: 25 Mar 2006, 11:58:03 UTC
Last modified: 25 Mar 2006, 12:00:27 UTC

exit code -1073741819

I seem to have one computer which generates a lot of workunits with above error. Always the same error, but at different moments in the workunit. Maybe the project researchers could look into it. The computer's name in the computer list is Megaflix. It's got at least a 25% failure rate, but it's still climbing. I've installed a fresh Windows, with no software at all. Just for testing I installed only Boinc with Rosetta and it keeps producing the errors.

Leave applications in memory has been set to yes since the beginning, so that's not an issue. I had set the work units to run for 2 hours, but since it produced errors almost from the start I set it lower, to 1 hour. It decreased the number of errors just a little bit, but not much.

Might be handy in trying to find a cure for this error.

I'm willing to use the computer as a testing guinea pig if you want to...
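
For anyone who wants to put a number on "always the same error", here is a rough sketch that tallies exit codes from BOINC message-log lines like the ones pasted in this thread. The regular expression is an assumption based only on the two line formats quoted here, and `tally_exit_codes` is a made-up helper name, not a BOINC tool. The sample lines are copied from earlier posts in this thread.

```python
import re
from collections import Counter

# Matches the tail shared by both log formats quoted in this thread:
# "... Unrecoverable error for result <name> ( - exit code <n> (0x<hex>))"
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+) \(0x[0-9a-fA-F]+\)\)"
)


def tally_exit_codes(lines):
    """Count how often each signed exit code appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


# Sample lines copied from posts earlier in this thread.
SAMPLE = [
    "3/22/06 12:54:34 AM|rosetta@home|Unrecoverable error for result "
    "FA_RLXig_hom019_1ig5A_360_492_0 ( - exit code -1073741819 (0xc0000005))",
    "2006-03-20 02:37:31 [rosetta@home] Unrecoverable error for result "
    "HB_BARCODE_30_1pgx__351_7954_0 ( - exit code -1073741819 (0xc0000005))",
    "3/23/2006 9:59:05 AM|rosetta@home|Unrecoverable error for result "
    "FA_RLXsc_hom010_1scjB_361_277_0 ( - exit code -1073741811 (0xc000000d))",
    "3/23/2006 6:36:47 PM|rosetta@home|Unrecoverable error for result "
    "FA_RLXvi_hom027_2vik__362_83_0 ( - exit code -1073741819 (0xc0000005))",
]

if __name__ == "__main__":
    for code, count in tally_exit_codes(SAMPLE).most_common():
        print(f"exit code {code}: {count} result(s)")
```

A failure rate would additionally need the total number of results the host returned, which these log lines do not contain; that figure would have to come from the results page for the computer.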

Los Alcoholicos~Megaflix
Joined: 10 Nov 05
Posts: 24
Credit: 77,199
RAC: 0
Message 12703 - Posted: 25 Mar 2006, 23:17:03 UTC - in response to Message 12685.  


Can you attach this computer to RALPH? It might help to see those errors over there.


Ok. I'll finish the outstanding workunits of Rosetta and then I'll connect it to Ralph.

Carlos_Pfitzner
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 12717 - Posted: 26 Mar 2006, 21:58:20 UTC

Exit status -164 (0xffffff5c)
https://boinc.bakerlab.org/rosetta/result.php?resultid=15006663
Rosetta 4.82 Windows
Click signature for global team stats
©2025 University of Washington
https://www.bakerlab.org