Report Problems with Rosetta Version 5.25

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 26156 - Posted: 6 Sep 2006, 7:40:19 UTC - in response to Message 26155.  

Is the rosetta process still present if you exit BOINC (not just stop it)?
If there is no BOINC.exe process present but there is a rosetta process, then something is wrong with BOINC, I think, since it should kill all child processes when it exits.

Sure. I'd better tell the BOINC devs too. I was already thinking about doing this.
But BOINC only behaves differently once the problem has arisen; it still begins with Rosetta getting stuck.

Peter
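
The check suggested in the quoted question (is a rosetta process still alive after the BOINC client has exited?) can be sketched in Python. This is only an illustration: it assumes you already have a snapshot of running process names (for example copied from Task Manager or `ps`), and the function name `find_orphaned_rosetta` and the name matching rules are hypothetical, not part of BOINC or Rosetta.

```python
# Assumed client process names; adjust for your platform.
BOINC_CLIENT_NAMES = {"boinc", "boinc.exe"}


def find_orphaned_rosetta(process_names):
    """Return rosetta processes that are running while no BOINC client is.

    process_names: iterable of process names from a process-table snapshot.
    Matching on a "rosetta" prefix is an assumption made for illustration.
    """
    names = [name.lower() for name in process_names]
    if any(name in BOINC_CLIENT_NAMES for name in names):
        # The client is still running, so its children are not orphans.
        return []
    return [name for name in names if name.startswith("rosetta")]
```

An empty result means either that the client is still running or that no rosetta process was left behind.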

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 26578 - Posted: 11 Sep 2006, 6:22:54 UTC


eNDo
Joined: 9 Apr 06
Posts: 9
Credit: 372,288
RAC: 0
Message 26664 - Posted: 12 Sep 2006, 19:27:35 UTC

lol... i see all my posts have been removed.

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 26666 - Posted: 12 Sep 2006, 19:47:02 UTC - in response to Message 26664.  

lol... i see all my posts have been removed.

I don't think so. It's just that this thread is getting so long that you have to click for more. Look for a link after the original post from Rhiju that says "Click here to also display the remaining posts."
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Christoph
Joined: 10 Dec 05
Posts: 57
Credit: 1,512,386
RAC: 0
Message 26710 - Posted: 13 Sep 2006, 17:06:37 UTC


anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 26762 - Posted: 14 Sep 2006, 15:57:00 UTC

ERROR:: Exit at: .dock_structure.cc line:401

https://boinc.bakerlab.org/rosetta/result.php?resultid=37385834

Anders n

Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 26764 - Posted: 14 Sep 2006, 16:23:26 UTC - in response to Message 26762.  

ERROR:: Exit at: .dock_structure.cc line:401

https://boinc.bakerlab.org/rosetta/result.php?resultid=37385834

Anders n


Same with:
https://boinc.bakerlab.org/rosetta/result.php?resultid=37283001

I only noticed this as it was the first unit (prematurely) returned by my "new" machine...

--
Mats

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 26988 - Posted: 16 Sep 2006, 20:42:04 UTC
Last modified: 16 Sep 2006, 20:43:40 UTC

9/16/2006 2:36:48 PM|rosetta@home|Unrecoverable error for result 1fvk_1_CASPR_1_1fvk_1_yyidrenum_16IGNORE_THE_REST_0001_1224_4988_0 (aborted by user)

It occupied a slot for a long time with the reported CPU time stuck at around 3:18:21, while still merrily sucking electricity.

Result id
Work Unit
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 28901 - Posted: 4 Oct 2006, 14:25:41 UTC

I have had two stuck WU today, having not seen this issue for months.

Both are on Rosetta v5.25

This one stuck at 78.94%, and its elapsed time stopped increasing. It was running on a single-CPU Linux box, BOINC v5.27.

That one stuck at 100% and was still shown as running. This is on a 2-CPU Linux box, shared with 2 CPDN WUs, BOINC v5.28.

Both tasks were shown as taking no CPU in top. Both boxes were rebooted. Sorry, I did not think to save the files before rebooting.

Both tasks then recovered normally: the 100% one went on to report immediately, and the 78.94% one carried on from its last checkpoint at 78.90%. I am pretty sure I wrote down the figures correctly, so it backed off just 0.04%, which would mean it got stuck right at the end of a very short decoy. I will let you know if it runs to completion OK or gets stuck again on the way.

Am planning now to update both boxes to latest BOINC Linux client in case that is the issue, though seems unlucky if it is when the boxes are running different BOINC versions.

This is not a complaint, just thought you might like to know.

River~~
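
The two observations in this report, a task whose CPU time stops increasing and the small back-off after the restart, can be written down as a short Python sketch. The function names and the three-sample threshold are assumptions made for illustration; they are not part of BOINC or Rosetta.

```python
def is_stalled(cpu_seconds_samples, min_samples=3):
    """True if the last `min_samples` CPU-time readings show no increase.

    cpu_seconds_samples: CPU-time readings (in seconds) taken at intervals,
    oldest first. Fewer than `min_samples` readings is treated as "unknown",
    which this sketch reports as not stalled.
    """
    recent = cpu_seconds_samples[-min_samples:]
    return len(recent) == min_samples and max(recent) == min(recent)


def checkpoint_backoff(progress_at_stall, progress_after_restart):
    """Percentage points of progress lost by resuming from the last checkpoint."""
    return round(progress_at_stall - progress_after_restart, 2)
```

With the figures reported above, `checkpoint_backoff(78.94, 78.90)` gives 0.04 percentage points.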

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 28902 - Posted: 4 Oct 2006, 14:51:07 UTC - in response to Message 28901.  
Last modified: 4 Oct 2006, 14:53:00 UTC

I have had two stuck WU today, having not seen this issue for months.
Both are on Rosetta v5.25
[...]
Am planning now to update both boxes to latest BOINC Linux client in case that is the issue, though seems unlucky if it is when the boxes are running different BOINC versions.

This does not seem to be the issue. The tasks get stuck somewhere just after checkpointing, hence the insignificant back-off of just 0.04%.

If anything, only "Leave apps in memory" could help you. But the "stuck at 100%" case will happen anyway: the problem there is that the task checkpoints at 100% instead of finishing (which is why it can report immediately once it gets its turn again).

(I have been testing this extensively with a debug version of Rosetta 5.25 on BOINC 5.3.31, and am now running it on BOINC 5.5.15. It behaves the same with the "Leave apps in memory" option set: no stuck problems except the 100% one.)

Peter
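
Peter's diagnosis is that the task writes a checkpoint at 100% instead of finishing. A minimal Python model of the ordering one would expect instead is sketched below. It is purely illustrative: the names `run_task`, `total_decoys`, and `start_decoy` are hypothetical, and the code does not reflect Rosetta's actual source.

```python
def run_task(total_decoys, start_decoy=0):
    """Simulate a task loop and return the list of actions taken.

    Hypothetical model: one checkpoint is written after each completed decoy,
    except that completing the final decoy finishes the task directly rather
    than writing a checkpoint at 100%.
    """
    actions = []
    for decoy in range(start_decoy, total_decoys):
        actions.append(f"decoy {decoy + 1}")
        if decoy + 1 == total_decoys:
            actions.append("finish")      # result can be reported right away
        else:
            actions.append("checkpoint")  # safe point to resume from later
    return actions
```

In this model a task never ends its run on a checkpoint, so it cannot sit at 100% waiting for another turn just to finish.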

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 28904 - Posted: 4 Oct 2006, 16:41:18 UTC - in response to Message 28902.  

I have had two stuck WU today, having not seen this issue for months.
Both are on Rosetta v5.25


The one that stuck before the end ran through the place where it stuck before. Unless I post again here, please assume it went on to finish OK - you'll be able to see it once it uploads anyway.


[...]
Am planning now to update both boxes to latest BOINC Linux client in case that is the issue

This does not seem to be the issue. The tasks get stuck somewhere just after checkpointing, hence the insignificant back-off of just 0.04%.

If anything, only "Leave apps in memory" could help you. But the "stuck at 100%" case will happen anyway: the problem there is that the task checkpoints at 100% instead of finishing (which is why it can report immediately once it gets its turn again).

(I have been testing this extensively with a debug version of Rosetta 5.25 on BOINC 5.3.31, and am now running it on BOINC 5.5.15. It behaves the same with the "Leave apps in memory" option set: no stuck problems except the 100% one.)

Peter


That differs from my experience today, Peter. All my boxes have "Leave in memory" set, including the one that stuck before the end.

Anyway it is good to know that it is already being investigated, and thanks for your response.

If it happens again, do you want further reports or not? Is it useful to know the tasks that fail like this? Are there any files, etc, it would be useful to keep next time?

R~~

dcdc
Joined: 3 Nov 05
Posts: 1829
Credit: 115,531,201
RAC: 57,814
Message 28908 - Posted: 4 Oct 2006, 17:21:06 UTC

I've had quite a few compute errors come through on one of my hosts (haven't checked any of the other hosts yet):

https://boinc.bakerlab.org/rosetta/results.php?hostid=279343

The same goes for Schoasch's computer (Schoasch has just posted this thread):

https://boinc.bakerlab.org/rosetta/results.php?hostid=312686

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 28921 - Posted: 4 Oct 2006, 19:16:32 UTC - in response to Message 28902.  

(I have been testing this extensively with a debug version of Rosetta 5.25 on BOINC 5.3.31, and am now running it on BOINC 5.5.15. It behaves the same with the "Leave apps in memory" option set: no stuck problems except the 100% one.)

Peter


They're on v5.6.5 now, though development on 5.6 has stopped and they are jumping to 5.7 as of today (to prepare for 5.8 WCG/fancifiedGUI)
Team mauisun.org

Pepo
Joined: 28 Sep 05
Posts: 115
Credit: 101,358
RAC: 0
Message 28927 - Posted: 4 Oct 2006, 21:19:35 UTC - in response to Message 28904.  

(I have been testing this extensively with a debug version of Rosetta 5.25 on BOINC 5.3.31, and am now running it on BOINC 5.5.15. It behaves the same with the "Leave apps in memory" option set: no stuck problems except the 100% one.)

That differs from my experience today, Peter. All my boxes have "Leave in memory" set, including the one that stuck before the end.

This might also be because I used (and still use) a debug version (a huge 75 MB :-) from D. Kim; it may actually behave slightly differently from the released one. Previously (with the official one) I observed the same problems as you.

If it happens again, do you want further reports or not? Is it useful to know the tasks that fail like this?

I hope you did not ask me? ;-)
(I was only yet another small whiner, who helped a few devs with some thoughts and tests.)

Peter

Conan
Joined: 11 Oct 05
Posts: 150
Credit: 3,818,279
RAC: 728
Message 29234 - Posted: 12 Oct 2006, 12:18:16 UTC

Over the last few days I have seen this error:
Incorrect Function exit code 1
Exit at: .initialize.cc line:236

It has happened on the following workunits:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029257
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029163
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029157
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029182
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029211
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029218
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029219
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029255
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=36029257

This may help the debugging of 5.32 even though all units are 5.25.
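
When collecting reports like the ones in this thread, it can help to pull out the source location from each "Exit at:" line and to de-duplicate the workunit links (the list above contains wuid=36029257 twice). The Python sketch below shows one way to do that with the standard library. The function names are hypothetical, and the pattern only assumes the "Exit at: <file> line:<n>" shape seen in the posts above.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches lines such as "Exit at: .initialize.cc line:236".
EXIT_RE = re.compile(r"Exit at: (?P<file>\S+) line:(?P<line>\d+)")


def parse_exit_line(text):
    """Return (file, line_number) from an "Exit at:" message, or None."""
    match = EXIT_RE.search(text)
    if match is None:
        return None
    # The file name is kept exactly as reported, including any leading dot.
    return match.group("file"), int(match.group("line"))


def unique_wuids(urls):
    """Return the wuid query values from workunit URLs, first occurrence only."""
    seen = []
    for url in urls:
        for wuid in parse_qs(urlparse(url).query).get("wuid", []):
            if wuid not in seen:
                seen.append(wuid)
    return seen
```

Grouping reports by the parsed (file, line) pair makes it easy to see which exit points recur across workunits.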


©2024 University of Washington
https://www.bakerlab.org