Minirosetta v1.40 bug thread

Message boards : Number crunching : Minirosetta v1.40 bug thread

Sarel
Joined: 11 May 06
Posts: 51
Credit: 81,712
RAC: 0
Message 56741 - Posted: 6 Nov 2008, 23:00:25 UTC

Please report any bugs in this version here.

Sarel.
ID: 56741
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 56743 - Posted: 6 Nov 2008, 23:13:16 UTC

The link on the homepage to the bugs thread leads you to the v1.39 thread.
Rosetta Moderator: Mod.Sense
ID: 56743
Chu

Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 56745 - Posted: 6 Nov 2008, 23:43:09 UTC

We have also located the graphics problem that occurs when a non-protein ligand is displayed, and implemented a fix for it. Please let us know if you still observe such problems.
ID: 56745
Sarel
Joined: 11 May 06
Posts: 51
Credit: 81,712
RAC: 0
Message 56749 - Posted: 7 Nov 2008, 1:05:07 UTC

Thanks! Fixed... Sarel
ID: 56749
Naesbye

Joined: 30 Jul 08
Posts: 5
Credit: 201,436
RAC: 0
Message 56760 - Posted: 7 Nov 2008, 14:28:32 UTC

My first 1.40 unit ended with a computation error.
ID: 56760
Odd Braathun

Joined: 2 Sep 08
Posts: 9
Credit: 16,125
RAC: 0
Message 56772 - Posted: 8 Nov 2008, 15:23:11 UTC

Problem with this task:

Task ID 206078107
Name 1vcc__BOINC_ABRELAX_SPLIT_CONTROL_IGNORE_THE_REST-S25-9-S3-3--1vcc_-_4677_199_0
Workunit 188017112

It exited numerous times, but with no "finished" file. BOINC said to reset the project.

Odd
ID: 56772
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 56779 - Posted: 9 Nov 2008, 6:48:34 UTC

I have this task running now and it is very slow to progress. I watched it, and it is only making .001% every 20 sec. It has been running for 8 hrs 20 min and is at 98.050%; my target run time is 6 hrs. I haven't had this big a margin to finish before.

Could it be the new mini app 1.40 or the task?

1hzh_2fiw_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_76

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=188078859

I'll let it run to end.

pete.
ID: 56779
Odd Braathun

Joined: 2 Sep 08
Posts: 9
Credit: 16,125
RAC: 0
Message 56780 - Posted: 9 Nov 2008, 9:41:54 UTC

I have had one of these, too, but have now aborted it.

Task ID 206030023
Name 1hzh_1juv_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_27_0
Workunit 187974469

I also had:
Task ID 206101035
Name IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_2osa_4683_197_0
Workunit 188036989

This task ran smoothly for 2 hours, but ended up with a validate error.

Odd
ID: 56780
Aegis Maelstrom

Joined: 29 Oct 08
Posts: 61
Credit: 2,137,555
RAC: 0
Message 56781 - Posted: 9 Nov 2008, 11:48:34 UTC

Hi,
I'm having a similar problem to the one above.

Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1

It has restarted twice so far and is now processing:

2008-11-09 07:36:00|rosetta@home|Starting IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1
2008-11-09 07:36:35|rosetta@home|Starting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140
2008-11-09 09:36:44|rosetta@home|Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 exited with zero status but no 'finished' file
2008-11-09 09:36:45|rosetta@home|If this happens repeatedly you may need to reset the project.
2008-11-09 09:38:42|rosetta@home|Restarting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140
2008-11-09 12:16:02|rosetta@home|Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 exited with zero status but no 'finished' file
2008-11-09 12:16:03|rosetta@home|If this happens repeatedly you may need to reset the project.
2008-11-09 12:16:48|rosetta@home|Restarting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140

Just before the second restart, its progress was near 60% and it was past step 11000 of "model 1", I guess unfolding/testing a beautifully folded protein (the step around 10000 had a lower "low energy" than step 11000). I've made a snapshot.

When it restarted, it began again from something like 18%, with a protein that was not yet well folded. The elapsed time was reduced as well.

My first request would be to add more checkpoints; that would help both with processing and with bug testing. Now I am waiting to see whether this workunit can finish at all.
ID: 56781
Aegis Maelstrom

Joined: 29 Oct 08
Posts: 61
Credit: 2,137,555
RAC: 0
Message 56782 - Posted: 9 Nov 2008, 13:39:32 UTC - in response to Message 56781.  


The workunit restarted a third time, seemingly at the same point as the previous time (the "completed" percentage was higher, but when I checked a couple of minutes earlier it was once again at step 10000, so this time it was probably at 11000).

The WU started for the fourth time, now at 24%, but I guess it was at the same point as before. When I restarted the WU after temporarily halting it once again, it went back to 17%. Now I can see 18.23% and step 523.

Now I am halting this task and my business with Rosetta.

When BOINC tried to download a different task, I got the following log:
2008-11-09 14:29:23|rosetta@home|Message from server: No work sent
2008-11-09 14:29:23|rosetta@home|Message from server: Your preferences limit memory usage to 452 MB, and 488 MB is needed

The problem seems to be higher memory usage, although one of the mods recently assured us that memory requirements have not increased.
I could increase the amount of memory dedicated to BOINC, but I would like to have this problem explained and ironed out.

Frankly speaking, as this is yet another computational problem within a few days, any explanation from the Rosetta developers/maintainers would be highly appreciated. Thanks for your co-operation and good luck.
ID: 56782
Path7

Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 56783 - Posted: 9 Nov 2008, 14:29:56 UTC

Hello all,
Just saw an error from this WU:
loopbuild_boinc4_hombench_loopbuild_t308__IGNORE_THE_REST_1UKVY_1_4693_12_0

<core_client_version>6.2.25</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
Too many restarts with no progress. Keep application in memory while preempted.
======================================================
DONE :: 1 starting structures 24.3206 cpu seconds
This process generated 0 decoys from 0 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>loopbuild_boinc4_hombench_loopbuild_t308__IGNORE_THE_REST_1UKVY_1_4693_12_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>

Well, it looks like two errors: "Too many restarts" and a file_xfer error.

Be aware: I'm running WCG's (beta-)BOINC 6.2.25, which seems to be pretty stable (so far).

Have a nice day,
Path7.
ID: 56783
Path7

Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 56786 - Posted: 9 Nov 2008, 16:27:54 UTC

And another error:
oopbuild_boinc4_hombench_loopbuild_t326__IGNORE_THE_REST_1I1QB_3_4700_8_0
failed with:

ERROR: NANs occured in hbonding!
ERROR:: Exit from: ..\..\src\core\scoring\hbonds\hbonds_geom.cc line: 763
called boinc_finish

Have a nice day,
Path7.
ID: 56786
neil.hunter14

Joined: 9 May 06
Posts: 10
Credit: 278,867
RAC: 0
Message 56792 - Posted: 9 Nov 2008, 19:21:12 UTC - in response to Message 56779.  

I grabbed a few WUs on both an XP and a Linux machine.
Both have the same problem for me: the WUs get to around 98% complete, then seem to just hang there. They never complete, and I have aborted all 1.40 WUs on both PCs for now.

Neil, UK.

ID: 56792
neil.hunter14

Joined: 9 May 06
Posts: 10
Credit: 278,867
RAC: 0
Message 56793 - Posted: 9 Nov 2008, 19:30:28 UTC - in response to Message 56792.  



...they all seem to end up stuck at 9m 53s remaining to the end of the WU.
ID: 56793
P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 56794 - Posted: 9 Nov 2008, 21:25:25 UTC - in response to Message 56779.  
Last modified: 9 Nov 2008, 21:59:01 UTC

Well, it finally finished after 11 hrs. Not very happy; something needs to be fixed.

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 39696.6 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>

Server state: Over | Outcome: Success | Client state: Done | CPU time: 39,697.10 s | Claimed credit: 278.57 | Granted credit: 16.10

BTW, the credit is a bad joke.

pete.
ID: 56794
Allan Hojgaard

Joined: 4 May 08
Posts: 9
Credit: 591,749
RAC: 0
Message 56798 - Posted: 10 Nov 2008, 0:12:58 UTC
Last modified: 10 Nov 2008, 0:22:29 UTC

Adding my share of long working WUs:

1hzh_2pww_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_86

Result:

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# cpu_run_time_pref: 21600
======================================================
DONE :: 1 starting structures 39652.9 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>

As many have said before, I do not mind crunching large WUs, but I would like to be credited for them and warned about them beforehand. Currently one of my cores is working on
1hzh_1a58_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_87_0; it has been working on it for 14 hours and 24 minutes and has reached 98.840%. I am sure that I will get very low credit for it, like the others in this thread.

This is what the graphics show me:
http://www.home.no/kalumba/rosetta.png


Until the mess has been sorted out or properly explained, I'm crunching for another project. I'll keep visiting the forum frequently, as Rosetta@Home is my favourite project.
ID: 56798
P . P . L .

Send message
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 56799 - Posted: 10 Nov 2008, 1:47:26 UTC

Looks like I have another runaway task: it's at 6 hrs 45 min and 97.655%, and as slow as wet cement, about .001% every 10 sec. Better than the last one, but not by much.

I bet I won't get much for it if and when it finishes.

1hzh_2fe5_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_76

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=188078846

pete.

ID: 56799
AMD_is_logical

Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 56800 - Posted: 10 Nov 2008, 6:16:10 UTC - in response to Message 56786.  

I got the same error on one of my Linux nodes:
h005__BOINC_ABRELAX_RANGE_yebf_IGNORE_THE_REST-S25-7-S3-8--h005_-_4675_19_0
ID: 56800
robertmiles
Joined: 16 Jun 08
Posts: 1229
Credit: 14,172,067
RAC: 1,095
Message 56802 - Posted: 10 Nov 2008, 13:06:47 UTC

I've got another one of those workunits that are running longer than expected:

11/9/2008 5:57:49 PM|rosetta@home|Starting 1hzh_1o9g_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_155_1
11/9/2008 5:57:54 PM|rosetta@home|Starting task 1hzh_1o9g_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_155_1 using minirosetta version 140

Last night, it had accumulated about 6 CPU hours and claimed that it would finish in another 10 CPU minutes. This morning, it has accumulated over 12 CPU hours and claims that it will finish in another 9 CPU minutes and 56 seconds.

Also, it's currently the most memory-hungry process on my machine. The Windows Task Manager recently said it was using over 256,000K of memory (over 10 times as much as the next process), but then that dropped to a little over 200,000K and is now 223,132K.

Since it hasn't let any other process take a turn in its CPU core for much longer than the 2 hours I've tried to set it for, I'll suspend it for a while and see if that helps.

The other person with a similar workunit had a compute error after about 6 CPU hours.
ID: 56802
caesar1987
Joined: 28 Nov 06
Posts: 13
Credit: 22,268
RAC: 0
Message 56804 - Posted: 10 Nov 2008, 13:49:00 UTC - in response to Message 56802.  


Same here.
It says that it will finish in 9 minutes and 51 sec. Mine has accumulated only 5 hours 5 min, but for the last hour that estimate has stayed the same.

"mini"rosetta memory usage: approx. 290,000 K, VM size: 320,000 K!!!
What's so mini about this?

ID: 56804



©2024 University of Washington
https://www.bakerlab.org