Minirosetta v1.47 bug thread.

Message boards : Number crunching : Minirosetta v1.47 bug thread.



Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58121 - Posted: 23 Dec 2008, 0:22:58 UTC

Yet another one dies... What is going on? Is it the program or my OC speed? This makes 12 in 2 days.

https://boinc.bakerlab.org/rosetta/result.php?resultid=216194755
Name t073_1_RDC_NMR_NESG_5563_176398_0
Workunit 197027384
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
CPU time 25.375
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
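The two numbers in the exit status are the same 32-bit value: the log prints it once as a signed decimal (-1073741819) and once as hex (0xc0000005). A minimal Python sketch of the conversion in both directions follows; the one-entry lookup table is only an illustration, not a complete list of Windows codes. 0xC0000005 is the Windows STATUS_ACCESS_VIOLATION code (the process touched memory it was not allowed to), which says what failed but not whether a program bug or an unstable overclock caused it.

```python
# Windows status codes are 32-bit values; the log shows the same code
# both as a signed decimal and as hex.

# Illustrative lookup for the code seen in this thread (not exhaustive).
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION (invalid memory access)",
}


def to_hex_code(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code."""
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


def to_signed(code: int) -> int:
    """Reinterpret an unsigned 32-bit code as the signed decimal value."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


def describe(exit_status: int) -> str:
    """Hex form of the status plus a name, if it is in the lookup table."""
    code = exit_status & 0xFFFFFFFF
    return f"{to_hex_code(exit_status)}: {KNOWN_CODES.get(code, 'unknown code')}"


if __name__ == "__main__":
    print(describe(-1073741819))
    # 0xC0000005: STATUS_ACCESS_VIOLATION (invalid memory access)
    print(to_signed(0xC0000005))  # -1073741819
```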

Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 58126 - Posted: 23 Dec 2008, 3:31:31 UTC - in response to Message 58119.  

Hi greg_be, this WU is one of my jobs. I just double-checked this sub-batch: so far about 9000 clients have returned results successfully, with a normal error rate. The fact that you have recently got the same error code from many different Rosetta@home workunits makes me think it is more likely due to some incompatible setup on your computer, though I don't know exactly what is causing it. Has this problem happened to you before?

Your vanilla task died at 2 hrs and 23 mins.
This makes about 12 failures now in 2 days.
https://boinc.bakerlab.org/rosetta/result.php?resultid=216178144
1g47A_BOINC_MPZN_vanilla_abrelax_5901_7554_0
Client state Compute error
Exit status -1073741819 (0xc0000005)
CPU time 8912.25
stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400



Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58130 - Posted: 23 Dec 2008, 9:09:13 UTC - in response to Message 58126.  
Last modified: 23 Dec 2008, 9:11:41 UTC

Chu,

Thanks for replying. I suspect it's my overclock speed. If you have that many clients returning good tasks, and I see the last one I posted went through OK on another client, I have to assume my speed is too high for these tasks on RAH.
I dropped the speed by 10 MHz to see if that corrects the problem; if it continues, I will drop it some more until things become steady. As of a week ago I could run at the faster speed with no problems, but this week the majority die.

Normally when my speed is too high, the tasks fail immediately, so I don't understand how a task can run eight thousand seconds and then crash. I had another one that got within 10 mins of completion and died.

Can you tell me how to tell the difference between an error due to Windows or OC speed and a program error that triggers a Windows dump with '-1073741819 (0xc0000005)'?

Thanks again for the reply.

HA-SOFT, s.r.o.
Joined: 27 Jan 07
Posts: 10
Credit: 94,518,643
RAC: 0
Message 58132 - Posted: 23 Dec 2008, 9:48:28 UTC - in response to Message 58130.  

I have the same problem with all Minirosetta tasks, but only on 64-bit Windows Server 2008. Minirosetta 1.45 had this problem too. My other PCs (32-bit, XP 64-bit) have no problem.

Zdenek


mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,217,610
RAC: 822
Message 58134 - Posted: 23 Dec 2008, 11:17:31 UTC - in response to Message 58130.  

I think this could happen if the system is close to being OK but right on the edge. That is, if your task had run for only 7,000 seconds it would have completed just fine, but the little extra time pushed it over the edge. This could come from the RAM being pushed, the hard drive saying enough, the CPU sending that one bit of data too fast, etc. By backing off in 10 MHz increments I think you will find the solution fairly quickly. Then you could even go back up in 1 MHz increments until the errors come back.
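The suggestion above is a coarse-then-fine search: back off in large steps until the machine is stable, then creep back up in small steps until errors return, keeping the last good setting. A minimal Python sketch, assuming a hypothetical `is_stable(mhz)` check that stands in for running work units at that setting long enough to trust the result. In practice each check takes hours, and a marginal setting can still fail intermittently, so one passing check is not proof of stability.

```python
from typing import Callable


def find_stable_clock(start_mhz: int,
                      is_stable: Callable[[int], bool],
                      coarse_step: int = 10,
                      fine_step: int = 1,
                      floor_mhz: int = 0) -> int:
    """Coarse back-off, then fine step-up; returns the last stable setting.

    `is_stable` is a hypothetical stand-in for "run work units at this
    setting and see whether they fail".
    """
    mhz = start_mhz
    # Coarse phase: drop by coarse_step until a setting passes.
    while not is_stable(mhz):
        mhz -= coarse_step
        if mhz < floor_mhz:
            raise ValueError("no stable setting found above the floor")
    # Fine phase: creep back up, but never to or past the original setting.
    while mhz + fine_step < start_mhz and is_stable(mhz + fine_step):
        mhz += fine_step
    return mhz


if __name__ == "__main__":
    # Toy model: anything at or below 3012 MHz is stable.
    print(find_stable_clock(3020, lambda mhz: mhz <= 3012))  # 3012
```

Greg_BE's eventual change later in the thread (down 10 MHz, then back up 5 MHz) is this pattern with a 5 MHz fine step.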
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58137 - Posted: 23 Dec 2008, 11:39:54 UTC - in response to Message 58134.  

I suspect you're right about the RAM frequency and the CPU; probably just a bit too high for these tasks now. I might raise it by 5 MHz after tonight just to see what happens. My RAC is already low enough; I can't "afford" to take many more errors.
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 58144 - Posted: 23 Dec 2008, 19:00:03 UTC - in response to Message 58132.  

greg_be and all,

When we release a new version of minirosetta, we usually put a Windows debug symbol image in a downloadable location, so when a WU crashes it should provide a backtrace of how the error was caused (this does not work every time, which makes our debugging very hard). If it is an error in the Minirosetta program or a bad command-line/input-file setup, stdout or stderr will usually print a message as a hint (for example, the hbond NAN problem in previous versions), and we should also see a significantly higher error rate across either all WUs or certain batches. If it is caused by interaction with the host's hardware or software, we will usually see certain client hosts repeatedly encountering errors or failures. We wish we could tell what went wrong in every scenario, but most of us Rosetta developers are far from experts on computer software/hardware, and we can only hope to catch errors locally on our testing machines to continue debugging.
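The batch-versus-host distinction described above can be checked mechanically: a program or input bug shows up as a high error rate across a whole batch, while a host problem shows up as a few hosts failing far more often than everyone else. A minimal Python sketch over hypothetical `(host_id, batch, failed)` records; the record layout and the 5x threshold are assumptions for illustration, not how the Rosetta@home server stores or analyzes results.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical record layout: (host_id, batch_name, failed)
Result = Tuple[int, str, bool]


def error_rate_by(results: List[Result], field: int) -> Dict[Hashable, float]:
    """Fraction of failed results grouped by host (field=0) or batch (field=1)."""
    failed: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for rec in results:
        total[rec[field]] += 1
        failed[rec[field]] += int(rec[2])
    return {key: failed[key] / total[key] for key in total}


def flag_hosts(results: List[Result], factor: float = 5.0) -> List[int]:
    """Hosts whose error rate is more than `factor` times the overall rate."""
    overall = sum(rec[2] for rec in results) / len(results)
    by_host = error_rate_by(results, 0)
    return sorted(h for h, rate in by_host.items()
                  if rate > 0 and rate > factor * overall)


if __name__ == "__main__":
    # Ten healthy hosts with ten results each, plus one host failing 8 of 10.
    data = [(h, "A", False) for h in range(1, 11) for _ in range(10)]
    data += [(99, "A", True)] * 8 + [(99, "A", False)] * 2
    print(error_rate_by(data, 1))  # batch A: 8 of 110 failed, about 7%
    print(flag_hosts(data))        # [99]
```

With these toy numbers the batch as a whole looks unremarkable while host 99 stands out, which is the host-side pattern described in the post.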

Thank you all for voluntarily helping us with this project, and sorry for any inconvenience or trouble caused on your computers. Please continue to report problems and any fixes you find, as every bit of such information will help us improve R@H stability and resolve hidden bugs sooner or later. Happy holidays to everyone and happy crunching!

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58145 - Posted: 23 Dec 2008, 19:44:39 UTC

Chu,

I reduced the OC amount by 10 MHz and then brought it back up 5 MHz.
Everything seems stable now, as I have run nearly a day without trouble since backing down. It would seem your program is increasingly sensitive to the small instabilities that high OC rates create. In any case, backing the CPU OC speed down a bit seems to have solved this issue.

Thanks for taking the time to discuss this problem with me and the other poster.
staffann
Joined: 7 Oct 07
Posts: 7
Credit: 69,937
RAC: 0
Message 58146 - Posted: 23 Dec 2008, 22:00:59 UTC

I had one WU crash on me today, running on Windows XP SP3 with an Athlon X2 3800+ and 1 GB RAM. Task details:

216493218
Name 1nkuA_BOINC_MPZN_vanilla_abrelax_5901_16326_0
Workunit 197297715
Created 23 Dec 2008 8:53:31 UTC
Sent 23 Dec 2008 9:33:56 UTC
Received 23 Dec 2008 22:08:04 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 625945
Report deadline 2 Jan 2009 9:33:56 UTC
CPU time 4928.609
stderr out

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 58150 - Posted: 24 Dec 2008, 2:20:48 UTC - in response to Message 58137.  

Do you think it could be a problem in BOINC 6.4.5 instead? Chu, could you check how many machines running workunits from that batch under BOINC 6.4.5 on similar hardware have returned successful results?

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 826
Message 58151 - Posted: 24 Dec 2008, 2:21:13 UTC - in response to Message 58137.  

Do you think it could be a problem in BOINC 6.4.5 instead? Chu, could you check how many machines running workunits from that batch under BOINC 6.4.5 on similar hardware have returned successful results?

(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 58154 - Posted: 24 Dec 2008, 6:35:31 UTC

Exit status -1073741819 (0xc0000005)
https://boinc.bakerlab.org/rosetta/result.php?resultid=214936635
https://boinc.bakerlab.org/rosetta/result.php?resultid=216341024
https://boinc.bakerlab.org/rosetta/result.php?resultid=215006649
https://boinc.bakerlab.org/rosetta/result.php?resultid=214872151
Exit status 1 (0x1)
https://boinc.bakerlab.org/rosetta/result.php?resultid=212896182


mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,217,610
RAC: 822
Message 58156 - Posted: 24 Dec 2008, 12:50:29 UTC - in response to Message 58150.  

I am using BOINC 6.4.5 on some of my PCs and am not having any issues.
Dalton
Joined: 30 Nov 05
Posts: 2
Credit: 27,777,725
RAC: 0
Message 58158 - Posted: 24 Dec 2008, 14:04:03 UTC - in response to Message 58157.  

I found this WU stalled after 15 hrs. I suspended the task and re-enabled it later; after it started again, it stalled at the same point. When I looked at the box, there was a popup saying there had been a C++ runtime error and the application had asked the runtime to terminate it in an unusual way.

STDERR OUT

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
# cpu_run_time_pref: 86400

</stderr_txt>
]]>



I've been getting those C++ popups as well, on multiple machine/OS configurations; after one appears, that CPU core seems to refuse to get work. This is new for me.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58159 - Posted: 24 Dec 2008, 14:18:26 UTC - in response to Message 58151.  

robert, after dropping the OC by 10 MHz and then bringing it back up 5 MHz (a total reduction of 5 MHz), I have not had any further issues. So, at least on my machine, the errors were caused by OC'ing too far; this accounts for the huge number of failures I had. It would seem the new mini is even more sensitive than 1.45 to whatever signals OC'ing produces. If you get 1 failure in 20 tasks, then you're not having the same problem I was. Also, I am on 6.4.5 after upgrading from the old version.
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58160 - Posted: 24 Dec 2008, 14:20:59 UTC - in response to Message 58154.  

Kodak, that looks similar to the rash of broken tasks I had.
Are you OC'd at all?
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 58163 - Posted: 24 Dec 2008, 21:59:14 UTC

Hi.

I have this task running at the moment, and it's odd. This morning when I restarted the system, BOINC was showing 5 hrs 4 mins completed. When the task got its turn to run, it dropped back to 1 hr 33 mins and was showing 2 models; it would have done more than two in five hours!

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=197257513

Thu 25 Dec 2008 08:42:56 EST|rosetta@home|Restarting task cc_nonideal_1_8_nocst4_hb_t303__IGNORE_THE_REST_1FEZA_6_6019_17_0 using minirosetta version 147

pete.


Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58164 - Posted: 24 Dec 2008, 22:12:02 UTC - in response to Message 58163.  

Normally this is due to where the last checkpoint was set. It seems kind of odd that you would lose up to 4 hrs of work between checkpoints; it acts like it lost all the latest checkpoint data. It also looks like you're running a really old version of BOINC; you might want to update to the latest version.
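The explanation above is that a restarted task resumes from its last checkpoint, so everything computed after that checkpoint is redone. A minimal Python sketch of that arithmetic, assuming the times BOINC displayed are the task's CPU time; the checkpoint list is hypothetical except that its last entry is set to the 1 hr 33 mins pete saw after the restart.

```python
import bisect
from typing import List, Tuple


def resume_point(checkpoints: List[float], stopped_at: float) -> Tuple[float, float]:
    """Return (resumed_at, lost) in CPU seconds.

    checkpoints: sorted CPU times at which the task saved its state.
    stopped_at:  CPU time when the task was interrupted.
    """
    i = bisect.bisect_right(checkpoints, stopped_at)
    resumed_at = checkpoints[i - 1] if i else 0.0  # no checkpoint: start over
    return resumed_at, stopped_at - resumed_at


def hm(seconds: float) -> str:
    """Format seconds as e.g. '3h31m'."""
    s = int(round(seconds))
    return f"{s // 3600}h{s % 3600 // 60:02d}m"


if __name__ == "__main__":
    # Hypothetical checkpoint times; the last matches the 1h33m shown after restart.
    ckpts = [600.0, 1800.0, 3600.0, 5580.0]
    resumed, lost = resume_point(ckpts, stopped_at=5 * 3600 + 4 * 60)  # 5h04m
    print(hm(resumed), hm(lost))  # 1h33m 3h31m
```

If checkpoints were being written every few minutes, the lost amount would be small; a gap of about 3.5 hours is consistent with the latest checkpoints being missing or never written.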

Merry Christmas

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 58165 - Posted: 24 Dec 2008, 22:17:22 UTC - in response to Message 58159.  

Dec 24, 22:15 UTC: the system is stable and RAC is slowly returning to normal.
Chu, thanks for taking the time to look into the average return rate of the various tasks you sent out. It was definitely a case of too much OC and no way to verify it. I probably would have reached that conclusion after a few more errors.

stewjack
Joined: 23 Apr 06
Posts: 39
Credit: 95,871
RAC: 0
Message 58166 - Posted: 24 Dec 2008, 23:12:52 UTC - in response to Message 58163.  

Hi.

I have this task at the moment running, it's odd. This morning when I restarted

the ... task cc_nonideal_1_8_nocst4_hb_t303__IGNORE_THE_REST_1FEZA_6_6019_17_0 using minirosetta version 147
pete.


I have had that happen three times during the last 4 or 5 days. I didn't report it because, technically, such behavior is not prohibited: the tasks complete and are granted credit.
However, I have set my task length to 2 hours for now,
and these tasks run well over that time.

NOTE: I have checkpoint logging turned on!

ALL TIMES APPROX.

4 hours with no checkpoints after 40 min
cc_nonideal_3_5_nocst4_hb_t374__IGNORE_THE_REST_2FCKA_10_5832_14_0

3.5 hours with no checkpoints after 35 min
cc2_1_8_mammoth_mix_cen_cst_hb_t332__IGNORE_THE_REST_1V2XA_7_5888_15_0

3 hours with no checkpoints after 50 min
cc_nonideal_0_6_nocst4_hb_t313__IGNORE_THE_REST_1GOJA_10_5910_16_0

NOTE: On the last WU I noticed that when I restarted the task,
well into the no-checkpointing period,
checkpointing resumed for a short period of time!
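The figures above can be turned into two quantities: how far each task ran past the 2-hour target, and how long it ran after its last checkpoint (the work a restart would have thrown away). A minimal Python sketch using the approximate times above, converted to minutes; it assumes no restarts, so for the third task, where checkpointing resumed briefly after a restart, the real unprotected window was shorter than computed here.

```python
from typing import Tuple

TARGET_MIN = 120  # the 2-hour run-time preference mentioned above


def overrun_and_exposure(total_min: int, last_ckpt_min: int,
                         target_min: int = TARGET_MIN) -> Tuple[int, int]:
    """Minutes past the target, and minutes run after the last checkpoint."""
    return total_min - target_min, total_min - last_ckpt_min


# (task, approx. total run time in minutes, minute at which checkpointing stopped)
TASKS = [
    ("cc_nonideal_3_5_nocst4_hb_t374__IGNORE_THE_REST_2FCKA_10_5832_14_0", 240, 40),
    ("cc2_1_8_mammoth_mix_cen_cst_hb_t332__IGNORE_THE_REST_1V2XA_7_5888_15_0", 210, 35),
    ("cc_nonideal_0_6_nocst4_hb_t313__IGNORE_THE_REST_1GOJA_10_5910_16_0", 180, 50),
]

if __name__ == "__main__":
    for name, total, ckpt in TASKS:
        over, exposed = overrun_and_exposure(total, ckpt)
        # e.g. first task: 120 min over target, 200 min unprotected
        print(f"{name[:20]}: {over} min over target, {exposed} min unprotected")
```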








©2024 University of Washington
https://www.bakerlab.org