Minirosetta v1.32 bug thread

Message boards : Number crunching : Minirosetta v1.32 bug thread

Philosopher2

Joined: 28 Mar 06
Posts: 3
Credit: 111,037
RAC: 0
Message 55156 - Posted: 18 Aug 2008, 11:43:06 UTC - in response to Message 54936.  

Please post bugs/issues with minirosetta v1.32 here.


I downloaded and installed v1.32, and the WU came up.

WU 2reb_JUMPRELAX_PREDJUMP_FROMPREDFRAG_SAVE_ALL_OUT-2re_-_4420_740_0 has been running.

This WU completed 95 percent of processing in approximately 3 to 4 hours.

From 95.360 percent, I have observed that it has taken 9 hours of processing to move up to 98.809 percent!

This WU is targeted to complete on 8/22/08! At this rate of progress, I wonder if it will.

Is this the predicted behaviour?

The time to completion has moved from 00:09:51 to 00:09:54 during these last five days.

I am running two other BOINC applications, hence time is available sequentially for 50 minutes to each application.

Please advise: should I abort it, or let it carry on to the end, whatever that may be?

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55157 - Posted: 18 Aug 2008, 12:26:57 UTC

Philosopher2

It all sounded normal up until you said it has been running for 5 days. But since you are running other projects, we would need to look at the CPU time used to understand how much of that time this task was actually running.

You see, the initial runtime is just an estimate. The actual runtime is related to the runtime preference in your Rosetta preferences (3 hours is the default). And this is a target, not a hard and fast limit. Rosetta will do its best to complete within that target time if possible, but it is not always possible. In cases where it cannot complete within your desired runtime, the time estimate will get down to about 11 minutes and then move exponentially slower until the task completes and jumps to 100%.

There is a watchdog thread that will check in on the tasks every 15 minutes or so and see if it thinks things are running normally or not.

I suggest the following:

If that task has 5 or more days of CPU time (120 hours), which is pretty unlikely, then abort it. If it has run for 9 hours, let it run.

If not, take a look at what your Rosetta preferences have configured for the venue of that PC, and post back here with the details of both your preference and the actual CPU time you see for the task. The watchdog will end the task if it runs longer than 5 times your runtime preference, so that would be 15 hours with the defaults. You should follow the same guideline.
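As a rough sketch of that guideline, assuming the 5x factor and the 3-hour default exactly as described in this post (the helper names are hypothetical, not part of Rosetta or BOINC), the arithmetic is:

```python
# Hypothetical helpers: the 5x factor and 3-hour default are taken from the
# moderator's description of the watchdog, not from Rosetta source code.
WATCHDOG_FACTOR = 5
DEFAULT_RUNTIME_PREF_HOURS = 3.0


def watchdog_limit_hours(runtime_pref_hours: float = DEFAULT_RUNTIME_PREF_HOURS) -> float:
    """CPU hours after which the watchdog is described as ending a task."""
    return WATCHDOG_FACTOR * runtime_pref_hours


def should_abort(cpu_hours: float, runtime_pref_hours: float = DEFAULT_RUNTIME_PREF_HOURS) -> bool:
    """Apply the same guideline by hand: abort only once past the watchdog limit."""
    return cpu_hours > watchdog_limit_hours(runtime_pref_hours)


print(watchdog_limit_hours())  # 15.0 with the default preference
print(should_abort(9.0))       # False: 9 CPU hours, let it run
print(should_abort(120.0))     # True: 5 days of CPU time
```

The comparison uses CPU hours, not elapsed days, since time spent running other projects does not count against the task.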

I would also suggest you review your computing preferences and check the box to keep tasks in memory while suspended. Since you are switching projects every 50 minutes, you will be losing a lot of work if you do not keep the tasks in memory.

Oh, and you shouldn't have to download anything manually. You said you downloaded 1.32. I wasn't positive if you meant that you did this manually, or if BOINC did this during the normal file transfers.
Rosetta Moderator: Mod.Sense
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 55173 - Posted: 19 Aug 2008, 5:46:02 UTC - in response to Message 54936.  
Last modified: 19 Aug 2008, 5:52:15 UTC

Please post bugs/issues with minirosetta v1.32 here.


Debian Linux; BOINC Manager 6.2.14

Errors from 1.32 tasks:

https://boinc.bakerlab.org/rosetta/result.php?resultid=185499939
https://boinc.bakerlab.org/rosetta/result.php?resultid=185493038
https://boinc.bakerlab.org/rosetta/result.php?resultid=185493027
https://boinc.bakerlab.org/rosetta/result.php?resultid=185493026
https://boinc.bakerlab.org/rosetta/result.php?resultid=185489375


<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x8926f8f]
[0x89514e0]
[0xb7f19400]
[0x880c924]
[0x834349c]
[0x88bcc81]
[0x880c5d6]
[0x829591c]
[0x85f20f6]
[0x8072b5d]
[0x807e2e7]
[0x8165bee]
[0x80abecc]
[0x80a9ea4]
[0x80d7044]
[0x80d8651]
[0x804b9f8]
[0x89acfdc]
[0x8048111]

Exiting...

</stderr_txt>
]]>
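The numbers in the `process exited with code` line are the same value written three ways. The helper below is a hypothetical sketch that reproduces the format seen in this thread's stderr excerpts (193 shows as `-63`, and a later code 1 shows as `-255`, i.e. the code minus 256); that rule is inferred from those two samples, not taken from BOINC's source. If I read BOINC's `error_numbers.h` correctly, 193 corresponds to `EXIT_SIGNAL`, which would fit the `SIGSEGV` line above, but treat that mapping as an assumption to check.

```python
def describe_exit(code: int) -> str:
    """Format an exit code like the stderr excerpts in this thread.

    Assumption: the third field is the code minus 256, inferred from the
    two samples here (193 -> -63, 1 -> -255), not from BOINC's source.
    """
    return f"{code} (0x{code:x}, {code - 256})"


print(describe_exit(193))  # 193 (0xc1, -63)
print(describe_exit(1))    # 1 (0x1, -255)
```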
Philosopher2

Joined: 28 Mar 06
Posts: 3
Credit: 111,037
RAC: 0
Message 55174 - Posted: 19 Aug 2008, 5:59:36 UTC - in response to Message 55157.  
Last modified: 19 Aug 2008, 6:33:19 UTC

Thank you, Moderator.
I have changed the preference to 120 minutes per application run.
The application will remain in memory as suggested.
This WU has already been running for 15 hours of CPU time, and it is only 98.900% done!
Should I let it go on? I am a bit curious whether it will complete by 22 Aug.
Take care.


joergent

Joined: 17 Feb 08
Posts: 1
Credit: 32,031
RAC: 0
Message 55180 - Posted: 19 Aug 2008, 16:34:40 UTC - in response to Message 55094.  

Every morning for the last week, I find the computer frozen with minirosetta on the taskbar. I have detached from the project to see what happens with the other projects. Just for info.



To add to this: every time my screen saver is running with minirosetta and the screen has turned black, the PC (Windows XP SP3) cannot be returned to its previous state. The Rosetta screen appears, but the mouse does not work and I can barely use the keyboard to shut down the PC, which goes very slowly until Rosetta is killed.

Rosetta has been disabled on my PC !!!
Terrasapiens

Joined: 25 Apr 08
Posts: 15
Credit: 368,919
RAC: 0
Message 55182 - Posted: 20 Aug 2008, 3:18:51 UTC

I've been having a lot of WU errors with minirosetta ever since version 1.28, and have now had 5 in the past two days. Here's the link to my WUs showing the recent failures: https://boinc.bakerlab.org/rosetta/results.php?userid=254884

I've also had to do a hard shutdown and reboot several times recently after the RAH screen saver apparently locked up the machine. Not sure if v1.32 or 5.98 was running at the time. This seemed to happen after I changed the options setting so the screen would go black after a few minutes. I undid the setting and have had no application crashes since then, but I'm not totally sure the crashes were due to that change.

David Emigh

Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 55183 - Posted: 20 Aug 2008, 4:07:10 UTC - in response to Message 55182.  

{...}
I've also had to do a hard shutdown and reboot several times recently after the RAH screen saver apparently locked up the machine. {...}


There is a workaround for this:

Ctrl + Shift + Esc to force the Task Manager, then carefully move the mouse around until you can "find" it in the Task Manager window, at which point you can kill the screensaver process without having to shutdown the computer. As always, YMMV, but I've not lost any crunching time using this method.

But the simplest solution is to not use the BOINC screensaver...

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!
Terrasapiens

Joined: 25 Apr 08
Posts: 15
Credit: 368,919
RAC: 0
Message 55188 - Posted: 20 Aug 2008, 6:27:13 UTC - in response to Message 55183.  
Last modified: 20 Aug 2008, 6:27:51 UTC



But the simplest solution is to not use the BOINC screensaver...


I disabled the screensaver and will see if this also reduces the number of WUs crashing. Thanks for the info.
Terrasapiens

Joined: 25 Apr 08
Posts: 15
Credit: 368,919
RAC: 0
Message 55196 - Posted: 21 Aug 2008, 3:53:17 UTC
Last modified: 21 Aug 2008, 4:01:07 UTC

I decided to disable the BOINC screensaver yesterday to see if that would have any effect on the number of failed WUs, but it didn't seem to make any difference. Two more failed today:
https://boinc.bakerlab.org/rosetta/result.php?resultid=186050178
https://boinc.bakerlab.org/rosetta/result.php?resultid=186005807
Jim Wilkins

Joined: 5 Feb 08
Posts: 1
Credit: 4,513
RAC: 0
Message 55201 - Posted: 21 Aug 2008, 13:19:56 UTC
Last modified: 21 Aug 2008, 13:20:15 UTC

I successfully completed a 1.32 run but had many instances of this message in my stderr file:

needs psipred_ss2 to run filters

Is that a problem?
Thanks,
Jim
AdeB

Joined: 12 Dec 06
Posts: 45
Credit: 4,428,086
RAC: 0
Message 55207 - Posted: 21 Aug 2008, 18:26:55 UTC

Compute error in this workunit.

stderr out:
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
# cpu_run_time_pref: 43200

ERROR: NANs occured in hbonding!
ERROR:: Exit from: src/core/scoring/hbonds/hbonds_geom.cc line: 763
called boinc_finish

</stderr_txt>
]]>
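The `cpu_run_time_pref` line at the top of each stderr is in seconds: 43200 here is 12 hours, while the 3600 in lusvladimir's earlier output is 1 hour. A small sketch for pulling that value out of a pasted stderr (the regex assumes the exact `# cpu_run_time_pref: N` form seen in this thread; the function name is hypothetical):

```python
import re
from typing import Optional

# Matches the line as it appears in the stderr excerpts in this thread.
PREF_RE = re.compile(r"^# cpu_run_time_pref: (\d+)\s*$", re.MULTILINE)


def runtime_pref_hours(stderr_text: str) -> Optional[float]:
    """Return the runtime preference in hours, or None if the line is absent."""
    match = PREF_RE.search(stderr_text)
    return int(match.group(1)) / 3600 if match else None


sample = """<stderr_txt>
# cpu_run_time_pref: 43200

ERROR: NANs occured in hbonding!
"""
print(runtime_pref_hours(sample))                         # 12.0
print(runtime_pref_hours("# cpu_run_time_pref: 3600\n"))  # 1.0
```

With the 5x watchdog factor the moderator describes earlier in the thread, a 12-hour preference would put the watchdog ceiling at 60 CPU hours.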

mitrichr

Joined: 23 May 07
Posts: 44
Credit: 1,005,660
RAC: 0
Message 55219 - Posted: 22 Aug 2008, 3:12:05 UTC

The graphic is freezing, meaning, I assume, that the WU is a dead fish. I am getting this on three machines so far. I have to abort too often.

>>RSM
http://sciencespringe.wordpress.com
http://facebook.com/sciencesprings


Greg_BE

Joined: 30 May 06
Posts: 5662
Credit: 5,703,589
RAC: 2,175
Message 55222 - Posted: 22 Aug 2008, 8:14:04 UTC

Just found this one bombed in my results
https://boinc.bakerlab.org/rosetta/result.php?resultid=184820750
1ughI_BOINC_ABINITIO_IGNORE_THE_REST-S25-13-S3-3--1ughI-_4309_84_1
Exit status 1 (0x1)
CPU time 1.328125
stderr out <core_client_version>6.2.16</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>

ERROR: Cannot find file 'minirosetta_databasechemical/residue_type_sets/fa_standard/CSD_ATOM_TYPE_SET fa_standard'
ERROR:: Exit from: ....srccorechemicalresidue_io.cc line: 132
called boinc_finish

</stderr_txt>
]]>

David Emigh

Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 55226 - Posted: 22 Aug 2008, 13:44:49 UTC - in response to Message 55219.  
Last modified: 22 Aug 2008, 13:46:50 UTC

The graphic is freezing, meaning, I assume, that the WU is a dead fish. {...}


Not necessarily. Try this workaround.
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 55233 - Posted: 23 Aug 2008, 7:37:11 UTC

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)

I observed that the Rosetta model I was processing failed with this error after an NTP daemon resync on my Linux machine.
When the system clock was adjusted during a routine resync, the running model failed because its notion of elapsed time changed outside of the model.
I temporarily stopped the NTP daemon and no longer see this error.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55234 - Posted: 23 Aug 2008, 10:53:01 UTC - in response to Message 55233.  

I temporarily stopped the NTP daemon and no longer see this error.


Have you seen it fail during a resynch before? Consistently?

Am I correct to presume that if the resynch did not cause a change to the clock, then there is no problem?

Do you have any perspective on whether resynch caused failures on older BOINC releases?

Which Linux distribution are you running?

Do you configure the machine to run at 100% of CPU? Or less?
Rosetta Moderator: Mod.Sense
lusvladimir

Send message
Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 55237 - Posted: 23 Aug 2008, 13:40:15 UTC - in response to Message 55234.  

Thanks, and sorry for my English; it's not my native language.

Debian Linux
Kernel 2.6.26-1-686 SMP

BOINC Manager 6.2.14
Machine configured to run at 100% CPU

In the Linux system log, the NTP time change:

Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s
Aug 23 03:09:33 alpha ntpd[13389]: synchronized to 77.234.200.98, stratum 4
Aug 23 03:10:29 alpha ntpd[13389]: synchronized to 87.236.24.179, stratum 2

...and in BOINC's stderr.txt at the same time (I have task_debug turned on):

23-Aug-2008 03:07:27 [rosetta@home] Started download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:08:05 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:08:44 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:08:45 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:09:23 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:09:38 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:09:52 [rosetta@home] Finished download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:09:52 [rosetta@home] Started download of boinc_homfrags_aa1pxuA09_05.200_v1_3.gz
23-Aug-2008 03:10:32 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:44 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:56 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:11:28 [rosetta@home] Sending scheduler request: To fetch work. Requesting 3081 seconds of work, reporting 0 completed tasks
23-Aug-2008 03:11:31 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:11:33 [rosetta@home] Scheduler request succeeded: got 1 new tasks
23-Aug-2008 03:11:33 [rosetta@home] [task_debug] result state=NEW for abinitio_only62_A_1ptq__4438_5437_0 from handle_scheduler_reply
23-Aug-2008 03:11:34 [rosetta@home] [task_debug] result state=FILES_DOWNLOADING for abinitio_only62_A_1ptq__4438_5437_0 from CS::update_results
23-Aug-2008 03:12:00 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] Process for abinitio_only62_A_2chf__4434_6914_0 exited
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_2chf__4434_6914_0 from handle_exited_app
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::report_result_error
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:11 [rosetta@home] Computation for task abinitio_only62_A_2chf__4434_6914_0 finished
23-Aug-2008 03:12:11 [rosetta@home] Output file abinitio_only62_A_2chf__4434_6914_0_0 for task abinitio_only62_A_2chf__4434_6914_0 absent
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::app_finished
23-Aug-2008 03:12:11 [rosetta@home] Starting abinitio_only62_A_1pgx__4438_2667_0
23-Aug-2008 03:12:12 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4030
23-Aug-2008 03:12:12 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1pgx__4438_2667_0 from start
23-Aug-2008 03:12:12 [rosetta@home] Starting task abinitio_only62_A_1pgx__4438_2667_0 using minirosetta version 132
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hboA_4443_1214_0 exited
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hboA_4443_1214_0 from handle_exited_app
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::report_result_error
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:13 [rosetta@home] Computation for task abinitio_homfrag_71_A_2hboA_4443_1214_0 finished
23-Aug-2008 03:12:13 [rosetta@home] Output file abinitio_homfrag_71_A_2hboA_4443_1214_0_0 for task abinitio_homfrag_71_A_2hboA_4443_1214_0 absent
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::app_finished
23-Aug-2008 03:12:13 [rosetta@home] Starting abinitio_homfrag_71_A_2hl7A_4443_1633_0
23-Aug-2008 03:12:13 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4042
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from start
23-Aug-2008 03:12:13 [rosetta@home] Starting task abinitio_homfrag_71_A_2hl7A_4443_1633_0 using minirosetta version 132
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] Process for abinitio_only62_A_1pgx__4438_2667_0 exited
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_1pgx__4438_2667_0 from handle_exited_app
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::report_result_error
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:17 [rosetta@home] Computation for task abinitio_only62_A_1pgx__4438_2667_0 finished
23-Aug-2008 03:12:17 [rosetta@home] Output file abinitio_only62_A_1pgx__4438_2667_0_0 for task abinitio_only62_A_1pgx__4438_2667_0 absent
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::app_finished
23-Aug-2008 03:12:17 [rosetta@home] Starting abinitio_only62_A_1cc8A_4438_3695_0
23-Aug-2008 03:12:18 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4061
23-Aug-2008 03:12:18 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1cc8A_4438_3695_0 from start
23-Aug-2008 03:12:18 [rosetta@home] Starting task abinitio_only62_A_1cc8A_4438_3695_0 using minirosetta version 132
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hl7A_4443_1633_0 exited
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from handle_exited_app
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from CS::report_result_error
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] exit status 193
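To line the failures up against the ntpd step at 03:09:12, a sketch like the following pulls the `COMPUTE_ERROR` entries out of the BOINC log and reports how many seconds after the reset each one occurred. It assumes the syslog and BOINC timestamps are in the same local time zone (the syslog line has no year, so 2008 is filled in by hand), and the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches the task_debug lines in the log above that mark a task as failed.
ERROR_RE = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[rosetta@home\] \[task_debug\] "
    r"result state=COMPUTE_ERROR for (\S+) from CS::report_result_error$"
)


def compute_errors(log_text: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, task name) for each reported compute error."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
            errors.append((when, match.group(2)))
    return errors


log = """\
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::report_result_error
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::report_result_error
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::report_result_error
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from CS::report_result_error
"""
ntp_reset = datetime(2008, 8, 23, 3, 9, 12)  # "time reset -0.175490 s" in syslog
for when, task in compute_errors(log):
    print(int((when - ntp_reset).total_seconds()), task)
```

All four failures land 179 to 189 seconds after the step, which is consistent with the NTP connection, though it does not prove it.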

Looking at my older Rosetta results (my own stats): with BOINC 5.10.45 and the 5.96 Rosetta client, I had 4 errors in a month.
After upgrading BOINC to version 6.2.x and moving to the new minirosetta app, I see many more errors when NTP is running.
With the NTP daemon stopped, everything works fine without errors.
I also tried running ntpdate manually (not the daemon, just a one-time sync with the time server): after the sync, two workunits failed, and then everything worked again without errors.
I have been running Rosetta for 3 years, and I do not know which part of my system causes this: the kernel, the BOINC manager, the science app, or NTP.

Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 55240 - Posted: 23 Aug 2008, 17:18:41 UTC
Last modified: 23 Aug 2008, 17:20:23 UTC

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?
Rosetta Moderator: Mod.Sense
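One generic way a negative clock change can break a program is computing elapsed time from the wall clock. This is an assumption about the mechanism; nothing in this thread shows how minirosetta actually measures time. The sketch simulates the -0.175490 s step from the ntpd log above; the starting reading is arbitrary.

```python
import time


def elapsed(start: float, now: float) -> float:
    """Naive interval: the difference of two clock readings."""
    return now - start


# Simulated wall-clock readings: 0.1 s of real work, during which ntpd
# steps the clock back by 0.175490 s (step size from the log above).
wall_start = 1000.0  # arbitrary starting reading
wall_now = wall_start + 0.1 - 0.175490
print(elapsed(wall_start, wall_now) < 0)  # True: the interval went negative

# time.monotonic() is not affected by system clock updates, so intervals
# measured with it cannot go backwards.
t0 = time.monotonic()
t1 = time.monotonic()
print(elapsed(t0, t1) >= 0)  # True
```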
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 55243 - Posted: 23 Aug 2008, 17:41:40 UTC - in response to Message 55240.  

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


I crunch only Rosetta@home, but to experiment and help resolve this problem, I will try attaching to another project for Linux platforms and report the results in 1-2 days. Thank you.
lusvladimir

Joined: 18 Oct 05
Posts: 12
Credit: 1,784,854
RAC: 0
Message 55258 - Posted: 24 Aug 2008, 10:27:29 UTC - in response to Message 55240.  

lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


Mod.Sense, thank you for the advice about negative time changes!!!
I read more documentation about time synchronization and was able to tune my system so that the time drift is very, very small (milliseconds over several hours) and still positive.
The NTP daemon now does not need to correct the time often, and Rosetta workunits run without errors.
I have not reproduced the error on another project (Einstein@Home), but too little time has passed.
I will continue to monitor the system and, if errors occur, will post how to reproduce them.
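Why tuning the drift helps: ntpd normally slews small offsets (adjusting the clock rate gradually) and steps larger ones, and a step is what the `time reset -0.175490 s` syslog line records. The sketch below assumes ntpd's documented default step threshold of 128 ms; check the value for your ntpd version and configuration. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: ntpd's default step threshold (128 ms). Offsets at or below it
# are slewed; larger offsets are stepped.
STEP_THRESHOLD_S = 0.128

RESET_RE = re.compile(r"ntpd\[\d+\]: time reset ([+-]?\d+(?:\.\d+)?) s")


def parse_time_reset(line: str) -> Optional[float]:
    """Extract the offset in seconds from an ntpd 'time reset' syslog line."""
    match = RESET_RE.search(line)
    return float(match.group(1)) if match else None


def correction_kind(offset_s: float) -> str:
    """Classify a correction of this size under the assumed threshold."""
    if abs(offset_s) <= STEP_THRESHOLD_S:
        return "slew"
    return "backward step" if offset_s < 0 else "forward step"


offset = parse_time_reset("Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s")
print(offset, correction_kind(offset))  # -0.17549 backward step
print(correction_kind(0.002))           # slew
```

Under that assumption, keeping drift in the millisecond range leaves ntpd slewing, so running tasks would not see the clock jump backwards.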
©2024 University of Washington
https://www.bakerlab.org