Posts by lusvladimir

1) Message boards : Number crunching : Problems with Minirosetta 1.80 (Message 62011)
Posted 29 Jun 2009 by lusvladimir
Post:
Errors for tasks: real_core_1.5_low200_beta_low200_start_hb

http://boinc.bakerlab.org/result.php?resultid=261781005
http://boinc.bakerlab.org/result.php?resultid=261750967
http://boinc.bakerlab.org/result.php?resultid=261750701
http://boinc.bakerlab.org/result.php?resultid=261750699

Ended by the watchdog. Marked invalid.
2) Message boards : Number crunching : Problems with Minirosetta 1.76 (Message 61870)
Posted 21 Jun 2009 by lusvladimir
Post:
http://boinc.bakerlab.org/rosetta/result.php?resultid=259823440
Task ID: 259823440
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0
Workunit: 237144213
InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18047.8 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t290__IGNORE_THE_REST_12911_2080_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid

---

http://boinc.bakerlab.org/rosetta/result.php?resultid=259799581
Task ID: 259799581
Name: wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0
Workunit: 237123479

InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 18501.5 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>wRMSF_1_5_core_jumps_mixcst2_hb_t362__IGNORE_THE_REST_12924_1373_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Validate state: Invalid
3) Message boards : Number crunching : Problems with Minirosetta 1.76 (Message 61859)
Posted 20 Jun 2009 by lusvladimir
Post:
Validate error:
Task ID: 259620267
Name: looprebuild_t374_decoy_5_12863_2150_0
Workunit: 236957726
======================================================
DONE :: 1 starting structures 1749.9 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

Validate state: Invalid
4) Message boards : Number crunching : Report long-running models here (Message 59206)
Posted 31 Jan 2009 by lusvladimir
Post:
These WUs stopped after the preferred runtime (1 hr) + 4 hrs
Debian Linux
Boinc 6.2.14
Rosetta Mini 1.54

1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_156556_0
http://boinc.bakerlab.org/result.php?resultid=224365694
CPU Time: 18255.47

1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_113531_0
http://boinc.bakerlab.org/result.php?resultid=224061637
CPU Time: 18444.31

1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_113176_0
http://boinc.bakerlab.org/result.php?resultid=224058752
CPU Time: 18159.49

stderr out:

...
End of unzipping.
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
# cpu_run_time_pref: 3600
====>
called boinc_finish

</stderr_txt>
]]>

Validate state: Invalid
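
The CPU times above can be checked against the limit the post describes. A minimal sketch, assuming the limit is the preferred runtime (`cpu_run_time_pref: 3600` in the stderr output) plus the 4 hours mentioned in the post; the variable names are illustrative only:

```python
# Assumption: limit = preferred runtime + 4 hours, as described in the post.
PREFERRED_RUNTIME_S = 3600       # cpu_run_time_pref from the stderr output
WATCHDOG_EXTRA_S = 4 * 3600      # the "+ 4 hrs" mentioned in the post

limit_s = PREFERRED_RUNTIME_S + WATCHDOG_EXTRA_S

# CPU times reported for the three tasks, keyed by result ID.
cpu_times = {
    224365694: 18255.47,
    224061637: 18444.31,
    224058752: 18159.49,
}

# Seconds by which each task exceeded the assumed limit.
overruns = {rid: t - limit_s for rid, t in cpu_times.items()}
for rid, over in overruns.items():
    print(f"result {rid}: {over:+.2f} s relative to the {limit_s} s limit")
```

All three tasks ran a few minutes past that limit, which fits being stopped from outside rather than finishing normally.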
5) Message boards : Number crunching : Minirosetta v1.47 bug thread. (Message 58087)
Posted 21 Dec 2008 by lusvladimir
Post:
Running Debian Linux, BOINC 6.2.14.

http://boinc.bakerlab.org/result.php?resultid=215464278

Task ID 215464278
Name cc_nonideal_1_3_nocst4_hb_t286__IGNORE_THE_REST_1VYHA_6_5693_20_0
Workunit 196380006

<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
*** glibc detected *** double free or corruption (!prev): 0x0e13a4f0 ***
SIGABRT: abort called
Stack trace (23 frames):
6) Message boards : Number crunching : Servers running, but no work available?? (Message 55858)
Posted 18 Sep 2008 by lusvladimir
Post:
Server Status as of 18 Sep 2008 11:57:02 UTC
[ Scheduler running ] Queued: 8

18-Sep-2008 13:13:20 [rosetta@home] Sending scheduler request: To fetch work. Requesting 101005 seconds of work, reporting 1 completed tasks
18-Sep-2008 13:13:30 [rosetta@home] Scheduler request succeeded: got 0 new tasks
18-Sep-2008 13:19:52 [rosetta@home] Sending scheduler request: To fetch work. Requesting 102317 seconds of work, reporting 0 completed tasks
18-Sep-2008 13:20:02 [rosetta@home] Scheduler request succeeded: got 0 new tasks
18-Sep-2008 13:37:58 [rosetta@home] Sending scheduler request: To fetch work. Requesting 106150 seconds of work, reporting 0 completed tasks
18-Sep-2008 13:38:03 [rosetta@home] Scheduler request succeeded: got 0 new tasks

7) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55258)
Posted 24 Aug 2008 by lusvladimir
Post:
lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


Mod.Sense, thank you for the advice about negative time!
I read more of the manual on time synchronization and was able to tune my system so that the time shift is very small (milliseconds over several hours) and still positive.
The NTP daemon no longer needs to synchronize the time often, and Rosetta workunits run without errors.
I have not reproduced the error on another project (Einstein@Home), but too little time has passed.
I will continue to monitor the system and, if errors occur, will report how to reproduce them.
8) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55243)
Posted 23 Aug 2008 by lusvladimir
Post:
lusvladimir, thank you for all the details. One more question, have you run any other projects when the time change is negative? I mean, do tasks from other projects have a similar problem?


I crunch rosetta@home only, but as an experiment to help resolve this problem
I will try attaching to another project for Linux platforms and report the results in 1-2 days. Thank you.
9) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55237)
Posted 23 Aug 2008 by lusvladimir
Post:
I temporarily stopped the ntp daemon and no longer see this error.


Have you seen it fail during a resynch before? Consistently?

Am I correct to presume that if the resynch did not cause a change to the clock, then there is no problem?

Do you have any perspective on whether resynch caused failures on older BOINC releases?

Which Linux distribution are you running?

Do you configure the machine to run at 100% of CPU? Or less?


Thanks, and sorry for my English; it's not my native language.

Debian Linux
Kernel is 2.6.26-1-686 SMP

BOINC Manager 6.2.14
Machine is configured to run at 100% CPU

In the Linux system log, the ntp time change:

Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s
Aug 23 03:09:33 alpha ntpd[13389]: synchronized to 77.234.200.98, stratum 4
Aug 23 03:10:29 alpha ntpd[13389]: synchronized to 87.236.24.179, stratum 2

...and in BOINC stderr.txt at that time (I have task_debug turned on):

23-Aug-2008 03:07:27 [rosetta@home] Started download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:08:05 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:08:44 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:08:45 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:09:23 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:09:38 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:09:52 [rosetta@home] Finished download of boinc_homfrags_aa1pxuA03_05.200_v1_3.gz
23-Aug-2008 03:09:52 [rosetta@home] Started download of boinc_homfrags_aa1pxuA09_05.200_v1_3.gz
23-Aug-2008 03:10:32 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:44 [rosetta@home] [task_debug] result abinitio_only62_A_1bq9A_4438_2605_0 checkpointed
23-Aug-2008 03:10:56 [rosetta@home] [task_debug] result abinitio_only62_A_1vcc__4438_3676_0 checkpointed
23-Aug-2008 03:11:28 [rosetta@home] Sending scheduler request: To fetch work. Requesting 3081 seconds of work, reporting 0 completed tasks
23-Aug-2008 03:11:31 [rosetta@home] [task_debug] result abinitio_only62_A_2chf__4434_6914_0 checkpointed
23-Aug-2008 03:11:33 [rosetta@home] Scheduler request succeeded: got 1 new tasks
23-Aug-2008 03:11:33 [rosetta@home] [task_debug] result state=NEW for abinitio_only62_A_1ptq__4438_5437_0 from handle_scheduler_reply
23-Aug-2008 03:11:34 [rosetta@home] [task_debug] result state=FILES_DOWNLOADING for abinitio_only62_A_1ptq__4438_5437_0 from CS::update_results
23-Aug-2008 03:12:00 [rosetta@home] [task_debug] result abinitio_homfrag_71_A_2hboA_4443_1214_0 checkpointed
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] Process for abinitio_only62_A_2chf__4434_6914_0 exited
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_2chf__4434_6914_0 from handle_exited_app
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::report_result_error
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:11 [rosetta@home] Computation for task abinitio_only62_A_2chf__4434_6914_0 finished
23-Aug-2008 03:12:11 [rosetta@home] Output file abinitio_only62_A_2chf__4434_6914_0_0 for task abinitio_only62_A_2chf__4434_6914_0 absent
23-Aug-2008 03:12:11 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_2chf__4434_6914_0 from CS::app_finished
23-Aug-2008 03:12:11 [rosetta@home] Starting abinitio_only62_A_1pgx__4438_2667_0
23-Aug-2008 03:12:12 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4030
23-Aug-2008 03:12:12 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1pgx__4438_2667_0 from start
23-Aug-2008 03:12:12 [rosetta@home] Starting task abinitio_only62_A_1pgx__4438_2667_0 using minirosetta version 132
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hboA_4443_1214_0 exited
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hboA_4443_1214_0 from handle_exited_app
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::report_result_error
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:13 [rosetta@home] Computation for task abinitio_homfrag_71_A_2hboA_4443_1214_0 finished
23-Aug-2008 03:12:13 [rosetta@home] Output file abinitio_homfrag_71_A_2hboA_4443_1214_0_0 for task abinitio_homfrag_71_A_2hboA_4443_1214_0 absent
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hboA_4443_1214_0 from CS::app_finished
23-Aug-2008 03:12:13 [rosetta@home] Starting abinitio_homfrag_71_A_2hl7A_4443_1633_0
23-Aug-2008 03:12:13 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4042
23-Aug-2008 03:12:13 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from start
23-Aug-2008 03:12:13 [rosetta@home] Starting task abinitio_homfrag_71_A_2hl7A_4443_1633_0 using minirosetta version 132
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] Process for abinitio_only62_A_1pgx__4438_2667_0 exited
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] task_state=EXITED for abinitio_only62_A_1pgx__4438_2667_0 from handle_exited_app
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::report_result_error
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] exit status 193
23-Aug-2008 03:12:17 [rosetta@home] Computation for task abinitio_only62_A_1pgx__4438_2667_0 finished
23-Aug-2008 03:12:17 [rosetta@home] Output file abinitio_only62_A_1pgx__4438_2667_0_0 for task abinitio_only62_A_1pgx__4438_2667_0 absent
23-Aug-2008 03:12:17 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_only62_A_1pgx__4438_2667_0 from CS::app_finished
23-Aug-2008 03:12:17 [rosetta@home] Starting abinitio_only62_A_1cc8A_4438_3695_0
23-Aug-2008 03:12:18 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 4061
23-Aug-2008 03:12:18 [rosetta@home] [task_debug] task_state=EXECUTING for abinitio_only62_A_1cc8A_4438_3695_0 from start
23-Aug-2008 03:12:18 [rosetta@home] Starting task abinitio_only62_A_1cc8A_4438_3695_0 using minirosetta version 132
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] Process for abinitio_homfrag_71_A_2hl7A_4443_1633_0 exited
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] task_state=EXITED for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from handle_exited_app
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for abinitio_homfrag_71_A_2hl7A_4443_1633_0 from CS::report_result_error
23-Aug-2008 03:12:21 [rosetta@home] [task_debug] exit status 193

I looked at my old Rosetta results (my own stats): with BOINC 5.10.45 and the 5.96 Rosetta client I had 4 errors in a month.
After upgrading BOINC to version 6.2.X and the new minirosetta app, I see many more errors if ntp is on.
With the ntp daemon stopped, everything works fine without errors.
I tried running ntpdate manually (not the daemon, just a single sync with the time server): after the sync, two workunits failed, and then everything worked again without errors.
I have been running Rosetta for 3 years and I do not know which part of my system causes this: the kernel, BOINC manager, the science app, or ntp.
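
The ntpd "time reset" entries in the system log can be pulled out programmatically to line them up against the BOINC failure times. A rough sketch, assuming the syslog line format shown in this post; the function name `find_time_resets` is hypothetical:

```python
import re

# Matches lines like: "Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s"
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+ntpd\[\d+\]:\s+"
    r"time reset (?P<offset>[+-]?\d+\.\d+) s"
)

def find_time_resets(lines):
    """Return (timestamp, offset_seconds) for each ntpd clock step found."""
    resets = []
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            resets.append((m.group("stamp"), float(m.group("offset"))))
    return resets

sample = [
    "Aug 23 03:09:12 alpha ntpd[13389]: time reset -0.175490 s",
    "Aug 23 03:09:33 alpha ntpd[13389]: synchronized to 77.234.200.98, stratum 4",
    "Aug 23 03:10:29 alpha ntpd[13389]: synchronized to 87.236.24.179, stratum 2",
]

for stamp, offset in find_time_resets(sample):
    direction = "backward" if offset < 0 else "forward"
    print(f"{stamp}: clock stepped {direction} by {abs(offset):.6f} s")
```

In the log above, the backward step at 03:09:12 comes about three minutes before the first `exit status 193` at 03:12:11.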
10) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55233)
Posted 23 Aug 2008 by lusvladimir
Post:
<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)

I observed that the Rosetta model I was processing failed with this error after an ntp daemon resynch on my Linux machine.
The system clock, when adjusted on a routine resynch, caused the running model to fail because its understanding of time steps changed outside of the model.
I temporarily stopped the ntp daemon and no longer see this error.
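
The failure mode described here, where elapsed time goes wrong when the wall clock is stepped, is commonly avoided by timing with a monotonic clock. The following is a general illustration in Python, not the minirosetta or BOINC implementation; the `simulated_step` parameter is a hypothetical stand-in for an ntpd adjustment:

```python
import time

def elapsed_wall(start_wall, simulated_step=0.0):
    """Elapsed time from the wall clock; a backward clock step can make it negative."""
    return (time.time() + simulated_step) - start_wall

def elapsed_monotonic(start_mono):
    """Elapsed time from a monotonic clock; unaffected by wall-clock steps."""
    return time.monotonic() - start_mono

start_wall = time.time()
start_mono = time.monotonic()
time.sleep(0.01)

# Pretend ntpd stepped the clock back by 0.175490 s (a hypothetical step size).
wall = elapsed_wall(start_wall, simulated_step=-0.175490)
mono = elapsed_monotonic(start_mono)

print(f"wall-clock elapsed: {wall:.6f} s")
print(f"monotonic elapsed:  {mono:.6f} s")
```

Code that assumes elapsed time is never negative would be upset by the first value but not the second.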
11) Message boards : Number crunching : Minirosetta v1.32 bug thread (Message 55173)
Posted 19 Aug 2008 by lusvladimir
Post:
Please post bugs/issues with minirosetta v1.32 here.


Debian Linux; BOINC Manager 6.2.14

Errors from 1.32 tasks:

http://boinc.bakerlab.org/rosetta/result.php?resultid=185499939
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493038
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493027
http://boinc.bakerlab.org/rosetta/result.php?resultid=185493026
http://boinc.bakerlab.org/rosetta/result.php?resultid=185489375


<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 3600
needs psipred_ss2 to run filters
needs psipred_ss2 to run filters
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x8926f8f]
[0x89514e0]
[0xb7f19400]
[0x880c924]
[0x834349c]
[0x88bcc81]
[0x880c5d6]
[0x829591c]
[0x85f20f6]
[0x8072b5d]
[0x807e2e7]
[0x8165bee]
[0x80abecc]
[0x80a9ea4]
[0x80d7044]
[0x80d8651]
[0x804b9f8]
[0x89acfdc]
[0x8048111]

Exiting...

</stderr_txt>
]]>
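
The three values in `process exited with code 193 (0xc1, -63)` appear to be the same number shown in decimal, in hexadecimal, and as a signed 8-bit value. A small sketch of that conversion, under that assumption (the helper name `describe_exit_code` is made up for illustration):

```python
def describe_exit_code(code):
    """Return (decimal, hex string, signed 8-bit value) for an exit code in 0..255."""
    if not 0 <= code <= 255:
        raise ValueError("expected an 8-bit exit code")
    # Reinterpret the byte as two's-complement signed.
    signed = code - 256 if code >= 128 else code
    return code, hex(code), signed

print(describe_exit_code(193))
```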
12) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49877)
Posted 21 Dec 2007 by lusvladimir
Post:
Ubuntu 7.10 and Core2Duo
The progress indicators do not advance: my two WUs show 0% and 0.014%.
I have waited 5 hours; progress is frozen and CPU usage is 100% for both WUs.





