Problems with Minirosetta Version 1.64/1.65

LizzieBarry

Joined: 25 Feb 08
Posts: 76
Credit: 201,862
RAC: 0
Message 61031 - Posted: 6 May 2009, 10:33:14 UTC - in response to Message 61026.  

I've tracked down the crashes in the docking runs to 4 runs within the batch. I've now fixed the problem and should rerun those soon (first on ralph).

I'm sorry for the trouble this has caused.

Thanks for your quick feedback!

Sarel

I think I got one of those:

Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0

It looks like it ran for 10 seconds but in fact it ran for over two hours

06/05/2009 08:45:01 rosetta@home Starting task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 using minirosetta version 165
06/05/2009 08:48:22 rosetta@home Restarting task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 using minirosetta version 165
06/05/2009 08:51:46 rosetta@home Restarting task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 using minirosetta version 165
[...]
[...]
06/05/2009 10:53:38 rosetta@home Restarting task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 using minirosetta version 165
06/05/2009 10:57:02 rosetta@home Restarting task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 using minirosetta version 165
06/05/2009 11:02:22 rosetta@home Computation for task Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0 finished
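For anyone wanting to check the wall-clock time from a message log like the one above, here is a minimal sketch. It assumes the client writes dates in day/month/year order (consistent with the 6 May post date), and the two timestamps are simply copied from the first and last log lines.

```python
from datetime import datetime

# Timestamps copied from the "Starting task" line and the final
# "Computation ... finished" line of the log above.
# Assumption: the date is written day/month/year.
FMT = "%d/%m/%Y %H:%M:%S"
started = datetime.strptime("06/05/2009 08:45:01", FMT)
finished = datetime.strptime("06/05/2009 11:02:22", FMT)

# Subtracting two datetimes gives a timedelta (wall-clock time, not CPU time).
elapsed = finished - started
print(elapsed)  # 2:17:21
```

That is wall-clock time between start and finish, which is why it can differ so much from the reported run time when a task keeps restarting.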

In the absence of stating which jobs they are I'll be aborting all docking...11809_*.* jobs
ID: 61031
Sid Celery
Joined: 11 Feb 08
Posts: 2140
Credit: 41,518,559
RAC: 10,612
Message 61034 - Posted: 6 May 2009, 14:22:40 UTC - in response to Message 61031.  

I've tracked down the crashes in the docking runs to 4 runs within the batch. I've now fixed the problem and should rerun those soon (first on ralph).

I'm sorry for the trouble this has caused.

I think I got one of those:

Docking_benchmark_natives__2KAI.mppk.pdb.gzdock_score12_hi.xml_11809_6_0

It looks like it ran for 10 seconds but in fact it ran for over two hours

In the absence of stating which jobs they are I'll be aborting all docking...11809_*.* jobs

If this was one of those jobs, why has it gone back out to another user?

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=227006233
ID: 61034
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 61035 - Posted: 6 May 2009, 18:52:44 UTC
Last modified: 6 May 2009, 19:01:26 UTC

Hi

I have my first error with 1.65:

https://boinc.bakerlab.org/rosetta/result.php?resultid=248333213

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741680 (0xc0000090)
</message>
<stderr_txt>
[2009- 5- 6 18: 7: 0:] :: BOINC:: Initializing ... ok.
[2009- 5- 6 18: 7: 0:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev29025.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _U18X26X_00001
Starting work on structure: _U18X26X_00002
Starting work on structure: _U18X26X_00003
Starting work on structure: _U18X26X_00004
Starting work on structure: _U18X26X_00005
Starting work on structure: _U18X26X_00006
Starting work on structure: _U18X26X_00007
Starting work on structure: _U18X26X_00008
Starting work on structure: _U18X26X_00009
Starting work on structure: _U18X26X_00010
Starting work on structure: _U18X26X_00011


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Float Invalid Operation (0xc0000090) at address 0x0050A2B9

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>


2.2 h lost...


and for 1.64

https://boinc.bakerlab.org/rosetta/result.php?resultid=247697671

<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741680 (0xc0000090)
</message>
<stderr_txt>
[2009- 5- 4 20:20:35:] :: BOINC:: Initializing ... ok.
[2009- 5- 4 20:20:35:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev29025.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _U23X25X_00001
Starting work on structure: _U23X25X_00002
Starting work on structure: _U23X25X_00003
Starting work on structure: _U23X25X_00004
Starting work on structure: _U23X25X_00005
Starting work on structure: _U23X25X_00006
Starting work on structure: _U23X25X_00007
Starting work on structure: _U23X25X_00008
Starting work on structure: _U23X25X_00009


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Float Invalid Operation (0xc0000090) at address 0x0050AEE9

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>

almost 2h lost...
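As a side note on reading these exit codes: the negative decimal number and the hex value in parentheses are the same 32-bit value, just shown signed and unsigned. A small sketch of the conversion follows. The function name and the lookup table are made up for illustration; the two status names are the standard Windows NTSTATUS names for those values.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

# Illustrative lookup of two Windows NTSTATUS values reported in this thread.
KNOWN = {
    "0xc0000090": "STATUS_FLOAT_INVALID_OPERATION",
    "0xc0000005": "STATUS_ACCESS_VIOLATION",
}

code = to_ntstatus(-1073741680)
print(code, KNOWN.get(code, "unknown"))
```

So exit code -1073741680 matches the "Float Invalid Operation (0xc0000090)" reason printed in the stderr output.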
WWW of Polish National Team - Join! Crunch! Win!
ID: 61035
ViVac
Joined: 10 Dec 08
Posts: 4
Credit: 117,352
RAC: 0
Message 61040 - Posted: 6 May 2009, 22:25:59 UTC

Another (0xc0000005) error here.


https://boinc.bakerlab.org/rosetta/result.php?resultid=249318829
ID: 61040
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 61047 - Posted: 7 May 2009, 15:18:42 UTC

Task 249354645 failed on Mac during initialization. It's actually the 3rd time it was sent out (16th April was the first; I suspect this predates version 1.65).

Watchdog active.
# cpu_run_time_pref: 14400
Hbond tripped: [2009- 5- 6 21:40:20:]

ERROR: dis==0 in pairtermderiv!
ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 333
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 61047
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 61048 - Posted: 7 May 2009, 20:02:51 UTC

This task 249446890 (lr8_seq_score12_rlbd_1wd6_IGNORE_THE_REST_DECOY_11810_826_0) looks OK in the BOINC task pane on Mac but the graphics are strange.

No protein is displayed and the step remains stuck at 0, although the task number slowly increments as usual.
ID: 61048
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 61052 - Posted: 8 May 2009, 5:37:09 UTC - in response to Message 61048.  

This task 249446890 (lr8_seq_score12_rlbd_1wd6_IGNORE_THE_REST_DECOY_11810_826_0) looks OK in the BOINC task pane on Mac but the graphics are strange.

No protein is displayed and the step remains stuck at 0, although the task number slowly increments as usual.


I have a couple of these exhibiting the same odd graphics on my Mac as well. lr8_seq_score12_rlbd_4ubp_IGNORE_THE_REST_DECOY_11810_866_0 has finished successfully albeit very quickly; it completed 99 models in 12737 seconds.

Snags
ID: 61052
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,337
RAC: 1
Message 61053 - Posted: 8 May 2009, 6:16:25 UTC - in response to Message 61052.  


I have a couple of these exhibiting the same odd graphics on my Mac as well. lr8_seq_score12_rlbd_4ubp_IGNORE_THE_REST_DECOY_11810_866_0 has finished successfully albeit very quickly; it completed 99 models in 12737 seconds.

lr8 tasks are known to finish well before the run time preference is met. This is perfectly normal; I have had a number of these on my Windows host.
Have a crunching good day!!
ID: 61053
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 61054 - Posted: 8 May 2009, 6:18:45 UTC

Hi.

I've noticed that too: for the tasks starting with lr8_seq_score12_rlbd_, the graphics are mostly blank. The only things working are the time and model count; the stage says unknown! Otherwise they run O.K.

pete.


ID: 61054
Betting Slip
Joined: 26 Sep 05
Posts: 71
Credit: 5,702,246
RAC: 0
Message 61056 - Posted: 8 May 2009, 10:27:29 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=249163776



<core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
[2009- 5- 6 8:46:50:] :: BOINC:: Initializing ... ok.
[2009- 5- 6 8:46:50:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev29025.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _U6X12X_00001
# cpu_run_time_pref: 86400
Starting work on structure: _U6X12X_00002
Starting work on structure: _U6X12X_00003
Starting work on structure: _U6X12X_00004
Starting work on structure: _U6X12X_00005
Starting work on structure: _U6X12X_00006
Starting work on structure: _U6X12X_00007
Starting work on structure: _U6X12X_00008
Starting work on structure: _U6X12X_00009
Starting work on structure: _U6X12X_00010
Starting work on structure: _U6X12X_00011
Starting work on structure: _U6X12X_00012
Starting work on structure: _U6X12X_00013
Starting work on structure: _U6X12X_00014
Starting work on structure: _U6X12X_00015
Starting work on structure: _U6X12X_00016
Starting work on structure: _U6X12X_00017
Starting work on structure: _U6X12X_00018
Starting work on structure: _U6X12X_00019
Starting work on structure: _U6X12X_00020
Starting work on structure: _U6X12X_00021
Starting work on structure: _U6X12X_00022
Starting work on structure: _U6X12X_00023
Starting work on structure: _U6X12X_00024
Starting work on structure: _U6X12X_00025
Starting work on structure: _U6X12X_00026
Starting work on structure: _U6X12X_00027
Starting work on structure: _U6X12X_00028
Starting work on structure: _U6X12X_00029
Starting work on structure: _U6X12X_00030
Starting work on structure: _U6X12X_00031
Starting work on structure: _U6X12X_00032
Starting work on structure: _U6X12X_00033
Starting work on structure: _U6X12X_00034
Starting work on structure: _U6X12X_00035
Starting work on structure: _U6X12X_00036
Starting work on structure: _U6X12X_00037
Starting work on structure: _U6X12X_00038
Starting work on structure: _U6X12X_00039
[2009- 5- 7 18:23:19:] :: BOINC:: Initializing ... ok.
[2009- 5- 7 18:23:19:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev29025.zip
Setting database description ...
Setting up checkpointing ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 86400
Starting work on structure: _U6X12X_00039
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_1 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_2 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_1 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_2 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_3 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_4 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_5 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_6 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_7 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_8 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_9 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage_3_iter1_10 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_U6X12X_00000039_ClassicAbinitio__stage4_kk_2 ... success!
Starting work on structure: _U6X12X_00040
[2009- 5- 7 19:14:22:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:15: 3:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:15:44:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:16:25:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:17: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:17:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:18:29:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:19:10:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:19:51:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:32:22:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:33: 3:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:33:45:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:34:26:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:35: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:35:48:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:36:29:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:37:11:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:37:52:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:47:39:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:48:21:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:49: 2:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:49:43:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:50:24:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:51: 6:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:51:47:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:52:28:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:53: 9:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:53:50:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:54:32:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:55:13:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:55:54:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:58:39:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:59:21:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20: 1:41:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20: 2:22:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:20:17:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:20:58:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:21:40:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:22:21:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:23: 2:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:23:43:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:24:24:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:25: 6:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:25:47:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:26:28:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:27: 9:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:27:50:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:28:32:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:29:13:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:29:54:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:30:35:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:31:16:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:31:58:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:32:39:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:33:20:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:34: 1:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:34:43:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:35:24:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:36: 5:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:36:46:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:37:28:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:40:35:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:41:16:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:41:57:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:42:39:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:45:45:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:46:26:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:47: 7:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:52:20:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:53: 0:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:53:42:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:54:23:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:55: 4:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:55:45:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:56:27:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:57: 8:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:57:49:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:58:31:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:59:12:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 20:59:53:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 0:34:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 1:16:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 1:57:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 2:38:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 3:19:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21: 4: 0:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:25:23:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:26: 4:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:26:45:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:27:27:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:28: 8:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:28:49:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:29:30:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:30:12:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:30:53:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:31:34:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:32:15:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:32:57:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:33:38:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:34:19:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 21:37: 4:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 188.147749656091
Granted credit 0
application version 1.65
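
For a log this long it can help to count the failed restarts rather than scroll through them. The following is only a sketch: it assumes the stderr text has been saved to a string, and the function name is made up. The sample lines are copied from the output above.

```python
import re

# Matches stderr timestamps such as "[2009- 5- 7 19:15: 3:]";
# the individual fields may be padded with spaces.
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")

def count_lockfile_failures(stderr_text: str):
    """Return (failure_count, first_stamp, last_stamp) for failed restarts."""
    stamps = []
    last_stamp = None
    for line in stderr_text.splitlines():
        m = STAMP.search(line)
        if m:
            # Remember the most recent timestamp seen.
            last_stamp = tuple(int(g) for g in m.groups())
        elif "Can't acquire lockfile" in line and last_stamp is not None:
            # Attribute the failure to the timestamp printed just before it.
            stamps.append(last_stamp)
    if not stamps:
        return 0, None, None
    return len(stamps), stamps[0], stamps[-1]

sample = """[2009- 5- 7 19:14:22:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 5- 7 19:15: 3:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
"""
print(count_lockfile_failures(sample))
```

Run over the full stderr text, this would give the number of "Can't acquire lockfile" exits and the time span they cover.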

ID: 61056
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 61061 - Posted: 8 May 2009, 16:57:55 UTC

A question for all the "error experts". I've got a task running on one of my machines that appears to have stalled.

Here's the line from the messages where it started:

Silver rosetta@home 5/4/2009 18:05:05 Starting abinitio_norelax_homfrag_natfrag_129_B_2hkvA_SAVE_ALL_OUT_6252_7736_0

This is from BoincView, since that program allows me to monitor all my farm machines from one place.

What concerns me is that after almost 4 days it's still taking up a slot on the machine. CPU efficiency reports as zero (not a good sign) and CPU consumed is 1:48:20, but the time to completion is off the scale: 140:06:02.

What I'd like to know is the most useful things I can do to get information about this back to Bakerlab. That includes anything up to and including attaching a debugger to it and poking around inside the process.
ID: 61061
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61062 - Posted: 8 May 2009, 19:56:55 UTC

dgnuff, what is the current status of the task? "Running"? What platform are you running on? What BOINC version?
Rosetta Moderator: Mod.Sense
ID: 61062
svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 61063 - Posted: 8 May 2009, 23:48:42 UTC

I've just had a couple of 1.67 tasks (the first and only two I've received) fail on me at termination on Mac. This happened on Ralph 1.67 also: I reported it then and it should have been fixed.

249719353
249691873

Crashed executable name: minirosetta_1.67_i686-apple-darwin
built using BOINC library version 6.5.0
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.4.11 build 8S2167
Fri May 8 14:26:41 2009

Thread 0 Crashed:
0 ...etta_1.67_i686-apple-darwin 0x00efc851 __ZN7utility7signals9SignalHubIvN4core12conformation7signals16DestructionEventEE11send_signalES5_ + 1593
1 ...etta_1.67_i686-apple-darwin 0x0002a4a7 __ZN4core12conformation12ConformationD1Ev + 7373
2 ...etta_1.67_i686-apple-darwin 0x000910d0 __ZN4core4pose4PoseD1Ev + 4652
3 ...etta_1.67_i686-apple-darwin 0x00518bdc __ZN9protocols3jd214JobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 1730
4 ...etta_1.67_i686-apple-darwin 0x00b59c20 __ZN9protocols3jd219BOINCJobDistributor2goEN7utility7pointer10owning_ptrINS_5moves5MoverEEE + 42
5 ...etta_1.67_i686-apple-darwin 0x0013b068 __ZN9protocols8abinitio24Loopbuild_Threading_mainEv + 720
6 ...etta_1.67_i686-apple-darwin 0x00005db8 _main + 7640
7 ...etta_1.67_i686-apple-darwin 0x0000292e __start + 216
8 ...etta_1.67_i686-apple-darwin 0x00002855 start + 41
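
The frame names in the trace above are mangled C++ symbols. As a rough aid to reading them, here is a toy decoder; it is an illustration only, not a real demangler (a tool such as c++filt does this properly). It assumes the Itanium C++ ABI mangling scheme plus the extra leading underscore that macOS adds, and it handles only plain nested names with no arguments. The function name is made up.

```python
import re

def demangle_simple(symbol: str) -> str:
    """Decode a plain nested name such as __ZN4core4pose4PoseD1Ev.

    Toy decoder: handles length-prefixed name components followed by an
    optional constructor/destructor code and a no-argument signature (Ev).
    Templates and argument types are not handled; such symbols are
    returned unchanged.
    """
    # macOS prefixes symbols with one extra underscore.
    s = symbol[1:] if symbol.startswith("__Z") else symbol
    if not s.startswith("_ZN"):
        return symbol
    pos, parts = 3, []
    # Each component is written as <decimal length><name>.
    while pos < len(s) and s[pos].isdigit():
        digits = re.match(r"\d+", s[pos:]).group()
        pos += len(digits)
        length = int(digits)
        parts.append(s[pos:pos + length])
        pos += length
    rest = s[pos:]
    if not parts:
        return symbol
    if rest in ("D0Ev", "D1Ev", "D2Ev"):    # destructor variants
        parts.append("~" + parts[-1])
    elif rest in ("C1Ev", "C2Ev"):          # constructor variants
        parts.append(parts[-1])
    elif rest != "Ev":                      # anything else is out of scope
        return symbol
    return "::".join(parts) + "()"

for frame in ("__ZN4core12conformation12ConformationD1Ev",
              "__ZN4core4pose4PoseD1Ev",
              "__ZN9protocols8abinitio24Loopbuild_Threading_mainEv"):
    print(demangle_simple(frame))
```

Read this way, frames 1 and 2 are the destructors of core::conformation::Conformation and core::pose::Pose, which fits a crash at termination.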


And could we have a new thread for 1.67 please?
ID: 61063
nick n
Joined: 26 Aug 07
Posts: 49
Credit: 219,102
RAC: 0
Message 61065 - Posted: 9 May 2009, 4:04:02 UTC

Yeah we should start a 1.67 thread
ID: 61065
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 61066 - Posted: 9 May 2009, 5:01:44 UTC - in response to Message 61062.  

dgnuff, what is the current status of the task? "Running"? What platform are you running on? What BOINC version?


Platform: Windows XP SP 3
Boinc client version: 6.4.7
Hardware: Intel Q6600 quad core, 2 GB memory

Link to the host. While I'm at it, Task details and workunit details

According to boincmgr, it's running, high priority.

Keeping that in mind, I ran a VNC session to get to the desktop of the system in question. Task manager shows the process is present, but not using any CPU time.
ID: 61066
Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 61067 - Posted: 9 May 2009, 6:23:01 UTC

I just had one of these crash upon completion on my Mac as well:

threading_lb_test1_hb_t373__IGNORE_THE_REST_11850_55_0

# cpu_run_time_pref: 36000
SIGBUS: bus error

Crashed executable name: minirosetta_1.67_i686-apple-darwin
built using BOINC library version 6.5.0
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.5.6 build 9G55
Fri May 8 23:50:54 2009

sh: /usr/bin/atos: No such file or directory
0 0x006c0345 SIGPIPE: write on a pipe with no reader
1 0x004a3d8e SIGPIPE: write on a pipe with no reader
2 0x90a9e2bb SIGPIPE: write on a pipe with no reader
3 0xffffffff SIGPIPE: write on a pipe with no reader
4 0x0002a4a7 SIGPIPE: write on a pipe with no reader
5 0x000910d0 SIGPIPE: write on a pipe with no reader
6 0x00518bdc SIGPIPE: write on a pipe with no reader
7 0x00b59c20 SIGPIPE: write on a pipe with no reader
8 0x0013b068 SIGPIPE: write on a pipe with no reader
9 0x00005db8 SIGPIPE: write on a pipe with no reader
10 0x0000292e SIGPIPE: write on a pipe with no reader
11 0x00002855
Thread 0 crashed with X86 Thread State (32-bit):
eax: 0xffffffe1 ebx: 0x90a66802 ecx: 0xbfffc26c edx: 0x90a321c6
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffc2a8 esp: 0xbfffc26c
ss: 0x0000001f efl: 0x00000206 eip: 0x90a321c6 cs: 0x00000007
ds: 0x0000001f es: 0x0000001f fs: 0x00000000 gs: 0x00000037

ID: 61067
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61073 - Posted: 9 May 2009, 15:40:09 UTC

dgnuff, when I've seen that before, it always seems to be with that era of BOINC Manager installed. It seems to lose track of tasks sometimes, and now you probably have an idle CPU. I suggest you suspend and release the task and see if it wakes up. If not, keep suspending and releasing it, for a minute or so each time; after 5 such restarts the watchdog will kick it out for not making progress, and it will report back.
Rosetta Moderator: Mod.Sense
ID: 61073
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 61077 - Posted: 9 May 2009, 20:37:33 UTC - in response to Message 61073.  
Last modified: 9 May 2009, 20:40:25 UTC

dgnuff, when I've seen that before, it always seems to be with that era of BOINC Manager installed. It seems to lose track of tasks sometimes, and now you probably have an idle CPU. I suggest you suspend and release the task and see if it wakes up. If not, keep suspending and releasing it, for a minute or so each time; after 5 such restarts the watchdog will kick it out for not making progress, and it will report back.


Thanks for the information. Suspending and resuming it a couple of times unlocked it. I'll keep an eye on it for now, since the completion time is totally wrong. However, the "completion percentage" is increasing at a rate that's correct for my standard run time of 12 hours.

-- Edit --

You indicated that a 6.4.? client can cause this. I notice that the current version for Windows is 6.6.20. Do you think it would be worth my time to download and install that to avoid this problem in the future?
ID: 61077
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,337
RAC: 1
Message 61079 - Posted: 9 May 2009, 23:14:07 UTC - in response to Message 61077.  


You indicated that a 6.4.? client can cause this. I notice that the current version for Windows is 6.6.20. Do you think it would be worth my time to download and install that to avoid this problem in the future?

I'm not Mod.Sense, but in my view it's always best to update to the latest stable version of BOINC, especially if an earlier version is causing a problem, as was indicated in Mod.Sense's post above.
Have a crunching good day!!
ID: 61079
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61086 - Posted: 10 May 2009, 3:32:56 UTC

There have been other problems with the current BOINC release, so I wasn't trying to push you that direction. Just to say that the situation wasn't unique to your machine or the task, or Rosetta version.

Yes, I wouldn't make too much of the completion estimate. It was thrown off by the period of time with no CPU. Now that it is crunching again, it should end in a normal time period for your tasks (I mean, based on your runtime preference plus up to 4 hours).
Rosetta Moderator: Mod.Sense
ID: 61086



©2024 University of Washington
https://www.bakerlab.org