minirosetta 2.17

Tex1954

Joined: 3 Apr 11
Posts: 9
Credit: 3,394,752
RAC: 1
Message 70064 - Posted: 18 Apr 2011, 16:06:46 UTC - in response to Message 70063.  

I am seeing validate errors (with matching wingman results) on tasks whose name has the form of:

T0590_boinc_nmr_homology_max10_loopbuild_threading_cst_relax_tex

A few samples would be:

414980981
414994609
414957506
414950332
415065606


Well, I suggest they give us a storage fee in the form of double normal points to house their flawed software on our systems...

LOL!

8-)

Tex1954

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 70066 - Posted: 18 Apr 2011, 16:18:08 UTC

Tex1954 said: Well, I suggest they give us a storage fee in the form of double normal points to house their flawed software on our systems...


There are even a few additional types of tasks getting validate errors with matching wingman results, but there are fewer of them, so I'm just going to sit back and see what they do with these before sorting through more of the chaff.

It sure would be nice if they would update their server software so that we could pull a task list by Server State / Outcome like some of the other projects have. It would make digging through the results a bunch easier.


Tex1954
Joined: 3 Apr 11
Posts: 9
Credit: 3,394,752
RAC: 1
Message 70069 - Posted: 18 Apr 2011, 19:51:01 UTC - in response to Message 70066.  

There are even a few additional types of tasks getting validate errors with matching wingman results, but there are fewer of them, so I'm just going to sit back and see what they do with these before sorting through more of the chaff.

It sure would be nice if they would update their server software so that we could pull a task list by Server State / Outcome like some of the other projects have. It would make digging through the results a bunch easier.



Well, I am not a developer for their tasks, just a helper with hardware. These (all BOINC etc. tasks) are all cooperative ventures. Sometimes certain folks feel superior and/or embarrassed and clog/break the information circle...

Would be nice if "someone" that actually writes the apps would pop in and let us know somebody is awake! As mentioned before, some sort of Status message germane to the current situation? A one line Sticky NOTE for crying out loud?

LOL!

Anyway, plugging along with the rest of y'all...

8-)

bobgoblin
Joined: 15 Oct 05
Posts: 2
Credit: 1,616,056
RAC: 0
Message 70070 - Posted: 19 Apr 2011, 1:30:21 UTC - in response to Message 70048.  



Your main concern (because you've "disabled Rosetta") seems to be whether things are running properly, or doing harm to your machines. Nothing you've described implies any harm. In fact, the tasks complete in the 3 hours of CPU time that you've (likely) set (or defaulted to) in your R@h preferences.

I think what you are saying is that "wall clock" time is over 12 hours, but actual CPU time is around 3 hours. So the question boils down to asking why tasks might not be receiving CPU time when they are trying to run. This could be due to other tasks on the machine demanding CPU (as BOINC runs at lowest possible priority, and will yield to other tasks).

It seems fairly likely that with one 8 core machine running in 6GB of memory and the other 8 core machine running in 8GB of memory, you would see "waiting for memory" as the status of several tasks rather than "running". This causes BOINC to stop giving the tasks CPU time until the total memory of other active tasks comes back down to be within the preferences set in your BOINC Manager for memory. So, when memory becomes constrained, BOINC is no longer using all of the CPUs of the machine (or all of the CPUs BOINC is configured to use).

This likely is not occurring on your 4-core machine because it has 6GB of memory (50% more per core than the other machines).

This thread has a number of ideas and descriptions of what to expect and what actions you might take to help things run better.



I was not concerned about, nor did I believe, rosetta was causing harm to the i7's. I have not seen the "waiting for memory" message on either machine. The 12+ hour crunch time is a very recent development, only in the last few weeks. Have the memory demands of rosetta increased? If so, then I will not run them on those machines and continue to run seti, cpdn, and einstein instead.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70072 - Posted: 19 Apr 2011, 4:25:39 UTC

R@h memory demands vary considerably with different types of tasks. I've noticed several recently that are using more than 300MB of memory at times.
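For a rough feel for the numbers, here is a minimal sketch in Python. The machine sizes are the ones described earlier in the thread and the 300 MB figure is the one mentioned above; the 50% BOINC memory limit (`boinc_fraction`) is purely an assumed value for illustration, so substitute whatever your own preferences say.

```python
# Machine specs as described earlier in the thread: (cores, RAM in GB).
MACHINES = {
    "8-core / 6 GB": (8, 6.0),
    "8-core / 8 GB": (8, 8.0),
    "4-core / 6 GB": (4, 6.0),
}


def gb_per_core(cores, ram_gb):
    """RAM available per core if every core runs one task."""
    return ram_gb / cores


def tasks_that_fit(ram_gb, boinc_fraction, task_mb):
    """Count how many tasks of task_mb fit in the share of RAM BOINC may use.

    boinc_fraction is a hypothetical stand-in for the BOINC memory
    preference; it is not read from any real configuration.
    """
    usable_mb = ram_gb * 1024 * boinc_fraction
    return int(usable_mb // task_mb)


if __name__ == "__main__":
    for name, (cores, ram_gb) in MACHINES.items():
        fit = tasks_that_fit(ram_gb, boinc_fraction=0.5, task_mb=300)
        print(f"{name}: {gb_per_core(cores, ram_gb):.2f} GB/core, "
              f"{fit} tasks of 300 MB fit at an assumed 50% limit")
```

This only models steady memory use per task; peaks above 300 MB, or a lower "when active" limit, would reduce the number of tasks that fit.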

If you catch it happening again, please look at the BOINC Manager to see if you've got 8 tasks in a "running" status, and then look at Windows task manager to see if all 8 are getting CPU time.

There has been another problem that seems to come up from time to time where the task looks like it is running from BOINC, but is not actually getting any CPU time. And because it never gets CPU time, the R@h watchdog can't take action to end the task or clean it up (because it would need CPU to take any action). As I recall, the only way around that one (other than aborting the task) is to completely end and restart BOINC (just suspending the task and resuming it doesn't seem to resolve the problem). ...with 7 other tasks running, and standing to lose work they've done since their last checkpoints when you restart BOINC, you may be time ahead to just abort such tasks, or if you know you are going to reboot the machine soon, suspend them until you reboot.

I have not been able to determine any patterns as to what makes this occur when BOINC says the task is running, yet doesn't allocate CPU time to it. So, any details about mix with other projects, or number of tasks involved or amount of memory the stalled task shows being used in Windows task manager... hopefully with enough detail a pattern will begin to emerge. I'm not positive, but I believe this has only been occurring on Windows machines, so perhaps that's a start.
Rosetta Moderator: Mod.Sense
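The check described above (are all "running" tasks actually getting CPU time?) can be sketched in Python as a comparison of two CPU-time samples taken some minutes apart. This is only an illustration: the function name, the task labels and the one-second threshold are made up for the example, and how you collect the samples (reading Task Manager by hand, or a process library) is left open.

```python
def find_stalled(before, after, min_cpu_seconds=1.0):
    """Return the tasks whose CPU time barely moved between two samples.

    before / after map a task label to its accumulated CPU seconds.
    A task missing from `after` is assumed to have ended and is skipped.
    """
    stalled = []
    for task, cpu_before in before.items():
        cpu_after = after.get(task)
        if cpu_after is None:
            continue  # task finished or was removed between samples
        if cpu_after - cpu_before < min_cpu_seconds:
            stalled.append(task)
    return sorted(stalled)


if __name__ == "__main__":
    # Hypothetical samples taken ten minutes apart.
    first = {"task_a": 1200.0, "task_b": 845.5, "task_c": 10.0}
    second = {"task_a": 1795.0, "task_b": 845.5}
    print(find_stalled(first, second))
```

A task that BOINC shows as running but that appears in the returned list is a candidate for the stall described above.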

Snags
Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 70075 - Posted: 19 Apr 2011, 15:09:03 UTC - in response to Message 70072.  

I have not been able to determine any patterns as to what makes this occur when BOINC says the task is running, yet doesn't allocate CPU time to it. So, any details about mix with other projects, or number of tasks involved or amount of memory the stalled task shows being used in Windows task manager... hopefully with enough detail a pattern will begin to emerge. I'm not positive, but I believe this has only been occurring on Windows machines, so perhaps that's a start.


Sorry, Mod.Sense, I've seen it occur, albeit extremely rarely, on my Mac. It also occurs on many other projects although it seems more prevalent here on Rosetta; at least, there are more complaints here than on the other boards I peruse. Have you been in contact with Josef Segur? Judging from his most recent contribution to the boinc_dev list ("check_progress option") he has an interest in this problem and could probably point you to discussions elsewhere and/or individuals who are also collecting observations and trying to discern patterns. It also might be helpful if a project the size and import of Rosetta expressed an interest in having BOINC address the issue.

Best,
Snags

bump
Joined: 13 Apr 10
Posts: 1
Credit: 2,315,841
RAC: 0
Message 70077 - Posted: 19 Apr 2011, 17:26:32 UTC

I too am getting the compute errors and I do not believe memory to be the issue. I have 3 boxes and they all get the errors. Two of them have 4GB and they are essentially idle most of the day. According to the stats, a maximum of 2.1 GB of the 4GB has ever been used. Along with the compute errors I am seeing difficulty in uploading finished jobs. Right now I have 10 queued up on one box and they keep getting paused and set for retry.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 70078 - Posted: 19 Apr 2011, 17:33:07 UTC

Yes, just from my own observations it seemed to be more likely to occur when tasks were contending for memory. What makes it hard to study is the lack of messages about tasks being suspended to wait for memory. And when multiple projects were in the mix, it got very difficult to tell whether another task was started due to memory limits, or due to project switching, or what. Another factor is that when you get on to your machine to look at things, your preference for "when active" memory is often less than "when idle", and so now simply observing it is affecting it too.

I never found a way to cause it to happen. And even when memory is constrained, it doesn't seem to happen very often. Yet when it does happen, it seems to come in waves where you see it several times in just a few days, and then not again for weeks or months. Makes me question if the OS is not properly swapping memory back in when a task is resumed.
Rosetta Moderator: Mod.Sense

robertmiles
Joined: 16 Jun 08
Posts: 1234
Credit: 14,338,560
RAC: 1,227
Message 70080 - Posted: 20 Apr 2011, 5:00:41 UTC
Last modified: 20 Apr 2011, 5:13:40 UTC

I've been seeing it occasionally with a computer running 64-bit Windows Vista, with 8 GB memory available and BOINC allowed to use 40% of that. I've already posted more details about some of those workunits earlier in this thread. No error messages about why unless they're in that workunit's log file.

For the last few, the time when it stopped using any CPU time at all was around 1 minute after it resumed processing after the last checkpoint.

I have the same computer participating in most of the other BOINC projects related to medical research, with those that do not have checkpoints currently disabled. Occasionally two minirosetta workunits at a time; three CPU cores set to allow BOINC use, but I don't remember seeing three minirosetta workunits try to run at once.

svincent
Joined: 30 Dec 05
Posts: 219
Credit: 12,120,035
RAC: 0
Message 70143 - Posted: 27 Apr 2011, 16:00:43 UTC

Task (blind_rhoda_boinc_nmr_control.2nz6A_330_abrelax_cs_frags_sgourn_IGNORE_THE_REST_25677_1336_0) 418323998 failed on Mac after about 5 minutes. Other tasks with names like blind* fail similarly.

ERROR: ct == final_atoms
ERROR:: Exit from: src/core/scoring/rms_util.cc line: 475
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

James Thompson
Joined: 13 Oct 05
Posts: 46
Credit: 186,109
RAC: 0
Message 70145 - Posted: 27 Apr 2011, 17:46:35 UTC

Thanks svincent. This is another input file issue, this time from a different user. The jobs have been removed, and we're working on the problem right now.



Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 70186 - Posted: 30 Apr 2011, 21:58:00 UTC
Last modified: 30 Apr 2011, 21:59:40 UTC

So what's with the following

No heartbeat from core client for 30 sec - exiting
messages ?

Job had been sitting doing NOTHING for 13.5 hours (???) which I noticed and subsequently restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing)

https://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2011- 4-30 2:38:23:] :: BOINC:: Initializing ... ok.
[2011- 4-30 2:38:23:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910F1E read attempt to address 0x00000001

Engaging BOINC Windows Runtime Debugger...

[2011- 4-30 21:39:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 21:39:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_2 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 6:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 6:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 7:11:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 7:11:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk1_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk2_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk3_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk4_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk5_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk6_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk7_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk8_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk9_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk10_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk11_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk12_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk13_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk14_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk15_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk16_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk17_fa ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 8:47:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 8:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
======================================================
DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
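A long stderr dump like the one above is easier to reason about once it is reduced to a few counts. The following Python sketch is one way to do that. It assumes the timestamp layout seen in this log (space-padded fields with a trailing colon) and counts a restart on each `boinc_init()` line; the function and key names are invented for the example and are not part of BOINC or Rosetta.

```python
import re
from datetime import datetime

# Matches stamps such as "[2011- 4-30 22: 6:36:]" (fields are space-padded).
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")


def summarize_stderr(text):
    """Count starts, heartbeat exits, crashes and completion in a stderr log."""
    starts = []
    heartbeat_exits = 0
    crashed = False
    finished = False
    for line in text.splitlines():
        stamp = STAMP.search(line)
        if stamp and "boinc_init()" in line:
            # One boinc_init() line is written per (re)start of the task.
            starts.append(datetime(*(int(g) for g in stamp.groups())))
        elif "No heartbeat from core client" in line:
            heartbeat_exits += 1
        elif "Unhandled Exception Detected" in line:
            crashed = True
        elif "called boinc_finish" in line:
            finished = True
    # Wall-clock time between consecutive starts.
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return {
        "starts": len(starts),
        "heartbeat_exits": heartbeat_exits,
        "crashed": crashed,
        "finished": finished,
        "longest_gap": max(gaps) if gaps else None,
    }
```

Feeding it the text between `<stderr_txt>` and `</stderr_txt>` gives the number of restarts, the number of heartbeat exits, and the longest wall-clock gap between two starts, which is a quick way to spot a task that sat idle for hours.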



Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70193 - Posted: 1 May 2011, 7:50:35 UTC - in response to Message 70186.  

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
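To put those two lines in perspective, here is a small Python sketch that pulls the numbers out of the DONE line and compares them with the `cpu_run_time_pref` value printed in the log. It assumes that preference is expressed in seconds; the helper names are made up for the example.

```python
import re

# Pattern for lines like "DONE :: 56 starting structures 14704.5 cpu seconds".
DONE = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+([\d.]+)\s+cpu seconds"
)


def parse_done(line):
    """Return (structures, cpu_seconds) from a DONE line, or None."""
    match = DONE.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def run_summary(done_line, cpu_run_time_pref):
    """Derive a few ratios; cpu_run_time_pref is assumed to be in seconds."""
    structures, cpu_seconds = parse_done(done_line)
    return {
        "cpu_hours": cpu_seconds / 3600,
        "seconds_per_structure": cpu_seconds / structures,
        "fraction_of_pref": cpu_seconds / cpu_run_time_pref,
    }


if __name__ == "__main__":
    line = "DONE :: 56 starting structures 14704.5 cpu seconds"
    for key, value in run_summary(line, cpu_run_time_pref=28800).items():
        print(f"{key}: {value:.2f}")
```

Under that assumption the task used roughly half of its 28800-second CPU time preference, even though far more wall-clock time passed between the first start and the finish.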


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages ?

Job had been sitting doing NOTHING for 13.5 hours (???) which I noticed and subsequently restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing)

https://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
======================================================
DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
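
For anyone who wants to tally the restarts in a log like the one above rather than count them by eye, here is a rough Python sketch. It is my own, not part of BOINC or Rosetta. The function name `summarize_stderr` and the summary keys are made up for illustration, and the marker strings are simply copied from the log text, so treat it as a starting point only.

```python
import re

# Marker strings below are copied from the stderr output in this thread.
# The function and key names are hypothetical, chosen only for this sketch.
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def summarize_stderr(text):
    """Tally starts, heartbeat exits and checkpoint resumes in a stderr log."""
    summary = {
        "starts": 0,
        "heartbeat_exits": 0,
        "checkpoints_resumed": 0,
        "exceptions": 0,
        "finished": False,
        "decoys": None,
    }
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("BOINC :: boinc_init()"):
            # One boinc_init() line is written each time the app starts up.
            summary["starts"] += 1
        elif line.startswith("No heartbeat from core client"):
            summary["heartbeat_exits"] += 1
        elif line.startswith("Continuing computation from checkpoint:"):
            summary["checkpoints_resumed"] += 1
        elif line.startswith("Unhandled Exception Detected"):
            summary["exceptions"] += 1
        elif line == "called boinc_finish":
            summary["finished"] = True
        else:
            match = DECOY_RE.match(line)
            if match:
                summary["decoys"] = int(match.group(1))
    return summary


# A short excerpt in the same format as the log, used as sample input.
SAMPLE = """\
[2011- 4-30 22: 7:11:] :: BOINC :: boinc_init()
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
This process generated 56 decoys from 56 attempts
called boinc_finish
"""

if __name__ == "__main__":
    print(summarize_stderr(SAMPLE))
```

Paste the full stderr text in place of `SAMPLE` to see how many times the task started and how many of those runs ended on a heartbeat exit.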

ID: 70193 · Rating: 0
Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 70198 - Posted: 1 May 2011, 11:20:39 UTC - in response to Message 70193.  

Yep, the task completed after I restarted BOINC: I put it into snooze, then shut it down and started it again (in that order)??

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages?

The job had been sitting doing NOTHING for 13.5 hours (???). When I noticed, I restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing).

https://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

ID: 70198 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70201 - Posted: 1 May 2011, 13:11:04 UTC - in response to Message 70198.  

Well, I guess we will have to wait for the grad student to wake up and come on duty to fully address your question. I found a little something on the BOINC wiki that addresses this issue:

Why am I getting a 'Reason: Access Violation (0xc0000005) error'?

1. Change your preferences to leave Rosetta@Home in memory: General Preferences (log in if you haven't already) -> Edit Preferences (down the bottom) -> "Leave applications in memory while preempted?" Check yes and click the update preferences button. Also remember to "update" the BOINC client software so that the changes are downloaded: open the BOINC Manager, select the "Projects" tab, left-click on "Rosetta@home" to select the project, and click the "Update" button.
2. An error occurred somewhere on the computer; it could have been in the BOINC client software, the Rosetta@Home science application, or any other programme your computer was running at the time. This is not a Rosetta@Home-specific error; as far as I am aware it happens, on occasion, in all BOINC-powered projects with all science applications. Keep Rosetta@Home in memory and ignore this problem if it's not getting out of hand.

I'm going to leave it at that... wait for the big experts.
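
In case it helps whoever digs into this: the "Reason:" line in the stderr output packs several fields into one string. Below is a rough Python sketch that splits it apart. It is mine, not anything from BOINC; the regex and the name `parse_exception` are assumptions based only on the single line in Ian_D's log, so other exception records may not match it.

```python
import re

# Pattern inferred from one "Reason:" line in the thread; this is an
# assumption about the format, not a documented BOINC log specification.
EXCEPTION_RE = re.compile(
    r"Reason: (?P<name>.+?) \((?P<code>0x[0-9A-Fa-f]+)\)"
    r" at address (?P<fault>0x[0-9A-Fa-f]+)"
    r"(?: (?P<access>read|write) attempt to address (?P<target>0x[0-9A-Fa-f]+))?"
)


def parse_exception(line):
    """Return the fields of a 'Reason:' line as a dict, or None if no match."""
    match = EXCEPTION_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the hexadecimal fields to integers where they are present.
    for key in ("code", "fault", "target"):
        if info[key] is not None:
            info[key] = int(info[key], 16)
    return info


# The line as it appears in the task's stderr output.
LINE = (
    "Reason: Access Violation (0xc0000005) at address 0x7C910F1E "
    "read attempt to address 0x00000001"
)

if __name__ == "__main__":
    info = parse_exception(LINE)
    print(info["name"], hex(info["code"]), info["access"], hex(info["target"]))
```

A read from address 0x00000001 looks like a near-null pointer being dereferenced, though that alone doesn't say which program was at fault.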

ID: 70201 · Rating: 0
Ian_D
Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 70205 - Posted: 1 May 2011, 14:50:41 UTC - in response to Message 70201.  

Cheers, Greg!

Well, I guess we will have to wait for the grad student to wake up and come on duty to fully address your question. I found a little something on the BOINC wiki that addresses this issue:

Why am I getting a 'Reason: Access Violation (0xc0000005) error'?

1. Change your preferences to leave Rosetta@Home in memory: General Preferences (log in if you haven't already) -> Edit Preferences (down the bottom) -> "Leave applications in memory while preempted?" Check yes and click the update preferences button. Also remember to "update" the BOINC client software so that the changes are downloaded: open the BOINC Manager, select the "Projects" tab, left-click on "Rosetta@home" to select the project, and click the "Update" button.
2. An error occurred somewhere on the computer; it could have been in the BOINC client software, the Rosetta@Home science application, or any other programme your computer was running at the time. This is not a Rosetta@Home-specific error; as far as I am aware it happens, on occasion, in all BOINC-powered projects with all science applications. Keep Rosetta@Home in memory and ignore this problem if it's not getting out of hand.

I'm going to leave it at that....wait for the big experts

Yep, task completed after I restarted BOINC - put into snooze , then shutdown and started (in that order) ??

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages ?

Job had been sitting doing NOTHING for 13.5 hours (???) which I noticed and subsequently restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing)

https://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2011- 4-30 2:38:23:] :: BOINC:: Initializing ... ok.
[2011- 4-30 2:38:23:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910F1E read attempt to address 0x00000001

Engaging BOINC Windows Runtime Debugger...

[2011- 4-30 21:39:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 21:39:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_2 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 6:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 6:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 7:11:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 7:11:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk1_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk2_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk3_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk4_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk5_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk6_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk7_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk8_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk9_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk10_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk11_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk12_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk13_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk14_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk15_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk16_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk17_fa ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 8:47:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 8:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
======================================================
DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>







ID: 70205
Snags

Joined: 22 Feb 07
Posts: 198
Credit: 2,888,320
RAC: 0
Message 70210 - Posted: 1 May 2011, 17:27:12 UTC - in response to Message 70201.  

Well, I guess we will have to wait for the grad student to wake up and come on duty to fully address your question. I found a little something on the BOINC wiki that addresses this issue:

Why am I getting a 'Reason: Access Violation (0xc0000005)' error?

1. Change your preferences to leave Rosetta@Home in memory: log in and go to General Preferences (if you're not there already) -> Edit Preferences (down the bottom) -> "Leave applications in memory while preempted?" Check yes and click the update preferences button. Also remember to "update" the BOINC client software so that the changes are downloaded: open the BOINC Manager, select the "Projects" tab, left-click on "Rosetta@home" to select the project, and click the "Update" button.
2. An error occurred somewhere on the computer; it could have been the BOINC client software, the Rosetta@Home science application, or any other programme that your computer was running at the time. This is not a Rosetta@Home-specific error; as far as I am aware it happens, on occasion, in all of the BOINC-powered projects with all of the science applications. Keep Rosetta@Home in memory and ignore this problem if it's not getting out of hand.

I'm going to leave it at that....wait for the big experts

Yep, the task completed after I restarted BOINC: put it into snooze, then shut down and started (in that order)??

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages ?

Job had been sitting doing NOTHING for 13.5 hours (???) which I noticed and subsequently restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing)

https://boinc.bakerlab.org/rosetta/result.php?resultid=419011804



The "no heartbeat" message means the science app and the BOINC client lost contact with each other. When the science application doesn't receive the heartbeat (the "I'm alive" message) from BOINC, it is supposed to exit. As long as it was merely a temporary obstruction and BOINC hasn't actually crashed, BOINC should see that the application has stopped, restart it, and proceed merrily on its way. Only when this happens repeatedly with a single task (100 times) does BOINC give up, sending that task back and starting a brand new one. If I'm reading correctly, the "no heartbeat" messages occurred after you had restarted BOINC, and Rosetta was able to complete the task successfully despite them. They may or may not be related to the cause of the error Greg highlighted, which may have led to a BOINC crash that it couldn't recover from without a restart; hence the long delay until you noticed, restarted, and set BOINC and Rosetta on their merry way again.
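
For anyone who wants to see the heartbeat logic spelled out, here is a rough Python sketch of the behaviour described above. It is only a toy simulation, not BOINC's actual code: the `Worker` class and `simulate` function are made-up names, the 30-second timeout is taken from the log message, the 100-restart limit is the figure quoted above, and the restart is assumed to happen immediately, which is a simplification.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 30.0  # seconds; from "No heartbeat from core client for 30 sec"
MAX_RESTARTS = 100        # give-up threshold as quoted in the post (assumption)


@dataclass
class Worker:
    """Hypothetical stand-in for the science application."""
    last_heartbeat: float = 0.0
    running: bool = True

    def on_heartbeat(self, now: float) -> None:
        # The client said "I'm alive"; remember when.
        self.last_heartbeat = now

    def check(self, now: float) -> None:
        # Exit if the client has been silent for longer than the timeout.
        if self.running and now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.running = False


def simulate(heartbeat_times, end_time, step=1.0):
    """Return (number of restarts, whether the client gave up on the task)."""
    beats = sorted(heartbeat_times)
    worker = Worker()
    restarts = 0
    next_beat = 0
    now = 0.0
    while now <= end_time:
        # Deliver every heartbeat that is due by this tick.
        while next_beat < len(beats) and beats[next_beat] <= now:
            worker.on_heartbeat(beats[next_beat])
            next_beat += 1
        worker.check(now)
        if not worker.running:
            restarts += 1
            if restarts >= MAX_RESTARTS:
                return restarts, True  # task is abandoned and sent back
            # Simplification: the task is restarted at once with a fresh timer.
            worker = Worker(last_heartbeat=now)
        now += step
    return restarts, False
```

With heartbeats every second except for one 40-second gap, the simulated worker exits once and carries on after the restart, which is the "temporary obstruction" case; with no heartbeats at all it keeps exiting until the restart limit is hit.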

You might try to recall what else was running on your computer at the time of the "no heartbeat" messages (22:6:36, 22:7:11, 22:8:47, 22:17:41). Anti-virus, anti-spyware, some other maintenance-type scan, indexing? It could be something you started deliberately or something running automatically in the background. I don't suppose you started some new process (indexing, say) between 2:38:23 and the time BOINC stopped (which, if BOINC hadn't been running for 13.5 hours when you restarted, must have been about 8:00. Is that right?). That could point to the cause of the crash and, if the process was ongoing (or maybe set to check for changes, like an index or a backup), could also explain the "no heartbeat" messages.
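
If picking those times out of the stderr by eye gets tedious, something like the following Python sketch could do it. It assumes the timestamp layout seen in the log above (for example `[2011- 4-30 22: 6:36:]`) and reports the start time of each run that directly follows a "No heartbeat" exit; the function name `restarts_after_heartbeat_loss` is just an illustrative choice.

```python
import re
from datetime import datetime

# Matches stamps such as "[2011- 4-30 22: 6:36:]" (fields may be space-padded).
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")
NO_HEARTBEAT = "No heartbeat from core client"


def restarts_after_heartbeat_loss(stderr_text):
    """Return start times of runs that directly followed a 'No heartbeat' exit."""
    found = []
    lost = False  # True once a "No heartbeat" line has been seen
    for line in stderr_text.splitlines():
        if NO_HEARTBEAT in line:
            lost = True
            continue
        match = STAMP.search(line)
        if not match:
            continue
        if lost:
            # First stamped line after the exit marks the restart.
            found.append(datetime(*map(int, match.groups())))
            lost = False
    return found
```

Subtracting consecutive entries of the returned list gives the gaps between restarts, which can then be compared against scheduled scans or backups on the machine.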


Best,
Snags
ID: 70210
TPCBF

Joined: 29 Nov 10
Posts: 111
Credit: 5,143,328
RAC: 1,511
Message 70211 - Posted: 1 May 2011, 21:20:39 UTC

Hey guys, is it really necessary to full quote the same stuff over and over again? :(

Ralf
ID: 70211
Greg_BE

Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 70212 - Posted: 2 May 2011, 0:00:18 UTC - in response to Message 70211.  

Hey guys, is it really necessary to full quote the same stuff over and over again? :(

Ralf



Well in this case it keeps everything together in one block so we can reference ALL the information, the error messages, initial complaint, possible solutions and information about the error.

This is a small enough thread it wasn't that big of a deal.
In bigger threads it can be a problem.

I've been around long enough to remember how some of us, as a joke, created a thread that got very long just by replying to the same quote time after time. Mod remembers this.
So this is just a piddly thread.
ID: 70212
Ian_D

Joined: 21 Sep 05
Posts: 55
Credit: 4,216,173
RAC: 0
Message 70214 - Posted: 2 May 2011, 9:23:19 UTC

Think I may have "solved" this one and, as you so rightly said, it looks like it was a hardware problem. Looking at the system info messages, I've been getting a lot of intermittent paging problems on one of the hard disks around the times of the Reason: Access Violation (0xc0000005) failures.

Cheers for the steer !

Ian


ID: 70214