Rosetta@home

minirosetta 2.17

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 68306 - Posted 31 Oct 2010 23:43:18 UTC

This update fixes the memory and invalidation issues from 2 weeks ago.
Please report bugs here.

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68322 - Posted 1 Nov 2010 20:42:55 UTC - in response to Message ID 68306.

I am new to Rosetta but am getting Tasks "Suspended" a lot. A single Task runs fine when I "Suspend" all the other Tasks, but if I let them all be active (so that 2 are "Running"), that's when the "Suspended" message constantly occurs. I have verified that my PC meets all the "Minimum Requirements". Below is what I'm getting in "Messages".

11/1/2010 1:30:47 PM Suspending computation - CPU usage is too high
11/1/2010 1:30:57 PM Resuming computation
11/1/2010 1:31:08 PM Suspending computation - CPU usage is too high
11/1/2010 1:31:28 PM Resuming computation
11/1/2010 1:31:39 PM Suspending computation - CPU usage is too high
11/1/2010 1:31:49 PM Resuming computation
11/1/2010 1:31:59 PM Suspending computation - CPU usage is too high
11/1/2010 1:32:10 PM Resuming computation
11/1/2010 1:32:20 PM Suspending computation - CPU usage is too high
11/1/2010 1:32:31 PM Resuming computation
11/1/2010 1:32:41 PM Suspending computation - CPU usage is too high
11/1/2010 1:32:51 PM Resuming computation
11/1/2010 1:33:02 PM Suspending computation - CPU usage is too high
11/1/2010 1:33:13 PM Resuming computation
11/1/2010 1:33:23 PM Suspending computation - CPU usage is too high
11/1/2010 1:33:33 PM Resuming computation
11/1/2010 1:33:44 PM Suspending computation - CPU usage is too high
11/1/2010 1:33:55 PM Resuming computation
11/1/2010 1:34:05 PM Suspending computation - CPU usage is too high
11/1/2010 1:34:15 PM Resuming computation
11/1/2010 1:34:26 PM Suspending computation - CPU usage is too high
11/1/2010 1:34:36 PM Resuming computation
11/1/2010 1:34:47 PM Suspending computation - CPU usage is too high
11/1/2010 1:34:57 PM Resuming computation
11/1/2010 1:35:08 PM Suspending computation - CPU usage is too high
11/1/2010 1:35:18 PM Resuming computation
11/1/2010 1:35:29 PM Suspending computation - CPU usage is too high
11/1/2010 1:35:39 PM Resuming computation
11/1/2010 1:35:49 PM Suspending computation - CPU usage is too high
11/1/2010 1:36:00 PM Resuming computation
11/1/2010 1:36:10 PM Suspending computation - CPU usage is too high
11/1/2010 1:36:21 PM Resuming computation

Thanks

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 68323 - Posted 1 Nov 2010 20:52:20 UTC - in response to Message ID 68322.
Last modified: 1 Nov 2010 20:53:20 UTC

This sounds like a problem with the BOINC preferences on your computer. In your BOINC Manager's preferences there is a setting to run the programme "while processor usage is less than X%". If that setting is 25%, then whenever your CPU becomes more than 25% busy, BOINC will suspend activity.

You can either set it to a higher percentage, or set it to 0% so that it is ignored. With 0% set, BOINC will continue running as best it can no matter how busy your CPU is.
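A minimal sketch of the check described above, written in Python for illustration only. It assumes, as the client log line "suspend work if non-BOINC CPU load exceeds 25 %" later in this thread suggests, that only non-BOINC CPU load is compared against the threshold. The function name `should_suspend` is hypothetical; this is not BOINC's actual code.

```python
def should_suspend(non_boinc_cpu_percent: float, threshold_percent: float) -> bool:
    """Simplified model of the "while processor usage is less than X%" rule.

    A threshold of 0 disables the check entirely, so BOINC keeps running
    no matter how busy the CPU is.
    """
    if threshold_percent <= 0:
        return False  # 0% means "ignore CPU usage"
    return non_boinc_cpu_percent > threshold_percent


print(should_suspend(30.0, 25.0))  # True: other programs exceed the 25% limit
print(should_suspend(30.0, 0.0))   # False: check disabled
print(should_suspend(20.0, 25.0))  # False: under the limit
```

Raising the threshold or setting it to 0 both make the function return False for a given load, which is the effect Murasaki describes.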

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68324 - Posted 1 Nov 2010 20:58:36 UTC

BOINC runs at low priority. But the default configuration is there to try and make BOINC as unnoticeable as possible. The thought is that if your machine is busy with other things that are taking 25% of CPU, then running BOINC may conflict with the other work you are trying to do, so it steps out of the way.

But, as Murasaki states, you can increase the value of this cut-out threshold, or set it to zero, and BOINC will try to run more than it has been.
____________
Rosetta Moderator: Mod.Sense

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68328 - Posted 1 Nov 2010 23:24:51 UTC - in response to Message ID 68323.

Thanks for the quick reply and info, both Murasaki & Mod.Sense.
I have been using SETI, and of course BOINC, for a while now. I'm familiar with the CPU usage settings, and mine are at 100% and 0 (run always). I did not have this issue when running SETI Tasks. I am only running Rosetta now because SETI is down for a server upgrade, so no other projects are competing for the CPU.

Below is a snippet of my "Messages". You'll notice that at 2:01 I had "Suspended" all tasks except one, and it then ran fine for an hour, until 3:01. (Keep in mind I am not using any other applications/programs during this time; Rosetta is the only thing running.)
At 3:01 I decided to "Resume" all my "Suspended" Rosetta tasks. Once I did that, I started getting the "Suspending computation - CPU usage is too high" message.
At 3:22 I "Suspended" all Rosetta Tasks but one, and it worked fine until 3:30, when I got on the Message Board, which of course pushed my CPU usage over the limit. So the Tasks DO run well, but only when 1 Rosetta Task is running at a time. As I said earlier, I have been running multiple SETI Tasks for over a month and have not had an issue like this.

Thanks for any input.


11/1/2010 2:01:00 PM Resuming computation
11/1/2010 3:01:02 PM rosetta@home task T0572_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln23_SAVE_ALL_OUT_22447_312_0 resumed by user
11/1/2010 3:01:02 PM rosetta@home Restarting task T0572_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln23_SAVE_ALL_OUT_22447_312_0 using minirosetta version 217
11/1/2010 3:01:21 PM Suspending computation - CPU usage is too high
11/1/2010 3:01:31 PM Resuming computation
11/1/2010 3:01:42 PM Suspending computation - CPU usage is too high
11/1/2010 3:01:46 PM rosetta@home task T0572_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln23_SAVE_ALL_OUT_22447_312_0 suspended by user
11/1/2010 3:01:52 PM Resuming computation
11/1/2010 3:03:53 PM rosetta@home Computation for task celldiv_LPhe_1de2_3bci_ProteinInterfaceDesign_30Oct2010_22401_198_0 finished
11/1/2010 3:03:54 PM rosetta@home Started upload of celldiv_LPhe_1de2_3bci_ProteinInterfaceDesign_30Oct2010_22401_198_0_0
11/1/2010 3:03:57 PM rosetta@home Finished upload of celldiv_LPhe_1de2_3bci_ProteinInterfaceDesign_30Oct2010_22401_198_0_0
11/1/2010 3:20:42 PM rosetta@home update requested by user
11/1/2010 3:20:46 PM rosetta@home Sending scheduler request: Requested by user.
11/1/2010 3:20:46 PM rosetta@home Reporting 1 completed tasks, not requesting new tasks
11/1/2010 3:20:47 PM rosetta@home Scheduler request completed
11/1/2010 3:21:08 PM rosetta@home task T0572_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln23_SAVE_ALL_OUT_22447_312_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_2av4_ProteinInterfaceDesign_29Oct2010_22399_200_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_2aak_ProteinInterfaceDesign_29Oct2010_22399_200_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_2a7m_ProteinInterfaceDesign_29Oct2010_22399_200_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_2a1i_ProteinInterfaceDesign_29Oct2010_22399_200_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_1zn6_ProteinInterfaceDesign_29Oct2010_22399_200_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_1rz4_ProteinInterfaceDesign_29Oct2010_22399_203_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_1rw7_ProteinInterfaceDesign_29Oct2010_22399_203_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldiv_LPhe_1de2_3boe_ProteinInterfaceDesign_30Oct2010_22401_198_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldiv_LPhe_1de2_3b5v_ProteinInterfaceDesign_30Oct2010_22401_198_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldiv_LPhe_1de2_2zrr_ProteinInterfaceDesign_30Oct2010_22401_198_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldiv_LPhe_1de2_2zfy_ProteinInterfaceDesign_30Oct2010_22401_198_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldiv_LPhe_1de2_2qgu_ProteinInterfaceDesign_30Oct2010_22401_197_1 resumed by user
11/1/2010 3:21:08 PM rosetta@home task TEMP_0.05_control_1vcc__SAVE_ALL_OUT_22400_50_1 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0640_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22500_313_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0628_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln3_SAVE_ALL_OUT_22492_290_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0620_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22486_313_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0615_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22483_313_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0540_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22425_313_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task T0535_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln6_SAVE_ALL_OUT_22421_313_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home task celldivs_LPr2_1de2_2vwr_ProteinInterfaceDesign_29Oct2010_22399_203_0 resumed by user
11/1/2010 3:21:08 PM rosetta@home Restarting task T0572_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln23_SAVE_ALL_OUT_22447_312_0 using minirosetta version 217
11/1/2010 3:21:08 PM rosetta@home Starting celldivs_LPr2_1de2_2av4_ProteinInterfaceDesign_29Oct2010_22399_200_0
11/1/2010 3:21:09 PM rosetta@home Starting task celldivs_LPr2_1de2_2av4_ProteinInterfaceDesign_29Oct2010_22399_200_0 using minirosetta version 217
11/1/2010 3:21:15 PM Suspending computation - CPU usage is too high
11/1/2010 3:21:25 PM Resuming computation
11/1/2010 3:21:35 PM Suspending computation - CPU usage is too high
11/1/2010 3:21:46 PM Resuming computation
11/1/2010 3:21:56 PM Suspending computation - CPU usage is too high
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_2av4_ProteinInterfaceDesign_29Oct2010_22399_200_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_2aak_ProteinInterfaceDesign_29Oct2010_22399_200_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_2a7m_ProteinInterfaceDesign_29Oct2010_22399_200_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_2a1i_ProteinInterfaceDesign_29Oct2010_22399_200_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_1zn6_ProteinInterfaceDesign_29Oct2010_22399_200_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_1rz4_ProteinInterfaceDesign_29Oct2010_22399_203_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_1rw7_ProteinInterfaceDesign_29Oct2010_22399_203_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldiv_LPhe_1de2_3boe_ProteinInterfaceDesign_30Oct2010_22401_198_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldiv_LPhe_1de2_3b5v_ProteinInterfaceDesign_30Oct2010_22401_198_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldiv_LPhe_1de2_2zrr_ProteinInterfaceDesign_30Oct2010_22401_198_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldiv_LPhe_1de2_2zfy_ProteinInterfaceDesign_30Oct2010_22401_198_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldiv_LPhe_1de2_2qgu_ProteinInterfaceDesign_30Oct2010_22401_197_1 suspended by user
11/1/2010 3:22:02 PM rosetta@home task TEMP_0.05_control_1vcc__SAVE_ALL_OUT_22400_50_1 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0640_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22500_313_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0628_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln3_SAVE_ALL_OUT_22492_290_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0620_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22486_313_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0615_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22483_313_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0540_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22425_313_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task T0535_AG_rs_stg0_lrlxMultiCst_t000__casp9__aln6_SAVE_ALL_OUT_22421_313_0 suspended by user
11/1/2010 3:22:02 PM rosetta@home task celldivs_LPr2_1de2_2vwr_ProteinInterfaceDesign_29Oct2010_22399_203_0 suspended by user
11/1/2010 3:22:06 PM Resuming computation
11/1/2010 3:22:17 PM Suspending computation - CPU usage is too high
11/1/2010 3:22:27 PM Resuming computation
11/1/2010 3:30:50 PM SETI@home Sending scheduler request: To fetch work.

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 68329 - Posted 1 Nov 2010 23:37:48 UTC - in response to Message ID 68328.
Last modified: 1 Nov 2010 23:40:53 UTC

As you are familiar with BOINC, I assume you are aware that local settings can override web settings. So if you have checked the BOINC Manager preferences on your local computer and they say 0, there isn't anything more I can suggest. Hopefully someone else can provide an answer.
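For reference, the client stores local preferences in an override file. The sketch below assumes that file is named `global_prefs_override.xml` in the BOINC data directory and that the CPU threshold is its `<suspend_cpu_usage>` element, as in BOINC 6.10-era clients to the best of my knowledge. Check your client's documentation before hand-editing it; BOINC Manager normally writes this file for you when you click OK in the local preferences dialog. The helper `build_override_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_override_xml(suspend_cpu_usage: float) -> str:
    """Build a minimal override file body that sets only the CPU threshold.

    Assumed layout: a <global_preferences> root with a <suspend_cpu_usage>
    child holding the percentage (0 disables the check).
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "suspend_cpu_usage").text = str(suspend_cpu_usage)
    return ET.tostring(root, encoding="unicode")


print(build_override_xml(0))
# <global_preferences><suspend_cpu_usage>0</suspend_cpu_usage></global_preferences>
```

Under these assumptions, the client only picks up a hand-edited file when it re-reads local preferences, so restart BOINC after changing it.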

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68330 - Posted 2 Nov 2010 0:37:25 UTC - in response to Message ID 68329.

Thanks Murasaki. Yes, I'm using the local Preferences settings. Thanks for your help though. It's just kind of weird. I'm not going to kill myself over it, but it does force me to manage something I'd rather not. Hopefully there is another suggestion out there.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68334 - Posted 2 Nov 2010 3:14:40 UTC

All I can suggest is that this doesn't sound like how it was intended to work, and so I'd check to see if there is a fix in a newer version of BOINC Manager for such a problem. What BOINC version are you running?
____________
Rosetta Moderator: Mod.Sense

[AF>france>pas-de-calais]symaski62

Joined: Sep 19 05
Posts: 47
ID: 506
Credit: 33,871
RAC: 0
Message 68344 - Posted 2 Nov 2010 16:09:15 UTC

no problem :)

02/11/2010 00:22:36 suspend work if non-BOINC CPU load exceeds 25 %

BOINC.exe (French UI) => Activité (Activity) => 1/3/1

minirosetta_2.17_windows_intelx86.exe 50% (CPU)

Processor: 2 GenuineIntel Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz [Family 6 Model 15 Stepping 13]
Processor: 1.00 MB cache

Preferences: max memory usage when active: 1022.64MB
Preferences: max memory usage when idle: 1022.64MB
Preferences: max CPUs used: 1 <= 50%

voila :)


____________

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68352 - Posted 2 Nov 2010 19:42:13 UTC - in response to Message ID 68334.

Hi Mod.Sense,

I'm using BOINC 6.10.58 and I see that as the current/latest version. It's doing it on 2 of my computers too. Guess I just get the "fun issues". I think I'll let the Tasks that I have complete and then drop Rosetta since that's the only app where I have this issue.

Thanks anyway.

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 68359 - Posted 3 Nov 2010 1:20:50 UTC - in response to Message ID 68352.

I know you went into local prefs to set them, but did you confirm that BOINC is actually using them? The easiest way to do this is to quit BOINC. After restarting BOINC, open the Messages tab and look for "Reading preferences override file". It should be about 20 lines down. If it's not there then, despite your best efforts, BOINC is using website prefs instead of local ones. It may simply be that your copy of BOINC has developed a glitch and a reinstall is all that's needed to set things right again.
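If you copy the Messages tab into a text file, a short script can do this check for you. This is a sketch that assumes the message wording quoted in this thread ("Reading preferences override file" and "General prefs: from ..."); other BOINC versions may word these lines differently. `prefs_source` is a hypothetical helper name.

```python
from typing import List

OVERRIDE_MARKER = "Reading preferences override file"
GENERAL_PREFS_MARKER = "General prefs: from "


def prefs_source(messages: List[str]) -> dict:
    """Report whether the override file was read and which project supplied
    the global (web) preferences, based on startup message lines."""
    result = {"override_file_read": False, "general_prefs_from": None}
    for line in messages:
        if OVERRIDE_MARKER in line:
            result["override_file_read"] = True
        if GENERAL_PREFS_MARKER in line:
            # Text after the marker looks like "SETI@home (last modified ...)";
            # keep only the project name before the parenthesis.
            rest = line.split(GENERAL_PREFS_MARKER, 1)[1]
            result["general_prefs_from"] = rest.split(" (", 1)[0].strip()
    return result


startup = [
    "11/3/2010 10:58:12 AM rosetta@home General prefs: from rosetta@home "
    "(last modified 03-Nov-2010 10:51:03)",
]
print(prefs_source(startup))
# {'override_file_read': False, 'general_prefs_from': 'rosetta@home'}
```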

When using global prefs, the BOINC client uses the most recently modified preferences. Rosetta@home runs somewhat older server code that does not know about this relatively new client setting. IF, when you signed up a few days ago, you made any changes in the computing preferences section of the website AND your BOINC client is insisting, for whatever reason, on using global instead of local prefs, then the client would default to 25% for this setting, and that would explain what's happening. You can confirm where BOINC is reading the global prefs from by looking at those first 20 lines in the Messages tab. You should see a line something like "General prefs: from rosetta@home (last modified 31-Oct-2010 15:20:47)".
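The hypothesis above can be modeled in a few lines. This is a sketch of the reasoning in the post, not confirmed BOINC behavior: it assumes the client takes the web preferences with the latest modification time and falls back to a 25% default when that project's server doesn't send the CPU-threshold field. The dates are illustrative, taken from this thread, and `effective_threshold` is a hypothetical name.

```python
from datetime import datetime

# Assumed fallback when a project's server omits the field (per the post).
DEFAULT_SUSPEND_CPU_USAGE = 25.0


def effective_threshold(project_prefs):
    """project_prefs: list of (project_name, last_modified, prefs_dict).

    Picks the most recently modified web preferences and returns the
    project name plus the CPU threshold they imply.
    """
    project, _modified, chosen = max(project_prefs, key=lambda entry: entry[1])
    return project, chosen.get("suspend_cpu_usage", DEFAULT_SUSPEND_CPU_USAGE)


# Scenario from the post: SETI prefs set the threshold to 0, while newer
# Rosetta prefs come from a server that does not send the field at all.
web_prefs = [
    ("SETI@home", datetime(2010, 9, 21, 9, 26, 17), {"suspend_cpu_usage": 0.0}),
    ("rosetta@home", datetime(2010, 10, 31, 15, 20, 47), {}),
]
print(effective_threshold(web_prefs))  # ('rosetta@home', 25.0)
```

In this model the newer Rosetta prefs win but lack the field, so the client ends up at the 25% default that would produce the repeated "CPU usage is too high" messages.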

IF, while you were running SETI, you edited the computing preferences on the SETI website including changing that 25 to 0 AND your BOINC client is insisting, for whatever reason, on using global instead of local prefs, then, well, you would notice nothing wrong since your global and local prefs would exactly match.

It may be that signing up for a project running older server code has exposed a fault in your copy of BOINC that was previously invisible.

If your client says it's reading the override file but you are still seeing the "cpu usage too high" message and, as in the message line described above, your "general prefs" are from rosetta, then try this: go back to the SETI website and edit the computing preferences again. You don't have to actually change anything; the goal is just to make sure the SETI prefs are the last modified and so the ones BOINC will use. You can check the modified date at the top of the web page. Restart BOINC and see what happens.

Of course, it would be odd for the same glitch to occur on two different PCs at the same time (unless you copied a broken BOINC from the first PC to the second rather than downloading a fresh copy from Berkeley), so... when you changed the local prefs, did you close the window by clicking OK or by some other method? Forgive me if that sounds silly; I'm just trying to cover all the bases.


Best,
Snags

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68369 - Posted 3 Nov 2010 13:15:30 UTC

Good line of thought Snags.

I believe that where Snags says "Restart BOINC and see what happens." in the second-to-last paragraph, they intended to say you should select the "update" option in the projects tab with SETI highlighted (this will bring down the new preference for the CPU threshold as defined in your SETI profile), and then restart BOINC.

To put it another way, there is a new preference for the CPU threshold. Older server code, such as what R@h is presently using, knows nothing of this new field, so the client may be getting a default value if your preferences are coming from R@h. SETI is an example of a project that has the new server code and will let you enter your desired CPU threshold. So we're just trying to force BOINC to pull those preferences down to your machine and run with them.
____________
Rosetta Moderator: Mod.Sense

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68370 - Posted 3 Nov 2010 18:34:46 UTC - in response to Message ID 68359.

Hi Snagletooth & Mod.Sense,

Snags, you are the BOMB!!

I looked at the statement in Messages for what you suggested... "General prefs: from rosetta@home (last modified 31-Oct-2010 15:20:47)".
What I found was "General prefs: from SETI@home (last modified 21-Sep-2010 09:26:17)".

I logged into SETI to update my global "Computing Preferences". Made one change and clicked Update to see if it would work. I exited the BOINC app and then restarted BOINC, looked for the "General prefs" message and it had not changed. (Maybe because the SETI project is currently down)

I then decided to log into Rosetta and modified 1 item in the "Computing Preferences" and clicked Update. I then exited BOINC and restarted it.
Lo and behold I saw "11/3/2010 10:58:12 AM rosetta@home General prefs: from rosetta@home (last modified 03-Nov-2010 10:51:03)".

I'm now running just fine on both PCs with no "CPU usage too high" messages.

Thanks very much to you, Mod.Sense & Murasaki for taking the time with this and providing input. It is VERY much appreciated!!!! I was going through "crunch withdrawal".

Pardner

Rabinovitch

Joined: Apr 28 07
Posts: 28
ID: 170444
Credit: 1,377,008
RAC: 1,448
Message 68374 - Posted 4 Nov 2010 4:14:23 UTC

Twice, on both my PC and my notebook (both running Kubuntu 10.xx amd64), WUs became very "heavy" (more than 1000 MB of RAM). Isn't that a bug?

I have suspended both of them now.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68375 - Posted 4 Nov 2010 6:39:52 UTC
Last modified: 4 Nov 2010 6:47:00 UTC

Rabinovitch, I see you have a number of machines, and are saying at least two are seeing high memory utilization. Could you please locate the specific tasks where you are seeing this? And please double check the Rosetta version they are running... errr... ummm "suspended" under :) (this is shown in the tasks tab in the advanced view under the "application" column).
____________
Rosetta Moderator: Mod.Sense

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 68384 - Posted 4 Nov 2010 15:22:03 UTC - in response to Message ID 68370.

Well, Pardner, I don't know quite why that worked but nonetheless I am very happy it did. (And very glad Mod.Sense quickly caught my omission of the "update" step. Details, details are everything!)

Happy crunching,
Snags

If you are willing to satisfy my curiosity (or, more likely, risk provoking it further), could you say whether you ever saw the "Reading preferences override file" message?

[AF>WildWildWest] Ryzen

Joined: Jul 27 08
Posts: 1
ID: 270904
Credit: 105,860
RAC: 0
Message 68398 - Posted 4 Nov 2010 22:39:15 UTC
Last modified: 4 Nov 2010 22:44:59 UTC

Hello,

I have 2 tasks running but they are taking a very long time!

One has been running for 23 hours and is at 54%, and another has run for 45 hours and is only at 22%!!

What can I do?

Should I let these tasks keep running, or is it better to stop them?

Thank you for your answers

I just checked the memory usage and it's very low: 3.3 MB for one, 1 MB for another and 0.3 MB for the last. Could that be the problem?

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 68400 - Posted 4 Nov 2010 23:10:19 UTC - in response to Message ID 68398.

Look at the properties for each work unit. If the CPU time (time actually spent working on the WU) is reasonably close to the elapsed time (real time), then it is still working OK. If there is a big difference, then it has stopped working. In that case, exit BOINC completely and restart it (or reboot). This should get the programmes working properly. Also, memory usage well below normal indicates a problem.
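The rule of thumb above can be written as a ratio check. The 0.5 cutoff is an assumption for illustration, since the post only says "reasonably close"; `looks_stalled` is a hypothetical helper, and the two times would come from the task's properties in BOINC Manager.

```python
def looks_stalled(cpu_seconds: float, elapsed_seconds: float,
                  min_ratio: float = 0.5) -> bool:
    """True if CPU time lags far behind elapsed (wall-clock) time.

    min_ratio is an assumed cutoff for "reasonably close", not a project value.
    """
    if elapsed_seconds <= 0:
        return False  # nothing to judge yet
    return cpu_seconds / elapsed_seconds < min_ratio


HOUR = 3600
print(looks_stalled(20 * HOUR, 23 * HOUR))  # False: 20 h CPU over 23 h elapsed
print(looks_stalled(1 * HOUR, 45 * HOUR))   # True: CPU time far behind
```

A low ratio can also come from other programs competing for the CPU, so treat it as a prompt to look closer rather than proof that the task is stuck.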
____________

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68406 - Posted 5 Nov 2010 1:28:30 UTC

I seem to be having problems with a series of tasks which appear to have come into my system in the last day or so - the common point is that they are all named in the form of "Rossmann3x3_abinitio"

Typical message being put out by failed tasks:

ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context

Sample Task IDs: 377019618, 377071245 and 376994962

I also had one Rossmann2x3_abinitio task run over seven hours (on a system set to a 4-hour preferred run time) and then fail with a repeated message:

OVERFLOW ERROR: Error writing

No indication what was being written.

Sample Task ID: 376994899

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68409 - Posted 5 Nov 2010 4:05:55 UTC
Last modified: 5 Nov 2010 4:10:15 UTC

Moved Chris' post,
Appears these are failing at startup, here's some direct links:

377019618
377071245
376994962

This one ran for more than 7 hours before failing
Overflow error: 376994899
____________
Rosetta Moderator: Mod.Sense

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68412 - Posted 5 Nov 2010 9:31:07 UTC
Last modified: 5 Nov 2010 9:35:01 UTC

A few more examples of the Rossmann2x3_abinitio tasks having problems, running until the watchdog nails them, and spitting out gobs of "OVERFLOW ERROR: Error writing" messages.

376887103
376878933
377023057


Not all of these tasks are failing - here is a Rossmann2x3_abinitio task which ran normally:

376993800

However, when one of these tasks does decide to go renegade and run all the way out to watchdog territory, it can be justifiably reclassified as demon-spawn - I have watched several and they suck up every spare byte of memory on the system like a tax collector on steroids - I just watched one which had nearly 2 gig of memory allocated and resident.

Ouch!

This effectively shut out all other BOINC tasks until it completed. No other tasks were able to start until this task was purged and the memory released.

[AF>france>pas-de-calais]symaski62

Joined: Sep 19 05
Posts: 47
ID: 506
Credit: 33,871
RAC: 0
Message 68415 - Posted 5 Nov 2010 12:01:17 UTC - in response to Message ID 68409.

Moved Chris' post,
Appears these are failing at startup, here's some direct links:

This one ran for more than 7 hours before failing
Overflow error: 376994899


http://boinc.bakerlab.org/rosetta/result.php?resultid=376884639

ERROR: Unable to open file: minirosetta_database/chemical/residue_type_sets/faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/residue_types.txt

ERROR:: Exit from: src/core/chemical/ResidueTypeSet.cc line: 96
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish



____________

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68428 - Posted 6 Nov 2010 12:10:57 UTC

Come on guys - I find it hard to believe that I am the only one seeing these Rossmann2X3 tasks chew up their systems. Some complete, some fail, all are running long and are using nearly 2 gig per task. And all spit out the ominous "OVERFLOW ERROR: Error writing" repeatedly.

Here are two which finished - generating just 1 decoy for eight hours of run time:

377289713
376887410

And here is one which did not (Googling the error message suggests it is trying to create a string longer than the system or compiler allows):

377281598

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 68430 - Posted 6 Nov 2010 13:52:35 UTC

I've checked 2 of mine and they ended in success. Could the problem be that in your case you're using Darwin and Linux? I'm using Windows. I'm not monitoring memory usage, but it does not seem excessive to me, as far as I can tell.
____________

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68432 - Posted 6 Nov 2010 14:07:42 UTC

Transient -

You could be right about it being a problem unique to Linux and OSX (Darwin) - in both cases they very well may be built using the same compiler (GCC?) and it is possible they have stumbled on an awkward spot.

I have no way of knowing - in preparation for the purification ceremonies required to reach a higher state of karma and grace, I no longer own or run a Windows system :)

Ace Casino

Joined: Jul 16 07
Posts: 10
ID: 191064
Credit: 3,575,075
RAC: 3,280
Message 68433 - Posted 6 Nov 2010 20:20:34 UTC

Getting computational errors.

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 68436 - Posted 6 Nov 2010 20:59:36 UTC - in response to Message ID 68433.

Getting computational errors.



You're getting file transfer errors,
error -161 to be precise:
<message>
<file_xfer_error>
<file_name>TEMP_0.01_control_1shfA_SAVE_ALL_OUT_22400_68_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Your system is processing the tasks just fine, but when it comes to writing the data there is a problem.

AtHomer

Joined: Jan 26 10
Posts: 13
ID: 368008
Credit: 2,475,105
RAC: 492
Message 68437 - Posted 6 Nov 2010 21:19:59 UTC
Last modified: 6 Nov 2010 21:20:32 UTC

I have had two of those "Rossmann" WUs today and they both "crashed". They just kept on running for hours, the last checkpoint having been over three hours ago. I have spent over 12 hours of crunching today on these runaway tasks. Such a waste of resources! Is there no way to prevent this?

When a task has had its last checkpoint a long time in the past, it would be better to stop it right away and download a new one, right? Whenever I see a task like this I abort it manually.

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 68438 - Posted 6 Nov 2010 23:08:29 UTC - in response to Message ID 68437.

Such a waste of resources! Is there no way to prevent this?


The watchdog should shut the task down automatically when you reach
4 hours past your preferred run time. For example, if you have a runtime
of 10 hours a task will terminate at 14 hours if it has not been able to
checkpoint before then.
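Here is the cutoff arithmetic as a small sketch. It assumes the fixed 4-hour grace period described above. The names are hypothetical and are not taken from the Rosetta code.

```python
WATCHDOG_GRACE_HOURS = 4  # grace period past the run-time preference, as described above


def watchdog_cutoff_hours(preferred_runtime_hours: float) -> float:
    """Run time after which the watchdog ends a task that has not been able to checkpoint."""
    return preferred_runtime_hours + WATCHDOG_GRACE_HOURS


print(watchdog_cutoff_hours(10))  # 14
print(watchdog_cutoff_hours(4))   # 8
```

An 8-hour preference gives a 12-hour cutoff. That matches the "8+4 hours" reported for one of the Rossmann tasks in this thread.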

Is this a waste of resources? Yes, but it is seen as a reasonable
balance between stopping rogue tasks that aren't working properly and
not wasting good tasks that are just being a little slow in reaching a
checkpoint.

Is there a better way to achieve that balance? Perhaps, but I
personally don't have a good answer.

Michael Gould

Joined: Feb 3 10
Posts: 39
ID: 368947
Credit: 1,149,075
RAC: 0
Message 68440 - Posted 7 Nov 2010 7:09:45 UTC - in response to Message ID 68432.


You could be right about it being a problem unique to Linux and OSX (Darwin)...


Chris, you obviously run many more WU's than I do, but I haven't had any errors at all running them on my OS X machine. There is a Ross2X3 running as I type this. And I only have 2 gig of total ram installed.

Perhaps only certain WU's are problematic? The larger molecules, I guess.

Ace Casino

Joined: Jul 16 07
Posts: 10
ID: 191064
Credit: 3,575,075
RAC: 3,280
Message 68441 - Posted 7 Nov 2010 10:40:21 UTC

@Greg_BE,
If you look at my compute errors, you will see that after the WU was sent out to a second party, it errored out again. So, it's not a problem on my side.

AdeB

Joined: Dec 12 06
Posts: 45
ID: 135244
Credit: 2,358,915
RAC: 2,105
Message 68442 - Posted 7 Nov 2010 10:46:53 UTC - in response to Message ID 68440.


You could be right about it being a problem unique to Linux and OSX (Darwin)...


Chris, you obviously run many more WU's than I do, but I haven't had any errors at all running them on my OS X machine. There is a Ross2X3 running as I type this. And I only have 2 gig of total ram installed.

Perhaps only certain WU's are problematic? The larger molecules, I guess.


And I haven't had any errors on my linux machine. Even one of Chris' linux machines has no problems with them. Could it be machine specific?

Adeb
____________

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68444 - Posted 7 Nov 2010 14:59:11 UTC
Last modified: 7 Nov 2010 15:04:04 UTC

AdeB wondered:

Even one of Chris' Linux machines has no problems with them. Could it be machine specific?


The machine you pointed to has had the issue - although the task did not end in error, it did eat up all of the memory in sight, run until the watchdog killed it, and spit out repeated "OVERFLOW ERROR: Error writing" messages.

Just because the task runs to completion does not mean it's not a problem task. Extreme memory usage plus long runtime can be an issue when one of these tasks pretty much shuts down the other 3 (or 5) cores on a system.

And it is not AMD specific - it happens on my Xeon-based Mac Pro too.

But I do appreciate you taking the time to look at it and offer suggestions, I really do.

A couple of sample tasks from the system AdeB pointed to:

377124278
377297655

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 68455 - Posted 8 Nov 2010 13:10:35 UTC

In several posts Chris wrote:

A few more examples of the Rossmann2x3_abinitio tasks having problems, running until the watchdog nails them, and spitting out gobs of "OVERFLOW ERROR: Error writing" messages.

Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_001_22515_226_0 - Linux 6.10.56
Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_008_22515_182_0 - Darwin 6.10.58
Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_008_22515_1096_0 - Linux 6.10.56

...

Come on guys - I find it hard to believe that I am the only one seeing these Rossmann2X3 tasks chew up their systems. Some complete, some fail, all are running long an are using nearly 2 gig per task. And all spit out the ominous "OVERFLOW ERROR: Error writing" repeatedly.

Here are two which finished - generating just 1 decoy for eight hours of run time:

Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_001_22515_1024_1 - Linux 6.10.56
Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_007_22515_256_0 - Linux 6.10.56

...

You could be right about it being a problem unique to Linux and OSX (Darwin) - in both cases they very well may be built using the same compiler (GCC?) and it is possible they have stumbled on an awkward spot.

I have no way of knowing - in preparation for the purification ceremonies required to reach a higher state of karma and grace, I no longer own or run a Windows system :)

...

AdeB wondered:
Even one of Chris' Linux machines has no problems with them. Could it be machine specific?

The machine you pointed to has had the issue - although the task did not end in error, it did eat up all of the memory in sight, run until the watchdog killed it, and spit out repeated "OVERFLOW ERROR: Error writing" messages.

Just because the task runs to completion does not mean it's not a problem task. Extreme memory usage plus long runtime can be an issue when one of these tasks pretty much shuts down the other 3 (or 5) cores on a system.

And it is not AMD specific - it happens on my Xeon-based Mac Pro too.

But I do appreciate you taking the time to look at it and offer suggestions, I really do.

A couple of sample tasks from the system AdeB pointed to:

Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_004_22515_1706_0 - Linux 6.10.56
Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_004_22515_1706_0 - Linux 6.10.56

I checked a few days ago and I really didn't see any of this, so I've assumed it was OS specific or machine specific, as suggested, but I just glanced at a long-running watchdog-truncated job and find I had the same experience on my W7 x64 laptop.

I've modified Chris's earlier links to show the job names, OS & Boinc version just in case it reveals a more specific pattern of tasks. My task was slightly different in that it does seem to have checkpointed several times before the watchdog cut in at 8+4 hours.

Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_005_22515_1974_0 - Windows 7 64-bit 6.10.58

So the pattern is more specifically "Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_" if that helps.
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 68460 - Posted 8 Nov 2010 17:17:17 UTC
Last modified: 8 Nov 2010 17:21:02 UTC

PCS_PGR122A_v1.frag_18-51_SAVE_ALL_OUT_22518_71_0

Outcome Client error
Client state Compute error
Exit status 1 (0x1)

CPU time 14.30529

stderr out <core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
...
ERROR: First parameter of SVD_Solver constructor MUST be larger than the second parameter
ERROR:: Exit from: ..\..\src\numeric\SVD\SVD_Solver.cc line: 202
BOINC:: Error reading and gzipping output datafile: default.out

Same error from the wingman too.
____________

AdeB

Joined: Dec 12 06
Posts: 45
ID: 135244
Credit: 2,358,915
RAC: 2,105
Message 68462 - Posted 8 Nov 2010 19:37:51 UTC - in response to Message ID 68455.

In several posts Chris wrote:
...

So the pattern is more specifically "Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_" if that helps.


Of course I only did a quick scan, and missed the problematic tasks on Chris' machine.
Sid's approach clearly shows a pattern, nice catch.

AdeB
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 68464 - Posted 8 Nov 2010 22:43:58 UTC - in response to Message ID 68462.

In several posts Chris wrote: ...

...So the pattern is more specifically "Rossmann2x3_abinitio_SAVE_ALL_OUT_design_or28_w_csfrags_" if that helps.


Of course I only did a quick scan, and missed the problematic tasks on Chris' machine.
Sid's approach clearly shows a pattern, nice catch.

It was a possibility it was OS related while no-one else reported differently. It was the "write errors" that made me realise I had the same issue on a different OS.

Also, my error was on an Intel-based laptop, not my AMD desktop (yet), so it's not tied to AMD processors either. It seems to be the task itself (though most went through OK, as Chris originally reported). One for the coders to ponder.
____________

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68468 - Posted 8 Nov 2010 23:16:20 UTC

Thanks a lot, Sid - since it happened on both my Intel-based Xeon processor and my AMD Phenoms, I was pretty sure the issue was not the silicon - however, I could not rule out the OS.

Darwin's kernel is BSD at the core, and BSD and Linux share a compiler and many run-times, so knowing it also happened with Windows was key. Thanks for taking the time to review your tasks.

I had two systems whose queues were just packed to the gills with these tasks - since they used enough memory to bollix up the whole system I had a big abort party after work today.

However, speaking of long-running / low decoy count tasks, while scanning a few other users' task lists I was in awe of these PCS* / PCT tasks which came into the queue over the past few days.

I don't really care that they are watchdog bait, at least they don't seem to be bringing my system to its knees with a 2 gigabyte memory requirement.

Pardner

Joined: Oct 31 10
Posts: 6
ID: 399813
Credit: 3,442
RAC: 0
Message 68471 - Posted 9 Nov 2010 1:57:07 UTC - in response to Message ID 68384.



Hi Snagletooth & Mod.Sense,

Snags you are the BOMB!!

I looked at the statement in Messages for what you suggested... "General prefs: from rosetta@home (last modified 31-Oct-2010 15:20:47)".
What I found was "General prefs: from SETI@home (last modified 21-Sep-2010 09:26:17)".

I logged into SETI to update my global "Computing Preferences". Made one change and clicked Update to see if it would work. I exited the BOINC app and then restarted BOINC, looked for the "General prefs" message and it had not changed. (Maybe because the SETI project is currently down)

I then decided to log into Rosetta and modified 1 item in the "Computing Preferences" and clicked Update. I then exited BOINC and restarted it.
Lo and behold I saw "11/3/2010 10:58:12 AM rosetta@home General prefs: from rosetta@home (last modified 03-Nov-2010 10:51:03)".

I'm now running just fine on both PCs with no "CPU usage too high" messages.

Thanks very much to you, Mod.Sense & Murasaki for taking the time with this and providing input. It is VERY much appreciated!!!! I was going through "crunch withdrawal".

Pardner


Well, Pardner, I don't know quite why that worked but nonetheless I am very happy it did. (And very glad Mod.Sense quickly caught my omission of the "update" step. Details, details are everything!)

Happy crunching,
Snags

If you are willing to satisfy my curiosity (or, more likely, risk provoking it further) you could say whether you ever saw the "Reading preferences override file".


Hi Snags... Just checked back to see if there were any further comments. And yes I do see the "Reading preferences override file" statement in my Messages. Hope that doesn't cause any "provoking". Thanks again for your help.

Pardner

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 68634 - Posted 16 Nov 2010 4:06:04 UTC
Last modified: 16 Nov 2010 4:15:59 UTC

Both of the following tasks completed successfully. My runtime pref is 3 hours.
379301586 took 5.89 hours (credit 158.14/184.97) and 379301549 took 5.64 hours (credit 151.65/184.97). Both are from batch 1FPW_R2. They ran on a stock i7 980X with HT on. I'm just passing the info on, nothing more, nothing less.
Edits= getting links to work correctly
____________
Have a crunching good day!!

cleaner

Joined: Aug 22 10
Posts: 6
ID: 391551
Credit: 26,245
RAC: 0
Message 68637 - Posted 16 Nov 2010 10:06:18 UTC

I am getting a lot of "output file absent" messages lately. It seems almost every work unit now is spitting out that message. Is anyone else having the same issue?

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68640 - Posted 16 Nov 2010 15:51:18 UTC

cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine?
____________
Rosetta Moderator: Mod.Sense

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68643 - Posted 16 Nov 2010 17:27:36 UTC

Cleaner, this topic was posted about earlier in the thread.

I would like to write some details on this topic, but when I downloaded the new BOINC version it says it has a 25% CPU threshold (in the startup messages), but it doesn't seem to be enforcing it. I've updated other local preferences, see the 25% in the global_prefs_override.xml file, updated to R@h, restarted BOINC, but it still doesn't seem to suspend when CPU usage gets high.

Does anyone have specifics on the combination of updating to another project, or account manager, and what causes people to see the CPU threshold being enforced? I'd like to make it happen on my machine, study it in more detail and verify alternatives for establishing the desired setting for the CPU threshold.
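If you want to check the override file by hand, the sketch below reads the threshold out of it. It assumes the value is stored in a `suspend_cpu_usage` element under `global_preferences`, as in the BOINC 6.10-era clients discussed here. Compare that against your own global_prefs_override.xml before relying on it.

```python
import xml.etree.ElementTree as ET

# Example file contents; the element name suspend_cpu_usage is an assumption
# based on 6.10-era clients, so check it against your own override file.
OVERRIDE_XML = """<global_preferences>
   <suspend_cpu_usage>25.000000</suspend_cpu_usage>
</global_preferences>"""


def cpu_usage_threshold(xml_text: str):
    """Return the CPU-usage suspend threshold in percent, or None if the element is absent."""
    root = ET.fromstring(xml_text)
    value = root.findtext("suspend_cpu_usage")
    return float(value) if value is not None else None


print(cpu_usage_threshold(OVERRIDE_XML))  # 25.0
```

A correct value in the file is not enough by itself. As later posts in this thread show, the activity menu must also be set to "Run based on preferences".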

____________
Rosetta Moderator: Mod.Sense

[AF>france>pas-de-calais]symaski62

Joined: Sep 19 05
Posts: 47
ID: 506
Credit: 33,871
RAC: 0
Message 68645 - Posted 16 Nov 2010 20:50:50 UTC

Yes :) I am French.

1 CPU => BOINC 0% CPU & rosetta 100%

2 CPU => BOINC 25% & rosetta 50%

2 CPU => BOINC 0% & rosetta 100%

4 CPU => BOINC 25% & rosetta 25%, 50%, 75%.

4 CPU => BOINC 0% & rosetta 100%




____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68646 - Posted 16 Nov 2010 23:24:48 UTC

Yes, merci à symaski62, I am aware of the setting you are showing. But BOINC is still running at 100% of the CPU when low priority permits it, even if another task is using more than 25% of the CPU for several minutes. The 25% threshold, as shown in the startup messages and in the display you are showing, is being ignored.
____________
Rosetta Moderator: Mod.Sense

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 68647 - Posted 16 Nov 2010 23:55:12 UTC - in response to Message ID 68643.
Last modified: 16 Nov 2010 23:56:08 UTC

I would like to write some details on this topic, but
when I downloaded the new BOINC version it says it has a 25%
CPU threshold (in the startup messages), but it doesn't seem
to be enforcing it. I've updated other local preferences,
see the 25% in the global_prefs_override.xml file, updated
to R@h, restarted BOINC, but it still doesn't seem to
suspend when CPU usage gets high.


I assume in your activity menu you have "Run based on preferences"
selected? It is a simple option that I expect you have probably
checked already, but it is often the simple things that trip
people up.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68649 - Posted 17 Nov 2010 0:19:25 UTC

<<---- smacks forehead. Been a while since I've tripped up on that one. Thanks.
____________
Rosetta Moderator: Mod.Sense

cleaner

Joined: Aug 22 10
Posts: 6
ID: 391551
Credit: 26,245
RAC: 0
Message 68650 - Posted 17 Nov 2010 9:46:34 UTC - in response to Message ID 68640.

cleaner, in looking at a few of your problem tasks, I see what are most likely memory exceptions. The tasks were then sent to another user and ran successfully. You are running with a longer runtime preference and that is one possible reason your machine might eventually hit a problem and another machine might not, but it tends to point to a problem on your machine. Have you tried any of the memory stress test tools? Are you overclocking that machine?


I ran the memory test tool from Microsoft 3 or 4 weeks ago and it tested okay. My machine is not overclocked. I will reset to default prefs and see what happens.

wolfpat

Joined: May 1 10
Posts: 3
ID: 379046
Credit: 836,905
RAC: 301
Message 68667 - Posted 19 Nov 2010 14:44:09 UTC

I've had so much trouble with minirosetta 2.17 that I had to stop running it on two of my machines. The only results I get on them anyway are "Computation Error".

There's no problem with it on my Windows 7 computer. But with my Windows 2000 and my XP machines, it totally louses up Explorer. I have to restart using the reset button to get them to do anything. The only consistent symptoms are that all text disappears and clicking on icons has no response.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68668 - Posted 19 Nov 2010 17:17:45 UTC

The responsiveness of your computer is often related to memory. The active programs are using the memory and when you sit down and start something else, you first have to bring the programs that control the desktop etc. in to memory again.

Your XP machine only has 1GB of memory for 2 processors. That is on the small side, and the tasks that have been running recently are taking memory more on the large side.

You can configure how much memory BOINC is allowed to use, and this will help reserve some space for your other applications. I'd suggest that allowing BOINC to use only one CPU on that machine might be a good compromise. It will only need memory for one task rather than two, and you probably won't have to worry too much about setting any specific memory limits.

Your Win2000 machine has 512MB for one CPU. Again on the small side for what Rosetta would like to have to run well. Your Win7 machine by comparison, where you say things are running well, has 4GB for 2 CPUs.
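In numbers, here is a rough sketch that divides installed RAM evenly among the tasks BOINC runs at once. It ignores what the OS and other programs need, so the real headroom for each task is smaller than shown.

```python
def mb_per_task(ram_mb: int, running_tasks: int) -> int:
    """Installed RAM divided evenly among the tasks BOINC runs at once."""
    return ram_mb // running_tasks


# (installed RAM in MB, CPUs BOINC uses), per the machines described above
machines = {
    "Windows XP": (1024, 2),
    "Windows 2000": (512, 1),
    "Windows 7": (4096, 2),
}
for name, (ram_mb, cpus) in machines.items():
    print(f"{name}: {mb_per_task(ram_mb, cpus)} MB per task")

# Letting BOINC use only one CPU on the XP machine
print(f"Windows XP, 1 CPU: {mb_per_task(1024, 1)} MB per task")
```

Limiting the XP machine to one CPU doubles each task's share from 512 MB to 1024 MB. That is the compromise suggested above.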

Having said all of that, now that I look at the task details I see they all seem to fail while accessing files. In some cases the missing file is the v2.17 program itself. So it was running for an hour, and then disappeared? It sounds like you have something else going on. Perhaps a virus checker is discovering new files and placing them under quarantine?
____________
Rosetta Moderator: Mod.Sense

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68679 - Posted 21 Nov 2010 20:21:47 UTC
Last modified: 21 Nov 2010 20:22:10 UTC

Anyone else seeing a large number of validate errors on tasks whose names start with "rb_11_20"?

I am seeing these on several machines, both OSX and Linux, AMD and Intel.

They are generally really short running - only a few minutes. The joblog says they are shutting down cleanly, but they all seem to get validate errors.

A few sample tasks would be:

380698404
380672414
380781443
380752525

380728668
380708503
380737556

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 68682 - Posted 21 Nov 2010 23:06:01 UTC - in response to Message ID 68679.
Last modified: 21 Nov 2010 23:08:04 UTC

Anyone else seeing a large number of validate errors on tasks whose names start with "rb_11_20"?

I am seeing these on several machines, both OSX and Linux, AMD and Intel.


Each of the tasks you listed also returned errors for your wingmen, though 380672414
got a compute error rather than a validate error. It doesn't appear to be platform
specific as your wingmen were using a mixture of machines including Darwin,
Windows XP and Windows 7.

I have only had one rb_11_20 task go through so far, but it appears to have validated
okay.

googloo

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,953,021
RAC: 7,157
Message 68684 - Posted 23 Nov 2010 1:56:39 UTC

This task:

rb_11_22_20682_38744_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_22593_1483_1

ended after 13:16.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68704 - Posted 24 Nov 2010 18:47:00 UTC

Is anyone else seeing this?

<message>
Maximum memory exceeded
</message>


See details here
____________
Rosetta Moderator: Mod.Sense

LigH

Joined: Sep 7 09
Posts: 19
ID: 343796
Credit: 3,084,011
RAC: 1,882
Message 68777 - Posted 7 Dec 2010 10:29:51 UTC

At the moment, 3 of my 4 tasks are hung ("Processor time" much lower than "Elapsed time", 0% CPU, ~300 MB RAM):


____________
Fun and success!

Jobs: holzon + 12angebote
Hobbies: doom9/Gleitz + PlaneShift

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 68781 - Posted 7 Dec 2010 15:43:57 UTC - in response to Message ID 68777.

At the moment, 3 of my 4 tasks are hung ("Processor time" much lower than "Elapsed time", 0% CPU, ~300 MB RAM):



You have tried exiting BOINC (completely) and restarting BOINC afterwards?

____________

LigH

Joined: Sep 7 09
Posts: 19
ID: 343796
Credit: 3,084,011
RAC: 1,882
Message 68788 - Posted 8 Dec 2010 7:47:37 UTC
Last modified: 8 Dec 2010 8:40:19 UTC

A reboot in the meantime must have unlocked the tasks.

That means I cannot trust BOINC running unattended.
__

P.S.: Quitting and restarting the BOINC manager helped as well.

I wonder if BOINC should implement a detector for hung tasks and restart those up to # times when one is detected "active" but not progressing for at least # minutes.
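That idea could look roughly like the sketch below. It is hypothetical and not an existing BOINC feature; the class, its default thresholds and the sampling scheme are all assumptions. A task counts as hung when its CPU time stops increasing for `stall_minutes`. The detector then asks for a restart and gives up after `max_restarts` restarts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HungTaskDetector:
    """Hypothetical hung-task detector; BOINC does not provide this class.

    stall_minutes: how long CPU time may stay flat before the task counts as hung.
    max_restarts:  how many restarts to try before giving up on the task.
    """
    stall_minutes: float = 30.0
    max_restarts: int = 3
    last_cpu_seconds: float = 0.0
    stalled_since: Optional[float] = None
    restarts: int = 0

    def check(self, now_minutes: float, cpu_seconds: float) -> str:
        """Return 'ok', 'restart' or 'abort' for one sample of the task's CPU time."""
        if cpu_seconds > self.last_cpu_seconds:
            # CPU time went up since the last sample, so the task is progressing
            self.last_cpu_seconds = cpu_seconds
            self.stalled_since = None
            return "ok"
        if self.stalled_since is None:
            self.stalled_since = now_minutes  # first sample with no progress
        if now_minutes - self.stalled_since < self.stall_minutes:
            return "ok"  # not stalled long enough yet
        self.stalled_since = None  # start a fresh stall window after acting
        if self.restarts < self.max_restarts:
            self.restarts += 1
            return "restart"
        return "abort"


# (minutes since start, CPU seconds) samples for a task that hangs after the first one
detector = HungTaskDetector(stall_minutes=30, max_restarts=1)
for minute, cpu in [(0, 100.0), (10, 100.0), (45, 100.0), (80, 100.0), (115, 100.0)]:
    print(minute, detector.check(minute, cpu))
```

Keying on CPU time rather than elapsed time follows the symptom described above: elapsed time keeps growing while processor time stays flat.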
____________
Fun and success!

Jobs: holzon + 12angebote
Hobbies: doom9/Gleitz + PlaneShift

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 68790 - Posted 10 Dec 2010 1:27:35 UTC

I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours:

rhoA8Dec2010_1lb1_2a1i_ProteinInterfaceDesign_8Dec2010_22762_101

mem_prog_run05_centroid_round01_E_subrun_000003_SAVE_ALL_OUT_IGNORE_THE_REST_22743_66868

The delays on uploading seem to be holding up any requests for downloading more workunits from Rosetta@home, and somewhat for workunits from other projects as well.

Is your server for accepting uploads having problems?

Chilean

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,238,180
RAC: 4,709
Message 68792 - Posted 10 Dec 2010 1:51:06 UTC - in response to Message ID 68790.

I have some minirosetta workunits that are finished, but have been trying to upload their outputs for several hours:

rhoA8Dec2010_1lb1_2a1i_ProteinInterfaceDesign_8Dec2010_22762_101

mem_prog_run05_centroid_round01_E_subrun_000003_SAVE_ALL_OUT_IGNORE_THE_REST_22743_66868

The delays on uploading seem to be holding up any requests for downloading more workunits from Rosetta@home, and somewhat for workunits from other projects as well.

Is your server for accepting uploads having problems?


R@H was down for about a whole day or so.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68793 - Posted 10 Dec 2010 1:51:11 UTC

robertmiles, all the servers were down for about 36 hours here and are just recovering now. Pending uploads do not impair downloads, but both servers are currently very busy and you may be seeing the BOINC-imposed delays before it tries again.
____________
Rosetta Moderator: Mod.Sense

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 68809 - Posted 15 Dec 2010 11:50:21 UTC

Anyone else seeing this type of error?

TaskID: 386452405

Name: SerineHydrolase_relax_oh37_010_22774_173_0

ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context

Mad_Max

Joined: Dec 31 09
Posts: 150
ID: 365007
Credit: 4,704,855
RAC: 9,235
Message 68827 - Posted 17 Dec 2010 21:02:09 UTC

A few members of my team reported on our forum that some tasks complete much earlier than the target CPU time. In these cases there seem to be no other errors: the tasks are reported as usual and validated by the server. The calculation time is just much (several times) smaller than the target time. For example, in this task: http://boinc.bakerlab.org/rosetta/result.php?resultid=386683334
# cpu_run_time_pref: 21600
======================================================
DONE :: 2 starting structures 3104.28 cpu seconds
This process generated 2 decoys from 2 attempts

So is this normal? I.e. is there some criterion by which the client finishes the calculation this early (similar to how the watchdog forces the calculation to end when the target time + 4 hours is exceeded, only in reverse)? Or is it some sort of bug?

I have never encountered this myself, but probably because I use a short target time (2 hours), while everyone who reported these tasks uses a long target time (above the default), i.e. 6-12 hours.
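One way to check a finished task for this symptom is to pull the run-time preference and the final CPU total out of the task output quoted above and compare them. This is a hypothetical sketch: the regular expressions assume exactly the line format shown in this post, `early_fraction` is an arbitrary threshold, and the watchdog rule encodes the "target time + 4 hours" behaviour described above.

```python
import re

# Patterns assume the exact output lines quoted in this post.
PREF_RE = re.compile(r"#\s*cpu_run_time_pref:\s*(\d+)")
DONE_RE = re.compile(r"DONE ::\s*(\d+) starting structures\s+([\d.]+) cpu seconds")

# Per the thread: the watchdog ends a task once target time + 4 hours is exceeded.
WATCHDOG_GRACE_S = 4 * 3600


def classify_run(stdout: str, early_fraction: float = 0.5) -> str:
    """Classify a task's CPU use against its run-time preference.

    Returns "early", "normal", "watchdog" or "unknown". `early_fraction`
    is an assumed threshold, not a project setting.
    """
    pref = PREF_RE.search(stdout)
    done = DONE_RE.search(stdout)
    if pref is None or done is None:
        return "unknown"  # output lacks one of the two expected lines
    target = int(pref.group(1))
    used = float(done.group(2))
    if used >= target + WATCHDOG_GRACE_S:
        return "watchdog"
    if used < early_fraction * target:
        return "early"
    return "normal"
```

Applied to the task above (target 21600 s, 3104.28 CPU seconds used), this returns "early", since the run used about 14% of its target.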

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 68828 - Posted 17 Dec 2010 22:43:44 UTC

Mad Max, no. The only other mechanism that might end a task early is if it restarts 5 times without having made any progress (i.e. no checkpoint reached in any of the 5 starts). So if you are starting and stopping BOINC, and rebooting your machine several times while applying patches etc., you can see basically all of the work in progress end this way. But the limit of 5 is high enough that even then it's rarely the cause.
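The give-up rule described here can be modelled as a counter that increments on every start and resets at every checkpoint. This is a hypothetical sketch, not Rosetta's code: whether the limit triggers on the fifth or the sixth start is an assumption, and a real application would have to persist the counter on disk so that it survives BOINC exits and reboots.

```python
from dataclasses import dataclass

# Figure quoted by the moderator; the exact boundary is an assumption.
MAX_STARTS_WITHOUT_CHECKPOINT = 5


@dataclass
class RestartGuard:
    """Hypothetical guard: give up after 5 starts with no checkpoint reached.

    Kept in memory here for illustration; a real task would save this
    counter to disk, since it must survive BOINC exits and reboots.
    """
    starts_without_checkpoint: int = 0

    def on_start(self) -> bool:
        """Record a (re)start; return False if the task should end now."""
        if self.starts_without_checkpoint >= MAX_STARTS_WITHOUT_CHECKPOINT:
            return False
        self.starts_without_checkpoint += 1
        return True

    def on_checkpoint(self) -> None:
        """Progress was saved, so the count of fruitless starts resets."""
        self.starts_without_checkpoint = 0
```

A single checkpoint anywhere in the sequence resets the count, which is why the rule rarely fires unless a machine is restarted repeatedly before a task's first checkpoint.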

I've seen such tasks ending early as well, but I've been unable to isolate a cause. Could you ask the people reporting the problem a few questions? (I'm assuming you are translating... thank you.) Are they running more projects than just R@h? Do the tasks seem to end just after work for another project starts? It looks like 4GB of memory on a 4-CPU machine there; is BOINC running on all 4 CPUs? How much memory is BOINC allowed to use? Is the machine running BOINC 24x7, or is it rebooted each day, or are BOINC's run hours limited by the user preferences? Do tasks that were only partially completed when BOINC was exited seem to end within the first few minutes after BOINC starts again?

These questions are based on some of what I think I'm seeing personally. I'm hoping that with some additional perspective I can nail down specific patterns for the Project Team to investigate. The ultimate would be a series of steps that would CAUSE a task to end prematurely; I've been unable to do that intentionally.
____________
Rosetta Moderator: Mod.Sense

EvoDude

Joined: Nov 6 05
Posts: 21
ID: 9608
Credit: 52,425
RAC: 0
Message 69054 - Posted 9 Jan 2011 3:58:19 UTC

Just came back to Rosetta to stretch the legs on my new i7, but I'm getting continuous 'Download failed' errors when BOINC Manager gets work. The message log for the most recent one is:

09/01/2011 03:51:11 rosetta@home Sending scheduler request: To fetch work.
09/01/2011 03:51:11 rosetta@home Requesting new tasks for CPU
09/01/2011 03:51:13 rosetta@home Scheduler request completed: got 1 new tasks
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_2.17_windows_x86_64.exe
09/01/2011 03:51:15 rosetta@home Started download of minirosetta_graphics_1.92_windows_x86_64.exe
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_2.17_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Giving up on download of minirosetta_graphics_1.92_windows_x86_64.exe: file not found
09/01/2011 03:51:16 rosetta@home Started download of Helvetica.txf
09/01/2011 03:51:16 rosetta@home Started download of minirosetta_database_rev39052.zip
09/01/2011 03:51:17 rosetta@home Giving up on download of Helvetica.txf: file not found
09/01/2011 03:51:17 rosetta@home Giving up on download of minirosetta_database_rev39052.zip: file not found
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:17 rosetta@home Started download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Finished download of 1poh.aahelix03_05.200_v1_3.gz
09/01/2011 03:51:24 rosetta@home Started download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.aahelix09_05.200_v1_3.gz
09/01/2011 03:51:26 rosetta@home Finished download of 1poh.native.pdb
09/01/2011 03:51:26 rosetta@home Started download of helix.psipred_ss2.gz
09/01/2011 03:51:27 rosetta@home Finished download of helix.psipred_ss2.gz


Any ideas, guys? Is anyone else getting this?
____________

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 69068 - Posted 9 Jan 2011 14:41:41 UTC
Last modified: 9 Jan 2011 14:44:29 UTC

I have two WUs with "compute errors" that each ran for over 80 hours of CPU time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050
____________
Defeat Censorship! Wikileaks needs OUR help! Learn how you can help (d/l 'insurance' file), by clicking here. "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech" B. Franklin

UBT - Rick Horn

Joined: Dec 17 05
Posts: 7
ID: 38932
Credit: 283,961
RAC: 0
Message 69087 - Posted 9 Jan 2011 17:57:29 UTC

I'm getting "download failed" messages with all my WUs too.
____________

FreierFriese

Joined: Aug 31 10
Posts: 1
ID: 392885
Credit: 4,159
RAC: 0
Message 69098 - Posted 9 Jan 2011 21:18:46 UTC

Hi there,

I can't upload my results at the moment, although the status page tells me everything is OK.

After 1-2 seconds the upload stops and BOINC tells me "Project file upload handler is missing".

Link to the WU: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=357863331

Does anyone know whether it's my BOINC client or Rosetta that won't let me upload things?

Greetings from Germany.

J.

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 69102 - Posted 9 Jan 2011 22:15:08 UTC - in response to Message ID 69098.

Does anyone know whether it's my BOINC client or Rosetta that won't let me upload things?


The Rosetta file server had a major crash the other day and other users are reporting similar problems.

Randy Proctor

Joined: Mar 24 10
Posts: 4
ID: 374932
Credit: 599,796
RAC: 0
Message 69111 - Posted 10 Jan 2011 7:41:56 UTC

I have about 30 jobs that need to upload, but I keep getting failed uploads. Not sure if this is due to the earlier server crash... any thoughts?

Just a sample of the messages....

Sun Jan 9 18:04:23 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 1 hr 18 min 36 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 18:05:33 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 18:05:33 2011 rosetta@home Backing off 2 hr 5 min 25 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 20:58:59 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 57 min 8 sec on upload of 1A24_new_targets_CONTROL_SAVE_ALL_OUT_21967_3232_0_0
Sun Jan 9 21:00:01 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0: HTTP error
Sun Jan 9 21:00:01 2011 rosetta@home Backing off 3 hr 1 min 16 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3528_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:34 2011 rosetta@home Started upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home [error] Error reported by file upload server: [1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0] locked by file_upload_handler PID=-1
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 39 min 24 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3527_0_0
Sun Jan 9 21:50:36 2011 rosetta@home Temporarily failed upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0: transient upload error
Sun Jan 9 21:50:36 2011 rosetta@home Backing off 2 hr 41 min 3 sec on upload of 1JW3_new_targets_CONTROL_SAVE_ALL_OUT_21967_3526_0_0

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 69142 - Posted 10 Jan 2011 21:11:46 UTC - in response to Message ID 69068.

I now have another task that has 54+ hours of CPU time, 22% progress...


t476_boinc_nmr_cm_rnd1_cs_frags_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22837_909_0


Anyone wanna bet this will also ultimately die with a "compute error" ?



I have two WUs with "compute errors" that each ran for over 80 hours of CPU time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050


____________

Darmok

Joined: Sep 4 09
Posts: 6
ID: 343146
Credit: 178,639
RAC: 0
Message 69148 - Posted 10 Jan 2011 22:36:01 UTC

I have a problem that has never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WUs. BOINC Manager miscalculated, and many of them won't make it to the report deadline. Can someone tell me what will happen and what, if anything, I should do? Models will start to run in High Priority mode, which I want to avoid. Thanks

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69151 - Posted 10 Jan 2011 23:22:24 UTC - in response to Message ID 69148.

I have a problem that has never occurred to me before. Just prior to the crash, I was rejoining R@H (bad timing) and received a bunch of WUs. BOINC Manager miscalculated, and many of them won't make it to the report deadline. Can someone tell me what will happen and what, if anything, I should do? Models will start to run in High Priority mode, which I want to avoid. Thanks


If I understand you correctly, you have more work than you can complete before the deadline. Aborting a few of the tasks that have not started yet is generally the best thing to do. Perhaps also reduce the number of days of work you configure to keep on hand, to avoid getting too much again before BOINC learns how long the tasks take to process.
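To decide how many unstarted tasks to abort, one can project when each queued task would finish. The sketch below assumes a simple first-in, first-out model with accurate runtime estimates and one shared deadline. BOINC's real scheduler behaves differently (it switches to high-priority, deadline-ordered execution when tasks are at risk), so treat the result only as a rough guide. The function name and parameters are hypothetical.

```python
import heapq
from typing import List, Sequence


def tasks_missing_deadline(runtimes_h: Sequence[float], cores: int,
                           deadline_h: float,
                           availability: float = 1.0) -> List[int]:
    """Return queue indexes of tasks projected to finish after the deadline.

    runtimes_h   -- estimated CPU hours per task, in the order they will run
    cores        -- number of CPUs BOINC may use
    deadline_h   -- hours from now until the (shared) report deadline
    availability -- fraction of wall-clock time BOINC is allowed to compute
    """
    if cores < 1 or not 0 < availability <= 1:
        raise ValueError("need at least one core and 0 < availability <= 1")
    free_at = [0.0] * cores  # min-heap: hour at which each core becomes free
    late = []
    for i, hours in enumerate(runtimes_h):
        start = heapq.heappop(free_at)         # the earliest-free core
        finish = start + hours / availability  # stretched by idle time
        heapq.heappush(free_at, finish)
        if finish > deadline_h:
            late.append(i)
    return late
```

For example, ten 6-hour tasks on 2 cores with a 24-hour deadline leave the last two (indexes 8 and 9) late. If BOINC only computes half the time, everything from index 4 onward is late. The late tasks at the end of the queue are the ones worth aborting before they start.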
____________
Rosetta Moderator: Mod.Sense

Darmok

Joined: Sep 4 09
Posts: 6
ID: 343146
Credit: 178,639
RAC: 0
Message 69154 - Posted 11 Jan 2011 0:44:18 UTC - in response to Message ID 69151.

That's exactly what happened. I had forgotten to reduce the work buffer before connecting, which isn't an issue with other projects but is with R@H because of the short deadlines. Too bad about aborting all those WUs, though.
Thanks Mod.

Randy Proctor

Joined: Mar 24 10
Posts: 4
ID: 374932
Credit: 599,796
RAC: 0
Message 69156 - Posted 11 Jan 2011 1:01:20 UTC

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did an OS update to Snow Leopard 10.6.6, but would that cause a problem? I have 40 tasks that are done and need to be uploaded, and the deadline is in 2 days.

Thanks for any help.

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 69158 - Posted 11 Jan 2011 1:15:27 UTC - in response to Message ID 69156.

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did an OS update to Snow Leopard 10.6.6, but would that cause a problem? I have 40 tasks that are done and need to be uploaded, and the deadline is in 2 days.

Thanks for any help.


The file server crashed last week and the Rosetta staff are having problems rebuilding it. Many users are reporting a series of different error messages.

In terms of your uploads and the deadlines it is best not to worry. Normally work is valuable to the scientists even when it comes in late. The best thing you can do is keep an eye on these forums or the home page and wait to see if the Rosetta staff provide further advice.

Randy Proctor

Joined: Mar 24 10
Posts: 4
ID: 374932
Credit: 599,796
RAC: 0
Message 69161 - Posted 11 Jan 2011 1:57:53 UTC
Last modified: 11 Jan 2011 1:58:56 UTC

I had some things upload and not others. Honestly, I don't care much about this credit stuff; I'm in it for the science, because if success can be found with influenza, maybe we can successfully fight worse things (not that I take influenza lightly, because it's a nasty bug).

As long as the scientists get usable results I'm good.

My main concern is that I can't get any more work to continue helping.

TPCBF

Joined: Nov 29 10
Posts: 105
ID: 403518
Credit: 1,704,662
RAC: 2,836
Message 69167 - Posted 11 Jan 2011 3:51:00 UTC - in response to Message ID 69156.

Why would I be receiving this message?

Mon Jan 10 16:59:55 2011 rosetta@home Message from server: platform 'x86_64-apple-darwin' not found


I did an OS update to Snow Leopard 10.6.6, but would that cause a problem?
Certainly not. When I added my iMac while Rosetta was still working, the Mac client worked fine too (it runs on a 2007 24" model, so Intel x86_64 as well). I did the update to 10.6.6 just the other day, after the fit had already hit the shan...

Now, since the project went belly up the last time, I have a mix of WUs showing "ready to report" and "uploading", and going by the stats, at least a couple of WUs must have been credited properly, not only on the Mac but on various Windows machines as well.
All in all, this is one great mess: lots of WUs are stuck at uploading across the board, increasing numbers of those that do upload seem to be stuck as "credit pending", and some WUs just seem to go back and forth fine.

Well, as far as I am concerned, I haven't detached from the project (yet), but WCG gets my priority as far as my resources are concerned...

Ralf

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 69193 - Posted 11 Jan 2011 21:02:02 UTC - in response to Message ID 69142.

a bit over 77 hours, expecting it to crap out soon...



I now have another task that has 54+ hours of CPU time, 22% progress...


t476_boinc_nmr_cm_rnd1_cs_frags_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_22837_909_0


Anyone wanna bet this will also ultimately die with a "compute error" ?



I have two WUs with "compute errors" that each ran for over 80 hours of CPU time, despite what the logs may otherwise claim...


resultid=390949677


resultid=391015050



____________

spamhasser

Joined: Nov 12 10
Posts: 2
ID: 401292
Credit: 308
RAC: 0
Message 69204 - Posted 12 Jan 2011 1:54:48 UTC

Since the last Rosetta crash I can't upload anything to Rosetta's server.

Mi 12 Jan 2011 02:43:24 CET rosetta@home Project file upload handler is missing

The upload is stuck at 2.26% (0.31 of 13.48 kB).

Is there anything i can do to get it done?

If "waiting for someone to solve the problem on Rosetta's end" is the solution, then I will shut up and wait a bit (or a byte, if it takes longer ;-) )

But if I have to do anything other than wait (e.g. resetting Rosetta), let me know :)

(It is still winter here and my CPU should not get cold)

Thx in advance

Ian

Joined: Apr 22 09
Posts: 1
ID: 312730
Credit: 459,642
RAC: 0
Message 69231 - Posted 12 Jan 2011 14:37:12 UTC

I have the same error on all my completed work units: "Project file upload handler is missing". What should I do?

EvoDude

Joined: Nov 6 05
Posts: 21
ID: 9608
Credit: 52,425
RAC: 0
Message 69235 - Posted 12 Jan 2011 15:13:48 UTC

Isn't it a shame no one from the project can be bothered to respond to this? Makes you feel as if no one cares. Very disappointed.
____________

banditwolf

Joined: Jan 10 06
Posts: 28
ID: 49031
Credit: 139,737
RAC: 0
Message 69236 - Posted 12 Jan 2011 15:15:33 UTC

I have 2 of 3 that say compute error when they don't show any signs of having problems.
____________

Guywad

Joined: Apr 9 08
Posts: 1
ID: 252128
Credit: 1,547,823
RAC: 914
Message 69237 - Posted 12 Jan 2011 15:53:18 UTC

I have two separate machines in two locations that have multiple results that all have the same uploading error message (Project file upload handler is missing). I tried rebooting each machine with no difference. Both are running Linux Redhat 13.

iceweazel

Joined: Feb 23 09
Posts: 1
ID: 302960
Credit: 114,165
RAC: 0
Message 69240 - Posted 12 Jan 2011 16:31:39 UTC - in response to Message ID 69231.

You should check the main project page, where they mention that a major SAN storage problem has occurred again. It must be some pretty old junk to have the issues they've had. Heck, I have a first-gen SAN (amongst others) still chugging along here, though it sees nowhere near the kind of disk activity theirs probably does.


Give it a week or three and things should be back to normal.


I have the same error on all my completed work units: "Project file upload handler is missing". What should I do?

aad

Joined: Jan 5 06
Posts: 2
ID: 46945
Credit: 3,642,445
RAC: 6,735
Message 69254 - Posted 12 Jan 2011 20:52:26 UTC

Hmmm... strange...
Of eight jobs, five uploaded normally, but three won't upload.

12-1-2011 20:22:46 rosetta@home Started upload of patrick_test_01_rlbn_1fkb_IGNORE_THE_REST_NATIVE_22848_37_0_0
12-1-2011 20:22:47 rosetta@home Project file upload handler is missing
12-1-2011 20:22:47 rosetta@home Backing off 1 hr 1 min 31 sec on upload of patrick_test_01_rlbn_1fkb_IGNORE_THE_REST_NATIVE_22848_37_0_0

They're all 'patrick_test' jobs!
Maybe that's the true meaning of 'ignore the rest'?

Should I delete the stuff?
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69255 - Posted 12 Jan 2011 21:13:50 UTC - in response to Message ID 69254.

Should I delete the stuff?


No.

____________
Rosetta Moderator: Mod.Sense

EvoDude

Joined: Nov 6 05
Posts: 21
ID: 9608
Credit: 52,425
RAC: 0
Message 69256 - Posted 12 Jan 2011 21:33:12 UTC - in response to Message ID 69255.

Should I delete the stuff?


No.


And that's all you've got to say in response to all the faults and complaints? What a joke of an admin this project has. I've got better things to donate my PC time to than this crap.

Bye Bye Rosetta.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69260 - Posted 12 Jan 2011 22:53:36 UTC
Last modified: 12 Jan 2011 22:54:54 UTC

EvoDude, I am a volunteer moderator, nothing more. I don't have any additional information to offer on the servers, plans or problems, nor do I have access to post messages on the project homepage or access to its servers. So please, don't leave on my account. The "Project Administrator" title next to my profile name comes from the standard BOINC server code and is not an indication of any duties.
____________
Rosetta Moderator: Mod.Sense

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 69284 - Posted 13 Jan 2011 3:45:13 UTC - in response to Message ID 69260.

EvoDude, I am a volunteer moderator, nothing more. I don't have any additional information to offer on the servers, plans or problems, nor do I have access to post messages on the project homepage or access to its servers. So please, don't leave on my account. The "Project Administrator" title next to my profile name comes from the standard BOINC server code and is not an indication of any duties.

Don't beat yourself up. EvoDude's 0 RAC leads me to discover he hasn't run a single task here since 2007 anyway... Makes you wonder... or not...
____________

EvoDude

Joined: Nov 6 05
Posts: 21
ID: 9608
Credit: 52,425
RAC: 0
Message 69294 - Posted 13 Jan 2011 8:25:12 UTC - in response to Message ID 69284.

EvoDude, I am a volunteer moderator, nothing more. I don't have any additional information to offer on the servers, plans or problems, nor do I have access to post messages on the project homepage or access to its servers. So please, don't leave on my account. The "Project Administrator" title next to my profile name comes from the standard BOINC server code and is not an indication of any duties.

Don't beat yourself up. EvoDude's 0 RAC leads me to discover he hasn't run a single task here since 2007 anyway... Makes you wonder... or not...


So, because I've been busy doing other work, particularly at Rosetta's test project where I'm 14th in the world and highest, by a country mile, in the UK means I can't have an opinion on the handling of the volunteers in this part of the project? Get real mate and check your facts properly before you cast aspersions. I PM'd the Mod in question as soon as he responded to discuss this privately, again something you wouldn't know, but let's not let actual fact get in the way of you having a personal dig at me.

My suggestion:- Keep your nose out of my business till you know what you are talking about.
____________

aad

Joined: Jan 5 06
Posts: 2
ID: 46945
Credit: 3,642,445
RAC: 6,735
Message 69295 - Posted 13 Jan 2011 11:36:36 UTC - in response to Message ID 69255.

Should I delete the stuff?


No.

OK.
There's still a week to go before they expire.
But it's "no new work" for now, before it clutters my PC!
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 69305 - Posted 14 Jan 2011 2:33:44 UTC - in response to Message ID 69294.

Whatever opinion you want to express, EvoDude, your "bye bye Rosetta" was both impotent and petulant, irrespective of what you've done on Ralph. However limited the front page info it's been apparent the failure was extensive and mod.sense's reply seems as complete and accurate now as it did at the time. As was mine.
____________

EvoDude

Joined: Nov 6 05
Posts: 21
ID: 9608
Credit: 52,425
RAC: 0
Message 69315 - Posted 14 Jan 2011 8:42:07 UTC - in response to Message ID 69305.

Whatever opinion you want to express, EvoDude, your "bye bye Rosetta" was both impotent and petulant, irrespective of what you've done on Ralph. However limited the front page info it's been apparent the failure was extensive and mod.sense's reply seems as complete and accurate now as it did at the time. As was mine.


Aye aye, whatever. Why you needed to get personal about it I don't know, but my post had nothing to do with you. Maybe if you spent more time addressing the failings of your own site, it would have fewer dead/broken links, and people on here would be allowed freedom of expression and to state their opinion without fear of unwarranted personal abuse.

Subject closed as far as I'm concerned but no doubt you will want to carry on your own wee petulant, impotent rant. Doesn't make you right, you know.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69586 - Posted 2 Feb 2011 18:20:25 UTC

Tasks starting with casd_ seem to be failing after 30 seconds of runtime, and then reporting with validation errors as per reports here. They all seem to show 100 starting structures and 100 decoys... and 1200 CPU seconds in the output, but around 40 in the task's reported CPU time.
____________
Rosetta Moderator: Mod.Sense

Mad_Max

Joined: Dec 31 09
Posts: 150
ID: 365007
Credit: 4,704,855
RAC: 9,235
Message 69721 - Posted 2 Mar 2011 2:00:20 UTC
Last modified: 2 Mar 2011 2:06:37 UTC

I've now caught, on my own computer, several of the tasks ending too early that were mentioned in December. I haven't found what might cause this; everything looks normal to me.
The CPU target time was 2 hours and did not change.
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175973
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175972
http://boinc.bakerlab.org/rosetta/result.php?resultid=403005728

P.S.
In general, I like version 2.17. Over the last 2+ months there have been no serious crashes or memory leaks at all.

[AF>france>pas-de-calais]symaski62

Joined: Sep 19 05
Posts: 47
ID: 506
Credit: 33,871
RAC: 0
Message 69741 - Posted 5 Mar 2011 15:47:46 UTC - in response to Message ID 69721.

I've now caught, on my own computer, several of the tasks ending too early that were mentioned in December. I haven't found what might cause this; everything looks normal to me.
The CPU target time was 2 hours and did not change.
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175973
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175972
http://boinc.bakerlab.org/rosetta/result.php?resultid=403005728

P.S.
In general, I like version 2.17. Over the last 2+ months there have been no serious crashes or memory leaks at all.


Yes.

# cpu_run_time_pref: 7200 sec = 2 hours

http://boinc.bakerlab.org/rosetta/prefs.php?subset=project

Change "Target CPU run time" from 2 hours to 4, 8 or 12 hours, or 1 day.

Then update rosetta@home in BOINC Manager.

:)
____________

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 69743 - Posted 5 Mar 2011 22:18:55 UTC - in response to Message ID 69741.
Last modified: 5 Mar 2011 22:19:56 UTC

I've now caught, on my own computer, several of the tasks ending too early that were mentioned in December. I haven't found what might cause this; everything looks normal to me.
The CPU target time was 2 hours and did not change.
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175973
http://boinc.bakerlab.org/rosetta/result.php?resultid=403175972
http://boinc.bakerlab.org/rosetta/result.php?resultid=403005728

P.S.
In general, I like version 2.17. Over the last 2+ months there have been no serious crashes or memory leaks at all.


yes

# cpu_run_time_pref: 7200 sec = 2H

http://boinc.bakerlab.org/rosetta/prefs.php?subset=project

Target CPU run time => 2H > 4H, 8H, 12H, 1 day

BOINC Update = rosetta@home

:)


I think you are missing Mad Max's point. The preferred run time was set at
2 hours but the three tasks each lasted less than 30 minutes and generated
11 decoys at most.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 69762 - Posted 8 Mar 2011 20:08:47 UTC

I've had some tasks with names starting with Ferredoxin-like_abinitio fail immediately on Mac with the following error:

ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context.

The tasks are:

405034651
405034650
404995301
404995300

Small point: the name of the last task (404995300) is Ferredoxin-lie_abinitio_SAVE_ALL_OUT_design_relax_fr28_009_23275_106_0. The "k" in "like" is missing.

KN_Ikari

Joined: May 27 09
Posts: 1
ID: 318200
Credit: 494,705
RAC: 46
Message 69763 - Posted 8 Mar 2011 20:54:51 UTC

I have the same problem. See my Result page if you want more information.

Saenger

Joined: Sep 19 05
Posts: 270
ID: 537
Credit: 293,819
RAC: 229
Message 69766 - Posted 9 Mar 2011 10:56:19 UTC - in response to Message ID 69763.
Last modified: 9 Mar 2011 10:59:09 UTC

I have the same problem. See my Result page if you want more information.

Same here on Linux.

Edit:
You have to link to host results; account results are hidden from all other users.
See here for me: account and host

Hank Barta

Joined: Feb 6 11
Posts: 14
ID: 410487
Credit: 2,632,256
RAC: 8
Message 69772 - Posted 9 Mar 2011 15:38:07 UTC - in response to Message ID 69766.


Same here on Linux.

Also on Linux, and also apparently the same problem.

Out of 24 completed work units since I set up BOINC/Rosetta yesterday, 9 have finished with the compute error "ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context".

Can I presume that this represents a bug in the S/W or should I be looking for H/W problems?

the latest result is: http://boinc.bakerlab.org/rosetta/result.php?resultid=405000927

(Interesting to note that two other machines I have crunching have not encountered compute errors. However they have faster processors so they may be getting different types of work units.)

thanks,
hank

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69774 - Posted 9 Mar 2011 17:40:04 UTC

Can I presume that this represents a bug in the S/W or should I be looking for H/W problems?


No. Retries of these tasks consistently fail, so this looks entirely like an issue with how the specific tasks were created. Whether a given host is assigned any of the failing tasks is hit and miss; it has nothing to do with the host's speed, OS, etc.
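The reasoning here (replicas of the same task failing on several different hosts point to the task, not the host) can be expressed as a small filter over result records. This is a hypothetical sketch: the `(workunit, host, succeeded)` record shape and the two-host threshold are assumptions, and the task and host names in the check below are illustrative.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def suspect_workunits(results: Iterable[Tuple[str, str, bool]],
                      min_hosts: int = 2) -> List[str]:
    """Return workunits that failed on at least `min_hosts` distinct hosts
    and never succeeded anywhere (pointing at the task, not the hosts)."""
    failed_on = defaultdict(set)  # workunit -> hosts where it failed
    succeeded = set()             # workunits that completed somewhere
    for workunit, host, ok in results:
        if ok:
            succeeded.add(workunit)
        else:
            failed_on[workunit].add(host)
    return sorted(wu for wu, hosts in failed_on.items()
                  if wu not in succeeded and len(hosts) >= min_hosts)
```

A workunit that fails on one host but succeeds on another is not flagged. That pattern points more toward the failing host than toward the task.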
____________
Rosetta Moderator: Mod.Sense

Mad_Max

Joined: Dec 31 09
Posts: 150
ID: 365007
Credit: 4,704,855
RAC: 9,235
Message 69787 - Posted 11 Mar 2011 2:44:06 UTC

Same here: ALL my tasks with names starting with "Ferredoxin-" end with a computation error a few seconds after starting:
http://boinc.bakerlab.org/rosetta/result.php?resultid=405012197
http://boinc.bakerlab.org/rosetta/result.php?resultid=405043700
http://boinc.bakerlab.org/rosetta/result.php?resultid=405048095
http://boinc.bakerlab.org/rosetta/result.php?resultid=405048985
http://boinc.bakerlab.org/rosetta/result.php?resultid=405102023
http://boinc.bakerlab.org/rosetta/result.php?resultid=405103050

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69801 - Posted 13 Mar 2011 3:42:44 UTC

Workunit T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23623_2499 has stopped using ANY CPU time, even though BOINC thinks it is still running.

Currently at 00:43:26 CPU time, 17:06:52 Elapsed time, 5.656% Progress, 45:45:04 Estimated time remaining.

Since I ask for 12-hour workunits, I suspect that its timeout procedure has failed as well.

I'm about to restart it from its last checkpoint, in case that will help.

I'm using TThrottle to keep my computers from overheating, so I'm not quite sure just how much of the CPU time BOINC is allowed to use.

Kartsa

Joined: Jul 29 06
Posts: 3
ID: 102312
Credit: 1,284,412
RAC: 307
Message 69810 - Posted 13 Mar 2011 13:01:57 UTC - in response to Message ID 69801.

Workunit T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23623_2499 has stopped using ANY CPU time, even though BOINC thinks it is still running.

Currently at 00:43:26 CPU time, 17:06:52 Elapsed time, 5.656% Progress, 45:45:04 Estimated time remaining.

Since I ask for 12-hour workunits, I suspect that its timeout procedure has failed as well.

I'm about to restart it from its last checkpoint, in case that will help.

I'm using TThrottle to keep my computers from overheating, so I'm not quite sure just how much of the CPU time BOINC is allowed to use.

I'm having similar issues with some of the units on two different machines (Win 7 and XP). BOINC thinks they are still running, but they are not using any CPU and progress doesn't increase. I'm not using any throttling at all; it's at 100% all the time. I just abort the failed units, since suspending/resuming doesn't make any difference. I've been having these problems for a couple of months, I think; most of the WUs work just fine.
Tried resetting the project, didn't help.

Two examples:
T0590_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23137_1570
T0620_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23164_2093 (someone seems to have successfully finished this one, though...)
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69811 - Posted 13 Mar 2011 21:01:19 UTC

Those last two posts sound like the elusive BOINC issue where BOINC seems to think the task is running, yet gives it no CPU time. The only way around the problem that I am aware of (other than just aborting the tasks) is to completely exit BOINC and restart it (or reboot the machine). Since the task gets no CPU time, its watchdog has no way to run and detect the long runtime.
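Because the watchdog inside the task cannot fire when the task gets no CPU time, any check has to happen from outside the task. A minimal sketch of such a check, assuming you copy the "CPU time" and "Elapsed time" figures from BOINC Manager's task view by hand: it compares the two, and separately tests whether CPU time advanced between two readings taken a few minutes apart. The function names and the 0.1 threshold are hypothetical illustrations, not part of BOINC or minirosetta. On a machine throttled by a tool such as TThrottle the fraction will normally sit below 1.0, so the threshold would need adjusting there.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'HH:MM:SS' string (as shown in BOINC Manager) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_fraction(cpu_time: str, elapsed_time: str) -> float:
    """Fraction of elapsed (wall-clock) time the task actually spent on a CPU."""
    elapsed = hms_to_seconds(elapsed_time)
    if elapsed == 0:
        return 0.0
    return hms_to_seconds(cpu_time) / elapsed


def looks_stalled(cpu_before: str, cpu_after: str) -> bool:
    """True if CPU time did not advance between two readings of a 'Running' task."""
    return hms_to_seconds(cpu_after) <= hms_to_seconds(cpu_before)


if __name__ == "__main__":
    # Figures from the first report in this thread: 00:43:26 CPU vs 17:06:52 elapsed.
    fraction = cpu_fraction("00:43:26", "17:06:52")
    print(f"CPU fraction: {fraction:.3f}")
    # Hypothetical threshold: a single-threaded task running normally should be near 1.0.
    if fraction < 0.1:
        print("Task is getting almost no CPU time; consider exiting and restarting BOINC.")
```

For the task reported above, 2,606 s of CPU time against 61,612 s elapsed works out to roughly 4%. The two-reading test is the more direct signal, since a long-term ratio also drops for ordinary reasons such as throttling or sharing cores with other projects.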
____________
Rosetta Moderator: Mod.Sense

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69813 - Posted 13 Mar 2011 23:28:57 UTC

I tried a reboot, and that workunit is now up to 04:50:51 CPU time and 38.781% progress, but has the same problem again - not using any CPU time, but BOINC thinks it is still running. Looks like time for another reboot to see how much further it will get after that.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69817 - Posted 14 Mar 2011 9:33:11 UTC
Last modified: 14 Mar 2011 9:35:41 UTC

Reached 47.188% progress after that reboot, then same problem again. Another reboot, and it finally finished properly.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69820 - Posted 14 Mar 2011 15:36:28 UTC

Robert, you seem to have hit a system, task, or operating environment that encounters the problem frequently. I've been trying to put my finger on what situations make it more likely to occur. One of my theories is that it has something to do with BOINC being short on memory. Could you document your BOINC version, your memory settings, whether or not you leave tasks in memory when suspended, and any messages you might have seen while the task was running (other tasks starting, or that one being suspended due to memory limits or project swapping)?
____________
Rosetta Moderator: Mod.Sense

Kartsa

Joined: Jul 29 06
Posts: 3
ID: 102312
Credit: 1,284,412
RAC: 307
Message 69824 - Posted 14 Mar 2011 17:47:52 UTC - in response to Message ID 69811.

completely exit BOINC and restart it

Yep, this seems to get them running again.

BOINC version 6.10.58, using the default(?) memory settings: 50% when in use and 90% when not in use. I have 8 GB total, and Rosetta rarely uses more than 2 GB in total (4 WUs at ~500 MB each; usually it's a lot less, 250-400 MB each). Apps are not left in memory when suspended. At this point I can't say anything certain about possible messages, since I just restarted the client; I'll post the next time a WU "hangs".

The other projects I'm running, Seti and Einstein, aren't affected by this problem.
____________

Kartsa

Joined: Jul 29 06
Posts: 3
ID: 102312
Credit: 1,284,412
RAC: 307
Message 69825 - Posted 14 Mar 2011 20:45:07 UTC
Last modified: 14 Mar 2011 20:48:50 UTC

So, there are no error messages or anything. Here is the log from the restart to the point when I noticed that a WU had hung:

14/03/2011 20:26:33 Not using a proxy
14/03/2011 20:26:33 rosetta@home Restarting task T0596_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23234_4284_0 using minirosetta version 217
14/03/2011 20:26:33 rosetta@home Restarting task T0623_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23166_4307_0 using minirosetta version 217
14/03/2011 20:26:33 rosetta@home Restarting task T0528_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23193_4413_0 using minirosetta version 217
14/03/2011 20:26:34 rosetta@home Restarting task IF3_like_SAVE_ALL_OUT_i016_008_23333_342_0 using minirosetta version 217
14/03/2011 20:39:24 rosetta@home Computation for task T0596_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23234_4284_0 finished
14/03/2011 20:39:24 rosetta@home Starting IF3_like_SAVE_ALL_OUT_relax_i016_23334_521_0
14/03/2011 20:39:25 rosetta@home Starting task IF3_like_SAVE_ALL_OUT_relax_i016_23334_521_0 using minirosetta version 217
14/03/2011 20:39:26 rosetta@home Started upload of T0596_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23234_4284_0_0
14/03/2011 20:39:46 rosetta@home Finished upload of T0596_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23234_4284_0_0
14/03/2011 21:12:48 rosetta@home Computation for task IF3_like_SAVE_ALL_OUT_relax_i016_23334_521_0 finished
14/03/2011 21:12:48 rosetta@home Starting IF3_like_SAVE_ALL_OUT_i016_009_23333_1081_0
14/03/2011 21:12:48 rosetta@home Starting task IF3_like_SAVE_ALL_OUT_i016_009_23333_1081_0 using minirosetta version 217
14/03/2011 21:12:50 rosetta@home Started upload of IF3_like_SAVE_ALL_OUT_relax_i016_23334_521_0_0
14/03/2011 21:13:13 rosetta@home Finished upload of IF3_like_SAVE_ALL_OUT_relax_i016_23334_521_0_0
14/03/2011 21:41:14 rosetta@home Computation for task T0528_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23193_4413_0 finished
14/03/2011 21:41:14 rosetta@home Starting IF3_like_SAVE_ALL_OUT_i016_008_23333_703_0
14/03/2011 21:41:15 rosetta@home Starting task IF3_like_SAVE_ALL_OUT_i016_008_23333_703_0 using minirosetta version 217
14/03/2011 21:41:17 rosetta@home Started upload of T0528_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23193_4413_0_0
14/03/2011 21:41:27 rosetta@home Finished upload of T0528_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23193_4413_0_0

The task that got stuck was T0623_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23166_4307_0.

Edit, just to add: I had all the other projects suspended before the restart, so only Rosetta was running.
____________

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69828 - Posted 15 Mar 2011 2:02:25 UTC - in response to Message ID 69820.
Last modified: 15 Mar 2011 2:03:41 UTC

Robert, you seem to have hit a system, task, or operating environment that encounters the problem frequently. I've been trying to put my finger on what situations make it more likely to occur. One of my theories is that it has something to do with BOINC being short on memory. Could you document your BOINC version, your memory settings, whether or not you leave tasks in memory when suspended, and any messages you might have seen while the task was running (other tasks starting, or that one being suspended due to memory limits or project swapping)?


BOINC 6.10.58 (64 bit) for Windows
8 GB memory installed
BOINC allowed to use only 35% of it due to problems in the past
BOINC tasks left in memory when suspended
64-bit Windows Vista Home Premium SP2

I'd install more memory if that motherboard could hold any more, even though that would require upgrading to a version of Windows Vista that would allow use of more.

The problem has occurred again, this time with workunit:

T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0

Will reboot again to restart that workunit after sending this post.


BOINC messages from after the last reboot:

3/14/2011 2:55:23 AM Starting BOINC client version 6.10.58 for windows_x86_64
3/14/2011 2:55:23 AM log flags: file_xfer, sched_ops, task
3/14/2011 2:55:23 AM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
3/14/2011 2:55:23 AM Data directory: C:\ProgramData\BOINC
3/14/2011 2:55:23 AM Running under account Bobby
3/14/2011 2:55:30 AM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
3/14/2011 2:55:30 AM Processor: 6.00 MB cache
3/14/2011 2:55:30 AM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
3/14/2011 2:55:30 AM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
3/14/2011 2:55:30 AM Memory: 8.00 GB physical, 15.66 GB virtual
3/14/2011 2:55:30 AM Disk: 919.67 GB total, 631.40 GB free
3/14/2011 2:55:30 AM Local time is UTC -5 hours
3/14/2011 2:55:35 AM NVIDIA GPU 0: GeForce GTS 450 (driver version 26089, CUDA version 3020, compute capability 2.1, 993MB, 476 GFLOPS peak)
3/14/2011 2:55:38 AM rosetta@home URL http://boinc.bakerlab.org/rosetta/; Computer ID xxxxxxx; resource share 175
3/14/2011 2:55:38 AM DrugDiscovery URL http://boinc.drugdiscoveryathome.com/; Computer ID xxx; resource share 60
3/14/2011 2:55:38 AM Poem@Home URL http://boinc.fzk.de/poem/; Computer ID xxxxx; resource share 50
3/14/2011 2:55:38 AM Collatz Conjecture URL http://boinc.thesonntags.com/collatz/; Computer ID xxxxx; resource share 20
3/14/2011 2:55:38 AM The Lattice Project URL http://boinc.umiacs.umd.edu/; Computer ID xxxxx; resource share 80
3/14/2011 2:55:38 AM boincsimap URL http://boincsimap.org/boincsimap/; Computer ID 185338; resource share 40
3/14/2011 2:55:38 AM superlinkattechnion URL http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion/; Computer ID xxxxx; resource share 100
3/14/2011 2:55:38 AM Docking URL http://docking.cis.udel.edu/; Computer ID xxxxx; resource share 50
3/14/2011 2:55:38 AM Hydrogen@Home URL http://hydrogenathome.org/; Computer ID not assigned yet; resource share 20
3/14/2011 2:55:38 AM QMC@HOME URL http://qah.uni-muenster.de/; Computer ID xxxxx; resource share 40
3/14/2011 2:55:38 AM ralph@home URL http://ralph.bakerlab.org/; Computer ID xxxxx; resource share 60
3/14/2011 2:55:38 AM ibercivis URL http://registro.ibercivis.es/; Computer ID xxxxxx; resource share 20
3/14/2011 2:55:38 AM GPUGRID URL http://www.gpugrid.net/; Computer ID xxxxx; resource share 35
3/14/2011 2:55:38 AM malariacontrol.net URL http://www.malariacontrol.net/; Computer ID xxxxx; resource share 45
3/14/2011 2:55:38 AM PrimeGrid URL http://www.primegrid.com/; Computer ID xxxxxx; resource share 20
3/14/2011 2:55:38 AM RNA World URL http://www.rnaworld.de/rnaworld/; Computer ID xxxx; resource share 20
3/14/2011 2:55:38 AM World Community Grid URL http://www.worldcommunitygrid.org/; Computer ID xxxxxxx; resource share 175
3/14/2011 2:55:38 AM World Community Grid General prefs: from World Community Grid (last modified 10-Nov-2010 19:33:48)
3/14/2011 2:55:38 AM World Community Grid Computer location: work
3/14/2011 2:55:38 AM General prefs: using separate prefs for work
3/14/2011 2:55:38 AM Reading preferences override file
3/14/2011 2:55:38 AM Preferences:
3/14/2011 2:55:38 AM max memory usage when active: 2866.64MB
3/14/2011 2:55:38 AM max memory usage when idle: 2866.64MB
3/14/2011 2:56:25 AM max disk usage: 30.00GB
3/14/2011 2:56:25 AM max CPUs used: 3
3/14/2011 2:56:25 AM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
3/14/2011 2:56:30 AM Not using a proxy
3/14/2011 2:56:30 AM Suspending computation - user request
3/14/2011 2:59:14 AM rosetta@home Restarting task T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23263_2499_0 using minirosetta version 217
3/14/2011 2:59:14 AM GPUGRID Restarting task F658-TONI_KKAL2-14-100-RND8723_1 using acemd2 version 613
3/14/2011 2:59:14 AM World Community Grid Restarting task c4cw_target03_069498213_0 using c4cw version 640
3/14/2011 2:59:14 AM boincsimap Restarting task 20101211.491368_0 using simap version 512
3/14/2011 3:13:09 AM GPUGRID Sending scheduler request: To fetch work.
3/14/2011 3:13:09 AM GPUGRID Requesting new tasks for CPU
3/14/2011 3:13:12 AM GPUGRID Scheduler request completed: got 0 new tasks
3/14/2011 3:13:12 AM GPUGRID Message from server: No work sent
3/14/2011 3:13:12 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/14/2011 3:13:12 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/14/2011 3:13:12 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/14/2011 3:13:12 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/14/2011 3:22:14 AM rosetta@home Computation for task T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23263_2499_0 finished
3/14/2011 3:22:18 AM rosetta@home Restarting task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 3:22:19 AM rosetta@home Started upload of T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23263_2499_0_0
3/14/2011 3:23:01 AM rosetta@home Finished upload of T0635_rR_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23263_2499_0_0
3/14/2011 3:42:44 AM RNA World Sending scheduler request: Requested by project.
3/14/2011 3:42:44 AM RNA World Not reporting or requesting tasks
3/14/2011 3:42:47 AM RNA World Scheduler request completed
3/14/2011 3:59:59 AM Docking Restarting task 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0 using charmm34 version 623
3/14/2011 4:26:29 AM ralph@home Fetching scheduler list
3/14/2011 4:26:51 AM Project communication failed: attempting access to reference site
3/14/2011 4:26:52 AM Internet access OK - project servers may be temporarily down.
3/14/2011 4:27:07 AM rosetta@home update requested by user
3/14/2011 4:27:12 AM rosetta@home Sending scheduler request: Requested by user.
3/14/2011 4:27:12 AM rosetta@home Reporting 1 completed tasks, not requesting new tasks
3/14/2011 4:27:14 AM rosetta@home Scheduler request completed
3/14/2011 4:27:19 AM World Community Grid Sending scheduler request: To fetch work.
3/14/2011 4:27:19 AM World Community Grid Requesting new tasks for CPU
3/14/2011 4:27:26 AM World Community Grid Scheduler request completed: got 1 new tasks
3/14/2011 4:27:29 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_ZINC04128093_xEyeSiteXtl5N_01.gpf.gzb
3/14/2011 4:27:29 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_AD4.1_bound.dat.gzb
3/14/2011 4:27:30 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_ZINC04128093_xEyeSiteXtl5N_01.gpf.gzb
3/14/2011 4:27:30 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_AD4.1_bound.dat.gzb
3/14/2011 4:27:30 AM World Community Grid Started download of ecdb29f199f44a3e87b359c1362606f2.dpf.gzb
3/14/2011 4:27:30 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_xEyeSiteXtl5NI.pdbqt.gzb
3/14/2011 4:27:33 AM World Community Grid Finished download of ecdb29f199f44a3e87b359c1362606f2.dpf.gzb
3/14/2011 4:27:33 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_xEyeSiteXtl5NI.pdbqt.gzb
3/14/2011 4:27:33 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_ZINC04128093.pdbqt.gzb
3/14/2011 4:27:33 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base.00047.dat.gzb
3/14/2011 4:27:35 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_ZINC04128093.pdbqt.gzb
3/14/2011 4:27:35 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base.00047.dat.gzb
3/14/2011 4:27:35 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base1.00047.pdbqt.gzb
3/14/2011 4:27:35 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base2.00047.pdbqt.gzb
3/14/2011 4:27:36 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base1.00047.pdbqt.gzb
3/14/2011 4:27:36 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_base2.00047.pdbqt.gzb
3/14/2011 4:27:36 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_wcgrid.00047.gpf.gzb
3/14/2011 4:27:36 AM World Community Grid Started download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_wcgrid.00047.dpf.gzb
3/14/2011 4:27:38 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_wcgrid.00047.gpf.gzb
3/14/2011 4:27:38 AM World Community Grid Finished download of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_wcgrid.00047.dpf.gzb
3/14/2011 4:53:18 AM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 4:53:18 AM QMC@HOME Requesting new tasks for GPU
3/14/2011 4:53:21 AM QMC@HOME Scheduler request completed: got 0 new tasks
3/14/2011 4:53:21 AM QMC@HOME Message from server: No work sent
3/14/2011 4:53:26 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 4:53:26 AM Poem@Home Requesting new tasks for GPU
3/14/2011 4:53:28 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 4:53:28 AM Poem@Home Message from server: No work sent
3/14/2011 4:53:33 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 4:53:33 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 4:53:36 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work sent
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 4:53:36 AM malariacontrol.net Message from server: No work is available for
3/14/2011 4:53:41 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 4:53:41 AM boincsimap Requesting new tasks for GPU
3/14/2011 4:53:43 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 4:53:43 AM boincsimap Message from server: No work sent
3/14/2011 4:53:48 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 4:53:48 AM rosetta@home Requesting new tasks for GPU
3/14/2011 4:53:50 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 4:54:55 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 4:54:55 AM Poem@Home Requesting new tasks for GPU
3/14/2011 4:54:57 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 4:54:57 AM Poem@Home Message from server: No work sent
3/14/2011 4:55:02 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 4:55:02 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 4:55:04 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work sent
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 4:55:04 AM malariacontrol.net Message from server: No work is available for
3/14/2011 4:55:09 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 4:55:09 AM boincsimap Requesting new tasks for GPU
3/14/2011 4:55:12 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 4:55:12 AM boincsimap Message from server: No work sent
3/14/2011 4:55:18 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 4:55:18 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 4:55:20 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work sent
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 4:55:20 AM malariacontrol.net Message from server: No work is available for
3/14/2011 4:55:35 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 4:55:35 AM boincsimap Requesting new tasks for GPU
3/14/2011 4:55:39 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 4:55:39 AM boincsimap Message from server: No work sent
3/14/2011 4:55:44 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 4:55:44 AM Poem@Home Requesting new tasks for GPU
3/14/2011 4:55:46 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 4:55:46 AM Poem@Home Message from server: No work sent
3/14/2011 4:56:56 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 4:56:56 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 4:56:58 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work sent
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 4:56:58 AM malariacontrol.net Message from server: No work is available for
3/14/2011 4:57:53 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 4:57:53 AM rosetta@home Requesting new tasks for GPU
3/14/2011 4:57:55 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 4:59:00 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 4:59:00 AM Poem@Home Requesting new tasks for GPU
3/14/2011 4:59:02 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 4:59:02 AM Poem@Home Message from server: No work sent
3/14/2011 5:00:12 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:00:12 AM boincsimap Requesting new tasks for GPU
3/14/2011 5:00:14 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 5:00:14 AM boincsimap Message from server: No work sent
3/14/2011 5:02:00 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:02:00 AM boincsimap Requesting new tasks for GPU
3/14/2011 5:02:03 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 5:02:03 AM boincsimap Message from server: No work sent
3/14/2011 5:02:08 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 5:02:08 AM rosetta@home Requesting new tasks for GPU
3/14/2011 5:02:10 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 5:02:23 AM Poem@Home Starting data_487_1300085993_1951072479_0
3/14/2011 5:02:23 AM Poem@Home Starting task data_487_1300085993_1951072479_0 using poem version 100
3/14/2011 5:04:15 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 5:04:15 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 5:04:19 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work sent
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 5:04:19 AM malariacontrol.net Message from server: No work is available for
3/14/2011 5:04:34 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 5:04:34 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 5:04:36 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work sent
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 5:04:36 AM malariacontrol.net Message from server: No work is available for
3/14/2011 5:04:41 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:04:41 AM boincsimap Requesting new tasks for GPU
3/14/2011 5:04:44 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 5:04:44 AM boincsimap Message from server: No work sent
3/14/2011 5:04:54 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 5:04:54 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 5:04:57 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work sent
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 5:04:57 AM malariacontrol.net Message from server: No work is available for
3/14/2011 5:06:12 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 5:06:12 AM Poem@Home Requesting new tasks for GPU
3/14/2011 5:06:17 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 5:06:17 AM Poem@Home Message from server: No work sent
3/14/2011 5:06:22 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 5:06:22 AM rosetta@home Requesting new tasks for GPU
3/14/2011 5:06:24 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 5:13:13 AM Hydrogen@Home work fetch resumed by user
3/14/2011 5:13:14 AM Hydrogen@Home Sending scheduler request: To fetch work.
3/14/2011 5:13:14 AM Hydrogen@Home Requesting new tasks for CPU and GPU
3/14/2011 5:13:16 AM Hydrogen@Home Scheduler request completed: got 0 new tasks
3/14/2011 5:13:16 AM Hydrogen@Home Message from server: Server can't open log file (../log_localhost/scheduler.log)
3/14/2011 5:13:34 AM Hydrogen@Home work fetch suspended by user
3/14/2011 5:14:21 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 5:14:21 AM rosetta@home Requesting new tasks for GPU
3/14/2011 5:14:23 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 5:18:28 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:18:28 AM boincsimap Requesting new tasks for GPU
3/14/2011 5:18:31 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 5:18:31 AM boincsimap Message from server: No work sent
3/14/2011 5:20:41 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 5:20:41 AM Poem@Home Requesting new tasks for GPU
3/14/2011 5:20:44 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 5:20:44 AM Poem@Home Message from server: No work sent
3/14/2011 5:22:55 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 5:22:55 AM rosetta@home Requesting new tasks for GPU
3/14/2011 5:22:57 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 5:32:02 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 5:32:02 AM rosetta@home Requesting new tasks for GPU
3/14/2011 5:32:04 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 5:33:48 AM QMC@HOME Restarting task qasino_b3lypqz-E28_iso34.6809_0 using qasinoAlpha version 501
3/14/2011 5:36:10 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:36:10 AM boincsimap Requesting new tasks for GPU
3/14/2011 5:36:13 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 5:36:13 AM boincsimap Message from server: No work sent
3/14/2011 5:45:23 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 5:45:23 AM Poem@Home Requesting new tasks for GPU
3/14/2011 5:45:26 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 5:45:26 AM Poem@Home Message from server: No work sent
3/14/2011 6:02:35 AM World Community Grid Resuming task c4cw_target03_069498213_0 using c4cw version 640
3/14/2011 6:05:36 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 6:05:36 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 6:05:38 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work sent
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 6:05:38 AM malariacontrol.net Message from server: No work is available for
3/14/2011 6:09:53 AM GPUGRID Sending scheduler request: To fetch work.
3/14/2011 6:09:53 AM GPUGRID Requesting new tasks for CPU
3/14/2011 6:09:55 AM GPUGRID Scheduler request completed: got 0 new tasks
3/14/2011 6:09:55 AM GPUGRID Message from server: No work sent
3/14/2011 6:09:55 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/14/2011 6:09:55 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/14/2011 6:09:55 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/14/2011 6:09:55 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/14/2011 6:17:20 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:17:20 AM boincsimap Requesting new tasks for GPU
3/14/2011 6:17:22 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:17:22 AM boincsimap Message from server: No work sent
3/14/2011 6:18:32 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 6:18:32 AM rosetta@home Requesting new tasks for GPU
3/14/2011 6:18:34 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 6:20:39 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 6:20:39 AM Poem@Home Requesting new tasks for GPU
3/14/2011 6:20:42 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 6:20:42 AM Poem@Home Message from server: No work sent
3/14/2011 6:39:01 AM malariacontrol.net Starting wu_990_24_350701_0_1300083521_1
3/14/2011 6:39:01 AM malariacontrol.net Starting task wu_990_24_350701_0_1300083521_1 using openMalariaA version 652
3/14/2011 6:45:38 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:45:38 AM boincsimap Requesting new tasks for GPU
3/14/2011 6:45:40 AM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:45:40 AM boincsimap Message from server: No work sent
3/14/2011 6:49:46 AM World Community Grid Computation for task c4cw_target03_069498213_0 finished
3/14/2011 6:49:46 AM World Community Grid Starting faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1
3/14/2011 6:49:47 AM World Community Grid Starting task faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1 using faah version 640
3/14/2011 6:49:49 AM World Community Grid Started upload of c4cw_target03_069498213_0_0
3/14/2011 6:49:56 AM World Community Grid Finished upload of c4cw_target03_069498213_0_0
3/14/2011 7:09:51 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 7:09:51 AM malariacontrol.net Requesting new tasks for GPU
3/14/2011 7:09:53 AM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work sent
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 7:09:53 AM malariacontrol.net Message from server: No work is available for
3/14/2011 7:13:08 AM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 7:13:08 AM rosetta@home Requesting new tasks for GPU
3/14/2011 7:13:10 AM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 7:13:15 AM World Community Grid Sending scheduler request: To fetch work.
3/14/2011 7:13:15 AM World Community Grid Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 7:13:18 AM World Community Grid Scheduler request completed: got 1 new tasks
3/14/2011 7:13:20 AM World Community Grid Started download of b0d06e8412f7d900649c09129302d43f.zip
3/14/2011 7:13:21 AM World Community Grid Finished download of b0d06e8412f7d900649c09129302d43f.zip
3/14/2011 7:27:14 AM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 7:27:14 AM Poem@Home Requesting new tasks for GPU
3/14/2011 7:27:17 AM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 7:27:17 AM Poem@Home Message from server: No work sent
3/14/2011 7:39:03 AM Docking Resuming task 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0 using charmm34 version 623
3/14/2011 8:42:47 AM RNA World Sending scheduler request: Requested by project.
3/14/2011 8:42:47 AM RNA World Not reporting or requesting tasks
3/14/2011 8:42:56 AM RNA World Scheduler request completed
3/14/2011 8:43:00 AM boincsimap Resuming task 20101211.491368_0 using simap version 512
3/14/2011 8:46:11 AM Poem@Home Resuming task data_487_1300085993_1951072479_0 using poem version 100
3/14/2011 8:48:02 AM GPUGRID Sending scheduler request: To fetch work.
3/14/2011 8:48:02 AM GPUGRID Requesting new tasks for GPU
3/14/2011 8:48:05 AM GPUGRID Scheduler request completed: got 1 new tasks
3/14/2011 8:48:07 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-LICENSE
3/14/2011 8:48:07 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-COPYRIGHT
3/14/2011 8:48:09 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-LICENSE
3/14/2011 8:48:09 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-COPYRIGHT
3/14/2011 8:48:09 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_1
3/14/2011 8:48:09 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_2
3/14/2011 8:48:19 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_2
3/14/2011 8:48:19 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_3
3/14/2011 8:48:20 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_1
3/14/2011 8:48:20 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-pdb_file
3/14/2011 8:48:41 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_3
3/14/2011 8:48:41 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-psf_file
3/14/2011 8:48:42 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-psf_file
3/14/2011 8:48:42 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-par_file
3/14/2011 8:48:47 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-pdb_file
3/14/2011 8:48:47 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-conf_file_enc
3/14/2011 8:48:49 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-conf_file_enc
3/14/2011 8:48:49 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-metainp_file
3/14/2011 8:48:50 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-metainp_file
3/14/2011 8:48:50 AM GPUGRID Started download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_7
3/14/2011 8:48:51 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-123-KASHIF_HIVPR_so_ba1-18-50-RND6369_7
3/14/2011 8:49:15 AM GPUGRID Finished download of 123-KASHIF_HIVPR_so_ba1-19-par_file
3/14/2011 9:00:19 AM boincsimap Computation for task 20101211.491368_0 finished
3/14/2011 9:00:19 AM rosetta@home Resuming task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 9:00:21 AM boincsimap Started upload of 20101211.491368_0_0
3/14/2011 9:00:22 AM boincsimap Sending scheduler request: To fetch work.
3/14/2011 9:00:22 AM boincsimap Requesting new tasks for CPU
3/14/2011 9:00:24 AM boincsimap Scheduler request completed: got 1 new tasks
3/14/2011 9:00:26 AM boincsimap Started download of 20101211.077579
3/14/2011 9:00:31 AM boincsimap Finished upload of 20101211.491368_0_0
3/14/2011 9:00:34 AM Project communication failed: attempting access to reference site
3/14/2011 9:00:34 AM boincsimap Temporarily failed download of 20101211.077579: HTTP error
3/14/2011 9:00:34 AM boincsimap Backing off 1 min 0 sec on download of 20101211.077579
3/14/2011 9:00:35 AM Internet access OK - project servers may be temporarily down.
3/14/2011 9:01:34 AM boincsimap Started download of 20101211.077579
3/14/2011 9:01:39 AM Project communication failed: attempting access to reference site
3/14/2011 9:01:39 AM boincsimap Temporarily failed download of 20101211.077579: HTTP error
3/14/2011 9:01:39 AM boincsimap Backing off 1 min 0 sec on download of 20101211.077579
3/14/2011 9:01:41 AM Internet access OK - project servers may be temporarily down.
3/14/2011 9:02:40 AM boincsimap Started download of 20101211.077579
3/14/2011 9:02:50 AM boincsimap Finished download of 20101211.077579
3/14/2011 9:21:35 AM GPUGRID Sending scheduler request: To fetch work.
3/14/2011 9:21:35 AM GPUGRID Requesting new tasks for CPU
3/14/2011 9:21:37 AM GPUGRID Scheduler request completed: got 0 new tasks
3/14/2011 9:21:37 AM GPUGRID Message from server: No work sent
3/14/2011 9:21:37 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/14/2011 9:21:37 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/14/2011 9:21:37 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/14/2011 9:21:37 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/14/2011 9:21:37 AM GPUGRID Message from server: (reached limit of 2 GPU tasks in progress)
3/14/2011 9:46:44 AM boincsimap Starting 20101211.077579_2
3/14/2011 9:46:44 AM boincsimap Starting task 20101211.077579_2 using simap version 512
3/14/2011 10:20:51 AM boincsimap update requested by user
3/14/2011 10:20:52 AM boincsimap Sending scheduler request: Requested by user.
3/14/2011 10:20:52 AM boincsimap Reporting 1 completed tasks, not requesting new tasks
3/14/2011 10:20:55 AM boincsimap Scheduler request completed
3/14/2011 10:47:34 AM malariacontrol.net Resuming task wu_990_24_350701_0_1300083521_1 using openMalariaA version 652
3/14/2011 10:52:55 AM GPUGRID Computation for task F658-TONI_KKAL2-14-100-RND8723_1 finished
3/14/2011 10:52:56 AM GPUGRID Starting 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1
3/14/2011 10:52:59 AM GPUGRID Starting task 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1 using acemd2 version 613
3/14/2011 10:52:59 AM malariacontrol.net Computation for task wu_990_24_350701_0_1300083521_1 finished
3/14/2011 10:52:59 AM QMC@HOME Resuming task qasino_b3lypqz-E28_iso34.6809_0 using qasinoAlpha version 501
3/14/2011 10:52:59 AM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 10:52:59 AM malariacontrol.net Requesting new tasks for CPU
3/14/2011 10:53:00 AM GPUGRID Started upload of F658-TONI_KKAL2-14-100-RND8723_1_0
3/14/2011 10:53:00 AM GPUGRID Started upload of F658-TONI_KKAL2-14-100-RND8723_1_1
3/14/2011 10:53:00 AM QMC@HOME Computation for task qasino_b3lypqz-E28_iso34.6809_0 finished
3/14/2011 10:53:00 AM Docking Resuming task 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0 using charmm34 version 623
3/14/2011 10:53:01 AM malariacontrol.net Started upload of wu_990_24_350701_0_1300083521_1_0
3/14/2011 10:53:01 AM malariacontrol.net Started upload of wu_990_24_350701_0_1300083521_1_1
3/14/2011 10:53:01 AM malariacontrol.net Scheduler request completed: got 1 new tasks
3/14/2011 10:53:02 AM malariacontrol.net Finished upload of wu_990_24_350701_0_1300083521_1_0
3/14/2011 10:53:02 AM malariacontrol.net Finished upload of wu_990_24_350701_0_1300083521_1_1
3/14/2011 10:53:02 AM QMC@HOME Started upload of qasino_b3lypqz-E28_iso34.6809_0_0
3/14/2011 10:53:03 AM malariacontrol.net Started download of wu_993_505_351285_0_1300114257
3/14/2011 10:53:05 AM QMC@HOME Finished upload of qasino_b3lypqz-E28_iso34.6809_0_0
3/14/2011 10:53:07 AM malariacontrol.net Finished download of wu_993_505_351285_0_1300114257
3/14/2011 10:53:07 AM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 10:53:07 AM QMC@HOME Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 10:53:08 AM GPUGRID Finished upload of F658-TONI_KKAL2-14-100-RND8723_1_0
3/14/2011 10:53:08 AM GPUGRID Started upload of F658-TONI_KKAL2-14-100-RND8723_1_2
3/14/2011 10:53:10 AM QMC@HOME Scheduler request completed: got 1 new tasks
3/14/2011 10:53:12 AM QMC@HOME Started download of orcaEngradAlpha_6.05_windows_intelx86.exe
3/14/2011 10:53:12 AM QMC@HOME Started download of orca_FHP_0007510.in
3/14/2011 10:53:14 AM QMC@HOME Finished download of orca_FHP_0007510.in
3/14/2011 10:53:30 AM GPUGRID Finished upload of F658-TONI_KKAL2-14-100-RND8723_1_1
3/14/2011 10:53:30 AM GPUGRID Started upload of F658-TONI_KKAL2-14-100-RND8723_1_3
3/14/2011 10:53:32 AM GPUGRID Finished upload of F658-TONI_KKAL2-14-100-RND8723_1_2
3/14/2011 10:53:32 AM GPUGRID Started upload of F658-TONI_KKAL2-14-100-RND8723_1_7
3/14/2011 10:53:33 AM GPUGRID Finished upload of F658-TONI_KKAL2-14-100-RND8723_1_7
3/14/2011 10:53:44 AM GPUGRID Finished upload of F658-TONI_KKAL2-14-100-RND8723_1_3
3/14/2011 10:55:39 AM QMC@HOME Finished download of orcaEngradAlpha_6.05_windows_intelx86.exe
3/14/2011 11:56:23 AM malariacontrol.net Starting wu_993_505_351285_0_1300114257_0
3/14/2011 11:56:23 AM malariacontrol.net Starting task wu_993_505_351285_0_1300114257_0 using openMalariaA version 652
3/14/2011 11:57:33 AM QMC@HOME Starting orca_FHP_0007510_0
3/14/2011 11:57:33 AM QMC@HOME Starting task orca_FHP_0007510_0 using orcaEngradAlpha version 605
3/14/2011 11:59:03 AM World Community Grid Resuming task faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1 using faah version 640
3/14/2011 12:00:39 PM QMC@HOME Computation for task orca_FHP_0007510_0 finished
3/14/2011 12:00:39 PM rosetta@home Resuming task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 12:00:40 PM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 12:00:40 PM QMC@HOME Requesting new tasks for CPU
3/14/2011 12:00:41 PM QMC@HOME Started upload of orca_FHP_0007510_0_0
3/14/2011 12:00:42 PM QMC@HOME Scheduler request completed: got 1 new tasks
3/14/2011 12:00:45 PM QMC@HOME Finished upload of orca_FHP_0007510_0_0
3/14/2011 12:00:45 PM QMC@HOME Started download of orca_FHP_0007599.in
3/14/2011 12:00:46 PM QMC@HOME Finished download of orca_FHP_0007599.in
3/14/2011 12:33:00 PM malariacontrol.net Computation for task wu_993_505_351285_0_1300114257_0 finished
3/14/2011 12:33:00 PM QMC@HOME Starting orca_FHP_0007599_0
3/14/2011 12:33:00 PM QMC@HOME Starting task orca_FHP_0007599_0 using orcaEngradAlpha version 605
3/14/2011 12:33:02 PM malariacontrol.net Started upload of wu_993_505_351285_0_1300114257_0_0
3/14/2011 12:33:02 PM malariacontrol.net Started upload of wu_993_505_351285_0_1300114257_0_1
3/14/2011 12:33:03 PM malariacontrol.net Finished upload of wu_993_505_351285_0_1300114257_0_0
3/14/2011 12:33:03 PM malariacontrol.net Finished upload of wu_993_505_351285_0_1300114257_0_1
3/14/2011 12:33:03 PM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 12:33:03 PM malariacontrol.net Reporting 2 completed tasks, requesting new tasks for CPU
3/14/2011 12:33:06 PM malariacontrol.net Scheduler request completed: got 1 new tasks
3/14/2011 12:33:09 PM malariacontrol.net Started download of wu_963_401_345704_0_1299819189
3/14/2011 12:33:10 PM malariacontrol.net Finished download of wu_963_401_345704_0_1299819189
3/14/2011 12:35:20 PM QMC@HOME Computation for task orca_FHP_0007599_0 finished
3/14/2011 12:35:20 PM Poem@Home Resuming task data_487_1300085993_1951072479_0 using poem version 100
3/14/2011 12:35:22 PM QMC@HOME Started upload of orca_FHP_0007599_0_0
3/14/2011 12:35:22 PM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 12:35:22 PM QMC@HOME Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 12:35:24 PM QMC@HOME Scheduler request completed: got 1 new tasks
3/14/2011 12:35:26 PM QMC@HOME Started download of gwfn.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:26 PM QMC@HOME Started download of correlation.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:27 PM QMC@HOME Finished download of correlation.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:27 PM QMC@HOME Started download of input.qasino_b3lypqz-P14_iso34.2860
3/14/2011 12:35:29 PM QMC@HOME Finished download of input.qasino_b3lypqz-P14_iso34.2860
3/14/2011 12:35:29 PM QMC@HOME Started download of c_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:30 PM QMC@HOME Finished upload of orca_FHP_0007599_0_0
3/14/2011 12:35:31 PM QMC@HOME Finished download of c_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:31 PM QMC@HOME Started download of h_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:34 PM QMC@HOME Finished download of h_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:34 PM QMC@HOME Started download of n_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:36 PM QMC@HOME Finished download of n_pp.data.qasino_b3lypqz-P14_iso34
3/14/2011 12:35:38 PM QMC@HOME Finished download of gwfn.data.qasino_b3lypqz-P14_iso34
3/14/2011 1:00:46 PM QMC@HOME Starting qasino_b3lypqz-P14_iso34.2860_0
3/14/2011 1:00:46 PM QMC@HOME Starting task qasino_b3lypqz-P14_iso34.2860_0 using qasinoAlpha version 501
3/14/2011 1:03:14 PM rosetta@home Resuming task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 1:36:48 PM World Community Grid Resuming task faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1 using faah version 640
3/14/2011 1:43:01 PM RNA World Sending scheduler request: Requested by project.
3/14/2011 1:43:01 PM RNA World Not reporting or requesting tasks
3/14/2011 1:43:03 PM RNA World Scheduler request completed
3/14/2011 2:04:07 PM malariacontrol.net Starting wu_963_401_345704_0_1299819189_2
3/14/2011 2:04:07 PM malariacontrol.net Starting task wu_963_401_345704_0_1299819189_2 using openMalariaA version 652
3/14/2011 2:21:38 PM GPUGRID Sending scheduler request: Requested by project.
3/14/2011 2:21:38 PM GPUGRID Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 2:21:40 PM GPUGRID Scheduler request completed: got 0 new tasks
3/14/2011 2:21:40 PM GPUGRID Message from server: No work sent
3/14/2011 2:21:40 PM GPUGRID Message from server: No work is available for ACEMD beta version
3/14/2011 2:21:40 PM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/14/2011 2:21:40 PM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/14/2011 2:36:16 PM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 2:36:16 PM QMC@HOME Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 2:36:18 PM QMC@HOME Scheduler request completed: got 1 new tasks
3/14/2011 2:36:20 PM QMC@HOME Started download of orca_FHP_0007801.in
3/14/2011 2:36:21 PM QMC@HOME Finished download of orca_FHP_0007801.in
3/14/2011 2:42:30 PM malariacontrol.net Computation for task wu_963_401_345704_0_1299819189_2 finished
3/14/2011 2:42:30 PM boincsimap Resuming task 20101211.077579_2 using simap version 512
3/14/2011 2:42:32 PM malariacontrol.net Started upload of wu_963_401_345704_0_1299819189_2_0
3/14/2011 2:42:32 PM malariacontrol.net Started upload of wu_963_401_345704_0_1299819189_2_1
3/14/2011 2:42:33 PM malariacontrol.net Finished upload of wu_963_401_345704_0_1299819189_2_0
3/14/2011 2:42:33 PM malariacontrol.net Finished upload of wu_963_401_345704_0_1299819189_2_1
3/14/2011 2:42:33 PM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 2:42:33 PM malariacontrol.net Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 2:42:35 PM malariacontrol.net Scheduler request completed: got 1 new tasks
3/14/2011 2:42:37 PM malariacontrol.net Started download of wu_964_504_351561_0_1300128020
3/14/2011 2:42:38 PM malariacontrol.net Finished download of wu_964_504_351561_0_1300128020
3/14/2011 2:59:31 PM boincsimap Computation for task 20101211.077579_2 finished
3/14/2011 2:59:31 PM malariacontrol.net Starting wu_964_504_351561_0_1300128020_0
3/14/2011 2:59:31 PM malariacontrol.net Starting task wu_964_504_351561_0_1300128020_0 using openMalariaA version 652
3/14/2011 2:59:33 PM boincsimap Started upload of 20101211.077579_2_0
3/14/2011 2:59:35 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 2:59:35 PM boincsimap Requesting new tasks for CPU
3/14/2011 2:59:39 PM boincsimap Scheduler request completed: got 1 new tasks
3/14/2011 2:59:41 PM boincsimap Started download of 20101211.522461
3/14/2011 2:59:45 PM boincsimap Finished upload of 20101211.077579_2_0
3/14/2011 3:04:59 PM Project communication failed: attempting access to reference site
3/14/2011 3:04:59 PM boincsimap Temporarily failed download of 20101211.522461: HTTP error
3/14/2011 3:04:59 PM boincsimap Backing off 1 min 0 sec on download of 20101211.522461
3/14/2011 3:05:00 PM Internet access OK - project servers may be temporarily down.
3/14/2011 3:05:59 PM boincsimap Started download of 20101211.522461
3/14/2011 3:06:05 PM boincsimap Finished download of 20101211.522461
3/14/2011 3:32:08 PM malariacontrol.net Computation for task wu_964_504_351561_0_1300128020_0 finished
3/14/2011 3:32:08 PM QMC@HOME Resuming task qasino_b3lypqz-P14_iso34.2860_0 using qasinoAlpha version 501
3/14/2011 3:32:10 PM malariacontrol.net Started upload of wu_964_504_351561_0_1300128020_0_0
3/14/2011 3:32:10 PM malariacontrol.net Started upload of wu_964_504_351561_0_1300128020_0_1
3/14/2011 3:32:10 PM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 3:32:10 PM malariacontrol.net Requesting new tasks for CPU
3/14/2011 3:32:13 PM malariacontrol.net Scheduler request completed: got 1 new tasks
3/14/2011 3:32:14 PM malariacontrol.net Finished upload of wu_964_504_351561_0_1300128020_0_0
3/14/2011 3:32:14 PM malariacontrol.net Finished upload of wu_964_504_351561_0_1300128020_0_1
3/14/2011 3:32:15 PM malariacontrol.net Started download of wu_968_523_351622_0_1300130775
3/14/2011 3:32:16 PM malariacontrol.net Finished download of wu_968_523_351622_0_1300130775
3/14/2011 4:04:39 PM World Community Grid Computation for task faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1 finished
3/14/2011 4:04:39 PM World Community Grid Starting E201475_715_C.19.C14H9NOS2Si.00161909.0.set1d06_2
3/14/2011 4:04:39 PM World Community Grid Starting task E201475_715_C.19.C14H9NOS2Si.00161909.0.set1d06_2 using cep2 version 640
3/14/2011 4:04:41 PM World Community Grid Started upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_0
3/14/2011 4:04:41 PM World Community Grid Started upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_1
3/14/2011 4:04:42 PM World Community Grid Finished upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_0
3/14/2011 4:04:42 PM World Community Grid Started upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_2
3/14/2011 4:04:44 PM World Community Grid Finished upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_2
3/14/2011 4:04:44 PM World Community Grid Started upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_3
3/14/2011 4:04:45 PM World Community Grid Finished upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_1
3/14/2011 4:04:45 PM World Community Grid Finished upload of faah19610_ZINC04128093_xEyeSiteXtl5NI_01_1_3
3/14/2011 4:36:15 PM Poem@Home Resuming task data_487_1300085993_1951072479_0 using poem version 100
3/14/2011 4:38:57 PM Poem@Home Computation for task data_487_1300085993_1951072479_0 finished
3/14/2011 4:38:57 PM boincsimap Starting 20101211.522461_1
3/14/2011 4:38:57 PM boincsimap Starting task 20101211.522461_1 using simap version 512
3/14/2011 4:38:59 PM Poem@Home Started upload of data_487_1300085993_1951072479_0_0
3/14/2011 4:38:59 PM Poem@Home Started upload of data_487_1300085993_1951072479_0_1
3/14/2011 4:39:00 PM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 4:39:00 PM Poem@Home Requesting new tasks for CPU
3/14/2011 4:39:02 PM Poem@Home Finished upload of data_487_1300085993_1951072479_0_1
3/14/2011 4:39:03 PM Poem@Home Scheduler request completed: got 1 new tasks
3/14/2011 4:39:04 PM Poem@Home Finished upload of data_487_1300085993_1951072479_0_0
3/14/2011 4:39:05 PM Poem@Home Started download of script.com_1300139109_871967982
3/14/2011 4:39:05 PM Poem@Home Started download of 36568518.top_1300139109_871967982
3/14/2011 4:39:06 PM Poem@Home Finished download of script.com_1300139109_871967982
3/14/2011 4:39:06 PM Poem@Home Started download of input.inp_1300139109_871967982
3/14/2011 4:39:07 PM Poem@Home Finished download of 36568518.top_1300139109_871967982
3/14/2011 4:39:07 PM Poem@Home Finished download of input.inp_1300139109_871967982
3/14/2011 5:38:24 PM boincsimap Computation for task 20101211.522461_1 finished
3/14/2011 5:38:24 PM Docking Resuming task 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0 using charmm34 version 623
3/14/2011 5:38:24 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 5:38:24 PM boincsimap Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 5:38:26 PM boincsimap Started upload of 20101211.522461_1_0
3/14/2011 5:38:26 PM boincsimap Scheduler request completed: got 1 new tasks
3/14/2011 5:38:28 PM boincsimap Started download of 20101211.529455
3/14/2011 5:38:34 PM boincsimap Finished upload of 20101211.522461_1_0
3/14/2011 5:38:44 PM Project communication failed: attempting access to reference site
3/14/2011 5:38:44 PM boincsimap Temporarily failed download of 20101211.529455: HTTP error
3/14/2011 5:38:44 PM boincsimap Backing off 1 min 0 sec on download of 20101211.529455
3/14/2011 5:38:45 PM Internet access OK - project servers may be temporarily down.
3/14/2011 5:38:52 PM Poem@Home Starting data_488_1300139109_871967982_0
3/14/2011 5:38:52 PM Poem@Home Starting task data_488_1300139109_871967982_0 using poem version 100
3/14/2011 5:39:44 PM boincsimap Started download of 20101211.529455
3/14/2011 5:39:47 PM boincsimap Finished download of 20101211.529455
3/14/2011 5:43:08 PM Docking Computation for task 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0 finished
3/14/2011 5:43:09 PM World Community Grid Resuming task E201475_715_C.19.C14H9NOS2Si.00161909.0.set1d06_2 using cep2 version 640
3/14/2011 5:43:11 PM Docking Started upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_0
3/14/2011 5:43:11 PM Docking Started upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_1
3/14/2011 5:43:12 PM Docking Sending scheduler request: To fetch work.
3/14/2011 5:43:12 PM Docking Requesting new tasks for CPU
3/14/2011 5:43:15 PM Docking Finished upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_1
3/14/2011 5:43:15 PM Docking Started upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_2
3/14/2011 5:43:15 PM Docking Scheduler request completed: got 1 new tasks
3/14/2011 5:43:16 PM Docking Finished upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_0
3/14/2011 5:43:16 PM Docking Finished upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_2
3/14/2011 5:43:16 PM Docking Started upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_3
3/14/2011 5:43:17 PM Docking Finished upload of 1m0b1hps_mod0014crossdockinghiv1_27156_234531_0_3
3/14/2011 5:43:17 PM Docking Started download of 1ohr1hps_mod0014crossdockinghiv1_14272_97767.inp
3/14/2011 5:43:22 PM Docking Finished download of 1ohr1hps_mod0014crossdockinghiv1_14272_97767.inp
3/14/2011 6:10:21 PM Docking Sending scheduler request: To fetch work.
3/14/2011 6:10:21 PM Docking Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:10:22 PM Docking Scheduler request completed: got 0 new tasks
3/14/2011 6:10:27 PM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 6:10:27 PM rosetta@home Requesting new tasks for GPU
3/14/2011 6:10:29 PM rosetta@home Scheduler request completed: got 0 new tasks
3/14/2011 6:10:34 PM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 6:10:34 PM malariacontrol.net Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:10:36 PM malariacontrol.net Scheduler request completed: got 0 new tasks
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work sent
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work is available for malariacontrol.net
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work is available for openMalaria test version
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work is available for Prediction of Malaria Prevalence
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work is available for Estimation of parameters of infection dynamics (variable duration, max 4h)
3/14/2011 6:10:36 PM malariacontrol.net Message from server: No work is available for
3/14/2011 6:10:42 PM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 6:10:42 PM Poem@Home Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:10:45 PM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 6:10:45 PM Poem@Home Message from server: No work sent
3/14/2011 6:10:50 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:10:50 PM boincsimap Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:16:08 PM Project communication failed: attempting access to reference site
3/14/2011 6:16:08 PM boincsimap Scheduler request failed: Timeout was reached
3/14/2011 6:16:09 PM Internet access OK - project servers may be temporarily down.
3/14/2011 6:17:08 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:17:08 PM boincsimap Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:17:30 PM Project communication failed: attempting access to reference site
3/14/2011 6:17:30 PM boincsimap Scheduler request failed: Couldn't connect to server
3/14/2011 6:17:31 PM Internet access OK - project servers may be temporarily down.
3/14/2011 6:18:30 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:18:30 PM boincsimap Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 6:18:33 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:18:33 PM boincsimap Message from server: No work sent
3/14/2011 6:19:43 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:19:43 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:19:48 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:19:48 PM boincsimap Message from server: No work sent
3/14/2011 6:21:58 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:21:58 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:22:00 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:22:00 PM boincsimap Message from server: No work sent
3/14/2011 6:23:10 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:23:10 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:23:12 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:23:12 PM boincsimap Message from server: No work sent
3/14/2011 6:25:25 PM Poem@Home Computation for task data_488_1300139109_871967982_0 finished
3/14/2011 6:25:25 PM Docking Starting 1ohr1hps_mod0014crossdockinghiv1_14272_97767_0
3/14/2011 6:25:25 PM Docking Starting task 1ohr1hps_mod0014crossdockinghiv1_14272_97767_0 using charmm34 version 623
3/14/2011 6:25:27 PM Poem@Home Started upload of data_488_1300139109_871967982_0_0
3/14/2011 6:25:27 PM Poem@Home Started upload of data_488_1300139109_871967982_0_1
3/14/2011 6:25:27 PM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 6:25:27 PM Poem@Home Requesting new tasks for CPU
3/14/2011 6:25:29 PM Poem@Home Finished upload of data_488_1300139109_871967982_0_0
3/14/2011 6:25:29 PM Poem@Home Scheduler request completed: got 1 new tasks
3/14/2011 6:25:31 PM Poem@Home Started download of script.com_1300144434_1925235660
3/14/2011 6:25:31 PM Poem@Home Started download of 32849470.top_1300144434_1925235660
3/14/2011 6:25:32 PM Poem@Home Finished download of script.com_1300144434_1925235660
3/14/2011 6:25:32 PM Poem@Home Started download of input.inp_1300144434_1925235660
3/14/2011 6:25:33 PM Poem@Home Finished upload of data_488_1300139109_871967982_0_1
3/14/2011 6:25:33 PM Poem@Home Finished download of input.inp_1300144434_1925235660
3/14/2011 6:25:34 PM Poem@Home Finished download of 32849470.top_1300144434_1925235660
3/14/2011 6:27:39 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:27:39 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:27:42 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:27:42 PM boincsimap Message from server: No work sent
3/14/2011 6:34:52 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:34:52 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:34:58 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:34:58 PM boincsimap Message from server: No work sent
3/14/2011 6:40:08 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:40:08 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:40:12 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:40:12 PM boincsimap Message from server: No work sent
3/14/2011 6:43:07 PM RNA World Sending scheduler request: Requested by project.
3/14/2011 6:43:07 PM RNA World Not reporting or requesting tasks
3/14/2011 6:43:09 PM RNA World Scheduler request completed
3/14/2011 6:56:14 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 6:56:14 PM boincsimap Requesting new tasks for GPU
3/14/2011 6:56:16 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 6:56:16 PM boincsimap Message from server: No work sent
3/14/2011 7:00:26 PM Poem@Home Sending scheduler request: To fetch work.
3/14/2011 7:00:26 PM Poem@Home Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 7:00:28 PM Poem@Home Scheduler request completed: got 0 new tasks
3/14/2011 7:00:28 PM Poem@Home Message from server: (Project has no jobs available)
3/14/2011 7:01:39 PM boincsimap Sending scheduler request: To fetch work.
3/14/2011 7:01:39 PM boincsimap Requesting new tasks for GPU
3/14/2011 7:01:41 PM boincsimap Scheduler request completed: got 0 new tasks
3/14/2011 7:01:41 PM boincsimap Message from server: No work sent
3/14/2011 7:21:41 PM GPUGRID Sending scheduler request: Requested by project.
3/14/2011 7:21:41 PM GPUGRID Requesting new tasks for CPU
3/14/2011 7:21:43 PM GPUGRID Scheduler request completed: got 0 new tasks
3/14/2011 7:21:43 PM GPUGRID Message from server: No work sent
3/14/2011 7:21:43 PM GPUGRID Message from server: No work is available for ACEMD beta version
3/14/2011 7:21:43 PM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/14/2011 7:21:43 PM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/14/2011 7:21:43 PM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/14/2011 7:27:41 PM malariacontrol.net Starting wu_968_523_351622_0_1300130775_0
3/14/2011 7:27:41 PM malariacontrol.net Starting task wu_968_523_351622_0_1300130775_0 using openMalariaA version 652
3/14/2011 8:16:00 PM World Community Grid update requested by user
3/14/2011 8:16:04 PM World Community Grid Sending scheduler request: Requested by user.
3/14/2011 8:16:04 PM World Community Grid Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 8:16:07 PM World Community Grid Scheduler request completed: got 0 new tasks
3/14/2011 8:16:12 PM rosetta@home Sending scheduler request: To fetch work.
3/14/2011 8:16:12 PM rosetta@home Requesting new tasks for CPU
3/14/2011 8:16:15 PM rosetta@home Scheduler request completed: got 1 new tasks
3/14/2011 8:16:17 PM rosetta@home Started download of groes_boinc_nmr_groes_0.2_fnd_C8_cs_frags_sgourn.boinc.zip
3/14/2011 8:16:17 PM rosetta@home Started download of groes_boinc_nmr_groes_0.2_fnd_C8_cs_frags_sgourn.boinc.flags
3/14/2011 8:16:19 PM rosetta@home Finished download of groes_boinc_nmr_groes_0.2_fnd_C8_cs_frags_sgourn.boinc.flags
3/14/2011 8:16:20 PM World Community Grid Sending scheduler request: To fetch work.
3/14/2011 8:16:20 PM World Community Grid Requesting new tasks for CPU
3/14/2011 8:16:23 PM World Community Grid Scheduler request completed: got 1 new tasks
3/14/2011 8:16:30 PM rosetta@home Finished download of groes_boinc_nmr_groes_0.2_fnd_C8_cs_frags_sgourn.boinc.zip
3/14/2011 8:19:00 PM malariacontrol.net Computation for task wu_968_523_351622_0_1300130775_0 finished
3/14/2011 8:19:00 PM Poem@Home Starting data_487_1300144434_1925235660_0
3/14/2011 8:19:00 PM Poem@Home Starting task data_487_1300144434_1925235660_0 using poem version 100
3/14/2011 8:19:02 PM malariacontrol.net Started upload of wu_968_523_351622_0_1300130775_0_0
3/14/2011 8:19:02 PM malariacontrol.net Started upload of wu_968_523_351622_0_1300130775_0_1
3/14/2011 8:19:04 PM malariacontrol.net Finished upload of wu_968_523_351622_0_1300130775_0_0
3/14/2011 8:19:04 PM malariacontrol.net Finished upload of wu_968_523_351622_0_1300130775_0_1
3/14/2011 8:19:04 PM malariacontrol.net Sending scheduler request: To fetch work.
3/14/2011 8:19:04 PM malariacontrol.net Reporting 1 completed tasks, requesting new tasks for CPU
3/14/2011 8:19:07 PM malariacontrol.net Scheduler request completed: got 1 new tasks
3/14/2011 8:19:09 PM malariacontrol.net Started download of wu_990_512_351961_0_1300147128
3/14/2011 8:19:10 PM malariacontrol.net Finished download of wu_990_512_351961_0_1300147128


Don't have similar logfiles from when the earlier workunit encountered the problem.

All computer IDs replaced with xxxs.

I had the boincmgr.exe program minimized to an icon when the problem occurred, and no messages got past that to tell me of the problem.

Using the TThrottle V3.11 add-on for BOINC to prevent my computers from overheating.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69829 - Posted 15 Mar 2011 7:13:09 UTC

Happened again, after I rebooted and increased BOINC's share of the memory to 40%.
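The 40% setting can be checked against the log below. Assuming BOINC computes "max memory usage" as the chosen percentage of the physical RAM it detected, the 3276.16MB limit implies about 8190.4 MB of physical RAM (reported as 8.00 GB). The earlier 2866.64MB limit then corresponds to 35% of the same total. That 35% is inferred from the numbers, not stated in the post. A minimal sketch of the arithmetic (function names are hypothetical):

```python
# Sketch: relate BOINC's logged "max memory usage" to the percentage setting.
# Assumption: limit = fraction x physical RAM detected by the client.
# The physical-RAM value is back-calculated here, not read from the log.

def implied_physical_mb(max_usage_mb: float, fraction: float) -> float:
    """Physical RAM (MB) implied by a logged limit and its fraction."""
    return max_usage_mb / fraction


def max_usage_mb(physical_mb: float, fraction: float) -> float:
    """Memory limit (MB) for a given fraction of physical RAM."""
    return physical_mb * fraction


if __name__ == "__main__":
    physical = implied_physical_mb(3276.16, 0.40)  # limit after the change to 40%
    print(f"implied physical RAM: {physical:.1f} MB ({physical / 1024:.2f} GB)")
    # The earlier 2866.64MB limit matches 35% of the same total (inferred).
    print(f"35% of that: {max_usage_mb(physical, 0.35):.2f} MB")
```

This prints 8190.4 MB (8.00 GB) and 2866.64 MB, consistent with both preference blocks in the log.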

Log file before that reboot:

3/14/2011 9:14:51 PM Starting BOINC client version 6.10.58 for windows_x86_64
3/14/2011 9:14:51 PM log flags: file_xfer, sched_ops, task
3/14/2011 9:14:51 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
3/14/2011 9:14:51 PM Data directory: C:\ProgramData\BOINC
3/14/2011 9:14:51 PM Running under account Bobby
3/14/2011 9:14:56 PM Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10]
3/14/2011 9:14:56 PM Processor: 6.00 MB cache
3/14/2011 9:14:56 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe
3/14/2011 9:14:56 PM OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00)
3/14/2011 9:14:56 PM Memory: 8.00 GB physical, 15.66 GB virtual
3/14/2011 9:14:56 PM Disk: 919.67 GB total, 620.79 GB free
3/14/2011 9:14:56 PM Local time is UTC -5 hours
3/14/2011 9:15:00 PM NVIDIA GPU 0: GeForce GTS 450 (driver version 26089, CUDA version 3020, compute capability 2.1, 993MB, 476 GFLOPS peak)
3/14/2011 9:15:02 PM rosetta@home URL http://boinc.bakerlab.org/rosetta/; Computer ID xxxxxxx; resource share 175
3/14/2011 9:15:02 PM DrugDiscovery URL http://boinc.drugdiscoveryathome.com/; Computer ID xxx; resource share 60
3/14/2011 9:15:02 PM Poem@Home URL http://boinc.fzk.de/poem/; Computer ID xxxxx; resource share 50
3/14/2011 9:15:02 PM Collatz Conjecture URL http://boinc.thesonntags.com/collatz/; Computer ID xxxxx; resource share 20
3/14/2011 9:15:02 PM The Lattice Project URL http://boinc.umiacs.umd.edu/; Computer ID xxxxx; resource share 80
3/14/2011 9:15:02 PM boincsimap URL http://boincsimap.org/boincsimap/; Computer ID xxxxxx; resource share 40
3/14/2011 9:15:02 PM superlinkattechnion URL http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion/; Computer ID xxxxx; resource share 100
3/14/2011 9:15:02 PM Docking URL http://docking.cis.udel.edu/; Computer ID xxxxx; resource share 50
3/14/2011 9:15:02 PM Hydrogen@Home URL http://hydrogenathome.org/; Computer ID not assigned yet; resource share 20
3/14/2011 9:15:02 PM QMC@HOME URL http://qah.uni-muenster.de/; Computer ID xxxxxx; resource share 40
3/14/2011 9:15:02 PM ralph@home URL http://ralph.bakerlab.org/; Computer ID xxxxx; resource share 60
3/14/2011 9:15:02 PM ibercivis URL http://registro.ibercivis.es/; Computer ID xxxxxx; resource share 20
3/14/2011 9:15:02 PM GPUGRID URL http://www.gpugrid.net/; Computer ID xxxxx; resource share 35
3/14/2011 9:15:02 PM malariacontrol.net URL http://www.malariacontrol.net/; Computer ID xxxxxx; resource share 45
3/14/2011 9:15:02 PM PrimeGrid URL http://www.primegrid.com/; Computer ID xxxxxx; resource share 20
3/14/2011 9:15:02 PM RNA World URL http://www.rnaworld.de/rnaworld/; Computer ID xxxx; resource share 20
3/14/2011 9:15:02 PM World Community Grid URL http://www.worldcommunitygrid.org/; Computer ID xxxxxx; resource share 175
3/14/2011 9:15:02 PM World Community Grid General prefs: from World Community Grid (last modified 10-Nov-2010 19:33:48)
3/14/2011 9:15:02 PM World Community Grid Computer location: work
3/14/2011 9:15:02 PM General prefs: using separate prefs for work
3/14/2011 9:15:02 PM Reading preferences override file
3/14/2011 9:15:02 PM Preferences:
3/14/2011 9:15:02 PM max memory usage when active: 2866.64MB
3/14/2011 9:15:02 PM max memory usage when idle: 2866.64MB
3/14/2011 9:15:14 PM max disk usage: 30.00GB
3/14/2011 9:15:14 PM max CPUs used: 3
3/14/2011 9:15:14 PM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
3/14/2011 9:15:15 PM Not using a proxy
3/14/2011 9:15:15 PM Suspending computation - user request
3/14/2011 9:17:06 PM World Community Grid General prefs: from World Community Grid (last modified 10-Nov-2010 19:33:48)
3/14/2011 9:17:06 PM World Community Grid Computer location: work
3/14/2011 9:17:06 PM General prefs: using separate prefs for work
3/14/2011 9:17:06 PM Reading preferences override file
3/14/2011 9:17:06 PM Preferences:
3/14/2011 9:17:06 PM max memory usage when active: 3276.16MB
3/14/2011 9:17:06 PM max memory usage when idle: 3276.16MB
3/14/2011 9:17:06 PM max disk usage: 30.00GB
3/14/2011 9:17:06 PM max CPUs used: 3
3/14/2011 9:17:06 PM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
3/14/2011 9:17:15 PM GPUGRID Restarting task 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1 using acemd2 version 613
3/14/2011 9:17:17 PM Docking Restarting task 1ohr1hps_mod0014crossdockinghiv1_14272_97767_0 using charmm34 version 623
3/14/2011 9:17:17 PM World Community Grid Restarting task c4cw_target03_070258503_0 using c4cw version 640
3/14/2011 9:17:17 PM boincsimap Starting 20101211.529455_0
3/14/2011 9:17:18 PM boincsimap Starting task 20101211.529455_0 using simap version 512
3/14/2011 9:17:18 PM Docking Sending scheduler request: To fetch work.
3/14/2011 9:17:18 PM Docking Requesting new tasks for GPU
3/14/2011 9:17:20 PM Docking Scheduler request completed: got 0 new tasks
3/14/2011 9:17:23 PM World Community Grid update requested by user
3/14/2011 9:17:25 PM World Community Grid Sending scheduler request: Requested by user.
3/14/2011 9:17:25 PM World Community Grid Reporting 1 completed tasks, requesting new tasks for GPU
3/14/2011 9:17:29 PM World Community Grid Scheduler request completed: got 0 new tasks
3/14/2011 9:30:40 PM PrimeGrid Sending scheduler request: Requested by project.
3/14/2011 9:30:40 PM PrimeGrid Not reporting or requesting tasks
3/14/2011 9:30:44 PM PrimeGrid Scheduler request completed
3/14/2011 10:08:54 PM QMC@HOME Sending scheduler request: To fetch work.
3/14/2011 10:08:54 PM QMC@HOME Requesting new tasks for GPU
3/14/2011 10:08:56 PM QMC@HOME Scheduler request completed: got 0 new tasks
3/14/2011 10:08:56 PM QMC@HOME Message from server: No work sent
3/14/2011 10:17:48 PM rosetta@home Restarting task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 10:18:02 PM QMC@HOME Restarting task qasino_b3lypqz-P14_iso34.2860_0 using qasinoAlpha version 501
3/14/2011 10:20:52 PM malariacontrol.net Starting wu_990_512_351961_0_1300147128_1
3/14/2011 10:20:53 PM malariacontrol.net Starting task wu_990_512_351961_0_1300147128_1 using openMalariaA version 652
3/14/2011 10:21:09 PM GPUGRID Sending scheduler request: To fetch work.
3/14/2011 10:21:09 PM GPUGRID Requesting new tasks for GPU
3/14/2011 10:21:12 PM GPUGRID Scheduler request completed: got 1 new tasks
3/14/2011 10:21:14 PM GPUGRID Started download of F435-TONI_KKAL2-14-LICENSE
3/14/2011 10:21:14 PM GPUGRID Started download of F435-TONI_KKAL2-14-COPYRIGHT
3/14/2011 10:21:16 PM GPUGRID Finished download of F435-TONI_KKAL2-14-LICENSE
3/14/2011 10:21:16 PM GPUGRID Finished download of F435-TONI_KKAL2-14-COPYRIGHT
3/14/2011 10:21:16 PM GPUGRID Started download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_1
3/14/2011 10:21:16 PM GPUGRID Started download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_2
3/14/2011 10:21:30 PM GPUGRID Finished download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_2
3/14/2011 10:21:30 PM GPUGRID Started download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_3
3/14/2011 10:21:31 PM GPUGRID Finished download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_1
3/14/2011 10:21:31 PM GPUGRID Started download of F435-TONI_KKAL2-14-pdb_file
3/14/2011 10:21:36 PM GPUGRID Finished download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_3
3/14/2011 10:21:36 PM GPUGRID Started download of F435-TONI_KKAL2-14-psf_file
3/14/2011 10:21:37 PM GPUGRID Finished download of F435-TONI_KKAL2-14-psf_file
3/14/2011 10:21:37 PM GPUGRID Started download of F435-TONI_KKAL2-14-par_file
3/14/2011 10:21:45 PM GPUGRID Finished download of F435-TONI_KKAL2-14-pdb_file
3/14/2011 10:21:45 PM GPUGRID Started download of F435-TONI_KKAL2-14-conf_file_enc
3/14/2011 10:21:46 PM GPUGRID Finished download of F435-TONI_KKAL2-14-conf_file_enc
3/14/2011 10:21:46 PM GPUGRID Started download of F435-TONI_KKAL2-14-metainp_file
3/14/2011 10:21:47 PM GPUGRID Finished download of F435-TONI_KKAL2-14-metainp_file
3/14/2011 10:21:47 PM GPUGRID Started download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_7
3/14/2011 10:21:48 PM GPUGRID Finished download of F435-TONI_KKAL2-14-F435-TONI_KKAL2-13-100-RND6477_7
3/14/2011 10:23:03 PM GPUGRID Finished download of F435-TONI_KKAL2-14-par_file
3/14/2011 11:07:18 PM World Community Grid General prefs: from World Community Grid (last modified 10-Nov-2010 19:33:48)
3/14/2011 11:07:18 PM World Community Grid Computer location: work
3/14/2011 11:07:18 PM General prefs: using separate prefs for work
3/14/2011 11:07:18 PM Reading preferences override file
3/14/2011 11:07:18 PM Preferences:
3/14/2011 11:07:18 PM max memory usage when active: 3276.16MB
3/14/2011 11:07:18 PM max memory usage when idle: 3276.16MB
3/14/2011 11:07:18 PM max disk usage: 30.00GB
3/14/2011 11:07:18 PM max CPUs used: 3
3/14/2011 11:07:18 PM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
3/14/2011 11:19:21 PM World Community Grid Resuming task c4cw_target03_070258503_0 using c4cw version 640
3/14/2011 11:21:35 PM rosetta@home Resuming task T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0 using minirosetta version 217
3/14/2011 11:22:39 PM Poem@Home Restarting task data_487_1300144434_1925235660_0 using poem version 100
3/14/2011 11:24:27 PM Running CPU benchmarks
3/14/2011 11:24:27 PM Suspending computation - running CPU benchmarks
3/14/2011 11:24:59 PM Benchmark results:
3/14/2011 11:24:59 PM Number of CPUs: 3
3/14/2011 11:24:59 PM 3043 floating point MIPS (Whetstone) per CPU
3/14/2011 11:24:59 PM 8777 integer MIPS (Dhrystone) per CPU
3/14/2011 11:25:00 PM Resuming computation
3/14/2011 11:43:11 PM RNA World Sending scheduler request: Requested by project.
3/14/2011 11:43:11 PM RNA World Not reporting or requesting tasks
3/14/2011 11:43:13 PM RNA World Scheduler request completed
3/15/2011 12:23:38 AM Docking Resuming task 1ohr1hps_mod0014crossdockinghiv1_14272_97767_0 using charmm34 version 623
3/15/2011 1:13:50 AM GPUGRID Computation for task 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1 finished
3/15/2011 1:13:51 AM GPUGRID Starting F435-TONI_KKAL2-14-100-RND6477_2
3/15/2011 1:13:53 AM GPUGRID Starting task F435-TONI_KKAL2-14-100-RND6477_2 using acemd2 version 613
3/15/2011 1:13:54 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_0
3/15/2011 1:13:54 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_1
3/15/2011 1:14:15 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_0
3/15/2011 1:14:15 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_2
3/15/2011 1:14:24 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_1
3/15/2011 1:14:24 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_3
3/15/2011 1:14:52 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_2
3/15/2011 1:14:52 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_4
3/15/2011 1:15:00 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_3
3/15/2011 1:15:00 AM GPUGRID Started upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_7
3/15/2011 1:15:02 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_7
3/15/2011 1:19:54 AM GPUGRID Sending scheduler request: To fetch work.
3/15/2011 1:19:54 AM GPUGRID Requesting new tasks for CPU
3/15/2011 1:19:59 AM GPUGRID Scheduler request completed: got 0 new tasks
3/15/2011 1:19:59 AM GPUGRID Message from server: No work sent
3/15/2011 1:19:59 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/15/2011 1:19:59 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/15/2011 1:19:59 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/15/2011 1:19:59 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/15/2011 1:19:59 AM GPUGRID Message from server: (reached limit of 2 GPU tasks in progress)
3/15/2011 1:24:52 AM GPUGRID Finished upload of 123-KASHIF_HIVPR_so_ba1-19-50-RND6369_1_4
3/15/2011 1:26:30 AM boincsimap Resuming task 20101211.529455_0 using simap version 512
3/15/2011 1:40:37 AM boincsimap Computation for task 20101211.529455_0 finished
3/15/2011 1:40:37 AM malariacontrol.net Resuming task wu_990_512_351961_0_1300147128_1 using openMalariaA version 652
3/15/2011 1:40:39 AM boincsimap Started upload of 20101211.529455_0_0
3/15/2011 1:40:39 AM boincsimap Sending scheduler request: To fetch work.
3/15/2011 1:40:39 AM boincsimap Requesting new tasks for CPU
3/15/2011 1:40:41 AM boincsimap Scheduler request completed: got 1 new tasks
3/15/2011 1:40:44 AM boincsimap Started download of 20101211.548662
3/15/2011 1:40:49 AM boincsimap Finished upload of 20101211.529455_0_0
3/15/2011 1:40:56 AM malariacontrol.net Computation for task wu_990_512_351961_0_1300147128_1 finished
3/15/2011 1:40:56 AM QMC@HOME Resuming task qasino_b3lypqz-P14_iso34.2860_0 using qasinoAlpha version 501
3/15/2011 1:40:57 AM Project communication failed: attempting access to reference site
3/15/2011 1:40:57 AM boincsimap Temporarily failed download of 20101211.548662: HTTP error
3/15/2011 1:40:57 AM boincsimap Backing off 1 min 0 sec on download of 20101211.548662
3/15/2011 1:40:57 AM malariacontrol.net Sending scheduler request: To fetch work.
3/15/2011 1:40:57 AM malariacontrol.net Requesting new tasks for CPU
3/15/2011 1:40:58 AM Internet access OK - project servers may be temporarily down.
3/15/2011 1:40:58 AM malariacontrol.net Started upload of wu_990_512_351961_0_1300147128_1_0
3/15/2011 1:40:58 AM malariacontrol.net Started upload of wu_990_512_351961_0_1300147128_1_1
3/15/2011 1:40:59 AM malariacontrol.net Finished upload of wu_990_512_351961_0_1300147128_1_0
3/15/2011 1:40:59 AM malariacontrol.net Finished upload of wu_990_512_351961_0_1300147128_1_1
3/15/2011 1:41:00 AM malariacontrol.net Scheduler request completed: got 1 new tasks
3/15/2011 1:41:02 AM malariacontrol.net Started download of wu_962_402_352336_0_1300167027
3/15/2011 1:41:03 AM malariacontrol.net Finished download of wu_962_402_352336_0_1300167027
3/15/2011 1:41:05 AM World Community Grid Sending scheduler request: To fetch work.
3/15/2011 1:41:05 AM World Community Grid Requesting new tasks for CPU
3/15/2011 1:41:07 AM World Community Grid Scheduler request completed: got 1 new tasks
3/15/2011 1:41:09 AM World Community Grid Started download of E201515_818_C.20.C17H10OS2.00144646.2.set1d06_C.20.C17H10OS2.00144646.2.zip
3/15/2011 1:41:10 AM World Community Grid Finished download of E201515_818_C.20.C17H10OS2.00144646.2.set1d06_C.20.C17H10OS2.00144646.2.zip
3/15/2011 1:41:58 AM boincsimap Started download of 20101211.548662
3/15/2011 1:42:03 AM boincsimap Finished download of 20101211.548662
3/15/2011 1:42:09 AM malariacontrol.net update requested by user
3/15/2011 1:42:11 AM GPUGRID update requested by user
3/15/2011 1:42:13 AM GPUGRID Sending scheduler request: Requested by user.
3/15/2011 1:42:13 AM GPUGRID Reporting 1 completed tasks, requesting new tasks for CPU
3/15/2011 1:42:15 AM GPUGRID Scheduler request completed: got 0 new tasks
3/15/2011 1:42:15 AM GPUGRID Message from server: No work sent
3/15/2011 1:42:15 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/15/2011 1:42:15 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/15/2011 1:42:15 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/15/2011 1:42:20 AM malariacontrol.net Sending scheduler request: Requested by user.
3/15/2011 1:42:20 AM malariacontrol.net Reporting 1 completed tasks, not requesting new tasks
3/15/2011 1:42:22 AM malariacontrol.net Scheduler request completed
3/15/2011 1:42:25 AM boincsimap update requested by user
3/15/2011 1:42:27 AM boincsimap Sending scheduler request: Requested by user.
3/15/2011 1:42:27 AM boincsimap Reporting 1 completed tasks, not requesting new tasks
3/15/2011 1:42:31 AM boincsimap Scheduler request completed
3/15/2011 1:43:46 AM GPUGRID Sending scheduler request: To fetch work.
3/15/2011 1:43:46 AM GPUGRID Requesting new tasks for CPU
3/15/2011 1:43:49 AM GPUGRID Scheduler request completed: got 0 new tasks
3/15/2011 1:43:49 AM GPUGRID Message from server: No work sent
3/15/2011 1:43:49 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/15/2011 1:43:49 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/15/2011 1:43:49 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/15/2011 1:43:49 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.
3/15/2011 1:44:24 AM GPUGRID Sending scheduler request: To fetch work.
3/15/2011 1:44:24 AM GPUGRID Requesting new tasks for CPU
3/15/2011 1:44:26 AM GPUGRID Scheduler request completed: got 0 new tasks
3/15/2011 1:44:26 AM GPUGRID Message from server: No work sent
3/15/2011 1:44:26 AM GPUGRID Message from server: No work is available for ACEMD beta version
3/15/2011 1:44:26 AM GPUGRID Message from server: Fermi-class GPU not supported by cuda2.2
3/15/2011 1:44:26 AM GPUGRID Message from server: ACEMD beta version is not available for your type of computer.
3/15/2011 1:44:26 AM GPUGRID Message from server: No work available for the applications you have selected. Please check your preferences on the web site.


Note that the last message that even mentions that workunit says it was resuming.
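Checking which message last mentions a given task is easy to automate on a saved copy of the message log. A minimal sketch, assuming the log is plain text with one message per line (wrapped lines, like those in the pasted log, would need re-joining first); the function names and file path are hypothetical:

```python
# Sketch: find the last BOINC message-log line that mentions a given task.
from typing import Iterable, Optional


def last_mention(lines: Iterable[str], task_name: str) -> Optional[str]:
    """Return the last line containing task_name, or None if it never appears."""
    last = None
    for line in lines:
        if task_name in line:
            last = line.rstrip("\n")
    return last


def last_mention_in_file(path: str, task_name: str) -> Optional[str]:
    """Same as last_mention, reading a saved log file (path is hypothetical)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return last_mention(f, task_name)


if __name__ == "__main__":
    task = "T0599_rH_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_23146_3686_0"
    sample = [
        f"3/14/2011 10:17:48 PM rosetta@home Restarting task {task} using minirosetta version 217",
        f"3/14/2011 11:21:35 PM rosetta@home Resuming task {task} using minirosetta version 217",
        "3/14/2011 11:22:39 PM Poem@Home Restarting task data_487_1300144434_1925235660_0 using poem version 100",
    ]
    print(last_mention(sample, task))  # the 11:21:35 PM "Resuming" line
```

A task whose last mention is "Resuming", with no later "Computation for task ... finished" line, is one that never reported completing after it was resumed.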

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 447,858
RAC: 912
Message 69843 - Posted 17 Mar 2011 17:58:39 UTC

I have noticed what might be a bug with the watchdog. When the watchdog shuts down the result due to having 99 decoys found, the work unit validates. However, if it shuts down the result after 100 or maybe more decoys are found, a validate error happens. See work unit 372159115 and work unit 372173859 for examples of validate errors due to having 100 decoys, and work unit 371520892 and work unit 370797357 for times when the watchdog was able to shut things down at 99 decoys, allowing the result to validate. I don't care about credits, but I hate avoidable wasted processing time that does nothing for science.

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 69859 - Posted 18 Mar 2011 17:08:54 UTC

As far as I know, the watchdog only checks whether your runtime is exceeded by 4 hours or more. I don't think it checks for the number of decoys generated. I would say that is checked by another part of the code.
____________

TPCBF

Joined: Nov 29 10
Posts: 105
ID: 403518
Credit: 1,704,662
RAC: 2,836
Message 69862 - Posted 18 Mar 2011 23:10:15 UTC - in response to Message ID 69859.

That apparently doesn't (always) work. I had a whole bunch of jobs that showed an estimated time to completion of anywhere from 4 to 8 hours when downloaded, and that I had to abort after 12 to 18 hours once they stopped "moving"...

Ralf

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 69863 - Posted 19 Mar 2011 7:53:14 UTC - in response to Message ID 69862.

I've not had that happening to me for some time. Usually it was sufficient for me to completely exit BOINC and restart it. The tasks would start processing again.

I realise that is not an ideal solution. The project should not need that type of micro-managing.
____________

TPCBF

Joined: Nov 29 10
Posts: 105
ID: 403518
Credit: 1,704,662
RAC: 2,836
Message 69880 - Posted 22 Mar 2011 16:31:32 UTC - in response to Message ID 69863.

It has been fine here for a while now too. And it has to be an issue with certain sets of WUs, as others would go through just fine before and after those "problem children"...

Ralf

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 69905 - Posted 27 Mar 2011 17:29:04 UTC

It seems like many of these IF3_like_* tasks aren't taking long at all, and some are failing for no obvious reason.

See for example 409586613 which completed 100 decoys successfully (isn't 99 the limit?) but then gave a validate error. A wingman had the same result.

But there have been other tasks, such as 409700121 where 100 decoys were also completed but the outcome was flagged as success.

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 69906 - Posted 27 Mar 2011 18:58:14 UTC

I (okay, my computer :) ) have successfully run tasks generating hundreds of decoys per task. That makes me think the 99/100 decoy limit is not hard-coded in the app. It appears to be a limit set by the task itself.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 69907 - Posted 28 Mar 2011 1:57:27 UTC

Right. Some types of tasks can produce models very quickly, so to avoid the potential for large uploads, those types of tasks cap the number of models. Other types of tasks can produce more than 100, but they run longer and so still won't typically generate an excessive result size. But that doesn't explain the validation error.
____________
Rosetta Moderator: Mod.Sense
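The two stopping rules described above (a time budget for every task, plus a model cap for fast task types) can be sketched as a loop. This is an illustration under stated assumptions, not Rosetta's code: `run_models`, `make_model`, and `max_models` are hypothetical names, and the model times are made up.

```python
# Sketch (not Rosetta's code): produce models until the time budget is used,
# and, for task types that set one, until a per-task model cap is reached.
from typing import Callable, Optional


def run_models(make_model: Callable[[], float],
               target_runtime_s: float,
               max_models: Optional[int] = None) -> int:
    """Return how many models are produced.

    make_model: hypothetical callable that builds one model and returns
                the seconds it took.
    max_models: hypothetical per-task cap; None means no cap.
    """
    elapsed = 0.0
    count = 0
    while elapsed < target_runtime_s:
        if max_models is not None and count >= max_models:
            break  # cap reached: stop early to keep the upload small
        elapsed += make_model()
        count += 1
    return count


if __name__ == "__main__":
    three_hours = 3 * 3600
    print(run_models(lambda: 25.0, three_hours, max_models=100))    # capped fast task: 100
    print(run_models(lambda: 25.0, three_hours))                    # uncapped fast task: 432
    print(run_models(lambda: 1800.0, three_hours, max_models=100))  # slow task: 6
```

Under these assumptions an uncapped fast task yields hundreds of models, matching transient's experience, while a slow task never approaches the cap.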

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 69909 - Posted 28 Mar 2011 16:57:27 UTC

I'm getting some more tasks that are failing with a Validate Error after supposedly generating 100 decoys.

This time though, they're failing after 25 seconds and have names starting with ilv_*. The message "Too many error results" occurs in the Workunit page.

409700185
409513977

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 447,858
RAC: 912
Message 69954 - Posted 4 Apr 2011 12:32:32 UTC

The work unit generator sometimes generates invalid work units that Rosetta Mini cannot open. Here are four of them:


Every time I run into these work units, they usually get sent out to someone else who suffers the same kind of error, forcing the server to eventually reject the work units as bad. Each of these work units involves a file that could not be opened, suggesting that they are corrupt. Could someone please look into why corrupt work units are being generated?

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 69955 - Posted 4 Apr 2011 13:13:28 UTC

T0471_boinc_nmr_max40_rerun_abrelax_cs_frags_tex_IGNORE_THE_REST_23855_1423_0
T0590_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_696_0

both ended with :

process exited with code 1 (0x1, -255)

ERROR: ERROR: FragmentIO: could not open file cs_frags.9mers.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

sparkler99

Joined: Mar 13 11
Posts: 7
ID: 414055
Credit: 5,469
RAC: 0
Message 69966 - Posted 4 Apr 2011 22:37:57 UTC

411802601
411731653
411731651

All three failed because they couldn't open cs_frags.9mers.gz. The file has the correct ownership, and I'm on Ubuntu, so there is no antivirus interfering.

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 69972 - Posted 5 Apr 2011 8:49:04 UTC

Two more, both newly issued overnight:
T0590_boinc_nmr_max40_rerun_abrelax_cs_frags_negative_tex_IGNORE_THE_REST_23856_3040 first copy sent 5 Apr 2011 3:07:32 UTC
T0569_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_6710 first copy sent 5 Apr 2011 5:21:21 UTC

All copies ended with:
process exited with code 1 (0x1, -255)

ERROR: ERROR: FragmentIO: could not open file cs_frags.9mers.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 69974 - Posted 5 Apr 2011 11:53:28 UTC

This looks a lot like a problem I've seen before. If so, the version of gzip sent as part of the workunit works properly in one direction but not the other.

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 447,858
RAC: 912
Message 69982 - Posted 6 Apr 2011 10:35:17 UTC

Here is another corrupt work unit: 376739708

While writing this post up, I remembered that one other project, Docking@home, once spewed corrupt work units all over the place because the disk drives on the work unit generation server got completely filled up. Therefore, the work unit generator was creating work units that were zero bytes long. The client software tried to crunch these corrupt work units and went nowhere, and failed to declare compute errors, wasting its volunteers' time and electricity until everyone was told to abort the corrupt work units. Could one of the Rosetta@home servers generating work units have a full hard drive? I am wondering if the same situation at Docking@home is happening here except that the client code is smart enough to declare a computation error when it encounters a corrupt work unit.
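Whether a downloaded input file is missing, zero bytes long, or truncated can be checked locally before blaming the client. A minimal sketch using Python's standard `gzip` module; `check_gzip` is a hypothetical helper, and the file names in the demo are illustrative only:

```python
# Sketch: classify a .gz input file as missing, empty (zero bytes),
# corrupt/truncated, or readable.
import gzip
import os
import tempfile
import zlib


def check_gzip(path: str) -> str:
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with gzip.open(path, "rb") as f:
            # Decompress everything: a truncated stream raises EOFError, and a
            # non-gzip or damaged file raises OSError (BadGzipFile) or zlib.error.
            while f.read(1 << 20):
                pass
    except (OSError, EOFError, zlib.error):
        return "corrupt"
    return "ok"


if __name__ == "__main__":
    payload = gzip.compress(b"fragment data\n" * 1000)
    with tempfile.TemporaryDirectory() as d:
        cases = {
            "good.gz": payload,
            "empty.gz": b"",
            "truncated.gz": payload[: len(payload) // 2],
        }
        for name, data in cases.items():
            with open(os.path.join(d, name), "wb") as f:
                f.write(data)
        for name in [*cases, "cs_frags.9mers.gz"]:  # the last file is never written
            print(name, check_gzip(os.path.join(d, name)))
```

Distinguishing "empty" from "corrupt" matters for the full-disk theory: a server with no free space would tend to produce zero-byte files rather than damaged ones.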

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 70021 - Posted 12 Apr 2011 16:43:45 UTC

A couple of new protein interface design tasks failing immediately with a computation error on Mac. Sample:

Task 414128879 (dck_rhoA_rhoA_2nr7_final_ProteinInterfaceDesign_11Apr2011_25012_119_0)

ERROR: Option matching -docking:no_filters not found in command line top-level context

moody
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jun 8 10
Posts: 11
ID: 383479
Credit: 88,068
RAC: 0
Message 70030 - Posted 13 Apr 2011 20:08:16 UTC

In response to Message 70021:

"A couple of new protein interface design tasks failing immediately with a computation error on Mac. Sample:

Task 414128879 (dck_rhoA_rhoA_2nr7_final_ProteinInterfaceDesign_11Apr2011_25012_119_0)

ERROR: Option matching -docking:no_filters not found in command line top-level context"

This was due to a Rosetta option that was recently renamed on our end but not in the version of Rosetta currently on Boinc. It should be fixed now. We apologize for any inconvenience.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 70045 - Posted 16 Apr 2011 17:59:04 UTC

This task 414671655 was puzzling in a couple of respects.

First, it took about 7 hours (with a 3-hour requested run time) to complete 1 decoy. That would be explicable if the model were particularly large, but the log indicated that some other error was occurring:


Watchdog active.
BOINC:: CPU time: 25609.9s, 14400s + 10800s[2011- 4-15 1:37: 2:] :: BOINC
WARNING! cannot get file size for default.out.gz: could not open file.
Output exists: default.out.gz Size: -1
InternalDecoyCount: 0 (GZ)
-----
0
-----
Stream information inconsistent.
Writing W_0000001
======================================================
DONE :: 1 starting structures 25609.9 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish

</stderr_txt>
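
The watchdog line can be read arithmetically. Assuming (the thread does not confirm this) that it reports the CPU time used followed by a run-time target and an extra allowance added on top, the limit is 14400 s (4 hours) + 10800 s (3 hours) = 25200 s (7 hours), and the reported 25609.9 s (about 7.1 hours) is just past it. The regex below is written only for this one line's format; `watchdog_check` is a hypothetical helper.

```python
import re

# Matches the watchdog line quoted above, e.g.
# "BOINC:: CPU time: 25609.9s, 14400s + 10800s[...] :: BOINC"
WATCHDOG_RE = re.compile(r"CPU time: ([\d.]+)s, ([\d.]+)s \+ ([\d.]+)s")


def watchdog_check(line: str) -> tuple[float, float, bool]:
    """Return (cpu_seconds, limit_seconds, over_limit) for one watchdog line.

    Assumes the limit is simply the sum of the two figures after the comma.
    """
    match = WATCHDOG_RE.search(line)
    if match is None:
        raise ValueError("not a watchdog CPU-time line")
    cpu, target, allowance = (float(g) for g in match.groups())
    limit = target + allowance
    return cpu, limit, cpu > limit


cpu, limit, over = watchdog_check(
    "BOINC:: CPU time: 25609.9s, 14400s + 10800s[2011- 4-15 1:37: 2:] :: BOINC"
)
print(f"{cpu / 3600:.2f} h used, limit {limit / 3600:.2f} h, over limit: {over}")
# 7.11 h used, limit 7.00 h, over limit: True
```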

bobgoblin

Joined: Oct 15 05
Posts: 2
ID: 4769
Credit: 1,616,056
RAC: 0
Message 70047 - Posted 17 Apr 2011 14:13:48 UTC
Last modified: 17 Apr 2011 14:15:18 UTC

I've noticed that both my i7 machines have been taking 12+ hours to complete WUs. When I look at pending tasks once they are reported, they all show less than 3 hours. My i5 machine is still crunching them in less than 3 hours. So I've disabled Rosetta on the i7s for now. Any idea what may be causing that?

All machines are running Win7, though the i7s were upgraded from Vista 64-bit last December; the i5 had Win7 installed when it was built.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70048 - Posted 17 Apr 2011 16:48:13 UTC - in response to Message ID 70047.
Last modified: 17 Apr 2011 16:48:31 UTC

I've noticed that both my i7 machines have been taking 12+ hours to complete WUs. When I look at pending tasks once they are reported, they all show less than 3 hours. My i5 machine is still crunching them in less than 3 hours. So I've disabled Rosetta on the i7s for now. Any idea what may be causing that?

All machines are running Win7, though the i7s were upgraded from Vista 64-bit last December; the i5 had Win7 installed when it was built.


Your main concern (because you've "disabled Rosetta") seems to be whether things are running properly, or doing harm to your machines. Nothing you've described implies any harm. In fact, the tasks complete in the 3 hours of CPU time that you've (likely) set (or defaulted to) in your R@h preferences.

I think what you are saying is that "wall clock" time is over 12 hours, but actual CPU time is around 3 hours. So the question boils down to asking why tasks might not be receiving CPU time when they are trying to run. This could be due to other tasks on the machine demanding CPU (as BOINC runs at lowest possible priority, and will yield to other tasks).
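
The distinction is easy to quantify: dividing CPU time by wall-clock time gives the fraction of the elapsed time the task actually ran. A minimal sketch using the round figures from the question (3 hours of CPU over 12 hours of wall clock); `cpu_share` is a hypothetical helper.

```python
def cpu_share(cpu_hours: float, wall_hours: float) -> float:
    """Fraction of wall-clock time during which the task actually got CPU."""
    if wall_hours <= 0:
        raise ValueError("wall-clock time must be positive")
    return cpu_hours / wall_hours


share = cpu_share(3.0, 12.0)
print(f"task ran {share:.0%} of the time")  # task ran 25% of the time
```

A share this far below 100% points at the task being held back from the CPU rather than at the work itself being slower on that hardware.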

It seems fairly likely that with one 8-core machine running in 6GB of memory and the other 8-core machine running in 8GB of memory, you would see "waiting for memory" as the status of several tasks rather than "running". This causes BOINC to stop giving those tasks CPU time until the total memory of the other active tasks comes back down within the memory preferences set in your BOINC Manager. So, when memory becomes constrained, BOINC is no longer using all of the CPUs of the machine (or all of the CPUs BOINC is configured to use).
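
A simplified model of that behaviour, assuming tasks are admitted in order until the next one would push the total over the memory limit, with the rest marked "waiting for memory". The real BOINC client logic is more involved than this; the function name, task sizes, and limit below are hypothetical numbers for illustration.

```python
def schedule_by_memory(task_mb: list[int], limit_mb: int) -> tuple[list[int], list[int]]:
    """Split tasks into (running, waiting_for_memory) indices.

    Simplified model: tasks are considered in order, and any task that
    would push the running total over limit_mb is left waiting.
    """
    running, waiting = [], []
    used = 0
    for index, size in enumerate(task_mb):
        if used + size <= limit_mb:
            running.append(index)
            used += size
        else:
            waiting.append(index)
    return running, waiting


# Eight tasks of ~300 MB each against a hypothetical 1800 MB allowance.
running, waiting = schedule_by_memory([300] * 8, 1800)
print(len(running), "running,", len(waiting), "waiting for memory")  # 6 running, 2 waiting for memory
```

With 8 cores but only 6 tasks admitted, two cores sit idle even though BOINC is allowed to use them.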

This likely is not occurring on your 4-core machine because it has 6GB of memory (at least 50% more per core than either of the other machines).
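
The per-core figures behind that comparison, using the memory sizes quoted in this reply: 1.5 GB per core on the 4-core machine versus 1.0 and 0.75 GB per core on the two 8-core machines, i.e. 50% and 100% more respectively. `gb_per_core` is just an illustrative helper.

```python
def gb_per_core(memory_gb: float, cores: int) -> float:
    """Memory available per core (or per hardware thread), in GB."""
    return memory_gb / cores


machines = {
    "4-core machine, 6 GB": (6, 4),
    "8-core machine, 8 GB": (8, 8),
    "8-core machine, 6 GB": (6, 8),
}
baseline = gb_per_core(6, 4)
for name, (gb, cores) in machines.items():
    per_core = gb_per_core(gb, cores)
    print(f"{name}: {per_core:.2f} GB per core ({baseline / per_core:.1f}x less than the 4-core)")
```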

This thread has a number of ideas and descriptions of what to expect and what actions you might take to help things run better.
____________
Rosetta Moderator: Mod.Sense

BerlinTomek

Joined: Mar 11 09
Posts: 3
ID: 305503
Credit: 7,701,020
RAC: 4,317
Message 70049 - Posted 17 Apr 2011 20:36:35 UTC
Last modified: 17 Apr 2011 20:37:31 UTC

What does this mean?
Can you tell me if these errors are caused by a hardware fault?

I can't believe that, because my i7 (overclocked to 3.74 GHz)
never reaches a temperature higher than 60°C.

So what's wrong?

"Task ID 415317758
Name T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_18211_0
Workunit 379356995
Created 17 Apr 2011 12:39:28 UTC
Sent 17 Apr 2011 12:42:55 UTC
Received 17 Apr 2011 12:49:46 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 1388706
Report deadline 27 Apr 2011 12:42:55 UTC
CPU time 3.296875
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
[2011- 4-17 14:44:30:] :: BOINC:: Initializing ... ok.
[2011- 4-17 14:44:30:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: ERROR: FragmentIO: could not open file cs_frags.9mers.gz
ERROR:: Exit from: ..\..\src\core\fragment\FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 0.022264471690409
Granted credit 0
application version 2.17

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70052 - Posted 17 Apr 2011 22:25:00 UTC

ERROR: FragmentIO: could not open file cs_frags.9mers.gz

It sounds like there is a problem with work units starting with T0471 and T0475. When you got the error, another copy was created for processing, and in each case the other person got the same error that you did. So it sounds like the work unit has a problem, not your machine.
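
The reasoning here is a cross-host comparison: if every copy of a work unit fails the same way on different volunteers' machines, the work unit itself is the likely culprit. A minimal sketch of that check, assuming results are available as simple (work unit, host, error) tuples; the record format, names, and sample data are hypothetical, not the BOINC database schema.

```python
from collections import defaultdict


def suspect_workunits(results: list[tuple[str, str, str]]) -> set[str]:
    """Return work units where every replica failed with the same error.

    Each result is (workunit, host, error), with error == "" for success.
    Requiring failures on at least two different hosts is what points at
    the work unit rather than at any one volunteer's machine.
    """
    by_wu: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for workunit, host, error in results:
        by_wu[workunit].append((host, error))

    suspects = set()
    for workunit, replicas in by_wu.items():
        hosts = {host for host, _ in replicas}
        errors = {error for _, error in replicas}
        if len(hosts) >= 2 and len(errors) == 1 and "" not in errors:
            suspects.add(workunit)
    return suspects


frag_error = "FragmentIO: could not open file cs_frags.9mers.gz"
results = [
    ("wu_A", "host1", frag_error),
    ("wu_A", "host2", frag_error),
    ("wu_B", "host1", frag_error),
    ("wu_B", "host3", ""),  # succeeded elsewhere, so host1 is suspect instead
]
print(suspect_workunits(results))  # {'wu_A'}
```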

BerlinTomek

Joined: Mar 11 09
Posts: 3
ID: 305503
Credit: 7,701,020
RAC: 4,317
Message 70053 - Posted 17 Apr 2011 23:00:11 UTC - in response to Message ID 70052.

ERROR: FragmentIO: could not open file cs_frags.9mers.gz

It sounds like there is a problem with work units starting with T0471 and T0475. When you got the error, another copy was created for processing, and in each case the other person got the same error that you did. So it sounds like the work unit has a problem, not your machine.



OK, thanks for the quick answer... I just thought something bad was going on with my CPU!

Hope they fix the problem as fast as possible.
My machine hates working on useless BOINC units ;-)

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 70054 - Posted 18 Apr 2011 4:27:07 UTC

I may have a defective workunit.

Rosetta Mini 2.17
T0533_rH_stg0_lrlxjcst_t000__casp9_w_symm_fm_2qmx_2_SAVE_ALL_OUT_25095_1910

Elapsed 20:56:54, Progress 20.026% and not changing, To completion 40:53:28

BOINC thinks it is running, but it's using no CPU time at all.

No error messages seen.

CPU time at last checkpoint 03:11:48
CPU time 03:12:15

I've selected workunits expected to last 12 hours.

No relevant messages in the BOINC log file since:

4/17/2011 7:40:44 AM rosetta@home Restarting task T0533_rH_rs_stg0_lrlxjcst_t000__casp9_w_symm_fm_2qmx_2_SAVE_ALL_OUT_25095_1910_0 using minirosetta version 217

I've restarted BOINC to see if having it restart from the last checkpoint will help.
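
Restarting BOINC rolls the task back to its last checkpoint, so the work at risk is the CPU time accumulated since that checkpoint. A minimal sketch using the two CPU-time figures above; the helper names are illustrative.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration, as shown in the BOINC Manager, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def work_at_risk(cpu_time: str, last_checkpoint: str) -> int:
    """CPU seconds that would be redone after restarting from the checkpoint."""
    return hms_to_seconds(cpu_time) - hms_to_seconds(last_checkpoint)


print(work_at_risk("03:12:15", "03:11:48"), "seconds")  # 27 seconds
```

Here only 27 seconds of CPU time is at stake, so the restart costs very little.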

Now showing 05:29:07 elapsed, 20.026% progress, 17:45:40 to completion, and Waiting to run.

Mad_Max

Joined: Dec 31 09
Posts: 150
ID: 365007
Credit: 4,704,855
RAC: 9,235
Message 70056 - Posted 18 Apr 2011 12:15:49 UTC
Last modified: 18 Apr 2011 12:25:45 UTC

Over the last few days I've gotten a big batch of "Compute error" results on tasks whose names start with "T0xxx_".
These tasks end with errors a few seconds after starting. Some examples:
T0475_boinc_nmr_homology_max10_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_25329_232_0
T0462_boinc_nmr_homology_max10_abrelax_cs_frags_tex_IGNORE_THE_REST_25326_218_0
T0462_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_12436_1
T0589_symm_cm_runs_soeding_alns_relax_default_repeat_2_fix_csts_25310_1377_1
T0462_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_12293_1
T0569_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_11706_0
T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_11263_1



And 4 WUs with validate errors - all of the same type (ProteinG_abinitio_SAVE_ALL_OUT_design_relax_):

ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g007_003_25073_196_1
ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g002_008_25063_189_1
ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g006_001_25071_111_0
ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g003_008_25065_94_1


All other types of WUs run normally on this machine.

P.S.
My wingmen on these WUs received the same errors.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 70057 - Posted 18 Apr 2011 12:35:20 UTC - in response to Message ID 70054.

Rosetta Mini 2.17
T0533_rH_stg0_lrlxjcst_t000__casp9_w_symm_fm_2qmx_2_SAVE_ALL_OUT_25095_1910

The next morning:

10:48:20 elapsed, 20.021% progress, 25:26:47 to completion.

Now aborted.

Tex1954

Joined: Apr 3 11
Posts: 9
ID: 415829
Credit: 2,607,582
RAC: 12
Message 70058 - Posted 18 Apr 2011 14:55:51 UTC - in response to Message ID 70056.
Last modified: 18 Apr 2011 15:04:51 UTC

Over the last few days I've gotten a big batch of "Compute error" results on tasks whose names start with "T0xxx_".
These tasks end with errors a few seconds after starting. Some examples:..."CUT"


Yup, I just had a batch of 8 errors from 4 computers myself (2 laptops, 2 desktops), an average of 1 error per day per computer. In fact, I just noticed in the history that my main computer had about 5 more fail really fast too.

***************************************************************************
T0471_boinc_nmr_homology_max10_abrelax_cs_frags_nocst_tex_IGNORE_THE_REST_25328_918_0 00:01:21 (00:00:01) 4/16/2011 8:08:27 PM 4/16/2011 8:10:29 PM Reported: Computation error (1,)

T0471_boinc_nmr_homology_max10_abrelax_cs_frags_tex_IGNORE_THE_REST_25326_931_0 00:01:13 (00:00:01) 4/16/2011 8:06:45 PM 4/16/2011 8:10:29 PM Reported: Computation error (1,)

T0471_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_15071_1 00:01:27 (00:00:02) 4/16/2011 8:05:13 PM 4/16/2011 8:06:45 PM Reported: Computation error (1,)

T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_14479_0 00:01:42 (00:00:02) 4/16/2011 5:04:57 PM 4/16/2011 5:07:01 PM Reported: Computation error (1,)

T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_14438_1 00:01:37 (00:00:02) 4/16/2011 4:08:01 PM 4/16/2011 4:12:33 PM Reported: Computation error (1,)

***************************************************************************

It isn't OUR fault, and they only waste a few seconds. I'm sure the PTBs are on it.


8-)


Tex1954

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 70059 - Posted 18 Apr 2011 15:04:22 UTC
Last modified: 18 Apr 2011 15:05:01 UTC

Over the last few days I've gotten a big batch of "Compute error" results on tasks whose names start with "T0xxx_". These tasks end with errors a few seconds after starting. Some examples:


To his list of those ending "in only a few seconds" with the error message "file cs_frags.9mers.gz" you can add:

415450587
415451112
415451774
415525056
415086428

415038519
415038519
415008687
415001340
415384421

415376541
415367010
415339775
415326968
415043715

415337799
415300558
415299126
415289036
415068958

415020984
414880550
415302737
415245580
415216583

415210228
415605253
415563904
415554009
415542857

415540472
415533975
415533394
415519909
415503828

415487923
415485024
415473737
415472523
415466575

415465650
415068765
415064287
415062582
415062402

415053058
415343211
415335164
415333379
415278070

This does NOT represent a COMPLETE listing of what I have seen on my systems - I just listed the FIRST 50 or so that have FAILED with this error so far TODAY. And it is still early.

These errors have been going on for AT LEAST 2 WEEKS and have been the topic of discussion in another thread on this board. These are all "fresh" tasks having been issued by the Rosetta server in the last day or two.

Tex1954

Joined: Apr 3 11
Posts: 9
ID: 415829
Credit: 2,607,582
RAC: 12
Message 70060 - Posted 18 Apr 2011 15:08:10 UTC - in response to Message ID 70059.
Last modified: 18 Apr 2011 15:10:02 UTC

CUT...
These errors have been going on for AT LEAST 2 WEEKS and have been the topic of discussion in another thread on this board. These are all "fresh" tasks having been issued by the Rosetta server in the last day or two.


Any way to know WHEN this group of tasks was generated on the server? Could it be it's an old batch and we just need to work through it?

Or is it possible the errors themselves are significant in the process?

8-)

Tex1954

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 70061 - Posted 18 Apr 2011 15:24:10 UTC

Tex1954 said: Any way to know WHEN this group of tasks was generated on the server? Could it be it's an old batch and we just need to work through it?


Good question - I just looked at a few of them on my previous list and the task creation dates were 16 April and 18 April.

So this is NOT a case of just letting "old" jobs work their way through the system.

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 70062 - Posted 18 Apr 2011 15:41:53 UTC

You can add these to Mad Max's list of failing tasks (with matching wingman results) whose names take the form:

ProteinG_abinitio_SAVE_ALL_OUT_design_relax

415571046
414989706
414921629
415131368
415102869

415091934
414802441
415091930
415171797
415008017

This is not an exhaustive list of this type of error found on my systems – these were all “fresh” tasks with creation dates between 16 April and 18 April.

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 70063 - Posted 18 Apr 2011 15:56:18 UTC

I am seeing validate errors (with matching wingman results) on tasks whose name has the form of:

T0590_boinc_nmr_homology_max10_loopbuild_threading_cst_relax_tex

A few samples would be:

414980981
414994609
414957506
414950332
415065606

Tex1954

Joined: Apr 3 11
Posts: 9
ID: 415829
Credit: 2,607,582
RAC: 12
Message 70064 - Posted 18 Apr 2011 16:06:46 UTC - in response to Message ID 70063.

I am seeing validate errors (with matching wingman results) on tasks whose name has the form of:

T0590_boinc_nmr_homology_max10_loopbuild_threading_cst_relax_tex

A few samples would be:

414980981
414994609
414957506
414950332
415065606


Well, I suggest they give us a storage fee in the form of double normal points to house their flawed software on our systems...

LOL!

8-)

Tex1954

Chris Holvenstot

Joined: May 2 10
Posts: 220
ID: 379129
Credit: 9,106,918
RAC: 0
Message 70066 - Posted 18 Apr 2011 16:18:08 UTC

Tex1954 said: Well, I suggest they give us a storage fee in the form of double normal points to house their flawed software on our systems...


There are even a few additional types of tasks getting validate errors with matching wingman results, but there are fewer of them, so I'm just going to sit back and see what they do with these before sorting through more of the chaff.

It sure would be nice if they would update their server software so that we could pull a task list by Server State / Outcome like some of the other projects have. It would make digging through the results a bunch easier.
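
Until the server offers such a filter, a task list copied from the results pages can be tallied locally. A minimal sketch, assuming each row has already been reduced to a (task name, outcome) pair; scraping the web page itself is not shown, `errors_by_batch` is hypothetical, and grouping by the text before the first underscore is just a convenient proxy for "batch".

```python
from collections import Counter


def errors_by_batch(rows: list[tuple[str, str]], outcome: str = "Client error") -> Counter:
    """Count tasks with the given outcome, grouped by the name prefix before the first '_'."""
    counts: Counter = Counter()
    for name, result in rows:
        if result == outcome:
            counts[name.split("_", 1)[0]] += 1
    return counts


rows = [
    ("T0475_boinc_nmr_max40_rerun_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_23858_18211_0", "Client error"),
    ("T0471_boinc_nmr_homology_max10_abrelax_cs_frags_tex_IGNORE_THE_REST_25326_931_0", "Client error"),
    ("T0475_boinc_nmr_homology_max10_abrelax_cs_frags_permuted_tex_IGNORE_THE_REST_25329_232_0", "Client error"),
    ("ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g007_003_25073_196_1", "Validate error"),
]
print(errors_by_batch(rows).most_common())  # [('T0475', 2), ('T0471', 1)]
```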

Tex1954

Joined: Apr 3 11
Posts: 9
ID: 415829
Credit: 2,607,582
RAC: 12
Message 70069 - Posted 18 Apr 2011 19:51:01 UTC - in response to Message ID 70066.

There are even a few additional types of tasks getting validate errors with matching wingman results, but there are fewer of them, so I'm just going to sit back and see what they do with these before sorting through more of the chaff.

It sure would be nice if they would update their server software so that we could pull a task list by Server State / Outcome like some of the other projects have. It would make digging through the results a bunch easier.



Well, I am not a developer for their tasks, just a helper with hardware. These (all BOINC etc. tasks) are cooperative ventures. Sometimes certain folks feel superior and/or embarrassed and clog or break the information circle...

It would be nice if "someone" who actually writes the apps would pop in and let us know somebody is awake! As mentioned before, how about some sort of status message germane to the current situation? A one-line Sticky NOTE, for crying out loud?

LOL!

Anyway, plugging along with the rest of y'all...

8-)

bobgoblin

Joined: Oct 15 05
Posts: 2
ID: 4769
Credit: 1,616,056
RAC: 0
Message 70070 - Posted 19 Apr 2011 1:30:21 UTC - in response to Message ID 70048.



Your main concern (because you've "disabled Rosetta") seems to be whether things are running properly, or doing harm to your machines. Nothing you've described implies any harm. In fact, the tasks complete in the 3 hours of CPU time that you've (likely) set (or defaulted to) in your R@h preferences.

I think what you are saying is that "wall clock" time is over 12 hours, but actual CPU time is around 3 hours. So the question boils down to asking why tasks might not be receiving CPU time when they are trying to run. This could be due to other tasks on the machine demanding CPU (as BOINC runs at lowest possible priority, and will yield to other tasks).

It seems fairly likely that with one 8-core machine running in 6GB of memory and the other 8-core machine running in 8GB of memory, you would see "waiting for memory" as the status of several tasks rather than "running". This causes BOINC to stop giving those tasks CPU time until the total memory of the other active tasks comes back down within the memory preferences set in your BOINC Manager. So, when memory becomes constrained, BOINC is no longer using all of the CPUs of the machine (or all of the CPUs BOINC is configured to use).

This likely is not occurring on your 4-core machine because it has 6GB of memory (at least 50% more per core than either of the other machines).

This thread has a number of ideas and descriptions of what to expect and what actions you might take to help things run better.



I was not concerned that Rosetta was causing harm to the i7s, nor did I believe it was. I have not seen the "waiting for memory" message on either machine. The 12+ hour crunch time is a very recent development, only in the last few weeks. Have the memory demands of Rosetta increased? If so, I will stop running it on those machines and continue to run SETI, CPDN, and Einstein instead.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70072 - Posted 19 Apr 2011 4:25:39 UTC

R@h memory demands vary considerably with different types of tasks. I've noticed several recently that are using more than 300MB of memory at times.

If you catch it happening again, please look at the BOINC Manager to see if you've got 8 tasks in a "running" status, and then look at Windows task manager to see if all 8 are getting CPU time.

There has been another problem that seems to come up from time to time, where the task looks like it is running from BOINC's point of view but is not actually getting any CPU time. And because it never gets CPU time, the R@h watchdog can't take action to end the task or clean it up (it would need CPU to take any action). As I recall, the only way around that one (other than aborting the task) is to completely end and restart BOINC (just suspending the task and resuming it doesn't seem to resolve the problem). With 7 other tasks running, and standing to lose the work they've done since their last checkpoints when you restart BOINC, you may come out ahead by just aborting such tasks, or, if you know you are going to reboot the machine soon, by suspending them until you reboot.
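
The symptom (BOINC shows the task as running while its CPU time stays flat) can be confirmed from two readings of each task's CPU time taken a few minutes apart, for example from Windows Task Manager or the task properties in the BOINC Manager. A minimal sketch, assuming you have noted those readings by hand; `stalled_tasks`, the task names, and the 1-second threshold are hypothetical.

```python
def stalled_tasks(before: dict[str, float], after: dict[str, float],
                  min_gain_s: float = 1.0) -> list[str]:
    """Return tasks whose CPU time grew by less than min_gain_s between readings.

    'before' and 'after' map task name -> CPU seconds for tasks BOINC shows
    as running; a task missing from 'after' is skipped (it may have finished).
    """
    return sorted(
        name
        for name, cpu_before in before.items()
        if name in after and after[name] - cpu_before < min_gain_s
    )


before = {"task_a": 11508.0, "task_b": 4200.0}
after = {"task_a": 11508.0, "task_b": 4500.0}  # readings ~5 minutes later
print(stalled_tasks(before, after))  # ['task_a']
```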

I have not been able to determine any pattern as to what makes this occur, where BOINC says the task is running yet it isn't allocated any CPU time. So any details would help: the mix with other projects, the number of tasks involved, or the amount of memory the stalled task shows in Windows Task Manager. Hopefully, with enough detail, a pattern will begin to emerge. I'm not positive, but I believe this has only been occurring on Windows machines, so perhaps that's a start.

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 70075 - Posted 19 Apr 2011 15:09:03 UTC - in response to Message ID 70072.

I have not been able to determine any pattern as to what makes this occur, where BOINC says the task is running yet it isn't allocated any CPU time. So any details would help: the mix with other projects, the number of tasks involved, or the amount of memory the stalled task shows in Windows Task Manager. Hopefully, with enough detail, a pattern will begin to emerge. I'm not positive, but I believe this has only been occurring on Windows machines, so perhaps that's a start.


Sorry, Mod.Sense, I've seen it occur, albeit extremely rarely, on my Mac. It also occurs on many other projects although it seems more prevalent here on Rosetta; at least, there are more complaints here than on the other boards I peruse. Have you been in contact with Josef Segur? Judging from his most recent contribution to the boinc_dev list ("check_progress option") he has an interest in this problem and could probably point you to discussions elsewhere and/or individuals who are also collecting observations and trying to discern patterns. It also might be helpful if a project the size and import of Rosetta expressed an interest in having BOINC address the issue.

Best,
Snags

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 70076 - Posted 19 Apr 2011 16:24:05 UTC

Maybe lack of memory is a cause, as you (Mod.Sense) said. The one machine I've seen it happen on had 'only' 1 GB, and I regularly saw the message "not enough available memory" on that machine when using other applications. I don't have that machine any more, and I can't remember Rosetta stopping on other machines that had more memory, except maybe on one machine that had 2 GB.

bump

Joined: Apr 13 10
Posts: 1
ID: 377169
Credit: 2,315,841
RAC: 0
Message 70077 - Posted 19 Apr 2011 17:26:32 UTC

I too am getting the compute errors, and I do not believe memory is the issue. I have 3 boxes and they all get the errors. Two of them have 4GB and are essentially idle most of the day. According to the stats, a maximum of 2.1 GB of the 4GB has ever been used. Along with the compute errors, I am having difficulty uploading finished jobs. Right now I have 10 queued up on one box, and they keep getting paused and set for retry.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70078 - Posted 19 Apr 2011 17:33:07 UTC

Yes, just from my own observations it seemed more likely to occur when tasks were contending for memory. What makes it hard to study is the lack of messages about tasks being suspended to wait for memory. And when multiple projects were in the mix, it got very difficult to tell whether another task was started due to memory limits, due to project switching, or for some other reason. Another factor is that when you get on your machine to look at things, your "when active" memory preference is often lower than your "when idle" one, so simply observing the problem affects it too.
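
The observer effect is easy to see with numbers. A minimal sketch assuming a 6 GB machine, hypothetical limits of 25% of memory while the computer is in use and 75% while idle, and tasks of about 400 MB each; none of these values come from actual BOINC defaults, and `tasks_that_fit` is an illustrative helper.

```python
def tasks_that_fit(total_mb: int, limit_pct: int, task_mb: int) -> int:
    """How many equally sized tasks fit under a percentage-of-RAM limit."""
    return total_mb * limit_pct // 100 // task_mb


total = 6 * 1024  # 6 GB machine
for label, pct in (("in use", 25), ("idle", 75)):
    print(f"{label}: {tasks_that_fit(total, pct, 400)} tasks fit")
# in use: 3 tasks fit
# idle: 11 tasks fit
```

With 8 tasks wanting to run, all 8 fit while the machine is idle, but sitting down at it drops that to 3, so checking on the tasks changes what you see.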

I never found a way to make it happen. And even when memory is constrained, it doesn't seem to happen very often. Yet when it does happen, it seems to come in waves: you see it several times in just a few days, and then not again for weeks or months. It makes me wonder whether the OS is properly swapping memory back in when a task is resumed.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 70080 - Posted 20 Apr 2011 5:00:41 UTC
Last modified: 20 Apr 2011 5:13:40 UTC

I've been seeing it occasionally with a computer running 64-bit Windows Vista, with 8 GB memory available and BOINC allowed to use 40% of that. I've already posted more details about some of those workunits earlier in this thread. No error messages about why unless they're in that workunit's log file.

For the last few, the time when it stopped using any CPU time at all was around 1 minute after it resumed processing after the last checkpoint.

I have the same computer participating in most of the other BOINC projects related to medical research, with those that do not have checkpoints currently disabled. It occasionally runs two minirosetta workunits at a time; three CPU cores are set to allow BOINC use, but I don't remember seeing three minirosetta workunits try to run at once.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 70143 - Posted 27 Apr 2011 16:00:43 UTC

Task (blind_rhoda_boinc_nmr_control.2nz6A_330_abrelax_cs_frags_sgourn_IGNORE_THE_REST_25677_1336_0) 418323998 failed on Mac after about 5 minutes. Other tasks with names like blind* fail similarly.

ERROR: ct == final_atoms
ERROR:: Exit from: src/core/scoring/rms_util.cc line: 475
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

James Thompson

Joined: Oct 13 05
Posts: 46
ID: 4392
Credit: 186,109
RAC: 0
Message 70145 - Posted 27 Apr 2011 17:46:35 UTC

Thanks svincent. This is another input file issue, this time from a different user. The jobs have been removed, and we're working on the problem right now.



Ian_D

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 70186 - Posted 30 Apr 2011 21:58:00 UTC
Last modified: 30 Apr 2011 21:59:40 UTC

So what's with the following "No heartbeat from core client for 30 sec - exiting" messages?

The job had been sitting doing NOTHING for 13.5 hours (???) when I noticed it, so I restarted BOINC.

The Windows XP PC concerned is using nVidia onboard graphics (no idea if this has any bearing)
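
As the message itself indicates, the application exits when it has not heard from the BOINC core client for 30 seconds, on the assumption that the client has gone away. A minimal sketch of that timeout pattern, not BOINC's actual code; the `HeartbeatMonitor` class and its injectable clock are hypothetical, added so the logic can be exercised without waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Track the last heartbeat and report when it is older than a timeout.

    Illustrative only; the 30-second default mirrors the log message.
    """

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat arrived from the client."""
        self.last_beat = self.clock()

    def should_exit(self) -> bool:
        """True once no heartbeat has arrived for longer than timeout_s."""
        return self.clock() - self.last_beat > self.timeout_s


# Simulated clock so the example runs instantly.
now = [0.0]
monitor = HeartbeatMonitor(clock=lambda: now[0])
now[0] = 20.0
monitor.beat()
now[0] = 45.0
print(monitor.should_exit())  # False (25 s since last beat)
now[0] = 51.0
print(monitor.should_exit())  # True (31 s since last beat)
```

In the log, each such exit is followed by the task starting again and continuing from its last checkpoint.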

http://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2011- 4-30 2:38:23:] :: BOINC:: Initializing ... ok.
[2011- 4-30 2:38:23:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910F1E read attempt to address 0x00000001

Engaging BOINC Windows Runtime Debugger...

[2011- 4-30 21:39:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 21:39:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_2 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 6:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 6:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 7:11:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 7:11:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk1_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk2_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk3_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk4_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk5_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk6_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk7_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk8_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk9_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk10_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk11_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk12_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk13_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk14_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk15_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk16_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk17_fa ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 8:47:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 8:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
======================================================
DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
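For context, the repeated `No heartbeat from core client for 30 sec - exiting` lines are written by the application side. A BOINC science app expects periodic heartbeat messages from the core client. If those messages stop, the app assumes the client has gone away and exits. The client then restarts it later, and it resumes from checkpoints.

Below is a simplified model of that check. It assumes a 30-second timeout, as printed in the log. The class and method names are hypothetical, and this is not the BOINC API's actual code.

```python
import time
from typing import Callable, Optional

HEARTBEAT_TIMEOUT_SEC = 30.0  # the "30 sec" printed in the log


class HeartbeatWatchdog:
    """Hypothetical model: remember when the last client heartbeat arrived
    and report that the app should exit once the client has been silent
    for longer than the timeout."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT_SEC,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock              # injectable so the logic can be tested
        self.last_heartbeat = clock()   # treat startup as a fresh heartbeat

    def on_heartbeat(self) -> None:
        # Called whenever a heartbeat message from the core client arrives.
        self.last_heartbeat = self.clock()

    def check(self) -> Optional[str]:
        # Called periodically by the app's timer. Returns the exit message
        # if the client has been silent too long, otherwise None.
        silent_for = self.clock() - self.last_heartbeat
        if silent_for > self.timeout:
            return (f"No heartbeat from core client for "
                    f"{int(self.timeout)} sec - exiting")
        return None
```

In this model, a suspended or hung client that stops sending heartbeats makes every running task quit. That would match the pattern of exit-and-restart blocks in the log.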
____________


Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 70193 - Posted 1 May 2011 7:50:35 UTC - in response to Message ID 70186.

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
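The same check can be done mechanically on any result. In these logs, the `DONE ::` summary near the end of `stderr_txt` shows that the task finished, however many times it was restarted along the way.

The sketch below tallies starts, heartbeat exits and crashes, and pulls out the final summary. The match strings are copied from the log lines in this thread. The function and field names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Match strings copied from the stderr_txt in this thread.
START_MARK = "BOINC:: Initializing"
HEARTBEAT_MARK = "No heartbeat from core client"
CRASH_MARK = "Unhandled Exception Detected"
DONE_RE = re.compile(r"DONE :: (\d+) starting structures ([\d.]+) cpu seconds")


@dataclass
class StderrSummary:
    starts: int                    # times the app (re)initialised
    heartbeat_exits: int           # times it quit for lack of a client heartbeat
    crashes: int                   # unhandled exceptions reported
    structures: Optional[int]      # from the DONE line, if present
    cpu_seconds: Optional[float]   # from the DONE line, if present

    @property
    def completed(self) -> bool:
        # In these logs the DONE summary appears only at the end of a finished run.
        return self.structures is not None


def summarize(stderr_txt: str) -> StderrSummary:
    starts = heartbeat_exits = crashes = 0
    structures: Optional[int] = None
    cpu_seconds: Optional[float] = None
    for line in stderr_txt.splitlines():
        if START_MARK in line:
            starts += 1
        elif HEARTBEAT_MARK in line:
            heartbeat_exits += 1
        elif CRASH_MARK in line:
            crashes += 1
        else:
            m = DONE_RE.search(line)
            if m:
                structures, cpu_seconds = int(m.group(1)), float(m.group(2))
    return StderrSummary(starts, heartbeat_exits, crashes, structures, cpu_seconds)
```

To use it, paste the text between `<stderr_txt>` and `</stderr_txt>` from the result page into `summarize`. `completed` is true when the DONE line is present.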



Ian_D

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 70198 - Posted 1 May 2011 11:20:39 UTC - in response to Message ID 70193.

Yep, the task completed after I restarted BOINC (put it into snooze, then shut it down and started it again, in that order)??
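Restarting cost little work because the application resumes from its checkpoint files. The `Continuing computation from checkpoint: ... success!` lines in the log show this happening.

Below is a simplified model of that resume step. It assumes checkpoints are named `chk_<decoy>_<stage>`, as in the log, and that restoring stops at the first stage without a checkpoint. The function name and stage list are hypothetical, and this is not Rosetta's actual checkpointing code.

```python
from typing import Iterable, List, Set, Tuple


def plan_resume(decoy: str, stages: Iterable[str],
                existing: Set[str]) -> Tuple[List[str], List[str]]:
    """Split one decoy's stages into (restored_checkpoints, stages_to_run).

    Hypothetical model: a stage counts as done when a checkpoint named
    f"chk_{decoy}_{stage}" exists (naming taken from the log). Restoring
    stops at the first stage with no checkpoint; that stage and every
    later one are recomputed.
    """
    restored: List[str] = []
    still_to_run: List[str] = []
    for stage in stages:
        name = f"chk_{decoy}_{stage}"
        if not still_to_run and name in existing:
            restored.append(name)
        else:
            still_to_run.append(stage)
    return restored, still_to_run


if __name__ == "__main__":
    stages = ["FragmentSampler__stage1", "FragmentSampler__stage2",
              "FragmentSampler__stage3", "FragmentSampler__stage4_kk_1",
              "FastRelax__chk1_fa", "FastRelax__chk2_fa"]
    on_disk = {f"chk_S_00052_{s}" for s in stages[:4]}
    restored, todo = plan_resume("S_00052", stages, on_disk)
    for name in restored:
        print(f"Continuing computation from checkpoint: {name} ... success!")
    print("still to run:", todo)
```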



Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 70201 - Posted 1 May 2011 13:11:04 UTC - in response to Message ID 70198.

Well, I guess we will have to wait for the grad student to wake up and come on duty to fully address your question. I found something on the BOINC wiki that addresses this issue:

Why am I getting a 'Reason: Access Violation (0xc0000005) error'?

1. Change your preferences to leave Rosetta@Home in memory: log in to General Preferences (if you're not already logged in) -> Edit Preferences (at the bottom) -> "Leave applications in memory while preempted?" Check yes and click the Update preferences button. Also remember to "update" the BOINC Client Software so that the changes are downloaded: open the BOINC Manager, select the "Projects" tab, left-click on "Rosetta@home" to select the project, and click the "Update" button.
2. An error occurred somewhere on the computer; it could have been the BOINC Client Software, the Rosetta@Home Science Application, or any programme your computer was running at the time. This is not a Rosetta@Home-specific error; as far as I am aware, it happens on occasion in all of the BOINC-powered projects with all of the Science Applications. Keep Rosetta@Home in memory and ignore this problem if it's not getting out of hand.

I'm going to leave it at that... and wait for the big experts.

Yep, the task completed after I restarted BOINC: I put it into snooze, then shut it down and started it again (in that order).

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
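The summary line is enough to estimate how much computing the task actually did. A minimal Python sketch, assuming `cpu_run_time_pref` in the log is the target CPU run time in seconds (28800 s = 8 h); the regular expression only matches the exact line format shown here, and the variable names are illustrative.

```python
import re

# Values copied from the task's stderr output.
DONE_LINE = "DONE :: 56 starting structures 14704.5 cpu seconds"
CPU_RUN_TIME_PREF_S = 28800  # from "# cpu_run_time_pref: 28800"; assumed to be seconds

match = re.search(r"DONE :: (\d+) starting structures ([\d.]+) cpu seconds", DONE_LINE)
structures = int(match.group(1))
cpu_s = float(match.group(2))

cpu_hours = cpu_s / 3600                       # about 4.08 CPU hours
seconds_per_structure = cpu_s / structures     # about 262.6 s per starting structure
share_of_target = cpu_s / CPU_RUN_TIME_PREF_S  # about 0.51 of the 8-hour target

print(f"{cpu_hours:.2f} h, {seconds_per_structure:.1f} s/structure, {share_of_target:.0%} of target")
```

The log does not say whether 14704.5 s covers every restart or only the final run, so treat the figure as indicative.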


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages?

The job had been sitting doing NOTHING for 13.5 hours (???) before I noticed and restarted BOINC.

The Windows XP PC concerned uses nVidia onboard graphics (no idea if this has any bearing).

http://boinc.bakerlab.org/rosetta/result.php?resultid=419011804

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2011- 4-30 2:38:23:] :: BOINC:: Initializing ... ok.
[2011- 4-30 2:38:23:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910F1E read attempt to address 0x00000001

Engaging BOINC Windows Runtime Debugger...

[2011- 4-30 21:39:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 21:39:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage4_kk_2 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 6:36:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 6:36:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 7:11:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 7:11:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage1 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage2 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage3 ... success!
Continuing computation from checkpoint: chk_S_00052_FragmentSampler__stage4_kk_1 ... success!
# cpu_run_time_pref: 28800
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk1_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk2_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk3_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk4_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk5_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk6_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk7_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk8_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk9_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk10_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk11_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk12_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk13_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk14_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk15_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk16_fa ... success!
Continuing computation from checkpoint: chk_S_00052_FastRelax__chk17_fa ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 8:47:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22: 8:47:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22:17:41:] :: BOINC:: Initializing ... ok.
[2011- 4-30 22:17:41:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev39052.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/casd_sr10_boinc_nmr_control.1ff3B_20_abrelax_cs_frags_tex.boinc.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
# cpu_run_time_pref: 28800
======================================================
DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish

</stderr_txt>
]]>
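Reading a stderr log this long by eye is tedious. A small script can pull out when the worker (re)started, how many times it exited on a missing heartbeat, and how many checkpoints it resumed from. This is a hypothetical Python helper written against the line formats visible in this log (space-padded timestamps such as `[2011- 4-30 22: 6:36:]`); it is not a BOINC or Rosetta tool, and `summarize_stderr` is an illustrative name.

```python
import re
from datetime import datetime

# Timestamps in this stderr log are space-padded, e.g. "[2011- 4-30 22: 6:36:]".
STAMP = re.compile(r"\[(\d{4})-\s*(\d+)-\s*(\d+)\s+(\d+):\s*(\d+):\s*(\d+):\]")


def summarize_stderr(text: str) -> dict:
    """Hypothetical helper: count worker starts, heartbeat exits and checkpoint resumes."""
    starts = []
    heartbeat_exits = 0
    checkpoint_resumes = 0
    for line in text.splitlines():
        if "BOINC:: Initializing" in line:
            stamp = STAMP.search(line)
            if stamp:
                # groups() are year, month, day, hour, minute, second as strings
                starts.append(datetime(*map(int, stamp.groups())))
        elif line.startswith("No heartbeat from core client"):
            heartbeat_exits += 1
        elif line.startswith("Continuing computation from checkpoint"):
            checkpoint_resumes += 1
    return {
        "starts": starts,
        "heartbeat_exits": heartbeat_exits,
        "checkpoint_resumes": checkpoint_resumes,
    }


# A few lines excerpted from the log above.
SAMPLE = """\
[2011- 4-30 2:38:23:] :: BOINC:: Initializing ... ok.
Reason: Access Violation (0xc0000005) at address 0x7C910F1E read attempt to address 0x00000001
[2011- 4-30 21:39:36:] :: BOINC:: Initializing ... ok.
Continuing computation from checkpoint: chk_S_00046_FragmentSampler__stage1 ... success!
No heartbeat from core client for 30 sec - exiting
[2011- 4-30 22: 6:36:] :: BOINC:: Initializing ... ok.
"""

summary = summarize_stderr(SAMPLE)
for start in summary["starts"]:
    print("worker started", start.strftime("%H:%M:%S"))
print("heartbeat exits:", summary["heartbeat_exits"])
print("checkpoint resumes:", summary["checkpoint_resumes"])
```

Fed the whole stderr_txt quoted above, it should list six starts (02:38:23, 21:39:36, 22:06:36, 22:07:11, 22:08:47 and 22:17:41) and four heartbeat exits.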



Ian_D

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 70205 - Posted 1 May 2011 14:50:41 UTC - in response to Message ID 70201.

Cheers, Greg!








Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,396,123
RAC: 1,318
Message 70210 - Posted 1 May 2011 17:27:12 UTC - in response to Message ID 70201.

Well, I guess we will have to wait for the grad student to wake up and come on duty to fully address your question. I found something on the BOINC wiki that addresses this issue:

Why am I getting a 'Reason: Access Violation (0xc0000005) error'?

1. Change your preferences to leave Rosetta@Home in memory: log in to General Preferences (if you're not already logged in) -> Edit Preferences (at the bottom) -> "Leave applications in memory while preempted?" Check yes and click the Update preferences button. Also remember to "update" the BOINC Client Software so that the changes are downloaded: open the BOINC Manager, select the "Projects" tab, left-click on "Rosetta@home" to select the project, and click the "Update" button.
2. An error occurred somewhere on the computer; it could have been the BOINC Client Software, the Rosetta@Home Science Application, or any programme your computer was running at the time. This is not a Rosetta@Home-specific error; as far as I am aware, it happens on occasion in all of the BOINC-powered projects with all of the Science Applications. Keep Rosetta@Home in memory and ignore this problem if it's not getting out of hand.

I'm going to leave it at that... and wait for the big experts.

Yep, the task completed after I restarted BOINC: I put it into snooze, then shut it down and started it again (in that order).

Despite the heartbeat issues, you did complete the task:

DONE :: 56 starting structures 14704.5 cpu seconds
This process generated 56 decoys from 56 attempts


So what's with the following

No heartbeat from core client for 30 sec - exiting
messages?

The job had been sitting doing NOTHING for 13.5 hours (???) before I noticed and restarted BOINC.

The Windows XP PC concerned uses nVidia onboard graphics (no idea if this has any bearing).

http://boinc.bakerlab.org/rosetta/result.php?resultid=419011804



The "no heartbeat" message means the science app and the BOINC client lost contact with each other. When the science application doesn't receive the heartbeat (the "I'm alive") message from BOINC, it is supposed to exit. As long as it was merely a temporary obstruction and BOINC hasn't actually crashed, BOINC should see that the application has stopped, restart it, and proceed merrily on its way. Only when it happens repeatedly with a single task (100 times) does BOINC give up, sending that task back and starting a brand new task. If I'm reading correctly, the "no heartbeat" messages occurred after you had restarted BOINC, and Rosetta was able to successfully complete the task despite them. They may or may not be related to the cause of the error Greg highlighted, which may have led to a BOINC crash that it couldn't recover from without a restart, hence the long delay until you noticed, restarted, and set BOINC and Rosetta on their merry way again.
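The behaviour described above can be modelled in a few lines. This is a simplified Python sketch, not BOINC's actual code: `HeartbeatMonitor` and `task_outcome` are hypothetical names, the 30-second timeout comes from the log message, and the 100-exit limit is the figure given in this post rather than one checked against the BOINC source.

```python
from dataclasses import dataclass

# Timeout taken from the log line "No heartbeat from core client for 30 sec - exiting".
HEARTBEAT_TIMEOUT_S = 30.0
# Give-up threshold as stated in the post (assumption, not verified against BOINC).
MAX_HEARTBEAT_EXITS = 100


@dataclass
class HeartbeatMonitor:
    """Hypothetical app-side watchdog: tracks when the client was last heard from."""
    last_heartbeat_s: float = 0.0

    def on_heartbeat(self, now_s: float) -> None:
        # Called whenever an "I'm alive" message arrives from the client.
        self.last_heartbeat_s = now_s

    def should_exit(self, now_s: float) -> bool:
        # The app exits once the client has been silent for longer than the timeout.
        return now_s - self.last_heartbeat_s > HEARTBEAT_TIMEOUT_S


def task_outcome(heartbeat_exits: int) -> str:
    """Hypothetical client-side policy: restart the app after each exit
    (it resumes from its last checkpoint) until the limit is reached."""
    if heartbeat_exits >= MAX_HEARTBEAT_EXITS:
        return "abandoned"
    return "restarted from checkpoint"


monitor = HeartbeatMonitor()
monitor.on_heartbeat(0.0)
print(monitor.should_exit(29.0))  # False: still within the 30 s window
print(monitor.should_exit(31.0))  # True: client silent too long, app exits
print(task_outcome(4))            # four exits, as in Ian's log: restarted from checkpoint
```

Passing the current time in explicitly keeps the model deterministic and easy to test; a real watchdog would read a clock from a background timer instead.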

You might try to recall what else was running on your computer at the time of the "no heartbeat" messages (22:06:36, 22:07:11, 22:08:47, 22:17:41). Anti-virus, anti-spyware, some other maintenance-type scan, indexing? It could be something you started deliberately or something running automatically in the background. I don't suppose you started some new process (indexing, say) between 2:38:23 and the time BOINC stopped (which, if BOINC hadn't been running for 13.5 hours when you restarted it, must have been about 8:00; is that right?). That could point to the cause of the crash and, if the process was ongoing (or set to check for changes, like an index or a backup), could also explain the "no heartbeat" messages.
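The "about 8" estimate is simple clock arithmetic. A minimal Python check, assuming the restart Ian describes is the one logged at 21:39:36 (the first start after the access violation); the same few lines also give the spacing of the restarts that followed each "no heartbeat" exit.

```python
from datetime import datetime, timedelta

# Assumption: Ian's restart corresponds to the 21:39:36 start in the stderr log.
restart = datetime(2011, 4, 30, 21, 39, 36)
# "sitting doing NOTHING for 13.5 hours"
idle = timedelta(hours=13.5)

stalled_since = restart - idle
print(stalled_since.strftime("%H:%M:%S"))  # 08:09:36

# Restart times logged after each "no heartbeat" exit, and the gaps between them.
restarts = [datetime(2011, 4, 30, 22, m, s) for m, s in [(6, 36), (7, 11), (8, 47), (17, 41)]]
gaps_s = [int((b - a).total_seconds()) for a, b in zip(restarts, restarts[1:])]
print(gaps_s)  # [35, 96, 534]
```

On that assumption the stall began roughly five and a half hours after the task first started at 02:38:23, which narrows the window worth checking for a scan or backup.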


Best,
Snags

TPCBF

Joined: Nov 29 10
Posts: 105
ID: 403518
Credit: 1,704,662
RAC: 2,836
Message 70211 - Posted 1 May 2011 21:20:39 UTC

Hey guys, is it really necessary to quote the same stuff in full over and over again? :(

Ralf

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 70212 - Posted 2 May 2011 0:00:18 UTC - in response to Message ID 70211.

Hey guys, is it really necessary to quote the same stuff in full over and over again? :(

Ralf



Well, in this case it keeps everything together in one block so we can reference ALL the information: the error messages, the initial complaint, possible solutions, and information about the error.

This is a small enough thread that it wasn't that big of a deal.
In bigger threads it can be a problem.

I've been around long enough to remember how some of us, as a joke, created a very long thread just by replying to the same quote time after time. Mod remembers this.
So this is just a piddly thread.

Ian_D

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 70214 - Posted 2 May 2011 9:23:19 UTC

Think I may have "solved" this one, and as you so rightly said, it looks like it was a hardware problem. Looking at the system info messages, I've been getting a lot of intermittent paging problems on one of the hard disks around the times of the Reason: Access Violation (0xc0000005) failures.

Cheers for the steer!

Ian


TimL

Joined: Sep 16 06
Posts: 14
ID: 112884
Credit: 8,492,974
RAC: 5,867
Message 70217 - Posted 2 May 2011 10:24:30 UTC

FOLD_N_DOCK_YgaP_D2symm_2_SAVE_ALL_OUT_IGNORE_THE_REST_w_csts_26019_4307_0 ran for over 17 hours (target run time: 8 hours) and then failed with this error.

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 70221 - Posted 2 May 2011 15:36:37 UTC - in response to Message ID 70210.
Last modified: 2 May 2011 16:07:29 UTC

Well guess we will have to wait for the Grad student to wake up


The "no heartbeat" message means the science app and the BOINC client lost contact with each other. When the science application doesn't receive the heartbeat (the "I'm alive") message from BOINC, it is supposed to exit. As long as it was merely a temporary obstruction and BOINC hasn't actually crashed, BOINC should see that the application has stopped, restart it, and proceed merrily on its way. Only when this happens repeatedly with a single task (100 times) does BOINC give up, sending that task back and starting a brand-new task. If I'm reading correctly, the "no heartbeat" messages occurred after you had restarted BOINC, and Rosetta was able to complete the task successfully despite them. They may or may not be related to the cause of the error Gregg highlighted, which may have led to a BOINC crash that it couldn't recover from without a restart; that would explain the long delay until you noticed, restarted, and set BOINC and Rosetta on their merry way again.

You might try to recall what else was running on your computer at the time of the "no heartbeat" messages (22:6:36, 22:7:11, 22:8:47, 22:17:41). Anti-virus, anti-spyware, some other maintenance-type scan, indexing? It could be something you started deliberately or something running automatically in the background. I don't suppose you started some new process (indexing, say) between 2:38:23 and the time BOINC stopped (which, if BOINC hadn't been running for 13.5 hours when you restarted, must have been about 8. Is that right?). That could point to the cause of the crash and, if the process was ongoing (or maybe set to check for changes, like an index or a backup), could also explain the "no heartbeat" messages.


Best,
Snags


When I've seen similar error messages, the Norton Internet Security antivirus program was always running in the background (no good way to shut it off other than uninstalling it). Not sure if that's also why I often see the BOINC Manager program lose contact with the rest of BOINC. Do the other people seeing this problem also use Norton Internet Security?

Ian_D

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 70227 - Posted 2 May 2011 22:01:05 UTC - in response to Message ID 70221.


When I've seen similar error messages, the Norton Internet Security antivirus program was always running in the background (no good way to shut it off other than uninstalling it). Not sure if that's also why I often see the BOINC Manager program lose contact with the rest of BOINC. Do the other people seeing this problem also use Norton Internet Security?


Nope, will NOT have Norton on any of my machines.


Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 447,858
RAC: 912
Message 70254 - Posted 5 May 2011 23:03:56 UTC

I just got a validate error on work unit 420544516. Could someone please investigate why the validator failed here?

Jesse Viviano

Joined: Jan 14 10
Posts: 39
ID: 366696
Credit: 447,858
RAC: 912
Message 70255 - Posted 6 May 2011 3:42:10 UTC - in response to Message ID 70254.

I just got a validate error on work unit 420544516. Could someone please investigate why the validator failed here?

Oops! That should be result 420544516. The corresponding work unit number is 383771914.

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 70256 - Posted 6 May 2011 5:21:06 UTC

420656625 FOLD_N_DOCK_dagk_D2symm got validate state Invalid after a CPU time of 2010.416 seconds; the run time was meant to be 3 hours. The corresponding work unit number 420591203 also got validate state Invalid, after a CPU time of 3843.709 seconds (it has a debug message).
____________
Have a crunching good day!!

SafeAggie

Joined: Oct 22 05
Posts: 3
ID: 6134
Credit: 458,414
RAC: 0
Message 70272 - Posted 7 May 2011 18:42:48 UTC

Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g056_009_26017_78
wuid=382515464
resultid=419989702


Validate Error: ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g056_010_26017_78
wuid=382515501
resultid=419989703

.clair.

Joined: Jan 2 07
Posts: 45
ID: 139198
Credit: 5,925,071
RAC: 3,321
Message 70276 - Posted 7 May 2011 20:59:48 UTC
Last modified: 7 May 2011 21:06:00 UTC

Validate error - ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g061_005_26530_180_0
http://boinc.bakerlab.org/rosetta/result.php?resultid=420705463

Greg_BE

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 70280 - Posted 8 May 2011 7:09:51 UTC

Error Message: - Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB
My wingman had the same problem, with a slightly longer run time.


Tasks:

FOLD_N_DOCK_2kqt_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26674_9746_0
http://boinc.bakerlab.org/rosetta/result.php?resultid=421054105

FOLD_N_DOCK_2kqt_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26674_1528_0
http://boinc.bakerlab.org/rosetta/result.php?resultid=420870386

FOLD_N_DOCK_dagk_D2symm_SAVE_ALL_OUT_IGNORE_THE_REST_26520_9259_1
http://boinc.bakerlab.org/rosetta/result.php?resultid=420803687

.clair.

Joined: Jan 2 07
Posts: 45
ID: 139198
Credit: 5,925,071
RAC: 3,321
Message 70284 - Posted 8 May 2011 16:00:24 UTC

Validate error - ProteinG_abinitio_SAVE_ALL_OUT_design_relax_g049_008_26508_177

Both of us who crunched this unit got this error.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=383720482

robertmiles

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 70285 - Posted 8 May 2011 16:43:08 UTC
Last modified: 8 May 2011 16:47:27 UTC

Another workunit appeared to stop using any CPU time at all shortly after a checkpoint, but BOINC thought it was still running for about 2 more days of elapsed time:

pred_ECH19_lr19a_189_0003_nh.pdb_26473_588_0

However, it eventually decided that it had gone past a time limit and engaged the BOINC debugger. Could there be a problem with the BOINC debugger announcing that it has finished and that the workunit should be marked as ended?

Also, the listing of my results does not appear to contain any information on which version of minirosetta was used. 2.17 is the latest version, so I'm assuming that one.

Not sure if the TThrottle add-on I'm using to prevent my computer from overheating has any effect on this problem.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 70456 - Posted 31 May 2011 1:31:47 UTC

Task 426111314 ( lysozyme_var_quota_8_15_noH_SAVE_ALL_OUT_27153_445_0 ) failed immediately on Mac.

ERROR: ERROR: FragmentIO: could not open file q-noHom.frags.15mers.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

.clair.

Joined: Jan 2 07
Posts: 45
ID: 139198
Credit: 5,925,071
RAC: 3,321
Message 70491 - Posted 2 Jun 2011 20:35:39 UTC

Compute error after 3 seconds

lysozyme_var_dis_8_15_SAVE_ALL_OUT_27136_429_0

Both of us got the same error:

ERROR: ERROR: FragmentIO: could not open file cs-lys.15mers.gz
ERROR:: Exit from: src/core/fragment/FragmentIO.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=388396144

Dean Costello

Joined: Feb 8 11
Posts: 4
ID: 410836
Credit: 5,967,395
RAC: 6,657
Message 70520 - Posted 8 Jun 2011 22:55:42 UTC

Hello,
I hate to leave this message because it seems like a problem that has already been answered somewhere, but I can't find it.

Here's the thing: I get the following error on my new iMac running a new version of BOINC (6.12.26):

Wed Jun 8 18:35:23 2011 | rosetta@home | Sending scheduler request: Requested by user.
Wed Jun 8 18:35:23 2011 | rosetta@home | Reporting 27 completed tasks, requesting new tasks for CPU
Wed Jun 8 18:35:23 2011 | | [error] Can't create HTTP response output file sched_reply_boinc.bakerlab.org_rosetta.xml
Wed Jun 8 18:35:23 2011 | rosetta@home | Scheduler request initialization failed: fopen() failed

Any ideas on this?
-
Dean Costello

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70523 - Posted 9 Jun 2011 11:36:27 UTC

From the sound of it, the network is fine and the reply is coming back... but the BOINC Core client is unable to create a new file to write a copy of the response to. So either the disk is full, or the permissions are not set correctly to allow it.
____________
Rosetta Moderator: Mod.Sense
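The two causes named above (a full disk or missing write permission) can be checked with a short Python script like the one below. It is only a sketch: the function name is hypothetical, the 100 MB free-space threshold is an arbitrary assumption, and /Library/Application Support/BOINC Data is assumed to be the default BOINC data directory on a Mac. Note also that os.access checks the permissions of whoever runs the script, while the BOINC client on a Mac may run under its own account, so a clean result does not rule out a permission problem for BOINC itself.

```python
import os
import shutil
import tempfile


def diagnose_write_failure(data_dir: str,
                           min_free_bytes: int = 100 * 1024 * 1024) -> list:
    """Return a list of likely reasons a file can't be created in data_dir.

    Hypothetical helper; the 100 MB threshold is an arbitrary assumption.
    """
    if not os.path.isdir(data_dir):
        return [f"{data_dir} does not exist or is not a directory"]

    problems = []
    free = shutil.disk_usage(data_dir).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the volume holding {data_dir}")

    # os.access reports on the user running this script, which may not be
    # the account the BOINC client runs under.
    if not os.access(data_dir, os.W_OK):
        problems.append(f"no write permission on {data_dir} for this user")
    else:
        # Actually try to create (and automatically delete) a scratch file.
        try:
            with tempfile.NamedTemporaryFile(dir=data_dir):
                pass
        except OSError as exc:
            problems.append(f"creating a file failed: {exc}")
    return problems
```

For example, `diagnose_write_failure("/Library/Application Support/BOINC Data")` returns an empty list when neither obvious cause applies to the account running it.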

Dean Costello

Joined: Feb 8 11
Posts: 4
ID: 410836
Credit: 5,967,395
RAC: 6,657
Message 70529 - Posted 9 Jun 2011 23:02:05 UTC

> From the sound of it, the network is fine and the reply is coming back... but the BOINC Core client is unable to create a new file to write a copy of the response to.

Thanks for getting back to me. For further information, the Seti and Climate Prediction projects seem to be uploading/downloading as they normally do.

> So either the disk is full, or the permissions are not set correctly to allow it.

I have a dab less than a TB free, so I don't think that it's a disk error. So, how do I address the permission issue?

Thank you for your help.
-
Dean Costello

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70530 - Posted 9 Jun 2011 23:50:44 UTC

Dean, I found this thread that describes a possible scenario... but if other projects are working, that doesn't sound like the problem. I've never heard of such a thing before. Since it doesn't seem related to minirosetta v2.17 (the topic of this thread), if the info in that link doesn't help, please open a new thread on the Number Crunching board to describe and discuss the issue further.
____________
Rosetta Moderator: Mod.Sense

Dean Costello

Joined: Feb 8 11
Posts: 4
ID: 410836
Credit: 5,967,395
RAC: 6,657
Message 70535 - Posted 10 Jun 2011 21:26:13 UTC

Again, thanks for getting back in touch.

I took a look at the referenced thread, and it seems that it is more associated with the PC and system directories than anything else, so I don't think that it is applicable to my mighty iMac. I'll go ahead and open it up to the number crunching crowd and see what happens.

Appreciate the information.
-
Dean Costello

beenie210772

Joined: Jan 20 06
Posts: 1
ID: 52804
Credit: 3,352,884
RAC: 2,345
Message 70545 - Posted 12 Jun 2011 11:18:48 UTC

Not sure if this is a problem with the work units or a server problem, but I've got 6 units that are failing to report and update. All I keep getting in the log is:
Project communication failed: attempting access to reference site
Internet access OK - project servers may be temporarily down.
If it's of any use, the offending units are as follows:

M50_boinc_NH_restraints_abrelax_cs_frags_tex_IGNORE_THE_REST_26671_153491
M50_boinc_NH_restraints_abrelax_cs_frags_tex_IGNORE_THE_REST_26671_164636
M50_boinc_NH_restraints_abrelax_cs_frags_tex_IGNORE_THE_REST_26671_172575
heIF5_NTD_boinc_rosetta_cm_abrelax_cs_frags_hari_IGNORE_THE_REST_26734_168415
heIF5_NTD_boinc_rosetta_cm_abrelax_cs_frags_hari_IGNORE_THE_REST_26734_174250
heIF5_NTD_boinc_rosetta_cm_abrelax_cs_frags_hari_IGNORE_THE_REST_26734_180738
Thanks in advance for any help.


Rabinovitch

Joined: Apr 28 07
Posts: 28
ID: 170444
Credit: 1,377,008
RAC: 1,448
Message 70560 - Posted 16 Jun 2011 2:08:10 UTC

Can someone please tell me why this application creates 4 (!) processes under Linux, each with the same memory consumption as the single process under Windows? That means it needs 4 times more RAM under Linux to process a WU than under Windows! That's very wasteful, and I want to get this bug fixed (because it can't be intended by design).
____________
From Siberia with love!

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 70566 - Posted 17 Jun 2011 0:27:46 UTC - in response to Message ID 70560.

Can someone please tell me why this application creates 4 (!) processes under Linux, each with the same memory consumption as the single process under Windows? That means it needs 4 times more RAM under Linux to process a WU than under Windows! That's very wasteful, and I want to get this bug fixed (because it can't be intended by design).


You can configure BOINC Manager to use as many CPUs as you would like in the preferences. If the RAM consumption is causing problems on your machine, you can also set limits on how much RAM BOINC is allowed to use.
____________
Rosetta Moderator: Mod.Sense
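If you prefer editing a file to clicking through BOINC Manager, the same limits can be written to a local override file. The sketch below builds the contents of global_prefs_override.xml in Python. The helper name is hypothetical, and the tag names max_ncpus_pct, ram_max_used_busy_pct and ram_max_used_idle_pct are my understanding of BOINC's preference names, so please check them against the BOINC documentation before relying on them.

```python
import xml.etree.ElementTree as ET


def build_prefs_override(max_ncpus_pct: float, ram_busy_pct: float,
                         ram_idle_pct: float) -> str:
    """Build XML text for a BOINC global_prefs_override.xml file.

    Hypothetical helper; tag names should be checked against the BOINC docs.
    """
    settings = (
        ("max_ncpus_pct", max_ncpus_pct),          # % of CPUs BOINC may use
        ("ram_max_used_busy_pct", ram_busy_pct),   # % of RAM while the PC is in use
        ("ram_max_used_idle_pct", ram_idle_pct),   # % of RAM while the PC is idle
    )
    root = ET.Element("global_preferences")
    for tag, value in settings:
        if not 0 < value <= 100:
            raise ValueError(f"{tag} must be in (0, 100], got {value}")
        ET.SubElement(root, tag).text = f"{value:g}"
    return ET.tostring(root, encoding="unicode")
```

For instance, `build_prefs_override(50, 40, 80)` would ask BOINC to use half of the CPUs (so two tasks instead of four on a quad-core), 40% of RAM while the computer is in use and 80% while it is idle. After saving the result as global_prefs_override.xml in the BOINC data directory, `boinccmd --read_global_prefs_override` should make the running client pick it up.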

Rabinovitch

Joined: Apr 28 07
Posts: 28
ID: 170444
Credit: 1,377,008
RAC: 1,448
Message 70588 - Posted 19 Jun 2011 17:52:45 UTC

It's all OK now with the new version (3.14). It seems it was just a bug.


Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC