Message boards : Number crunching : Rosetta@Home version 3.31
Author | Message |
---|---|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
"I'm not sure if this can be any help, but after my original post I looked in the computing preferences menu and found that the last setting, the one that says 'Use at most __% of CPU time', was at 50%. So I reset everything to the default settings. Now it's at 100%, and since then not one work unit has failed. Maybe it's just a coincidence. Anyway, I thought I would share." I think I reduced this CPU-time setting at one time, thinking it would keep each core from being fully utilised. But when I started using a BOINC sidebar app, it showed me that what the setting really did was run each core at 100%, then 0%, then 100% and so on, in a proportion that matched whatever percentage I had entered. There was never a steady 50% (or whatever) level. So in trying to ease the strain on the processor I was actually sending it crazy! This was a long time ago on a very old BOINC Manager version, but maybe this was your problem too? Who knows? If resetting it to 100% works for you, that's great. Let's chalk it up as a win ;) PS: I notice you have a quad-core machine but use just a 1-hour run time, presumably because you weren't confident of completing a task before it crashed out. If that's the case and you're happy now, revert to at least the default run time (3 hours) to reduce the bandwidth you use and ease the load on the Rosetta servers too. |
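The on/off behaviour described in this post can be illustrated with a minimal sketch. It is not BOINC's actual scheduler: the function name `throttle_schedule` and the fixed-length time slots are assumptions made purely for illustration. It only shows how a "50%" setting can come out as alternating full-load and idle slots rather than a steady half load.

```python
def throttle_schedule(percent, slots):
    """Return a list of booleans: True = core runs flat out in that slot,
    False = core is idle. Hypothetical model, not BOINC's real algorithm."""
    schedule = []
    credit = 0.0
    for _ in range(slots):
        # Each slot earns a fraction of a "run" slot.
        credit += percent / 100.0
        if credit >= 1.0:
            schedule.append(True)   # run at 100% for this slot
            credit -= 1.0
        else:
            schedule.append(False)  # sit at 0% for this slot
    return schedule


if __name__ == "__main__":
    for pct in (100, 50, 25):
        pattern = "".join("#" if on else "." for on in throttle_schedule(pct, 16))
        print(f"{pct:3d}%  {pattern}")
```

In this model every slot is either fully busy or fully idle; only the ratio of busy to idle slots follows the percentage.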
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
CASP9_ai_benchmark_hybridization_run10_T0641_SAVE_ALL_OUT_IGNORE_THE_REST_39923_1470_0 died after 3827 seconds. Also, CASP9_bg_benchmark_hybridization_run33_T0608_SAVE_ALL_OUT_IGNORE_THE_REST_43849_1469_0 died after 2115 seconds. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2140 Credit: 41,518,559 RAC: 10,612 |
"I have had ongoing errors with Rosetta, the latest being seven in a row. How can this waste of time be reduced?" Which machine is this with? You aren't showing any tasks downloaded at all under the account you're posting from... |
ArcSedna Send message Joined: 23 Oct 11 Posts: 16 Credit: 71,462,581 RAC: 58,883 |
The following workunit exited with an error immediately (zero seconds). Rossmann2x3_abinitio_SAVE_ALL_OUT_design_f116_008_50776_354_0 Task ID 511456135 Name Rossmann2x3_abinitio_SAVE_ALL_OUT_design_f116_008_50776_354_0 Workunit 465884806 Created 8 Jun 2012 11:22:44 UTC Sent 8 Jun 2012 11:23:22 UTC Received 8 Jun 2012 11:27:30 UTC Server state Over Outcome Client error Client state Compute error Exit status 1 (0x1) Computer ID 1520448 Report deadline 18 Jun 2012 11:23:22 UTC CPU time 0 stderr out <core_client_version>6.12.43</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> [2012- 6- 8 20:23:44:] :: BOINC:: Initializing ... ok. [2012- 6- 8 20:23:44:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context </stderr_txt> ]]> Validate state Invalid Claimed credit 0 Granted credit 0 application version 3.31 |
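In a report like this one, the useful part is the single `ERROR:` line buried at the end of the start-up chatter. A minimal sketch of pulling it out of a pasted stderr dump follows; the function name `find_rosetta_error` is hypothetical, and it assumes the error text runs from `ERROR:` to the closing `</stderr_txt>` tag, as it does in this post.

```python
import re


def find_rosetta_error(stderr_out):
    """Return the text after 'ERROR:' inside <stderr_txt>, or None if absent."""
    # Restrict the search to the <stderr_txt> block when one is present.
    block = re.search(r"<stderr_txt>(.*?)</stderr_txt>", stderr_out, re.DOTALL)
    body = block.group(1) if block else stderr_out
    error = re.search(r"ERROR:\s*(.+)", body, re.DOTALL)
    if error is None:
        return None
    # Collapse runs of whitespace so wrapped log lines read as one message.
    return " ".join(error.group(1).split())


if __name__ == "__main__":
    sample = (
        "<stderr_txt> BOINC:: Worker initialized successfully. "
        "Options::initialize() End reached ERROR: Option matching "
        "-relax:fastrelax_repeats not found in command line top-level context "
        "</stderr_txt>"
    )
    print(find_rosetta_error(sample))
```

Run over several failed tasks, the same message coming back each time points to the work units rather than to one particular host.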
Wayne Miller Send message Joined: 10 Feb 06 Posts: 5 Credit: 114,107 RAC: 0 |
Apparently I have spoken too soon. I have had 5 computation errors in the last 6 hours. This is the last one. Task ID 511459663 Name Rossmann2x3_abinitio_SAVE_ALL_OUT_design_relax_f114_004_50773_49_1 Workunit 465880117 Created 8 Jun 2012 11:38:46 UTC Sent 8 Jun 2012 11:39:01 UTC Received 8 Jun 2012 12:06:07 UTC Server state Over Outcome Client error Client state Compute error Exit status 1 (0x1) Computer ID 1489547 Report deadline 18 Jun 2012 11:39:01 UTC CPU time 0 stderr out <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> [2012- 6- 8 6:39: 4:] :: BOINC:: Initializing ... ok. [2012- 6- 8 6:39: 4:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context </stderr_txt> ]]> Validate state Invalid Claimed credit 0 Granted credit 0 application version 3.31 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Some of these tasks are erring, this one twice. Rossmann2x3_abinitio_SAVE_ALL_OUT_design_f116_007_50776_1949 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=465999156 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> [2012- 6- 9 22: 4:30:] :: BOINC:: Initializing ... ok. [2012- 6- 9 22: 4:30:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context </stderr_txt> ]]> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Seems like there might be a bad batch of tasks, I've had a few of these Rossmann2x3 type finish o.k. Rossmann2x3_abinitio_SAVE_ALL_OUT_design_f119_007_50782_165_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=465867819 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> <stderr_txt> [2012- 6-10 15:25:41:] :: BOINC:: Initializing ... ok. [2012- 6-10 15:25:41:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached ERROR: Option matching -relax:fastrelax_repeats not found in command line top-level context </stderr_txt> ]]> |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Is there something weird going on with the latest BOINC version and Rosie? I am crunching rb_06_09 units, and the "elapsed" time is 6+ hours now while the "remaining" time is 5-7 hours depending on the task. When I open a graphics window, the CPU time it shows is the same as the "remaining" time. So what's up with all these weird time values? I'll reboot my system and see if that changes anything, though I doubt it will. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Task ID 511951994 Name rb_06_09_31803_62621__round2_t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_50843_736_0 Workunit 466256435 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812AFB Engaging BOINC Windows Runtime Debugger... 3.2 gigs of memory is not enough???? That's some large task you've got going there. Better send it to a Cray or a server farm with tons of memory. |
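The number 0xe06d7363 in this report is the generic code the Microsoft Visual C++ runtime uses for any thrown C++ exception: a 0xE0 prefix followed by the ASCII bytes for "msc". So the number itself only says "a C++ exception was thrown"; the "Out Of Memory" reason comes from the debugger text. A small sketch that decodes it (the function name `decode_msvc_exception` is hypothetical):

```python
def decode_msvc_exception(code):
    """Split an exception code into its top byte and 3-character ASCII tag."""
    prefix = (code >> 24) & 0xFF
    # The low three bytes are read as big-endian ASCII characters.
    tag = (code & 0x00FFFFFF).to_bytes(3, "big").decode("ascii")
    return prefix, tag


if __name__ == "__main__":
    prefix, tag = decode_msvc_exception(0xE06D7363)
    print(hex(prefix), tag)
```

If the application is a 32-bit process, as the 32-bit address in the report suggests, the limit it hits is its own address space (2 GB by default on 32-bit Windows) rather than the amount of RAM installed, so 3.2 GB in the machine does not guarantee that a single task can allocate what it asks for.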
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. Had this one error last night, same as the first rig. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=467510749 rb_06_15_31944_62751__t000__0_D2_SAVE_ALL_OUT_IGNORE_THE_REST_51343_3715_1 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> [2012- 6-17 19:10:45:] :: BOINC:: Initializing ... ok. [2012- 6-17 19:10:45:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/input_rb_06_15_31944_62751__t000__0_D2_robetta.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 SIGSEGV: segmentation violation Stack trace (23 frames): [0xa9631b7] [0xf7749400] [0xa145aab] [0xa02859c] [0x9c19ecb] [0x969cf5d] [0x96a3893] [0x96a5cc6] [0x969fa73] [0x96aebfa] [0x9753248] [0x91a498e] [0x9088aae] [0x92cfb9a] [0x8a0da75] [0x940d620] [0x941029a] [0x958c751] [0x95f3e75] [0x95f16a5] [0x80547ed] [0xa9f3148] [0x8048131] Exiting... </stderr_txt> ]]> |
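The `<message>` line prints the exit code three ways, for example `193 (0xc1, -63)` here and `1 (0x1, -255)` in the earlier reports. In the reports in this thread, the third number matches the exit code bitwise OR-ed with -256; whether that is exactly how the client computes it is an assumption. A short worked check (the function name `exit_code_triplet` is hypothetical):

```python
def exit_code_triplet(code):
    """Return (decimal, hex string, code OR-ed with -256) for an exit code."""
    # -256 has every bit set above the low 8, so OR-ing keeps the low byte
    # of the code and makes the result negative.
    return code, hex(code), -256 | code


if __name__ == "__main__":
    for code in (1, 193):
        dec, hx, neg = exit_code_triplet(code)
        print(f"process exited with code {dec} ({hx}, {neg})")
```

So the three numbers are the same value written three ways. In this report the line that actually explains the failure is `SIGSEGV: segmentation violation`, further down in the stderr text.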
m Send message Joined: 2 May 09 Posts: 12 Credit: 7,903,770 RAC: 2,261 |
Dear all, all tasks on my two hosts running Win2k/BOINC 5.10.45 have started failing with "computation couldn't start". The stderr looks like this: <core_client_version>5.10.45</core_client_version> <![CDATA[ <message> CreateProcess() failed - </message> ]]> The Rosetta home page still shows Win2k as a supported OS. Can anyone suggest what's gone wrong? m. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"m", this sounds like a permissions problem with the user account you set up to run BOINC. When the BOINC client goes to begin a task for a project, it creates a new process. Your message indicates that this failed. The most likely cause is the Windows security setup for the user that you run BOINC under. Rosetta Moderator: Mod.Sense |
m Send message Joined: 2 May 09 Posts: 12 Credit: 7,903,770 RAC: 2,261 |
"'m' this sounds like an authority problem with the user you set up to run BOINC. When the BOINC Manager goes to begin a task for a project, it does a create process. Your message indicates this failed. Most likely cause it would fail is Windows security setup for the user that you run BOINC under." Thanks, Mod.Sense. This seems to have appeared with 3.31. There were no problems previously (nor with other projects), so something seems to have changed. Both hosts run BOINC as a service; I shouldn't need to change this... should I? I have detached and re-attached to Rosetta, but the problem remains. m |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"I have detached and re-attached to Rosetta, but the problem remains." Try reinstalling BOINC (i.e. just install it once again with the same settings; no uninstall needed). That usually fixes issues like this. |
m Send message Joined: 2 May 09 Posts: 12 Credit: 7,903,770 RAC: 2,261 |
"I have detached and re-attached to Rosetta, but the problem remains." OK, I'll try that, although the problem affects two hosts. I've tried running the executables (for minirosetta 3.31 and for beta 5.98) "on their own", as it were, to see if any helpful error messages appeared (like missing DLLs). The 5.98 is OK, but the 3.31 reports "not a valid win32 application". I have also copied this file to the W2K host from another host running WXP, where it runs OK, and it still fails, so it isn't a corrupted file. Maybe it hasn't been compiled to run on W2K. I don't know if the admins read these posts; the "project requirements" do list Windows 2000, so it should work. This host is the one I'm trying things out on. |
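One way to test the "not compiled to run on W2K" theory is to read the minimum OS and subsystem versions stored in the executable's PE header. Windows 2000 is NT 5.0 and XP is NT 5.1, and an image that asks for a higher subsystem version than the running OS provides is typically rejected with exactly this "not a valid Win32 application" message. The sketch below reads those fields from the raw file bytes; the function names are hypothetical, nothing here has been run against the real minirosetta 3.31 executable, and other causes for the message are possible.

```python
import struct


def pe_minimum_versions(image):
    """Return ((os_major, os_minor), (subsys_major, subsys_minor)) from a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the offset of the PE header at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional = pe_offset + 4 + 20
    # These offsets are the same for PE32 and PE32+ optional headers.
    os_version = struct.unpack_from("<HH", image, optional + 40)
    subsystem_version = struct.unpack_from("<HH", image, optional + 48)
    return os_version, subsystem_version


def loads_on_windows_2000(image):
    """True if the required subsystem version is no higher than NT 5.0."""
    _, subsystem_version = pe_minimum_versions(image)
    return subsystem_version <= (5, 0)
```

Usage would be along the lines of `loads_on_windows_2000(open(path, "rb").read())` for both the 3.31 and the 5.98 executables; a result that differs between the two would support the theory.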
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,338,560 RAC: 2,014 |
"I have detached and re-attached to Rosetta, but the problem remains." Some programs have installation procedures that load different versions of the software depending on which operating system, and which version of it, they run under. |
m Send message Joined: 2 May 09 Posts: 12 Credit: 7,903,770 RAC: 2,261 |
"I have detached and re-attached to Rosetta, but the problem remains." That's right. BOINC has comprehensive abilities to send specific versions of applications to particular hosts depending on OS, CPU, instruction set or whatever. Rosetta seems to send the same version to all Windows hosts. Please, someone prove me wrong, but I think that the answer to this problem is here. If Rosetta is no longer going to support W2K, the least the developers should do is remove it from the Recommended System Requirements and make this clear, so that any volunteers using it can detach and go elsewhere. Which is what I'll do. Both I and the helpful people who have answered my questions here have wasted a good bit of time on this. Thanks all. |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
"Please, someone prove me wrong, but I think that the answer to this problem is here. If Rosetta is no longer going to support W2K, the least the developers should do is not list it in the Recommended System Requirements and make this clear so that any volunteers using it can detach and go elsewhere." Well, only the project developers can tell you that. And if that's true, then when they remove Win2000 from the list they might also want to remove 'The latest version of the BOINC client is recommended.'; according to other threads, that's wrong as well. |
Nightwish Send message Joined: 29 Mar 12 Posts: 10 Credit: 307,377 RAC: 0 |
Four Client errors in a row. Here's one: <core_client_version>7.0.25</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> [2012- 7- 5 21:25: 9:] :: BOINC:: Initializing ... ok. [2012- 7- 5 21:25: 9:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/T0722_ab_t000___dekim_06_13.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 # cpu_run_time_pref: 28800 [2012- 7- 5 22:44: 1:] :: BOINC:: Initializing ... ok. [2012- 7- 5 22:44: 1:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. 
Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev48292.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/T0722_ab_t000___dekim_06_13.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_3 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_4 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_5 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_6 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_7 ... success! 
Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_8 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_9 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage_3_iter1_10 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_1 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_2 ... success! Continuing computation from checkpoint: chk_S_00000001_ClassicAbinitio__stage4_kk_3 ... success! # cpu_run_time_pref: 28800 Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk1_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk2_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk3_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk4_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk5_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk6_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk7_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk8_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk9_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk10_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk11_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk12_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk13_fa ... success! Continuing computation from checkpoint: chk_S_00000001_FastRelax__chk14_fa ... success! 
Starting work on structure: _00002 Starting work on structure: _00003 Starting work on structure: _00004 Starting work on structure: _00005 </stderr_txt> ]]> Validate state Invalid Claimed credit 17.1638733855513 Granted credit 0 application version 3.31 FYI, Nightwish |
©2024 University of Washington
https://www.bakerlab.org