minirosetta 2.05

Author	Message
ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0	Message 65480 - Posted: 7 Mar 2010, 11:42:31 UTC
My first Protein_interface (validation related?) error as far as I know - MacOS 10.5:
tyrsim_3gbn_2esa_Protein_interface_design_01Feb2010_17949_9_2
Outcome Success
Client state Done
Exit status 0 (0x0)
CPU time 21540.8
<core_client_version>6.10.36</core_client_version>
<![CDATA[
<stderr_txt>
[...]
# cpu_run_time_pref: 21600
======================================================
DONE :: 327 starting structures 21540.3 cpu seconds
This process generated 327 decoys from 327 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>
Validate state Workunit error - check skipped
One of two wingmen validated successfully after his deadline, but with far fewer decoys completed.

AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0	Message 65481 - Posted: 7 Mar 2010, 21:53:11 UTC - in response to Message 65480. Last modified: 7 Mar 2010, 22:00:19 UTC
"My first Protein_interface (validation related?) error as far as I know - MacOS 10.5: tyrsim_3gbn_2esa_Protein_interface_design_01Feb2010_17949_9_2 Outcome Success Client state Done Exit status 0 (0x0) CPU time 21540.8 <core_client_version>6.10.36</core_client_version> <![CDATA[ <stderr_txt> [...] # cpu_run_time_pref: 21600 ====================================================== DONE :: 327 starting structures 21540.3 cpu seconds This process generated 327 decoys from 327 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish </stderr_txt> ]]> Validate state Workunit error - check skipped One of two wingmen validated successfully after his deadline, but with far fewer decoys completed."
There is nothing wrong on your end. This is a very old (and rare) bug in the BOINC server software. Take a look here.
Wait a second, the trac item claims that the bug is fixed. Maybe it is time for Rosetta to update the server code.
AdeB

Greg_BE Joined: 30 May 06 Posts: 5756 Credit: 6,089,880 RAC: 1,511	Message 65482 - Posted: 7 Mar 2010, 22:49:27 UTC
https://boinc.bakerlab.org/rosetta/result.php?resultid=322413556
tyrsim_3gbn_q.gz_Protein_interface_design_25Feb2010_18415_276_1
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
CPU time 4.4375
stderr out
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 65492 - Posted: 9 Mar 2010, 0:22:41 UTC
Looks like there are still problems with this app. Same task: it restarted near the end and I got it in the neck. Not impressed.
tyrsim_3gbn_1c81_Protein_interface_design_25Feb2010_18415_410_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=294414088
# cpu_run_time_pref: 14400
======================================================
DONE :: 348 starting structures 14397.5 cpu seconds
This process generated 348 decoys from 348 attempts
======================================================
# cpu_run_time_pref: 14400
======================================================
DONE :: 2 starting structures 14498.9 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>
Validate state Valid
Claimed credit 102.297287162446
Granted credit 0.384433279143336
application version 2.05
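The two DONE summaries in a stderr log like the one above can be pulled apart with a short script. This is a minimal sketch, assuming the summary lines always follow the wording quoted in this thread; `parse_done_blocks` and its regular expression are hypothetical helpers, not part of BOINC or Rosetta.

```python
import re

# Pattern based only on the summary lines quoted in this thread (an assumption).
DONE_RE = re.compile(
    r"DONE :: (\d+) starting structures ([\d.]+) cpu seconds "
    r"This process generated (\d+) decoys from (\d+) attempts"
)


def parse_done_blocks(stderr_text):
    """Return one dict per DONE summary, in the order they appear."""
    flat = " ".join(stderr_text.split())  # collapse newlines and repeated spaces
    return [
        {
            "structures": int(structures),
            "cpu_seconds": float(cpu),
            "decoys": int(decoys),
            "attempts": int(attempts),
        }
        for structures, cpu, decoys, attempts in DONE_RE.findall(flat)
    ]


SAMPLE = """
# cpu_run_time_pref: 14400
======================================================
DONE :: 348 starting structures 14397.5 cpu seconds
This process generated 348 decoys from 348 attempts
======================================================
# cpu_run_time_pref: 14400
======================================================
DONE :: 2 starting structures 14498.9 cpu seconds
This process generated 2 decoys from 2 attempts
======================================================
"""

blocks = parse_done_blocks(SAMPLE)
for block in blocks:
    print(block["decoys"], "decoys after", block["cpu_seconds"], "cpu seconds")
```

For this log the second summary reports only 2 decoys. Whether the granted credit was computed from that final count is a guess; the log itself does not confirm it.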

apohawk Joined: 13 Sep 08 Posts: 5 Credit: 30,438,070 RAC: 0	Message 65530 - Posted: 12 Mar 2010, 10:55:16 UTC
This work unit reports "success" despite having errors at the end.
https://boinc.bakerlab.org/rosetta/result.php?resultid=323517090
application: minirosetta 2.05
name of work unit: ina2inaN_to_NOE__18638_5045_0
Outcome: Success
Exit status: 0 (0x0)
CPU time: 2212.594
But at the end of the result we got:
# cpu_run_time_pref: 7200
ERROR: Unrecognized edge type!
ERROR:: Exit from: ..\..\src\core\kinematics\util.cc line: 1422
called boinc_finish
CPU: Phenom II 945
OS: WinXP 64 SP2
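A result that exits with status 0 yet logs `ERROR:` lines, as in the report above, is easy to flag automatically. A minimal sketch, assuming error lines always begin with `ERROR:` as they do in this thread; `find_error_lines` and `looks_suspicious` are hypothetical helper names, and the source path in the sample is left out on purpose.

```python
def find_error_lines(stderr_text):
    """Return stderr lines that start with 'ERROR:' (this also matches 'ERROR::')."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if line.strip().startswith("ERROR:")
    ]


def looks_suspicious(exit_status, stderr_text):
    """True when the exit status is 0 but the log still contains error lines."""
    return exit_status == 0 and bool(find_error_lines(stderr_text))


SAMPLE = """# cpu_run_time_pref: 7200
ERROR: Unrecognized edge type!
ERROR:: Exit from: (source path omitted) line: 1422
called boinc_finish
"""

print(looks_suspicious(0, SAMPLE))
```

The check only compares the reported exit status with the log text; it says nothing about why the application called boinc_finish with a success code.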

Duzz Joined: 14 Nov 05 Posts: 1 Credit: 13,148 RAC: 0	Message 65544 - Posted: 13 Mar 2010, 13:16:48 UTC Last modified: 13 Mar 2010, 13:17:53 UTC
Over the last few days I have had several WUs go idle after some time of computation. Windows XP Task Manager shows no CPU activity. If one does not notice this, many hours of WU processing get lost, which is very unproductive for the project.

AdeB Joined: 12 Dec 06 Posts: 45 Credit: 4,428,086 RAC: 0	Message 65547 - Posted: 13 Mar 2010, 22:39:05 UTC
In workunit gunn_fragments_SAVE_ALL_OUT_-1wtyA__18642_1106 both tasks (324092645 and 323994500) ended with the same error:
ERROR: ct == final_atoms
ERROR:: Exit from: ..\..\src\core\scoring\rms_util.cc line: 397
BOINC:: Error reading and gzipping output datafile: default.out
AdeB

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 29,147,785 RAC: 9,656	Message 65555 - Posted: 15 Mar 2010, 3:44:52 UTC
Today I got strange validation errors: "Task was reported too late to validate". But there are still 4 days until the deadline (19 Mar)!
Links to the tasks:
https://boinc.bakerlab.org/rosetta/result.php?resultid=323161767
https://boinc.bakerlab.org/rosetta/result.php?resultid=323181972
https://boinc.bakerlab.org/rosetta/result.php?resultid=323205144

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 65561 - Posted: 15 Mar 2010, 23:09:15 UTC
What is odd is the way the tasks were reissued before he reported the completed ones back. That wouldn't normally happen. That isn't dependent upon Mad Max's machine, so I doubt they did a restore or anything. I'll have to see what we can find out.
Rosetta Moderator: Mod.Sense

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 29,147,785 RAC: 9,656	Message 65564 - Posted: 16 Mar 2010, 2:38:42 UTC - in response to Message 65560.
"I see more things happened during the weekend. I'm seeing you detached or something which might cause all incomplete units to report. Did you perhaps restore a backup which tried to continue from an earlier point?"
The "detached" error is BOINC related. I did not actually detach from the project; I connected a new computer. After that the BOINC client initially went mad: first it started downloading to the new computer (Athlon II X2 250) tasks that had already been downloaded to the old computer (Athlon XP 2600+); then at some point it thought better of it, registered the new computer on the server under a new ID, and deleted the mistakenly downloaded tasks. (I think this is the point that was recorded on the server as "detached".)
Note: no BOINC-related files were transferred from the old computer to the new one. The new client was a clean install from the distribution. So I do not know what caused this behavior. Maybe the fact that the computers connect to the internet under the same IP?
Hmm, now I think that, in principle, such a validate error could happen because of it. If one computer "cancels" the (mistakenly downloaded) tasks while the second is still working on them, could the server issue the same WU to another volunteer's computer and shift the deadline?

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 65567 - Posted: 16 Mar 2010, 16:32:04 UTC
True, not a problem specific to v2.05 Rosetta. Perhaps BOINC server, or client. Either way, we should start another thread if further problem tasks are found.
Certainly many users that have multiple machines are connecting from the same IP address (I'm talking about the router's public IP address that the project servers see). And many other users come in via dynamic IPs, and so it is always different. My understanding is that BOINC uses many factors to determine if a given machine is the same as an existing registered one, to keep it all straight and separated correctly. Factors such as the user ID, host name, any existing BOINC host ID, machine type, installed OS, last RPC sequence number... so a fresh install should not have caused the client to "go mad" on either machine. Indeed, many users have identically configured machines at the same site coming in via the same IP.
Rosetta Moderator: Mod.Sense

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 65570 - Posted: 17 Mar 2010, 3:07:31 UTC
This took 8 hrs, 2 min on my 3 GHz Intel, with a four-hour run time.
aqp9__boinc_aqp9_fast_run01_yfsong_loopbuild_threading_cst_relax_superfast_yfsong_IGNORE_THE_REST_18658_1421_0
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=296064742
# cpu_run_time_pref: 14400
Continuing computation from checkpoint: chk_S_2B6OA_15_0001_Remodel__loop_1_0_0_S ... success!
BOINC:: CPU time: 28914.7s, 14400s + 14400s[2010- 3-17 13:39:17:] :: BOINC
InternalDecoyCount: 0
======================================================
DONE :: 1 starting structures 28914.7 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
called boinc_finish
SIGSEGV: segmentation violation
Stack trace (15 frames):
[0x96c49b3] [0x96ee888] [0xb7fe9420] [0x91d6455] [0x842671e] [0x83e85d3] [0x80a7840] [0x84381fe] [0x812a54a] [0x812b82d] [0x86aa16b] [0x8243cf5] [0x8049897] [0x974c15c] [0x8048121]
Exiting...
</stderr_txt>
]]>
Validate state Valid
Claimed credit 69.3077894676244
Granted credit 25.52312719487 -- for 8 hrs.

Sid Celery Joined: 11 Feb 08 Posts: 2389 Credit: 45,744,984 RAC: 21,059	Message 65572 - Posted: 17 Mar 2010, 3:22:49 UTC Last modified: 17 Mar 2010, 3:24:29 UTC
On this desktop I got a Compute error, Exit status -177 (0xffffff4f), in the following task:
aqp9__boinc_aqp9_fast_run01_blast_yfsong_loopbuild_threading_cst_relax_superfast_yfsong_IGNORE_THE_REST_18653_30510_0
<message> Maximum disk usage exceeded </message>
I did notice while it was running that it was about 2 hours over my 8-hour runtime, on Model 6 Step 19051, but it reported 0 CPU time in the end. I allow 10 GB disk space for BOINC and have about 581 MB in use on 5 current or waiting tasks, 9.43 GB free.
Also, on this laptop I got a validate error on the following task a few days back:
t290__boinc_filtered_loopbuild_threading_cst_lb_tex_IGNORE_THE_REST_16900_8451_0

Mad_Max Joined: 31 Dec 09 Posts: 209 Credit: 29,147,785 RAC: 9,656	Message 65575 - Posted: 17 Mar 2010, 14:13:21 UTC
To Mod.Sense:
Yes, it is certainly not a problem with minirosetta 2.05. It looks like some rare bug in the BOINC server, probably connected with the fact that the computer had the same IP (not only the "external" router IP, but the internal one too) and the same network name. The new computer was a replacement for the old one, so I gave it the same name as the previous one, having renamed the old one first. Actually, this should not be a factor, because BOINC identifies hosts by an internal ID (such as 1211592) and not by Windows names. But a bug is a bug, and something did not go as intended :) In any case, no more such errors have come up, so I think this can be forgotten.
To Sid Celery:
I also had a lot of errors in tasks such as __boinc_filtered_loopbuild_threading_. In fact, every second job terminated with an error. On top of that, every task of this type overran the target CPU time, and there were strange-looking things in the graphics (such as RMSD from 20 to 50 and odd-looking models). So now I cancel all jobs of this type if I see them in the job queue.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 65576 - Posted: 17 Mar 2010, 20:51:24 UTC
Sid, each task also has a configured maximum disk space. So that must be the limit that was hit by the task you mention. This is just one more failsafe that is in place to help assure things keep running smoothly.
Rosetta Moderator: Mod.Sense
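Plenty of free space overall, yet a "Maximum disk usage exceeded" abort: that combination is what one would expect if two independent limits are checked. A minimal sketch, assuming a per-task bound alongside the client-wide allowance; the function name, the 200 MB bound and the 250 MB task usage are made-up values for illustration, not figures from the project.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def disk_limit_exceeded(task_usage, task_bound, client_usage, client_allowed):
    """Return which limit (if any) is exceeded: 'task', 'client', or None."""
    if task_usage > task_bound:
        return "task"      # the task's own configured maximum is hit
    if client_usage > client_allowed:
        return "client"    # the overall BOINC disk allowance is hit
    return None


result = disk_limit_exceeded(
    task_usage=250 * MB,      # hypothetical: what a runaway task had written
    task_bound=200 * MB,      # hypothetical per-task bound
    client_usage=581 * MB,    # from the post: space in use by BOINC
    client_allowed=10 * GB,   # from the post: space allowed for BOINC
)
print(result)  # task
```

With these numbers the per-task check fails even though the client is far below its 10 GB allowance.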

Sid Celery Joined: 11 Feb 08 Posts: 2389 Credit: 45,744,984 RAC: 21,059	Message 65578 - Posted: 17 Mar 2010, 21:03:05 UTC - in response to Message 65575.
"I also had a lot of errors in tasks such as __boinc_filtered_loopbuild_threading_. In fact, every second job terminated by an error. And violating the target CPU time in each of the first (ie all tasks of this type) + strange looking things in graphics part (such as RMSD from 20 to 50 and odd-looking models) So now I am canceling all jobs of this type, if I see them in the job queue."
It's the only error I've had in the last week on that W7 laptop, and credit was granted in the clean-up job, so I'm not worried by it - I don't understand any of these validate errors, but while I was reporting the other one I thought I'd just mention it. I don't think my errors are the same as yours in that case.
I'm more surprised by the disk-usage issue on the Vista desktop, which is otherwise very well behaved. I did suspect the task type, but others have gone through now with no problem at all, so maybe it just went a bit 'rogue' on me. I just thought it was worth describing, seeing as I noticed it was a bit odd while running for 10 hours, yet the task details didn't indicate anything more than that it failed on startup, which wasn't actually the case. One for the backroom team to ponder.

svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0	Message 65655 - Posted: 28 Mar 2010, 0:24:31 UTC
Miscellaneous computation errors:
----
327069193 (v2FcInnerW_1dAl_3GM3_ProteinInterfaceDesign_15Mar2010_18672_254_0) failed on Mac OS X. Similar failure from wingman.
ERROR: f.check_fold_tree()
ERROR:: Exit from: src/protocols/docking/DockingProtocol.cc line: 405
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>
----
326722657 (placestub_alt_denovo_1zvy_1z2m_ProteinInterfaceDesign_21Mar2010_18705_22_0) failed on W7
ERROR: in::file::zip minirosetta_database.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 137
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
----
326721814 (tedor-cs_-tdonly-1-calbindin__18708_33_1) failed on W7. Similar failure from wingman.
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
</stderr_txt>
]]>

Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0	Message 65658 - Posted: 28 Mar 2010, 8:01:15 UTC Last modified: 28 Mar 2010, 8:09:45 UTC
326722657 (placestub_alt_denovo_1zvy_1z2m_ProteinInterfaceDesign_21Mar2010_18705_22_0) failed on W7
ERROR: in::file::zip minirosetta_database.zip does not exist!
ERROR:: Exit from: ..\..\src\apps\public\boinc\minirosetta.cc line: 137
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
Add me to the list with tedor-cs_-tdonly-1-gb3__18708_4647
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out

allenandholmes Joined: 17 Dec 07 Posts: 1 Credit: 7,563 RAC: 0	Message 65659 - Posted: 28 Mar 2010, 8:17:07 UTC
I have been processing my current minirosetta task for 4 or 5 days now and have a suspicion about its checkpointing capabilities. I shut my PC down each night and restart it the next morning for BOINC processing. However, the elapsed time displayed resets to 0, the time to completion continues to increase all day long (and between sessions), and the percentage processed is dramatically different from the ratio of elapsed time to completion time. Am I wasting my time?

Sid Celery Joined: 11 Feb 08 Posts: 2389 Credit: 45,744,984 RAC: 21,059	Message 65663 - Posted: 28 Mar 2010, 15:14:17 UTC
One unusual error I haven't seen before - W7-64bit laptop:
Rossmann3x3_abinitio_SAVE_ALL_OUT_design_k031_001_18698_1551_0
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
[...]
<core_client_version>6.10.36</core_client_version>
[...]
# cpu_run_time_pref: 28800
Starting work on structure: _00018
Continuing computation from checkpoint: chk_S_00000018_ClassicAbinitio__stage_3_iter1_10 ... success!
Continuing computation from checkpoint: chk_S_00000018_ClassicAbinitio__stage4_kk_1 ... success!
Continuing computation from checkpoint: chk_S_00000018_ClassicAbinitio__stage4_kk_2 ... success!
std::cerr: Exception was thrown: no success reading silent file chk_S_00000018_ClassicAbinitio__stage4_kk_3.out