Message boards : Number crunching : MiniRosetta 3.17 Problems.
Author | Message |
---|---|
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I've had two different types of tasks error out; the same types ran before on this rig with the 3.14 app and did not error. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=418800096 place_CE_20110919_EBOV_GP_2d1v_ProteinInterfaceDesign_31440_359_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> ERROR: drSOP ERROR:: Exit from: src/protocols/protein_interface_design/movers/PlaceStubMover.cc line: 1063 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ================================================================================= https://boinc.bakerlab.org/rosetta/workunit.php?wuid=418800129 3filtr5A_CYpa_2aak_ProteinInterfaceDesign_23Aug2011_30588_1098_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> ERROR: drSOP ERROR:: Exit from: src/protocols/protein_interface_design/movers/PlaceStubMover.cc line: 1063 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
Shawn Volunteer moderator Project developer Project scientist Send message Joined: 22 Jan 10 Posts: 17 Credit: 53,741 RAC: 0 |
Thanks for letting us know. As you are probably aware, we recently changed our version of Rosetta@home. These current jobs are associated with protocols written for an older version. I did not notice any compatibility problems at the time, but I will do some more testing on these jobs to find out why they didn't work. |
Shawn Volunteer moderator Project developer Project scientist Send message Joined: 22 Jan 10 Posts: 17 Credit: 53,741 RAC: 0 |
Thanks for letting us know. I think we've identified the problem, and the ProteinInterfaceDesign team is now aware of the issue. Thanks once again for your time, your computational resources, and your feedback! |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
See, if you guys would post a summary of this problem on the front page... it'd have a profound effect on users. They'd see that the Rosetta team is working... etc. Same goes for when the server goes down. Say: "Hey, someone unplugged the servers during last night's party. We'll get that fixed as soon as possible." Or something along those lines would be great for people trying to know what's going on. Just my humble advice. |
pieface Send message Joined: 20 Sep 05 Posts: 17 Credit: 797,661 RAC: 0 |
I really don't mind the small things like the DrSOP problem; they tie up some resources for download and then upload, but I don't get charged extra for that. But during the same timeframe I also had something like a dozen ProteinInterfaceDesign and Ploop2x3 tasks run to their full allotted time (6 hrs or so, depending on how the watchdog was feeling), and then when the validator finally got caught up they were marked as invalid. I had some of these on both machines I had crunching Rosetta: one is a Win XP x64 system and the other a Win7 box, no overclocking at all. Here are a couple of examples. Any ideas, or did anyone else get those kinds of results in this last batch? Ploop2x3 Ploop2x3 PID note: edited to take out 'over the weekend'. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Yup, Doctor SOP has a problem :¬) It is unusual for me to get errors, and I have now got four errors with that in the output. https://boinc.bakerlab.org/rosetta/result.php?resultid=458811930 https://boinc.bakerlab.org/rosetta/result.php?resultid=458960603 https://boinc.bakerlab.org/rosetta/result.php?resultid=459144123 https://boinc.bakerlab.org/rosetta/result.php?resultid=459297330 |
Trotador Send message Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
lots of errors, stop downloading units https://boinc.bakerlab.org/rosetta/result.php?resultid=459660390 https://boinc.bakerlab.org/rosetta/result.php?resultid=459660074 https://boinc.bakerlab.org/rosetta/result.php?resultid=459660070 https://boinc.bakerlab.org/rosetta/result.php?resultid=459635613 https://boinc.bakerlab.org/rosetta/result.php?resultid=459658860 |
Trotador Send message Joined: 30 May 09 Posts: 108 Credit: 291,214,977 RAC: 0 |
More info: T0.... units seem OK; ab_07_19... crashing all; 2stubs... crash; place_CE_... crash; rlx_jsr... OK |
Mad_Max Send message Joined: 31 Dec 09 Posts: 209 Credit: 26,727,148 RAC: 14,272 |
All my WUs with names starting 3filtr5A_CYpa_ have compute errors too: https://boinc.bakerlab.org/rosetta/result.php?resultid=458277890 https://boinc.bakerlab.org/rosetta/result.php?resultid=459075255 https://boinc.bakerlab.org/rosetta/result.php?resultid=459077432 https://boinc.bakerlab.org/rosetta/result.php?resultid=459085059 https://boinc.bakerlab.org/rosetta/result.php?resultid=459095484 And all WUs with names starting ploop2x3_design_ end with validate errors: https://boinc.bakerlab.org/rosetta/result.php?resultid=458647710 https://boinc.bakerlab.org/rosetta/result.php?resultid=458839191 https://boinc.bakerlab.org/rosetta/result.php?resultid=459110460 |
TJ Send message Joined: 29 Mar 09 Posts: 127 Credit: 4,799,890 RAC: 0 |
All my WUs error out very soon; I got these error messages: ERROR: [ERROR] invalid header input for kill_hairpins file. ERROR:: Exit from: ..\..\..\src\core\scoring\SS_Killhairpins_Info.cc line: 370 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish Greetings, TJ. |
.clair. Send message Joined: 2 Jan 07 Posts: 274 Credit: 26,399,595 RAC: 0 |
Yup, I got some dead hairpin files as well in the ab_07_19_ series. The things you have to do to proteins to make them behave :¬) https://boinc.bakerlab.org/rosetta/result.php?resultid=459619880 https://boinc.bakerlab.org/rosetta/result.php?resultid=459639244 |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Some more errors, on a different type of task; the others I've had have been running O.K. apart from these. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=401795240 ab_07_19_1fnaA_filtnr_IGNORE_THE_REST_06_08_28682_52_1 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Starting work on structure: _00001 ERROR: [ERROR] invalid header input for kill_hairpins file. ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish Watchdog active. </stderr_txt> ================================================================================== https://boinc.bakerlab.org/rosetta/workunit.php?wuid=401801710 ab_07_19_1acfA_control_IGNORE_THE_REST_03_07_28679_51_0 <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255) </message> Starting work on structure: _00001 ERROR: [ERROR] invalid header input for kill_hairpins file. ERROR:: Exit from: src/core/scoring/SS_Killhairpins_Info.cc line: 370 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,144,809 RAC: 333 |
"The offending jobs have been removed." But why the *snap* can no sysadmin post some proper info about this in a timely fashion? It's just a matter of simple communication; it doesn't even cost much time. :-( Ralf |
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
"The offending jobs have been removed." Why isn't ralph being used to catch these errors? All workunits I've received from ralph recently have been using app version 3.14. Best, Snags |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,144,809 RAC: 333 |
"Why isn't ralph being used to catch these errors? All workunits I've received from ralph recently have been using app version 3.14." Yeah, what RALPH@Home is doing has been a bit odd recently. Several times, I got swamped with sets of 20 WUs at a time, and a mix of applications labeled both as "Rosetta Mini Beta 3.17" (currently 2 awaiting their turn) and as "Rosetta Mini 3.14" (another 20 WUs piled up, waiting to be processed eventually). Ralf |
TPCBF Send message Joined: 29 Nov 10 Posts: 111 Credit: 5,144,809 RAC: 333 |
"RALPH has separate executables for minirosetta (current version of Rosetta@Home) and minirosetta_beta (next version of Rosetta@Home). At the moment, the two applications are identical, despite their different version numbers." And are you sure that everyone's on the same page here? :? Ralf |
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 433 |
Looks like the 3.14 problem with workunits that stop using any CPU time at all but don't tell BOINC that they're finished isn't fully fixed. Does appear to be less frequent, though. Rosetta Mini 3.17 T0552_boinc_alignment_loopbuild_threading_cst_relax_tex_IGNORE_THE_REST_34966_22 CPU time at last checkpoint 01:17:50 CPU time 01:17:51 Elapsed time 25:00:05 Estimated time remaining 60:12:19 Fraction done 10.594% Max RAM usage 95 MB Working set size 546.09 MB No longer using any CPU time, but still claims to be running. 64-bit Vista SP2 with 8 GB; BOINC allowed to use 40% 11/3/2011 1:42:40 AM | | Starting BOINC client version 6.12.34 for windows_x86_64 11/3/2011 1:42:40 AM | | log flags: file_xfer, sched_ops, task 11/3/2011 1:42:40 AM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 11/3/2011 1:42:40 AM | | Data directory: C:\ProgramData\BOINC 11/3/2011 1:42:40 AM | | Running under account Bobby 11/3/2011 1:42:40 AM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10] 11/3/2011 1:42:40 AM | | Processor: 6.00 MB cache 11/3/2011 1:42:40 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe 11/3/2011 1:42:40 AM | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00) 11/3/2011 1:42:40 AM | | Memory: 8.00 GB physical, 15.66 GB virtual 11/3/2011 1:42:40 AM | | Disk: 919.67 GB total, 555.16 GB free 11/3/2011 1:42:40 AM | | Local time is UTC -5 hours 11/3/2011 1:42:40 AM | | NVIDIA GPU 0: GeForce GTS 450 (driver version 28562, CUDA version 4010, compute capability 2.1, 1024MB, 476 GFLOPS peak) Selected workunit length 12 hours. Restarting BOINC lost all but 01:19:52 of the elapsed time. 
I'll give the workunit one more chance to restart properly; if that isn't adequate, I'll put Rosetta@Home on No new tasks again until the next minirosetta version is ready. I have not seen such a problem with the RALPH@Home 3.18 workunits (6 hour length selected), so I'll continue to run those. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 433 |
Now finished, returned, and in Pending status. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 433 |
The same no-longer-using-CPU-time problem is also present in another workunit. T0538_boinc_rosetta_cm_medal_ss_v2_cmiles_IGNORE_THE_REST_34758_10367 CPU time at last checkpoint 02:06:31 CPU time 02:07:46 Elapsed time 03:11:44 Fraction done 16.687% BOINC Manager claims it is running, but Windows Task Manager says it is using no CPU time at all. 11/6/2011 6:23:11 PM | | Starting BOINC client version 6.12.34 for windows_x86_64 11/6/2011 6:23:11 PM | | log flags: file_xfer, sched_ops, task 11/6/2011 6:23:11 PM | | Libraries: libcurl/7.21.6 OpenSSL/1.0.0d zlib/1.2.5 11/6/2011 6:23:11 PM | | Data directory: C:\ProgramData\BOINC 11/6/2011 6:23:11 PM | | Running under account Bobby 11/6/2011 6:23:11 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz [Family 6 Model 23 Stepping 10] 11/6/2011 6:23:11 PM | | Processor: 6.00 MB cache 11/6/2011 6:23:11 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx smx tm2 pbe 11/6/2011 6:23:11 PM | | OS: Microsoft Windows Vista: Home Premium x64 Edition, Service Pack 2, (06.00.6002.00) 11/6/2011 6:23:11 PM | | Memory: 8.00 GB physical, 15.80 GB virtual 11/6/2011 6:23:11 PM | | Disk: 919.67 GB total, 527.06 GB free 11/6/2011 6:23:11 PM | | Local time is UTC -6 hours 11/6/2011 6:23:11 PM | | NVIDIA GPU 0: GeForce GTS 450 (driver version 28562, CUDA version 4010, compute capability 2.1, 1024MB, 476 GFLOPS peak) I'm about to restart BOINC to give that workunit another chance to restart properly, but I've already set No new tasks for Rosetta@home on that computer. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1235 Credit: 14,341,506 RAC: 433 |
The restart made that workunit return quickly, with 99 decoys done; now in a pending state. Could that mean that 3.17 has trouble doing something reasonable after it finishes 99 decoys? Some of the previous versions of minirosetta did. |
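
For anyone wanting to check the stall symptom described in the last few posts against their own task properties, here is a minimal sketch, not anything from the Rosetta or BOINC code. It assumes the times are copied by hand as HH:MM:SS strings the way BOINC Manager shows them; the function names and the 10% cutoff are made up for illustration, and the only data used are the figures quoted in the two reports above.

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time, elapsed_time):
    """Fraction of wall-clock time the task actually spent on the CPU."""
    elapsed = hms_to_seconds(elapsed_time)
    if elapsed == 0:
        return 0.0
    return hms_to_seconds(cpu_time) / elapsed


def looks_stalled(cpu_time, elapsed_time, threshold=0.10):
    """Flag a task whose CPU/elapsed ratio is below the cutoff.

    The threshold is an arbitrary illustrative value, not a BOINC setting.
    """
    return cpu_efficiency(cpu_time, elapsed_time) < threshold


# (CPU time, elapsed time) exactly as quoted in the two reports above.
reports = {
    "T0552 report": ("01:17:51", "25:00:05"),
    "T0538 report": ("02:07:46", "03:11:44"),
}

for name, (cpu, elapsed) in reports.items():
    ratio = cpu_efficiency(cpu, elapsed)
    print(f"{name}: CPU/elapsed = {ratio:.1%}, stalled = {looks_stalled(cpu, elapsed)}")
```

A ratio like this only catches a task that has been idle for a long time, as in the T0552 report; the T0538 figures would not be flagged by it, and a CPU-usage limit set in the BOINC preferences also lowers the ratio for healthy tasks. Comparing two readings of the CPU time taken a few minutes apart is a more direct check for a task that has stopped making progress.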
©2025 University of Washington
https://www.bakerlab.org