Message boards : Number crunching : Problems with Rosetta version 5.68 and 5.70
Author | Message |
---|---|
Metalsmith Joined: 21 Apr 07 Posts: 6 Credit: 8,271 RAC: 0 |
Unfortunately, we can't label the new app plain "rosetta" because we need to keep the name of the stable app "rosetta". But I agree, maybe "rosetta_new" would be a better name than "rosetta_beta"... I'll talk to David K. about this. The new beta is hanging on my machine. Just sits there saying it's running, but doing nothing. Metalsmith |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Metalsmith: More detail will be needed to help with your question. I see you are running on a Mac. Where are you looking for an indication it is running? Is that the status shown in the BOINC Manager tasks tab? Or the status shown by the Mac OS? How do you know it is doing nothing? Is Mac OS showing the Rosetta task with a zero CPU %? Or is BOINC Manager showing the same CPU time? Rosetta Moderator: Mod.Sense |
snooptodd Joined: 16 Dec 05 Posts: 2 Credit: 1,229,606 RAC: 0 |
This work unit froze and I aborted it. https://boinc.bakerlab.org/rosetta/result.php?resultid=90860596 No cpu utilization and cpu time not incrementing. It was using the 1.70 client on a current Ubuntu box. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Hey snoop. Did the task show a status of "running" in the BOINC Manager tasks tab in the advanced view? Or did it get suspended or "waiting for memory" when CPU time stopped incrementing? Rosetta Moderator: Mod.Sense |
Metalsmith Joined: 21 Apr 07 Posts: 6 Credit: 8,271 RAC: 0 |
Metalsmith: More detail will be needed to help with your question. Einstein is the one with zero CPU %. Rosetta has a static CPU time. |
snooptodd Joined: 16 Dec 05 Posts: 2 Credit: 1,229,606 RAC: 0 |
Hey snoop. Did the task show a status of "running" in the BOINC Manager tasks tab in the advanced view? Or did it get suspended or "waiting for memory" when CPU time stopped incrementing? The task showed running. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I was the second to have a problem with this one, errored after about 1 second. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=84069006 It was on 5.70B app. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
BENCH_051207_JUMPING_SAVE_ALL_OUT_-2tif_-_NATIVE_PAIR_5_44_1_1_BARCODE_R25E_filters_1807_29280_0_0 crashed my system when it was done and attempting to make a gzip file, which generated this error in the results found in the above link. WARNING! attempt to gzip file .cc2tif.out failed: file does not exist. I got credit for it, but it looks like you might not get all your data. |
Stevea Joined: 19 Dec 05 Posts: 50 Credit: 738,655 RAC: 0 |
Here's one that never even got started on 2 different rigs. https://boinc.bakerlab.org/rosetta/workunit.php?wuid=85064822 BETA = Bahhh Way too many errors, killing both the credit & RAC. And I still think the (New and Improved) credit system is not ready for prime time... |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Both of these results failed when I tried to manually suspend them with BOINC Manager. I'm not sure which Rosetta version it was; perhaps I could check my logs.... https://boinc.bakerlab.org/rosetta/result.php?resultid=93923269 https://boinc.bakerlab.org/rosetta/result.php?resultid=94623192 My computers are hidden because...it's a long story. I'm running Fedora Core 7 Linux kernel 2.6.21 on an x86_64 (but using the 32-bit version of BOINC). |
Marcel Koopmans Joined: 4 Aug 06 Posts: 8 Credit: 1,134,689 RAC: 0 |
Hello, I also have many problems on 32- and 64-bit Linux. I run Debian 4.0 amd64 on a Core 2 Duo (host id = 304690). In the last 30 days I lost over 2 days of computing time! Until yesterday I ran 32-bit BOINC using extra libs. Yesterday I installed 64-bit BOINC and reset all the projects. As far as I can see, the number of compute errors has not dropped. I now also run my own script to automatically kill and restart BOINC (if load / core < 90% or > 110%). It also kills all "rosetta" and "setiathome" processes with init as parent. Even with these kinds of tricks my machine cannot compete with a Window$ installation. I have the feeling that on Mac OS X the software is getting more stable. Marcel |
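The restart rule described in the post above (restart BOINC when load per core drops below 90% or rises above 110%) could be sketched roughly as follows. This is only an illustration of the stated thresholds, not Marcel's actual script: the function names `load_per_core` and `should_restart` are hypothetical, and which load figure the original script used is not stated.

```python
def load_per_core(load_avg: float, cores: int) -> float:
    """Return the system load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def should_restart(load_avg: float, cores: int,
                   low: float = 0.90, high: float = 1.10) -> bool:
    """True when load per core falls outside the band [low, high].

    The 0.90 and 1.10 defaults mirror the 90% / 110% thresholds
    quoted in the post; everything else here is an assumption.
    """
    ratio = load_per_core(load_avg, cores)
    return ratio < low or ratio > high
```

On a Unix host the inputs could come from `os.getloadavg()[0]` and `os.cpu_count()`. A dual-core machine with both cores busy would report a load near 2.0, so `should_restart(2.0, 2)` stays false, while an idle or overloaded machine trips the check.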
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Got another one crash with Rosetta Beta 5.70: https://boinc.bakerlab.org/rosetta/result.php?resultid=95296120 This one crashed with code 193 immediately after my computer became "active" and BOINC tried to pause all the running tasks. Here's the output from BOINC: 2007-07-26 07:41:26 [climateprediction.net] [task_debug] result hadcm3inct_cmuo_1920_160_35869820_1 checkpointed 2007-07-26 07:43:56 [rosetta@home] [task_debug] result 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 checkpointed 2007-07-26 07:44:29 [SETI@home] Sending scheduler request: Requested by user 2007-07-26 07:44:29 [SETI@home] Reporting 1 tasks 2007-07-26 07:44:34 [SETI@home] Scheduler RPC succeeded [server version 511] 2007-07-26 07:44:34 [SETI@home] Deferring communication for 11 sec 2007-07-26 07:44:34 [SETI@home] Reason: requested by project 2007-07-26 07:47:20 [---] Suspending computation - user is active 2007-07-26 07:47:20 [climateprediction.net] [task_debug] task_state=QUIT_PENDING for hadcm3inct_cmuo_1920_160_35869820_1 from preempt 2007-07-26 07:47:20 [rosetta@home] [task_debug] task_state=QUIT_PENDING for 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 from preempt 2007-07-26 07:47:21 [rosetta@home] [task_debug] Process for 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 exited 2007-07-26 07:47:21 [rosetta@home] [task_debug] task_state=EXITED for 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 from handle_exited_app 2007-07-26 07:47:21 [rosetta@home] Deferring communication for 1 min 0 sec 2007-07-26 07:47:21 [rosetta@home] Reason: Unrecoverable error for result 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 (process exited with code 193 (0xc1)) 2007-07-26 07:47:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 from CS::report_result_error 2007-07-26 07:47:21 [rosetta@home] [task_debug] exit status 193 2007-07-26 07:47:21 [rosetta@home] Computation 
for task 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 finished 2007-07-26 07:47:21 [rosetta@home] Output file 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0_0 for task 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 absent 2007-07-26 07:47:21 [rosetta@home] [task_debug] result state=COMPUTE_ERROR for 1ten__BOINC_ABINITIO_SAVE_ALL_OUT-1ten_-frags83__1838_4395_0 from CS::app_finished Cleaning up graphics data... Detaching shared memory... 2007-07-26 07:47:22 [climateprediction.net] [task_debug] Process for hadcm3inct_cmuo_1920_160_35869820_1 exited 2007-07-26 07:47:22 [climateprediction.net] [task_debug] task_state=UNINITIALIZED for hadcm3inct_cmuo_1920_160_35869820_1 from handle_exited_app 2007-07-26 07:47:22 [climateprediction.net] [task_debug] exit status 0 |
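For anyone collecting these failures from their own logs, the exit status can be pulled out mechanically. The sketch below assumes only the message format visible in the log above ("process exited with code 193 (0xc1)"); the function name `exit_codes` is made up for illustration. It also confirms that the decimal and hexadecimal forms in each message are the same number.

```python
import re

# Matches the phrase as it appears in the BOINC output quoted above.
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+)\)")


def exit_codes(text: str) -> list[tuple[int, int]]:
    """Return (decimal, value_parsed_from_hex) for each exit message in text."""
    return [(int(dec), int(hx, 16)) for dec, hx in EXIT_RE.findall(text)]
```

For the message in this post, both elements of the pair are 193, since 0xc1 is 12 × 16 + 1.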
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Stacey Baird: I read that you are running the same 4 projects that I am: Seti, Rosetta, ClimatePrediction, and Einstein. Would you please post what your resource shares for each are? I'm trying to find a happy balance between all these projects (or perhaps detach from a couple). Thanks. |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Got another one crash with Rosetta Beta 5.70: Sorry, this error was for Rosetta Beta 5.72. I believe the same thing has happened with 5.70 with previous workunits of mine, so there's still some debugging to do on Linux before 5.72 can be a final release. |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,911,735 RAC: 26 |
This and several other units hung and cycled for hours and days. Surely the software should be able to detect this. Below is just a snippet of a message repeated every 30 seconds or so for over 8 hours. 7/28/2007 8:04:50 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:04:50 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:04:50 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:05:31 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:05:31 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:05:31 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:06:13 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:06:13 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:06:13 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:06:54 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:06:54 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 
7/28/2007 8:06:54 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:07:35 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:07:35 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:07:35 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:08:16 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:08:16 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:08:16 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:08:57 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:08:57 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:08:57 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 7/28/2007 8:09:38 AM|rosetta@home|Task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 exited with zero status but no 'finished' file 7/28/2007 8:09:38 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 7/28/2007 8:09:38 AM|rosetta@home|Restarting task BENCH_051207_JUMPING_SAVE_ALL_OUT_-1iibA-_NATIVE_PAIR_53_75_2_1_filters_1804_18348_0 using rosetta version 568 |
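A restart loop like the one in the log excerpt above can be spotted by counting how often the "exited with zero status but no 'finished' file" message appears for the same task. The sketch below is a user-side helper, not part of BOINC; it assumes only the pipe-delimited message format shown in the post, and the names `count_no_finished` and `stuck_tasks` as well as the threshold are hypothetical.

```python
import re
from collections import Counter

# "<timestamp>|<project>|Task <name> exited with zero status but no 'finished' file"
NO_FINISHED = re.compile(
    r"\|(?P<project>[^|]+)\|Task (?P<task>\S+) exited with zero status "
    r"but no 'finished' file"
)


def count_no_finished(log_text: str) -> Counter:
    """Count 'no finished file' messages per task name."""
    counts: Counter = Counter()
    for match in NO_FINISHED.finditer(log_text):
        counts[match.group("task")] += 1
    return counts


def stuck_tasks(log_text: str, threshold: int = 5) -> list[str]:
    """Return task names that hit the message at least `threshold` times."""
    counts = count_no_finished(log_text)
    return sorted(task for task, n in counts.items() if n >= threshold)
```

Run against the snippet quoted in the post, the counter would report eight hits for the single BENCH task, which is well past any reasonable threshold for aborting it by hand.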
Jack Joined: 19 Feb 07 Posts: 11 Credit: 521,099 RAC: 0 |
Latest work unit got stalled for a while. I am sharing time with Rosetta and SETI with a one hour swap timer. At 19:44 the Rosetta work unit pauses while SETI runs for an hour. When Rosetta resumes, it starts over at 18:44. This happens repeatedly. I finally had to suspend the SETI work unit until Rosetta finished. Rosetta ver 5.68 Job name: BENCH_051207_ABRELAX_SAVE_ALL_OUT_-2chf_-_BARCODE_R74_filters_1804_16911_0 Work Unit: 85773166 Result ID: 94765248 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Jack, I suggest you change your general preferences to "leave application in memory while suspended YES". You must have run into a task that needs more than an hour to reach a checkpoint. This way it will keep things in the swap file and pick up where it left off. Rosetta Moderator: Mod.Sense |
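The interaction described above, where a task that needs longer than the switch interval to reach a checkpoint keeps starting over, can be illustrated with a toy model. This is a simplified sketch and not how BOINC or Rosetta is implemented: it assumes checkpoints land at fixed multiples of work done, and `progress_after_slices` and its parameters are invented for the example.

```python
def progress_after_slices(slices: int, switch_minutes: int,
                          checkpoint_minutes: int,
                          leave_in_memory: bool) -> int:
    """Minutes of work retained after `slices` scheduling slices.

    Assumption (for illustration only): a checkpoint is written each
    time total work done crosses a multiple of `checkpoint_minutes`.
    """
    in_memory = 0  # work held by the running process
    for _ in range(slices):
        in_memory += switch_minutes
        saved = (in_memory // checkpoint_minutes) * checkpoint_minutes
        if not leave_in_memory:
            # Removed from memory on preemption: only checkpointed work survives.
            in_memory = saved
    return in_memory
```

With a 60-minute switch interval and a task that needs 90 minutes to reach a checkpoint, the model retains 0 minutes no matter how many slices run when applications are not left in memory, and 300 minutes after five slices when they are.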
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Work unit 86705477 stuck at 65.70% for a day on Rosetta 5.58 on an iMac running 10.4.10 : aborting. |
adrianxw Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 2 |
2 crashed WUs this morning, one after about 2.5 hours, the other after ~35 minutes. Result ID 96120974 Name BENCH_051207_JUMPING_SAVE_ALL_OUT_-1hz6A-_NATIVE_PAIR_7_57_2_1_filters_1858_3126_1 Workunit 84975870 Created 30 Jul 2007 1:12:22 UTC Sent 30 Jul 2007 1:12:46 UTC Received 30 Jul 2007 6:59:34 UTC Server state Over Outcome Client error Client state Compute error Exit status 193 (0xc1) Computer ID 544079 Report deadline 9 Aug 2007 1:12:46 UTC CPU time 10279.558432 stderr out <core_client_version>5.8.16</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1) </message> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 10800 # random seed: 2352845 SIGSEGV: segmentation violation Stack trace (20 frames): [0x8cdfdab] [0x8cdabdc] [0xb7f76420] [0x8b253ce] [0x8b3229f] [0x8b51734] [0x804c638] [0x85f687a] [0x8748441] [0x804e1c6] [0x8c33bfb] [0x8942222] [0x8604c80] [0x88773f5] [0x892a2c4] [0x85c461e] [0x86ed9a6] [0x86edac6] [0x8d43ca4] [0x8048111] Exiting... </stderr_txt> ]]> Validate state Invalid Claimed credit 14.8806448917294 Granted credit 0 application version 5.68 ---------------------------------------------------- Result ID 96127994 Name 1r69__BOINC_GENERIC__ABRELAX-1r69_-generic__1870_8796_0 Workunit 87041574 Created 30 Jul 2007 2:05:34 UTC Sent 30 Jul 2007 4:05:31 UTC Received 30 Jul 2007 6:59:34 UTC Server state Over Outcome Client error Client state Compute error Exit status 193 (0xc1) Computer ID 544079 Report deadline 9 Aug 2007 4:05:31 UTC CPU time 2376.856544 stderr out <core_client_version>5.8.16</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1) </message> <stderr_txt> Graphics are disabled due to configuration... 
# cpu_run_time_pref: 10800 # random seed: 2191065 SIGSEGV: segmentation violation Stack trace (14 frames): [0x8d3bd1b] [0x8d36b4c] [0xb7f54420] [0x8ca6153] [0x8bac6ba] [0x8bb26c0] [0x8c8db88] [0x84b42d8] [0x80d8651] [0x85ee9b7] [0x871c6e3] [0x871c78e] [0x8d9fc14] [0x8048111] Exiting... </stderr_txt> ]]> Validate state Invalid Claimed credit 3.44072738375063 Granted credit 0 application version 5.72 Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Riddle me this, Batman. With Rosetta 5.68, selecting "Suspend" from the Activity menu suspends the task. But a Rosetta 5.73 task goes into QUIT_PENDING and, as I've posted several times, does not UNINITIALIZE (remove itself from memory). My setting for "Leave applications in memory while suspended?" is no. So it appears 5.68 does not honor my global settings "sometimes". I do not understand this. If I kill [pid], the 5.68 workunit restarts at its previous checkpoint as it should. This is the computer I'm talking about: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=565755 |