Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,214,047 RAC: 1,450 |
mikey, have you tried a different version of BOINC? I only have one project per PC, but I will add a second if the first is having workunit issues. All machines have at least a 20 GB hard drive, but most have a 100 GB or bigger drive. The one above is a laptop with a 50 GB hard drive with almost 30 GB free. I have BOINC set up to use no more than 50% of the free hard drive space and don't have any issues with space. |
epcorian Send message Joined: 1 Jan 09 Posts: 16 Credit: 253,062 RAC: 0 |
So I took Mod.Sense's advice and downgraded to the 6.2.19 64-bit version of the BOINC client, and so far so good with the minis: I've crunched for 30 minutes thus far with no errors yet, much better than the 30-60 seconds I was getting before. I think I spoke too soon... that first WU crunched successfully, but only 1 other WU was successful out of the 8 WUs. 2/8: better, but still not good. I might try replacing Vista 64 with XP 64 another weekend when I'm bored. Just for curiosity's sake, I had my P4 and Atom 330 PCs running 32-bit XP SP3 crunch some minis and they did just fine. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Hello. I'd suggest allowing it to run normally. Was it still using CPU time? If you want to cut it off but still get it to report in, let it run, then exit (not close) BOINC and restart it, let it run about 2 minutes, then exit again and restart. Once you've done that 5 times, the task should end and report in with "too many restarts". Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Hola, Hola Juan, I was able to translate his message and basically, he's been having problems with Mini, including the latest version. He wishes Rosetta had subprojects, so he could select to crunch only the RosettaBeta application instead of mini. Looking at his 2 failed tasks, they both have Exit status -226 and the "Can't acquire lockfile" errors. He is running Win Vista x86. I know some of you have had these lock file problems as well. Were they always with WinVista? And I thought the v1.54 release of mini had resolved these issues. Can any of you that have had the problem suggest the best steps for Juan to take to resolve it? You might even convert your reply to Spanish as best you can using a tool like this: http://dictionary.reference.com/translate/text.html Rosetta Moderator: Mod.Sense |
Fishead Send message Joined: 3 Sep 08 Posts: 7 Credit: 89,566 RAC: 0 |
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=206610287 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=206617445 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=206618707 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=204395981 According to the graphics screen of these four WUs, every "accepted" step becomes the new low energy state. No matter if the energy value is smaller or higher... |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
*I* cured the lock file problem by running with 100% CPU time ... if he has opted to run at some lower percentage of CPU time, this may be the issue. Something else to try ... and if it works we can report another success ... this is one of the issues that we have been trying to pin down in RALPH... |
Klimax Send message Joined: 27 Apr 07 Posts: 44 Credit: 2,800,788 RAC: 68 |
Hello. OK, I set the runtime to 8 hours, so the watchdog would cut it at 24 hours. It has now uploaded and reported. I have dump files as well, if somebody on the team is interested. (Captured at the reported time and step.) And I see I was not alone... :-( |
Arkadiusz Dykiel Send message Joined: 13 Aug 06 Posts: 3 Credit: 12,823,537 RAC: 0 |
Hi, The work units exit with status code 193 (0xc1). Rosetta 5.98 and other projects work OK. Am I missing something, such as a library? Full error report below: Server state: Over; Outcome: Client error; Client state: Compute error; Exit status: 193 (0xc1); CPU time: 0. <core_client_version>6.2.15</core_client_version> <![CDATA[ <message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> BOINC:: Initializing ... ok. [2009- 2- 8 1:29: 8:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. *** glibc detected *** corrupted double-linked list: 0x093544cc *** SIGABRT: abort called Stack trace (15 frames): [0x8f88f07] [0x8fb3778] [0xb7fff420] [0x9016944] [0x902c693] [0x90310d2] [0x9031c84] [0x903353d] [0x9000ec7] [0x81bed6d] [0x81bee1d] [0x8195f15] [0x8048e93] [0x900f84c] [0x8048111] Exiting... </stderr_txt> ]]> |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
As of v1.54, the watchdog kicks in at runtime pref. plus 4 hours. So, no longer 3 times runtime preference. Rosetta Moderator: Mod.Sense |
Andreas Send message Joined: 22 Sep 08 Posts: 1 Credit: 39,402 RAC: 0 |
If you are seeing errors with lock-file problems try setting the cpu setting back to 100%. If you are running at 100% CPU preference and are getting this problem, I for one, am very interested. If you are getting the failures and change the CPU setting to 100% and that cures the issue ... well, we are interested in THAT too ... I, too, was plagued by frequent R@H lock file problems. Setting CPU to 100% seems to have cured that. And, as I have a quad-core CPU, I can limit BOINC usage by setting "On Multiprocessor Systems, use at most 51% of all processors". (If I run BOINC at 100% on all cores, my system gets too hot - more precisely, my fan gets too loud) -- Andreas |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
problems with this one: 227327540 heartbeat error messages </stderr_txt> <message> <file_xfer_error> <file_name>abinitio_norelax_homfrag_natfrag_129_B_1o7uA_SAVE_ALL_OUT_6252_5178_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> |
robertmiles Send message Joined: 16 Jun 08 Posts: 1234 Credit: 14,338,560 RAC: 2,014 |
Hola, I never learned enough Spanish to do such a translation myself, so I tried asking that web site to translate all of your reply at once to Spanish, in preparation for writing an answer in English and doing the same to it. It appeared that the translation succeeded, but enough of it was hidden by advertisements that it was unusable. Anyone know another automatic translation site that doesn't have this problem? I've been trying to trigger that problem over on RALPH@home by setting my CPU time less than 100% and unable to actually get it less than 100%, so you might want to consider this: For anyone having this problem repeatedly, give them 1.54 workunits with extra debugging output enabled. Then have someone on the RALPH@home staff analyze the results and give them credits according to the RALPH@home standards instead of the Rosetta@home standards. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
http://www.babelfish.yahoo.com translates it as: Hello, First of all, excuses to write in Castilian, but my English is insufficient. From August of 2008 me 99% of the tasks of Mini Rosetta with computational error are finalizing. After a time I decided not to continue processing in this project. Even so, sometimes I return to try it, but everything follows equal: even with the new versions of Mini Rosetta, including this last one. The case is that the tasks of Rosetta Beta do not fail to me, but of that one sends very few proporcinalmente to me. The pain is that in this project the possibility of selecting sub-projects, does not exist there is as if it in other many. I would like to continue processing for this project, but there is no way, and it is not question to throw low-achieving hours of computation. I hope that this problem is solved soon. As for me I will continue trying from time to time. A coridal greeting for all, Juan. He has 4 tasks, and 2 of them failed: abinitio_norelax_homfrag_natfrag_129_B_1tit__SAVE_ALL_OUT_6252_2628_0 got a lockfile failure with a CPU time of only 683.9708, and loopbuild_ref_tex_cst_hombench_loopbuild_tex_cst_t363__IGNORE_THE_REST_1WWTA_12_6651_14_0 got a lockfile failure as well, with a CPU time of 2155.325. The other 2 are split: one completed and one in progress. |
rembertw Send message Joined: 21 Apr 07 Posts: 14 Credit: 628,529 RAC: 0 |
Mod.Sense Rembertw, which machine are you having the problem with? What version of BOINC are you running? Was this a newly installed machine? Or was it working before? - This specific problem occurs with computer ID 586996 - BOINC version 6.2.14, as on most of my computers currently - Not newly installed, but hardly a prize winner with Rosetta. It crunches successfully for other projects though. Extra comments: I have the impression that it is Rosetta that crashes. This morning I noticed 2 other tasks at 7+ hours elapsed and 0% progress. When cancelling these tasks I got the Windows crash notice where I can "inform Microsoft of the problem". The only "special" thing about this computer is that it doesn't have 24/7 internet access. No solution as yet? |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Mod.Sense I've not heard any other reports of the percent completed not increasing. What is it showing for the estimated runtime, before the task starts? Odd, the failed task with some time on it shows that your core client version is 6.2.14, but your BOINC Windows Runtime Debugger Version is 6.5.0. Not sure how that would happen. Rosetta Moderator: Mod.Sense |
Verrie Pearce Send message Joined: 2 Dec 05 Posts: 3 Credit: 90,299 RAC: 0 |
Hello All! |
Verrie Pearce Send message Joined: 2 Dec 05 Posts: 3 Credit: 90,299 RAC: 0 |
I have reached the end. Since your new patch, nothing works from your project. I keep resetting and still get no improvement. Until you patch your patch I am done. Sorry, I wanted to help. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2141 Credit: 41,518,559 RAC: 10,612 |
I have reached the end. Since your new patch, nothing works from your project. I keep resetting and still get no improvement. Until you patch your patch I am done. Sorry, I wanted to help. Urgh - bad news :( I notice you're using BOINC 6.2.19 with Vista64. Can you give it one last try and upgrade to 6.4.5? I had similar problems to you (nowhere near as bad) using Vista64, and these problems have disappeared for me after upgrading. It might make all the difference for you too. |
Markus Send message Joined: 21 Feb 08 Posts: 1 Credit: 28,072 RAC: 0 |
Good morning! I reinstalled my complete system a few days ago and restarted crunching rosetta@home again. Unfortunately I got some errors. Here is what I got: 12.02.2009 05:37:59|rosetta@home|Restarting task cc_1_3_mamcstmix_cen_0.1_hb_t369__IGNORE_THE_REST_1RXQA_12_6836_46_0 using minirosetta version 154 12.02.2009 05:38:00|rosetta@home|Task cc_1_3_mamcstmix_cen_0.1_hb_t369__IGNORE_THE_REST_1RXQA_12_6836_46_0 exited with zero status but no 'finished' file 12.02.2009 05:38:00|rosetta@home|If this happens repeatedly you may need to reset the project. As a result, two workunits aborted with a computation error. Maybe it's just an error on my system; I just wanted to post it. Greetings |
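
For anyone who wants to scan a BOINC message log for the symptom Markus posted, here is a minimal Python sketch. It assumes the three-field `timestamp|project|message` layout seen in his lines and a day-first date (`DD.MM.YYYY`), which depends on the machine's locale; the helper names `parse_boinc_line` and `is_unfinished_exit` are made up for this illustration and are not part of BOINC.

```python
from datetime import datetime

def parse_boinc_line(line):
    """Split 'DD.MM.YYYY HH:MM:SS|project|message' into its three fields.

    Assumption: day-first timestamps, as in the log lines quoted in the thread.
    """
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
    return when, project, message

def is_unfinished_exit(message):
    """True for the 'exited with zero status but no finished file' symptom."""
    return "exited with zero status but no 'finished' file" in message

sample = (
    "12.02.2009 05:38:00|rosetta@home|Task "
    "cc_1_3_mamcstmix_cen_0.1_hb_t369__IGNORE_THE_REST_1RXQA_12_6836_46_0 "
    "exited with zero status but no 'finished' file"
)
when, project, message = parse_boinc_line(sample)
print(when.isoformat(), project, is_unfinished_exit(message))
```

Counting how often `is_unfinished_exit` fires per task name would show whether this is a one-off or the "happens repeatedly" case the client warns about.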
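
The watchdog figures from Klimax and Mod.Sense earlier in the thread can be checked with a few lines of Python. This is only a sketch of the arithmetic as described in those posts (three times the runtime preference before v1.54, runtime preference plus 4 hours from v1.54 on); the function name and flag are hypothetical, not taken from the minirosetta source.

```python
def watchdog_limit_hours(runtime_pref_hours, v1_54_or_later=True):
    """Hours after which the watchdog ends a task, per the rules quoted in the thread."""
    if v1_54_or_later:
        # Rule described by Mod.Sense for v1.54: runtime preference plus 4 hours.
        return runtime_pref_hours + 4
    # Older rule: three times the runtime preference.
    return 3 * runtime_pref_hours

print(watchdog_limit_hours(8, v1_54_or_later=False))  # old rule for an 8-hour preference
print(watchdog_limit_hours(8))                        # v1.54 rule for the same preference
```

With an 8-hour preference the old rule gives the 24 hours Klimax expected, while the v1.54 rule gives 12 hours.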
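
The three numbers in Arkadiusz's report, `193 (0xc1, -63)`, look like one exit code shown three ways: decimal, hexadecimal, and the same byte read as a signed 8-bit value. That reading is an inference from the numbers matching, not something confirmed in the thread, and `describe_exit_code` is a made-up helper name. A short Python sketch of the conversion:

```python
def describe_exit_code(code):
    """Return (unsigned, hex string, signed 8-bit value) for a process exit code."""
    unsigned = code & 0xFF                      # keep the low byte only
    signed = unsigned - 256 if unsigned >= 128 else unsigned
    return unsigned, hex(unsigned), signed

print(describe_exit_code(193))
```

For 193 this yields `0xc1` and `-63`, matching the values in the error report.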
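
Fishead's observation about the graphics screen is easier to discuss with the expected behaviour written down. The Python sketch below assumes that "low energy" is meant to be a running minimum over accepted steps; it says nothing about how the minirosetta graphics code is actually written, and `running_low_energy` is a hypothetical name.

```python
def running_low_energy(accepted_energies):
    """For each accepted step, return the lowest energy seen so far."""
    lowest = None
    history = []
    for energy in accepted_energies:
        # Only a strictly lower energy should replace the current low.
        if lowest is None or energy < lowest:
            lowest = energy
        history.append(lowest)
    return history

print(running_low_energy([-10.0, -12.5, -11.0, -15.0]))
```

Under this assumption an accepted step at -11.0 after a low of -12.5 would leave the low unchanged, whereas the behaviour Fishead describes would show -11.0 as the new low.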
©2024 University of Washington
https://www.bakerlab.org