41)
Questions and Answers :
Windows :
partial completion 'waiting to run'
(Message 71630)
Posted 22 Nov 2011 by amgthis Post: I've since put together another quad-core box, this time with the Intel Sandy Bridge 2500 and 16 GB of RAM. I have BOINC Manager set to use 100% of memory and plenty of disk space. I'm also now running the Debian 'squeeze' release on this box. Unfortunately, BOINC running Rosetta still exhibits this behavior of abandoning partially completed WUs and starting work with *later* expiry dates. This has resulted in lots of work dying on the vine and expiring prior to completion. I've also set my extra work buffer as high as 9 days at times because I've run out of work. Now I've lowered it to 4-5 days, since it never really gathers enough work for all cores for all the days you set anyhow. Plus I didn't want it starting even more work before finishing others already in progress. BTW, right now Rosetta is the only project for this manager to manage (6.10.58 from the Debian stable tree). With all cores running 100% 24/7 with no restrictions, my free memory is over 9 GB. No swap is being used. It seems the manager really isn't all that great at queuing work so that good work consistently gets returned on time instead of going to waste. Mod Sense, thanks again. More experimentation is needed. I have only one i7 cpu |
42)
Questions and Answers :
Windows :
partial completion 'waiting to run'
(Message 68851)
Posted 21 Dec 2010 by amgthis Post: Mod Sense, thanks again. More experimentation is needed. I have only one i7 CPU but I can tweak both ways for a couple of weeks and watch what happens. I'm hoping to install 64-bit Windows 7 if I can get past some BIOS issues. I did watch while BOINC 'orphaned' several of my nearly complete WUs as time expired and they were still 'waiting to run'. So that answered one question I had: BOINC will let a WU expire past its due date and start newer WUs with later deadline dates if you have memory issues like I do. More testing is in order. Merry Christmas everyone! |
43)
Questions and Answers :
Windows :
partial completion 'waiting to run'
(Message 68783)
Posted 7 Dec 2010 by amgthis Post: Mod.Sense - first, thanks for taking the time for such a detailed response. I believe what you are saying makes complete sense for my situation. I just installed my first i7 Bloomfield core CPU and while it's a quad I was a little surprised to see it running 8 tasks right off the bat. The i7 threading capabilities account for that, apparently. The box has 4 gigs of RAM but being Windows XP it's only using 3. I just checked another box with a Q9550 quad that has done the same thing, with 2 WUs now waiting to run. Same deal: XP, 4 gigs of RAM (3 usable), etc. I think I just hit some bigger projects that pegged my RAM. My preferences are set to use 100% of all memory, page file, etc. on my boxes. Everything else you write matches what I've seen. I'll also recheck the system log messages to see if this is started by a 'waiting for memory' issue that morphs into 'waiting to run' as you write. I'm leaving the rest of your great response complete so hopefully it can help someone else if they experience this and are wondering. Now it will be shown twice on the page. Thanks again and best of the Holidays to you and everyone associated with Rosetta@home. /amgthis My best guess is that this is a memory issue. What can happen, especially with a many-core machine, is that a task reaches a point or a model that requires more memory than the rest of the execution has. The combination of all 4 running at the same time then exceeds your preference for how much memory BOINC should use and the task goes to a status of "waiting for memory"... and BOINC seems to take a note that indicates it was using xxx MB of memory when it got deferred to the waiting status. |
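The memory explanation quoted above can be illustrated with a small Python sketch. This is only a toy model and not BOINC's actual scheduler: the function name `split_by_memory`, the greedy in-order admission, and the per-task figures (300 MB for a typical task, 1100 MB for a large one) are assumptions made up for illustration. The 3072 MB limit stands for the roughly 3 GB that Windows XP exposes on the box described in the post.

```python
def split_by_memory(task_mem_mb, limit_mb):
    """Toy model: admit tasks in order while their combined memory
    stays within limit_mb; everything else is left waiting.
    This is NOT how the real BOINC client schedules work."""
    running, waiting = [], []
    used = 0
    for name, mem in task_mem_mb:
        if used + mem <= limit_mb:
            running.append(name)
            used += mem
        else:
            waiting.append(name)
    return running, waiting


# Hypothetical numbers: 8 tasks on a hyperthreaded quad, one of them large.
tasks = [(f"task{i}", 300) for i in range(7)] + [("big_task", 1100)]
running, waiting = split_by_memory(tasks, limit_mb=3072)
print("running:", running)
print("waiting:", waiting)
```

With these made-up figures the seven small tasks use 2100 MB, so the large task would push the total to 3200 MB and is left waiting. The same eight tasks would all fit under a higher limit, which is consistent with the suggestion that the cause is memory rather than the manager version.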
44)
Questions and Answers :
Windows :
partial completion 'waiting to run'
(Message 68767)
Posted 6 Dec 2010 by amgthis Post: Win XP with intel quad core cpu's. |
45)
Questions and Answers :
Windows :
partial completion 'waiting to run'
(Message 68766)
Posted 6 Dec 2010 by amgthis Post: I notice sometimes my work units are shown as partially (sometimes almost completely) finished, but with a status of 'waiting to run' while other work units have been started. I try to cache several days' worth of work since I've run out many times in the past when the project is down. I don't understand why these units stop in the middle while others are started and finished, then new work started. But somehow the 'waiting to run' units sit. Some are around 95% complete and they just sit and wait to expire because the work isn't completed by the deadline. Does anyone know why this occasionally happens? The BOINC manager version doesn't seem to matter. I have this happen with new and old versions. ????? Why, if a partial WU shows 'waiting to run' and it's almost totally finished, does it never restart before a brand new WU starts? |
46)
Message boards :
Number crunching :
minirosetta 2.15
(Message 67956)
Posted 4 Oct 2010 by amgthis Post: I have a quad core Q6700 with 4 gigs of RAM and I'm having the same problem reported here. With Windoze XP SP2 I'm getting constant 'nag' bubbles about low system memory. I check the usage under task manager and one instance is using over 1 gig of memory. The other 3 running WU's are looking more typical, using right around ~300k each of RAM, plus or minus. The work unit that is sucking over a gig is this one: task T0592_t4_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22268_1995_0 running under 2.15. I have my preferences basically set to use any resources they can grab, which has always worked great up till now. Also, I never run the graphic screen saver. I leave my computer on 24/7 with no other restrictions on Rosetta, other than the time it accesses my local LAN. |
47)
Message boards :
Number crunching :
No new work
(Message 64689)
Posted 1 Jan 2010 by amgthis Post: My boxes are running out of work. |
48)
Questions and Answers :
Windows :
lr8 wu's won't run
(Message 64331)
Posted 2 Dec 2009 by amgthis Post: These look like the "rama" work units that were posted last week. Your machines are hidden so I can't see the history of your specific WUs or when you got them to determine why it took you so long to run into them. But, these have been discussed in some detail on the Number Crunching board. Thanks, Mod.Sense. All of my boxes have been running slower this last week or so - maybe that accounts for the backup. |
49)
Questions and Answers :
Windows :
lr8 wu's won't run
(Message 64328)
Posted 1 Dec 2009 by amgthis Post: Is anyone else having trouble with these?
01-Dec-2009 12:44:41 [rosetta@home] Starting lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0
01-Dec-2009 12:44:42 [rosetta@home] Starting task lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0 using minirosetta version 200
01-Dec-2009 12:44:53 [rosetta@home] Computation for task lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0 finished
01-Dec-2009 12:44:53 [rosetta@home] Output file lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0_0 for task lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0 absent
dies after ~11 seconds and then 'output file absent'. I've had a bunch of these do this on several boxes. Sorry about the word wrap but I thought the time stamp should be included. |
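For anyone wanting to spot these short-lived failures across several boxes, here is a rough Python sketch that scans BOINC message-log lines of the shape quoted in the post above. It is an illustration under assumptions, not a BOINC tool: the function name `short_failed_tasks` and the 60-second threshold are made up, the regular expressions only cover the three message shapes seen in this post, and the month parsing (`%b`) assumes an English locale.

```python
import re
from datetime import datetime

# Patterns match only the message shapes quoted in the post above.
LINE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]+\] (.*)$")
START = re.compile(r"^Starting task (\S+)")
DONE = re.compile(r"^Computation for task (\S+) finished$")
ABSENT = re.compile(r"^Output file \S+ for task (\S+) absent$")


def short_failed_tasks(lines, max_seconds=60):
    """Return {task: runtime_seconds} for tasks that finished within
    max_seconds of starting and whose output file was reported absent."""
    started, runtime, absent = {}, {}, set()
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
        msg = m.group(2)
        start, done, gone = START.match(msg), DONE.match(msg), ABSENT.match(msg)
        if start:
            started[start.group(1)] = when
        elif done and done.group(1) in started:
            task = done.group(1)
            runtime[task] = (when - started[task]).total_seconds()
        elif gone:
            absent.add(gone.group(1))
    return {t: s for t, s in runtime.items() if t in absent and s <= max_seconds}


TASK = "lr8_combine_smooth_torsion_it00_rama04_A_rlbd_1tul_IGNORE_THE_REST_DECOY_14889_747_0"
sample = [
    f"01-Dec-2009 12:44:42 [rosetta@home] Starting task {TASK} using minirosetta version 200",
    f"01-Dec-2009 12:44:53 [rosetta@home] Computation for task {TASK} finished",
    f"01-Dec-2009 12:44:53 [rosetta@home] Output file {TASK}_0 for task {TASK} absent",
]
result = short_failed_tasks(sample)
print(result)
```

Run on the three lines from the post, the sketch flags the single task with a runtime of 11 seconds, which matches the "~11 seconds" observed above.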
50)
Message boards :
Number crunching :
Minirosetta 1.90 and 1.91
(Message 62785)
Posted 5 Aug 2009 by amgthis Post: Anyone else having trouble with these w/u's bombing out after only 20 seconds or so? More output file absent errors?
05-Aug-2009 12:16:20 [rosetta@home] Starting task lr5_combine_mods_run01_rlbn_1enh_IGNORE_THE_REST_NATIVE_14608_23_0 using minirosetta version 190
05-Aug-2009 12:16:42 [rosetta@home] Computation for task lr5_combine_mods_run01_rlbn_1enh_IGNORE_THE_REST_NATIVE_14608_23_0 finished
05-Aug-2009 12:16:42 [rosetta@home] Output file lr5_combine_mods_run01_rlbn_1enh_IGNORE_THE_REST_NATIVE_14608_23_0_0 for task lr5_combine_mods_run01_rlbn_1enh_IGNORE_THE_REST_NATIVE_14608_23_0 absent |
51)
Message boards :
Number crunching :
Problems with web site
(Message 59593)
Posted 16 Feb 2009 by amgthis Post: server status says everything is running but no results can be uploaded. ??? huh?? |
52)
Questions and Answers :
Windows :
SAN upgrade issue?
(Message 57524)
Posted 3 Dec 2008 by amgthis Post: Hitting 'update' about 10 times is a slow and dirty fix. Once the master file is fetched, all is redirected to the new server URL. this must be due to the upgrade not being quite completed yet: |
53)
Message boards :
Number crunching :
Problems with web site
(Message 57523)
Posted 3 Dec 2008 by amgthis Post: Hitting 'update' about 10 times worked for me. PITA on 20 boxes, though. I guess I could have done it through BOINC Manager. Thanks moderators and others for the suggestion. /amgthis |
54)
Questions and Answers :
Windows :
SAN upgrade issue?
(Message 57439)
Posted 2 Dec 2008 by amgthis Post: this must be due to the upgrade not being quite completed yet:
01-Dec-2008 16:59:22 [rosetta@home] Message from server: Server error: can't attach shared memory
'patience is a virtue' my old girlfriend used to claim. I'm still not sure I completely believed her...... 8^) |
55)
Questions and Answers :
Windows :
Output file absent
(Message 57227)
Posted 25 Nov 2008 by amgthis Post: I'm thinking this is why I'm getting many 'computation error' messages, even on units that have run what appears to be the full time to completion (7:48 or so). I'm set for 8 hr. work units:
<snip>
24-Nov-2008 11:51:07 [rosetta@home] Computation for task loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_73_0 finished
24-Nov-2008 11:51:07 [rosetta@home] Output file loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_73_0_0 for task loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_73_0 absent
24-Nov-2008 11:51:07 [rosetta@home] Computation for task loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_74_0 finished
24-Nov-2008 11:51:07 [rosetta@home] Output file loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_74_0_0 for task loopbuild_minimalist_core_control_standardloopfile2_homo_bench_looprelax_cheat_chunk_control_standard_loopfiles_t286__olange_IGNORE_THE_REST_1FXWF_9_4817_74_0 absent
<snip>
anyone else seen this? I have had a lot of the infamous 'no finished file' errors also but this one appears to be new for me. /amgthis |
56)
Questions and Answers :
Windows :
What's wrong with the "Rosetta mini with new score terms 1.02"?
(Message 56752)
Posted 7 Nov 2008 by amgthis Post: Same here all of those work units done blowed up. Thanks, Bruce. Yeah I think with 1.40 hopefully all the problems will stop. I still had some of the older "mini with new score terms" queued but they are almost all gone now. I've had no other problems (lately) with any other version(s). Cheers, /amgthis |
57)
Questions and Answers :
Windows :
What's wrong with the "Rosetta mini with new score terms 1.02"?
(Message 56730)
Posted 6 Nov 2008 by amgthis Post: Same here all of those work units done blowed up. No, Bruce. I was just practicing my NASCAR-speak. 8^) If I could understand the error messages better I'd forward them along but I think other people had that covered already. |
58)
Questions and Answers :
Windows :
What's wrong with the "Rosetta mini with new score terms 1.02"?
(Message 56693)
Posted 4 Nov 2008 by amgthis Post: Same here all of those work units done blowed up. /amgthis |
59)
Message boards :
Number crunching :
minirosetta v1.15 bug thread
(Message 52870)
Posted 5 May 2008 by amgthis Post: The mini rosetta 1.15 units just continually crash. Why keep queuing them to |
60)
Message boards :
Number crunching :
minirosetta v1.15 bug thread
(Message 52856)
Posted 4 May 2008 by amgthis Post: The mini rosetta 1.15 units just continually crash. Why keep queuing them for distribution before the problems are sorted? People are wasting kilowatts of power for nothing in the meantime... I would think we would just line up 5.96 units until the bugs were sorted instead of wasting thousands of watts of energy for nothing. ???? |
©2024 University of Washington
https://www.bakerlab.org