21)
Message boards :
Number crunching :
Problems with Rosetta version 5.93
(Message 51070)
Posted 29 Jan 2008 by FalconFly
Same here; I had to abort the last 2H4O model. One of my faster hosts effectively stopped working: the hourly rotation of the last 2h4o__BOINC_TWIST_RINGS WorkUnit apparently reset its CPU time over and over while it made zero progress. As a side effect, the Rosetta long-term debt of the affected clients rocketed up to -90000 s (lots of work done, but almost no progress).
22)
(Message 50955)
Posted 24 Jan 2008 by FalconFly
"Falcon, what is your Rosetta Preference for target runtime?"

It was set to 6 hours until this evening, when I reduced it to 4 (4x 4h with no progress is at least better than 4x 6h with no progress).

Typical WorkUnits that have already finished:
- Watchdog Terminated
- Watchdog Terminated + Segmentation Violation (still valid, though)
- Watchdog Terminated
- Watchdog Terminated
----------
If the WorkUnit just takes that long (and can't finish within 4 or 6 hours on a modern Athlon64 X2), I don't mind the increased runtime. I don't expect it to take 24 hours, though (unless the models are much more complex than expected, which could in theory be the case, for all I know).

Looking at claimed vs. granted credit, however, it seems that roughly 50-70% of the runtime is simply lost because the watchdog doesn't cut in until 4x the set runtime (I'm not sure what the client actually does in that time).
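The runtime arithmetic in this post can be made concrete with a short sketch. It assumes, as observed in this thread rather than confirmed from the Rosetta source, that the watchdog ends a stalled WorkUnit at 4x the target runtime. `watchdog_cutoff_hours` and `lost_hours` are hypothetical helpers for illustration only.

```python
# Assumption: the 4x multiplier is the behaviour reported in this thread,
# not a value taken from the Rosetta application itself.
WATCHDOG_FACTOR = 4


def watchdog_cutoff_hours(target_runtime_h: float, factor: int = WATCHDOG_FACTOR) -> float:
    """CPU hours after which the watchdog is expected to end a stalled WorkUnit."""
    return target_runtime_h * factor


def lost_hours(total_runtime_h: float, lost_fraction: float) -> float:
    """CPU hours not reflected in granted credit, given an estimated lost fraction."""
    return total_runtime_h * lost_fraction


if __name__ == "__main__":
    # Compare the two target-runtime preferences mentioned in the post.
    for target in (4, 6):
        print(f"target {target}h -> watchdog at {watchdog_cutoff_hours(target):g}h")

    # Apply the 50-70% loss estimate to a run that goes all the way to the cutoff.
    cutoff = watchdog_cutoff_hours(6)
    low, high = lost_hours(cutoff, 0.5), lost_hours(cutoff, 0.7)
    print(f"at {cutoff:g}h runtime, 50-70% lost = {low:g}-{high:g}h")
```

Under these assumptions, a 6h target means the watchdog only fires at 24h, so the 50-70% loss estimate corresponds to roughly 12 to 16.8 CPU hours per stalled WorkUnit; lowering the target to 4h caps the cutoff at 16h.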
23)
(Message 50945)
Posted 24 Jan 2008 by FalconFly
I've noticed a couple of 2H4O_BOINC_TWIST_RINGS WorkUnits stuck at ~10 min remaining as well, all well beyond their target runtime. CPU time keeps counting up, but no progress is made.

Oddball: restarting BOINC on a system that is past its runtime makes the CPU time drop from beyond the target runtime to some point inside it (e.g. from 6h16m to 2h16m with a 6h preference set), and the progress bar moves back from 99% accordingly. The same happened on a couple of other systems I tested (CPU time dropped from 23h back to a seemingly random point within the target runtime).

Based on the granted credits and decoys tested, the affected 2H4O_BOINC_TWIST_RINGS WorkUnits stall at some point but still cause full CPU utilization. The WorkUnit is then ended by the watchdog after hitting 4x the expected runtime.
------
All occurred with BOINC V5.10.28 on various Linux systems.
©2024 University of Washington
https://www.bakerlab.org