Posts by FalconFly

21) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 51070)
Posted 29 Jan 2008 by Profile FalconFly
Post:
Same here; I had to abort the last 2h4o Model.
One of my faster Hosts effectively stopped working, as the hourly rotation of the last 2h4o__BOINC_TWIST_RINGS WorkUnit apparently reset its CPU time over and over while making zero progress.

As a side effect, the Rosetta Long Term Debt of the affected Clients rocketed up to -90000s (lots of work done, but almost no progress).
22) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50955)
Posted 24 Jan 2008 by Profile FalconFly
Post:
Falcon, what is your Rosetta Preference for target runtime?
Please see related info. in this thread.


It was set at 6 hours until this evening, when I reduced it to 4 (4x4h of no progress is at least better than 4x6h of no progress).

Typical results of the WorkUnits that have already finished:
Watchdog Terminated
Watchdog Terminated + Segmentation Violation (still valid though)
Watchdog Terminated
Watchdog Terminated

----------
If the WorkUnit just takes that long (and can't finish within 4 or 6 hours on a modern Athlon64 X2), I don't mind the increased runtime. I don't expect it to take 24 hours though (unless the Models are really much more complex than expected, which is possible in theory for all I know).

Looking at Claimed vs. Granted Credit, however, it seems that approx. 50-70% of the runtime is simply lost because the Watchdog does not cut in until 4x the set runtime (not sure what the Client actually does in that time).
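
To put rough numbers on the 4x Watchdog limit, here is a minimal Python sketch. It assumes that all CPU time spent after a WorkUnit stalls is lost and that the Watchdog ends it at exactly 4x the target runtime. The function names and the stall times are made up for illustration; the actual stall points are not known.

```python
# Rough arithmetic only, not how the Rosetta Watchdog is implemented.
WATCHDOG_FACTOR = 4  # from the posts above: Watchdog ends the WorkUnit at 4x target runtime


def watchdog_cutoff_hours(target_hours):
    """Hours of CPU time after which the Watchdog ends a stalled WorkUnit."""
    return WATCHDOG_FACTOR * target_hours


def lost_fraction(target_hours, stall_hours):
    """Fraction of the total CPU time that is spent after the stall point.

    stall_hours is a hypothetical value: the CPU time at which progress stopped.
    """
    cutoff = watchdog_cutoff_hours(target_hours)
    if not 0 <= stall_hours <= cutoff:
        raise ValueError("stall_hours must lie between 0 and the Watchdog cutoff")
    return (cutoff - stall_hours) / cutoff


if __name__ == "__main__":
    for target in (4, 6):  # the two target runtimes mentioned above
        cutoff = watchdog_cutoff_hours(target)
        for stall in (0.3 * cutoff, 0.5 * cutoff):  # hypothetical stall points
            share = lost_fraction(target, stall)
            print(f"target {target}h, cutoff {cutoff}h, "
                  f"stall at {stall:.1f}h: {share:.0%} of CPU time lost")
```

Under these assumptions, a 6h target means the Watchdog steps in at 24h, and a loss of 50-70% would correspond to a stall somewhere between 30% and 50% of that window.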
23) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50945)
Posted 24 Jan 2008 by Profile FalconFly
Post:
Noted a couple of 2H4O_BOINC_TWIST_RINGS WorkUnits stuck at ~10min remaining as well, all well beyond their target runtime. CPU time counts upwards but no progress is made.

Oddity:
Restarting BOINC on a System that is beyond its target runtime causes CPU time to drop back to some point inside the target runtime (e.g. from 6h16m to 2h16m with a 6h preference set), and the progress bar moves back from 99% accordingly.

The same happened on a couple of Systems I tested (CPU time dropped from 23h back to a seemingly random point within the target runtime).

Based on granted Credits and Decoys tested, the affected 2H4O_BOINC_TWIST_RINGS WorkUnits stall at some point but still cause full CPU utilization. The WorkUnit is then ended by the Watchdog after hitting 4x the expected runtime.

------
All of this occurred with BOINC V5.10.28 on various Linux Systems.




