Posts by Melvin

1) Message boards : News : Outage notice (Message 97428)
Posted 16 Jun 2020 by Melvin
Post:
Followup - Seems there are other issues with 7.16.5 on Android with units not running.
Will uninstall this and give the phone a rest until perhaps an Android version becomes available that will run ok.
2) Message boards : News : Outage notice (Message 97400)
Posted 15 Jun 2020 by Melvin
Post:
After noticing a couple of completed Einstein units also well overdue and looking at the forum there, this seemed likely to be the same issue
of certificate expiry that has been affecting all projects on devices using BOINC below 7.10 for the last couple of weeks or so.
See https://einsteinathome.org/content/upcoming-ssltls-security-updates-old-boinc-client-support.
I then found and installed an android download for 7.16.5 from https://boinc.apk.cafe/ which seemed to overcome this ok.
[Caveat: alternatives may not yet have been beta tested enough for the store, and installing from outside the store supposedly carries greater risk.]
3) Message boards : News : Outage notice (Message 97394)
Posted 15 Jun 2020 by Melvin
Post:
Thank you.
I had checked that the store on the phone still only supported BOINC v7.4.53 (last updated 3 Jul 2016),
and browsing to the BOINC site from the phone detected the Android device type but also only offered 7.4.53.
So if the phone doesn't have a solution, and there seems no easy way to return completed units meanwhile, is it best to suspend the project to avoid wasting new downloads,
or to do nothing and expect the project to automatically not send new work until BOINC has been updated to a suitable version for the device?
4) Message boards : News : Outage notice (Message 97381)
Posted 14 Jun 2020 by Melvin
Post:
PS: although the tasks still show as awaiting upload, going to the Rosetta site I now see no tasks listed for that device (4462144).
Typing the long task names into the search shows:
"Unable to handle request. No such task: 14ffb8a64f5c34e1161bd6"
"Unable to handle request. No such task: Junior_HalfRoid_design6_cart_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_1yz8fo4k_929581_34_0"
"Unable to handle request. No such task: Junior_HalfRoid_design6_cart_COVID-19_SAVE_ALL_OUT_IGNORE_THE_REST_1hv3wm8h_929388_34_0"
5) Message boards : News : Outage notice (Message 97379)
Posted 14 Jun 2020 by Melvin
Post:
Yes, a while ago I noticed my Android phone (running BOINC 7.4.53) had 3 Rosetta work units sat there for well over a week with transfers pending, all were deadline 2/6.
Task summary showed "uploading", but project summary showed repeated "Communication scheduled in ..... Transfers pending 3".
No problem with other projects on that device, nor with Rosetta or other projects on Linux and Windows 10 desktops or laptop.
Just on this Android (v8.1.0) device.
6) Questions and Answers : Wish list : I would like to see:: (Message 89858)
Posted 9 Nov 2018 by Melvin
Post:
Thanks for making those suggestions Sid.
The two potentially awkward work-units completed earlier today.
Will have to wait and see if any more like those come along.
7) Questions and Answers : Wish list : I would like to see:: (Message 89852)
Posted 8 Nov 2018 by Melvin
Post:
Had to reboot after crash earlier today and later noticed these two units displayed similar (to each other) 'Elapsed' and 'Remaining' times.
I'd not realised this seems to be an elapsed real time since reboot - as opposed to the cumulative time elapsing whilst the units have been crunching !
Have added today's table below the screen-shots I uploaded earlier, to show the same work-units with only a handful of elapsed hours but around 66% 'Progress' - a big jump from the earlier few % - confusing?
(I tend to think of 'progress' as an estimate of percentage complete, but had previously concluded I'd misunderstood how various terms are calculated)
The most significant change I'd noticed after reboot was that 'remaining' times, that previously had started to move slowly upward, were now lower and now going down slowly.
One being an hour or so lower than a few days ago, the other being less than an hour higher than a few days ago, but both now in near unison.

Have now changed the 70% to 0% as you suggested. Will give the 85% a try at 100% as you mention this may be problematic.
Now summer has ended, we'll see if the extra heat is tolerated due to a few degrees lower ambient.
Tower runs on its 'side' with other 'side' open (as top) so ribs on GPU card are vertical so more heat can convect away (no integral fan).
Otherwise in 'normal' orientation, card would be horizontal but with ribs on underside accumulating heat (not much case fan flow in that area).
I've found this to be an alternative to removing the GPU (which R@H does not use but other projects do).
8) Questions and Answers : Wish list : I would like to see:: (Message 89842)
Posted 5 Nov 2018 by Melvin
Post:
Yes, and it was still going well a further week after changing those settings, looking as though that had solved it, but that may have been too early to be sure because not all units previously stalled.
Certainly several units went by quickly to bump up the RAC, though after those earlier stalling units anything completing was an improvement.

I'd explicitly allocated 7GB for disk as you'd suggested, and according to boinc-manager stats, this should be more than adequate. Screenshot at http://frintonet.dlinkddns.com/boinc1.jpg shows BOINC currently
using 4.3GB of this (70% of which is taken by R@H) with 2.7GB still available to BOINC. The "Leave applications in memory" thing was ticked, as you'd said (I later found reference to it causing some people problems depending on project check-pointing).
As before there seems ample available memory (never seen this machine swap since extra memory fitted). Process monitor shows BOINC gets a decent share of CPU.

Just when it was looking as though I might not get any more 'awkward' units, a couple of suspects come along!
Conspicuous by their high 'elapsed time' (about 120hours) despite low 'remaining time' estimates of only a few hours left with a couple of days left to deadline, they are raised to 'high-priority'.
The displayed 'remaining' time estimates again do not seem to tally with the few % of 'progress' (taking elapsed/remaining on pro-rata basis) and are faltering (e.g. slowly increased by a minute or so in the last hour).
9) Questions and Answers : Wish list : I would like to see:: (Message 89769)
Posted 24 Oct 2018 by Melvin
Post:
Thanks for those suggestions Sid. I have made those changes on all my machines.
I think I'd misunderstood a zero to mean 'no restriction' as it does in some other preferences.
10) Questions and Answers : Wish list : I would like to see:: (Message 89758)
Posted 24 Oct 2018 by Melvin
Post:
Thanks for looking at this. Other machines have occasionally run high priority for R@H and sometimes also shown very high elapsed compute times.
This PC does so a bit more often, but not sure if it is the settings, hardware compatibility or just the luck of the work-unit draw?
All are set to similar preferences, apart from CPU usage on this one is backed down a little as it has greater tendency to overheat, though if it was running
exceedingly slow I'd expect the elapsed compute time to be low, and all projects to be adversely affected.
What would seem more wasteful is if this behaviour is repeated many times if units are simply re-allocated until finally landing on a much faster machine
that can complete within the deadline period (assuming the compute is deterministic).

I didn't abort the tasks because when I next looked a couple of days ago they had already all been replaced by new ones.
Based on earlier elapsed/remaining proportions those earlier tasks would still all have been crunching now.
Screen-shot of the new set of tasks now added to the earlier link, you will see all are already overdue again.
I had wondered if this was unlike the initial wild time-estimates that soon converge to a mostly accurate one, such as when downloading a large file,
and that maybe the often up-counting remaining compute time may converge/finish abruptly, as I've not watched closely enough to know,
or whether there is a terminate-mechanism already built into the software (and if not, would it be very difficult to program in)?

Looking at the log I now see these lengthy units were indeed terminated rather than ending with a natural/useful answer. Log extract http://frintonet.dlinkddns.com/boinclog.jpg
Not sure however if a result becomes "no longer usable" based on a simple time-beyond-deadline for reallocation, or (preferably) actioned by a stability-assessment algorithm?
Software based on general knowledge of result characteristics should be able to make a more informed decision than the average user, some of whom may not even monitor this at all.
I'd also like to think that a result shown to be 'awkward to compute' (especially if confirmed so by others) has some value to help decide a longer allowance if it is to be reallocated.

Melvin
11) Questions and Answers : Wish list : I would like to see:: (Message 89733)
Posted 18 Oct 2018 by Melvin
Post:
Insert of table failed, see http://frintonet.dlinkddns.com/boinc.jpg.
12) Questions and Answers : Wish list : I would like to see:: (Message 89727)
Posted 18 Oct 2018 by Melvin
Post:
Even with other projects paused to focus the scheduler on R@H tasks I often notice R@H deadlines are reached and passed, even with work units running 24/7.
I realise the total compute time required may be hard to judge until the work unit is well under-way, though that seems a good reason to set deadlines further out.
I often see huge elapsed times compared to other projects. Work buffer set to 1 day min. 2 day max. so as not to get too big a work unit queue, yet there are 32 R@H units queued, with deadlines up to 25/10 (and a similar number of non-R@H units), which seems to imply at least 4 units/core/day is expected, but some units take way longer than that.
I don't usually monitor tasks closely because I tend to assume/hope the software can manage itself - but I do wonder to what depth it does this.
A low progress when approaching a deadline may cause user concern.
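
As a rough check of the "4 units/core/day" figure above, here is a minimal Python sketch. It assumes the 32 queued R@H units are meant to be cleared within the 2-day maximum buffer on the 4 cores of the quad-core CPU mentioned in this post. The function name is hypothetical and this is only back-of-envelope arithmetic, not how the BOINC scheduler actually decides what to fetch.

```python
def implied_units_per_core_per_day(queued_units, cores, buffer_days):
    """Units each core would need to finish per day to clear the queue
    within the work-buffer window (simple pro-rata assumption)."""
    if cores <= 0 or buffer_days <= 0:
        raise ValueError("cores and buffer_days must be positive")
    return queued_units / (cores * buffer_days)


# Figures from the post: 32 queued units, quad-core CPU, 2-day max buffer.
rate = implied_units_per_core_per_day(queued_units=32, cores=4, buffer_days=2)
hours_per_unit = 24 / rate  # run time per unit that this rate would allow

print(f"implied rate: {rate:.1f} units/core/day")
print(f"allowed run time per unit: {hours_per_unit:.1f} hours")
```

On those assumptions each unit would have to finish in about 6 hours, which is why units with hundreds of elapsed hours look so far out of line with the queue size.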

Firstly, there may be the temptation to abort rather than use more compute time for uncertain/no credits.
It would seem wasteful if all users running the same work unit came to a similar conclusion to abort when it could be just short of a useful result.
Is 'data-so-far' then just lost, with no part-result communicated back to base to help decide if the unfinished work should be re-allocated or not?
(with risk of events repeating unless a longer deadline is set). Does/can the software, at some point in the calculations, determine if it will definitely converge to a useful result, even if that would take it beyond the original deadline? What about a dynamically re-defined deadline to recognise this situation? Or, can the software decide on an early completion to retrieve a part-result for re-allocation to a continuing work-unit? If results are diverging beyond hope of finishing in reasonable time (whatever that is?) does the software have the means to terminate itself at that point?

Secondly, perceived completion estimates can seem an order of magnitude longer compared to projects that set longer deadlines (even with shorter compute times).
At the moment I see one R@H work-unit due 7/10 showing less than 3% progress in the below table with an elapsed time of 366 hours ( >15 days) that could seem to imply about 500 days left to finish! Yet the estimated remaining time left shows less than 10 hours - though very often this figure clocks up instead of down as time elapses.
Other work-units on the same PC: due 13/10 shows 166 elapsed hours as 32% progress, due 16/10 shows 37 elapsed hours as 20% progress, and one due yesterday with 30 elapsed hours as <1%. (activity setting 'Run Always' on Linux Mint 17.3 using AMD FX4100 quad-core CPU @1.4GHz, typical memory use 2/3 of 12GB free, no swap, other projects paused).
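
The "about 500 days" figure above comes from taking elapsed time and progress on a pro-rata basis. A minimal Python sketch of that arithmetic follows, assuming progress is linear in elapsed time (which the client's own 'remaining' estimate evidently does not assume). The function name is hypothetical, and the 3% value is an upper bound because the display showed "less than 3%", so the true pro-rata figure would be even longer.

```python
def pro_rata_remaining_hours(elapsed_hours, progress_fraction):
    """Remaining hours if progress were strictly proportional to elapsed time."""
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    total_hours = elapsed_hours / progress_fraction
    return total_hours - elapsed_hours


# (deadline label, elapsed hours, progress) as quoted in the post.
work_units = [
    ("due 7/10", 366, 0.03),
    ("due 13/10", 166, 0.32),
    ("due 16/10", 37, 0.20),
]

for label, elapsed, progress in work_units:
    remaining = pro_rata_remaining_hours(elapsed, progress)
    print(f"{label}: about {remaining:.0f} h ({remaining / 24:.1f} days) left pro-rata")
```

For the first unit this gives a little under 500 days, against the displayed estimate of less than 10 hours.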

Although compute time seems inherently unpredictable 'I would like to see' some means to a) assure the user their overdue results may still be useful and credited,
and/or b) the software can make an abort/terminate decision when it is sensible to do so and credit user up to that point.



13) Message boards : Number crunching : 3.43 is causing pop-ups (Message 74415)
Posted 15 Nov 2012 by Melvin
Post:
In case it helps the findings, I've checked what I have currently running and found the following varied results on 3 PCs:-

1) 64bit AMD 4-core running up to date Windows7.
Had to abort three 3.43 jobs after noticing and minimising the screensaver pop-ups (always reappeared after closing these windows)
A day's worth of CPU gone, all with 0.000% progress on them, after 11:37, 10:14 & 0:59 elapsed hrs.

2) 64bit AMD single-core PC (dual booted)
a) running up to date Windows XP Home SP3 (32bit).
The 3.43 job currently running appears to be normal (no pop-ups) so far, at 68.79% progress, 7:21 elapsed hrs.
b) running Linux Mint 12 (64bit), KDE desktop.
One 3.41 job running normal at 45% progress 1:58 elapsed hrs.

3) 64bit AMD 4-core running Linux Mint 11 (64bit), Gnome desktop.
Two 3.45 jobs appear to be running normal at 52% & 60% progress, 1:18 & 1:39 elapsed hrs.
+ One 3.43 job ready to report at 100% after 3:43 hrs elapsed.







©2022 University of Washington
https://www.bakerlab.org