Posts by Mike.Gibson

1) Message boards : Number crunching : Problems with web site (Message 69113)
Posted 10 Jan 2011 by Mike.Gibson
Post:
I downloaded a WU on 8 Jan which is working. My problem is uploading the previous WU. Why are we having these problems when the Server Statuses are showing OK?

Mike
2) Message boards : Number crunching : Problems with web site (Message 67901)
Posted 1 Oct 2010 by Mike.Gibson
Post:
Still no upload and all servers seem to be operational.

Mike
3) Message boards : Number crunching : minirosetta 2.14 (Message 67275)
Posted 18 Aug 2010 by Mike.Gibson
Post:
I have preferences set to 24 hours crunching but 327442297 has now been crunching for 38 hours and is stuck at 60.034% although still clocking time up and the time to go is rising in proportion.

Should this be terminated? Is there any way it can be made to terminate without losing the results?

Mike


The properties of that task will show you the CPU time used. Check it, jot down the time, then a minute later check it again. Did it use any CPU time during that minute? I am doubtful the task is getting any CPU time. If it were, the watchdog would have already caught the problem and reported the task result. So, I'm guessing you may have to suspend and then resume the task to get it back to using CPU time. It will probably then run through its originally expected 24-hour runtime (i.e. another 8-10 hours).
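The check described above can be sketched as follows. This is a minimal illustration, assuming the two CPU time readings are copied by hand from the task properties in "H:MM:SS" form; the function names are hypothetical and are not part of BOINC.

```python
def parse_hms(text):
    """Convert an 'H:MM:SS' reading to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_seconds_used(first_reading, second_reading):
    """CPU seconds accumulated between two readings of the same task."""
    return parse_hms(second_reading) - parse_hms(first_reading)


# Hypothetical readings taken about a minute apart.
delta = cpu_seconds_used("38:02:10", "38:02:10")
if delta > 0:
    print("The task is getting CPU time.")
else:
    print("No CPU time used; try suspending and resuming the task.")
```

A difference of zero after a minute of wall-clock time suggests the task is stalled rather than slow.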


Many thanks. After suspending and resuming several times it finally restarted using CPU time, although it had been hogging one core for the last day without using CPU time.

Mike
4) Message boards : Number crunching : minirosetta 2.14 (Message 67271)
Posted 18 Aug 2010 by Mike.Gibson
Post:
I have preferences set to 24 hours crunching but 327442297 has now been crunching for 38 hours and is stuck at 60.034% although still clocking time up and the time to go is rising in proportion.

Should this be terminated? Is there any way it can be made to terminate without losing the results?

Mike
5) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 57296)
Posted 28 Nov 2008 by Mike.Gibson
Post:
I am using a dual-core 3800+ with Vista Premium and Boinc 6.2.19.

If I have a mini 1.40 & Beta 5.98 running and suspend the project, both tasks are shown as suspended by user. However, the mini 1.40 keeps on running, albeit slowly.

Two other tasks start to run, one at normal speed and the other slowly.

Obviously, one of the new tasks is running on its own in one core and the other new task is sharing the second core with mini 1.40.

I have never seen core sharing before. Is this OK, or is this a problem? None of my other projects show any signs of this phenomenon.

Any ideas?



There have been a few problems that I and others have experienced with 1.4 tasks not suspending, mostly in the loopbuild tasks. I have found that you have to just exit BOINC and restart it. You may also have to reboot your system, but that is probably a last-ditch step. After one or both of these steps, BOINC Manager will act properly again.



I have tried all sorts of combinations including reboots but it recurs next time. It seems to happen with either suspending project or suspending task. However, suspending both can clear the problem until the next time.
6) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 57295)
Posted 28 Nov 2008 by Mike.Gibson
Post:
Check your processes running in task manager by pressing control, alt, delete. Do you show more than the normal number of tasks running?


Already checked - all 3 registered at variable amounts around 44%, 22% & 22%.
7) Message boards : Number crunching : Minirosetta v1.40 bug thread (Message 57291)
Posted 27 Nov 2008 by Mike.Gibson
Post:
I am using a dual-core 3800+ with Vista Premium and Boinc 6.2.19.

If I have a mini 1.40 & Beta 5.98 running and suspend the project, both tasks are shown as suspended by user. However, the mini 1.40 keeps on running, albeit slowly.

Two other tasks start to run, one at normal speed and the other slowly.

Obviously, one of the new tasks is running on its own in one core and the other new task is sharing the second core with mini 1.40.

I have never seen core sharing before. Is this OK, or is this a problem? None of my other projects show any signs of this phenomenon.

Any ideas?
8) Message boards : Number crunching : Minirosetta v1.28 bug thread (Message 53883)
Posted 20 Jun 2008 by Mike.Gibson
Post:
Mini 1.28 computation finished on WU 155747918 when the time to completion was still over 40 minutes. The job was reported automatically but a validation error was recorded. Are these factors linked? It seems to have been a waste of 19.3 hours. Can anything be done to recover the WU & points?
9) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50812)
Posted 19 Jan 2008 by Mike.Gibson
Post:
As a follow up to message 50796 etc, when my 1c26 unit approached the 10-hour preferred runtime, I increased the runtime to the maximum of 24 hours. As soon as it took effect, the progress % fell to 38. (i.e. CPU time /24)

Another 7 hours have gone by and the progress % is still based on CPU time/24.

Another consequence of increasing the runtime was that BOINC Manager woke up to the fact that I had 6 Rosetta units that were liable to miss their deadline and consequently commandeered both cores of my 3800+ dual-core machine for Rosetta at the expense of everything else. This brought a second Rosetta into play, an s099 unit, which now seems to be going along the same lines with 7 hours CPU time and 29% progress.
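The figures above are consistent with progress being shown as CPU time divided by the preferred runtime. A quick check, assuming that simple linear formula (it is inferred from the observations in this post, not taken from the Rosetta source; the function name is hypothetical):

```python
def progress_percent(cpu_hours, preferred_runtime_hours):
    """Progress as a plain ratio of CPU time to preferred runtime."""
    return 100.0 * cpu_hours / preferred_runtime_hours


# The s099 unit: 7 hours of CPU time against a 24-hour preference.
print(round(progress_percent(7, 24)))

# A drop to 38% at a 24-hour preference corresponds to roughly
# 0.38 * 24 = 9.12 hours of CPU time, i.e. just short of 10 hours.
print(round(progress_percent(9.12, 24)))
```

Under this assumption the 7-hour unit works out to about 29%, matching what BOINC Manager displayed.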

Heaven help anyone with a PIII machine! They will never finish. Even I am wondering how many, if any, of my units will finish before the deadline of 23/1/08. I am not expecting them to finish within the 24 hours.

Does anyone know how long these will take, please?

Regards

Mike
10) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50796)
Posted 18 Jan 2008 by Mike.Gibson
Post:
I see where you are coming from, but, if you take the 2 hours a day machine as an example, it will start the unit thinking it will finish within the deadline but when the 3 hours is up, a couple of days later, it then sticks on the 3 hours and no progress seems to be happening and the time will be wasted when the unit is eventually aborted or the deadline passes. It is far better for the true time to appear and then the unit can be aborted before it starts if the deadline cannot be met. That way another shorter unit can be run in its place, successfully.

Cheers

Mike

Mike, if everyone had the same time preference, and if all tasks had roughly the same time per model, what you say would certainly be done. But neither is the case. Some people want shorter times (and, yes, it would be nice if they never received a task that took longer than that, but it's not a perfect world). The mixture of work varies over time. The ratio of long to short model tasks varies. ...and you are correct, this can (and does) throw off the estimates and confuse BOINC about how much work to get.

The best way to get a fairly consistent and predictable completion time is to go to the 24hr maximum runtime preference. But, if your machine is only on 2 hours a day, it would take you more than 10 days to complete a task and it would never get returned before the deadline. ...there's always something. But if BOINC is running 24hrs a day anyway, then this will offer the most predictability for human, and BOINC.
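The arithmetic behind the 2-hours-a-day example can be sketched like this. The 10-day deadline used below is an assumption for illustration (the post only says the task would take more than 10 days), and the function name is hypothetical.

```python
def days_to_complete(runtime_hours, hours_on_per_day):
    """Calendar days needed to accumulate the requested CPU hours."""
    return runtime_hours / hours_on_per_day


ASSUMED_DEADLINE_DAYS = 10  # hypothetical deadline, for illustration only

for hours_on in (2, 24):
    days = days_to_complete(24, hours_on)
    verdict = "meets" if days <= ASSUMED_DEADLINE_DAYS else "misses"
    print(f"{hours_on} h/day: {days:.0f} days, {verdict} the assumed deadline")
```

A machine that is on 2 hours a day needs 24 / 2 = 12 days for a 24-hour task, while one running around the clock needs a single day.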

11) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50790)
Posted 18 Jan 2008 by Mike.Gibson
Post:
Thanks for this explanation. I had been dumping "stuck" 5.90s and was about to dump a "stuck" 5.93. As a result of your explanation, repeated below with the original question, I set a time of 10 hours in place of the default and lo & behold, after a while, the time to go shot up from 10 minutes to 5 hours meaning a total time of over 8 hours on a 3800+ dual-core with 1MB RAM! Also the progress dropped from 95% to about 35%. It is now going well.

Would it not be better to put out a message about the possible time increase and also to change the default from 3 hours to something more realistic? Presumably, this is only a few minutes work to do and it would solve all these problems.

Apart from anything else, BOINC Manager needs to know how long these units can take in order to assess what units to obtain and also for assessing priorities. If something is going to take 3 times the expected time, it could cause other units/projects to default on time limits.

Regards

Mike
Version 5.93 reached 96% plus on a WU showing 10 mins to go. An hour later, 97% and 10 mins to go. Stopped BOINC and restarted; the WU came up at 97% and, when computation restarted, reset to zero with 6 hours 20 remaining! Sigh. Any ideas?


Ideas? Yes, don't stop BOINC. Seriously.

The fact that your % complete reset to zero implies that no checkpoint was reached during the calculations. Some types of work are able to checkpoint very frequently, some are not.

The time to completion is an estimate, and not always a very accurate estimate. Some of the work they are sending out can take 5 or 6 hours to complete a single model (longer on a slower machine). This is especially true for the 1zpy's. If your preferred runtime is less than this, you will see an estimated time to completion of something under 10 minutes for any time over your preference. So if your preference is the default 3hrs for example, it will show 10min to complete, with exponentially small reductions in that time for the last 2 or 3 hours of the model.

12) Message boards : Number crunching : Problems with Rosetta version 5.93 (Message 50440)
Posted 7 Jan 2008 by Mike.Gibson
Post:
Hi, folks

I thought that the same problem I was having on 5.90 was recurring. That is, it was adding time but not finishing from 10 minutes to go. But no, this time it was just the countdown that stuck. After 10 minutes stuck on 10 minutes to go it suddenly finished.

Regards

Mike
13) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 50227)
Posted 31 Dec 2007 by Mike.Gibson
Post:
Like Paul, I have had no problems for the last 2 days. The current series of WUs (Structural Genomics Target) seem fine.

Cheers

Mike
14) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 50049)
Posted 26 Dec 2007 by Mike.Gibson
Post:
Hi, folks

I have just had my second WU stick at 10 minutes to go out of 3 hours expected using Beta 5.90.

Others have finished OK.

I have suspended both so that other WUs will run.

The only similarity that I can see is that they were both listed as Running, high priority and the successful ones were not.

The report deadline is 31/12, so I was going to try them again when the other WUs were finished.

Any other suggestions, please?

Mike
15) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 49468)
Posted 7 Dec 2007 by Mike.Gibson
Post:
Have you given up on 5.85? I've just been supplied with a 5.82 unit after having no problems with 5.85!!!
16) Message boards : Number crunching : Rosetta not downloading new work (Message 49299)
Posted 1 Dec 2007 by Mike.Gibson
Post:
...with your major project (50%) not giving new work, BOINC will have a challenge scheduling work. If LHC should come online tomorrow, BOINC would prefer (based on the debts) to get work for LHC (my guess), and so it's still holding out for that to happen. But, as time goes by and that does not happen, it will be adding to Rosetta debt. So it all balances out.

When you run 5 projects like that, it's normal to not always have work for all of them at the same time. From the sounds of it, once WCG gets work completed, BOINC will request work from Rosetta. Try to resist the temptation to try and force BOINC to do something it doesn't want to do. It will likely only confuse things further.


Thanks for the reply. Everything had been working well, with plenty of pending units on all schemes, until Rosetta went down. It obviously took BOINC a long time to re-schedule but they have finally supplied 5 Rosetta units at one time. However nothing else is now being supplied. I presume that they have now realised that Rosetta needs to catch up. Hopefully, when that has happened, everything will return to normal.
17) Message boards : Number crunching : Rosetta not downloading new work (Message 49285)
Posted 1 Dec 2007 by Mike.Gibson
Post:
Mike, it's possible your other projects got enough work that BOINC doesn't think you will have any time to be running Rosetta within your "connect to network every... days" setting.

Please review this thread. It has a number of things to check on your end.


Thanks, Mod.Sense, I have checked the thread and that seems to be the problem.

01/12/2007 18:07:24|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
01/12/2007 18:07:29|rosetta@home|Scheduler request succeeded: got 0 new tasks
01/12/2007 18:12:54|lhcathome|Sending scheduler request: To fetch work. Requesting 404171 seconds of work, reporting 0 completed tasks
01/12/2007 18:12:59|lhcathome|Scheduler request succeeded: got 0 new tasks
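Message log lines like these can be picked apart with a few lines of Python. This is a minimal sketch that assumes the `date time|project|message` layout and the day-first date format seen in the lines above; the function name and regular expression are illustrative and not part of BOINC.

```python
import re
from datetime import datetime

LOG_LINES = [
    "01/12/2007 18:07:24|rosetta@home|Sending scheduler request: Requested by user. "
    "Requesting 0 seconds of work, reporting 0 completed tasks",
    "01/12/2007 18:12:54|lhcathome|Sending scheduler request: To fetch work. "
    "Requesting 404171 seconds of work, reporting 0 completed tasks",
]

REQUEST_RE = re.compile(r"Requesting (\d+) seconds of work")


def parse_line(line):
    """Split one log line into (timestamp, project, seconds requested)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    match = REQUEST_RE.search(message)
    requested = int(match.group(1)) if match else None
    return when, project, requested


for line in LOG_LINES:
    when, project, requested = parse_line(line)
    print(f"{when:%H:%M:%S} {project}: requested {requested} seconds of work")
```

The telling detail is the request size: the Rosetta request asked for 0 seconds of work, so getting 0 new tasks back is what one would expect.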

However, I have not crunched any Rosetta units for 3 days. One was reported as the website reopened but that had finished and uploaded some time before.

Seti & WCG owe time to rosetta and they are still crunching. Even more time is owed to LHC but that appears to be closed at the moment.

I have 2 cores and a 50% allocation to LHC which is not being taken up. The 25% allocation to CPDN is therefore boosted to 50% and is running continuously on one core. The other core is being shared between WCG (nominal 12.5% share) and Seti (nominal 6.25% share) with Rosetta (nominal 6.25% share) getting nothing.
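The boost from 25% to 50% follows from rescaling the nominal shares over the projects that are still available. A rough sketch, assuming plain proportional rescaling (BOINC's real scheduler also weighs debts and deadlines, which this ignores; the names below are hypothetical):

```python
NOMINAL_SHARES = {
    "LHC": 50.0,
    "CPDN": 25.0,
    "WCG": 12.5,
    "Seti": 6.25,
    "Rosetta": 6.25,
}


def effective_shares(shares, available):
    """Rescale nominal shares so the available projects sum to 100%."""
    total = sum(shares[name] for name in available)
    return {name: 100.0 * shares[name] / total for name in available}


# LHC has no work, so its 50% is spread over the remaining projects.
without_lhc = effective_shares(NOMINAL_SHARES, ["CPDN", "WCG", "Seti", "Rosetta"])
for name, percent in without_lhc.items():
    print(f"{name}: {percent}% of the machine, {2 * percent / 100.0} of 2 cores")
```

On that basis CPDN's share doubles to 50%, which is one full core of a dual-core machine, and Rosetta would be due 12.5% if it had any work to run.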

I have 15 units in hand totalling 75 hours and have 5 days set. I normally have about 30-40 units in hand. These figures exclude CPDN. The earliest report deadline is 5 Dec for WCG and 18 Jan for Seti.

There does not seem to be any reason that BOINC should be excluding Rosetta.

18) Message boards : Number crunching : Rosetta not downloading new work (Message 49280)
Posted 1 Dec 2007 by Mike.Gibson
Post:
Hi, folks.

I have not been supplied with any new work since the website came back on-line. I was already out of work so Rosetta is losing time to other projects.

Anyone know what is happening, please?

Happy crunching!
19) Message boards : Number crunching : Whats up with Ralph website? (Message 49226)
Posted 29 Nov 2007 by Mike.Gibson
Post:
Anyone know why Ralph has been down all day?


Hmmm, the system is down. I'm not on campus today - I'll see if I can get someone there to take a quick look...

-KEL



Are things still down? I have been getting the following message since yesterday and have run out of units to crunch.

29/11/2007 16:17:23|rosetta@home|Message from server: Project encountered internal error: shared memory





