Posts by adrianxw

21) Message boards : Number crunching : System Idle Process. (Message 76559)
Posted 27 Mar 2014 by adrianxw
Post:
I was watching the Windows Task Manager page and was surprised to see the Windows Idle Process "running". It disappeared when I suspended the Rosetta job, which was showing as "running" in BOINC Manager, times advancing normally, but only using 2-3% CPU. I am used to seeing non-BOINC jobs using time while a project is using less, sqlserver etc., and understand that, but the Idle Process?
22) Message boards : Number crunching : Long-running and failing rb_06_21_* work units (Message 76347)
Posted 14 Jan 2014 by adrianxw
Post:
Okay, no responses, so time for experiments. I removed the suspended status and suspended enough other projects that the job would start again; my intention was to see what happened on suspend then reactivate. It started again, but the elapsed time dropped from the previous high to 3:20:17, and the % complete to 30.495%. I'll leave it but watch it more carefully.
23) Message boards : Number crunching : Long-running and failing rb_06_21_* work units (Message 76346)
Posted 13 Jan 2014 by adrianxw
Post:
I have this wu, an rb_01_10... etc wu, on one of my systems. So far it has run for 50:09:53 and is showing 41.811% complete. The remaining time is showing "---". I'm fairly sure it is not actually doing anything, as I see the Windows idle process "using" 25% of my quad core. I've suspended it pending advice; the deadline is a week away.
24) Message boards : Number crunching : Very long run time. (Message 75806)
Posted 25 Jun 2013 by adrianxw
Post:
>>> If you think you're getting "screwed" and that it's "pathetic", then it's probably time to move on, bro

The fact remains, however, that a project that is causing "issues" for the cruncher pool WILL cause people to move on. There are a lot of good projects out there now, and if DB wants to, at least, stay where he is in terms of numbers of participants, something needs to be done.
25) Message boards : Number crunching : Very long run time. (Message 75759)
Posted 13 Jun 2013 by adrianxw
Post:
That one finished fine after restarting it. Grossly low credit though, 31.90 for 36,259.55 seconds. There IS something wrong here.
26) Message boards : Number crunching : Very long run time. (Message 75758)
Posted 13 Jun 2013 by adrianxw
Post:
This one also twitched the now-sensitized wu radar. It seemed stuck at 97.715%, with elapsed time going up and time to completion not moving. I stopped and restarted it, and it seemed to start progressing again. I'll let it and the other wu I've got (unstarted) run, then I think I'll take a "vacation" from Rosetta.

The completed wu's disappear from the list too fast to do any comparison (I'm thinking wu type etc.).

I have always regarded Rosetta as a steady, safe project, but it is bordering on the dubious area right now.
27) Message boards : Number crunching : Very long run time. (Message 75753)
Posted 12 Jun 2013 by adrianxw
Post:
I allowed new tasks and have not seen this again. I am, however, removing Rosetta from unattended systems.
28) Message boards : Number crunching : extremely low crediting (Message 75709)
Posted 5 Jun 2013 by adrianxw
Post:
It is flagged as Success. I have not updated BOINC because I do not like some of the "features" in the more recent versions. The BOINC version should not affect the applications anyway.
29) Message boards : Number crunching : Very long run time. (Message 75701)
Posted 5 Jun 2013 by adrianxw
Post:
The wu has run to completion and reported a success. That, of course, does not alter the fact that the problem occurred.
30) Message boards : Number crunching : Very long run time. (Message 75699)
Posted 4 Jun 2013 by adrianxw
Post:
I hadn't but will enable it and watch.

<edit>
And I saw that the elapsed time reduced to 03:28:23, the time to completion reverted to a normal looking 03:16:28.
</edit>
31) Message boards : Number crunching : Very long run time. (Message 75696)
Posted 4 Jun 2013 by adrianxw
Post:
I have this wu here at the moment. Its remaining time is "---", like a finished wu, but the elapsed time is 40:09:09 and increasing. It purports to be 54.248% completed.

This is the second issue I've reported to Rosetta today. I've suspended that wu and set No New Tasks pending replies.
32) Message boards : Number crunching : extremely low crediting (Message 75695)
Posted 4 Jun 2013 by adrianxw
Post:
There does seem to be an issue here.

584971300 531234860 2 Jun 2013 1:29:13 UTC 2 Jun 2013 19:02:33 UTC Over Success Done 20,919.98 91.04 139.64
584723878 531007272 31 May 2013 19:45:22 UTC 1 Jun 2013 12:56:12 UTC Over Success Done 21,219.56 92.34 143.49
584327948 530642544 29 May 2013 18:01:53 UTC 30 May 2013 5:29:08 UTC Over Success Done 36,574.50 159.16 20.00
583998940 530344060 28 May 2013 0:34:57 UTC 29 May 2013 8:39:23 UTC Over Success Done 21,201.47 92.26 129.02

The highlighted wu (the third row, task 584327948) stands out. Almost twice as long to complete, flagged as a success, and a miserly 20 credit.
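
A rough way to check the numbers above is to work out credit per hour for each row. This is only a sketch: it assumes the first number after "Done" is the run time in seconds and the last number is the granted credit, which matches the "miserly 20 credit" remark. The half-of-median cut-off is an arbitrary choice for illustration.

```python
from statistics import median

# (task id, run time in seconds, granted credit), copied from the rows above.
# Assumption: the last column of each row is the granted credit.
tasks = [
    (584971300, 20919.98, 139.64),
    (584723878, 21219.56, 143.49),
    (584327948, 36574.50, 20.00),
    (583998940, 21201.47, 129.02),
]


def credit_per_hour(seconds, credit):
    """Granted credit normalised to one hour of run time."""
    return credit / seconds * 3600


rates = {task_id: credit_per_hour(secs, credit) for task_id, secs, credit in tasks}
typical = median(rates.values())

# Flag anything earning less than half the typical rate (arbitrary cut-off).
suspicious = [task_id for task_id, rate in rates.items() if rate < 0.5 * typical]

for task_id, rate in rates.items():
    flag = "  <-- low" if task_id in suspicious else ""
    print(f"{task_id}: {rate:6.2f} credit/hour{flag}")
```

With these four rows only task 584327948 is flagged: roughly 2 credit per hour, against about 22 to 24 for the others.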
33) Message boards : Number crunching : max # of error/total/success tasks 1, 2, 1 (Message 74893)
Posted 15 Jan 2013 by adrianxw
Post:
When I download a wu, I see...

>>> max # of error/total/success tasks 1, 2, 1

... and what I have always assumed is that it means if the counts there are reached, the wu won't be sent out again.

When this one failed on one of my machines, I looked and found that, yes, it was registered as a failure, (it crashed out after 8 seconds), but that it had been sent to another cruncher.
34) Message boards : Number crunching : Long running but "finished" yet not. (Message 74693)
Posted 9 Dec 2012 by adrianxw
Post:
I'd suspended it, so reactivated it, and immediately it finished.

<fx>Shrugs</fx>
35) Message boards : Number crunching : Long running but "finished" yet not. (Message 74683)
Posted 8 Dec 2012 by adrianxw
Post:
I have my run time preference set to 6 hours on this machine. This one raised my eyebrow for two reasons. First, it has been running for almost 14 hours, and shows 86.286% complete. That, by itself, wouldn't cause me any concern - I have had the odd unit take this long despite the preference (which I regard simply as a "best endeavour" type figure). Second, though, the time remaining shows "---" as a complete wu does. I've suspended it for now. Comments welcome.
36) Message boards : Number crunching : Long run and failure. (Message 73722)
Posted 31 Aug 2012 by adrianxw
Post:
This morning, I noticed this wu was acting a little strangely. First, it has been running for more than 24 hours now. I've been watching for a short while and it doesn't seem to be advancing, i.e. the % done is not moving. Also, when I try to fire up the graphics from that wu, it doesn't run: I get the initial black rectangular window, but nothing more. When I try to stop the graphics, I get Windows showing the "not running, abort or retry" type message.

Digging a little deeper, I found another wu, this one, with an extended run time and an eventual error.

I have suspended the wu, pending comments. This is my usual machine that I use almost all day (i.e. not a dedicated BOINC cruncher), and I have not had any other BOINC problems or issues with anything else. Fully patched, up to date Win XP machine.
37) Message boards : Number crunching : Work unit errors. (Message 69464)
Posted 21 Jan 2011 by adrianxw
Post:
The machine that wu was on (Intel Core2 Quad) rarely runs anything other than BOINC, certainly nothing else recently (months). The projects allocated to it are Climate Prediction, Docking, Einstein, Leiden, Malaria Control, POEM, Rosetta and SIMAP. That project portfolio has not changed for some time (years).

I fiddle around with the quotas from time to time, but not often, and not much. I tend to think of that machine as "un-attended", even though it is in the same room as me here (I have to move the screen from this system to that to look at it). Of course, the projects change their apps from time to time; POEM has been going through the motions with POEM++ recently, for example. Rosetta had a 20% share of it.

I HAVE seen tasks in the Waiting for Memory state on there recently, which may be relevant. It has 2GB in it. It will get the memory from this machine (also 2GB) when this one is upgraded later this year.

Another thing is the graphics card: it is CUDA compatible and Einstein uses that now, I think. I doubt this is relevant, but hey, it is a difference.

[edit] Forgot SIMAP on the tasks [/edit]
38) Message boards : Number crunching : Work unit errors. (Message 69443)
Posted 20 Jan 2011 by adrianxw
Post:
I suspended the task a while back. I was going to abort it this morning, but figured it had been off for a while, I'd "just give it another chance". So it then started running, and ran on to an apparently normal completion.

If the watchdog is supposed to have done stuff, well, it didn't. On both of the events I report in this thread, it apparently took manual action to get things going. I have seen problems on two different machines now, on different wu's.

I cannot risk running Rosetta anymore on machines that I am not looking at several times a day.

There is a problem here.
39) Message boards : Number crunching : Work unit errors. (Message 69407)
Posted 18 Jan 2011 by adrianxw
Post:
This machine is in the "Home" group, the machine I refer to above is in the "Work" group. As far as I know, changes to settings in one should not affect settings in others. This being the case, it has, and has always had, 6 hours set.

The 17 hours (now 21 hours and still 42.151% done) is elapsed time. The CPU time in the Properties tab does not appear to be advancing. There are four processes shown as "Running" in BOINC Manager; if the task is showing as running but actually is not, then the process is wasting 25% of that machine. I have suspended it.

The graphics on these machines normally start within a few seconds of issuing the request. I just went back to that machine, started the graphics and waited. After two minutes, I had a plain black window, and the window title had "Not Responding" in it.
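
The symptom described in this post (elapsed time climbing while CPU time and % done stand still) can be turned into a simple check. The sketch below is hypothetical: the `Sample` record, the `looks_stalled` function and the thresholds are illustrative names and values, not part of BOINC, and the CPU-time figures are made up because the post does not give them.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    elapsed_s: float  # wall-clock run time shown by the manager
    cpu_s: float      # CPU time from the task properties
    percent: float    # % done


def looks_stalled(first, last, min_elapsed_s=3600, min_cpu_s=60):
    """True if wall-clock time moved on but CPU time and progress did not."""
    waited = last.elapsed_s - first.elapsed_s
    worked = last.cpu_s - first.cpu_s
    return (
        waited >= min_elapsed_s
        and worked < min_cpu_s
        and last.percent == first.percent
    )


# 17 h -> 21 h elapsed with a static 42.151%, as in the post.
# The CPU times are invented for the example.
before = Sample(elapsed_s=17 * 3600, cpu_s=25_000, percent=42.151)
after = Sample(elapsed_s=21 * 3600, cpu_s=25_000, percent=42.151)

stalled = looks_stalled(before, after)
cores = 4
wasted_share = (1 / cores) if stalled else 0.0  # one idle slot out of four
print(stalled, f"{wasted_share:.0%}")
```

One stalled slot out of four running tasks is where the 25% figure comes from.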
40) Message boards : Number crunching : Work unit errors. (Message 69404)
Posted 18 Jan 2011 by adrianxw
Post:
Lo and behold, another one. This one. It shows 17:34:10 running and a static 42.151% complete, with 19:27:43 to go and rising. Similar, but not the same machine.

After the previous wu, I tried to start the graphics, it opens the window, but remains black.

I'd set the desired run time to 1 day earlier, but set it back to the normal 6 hours. The machine with the current "long runner" was not in the same group as this one, so should never have seen the change. I would have expected the timeout to have stopped it by now?

There is a problem here. I'm setting No New Tasks across my systems.





©2017 University of Washington
http://www.bakerlab.org