RAC dropping, BOINC dropping comms

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 32272 - Posted: 8 Dec 2006, 14:02:37 UTC
Last modified: 8 Dec 2006, 14:08:47 UTC

I'm beginning to wonder if anyone is even listening to all these bug reports. I've had no response at all to the crash logs I've emailed and posted on the forums. Even posts on the BOINC Message Boards, which are supposed to be for reporting bugs in the software, get fobbed off. And Rosetta staff haven't said a word.

I'm prepared to collect as much information as required - if only someone was actually listening.

All I can suggest is that everyone posts details including crash logs over on the BOINC Message Boards, specifically this thread.

zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 198
Message 32273 - Posted: 8 Dec 2006, 14:10:07 UTC - in response to Message 32272.  

All I can suggest is that everyone posts details including crash logs over on the BOINC Message Boards, specifically this thread.


Will do. I could use some help identifying the crash logs.
Reno, NV
Team: SETI.USA

Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 32274 - Posted: 8 Dec 2006, 14:21:55 UTC

Whenever boinc.exe crashes, it adds a crash log to the end of the stderrdae.txt file. It's not always easy to find where the last entry starts and the previous one ended though ("Exiting..." is usually the end of an entry).

One option would be to move that file somewhere each time you get a crash so that the crash logs are easily separated.

I usually just post the bit that says what address the code was on when it crashed.
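
A minimal sketch of how the last crash entry could be pulled out of the log, assuming (as noted above) that a line containing "Exiting..." usually closes an entry. The function names and the path passed to `last_entry` are illustrative, not part of BOINC, and an entry that lacks the marker is simply kept as a trailing entry.

```python
from pathlib import Path

# Assumption taken from the post: "Exiting..." is usually the end of an entry.
END_MARKER = "Exiting..."


def split_entries(text: str) -> list[str]:
    """Split log text into entries, closing one at each line containing END_MARKER."""
    entries: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        current.append(line)
        if END_MARKER in line:
            entries.append("\n".join(current))
            current = []
    # Keep a trailing entry that never reached the end marker.
    if any(line.strip() for line in current):
        entries.append("\n".join(current))
    return entries


def last_entry(log_path: str) -> str:
    """Return the most recent entry in the log file, or '' if the file is empty."""
    text = Path(log_path).read_text(errors="replace")
    entries = split_entries(text)
    return entries[-1] if entries else ""
```

Calling `last_entry` on your copy of stderrdae.txt (wherever it lives on your machine) would give just the most recent entry to paste into a bug report.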

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 32279 - Posted: 8 Dec 2006, 16:08:24 UTC
Last modified: 8 Dec 2006, 16:12:53 UTC

It also helps if people unhide their computers so others can look at the hardware and scan through the logs that are there.


It is also wise to test out the latest version, 5.7.x, to see if that fixes any problems. The changelog is quite large since 5.4.11: there has been the 5.5.x series of releases (14/15 revisions?), the 5.6.x* series (5 revisions) and the 5.7.x series (5 revisions so far).


* It's hard to track which parts of 5.6 went into 5.7, since they are in different logs, afaik.

Though I know Marky has tried some of the later versions.
Posting in that thread is also a good idea.
Team mauisun.org

zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 198
Message 32296 - Posted: 8 Dec 2006, 21:15:30 UTC - in response to Message 32279.  

It is also wise to test out the latest version, 5.7.x, to see if that fixes any problems. The changelog is quite large since 5.4.11: there has been the 5.5.x series of releases (14/15 revisions?), the 5.6.x* series (5 revisions) and the 5.7.x series (5 revisions so far).


All my Windows machines (which are the machines having the problem) are running 5.7.5. FWIW, my machines aren't hidden.
Reno, NV
Team: SETI.USA

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 32320 - Posted: 9 Dec 2006, 11:04:41 UTC - in response to Message 32296.  

It is also wise to test out the latest version, 5.7.x, to see if that fixes any problems. The changelog is quite large since 5.4.11: there has been the 5.5.x series of releases (14/15 revisions?), the 5.6.x* series (5 revisions) and the 5.7.x series (5 revisions so far).


All my Windows machines (which are the machines having the problem) are running 5.7.5. FWIW, my machines aren't hidden.


zombie, Marky-UK's are ;)


Team mauisun.org

Faust
Joined: 7 Sep 06
Posts: 14
Credit: 49,559
RAC: 0
Message 32537 - Posted: 12 Dec 2006, 19:54:16 UTC
Last modified: 12 Dec 2006, 20:03:13 UTC

Well, since it happens every single night, I came up with some sort of a 'solution'... or rather, a quick glue fix.

I set up a simple macro in automation software that closes and re-opens BOINC every hour. So if you're not at the computer and BOINC disconnects, the macro re-opens it and it reconnects (the thing you would normally do manually). You still lose some work, but that's better than losing a whole night or day.

Unfortunately for me, the machine is shared with other family members, so of course when they started using the computer and moved the mouse, it all got messed up. But it should probably work if you're the sole user.

However, this is easily the most annoying defect I've experienced with BOINC/Rosetta in my short time here, so I will probably look for another project to crunch for instead of wasting so many clock cycles, at least until the issue is resolved. Unless it happens with all BOINC-related projects, which would mean it's a much bigger problem than I first thought.


Faust.
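
For anyone who would rather script this than drive the mouse with a macro, here is a rough sketch of the same idea. It differs from the macro above in one respect: it restarts the client only when a health check fails, rather than blindly every hour, so a healthy client loses no work. Everything named here is an assumption: `check_cmd` and `restart_cmd` are placeholders for whatever commands query and launch the client on your machine, and the one-hour interval just mirrors the macro.

```python
import subprocess
import time
from typing import Callable, Sequence


def check_and_restart(is_alive: Callable[[], bool],
                      restart: Callable[[], object]) -> bool:
    """Run restart() only if the health check fails; return True if it restarted."""
    if is_alive():
        return False
    restart()
    return True


def command_succeeds(cmd: Sequence[str], timeout: float = 30.0) -> bool:
    """True if cmd exits with status 0 within timeout seconds."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Command missing, not runnable, or hung: treat the client as not alive.
        return False


def watch(check_cmd: Sequence[str], restart_cmd: Sequence[str],
          interval_s: float = 3600.0) -> None:
    """Loop forever: check the client, relaunch it if needed, then sleep."""
    while True:
        check_and_restart(
            lambda: command_succeeds(check_cmd),
            lambda: subprocess.Popen(list(restart_cmd)),
        )
        time.sleep(interval_s)
```

Because it never touches the mouse or keyboard, this kind of check should not get in the way of other people using the machine.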

Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 32769 - Posted: 16 Dec 2006, 21:59:59 UTC

I've seen several comments (in various threads) by participants that their RAC is dropping. Does anyone know for certain that the credits currently issued are lower than before? It seems to me that they are; why?

My resource shares were set up 50/50 between Einstein and Rosetta. My RACs were about 226 and 176, respectively. For no good reason I decided to try and establish a balanced RAC of 200/200 so I increased the Rosetta resource share to 60% which is obviously 50% more share than Einstein's now 40%. This arrangement has existed for a few weeks now and my RAC for Rosetta has dropped from 176 to an average of 166 while the Einstein RAC remains above 200. Einstein has experienced outages recently and Rosetta has caused BOINC client to terminate (as described in this thread) during this same time period.
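
The arithmetic behind those numbers can be checked quickly. This is only a back-of-the-envelope sketch that assumes RAC scales linearly with resource share, which ignores outages, failed work units and any change in how credit is granted.

```python
def rescale_rac(rac: float, old_share: float, new_share: float) -> float:
    """Expected RAC after a share change, assuming RAC is proportional to share."""
    return rac * new_share / old_share


# Figures from the post: 50/50 split gave about 226 (Einstein) and 176 (Rosetta).
rosetta_expected = rescale_rac(176, 0.50, 0.60)   # roughly 211
einstein_expected = rescale_rac(226, 0.50, 0.40)  # roughly 181

# 60% versus 40% is a ratio of 1.5, i.e. 50% more share for Rosetta.
share_ratio = 0.60 / 0.40
```

Under that assumption, Rosetta would have been expected to rise to around 211 and Einstein to fall to around 181, which is the opposite of the observed 166 and 200+.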

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 32771 - Posted: 16 Dec 2006, 22:38:16 UTC

If you've had any WU failures (I'm getting them as I try to test the screen saver), they seem to consistently be awarded 20 credits. These are the cases where the watchdog ends the task. So you'd lose the hour the task spent making no progress on the model before the watchdog caught it, plus, depending on how long you normally crunch WUs, you may have had many, many hours into it before it hung.

...that plus BOINC dropping the local host... which, for me, has been happening much less frequently recently.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

zombie67 [MM]
Joined: 11 Feb 06
Posts: 316
Credit: 6,589,590
RAC: 198
Message 32780 - Posted: 17 Dec 2006, 1:26:28 UTC - in response to Message 32769.  

[...]Rosetta has caused BOINC client to terminate (as described in this thread) during this same time period.


Actually, it's not just Rosetta. I have several machines that are not running Rosetta that have experienced the problem several times each. So either other projects have the same problem with their applications, or the problem is with TCP/IP (always happens during up/download), or the problem is purely a BOINC problem.

I emailed some error logs to Rom, but haven't heard anything back.
Reno, NV
Team: SETI.USA

Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 32804 - Posted: 17 Dec 2006, 12:41:12 UTC

Feet1st: I haven't had any WU failures or aborts, but other participants have. I believe you suggested before that aborts with 0 credits are being used in computing the credit to grant (something like that)?

Zombie67: Isn't/wasn't ROM in the middle of a move and can't work on this problem? I know I haven't been able to access his ROM World site for quite some time.

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 32805 - Posted: 17 Dec 2006, 13:04:28 UTC - in response to Message 32804.  

Feet1st: I haven't had any WU failures or aborts, but other participants have. I believe you suggested before that aborts with 0 credits are being used in computing the credit to grant (something like that)?

Zombie67: Isn't/wasn't ROM in the middle of a move and can't work on this problem? I know I haven't been able to access his ROM World site for quite some time.


Moving, and it's a "was", I think, since he's started again on the BOINC code (the 5.8.0 release), so he'll be very busy sorting that out first.
I guess they would need to pay Rom to do some work again, like last time.
Team mauisun.org

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 32812 - Posted: 17 Dec 2006, 18:17:17 UTC

Yes, what I'm seeing is that if the watchdog ends the WU, I get 20 credits (even though 100 might have been fairer for the work done on it). If the WU crashes, those are the ones where you have to bring up each WU to check, and it looks like they are eventually awarded 20 credits as well.

What I had suggested at one time, and never got a specific confirmation on, was that perhaps these credit awards for the failures are thrown in to the rolling average credit per model of a given WU. At the time of report, a failing WU gets zero credit. Then a nightly job grants it 20 credits. Either way, it seems to me these might be ways that the rolling average gets skewed on the low side. In fact, if a given WU is built wrong and failing for many, I'm really unclear as to what that would do to this rolling average that is maintained.
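
To illustrate the concern (and only the concern, since none of this is confirmed to be how the server works), here is a toy calculation. The numbers are made up, and the assumption that a failed report enters the average as a zero is exactly the unconfirmed part.

```python
from statistics import mean

# Hypothetical credit-per-model claims for one WU type.
healthy_claims = [10.0, 12.0, 11.0, 9.0]
# Hypothetical failed reports, assumed here to be recorded as zero credit.
failed_claims = [0.0, 0.0]

clean_average = mean(healthy_claims)                   # 10.5
skewed_average = mean(healthy_claims + failed_claims)  # 7.0

# Credit granted to a host that completes 5 models under each average.
models_completed = 5
granted_clean = clean_average * models_completed       # 52.5
granted_skewed = skewed_average * models_completed     # 35.0
```

In this toy case two zero entries out of six cut the grant by a third, so even a modest failure rate could pull everyone's granted credit down if failures really are averaged in.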

I'm also unclear why an end by watchdog doesn't seem to report any of the successfully completed models. I mean, we checkpoint after each completed model, so it's not like the data disappears. And if the completed models WERE reported when the watchdog ends a task, then the credit issued could be more in line with the productive time spent on the task prior to its end.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 32814 - Posted: 17 Dec 2006, 18:22:23 UTC

Idle: a quick glance down your WU list shows that, on average, you are "granted" more credit than you "claim". This would contradict what I'd expect to see if my zero-credit skewing theory were occurring.

Didn't the credit system recently change at Einstein as well?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Buffalo Bill
Joined: 25 Mar 06
Posts: 71
Credit: 1,630,458
RAC: 0
Message 32820 - Posted: 17 Dec 2006, 19:39:47 UTC

I've noticed my RAC slowly moving down over time. I crunch mostly Rosetta so I thought this may be because fewer participants are using the optimized BOINC clients that claim higher credit. This would bring down the rolling average that Rosetta uses for granted credit. Combine this with the graphics problems and it could be enough to cause a slight downward trend. Are the science apps getting more intense too?

Just some thoughts.

Bill

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 32821 - Posted: 17 Dec 2006, 19:52:04 UTC
Last modified: 17 Dec 2006, 19:56:31 UTC

I'm not having WUs crash, probably because I only use the graphics to see if Rosie's still alive, but in the last week there have been two days where the daily credit is down by 50% or so, at least according to the Boincsynergy web site. E.g., the 12th should have had around 2k and it had ~800. The 16th should have had around 2k and it had 1k. They're pretty easy to pick out when you expect around 2k every day. I went to the individual results for each computer, and the ones with 8hr jobs are turning in 3/day, the ones with 12hr jobs are turning in 2/day/core, etc., and their values are all ballparkish. Everything looks smooth from that POV. So, is it Rosetta/stats or Boincsynergy that can't add?
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,156,426
RAC: 714
Message 32833 - Posted: 18 Dec 2006, 0:48:18 UTC

I'm confused here. I had the case the other day where BOINC Manager just dropped everything, and I had to kill the client and restart. Feet1st said something about BOINC versions (?) later than 5.4.11 but I just checked the BOINC site and 5.4.11 is the current Windows version; are the other version numbers some other O/S?
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 32834 - Posted: 18 Dec 2006, 1:04:48 UTC - in response to Message 32833.  

I'm confused here. I had the case the other day where BOINC Manager just dropped everything, and I had to kill the client and restart. Feet1st said something about BOINC versions (?) later than 5.4.11 but I just checked the BOINC site and 5.4.11 is the current Windows version; are the other version numbers some other O/S?

Hi, go to the download page, then click on All Versions, the latest "alpha" version will be listed there.

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 32842 - Posted: 18 Dec 2006, 7:59:08 UTC - in response to Message 32834.  

I'm confused here. I had the case the other day where BOINC Manager just dropped everything, and I had to kill the client and restart. Feet1st said something about BOINC versions (?) later than 5.4.11 but I just checked the BOINC site and 5.4.11 is the current Windows version; are the other version numbers some other O/S?

Hi, go to the download page, then click on All Versions, the latest "alpha" version will be listed there.



Astro, we are on the 5.8.x series now, so they should be classed as Release Candidates (Windows world) or Beta (Mac world); I don't think the Linux one has been compiled yet.
http://boinc.berkeley.edu/download_all.php

5.9.x series is certainly alpha though.
Team mauisun.org

Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 32856 - Posted: 18 Dec 2006, 12:23:30 UTC

I'm of the opinion that until Rom puts it on the download page as "recommended", it is still alpha. He even posted to the Alpha list that he wanted us testers to have a go at it before he releases it as "recommended". It was a good call too, since it had installer issues (which have been corrected now). LOL

It's just my opinion.

When they moved to major version 5, they implemented the new three-number version system (one that hasn't strictly been followed, IMO). Using 5.x.x as the example: the first number, 5, means it's "major version 5". The second number is the "minor version" number; if it's ODD it's alpha, if it's EVEN it's a recommended version. The third number is the "release number".

So, yes, it says 5.8.0, so it's slated to be a recommended version; it just hasn't supplanted 5.4.11 yet.
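
As a small sketch of that odd/even rule (the function name and the returned labels are mine, not anything from BOINC, and as noted the scheme hasn't always been followed strictly):

```python
def classify_version(version: str) -> str:
    """Classify a 'major.minor.release' string by the odd/even minor-version rule."""
    major, minor, release = (int(part) for part in version.split("."))
    # Odd minor number: alpha/development series. Even: slated to be recommended.
    return "alpha" if minor % 2 == 1 else "release series"


for v in ("5.4.11", "5.7.5", "5.8.0", "5.9.0"):
    print(v, "->", classify_version(v))
```

An even minor number only says which series a build belongs to; whether it is actually the recommended download still depends on what the download page lists.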

