Posts by BadThad

21) Message boards : Number crunching : Rosetta Process Stalls (Message 34379)
Posted 8 Jan 2007 by BadThad
Post:
Looks like most of your hosts are 5.4.11, but this one is 5.2.11. Similar BOINC issues have been reported. It might be helpful if you could identify the specific host. BOINC Manager seems to lose contact with the running threads, and seems not to detect when they end (generally with a no-heartbeat indication) so that it can schedule more tasks.


This is the machine I'm having the problem with.

Tonight I'll check the core temps at 100% load using Intel TAT to make sure it's not throttling. I'd be surprised, because the regular Intel temp monitor utility shows load temps at about 70°C per core. The P4/PD CPUs don't normally start throttling until about 85+°C, and I'm way below that.....but one never knows.

Looking over the results, there's a ton of "compute errors". Guess that could be the root of the problem? Maybe I need to run some diagnostics on that PC to check for a bad CPU or bad RAM modules?
22) Message boards : Number crunching : Rosetta Process Stalls (Message 34367)
Posted 8 Jan 2007 by BadThad
Post:
Which BOINC version are you running?


Whatever the latest version is. Last night I tried a simple uninstall/reinstall, but it didn't help. Last I checked both processes were "running" but stalled.
23) Message boards : Number crunching : Rosetta Process Stalls (Message 34335)
Posted 8 Jan 2007 by BadThad
Post:
Seems to have just started doing this a week or two ago. New PC:

CPU: Intel Pentium D 915 2800MHz @ 2800MHz
Motherboard: Intel D945Gpm
Memory: 1024 MB of Corsair DDR2-667
PS: OCZ Modstream
Video Card: ATI X700 Pro 256MB Radeon
Hard Drive: Seagate 7200.10 320.0 GB @ 7200 RPMS
OS: XP Pro with all updates

The BOINC client shows two processes are running, but the time is not incrementing and neither core shows a load in Task Manager. Shut down the client and restart, and it runs fine....for a while. Today only ONE process was running on a single core. Restarted BOINC, everything was fine, then came back 4-5 hours later to find Rosetta "stalled" again.

Screensaver set to blank, no other running processes except for antivirus (SAV 10.0.1). I know a lot about PCs, so there are no viruses or malware on the system; it's very clean. Temperatures and voltages are fine, and the PC is otherwise working perfectly.

Any ideas as to why the Rosetta processes are stalling?


24) Message boards : Number crunching : Is this for real? (Message 32379)
Posted 10 Dec 2006 by BadThad
Post:
Take a look at my account, you'll see something similar. In my case, some of the computers I've installed R@H on re-image themselves every night. These computers are in labs and the imaging software wipes off any changes students make during the day... and it also kills the BOINC install. Every day after they are re-imaged, the machines have to reattach to the project. They are the same computer, but each can show up as hundreds of 'hosts' in the system.

It's not easy to get around the imaging software... if we remove the BOINC directory from the re-imaging, anything installed in that folder will stay. Yes, there are ways to lock it down, but IT staffs can't spend a lot of time trying to make it perfect since BOINC isn't likely a high priority to their manager :)


OK, good explanation, thanks. I guess you'd know if he was a cheat of some sort. As long as it's all in good science, I'm happy.
25) Message boards : Number crunching : Is this for real? (Message 32374)
Posted 10 Dec 2006 by BadThad
Post:
Over 6500 hosts?

I find it hard to believe!
26) Message boards : Number crunching : Help us solve the 1% bug! (Message 12526)
Posted 22 Mar 2006 by BadThad
Post:
this 1% bug, i think, is a big turnoff for a lot of people. especially the ones with "farms" and can not get to them daily. i had 3 computers at my 2nd job that got stuck for 6 days last week. i reset them saturday and now it looks like 2 of them are stuck on 1% again for the past 2 days. our team is even talking about moving on to something else because of the 1% bug and wasted cpu cycles. we really like rosetta as a whole but it seems to require a lot more monitoring than other projects. hope it is solved soon.


Indeed, that is my problem, I cannot baby sit my machines. I've had one PC hung since December 13 that I simply cannot get to....not for another month or two at least. I'm growing closer and closer to "bugging out" of Rosetta.
27) Message boards : Number crunching : Help us solve the 1% bug! (Message 12430)
Posted 21 Mar 2006 by BadThad
Post:
Arrgggg.....looks like the 1% stuck wu's are back:

FA_RLXc9_1c9oA_359_372_0

1% after 19 hr 44 min.
28) Message boards : Number crunching : How about some QC on Rosetta WU's? (Message 10631)
Posted 10 Feb 2006 by BadThad
Post:
With all the ongoing complaints about WU errors, perhaps it would help if the project stated the total error rate they see in the database to put things into perspective. Judging from the ~2000 WUs I have crunched, I would guess that it must be considerably below 1% (one failed WU among 184 on my current results page). It might also be of interest to see how the error rate varies across different hosts/OS type/BOINC versions...


In fact, the overall error rate is pretty low. With the CPU time limit problem fixed, it appears that a relatively small fraction of users are having the majority of the WU problems--we wish we understood what was causing these!


I think there's a small fraction of users with wu problems because I received all the bad ones on the 30 systems I run Rosetta on. LMAO
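
For reference, the figure quoted at the top of this post (one failed WU among 184 on a results page) can be checked with a couple of lines of arithmetic. This is only a sketch: the function name is made up for illustration, and a single results page is a small sample, so it says nothing about the project-wide rate.

```python
def error_rate_percent(failed: int, total: int) -> float:
    """Return the share of failed work units as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


# Figures taken from the quoted post: 1 failed WU out of 184 results.
rate = error_rate_percent(1, 184)
print(f"{rate:.2f}%")  # prints 0.54%
```

That comes out a little over half a percent, which fits the "considerably below 1%" guess for that one host.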
29) Message boards : Number crunching : How about some QC on Rosetta WU's? (Message 10540)
Posted 7 Feb 2006 by BadThad
Post:
The "max time exceeded" or "stuck at 1%" or "wu hosed for whatever reason" problem is getting out of control with this project. I have too many machines to waste time babysitting them every day. I've been running DC projects for many years and this is the only project I've seen send out masses of bad WUs.

Get some QC on those WUs....PLEASE. One sure way to kill a project for people that run lots of computers is to force us to babysit the dang client. I've had countless hours of CPU time completely wasted away with Rosetta, IT MUST STOP!
30) Message boards : Number crunching : Hyper Threading or not? (Message 3417)
Posted 16 Nov 2005 by BadThad
Post:
<------ From emjem, not BadThad:

Windows Task Manager shows that my CPU is spending ~50% of its time on each of the two WUs running. So on the surface it would appear that an HT CPU does twice as much work as a non-HT unit. But this could very well be false logic, since not all is as it appears in the CPU world. It would be nice to have some FACTS on this issue.

As for the memory usage issue with HT, I don't see a problem. Two of my P4 3.2 systems use ~69 meg for each 'cpu'. So it would seem that any system with at least 256 meg of RAM has room for around 80% growth in WU size.
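
The memory estimate above can be worked through roughly as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the calculation treats all 256 MB as available to the two WUs, ignoring whatever the OS and other programs use. That may be why the post rounds down to "around 80%".

```python
def growth_headroom_percent(total_mb: float, per_task_mb: float, tasks: int) -> float:
    """Percent by which each task could grow before the tasks fill total_mb.

    Assumption: every megabyte of total_mb is available to the tasks.
    """
    used_mb = per_task_mb * tasks
    if used_mb <= 0:
        raise ValueError("tasks must use a positive amount of memory")
    return 100.0 * (total_mb / used_mb - 1.0)


# Figures from the post: ~69 MB per WU, two WUs (one per HT 'cpu'), 256 MB of RAM.
headroom = growth_headroom_percent(256, 69, 2)
print(f"{headroom:.1f}%")  # prints 85.5%
```

Two WUs at ~69 MB take about 138 MB, leaving about 118 MB, so each WU could grow by roughly 85% before 256 MB is exhausted.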


©2024 University of Washington
https://www.bakerlab.org