Posts by Chris Holvenstot

1) Message boards : Number crunching : Web Site Updates (Message 71153)
Posted 27 Aug 2011 by Chris Holvenstot
Post:
"I've always said, that if rosetta had a website similar to that of WCG... many people would join, just because the website looks professional."


"hah! that will be the day"


Greg - you want to share your thoughts about WCG? You know something the rest of us don't?

That was an honest question. I've looked at the projects they represent and many of them seem to be worthy.

I don't think that I would crunch for the Clean Water project, because I suspect there is a real, direct transfer of results between the Chinese university involved and Chinese industry.

(They are looking at water filtration schemes.)

In my gut I believe that any society that can throw up the computing resources necessary to create the Great Firewall and invest in several front-line supercomputers can do its own R&D.

But that's just me, and you can select and deselect projects at will.
2) Message boards : Number crunching : Web Site Updates (Message 71152)
Posted 26 Aug 2011 by Chris Holvenstot
Post:
Rochester - I don't think a phone call would do much good. It would likely go something like:

"Hello, this Rosetta. My name Peggy. What is problem?"

(For those of you not living in the US and scratching your heads - go to YouTube and search for "My Name Peggy". It is so representative of customer service nowadays.)

3) Message boards : Number crunching : Rosetta's credit granting compared to others (Message 71061)
Posted 15 Aug 2011 by Chris Holvenstot
Post:
Not that it matters, but one area where the credit granting system does skew things horribly is the printing of Certificates of Participation, or whatever they are called.

Not that I have ever known anyone who actually prints and frames them.

It is easy to accept that a Rosetta credit is "worth more" in terms of expended cycles than those granted by other projects, and if you are of a competitive nature, comparing Rosetta credits to those granted by another project will likely put you at a distinct disadvantage.

For me the only real use of the credit system is as a "red flag" that something is wrong when your RAC takes a significant hit.

However, even when you accept that the number of processing cycles required to earn a credit varies between projects, there is still another real area of inconsistency.

When generating the "Certificate of Participation", one credit is deemed to be equal to one cobblestone of work. A standard conversion is then made between cobblestones and floating point operations.

This conversion appears to be consistent across all projects; it is likely embedded in the "template" project website. After looking at a number of projects, it appears that 1 cobblestone of work translates to 11,574 million floating point operations.

Thus you have to crank out more hardware cycles with Rosetta to be credited with contributing a floating point operation than you do with other projects.

Since a floating point operation is an easily measured function of the hardware you would expect the number of floating point operations reported on the Certificate of Participation to closely reflect the actual number of hardware cycles donated, regardless of the project involved.
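To make the certificate arithmetic concrete, here is a minimal Python sketch. It assumes the 11,574 million FLOPs-per-cobblestone factor observed above is applied uniformly by every project site; the two credit rates in the demo are hypothetical, chosen only to show that identical hardware work produces different certificate totals.

```python
# Sketch of the certificate arithmetic described above. The factor is an
# empirical observation from several project sites, not a published BOINC spec.
FLOPS_PER_COBBLESTONE = 11_574_000_000  # "11,574 million" floating point operations


def certificate_flops(credits: float) -> float:
    """FLOPs a Certificate of Participation would report for a credit total."""
    return credits * FLOPS_PER_COBBLESTONE


def flops_credited(cpu_hours: float, credits_per_cpu_hour: float) -> float:
    """Certificate FLOPs earned for a given amount of donated CPU time."""
    return certificate_flops(cpu_hours * credits_per_cpu_hour)


if __name__ == "__main__":
    # Hypothetical: the same 10 CPU hours earn different credit on two projects,
    # so the certificate reports different FLOP totals for identical hardware work.
    for project, rate in [("project A (lower credit rate)", 20.0),
                          ("project B (higher credit rate)", 30.0)]:
        print(f"{project}: {flops_credited(10, rate):.3e} FLOPs on the certificate")
```

Because the conversion factor is fixed, any project that grants fewer credits per CPU hour will always show fewer certified FLOPs for the same machine.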

Not that it matters that much, but it was a good exercise for one of those sleepless nights.
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71059)
Posted 15 Aug 2011 by Chris Holvenstot
Post:
@Rochester - we need to shout out and hope someone with a current Windows 7 system takes mercy on us.

The last time I had a Windows system it was XP, and you could get to the setup for color depth and screen resolution by right-clicking on the desktop. However, my parole officer made me give up Windows before Vista came out.

You should be able to see the hardware configuration using the Device Manager, which I believe is under "Start=>System".

Can someone help with a procedure to look at color depth and screen resolution on a Windows 7 system, PLEASE?

5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71057)
Posted 15 Aug 2011 by Chris Holvenstot
Post:
@Rochester - a shortage of shared system memory is often an indication that your video subsystem is eating a bunch of resources.

Please describe your video setup - integrated video on the motherboard (which uses shared memory) with an AGP interface?

The fact that your system profile shows a memory size of 3840 MB makes it look like you have a 4 GB system with an integrated graphics solution that was using 256 MB of memory when the profile was run.

What are you running for screen resolution and color depth?

If you are running 32-bit color, try dropping down to 24-bit or even 16-bit to reduce your video system's footprint in shared memory.

If you think that your video driver may have a memory leak, consider dropping back to the default, non-vendor-specific VGA driver supplied with Windows. The color will look crappy, but it will prove or disprove the memory leak theory.
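The numbers behind this advice are simple arithmetic, sketched below in Python. The 256 MB gap is just installed memory minus the memory the profile reports, and a single packed frame buffer needs width x height x bytes per pixel. The 1920x1080 resolution is a hypothetical example (the actual resolution was not posted), and real drivers may pad 24-bit modes to 32 bits per pixel and keep several buffers, so treat the results as rough lower bounds for the frame buffer alone.

```python
def reserved_mb(installed_mb: int, reported_mb: int) -> int:
    """Memory missing from the OS-visible total, e.g. carved out for integrated graphics."""
    return installed_mb - reported_mb


def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes for one packed frame buffer at the given color depth (no padding assumed)."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    print(reserved_mb(4096, 3840), "MB reserved")  # 256 MB, matching the profile
    # Hypothetical resolution, for illustration only.
    for bpp in (32, 24, 16):
        mib = framebuffer_bytes(1920, 1080, bpp) / 2**20
        print(f"1920x1080 @ {bpp}-bit: {mib:.1f} MiB per frame buffer")
```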

6) Message boards : Number crunching : Lack of communication from project (Message 71044)
Posted 13 Aug 2011 by Chris Holvenstot
Post:
Greg - if you can't get a response here, you could always try forwarding your concerns and issues to either the Howard Hughes Medical Institute or the National Institutes of Health.

Both are listed as partners in the project on the home page - in this case I suspect that the term partner relates to funding ...

How serious are you about wanting a response? <grin>
7) Message boards : Number crunching : Minirosetta 3.14 (Message 71043)
Posted 13 Aug 2011 by Chris Holvenstot
Post:
I'm sorry, I think I need to editorialize a little bit. The T0423* tasks in my post were generated this past Thursday, a full week after the "1201 second" problem was spotted by another participant here.

Yet here we go again? Does anyone at the project read this forum? Better yet, does anyone at the project do anything to verify that a known problem is not propagated into a new batch of tasks before they are released into the wild?

While the cause of the "1201 second" issue may be complex and not yet identified, its signature is easy to spot and could have been picked up in even the most rudimentary dry run.

Dang!
8) Message boards : Number crunching : Minirosetta 3.14 (Message 71042)
Posted 13 Aug 2011 by Chris Holvenstot
Post:
I am sure others have noted it already, but tasks T0423* appear to be behaving exactly like the flxdsgn tasks of the past 10 days or so.

They are designed to generate only one decoy, and if the system completes it in less than 1201 seconds, it gets a validate error and is then sent to a second system.

Task ID 440948630 is an example where both my wingman and I completed the task in less than 1201 seconds and we both got a validate error.

Task ID 440943948 is an example where my system completed the task in less than 1201 seconds and got a validate error while my wingman took 3350 seconds and got a success.
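The signature is simple enough that a pre-release dry run, or a participant screening their own task list, could check it mechanically. A minimal Python sketch, assuming each result's run time and outcome have been copied from the task pages; the `Result` structure and the outcome wording "Validate error" are assumptions, and any sample values you feed it are your own.

```python
from dataclasses import dataclass

THRESHOLD_S = 1201  # run time reported in the thread as the failure boundary


@dataclass
class Result:
    task_id: int
    run_time_s: float
    outcome: str  # status text from the task list; "Validate error" wording assumed


def is_fast(r: Result) -> bool:
    return r.run_time_s < THRESHOLD_S


def fits_signature(r: Result) -> bool:
    """True when a result shows the reported pattern: under 1201 s plus a validate error."""
    return is_fast(r) and r.outcome == "Validate error"


def tally(results):
    """Count fast/slow results by validation outcome."""
    counts = {"fast_ok": 0, "fast_err": 0, "slow_ok": 0, "slow_err": 0}
    for r in results:
        speed = "fast" if is_fast(r) else "slow"
        status = "err" if r.outcome == "Validate error" else "ok"
        counts[f"{speed}_{status}"] += 1
    return counts


def hypothesis_holds(counts) -> bool:
    """The pattern holds if no fast result validated and at least one fast result failed."""
    return counts["fast_ok"] == 0 and counts["fast_err"] > 0
```

A single fast result that validates cleanly is enough to break the hypothesis, which is why the tally tracks `fast_ok` separately.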


9) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71041)
Posted 12 Aug 2011 by Chris Holvenstot
Post:
"Keep an eye out on the new T0xxxxxxxxx tasks.
Another person and me just had 1 each die on us.
He got a validate error and mine crashed and burned 50% of the way through."

"Same here - have one 50.940% through. BOINC time was increasing but using no CPU time. I've just suspended it and resumed it with no effect. The Time Remaining for it isn't given in BOINC Manager and the graphics close pretty quickly after opening without displaying anything..."


Me too ...
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71025)
Posted 11 Aug 2011 by Chris Holvenstot
Post:
Ed - that's about right. Remember, the system will run past the target time if it is in the middle of a model (which, for some unknown reason, they call a decoy).

If it does run past the target time, it will terminate either when the model it is currently working on completes or when the "watch dog" wakes up and terminates it. The watchdog fires when you reach a point four hours past the target time.
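The stopping rule described above can be modeled roughly as follows. This is a simplified Python sketch, not Rosetta's actual scheduling code: it assumes a new model (decoy) starts only while elapsed time is below the target, lets a model already in progress overrun, and applies the four-hour watchdog cutoff. The per-model durations are inputs you supply.

```python
from typing import Sequence, Tuple

WATCHDOG_GRACE_S = 4 * 3600  # watchdog fires four hours past the target run time


def end_time(target_s: float, decoy_durations_s: Sequence[float]) -> Tuple[float, str]:
    """Simplified model of when a task stops, given per-model (decoy) run times.

    Assumptions: a new model starts only while elapsed time is under the target;
    a model already in progress may overrun the target; the watchdog kills the
    task at target + 4 h if that model still has not finished."""
    elapsed = 0.0
    for d in decoy_durations_s:
        if elapsed >= target_s:
            break  # target reached between models: no new model is started
        if elapsed + d > target_s + WATCHDOG_GRACE_S:
            return target_s + WATCHDOG_GRACE_S, "watchdog"
        elapsed += d
    reason = "target reached" if elapsed >= target_s else "out of decoys"
    return elapsed, reason
```

For example, with a 3-hour target (10800 s) and models taking 4000 s each, this model starts the third model at 8000 s and lets it run to 12000 s, past the target but well inside the watchdog limit.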


11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71024)
Posted 11 Aug 2011 by Chris Holvenstot
Post:
Rochester - the tasks which are not completing: are they actually pulling cycles, or have they stalled? What does your Performance Monitor (or whatever Windows calls it) currently show?

I also note you are running Windows 7 (unlike most of the Windows users here who have moldy copies of XP) - did you put on any maintenance this past weekend before the issue of non-completing tasks started?

And finally, was whoever it was who gave you the moniker "New York" upset at you and seeking to punish you for something?
12) Message boards : Number crunching : Lack of communication from project (Message 71023)
Posted 11 Aug 2011 by Chris Holvenstot
Post:
Robert - you have a point but the issue with a lack of communication has been around a lot longer than the summer break.

I am more of a mind that the task of communication is much like that of the documentation in the shop I used to work in - we had a lot of talented coders who would work endless hours, but would go miles out of their way to avoid documenting anything beyond the bare minimum necessary to get through code reviews.

Writing (aka communication) was always considered to be drudge work to be avoided at all costs.
13) Message boards : Number crunching : Lack of communication from project (Message 71007)
Posted 10 Aug 2011 by Chris Holvenstot
Post:
Greg - you can already tell when the project is out of work by looking at the server status page. I think that people are looking for a "one liner" now and then about what is being done to resolve the issue and when things are expected to return to normal.

Last year at this time the project was cranking out 150+ TeraFLOPS. Granted, that was when SETI was having its issues, but I think one of the big reasons R@H now produces only 100 TeraFLOPS on a good day is the abhorrent lack of communication.

14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 70994)
Posted 9 Aug 2011 by Chris Holvenstot
Post:
Sid - just did a quick scan through my logs for today and I'm not seeing any problems with the number of available work units or with their successful completion.

I also took a quick look at his task log and see that as recently as two days ago he was successfully processing tasks. Since then all I noticed was a bunch of "abort before start" results (and not on the flex design WUs).

I did not see a post describing a problem so I can't offer any suggestions.
15) Message boards : Number crunching : Minirosetta 3.14 (Message 70985)
Posted 9 Aug 2011 by Chris Holvenstot
Post:
@Sid: Yes, I know about the script, and quite frankly I really don't give a rat's posterior about the credits - lost or gained.

My logic in aborting these tasks while they were still in the queue was based on efficiency. To me it made no sense to run a work unit that was predestined to generate a validation error just to have it routed to a wingman to be recomputed.

Why compute the same work units twice, or even three times (if the wingman also has a fast processor)?

As for being cheeky? I think that you are just showing your insecurity in the face of American exceptionalism. We may be the upstarts on the block, but we're catching up. For years you could proudly claim to have the world leader with the biggest ears.

But I think that Prince Charles has now been eclipsed by Obama.
16) Message boards : Number crunching : Minirosetta 3.14 (Message 70943)
Posted 7 Aug 2011 by Chris Holvenstot
Post:
ED - one more thing - your comment about "working against the developers" - I understand exactly what you mean but in this case I think the developers probably have a thousand or more of these validate failures to evaluate.

Further, one of the real downsides to the Rosetta project is that there is almost no communication between the sysadmins and developers and those doing the crunching.

If the developer/scientist/student responsible for these tasks were to come out and state that they needed to look at a few more failure cases I would be pleased as punch to provide them.

However, the way it is around here there is a fairly good chance that the developer is not even aware of the issue, or he is aware, already has a fix in hand and is just letting the "broke" tasks flow through the system.

It is not likely you will ever see a post here from the project explaining what happened. Sad but true.
17) Message boards : Number crunching : Minirosetta 3.14 (Message 70942)
Posted 7 Aug 2011 by Chris Holvenstot
Post:
@ED - in my case I am not a developer for BOINC or Rosetta. My reason for being here is twofold. First, I believe the work being done by the Rosetta project is important. Second, BOINC / Rosetta provides a nice, solid testbed for the optimized Linux kernels I build in another life. The amount of credit granted by Rosetta is consistent enough that after running for a week to ten days I am able to evaluate the value of the optimizations I am testing.

Aborting a task is not something I normally do, and I have never attempted to automate an abort in the past. However, it is becoming clear that if your machine is fast enough to complete a flxdsgn_Ploop task in less than 1201 seconds, you are not going to get a clean validation, and the completed task will be sent to someone else for a second attempt.

It appears that there is a bug in these routines and if you have a fairly fast machine, you may be predestined to have the task complete with a validation error.

By the way welcome to the project - hope you enjoy associating with some weird (or diverse) folks.
18) Message boards : Number crunching : Minirosetta 3.14 (Message 70940)
Posted 7 Aug 2011 by Chris Holvenstot
Post:
BINGO Robert - good eye.

I just screened about 100 of these tasks on two of my hosts (129350 & 1300412) and in each and every case your hypothesis was correct. Not only in the case of my hosts, but also for those of my "wingmen" whose systems are fast enough to complete the decoy in less than 1201 seconds.

I did not spot a single case where a system got a clean validation when the decoy was produced in less than 1201 seconds.

So I guess the bad news is that I have a few hundred of these tasks on the four hosts I currently have dedicated to Rosetta. The good news is they blow through pretty quick.

Now I guess it's time to see if I can spot these in the queue and abort them before they start with some sort of cron script.
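A cron job along these lines is one way to do it. This is a hypothetical Python sketch, not a tested tool: it shells out to BOINC's `boinccmd` utility (`--get_tasks` to list tasks, `--task <url> <name> abort` to abort one), but the parsing assumes the field names shown in the comments (`name:`, `project URL:`, `current CPU time:`), and treating "no CPU time yet" as "not started" is also an assumption. Check both against your own `boinccmd --get_tasks` output, and leave `DRY_RUN` on until the printed commands look right.

```python
#!/usr/bin/env python3
"""Hypothetical cron helper: abort queued tasks whose names start with PREFIX
before they begin running. Only prints the abort commands while DRY_RUN is True."""
import shutil
import subprocess
import sys

PREFIX = "flxdsgn_Ploop"  # task-name prefix from the thread
DRY_RUN = True            # set to False only after checking the printed commands


def parse_tasks(text):
    """Split `boinccmd --get_tasks` output into one dict per task.

    Assumes each task starts with a line like "1) -----------" followed by
    "key: value" lines; verify against your client's actual output."""
    tasks, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith("-----------") and stripped[0].isdigit():
            current = {}
            tasks.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")  # split on the first colon only
            current[key.strip()] = value.strip()
    return tasks


def unstarted_matches(tasks, prefix):
    """Tasks named with `prefix` that report no CPU time yet (assumed 'not started' rule)."""
    return [t for t in tasks
            if t.get("name", "").startswith(prefix)
            and "project URL" in t
            and float(t.get("current CPU time") or 0) == 0.0]


def abort_command(task):
    """boinccmd argument list that aborts one task."""
    return ["boinccmd", "--task", task["project URL"], task["name"], "abort"]


def main():
    if shutil.which("boinccmd") is None:
        print("boinccmd not found on PATH; nothing to do", file=sys.stderr)
        return
    listing = subprocess.run(["boinccmd", "--get_tasks"],
                             capture_output=True, text=True)
    if listing.returncode != 0:
        print(listing.stderr.strip(), file=sys.stderr)
        return
    for task in unstarted_matches(parse_tasks(listing.stdout), PREFIX):
        cmd = abort_command(task)
        print(" ".join(cmd))
        if not DRY_RUN:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

A crontab entry such as `*/30 * * * * /usr/bin/python3 /home/you/abort_ploop.py` (path hypothetical) would run it every half hour. Note that `boinccmd` may need to run from the BOINC data directory, or be given `--passwd`, to authenticate to the local client.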

Thanks

19) Message boards : Number crunching : Minirosetta 3.14 (Message 70928)
Posted 6 Aug 2011 by Chris Holvenstot
Post:
I've also had a bunch of them - so far I count 27 of them - same basic task name. Same validate error. Same claim that watchdog nailed them after 1201 seconds - although in most cases the task list shows they only ran somewhere between 750 and 950 seconds.

They have all been sent out to someone else for a "second try". I've been good all week, so maybe fate will smile on me and they will end up in Sid's bucket.

20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 70893)
Posted 4 Aug 2011 by Chris Holvenstot
Post:
@TCPBE -

Right up front I will say that during the year and a half I have participated in Rosetta, the developers and SysAdmins have set the bar for communication pretty low, and then consistently missed the mark.

Now I take a different position than my friends in the “Bangers and Mash” crowd, who espouse the viewpoint that we, as volunteers, are due nothing, and that we really don't have a leg to stand on when it comes to complaining.

It is my position that each and every one of us who regularly contributes cycles to the effort is a full partner in the project. Unpaid partners to be sure, but partners nonetheless.

While there may be some among us who look at the collection of BOINC credits as a sport, I believe that the vast majority of those who participate view themselves as members of a team working towards a common goal. And as members of the team we are due certain things:

1. We should be able to expect that the resources we contribute are used in a responsible manner and are expended towards meeting the stated goals of the project. I have no doubt that the researchers at the Rosetta project are sterling in this area.

2. As partners in this endeavor we are entitled to be kept in the loop on issues that affect us, such as server failures and interruptions in the expected flow of work units. Unfortunately, Rosetta's track record in this area is abysmal, to say the least.

However, there is no expectation that Rosetta is a “full employment” program for our systems. It is expected that from time to time there will be a pause in the work flow. Just keep us informed.

This lack of communication actually caused me to withdraw from the project for a period of time. I believed then, as I believe now, that failing to keep us volunteer partners informed in a timely manner was not an option.

However, I have ** NEVER ** seen a post censored unless it was blatantly vulgar or contained a personal attack. And there were moments when I was pretty blunt with the moderator Mod.Sense (whom I at times referred to as Non.Sense).

If you really think that you had a critical post “censored” it is likely you crossed the line when it came to decorum or personal attacks.

©2024 University of Washington
https://www.bakerlab.org