Posts by deesy58

21) Message boards : Number crunching : Servers? (Message 68017)
Posted 10 Oct 2010 by deesy58
Post:
My queue of work units finally emptied out last night at about 11:30 PM PDT, and my machine remained idle until 8:42 AM PDT this morning. Then, minirosetta 2.16 was downloaded, along with 17 new work units. The first nine of these work units aborted with computation errors. Two are currently being processed and six more are waiting to start.

2.16 appears to be exacerbating the computation error problem. I also had computation errors before 2.16 was downloaded to my machine, but I'm not sure if they were from 2.15 or 2.14.

The servers appear to be functioning again, but they are not responding very quickly.

deesy
22) Message boards : Number crunching : Servers? (Message 68001)
Posted 9 Oct 2010 by deesy58
Post:
Last night, between 12:30 AM and 1:00 AM PDT, it appeared that all servers were down. I was unable to access even the "Server Status Page."

Now, the Server Status Page shows all servers up and running, but I have 14 completed tasks waiting. Of the 14, 5 failed with "Computation Error[s]," and the others are "Ready to Report." I have two tasks running, and four more "Ready to Start." I am running Windows 7 on an old Pentium D.

Is anybody else seeing this problem? Is this just a temporary outage for maintenance?

deesy

23) Message boards : Number crunching : minirosetta 2.15 (Message 67964)
Posted 5 Oct 2010 by deesy58
Post:
Greetings:

Unfortunately, just like everyone else who's been posting recently, I'm starting to have similar difficulty with the new version of Rosetta.

I'm using a Toshiba Satellite notebook computer with nearly a hundred gigabytes of memory.

Yet, I just now woke up to see the Rosetta screensaver graphic displaying a blank form, and frozen in place.

An alert on my task bar indicated I was out of virtual memory.

Regretfully, I'm going to have to disconnect from the Rosetta project.

Thank you.


I think you might have "nearly a hundred gigabytes" of hard disk capacity, but I seriously doubt that you have that much RAM (Random Access Memory). Virtual memory combines RAM with a paging file on the hard disk, so a shortage of virtual memory is often an indication that your hard disk has become nearly full. How much disk space is available on your machine? To find out, open your "Computer" icon, then right-click on "Local Disk (C:)" and select "Properties." This should tell you how much space is available on your hard disk. If you have insufficient space left, you can use the "Disk Cleanup" utility (carefully) to remove files that might no longer be needed. All of this assumes that you are using the Microsoft Windows Operating System, of course, and it appears that you are.
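If you would rather check from a script than through the Properties dialog, the following is a minimal sketch using Python's standard library. The function name `free_space_report` and the 10% warning threshold are my own assumptions for illustration; nothing here comes from BOINC or Rosetta.

```python
import os
import shutil

# Assumed threshold for illustration only: warn when less than 10% is free.
LOW_SPACE_PERCENT = 10.0


def free_space_report(path=None):
    """Return (total_gib, free_gib, free_percent) for the drive holding `path`.

    If no path is given, the root of the current drive is used
    (for example C:\\ on Windows, / elsewhere).
    """
    if path is None:
        path = os.path.abspath(os.sep)
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    total_gib = usage.total / gib
    free_gib = usage.free / gib
    free_percent = 100.0 * usage.free / usage.total
    return total_gib, free_gib, free_percent


if __name__ == "__main__":
    total, free, percent = free_space_report()
    print(f"Total: {total:.1f} GiB, free: {free:.1f} GiB ({percent:.1f}%)")
    if percent < LOW_SPACE_PERCENT:
        print("Disk is nearly full; the paging file may not be able to grow.")
```

This only reports disk space. It does not tell you how large the paging file is allowed to become, which is set separately in the Windows system settings.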

BTW, those "reverse slashes" you referred to in your original post are, indeed, normal in the Microsoft environment.

deesy
24) Message boards : Number crunching : SERVER PROBLEMS - 2. (Message 67917)
Posted 1 Oct 2010 by deesy58
Post:
Add me to the list. I noticed that the "Server Status Page" showed all servers up and running yesterday, but my machine has 19 completed tasks waiting to upload or report, and no work to do.

Is that "Server Status Page" really useful at all?

deesy
25) Message boards : Number crunching : no work units (Message 67665)
Posted 9 Sep 2010 by deesy58
Post:
My machine (admittedly older and less powerful than the new ones) had been earning about 800 credits/day until I changed my settings to those you had recommended. Now it is earning about 750-760, so it appears that the defaults might not actually be sub-optimum. Not that I care especially about the total number of credits, except that it is apparently a measure of the level of contribution made by a cruncher.


deesy


AFAIK, settings don't affect the work done. Although, when having a shorter run time, you are less affected credit-wise (and, like you said, amount-of-work-done-wise) when one of your WUs fails for any given reason. (Compare a WU failing @ 19hrs of processing to a WU failing @ 2hrs of processing... Who lost the most?)

Are you seeing these changes in RAC on a 24/7 running machine? If not, then a few hours more left on/off can make such differences in RAC. ((Sometimes when you restart your PC, the WU starts a bit behind where it left off before shutting down)(this doesn't happen on 24/7 running machines)) If yes, then it's probably what I mentioned first.

Bet you haven't seen double parentheses on the work before :-)


After I changed my settings to 12 hours as a result of a suggestion on this forum, my RAC dropped to about 750 (746 to be exact). When I changed it back to 4 hours, it rose to about 800, then leveled off. I have experienced no interruptions or restarts during this period (that I am aware of). In spite of my vaaast manufacturing and computer systems experience, I can't explain the increase. :-) One thing seems certain though: There is a correlation between CPU time and credit awards.

I have just restarted another machine, so I will probably get a kick in my credits for a while, but I think my previous observations have been confirmed.

deesy
26) Message boards : Number crunching : no work units (Message 67627)
Posted 7 Sep 2010 by deesy58
Post:
Cogent? That was that cop show in the 70's with Telly Savalas, right?


Be careful! You're dating yourself. Soon, somebody will accuse you of changing your age and experience. :)

deesy
27) Message boards : Number crunching : no work units (Message 67614)
Posted 7 Sep 2010 by deesy58
Post:
deesy: I'd love to chat more, but seeing as you're putting on 10 years to your age and experience each time you post I'm worried for your (and everyone's) health. There was some good advice from others earlier in the thread. I will be taking it. Adieu, with all best wishes for your crunching.


Why am I not surprised when somebody who is unable to formulate a cogent response runs away, taking another cheap shot over his shoulder as he departs?

deesy
28) Message boards : Number crunching : no work units (Message 67604)
Posted 6 Sep 2010 by deesy58
Post:
My machine (admittedly older and less powerful than the new ones) had been earning about 800 credits/day until I changed my settings to those you had recommended. Now it is earning about 750-760, so it appears that the defaults might not actually be sub-optimum. Not that I care especially about the total number of credits, except that it is apparently a measure of the level of contribution made by a cruncher.


deesy


AFAIK, settings don't affect the work done. Although, when having a shorter run time, you are less affected credit-wise (and, like you said, amount-of-work-done-wise) when one of your WUs fails for any given reason. (Compare a WU failing @ 19hrs of processing to a WU failing @ 2hrs of processing... Who lost the most?)

Are you seeing these changes in RAC on a 24/7 running machine? If not, then a few hours more left on/off can make such differences in RAC. ((Sometimes when you restart your PC, the WU starts a bit behind where it left off before shutting down)(this doesn't happen on 24/7 running machines)) If yes, then it's probably what I mentioned first.

Bet you haven't seen double parentheses on the work before :-)


My machine runs 24/7. At one time (a couple of months ago), I changed my run times to 24 hours on both CPUs, and it seemed to me that the tasks would crash more often, and I would lose much more progress on the restarts. That's why I accepted the defaults when I reinstalled BOINC after upgrading to Windows 7. I did, however, change the run time from the default 3 hours to 4 hours, and my RAC climbed to about 800, and it hovered there until I changed the run times to 12 hours.

There appears to be some correlation between credit awarded and CPU run times (at least on my machine). I have not seen a complete task crash since the middle of June, so that is an improvement. I don't know if the stability improvement was because of my upgrade to Windows 7, or because the Work Units were different. I do have a record of seven Work Unit tasks that crashed on my machine between May 5, 2010 and June 13, 2010, but nobody seemed interested, so I just killed the task and accepted the fact that I had lost progress and credit. I did post a couple of WU Names on a different Rosetta forum, though.

deesy
29) Message boards : Number crunching : no work units (Message 67599)
Posted 5 Sep 2010 by deesy58
Post:
Deesy: I tried to end previously on such a positive note, but it appears little of my message got through. The time comes when I have to wonder whether the problem is with my explanation or your understanding. Looking at your responses from everyone here and the usual feedback I get, the balance of likelihood is very one-sided.

I'd taken it, when you said you worked in IT for 30+ years and were a long time IT Director, that's what you meant, so when you said there were so many (simple) things you'd never come across, it made some sense. But it seems you now have 40+ years in manufacturing too. Which is it? Getting a bit whiffy.

I think I’ll return to the BOINC defaults. They wouldn’t be defaults if they were sub-optimum, would they?

They would be lowest common denominator settings available for all projects, so of course they'd be sub-optimum. But if you think it'll solve your problems, don't let me stop you.

Is there any part of this you're actually getting, because you're doing a great job of convincing me otherwise? I'm already certain you appreciate little of logistics. Now I'm doubting if you work in IT either. Sad.


As usual, your interpretation is too narrow. I have actually been working in a manufacturing environment for more than 50 years, but I didn't count the portion of my manufacturing experience that took place before my military service. I didn't say I worked in IT for 30+ years, I said I worked with systems for 30+ years, which is true. These systems included the systems I worked with when I studied for my degree in Computer Sciences from a major American university, and the systems design and development I worked on for one of the largest computer manufacturers in the world. Oh, that would also be manufacturing, by the way. We manufactured computer hardware and software.

After that, I held a number of positions in manufacturing companies that produced computer-based products. At two of my former positions, we designed and built our own microprocessor-based computers and communications systems. As most people are aware, computers are of little use without the appropriate software or firmware that makes them perform the tasks expected of them, so systems skills are always required to succeed in such an undertaking.

BTW, if you consider cheap shots to be "such a positive note," I wonder what you say when you are being critical and insulting.

My machine (admittedly older and less powerful than the new ones) had been earning about 800 credits/day until I changed my settings to those you had recommended. Now it is earning about 750-760, so it appears that the defaults might not actually be sub-optimum. Not that I care especially about the total number of credits, except that it is apparently a measure of the level of contribution made by a cruncher.

By the way, you still have not responded to my challenge to show me where I had ever complained about the number of tasks waiting in my buffer, which you appeared to have addressed to the exclusion of everything else in your previous posts.

Well?

deesy
30) Message boards : Number crunching : no work units (Message 67590)
Posted 4 Sep 2010 by deesy58
Post:
Quite possibly! Here are the parts where I agree with you:


- "In the real world of manufacturing, however, I have never seen a situation like that."
- "I believe that I made my situation clear."
- "...I am still not clear just exactly what point you’re trying to make."



Again, not a criticism. You can only know what you know and experience what you experience and it's pretty clear how limited that is. Not that it's hard to understand - I'm sure you'd pick it up straight away if the distinction of the differences was pointed out to you, but until then it's not a good idea to apply a limited understanding to scenarios where it doesn't apply. The saying "when the only tool you have is a hammer, every problem looks like a nail" applies here.


My! How astute of you! I believe that it was you who initially attempted to apply the KanBan analogy to the Rosetta project. Again, it appears that, because you are unable to make your point, you are resorting to an ad hominem attack. Sad!

Insofar as “limited” is concerned, I’ll match my 40+ years of experience in manufacturing, at all levels, in various capacities, in different industries, both discrete and continuous, against your 20+ years any day. But, keep in mind that I’m not the one who started this really distasteful process of bragging about my own experience while criticizing that of my adversary. What was it that was said about “dueling with an unarmed man”?


Parts that I disagree with or are misunderstood or flat wrong:

These assertions imply that I WAS crunching a task during the extended period when no Work Units were being received. That is not a valid assumption.


Just as well that wasn't stated or implied then. You seem to have a big problem with time. The above applied after you'd been re-supplied with tasks (though insufficient to fill your buffer). At the time you were totally out it was already too late to say something. Your 'set and forget' approach (perfectly legitimate) completely contradicts your need to be in work at all times. This is not the project's fault. The fault is with the parameters you set to meet your requirements.


Perhaps you have a reading problem, also. Where, in any post whatsoever, did I express any sort of problem or concern regarding the filling of my buffer? It appears that you have made an unsupported assumption. This might be understandable if English is a second language for you. Of course, I have no way of knowing if this is the case.

All of my remarks applied only to that period of time during which my machine had completely run out of work, and the Server Status page was reporting that the servers were up and running. Once Work Units began to be resupplied to my machine, my buffer filled immediately, and I never said otherwise.


No problem. Lesson learned. Now it's up to you to decide how to use the options Boinc (and Rosetta) provides to ensure your own req'ts are best met. Have fun!


I followed your suggestions regarding CPU run times and buffer sizes, and you attacked me anyway. I think I’ll return to the BOINC defaults. They wouldn’t be defaults if they were sub-optimum, would they?

Now, if one were to assume that the Project is like a very, very large factory, and that all of the manufacturing cells or work centers in that factory are able to process whatever inventory is available into whatever finished good is desired, then perhaps I can see your point.


The project indeed isn't like that (eg: Rosetta can't run Einstein jobs) but Boinc can. As such, the parallel you attempt to draw has no validity whatsoever so I won't dwell further on your convoluted (and inapplicable) what-if scenario - which will at least please Michael G! (In any case, the answer to your questions would be "no" or "none" or "not necessarily", but certainly not "yes" except in the most contrived of circumstances - which kind of explains why your what-ifs needed so many levels).


While it might be true that Rosetta can’t run Einstein jobs, it apparently is true that Rosetta runs a variety of different types of tasks, and the Rosetta license indicates that the Project gathers information about its contributors’ computers so that it can assign different types of work to different users. I do not recall expressing concern about BOINC. My concerns were in regard to Rosetta@Home.

Contrived of circumstances? I thought you had 20 years of manufacturing experience! Do you really expect us to believe that you have never once, in all that 20 years, seen a materials stockout that impacted production? That assertion would be a bit difficult for anybody to swallow.

I think we’re going to have to agree to disagree on this one, because I am still not clear just exactly what point you’re trying to make. I appreciate your efforts to enlighten me, but our experiential backgrounds are apparently too different, and we are talking past each other.


I may've been talking past you, but you weren't talking past me. That's why I'm clear about what you're saying and am sure it's not correct, while you're not clear what I'm saying at all.


Even the most irrational among us are usually certain that they are perfectly lucid and correct.

Anyway, we've talked through the whole issue pretty well in the last couple of days, even if we've made the Mod's trigger-finger get itchy along the way (sorry mod.sense!). I think if you cast your eye again over your various settings you'll be more comfortable they reflect your intentions and req'ts more fully in the light of what's been said, and you won't get anxious when things don't go smoothly in future. Somehow I'm going to guess you aren't going to stop crunching even if the site goes down completely for a couple of days. Well, if they don't, you'll know who to blame anyway! ;)


Well, that was about as clear as mud! I don’t know if you are deliberately trolling, but I know that you do not appear to have provided any clarity to the matter that led to the establishment of this thread in the first place.

While it is true that it might be somewhat valid to compare the relationship between crunchers and servers to a KanBan (pull) manufacturing system because individual contributors’ computers must request Work Units in order to receive them, it is also the case that a materials stockout and the inability of Rosetta servers to provide work to contributors are very similar issues. In either case, once the buffer (safety stock) is empty/used up, the Work Center (crunching computer) can do nothing to acquire more work until the source of that work (Rosetta servers) can provide it.
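The stockout comparison above can be put into a few lines of arithmetic. This is only a sketch under simplifying assumptions of my own: the buffer is measured in CPU-hours of queued work, every core draws from it at the same rate, and no work at all arrives during the outage. The function name `idle_hours` is hypothetical.

```python
def idle_hours(buffered_cpu_hours, outage_hours, cores=1):
    """Hours each core sits idle when the servers supply no work.

    buffered_cpu_hours: queued work at the start of the outage (safety stock)
    outage_hours:       how long the servers provide nothing
    cores:              number of cores drawing down the buffer
    """
    # How long the safety stock keeps every core busy.
    hours_of_cover = buffered_cpu_hours / cores
    # Once the cover is used up, the remainder of the outage is idle time.
    return max(0.0, outage_hours - hours_of_cover)


if __name__ == "__main__":
    # A two-core machine with 8 CPU-hours queued, facing a 9-hour outage.
    print(idle_hours(buffered_cpu_hours=8, outage_hours=9, cores=2))
    # The same outage with a two-day buffer on both cores: no idle time.
    print(idle_hours(buffered_cpu_hours=96, outage_hours=9, cores=2))
```

The point of the sketch is that a larger buffer only postpones the idle time. If the outage outlasts the cover, the Work Center stops regardless of how the requests are made.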

deesy
31) Message boards : Number crunching : no work units (Message 67576)
Posted 3 Sep 2010 by deesy58
Post:
Your deliberate way of selective reading leaves me with only one conclusion: You're a troll. I won't feed you anymore.


Your accusation is crude, rude and uncalled for. Name calling is just such a mature activity, don't you think?

What do you want me to do? Should I copy and paste the license agreement that I see when I click on your link? Without knowing what I see (and it is NOT in German, by the way) you are out of line in calling me a troll and implying that I am a liar.

Mod.Sense, this is getting a little out of control, isn't it?

deesy
32) Message boards : Number crunching : no work units (Message 67573)
Posted 3 Sep 2010 by deesy58
Post:
Thanks, Jochen. It was interesting to re-read the Rosetta license agreement.

I didn't link to the license agreement, but to the Rules and Policies.
Scroll down to the bottom of the page and read the disclaimer. At least there is one in the German translation of that page. In bold letters...


I followed the link. It led me to the license agreement. The disclaimer in bold letters just said that Rosetta was not responsible for any damage to the licensee's computer. I am not aware that anybody has claimed that crunching for the Rosetta@Home Project has damaged their computer, and it is difficult to imagine how it could happen on any modern microprocessor with heat-sensitive throttling capability, like virtually all of the newer Intel processors.

deesy.
33) Message boards : Number crunching : no work units (Message 67571)
Posted 3 Sep 2010 by deesy58
Post:
And 'entitled' is what has gotten this Country into a lot of problems. "Undocumented aliens" think they are 'entitled' to the American Dream but then don't have to work for it. They get it through 'entitlement' programs such as Welfare, Food Stamps, WIC, etc., etc., etc.! Is that a good thing? Not to my way of thinking! Thinking anyone is 'entitled' to workunits is just not correct either; each project TRIES to provide work in a timely manner just as we users TRY to provide our pc's for them to give those workunits to. IF we users are 'entitled' to workunits, wouldn't it also be true that each project is 'entitled' to a stable number of users and by extension their pc's? Meaning project A sends me work and I CANNOT move that pc to any other project, and if it crashes and I am offline, or if my internet connection goes down, I then OWE the project something because they are 'entitled' to the use of my pc's. I do not think you meant to say 'entitled' in that context and may in fact wish to rethink that whole idea, IMO. 'Entitled' would seem to imply a contract between the project and yourself, meaning they supply you work and you supply them resources; that isn't the way Boinc works. A project is no more 'entitled' to the use of my pc's than I am 'entitled' to get work from them. Now IF you can find a project that will do that, please let us know; a lot of us would like to make money for crunching!

Oh, and there are PLENTY of charities that refuse to open their books and lose donations as a result! Several years ago the United Way in Alexandria Va got busted for misappropriating money and donations went through the floor. They 'opened' their books and paid dearly for it. If a project does not have any workunits, just move on to another one until they do. Most of us have experienced long drawn-out outages at most projects over the years; Rosetta is one project that has been up far more than a lot of other big-name projects! Seti, the place it all began, is now down 2 to days PER WEEK! They also DO NOT believe in keeping their users informed of the progress of outages; when they are back up they are back up, not before, and no info will be sent out! A lot of us don't like that practice so we now crunch elsewhere. It is our choice, our pc's, our electricity, our time and our maintenance of those resources.


I think you are twisting the word "entitle" to mean something that is not at all applicable to the Rosetta@Home Project. This off-the-wall rant is an inappropriate introduction of "Tea party" politics into a technical and philanthropic forum. Perhaps it is time to lock or terminate this thread (hint, hint).

deesy

34) Message boards : Number crunching : no work units (Message 67569)
Posted 3 Sep 2010 by deesy58
Post:
deesy, read this: http://boinc.bakerlab.org/rosetta/info.php
Rosetta never promised to keep your computers busy. And they never will.
Any further questions?



Thanks, Jochen. It was interesting to re-read the Rosetta license agreement.

It is difficult to see, however, how it is, in any way, germane to the discussions taking place on this thread, as they do not appear to be license related. I did find it interesting that the Project can (and apparently does) "decide what type of work to assign to [our] computer[s]." So, perhaps the assumption that all contributors receive the exact same types of Work Units is not correct.

deesy


35) Message boards : Number crunching : no work units (Message 67563)
Posted 3 Sep 2010 by deesy58
Post:
It seems to me you guys are making a simple process very complicated.


Perhaps it is more complicated than you might realize.

We sign on as users to a project, and when the project has work for us we run it. This project is amazingly reliable in providing work for all the active users. Every once in a while, the project won't have work for all of us, for whatever reason.


This appears to be true.

We aren't entitled to any certain amount of work, and we aren't entitled to explanations from the project folks. After all, we are supporting and helping the project, not adding to their workload.


Sorry! I can't agree with you on this. We are entitled to a sufficient amount of work to keep our machines busy so long as other contributors are receiving sufficient work to keep their machines busy. If nobody has work, then that is a different matter. Your assertion is analogous to saying that we are not entitled to know how a charity spends our money after we have donated it. Where I come from, such a position would probably be considered naive.

Comparisons to corporate culture, or commercial web sites, simply don't apply. The project isn't a business, and we aren't paying customers.


I can't agree with you here, either. You can be sure that the Project management and staff have a culture, and it might not be all that different from a corporate culture. (Is Baker Labs a corporation, BTW? I'll bet it is.) A project such as Rosetta@Home has no less responsibility to keep its Web sites current and accurate than any other organization, and perhaps even more of a responsibility in light of the fact that computational resources are being donated to it free of charge. It would be arrogant to conclude that the Project need not treat its contributors with a minimal level of respect and consideration because "we aren't paying customers." We certainly are paying. We are contributing our capital resources, the energy required to operate those resources, and the time and effort required to keep those resources in operation.

Altruism is a wonderful thing, but does anybody, anybody at all, believe that the management and staff at Baker Labs are not being compensated for their efforts? They certainly must be altruistic, but you can bet that they are also benefiting from the receipt of salaries and grants, the respect of their peers, the opportunity to publish papers and gain promotions within their organizations, and perhaps even future employment opportunities with pharmaceutical companies.

deesy
36) Message boards : Number crunching : no work units (Message 67561)
Posted 3 Sep 2010 by deesy58
Post:
I guess I still don’t take your point. Perhaps you have simplified it too much. ;-)

You say: “The task you're crunching is the only one in 'production'. That's not affected by whether you have 1, 10, a million or zero back-up tasks.” (Emphasis added)

You also say: “… let's say the target is to have 10 tasks available after the one in progress.” (Emphasis added)

These assertions imply that I WAS crunching a task during the extended period when no Work Units were being received. That is not a valid assumption. When my Windows Task Manager indicates that the percentage of CPU usage consumed by MiniRosetta is zero, then isn't it clear that NO crunching is being accomplished on my machine? Zero? Zip? None? Nada? When the buffer empties out, the crunching stops.

From the perspective of my particular Work Center (computer), if this happens, then production has ceased on my machine. I am no longer crunching numbers for the Rosetta@Home Project. Since it was obvious that some number of users had also experienced an interruption in the flow of Work Units, then didn’t their production also cease? Perhaps the Rosetta@Home Project continued to function, but it certainly was not at the same capacity, because that capacity must be measured by the number and type of CPUs making up the grid, independent of the number of Work Units available to some contributors in the form of “back-up stock.”

I believe that I made my situation clear. My computer received NO Work Units for a sufficient amount of time that it was no longer able to crunch numbers for the Rosetta@Home Project. The buffers had been exhausted, and no new work was being received.

Now, if one were to assume that the Project is like a very, very large factory, and that all of the manufacturing cells or work centers in that factory are able to process whatever inventory is available into whatever finished good is desired, then perhaps I can see your point. In the real world of manufacturing, however, I have never seen a situation like that.

Typically, a Work Center or Manufacturing Cell can only perform a certain stage of the manufacturing process, and would not be able to process materials or items that are not appropriate to the machines and worker skills. They also are usually limited in the scope of the items they can process (limited number of sizes, designs, or materials).

If I am manufacturing a certain type of (for example) fasteners, and some of these fasteners are (for example) made of Titanium, and some of these Titanium fasteners (for example) require a Teflon sleeve that is set up as a KanBan item, then what happens to the Work Center’s production if no Teflon sleeves are available in Inventory? What if this happened in spite of the number of KanBan baskets of sleeves normally kept in stock? Wouldn’t that particular Manufacturing Cell or Work Center have to shut down? What would that mean to the workers who staff that cell or center? What would be the impact on the Production Manager? How about the customers who might be waiting for the product? What is the overall effect on the company as a whole? Hasn’t the cessation of production in one or more Work Centers or Manufacturing Cells adversely affected overall production levels and, thereby, financial performance?

Wouldn’t it be true that, regardless of the number of backlog Work Units maintained in contributors’ buffers, the project would be adversely affected if the number of running CPUs in the grid were to suddenly decrease?

I think we’re going to have to agree to disagree on this one, because I am still not clear just exactly what point you’re trying to make. I appreciate your efforts to enlighten me, but our experiential backgrounds are apparently too different, and we are talking past each other.

deesy
37) Message boards : Number crunching : no work units (Message 67554)
Posted 2 Sep 2010 by deesy58
Post:
Say you run one task at a time and you want to have another spare task made available to you as soon as you start the first one. Think of it as a KanBan in ERP terms. You call for it and you get it. Great.

But if you don't get it, you call for it again 10 minutes later, then again and again and finally after an hour, let's say, (and 6 'failures') it arrives. Is that 6 failures or not a problem at all because your first job takes 3 hours to run and the new one arrived well before the in-progress one finished?

It's a failure because it doesn't meet your criteria, but if your criteria are set in order to allow for 20+ failures without facing a real problem then it's a success for your criteria and for your 'production'.

This, I believe, is the crux of your misinterpretation. You make no distinction when the distinction is actually everything.
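The retry scenario in the quoted post can be sketched as a small calculation. The function names are hypothetical, and the model assumes the first request goes out at time zero with one further request after every failure, spaced a fixed interval apart; how the BOINC client actually schedules its requests is not described in this thread.

```python
def task_arrives_in_time(retry_interval_min, failed_requests, remaining_runtime_min):
    """True if the spare task arrives before the running task finishes.

    Each failed request is followed by another one `retry_interval_min`
    minutes later, so the successful request happens after
    failed_requests * retry_interval_min minutes.
    """
    arrival_min = failed_requests * retry_interval_min
    return arrival_min <= remaining_runtime_min


def max_tolerated_failures(retry_interval_min, remaining_runtime_min):
    """Largest number of failed requests that still causes no idle time."""
    return remaining_runtime_min // retry_interval_min


if __name__ == "__main__":
    # Figures from the quoted post: retries every 10 minutes, 6 failures,
    # and a first job that takes 3 hours (180 minutes) to run.
    print(task_arrives_in_time(10, 6, 180))
    print(max_tolerated_failures(10, 180))
```

Under these assumptions the six failures cost nothing, because the spare task arrives two hours before it is needed. The calculation says nothing about the case where no request ever succeeds.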


Hmm. I think that your analogy breaks down. A more accurate analogy might be that the KanBan system tries to “pull” materials from Inventory, but the materials aren’t there. Regardless of the number of attempts made, no materials are available. As a result, the Work Station or Manufacturing Cell runs out of materials and has to stop work. Production ceases. In this scenario, it is not the IT Manager who gets a close look at the parking lot, it is the Materials Manager (assuming that the MRP program correctly generated the correct Purchase Requisition at the correct point in time). Wouldn’t the interruption of production be regarded as a very bad thing? Even if the KanBan system had sufficient materials to produce for a day or two, it wouldn’t help much in the event of a stockout on an item with a (for example) one-week lead time, right?

The IT analogy would be that production must be halted because the database and applications servers are down, and no transactions can be completed. EVERYBODY is upset because Customer Orders cannot be entered, Purchase Orders cannot be generated, MRP cannot be run, Inventory levels cannot be checked, etc., etc. When the servers are down, production often ceases. At the very least, it becomes seriously impacted.

When a Rosetta contributor uses the KanBan-like method of attempting to retrieve Work Units, and no Work Units are forthcoming, isn’t that analogous to the Materials Management situation where the Workstations run out of materials and production ceases?

In my experience, having a VP who wasn't a bit of an idiot is a rarity. Sounds like you have one of the usual ones.

First, though, you'd need to distinguish between the 'crash' over a week ago and the slow-down during all the time since. If you hadn't already assessed the difference between trivial ups and downs and built in some margin for that then you'd deserve that walk to the car. Don't sweat the small stuff and especially don't trouble the big guys with every bump and squeak. They expect you to handle that yourself.

If it's something major that requires more heavyweight intervention you can tweak some stuff (runtimes in our case here to provide more time for the big solution) or the final contingency of having a back-up project altogether. We're lucky that our machinery (Boinc) can run anything else with no changeover time.

The point being, set a safety margin that covers the small stuff and stop worrying if your safety margin is being eaten into - that's precisely what it's for. How you set your safety level depends on your situation. I'm away for half of each week so I keep 2 days usually. 1 day may be better for you if you check things each night.

If your safety margin is close to being exhausted, as long as you're sure solutions are being worked on by TPTB and you've tweaked as much as you can, it's out of our hands. At the end of the day, the loss is theirs. If you're prepared to keep a back-up project you can stay productive and return when the problem's properly solved (looking better now, and the slow validator issue seems to have gone too).


MY VP of Operations was very sharp, and a very good leader. His job was Production. If we didn’t make our production targets, his butt was on the line, so to speak. If we experienced interruptions due to IT problems, then my butt was right alongside his on the line. I can tell you that I spent more than one night, weekend and holiday in the Data Center in order to ensure that our ERP system was up and available (we ran 24/7).

I believe that I have not been affected by any slowdowns (not that I’ve noticed, anyway). When my buffer emptied out, and no new Work Units were available for a couple of days, my production ceased. I simply reported that fact on a thread that was established by another poster for the same reason. What I thought I was seeing in some of the reply posts was analogous to saying that a Work Station could not possibly be starved for materials because some other workstation was able to work just fine. To me, that was (and is) a nonsensical position. It’s kind of like saying that you couldn’t possibly be stuck in traffic on 8th Street because I am driving along just fine here on Elm Street.

As I understand it, when BOINC and Rosetta are installed, a buffer of appropriate size is created automatically. The default run time is 3 hours, but it can be adjusted up to 24 hours. The downside of having a large buffer is the possibility of missing deadlines. We are advised that increasing buffer sizes is not necessarily a good idea. We are left, then, in the situation of the manufacturing Workstation running a KanBan system. If we try to pull materials (Work Units) that are not available, then production ceases. How do we handle that ourselves? Are we not 100% at the mercy of the server systems that distribute the work to our workstations?

I don’t think it is accurate to compare a complete cessation of production with a safety margin being eaten into. I don’t watch my buffer levels closely. I shouldn’t have to. The way I noticed that my machine ran out of work was when the cooling fans got quiet. Then I saw that it was unable to acquire new Work Units. If the BOINC system had to be closely monitored in order to assure that it was working properly, then it wouldn’t be as valuable to contributors as it clearly is. I, like a lot of contributors, like to “set it and forget it” with BOINC. That philosophy works pretty well most of the time.

The bottom line for me was not that Rosetta’s servers were down. It was that the “Server Status” page indicated that they were up and running, even though there seemed to be some evidence that they really were not. As a long time IT director, I understand that things happen. What concerned me was the appearance that at least some posters were convinced that nothing had happened, and nothing was wrong.

deesy
38) Message boards : Number crunching : no work units (Message 67536)
Posted 1 Sep 2010 by deesy58
Post:
These are a few posts that I (perhaps incorrectly) interpreted to mean that at least some posters were implying that there were no problems, and no interruptions.
This was in spite of the fact that numerous crunchers were reporting that they were not receiving Work Units. Some of these posts appear contradictory to me, and perhaps might explain my apparent confusion.


I don't see anything wrong with those statements or anyone denying that Rosetta is not performing at its usual capacity. There is indeed some contradictory phrasing used as each quote is the opinion of a different volunteer. Different people will always have a different perspective on an issue.

What does it mean when some contributors stop receiving work, while others apparently continue to receive plenty? I'm asking because I want to know, and because I believe that it is a legitimate question.


Normally Rosetta puts out an average of tens of thousands of work units an hour but at the moment appears to be generating several thousand instead. When those work units become available it is a bit of pot-luck as to who gets them. If your BOINC manager calls the server just as work is released then you will get some, if your manager calls the server a couple of minutes later there might not be any left. If you had a two day cache of work units prior to the slow down then you would have had two days in which to ask the server for more work, raising the chance of you seeing no interruption. People with no cache have no spare capacity, so they either got a new task when the last one finished or they didn't.
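
The pot-luck effect described above can be sketched with a little arithmetic. This is a toy model, not how the Rosetta scheduler actually works: it assumes every work request succeeds independently with the same probability, and the function name, the one-hour request interval and the 10% success rate are all made up for illustration.

```python
def chance_of_running_dry(cache_hours, request_interval_hours, p_success):
    """Probability that every work request made while the cache drains fails.

    Toy assumption: each request independently succeeds with probability
    p_success, regardless of when it is made.
    """
    # Even with no cache, one request is made when the last task finishes.
    attempts = max(1, int(cache_hours / request_interval_hours))
    # All attempts must fail for the machine to run out of work.
    return (1.0 - p_success) ** attempts


# Hypothetical numbers: one request per hour, 10% chance of getting work each time.
no_cache = chance_of_running_dry(0, 1, 0.1)
two_day_cache = chance_of_running_dry(48, 1, 0.1)
print(f"no cache:      {no_cache:.4f}")
print(f"two-day cache: {two_day_cache:.4f}")
```

With these made-up numbers, a machine with no cache runs dry 90% of the time, while one with a two-day cache does so well under 1% of the time.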

Your question also revolves around the issue of "plenty". I think it must be a very rare individual who has not had a reduced amount of work from Rosetta in the last week, but many supplement Rosetta with other projects to keep their cores busy. It also depends on scale; if someone with 1 computer with 1 core says, "I didn't get any work" but someone with 10 computers with multiple cores says, "I got some work but less than usual" is it any surprise that the person with more cores got more work?

In the meantime, it is an absolute fact that my computer, which runs 24/7 for Rosetta@Home, received absolutely no Work Units for a period of about two or three days beginning on August 25 or 26. The buffer was emptied, and no crunching at all was accomplished during this time. I interpret that as a stoppage, not a slowdown. If I had tried to explain a "crash" of our company's ERP system to the V.P. of Operations as a "slowdown," I would have been bounced all the way across the parking lot to my car. :-|

deesy


This is a question of perspective. For you there was a stoppage. For the project there was a slowdown (processing speed dropping to half then a third of normal). In the perspective of a super computer, which BOINC simulates, your account happens to be one set of cores in the super computer that didn't get work for a time; an unfortunate slow down of the project but hardly a complete crash of a system.

Every BOINC project breaks down or experiences difficulties now and then, which is why many people have a backup project. I am not sure why you choose not to have a backup (such as World Community Grid's Human Proteome Folding project which is directly related to Rosetta), but periods of complete inactivity are an unfortunate but inevitable consequence.


Thanks! Your information is lucid and helpful.

At this time, I contribute only to Rosetta because it appears to me that this project is closest to the sort of applied science that might help to find a cure for a deadly disease that took a very close loved one several years ago. I have a good GPU, and I used it on the Collatz Conjecture Project for a while. Then I realized that the conjecture could never be proved, no matter how many years were spent trying. I "folded" for the "other guys" for several years, but that work appears to me to be more theoretical science, and less applied science. Credits mean little to me, except as a measure of how much my machine is contributing. I'm trying to assist the science, not engage in a competition.

deesy

39) Message boards : Number crunching : no work units (Message 67524)
Posted 1 Sep 2010 by deesy58
Post:
Sid Celery said:

No-one is trying to convince anyone there are no problems . . .


Jochen said:

There's nothing wrong and absolutely nothing you could do.


Evan said:

No, there is a problem with the servers. Look on the server status page and you will see they have taken nearly all of them off line while they make a proper fix.


Michael Gould said:

It seems that on the rosetta home page, the "Server status" box almost always says "scheduler running," ...


Chris Holvenstot said:

Maybe, I don't know - but I can tell you from what I have seen this whole thing amounts to nothing more than a minor slowdown - and I suspect that a lot of the credit for it being a slowdown instead of an outage goes to the Admins for holding things together.


Chilean said:

btw, I still have plenty of WUs in queue.

(implying that work interruptions must not be real ... otherwise why say what was said?)

Sid Celery said:

Unless your aim is to run out of work I can't see the problem. It's the responsible thing to do.


and also:

"No work" is false, as you confirm above.


These are a few posts that I (perhaps incorrectly) interpreted to mean that at least some posters were implying that there were no problems, and no interruptions.
This was in spite of the fact that numerous crunchers were reporting that they were not receiving Work Units. Some of these posts appear contradictory to me, and perhaps might explain my apparent confusion.

What does it mean when some contributors stop receiving work, while others apparently continue to receive plenty? I'm asking because I want to know, and because I believe that it is a legitimate question.

In the meantime, it is an absolute fact that my computer, which runs 24/7 for Rosetta@Home, received absolutely no Work Units for a period of about two or three days beginning on August 25 or 26. The buffer was emptied, and no crunching at all was accomplished during this time. I interpret that as a stoppage, not a slowdown. If I had tried to explain a "crash" of our company's ERP system to the V.P. of Operations as a "slowdown," I would have been bounced all the way across the parking lot to my car. :-|

deesy
40) Message boards : Number crunching : no work units (Message 67503)
Posted 1 Sep 2010 by deesy58
Post:
So, what did you increase your runtime to? That's all you really needed to respond to. It's the only thing we can control and contribute from our end.


At your suggestion, I have increased my runtime from 4 hours to 12 hours. I increased my buffer from .5 days to 1.5 days. In the past, I had a 24-hour runtime, and had problems. I shortened it to 4 hours when I upgraded from Windows XP to Windows 7, and reinstalled BOINC. Is there an optimum? If so, what is it, and why?

Yes, I have been receiving WU's for the past three days. When I originally posted on August 27, however, I had received none at all for about 36 hours. Is this not the proper place to report problems and ask questions? Don't you believe that it might be insulting to tell contributors that there really are no problems when they report that they are not receiving work?

As I said, my problem has been resolved (by Rosetta and/or BOINC). What is being gained by continuing to assert that there were no interruptions in the assignment of Work Units when there clearly was an interruption, even though it might now have been rectified? I guess I don't see the point.

deesy





©2024 University of Washington
https://www.bakerlab.org