Message boards : Number crunching : Large numbers of tasks aborted by project killing CPU performance
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1684 · Credit: 17,950,321 · RAC: 23,118
[quote]I have the setting as "Store at least 0.2 days of work" and "Store up to an additional 0.3 days of work"[/quote]
You would be better served with "At least 0.2 days" and "Additional 0.01 days".
[quote]I don't mind short deadlines. Why did so many WUs get put in my queue, though?[/quote]
Detaching & re-attaching resets everything (except for your Credit), so all the processing performance & time history was no longer there, and the client downloaded work based on the defaults, not past history. However, a while back the project made changes to how the Estimated completion times are worked out. If those changes were working correctly, you should only have received as much work as your cache settings allowed - i.e. half a day's worth. I think the project needs to check that their implementation is in place and working as it should; with your cache set as you describe, there is no way you should have received that much work.
Grant
Darwin NT
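To put rough numbers on why the smaller additional buffer matters - a sketch only, assuming a hypothetical 8-core host and Rosetta's default target CPU run time of about 8 hours per task (the client's actual estimates will differ):

    "At least 0.2 days" + "Additional 0.3 days"  = 0.5 days of work kept per core
    8 cores x 0.5 days x 24 h                    = 96 core-hours of queued work
    96 core-hours / 8 h per task                 = about 12 tasks

    "At least 0.2 days" + "Additional 0.01 days" = 0.21 days per core
    8 cores x 0.21 days x 24 h                   = about 40 core-hours = about 5 tasks

With a 3-day deadline, the smaller buffer leaves far more slack for queued tasks to finish on time.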
MarkJ · Joined: 28 Mar 20 · Posts: 72 · Credit: 25,238,680 · RAC: 0
[quote]I don't mind short deadlines. Why did so many WUs get put in my queue, though?[/quote]
There was a bug with work fetch when it refills the cache if you have an app_config with a max_concurrent statement. It was supposedly fixed in 7.16.6. That may or may not be relevant to your situation.
BOINC blog
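For anyone unfamiliar with the file MarkJ mentions: app_config.xml lives in the project's folder under the BOINC data directory (for Rosetta that is typically projects/boinc.bakerlab.org_rosetta). A minimal sketch - the app name "rosetta" and the limit of 4 are examples only; check the app name your client actually reports in its event log:

    <app_config>
       <app>
          <name>rosetta</name>                 <!-- app name as the project reports it -->
          <max_concurrent>4</max_concurrent>   <!-- run at most 4 of these tasks at once -->
       </app>
    </app_config>

If you hit the over-fetch bug described above, the fix is upgrading the client to 7.16.6 or later rather than changing this file.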
[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 1994 · Credit: 9,633,537 · RAC: 7,232
[quote]There I was, all set to start with "Now, now, children, that's not how REAL science works." But I'm still planning to include my confession here. Maybe it's all my fault? Is there a historian of science in the house?[/quote]
I'm waiting for you, Nobel Prize in Chemistry
mikey · Joined: 5 Jan 06 · Posts: 1895 · Credit: 9,177,195 · RAC: 3,176
[quote]There I was, all set to start with "Now, now, children, that's not how REAL science works." But I'm still planning to include my confession here. Maybe it's all my fault? Is there a historian of science in the house?[/quote]
ALL workunits that are aborted by you OR the Project are put back into the pool of available workunits, so they are not... LOST or "tossed data". Boinc is over 20 years old now and has come a long way; your thinking of how it works shows how "amateurish" you really are!!
[quote]But that's a natural segue to the actual problem I reported on my last visit here. That was an announced and scheduled outage, though badly announced (and possibly linked to an unscheduled outage about a day later?). Not only was the announcement not pushed to the clients (which would have allowed us, the volunteers, to make some scheduling adjustments), but the announcement wasn't clear about the changes. If they are just adding encryption for connections to this website, that's one thing. Not exactly silly, and quite belated, but there may be some bits of personal information here, so why not? However, the wording of the description of the upgrade causing the outage makes it sound much heavier. Encryption for a website is trivial, but encryption for large quantities of data is something else again. Quite possibly it would involve a significant purchase of encryption hardware for the project side. (One of the researchers I used to work for designed such chips as entire families before he returned to academia. Our employer lost interest in "commodity" chips, so it's probably become yet another market niche dominated by the long-sighted Chinese.) (Which actually reminds me of the first time I worked at the bleeding edge of computer science. Ancient history, but the punchline is that it was obvious (to me, at least) that the project in question would never make a profit, and the entire division (with some of my friends) was dumped and sold cheap to HP a few years after I had moved along. (CMINT))

Is there a link between the 3-day deadline and the encryption? From an HPC perspective, the answer is probably yes. Throwing away lots of data becomes a larger cost, a larger waste of resources, when you have also invested in encrypting that data before you threw it away. It also raises questions from a scientific perspective. For one thing it indicates the results are probably not being replicated, which is a concern in a situation like this, but it might indicate worse problems.

Which is actually a segue to my confession... The story is buried in the history of BOINC now, going back about 25 years. Way back then, there was a project called seti@home that had a heavy client. In discussions on the (late and dearly departed) usenet I became one of the advocates for the kind of lightweight client that BOINC became, while seti@home became just another BOINC subproject. If there is a historian of science in the house, I think it would be interesting to find out where the BOINC design team got their ideas... Maybe some part of it is my fault? There was a company named Deja News that had a copy of much of usenet, and those archives were sold or transferred to the google later on... (I actually "discovered" the WWW on usenet (around 1994, during another academic stint) when I was searching for stuff about the (late and not so dearly departed) Gopher and WAIS knowledge-sharing systems.) (But I'm pretty sure the main linking guy at Berkeley must also be late by now. He was already an old-timer way back then.)

Now I'm the old-timer, and I'm still wheezing about the silly 3-day deadlines.[/quote]
You OBVIOUSLY have NOT been a Boinc cruncher for very long, as EVERY Project has experienced shortages of workunits over time; even Seti, the first Boinc Project, has shut down and is no longer creating any new workunits. In fact there are over 100 Boinc Projects that have started and closed for one reason or another, but yet Rosetta chugs right along, still producing workunits!!

As for 3-day deadlines, you REALLY need to expand your crunching to other projects; more than a couple have 2-day deadlines that are met by 99% of their users with no problem. YOUR problem seems to be your unwillingness to adjust to the fact that not every Project is the same or run in the exact same way. One basic 'rule' of Boinc is to always set your workunit cache to a very small amount until your computer and the new Project can figure out how long each workunit takes to run and what cache size works for you. SEVERAL people have already said that, but you STILL seem to be saying the same old thing... adjust to ME, not me adjust to you!!!

In short, if you can't handle the 3-day deadlines then maybe Rosetta isn't the Project best suited for you and your resources; it works for 99% of the people who are here, so it seems you are the outlier. Or in teacher terms, YOU are the one screwing up the curve!!! If you prefer LOOOOOOOONG deadlines, why not try Climate Prediction, as some of their workunits take over 365 days to complete!!
Chilean · Joined: 16 Oct 05 · Posts: 711 · Credit: 26,694,507 · RAC: 0
[quote]You OBVIOUSLY have NOT been a Boinc cruncher for very long, as EVERY Project has experienced shortages of workunits over time; even Seti, the first Boinc Project, has shut down and is no longer creating any new workunits. In fact there are over 100 Boinc Projects that have started and closed for one reason or another, but yet Rosetta chugs right along, still producing workunits!![/quote]
I admire y'all's patience dealing with him lol.
Sven · Joined: 7 Feb 16 · Posts: 8 · Credit: 222,005 · RAC: 0
Hmm, I must say that I can see a bit of truth in the suggestion to have longer deadlines. As long as it is sometimes mandatory to switch off computers over weekends, the 3-day deadline is reached very fast. A 5-day deadline instead would help a great deal to get crunching done in time. I usually adjust my system in a way that makes successful crunching possible - for example, store at least 0 days of work and up to an additional 0.1 days (the standard setting). But the 3 days over weekends are always problematic.
mikey · Joined: 5 Jan 06 · Posts: 1895 · Credit: 9,177,195 · RAC: 3,176
[quote]Hmm, I must say that I can see a bit of truth in the suggestion to have longer deadlines. As long as it is sometimes mandatory to switch off computers over weekends, the 3-day deadline is reached very fast. A 5-day deadline instead would help a great deal to get crunching done in time.[/quote]
We are doing COVID-19 research right now; 1,000 people are dying per day in the world, and longer deadlines mean more delay in the info that might just save some people. The way you have your cache set up, if you return all the workunits on Friday when you leave your PCs, then come Monday morning you should have all of the ones you crunched over the weekend done and ready to return on time. An option might be to increase your cache on Friday so your PC can crunch more workunits that you get Friday afternoon/evening, and then reduce your cache again on Monday morning when you return all those units you crunched over the weekend.
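One way to do that Friday/Monday cache switch locally (a sketch only - the 0.1 and 2.0 day values are just examples) is a global_prefs_override.xml file in the BOINC data directory:

    <global_prefs_override>
       <work_buf_min_days>0.1</work_buf_min_days>               <!-- "store at least" -->
       <work_buf_additional_days>2.0</work_buf_additional_days> <!-- "store up to an additional" -->
    </global_prefs_override>

After editing it, run boinccmd --read_global_prefs_override (or use the "read local prefs file" option in BOINC Manager) so the client picks up the change; on Monday, drop work_buf_additional_days back down and re-read the file.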
shanen · Joined: 16 Apr 14 · Posts: 195 · Credit: 12,662,308 · RAC: 0
Hmm... If that guy has become a key manager of the Rosetta@home project, then (1) no wonder the project stopped sending tasks, and (2) this project is probably in its death throes.

Just reading another book about black-hat hackers. It has got me wondering if the real problem with Rosetta@home is that we've all been "recruited" for mining BitCoins or some similarly worthless task. That could actually be related to the push for more encryption, eh? Plus I see how it could explain the peculiar way the downloads were working, almost as though someone had imposed a paged memory system on the project, with data pages around half a GB each, notwithstanding large numbers of ostensibly different projects working on the same data.

Security is a chain, and the attackers are always looking for the weakest links. From reading the comments in this thread, some of which seem to be from honchos at Rosetta@home, the weak links seem pretty obvious...

Much as I disliked some of the management policies of WCG, it looks like I should switch back there. It might be amusing to find out if any of my suggestions were ever implemented. Rosetta@home seems to have clearly crossed into the territory of even more poorly managed projects. I've seen a couple of references to WCG in threads here, and it's a long-term project with some degree of corporate support (even if IBM is only a shadow of the great company it was when I was young). (But I still think HP has fallen harder and faster...)
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
Falconet · Joined: 9 Mar 09 · Posts: 353 · Credit: 1,227,479 · RAC: 753
Curious, what were those suggestions? |
yoerik · Joined: 24 Mar 20 · Posts: 128 · Credit: 169,525 · RAC: 0
[quote]Just reading another book about black-hat hackers. It has got me wondering if the real problem with Rosetta@home is that we've all been "recruited" for mining BitCoins or some similarly worthless task. That could actually be related to the push for more encryption, eh? Plus I see how it could explain the peculiar way the downloads were working, almost as though someone had imposed a paged memory system on the project, with data pages around half a GB each, notwithstanding large numbers of ostensibly different projects working on the same data.[/quote]
No comment on the security/bitcoin question - but I am also curious about your criticisms of WCG. A quick search of the WCG forums brought up your concern over a lack of a server status page - I just made a new thread bringing up the issue, thanks to you. https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,42562_lastpage,yes#631089

If there are any other concerns about that project, I would certainly appreciate it being brought back into the forums there.
mikey · Joined: 5 Jan 06 · Posts: 1895 · Credit: 9,177,195 · RAC: 3,176
[quote]Hmm... If that guy has become a key manager of the Rosetta@home project, then (1) no wonder the project stopped sending tasks, and (2) this project is probably in its death throes.[/quote]
You must have missed the news article saying that mining has gone through the floor during the pandemic and is still dropping, so with the INCREASE in Rosetta users it's unlikely that WE are in fact mining. Tin foil hats are cool but not always needed.