Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
> Each task has 3 processes using the same amount of RAM, but only one of those 3 is using the CPU; the other two are near zero CPU time. So, 6 tasks, 18 processes using 1–2 GB each.
That doesn’t sound right, but as I don’t run BOINC on Linux I can’t add more…
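For anyone who wants to check the observation above on their own Linux host, a minimal sketch follows. Note that RSS double-counts memory pages shared between related processes, so the three processes of one task can look like triple the real footprint; summing PSS (proportional set size) gives a truer total. The `rosetta` name pattern is an assumption — adjust it to whatever process names appear on your machine.

```shell
# Show CPU use and resident memory for each Rosetta-related process.
# The name pattern is an assumption; match what `top` shows you.
ps -eo pid,pcpu,time,rss,comm --sort=-pcpu | grep -i rosetta

# RSS double-counts pages shared between a task's processes.
# PSS divides each shared page among its sharers, so summing PSS
# across processes gives the real combined footprint:
for pid in $(pgrep -f rosetta); do
  awk -v p="$pid" '/^Pss:/ {kb += $2}
       END {printf "PID %s: %.0f MB (PSS)\n", p, kb/1024}' \
      "/proc/$pid/smaps" 2>/dev/null
done
```

If the PSS totals come out far below 3 × RSS, the "18 processes using 1–2 GB each" is mostly the same memory counted three times.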
Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
> should I set up any firewall rules?
Assuming it’s the same as on Windows: the only thing that requires Internet access is the client, and it only makes HTTP(S) connections to the project servers. So you need to open tcp/80 and/or tcp/443 outbound (plus udp/53, or whatever else your DNS needs, if that’s not handled by a separate resolver); everything else can be blocked.
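On a Linux host that does nothing but crunch, the ports described above could be expressed with `ufw` roughly as below. This is a sketch of one possible rule set, not project guidance: it assumes a dedicated machine doing DNS lookups directly over udp/53, and a general-purpose machine will need more than this.

```shell
# Default-deny both directions, then allow only what the BOINC
# client needs (assumes a dedicated cruncher).
sudo ufw default deny incoming
sudo ufw default deny outgoing
sudo ufw allow out 53/udp    # DNS (skip if a local resolver handles it)
sudo ufw allow out 80/tcp    # HTTP to the project servers
sudo ufw allow out 443/tcp   # HTTPS to the project servers
sudo ufw enable
```

A default-deny outbound policy is unusual on desktop machines, which is why the ports "seem open by default" — most distributions only filter inbound traffic out of the box.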
Trotador · Joined: 30 May 09 · Posts: 108 · Credit: 291,214,977 · RAC: 4
> Hello, in the last few hours I had 22 WUs which stopped with an error a few seconds after starting.
278 fails here
Sid Celery · Joined: 11 Feb 08 · Posts: 2113 · Credit: 41,065,024 · RAC: 21,613
> Regarding my 2 day cache, I do not consider that excessive given that my £35 SBC running off an SD card (OK, it's an SSD now but when I started it was an SD card) has a more reliable uptime than the servers running the project. In the time I've been doing work for Rosetta I have seen several periods of downtime where work units have not been deployed for days at a time. I don't mind having a Pi dedicated to the task so long as it's doing real work and not just heating the room.
For an individual host's circumstances it's fine if you have a specific reason, but as a general rule it is excessive.

About a year ago we had a lot of new hosts arrive from Seti with huge multicore machines (which you don't have) whose users were accustomed to large caches, because there were no restrictions there, and who ran with the shortest runtimes (which you don't). They were hoovering up all available tasks, to the exclusion of everyone else, running them for the shortest, least productive time, sending them back almost immediately, then complaining they couldn't re-fill their oversize caches again. Simultaneously, a whole bunch of very keen new users with more reasonable settings couldn't get any tasks to run at all and had their enthusiasm knocked out of them. With tasks in short supply, no-one was happy.

The solution was to cut deadlines from 7 to 3 days, to force the immediate aborting (for re-issue) of tasks that couldn't make the deadline, and to remove the possibility of 1-hour runtimes so tasks ran 2 hours minimum with a default of 8 hours. That way the available tasks didn't sit in offline caches that wouldn't run for a week while others had empty cores waiting for work, and the tasks that did come back were more productive. The result was immediate availability of work for everyone, no more shortage of tasks, and more rapid task turnaround of greater value to the project.

I only say it out loud now because all those reasons still apply and we're tight on tasks in the queue again, so it helps to eke them out just a little longer. Obviously, your (currently) 18 tasks with 4 running to default hours doesn't do any harm individually - it matters more as a general rule, like when 32- & 64-core machines had 2-3,000 tasks in their cache, each running just an hour (or less). My old 8-core used to store around 50 tasks; now my 16-core keeps nearer 55-60. All proportionate.
Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
> I have seen several periods of downtime where work units have not been deployed for days at a time. I don't mind having a Pi dedicated to the task so long as it's doing real work and not just heating the room.
If you have access to the BOINC Manager application, you might try adding World Community Grid. That reportedly has an ARM Linux application for its OpenPandemics sub-project, and much smaller work units.

Otherwise it might be worth getting in touch with Balena to explain the issue and see if they would consider adding something for a different project in the same way they did for Rosetta (though they may find it harder to convince IBM than Baker Lab to let them hack at their applications; SiDock is another similar project without an ARM build (yet) that might benefit from that kind of effort).

Do bear in mind that we are here to help the project, not the other way round. If they happen not to have any work that needs doing at any given time, it’s their choice not to make use of a resource that’s available to them, not a cause for us to complain.

I’m not sure which part of the U.K. you’re in where an idle Pi is useful for heating, but I think I’d like to move there… (I’ve got four 8-cylinder Xeons pulling 500 W out the wall and barely keeping the place warm…)
Garry Heather · Joined: 23 Nov 20 · Posts: 10 · Credit: 362,743 · RAC: 0
I did reach out to Balena about their solution (their response: https://forums.balena.io/t/fold-client-offers-unsupported-project-climateprediction-net/218911/9?u=goto_gosub) and I subsequently tried a couple of other projects included in their manager (cannot remember which now), and none of those worked either. Hopefully without sounding disrespectful to Balena, I do not think they are going to make any changes to how their client works any time soon, for a number of reasons, not least the difficulty of getting other projects on board and committing time and resources to making someone else's project compatible.

On a slightly different note, at the time of writing the server status as reported on the website appears to be OK, but my rig is reporting internet connectivity and no response from the servers again. Ho hum.
Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
> Ho hum.
You’ve got 4 tasks running and 20 ready to start. That’s 24 more than a lot of other people…
Garry Heather · Joined: 23 Nov 20 · Posts: 10 · Credit: 362,743 · RAC: 0
This is true, but let's have some context here. I just wanted my single Pi to be kept busy, because the cost of leaving it on 24/7 is not insignificant to me. There are some people here with multiple monsters processing work units. My solitary Pi was never going to make a dent in their requirements, so please do not think badly of me for trying to cache enough to keep it busy for a couple of days. I will complete the units currently being processed, but suspect that this project is not for me. I have aborted my cached units back into the pool.
Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Nobody was asking or expecting you to abort the jobs – but what’s done is done, and cannot be undone. It makes no difference to the project who runs them, so please don’t be dissuaded from participating. The ones that weren’t resends are already out to other hosts.

My machines are out of Rosetta work primarily because of the way I chose to set them up, and I’m too lazy to go round and change them all just to work around a bug in the work unit configuration. It’s arguably better that machines capable of running the ‘big’ tasks don’t pick up the ‘small’ ones, so that less-powerful machines do have a chance to run something.
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 11,708,282 · RAC: 22,464
Are you one of those pricks who said "made you look" in the school playground as a kid? If so, how's the broken nose?
Mr P Hucker · Joined: 12 Aug 06 · Posts: 1600 · Credit: 11,708,282 · RAC: 22,464
> Duplicate post deleted.
You'd think there'd be a delete button. Who designs these things?
robertmiles · Joined: 16 Jun 08 · Posts: 1232 · Credit: 14,265,269 · RAC: 4,483
> Duplicate post deleted.
> You'd think there'd be a delete button. Who designs these things?
There's a workaround. If you use the same way to mark it as a duplicate every time, the software will see it as multiple identical posts, and delete all but one of them.
mrhastyrib · Joined: 18 Feb 21 · Posts: 90 · Credit: 2,541,890 · RAC: 0
> should I set up any firewall rules?
> Assuming it’s the same as on Windows:
Those ports seem to be open by default, so I guess that I'm okay. Thanks for your reply.
mrhastyrib · Joined: 18 Feb 21 · Posts: 90 · Credit: 2,541,890 · RAC: 0
> I have seen several periods of downtime where work units have not been deployed for days at a time.
This kind of reminds me of the hoarding that takes place here (even prior to the pandemic). There's a supply problem, which leads to hoarding, which makes it worse. Kind of remarkable that we have too much unused CPU time to go around.
mrhastyrib · Joined: 18 Feb 21 · Posts: 90 · Credit: 2,541,890 · RAC: 0
Woah, dude, where did that come from? Over the use of an "at" symbol? If you get spun up that hard, that fast over what I write, maybe the better solution is to stop reading my posts, okay?
mrhastyrib · Joined: 18 Feb 21 · Posts: 90 · Credit: 2,541,890 · RAC: 0
Don't take it personally. There's a three-roll limit on toilet paper here because of some hoarders (not you). That's the rule. But best practice for the community at large is for folks to take less, if they can. If everybody does it, then there is more likely to be a ready supply available, including for you. It's something worth repeating, just so everyone is aware of it.
Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1670 · Credit: 17,462,930 · RAC: 24,679
> A few days in and the impact of the mis-configured Work Units is becoming clearer. Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering. In the past it has taken several days for In progress numbers to get back to their pre-work-shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta.
> I don't know if I have the time that's required to provide the space that is needed.
You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…

For all of the latest & greatest systems there are, there are an awful lot more older, much more resource-limited systems.

Grant
Darwin NT
mrhastyrib · Joined: 18 Feb 21 · Posts: 90 · Credit: 2,541,890 · RAC: 0
Looks like the profile of a dead body lying in a shallow grave. How metaphorical.
mikey · Joined: 5 Jan 06 · Posts: 1895 · Credit: 9,109,972 · RAC: 6,489
> A few days in and the impact of the mis-configured Work Units is becoming clearer. Looks like the amount of work being done has dropped by almost a third, and isn't showing any signs of recovering. In the past it has taken several days for In progress numbers to get back to their pre-work-shortage numbers. And that's without running out of work again only a few hours after new work started coming through (which occurred this time). Say hello to two less hosts after they finish their current tasks, @Rosetta.
> I don't know if I have the time that's required to provide the space that is needed.
> You’re not alone. Look at the recent results graphs – ‘tasks in progress’ has dropped by around 200,000 (a third)…
Just means more work for the rest of us!!
jsm · Joined: 4 Apr 20 · Posts: 3 · Credit: 76,014,871 · RAC: 64,193
Bandwidth usage massively increased in March

I migrated to Rosetta from Seti almost exactly one year ago. For eleven months there was little impact on my capped 50 GB bandwidth allowance, but in March the usage has more than doubled. I am using the same 6 computers and the same preferences, so nothing on my side has changed. When my ISP notified me, halfway through March, that I had hit the cap, I installed Wireshark (after a difficult setup) to capture packets at the router rather than at specific computers. Imagine my horror when I found that the culprit was Rosetta, using over 1 GB per 6 hours.

This is unsustainable, and I will either have to shell out for an expensive unlimited contract (because I have an Ultima connection at over 100 Mbps) or cut back on Rosetta work. Has there been a significant project change which could be the cause of this increased usage, or should I be looking for another cause? Any suggestions most welcome. I have clawed my way to league position 599 and would like to break 500 if possible.

Capt
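Before committing to a bigger contract, it may be worth cross-checking Wireshark's figures against the client's own accounting, and against the cap. A rough sketch follows; it assumes a reasonably recent BOINC client, and the `boinc.bakerlab.org` capture host is an assumption taken from the project URL, so substitute whichever addresses Wireshark actually showed.

```shell
# Recent BOINC clients keep their own daily upload/download totals:
boinccmd --get_daily_xfer_history

# Scale of the problem: ~1 GB per 6 hours, over a 30-day month:
echo "$(( 24 / 6 * 30 )) GB/month"   # 120 GB/month against a 50 GB cap

# To keep watching just the project traffic without a full Wireshark
# session, capture it with tcpdump and total it up afterwards:
sudo tcpdump -i any "host boinc.bakerlab.org and (port 80 or port 443)" \
     -w rosetta.pcap
capinfos rosetta.pcap    # summary includes total bytes captured
```

If the client's own transfer history agrees with the router capture, the change is on the project side (larger task payloads, for example) rather than something misbehaving locally.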
©2024 University of Washington
https://www.bakerlab.org