Outage notice

Message boards : News : Outage notice

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99361 - Posted: 21 Oct 2020, 15:18:26 UTC - in response to Message 99360.  

Looks like today’s was only a small batch, as it’s already run out…

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99362 - Posted: 21 Oct 2020, 18:36:09 UTC - in response to Message 99360.  

New tasks started loading at 7:30 UTC today. Looks like we're rolling again.

Some notification would be nice in the future since I spent a good amount of time trying to figure out what was wrong with my configuration yesterday when there wasn't any issue on my end.


There's no reason to expect a continuous workflow, and why should they have to tell you when they're busy with the science? Just look in the Messages tab of BoincTasks, or the Event Log in BOINC Manager, and you'll see it saying "server has no tasks available". Or check the Rosetta home page: there's a big blue number on it telling you how many million tasks are left. We collectively get through half a million a day.

I suggest you join an additional project. You can tell BOINC which you prefer, e.g. set Rosetta to a weighting of 1000 and the other project to 1 if you like; 1000 and 1 means it will do 1000 times more Rosetta than the other one. Then when you see it getting loads from the other project, you know it's just Rosetta that's offline. You can even set a project to 0, which means it will only get tasks from there if it has to.

If, for example, you just want to do coronavirus work, you could join World Community Grid and select only those types of tasks. They also do cancer, TB, African rainfall and immunity projects if you're interested.
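The weighting arithmetic above can be sketched like this. It's a simplified model, assuming each project's slice of processing time is its weighting divided by the sum of the non-zero weightings of projects that currently have work, and that a project set to 0 is only used when nothing else has work. BOINC's real scheduler also balances things out over time, so treat this as an approximation; the function name `share_fractions` is hypothetical.

```python
def share_fractions(shares, has_work):
    """Approximate fraction of processing time each project gets.

    shares: dict of project name -> resource share (weighting)
    has_work: dict of project name -> whether the project is currently sending tasks
    """
    # Projects with a non-zero weighting and available work compete normally.
    active = {p: s for p, s in shares.items() if s > 0 and has_work.get(p, False)}
    if active:
        total = sum(active.values())
        return {p: active.get(p, 0) / total for p in shares}
    # Zero-weighted ("backup") projects are only used when nothing else has work.
    backups = [p for p, s in shares.items() if s == 0 and has_work.get(p, False)]
    if backups:
        return {p: (1 / len(backups) if p in backups else 0.0) for p in shares}
    return {p: 0.0 for p in shares}


shares = {"Rosetta": 1000, "WCG": 1}
print(share_fractions(shares, {"Rosetta": True, "WCG": True}))
print(share_fractions(shares, {"Rosetta": False, "WCG": True}))
```

With both projects supplying work, Rosetta gets 1000/1001 of the time; when Rosetta is dry, the other project picks up everything, which is exactly the signal that tells you Rosetta is the one that's offline.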

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 99363 - Posted: 21 Oct 2020, 22:31:10 UTC - in response to Message 99359.  

I don't know why the server status of this project and most others is hiding information

The information is only updated every few hours, in which time 100k tasks could be made available and exhausted.

As to why it's only updated every few hours, I have no idea, but it is and has been for years.
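A quick sanity check of that, using the rough rate given earlier in the thread (about half a million tasks a day) and assuming consumption is steady, which in practice it isn't:

```python
def hours_to_drain(tasks, tasks_per_day):
    """Hours needed to use up `tasks` at a constant daily consumption rate."""
    return tasks / tasks_per_day * 24


print(round(hours_to_drain(100_000, 500_000), 1))  # 4.8
```

So a batch of 100k could plausibly appear and be gone in under five hours, which is comparable to the gap between status page updates.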

But essentially, you're right. We're available to grab tasks 24/7. If there are tasks to grab, we get them. If there aren't, we don't.
If people are happy to wait for tasks from only this project, fine. If not, get some from elsewhere.
It's all so simple I don't know why I'm bothering to say so. And yet, people still ask.

Personally, I increased my runtime to 12hrs from 8hrs, so the few I get last me longer and I don't need so many so other people can get a share too.
I'm slipping in a few WCG too, so when tasks come available again, the debt can be repaid to Rosetta with less interruption.
Again, it's so simple I don't know why I'm bothering to write that.
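The runtime arithmetic, assuming every task runs for exactly its target time and each CPU thread works on one task at a time (real tasks end a little early or late); the 16-thread machine is hypothetical:

```python
def tasks_per_day(cpu_threads, target_hours):
    """Tasks a machine consumes per day if each thread runs one task to its target runtime."""
    return cpu_threads * 24 / target_hours


for hours in (8, 12):
    print(f"{hours} h runtime: {tasks_per_day(16, hours):.0f} tasks/day")
```

Going from 8 to 12 hours cuts the number of tasks a machine asks for by a third, leaving more in the queue for everyone else.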

Running, completing and returning tasks promptly is thousands of times more important to the researchers than it is to any of us.
If they need any work done, no-one will be more motivated to provide the tasks than the researchers. That's obvious too.

And so, we wait.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99364 - Posted: 22 Oct 2020, 0:34:33 UTC - in response to Message 99363.  

I don't know why the server status of this project and most others is hiding information

The information is only updated every few hours, in which time 100k tasks could be made available and exhausted.


Update frequency is irrelevant. The problem is the server status page always says 30,000 available, which is wrong. The main homepage can say something like 15,000,000! I can only assume there's a high speed buffer which is what downloads the tasks to us, and that buffer holds 30,000. But it's more sensible to tell us the number of work units in the whole buffer, which is up to about 15,000,000. That number is already publicly available on the homepage, so why not put it in server status too?
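The guess above can be modelled as a two-tier queue: a large backlog (the homepage figure) keeps topping up a small, capped ready-to-send pool (the server status figure). This is only a sketch of that hypothesis; the 30,000 cap and 15,000,000 backlog are the numbers from the post, and the class and method names are hypothetical, not Rosetta's or BOINC's actual server code.

```python
class TwoTierQueue:
    """Hypothetical model: a large backlog keeps topping up a capped ready-to-send pool."""

    def __init__(self, backlog, ready_cap):
        self.backlog = backlog      # tasks queued but not yet sendable (homepage-style figure)
        self.ready = 0              # tasks a client could download right now
        self.ready_cap = ready_cap  # assumed fixed size of the ready pool
        self.refill()

    def refill(self):
        # Move work from the backlog until the ready pool is full or the backlog is empty.
        moved = min(self.backlog, self.ready_cap - self.ready)
        self.backlog -= moved
        self.ready += moved

    def send(self, n):
        # Hand out up to n tasks, then top the pool back up.
        sent = min(n, self.ready)
        self.ready -= sent
        self.refill()
        return sent

    def status(self):
        # "Server status" figure versus the total amount of work left.
        return {"ready_to_send": self.ready, "total_queued": self.ready + self.backlog}


q = TwoTierQueue(backlog=15_000_000, ready_cap=30_000)
q.send(10_000)
print(q.status())  # {'ready_to_send': 30000, 'total_queued': 14990000}
```

Under this model the ready figure stays pinned at the cap until the backlog runs low, which would match a status page that "always says 30,000".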

Personally, I increased my runtime to 12hrs from 8hrs, so the few I get last me longer and I don't need so many so other people can get a share too.


I'm not sure what the best setting is. They seem to think 8 hours is right, so I leave it at that. Fast CPUs will do more work in those 8 hours, so there must be returned units with varying amounts of work done. I read something on this forum saying they don't need each unit "completed"; instead, each unit helps build a map along with the others. So I'm not sure your extra 4 hours is of any use to them. You're just increasing the resolution at a particular point, which they don't need, or they'd have set the default to 12 hours.

I'm slipping in a few WCG too, so when tasks come available again, the debt can be repaid to Rosetta with less interruption.


I like WCG because they have a handful of projects and you can pick which you feel is most important. I've got it running cancer research on 62 cores.

Running, completing and returning tasks promptly is thousands of times more important to the researchers than it is to any of us.
If they need any work done, no-one will be more motivated to provide the tasks than the researchers. That's obvious too.
And so, we wait.


Indeed. There seem to be a lot of people doing this as some kind of competition to see who can get the most points. I do it to further research. It's not worth spending money on electricity just for a number of points, but it is worth it to donate to science. I use the points only as a guide to how well I'm doing and whether something isn't working properly. I'll sometimes buy extra hardware because I'm not doing well enough compared to others, but it's for the science, not the world position. And sometimes I'll see I'm getting no points on a project and realise something's up, like LHC not returning valid units due to a VirtualBox error.

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99365 - Posted: 22 Oct 2020, 7:56:38 UTC
Last modified: 22 Oct 2020, 7:59:13 UTC

I think distributed computing projects have become commonplace enough now that they're starting to take up residence in the collective psyche. Especially right now with the pandemic, it's easy to start viewing something like this as just a part of your daily routine. People don't like having their routines disrupted and so when they are, they want justification for it as they would for anything else. I think that's what people get lost in. It's also easy to get lost in the stats competition as something much simpler and possibly more concrete to some than the underlying science. It is a really good metric for certain aspects of system performance, but I think it's important to have a balance between that and healthy reverence for the larger purpose. I appreciate the stats as a data point. I even preserve them beyond the XMLs so I can record them over broader history, but I mostly like to think of it in terms of, "My computer's on all the time right now anyway. Might as well do the most good I can with the cycles." I for one enjoy the science, but I also think I just appreciate being kept informed enough to know that there isn't a technical difficulty preventing work from coming, because if I know that, I can plan to do other things.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99367 - Posted: 22 Oct 2020, 11:08:30 UTC - in response to Message 99365.  

I think distributed computing projects have become commonplace enough now that they're starting to take up residence in the collective psyche. Especially right now with the pandemic, it's easy to start viewing something like this as just a part of your daily routine. People don't like having their routines disrupted and so when they are, they want justification for it as they would for anything else. I think that's what people get lost in. It's also easy to get lost in the stats competition as something much simpler and possibly more concrete to some than the underlying science. It is a really good metric for certain aspects of system performance, but I think it's important to have a balance between that and healthy reverence for the larger purpose. I appreciate the stats as a data point. I even preserve them beyond the XMLs so I can record them over broader history, but I mostly like to think of it in terms of, "My computer's on all the time right now anyway. Might as well do the most good I can with the cycles." I for one enjoy the science, but I also think I just appreciate being kept informed enough to know that there isn't a technical difficulty preventing work from coming, because if I know that, I can plan to do other things.


As long as you do at least 2 projects, you don't get that interruption. You just see it doing different work today and immediately know that project A is offline. If there's no work at all, the fault's at your end.
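That rule of thumb can be written down as a small check, assuming you can count how many tasks each project sent you in some recent window (for example by reading the event log). The function is a hypothetical sketch, and "no work from anywhere" is only a strong hint of a local fault, not proof.

```python
def diagnose(tasks_received):
    """Guess where a work shortage lies from recent downloads per project.

    tasks_received: dict of project name -> tasks downloaded in a recent window.
    """
    dry = sorted(p for p, n in tasks_received.items() if n == 0)
    if len(dry) == len(tasks_received):
        return "no work from any project: check your own setup"
    if dry:
        return "only these projects are dry: " + ", ".join(dry)
    return "all projects supplying work"


print(diagnose({"Rosetta": 0, "WCG": 12}))
```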

I've joined 8 projects plus 2 beta projects plus 1 "silent" project (one that rarely has work) on 6 computers. I've set the weighting for each project equal to my world position, so the ones I'm behind on get more work, and I know how well I'm doing compared to others. It's an incentive to buy more hardware! But I chose those projects because I think they're interesting and worthwhile, and when I see a task running that interests me, I'll switch the other projects off (the "No new tasks" setting) and let it do loads of that. At the moment I've got all the CPUs on cancer research; unfortunately, GPUs can't do that.

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99369 - Posted: 22 Oct 2020, 14:59:07 UTC - in response to Message 99367.  

Oh, I do multiple projects, to be clear, but I still like to know when one is going to be legitimately down so I can adjust some related processes I have.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99370 - Posted: 22 Oct 2020, 15:27:46 UTC - in response to Message 99369.  

Oh, I do multiple projects, to be clear, but I still like to know when one is going to be legitimately down so I can adjust some related processes I have.


What is there to adjust? BOINC does that for you.

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99371 - Posted: 22 Oct 2020, 17:28:29 UTC - in response to Message 99370.  

I have an ETL (extract, transform, load) process that I run on my stats data to do my own stuff with it, for fun and for education.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99372 - Posted: 22 Oct 2020, 18:15:51 UTC - in response to Message 99371.  

I have an ETL (extract, transform, load) process that I run on my stats data to do my own stuff with it, for fun and for education.


Then upgrade it so it can handle server outages and work shortages, which are common.

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99373 - Posted: 22 Oct 2020, 18:53:32 UTC - in response to Message 99372.  
Last modified: 22 Oct 2020, 18:56:38 UTC

Oh, I have, but because outages quite often just leave the data stale, I'd rather modify the consumption than stop it altogether, especially when it spans multiple projects. The point I'm trying to make with this is that there are legitimate reasons for wanting to know if a project is actually in a fail state. Most of the time, they aren't, but it's nice to know when they are.
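One way to "modify the consumption rather than stop it", sketched under assumptions: each project's stats export carries a last-updated timestamp, and the ETL splits projects into fresh and stale once that timestamp is older than a chosen threshold, instead of halting the whole run. The input layout, function name and 24-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_fresh_stale(last_updated, now, max_age=timedelta(hours=24)):
    """Partition projects by how old their latest stats export is.

    last_updated: dict of project name -> timezone-aware datetime of the last export.
    Returns (fresh, stale) as sorted lists of project names.
    """
    fresh, stale = [], []
    for project, ts in last_updated.items():
        # Older than max_age means the data is stale; exactly max_age still counts as fresh.
        (stale if now - ts > max_age else fresh).append(project)
    return sorted(fresh), sorted(stale)


now = datetime(2020, 10, 22, 18, 0, tzinfo=timezone.utc)
fresh, stale = split_fresh_stale(
    {"Rosetta": now - timedelta(hours=30), "WCG": now - timedelta(hours=2)}, now
)
print(fresh, stale)  # ['WCG'] ['Rosetta']
```

Downstream steps can then load the fresh projects as usual and carry the stale ones forward with a flag, so one project being down doesn't stall the rest.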

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 99374 - Posted: 22 Oct 2020, 19:48:16 UTC - in response to Message 99373.  

The point I'm trying to make with this is that there are legitimate reasons for wanting to know if a project is actually in a fail state.

Of course. Most projects make some attempt to notify you when there is a planned maintenance outage or a lack of work.
Rosetta doesn't.

Several types of work units have some conflict with each other, so that running them together slows down the total output. I can't plan on how to juggle that with Rosetta.
I had four machines (two Ryzen 3900X and two Ryzen 3950X) on Rosetta until recently. Now I am down to zero.

I am actually glad that they can get their work done without so many crunchers, if that is what they want. If it is not what they want, they need to wake up and say something.
I don't need to explain to them. They need to explain to me.

Sid Celery
Joined: 11 Feb 08
Posts: 2146
Credit: 41,570,180
RAC: 8,210
Message 99375 - Posted: 22 Oct 2020, 22:51:10 UTC - in response to Message 99364.  

I don't know why the server status of this project and most others is hiding information

The information is only updated every few hours, in which time 100k tasks could be made available and exhausted.

Update frequency is irrelevant. The problem is the server status page always says 30,000 available, which is wrong. The main homepage can say something like 15,000,000! I can only assume there's a high speed buffer which is what downloads the tasks to us, and that buffer holds 30,000. But it's more sensible to tell us the number of work units in the whole buffer, which is up to about 15,000,000. That number is already publicly available on the homepage, so why not put it in server status too?

I don't know if it's a high-speed buffer. It has been explained but I didn't care enough to be paying attention tbh.
I thought it was something to do with being converted into a downloadable format for us, but I'm probably wrong.

Personally, I increased my runtime to 12hrs from 8hrs, so the few I get last me longer and I don't need so many so other people can get a share too.

I'm not sure what the best is to set it to either. They seem to think 8 hours is right so I leave it at that. Fast CPUs will do more work in that 8 hours so there must be returned units with varying sizes of work done. I read something on this forum that they don't need each unit "completed", but each unit helps build a map along with the others. So I'm not sure your extra 4 hours is of any use to them. You're just increasing the resolution in a particular point which they don't need, or they'd be setting the default to 12 hours.

It returns more decoys and I think that makes a difference.
I'm not sure what the best setting is either - I'm just eking out the tasks and returning more data while there's a current shortage, which now seems to be solved.
I'll keep it at 12hrs until Sunday, just to make sure the current supply isn't only temporary, then I'll switch back to the default 8hrs.

I'm slipping in a few WCG too, so when tasks come available again, the debt can be repaid to Rosetta with less interruption.

I like WCG because they have a handful of projects and you can pick which you feel is most important. I've got it running cancer research on 62 cores.

I do too. I haven't limited my tasks only to OpenPandemics. I don't see any of their projects as any less important; probably the other parts of the project need some extra attention while everyone else is focussing on CV19 for the more obvious reasons.

Indeed. There seem to be a lot of people doing this as some kind of competition to see who can get the most points. I do it to further research. It's not worth spending money on electricity just for a number of points, but it is worth it to donate to science. I use the points only as a guide to how well I'm doing and whether something isn't working properly. I'll sometimes buy extra hardware because I'm not doing well enough compared to others, but it's for the science, not the world position. And sometimes I'll see I'm getting no points on a project and realise something's up, like LHC not returning valid units due to a VirtualBox error.

Nicely said

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99379 - Posted: 23 Oct 2020, 19:29:02 UTC - in response to Message 99374.  

The point I'm trying to make with this is that there are legitimate reasons for wanting to know if a project is actually in a fail state.

Of course. Most projects make some attempt to notify you when there is a planned maintenance outage or a lack of work.
Rosetta doesn't.

Several types of work units have some conflict with each other, so that running them together slows down the total output. I can't plan on how to juggle that with Rosetta.
I had four machines (two Ryzen 3900X and two Ryzen 3950X) on Rosetta until recently. Now I am down to zero.

I am actually glad that they can get their work done without so many crunchers, if that is what they want. If it is not what they want, they need to wake up and say something.
I don't need to explain to them. They need to explain to me.


Oh for goodness sake. If you can't handle a varying workload you're doing something drastically wrong. I manage just fine. Find a backup project.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,116,986
RAC: 4,044
Message 99380 - Posted: 23 Oct 2020, 19:36:23 UTC - in response to Message 99375.  

Update frequency is irrelevant. The problem is the server status page always says 30,000 available, which is wrong. The main homepage can say something like 15,000,000! I can only assume there's a high speed buffer which is what downloads the tasks to us, and that buffer holds 30,000. But it's more sensible to tell us the number of work units in the whole buffer, which is up to about 15,000,000. That number is already publicly available on the homepage, so why not put it in server status too?

I don't know if it's a high-speed buffer. It has been explained but I didn't care enough to be paying attention tbh.
I thought it was something to do with being converted into a downloadable format for us, but I'm probably wrong.


Perhaps. But I think all projects should show the larger figure (or both) on the server status page. What users want to know is how much work is left to come. Most projects don't even list the larger figure anywhere; at least Rosetta puts it on the homepage.

I'm not sure what the best is to set it to either. They seem to think 8 hours is right so I leave it at that. Fast CPUs will do more work in that 8 hours so there must be returned units with varying sizes of work done. I read something on this forum that they don't need each unit "completed", but each unit helps build a map along with the others. So I'm not sure your extra 4 hours is of any use to them. You're just increasing the resolution in a particular point which they don't need, or they'd be setting the default to 12 hours.

It returns more decoys and I think that makes a difference.
I'm not sure what the best setting is either - I'm just eking out the tasks and returning more data while there's a current shortage, which now seems to be solved.
I'll keep it at 12hrs until Sunday, just to make sure the current supply isn't only temporary, then I'll switch back to the default 8hrs.


If they don't need those extra decoys, you're wasting CPU time. It would probably be better used on another, similar project.

If they do need the extra decoys, they would have set the 8 hours higher themselves.

I like WCG because they have a handful of projects and you can pick which you feel is most important. I've got it running cancer research on 62 cores.

I do too. I haven't limited my tasks only to OpenPandemics. I don't see any of their projects as any less important; probably the other parts of the project need some extra attention while everyone else is focussing on CV19 for the more obvious reasons.


It probably evens out, actually. Anyone (probably quite a lot of people) with their settings on the default of "take anything that's there" will get whatever others don't select. So people saying "I'm only doing Coronavirus (I refuse to use the technical name) work" aren't actually making it happen faster, because the "do anything" users just end up getting less of it, as the selective users have already taken those WUs out of the queue.

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99384 - Posted: 23 Oct 2020, 23:21:04 UTC - in response to Message 99374.  

Several types of work units have some conflict with each other, so that running them together slows down the total output. I can't plan on how to juggle that with Rosetta.


To this point, I feel like Rosetta is at times more than a bit hoggish with the resource share, in spite of the client purportedly distributing the workload evenly. All other project WUs seem quite outnumbered and overpowered when Rosetta is active on my box.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 99385 - Posted: 24 Oct 2020, 0:15:03 UTC - in response to Message 99384.  
Last modified: 24 Oct 2020, 0:16:28 UTC

To this point, I feel like Rosetta is at times more than a bit hoggish with the resource share, in spite of the client purportedly distributing the workload evenly.
It doesn't distribute it evenly, it distributes it in accordance with your Resource share settings.
Your Average Credit for your projects shows that Rosetta is getting the least amount of processing time, so how your other projects can be "outnumbered and overpowered when Rosetta is active" makes no sense to me at all.

Since you are using your GPU, you should reserve a CPU core to support each running GPU WU. Otherwise the CPU tries to use all of its cores and threads for CPU work while also supporting the GPU, which can slow down the GPU work.
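Counting it out, assuming one full CPU thread is set aside per running GPU task (how you actually configure that reservation in BOINC isn't shown here, and the 16-thread machine is hypothetical):

```python
def cpu_threads_for_cpu_tasks(total_threads, running_gpu_tasks, threads_per_gpu_task=1):
    """Threads left for CPU work after reserving some for each running GPU task."""
    reserved = running_gpu_tasks * threads_per_gpu_task
    return max(total_threads - reserved, 0)


print(cpu_threads_for_cpu_tasks(16, 1))  # 15
```

With one GPU task running, 15 of 16 threads stay on CPU work and the remaining one is free to keep the GPU fed.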
Grant
Darwin NT

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99386 - Posted: 24 Oct 2020, 1:32:51 UTC - in response to Message 99385.  

It does distribute it evenly if you never touch anything, hehe. I only ever have one GPU WU running at a time, so that's not too much of an issue.
I think it's just surprised me over time how little Rosetta yields for the default duration it's expected to cycle.
I also find the way the BOINC projects handle the resource share settings a little too arbitrary, but everyone has their own sweet spot, I suppose. In my case, even with the hour allocation set lower, I always get assigned really long running Rosetta units and I honestly would rather spend that time on other projects anyway, now that I've been tracking stats for a good while.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1734
Credit: 18,532,940
RAC: 17,945
Message 99387 - Posted: 24 Oct 2020, 1:50:18 UTC - in response to Message 99386.  
Last modified: 24 Oct 2020, 1:53:02 UTC

I think it's just surprised me over time how little Rosetta yields for the default duration it's expected to cycle.
Or that the other projects you crunch for pay more than they should.
If all projects awarded Credit according to the definition of the Cobblestone, then the differences between projects would be minimal. But they don't: some pay the going rate, others less, some more, and there are those that pay stupidly obscene amounts more.
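For reference, the Cobblestone is commonly stated as 200 credits for one day of CPU time on a reference machine benchmarking 1 GFLOPS on the Whetstone benchmark. A sketch of what a project paying strictly by that definition would award, assuming the benchmark figure is a fair measure of the work done (which is exactly where real projects diverge); the 4 GFLOPS core is hypothetical:

```python
SECONDS_PER_DAY = 86_400
CREDITS_PER_GFLOPS_DAY = 200  # Cobblestone definition as commonly stated: 200 per day at 1 GFLOPS


def cobblestones(cpu_seconds, whetstone_gflops):
    """Credit a task would earn if paid strictly by the Cobblestone definition."""
    return cpu_seconds / SECONDS_PER_DAY * whetstone_gflops * CREDITS_PER_GFLOPS_DAY


# An 8-hour task on a core benchmarking at a hypothetical 4 GFLOPS.
print(round(cobblestones(8 * 3600, 4.0), 2))  # 266.67
```

A project that pays noticeably more or less than this for the same CPU time is what makes credit comparisons between projects misleading.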



I always get assigned really long running Rosetta units
As does everyone.
It is not like other projects where the time taken to process a task varies depending on the capability of the system, the default processing time for Rosetta Tasks is 8 hours. Some may bail out early, others may take up to 10 hours longer. But the vast majority run to the set Target CPU time.
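A simplified model of a task that runs to a target CPU time, assuming every decoy (candidate model) takes the same time on a given CPU. Real decoys vary, and the early exits and long overruns mentioned above are not modelled; the per-decoy times are made up. It shows both CPUs stopping at roughly the 8-hour target, with the faster one returning more decoys.

```python
def run_to_target(minutes_per_decoy, target_hours=8):
    """Simplified task: keep producing decoys until the target CPU time is reached.

    The decoy in progress when the target is hit is allowed to finish,
    so a run can end slightly past the target. Returns (decoys, hours).
    """
    if minutes_per_decoy <= 0:
        raise ValueError("minutes_per_decoy must be positive")
    target_minutes = target_hours * 60
    elapsed = 0
    decoys = 0
    while elapsed < target_minutes:
        elapsed += minutes_per_decoy
        decoys += 1
    return decoys, elapsed / 60


# Hypothetical per-decoy times for a slower and a faster CPU.
for cpu, minutes in (("slow", 40), ("fast", 25)):
    decoys, hours = run_to_target(minutes)
    print(f"{cpu}: {decoys} decoys in {hours:.2f} h")
```

Both runs take about 8 hours regardless of CPU speed; what changes is how much work comes back.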
Grant
Darwin NT

sombraguerrero
Joined: 5 Mar 18
Posts: 8
Credit: 617,193
RAC: 0
Message 99388 - Posted: 24 Oct 2020, 2:49:32 UTC - in response to Message 99387.  

That makes sense in the context of the stats, then. I guess I was just making assumptions based on past experience. Thank you for the input, sir!



©2024 University of Washington
https://www.bakerlab.org