Posts by Sid Celery

1) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109481)
Posted 2 days ago by Sid Celery
Post:
'Project down for maintenance' messages being issued for over 24hrs
While all tasks are pretty much completed this sounds like the best time
Servers being randomly up and down over a considerable period, it does need a thorough going over
Let's hope they find and resolve everything...
...and have a whole bunch of tasks waiting for us on completion

I can dream
2) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109473)
Posted 7 days ago by Sid Celery
Post:
Just a hiccup in when they get awarded which might take a day or two at most (but might also be just a few hours)

Well, not so few hours...

I looked earlier today. I think it came back about 10hrs after your post, so between 1 & 2 days.
I think I saw the whole site go down (again) a few hours before too.
Everything seems so fragile.
No new tasks yet, but I've picked up a few resends through the day
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109471)
Posted 8 days ago by Sid Celery
Post:
Still... 'completed awaiting validation'...
More credits gone, along with electricity and time?

All credits do get caught up once the server is restarted. No time or energy lost.
Just a hiccup in when they get awarded which might take a day or two at most (but might also be just a few hours)
4) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109470)
Posted 8 days ago by Sid Celery
Post:
Some more work would be nice.
It's been freezing the last few mornings here, and the system has been keeping the lounge room almost comfortable.

But now it's out of work, and tomorrow morning if more work doesn't come along, it'll be almost as cold inside as it is outside (or an upgraded version over at Ralph & some new work there would be nice- either this or that, or even both would be nice).

Your post made me look for the first time exactly where Darwin is and, after checking on a weather site, discover your winter is still 2-4C higher than this English summer.
My sympathies are therefore quite limited, as well as thinking it's a rather inefficient way to heat the house.

While I understand and accept your reasoning for keeping a tight cache, I can only repeat my advice to change from setting a default runtime at Rosetta, which turns out to be only 3hrs, to making it explicitly 8hrs to match what Boinc thinks it is (at the point of download anyway). Not only would you get an extra 5hrs work, you would reduce your churn through tasks by almost two-thirds, marginally extending how long each batch of tasks will last, which is valuable when we see each batch run out before further tasks become available.

To emphasise the difference between me and you, I keep a 0.5 plus 0.1 cache and set a 12hr runtime.
So when I have 4-5hrs of tasks remaining, I already have 16 tasks (8C16T) queued up and another 16 can come down, which works out at 28-29hrs of work when Rosetta runs out.
At an 8hr runtime, this would still be 20-21hrs.
As compared to your maximum of 3hrs work while trying to gobble up tasks only at the last minute.
The difference is huge for one host and, the more people who make the runtime change I suggest, the longer batches of tasks would last and the fewer the periods without any across the whole site.

This is why I keep repeating myself. Do both yourselves and everyone else a favour, imo.
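
For anyone who wants to check the sums above, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and it assumes every task runs for exactly the target runtime and that all 16 threads stay busy, which real tasks won't quite do.

```python
def hours_of_work(remaining_hrs, queued_tasks, incoming_tasks, runtime_hrs, threads):
    """Hours of work on hand once the project runs out of tasks.

    Assumes (for illustration) that every task runs for exactly
    runtime_hrs and that all threads are kept busy in parallel.
    """
    batches = (queued_tasks + incoming_tasks) / threads
    return remaining_hrs + batches * runtime_hrs


def churn_reduction(short_runtime_hrs, long_runtime_hrs):
    """Fractional drop in tasks consumed per day when the runtime is lengthened."""
    return 1 - short_runtime_hrs / long_runtime_hrs


# 16 threads, 16 tasks queued, 16 more able to come down, 4-5hrs left on the running ones
for runtime in (12, 8):
    low = hours_of_work(4, 16, 16, runtime, 16)
    high = hours_of_work(5, 16, 16, runtime, 16)
    print(f"{runtime}hr runtime: {low:.0f}-{high:.0f}hrs of work")

print(f"Churn reduction going from 3hrs to 8hrs: {churn_reduction(3, 8):.1%}")
```

That gives 28-29hrs at a 12hr runtime, 20-21hrs at 8hrs, and a 62.5% drop in churn, which is the "almost two-thirds" mentioned above.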
5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109466)
Posted 9 days ago by Sid Celery
Post:
I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly

Now the servers are green, but there are over 18k wu pending validation. Increasing.

Now pink - boinc-process is down again and 56k awaiting validation.
And not too many tasks left to come down either.
We continue to be very hand-to-mouth atm
6) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109465)
Posted 9 days ago by Sid Celery
Post:
And there seems to be a new kind of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"

Yup - wonder what that's all about.

Maybe related to "message-passing neural networks" (mpnn), like this

Very likely. Thanks for the link - looks like good work.
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109460)
Posted 10 days ago by Sid Celery
Post:
A relatively small number of tasks available - showing 230k on the front page 3hrs ago. Hopefully part of more, but may not be.
It's something

And there seems to be a new kind of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"

Yup - wonder what that's all about.
A few more tasks becoming available too - still not a great amount. Showing 475k an hour ago on the front page.
Every little bit helps
8) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109458)
Posted 10 days ago by Sid Celery
Post:
I'm just in the final stages of clearing down all the excess WCG tasks Boinc brought down from the previous Rosetta outage and we're out of Rosetta tasks again.
So frustrating...

A relatively small number of tasks available - showing 230k on the front page 3hrs ago. Hopefully part of more, but may not be.
It's something
9) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109454)
Posted 13 days ago by Sid Celery
Post:
Almost 300k now.
Almost 400k now.

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly

I'm just in the final stages of clearing down all the excess WCG tasks Boinc brought down from the previous Rosetta outage and we're out of Rosetta tasks again.
So frustrating...
10) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109453)
Posted 15 days ago by Sid Celery
Post:
So, if you feel bad about my reply here, take a moment to think about my situation...

I don't feel bad about your reply, I'm sorry for your pessimism.
I continue to think that if software is bugged, it's a good thing to advise the developers.

Don't they read it? Too bad for them.

I'm not a coder of any kind, but the impression I get is that it's an error-trapping issue rather than a bug (you could say that's the same thing, I accept).
The impression I get (but may be very wrong) is that tasks are seeded randomly, but the code doesn't double-check whether the random seed is out of bounds so that it can be re-seeded, and the task errors out as a result.
It's a <perfect>, even if ugly, solution.
It happens so rarely and with such little consequence (wasted CPU time is approx zero) that it's not worth the effort to correct among a batch somewhere around a million tasks.
The rest give them the results they need.
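
To show what I mean by the re-seeding idea, here is a hypothetical sketch. It is not Rosetta's code and I have no knowledge of how their seeding actually works; `draw_seed`, the bounds and the retry limit are all invented for illustration.

```python
import random


def draw_seed(source, low, high, max_attempts=10):
    """Draw seeds from source() until one falls inside [low, high].

    Hypothetical illustration only: rather than erroring out on the
    first out-of-bounds seed, try again a limited number of times.
    """
    for _ in range(max_attempts):
        seed = source()
        if low <= seed <= high:
            return seed
    raise ValueError("no in-bounds seed found after retrying")


# Example: a source that sometimes lands outside the allowed range
rng = random.Random(1234)
seed = draw_seed(lambda: rng.randint(-50, 150), 0, 100)
print(seed)
```

Whether a check like that is worth adding is exactly the trade-off described above: the failures are rare and cost next to nothing.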

It may offend from a user pov, but I think from a researcher pov it's neither here nor there.
It's very likely they <do> know. It just doesn't matter.
And, as always, we're here for the project's needs. They don't exist for ours.

The tail has never wagged the dog at this project - unlike many other projects.
That's been made very clear to me. It's not pessimism on my part, but realism.
I don't need to be told twice, even if others need to be told ten or twenty times and still not take the hint.
I know that sounds harsh, but I don't know how else to say it.
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109452)
Posted 15 days ago by Sid Celery
Post:
Almost 300k now.
Almost 400k now.

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109448)
Posted 16 days ago by Sid Celery
Post:
Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it

Is there a remote hope that someone on the team reads the forum, sooner or later, and finds a solution for an old bug??
A hope, remote hope...

After a few years now, I think we can be certain the answer is a firm no.

I was taken by a reply I had (in the days when I was being replied to - also years ago) when a much higher proportion of tasks were getting rejected. Rather than delete the offending tasks, the decision was to let them run and error out, because they only ran for 15-20secs of CPU time. Even 30 tasks would only be 500-600secs of CPU time (actually core time, so divide by the number of cores for actual seconds of CPU time), and that was several orders of magnitude less work than coding some way of deleting them before they went out. During that exercise, a lot of good tasks would be taken out at the same time, so it was counterproductive in a multitude of ways.
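
As a rough sketch of those numbers (the 16-thread figure is only an assumption, borrowed from my own 8C16T host, and the function name is made up):

```python
def wasted_time(failed_tasks, secs_per_task, cores):
    """Return (core-seconds lost, seconds lost per host after dividing by cores)."""
    core_secs = failed_tasks * secs_per_task
    return core_secs, core_secs / cores


# 30 failing tasks at 15-20secs each, on an assumed 16-thread host
for secs in (15, 20):
    core_secs, per_host = wasted_time(30, secs, 16)
    print(f"{secs}secs per task: {core_secs} core-secs, about {per_host:.1f}secs per host")
```

So the worst case is 600 core-seconds, or under 40 seconds on a 16-thread host.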

And that's what happened.

The same applies here. No-one in their right mind would do any different.

The only real problem is the amount of time wasted complaining about it.

Tbh, I think it's exactly the same reason why <I> stopped getting replies. A complete waste of time and effort.
So, if you feel bad about my reply here, take a moment to think about my situation...

Meanwhile, boinc-process server is down again - no validation going on right now - 200k waiting in the queue
13) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109444)
Posted 17 days ago by Sid Celery
Post:
Sadly, still with the lower connect error

+1

I've had one.
CPU runtime 2 seconds
Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it

In the meantime, the whole site went down for a few hours, in which time Boinc decided to bring down 21 WCG tasks I didn't really want to have in my cache, which I consider a waste of time even if it will keep my PC occupied
14) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109435)
Posted 19 days ago by Sid Celery
Post:
New tasks came down about an hour ago
15) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109432)
Posted 23 days ago by Sid Celery
Post:
While there are no Rosetta tasks, you can crunch some Ralph nvidia GPU tasks (1000 available at this moment https://ralph.bakerlab.org/server_status.php) and help accelerate release of GPU app to Rosetta!

I just did.
And then remembered the minimum 5Gb (6Gb) req't for RAM on my Video Card, which only has 4Gb... <sigh>
16) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109431)
Posted 23 days ago by Sid Celery
Post:
boinc-process server has died, again.

I didn't notice again and, now I look, it's back.
Maybe I should look more often.
Or you should look less often...

The last of my Rosetta tasks are running now, showing the benefit of ensuring all my runtimes are at least 8hrs rather than the 3hr mistake Rosetta Beta tasks are set to
17) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109426)
Posted 24 days ago by Sid Celery
Post:
Queued jobs down to 153k 3hrs ago, so another shout out for this.
I'm estimating we only have another 12-13hrs of tasks unless more get queued up.

I think we had a few extra Rosetta 4.20 tasks but not many and we're out anyway now
Fingers crossed for another batch
18) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109425)
Posted 25 days ago by Sid Celery
Post:
Now things just need to stop falling over in the first place.

Yes, but also I'd remind everyone of my view
Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime while Rosetta v4.20 rightly default to 8hrs.

So set the Rosetta@home Target CPU Runtime explicitly to 8hrs so that CPU runtime matches what Boinc is told to assume, and not to 'not selected'.

Do more work, get more credits, Boinc schedules more correctly and sooner, batches of tasks issued by Rosetta last longer. Rosetta tasks run out less often. <Everyone> wins.

The alternative is what we have now - no new tasks. Everyone loses.

The more people make this change, the better for everyone, whether that boinc-process server goes down or not

Queued jobs down to 153k 3hrs ago, so another shout out for this.
I'm estimating we only have another 12-13hrs of tasks unless more get queued up.
19) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109419)
Posted 26 Jun 2024 by Sid Celery
Post:
boinc-process server is dead again, Validation backlog continues to grow.
And it's back again.

This is getting like my home-life...
"I've lost my xyz"
"You could at least help to look"
"Oh, there it is"
Me: "What was that you said?"

If I play dumb long enough before paying any attention, most things right themselves on their own

Edit: I just reached 40,000,000 on Rosetta
Edit2: And 100,000,000 for my team across all projects
20) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 109412)
Posted 23 Jun 2024 by Sid Celery
Post:
Now things just need to stop falling over in the first place.

Yes, but also I'd remind everyone of my view
Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime while Rosetta v4.20 rightly default to 8hrs.

So set the Rosetta@home Target CPU Runtime explicitly to 8hrs so that CPU runtime matches what Boinc is told to assume, and not to 'not selected'.

Do more work, get more credits, Boinc schedules more correctly and sooner, batches of tasks issued by Rosetta last longer. Rosetta tasks run out less often. <Everyone> wins.

The alternative is what we have now - no new tasks. Everyone loses.

The more people make this change, the better for everyone, whether that boinc-process server goes down or not



©2024 University of Washington
https://www.bakerlab.org