Rosetta 4.1+ and 4.2+

Message boards : Number crunching : Rosetta 4.1+ and 4.2+

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98524 - Posted: 15 Aug 2020, 21:51:26 UTC - in response to Message 98522.  
Last modified: 15 Aug 2020, 21:53:15 UTC

and probably help the programmers find out why they went wrong.
That's the annoying thing.
There is a problem with them, thousands of results for these faulty Work Units have been sent back. So it's well past time to fix the problem before sending out even more pointless Tasks. It's not like it's one here or there that has an issue, or there are different types of failures- it's the entire group that fails, with the same error, every time.

It may not use up much of our time, but it does use up project bandwidth & storage resources- which could be better used for work that does actually provide a result that isn't an error.
Grant
Darwin NT

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98525 - Posted: 15 Aug 2020, 21:57:16 UTC

Remember this project is run by a university. It’s August. Most likely everybody is on holiday.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,760,353
RAC: 7,227
Message 98526 - Posted: 15 Aug 2020, 22:10:53 UTC - in response to Message 98524.  
Last modified: 15 Aug 2020, 22:11:46 UTC

and probably help the programmers find out why they went wrong.
That's the annoying thing.
There is a problem with them, thousands of results for these faulty Work Units have been sent back. So it's well past time to fix the problem before sending out even more pointless Tasks. It's not like it's one here or there that has an issue, or there are different types of failures- it's the entire group that fails, with the same error, every time.

It may not use up much of our time, but it does use up project bandwidth & storage resources- which could be better used for work that does actually provide a result that isn't an error.


It's not like this is LHC, the tasks are small and download instantly, and they have very high bandwidth servers. They're not at their limit, so a few problems are not causing anything to be delayed. And I've no idea why you think there are a lot of them, I spot only one or two a day, while running 66 cores and I'm in front of the computer most of the time. The server will automatically stop resending a work unit if it crashes on a few of our machines.
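
The retry cap mentioned above can be sketched as follows. This is only an illustration: the actual limit used by the Rosetta server is not stated in this thread, and `max_errors`, `should_resend` and `hosts_wasted` are hypothetical names. The cap applies to each work unit separately, so a work unit that fails everywhere still goes out `max_errors` times before the server gives up on it.

```python
def should_resend(error_count: int, max_errors: int = 3) -> bool:
    """Return True while the work unit may still be sent to another host.

    max_errors is a hypothetical limit; the real server value is unknown here.
    """
    return error_count < max_errors


def hosts_wasted(max_errors: int = 3) -> int:
    """Count how many hosts receive a work unit that fails on every host."""
    errors = 0
    while should_resend(errors, max_errors):
        errors += 1  # every copy sent out comes back as an error
    return errors


print(hosts_wasted())  # 3
```

With the assumed limit of 3, each faulty work unit is processed by three hosts before it is retired.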

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98527 - Posted: 15 Aug 2020, 23:20:44 UTC - in response to Message 98526.  
Last modified: 15 Aug 2020, 23:26:56 UTC

It's not like this is LHC, the tasks are small and download instantly, and they have very high bandwidth servers.
But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage cost money. Better to use that money for things that provide valid results, not errors.


And I've no idea why you think there are a lot of them, I spot only one or two a day, while running 66 cores and I'm in front of the computer most of the time.
One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each.
Check the top system list- roughly 30 or so errors or invalid per system per day. That works out to thousands, even tens of thousands, of Work Units that produced nothing but error/invalid results.

It's not like this is something that just started, it's been going on for almost 2 weeks now.


The server will automatically stop resending a work unit if it crashes on a few of our machines.
Better not to send out rubbish in the first place- even more so once you should already know that it's rubbish from all the previous ones that failed.

It's not like some of these Work Units process ok and others don't- all of the present failures are failing & failing in the same way.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98528 - Posted: 15 Aug 2020, 23:22:16 UTC - in response to Message 98525.  
Last modified: 15 Aug 2020, 23:24:15 UTC

Remember this project is run by a university. It’s August. Most likely everybody is on holiday.
It's not up to the project to sort this out, but the researchers that keep submitting work that doesn't produce useful results.
And it's not been going on for just a few days, we're talking almost 2 weeks now.
Grant
Darwin NT

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,760,353
RAC: 7,227
Message 98529 - Posted: 15 Aug 2020, 23:25:18 UTC - in response to Message 98527.  

But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage cost money. Better to use that money for things that provide valid results, not errors.


They have a huge amount of spare bandwidth, I doubt it's metered any more than yours or mine.

One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each.
Check the top system list- roughly 30 or so errors or invalid per system per day. That works out to thousands, even tens of thousands, of Work Units that produced nothing but error/invalid results.

It's not like this is something that just started, it's been going on for almost 2 weeks now.


You need to express that as a percentage or it's meaningless. It's like when governments say 500 people died in car crashes. Yeah, out of 100 million, so not important.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,760,353
RAC: 7,227
Message 98530 - Posted: 15 Aug 2020, 23:26:41 UTC - in response to Message 98528.  

Remember this project is run by a university. It’s August. Most likely everybody is on holiday.
It's not up to the project to sort this out, but the researchers that keep submitting work that doesn't produce useful results.
And it's not been going on for just a few days, we're talking almost 2 weeks now.


Do you seriously think they're blindly throwing in work when the last batch comes back as failed? Only thing I've seen act like that is my cat.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98531 - Posted: 15 Aug 2020, 23:37:15 UTC - in response to Message 98529.  

But the results are large, even the error ones. All bandwidth has to be paid for. Datacentre compute & file storage cost money. Better to use that money for things that provide valid results, not errors.
They have a huge amount of spare bandwidth, I doubt it's metered any more than yours or mine.
You bet it's metered. It all has to be paid for, and the more you use the more you pay.


One or two a day, with 66 cores. There are thousands of machines, many with dozens (even hundreds) of threads each.
Check the top system list- roughly 30 or so errors or invalid per system per day. That works out to thousands, even tens of thousands, of Work Units that produced nothing but error/invalid results.

It's not like this is something that just started, it's been going on for almost 2 weeks now.
You need to express that as a percentage or it's meaningless.
It's not meaningless, it's an absolute value. It might be a small percentage of the whole, but it is still a large number.
There will always be some Work Units that produce errors as they try new things, but it's ridiculous to keep submitting Work Units that have yet to produce a single valid result.


It's like when governments say 500 people died in car crashes. Yeah, out of 100 million, so not important.
Unless it's your wife, husband, kids, parents, girlfriend, boyfriend, best friend etc that is one of the dead.
Grant
Darwin NT

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98532 - Posted: 15 Aug 2020, 23:37:19 UTC - in response to Message 98528.  

It's not up to the project to sort this out, but the researchers
… who all work at universities, too

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98533 - Posted: 15 Aug 2020, 23:47:04 UTC - in response to Message 98530.  

Do you seriously think they're blindly throwing in work when the last batch comes back as failed? Only thing I've seen act like that is my cat.
The first ones came out a week and a half ago. The present ones were released in the last day or so.
When the supply of new Work Units dries up, the Ready to send supply runs out within 2 days.

That would indicate that yes they are sending out more work even when the initial tasks from that batch all failed. And keep on sending out more work, even as all the returns are failures.
Either that or there is a large batch of work all queued up to be sent out, and instead of cancelling the rest of the batch (since everything returned so far has been Invalid or an Error) they're just letting it go out anyway.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98534 - Posted: 15 Aug 2020, 23:50:30 UTC - in response to Message 98532.  

It's not up to the project to sort this out, but the researchers
… who all work at universities, too
Has the last week and a half been a University holiday in the US?
Given that Rosetta hasn't run out of work for a while, i figure that some people must still be there submitting new work to process. It would be nice if they checked some of the early results coming back to see if they should keep submitting certain work or cancel it and sort out what's wrong with it, before resubmitting it.
Grant
Darwin NT

James W
Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 98537 - Posted: 16 Aug 2020, 8:28:06 UTC - in response to Message 98513.  

And again I continue to receive validation errors (another 8 in total today, 8-15-20) on my Android device (3396190) for the foldit1 tasks. Again I question why these clearly erroneous tasks continue to be created! The batch in question today was created during the day of 8/15. The admins and researcher(s) involved with this particular set of tasks must know by now that it won't produce anything useful! Of note, I've only recently seen these errors/tasks on my Android devices -- not on my PCs -- for whatever reason.

Examples of the specific validation errors have previously been quoted by myself and others in this thread. Today's errors include the following tasks:
1. Name: foldit1_2008762_0003_00_asym_dock_SAVE_ALL_OUT_1005836_4487
Task: 1241587912
2. Name: foldit1_2008835_c016_00_asym_dock_SAVE_ALL_OUT_1005885_4489
Task: 1241587932
3. Name: foldit1_2008835_0008_00_asym_dock_SAVE_ALL_OUT_1005883_4423
Task: 1241586218

Thank you, Grant, for your expert logical arguments regarding this continued issue!

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98538 - Posted: 16 Aug 2020, 8:48:33 UTC - in response to Message 98537.  
Last modified: 16 Aug 2020, 9:09:20 UTC

Of note, I've only recently seen these errors/tasks on my Android devices -- not on my PCs -- for whatever reason.
Luck.
All foldit1 Work Units on my PCs have crashed & burned. foldit0 Work Units however aren't an issue and process normally.


Edit- or more a case of bad luck with the system that's getting nothing but foldit1 Tasks. It's a slow system, and doesn't have much in its cache, so it's requesting new work as it completes the previous Task. Unfortunately at the time, there's a bunch of foldit1 Tasks sitting there, and since they complete in a matter of minutes, there's still plenty of them there each time it finishes one, and so it ends up getting another as a replacement.
Grant
Darwin NT

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98539 - Posted: 16 Aug 2020, 8:54:41 UTC - in response to Message 98534.  

A typical academic year in the U.S. is (at longest) September–July, so you can expect most universities to be very quiet for the whole of August.

The plentiful supply of work might be down to researchers submitting huge batches before going on holiday, but they won’t see that it’s failed until they get back. And/or the people active now and submitting new work are not the same people, so are in no position to do anything about the broken tasks.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98540 - Posted: 16 Aug 2020, 9:16:14 UTC - in response to Message 98539.  

A typical academic year in the U.S. is (at longest) September–July, so you can expect most universities to be very quiet for the whole of August.
Huh.
Here the academic year aligns with the calendar year.
General schools start at the end of January (i think early/mid February for Unis) and the school year ends in mid Dec (Mid/late Nov for Unis). Summer semesters are from Mid Nov to late Feb.
Grant
Darwin NT

Sid Celery
Joined: 11 Feb 08
Posts: 2003
Credit: 38,610,491
RAC: 18,020
Message 98542 - Posted: 16 Aug 2020, 13:55:04 UTC

The recent posts are all rather peculiar to me.
I thought I must have missed these tasks going through, but there are none at all in my recent history.
I had a couple of errors, but I caused them myself after crashing one PC

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,760,353
RAC: 7,227
Message 98545 - Posted: 16 Aug 2020, 18:09:31 UTC - in response to Message 98533.  

You bet it's metered. It all has to be paid for, and the more you use the more you pay.


You know for a fact it's a metered connection do you? And since the errors are probably a tenth of a percent of the bandwidth, it's not worth getting upset about.

It's not meaningless, it's an absolute value. It might be a small percentage of the whole, but it is still a large number.
There will always be some Work Units that produce errors as they try new things, but it's ridiculous to keep submitting Work Units that have yet to produce a single valid result.

Unless it's your wife, husband, kids, parents, girlfriend, boyfriend, best friend etc that is one of the dead.


Which is unlikely if the percentage is low! Let me explain this in simpler terms....

Scenario 1:
100 people in a room, 2 die.
That's alarming. It could have been you.

Scenario 2:
1 million people in a room, 2 die.
No concern at all. Chances are you're safe. But according to you, 2 is the same as 2, so just as dangerous!

If that doesn't make sense to you, how about some real analogies:

Scenario 1 becomes:
100 people go skydiving and 2 die.
We can conclude skydiving is dangerous.

Scenario 2 becomes:
1 million people drive to work every day and 2 die.
We can conclude driving is 10,000 times safer than skydiving, but your calculation would say they're equally dangerous, because both killed 2 people.
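
The "10,000 times" figure above can be checked with exact fractions. This is a minimal sketch using only the numbers from the two scenarios; the function name `death_rate` is illustrative and not from the thread.

```python
from fractions import Fraction


def death_rate(deaths: int, population: int) -> Fraction:
    """Return deaths per person as an exact fraction."""
    return Fraction(deaths, population)


skydiving = death_rate(2, 100)        # Scenario 1: 2 of 100
driving = death_rate(2, 1_000_000)    # Scenario 2: 2 of 1 million

# Same absolute count (2), very different rates.
ratio = skydiving / driving
print(ratio)  # 10000
```

The absolute count is identical in both scenarios; only the rate differs, which is the distinction the two posters are arguing over.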

That would indicate that yes they are sending out more work even when the initial tasks from that batch all failed. And keep on sending out more work, even as all the returns are failures.
Either that or there is a large batch of work all queued up to be sent out, and instead of cancelling the rest of the batch (since everything returned so far has been Invalid or an Error) they're just letting it go out anyway.


You're assuming they're incompetent morons, I very much doubt that. And they probably have a lot more idea of how it works than you do, since they work there.

Huh.
Here the academic year aligns with the calendar year.
General schools start at the end of January (i think early/mid February for Unis) and the school year ends in mid Dec (Mid/late Nov for Unis). Summer semesters are from Mid Nov to late Feb.


Which means you have big holidays too, but you may have noticed you're in the other hemisphere, so the month names are different....

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 9,760,353
RAC: 7,227
Message 98546 - Posted: 16 Aug 2020, 18:09:41 UTC - in response to Message 98542.  

The recent posts are all rather peculiar to me.
I thought I must have missed these tasks going through, but there are none at all in my recent history.
I had a couple of errors, but I caused them myself after crashing one PC


There's not many faulty tasks, just a few people getting upset over nothing. I think we must have some journalists in here, that's their way of thinking. One little thing goes wrong and they think the world is about to end.

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98555 - Posted: 17 Aug 2020, 5:52:28 UTC - in response to Message 98545.  

Which is unlikely if the percentage is low! Let me explain this in simpler terms....
Since you're missing the point entirely (deliberately or otherwise), there's no point discussing it further.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1498
Credit: 14,744,380
RAC: 16,817
Message 98556 - Posted: 17 Aug 2020, 5:57:23 UTC - in response to Message 98542.  

The recent posts are all rather peculiar to me.
I thought I must have missed these tasks going through, but there are none at all in my recent history.
And today is the first day i don't have any in my current Task list either, after having them continuously for over a week and a half.
Even the number of errored/Invalid Tasks for the top systems has dropped off a lot over the last few hours.
Grant
Darwin NT


©2024 University of Washington
https://www.bakerlab.org