No work

Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88561 - Posted: 27 Mar 2018, 4:41:08 UTC - in response to Message 88559.  

I've said this before for other reasons, but we make our machines available for whatever the project needs. We don't pay for tasks so we can't demand them. 24/7/365 availability of tasks has never been guaranteed. If the project doesn't utilise that resource, that's up to them. If any of us want to have our machines utilised 24/7 we're at liberty to hold a backup project.

Exactly so. You can of course never precisely match the supply of work units with the requests by the crunchers for them, and it is not the duty of the scientists to provide us a pastime. The scientists do what they have to do. We are here to assist them.
ID: 88561 · Rating: 0
niswes
Joined: 21 Jun 09
Posts: 2
Credit: 5,059,922
RAC: 630
Message 88564 - Posted: 27 Mar 2018, 10:46:29 UTC

statement from rosetta staff would be nice
ID: 88564 · Rating: 0
JohnH
Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 88566 - Posted: 27 Mar 2018, 12:26:04 UTC - in response to Message 88564.  

Well said. They always seem conspicuous by their absence from these boards.
ID: 88566 · Rating: 0
JohnH
Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 88567 - Posted: 27 Mar 2018, 12:37:09 UTC

At the risk of appearing stupid - who can tell me the difference between these status elements?

Computing status
Work
  Tasks ready to send: 18125

Tasks by application
  Application     Unsent
  Rosetta         1
  Rosetta Mini    0
ID: 88567 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 88568 - Posted: 27 Mar 2018, 12:47:52 UTC - in response to Message 88564.  

statement from rosetta staff would be nice


+1
ID: 88568 · Rating: 0
Warped
Joined: 15 Jan 06
Posts: 48
Credit: 1,788,185
RAC: 0
Message 88571 - Posted: 27 Mar 2018, 18:40:32 UTC - in response to Message 88568.  

statement from rosetta staff would be nice


+1

+2
ID: 88571 · Rating: 0
VO
Joined: 4 Nov 05
Posts: 7
Credit: 3,250,754
RAC: 0
Message 88572 - Posted: 27 Mar 2018, 19:19:28 UTC

Linux only, I think.
ID: 88572 · Rating: 0
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 88577 - Posted: 28 Mar 2018, 10:08:42 UTC

Back again, apparently affecting all types of machines. The server status page shows very few unsent units (with the requisite scrolling).

I still think I saw sufficient evidence the other day to suggest there was something different going on among the different OS/browser combinations.
#1 Freedom = (Meaningful - Constrained) Choice{5} != (Beer^3 | Speech)
ID: 88577 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88579 - Posted: 28 Mar 2018, 12:48:00 UTC

I run the 24-hour work units, and haven't gotten any work for a bit longer than that. So I am all on my backup project, GPUGrid - Quantum Chemistry, which is relatively new, but Linux only, and runs multi-core.
ID: 88579 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 88592 - Posted: 30 Mar 2018, 5:24:51 UTC

Sooner or later the queue will restart.
I hope.
ID: 88592 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 88594 - Posted: 30 Mar 2018, 13:13:56 UTC - in response to Message 88559.  
Last modified: 30 Mar 2018, 13:16:18 UTC

I've said this before for other reasons, but we make our machines available for whatever the project needs. We don't pay for tasks so we can't demand them. 24/7/365 availability of tasks has never been guaranteed. If the project doesn't utilise that resource, that's up to them.


I agree. But if admins write two lines to explain the situation (for example: "hey, guys, we have problems with scheduler")...
ID: 88594 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88595 - Posted: 30 Mar 2018, 14:15:53 UTC - in response to Message 88594.  

I agree. But if admins write two lines to explain the situation (for example: "hey, guys, we have problems with scheduler")...

That would be useful for our planning purposes. A temporary glitch is different from a long-term shortage, and we could make arrangements accordingly.
ID: 88595 · Rating: 0
JohnH
Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 88596 - Posted: 30 Mar 2018, 15:08:31 UTC - in response to Message 88594.  

I agree. But if admins write two lines to explain the situation (for example: "hey, guys, we have problems with scheduler")...


True dat
ID: 88596 · Rating: 0
JohnH
Joined: 25 Mar 13
Posts: 43
Credit: 2,319,355
RAC: 0
Message 88601 - Posted: 30 Mar 2018, 21:44:40 UTC

Looks like we're back running ... wonder how long until next "blockage"
ID: 88601 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 88602 - Posted: 31 Mar 2018, 2:51:19 UTC - in response to Message 88594.  

I've said this before for other reasons, but we make our machines available for whatever the project needs. We don't pay for tasks so we can't demand them. 24/7/365 availability of tasks has never been guaranteed. If the project doesn't utilise that resource, that's up to them.

I agree. But if admins write two lines to explain the situation (for example: "hey, guys, we have problems with scheduler")...

I'm trying to think what the next few words would be after "..." and I can only really come up with "it wouldn't make the slightest difference to anything"

My current issue is now to manage down the tasks from my back-up project to make space for Rosetta tasks again
ID: 88602 · Rating: 0
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 88611 - Posted: 2 Apr 2018, 3:36:42 UTC

Just stopped by to see if there was any explanation of the recent outages or for the increasing problem with "computation errors" that terminate long-running tasks... Used to be the computation errors usually happened within a few minutes of starting, but I just saw another as the task approached 8 hours.

As usual, I was unable to find much substantive information in these forums, but perhaps that is mostly a visibility-and-search problem for the information that might exist somewhere on the website. Perhaps I have actually come to prefer the "We don't care, so you shouldn't worry either" attitude of this project? It would be nice to know if I get any credit at all for 8 hours of computation that ends with a "computation error" and it would be nice to know if the computation errors were related to particular hardware or OSes, but if they don't care, why should I?

I guess from a BOINC-level perspective the solution is to run several projects. I've actually run a number of them over the years, but most of them were more or less problematic, so that approach doesn't much appeal to me.
ID: 88611 · Rating: 0
Jim1348
Joined: 19 Jan 06
Posts: 881
Credit: 52,257,545
RAC: 0
Message 88614 - Posted: 2 Apr 2018, 12:16:46 UTC - in response to Message 88611.  

Perhaps I have actually come to prefer the "We don't care, so you shouldn't worry either" attitude of this project? It would be nice to know if I get any credit at all for 8 hours of computation that ends with a "computation error" and it would be nice to know if the computation errors were related to particular hardware or OSes, but if they don't care, why should I?

I guess from a BOINC-level perspective the solution is to run several projects. I've actually run a number of them over the years, but most of them were more or less problematic, so that approach doesn't much appeal to me.

Let's just say that they don't find communicating with users to be an efficient use of their time. They might be right.

If you want trouble-free, there is really only World Community Grid. I run a lot of others too of course, but set my expectations accordingly.
ID: 88614 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 88615 - Posted: 2 Apr 2018, 14:05:13 UTC - in response to Message 88614.  
Last modified: 2 Apr 2018, 14:06:26 UTC

Perhaps I have actually come to prefer the "We don't care, so you shouldn't worry either" attitude of this project? It would be nice to know if I get any credit at all for 8 hours of computation that ends with a "computation error" and it would be nice to know if the computation errors were related to particular hardware or OSes, but if they don't care, why should I?

I guess from a BOINC-level perspective the solution is to run several projects. I've actually run a number of them over the years, but most of them were more or less problematic, so that approach doesn't much appeal to me.

Let's just say that they don't find communicating with users to be an efficient use of their time. They might be right.

If you want trouble-free, there is really only World Community Grid. I run a lot of others too of course, but set my expectations accordingly.

I would've said the same, except in the recent period where I've run a lot of WCG tasks I've come up with 6 errors, all from one sub-project (MIP), though all of them were validly completed by the users to whom they were reissued.

This was on my new Intel i3-8350K desktop and not at all on my AMD FX8370, which itself has occasional issues with Rosetta 4.07 tasks (but not mini Rosetta 3.78 tasks). However, both are overclocked, so maybe those particular tasks are making specific individual demands that find the cracks on the outer extremes of my machines or during crashes or power losses etc. I have a flaky laptop that has occasional errors too, but my non-overclocked, non-flaky devices produce none. That's a pretty big clue as to where my issues originate and explains why I don't begin by blaming something else for my own self-inflicted problems.

As such, demanding to find a cause at the project end seems to be a futile exercise, when it's just as likely (if not more so) that it's caused at the user end. So then it's just as legitimate a question for shanen to ask himself what's happening at his end that might explain his computation errors. Do those machines survive a stress test, for example? That would be my first port of call before repeatedly blaming somewhere else.
ID: 88615 · Rating: 0
shanen
Joined: 16 Apr 14
Posts: 195
Credit: 12,662,308
RAC: 0
Message 88627 - Posted: 4 Apr 2018, 6:29:15 UTC - in response to Message 88614.  

WCG is one of the projects I ran pretty heavily. I've concluded that I feel less forgiving towards them because IBM is (or was?) supporting the umbrella of WCG for other projects. One of the many problems that drove me away from WCG was confusing inconsistencies and problems among the projects, perhaps like the next poster noted.

Having said that, I actually stopped by today to warn people about the DRH project, and yet as I type this one I see another computation error from a d9244 project... At least it was an early failure. However I think the DRH warning calls for a fresh thread.
ID: 88627 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2125
Credit: 41,228,659
RAC: 9,701
Message 88649 - Posted: 8 Apr 2018, 0:11:36 UTC - in response to Message 88627.  

WCG is one of the projects I ran pretty heavily. I've concluded that I feel less forgiving towards them because IBM is (or was?) supporting the umbrella of WCG for other projects. One of the many problems that drove me away from WCG was confusing inconsistencies and problems among the projects, perhaps like the next poster noted.

Having said that, I actually stopped by today to warn people about the DRH project, and yet as I type this one I see another computation error from a d9244 project... At least it was an early failure. However I think the DRH warning calls for a fresh thread.

What were the results of the stress tests you ran on your own machines? What did you run? For how long? Were there no errors at all? Or there were errors? How have you gone about fixing or mitigating your issues?

Or you haven't run stress tests and you're blaming everyone and everything else first? Because out of the 4 million people running BOINC projects, no-one else is highlighting the issues that you are at this time.

Just trying to help.
ID: 88649 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org