Validator down... :-(

TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71461 - Posted: 22 Oct 2011, 20:08:22 UTC

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf
rochester new york

Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 71462 - Posted: 22 Oct 2011, 21:03:01 UTC - in response to Message 71461.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf


are the work units we do wasted until this is fixed??
Jesse Viviano

Joined: 14 Jan 10
Posts: 42
Credit: 2,700,472
RAC: 0
Message 71464 - Posted: 23 Oct 2011, 2:10:27 UTC - in response to Message 71462.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf


are the work units we do wasted until this is fixed??

No. They are not wasted. They will simply build a backlog for the validator to process once it is restarted.
Jesse Viviano

Joined: 14 Jan 10
Posts: 42
Credit: 2,700,472
RAC: 0
Message 71465 - Posted: 23 Oct 2011, 2:12:18 UTC - in response to Message 71461.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf

I would have to guess that server bk1 either has gone down or failed. All of the processes that are on bk1 are down, but the processes on the other servers are up.
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71466 - Posted: 23 Oct 2011, 2:43:45 UTC - in response to Message 71465.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf

I would have to guess that server bk1 either has gone down or failed. All of the processes that are on bk1 are down, but the processes on the other servers are up.
Well, that's a very obvious guess...

Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-(

And of course this all just happens to happen when I added another workstation back to crunching for R@H... :?

Ralf
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 71467 - Posted: 23 Oct 2011, 4:36:47 UTC - in response to Message 71466.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf

I would have to guess that server bk1 either has gone down or failed. All of the processes that are on bk1 are down, but the processes on the other servers are up.
Well, that's a very obvious guess...

Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-(

Is that the case for you? Everything's uploaded here and new downloads coming down too. Just awaiting validation - I think for 14 hours.

Because rah_validator_beta appears to be running on server bk2, does that mean that some WUs are being validated - just too slowly?
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71468 - Posted: 23 Oct 2011, 5:41:50 UTC - in response to Message 71467.  
Last modified: 23 Oct 2011, 5:42:11 UTC

Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-(

Is that the case for you? Everything's uploaded here and new downloads coming down too. Just awaiting validation - I think for 14 hours.
1 WU made it mysteriously past the uploading and joined the rest of the previous WU waiting for validation. At least the laptop where I had added R@H again after it "seemed" that things are working ok for a few weeks has two previous finished WU's still sitting as "uploading"...
Get the occasional message that "there is no active Internet connection" (which is absolute bullcrap) now on that one too.
Have stopped R@H from receiving new WU's and added WCG instead. Will see what the R@H WUs currently running will do when they finish in about an hour, tried already to reboot to no avail...

Ralf
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71470 - Posted: 23 Oct 2011, 7:26:32 UTC - in response to Message 71468.  

At least the laptop where I had added R@H again after it "seemed" that things are working ok for a few weeks has two previous finished WU's still sitting as "uploading"...
Subsequent WU's finished and uploaded fine, but those two just won't budge... :-(

In the meantime, the pending list keeps growing... :-(

Ralf
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 71472 - Posted: 23 Oct 2011, 13:01:49 UTC - in response to Message 71468.  

Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-(

Is that the case for you? Everything's uploaded here and new downloads coming down too. Just awaiting validation - I think for 14 hours.
1 WU made it mysteriously past the uploading and joined the rest of the previous WU waiting for validation. At least the laptop where I had added R@H again after it "seemed" that things are working ok for a few weeks has two previous finished WU's still sitting as "uploading"...
Get the occasional message that "there is no active Internet connection" (which is absolute bullcrap) now on that one too.
Have stopped R@H from receiving new WU's and added WCG instead. Will see what the R@H WUs currently running will do when they finish in about an hour, tried already to reboot to no avail...

About 8 hours after my previous message my uploads started having a problem too. A manual update got 3 out of 5 to upload but the other 2 wouldn't shift. Receiving new WUs seems fine.

There's no point switching to another project for me yet as crunching is fine & points are just saved up for later, not lost. I still have debt to Rosetta to catch up from several months ago - don't think I've touched WCG on this account for 3+ months.
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 71477 - Posted: 24 Oct 2011, 12:21:28 UTC

Ten hours ago the validator was not running and I had eight work units pending;
these have now been validated, so something worked during the night,
though I still see plenty of red on the server status page!

I think now that SETI has got rid of its latest gremlins they have set up home@Rosetta :-)

It will get fixed.
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 71478 - Posted: 24 Oct 2011, 12:23:22 UTC

All my uploads have gone through and all previous uploads have been validated now.

Typically, the server status page shows bk1 is still down, but that's about par for the course.
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71480 - Posted: 24 Oct 2011, 14:30:08 UTC - in response to Message 71478.  

All my uploads have gone through and all previous uploads have been validated now.

Typically, the server status page shows bk1 is still down, but that's about par for the course.
The two WUs stuck on uploading finally moved and a couple of WUs have been validated over night, but most of them still show "pending", so not much news on this end...

Ralf
Jesse Viviano

Joined: 14 Jan 10
Posts: 42
Credit: 2,700,472
RAC: 0
Message 71483 - Posted: 24 Oct 2011, 20:11:26 UTC - in response to Message 71467.  

Well, never a dull moment...

Does anyone know what the issue is here or is this (just) another "it's weekend and no sysadmin is around" kind of typical R@H thing again? :-(

Ralf

I would have to guess that server bk1 either has gone down or failed. All of the processes that are on bk1 are down, but the processes on the other servers are up.
Well, that's a very obvious guess...

Problem is now that not only the validator doesn't run but that you can not upload finished WU's either... :-(

Is that the case for you? Everything's uploaded here and new downloads coming down too. Just awaiting validation - I think for 14 hours.

Because rah_validator_beta appears to be running on server bk2, does that mean that some WUs are being validated - just too slowly?

I believe that rah_validator_beta is for beta work units, not Rosetta@home production units.
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71485 - Posted: 25 Oct 2011, 0:12:59 UTC

Well, after two days being down, it took someone apparently less than two hours to fix the problem.
All servers show status running and all but 3 WUs that were stuck as pending have been validated.

Still wonder why the response from the R@H team has to be so abysmal compared to other scientific projects... :-(

Ralf
mt4cancer

Joined: 2 Mar 11
Posts: 1
Credit: 1,321,587
RAC: 0
Message 71489 - Posted: 25 Oct 2011, 14:42:33 UTC

Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old.
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71490 - Posted: 25 Oct 2011, 16:13:02 UTC - in response to Message 71489.  

Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old.
Yeah, something's still up, all WU's that were pending over the weekend went through but now since this morning/last night, WU's keep getting stuck as pending again here as well...

Ralf
Sid Celery

Joined: 11 Feb 08
Posts: 1982
Credit: 38,463,172
RAC: 15,101
Message 71491 - Posted: 25 Oct 2011, 17:10:28 UTC - in response to Message 71490.  

Either the validator is malfunctioning and this is not registering on the Server Status page, or else it is badly behind. I have over 120 work units waiting to validate -- some are several days old.
Yeah, something's still up, all WU's that were pending over the weekend went through but now since this morning/last night, WU's keep getting stuck as pending again here as well...

I agree. My machines are fine with new WUs getting uploaded & validated quickly, but a couple of my team-mates haven't fared so well - many unvalidated WUs over a few days:

Comp 1327856
Comp 1327862
Comp 1370235
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71492 - Posted: 25 Oct 2011, 19:15:53 UTC

Pending WUs are definitely piling up again, RALPH@Home WU's can not be reported due to a server error as well and on the server status page, everything shows running, which it certainly is not... :-(

Ralf
.clair.

Joined: 2 Jan 07
Posts: 274
Credit: 26,399,595
RAC: 0
Message 71493 - Posted: 25 Oct 2011, 21:07:39 UTC

Everything normal here as of now,
Uploads, downloads, validation, no pendings,
Though from what you are saying here something else is not.
TPCBF

Joined: 29 Nov 10
Posts: 109
Credit: 4,684,411
RAC: 2,617
Message 71494 - Posted: 25 Oct 2011, 22:38:48 UTC - in response to Message 71493.  

Everything normal here as of now,
Uploads, downloads, validation, no pendings,
Though from what you are saying here something else is not.
There's certainly something not right, not only with Rosetta@Home but with RALPH@Home as well.
On R@H, WU's are uploaded and reported but then just sit as "pending". This was working at some point yesterday.
And on RALPH@Home, you can not upload any finished WUs due to a "can not attach to shared memory" error on the server(s).

Don't know how much resources Rosetta@Home and RALPH@Home are sharing, but it looks to me as if whatever they fixed yesterday isn't in fact working properly...

Ralf



©2024 University of Washington
https://www.bakerlab.org