Question for developers - Does the New Versions on the 20Th have stuck at 1% fix?

Message boards : Number crunching : Question for developers - Does the New Versions on the 20Th have stuck at 1% fix?

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 6917 - Posted: 20 Dec 2005, 19:34:09 UTC - in response to Message 6915.  

> I found one waiting to run and deleted it. Can we now assume that there are no more waiting to be downloaded, just in case I go to bed and get one overnight?


I think they are no longer in the queue, so you should be able to rest easy.

Pixiebot
Joined: 6 Nov 05
Posts: 50
Credit: 60,515
RAC: 0
Message 6918 - Posted: 20 Dec 2005, 19:34:19 UTC
Last modified: 20 Dec 2005, 19:37:24 UTC

Unfortunately not

Result 4468774 (work unit 3761481): sent 20 Dec 2005 19:23:37 UTC, deadline 17 Jan 2006 19:23:37 UTC, server state In Progress, outcome Unknown, client state New

Lee Carre
Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 6919 - Posted: 20 Dec 2005, 19:36:15 UTC - in response to Message 6914.  
Last modified: 20 Dec 2005, 19:37:11 UTC

> The results would be valid, though I'm not sure how your or our computers would like the files that are 100 times as big. ... It's probably better for everybody to abort and get new WUs. We are still investigating the issue with WUs finishing too quickly.

Well, my concerns are: has the work been sent out again as the "correct" smaller WUs? And have the "DEFAULT" units been canceled on the server/database, so that they aren't sent out anymore?

I, for one, am going to leave mine running until I hear a definite answer from the officials.

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 6921 - Posted: 20 Dec 2005, 19:55:21 UTC

Having now aborted the 'default...' WU, it now sits in the work area with a status of 'Aborted by User'. There have been a couple of updates since then where results have been reported but it still sits there.
How can I get rid of it?

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 6928 - Posted: 20 Dec 2005, 20:57:39 UTC - in response to Message 6919.  


> I, for one, am going to leave mine running until I hear a definite answer from the officials.


I am an official, and I can tell you that the right thing to do is abort the Work Unit. They will be sent again, with the correct arguments.

River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 6931 - Posted: 20 Dec 2005, 21:09:26 UTC - in response to Message 6908.  

> IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT.


When Einstein@Home made a mistake like this, they managed to give everyone credit for the aborted WUs; if you ask them nicely, they may still have the script handy. (If they are not sure when this was, it was when they issued WUs whose names differed only by upper vs. lower case, which confused the Windows machines.)

People initially got 0, but after the script was run everyone ended up getting what their client claimed for the result.

Just a thought; I don't think it affects me personally.

R~~

Desti
Joined: 16 Sep 05
Posts: 50
Credit: 3,018
RAC: 0
Message 6944 - Posted: 20 Dec 2005, 22:23:52 UTC - in response to Message 6908.  

> IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT.
>
> An explanation will be posted soon, but in short, we accidentally sent out 1100 work units with very long run times (1000 structures to be made instead of 10).
>
> Sorry about this problem for those who have been crunching these since last night.


Are the DEFAULT__206 workunits OK?
I have just finished one of them without any problems :)

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=3768678
LUE

David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 6949 - Posted: 20 Dec 2005, 22:49:26 UTC

Batch 206 is okay, ONLY ABORT 205.

Lee Carre
Joined: 6 Oct 05
Posts: 96
Credit: 79,331
RAC: 0
Message 6951 - Posted: 20 Dec 2005, 23:13:54 UTC - in response to Message 6928.  

> I am an official, and I can tell you that the right thing to do is abort the Work Unit. They will be sent again, with the correct arguments.
Sorry, I didn't mean to imply that you weren't; I just meant someone from the group of admins generally. Since I've been out for a bit, the WU timed out anyway (I think it exceeded the maximum CPU time allowed), so there we go, lol.

MikeX
Joined: 17 Sep 05
Posts: 1
Credit: 16,201
RAC: 0
Message 6953 - Posted: 20 Dec 2005, 23:17:05 UTC

WUs 4469584, 4436212 and 4399861 had an 'Unrecoverable error' after just a few minutes of crunching.

Jack Schonbrun
Joined: 1 Nov 05
Posts: 115
Credit: 5,954
RAC: 0
Message 6967 - Posted: 21 Dec 2005, 1:12:58 UTC

Just to let everyone know, we are closing in on the bug causing many jobs to exit after a minute or so. We have a workaround until the bug is found, so we should be able to keep sending reliable jobs over the holidays.

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 6969 - Posted: 21 Dec 2005, 5:48:32 UTC

Thanks Jack

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 6988 - Posted: 21 Dec 2005, 11:39:14 UTC

Still getting lots of the aborting ones after a few seconds...

rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 7022 - Posted: 21 Dec 2005, 15:24:25 UTC - in response to Message 6908.  

> IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT.
>
> An explanation will be posted soon, but in short, we accidentally sent out 1100 work units with very long run times (1000 structures to be made instead of 10).
>
> Sorry about this problem for those who have been crunching these since last night.

Is there any way to stop these from getting re-issued after someone processes them until they default? I got a second go-around on one that someone else had crunched until it was defaulted. I guess they could be re-issued up to 5 times?
Regards,
Bob P.

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 7023 - Posted: 21 Dec 2005, 15:29:28 UTC
Last modified: 21 Dec 2005, 15:30:47 UTC

5 times? I have some that have been issued 10 times...that is why I have suspended Rosetta till they are fixed....it is a waste of my bandwidth.

(this is with ref to the 'crashing' ones after a few seconds)

Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 7030 - Posted: 21 Dec 2005, 16:02:37 UTC - in response to Message 6921.  

> Having now aborted the 'default...' WU, it now sits in the work area with a status of 'Aborted by User'. There have been a couple of updates since then where results have been reported but it still sits there.
> How can I get rid of it?

Wait; it will try to run and immediately fail with a client error. Then reporting will clear it like other failed work units.

Scribe
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 7032 - Posted: 21 Dec 2005, 16:13:39 UTC

Yeah thanks Paul...it had gone when I got up this morning!

rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 7047 - Posted: 21 Dec 2005, 17:48:56 UTC - in response to Message 7023.  

> 5 times? I have some that have been issued 10 times...that is why I have suspended Rosetta till they are fixed....it is a waste of my bandwidth.
>
> (this is with ref to the 'crashing' ones after a few seconds)

I was referring to the 100X as long ones, the 205 series that run for hours before they default out, but your point is valid too!

Regards,
Bob P.



©2024 University of Washington
https://www.bakerlab.org