Message boards : Number crunching : Question for developers - Does the New Versions on the 20Th have stuck at 1% fix?
Author | Message |
---|---|
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Did the new versions issued on the 20th contain the hoped-for fix for the stuck-at-1% problem? Secondly, were you folks aware that BOINC is supposed to be able to invalidate the prior versions so that work not yet started can run with the new executable? At least it used to have that feature. It may be too late for now, but this may be something to keep in mind for the future. For many of us, this is/was a show-stopper kind of error, and I would much prefer not to risk another "hang" ... Edit: Oh, and an announcement might have been friendly too ... Nothing in news, technical news, or the forums ... oh well, more important things to do I suppose ... |
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Yup, did not know there was a new one........ |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
I only noticed it because I have multiple computers showing in BOINC View and I saw the different version numbers. Another developer question: it seems that I am seeing a slightly higher client error rate. I have had 3-4 work units error out within seconds (which is less annoying than dying after consuming 4 hours, I suppose). I don't know if this is a common experience for other people. But it does seem strange that I would see this many errors when prior experience is that this is not the case ... |
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Looking at my results I had one on the 16th Dec and one way back in November, so nothing seen here yet! |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
|
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
[quote]I only noticed it because of the fact I have multiple computers showing in BOINC View and I saw the different version numbers.[/quote] I'm glad (sort of) to see you make this claim. I got my FIRST client error ever on Dec 19th, and I've been running Rosetta since the beta testing phase. I was shocked. I checked the WU 3721807 and saw that it had been sent out to someone else before me who also had a client error. The WU was re-issued a third time. Now I see this morning another client error on WU 3745860 and it also errored out twice. There must be a problem. (edited to add urls) |
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
Gee thanks.......got 3 Client Errors in a row just after I posted..... |
Basilaris Send message Joined: 2 Nov 05 Posts: 4 Credit: 17,014 RAC: 0 |
My first job with v4.81 errored out as well. Before it did, I noticed that the native structure can be rotated now. The second seems to be running fine so far. |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
[quote]I'm glad (sort of) to see you make this claim. I got my FIRST client error ever on Dec 19th, and I've been running Rosetta since the beta testing phase. I was shocked. I checked the WU 3721807 and saw that it had been sent out to someone else before me who also had a client error. The WU was re-issued a third time. Now I see this morning another client error on WU 3745860 and it also errored out twice. There must be a problem. (edited to add urls)[/quote] 1hz6A_topology_sample_203_2788 failed on four computers (Linux/Windows/Intel/AMD/old and new application version). |
Tern Send message Joined: 25 Oct 05 Posts: 576 Credit: 4,695,362 RAC: 7 |
I had to try it... the first one, 1n0u__topology_sample_204_12744, errored after 13 seconds. I was the second to get it, and it's about to go to a third. You may want to lower the number of "error/total/success" results allowed. If a WU is bad, it doesn't make sense to send it to 10 people. I would think 3 or 4 would be enough to be pretty sure there's a problem. And I have to ask - what happened to the communication? We get a new application downloaded, and I have no idea if it's Windows-only, Windows/Linux, or Windows/Linux/Mac, or what is changed in it (other than rotating the protein in the graphics on Windows), and I have to assume, based on thread posting timestamps, that it was released just as everybody left for the day... |
Hoelder1in Send message Joined: 30 Sep 05 Posts: 169 Credit: 3,915,947 RAC: 0 |
[quote]I had to try it... the first one, 1n0u__topology_sample_204_12744, errored after 13 seconds. I was the second to get it, it's about to go to a third.[/quote] So, what are these *topology_sample_nnn_nnnnn units anyway -- are they any different from the recent *topology_sample_nnnnnn ones? ;-) |
Webmaster Yoda Send message Joined: 17 Sep 05 Posts: 161 Credit: 162,253 RAC: 0 |
[quote]I had to try it... the first one[/quote] Me too. One is still running OK after 20 minutes (1n0u__topology_sample_204_14869_0). The other (1ogw__topology_sample_204_3923_4) errored out in less than 20 seconds (and had crashed on 4 other machines before mine). Both on Rosetta 4.81. Observation: the error number (0xc0000005) is the same one that occurs when Rosetta is switched out of memory. [EDIT]Either we have a bad batch of work units or the new app is broken[/EDIT] *** Join BOINC@Australia today *** |
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
[quote]I had to try it... the first one[/quote] As Hoelder1in said, client errors occurred both before and after the new version of the app; doesn't that seem to point to the WU rather than the app? |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The new applications have modifications in the scientific code. The modifications were, again, made to allow increased diversity in the search. The app can now read in larger protein fragment libraries, and the number of score-specific cycles can be increased. The screen saver was slightly modified to allow rotation of the native structure. Bin in our group was able to find a bug that may have caused an infinite loop in certain very infrequent circumstances, but we do not know if this is the bug people are seeing. There is no difference between the *topology_sample_nnnnn and the *topology_sample_nnn_nnnnn work units. The additional number (nnn) specifies a batch number used in our new work generator. I do not know what is causing these errors, but I will look into it. I would try restarting BOINC and seeing what happens. I'll be posting something on the web site soon about these recent changes. |
Lee Carre Send message Joined: 6 Oct 05 Posts: 96 Credit: 79,331 RAC: 0 |
I've got a "DEFAULT_2reb_205_29_0" unit crawling along very slowly with the v4.81 app. It's at 3.8% after 11 hours, running at 19 MFLOPS according to boincview, but then the deadline is 17/01/2006. Is this normal? |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT. An explanation will be posted soon, but in short, we accidentally sent out 1100 work units with very long run times (1000 structures to be made instead of 10). Sorry about this problem for those who have been crunching these since last night. |
rbpeake Send message Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0 |
[quote]IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT.[/quote] Nonetheless, if for some reason the whole workunit is processed (over days and days), might we assume the results would be valid for you (it's just that the workunit is 100X larger than normal)? Assuming it is completed before the one-month deadline, of course. :) Regards, Bob P. |
Lee Carre Send message Joined: 6 Oct 05 Posts: 96 Credit: 79,331 RAC: 0 |
[quote]IF ANYONE SEES A "DEFAULT_xxxxx_205_.........." (batch 205) WORKUNIT PLEASE ABORT IT.[/quote] lol, oops, made me smile though. Doesn't seem to be a good day for projects: SIMAP posted a news item saying that stats were available on boincsimap.com instead of boincstats.com. As rbpeake asked: would the results actually be useful? I don't mind leaving it to run, and I'm pretty sure it'll meet the deadline (only 1 of 2 projects running on that dual-core host). If it's not gonna be useful, I'm happy to abort. I'm here for the science, not credits, so it doesn't bother me at all. |
Jack Schonbrun Send message Joined: 1 Nov 05 Posts: 115 Credit: 5,954 RAC: 0 |
The results would be valid, though I'm not sure how your or our computers would like files that are 100 times as big. (All of the structures are concatenated in one file.) It's probably better for everybody to abort and get new WUs. We are still investigating the issue with the WUs that finish too quickly. |
Scribe Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 |
I found one waiting to run and deleted it. Can we now assume that there are no more waiting to be downloaded, just in case I go to bed and get one overnight? |
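
Tern's suggestion in this thread about lowering the "error/total/success" limits can be illustrated with a small sketch. This is not BOINC's scheduler code; the class `WorkUnitLimits`, the function `should_reissue`, and the field names are hypothetical stand-ins for the per-work-unit limits he refers to, and the numbers are only the ones mentioned in the posts (10 versus "3 or 4").

```python
from dataclasses import dataclass


@dataclass
class WorkUnitLimits:
    # Hypothetical names for the "error/total/success" limits discussed above.
    max_errors: int      # stop once this many copies have errored out
    max_total: int       # stop once this many copies have been sent in total
    max_successes: int   # stop once this many copies have succeeded


def should_reissue(errors: int, successes: int, limits: WorkUnitLimits) -> bool:
    """Return True if another copy of the work unit may still be sent out."""
    total = errors + successes
    if errors >= limits.max_errors:
        return False  # too many failures: the work unit itself is probably bad
    if total >= limits.max_total:
        return False  # overall cap on copies reached
    if successes >= limits.max_successes:
        return False  # enough good results already
    return True


# A work unit that has already crashed on 4 machines, as reported in the thread.
loose = WorkUnitLimits(max_errors=10, max_total=10, max_successes=10)
strict = WorkUnitLimits(max_errors=3, max_total=10, max_successes=10)

print(should_reissue(errors=4, successes=0, limits=loose))   # still reissued
print(should_reissue(errors=4, successes=0, limits=strict))  # given up on
```

With the looser limit, a bad work unit keeps going out to more hosts; with a limit of 3 it would already have been withdrawn before reaching a fifth machine.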
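
For anyone with a long task list, the request to abort "DEFAULT_xxxxx_205_..." work units can be turned into a quick name check. The sketch below only flags names; aborting is still done by hand in the BOINC manager. The pattern is an assumption inferred from the names quoted in this thread, not an official naming specification, and `is_batch_205` is a hypothetical helper.

```python
import re

# Deliberately loose pattern mirroring "DEFAULT_xxxxx_205_..." from the post:
# the name starts with DEFAULT_, then some target identifier, then _205_.
# Assumption: the batch number is the first "_205_" field after the target.
BATCH_205 = re.compile(r"^DEFAULT_.+?_205_")


def is_batch_205(name: str) -> bool:
    """Return True if the work unit name looks like it belongs to batch 205."""
    return BATCH_205.match(name) is not None


# Names taken from posts in this thread.
names = [
    "DEFAULT_2reb_205_29_0",
    "1n0u__topology_sample_204_14869_0",
    "1ogw__topology_sample_204_3923_4",
]

to_abort = [n for n in names if is_batch_205(n)]
print(to_abort)
```

Because the match is loose, it is worth eyeballing anything it flags before aborting.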
©2024 University of Washington
https://www.bakerlab.org