March 2022 - WU error rates

jay · Joined: 12 Jan 08 · Posts: 25 · Credit: 204,025 · RAC: 0
Message 105411 - Posted: 11 Mar 2022, 15:46:08 UTC

Greetings,

I am running the non-VBox WUs and getting errors. I compare my errored results with my wingman's. For example, on one WU a Windows volunteer errored out in 20 seconds. My Linux WU ran for 35,939 seconds and then got the error "Too many errors (may have bug) Too many total results" at validation. See https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1315068895

What gives? Is this a time of testing the WU? If so, can they be tested by Rosetta before releasing? For me it is a matter of electricity, heat, and non-productive work.

Another WU, https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1315163591, has already errored out on my Windows wingman, while my Linux box is still crunching. (It has run for about 6 hours with 3.5 hours remaining. I am concerned about a similar failure.)

I looked on the forums for recent errors but did not see any. Is anyone else having problems, or having no errors?

THANKS,
Jay

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2207 · Credit: 13,720,774 · RAC: 3
Message 105412 - Posted: 11 Mar 2022, 16:07:41 UTC - in response to Message 105411.

> Is this a time of testing the WU?

Yes.

> If so, can they be tested by Rosetta before releasing?

No, Ralph@Home is largely unused.

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1940 · Credit: 18,534,891 · RAC: 0
Message 105414 - Posted: 11 Mar 2022, 22:41:12 UTC - in response to Message 105411.

> What gives? Is this a time of testing the WU? If so, can they be tested by Rosetta before releasing? For me it is a matter of electricity, heat, and non-productive work.

Due to the nature of Rosetta work, Tasks that error out can still give useful results, which is why in most cases you will still get Credit for a Task that produces an error. Unfortunately there has been little work (if any) on actually coding them so that such Tasks, instead of crashing out with an error, just end early (as they should). Only Tasks that are truly an error (i.e. not producing useful data) should actually error out. And the applications should be fixed where one version (e.g. for Windows) produces errors when the other (e.g. Linux) doesn't (the reverse has also occurred in the past).

But since Rosetta 4.20 has pretty much been abandoned apart from the occasional small batch every so often, there is no such effort. Not surprising, as the new type of Rosetta Tasks (Python) have plenty of significant issues of their own, and there has been no updated application to address them at all. If they aren't going to fix their current application, there's no way on Earth they're going to fix the old deprecated ones.

Grant
Darwin NT

Sid Celery · Joined: 11 Feb 08 · Posts: 2599 · Credit: 47,220,881 · RAC: 0
Message 105420 - Posted: 12 Mar 2022, 2:41:39 UTC - in response to Message 105414.

>> What gives? Is this a time of testing the WU? If so, can they be tested by Rosetta before releasing? For me it is a matter of electricity, heat, and non-productive work.
>
> Due to the nature of Rosetta work, Tasks that error out can still give useful results, which is why in most cases you will still get Credit for a Task that produces an error. Unfortunately there has been little work (if any) on actually coding them so that such Tasks, instead of crashing out with an error, just end early (as they should). Only Tasks that are truly an error (i.e. not producing useful data) should actually error out. And the applications should be fixed where one version (e.g. for Windows) produces errors when the other (e.g. Linux) doesn't (the reverse has also occurred in the past).
> But since Rosetta 4.20 has pretty much been abandoned apart from the occasional small batch every so often, there is no such effort. Not surprising, as the new type of Rosetta Tasks (Python) have plenty of significant issues of their own, and there has been no updated application to address them at all. If they aren't going to fix their current application, there's no way on Earth they're going to fix the old deprecated ones.

Taking a quick look, they're both "preetham" tasks. I just had one here and it barely reached 2 seconds of CPU time before crashing.

Interesting to read that they last much longer on Linux than on Windows. That seems to be a pattern in recent times. Ugh...

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2207 · Credit: 13,720,774 · RAC: 3
Message 105423 - Posted: 12 Mar 2022, 9:46:54 UTC - in response to Message 105414.

> But since Rosetta 4.20 has pretty much been abandoned apart from the occasional small batch every so often, there is no such effort. Not surprising, as the new type of Rosetta Tasks (Python) have plenty of significant issues of their own, and there has been no updated application to address them at all.

I continue to write messages (about, for example, multi-attach disks on VirtualBox) on Twitter to "stimulate" the admins. Up to now, without results.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2207 · Credit: 13,720,774 · RAC: 3
Message 105424 - Posted: 12 Mar 2022, 9:53:38 UTC - in response to Message 105420.

> Interesting to read that they last much longer on Linux than on Windows. That seems to be a pattern in recent times. Ugh...

They write the native code on Linux and afterwards compile it for other platforms. So they probably don't pay attention to that part of the coding (which is as important as writing the code itself).

Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0
Message 105425 - Posted: 12 Mar 2022, 16:15:30 UTC - in response to Message 105423.

> I continue to write messages (about, for example, multi-attach disks on VirtualBox) on Twitter to "stimulate" the admins. Up to now, without results.

I could live with their other problems, but not the high write rates to the SSD. Even with a huge (32 GB) write cache, and running only six work units on 50% of the cores of a Ryzen 3600 (Ubuntu 20.04.4), I was seeing writes to disk of over 800 GB/day. It is probably because of how they handle the .VDI files; computezrmle has told them how to do it. I get the impression that this researcher has never developed a program for BOINC before, and isn't interested in learning how to do it now.

Greg_BE · Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0
Message 105426 - Posted: 12 Mar 2022, 18:52:37 UTC - in response to Message 105425.

>> I continue to write messages (about, for example, multi-attach disks on VirtualBox) on Twitter to "stimulate" the admins. Up to now, without results.
>
> I could live with their other problems, but not the high write rates to the SSD. Even with a huge (32 GB) write cache, and running only six work units on 50% of the cores of a Ryzen 3600 (Ubuntu 20.04.4), I was seeing writes to disk of over 800 GB/day. It is probably because of how they handle the .VDI files; computezrmle has told them how to do it. I get the impression that this researcher has never developed a program for BOINC before, and isn't interested in learning how to do it now.

You're just stating the obvious. As long as they get something around a 95% clean result (perhaps as low as 90%), then they are happy. We have discussed this to the end of the world and beyond. They don't care about the PC side as long as they get a pretty good result. As far as I know they do not monitor Twitter, and they never come here anymore. They are not open to suggestions. Their way works, so why change or update it?

RALPH: they should shut that off. They never use it.

That's the summary of it all. You get what you get: if it works well on Linux, great, then they have a result from Linux. If it works on Windows, even better; if it doesn't, oh well. The only thing that is important to them is their neural network system. The PCs are a nice addition. Very much the way BOINC TACC works.

Jim1348 · Joined: 19 Jan 06 · Posts: 881 · Credit: 52,257,545 · RAC: 0
Message 105427 - Posted: 12 Mar 2022, 20:52:12 UTC - in response to Message 105426.

> You're just stating the obvious. As long as they get something around a 95% clean result (perhaps as low as 90%), then they are happy. We have discussed this to the end of the world and beyond.

I have been discussing it for far longer. And you managed to miss the point about the writes. Maybe you don't monitor them?

Greg_BE · Joined: 30 May 06 · Posts: 5774 · Credit: 6,139,760 · RAC: 0
Message 105435 - Posted: 13 Mar 2022, 10:31:03 UTC - in response to Message 105427.

>> You're just stating the obvious. As long as they get something around a 95% clean result (perhaps as low as 90%), then they are happy. We have discussed this to the end of the world and beyond.
>
> I have been discussing it for far longer. And you managed to miss the point about the writes. Maybe you don't monitor them?

I probably don't write as much data to my drive as you do. So far I have written around 65 TB to my oldest drive and it is still in good health. According to Samsung's information, I am just approaching the middle age of the drive. Wasn't it you who talked about a cache program that would reduce the writes?

But again, it's another topic that has been discussed and ignored by the team, so why holler on about it? They obviously don't care. I've said it a lot already and others say the same. The team does NOT care about PC users. They only care about their neural network. We get the scraps or the wild ideas in whatever form they come in. They do not change anything. You get what you get, good or bad, large or small. If you burn out your drive due to the writes, that's of no concern to them. There will always be someone to take your place. We can make suggestions and complain all we want, but they are NOT interested. That is very clear here on the message boards, via Twitter, and from the one person who can get through to them, whose emails they just acknowledge before doing nothing. As long as they get the data by whatever means necessary, they are happy. If machines fail or people quit, that does not matter.

[VENETO] boboviz · Joined: 1 Dec 05 · Posts: 2207 · Credit: 13,720,774 · RAC: 3
Message 105438 - Posted: 13 Mar 2022, 16:36:59 UTC - in response to Message 105435.

> They obviously don't care. I've said it a lot already and others say the same. The team does NOT care about PC users. They only care about their neural network. We get the scraps or the wild ideas in whatever form they come in. They do not change anything. You get what you get, good or bad, large or small. If you burn out your drive due to the writes, that's of no concern to them. There will always be someone to take your place. We can make suggestions and complain all we want, but they are NOT interested. That is very clear here on the message boards, via Twitter, and from the one person who can get through to them, whose emails they just acknowledge before doing nothing. As long as they get the data by whatever means necessary, they are happy. If machines fail or people quit, that does not matter.

I'm starting to think you're right. I love this project, I've supported it for YEARS, but I'm starting to get a little tired.

xroule · Joined: 9 Feb 15 · Posts: 4 · Credit: 59,781,566 · RAC: 0
Message 105465 - Posted: 16 Mar 2022, 15:34:34 UTC - in response to Message 105438.

With 3371 WUs and 3118 errors in 12 hours, I can't wait for WCG to reopen. For now, this is the only project for me. What a waste of resources!

9 PM, do you know what your PC is doing??

keputnam · Joined: 18 Sep 05 · Posts: 24 · Credit: 2,134,864 · RAC: 0
Message 105466 - Posted: 16 Mar 2022, 17:59:54 UTC - in response to Message 105426.

> As long as they get something around a 95% clean result (perhaps as low as 90%), then they are happy.

Really? In the last 2 1/2 days I have had 140 "Error while computing" results. My wingmen have all had the same results. They are NOT getting any results at all, and are awarding NO credit for these jobs.

Jean-David Beyer · Joined: 2 Nov 05 · Posts: 221 · Credit: 7,592,056 · RAC: 0
Message 105467 - Posted: 16 Mar 2022, 18:36:42 UTC - in response to Message 105426.

> As long as they get something around a 95% clean result (perhaps as low as 90%), then they are happy. We have discussed this to the end of the world and beyond. They don't care about the PC side as long as they get a pretty good result.

Well, they sure are not getting 95% clean results from me and my "wingmen." We are getting 100% failure rates. My wingmen and I run different hardware and different operating systems, some Linux, some Windows. It does not matter: they all fail.

I disabled getting new work units last evening, and when I noticed more units had been added to the list today, I got a bunch more. They all failed immediately: over 300 failures in this batch just for me. So I disabled getting new work units again.

spiralfeel · Joined: 25 Apr 20 · Posts: 1 · Credit: 235,796 · RAC: 0
Message 105477 - Posted: 16 Mar 2022, 20:25:05 UTC - in response to Message 105465.

> With 3371 WUs and 3118 errors in 12 hours, I can't wait for WCG to reopen. For now, this is the only project for me. What a waste of resources!

You should consider TN-Grid http://gene.disi.unitn.it/test/ and SiDock@home https://www.sidock.si/sidock/