Minirosetta v1.47 bug thread.

Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0	Message 57902 - Posted: 15 Dec 2008, 22:08:36 UTC
Ho ho, kids! We've got a new minirosetta version with, you've guessed it, more bug fixes! Woo! Please report any remaining issues here; that would be grand :)
http://beautifulproteins.blogspot.com/ http://www.miketyka.com/

Stephen Joined: 26 Apr 08 Posts: 32 Credit: 429,286 RAC: 0	Message 57903 - Posted: 15 Dec 2008, 22:21:18 UTC - in response to Message 57902.
Are there any new changes to the science?

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 57905 - Posted: 15 Dec 2008, 23:25:25 UTC
...So, which bugs do you feel you've fixed? Which of the many users who have abandoned the project due to problems should feel it is safe to re-enter the waters?
Add this signature to your email: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS, or Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/

Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0	Message 57906 - Posted: 16 Dec 2008, 0:04:21 UTC - in response to Message 57905.
Quote: ...So, which bugs do you feel you've fixed? Which of the many users who have abandoned the project due to problems should feel it is safe to re-enter the waters?
Amongst a bunch of minor things, one major bug we fixed was causing jobs to crash when they entered the full-atom stage but had a full-atom energy > 0. That happens only rarely, which would explain the random errors seen with the cs_vanilla jobs. The bug was due to a wrongly initialized variable.
This bug was also causing the majority of the ccc_1_8_* jobs to fail on RALPH (we didn't move these over to BOINC, of course, since we noticed the bug there). The reason those failed more frequently is that they have constraints built in, and those offset the energy to higher values, increasing the frequency of the problem to more like 70%.
Looking at the RALPH results, I think we've fixed most of the easily reproducible errors. I recently ran close to 10,000 WUs on our local compute cluster, resulting in... well... 0 errors. This is where it gets tricky, really: stuff that only fails on other platforms, or due to machine-dependent issues, or god knows what. I will propose that the lab acquire a small farm of Windows machines to do extensive bug testing and hunting on, to get a grip on these errors... but believe us, these are difficult grounds.
Thanks for bearing with us, Mike
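
For readers wondering how a "wrongly initialized variable" can hide until the energy goes above zero, here is a purely hypothetical Python sketch of that class of bug. It is not Minirosetta code (which is not shown in this thread); the function names and the "keep the lowest energy" logic are invented for illustration only.

```python
import math

def pick_best_buggy(energies):
    """Return the lowest energy in a non-empty list (hypothetical buggy version)."""
    best = 0.0          # BUG: the running minimum starts at 0.0 instead of +infinity
    best_index = None
    for i, e in enumerate(energies):
        if e < best:    # a positive energy can never beat the initial 0.0
            best, best_index = e, i
    # If every energy is > 0, best_index is still None and the lookup below
    # raises TypeError, so the "crash" only appears in the all-positive case.
    return energies[best_index]

def pick_best_fixed(energies):
    """Same search with the running minimum initialized correctly."""
    best = math.inf
    best_index = None
    for i, e in enumerate(energies):
        if e < best:
            best, best_index = e, i
    return energies[best_index]
```

With energies [-12.5, -3.0, 4.2] both versions return -12.5. Add a hypothetical constant constraint penalty of +20 to every value, giving [7.5, 17.0, 24.2], and the buggy version raises TypeError while the fixed one returns 7.5. This mirrors the point about constraints: anything that shifts energies upward makes the failing case much more common.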

LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0	Message 57910 - Posted: 16 Dec 2008, 1:24:57 UTC
Sorry Mike, not a good start...
1483407
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0049162C read attempt to address 0x00000000
Engaging BOINC Windows Runtime Debugger...

Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0	Message 57911 - Posted: 16 Dec 2008, 1:39:50 UTC - in response to Message 57906.
I can't even imagine the loads of code you guys went through.

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 57913 - Posted: 16 Dec 2008, 1:56:14 UTC
The one bug that comes to mind that would not be easy to spot by counting successfully completed results on a farm of Linux machines, all running only a single project, is tasks not suspending properly. Someone mentioned that a BOINC API compatibility problem might be the cause?
What would reasonable memory expectations be now? Are all the 1.47 tasks tagged as needing 512MB minimum, or is there a mix? And of that 512MB, how much should one expect a task to actually use when running normally?

Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0	Message 57914 - Posted: 16 Dec 2008, 2:12:00 UTC - in response to Message 57913.
Quote: The one bug that comes to mind that would not be easy to spot by counting successfully completed results on a farm of Linux machines, all running only a single project, is tasks not suspending properly. Someone mentioned that a BOINC API compatibility problem might be the cause?
You're right. However, I believe David Kim updated the code and fixed this problem as of 1.45. If you still see problems with suspension of jobs, do let us know. We also hope the lockfile problem is now largely fixed. We'll have to wait for the error statistics to come in before we know whether the API fix has worked.
Quote: What would reasonable memory expectations be now? Are all the 1.47 tasks tagged as needing 512MB minimum, or is there a mix? And of that 512MB, how much should one expect a task to actually use when running normally?
I can't speak for the enzyme design guys, but to give you an idea: the jobs named "_rlbd_" and "_rlbn_" should take no more than 160 MB or so. The jobs named "cc2_" or "_chunk_*" should take between 150 and 320 MB or so (they are much larger proteins). I'm not aware of any jobs that require more than 400 MB; that would definitely point to a problem. That said, the enzyme design guys may well have higher requirements.
Mike
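
As a rough aid for checking a running task against the figures above, here is a hypothetical Python helper. The name patterns and MB limits come from the post; the function names, the substring matching, and the three-way verdict are assumptions made for illustration, not anything the project ships. Unknown job families (such as enzyme design) are reported as "unknown" rather than flagged, since no figure was given for them.

```python
from typing import Optional, Tuple

PROBLEM_THRESHOLD_MB = 400  # "not aware of any jobs that require more than 400MB"

def expected_memory_mb(task_name: str) -> Optional[Tuple[int, int]]:
    """Return a (low, high) MB range for the job families quoted above, else None."""
    if "_rlbd_" in task_name or "_rlbn_" in task_name:
        return (0, 160)      # "no more than 160 MB or so"
    if "cc2_" in task_name or "_chunk_" in task_name:
        return (150, 320)    # much larger proteins
    return None              # e.g. enzyme design jobs: no figure given

def memory_verdict(task_name: str, used_mb: float) -> str:
    """Classify observed memory use; assumes substring matching on the task name."""
    expected = expected_memory_mb(task_name)
    if expected is None:
        return "unknown"         # may legitimately need more
    if used_mb > PROBLEM_THRESHOLD_MB:
        return "problem"         # would "definitely point to a problem"
    if used_mb > expected[1]:
        return "above typical"
    return "ok"
```

For example, a cc2_1_8_native task using 450 MB would come back as "problem", while an _rlbd_ task at 120 MB would be "ok".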

Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0	Message 57915 - Posted: 16 Dec 2008, 2:14:15 UTC - in response to Message 57910.
Quote: Sorry Mike, not a good start...
Yes, I know. I'm not saying the app is perfect, just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues; we're working on it :)
Mike

Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0	Message 57916 - Posted: 16 Dec 2008, 2:16:13 UTC - in response to Message 57911.
Quote: I can't even imagine the loads of code you guys went through.
Well... to give you an idea: Minirosetta has more than 200,000 (yes, two hundred thousand) lines of code. Each day there are maybe around 20 additions to the code, with around 40 people working on it at any given time. But we'll get there; I'm optimistic that with time we'll find the problems.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 57923 - Posted: 16 Dec 2008, 8:53:03 UTC Last modified: 16 Dec 2008, 8:54:42 UTC
I hope you guys get a small farm of Windows machines to double-check problems against your Linux machines. Windows is what the majority of us crunchers use, and certain error types may or may not show up on Linux. For instance, how does one tell the difference between a machine error and an application error when the task dies with a 0xc0000005 error? Is this something that shows up on your Linux machines, or is it a Windows-specific error code?
Also, in another thread you mentioned aborting tasks that use versions lower than 1.47. Would these tasks be reissued using 1.47, or would they use the same mini version they originated with?
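
On the 0xc0000005 question: that value is the Windows status code for an access violation, so it is Windows-specific; the Linux/Mac equivalent of this kind of crash is a SIGSEGV, as in ramostol's Mac log below. The decimal and hex numbers in these logs are simply the same bits read as signed or unsigned. The small Python sketch below (helper names are hypothetical) shows the reinterpretation for both logs. Note that the code alone only says the process touched memory it shouldn't have; a read from address 0x00000000, as in LizzieBarry's log, is typical of a null-pointer dereference in the application, but the code cannot by itself rule out a hardware fault.

```python
def as_unsigned(code: int, bits: int = 32) -> int:
    """Reinterpret a (possibly negative) exit code as an unsigned `bits`-wide value."""
    return code & ((1 << bits) - 1)

def as_signed(value: int, bits: int = 32) -> int:
    """Reinterpret an unsigned `bits`-wide value as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

# Windows log (message 57910): "exit code -1073741819 (0xc0000005)"
print(hex(as_unsigned(-1073741819)))      # 0xc0000005
# Mac log (message 57926): "process exited with code 193 (0xc1, -63)"
print(hex(193), as_signed(193, bits=8))   # 0xc1 -63
```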

ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0	Message 57926 - Posted: 16 Dec 2008, 10:50:42 UTC
My 1.47 cc2_1_8_mammoth tasks have all crashed on RALPH; now my 1.47 cc2_1_8_native tasks are crashing on Rosetta. Example (1 of 2):
cc2_1_8_native_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_4_5599_36_0
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.47_i686-apple-darwin(95094,0xa0538fa0) malloc: *** error for object 0x1747df0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation

ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0	Message 57927 - Posted: 16 Dec 2008, 11:59:02 UTC
And now all of today's downloaded 1.47 tasks for the upcoming week have collapsed, most of them after less than 1 minute of computing; one was manually aborted as potentially never-ending. It seems I will have to stick to my 5.98 tasks for a few days and increase the default runtime.

jjwhalen Joined: 20 Dec 06 Posts: 4 Credit: 399,398 RAC: 0	Message 57928 - Posted: 16 Dec 2008, 12:17:35 UTC
Minirosetta apparently "looks like" malware, whether it actually is or not. This applies to all versions I've run, through v1.47. I run BOINC on two WinVista (God help me) boxes: one a 32-bit Sony with ZoneAlarm Pro/ESET NOD32 for security, the other a 64-bit Sony with Kaspersky Internet Security 2009.
On the first machine, NOD32 Antivirus thinks the Minirosetta .exe either contains a viral signature or looks bad heuristically (its UI doesn't say which). I have to add an exclusion to get the thing out of quarantine every time a new version is released. Interestingly, ZoneAlarm Pro's application module hasn't had a problem with it.
On the 64-bit machine, Kaspersky's Application Control module gives Minirosetta's executable a Threat Rating of "Potentially Dangerous" with a heuristic Danger Index score of 82. I have to manually override Kaspersky and move Minirosetta out of the "Untrusted Application" zone to allow it to execute. (By comparison, Rosetta Beta 5.98 has a DI of 12, as does SETI's recently released Astropulse 5.0. SETI's regular Enhanced v6.03 has a DI of zero.)
I realize that heuristic analysis is as much art as science, but both ESET and Kaspersky are rated at or near the top of their field. Of the 10 project hosts I subscribe to, with over 25 project executables, Minirosetta is the ONLY one that has ever raised a red flag with my security suite(s). Since most folks leave their security suite (if any) on autopilot, there are potentially many testers who never get to run Minirosetta because the .exe goes straight into a black hole. Somewhere in those 200,000 lines of code, something apparently looks funky.
Best wishes :)

Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0	Message 57929 - Posted: 16 Dec 2008, 12:32:05 UTC Last modified: 16 Dec 2008, 12:34:34 UTC
After a one-week hiatus I downloaded v1.47 and 4 tasks. The first task showed a completion time of 12 hours, which corresponds to my chosen runtime. The other 3 tasks, all _rlbd_ tasks, showed completion times of only 1 hour. What's up with that? It suggests that the staff provided an estimated task runtime of something like 45 minutes instead of the customary 8 hours. Because of the 1-hour estimates, BOINC also downloaded additional tasks to fill the cache. Not good.
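
To see why a short runtime estimate leads to over-fetching, here is a deliberately simplified Python model: the client requests enough tasks that their *estimated* runtimes cover the cache. The one-day cache size and the single-CPU assumption are hypothetical, and BOINC's real work-fetch logic has more rules than this; the sketch only illustrates the arithmetic behind the complaint.

```python
import math

def tasks_to_fill_cache(cache_hours: float, estimated_task_hours: float) -> int:
    """Simplified model: tasks needed so their estimated runtimes cover the cache.
    Assumes a single CPU and ignores the rest of BOINC's work-fetch logic."""
    return math.ceil(cache_hours / estimated_task_hours)

CACHE_HOURS = 24.0        # hypothetical one-day cache setting
ACTUAL_TASK_HOURS = 12.0  # the runtime preference from the post above

for estimate in (12.0, 1.0):
    n = tasks_to_fill_cache(CACHE_HOURS, estimate)
    print(f"estimate {estimate} h -> fetch {n} tasks, ~{n * ACTUAL_TASK_HOURS:.0f} h of real work")
```

Under these assumptions, a 12-hour estimate fetches 2 tasks (about a day of work), while a 1-hour estimate fetches 24 tasks; if each of those then runs to the 12-hour preference, that is roughly 288 hours of work sitting in a one-day cache.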

funkydude Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0	Message 57935 - Posted: 16 Dec 2008, 13:17:00 UTC - in response to Message 57928.
Quote: On the first machine, NOD32 Antivirus thinks the Minirosetta .exe either contains a viral signature or looks bad heuristically (its UI doesn't say which). I have to add an exclusion to get the thing out of quarantine every time a new version is released.
Hello, I've been using both NOD32 and Rosetta for years now, and NOD32 has never detected Rosetta as anything malicious. Make sure you are up to date: v3.0.672.0, DB 3695 as of this writing.

LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0	Message 57936 - Posted: 16 Dec 2008, 13:29:41 UTC - in response to Message 57915.
Quote: Sorry Mike, not a good start...
Quote: Yes, I know. I'm not saying the app is perfect, just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues; we're working on it :)
That's OK. It's just that I'm trying to get more active here again after some computer problems, and the first 1.47 task crashed out quickly. The next 4 have run with no problems, though. Hopefully that continues. Usually all the problems are mine, not yours.
Good to see a more active presence from you in this forum. Your feedback on issues makes a big difference, even if it's just to say you're working on it without a solution yet. That matters too.

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 57939 - Posted: 16 Dec 2008, 14:42:37 UTC - in response to Message 57936.
Just to expand on this person's point... Thanks for taking the time to tell us what is going on. We like to know, and the silence has been deafening lately. Thanks again for breaking it. We hope for more news as time goes along.

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 57941 - Posted: 16 Dec 2008, 21:26:31 UTC
Hi. I found a problem with the graphics on Ubuntu 8.04. Mini 1.45 worked fine, but now when I click the Show Graphics button all I get is the outline of the graphics window; it looks transparent. I could not close it normally and had to go to the process list and kill it from there. It was also showing that the graphics were using mini 1.40 for some reason; I'm sure that mini 1.45 was using the 1.45 graphics. Not a biggie, but still.
Pete.

RodrigoPS Joined: 28 Nov 08 Posts: 3 Credit: 1,517,232 RAC: 2	Message 57944 - Posted: 16 Dec 2008, 23:14:36 UTC - in response to Message 57941. Last modified: 16 Dec 2008, 23:16:20 UTC
I'm having the same problem, but on XP 32-bit, on one of my hosts, after installing mini 1.47.