Message boards : Number crunching : Minirosetta v1.47 bug thread.
Author | Message |
---|---|
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | HoHo kids! We've got a new Minirosetta version, with - you've guessed it - more bug fixes! Woo! Please report remaining issues here - that would be grand :) http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Stephen Joined: 26 Apr 08 Posts: 32 Credit: 429,286 RAC: 0 | Are there any changes to the science? |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 | ......sooooo which bugs do you feel you've fixed? Which of the many users that have abandoned the project due to problems should feel it is safe to reenter the waters? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | Quote: "......sooooo which bugs do you feel you've fixed?" Amongst a bunch of minor things, we fixed one major bug that caused jobs to crash when they entered the full-atom stage but had a full-atom energy > 0. That happens only rarely, which would explain the random errors seen with the cs_vanilla jobs. The bug was due to a wrongly initialized variable. It was also causing the majority of the ccc_1_8_* jobs to fail on RALPH (we didn't move these over to BOINC, of course, since we noticed the bug there). Those jobs failed more often because they have constraints built in, and the constraints offset the energy to higher values, raising the failure rate to more like 70%. Looking at the RALPH results, I think we've fixed most of the easily reproducible errors. I recently ran close to 10,000 WUs on our local compute cluster, resulting in... well... 0 errors. This is where it gets tricky, if stuff is only failing on other platforms or due to machine-dependent issues or *god knows what*. I will propose that the lab acquire a small farm of Windows machines for extensive bug testing & hunting, to get a grip on these errors... but believe us, these are difficult grounds. Thanks for bearing with us, Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 | Sorry Mike, not a good start... 1483407 `<core_client_version>6.2.19</core_client_version>` |
Chilean Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 | Quote: "......sooooo which bugs do you feel you've fixed?" I can't even imagine the loads of code you (guys) went thru. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 | The one bug that comes to mind that would not be easy to observe by counting successfully completed results, on a farm of Linux machines all running only a single project, would be where the tasks were not suspending properly. Someone mentioned a BOINC API compatibility problem might be the cause? What would reasonable memory expectations be now? Are all the 1.47 tasks tagged as needing 512MB minimum? Or is there a mix? And, of that 512MB, what should one expect to see a task actually using when running normally? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | Quote: "The one bug that comes to mind that would not be easy to observe by counting successfully completed results, on a farm of Linux machines all running only a single project, would be where the tasks were not suspending properly. Someone mentioned a BOINC API compatibility problem might be the cause?" You're right. However, I believe David Kim fixed this problem in 1.45. If you guys *still* see problems with suspending jobs, do let us know. We also hope the lockfile problem is now largely fixed. We'll have to wait for the error statistics to come in before we know whether the API fix has worked. On memory: I can't speak for the enzyme design guys, but to give you an idea, the jobs named `*_rlbd_*` and `*_rlbn_*` should take no more than 160 MB or so. The jobs named `cc2_*` or `*_chunk_*` should take between 150 and 320 MB or so (they are much larger proteins). I'm not aware of any jobs that require more than 400 MB; that would definitely point to a problem. Although the enzyme design guys may well have higher requirements. Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | Quote: "Sorry Mike, not a good start..." Yes, I know. I'm not saying the app is perfect - just that we found a bunch of definite bugs that are now fixed. No doubt there are still issues - we're working on it :) Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Mike Tyka Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 | Well... to give you an idea: Minirosetta has more than 200,000 (yes, two hundred thousand) lines of code. Each day there are maybe around 20 additions to the code, with around 40 people working on it at any given time. But we'll get there; I'm optimistic that with time we'll find the problems. http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 | I hope you guys get a small farm of Windows machines to double-check problems against your Linux machines. Windows is what the majority of us crunchers use, and certain error types may or may not show up on Linux. For instance, how does one tell the difference between a machine error and an application error when the task dies with a (0xc0000005) error? Is this something that shows up on your Linux machines, or is it a Windows-specific error code? Also, in another thread you mentioned aborting tasks that are running versions older than 1.47. Would these tasks be reissued using 1.47, or would they use the same Minirosetta version they originated with? |
ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 | My 1.47 cc2_1_8_mammoth-tasks have all crashed on Ralph, now my 1.47 cc2_1_8_native-tasks are crashing on Rosetta. Example (1 of 2): cc2_1_8_native_fa_cst_hb_t369__IGNORE_THE_REST_1S3QA_4_5599_36_0 `<message> process exited with code 193 (0xc1, -63) </message> <stderr_txt> minirosetta_1.47_i686-apple-darwin(95094,0xa0538fa0) malloc: *** error for object 0x1747df0: Non-aligned pointer being freed (2) *** set a breakpoint in malloc_error_break to debug # cpu_run_time_pref: 14400 SIGSEGV: segmentation violation` |
ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 | And now all of today's downloaded 1.47 tasks for the upcoming week have crashed, most of them after less than 1 minute of computing; I aborted one manually because it looked like it might run forever. It seems I'll have to stick to my 5.98 tasks for a few days and increase the default runtime. |
jjwhalen Joined: 20 Dec 06 Posts: 4 Credit: 399,398 RAC: 0 | Minirosetta apparently "looks like" malware, whether it actually is or not. This applies to all versions I've run, through v1.47. I run BOINC on two WinVista (God help me) boxes: one a 32-bit Sony with ZoneAlarm Pro plus ESET NOD32 for security; the other a 64-bit Sony with Kaspersky Internet Security 2009. On the first machine, NOD32 Antivirus thinks the Minirosetta .exe either contains a viral signature or looks bad heuristically (their UI doesn't say which). I have to add an exclusion to get it out of quarantine every time a new version is released. Interestingly, ZoneAlarm Pro's application module hasn't had a problem with it. On the 64-bit machine, Kaspersky's Application Control module gives Minirosetta's executable a Threat Rating of "Potentially Dangerous" with a heuristic Danger Index (DI) score of 82. I have to manually override Kaspersky and move Minirosetta out of the "Untrusted Application" zone to allow it to execute. (By comparison, Rosetta Beta 5.98 has a DI of 12, as does SETI's recently released Astropulse 5.0. SETI's regular Enhanced v6.03 has a DI of zero.) I realize that heuristic analysis is as much art as science, but both ESET and Kaspersky are rated at or near the top of their field. Of the 10 project hosts I subscribe to, with over 25 project executables, Minirosetta is the ONLY one that has ever raised a red flag with my security suite(s). Since most folks leave their security suite (if any) on autopilot, there are potentially many testers who never get to run Minirosetta because the .exe goes straight into a black hole. Somewhere in those 200,000 lines of code, something apparently looks funky. Best wishes :) |
Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 | After a 1-week hiatus I downloaded v1.47 and 4 tasks. The first task showed a completion time of 12 hours, which corresponds to my chosen runtime. The other 3 tasks, all _rlbd_ tasks, showed completion times of only 1 hour. What's up with that? It suggests that the staff provided an estimated task runtime of something like 45 minutes instead of the customary 8 hours. Because of the 1-hour runtimes, BOINC also downloaded additional tasks to fill the cache. Not good. |
funkydude Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0 | Hello, I've been using both NOD32 and Rosetta for years now, and NOD32 has never detected Rosetta as anything malicious. Make sure you're up to date: v3.0.672.0, DB 3695 as of this writing. |
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 | Quote: "Sorry Mike, not a good start..." That's OK. It's just that I'm trying to get more active here again after some computer problems, and the first 1.47 task crashed out quickly. The next 4 have run with no problems, though. Hopefully that continues. Usually all the problems are mine, not yours. Good to see a more active presence from you in this forum. Your feedback on issues makes a big difference, even if it's just to say you're working on it without a solution yet. That matters too. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 | Quote: "Sorry Mike, not a good start..." To expand on this poster's point: thanks for taking the time to tell us what is going on. We like to know, and the silence has been deafening lately. Thanks again for breaking it. We hope for more news as time goes along. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 | Hi. I found a problem with the graphics on Ubuntu 8.04. Mini 1.45 worked fine, but now when I click the Show graphics button, all I get is the outline of the graphics window; it looks transparent. I could not close it normally and had to kill it from the process list. It was also showing the graphics as using mini 1.40 for some reason, whereas I'm sure mini 1.45 was using the 1.45 graphics. Not a biggie, but still. Pete. |
RodrigoPS Joined: 28 Nov 08 Posts: 3 Credit: 1,336,719 RAC: 9 | Hi. I'm having the same problem, but on Windows XP 32-bit, on one of my hosts, after installing mini 1.47. |
©2024 University of Washington
https://www.bakerlab.org