Discuss Rosetta Application Errors and Fixes (all Vers)

Author	Message
cMw Joined: 24 Apr 06 Posts: 9 Credit: 14,036 RAC: 0	Message 14913 - Posted: 28 Apr 2006, 20:18:42 UTC - in response to Message 14909.
[quote]After looking into the cpu run time bug more, it appears that it should only affect users who keep the app in memory, since the logic will pick up the run time preference if the app makes decoys and restarts. For those who leave the app in memory, the app will never restart. We will update the app later today with a fix.[/quote]
I hope so, and I'm guessing the 502 pts I lost will be granted as well :D ? BTW, when do you expect to have this updated? I'm losing time :(

Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 14915 - Posted: 28 Apr 2006, 20:49:46 UTC
A subtle bug???? I just hope that my not-so-subtle harping and arguing helped in finding it. :) Since I am one of those who keep the application in memory, it seems that subtle bug hit me with a vengeance.
I want to make this crystal clear: As the self-appointed Poster Boy of the Rosetta Bugs and its Official Bug Magnet (yes, I can still keep my sense of humor), I have been vocal, constant, and (maybe for some tastes) too fast in pointing out the bugs. My level of posting has been so intense in the last few days that I know it has come close to prompting the creation on this Board of a thread called "Jose is @@tching again!!!!", but I digress. The fact that I complained that much is an indication that:
1- I do believe this is a most worthwhile project. This is why all my computing resources, limited as they are, are totally committed to Rosetta.
2- I have to thank the moderator and all the participants in the exchanges of ideas and information I have been involved with for their civility and their desire to help me as an individual and the community as a whole. Some of the ideas that have been proposed to me seem to have worked. (This said as I am crossing my fingers, hoping the now-stable situation holds.)
3- I would be unfair if I didn't recognize and applaud the massive effort the project scientists and the software and model developers have undertaken in addressing our concerns, in paying attention to our complaints and suggestions, and in finding solutions as fast as humanly possible without sacrificing the scientific integrity and validity of the data produced. Let us remember that the points and the competition among teams are but the frosting of a very important cake: scientific progress. The cake is way more important than the frosting.
That they have done it in relatively fast time [Hey, not as fast as I would have wanted, but I am in dire need of attending a nice and large BBQ party.] speaks very well of their commitment not only to the core scientific project but to US as a community. That they continuously keep doing it, given our constant pressure (in addition to the extremely high pressure of their professional environments), speaks very loudly as to their personal and professional qualities.
My thanks to all. My appreciation of all.
Jose
"This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato

dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0	Message 14916 - Posted: 28 Apr 2006, 21:00:10 UTC - in response to Message 14905. Last modified: 28 Apr 2006, 21:02:16 UTC
[quote]We found a bug that was accidentally introduced in the 5.06 release that ignores the cpu run time preference. ... We will place a fix soon. Sorry for any inconvenience. ... We will update the app later today with a fix.[/quote]
OH NO! The dreaded late-Friday-afternoon-code-release!
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.

rbpeake Joined: 25 Sep 05 Posts: 168 Credit: 247,828 RAC: 0	Message 14917 - Posted: 28 Apr 2006, 21:29:40 UTC - in response to Message 14916.
[quote]OH NO! The dreaded late-Friday-afternoon-code-release![/quote]
Maybe not so bad... the code was released late Thursday, so technically I suppose this is the late-Friday-afternoon-fixed-code-release! :D
It remains to be seen whether this will come to be known in the annals of Rosetta@home lore as the start of the dreaded bug-fix late-Friday-afternoon-code-release.....! ;D
(Let's hope not, and let's reconvene on Monday....!) :D
Regards, Bob P.

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 14918 - Posted: 28 Apr 2006, 22:06:41 UTC - in response to Message 14915.
[quote]A subtle bug???? I just hope that my not so subtle harping and arguing helped in finding it. :) Since I am one of the ones that keeps the application in memory, it seems that subtle bug hit me with a vengeance.[/quote]
By subtle, I meant it didn't stand out on ralph because two conditions had to hold:
1. nstruct had to be high enough to produce a problematic run time, which I think it wasn't.
2. It only affects leave-in-memory users, or users who only run R@h (if the app never gets preempted).
Obviously not subtle to those it affected. Sorry, Jose. We just updated the app with the fix. I hope this takes care of your recent issues.
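
[Editor's sketch] The two conditions described above can be illustrated with a small simulation. This is only a guess at the shape of the logic, not the actual Rosetta code: the names `WorkUnit`, `cpu_seconds_used`, `seconds_per_model`, and `restarts_between_models` are hypothetical, and the assumption is that 5.06 picked up the run time preference only after a restart.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    nstruct: int               # number of models (decoys) requested
    seconds_per_model: float   # hypothetical CPU cost of one model
    runtime_pref: float        # user's CPU run time preference, in seconds


def cpu_seconds_used(wu: WorkUnit, restarts_between_models: bool) -> float:
    """Simulate the assumed 5.06 behaviour and return the CPU seconds consumed."""
    cpu = 0.0
    target = None  # assumed bug: the preference is not loaded on the first start
    for _ in range(wu.nstruct):
        if target is not None and cpu >= target:
            break  # run time preference reached, stop early
        cpu += wu.seconds_per_model
        if restarts_between_models:
            # A preempted app that is not kept in memory restarts later;
            # on that restart the preference is picked up.
            target = wu.runtime_pref
    return cpu


wu = WorkUnit(nstruct=10, seconds_per_model=3600.0, runtime_pref=2 * 3600.0)
print(cpu_seconds_used(wu, restarts_between_models=True))   # stops at the preference
print(cpu_seconds_used(wu, restarts_between_models=False))  # runs all nstruct models
```

With a small `nstruct` the two cases finish at nearly the same time, which would be consistent with the bug not standing out in testing.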

cMw Joined: 24 Apr 06 Posts: 9 Credit: 14,036 RAC: 0	Message 14920 - Posted: 28 Apr 2006, 22:21:59 UTC - in response to Message 14918.
[quote][quote]A subtle bug???? I just hope that my not so subtle harping and arguing helped in finding it. :) Since I am one of the ones that keeps the application in memory, it seems that subtle bug hit me with a vengeance.[/quote]
By subtle, I meant it didn't stand out on ralph because two situations had to occur: 1. nstruct had to be high enough to present a problematic run time which it wasn't, I think. 2. it only affects leave in memory users or users that only run R@h (if app never gets preempted). Obviously not subtle to those it affected. Sorry Jose. We just updated the app with the fix. I hope this takes care of your recent issues.[/quote]
Should it say in the messages that I received an update, or ...?

Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 14923 - Posted: 28 Apr 2006, 23:23:31 UTC - in response to Message 14918. Last modified: 28 Apr 2006, 23:29:01 UTC
[quote]Sorry Jose. We just updated the app with the fix. I hope this takes care of your recent issues.[/quote]
Hope you read message 14915 (especially point 3). No need to apologize. I am in awe of how committed all of you are. I will repeat it:
3- I would be unfair if I didn't recognize and applaud the massive effort the project scientists and the software and model developers have undertaken in addressing our concerns, in paying attention to our complaints and suggestions, and in finding solutions as fast as humanly possible without sacrificing the scientific integrity and validity of the data produced. Let us remember that the points and the competition among teams are but the frosting of a very important cake: scientific progress. The cake is way more important than the frosting.
That they have done it in relatively fast time [Hey, not as fast as I would have wanted, but I am in dire need of attending a nice and large BBQ party.] speaks very well of their commitment not only to the core scientific project but to US as a community. That they continuously keep doing it, given our constant pressure (in addition to the extremely high pressure of their professional environments), speaks very loudly as to their personal and professional qualities.
My thanks to all. My appreciation of all.
"This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato

Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 14924 - Posted: 28 Apr 2006, 23:25:50 UTC - in response to Message 14921.
BTW, what I said in post 14915 also applies to you. Thanks for your patience.
"This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato

David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0	Message 14928 - Posted: 29 Apr 2006, 0:06:36 UTC
I did. Your comments are very much appreciated.

Cureseekers~Kristof Joined: 5 Nov 05 Posts: 80 Credit: 689,603 RAC: 0	Message 14998 - Posted: 29 Apr 2006, 14:00:29 UTC
So if I understand correctly, the jobs with version 5.06 will not run for the number of hours set in the preferences, but for a fixed number of hours? How many is that? 3 hours? Do you recommend aborting jobs with engine 5.06?
Member of Dutch Power Cows

Jose Joined: 28 Mar 06 Posts: 820 Credit: 48,297 RAC: 0	Message 15003 - Posted: 29 Apr 2006, 14:59:16 UTC - in response to Message 14998.
[quote]So if I understand it well, the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours? How much is this? 3 hours? Do you recommend aborting jobs with engine 5.06?[/quote]
IMHO... Don't do that; there is good science to be obtained from them.
"This and no other is the root from which a Tyrant springs; when he first appears he is a protector." Plato

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 15048 - Posted: 29 Apr 2006, 22:14:07 UTC - in response to Message 15003.
[quote][quote]So if I understand it well, the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours? How much is this? 3 hours? Do you recommend aborting jobs with engine 5.06?[/quote]
IMHO... Don't do that there is good science to be obtained from them.[/quote]
I believe the 5.06 WUs run for a fixed number of models rather than for the runtime preference. The work they produce is every bit as useful to the project as the 5.07 WUs. So, if they appear to be running normally (i.e. stepping through models in the graphics display), leave them to do their thing. Credits are issued based on the CPU time you put into the work, not on the number of models or the number of WUs. So the longer they take crunching, the more credit they earn.
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/
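
[Editor's sketch] The point about credit can be shown with a toy calculation. This is not the project's actual credit formula: `claimed_credit` and the per-hour `RATE` are hypothetical, and the only thing taken from the post is that credit scales with CPU time and not with the model or WU count.

```python
def claimed_credit(cpu_seconds: float, credit_per_cpu_hour: float) -> float:
    """Credit grows with CPU time only; model and WU counts do not enter."""
    return cpu_seconds / 3600.0 * credit_per_cpu_hour


RATE = 10.0  # hypothetical credit per CPU hour for one host

# A long 5.06 WU and a short one on the same host, whatever models each produced.
long_wu = claimed_credit(cpu_seconds=4 * 3600, credit_per_cpu_hour=RATE)
short_wu = claimed_credit(cpu_seconds=2 * 3600, credit_per_cpu_hour=RATE)
print(long_wu, short_wu)
```

Under this assumption a WU that overruns the preference is not wasted effort from a credit point of view, since the extra CPU time is counted.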

Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0	Message 15057 - Posted: 30 Apr 2006, 2:23:54 UTC - in response to Message 15048.
Feet1st is right! You can keep your 5.06 workunits in the queue... they don't have any known errors and will give us useful science results. They may just take a shorter or longer time than usual.
[quote][quote][quote]So if I understand it well, the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours? How much is this? 3 hours? Do you recommend aborting jobs with engine 5.06?[/quote]
IMHO... Don't do that there is good science to be obtained from them.[/quote]
I believe the 5.06 WUs run for a fixed number of models, rather than the runtime preference. The work they produce is every bit as useful to the project as the 5.07 WUs. So, if they appear to be running normally (i.e. stepping through models in the graphics display), leave them to do their thing. Credits are issued based on the CPU time you put into the work, not number of models or number of WUs. So the longer they take crunching, the more credit they earn.[/quote]

anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0	Message 15073 - Posted: 30 Apr 2006, 9:18:42 UTC - in response to Message 15057.
[quote]Feet1st is right! You can keep your 5.06 workunits in the queue... they don't have any known errors and will give us useful science results. They just may take shorter or longer than usual.[/quote]
Hi Rhiju,
When I look at my results, they are either shorter because of a low nstruct or they finish according to my time setting.
Anders n

Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0	Message 15087 - Posted: 30 Apr 2006, 16:08:50 UTC
I've started getting lockups on my x2 3800+ system over the last week. I think it's a RAM problem, as it has got worse and worse over time; I can barely get 20 minutes under load out of it now without a lockup. Running dual Prime seems to stress the system less than Rosetta; certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail down exactly what's wrong.

Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0	Message 15090 - Posted: 30 Apr 2006, 17:27:09 UTC Last modified: 30 Apr 2006, 17:33:45 UTC
Is there something grossly different between 5.06 and 5.07, or are the current WUs just more demanding? Prime is still running after 90 minutes with lower temps than Rosetta, which seems to tell me that Rosetta has become extremely tough on the hardware. I know it's difficult to quantify, but is this acknowledged by those who know more about it?
NB: this seems to be the wrong thread for my comments, but it's possible WU problems are going to be mistaken for hardware problems if the WUs really have got tougher. Where shall I take this?

tralala Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0	Message 15092 - Posted: 30 Apr 2006, 18:18:16 UTC - in response to Message 15087.
[quote]I've started getting lockups on my x2 3800+ system over the last week; I think it's a RAM problem as it has got worse and worse over time, I can barely get 20 minutes loaded out of it now w/o a lockup. Running dual prime seems to stress the system less than Rosetta, certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail exactly what's wrong.[/quote]
Wrong thread, but you can try relaxing the RAM timings a bit and see if it helps. If you run Prime, use the Blend test, which tests RAM extensively, and also run Memtest86.

BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0	Message 15305 - Posted: 2 May 2006, 15:52:05 UTC
Moderator9: Jim's "x2 3800+ system" is an AMD dual-core CPU. While HT may be turned off on some Intel motherboards, with the dual-core Athlon 64s and Opterons, once you've got the correct BIOS and OS HAL in use, you've got two CPUs on the same socket. For the dual-core CPUs, if you want him to limit it to just one CPU core, you'll have to mention how to change Boinc/Rosetta to use only 1 CPU.
Jim: Feel free to open up a new thread, but can you provide links to the failing WUs?
Can you give the amount and type of RAM? (Did it pass the Memtest86+ tests? Are you using more RAM than you physically have, or are the new WUs hitting bad memory locations that earlier ones didn't?)
What temp is the CPU running at? (Is the added heat getting the CPU near the point where it can cause the lockups?)
Have you verified that the system is spyware/virus/trojan free by running at least 4 anti-spyware scans, 2 anti-virus scans, and ewido and Trojan Hunter for trojans? (Do you have software on your system that may conflict with Boinc/Rosetta and cause lockups?)
How is Boinc set up and running on your machine? I.e., is it a service install?

Jimi@0wned.org.uk Joined: 10 Mar 06 Posts: 29 Credit: 335,252 RAC: 0	Message 15311 - Posted: 2 May 2006, 18:25:26 UTC Last modified: 2 May 2006, 18:25:57 UTC
Thanks for the input, BennyRop. I detached the project and gave the system a good test again (memtest, dual Prime), finally raised the vcore from 1.39v to 1.42v and dual primed for 6 hours; everything is OK once more. I tried 1.39v again to check in Rosetta, and it locked up in 4 minutes. Why it ran perfectly well for nearly 4 weeks at the lower vcore and then turned nasty is a mystery. It has been running fine since yesterday afternoon at the higher voltage.
It is a bare build that only does Rosetta. I'm tempted to go Linux 64-bit on it, tbh.

Bogdan Kosanovic Joined: 17 Jan 06 Posts: 3 Credit: 1,212 RAC: 0	Message 15437 - Posted: 3 May 2006, 20:57:15 UTC
I don't know what was done in 5.07, but it doesn't want to stop working. I scheduled 50% for Rosetta and 50% for SETI. When Rosetta is supposed to be preempted, it doesn't stop. It keeps running in parallel with SETI. Also, when it runs, I can hardly do anything else on my laptop. It did not work like this before (a few weeks ago).
When I try to suspend it, I can't. Although it shows as being suspended, I can still see the process taking 80-90% of CPU time. It completely brings my system down, and I have to manually kill the process in order to get it to stop...
I also had to install XP Service Pack 2 two weeks ago. Could it be related to that?
For now I have had to suspend Rosetta, since it doesn't behave nicely... I may resume some time in the future when you get it back to normal...
Regards, Bogdan