Discuss Rosetta Application Errors and Fixes (all Vers)

Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers)

cMw
Joined: 24 Apr 06
Posts: 9
Credit: 14,036
RAC: 0
Message 14913 - Posted: 28 Apr 2006, 20:18:42 UTC - in response to Message 14909.  

After looking into the cpu run time bug more, it appears that it should only affect users who keep the app in memory, since the logic will pick up the run time preference if the app makes decoys and restarts. For those who leave the app in memory, the app will never restart. We will update the app later today with a fix.

I hope so, and I'm guessing the 502 pts I lost will be granted as well? :D

BTW, when do you expect to have this updated by? 'Cause I'm losing time :(
ID: 14913 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 14915 - Posted: 28 Apr 2006, 20:49:46 UTC

A subtle bug???? I just hope that my not so subtle harping and arguing helped in finding it. :) Since I am one of the ones that keeps the application in memory, it seems that subtle bug hit me with a vengeance.

I want to make this crystal clear:

As the self-appointed Poster Boy of the Rosetta Bugs and its Official Bug Magnet (yes, I can still keep my sense of humor), I have been vocal in pointing out the bugs: vocal, constant, and (maybe for some tastes) too fast. My level of posting has been so intense in the last few days that I know it has come close to prompting the creation on this board of a thread called "Jose is @@tching again!!!!" But I digress.

The fact that I complained that much is an indication that:

1- I do believe this is a most worthwhile project. This is why all my computing resources, limited as they are, are totally committed to Rosetta.

2- I have to thank the moderator and all the participants in the exchanges of ideas and information I have been involved in for their civility and their desire to help me as an individual and the community as a whole. Some of the ideas that have been proposed to me seem to have worked. (This said as I cross my fingers, hoping the now-stable situation holds.)

3- I would be unfair if I didn't recognize and applaud the massive effort the project scientists and the software and model developers have undertaken in addressing our concerns, in paying attention to our complaints and suggestions, and in finding solutions as fast as humanly possible without sacrificing the scientific integrity and validity of the data produced. Let us remember that the points and the competition among teams are but the frosting on a very important cake: scientific progress. The cake is way more important than the frosting.

That they have done it in a relatively short time [Hey, not as fast as I would have wanted, but I am in dire need of attending a nice and large BBQ party.] speaks very well of their commitment not only to the core scientific project but to US as a community.

That they continuously keep doing it, given our constant pressure (in addition to the extremely high pressure of their professional environments), speaks very loudly of their personal and professional qualities.

My thanks to all.
My appreciation of all.

Jose

“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
ID: 14915 · Rating: 0
dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 14916 - Posted: 28 Apr 2006, 21:00:10 UTC - in response to Message 14905.  
Last modified: 28 Apr 2006, 21:02:16 UTC

We found a bug that was accidentally introduced in the 5.06 release that ignores the cpu run time preference.
...

We will place a fix soon.

Sorry for any inconvenience.

...
We will update the app later today with a fix.



OH NO! The dreaded late-Friday-afternoon-code-release!
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 14916 · Rating: 0
rbpeake
Joined: 25 Sep 05
Posts: 168
Credit: 247,828
RAC: 0
Message 14917 - Posted: 28 Apr 2006, 21:29:40 UTC - in response to Message 14916.  

OH NO! The dreaded late-Friday-afternoon-code-release!

Maybe not so bad...the code was released late-Thursday, so technically I suppose this is the late-Friday-afternoon-fixed-code-release! :D

It remains to be seen whether this will come to be known in the annals of Rosetta@home lore as the start of the dreaded bug-fix late-Friday-afternoon-code-release.....! ;D (Let's hope not, and let's reconvene on Monday....!) :D

Regards,
Bob P.
ID: 14917 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 14918 - Posted: 28 Apr 2006, 22:06:41 UTC - in response to Message 14915.  

A subtle bug???? I just hope that my not so subtle harping and arguing helped in finding it. :) Since I am one of the ones that keeps the application in memory, it seems that subtle bug hit me with a vengeance.


By subtle, I meant it didn't stand out on ralph because two situations had to occur:

1. nstruct had to be high enough to produce a problematic run time, which I think it wasn't.
2. It only affects users who leave the app in memory, or users who run only R@h (so the app never gets preempted).

Obviously not subtle to those it affected. Sorry Jose. We just updated the app with the fix. I hope this takes care of your recent issues.
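
The two conditions above can be illustrated with a minimal sketch. This is not the actual Rosetta code: `run_workunit`, its parameters, and the fixed time per model are hypothetical, and it assumes (as described in this thread) that the run time preference is only picked up when the app restarts after making a model.

```python
def run_workunit(nstruct, hours_per_model, preferred_hours, restarts):
    """Simulate one work unit and return (cpu_hours, models_completed).

    Hypothetical model of the 5.06 behaviour described in this thread,
    not the real Rosetta logic.
    """
    target_hours = None  # assumed bug: the preference is not read at first start
    cpu_hours = 0.0
    models = 0
    while models < nstruct:
        # Stop early only once the preference is known and has been reached.
        if target_hours is not None and cpu_hours >= target_hours:
            break
        cpu_hours += hours_per_model
        models += 1
        if restarts:
            # The app was preempted and removed from memory; on restart
            # the run time preference is picked up.
            target_hours = preferred_hours
    return cpu_hours, models


# Preempted app: stops once the 3-hour preference is reached.
print(run_workunit(nstruct=10, hours_per_model=1.0, preferred_hours=3.0, restarts=True))
# Left in memory: runs all 10 models regardless of the preference.
print(run_workunit(nstruct=10, hours_per_model=1.0, preferred_hours=3.0, restarts=False))
# Low nstruct: the preference is still ignored, but the run time is unremarkable.
print(run_workunit(nstruct=2, hours_per_model=1.0, preferred_hours=3.0, restarts=False))
```

Under these assumptions, the problem only shows up when nstruct is large and the app never restarts, which is consistent with it not standing out in testing.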
ID: 14918 · Rating: 0
cMw
Joined: 24 Apr 06
Posts: 9
Credit: 14,036
RAC: 0
Message 14920 - Posted: 28 Apr 2006, 22:21:59 UTC - in response to Message 14918.  

A subtle bug???? I just hope that my not so subtle harping and arguing helped in finding it. :) Since I am one of the ones that keeps the application in memory, it seems that subtle bug hit me with a vengeance.


By subtle, I meant it didn't stand out on ralph because two situations had to occur:

1. nstruct had to be high enough to produce a problematic run time, which I think it wasn't.
2. It only affects users who leave the app in memory, or users who run only R@h (so the app never gets preempted).

Obviously not subtle to those it affected. Sorry Jose. We just updated the app with the fix. I hope this takes care of your recent issues.

Should it say in the messages that I received an update, or not?
ID: 14920 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 14923 - Posted: 28 Apr 2006, 23:23:31 UTC - in response to Message 14918.  
Last modified: 28 Apr 2006, 23:29:01 UTC

Sorry Jose. We just updated the app with the fix. I hope this takes care of your recent issues.


Hope you read message 14915 (especially point 3). No need to apologize. I am in awe of how committed all of you are.

I will repeat it: 3- I would be unfair if I didn't recognize and applaud the massive effort the project scientists and the software and model developers have undertaken in addressing our concerns, in paying attention to our complaints and suggestions, and in finding solutions as fast as humanly possible without sacrificing the scientific integrity and validity of the data produced. Let us remember that the points and the competition among teams are but the frosting on a very important cake: scientific progress. The cake is way more important than the frosting.

That they have done it in a relatively short time [Hey, not as fast as I would have wanted, but I am in dire need of attending a nice and large BBQ party.] speaks very well of their commitment not only to the core scientific project but to US as a community.

That they continuously keep doing it, given our constant pressure (in addition to the extremely high pressure of their professional environments), speaks very loudly of their personal and professional qualities.

My thanks to all.
My appreciation of all.


ID: 14923 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 14924 - Posted: 28 Apr 2006, 23:25:50 UTC - in response to Message 14921.  

BTW what I said in post 14915 also applies to you. Thanks for your patience.

ID: 14924 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Jul 05
Posts: 1480
Credit: 4,334,829
RAC: 0
Message 14928 - Posted: 29 Apr 2006, 0:06:36 UTC

I did. Your comments are very much appreciated.
ID: 14928 · Rating: 0
Cureseekers~Kristof
Joined: 5 Nov 05
Posts: 80
Credit: 689,603
RAC: 0
Message 14998 - Posted: 29 Apr 2006, 14:00:29 UTC

So if I understand it well,
the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours?
How much is this? 3 hours?

Do you recommend aborting jobs with engine 5.06?
Member of Dutch Power Cows
ID: 14998 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 15003 - Posted: 29 Apr 2006, 14:59:16 UTC - in response to Message 14998.  

So if I understand it well,
the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours?
How much is this? 3 hours?

Do you recommend aborting jobs with engine 5.06?


IMHO... Don't do that; there is good science to be obtained from them.

ID: 15003 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15048 - Posted: 29 Apr 2006, 22:14:07 UTC - in response to Message 15003.  

So if I understand it well,
the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours?
How much is this? 3 hours?

Do you recommend aborting jobs with engine 5.06?


IMHO... Don't do that; there is good science to be obtained from them.

I believe the 5.06 WUs run for a fixed number of models, rather than the runtime preference. The work they produce is every bit as useful to the project as the 5.07 WUs. So, if they appear to be running normally (i.e. stepping through models in the graphics display), leave them to do their thing. Credits are issued based on the CPU time you put into the work, not number of models or number of WUs. So the longer they take crunching, the more credit they earn.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 15048 · Rating: 0
Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 15057 - Posted: 30 Apr 2006, 2:23:54 UTC - in response to Message 15048.  

Feet1st is right! You can keep your 5.06 workunits in the queue... they don't have any known errors and will give us useful science results. They just may take shorter or longer than usual.

So if I understand it well,
the jobs with version 5.06 will not run for the number of hours set in the preferences, but a fixed number of hours?
How much is this? 3 hours?

Do you recommend aborting jobs with engine 5.06?


IMHO... Don't do that; there is good science to be obtained from them.

I believe the 5.06 WUs run for a fixed number of models, rather than the runtime preference. The work they produce is every bit as useful to the project as the 5.07 WUs. So, if they appear to be running normally (i.e. stepping through models in the graphics display), leave them to do their thing. Credits are issued based on the CPU time you put into the work, not number of models or number of WUs. So the longer they take crunching, the more credit they earn.


ID: 15057 · Rating: 0
anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 15073 - Posted: 30 Apr 2006, 9:18:42 UTC - in response to Message 15057.  

[quote]Feet1st is right! You can keep your 5.06 workunits in the queue... they don't have any known errors and will give us useful science results. They just may take shorter or longer than usual.[/quote]

Hi Rhiju

When I look at my results, they are either shorter because of a low nstruct or they finish according to my time setting.

Anders n
ID: 15073 · Rating: 0
Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 15087 - Posted: 30 Apr 2006, 16:08:50 UTC

I've started getting lockups on my x2 3800+ system over the last week; I think it's a RAM problem as it has got worse and worse over time, I can barely get 20 minutes loaded out of it now w/o a lockup. Running dual prime seems to stress the system less than Rosetta, certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail exactly what's wrong.
ID: 15087 · Rating: 0
Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 15090 - Posted: 30 Apr 2006, 17:27:09 UTC
Last modified: 30 Apr 2006, 17:33:45 UTC

Is there something grossly different about 5.06 and 5.07, or are the current WUs just more demanding? Prime is still running after 90 minutes with lower temps than Rosetta, which seems to tell me that Rosetta has become extremely tough on the hardware. I know it's difficult to quantify, but is this acknowledged by those that know more about it?

NB: this seems to be the wrong thread for my comments, but it's possible WU problems are going to be mistaken for hardware problems if the WUs really have got tougher. Where shall I take this?
ID: 15090 · Rating: 0
tralala
Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15092 - Posted: 30 Apr 2006, 18:18:16 UTC - in response to Message 15087.  

I've started getting lockups on my x2 3800+ system over the last week; I think it's a RAM problem as it has got worse and worse over time, I can barely get 20 minutes loaded out of it now w/o a lockup. Running dual prime seems to stress the system less than Rosetta, certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail exactly what's wrong.


Wrong thread, but you can try relaxing the RAM timings a bit and see if it helps. If you run Prime, use the Blend test, which tests RAM extensively, and also run Memtest86.
ID: 15092 · Rating: 0
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 15305 - Posted: 2 May 2006, 15:52:05 UTC

Moderator9: Jim's "x2 3800+ system" is an AMD dual core cpu. While HT may be turned off on some Intel motherboards, with the dual core Athlon 64s and Opterons, once you've got the correct BIOS and OS HAL in use, you've got two cpus on the same socket. For the dual core cpus, if you want him to limit it to just one cpu core, you'll have to mention how to change Boinc/Rosetta to only use 1 cpu.

Jim: Feel free to open up a new thread, but can you provide links to the failing WUs? What is the amount and type of RAM? (Did it pass the Memtest86+ tests? Are you using more RAM than you physically have, or are the new WUs hitting bad memory locations that earlier ones didn't?) What temp is the cpu running at? (Is the added heat getting the cpu near where it can cause the lockups?) Have you verified that the system is spyware/virus/trojan free by running at least 4 anti spyware scans, 2 anti virus scans, and ewido & Trojan Hunter for trojans? (Do you have software on your system that may conflict with Boinc/Rosetta and cause lockups?) How is Boinc set up and running on your machine, i.e. is it a service install?
ID: 15305 · Rating: 0
Jimi@0wned.org.uk
Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 15311 - Posted: 2 May 2006, 18:25:26 UTC
Last modified: 2 May 2006, 18:25:57 UTC

Thanks for the input, BennyRop. I detached the project and gave the system a good test again (memtest, dual prime), finally raised the vcore from 1.39v to 1.42v and dual primed for 6 hours; everything is ok once more. Tried 1.39v to check in Rosetta, locked up in 4 minutes. Why it ran perfectly well for nearly 4 weeks at lower vcore and then turned nasty is a mystery. It has been running fine since yesterday afternoon at the higher voltage.

It is a bare build that only does Rosetta. I'm tempted to go Linux 64-bit on it tbh.
ID: 15311 · Rating: 0
Bogdan Kosanovic
Joined: 17 Jan 06
Posts: 3
Credit: 1,212
RAC: 0
Message 15437 - Posted: 3 May 2006, 20:57:15 UTC

I don't know what was done in 5.07, but it doesn't want to stop working. I scheduled 50% for Rosetta and 50% for SETI. When Rosetta is supposed to be preempted, it doesn't stop. It keeps running in parallel with SETI. Also, when it runs, I almost can't do anything else on my laptop. It did not work like this before (a few weeks ago). When I try to suspend it, I can't. Although it shows as being suspended, I can still see the process taking 80-90% of CPU time. It completely brings my system down and I have to manually kill the process in order to get it to stop...

I also had to install XP Service Pack 2 two weeks ago. Could it be related to that?

Currently I had to suspend Rosetta since it doesn't behave nicely... I may resume some time in future when you get it back to normal...

Regards,
Bogdan
ID: 15437 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org