Rosetta@home

Problems with Rosetta stable version 5.69 and beta version 5.77


David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 942
ID: 14
Credit: 2,303,046
RAC: 485
Message 45242 - Posted 21 Aug 2007 17:40:15 UTC

Please post any bugs regarding rosetta_beta_5.77 and/or rosetta_5.69.

Feet1st Profile

Joined: Dec 30 05
Posts: 1740
ID: 44890
Credit: 2,350,568
RAC: 3,695
Message 45250 - Posted 21 Aug 2007 19:43:37 UTC

What exactly is meant by calling one version "stable" and the other version "beta"? I thought beta testing was all done on Ralph.

When you originally went to running two applications at the same time, we were told that the "beta" in the name would be changed to something without the word beta in the name.
____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 942
ID: 14
Credit: 2,303,046
RAC: 485
Message 45273 - Posted 22 Aug 2007 18:45:50 UTC

The beta version is the latest and greatest and is used to test new science, but since it is constantly under development, it is less stable with regard to the science and experiments. For example, a researcher may add new code hoping to improve design but inadvertently introduce a bug in structure prediction. This "beta" version is what gets tested on Ralph first to make sure it runs okay and doesn't crash on people's computers; then the beta app gets put on R@h for experiments (for example, to see if design is improved). Also, some experiments need to use the same application version to ensure that the results stay consistent, so we have a "stable" version that does not get updated as frequently. The "beta" in the name is used for convenience. On Windows, the name is embedded in the binary to reference the symbol store for debugging, so if we had to change the name of the app for R@h, we'd have to recompile it.

anders n Profile

Joined: Sep 19 05
Posts: 403
ID: 578
Credit: 537,991
RAC: 0
Message 45371 - Posted 25 Aug 2007 6:53:23 UTC
Last modified: 25 Aug 2007 6:54:27 UTC

This WU showed as running,
but no CPU time was counting and I could not view the graphics. It was like this for 7.5 hours.

Anders n

Edit: After restarting BOINC 5.10.18, the WU started normally.

Karel

Joined: Aug 18 07
Posts: 1
ID: 199288
Credit: 21,277
RAC: 0
Message 45396 - Posted 25 Aug 2007 13:27:37 UTC

Both Accepted energy and RMSD graphs are sometimes horribly distorted:



This happened on Rosetta 5.69 but I think 5.77 has the same issue.

Disclaimer: Sorry if this has already been discussed somewhere. I'm new to this project and this message board, and this looks like the right place to post a bug.
I know it's only a visual issue and it doesn't affect the science behind the project in any way, but well...

Marcel Koopmans Profile

Joined: Aug 4 06
Posts: 8
ID: 103238
Credit: 1,134,689
RAC: 0
Message 45399 - Posted 25 Aug 2007 14:11:42 UTC

On my Macs (a Core 2 Duo and a G5) I get 5.77 jobs that hang after 2 seconds.
The only thing I can do is abort them.

with kind regards,
Marcel
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 45403 - Posted 25 Aug 2007 14:58:28 UTC

Karel, yes, there are still some issues with the autoscaling on the energy and RMSD graphs. As higher and lower values are reached, the chart is supposed to rescale automatically to include the new range of values (perhaps you've seen this autoscaling as a new model is starting).

So, it is something the Project Team is aware of and working to address. Yes, you are correct that the science work is still progressing well. It's just a quirk with how the graphs are shown.
____________
Rosetta Moderator: Mod.Sense

Ian_D Profile

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 45416 - Posted 25 Aug 2007 17:01:08 UTC
Last modified: 25 Aug 2007 17:02:46 UTC

Wuid=92517391
____________


Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 45423 - Posted 25 Aug 2007 19:20:00 UTC

Moved Ian's post here, he's wondering why that task got validation errors.
____________
Rosetta Moderator: Mod.Sense

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 45431 - Posted 25 Aug 2007 22:17:42 UTC

I've returned this one and it's still pending. I'm the second to do it.

Any idea why? It was done with 5.69.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=92147057

____________


mdettweiler

Joined: Oct 15 06
Posts: 33
ID: 118931
Credit: 2,509
RAC: 0
Message 45446 - Posted 26 Aug 2007 2:52:00 UTC - in response to Message ID 45431.

I've returned this one and it's still pending i,m the second to do it

any idea why. Was done with 5.69.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=92147057

From your signature:

Win98se, P4 3.0e (Prescott) 1gig Ram, ATI 128mb agp 8x card.
Win98se, P4 2.8e (Prescott) 1gig Ram + Nvidia 128mb agp 8x card.

Last time I heard, Windows 98 SE (and 95, 98 first edition, and Me) can't use more than 96 MB of RAM. That means that 928 MB of your RAM in each of those computers is going to waste--and those processors can easily take Windows XP. I think those computers are practically begging for XP (or at the very least 2000) to be put on them!
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 45453 - Posted 26 Aug 2007 4:13:11 UTC

At the moment, with all background tasks and Rosetta running, it's using 340 MB. No problems, go figure.


____________


Klimax

Joined: Apr 27 07
Posts: 29
ID: 170261
Credit: 107,923
RAC: 0
Message 45461 - Posted 26 Aug 2007 9:06:27 UTC - in response to Message ID 45446.

I've returned this one and it's still pending i,m the second to do it

any idea why. Was done with 5.69.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=92147057

From your signature:

Win98se, P4 3.0e (Prescott) 1gig Ram, ATI 128mb agp 8x card.
Win98se, P4 2.8e (Prescott) 1gig Ram + Nvidia 128mb agp 8x card.

Last time I heard, Windows 98 SE (and 95, 98 first edition, and Me) can't use more than 96 MB of RAM. That means that 928 MB of your RAM in each of those computers is going to waste--and those processors can easily take Windows XP. I think those computers are practically begging for XP (or at the very least 2000) to be put on them!


Not completely correct. Anything above 512 MB is wasted, and none of Win9x or Me (or the apps running under them) can address more than 4 GB of memory, because they lack the extended memory functions that are present in the NT family (at least 2000 and XP).

Ian_D Profile

Joined: Sep 21 05
Posts: 55
ID: 757
Credit: 4,216,173
RAC: 0
Message 45536 - Posted 28 Aug 2007 7:31:00 UTC - in response to Message ID 45416.
Last modified: 28 Aug 2007 7:31:32 UTC

Any reason for the Validation Error - there's a few on my account from "Lister"

Wuid=92517391


* BUMP *
____________


Mark Schuster

Joined: Dec 30 05
Posts: 1
ID: 45032
Credit: 28,158
RAC: 0
Message 45568 - Posted 29 Aug 2007 11:05:45 UTC

5.77 Beta appears to be running out of control on my Mac. Running OS X 10.4.10.

Current preferences are set to run only while idle - yet the Rosetta application keeps running when the user is active. This morning, Activity Monitor showed the Rosetta app using 193.6% of the CPU processing power while user is active.
____________

Yank Profile

Joined: Apr 18 06
Posts: 69
ID: 77735
Credit: 1,643,014
RAC: 0
Message 45641 - Posted 1 Sep 2007 2:42:23 UTC
Last modified: 1 Sep 2007 2:43:37 UTC

Running Windows XP on a dual-core Intel 1.66 with 1 GB memory and BOINC 5.10.13, I just downloaded a bunch of Rosetta beta version 5.77 units. I aborted the first two units because the time to completion kept increasing and decreasing during a run of about 6 hours. Is there a problem with this version, and if so, should I delete the whole batch of them?
____________

BitSpit

Joined: Nov 5 05
Posts: 33
ID: 9581
Credit: 4,147,344
RAC: 0
Message 45654 - Posted 1 Sep 2007 13:38:09 UTC
Last modified: 1 Sep 2007 13:38:31 UTC

This job hung at 100%. I restarted BOINC. It ran 3 minutes longer, generated one more decoy, but apparently lost the other 22.

http://boinc.bakerlab.org/rosetta/result.php?resultid=103058570

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 45682 - Posted 1 Sep 2007 22:52:43 UTC - in response to Message ID 45641.

...constant increasing and decreasing time to completion during a run of about 6 hours...


The time to completion is truly only updated every 5 seconds of runtime. Any change within those 5 seconds is BOINC rounding the fractional percentages and revising the estimate based on that.
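To picture why a rounded percentage makes the figure jump around, here is a minimal sketch. It assumes the manager extrapolates the remaining time linearly from elapsed time and a rounded fraction done; that formula and both function names are illustrative assumptions, not BOINC's actual code.

```python
def estimate_remaining(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation (assumed, not BOINC's real formula):
    remaining = elapsed * (1 - f) / f."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def rounded_estimate(elapsed_s: float, fraction_done: float, decimals: int = 3) -> float:
    """Same estimate, but computed from a rounded fraction, as a display might do."""
    return estimate_remaining(elapsed_s, round(fraction_done, decimals))


if __name__ == "__main__":
    # The true fraction creeps up smoothly, but the rounded one moves in steps,
    # so the estimate can tick up between steps and drop at each step.
    for elapsed, frac in [(3600, 0.2504), (3605, 0.2507), (3610, 0.2511)]:
        print(elapsed, round(rounded_estimate(elapsed, frac, decimals=2)))
```

With coarse rounding, the estimate rises slightly while the rounded fraction stays put and falls when it steps, which matches the up-and-down behaviour described in the question.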

Let it run. It will finish normally.

____________
Rosetta Moderator: Mod.Sense

Jerry Goggin Profile

Joined: Jun 7 06
Posts: 4
ID: 92225
Credit: 226,010
RAC: 0
Message 45720 - Posted 3 Sep 2007 12:27:04 UTC

Something goes wrong with 5.77 on my machine. It gets down to saying 00:09:57 and then stays there. So for the second time I am about to abort a task. Suspect this old PC just isn't capable or something. Been chugging along with RAH for over a year, I guess, but maybe it's time to quit.
____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 45722 - Posted 3 Sep 2007 14:43:35 UTC
Last modified: 3 Sep 2007 14:44:28 UTC

I have been getting a few 5.77 errors every day since Aug 31st.

103330789 93741341 31 Aug 2007 5.77
103558967 93993526
103330789 93741341
103688571 94115743 1 Sep 2007
103712992 94138486
103558967 93993526
103688571 94115743
103712992 94138486
103825254 94243513 2 Sep 2007
103825254 94243513
103825942 94244778
____________
Jmarks

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 45725 - Posted 3 Sep 2007 17:45:28 UTC - in response to Message ID 45720.
Last modified: 3 Sep 2007 20:43:52 UTC

Something goes wrong with 5.77 on my machine. It gets down to saying 00:09:57 and then stays there. So for the second time I am about to abort a task. Suspect this old PC just isn't capable or something. Been chugging along with RAH for over a year, I guess, but maybe it's time to quit.


Jerry, ever tried just letting them run? They are probably doing just fine. Rosetta has no way to know ahead of time exactly how long it will take to crunch a given model, so the estimate is... well... just an estimate. Once it gets down to within 10 minutes of your target runtime, it starts to exponentially reduce the time remaining less and less, with the idea being that the time to completion will generally still be going down.
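As a rough illustration of that behaviour, the sketch below counts down linearly until 10 minutes remain and then decays toward zero without ever reaching it. The exponential form, the `tau_s` decay constant, and the function name are assumptions made for the example; the real curve in Rosetta may be much flatter, which would fit the reports of the display sitting near 09:57 for a long time.

```python
import math


def displayed_remaining(target_s: float, elapsed_s: float,
                        window_s: float = 600.0, tau_s: float = 600.0) -> float:
    """Hypothetical 'time remaining' display.

    Outside the final window, show the plain difference target - elapsed.
    Inside it, shrink the remaining time exponentially so the value keeps
    going down but never hits zero before the model actually finishes.
    """
    linear = target_s - elapsed_s
    if linear > window_s:
        return linear
    # Seconds of runtime since the display entered the final window.
    overrun_s = window_s - linear
    return window_s * math.exp(-overrun_s / tau_s)


if __name__ == "__main__":
    target = 4 * 3600  # a 4-hour runtime preference, chosen for illustration
    for elapsed in (0, 13_000, 13_800, 14_400, 20_000):
        print(elapsed, round(displayed_remaining(target, elapsed), 1))
```

A larger `tau_s` makes the countdown appear almost frozen near the 10-minute mark even though the task is still working.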

The watchdog is always there looking over your tasks, and prepared to abort them if it deems necessary.
____________
Rosetta Moderator: Mod.Sense

Warren B. Rogers

Joined: Oct 3 05
Posts: 5
ID: 2517
Credit: 821,633
RAC: 0
Message 45726 - Posted 3 Sep 2007 18:11:08 UTC - in response to Message ID 45725.
Last modified: 3 Sep 2007 18:16:43 UTC

Something goes wrong with 5.77 on my machine. It gets down to saying 00:09:57 and then stays there. So for the second time I am about to abort a task. Suspect this old PC just isn't capable or something. Been chugging along with RAH for over a year, I guess, but maybe it's time to quit.


Jerry, ever tried just letting them run? That are probably doing just fine. Rosetta has no way to know ahead of time exactly how long it will take to crunch a given model, so the estimate is... well... just an estimate. Once it gets down to within 10 minutes of your target runtime, it starts to exponentially reduce the time remaining less and less, with the idea being that the time to completion will generally still be going down.

The watch dog is always there looking over your tasks, and prepared to abort them if it deems necessary.


Good day all. I've also noticed this problem with version 5.77, but it doesn't happen all of the time, only occasionally. When I notice that this is happening, I usually suspend that WU and let something else run. The one time I just let it run, it was at 09:57 for about 3 hours and the countdown timer was not moving. Also, I saw the amount of work being done drop to the 1000th of a percent/sec, which was considerably slower than it was for the first 96% of the WU. I didn't want to abort the WU and just hoped that if it did something else for a while and worked its way back to the WU, it would finish it properly. Well, it did take about 1 hour to finish, but that was much better than the rate it was moving at. I hope my experience is helpful.

Warren Rogers

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,590,569
RAC: 2,277
Message 45731 - Posted 3 Sep 2007 20:29:44 UTC - in response to Message ID 45726.


Good day all. I've also noticed this problem with version 5.77 but it doesn't happen all of the time only occasionally. When I notice that this is happening I usually suspend that WU and let something else run. The one time I just let it run it was at 09:57 for about 3 hours and the count down timer was not moving. Also, I saw the amount of work being done drop to the 1000th of a percent/sec which was considerably slower than it was for the first 96% of the WU. I didn't want to abort the WU and just hoped that if it did something else for a while and worked it's way back to the WU it would finish it properly. Well, it did take about 1 hour to finish but it was much better than the rate it was moving at. I hope my experience is helpful.

Warren Rogers


Was it that WU which took your computer about 19K seconds? In that case there was nothing wrong, in my opinion. It is just one of those work units which need a large amount of time to generate even one model. And since at least one model needs to be generated, this one model may exceed your pre-set runtime. Maybe the low modelcount also causes the inaccuracies in the estimated time remaining.

____________

Winkle

Joined: May 22 06
Posts: 88
ID: 83983
Credit: 1,354,930
RAC: 0
Message 45739 - Posted 4 Sep 2007 11:18:11 UTC

Hi
On Beta 5.77. I suspended the Rosetta project through BOINC manager, and the project came up suspended, but task manager still says Rosetta is running at approx 90% CPU time. Is this a known bug ?
Ian

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 45745 - Posted 4 Sep 2007 15:00:41 UTC

Winkle, I hear such things about Linux off and on, but I see all of your machines are Windows. Which one is seeing this occur? What BOINC version is installed there?
____________
Rosetta Moderator: Mod.Sense

Warren B. Rogers

Joined: Oct 3 05
Posts: 5
ID: 2517
Credit: 821,633
RAC: 0
Message 45748 - Posted 4 Sep 2007 16:08:51 UTC - in response to Message ID 45731.


Good day all. I've also noticed this problem with version 5.77 but it doesn't happen all of the time only occasionally. When I notice that this is happening I usually suspend that WU and let something else run. The one time I just let it run it was at 09:57 for about 3 hours and the count down timer was not moving. Also, I saw the amount of work being done drop to the 1000th of a percent/sec which was considerably slower than it was for the first 96% of the WU. I didn't want to abort the WU and just hoped that if it did something else for a while and worked it's way back to the WU it would finish it properly. Well, it did take about 1 hour to finish but it was much better than the rate it was moving at. I hope my experience is helpful.

Warren Rogers


Was it that WU which took your computer about 19K seconds? In that case there was nothing wrong, in my opinion. It is just one of those work units which need a large amount of time to generate even one model. And since at least one model needs to be generated, this one model may exceed your pre-set runtime. Maybe the low modelcount also causes the inaccuracies in the estimated time remaining.


Yes, it did take about 19K seconds to complete, and I had another one that took 20K to complete as well. The problem is that it is taking about 1 1/2 to 2 hours to get to the 10 minute mark; then it sort of hangs up there for about 4 hours unless I suspend the project, let something else run, and let the BOINC manager work its way back to the WU.

Thanks,

Warren

____________

DerAndreas

Joined: Jan 21 07
Posts: 2
ID: 143052
Credit: 110,247
RAC: 0
Message 45812 - Posted 9 Sep 2007 12:22:37 UTC

Hello to all,

On both of my machines there are download problems.
With 5.69 this message is shown:
|Sending scheduler request: Requested by user
|Reporting 1 tasks
|Scheduler RPC succeeded
|Message from server: Server can't open log file (../log_boinc/cgi.log)
|Deferring communication for 1 hr 0 min 0 sec
|Reason: project is down

On the other machine, which is running 5.77, there is this message:
|[file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0
|[error] Error on file upload: can't open log file
|[file_xfer] Temporarily failed upload of CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0: transient upload error
|Backing off 5 min 38 sec on upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1iibA-_filters_1782_486910_0_0

Sometimes the second message appears on the first machine too.
Both machines run BOINC 5.10.20; the jobs and the errors were the same with my previous BOINC version, 5.8.15.

What can I do?

--
Greetings from Germany
____________

anders n Profile

Joined: Sep 19 05
Posts: 403
ID: 578
Credit: 537,991
RAC: 0
Message 45815 - Posted 9 Sep 2007 12:38:51 UTC - in response to Message ID 45812.

Hello to all,

What can i do?

--
Greetings from Germany


Nothing but wait for now :)

Winkle

Joined: May 22 06
Posts: 88
ID: 83983
Credit: 1,354,930
RAC: 0
Message 46896 - Posted 24 Sep 2007 11:41:57 UTC - in response to Message ID 45745.

Sorry for the late reply. The BOINC version is 5.4.9. I just got back from Fiji, and left it running while I was away. All the other machines were fine, but on this one BOINC had locked up. I rebooted and had to kill the tasks, and now it operates normally. The machine number is 225833.

Regards

Ian

Winkle, I hear such thing about Linux off and on, but I see all of your machines are Windows. Which one is seeing this occur? What BOINC version is installed there?

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 46907 - Posted 24 Sep 2007 16:29:24 UTC

Winkle, looks like you are running BOINC version 5.4.9 on that machine. Suggest you start by downloading and installing a more current BOINC version.
____________
Rosetta Moderator: Mod.Sense

Winkle

Joined: May 22 06
Posts: 88
ID: 83983
Credit: 1,354,930
RAC: 0
Message 46941 - Posted 24 Sep 2007 22:38:56 UTC - in response to Message ID 46907.

Thanks, will do. I hadn't realised it all had changed so much. The computers simply sit there and crunch away.

Winkle, looks like you are running BOINC version 5.4.9 on that machine. Suggest you start by downloading and installing a more current BOINC version.

Luuklag

Joined: Sep 13 07
Posts: 262
ID: 205058
Credit: 4,171
RAC: 0
Message 47279 - Posted 1 Oct 2007 15:33:58 UTC

I don't know if this is a good place to post, but I see some admins post here, so I hope to get some attention.

Please read my post about the abrelax WUs; the topic title is "abbrelax", by the way ;)

I have 6 failed WUs, most of them about a wrong sin/cos value, so could someone have a look at it?

Luuklag

Mike Francis

Joined: Nov 24 05
Posts: 8
ID: 17484
Credit: 623,519
RAC: 0
Message 47361 - Posted 3 Oct 2007 22:02:17 UTC

For quite a while I have had no problems with any of the work units I have been sent.
Today, when I got home, there was one unit that had already been sent in and had a compute error; I don't know what caused it.
While I was looking at it, one unit finished prematurely. The error message I got was:

10/3/2007 5:49:23 PM|rosetta@home|Reason: Unrecoverable error for result CNTRL_01ABRELAX_SAVE_ALL_OUT_-1a19A-_filters_1782_524938_0 ( - exit code -1073741819 (0xc0000005))
10/3/2007 5:49:23 PM|rosetta@home|Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1a19A-_filters_1782_524938_0 finished
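The two forms of the exit code in the log are the same 32-bit value, once read as signed and once as unsigned. The short sketch below shows the conversion; 0xC0000005 is the Windows status code for an access violation. The helper names and the one-entry lookup table are made up for this example.

```python
# Windows NTSTATUS values are 32-bit; logs often print them as signed integers.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a (possibly negative) exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Return the hex form plus a name if the code is in the small table above."""
    value = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(value, "unknown")
    return f"0x{value:08x} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(-1073741819))
```

An access violation means the application touched memory it was not allowed to, so the task crashed rather than being stopped by BOINC.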

Hope this is of use.

Mike F,


____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 47376 - Posted 4 Oct 2007 12:44:31 UTC
Last modified: 4 Oct 2007 12:44:56 UTC

I do not know if this post is needed, because I have a separate Number crunching post about this WU validating but not showing up on BOINCstats with the 2 other Rosetta WUs that my PC completed yesterday. Those were 5.80 WUs.
Since it was a 5.69 WU, maybe you need to look at it.
109142983

If you need to look at my full post, here is the link.
Msg3638
____________
Jmarks

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 47471 - Posted 7 Oct 2007 0:10:26 UTC

Problem with this one, got stuck. app 5.69.

CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_490977_0_0

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=100428228

Pete.


____________


[XTBA>XTC] ZeuZ Profile

Joined: Jun 4 06
Posts: 2
ID: 86581
Credit: 19,725
RAC: 0
Message 47491 - Posted 7 Oct 2007 17:24:11 UTC
Last modified: 7 Oct 2007 17:26:11 UTC

Hello everybody

Some crunchers seem to have a problem with the granted credit: the claimed credit is higher than the granted credit on a lot of WUs.

We don't know what is happening; it's a bit annoying.

For example:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=99953073

http://boinc.bakerlab.org/rosetta/results.php?hostid=587047&offset=40

http://boinc.bakerlab.org/rosetta/results.php?hostid=603973

Why do these computers have that problem while the others don't?

Thank you very much

ZeuZ @ L'Alliance Francophone - XTC Mini TEAM

[AF>EDLS>Physique] Pas93 Profile

Joined: Sep 28 05
Posts: 3
ID: 1738
Credit: 951,958
RAC: 759
Message 47497 - Posted 7 Oct 2007 20:06:47 UTC

Yes, I have the same problem.


http://boinc.bakerlab.org/rosetta/results.php?hostid=591113

:(


____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 47516 - Posted 8 Oct 2007 15:38:22 UTC

I thought you guys would be happy with more credit.
Rosie is rewarding your computer for finishing faster than the average of the other computers that have run this model.
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 47518 - Posted 8 Oct 2007 16:27:51 UTC - in response to Message ID 47516.

i thought you guys would be happy with more credit.
rosie is rewarding your computer for finishing faster than the average of the other computers that have run this model.


Actually, ZeuZ's point was that they are awarded LESS than claimed. But Greg is on to the right cause: your machine took longer to complete the task than the BOINC benchmarks would have predicted. The benchmarks don't require much memory nor much L2 cache, and so they don't test enough to give a good prediction of how long a given machine will take to complete a given amount of work.

The credit claimed is just based on your machine's benchmark rating for the time period it worked on the task. The credit granted is based on an average of the claims of others that have also worked on models from the same batch of work.
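The difference between the two numbers can be sketched roughly as follows. The claim formula, the `credits_per_gops_day` constant, and the per-model averaging are simplifying assumptions for illustration only; the project's actual credit code may differ.

```python
from statistics import mean

SECONDS_PER_DAY = 86_400


def claimed_credit(cpu_seconds: float, whetstone_mflops: float,
                   dhrystone_mips: float,
                   credits_per_gops_day: float = 100.0) -> float:
    """Benchmark-based claim: benchmark speed times CPU time.

    The scaling constant is illustrative, not taken from the project.
    """
    avg_gops = (whetstone_mflops + dhrystone_mips) / 2.0 / 1000.0
    return cpu_seconds / SECONDS_PER_DAY * avg_gops * credits_per_gops_day


def granted_credit(other_claims_per_model: list[float],
                   models_completed: int) -> float:
    """Grant based on the average claim of other hosts in the same batch.

    Averaging per model is an assumption made for this sketch.
    """
    if not other_claims_per_model:
        raise ValueError("need at least one earlier claim to average")
    return mean(other_claims_per_model) * models_completed


if __name__ == "__main__":
    # A host with high benchmarks that is slow on real work claims a lot
    # for its CPU time but completes few models, so it is granted less.
    print(claimed_credit(10_800, 2_000, 4_000))
    print(granted_credit([1.5, 2.0, 2.5], models_completed=10))
```

Under these assumptions, a memory-constrained machine with good benchmark scores would claim more than it is granted, which is the pattern reported above.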
____________
Rosetta Moderator: Mod.Sense

[XTBA>XTC] ZeuZ Profile

Joined: Jun 4 06
Posts: 2
ID: 86581
Credit: 19,725
RAC: 0
Message 47521 - Posted 8 Oct 2007 19:42:29 UTC
Last modified: 8 Oct 2007 19:49:15 UTC

Thank you for your replies, guys.

So the benchmark is a bit falsified on some of our machines, OK, but there is a problem here all the same: http://boinc.bakerlab.org/rosetta/workunit.php?wuid=99953073

The benchmark seems to be right; a Core 2 Quad is a fast CPU, and 2 points for 11,000 seconds of calculation is very strange. I think it's a problem with some WUs, because the other WUs calculated are right: http://boinc.bakerlab.org/rosetta/results.php?hostid=573215&offset=40


Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 47522 - Posted 8 Oct 2007 21:02:24 UTC - in response to Message ID 47521.

Thank you for your replies guys

So the benchmark is a bit falsified on some of our machines, ok but here is a problem all the same http://boinc.bakerlab.org/rosetta/workunit.php?wuid=99953073

The benchmark seem to be right, a core2quad is a fast cpu, 2 point for 11 000 seconde of calculation is very strange, i think it's a problem on some wu because the other wu calculated are right http://boinc.bakerlab.org/rosetta/results.php?hostid=573215&offset=40


David, Rhiju, check this out. The result shows two! completion sections. One shows 29 decoys and 10494 seconds of CPU, the other shows 30 decoys and 10861.7 seconds of CPU. ...and either one would normally have granted more than 2 credits. It's almost like it completed the task once and then later ran another model on it.

This is Windows XP Pro on an Intel Core 2 Quad. With only 1 GB of memory for 4 CPUs, I would say this machine is probably memory constrained. From what I can tell, more of the machines reporting Linux problems are memory constrained as well.

ZeuZ, I didn't mean to say any of the benchmarks were falsified (although some people do that, and that is a big part of why Rosetta made a more independent credit system). I only meant that the work measured in the benchmark is trivial (simple) when compared to running Rosetta. So it is possible for one machine to show benchmarks twice as high as another, and yet it does not get twice as much work done.
____________
Rosetta Moderator: Mod.Sense

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47535 - Posted 9 Oct 2007 7:32:41 UTC
Last modified: 9 Oct 2007 7:33:40 UTC

This host really tries hard to get a valid WU. :-(
By the way, when I shut down the machine yesterday, there were, as far as I remember, 9(!) instances of Rosetta@home in memory, each using 79 MB of RAM. At that time one WU was active and another one was partly done and waiting to run again.

<core_client_version>5.10.8</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 2692519
pure virtual method called
SIGSEGV: segmentation violation
Stack trace (9 frames):
[0x8cdfe17]
[0x8cdac0c]
[0xffffe500]
[0x8d65433]
[0x8d4b794]
[0x8cdc897]
[0x8cddeb5]
[0x8cd6ea5]
[0x8d777fa]

Exiting...
terminate called without an active exception
SIGABRT: abort called
Stack trace (19 frames):
[0x8cdfe17]
[0x8cdac0c]
[0xffffe500]
[0x8d4b224]
[0x8d38b0e]
[0x8d35e9d]
[0x8d35ed2]
[0x8d355b5]
[0x8be23b3]
[0x8bea61d]
[0x8b50074]
[0x8c31c58]
[0x849a8a1]
[0x80dad6d]
[0x85c5a97]
[0x86eda4f]
[0x86edafa]
[0x8d44164]
[0x8048111]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation
Stack trace (13 frames):
[0x8cdfe17]
[0x8cdac0c]
[0xffffe500]
[0x8c4a1db]
[0x8b51266]
[0x8c31c58]
[0x849a87c]
[0x80dad6d]
[0x85c5a97]
[0x86eda4f]
[0x86edafa]
[0x8d44164]
[0x8048111]

Exiting...
SIGSEGV: segmentation violation

</stderr_txt>
]]>
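When a stderr dump like the one above gets long, a small script can summarize which signals were raised and how deep each stack trace was. This is a minimal sketch that assumes the line formats seen in this post ("SIGSEGV: ..." and "Stack trace (N frames):"); the function name is hypothetical and other application versions may log differently.

```python
import re
from collections import Counter

# Patterns match the line shapes seen in the stderr output quoted above.
SIGNAL_RE = re.compile(r"^(SIG[A-Z]+):", re.MULTILINE)
FRAMES_RE = re.compile(r"^Stack trace \((\d+) frames\):", re.MULTILINE)


def summarize_stderr(text: str) -> tuple[dict[str, int], list[int]]:
    """Count signals by name and list the frame count of each stack trace."""
    signals = Counter(SIGNAL_RE.findall(text))
    frames = [int(n) for n in FRAMES_RE.findall(text)]
    return dict(signals), frames


if __name__ == "__main__":
    sample = (
        "pure virtual method called\n"
        "SIGSEGV: segmentation violation\n"
        "Stack trace (9 frames):\n"
        "[0x8cdfe17]\n"
        "SIGABRT: abort called\n"
        "Stack trace (19 frames):\n"
    )
    print(summarize_stderr(sample))
```

Pasting the full stderr text into `summarize_stderr` gives a quick count of crashes per signal, which is easier to compare across failed results than the raw addresses.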


Right now two WUs are waiting to run:
Rosetta Beta 5.80: 1ubi__BOINC_ABRELAX_SHORTREL... 85,247 %
Rosetta 5.69: CNTRL_01ABRELAX_SAVE_ALL_OU... 9,768

Nothing related to Rosetta in memory. If this is going to change, I will report here.

cu,
Michael

edit: spelling
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 47540 - Posted 9 Oct 2007 12:39:58 UTC

Michael, BOINC only runs one task per CPU. But it can get more tasks started if it starts to use more memory than your preference allows. These tasks are preempted and go to a "waiting for memory" state. On Windows at least, these may still look like they consume memory, but it is only in the swap file while they wait for BOINC to move them back to a state of "running". So you may want to review your General Preferences for how much memory BOINC is allowed to use.

Please add your comments to the Linux thread as you study it further.
____________
Rosetta Moderator: Mod.Sense

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47546 - Posted 9 Oct 2007 16:08:36 UTC - in response to Message ID 47540.

Please add your comments to the Linux thread as you study it further.


Thanks, I'll do so.

cu,
Michal
____________

dcdc Profile

Joined: Nov 3 05
Posts: 1596
ID: 8948
Credit: 33,801,861
RAC: 17,327
Message 47606 - Posted 10 Oct 2007 21:41:49 UTC

I've just had a stalled task:

10/10/2007 22:41:00|rosetta@home|Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1cc8A-_filters_1782_552194_0 using rosetta version 569

I did a net stop/start of BOINC and it has continued running again.


____________

Luuklag

Joined: Sep 13 07
Posts: 262
ID: 205058
Credit: 4,171
RAC: 0
Message 47855 - Posted 19 Oct 2007 9:02:24 UTC

Also, could someone have a look at my topic "abbrelax WU's" (which is spelled wrong)? 3 failed WUs in a row.

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48159 - Posted 30 Oct 2007 18:41:08 UTC

Something's wrong but I don't know what. I'm continually getting validation errors on my results and not one has gone through successfully. You can check my results at:
http://boinc.bakerlab.org/rosetta/results.php?userid=164784

Can anyone help me?

I detached as I thought that might be an issue, but to no avail. I've suspended until I can figure this out.

KSMarksPsych Profile
Avatar

Joined: Oct 15 05
Posts: 199
ID: 4774
Credit: 22,337
RAC: 0
Message 48177 - Posted 31 Oct 2007 10:41:43 UTC - in response to Message ID 48159.

Somethings wrong but i dont know what. Im continually getting validation errors on my results and not one has gone through successfully. You can check my results at:
http://boinc.bakerlab.org/rosetta/results.php?userid=164784

Can anyone help me?

I detached as i thought that might be an issue, but to no use. Ive suspended until i can figure this out.


You're running Vista. There are two things to check. Is BOINC installed to the default directory (c:\program files\boinc)? That generally causes trouble. If so, uninstall BOINC and reinstall to somewhere outside c:\program files. I use c:\boinc and it's fine. Do you exit out of BOINC before shutting down the computer? BOINC usually doesn't have enough time to do its housekeeping with Vista's super speedy shutdown. There's a registry hack here or you can just shut down BOINC manually (file -> exit out of the manager or stop the service).

____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48178 - Posted 31 Oct 2007 11:44:12 UTC - in response to Message ID 48177.

Well, I've always had it in the default directory before and it's never been an issue, but I'll reinstall it to a different directory. As for shutting down, I usually shut down BOINC before shutting down the computer, but occasionally I put it to sleep while BOINC is running and it starts up when I boot it back up. Is that a problem?

KSMarksPsych Profile
Avatar

Joined: Oct 15 05
Posts: 199
ID: 4774
Credit: 22,337
RAC: 0
Message 48219 - Posted 1 Nov 2007 10:05:09 UTC - in response to Message ID 48178.

I put it to sleep while BOINC is running and it starts up when I boot it back up. Is that a problem?


No idea. I've never tried putting the computer to sleep and waking it back up. When I do need to take it to work (very rarely) I just shut down BOINC and the entire computer because it ends up running on battery power all day (I can't plug it in at work because they don't have a 110-220 converter there).
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48240 - Posted 1 Nov 2007 19:23:05 UTC - in response to Message ID 48219.

Well, I did what was suggested and reinstalled BOINC and put it on the C drive, and results are fine now. Strange, but thanks for the help. I think it might have been the putting to sleep thing; I'll stop doing that and see how things go.

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48260 - Posted 1 Nov 2007 23:37:43 UTC - in response to Message ID 48240.

You can forget what I just said, because I checked and two units went through fine but two others turned into validation errors. I don't see anything in my error log, and I don't want to keep on if they just turn into errors. I'm kind of stuck!

My results page:
http://boinc.bakerlab.org/rosetta/results.php?userid=164784

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48376 - Posted 5 Nov 2007 15:14:32 UTC - in response to Message ID 48260.

I don't know what's going on, but this keeps happening, so I'm going to pull out of Rosetta for the time being. Sorry guys.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 48379 - Posted 5 Nov 2007 17:30:22 UTC
Last modified: 5 Nov 2007 17:31:39 UTC

Admin, I've never seen so many errors on Windows before.

Have you tried detaching and reattaching to Rosetta? This will bring down a fresh copy of the control files that drive the application, in case one may have become corrupted somehow.

Is there anything else about your environment that might not be typical of other people running Windows? I guess the main difference is you are running Windows Vista.
____________
Rosetta Moderator: Mod.Sense

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48382 - Posted 5 Nov 2007 17:46:12 UTC - in response to Message ID 48379.

Been there, done that. I also tried moving BOINC to the C drive, and if I put the computer to sleep I shut down BOINC beforehand and make sure it has enough time to exit. I just don't know what to do, and I don't want to keep sending you guys errors, since it doesn't help anyone. Nothing is showing in my error log, so I have NO idea what's going on. ANY help would be appreciated at this point. But I'm thinking of detaching for now so I don't send any more errors.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 48385 - Posted 5 Nov 2007 18:13:08 UTC

I certainly can see your point, and desire to exit until problems are resolved. There are some BOINC and potentially some Rosetta problems with Vista. Going forward, it will be important to have some numbers to see if any changes improve or correct Vista problems. Could I ask that instead of leaving, you just lower your resource share so that you do a few Rosetta tasks each week? That way the project will continue to have some data from Vista coming in to help assess future changes. You see, this is one of those cases where "failure" is still informative and helps work towards improvement and helps the project.
____________
Rosetta Moderator: Mod.Sense

Admin

Joined: Apr 13 07
Posts: 42
ID: 164784
Credit: 260,782
RAC: 0
Message 48386 - Posted 5 Nov 2007 18:21:48 UTC - in response to Message ID 48385.

If you want me to continue, I sure will. Just out of curiosity, what are you able to gain from WUs such as mine? I feel it could potentially be an issue with shutting down or putting the computer to sleep. Although I exit BOINC first, for some reason I think it might be corrupting the files. I don't know why, but the only WUs that went through successfully were the ones run while the computer had been on and not shut off at all.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 48398 - Posted 5 Nov 2007 21:23:11 UTC

I'm just trying to say that if everyone with Vista had the same problem you are having, and they all stopped running Rosetta, then it would be difficult to prove when the problems have been resolved, because there would be no Vista users left. So the knowledge that "nope, that didn't correct the problem either" is useful.

I haven't seen posts by many with Vista, and so have not reviewed many Vista hosts. But yours seems to be having exceptional problems. It might also be helpful if you attach to Ralph (where changes to Rosetta are tested).


____________
Rosetta Moderator: Mod.Sense


Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC