Report problems with Rosetta version 5.36

Message boards : Number crunching : Report problems with Rosetta version 5.36

[B^S] Gamma^Ray
Joined: 20 Apr 06
Posts: 12
Credit: 21,284
RAC: 0
Message 30625 - Posted: 5 Nov 2006, 5:58:14 UTC

Wow, did I get burned by this W/U. It ran for just over 3 CPU hours, which is an hour over my set CPU time. I noticed the screensaver was frozen at, I believe, 1.59 percent done, and I watched it sit there for around 20 minutes without anything updating except the CPU time. Watching the Manager, I saw the To Completion time continually rising, and I believe it was up to 6 hours at that point. I first tried to suspend and then restart it, which didn't change anything; it was still stuck at the same spot. It did have excessive red bands hanging off the main structure, as has been reported already, although with this one they were attached to the backbone structure and hanging all the way down past the window box.

So as a last resort, I paused the WU, exited the Manager completely, then restarted the Manager and resumed the WU. This time it started completely from the beginning, as if it had never run at all, so I aborted it. If you look at the WU's details, it shows I only ran it for 450, but that's the second run. The first 3-hour run is now history and not listed. :(

https://boinc.bakerlab.org/rosetta/result.php?resultid=45567927

Workunit 38766177
name FRA_2rio_154E_hom001_1_2rio_1_2bdwA_IGNORE_THE_REST_36_1305_21
Computer ID 285870
<core_client_version>5.2.13</core_client_version>
XP Pro-AMD 3000+

G^R

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 2,770
Message 30649 - Posted: 5 Nov 2006, 14:06:00 UTC

> The screensaver problem started back with 5.32, first in Ralph and then in Rosetta. It has caused my computer to freeze to the point that I had to reinstall Boinc, and I lost data because the current version of Boinc wiped out the existing installation and installed as if on a new machine.
I have had the screensaver disabled for nearly 2 weeks now and have not had any lockups or freezing problems; all work units, including the Ralph ones, have gone OK.
I have posted this before but the problem remains.

The problem of long WU run times, low decoy counts and low credit that cropped up in 5.34 looks like it occasionally happens with 5.36 too.
The following WU:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=39776702
It ran for 29517.153582 seconds (preference = 21600) and generated 19 decoys off 13 (nstruct) times for 27.8752 cobblestones. That works out to 3.4 cobblestones per hour, which is about half what it should be.
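For anyone who wants to check the arithmetic, here is a minimal Python sketch using only the numbers quoted above (the `credit_per_hour` helper is just for illustration):

```python
# Figures quoted in the post for WU 39776702
cpu_seconds = 29517.153582   # reported CPU time
preference_seconds = 21600   # run-time preference (6 hours)
credit = 27.8752             # cobblestones granted

def credit_per_hour(credit: float, seconds: float) -> float:
    """Credit earned per CPU hour."""
    return credit / (seconds / 3600.0)

rate = credit_per_hour(credit, cpu_seconds)
overrun_hours = (cpu_seconds - preference_seconds) / 3600.0

print(f"{rate:.1f} credits/hour")                        # 3.4 credits/hour
print(f"ran {overrun_hours:.1f} h past the preference")  # ran 2.2 h past the preference
```

So besides the low rate, the WU also overran the 6-hour preference by a bit over 2 hours.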

Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 30676 - Posted: 6 Nov 2006, 5:28:52 UTC
Last modified: 6 Nov 2006, 5:34:44 UTC

Clicking through my teammates' computer pages, I found that a quarter of the computers (6 of 24) show errors more or less regularly. Below I have linked to the "results for computer" pages with two or more errors. These errors fall into two groups: about half are access violations (exit code 0xc0000005) and the other half are "validate errors" with the message "Rosetta score is stuck or going too long. Watchdog is ending the run!" While some of the errors occurred with version 5.34, the error rate doesn't seem to have declined with the most recent app version, 5.36. With each link I have noted the type of error (AV for access violation, S for "stuck or going too long") and whether the errors occurred in the most recent version:

298712 (AV, S), 287935 (5.36, AV, S), 282932 (5.36, AV), 301240 (5.36, AV), 287942 (AV, S), 289776 (5.36, S)
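A small Python sketch that tallies the six hosts listed above and classifies a failed result into the two groups described. The host data is transcribed from the list; the `classify` helper and how it inspects the exit code and stderr text are hypothetical, written for illustration, not part of BOINC or Rosetta:

```python
from collections import Counter
from typing import Optional

# Host ID -> tags, transcribed from the list above
# ("5.36" = errors seen with the latest app version).
hosts = {
    298712: {"AV", "S"},
    287935: {"5.36", "AV", "S"},
    282932: {"5.36", "AV"},
    301240: {"5.36", "AV"},
    287942: {"AV", "S"},
    289776: {"5.36", "S"},
}

ACCESS_VIOLATION = 0xC0000005  # Windows STATUS_ACCESS_VIOLATION
WATCHDOG_TEXT = "Rosetta score is stuck or going too long"

def classify(exit_code: int, stderr: str) -> Optional[str]:
    """Hypothetical helper: label a failed result 'AV', 'S', or None."""
    # Mask to 32 bits so a code reported as a signed value still matches.
    if (exit_code & 0xFFFFFFFF) == ACCESS_VIOLATION:
        return "AV"
    if WATCHDOG_TEXT in stderr:
        return "S"
    return None

tally = Counter(tag for tags in hosts.values() for tag in tags)
print(tally["AV"], tally["S"], tally["5.36"])  # 5 4 4
```

Five of the six hosts show access violations, four show watchdog kills, and four have errors under 5.36.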

I am a bit concerned that this relatively high error rate will discourage people from crunching, so it would be good if these problems were looked into.

Thanks, -H.
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org

Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 30677 - Posted: 6 Nov 2006, 6:09:29 UTC

Are your team members running with BOINC screensaver enabled?

My errors have disappeared since I switched the screensaver to blank. I wonder whether RALPH@Home users are running with the screensaver on; if they aren't, that might explain why a number of bugs aren't being caught while the next updates are tested.

Buckley
Joined: 7 Aug 06
Posts: 1
Credit: 45,505
RAC: 0
Message 30683 - Posted: 6 Nov 2006, 11:57:25 UTC

I'm another one with problems. Rosetta usually locks up at least once a day, normally when a unit has completed and the system tries to send the results to Rosetta. The other projects, Seti and Einstein, continue to work and seem fine.

Keith Jillings
Joined: 26 Sep 06
Posts: 7
Credit: 536,631
RAC: 0
Message 30700 - Posted: 6 Nov 2006, 16:59:49 UTC

I thought it was just me, till I saw this thread.

My computer locks up daily now if I run Rosetta - I've turned Rosetta off until it's fixed. It was fine until a few weeks ago.

I get the same thing every time: the machine is frozen and won't respond to keypresses or mouse clicks, there's no mouse pointer on the screen, and there's a message from ZoneAlarm telling me that Rosetta_5.36_windows_intelx86.exe is trying to access the Internet. I have to restart the machine to get it working again.

Every other programme that's come up against ZoneAlarm doesn't lock up the machine: the others just wait for me to click "Allow" and all is well. Something odd in Rosetta_blahblah.exe, I assume.

I've subscribed to this thread so that I can read when it's been fixed and I can start crunching Rosetta again.

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 2,770
Message 30736 - Posted: 6 Nov 2006, 23:15:35 UTC

>> @Keith Jillings, I believe your problem is with the Boinc screensaver.
You will be able to keep crunching as long as you turn the Boinc screensaver off. Other screensavers work OK, just not the Boinc one (only Rosetta and Ralph are affected by this problem).
I noticed this back with 5.32 and then 5.34 and reported it more than once, but it is still a problem.
I have had no Rosetta problems since turning the Boinc screensaver off. Give it a whirl and see if it works.

Keith Jillings
Joined: 26 Sep 06
Posts: 7
Credit: 536,631
RAC: 0
Message 30741 - Posted: 7 Nov 2006, 0:10:24 UTC

Thanks - BOINC screensaver duly turned off, and Rosetta back on.

I'll see what happens.

SETI is turned off at the moment, too - that's failing in something like 50% of work units, but a different problem with a different solution, I'm sure.

Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 30746 - Posted: 7 Nov 2006, 4:16:21 UTC

Thanks Conan.

Actually, I've been running with it disabled for four days with no problems. This goes back to the question of how many RALPH users are doing the same, which may explain why this bug hasn't been fixed yet. You gotta break it to fix it. If a majority of RALPH users are indeed running with the screensaver off, that would explain a lot.

I'll hold steady and take a peek every so often.

Hoelder1in
Joined: 30 Sep 05
Posts: 169
Credit: 3,915,947
RAC: 0
Message 30747 - Posted: 7 Nov 2006, 4:50:36 UTC - in response to Message 30677.  
Last modified: 7 Nov 2006, 5:17:45 UTC

Are your team members running with BOINC screensaver enabled?
I haven't yet had enough feedback from the team to say for sure whether all of this is a screensaver issue, but it definitely is a possibility (I believe the errors first appeared in version 5.32, which updated the graphics). Thanks for pointing this out.
Team betterhumans.com - discuss and celebrate the future - hoelder1in.org

genes
Joined: 8 Oct 05
Posts: 60
Credit: 704,566
RAC: 423
Message 30748 - Posted: 7 Nov 2006, 4:57:01 UTC

I'm running both Ralph and Rosetta with the screensaver ON. I agree that if we take the easy way out and just turn it off, we will have no problems, but they will never fix it. I'm reporting errors both in Ralph and Rosetta.

Here are some more errors, BTW, but I can't say whether they are due to the screensaver:
resultid=45561965
resultid=45523378
resultid=45492781

Crunch on!

RichardJ
Joined: 19 Mar 06
Posts: 8
Credit: 73,014
RAC: 0
Message 30763 - Posted: 7 Nov 2006, 11:53:08 UTC

I am getting the following messages:
07/11/2006 10:20:36|rosetta@home|Message from server: Your computer has only 234340352 bytes of memory; workunit requires 265659648 more bytes
07/11/2006 10:20:36|rosetta@home|Message from server: No work sent
07/11/2006 10:20:36|rosetta@home|Message from server: (there was work but your computer doesn't have enough memory)
07/11/2006 10:20:36|rosetta@home|No work from project
Is there any way to increase the memory? My operating system is Windows XP. I have been running Rosetta continuously for over 6 months without this problem occurring before.
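The two numbers in the server message add up to the workunit's total memory requirement. A quick Python check, plain arithmetic on the values quoted above (how the scheduler arrives at the "available" figure from the host's RAM and preferences is not covered here):

```python
# Values from the scheduler message above
available_bytes = 234_340_352   # "Your computer has only ... bytes of memory"
shortfall_bytes = 265_659_648   # "workunit requires ... more bytes"

required_bytes = available_bytes + shortfall_bytes
MIB = 1024 * 1024

print(required_bytes)  # 500000000
print(f"{available_bytes / MIB:.1f} MiB available, "
      f"{required_bytes / MIB:.1f} MiB required")  # 223.5 MiB available, 476.8 MiB required
```

In other words, the workunits on offer were asking for 500,000,000 bytes, a bit more than double what the server thought this machine had.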

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 2,770
Message 30764 - Posted: 7 Nov 2006, 13:07:46 UTC

>> @ Rosetta/Ralph Project Team,

There is a problem with the Rosetta and Ralph projects, related to the Boinc screensaver, that has been happening since Ralph release 5.28 on or about 6/10/06.
The problems that were occurring then and reported in the Ralph thread "http://ralph.bakerlab.org/forum_thread.php?id=255" are still occurring.
Problems include screens locking up and not responding; workunits needing to be killed with Task Manager to get access back to the computer (in this scenario the processor is doing very little and so is the computer, and the workunit then often promptly errors out with the 'error 161' error code); workunits running so long that the Boinc watchdog has to kill the job with the error 'workunit stuck'; and Windows runtime errors (in the worst case for me, the computer needed a reformat and rebuild, plus a $200+ technician visit, to get working again).
Ralph releases 5.28 (the start of the problems) through to the current 5.38 all have the problem.
Rosetta releases 5.32 through to the current 5.36 all have the problem.
Numerous reports have been made in a number of threads.

The only way I have been able to keep working with Ralph and Rosetta has been to turn off the Boinc screensaver and use a default Windows one. It's not as pretty or informative, but the computer keeps working and I have had no more lockups, lost workunits or errors caused by this problem.
It is only a stopgap 'fix', as it also stops me seeing the screensaver for other projects (I run between 2 and 5 projects on my Windows machines, and Rosetta and Ralph are the only ones having the problem).
My 5 other computers (both Linux and XP) do not have the problem, as they do not have graphics enabled (Linux does not have graphics at all, and Boinc is installed as a service on the XP machines).
With so many of us turning off the graphics to keep working, the problems that are still there get masked (as pointed out by fellow tester 'genes').
We need this problem fixed: all the work you are putting into the graphics, only to have people turn them off to keep working, really wastes a lot of your time and resources.

This covers most of the reasons for this post (it has taken me over an hour, as I accidentally wiped all my typing twice when switching screens; need to save as I type, repeat, need to save as I type).

Thanks for your time.

Keep smiling it makes others wonder what you have been up to.

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 30781 - Posted: 7 Nov 2006, 20:12:55 UTC
Last modified: 7 Nov 2006, 20:13:36 UTC

I got this error last night after 38 minutes. I was not using graphics, and it was not preempted. No idea why.

https://boinc.bakerlab.org/rosetta/result.php?resultid=45855465

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 30784 - Posted: 7 Nov 2006, 21:38:40 UTC

Hi, there are many debugging options we can use, and one targets screensavers:
https://boinc.bakerlab.org/forum_thread.php?id=2550

I posted about it at that link. Maybe some of the people having screensaver troubles could enable some of the debugging info and then post what it shows in the Messages tab?


Team mauisun.org

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 2,770
Message 30793 - Posted: 8 Nov 2006, 0:23:21 UTC - in response to Message 30784.  

>> Thanks FluffyChicken. As this has been going on for over a month now with no cure in sight, I will take your advice and try to debug the thing myself.
I followed your link and created the 'cc_config.xml' file, placing it in the Boinc folder. I hope I have created it correctly: I did not include any options, only the flags 'guirpc_debug' and 'scrsave_debug'. I included guirpc because I was not sure what it did, and GUI means graphical user interface, does it not? I am having graphics problems, so I included that debug feature too.
I have enabled Boinc screensaver again on one machine and will wait and see what I find.


genes
Joined: 8 Oct 05
Posts: 60
Credit: 704,566
RAC: 423
Message 30803 - Posted: 8 Nov 2006, 13:55:43 UTC

Here's one I just had fail due to the screensaver:

resultid=46126708

The machine is a dual Xeon with HT, so 4 processors, and BOINC is running 4 projects at a time. I just switched to Boinc CC version 5.7.2, but that had no effect on Rosetta's behavior; it did the same things under 5.4.11.

Here's how it went (I was exercising at the time and saw it happen):
Boinc went into screensaver mode, and Seti was displayed. After 10 minutes the CC changed the screensaver to Rosetta. Rosetta was initially running, and the graphics were changing. Sometime during its 10 minute slice, it froze (the cpu time counter on the graphics stopped updating) while in the "relax" phase. At the end of the slice, the graphics changed to QMC, no problem (but Rosetta was already dead). Then CDPN, and Seti again, then it was Rosetta's turn. The Seti graphics just stopped updating but remained on the screen, and the taskbar appeared. I could see the Rosetta app on the taskbar, and I could move the mouse onto the taskbar, but no programs responded. Ctrl-alt-del got me the task manager, and I killed the Rosetta app. The screen came back to life, and everything worked normally after that. The Rosetta WU showed as a Computation error in the Boinc manager. I manually reported it a few minutes ago.

I'll look at the debugging options mentioned in the last post to see what I can do to help.


FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 30806 - Posted: 8 Nov 2006, 15:30:23 UTC - in response to Message 30793.  

Don't use the GUIRPC one, otherwise you'll get a never-ending list updated very, very fast.
I would stick with the screensaver one.

Just in case, the
cc_config.xml file for the screensaver should be:
<cc_config>
<log_flags>
<scrsave_debug>1</scrsave_debug>
</log_flags>
</cc_config>
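If hand-editing the XML is error-prone, here is a minimal Python sketch that produces the same cc_config.xml using only the standard library. The function names are made up for illustration, and the folder you pass in is an assumption: use wherever your BOINC folder actually is. Note that writing it replaces any cc_config.xml already there.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def build_cc_config(flags=("scrsave_debug",)) -> str:
    """Return cc_config.xml text with each flag set to 1 under <log_flags>."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        ET.SubElement(log_flags, flag).text = "1"
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"

def write_cc_config(boinc_dir: Path, flags=("scrsave_debug",)) -> Path:
    """Write cc_config.xml into boinc_dir, overwriting any existing file."""
    path = Path(boinc_dir) / "cc_config.xml"
    path.write_text(build_cc_config(flags), encoding="utf-8")
    return path
```

Keeping to just `scrsave_debug` by default follows the advice above about leaving out the noisy GUIRPC flag.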

Also attach to Ralph@Home, as we are up to R@H 5.40 there, trying to fix some error codes. http://ralph.bakerlab.org

You will know the logging is working, as it logs from the beginning:
08/11/2006 15:22:51||[scrsave_debug] ACTIVE_TASK::check_graphics_mode_ack(): got graphics ack <mode_hide_graphics/> for 1dcj__ETABLE_TEST_ABRELAX_rhh13sm6__1470_193_0, previous mode <mode_unsupported/>

Also, here is an example of mem_usage_debug output, which is updated every 10 seconds:
08/11/2006 15:22:59|ralph@home|[mem_usage_debug] 1dcj__ETABLE_TEST_ABRELAX_rhh13sm6__1470_193_0: RAM 28.30MB, page 54.39MB, 710.91 page faults/sec, user CPU 8.903, kernel CPU 0.110
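The Messages lines above follow a `date time|project|message` layout, the same layout as the scheduler messages quoted earlier in the thread. A rough Python sketch for pulling the numbers out of a `[mem_usage_debug]` line, so RAM and paging can be compared over time on a machine that freezes. The regular expression is written against this one example line and is an assumption, not a documented BOINC format; `MemSample` and `parse_mem_line` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemSample:
    timestamp: str
    project: str          # empty for client-level lines such as "||"
    task: str
    ram_mb: float
    page_mb: float
    faults_per_sec: float

# Pattern written against the single example line above (an assumption,
# not a documented format); other client versions may word it differently.
MEM_RE = re.compile(
    r"\[mem_usage_debug\] (?P<task>[^:\s]+): RAM (?P<ram>[\d.]+)MB, "
    r"page (?P<page>[\d.]+)MB, (?P<faults>[\d.]+) page faults/sec"
)

def parse_mem_line(line: str) -> Optional[MemSample]:
    """Split 'timestamp|project|message' and parse a mem_usage_debug message."""
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    match = MEM_RE.search(message)
    if match is None:
        return None
    return MemSample(timestamp, project, match["task"], float(match["ram"]),
                     float(match["page"]), float(match["faults"]))
```

Lines that aren't memory samples (scheduler messages, screensaver debug lines) simply come back as `None`, so a saved Messages log can be fed through line by line.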

Team mauisun.org

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 30816 - Posted: 8 Nov 2006, 20:42:39 UTC

This WU failed when I did a File->Exit of BOINC before rebooting the PC. I have a 24-hour preference and it only ran 15 hours, ending upon my reboot, so I'm pretty sure this CAUSED it to end.

This is a 1ogw__BOINC_POSE_ABRELAX_NEWRELAXFLAGS__1341_6529_0 WU.

Messages just shows the WU resuming after BOINC started, and then, 80 seconds later, says computation is finished. It shows a successful outcome, but it shouldn't have ended when it did.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 30818 - Posted: 8 Nov 2006, 20:55:07 UTC - in response to Message 30764.  
Last modified: 8 Nov 2006, 20:57:54 UTC

Hi Conan:

Thanks for keeping us posted on this -- your problems (including the reformat) sound bad, and we're really glad that you're sticking with Rosetta@home and ralph and helping us debug. Based on your first reports with 5.28, Chu and I thought your problems might stem from one of the new kinds of workunits that we are testing. Your report that things run smoothly with the screensaver off tells us that this is probably not the case.

So it seems to be a graphics-induced problem. We were able to trap such a problem in 5.28-5.30 and fix it, but apparently the app still isn't working for you. Please keep us posted on whether FluffyChicken's fix helps you. Otherwise, we'll need to track down what is particularly wrong with your system (our overall error rate from Windows machines remains as low as it was with our pre-5.28 applications).





©2024 University of Washington
https://www.bakerlab.org