Report Problems with Rosetta Version 5.16 I

Seth Aaronson
Joined: 5 Mar 06
Posts: 18
Credit: 3,976
RAC: 0
Message 16807 - Posted: 22 May 2006, 2:14:59 UTC - in response to Message 16804.  

Moderator9,
Since my errors and freezes seem to be related to the rosetta/BOINC screen saver, can you point me in the right direction to find some answers for the problems with that?
Now that I am not using the BOINC screen saver, rosetta is error free for me.



Seth,

Yes. Could you please attach to Ralph at this address? The programmers are looking for problem systems to help find this specific error.


What is the recommended way of doing that? Should I suspend rosetta after I've created a RALPH account, attach to RALPH, then start to use the BOINC screen saver? I'm also attached to SETI and Einstein. Please advise.
-Seth

You can just treat RALPH like any other project for the most part. The biggest difference is that while credits are awarded on RALPH, there is no effort to restore lost credits. It is a development and diagnostic project. On a brighter note, you will get to see the next versions of RALPH before the rest of the world, and please do provide suggestions there if you think of any.

The link I provided is the URL that BOINC Manager is going to ask you for. Once you are attached, set the project priority low, say 10-20 percent share of your system. This will ensure that when work is available you will get some, but it will not interfere with other processing too much. As far as running it goes, just treat it as you would Rosetta. If you have errors, report them in the threads at RALPH, with a link to the result that had the error.

Thank you for the help.


Very well. I've attached to ralph and set its resource share to 20%. Thanks for your guidance. I'll be unsubscribing from this thread now.
Peace, year round.
-Seth
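
For anyone else setting a low share for RALPH: as far as I understand it, BOINC treats resource share as a weight relative to the other attached projects, not as a literal percentage. Here is a minimal sketch of that arithmetic. The project names and share values are only illustrative assumptions (100 is BOINC's usual default per project), and the function name is made up for this example.

```python
def cpu_fractions(shares):
    """Map each project name to its fraction of the total resource share."""
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


# Hypothetical setup: three projects left at an assumed default of 100,
# plus RALPH set to a resource share of 20.
shares = {"rosetta": 100, "seti": 100, "einstein": 100, "ralph": 20}

for name, fraction in cpu_fractions(shares).items():
    print(f"{name}: {fraction:.1%}")
```

Under those assumed numbers, a share of 20 works out to 20/320, or roughly 6% of the processing time, which is comfortably low. To get a true 20% next to three projects at 100, the share would need to be about 75 (75/375).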

ID: 16807 · Rating: 0
Laurenu2

Joined: 6 Nov 05
Posts: 57
Credit: 3,818,778
RAC: 0
Message 16812 - Posted: 22 May 2006, 3:56:45 UTC - in response to Message 16802.  

Hi Laurenu2... can you post the results page for one of your nodes that has this problem? Thanks!

I just looked through the pages for four or five of the nodes that are under your userid -- they all have had perfect success rates for
the last three days! We're not aware of any bad WU's being sent out on rosetta@home, and have been checking that the error rates are low. Obviously,
we need to know ASAP if there are any bad WUs. (There was a bad batch last week on ralph, but it was a small batch, and has been purged from the system.)

A lot of my nodes are without work due to reaching their WU quotas. Rosetta should check their system and purge the bad WUs they just sent out.


Yes, that is the same problem I have: 60 to 70 PCs make way, way too many node pages to scan through.
Look here: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=196119
And
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=203528
There was another but it is lost in what I call my network

On this node
https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=218017
I found it locked up due to Rosetta eating up all the memory and about 500 MB of swap file. I had to kill Rosetta through Task Manager and reboot, and it started eating memory again, about 400 MB in just under 3 minutes. I had to abort that WU, and then it worked fine again.

If You Want The Best You Must forget The Rest
---------------And Join Free-DC----------------
ID: 16812 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16816 - Posted: 22 May 2006, 6:51:08 UTC

Just a question: Are any of the people reporting errors of the 107 type using Zone Alarm?
Curious minds want to know.
ID: 16816 · Rating: 0
hawgietonight

Joined: 18 Apr 06
Posts: 3
Credit: 808,621
RAC: 0
Message 16818 - Posted: 22 May 2006, 8:04:42 UTC - in response to Message 16816.  

Just a question: Are any of the people reporting errors of the 107 type using Zone Alarm?
Curious minds want to know.


No ZA here, just XP's own firewall and AVG antivirus.
ID: 16818 · Rating: 0
Stwato

Joined: 11 Jan 06
Posts: 150
Credit: 655,634
RAC: 0
Message 16819 - Posted: 22 May 2006, 8:21:41 UTC
Last modified: 22 May 2006, 8:23:09 UTC

I'm not sure if this is a 5.16 problem or whether it's something to do with my computer, but sometimes when I click 'show graphics' and maximise the graphics window, the very bottom part with Accepted Energy and Accepted RMSD disappears behind/below the taskbar (obviously a Windows machine). For example, just now I displayed the graphics, maximised the window and everything was good. Then I closed it, reopened it and remaximised it, and the bottom bit was missing. Nothing else on my system changed between opening the windows. Any ideas?

If it helps, I have an ATI Radeon 9700 graphics card. The computer is a laptop with a widescreen; could it be a resolution problem?

I've just noticed that the problem happens before maximisation, i.e. the bottom doesn't show in the small window if it's not going to show in the big window, and vice versa.

This is not a problem for me, just a little frustrating when trying to see the hidden details.

Stwato
[Edit: too many zeros in graphics card description]
ID: 16819 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 308,894
RAC: 531
Message 16823 - Posted: 22 May 2006, 11:13:09 UTC

Another one for you.

https://boinc.bakerlab.org/rosetta/result.php?resultid=21143590
Ian Cundell, St Albans, UK
ID: 16823 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 16824 - Posted: 22 May 2006, 11:18:54 UTC - in response to Message 16819.  
Last modified: 22 May 2006, 11:19:55 UTC

I'm not sure if this is a 5.16 problem or whether it's something to do with my computer, but sometimes when I click 'show graphics' and maximise the graphics window, the very bottom part with Accepted Energy and Accepted RMSD disappears behind/below the taskbar (obviously a Windows machine).

Hi, I reported this back with version 5.12 in Ralph. I have pictures in several posts starting with this one. Later on a dev says he's going to try editing the text box. From that point on, the text in the text box doesn't line wrap but just runs off the screen to the right.

tony
ID: 16824 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16825 - Posted: 22 May 2006, 11:46:04 UTC - in response to Message 16818.  
Last modified: 22 May 2006, 12:00:03 UTC

ID: 16825 · Rating: 0
EdMulock
Joined: 14 Mar 06
Posts: 30
Credit: 2,347,485
RAC: 0
Message 16826 - Posted: 22 May 2006, 11:46:12 UTC - in response to Message 16571.  

Win 98SE system with 256 MB memory frequently processes WUs and shows no CPU time after 8 hours, e.g.

https://boinc.bakerlab.org/rosetta/result.php?resultid=20659581

My other systems are all XP and don't exhibit this problem.

This is a known issue with Win 98. They all do it. The programmers are trying to find a way around the issue.


Time for some good news. A simple reboot (98SE had some registry updates to do, perhaps from the BOINC install?) fixed the problem. Been functioning normally for 48 hours!

ID: 16826 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16827 - Posted: 22 May 2006, 12:01:35 UTC - in response to Message 16807.  


Peace, year round.
-Seth


Peace
ID: 16827 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16832 - Posted: 22 May 2006, 14:02:47 UTC - in response to Message 16825.  
Last modified: 22 May 2006, 14:41:12 UTC


“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
ID: 16832 · Rating: 0
NewInCasp
Joined: 12 May 06
Posts: 21
Credit: 5,229
RAC: 0
Message 16834 - Posted: 22 May 2006, 15:05:43 UTC - in response to Message 16412.  

Rosetta Version 5.16 has been released. Please report any problems in this thread.

The servers may be slow until the new application is distributed.

Version 5.16 has the following features;

(1) We're continuing our efforts to reduce memory usage by typical workunits by rosetta@home. You can expect an even further reduction in memory footprint in our next update.

(2) We're testing a new science mode which uses the sequence and structural information from homologous proteins in an early phase of the simulation, but then returns to the target protein sequence in the final refinement phase. This mode appears to have a larger memory footprint than typical workunits, so we will only send out these jobs to computers that have >1Gb RAM.

(3) Also, we're trying a new feature where at the end of a simulation, Rosetta compares its fold to the predictions made by a dozen other algorithms. (Those predictions are sent to the clients in a compressed format.) Seeing consensus between different algorithms is usually a good sign that a prediction is right.


I don't know why it stopped working suddenly. I will restart my job.
ID: 16834 · Rating: 0
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16835 - Posted: 22 May 2006, 15:07:09 UTC - in response to Message 16832.  

Jose,

If I understand you correctly, all you did to achieve this result was remove the firewall? This might be a very important piece of information if it works. Keep us posted.

Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 16835 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16837 - Posted: 22 May 2006, 15:18:00 UTC - in response to Message 16835.  

Jose,

If I understand you correctly all you did to achieve this result was remove the firewall? This might be a very important piece of information if it works. Keep us posted.


Yes, I am working without a firewall.
I will report back.
I'm crossing my toes in the hope that I don't jinx it. (The Rosetta WU has been going for 23 minutes without a hitch.)

I am going to the doctor. I hope I am in a good mood when I return, and that BOINC Manager keeps me in a good mood.
“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
ID: 16837 · Rating: 0
Thor[Free-DC]

Joined: 24 Oct 05
Posts: 2
Credit: 354,251
RAC: 0
Message 16848 - Posted: 22 May 2006, 17:31:58 UTC - in response to Message 16795.  

Thanks, Mod9, for the quick reply.

But that's not what I noticed. The first 5.16 unit(s) I processed didn't show checkpoints every ~20 min.

The one I'm currently working on seems to behave nicely in the suggested way. Maybe it was a glitch in the first 5.16 units and nobody else noticed...

I'll keep an eye on it and report back if I notice anything unusual.

20 min checkpoint intervals are fine with me. I can live with that.

Thor

This is not really a bug, but it is bugging me:

The new work units seem to have only very few "saving points".

Which means you put half an hour or even an hour of crunching in, shut down the computer for some reason, and when you get back to crunching, you have to start over again.

I had this happen at least three times, so I wonder if there is any possibility of putting more save spots in the WUs for the crunchers who are not running 24/7?

Greets Thor[Free-DC]

You are already using the version that has had checkpoints added. Originally, checkpoints were done only at the end of a full model. Now they are every ~20 min.

There will be a better way to tell when checkpoints occur in future versions, but they cannot add more checkpoints. This is just a limitation of the nature of the work at this time. It should only be falling back to the last checkpoint, not starting over. Unless, of course, you are shutting down when the percent complete is 1.04x%; then it will start over from the start because it has not checkpointed yet.
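
To put rough numbers on the fallback behaviour described above, here is a minimal sketch. It assumes checkpoints land at exact 20-minute marks, which is a simplification: the post only says "~20 min", and the function names are made up for illustration rather than taken from Rosetta or BOINC.

```python
CHECKPOINT_INTERVAL_MIN = 20  # "~20 min" per the post; exact spacing is an assumption


def resume_point(minutes_crunched, interval=CHECKPOINT_INTERVAL_MIN):
    """Minute mark the task would restart from after a shutdown."""
    # Work is only preserved up to the most recent completed checkpoint.
    return (minutes_crunched // interval) * interval


def lost_work(minutes_crunched, interval=CHECKPOINT_INTERVAL_MIN):
    """Minutes of crunching that would have to be redone after a shutdown."""
    return minutes_crunched - resume_point(minutes_crunched, interval)


for crunched in (15, 50, 60):
    print(crunched, "min crunched -> resume at", resume_point(crunched),
          "min, redo", lost_work(crunched), "min")
```

Under that assumption, shutting down before the first checkpoint (for example at 15 minutes) means starting over from zero, while shutting down at 50 minutes should only cost the 10 minutes since the 40-minute checkpoint.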


ID: 16848 · Rating: 0
duanra
Joined: 12 Feb 06
Posts: 8
Credit: 36,223
RAC: 0
Message 16851 - Posted: 22 May 2006, 17:45:58 UTC

Hello!
I've got a problem with the graphics of the Rosetta application.
Each time I leave the graphics window open for more than a minute or so, the graphics stop and my screen goes black.
Then the screen comes back again and I have to quickly close the graphics window, or it keeps happening.
Any idea what might cause this "crash"?

Thank you.
Duanra

PS: my PC is a 1.73 GHz Intel Centrino with 1 GB RAM, Windows XP SP2 and an ATI Mobility Radeon graphics card.
ID: 16851 · Rating: 0
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 16858 - Posted: 22 May 2006, 18:56:26 UTC

I've been running Panda Titanium 2006 with its firewall (behind a router) - and I don't get 107 errors.

How about users of ZoneAlarm.. any others of them getting 107 errors?
ID: 16858 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 16861 - Posted: 22 May 2006, 20:05:59 UTC
Last modified: 22 May 2006, 20:10:01 UTC

ARGH!!!!!!!!!!!!

When I came back the computer had gone nuts. Only after detaching from Rosetta, and after trying unsuccessfully to abort the WU in an attempt to rescue the results, was I able to regain some control over the computer. The WU is now a lost/phantom unit.

Alas, T0285_FACONTACTS_hom001_521_10640_0 had some decoys produced. Does this mean those decoys are lost?

Weirdness again: the exe files of Rosetta and Ralph were not to be found in the Task Manager. I'd better check what was running on my computer while I was away that caused this.
“This and no other is the root from which a Tyrant springs; when he first appears he is a protector.”
Plato
ID: 16861 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 16862 - Posted: 22 May 2006, 20:07:04 UTC - in response to Message 16858.  

I've been running Panda Titanium 2006 with its firewall (behind a router) - and I don't get 107 errors.

How about users of ZoneAlarm.. any others of them getting 107 errors?

I run Zone Alarm on 3 Win XP systems and have no problems.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 16862 · Rating: 0
Bronwyn
Joined: 24 Sep 05
Posts: 3
Credit: 20,927
RAC: 0
Message 16863 - Posted: 22 May 2006, 20:17:12 UTC - in response to Message 16790.  

LINUX problem:
I need help with this problem: while running Rosetta on a Linux server with a PentiumIV HyperThreading processor, Rosetta occasionally hangs in a very strange state: everything is running except Rosetta. BOINC is running. The application on the other thread (Simap@home) is running. Just Rosetta isn't.


I had encountered this particular issue back in Jan/Feb-06 (also under Linux). Overall about 5-6 times.

The BOINC log would show that BOINC restarted Rosetta, but the Rosetta process would just stay "idle" (ps flags were "SN" = sleep, nice; consuming no CPU time) for hours/days, until I manually killed it (I guess nowadays the "watchdog" thread will catch it).


I don't think the watchdog can catch it, because the whole process is sleeping. It was in this state for more than 2 days and the watchdog didn't catch it.


At the time, I thought it was an issue with Rosetta+BOINC interaction, as I think it happened upon resuming a Rosetta WU (with leave-in-mem=yes). At the time, I also suspected some issue with the system's resources, as that PC had only 256MB RAM and I was running 6 BOINC projects and 100+ processes.


I also have leave-in-mem=yes, and it could be something to do with memory, as this is primarily a webserver and it has only 1 GB RAM, so it can be low on RAM from time to time.


It COULD have been a faulty WU, but when I ran that WU with the rosetta command line outside BOINC, it completed fine.


No, it wasn't a faulty WU. After restarting BOINC, both WUs were completed successfully.


I have been having the same problems with my Dell SC420... The Rosetta application just sleeps (watching in BoincView I have a 0.00 CPU efficiency, and when I run "top" I see the Rosetta apps in memory, but 0% CPU usage). I have gotten this quite frequently while running Rosetta on a Linux box, even through all the different versions. Any ideas/suggestions from the Mods, Testers or Devs?
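
For anyone who wants to catch this state without staring at "top", here is a rough sketch of one way to do it on Linux. It samples /proc/<pid>/stat twice and flags a process that is in the sleeping state and has used no CPU time in between. This is not anything BOINC or Rosetta provides; the function names and the 60-second wait are my own assumptions, and you would need to find the Rosetta PID yourself (e.g. from ps).

```python
import time


def parse_stat(stat_line):
    """Return (state, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    state = rest[0]            # field 3 of the file: R, S, D, Z, T, ...
    utime = int(rest[11])      # field 14: user-mode CPU time in clock ticks
    stime = int(rest[12])      # field 15: kernel-mode CPU time in clock ticks
    return state, utime + stime


def read_stat(pid):
    """Read and parse the stat file for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as handle:
        return parse_stat(handle.read())


def looks_stalled(before, after):
    """True if the process is sleeping and used no CPU between two samples."""
    return after[0] == "S" and after[1] == before[1]


def check(pid, wait_seconds=60):
    """Sample the process twice, wait_seconds apart (wait is an assumption)."""
    before = read_stat(pid)
    time.sleep(wait_seconds)
    return looks_stalled(before, read_stat(pid))
```

Keep in mind that a task BOINC has deliberately suspended or preempted for another project will also look "stalled" by this test, so it only means something when BOINC reports the Rosetta task as running.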



"Fiction reveals truth that reality obscures."
-Jessamyn West
ID: 16863 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org