Discuss Rosetta Application Errors and Fixes (all Vers)

Message boards : Number crunching : Discuss Rosetta Application Errors and Fixes (all Vers)



tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15092 - Posted: 30 Apr 2006, 18:18:16 UTC - in response to Message 15087.  

I've started getting lockups on my x2 3800+ system over the last week; I think it's a RAM problem as it has got worse and worse over time, I can barely get 20 minutes loaded out of it now w/o a lockup. Running dual prime seems to stress the system less than Rosetta, certainly CPU temps are 2 or 3 degrees lower. So I've detached the system and am priming again, trying to nail exactly what's wrong.


Wrong thread, but you can try relaxing the RAM timings a bit and see if it helps. If you run Prime, use the Blend test, which tests RAM extensively, and also run Memtest86.
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15093 - Posted: 30 Apr 2006, 18:21:37 UTC - in response to Message 15090.  
Last modified: 30 Apr 2006, 18:24:06 UTC

Is there something grossly different about 5.06 and 5.07, or are the current WUs just more demanding? Prime is still running after 90 minutes with lower temps than Rosetta, which seems to tell me that Rosetta has become extremely tough on the hardware. I know it's difficult to quantify, but is this acknowledged by those that know more about it?

NB: this seems to be the wrong thread for my comments, but WU problems could be mistaken for hardware problems if the WUs really have got tougher. Where shall I take this?

The application is actually more efficient than earlier releases. But the Work Units are significantly larger and there is a lot more motion in the graphics displays (works the display cards a little harder).

The larger Work Units take longer to create a model. If you are hyper-threading, you may want to turn that off. For more about this see these threads -

Application Information
Work Unit information

Moderator9
ROSETTA@home FAQ
Moderator Contact
BennyRop

Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 15305 - Posted: 2 May 2006, 15:52:05 UTC

Moderator9: Jim's "x2 3800+ system" is an AMD dual-core CPU. While HT can be turned off on some Intel motherboards, with the dual-core Athlon 64s and Opterons, once you have the correct BIOS and OS HAL in use, you have two CPUs on the same socket. For dual-core CPUs, if you want him to limit it to just one CPU core, you'll have to explain how to set BOINC/Rosetta to use only one CPU.

Jim: Feel free to open a new thread, but can you provide links to the failing WUs? Can you give the amount and type of RAM? (Did it pass the Memtest86+ tests? Are you using more RAM than you physically have, or are the new WUs hitting bad memory locations that earlier ones didn't?) What temperature is the CPU running at? (Is the added heat pushing the CPU close to the point where it can cause lockups?) Have you verified that the system is free of spyware, viruses and trojans by running at least 4 anti-spyware scans, 2 antivirus scans, and ewido and Trojan Hunter for trojans? (Do you have software on your system that may conflict with BOINC/Rosetta and cause lockups?) How is BOINC set up and running on your machine? For example, is it a service install?
Jimi@0wned.org.uk

Joined: 10 Mar 06
Posts: 29
Credit: 335,252
RAC: 0
Message 15311 - Posted: 2 May 2006, 18:25:26 UTC
Last modified: 2 May 2006, 18:25:57 UTC

Thanks for the input, BennyRop. I detached the project and gave the system a thorough test again (memtest, dual Prime), and finally raised the vcore from 1.39 V to 1.42 V and ran dual Prime for 6 hours; everything is OK once more. I tried 1.39 V again to check it in Rosetta and it locked up within 4 minutes. Why it ran perfectly well for nearly 4 weeks at the lower vcore and then turned nasty is a mystery. It has been running fine since yesterday afternoon at the higher voltage.

It is a bare build that only does Rosetta. I'm tempted to go Linux 64-bit on it tbh.
Bogdan Kosanovic

Joined: 17 Jan 06
Posts: 3
Credit: 1,212
RAC: 0
Message 15437 - Posted: 3 May 2006, 20:57:15 UTC

I don't know what was done in 5.07, but it doesn't want to stop working. I scheduled 50% for Rosetta and 50% for SETI. When Rosetta is supposed to be preempted, it doesn't stop; it keeps running in parallel with SETI. Also, when it runs, I can hardly do anything else on my laptop. It did not work like this before (a few weeks ago). When I try to suspend it, I can't: although it shows as suspended, I can still see the process taking 80-90% of CPU time. It completely brings my system down and I have to kill the process manually to get it to stop...

I also had to install XP Service Pack 2 two weeks ago. Could that be related?

For now I have had to suspend Rosetta, since it doesn't behave nicely... I may resume some time in the future when you get it back to normal...

Regards,
Bogdan
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15452 - Posted: 3 May 2006, 22:49:49 UTC - in response to Message 15437.  

I don't know what was done in 5.07, but it doesn't want to stop working.


The difference you are seeing is more likely due to the specific WU you were crunching than to the application version. As for it continuing to run after you suspend it, I've seen this before as well: the BOINC Manager basically loses contact with the processes that are doing the crunching (regardless of which project those processes are crunching for). I find that if I restart BOINC, I regain control over what is running and what is suspended.

If you have not already done so, you may want to change your General Preference to NOT run while you are using your computer. That should help avoid conflicts with your work. You might also configure it to only run during specific times of day (when you are not using your computer).

If you have further problems or questions, I'd suggest creating a thread in the Q&A board for Windows, because this doesn't sound like something v5.07 would have caused.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Bogdan Kosanovic

Joined: 17 Jan 06
Posts: 3
Credit: 1,212
RAC: 0
Message 15505 - Posted: 4 May 2006, 14:44:41 UTC - in response to Message 15452.  

I don't know what was done in 5.07, but it doesn't want to stop working.


The difference you are seeing is more likely due to the specific WU you were crunching than to the application version. As for it continuing to run after you suspend it, I've seen this before as well: the BOINC Manager basically loses contact with the processes that are doing the crunching (regardless of which project those processes are crunching for). I find that if I restart BOINC, I regain control over what is running and what is suspended.


I don't think it has to do with BOINC. BOINC can suspend SETI without any problems at the same time as Rosetta runs like a "virus" :) ignoring any suspend attempts. It did not behave like this 2-3 weeks ago; this is something recent. WUs should not have much to do with the ability to suspend the process.

Restarting BOINC might be a workaround (I'll try it), but I think the new version of the process has some issues it did not have before :)

I'll have to remove Rosetta from my laptop since there is no other time I can run it under present conditions. Currently, the priority setting "feels" almost like it is ignored for Rosetta process. It competes with other Windows applications as if it was created with the same or higher priority.

Regards,
Bogdan
Bogdan Kosanovic

Joined: 17 Jan 06
Posts: 3
Credit: 1,212
RAC: 0
Message 15506 - Posted: 4 May 2006, 14:51:38 UTC - in response to Message 15505.  
Last modified: 4 May 2006, 14:52:04 UTC

Just tried another WU. I played with suspend/resume. What happens is that for Rosetta, when you suspend the process it does not exit. It still sits in memory, not using any CPU. For SETI it exits and does not keep using virtual memory. I'm guessing that at some point something may go wrong and in addition to sitting in memory after being suspended, it would keep using it, i.e. running...

Regards,
Bogdan

tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 15507 - Posted: 4 May 2006, 15:15:11 UTC - in response to Message 15506.  

Actually, there is a setting in your General Preferences for whether suspended (preempted) applications should stay in memory or not. When an application exits, some computer time is lost, because after a restart it starts again from its latest checkpoint, so leaving it in memory is recommended. A suspended process is moved out of RAM to the swap file if the RAM is needed, and it takes no CPU time while it is suspended.

What is strange is that the general setting for leaving applications in memory is the same for all BOINC projects, so SETI and Rosetta should behave identically. Can you check what you have in your General Preferences and report back?
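To make the trade-off above concrete, here is a minimal Python sketch. It assumes, as the post says, that an application which exits on preemption restarts from its latest checkpoint, so any CPU time since that checkpoint has to be redone, while an application left in memory simply resumes. The function name and the example figures are hypothetical and are not taken from BOINC.

```python
def cpu_seconds_lost(elapsed_at_preempt: float,
                     last_checkpoint: float,
                     leave_in_memory: bool) -> float:
    """Return the CPU seconds that must be redone after a preemption.

    elapsed_at_preempt: CPU seconds the task had run when it was preempted.
    last_checkpoint:    CPU seconds at which the task last wrote a checkpoint.
    leave_in_memory:    True if the suspended task stays resident (it resumes
                        exactly where it stopped), False if it exits.
    """
    if last_checkpoint > elapsed_at_preempt:
        raise ValueError("checkpoint cannot be later than the preemption point")
    if leave_in_memory:
        return 0.0  # resident task resumes in place; nothing is redone
    # exited task restarts from its checkpoint; everything after it is redone
    return float(elapsed_at_preempt - last_checkpoint)


# Hypothetical example: preempted 3,600 s in, last checkpoint at 3,000 s.
print(cpu_seconds_lost(3600, 3000, leave_in_memory=False))  # 600.0
print(cpu_seconds_lost(3600, 3000, leave_in_memory=True))   # 0.0
```

The longer a work unit runs between checkpoints, the more an exit on every preemption costs, which is why leaving applications in memory is the usual recommendation when there is enough RAM or swap.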
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15514 - Posted: 4 May 2006, 16:07:30 UTC - in response to Message 15505.  

I don't know what was done in 5.07, but it doesn't want to stop working.


The difference you are seeing is more likely due to the specific WU you were crunching than to the application version.


I don't think it has to do with BOINC. BOINC can suspend SETI without any problems at the same time as Rosetta runs like a "virus" :) ignoring any suspend attempts. It did not behave like this 2-3 weeks ago; this is something recent. WUs should not have much to do with the ability to suspend the process.

Currently, the priority setting "feels" almost like it is ignored for Rosetta process. It competes with other Windows applications as if it was created with the same or higher priority.


My point was that many of the more recent WUs use more memory and do more paging than the WUs of a month ago. I believe it is the memory use and paging that make you "feel it" on your PC. Even though BOINC work runs at the lowest possible priority, once it sends a request to the disk, your click in MS Word or whatever has to wait for that BOINC disk request to complete before the disk can serve the request from your user application. This is why the application is affected, even though it has a higher priority for CPU time.
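The point about disk requests can be illustrated with a deliberately simplified model. The Python sketch below is a hedged illustration, not a description of how Windows actually schedules I/O: it treats the disk as a single first-come, first-served server with no preemption, so once the low-priority BOINC request starts, the user's request waits for it to finish no matter how high the user process's CPU priority is. The class, function, and timings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskRequest:
    name: str
    arrival_ms: float  # when the request reaches the disk queue
    service_ms: float  # how long the disk takes to serve it


def completion_times(requests: list[DiskRequest]) -> dict[str, float]:
    """Serve requests first-come, first-served with no preemption.

    A request that is already being serviced always runs to completion;
    the CPU priority of the issuing process plays no part in this model.
    """
    clock = 0.0
    done: dict[str, float] = {}
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        start = max(clock, req.arrival_ms)  # wait until the disk is free
        clock = start + req.service_ms
        done[req.name] = clock
    return done


# Hypothetical timings: a BOINC page-in starts at t=0 and takes 8 ms;
# the user's click needs a 5 ms read that arrives at t=2 ms.
reqs = [DiskRequest("boinc_page_in", 0.0, 8.0), DiskRequest("user_click", 2.0, 5.0)]
done = completion_times(reqs)
wait = done["user_click"] - 2.0 - 5.0  # time spent queued behind BOINC
print(done["user_click"], wait)  # 13.0 6.0
```

With more paging, requests like the BOINC one arrive more often, so the user's requests queue behind them more often, which matches the sluggishness described above.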

Not being able to suspend a task is a second issue, which I believe rests with BOINC. We'll table that one, because it seems less significant to you than being able to use your laptop while BOINC is running.

Please review your General Preferences, specifically:
"Do work while computer is in use?" -- I think you want NO.
"Do work only after computer is idle for xx minutes" -- How many minutes would make it likely that you have left your PC for a while? 5? 10? You may want to increase it. Be sure your BOINC Manager Command tab shows "run based on preferences" for these to take effect.
"Use no more than x% of total virtual memory" -- you may want to reduce it, say by 10% a day, until things settle down.

If you have further problems or questions, I'd suggest creating a thread in the Q&A board for Windows, because this doesn't sound like something v5.07 would have caused.
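The two "in use" preferences and the run mode listed above interact roughly as in the hedged Python sketch below. It is a simplified reading of the preference wording, not BOINC's actual client code: the function name, parameter names and the mode strings "always", "prefs" and "never" are hypothetical, and it ignores the other rules (time-of-day limits, memory limits) a real client also applies.

```python
def may_compute(run_mode: str,
                run_while_in_use: bool,
                minutes_since_last_input: float,
                idle_threshold_minutes: float) -> bool:
    """Decide whether crunching is allowed under a simplified preference model.

    run_mode: "always" ignores preferences, "never" blocks all work,
              "prefs" applies the preferences (like "run based on preferences").
    The computer counts as in use until there has been no keyboard or
    mouse input for idle_threshold_minutes.
    """
    if run_mode == "always":
        return True  # preferences are bypassed entirely
    if run_mode == "never":
        return False
    if run_mode != "prefs":
        raise ValueError(f"unknown run mode: {run_mode!r}")
    if run_while_in_use:
        return True  # in-use status does not matter
    return minutes_since_last_input >= idle_threshold_minutes


print(may_compute("prefs", False, 2, 10))   # False: user was active 2 minutes ago
print(may_compute("prefs", False, 15, 10))  # True: idle longer than the threshold
print(may_compute("always", False, 2, 10))  # True: preferences not applied
```

The third call shows why the run mode matters: if the Manager is not set to run based on preferences, changing the in-use settings has no effect.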
TioSuper

Joined: 2 May 06
Posts: 17
Credit: 164
RAC: 0
Message 15607 - Posted: 6 May 2006, 13:20:28 UTC

This may be the dumbest question I have asked yet. Pardon my newbiness, but is the data produced in those units that ended "in error" used by the scientists? In simpler words, was the computing time spent on those "erroneous units" lost to science or not?

Be gentle in your answer: I am a newbie watch me look dumb :)
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15608 - Posted: 6 May 2006, 15:22:42 UTC - in response to Message 15607.  
Last modified: 6 May 2006, 15:26:21 UTC

This may be the dumbest question I have asked yet. Pardon my newbiness, but is the data produced in those units that ended "in error" used by the scientists? In simpler words, was the computing time spent on those "erroneous units" lost to science or not?

Be gentle in your answer: I am a newbie watch me look dumb :)

This is not a stupid question. (The only stupid question is the one you don't ask.)

The answer is YES, it is used. If the work unit produces any models, they are valuable in their own right. But remember, this project is as much about developing the computing techniques for modeling as it is about finding the structures of the proteins themselves. If you look at the graphics, you will usually see a "Native" structure. That is because we already know the structure of that protein. When you see it, you know that what is being worked on is the technique and the software design. So any information that comes back, including errors, is valuable to the effort.

In the next few days/weeks you will see work units where no "Native" structure is shown. These will be for the CASP7 competition, which will compare different projects' approaches to the modeling problem to see which is best. At the end of each of these runs, the actual structure will be compared to the results and posted on the website.

Eventually, when the modeling techniques are all worked out, you will begin to see the focus change to figuring out the structures of proteins for which we do not yet know the structure, and that is when the real science of protein prediction can begin.

Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 15609 - Posted: 6 May 2006, 15:23:19 UTC - in response to Message 15607.  

Was the computing time involved in those "erroneous units" lost to science or not?

The project values everything it learns. This is why they give credit even if a WU errors out. They take the view that if there is an error and no one reports it, we all lose. WITH the report, they can learn why it failed, fix it, and take steps to keep others from running into the same problem in the future.

Edison tried more than 1,000 things before he found a filament that worked for his light bulb. Not all of the new science the project tries is going to produce fruitful results, but it adds to the list of things they know don't work :)

Henry

Joined: 25 Oct 05
Posts: 1
Credit: 19,112
RAC: 0
Message 15691 - Posted: 8 May 2006, 21:06:57 UTC

Has anyone had their screensaver freeze up on them? As best I can tell, it only happens when Rosetta is running as the screensaver: the Rosetta work unit hits 100% and then it freezes. I'm pretty sure it has to do with my Intel graphics/video card, because I had the same problem with Einstein as well. If the work unit doesn't hit 100%, it is fine.
senatoralex85

Joined: 27 Sep 05
Posts: 66
Credit: 169,644
RAC: 0
Message 15702 - Posted: 9 May 2006, 3:08:19 UTC

5/8/2006 10:03:09 PM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
5/8/2006 10:03:09 PM|rosetta@home|Requesting 0 seconds of work, returning 1 results
5/8/2006 10:03:10 PM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
5/8/2006 10:03:32 PM||request_reschedule_cpus: project op
5/8/2006 10:03:35 PM|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
5/8/2006 10:03:35 PM|rosetta@home|Requesting 8640 seconds of work, returning 0 results
5/8/2006 10:03:36 PM|rosetta@home|Scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi succeeded
5/8/2006 10:03:36 PM|rosetta@home|Message from server: Not sending work - last RPC too recent: 25 sec
5/8/2006 10:03:36 PM|rosetta@home|No work from project
5/8/2006 10:03:37 PM|rosetta@home|Deferring communication with project for 4 minutes and 1 seconds

---------------------------------------------------------------------------

Is this a bug? The system would not send me a workunit because the time between BOINC returning a workunit and requesting a new one was too short. That makes no sense to me. I have to wait to get a workunit after reporting one? HMMMMMMm.

Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 15704 - Posted: 9 May 2006, 3:48:46 UTC - in response to Message 15702.  
Last modified: 9 May 2006, 3:52:05 UTC

...Is this a bug? The system would not send me a workunit because the time between BOINC returning a workunit and requesting a new one was too short. That makes no sense to me. I have to wait to get a workunit after reporting one? HMMMMMMm.

This is normal behavior for BOINC. It has nothing to do with Rosetta. The BOINC manager keeps track of how often you contact a project, and once it makes a contact it schedules the next contact. If you try to force it sooner than the scheduled time you will get the message you describe.

If you watch the information in the projects tab of the BOINC manager after you "update" a project, it will show how long you will have to wait for any downloads to begin.
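In the quoted log, the refusal appears as a "Message from server", so the 25-second check seems to be made by the project's scheduler, after which the client defers its next contact. The Python sketch below models only that gate, in the simplest possible way. The function name is hypothetical, and so is the minimum interval: the log reports the 25-second gap but not the threshold the project actually uses.

```python
def check_work_request(now_s: float, last_rpc_s: float,
                       min_interval_s: float) -> tuple[bool, float]:
    """Return (send_work, seconds_to_wait) for a host's work request.

    Work is refused when the previous scheduler RPC from the same host
    was less than min_interval_s seconds ago.
    """
    gap = now_s - last_rpc_s
    if gap < min_interval_s:
        return False, min_interval_s - gap  # too soon: report remaining wait
    return True, 0.0


# Gap from the log: the previous RPC was 25 s before this request.
# The 60 s threshold is an assumed value for illustration only.
print(check_work_request(now_s=25.0, last_rpc_s=0.0, min_interval_s=60.0))  # (False, 35.0)
```

The client's own deferral in the log ("4 minutes and 1 seconds") is a separate back-off on the client side and is not modeled here.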

TioSuper

Joined: 2 May 06
Posts: 17
Credit: 164
RAC: 0
Message 15726 - Posted: 9 May 2006, 16:52:06 UTC

Have the developers found a solution to the "107" type errors that keep popping up?
loren

Joined: 10 Oct 05
Posts: 3
Credit: 2,449,762
RAC: 0
Message 16640 - Posted: 19 May 2006, 15:37:36 UTC

sorry I guess this is in the wrong forum
Moderator9
Volunteer moderator

Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 16661 - Posted: 19 May 2006, 20:55:34 UTC - in response to Message 16637.  
Last modified: 19 May 2006, 21:05:35 UTC

Every time Rosetta finishes a project, I get the error message "5/19/2006 7:15:17 AM|rosetta@home|Unrecoverable error for result t283_HOMOLOG_ABRELAX_hom004__515_13582_0 ( - exit code -1073741811 (0xc000000d))". And there is no credit given for the work done. How do I fix this problem?

All work gets credit. You have to look at the results link, not the results list. There is a FAQ on this that you can read from the link in my signature below. I will move your post to the problems thread.
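The two numbers in that error line are the same 32-bit value: the exit code is printed as a signed integer, and reinterpreting it as unsigned gives the hexadecimal form in parentheses. 0xC000000D is the Windows NTSTATUS value STATUS_INVALID_PARAMETER, which on its own says little about the underlying cause. A short Python check of the conversion (the helper name is hypothetical):

```python
def exit_code_to_hex(code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"  # mask keeps the low 32 bits


print(exit_code_to_hex(-1073741811))  # 0xc000000d
```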
Shawn H. Hall

Joined: 10 Sep 06
Posts: 6
Credit: 412,088
RAC: 0
Message 32806 - Posted: 17 Dec 2006, 14:05:19 UTC

I have been attached to Rosetta@home for many moons, but as of the last few months I have had a disturbing problem; I never get any tasks--ever. I have tried resetting the whole Rosetta project, I have tried MANY times to schedule a new task (or request a new task--whichever), with no luck. What's wrong?

Here is a sample log. They're all the same:

Sun Dec 17 05:49:48 2006|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
Sun Dec 17 05:49:48 2006|rosetta@home|Reason: Requested by user
Sun Dec 17 05:49:48 2006|rosetta@home|(not requesting new work or reporting completed tasks)
Sun Dec 17 05:49:53 2006|rosetta@home|Scheduler request succeeded
Sun Dec 17 05:59:03 2006||Rescheduling CPU: project reset by user
Sun Dec 17 05:59:03 2006|rosetta@home|Resetting project
Sun Dec 17 05:59:03 2006||Rescheduling CPU: exit_tasks
Sun Dec 17 05:59:19 2006|rosetta@home|Sending scheduler request to https://boinc.bakerlab.org/rosetta_cgi/cgi
Sun Dec 17 05:59:19 2006|rosetta@home|Reason: Requested by user
Sun Dec 17 05:59:19 2006|rosetta@home|(not requesting new work or reporting completed tasks)
Sun Dec 17 05:59:24 2006|rosetta@home|Scheduler request succeeded


Can anyone tell me what the heck is going on?! I'm stumped.




©2024 University of Washington
https://www.bakerlab.org