Report Problems with Rosetta Version 5.22

Message boards : Number crunching : Report Problems with Rosetta Version 5.22

tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 18400 - Posted: 10 Jun 2006, 18:31:43 UTC - in response to Message 18399.  
Last modified: 10 Jun 2006, 18:33:08 UTC


What is accepted practice if it hangs up again, please? Do I wait for some watchdog abort, or manually abort it? I don't really care about credits, I'll take whatever action provides the best feedback about the failure. Thanks!


Wait and don't abort. It will finish within an hour at most after reaching "completion": Rosetta waits for the watchdog to shut it down. This behaviour was introduced in 5.19 for better debugging; it was reported over at RALPH and supposedly fixed in 5.22. It is very good that you report this here.
If you happen to observe this again, please check whether the graphics also show 100% or something lower, and take a screenshot of the graphics window in its "idling" state.
ID: 18400 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 18405 - Posted: 10 Jun 2006, 19:27:14 UTC
Last modified: 10 Jun 2006, 19:27:26 UTC

The Fatal Windows Error Bug is still with us, I'm afraid. wuid=19791659

Result ID 23483927
Name t309__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__661_7645_0
Workunit 19791659
Created 9 Jun 2006 11:23:04 UTC
Sent 9 Jun 2006 12:59:31 UTC
Received 10 Jun 2006 19:23:54 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 16 Jun 2006 12:59:31 UTC
CPU time 28426.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1655106

</stderr_txt>


Validate state Invalid
Claimed credit 109.225790694355
Granted credit 0
application version 5.22
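
Editorial note: the two forms of the exit status in the result above are the same 32-bit value, read once as a signed integer and once as unsigned hexadecimal. The following is a minimal Python sketch of that conversion; the helper names `to_unsigned32` and `to_signed32` are made up for illustration and are not part of BOINC or Rosetta. For reference, 0xC000000D is listed in Microsoft's NTSTATUS documentation as STATUS_INVALID_PARAMETER, which is outside information and says nothing about the actual cause being discussed in this thread.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


exit_status = -1073741811  # decimal form shown in the result page
print(f"0x{to_unsigned32(exit_status):08x}")  # prints 0xc000000d
print(to_signed32(0xC000000D))                # prints -1073741811
```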
ID: 18405 · Rating: 0
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 18409 - Posted: 10 Jun 2006, 20:30:57 UTC - in response to Message 18327.  
Last modified: 10 Jun 2006, 20:34:36 UTC

Hi Alan:

Thanks for reporting. There seem to be numerous little issues with the screensaver, and we've been trying to track them down one-by-one over on the test project, ralph. But I haven't seen a lot of problems like the one you describe -- has it happened in previous work units before this double batch? I wonder if something went haywire with the core boinc application -- you may need to restart.

Hello Moderator9

I pulled a client error on Work Unit 19677012 (the result is at https://boinc.bakerlab.org/rosetta/result.php?resultid=23357465), with exit code -1. Client messages for the error began with, "rosetta not responding to screensaver, exiting"

Then again on WU 19684873 (result link https://boinc.bakerlab.org/rosetta/result.php?resultid=23366061). Exit code was again -1, and the client-side messages once again report "rosetta not responding to screensaver, exiting".

The first was a 5.16 client, the second was with a 5.22. I'm not sure the screensaver bug is completely dead yet?

Cheers,
Alan



ID: 18409 · Rating: 0
Rhiju
Volunteer moderator

Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 18410 - Posted: 10 Jun 2006, 20:37:36 UTC - in response to Message 18405.  

Hi mmciastro... yes, we know it's still there. You might be happy to know that the error -1073741811 (0xc000000d) is currently number one on our list of things to kill. It's been the most common error for a while, but only now has Rom come up with a hypothesis for what the cause might be. He has just put some extra debugging code on ralph to track it down. Maybe that will let us unravel this puzzle!

The Fatal Windows Error Bug is still with us, I'm afraid. wuid=19791659

Result ID 23483927
Name t309__CASP7_ABRELAX_SAVE_ALL_OUT_nohistag_hom001__661_7645_0
Workunit 19791659
Created 9 Jun 2006 11:23:04 UTC
Sent 9 Jun 2006 12:59:31 UTC
Received 10 Jun 2006 19:23:54 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -1073741811 (0xc000000d)
Computer ID 212252
Report deadline 16 Jun 2006 12:59:31 UTC
CPU time 28426.171875
stderr out <core_client_version>5.4.9</core_client_version>
<message>
- exit code -1073741811 (0xc000000d)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 1655106

</stderr_txt>


Validate state Invalid
Claimed credit 109.225790694355
Granted credit 0
application version 5.22


ID: 18410 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 18411 - Posted: 10 Jun 2006, 20:54:32 UTC - in response to Message 18410.  

Hi mmciastro... yes, we know it's still there.

It's only a problem with certain video cards, .NET, or whatever it is. If regular users who get this turn OFF the screensaver, they'll never see it until it's fixed. I'm in direct communication with Rom on this bug, just FYI. I happen to have a machine that regularly gets this error (lucky me and, I guess, lucky Rosetta/Rom).

tony
ID: 18411 · Rating: 0
Alan Roberts

Joined: 7 Jun 06
Posts: 61
Credit: 6,901,926
RAC: 0
Message 18444 - Posted: 11 Jun 2006, 4:01:01 UTC - in response to Message 18409.  

Hi Alan:

Thanks for reporting. There seem to be numerous little issues with the screensaver, and we've been trying to track them down one-by-one over on the test project, ralph. But I haven't seen a lot of problems like the one you describe -- has it happened in previous work units before this double batch? I wonder if something went haywire with the core boinc application -- you may need to restart.



Rhiju,

Based on a look at my results, the work units before and after the failures were completed. I may have pulled a BOINC restart somewhere in there ...

I'm pitching this as an employee contribution/team-effort project at one of my customer sites, and the three of us who are the test cases have been grabbing our volunteer minutes here and there getting our sample desktops running, to demonstrate safety (at least lack of harm and impact on the "real work") and stability.

When I saw the comments about screen saver issues on the forum and noticed my failures, I may have restarted boinc in a quick-and-dirty quest for a fix.

I set the test machines to not use the screen saver over the weekend, but if there is a better procedure for providing debugging information (i.e., "run the screensaver and do the following if you get another error"), please let me know.

Cheers,
Alan

ID: 18444 · Rating: 0
RWIoffice

Joined: 7 Jun 06
Posts: 4
Credit: 37,344
RAC: 0
Message 18465 - Posted: 11 Jun 2006, 14:56:18 UTC

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.

ID: 18465 · Rating: 1
tralala

Joined: 8 Apr 06
Posts: 376
Credit: 581,806
RAC: 0
Message 18473 - Posted: 11 Jun 2006, 18:46:15 UTC - in response to Message 18465.  

Screensaver or possibly some other flavor of "completion" problem with a t299__CASP7 work unit.

I noticed lack of completion progress from home this morning. When I got to the office, the screensaver was sitting on "Model 9, Step 0."

Once I got past the screensaver, BOINC Manager was reporting the work unit as "Running" and "100%" for Progress. BOINC Manager would not display the graphics.

Shutdown of BOINC Manager seemed to take a long time, but it finally happened. I rebooted the system. Once BOINC Manager launched it reported the status on this work unit as completed, "Ready to Report" and started the next work unit.

I forced an update so the result would be available prior to my posting this report.

Keeping in mind that I'm a newbie and could easily be misinterpreting, the result seems to be referring to only 8 models, so the screensaver graphic's Model 9 reference doesn't make sense to me.



This is related to a bug report on Ralph. The behaviour is exactly the same. It was supposed to be fixed in 5.22; obviously it is not.
ID: 18473 · Rating: 0
Craig Miller

Joined: 5 Jun 06
Posts: 1
Credit: 241,534
RAC: 0
Message 18482 - Posted: 11 Jun 2006, 23:29:23 UTC

I am having a problem running Rosetta. I attach to Rosetta using BOINC Manager and receive the notice of a successful attachment. When I look at BOINC Manager, it shows Rosetta running, while Einstein and SETI are suspended. But when I come back several hours later, Rosetta is not present in either Projects or Tasks. When I look at the messages, they seem to show Rosetta being loaded and started, but then the log ends with "Detaching from project", as shown below.

-------------
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
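
Editorial note: the excerpt above appears to be pipe-delimited (timestamp, project, message), with an empty project field for client-wide messages. That layout is inferred from these seven lines only, not from any BOINC specification. Under that assumption, here is a rough Python sketch for pulling out what happened immediately before the project reset. `LogEntry` and `parse_line` are illustrative names, and the `%b` month parsing assumes an English locale.

```python
from datetime import datetime
from typing import NamedTuple

# Log lines copied from the post above.
LOG = """\
11-Jun-06 12:41:20|rosetta@home|Starting task t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom002__666_13970_1 using rosetta version 522
11-Jun-06 12:49:01||Contacting account manager at http://bam.boincstats.com/
11-Jun-06 12:49:03||Account manager: BAM Host-ID: 2098
11-Jun-06 12:49:03||Account manager contact succeeded
11-Jun-06 12:49:03|rosetta@home|Resetting project
11-Jun-06 12:49:04||Rescheduling CPU: exit_tasks
11-Jun-06 12:49:04|rosetta@home|Detaching from project
"""


class LogEntry(NamedTuple):
    timestamp: datetime
    project: str  # empty string for client-wide messages
    message: str


def parse_line(line: str) -> LogEntry:
    # maxsplit=2 keeps any "|" inside the message text intact.
    stamp, project, message = line.split("|", 2)
    return LogEntry(datetime.strptime(stamp, "%d-%b-%y %H:%M:%S"), project, message)


entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]

# Find the entry logged immediately before the project reset.
reset_index = next(i for i, e in enumerate(entries) if e.message == "Resetting project")
before_reset = entries[reset_index - 1]

# Time between the task starting and the final "Detaching" entry.
elapsed = entries[-1].timestamp - entries[0].timestamp

print(before_reset.message)  # Account manager contact succeeded
print(elapsed)               # 0:07:44
```

In this excerpt the reset is logged in the same second as the successful account manager contact, roughly eight minutes after the task started.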


When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?

Craig Miller

ID: 18482 · Rating: 0
Dogbytes
Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 18483 - Posted: 12 Jun 2006, 0:17:55 UTC
Last modified: 12 Jun 2006, 0:31:24 UTC

The below linked WU crunched for over 2 hours, and yet was stuck at 0.00%. I aborted the unit because it appeared to be completely hung up, and stopped crunching. I would be interested to know what the problem was...

Aborted 5.22 WU link
ID: 18483 · Rating: 0
Ian

Joined: 14 Apr 06
Posts: 29
Credit: 25,252
RAC: 0
Message 18484 - Posted: 12 Jun 2006, 0:50:05 UTC

Here's one from Saturday:

https://boinc.bakerlab.org/rosetta/result.php?resultid=23575087

And one from Friday:

https://boinc.bakerlab.org/rosetta/result.php?resultid=23484615

The only two errors for quite a while.
Ian Cundell, St Albans, UK
ID: 18484 · Rating: 0
Dogbytes
Joined: 4 Dec 05
Posts: 37
Credit: 207,563
RAC: 0
Message 18485 - Posted: 12 Jun 2006, 1:12:54 UTC

The WU linked below has a severe memory leak: it was using more than 275 MB of memory, bringing the host's commit charge to nearly 600 MB. The WU was aborted by the user.

Aborted WU
ID: 18485 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,154,245
RAC: 1,629
Message 18487 - Posted: 12 Jun 2006, 1:57:59 UTC - in response to Message 18482.  
Last modified: 12 Jun 2006, 1:59:43 UTC

When I check with BAM, my resources are shown as Einstein, SETI, and Rosetta.

What could be causing this problem?


Rosetta's servers were just upgraded to support BAM last week. But it looks like BAM did something, not Rosetta.

Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 18487 · Rating: 0
scsimodo
Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 18525 - Posted: 12 Jun 2006, 17:52:24 UTC

I had a few WUs crash when I hit the "show graphics" button. The window popped up, closed immediately, and trashed the WU. The WUs are:

WU1
WU2
WU3
WU4

The host list is unhidden; the host is a Mac Mini Core Duo, 1.66 GHz, 2 GB RAM. Please drop me a short note when I can hide my hosts again...

What's strange is that hitting the "show graphics" button a few minutes earlier worked perfectly, so it seems to be a random problem...



ID: 18525 · Rating: 0
Billy

Joined: 29 May 06
Posts: 11
Credit: 1,205,672
RAC: 3,162
Message 18569 - Posted: 13 Jun 2006, 14:12:12 UTC - in response to Message 18196.  
Last modified: 13 Jun 2006, 14:12:54 UTC

I had a work unit at about 80% complete, and it seemed to be progressing normally. I suspended the project (as well as Einstein and SETI) and quit BOINC. I shut down the computer and restarted. When BOINC started again, it reported this work unit as complete and uploaded it. Either it was stuck before or it isn't actually complete. I had a similar thing happen a couple of days ago, when it also reported work units as complete even though the completion times were unusually short.

https://boinc.bakerlab.org/rosetta/result.php?resultid=23946621

iMac Core Duo, Rosetta version 5.22
ID: 18569 · Rating: 0
Stwato

Joined: 11 Jan 06
Posts: 150
Credit: 655,634
RAC: 0
Message 18570 - Posted: 13 Jun 2006, 14:44:02 UTC
Last modified: 13 Jun 2006, 14:44:25 UTC

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it is missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB of memory and ~70 MB of virtual memory but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like me to take a screenshot, please let me know the best dimensions for posting it in the forum, as I have nowhere to upload it to.

Cheers
Stwato
ID: 18570 · Rating: 0
Keith Akins

Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 18572 - Posted: 13 Jun 2006, 16:04:07 UTC

After patting myself on the back for so many successful WUs, I got the following error on this unit:

6/12/2006 11:05:25 PM|rosetta@home|Unrecoverable error for result t306__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_hom001__680_902_0 ( - exit code -1073741811 (0xc000000d))

This could be a v5.22 problem, a BOINC 5.4.9 problem, or a conflict when checking mail with Mozilla Thunderbird.

Win XP Home Service Pack 2

Mozilla Firefox/Thunderbird combo.

Computers are visible and BOINC 5.4.9 should be debug reporting.

Ignore the Linux Computer as mine is a dual booter.
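
Editorial note: for anyone collecting these client messages to count how often the error turns up, the result name and exit code can be pulled out of a line like the one quoted above with a regular expression. This is only a sketch: the pattern is fitted to that single line, and `PATTERN` and `extract_error` are illustrative names rather than anything provided by BOINC.

```python
import re

# Client message copied from the post above.
LINE = (
    "6/12/2006 11:05:25 PM|rosetta@home|Unrecoverable error for result "
    "t306__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_hom001__680_902_0 "
    "( - exit code -1073741811 (0xc000000d))"
)

# Fitted to the single message above; other client versions may word it differently.
PATTERN = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\( - exit code (?P<code>-?\d+)"
)


def extract_error(line: str):
    """Return (result_name, exit_code), or None if the line is not an error report."""
    message = line.split("|", 2)[-1]  # drop the timestamp and project fields
    match = PATTERN.search(message)
    if match is None:
        return None
    return match["result"], int(match["code"])


print(extract_error(LINE))
```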
ID: 18572 · Rating: 0
Snake Doctor
Joined: 17 Sep 05
Posts: 182
Credit: 6,401,938
RAC: 0
Message 18580 - Posted: 13 Jun 2006, 17:57:34 UTC - in response to Message 18570.  

This work unit looks very strange in the graphics. Parts of the protein do not appear to be connected to the rest of it. It's like some of it is missing. The protein appears very small. I first noticed the problem when I saw that the work unit is only using 20 MB of memory and ~70 MB of virtual memory but has nearly 9 million page faults. Assuming it was a small protein, I took a look at the graphics and noticed the disjointed parts. Any ideas? I'll let it continue for now (5 hours into an 8 hour work unit).

If you would like me to take a screenshot, please let me know the best dimensions for posting it in the forum, as I have nowhere to upload it to.

Cheers
Stwato

This is a known problem with a few of the processing techniques being used. Not all the work units use the same processing approach. In some cases they only look at parts of the protein structure, and that somehow affects the display.

We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
ID: 18580 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,003,213
RAC: 0
Message 18581 - Posted: 13 Jun 2006, 18:12:31 UTC

ID: 18581 · Rating: 0
TCU Computer Science

Joined: 7 Dec 05
Posts: 28
Credit: 12,861,977
RAC: 0
Message 18612 - Posted: 14 Jun 2006, 3:21:02 UTC

rosetta 5.22
WU Name: t314__CASP7_ABRELAX_SAVE_ALL_OUT_hom004__666_16529_0
running on Mac OS 10.4.6

BOINC Manager Tasks tab shows CPU Time stuck at 01:30:40 and 15%
top command shows TIME = 28:53:41 and climbing

stopped and restarted BOINC
CPU Time reverted to 01:13:00 and 15% but no longer stuck

Symptoms are identical to my post for ralph 5.18
ID: 18612 · Rating: 0



©2021 University of Washington
https://www.bakerlab.org