Problems with Rosetta version 5.98

Profile (_KoDAk_)

Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 59648 - Posted: 18 Feb 2009, 9:49:57 UTC
Last modified: 18 Feb 2009, 9:50:25 UTC

https://boinc.bakerlab.org/rosetta/result.php?resultid=228610191
<core_client_version>6.6.5</core_client_version><![CDATA[<message>
- exit code -1073741819 (0xc0000005)
</message><stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 3595866
ID: 59648 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59687 - Posted: 20 Feb 2009, 16:51:53 UTC

Exceptions at various addresses on two different computers. Only one of these tasks had a wingman, but that person's task also died with an exception.

230043321 0x008BB97B read attempt to address 0x060A0000
230072124 0x008BB9A5 read attempt to address 0x0BA5A000
229964104 0x008BB97B read attempt to address 0x0A082000
229979933 0x008BB92B read attempt to address 0x0BB38000
230017643 0x008BB92B read attempt to address 0x0A23C000
ID: 59687 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 173
Message 59691 - Posted: 20 Feb 2009, 20:48:21 UTC

resultid=230017087 Wing man still to return result.
resultid=230017086 Wing man also crashed

Have a crunching good day!!
ID: 59691 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59695 - Posted: 21 Feb 2009, 1:20:23 UTC
Last modified: 21 Feb 2009, 1:23:51 UTC

Another crash like the ones below 230072124 and 230089446
ID: 59695 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59697 - Posted: 21 Feb 2009, 6:10:08 UTC

And more crashes ... you know, this was the reason I stopped RaH a couple years ago ... non-stop errors ... and some of these tasks are wasting in the region of 10-20 minutes ...

230171238
230225023
ID: 59697 · Rating: 0
Profile JStateson
Joined: 7 May 07
Posts: 15
Credit: 4,061,331
RAC: 0
Message 59702 - Posted: 21 Feb 2009, 13:56:47 UTC

A bunch of faults, maybe 5, on Windows XP, Vista 64, etc. They started about 3 days ago. These faults are the type that pop up over the desktop and ask whether Microsoft should be informed.

From the XP event log:
Faulting application rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, fault address 0x0084fcf3.


ID: 59702 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59707 - Posted: 21 Feb 2009, 17:47:35 UTC

Yeah, that is what I am getting too ... I keep forgetting to check whether that pop-up locks the CPU, meaning that until you dismiss it you have lost the CPU ... if you get it again, let us know ... I keep making a mental note to check and keep forgetting ...

Today's list:
230306967 0x008BB92B read attempt to address 0x0BECC000
230288293 0x008BB955 read attempt to address 0x0B6D3000
230192144 0x008BB9A5 read attempt to address 0x0DD3F000
230089446 0x008BB97B read attempt to address 0x0C06A000
230018563 0x008BB92B read attempt to address 0x09D7A000
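
A rough way to look for a pattern in lists like this one is to tally the faulting addresses and check the addresses being read. The Python sketch below is only an illustration: the helper name `parse_report` and the regular expression are assumptions about the line layout, not anything produced by BOINC or Rosetta. On the five lines above, the faults fall within a small range of code addresses and every read address is a multiple of 0x1000.

```python
import re
from collections import Counter

# Lines copied from the list above: result ID, faulting address, address read.
REPORT = """\
230306967 0x008BB92B read attempt to address 0x0BECC000
230288293 0x008BB955 read attempt to address 0x0B6D3000
230192144 0x008BB9A5 read attempt to address 0x0DD3F000
230089446 0x008BB97B read attempt to address 0x0C06A000
230018563 0x008BB92B read attempt to address 0x09D7A000
"""

# Assumed layout: "<result id> <fault address> read attempt to address <target>".
LINE_RE = re.compile(
    r"^(?P<result>\d+)\s+(?P<fault>0x[0-9A-Fa-f]+)\s+"
    r"read attempt to address\s+(?P<target>0x[0-9A-Fa-f]+)$"
)


def parse_report(text):
    """Return (result_id, fault_address, target_address) tuples as integers."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append(
                (int(match["result"]), int(match["fault"], 16), int(match["target"], 16))
            )
    return rows


rows = parse_report(REPORT)

# How often does each faulting address occur?
fault_counts = Counter(fault for _, fault, _ in rows)
for fault, count in sorted(fault_counts.items()):
    print(f"0x{fault:08X}: {count} crash(es)")

# Distance between the lowest and highest faulting address in this sample.
faults = [fault for _, fault, _ in rows]
print("spread in bytes:", max(faults) - min(faults))

# Are all the addresses being read multiples of 0x1000?
page_aligned = all(target % 0x1000 == 0 for _, _, target in rows)
print("all read addresses multiples of 0x1000:", page_aligned)
```

Pasting further lists into `REPORT` would show whether the same few faulting addresses keep recurring.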


ID: 59707 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59744 - Posted: 23 Feb 2009, 5:30:45 UTC

Only one today:

230534122 0x008BB955 read attempt to address 0x0C07C000
ID: 59744 · Rating: 0
TomaszPawel

Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 59759 - Posted: 23 Feb 2009, 15:19:16 UTC

Yeah, that is what I am getting too ... I keep forgetting to check
whether that pop-up locks the CPU, meaning that until you dismiss it you
have lost the CPU ... if you get it again, let us know ... I keep making a
mental note to check and keep forgetting ...

Same here: on my quad, 2 cores were doing nothing for 3 days until today, when I pressed "Don't send".

Here are the results:

https://boinc.bakerlab.org/rosetta/result.php?resultid=230037349

https://boinc.bakerlab.org/rosetta/result.php?resultid=230035649

https://boinc.bakerlab.org/rosetta/result.php?resultid=230035633

https://boinc.bakerlab.org/rosetta/result.php?resultid=230035617

so it is 2p64...
ID: 59759 · Rating: 0
Profile Greg_BE
Joined: 30 May 06
Posts: 5658
Credit: 5,670,291
RAC: 2,328
Message 59772 - Posted: 24 Feb 2009, 8:22:12 UTC

2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_18636_1
Exit status -1073741819 (0xc0000005)
CPU time 207.6563
# random seed: 2816709
Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x008BB955 read attempt to address 0x1383C000
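
For anyone wondering why the exit status appears both as -1073741819 and as 0xc0000005: they are the same 32-bit value, read once as a signed integer and once as unsigned. A minimal Python sketch of the conversion (the helper name `to_unsigned32` is made up for illustration):

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as its unsigned value."""
    return code & 0xFFFFFFFF


exit_code = -1073741819  # exit status reported for the task above
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
```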
ID: 59772 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59777 - Posted: 24 Feb 2009, 17:19:21 UTC

231014076 0x008BB97B read attempt to address 0x0A7ED000

ID: 59777 · Rating: 0
Yaroslav Isakov

Joined: 2 Nov 07
Posts: 11
Credit: 98,027
RAC: 0
Message 59789 - Posted: 25 Feb 2009, 2:01:35 UTC
Last modified: 25 Feb 2009, 2:03:05 UTC

Very strange result:
https://boinc.bakerlab.org/rosetta/result.php?resultid=230979688
and respective WU:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=208222634
Why does the canonical result have 108 decoys while mine has 12? And why did I get 'Workunit error - check skipped'? And why are there three results?
ID: 59789 · Rating: 0
Profile Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59792 - Posted: 25 Feb 2009, 5:38:30 UTC - in response to Message 59789.  

Very strange result:
https://boinc.bakerlab.org/rosetta/result.php?resultid=230979688
and respective WU:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=208222634
Why does the canonical result have 108 decoys while mine has 12? And why did I get 'Workunit error - check skipped'? And why are there three results?


The difference is in the:

# cpu_run_time_pref: 86400

yours is 10,800 ...

8 times more runtime gets more decoys ...
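
As a quick check of the numbers, using only the two runtime preferences and the two decoy counts quoted in this thread (the variable names are just for illustration):

```python
canonical_pref = 86400  # cpu_run_time_pref of the canonical result, in seconds
your_pref = 10800       # cpu_run_time_pref of the other result, in seconds

runtime_ratio = canonical_pref / your_pref
decoy_ratio = 108 / 12  # decoy counts quoted in the question

print(runtime_ratio)  # 8.0
print(decoy_ratio)    # 9.0
```

So the decoy count tracks the runtime roughly, not exactly, which is what you would expect if individual decoys do not all take the same time.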
ID: 59792 · Rating: 0
Yaroslav Isakov

Joined: 2 Nov 07
Posts: 11
Credit: 98,027
RAC: 0
Message 59794 - Posted: 25 Feb 2009, 10:32:10 UTC - in response to Message 59792.  

Very strange result:
https://boinc.bakerlab.org/rosetta/result.php?resultid=230979688
and respective WU:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=208222634
Why does the canonical result have 108 decoys while mine has 12? And why did I get 'Workunit error - check skipped'? And why are there three results?


The difference is in the:

# cpu_run_time_pref: 86400

yours is 10,800 ...

8 times more runtime gets more decoys ...


Ok, but why more runtime?
ID: 59794 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 59799 - Posted: 25 Feb 2009, 17:11:12 UTC
Last modified: 26 Feb 2009, 15:04:52 UTC

With Rosetta@home, you can configure a runtime preference via the "[Participants]" link at the top of this forum page; from there, click on the Rosetta preferences. Increasing the runtime means less bandwidth and fewer scheduler requests on the server; the tasks simply run more models until the target runtime is approached.
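
The run-until-target behaviour described above can be pictured with a small Python sketch. It is a simplification under stated assumptions: `count_models` is a hypothetical helper, the 800-second model time is invented for illustration, and the actual stopping rule inside the Rosetta application is not shown in this thread.

```python
from itertools import cycle


def count_models(target_runtime_s, model_durations_s):
    """Count how many models fit before the target runtime would be exceeded.

    Assumed rule (not taken from Rosetta): always run at least one model,
    then stop as soon as the next model would overshoot the target.
    """
    elapsed = 0.0
    models = 0
    for duration in cycle(model_durations_s):
        if models > 0 and elapsed + duration > target_runtime_s:
            break
        elapsed += duration
        models += 1
    return models, elapsed


# Invented per-model time of 800 seconds, purely for illustration.
short_run = count_models(10800, [800.0])
long_run = count_models(86400, [800.0])
print(short_run)
print(long_run)
```

The longer preference yields many more models from a single download and a single scheduler request, which is where the server-side savings come from.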

The particular WU you show suffers from the dreaded bug in the BOINC server code where a result is accepted after the deadline is reached but the task has already been reissued. There has been a BOINC trac item open to fix this for over a year.
http://boinc.berkeley.edu/trac/ticket/276
Rosetta Moderator: Mod.Sense
ID: 59799 · Rating: 0
Yaroslav Isakov

Joined: 2 Nov 07
Posts: 11
Credit: 98,027
RAC: 0
Message 59803 - Posted: 25 Feb 2009, 19:37:41 UTC - in response to Message 59799.  

With Rosetta@home, you can configure a runtime preference via the "[Participants]" link at the top of this forum page; from there, click on the Rosetta preferences. Increasing the runtime means less bandwidth and fewer scheduler requests on the server; the tasks simply run more models until the target runtime is approached.

The particular WU you show suffers from the dreaded bug in the BOINC server code where a result is accepted after the deadline is reached but the task has already been reissued. There has been a BOINC trac item open to fix this for over a year.
http://boinc.berkeley.edu/trac/ticket/276


Thank you for the explanation!
ID: 59803 · Rating: 0
Profile JStateson
Joined: 7 May 07
Posts: 15
Credit: 4,061,331
RAC: 0
Message 59810 - Posted: 26 Feb 2009, 8:21:01 UTC

2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_17767_0
got 5% done before faulting. I aborted it, as it was hung after the app faulted.

Faulting application rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, faulting module rosetta_beta_5.98_windows_intelx86.exe, version 0.0.0.0, fault address 0x0084fcf3.

====

I monitor my farm using BoincView, and when it shows a yellow background I know there is a problem. When I logged in remotely, I was greeted with a fault report and was asked if I wanted to report it to Microsoft.

There have been 3 separate Rosetta app faults since Feb 21. My event log goes back to 10/23/08 and there are no other app faults. Milkyway and Poem are the other 2 BOINC apps besides Rosetta.

Windows XP, SP3
BM 6.2.18
Dual MP 2800
ID: 59810 · Rating: 0
ramostol

Joined: 6 Feb 07
Posts: 64
Credit: 584,052
RAC: 0
Message 59990 - Posted: 6 Mar 2009, 10:41:15 UTC - in response to Message 59799.  


The particular WU you show suffers from the dreaded bug in the BOINC server code where a result is accepted after the deadline is reached but the task has already been reissued. There has been a BOINC trac item open to fix this for over a year.
http://boinc.berkeley.edu/trac/ticket/276


I am still of the opinion that this is not at all a bug to be dreaded. All parties involved still seem to be compensated with appropriate credits. Interested parties might like to have a look at the following WUs (while they last):

2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_8156

2 "Compute error", 1 subsequent success after "Too many error results Too many total results". Which somehow proves a point.


2p64__BOINC_SYMM_FOLD_AND_DOCK_RELAX-2p64_-native_frag2__7622_13799

1 "No reply", 1 "Compute error", then, after "Too many total results", 1 apparent success with no local error messages, but the result file shows the WU also erroring out for the third cruncher:
CPU time 11808.83
stderr out

<core_client_version>6.6.12</core_client_version> [[but computed with 6.6.11]]
<![CDATA[
<stderr_txt>
Rosetta@home Macintosh Stack Size checker.
Original size: 0.
Maximum size: 8388608.
RLIM_INFINITY 0
shell-init: could not get current directory: getcwd: cannot access parent directories: Permission denied [[normal error message in my situation]]
# cpu_run_time_pref: 21600
# random seed: 2821546
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -147.965 for 900 seconds
**********************************************************************
GZIP SILENT FILE: ./xx2p64.out

</stderr_txt>
]]>

Validate state Valid


ID: 59990 · Rating: 0
RamonS

Joined: 19 Jun 08
Posts: 3
Credit: 13,055,316
RAC: 8,110
Message 60018 - Posted: 7 Mar 2009, 23:30:00 UTC

Also now getting errors with rosetta_beta_5.98_windows_intelx86, but only after rebooting Windoze. Running BOINC on Server 2003 32bit. So far it crashed three times in a row and appears to have given up. Sent error reports to MS.
I looked at the reports, but the gibberish in there doesn't tell me a thing. Let me know if you need more specific info other than "it crashed and seems to be broken". :)
ID: 60018 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60021 - Posted: 8 Mar 2009, 3:08:03 UTC

RamonS, if you could post a link to the task that failed, that would be great.
Rosetta Moderator: Mod.Sense
ID: 60021 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org