Rosetta@home

Minirosetta v1.45 bug thread

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 57612 - Posted 5 Dec 2008 1:52:42 UTC

Please post bugs and issues regarding minirosetta version 1.45.

This update includes fixes for long runtimes in 'relax' jobs, validation errors, checkpoint recovery issues, and numerical instability in hydrogen-bond scoring.

We think we might have fixed the preemption problem, so please keep an eye out for this. The "can't acquire lockfile" issue might also be related. If you are having lockfile problems, please make sure there are no other BOINC applications running in the same slot. If necessary, turn off the client, make sure no BOINC apps are running, and then restart the client.

googloo

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,900,527
RAC: 7,817
Message 57625 - Posted 5 Dec 2008 14:52:06 UTC
Last modified: 5 Dec 2008 14:52:37 UTC

Please, please, please post new versions in Rosetta Application Version Release Log. That's how we get notified by email so that we can adjust our firewalls--it's very important.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 57626 - Posted 5 Dec 2008 16:02:43 UTC - in response to Message ID 57625.

Please, please, please post new versions in Rosetta Application Version Release Log. That's how we get notified by email so that we can adjust our firewalls--it's very important.


oh, sorry about that. will do.

ChiTownDale

Joined: Dec 10 05
Posts: 3
ID: 33928
Credit: 57,428
RAC: 0
Message 57639 - Posted 6 Dec 2008 1:11:01 UTC

This problem is getting very frustrating. I run Rosetta as well as nine other BOINC tasks, so I try to balance approximate compute times across all of them except the Climate Prediction task, which takes 2000 CPU minutes per task but allows two years to complete each task.
So when a Rosetta task runs away for 18-22 hours of CPU time, I end up aborting it, since it says a little more than 9 minutes are left out of the roughly 6 hours it was estimated to run.

All other BOINC-based projects are fairly accurate, and none have come close to 3-4 times the initial estimate as these have. Right now I have two Rosetta tasks running and both have hung, one at a little more than nine minutes remaining and the other at ten.
So until this problem is resolved I have no choice but to suspend all further Rosetta tasks. I feel bad about having to abort those 5-6 tasks previously, since that is over 100 hours of CPU time wasted, with two still suspended. That comes to over 160 hours of CPU time wasted, when that amount of time could have completed dozens of tasks for other projects. Even SETI hasn't had a task that took more than 40 hours, while the LHC@home tasks only take about two hours each, so I could have processed 80 of them with the wasted CPU time from Rosetta 1.40 tasks.
So I do hope they can fix what is wrong so I can get back to processing Rosetta tasks again.
____________

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 57640 - Posted 6 Dec 2008 1:13:49 UTC - in response to Message ID 57639.

This problem is getting very frustrating. I run Rosetta as well as nine other BOINC tasks, so I try to balance approximate compute times across all of them except the Climate Prediction task, which takes 2000 CPU minutes per task but allows two years to complete each task.
So when a Rosetta task runs away for 18-22 hours of CPU time, I end up aborting it, since it says a little more than 9 minutes are left out of the roughly 6 hours it was estimated to run.

All other BOINC-based projects are fairly accurate, and none have come close to 3-4 times the initial estimate as these have. Right now I have two Rosetta tasks running and both have hung, one at a little more than nine minutes remaining and the other at ten.
So until this problem is resolved I have no choice but to suspend all further Rosetta tasks. I feel bad about having to abort those 5-6 tasks previously, since that is over 100 hours of CPU time wasted, with two still suspended. That comes to over 160 hours of CPU time wasted, when that amount of time could have completed dozens of tasks for other projects. Even SETI hasn't had a task that took more than 40 hours, while the LHC@home tasks only take about two hours each, so I could have processed 80 of them with the wasted CPU time from Rosetta 1.40 tasks.
So I do hope they can fix what is wrong so I can get back to processing Rosetta tasks again.


do you still have the names of those problem tasks? can you try our recently updated version and see if you have the same problems?

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3378
ID: 106194
Credit: 0
RAC: 0
Message 57644 - Posted 6 Dec 2008 4:28:21 UTC
Last modified: 6 Dec 2008 4:31:32 UTC

Looks like they were all v1.40 tasks so far.
http://boinc.bakerlab.org/rosetta/results.php?hostid=812687

ChiTownDale have you seen problems like this with v1.45?? It includes changes that should eliminate the long running models, and unpredictable completion times.
____________
Rosetta Moderator: Mod.Sense

JChojnacki Profile

Joined: Sep 17 05
Posts: 71
ID: 105
Credit: 6,731,215
RAC: 1,607
Message 57645 - Posted 6 Dec 2008 6:10:02 UTC

This WU failed:
212238609
____________



Guus Gerritsen van der Hoop

Joined: Feb 7 06
Posts: 1
ID: 57292
Credit: 1,018,844
RAC: 0
Message 57646 - Posted 6 Dec 2008 8:38:58 UTC

I run Rosetta on two computers and seem to be unable to get new work due to the following error. What could be the problem?
Gus.

6-12-2008 8:48:22|rosetta@home|Sending scheduler request: To fetch work. Requesting 30240 seconds of work, reporting 0 completed tasks
6-12-2008 8:48:27|rosetta@home|Scheduler request succeeded: got 0 new tasks
6-12-2008 8:48:27|rosetta@home|Message from server: Server error: can't attach shared memory

____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57647 - Posted 6 Dec 2008 9:01:59 UTC - in response to Message ID 57646.

I run Rosetta on two computers and seem to be unable to get new work due to the following error. What could be the problem?
Gus.

6-12-2008 8:48:22|rosetta@home|Sending scheduler request: To fetch work. Requesting 30240 seconds of work, reporting 0 completed tasks
6-12-2008 8:48:27|rosetta@home|Scheduler request succeeded: got 0 new tasks
6-12-2008 8:48:27|rosetta@home|Message from server: Server error: can't attach shared memory



read this thread for more info.

rochester new york Profile

Joined: Jul 2 06
Posts: 2562
ID: 98229
Credit: 957,089
RAC: 119
Message 57648 - Posted 6 Dec 2008 12:40:04 UTC

problem here


http://boinc.bakerlab.org/rosetta/results.php?hostid=267483&offset=20

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57649 - Posted 6 Dec 2008 12:46:03 UTC - in response to Message ID 57648.

problem here


http://boinc.bakerlab.org/rosetta/results.php?hostid=267483&offset=20



these should be reported in the 1.40 thread. very odd, aborted and then detached and then completed ok on some of them. 3 different users.

RottenMutt

Joined: Jan 2 07
Posts: 2
ID: 139037
Credit: 249,397
RAC: 0
Message 57650 - Posted 6 Dec 2008 13:46:01 UTC - in response to Message ID 57649.
Last modified: 6 Dec 2008 13:51:15 UTC

lots of mini 1.45 compute errors here, with just a few successes.

no problems with beta 5.98.
____________

Mattia Verga

Joined: Jul 15 06
Posts: 3
ID: 100179
Credit: 124,357
RAC: 0
Message 57652 - Posted 6 Dec 2008 15:14:42 UTC

Error code 193 here:

212518199


____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57653 - Posted 6 Dec 2008 15:22:50 UTC - in response to Message ID 57650.

lots of mini 1.45 compute errors here, with just a few successes.

no problems with beta 5.98.


are you OC'd at all? if so try lowering your speed a bit.
I've found that some of these tasks are speed sensitive.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 57659 - Posted 6 Dec 2008 19:43:17 UTC

Task 212334548, workunit 193576676 failed on my iMac2 (10.4.11) after about half an hour. It seems to have been completed successfully by someone on an XP system.


<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
SIGBUS: bus error

Crashed executable name: minirosetta_1.45_i686-apple-darwin
built using BOINC library version 6.5.0
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.4.11 build 8S2167
Fri Dec 5 12:23:43 2008

Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00226273 __ZNK4core10kinematics8AtomTree17update_domain_mapERN9ObjexxFCL8FArray1DIiEERKNS_2id10AtomID_MapIbEESA_ + 41
1 ...etta_1.45_i686-apple-darwin 0x0001596f __ZNK4core12conformation12Conformation17update_domain_mapERN9ObjexxFCL8FArray1DIiEE + 403
2 ...etta_1.45_i686-apple-darwin 0x000830f3 __ZN4core4pose4Pose13scoring_beginEN7utility7pointer10owning_ptrINS_7scoring17ScoreFunctionInfoEEE + 1329
3 ...etta_1.45_i686-apple-darwin 0x000fa4e2 __ZNK4core7scoring13ScoreFunctionclERNS_4pose4PoseE + 4686
4 ...etta_1.45_i686-apple-darwin 0x001938b1 __ZNK9protocols8abinitio18AbrelaxApplication13process_decoyERN4core4pose4PoseERKNS2_7scoring13ScoreFunctionESsRNS2_2io6silent12SilentStructE + 35
5 ...etta_1.45_i686-apple-darwin 0x001afe27 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9651
6 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
7 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
8 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
9 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

Thread 1:
0 /usr/lib/libSystem.B.dylib 0x90037b57 _mach_wait_until + 7
1 /usr/lib/libSystem.B.dylib 0x9003799e _nanosleep + 398
2 /usr/lib/libSystem.B.dylib 0x9003a222 _usleep + 82
3 ...etta_1.45_i686-apple-darwin 0x00516bd1 __Z11boinc_sleepd + 197
4 ...etta_1.45_i686-apple-darwin 0x001f8583 __Z12timer_threadPv + 77
5 /usr/lib/libSystem.B.dylib 0x90024227 __pthread_body + 84

Thread 2:
0 /usr/lib/libSystem.B.dylib 0x90037b57 _mach_wait_until + 7
1 /usr/lib/libSystem.B.dylib 0x9003799e _nanosleep + 398
2 /usr/lib/libSystem.B.dylib 0x900377d9 _sleep + 121
3 ...etta_1.45_i686-apple-darwin 0x0051f4c0 __ZN9protocols5boinc8watchdog13main_watchdogEPv + 548
4 /usr/lib/libSystem.B.dylib 0x90024227 __pthread_body + 84

Thread 0 crashed with X86 Thread State (32-bit):
eax: 0x00000000 ebx: 0x00000000 ecx: 0x00000000 edx: 0x00000000
edi: 0x00000000 esi: 0x00000000 ebp: 0xbfffb0a8 esp: 0x00000000
ss: 0x00000000 efl: 0x00000000 eip: 0x0096e325 cs: 0x00000000
ds: 0x00000000 es: 0x00000000 fs: 0x00000000 gs: 0x00000000

Binary Images Description:
0x1000 - 0x12b2fff /Library/Application Support/BOINC Data/slots/0/../../projects/boinc.bakerlab.org_rosetta/minirosetta_1.45_i686-apple-darwin
0x162c000 - 0x170afff /usr/lib/libxml2.2.dylib
0x90000000 - 0x90171fff /usr/lib/libSystem.B.dylib
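
For anyone trying to read the mangled names in the "Thread 0 Crashed" frames above, here is a minimal sketch that pulls out just the namespace, class, and function name. It is an illustration only, assuming the symbols follow the usual Itanium C++ ABI length-prefixed scheme with the extra leading underscore that Mac OS X adds; the helper name `qualified_name` is made up for this example, and it deliberately ignores argument types, templates, and operators. A real demangler should be preferred for anything beyond a quick look.

```python
import re


def qualified_name(symbol: str) -> str:
    """Return the scope::name part of a mangled C++ symbol (simplified sketch).

    Only the leading run of length-prefixed identifiers is decoded; argument
    types, template arguments, operators, and substitutions are ignored.
    """
    # Mac OS X prepends one extra underscore: "__Z..." becomes "_Z...".
    s = symbol[1:] if symbol.startswith("__Z") else symbol
    if not s.startswith("_Z"):
        return symbol  # not a mangled C++ name, e.g. "_main"

    pos = 2
    if s[pos:pos + 1] == "N":  # "N" opens a nested (namespaced) name
        pos += 1
        # Skip cv-qualifiers on the member function ("K" means const).
        while s[pos:pos + 1] in ("r", "V", "K"):
            pos += 1

    parts = []
    while True:
        match = re.match(r"\d+", s[pos:])
        if not match:
            break  # reached "E", an operator code, or the argument list
        length = int(match.group())
        start = pos + match.end()
        parts.append(s[start:start + length])
        pos = start + length

    return "::".join(parts) if parts else symbol


# Symbols copied from the crash report in this post.
for sym in (
    "__ZNK4core10kinematics8AtomTree17update_domain_mapERN9ObjexxFCL8FArray1DIiEERKNS_2id10AtomID_MapIbEESA_",
    "__ZN9protocols8abinitio18AbrelaxApplication4foldEv",
    "__Z11boinc_sleepd",
):
    print(qualified_name(sym))
```

Read this way, the top frame is a const member function named update_domain_map in core::kinematics::AtomTree, called during scoring from the abrelax fold step.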

____________

JChojnacki Profile

Joined: Sep 17 05
Posts: 71
ID: 105
Credit: 6,731,215
RAC: 1,607
Message 57662 - Posted 6 Dec 2008 22:42:51 UTC

This WU failed with exit code -1073741819 (0xc0000005)

212340099

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57664 - Posted 7 Dec 2008 4:25:16 UTC

Promising results. All the silly warning messages have disappeared for me. No NAN hbonding errors either (yet?). Where I used to have 2 out of 5 WUs crash out with the error "Can't acquire lockfile" it's dropped to 2 out of 7 crashing out for that reason (over my first 21 results only - may not turn out to be representative).

In addition, similar to RottenMutt and JChojnacki, task 212336883 errored out with:

Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

This is the same exit code I reported under the Mini 1.34 thread here but with an access violation at a different address this time.

Hope that helps.
____________

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57665 - Posted 7 Dec 2008 4:59:43 UTC

I'm getting a bunch of errors from cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212352758
http://boinc.bakerlab.org/rosetta/result.php?resultid=212299725
http://boinc.bakerlab.org/rosetta/result.php?resultid=212268137
http://boinc.bakerlab.org/rosetta/result.php?resultid=212215308
http://boinc.bakerlab.org/rosetta/result.php?resultid=212192548

After running a while, the WUs exit with code 193 and a stack trace.

Note that this is on 4 different Linux nodes, (all of which were running well with version 1.40, except for the NANs problem).
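
For readers comparing the different exit codes reported in this thread, the sketch below shows how the signed and hexadecimal forms relate. It only reinterprets the numbers quoted in the posts (-1073741819 and 0xc0000005 on Windows; 193, 0xc1 and -63 on Mac and Linux) as two's-complement values; the helper names are invented for this example, and it says nothing about what caused the crashes.

```python
def as_unsigned(code: int, bits: int = 32) -> int:
    """Reinterpret a signed exit code as an unsigned value of the given width."""
    return code & ((1 << bits) - 1)


def as_signed(code: int, bits: int) -> int:
    """Reinterpret an unsigned value as a two's-complement signed integer."""
    code &= (1 << bits) - 1
    return code - (1 << bits) if code >= 1 << (bits - 1) else code


# Windows reports the exit status as a signed 32-bit number.
windows_hex = hex(as_unsigned(-1073741819))

# The "process exited with code 193 (0xc1, -63)" line shows the same
# byte three ways: decimal, hex, and signed 8-bit.
unix_hex = hex(193)
unix_signed = as_signed(193, 8)

print(windows_hex)
print(unix_hex, unix_signed)
```

So the Windows reports of -1073741819 and 0xc0000005 are the same status written two ways, and likewise for 193, 0xc1 and -63.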

David Ball

Joined: Nov 25 05
Posts: 25
ID: 19653
Credit: 1,270,528
RAC: 0
Message 57668 - Posted 7 Dec 2008 8:35:17 UTC

Vista 64 bit on stock HP machine with Q6600 CPU and 5 GB memory - no OC
BOINC 6.2.19
App: Mini 1.45

Name cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_12341_1

Ran for around 4 hours and exited with
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Stack trace is in the result

http://boinc.bakerlab.org/rosetta/result.php?resultid=212406604
____________
Have you read a good Science Fiction book lately?

Rifleman

Joined: Nov 19 08
Posts: 17
ID: 288725
Credit: 139,408
RAC: 0
Message 57670 - Posted 7 Dec 2008 8:58:23 UTC

I have had 3 WUs error out on me but seems to be much more stable than it was:
http://boinc.bakerlab.org/rosetta/result.php?resultid=212602945
Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 948562
Report deadline 16 Dec 2008 20:17:24 UTC
CPU time 18577.93
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x007FA877 read attempt to address 0x1F59DCA6

Engaging BOINC Windows Runtime Debugger...



********************



http://boinc.bakerlab.org/rosetta/result.php?resultid=212495875
Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 948562
Report deadline 16 Dec 2008 10:40:05 UTC
CPU time 6441.172
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...



http://boinc.bakerlab.org/rosetta/result.php?resultid=212434493
Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 948562
Report deadline 16 Dec 2008 4:04:00 UTC
CPU time 13200.43
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 28800


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...



********************




xsc2

Joined: Jul 9 08
Posts: 4
ID: 267987
Credit: 62,354
RAC: 0
Message 57677 - Posted 7 Dec 2008 13:21:37 UTC

Exit code: -1073741819 (0xc0000005)

http://boinc.bakerlab.org/rosetta/result.php?resultid=212596936

googloo

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,900,527
RAC: 7,817
Message 57680 - Posted 7 Dec 2008 15:29:44 UTC

Task ID 212423733
Name cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_13138_0
Workunit 193652172

Validate state Invalid
Claimed credit 14.8874783714289
Granted credit 0
application version 1.45

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57681 - Posted 7 Dec 2008 15:43:38 UTC - in response to Message ID 57680.

Task ID 212423733
Name cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_13138_0
Workunit 193652172

Validate state Invalid
Claimed credit 14.8874783714289
Granted credit 0
application version 1.45

---
here is the link to his task:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212423733
another (0xc0000005) error

guhungry

Joined: Dec 1 08
Posts: 1
ID: 290354
Credit: 620,505
RAC: 0
Message 57683 - Posted 7 Dec 2008 17:04:28 UTC
Last modified: 7 Dec 2008 17:08:14 UTC

I have a lot of them, and all the errors I looked at returned exit code -1073741819 (0xc0000005) from cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs tasks.
---------------------------------------------
212713228
212673521
212467256
212463757
212356024
212292194
212243396
212205055
212172410
212319134
212285788
212244308
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57685 - Posted 7 Dec 2008 19:10:43 UTC

Another (0xc0000005) error:

cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_nsp1_olange_5389_30836_0

http://boinc.bakerlab.org/rosetta/result.php?resultid=212733641

Is there a problem with the cs_vanilla workunits?

I notice that this is one of the first two 1.45 workunits I've seen running on my dual-core machine at the same time - is there some problem with that and my memory size (2 GB total)?

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57686 - Posted 7 Dec 2008 19:11:26 UTC - in response to Message ID 57664.
Last modified: 7 Dec 2008 19:17:22 UTC

Promising results. All the silly warning messages have disappeared for me. No NAN hbonding errors either (yet?). Where I used to have 2 out of 5 WUs crash out with the error "Can't acquire lockfile" it's dropped to 2 out of 7 crashing out for that reason (over my first 21 results only - may not turn out to be representative).

9 out of the next 11 were successful too, making 24 good out of 32, which is the best performance I've had for a very long time. Combined with a continuing 100% record on Beta 5.98s (much more coming through recently) I'm officially happier and less frustrated.

My 5th best day ever!

Not perfect yet, but just reporting some better news instead of constant misery. You must be working on the right lines. Keep it up!
____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 57687 - Posted 7 Dec 2008 19:47:52 UTC

Another crash on Mac OSX 10.4.11. Task 212684901: Workunit 193888088

Same area of code as before (update_domain_map)

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 14400
SIGBUS: bus error

Crashed executable name: minirosetta_1.45_i686-apple-darwin
built using BOINC library version 6.5.0
Machine type Intel 80486 (32-bit executable)
System version: Macintosh OS 10.4.11 build 8S2167
Sun Dec 7 06:56:14 2008

Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00226273 __ZNK4core10kinematics8AtomTree17update_domain_mapERN9ObjexxFCL8FArray1DIiEERKNS_2id10AtomID_MapIbEESA_ + 41
1 ...etta_1.45_i686-apple-darwin 0x0001596f __ZNK4core12conformation12Conformation17update_domain_mapERN9ObjexxFCL8FArray1DIiEE + 403
2 ...etta_1.45_i686-apple-darwin 0x000830f3 __ZN4core4pose4Pose13scoring_beginEN7utility7pointer10owning_ptrINS_7scoring17ScoreFunctionInfoEEE + 1329
3 ...etta_1.45_i686-apple-darwin 0x000fa4e2 __ZNK4core7scoring13ScoreFunctionclERNS_4pose4PoseE + 4686
4 ...etta_1.45_i686-apple-darwin 0x001938b1 __ZNK9protocols8abinitio18AbrelaxApplication13process_decoyERN4core4pose4PoseERKNS2_7scoring13ScoreFunctionESsRNS2_2io6silent12SilentStructE + 35
5 ...etta_1.45_i686-apple-darwin 0x001afe27 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9651
6 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
7 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
8 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
9 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

etc.


____________

steve Profile

Joined: Nov 27 08
Posts: 7
ID: 289641
Credit: 1,085
RAC: 0
Message 57690 - Posted 7 Dec 2008 22:41:01 UTC - in response to Message ID 57612.

David,

I just received an error during this file analysis:

Time of DownLoad:
12/7/2008 12:30:34 PM|rosetta@home|Starting task fixed_bb_hb_rlbd_1tig_IGNORE_THE_REST_DECOY_5470_34_0 using minirosetta version 145

Time of Error
12/7/2008 3:31:37 PM|rosetta@home|Started upload of fixed_bb_hb_rlbd_1tig_IGNORE_THE_REST_DECOY_5470_34_0_0

Error Message
"Could not write to a specified memory location". The message asked me if I wanted to DeBug.. I pressed cancel.

Time of Upload:
The file was uploaded to your server as:
12/7/2008 3:31:43 PM|rosetta@home|Finished upload of fixed_bb_hb_rlbd_1tig_IGNORE_THE_REST_DECOY_5470_34_0_0


Steve


Please post bugs and issues regarding minirosetta version 1.45.

This update includes fixes for long runtimes in 'relax' jobs, validation errors, checkpoint recovery issues, and numerical instability in hydrogen-bond scoring.

We think we might have fixed the preemption problem, so please keep an eye out for this. The "can't acquire lockfile" issue might also be related. If you are having lockfile problems, please make sure there are no other BOINC applications running in the same slot. If necessary, turn off the client, make sure no BOINC apps are running, and then restart the client.

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 57697 - Posted 8 Dec 2008 4:41:46 UTC

Hi.

This one broke after 2hrs, 44min.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=194032196

Mon 08 Dec 2008 15:03:25 EST|rosetta@home|Output file cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_flua_olange_5385_36439_0_0 for task cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_flua_olange_5385_36439_0 absent

pete.

____________


AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57699 - Posted 8 Dec 2008 4:56:26 UTC

Here's some more bad cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212592040
http://boinc.bakerlab.org/rosetta/result.php?resultid=212475523
http://boinc.bakerlab.org/rosetta/result.php?resultid=212454329
http://boinc.bakerlab.org/rosetta/result.php?resultid=212415902
http://boinc.bakerlab.org/rosetta/result.php?resultid=212349479
http://boinc.bakerlab.org/rosetta/result.php?resultid=212298709
http://boinc.bakerlab.org/rosetta/result.php?resultid=212268849
http://boinc.bakerlab.org/rosetta/result.php?resultid=212260558

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57700 - Posted 8 Dec 2008 5:28:43 UTC - in response to Message ID 57699.

Here's some more bad cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212592040
http://boinc.bakerlab.org/rosetta/result.php?resultid=212475523
http://boinc.bakerlab.org/rosetta/result.php?resultid=212454329
http://boinc.bakerlab.org/rosetta/result.php?resultid=212415902
http://boinc.bakerlab.org/rosetta/result.php?resultid=212349479
http://boinc.bakerlab.org/rosetta/result.php?resultid=212298709
http://boinc.bakerlab.org/rosetta/result.php?resultid=212268849
http://boinc.bakerlab.org/rosetta/result.php?resultid=212260558


Makes me suspect that at least one of the following is true:

1. cs_vanilla workunits are a high fraction of the workunits now going out.

2. The cs_vanilla workunits are using a new feature of 1.45 that hasn't been adequately tested for its ability to finish properly.

AdeB Profile

Joined: Dec 12 06
Posts: 45
ID: 135244
Credit: 2,342,814
RAC: 1,852
Message 57702 - Posted 8 Dec 2008 16:36:54 UTC

ERROR: Illegal value for integer option -run:jran specified:

in workunit 1g73A_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1g73A-_5476_258_1

AdeB
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57703 - Posted 8 Dec 2008 16:49:24 UTC - in response to Message ID 57686.

Promising results. All the silly warning messages have disappeared for me. No NAN hbonding errors either (yet?). Where I used to have 2 out of 5 WUs crash out with the error "Can't acquire lockfile" it's dropped to 2 out of 7 crashing out for that reason (over my first 21 results only - may not turn out to be representative).

9 out of the next 11 were successful too, making 24 good out of 32, which is the best performance I've had for a very long time. Combined with a continuing 100% record on Beta 5.98s (much more coming through recently) I'm officially happier and less frustrated.

And now 16 more successes out of 18 making 40 out of 50. Most errors came early, so I'm now confident enough to up my run-time from 2 to 3 hours again.

Good work on this "Can't acquire lockfile" problem. I'm just going to tidy up the lockfiles, reboot and see if the good results continue.

Efforts much appreciated here. Let's see if it can be nailed in the next update.
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57704 - Posted 8 Dec 2008 17:42:41 UTC - in response to Message ID 57703.

I'm just going to tidy up the lockfiles, reboot and see if the good results continue.

On this, when trying to stop the BOINC service I wasn't allowed to until I'd ended the boinc.exe client process under User Name boinc_master (Vista64 OS quad-core AMD Phenom).

In the Task Manager I 'showed processes for all users' to do this and saw that 2 rosetta_beta_5.98_windows_x86_64.exe*32 processes were still running (correct) but also about 20 minirosetta_1.4x_windows_x86_64.exe*32 processes were running. About half of those were for MiniRosetta 1.40 and the other half for v1.45. All under the User Name boinc_project. There should just have been 2 for v1.45.

My last Mini 1.40 WU was completed late last Friday, so these 10-ish 1.40 processes have persisted for 3 days (no re-boots in that time).

I manually ended all these processes.

Going to the C:\ProgramData\BOINC\slots folder, there were 23 folders (numbered from 0 to 22), the first 19 of which contained a 0-byte boinc_lockfile file and a stderr.txt and a stdout.txt file. The other 4 folders contained the files I'd expect for running processes.

I deleted all the boinc_lockfile files and re-booted. On start-up, the first 19 folders had been removed, leaving the 4 active ones.

I'm no programmer and may be talking out of my hat, but could these old processes still running have something to do with being unable to acquire boinc_lockfile ?

When there's a Compute Error is there some fault in the way the process closes down and releases the files it holds open? Or could it be to do with the user (me) aborting a WU manually when I see it's stalled and producing error messages?

I'm guessing wildly, obviously, but hopefully this means something more sensible to you clever chaps. You seem to be close to some solutions for a problem that's persisted over several versions, so maybe this is the final clue you need? Hope it helps.
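
The manual check described in this post (looking through the slots folder for directories that hold nothing but a boinc_lockfile, stderr.txt and stdout.txt) could be sketched roughly as follows. This is only an illustration under assumptions: the function name `leftover_slots` is invented, the "only these three files" test is a heuristic taken from the description above rather than anything BOINC documents, and the script only reports candidates without deleting anything. The BOINC client should be stopped before acting on the result.

```python
from pathlib import Path

# Files found in the abandoned slot folders described in the post.
LEFTOVER_FILES = {"boinc_lockfile", "stderr.txt", "stdout.txt"}


def leftover_slots(slots_root):
    """List slot folder names that hold a lockfile and nothing else of note.

    Heuristic (an assumption, not BOINC-documented behaviour): a slot is
    treated as leftover when it contains boinc_lockfile and every file in
    it is one of LEFTOVER_FILES. Nothing is modified or deleted.
    """
    leftovers = []
    for slot in sorted(Path(slots_root).iterdir()):
        if not slot.is_dir():
            continue
        names = {entry.name for entry in slot.iterdir()}
        if "boinc_lockfile" in names and names <= LEFTOVER_FILES:
            leftovers.append(slot.name)
    return leftovers


# Example call, using the folder mentioned in the post:
#   print(leftover_slots(r"C:\ProgramData\BOINC\slots"))
```

Reporting rather than deleting keeps the decision with the user, which seems the safer choice while the cause of the orphaned processes is still unknown.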
____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 57706 - Posted 8 Dec 2008 18:50:14 UTC

Some more workunits failing on Mac OS X 10.4.11

Task 212933060 : Workunit 194105301
Task 212892168 : Workunit 194071153
(names **_ZNMP_RELAX_**)

both failing at startup

ERROR: Illegal value for integer option -run:jran specified:

Also Task 212828576 : Workunit 194016740 (cs_vanilla_* again) failing halfway through in Update_domain_map

Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00226273 __ZNK4core10kinematics8AtomTree17update_domain_mapERN9ObjexxFCL8FArray1DIiEERKNS_2id10AtomID_MapIbEESA_ + 41
1 ...etta_1.45_i686-apple-darwin 0x0001596f __ZNK4core12conformation12Conformation17update_domain_mapERN9ObjexxFCL8FArray1DIiEE + 403

etc.


____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 57708 - Posted 8 Dec 2008 20:43:04 UTC

Hi, me again.

This one crashed overnight after 5hrs, 30min. not good.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=194070527

Mon 08 Dec 2008 20:55:01 EST|rosetta@home|Output file cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_nsp1_olange_5389_38205_0_0 for task cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_nsp1_olange_5389_38205_0 absent


<core_client_version>6.2.14</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x8b883ab]
[0x8bb211c]
[0xffffe500]
[0x85d2f9a]
[0x85b766e]
[0x83efc90]
[0x811a22a]
[0x812a216]
[0x812be61]
[0x804b884]
[0x8c0dc1c]
[0x8048111]

Exiting...

</stderr_txt>

pete.

____________


Path7

Joined: Aug 25 07
Posts: 128
ID: 201002
Credit: 61,751
RAC: 0
Message 57709 - Posted 8 Dec 2008 20:49:43 UTC

Running Windows XP Home, I found 2 WUs with this error:
ERROR: Illegal value for integer option -run:jran specified:

1wjdA_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1wjdA-_5475_616_0
1dsvA_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1dsvA-_5475_81_0

The first WU also had the same error on the second run.

Have a nice day,
Path7.

stewjack

Joined: Apr 23 06
Posts: 39
ID: 78784
Credit: 95,871
RAC: 0
Message 57712 - Posted 8 Dec 2008 21:10:06 UTC
Last modified: 8 Dec 2008 21:15:11 UTC

This WU was crunched under v.1.45 and exited with an error.

========
Task ID 212609043
Name: loopbuild_reference_hombench_loopbuild_t327__IGNORE_THE_REST_1XMAA_2_5453_3_0

Workunit 193817482

http://boinc.bakerlab.org/rosetta/result.php?resultid=212609043

==== stderr out ======
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x007FA87A read attempt to address 0x000002E8


Jack
____________

Chu

Joined: Feb 23 06
Posts: 120
ID: 61076
Credit: 112,439
RAC: 0
Message 57714 - Posted 8 Dec 2008 21:24:12 UTC

Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away. In fact, the same workunits returned with a very good success rate on the testing server. We are investigating why the alpha testing server did not catch such errors in the first place. This is another unfortunate incident, and it will be a new lesson for us. Sorry for any inconvenience this has caused you.
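
To illustrate how a length limit can produce that particular error, here is a small sketch. Everything in it apart from the `-run:jran` flag name is hypothetical: the other option names, the values, the cut point, and the toy parser are invented for the example and are not the real BOINC limit or the real minirosetta option parser. It just shows that cutting a command line right after a flag name leaves that flag with an empty value.

```python
def truncate_command_line(args, max_chars):
    """Join the arguments with spaces and cut the result at max_chars."""
    return " ".join(args)[:max_chars]


def parse_options(command_line):
    """Pair each '-flag' token with the token after it, or '' if none follows."""
    tokens = command_line.split()
    options = {}
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("-"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            options[tokens[i]] = tokens[i + 1] if has_value else ""
            i += 2 if has_value else 1
        else:
            i += 1
    return options


# Hypothetical arguments; only the -run:jran flag name comes from the thread.
args = ["-example:input", "example.dat", "-example:count", "10", "-run:jran", "1234567"]
full = " ".join(args)

# Hypothetical limit chosen so the cut falls right after the flag name.
limit = full.index("-run:jran") + len("-run:jran")
cut = truncate_command_line(args, limit)

print(parse_options(full)["-run:jran"])        # value survives on the full line
print(repr(parse_options(cut)["-run:jran"]))   # value is lost after truncation
```

An empty value for an integer option is consistent with the error text ending in "specified: " followed by nothing, although the actual cut point in the failing workunits is not known from this thread.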
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57721 - Posted 9 Dec 2008 0:23:12 UTC - in response to Message ID 57699.

Here are some more bad cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212592040
http://boinc.bakerlab.org/rosetta/result.php?resultid=212475523
http://boinc.bakerlab.org/rosetta/result.php?resultid=212454329
http://boinc.bakerlab.org/rosetta/result.php?resultid=212415902
http://boinc.bakerlab.org/rosetta/result.php?resultid=212349479
http://boinc.bakerlab.org/rosetta/result.php?resultid=212298709
http://boinc.bakerlab.org/rosetta/result.php?resultid=212268849
http://boinc.bakerlab.org/rosetta/result.php?resultid=212260558


I notice that for most of those workunits, your wingman returned an error on the same workunit, so I'd suspect problems built into either the cs_vanilla workunits or a new feature of minirosetta that few other workunits have used before.

In those cases where your wingman completed the workunit without a problem, it was on an Intel Xeon CPU with a lot more memory. This leads me to believe that cs_vanilla workunits need a lot more memory than your computer has, and suggests that your rather old version of BOINC (5.2.13) may not be sending the information needed to choose only workunits that will work with your memory size. The version of BOINC used where those workunits were successful was 6.2.19.

My computer is in between: it runs BOINC 5.10.45 with a total of 2 GB of RAM, and the only cs_vanilla workunit I've had so far ran longer than most of yours, but still failed.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57723 - Posted 9 Dec 2008 0:49:01 UTC - in response to Message ID 57714.

Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away.

Thanks. Just about to report 3 of them. Aborted another in advance.

Another cs_vanilla WU went down though:
212913623
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000
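
For anyone collecting these reports, a small sketch that pulls the addresses out of a "Reason:" line and flags reads from very low addresses, which usually point to a null pointer or a small offset from one. The function name and the 64 KiB "near null" cut-off are assumptions made for this example, not part of BOINC or minirosetta.

```python
import re

# Matches the "Reason:" line format seen in the stderr output in this thread.
REASON_RE = re.compile(
    r"Access Violation \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+) "
    r"(read|write) attempt to address (0x[0-9a-fA-F]+)"
)

# Assumption: treat any target below 64 KiB as "near null".
NEAR_NULL_LIMIT = 0x10000


def parse_access_violation(line):
    """Return the parsed fields of an access-violation line, or None if no match."""
    match = REASON_RE.search(line)
    if match is None:
        return None
    code, instruction, access, target = match.groups()
    target_value = int(target, 16)
    return {
        "code": int(code, 16),
        "instruction": int(instruction, 16),
        "access": access,
        "target": target_value,
        "near_null": target_value < NEAR_NULL_LIMIT,
    }


lines = [
    "Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000",
    "Reason: Access Violation (0xc0000005) at address 0x007FA87A read attempt to address 0x000002E8",
]
for line in lines:
    print(parse_access_violation(line))
```

Both lines are copied from reports in this thread. Grouping results by the "instruction" value would show whether different machines crash at the same place in the executable.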

____________

Jim_Clark Profile
Avatar

Joined: Sep 11 07
Posts: 7
ID: 204423
Credit: 38,439
RAC: 0
Message 57725 - Posted 9 Dec 2008 1:50:05 UTC
Last modified: 9 Dec 2008 2:03:31 UTC

I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU.

I run many programs on my computer, even new programs that may need debugging; and Minirosetta is the only one that crashes or hangs my computer.

Here are my stats for the last 100 Rosetta WUs:
2 new Rosetta Beta WUs (anticipate success)
2 done (OK) Rosetta Beta WUs
8 failed Rosetta Mini WUs, most 1.40, some 1.45
80 aborted Rosetta Mini WUs, most 1.40, some 1.45

So I waste my time with 88 Rosetta Mini WUs, for 4 Rosetta Beta WUs that are good. (22 to 1)

Another computer in my house, an XP HE with 1/5th the horsepower, can sometimes compute Minirosetta WUs. Its stats are:

5 done (OK) Rosetta Mini WUs, most 1.40
3 failed Rosetta Mini WUs, most 1.40

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57726 - Posted 9 Dec 2008 2:02:21 UTC - in response to Message ID 57725.
Last modified: 9 Dec 2008 2:06:51 UTC

I haven't gotten Minirosetta to run successfully on my XP Pro computer since its inception. Every time there is a new version of Minirosetta or BOINC or any other change that might affect the success of Minirosetta, I give it a try. The rest of the time I abort Minirosetta until I get a Rosetta Beta WU.

Here are my stats for the last 100 Rosetta WUs:
2 new Rosetta Beta WUs (anticipate success)
2 done (OK) Rosetta Beta WUs
8 failed Rosetta Mini WUs, most 1.40, some 1.45
80 aborted Rosetta Mini WUs, most 1.40, some 1.45

So I waste my time with 88 Rosetta Mini WUs, for 4 Rosetta Beta WUs that are good. (22 to 1)

Another computer in my house, an XP HE with 1/5th the horsepower, can sometimes compute Minirosetta WUs. Its stats are:

5 done (OK) Rosetta Mini WUs, most 1.40
3 failed Rosetta Mini WUs, most 1.40




How much RAM does each of those computers have, and how many CPU cores does each of them have? minirosetta is now memory-hungry enough that the answers make a big difference.

Also, have you tried a 1.45 workunit with a name which doesn't start with cs_vanilla? Those workunits have been especially troublesome lately.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 57727 - Posted 9 Dec 2008 2:05:00 UTC

Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708.

Failed in a different area this time.

If it's a BOINC version issue as suggested previously, this is bad news for me as the latest Mac version (which I'm running) is only 6.2.18


Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00081e9c __ZNK4core4pose4Pose11is_fullatomEv + 154
1 ...etta_1.45_i686-apple-darwin 0x001afdf8 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9604
2 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
3 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
4 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
5 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

etc.
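
The frame names in that trace are mangled C++ symbols. As a reading aid, here is a minimal sketch that decodes only the simplest form (nested names, no arguments), which is enough for the frames shown above. It is a toy written for this example, not a full demangler; a tool such as c++filt handles the general case.

```python
def demangle_simple(symbol):
    """Decode symbols of the form _ZN[K]<len><name>...Ev into a::b::c() [const]."""
    s = symbol.lstrip("_")  # the Mac trace shows an extra leading underscore
    if not s.startswith("ZN"):
        raise ValueError("not a nested C++ name")
    pos = 2
    is_const = s[pos] == "K"  # K marks a const member function
    if is_const:
        pos += 1
    parts = []
    while s[pos] != "E":  # E closes the list of nested names
        start = pos
        while s[pos].isdigit():
            pos += 1
        if start == pos:
            raise ValueError("expected a length prefix")
        length = int(s[start:pos])
        parts.append(s[pos:pos + length])
        pos += length
    if s[pos + 1:] != "v":  # v means the function takes no arguments
        raise ValueError("only no-argument functions are handled")
    return "::".join(parts) + "()" + (" const" if is_const else "")


for name in (
    "__ZNK4core4pose4Pose11is_fullatomEv",
    "__ZN9protocols8abinitio18AbrelaxApplication4foldEv",
    "__ZN9protocols8abinitio18AbrelaxApplication3runEv",
):
    print(demangle_simple(name))
```

Read this way, the trace suggests the crash happened inside Pose::is_fullatom, called from AbrelaxApplication::fold, which was called from AbrelaxApplication::run.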
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57729 - Posted 9 Dec 2008 2:43:07 UTC - in response to Message ID 57727.

Another cs_vanilla_* task failed on me. Task 213050853 : Workunit 194202708.

Failed in a different area this time.

If it's a BOINC version issue as suggested previously, this is bad news for me as the latest Mac version (which I'm running) is only 6.2.18


Thread 0 Crashed:
0 ...etta_1.45_i686-apple-darwin 0x00081e9c __ZNK4core4pose4Pose11is_fullatomEv + 154
1 ...etta_1.45_i686-apple-darwin 0x001afdf8 __ZN9protocols8abinitio18AbrelaxApplication4foldEv + 9604
2 ...etta_1.45_i686-apple-darwin 0x001b5381 __ZN9protocols8abinitio18AbrelaxApplication3runEv + 881
3 ...etta_1.45_i686-apple-darwin 0x00009a87 _main + 3941
4 ...etta_1.45_i686-apple-darwin 0x0000292e __start + 216
5 ...etta_1.45_i686-apple-darwin 0x00002855 start + 41

etc.


What's the total amount of RAM on your machine? I just looked over many of the cs_vanilla workunits for which enough information was posted in this thread to find them, and found the following:

Most of that type of workunit that ran under BOINC 6.2.19 on a machine with at least 4 GB of memory succeeded.

Perhaps half of those under BOINC 6.2.19 and 3 GB succeeded.

Most of those with BOINC 6.2.14 or older failed.

Most of those with 2 GB or less failed.

I didn't find enough under BOINC 6.2.18 to be sure, but perhaps half of those I saw succeeded.

Most of the workunits with _ZN_ABRELAX_ in the workunit name have problems; see the earlier message about them.

I saw a lot fewer failures for workunits with different types of names.

Naturally, what I was able to find was probably biased by the fact that people aren't likely to post enough information about workunits that don't fail on at least one machine for me to be able to find them.

I suspect that we need a new system requirements evaluation specific to workunits that use the same features as the cs_vanilla workunits.

Jim_Clark Profile
Avatar

Joined: Sep 11 07
Posts: 7
ID: 204423
Credit: 38,439
RAC: 0
Message 57730 - Posted 9 Dec 2008 2:49:05 UTC - in response to Message ID 57726.

How much RAM does each of those computers have, and how many CPU cores does each of them have? minirosetta is now memory-hungry enough that the answers make a big difference.

Also, have you tried a 1.45 workunit with a name which doesn't start with cs_vanilla? Those workunits have been especially troublesome lately.


The first one I spoke of has two cores, and -
Memory: 1.94 GB physical, 3.78 GB virtual

The second one has one core, and -
Memory: 1.02 GB physical, 3.91 GB virtual

I'll try a non-vanilla workunit. Do they have chocolate?

If memory is a critical issue, why can't Minirosetta check available memory at the start, and quit right away if memory is too scarce?
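
A rough sketch of what such a start-up check could look like. The required-memory figure is a made-up placeholder, not a real minirosetta requirement, and the function names are invented for this example. Note that it compares against total physical memory, which is only a stand-in for memory that is actually free.

```python
import os

GIB = 2**30

# Hypothetical requirement, chosen only for illustration.
REQUIRED_BYTES = int(1.5 * GIB)


def physical_memory_bytes():
    """Total physical memory, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are missing elsewhere.
        return None


def startup_check(required_bytes, available_bytes=None):
    """Return 'run', 'quit', or 'unknown' when memory cannot be measured."""
    if available_bytes is None:
        available_bytes = physical_memory_bytes()
    if available_bytes is None:
        return "unknown"
    return "run" if available_bytes >= required_bytes else "quit"


# The two machines described above: 1.94 GB and 1.02 GB physical.
print(startup_check(REQUIRED_BYTES, int(1.94 * GIB)))
print(startup_check(REQUIRED_BYTES, int(1.02 * GIB)))
```

A check like this only helps if the requirement is known when the task starts. It would not catch a workunit whose memory use grows later in the run.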

Also, it seems that the Scheduler ought to notice when a client aborts more than 20 (or some other number) Minirosetta workunits, and then stop sending them to that client. Or, better, let clients select the programs that they will accept, as other projects allow.
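
The abort-count idea could be sketched like this. The class and method names are hypothetical and have nothing to do with the real BOINC scheduler; the threshold of 20 is the number suggested above.

```python
class AbortTracker:
    """Count aborted tasks per (host, application) and gate new work on the count."""

    def __init__(self, max_aborts=20):
        self.max_aborts = max_aborts
        self._aborts = {}

    def record_abort(self, host_id, app):
        key = (host_id, app)
        self._aborts[key] = self._aborts.get(key, 0) + 1

    def should_send(self, host_id, app):
        # Stop sending once the host has aborted MORE than max_aborts tasks.
        return self._aborts.get((host_id, app), 0) <= self.max_aborts


tracker = AbortTracker(max_aborts=20)
for _ in range(21):
    tracker.record_abort(host_id=1, app="minirosetta")

print(tracker.should_send(1, "minirosetta"))  # host 1 has 21 aborts
print(tracker.should_send(1, "rosetta_beta"))  # other application is unaffected
```

Keeping the count per application means a host that aborts one application can still receive work from another, which is the behaviour asked for here.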


AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57731 - Posted 9 Dec 2008 3:11:47 UTC - in response to Message ID 57721.

In those cases where your wingman completed the workunit without a problem, it was on an Intel Xeon CPU with a lot more memory. This leads me to believe that cs_vanilla workunits need a lot more memory than your computer has, and suggests that your rather old version of BOINC (5.2.13) may not be sending the information needed to choose only workunits that will work with your memory size.

I find my BOINC version quite reliable, and it certainly sends the memory size information properly. The computers' links at Rosetta show the proper memory size, and in the past my 512MB machines have been refused work when none was available for their memory size.

Your suggestion about cs_vanilla WUs needing more memory may be right, though. I have two quads (8 cores total) with lots of memory, and neither has had any of the cs_vanilla errors.

Maybe those WU eventually hit a model where they are using a lot more memory than they are supposed to.

netwraith Profile
Avatar

Joined: Sep 3 06
Posts: 80
ID: 109740
Credit: 13,483,227
RAC: 0
Message 57732 - Posted 9 Dec 2008 3:21:45 UTC



I don't know about these units passing on Intel Core CPUs with more memory... I am having dozens of these cs_vanilla units bomb out on machines with dual Core 2 quad Xeons and 16 GB of RAM... I think these machines are big enough to handle anything out there. And I was running 14 of them up until a day or two ago... Most are still crunching, but are starting to wind down so that they can be part of a compute farm...

So, I think there is something wrong in these units or in v1.45 of mini....
____________
Looking for a team ??? Join BoincSynergy!!


Tony Profile

Joined: Dec 12 05
Posts: 7
ID: 35547
Credit: 6,724,341
RAC: 889
Message 57733 - Posted 9 Dec 2008 3:48:49 UTC

Error with debug info.
<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.5.0


Dump Timestamp : 12/08/08 06:59:34
Install Directory :
Data Directory : C:\ProgramData\BOINC
Project Symstore : http://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\2;C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosettasymbols*http://boinc.bakerlab.org/rosetta/symstore;srv*C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosettasymbols*http://boinc.berkeley.edu/symstore


ModLoad: 00400000 006e5000 C:\ProgramData\BOINC\projects\boinc.bakerlab.org_rosetta\minirosetta_1.45_windows_x86_64.exe (-nosymbols- Symbols Loaded)
Linked PDB Filename : D:\boinc_build\minirosetta_1.45\mini\Visual Studio\BoincRelease\minirosetta_1.45_windows_intelx86.pdb

ModLoad: 77140000 00160000 C:\Windows\SysWOW64\ntdll.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wntdll.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 76c90000 00110000 C:\Windows\syswow64\kernel32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wkernel32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75520000 000d0000 C:\Windows\syswow64\USER32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wuser32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 760e0000 00090000 C:\Windows\syswow64\GDI32.dll (6.0.6001.18023) (-exported- Symbols Loaded)
Linked PDB Filename : wgdi32.pdb
File Version : 6.0.6001.18023 (vistasp1_gdr.080221-1537)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18023

ModLoad: 759d0000 000c6000 C:\Windows\syswow64\ADVAPI32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : advapi32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75640000 000f0000 C:\Windows\syswow64\RPCRT4.dll (6.0.6001.18051) (-exported- Symbols Loaded)
Linked PDB Filename : wrpcrt4.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 752f0000 00060000 C:\Windows\syswow64\Secur32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wsecur32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75f10000 00060000 C:\Windows\system32\IMM32.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wimm32.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75e00000 000c8000 C:\Windows\syswow64\MSCTF.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : msctf.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75f70000 000aa000 C:\Windows\syswow64\msvcrt.dll (7.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : msvcrt.pdb
File Version : 7.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.6001.18000

ModLoad: 76170000 00009000 C:\Windows\syswow64\LPK.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wlpk.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75aa0000 0007d000 C:\Windows\syswow64\USP10.dll (1.626.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : usp10.pdb
File Version : 1.0626.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0626.6001.18000

ModLoad: 73b00000 00021000 C:\Windows\system32\NTMARTA.DLL (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ntmarta.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 755f0000 0004a000 C:\Windows\syswow64\WLDAP32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : wldap32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75ed0000 0002d000 C:\Windows\syswow64\WS2_32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ws2_32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 75f00000 00006000 C:\Windows\syswow64\NSI.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : nsi.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75930000 00007000 C:\Windows\syswow64\PSAPI.DLL (6.0.6000.16386) (-exported- Symbols Loaded)
Linked PDB Filename : psapi.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 73ae0000 00011000 C:\Windows\system32\SAMLIB.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : samlib.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 75b20000 00144000 C:\Windows\syswow64\ole32.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : ole32.pdb
File Version : 6.0.6000.16386 (vista_rtm.061101-2205)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6000.16386

ModLoad: 6e470000 000dc000 C:\Windows\system32\dbghelp.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : dbghelp.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000

ModLoad: 74fe0000 00008000 C:\Windows\system32\version.dll (6.0.6001.18000) (-exported- Symbols Loaded)
Linked PDB Filename : version.pdb
File Version : 6.0.6001.18000 (longhorn_rtm.080118-1840)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 6.0.6001.18000



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 6819, Write: 0, Other 2098

- I/O Transfers Counters -
Read: 0, Write: 20696, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 73000, QuotaPeakPagedPoolUsage: 73064
QuotaNonPagedPoolUsage: 10288, QuotaPeakNonPagedPoolUsage: 11520

- Virtual Memory Usage -
VirtualSize: 257019904, PeakVirtualSize: 274767872

- Pagefile Usage -
PagefileUsage: 184328192, PeakPagefileUsage: 207704064

- Working Set Size -
WorkingSetSize: 190152704, PeakWorkingSetSize: 213340160, PageFaultCount: 544576

*** Dump of thread ID 2228 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 34632224.000000, User Time: 43788701696.000000, Wait Time: 3728048.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000

- Registers -
eax=00000001 ebx=00000000 ecx=00000000 edx=0017c19c esi=0017c8cc edi=00000000
eip=004ead47 esp=0017c180 ebp=0017c588
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246

- Callstack -
ChildEBP RetAddr Args to Child
0017c588 bfcfa595 029c34d0 029c34d0 029c3704 00956b08 minirosetta_1.45_windows_x86_64!+0x0
00957020 009a48a8 00477310 009a49a8 00476130 009a4aa8 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'bfcfa595'
00957024 00477310 009a49a8 00476130 009a4aa8 00476750 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '009a48a8'
009a48a8 00000000 00000000 00a18e58 009a48bc 00000000 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '00477310'

*** Dump of thread ID 764 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3728048.000000

- Registers -
eax=00000000 ebx=00000000 ecx=00000000 edx=00000000 esi=01c8ff48 edi=00000000
eip=7716081d esp=01c8ff08 ebp=01c8ff6c
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
01c8ff6c 76ca0c88 00000064 00000000 01c8ff94 0041c61b ntdll!NtDelayExecution+0x0
01c8ff7c 0041c61b 00000064 00000000 76d1e3f3 00000000 kernel32!Sleep+0x0
01c8ff94 771bcfed 00000000 75d9cf1b 00000000 00000000 minirosetta_1.45_windows_x86_64!+0x0
01c8ffd4 771bd1ff 0041c610 00000000 00000000 00000000 ntdll!RtlCreateUserProcess+0x0
01c8ffec 00000000 0041c610 00000000 00000000 00000000 ntdll!RtlCreateProcessParameters+0x0

*** Dump of thread ID 5428 (state: Waiting): ***

- Information -
Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 3727960.000000

- Registers -
eax=00000000 ebx=0281c900 ecx=00000000 edx=00000000 esi=01d8fdfc edi=00000000
eip=7716081d esp=01d8fdbc ebp=01d8fe20
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
01d8fe20 76ca0c88 000007d0 00000000 76ca0c79 0079eb74 ntdll!NtDelayExecution+0x0
01d8fe30 0079eb74 000007d0 fc734f4d 00000000 0281c950 kernel32!Sleep+0x0
76ca0c79 ff006aec a6e80875 5d000006 900004c2 90909090 minirosetta_1.45_windows_x86_64!+0x0
76ca0c7d a6e80875 5d000006 900004c2 90909090 8b55ff8b minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff006aec'
76ca0c81 5d000006 900004c2 90909090 8b55ff8b 14458dec minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'a6e80875'
76ca0c85 900004c2 90909090 8b55ff8b 14458dec 1475ff50 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d000006'
76ca0c89 90909090 8b55ff8b 14458dec 1475ff50 ff1075ff minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '900004c2'
76ca0c8d 8b55ff8b 14458dec 1475ff50 ff1075ff 75ff0c75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '90909090'
76ca0c91 14458dec 1475ff50 ff1075ff 75ff0c75 c415ff08 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b55ff8b'
76ca0c95 1475ff50 ff1075ff 75ff0c75 c415ff08 8b76ca06 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14458dec'
76ca0c99 ff1075ff 75ff0c75 c415ff08 8b76ca06 c985184d minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '1475ff50'
76ca0c9d 75ff0c75 c415ff08 8b76ca06 c985184d c0850b75 minirosetta_1.45_windows_x86_64!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ff1075ff'
76ca0ca1 c415ff08 8b76ca06 c985184d c0850b75 c0330e7c msvcrt!mktime+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '75ff0c75'
76ca0ca5 8b76ca06 c985184d c0850b75 c0330e7c 14c25d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c415ff08'
76ca0ca9 c985184d c0850b75 c0330e7c 14c25d40 14558b00 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b76ca06'
76ca0cad c0850b75 c0330e7c 14c25d40 14558b00 eeeb1189 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c985184d'
76ca0cb1 c0330e7c 14c25d40 14558b00 eeeb1189 0e95e850 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0850b75'
76ca0cb5 14c25d40 14558b00 eeeb1189 0e95e850 c0330000 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330e7c'
76ca0cb9 14558b00 eeeb1189 0e95e850 c0330000 9090ebeb msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14c25d40'
76ca0cbd eeeb1189 0e95e850 c0330000 9090ebeb 8b909090 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '14558b00'
76ca0cc1 0e95e850 c0330000 9090ebeb 8b909090 ec8b55ff msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'eeeb1189'
76ca0cc5 c0330000 9090ebeb 8b909090 ec8b55ff 458b5151 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0e95e850'
76ca0cc9 9090ebeb 8b909090 ec8b55ff 458b5151 5d8b530c msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'c0330000'
76ca0ccd 8b909090 ec8b55ff 458b5151 5d8b530c 358b5614 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '9090ebeb'
76ca0cd1 ec8b55ff 458b5151 5d8b530c 358b5614 76ca06c8 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8b909090'
76ca0cd5 458b5151 5d8b530c 358b5614 76ca06c8 087d8b57 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'ec8b55ff'
76ca0cd9 5d8b530c 358b5614 76ca06c8 087d8b57 8df84589 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '458b5151'
76ca0cdd 358b5614 76ca06c8 087d8b57 8df84589 6a501445 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d8b530c'
76ca0ce1 76ca06c8 087d8b57 8df84589 6a501445 fc458d40 msvcrt!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '358b5614'
76ca0ce5 087d8b57 8df84589 6a501445 fc458d40 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '76ca06c8'
76ca0ce9 8df84589 6a501445 fc458d40 f8458d50 5d895750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '087d8b57'
76ca0ced 6a501445 fc458d40 f8458d50 5d895750 85d6fffc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8df84589'
76ca0cf1 fc458d40 f8458d50 5d895750 85d6fffc 8d197dc0 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445'
76ca0cf5 f8458d50 5d895750 85d6fffc 8d197dc0 6a501445 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d40'
76ca0cf9 5d895750 85d6fffc 8d197dc0 6a501445 fc458d04 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50'
76ca0cfd 85d6fffc 8d197dc0 6a501445 fc458d04 f8458d50 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '5d895750'
76ca0d01 8d197dc0 6a501445 fc458d04 f8458d50 d6ff5750 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '85d6fffc'
76ca0d05 6a501445 fc458d04 f8458d50 d6ff5750 8c0fc085 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8d197dc0'
76ca0d09 fc458d04 f8458d50 d6ff5750 8c0fc085 0000009f kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '6a501445'
76ca0d0d f8458d50 d6ff5750 8c0fc085 0000009f a814458b kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'fc458d04'
76ca0d11 d6ff5750 8c0fc085 0000009f a814458b a85675cc kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f8458d50'
76ca0d15 8c0fc085 0000009f a814458b a85675cc 8d407503 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'd6ff5750'
76ca0d19 00000000 a814458b a85675cc 8d407503 53500845 kernel32!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '8c0fc085'


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>


____________

Wee Todd Didd

Joined: Jan 6 08
Posts: 1
ID: 233262
Credit: 61,148
RAC: 0
Message 57737 - Posted 9 Dec 2008 11:46:18 UTC

Same here. cs_vanilla* errors.
Client 1.45

Using an Athlon 64 with 2 GB of RAM.

Most do pass though.




http://boinc.bakerlab.org/rosetta/workunit.php?wuid=194285701
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=194171642
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=194109205
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=193908266
http://boinc.bakerlab.org/rosetta/workunit.php?wuid=193528547

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57743 - Posted 9 Dec 2008 13:50:57 UTC

Well, I've now had a cs_vanilla error 193 on a quad with 4GB:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212454329

And this quad also had error 193 with two loopbuild_reference_hombench WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=212758774
http://boinc.bakerlab.org/rosetta/result.php?resultid=212602760

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57750 - Posted 9 Dec 2008 17:59:38 UTC

cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_14875_0 died at 11355.2 seconds with -1073741819 (0xc0000005) error. I had 8 tasks that completed ok before this.

Path7

Joined: Aug 25 07
Posts: 128
ID: 201002
Credit: 61,751
RAC: 0
Message 57754 - Posted 9 Dec 2008 21:03:33 UTC

This is the very first time I have found a WU that ended with an "Unhandled Exception Detected...":
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_37184_0
ran for 11603.2 seconds when it errored with exit status: -1073741819 (0xc0000005)

Windows XP-home, Boinc 5.10.45

Path7.
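
For anyone puzzled by the two forms of the exit status in these reports: -1073741819 and 0xc0000005 are the same 32-bit value, printed once as a signed and once as an unsigned number. On Windows, 0xC0000005 is the access violation status code. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit status."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is negative.
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


print(hex(to_unsigned32(-1073741819)))  # the hex form shown in the task reports
print(to_signed32(0xC0000005))          # the decimal form shown in the task reports
```

The same masking works for any other negative exit status that appears next to a hex value in a result page.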

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57760 - Posted 10 Dec 2008 2:50:11 UTC

Now I've seen some cs_vanilla errors on my largest-memory computer, an 8 GB quad. This can't be due to running out of memory (unless they're hitting the limit of a 32-bit process's address space).

http://boinc.bakerlab.org/rosetta/result.php?resultid=213025220
http://boinc.bakerlab.org/rosetta/result.php?resultid=212933676
http://boinc.bakerlab.org/rosetta/result.php?resultid=212854713
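
The parenthetical about the 32-bit address space can be made concrete with a little arithmetic. A 32-bit pointer can name at most 2^32 bytes, so a single 32-bit process cannot address more than 4 GiB no matter how much RAM is installed, and on 32-bit Windows the user-mode share is typically only 2 GiB by default. The sketch below only does that arithmetic; whether minirosetta actually hits this limit is not established anywhere in this thread.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def address_space_gib(pointer_bits: int) -> float:
    """Upper bound on the memory a process can address with pointers of this width."""
    return 2 ** pointer_bits / GIB


installed_ram_gib = 8                 # the 8 GB quad mentioned in the post
limit = address_space_gib(32)         # ceiling for any 32-bit process

print(limit)                          # addressable GiB for a 32-bit process
print(limit < installed_ram_gib)      # True: the ceiling is below the installed RAM
```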

Mike Tyka

Joined: Oct 20 05
Posts: 96
ID: 5612
Credit: 2,190
RAC: 0
Message 57761 - Posted 10 Dec 2008 2:52:48 UTC

Thanks for all the error reports! I think we've found the issue here. This was damn tricky to find since, for some reason, it doesn't appear to occur on Linux platforms nearly as frequently as on Mac and Windows. I ran the equivalent of several thousand WUs on our local cluster and didn't have a single job crash.

But I think we've found at least one issue by testing on our limited Windows/Mac resources, and a bug fix is going out to ralph tonight or tomorrow morning, depending on how much caffeine I can get hold of.

Our aim is to get mini in line with old Rosetta in terms of error rate as soon as we can!

Thanks for all the feedback, it totally helps finding these bugs!

Mike

____________
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57795 - Posted 11 Dec 2008 5:07:52 UTC - in response to Message ID 57714.

Hi everyone, due to a limitation on command line length set by BOINC, the jobs with name "*_ZN_ABRELAX_*" have their command lines automatically truncated when sent out on Rosetta@Home. That is why you see it stopped with an ERROR like "ERROR: Illegal value for integer option -run:jran specified: " right away. In fact, the same workunits returned with very good success rate on the testing server. We are investigating why the alpha testing server did not catch such errors in the first place. This is another unfortunate incident which will be a new lesson for us. Sorry for any inconvenience this has brought to you.

Is this still the case or have they been corrected and re-issued?

I ask this because more are coming through and I just had one crash out on me. I aborted the rest, just in case, to save wasting processing time.
____________
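
To illustrate the quoted explanation: if a command line is cut off right after an option name, the option is left without a value, and an integer parser then reports an empty "illegal value", which matches the empty text after "specified:" in the error. The Python sketch below is only a model of that failure mode. The parser, the first option name, and the cut-off point are all invented for illustration; the real BOINC length limit and Rosetta's option parser are not shown in this thread.

```python
def read_integer_option(cmdline: str, name: str) -> int:
    """Return the integer that follows `name`, or raise ValueError if it is missing or malformed."""
    tokens = cmdline.split()
    if name not in tokens:
        raise ValueError(f"option {name} not found")
    position = tokens.index(name)
    # A truncated command line can end on the option name itself.
    value = tokens[position + 1] if position + 1 < len(tokens) else ""
    try:
        return int(value)
    except ValueError:
        raise ValueError(
            f"Illegal value for integer option {name} specified: {value}"
        ) from None


# Hypothetical command line; only -run:jran comes from the thread.
full = "-hypothetical:input some_file.dat -run:jran 1234567"

# Hypothetical limit, chosen so the cut falls right after the option name.
limit = full.index("-run:jran") + len("-run:jran")
truncated = full[:limit]

print(read_integer_option(full, "-run:jran"))
try:
    read_integer_option(truncated, "-run:jran")
except ValueError as err:
    print("ERROR:", err)
```

A length check on the generated command line before submission would catch this class of problem early, which may be the kind of test the quoted post says the alpha server was missing.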

transient
Avatar

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,570,814
RAC: 1,811
Message 57796 - Posted 11 Dec 2008 6:16:17 UTC

If your computer used any processing time at all on the unit, it must have been another error, because in this particular case the tasks fail to process at all.


CPU time 0
stderr out

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Illegal value for integer option -run:jran specified:

</stderr_txt>
]]>

____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57798 - Posted 11 Dec 2008 10:38:46 UTC - in response to Message ID 57796.

If your computer used any processing time at all on the unit it must have been another error. Because in this particular case the tasks fail to process at all

Of course, thanks. I forgot. Looks like they were corrected then - crashed out after 25 minutes. I'll let them run.
____________

ramostol

Joined: Feb 6 07
Posts: 64
ID: 145835
Credit: 584,052
RAC: 0
Message 57799 - Posted 11 Dec 2008 10:48:44 UTC

Failed tasks among tasks received 20081207:

Errored out on 2 computers:
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_25492

cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_30102_1

One computer failures:
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hr1958_olange_5387_32109_0

loopbuild_reference_hombench_loopbuild_t363__IGNORE_THE_REST_2CU3A_7_5461_31_0

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57807 - Posted 11 Dec 2008 18:10:40 UTC

Task 212871743
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_35341_0
compute error
died at 12807.86 seconds with the usual (0xc0000005) error

1co4A_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1co4A-_5476_95_1 died
CPU time 0
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Illegal value for integer option -run:jran specified:

</stderr_txt>


task 212923776
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_ccr19_olange_5384_35553_1
died at 3373.781 seconds with the usual (0xc0000005) error


1dsvA_ZNMP_ABRELAX_tetraR_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1dsvA-_5476_655_1
CPU time 0
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR: Illegal value for integer option -run:jran specified:

</stderr_txt>
]]>


Task 212986916
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_mth1598_olange_5388_38521_1
died at 1881.563 seconds with the usual (0xc0000005) error

Task 213039289
cs_vanilla_abrelax_homo_bench_cs_vanilla_abrelax_cs_hi0719_olange_5386_42252_0
died at 5740.828 seconds with the usual (0xc0000005) error

Aborted the remaining vanilla tasks; too many compute errors.

Erwin Schlonz
Avatar

Joined: May 20 07
Posts: 5
ID: 178747
Credit: 203,397
RAC: 0
Message 57814 - Posted 12 Dec 2008 12:18:20 UTC

<core_client_version>6.2.19</core_client_version>
<message>
- exit code -1073741819 (0xc0000005)
</message>
# cpu_run_time_pref: 86400

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004EAD47 read attempt to address 0x00000000


http://boinc.bakerlab.org/rosetta/result.php?resultid=213272398
http://boinc.bakerlab.org/rosetta/result.php?resultid=213233743
http://boinc.bakerlab.org/rosetta/result.php?resultid=213221227
http://boinc.bakerlab.org/rosetta/result.php?resultid=212971112
http://boinc.bakerlab.org/rosetta/result.php?resultid=213385263

The last one was after 18 (!) hours of computation. 18 hours of wasted electric power.

That's it. I will suspend my participation until the application runs more stably. To that end, on Monday I will set all my computers not to download any further workunits. Maybe I will come around next spring to see if things are working again.

Auf Wiedersehen
Erwin Schlonz

DaveSun

Joined: May 3 07
Posts: 5
ID: 172723
Credit: 200,480
RAC: 0
Message 57815 - Posted 12 Dec 2008 13:15:25 UTC

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170. Any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57816 - Posted 12 Dec 2008 13:33:02 UTC - in response to Message ID 57815.

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.


I don't know, but I noticed that your wingman on that workunit seemed to have chosen a shorter workunit size, and therefore shut down before reaching whatever caused that problem. Also, I've noticed that choosing a preferred workunit length above 10 hours seems to get me more problematic workunits, so if you get such problems often, you might want to try reducing your preferred workunit size.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57818 - Posted 12 Dec 2008 14:46:18 UTC - in response to Message ID 57815.

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.


I took another look at your results, and noticed that it returned 596 decoys. I don't think I've seen a workunit before that returned a 3 digit number of decoys, so perhaps there needs to be a check of whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit and still do it properly.

A Few Good Men

Joined: Mar 25 07
Posts: 14
ID: 157915
Credit: 2,031,382
RAC: 55
Message 57819 - Posted 12 Dec 2008 14:56:08 UTC


Last 24 hours have produced this error on 5 WU's

Server state Over
Outcome Client error
Client state Compute error
Exit status -226 (0xffffff1e)
Computer ID 963376
Report deadline 22 Dec 2008 1:30:07 UTC
CPU time 21570.15
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
Can't acquire lockfile - exiting
Can't acquire lockfile - exiting



DaveSun

Joined: May 3 07
Posts: 5
ID: 172723
Credit: 200,480
RAC: 0
Message 57821 - Posted 12 Dec 2008 15:30:28 UTC - in response to Message ID 57818.

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.


I took another look at your results, and noticed that it returned 596 decoys. I don't think I've seen a workunit before that returned a 3 digit number of decoys, so perhaps there needs to be a check of whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit and still do it properly.


I've been running at this setting for several months without any major troubles and have had several tasks that returned triple-digit decoys. I set it up to run 1 day after running for less than 10 hours for a long time and having units run what seemed like forever. This way I've not had any tasks run over my preference, and it works well for my setup. I just don't remember a task that had run to completion here failing to validate before this one.

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57822 - Posted 12 Dec 2008 15:35:59 UTC - in response to Message ID 57821.
Last modified: 12 Dec 2008 15:41:31 UTC

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.


I took another look at your results, and noticed that it returned 596 decoys. I don't think I've seen a workunit before that returned a 3 digit number of decoys, so perhaps there needs to be a check of whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit and still do it properly.


I've been running at this setting for several months without any major troubles and have had several tasks that returned triple-digit decoys. I set it up to run 1 day after running for less than 10 hours for a long time and having units run what seemed like forever. This way I've not had any tasks run over my preference, and it works well for my setup. I just don't remember a task that had run to completion here failing to validate before this one.


Then perhaps the limit handled successfully is higher than 99 decoys per workunit, but not as high as 596.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,056,972
RAC: 5,479
Message 57823 - Posted 12 Dec 2008 16:01:25 UTC

Assertion failure in Task 213968874 (abinitio_abrelax_nohomfrag_129_B_1qgvA_5483_146_0)
Workunit 195032150, Mac OS X 10.4.11

Failed after 30 seconds

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

ERROR: Assertion failure: assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 110
called boinc_finish
# cpu_run_time_pref: 14400

</stderr_txt>
]]>

____________
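
For readers unfamiliar with the assertion in this log: it checks that a fragment of `size` residues starting at position `begin` ends at or before the last residue of the pose, with residues numbered from 1. The Python sketch below restates only that condition. The function name and the example numbers are invented for illustration, and it is not the code from FragmentMover.cc.

```python
def fragment_fits(begin: int, size: int, total_residue: int) -> bool:
    """Mirror of the asserted condition: ( begin + size - 1 ) <= total_residue."""
    last_residue_touched = begin + size - 1
    return last_residue_touched <= total_residue


# Illustrative numbers only: a 9-residue fragment on a 129-residue chain.
print(fragment_fits(begin=121, size=9, total_residue=129))  # ends exactly on the last residue
print(fragment_fits(begin=122, size=9, total_residue=129))  # would run one residue past the end
```

An assertion like this fails when the fragment data and the sequence length disagree, so the job stops within seconds instead of computing on inconsistent input.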

Nothing But Idle Time

Joined: Sep 28 05
Posts: 209
ID: 1675
Credit: 139,545
RAC: 0
Message 57826 - Posted 12 Dec 2008 16:57:25 UTC

When I encountered two cs_vanilla compute errors in a row I set Rosetta to NNW. That was 4 days ago. Until the software is fixed and announced here it will remain so. It behooves the project team to fix these errors ASAP rather than wait until this thread (like its predecessors) is cluttered with hundreds of posts reporting the same stuff. I do not understand this counter-productive behavior.

David E K Profile
Forum moderator
Project administrator
Project developer
Project scientist

Joined: Jul 1 05
Posts: 939
ID: 14
Credit: 2,303,010
RAC: 1,094
Message 57829 - Posted 12 Dec 2008 18:01:52 UTC

we are definitely working on it and will likely have an update within a few days after testing on ralph.

Mike Tyka

Joined: Oct 20 05
Posts: 96
ID: 5612
Credit: 2,190
RAC: 0
Message 57830 - Posted 12 Dec 2008 19:00:45 UTC - in response to Message ID 57823.
Last modified: 12 Dec 2008 19:11:45 UTC

Assertion failure in Task 213968874 (abinitio_abrelax_nohomfrag_129_B_1qgvA_5483_146_0)
Workunit 195032150, Mac OS X 10.4.11

Failed after 30 seconds

<core_client_version>6.2.18</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>

ERROR: Assertion failure: assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: src/protocols/abinitio/FragmentMover.cc line: 110
called boinc_finish
# cpu_run_time_pref: 14400

</stderr_txt>
]]>




Apologies for this - I screwed up the submit for two proteins: 1qgv and 1t2j. I tried to remove the jobs as soon as I noticed, but around 200 WUs went out anyway. If you get a WU with either of those two protein tags, please abort it!

For the cs_vanilla jobs, a fix is going out onto RALPH@Home right now. If you get cs_vanilla jobs, also feel free to abort them. We'll resubmit once the error is fixed.
____________
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57831 - Posted 12 Dec 2008 19:16:03 UTC - in response to Message ID 57819.

read here for two links on how to take care of lockfiles.


Last 24 hours have produced this error on 5 WU's

Server state Over
Outcome Client error
Client state Compute error
Exit status -226 (0xffffff1e)
Computer ID 963376
Report deadline 22 Dec 2008 1:30:07 UTC
CPU time 21570.15
stderr out <core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
# cpu_run_time_pref: 86400
Can't acquire lockfile - exiting
Can't acquire lockfile - exiting




DaveSun

Joined: May 3 07
Posts: 5
ID: 172723
Credit: 200,480
RAC: 0
Message 57834 - Posted 12 Dec 2008 22:05:32 UTC - in response to Message ID 57822.

Got a validation error on score12_rlbd_1gvp_IGNORE_THE_REST_DECOY_5473_170 any indication as to what may have caused this?
The task ran for the full time so no indication on my end of a problem.


I took another look at your results, and noticed that it returned 596 decoys. I don't think I've seen a workunit before that returned a 3 digit number of decoys, so perhaps there needs to be a check of whether both minirosetta 1.45 and the workunit validation software can handle that many decoys for one workunit and still do it properly.


I've been running at this setting for several months without any major troubles and have had several tasks that returned triple-digit decoys. I set it up to run 1 day after running for less than 10 hours for a long time and having units run what seemed like forever. This way I've not had any tasks run over my preference, and it works well for my setup. I just don't remember a task that had run to completion here failing to validate before this one.


Then perhaps the limit handled successfully is higher than 99 decoys per workunit, but not as high as 596.


While that is possible you'd think that if there was a limit it'd be coded into the app and tasks would end once the limit was reached.

Mike Francis
Avatar

Joined: Nov 24 05
Posts: 8
ID: 17484
Credit: 623,519
RAC: 0
Message 57841 - Posted 13 Dec 2008 10:44:34 UTC

12/13/2008 12:39:43 AM|rosetta@home|Starting loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1
12/13/2008 12:39:43 AM|rosetta@home|Starting task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 using minirosetta version 145
12/13/2008 12:46:00 AM|rosetta@home|Computation for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 finished
12/13/2008 12:46:00 AM|rosetta@home|Output file loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1_0 for task loopbuild_reference_native_cst_hombench_loopbuild_t306__IGNORE_THE_REST_1B70B_12_5533_4_1 absent

____________

(_KoDAk_) Profile

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 57842 - Posted 13 Dec 2008 10:52:54 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=213868228
Validate error Done 43,178.07 !!!!!

http://boinc.bakerlab.org/rosetta/result.php?resultid=213166655
http://boinc.bakerlab.org/rosetta/result.php?resultid=212932042
http://boinc.bakerlab.org/rosetta/result.php?resultid=212932029
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906401
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906412
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906413
http://boinc.bakerlab.org/rosetta/result.php?resultid=212931903
http://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
http://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
http://boinc.bakerlab.org/rosetta/result.php?resultid=212881858
http://boinc.bakerlab.org/rosetta/result.php?resultid=212692623
http://boinc.bakerlab.org/rosetta/result.php?resultid=212611598
http://boinc.bakerlab.org/rosetta/result.php?resultid=212499093

____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57846 - Posted 13 Dec 2008 13:41:58 UTC
Last modified: 13 Dec 2008 13:44:01 UTC

1wjdA_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1wjdA-_5478_4043_0 got stuck and was showing 23.45% remaining, which is odd, given that the messages in BOINC Manager showed it had started about 5 minutes earlier, before getting interrupted by benchmark testing.
After aborting the task, the next one started and the cores went to 100% immediately.

rochester new york Profile
Avatar

Joined: Jul 2 06
Posts: 2562
ID: 98229
Credit: 957,089
RAC: 119
Message 57863 - Posted 14 Dec 2008 13:39:28 UTC

Server Status Page is showing a problem 839am 12/14/08

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57867 - Posted 14 Dec 2008 18:49:46 UTC - in response to Message ID 57842.

http://boinc.bakerlab.org/rosetta/result.php?resultid=213868228
Validate error Done 43,178.07 !!!!!

http://boinc.bakerlab.org/rosetta/result.php?resultid=213166655
http://boinc.bakerlab.org/rosetta/result.php?resultid=212932042
http://boinc.bakerlab.org/rosetta/result.php?resultid=212932029
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906401
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906412
http://boinc.bakerlab.org/rosetta/result.php?resultid=212906413
http://boinc.bakerlab.org/rosetta/result.php?resultid=212931903
http://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
http://boinc.bakerlab.org/rosetta/result.php?resultid=212896182
http://boinc.bakerlab.org/rosetta/result.php?resultid=212881858
http://boinc.bakerlab.org/rosetta/result.php?resultid=212692623
http://boinc.bakerlab.org/rosetta/result.php?resultid=212611598
http://boinc.bakerlab.org/rosetta/result.php?resultid=212499093


I notice that your results are the first I've seen that were run under boinc 6.4.1. I wonder if that's the source of the problem instead of minirosetta 1.45?

mikylinux

Joined: Jul 25 07
Posts: 3
ID: 193561
Credit: 73,155
RAC: 0
Message 57869 - Posted 14 Dec 2008 19:27:25 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=213307491

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 57870 - Posted 14 Dec 2008 20:36:05 UTC - in response to Message ID 57863.

Server Status Page is showing a problem 839am 12/14/08

As of 14 Dec 2008 20:26:34 UTC the Server Status Page shows:
Program rah_make_work1 on host srv3 with status "Not running".
Work units Ready to send: 1

It looks like program rah_make_work2 isn't able to handle the load all by itself.

A Few Good Men

Joined: Mar 25 07
Posts: 14
ID: 157915
Credit: 2,031,382
RAC: 55
Message 57878 - Posted 15 Dec 2008 0:00:49 UTC

Greg_BE,

I have uninstalled and reinstalled XP, reinstalled BOINC, added the save-in-memory clause, set standard clocks on the computer, run memtest on the RAM and a burn-in for the whole machine, and I'm still getting these errors about compute errors and locked files. At the same time SETI has no problems at all with the machine, its speed, its RAM, or anything.

"A good program doesn't need 54 hoops to jump through before it works"
After a clean install and full format, I have to lean towards the Rosetta coding as the cause.

Problems with this code are:

Doesn't release all or just some of the processes when asked to snooze; lockfile is always present; says too many restarts.
Other machines here are running fine, but this one seems to have problems only with Rosetta@home. After a clean install and full format, I have to lean towards the Rosetta coding as the cause.
Does not follow fair sharing of resources: BOINC Manager is at 50:50 and Rosetta has basically locked out all other projects.
How about a nice little MSI file to patch up the damage, and let's get folding.

transient
Avatar

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,570,814
RAC: 1,811
Message 57882 - Posted 15 Dec 2008 6:08:38 UTC

Shredder,

The core client is the one doing the resource shuffling, not the Rosetta app. And I suspect it is doing its work all right as long as you're not trying to micro-manage BOINC.
____________

[AF>Slappyto] popolito Profile

Joined: Mar 8 06
Posts: 13
ID: 64384
Credit: 825,077
RAC: 0
Message 57883 - Posted 15 Dec 2008 6:13:53 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=213877775
Exit status -1073741819 (0xc0000005)
Reason: Access Violation (0xc0000005) at address 0x007FA877 read attempt to address 0xF87CC8B3
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57886 - Posted 15 Dec 2008 9:41:25 UTC - in response to Message ID 57878.

Very strange... but I think that installing BOINC again just reinstalls the base program and does nothing to the project files. Did you go into your slots folder and erase the slots? That is where the lockfiles are located. Be sure to complete all your currently running tasks before deleting. I run BOINC off a different partition than C; perhaps you can complete your current work, then install on a different partition and see if that takes care of the problem. After I did the slot cleanup on my system, everything worked OK.


Greg_BE,

I have uninstalled and reinstalled XP, reinstalled BOINC, added the save-in-memory clause, set standard clocks on the computer, run memtest on the RAM and a burn-in for the whole machine, and I'm still getting these errors about compute errors and locked files. At the same time SETI has no problems at all with the machine, its speed, its RAM, or anything.

"A good program doesn't need 54 hoops to jump through before it works"
After a clean install and full format, I have to lean towards the Rosetta coding as the cause.

Problems with this code are:

Doesn't release all or just some of the processes when asked to snooze; lockfile is always present; says too many restarts.
Other machines here are running fine, but this one seems to have problems only with Rosetta@home. After a clean install and full format, I have to lean towards the Rosetta coding as the cause.
Does not follow fair sharing of resources: BOINC Manager is at 50:50 and Rosetta has basically locked out all other projects.
How about a nice little msi file to patch up the damage and lets get folding.
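
Before erasing anything in the slots folder, it can help to see which slots actually hold a leftover lockfile. The Python sketch below only lists them and deletes nothing. It assumes the BOINC data directory contains a `slots` folder with one numbered subfolder per slot, and that the lockfile is named `boinc_lockfile`; both are assumptions to verify against your own installation. The demo runs against a temporary directory rather than a real BOINC folder.

```python
import tempfile
from pathlib import Path

LOCKFILE_NAME = "boinc_lockfile"  # assumed file name; check your own slots folder


def find_lockfiles(data_dir):
    """Return the lockfiles found directly inside each slot folder (read-only scan)."""
    slots = Path(data_dir) / "slots"
    if not slots.is_dir():
        return []
    return sorted(p for p in slots.glob(f"*/{LOCKFILE_NAME}") if p.is_file())


# Demo on a throwaway directory: slot 0 has a lockfile, slot 1 does not.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "slots" / "0").mkdir(parents=True)
    (Path(tmp) / "slots" / "1").mkdir(parents=True)
    (Path(tmp) / "slots" / "0" / LOCKFILE_NAME).touch()
    leftover_slots = [p.parent.name for p in find_lockfiles(tmp)]

print(leftover_slots)
```

As noted in the reply above, only clean up slots after the client is stopped and current tasks have finished; a lockfile in a slot with a running task is expected.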

A Few Good Men

Joined: Mar 25 07
Posts: 14
ID: 157915
Credit: 2,031,382
RAC: 55
Message 57889 - Posted 15 Dec 2008 14:54:55 UTC

Greg,
I did a full hard drive format, so there are no files, project or otherwise, and then reinstalled XP. The slots are cleaned up.
After reinstalling BOINC and Rosetta as a project and letting it manage itself for 24 hours, I had all the same errors as before: lockfile, not releasing, and, as always, no credit. I have been running the services from a non-system disk; I will reinstall on the system disk and see how that works.
Thanks for taking time to help out.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,492,524
RAC: 7,931
Message 57890 - Posted 15 Dec 2008 14:57:57 UTC - in response to Message ID 57703.

And now 16 more successes out of 18 making 40 out of 50. Most errors came early, so I'm now confident enough to up my run-time from 2 to 3 hours again.

Update on this. In the last week, 116 MiniRosetta 1.45 tasks, 3hr runtime:
64 Success (55%)
52 Failure (45%)

Of failures:
19 manually aborted
28 "Can't acquire lockfile" errors
5 Exit code -1073741819 (0xc0000005)

26 Rosetta Beta 5.98 tasks, 3hr runtime - 100% success.

So, better than last time I ran with 3hr runtimes (was 43%) but still some way to go. I think the figure for 2hr run times was 73% (up to 80% with v1.45 on above figures).
____________
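
The weekly figures in this post can be checked with a few lines of arithmetic. The Python sketch below recomputes the percentages from the counts given and confirms that the failure breakdown adds up. The last two figures (success rate with manual aborts excluded, and the lockfile share of failures) are derived here and are not numbers from the post.

```python
success, failure = 64, 52
total = success + failure  # 116 tasks in the week

breakdown = {
    "manually aborted": 19,
    "can't acquire lockfile": 28,
    "exit code 0xc0000005": 5,
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(total, percent(success, total), percent(failure, total))
print(sum(breakdown.values()) == failure)  # the three failure categories account for all failures

# Derived figures, not from the post:
ran_to_an_outcome = total - breakdown["manually aborted"]
print(percent(success, ran_to_an_outcome))                    # success rate ignoring manual aborts
print(percent(breakdown["can't acquire lockfile"], failure))  # lockfile share of all failures
```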

ramostol

Joined: Feb 6 07
Posts: 64
ID: 145835
Credit: 584,052
RAC: 0
Message 57891 - Posted 15 Dec 2008 15:09:48 UTC

A number of failed 1.45 abinitio-tasks in the last 24 hours:

These two I had to abort, they had run more than 20 hours on 4 hours default, refused to display graphics:

abinitio_abrelax_nohomfrag_129_B_2ccvA_5483_3423_088

abinitio_abrelax_nohomfrag_129_B_1ctf__5483_3423_0

These collapsed quickly:

abinitio_abrelax_nohomfrag_129_B_1dzoA_5483_1560_1

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(47077,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1npsA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48148,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_2chf__5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
minirosetta_1.45_i686-apple-darwin(48486,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
SIGBUS: bus error

abinitio_abrelax_nohomfrag_129_B_1o4wA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(58522,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
minirosetta_1.45_i686-apple-darwin(58522,0xb0087000) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
SIGABRT: abort called

abinitio_abrelax_nohomfrag_129_B_1elwA_5483_3423_0

<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
minirosetta_1.45_i686-apple-darwin(47419,0xa0538fa0) malloc: *** error for object 0x17478c0: Non-aligned pointer being freed (2)
*** set a breakpoint in malloc_error_break to debug
# cpu_run_time_pref: 14400
SIGSEGV: segmentation violation

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57897 - Posted 15 Dec 2008 18:50:28 UTC
Last modified: 15 Dec 2008 19:15:38 UTC

Come on guys... you're killing me.
2 compute errors in 8 hours today, and then 4 out of 6 compute errors on the 11th that can be placed on bad tasks. What is with tasks getting halfway and then crashing with no credit? You should make Rosie grant the claimed credit on these errors, since it is computing the credit. Then we are not wasting our CPU time and electricity on 0 points. I could have got 101 points for the second crash. This month I have lost 18.5 hrs in bad tasks that died halfway, and I could have got 508.4 credits if there was granted credit for crashing. The crash rate this month so far has been 6% on my system.

Here are the latest tasks that died.

1ef4A_ZNMP_ABRELAX_tetraL_IGNORE_THE_REST_ZINC_METALLOPROTEIN-1ef4A-_5478_9411_0 Exit status -1073741819 (0xc0000005), CPU time 12223.05
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600

</stderr_txt>
]]>
-----------

abinitio_abrelax_nohomfrag_129_B_1vie__5483_167_0
CPU time 14489.83
Exit status -1073741819 (0xc0000005)
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0046D054 write attempt to address 0x085A5FFC

Engaging BOINC Windows Runtime Debugger...

ramostol

Joined: Feb 6 07
Posts: 64
ID: 145835
Credit: 584,052
RAC: 0
Message 57924 - Posted 16 Dec 2008 10:17:21 UTC

With 1.47 launched these results are perhaps not too interesting, but to be on the safe side:

This has indeed been a Black Monday, with all 1.45 tasks reserved for the coming week already crashed.

Early crashes with no result file:
8 tasks lr5_score13
2 tasks lr5_score12
1 task cc_3_5_nocst4
1 task 1_irna
1 task cs_vanilla
26 tasks abinitio...

In addition 2 tasks had to be manually aborted showing the signs of being non-terminators:
abinitio_abrelax_nohomfrag_129_B_2acy__5483_2781_0

abinitio_abrelax_nohomfrag_129_B_1r26A_5483_2512_0

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,443,336
RAC: 1,840
Message 57931 - Posted 16 Dec 2008 12:51:58 UTC - in response to Message ID 57924.

You seem to be inserting a few extra characters when you create links, probably quote marks, which prevents me from following the links.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57933 - Posted 16 Dec 2008 13:08:41 UTC - in response to Message ID 57931.
Last modified: 16 Dec 2008 13:10:53 UTC

**Note** I edited his URL lines to get rid of the quote marks (" "), so the links should work now.
There is nothing to see there, though: both tasks were user aborts, and an abort discards the run time information.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57940 - Posted 16 Dec 2008 19:25:13 UTC
Last modified: 16 Dec 2008 19:31:44 UTC

abinitio_abrelax_nohomfrag_129_B_1opd__5483_1764_0

Now you guys are starting to irritate me badly!
CPU time 21344.08 vs. the 21600 run time preference.
stderr out

<core_client_version>6.2.19</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00427BEA write attempt to address 0x08787FFC


What the heck is this error now? Did the task go bad at the last minute?
Can someone from the team explain what the 0xc0000005 error means?

I was giving you 6-hour run times, but now I have dropped to 4: too many credit losses lately. If the 1.47s crash too, I will be reducing my resource share as well, until you guys figure out what is going on.
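
On what the 0xc0000005 number means: the decimal exit status -1073741819 and the hex value 0xc0000005 in the log are the same 32-bit number, once read as signed and once as unsigned. On Windows that value is the status code STATUS_ACCESS_VIOLATION, which matches the "Access Violation" line in the exception record. The sketch below only shows the conversion; the helper name decode_exit_status and the small lookup table are made up for illustration and are not part of BOINC or Rosetta, and the table is deliberately incomplete.

```python
# Partial table of Windows NTSTATUS codes; only a few well-known values are listed.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def decode_exit_status(status: int) -> tuple:
    """Return (hex string, status name) for a signed 32-bit exit status.

    Hypothetical helper for illustration only.
    """
    # Masking with 0xFFFFFFFF reinterprets the signed value as unsigned 32-bit.
    unsigned = status & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08x}", name


if __name__ == "__main__":
    # Exit status taken from the stderr output quoted in this post.
    print(decode_exit_status(-1073741819))
```

This only names the error. It says the application touched memory it was not allowed to, not why it did so.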

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,776
RAC: 660
Message 57959 - Posted 17 Dec 2008 8:59:12 UTC
Last modified: 17 Dec 2008 9:00:52 UTC

duplicate

rochester new york Profile
Avatar

Joined: Jul 2 06
Posts: 2562
ID: 98229
Credit: 957,089
RAC: 119
Message 58441 - Posted 4 Jan 2009 1:00:08 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=216768999

Bas

Joined: Dec 18 08
Posts: 1
ID: 293274
Credit: 2,316
RAC: 0
Message 58526 - Posted 5 Jan 2009 16:49:48 UTC

I'm new here, so I don't know much about this program. But for a couple of days I haven't received any new tasks. This is what I see in the messages window:

5-1-2009 17:28:28|rosetta@home|Sending scheduler request: To fetch work. Requesting 21601 seconds of work, reporting 0 completed tasks
5-1-2009 17:28:33|rosetta@home|Scheduler request completed: got 0 new tasks

Can I do something about this, or are there just no new tasks at the moment?
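
For anyone who wants to check a saved copy of these messages rather than read them by eye, here is a minimal sketch. It assumes the pipe-separated layout seen in the two lines above (timestamp, project, message text), which is inferred from this post rather than from any BOINC documentation. The function names parse_message and got_no_work are hypothetical.

```python
def parse_message(line: str) -> dict:
    """Split one message line into its three pipe-separated fields."""
    # maxsplit=2 keeps any later '|' characters inside the message text.
    timestamp, project, text = line.strip().split("|", 2)
    return {"timestamp": timestamp, "project": project, "text": text}


def got_no_work(lines) -> bool:
    """True if the most recent scheduler reply reported 0 new tasks."""
    replies = [
        parse_message(line)["text"]
        for line in lines
        if "Scheduler request completed" in line
    ]
    return bool(replies) and replies[-1].endswith("got 0 new tasks")


# The two lines quoted in this post.
LOG = [
    "5-1-2009 17:28:28|rosetta@home|Sending scheduler request: To fetch work. "
    "Requesting 21601 seconds of work, reporting 0 completed tasks",
    "5-1-2009 17:28:33|rosetta@home|Scheduler request completed: got 0 new tasks",
]

if __name__ == "__main__":
    print(got_no_work(LOG))
```

A True result only says the server had nothing to send at that moment. It does not say whether the cause is on the project side or in the local setup.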

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 58527 - Posted 5 Jan 2009 17:13:23 UTC

It would appear that there are no new tasks at the moment. Be patient and all will be revealed. There hasn't been any announcement from the team, so either there is a problem at their end or they are getting some new work units ready.


Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC