Rosetta@home

Problems with Rosetta version 5.80

Ingemar

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 46153 - Posted 13 Sep 2007 20:58:05 UTC

Please report problems with this version. Thanks!
____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46198 - Posted 14 Sep 2007 15:29:22 UTC

No it isn't.
104430949 94762978 9 Sep 2007
____________
Jmarks

DJStarfox

Joined: Jul 19 07
Posts: 140
ID: 191721
Credit: 564,981
RAC: 148
Message 46202 - Posted 14 Sep 2007 16:12:51 UTC - in response to Message ID 46153.

Please report problems with this version. Thanks!


5.80 needs a lot more memory than previous Betas. BOINC says waiting for memory on a 512MB linux system with 2 CPUs. This did not happen on previous versions of Rosetta. Is this a permanent change? One task runs but the other (second set of threads) below is waiting for memory.

%CPU %MEM    VSZ    RSS STAT START  TIME COMMAND
 100 43.5 356264 224188 RN   10:39 87:56 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 43.5 356264 224188 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 43.5 356264 224188 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 43.5 356264 224188 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu

 0.1 37.7 320764 194128 SN   10:39  0:06 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 37.7 320764 194128 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 37.7 320764 194128 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu
 0.0 37.7 320764 194128 SN   10:39  0:00 rosetta_beta_5.80_i686-pc-linux-gnu
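
A note on reading the listing above: rows that share identical VSZ and RSS values are most likely threads of the same process, so their memory should be counted once rather than four times. The following is a minimal Python sketch under that assumption; it also assumes the columns are exactly as pasted and that VSZ and RSS are in KB, as `ps` normally reports them on Linux. The function name is made up for illustration.

```python
# Rows copied from the post; two tasks, each shown with four threads.
PS_OUTPUT = """\
100 43.5 356264 224188 RN 10:39 87:56 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.1 37.7 320764 194128 SN 10:39 0:06 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
"""


def resident_mb_per_process(ps_text):
    """Return resident memory (MB) per process, counting thread rows once.

    Assumption: rows with the same (VSZ, RSS, COMMAND) are threads of one process.
    """
    seen = set()
    sizes_mb = []
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed rows
        vsz_kb, rss_kb, command = int(fields[2]), int(fields[3]), fields[7]
        key = (vsz_kb, rss_kb, command)
        if key in seen:
            continue
        seen.add(key)
        sizes_mb.append(rss_kb / 1024)
    return sizes_mb


sizes = resident_mb_per_process(PS_OUTPUT)
total_mb = sum(sizes)
print([round(s, 1) for s in sizes], round(total_mb, 1))
```

Read this way, the two tasks together hold roughly 400 MB resident on a 512 MB machine, which is consistent with BOINC letting only one of them run.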

Wits End

Joined: Apr 16 07
Posts: 4
ID: 165531
Credit: 21,839
RAC: 71
Message 46203 - Posted 14 Sep 2007 16:23:08 UTC - in response to Message ID 46153.
Last modified: 14 Sep 2007 16:25:52 UTC

First two WUs under v5.80: first validated (95717876), second failed (95776999)!

David Emigh Profile

Joined: Mar 13 06
Posts: 158
ID: 65176
Credit: 417,178
RAC: 0
Message 46221 - Posted 14 Sep 2007 19:17:29 UTC - in response to Message ID 46203.

First two WUs under v5.80: first validated (95717876), second failed (95776999)!


Perhaps it is only coincidence, but I notice that the failed WU was a Capri WU, the successful WU was not...

____________
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Rayburner

Joined: Oct 4 05
Posts: 32
ID: 2632
Credit: 4,512,828
RAC: 1,918
Message 46222 - Posted 14 Sep 2007 19:18:01 UTC

Hi!

Two validate errors lately. Is there a special reason for that?

http://boinc.bakerlab.org/rosetta/result.php?resultid=105570644

http://boinc.bakerlab.org/rosetta/result.php?resultid=104716132
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46223 - Posted 14 Sep 2007 19:21:39 UTC
Last modified: 14 Sep 2007 19:21:57 UTC

I moved Rayburner's post here. One of those was 5.78; the other was 5.80.
____________
Rosetta Moderator: Mod.Sense

anders n Profile

Joined: Sep 19 05
Posts: 403
ID: 578
Credit: 537,991
RAC: 0
Message 46224 - Posted 14 Sep 2007 20:10:13 UTC

Validate error on this Wu.

Anders n

Mark Henderson

Joined: May 24 06
Posts: 9
ID: 84276
Credit: 643,001
RAC: 0
Message 46230 - Posted 14 Sep 2007 22:22:11 UTC
Last modified: 14 Sep 2007 22:24:49 UTC

I had a compute error today on 5.80 and a watchdog termination on another yesterday using 5.78 on my AMD X2 4800. I have run Rosetta for a long time, and these are the first 2 errors I remember.
____________

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46234 - Posted 14 Sep 2007 23:22:15 UTC
Last modified: 14 Sep 2007 23:25:45 UTC

Here we go again... (1he8__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1he8_-plexinmonomer__2083_1421_0)

I did look at the screen at about 3 hours... I think it said model 1, step 513; the percentage indicator was 95.9x% - 96.xx% and increasing. Nothing was visibly moving in any of the graphic representations.

Watchdog shut down...

~60+ credits requested for ~4 hours on a single core of a Core2Quad, 20 credits granted...

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46239 - Posted 15 Sep 2007 1:06:06 UTC - in response to Message ID 46234.

~60+ credits requested for ~4 hours on a single core of a Core2Quad, 20 credits granted...


OK, great! I'm glad you were able to catch one. Assuming that others behave the same way (a bit of a stretch with only a single one observed, but it's all we have to go by)... the fact that it is still on model one is the reason why the task fails and only 20 credits are granted.

If you had completed several models, then (at least as I understand the design) those completed results would be reported back and credit issued for them. So one of my outstanding questions was, "Is the partial reporting of tasks that run for a while and then fail working properly?" Based on your observation, it sounds like it is working as well as I would have expected. But the long-running single models are basically exhibiting a worst-case scenario, where extensive time is spent and only 20 credits are issued.

Wow, your task shows the score was stuck for 1,800 seconds. I take it Rhiju has increased the timeout for the watchdog.
____________
Rosetta Moderator: Mod.Sense

Ingemar

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 46241 - Posted 15 Sep 2007 1:52:54 UTC

It appears that some of the Capri docking runs get stuck and are terminated by the watchdog. The watchdog seems to do its job; the problem seems to be the simulations. This is the first time we have run large-scale tests of some new simulation modes, and we will have to analyze why some runs get stuck or crash. CAPRI (Critical Assessment of Protein Interactions) is a competition where we try to predict the structure of protein-protein complexes. We have a deadline for submitting our models to this competition coming up soon, and that's why you see so many Capri-something jobs. They will soon be out of the queue.

And yes we did increase the watchdog timeout.
____________
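
For readers unfamiliar with the watchdog mentioned above, the idea can be sketched roughly as follows. This is not Rosetta's actual implementation, only a minimal Python illustration of "end the run if the score has not changed for N seconds". The 1,800-second timeout is the figure quoted earlier in the thread; the class name, method names, and sample scores are hypothetical.

```python
class ScoreWatchdog:
    """Flags a run whose score has not changed for `timeout_s` seconds."""

    def __init__(self, timeout_s=1800.0):
        self.timeout_s = timeout_s
        self.last_score = None
        self.last_change = None

    def update(self, score, now):
        """Record a score sample taken at time `now` (seconds).

        Returns True when the score has been unchanged for at least timeout_s.
        """
        if self.last_score is None or score != self.last_score:
            # First sample, or the score moved: restart the stuck timer.
            self.last_score = score
            self.last_change = now
            return False
        return (now - self.last_change) >= self.timeout_s


watchdog = ScoreWatchdog(timeout_s=1800)
# (time in seconds, score) pairs; the values are made up for illustration.
samples = [(0, -50.0), (600, -42.0), (1200, -42.0), (2400, -42.0)]
verdicts = [watchdog.update(score, t) for t, score in samples]
print(verdicts)
```

A longer timeout makes it less likely that a slow but healthy model is killed, at the cost of more wasted CPU time when a run really is stuck.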

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46242 - Posted 15 Sep 2007 2:19:06 UTC

Again, I'm in it for the science, not the credits. So, if the info I am able to provide is helpful, great. Hope it helps for this round (or the next) of the competition (good luck!)...

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 46245 - Posted 15 Sep 2007 3:09:13 UTC

I continue to get computation errors running Rosetta 5.80

I had very few of these errors over the last few months and recently I have received many of them.

What can I do to correct this condition?

thx

PRaney
____________
Thx!

Paul

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46246 - Posted 15 Sep 2007 3:16:28 UTC - in response to Message ID 46245.
Last modified: 15 Sep 2007 3:17:18 UTC

Seems to be a bunch of:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812A5B

Engaging BOINC Windows Runtime Debugger...




I continue to get computation errors running Rosetta 5.80

I had very few of these errors over the last few months and recently I have received many of them.

What can I do to correct this condition?

thx

PRaney

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46251 - Posted 15 Sep 2007 4:56:20 UTC
Last modified: 15 Sep 2007 4:57:09 UTC

Jim's post refers to this invalid result

Michael Buckingham

Joined: Feb 13 06
Posts: 19
ID: 58569
Credit: 137,935
RAC: 0
Message 46254 - Posted 15 Sep 2007 6:38:58 UTC


One of my BOINC Managers won't let me attach to rosetta...keeps saying project is offline.
____________

Rayburner

Joined: Oct 4 05
Posts: 32
ID: 2632
Credit: 4,512,828
RAC: 1,918
Message 46262 - Posted 15 Sep 2007 11:49:54 UTC

I got 0 credits for this wu: too many results:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=94605647
____________

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 46263 - Posted 15 Sep 2007 12:09:25 UTC

Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory.

What changed in 5.8 to cause the massive memory consumption and all of the computation errors? Can you do anything to pull in the memory requirements? Did the previous versions hold memory requirements at about 128MB per WU?


____________
Thx!

Paul

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46264 - Posted 15 Sep 2007 13:03:09 UTC - in response to Message ID 46263.
Last modified: 15 Sep 2007 13:04:49 UTC

Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory.

What changed in 5.8 to cause the massive memory consumption and all of the computation errors? Can you do anything to pull in the memory requirements? Did the previous versions hold memory requirements at about 128MB per WU?



Go into Your Account and Edit
General preferences
Disk and memory usage
Use at most - 50% of memory when computer is in use
*** Lower this to what you want.

P.S. This post is not about 5.80; you should start a separate thread in 'Number Crunching'.
____________
Jmarks

Beezlebub

Joined: Oct 18 05
Posts: 40
ID: 5335
Credit: 260,375
RAC: 0
Message 46266 - Posted 15 Sep 2007 13:46:43 UTC

This Capri14 WU did "client error" but has a debug readout. Might be useful http://boinc.bakerlab.org/rosetta/result.php?resultid=105518403
____________
e6600 quad @ 2.5ghz
2418 floating point
5227 integer

e6750 dual @ 3.71ghz
3598 floating point
7918 integer


DJStarfox

Joined: Jul 19 07
Posts: 140
ID: 191721
Credit: 564,981
RAC: 148
Message 46267 - Posted 15 Sep 2007 13:47:00 UTC - in response to Message ID 46264.

Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory.

What changed in 5.8 to cause the massive memory consumption and all of the computation errors? Can you do anything to pull in the memory requirements? Did the previous versions hold memory requirements at about 128MB per WU?



Go into Your Account and Edit
General preferences
Disk and memory usage
Use at most - 50% of memory when computer is in use
*** Lower this to what you want.

P.S. This post is not about 5.80; you should start a separate thread in 'Number Crunching'.


I posted for him because I noticed the same thing. My post here went unanswered.
http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3564

Path7

Joined: Aug 25 07
Posts: 128
ID: 201002
Credit: 61,751
RAC: 0
Message 46280 - Posted 15 Sep 2007 16:24:00 UTC - in response to Message ID 46153.

Please report problems with this version. Thanks!


While crunching Rosetta Beta 5.80, WU 95855024, on my (1 core) AMD Sempron 3000+ processor, BOINC reported a “Waiting for memory” error.
My computer (Windows XP-home SP2) has 448 MB of memory, which exceeds the recommended system requirements.

To get rid of this problem, I gave 5.80 more memory by raising the “Use at most 50% of memory when computer is in use” setting to 60% of memory.
This has solved the problem (so far).

O.t.: The screen saver looks like a beautiful piece of art!

Path7.
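
The numbers reported so far fit a simple memory-budget picture. The sketch below is a simplified model, not how the BOINC client actually accounts for memory: it assumes every task needs about 248 MB (the figure Paul reported) and that BOINC may use at most the configured fraction of installed RAM. The function name is hypothetical.

```python
def tasks_that_fit(total_ram_mb, max_mem_fraction, per_task_mb, cpus):
    """Number of tasks that can run at once under a simple memory budget.

    Simplified model: budget = installed RAM * allowed fraction, and each
    task needs a fixed `per_task_mb`. At most one task runs per CPU.
    """
    budget_mb = total_ram_mb * max_mem_fraction
    return min(cpus, int(budget_mb // per_task_mb))


PER_TASK_MB = 248  # per-WU figure reported in this thread; an approximation

fit_448_at_50 = tasks_that_fit(448, 0.50, PER_TASK_MB, cpus=1)   # budget 224 MB
fit_448_at_60 = tasks_that_fit(448, 0.60, PER_TASK_MB, cpus=1)   # budget ~269 MB
fit_512_at_50 = tasks_that_fit(512, 0.50, PER_TASK_MB, cpus=2)   # budget 256 MB
fit_2048_at_50 = tasks_that_fit(2048, 0.50, PER_TASK_MB, cpus=4)  # budget 1024 MB
print(fit_448_at_50, fit_448_at_60, fit_512_at_50, fit_2048_at_50)
```

Under these assumptions, a 448 MB machine cannot start a task at 50% but can at 60%, a 512 MB dual-CPU machine runs one task while the second waits, and a 2 GB quad-core just fits four tasks.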

Gorkan

Joined: Sep 13 07
Posts: 10
ID: 204943
Credit: 151,300
RAC: 0
Message 46281 - Posted 15 Sep 2007 16:35:34 UTC

I dunno, looks like it was chewing on something it didn't want to swallow.
On the plus side, it didn't leave a mess on the floor.

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2944148


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0092541B read attempt to address 0x16481000

Engaging BOINC Windows Runtime Debugger...



********************
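
The negative exit code and the hexadecimal code in reports like the one above are the same 32-bit value, shown once as a signed integer and once as unsigned. A short Python sketch of the conversion follows; the lookup table holds only the two codes that appear in this thread, and the helper name is made up for illustration.

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as the unsigned value Windows prints in hex."""
    return exit_code & 0xFFFFFFFF


# Only the codes quoted in this thread; not a complete table.
CODES_SEEN_IN_THREAD = {
    0xC0000005: "Access Violation",
    0xE06D7363: "C++ exception (reported here as Out Of Memory)",
}

code = to_unsigned32(-1073741819)
print(hex(code), CODES_SEEN_IN_THREAD.get(code, "unknown"))

# The low three bytes of 0xE06D7363 spell "msc" in ASCII, which is how
# exceptions thrown by Microsoft's C++ runtime are tagged.
tag = bytes.fromhex("6d7363").decode("ascii")
print(tag)
```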

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 46283 - Posted 15 Sep 2007 16:38:59 UTC

Thanks for the help with the preferences. I made some changes.

After more investigation of the computation errors, it is clear that of my 5 systems, only the one with a quad-core Q6600 processor is getting the computation errors. Of course, this is also the busiest system.

It looks like about 3 WUs fail for every successful one.

Any suggestions are welcome.

I can provide the debug info if it will help.

http://boinc.bakerlab.org/rosetta/result.php?resultid=105762597
http://boinc.bakerlab.org/rosetta/result.php?resultid=105843964
http://boinc.bakerlab.org/rosetta/result.php?resultid=105811123
http://boinc.bakerlab.org/rosetta/result.php?resultid=105739580

____________
Thx!

Paul

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46286 - Posted 15 Sep 2007 17:27:29 UTC
Last modified: 15 Sep 2007 17:28:34 UTC

Here is a double failure...

t030__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-t030_-plexinmonomer__2083_2234

stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3553867
si
</stderr_txt>
]]>


Validate state Invalid

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46293 - Posted 15 Sep 2007 18:02:12 UTC - in response to Message ID 46262.
Last modified: 15 Sep 2007 18:24:00 UTC

I got 0 credits for this wu: too many results:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=94605647


There is a quirk with the BOINC server software which reissues a task too soon. Seems to only happen when one machine gets a compute error. This is discussed in an existing thread, and there is an item on the BOINC boards to get this corrected.
____________
Rosetta Moderator: Mod.Sense

[RKN] schatten1411 , Mitglied des Teams und des VEREINS Rechenkraft.net

Joined: Apr 25 07
Posts: 12
ID: 169402
Credit: 441,995
RAC: 0
Message 46303 - Posted 15 Sep 2007 20:17:20 UTC

Same error as in 5.78 in the beta as well?

104916668 545978 11 Sep 2007 15:23:03 UTC 14 Sep 2007 5:36:06 UTC Over Success Done 9,430.36 50.13 20.00

I know you are working on it, but what about correcting the errors for the WUs that have already been completed?

RC

Joined: Sep 27 05
Posts: 13
ID: 1401
Credit: 245,498
RAC: 0
Message 46304 - Posted 15 Sep 2007 20:56:47 UTC - in response to Message ID 46239.

OK, great! I'm glad you were able to catch one. Assuming that others behave the same way (a bit of a stretch with only a single one observed, but it's all we have to go by)... the fact that it is still on model one is the reason why the task fails and only 20 credits are granted.


Here's another one
____________

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46319 - Posted 16 Sep 2007 2:06:25 UTC
Last modified: 16 Sep 2007 2:07:36 UTC

And a second "double failure" here

1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-lig_plexinmonomer__2085_1427

stderr out

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2944674
ERROR:: Exit from: .\pose.cc line: 769

</stderr_txt>
]]>

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46320 - Posted 16 Sep 2007 2:14:33 UTC
Last modified: 16 Sep 2007 2:25:26 UTC

It's a good thing I don't care too much for credits as I do the science...

Just look at the amount of time that is being "wasted"...

My wu's are set for 3 hrs (10,800 seconds), and this one ran for over TWICE that, 21,653 seconds !!!

All for 20 credits...

Here it is...

1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-plexinmonomer__2083_3748_0

stderr out

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3582353
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -69.8041 for 1800 seconds
**********************************************************************
GZIP SILENT FILE: .\xx1g4u.out

</stderr_txt>
]]>

Validate state Valid
Claimed credit 93.156397645739
Granted credit 20
application version 5.80
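
Anyone collecting several of these reports may find it easier to pull the key numbers out of the stderr text with a small script. The following is a rough Python sketch, assuming the stderr lines look exactly like the ones pasted above; the function name and dictionary keys are made up for illustration.

```python
import re

# Relevant stderr lines copied from the report above.
STDERR_TXT = """\
# cpu_run_time_pref: 10800
# random seed: 3582353
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -69.8041 for 1800 seconds
**********************************************************************
"""


def summarize(stderr_txt):
    """Extract the run-time preference and the watchdog 'stuck' details, if present."""
    info = {}
    pref = re.search(r"cpu_run_time_pref:\s*(\d+)", stderr_txt)
    if pref:
        info["run_time_pref_s"] = int(pref.group(1))
    stuck = re.search(r"Stuck at score (-?\d+(?:\.\d+)?) for (\d+) seconds", stderr_txt)
    if stuck:
        info["stuck_score"] = float(stuck.group(1))
        info["stuck_for_s"] = int(stuck.group(2))
    return info


info = summarize(STDERR_TXT)
actual_runtime_s = 21653  # CPU time reported in the post
overrun = actual_runtime_s / info["run_time_pref_s"]
print(info, round(overrun, 2))
```

For this task the ratio comes out at just over 2, matching the "over TWICE" observation in the post.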

David Emigh Profile

Joined: Mar 13 06
Posts: 158
ID: 65176
Credit: 417,178
RAC: 0
Message 46323 - Posted 16 Sep 2007 3:50:19 UTC

Here are a couple of failed WUs, both of them Capri14...

WU 95938082
WU 95780562

I think it is important to note that this computer has not had a single failure on non-Capri14 WUs, but has a dismal record of about 7 failures for each 8 attempts with Capri14...
____________
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

tazrt

Joined: Aug 31 06
Posts: 6
ID: 109012
Credit: 468,735
RAC: 0
Message 46336 - Posted 16 Sep 2007 9:09:32 UTC

Hi,
I also have some trouble with 3 Capri-WUs.

2 of them are valid (granted Credit for 6-8h runtime = 20 credits) but have gotten stuck:
WUID:96055341 and WUID:95800425

1 Capri is invalid: Access Violation (0xc0000005)
WUID:95766573

PC is a Q6600 (not oc'ed) with 2GB RAM
Target CPU Runtime:12h.

____________

Daniel

Joined: Nov 4 05
Posts: 1
ID: 9206
Credit: 11,084
RAC: 0
Message 46339 - Posted 16 Sep 2007 9:53:34 UTC

still running 5.80 since friday and no errors

system:
athlon64-3000
win2k sp4
1GB RAM

Rolly

Joined: Dec 31 05
Posts: 4
ID: 45199
Credit: 691,271
RAC: 0
Message 46340 - Posted 16 Sep 2007 10:40:13 UTC

I also noticed a first failure on my system, Result 105943441. It seems the unit also hung somewhere during computation.

I was surprised that this non-Capri unit is also using the Beta core. I understand using the Beta core for a competition on rosetta@home, but for less urgent work units I would think it better to test it on Ralph first?
____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46345 - Posted 16 Sep 2007 12:08:27 UTC

Here are 4 more

http://boinc.bakerlab.org/rosetta/result.php?resultid=104541449
http://boinc.bakerlab.org/rosetta/result.php?resultid=104542777
http://boinc.bakerlab.org/rosetta/result.php?resultid=104585618
http://boinc.bakerlab.org/rosetta/result.php?resultid=104621606

I hope that the few CAPRI14 that actually make it through are worth it.
____________
Jmarks

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,742,648
RAC: 5,672
Message 46346 - Posted 16 Sep 2007 12:14:15 UTC
Last modified: 16 Sep 2007 12:15:02 UTC

Here's another one of those CAPRI units which failed.

http://boinc.bakerlab.org/rosetta/result.php?resultid=105829549

I happen to have looked at graphics when it froze. 82 models were crunched when it failed, model 83 was at step 537. After that it was just waiting for the watchdog to terminate the task.
____________

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 46347 - Posted 16 Sep 2007 12:16:29 UTC

I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition?

I finally disconnected my Q6600 computer from Rosetta and started on other projects. Thus far, no computation errors.

Most of my other computers are Core Duo and they report no issues.

Is anyone else using an optimized boinc client?

Q6600
2GB RAM
500 GB Disk
400 MB Swap < this is very small
XP Home

Is anyone having this problem with Vista 32 or 64?

I will try increasing my swap space to 2GB and see if it corrects the problem.
____________
Thx!

Paul

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46353 - Posted 16 Sep 2007 12:55:25 UTC - in response to Message ID 46347.
Last modified: 16 Sep 2007 13:04:00 UTC

Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"...

I'm running standard Boinc client, Q6600, 2 GB RAM, Swap = 75% of page file, Vista Premium (32).

EDIT--> Just noticed an interesting post here.

I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition?

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46355 - Posted 16 Sep 2007 13:06:11 UTC - in response to Message ID 46353.

Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"...

I'm running standard Boinc client, Q6600, 2 GB RAM, Swap = 75% of page file, Vista Premium (32).

I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition?



I have a dual core e6600 4 gig and 70% of mine are bad also.
____________
Jmarks

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46362 - Posted 16 Sep 2007 14:17:13 UTC - in response to Message ID 46355.

Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"...

I'm running standard Boinc client, Q6600, 2 GB RAM, Swap = 75% of page file, Vista Premium (32).

I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition?



I have a dual core e6600 4 gig and 70% of mine are bad also.


I bet it has more to do with the fact that we have more memory available than other PCs, so we get more of those WUs.
____________
Jmarks

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 46370 - Posted 16 Sep 2007 16:02:05 UTC

I had to abort this one due to 'waiting for memory'. All the others have worked without a problem.

http://boinc.bakerlab.org/rosetta/result.php?resultid=105692089
____________

stewjack

Joined: Apr 23 06
Posts: 39
ID: 78784
Credit: 95,871
RAC: 0
Message 46371 - Posted 16 Sep 2007 16:34:59 UTC - in response to Message ID 46370.
Last modified: 16 Sep 2007 16:41:11 UTC

I had to abort this one due to 'waiting for memory'. All the others have worked without a problem.

http://boinc.bakerlab.org/rosetta/result.php?resultid=105692089


Evan,
I have had the same 'waiting for memory' problem on one out of 3 Capri WUs.
I run a single-core CPU with 512 MB of memory.

My WU is very similar to yours.
http://boinc.bakerlab.org/rosetta/result.php?resultid=105635179


Jack


____________

David Emigh Profile

Joined: Mar 13 06
Posts: 158
ID: 65176
Credit: 417,178
RAC: 0
Message 46373 - Posted 16 Sep 2007 17:18:06 UTC - in response to Message ID 46345.

I hope that the few CAPRI14 that actually make it through are worth it.


I echo that sentiment.

I'm changing my preferences to allow BOINC access to 90% of memory all the time (whether or not the computer is "idle")

____________
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Rayburner

Joined: Oct 4 05
Posts: 32
ID: 2632
Credit: 4,512,828
RAC: 1,918
Message 46374 - Posted 16 Sep 2007 17:22:51 UTC - in response to Message ID 46355.

Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"...

I'm running standard Boinc client, Q6600, 2 GB RAM, Swap = 75% of page file, Vista Premium (32).

I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition?



I have a dual core e6600 4 gig and 70% of mine are bad also.


I think there must be something else.

I am running a QX6700 with 2 GB on Vista Ultimate. Not one of these WUs has shown this memory problem. I am also running several projects at a time (CPDN, malariacontrol, SETI, Einstein, WCG, Rosetta), so there are always several apps in memory (and they stay in memory when tasks are switched; also multiple instances of Rosetta, of course). That is why there is a heavy load on the memory.

I will keep watching my results if such memory problem appears on my machine.

Regards
Rayburner

____________

sslickerson Profile

Joined: Oct 14 05
Posts: 101
ID: 4578
Credit: 484,477
RAC: 0
Message 46377 - Posted 16 Sep 2007 17:48:37 UTC

I've had one CAPRI14 WU fail: 105503053 on this computer but all others have finished fine.

--Timothy

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 46386 - Posted 16 Sep 2007 20:25:25 UTC

Swap space increased from 400MB to 2048MB. The system is attached to the World Community Grid 50% and R@H 50%. 2 success & 1 failure

Rayburner:

Can you try going to 100% on R@H and see if you start getting failures similar to what we see on XP?

I also wonder if the Vista memory manager is better & corrects for this memory conflict between WUs.

It would be good to have a test point with a Q6600 and Vista running 100% R@H.

This sounds like an issue with the XP memory manager, BOINC & large memory work units. If we can isolate the CPU types and OS, we might help find this issue quickly.

JMarks:

What is your config?
e6600
4GB RAM
Swap ??
OS ??




____________
Thx!

Paul

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46392 - Posted 16 Sep 2007 22:09:12 UTC - in response to Message ID 46346.

Here's another one of those CAPRI units which failed.

http://boinc.bakerlab.org/rosetta/result.php?resultid=105829549

I happen to have looked at graphics when it froze. 82 models were crunched when it failed, model 83 was at step 537. After that it was just waiting for the watchdog to terminate the task.


DK/Rhiju, could you look into why this task received only 20 credits? I had thought that if 80 models were completed prior to the failure, these would be reported and used by the project, and credit issued accordingly.
____________
Rosetta Moderator: Mod.Sense

Keith T.

Joined: Mar 1 07
Posts: 37
ID: 150379
Credit: 12,959
RAC: 0
Message 46399 - Posted 17 Sep 2007 0:53:35 UTC

1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-lig_plexinmonomer__2085_4000_0

http://boinc.bakerlab.org/rosetta/result.php?resultid=105842490

Compute error Exit status -1073741819 (0xc0000005)

PC had been running unattended for > 3 hours when this occurred. Screen Saver is Blank Screen.
____________

Gen_X_Accord Profile

Joined: Jun 5 06
Posts: 154
ID: 87850
Credit: 279,018
RAC: 0
Message 46414 - Posted 17 Sep 2007 8:35:47 UTC

The only thing I've noticed strange about 5.80 is that my granted credits are much lower than normal.
____________

Konstantin Iliev

Joined: May 22 06
Posts: 4
ID: 83901
Credit: 2,053,517
RAC: 0
Message 46420 - Posted 17 Sep 2007 12:03:14 UTC
Last modified: 17 Sep 2007 12:04:24 UTC

Lots of Access Violations on one of my computers: http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=225341

Capri units...
____________

hbobeck

Joined: Sep 4 07
Posts: 1
ID: 203472
Credit: 861
RAC: 0
Message 46421 - Posted 17 Sep 2007 12:07:37 UTC - in response to Message ID 46414.

Something is going terribly wrong... 7 validate errors in the last few days! (WUs 95174310, 95809736, 95873445, 96251758, 96251759, 96273830, 96299805).

Any particular reason for this???

Harry

Rayburner

Joined: Oct 4 05
Posts: 32
ID: 2632
Credit: 4,512,828
RAC: 1,918
Message 46442 - Posted 17 Sep 2007 15:28:23 UTC - in response to Message ID 46386.

Swap space increased from 400MB to 2048MB. The system is attached to the World Community Grid 50% and R@H 50%. 2 success & 1 failure

Rayburner:

Can you try going to 100% on R@H and see if you start getting failures similar to what we see on XP?

I also wonder if the Vista memory manager is better & corrects for this memory conflict between WUs.

It would be good to have a test point with a Q6600 and Vista running 100% R@H.

This sounds like an issue with the XP memory manager, BOINC & large memory work units. If we can isolate the CPU types and OS, we might help find this issue quickly.

JMarks:

What is your config?
e6600
4GB RAM
Swap ??
OS ??





I've been running 100% rosetta for the last 10 hours. So far no memory problems but one client error:

http://boinc.bakerlab.org/rosetta/result.php?resultid=106067477


____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 46443 - Posted 17 Sep 2007 15:29:50 UTC
Last modified: 17 Sep 2007 15:30:22 UTC

from ricky@seti.usa posted in the Cafe section

One of my PC's stops running R@H WU's when the screensaver kicks in and I am getting the following message from another PC from BOINC:

9/16/2007 14:23:04|rosetta@home|[error] rosetta_beta not responding to screensaver, requesting exit
9/16/2007 14:23:07|rosetta@home|Task 1mh1__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1mh1_-lig_rxplxn_0585plexinmonomer__2084_138_0 exited with zero status but no 'finished' file
9/16/2007 14:23:07|rosetta@home|If this happens repeatedly you may need to reset the project.
9/16/2007 14:23:07|rosetta@home|Restarting task 1mh1__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1mh1_-lig_rxplxn_0585plexinmonomer__2084_138_0 using rosetta_beta version 580
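
Client messages in this format can be scanned automatically for tasks that exit without a 'finished' file. The following is a minimal Python sketch, assuming every line follows the `date time|project|message` layout shown above; the function names are hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the post above.
LOG = """\
9/16/2007 14:23:04|rosetta@home|[error] rosetta_beta not responding to screensaver, requesting exit
9/16/2007 14:23:07|rosetta@home|Task 1mh1__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1mh1_-lig_rxplxn_0585plexinmonomer__2084_138_0 exited with zero status but no 'finished' file
9/16/2007 14:23:07|rosetta@home|If this happens repeatedly you may need to reset the project.
9/16/2007 14:23:07|rosetta@home|Restarting task 1mh1__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1mh1_-lig_rxplxn_0585plexinmonomer__2084_138_0 using rosetta_beta version 580
"""


def parse_line(line):
    """Split one message into (timestamp, project, text)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"), project, message


def unfinished_exits(log_text):
    """Return (timestamp, project, task name) for each exit without a 'finished' file."""
    hits = []
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # not a message line in the assumed layout
        when, project, message = parse_line(line)
        match = re.match(r"Task (\S+) exited with zero status but no 'finished' file", message)
        if match:
            hits.append((when, project, match.group(1)))
    return hits


hits = unfinished_exits(LOG)
for when, project, task in hits:
    print(when.isoformat(), project, task)
```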



____________

Ingemar

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 46479 - Posted 17 Sep 2007 21:41:44 UTC

A large fraction of the CAPRI-something jobs are failing. We are removing these jobs from the queue now and will not run more of them until we have located the problem. Sorry for the inconvenience!

____________

Ricky@SETI.USA
Avatar

Joined: Dec 13 05
Posts: 18
ID: 36732
Credit: 68,588
RAC: 415
Message 46489 - Posted 18 Sep 2007 0:23:23 UTC

I have an AMD desktop that downloaded 7 WUs 24 hours ago and so far has completed only 1 WU. The problem is that it seems to hang and stop running. At first I thought it was a screensaver problem, but after turning off the screensaver it still hangs; other projects are doing fine.

These WU's all have FIXBACKBONE in their file name. I am thinking of aborting them because I am causing other projects to be late because when R@H hangs nothing gets done.

____________
"Life is like an Ice Cream cone, just when you think you got it licked, it drips all over you!"

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46492 - Posted 18 Sep 2007 1:14:09 UTC
Last modified: 18 Sep 2007 1:27:07 UTC

And a third "double failure" here

Unfortunately, it took my pc 8,256 seconds, while the other pc (a T5500 dual-core) took only 87 seconds to "fail"...

Again, I have to wonder if quad-cores (i.e., Q6600's) fail "bigger" (taking 100 times longer)...

If I had failed at 87 seconds, that would have been 8,169 seconds (2.25 hours) that could have been spent obtaining "valid" results with a different wu...

Why is the same wu failing at two different run times, and at two different points in the program?

1mh1__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1mh1_-lig_plexinmonomer__2085_9238

stderr out

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 2926863
ERROR:: Exit from: .\pose.cc line: 769

</stderr_txt>
]]>

Validate state Invalid

(_KoDAk_) Profile

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 46524 - Posted 18 Sep 2007 15:19:05 UTC

2007-09-18 18:17:30 [rosetta@home] Sending scheduler request: Requested by user
2007-09-18 18:17:30 [rosetta@home] (not requesting new work or reporting completed tasks)
2007-09-18 18:17:35 [rosetta@home] Scheduler RPC succeeded
2007-09-18 18:17:35 [rosetta@home] Message from server: Project encountered internal error: shared memory
2007-09-18 18:17:35 [rosetta@home] Deferring communication for 1 hr 0 min 0 sec
2007-09-18 18:17:35 [rosetta@home] Reason: project is down
2007-09-18 18:17:40 [rosetta@home] [file_xfer] Started upload of file 1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE-1g4u_-nosillyloop_plexinmonomer__2067_8577_0_0
2007-09-18 18:17:40 [rosetta@home] [file_xfer] Started upload of file 1mh1__BOINC_CAPRI14_DOCK_FIXBACKBONE-1mh1_-plexindimer__2067_8698_0_0
2007-09-18 18:17:43 [---] Project communication failed: attempting access to reference site
2007-09-18 18:17:43 [rosetta@home] [file_xfer] Temporarily failed upload of 1mh1__BOINC_CAPRI14_DOCK_FIXBACKBONE-1mh1_-plexindimer__2067_8698_0_0: http error
2007-09-18 18:17:43 [rosetta@home] Backing off 1 hr 29 min 34 sec on upload of file 1mh1__BOINC_CAPRI14_DOCK_FIXBACKBONE-1mh1_-plexindimer__2067_8698_0_0
2007-09-18 18:17:43 [rosetta@home] [file_xfer] Started upload of file t030__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-t030_-lig_rxplxn_1152plexinmonomer__2084_586_0_0
2007-09-18 18:17:44 [---] Access to reference site succeeded - project servers may be temporarily down.
2007-09-18 18:17:45 [---] Project communication failed: attempting access to reference site
2007-09-18 18:17:45 [rosetta@home] [file_xfer] Temporarily failed upload of t030__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-t030_-lig_rxplxn_1152plexinmonomer__2084_586_0_0: http error
2007-09-18 18:17:45 [rosetta@home] Backing off 3 hr 26 min 11 sec on upload of file t030__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-t030_-lig_rxplxn_1152plexinmonomer__2084_586_0_0
2007-09-18 18:17:45 [rosetta@home] [file_xfer] Started upload of file 1g4u__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1g4u_-lig_rxplxn_1036plexinmonomer__2084_2482_0_0
2007-09-18 18:17:47 [---] Access to reference site succeeded - project servers may be temporarily down.
2007-09-18 18:17:47 [rosetta@home] [file_xfer] Temporarily failed upload of 1g4u__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1g4u_-lig_rxplxn_1036plexinmonomer__2084_2482_0_0: http error
2007-09-18 18:17:47 [rosetta@home] Backing off 2 hr 29 min 35 sec on upload of file 1g4u__BOINC_MINIMIZE2_SCORE12_CAPRI14_DOCK_FIXBACKBONE-1g4u_-lig_rxplxn_1036plexinmonomer__2084_2482_0_0
____________

Rayburner

Joined: Oct 4 05
Posts: 32
ID: 2632
Credit: 4,512,828
RAC: 1,918
Message 46527 - Posted 18 Sep 2007 15:49:54 UTC - in response to Message ID 46442.

Swap space increased from 400MB to 2048MB. The system is attached to the World Community Grid 50% and R@H 50%. 2 success & 1 failure

Rayburner:

Can you try going to 100% on R@H and see if you start getting failures similar to what we see on XP?

I also wonder if the Vista memory manager is better & corrects for this memory conflict between WUs.

It would be good to have a test point with a Q6600 and Vista running 100% R@H.

This sounds like an issue with the XP memory manager, BOINC & large memory work units. If we can isolate the CPU types and OS, we might help find this issue quickly.

JMarks:

What is your config?
e6600
4GB RAM
Swap ??
OS ??





I've been running 100% rosetta for the last 10 hours. So far no memory problems but one client error:

http://boinc.bakerlab.org/rosetta/result.php?resultid=106067477



Result of 24 Hours of rosetta only:

45 successes / 2 client errors, both pose loops t30 WUs (4.44% error rate)

In total, of all the WUs I crunched recently: 3 validate errors and 3 client errors (pose loops t30 for the client errors) --> 4.34% error rate
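
The 4.44% figure above matches errors divided by successes (2/45). Dividing by all 47 results instead gives roughly 4.26%. A small Python sketch of both calculations, using only the counts from the post (the variable names are illustrative, not anything from BOINC):

```python
# Counts taken from the 24-hour Rosetta-only run reported above.
successes = 45
client_errors = 2

# Errors relative to successful results (this reproduces the quoted 4.44%).
errors_per_success = client_errors / successes

# Errors relative to all results returned (successes plus errors).
error_rate = client_errors / (successes + client_errors)

print(f"errors per success: {errors_per_success:.2%}")
print(f"error rate overall: {error_rate:.2%}")
```

Which denominator is meant matters little at these counts, but it is worth stating when comparing error rates between hosts.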

____________

BarryAZ

Joined: Dec 27 05
Posts: 151
ID: 43659
Credit: 28,806,508
RAC: 8,542
Message 46532 - Posted 18 Sep 2007 16:11:29 UTC

OK, based on the reports embedded in this thread along with the current shared memory error, I've suspended processing on Rosetta for now and am busily aborting all of the Capri 'bad boy' work units I have out there on workstations (and there are a LOT of them running loose).

I'm wondering though if the better approach, once the Rosetta folks have corrected the shared memory issue and are able to *announce* they have purged the database of the Capri work units, would be to *Reset* Rosetta on workstations. For now, I'm limiting the damage to other projects (by the CPU waste that Capri work units can cause), by the action of suspending Rosetta on the workstations.

Sure would be nice to see some newsflash on this though -- rather than expect folks to wander down here to get the news.


____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 46533 - Posted 18 Sep 2007 17:22:47 UTC

I had some issues with old CAPRI units on 5.78, so I just reset and re-downloaded BOINC; it then got 5.80 and 7 days of work automatically. No comm errors or anything.
____________

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 46552 - Posted 18 Sep 2007 20:30:52 UTC

Sure enough, this CAPRI unit that was still on the server crashed... or was it aborted by Rosie?

9/18/2007 8:25:34 PM|rosetta@home|Reason: Unrecoverable error for result 1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-lig_plexinmonomer__2085_9252_1 ( - exit code -1073741819 (0xc0000005))
9/18/2007 8:25:34 PM|rosetta@home|Computation for task 1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-lig_plexinmonomer__2085_9252_1 finished
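
The exit code in the log line above appears both as a signed decimal (-1073741819) and as hex (0xc0000005). These are the same 32-bit value read two ways, and 0xC0000005 is the Windows access-violation status. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit exit code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


exit_code = -1073741819  # decimal form shown in the log line
print(hex(to_unsigned32(exit_code)))
print(to_signed32(0xC0000005))
```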

____________

Conan Profile

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 46564 - Posted 19 Sep 2007 4:04:19 UTC

My preferences are set to 21,000 seconds (6 hours).
This WU took over 28,000 seconds (nearly 8 hours) but still validated.

It was still successful and reported as Valid.
I claimed 98.93 for the time taken and the work done but was granted just 5.38.

What is the go with that?

I suspect the WU locked up during processing and this is why the time is so long, but as it is still valid I should get full credit.

There is an error report in this result but at the end all came out valid and successful.

Please I would like to know the reason for this result.
____________

BarryAZ

Joined: Dec 27 05
Posts: 151
ID: 43659
Credit: 28,806,508
RAC: 8,542
Message 46579 - Posted 19 Sep 2007 4:37:05 UTC

I see that the server applications are all disabled (as of about 9:30PM PDT) -- what's with that?

____________

BarryAZ

Joined: Dec 27 05
Posts: 151
ID: 43659
Credit: 28,806,508
RAC: 8,542
Message 46581 - Posted 19 Sep 2007 5:22:47 UTC - in response to Message ID 46579.

OK -- one of those planned but unmentioned 'quickies' I guess -- it is up and running now.


____________

Luuklag

Joined: Sep 13 07
Posts: 262
ID: 205058
Credit: 4,171
RAC: 0
Message 46589 - Posted 19 Sep 2007 7:43:37 UTC

Well, I see most of you had problems running CAPRI14_DOCK_FIXBACKBONE WUs, although I ran 4 of them in approximately 3 hours without errors. http://boinc.bakerlab.org/rosetta/results.php?userid=205058

I run on an AMD Athlon(tm) 64 Processor 3200+. Most of you seem to have trouble running on a Q6600, so maybe the problem lies somewhere in there?

Ingemar

Joined: Feb 28 06
Posts: 20
ID: 61985
Credit: 1,680
RAC: 0
Message 46590 - Posted 19 Sep 2007 7:49:21 UTC

We discovered that the result files of the now infamous capri were quite large which has caused some problems. We are looking into this issue now, but rest assured, no more of these jobs are being sent out. So things should be back to normal now that none of these jobs are in the queue.
____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46598 - Posted 19 Sep 2007 10:54:13 UTC - in response to Message ID 46590.

I have a 2 day queue so I guess 'I'll Be Back'
http://boinc.bakerlab.org/rosetta/result.php?resultid=106031266

____________
Jmarks

Conan Profile

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 46604 - Posted 19 Sep 2007 11:19:25 UTC - in response to Message ID 46590.

Hello Ingemar, I was wondering if you can have a look at message 46564 in this thread and explain to me what happened please, as I would like to know.
____________

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 46606 - Posted 19 Sep 2007 11:53:30 UTC - in response to Message ID 46590.

This is the "real world", and even with Ralph@Home you're going to be learning what can be done better.

My posting of 5.80 Capri14 problems was to alert the Project as to what was happening.

Even with a higher than average failure rate, if it benefits the Project, I'm still willing to run these wu's, even at the risk of no credit.

Whatever helps Rosie, and she knows best what that is...

Feet1st Profile

Joined: Dec 30 05
Posts: 1740
ID: 44890
Credit: 2,439,354
RAC: 3,785
Message 46617 - Posted 19 Sep 2007 15:24:07 UTC - in response to Message ID 46590.

Ya! 483 decoys in a 24hr WU! It was like a 3+MB upload. No WONDER the file server has been busy :)
____________

Rhiju
Forum moderator

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 46637 - Posted 19 Sep 2007 18:44:01 UTC - in response to Message ID 46564.

Hi Conan, I think you're exactly right -- the WU must have frozen at some time. We previously saw a high rate of watchdog kills for this type of workunit due to freezing, so that supports your hypothesis. There's clearly some problem with these jobs -- we've discontinued them until we can fix the issue!

____________

Rhiju
Forum moderator

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 46638 - Posted 19 Sep 2007 18:46:38 UTC

Also, I think this has been posted elsewhere: if you have jobs marked CAPRI, you should feel free to abort them. Although most of these workunits are running fine and we are using the incoming data, there is a small possibility of the workunits being frozen or of the output files being so large that the file transfer can get bungled. No more of these workunits are being sent out!


____________

Prom

Joined: Jun 21 06
Posts: 23
ID: 96283
Credit: 509,175
RAC: 0
Message 46650 - Posted 20 Sep 2007 0:18:52 UTC
Last modified: 20 Sep 2007 0:33:55 UTC

Just a theory here.

I noticed that the cpu usage increased to above 50%. I was told that this might happen with graphics running but it seems it continues to do that after the graphics are shut down. The cpu usage is erratic and despite showing high usage the time and results seem to be going slower. This only seems to happen with the capri workunits though, the last one just finished after which things returned to normal. I know there was a graphics problem that is supposed to be fixed in the new versions. Is there any chance that another problem was introduced where the graphics thread continues to run after exiting the graphics?

Spoke too soon. :/
cpu usage is 2/3 on one unit and 1/3 on the other. This after viewing both.

____________

Gatekeeper

Joined: Feb 26 07
Posts: 4
ID: 150007
Credit: 966,551
RAC: 0
Message 46653 - Posted 20 Sep 2007 0:53:17 UTC
Last modified: 20 Sep 2007 1:00:20 UTC

I don't really know if this is a 5.8 problem, a BOINC problem, or what..so..if I'm in the wrong thread, feel free to move this elsewhere.

Just happened to notice that a WU had just completed, so went to my results page to check credit, and in less than 5 minutes, the result was nowhere to be found. All my prior WU's from today's work are there, and my "in progress" work is there, but not this one. Here's the log (edited for brevity):

9/19/2007 2:33:13 PM|rosetta@home|Starting 1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0
9/19/2007 2:33:14 PM|rosetta@home|Starting task 1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0 using rosetta_beta version 580
9/19/2007 5:28:31 PM|rosetta@home|Computation for task 1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0 finished
9/19/2007 5:28:33 PM|rosetta@home|[file_xfer] Started upload of file 1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0_0
9/19/2007 5:28:40 PM|rosetta@home|[file_xfer] Finished upload of file 1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0_0
9/19/2007 5:28:40 PM|rosetta@home|[file_xfer] Throughput 4223 bytes/sec
9/19/2007 5:28:46 PM|rosetta@home|Sending scheduler request: To report completed tasks
9/19/2007 5:28:46 PM|rosetta@home|Reporting 1 tasks
9/19/2007 5:28:51 PM|rosetta@home|Scheduler RPC succeeded [server version 509]
9/19/2007 5:28:51 PM|rosetta@home|Deferring communication for 4 min 2 sec
9/19/2007 5:28:51 PM|rosetta@home|Reason: requested by project
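
For anyone who wants to check run times from a log like this, here is a rough Python sketch that splits the pipe-delimited lines and measures the wall-clock time between the "Starting task" and "finished" entries. It assumes the 12-hour timestamp layout shown above (other hosts in this thread log in 24-hour format), and the names parse_line, STAMP_FORMAT and TASK are illustrative only:

```python
from datetime import datetime

# Timestamp layout assumed from the log lines above: M/D/YYYY h:mm:ss AM/PM.
# %p (AM/PM) is locale-dependent; this assumes an English/C locale.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, STAMP_FORMAT), project, message


TASK = "1a68__SEARCH_PAIRINGS_ROUND2_RESCORE_150_SAVE_ALL_OUT_-1a68_-_BARCODE__2050_7166_0"
log = [
    f"9/19/2007 2:33:14 PM|rosetta@home|Starting task {TASK} using rosetta_beta version 580",
    f"9/19/2007 5:28:31 PM|rosetta@home|Computation for task {TASK} finished",
]

started, _, _ = parse_line(log[0])
finished, _, _ = parse_line(log[1])
elapsed = finished - started  # wall-clock time, not CPU time
print(elapsed)
```

This measures elapsed wall-clock time only; the CPU time reported on the results page can differ if the task was paused or preempted.
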

By 5:33PM (PDT)(i.e., 00:33 UTC) this WU was not in my results list. Since I've never had one disappear THAT fast before, I'm wondering if something is amiss. Unfortunately I have no way to know the result # or WU#...

Thoughts, anyone?

Conan Profile

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 46654 - Posted 20 Sep 2007 1:19:30 UTC - in response to Message ID 46637.

Thanks for that Ingemar and for your response so quickly.
____________

Prom

Joined: Jun 21 06
Posts: 23
ID: 96283
Credit: 509,175
RAC: 0
Message 46656 - Posted 20 Sep 2007 1:55:24 UTC - in response to Message ID 46653.

This workunit by any chance?
It hasn't disappeared, it's on page 3. The workunits are ordered by WU number, not by date. Going by the date, you got an earlier workunit later than you normally would, so you're likely to miss it without digging deeper. :) Funny, this usually only happens when a workunit is reassigned after the deadline.
____________

Gatekeeper

Joined: Feb 26 07
Posts: 4
ID: 150007
Credit: 966,551
RAC: 0
Message 46657 - Posted 20 Sep 2007 2:50:11 UTC - in response to Message ID 46656.


Well, duhhhhhhhhhhh...:)

Never thought to look all that far back..I've not seen R@H do that, thought that behavior was limited to S@H...thanks.

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 46665 - Posted 20 Sep 2007 7:04:49 UTC
Last modified: 20 Sep 2007 7:05:26 UTC

Result ID 106047623
Name t030__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-t030_-plexinmonomer__2083_6224_0
Workunit 96261079
Created 16 Sep 2007 14:15:43 UTC
Sent 16 Sep 2007 14:34:53 UTC
Received 20 Sep 2007 3:58:44 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 26 Sep 2007 14:34:53 UTC
CPU time 23419.390625
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3549877
# cpu_run_time_pref: 21600
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -207.662 for 1800 seconds
**********************************************************************
GZIP SILENT FILE: .\xxt030.out

</stderr_txt>
]]>


Validate state Valid
Claimed credit 97.7521869214656
Granted credit 20
application version 5.80

This WU ran for 23,419 secs for only 20 credits, is this right?
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 46666 - Posted 20 Sep 2007 7:09:22 UTC

OOPS, sorry, have just seen correspondence on this sort of WU. No answer to my query is needed.
____________

Markus Schuhmacher

Joined: May 29 06
Posts: 4
ID: 85181
Credit: 1,455,542
RAC: 0
Message 46685 - Posted 20 Sep 2007 15:06:28 UTC

Since I updated to version 5.10 I have had problems with the software. My laptop runs Windows Vista 32-bit, and I also have a server running Windows 2003 Server EE. On both machines the service crashes. I have to start the service again, and then it works until it crashes again.

I found a text file called stderrdae.txt in the BOINC folder. It contains:





Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0035BED3 read attempt to address 0x00000008

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 5.10.20


Dump Timestamp : 09/19/07 07:01:36
Debugger Engine : 4.0.5.0
Symbol Search Path: F:\Anwendungen\Boinc;F:\Anwendungen\Boinc;srv*C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\symbols*http://boinc.berkeley.edu/symstore


ModLoad: 00400000 000b6000 F:\Anwendungen\Boinc\boinc.exe (5.10.20.0) (-nosymbols- Symbols Loaded)
File Version : 5.10.20
Company Name : Space Sciences Laboratory
Product Name : BOINC client
Product Version: 5.10.20

ModLoad: 7c920000 000c6000 C:\WINDOWS\system32\ntdll.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 7c800000 00115000 C:\WINDOWS\system32\kernel32.dll (5.2.3790.4062) (-exported- Symbols Loaded)
File Version : 5.2.3790.4062 (srv03_sp2_gdr.070417-0203)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.4062

Get Company Name Failed.
ModLoad: 10000000 00013000 F:\Anwendungen\Boinc\zlib1.dll (1.2.3.0) (-exported- Symbols Loaded)
File Version : 1.2.3
Company Name :
Product Name : zlib
Product Version: 1.2.3

ModLoad: 78130000 0009b000 F:\Anwendungen\Boinc\MSVCR80.dll (8.0.50727.762) (-exported- Symbols Loaded)
File Version : 8.00.50727.762
Company Name : Microsoft Corporation
Product Name : Microsoft® Visual Studio® 2005
Product Version: 8.00.50727.762

ModLoad: 77b70000 0005a000 C:\WINDOWS\system32\msvcrt.dll (7.0.3790.3959) (-exported- Symbols Loaded)
File Version : 7.0.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 7.0.3790.3959

ModLoad: 00340000 00030000 F:\Anwendungen\Boinc\libcurl.dll (7.16.4.0) (-exported- Symbols Loaded)
File Version : 7.16.4
Company Name : The cURL library, http://curl.haxx.se/
Product Name : The cURL library
Product Version: 7.16.4

ModLoad: 719c0000 0000a000 C:\WINDOWS\system32\WSOCK32.dll (5.2.3790.0) (-exported- Symbols Loaded)
File Version : 5.2.3790.0 (srv03_rtm.030324-2048)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.0

ModLoad: 71a10000 00017000 C:\WINDOWS\system32\WS2_32.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 71a00000 00008000 C:\WINDOWS\system32\WS2HELP.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 77f30000 000ab000 C:\WINDOWS\system32\ADVAPI32.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 77c20000 0009f000 C:\WINDOWS\system32\RPCRT4.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 76e40000 00013000 C:\WINDOWS\system32\Secur32.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 76990000 0002e000 C:\WINDOWS\system32\WINMM.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 77e20000 00092000 C:\WINDOWS\system32\USER32.dll (5.2.3790.4033) (-exported- Symbols Loaded)
File Version : 5.2.3790.4033 (srv03_sp2_gdr.070228-0030)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.4033

ModLoad: 77bd0000 00048000 C:\WINDOWS\system32\GDI32.dll (5.2.3790.4033) (-exported- Symbols Loaded)
File Version : 5.2.3790.4033 (srv03_sp2_gdr.070228-0030)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.4033

ModLoad: 004c0000 000fd000 F:\Anwendungen\Boinc\LIBEAY32.dll (0.9.8.5) (-exported- Symbols Loaded)
File Version : 0.9.8e
Company Name : The OpenSSL Project, http://www.openssl.org/
Product Name : The OpenSSL Toolkit
Product Version: 0.9.8e

ModLoad: 00380000 00030000 F:\Anwendungen\Boinc\SSLEAY32.dll (0.9.8.5) (-exported- Symbols Loaded)
File Version : 0.9.8e
Company Name : The OpenSSL Project, http://www.openssl.org/
Product Name : The OpenSSL Toolkit
Product Version: 0.9.8e

ModLoad: 7c420000 00087000 F:\Anwendungen\Boinc\MSVCP80.dll (8.0.50727.762) (-exported- Symbols Loaded)
File Version : 8.00.50727.762
Company Name : Microsoft Corporation
Product Name : Microsoft® Visual Studio® 2005
Product Version: 8.00.50727.762

ModLoad: 76180000 0001d000 C:\WINDOWS\system32\IMM32.DLL (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 719d0000 00008000 C:\WINDOWS\system32\rdpsnd.dll (5.2.3790.0) (-exported- Symbols Loaded)
File Version : 5.2.3790.0 (srv03_rtm.030324-2048)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.0

ModLoad: 779b0000 00011000 C:\WINDOWS\system32\WINSTA.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 71a50000 00057000 C:\WINDOWS\system32\NETAPI32.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 76a60000 0000b000 C:\WINDOWS\system32\PSAPI.DLL (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 00b00000 00008000 F:\Anwendungen\Boinc\boinc.dll (5.10.20.0) (-exported- Symbols Loaded)
File Version : 5.10.20
Company Name : Space Sciences Laboratory
Product Name : BOINC Core Client
Product Version: 5.10.20

ModLoad: 71930000 00042000 C:\WINDOWS\System32\mswsock.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 76dc0000 0002b000 C:\WINDOWS\system32\DNSAPI.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 76e60000 00007000 C:\WINDOWS\System32\winrnr.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 76e00000 0002f000 C:\WINDOWS\system32\WLDAP32.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 76e70000 00005000 C:\WINDOWS\system32\rasadhlp.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 68000000 00035000 C:\WINDOWS\system32\rsaenh.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 76810000 000c4000 C:\WINDOWS\system32\USERENV.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 073e0000 0005b000 C:\WINDOWS\system32\hnetcfg.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Betriebssystem Microsoft® Windows®
Product Version: 5.2.3790.3959

ModLoad: 718f0000 00008000 C:\WINDOWS\System32\wshtcpip.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959

ModLoad: 03000000 00115000 F:\Anwendungen\Boinc\dbghelp.dll (6.6.7.5) (-exported- Symbols Loaded)
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version: 6.6.0007.5

ModLoad: 01d00000 00083000 F:\Anwendungen\Boinc\symsrv.dll (6.6.7.5) (-exported- Symbols Loaded)
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version: 6.6.0007.5

ModLoad: 00e40000 0003a000 F:\Anwendungen\Boinc\srcsrv.dll (6.6.7.5) (-exported- Symbols Loaded)
File Version : 6.6.0007.5 (debuggers(dbg).051021-1446)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version: 6.6.0007.5

ModLoad: 77b60000 00008000 C:\WINDOWS\system32\version.dll (5.2.3790.3959) (-exported- Symbols Loaded)
File Version : 5.2.3790.3959 (srv03_sp2_rtm.070216-1710)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version: 5.2.3790.3959



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 14315, Write: 0, Other 69318

- I/O Transfers Counters -
Read: 0, Write: 438027, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 22084, QuotaPeakPagedPoolUsage: 24288
QuotaNonPagedPoolUsage: 36384, QuotaPeakNonPagedPoolUsage: 40536

- Virtual Memory Usage -
VirtualSize: 31211520, PeakVirtualSize: 32260096

- Pagefile Usage -
PagefileUsage: 6230016, PeakPagefileUsage: 7655424

- Working Set Size -
WorkingSetSize: 8196096, PeakWorkingSetSize: 9461760, PageFaultCount: 18857

*** Dump of the thread (1aa4): ***

- Information -
Status: Waiting, Wait Reason: Executive, Kernel Time: 1250000.000000, User Time: 781250.000000, Wait Time: 195426816.000000

- Registers -
eax=00000070 ebx=00000000 ecx=00000003 edx=0012f758 esi=00000000 edi=000000d8
eip=7c9485ec esp=0012fb14 ebp=0012fb7c
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0012fb7c 77f45edb 000000d8 0012fc40 0000021a 0012fba4 ntdll!KiFastSystemCallRet+0x0
0012fba8 77f45f82 000000d8 0012fc40 0000021a 0012fbfc ADVAPI32!LookupPrivilegeValueW+0x0
0012fc1c 77f975af 000000d8 0012fc40 0000021a 003e4c20 ADVAPI32!LookupPrivilegeValueW+0x0
0012fe60 00430414 0012fe84 0046fc58 00000001 0012ffc0 ADVAPI32!StartServiceCtrlDispatcherA+0x0
0012fe90 78136d6c 96365185 003e27cc 003e27b8 7c94758b boinc!+0x0
00000000 00000000 00000000 00000000 00000000 00000000 MSVCR80!_msize+0x0

*** Dump of the thread (1cfc): ***

- Information -
Status: Waiting, Wait Reason: ExecutionDelay, Kernel Time: 10000000.000000, User Time: 11562500.000000, Wait Time: 195428176.000000

- Registers -
eax=01c7fa7a ebx=00465db0 ecx=27f08631 edx=01c7fa7a esi=00000000 edi=00affe68
eip=7c9485ec esp=00affe28 ebp=00affe90
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
00affe90 7c8024ed 0000000a 00000000 00afff30 0043f7c2 ntdll!KiFastSystemCallRet+0x0
00affea0 0043f7c2 0000000a 0000000a 00000000 0040c9bd kernel32!Sleep+0x0
00afff30 004300a1 00000000 3ff00000 77f46253 00000000 boinc!+0x0
00afff98 004397b7 00000000 00156650 77f45e91 00000001 boinc!+0x0
00afffb8 7c824829 00156650 00000000 00000000 00156650 boinc!+0x0
00afffec 00000000 77f45e70 00156650 00000000 00905a4d kernel32!GetModuleHandleA+0x0

*** Dump of the thread (1bc0): ***

- Information -
Status: Waiting, Wait Reason: UserRequest, Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 195428176.000000

- Registers -
eax=003e20b0 ebx=015cfb00 ecx=00000004 edx=003e219c esi=000000b4 edi=00000000
eip=7c9485ec esp=015cf7e4 ebp=015cf854
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
015cf854 7c821c8d 000000b4 ffffffff 00000000 015cfad8 ntdll!KiFastSystemCallRet+0x0
015cf868 00444937 000000b4 ffffffff 00000000 7c88b7c0 kernel32!WaitForSingleObject+0x0
015cfad8 7c8392a3 015cfb00 7c821ac1 015cfb08 00000000 boinc!+0x0
015cffec 00000000 781329e1 00b387f0 00000000 000000c8 kernel32!QueryMemoryResourceNotification+0x0

*** Dump of the thread (160c): ***

- Information -
Status: Waiting, Wait Reason: EventPairLow, Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 195428176.000000

- Registers -
eax=719358ab ebx=c0000000 ecx=00000000 edx=00000000 esi=00000000 edi=719691fc
eip=7c9485ec esp=00e3ff80 ebp=00e3ffb8
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
00e3ffb8 7c824829 00159490 00000000 00000000 00159490 ntdll!KiFastSystemCallRet+0x0
00e3ffec 00000000 719358ab 00159490 00000000 00905a4d kernel32!GetModuleHandleA+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46699 - Posted 20 Sep 2007 19:59:32 UTC

I moved Markus' post here from Q&A boards. Sorry the post is so long.

Looks like one of the CAPRI WUs may have caused the reported msgs.

See Rhiju's post below/above. There have been some problems with these tasks on some machines and so they've stopped sending them out.

Markus, could you post links to the two specific hosts (and the specific tasks if possible) where you had problems?
____________
Rosetta Moderator: Mod.Sense

Feet1st Profile

Joined: Dec 30 05
Posts: 1740
ID: 44890
Credit: 2,439,354
RAC: 3,785
Message 46700 - Posted 20 Sep 2007 20:12:29 UTC

Tell me there's some mistake here in the native structure shown
(linky to screenshot with a rather straight native structure shown for a t015_1_NMRREF_1_t015_1_id_model_07_idlIGNORE_THE_REST_core_2097_3599_0 task).

____________
If having a DC project with BOINC is of interest to you, with volunteer or cloud computing resources, but have no time for the BOINC learning curve,
use a hosting service that understands BOINC projects: http://DeepSci.com

Conan Profile

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 46707 - Posted 20 Sep 2007 23:20:05 UTC - in response to Message ID 46665.

Result ID 106047623
Name t030__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-t030_-plexinmonomer__2083_6224_0
Workunit 96261079
Created 16 Sep 2007 14:15:43 UTC
Sent 16 Sep 2007 14:34:53 UTC
Received 20 Sep 2007 3:58:44 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 510574
Report deadline 26 Sep 2007 14:34:53 UTC
CPU time 23419.390625
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3549877
# cpu_run_time_pref: 21600
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score -207.662 for 1800 seconds
**********************************************************************
GZIP SILENT FILE: .\xxt030.out

</stderr_txt>
]]>


Validate state Valid
Claimed credit 97.7521869214656
Granted credit 20
application version 5.80

This WU ran for 23,419 secs for only 20 credits. Is this right?


Well, Michael, at least you got 20; mine ran for over 28,000 sec and got 5.38.
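
To put the numbers in the result above on a common scale, here is a minimal sketch that converts credit and CPU time into credit per CPU hour. The function name `credits_per_hour` is made up for illustration; the inputs are the CPU time, claimed credit, and granted credit quoted in the result.

```python
def credits_per_hour(credit: float, cpu_seconds: float) -> float:
    """Return credit earned per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


# Figures copied from the quoted result (CPU time 23419.390625 s).
cpu_seconds = 23419.390625
granted_rate = credits_per_hour(20, cpu_seconds)
claimed_rate = credits_per_hour(97.7521869214656, cpu_seconds)

print(f"granted: {granted_rate:.2f} credits/hour")
print(f"claimed: {claimed_rate:.2f} credits/hour")
```

This only restates the posted figures as rates; it says nothing about how the project actually decides granted credit.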

____________

teemac Profile

Joined: Jul 18 06
Posts: 1
ID: 100689
Credit: 192,962
RAC: 0
Message 46710 - Posted 21 Sep 2007 2:48:08 UTC

I have 5 machines on Rosetta at present:

3x Intel E4300 Core2Duos - Kubuntu v7.04 (64-bit), 1 GB RAM - these machines have been locking one core, and sometimes both cores, over the last day or so. I have aborted all WUs with the word CAPRI in them. Nearly all work units with 'IGNORE THE REST' in the unit's name are also locking and freezing cores, or completely locking machines, with an error message saying something like 'if this keeps happening you may need to reset the project'.

1x AMD X2/4600 - Kubuntu v7.04 (64-bit), 1 GB RAM - this machine is mostly OK - no locking but some errored WUs.

1x AMD 3200+ - Kubuntu v7.04 (32-bit), 512 MB RAM - same as the 4600 machine above.

I currently have 2 of the E4300s locked - no work ticking over for the last hour or so. One of the machines is totally locked and I am unable to use the OS at all; the other machine only has BOINC locked up, but I can use the OS.


____________

hugothehermit

Joined: Sep 26 05
Posts: 238
ID: 1310
Credit: 314,893
RAC: 0
Message 46716 - Posted 21 Sep 2007 9:44:23 UTC

I noticed that the ...CAPRI14_DOCK... native looked wrong; the structures were too far apart to be interacting at all, compared with what I have seen before.

Wits End

Joined: Apr 16 07
Posts: 4
ID: 165531
Credit: 21,839
RAC: 71
Message 46727 - Posted 21 Sep 2007 17:13:17 UTC
Last modified: 21 Sep 2007 17:15:35 UTC

Of the eight post-CAPRI WUs that I've returned, two produced "validate errors". I received credit for the other six
but they all had "watchdog shutting down" notes, and one had "WARNING! Not sure non-ideal rotamers are compatible
with symmetry yet..." What's going on?!?

107006854: Validate error
106890130: Watchdog notice
106794699: Watchdog and Warning notices
106724332: Validate error
106613376: Watchdog notice
106550676: Watchdog notice
106521483: Watchdog notice
106514350: Watchdog notice

anders n Profile

Joined: Sep 19 05
Posts: 403
ID: 578
Credit: 537,991
RAC: 0
Message 46732 - Posted 21 Sep 2007 18:29:25 UTC - in response to Message ID 46727.

Of the eight post-CAPRI WUs that I've returned, two produced "validate errors". I received credit for the other six
but they all had "watchdog shutting down" notes, and one had "WARNING! Not sure non-ideal rotamers are compatible
with symmetry yet..." What's going on?!?

107006854: Validate error
106890130: Watchdog notice
106794699: Watchdog and Warning notices
106724332: Validate error
106613376: Watchdog notice
106550676: Watchdog notice
106521483: Watchdog notice
106514350: Watchdog notice


See this post about no more Capri for now.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46733 - Posted 21 Sep 2007 18:53:09 UTC - in response to Message ID 46732.

RE: Watchdog notice


Normal. Been that way since the watchdog was implemented. Since the watchdog runs in a separate thread, this message just confirms that the watchdog thread properly ended as the task was completed.

So it is just saying everything ended normally, including the watchdog.
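
For anyone curious how a watchdog in a separate thread can behave this way, here is a minimal sketch. It is not Rosetta's actual code: the class `Watchdog`, its parameters, and the exact log strings are assumptions made for illustration. The thread polls a score; if the score does not change for `stuck_limit_s` seconds it flags the run as stuck, and if the main thread finishes first it simply logs that it is shutting down.

```python
import threading
import time


class Watchdog(threading.Thread):
    """Hypothetical watchdog: flags a run whose score stops changing."""

    def __init__(self, get_score, stuck_limit_s, poll_s=0.01):
        super().__init__(daemon=True)
        self.get_score = get_score          # callable returning the current score
        self.stuck_limit_s = stuck_limit_s  # seconds without change before giving up
        self.poll_s = poll_s                # how often to check the score
        self.done = threading.Event()       # set by the main thread on normal completion
        self.stuck = threading.Event()      # set by the watchdog when the score is stuck
        self.log = []

    def run(self):
        last_score = self.get_score()
        last_change = time.monotonic()
        # wait() returns True as soon as the main thread sets `done`.
        while not self.done.wait(self.poll_s):
            score = self.get_score()
            now = time.monotonic()
            if score != last_score:
                last_score, last_change = score, now
            elif now - last_change >= self.stuck_limit_s:
                self.log.append(
                    f"Stuck at score {score} for {self.stuck_limit_s} seconds")
                self.stuck.set()
                return
        # Normal path: the task finished, so the watchdog just ends too.
        self.log.append("Watchdog shutting down")


# Normal completion: the main thread finishes before the limit is reached.
state = {"score": -207.662}
wd = Watchdog(lambda: state["score"], stuck_limit_s=1800)
wd.start()
# ... the main thread would compute models here ...
wd.done.set()
wd.join()
print(wd.log)
```

In this sketch the shutdown message is only written on the normal path, which matches the explanation above: seeing it means the task and the watchdog both ended cleanly.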
____________
Rosetta Moderator: Mod.Sense

mdettweiler

Joined: Oct 15 06
Posts: 33
ID: 118931
Credit: 2,509
RAC: 0
Message 46808 - Posted 22 Sep 2007 20:15:10 UTC - in response to Message ID 46733.

RE: Watchdog notice


Normal. Been that way since the watchdog was implemented. Since the watchdog runs in a separate thread, this message just confirms that the watchdog thread properly ended as the task was completed.

So it is just saying everything ended normally, including the watchdog.

When the watchdog has to end a task, is it of any use at all to the project scientifically, or is it practically aborted? I think I heard that the watchdog will abort a task if it goes a given amount of times longer than your preferred runtime, regardless of whether the application is showing visible progress; is this true? If so, are those terminated results useful at all?
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 46813 - Posted 22 Sep 2007 21:38:56 UTC - in response to Message ID 46808.

RE: Watchdog notice


Normal. Been that way since the watchdog was implemented. Since the watchdog runs in a separate thread, this message just confirms that the watchdog thread properly ended as the task was completed.

So it is just saying everything ended normally, including the watchdog.

When the watchdog has to end a task, is it of any use at all to the project scientifically, or is it practically aborted? I think I heard that the watchdog will abort a task if it goes a given amount of times longer than your preferred runtime, regardless of whether the application is showing visible progress; is this true? If so, are those terminated results useful at all?


oh...good question..has me wondering the same thing.
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 46817 - Posted 23 Sep 2007 2:00:28 UTC
Last modified: 23 Sep 2007 2:04:55 UTC

Result ID 106760639
Name NeT6__BOINC_SYMM_FOLD_AND_DOCK_RELAX-NeT6_-mfr__2100_7176_0
Workunit 96937615

Validate state Valid
Claimed credit 91.5795040403045
Granted credit 48.2594968464685
application version 5.80

Never seen such a big difference between claimed and granted credits, unless the WU failed in some way, but I don't see any sign of that. Anyone got any ideas?

____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46823 - Posted 23 Sep 2007 5:14:07 UTC - in response to Message ID 46808.

When the watchdog has to end a task, is it of any use at all to the project scientifically, or is it practically aborted?


Results are always useful. I exchanged some emails with Chu some time ago and collected some details on the watchdog. I'll compile them into the FAQ and post them shortly.

Even knowing that a given approach does not function as expected is important. This is why Rosetta considers all results useful and meaningful, and attempts to issue credit to participants for their assistance in making such a determination.
____________
Rosetta Moderator: Mod.Sense

Markus Schuhmacher

Joined: May 29 06
Posts: 4
ID: 85181
Credit: 1,455,542
RAC: 0
Message 46841 - Posted 23 Sep 2007 10:38:29 UTC - in response to Message ID 46699.
Last modified: 23 Sep 2007 10:42:53 UTC

I moved Markus' post here from Q&A boards. Sorry the post is so long.

Looks like one of the CAPRI WUs may have caused the reported msgs.

See Rhiju's post below/above. There have been some problems with these tasks on some machines and so they've stopped sending them out.

Markus, could you post links to the two specific hosts (and the specific tasks if possible) where you had problems?


Sorry, I've been wondering where my post had gone. The two machines are

http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=603857
http://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=509233

How can I figure out which workunit was in progress at the time?
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 46857 - Posted 23 Sep 2007 14:39:30 UTC

Explanation of the watchdog added to the FAQ here. Please post any comments or suggestions about it here.
____________
Rosetta Moderator: Mod.Sense

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 46924 - Posted 24 Sep 2007 18:20:29 UTC

Serious problems with this WU from 5.80 (non-CAPRI).
____________

Jmarks Profile

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 46965 - Posted 25 Sep 2007 11:31:59 UTC

Left over CAPRI14
http://boinc.bakerlab.org/rosetta/result.php?resultid=106850152
____________
Jmarks

Andrii Muliar Profile

Joined: Nov 10 05
Posts: 12
ID: 10952
Credit: 7,050,629
RAC: 583
Message 47024 - Posted 26 Sep 2007 14:12:17 UTC - in response to Message ID 47023.

I forgot to say: I have a Core Duo processor, an ADSL connection, and Windows XP SP2 as the operating system.
____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,273,242
RAC: 5,666
Message 47066 - Posted 27 Sep 2007 1:21:40 UTC

beat__BOINC_JUMPRELAX_BARCODE2_CONSTRAINT-beat_-_1951_67075_0 ( workunit 98293407 ) stuck on 0% for >1 hour on an Intel iMac2 under OS X 10.4.10 with Boinc 5.10.20. Aborting.

____________

Paul

Joined: Oct 29 05
Posts: 155
ID: 7397
Credit: 12,392,717
RAC: 1,890
Message 47138 - Posted 28 Sep 2007 13:11:16 UTC

I noticed that all of my work units are now based on the 5.80 Beta again. The last few days, it looked like most of them were an older version of the application.

This morning, I have 5 out of 6 units with Compute Error status.

The computer ID is 43057

Can someone please look into this situation? It is very frustrating to have so much CPU time wasted. I just refocused 100% of this computer on R@H because it looked like the problems were fixed. If we are going back to compute errors, it would appear to be a better use of resources to focus this CPU on other projects until R@H is fixed.

What is the problem with all the failed WUs and 5.8?

____________
Thx!

Paul

Nothing But Idle Time

Joined: Sep 28 05
Posts: 209
ID: 1675
Credit: 139,545
RAC: 0
Message 47142 - Posted 28 Sep 2007 13:48:47 UTC

beat__BOINC_JUMPRELAX_BARCODE2_CONSTRAINT-beat_-_1951_61847_0
WU 108207857 v.5.80
Ran 21% over my specified run time preference; never saw this before.

Purple Rabbit Profile

Joined: Sep 24 05
Posts: 28
ID: 923
Credit: 1,634,990
RAC: 602
Message 47153 - Posted 28 Sep 2007 17:44:48 UTC
Last modified: 28 Sep 2007 18:27:48 UTC

I think this is the same problem Paul is describing.

I've had two WUs error out with "error 1" after an hour or two of processing. These are from two
different Linux computers. I had a third one complete successfully on yet another Linux computer. These are the only "v001" WUs
I've finished. Looking at Paul's computer, he is also seeing some success and some failure with "v001".

All my computers are running Suse 10. The first bad one I shrugged off as gremlins. The second happened less than an hour later
so it looked like a trend :-) I've got 5 more of these puppies waiting...sigh

Bad:

v001_1_NMRREF_1_v001_1_id_model_13IGNORE_THE_REST_idl_2125_1698 Bad #1

v001_1_NMRREF_1_v001_1_id_model_10IGNORE_THE_REST_idl_2125_1167 Bad #2

Good:

v001_1_NMRREF_1_v001_1_id_model_20IGNORE_THE_REST_idl_2125_424 Good #1
____________

Greenshit

Joined: Jan 30 07
Posts: 3
ID: 144575
Credit: 55,173
RAC: 0
Message 47164 - Posted 28 Sep 2007 19:18:50 UTC

Compute error here:
http://boinc.bakerlab.org/rosetta/result.php?resultid=108782014

- exit code -1073741819 (0xc0000005)
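
In case the two forms of the exit code look unrelated: the negative decimal number and the hex value are the same 32-bit pattern, read as signed and unsigned respectively. A minimal sketch of the conversion (the helper name `to_unsigned_hex` is made up for illustration):

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_unsigned_hex(-1073741819))
```

Masking with `0xFFFFFFFF` keeps the low 32 bits, which is all the conversion needs.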
____________

The_Bad_Penguin

Joined: Jun 5 06
Posts: 2747
ID: 89694
Credit: 1,859,902
RAC: 0
Message 47184 - Posted 29 Sep 2007 2:47:09 UTC
Last modified: 29 Sep 2007 3:41:27 UTC

Got some compute / validate errors:

v001_1_NMRREF_1_v001_1_id_model_11IGNORE_THE_REST_idl_2125_4843

HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_9929

MFR_SYMM_FOLD_AND_DOCK_RELAX_gr83_2110_47912

Purple Rabbit Profile

Joined: Sep 24 05
Posts: 28
ID: 923
Credit: 1,634,990
RAC: 602
Message 47230 - Posted 29 Sep 2007 22:45:43 UTC - in response to Message ID 47153.
Last modified: 29 Sep 2007 23:09:42 UTC

...I've got 5 more of these puppies waiting...sigh...

An update:

All five of the puppies finished successfully. One of my bad WUs was successfully completed by another computer. The other bad one died
a second time and was put to rest.

With the scattered reports of occasional failures I'm guessing this is probably an initial conditions (random seed) problem
and/or 5.80 not being able to handle the output for particular starting conditions. Things aren't totally broken for "v001",
but something ain't quite right :-)

Rick

Can the forum moderator fix the formatting for this thread? It's way off my screen to the right. I've spent some time
adding hard returns to make my posts more readable.
____________

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 47233 - Posted 30 Sep 2007 8:01:35 UTC

Add this one to the list. Watchdog is not activated.

v001_1_NMRREF_1_v001_1_id_model_03IGNORE_THE_REST_idl_2125_1909
____________

Dotsch

Joined: Feb 12 06
Posts: 102
ID: 58347
Credit: 122,591
RAC: 46
Message 47235 - Posted 30 Sep 2007 9:39:56 UTC

I had a 5.8 WU which reset to 0% after stopping and restarting the BOINC client.
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 47263 - Posted 1 Oct 2007 9:53:38 UTC

Result ID 108706598
Name v001_1_NMRREF_1_v001_1_id_model_18IGNORE_THE_REST_idl_2125_859_0
Workunit 98755213
Created 28 Sep 2007 6:18:07 UTC
Sent 28 Sep 2007 6:19:01 UTC
Received 1 Oct 2007 9:37:48 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 510574
Report deadline 8 Oct 2007 6:19:01 UTC
CPU time 2098.765625
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1801142


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00853013 read attempt to address 0xE9C9ED2C

Engaging BOINC Windows Runtime Debugger...




____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 47269 - Posted 1 Oct 2007 12:45:51 UTC - in response to Message ID 47235.

I had a 5.8 WU which reset to 0% after stopping and restarting the BOINC client.


This would be normal if the task had not completed the first model, and not reached a checkpoint.

You will always lose something when you exit BOINC. For most tasks, and most machine configurations, you will lose less than about 15 min. But for some types of tasks, it can be more on the order of an hour.
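
To illustrate why a restart can land back at 0%, here is a minimal sketch of checkpoint-and-resume. It is an assumption-laden toy, not how Rosetta or BOINC store their state: `run_task`, the JSON file, and the step counts are all invented for the example. Progress is only saved at checkpoints, so anything done since the last one (or everything, if none was reached) is lost on exit.

```python
import json
import os
import tempfile


def run_task(total_steps, checkpoint_every, path, stop_after=None):
    """Run steps, saving progress at checkpoints; resume from `path` if it exists.

    `stop_after` simulates exiting the client after that many steps in this session.
    Returns the step the task had reached when it stopped or finished.
    """
    step = 0
    if os.path.exists(path):
        with open(path) as f:
            step = json.load(f)["step"]  # resume from the last saved checkpoint
    executed = 0
    while step < total_steps:
        if stop_after is not None and executed >= stop_after:
            return step  # client exits; unsaved progress is lost
        step += 1
        executed += 1
        if step % checkpoint_every == 0:
            with open(path, "w") as f:
                json.dump({"step": step}, f)
    return step


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")
    reached = run_task(100, 30, ckpt, stop_after=20)   # exit before first checkpoint
    resumed_at = run_task(100, 30, ckpt, stop_after=0) # restart, do no new work
    print(f"reached step {reached}, resumed at step {resumed_at}")
```

With a checkpoint every 30 steps, stopping at step 20 means the restart begins from step 0, while stopping at step 70 would resume from step 60.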
____________
Rosetta Moderator: Mod.Sense

Dotsch

Joined: Feb 12 06
Posts: 102
ID: 58347
Credit: 122,591
RAC: 46
Message 47274 - Posted 1 Oct 2007 13:38:17 UTC - in response to Message ID 47269.

I had a 5.8 WU which reset to 0% after stopping and restarting the BOINC client.


This would be normal if the task had not completed the first model, and not reached a checkpoint.

You will always lose something when you exit BOINC. For most tasks, and most machine configurations, you will lose less than about 15 min. But for some types of tasks, it can be more on the order of an hour.

The task was about 60 to 70% complete (at about 2 hours of computing time), so I don't think this is normal.
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 47276 - Posted 1 Oct 2007 14:20:34 UTC

And now another!

Result ID 108699749
Name v001_1_NMRREF_1_v001_1_id_model_18IGNORE_THE_REST_idl_2125_625_0
Workunit 98748895
Created 28 Sep 2007 5:36:50 UTC
Sent 28 Sep 2007 5:37:48 UTC
Received 1 Oct 2007 14:18:38 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 510574
Report deadline 8 Oct 2007 5:37:48 UTC
CPU time 13589.703125
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1801376


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00715B96 read attempt to address 0x5698A8EE

Engaging BOINC Windows Runtime Debugger...




____________

Christoph Jansen Profile

Joined: Jun 6 06
Posts: 248
ID: 91851
Credit: 267,153
RAC: 0
Message 47295 - Posted 1 Oct 2007 21:06:06 UTC
Last modified: 1 Oct 2007 21:06:35 UTC

This one:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=98865167

keeps on stopping, meaning the CPU time stops counting up indefinitely, so the watchdog does not shut it down.

I had that on another one overnight; it must have been stuck for some hours. I didn't note the number, though.

Alexander

Joined: May 29 07
Posts: 1
ID: 180893
Credit: 119,573
RAC: 0
Message 47313 - Posted 2 Oct 2007 10:07:54 UTC

5.80 repeatedly crashed. My computer ID = 519167.

The following is the crash information:

Faulting application rosetta_beta_5.80_windows_intelx86.exe, version 0.0.0.0, faulting module ntdll.dll, version 5.1.2600.2180, fault address 0x00013396.

<?xml version="1.0" encoding="UTF-16"?>
<DATABASE>
<EXE NAME="rosetta_beta_5.80_windows_intelx86.exe" FILTER="GRABMI_FILTER_PRIVACY">
<MATCHING_FILE NAME="rosetta_5.69_windows_intelx86.exe" SIZE="2570240" CHECKSUM="0x57279008" MODULE_TYPE="WIN32" PE_CHECKSUM="0x0" LINKER_VERSION="0x0" LINK_DATE="08/20/2007 21:12:18" UPTO_LINK_DATE="08/20/2007 21:12:18" />
<MATCHING_FILE NAME="rosetta_beta_5.80_windows_intelx86.exe" SIZE="2575872" CHECKSUM="0xA6936F6C" MODULE_TYPE="WIN32" PE_CHECKSUM="0x0" LINKER_VERSION="0x0" LINK_DATE="09/12/2007 04:42:46" UPTO_LINK_DATE="09/12/2007 04:42:46" />
</EXE>
<EXE NAME="ntdll.dll" FILTER="GRABMI_FILTER_THISFILEONLY">
<MATCHING_FILE NAME="ntdll.dll" SIZE="708096" CHECKSUM="0x9D20568" BIN_FILE_VERSION="5.1.2600.2180" BIN_PRODUCT_VERSION="5.1.2600.2180" PRODUCT_VERSION="5.1.2600.2180" FILE_DESCRIPTION="NT Layer DLL" COMPANY_NAME="Microsoft Corporation" PRODUCT_NAME="Microsoft® Windows® Operating System" FILE_VERSION="5.1.2600.2180 (xpsp_sp2_rtm.040803-2158)" ORIGINAL_FILENAME="ntdll.dll" INTERNAL_NAME="ntdll.dll" LEGAL_COPYRIGHT="© Microsoft Corporation. All rights reserved." VERFILEDATEHI="0x0" VERFILEDATELO="0x0" VERFILEOS="0x40004" VERFILETYPE="0x2" MODULE_TYPE="WIN32" PE_CHECKSUM="0xAF2F7" LINKER_VERSION="0x50001" UPTO_BIN_FILE_VERSION="5.1.2600.2180" UPTO_BIN_PRODUCT_VERSION="5.1.2600.2180" LINK_DATE="08/04/2004 07:56:36" UPTO_LINK_DATE="08/04/2004 07:56:36" VER_LANGUAGE="English (United States) [0x409]" />
</EXE>
<EXE NAME="kernel32.dll" FILTER="GRABMI_FILTER_THISFILEONLY">
<MATCHING_FILE NAME="kernel32.dll" SIZE="984576" CHECKSUM="0xF0B331F6" BIN_FILE_VERSION="5.1.2600.3119" BIN_PRODUCT_VERSION="5.1.2600.3119" PRODUCT_VERSION="5.1.2600.3119" FILE_DESCRIPTION="Windows NT BASE API Client DLL" COMPANY_NAME="Microsoft Corporation" PRODUCT_NAME="Microsoft® Windows® Operating System" FILE_VERSION="5.1.2600.3119 (xpsp_sp2_gdr.070416-1301)" ORIGINAL_FILENAME="kernel32" INTERNAL_NAME="kernel32" LEGAL_COPYRIGHT="© Microsoft Corporation. All rights reserved." VERFILEDATEHI="0x0" VERFILEDATELO="0x0" VERFILEOS="0x40004" VERFILETYPE="0x2" MODULE_TYPE="WIN32" PE_CHECKSUM="0xF9293" LINKER_VERSION="0x50001" UPTO_BIN_FILE_VERSION="5.1.2600.3119" UPTO_BIN_PRODUCT_VERSION="5.1.2600.3119" LINK_DATE="04/16/2007 15:52:53" UPTO_LINK_DATE="04/16/2007 15:52:53" VER_LANGUAGE="English (United States) [0x409]" />
</EXE>
</DATABASE>

Mac-Nic Profile

Joined: Jul 6 06
Posts: 7
ID: 98997
Credit: 50,523
RAC: 0
Message 47314 - Posted 2 Oct 2007 10:17:41 UTC

There seems to be a problem with this unit.

Christoph Jansen Profile

Joined: Jun 6 06
Posts: 248
ID: 91851
Credit: 267,153
RAC: 0
Message 47316 - Posted 2 Oct 2007 10:38:07 UTC
Last modified: 2 Oct 2007 10:38:50 UTC

Now finally BOINC hung when I tried to shut down my computer (I didn't notice that for about an hour). This
caused all the remaining WUs still present on the drive to error out, whatever the cause for that may be.

BOINC probably finished a WU and tried to execute the next one but didn't get any system resources to do so as the shutdown
process was meant to go on. So one after one they all marched straight into oblivion. Poor wretches...

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,419,222
RAC: 702
Message 47324 - Posted 2 Oct 2007 13:49:39 UTC

This WU 98810686 got stuck after 2:19:44. Quitting and restarting BOINC got things going again. First Einstein and WCG WUs ran for about 3 1/2 hours before Rosetta started again from the beginning. It ran three times (from 21:46 to 1:52, 7:57 to 11:26, 14:57 to 21:36) before completing 3 decoys in 36113.17 CPU seconds. My runtime is set for 10 hours. Hope this helps.

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,419,222
RAC: 702
Message 47326 - Posted 2 Oct 2007 15:06:39 UTC - in response to Message ID 47324.

This WU 98810686 got stuck after 2:19:44. Quitting and restarting BOINC got things going again. First Einstein and WCG WUs ran for about 3 1/2 hours before Rosetta started again from the beginning. It ran three times (from 21:46 to 1:52, 7:57 to 11:26, 14:57 to 21:36) before completing 3 decoys in 36113.17 CPU seconds. My runtime is set for 10 hours. Hope this helps.



I meant to add that this is the second 5.80 WU that got stuck but completed successfully after restarting BOINC. I apologize for not noting the number. I dashed off to work as soon as I restarted and forgot about it until this one stuck.

TomaszPawel

Joined: Apr 28 07
Posts: 54
ID: 170716
Credit: 2,791,145
RAC: 0
Message 47328 - Posted 2 Oct 2007 15:52:34 UTC

Errors:

See this:

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=98806026

and

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=98840340

What do you think?

Tuc

Joined: Sep 30 07
Posts: 4
ID: 208837
Credit: 1,006
RAC: 0
Message 47344 - Posted 3 Oct 2007 2:20:04 UTC

You're gonna LOVE me for this.... :)

BACKGROUND: I'm running rosetta_beta_5.80_i686-pc-linux-gnu under Linux emulation on FreeBSD. (So already I'm sure I've really confused things. ;) )

When I first attached to the project, as it was downloading files all of a sudden my boinc_client just "disappeared". No core, nothing. Not sure why. Weird... Never did that before

PROBLEM 1 : It started to run "HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_45549_0".

After a while, the process started to use 0 CPU, so I checked... the stderr.txt had :

No heartbeat from core client for 31 sec - exiting
pure virtual method called
terminate called without an active exception
SIGABRT: abort called
*** glibc detected *** corrupted double-linked list: 0x08f61f98 ***
SIGABRT: abort called


No concept why I'd miss a heartbeat.

I killed the processes.

PROBLEM 2: It restarted that WU and then later on :
No heartbeat from core client for 31 sec - exiting
SIGSEGV: segmentation violation
SIGABRT: abort called
SIGABRT: abort called
(And about 1304 more of these)

It was still taking a lot of CPU, but BOINC Manager didn't show any updates to anything, so I restarted it again....


I'm still trying to get through my first WU under this emulation..

Thanks, Tuc

JChojnacki Profile

Joined: Sep 17 05
Posts: 71
ID: 105
Credit: 6,766,867
RAC: 747
Message 47408 - Posted 4 Oct 2007 22:03:31 UTC

Hey,

Got an error today with this work unit:
HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_43593_0
http://boinc.bakerlab.org/rosetta/result.php?resultid=109858641

Thanks,

~Joel
____________



Rhiju
Forum moderator

Joined: Jan 8 06
Posts: 223
ID: 48256
Credit: 3,546
RAC: 0
Message 47424 - Posted 5 Oct 2007 20:08:27 UTC
Last modified: 5 Oct 2007 20:11:59 UTC

Hi all: We're trying to track down several sources of error. I'm not sure if anyone's posted about this, but a small number of workunits with the batch number 2156:

mcr1__BOINC_ABRELAX-mcr1_-mfr__2056_

appear to be flawed. I've cancelled the job; you should also feel free to abort these jobs if you see them. There aren't that many. I just fixed the problem and sent out a similar job with ID 2059.

We're looking into a few more issues too.. I've just contacted the people in charge of the other jobs... thanks *very* much for posting!
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 47428 - Posted 5 Oct 2007 21:24:23 UTC
Last modified: 5 Oct 2007 21:26:03 UTC

Rhiju.
Do you mean batch 2156 or 2056?
[both nbrs appear in your post].
____________

Mark Henderson

Joined: May 24 06
Posts: 9
ID: 84276
Credit: 643,001
RAC: 0
Message 47430 - Posted 6 Oct 2007 1:30:15 UTC
Last modified: 6 Oct 2007 1:47:58 UTC

These 2 errored on me after completion

100299846 sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_442

100299822 sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_418
____________

JChojnacki Profile
Avatar

Joined: Sep 17 05
Posts: 71
ID: 105
Credit: 6,766,867
RAC: 747
Message 47433 - Posted 6 Oct 2007 7:13:14 UTC
Last modified: 6 Oct 2007 7:14:49 UTC

Looks like I had another 5.80 error, with this work unit:

FLUA__BOINC_LONGNOE_JUMPRELAX_SAVE_ALL_OUT_BARCODE-FLUA_-_2120_46884_1
http://boinc.bakerlab.org/rosetta/result.php?resultid=110504357

Joel

Bletchley Park

Joined: Oct 4 07
Posts: 4
ID: 209743
Credit: 18,052
RAC: 0
Message 47437 - Posted 6 Oct 2007 9:25:01 UTC
Last modified: 6 Oct 2007 9:27:22 UTC

I have another issue with a workunit for 5.80: computation error.
sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4758_0.

Path7

Joined: Aug 25 07
Posts: 128
ID: 201002
Credit: 61,751
RAC: 0
Message 47439 - Posted 6 Oct 2007 9:35:31 UTC
Last modified: 6 Oct 2007 10:15:15 UTC

WU 100348417 ended with an error:

http://boinc.bakerlab.org/rosetta/result.php?resultid=110420517

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_33023_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

Does anyone know what "file xfer error" actually means?

Think I found ½ the answer to my question in “stdoutdae.txt”:

2007-10-05 22:59:47 [rosetta@home] Computation for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_33023_0 finished
2007-10-05 22:59:47 [rosetta@home] Output file sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_33023_0_0 for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_33023_0 absent

I'm not sure whether the absence of the file is a 5.80 error or a common error. Does anybody have an idea?

Path7.

Marky-UK

Joined: Nov 1 05
Posts: 73
ID: 8117
Credit: 1,689,495
RAC: 0
Message 47443 - Posted 6 Oct 2007 10:47:21 UTC

I'm getting several -161 errors on 5.80 WUs now too.

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_21387_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Waste of CPU time...

`

Joined: Oct 21 06
Posts: 254
ID: 123105
Credit: 56,691
RAC: 0
Message 47448 - Posted 6 Oct 2007 13:54:31 UTC

I received one also:

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_21608_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
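
For anyone collecting these reports, here is a minimal sketch of pulling the file name and error code out of a pasted `<file_xfer_error>` block. It assumes the fragment looks like the ones quoted in this thread; the function name `find_xfer_errors` is made up for illustration and is not part of BOINC or Rosetta. In the client logs quoted in this thread, the -161 code appears alongside the "Output file ... absent" message.

```python
import re

# Pattern for the <file_xfer_error> fragment as pasted in this thread.
# Whitespace between the tags is optional, so the one-line form also matches.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def find_xfer_errors(stderr_text):
    """Return (file_name, error_code) pairs found in pasted result text."""
    return [
        (m.group("name"), int(m.group("code")))
        for m in XFER_ERROR.finditer(stderr_text)
    ]


sample = """<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_21608_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
"""

print(find_xfer_errors(sample))
```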

Jmarks Profile
Avatar

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 47449 - Posted 6 Oct 2007 14:21:53 UTC

2120
FLUA__BOINC_LONGNOE_JUMPRELAX_SAVE_ALL_OUT_BARCODE-FLUA_-_2120_33288_0
____________
Jmarks

transient
Avatar

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,742,648
RAC: 5,672
Message 47451 - Posted 6 Oct 2007 16:31:21 UTC

This workunit gave me a computation error on my computer and a validate error by Rosetta.

sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_25094_0
____________

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 6,140,242
RAC: 5,464
Message 47456 - Posted 6 Oct 2007 18:44:49 UTC

Just noticed that I got this bad result on September 28.
____________

Nothing But Idle Time

Joined: Sep 28 05
Posts: 209
ID: 1675
Credit: 139,545
RAC: 0
Message 47483 - Posted 7 Oct 2007 10:58:32 UTC

Eight hours wasted on this; wingman also bombed on it:

<core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 28800
# random seed: 3947179
======================================================
DONE :: 1 starting structures 29178.8 cpu seconds
This process generated 13 decoys from 13 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_32822_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
...etc

Mike Francis
Avatar

Joined: Nov 24 05
Posts: 8
ID: 17484
Credit: 623,519
RAC: 0
Message 47487 - Posted 7 Oct 2007 15:31:08 UTC

I had a compute error on Beta;
This is the error message I received.

10/7/2007 8:10:01 AM|rosetta@home|Deferring communication for 1 min 0 sec
10/7/2007 8:10:01 AM|rosetta@home|Reason: Unrecoverable error for result mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_12945_0 ( - exit code -1073741819 (0xc0000005))
10/7/2007 8:10:01 AM|rosetta@home|Computation for task mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_12945_0 finished
10/7/2007 8:10:01 AM|rosetta@home|Output file mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_12945_0_0 for task mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_12945_0 absent
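
The negative exit code and the hexadecimal value in the log above are the same 32-bit number read two ways. A minimal sketch of the conversion follows; the helper name `exit_code_to_hex` is hypothetical and only meant to illustrate the arithmetic.

```python
def exit_code_to_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned equivalent.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(exit_code_to_hex(-1073741819))
```

The value 0xc0000005 is the Windows access violation status, which matches the "Access Violation (0xc0000005)" text shown in another result posted in this thread.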

____________

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47492 - Posted 7 Oct 2007 18:49:21 UTC

Two compute errors with 5.80 (64-Bit Linux on X2 4400 no oc).

First one:

###BEGIN############################################################
<core_client_version>5.10.8</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 3979598
*** glibc detected *** corrupted double-linked list: 0x09647f08 ***
SIGABRT: abort called
Stack trace (19 frames):
[0x8d7cf2f]
[0x8d77d1c]
[0xffffe500]
[0x8de8234]
[0x8dfd0ce]
[0x8e01ae2]
[0x8e02774]
[0x8e04045]
[0x8dd24b7]
[0x8dd3f51]
[0x8b1c308]
[0x8ccedcd]
[0x84b7f90]
[0x80d82b5]
[0x85f6c37]
[0x87320a7]
[0x8732152]
[0x8de10f4]
[0x8048121]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
*** glibc detected *** corrupted double-linked list: 0x099a9ea0 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 14009.8 cpu seconds
This process generated 10 decoys from 10 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_403_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
###END########################################################################

And the second one:

###BEGIN######################################################################
<core_client_version>5.10.8</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 3979904
*** glibc detected *** corrupted double-linked list: 0x0991f6b8 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 13128 cpu seconds
This process generated 9 decoys from 9 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_97_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
###END######################################################################

Maybe I should set the crunching time to one hour until the problems are solved?

cu,
Michael
____________

`

Joined: Oct 21 06
Posts: 254
ID: 123105
Credit: 56,691
RAC: 0
Message 47494 - Posted 7 Oct 2007 19:01:20 UTC

Another -161 error to report: sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_21898

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47498 - Posted 7 Oct 2007 20:17:50 UTC

I think it's not because of the app, but because of the WU type. All my wingmen got errors too. My next WU is a different type (1ubi__BOINC_ABRELAX_SHORTRELAX_SAVE_ALL_OUT-1ubi_-frags83__2162...), so I'm curious whether this one will crash too.

cu,
Michael
____________

BitSpit
Avatar

Joined: Nov 5 05
Posts: 33
ID: 9581
Credit: 4,147,344
RAC: 0
Message 47504 - Posted 7 Oct 2007 22:07:10 UTC

Guess I'll do the job of the admins at the moment and say feel free to abort all sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155 jobs. They've been marked as canceled.
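
To check a task list against a cancelled batch, a small sketch like the one below may help. It assumes that task names end in `_<batch>_<number>_<replica>`, which is only inferred from the names quoted in this thread and is not a documented format; `batch_of` and `is_cancelled` are hypothetical helper names. It is meant for task names, not output file names, which carry one extra trailing `_0`.

```python
import re

# Batch named as cancelled in this thread.
CANCELLED_BATCHES = {2155}

# Assumed suffix layout: _<batch>_<number>_<replica> at the end of the name.
TASK_SUFFIX = re.compile(r"_(?P<batch>\d+)_(?P<number>\d+)_(?P<replica>\d+)$")


def batch_of(task_name):
    """Return the batch number of a task name, or None if it does not parse."""
    match = TASK_SUFFIX.search(task_name)
    return int(match.group("batch")) if match else None


def is_cancelled(task_name):
    """True if the task belongs to one of the cancelled batches."""
    return batch_of(task_name) in CANCELLED_BATCHES


tasks = [
    "sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_11869_0",
    "mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_12945_0",
]
for name in tasks:
    print(batch_of(name), is_cancelled(name))
```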

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 47506 - Posted 8 Oct 2007 0:12:27 UTC
Last modified: 8 Oct 2007 0:15:11 UTC

And another...
Result ID 110379414
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_11869_0
Workunit 100311273
Created 5 Oct 2007 3:10:59 UTC
Sent 5 Oct 2007 11:23:37 UTC
Received 8 Oct 2007 0:07:03 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 510574
Report deadline 15 Oct 2007 11:23:37 UTC
CPU time 20188.78125
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3968132
======================================================
DONE :: 1 starting structures 20188.1 cpu seconds
This process generated 14 decoys from 14 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_11869_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>


Validate state Invalid
Claimed credit 84.3860161877551
Granted credit 0
application version 5.80
-----
Is this WU one of the ones to be 'aborted' or not?? Would love to hear the official verdict.

____________

Mysteron347

Joined: May 10 07
Posts: 1
ID: 175558
Credit: 248,553
RAC: 0
Message 47530 - Posted 9 Oct 2007 2:43:14 UTC

Result ID 110418391
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31217_0
Workunit 100346611
Created 5 Oct 2007 7:33:41 UTC
Sent 5 Oct 2007 7:36:21 UTC
Received 8 Oct 2007 13:21:37 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 500090
Report deadline 15 Oct 2007 7:36:21 UTC
CPU time 9604.046875
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3948784
# cpu_run_time_pref: 10800
# cpu_run_time_pref: 10800
======================================================
DONE :: 1 starting structures 9603.16 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31217_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Ditto result IDs

110418409
110418408
110418407
110418390

(all error code -161)

All sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31217
(31217 -> 31216, 31233, 31234, 31235)

Also:

I have 179 WUs reported as being "In Progress" according to http://boinc.bakerlab.org/rosetta/results.php?userid=175558
Yet BOINC shows only 29WUs.

Something not quite right here.....

Irwan Adinatha Profile

Joined: Jan 18 06
Posts: 5
ID: 51865
Credit: 1,245,260
RAC: 0
Message 47551 - Posted 9 Oct 2007 17:58:42 UTC

Here are two mcr1 WUs, one with an error and one fine.
error:
10/9/2007 8:57:58 PM|rosetta@home|Reason: Unrecoverable error for result mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_9186_0 (Incorrect function. (0x1) - exit code 1 (0x1))
10/9/2007 8:57:58 PM|rosetta@home|Computation for task mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_9186_0 finished
10/9/2007 8:57:58 PM|rosetta@home|Output file mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_9186_0_0 for task mcr1__BOINC_RG_FULLWEIGHT_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_9186_0 absent

OK:
10/9/2007 9:34:17 PM|rosetta@home|Starting mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_11025_0
10/9/2007 9:34:17 PM|rosetta@home|Starting task mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_11025_0 using rosetta_beta version 580
10/10/2007 12:20:28 AM|rosetta@home|Computation for task mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_11025_0 finished

(_KoDAk_) Profile

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 47552 - Posted 9 Oct 2007 18:11:02 UTC

2007-10-09 15:01:12 [rosetta@home] Deferring communication for 1 min 0 sec
2007-10-09 15:01:12 [rosetta@home] Reason: Unrecoverable error for result sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_37278_0 (<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_37278_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
)

2007-10-09 19:11:43 [rosetta@home] Deferring communication for 1 min 0 sec
2007-10-09 19:11:43 [rosetta@home] Reason: Unrecoverable error for result sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_13617_0 (<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_13617_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
)

____________

Nemesis
Avatar

Joined: Mar 12 06
Posts: 149
ID: 65097
Credit: 21,395
RAC: 0
Message 47566 - Posted 9 Oct 2007 19:59:50 UTC

You would think that they would send this turkey back to RALPH by now.
____________
Nemesis n. A righteous infliction of retribution manifested by an appropriate agent.


Dr Who Fan
Avatar

Joined: May 28 06
Posts: 35
ID: 85050
Credit: 62,869
RAC: 32
Message 47579 - Posted 10 Oct 2007 0:08:52 UTC

Several errors to report:
http://boinc.bakerlab.org/rosetta/result.php?resultid=110371301
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3975811
==
</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_4190_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid

-------------

http://boinc.bakerlab.org/rosetta/result.php?resultid=110418515
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3948666
======================================================
DONE :: 1 starting structures 8134.02 cpu seconds
This process generated 5 decoys from 5 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_31335_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid

-------------


http://boinc.bakerlab.org/rosetta/result.php?resultid=110838918
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1657484
==
</stderr_txt>
]]>

Validate state Invalid

____________

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47586 - Posted 10 Oct 2007 4:25:50 UTC - in response to Message ID 47579.

According to this post I kill every 'sen15_RESAMPLE_BOINC_MFR_ABRELAX_...' WU.

cu,
Michael
____________

Snagletooth

Joined: Feb 22 07
Posts: 192
ID: 149031
Credit: 1,419,222
RAC: 702
Message 47608 - Posted 10 Oct 2007 22:10:58 UTC
Last modified: 10 Oct 2007 22:12:36 UTC

1bq9A_SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1bq9A-_BARCODE__2166_3994_0
Crashed after I opened the graphics window about 1 hour and 11 minutes in as it was initializing the second model

result

Oliver

Joined: Oct 11 07
Posts: 4
ID: 211670
Credit: 525
RAC: 0
Message 47621 - Posted 11 Oct 2007 19:35:23 UTC

Hi all: We're trying to track down several sources of error. Workunits with the batch number 2155:

sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED


appear to be flawed. I've cancelled the job; you should also feel free to abort these jobs if you see them. There aren't that many. I just fixed the problem and sent out a similar job with ID 2163.

thanks *very* much for posting!

hugothehermit

Joined: Sep 26 05
Posts: 238
ID: 1310
Credit: 314,893
RAC: 0
Message 47666 - Posted 13 Oct 2007 0:49:41 UTC
Last modified: 13 Oct 2007 0:54:36 UTC

HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_65537_0

Exit status = -1073741819 (0xc0000005)


Edit: Only a couple more edits and this might make sense :)

Jmarks Profile
Avatar

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 47675 - Posted 13 Oct 2007 11:37:01 UTC
Last modified: 13 Oct 2007 11:37:48 UTC

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168
____________
Jmarks

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 47678 - Posted 13 Oct 2007 12:18:48 UTC

A valid WU, but still with errors:

<core_client_version>5.10.8</core_client_version>
<![CDATA[
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
# random seed: 3647667
SIGSEGV: segmentation violation
Stack trace (12 frames):
[0x8d7cf2f]
[0x8d77d1c]
[0xffffe500]
[0x8e024c7]
[0x8dd2715]
[0x8dd2481]
[0x83f9b8b]
[0x8de873f]
[0x8d79987]
[0x8d7afa5]
[0x8d73f9d]
[0x8e1487a]

Exiting...
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
ERROR:: Exit from: fragments.cc line: 465
FILE_LOCK::unlock(): close failed.: Bad file descriptor
*** glibc detected *** double free or corruption (fasttop): 0x0909e348 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
*** glibc detected *** corrupted double-linked list: 0x09757f20 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
*** glibc detected *** corrupted double-linked list: 0x09511408 ***
Graphics are disabled due to configuration...
# cpu_run_time_pref: 14400
======================================================
DONE :: 1 starting structures 14211.6 cpu seconds
This process generated 19 decoys from 19 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>

I really wonder about all these SIGSEGV errors. I don't think they are hardware related.
"glibc detected *** corrupted double-linked list" should be caused by the app itself.
System: AMD 64 X2 4400, 2Gig, standard clock, 64-Bit OpenSUSE 10.2, glibc-2.5-25.

cu,
Michael
____________

Conan Profile
Avatar

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 47680 - Posted 13 Oct 2007 13:24:44 UTC - in response to Message ID 47675.
Last modified: 13 Oct 2007 13:26:35 UTC

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168


To the project people, please also look at thread 4000 credit wus? which discusses this issue, the following is one of those posts from Xaak (in the quotes) and what I found when I looked at that host.


Found another one:
http://boinc.bakerlab.org/rosetta/results.php?hostid=567404&offset=20

Maybe the credit/decoy ratio on these are way off?


I don't believe the owner knows anything about this as all the WU's making the huge claims all have the same name, they all start with mcr1, for example

mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-short_mfr__2168_67176_0

On the other side of the coin the same computer is getting a lot of stuck WU's that are being terminated by the Watchdog that start with

STM0082_BOINC_MFR_ABRELAX_PICKED_

and they get just 20 credits for each of these WU's.
____________

Jmarks Profile
Avatar

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 47690 - Posted 13 Oct 2007 17:28:52 UTC - in response to Message ID 47680.

I believe there is a problem with 2168 WUs. They are producing too many decoys.
2168


To the project people, please also look at thread 4000 credit wus? which discusses this issue, the following is one of those posts from Xaak (in the quotes) and what I found when I looked at that host.

That is the same thread my 2168 points to.
____________
Jmarks

Christoph Jansen Profile
Avatar

Joined: Jun 6 06
Posts: 248
ID: 91851
Credit: 267,153
RAC: 0
Message 47692 - Posted 13 Oct 2007 18:23:29 UTC
Last modified: 13 Oct 2007 18:27:58 UTC

The machines affected by this "multi-decoy" bug all seem to be Core2Quads. And, contrary to what the 4000 credit WU thread implies, WU types other than mcr1 are also affected; see these two:

http://boinc.bakerlab.org/rosetta/result.php?resultid=111755669

http://boinc.bakerlab.org/rosetta/result.php?resultid=111753474
____________
"I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." R.M. Nixon

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 47712 - Posted 14 Oct 2007 2:13:12 UTC

Result ID 111953379
Name HR19__BOINC_LONGNOE_JUMPRELAX_BARCODE_SAVE_ALL_OUT_200-HR19_-_2121_55153_0
Workunit 101758314
Created 11 Oct 2007 16:10:04 UTC
Sent 11 Oct 2007 16:12:47 UTC
Received 14 Oct 2007 2:04:59 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 510574
Report deadline 21 Oct 2007 16:12:47 UTC
CPU time 9247.8125
stderr out <core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1886848


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00B77A52 write attempt to address 0x1FF63C2C

Engaging BOINC Windows Runtime Debugger...



********************





____________

googloo
Avatar

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 6,140,242
RAC: 5,464
Message 47725 - Posted 14 Oct 2007 19:01:45 UTC

Here's one that didn't work out:

Result ID 112315759
Name STM0082_BOINC_MFR_ABRELAX_PICKED_2175_1860_0
Workunit 102095676
Created 13 Oct 2007 5:23:55 UTC
Sent 13 Oct 2007 5:25:15 UTC
Received 14 Oct 2007 18:15:54 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 307276
Report deadline 23 Oct 2007 5:25:15 UTC
CPU time 3577.953125
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 10800
# random seed: 3485771
**********************************************************************
Rosetta score is stuck or going too long. Watchdog is ending the run!
Stuck at score 0 for 900 seconds
**********************************************************************
GZIP SILENT FILE: .\aaSTM1.out

</stderr_txt>
]]>

Validate state Valid
Claimed credit 9.24557159531191
Granted credit 20
application version 5.80
____________

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 47731 - Posted 14 Oct 2007 20:26:03 UTC

Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279

client error and compute error

CPU time 21566.46875
stderr out

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 3967126
======================================================
DONE :: 1 starting structures 21565.9 cpu seconds
This process generated 10 decoys from 10 attempts
======================================================
=================================================================================

from BOINC Manager time is CET (gmt+2)

10/13/2007 1:57:01 AM|rosetta@home|Computation for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 finished
10/13/2007 1:57:01 AM|rosetta@home|Output file sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0 for task sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 absent
10/13/2007 1:57:01 AM|rosetta@home|Starting CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0
10/13/2007 1:57:01 AM|rosetta@home|Starting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1di2_-_filters_1782_408715_0 using rosetta version 569
10/13/2007 1:57:02 AM|rosetta@home|Deferring communication for 1 min 0 sec
10/13/2007 1:57:02 AM|rosetta@home|Reason: Unrecoverable error for result sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0 (<file_xfer_error> <file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name> <error_code>-161</error_code></file_xfer_error>)



BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
<message>
<file_xfer_error>
<file_name>sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

____________

transient
Avatar

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,742,648
RAC: 5,672
Message 47735 - Posted 14 Oct 2007 21:21:51 UTC - in response to Message ID 47731.

Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279


You don't read the forum very much do you? (just joking)

Might I point you to this post in this very same thread?

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3552&nowrap=true#47621
____________

Angus Profile

Joined: Sep 17 05
Posts: 412
ID: 83
Credit: 321,053
RAC: 0
Message 47736 - Posted 14 Oct 2007 21:46:28 UTC - in response to Message ID 47735.

Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279


You don't read the forum very much do you? (just joking)

Might I point you to this post in this very same thread?

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3552&nowrap=true#47621


For some reason this strikes me as very funny.
____________
Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :)



"You can't fix stupid" (Ron White)

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 47752 - Posted 15 Oct 2007 9:11:52 UTC - in response to Message ID 47735.

doh! it got lost in the clutter of the rest of my work. I don't look at my work unit list that much so let that one slip through. thanks for the reminder.
have to go hunt for the rest if any.


Result ID 110380482
Name sen15_RESAMPLE_BOINC_MFR_ABRELAX_PICKED_2155_12875_0
Workunit 100312279


You don't read the forum very much do you? (just joking)

Might I point you to this post in this very same thread?

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3552&nowrap=true#47621


____________

Bletchley Park

Joined: Oct 4 07
Posts: 4
ID: 209743
Credit: 18,052
RAC: 0
Message 47755 - Posted 15 Oct 2007 10:50:18 UTC

Version 5.80 BETA

computation error

unknown software exception 0xc0000409 occurred in the application at 0x00c2ec4a
1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_16844

using a lot of system cpu time.

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,967,278
RAC: 589
Message 47766 - Posted 15 Oct 2007 20:07:26 UTC

The scaling for this work unit is way off in the graphs:
mcr1__BOINC_SYMM_FOLD_AND_DOCK_RELAX-mcr1_-mfr__2128_72362_0

The accepted energy, at -275 and lower, runs off the bottom of its window, and the RMSD is often off the left edge of its window. I can't get any screenshots of this for some reason, but you guys know what I mean.
____________

Z3r0

Joined: Aug 3 06
Posts: 2
ID: 102995
Credit: 8,453
RAC: 0
Message 47804 - Posted 17 Oct 2007 8:59:03 UTC

I have 5.10.20 boinc and I have just used Rosetta since last week.

It said computation error, I haven't got a log to report, I hibernate my laptop a lot, maybe this is the cause?
____________

Z3r0

Joined: Aug 3 06
Posts: 2
ID: 102995
Credit: 8,453
RAC: 0
Message 47805 - Posted 17 Oct 2007 9:03:06 UTC

I have 5.10.20 boinc and I have just used Rosetta since last week.

It said computation error, I haven't got a log to report, I hibernate my laptop a lot, maybe this is the cause?

2007-10-17 13:13:01 [rosetta@home] Deferring communication for 1 min 11 sec
2007-10-17 13:13:01 [rosetta@home] Reason: Unrecoverable error for result 1opd__SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1opd_-_BARCODE__2166_6199_0 (Incorrect function. (0x1) - exit code 1 (0x1))

____________

Jmarks Profile
Avatar

Joined: Jul 16 07
Posts: 132
ID: 191202
Credit: 98,025
RAC: 0
Message 47809 - Posted 17 Oct 2007 12:39:17 UTC - in response to Message ID 47805.

I have 5.10.20 boinc and I have just used Rosetta since last week.

It said computation error, I haven't got a log to report, I hibernate my laptop a lot, maybe this is the cause?

2007-10-17 13:13:01 [rosetta@home] Deferring communication for 1 min 11 sec
2007-10-17 13:13:01 [rosetta@home] Reason: Unrecoverable error for result 1opd__SEARCH_PAIRINGS_ROUND3_RESCORE_75_SAVE_ALL_OUT_-1opd_-_BARCODE__2166_6199_0 (Incorrect function. (0x1) - exit code 1 (0x1))


In your General Preferences, make sure this option is set to yes:
Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes') yes
____________
Jmarks

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 47812 - Posted 17 Oct 2007 16:55:51 UTC

This result says it is "invalid", even though the stderr.txt, the exit status, and the message log look perfectly normal.

http://boinc.bakerlab.org/rosetta/result.php?resultid=112607927

MMSihombing Profile
Avatar

Joined: May 22 06
Posts: 15
ID: 83829
Credit: 1,275,493
RAC: 64
Message 47850 - Posted 19 Oct 2007 7:56:31 UTC

1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__2176_5799_0

Compute error

Exit status -1073741819 (0xc0000005)
____________

Xaak Profile

Joined: Mar 20 06
Posts: 17
ID: 66761
Credit: 3,701,702
RAC: 0
Message 47862 - Posted 19 Oct 2007 14:40:38 UTC
Last modified: 19 Oct 2007 14:41:20 UTC

The ridiculously high credit WUs are still happening.

Latest example:
http://boinc.bakerlab.org/rosetta/result.php?resultid=113518482

Result ID 113518482
Name 1r69__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1r69_-frags83__2179_18461_1
Workunit 103081085
Created 18 Oct 2007 8:10:52 UTC
Sent 18 Oct 2007 8:11:00 UTC
Received 18 Oct 2007 17:48:39 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 567404
Report deadline 28 Oct 2007 8:11:00 UTC
CPU time 7703.15625
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3319170
======================================================
DONE :: 1 starting structures 7702.38 cpu seconds
This process generated 165 decoys from 165 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 34.169257800865
Granted credit 1238.8941257865
application version 5.80



____________
XaaK



Luuklag

Joined: Sep 13 07
Posts: 262
ID: 205058
Credit: 4,171
RAC: 0
Message 47867 - Posted 19 Oct 2007 16:58:23 UTC - in response to Message ID 47862.

The ridiculously high credit WUs are still happening.

Latest example:
http://boinc.bakerlab.org/rosetta/result.php?resultid=113518482

Result ID 113518482
Name 1r69__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1r69_-frags83__2179_18461_1
Workunit 103081085
Created 18 Oct 2007 8:10:52 UTC
Sent 18 Oct 2007 8:11:00 UTC
Received 18 Oct 2007 17:48:39 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 567404
Report deadline 28 Oct 2007 8:11:00 UTC
CPU time 7703.15625
stderr out <core_client_version>5.10.13</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 3319170
======================================================
DONE :: 1 starting structures 7702.38 cpu seconds
This process generated 165 decoys from 165 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
]]>


Validate state Valid
Claimed credit 34.169257800865
Granted credit 1238.8941257865
application version 5.80




That one made an awful lot of decoys. If you divide the granted credit by the number of decoys, it comes to somewhere between 7 and 8 per decoy. The claimed credit is normal, because it is calculated from the CPU seconds spent. But this one made far too many decoys; I normally make 10 to 20 decoys in 3 hours on my single-core machine.
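
The arithmetic can be checked with a few lines, using only the figures from the result quoted above (granted credit, claimed credit, decoy count and CPU seconds). This is just a sketch of the division; it makes no claim about how the server actually computes granted credit.

```python
# Figures copied from result 113518482 as quoted above.
granted_credit = 1238.8941257865
claimed_credit = 34.169257800865
decoys = 165
cpu_seconds = 7702.38

credit_per_decoy = granted_credit / decoys
seconds_per_decoy = cpu_seconds / decoys
granted_to_claimed = granted_credit / claimed_credit

print(f"credit per decoy:   {credit_per_decoy:.2f}")
print(f"seconds per decoy:  {seconds_per_decoy:.1f}")
print(f"granted / claimed:  {granted_to_claimed:.1f}")
```

At under a minute of CPU time per decoy, this host is far from the 10 to 20 decoys in 3 hours mentioned above, which is why the granted credit ends up so far above the claimed credit.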

mikus

Joined: Nov 7 05
Posts: 58
ID: 10139
Credit: 700,115
RAC: 0
Message 47868 - Posted 19 Oct 2007 18:28:37 UTC

aborted beta - http://boinc.bakerlab.org/rosetta/result.php?resultid=112298830

Went to my computer (to make a connection), and saw (gkrellm) that one of the cores was idle. Boincmgr status showed two Rosetta WUs running. Top showed one of them using CPU, the other sitting there "stuck". Manually aborted the second.

I have plenty of memory; "leave work in memory" is specified. Judging by the CPU time accumulated by the "stuck" workunit, it had completed its quota of decoys, and was in the process of shutting down when it got "stuck". Dual core Linux 32-bit system, boinc 5.10.21. Rosetta tasks usually complete just fine.

The problem that "stuck" workunits cause is that boinc keeps track of the number of seconds given to tasks. As near as I can tell, my system spent so much wall clock time __not__ executing that "stuck" WU that its boinc calculated efficiency has now been severely reduced. I run off-line, and connect only occasionally. The lowered efficiency value means that for a while I will be given *less* work each time I connect, and will therefore have to connect more often. Not good.
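
To put rough numbers on that, here is a simplified sketch of the effect. It assumes efficiency is just CPU time divided by wall-clock run time; that is my own simplification and not necessarily how boinc computes its figure, and the hours below are hypothetical.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of wall-clock run time that was actually spent on the CPU.

    Simplified model for illustration only, not boinc's real bookkeeping.
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds

# Hypothetical numbers: 8 hours of CPU time in both cases.
healthy = cpu_efficiency(8 * 3600, 8.1 * 3600)   # task exits promptly
stuck = cpu_efficiency(8 * 3600, 18 * 3600)      # task sits idle for 10 more hours

print(f"healthy task: {healthy:.2f}")
print(f"stuck task:   {stuck:.2f}")
```

Under this model, every hour the task sits "stuck" counts as run time with no CPU time, so the ratio keeps falling until the task is aborted.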
.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 47871 - Posted 19 Oct 2007 21:36:08 UTC

Mikus, please join the discussion on Linux preemption issues in this thread.
____________
Rosetta Moderator: Mod.Sense

(_KoDAk_)

Joined: Jul 18 06
Posts: 109
ID: 100677
Credit: 1,859,263
RAC: 0
Message 47876 - Posted 20 Oct 2007 5:49:00 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=113112849
????
____________

Dr Who Fan

Joined: May 28 06
Posts: 35
ID: 85050
Credit: 62,869
RAC: 32
Message 47884 - Posted 20 Oct 2007 16:05:15 UTC

Another crunching error

http://boinc.bakerlab.org/rosetta/result.php?resultid=113450720
stderr out

CPU type AuthenticAMD
AMD Athlon(tm) 64 X2 Dual-Core Processor TK-53 [x86 Family 15 Model 104 Stepping 1]

CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_1052746_0

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2226295
==
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 20.426126431417
Granted credit 0
application version 5.69
____________

mikus

Joined: Nov 7 05
Posts: 58
ID: 10139
Credit: 700,115
RAC: 0
Message 47897 - Posted 21 Oct 2007 18:29:21 UTC - in response to Message ID 47871.

Mikus, please join the discussion on Linux preemption issues in this thread.

I may do so -- but the reason I did not originally is that I believe that all of the recommendations in that thread were already in place on my system. As far as I can tell, the Rosetta workunit got "stuck" __after__ it had completed crunching. So to my mind there was no "task preemption" involved (only "task exit").

Also, if it were a preemption issue, I would expect other Rosetta tasks on my system to be failing in a similar fashion. But only that *beta* 5.80 has failed so far. I suspect the problem was triggered by something about that particular task. [Note: My 'Rosetta time to crunch' is 8 hours, meaning I run Rosetta applications (including 5.80) for a longer time than typical participants do.]

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,273,242
RAC: 5,666
Message 47998 - Posted 24 Oct 2007 16:31:35 UTC

Work unit 104545488 stuck after 3 seconds: aborting. Mac OS X 10.4.19, Intel-based iMac, Boinc 5.10.20
____________

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 48012 - Posted 25 Oct 2007 6:13:03 UTC

I just got these two units, now they both failed on the same user it seems.

Is there a problem with the work units, the app, or their computers?

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=104775454

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=104776777

This is from their results, same for both.

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
No main program specified
</message>
]]>

Pete.

____________


Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 48023 - Posted 25 Oct 2007 18:03:28 UTC

I suspect the computer. I have eight of his and am working on the third without problems.
____________

Markus Schuhmacher

Joined: May 29 06
Posts: 4
ID: 85181
Credit: 1,455,542
RAC: 0
Message 48085 - Posted 28 Oct 2007 19:31:38 UTC

Since I updated the BOINC client to 5.10.23, the service hasn't crashed anymore; BOINC runs stably.
____________

Keith T.
Avatar

Joined: Mar 1 07
Posts: 37
ID: 150379
Credit: 12,959
RAC: 0
Message 48149 - Posted 30 Oct 2007 13:25:30 UTC
Last modified: 30 Oct 2007 13:28:09 UTC

http://boinc.bakerlab.org/rosetta/result.php?resultid=115759805

cryb__BOINC_ABRELAX_SAVE_ALL_OUT-cryb_-_2227_32333_0

Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)

CPU time 1314.3125


stderr out <core_client_version>5.10.7</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 2304098


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C910E23 read attempt to address 0x00150586

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>
Validate state Invalid

I have seen this error before but it is rare. Last one was 16 Sep 2007 according to BoincView logs.
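
In case it helps anyone reading these reports: the negative exit status and the hex code are the same 32-bit value, just shown as signed and unsigned. A minimal sketch (the helper name is my own):

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit status as its unsigned value."""
    return code & 0xFFFFFFFF

exit_status = -1073741819  # "Exit status" from the result page above
print(hex(to_unsigned32(exit_status)))
```

This prints the 0xc0000005 code that the exception record above labels as an access violation.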

Keith T.

ziegenmelker

Joined: Jul 26 06
Posts: 10
ID: 101925
Credit: 26,061
RAC: 0
Message 48257 - Posted 1 Nov 2007 21:23:35 UTC
Last modified: 1 Nov 2007 21:25:29 UTC

Some more:

5.80: SIGSEGV and '*** glibc detected *** corrupted double-linked list: 0x0a01aa28 ***', but valid and granted credits (32 for 4h ???)
5.80: process got signal 11 and 2 SIGSEGV: Invalid
5.69: process exited with code 193 (0xc1) and 3 SIGSEGV: Invalid
5.80: process exited with code 193 (0xc1) and 1 SIGSEGV: Invalid

I shortened the crunching time from 4 to 1 h.

5.80: *** glibc detected *** corrupted double-linked list: 0x097ea480 *** and 1 SIGSEGV: Valid
5.69: resultid=116839781: Valid
5.69: resultid=116896290 1 SIGSEGV: Valid

The '*** glibc detected *** corrupted double-linked list:' is an error in the app.
One of the last (valid) WUs got stuck, so I shut down boinc and restarted, and the WU then finished successfully.

This host is doing work for Einstein(32Bit), ABC(64Bit), Seti(64Bit) and WCG(32Bit) without problems.

cu,
Michael

[edit]format[/edit]
____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,273,242
RAC: 5,666
Message 48273 - Posted 2 Nov 2007 18:27:31 UTC

Workunit 106397169 (trunc_cryb__BOINC_ABRELAX_-trunc_cryb_-_2238_78739_0) stuck at 0.751%. Intel iMac2: Mac OSX 10.4.10; Boinc 5.10.20. Aborting.
____________

Trey

Joined: Oct 3 06
Posts: 11
ID: 116085
Credit: 110,142
RAC: 0
Message 48307 - Posted 3 Nov 2007 19:12:56 UTC

I had a problem with 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0. I did just re-install my computer with openSUSE 10.3 (from 10.1) a few hours previous. However, WUs on the new O/S before/after the problem one seem OK.

2007-11-03 10:36:31 [---] Starting BOINC client version 5.10.21 for x86_64-pc-linux-gnu
2007-11-03 10:36:31 [---] log flags: task, file_xfer, sched_ops
2007-11-03 10:36:31 [---] Libraries: libcurl/7.15.5 OpenSSL/0.9.8c zlib/1.2.3 libidn/0.6.5
2007-11-03 10:36:31 [---] Executing as a daemon
2007-11-03 10:36:31 [---] Data directory: /home/trey/BOINC
2007-11-03 10:36:31 [---] Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ [Family 15 Model 43 Stepping 1]
2007-11-03 10:36:31 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
2007-11-03 10:36:31 [---] OS: Linux: 2.6.22.9-0.4-default
2007-11-03 10:36:31 [---] Memory: 1.97 GB physical, 4.01 GB virtual
2007-11-03 10:36:31 [---] Disk: 98.44 GB total, 89.08 GB free
2007-11-03 10:36:31 [---] Local time is UTC -5 hours
2007-11-03 10:36:31 [rosetta@home] URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 647624; location: home; project prefs: default
2007-11-03 10:36:31 [---] General prefs: from http://www.worldcommunitygrid.org/ (last modified 2007-10-28 21:44:36)
2007-11-03 10:36:31 [---] Host location: home
2007-11-03 10:36:31 [---] General prefs: no separate prefs for home; using your defaults
2007-11-03 10:36:31 [---] Reading preferences override file
2007-11-03 10:36:31 [---] Preferences limit memory usage when active to 1007.34MB
2007-11-03 10:36:31 [---] Preferences limit memory usage when idle to 1813.21MB
2007-11-03 10:36:31 [---] Preferences limit disk usage to 1.86GB
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 using rosetta version 569
2007-11-03 10:40:13 [rosetta@home] Restarting task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 using rosetta version 569
2007-11-03 11:23:31 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0 finished
2007-11-03 11:23:31 [rosetta@home] Starting 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0
2007-11-03 11:23:31 [rosetta@home] Starting task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 using rosetta_beta version 580

2007-11-03 11:23:33 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2114224_0_0
2007-11-03 11:23:36 [rosetta@home] [file_xfer] Throughput 42065 bytes/sec
2007-11-03 12:04:45 [rosetta@home] Deferring communication for 1 min 0 sec
2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 (process exited with code 1 (0x1, -255))
2007-11-03 12:04:45 [rosetta@home] Computation for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 finished
2007-11-03 12:04:45 [rosetta@home] Output file 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0_0 for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 absent

2007-11-03 12:04:45 [rosetta@home] Starting 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0
2007-11-03 12:04:45 [rosetta@home] Starting task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 using rosetta_beta version 580
2007-11-03 13:02:50 [rosetta@home] Computation for task CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0 finished
2007-11-03 13:02:50 [rosetta@home] Starting 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0
2007-11-03 13:02:50 [rosetta@home] Starting task 1ogw__TREEJUMP_ABRELAX_NOTOR-1ogw_-_BARCODE__2241_674_0 using rosetta_beta version 580
2007-11-03 13:02:52 [rosetta@home] [file_xfer] Started upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Finished upload of file CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2115010_0_0
2007-11-03 13:02:55 [rosetta@home] [file_xfer] Throughput 43435 bytes/sec
2007-11-03 14:33:06 [rosetta@home] Computation for task 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0 finished
2007-11-03 14:33:06 [rosetta@home] Starting 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0
2007-11-03 14:33:06 [rosetta@home] Starting task 2reb__BARCODE_ABRELAX_NOTOR-2reb_-_BARCODE__2242_687_0 using rosetta_beta version 580
2007-11-03 14:33:09 [rosetta@home] [file_xfer] Started upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Finished upload of file 1di2__TREEJUMP_ABRELAX_NOTOR-1di2_-_BARCODE__2241_71_0_0
2007-11-03 14:33:17 [rosetta@home] [file_xfer] Throughput 7927 bytes/sec
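
For anyone scanning a long client log like the one above for the failed task, here is a rough sketch that pulls out the "Unrecoverable error" lines. The regular expression is only inferred from the lines pasted in this post, so treat the format as an assumption; the function name is my own.

```python
import re

# Pattern inferred from the log lines above; other client versions may differ.
LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Reason: Unrecoverable error for result (?P<task>\S+) "
    r"\((?P<detail>.*)\)$"
)

def find_failures(log_text):
    """Return (timestamp, project, task, detail) for each unrecoverable error."""
    failures = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            failures.append((m["stamp"], m["project"], m["task"], m["detail"]))
    return failures

sample = """\
2007-11-03 12:04:45 [rosetta@home] Deferring communication for 1 min 0 sec
2007-11-03 12:04:45 [rosetta@home] Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 (process exited with code 1 (0x1, -255))
2007-11-03 12:04:45 [rosetta@home] Computation for task 2reb__TREEJUMP_ABRELAX_NOTOR-2reb_-_BARCODE__2241_85_0 finished
"""

for failure in find_failures(sample):
    print(failure)
```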

Eric

Joined: Jan 20 06
Posts: 3
ID: 52822
Credit: 47,910
RAC: 0
Message 48311 - Posted 3 Nov 2007 20:58:57 UTC

I have a computation error on 1n0u__TREEJUMP_ABRELAX_NOTOR-1n0u_-_BARCODE__2241_50881_1
This is a first for me.
____________

AMD_is_logical

Joined: Dec 20 05
Posts: 299
ID: 41207
Credit: 31,460,681
RAC: 0
Message 48312 - Posted 3 Nov 2007 21:04:24 UTC

I'm getting errors with exit from pose.cc in some TREEJUMP_ABRELAX WUs:

http://boinc.bakerlab.org/rosetta/result.php?resultid=117501958
http://boinc.bakerlab.org/rosetta/result.php?resultid=117467416
http://boinc.bakerlab.org/rosetta/result.php?resultid=117368350
http://boinc.bakerlab.org/rosetta/result.php?resultid=117362156
http://boinc.bakerlab.org/rosetta/result.php?resultid=117342990
http://boinc.bakerlab.org/rosetta/result.php?resultid=117297062

transient

Joined: Sep 30 06
Posts: 376
ID: 115553
Credit: 7,742,648
RAC: 5,672
Message 48321 - Posted 4 Nov 2007 0:20:53 UTC

I've also noticed an error with the same sort of unit as the previous poster.

TREEJUMP ABRELAX TOR EQ -5 PROB .5 SAVE ALL OUT
____________

Trey

Joined: Oct 3 06
Posts: 11
ID: 116085
Credit: 110,142
RAC: 0
Message 48357 - Posted 4 Nov 2007 18:27:30 UTC
Last modified: 4 Nov 2007 18:30:43 UTC

I've had another failure on a different computer (running Windoze this time):

1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8157_0



P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 48365 - Posted 5 Nov 2007 3:33:15 UTC

This one failed after 34sec, first one in a while.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=107003644

11/5/2007 2:12:36 PM|rosetta@home|Reason: Unrecoverable error for result 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 (Incorrect function. (0x1) - exit code 1 (0x1))

11/5/2007 2:12:36 PM|rosetta@home|Computation for task 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 finished

11/5/2007 2:12:36 PM|rosetta@home|Output file 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0_0 for task 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-1n0u_-_BARCODE__2244_8499_0 absent

Pete.

____________


rsubler

Joined: Jun 24 07
Posts: 8
ID: 185824
Credit: 172,618
RAC: 0
Message 48378 - Posted 5 Nov 2007 17:22:05 UTC

After 19 hours of crunching on work unit 106367190 this error condition has suddenly appeared.

I tried limiting BOINC to this work unit, rebooting my computer and running only BOINC -- to no avail. The error condition persists.

The computer is an AMD x2 3800+, both cores are available for BOINC and 90% of the 1 gig physical memory and 75% of a 3 gig swap file are available to BOINC.

Is there anything that I can do besides purging the WU?

Ron

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 48380 - Posted 5 Nov 2007 17:35:11 UTC

rsubler, if you open Windows task manager and go to the processes tab, how much memory does it indicate is used for that process? (it will have Rosetta in the name)
____________
Rosetta Moderator: Mod.Sense

rsubler

Joined: Jun 24 07
Posts: 8
ID: 185824
Credit: 172,618
RAC: 0
Message 48381 - Posted 5 Nov 2007 17:43:33 UTC

Mod sense

When I activate the next Rosetta WU (also 5.80), Task Manager shows 130,808k.

When I try to activate the problem WU, Task Manager does not have a Rosetta entry.

This is with all other BOINC WUs suspended.

Ron

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3388
ID: 106194
Credit: 0
RAC: 0
Message 48383 - Posted 5 Nov 2007 18:04:40 UTC

rsubler, you say "after 19 hrs of crunching"... has it recorded that much CPU time? Or has it been "waiting for memory" that long? If it has recorded that much CPU time, then it should mean it has been in a "running" status for that long. Has the amount of time spent increased in the last several hours?

Looks like your runtime preference must be 24hrs. What is shown for the % completed on it now?

Have you exited and restarted BOINC since you noticed this one?
____________
Rosetta Moderator: Mod.Sense

rsubler

Joined: Jun 24 07
Posts: 8
ID: 185824
Credit: 172,618
RAC: 0
Message 48384 - Posted 5 Nov 2007 18:11:04 UTC
Last modified: 5 Nov 2007 18:25:34 UTC

Mod.Sense

1. That is the amount of CPU time shown by Boinc. It has not changed for the last hour or two -- since I noticed the problem.

2. Yes, I have been running 24 hour Rosetta WUs for some months. The problem WU is now showing 79.646% complete.

3. Maybe. I restricted BOINC to the problem WU by suspending all others and then rebooted my system. I did not explicitly exit BOINC. I will try this next.
I just did an exit of BOINC and restart. The problem persists.
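
As a rough cross-check of the numbers above, assuming the percentage shown is simply CPU time divided by the run time preference (an assumption on my part, not something confirmed in this thread):

```python
runtime_pref_hours = 24.0   # the 24 hour Rosetta run time mentioned above
percent_complete = 79.646   # percentage shown for the problem WU
reported_cpu_hours = 19.0   # "19 hours of crunching"

# CPU time implied by the percentage, under the assumption stated above.
implied_hours = runtime_pref_hours * percent_complete / 100.0
print(f"implied CPU time: {implied_hours:.2f} h (reported: about {reported_cpu_hours:.0f} h)")
```

The implied figure is close to the 19 hours shown, so the percentage and the CPU time at least agree with each other.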

Thanks,
Ron

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 48395 - Posted 5 Nov 2007 20:35:08 UTC

I had another task fail last night same type, different computer.

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=107168558

11/5/2007 7:25:53 PM|rosetta@home|Reason: Unrecoverable error for result 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 (Incorrect function. (0x1) - exit code 1 (0x1))

11/5/2007 7:25:53 PM|rosetta@home|Computation for task 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 finished

11/5/2007 7:25:53 PM|rosetta@home|Output file 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0_0 for task 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13080_0 absent

Is someone looking at this problem?

Pete.


____________


rsubler

Joined: Jun 24 07
Posts: 8
ID: 185824
Credit: 172,618
RAC: 0
Message 48403 - Posted 5 Nov 2007 23:55:44 UTC

To Mod.Sense

Re: Waiting for memory, rsubler.

The problem has cured itself and the WU is now running.

I wish anyone good luck in duplicating the fault and cure. The WU resumed running
while I was playing an ancient game with DOSBOX. I thought that DOSBOX was a
resource hog, but that is the only variable of which I am aware.

Cheers,
Ron

Conan

Joined: Oct 11 05
Posts: 136
ID: 4053
Credit: 1,906,525
RAC: 1,152
Message 48440 - Posted 7 Nov 2007 12:08:33 UTC
Last modified: 7 Nov 2007 12:09:48 UTC

Had one of these errors on the 6th and now another two on the 7th.
Two were on a Windows machine and one on a Linux machine, all with the same error code on the same '1n0u' type WU.

This WU
This WU
This WU

<core_client_version>5.8.15</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1408543
ERROR:: Exit from: .\pose.cc line: 769

Also on core client version 5.10.21
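
If anyone wants to tally these across their own results, here is a small sketch that pulls the source file and line number out of a stderr dump. The pattern is based only on the "Exit from" line shown above, and the function name is my own.

```python
import re

# Matches lines such as: ERROR:: Exit from: .\pose.cc line: 769
EXIT_RE = re.compile(r"ERROR:: Exit from: (?P<file>\S+) line: (?P<line>\d+)")

def exit_location(stderr_text):
    """Return (source_file, line_number) from a stderr dump, or None if absent."""
    m = EXIT_RE.search(stderr_text)
    if m is None:
        return None
    return m["file"], int(m["line"])

stderr_text = r"""
# cpu_run_time_pref: 21600
# random seed: 1408543
ERROR:: Exit from: .\pose.cc line: 769
"""
print(exit_location(stderr_text))
```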
____________

Luuklag

Joined: Sep 13 07
Posts: 262
ID: 205058
Credit: 4,171
RAC: 0
Message 48470 - Posted 8 Nov 2007 19:09:10 UTC

[url=http://boinc.bakerlab.org/rosetta/result.php?resultid=117906669]a failed one[/url]

ERROR:: Exit from: .\pose.cc line: 769

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 48476 - Posted 8 Nov 2007 21:26:43 UTC

Result ID 118032106
Name 2reb__TREEJUMP_ABRELAX_TOR_EQ_-1_PROB_.1_SAVE_ALL_OUT-2reb_-_BARCODE__2244_13258_0
Workunit 107265106
Created 5 Nov 2007 8:09:11 UTC
Sent 5 Nov 2007 11:10:58 UTC
Received 8 Nov 2007 16:17:30 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 510574
Report deadline 15 Nov 2007 11:10:58 UTC
CPU time 5042.90625
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1419133
ERROR:: Exit from: .\pose.cc line: 769

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 20.725275001017
Granted credit 0
application version 5.80

____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 48477 - Posted 8 Nov 2007 21:28:10 UTC

To Luuklag.

SNAP.
____________

M.L.

Joined: Nov 21 06
Posts: 182
ID: 130574
Credit: 180,462
RAC: 0
Message 48486 - Posted 9 Nov 2007 7:34:20 UTC

Result ID 118192540
Name 1n0u__TREEJUMP_ABRELAX_TOR_EQ_-5_PROB_.5_SAVE_ALL_OUT-1n0u_-_BARCODE__2243_13643_1
Workunit 107277272
Created 5 Nov 2007 22:07:48 UTC
Sent 5 Nov 2007 22:08:22 UTC
Received 8 Nov 2007 23:31:01 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 510574
Report deadline 15 Nov 2007 22:08:22 UTC
CPU time 3377.8125
stderr out <core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 21600
# random seed: 1488748
ERROR:: Exit from: .\pose.cc line: 769

</stderr_txt>
]]>


Validate state Invalid
Claimed credit 13.8820928833196
Granted credit 0
--also failed as WU 118046654 5 Nov.
____________

rochester new york

Joined: Jul 2 06
Posts: 2569
ID: 98229
Credit: 985,786
RAC: 1,084
Message 48495 - Posted 9 Nov 2007 12:30:30 UTC

Could someone tell me why I got all these errors?

Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC