Rosetta@home

Problems with Minirosetta Version 1.71

Yifan Song
Forum moderator
Project administrator
Project developer
Project scientist

Joined: May 26 09
Posts: 62
ID: 318024
Credit: 7,322
RAC: 0
Message 61386 - Posted 26 May 2009 19:07:08 UTC

minirosetta updated to 1.71. Bug fixes and some local parameter file loading options added in this version.

Nelson

Joined: Jun 8 06
Posts: 1
ID: 93215
Credit: 348,195
RAC: 0
Message 61404 - Posted 27 May 2009 2:20:46 UTC

I'm getting: "Message from server: Server error: Can't attach shared memory"
____________

adrianxw Profile

Joined: Sep 18 05
Posts: 535
ID: 402
Credit: 1,057,641
RAC: 1,674
Message 61410 - Posted 27 May 2009 7:40:57 UTC

I think that is a server issue, not client specific. I've seen it a few times in the last few months.
____________
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

googloo

Joined: Sep 15 06
Posts: 105
ID: 112667
Credit: 5,953,021
RAC: 7,157
Message 61413 - Posted 27 May 2009 12:48:35 UTC

Once again, I beg you: please, please, please post version changes to the Rosetta Application Version Release Log. That way we'll get an email and will be able to update our firewalls. PLEASE

Drockarius

Joined: May 9 09
Posts: 1
ID: 314840
Credit: 0
RAC: 0
Message 61414 - Posted 27 May 2009 13:35:35 UTC - in response to Message ID 61386.

I am having problems getting the project. I keep getting this:

5/27/2009 9:07:14 AM rosetta@home [error] File minirosetta_1.71_windows_x86_64.exe has wrong size: expected 8556544, got 7619862
5/27/2009 9:30:55 AM rosetta@home [error] File minirosetta_graphics_1.64_windows_x86_64.exe has wrong size: expected 2498560, got 2216910

As a consequence, the project never completely downloads, so I am unable to participate in the project. I have detached from and attempted to re-attach to the project on several occasions, but the result is always the same and the files always stop downloading when they reach a certain size.
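
For anyone wanting to quantify how far these downloads get before stalling, here is a minimal sketch. It assumes the client's error lines follow the wording of the two quoted above; the pattern and the function name `download_progress` are illustrative and not part of BOINC.

```python
import re

# Pattern for the client error quoted above. The wording is taken from
# those two log lines only and may differ in other client versions.
WRONG_SIZE = re.compile(
    r"File (?P<name>\S+) has wrong size: "
    r"expected (?P<expected>\d+), got (?P<got>\d+)"
)

def download_progress(log_line):
    """Return (file name, fraction received), or None if the line does not match."""
    match = WRONG_SIZE.search(log_line)
    if match is None:
        return None
    expected = int(match.group("expected"))
    got = int(match.group("got"))
    return match.group("name"), got / expected

line = ("5/27/2009 9:07:14 AM rosetta@home [error] File "
        "minirosetta_1.71_windows_x86_64.exe has wrong size: "
        "expected 8556544, got 7619862")
name, fraction = download_progress(line)
print(f"{name}: {fraction:.1%} received")
```

Both files in the log above come out a little under 90% received, which fits a transfer being cut off at a consistent point rather than a random network drop.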

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 61419 - Posted 28 May 2009 0:19:07 UTC

Just a comment about these (lr5_E_yf_chbond_05_rlbd_1bq9_SAVE_ALL_OUT....) tasks: for me they go very, very quickly. I had 99 decoys in 1 hr 17 min on one task; I think that is the fastest I have ever done 99 decoys. Another one reached 99 in just over 2 hrs, which is still very fast.

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 61422 - Posted 28 May 2009 4:48:19 UTC - in response to Message ID 61419.

Just a comment about these (lr5_E_yf_chbond_05_rlbd_1bq9_SAVE_ALL_OUT....) tasks: for me they go very, very quickly. I had 99 decoys in 1 hr 17 min on one task; I think that is the fastest I have ever done 99 decoys. Another one reached 99 in just over 2 hrs, which is still very fast.

This is normal; lr5 tasks have always reached 99 decoys quickly. I am unsure why this is the case, though.
____________
Have a crunching good day!!

Greg_BE Profile

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 61425 - Posted 28 May 2009 10:13:38 UTC - in response to Message ID 61422.

Just a comment about these (lr5_E_yf_chbond_05_rlbd_1bq9_SAVE_ALL_OUT....) tasks: for me they go very, very quickly. I had 99 decoys in 1 hr 17 min on one task; I think that is the fastest I have ever done 99 decoys. Another one reached 99 in just over 2 hrs, which is still very fast.

This is normal; lr5 tasks have always reached 99 decoys quickly. I am unsure why this is the case, though.


Looks like it is just the yf tasks that have this speedy result. I had a new icoor task that ran the full 4 hrs. yf must be a simple task.

Ian McGregor Profile

Joined: Oct 21 08
Posts: 5
ID: 284732
Credit: 1,358,192
RAC: 1,030
Message 61449 - Posted 29 May 2009 18:21:19 UTC

Not sure why, but the past 25 WUs of v1.71 I've gotten have all had computation errors and exited before finishing.

Hammeh Profile

Joined: Nov 11 08
Posts: 63
ID: 287579
Credit: 211,283
RAC: 0
Message 61450 - Posted 29 May 2009 18:24:43 UTC - in response to Message ID 61449.

Not sure why, but the past 25 WUs of v1.71 I've gotten have all had computation errors and exited before finishing.


Your computer list shows no failed tasks.

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 61455 - Posted 29 May 2009 22:11:26 UTC

Looks like it is just the yf tasks that have this speedy result. yf must be a simple task.
lr5_E_yf_chbond_05_run2_rlbd_4ubp_SAVE_ALL_OUT_12502_390_0 ran for 5.9 hours, and my time preference is 6 hours.
____________
Have a crunching good day!!

DD UU

Joined: May 18 09
Posts: 2
ID: 316250
Credit: 227,587
RAC: 3,789
Message 61456 - Posted 30 May 2009 0:42:17 UTC

My client (which works only with Rosetta) can't get any work units.
It says that I need 39 GB of space.
(And that requirement is increasing. Every time it connects to get work units it asks for slightly more space.)

boinc: 29-May-2009 19:56:41 [rosetta@home] Sending scheduler request: To fetch work. Requesting 33217 seconds of work, reporting 0 completed tasks
boinc: 29-May-2009 19:56:46 [rosetta@home] Scheduler request completed: got 0 new tasks
boinc: 29-May-2009 19:56:46 [rosetta@home] Message from server: No work sent
boinc: 29-May-2009 19:56:46 [rosetta@home] Message from server: There was work but you don't have enough disk space allocated.
boinc: 29-May-2009 19:56:46 [rosetta@home] Message from server: An additional 39382 MB is needed.

Is it true that I need 39 GB?
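
For reference, the figure in the server message converts as follows. This is a minimal sketch that assumes the message wording shown in the log above and 1024 MB per GB; the function name is hypothetical.

```python
import re

def additional_disk_needed_gb(message):
    """Pull the shortfall in MB out of a scheduler message and convert it to GB."""
    match = re.search(r"An additional (\d+) MB is needed", message)
    if match is None:
        return None
    return int(match.group(1)) / 1024  # assumption: 1 GB = 1024 MB

msg = "Message from server: An additional 39382 MB is needed."
shortfall_gb = additional_disk_needed_gb(msg)
print(f"{shortfall_gb:.1f} GB")
```

Logging this value on each scheduler request would show whether the requirement really does creep upward between contacts.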

DD UU

Joined: May 18 09
Posts: 2
ID: 316250
Credit: 227,587
RAC: 3,789
Message 61461 - Posted 30 May 2009 4:14:50 UTC - in response to Message ID 61456.

I have restarted the client, and that has fixed it.

zpm

Joined: Mar 21 09
Posts: 6
ID: 306856
Credit: 349,801
RAC: 0
Message 61473 - Posted 30 May 2009 14:59:58 UTC - in response to Message ID 61461.

I have restarted the client, and that has fixed it.


This issue of WUs needing more space also popped up at dd@h (drugdiscovery), but it was server-side related.

____________

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
http://boinc.drugdiscoveryathome.com

Michael Hoffmann Profile

Joined: Jun 5 08
Posts: 8
ID: 263088
Credit: 886,216
RAC: 882
Message 61476 - Posted 30 May 2009 22:47:55 UTC

I'm running task lb_alnmatrix_threading_alncap__hb_t313__IGNORE_THE_REST_12577_3074_0 right now. The elapsed time is over 11 hours now, with another 30 predicted. It was the same with lr5_D_rama_map_iter05_rlbn_1kpe_SAVE_ALL_OUT_NATIVE_NOCON_12603_60_0_0. The outcome is nothing really special, so I wonder if this is normal. Usually, at least in the previous version, I needed between 3 and 5 hours for a task.
The current task's graphics also cannot be displayed. This is all a bit strange. Is it due to the new minirosetta version?

(By the way, I'm running a Vista64 system with 2 x 3.25 GHz and 4 GB RAM.)

Toby Broom

Joined: Oct 15 08
Posts: 7
ID: 283928
Credit: 5,360,371
RAC: 14,700
Message 61477 - Posted 30 May 2009 22:58:51 UTC

I seem to have a few tasks that hang partway through; they're still "running" in BOINC but they are way over the default 3 hrs:



I aborted a few to keep my computer going e.g.:

255088102
255066919
255015300
254965104
254889680

Any other information that is of use?

P . P . L .

Joined: Aug 20 06
Posts: 581
ID: 105843
Credit: 4,864,105
RAC: 0
Message 61478 - Posted 30 May 2009 23:09:10 UTC

Hi.

Some of mine seem to go in the other direction. I just noticed this one is a bit odd: any idea why it finished early? It only ran halfway!

lb_alnmatrix_threading_alncap__hb_t363__IGNORE_THE_REST_12591_1728

http://boinc.bakerlab.org/rosetta/workunit.php?wuid=232788417

# cpu_run_time_pref: 14400
======================================================
DONE :: 22 starting structures 7877.82 cpu seconds
This process generated 22 decoys from 22 attempts
======================================================

pete.
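
To put a number on "half way", the two figures in the excerpt above can be compared directly. A minimal sketch, assuming the stderr text has the same layout as the lines pasted in this post:

```python
import re

# Excerpt copied from the post above (separator lines omitted).
stderr_excerpt = """\
# cpu_run_time_pref: 14400
DONE :: 22 starting structures 7877.82 cpu seconds
This process generated 22 decoys from 22 attempts
"""

# Run time preference in seconds, and CPU seconds actually used.
pref = int(re.search(r"cpu_run_time_pref:\s*(\d+)", stderr_excerpt).group(1))
used = float(re.search(r"([\d.]+) cpu seconds", stderr_excerpt).group(1))

fraction = used / pref
print(f"used {fraction:.0%} of the run time preference")
```

The log also reports 22 decoys from 22 starting structures, so one possible reading is that the task ran out of starting structures rather than time; that is a guess from the log, not something the project has confirmed here.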

____________


robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61481 - Posted 31 May 2009 0:26:31 UTC - in response to Message ID 61477.

I seem to have a few tasks that hang partway through; they're still "running" in BOINC but they are way over the default 3 hrs:



I aborted a few to keep my computer going e.g.:

255088102
255066919
255015300
254965104
254889680

Any other information that is of use?


Looks like BOINC thinks you're running on 8 CPU cores. Do you actually have that many, or is hyperthreading making it look like you have twice as many as you actually have? Hyperthreading allows BOINC to use any CPU time that the other workunit on the same CPU core does not use, but BOINC is unable to keep good track of which workunits use how much CPU time when hyperthreading is in use.

Also, I'd expect the total memory requirements to be increased when that many workunits are trying to run at once. Just how much RAM do you have?

And how much disk space do you allow BOINC to use?

Toby Broom

Joined: Oct 15 08
Posts: 7
ID: 283928
Credit: 5,360,371
RAC: 14,700
Message 61482 - Posted 31 May 2009 1:13:50 UTC
Last modified: 31 May 2009 1:20:26 UTC

Hi Rob,

The systems that are having problems are Xeon 54xx machines, both dual-processor systems without hyperthreading, so the 8 cores are real.

The systems use 2-3 GB of RAM and have 4 GB.

They're set to the default of 10 GB of disk space.

Version 1.71 seems to be less reliable, but I have also just upgraded to BOINC 6.6.28, so maybe that is a factor?

See here:
1060278
941259

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61483 - Posted 31 May 2009 1:53:55 UTC - in response to Message ID 61481.
Last modified: 31 May 2009 1:59:32 UTC

I traced through your aborted workunits to the information BOINC maintains about your computer. It thinks you have 8 CPU cores and 4 GB of memory. I've found that at least under Vista SP1, you need at least 1 GB of memory per CPU core to run minirosetta, and some of those WCG programs, at full speed. Is your motherboard capable of handling another 4 GB of memory, and can you afford it?

WARNING - don't use the usual Crucial program for telling how much memory your motherboard can handle under any 64-bit operating system, unless you're prepared for an immediate operating system crash. It seems that program hadn't been adequately tested under 64-bit operating systems the last time I tried it on my 64-bit machine.

By default, BOINC won't use more than about half the available memory unless you go out of your way to tell it that it can. Therefore, the fact that it is currently not using much more than half of it doesn't mean that BOINC has enough for the workload you're giving it. To check how much effect this has, try running BOINC with the setting not to use more than half your CPU cores at a time, and see how much effect this has on the speed at which minirosetta workunits run.

On my 2 CPU core 32-bit machine, I found that either the default of 10 GB disk space or the default settings for how much swap space BOINC could use weren't enough. It's hard for me to tell which, because I changed them both at once.

I'd suggest that you change only one of these at a time, and record what the effects are:

1. Increase the disk space to 10 GB times the number of CPU cores. Expect BOINC to divide the allowed swap space equally among all the BOINC projects it's been told to connect to, before deciding how much to allocate to each workunit. Therefore, some BOINC projects can run short of swap space, while others aren't using all they're allocated.

2. Allow BOINC to use a higher percentage of the swap space, since BOINC is probably all you're running on that machine that needs much swap space, and Vista will base the total size of the swap space on how much of it is used.

Note that when the number of apparent CPU cores has been doubled by hyperthreading, you cannot run all of the new number at full speed at the same time, and BOINC has problems judging how much CPU time is used on one of a hyperthreaded pair when the other member of the pair is also in use. Therefore, don't expect hyperthreading to increase your total throughput very much over using only one member of each hyperthreaded pair.
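
The two rules of thumb in this post (at least 1 GB of memory per CPU core, and 10 GB of disk space times the number of cores) can be sketched as below. These are forum suggestions rather than official requirements, and the function name is hypothetical.

```python
def rule_of_thumb(cpu_cores, ram_gb):
    """Apply the per-core memory and disk suggestions from the post above."""
    ram_needed_gb = 1 * cpu_cores        # at least 1 GB of memory per core
    disk_suggested_gb = 10 * cpu_cores   # 10 GB of disk space per core
    return {
        "ram_needed_gb": ram_needed_gb,
        "ram_shortfall_gb": max(0, ram_needed_gb - ram_gb),
        "disk_suggested_gb": disk_suggested_gb,
    }

# The host discussed above: 8 CPU cores and 4 GB of memory.
print(rule_of_thumb(cpu_cores=8, ram_gb=4))
```

For that host the sketch gives 8 GB of memory needed (a 4 GB shortfall) and 80 GB of suggested disk space.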

Toby Broom

Joined: Oct 15 08
Posts: 7
ID: 283928
Credit: 5,360,371
RAC: 14,700
Message 61496 - Posted 31 May 2009 11:10:04 UTC

The Vista SP1 machine seems fine; it only has 4 cores and 4 GB of RAM.

I'll keep an eye out for some more RAM; the 8-core machines can easily take 8 GB.

I have upped the 10 GB of disk space and will see how it goes; if that doesn't drop the error rate then I'll change the swap setting.

The older Xeons don't have hyperthreading, so no worries there.

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 61503 - Posted 31 May 2009 15:44:28 UTC

Task 255348421 failed at startup on Mac

Setting up checkpointing ...
Setting up graphics native ...

ERROR: ERROR: no template_pdb provided for alignment 1AXJ__1
ERROR:: Exit from: src/protocols/jd2/ThreadingJobInputter.cc line: 234
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>


____________

Chilean Profile

Joined: Oct 16 05
Posts: 651
ID: 5008
Credit: 10,238,180
RAC: 4,709
Message 61510 - Posted 1 Jun 2009 5:37:26 UTC - in response to Message ID 61477.
Last modified: 1 Jun 2009 5:40:42 UTC

I seem to have a few tasks that hang partway through; they're still "running" in BOINC but they are way over the default 3 hrs:



I aborted a few to keep my computer going e.g.:

255088102
255066919
255015300
254965104
254889680

Any other information that is of use?


Wow...

My guess is that Hyper-Threading is enabled, which might cause a few problems with BOINC.

Also, after going through your PC's specs, I think this PC in particular has 4 GB of RAM. Considering R@H uses ~0.25 GB and the others do so as well... 2 GB are used up by BOINC ONLY. Take away another 1 GB for Windows... then another 1 GB for some other application and your RAM is gone...
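
Spelled out, the estimate above is simple arithmetic. In this minimal sketch the per-task and per-consumer figures are the poster's rough guesses, and 8 concurrent tasks is an assumption based on the 8 cores reported for this host.

```python
# Rough memory budget from the post above.
concurrent_tasks = 8      # assumption: one task per reported core
gb_per_task = 0.25        # poster's estimate per running task
installed_ram_gb = 4.0

boinc_gb = concurrent_tasks * gb_per_task   # memory used by BOINC tasks
windows_gb = 1.0                            # poster's estimate for the OS
other_apps_gb = 1.0                         # poster's estimate for other software

remaining_gb = installed_ram_gb - (boinc_gb + windows_gb + other_apps_gb)
print(f"BOINC: {boinc_gb} GB, remaining: {remaining_gb} GB")
```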
____________

Path7

Joined: Aug 25 07
Posts: 128
ID: 201002
Credit: 61,751
RAC: 0
Message 61515 - Posted 1 Jun 2009 10:19:23 UTC
Last modified: 1 Jun 2009 10:27:06 UTC

Hello all,
It's been a long time since my last error on Rosetta@home, but today I had an error on the following WU:
lb_dk_ksync_full_hb_t297__IGNORE_THE_REST_12608_4068_0

ERROR: ERROR: no template_pdb provided for alignment 1BWP__1
ERROR:: Exit from: ..\..\src\protocols\jd2\ThreadingJobInputter.cc line: 234
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

BOINC 5.10.45 / Vista home prem. SP-1

The same error on the second run:
BOINC 6.2.18 / Mac. (Darwin 9.7.0)

Have a nice day,
Path7.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 61530 - Posted 2 Jun 2009 1:47:04 UTC

Not sure if I'm posting to the right thread, but has anyone noticed a sudden reduction in both claimed and granted credits recently?

My 4hr WUs used to ask for about 55 credits\WU but from 29th May this suddenly dropped to about 34 credits\WU and it hasn't varied since.

The only change at my end was the installation of Vista SP2. Surely this can't be the cause, can it? Anyone else noticed the same thing?
____________

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61533 - Posted 2 Jun 2009 3:19:41 UTC - in response to Message ID 61530.
Last modified: 2 Jun 2009 3:24:46 UTC

Not sure if I'm posting to the right thread, but has anyone noticed a sudden reduction in both claimed and granted credits recently?

My 4hr WUs used to ask for about 55 credits\WU but from 29th May this suddenly dropped to about 34 credits\WU and it hasn't varied since.

The only change at my end was the installation of Vista SP2. Surely this can't be the cause, can it? Anyone else noticed the same thing?


I've noticed something somewhat similar lately with 12 hour WUs now under Vista SP2, but at least in my case the difference seems to be that more workunits reach their 99 decoys limit instead of trying to use all 12 hours.

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 61537 - Posted 2 Jun 2009 8:23:02 UTC

This one 255673525

lb_dk_ksync_full_hb_t297__IGNORE_THE_REST_12608_4184_1

has had its second chance. On both times it failed at less than 21 seconds.
____________

Telescope Adrian

Joined: Nov 14 06
Posts: 9
ID: 129278
Credit: 1,906,378
RAC: 0
Message 61539 - Posted 2 Jun 2009 10:10:43 UTC

Hello there. I have been running some 1.71 work units for a few days now and have made the following strange observation. When running 2 together, after a while the CPU usage for both drops to around 50-60 percent. My preferences are set to allow 100% usage of both cores in my Athlon 64 X2 3.2GHz. The System Idle Process shows as using around 40% of the processor.
Has anyone else observed this "anomaly", or is there an obvious answer to this? I have the pedal to the metal for both cores, yet they're not being fully utilised. It doesn't happen with WCG or Spinhenge.
____________

Schobbe

Joined: Jun 1 09
Posts: 1
ID: 319122
Credit: 7,431
RAC: 0
Message 61542 - Posted 2 Jun 2009 11:06:20 UTC

I am having Problems with downloading these two files:
minirosetta_1.71_windows_intelx86.exe
minirosetta_graphics_1.64_windows_intelx86.exe

They stop downloading at about 90%.
I think it is the same problem that Drockarius has.

02.06.2009 13:00:11 rosetta@home [error] File minirosetta_1.71_windows_intelx86.exe has wrong size: expected 8556544, got 7860686
02.06.2009 13:00:11 rosetta@home [error] File minirosetta_graphics_1.64_windows_intelx86.exe has wrong size: expected 2498560, got 2296146


Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 61544 - Posted 2 Jun 2009 13:22:54 UTC - in response to Message ID 61539.

Hello there. I have been running some 1.71 work units for a few days now and have made the following strange observation. When running 2 together, after a while the CPU usage for both drops to around 50-60 percent. My preferences are set to allow 100% usage of both cores in my Athlon 64 X2 3.2GHz. The System Idle Process shows as using around 40% of the processor.
Has anyone else observed this "anomaly", or is there an obvious answer to this? I have the pedal to the metal for both cores, yet they're not being fully utilised. It doesn't happen with WCG or Spinhenge.


Do both tasks show they are still running? Or has one gone to a status of "waiting for memory"? Many of the WCG tasks take significantly less memory than the Rosetta work. Check the memory settings for your machine for when it is active and when it is idle.
____________
Rosetta Moderator: Mod.Sense

Ian McGregor Profile

Joined: Oct 21 08
Posts: 5
ID: 284732
Credit: 1,358,192
RAC: 1,030
Message 61545 - Posted 2 Jun 2009 15:14:59 UTC - in response to Message ID 61450.

Not sure why, but the past 25 WUs of v1.71 I've gotten have all had computation errors and exited before finishing.


Your computer list shows no failed tasks.


Here's where I'm looking..
http://boinc.bakerlab.org/rosetta/results.php?hostid=927910

Pepo

Joined: Sep 28 05
Posts: 115
ID: 1676
Credit: 101,358
RAC: 0
Message 61549 - Posted 2 Jun 2009 22:06:11 UTC
Last modified: 2 Jun 2009 22:07:48 UTC

One task, lb_alnmatrix_threading_alncap__hb_t325__IGNORE_THE_REST_12581_1162_0, failed without any apparent reason: exit 0, but invalid result.
Maybe a failed computation restart.

Win XP SP3, BOINC 6.6.23.

Peter

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 61553 - Posted 3 Jun 2009 2:45:11 UTC - in response to Message ID 61530.

Not sure if I'm posting to the right thread, but has anyone noticed a sudden reduction in both claimed and granted credits recently?

My 4hr WUs used to ask for about 55 credits\WU but from 29th May this suddenly dropped to about 34 credits\WU and it hasn't varied since.

The only change at my end was the installation of Vista SP2. Surely this can't be the cause, can it? Anyone else noticed the same thing?

In spite of no-one else reporting similar experiences, from this morning (exactly on midnight again) claimed credit has shot up to around 64 with an equivalent jump in granted credit. I certainly didn't do anything this time - not even a re-boot.

All very odd.
____________

Peter Moss Profile

Joined: Oct 3 05
Posts: 3
ID: 2398
Credit: 3,150,429
RAC: 2,853
Message 61557 - Posted 3 Jun 2009 7:06:55 UTC

Just a minor side-line issue, with the 'screen-saver' image.

Currently running the following...

lb_dk_ksync__full_hb_t293__IGNORE_THE_REST_12640_2287_0
Stage: unk
using minirosetta version 171

Out of curiosity I had a look at the 'Running' graphics, and
there seemed to be a bug there. I can only see the occasional upper
edge of the folding results. I see all of the Native, the top 1/4 of
the Low Energy, and the same or less in the Searching/Accepted windows.
It makes no difference if I expand the 'window'.

Seems to have no effect on performance tho', which is a relief.


____________

Toby Broom

Joined: Oct 15 08
Posts: 7
ID: 283928
Credit: 5,360,371
RAC: 14,700
Message 61585 - Posted 5 Jun 2009 22:22:25 UTC - in response to Message ID 61483.


1. Increase the disk space to 10 GB times the number of CPU cores. Expect BOINC to divide the allowed swap space equally among all the BOINC projects it's been told to connect to, before deciding how much to allocate to each workunit. Therefore, some BOINC projects can run short of swap space, while others aren't using all they're allocated.

2. Allow BOINC to use a higher percentage of the swap space, since BOINC is probably all you're running on that machine that needs much swap space, and Vista will base the total size of the swap space on how much of it is used.


Just to report back, 1. didn't seem to work, 2. seems to have fixed the problems :)

Toby Broom

Joined: Oct 15 08
Posts: 7
ID: 283928
Credit: 5,360,371
RAC: 14,700
Message 61586 - Posted 5 Jun 2009 22:26:09 UTC - in response to Message ID 61510.



Wow...

My guess is that Hyper-Threading is enabled, which might cause a few problems with BOINC.

Also, after going through your PC's specs, I think this PC in particular has 4 GB of RAM. Considering R@H uses ~0.25 GB and the others do so as well... 2 GB are used up by BOINC ONLY. Take away another 1 GB for Windows... then another 1 GB for some other application and your RAM is gone...


The PC doesn't have Hyper-Threading; it's a Xeon system, so there are 2 quad-core chips in the motherboard.

The PC is dedicated to BOINC, so there aren't any other applications running. After adjusting the memory settings for BOINC it seems happier, and it is still only using 66% of total RAM.

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 61600 - Posted 7 Jun 2009 2:38:50 UTC

A rare compute error:

lb_dk_ksync_full_hb_t370__IGNORE_THE_REST_12633_894_1

<core_client_version>6.6.20</core_client_version>

[...]

ERROR: ERROR: no template_pdb provided for alignment 1AXJ__1
ERROR:: Exit from: ..\..\src\protocols\jd2\ThreadingJobInputter.cc line: 234
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

____________

Speedy

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 61609 - Posted 7 Jun 2009 21:11:52 UTC
Last modified: 7 Jun 2009 21:12:20 UTC

Has anyone had the screen saver freeze while the mouse pointer still moves around the screen with ease? I had the screen saver set for 30 minutes, and I got home to find it was stuck; I am unsure how long it had been stuck for. Are there any flags I can set to see what is going on? So that this doesn't happen in the meantime, I have set my screen saver to something different. This is the host in question. Thanks in advance.
____________
Have a crunching good day!!

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 61612 - Posted 8 Jun 2009 1:10:37 UTC

Another template_pdb error on Mac for task 257091019

ERROR: ERROR: no template_pdb provided for alignment 1BWP__1
ERROR:: Exit from: src/protocols/jd2/ThreadingJobInputter.cc line: 234
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

____________

Murasaki

Joined: Apr 20 06
Posts: 303
ID: 78284
Credit: 365,375
RAC: 94
Message 61613 - Posted 8 Jun 2009 1:20:08 UTC

Task ID: 257140458
Name: lb_thread_all_multi_hb_t373__IGNORE_THE_REST_12747_579_1

sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 1.#QNAN00
THETA3 1.#QNAN00
PHI2 1.#QNAN00

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: ..\..\src\core\kinematics\AtomTree.cc line: 754
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish
____________

Paul D. Buck Profile

Joined: Sep 17 05
Posts: 815
ID: 269
Credit: 1,812,737
RAC: 0
Message 61614 - Posted 8 Jun 2009 6:46:56 UTC

A variety of errors:

lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_731_0
lb_thread_control_hb_t297__IGNORE_THE_REST_12685_817_0
lb_thread_control_hb_t297__IGNORE_THE_REST_12685_741_1
lb_thread_all_multi_hb_t312__IGNORE_THE_REST_12726_551_0
lb_thread_all_multi_hb_t286__IGNORE_THE_REST_12715_228_1

The errors range from missing templates to:
interpolate rotamers bin out of range: SER_p:NtermProteinFull 0 nan nan nan
3 3 10 11 2147483649 22 0 nan

first time I saw a "Not a Number" error ...

Divide Overflow

Joined: Sep 17 05
Posts: 82
ID: 129
Credit: 921,382
RAC: 0
Message 61623 - Posted 8 Jun 2009 15:11:25 UTC - in response to Message ID 61614.

Had a sudden burst of faults here:

257095553
257150714
257151825

Value out of legal range and no template provided errors.

____________

svincent

Joined: Dec 30 05
Posts: 202
ID: 44923
Credit: 4,102,500
RAC: 5,735
Message 61626 - Posted 9 Jun 2009 2:04:49 UTC

Several errors (Mac)

Task 257226638 failed after 7hrs (my run time preference is 3hrs: I haven't seen other tasks overrun like this) with an Hbond tripped

Hbond tripped: [2009- 6- 8 4:54:59:]
BOINC:: CPU time: 25400.8s, 14400s + 10800s[2009- 6- 8 12:44:44:] :: BOINC
InternalDecoyCount: protocols::boinc::Boinc::decoy_count() (GZ)
======================================================
DONE :: 1 starting structures 25400.8 cpu seconds
This process generated 1 decoys from 1 attempts
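
The "CPU time" line in the log above can be checked with a little arithmetic. This is a minimal sketch which assumes the two figures after the CPU time are added together to form a cutoff. The 10800 s figure matches the 3-hour run time preference mentioned above; reading 14400 s as an extra 4-hour allowance is an assumption, not something the log states.

```python
import re

line = "BOINC:: CPU time: 25400.8s, 14400s + 10800s"

match = re.search(r"CPU time: ([\d.]+)s, (\d+)s \+ (\d+)s", line)
cpu_seconds = float(match.group(1))
allowance = int(match.group(2))    # assumed: extra allowance (4 hours)
preference = int(match.group(3))   # 3-hour run time preference from the post

limit = allowance + preference
overrun = cpu_seconds - limit
print(f"limit {limit}s, CPU time over the limit by {overrun:.1f}s")
```

Under that reading, the task was stopped shortly after passing a 7-hour cutoff, which would be consistent with the roughly 7 hours reported above.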

-----

Task 257182103 failed, also with an Hbond tripped, but in a different way and much earlier, after about 12 minutes

Hbond tripped: [2009- 6- 8 1:42:17:]
interpolate rotamers bin out of range: SER_p:NtermProteinFull 0 nan nan nan
3 3 10 11 2147483649 22 0 nan
ERROR:: Exit from: src/core/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 589
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

-----

And 257133352 (http://boinc.bakerlab.org/rosetta/result.php?resultid=257133352) failed in a similar way


____________

[SG] ronaldo

Joined: Mar 18 07
Posts: 1
ID: 153670
Credit: 600,025
RAC: 0
Message 61632 - Posted 9 Jun 2009 11:35:19 UTC

I also have some errors:

Task ID 257076984
Task ID 257061423

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61636 - Posted 9 Jun 2009 14:20:31 UTC

Looks like 1.71 still has the lockfile problem:

http://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.

Starfire

Joined: Jan 1 06
Posts: 2
ID: 45508
Credit: 209,760
RAC: 0
Message 61638 - Posted 9 Jun 2009 16:59:39 UTC
Last modified: 9 Jun 2009 17:01:10 UTC

I've also run into some WUs that errored out:

First type of error:
234647436

Second type of error:
234660726
234660731

Currently I have 2 more WUs running that show the same behavior in the application graphics as the 2 above:
234626282
234608536




Both have already errored out for someone else. Should I abort them?
____________
Starfire

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 61639 - Posted 9 Jun 2009 17:21:05 UTC - in response to Message ID 61636.

Looks like 1.71 still has the lockfile problem:

http://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting
____________
Rosetta Moderator: Mod.Sense

[ESL Brigade] marcsen

Joined: May 24 09
Posts: 1
ID: 317428
Credit: 96,858
RAC: 0
Message 61643 - Posted 9 Jun 2009 18:15:40 UTC
Last modified: 9 Jun 2009 18:17:27 UTC

I also have errors in all "lb_thread_all_multi..." work units after ~7 hours of crunching on 2 different computers.
http://boinc.bakerlab.org/rosetta/result.php?resultid=257123602
http://boinc.bakerlab.org/rosetta/result.php?resultid=257088113
http://boinc.bakerlab.org/rosetta/result.php?resultid=257122604
http://boinc.bakerlab.org/rosetta/result.php?resultid=257121758

I will now abort this sort of work unit whenever I see one in the task list.

gazzawazza

Joined: May 4 07
Posts: 28
ID: 173083
Credit: 294,873
RAC: 3
Message 61646 - Posted 9 Jun 2009 18:44:18 UTC

hi all.

I've been having problems with BOINC 6.6.31 (running as a service) and Rosetta 1.71.

Am running on vista home premium sp2 (32 bit) on a stock clock-speed Q6600.

Please review my thread for the detail:

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4933

In summary though, I've had file size mismatches (when resetting project and downloading core rosetta files again) but think I fixed that by making sure network activity is always available.

Have been getting loads of task restarts, "exited with zero status but no 'finished' file" messages & recommendations to reset the project.

Also, the minirosetta 1.71.exe gets locked (sometimes with the process stuck in memory), even after exiting BOINC (and all other related processes exiting ok too).

Am running other projects (i.e. climateprediction, malariacontrol, world community grid) with no problems.

Finally, for the record, I have one rosetta WU that has run ok, with no computation errors, no restarts, etc. All the rest have been problematic:

"lr5_D_chbond_05_run2_rlbn_1u5z_SAVE_ALL_OUT_NATIVE_NOCON_12601_182_1".


Regards,

Gary

PinkPenguin Profile

Joined: Apr 26 09
Posts: 5
ID: 313164
Credit: 280,676
RAC: 0
Message 61647 - Posted 9 Jun 2009 20:43:50 UTC

OK, it seems I am experiencing the same problem with minirosetta 1.71 on both Linux and Windows Vista boxes. This seems to apply to all lb_thread_all_multi... work units, which give a -161 error on file transfer at the end of a job that appears to have completed OK.

Pentium 4 - Linux (Fedora Core 9) - BOINC 6.4.7
http://boinc.bakerlab.org/rosetta/result.php?resultid=257118194
http://boinc.bakerlab.org/rosetta/result.php?resultid=257399741

Pentium Core Duo - Windows Vista - BOINC 6.6.31
http://boinc.bakerlab.org/rosetta/result.php?resultid=257114975

This is the error message in the output for all three examples above:


<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
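
To pull the file name and error code out of a fragment like this one when comparing several results, something along these lines should do. It is a minimal sketch that assumes the fragment has been copied out on its own, as above; a full result page may need it extracted first. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment quoted above, on its own.
snippet = """<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>"""

def parse_file_xfer_error(xml_text):
    """Return (file name, numeric error code) from one file_xfer_error fragment."""
    root = ET.fromstring(xml_text)
    return root.findtext("file_name"), int(root.findtext("error_code"))

file_name, error_code = parse_file_xfer_error(snippet)
print(file_name, error_code)
```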


.... I suspect a bug. Anyway, the result is a "client error" and a depressed PC which I am having to persuade to keep crunching, and it should at least get a few points for having done its best.... the things one has to do to persuade these things to do some work!

All the best,
Richard

Mike* Profile

Joined: Feb 16 09
Posts: 5
ID: 301833
Credit: 102,030
RAC: 0
Message 61650 - Posted 9 Jun 2009 22:40:40 UTC
Last modified: 9 Jun 2009 22:42:25 UTC

I have had 5 wu all error with the same result as this one:
<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_447_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

All were also lb_thread_all_multi...

One was even reprocessed by another host and IT also has the same error.

I currently have 7 successful WUs, with 6 left in cache, 3 started.

Host is 1077338 (core i7, vista 64 ultimate, 12g memory)

Mike

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61651 - Posted 10 Jun 2009 0:46:14 UTC - in response to Message ID 61639.

Looks like 1.71 still has the lockfile problem:

http://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting


I hadn't known about that request before, but I may have a situation ready to try it now.

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 61654 - Posted 10 Jun 2009 3:12:45 UTC
Last modified: 10 Jun 2009 3:13:43 UTC

Result id 257383313: Compute error after 22,774 seconds. Is there any chance of being granted any credit?
____________
Have a crunching good day!!

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 61656 - Posted 10 Jun 2009 11:47:51 UTC - in response to Message ID 61654.

Result id 257383313: Compute error after 22,774 seconds. Is there any chance of being granted any credit?


Looks like the nightly credit granting script (which finds errors and gives credit for them) gave it credit. But you have to open the specific task to see it. It doesn't show on the task list when granted by the script.
____________
Rosetta Moderator: Mod.Sense

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61657 - Posted 10 Jun 2009 12:07:30 UTC - in response to Message ID 61651.

Looks like 1.71 still has the lockfile problem:

http://boinc.bakerlab.org/rosetta/result.php?resultid=257487687

This is with BOINC 6.2.28 under 32-bit Vista SP2, set to 95% CPU in order to look for the lockfile problem.


Robert, have you been able to capture any of the information requested in the wiki here?
http://www.boinc-wiki.info/Can%27t_acquire_lockfile_-_exiting


I hadn't known about that request before, but I may have a situation ready to try it now.


The email address for sending the results to does not work from my address, and my BOINC directory does not contain any of the files asked for. Does the email address work from your location?

Under BOINC 6.2.28 installed to let all users use it under 32-bit Vista SP2, what is the standard name of the directory containing the files asked for and are such files wanted for all subdirectories or just that first directory level?

Under that BOINC, where is the slots subdirectory? A files search was unable to find it.

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 61658 - Posted 10 Jun 2009 12:39:06 UTC
Last modified: 10 Jun 2009 12:40:11 UTC

Robert, keep in mind that BOINC has two main directories now. One for the BOINC Manager, and one for the "data directory" where the projects and slots reside (and they could be the same).

I believe the files they are looking for are located in the data directory. The one just OVER the projects and slots subdirectories. And they start with "std".

You could EMail me the files and I could try to forward for you if the EMail address still isn't working for you.
____________
Rosetta Moderator: Mod.Sense

nick n
Avatar

Joined: Aug 26 07
Posts: 49
ID: 201050
Credit: 219,102
RAC: 0
Message 61660 - Posted 10 Jun 2009 17:07:58 UTC
Last modified: 10 Jun 2009 17:13:08 UTC

I guess when it rains it pours here too.

http://boinc.bakerlab.org/rosetta/result.php?resultid=257820669
http://boinc.bakerlab.org/rosetta/result.php?resultid=257819658
http://boinc.bakerlab.org/rosetta/result.php?resultid=257812025
http://boinc.bakerlab.org/rosetta/result.php?resultid=257786791
http://boinc.bakerlab.org/rosetta/result.php?resultid=257682526
http://boinc.bakerlab.org/rosetta/result.php?resultid=257652583
http://boinc.bakerlab.org/rosetta/result.php?resultid=257238981
http://boinc.bakerlab.org/rosetta/result.php?resultid=257148875
http://boinc.bakerlab.org/rosetta/result.php?resultid=257098736
etc...... Oh, and also, how do you hyperlink stuff so you can just click on the links above?

Starfire

Joined: Jan 1 06
Posts: 2
ID: 45508
Credit: 209,760
RAC: 0
Message 61661 - Posted 10 Jun 2009 17:22:41 UTC - in response to Message ID 61660.

Oh, and also, how do you hyperlink stuff so you can just click on the links above?


Hi,

take a look at this page.

Basically you have to write it like this (without the *):

[*url=http://boinc.bakerlab.org/rosetta/result.php?resultid=257820669]Task 257820669[*/url]
[*url=http://boinc.bakerlab.org/rosetta/result.php?resultid=257819658]Task 257819658[*/url]
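
If you have a long list of result IDs, the tags can also be generated rather than typed by hand. This is only a small sketch: the function name `bbcode_link` and the "Task <id>" link text are my own choices for illustration, and the URL pattern is simply the one used in the links posted in this thread.

```python
# Sketch: build a BBCode [url] tag for a Rosetta@home result ID.
# RESULT_URL follows the result links posted in this thread;
# bbcode_link is a hypothetical helper name, not part of any BOINC tool.
RESULT_URL = "http://boinc.bakerlab.org/rosetta/result.php?resultid={}"


def bbcode_link(result_id):
    """Return a clickable BBCode link for one result ID."""
    url = RESULT_URL.format(result_id)
    return "[url={}]Task {}[/url]".format(url, result_id)


def bbcode_links(result_ids):
    """Return one BBCode link per line for several result IDs."""
    return "\n".join(bbcode_link(rid) for rid in result_ids)
```

Calling `print(bbcode_links([257820669, 257819658]))` prints two lines that can be pasted straight into a post.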

____________
Starfire

Speedy
Avatar

Joined: Sep 25 05
Posts: 159
ID: 1058
Credit: 507,926
RAC: 0
Message 61663 - Posted 10 Jun 2009 21:12:00 UTC - in response to Message ID 61656.


Looks like the nightly credit granting script (which finds errors and gives credit for them) gave it credit. But you have to open the specific task to see it. It doesn't show on the task list when granted by the script.

Result id 257383313 was granted credit, thanks. As you can see, the top 2 tasks in the list have --- under the Granted Credit column. 257383314 & 257383313 have been granted credit by the overnight script, but the credit isn't shown under the Granted Credit column. Is there a reason for this? Thanks in advance.
____________
Have a crunching good day!!

PinkPenguin Profile

Joined: Apr 26 09
Posts: 5
ID: 313164
Credit: 280,676
RAC: 0
Message 61664 - Posted 10 Jun 2009 22:23:47 UTC

Regarding the problems with the lb_thread_all_multi.... work units returning a -161 error code on file transfer at the end of the work unit.

I should note that several people have signaled the problem and that the same error occurs on different machines both Linux and Windows. It also occurs on both runs of the same work unit by different people. This seems to indicate a general problem with this type of WU rather than an isolated client error (for example due to anti-virus activity as suggested in the past).

The -161 error code appears to be given because there is no output file to send back. Here is the log message from stdoutdae.txt on windows:


09-Jun-2009 04:47:13 [rosetta@home] Computation for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 finished
09-Jun-2009 04:47:13 [rosetta@home] Output file lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0_0 for task lb_thread_all_multi_hb_t308__IGNORE_THE_REST_12724_587_0 absent


For examples, see previous messages:
Message 61638 (2nd type of error).
Message 61647
Message 61650

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61668 - Posted 11 Jun 2009 5:48:37 UTC - in response to Message ID 61658.

Robert, keep in mind that BOINC has two main directories now. One for the BOINC Manager, and one for the "data directory" where the projects and slots reside (and they could be the same).

I believe the files they are looking for are located in the data directory. The one just OVER the projects and slots subdirectories. And they start with "std".

You could EMail me the files and I could try to forward for you if the EMail address still isn't working for you.


My current problem is that I'm having trouble finding the BOINC data directory, since for this combination of BOINC version and Windows version the frequently written files are in the data directory tree instead of the program-oriented BOINC directory. The Vista SP2 update seems to interfere with using the search function to find a directory if you know the lowest level name in the directory path, but not much more about it. Since the "std" files aren't directories, I'll try searching for them instead and see if that gets around this problem.

In case I need to search for a complete filename instead, what is the current full name of the files minirosetta uses as lockfiles?

Evan

Joined: Dec 23 05
Posts: 268
ID: 42505
Credit: 402,585
RAC: 0
Message 61675 - Posted 11 Jun 2009 8:34:25 UTC

Under that BOINC, where is the slots subdirectory? A files search was unable to find it.

In my case the slots directory is at:

D:\Documents and Settings\All Users\Application Data\BOINC
____________

Mod.Sense
Forum moderator
Project administrator

Joined: Aug 22 06
Posts: 3381
ID: 106194
Credit: 0
RAC: 0
Message 61684 - Posted 11 Jun 2009 13:02:59 UTC

:) Robert, it's much easier than that. If your messages tab doesn't show clear back to when you last started BOINC Manager, restart the Manager. Within the first 20 messages of starting it will show the data directory path in a message.
____________
Rosetta Moderator: Mod.Sense

robertmiles Profile

Joined: Jun 16 08
Posts: 656
ID: 264600
Credit: 3,462,248
RAC: 2,198
Message 61687 - Posted 11 Jun 2009 14:54:44 UTC - in response to Message ID 61684.

:) Robert, it's much easier than that. If your messages tab doesn't show clear back to when you last started BOINC Manager, restart the Manager. Within the first 20 messages of starting it will show the data directory path in a message.


While waiting for an answer, I found that it's C:\ProgramData\BOINC on my machine - a directory that my search window treats as hidden and not easily searchable. The messages tab agrees.

Now, I'm still looking for the proper commands to zip up all the *.txt and *.old files from that directory and its subdirectories. The help files I can find don't seem to offer that information anywhere.
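
One possible way to do this without hunting for zip command-line options is a short Python script. This is a minimal sketch, assuming Python is installed on the machine; the function name `zip_boinc_logs` and the archive name are hypothetical, and the only things taken from this thread are the data directory path and the two file patterns.

```python
import zipfile
from pathlib import Path


def zip_boinc_logs(data_dir, archive_path, patterns=("*.txt", "*.old")):
    """Zip every file under data_dir (including subdirectories) that
    matches one of the patterns. Returns the archived relative paths."""
    data_dir = Path(data_dir)
    archive_path = Path(archive_path).resolve()
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for pattern in patterns:
            # rglob walks the directory and all of its subdirectories.
            for path in sorted(data_dir.rglob(pattern)):
                # Skip directories and the archive itself if it lives in data_dir.
                if not path.is_file() or path.resolve() == archive_path:
                    continue
                rel = path.relative_to(data_dir).as_posix()
                zf.write(path, rel)
                added.append(rel)
    return added
```

On the machine described above the call would look something like `zip_boinc_logs(r"C:\ProgramData\BOINC", "boinc_logs.zip")`. Files that BOINC has locked for writing may fail to be read while the client is running, so it is probably safest to exit BOINC first.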

Greg_BE Profile
Avatar

Joined: May 30 06
Posts: 4835
ID: 85645
Credit: 2,948,921
RAC: 243
Message 61713 - Posted 12 Jun 2009 16:10:23 UTC

lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_577_1 ran over my time limit and errored out. It almost looks like it ran twice.


Outcome Client error
Client state Compute error
Exit status 0 (0x0)
BOINC:: CPU time: 29151.9s, 14400s + 14400s[2009- 6-12 13:21: 5:]

then there is this: <file_xfer_error>
<file_name>lb_thread_all_multi_hb_t328__IGNORE_THE_REST_12734_577_1_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

ByRad Profile
Avatar

Joined: Apr 12 08
Posts: 8
ID: 252633
Credit: 8,231,131
RAC: 13,507
Message 61730 - Posted 13 Jun 2009 9:39:55 UTC

2009-06-13 00:42:33 rosetta@home Starting tails_homnative_relaxed_9ctf_m00_p00_SAVE_ALL_OUT_12749_9578_0
2009-06-13 00:42:34 rosetta@home Starting task tails_homnative_relaxed_9ctf_m00_p00_SAVE_ALL_OUT_12749_9578_0 using minirosetta version 171

2009-06-13 11:17:16 rosetta@home Computation for task tails_homnative_relaxed_9ctf_m00_p00_SAVE_ALL_OUT_12749_9578_0 finished
2009-06-13 11:17:16 rosetta@home Output file tails_homnative_relaxed_9ctf_m00_p00_SAVE_ALL_OUT_12749_9578_0_0 for task tails_homnative_relaxed_9ctf_m00_p00_SAVE_ALL_OUT_12749_9578_0 absent

And a screenshot of the window with the error information:
____________

Sid Celery

Joined: Feb 11 08
Posts: 796
ID: 241409
Credit: 9,545,746
RAC: 7,447
Message 61734 - Posted 14 Jun 2009 2:26:18 UTC

Sorry for the late post - I had some errors while I was away:

lb_thread_all_multi_hb_t374__IGNORE_THE_REST_12748_968_1

CPU Time: 17863.3 (default 14400)
stderr out <core_client_version>6.6.20</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<stderr_txt>
value legal range
sin_cos_range ERROR: 1.#QNAN00 is outside of [-1,+1] sin and cos value legal range
(Repeat many times)
dof_atom1 atomno= 3 rsd= 1
atom1 atomno= 1 rsd= 1
atom2 atomno= 2 rsd= 1
atom3 atomno= 5 rsd= 1
atom4 atomno= 6 rsd= 1
THETA1 1.#QNAN00
THETA3 1.#QNAN00
PHI2 1.#QNAN00

ERROR: AtomTree::torsion_angle_dof_id: angle range error
ERROR:: Exit from: ..\..\src\core\kinematics\AtomTree.cc line: 754
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>

Credit was awarded, but looks nasty and not an error message I've seen reported before.
____________

Copyright © 2017 University of Washington

Last Modified: 10 Nov 2010 1:51:38 UTC