Problems with Minirosetta 1.80

Message boards : Number crunching : Problems with Minirosetta 1.80

Sid Celery

Joined: 11 Feb 08
Posts: 2105
Credit: 40,925,612
RAC: 18,224
Message 62093 - Posted: 5 Jul 2009, 13:07:37 UTC

A late report - sorry for the delay:

azurin_BOINC_ABRELAX_4xBIN_1xCYCLES_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--azurin-_12935_2849_1
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
CPU time 0

<core_client_version>6.6.20</core_client_version>

ERROR: Option matching -PCS:npc_files_input not found in command line top-level context

No other errors in the last 217 WUs

bruce
Joined: 15 Sep 07
Posts: 10
Credit: 839,797
RAC: 0
Message 62095 - Posted: 5 Jul 2009, 15:18:20 UTC
Last modified: 5 Jul 2009, 15:30:11 UTC

Hi,
I'm experiencing issues with 1.80 whereby:
1) A WU does not exit memory.
I currently have 25 minirosetta_1.80_windows_intelx86.exe processes in memory, only 2 of which are using any CPU time. Memory utilization ranges from 400 KB to 200 MB.
The fact that they are not exiting is causing my virtual memory to run out.
2) I get error messages in the BOINC client.
3) The ...\BOINC\slots folder is filling up with numbered folders, most of which have only three files: boinc_lockfile, stderr.txt and stdout.txt.

I've rebooted, reset the project and still continue to get these errors.

Here are some specifics about my setup and the errors:
System:
3.0 GHz Pentium 4 (w/hyperthreading on)
2.0 GB RAM
WinXP SP3 (32-bit)
BOINC 6.6.36 (Windows 32-bit)
Preferences:
switch between apps every 200 minutes
use at most 100% processors
use at most 75% of CPU time
use at most 20 GB HD space
use at most 50% memory when in use
use at most 90% memory when idle
Projects: rosetta@home (Resource Share:600); seti@home (Resource Share:75)


Error from the ...\BOINC\stdoutdae.txt file (similar output on the BOINC manager Messages tab):
05-Jul-2009 07:47:45 [rosetta@home] If this happens repeatedly you may need to reset the project.
05-Jul-2009 07:47:45 [rosetta@home] Restarting task abinitio_withrelax_homfrag_129_B_1ynvA_SAVE_ALL_OUT_13795_445_0 using minirosetta version 180
05-Jul-2009 07:48:26 [rosetta@home] Task abinitio_withrelax_homfrag_129_B_1ynvA_SAVE_ALL_OUT_13795_445_0 exited with zero status but no 'finished' file
05-Jul-2009 07:48:26 [rosetta@home] If this happens repeatedly you may need to reset the project.
etc..etc..etc...
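
To see how often each task is hitting this restart loop, the log can be tallied with a short script. This is only a sketch: it assumes the line layout shown in the excerpt above (date, time, project in square brackets, then the message), and the function name `count_unfinished_exits` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, for example:
# 05-Jul-2009 07:48:26 [rosetta@home] Task <name> exited with zero status but no 'finished' file
NO_FINISHED = re.compile(
    r"^\S+ \S+ \[(?P<project>[^\]]+)\] Task (?P<task>\S+) "
    r"exited with zero status but no 'finished' file"
)


def count_unfinished_exits(log_lines):
    """Return a Counter mapping task name to its number of 'no finished file' exits."""
    counts = Counter()
    for line in log_lines:
        match = NO_FINISHED.match(line.strip())
        if match:
            counts[match.group("task")] += 1
    return counts
```

Reading the file with `open(path, errors="replace")` and passing the file object to the function is enough, since it only needs an iterable of lines. A task with a high count is being restarted over and over.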


Here is some output from the stderr.txt in the slots folders (with only the three files mentioned above):
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _U9X3X_00001
...
[2009- 7- 5 7:47: 4:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 7- 5 7:47:45:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 7- 5 7:48:26:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
[2009- 7- 5 7:49: 8:] :: BOINC:: Initializing ... ok.
Can't acquire lockfile - exiting
...
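
The leftover slot folders described above can be found with a small script. This is a sketch under assumptions: it treats a folder as stale when it holds exactly the three files named in this post, and it counts the lockfile message from the stderr.txt excerpt. The function name `find_stale_slots` is hypothetical, and the slots path has to be supplied by the user.

```python
from pathlib import Path

# The three files reported in the leftover slot folders.
STALE_FILES = {"boinc_lockfile", "stderr.txt", "stdout.txt"}
# The message repeated in stderr.txt in the excerpt above.
LOCK_MSG = "Can't acquire lockfile - exiting"


def find_stale_slots(slots_dir):
    """Return {slot folder name: lockfile failure count} for folders holding only the three files."""
    report = {}
    for slot in sorted(Path(slots_dir).iterdir()):
        if not slot.is_dir():
            continue
        names = {entry.name for entry in slot.iterdir()}
        if names != STALE_FILES:
            continue  # a working slot normally holds more than these three files
        text = (slot / "stderr.txt").read_text(errors="replace")
        report[slot.name] = text.count(LOCK_MSG)
    return report
```

The script only reports; it does not delete anything. A folder whose failure count keeps rising between runs suggests the client is still trying to restart a task in a slot that another process has locked.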


After a reboot: only two minirosetta_1.80_windows_intelx86.exe processes in memory, both using CPU time (one at 168 MB, the other at 219 MB). Much more along the lines of what I would expect to see.
After a reboot: all the 'slot' folders with the boinc_lockfile are gone save for 3: the two working rosetta@home WUs and the one seti@home WU (again, what I would expect to see).

What other information can I provide that might help clue in on what is causing this problem?

Thanks for your help

William T.M. Theisen
Joined: 11 Sep 06
Posts: 7
Credit: 527,145
RAC: 0
Message 62098 - Posted: 5 Jul 2009, 20:39:20 UTC

lb_dk_ksync_withtrim_hb_t297__IGNORE_THE_REST_12980_1893_0 got stuck at 6.888% and has been running for 29 hours so far, and its "time to completion" has gone up from 60 hours to 65 hours. I'm not sure what is going on with it. Should I abort it?

xsc2
Joined: 9 Jul 08
Posts: 4
Credit: 62,354
RAC: 0
Message 62102 - Posted: 6 Jul 2009, 6:54:46 UTC


[AF>france>pas-de-calais]symaski62
Joined: 19 Sep 05
Posts: 47
Credit: 33,871
RAC: 0
Message 62107 - Posted: 6 Jul 2009, 19:01:10 UTC

abinitio_withrelax_nohomfrag_129_B_1shfA_SAVE_ALL_OUT_13798_612_0

https://boinc.bakerlab.org/rosetta/result.php?resultid=263840421
<![CDATA[
<stderr_txt>
[2009- 7- 6 17:41:24:] :: BOINC:: Initializing ... ok.
[2009- 7- 6 17:41:24:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully. 
Registering options.. 
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok 
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize()  End reached
Loaded options.... ok 
Processed options.... ok 
Initializing random generators... ok 
Initialization complete. 
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev30680.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/fragments_1shf.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ... 
BOINC:: Worker startup. 
Starting watchdog...
Watchdog active.
Starting work on structure: _U9X3X_00001
Starting work on structure: _U9X3X_00002
Starting work on structure: _U9X3X_00003
Starting work on structure: _U9X3X_00004
Starting work on structure: _U9X3X_00005
Starting work on structure: _U9X3X_00006
Starting work on structure: _U9X3X_00007
Starting work on structure: _U9X3X_00008
Starting work on structure: _U9X3X_00009
Starting work on structure: _U9X3X_00010
Starting work on structure: _U9X3X_00011
Starting work on structure: _U9X3X_00012
Starting work on structure: _U9X3X_00013
Starting work on structure: _U9X3X_00014
Starting work on structure: _U9X3X_00015
Starting work on structure: _U9X3X_00016
Starting work on structure: _U9X3X_00017
Starting work on structure: _U9X3X_00018
Starting work on structure: _U9X3X_00019
Starting work on structure: _U9X3X_00020
======================================================
DONE ::     1 starting structures  10442.9 cpu seconds
This process generated     20 decoys from      20 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish



Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 62115 - Posted: 7 Jul 2009, 14:40:14 UTC

This one is taking 689MB of memory, peak was 986MB!
2a05_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840
It is 20 hrs into a 24 hr runtime on Windows XP, under BOINC 6.6.20.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/

AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 62119 - Posted: 7 Jul 2009, 22:49:54 UTC - in response to Message 62115.  

This one is taking 689MB of memory, peak was 986MB!
2a05_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840
It is 20 hrs into a 24 hr runtime on Windows XP, under BOINC 6.6.20.


Here's a 2a05_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840 WU that ran on a single core diskless Linux node with 1GB installed. It ended with a bad_alloc error, which means the node ran out of physical memory. I've had a number of bad_alloc errors on 512MB nodes (which I no longer crunch with), but now it seems 1GB/core may no longer be enough for Rosetta.
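
A rough way to check whether a host has enough memory per core is sketched below. It is only an illustration: the helper names and the `reserve_mb` parameter are hypothetical, it assumes one task runs per core, and `detect_total_bytes` relies on `os.sysconf` names that exist on Linux but not on every platform.

```python
import os

MB = 1024 * 1024


def memory_per_core_mb(total_bytes, cores):
    """Average physical memory per core, in MB."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_bytes / cores / MB


def fits_in_memory(peak_task_mb, total_bytes, cores, reserve_mb=0):
    """True if one task per core at the given peak fits after reserving reserve_mb per core."""
    return peak_task_mb <= memory_per_core_mb(total_bytes, cores) - reserve_mb


def detect_total_bytes():
    """Physical memory in bytes on Linux; raises where these sysconf names are missing."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
```

With the 986MB peak reported earlier in this thread, `fits_in_memory(986, 1024 * MB, 1)` is true only when nothing is reserved for the operating system; any realistic reserve pushes a 1GB single-core node over the limit, which is consistent with the bad_alloc described above.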

MikeMcC3
Joined: 13 May 08
Posts: 2
Credit: 501,309
RAC: 0
Message 62133 - Posted: 8 Jul 2009, 17:50:07 UTC

I have no idea what is going on. When I look at the work that has been sent to my computer, I see about one thousand work units that I haven't received. The due dates arrive, and the work units get red-flagged as time-outs. I can't find any of the work units listed as sent, and there is no mention of them being received by my computer. What the heck is going on? Can anyone tell me whether they have had similar problems, or what may have caused this? I've been reducing data for BOINC for over 2 years now, and have never encountered any such problems.

dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 62151 - Posted: 9 Jul 2009, 20:12:12 UTC
Last modified: 9 Jul 2009, 20:12:48 UTC

I'm getting this many times per day now... never had it before this batch:

7/9/2009 10:49:34 AM|rosetta@home|Task picker-L1-sssim-1bk2A_13839_593_0 exited with a DLL initialization error.
7/9/2009 2:03:14 PM|rosetta@home|Task lr10_seq_score12_rlbd_1elw_IGNORE_THE_REST_DECOY_13841_116_0 exited with a DLL initialization error.
7/9/2009 2:05:31 PM|rosetta@home|Task 1sn6_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840_1231_0 exited with a DLL initialization error.
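
Messages in this pipe-separated form can be filtered to list the affected tasks and group them by batch. This is a sketch only: the function names are made up, and `batch_id` assumes that task names end in `_<batch>_<workunit>_<attempt>`, which matches the three names above but is not a documented rule.

```python
from collections import Counter


def dll_error_tasks(message_lines):
    """Return task names from 'timestamp|project|text' lines reporting a DLL initialization error."""
    tasks = []
    for line in message_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not in the pipe-separated layout shown above
        _timestamp, _project, text = parts
        if text.startswith("Task ") and "DLL initialization error" in text:
            tasks.append(text.split()[1])
    return tasks


def batch_id(task_name):
    """Return the assumed batch number of a task name, or None if the name is too short."""
    parts = task_name.rsplit("_", 3)
    return parts[1] if len(parts) == 4 else None


def errors_per_batch(message_lines):
    """Count DLL initialization errors per assumed batch number."""
    return Counter(batch_id(task) for task in dll_error_tasks(message_lines))
```

For the three lines above, the errors fall in three different batches (13839, 13840 and 13841), which hints that the problem is not tied to a single batch.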

Rob Heilman [Echo Labs]
Joined: 26 Apr 07
Posts: 20
Credit: 2,815,410
RAC: 0
Message 62165 - Posted: 10 Jul 2009, 13:33:52 UTC

I am getting a lot of compute errors on sel_core_4.5 work units. They all seem to report error code -161. Examples:

https://boinc.bakerlab.org/rosetta/result.php?resultid=264525168
https://boinc.bakerlab.org/rosetta/result.php?resultid=264520827
https://boinc.bakerlab.org/rosetta/result.php?resultid=264466943
https://boinc.bakerlab.org/rosetta/result.php?resultid=264466941

Any ideas? Seeing this on multiple Linux hosts with different kernels. They are all running the recommended 6.4.5.


Rob Heilman [Echo Labs]
Joined: 26 Apr 07
Posts: 20
Credit: 2,815,410
RAC: 0
Message 62170 - Posted: 10 Jul 2009, 18:18:40 UTC - in response to Message 62165.  

This was moved into this thread by a moderator. Is this a 1.80 problem or a sel_core_4.5 problem? I did not want to assume it was 1.80 and that is why I started a new thread.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 62171 - Posted: 10 Jul 2009, 20:11:44 UTC

Rob, certainly a valid point. But we'll resolve the question here in this thread. Often new task types are related to new code changes in a release and so the two possibilities are often highly correlated anyway.
Rosetta Moderator: Mod.Sense

Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,098
RAC: 0
Message 62173 - Posted: 10 Jul 2009, 22:45:17 UTC

I have noticed something about this thread: it seems to be displaying on my screen in wide format. I have to move the bottom scroll bar across the screen to view the whole post. In the Number crunching thread I can view posts without having to move my scroll bar. Is anyone else having this problem?
Have a crunching good day!!

Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 62174 - Posted: 10 Jul 2009, 23:04:04 UTC

Yes, it starts out normally and then changes to wide screen format.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 62175 - Posted: 10 Jul 2009, 23:11:59 UTC

It is due to wide images posted in the thread. Depending on how long 1.80 remains the current release, I may have to move the wide posts.
Rosetta Moderator: Mod.Sense

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 1
Message 62176 - Posted: 10 Jul 2009, 23:16:11 UTC
Last modified: 10 Jul 2009, 23:16:34 UTC

Maybe you guys could suggest some resizing software that we can use to reduce the size of our screen shots. My screen shot started this mess, and I can't edit the post to reduce the size and I cannot access the storage site I put the image on for free. Also, maybe you could suggest a file storage site that we can use to post our screen shots for free. Then this image issue wouldn't have to happen.

Of course we will need a separate thread for that...

Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,098
RAC: 0
Message 62178 - Posted: 11 Jul 2009, 0:18:13 UTC - in response to Message 62175.  

It is due to wide images posted in the thread. Depending on how long 1.80 remains the current release, I may have to move the wide posts.

Thank you for the details, Mod.Sense; I never gave the screen shots a thought. I'm not sure if this is the right place to ask, but is there any chance the page "Quick guide to Rosetta and its graphics" can be updated to explain what the different colors mean?
Have a crunching good day!!

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 62179 - Posted: 11 Jul 2009, 0:41:44 UTC
Last modified: 11 Jul 2009, 0:46:00 UTC

Speedy, the colors are just a rainbow spectrum, blue to red. They help you see which end is which, especially with longer proteins.
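
A blue-to-red ramp along a chain can be sketched as follows. This is not Rosetta's graphics code, only an illustration of the idea; the function name `rainbow_color` and the choice of an HSV hue sweep are assumptions.

```python
import colorsys


def rainbow_color(index, length):
    """Return an (r, g, b) tuple in 0..1 for position `index` in a chain of `length` residues.

    The first residue is blue and the last is red, passing through green and yellow.
    """
    if length < 1 or not 0 <= index < length:
        raise ValueError("index must satisfy 0 <= index < length")
    fraction = index / (length - 1) if length > 1 else 0.0
    # In HSV, hue 2/3 is blue and hue 0 is red, so sweep the hue downward.
    hue = (1.0 - fraction) * (2.0 / 3.0)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)
```

With a ramp like this, one end of the chain is always blue and the other always red, so the two ends stay distinguishable however the protein is folded.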

Greg, I think it best to post links rather than pics, as described here. So, url tags rather than img tags. You might consider using flickr.com to host pics. I see geocities will be going away soon.
Rosetta Moderator: Mod.Sense

Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 808,098
RAC: 0
Message 62180 - Posted: 11 Jul 2009, 1:58:38 UTC - in response to Message 62179.  
Last modified: 11 Jul 2009, 1:59:31 UTC

Speedy, the colors are just a rainbow spectrum, blue to red. They help you see which end is which, especially with longer proteins.

OK, I was talking about the colours in "accepted energy", which are mainly yellow & blue. I can't tell which end is which of the proteins now. When you say they help you see which end is which, are you referring to the protein that is moving in the accepted panel of the graphics window?
Have a crunching good day!!

P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 62183 - Posted: 11 Jul 2009, 6:03:06 UTC

Hi.

This one seems to have the same type of problem as the real_core ones: it seems it got stuck in a loop, done twice.

sel_core_5.0_low200_beta_low200_start_hb_t297__IGNORE_THE_REST_14061_180_1

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=241330995

Model:0
Step:44400

ABORTED MINE.

