Message boards : Number crunching : Problems with Minirosetta 1.80
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2105 Credit: 40,925,612 RAC: 18,224 |
A late report, sorry for the delay: azurin_BOINC_ABRELAX_4xBIN_1xCYCLES_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--azurin-_12935_2849_1 Outcome: Client error. No other errors in the last 217 WUs. |
bruce Joined: 15 Sep 07 Posts: 10 Credit: 839,797 RAC: 0 |
Hi, I'm experiencing issues with 1.80 whereby: 1) WUs do not exit memory. I currently have 25 minirosetta_1.80_windows_intelx86.exe processes in memory, only 2 of which are using any CPU time. Memory utilization ranges from 400 KB to 200 MB. The fact that they are not exiting is causing my virtual memory to run out. 2) I get error messages in the BOINC client. 3) The ...\BOINC\slots folder is filling up with numbered folders, most of which have only three files: boinc_lockfile, stderr.txt and stdout.txt. I've rebooted and reset the project, and I still continue to get these errors. Here are some specifics about my setup and the errors: System: 3.0 GHz Pentium 4 (with Hyper-Threading on), 2.0 GB RAM, WinXP SP3 (32-bit), BOINC 6.6.36 (Windows 32-bit). Preferences: switch between apps every 200 minutes; use at most 100% of processors; use at most 75% of CPU time; use at most 20 GB of disk space; use at most 50% of memory when in use; use at most 90% of memory when idle. Projects: rosetta@home (resource share: 600); seti@home (resource share: 75). Error from the ...\BOINC\stdoutdae.txt file (similar output appears on the BOINC Manager Messages tab): 05-Jul-2009 07:47:45 [rosetta@home] If this happens repeatedly you may need to reset the project. 05-Jul-2009 07:47:45 [rosetta@home] Restarting task abinitio_withrelax_homfrag_129_B_1ynvA_SAVE_ALL_OUT_13795_445_0 using minirosetta version 180 05-Jul-2009 07:48:26 [rosetta@home] Task abinitio_withrelax_homfrag_129_B_1ynvA_SAVE_ALL_OUT_13795_445_0 exited with zero status but no 'finished' file 05-Jul-2009 07:48:26 [rosetta@home] If this happens repeatedly you may need to reset the project. etc. Here is some output from stderr.txt in the slots folders (the ones with only the three files mentioned above): BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _U9X3X_00001 ... [2009- 7- 5 7:47: 4:] :: BOINC:: Initializing ... ok. Can't acquire lockfile - exiting [2009- 7- 5 7:47:45:] :: BOINC:: Initializing ... ok. Can't acquire lockfile - exiting [2009- 7- 5 7:48:26:] :: BOINC:: Initializing ... ok. Can't acquire lockfile - exiting [2009- 7- 5 7:49: 8:] :: BOINC:: Initializing ... ok. Can't acquire lockfile - exiting ... After a reboot, only two minirosetta_1.80_windows_intelx86.exe processes are in memory, both using CPU time (one at 168 MB, the other at 219 MB), which is much more along the lines of what I would expect to see. Also after the reboot, all the slot folders with a boinc_lockfile are gone save for three: the two working rosetta@home WUs and the one SETI@home WU (again, what I would expect to see). What other information can I provide that might help pin down what is causing this problem? Thanks for your help |
William T.M. Theisen Joined: 11 Sep 06 Posts: 7 Credit: 527,145 RAC: 0 |
lb_dk_ksync_withtrim_hb_t297__IGNORE_THE_REST_12980_1893_0 got stuck at 6.888% and has been running for 29 hours so far, and its "time to completion" has gone up from 60 hours to 65 hours. I'm not sure what is going on with it. Should I abort it? |
xsc2 Joined: 9 Jul 08 Posts: 4 Credit: 62,354 RAC: 0 |
Exit status: -1073741819 (0xc0000005) https://boinc.bakerlab.org/rosetta/result.php?resultid=263200171 https://boinc.bakerlab.org/rosetta/result.php?resultid=263584567 Exit status: 1 (0x1) https://boinc.bakerlab.org/rosetta/result.php?resultid=263381564 |
[AF>france>pas-de-calais]symaski62 Joined: 19 Sep 05 Posts: 47 Credit: 33,871 RAC: 0 |
abinitio_withrelax_nohomfrag_129_B_1shfA_SAVE_ALL_OUT_13798_612_0 https://boinc.bakerlab.org/rosetta/result.php?resultid=263840421 <![CDATA[ <stderr_txt> [2009- 7- 6 17:41:24:] :: BOINC:: Initializing ... ok. [2009- 7- 6 17:41:24:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev30680.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/fragments_1shf.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _U9X3X_00001 Starting work on structure: _U9X3X_00002 Starting work on structure: _U9X3X_00003 Starting work on structure: _U9X3X_00004 Starting work on structure: _U9X3X_00005 Starting work on structure: _U9X3X_00006 Starting work on structure: _U9X3X_00007 Starting work on structure: _U9X3X_00008 Starting work on structure: _U9X3X_00009 Starting work on structure: _U9X3X_00010 Starting work on structure: _U9X3X_00011 Starting work on structure: _U9X3X_00012 Starting work on structure: _U9X3X_00013 Starting work on structure: _U9X3X_00014 Starting work on structure: _U9X3X_00015 Starting work on structure: _U9X3X_00016 Starting work on structure: _U9X3X_00017 Starting work on structure: _U9X3X_00018 Starting work on structure: _U9X3X_00019 Starting work on structure: _U9X3X_00020 ====================================================== DONE :: 1 starting structures 10442.9 cpu seconds This process generated 20 decoys from 20 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down cleanly ... called boinc_finish |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
This one is taking 689MB of memory; peak was 986MB! 2a05_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840. It is 20 hours into a 24-hour runtime on Windows XP, under BOINC 6.6.20. Add this signature to your email: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
"This one is taking 689MB of memory, peak was 986MB!" Here's a 2a05_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840 WU that ran on a single-core diskless Linux node with 1GB installed. It ended with a bad_alloc error, which means the node ran out of physical memory. I've had a number of bad_alloc errors on 512MB nodes (which I no longer crunch with), but now it seems 1GB/core may no longer be enough for Rosetta. |
MikeMcC3 Joined: 13 May 08 Posts: 2 Credit: 501,309 RAC: 0 |
I have no idea what is going on. When I look at the work that has been sent to my computer, I see about one thousand work units that I haven't received. When the due dates arrive, they get red-flagged as time-outs. I can't find any of the work units listed as sent, and there is no mention of those work units being received by my computer. What the heck is going on? Please tell me if you have had similar problems, or what may have caused this. I've been reducing data for BOINC for over 2 years now and have never encountered any such problems. |
dag Joined: 16 Dec 05 Posts: 106 Credit: 1,000,020 RAC: 0 |
I'm getting this many times per day now... never had it before this batch: 7/9/2009 10:49:34 AM|rosetta@home|Task picker-L1-sssim-1bk2A_13839_593_0 exited with a DLL initialization error. 7/9/2009 2:03:14 PM|rosetta@home|Task lr10_seq_score12_rlbd_1elw_IGNORE_THE_REST_DECOY_13841_116_0 exited with a DLL initialization error. 7/9/2009 2:05:31 PM|rosetta@home|Task 1sn6_NN_DISCONTROL_BOINC_ABRELAX_SAVE_ALL_OUT_13840_1231_0 exited with a DLL initialization error. |
Rob Heilman [Echo Labs] Joined: 26 Apr 07 Posts: 20 Credit: 2,815,410 RAC: 0 |
I am getting a lot of compute errors on sel_core_4.5 work units. They all seem to report error code -161. Examples: https://boinc.bakerlab.org/rosetta/result.php?resultid=264525168 https://boinc.bakerlab.org/rosetta/result.php?resultid=264520827 https://boinc.bakerlab.org/rosetta/result.php?resultid=264466943 https://boinc.bakerlab.org/rosetta/result.php?resultid=264466941 Any ideas? Seeing this on multiple Linux hosts with different kernels. They are all running the recommended 6.4.5. |
Rob Heilman [Echo Labs] Joined: 26 Apr 07 Posts: 20 Credit: 2,815,410 RAC: 0 |
"I am getting a lot of compute errors on sel_core_4.5 work units. They all seem to report error code -161. Examples:" This was moved into this thread by a moderator. Is this a 1.80 problem or a sel_core_4.5 problem? I did not want to assume it was 1.80, and that is why I started a new thread. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Rob, certainly a valid point. But we'll resolve the question here in this thread. New task types are often tied to code changes in a release, so the two possibilities are usually highly correlated anyway. Rosetta Moderator: Mod.Sense |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
I have noticed something about this thread: it is displaying on my screen in wide format, and I have to move the bottom scroll bar across the screen to view the whole post. In other Number crunching threads I can view posts without having to move my scroll bar. Is anyone else having this problem? Have a crunching good day!! |
Evan Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
Yes, it starts out normally and then changes to wide-screen format. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It is due to wide images posted in the thread. Depending on how long 1.80 remains the current release, I may have to move the wide posts. Rosetta Moderator: Mod.Sense |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 1 |
Maybe you guys could suggest some resizing software that we can use to reduce the size of our screenshots. My screenshot started this mess, and I can't edit the post to reduce the size, nor can I access the free storage site I put the image on. Maybe you could also suggest a free file storage site we can use to post our screenshots; then this image issue wouldn't have to happen. Of course, we will need a separate thread for that... |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
"It is due to wide images posted in the thread. Depending on how long 1.80 remains current release, I may have to move the wide posts." Thank you for the details, Mod.Sense; I never gave the screenshots a thought. I'm not sure if this is the right place to ask, but is there any chance the page "Quick guide to Rosetta and its graphics" could be updated to explain what the different colors mean? Have a crunching good day!! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Speedy, the colors are just a rainbow spectrum from blue to red. They help you see which end is which, especially with longer proteins. Greg, I think it best to post links rather than pics, as described here. So, url tags rather than img tags. You might consider using flickr.com to host pics. I see GeoCities will be going away soon. Rosetta Moderator: Mod.Sense |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
"speedy, the colors are just rainbow spectrum blue to red. The help you see which end is which. Especially with longer proteins." OK, I was talking about the colours in the accepted energy display, which are mainly yellow and blue. I can't tell which end of the protein is which now. When you say the colors help you see which end is which, are you referring to the protein that is moving in the accepted panel of the graphics window? Have a crunching good day!! |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. This one seems to have the same type of problem as the real_core ones: it seems to have got stuck in a loop, and it has done so twice. sel_core_5.0_low200_beta_low200_start_hb_t297__IGNORE_THE_REST_14061_180_1 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=241330995 Model: 0, Step: 44400. ABORTED MINE. |
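A note on the first exit status xsc2 reported above: -1073741819 and 0xc0000005 are the same 32-bit value, with the decimal form being a signed (two's-complement) reading of the hex code. On Windows, 0xC0000005 is STATUS_ACCESS_VIOLATION, i.e. the process crashed on an invalid memory access. The minimal Python sketch below converts between the two forms, assuming 32-bit exit codes; the function names are hypothetical helpers for illustration, not part of BOINC or Rosetta.

```python
MASK_32 = 0xFFFFFFFF   # keep only the low 32 bits
SIGN_BIT = 0x80000000  # top bit of a 32-bit value


def to_hex_status(code: int) -> str:
    """Render a signed 32-bit exit code in its unsigned hex form."""
    return f"0x{code & MASK_32:x}"


def to_signed(code: int) -> int:
    """Interpret an unsigned 32-bit status (e.g. 0xC0000005) as a signed int."""
    code &= MASK_32
    # If the sign bit is set, subtract 2**32 to get the negative value.
    return code - (1 << 32) if code & SIGN_BIT else code


print(to_hex_status(-1073741819))  # 0xc0000005
print(to_signed(0xC0000005))       # -1073741819
print(to_hex_status(1))            # 0x1
```

Small positive codes such as the "Exit status: 1 (0x1)" in the same post are unaffected by the conversion, because their sign bit is clear.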
©2024 University of Washington
https://www.bakerlab.org