Problems with Minirosetta v1.54

Message boards : Number crunching : Problems with Minirosetta v1.54

TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60746 - Posted: 20 Apr 2009, 5:57:52 UTC

ID: 60746 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60747 - Posted: 20 Apr 2009, 10:23:59 UTC - in response to Message 60746.  

ID: 60747 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60753 - Posted: 20 Apr 2009, 15:28:29 UTC

TomaszPawel cites two cases where 99 models were completed in less than an hour with a 6.6.20 Win XP client, resulting in a validate error from miniRosetta v1.54.

WU names
243895936 rest3d85_ip40_2oqk.patchdock.7.pdb_0003_fa_dock.xml_score12_pert38_DOCK_10797_652_0
244107786 rest3d85_ip40_2w4f.patchdock.1.pdb_0001_fa_dock.xml_score12_pert38_DOCK_10797_943_0
Rosetta Moderator: Mod.Sense
ID: 60753 · Rating: 0
Paul D. Buck
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60756 - Posted: 20 Apr 2009, 19:02:12 UTC

ID: 60756 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60793 - Posted: 23 Apr 2009, 11:47:57 UTC - in response to Message 60756.  

https://boinc.bakerlab.org/rosetta/result.php?resultid=245909228

Reason: Divide by Zero (0xc000008e) at address 0x004E51A9
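
For anyone collecting these crash reports, here is a minimal sketch of pulling the exception code and address out of a "Reason:" line like the one above. The function name and the small lookup table are my own for illustration. The two status names are the standard Windows NTSTATUS names for those values (0xC000008E is the floating-point divide by zero, not the integer one); the table is not exhaustive.

```python
import re

# Matches the "Reason:" line format seen in the task output above.
REASON_RE = re.compile(
    r"Reason: (?P<name>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<addr>0x[0-9a-fA-F]+)"
)

# Two well-known Windows NTSTATUS values; deliberately not a full table.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000008E: "STATUS_FLOAT_DIVIDE_BY_ZERO",
}


def parse_reason(line):
    """Return the parsed fields of a 'Reason:' line, or None if it does not match."""
    m = REASON_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"), 16)
    return {
        "name": m.group("name"),
        "code": code,
        "address": int(m.group("addr"), 16),
        "status": KNOWN_CODES.get(code, "unknown"),
    }


info = parse_reason("Reason: Divide by Zero (0xc000008e) at address 0x004E51A9")
print(info["status"], hex(info["address"]))
```

Grouping failed results by code and address this way might show whether several hosts are crashing at the same spot.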

ID: 60793 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60794 - Posted: 23 Apr 2009, 13:26:22 UTC

ID: 60794 · Rating: 0
TomaszPawel
Joined: 28 Apr 07
Posts: 54
Credit: 2,791,145
RAC: 0
Message 60797 - Posted: 23 Apr 2009, 20:09:13 UTC

ID: 60797 · Rating: 0
William Kahler
Joined: 26 Oct 06
Posts: 1
Credit: 241,241
RAC: 205
Message 60814 - Posted: 25 Apr 2009, 1:09:39 UTC

MiniRosetta 1.54 constantly crashing after ~5 seconds
& (note to Bill G) w/Boinc 6.4.x & 6.6.x (Error Code 5).
It runs a little slow for first 5 seconds of CPU time
w/last stable Boinc 5.x & finishes ok.
No difference with protected app. or not.
Complete BOINC un/re-install & Rosetta de/re-attach no help.

Dell Core Duo 2 GHz w/2 Gig Ram.
WinXP Sp3 Home Edition (up to date).
24/7, no throttle, no graphics/screensaver, leave in memory.
Stand alone or with other projects.
Memtest x2/Prime95/Dell Diagnostics run fine.

thoughts? suggestions?


ID: 60814 · Rating: 0
Gavin Shaw
Joined: 1 Feb 07
Posts: 10
Credit: 506,456
RAC: 0
Message 60817 - Posted: 25 Apr 2009, 7:49:42 UTC

And another big upload.

Task 246174559 ran for 4 hours and produced 82 decoys. The upload file was 8.9MB and took a while to upload. I'd hate to see what it would have been if there were 99 decoys...
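
As a rough guess at the 99-decoy case, here is a back-of-the-envelope sketch. It assumes the upload size grows linearly with the number of decoys, which is only an assumption; the function name is made up for illustration.

```python
def estimate_upload_mb(observed_mb, observed_decoys, target_decoys):
    """Scale an observed upload size linearly to a different decoy count."""
    return observed_mb / observed_decoys * target_decoys


# 8.9 MB for 82 decoys, scaled up to the 99-decoy cap.
estimate = estimate_upload_mb(8.9, 82, 99)
print(f"{estimate:.1f} MB")
```

Under that assumption a full 99-decoy run of this task would come to a bit under 11 MB.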

Never surrender and never give up. In the darkest hour there is always hope.

ID: 60817 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 60825 - Posted: 25 Apr 2009, 21:54:34 UTC

Hi there.

I got this on Ubuntu x64 this morning, haven't had any in a while.

That's 41min run time.

Docking_benchmark_unbound__1AVZ.unbound.mppk.pdb.gzdock_score12_hi.xml_11475_29_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=224594412

Over__Validate error__Done__2,496.64

======================================================
DONE :: 1 starting structures 2496.42 cpu seconds
This process generated 99 decoys from 99 attempts
======================================================

pete.

ID: 60825 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 60828 - Posted: 26 Apr 2009, 7:52:48 UTC

Hi me again.

This was a big one, 7.04MB result file for a six hour run.

Docking_benchmark_natives__1FIN.mppk.pdb.gzdock_score_docking_hi.xml_11477_209_0

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=224813196

======================================================
DONE :: 1 starting structures 21620.9 cpu seconds
This process generated 75 decoys from 75 attempts
======================================================
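
To compare runs like this one with the 41-minute one, a small sketch that reads the "DONE ::" summary and works out CPU seconds per decoy. The regular expression is based only on the two summaries quoted in this thread, so other task types may print something different; the function name is hypothetical.

```python
import re

# Matches the two-line summary block printed at the end of a run.
SUMMARY_RE = re.compile(
    r"DONE :: (?P<starts>\d+) starting structures (?P<cpu>[\d.]+) cpu seconds\s+"
    r"This process generated (?P<decoys>\d+) decoys from (?P<attempts>\d+) attempts"
)


def summarize(text):
    """Extract CPU time and decoy count; return None if no summary is found."""
    m = SUMMARY_RE.search(text)
    if m is None:
        return None
    cpu = float(m.group("cpu"))
    decoys = int(m.group("decoys"))
    return {
        "cpu_seconds": cpu,
        "hours": cpu / 3600,
        "decoys": decoys,
        "seconds_per_decoy": cpu / decoys if decoys else None,
    }


stats = summarize(
    "DONE :: 1 starting structures 21620.9 cpu seconds\n"
    "This process generated 75 decoys from 75 attempts"
)
print(f"{stats['seconds_per_decoy']:.1f} s per decoy over {stats['hours']:.2f} h")
```

Run on both summaries, it gives about 288 seconds per decoy here against about 25 seconds per decoy for the run that hit the 99-decoy cap in 41 minutes.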

pete.

ID: 60828 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 60836 - Posted: 26 Apr 2009, 23:34:12 UTC

Very few errors nowadays, but just came up with two compute errors:

Docking_benchmark_unbound__1ATN.unbound.mppk.pdb.gzdock_score_docking_hi.xml_11476_94_1
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x005C1D7D read attempt to address 0xC49A08B0

res_careful_ourward_cst_chunk_0_8_hb_t342__IGNORE_THE_REST_1VKBA_5_10927_2_2
ERROR: [ERROR] Unable to open constraints file: resample_outward0.05_ub0.1_lb0.02_median.t342_.cst
ERROR:: Exit from: ..\..\src\core\scoring\constraints\ConstraintIO.cc line: 330
BOINC:: Error reading and gzipping output datafile: default.out


Running AMD9850 Vista64 8Gb RAM Boinc 6.6.20
ID: 60836 · Rating: 0
Yaroslav Isakov
Joined: 2 Nov 07
Posts: 11
Credit: 98,027
RAC: 0
Message 60869 - Posted: 28 Apr 2009, 13:42:45 UTC

Hello, I have a problem: very long pending status in my last WUs:
1 2 3 4 5 6
ID: 60869 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60870 - Posted: 28 Apr 2009, 14:38:46 UTC - in response to Message 60869.  

Hello, I have a problem: very long pending status in my last WUs:
1 2 3 4 5 6


That would explain why credit has been dropping. The assimilator must be having a problem. I've emailed the Project Team to look into it when they arrive for the day in Seattle.
Rosetta Moderator: Mod.Sense
ID: 60870 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 1965
Credit: 38,160,504
RAC: 9,210
Message 60885 - Posted: 29 Apr 2009, 14:37:32 UTC - in response to Message 60870.  
Last modified: 29 Apr 2009, 14:38:12 UTC

[quote]Hello, I have a problem: very long pending status in my last WUs:
1 2 3 4 5 6[/quote]
That would explain why credit has been dropping. The assimilator must be having a problem. I've emailed the Project Team to look into it when they arrive for the day in Seattle.

I'm assuming this is fixed now. 17 of my WUs have been allocated credit since the original post, but I have another 15 pending credit - 13 hours worth.

Just awaiting catch-up, I assume. The Server Status page is showing all systems 'Running'.

I also noticed credit was taking more than 4 minutes to come through in the days leading up to the outage, so the problem may've been building up for a few days.
ID: 60885 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 60898 - Posted: 29 Apr 2009, 21:21:28 UTC
Last modified: 29 Apr 2009, 21:23:04 UTC

BOINC v6.6.20 seems to be causing failures due to too many restarts.
https://boinc.bakerlab.org/rosetta/result.php?resultid=247095859
https://boinc.bakerlab.org/rosetta/result.php?resultid=246620233

It suggests keeping tasks in memory, but I've always had it configured to do so. I've also limited the memory available to BOINC while the computer is in use. This seems to cause BOINC to start and then suspend the tasks numerous times during the day. When a task attempts to run and then exceeds the memory bound, it goes to a status of waiting for memory. But it no longer appears in the Windows task list, so it has been removed from memory.

I have an HT P4, so 2 CPUs. As the primary task cycles through periods of lower memory usage, BOINC attempts to fire up the second core, only to find it ends up short of memory again a few minutes later as the second task gears up and uses more, or the first cycles into another phase of higher memory usage.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 60898 · Rating: 0
Speedy
Joined: 25 Sep 05
Posts: 163
Credit: 800,690
RAC: 173
Message 60915 - Posted: 30 Apr 2009, 5:24:59 UTC
Last modified: 30 Apr 2009, 5:35:33 UTC

ID: 60915 · Rating: 0
WilMar
Joined: 29 Mar 09
Posts: 1
Credit: 1,984
RAC: 0
Message 60922 - Posted: 30 Apr 2009, 13:23:20 UTC

Hello!
Now, with the advent of version 1.64, I'm having difficulty uploading my last file crunched with version 1.54. I repeatedly get the following messages:
30/04/2009 13:40:31|rosetta@home|Started upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0
30/04/2009 13:42:19||Project communication failed: attempting access to reference site
30/04/2009 13:42:19|rosetta@home|Temporarily failed upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0: connect() failed
30/04/2009 13:42:19|rosetta@home|Backing off 1 hr 50 min 57 sec on upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0
30/04/2009 13:42:21||Internet access OK - project servers may be temporarily down.
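
For reading the messages above, here is a minimal sketch that splits a client log line into its parts and turns the "Backing off" text into a duration. It assumes the pipe-separated, day-first layout shown in these lines; the function names are made up, and the retry time it prints is only the earliest time implied by the log, not a promise of what the client will do.

```python
import re
from datetime import datetime, timedelta

# Duration phrase as it appears in the "Backing off ... on upload" message.
BACKOFF_RE = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(?:(\d+) sec )?on upload"
)


def parse_line(line):
    """Split 'timestamp|project|message'; the project field may be empty."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def backoff(message):
    """Return the back-off delay as a timedelta, or None if not a back-off line."""
    m = BACKOFF_RE.search(message)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


line = (
    "30/04/2009 13:42:19|rosetta@home|Backing off 1 hr 50 min 57 sec on upload of "
    "lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0"
)
when, project, message = parse_line(line)
delay = backoff(message)
next_try = when + delay
print(project, delay, next_try)
```

For the line above, the delay is 6657 seconds, so the next automatic attempt would be no earlier than 15:33:16 unless the transfer is retried by hand.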

As seen on the server status page, all servers are running. So why does this problem occur, and how can I fix it?

Martin
ID: 60922 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 60923 - Posted: 30 Apr 2009, 14:22:24 UTC - in response to Message 60898.  
Last modified: 30 Apr 2009, 15:18:34 UTC

BOINC v6.6.20 seems to be causing failures due to too many restarts.
https://boinc.bakerlab.org/rosetta/result.php?resultid=247095859
https://boinc.bakerlab.org/rosetta/result.php?resultid=246620233

It suggests keeping tasks in memory, but I've always had it configured to do so. I've also limited the memory available to BOINC while the computer is in use. This seems to cause BOINC to start and then suspend the tasks numerous times during the day. When a task attempts to run and then exceeds the memory bound, it goes to a status of waiting for memory. But it no longer appears in the Windows task list, so it has been removed from memory.

I have an HT P4, so 2 CPUs. As the primary task cycles through periods of lower memory usage, BOINC attempts to fire up the second core, only to find it ends up short of memory again a few minutes later as the second task gears up and uses more, or the first cycles into another phase of higher memory usage.


BOINC 6.6.20 is working better for me, so let's compare our machines and settings. My newer machine, with BOINC 6.6.20 under 64-bit Vista SP1 with 8 GB of memory, does not appear to have any memory problems.

My 32-bit Vista SP1 machine, with BOINC 6.2.28, originally came with 1 GB of memory. I found that wasn't enough to even start running two minirosetta@home workunits at the same time. After enough other problems showed up which I decided were memory problems, I used this site to find out how much memory my motherboard could handle, and then order enough to raise it to the 2 GB limit for my motherboard:

http://www.crucial.com/

This was enough to allow it to start running two minirosetta workunits at once on my 2 CPU cores, but still not enough to run them well. Eventually, I raised both the amount of disk space BOINC is allowed to use and the amount of swap space BOINC is allowed to use. It's not clear which of the last two steps was actually needed, if not both of them, but that combination handled the memory problems on that machine.

At least some versions of BOINC do not divide up the available swap space in the most efficient way - they first divide it up into equal shares for each BOINC project you have subscribed to, then those shares into smaller shares for each CPU core. If these smaller shares aren't large enough, it can't preserve any work done since the last checkpoint by simply swapping one into the swap space on the hard drive.
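
To put numbers on that, here is a small sketch of the split as I described it. It reflects only my understanding of the behaviour, not anything checked against the BOINC source, and the function name and the figures are hypothetical.

```python
def swap_share_per_task(total_swap_mb, n_projects, n_cores):
    """Equal share per project, then an equal share per CPU core within it."""
    if n_projects < 1 or n_cores < 1:
        raise ValueError("need at least one project and one core")
    return total_swap_mb / n_projects / n_cores


# Hypothetical example: 2048 MB of allowed swap, 4 attached projects, 2 cores.
share = swap_share_per_task(2048, 4, 2)
print(f"{share:.0f} MB per task")
```

With those made-up figures each task would get 256 MB, which shows how attaching to more projects could leave a large workunit without enough room to be swapped out.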

Does the HT stand for hyperthreaded, a method of appearing to have twice as many CPU cores by giving each one of them an extra set of registers? If so, I've seen messages from other BOINC users saying that this does not increase the total throughput very much. Therefore, until you are able to handle the memory and swapfile problems, you may find it worthwhile to tell BOINC to use only one of the two apparent CPU cores on your machine.
ID: 60923 · Rating: 0
robertmiles
Joined: 16 Jun 08
Posts: 1223
Credit: 13,824,497
RAC: 2,340
Message 60926 - Posted: 30 Apr 2009, 16:05:45 UTC

I've recently had two workunits with the lockfile problem:

https://boinc.bakerlab.org/rosetta/result.php?resultid=247527853

https://boinc.bakerlab.org/rosetta/result.php?resultid=247443039

Both were then completed successfully by someone else.

Could minirosetta be modified to check for the lockfile problem sooner, and at least produce more debug information about it instead of wasting CPU time first?
ID: 60926 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org