Message boards : Number crunching : Report Problems with Rosetta Version 5.24
Author | Message |
---|---|
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
|
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Not a bug, but I noticed a v5.24 WU, FRA_t298_hom001_5_IGNORE_THE_REST_dec22.pdb_747_47_0, which has a working set (RAM usage) of almost 300MB (298MB) and 837MB virtual. Plus, the data files for this WU are ~11.5MB. It reminds me that Rosetta could really use a BigWU flag, and of my concern that with such WUs we might lose folks with 512MB RAM (who run other software in addition to crunching for Rosetta) and/or dialup. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Moderator9 Volunteer moderator Send message Joined: 22 Jan 06 Posts: 1014 Credit: 0 RAC: 0 |
Not a bug, but I noticed a v5.24 WU FRA_t298_hom001_5_IGNORE_THE_REST_dec22.pdb_747_47_0 You are correct that this is not a bug. However, the large memory test you have mentioned has already been shown not to be the answer. In fact BOINC will use virtual memory to compensate for memory issues. Moreover, the BOINC scheduler will allow other programs to run and yield the processor to them as necessary. But if you read the version 5.22 problem reporting thread, you will see many reports from people who were upset that they were denied work because of the size of a work unit. During CASP large work units are just part of the science that is being run. Moderator9 ROSETTA@home FAQ Moderator Contact |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Our work unit generator now alternates between high memory and standard jobs to prevent filling the queue with just high memory ones. This will allow users with low memory machines to keep getting work. |
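A minimal sketch of the alternation described above, assuming the generator simply takes turns between two pending lists. The function name `interleave_jobs` and the job labels are hypothetical, not taken from the Rosetta@home or BOINC server code.

```python
from collections import deque


def interleave_jobs(high_memory, standard):
    """Yield jobs by alternating standard and high-memory work.

    Hypothetical illustration: when one list runs out, the remaining
    jobs from the other list are emitted in order.
    """
    hi, lo = deque(high_memory), deque(standard)
    while hi or lo:
        if lo:
            yield lo.popleft()  # a standard job first, so low-memory hosts always find work
        if hi:
            yield hi.popleft()  # then a high-memory job


queue = list(interleave_jobs(["H1", "H2"], ["S1", "S2", "S3"]))
print(queue)
```

With this ordering the front of the queue never holds two high-memory jobs in a row while standard jobs are still pending.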
Bin Qian Send message Joined: 13 Jul 05 Posts: 33 Credit: 36,897 RAC: 0 |
This protein is one of the largest CASP targets we have run on r@h: 336 residues. We've marked these WUs as high-memory jobs, and they will only be sent to machines with more than 512 MB of memory. As David Kim has said in his post below, the queueing system will make sure there are always low-memory jobs available for all users. |
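The rule in this post can be sketched as a simple filter, assuming each job carries a high-memory flag and the host reports its RAM in megabytes. The `Job` class, `eligible_jobs`, and the constant name are hypothetical; the actual BOINC scheduler logic is not shown in this thread.

```python
from dataclasses import dataclass

# From the post: high-memory jobs go only to machines with MORE than 512 MB.
HIGH_MEMORY_THRESHOLD_MB = 512


@dataclass
class Job:
    name: str
    high_memory: bool


def eligible_jobs(host_ram_mb, jobs):
    """Return the jobs a host may receive (hypothetical sketch)."""
    return [
        job for job in jobs
        if not job.high_memory or host_ram_mb > HIGH_MEMORY_THRESHOLD_MB
    ]


jobs = [Job("standard_wu", False), Job("casp_336_residue_wu", True)]
for ram in (256, 512, 1024):
    print(ram, [job.name for job in eligible_jobs(ram, jobs)])
```

Filtering on the server side means a 256 MB host is never offered the large WU, so it does not spend time downloading data it cannot use.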
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
I've not kept up with BOINC-server developments, but I think that currently such a high-memory job will still be sent to e.g. a PC with only 256MB RAM. And only AFTER that WU has been downloaded (a procedure which might last 45min-1hr for a dialup guy) the local BOINC client will notice it's not suitable and dump it. Personally, I don't mind, as I have fast Internet and a few months ago I had upgraded all my PCs with extra RAM "for Rosetta". But from the perspective of a dialup user... waiting 45min-1hr to download a 12MB WU and then see it aborted. He wouldn't be very happy. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Ananas Send message Joined: 1 Jan 06 Posts: 232 Credit: 752,471 RAC: 0 |
- Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x008C0DD1 write attempt to address 0x073CEA1C resultid=25245144 The box has only 256MB RAM, which could be the reason. Using no graphics, Rosetta shared the host with SIMAP, which has fairly low RAM requirements. |
AMD_is_logical Send message Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
The WU t316__CASP7_JUMPABINITIO_SAVE_ALL_OUT_BARCODE_250to373_hom008__737_139_0 hung after it finished crunching. After some hours, I stopped and restarted BOINC, and the WU immediately finished and uploaded. |
Tom Philippart Send message Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0 |
I had 2 computing errors in 24h :( Shall I specify the WUs? |
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
Yesterday this WU was running for 40 minutes when I had to shut my computer down. Although checkpointing is enabled for this WU and I found a farlxcheck, the WU started from the beginning after I restarted my computer today. @Dimitri: The big WUs are only sent to machines with 512 MB RAM and above. I'm glad they finally decided to use this BOINC feature to send larger jobs to higher-spec machines. My current WU also uses about 300 MB RAM, and I'm happy that my 1 GB RAM is of some use (but I'm not happy that checkpointing still does not work smoothly). Perhaps an announcement of this procedure would be in order in the announcement thread. |
Dimitris Hatzopoulos Send message Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
@Dimitri Mea culpa, I thought it worked as I described below. I haven't looked into the sources for quite some time. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Amos Jeffries Send message Joined: 14 Dec 05 Posts: 3 Credit: 244,222 RAC: 0 |
I'm having trouble downloading the 5.24 application; the link just hangs. Any help would be appreciated. The BOINC manager (win32) and boinc-client (linux) grab about 95% of some files and then fall into a resume loop. I noticed this ~2 days ago and reset the project 24 hours ago, with no change except that it restarted all downloads from the beginning and hangs in the same place again. It downloads the small support files and WU files okay; it's just the new application file. A manual test using wget does the same thing: " >wget -c https://boinc.bakerlab.org/rosetta/download/rosetta_5.24_i686-pc-linux-gnu --15:03:04-- https://boinc.bakerlab.org/rosetta/download/rosetta_5.24_i686-pc-linux-gnu => `rosetta_5.24_i686-pc-linux-gnu' Resolving boinc.bakerlab.org... 140.142.20.103 Connecting to boinc.bakerlab.org|140.142.20.103|:80... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 9,705,772 (9.3M), 424,092 (414K) remaining [application/octet-stream] 95% [+++++++++++++++++++++++++++++++ ] 9,281,680 --.--K/s " The Windows application hangs at a different size, but with the same consistency: " wget -c https://boinc.bakerlab.org/rosetta/download/rosetta_5.24_windows_intelx86.exe --15:34:45-- https://boinc.bakerlab.org/rosetta/download/rosetta_5.24_windows_intelx86.exe => `rosetta_5.24_windows_intelx86.exe' Resolving boinc.bakerlab.org... 140.142.20.103 Connecting to boinc.bakerlab.org|140.142.20.103|:80... connected. HTTP request sent, awaiting response... 206 Partial Content Length: 7,245,824 (6.9M), 642,944 (628K) remaining [application/octet-stream] 91% [++++++++++++++++++++++++++++++++++ ] 6,602,880 --.--K/s " |
Mike Gelvin Send message Joined: 7 Oct 05 Posts: 65 Credit: 10,612,039 RAC: 0 |
|
cnick6 Send message Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
Hi Everyone -- I haven't had time to really investigate my problem, but has anyone seen issues with 5.24 where the Windows screensaver will just hang? My disk activity gets really intense but my system will never come back to the desktop. The system is not "locked" as my Numlock key still lights up. Could there be some kind of memory leak with 5.24? I can't even CTRL-ALT-DEL to task manager. I have to hard reset my machine. I didn't see this at all with version 5.22 or previous versions. My work units appear to be completing with 'success'... Thanks -Nick |
Astro Send message Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
Hi Everyone -- Nick, does the graphic seem like it's just locked and won't be released so Windows can resume using it? Does it seem like you can still interact with whatever screen is right below the graphic, but the graphic just doesn't go away? For example, one time I had a window open that had one of those "shoot the duck and win $1000 in grocery coupons" games, and when I clicked the mouse, I could hear the gunshots and see the HD activity light as I was being swept away to a new window, since my aim was good and I shot the duck (even though I could only see the Rosetta graphic). One night I saw this had happened but didn't "hard reset" my laptop, and the next morning the Rosetta graphic had disappeared, only to be replaced by the Seti graphic. I still couldn't get my Windows desktop back. I ended up "powering down". I've seen this happen twice; both times Rosetta was the first screen to appear. I've reported this to Rom Walton. I'm curious to see if you're the second person to see this. tony |
mewbysea Send message Joined: 29 Jan 06 Posts: 17 Credit: 15,838,515 RAC: 1,403 |
Hi folks, Just wanted to point out this wu 21185008 which failed on two computers. On the computer running Rosetta 5.22, the auto debug kicked in; on the one running version 5.24, the auto debug failed. |
tralala Send message Joined: 8 Apr 06 Posts: 376 Credit: 581,806 RAC: 0 |
Bombed out due to high disk usage after 0 seconds. This seems to be a problem with your disk and your general BOINC settings. Free up some space on your disk or allow more disk usage in your general preferences. |
cnick6 Send message Joined: 30 May 06 Posts: 29 Credit: 12,597,623 RAC: 0 |
Hi Everyone -- Tony, yes it sounds very similar, but Rosetta will never recover. Rosetta seems to keep running (so the screensaver is always active) -- but I cannot return back to the desktop whatsoever. As soon as I try to wake up the machine, the hard disk light is more or less lit up the entire time. I've left it sit for several hours and it will never come back. |
DanSpitz Send message Joined: 5 Jun 06 Posts: 2 Credit: 15,967 RAC: 0 |
Version 5.24 is up: |
Bob Guy Send message Joined: 7 Oct 05 Posts: 39 Credit: 24,895 RAC: 0 |
This WU 21358231 crashed when restarting. I have 'leave in memory' turned on but this was after closing and restarting Boinc. Crashed immediately (well, it ran for 12 minutes) upon restarting. Graphics never opened for this WU - not a graphics problem. Error was the -1073741819 (0xc0000005) error. <Edit for the error code and run time.> |
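The two forms of the error code in the post above are the same 32-bit value read as signed and as unsigned. A minimal sketch of the conversion, assuming the exit status is reported as a signed 32-bit integer (the helper name `exit_code_to_hex` is hypothetical):

```python
def exit_code_to_hex(code):
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


# -1073741819 + 2**32 = 3221225477, which is 0xc0000005,
# the Windows access violation status seen earlier in this thread.
print(exit_code_to_hex(-1073741819))  # prints 0xc0000005
```

Masking with `0xFFFFFFFF` is equivalent to adding 2**32 to a negative 32-bit value, so positive codes pass through unchanged.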
©2024 University of Washington
https://www.bakerlab.org