Posts by Rhiju

1) Message boards : Number crunching : Problems with Rosetta version 5.98 (Message 54267)
Posted 8 Jul 2008 by Rhiju
Post:
Thanks... we changed the validator code earlier today after testing on RALPH, but there's clearly still an issue! I've contacted DK to revert to the old code.

The Server Status page currently says that the validator is not running.


I've emailed the Project Team pointing this out. Thanks for pointing it out.

2) Message boards : Number crunching : Problems with version 5.96 (Message 52546)
Posted 17 Apr 2008 by Rhiju
Post:
Hedera, can you post a link to one of your results, so that I can figure out what the problem is? I actually did not expect any of these RNA jobs to take a lot of memory, so I definitely want to track this down.

Hedera, you can tell BOINC to "snooze", or limit it to fewer than all of your CPUs, or put a memory constraint on it to help keep it responsive for you. Most likely, as you already seem aware, you are bottlenecking on memory. Yes, those RNAs do that a lot. And so if you limit the amount of memory BOINC is allowed to use, it will basically scale back on how many tasks it's trying to run at the same time.
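
To illustrate the last point, here is a rough sketch of why a memory cap reduces the number of tasks running at once. It is a simplified model, not BOINC's actual scheduling logic; the function name and the example figures are assumptions made up for illustration.

```python
def tasks_that_fit(ncpus, mem_limit_mb, per_task_mb):
    """Simplified model: how many tasks can run at once under a memory cap.

    Hypothetical helper, not part of BOINC. It assumes every task needs
    the same amount of memory, which real workunits do not.
    """
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    by_memory = mem_limit_mb // per_task_mb  # tasks the memory cap allows
    return max(0, min(ncpus, by_memory))     # never more than the CPU count


# Example figures (assumed): 4 CPUs, 1000 MB allowed, 400 MB per task.
print(tasks_that_fit(4, 1000, 400))  # 2
```

In this toy model, lowering the memory limit is what reduces the number of concurrent tasks once memory, rather than CPU count, is the tighter bound.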

3) Message boards : Number crunching : Rosetta Application Version Release Log (Message 52098)
Posted 23 Mar 2008 by Rhiju
Post:
Rosetta 5.96 includes a new procedure for modeling large RNAs where experimental information is available. Basically, constraints that bring together parts of the chain that are further apart in sequence are introduced at later stages of the search.
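
As a rough illustration of that idea (not the actual Rosetta code), the sketch below switches on constraints in stages according to how far apart the two constrained positions are in sequence. The function name, the linear schedule, and the example pairs are all assumptions.

```python
def active_constraints(constraints, stage, n_stages, chain_length):
    """Return the constraints switched on at a given stage (1-based).

    Hypothetical sketch: the allowed sequence separation grows linearly
    with the stage, so long-range pairs only appear late in the search.
    """
    if not 1 <= stage <= n_stages:
        raise ValueError("stage must be between 1 and n_stages")
    max_sep = chain_length * stage // n_stages
    return [(i, j) for (i, j) in constraints if abs(i - j) <= max_sep]


# Assumed example: position pairs in a 100-residue chain, 4 stages.
pairs = [(1, 5), (10, 40), (3, 95)]
for stage in range(1, 5):
    print(stage, active_constraints(pairs, stage, 4, 100))
```

Short-range pairs are active from the first stage, while the pair separated by 92 positions is only enforced in the final stage.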
4) Message boards : Number crunching : Problems with version 5.96 (Message 52097)
Posted 23 Mar 2008 by Rhiju
Post:
This version of Rosetta is pretty similar to 5.95, though we'll be running a few RNA workunits -- please post if there are any issues.
5) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49936)
Posted 22 Dec 2007 by Rhiju
Post:
Hi: Here's a partial explanation. On ralph, nearly all the workunits for a full day had returned and come back as "successes", typically a very good sign -- but the linux issue, as you correctly pointed out, leads to delayed responses from clients (rather than a bunch of immediate WU errors that tell us to go track down the problem). Since there are very few RALPH linux users we didn't notice a drop in the overall return rate of successes. The only sign that things were wrong came from a message board posting there (later bolstered by your and others' posts) and here ...

So, thanks for posting -- it did help us catch the problem relatively quickly -- and please accept our apologies. We'll certainly pay closer attention to this in the future, and do tests for, say, at least two days. If you could recruit some more Rosetta@home linux users to give a fraction of their CPUs to ralph and occasionally post errors in the message boards, that would also help!

Update: we've tracked down the problem -- it's an issue with the BOINC-provided API (I guess we happened to be unlucky in being the first to update our linux app after the bug got introduced). Later today, we'll update the ralph and rosetta@home linux apps and they should work.


Since you tracked down the problem, can you please tell us how it will affect all those of us running Rosetta on Linux?

We already know that those 5.90 tasks will not finish after the specified runtime. Without manual intervention, will these tasks ever end on their own, or do I have to go to each and every server and manually abort all the 5.90 tasks?

I have over 100 CPUs running Rosetta on Linux and having to clean up this mess is not something I'm looking forward to. It especially upsets me that the lack of testing on Ralph caused the problem to appear in Rosetta. This was clearly avoidable!
6) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49917)
Posted 21 Dec 2007 by Rhiju
Post:
OK, just did the update -- this should revert the "cpu run time" and "% complete" behavior to what linux clients are used to! Please let me know if this fixes this issue (looks good locally).

Also, there were complaints about memory usage for Rosetta 5.89 -- have these problems become better?

Thanks for the continuing feedback!

Update: we've tracked down the problem -- it's an issue with the BOINC-provided API (I guess we happened to be unlucky in being the first to update our linux app after the bug got introduced). Later today, we'll update the ralph and rosetta@home linux apps and they should work.

All linux users -- thanks for posting! It's quite interesting: in the past, we've seen issues that were Windows-specific, then Mac-specific, but typically linux has been robust (especially since the app doesn't have graphics).

We're looking into the current Rosetta@home/linux issue (I think the CPU time call must be messed up in the latest BOINC API), but it may take a few days to track it down. In the meantime, please feel free to switch to another app. Apologies... there aren't that many linux users on RALPH -- if you're interested in helping out, we'd be grateful if some more linux clients attached to ralph at least part time.


7) Message boards : Number crunching : Rosetta Application Version Release Log (Message 49916)
Posted 21 Dec 2007 by Rhiju
Post:
The linux executable has been updated to 5.91. Scientifically, this is exactly the same code as 5.90, but this version should include a fix in the cpu-run-time counter that was causing errors!
8) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49889)
Posted 21 Dec 2007 by Rhiju
Post:
Update: we've tracked down the problem -- it's an issue with the BOINC-provided API (I guess we happened to be unlucky in being the first to update our linux app after the bug got introduced). Later today, we'll update the ralph and rosetta@home linux apps and they should work.

All linux users -- thanks for posting! It's quite interesting: in the past, we've seen issues that were Windows-specific, then Mac-specific, but typically linux has been robust (especially since the app doesn't have graphics).

We're looking into the current Rosetta@home/linux issue (I think the CPU time call must be messed up in the latest BOINC API), but it may take a few days to track it down. In the meantime, please feel free to switch to another app. Apologies... there aren't that many linux users on RALPH -- if you're interested in helping out, we'd be grateful if some more linux clients attached to ralph at least part time.

9) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49883)
Posted 21 Dec 2007 by Rhiju
Post:
All linux users -- thanks for posting! It's quite interesting: in the past, we've seen issues that were Windows-specific, then Mac-specific, but typically linux has been robust (especially since the app doesn't have graphics).

We're looking into the current Rosetta@home/linux issue (I think the CPU time call must be messed up in the latest BOINC API), but it may take a few days to track it down. In the meantime, please feel free to switch to another app. Apologies... there aren't that many linux users on RALPH -- if you're interested in helping out, we'd be grateful if some more linux clients attached to ralph at least part time.
10) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49847)
Posted 21 Dec 2007 by Rhiju
Post:
This seems very odd. Thanks a lot for posting, especially the link to the workunit. I checked here that the %cpu usage is fine for other platforms, so I fear that this is a linux-specific issue.

Anyone else out there noticing success or failure with Linux?

Astro, do other apps (e.g., SETI) run fine?

Also, do you happen to know what version of BOINC you are using?

I'm seeing the exact same thing with my AMD64 2800 and my AMD64 X2 4800 as well. I let them work on the tasks for 15 min and still only have --- as a cpu time. Not even 00:00:00. Although I did see the zeros on a couple of the ones from the 6000. Also, after suspending the already running 5.89 tasks both of them changed to "computation error". The 4800 is the only one that produced the "computation error" after suspension. Looks like I'll be windows only after these 5.89's run dry.

NOTE: 5.90 does run on my AMD64 3700 and using windows.

Hmmm, After 15 min and before I could abort the ones on my 4800 one of them switched to 19.864% done, but still shows --- as cpu time.

11) Message boards : Number crunching : Problems with version 5.90/5.91 (Message 49836)
Posted 20 Dec 2007 by Rhiju
Post:
Thanks for continuing to post bugs. We'd be particularly grateful if users who were noticing memory hog issues with 5.89 could post if the newer app is better!
12) Message boards : Number crunching : Rosetta Application Version Release Log (Message 49835)
Posted 20 Dec 2007 by Rhiju
Post:
Rosetta@home has been updated to version 5.90. New stuff:

1. Better conformational space exploration of proteins with "dihedral" symmetry.

2. Reduced virtual memory requirements ... 5.89 which was causing problems on some machines.

3. Compiled with the newest BOINC API, which should allow new users that download upcoming versions of the BOINC client to run Rosetta properly.
13) Message boards : Number crunching : Problems with Rosetta version 5.89 (Message 49834)
Posted 20 Dec 2007 by Rhiju
Post:
Hey everybody -- we've been listening, and we've been especially concerned regarding the "memory hogs". We think we've fixed this problem and are updating Rosetta@home!

So are we to assume that there are WU's coming that will have lower virtual memory requirements? The memory hogs are getting tired.


I don't have any advance knowledge of future Rosetta releases. But my experience with the project tells me they are working on such modifications to the program, and so yes, I would assume both that other WUs will require less memory, and that future releases will improve the memory footprint.

It may be helpful if folks would post the WU name and their observations of memory usage. That will provide something concrete to compare against when a new release comes out.

14) Message boards : Number crunching : Rosetta Application Version Release Log (Message 49367)
Posted 4 Dec 2007 by Rhiju
Post:
To fully clarify the current versions:

"Rosetta beta" 5.85 (mac/windows) or 5.86 (linux) contain the latest executable.

"Rosetta" 5.82 is an old executable -- during lull periods, we are still running a few workunits with this older executable to serve as a very large scale, fully self-consistent test of the structure prediction protocol.
15) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 49146)
Posted 28 Nov 2007 by Rhiju
Post:
Sorry, these "1gid" workunits have been canceled... looks like there are a few particular platforms where they consistently crash. Thanks for posting!
16) Message boards : Number crunching : Rosetta Application Version Release Log (Message 49145)
Posted 28 Nov 2007 by Rhiju
Post:
Due to a glitch in our previous application update, the Linux executable didn't have the same source code as the Mac and Windows executables. It now matches those executables! For reasons of bookkeeping, the Linux application now has a nominal version of 5.86, but is identical to the 5.85 apps for other platforms.
17) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 49034)
Posted 25 Nov 2007 by Rhiju
Post:
You should feel free to abort any of these BOINC_SYMM_FOLD_AND_DOCK_RELAX workunits.

And I'm contacting the person in charge of the MolecularRep workunits!

I just got this one it's a resend, the user that had it first had lots of problems with it. Do you want me to let it run or abort?

It's a BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock.

boinc.bakerlab.org/rosetta/workunit.php?wuid=111015584

Pete.



18) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 49019)
Posted 24 Nov 2007 by Rhiju
Post:
There are no more SYMM_FOLD_AND_DOCK_RELAX workunits being sent out -- I want to find out if those particular workunits were the source of the memory hog problem, or whether the application itself has an issue. Is anyone out there noticing bad behavior with other kinds of work? Thanks in advance for the feedback.

There is something seriously wrong with the application. It's not a virtual memory hog it's a complete memory hog. I don't have virtual memory, have 2GiB of physical memory. Rosetta now uses 800MB of that for one wu. With the 600MB normally used this is not enough to run two or sometimes even one. I am aborting all version 5.85 tasks until they are disabled.


All Rosetta units are 5.85 now, so please don't abort hundreds of units over and over. Just suspend the project.

19) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 48930)
Posted 22 Nov 2007 by Rhiju
Post:
I have a feeling that this might be due just to the workunits we've sent out in the first couple days of the 5.85 app ("SYMM_FOLD_AND_DOCK_..."). We're working on ways to improve the memory usage and the (infrequent) errors with this particular kind of workunit -- if the excess virtual memory issue continues with other kinds of workunits, please do post here. Thanks!

I'm sure you've heard this before, but Rosetta version 5.85 is a virtual memory hog. I hope you can find some ways to fix this problem.

Thank

20) Message boards : Number crunching : Problems with Rosetta version 5.81 (Message 48909)
Posted 21 Nov 2007 by Rhiju
Post:
OK, we're looking into this, especially the "1i8f" workunits. If you continue to see errors, please post at the message board for Rosetta version 5.85 (any new workunits since Nov. 20 should probably be running with that app). Thanks!

1di2__LOGREG_ABRELAX_PILOT_ALL_FREQ_SAVE_ALL_OUT-1di2_-_BARCODE__2284_1395

1i8f__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1i8f_-crystal_foldanddock__2257_33840

1igu__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1igu_-crystal_foldanddock__2257_33840






©2024 University of Washington
https://www.bakerlab.org