Posts by Rhiju

21) Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux) (Message 48876)
Posted 20 Nov 2007 by Rhiju
Post:
Thanks for continuing to post!
22) Message boards : Number crunching : Rosetta Application Version Release Log (Message 48875)
Posted 20 Nov 2007 by Rhiju
Post:
Rosetta@home has been updated to 5.85:

- New features include proper diagnostics for a different kind of symmetry ('D2')

- A fix to the modeling of large symmetric complexes, and lower memory usage for some of those jobs.

- A new output format that may be useful for designing new proteins.

- Some major improvements in the energy function used in RNA structure prediction, which we look forward to testing on a large scale!
23) Message boards : Number crunching : Problems with Rosetta version 5.81 (Message 48874)
Posted 20 Nov 2007 by Rhiju
Post:
We're looking into this one -- most of the failures (which are happening at a pretty low overall rate of a few percent) for BOINC_SYMM_FOLD_AND_DOCK_RELAX are indeed due to this error, which should be fixable! Thanks for continuing to post!

I've had errors from hbonds.cc on some BOINC_SYMM_FOLD_AND_DOCK_RELAX-****_-crystal_foldanddock__2257 WUs:
http://boinc.bakerlab.org/rosetta/result.php?resultid=121101205
http://boinc.bakerlab.org/rosetta/result.php?resultid=120920538
http://boinc.bakerlab.org/rosetta/result.php?resultid=120696678

24) Message boards : Number crunching : Problems with Rosetta version 5.81 (Message 48873)
Posted 20 Nov 2007 by Rhiju
Post:
Actually, I think this is our (the developers') fault! There may be too much output from the run, and there's a maximum of something like a few MB allowed in the text output ("stdout.txt"). We're looking into it. Thanks much for posting -- we can't catch these kinds of problems on RALPH!

120690844
Name 1i8f__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1i8f_-crystal_foldanddock__2257_49314_1
Workunit 109689498


On my AMD all units with 2257 fail due to 'maximum disk usage exceeded'.
Only that series of WU's gives this error, so I abort them whenever they come in to avoid wasting time.
Too bad I have to do this, but as nothing I can do seems to solve this problem, it's the only solution for me.
Crunch on.


you could give boinc some more disk space to use ;)
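
For anyone who wants to spot tasks from one batch (such as the 2257 series mentioned above) before aborting them by hand, here is a minimal sketch in Python. It assumes, based only on the task names quoted in these posts, that a name ends in `_<batch>_<number>_<number>`; the helper names `batch_number` and `tasks_in_batch` are made up for illustration and are not part of BOINC or Rosetta.

```python
import re

# Assumed layout, inferred only from task names quoted in these posts:
#   <description>_<batch>_<number>_<number>
TASK_SUFFIX = re.compile(r"_(\d+)_(\d+)_(\d+)$")


def batch_number(task_name):
    """Return the batch number from a task name, or None if it does not match."""
    match = TASK_SUFFIX.search(task_name.strip())
    return int(match.group(1)) if match else None


def tasks_in_batch(task_names, batch):
    """Return the task names whose trailing batch number equals `batch`."""
    return [name for name in task_names if batch_number(name) == batch]


if __name__ == "__main__":
    names = [
        "1i8f__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1i8f_-crystal_foldanddock__2257_49314_1",
        "1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__1878_6494_0",
    ]
    for name in tasks_in_batch(names, 2257):
        print(name)
```

This only lists matching names; aborting a task is still done in the BOINC manager as usual.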

25) Message boards : Number crunching : Problems with Rosetta version 5.80 (Message 47424)
Posted 5 Oct 2007 by Rhiju
Post:
Hi all: We're trying to track down several sources of error. I'm not sure if anyone's posted about this, but a small number of workunits with the batch number 2156:

mcr1__BOINC_ABRELAX-mcr1_-mfr__2056_

appear to be flawed. I've cancelled the job; you should also feel free to abort these jobs if you see them. There aren't that many. I just fixed the problem and sent out a similar job with ID 2059.

We're looking into a few more issues too. I've just contacted the people in charge of the other jobs... thanks *very* much for posting!
26) Message boards : Number crunching : Problems with Rosetta version 5.80 (Message 46638)
Posted 19 Sep 2007 by Rhiju
Post:
Also, I think this has been posted elsewhere: if you have jobs marked CAPRI, you should feel free to abort them. Although most of these workunits are running fine and we are using the incoming data, there is a small possibility of the workunits being frozen or of the output files being so large that the file transfer can get bungled. No more of these workunits are being sent out!

27) Message boards : Number crunching : Problems with Rosetta version 5.80 (Message 46637)
Posted 19 Sep 2007 by Rhiju
Post:
Hi Conan, I think you're exactly right -- the WU must have frozen at some time. We previously saw a high rate of watchdog kills for this type of workunit due to freezing, so that supports your hypothesis. There's clearly some problem with these jobs -- we've discontinued them until we can fix the issue!

My preferences are set to 21,000 seconds (6 hours).
This WU took over 28,000 seconds (nearly 8 hours) but still validated.

It was still successful and reported as Valid.
I claimed 98.93 for the time taken and the work done but was granted just 5.38.

What is the go with that?

I suspect the WU locked up during processing and this is why the time is so long, but as it is still valid I should get full credit.

There is an error report in this result but at the end all came out valid and successful.

Please I would like to know the reason for this result.
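
To put the figures in the quoted report side by side, here is a small worked calculation in Python. It uses only the numbers given in the post (the 28,000 seconds is stated as "over", so the results are approximate); the variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# Figures taken from the quoted post.
target_seconds = 21_000   # preferred run time
actual_seconds = 28_000   # reported run time ("over 28,000 seconds")
claimed_credit = 98.93
granted_credit = 5.38

target_hours = target_seconds / SECONDS_PER_HOUR
actual_hours = actual_seconds / SECONDS_PER_HOUR
overrun = actual_seconds / target_seconds           # how far past the preference
granted_fraction = granted_credit / claimed_credit  # share of the claim granted

print(f"target:  {target_hours:.2f} h")
print(f"actual:  {actual_hours:.2f} h ({overrun:.2f}x the preference)")
print(f"granted: {granted_fraction:.1%} of the claimed credit")
```

So the task ran roughly a third longer than the preference, yet only about one twentieth of the claimed credit was granted.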

28) Message boards : Number crunching : loss of credit post crash (Message 46636)
Posted 19 Sep 2007 by Rhiju
Post:
Hi everybody: Thanks for your posts and for your patience over the last week. Quite a few things have been crazy. We have been testing all our workunits on the RALPH test server and they went through fine -- so your feedback over here at Rosetta@home has been critical to identifying and (in some cases) fixing new problems.

The issue with the CAPRI workunits appears to be the large number of generated models and the size of output files; this was hammering our already frazzled fileservers. We are no longer sending out those jobs -- if we do, we'll fix this issue first. We're very sorry for this problem; it was totally unanticipated.

There was also a separate issue with some workunits sent out before the crash not being accepted as valid; we had a problem with the database, and I think DK has fixed this.

Then of course there was the massive outage; as BarryAZ has explained, this is causing some craziness with the credits that should hopefully be gone in a week or so.

If you can, bear with us here. The results we're getting back are exciting on a number of scientific fronts. The CAPRI data on predicting protein-protein interactions is very interesting and we're analyzing it now. The work with NMR-constrained protein structural inference has the potential to revolutionize how structures are solved. And there's more exciting stuff coming soon -- we'll try to be as careful as possible!


Look instead at the total work completed credits. The average is based on something like the past 2 weeks, so you would expect it to drop and 'stay dropped' until the outage timeframe begins to fall outside that two-week window. Daily credits for me still haven't quite recovered to the pre-crash levels -- today's hiccups didn't help with that of course, nor did the release into the wild of some 'bad boy' work units which CPUs would chew on but not yield credit. Take a look at the message board topic regarding the 5.80 application and look through it -- work units with 'Capri' in the title have been mentioned as work units you want to abort.





I am looking at Average Work Done which is now down to 1286 and dropping fast. This seems to relate to the increasing delays from the server. It is now putting out 'communication deferred' times in the hours each day. The ranking of computers has dropped me from being around 6 or 7 to somewhere around 39 now. Why do we bother with these kinds of stats when the host site determines the outcomes? I have watched hours and hours of work units sitting here not being able to be returned because the server was delaying communications.
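
The "drop and stay dropped" behaviour described in the quoted reply can be illustrated with a minimal sketch. It assumes a plain trailing mean over 14 days, which is only the poster's rough description ("something like the past 2 weeks"); the averaging the server actually uses may differ, and the daily figures below are invented purely for illustration.

```python
def trailing_average(daily_credits, window=14):
    """Mean of the most recent `window` days, computed for every day in turn."""
    averages = []
    for day in range(len(daily_credits)):
        start = max(0, day - window + 1)
        recent = daily_credits[start:day + 1]
        averages.append(sum(recent) / len(recent))
    return averages


if __name__ == "__main__":
    # Invented data: 20 normal days, a 3-day outage, then 14 normal days.
    credits = [100] * 20 + [0] * 3 + [100] * 14
    averages = trailing_average(credits)
    for day in (19, 22, 33, 34, 36):
        print(f"day {day}: average {averages[day]:.2f}")
```

In this toy data the average falls during the outage, stays at the lower level for as long as all three outage days remain inside the window, and only returns to its earlier value once the last outage day has dropped out.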


29) Message boards : Number crunching : Problems with Rosetta version 5.78 (Message 46100)
Posted 12 Sep 2007 by Rhiju
Post:
One more question -- did you happen to notice if the screen looked totally stuck before the crash?
(Probably too much to ask.)

Ok, don't want to beat a dead horse, but just noticed this also...

30) Message boards : Number crunching : Problems with Rosetta version 5.78 (Message 46098)
Posted 12 Sep 2007 by Rhiju
Post:
Thanks to everyone for posting. I think I know how to fix this (the watchdog problem)! I have removed these jobs from the queue for now, and when they are sent out again, we should see fewer premature exits...
31) Message boards : Number crunching : CAPRI14? (Message 46084)
Posted 12 Sep 2007 by Rhiju
Post:
This is puzzling -- those jobs aren't taking long. We'll look into it. In the meantime, can you or others post links to the appropriate results that were killed?
32) Message boards : Number crunching : Rosetta Application Version Release Log (Message 45713)
Posted 2 Sep 2007 by Rhiju
Post:
This version adjusts the way that experimental data is used to guide some RNA folding runs; there's also a new feature that lets us choose building blocks for RNA folding from different existing source structures.

Also, this version should improve folding efficiency in runs that involve a special type of symmetry ('D2').
33) Message boards : Number crunching : Problems with Rosetta version 5.78 (Message 45712)
Posted 2 Sep 2007 by Rhiju
Post:
Not too much is different in this app from the previous version. Thanks for continuing to post problems!
34) Message boards : Number crunching : Problems with Rosetta versions 5.72 and 5.73 (Message 44917)
Posted 12 Aug 2007 by Rhiju
Post:
Any further problems with those workunits? From our end, the overall error rate seems pretty low for those workunits.

Oh joy, I got about 5 of those same style of WUs on my system, but it's a day or so out to get to them. Wonder if they are going to hang on my system or not. Will let you know.

Found this wu stalled, on one of my dual core rigs, required a system reboot to get it going again. Good thing I was home, (Saturday) or it would have been stuck all day. If this would have happened on the same remote machine, it would have been pulled today.

http://boinc.bakerlab.org/rosetta/result.php?resultid=98752680

Another beta 5.73 wu

At least this was on a rig in my house, but still 2 problems with 5.73 BETA within a couple of days now.

These rigs have been problem free since that last wu that had 0 cpu time on July 26th, until 5.73. Guess it's time for BETA 5.74 huh

beta = bahhh
I think that's going in my sig.


35) Message boards : Number crunching : Problems with Rosetta versions 5.72 and 5.73 (Message 44916)
Posted 12 Aug 2007 by Rhiju
Post:
Hi -- this "warning" is OK. We'll try to remove it from later applications.

5.73
08/08/2007 16:38:52|rosetta@home|Computation for task 1c26__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1c26_-foldanddock__1878_6494_0 finished

In BOINC LOGX I see the message --
Warning! Not sure non-ideal rotamers are compatible with symmetry yet. etc. etc. (as in message 44768).

This does not appear in Rosie's message logs.
Can I get an explanation?

36) Message boards : Number crunching : Rosetta Application Version Release Log (Message 44914)
Posted 12 Aug 2007 by Rhiju
Post:
Rosetta@home has been updated to 5.76; as with the previous update, some older workunits will continue to crunch with 5.68.

This version fixes another bug in high resolution modeling of RNA; due to a rather arcane issue, certain RNA hydrogen bonds were not being scored properly.

We will also be testing a new procedure for closing chainbreaks that should lead to faster recognition that a protein structure is completely wacky, and allow your computer to go on to simulating a new conformation quickly. More efficient, hopefully!
37) Message boards : Number crunching : Problems with Rosetta version 5.76 (Message 44913)
Posted 12 Aug 2007 by Rhiju
Post:
Hi, this application is pretty similar to the previous one (5.73) -- thanks for continuing to post and reply to issues.
38) Message boards : Number crunching : Problems with Rosetta versions 5.72 and 5.73 (Message 44685)
Posted 4 Aug 2007 by Rhiju
Post:
We figured out this issue -- when this workunit gets resent, it should work fine.

I contacted the person in charge of this workunit, and we won't be sending out any more until it gets fixed...

This doesn't sound right -- we're looking into this job (1850) now.


This one never got started..
On either rig..

1d3z_non_ideal_BOINC_MFR_ABRELAX_PICKED_1850_5161_0

http://boinc.bakerlab.org/rosetta/result.php?resultid=94897692



39) Message boards : Number crunching : UPX (Message 44684)
Posted 4 Aug 2007 by Rhiju
Post:
Hi -- this looks great, we'll use this compression for the next update (sorry I missed it for these last couple). It will be nice to compress the Mac apps now!
40) Message boards : Number crunching : Problems with Rosetta versions 5.72 and 5.73 (Message 44576)
Posted 1 Aug 2007 by Rhiju
Post:
I contacted the person in charge of this workunit, and we won't be sending out any more until it gets fixed...

This doesn't sound right -- we're looking into this job (1850) now.


This one never got started..
On either rig..

1d3z_non_ideal_BOINC_MFR_ABRELAX_PICKED_1850_5161_0

http://boinc.bakerlab.org/rosetta/result.php?resultid=94897692







©2024 University of Washington
https://www.bakerlab.org