Posts by Hubington

1) Message boards : Number crunching : Add support for DB funcs over tasks (Message 56903)
Posted 13 Nov 2008 by Hubington
Post:
This was much my thinking: this way they can get info on everyone who is showing long-running units. They don't need to amend the code for the client application, though. Based on what I can see when I examine the task information for my own tasks, there is enough information being captured in the database to create what's called a stored procedure that would generate a list of long-running units that could be collated as required. If it took someone more than an hour to do this I'd be shocked, although a good DBA with existing knowledge of the tables should be able to put it together in under 5 mins.

If you could find out whether this has been done it would be most useful, as I'm getting a lot of long runners and there is little point in me reporting them and clogging the forum if they are already known about. If it hasn't, though, then I'd strongly recommend that this be looked into quickly.
2) Message boards : Number crunching : Add support for DB funcs over tasks (Message 56876)
Posted 12 Nov 2008 by Hubington
Post:
The point you haven't touched upon yet is what you would like to do with the information.


Just to back up my previous statements on this one with some facts: in one of your posts (assuming this account is used by one person and not a general administrative account shared by a group) you open by saying you are starting a thread for people to report long-running WU. See http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4375&nowrap=true#49379

The system I'm suggesting would not be so much for me. Although I'd find it interesting to be able to look up that data in one easy place in the event that the credit level I produced was low, the reality is I don't really care that much. By virtue of your starting a thread for people to report these things in, though, I'd have to assume your interest level was higher.

Alternatively, if my assumption is in error, no-one in the project has any interest in being notified of these long-running units, and you're just trying to group together a number of irrelevant posts so they don't clog the forum, then I'd suggest it would be helpful to point this out (perhaps even amend the forum rules if it's happening a lot). I've been going out of my way to provide this information to you not because I have a major problem with it, but because I thought it was something you wanted. If this isn't the case then I've been wasting my time, and it wouldn't surprise me to find I was the only one.

That said, based on the reply from Mike Tyka to feedback supplied by users about long-running units in his large homology model, it would seem that this information was useful, as it highlighted an issue that had not shown up in in-house testing.

This system, if implemented against the in-house testing system and RALPH, may even help identify issues which at present may be being missed. After all, humans are fallible and an extra check could help.

The only downside I can see is that the analysis of such a large amount of data would have a noticeable impact on the database. Without knowing more about the servers' current hardware and load, as well as the number of records being handled, it's hard to guess how long this would take, but I'd guess it would have a run time of approximately 5 mins, or 15 at the most, which is why I suggested it only be run once a day.

If the results are then stored and distributed internally, though, once a day should be ideal. If the performance hit of doing it all in one go is too much, then you could set it up to only process one model at a time and stagger them across the day, or not run it for models where they have no interest in the results.
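As a rough illustration of the "one model at a time, staggered across the day" idea, here is a minimal Python sketch that spreads the per-model report runs evenly over 24 hours. The model names are placeholders and the function name `stagger_offsets` is made up for this example; how the project would actually schedule jobs is not known here.

```python
from datetime import timedelta


def stagger_offsets(models, period=timedelta(days=1)):
    """Return a start offset for each model so runs are spread evenly over the period."""
    if not models:
        return {}
    step = period / len(models)  # gap between consecutive report runs
    return {name: step * i for i, name in enumerate(models)}


# Hypothetical model names, purely for illustration.
schedule = stagger_offsets(["model_a", "model_b", "model_c", "model_d"])
for name, offset in schedule.items():
    print(f"{name}: start {offset} after midnight")
```

Skipping a model whose lead has no interest in the results is then just a matter of leaving it out of the list.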
3) Message boards : Number crunching : Add support for DB funcs over tasks (Message 56841)
Posted 11 Nov 2008 by Hubington
Post:
To deal with one of your earlier points first: the time recorded isn't the time each WU started and the time it ended, but the CPU time used in seconds. 1 CPU second is in fact 1 second of complete usage of a single core. So, taking a system with a single CPU as an example, if you had a WU running but only using 50% of the CPU for 2 seconds then that would equate to 1 CPU second. It's much the same principle as kilowatt hours on an electric meter, except every CPU has its own workload associated with 1 CPU second depending on clock speed.
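The CPU-second arithmetic above can be written out as a tiny Python sketch. The function name `cpu_seconds` and its parameters are just for illustration; this is not how BOINC itself measures the figure, only the relationship described in the post.

```python
def cpu_seconds(wall_seconds, utilisation, cores=1):
    """CPU time = wall-clock time x fraction of a core in use x number of cores."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return wall_seconds * utilisation * cores


# The example from the post: 50% of a single CPU for 2 seconds.
print(cpu_seconds(2, 0.5))  # 1.0
```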

To be honest I'm not 100% sure what you would do with it. It's the people running the project that seem to have requested we report this information to them, so you might want to ask them what their interest is.

If I were to hazard a guess as to why they are interested in it, I'd assume long-running WU are symptomatic of a piece of botched or inefficient code or logical process. If that's the case then investigating examples of where this has happened is the only way they will be able to resolve these issues, and to investigate you first need to identify. Now, if you want to identify this you are reliant on 1 of 2 processes. Have people (be that end users or project staff) look over it and miss out on 97% of occurrences, or implement the idea I've cited above and, depending on what level of variance you allow for, probably catch about 75% of long-running WU and 100% of the extreme cases, which are going to be the ones they are really interested in.

Not wanting to assume you need this all spoon-fed, but if you were to generate an overnight report for each model then that report could be passed to the lead for that model and they can do whatever it is they do that keeps my CPU on the redline day and night.

As for users changing their WU run time preferences: it's my experience that people set that sort of thing how they want it and then leave it there. In the unlikely event that a lot of false positives get thrown up owing to a user who alters their preferences on a daily basis, then the SQL can be amended to exclude these users. In much the same way as the code we run for Rosetta has evolved, every other system in the world evolves to account for situations that had not been seen at the time of its conception.

What I can assure you of though is that with no system you will be missing out on a great deal when compared to even a half baked system.
4) Message boards : Number crunching : Add support for DB funcs over tasks (Message 56836)
Posted 11 Nov 2008 by Hubington
Post:
We are kinda off-topic here though, so if you'd like to continue this conversation, let's find another thread to do so.


Granted the tie-in was somewhat tenuous, my apologies.

I'd like to expand on this, and after a quick poke about I don't see any existing thread that relates. I don't suppose you could be so kind as to break off my previous response and your reply to a separate topic?

many thanks
5) Message boards : Number crunching : Add support for DB funcs over tasks (Message 56825)
Posted 11 Nov 2008 by Hubington
Post:
I was just posting information on long-running WU (work units) for a different model within the Rosetta project when I had a thought. It gets a little techy though, so if you don't know databases prepare to get a little lost.

I was thinking about if I could code anything up to scan my log and ID the WU that run long, and then it struck me, why isn't the project doing that?

You have all the information stored in a MSSQL database. You have who ran what, how long it took, and how long their preferences state it should take. With access to that database I could code a stored procedure that either took the preferences for users, put in a 20% variance to exclude minor anomalies and compared it to their results, or even looked at the average run time of WU for a user in the last month to see which results fall too far outside of it. And then there is the whole inbuilt reporting module that comes with MSSQL now (not wanting to sound like an MS marketing pitch, but it is what you're running after all).
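To make the first variant concrete (preferred run time plus a 20% variance), here is a minimal Python sketch using an in-memory SQLite database as a stand-in for the project's real database. The table and column names (`results`, `user_prefs`, `cpu_seconds`, `target_seconds`) and the sample task names are invented for illustration; the actual schema is not known here, and a real version would more likely live in a stored procedure on the server.

```python
import sqlite3

TOLERANCE = 0.20  # the 20% variance suggested in the post


def find_long_runners(conn, tolerance=TOLERANCE):
    """Return tasks whose CPU time exceeds the user's preferred run time by more than the tolerance."""
    query = """
        SELECT r.task_name, r.cpu_seconds, u.target_seconds
        FROM results AS r
        JOIN user_prefs AS u ON u.user_id = r.user_id
        WHERE r.cpu_seconds > u.target_seconds * (1.0 + ?)
        ORDER BY r.cpu_seconds / u.target_seconds DESC
    """
    return conn.execute(query, (tolerance,)).fetchall()


def demo_connection():
    """Build a throwaway database with a hypothetical schema and a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_prefs (user_id INTEGER PRIMARY KEY, target_seconds REAL);
        CREATE TABLE results (task_name TEXT, user_id INTEGER, cpu_seconds REAL);
    """)
    conn.execute("INSERT INTO user_prefs VALUES (1, 10800)")  # 3 hour preference
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
        ("task_a", 1, 9912.30),    # within the preference
        ("task_b", 1, 33868.45),   # roughly three times over
        ("task_c", 1, 12000.00),   # over, but inside the 20% tolerance
    ])
    return conn


for row in find_long_runners(demo_connection()):
    print(row)
```

With a 3 hour preference the cut-off is 12,960 CPU seconds, so only `task_b` is reported. The second variant (comparing against the user's monthly average) would swap `target_seconds` for an average computed over that user's recent results.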

Given most users will set it up and forget about it unless the whole thing bursts into flames, setting this up and running it once a day to gather the mistakes of the last 24 hours would make a lot of sense to me. Certainly a lot more than relying on users to catch it, when most of the time the stuff is going to be reported before they even have a chance to look at it. So I can't see why you aren't doing it. Or if you are, then why not announce it in the thread so users don't waste time digging out info you already have?

p.s. I'm not asking for access to the DB to do it. My thinking is, if I can do it you must have someone in house who if handed this could pick it up and run with it.
6) Message boards : Number crunching : Longer tasks providing poor granted credit? (Message 56824)
Posted 11 Nov 2008 by Hubington
Post:
I reported something similar not so long ago and was told that this was due to the research nature of the project. Although testing is conducted in the RALPH project to catch issues such as rogue code, it would seem the remit of the Rosetta project also covers behaviour analysis. As such, new research methods are tested within the Rosetta work units to try and come up with better and faster ways of doing things. While some of them work, others don't.

This may be because the new code doesn't play well with your hardware and was never exposed to something like it during the RALPH testing, or it may be they just went down a wrong road. Either way, by aborting it and not letting it have a chance you're making it harder to conduct the research, as they won't know if it worked or not, and for some of the work units it will work and something will be learned from it. For the ones that go long something can still be learned, but not if you go round aborting them.

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=4388&nowrap=true#56204

Go there if you want to see exactly what was said to me.
7) Message boards : Number crunching : Report long-running models here (Message 56823)
Posted 11 Nov 2008 by Hubington
Post:
11/11/2008 06:11:44|rosetta@home|Computation for task 1hzh_2fi9_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_20_0 finished

Run time is supposed to be 3 hours or thereabouts; it took 8 hours 11 mins.

Given the similarity in name to the above thread, I assume it too went to a snail's pace in the last few percent.
8) Message boards : Number crunching : Report long-running models here (Message 56195)
Posted 3 Oct 2008 by Hubington
Post:
You think you have it bad? What a waste of time and energy this one was:

194952967 | 26 Sep 2008 18:51:19 UTC | Over | Client error | Compute error | 61,849.00 | 562.03 | ---


It's all these hombench tasks. It's like I said in the thread where they announced it (http://boinc.bakerlab.org/forum_thread.php?id=4388): this sort of thing should really be part of RALPH, not Rosetta.

As I understand it, Rosetta was set up so all us grunts can do the monkey work with the tested and proven applications, while RALPH was for R&D for Rosetta so they could test new ideas and get them working right before we all grind away at processing it all. If you go to the RALPH home page the first thing on the page says:

"RALPH@home is the official alpha test project for Rosetta@home. New application versions, work units, and updates in general will be tested here before being used for production. The goal for RALPH@home is to improve Rosetta@home."
9) Message boards : Rosetta@home Science : Large Homology Modeling Benchmark (Message 56188)
Posted 3 Oct 2008 by Hubington
Post:
Over the next 4 months we will be testing some brand new ROSETTA code developed to address the problem of homology modeling in a standardized fashion. We will be using BOINC to run the code with various parameters and sub-algorithms on a comprehensive set of problems to try and come up with a good approach that yields consistent results.


Just a thought, but isn't this what RALPH is for?

The idea behind RALPH also being that people who signed up for it were actively keeping an eye on these things so that they could report issues to you, while the Rosetta users were more your well-meaning but don't really want to get too involved sorts.

The other advantage being that the RALPH users sign up knowing they aren't doing actual work but simply testing different ways in which the work can be done to try and find a better way of doing it. As a result they accept that at times someone is going to get it a little wrong and a 3 hour packet will take 40+ hours. Your average Rosetta user, meanwhile, sees this and thinks the project is buggy and wasting the resources they are kindly donating, and potentially either drops the project or becomes disillusioned with the whole grid computing idea and knocks the whole thing on the head.

To quote R L Casey:
This is research, and I am reminded of the saying "If we (really) knew what we were doing, it wouldn't be research."!


I agree it is research; however, the research this project was set up for was protein folding, not discovering how to write the code to do simulated protein folding. RALPH was set up to research just that, with a much smaller community who are prepared to be more involved with their feedback.

Given that someone went to the trouble of actually separating the two elements to run them in partnership, I can't understand why it isn't being used in that way.
10) Message boards : Number crunching : Report long-running models here (Message 56187)
Posted 3 Oct 2008 by Hubington
Post:
Well, I imagine that the claim is based on cycles used, so I imagine that your processor puts out more power than mine, which is why you generate more credits per hour than I do. But then the granted credit is probably result based rather than based on effort put in. The theory being that X amount of effort usually yields Y amount of results. Which is why you get small variances between the claimed and granted, usually being granted less than claimed but seemingly not always. Also I suspect certain sub-projects of WU yield more or fewer results per hour than others.

In the case of these work units, though, the system is seemingly using a lot of cycles but producing little or no results for it, and so the claim going in is much higher than what's being granted.

Just an observation though
11) Message boards : Number crunching : Report "hombench_..." issues here! (Message 56186)
Posted 3 Oct 2008 by Hubington
Post:

currently been running for 36 hours & 5 mins! 99.540% complete


Finished at 39 hours 35 mins

credit claimed: --
credit granted: --
outcome: Validate error
12) Message boards : Number crunching : Report "hombench_..." issues here! (Message 56182)
Posted 3 Oct 2008 by Hubington
Post:
New one on the way

minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t286___4580_1561_0

currently been running for 36 hours & 5 mins! 99.540% complete

OK I just noticed something VERY worrying while trying to see how long it took to click over 0.001%, the run time jumped back 6 mins?!?!?! and now it lost 0.001% from the progress taking it back to 99.539

Running on AMD dual cores at 2.41 GHz (4800+ combined) if that makes a difference. 64-bit chip with a 32-bit OS.

When it is making progress it looks as though it's taking 5 mins to get 0.001%, but if the CPU run time is constantly jumping back as I observed it do, then who can say what the run time really is!
13) Message boards : Number crunching : Report long-running models here (Message 56181)
Posted 3 Oct 2008 by Hubington
Post:
In fine accordance with Murphy's law, it just finished.

Your machines are hidden so I can't look at your results. Curious to see how the claimed/granted credit you got from that wu compares to the same machines regular performance.



Yeah, I'm paranoid :)

Here is a smattering of the surrounding results for that WU though:
CPU Time |Claimed |Granted
9,912.30 |37.37 |33.71
9,435.89 |32.95 |34.51
33,868.45 |127.69 |22.29
10,209.95 |35.65 |33.69
10,327.84 |36.06 |34.75
5,979.70 |20.88 |23.02
9,766.56 |34.10 |25.06

(the formatting gets messed up so I've separated the columns with | marks)
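One way to make the odd row stand out is to compare granted credit against claimed credit for each result. The short Python sketch below does that for the figures in the table above; the 0.5 cut-off is an arbitrary assumption chosen for illustration, not anything the project uses.

```python
# (cpu_seconds, claimed, granted), copied from the table above
ROWS = [
    (9912.30, 37.37, 33.71),
    (9435.89, 32.95, 34.51),
    (33868.45, 127.69, 22.29),
    (10209.95, 35.65, 33.69),
    (10327.84, 36.06, 34.75),
    (5979.70, 20.88, 23.02),
    (9766.56, 34.10, 25.06),
]


def flag_outliers(rows, min_ratio=0.5):
    """Return rows where granted credit is less than min_ratio of claimed credit."""
    return [row for row in rows if row[2] / row[1] < min_ratio]


for cpu, claimed, granted in flag_outliers(ROWS):
    print(f"{cpu:,.2f} CPU s: granted {granted / claimed:.0%} of claimed")
```

Only the 33,868.45 second result falls below the cut-off; the others were all granted at least about 73% of what was claimed.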

New one on the way, incidentally

minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t286___4580_1561_0

currently been running for 36 hours & 5 mins! 99.540% complete

OK I just noticed something VERY worrying while trying to see how long it took to click over 0.001%, the run time jumped back 6 mins?!?!?! and now it lost 0.001% from the progress taking it back to 99.539
14) Message boards : Number crunching : Report "hombench_..." issues here! (Message 56115)
Posted 30 Sep 2008 by Hubington
Post:
Something seriously wrong there. This machine typically claims and gets 50 - 60 for a 3 hour wu, this wu was over 7 hours and claimed 148 and was granted 20!!!!!


I had a similar problem with minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t328___4598_724

It took 9 hours 25 mins to complete, showing 9 mins 52 seconds left for at least 3 hours of that. I'd quite like to get in the top 100,000 contributors so I decided to check the credit for this: claimed credit was 127, granted was 22, when I normally get around 35-40 for a 3 hour packet.

At the end of the day I don't really care about the credits, it's just a nice little motivating factor, but I'm sure there are people who do and will start to abort these units so as not to lose credits, especially if their system doesn't kick out that much each day to start with.
15) Message boards : Number crunching : Report long-running models here (Message 56104)
Posted 30 Sep 2008 by Hubington
Post:
In fine accordance with Murphy's law, it just finished.

If someone could find out why the run time was over 3 times the norm though it could be useful as others may kill off the work units thinking they had died.
16) Message boards : Number crunching : Report long-running models here (Message 56103)
Posted 30 Sep 2008 by Hubington
Post:
minirosetta 1.34: hombench_mtyka_foldcst_boinc_test3_foldcst_simple_t328___4598_724

Usually it takes between 2-3 hours for a work unit to complete for me; however, the above listed unit is currently on 9 hours at 98.2% complete with 9 mins 50 secs remaining.

I first noticed the run time of it earlier today when it was at about 6.5 hours at 97.3% with 9 mins 50 secs remaining.

Now I know the remaining times are estimates, but there are estimates, there's what Windows estimates when you go to copy a file, and then there is this. Basically I'm worried that the work unit is just wasting cycles and wondered if anyone has any thoughts on it. Based on a 60 second sampling I just took, it is notching up 0.001% of progress every 20 seconds, so in theory it should complete in about 9-10 hours' time. That is assuming that it just contains a lot more work than normal rather than just spinning its wheels.
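The 9-10 hour figure follows from the sampled rate: 1.8% still to go, at 0.001% per 20 seconds. A small Python sketch of that arithmetic, assuming the rate stays constant (which is exactly what is in doubt); the function name is only for this example.

```python
def remaining_hours(percent_done, step_percent, seconds_per_step):
    """Estimate time left, assuming progress keeps advancing at the sampled rate."""
    steps_left = (100.0 - percent_done) / step_percent
    return steps_left * seconds_per_step / 3600.0


# 98.2% complete, advancing 0.001% every 20 seconds
print(f"{remaining_hours(98.2, 0.001, 20):.1f} hours")  # 10.0 hours
```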

any comments welcomed
17) Questions and Answers : Windows : stats are off (Message 54899)
Posted 3 Aug 2008 by Hubington
Post:
This may be a BOINC issue rather than a Rosetta one, but when I look at my Rosetta stats in BOINC the graphs have extra data for each day from the 13th of Aug (10 days in the future) out to the 17th of Aug (2 weeks in the future).

Not overly bothered by this, as I'd describe it as more of a nice-to-have than something I need, but it could be symptomatic of something that is causing a real problem elsewhere.

Before anyone asks, I'm only running the one project, so it isn't happening with other projects on account of there not being any.
18) Questions and Answers : Windows : minirosetta_1.24_windows_intelx86.exe trying to access the internet (Message 53386)
Posted 27 May 2008 by Hubington
Post:
So it may prompt for additional access in the future, but only in the case of errors, not as a standard course of practice?
19) Questions and Answers : Windows : minirosetta_1.24_windows_intelx86.exe trying to access the internet (Message 53316)
Posted 24 May 2008 by Hubington
Post:
I got onto my first mini rosetta 1.24 work unit today and it has started to try and access the internet which got caught by the firewalls.

All communication has until this point been handled by BOINC, so I don't know why it suddenly wants to branch out on its own.

I really don't want to go adding more exceptions to the firewalls as if nothing else it's a right pain in the arse.
20) Message boards : Cafe Rosetta : beta? (Message 53027)
Posted 12 May 2008 by Hubington
Post:
I've noticed it says rosetta beta on some of the work units, I thought they had ralph for testing new methodologies? and all the stuff on rosetta was final release, although granted the final release may sometimes have mistakes in it.




