Problems with rosetta 5.48

Message boards : Number crunching : Problems with rosetta 5.48

MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 37478 - Posted: 5 Mar 2007, 17:12:17 UTC

Some people may have seen it on the front page, but everyone has access to their own BOINC manager.

When my older laptop wasn't running any Rosetta unit I checked the messages tab, and it said "Hey, dude, your computer only had 256MB and Rosetta needs 512MB now" and I said to myself "Okey dokey!"
ID: 37478 · Rating: 0
Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 37479 - Posted: 5 Mar 2007, 17:13:39 UTC - in response to Message 37453.  

For example, all the "HINGE" WUs have been tested on Ralph with the same high memory requirement and the error rate was normal.

...

I think Ralph is overly represented by windows platforms


For what it is worth, I do have my desktop Linux system set up to participate on Ralph, and while it did get 3 workunits for the new 5.48 client, they were all of the normal (small) size. I followed the instructions on the Ralph website and gave Ralph a much lower priority than Rosetta, but I think I had better change that to give Linux more exposure in testing on Ralph.

I agree with the comments made about the lack of communication. One of the reasons I decided to participate in Rosetta was the fact that I repeatedly read about how great the communication between Project Scientists, Software Developers and the User Community was. I don't know why and how this has changed (I'm still fairly new to Rosetta), but I see a Technical News section and an Active WorkUnit(s) Log thread that aren't being updated and could (should!) have been used to give advance notice.

Thank you for taking the time to respond to posts on the message board. Every bit of communication is helpful and very much appreciated.
Team Helix
ID: 37479 · Rating: 0
MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 37480 - Posted: 5 Mar 2007, 17:14:58 UTC

I appear to be in the minority in thinking that a message in my BOINC Manager telling me Rosetta is running units with more RAM than I have counts as communication -_-
ID: 37480 · Rating: 0
Rene
Joined: 2 Dec 05
Posts: 10
Credit: 67,269
RAC: 0
Message 37486 - Posted: 5 Mar 2007, 18:30:06 UTC - in response to Message 37478.  

Some people may have seen it on the front page, but everyone has access to their own BOINC manager.

When my older laptop wasn't running any Rosetta unit I checked the messages tab, and it said "Hey, dude, your computer only had 256MB and Rosetta needs 512MB now" and I said to myself "Okey dokey!"


Sorry Matt... I don't want to piss on your parade... but if you check these results you can see that others ran fine on my Ubuntu host.

Same BOINC manager... same 5.48 client... and no messages appeared in the manager. Nope, the manager, crunching Rosetta, brought the X-server to a complete halt and the system wasn't responding anymore.

I wasn't waiting for memory... it just couldn't make the next step.

And maybe some more memory would have helped... but it was never needed before.

;-)

ID: 37486 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37487 - Posted: 5 Mar 2007, 18:46:59 UTC - in response to Message 37480.  

I appear to be in the minority in thinking that a message in my BOINC Manager telling me Rosetta is running units with more RAM than I have counts as communication -_-


hi Matt,

of course that counts as communication.

Two things are going on in response to this new communication.

Firstly, some people are reacting to the newness of the message, as they recognise that they have never seen it before. They wonder: did I do something, did the project team do something, or is it a new feature? In fact, having asked, they discover it is two of those things combined. All of that counts as communication too: clarification in response to the original communication.

And, I think it is totally fair comment that communication would have been even better had there been a short note on the forum, or even on the front page, to the effect that there are some big jobs on the way and that the project team are going to take advantage of the new facilities in BOINC to screen the big jobs away from smaller hosts.

Communication in advance is *even* *more* *effective* than communication that starts at the point of change.

Secondly, some people (including myself) think we are spotting flaws in this new automated screening process. We are pointing out those (alleged) flaws. Whether we turn out to be right, or whether the project or BOINC folk come back and explain why we are mistaken, all of that is communication too.

But I would also add that, while I think more communication up front would have helped here, this is in the context of a project that consistently has the best communication of any BOINC project I've been part of, and far better than the non-BOINC DC projects I have been part of.

So when I say that, on this occasion, communication in advance would have helped, I do not mean it as a complaint. I am offering a pointer to how (IMO) things could have been even more betterer than other projects.

R~~
ID: 37487 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 37489 - Posted: 5 Mar 2007, 18:53:07 UTC

My user average has lost a serious amount of points since HINGE started coming through. I keep getting credit and my user total keeps going up, but my user average keeps diving to the bottom of the graph every time a WU completes and credit is granted.

Anyone got any ideas?
ID: 37489 · Rating: 0
Rene
Joined: 2 Dec 05
Posts: 10
Credit: 67,269
RAC: 0
Message 37490 - Posted: 5 Mar 2007, 18:53:22 UTC - in response to Message 37450.  

I think it is the problem of lacking enough memory as yours has only 256MB.


Thanks for responding, but there seems to be a problem related to the Linux-host.
Yesterday an update was performed and problems seem to be post-update.
Today Ubuntu froze again... Seti was running... after completing the wu and giving it the "ready to report" status... a new wu should have started, but instead of that... well... nothing... a complete "freeze" of the system.

;-)

ID: 37490 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37493 - Posted: 5 Mar 2007, 19:03:02 UTC - in response to Message 37489.  
Last modified: 5 Mar 2007, 19:06:19 UTC

My user average has lost a serious amount of points since HINGE started coming through. I keep getting credit and my user total keeps going up, but my user average keeps diving to the bottom of the graph every time a WU completes and credit is granted.

Anyone got any ideas?


Hi Greg,

Are you on BOINC v 5.8.x, or still on v 5.4.x?

How much memory do you have on the relevant box?

The credits given are an average for other people who have run similar work, i.e. HINGE credits should be adjusted in line with the average cruncher's progress on those tasks.

If (and I am taking a wild guess here, so forgive me if I am wrong on either/both counts) you have less than 400 MB AND you are on the older BOINC client, then you may be running work that is too large for your system to run effectively. The result would be heavy swap file usage, and your system running slower than the typical systems that were used to set the level of credit.

This would not happen on the newer BOINC client, as if you had too little memory you would not have been given the work in the first place.
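
To put rough numbers on that reasoning, here is a minimal sketch. It assumes that a fixed amount of work earns a fixed credit set by the average of other hosts; the function name and all figures are hypothetical, chosen only for illustration.

```python
def credit_per_hour(credit_per_task, hours_per_task):
    """Credit rate for a host, given a fixed credit per task."""
    return credit_per_task / hours_per_task


# Hypothetical figures, not real Rosetta@home numbers.
CREDIT_PER_TASK = 60.0  # assumed to be set by the average of other hosts

typical = credit_per_hour(CREDIT_PER_TASK, hours_per_task=3.0)
swapping = credit_per_hour(CREDIT_PER_TASK, hours_per_task=5.0)  # slowed by swap

print(f"typical host:  {typical:.1f} credits/hour")
print(f"swapping host: {swapping:.1f} credits/hour")
```

A host slowed by swapping takes longer to earn the same credit, so its hourly rate, and with it the recent average, can fall even while total credit keeps rising.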

EDIT - add: On the other hand, with the CPU you have, I bet you have plenty of memory, so I am probably way off target in your case. But I will leave my comments on the board anyway, in case they apply to anyone else.

Don't know if that helps.

R~~
ID: 37493 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 37497 - Posted: 5 Mar 2007, 21:10:12 UTC - in response to Message 37493.  


Hi, thanks for the explanation.
I've got one stick of 512 MB DDR RAM in my box, and I had an earlier BOINC 5.8.x version before HINGE. It kept telling me that I didn't have enough memory, so I updated to the latest BOINC 5.8.x and changed my memory usage settings. I think my machine is a bit slower at computing HINGE than other machines, so if I understand your message correctly, that would be the cause of my average dropping.

ID: 37497 · Rating: 0
Ingemar
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 37502 - Posted: 5 Mar 2007, 23:17:01 UTC

Hi all, the HINGE jobs you run are for the current round of CAPRI (Critical Assessment of PRediction of Interactions), in which the predictors are given the structure of two binding partners and are challenged to predict the structure of the protein complex within 2-4 weeks. This round involves a prediction of the structure of a homodimer. We do not have the structure of the protein monomer in this case, but we know that it has high sequence similarity with a couple of other proteins which can be used as templates. The protein has two domains connected by a loop which we think acts as a hinge. So it is a pretty challenging problem, where we have to simulate a dimer of 750 amino acids together with conformational flexibility in the hinge region. So we need a lot of computer power and memory in order to simulate this system.

Next time we will give a heads-up when we run something like this...

Thanks for your support!
ID: 37502 · Rating: 2
hedera
Joined: 15 Jul 06
Posts: 76
Credit: 5,263,150
RAC: 87
Message 37507 - Posted: 6 Mar 2007, 5:19:51 UTC

WAH! I have 2 WUs running at once! Of course, only one of them is a HINGE; the other is a new one:

03/05/2007 8:06:30 PM|rosetta@home|Starting 1xpv_1_NMRREF_1_1xpv_1_idid_model_01IGNORE_THE_REST_idl_1597_2544_0

So maybe there's hope.
--hedera

Never be afraid to try something new. Remember that amateurs built the ark. Professionals built the Titanic.

ID: 37507 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37508 - Posted: 6 Mar 2007, 5:28:28 UTC - in response to Message 37507.  

WAH! I have 2 WUs running at once! Of course, only one of them is a HINGE; the other is a new one:

03/05/2007 8:06:30 PM|rosetta@home|Starting 1xpv_1_NMRREF_1_1xpv_1_idid_model_01IGNORE_THE_REST_idl_1597_2544_0

So maybe there's hope.


Of course, if you don't want to run HINGE you can always reduce your box's memory ;-)
ID: 37508 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 37512 - Posted: 6 Mar 2007, 8:35:39 UTC

I had the same thing last night!
I was doing something else, so BOINC stalled HINGE at :43 and went on to another project of lower memory and then resumed HINGE after that WU completed.

However, I have not got any more HINGE WUs in queue; instead it's back to ABRELAX and NMRREF. Why is this? Is my system with 512MB RAM not powerful enough to run HINGE, or is that just luck of the draw for WUs?
ID: 37512 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37521 - Posted: 6 Mar 2007, 11:13:19 UTC

Don't know if this problem is specific to this Rosetta version.

https://boinc.bakerlab.org/rosetta/result.php?resultid=65982275
https://boinc.bakerlab.org/rosetta/result.php?resultid=65982274
https://boinc.bakerlab.org/rosetta/result.php?resultid=65921927
https://boinc.bakerlab.org/rosetta/result.php?resultid=65759711

These four tasks all failed to get started during a 14min period when the CPU was in use intensively on mathematical calculations. According to top, Rosetta was getting about 5% cpu time but was also being squashed out of memory.

OK so this was not a normal occurrence, but even so Rosetta should wait politely for the higher priority cpu usage to go away and then pick up again nicely.

On the other hand, probably not a critical issue to put a lot of work into resolving.

R~~
ID: 37521 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 37527 - Posted: 6 Mar 2007, 12:43:45 UTC - in response to Message 37512.  

... I have not got any more HINGE WUs in queue; instead it's back to ABRELAX and NMRREF. Why is this? Is my system with 512MB RAM not powerful enough to run HINGE, or is that just luck of the draw for WUs?


It is the luck of the draw, with four exceptions, if I understand how it is designed to work correctly:

- HINGE will not be issued to a machine with < 477 or whatever

- once a specific HINGE has been held back from a machine, that HINGE remains at the top of the pile until it is placed with a suitable machine

- a small machine that is being refused work will be offered each of the priority HINGEs in turn, then, if there were fewer than 50 of them, gets (50-N) goes at picking jobs at random. If it gets jobs that fit, fine; if it does not get any in the rest of its 50 prize draws, then it gets the 'There was work but' message and is told to try again in about 5 mins.

- when there are 50 HINGE jobs that have all been prioritised, then the scheduler stops issuing smaller jobs (which is when everyone with small boxes sees the 'There was work but' message for several tries in a row, until some bigger boxes come along and remove the priority pile).

'50' is a project settable limit, so may be some other number.

As you will see, none of those exceptions prevent a small job going to a big machine, even when there is big work waiting.
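
The rules above can be sketched roughly as follows. This is a toy model of the description in this post, not the actual BOINC scheduler code; `Job`, `pick_work`, and `max_tries` are hypothetical names, and 50 is simply the figure quoted above.

```python
import random
from dataclasses import dataclass


@dataclass(eq=False)  # compare jobs by identity so list.remove() is exact
class Job:
    name: str
    mem_mb: int              # memory requirement of the work unit
    held_back: bool = False  # True once some host has been refused this job


def pick_work(host_mem_mb, queue, max_tries=50, rng=random):
    """Return one job that fits the host, or None ('There was work but...')."""
    priority = [j for j in queue if j.held_back]
    others = [j for j in queue if not j.held_back]
    tries = 0

    # Held-back jobs sit at the top of the pile and are offered first.
    for job in priority:
        if tries >= max_tries:
            return None
        tries += 1
        if job.mem_mb <= host_mem_mb:
            queue.remove(job)
            return job

    # Whatever tries remain are random draws from the rest of the queue.
    while tries < max_tries and others:
        tries += 1
        job = rng.choice(others)
        if job.mem_mb <= host_mem_mb:
            queue.remove(job)
            return job
        job.held_back = True  # refused once, so it joins the priority pile
        others.remove(job)

    return None


# 50 prioritised HINGE jobs block a small host, even with small work queued.
blocked = [Job(f"HINGE_{i}", 477, held_back=True) for i in range(50)]
blocked += [Job(f"small_{i}", 100) for i in range(10)]
small_host_result = pick_work(256, blocked)
big_host_result = pick_work(512, blocked)

# With only 3 prioritised HINGE jobs, the small host still has draws left.
mixed = [Job(f"HINGE_{i}", 477, held_back=True) for i in range(3)]
mixed += [Job(f"small_{i}", 100) for i in range(10)]
small_host_mixed = pick_work(256, mixed)
```

In this model the small host gets nothing from the first queue and a small job from the second, while the big host takes a HINGE off the priority pile, which is the behaviour the bullets describe.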

John Keck has already passed back to BOINC the suggestion that it would work better for this project if the work was released from different queues. In fairness this is BOINC's first attempt to address this issue and the best thing about all first attempts is that they show you how to do the second attempt better...

R~~
ID: 37527 · Rating: 0
Keck_Komputers
Joined: 17 Sep 05
Posts: 211
Credit: 4,246,150
RAC: 0
Message 37534 - Posted: 6 Mar 2007, 16:45:24 UTC - in response to Message 37527.  


'50' is a project settable limit, so may be some other number.

I don't think that is settable. I cannot find it listed in the description of config.xml. That is where I would expect it to be set if it was settable.
BOINC WIKI

BOINCing since 2002/12/8
ID: 37534 · Rating: 0
Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 37536 - Posted: 6 Mar 2007, 17:42:30 UTC - in response to Message 37534.  

I don't think that is settable. I cannot find it listed in the description of config.xml. That is where I would expect it to be set if it was settable.

Not sure, but I was under the impression that the scheduling server continued searching through the full Feeder shared-memory array if it didn't already find enough work...

Projects themselves choose how large the Feeder array should be; 100 result "slots" is the default, and at least some projects have increased it to 500. Not sure how large Rosetta@home's shared-memory array is...
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 37536 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 37541 - Posted: 6 Mar 2007, 18:33:41 UTC - in response to Message 37527.  


You refer to 477, but what is that?
Now, my CPU can handle the following according to the benchmark test:

||Benchmark results:
|| Number of CPUs: 1
|| 1713 floating point MIPS (Whetstone) per CPU
|| 3147 integer MIPS (Dhrystone) per CPU

How does this relate to the criteria needed to run HINGE?
ID: 37541 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 37546 - Posted: 6 Mar 2007, 19:27:39 UTC

The HINGE tasks have a higher memory requirement. The 477MB is the amount of memory BOINC must be allowed to use in order to process a HINGE task. The amount of memory BOINC is allowed to use is configurable in your General Preferences. Prior versions of BOINC did not enforce the memory settings, but version 5.8.8 does.

The BOINC benchmarks do not look at your machine's available memory; they strictly test the CPU power of the machine.

When BOINC contacts Rosetta to get more work, the machine's memory is compared with the memory requirement of the work units. If no tasks with a lower memory requirement are available, this sometimes results in the messages others have reported in this thread.
Rosetta Moderator: Mod.Sense
ID: 37546 · Rating: 0
Ingleside
Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 37547 - Posted: 6 Mar 2007, 19:41:49 UTC - in response to Message 37541.  
Last modified: 6 Mar 2007, 19:43:07 UTC

You refer to 477, but what is that?

Each individual wu has a memory requirement; for HINGE this memory requirement is apparently 477 MB.

If the wu's memory requirement > computer's memory * the higher % of "Use at most N% memory when computer is... in use" and "... is idle" (one of the general preferences), the computer won't download this wu at all.

Also, in BOINC client v5.8.xx, if a wu somehow starts to use more memory than the computer's memory * the higher % of "Use at most N% memory when computer is... in use" and "... is idle", the wu will be automatically aborted...

Therefore, it's better to set a wu's memory requirement a little higher than necessary to be on the safe side, instead of risking many wus being aborted because they use too much memory...

With default preferences, a computer with only 512 MB memory won't handle any wu with memory_requirement > 460 MB.
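
As a worked example of that rule, here is a small sketch. The function names are hypothetical, and the 50% / 90% default preferences are an assumption: 90% is simply what the 460 MB figure implies for a 512 MB machine (512 * 0.90 = 460.8).

```python
def max_usable_mb(ram_mb, pct_in_use, pct_idle):
    """Memory BOINC may use: RAM times the higher of the two preferences."""
    return ram_mb * max(pct_in_use, pct_idle) / 100.0


def will_download(wu_mem_mb, ram_mb, pct_in_use, pct_idle):
    """True if the work unit's memory requirement fits within the limit."""
    return wu_mem_mb <= max_usable_mb(ram_mb, pct_in_use, pct_idle)


HINGE_MB = 477  # requirement quoted in this thread

# Assumed defaults: 50% while in use, 90% while idle.
limit_512 = max_usable_mb(512, 50, 90)
hinge_on_512_default = will_download(HINGE_MB, 512, 50, 90)
hinge_on_512_raised = will_download(HINGE_MB, 512, 50, 94)   # preference raised
hinge_on_1024_default = will_download(HINGE_MB, 1024, 50, 90)

print(f"512 MB host may use {limit_512:.1f} MB")
print(hinge_on_512_default, hinge_on_512_raised, hinge_on_1024_default)
```

Under these assumptions a 512 MB host is refused HINGE work until one of the memory preferences is raised to about 94% or more, while a 1024 MB host qualifies with the defaults.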
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 37547 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org