|
1)
Message boards :
Number crunching :
Forget credit for a moment and look at this.
(Message 26801)
Posted 15 Sep 2006 by SuperG //1.303.02% Post: Now imagine another 2000 good machines running on this project. Always more than one perspective on this. |
|
2)
Message boards :
Number crunching :
out of work
(Message 26179)
Posted 6 Sep 2006 by SuperG //1.303.02% Post: Thanks Feet1st, tralala, doc, dcdc, and Ananas. Don't want to bore anyone, nor get too deeply into this, however.... ONLY the 4P/dual-core machines were affected. The 8P/duals, 8P/singles, 4P/singles, 2P/duals, 2P/singles were not. Nor did we make big changes to both WU settings and reconnect time, only the WU setting. I'm sure you see the problem... the General and Rosetta settings are universal, but the ridiculous work amount was only sent to the 4P/dual machines. Hence they were the only ones where we aborted un-started work, so they were the ones that got their quota cut, and so their problem. Once we left them alone for 12 hours, they got new work. Throughout the episode, all the other machines were kept busy 100% of the time. BTW - I do understand why folks would like for our computers to be visible, but given the testing environment, that really can't happen for NDA reasons. And it is exactly the specs and OS that can't be visible. |
|
3)
Message boards :
Number crunching :
out of work
(Message 26118)
Posted 5 Sep 2006 by SuperG //1.303.02% Post: Thanks to doc, dcdc, and Ananas. Your comments helped determine root causes. I'll convey what happened so others may benefit... 1) With 8core machines, set to 24hr work unit, and 2 days network connect, we wound up with 120 days (!!!) of work in the machine queue. This was true and consistent amongst all those 8core machines. Not realizing the consequences (newbies to Rosetta), we reset to lower cpu target time, and more frequent network connect. And then committed suicide by manually aborting the processes which had not yet started... 2) The result (predictable to those who knew) was the daily quota problem. Otherwise known as "pilot error." That would be my fault. 3) Had tried "Reset project" but that did nothing to change the daily quota numbers. Considered "Detaching" from project, then re-attaching later, and merging stats at another time. Finally decided to let things settle down overnight and see how things were in the AM. 4) All is back to normal now, machines being fed work, and results happily sending back to Rosetta servers. Again thanks folks, you were a big help. |
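For readers wondering how a queue can balloon after a settings change like the one described above, here is a toy model in Python. It is only a sketch under loud assumptions and is not BOINC's actual work-fetch logic: it assumes the client sizes its buffer as cores times the connect interval, that it fetched tasks using a stale per-task estimate (the 2.5 hours used here is purely hypothetical), and that each task then really runs for the new 24-hour target. The function name and parameters are invented for illustration, and the model is not meant to reproduce the 120-day figure reported in the post; it only shows the direction of the effect.

```python
def queued_wall_clock_days(cores, connect_days, estimated_task_hours, actual_task_hours):
    """Toy model: wall-clock days needed to drain a buffer sized with a stale estimate."""
    # ASSUMPTION: the client tries to hold cores * connect interval of CPU time.
    buffer_cpu_hours = cores * connect_days * 24
    # Tasks fetched while each one is believed to take estimated_task_hours.
    tasks_fetched = buffer_cpu_hours / estimated_task_hours
    # CPU-hours those tasks really need once each runs for actual_task_hours.
    actual_cpu_hours = tasks_fetched * actual_task_hours
    # Spread the work across all cores and convert hours to days.
    return actual_cpu_hours / cores / 24


# Hypothetical numbers: 8 cores, 2-day connect interval,
# stale 2.5 h estimate, new 24 h target run time.
print(queued_wall_clock_days(8, 2, 2.5, 24))
```

Under these assumptions the queue is longer than the intended buffer by the ratio of actual to estimated run time, which is one way to read the advice about not changing several preferences at once or in large steps.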
|
4)
Message boards :
Number crunching :
out of work
(Message 26062)
Posted 5 Sep 2006 by SuperG //1.303.02% Post: Getting a "no work sent" message from the Rosetta servers, due to "reached daily quota." This may be a simple problem to fix, but hasn't been easy so far (looked in FAQs, etc.). It only affects the faster (8 core) machines; all the others are getting and crunching jobs just fine. Any suggestions on how to get new jobs to these big idle machines?? |
|
5)
Message boards :
Number crunching :
out of work
(Message 25979)
Posted 4 Sep 2006 by SuperG //1.303.02% Post:
Thanks, Joel. Hope this gets more work done...the true objective. We only powered on current computers on Aug. 21. Gratifying two weeks. Will be good to see the actual Rosetta results with a testing node at 1/2 power. [128 cores vs. 32 now] And in a few months w/quad-cores. [256 cores=full node] We remain open to more suggestions to optimize for actual work results... |
|
6)
Message boards :
Number crunching :
out of work
(Message 25969)
Posted 3 Sep 2006 by SuperG //1.303.02% Post: SG Tralala just posted what I was going to. But I wanted to also point you to the caution mentioned in the QA item on the WU runtime pref. Basically, don't change BOTH your WU runtime preference and General preference for connect every ...days at the same time nor in large steps.
Tralala and Feet1st: Thank you, that is most helpful in getting a quick understanding of maximizing these computers for the purpose of the results to Rosetta. (ah, the beauty of community...) Your data suggests setting runtime to 1 day & reconnect to 2 days. If anyone has a better idea, I'd appreciate hearing it.
Our compute environment:
- Each computer = 2x/4x fast Opteron; dual-core; 1Gig memory/core
- 4-12 terabytes of disk per computer; RAID5
- multi-T3 network
- 100% dedicated to Rosetta, as each computer is brought through testing. |
|
7)
Message boards :
Number crunching :
out of work
(Message 25955)
Posted 3 Sep 2006 by SuperG //1.303.02% Post: You'll find this feature in 'Your account' > Rosetta@home preferences > Target CPU run time. Thanks for that. In trying to keep my computers from waiting for work, perhaps you can recommend a very short or very long interval to maximize work done? For context, the machines have lots of cpu power, terabytes of disk, T3+ networks, and are 100% testing/dedicated to Rosetta. They seem to return a result approx. every 2.5 hours and have 4-8 cores each. |
|
8)
Message boards :
Number crunching :
out of work
(Message 25928)
Posted 3 Sep 2006 by SuperG //1.303.02% Post: Now that CASP is over and the turnaround times are not so critical any longer, I suggest increasing the work buffer again in order to have work available for a longer time even if the make_work_process dies. Although it was requested to increase the deadline again to 2 weeks (or 10 days). Hey Tralala -- Noticed you once mentioned a reason you liked Rosetta was "User setable length of Work units!" Unfortunately, I'm not finding where that setting is. In general preferences, or in boinc manager, or? |
|
9)
Message boards :
Number crunching :
Suggestions for Future Rosetta Science Apps
(Message 25767)
Posted 31 Aug 2006 by SuperG //1.303.02% Post: Since larger computer systems (or complexes) require ongoing maintenance and reboots, I was trying to find the option on the boinc manager to toggle auto-start for the app. There is that option on the initial boinc manager install, but not later. Anyone know if it's there, and handy? Otherwise will try: Thanks FluffyChicken. Topics have "fuzziness" to them, and it is a question that folks with lots of system power might find of interest. Your answer certainly works for Windows. Not sure if it helps with Linux or Mac, but probably gets them started at least. Keep "clucking" away, & thanks again. |
|
10)
Message boards :
Number crunching :
Suggestions for Future Rosetta Science Apps
(Message 25759)
Posted 31 Aug 2006 by SuperG //1.303.02% Post: Since larger computer systems (or complexes) require ongoing maintenance and reboots, I was trying to find the option on the boinc manager to toggle auto-start for the app. There is that option on the initial boinc manager install, but not later. Anyone know if it's there, and handy? Otherwise will try:
- suspend boinc
- reinstall boinc
- accept autostart |
|
11)
Message boards :
Number crunching :
Bye bye Rosetta
(Message 25311)
Posted 28 Aug 2006 by SuperG //1.303.02% Post: Mod.DE Would contributors known as "BurnHard" and "John Gann" please contact me? Based on your postings, I suspect you will like an idea to put some of the childish things going on here to rest, and allow those who are serious and large contributors to remain for the right reasons. Our moderator has agreed to exchange emails amongst us, if both parties agree. He is at: 'rosettamod at gmail . com' |
|
12)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24790)
Posted 25 Aug 2006 by SuperG //1.303.02% Post: I'm new to this forum, it seems this is a major thorn for many people, and I know you are all gentle & patient folks [not], so I'll be naive and ask a silly question..... Unless there is a wide variance in the tasks downloaded, why can't the simplest possible scoring system be implemented? Something like: number of finished results (per) last seven days a) displays the real work for science being done; b) accounts for processor (& overclocked), OS, rosetta/boinc client; c) seven days is familiar; accounts for downtime, work-days. Now, throw stones, suggest better, or let's call Occam. [/quote] No no, you're wrong, this new credit-system is as fair as possible, please don't criticise the new system. We are all glad with it :) Sorry, sorry, I go straight to my room, without dinner :')[/quote] |
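The "number of finished results per last seven days" idea floated in the post above is simple enough to sketch in a few lines of Python. This is only an illustration of the proposal, not how Rosetta@home or BOINC actually computes credit; the function name and the list of finish timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def results_last_seven_days(finish_times, now):
    """Count results whose finish time falls within the seven days before `now`."""
    window_start = now - timedelta(days=7)
    # A result counts if it finished after the window opened and not in the future.
    return sum(1 for t in finish_times if window_start < t <= now)


# Hypothetical finish times for one machine.
finished = [
    datetime(2006, 8, 24, 12, 0),  # inside the window
    datetime(2006, 8, 19, 9, 30),  # inside the window
    datetime(2006, 8, 10, 18, 0),  # too old, not counted
]
print(results_last_seven_days(finished, datetime(2006, 8, 25)))
```

A count like this ignores how long each result ran, so it only compares machines fairly when, as the post says, there is no wide variance in the tasks downloaded.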
|
13)
Message boards :
Number crunching :
Discussion of the new credit system
(Message 24753)
Posted 24 Aug 2006 by SuperG //1.303.02% Post:
Humorous too. Looks like the message boards need a bit more of that. Question: as a relatively new contributor, I've not been able to see if the (new) RAC is calculated over a thirty day period, or some other timeframe. Anyone know this off the tip of their tongue? |
©2025 University of Washington
https://www.bakerlab.org