Posts by Webmaster Yoda

21) Message boards : Number crunching : No Work (Message 7572)
Posted 25 Dec 2005 by Profile Webmaster Yoda
Post:
Maybe I am not following what you suggest, but I thought I was - I am still getting the message
"|rosetta@home|Message from server: (there was work but you don't have enough disk space allocated)"
BUT I have my settings in general preferences as below.


Those settings seem to be generous enough. Have you checked how much hard disk space is actually free and how much is already used by BOINC (check it in Windows Explorer)?

I recall having this same problem some time ago when I had a few old (finished) "Climate Prediction" work units taking up so much space that BOINC had used more space than I had allowed (even though there was plenty of free space). Once I deleted those old work units, all was well again.
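You can check both of those figures without clicking through Windows Explorer: a short script can total up a folder and report the free space on its drive. This is a minimal sketch in Python, not a BOINC tool. The function names are made up for illustration, and you would pass in the path of your own BOINC data directory.

```python
import os
import shutil

MB = 1024 * 1024

def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.isfile(full_path) and not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total

def disk_report(boinc_dir: str) -> dict:
    """Space used under boinc_dir, plus free and total space on its drive, in MB."""
    usage = shutil.disk_usage(boinc_dir)
    return {
        "boinc_used_mb": dir_size_bytes(boinc_dir) / MB,
        "disk_free_mb": usage.free / MB,
        "disk_total_mb": usage.total / MB,
    }
```

For example, call disk_report(r"C:\Program Files\BOINC"). That path is an assumption, so check where your BOINC data actually lives. If boinc_used_mb is at or above the limit in your preferences, leftover old work units are a likely culprit.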
22) Message boards : Number crunching : Aborted WUs due to (Message 7511)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
Thank you for the info. I had looked through most of them before posting.


You're welcome.

It was not obvious from what you wrote that you had actually looked at those threads (you made no reference to them). I tried to help by guiding you there as the symptoms are the same. At no time did I say anything about your level of knowledge.

Yes, many crunchers have been hit by those same problems and things are gradually improving. After the initial rush where something like 90% of work units failed, I have only seen a few in the last 12 hours. I expect to see a few more of them before they eventually disappear.

Merry Christmas!
23) Message boards : Number crunching : Does overclocking adversley affect rosetta? (Message 7504)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
The 3700+ "San Diego" was the one I wound up with, and I have zero complaints.


I'll second that. They have 1MB L2 cache and can be overclocked to 2.5GHz or more without much effort. I have one running at 2.7GHz with the stock heatsink and fan - halfway between an FX-55 and an FX-57.

If the 3700+ is beyond your reach, look for a cheaper Socket 939 CPU. That way you still have an upgrade path to an Athlon X2 later.

If you want something cheaper than that, maybe look at the Socket 754 Sempron 2800+ or 3100+.
24) Message boards : Number crunching : Aborted WUs due to (Message 7499)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
Since December 21 I have had 42 WUs aborted (within 18 - 24 seconds after starting crunching)


See the threads "Please abort WUs with" and "Computation Error", as well as the technical news.
25) Message boards : Number crunching : Does overclocking adversley affect rosetta? (Message 7487)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
I am new at Rosetta and have seen some mention about overclocking affecting the credits for the work done.


I don't recall seeing it mentioned but two of my AMD CPUs are overclocked and are doing fine. It's a matter of finding the right balance - I have had some problems with work units crashing when I took the overclocking too far.
26) Message boards : Number crunching : Please abort WUs with (Message 7477)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
Q5)Not mentioning the time it took me to download'em, I have in a "Ready to Run" status, 14 Default_xxxx_219_xxxx_x WUs, 9 Default_xxxx_218_xxxx_x and 1 Default_xxxx_221_xxxx_x.

Should I "abort"?


See the very first message in this thread:
'please ABORT any WUs whose names start with "DEFAULT_....._205_...." '

The ones you mention are not the 205 batch so don't need to be aborted.
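If you have a long list of work units to go through, you can pull the batch number out of each name. This is a small sketch. It assumes, based on the names quoted in these threads (e.g. 1ogw__topology_sample_207_10103_1), that the batch is always the third underscore-separated field from the end. That is an inference, not a documented naming rule, and the helper names are hypothetical.

```python
from typing import Optional, Set

def batch_number(wu_name: str) -> Optional[int]:
    """Batch number embedded in a work unit name, or None if it can't be parsed.

    Assumption: the batch is the third underscore-separated field from the
    end, e.g. "1ogw__topology_sample_207_10103_1" -> 207.
    """
    parts = wu_name.split("_")
    if len(parts) < 3:
        return None
    try:
        return int(parts[-3])
    except ValueError:
        return None

def should_abort(wu_name: str, bad_batches: Set[int]) -> bool:
    """True if the work unit belongs to one of the batches we were asked to abort."""
    return batch_number(wu_name) in bad_batches
```

With this helper, should_abort("DEFAULT_12345_205_6789_0", {205}) is True, while the same name with 219 in place of 205 is False. The other numbers in those example names are made up.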

27) Message boards : Number crunching : Computation Error (Message 7463)
Posted 24 Dec 2005 by Profile Webmaster Yoda
Post:
We live in an imperfect world where not everything runs smoothly. That's life.

Yes, Rosetta has had problems and is still recovering from it. So have other projects (and I've participated in a few). Some of those took a lot longer to recover.

You're not the only one who has had problems with this batch of bad work units. I've had dozens, if not hundreds of them myself. I decided to not download any new Rosetta work for a few days and reduce my resource share temporarily. This problem will take some time to clear up but signs are that it is getting there.

Most of the problems were in jobs from batches 204 to 207. If you are having problems with other batches, perhaps the problem is elsewhere, e.g. hardware problems.

And you have been given straight answers.

So the fact that I have the same problem with 6 machines, 3 different OS's, Cable internet. The best solution is work another project?


Yes. That's the beauty of BOINC.

That's 2 members of the same team with CPU's working at 100% for nothing.....


More like thousands of members of hundreds of teams having CPUs working in short bursts on work units that have a problem. It pales into insignificance compared with problems I have experienced on other projects.

Sounds like my team will have to see about a new project if that keeps up.


That's your choice and I have done so myself with some projects, after giving them time to sort out their problems. Let us know when you find a project that runs perfectly - I am yet to find one.

Edit: Some of your machines also fall below the recommended specs to run Rosetta. For instance, one of your machines has only 128MB RAM and a number of them are running Windows ME, which is not officially supported.
28) Message boards : Number crunching : How to have the best BOINC project. (Message 7380)
Posted 23 Dec 2005 by Profile Webmaster Yoda
Post:
I took a quick glance at hugothehermit's faq, and it looks pretty good. I think we should have 2 FAQs: a technical faq and a science faq, what do you think?


Hi Vanita

I agree. Some people want to know how to run BOINC/Rosetta. Others want to know what it actually does (and I guess many will want both). I tend to look mostly at the number crunching side of things, but do stop in at the science forum from time to time as I do like to know what I'm crunching.

Having an FAQ for the science (and possibly a series of ongoing, regular articles about what we're researching and achieving) would be good.

On the other matter - I agree with the others. Pointing people to a previously provided answer (or FAQ when we have it) is helpful, rather than rude.
29) Message boards : Number crunching : Report stuck work units here (Message 7355)
Posted 23 Dec 2005 by Profile Webmaster Yoda
Post:
When the Rosetta computation reaches over 10 minutes 20% and the timer shifts to Seti then comes back it will drop the comp time to less than 10 minutes


Sounds like the usual problem of not keeping work units in memory, in this case combined with very short times between switches (if we had an FAQ, this would probably be at the top of the list).

Check your preferences - to run Rosetta alongside other projects, you need to set "Leave applications in memory while preempted?" to yes or you will likely never finish a work unit.

I'd also change the setting for "Switch between applications every" to at least the recommended 60 minutes, but if you don't keep the work in memory, you will lose work done (back to 10%, 20%, 30% or whichever percentage it was at before the switch) every time you switch.
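Here is a simplified model of why short switch intervals hurt when applications are not kept in memory. It assumes the application writes a checkpoint at every fixed step of progress. That is an assumption for illustration, not a description of how Rosetta actually decides when to checkpoint.

```python
import math

def resume_point(progress_pct: float, checkpoint_every_pct: float,
                 kept_in_memory: bool) -> float:
    """Progress (%) a work unit resumes from when BOINC switches back to it.

    Simplified model (an assumption): the application checkpoints every
    checkpoint_every_pct percent. Kept in memory, it carries on where it
    stopped; otherwise it restarts from the last checkpoint it wrote.
    """
    if kept_in_memory:
        return progress_pct
    return math.floor(progress_pct / checkpoint_every_pct) * checkpoint_every_pct
```

With checkpoints every 10%, a unit switched out at 27% resumes at 20% if it was dropped from memory, but at 27% if it was kept there. If the switch interval is shorter than the time needed to reach the next checkpoint, the unit never gets past its starting point.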

Of course, there is also the problem with a bad batch of work units, mostly in batches 204 to 207. They will crash soon after starting - nothing you can do about that.
30) Questions and Answers : Web site : Delete old, inactive hosts (Message 7337)
Posted 23 Dec 2005 by Profile Webmaster Yoda
Post:
You don't seem to have any computers that have zero credit on them, did you fix this and not tell us about it ?


Some time after I posted the message, the old work units that were stopping me from removing inactive hosts were deleted (by the project). I was then able to remove the inactive host(s).

So it looks like the problem was indeed fixed.
31) Message boards : Number crunching : How to have the best BOINC project. (Message 7335)
Posted 23 Dec 2005 by Profile Webmaster Yoda
Post:
I do think there must be better ways to arrange the message boards.


There probably are :-) But some people won't take any notice, they will post anything anywhere (questions about running BOINC in the Rosetta@home science board for instance), without looking if it's already been answered elsewhere - be that another thread or an FAQ.

One thing that may reduce the clutter a little would be to make the link to create a new thread harder to find... Not meaning hide it, but rather than just having it sit there on its own, clearly visible, put something like this at the top of the message boards:

Before creating a new thread
Your question may already have been answered in an earlier thread below. If not, try the Site Search and FAQ* before starting a new thread. Staff and volunteers give freely of their time to answer questions, but it gets a bit tedious to answer the same question over and over again. Please [create a new thread] only if you cannot find an answer through the site search or looking at recent threads.

* That is, once we have an FAQ/knowledge base. Hugothehermit was working on one but got called away for family reasons. Perhaps his work could be tied in to what Jack mentioned. The FAQ/knowledge base should be static, i.e. no postings from people other than those assigned to maintain it...

As far as a wiki goes... Paul is doing a great job with his unofficial BOINC Wiki but (and please don't take this personally as it's not meant that way) I have difficulty finding anything in there. Information overload type of thing.
32) Message boards : Number crunching : Please abort WUs with (Message 7191)
Posted 22 Dec 2005 by Profile Webmaster Yoda
Post:
The new defaults are taking way too long


Rosetta work units vary in length and I can't see how longer work units are a problem other than if you like to process lots of tiny work units.

You should still get the equivalent amount of credit for valid results since it's based on the amount of CPU time spent. Whether that's 24 hours for 2 work units or 24 hours for 50 work units makes no difference. Of course, if you abort work units after they have run for a while, you get no credit for them.
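The arithmetic behind that is simple if credit is proportional to CPU time. The rate below (credit_per_cpu_hour) is a hypothetical per-host figure. Real BOINC credit claims also depend on the host's benchmark results, which this sketch ignores.

```python
import math
from typing import List

def total_credit(cpu_hours_per_wu: List[float], credit_per_cpu_hour: float) -> float:
    """Credit for a set of valid results, assuming credit is proportional
    to CPU time. credit_per_cpu_hour is a hypothetical per-host rate."""
    return sum(cpu_hours_per_wu) * credit_per_cpu_hour

few_long = [12.0, 12.0]          # 2 work units, 24 CPU hours in total
many_short = [24.0 / 50] * 50    # 50 work units, 24 CPU hours in total

# Same CPU time, so (up to floating-point rounding) the same credit.
assert math.isclose(total_credit(few_long, 10.0), total_credit(many_short, 10.0))
```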
33) Message boards : Number crunching : better utilization with multiple CPUs (Boinc 5.2.13 vs 8 CPUs) (Message 7164)
Posted 22 Dec 2005 by Profile Webmaster Yoda
Post:
Ahh, yes. There was such a setting :)


Glad I could help. I'd feel even better if I had a server like that but unless I win the Lotto, it's not gonna happen :-)
34) Message boards : Number crunching : Report stuck work units here (Message 7146)
Posted 22 Dec 2005 by Profile Webmaster Yoda
Post:
It's message 6479 and a few others that have the long command line wrapped in a <pre> element, which means the formatting will be preserved.

Remove the <pre> and </pre> from those posts (or insert some line breaks) and they should wrap.
35) Message boards : Number crunching : better utilization with multiple CPUs (Boinc 5.2.13 vs 8 CPUs) (Message 7145)
Posted 22 Dec 2005 by Profile Webmaster Yoda
Post:
I can only get 25% of CPU utilization for my computer. Client is the latest Boinc, 5.2.13, running on Windows 2003 Server Enterprise x64. Server has 4 dualcore- Intel Pentium IV Xeon. (8 processors.)


What settings do you have in preferences for the following?

On multiprocessors, use at most [ how many ] processors?

Given you mention 25% CPU use, it sounds like that may be set to only 2, in which case that's all BOINC will use. If so, change it to 8, do a manual update of Rosetta in BOINC and see if the other 6 CPUs kick in.
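The 25% figure points to two CPUs because each running work unit keeps roughly one CPU busy. Here is a rough sketch of that estimate. It assumes one work unit per CPU and nothing else using the processors. The function name is made up for illustration.

```python
def cpus_in_use(utilization_pct: float, logical_cpus: int) -> int:
    """Rough number of busy CPUs, assuming each running work unit keeps
    one CPU near 100% and nothing else is using the machine."""
    return round(utilization_pct / 100 * logical_cpus)

# 25% of an 8-CPU server is about 2 CPUs, matching a limit of 2 processors.
print(cpus_in_use(25, 8))  # prints 2
```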
36) Message boards : Number crunching : Please abort WUs with (Message 7025)
Posted 21 Dec 2005 by Profile Webmaster Yoda
Post:
Over 90% now failing.....gonna suspend :-(


Ditto. I'm not going to donate more of my bandwidth to Rosetta until these bad work units are history. At least 90% are crashing.
37) Message boards : Number crunching : Please abort WUs with (Message 7000)
Posted 21 Dec 2005 by Profile Webmaster Yoda
Post:
The "short" failures shouldn't add up to more than a minute or two on average for everyone


No problem with that other than the wasted bandwidth of downloading them.

But it looks like the admins haven't changed the settings for "max # of error results" - WUs that have already crashed on more than one system in rapid succession are still being sent out (e.g. WU 3821321). Waste of bandwidth for the project too.
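For anyone unfamiliar with that setting, the idea is roughly this: once a work unit has errored out too many times, the server stops sending it to new hosts. The sketch below is a simplified model of that rule, not the actual BOINC server code. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WorkUnit:
    name: str
    error_results: int      # copies that have already crashed on other hosts
    max_error_results: int  # named after the project setting discussed above

def should_send_more(wu: WorkUnit) -> bool:
    """Simplified model: keep issuing copies only while the number of
    error results has not exceeded the limit."""
    return wu.error_results <= wu.max_error_results
```

A low limit stops a broken work unit after a few crashes instead of handing it to host after host.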

Suggest the admins change that setting as soon as they get to work in the morning.

Unless, of course, the fixes/workarounds that Jack referred to mean those WUs can still be completed by the next person to download them.

EDIT: I don't see any evidence of that - I get about 3 WUs at a time and most error out. I get lucky with about 1 in every 10 WUs.
38) Message boards : Number crunching : Please abort WUs with (Message 6981)
Posted 21 Dec 2005 by Profile Webmaster Yoda
Post:
All is stopping after 10 minutes


Like Bill, I have not seen any WUs with names that start with "DEFAULT_xxxxx_207_" (replacing xxxxx with a number). However, I have had a relatively high number of work units in the 204 and 207 batches crash after a few minutes. Some have been OK.

I will persist with Rosetta for the day, but if it gets out of hand I will suspend Rosetta on most of my computers (keeping an eye on the remaining one). The (minimal) time taken and lack of credit are not a major concern, but why waste bandwidth downloading work units that are going to crash?

EDIT 1:
I just had three in a row crash, minutes after posting this message. All in the 207 batch:

1ogw__topology_sample_207_10103_1
1hz6A_topology_sample_207_7644_1
1ogw__topology_sample_207_14401_0

All with error 0xC0000005, in a matter of 10-30 seconds.

EDIT 2:
4 more crashed, on two computers, since writing the above (a few minutes ago).
Two of them were batch 208, two of them batch 207.

EDIT 3:
Minutes later, another 3.

It is getting out of hand - Rosetta has been set to "no new work" on all my computers pending a fix.
39) Message boards : Number crunching : Question for developers - Does the New Versions on the 20Th have stuck at 1% fix? (Message 6876)
Posted 20 Dec 2005 by Profile Webmaster Yoda
Post:
I had to try it... the first one


Me too. One is still running OK after 20 minutes (1n0u__topology_sample_204_14869_0). The other (1ogw__topology_sample_204_3923_4) errored out in less than 20 seconds (and had crashed on 4 other machines before mine). Both on Rosetta 4.81.

Observation: the error number (0xc0000005) is the same as occurs when switching Rosetta out of memory.

EDIT: Either we have a bad batch of work units or the new app is broken.
40) Message boards : Number crunching : Dual Xeons (Message 6825)
Posted 20 Dec 2005 by Profile Webmaster Yoda
Post:
Do you think it would be ok to run 4 instances of Rosetta with only 512 megs of ram? Thanks!


Figures vary, but it's not unusual for Rosetta to use 160MB per WU. Four instances would need about 640MB (4 × 160MB) plus whatever the O/S needs.

With only 512MB, there will probably be so much swapping to disk that it's not worth running with HT on. If it had 1GB, it might be OK.
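Here are the sums above as a quick sketch. The 160MB per work unit comes from the post. The OS figure is a placeholder assumption that you would replace with what your own system uses.

```python
def memory_needed_mb(instances: int, per_wu_mb: int, os_mb: int) -> int:
    """RAM needed to keep every instance resident without swapping."""
    return instances * per_wu_mb + os_mb

def fits_in_ram(instances: int, per_wu_mb: int, os_mb: int, installed_mb: int) -> bool:
    """True if all instances plus the OS fit in installed memory."""
    return memory_needed_mb(instances, per_wu_mb, os_mb) <= installed_mb

OS_MB = 128  # placeholder assumption for the operating system's own needs

print(memory_needed_mb(4, 160, OS_MB))   # 768 (4 x 160MB + 128MB)
print(fits_in_ram(4, 160, OS_MB, 512))   # False
print(fits_in_ram(4, 160, OS_MB, 1024))  # True
```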





©2024 University of Washington
https://www.bakerlab.org