Posts by Dave Mickey

1) Message boards : Number crunching : These 7 files will not upload. (Message 72183)
Posted 20 Jan 2012 by Dave Mickey
Post:
I'm no expert, so that's well beyond me. The basic function of DNS is this: when you say "go to yahoo.com", it goes and retrieves the numeric address (say, 1.2.3.4) that really is yahoo.com. DNS does not do data transfers or make connections. It's just the phone book.

Oh, you want Dave? His phone number is 555.5555.

Then it's up to you to dial the phone and see if I'm home or not. The phone book is just lying there on the table. So somebody could certainly code their application to retry under some condition, but that's not something raw DNS does.

The other way works perfectly fine, tho. Fill your hosts file with all the entries you can think of, assigning one address to many different host names.

Dave.com
robertmiles.com
rosetta.org
microsoft.com
whitehouse.gov

could all point to one physical IP address, in your hosts file.
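To make the many-names-one-address idea concrete, here is a minimal Python sketch that reads hosts-file text into a name-to-address table. It assumes the usual layout (an IP address, then one or more names separated by spaces or tabs, with # starting a comment); `parse_hosts` is a hypothetical helper, and 10.0.0.7 is a made-up private address, not anything from a real setup.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name to the IP address on its line; '#' starts a comment."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()  # IP first, then one or more names
        for name in names:
            # Assumption: if a name appears twice, the first entry wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


sample = """\
# every name below points at the same made-up address
10.0.0.7  dave.com robertmiles.com rosetta.org
10.0.0.7  microsoft.com whitehouse.gov
"""
table = parse_hosts(sample)
print(table["whitehouse.gov"])  # prints 10.0.0.7
```

Five different names, one address: the table is the whole story, and nothing in it checks whether anybody actually answers at 10.0.0.7.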

In fact, corrupting hosts is a good virus/malware attack - if I can edit your hosts, I can make update.microsoft.com point at my server - the system consults the local hosts file FIRST (before checking online DNS), and does what it says. That's part of why the rosetta thing worked. The hosts file is considered authoritative by the machine it's on.
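The "hosts first, then DNS" order can be sketched as a tiny lookup function. This is an illustration only: `resolve` and the fake DNS callback are hypothetical, the addresses are made up (203.0.113.x is a documentation range), and the real order is set by the operating system (on Linux, for example, by the `hosts:` line in /etc/nsswitch.conf, which commonly lists `files` before `dns`).

```python
from typing import Callable


def resolve(name: str, hosts: dict[str, str], dns_lookup: Callable[[str], str]) -> str:
    """Consult the local hosts table first; only ask DNS if the name isn't there."""
    key = name.lower()
    if key in hosts:
        return hosts[key]  # the hosts entry is treated as authoritative
    return dns_lookup(key)  # fall back to the network phone book


# Hypothetical tampered hosts table and a stand-in for a real DNS query.
tampered = {"update.microsoft.com": "10.6.6.6"}
fake_dns = lambda name: "203.0.113.5"
print(resolve("update.microsoft.com", tampered, fake_dns))  # prints 10.6.6.6
```

Whoever controls the table wins: DNS never even gets asked about a name that hosts already answers.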

But in the end, DNS just looks up the phone number, not much more. It is a huge system of many phone books all talking to each other and updating and caching and uncaching constantly, and the global economy relies on it, but still, phone books.
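The phone book vs. dialing split shows up directly in Python's standard `socket` module: `getaddrinfo` only hands back addresses, and opening a connection is a separate call. A minimal sketch, with hypothetical helper names `look_up` and `dial`; a failed lookup raises `socket.gaierror`, the same kind of failure a "can't resolve hostname" message reports.

```python
import socket


def look_up(host: str) -> list[str]:
    """Phone-book step: ask the resolver for the host's addresses, nothing more."""
    infos = socket.getaddrinfo(host, None)
    # Each entry's last element is a sockaddr tuple whose first item is the IP string.
    return sorted({info[4][0] for info in infos})


def dial(ip: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Dialing step: try to open a TCP connection to an address we already have."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False  # nobody home, or the line is busy


if __name__ == "__main__":
    try:
        print(look_up("localhost"))
    except socket.gaierror as err:
        print("can't resolve hostname:", err)
```

Note that `look_up` succeeding says nothing about whether `dial` will; those are two separate questions.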

fyi

Dave

ps that's one of the silly things about that SOPA legislation (US concept here): trying to have the government control DNS to prevent piracy or bad manners, or something. If I can get all the IP addresses I need, I can do all the surfing I want just by putting raw IPs in the URL. In theory, you can live without DNS entirely. So how much control could they really expect, in the face of determined hackers, etc.? Want to try it? Surf to

http://128.95.160.140/rosetta/forum_forum.php?id=2

to see this forum, with no help from DNS (at least, no help in getting to the top-level pages; I know there are lots of host names in the page, but you could replace them all with numbers). Watch out, you might be a hacker pirate!

/soapbox off /

WOW. Don't know where I got all this energy....!
2) Message boards : Number crunching : These 7 files will not upload. (Message 72063)
Posted 10 Jan 2012 by Dave Mickey
Post:
I think Win7 (and maybe Vista too) has configurable settings for just how intense the user protection is (UAC, or some such), so different systems may have different levels of strictness in force. Thus I think all of these may be true, depending on your system config. Don't ask me how to change it, but I think it's out there somewhere.

Dave
3) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 72059)
Posted 10 Jan 2012 by Dave Mickey
Post:
I'm no help on a Mac, but the hidden suggestion just might be true, if they want to keep us amateurs from borking it. Aren't Macs Unix-based these days? I think the hosts file was originally a Unix thing, since that's where almost all the early network protocols (like DNS) came from. So my first assumption is that it would be "hosts". Or maybe Mac just does it all their own way.....

If there is one under another name, you might find it by searching in file content for

"127.0.0.1"

or

"localhost"

which seem to be standard entries in "hosts" files.
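If you'd rather not search by hand, a rough Python sketch of the same content search could look like this. It assumes a hosts file is a small text file (the 64 KB cutoff is an arbitrary guess), and `find_hosts_candidates` is a hypothetical helper; expect some false positives, since plenty of other config files mention localhost too.

```python
import os

MARKERS = ("127.0.0.1", "localhost")


def find_hosts_candidates(root: str, max_bytes: int = 64_000) -> list[str]:
    """Return small files under root whose text mentions a hosts-style marker."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                if os.path.getsize(path) > max_bytes:
                    continue  # hosts files are tiny; skip big files
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable (permissions, broken link, ...)
            if any(marker in text for marker in MARKERS):
                hits.append(path)
    return sorted(hits)
```

Point it at a likely system directory rather than the whole disk, or it will take a while.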

BTW, many units did not have the srv6 problem, so yes, many of yours probably work fine. The key symptom is "can't resolve hostname" in the BOINC output screen.

Dave
4) Message boards : Number crunching : These 7 files will not upload. (Message 72058)
Posted 10 Jan 2012 by Dave Mickey
Post:
Update from here:

It's still doing it: I just had to fake out hosts once more for a unit that couldn't resolve the host name, and that unit was created less than 24 hours ago. So you might want to keep that mod in hosts, or maybe leave it there but comment it out (by putting the # character as the first character on the line). That way you can easily turn it back on in the future, if need be.

# i am a comment
# and so is the next line
# 128.95.160.145 srv6.bakerlab.org

To make the file work, take away the # in front of 128.......
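Commenting and uncommenting can also be scripted. A minimal Python sketch, assuming an entry line is an IP address followed by names; `set_entry_enabled` is a hypothetical helper that only touches lines whose first field parses as an IP address, so plain comments like "# i am a comment" are left alone.

```python
import ipaddress


def _is_ip(token: str) -> bool:
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def set_entry_enabled(text: str, hostname: str, enabled: bool) -> str:
    """Comment out (enabled=False) or restore (enabled=True) entries for hostname."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.strip().lstrip("#").strip()  # the line without any leading '#'
        fields = body.split()
        if len(fields) >= 2 and _is_ip(fields[0]) and hostname in fields[1:]:
            line = body + "\n" if enabled else "# " + body + "\n"
        out.append(line)
    return "".join(out)
```

Calling it with `enabled=True` removes the # in front of the 128... line; `enabled=False` puts it back.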

Oh, and it so happens I have some Ubuntu. On my Ubuntu systems, hosts is found at

/etc/hosts

and I presume this would be standard on almost any Unix-flavored host, but am not sure. Look there first. I also had some Ubuntu units stuck, and this boosted them along just as it did on Winders.
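A small Python helper can pick the likely location per OS. This is a best-guess sketch: `hosts_path` is hypothetical, and it assumes Windows keeps the file under %SystemRoot%\System32\drivers\etc (C:\Windows on a typical install) while Linux, macOS, and other Unix-like systems use /etc/hosts.

```python
import os
import sys


def hosts_path() -> str:
    """Best guess at the hosts file location for the running OS."""
    if sys.platform.startswith("win"):
        # SystemRoot is usually C:\Windows; fall back to that if the variable is unset.
        root = os.environ.get("SystemRoot", r"C:\Windows")
        return os.path.join(root, "System32", "drivers", "etc", "hosts")
    # Linux, macOS and other Unix-like systems conventionally use /etc/hosts.
    return "/etc/hosts"


print(hosts_path())
```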

And sorry, I log in as admin, so permissions are usually not an issue for me, but could be for many folks.

For those who missed it, my original suggestion is in my post of "Posted 8 Jan 2012 1:30:23 UTC", in this thread and in the top pinned thread in Number Crunching, too.

Dave



5) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 72056)
Posted 10 Jan 2012 by Dave Mickey
Post:
Update from here:

It's still doing it: I just had to fake out hosts once more for a unit that couldn't resolve the host name, and that unit was created less than 24 hours ago. So you might want to keep that mod in hosts, or maybe leave it there but comment it out (by putting the # character as the first character on the line):

# i am a comment
# and so is the next line
# 128.95.160.145 srv6.bakerlab.org

Oh, and it so happens I have some Ubuntu. On my Ubuntu systems, hosts is found at

/etc/hosts

and I presume this would be standard on almost any Unix-flavored host, but am not sure. Look there first.

Dave

Greetings, Tyrant - my wife loves Fernando Alonso - I would love all the $$ he makes......

@CBSX01 - was your error message "can't resolve hostname"? Did you try tweaking "hosts"?
Just curious if yours was this, or something else.....

6) Message boards : Number crunching : These 7 files will not upload. (Message 72006)
Posted 8 Jan 2012 by Dave Mickey
Post:

I think that you are asking for validate errors with this method because it is quite possible that the server that a work unit is assigned to is the only one that can validate it.
.....
To do other things risks corrupting files, which potentially affects your whole boat of tasks.

Aborting the transfer will be throwing away the work you've done, and the credit you've earned for that work.

...having said that, the suggestion below to hit an alternate upload server should be processed normally if you are comfortable achieving the redirection via the hosts file, etc.




I looked for and found my 13 holdouts, all reported at just about 1:00 UTC, and they all are "Over, Success, Done, and granted credit". So maybe I dodged a bullet, but I would guess that a robust parallel system like BOINC would not have a fragile path such as work having to go back to one and only one IP address. But I can't claim any expertise; just luck, I guess. But, from ModSense's comment, there is no "etc.": it was just modifying the hosts file. Period.

typical result record:


473457597 431993511 28 Dec 2011 17:41:51 UTC 8 Jan 2012 0:57:37 UTC Over Success Done 27,876.58 198.47 154.33


YMMV, I guess, but no sign of trouble here.

Dave
7) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 72005)
Posted 8 Jan 2012 by Dave Mickey
Post:

I think that you are asking for validate errors with this method because it is quite possible that the server that a work unit is assigned to is the only one that can validate it.
.....
To do other things risks corrupting files, which potentially affects your whole boat of tasks.

Aborting the transfer will be throwing away the work you've done, and the credit you've earned for that work.

...having said that, the suggestion below to hit an alternate upload server should be processed normally if you are comfortable achieving the redirection via the hosts file, etc.




I looked for and found my 13 holdouts, all reported at just about 1:00 UTC, and they all are "Over, Success, Done, and granted credit". So maybe I dodged a bullet, but I would guess that a robust parallel system like BOINC would not have a fragile path such as work having to go back to one and only one IP address. But I can't claim any expertise; just luck, I guess. But, from ModSense's comment, there is no "etc.": it was just modifying the hosts file. Period.

typical result record:


473457597 431993511 28 Dec 2011 17:41:51 UTC 8 Jan 2012 0:57:37 UTC Over Success Done 27,876.58 198.47 154.33


YMMV, I guess, but no sign of trouble here.

Dave
8) Message boards : Number crunching : These 7 files will not upload. (Message 71993)
Posted 8 Jan 2012 by Dave Mickey
Post:
Where it is may vary with your Windows version, but

I have windows 7, and mine is found in the directory named:

C:\Windows\System32\drivers\etc

the filename is

hosts

with no extension like .txt or .bin or anything. It is plain text, so you can open it with notepad, or any simple text editor.

In a search or find, look for the name hosts.

Put the line I quoted before by itself on the last line of the file, without disturbing any other lines. If you want, you can put a comment line above your new line as an explanation for later, like:

# this entry is to fix a problem with rosetta
128.95.160.145 srv6.bakerlab.org

There must be one or more spaces or tabs between the .145 and srv6; finish the line with a RETURN.

Save the file.

Now, go restart the upload of the stuck file(s). If your problem is the same as mine, they will work now.

Dave

If you're nervous about fooling with the file, copy it before editing, and then you can easily put it back the way you found it.
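The copy-then-append steps above can be sketched in Python. Assumptions: `add_hosts_entry` and the `.bak` backup name are hypothetical, the script runs with enough rights to write the file (on Windows that usually means an admin/elevated session), and a single tab separates the address from the name.

```python
import shutil
from pathlib import Path


def add_hosts_entry(path: str, ip: str, hostname: str, note: str = "") -> str:
    """Copy the hosts file aside, then append an entry (and optional comment line)."""
    target = Path(path)
    backup = target.with_name(target.name + ".bak")  # hypothetical backup name
    shutil.copy2(target, backup)  # keep the original for an easy undo
    text = target.read_text()
    lines = []
    if text and not text.endswith("\n"):
        lines.append("")  # make sure the new entry starts on a fresh line
    if note:
        lines.append(f"# {note}")
    lines.append(f"{ip}\t{hostname}")
    target.write_text(text + "\n".join(lines) + "\n")  # end with a RETURN
    return str(backup)
```

To undo, copy the returned backup file back over hosts.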

9) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71991)
Posted 8 Jan 2012 by Dave Mickey
Post:
Apparently, yes, there is. Put this line in your "hosts" file (somewhere under Windows; just "hosts", with no extension):

128.95.160.145 srv6.bakerlab.org

Just by itself. Then, requests to srv6 will go to srv4. All mine are gone now......


Dave
10) Message boards : Number crunching : These 7 files will not upload. (Message 71990)
Posted 8 Jan 2012 by Dave Mickey
Post:
Apparently, yes, there is. Put this line in your "hosts" file (somewhere under Windows; just "hosts", with no extension):

128.95.160.145 srv6.bakerlab.org

Just by itself. Then, requests to srv6 will go to srv4. All mine are gone now......


Dave
11) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71989)
Posted 8 Jan 2012 by Dave Mickey
Post:
So, some IP expert: will a temporary hosts file entry let us point at srv6 and have it actually go to srv4? (I mean, I know how to do that, but would it work?)

Dave
12) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71988)
Posted 8 Jan 2012 by Dave Mickey
Post:
I collected a little more data over in the "7 files" thread. The IP that my cache thought was srv6 is now an opendns host.

Successful uploads request service from srv4.

Dave

SORRY, double post, the forum page was not responding for a couple minutes.
13) Message boards : Number crunching : Problems and Technical Issues with Rosetta@home (Message 71986)
Posted 8 Jan 2012 by Dave Mickey
Post:
I collected a little more data over in the "7 files" thread. The IP that my cache thought was srv6 is now an opendns host.

Successful uploads request service from srv4.

Dave
14) Message boards : Number crunching : These 7 files will not upload. (Message 71985)
Posted 8 Jan 2012 by Dave Mickey
Post:
Very interesting. When I turn on that debug, I see it using the same URL as Holmis. And when I went to ping it, it returned as

C:\Users\dwmickey>ping srv6.bakerlab.org

Pinging srv6.bakerlab.org [67.215.65.132] with 32 bytes of data:
Reply from 67.215.65.132: bytes=32 time=36ms TTL=53
Reply from 67.215.65.132: bytes=32 time=34ms TTL=53
Reply from 67.215.65.132: bytes=32 time=34ms TTL=53
Reply from 67.215.65.132: bytes=32 time=33ms TTL=53

Ping statistics for 67.215.65.132:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 33ms, Maximum = 36ms, Average = 34ms


BUT THEN, I flushed the DNS cache with ipconfig /flushdns, and it no longer returned. Now I see that the IP that was returning above is really

C:\Users\dwmickey>tracert 67.215.65.132

Tracing route to hit-nxdomain.opendns.com [67.215.65.132]
over a maximum of 30 hops:


It looks like SRV6 is no more, in DNS land. And if you look at the rah server status page, it looks like it should all be going thru srv4.

And indeed, looking at the file debug output, the uploads that work are asking to go to:

/////////////////////////////////////////////////////
07-Jan-2012 16:50:59 [rosetta@home] Started upload of _11_29__optpps_T6161_optpps_03_09_35686_140788_0_0
07-Jan-2012 16:50:59 [rosetta@home] [file_xfer_debug] URL: http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
//////////////////////////////////////////////////////


and that works.......
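With file_xfer_debug output like the excerpt above, pulling out which server each upload targets can be done with a short Python sketch. The regex is a guess based only on the log lines shown in this post, and `upload_hosts` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

# Assumed line shape: "... [file_xfer_debug] URL: <url>"
URL_LINE = re.compile(r"\[file_xfer_debug\] URL: (\S+)")


def upload_hosts(log_text: str) -> list[str]:
    """Pull the host name out of each file_xfer_debug URL line in a BOINC log."""
    return [urlparse(m.group(1)).hostname for m in URL_LINE.finditer(log_text)]


log = """\
07-Jan-2012 16:50:59 [rosetta@home] Started upload of _11_29__optpps_T6161_optpps_03_09_35686_140788_0_0
07-Jan-2012 16:50:59 [rosetta@home] [file_xfer_debug] URL: http://srv4.bakerlab.org/rosetta_cgi/file_upload_handler
"""
print(upload_hosts(log))  # prints ['srv4.bakerlab.org']
```

Run over a longer log, any srv6 entries in the result would mark the stuck units.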

So what's to become of these units, apparently cast aside by the server reconfig......??? They want to phone home to srv6, but alas, there is none!

Dave


15) Message boards : Number crunching : These 7 files will not upload. (Message 71980)
Posted 7 Jan 2012 by Dave Mickey
Post:
I think I conclude what the OP and Jesse did: there is a set of 13 results waiting to upload, and they always fail with "can't resolve". Subsequent WUs process, upload, and report, and these 13 are stuck in the past, unable to move on.

Does a rah WU upload file contain something that might cause it to try to go to an obsolete host name after the reconfig got done at UW? Like, are they coded with a host name that now does not exist, or is not in DNS servers anywhere?

Does anyone know if there is a CC debug switch we could turn on to see exactly what host name the failed units are attempting to use?

Dave
16) Message boards : Number crunching : Cant Upload (Message 69084)
Posted 9 Jan 2011 by Dave Mickey
Post:
I'm not very hopeful for the "fix itself" plan either. With 50 WUs retrying on avg every 2 hours, that's 25 attempts per hour. At even a 1% success rate (if the problem is just capacity overload), by virtue of random chance that's 1 upload success every 4 hours, on average. I've seen 0 successes total for a couple of days now. That's not even 0.1% success. Just doesn't seem likely; it seems like a brick wall. The failure comes right back in 3 seconds, so it's not like somebody is too busy to respond or timing out. It seems like we're really contacting the scheduler and he says "I don't know how to do upload, go away..."
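The back-of-envelope numbers above can be checked with a few lines of Python. This assumes each retry is an independent try with the same success chance, and takes "a couple of days" as 48 hours; the helper names are hypothetical.

```python
def attempts_per_hour(units: int, retry_hours: float) -> float:
    """Each unit retries once per retry_hours on average."""
    return units / retry_hours


def hours_per_success(rate_per_hour: float, p_success: float) -> float:
    """Average wait for one success at a steady attempt rate."""
    return 1 / (rate_per_hour * p_success)


def p_zero_successes(attempts: int, p_success: float) -> float:
    """Chance that every one of `attempts` independent tries fails."""
    return (1 - p_success) ** attempts


rate = attempts_per_hour(50, 2)  # 25 attempts per hour
attempts = int(rate * 48)        # assumed 48 hours of retrying
print(hours_per_success(rate, 0.01))
print(p_zero_successes(attempts, 0.01), p_zero_successes(attempts, 0.001))
```

Under those assumptions, 48 hours is 1200 attempts. At 1% the chance of all of them failing is about 6 in a million; at 0.1% it is about 30%, so a clean run of failures fits a hard error much better than bad luck at 1%.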

Anybody *really* know what the file upload handler is? I assume it's a process on the scheduler server that takes in upload files, but I'm just guessing. Any chance it is some sort of reference contained in the upload files (the WU) themselves? I'm not yet at the point of deleting all this work to find out, so it's wait and see (whether it's self-fixing or otherwise).

oh well.....

Dave
17) Message boards : Number crunching : Project File Upload Handler Is Missing (Message 69056)
Posted 9 Jan 2011 by Dave Mickey
Post:
Well, still 100% failure on upload attempts. If I recall, BOINC does not make any work requests while uploads are pending, so most of my machines are now stuck with no r@h work until this thing gets fixed......

ah well

Dave
18) Message boards : Number crunching : Project File Upload Handler Is Missing (Message 69048)
Posted 8 Jan 2011 by Dave Mickey
Post:
Over here, I've found that unit reporting has started (even finished, for me); however, unit upload is still met immediately by "File Upload handler is missing". Hmmph. Hopefully that's just a sign of overload, and will wear off as things progress. Hopefully. But so far, I haven't had even one upload sneak thru, of about 50 that are retrying.

Dave
19) Message boards : Number crunching : No XML updates to BOINCstats for two days!? (Message 68942)
Posted 30 Dec 2010 by Dave Mickey
Post:
I noticed this also. Is there a system problem somewhere that's preventing the stats from being made available? r@h is always so reliable, this is odd. But, given how some other projects run, can't complain too much.

I have recently been prioritizing r@h and have been hoping to see results in boincstats, but without the stats export, it's just not as fun to watch.

Dave
20) Message boards : Number crunching : BOINC requesting GPU work from rah - why? (Message 68865)
Posted 24 Dec 2010 by Dave Mickey
Post:
So after checking back in on this thread, I see that setting this in the cc config does indeed stop it from requesting GPU tasks. But what if, on one machine, I wanted s@h to use the GPU, and wanted r@h to not waste time requesting GPU work?

Is there a way to differentiate based on project within one host?

thx

Dave





©2024 University of Washington
https://www.bakerlab.org