Changes to robots.txt to avoid Boinc DB overhead by bots

Message boards : Number crunching : Changes to robots.txt to avoid Boinc DB overhead by bots


Profile Dimitris Hatzopoulos

Send message
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 19293 - Posted: 25 Jun 2006, 21:46:50 UTC
Last modified: 25 Jun 2006, 21:47:13 UTC

I was reading over at SIMAP about excluding bots from accessing BOINC databases (for performance reasons) -here-. Checking Rosetta's robots.txt, I noticed that it needs to be changed: the "/rosetta/" path should be added to the URLs excluded from bot visits (Googlebot, Yahoo Slurp, etc.).

i.e., the current
https://boinc.bakerlab.org/rosetta/robots.txt

User-agent: *
Disallow: /account
Disallow: /add_venue
Disallow: /am_
Disallow: /bug_report
Disallow: /edit_
Disallow: /host_
Disallow: /prefs_
Disallow: /result
Disallow: /team
Disallow: /workunit

should be:

User-agent: *
Disallow: /rosetta/account
Disallow: /rosetta/add_venue
Disallow: /rosetta/am_
Disallow: /rosetta/bug_report
...
etc

The default example rules only match projects whose URLs are served from the web root; they don't match URLs that include a path component like "/rosetta/".
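To see why the prefix matters, here is a small sketch using Python's standard-library robots.txt parser. The rule set below is a shortened, hypothetical version of the corrected file (only a few Disallow lines), and the tested URLs are illustrative, not taken from the actual site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical corrected rules: paths carry the "/rosetta/" prefix
# so they match the project's actual URLs.
corrected = """\
User-agent: *
Disallow: /rosetta/account
Disallow: /rosetta/result
Disallow: /rosetta/workunit
"""

# The original, unprefixed rules for comparison.
original = """\
User-agent: *
Disallow: /account
Disallow: /result
Disallow: /workunit
"""

base = "https://boinc.bakerlab.org"

rp_fixed = RobotFileParser()
rp_fixed.parse(corrected.splitlines())

rp_old = RobotFileParser()
rp_old.parse(original.splitlines())

# With the prefix, DB-heavy pages under /rosetta/ are blocked:
print(rp_fixed.can_fetch("Googlebot", base + "/rosetta/results.php"))  # False

# Without it, "Disallow: /result" never matches "/rosetta/results.php",
# so bots are still allowed to hammer the database:
print(rp_old.can_fetch("Googlebot", base + "/rosetta/results.php"))    # True
```

robots.txt Disallow rules are plain path prefixes, which is why a rule written for a root-hosted project silently stops working when the project lives under a subdirectory.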
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity




©2024 University of Washington
https://www.bakerlab.org