Creating robots.txt files in EW
Robots.txt files can be used to mark both directories and files that you do
not want search engine spiders to access.
The format is straightforward:
User-agent: *
Disallow: /bin/
Disallow: /images/
The * against User-agent applies the Disallow rules that follow to ALL search
engine crawlers. User-agent: Gaaglebot would instead apply the rules that
follow ONLY to the Gaagle crawler.
An extension to the protocol known as 'Sitemaps Autodiscovery' specifies the
site's sitemap to all SEs that recognise the extension. It does not currently
have universal recognition but is worth including for the SEs that can use it.
Sitemap: http://www.mysite.com/sitemap.xml
So our complete robots.txt file might look like this (note that the file name
should be in lower case: robots.txt, NOT Robots.txt):
User-agent: *
Disallow: /bin/
Disallow: /images/
Disallow: /testpages/
Sitemap: http://www.mysite.com/sitemap.xml
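
How a crawler would read the complete file can be sanity-checked locally before publishing. The following is a minimal sketch using Python's standard urllib.robotparser module (site_maps() needs Python 3.8 or later). The crawler name "AnyBot" and the page paths are made up for illustration, and real search engines may interpret the rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The complete example file from above, held in memory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bin/
Disallow: /images/
Disallow: /testpages/
Sitemap: http://www.mysite.com/sitemap.xml
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "AnyBot" is a hypothetical crawler name; it falls under the "*" rules.
print(parser.can_fetch("AnyBot", "http://www.mysite.com/index.html"))   # True
print(parser.can_fetch("AnyBot", "http://www.mysite.com/bin/tool.exe")) # False
print(parser.site_maps())  # ['http://www.mysite.com/sitemap.xml']
```

Pages outside the three disallowed directories remain accessible to every crawler, which is the behaviour the file is meant to produce.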
To create this file in EW use 'File - New - Page', select 'General' and then
select 'Text File' from the options given.
Now, when we create this in EW and save the file it will be saved in the
current default character encoding and, whilst it may look correct in EW, will
not function correctly on the site. To verify this use one of the on-line robots.txt
syntax checkers such as
http://tool.motoricerca.info/robots-checker.phtml
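
The saved file can also be inspected locally. The sketch below assumes (this is not stated above) that the usual culprit is a byte order mark or other non-ASCII bytes written by a Unicode encoding. The function name find_encoding_problems is hypothetical.

```python
import codecs

# Byte order marks that a Unicode-saving editor may put at the start of a file.
KNOWN_BOMS = (
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16 LE", codecs.BOM_UTF16_LE),
    ("UTF-16 BE", codecs.BOM_UTF16_BE),
)


def find_encoding_problems(data):
    """Return a list of likely encoding problems in raw robots.txt bytes."""
    problems = []
    body = data
    for name, bom in KNOWN_BOMS:
        if data.startswith(bom):
            problems.append(name + " byte order mark at start of file")
            body = data[len(bom):]
            break
    if b"\x00" in body:
        problems.append("NUL bytes present (file is probably UTF-16)")
    elif any(byte > 0x7F for byte in body):
        problems.append("non-ASCII bytes present")
    return problems


# In practice the bytes would come from open("robots.txt", "rb").read().
print(find_encoding_problems(b"User-agent: *\nDisallow: /bin/\n"))
# []
print(find_encoding_problems(codecs.BOM_UTF8 + b"User-agent: *\n"))
# ['UTF-8 byte order mark at start of file']
```

An empty list means the file is plain ASCII, which reads the same in ANSI and UTF-8.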
For EW V1
To get around this close the file in EW and then open it in Notepad. Then use
'Save As' in Notepad and select ANSI Encoding. Then publish or FTP the file
using EW as normal. Once the file has been saved with the correct encoding it
can be edited and saved in EW as normal.
For EW V2
Right-Click the page and select 'Encoding'. Then use the 'Save the
current file as' drop-down and select 'US/Western European (Windows)'.
Then click the 'Save As' button. You'll be asked whether you want to
replace the current file. Do this.
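
The same re-encoding can be done outside EW. This is a minimal sketch, assuming the file was originally saved as UTF-8 (with or without a byte order mark). Notepad's 'ANSI' on a Western system and EW's 'US/Western European (Windows)' both correspond to Windows-1252, which Python calls cp1252. The function name to_windows_1252 is hypothetical.

```python
def to_windows_1252(data):
    """Convert UTF-8 bytes (BOM optional) to Windows-1252 bytes."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character has no Windows-1252 equivalent.
    return text.encode("cp1252")


utf8_with_bom = b"\xef\xbb\xbfUser-agent: *\nDisallow: /bin/\n"
print(to_windows_1252(utf8_with_bom))
# b'User-agent: *\nDisallow: /bin/\n'
```

To apply it to a real file, read the bytes with open("robots.txt", "rb"), convert them, and write the result back with open("robots.txt", "wb").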
For further information on robots.txt files see The Web Robot Pages.