Tuesday, March 06, 2007
Recently, Danny Sullivan brought up good questions about how
search engines handle meta
tags.
Here are some answers about how we handle these tags at Google.
Multiple content values
We recommend that you place all content values in one meta
tag. This keeps the meta
tags easy to
read and reduces the chance for conflicts. For instance:
<meta name="robots" content="noindex, nofollow" />
If the page contains multiple meta
tags of the same type, we will aggregate the content values.
For instance, we will interpret
<meta name="robots" content="noindex" /> <meta name="robots" content="nofollow" />
the same way as:
<meta name="robots" content="noindex, nofollow" />
If content values conflict, we will use the most restrictive. So, if the page has these meta
tags:
<meta name="robots" content="noindex" /> <meta name="robots" content="index" />
We will obey the noindex
value.
Unnecessary content values
By default, Googlebot will index a page and follow links to it. So there's no need to tag pages
with content values of index
or follow
.
Directing a robots meta
tag specifically at Googlebot
To provide instruction for all search engines, set the meta name to robots
. To
provide instruction for only Googlebot, set the meta name to Googlebot
. If you want
to provide different instructions for different search engines (for instance, if you want one
search engine to index a page, but not another), it's best to use a specific meta
tag for each
search engine rather than use a generic robots meta
tag combined with a specific one. You can find
a list of bots at robotstxt.org.
Casing and spacing
Googlebot understands any combination of lowercase and uppercase. So each of these meta
tags is
interpreted in exactly the same way:
<meta name="robots" content="noodp" /> <meta name="Robots" content="noodp" /> <meta name="Robots" content="NoOdp" />
If you have multiple content values, you must place a comma between them, but it doesn't matter if
you also include spaces. So the following meta
tags are interpreted the same way:
<meta name="robots" content="noindex, nofollow" /> <meta name="robots" content="noindex,nofollow" />
If you use both a robots.txt file and robots meta
tags
If the robots.txt and meta
tag instructions for a page conflict, Googlebot follows the most
restrictive. More specifically:
-
If you block a page with robots.txt, Googlebot will never crawl the page and will never read any
meta
tags on the page. -
If you allow a page with robots.txt but block it from being indexed using a
meta
tag, Googlebot will access the page, read themeta
tag, and subsequently not index it.
Valid meta robots content values
Googlebot interprets the following robots meta
tag values:
noindex
: prevents the page from being included in the index.-
nofollow
: prevents Googlebot from following any links on the page. (Note that this is different from the link-levelnofollow
attribute, which prevents Googlebot from following an individual link.) -
noarchive
: prevents a cached copy of this page from being available in the search results. -
nosnippet
: prevents a description from appearing below the page in the search results, as well as prevents caching of the page. -
noodp
: blocks the Open Directory Project description of the page from being used in the description that appears below the page in the search results. none
: equivalent tonoindex, nofollow
.
A word about content value none
As defined by
robotstxt.org, the
following direction means noindex, nofollow
.
<meta name="robots" content="none" />
However, some webmasters use this tag to indicate no robots restrictions and inadvertently block all search engines from their content.