Tuesday, September 30, 2008
Running a website can be complicated, so we've provided Google Webmaster Tools to help webmasters recognize potential issues before they become real problems. Some of the issues that you can spot there are relatively small (such as having duplicate titles and descriptions), while others can be bigger (such as your website not being reachable). While Google Webmaster Tools can't tell you exactly what you need to change, it can help you to recognize that there could be a problem that needs to be addressed.
Let's take a look at a few examples that we ran across in the Google Webmaster Help Groups:
Is your server treating Googlebot like a normal visitor?
While Googlebot tries to act like a normal user, some servers may get confused and react in strange ways. For example, although your server may work flawlessly most of the time, some servers running IIS may react with a server error (or some other action that is tied to a server error occurring) when visited by a user with Googlebot's user-agent. In the Webmaster Help Group, we've seen IIS servers return result code 500 (Server error) and result code 404 (File not found) in the "Web crawl" diagnostics section, as well as result code 302 when submitting Sitemap files.
If your server is redirecting to an error page, you should make sure that we can crawl the error page and that it returns the proper result code. Once you've done that, we'll be able to show you these errors in Webmaster Tools as well. For more information about this issue and possible resolutions, please see https://todotnet.com/archive/0001/01/01/7472.aspx and https://www.kowitz.net/archive/2006/12/11/asp.net-2.0-mozilla-browser-detection-hole.aspx.
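This kind of user-agent-dependent misbehavior is easy to test for yourself. Here's a minimal Python sketch (a tiny local test server stands in for the misconfigured one; nothing here is IIS-specific) that requests the same URL with a browser user-agent and with Googlebot's user-agent, then compares the result codes:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"

class BuggyHandler(BaseHTTPRequestHandler):
    """Mimics a misconfigured server that errors out on Googlebot's user-agent."""
    def do_HEAD(self):
        user_agent = self.headers.get("User-Agent", "")
        self.send_response(500 if "Googlebot" in user_agent else 200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

def fetch_status(url, user_agent):
    """Return the HTTP result code the server sends for this user-agent."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

server = HTTPServer(("127.0.0.1", 0), BuggyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

print(fetch_status(url, BROWSER_UA))    # 200
print(fetch_status(url, GOOGLEBOT_UA))  # 500
server.shutdown()
```

If the two result codes differ when you point this at your real site, the server is treating Googlebot differently from ordinary visitors and is worth investigating.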
If your website is hosted on a Microsoft IIS server, also keep in mind that URLs are case-sensitive by definition (and that's how we treat them). This includes URLs in the robots.txt file, which is something that you should be careful with if your server is using URLs in a non-case-sensitive way. For example, disallow: /paris will block /paris but not /Paris.
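You can verify how a robots.txt rule matches case with Python's standard urllib.robotparser module. This small sketch parses an inline robots.txt instead of fetching a real one:

```python
from urllib.robotparser import RobotFileParser

# An inline robots.txt with a lowercase disallow rule.
robots_txt = [
    "User-agent: *",
    "Disallow: /paris",
]

parser = RobotFileParser()
parser.parse(robots_txt)

print(parser.can_fetch("Googlebot", "http://example.com/paris"))  # False (blocked)
print(parser.can_fetch("Googlebot", "http://example.com/Paris"))  # True (not blocked)
```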
Does your website have systematically broken links somewhere?
Modern content management systems (CMS) can make it easy to create issues that affect a large number of pages. Sometimes these issues are straightforward and visible when you view the pages; sometimes they're a bit harder to spot on your own. If an issue like this creates a large number of broken links, they will generally show up in the "Web crawl" diagnostics section in your Webmaster Tools account (provided those broken URLs return a proper 404 result code). In one recent case, a site had a small encoding issue in its RSS feed, resulting in over 60,000 bad URLs being found and listed in its Webmaster Tools account. As you can imagine, we would have preferred to spend time crawling content instead of these 404 errors :).
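To spot this kind of systematic breakage yourself, it can help to extract every link a template or feed produces and look for repeating patterns. A rough sketch using Python's built-in html.parser (the sample HTML and URLs are invented for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href> on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

# Invented sample: one good link, one mangled by a template/encoding bug.
page = '<a href="/paris">Paris</a> <a href="/guide?id=%3Cbr%3E">broken</a>'
collector = LinkCollector("http://example.com/")
collector.feed(page)
print(collector.links)
# ['http://example.com/paris', 'http://example.com/guide?id=%3Cbr%3E']
```

Grouping the extracted URLs by common prefixes or query patterns often reveals that thousands of broken links all trace back to a single template or encoding bug.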
Is your website redirecting some users elsewhere?
For some websites, it can make sense to concentrate on a group of users in a certain geographic location. One method of doing that can be to redirect users located elsewhere to a different page. However, keep in mind that Googlebot might not be crawling from within your target area, so it might be redirected as well. This could mean that Googlebot will not be able to access your home page. If that happens, it's likely that Webmaster Tools will run into problems when it tries to confirm the verification code on your site, resulting in your site becoming unverified. This is not the only reason for a site becoming unverified, but if you notice it happening on a regular basis, it would be a good idea to investigate. On this subject, always make sure that Googlebot is treated the same way as other users from the same location; treating it differently might be seen as cloaking.
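One way to see what a redirecting site shows a visitor from outside the target area is to fetch the home page without following redirects and inspect the first result code and Location header. The sketch below uses a tiny local server as a stand-in for such a site, and the X-Local-Visitor header is a made-up placeholder for real IP-based geolocation:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class GeoRedirectHandler(BaseHTTPRequestHandler):
    """Mimics a site that sends every out-of-area visitor to a country picker."""
    def do_GET(self):
        # A real site would key this off the visitor's IP address; the
        # hypothetical X-Local-Visitor header stands in for geolocation here.
        if self.headers.get("X-Local-Visitor"):
            self.send_response(200)
        else:
            self.send_response(302)
            self.send_header("Location", "/choose-country")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface redirects instead of silently following them."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def first_response(url, headers=None):
    """Return (status, Location header) of the very first response for url."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener.open(request) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")

server = HTTPServer(("127.0.0.1", 0), GeoRedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

print(first_response(url, {"X-Local-Visitor": "1"}))  # (200, None)
print(first_response(url))                            # (302, '/choose-country')
server.shutdown()
```

If the very first response for a crawler is a redirect away from the page you verified, that's a likely explanation for verification problems.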
Is your server unreachable when we try to crawl?
It can happen to the best of sites: servers can go down and firewalls can be overly protective. If that happens when Googlebot tries to access your site, we won't be able to crawl the website and you might not even know that we tried. Luckily, we keep track of these issues, and you can spot "Network unreachable" and "robots.txt unreachable" errors in your Webmaster Tools account when we can't reach your site.
Has your website been hacked?
Hackers sometimes add strange, off-topic hidden content and links to questionable pages. If it's hidden, you might not even notice it right away; but nonetheless, it can be a big problem. While the Message Center may be able to give you a warning about some kinds of hidden text, it's best if you also keep an eye out yourself. Google Webmaster Tools can show you keywords from your pages in the "What Googlebot sees" section, so you can often spot a hack there. If you see totally irrelevant keywords, it would be a good idea to investigate what's going on. You might also try setting up Google Alerts or doing queries such as site:example.com spammy words, where spammy words might be words like porn, viagra, tramadol, sex or other words that your site wouldn't normally show. If you find that your site actually was hacked, I'd recommend going through our blog post about things to do after being hacked.
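A simple self-check along these lines is to scan your own pages for terms they shouldn't contain. This is only a sketch; the word list is just an example and would need tailoring to your own site:

```python
import re

# Example word list only; tailor it to terms your site would never use.
SUSPICIOUS = {"viagra", "tramadol", "porn", "casino"}

def suspicious_terms(page_text):
    """Return any suspicious words found in the page, sorted."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return sorted(words & SUSPICIOUS)

clean = "<p>Our Paris travel guide, updated weekly.</p>"
hacked = '<div style="display:none">cheap viagra tramadol</div><p>Paris guide</p>'

print(suspicious_terms(clean))   # []
print(suspicious_terms(hacked))  # ['tramadol', 'viagra']
```

Running a check like this over your page sources (not just the rendered view, since hacked content is often hidden with CSS) can flag pages that deserve a closer look.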
There are a lot of issues that can be recognized with Webmaster Tools; these are just some of the more common ones that we've seen lately. Because it can be really difficult to recognize some of these problems, it's a great idea to check your Webmaster Tools account to make sure that you catch any issues before they become real problems. If you spot something that you absolutely can't pin down, why not post in the discussion group and ask the experts there for help?
Have you checked your site lately?