Be a Spider, Man: Quickly Diagnose Crawlability Issues

By Bryson Meunier, Natural Search Associate Director, Content Solutions

The title of this article has nothing to do with your friendly neighborhood Peter Parker or his nemesis Doc Ock, but if you can help your organization or your client make their content more crawlable with the methods outlined here, you might feel something like a hero. That is especially true if you’re a beginner SEO, or a web designer, account executive, or web developer who needs to understand SEO on a basic level but doesn’t do it as a full-time job.

The back story most of us know: Flash designers and web designers looking to build an interactive multimedia web experience historically have used technologies that are difficult for search engines to crawl. Search engines have improved their indexing algorithms to index some text within Flash files, and they can process some types of JavaScript, but their solution is still not ideal.

The problem is that designers and developers often publish content that is unreadable to spiders. Google tells you to use a Lynx viewer to see what content Googlebot can see, but if you’re picturing a View-Master with slides of a medium-sized wildcat, you’ll probably need a faster way.

The good news is that there are several simple ways to see the content that the engines can see on a page. I’ll go through them using the home page of a brand that is a household name and makes delicious fudge-striped cookies, among other things, but whose website is not entirely accessible to cookie-loving users with simple browsers or to the search engine spiders that serve them. I hope they will consider my use of their site as an example of a common design error to be free advice in exchange for the endless supply of goodies they produce.

Figure 1 Human view of site, Flash enabled

If the page you’re auditing, like the one above, has been indexed, simply view the text-only cached version of the page. From the Google Toolbar, pull down the cached version next to the PageRank bar to see the full cached version of the page.

If you don’t have the Google Toolbar, simply enter the cache: search operator followed by the page’s URL in Google. Once you can see the cached version of the page, select the text-only version to see the content that the search engines will use to understand and rank the page.

Figure 2 Google's Text-only cache of site in question

Still too complicated? Simply enter the URL in a site like SEO Browser to view the text version of the site.

[Image: SEO Browser]

If you want to see how a specific spider or browser will render content, you can also use Firefox plugins such as User Agent Switcher, WML Browser or a check server headers tool to get the view you’re looking for.
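
If you would rather script this check than install a plugin, the same idea can be sketched in a few lines of Python: request the page while sending a spider-style user-agent header and look at the raw HTML that comes back. This is only a rough sketch. The function names are made up for illustration, the user-agent string is an assumed example of the kind Googlebot sends, and a server may still treat a script differently from a real crawler.

```python
import urllib.request

# Assumption: an illustrative Googlebot-style user-agent string. Check the
# search engine's own documentation for what its crawler currently sends.
SPIDER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_spider_request(url, user_agent=SPIDER_UA):
    """Return a Request object that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as_spider(url, user_agent=SPIDER_UA, timeout=10):
    """Fetch the raw HTML the server returns for the given user agent."""
    request = build_spider_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example usage (hypothetical URL, requires network access):
#   html = fetch_as_spider("http://www.example.com/")
#   print(html[:500])
```

Comparing the output for two different user-agent strings is a quick way to spot pages that serve different content to different visitors.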

If the page is in development and hasn’t been indexed, all hope is not lost. Simply load the HTML file in your browser and disable Flash, images and JavaScript, and you should have a fairly close view of what most spiders will see.

To disable JavaScript in IE 7.0, first select Internet Options at the bottom of the Tools menu.

[Image: Internet Options]

Next, select Manage Add-ons on the Programs tab.

[Image: Internet Options Screen]

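Finally, disable any add-ons a spider might have trouble with, such as Flash. JavaScript and images are not add-ons in IE 7.0, so turn them off elsewhere in Internet Options: disable Active scripting under Custom Level on the Security tab, and uncheck Show pictures on the Advanced tab.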

[Image: Manage Add-ons]

The resulting view should look similar to the text-only cache above:

[Image: Keebler Cache Version]
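
For a page that exists only as a local file, a similar text-only view can be approximated without touching browser settings. The sketch below uses Python's standard html.parser module to drop script and style contents, keep visible text (including any fallback text nested inside Flash object tags), and substitute alt text for images. The class and function names are hypothetical, and this is an approximation of what a spider reads, not a reproduction of any search engine's actual parser.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collect the text a visitor would see with scripts, styles and images off."""

    def __init__(self):
        super().__init__()
        self.chunks = []       # pieces of visible text, in document order
        self._skip_depth = 0   # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # With images off, the alt text is all that remains of an image.
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self.chunks.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_only_view(html):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Example usage with a hypothetical local file:
#   with open("index.html", encoding="utf-8") as f:
#       print(text_only_view(f.read()))
```

If the output is nearly empty for a page that looks rich in a browser, the content is probably locked inside Flash, images, or script-generated markup.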

Don’t like what you see? Neither do people with screen readers or on mobile devices. In fact, this view is most appealing to your competitors: content that spiders cannot read cannot count toward on-page ranking factors, so your pages will be less competitive as a result.

How do you fix it? Provide a text version of all digital content and serve it to users (and spiders) without Flash using a script like SWFObject. But most importantly, be aware that search engine spiders are rarely going to see what human beings see, and use these quick techniques in the future to help determine the difference.
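
One way to keep an eye on this during development is to check whether each Flash embed on a page actually carries a text alternative. The Python sketch below assumes the Flash content is embedded with object tags and that the alternative content is nested inside them, which is one common markup pattern but not the only one. The names are hypothetical, and it counts only top-level object elements that contain no fallback text at all.

```python
from html.parser import HTMLParser


class FallbackAudit(HTMLParser):
    """Record the fallback text found inside each top-level <object> element."""

    def __init__(self):
        super().__init__()
        self.results = []   # one string of fallback text per top-level <object>
        self._depth = 0     # nesting depth of <object> elements
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "object":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1
        elif tag == "img" and self._depth:
            # Alt text on a fallback image counts as readable content.
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self._buffer.append(alt)

    def handle_endtag(self, tag):
        if tag == "object" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.results.append(" ".join(self._buffer))

    def handle_data(self, data):
        text = data.strip()
        if self._depth and text:
            self._buffer.append(text)


def objects_missing_fallback(html):
    """Return how many top-level <object> elements have no fallback text."""
    audit = FallbackAudit()
    audit.feed(html)
    audit.close()
    return sum(1 for text in audit.results if not text)
```

A result of zero does not prove the fallback text is any good, only that something readable is there, so it is still worth reading the text-only view yourself.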


Copyright © 2008 Resolution Media, Inc. All rights reserved.