Webmaster Papers








An Introduction to Google Sitemaps


... and why I'm dying to finally get into the Google SERP

Have you also noticed that getting indexed by Google is becoming tougher and tougher, even though the Google crawler visits your site every day, and that it seems almost impossible in the short term? Between us, in the corridors of Google, they're talking about the notorious 'Google Sandbox' theory. According to this theory, a new website is first 'sandboxed' and doesn't get a ranking, even when its keywords are not particularly competitive. The Google Sandbox is said to be a filter, put in place in March 2004, that prevents new websites from having immediate success in the Google search engine result pages. This filter "is only intended to reduce search engine spam". The sandbox filter is not permanent, which means all you can do is wait, wait and wait until Google releases you from it. In the meantime, don't sit back: write original and well-optimized content, write, publish and share articles, get links placed on other websites, etc.

An example:

I started wallies.info on April 1st of this year and submitted the URL to Google, Yahoo! and MSN Search on the same day. Two months later, when I search for 'http://www.wallies.info' and 'wallies.info', Google returns 1 result for each query, Yahoo! returns 65 for each, and MSN Search returns 313 and 266 results. A remarkable difference, isn't it?! Evidently, Google has a huge backlog when it comes to indexing (new) pages. Two or three times a week I receive a Google Alert for these two searches, yet the pages it reports don't show up in the Google search engine results pages (SERP) at all.

With the introduction of Google Sitemaps (https://www.google.com/webmasters/sitemaps/), a beta service for reporting website updates, on Friday, June 3, 2005, I hope the wait in the Sandbox will get shorter. With a Sitemap, crawlers can more easily discover recently changed pages and immediately get a list of the pages that currently exist. As Google Sitemaps is released under a Creative Commons license, all search engines can make use of it. It is important to know that Google Sitemaps will not influence the calculation of your PageRank.

Sitemaps uses its own XML-based format, called the 'Sitemap Protocol'. For each URL, some additional information, such as the last modified date, can be included.

There are several methods to create your XML Sitemap:

1. The Sitemap Generator (https://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html) is a simple script that can be configured to automatically create Sitemaps and submit them to Google.

2. Make your own Sitemap script

3. With the Open Archives Initiative (OAI) protocol for metadata harvesting (http://www.openarchives.org/OAI/openarchivesprotocol.html)

4. With RSS 2.0 and Atom 0.3 syndication feeds

5. A simple list of URLs with one per line

In the current RSS era, the fourth method is obviously the most logical and the easiest. Roughly speaking, you only need to make a new XML template. For a working Sitemap example from the wallies.info blog, go to http://www.wallies.info/blog/gsm.php.

This XML Sitemap has to be submitted on the Google Sitemaps page (https://www.google.com/webmasters/sitemaps/). When you've updated the listed pages or your Sitemap has changed, you have to resubmit your Sitemap link for re-crawling. After I submitted the wallies.info Sitemap, it took between 3 and 4 hours before Google downloaded the file.

Please note that Sitemaps doesn't influence the calculation of your PageRank in any way, that Google doesn't add every submitted Sitemap URL to the Google index, and that Google doesn't guarantee anything about when or if your Sitemap pages will appear in the Google SERP.

Of course, it's easier to set up an automated job to submit this XML file.

You can do this with an automated HTTP request, as in this example (the address of your Sitemap, i.e. everything after /ping?sitemap=, has to be URL-encoded):

www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Fwww.yoursite.com%2Fsitemap.xml
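The URL encoding described above can be sketched in a few lines of Python. This is only an illustration: the function name build_ping_url is made up for this example, and the http:// scheme in front of the ping address is an assumption, since the example above shows the address without one. The sketch only builds the request URL and does not send anything.

```python
from urllib.parse import quote

# Ping address as given in the article; the "http://" scheme is an assumption.
PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping?sitemap="


def build_ping_url(sitemap_url: str) -> str:
    """Return the ping URL with the Sitemap address URL-encoded."""
    # safe="" makes quote() escape ':' and '/' too, which it would otherwise keep.
    return PING_ENDPOINT + quote(sitemap_url, safe="")


ping_url = build_ping_url("http://www.yoursite.com/sitemap.xml")
print(ping_url)
```

A scheduled job (cron, for instance) could then request this URL with any HTTP client each time the Sitemap changes.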

What is the Sitemap Protocol?

The Sitemap Protocol informs the Google search engine which pages in your website are available for crawling. A Sitemap consists of a list of URLs and may also contain additional information about those URLs, such as when they were last modified, how frequently they change, etc.

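An example of the XML Sitemap format:

    <urlset>
      <url>
        <loc>http://www.wallies.info/blog/</loc>
        <lastmod>2005-06-07T05:34:36+02:00</lastmod>
        <changefreq>daily</changefreq>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>http://www.wallies.info/blog/item/130/index.html</loc>
        <lastmod>2005-06-05T10:59:22+02:00</lastmod>
        <priority>1.0</priority>
      </url>
      ...
    </urlset>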

The XML Sitemap Format uses the following XML tags:

- urlset : this tag encapsulates all other tags of this list;

- url : this tag encapsulates the changefreq, lastmod, loc and priority tags of this list;

- changefreq (optional) : how frequently the content at the URL is likely to change. Valid values are 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly' and 'never';

- lastmod (optional) : the time the content at the URL was last modified. The timestamp has to be in an ISO 8601 format;

- loc (required) : the URL of a page on your site (fewer than 2,048 characters);

- priority (optional) : the priority of the page relative to other pages on the same site, given as a number between 0.0 and 1.0 (default 0.5). This priority is only used to select between URLs on your site. The priority of your pages will not be compared to the priority of pages on other sites.
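For the second method (making your own Sitemap script), the tags above can be put together roughly as follows. This is a minimal sketch, not Google's generator: the function name build_sitemap and the dictionary layout of the entries are invented for this example, and the namespace URI is an assumption that should be checked against the Sitemap Protocol documentation.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Assumption: namespace of the Sitemap Protocol schema; verify against the docs.
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Build a urlset document from dicts with 'loc' and optional fields."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        loc = entry["loc"]  # loc is the only required tag
        if len(loc) >= 2048:
            raise ValueError("loc must be fewer than 2,048 characters")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if "lastmod" in entry:
            # isoformat() on a timezone-aware datetime yields ISO 8601.
            ET.SubElement(url, "lastmod").text = entry["lastmod"].isoformat()
        if "changefreq" in entry:
            if entry["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError("invalid changefreq: " + entry["changefreq"])
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            if not 0.0 <= entry["priority"] <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = f"{entry['priority']:.1f}"
    return ET.tostring(urlset, encoding="unicode")


cest = timezone(timedelta(hours=2))
sitemap_xml = build_sitemap([
    {"loc": "http://www.wallies.info/blog/",
     "lastmod": datetime(2005, 6, 7, 5, 34, 36, tzinfo=cest),
     "changefreq": "daily",
     "priority": 1.0},
])
print(sitemap_xml)
```

ElementTree escapes special characters such as '&' in URLs automatically, which is one reason to prefer it over gluing strings together.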

A urlset may contain up to 50,000 URLs, and the file must not be larger than 10 MB when uncompressed. Multiple Sitemaps can be grouped in a Sitemap index file, which may list at most 1,000 Sitemaps of the same site.
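For a large site, these limits mean the URL list has to be split over several Sitemap files. A small sketch of that split, with a made-up function name and made-up example URLs; it only checks the URL and index counts, not the 10 MB file size:

```python
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 1_000


def split_into_sitemaps(urls):
    """Split a list of URLs into chunks that each fit in one Sitemap file."""
    chunks = [urls[i:i + MAX_URLS_PER_SITEMAP]
              for i in range(0, len(urls), MAX_URLS_PER_SITEMAP)]
    if len(chunks) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many Sitemaps for a single Sitemap index file")
    return chunks


# Hypothetical site with 120,000 pages.
urls = [f"http://www.yoursite.com/page/{n}" for n in range(120_000)]
chunks = split_into_sitemaps(urls)
print([len(chunk) for chunk in chunks])  # prints [50000, 50000, 20000]
```

Each chunk would then be written to its own Sitemap file, and the files listed together in one Sitemap index file.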

The Google Sitemaps URL: https://www.google.com/webmasters/sitemaps/

To leave feedback on this Sitemaps article, please feel free to visit http://www.wallies.info/blog/item/132/index.html

Walter V. is a self-employed internet entrepreneur and founder-webmaster of several websites, including wallies.info: A snappy blog about snappy blue things: blog | wiki | forum | links - http://wallies.info

mblo.gs: a snappy moblog community - http://mblo.gs
