signalkraft

Sitemaps for CodeIgniter

Google, Yahoo, Bing & Others

Download Sitemaps for CodeIgniter

Version 0.7

Licensed under GNU GPL v2

This library for the popular PHP framework CodeIgniter generates XML sitemaps and informs search engines and other web services of new content for them to crawl.

With live search around the corner and a relatively new, yet widely adopted and open standard for Sitemaps, now is the time to create a Sitemap for your CodeIgniter application!

Important Features

  • Easy to use: See the example to get started,
  • Tiny footprint: Only library and config are necessary,
  • Scalability: Split the workload and update only when necessary (see a word about performance),
  • Open standard: Follows the Sitemaps XML protocol 0.9, compatible with any modern search engine,
  • Google, Yahoo, Bing & Ask.com: By default this library uses Sitemap Writer to spread your sitemap (this can be changed in the config),
  • Sitemap Indexes: Build an index of all your sitemaps (see Sitemap Index),
  • Autodiscovery: Web services will find your sitemap through robots.txt (see Autodiscovery).

This library is based on the work of Svetoslav Marinov but performs faster (objects were dropped in favor of arrays), supports a newer, non-proprietary standard and, in my opinion, fits the CodeIgniter philosophy much better (library versus plugin).

Installation

Download the library and extract the contents to your system/application folder. Modify the config file to suit your particular application, then load and use the library as you would any other. See the example below for a sample controller.

Example

In this example we will assume you have a blog you wish to create a sitemap for. Your blog has a model “posts_model” that contains all the articles you have written.

application/controllers/sitemap.php
<?php

class Sitemap extends Controller
{
	function Sitemap()
	{
		parent::Controller();
	}
	
	function index()
	{
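		// site_url() and redirect() come from the URL helper;
		// this line is redundant if the helper is already autoloaded
		$this->load->helper('url');
		$this->load->model('posts_model');
		$this->load->library('sitemaps');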
		
		$posts = $this->posts_model->get_posts();
		
		foreach ($posts as $post)
		{
			$item = array(
				"loc" => site_url("blog/" . $post->slug),
				// ISO 8601 format - date("c") requires PHP5
				"lastmod" => date("c", strtotime($post->last_modified)),
				"changefreq" => "hourly",
				"priority" => "0.8"
			);
			
			$this->sitemaps->add_item($item);
		}
		
		// file name may change due to compression
		$file_name = $this->sitemaps->build("sitemap_blog.xml");

		$responses = $this->sitemaps->ping(site_url($file_name));
		
		// Debug by printing out the requests and status code responses
		// print_r($responses);

		redirect(site_url($file_name));
	}
}

Call the controller and you should be redirected to your freshly built sitemap.

If you want to see HTTP response codes and the requests that were sent by the ping function, uncomment the print_r line in the example.
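For orientation, the following sketch shows roughly what the entries in a generated file look like under the Sitemaps XML protocol 0.9. It is a standalone illustration and not the library's own code: the function name render_sitemap() is hypothetical, and it only assumes item arrays with the same keys as in the example controller.

```php
<?php
// Standalone sketch, NOT the library's implementation: render item arrays
// (same keys as in the example controller) as Sitemaps 0.9 XML.
// render_sitemap() is a hypothetical name used only for illustration.
function render_sitemap(array $items)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($items as $item) {
        $xml .= "  <url>\n";
        // Emit the elements in the order the protocol documents them;
        // only <loc> is required, the others are skipped when absent.
        foreach (array('loc', 'lastmod', 'changefreq', 'priority') as $key) {
            if (isset($item[$key])) {
                // Escape &, <, > and quotes so URLs with query strings stay valid XML.
                $value = htmlspecialchars($item[$key], ENT_QUOTES, 'UTF-8');
                $xml  .= "    <$key>$value</$key>\n";
            }
        }
        $xml .= "  </url>\n";
    }

    return $xml . "</urlset>\n";
}
```

Passing a single item with only "loc" and "priority" set yields one url element containing just those two children, which is a handy way to sanity-check the values you feed into add_item().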

Of course you will want to update your sitemap frequently. On *nix operating systems a cron job is the obvious choice. To update every ten minutes, run crontab -e and add this line:

crontab -e
# m h  dom mon dow   command
*/10  *  *  *  *  /usr/bin/wget -q -O /dev/null http://www.example.com/sitemap

With large sites and frequent changes, consider regenerating the sitemap only when content is created, updated or deleted. In this case, arranging your data into several sitemaps can further reduce the workload. For example, a sitemap with all your blog posts can be updated separately from a sitemap containing your infrequently modified static pages.
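One way to avoid needless rebuilds is to compare the sitemap file's modification time with the time of the most recent content change. The helper below is a minimal sketch of that idea. It is not part of the library; the name sitemap_is_stale() is hypothetical, and it assumes you can supply the latest change as a Unix timestamp (for example from strtotime() on the newest last_modified value).

```php
<?php
// Hypothetical helper (not part of the library): decide whether a sitemap
// file needs rebuilding, given the newest modification time of its content.
// $latest_change is assumed to be a Unix timestamp.
function sitemap_is_stale($file_path, $latest_change)
{
    // A missing file always needs to be built.
    if (!is_file($file_path)) {
        return true;
    }

    // Rebuild only if some content changed after the file was last written.
    return filemtime($file_path) < $latest_change;
}
```

In a controller you could wrap the add_item()/build() calls in a check such as if (sitemap_is_stale($path, $newest)) { ... } and otherwise simply redirect to the existing file.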

To stitch all those sitemaps together into something the search engines can handle, you will need another type of file, the sitemap index:

Sitemap Index

Sitemap indexes are technically only needed if your sitemap exceeds 50,000 URLs or 10 MB uncompressed file size, whichever comes first. In this case you need to build several smaller sitemaps and index their locations in a separate file, the sitemap index. This file is then treated as a normal sitemap:

$sitemaps = array(
	array("loc" => site_url("sitemap_posts.xml.gz"), "lastmod" => date("c")),
	array("loc" => site_url("sitemap_pages.xml.gz"))
);

$index_file_name = $this->sitemaps->build_index($sitemaps, "sitemap_index.xml");
$responses = $this->sitemaps->ping(site_url($index_file_name));

redirect(site_url($index_file_name));

For the sake of completeness I should mention that your sitemap index mustn’t exceed 50,000 sitemaps or 10 MB uncompressed file size. If you manage to hit this 2.5 × 10^9 URL limitation (50,000 sitemaps of 50,000 URLs each) on CodeIgniter you should probably pause for a minute and contemplate just what got you into this position and how you may avoid it in the future.
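If a single content type does outgrow the 50,000-URL limit, the items have to be split across several files before an index can be built. The sketch below shows one possible way to plan that split with array_chunk(). The function name plan_sitemap_files() and the file name pattern are assumptions made for illustration, and the sketch only handles the URL count, not the 10 MB size limit.

```php
<?php
// Sketch only: split a large item list into groups that respect the
// 50,000-URL limit and assign one sitemap file name per group.
// plan_sitemap_files() and the "<prefix>_<n>.xml" pattern are hypothetical.
function plan_sitemap_files(array $items, $prefix = 'sitemap_posts', $limit = 50000)
{
    $plan = array();

    // array_chunk() yields consecutive slices of at most $limit items.
    foreach (array_chunk($items, $limit) as $i => $chunk) {
        $plan[$prefix . '_' . ($i + 1) . '.xml'] = $chunk;
    }

    return $plan;
}
```

Each group can then be fed through add_item() and build() under its planned name, and the file names that build() returns collected into the array passed to build_index().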

Autodiscovery

Autodiscovery is a neat feature by which other web services can find your sitemap. You just need to create a robots.txt file in the root directory of your website (http://www.example.com/robots.txt) containing the following line:

robots.txt
Sitemap: http://www.example.com/sitemap.xml

If you already have a robots.txt, put this directive anywhere; it is independent of user-agent.
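If you would rather add the directive from code than by hand, something along these lines might do. This is a hedged sketch, not a feature of the library: with_sitemap_directive() is a hypothetical helper that works on the robots.txt content as a string, and reading and writing the actual file is left to you.

```php
<?php
// Hypothetical helper: return robots.txt content with a Sitemap directive
// for $sitemap_url, appending one only if that exact line is not present yet.
function with_sitemap_directive($robots_txt, $sitemap_url)
{
    $directive = 'Sitemap: ' . $sitemap_url;

    // Compare line by line so an existing directive is not duplicated.
    foreach (preg_split('/\r\n|\r|\n/', $robots_txt) as $line) {
        if (trim($line) === $directive) {
            return $robots_txt;
        }
    }

    // Make sure the directive starts on its own line.
    $separator = ($robots_txt === '' || substr($robots_txt, -1) === "\n") ? '' : "\n";

    return $robots_txt . $separator . $directive . "\n";
}
```

Because the directive is independent of user-agent, appending it at the end of the file is sufficient.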

For further information visit the Sitemaps XML Protocol website or look at the source code of this library. All functions and parameters are documented.

Changelog

  • Version 0.7:
    • Fixed a bug in add_item_array()
    • Improved documentation
  • Version 0.6:
    • Added sitemap_index function
  • Version 0.5:
    • First pub­lic release