Building XML Sitemaps for Django Web Applications

What Is a Sitemap?

A sitemap is a structured catalog of the URLs on a website, created to help search engine crawlers navigate and index the site efficiently. It is especially valuable for sites with deep page hierarchies, where crawlers might otherwise miss some content. Standard XML sitemaps are usually hosted at the domain root under the filename sitemap.xml. A minimal valid sitemap looks like this:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url>
		<loc>http://example.com/blog/post/63</loc>
		<lastmod>2020-12-01</lastmod>
		<changefreq>weekly</changefreq>
		<priority>0.5</priority>
	</url>
	<url>
		<loc>http://example.com/blog/post/61</loc>
		<lastmod>2020-12-01</lastmod>
		<changefreq>weekly</changefreq>
		<priority>0.5</priority>
	</url>
</urlset>
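
To make the structure above concrete, here is a minimal sketch that produces the same kind of document with Python's standard library only. It is not how Django builds sitemaps internally; the function name build_sitemap and the sample entries are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and are skipped when missing.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical entries mirroring the snippet above.
sample_entries = [
    {"loc": "http://example.com/blog/post/63", "lastmod": "2020-12-01",
     "changefreq": "weekly", "priority": 0.5},
    {"loc": "http://example.com/blog/post/61", "lastmod": "2020-12-01"},
]
sitemap_xml = build_sitemap(sample_entries)
print(sitemap_xml)
```

Only loc is mandatory for each url element; the other three tags are optional hints for crawlers, which is why the sketch skips them when they are absent.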

Django includes a robust built-in framework for generating XML sitemaps. You can implement a fully functional sitemap by defining custom Sitemap classes and adding a single URL route.

Installation & Setup

First, add the required Django apps to your INSTALLED_APPS in settings.py:

INSTALLED_APPS = [
    'django.contrib.sitemaps',
    'django.contrib.sites',
    # Your other installed apps here
]
SITE_ID = 1

Note: The django.contrib.sites app manages multiple site configurations for a single Django instance, and the sitemap framework uses the current site's domain to build the full URLs in the sitemap. After updating your settings, run the database migrations to create the required site tables:

python manage.py migrate

Next, log into the Django admin dashboard, locate the Sites entry (example.com by default), and update it with your website's actual domain and display name.

Implementing Sitemaps

Step 1: Create Sitemap Classes

Create a sitemap.py file in your Django app directory. Below is a complete example with sitemap configurations for different content types:

from django.contrib.sitemaps import Sitemap
from django.urls import reverse
from .models import Post, Tag, Category, UserProfile

class StaticPageSitemap(Sitemap):
    priority = 0.5
    changefreq = 'daily'

    def items(self):
        return ['blog:search', 'blog:about', 'blog:archives']

    def location(self, item):
        return reverse(item)

class PostSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.5
    # Uncomment to force HTTPS URLs
    # protocol = 'https'

    def items(self):
        return Post.objects.all()

    def lastmod(self, obj):
        return obj.updated_at

    def location(self, obj):
        return f'/blog/post/{obj.id}'

class CategorySitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.6

    def items(self):
        return Category.objects.all()

    def lastmod(self, obj):
        return obj.created_at

    # Omit this method if your Category model has a get_absolute_url() method
    def location(self, obj):
        return f'/blog/category/{obj.slug}'

class TagSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.3

    def items(self):
        return Tag.objects.all()

    def lastmod(self, obj):
        return obj.created_at

    def location(self, obj):
        return f'/blog/tag/{obj.slug}'

class UserProfileSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.3

    def items(self):
        return UserProfile.objects.all()

    def lastmod(self, obj):
        return obj.joined_at

    def location(self, obj):
        return f'/user/profile/{obj.id}'

Step 2: Configure URL Routes

Update your project's urls.py file to register the sitemap endpoint:

from django.urls import path
from django.contrib.sitemaps.views import sitemap
from blog.sitemap import (
    PostSitemap, CategorySitemap, TagSitemap, UserProfileSitemap, StaticPageSitemap
)

sitemap_config = {
    'posts': PostSitemap,
    'categories': CategorySitemap,
    'tags': TagSitemap,
    'users': UserProfileSitemap,
    'static': StaticPageSitemap
}

urlpatterns = [
    # Your other URL routes here
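    # This route name is the one ping_google() looks up when auto-detecting the sitemap URL
    path('sitemap.xml', sitemap, {'sitemaps': sitemap_config}, name='django.contrib.sitemaps.views.sitemap'),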
]

With the development server running, visit http://127.0.0.1:8000/sitemap.xml to view the generated sitemap. Once the site is deployed, submit the sitemap's public URL to search engines like Google.
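
One quick way to sanity-check the generated output is to parse it and list the URLs it contains. The sketch below uses only the standard library; extract_locations and the sample response body are hypothetical, and in practice you would feed in the actual contents of /sitemap.xml.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping used for the XPath-style lookups below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(sitemap_xml):
    """Return every <loc> value found in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


# Hypothetical response body standing in for the real /sitemap.xml output.
sample_response = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/blog/post/63</loc><priority>0.5</priority></url>
  <url><loc>http://example.com/blog/category/django</loc></url>
</urlset>"""

for location in extract_locations(sample_response):
    print(location)
```

If the printed URLs still show example.com as the domain, the Sites entry in the admin has probably not been updated yet.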

Sitemap Class Reference

All custom sitemap classes inherit from django.contrib.sitemaps.Sitemap, and can include the following attributes and methods:

  1. items(): Required. Returns a list or QuerySet of objects to include in the sitemap. The framework does not restrict the object type; each object is simply passed on to the other sitemap methods, such as location() and lastmod().
  2. location(): Optional. Can be either a method or a class attribute. If a method, it returns the absolute URL path (without protocol or domain) for an object from items(). If an attribute, it uses a fixed string for all objects. If omitted, the framework will call get_absolute_url() on each object automatically. Valid paths look like /blog/post/123/, not full URLs like https://example.com/blog/post/123/.
  3. lastmod(): Optional. A method or attribute that returns the last modified time of an object, as a datetime or date.
  4. changefreq(): Optional. A method or attribute that describes how often the content changes. Valid values are: always, hourly, daily, weekly, monthly, yearly, never.
  5. priority(): Optional. A method or attribute that sets the search priority of the URL relative to other site pages, ranging from 0.0 to 1.0. The default value is 0.5.
  6. protocol: Optional. Forces the URL protocol to either http or https for all sitemap entries.
  7. limit: Optional. Sets the maximum number of URLs per sitemap page. The default is 50000, which matches the per-file limit of the sitemaps protocol.
  8. i18n: Optional. Boolean value that determines if the sitemap should be generated for all active languages. Defaults to False.
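
The constraints in the list above can be summarized in a small standalone sketch. The helpers check_entry and full_url are hypothetical and not part of Django; they only illustrate which values are acceptable and, roughly, how a protocol, a domain, and a location path combine into the final URL.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def check_entry(path, changefreq=None, priority=None):
    """Return a list of problems found in one sitemap entry's values."""
    problems = []
    if not path.startswith("/"):
        problems.append("location must be an absolute path such as /blog/post/123/")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append(f"unknown changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        problems.append(f"priority out of range: {priority!r}")
    return problems


def full_url(protocol, domain, path):
    """Join protocol, domain and path into the URL written to <loc>."""
    return f"{protocol}://{domain}{path}"


print(check_entry("/blog/post/123/", "weekly", 0.5))
print(check_entry("https://example.com/blog/post/123/", "sometimes", 1.5))
print(full_url("https", "example.com", "/blog/post/123/"))
```

Returning a path from location() and letting the framework add the protocol and domain means the same sitemap classes work unchanged across development, staging, and production domains.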

Notifying Search Engines

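You can automatically notify search engines like Google when your sitemap updates using Django's built-in ping_google() function. Note that this function was deprecated in Django 4.2 and removed in Django 5.0, following Google's retirement of its sitemap ping endpoint, so this section applies only to older Django versions. Import it as follows: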

from django.contrib.sitemaps import ping_google

The function accepts an optional sitemap_url parameter for the full absolute URL of your sitemap. If omitted, Django will attempt to auto-detect the sitemap URL by reversing the sitemap route, and raises SitemapNotFound if the URL cannot be resolved.

A common implementation is to call the function after saving a model instance:

from django.db import models
from django.contrib.sitemaps import ping_google

class Post(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    updated_at = models.DateTimeField(auto_now=True)

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        try:
            ping_google()
        except Exception:
            # Suppress network or lookup errors to avoid breaking save operations
            pass

For better performance, use a cron job or scheduled task runner to call ping_google() periodically instead of on every model save, to reduce unnecessary HTTP requests to Google's servers.
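
One possible way to structure the periodic approach is a "dirty flag": saves only record that something changed, and the scheduled job sends at most one notification per run. The sketch below is framework-independent and makes several assumptions: the class name SitemapPingScheduler is hypothetical, the ping callable is injected (it could wrap ping_google() on Django versions that still ship it), and the in-memory flag only works within a single process.

```python
import threading


class SitemapPingScheduler:
    """Collects "content changed" signals and pings at most once per run."""

    def __init__(self, ping):
        self._ping = ping          # callable that performs the real notification
        self._dirty = False        # True when content changed since the last ping
        self._lock = threading.Lock()

    def mark_changed(self):
        """Call this from save() instead of pinging directly."""
        with self._lock:
            self._dirty = True

    def run_once(self):
        """Call this from the scheduled job; returns True if a ping was sent."""
        with self._lock:
            if not self._dirty:
                return False
            self._dirty = False
        try:
            self._ping()
        except Exception:
            # Keep the change pending so the next scheduled run retries.
            with self._lock:
                self._dirty = True
            return False
        return True


# Usage with a stand-in ping function that just records each call.
sent = []
scheduler = SitemapPingScheduler(ping=lambda: sent.append("ping"))
scheduler.mark_changed()
scheduler.mark_changed()
scheduler.run_once()   # sends one ping covering both changes
scheduler.run_once()   # nothing changed since the last run, so no ping
print(len(sent))
```

In a multi-process deployment the flag would need to live in shared storage such as the database or a cache, which this sketch does not cover.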
