SEO Optimization

Great Docs includes comprehensive search engine optimization (SEO) features to help your documentation rank well in search results. These features are enabled by default and work automatically when you build your site.

What’s Included

Great Docs generates and injects SEO-related files and metadata automatically:

  • sitemap.xml: helps search engines discover all your pages
  • robots.txt: guides crawler behavior and references your sitemap
  • Canonical URLs: prevent duplicate content issues
  • Meta descriptions: provide search result snippets
  • JSON-LD structured data: enables rich search results
  • Page title templates: produce a consistent Page Title | Site Name format

Auditing SEO Health

After building your site, run the SEO audit command to check for issues:

Terminal
great-docs seo

This produces a report like this:

════════════════════════════════════════════════════════════
📊 SEO Audit Results
════════════════════════════════════════════════════════════

✅ sitemap.xml: 65 URLs indexed
✅ robots.txt: includes sitemap reference
✅ robots.txt: has user-agent rules
✅ Analyzed 65 HTML pages
✅ All pages have canonical URLs
✅ All pages have meta descriptions
✅ 10 pages have JSON-LD structured data
✅ All images have alt text

────────────────────────────────────────────────────────────
✅ All SEO checks passed!
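The following minimal Python sketch shows what the per-page checks look for. It is not Great Docs’ implementation. It assumes the rendered pages are ordinary HTML and uses only the standard library’s html.parser. SeoTagCollector and audit_page are hypothetical names.

```python
from html.parser import HTMLParser


class SeoTagCollector(HTMLParser):
    """Record the SEO-relevant tags found in one rendered HTML page."""

    def __init__(self):
        super().__init__()
        self.has_canonical = False
        self.has_description = False
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "canonical" and attrs.get("href"):
            self.has_canonical = True
        elif tag == "meta" and attrs.get("name") == "description" and attrs.get("content"):
            self.has_description = True
        elif tag == "img" and "alt" not in attrs:
            # An empty alt="" counts as present: it marks decorative images.
            self.images_missing_alt += 1


def audit_page(html):
    """Return a list of problems for one page; an empty list means it passed."""
    collector = SeoTagCollector()
    collector.feed(html)
    collector.close()
    problems = []
    if not collector.has_canonical:
        problems.append("missing canonical URL")
    if not collector.has_description:
        problems.append("missing meta description")
    if collector.images_missing_alt:
        problems.append(f"{collector.images_missing_alt} image(s) without alt text")
    return problems
```

If you run audit_page over every .html file in a built site, you get a rough equivalent of the per-page lines in the report.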

Fixing Issues

Use --fix to automatically generate missing SEO files:

Terminal
great-docs seo --fix

This will create sitemap.xml and robots.txt if they’re missing.

CI Integration

For continuous integration, use --json for machine-readable output:

Terminal
great-docs seo --json
{
  "status": "pass",
  "pages_checked": 65,
  "issues": [],
  "warnings": [],
  "info": ["✅ sitemap.xml: 65 URLs indexed", "..."]
}

The command exits with code 1 if critical issues are found, making it easy to fail CI builds on SEO problems.
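You may want a stricter gate than the exit code, for example one that also fails the build on warnings. In that case you can parse the JSON report yourself. This Python sketch relies only on the fields visible in the sample output (status, issues, warnings). seo_exit_code is a hypothetical helper, not part of Great Docs.

```python
import json


def seo_exit_code(report_text, fail_on_warnings=False):
    """Map a `great-docs seo --json` report to a process exit code (0 = pass)."""
    report = json.loads(report_text)
    # Critical problems: a non-passing status or any reported issue.
    if report.get("status") != "pass" or report.get("issues"):
        return 1
    # Optionally treat warnings as fatal as well.
    if fail_on_warnings and report.get("warnings"):
        return 1
    return 0
```

In a CI step, you could pipe the report into a small script that calls sys.exit(seo_exit_code(sys.stdin.read(), fail_on_warnings=True)).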

Configuration

All SEO settings live under the seo key in great-docs.yml. Here’s the full configuration with defaults:

great-docs.yml
seo:
  enabled: true  # Master switch for all SEO features

  sitemap:
    enabled: true
    changefreq:
      homepage: weekly
      reference: monthly
      user_guide: monthly
      changelog: weekly
      default: monthly
    priority:
      homepage: 1.0
      reference: 0.8
      user_guide: 0.9
      changelog: 0.6
      default: 0.5

  robots:
    enabled: true
    allow_all: true
    disallow: []
    crawl_delay: null
    extra_rules: []

  canonical:
    enabled: true
    base_url: null  # Auto-detected from GitHub Pages

  title_template: "{page_title} | {site_name}"

  structured_data:
    enabled: true
    type: SoftwareSourceCode

  default_description: null  # Falls back to package description

Most users won’t need to change these defaults, which are tuned for typical Python documentation sites.

Sitemap Configuration

The sitemap tells search engines about all your pages and how often they change. Great Docs assigns each page a priority and a change frequency based on its page type.
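Each entry in a sitemap follows the sitemaps.org protocol: a <url> element containing <loc>, <changefreq>, and <priority>. The Python sketch below uses the standard library’s ElementTree to show that structure. It illustrates the file format and is not Great Docs’ own generator. build_sitemap is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Render (loc, changefreq, priority) tuples as sitemap.xml text."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in URLs automatically.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    ("https://username.github.io/repo/", "weekly", 1.0),
    ("https://username.github.io/repo/reference/index.html", "monthly", 0.8),
]))
```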

Page Types

Pages are automatically categorized:

Type         Example paths                        Default priority   Default changefreq
homepage     index.html                           1.0                weekly
user_guide   user-guide/*.html, recipes/*.html    0.9                monthly
reference    reference/*.html                     0.8                monthly
changelog    changelog.html                       0.6                weekly
default      Everything else                      0.5                monthly
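The exact matching rules aren’t spelled out here, so the following Python sketch makes an assumption. It matches site-relative paths against the example patterns in the table using shell-style globs, and the first match wins. classify is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Defaults from the table above: page type -> (glob patterns, priority, changefreq).
PAGE_TYPES = {
    "homepage": (["index.html"], 1.0, "weekly"),
    "user_guide": (["user-guide/*.html", "recipes/*.html"], 0.9, "monthly"),
    "reference": (["reference/*.html"], 0.8, "monthly"),
    "changelog": (["changelog.html"], 0.6, "weekly"),
}
FALLBACK = ("default", 0.5, "monthly")


def classify(path):
    """Return (page_type, priority, changefreq) for a site-relative path."""
    for page_type, (patterns, priority, changefreq) in PAGE_TYPES.items():
        # fnmatch's "*" also matches "/", so nested pages match their section too.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return page_type, priority, changefreq
    return FALLBACK
```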

Customizing Priorities

Adjust priorities based on what’s most important for your site:

great-docs.yml
seo:
  sitemap:
    priority:
      homepage: 1.0
      user_guide: 0.9  # Your tutorials are most valuable
      reference: 0.7   # API docs are secondary

Customizing Change Frequencies

If your reference documentation changes frequently:

great-docs.yml
seo:
  sitemap:
    changefreq:
      reference: weekly  # API changes often
      changelog: daily   # Frequent releases

Robots.txt Configuration

The robots.txt file tells search engine crawlers which paths they may crawl and where to find your sitemap.

Default Behavior

By default, Great Docs generates a permissive robots.txt:

robots.txt
# Robots.txt generated by Great Docs

User-agent: *
Allow: /

Sitemap: https://username.github.io/repo/sitemap.xml
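You can sanity-check a robots.txt file with Python’s standard-library urllib.robotparser. This checks what the file means and doesn’t call Great Docs. The parser applies a simple first-match reading of the rules. Crawlers that follow RFC 9309, such as Googlebot, use the most specific match instead, so results can differ when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_ROBOTS = """\
# Robots.txt generated by Great Docs

User-agent: *
Allow: /

Sitemap: https://username.github.io/repo/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS.splitlines())

# Any path is allowed for any user agent.
print(parser.can_fetch("Googlebot", "https://username.github.io/repo/reference/index.html"))  # → True
# The Sitemap line is picked up (site_maps() needs Python 3.8+).
print(parser.site_maps())  # → ['https://username.github.io/repo/sitemap.xml']
```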

Blocking Paths

To keep crawlers out of specific paths (e.g., draft pages):

great-docs.yml
seo:
  robots:
    disallow:
      - /drafts/
      - /_internal/

Blocking AI Crawlers

Some projects prefer to block AI training crawlers:

great-docs.yml
seo:
  robots:
    extra_rules:
      - "User-agent: GPTBot"
      - "Disallow: /"
      - "User-agent: CCBot"
      - "Disallow: /"
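These rules only deter crawlers that honor robots.txt. To confirm that the rules block only the named bots, you can parse the result with urllib.robotparser. This sketch assumes the extra rules are appended as their own groups after the default User-agent: * group.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://username.github.io/repo/index.html"
for agent in ("GPTBot", "CCBot", "Googlebot"):
    print(agent, parser.can_fetch(agent, page))
# GPTBot False
# CCBot False
# Googlebot True
```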

Setting Crawl Delay

For sites with limited bandwidth:

great-docs.yml
seo:
  robots:
    crawl_delay: 10  # Seconds between requests
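Crawl-delay is a non-standard extension that isn’t part of RFC 9309, and some major crawlers, Googlebot among them, ignore it. Crawlers that do support it read the value per user-agent group. You can check this with urllib.robotparser. The file below is illustrative and assumes the directive lands in the User-agent: * group.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.crawl_delay("*"))  # → 10 (seconds, for crawlers that honor it)
```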

Canonical URLs

Canonical URLs tell search engines which version of a page is the “official” one, preventing duplicate content issues when your site is accessible via multiple URLs.

Auto-Detection

Great Docs automatically generates canonical URLs based on your GitHub repository. If your repo is github.com/username/repo, the canonical base URL will be:

https://username.github.io/repo/
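The mapping is simple enough to sketch in Python. github_pages_base_url is a hypothetical helper, not Great Docs’ code. It also covers GitHub’s special case of a repository named <owner>.github.io, which GitHub Pages serves at the site root. Whether Great Docs handles that case isn’t covered on this page.

```python
from urllib.parse import urlparse


def github_pages_base_url(repo_url):
    """Map a GitHub repository URL to its default GitHub Pages base URL."""
    if "://" not in repo_url:
        repo_url = "https://" + repo_url  # accept "github.com/username/repo" too
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {repo_url!r}")
    owner, name = parts[0].lower(), parts[1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    host = f"{owner}.github.io"  # hostnames are case-insensitive; keep them lowercase
    if name.lower() == host:
        return f"https://{host}/"  # a user/organization site is served at the root
    return f"https://{host}/{name}/"
```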

Manual Configuration

For custom domains or non-GitHub hosting, set the base URL explicitly:

great-docs.yml
seo:
  canonical:
    base_url: https://docs.myproject.com/

Include the trailing slash; Great Docs adds one if it’s missing.
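The trailing slash matters because relative URLs resolve against the base’s last directory. Python’s urllib.parse.urljoin shows the effect. canonical_url is a hypothetical helper that normalizes the slash before joining.

```python
from urllib.parse import urljoin


def canonical_url(base_url, page_path):
    """Join a site-relative page path onto the canonical base URL."""
    if not base_url.endswith("/"):
        base_url += "/"  # without this, urljoin drops the last path segment
    # Strip a leading "/" so the path stays under the base instead of the host root.
    return urljoin(base_url, page_path.lstrip("/"))


print(urljoin("https://username.github.io/repo", "reference/index.html"))
# → https://username.github.io/reference/index.html  (the /repo segment is lost)
print(canonical_url("https://username.github.io/repo", "reference/index.html"))
# → https://username.github.io/repo/reference/index.html
```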

Page Titles

Great Docs applies a consistent title template to all pages, improving brand recognition in search results.

Default Template

The default template is {page_title} | {site_name}, which produces titles like:

  • Installation | My Package
  • GreatDocs.build | My Package
  • Configuration | My Package

Custom Templates

Change the separator or format:

great-docs.yml
seo:
  title_template: "{page_title} - {site_name}"

Or remove the site name entirely:

great-docs.yml
seo:
  title_template: "{page_title}"
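The placeholders look like Python str.format fields. This page doesn’t specify how they are filled. Assuming they behave like str.format fields, the three templates above render as follows. render_title is a hypothetical helper.

```python
def render_title(template, page_title, site_name):
    """Fill a title_template string; unused placeholders are simply ignored."""
    return template.format(page_title=page_title, site_name=site_name)


print(render_title("{page_title} | {site_name}", "Installation", "My Package"))  # → Installation | My Package
print(render_title("{page_title} - {site_name}", "Configuration", "My Package"))  # → Configuration - My Package
print(render_title("{page_title}", "Installation", "My Package"))  # → Installation
```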

Structured Data (JSON-LD)

Great Docs injects JSON-LD structured data into your pages, enabling rich search results with additional context about your software.

What Gets Injected

On the homepage and reference pages, Great Docs adds:

{
  "@context": "https://schema.org",
  "@type": "SoftwareSourceCode",
  "name": "My Package",
  "description": "A Python package for...",
  "codeRepository": "https://github.com/username/repo",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Python"
  }
}
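The following Python sketch builds a tag like this, assuming the property set shown above with only the name, description, and repository varying. json_ld_script is hypothetical. Escaping < as \u003c is a common precaution so that text in the description can’t close the <script> element early. Note that codeRepository is a SoftwareSourceCode property, so other schema types use different properties.

```python
import json


def json_ld_script(name, description, repo_url):
    """Build a <script type="application/ld+json"> tag like the example above."""
    data = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repo_url,
        "programmingLanguage": {"@type": "ComputerLanguage", "name": "Python"},
    }
    # Escape "<" so a description can't close the <script> element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```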

Customizing the Schema Type

For different types of documentation:

great-docs.yml
seo:
  structured_data:
    type: WebSite  # Or: SoftwareApplication, APIReference, etc.

Disabling Structured Data

If you prefer not to include JSON-LD:

great-docs.yml
seo:
  structured_data:
    enabled: false

Meta Descriptions

Meta descriptions appear in search result snippets. Great Docs automatically generates them from your page content.

Auto-Generation

For each page, Great Docs extracts a description from:

  1. the first meaningful paragraph in the page content
  2. the default description, if no suitable content is found
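This page doesn’t define “meaningful” precisely, so this Python sketch picks illustrative thresholds. It takes the first <p> with at least 40 characters of text and trims it at a word boundary to at most 160 characters. Both numbers and the names FirstParagraph and extract_description are assumptions, not Great Docs’ actual rules.

```python
from html.parser import HTMLParser

MIN_CHARS = 40   # assumed threshold for a "meaningful" paragraph
MAX_CHARS = 160  # assumed maximum snippet length


class FirstParagraph(HTMLParser):
    """Capture the text of the first <p> that is long enough to describe a page."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.buffer = []
        self.result = None

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.result is None:
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if len(text) >= MIN_CHARS:
                self.result = text

    def handle_data(self, data):
        if self.in_paragraph:
            self.buffer.append(data)


def extract_description(html, fallback):
    """Return the first meaningful paragraph, else the fallback description."""
    parser = FirstParagraph()
    parser.feed(html)
    parser.close()
    text = parser.result or fallback
    if len(text) > MAX_CHARS:
        # Cut at the last word boundary and mark the truncation.
        text = text[: MAX_CHARS - 1].rsplit(" ", 1)[0] + "…"
    return text
```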

Setting a Default Description

Configure a fallback description for pages without extractable content:

great-docs.yml
seo:
  default_description: "Documentation for My Package, a Python library for..."

If not set, the package description from pyproject.toml is used.

Page-Level Descriptions

For individual pages, add a description field in the YAML frontmatter:

user_guide/01-installation.qmd
---
title: "Installation"
description: "How to install My Package using pip, conda, or from source."
---
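Taken together, these sources suggest a plausible order of precedence. The frontmatter description comes first, then the extracted paragraph, then seo.default_description, and finally the pyproject.toml description. This page doesn’t state that order explicitly, so treat resolve_description below as a hypothetical sketch of that assumption.

```python
def resolve_description(frontmatter, extracted, default_description, package_description):
    """Pick a page's meta description, most specific source first (assumed order)."""
    for candidate in (
        frontmatter.get("description"),  # page-level description: field
        extracted,                        # first meaningful paragraph
        default_description,              # seo.default_description
        package_description,              # pyproject.toml description
    ):
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```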

Noindex for Internal Pages

Some pages shouldn’t appear in search results. Great Docs automatically adds noindex directives to internal pages like the Skills page.

Manual Noindex

To keep a specific page out of search results, add a robots field to its frontmatter:

drafts/experimental-feature.qmd
---
title: "Experimental Feature"
robots: "noindex, nofollow"
---
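To confirm that the directive made it into the built page, look for a robots meta tag in the rendered HTML. This sketch assumes the frontmatter field is emitted as <meta name="robots" content="...">. RobotsMetaReader and is_noindexed are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())


def is_noindexed(html):
    """True if the page carries a noindex directive in a robots meta tag."""
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return "noindex" in reader.directives
```

For search engines such as Google, noindex only takes effect if the crawler is allowed to fetch the page, so avoid also listing the same path under robots disallow.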

Disabling SEO Features

To completely disable SEO generation:

great-docs.yml
seo:
  enabled: false

Or disable specific features:

great-docs.yml
seo:
  sitemap:
    enabled: false
  robots:
    enabled: false
  canonical:
    enabled: false
  structured_data:
    enabled: false

Best Practices

For Maximum SEO Effectiveness

  1. set a base URL: either through your GitHub repository or explicit configuration
  2. write good descriptions: add description: to the frontmatter of key pages
  3. use clear, descriptive titles: concise page titles tend to improve click-through rates
  4. add alt text to images: Great Docs audits this and reports any images missing alt text
  5. audit before deployment: run great-docs seo to catch issues early

For GitHub Pages

If you’re deploying to GitHub Pages, SEO works out of the box:

  1. your canonical base URL is auto-detected from the repository
  2. the sitemap is automatically referenced in robots.txt
  3. all pages get proper canonical links

For Custom Domains

When using a custom domain, set the canonical base URL explicitly:

great-docs.yml
seo:
  canonical:
    base_url: https://docs.myproject.com/

Also make sure your DNS is configured correctly (see Adding a Custom Domain).