SEO Optimization

Documentation is only useful if people can find it. When someone searches for how to install your package or call a specific function, your docs should appear near the top of the results. Good SEO ensures that search engines can crawl, index, and accurately represent your pages.

Great Docs includes comprehensive SEO features that are enabled by default and work automatically when you build your site. Sitemaps, canonical URLs, meta descriptions, structured data, and robots directives are all generated without any configuration.

What’s Included

Great Docs generates and injects SEO-related files and metadata automatically:

sitemap.xml: helps search engines discover all your pages
robots.txt: guides crawler behavior and references your sitemap
Canonical URLs: prevents duplicate content issues
Meta descriptions: provides search result snippets
JSON-LD structured data: enables rich search results
Page title templates: consistent Page Title | Site Name format

All of these are generated at build time. You can customize any of them or disable features you don’t need.

Auditing SEO Health

After building your site, you can audit the generated output to verify that all SEO features are in place. Run the SEO audit command to check for issues:

Terminal

great-docs seo

The command prints a report like this:

════════════════════════════════════════════════════════════
📊 SEO Audit Results
════════════════════════════════════════════════════════════

✅ sitemap.xml: 65 URLs indexed
✅ robots.txt: includes sitemap reference
✅ robots.txt: has user-agent rules
✅ Analyzed 65 HTML pages
✅ All pages have canonical URLs
✅ All pages have meta descriptions
✅ 10 pages have JSON-LD structured data
✅ All images have alt text

────────────────────────────────────────────────────────────
✅ All SEO checks passed!

The audit checks every HTML page in the built site and reports missing or malformed SEO elements.

Fixing Issues

Use --fix to automatically generate missing SEO files:

Terminal

great-docs seo --fix

This creates sitemap.xml and robots.txt if they’re missing and patches any fixable issues in the built output.

CI Integration

For continuous integration, use --json for machine-readable output:

Terminal

great-docs seo --json

{
  "status": "pass",
  "pages_checked": 65,
  "issues": [],
  "warnings": [],
  "info": ["✅ sitemap.xml: 65 URLs indexed", "..."]
}

The command exits with code 1 if critical issues are found, making it easy to fail CI builds on SEO problems. Pair this with great-docs lint and great-docs links for a comprehensive pre-deployment check.
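
In a CI job, the JSON report can be reduced to a pass/fail decision with a few lines of Python. The following is a minimal sketch that assumes the report has the shape shown above. The helper names (`summarize_report`, `run_audit`) are illustrative, and two things are assumptions rather than documented behavior: that any status other than "pass" should fail the build, and that the JSON is written to standard output.

```python
import json
import subprocess


def summarize_report(report_text: str) -> tuple[bool, list[str]]:
    """Return (passed, messages) for a report shaped like the example above."""
    report = json.loads(report_text)
    # Assumption: any status other than "pass" should fail the build.
    passed = report.get("status") == "pass"
    messages = [f"issue: {item}" for item in report.get("issues", [])]
    messages += [f"warning: {item}" for item in report.get("warnings", [])]
    return passed, messages


def run_audit() -> tuple[bool, list[str]]:
    """Run the audit and summarize it (assumes the JSON goes to stdout)."""
    result = subprocess.run(
        ["great-docs", "seo", "--json"], capture_output=True, text=True
    )
    return summarize_report(result.stdout)


# Demonstration on a hand-written sample report, not real tool output.
sample = '{"status": "pass", "pages_checked": 65, "issues": [], "warnings": []}'
passed, messages = summarize_report(sample)
print("SEO audit passed" if passed else "SEO audit failed")
for message in messages:
    print(message)
```

A CI step could call `run_audit()` after the build and exit with a non-zero code when `passed` is false, which complements the command's own exit code.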

Configuration

All SEO settings live under the seo key in great-docs.yml. Here’s the full configuration with defaults:

great-docs.yml

seo:
  enabled: true  # Master switch for all SEO features

  sitemap:
    enabled: true
    changefreq:
      homepage: weekly
      reference: monthly
      user_guide: monthly
      changelog: weekly
      default: monthly
    priority:
      homepage: 1.0
      reference: 0.8
      user_guide: 0.9
      changelog: 0.6
      default: 0.5

  robots:
    enabled: true
    allow_all: true
    disallow: []
    crawl_delay: null
    extra_rules: []

  canonical:
    enabled: true
    base_url: null  # Auto-detected from GitHub Pages

  title_template: "{page_title} | {site_name}"

  structured_data:
    enabled: true
    type: SoftwareSourceCode

  default_description: null  # Falls back to package description

Most users won’t need to change these defaults. They are optimized for typical Python documentation sites, and every feature can be toggled independently.

Sitemap Configuration

The sitemap tells search engines about all your pages and how often they change. Great Docs assigns priorities and change frequencies by page type to signal which content matters most. Search engines treat these values as hints and may ignore them.

Page Types

Pages are automatically categorized based on their path in the built site:

Type	Example Paths	Default Priority	Default Changefreq
`homepage`	`index.html`	1.0	weekly
`user_guide`	`user-guide/*.html`, `recipes/*.html`	0.9	monthly
`reference`	`reference/*.html`	0.8	monthly
`changelog`	`changelog.html`	0.6	weekly
`default`	Everything else	0.5	monthly

These defaults work well for most projects. The subsections below show how to override them if your site has different needs.
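
To make the categorization concrete, here is a minimal sketch of how a path could be mapped to a page type and its defaults. It is not Great Docs' actual implementation: the function name `classify_page` is hypothetical, and matching on the top-level directory (so nested pages under `reference/` also count as reference pages) is an assumption.

```python
from pathlib import PurePosixPath

# Default (priority, changefreq) per page type, copied from the table above.
DEFAULTS = {
    "homepage": (1.0, "weekly"),
    "user_guide": (0.9, "monthly"),
    "reference": (0.8, "monthly"),
    "changelog": (0.6, "weekly"),
    "default": (0.5, "monthly"),
}


def classify_page(path: str) -> str:
    """Map a path relative to the site root to a page type."""
    page = PurePosixPath(path)
    if page.suffix != ".html":
        return "default"
    if len(page.parts) == 1:
        # Top-level files: only these two names get a special type.
        special = {"index.html": "homepage", "changelog.html": "changelog"}
        return special.get(page.name, "default")
    top_level_dir = page.parts[0]
    if top_level_dir in ("user-guide", "recipes"):
        return "user_guide"
    if top_level_dir == "reference":
        return "reference"
    return "default"


for example in ["index.html", "user-guide/installation.html", "about.html"]:
    page_type = classify_page(example)
    priority, changefreq = DEFAULTS[page_type]
    print(f"{example}: {page_type} (priority={priority}, changefreq={changefreq})")
```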

Customizing Priorities

Adjust priorities based on what’s most important for your site:

great-docs.yml

seo:
  sitemap:
    priority:
      homepage: 1.0
      user_guide: 0.9  # Your tutorials are most valuable
      reference: 0.7   # API docs are secondary

Customizing Change Frequencies

If your reference documentation changes frequently:

great-docs.yml

seo:
  sitemap:
    changefreq:
      reference: weekly  # API changes often
      changelog: daily   # Frequent releases

Robots.txt Configuration

The robots.txt file tells search engine crawlers which paths they may crawl and where to find your sitemap. Great Docs generates one automatically, but you can customize it for more control over crawler behavior.

Default Behavior

By default, Great Docs generates a permissive robots.txt:

robots.txt

# Robots.txt generated by Great Docs

User-agent: *
Allow: /

Sitemap: https://username.github.io/repo/sitemap.xml

Blocking Paths

To keep crawlers away from specific paths (for example, draft pages), list them under disallow:

great-docs.yml

seo:
  robots:
    disallow:
      - /drafts/
      - /_internal/

Blocking AI Crawlers

Some projects prefer to block AI training crawlers:

great-docs.yml

seo:
  robots:
    extra_rules:
      - "User-agent: GPTBot"
      - "Disallow: /"
      - "User-agent: CCBot"
      - "Disallow: /"

Setting Crawl Delay

For sites with limited bandwidth:

great-docs.yml

seo:
  robots:
    crawl_delay: 10  # Seconds between requests
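
The robots.txt customizations in this section can be sanity-checked with Python's standard `urllib.robotparser` module. The sketch below uses a hand-written robots.txt that combines a blocked path, a blocked AI crawler, and a crawl delay. It is an assumption about what the rendered file might look like, not actual Great Docs output, and `load_robots` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hand-written sample, not actual Great Docs output. Python's parser applies
# the first matching rule in a group, so Disallow is listed before Allow here.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /drafts/
Allow: /
Crawl-delay: 10

Sitemap: https://docs.myproject.com/sitemap.xml
"""


def load_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


robots = load_robots(SAMPLE_ROBOTS)
base = "https://docs.myproject.com"
print("guide crawlable:", robots.can_fetch("Googlebot", f"{base}/user-guide/install.html"))
print("draft crawlable:", robots.can_fetch("Googlebot", f"{base}/drafts/new-page.html"))
print("GPTBot allowed:", robots.can_fetch("GPTBot", f"{base}/index.html"))
print("crawl delay:", robots.crawl_delay("Googlebot"))
print("sitemaps:", robots.site_maps())
```

Different crawlers resolve conflicting Allow and Disallow rules differently, so a check like this confirms the intent of the file rather than the behavior of any particular search engine.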

Canonical URLs

Canonical URLs tell search engines which version of a page is the “official” one. This prevents duplicate content issues when your site is accessible via multiple URLs (for example, with and without a trailing slash, or through both a custom domain and github.io).

Auto-Detection

Great Docs automatically generates canonical URLs based on your GitHub repository. If your repo is github.com/username/repo, the canonical base URL will be:

https://username.github.io/repo/

Manual Configuration

For custom domains or non-GitHub hosting, set the base URL explicitly:

great-docs.yml

seo:
  canonical:
    base_url: https://docs.myproject.com/

The base URL should end with a trailing slash; Great Docs adds one if it is missing. Every page in the built site receives a <link rel="canonical"> tag pointing to its full URL.
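
The sketch below illustrates why the trailing slash matters when a base URL is joined with a page path, and how a GitHub repository maps to a GitHub Pages base URL. It is an illustration under stated assumptions, not Great Docs' implementation: the function names are hypothetical and the joining logic relies on Python's `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin


def github_pages_base(repo: str) -> str:
    """Turn 'github.com/username/repo' into the matching GitHub Pages base URL."""
    _, owner, name = repo.strip("/").split("/")[-3:]
    return f"https://{owner}.github.io/{name}/"


def canonical_url(base_url: str, page_path: str) -> str:
    """Join a base URL and a site-relative page path."""
    # Without the trailing slash, urljoin would replace the last path
    # segment of the base URL instead of appending to it.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, page_path.lstrip("/"))


def canonical_tag(base_url: str, page_path: str) -> str:
    """Render the <link> element for a page."""
    return f'<link rel="canonical" href="{canonical_url(base_url, page_path)}">'


base = github_pages_base("github.com/username/repo")
print(base)
print(canonical_tag(base, "user-guide/installation.html"))
print(canonical_tag("https://docs.myproject.com", "index.html"))
```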

Page Titles

Great Docs applies a consistent title template to all pages, improving brand recognition in search results.

Default Template

The default template is {page_title} | {site_name}, which produces titles like:

Installation | My Package
GreatDocs.build | My Package
Configuration | My Package

Custom Templates

Change the separator or format:

great-docs.yml

seo:
  title_template: "{page_title} - {site_name}"

Or remove the site name entirely:

great-docs.yml

seo:
  title_template: "{page_title}"
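
The placeholders use the same brace syntax as Python's `str.format`, so a template can be previewed before building. Whether Great Docs renders templates with `str.format` internally is an assumption; `render_title` is a hypothetical helper used only for illustration.

```python
def render_title(template: str, page_title: str, site_name: str) -> str:
    """Fill the {page_title} and {site_name} placeholders in a template."""
    # str.format ignores unused keyword arguments, so a template without
    # {site_name} still renders.
    return template.format(page_title=page_title, site_name=site_name)


for template in ["{page_title} | {site_name}", "{page_title} - {site_name}", "{page_title}"]:
    print(render_title(template, "Installation", "My Package"))
```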

Structured Data (JSON-LD)

Structured data helps search engines understand what your site is about beyond plain text. Great Docs injects JSON-LD structured data into your pages, enabling rich search results with additional context about your software.

What Gets Injected

On the homepage and reference pages, Great Docs adds a schema block like this:

{
  "@context": "https://schema.org",
  "@type": "SoftwareSourceCode",
  "name": "My Package",
  "description": "A Python package for...",
  "codeRepository": "https://github.com/username/repo",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Python"
  }
}
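
To see what was injected into a built page, the JSON-LD can be pulled back out of the HTML. This is a minimal sketch using Python's standard `html.parser`. It assumes the schema is embedded in a standard `<script type="application/ld+json">` element, and the sample page is hand-written rather than real build output.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD object found in an HTML document."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.blocks


# Hand-written sample page for demonstration.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SoftwareSourceCode", "name": "My Package"}
</script>
</head><body></body></html>"""

for block in extract_json_ld(SAMPLE_PAGE):
    print(block["@type"], "-", block["name"])
```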

Customizing the Schema Type

For different types of documentation:

great-docs.yml

seo:
  structured_data:
    type: WebSite  # Or: SoftwareApplication, APIReference, etc.

Disabling Structured Data

If you prefer not to include JSON-LD:

great-docs.yml

seo:
  structured_data:
    enabled: false

Meta Descriptions

Meta descriptions are the short summaries that appear below page titles in search results. A good description improves click-through rates because it tells searchers exactly what they’ll find on the page. Great Docs generates these automatically from your content.

Auto-Generation

For each page, Great Docs builds the description as follows:

it uses the first meaningful paragraph in the page content
it falls back to the default description if no suitable content is found
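
The selection order described above can be sketched as follows. This is an illustration only: what Great Docs counts as a "meaningful" paragraph is not specified here, so the minimum length of 40 characters and the 160-character cap are assumptions, and `pick_description` is a hypothetical name.

```python
def pick_description(paragraphs, default, min_length=40, max_length=160):
    """Return the first paragraph long enough to be meaningful, else the default."""
    for paragraph in paragraphs:
        text = " ".join(paragraph.split())  # collapse runs of whitespace
        if len(text) < min_length:
            continue  # skip short fragments such as "Note:"
        if len(text) <= max_length:
            return text
        # Cut at a word boundary and mark the truncation.
        return text[: max_length - 3].rsplit(" ", 1)[0] + "..."
    return default


paragraphs = [
    "Note:",
    "This guide explains how to install My Package using pip or conda.",
]
print(pick_description(paragraphs, default="Documentation for My Package"))
print(pick_description(["Hi"], default="Documentation for My Package"))
```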

Setting a Default Description

Configure a fallback description for pages without extractable content:

great-docs.yml

seo:
  default_description: "Documentation for My Package, a Python library for..."

If not set, the package description from pyproject.toml is used as a fallback.

Page-Level Descriptions

For individual pages, add a description field in the YAML frontmatter:

user_guide/01-installation.qmd

---
title: "Installation"
description: "How to install My Package using pip, conda, or from source."
---

Noindex for Internal Pages

Not every page belongs in search results. Internal pages, drafts, or experimental features can clutter search results and confuse users. Great Docs automatically adds noindex directives to internal pages like the Skills page.

Manual Noindex

To prevent a specific page from being indexed, add a robots field to its frontmatter:

drafts/experimental-feature.qmd

---
title: "Experimental Feature"
robots: "noindex, nofollow"
---
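
One way to confirm that a built page carries the directive is to look for a robots meta tag in its HTML. The sketch below assumes the frontmatter field is rendered as a standard `<meta name="robots" content="...">` element, which is how such directives are conventionally expressed. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives listed in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


draft = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
public = '<html><head><meta name="description" content="Install guide"></head></html>'
print("draft noindex:", is_noindex(draft))
print("public noindex:", is_noindex(public))
```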

Disabling SEO Features

If you handle SEO through other means (for example, a hosting platform that generates sitemaps for you), you can disable individual features or all of them at once.

To completely disable SEO generation:

great-docs.yml

seo:
  enabled: false

Or disable specific features:

great-docs.yml

seo:
  sitemap:
    enabled: false
  robots:
    enabled: false
  canonical:
    enabled: false
  structured_data:
    enabled: false

Best Practices

The automatic features cover the technical foundations. These recommendations help you get the most out of them.

For Maximum SEO Effectiveness

set a base URL either through a GitHub repo or explicit configuration
write good descriptions: add description: to key pages’ frontmatter
use clear, concise page titles, which improve click-through rates
add alt text to images: Great Docs audits this and reports any images that are missing it
run great-docs seo to audit before deployment and catch issues early

For GitHub Pages

If you’re deploying to GitHub Pages, SEO works out of the box:

your canonical base URL is auto-detected from the repository
the sitemap is automatically referenced in robots.txt
all pages get proper canonical links

For Custom Domains

When using a custom domain, set the canonical base URL explicitly:

great-docs.yml

seo:
  canonical:
    base_url: https://docs.myproject.com/

Also make sure your DNS is configured correctly (see Adding a Custom Domain).

Next Steps

Good SEO makes your documentation discoverable. Great Docs handles the technical foundations (sitemaps, canonical URLs, meta tags, structured data) automatically, so you can focus on writing clear titles and descriptions that represent your content well.

Social Cards controls how links appear when shared on social platforms
Deployment covers publishing to GitHub Pages with SEO settings applied
Linting catches documentation quality issues that can also affect search relevance
Configuration covers all great-docs.yml options including SEO settings