YAML Tags, Anchors, and Advanced Features with yaml12

This guide picks up where the 2-minute intro leaves off. It shows what YAML tags are and how to work with them in yaml12 using handlers. Along the way it covers complex mapping keys and document streams so you can handle real-world YAML.

Tags in YAML and how yaml12 handles them

Tags annotate any YAML node with extra meaning. They always start with ! in YAML syntax and appear before the node's value; they are not part of the scalar text itself.

yaml12 preserves tags as Yaml objects. The object carries value (a regular Python type) and tag (a string). Parsing a tagged scalar:

from yaml12 import Yaml, parse_yaml

color = parse_yaml("!color red")
assert isinstance(color, Yaml) and (color.tag, color.value) == ("!color", "red")

Custom tags bypass the usual scalar typing; the scalar is returned as a string even if it looks like another type.

assert parse_yaml("! true") == Yaml(value="true", tag="!")
assert parse_yaml("true") is True

Using handlers to transform tagged nodes while parsing

parse_yaml() and read_yaml() accept handlers: a dict mapping tag strings to callables. Handlers run on any matching tagged node. For tagged scalars the handler receives the Python scalar; for tagged sequences or mappings it receives a plain list/dict.

from dataclasses import dataclass
from yaml12 import parse_yaml

@dataclass(frozen=True)
class Point:
    x: int
    y: int

def point_handler(value):
    return Point(x=value["x"], y=value["y"])

doc = parse_yaml(
    "vertex: !point {x: 1,  y: 2}",
    handlers={"!point": point_handler},
)

assert doc["vertex"] == Point(1, 2)

Handlers apply to both values and keys, including non-specific ! tags if you register "!". If a handler raises an exception, it propagates unchanged to make debugging easy.

Any tag without a matching handler stays as a Yaml object. Handlers without matching tags are simply unused.

Post-process tags yourself

You can also parse without handlers and walk the result yourself. For example, processing !expr scalars manually:

from yaml12 import Yaml, parse_yaml

def eval_yaml_expr_nodes(obj):
    if isinstance(obj, Yaml):
        if obj.tag == "!expr":
            return eval(str(obj.value))
        return Yaml(eval_yaml_expr_nodes(obj.value), obj.tag)
    if isinstance(obj, list):
        return [eval_yaml_expr_nodes(item) for item in obj]
    if isinstance(obj, dict):
        return {eval_yaml_expr_nodes(k): eval_yaml_expr_nodes(v) for k, v in obj.items()}
    return obj

raw = parse_yaml("!expr 1 + 1")
assert isinstance(raw, Yaml) and eval_yaml_expr_nodes(raw) == 2

Mappings revisited: non-string keys and Yaml

YAML mapping keys do not have to be plain strings; any node can be a key. For example, this is valid even though the key is a boolean:

true: true

Parsed with yaml12, unhashable or tagged keys become Yaml so they can live in a Python dict while preserving equality and hashing by structure:

parsed = parse_yaml("true: true")
key = next(iter(parsed))
assert key is True and parsed[key] is True and not isinstance(key, Yaml)

Complex keys use the explicit mapping-key indicator ?:

? [a, b]
: tuple
? {x: 1, y: 2}
: map-key

Becomes:

parsed = parse_yaml(...above yaml...)
keys = list(parsed)
assert all(isinstance(k, Yaml) for k in keys) and keys[0].value == ["a", "b"] and keys[1].value == {"x": 1, "y": 2}

Tagged mapping keys

Handlers run on keys too, so a handler can turn tagged keys into friendly Python keys before they are wrapped.

handlers = {"!upper": str.upper}
result = parse_yaml("!upper key: value", handlers=handlers)
assert result == {"KEY": "value"}

If you anticipate tagged mapping keys that you want to process yourself, walk the Yaml keys alongside the values and unwrap them as needed.

Document streams and markers

Most YAML files contain a single document. YAML also supports document streams: multiple documents separated by --- and optionally closed by ....

Reading multiple documents

parse_yaml() and read_yaml() default to multi=False, returning only the first document. When multi=True, all documents are returned as a list.

doc_stream = """
---
doc 1
---
doc 2
"""

parsed_first = parse_yaml(doc_stream)
parsed_all = parse_yaml(doc_stream, multi=True)
assert (parsed_first, parsed_all) == ("doc 1", ["doc 1", "doc 2"])

Writing multiple documents

write_yaml() and format_yaml() default to a single document. With multi=True, the value must be a sequence of documents and the output uses --- between documents and ... after the final one. For single documents, write_yaml() always wraps the body with --- and a final ..., while format_yaml() returns just the body.

from yaml12 import format_yaml, write_yaml

docs = ["first", "second"]
text = format_yaml(docs, multi=True)
assert text.startswith("---") and text.rstrip().endswith("...")
write_yaml(docs, path="out.yml", multi=True)

When multi=False, parsing stops after the first document, even if later content is not valid YAML. That makes it easy to extract front matter from files that mix YAML with other text (like Markdown).

rmd_lines = [
    "---",
    "title: Front matter only",
    "params:",
    "  answer: 42",
    "---",
    "# Body that is not YAML",
]
frontmatter = parse_yaml(rmd_lines)
assert frontmatter == {"title": "Front matter only", "params": {"answer": 42}}

Writing YAML with tags

Attach Yaml to values before calling format_yaml() or write_yaml() to emit a tag.

from yaml12 import Yaml, write_yaml

tagged = Yaml("1 + x", "!expr")
write_yaml(tagged)
# stdout:
# ---
# !expr 1 + x
# ...

Tagged collections or mapping keys work the same way:

from yaml12 import Yaml, format_yaml, parse_yaml

mapping = {
    "tagged_value": Yaml(["a", "b"], "!pair"),
    Yaml("tagged-key", "!k"): "v",
}
encoded = format_yaml(mapping)
reparsed = parse_yaml(encoded)

value = reparsed["tagged_value"]
assert isinstance(value, Yaml) and value.tag == "!pair" and value.value == ["a", "b"]

key = next(k for k in reparsed if isinstance(k, Yaml))
assert key.tag == "!k" and key.value == "tagged-key" and reparsed[key] == "v"

Serializing custom Python objects

You can opt into rich types by tagging your own objects on emit and supplying a handler on parse. Here is a round-trip for a dataclass:

from dataclasses import dataclass, asdict
from yaml12 import Yaml, format_yaml, parse_yaml

@dataclass
class Server:
    name: str
    host: str
    port: int

def encode_server(server: Server) -> Yaml:
    return Yaml(asdict(server), "!server")

def decode_server(value):
    return Server(**value)

servers = [Server("api", "api.example.com", 8000), Server("db", "db.local", 5432)]
yaml_text = format_yaml([encode_server(s) for s in servers])

round_tripped = parse_yaml(yaml_text, handlers={"!server": decode_server})
assert round_tripped == servers

By keeping the on-disk representation a plain mapping plus tag, you get a stable YAML format while still round-tripping your Python types losslessly.

Anchors

Anchors (&id) name a node; aliases (*id) copy it. yaml12 resolves aliases before returning Python objects.

from yaml12 import parse_yaml

parsed = parse_yaml("""
recycle-me: &anchor-name
  a: b
  c: d

recycled:
  - *anchor-name
  - *anchor-name
""")

first, second = parsed["recycled"]
assert first["a"] == "b" and second["c"] == "d"

(Very) advanced tags

Tag directives (%TAG)

YAML lets you declare tag handles at the top of a document. The syntax is %TAG !<name>! <handle> and it applies to the rest of the document.

text = """
%TAG !e! tag:example.com,2024:widgets/
---
item: !e!gizmo foo
"""
parsed = parse_yaml(text)
assert parsed["item"].tag == "tag:example.com,2024:widgets/gizmo"

You can also declare a global tag prefix, which expands a bare !:

text = """
%TAG ! tag:example.com,2024:widgets/
---
item: !gizmo foo
"""
assert parse_yaml(text)["item"].tag == "tag:example.com,2024:widgets/gizmo"

Tag URIs

To bypass handle resolution, use !<...> with a valid URI-like string:

parsed = parse_yaml("""
%TAG ! tag:example.com,2024:widgets/
---
item: !<gizmo> foo
""")
assert parsed["item"].tag == "gizmo"

Core schema tags

Tags beginning with !! resolve against the YAML core schema handle (tag:yaml.org,2002:). Scalar core tags (!!str, !!int, !!float, !!bool, !!null, !!seq, !!map) add no information when emitting and are normalized to plain Python values. Informative core tags (!!timestamp, !!binary, !!set, !!omap, !!pairs) stay tagged so you can decide how to handle them.

from yaml12 import Yaml, parse_yaml

yaml_text = """
- !!timestamp 2025-01-01
- !!timestamp 2025-01-01 21:59:43.10-05:00
- !!binary UiBpcyBBd2Vzb21l
"""
parsed = parse_yaml(yaml_text)
assert all(isinstance(item, Yaml) for item in parsed) and (
    parsed[0].tag,
    parsed[2].tag,
) == ("tag:yaml.org,2002:timestamp", "tag:yaml.org,2002:binary")

Handlers can convert these to richer Python types:

import base64
from datetime import datetime, timezone
from yaml12 import parse_yaml

def ts_handler(value):
    return datetime.fromisoformat(str(value).replace("Z", "+00:00")).replace(tzinfo=timezone.utc)

def binary_handler(value):
    return base64.b64decode(str(value))

converted = parse_yaml(
    yaml_text,
    handlers={
        "tag:yaml.org,2002:timestamp": ts_handler,
        "tag:yaml.org,2002:binary": binary_handler,
    },
)
assert isinstance(converted[0], datetime) and isinstance(converted[2], (bytes, bytearray))