YAML Tags, Anchors, and Advanced Features with yaml12¶

This guide picks up where the “YAML in 2 Minutes” intro leaves off. It explains what YAML tags are and how to work with them in yaml12 using tag handlers. Along the way we also cover complex mapping keys, document streams, and anchors, so you can handle real-world YAML 1.2.

Tags in YAML and how yaml12 handles them¶

Tags annotate any YAML node with extra meaning. In YAML syntax a tag always starts with !, and it appears before the node’s value; it is not part of the scalar text itself.

yaml12 preserves tags by wrapping nodes in a Yaml object. Yaml is a small immutable wrapper class that carries the parsed value (a regular Python type) and the tag string. Here is an example of parsing a tagged scalar:

from yaml12 import Yaml, parse_yaml

color = parse_yaml("!color red")
assert isinstance(color, Yaml) and (color.tag, color.value) == ("!color", "red")

The presence of a custom tag (or of the non-specific tag !) bypasses the usual scalar typing: the scalar is returned as a string even if it looks like another type.

assert parse_yaml("! true") == Yaml(value="true", tag="!")
assert parse_yaml("true") is True

Using handlers to transform tagged nodes while parsing¶

parse_yaml() and read_yaml() accept handlers: a dict mapping tag strings to callables. Handlers run on any matching tagged node. For tagged scalars the handler receives a plain Python scalar; for tagged sequences or mappings, it receives a plain list or dict.

Here is an example of using a handler to evaluate !expr nodes:

from yaml12 import parse_yaml

handlers = {"!expr": lambda value: eval(str(value))}

assert parse_yaml("!expr 1 + 1", handlers=handlers) == 2
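Since eval() runs arbitrary Python, a narrower handler may be preferable when the YAML is not fully trusted. The following is a minimal sketch, not part of yaml12: safe_arith is a hypothetical helper that accepts only numeric literals and the operators +, -, * and /, using the standard ast module.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(value):
    """Evaluate a numeric expression such as "1 + 1" without calling eval()."""

    def walk(node):
        # Only int and float literals are allowed as leaves.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {value!r}")

    return walk(ast.parse(str(value).strip(), mode="eval").body)


assert safe_arith("1 + 1") == 2
```

Because a handler is just a callable that receives the plain scalar, this function could be registered as handlers={"!expr": safe_arith} in place of the lambda above.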

Any errors from a handler stop parsing and propagate unchanged:

from yaml12 import parse_yaml

def boom(value):
    raise ValueError("boom")

parse_yaml("!boom 1", handlers={"!boom": boom})  # ValueError("boom")

Any tag without a matching handler is left preserved as a Yaml object, and handlers without matching tags are simply unused:

from yaml12 import Yaml, parse_yaml

yaml_text = """
- !expr 1 + 1
- !upper yaml is awesome
- !note this tag has no handler
"""

handlers = {
    "!expr": lambda value: eval(str(value)),
    "!upper": str.upper,
    "!lower": str.lower,  # unused
}

out = parse_yaml(yaml_text, handlers=handlers)
assert out == [2, "YAML IS AWESOME", Yaml("this tag has no handler", "!note")]

With a tagged sequence or mapping, the handler is called with a plain list or dict:

from yaml12 import parse_yaml

def seq_handler(value):
    assert value == ["a", "b"]
    return "handled-seq"

def map_handler(value):
    assert value == {"key1": 1, "key2": 2}
    return "handled-map"

yaml_text = """
- !some_seq_tag [a, b]
- !some_map_tag {key1: 1, key2: 2}
"""

out = parse_yaml(
    yaml_text,
    handlers={
        "!some_seq_tag": seq_handler,
        "!some_map_tag": map_handler,
    },
)
assert out == ["handled-seq", "handled-map"]

Handlers apply to both values and keys, including the non-specific ! tag if you register "!".

Handlers make it easy to opt into powerful behaviors while keeping the default parser strict and safe.

Post-process tags yourself¶

If you want more control, you can parse without handlers and then walk the result yourself. For example, here is a tiny post-processor for !expr scalars:

from yaml12 import Yaml, parse_yaml

def eval_yaml_expr_nodes(obj):
    if isinstance(obj, Yaml):
        if obj.tag == "!expr":
            return eval(str(obj.value))
        return Yaml(eval_yaml_expr_nodes(obj.value), obj.tag)
    if isinstance(obj, list):
        return [eval_yaml_expr_nodes(item) for item in obj]
    if isinstance(obj, dict):
        return {eval_yaml_expr_nodes(k): eval_yaml_expr_nodes(v) for k, v in obj.items()}
    return obj

raw = parse_yaml("!expr 1 + 1")
assert isinstance(raw, Yaml) and eval_yaml_expr_nodes(raw) == 2

Because tags can also appear on mapping keys, post-processors should walk dict keys as well (as in the example above). If you transform all tagged keys into plain hashable scalars, you can rebuild a dict with those new keys.
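A similar walk can report which tags are still present in a parsed tree, which helps spot tags that have no handler yet. This is a sketch under stated assumptions: it relies only on the .tag and .value attributes shown above, collect_tags is a hypothetical helper, and Node is a hypothetical stand-in for Yaml so that the snippet runs without yaml12 installed.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for yaml12's Yaml wrapper (a value plus a tag)."""
    value: Any
    tag: str


def collect_tags(obj, wrapper=Node):
    """Return the set of tags found anywhere in a parsed tree, keys included."""
    if isinstance(obj, wrapper):
        own = {obj.tag} if obj.tag else set()  # skip wrappers that carry no tag
        return own | collect_tags(obj.value, wrapper)
    if isinstance(obj, list):
        return set().union(*(collect_tags(item, wrapper) for item in obj))
    if isinstance(obj, dict):
        found = set()
        for key, value in obj.items():
            found |= collect_tags(key, wrapper) | collect_tags(value, wrapper)
        return found
    return set()


tree = [Node("1 + 1", "!expr"), {Node("k", "!key"): Node(["a"], "!pair")}, "plain"]
assert collect_tags(tree) == {"!expr", "!key", "!pair"}
```

With yaml12 installed, passing wrapper=Yaml should let the same function run over the output of parse_yaml().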

Mappings revisited: non-string keys and `Yaml`¶

In YAML, mapping keys do not have to be plain strings; any node can be a key, including booleans, numbers, sequences, or other mappings. For example, this is valid YAML even though the key is a boolean:

true: true

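When a key can’t be represented directly as a plain Python dict key (a sequence or a mapping, for example), yaml12 wraps it in Yaml. That preserves the original key and makes it hashable (with equality defined by structure, including mapping key order), so it can safely live in a Python dict. Scalar keys such as booleans and numbers are already valid dict keys, so they are returned unwrapped: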

parsed = parse_yaml("true: true")
key = next(iter(parsed))
assert key is True and parsed[key] is True and not isinstance(key, Yaml)
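The wrapper is needed for collection keys because Python lists and dicts are unhashable, while booleans and numbers are not. A quick check using only built-ins (is_hashable is a hypothetical helper for illustration):

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Scalars such as booleans and numbers work as dict keys directly.
plain = {True: "bool key", 42: "int key"}
assert is_hashable(True) and is_hashable(42)

# Lists and dicts are rejected as keys, so they need a hashable wrapper.
assert not is_hashable(["a", "b"])
assert not is_hashable({"x": 1, "y": 2})
```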

For complex key values, YAML uses the explicit mapping-key indicator ?:

? [a, b]
: tuple
? {x: 1, y: 2}
: map-key

Becomes:

from yaml12 import Yaml, parse_yaml

yaml_text = """
? [a, b]
: tuple
? {x: 1, y: 2}
: map-key
"""
parsed = parse_yaml(yaml_text)

seq_key, map_key = list(parsed)
assert isinstance(seq_key, Yaml) and seq_key.value == ["a", "b"]
assert isinstance(map_key, Yaml) and map_key.value == {"x": 1, "y": 2}
assert parsed[Yaml(["a", "b"])] == "tuple"
assert parsed[Yaml({"x": 1, "y": 2})] == "map-key"

Tagged mapping keys¶

Handlers run on keys too, so a handler can turn tagged keys into friendly Python keys before they are wrapped.

handlers = {"!upper": str.upper}
result = parse_yaml("!upper key: value", handlers=handlers)
assert result == {"KEY": "value"}

If a tagged key has no matching handler, it is preserved as a Yaml key:

from yaml12 import Yaml, parse_yaml

parsed = parse_yaml("!custom foo: 1")
key = next(iter(parsed))
assert isinstance(key, Yaml) and key.tag == "!custom" and key.value == "foo"

If you anticipate tagged mapping keys that you want to process yourself, walk the Yaml keys alongside the values and unwrap them as needed.

Document streams and markers¶

Most YAML files contain a single YAML document. YAML also supports document streams: multiple documents separated by --- and optionally closed by ....

Reading multiple documents¶

parse_yaml() and read_yaml() default to multi=False. In that mode, they stop after the first document. When multi=True, all documents in the stream are returned as a list.

doc_stream = """
---
doc 1
---
doc 2
"""

parsed_first = parse_yaml(doc_stream)
parsed_all = parse_yaml(doc_stream, multi=True)
assert (parsed_first, parsed_all) == ("doc 1", ["doc 1", "doc 2"])

Writing multiple documents¶

write_yaml() and format_yaml() also default to a single document. With multi=True, the value must be a sequence of documents and the output uses --- between documents and ... after the final one. For single documents, write_yaml() always wraps the body with --- and a final ..., while format_yaml() returns just the body.

from yaml12 import format_yaml, write_yaml

docs = ["first", "second"]
text = format_yaml(docs, multi=True)
assert text.startswith("---") and text.rstrip().endswith("...")
write_yaml(docs, path="out.yml", multi=True)

When multi=False, parsing stops after the first document—even if later content is not valid YAML. That makes it easy to extract front matter from files that mix YAML with other text (like Markdown).

rmd_lines = [
    "---",
    "title: Front matter only",
    "params:",
    "  answer: 42",
    "---",
    "# Body that is not YAML",
]
frontmatter = parse_yaml("\n".join(rmd_lines))
assert frontmatter == {"title": "Front matter only", "params": {"answer": 42}}
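When the body is needed as well, the text can be split before parsing. The following is a sketch using plain string handling; split_front_matter is a hypothetical helper and not part of yaml12.

```python
def split_front_matter(text):
    """Split text into (front_matter, body); front_matter is None if absent."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return None, text
    for index in range(1, len(lines)):
        # The first later line that is exactly --- or ... closes the front matter.
        if lines[index].rstrip() in ("---", "..."):
            return "\n".join(lines[1:index]), "\n".join(lines[index + 1:])
    return None, text


source = "---\ntitle: Front matter only\n---\n# Body that is not YAML\n"
front, body = split_front_matter(source)
assert front == "title: Front matter only"
assert body == "# Body that is not YAML"
```

The returned front matter string can then be passed to parse_yaml(), and the body handed to whatever processes the rest of the file.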

Writing YAML with tags¶

To emit a tag, wrap a value in Yaml before calling format_yaml() or write_yaml().

from yaml12 import Yaml, write_yaml

tagged = Yaml("1 + x", "!expr")
write_yaml(tagged)
# stdout:
# ---
# !expr 1 + x
# ...

Tagged collections or mapping keys work the same way:

from yaml12 import Yaml, format_yaml, parse_yaml

mapping = {
    "tagged_value": Yaml(["a", "b"], "!pair"),
    Yaml("tagged-key", "!k"): "v",
}
encoded = format_yaml(mapping)
reparsed = parse_yaml(encoded)

value = reparsed["tagged_value"]
assert isinstance(value, Yaml) and value.tag == "!pair" and value.value == ["a", "b"]

key = next(k for k in reparsed if isinstance(k, Yaml))
assert key.tag == "!k" and key.value == "tagged-key" and reparsed[key] == "v"

Serializing custom Python objects¶

You can opt into richer domain types by tagging your own objects on emit and supplying a handler on parse. Here is a round-trip for a dataclass:

from dataclasses import dataclass, asdict
from yaml12 import Yaml, format_yaml, parse_yaml

@dataclass
class Server:
    name: str
    host: str
    port: int

def encode_server(server: Server) -> Yaml:
    return Yaml(asdict(server), "!server")

def decode_server(value):
    return Server(**value)

servers = [Server("api", "api.example.com", 8000), Server("db", "db.local", 5432)]
yaml_text = format_yaml([encode_server(s) for s in servers])

round_tripped = parse_yaml(yaml_text, handlers={"!server": decode_server})
assert round_tripped == servers

By keeping the on-disk representation a plain mapping plus tag, you get a stable YAML format while still round-tripping your Python types losslessly.
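A decoding handler is also a natural place to validate input, since a hand-edited file may have missing or misspelled keys. A minimal sketch, assuming the same Server dataclass as above; decode_server_strict is a hypothetical name:

```python
from dataclasses import dataclass, fields


@dataclass
class Server:
    name: str
    host: str
    port: int


def decode_server_strict(value):
    """Build a Server from a plain dict, rejecting missing or unknown keys."""
    if not isinstance(value, dict):
        raise TypeError(f"!server expects a mapping, got {type(value).__name__}")
    expected = {f.name for f in fields(Server)}
    if set(value) != expected:
        missing = sorted(map(str, expected - set(value)))
        unknown = sorted(map(str, set(value) - expected))
        raise ValueError(f"bad !server node: missing={missing}, unknown={unknown}")
    return Server(**value)


ok = decode_server_strict({"name": "api", "host": "api.example.com", "port": 8000})
assert ok == Server("api", "api.example.com", 8000)
```

Because handler exceptions propagate unchanged, a ValueError raised here would surface directly from parse_yaml() when the function is registered under "!server".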

Anchors¶

Anchors (&id) name a node; aliases (*id) refer back to it, so the same content can appear in several places. yaml12 resolves aliases before returning Python objects.

from yaml12 import parse_yaml

parsed = parse_yaml("""
recycle-me: &anchor-name
  a: b
  c: d

recycled:
  - *anchor-name
  - *anchor-name
""")

first, second = parsed["recycled"]
assert first["a"] == "b" and second["c"] == "d"

Debugging¶

If you want to inspect how YAML nodes are parsed before conversion to Python types, use the internal helper yaml12._dbg_yaml() to pretty-print the raw saphyr::Yaml structures:

import yaml12

yaml12._dbg_yaml("!custom [1, 2]")

_dbg_yaml is intended for debugging and may change without notice.

(Very) advanced tags¶

The following YAML features are uncommon, but yaml12 supports them for full YAML 1.2 compliance.

Tag directives (`%TAG`)¶

YAML lets you declare tag handles at the top of a document. The syntax is %TAG !<name>! <prefix>: the handle !<name>! becomes shorthand for the prefix, and the directive applies to the document that follows.

text = """
%TAG !e! tag:example.com,2024:widgets/
---
item: !e!gizmo foo
"""
parsed = parse_yaml(text)
assert parsed["item"].tag == "tag:example.com,2024:widgets/gizmo"

You can also redefine the primary handle !, which gives every !name tag in the document a global prefix:

text = """
%TAG ! tag:example.com,2024:widgets/
---
item: !gizmo foo
"""
assert parse_yaml(text)["item"].tag == "tag:example.com,2024:widgets/gizmo"
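The expansion rule behind both examples is simple string substitution: the handle is replaced by its prefix and the suffix is appended. The sketch below illustrates the YAML rule only and is not yaml12's implementation; expand_tag and DEFAULT_HANDLES are hypothetical names, and verbatim tags and percent-decoding are left out.

```python
# Handles that YAML defines even without a %TAG directive.
DEFAULT_HANDLES = {"!": "!", "!!": "tag:yaml.org,2002:"}


def expand_tag(shorthand, directives=None):
    """Expand a tag shorthand such as !e!gizmo using %TAG-style handle prefixes."""
    handles = {**DEFAULT_HANDLES, **(directives or {})}
    if shorthand.startswith("!!"):
        handle, suffix = "!!", shorthand[2:]
    elif shorthand.count("!") >= 2:
        # Named handle: everything up to and including the second "!".
        end = shorthand.index("!", 1) + 1
        handle, suffix = shorthand[:end], shorthand[end:]
    else:
        handle, suffix = "!", shorthand[1:]
    if handle not in handles:
        raise ValueError(f"undeclared tag handle: {handle}")
    return handles[handle] + suffix


prefix = "tag:example.com,2024:widgets/"
assert expand_tag("!e!gizmo", {"!e!": prefix}) == prefix + "gizmo"
assert expand_tag("!gizmo", {"!": prefix}) == prefix + "gizmo"
assert expand_tag("!gizmo") == "!gizmo"
```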

Tag URIs¶

To specify a verbatim tag, use !<...> with a valid URI-like string. For simple tags like gizmo, yaml12 normalizes !<gizmo> to the local tag form !gizmo:

parsed = parse_yaml("""
%TAG ! tag:example.com,2024:widgets/
---
item: !<gizmo> foo
""")
assert parsed["item"].tag == "!gizmo"

Core schema tags¶

Tags beginning with !! resolve against the YAML core schema handle (tag:yaml.org,2002:). The core schema tags (!!str, !!int, !!float, !!bool, !!null, !!seq, !!map) add no information when emitting and are normalized to plain Python values. Informative tags (!!timestamp, !!binary, !!set, !!omap, !!pairs) stay tagged so you can decide how to handle them.

YAML 1.2 removed these informative tags from the core schema, but they are still valid tags and occasionally useful. yaml12 preserves them as Yaml unless you supply a handler to convert them to richer Python types.

from yaml12 import Yaml, parse_yaml

yaml_text = """
- !!timestamp 2025-01-01
- !!timestamp 2025-01-01 21:59:43.10-05:00
- !!binary UiBpcyBBd2Vzb21l
"""
parsed = parse_yaml(yaml_text)
assert all(isinstance(item, Yaml) for item in parsed) and (
    parsed[0].tag,
    parsed[2].tag,
) == ("tag:yaml.org,2002:timestamp", "tag:yaml.org,2002:binary")

Handlers can convert these to richer Python types:

import base64
from datetime import datetime, timezone
from yaml12 import parse_yaml

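def ts_handler(value):
    # fromisoformat() accepts the two-digit fractional seconds above on Python 3.11+.
    parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    # Keep an explicit offset as written; treat naive timestamps as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)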

def binary_handler(value):
    return base64.b64decode(str(value))

converted = parse_yaml(
    yaml_text,
    handlers={
        "!!timestamp": ts_handler,
        "!!binary": binary_handler,
    },
)
assert isinstance(converted[0], datetime) and isinstance(converted[2], (bytes, bytearray))
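The collection tags can be handled in the same way. The sketch below assumes the shapes described by the YAML 1.1 type repository (a !!set is a mapping whose values are null, an !!omap is a sequence of single-pair mappings) and assumes that yaml12 passes them to handlers as a plain dict and a plain list, as it does for other tagged collections. set_handler and omap_handler are hypothetical names.

```python
def set_handler(value):
    """Turn a !!set mapping (members as keys, null values) into a Python set."""
    return set(value)


def omap_handler(value):
    """Turn an !!omap sequence of single-pair mappings into an ordered dict."""
    result = {}
    for entry in value:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"!!omap entries must be single-pair mappings: {entry!r}")
        result.update(entry)  # dicts keep insertion order, so the sequence order survives
    return result


assert set_handler({"a": None, "b": None}) == {"a", "b"}
assert list(omap_handler([{"b": 1}, {"a": 2}]).items()) == [("b", 1), ("a", 2)]
```

By analogy with the "!!timestamp" key above, these would presumably be registered under "!!set" and "!!omap"; that registration key is an assumption, not something this guide confirms.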
