YAML Tags, Anchors, and Advanced Features with yaml12
This guide picks up where the “YAML in 2 Minutes” intro leaves off. It
explains what YAML tags are and how to work with them in yaml12 using
tag handlers. Along the way we also cover complex mapping keys, document
streams, and anchors, so you can handle real-world YAML 1.2.
Tags in YAML and how yaml12 handles them
Tags annotate any YAML node with extra meaning. In YAML syntax a tag
always starts with !, and it appears before the node’s value; it is
not part of the scalar text itself.
yaml12 preserves tags by wrapping nodes in a Yaml object. A Yaml
is a small immutable wrapper class that carries the parsed value (a regular
Python object such as a str, list, or dict) and the tag string. Here is an example of parsing a tagged scalar:
from yaml12 import Yaml, parse_yaml
color = parse_yaml("!color red")
assert isinstance(color, Yaml) and (color.tag, color.value) == ("!color", "red")
A non-core tag, including the bare non-specific tag !, bypasses the usual scalar typing: the scalar is kept as a string even if it looks like another type.
assert parse_yaml("! true") == Yaml(value="true", tag="!")
assert parse_yaml("true") is True
Using handlers to transform tagged nodes while parsing
parse_yaml() and read_yaml() accept handlers: a dict mapping tag
strings to callables. Handlers run on any matching tagged node. For
tagged scalars the handler receives a plain Python scalar; for tagged
sequences or mappings, it receives a plain list or dict.
Here is an example of using a handler to evaluate !expr nodes:
from yaml12 import parse_yaml
handlers = {"!expr": lambda value: eval(str(value))}
assert parse_yaml("!expr 1 + 1", handlers=handlers) == 2
Any errors from a handler stop parsing and propagate unchanged:
from yaml12 import parse_yaml
def boom(value):
    raise ValueError("boom")
try:
    parse_yaml("!boom 1", handlers={"!boom": boom})
except ValueError as err:
    # The handler's own exception, unchanged
    assert str(err) == "boom"
Any tag without a matching handler is preserved as a Yaml object,
and handlers without matching tags are simply unused:
from yaml12 import Yaml, parse_yaml
yaml_text = """
- !expr 1 + 1
- !upper yaml is awesome
- !note this tag has no handler
"""
handlers = {
    "!expr": lambda value: eval(str(value)),
    "!upper": str.upper,
    "!lower": str.lower,  # unused
}
out = parse_yaml(yaml_text, handlers=handlers)
assert out == [2, "YAML IS AWESOME", Yaml("this tag has no handler", "!note")]
With a tagged sequence or mapping, the handler is called with a plain list or dict:
from yaml12 import parse_yaml
def seq_handler(value):
    assert value == ["a", "b"]
    return "handled-seq"
def map_handler(value):
    assert value == {"key1": 1, "key2": 2}
    return "handled-map"
yaml_text = """
- !some_seq_tag [a, b]
- !some_map_tag {key1: 1, key2: 2}
"""
out = parse_yaml(
    yaml_text,
    handlers={
        "!some_seq_tag": seq_handler,
        "!some_map_tag": map_handler,
    },
)
assert out == ["handled-seq", "handled-map"]
Handlers apply to both values and keys, including the non-specific !
tag if you register "!". Because handler exceptions surface unchanged
and unmatched tags are preserved rather than guessed at, handlers let
you opt into powerful behaviors while keeping the default parser strict
and safe.
Post-process tags yourself
If you want more control, you can parse without handlers and then walk
the result yourself. For example, here is a tiny post-processor for
!expr scalars:
from yaml12 import Yaml, parse_yaml
def eval_yaml_expr_nodes(obj):
    # Recursively replace !expr nodes with their evaluated result
    if isinstance(obj, Yaml):
        if obj.tag == "!expr":
            return eval(str(obj.value))
        return Yaml(eval_yaml_expr_nodes(obj.value), obj.tag)
    if isinstance(obj, list):
        return [eval_yaml_expr_nodes(item) for item in obj]
    if isinstance(obj, dict):
        return {eval_yaml_expr_nodes(k): eval_yaml_expr_nodes(v) for k, v in obj.items()}
    return obj
raw = parse_yaml("!expr 1 + 1")
assert isinstance(raw, Yaml) and eval_yaml_expr_nodes(raw) == 2
Because tags can also appear on mapping keys, post-processors should walk dict keys as well (as in the example above). If you transform every tagged key into a plain hashable value, you can rebuild the dict with those new keys.
Mappings revisited: non-string keys and Yaml
In YAML, mapping keys do not have to be plain strings; any node can be a key, including booleans, numbers, sequences, or other mappings. For example, this is valid YAML even though the key is a boolean:
true: true
Scalar keys such as booleans and numbers are already hashable, so yaml12
returns them as ordinary Python dict keys:
from yaml12 import Yaml, parse_yaml
parsed = parse_yaml("true: true")
key = next(iter(parsed))
assert key is True and parsed[key] is True and not isinstance(key, Yaml)
When a key can’t be represented directly as a plain Python dict key (a
sequence or a mapping, for instance), yaml12 wraps it in Yaml. That
preserves the original key and makes it hashable (with equality defined
by structure, including mapping key order), so it can safely live in a
Python dict.
For complex keys such as sequences or mappings, YAML uses the explicit
mapping-key indicator ?, with : introducing the value:
? [a, b]
: tuple
? {x: 1, y: 2}
: map-key
In Python, this parses to:
from yaml12 import Yaml, parse_yaml
yaml_text = """
? [a, b]
: tuple
? {x: 1, y: 2}
: map-key
"""
parsed = parse_yaml(yaml_text)
seq_key, map_key = list(parsed)
assert isinstance(seq_key, Yaml) and seq_key.value == ["a", "b"]
assert isinstance(map_key, Yaml) and map_key.value == {"x": 1, "y": 2}
assert parsed[Yaml(["a", "b"])] == "tuple"
assert parsed[Yaml({"x": 1, "y": 2})] == "map-key"
Tagged mapping keys
Handlers run on keys too, so a handler can turn tagged keys into friendly Python keys before they are wrapped.
from yaml12 import parse_yaml
handlers = {"!upper": str.upper}
result = parse_yaml("!upper key: value", handlers=handlers)
assert result == {"KEY": "value"}
If a tagged key has no matching handler, it is preserved as a Yaml
key:
from yaml12 import Yaml, parse_yaml
parsed = parse_yaml("!custom foo: 1")
key = next(iter(parsed))
assert isinstance(key, Yaml) and key.tag == "!custom" and key.value == "foo"
If you anticipate tagged mapping keys that you want to process yourself,
walk the Yaml keys alongside the values and unwrap them as needed.
Document streams and markers
Most YAML files contain a single YAML document. YAML also supports
document streams: multiple documents separated by --- and
optionally ended by the ... document-end marker.
Reading multiple documents
parse_yaml() and read_yaml() default to multi=False. In that mode,
they stop after the first document. When multi=True, all documents in
the stream are returned as a list.
from yaml12 import parse_yaml
doc_stream = """
---
doc 1
---
doc 2
"""
parsed_first = parse_yaml(doc_stream)
parsed_all = parse_yaml(doc_stream, multi=True)
assert (parsed_first, parsed_all) == ("doc 1", ["doc 1", "doc 2"])
Writing multiple documents
write_yaml() and format_yaml() also default to a single document.
With multi=True, the value must be a sequence of documents and the
output uses --- between documents and ... after the final one. For
single documents, write_yaml() always wraps the body with --- and a
final ..., while format_yaml() returns just the body.
from yaml12 import format_yaml, write_yaml
docs = ["first", "second"]
text = format_yaml(docs, multi=True)
assert text.startswith("---") and text.rstrip().endswith("...")
write_yaml(docs, path="out.yml", multi=True)
When multi=False, parsing stops after the first document, even if later
content is not valid YAML. That makes it easy to extract front matter
from files that mix YAML with other text (like Markdown).
from yaml12 import parse_yaml
rmd_lines = [
    "---",
    "title: Front matter only",
    "params:",
    "  answer: 42",
    "---",
    "# Body that is not YAML",
]
frontmatter = parse_yaml("\n".join(rmd_lines))
assert frontmatter == {"title": "Front matter only", "params": {"answer": 42}}
Writing YAML with tags
To emit a tag, wrap a value in Yaml before calling format_yaml() or
write_yaml().
from yaml12 import Yaml, write_yaml
tagged = Yaml("1 + x", "!expr")
write_yaml(tagged)
# stdout:
# ---
# !expr 1 + x
# ...
Tagged collections or mapping keys work the same way:
from yaml12 import Yaml, format_yaml, parse_yaml
mapping = {
    "tagged_value": Yaml(["a", "b"], "!pair"),
    Yaml("tagged-key", "!k"): "v",
}
encoded = format_yaml(mapping)
reparsed = parse_yaml(encoded)
value = reparsed["tagged_value"]
assert isinstance(value, Yaml) and value.tag == "!pair" and value.value == ["a", "b"]
key = next(k for k in reparsed if isinstance(k, Yaml))
assert key.tag == "!k" and key.value == "tagged-key" and reparsed[key] == "v"
Serializing custom Python objects
You can opt into richer domain types by tagging your own objects on emit and supplying a handler on parse. Here is a round-trip for a dataclass:
from dataclasses import dataclass, asdict
from yaml12 import Yaml, format_yaml, parse_yaml
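@dataclass
class Server:
    name: str
    host: str
    port: int
def encode_server(server: Server) -> Yaml:
    # Tag a plain mapping so the type survives the trip through YAML
    return Yaml(asdict(server), "!server")
def decode_server(value):
    # value is the plain dict parsed from the !server mapping
    return Server(**value)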
servers = [Server("api", "api.example.com", 8000), Server("db", "db.local", 5432)]
yaml_text = format_yaml([encode_server(s) for s in servers])
round_tripped = parse_yaml(yaml_text, handlers={"!server": decode_server})
assert round_tripped == servers
By keeping the on-disk representation a plain mapping plus tag, you get a stable YAML format while still round-tripping your Python types losslessly.
Anchors
Anchors (&id) name a node; aliases (*id) refer back to it. yaml12 resolves
aliases before returning Python objects, so each alias simply shows up as
the anchored value.
from yaml12 import parse_yaml
parsed = parse_yaml("""
recycle-me: &anchor-name
  a: b
  c: d
recycled:
  - *anchor-name
  - *anchor-name
""")
first, second = parsed["recycled"]
assert first["a"] == "b" and second["c"] == "d"
Debugging
If you want to inspect how YAML nodes are parsed before conversion to
Python types, use the internal helper yaml12._dbg_yaml() to
pretty-print the raw saphyr::Yaml structures:
import yaml12
yaml12._dbg_yaml("!custom [1, 2]")
_dbg_yaml is intended for debugging and may change without notice.
(Very) advanced tags
The following YAML features are uncommon, but yaml12 supports them for
full YAML 1.2 compliance.
Tag directives (%TAG)
YAML lets you declare tag handles at the top of a document. The syntax
is %TAG !<name>! <prefix>: the named handle !<name>! then expands to
<prefix> throughout the document that follows the directive.
from yaml12 import parse_yaml
text = """
%TAG !e! tag:example.com,2024:widgets/
---
item: !e!gizmo foo
"""
parsed = parse_yaml(text)
assert parsed["item"].tag == "tag:example.com,2024:widgets/gizmo"
You can also redefine the primary handle !, which changes how local tags such as !gizmo expand:
text = """
%TAG ! tag:example.com,2024:widgets/
---
item: !gizmo foo
"""
assert parse_yaml(text)["item"].tag == "tag:example.com,2024:widgets/gizmo"
Tag URIs
To specify a verbatim tag, use !<...> with a valid URI-like string. For
simple tags like gizmo, yaml12 normalizes !<gizmo> to the local tag
form !gizmo:
parsed = parse_yaml("""
%TAG ! tag:example.com,2024:widgets/
---
item: !<gizmo> foo
""")
assert parsed["item"].tag == "!gizmo"
Core schema tags
Tags beginning with !! use the secondary tag handle, which expands to the
YAML core prefix tag:yaml.org,2002:. Core tags for the basic types
(!!str, !!int, !!float, !!bool, !!null, !!seq, !!map) add nothing beyond
what the plain Python value already carries, so they are normalized away
and you get plain Python values. Informative tags (!!timestamp, !!binary,
!!set, !!omap, !!pairs) stay tagged so you can decide how to handle them.
These informative tags come from the YAML 1.1 type repository and are not
part of the YAML 1.2 core schema, but they are still valid tags and
occasionally useful. yaml12 preserves them as Yaml unless you supply a
handler to convert them to richer Python types.
from yaml12 import Yaml, parse_yaml
yaml_text = """
- !!timestamp 2025-01-01
- !!timestamp 2025-01-01 21:59:43.10-05:00
- !!binary UiBpcyBBd2Vzb21l
"""
parsed = parse_yaml(yaml_text)
assert all(isinstance(item, Yaml) for item in parsed) and (
parsed[0].tag,
parsed[2].tag,
) == ("tag:yaml.org,2002:timestamp", "tag:yaml.org,2002:binary")
Handlers can convert these to richer Python types:
import base64
from datetime import datetime, timezone
from yaml12 import parse_yaml
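def ts_handler(value):
    # datetime.fromisoformat() accepts both sample formats on Python 3.11+
    dt = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    # Keep an explicit offset; assume UTC only when none is given
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)
def binary_handler(value):
    return base64.b64decode(str(value))
# yaml_text is the stream from the previous example
converted = parse_yaml(
    yaml_text,
    handlers={
        "!!timestamp": ts_handler,
        "!!binary": binary_handler,
    },
)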
assert isinstance(converted[0], datetime) and isinstance(converted[2], (bytes, bytearray))
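The fromisoformat-based handler covers the two sample values, but YAML 1.1 timestamps also allow forms that fromisoformat() may not accept, such as a space before the offset or a single-digit hour offset (2001-12-14 21:59:43.10 -5). Below is a more tolerant, stdlib-only sketch, parse_yaml11_timestamp() (a hypothetical helper), that loosely follows the YAML 1.1 timestamp grammar: date-only values become date, and timestamps without an offset are treated as UTC, as YAML 1.1 specifies.

```python
import re
from datetime import date, datetime, timedelta, timezone

# Loosely follows the YAML 1.1 timestamp regular expression
_TIMESTAMP = re.compile(
    r"""
    (?P<year>[0-9]{4})-(?P<month>[0-9]{1,2})-(?P<day>[0-9]{1,2})
    (?:
        (?:[Tt]|[ \t]+)
        (?P<hour>[0-9]{1,2}):(?P<minute>[0-9]{2}):(?P<second>[0-9]{2})
        (?:\.(?P<fraction>[0-9]*))?
        (?:[ \t]*(?P<tz>Z|[-+][0-9]{1,2}(?::[0-9]{2})?))?
    )?
    """,
    re.VERBOSE,
)


def parse_yaml11_timestamp(value):
    """Hypothetical !!timestamp handler (YAML 1.1 grammar, loosely)."""
    m = _TIMESTAMP.fullmatch(str(value).strip())
    if m is None:
        raise ValueError(f"not a YAML 1.1 timestamp: {value!r}")
    year, month, day = int(m["year"]), int(m["month"]), int(m["day"])
    if m["hour"] is None:
        return date(year, month, day)  # date-only form
    # Pad or truncate the fraction to microseconds: ".1" -> 100000
    micros = int((m["fraction"] or "").ljust(6, "0")[:6])
    dt = datetime(year, month, day, int(m["hour"]), int(m["minute"]), int(m["second"]), micros)
    tz = m["tz"]
    if tz is None or tz == "Z":
        return dt.replace(tzinfo=timezone.utc)  # no offset means UTC in YAML 1.1
    hours, _, minutes = tz[1:].partition(":")
    offset = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return dt.replace(tzinfo=timezone(-offset if tz[0] == "-" else offset))
```

To use it, pass handlers={"!!timestamp": parse_yaml11_timestamp} in place of ts_handler.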