from yaml12 import Yaml, parse_yaml
color = parse_yaml("!color red")
assert isinstance(color, Yaml) and (color.tag, color.value) == ("!color", "red")

Tags, Anchors, and Advanced YAML
This guide picks up where the “YAML in 2 Minutes” intro leaves off. It explains what YAML tags are and how to work with them in yaml12 using tag handlers. Along the way we also cover complex mapping keys, document streams, and anchors, so you can handle real-world YAML 1.2.
Mappings revisited: non-string keys and Yaml
In YAML, mapping keys do not have to be plain strings; any node can be a key, including booleans, numbers, sequences, or other mappings. For example, this is valid YAML even though the key is a boolean:
true: true

When a key can’t be represented directly as a plain Python dict key, yaml12 wraps it in Yaml. That preserves the original key and makes it hashable (with equality defined by structure), so it can safely live in a Python dict. A scalar key like the boolean above needs no wrapping and stays a plain Python value:
parsed = parse_yaml("true: true")
key = next(iter(parsed))
assert key is True and parsed[key] is True and not isinstance(key, Yaml)

For complex key values, YAML uses the explicit mapping-key indicator ?:
? [a, b]
: tuple
? {x: 1, y: 2}
: map-key

Becomes:
from yaml12 import Yaml, parse_yaml
yaml_text = """
? [a, b]
: tuple
? {x: 1, y: 2}
: map-key
"""
parsed = parse_yaml(yaml_text)
seq_key, map_key = list(parsed)
assert isinstance(seq_key, Yaml) and seq_key.value == ["a", "b"]
assert isinstance(map_key, Yaml) and map_key.value == {"x": 1, "y": 2}
assert parsed[Yaml(["a", "b"])] == "tuple"
assert parsed[Yaml({"x": 1, "y": 2})] == "map-key"

Tagged mapping keys
Handlers run on keys too, so a handler can turn tagged keys into friendly Python keys before they are wrapped.
handlers = {"!upper": str.upper}
result = parse_yaml("!upper key: value", handlers=handlers)
assert result == {"KEY": "value"}

If a tagged key has no matching handler, it is preserved as a Yaml key:
from yaml12 import Yaml, parse_yaml
parsed = parse_yaml("!custom foo: 1")
key = next(iter(parsed))
assert isinstance(key, Yaml) and key.tag == "!custom" and key.value == "foo"

If you anticipate tagged mapping keys that you want to process yourself, walk the Yaml keys alongside the values and unwrap them as needed.
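One way to do that walk is sketched below. It assumes only what the examples above show, namely that a wrapped key exposes .tag and .value. The names TaggedKey, unwrap_keys, and converters are hypothetical, and TaggedKey is a stand-in so the snippet runs without yaml12 installed; with real data you would pass the parsed mapping directly.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TaggedKey:
    """Stand-in for a wrapped key: any object with .value and .tag works."""

    value: Any
    tag: str


def unwrap_keys(mapping: dict, converters: dict[str, Callable[[Any], Any]]) -> dict:
    """Return a copy of `mapping` with tagged keys converted where a converter exists."""
    result = {}
    for key, value in mapping.items():
        tag = getattr(key, "tag", None)  # plain keys have no tag attribute
        if tag in converters:
            # Known tag: replace the wrapper with a plain, hashable key.
            key = converters[tag](key.value)
        if isinstance(value, dict):
            # Recurse so nested mappings get the same treatment.
            value = unwrap_keys(value, converters)
        result[key] = value
    return result


parsed = {TaggedKey("foo", "!custom"): 1, "plain": {TaggedKey("bar", "!custom"): 2}}
cleaned = unwrap_keys(parsed, {"!custom": str.upper})
print(cleaned)
```

Keys whose tag has no converter are left wrapped, and mappings nested inside sequences are not visited; extend the recursion if your data needs it.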
Document streams and markers
Most YAML files contain a single YAML document. YAML also supports document streams: multiple documents separated by --- and optionally closed by ....
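To make the marker rules concrete, here is a deliberately naive, stdlib-only sketch that splits a stream wherever a line consists solely of --- or .... The helper split_documents is hypothetical and not part of yaml12. It ignores cases a real parser must handle, such as content on the same line as ---, marker-like lines inside block scalars, directives, and empty documents.

```python
def split_documents(text: str) -> list[str]:
    """Naively split a YAML stream into the raw text of each document."""
    docs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped in ("---", "..."):
            # A marker line closes whatever document was being collected.
            if current:
                docs.append("\n".join(current))
                current = []
        elif stripped or current:
            # Keep content lines; skip blank lines before a document starts.
            current.append(line)
    if current:
        docs.append("\n".join(current))
    return docs


stream = "---\ndoc 1\n---\ndoc 2\n...\n"
print(split_documents(stream))
```

This only shows where the boundaries fall; it does not parse the documents themselves.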
Reading multiple documents
parse_yaml() and read_yaml() default to multi=False. In that mode, they stop after the first document. When multi=True, all documents in the stream are returned as a list.
doc_stream = """
---
doc 1
---
doc 2
"""
parsed_first = parse_yaml(doc_stream)
parsed_all = parse_yaml(doc_stream, multi=True)
assert (parsed_first, parsed_all) == ("doc 1", ["doc 1", "doc 2"])

Writing multiple documents
write_yaml() and format_yaml() also default to a single document. With multi=True, the value must be a sequence of documents and the output uses --- between documents and ... after the final one. For single documents, write_yaml() always wraps the body with --- and a final ..., while format_yaml() returns just the body.
import tempfile
from pathlib import Path
from yaml12 import format_yaml, write_yaml
docs = ["first", "second"]
text = format_yaml(docs, multi=True)
assert text.startswith("---") and text.rstrip().endswith("...")
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = Path(tmpdir) / "out.yml"
    write_yaml(docs, path=out_path, multi=True)
    assert out_path.read_text(encoding="utf-8") == text

When multi=False, parsing stops after the first document, even if later content is not valid YAML. That makes it easy to extract front matter from files that mix YAML with other text (like Markdown).

rmd_lines = [
    "---\n",
    "title: Front matter only\n",
    "params:\n",
    " answer: 42\n",
    "---\n",
    "# Body that is not YAML\n",
]
frontmatter = parse_yaml(rmd_lines)
assert frontmatter == {"title": "Front matter only", "params": {"answer": 42}}

Serializing custom Python objects
You can opt into richer domain types by tagging your own objects on emit and supplying a handler on parse. Here is a round-trip for a dataclass:
from dataclasses import dataclass, asdict
from yaml12 import Yaml, format_yaml, parse_yaml
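
@dataclass
class Server:
    name: str
    host: str
    port: int

def encode_server(server: Server) -> Yaml:
    # Emit the dataclass as a plain mapping carrying the !server tag.
    return Yaml(asdict(server), "!server")

def decode_server(value):
    # The handler receives the parsed mapping and rebuilds the dataclass.
    return Server(**value)
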
servers = [Server("api", "api.example.com", 8000), Server("db", "db.local", 5432)]
yaml_text = format_yaml([encode_server(s) for s in servers])
round_tripped = parse_yaml(yaml_text, handlers={"!server": decode_server})
assert round_tripped == servers

By keeping the on-disk representation a plain mapping plus tag, you get a stable YAML format while still round-tripping your Python types losslessly.
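If several dataclasses need the same treatment, the tag-to-handler mapping can be generated rather than written by hand. The following is a minimal sketch, assuming handlers receive the already-parsed mapping as decode_server does above. The helper handlers_for, the second type Point, and the convention of tagging each class as "!" plus its lowercase name are illustrative choices, not part of yaml12.

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Server:
    name: str
    host: str
    port: int


@dataclass
class Point:  # hypothetical second type, for illustration only
    x: int
    y: int


def handlers_for(*classes):
    """Build a {tag: decoder} dict, tagging each class as '!' + lowercase name."""
    handlers = {}
    for cls in classes:
        if not is_dataclass(cls):
            raise TypeError(f"{cls.__name__} is not a dataclass")
        # Bind cls as a default argument so each lambda keeps its own class.
        handlers["!" + cls.__name__.lower()] = lambda value, cls=cls: cls(**value)
    return handlers


handlers = handlers_for(Server, Point)
# Handlers are plain callables, so they can be exercised without parsing YAML.
server = handlers["!server"]({"name": "api", "host": "api.example.com", "port": 8000})
```

The resulting dict has the same shape as the handlers argument used above, so it could be passed as handlers=handlers when parsing.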
Anchors
Anchors (&id) name a node; aliases (*id) refer back to it. yaml12 resolves aliases before returning Python objects, so each alias appears as the anchored node's value.
from yaml12 import parse_yaml
parsed = parse_yaml("""
recycle-me: &anchor-name
  a: b
  c: d
recycled:
  - *anchor-name
  - *anchor-name
""")
first, second = parsed["recycled"]
assert first["a"] == "b" and second["c"] == "d"

Debugging
If you want to inspect how YAML nodes are parsed before conversion to Python types, use the internal helper yaml12._dbg_yaml() to pretty-print the raw saphyr::Yaml structures:
import yaml12
yaml12._dbg_yaml("!custom [1, 2]")

[
    Tagged(
        Tag {
            handle: "!",
            suffix: "custom",
        },
        Sequence(
            [
                Representation(
                    "1",
                    Plain,
                    None,
                ),
                Representation(
                    "2",
                    Plain,
                    None,
                ),
            ],
        ),
    ),
]

_dbg_yaml is intended for debugging and may change without notice.