crawl.FetchedSource
A fetched source document and its metadata, prior to conversion.
Usage
crawl.FetchedSource(
origin,
body_path,
resolved_origin=None,
content_type=None,
status_code=None,
metadata=None,
fetched_at=None,
revalidated_at=None,
markdown_path=None
)
A FetchedSource is the intermediate result returned by a crawler’s fetch_raw(). It points at the raw body on disk and carries the metadata needed to convert it to a MarkdownDocument. Custom convert callables passed to fetch_markdown() or markdown_documents() receive an instance of this class.
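A minimal sketch of a custom convert callable follows. It is illustrative only. The crawl package is not imported, so a stand-in dataclass mirrors the documented FetchedSource signature. This page does not say what a convert callable must return, so the hypothetical plain_text_convert returns Markdown as a plain str rather than a MarkdownDocument. The sketch passes plain-text and Markdown bodies through, and records the post-redirect origin when one exists.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any


# Stand-in mirroring the documented crawl.FetchedSource signature,
# so this sketch runs without the crawl package installed.
@dataclass
class FetchedSource:
    origin: str
    body_path: Path
    resolved_origin: str | None = None
    content_type: str | None = None
    status_code: int | None = None
    metadata: dict[str, Any] | None = None
    fetched_at: datetime | None = None
    revalidated_at: datetime | None = None
    markdown_path: Path | None = None


def plain_text_convert(source: FetchedSource) -> str:
    """Hypothetical convert callable: passes plain-text and Markdown bodies through."""
    # Drop MIME parameters such as "; charset=utf-8" before comparing.
    mime = (source.content_type or "").split(";")[0].strip().lower()
    if mime not in ("text/plain", "text/markdown"):
        raise ValueError(f"unsupported content type: {source.content_type!r}")
    # Prefer the post-redirect origin when recording provenance.
    origin = source.resolved_origin or source.origin
    body = source.body_path.read_text(encoding="utf-8")
    return f"<!-- source: {origin} -->\n\n{body}"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("hello", encoding="utf-8")
    # For local files, origin is a file:// URI and body_path is the file itself.
    src = FetchedSource(origin=path.resolve().as_uri(), body_path=path, content_type="text/plain")
    print(plain_text_convert(src))
```

Raising on unsupported types is one design choice among several. A real callable might instead fall back to a general HTML converter.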
Parameter Attributes
origin: str
The canonical origin the source was requested from (a file:// URI for local files, an http(s) URL for web sources).
body_path: Path
Filesystem path to the raw fetched body. For local files this is the file itself; for web and Cloudflare sources it is a cached copy.
resolved_origin: str | None = None
The final origin after any redirects, when it differs from origin.
content_type: str | None = None
The reported MIME type, such as "text/html", when available.
status_code: int | None = None
The HTTP status code for web sources, when available.
metadata: dict[str, Any] | None = None
Backend-specific metadata, such as the detected type_label, validators (etag, last_modified), and source hashes.
fetched_at: datetime | None = None
The time the source body was fetched, if known.
revalidated_at: datetime | None = None
The time a cached body was last revalidated against the server, if known.
markdown_path: Path | None = None
Filesystem path to already-converted Markdown, when the backend produced or cached it. None when conversion has not run.
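The etag and last_modified validators in metadata are what an HTTP client needs for a conditional request. The sketch below turns them into If-None-Match and If-Modified-Since headers. It assumes, hypothetically, that the validators sit at the top level of metadata under the keys "etag" and "last_modified" and hold the server's raw header values. The actual layout is backend-specific and may differ.

```python
from __future__ import annotations

from typing import Any


def conditional_headers(metadata: dict[str, Any] | None) -> dict[str, str]:
    """Build HTTP revalidation headers from FetchedSource.metadata.

    Hypothetical layout: validators stored as top-level "etag" and
    "last_modified" keys holding the server's raw header values.
    """
    headers: dict[str, str] = {}
    if not metadata:
        return headers
    etag = metadata.get("etag")
    if etag:
        # The server answers 304 Not Modified if the entity tag still matches.
        headers["If-None-Match"] = str(etag)
    last_modified = metadata.get("last_modified")
    if last_modified:
        headers["If-Modified-Since"] = str(last_modified)
    return headers


print(conditional_headers({"etag": '"abc123"', "last_modified": "Wed, 21 Oct 2015 07:28:00 GMT"}))
```

A 304 response would leave the cached body in place, which is plausibly the situation revalidated_at records.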
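fetched_at and revalidated_at together say when a body was last known to be current. The hypothetical is_fresh helper below uses the later of the two and compares it against a caller-chosen max_age. This freshness policy is an assumption for illustration, not documented crawl behavior. It also assumes timezone-aware datetimes, because Python raises TypeError when naive and aware values are mixed.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def is_fresh(
    fetched_at: datetime | None,
    revalidated_at: datetime | None,
    max_age: timedelta,
    now: datetime | None = None,
) -> bool:
    """Hypothetical freshness policy for a cached FetchedSource body."""
    # The most recent known timestamp is when the body was last confirmed current.
    known = [t for t in (fetched_at, revalidated_at) if t is not None]
    if not known:
        # No timing information: treat the body as stale.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - max(known) <= max_age


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
fetched = datetime(2024, 1, 1, tzinfo=timezone.utc)
revalidated = datetime(2024, 1, 9, tzinfo=timezone.utc)
print(is_fresh(fetched, revalidated, timedelta(days=2), now))
```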
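Because markdown_path is set only when a backend has already produced or cached Markdown, a caller can skip conversion when it is present. The hypothetical load_markdown helper below sketches that: it reads markdown_path when it points at an existing file, and otherwise calls a caller-supplied convert function on body_path. The convert argument here takes a Path and returns str. That shape is an assumption for this sketch, not the signature crawl uses.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def load_markdown(
    markdown_path: Path | None,
    body_path: Path,
    convert: Callable[[Path], str],
) -> str:
    """Return cached Markdown when available, else convert the raw body."""
    # Reuse the backend's output only if the file still exists on disk.
    if markdown_path is not None and markdown_path.exists():
        return markdown_path.read_text(encoding="utf-8")
    return convert(body_path)


with tempfile.TemporaryDirectory() as tmp:
    body = Path(tmp) / "page.txt"
    body.write_text("raw body", encoding="utf-8")
    # No cached Markdown yet, so the fallback converter runs.
    print(load_markdown(None, body, lambda p: p.read_text(encoding="utf-8").upper()))
```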