scrape.find_links()
Discover hyperlinks starting from one or many documents and return them as URLs.
Usage
scrape.find_links(
x,
depth=0,
children_only=False,
progress=True,
*,
url_filter=None,
validate=False,
**request_kwargs
)

Parameters
x: str | Path | Sequence[str | Path]
Starting URL(s). Accepts strings or paths; inputs must expand to HTTP(S) URLs.

depth: int = 0
Maximum traversal depth from each starting document. 0 inspects the starting pages only, 1 also inspects their direct children, and so on.

children_only: bool = False
When True, only links that stay under the originating host are returned and traversed.

progress: bool = True
Whether to display a progress bar while traversing links. Falls back to a no-op when tqdm is not available.

url_filter: Callable[[set[str]], list[str]] | None = None
Receives the set of discovered URLs and returns the list of URLs to keep, possibly a subset.
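As an illustration of the url_filter contract, the sketch below is a hypothetical filter (html_only is not part of the library) that drops links to common binary assets, keeping only page-like URLs:

```python
from urllib.parse import urlparse


def html_only(urls: set[str]) -> list[str]:
    """Example url_filter: keep only links whose path does not end in a
    common binary-asset extension. Sorted for a stable, repeatable order."""
    skip = (".pdf", ".zip", ".png", ".jpg", ".jpeg", ".gif", ".css", ".js")
    return sorted(
        u for u in urls
        if not urlparse(u).path.lower().endswith(skip)
    )
```

It would then be passed as scrape.find_links(start_url, url_filter=html_only).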
validate: bool = False
When True, perform a lightweight validation to ensure targets are reachable before including them in the results.

request_kwargs: Any = {}
Additional keyword arguments forwarded to requests.Session.get() (and head during validation) when fetching HTTP resources.
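A reachability check of the kind validate=True describes can be sketched as follows; is_reachable is a hypothetical helper, not the library's internal implementation, but it shows how request_kwargs would flow into the HEAD request:

```python
import requests


def is_reachable(url: str, session: requests.Session, **request_kwargs) -> bool:
    """Lightweight validation sketch: issue a HEAD request and treat any
    non-error status as reachable; connection failures count as unreachable."""
    try:
        resp = session.head(url, allow_redirects=True, **request_kwargs)
        return resp.ok
    except requests.RequestException:
        return False
```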
Returns
Iterator[str]
Yields absolute link targets, deduplicated and ordered as discovered.
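The depth semantics above (0 inspects only the starting pages, 1 also follows their direct children) amount to a breadth-first traversal with deduplication. A minimal sketch, with the page-fetching step injected as a callable rather than performed over HTTP (walk_links and get_links are illustrative names, not library API):

```python
from collections.abc import Callable, Iterator


def walk_links(
    start: list[str],
    depth: int,
    get_links: Callable[[str], list[str]],
) -> Iterator[str]:
    """Breadth-first link discovery: depth=0 yields only links found on the
    starting pages; depth=1 also inspects those linked pages, and so on.
    Each URL is yielded once, in discovery order."""
    seen: set[str] = set()
    frontier = list(start)
    for _ in range(depth + 1):
        next_frontier: list[str] = []
        for page in frontier:
            for url in get_links(page):
                if url not in seen:
                    seen.add(url)
                    yield url
                    next_frontier.append(url)
        frontier = next_frontier
```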