web_monitoring.db.Client.get_pages
Client.get_pages(*, chunk=None, chunk_size=None, sort=None, tags=None, maintainers=None, url=None, title=None, include_versions=None, include_earliest=None, include_latest=None, source_type=None, hash=None, start_date=None, end_date=None, active=None, include_total=False)
Get an iterable of all pages, optionally filtered by search criteria.
Metadata about each paginated chunk of results is available on the “_list_meta” field of each page, e.g.:
>>> pages = client.get_pages(include_total=True)
>>> next(pages)['_list_meta']
{'total_results': 123456}
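Because the result is an iterable rather than a list, it can be consumed lazily. The sketch below shows one way to take only the first few pages and read the chunk metadata from one of them. It does not call the real client: `fake_get_pages` is a hypothetical stand-in that mimics the documented behaviour, and the `url` field on each page dict is an illustrative assumption.

```python
from itertools import islice


def fake_get_pages(total=5):
    """Hypothetical stand-in for Client.get_pages(include_total=True).

    Yields page dicts lazily; each one carries a '_list_meta' field,
    as described above. The 'url' field is illustrative only.
    """
    for index in range(total):
        yield {
            'url': f'https://example.gov/page-{index}',
            '_list_meta': {'total_results': total},
        }


def take_pages(pages, limit):
    """Return at most `limit` pages without consuming the rest of the iterable."""
    return list(islice(pages, limit))


pages = fake_get_pages()
first_three = take_pages(pages, 3)
total = first_three[0]['_list_meta']['total_results']
```

With a real client, `pages` would come from `client.get_pages(include_total=True)`. Any pages not taken by `islice` stay unread on the iterable until it is iterated further.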
- Parameters
- chunk : integer, optional
Pagination chunk to start iterating from. If unset, starts at the beginning of the result set. (Under the hood, results are retrieved in “chunks”; using this to skip partway into the results is more efficient than skipping over the first few items in the iterable.)
- chunk_size : integer, optional
Number of items per chunk. (Under the hood, results are retrieved in “chunks”; this specifies how big those chunks are.)
- sort : list of string, optional
Fields to sort by in {field}:{order} format, e.g. title:asc.
- tags : list of string, optional
- maintainers : list of string, optional
- url : string, optional
- title : string, optional
- include_versions : boolean, optional
- include_earliest : boolean, optional
- include_latest : boolean, optional
- source_type : string, optional
Only include pages that have versions from a given source, e.g. ‘versionista’ or ‘internet_archive’.
- hash : string, optional
Only include pages that have versions whose response body has a given SHA-256 hash.
- start_date : datetime, optional
- end_date : datetime, optional
- active : boolean, optional
- include_total : boolean, optional
Whether to include a meta.total_results field in the response. If not set, links.last will usually be empty unless you are on the last chunk. Setting this option runs a pretty expensive query, so use it sparingly. (Default: False)
- Yields
- page : dict
Data about a page.
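As a rough illustration of preparing filter values, the sketch below builds a `sort` entry in the `{field}:{order}` format and computes a SHA-256 digest for the `hash` parameter. The helpers `sort_spec` and `sha256_hex` are hypothetical and not part of the library. Accepting `desc` as an order and passing the digest as a hex string are assumptions, since the text above only shows `title:asc` and does not state the hash encoding.

```python
import hashlib


def sort_spec(field, order='asc'):
    """Format one sort entry as '{field}:{order}' (hypothetical helper).

    Only 'asc' appears in the documentation; 'desc' is an assumption.
    """
    if order not in ('asc', 'desc'):
        raise ValueError(f'unsupported sort order: {order!r}')
    return f'{field}:{order}'


def sha256_hex(body):
    """Return the SHA-256 digest of a response body (bytes) as a hex string."""
    return hashlib.sha256(body).hexdigest()


# Keyword arguments that could be passed on as client.get_pages(**filters),
# given a configured client (not shown here).
filters = {
    'sort': [sort_spec('title', 'asc')],
    'hash': sha256_hex(b'<html></html>'),  # assumes a hex-encoded digest
    'source_type': 'internet_archive',
}
```

Collecting the arguments in a dict keeps them keyword-only, which matches the `*` at the start of the signature.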