# parallel_chat

```python
parallel_chat(
    chat,
    prompts,
    *,
    max_active=10,
    rpm=500,
    on_error='return',
    kwargs=None,
)
```

Submit multiple chat prompts in parallel.
If you have multiple prompts, you can submit them in parallel. This is typically much faster than submitting them in sequence, especially with providers like OpenAI and Google.

If you're using ChatOpenAI or ChatAnthropic and are willing to wait longer, consider batch_chat() instead: it comes with a 50% discount in return for taking up to 24 hours.
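For instance, a rough sketch of the batch alternative. This is illustrative only: check batch_chat()'s documentation for its exact signature; in particular, the `path` argument (a file for tracking the batch job across sessions) is an assumption here.

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
prompts = ["What's the capital of Canada?", "What's the capital of Jamaica?"]

# Submits the prompts as a single discounted batch job; results may take
# up to 24 hours to arrive. The `path` argument is assumed here to point
# at a file where the job's state is stored.
chats = ctl.batch_chat(chat, prompts, path="capitals.json")
```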
## Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `chat` | `ChatT` | A base chat object. | _required_ |
| `prompts` | `list[ContentT]` \| `list[list[ContentT]]` | A list of prompts. Each prompt can be a string or a list of string/`Content` objects. | _required_ |
| `max_active` | `int` | The maximum number of simultaneous requests to send. For Anthropic, note that the number of active connections is limited primarily by the output tokens per minute (OTPM) limit, which is estimated from the `max_tokens` parameter (default 4096). If your usage tier limits you to 16,000 OTPM, you should either set `max_active=4` (16,000 / 4096) or reduce `max_tokens` via `set_model_params()`; see the sketch after this table. | `10` |
| `rpm` | `int` | Maximum number of requests per minute. | `500` |
| `on_error` | `Literal['return', 'continue', 'stop']` | What to do when a request fails. `"return"` (the default): stop processing new requests, wait for in-flight requests to finish, then return. `"continue"`: keep going, performing every request. `"stop"`: stop processing and raise an error. | `'return'` |
| `kwargs` | `Optional[dict[str, Any]]` | Additional keyword arguments to pass to the chat method. | `None` |
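For instance, a minimal sketch of tuning `max_active` for an Anthropic OTPM limit. The 16,000 OTPM figure comes from the description above and is illustrative; substitute your own tier's limit:

```python
import chatlas as ctl

chat = ctl.ChatAnthropic()
prompts = ["..."]  # your prompts

# With the default max_tokens of 4096 and a 16,000 OTPM limit:
# 16,000 / 4096 ≈ 3.9, so allow at most 4 simultaneous requests.
chats = await ctl.parallel_chat(chat, prompts, max_active=4)

# Alternatively, lower max_tokens so more requests fit within the limit:
chat.set_model_params(max_tokens=1000)  # 16,000 / 1000 = 16 active requests
chats = await ctl.parallel_chat(chat, prompts, max_active=16)
```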
## Returns

| Name | Type | Description |
|---|---|---|
| | | A list with one element for each prompt. Each element is either a `Chat` object (if successful), `None` (if the request wasn't submitted), or an error object (if it failed). |
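For example, a minimal sketch of inspecting these mixed results, using `on_error="continue"` so every prompt is attempted. It assumes failed requests come back as `Exception` instances, as the "error object" description above suggests:

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
prompts = ["What's 1 + 1?", "What's 2 + 2?"]
chats = await ctl.parallel_chat(chat, prompts, on_error="continue")

for prompt, result in zip(prompts, chats):
    if result is None:
        print(f"Not submitted: {prompt}")
    elif isinstance(result, Exception):
        print(f"Failed: {prompt} ({result})")
    else:
        # A successful result is a Chat object with the new turns appended
        print(f"OK: {prompt}")
```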
## Examples
Basic usage with multiple prompts:
```python
import asyncio

import chatlas as ctl

chat = ctl.ChatOpenAI()
countries = ["Canada", "New Zealand", "Jamaica", "United States"]
prompts = [f"What's the capital of {country}?" for country in countries]

# NOTE: if running from a script, you'd need to wrap this in an async
# function and call asyncio.run(main())
chats = await ctl.parallel_chat(chat, prompts)
```
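As the note above mentions, top-level `await` only works in environments that already run an event loop (e.g. Jupyter). A script-friendly sketch of the same example:

```python
import asyncio

import chatlas as ctl


async def main():
    chat = ctl.ChatOpenAI()
    countries = ["Canada", "New Zealand", "Jamaica", "United States"]
    prompts = [f"What's the capital of {country}?" for country in countries]
    return await ctl.parallel_chat(chat, prompts)


chats = asyncio.run(main())
```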
Using with interpolation:

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
template = "What's the capital of {{ country }}?"
countries = ["Canada", "New Zealand", "Jamaica"]
prompts = [ctl.interpolate(template, variables={"country": c}) for c in countries]
chats = await ctl.parallel_chat(chat, prompts, max_active=5)
```

## See Also
- parallel_chat_text: Get just the text responses
- parallel_chat_structured: Extract structured data
- batch_chat: Batch API for discounted processing