# parallel_chat

```python
parallel_chat(
    chat,
    prompts,
    *,
    max_active=10,
    rpm=500,
    on_error='return',
    kwargs=None,
)
```

Submit multiple chat prompts in parallel.
If you have multiple prompts, you can submit them in parallel. This is typically much faster than submitting them in sequence, especially with providers like OpenAI and Google.

If you're using ChatOpenAI or ChatAnthropic and are willing to wait longer, consider batch_chat() instead: it comes with a 50% discount in return for taking up to 24 hours.
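For instance, a rough sketch of the batch alternative. This is illustrative only: check batch_chat()'s documentation for its exact signature; in particular, the `path` argument (a file for tracking the batch job across sessions) is an assumption here.

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
prompts = ["What's the capital of Canada?", "What's the capital of Jamaica?"]

# Submits the prompts as a single discounted batch job; results may take
# up to 24 hours to arrive. The `path` argument is assumed here to point
# at a file where the job's state is stored.
chats = ctl.batch_chat(chat, prompts, path="capitals.json")
```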
## Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `chat` | `ChatT` | A base chat object. | _required_ |
| `prompts` | `list[ContentT]` \| `list[list[ContentT]]` | A list of prompts. Each prompt can be a string or a list of string/`Content` objects. | _required_ |
| `max_active` | `int` | The maximum number of simultaneous requests to send. For Anthropic, note that the number of active connections is limited primarily by the output tokens per minute (OTPM) limit, which is estimated from the `max_tokens` parameter (default 4096). If your usage tier limits you to 16,000 OTPM, you should either set `max_active=4` (16,000 / 4096) or reduce `max_tokens` via `set_model_params()`; see the sketch after this table. | `10` |
| `rpm` | `int` | Maximum number of requests per minute. | `500` |
| `on_error` | `Literal['return', 'continue', 'stop']` | What to do when a request fails. `"return"` (the default): stop processing new requests, wait for in-flight requests to finish, then return. `"continue"`: keep going, performing every request. `"stop"`: stop processing and raise an error. | `'return'` |
| `kwargs` | `Optional[dict[str, Any]]` | Additional keyword arguments to pass to the chat method. | `None` |
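For instance, a minimal sketch of tuning `max_active` for an Anthropic OTPM limit. The 16,000 OTPM figure comes from the description above and is illustrative; substitute your own tier's limit:

```python
import chatlas as ctl

chat = ctl.ChatAnthropic()
prompts = ["..."]  # your prompts

# With the default max_tokens of 4096 and a 16,000 OTPM limit:
# 16,000 / 4096 ≈ 3.9, so allow at most 4 simultaneous requests.
chats = await ctl.parallel_chat(chat, prompts, max_active=4)

# Alternatively, lower max_tokens so more requests fit within the limit:
chat.set_model_params(max_tokens=1000)  # 16,000 / 1000 = 16 active requests
chats = await ctl.parallel_chat(chat, prompts, max_active=16)
```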
## Returns

| Name | Type | Description |
|---|---|---|
| | | A list with one element for each prompt. Each element is either a `Chat` object (if successful), `None` (if the request wasn't submitted), or an error object (if it failed). |
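For example, a minimal sketch of inspecting these mixed results, using `on_error="continue"` so every prompt is attempted. It assumes failed requests come back as `Exception` instances, as the "error object" description above suggests:

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
prompts = ["What's 1 + 1?", "What's 2 + 2?"]
chats = await ctl.parallel_chat(chat, prompts, on_error="continue")

for prompt, result in zip(prompts, chats):
    if result is None:
        print(f"Not submitted: {prompt}")
    elif isinstance(result, Exception):
        print(f"Failed: {prompt} ({result})")
    else:
        # A successful result is a Chat object with the new turns appended
        print(f"OK: {prompt}")
```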
## Examples
Basic usage with multiple prompts:
```python
import asyncio

import chatlas as ctl

chat = ctl.ChatOpenAI()
countries = ["Canada", "New Zealand", "Jamaica", "United States"]
prompts = [f"What's the capital of {country}?" for country in countries]

# NOTE: if running from a script, you'd need to wrap this in an async
# function and call asyncio.run(main())
chats = await ctl.parallel_chat(chat, prompts)
```
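As the note above mentions, top-level `await` only works in environments that already run an event loop (e.g. Jupyter). A script-friendly sketch of the same example:

```python
import asyncio

import chatlas as ctl


async def main():
    chat = ctl.ChatOpenAI()
    countries = ["Canada", "New Zealand", "Jamaica", "United States"]
    prompts = [f"What's the capital of {country}?" for country in countries]
    return await ctl.parallel_chat(chat, prompts)


chats = asyncio.run(main())
```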
Using with interpolation:

```python
import chatlas as ctl

chat = ctl.ChatOpenAI()
template = "What's the capital of {{ country }}?"
countries = ["Canada", "New Zealand", "Jamaica"]
prompts = [ctl.interpolate(template, variables={"country": c}) for c in countries]
chats = await ctl.parallel_chat(chat, prompts, max_active=5)
```

## See Also
- parallel_chat_text: Get just the text responses
- parallel_chat_structured: Extract structured data
- batch_chat: Batch API for discounted processing