Multi-modal input

PDFs

This example comes from Google’s cookbook and extracts structured data from a PDF invoice. The goal is to extract the invoice number, date, and all list items with description, quantity, and gross worth, as well as the total gross worth.

import chatlas as ctl
from pydantic import BaseModel, Field


class Item(BaseModel):
    description: str = Field(description="The description of the item")
    quantity: float = Field(description="The Qty of the item")
    gross_worth: float = Field(description="The gross worth of the item")


class Invoice(BaseModel):
    """Extract the invoice number, date and all list items with description, quantity and gross worth and the total gross worth."""

    invoice_number: str = Field(description="The invoice number e.g. 1234567890")
    date: str = Field(description="The date of the invoice e.g. 10/09/2012")
    items: list[Item] = Field(
        description="The list of items with description, quantity and gross worth"
    )
    total_gross_worth: float = Field(description="The total gross worth of the invoice")


_ = Invoice.model_rebuild()

chat = ctl.ChatOpenAI()
chat.extract_data(
    # Wrap the URL so the PDF itself is sent, not just the URL as prompt text
    ctl.content_pdf_url(
        "https://storage.googleapis.com/generativeai-downloads/data/pdf_structured_outputs/invoice.pdf"
    ),
    data_model=Invoice,
)

{
  'invoice_number': 'INV-123456789',
  'date': '09/10/2023',
  'items': [
    {'description': 'Laptop', 'quantity': 2, 'gross_worth': 2000},
    {'description': 'Smartphone', 'quantity': 5, 'gross_worth': 3500},
    {'description': 'Tablet', 'quantity': 3, 'gross_worth': 1200}
  ],
  'total_gross_worth': 6700
}
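
Because the result is a plain dictionary, a quick arithmetic check can catch extraction slips before the data is used downstream. The sketch below copies the output shown above into a variable named `result` (in practice this would be the return value of `chat.extract_data`) and confirms that the line items add up to the reported total. The helper `items_match_total` and its tolerance are illustrative assumptions, not part of chatlas.

```python
import math

# Copied from the output above; in practice this would be the return value
# of chat.extract_data(...).
result = {
    "invoice_number": "INV-123456789",
    "date": "09/10/2023",
    "items": [
        {"description": "Laptop", "quantity": 2, "gross_worth": 2000},
        {"description": "Smartphone", "quantity": 5, "gross_worth": 3500},
        {"description": "Tablet", "quantity": 3, "gross_worth": 1200},
    ],
    "total_gross_worth": 6700,
}


def items_match_total(invoice: dict, tolerance: float = 0.01) -> bool:
    """Return True if the item gross worths sum to the invoice total."""
    items_sum = sum(item["gross_worth"] for item in invoice["items"])
    return math.isclose(items_sum, invoice["total_gross_worth"], abs_tol=tolerance)


print(items_match_total(result))  # True: 2000 + 3500 + 1200 == 6700
```

A failed check does not say which field is wrong, only that the invoice deserves a second look.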

Images

This example comes from Dan Nguyen (you can see other interesting applications at that link). The goal is to extract structured data from this screenshot:

Screenshot of schedule A: a table showing assets and “unearned” income

Even without any field descriptions, ChatGPT does pretty well:

import chatlas as ctl
from pydantic import BaseModel, Field
import pandas as pd

class Asset(BaseModel):
    asset_name: str
    owner: str
    location: str
    asset_value_low: int
    asset_value_high: int
    income_type: str
    income_low: int
    income_high: int
    tx_gt_1000: bool

class DisclosureReport(BaseModel):
    assets: list[Asset]

chat = ctl.ChatOpenAI()
data = chat.extract_data(
    ctl.content_image_file("../images/congressional-assets.png"),
    data_model=DisclosureReport,
)
pd.DataFrame(data["assets"])

	asset_name	owner	location	asset_value_low	asset_value_high	income_type	income_low	income_high	tx_gt_1000
0	11 Zinfandel Lane - Home & Vineyard	JT	St. Helena/Napa, CA, US	5000001	25000000	Grape Sales	100001	1000000	False
1	25 Point Lobos - Commercial Property	SP	San Francisco/San Francisco, CA, US	5000001	25000000	Rent	100001	1000000	False
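
The low/high pairs make a simple sanity check possible: a range whose lower bound exceeds its upper bound suggests the model swapped or misread a column. The sketch below uses the two rows shown above as plain dictionaries, reduced to the fields the check needs. The helper `inverted_ranges` is a hypothetical name introduced here for illustration.

```python
# Two rows in the shape of data["assets"], reduced to the numeric range fields.
assets = [
    {
        "owner": "JT",
        "asset_value_low": 5000001,
        "asset_value_high": 25000000,
        "income_low": 100001,
        "income_high": 1000000,
    },
    {
        "owner": "SP",
        "asset_value_low": 5000001,
        "asset_value_high": 25000000,
        "income_low": 100001,
        "income_high": 1000000,
    },
]


def inverted_ranges(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, field prefix) pairs where the low bound exceeds the high bound."""
    problems = []
    for i, row in enumerate(rows):
        for prefix in ("asset_value", "income"):
            if row[f"{prefix}_low"] > row[f"{prefix}_high"]:
                problems.append((i, prefix))
    return problems


print(inverted_ranges(assets))  # [] for the two rows above
```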