----------------------------------------------------------------------
This is the API documentation for the gdtest_tbl_preview library.
----------------------------------------------------------------------


## Functions

Public functions


sample_scores(n: 'int' = 20) -> 'dict[str, list]'

Generate a student scores dataset.

Parameters
----------
n
    Number of rows.

Returns
-------
dict[str, list]
    Column-oriented dict with name, subject, score, grade, and
    pass/fail columns.

Examples
--------
>>> data = sample_scores(5)
>>> len(data["name"])
5

sample_inventory(n: 'int' = 30) -> 'dict[str, list]'

Generate a product inventory dataset.

Parameters
----------
n
    Number of rows.

Returns
-------
dict[str, list]
    Column-oriented dict with product, category, price, stock,
    and rating columns.

Examples
--------
>>> data = sample_inventory(10)
>>> len(data["product"])
10

sample_wide(n_rows: 'int' = 15, n_cols: 'int' = 20) -> 'dict[str, list]'

Generate a wide dataset with many columns.

Parameters
----------
n_rows
    Number of rows.
n_cols
    Number of columns.

Returns
-------
dict[str, list]
    Column-oriented dict with columns named ``col_001``
    through ``col_{n_cols:03d}``.

Examples
--------
>>> data = sample_wide(5, 8)
>>> len(data)
8

sample_missing(n: 'int' = 15) -> 'dict[str, list]'

Generate a dataset riddled with missing values.

Parameters
----------
n
    Number of rows.

Returns
-------
dict[str, list]
    Column-oriented dict where roughly 25 percent of values are
    ``None`` or ``float('nan')``.

Examples
--------
>>> data = sample_missing(10)
>>> None in data["alpha"]
True

sample_types() -> 'dict[str, list]'

Generate a dataset that exercises many Python types.

Returns
-------
dict[str, list]
    Six rows with int, float, bool, string, None, and large-number
    columns.

Examples
--------
>>> data = sample_types()
>>> len(data["integer"])
6


----------------------------------------------------------------------
This is the User Guide documentation for the package.
----------------------------------------------------------------------

## Default Settings

The simplest way to use `tbl_preview()` is to pass a column-oriented
dict and let the defaults do the work.

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_scores

tbl_preview(sample_scores(20))
```

## From a List of Dicts

You can also pass a list of row dicts:

```{python}
rows = [
    {"city": "Tokyo", "pop_m": 37.4, "country": "Japan"},
    {"city": "Delhi", "pop_m": 32.9, "country": "India"},
    {"city": "Shanghai", "pop_m": 29.2, "country": "China"},
    {"city": "São Paulo", "pop_m": 22.4, "country": "Brazil"},
    {"city": "Mexico City", "pop_m": 21.8, "country": "Mexico"},
]
tbl_preview(rows)
```
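
The two input shapes carry the same information. As a minimal sketch of how they relate, a list of row dicts can be pivoted into a column-oriented dict. The helper name `rows_to_columns` is hypothetical and not part of `great_docs`; this guide does not show how `tbl_preview()` normalizes its input internally, and filling absent keys with `None` is an assumption of the sketch.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts into a column-oriented dict (hypothetical helper)."""
    columns = {}
    # Collect column names in first-seen order
    for row in rows:
        for key in row:
            columns.setdefault(key, [])
    # Fill each column, using None where a row lacks a key (an assumption)
    return {key: [row.get(key) for row in rows] for key in columns}


rows = [
    {"city": "Tokyo", "pop_m": 37.4},
    {"city": "Delhi", "pop_m": 32.9},
    {"city": "Cairo"},  # no "pop_m" key: becomes None
]
columns = rows_to_columns(rows)
```

Each key of the result holds one list per column, which is the same shape the `sample_*` functions return.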

## With a Caption

```{python}
tbl_preview(
    sample_scores(12),
    caption="Student Performance — Fall 2025",
)
```


## Pandas DataFrame

Pass a Pandas DataFrame directly. The preview auto-detects the
library and shows a **Pandas** badge.

```{python}
import pandas as pd
from great_docs import tbl_preview

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve",
             "Frank", "Grace", "Hank", "Iris", "Jack",
             "Kate", "Leo", "Mia", "Noah", "Olivia"],
    "department": ["Eng", "Sales", "Eng", "HR", "Sales",
                   "Eng", "HR", "Sales", "Eng", "HR",
                   "Sales", "Eng", "HR", "Sales", "Eng"],
    "salary": [95000, 72000, 88000, 65000, 78000,
              105000, 62000, 81000, 92000, 58000,
              74000, 110000, 67000, 83000, 97000],
    "years": [5, 3, 7, 2, 4, 10, 1, 6, 8, 3, 4, 12, 2, 5, 9],
})

tbl_preview(df)
```

## Custom Head and Tail

Show 8 rows from the top and 3 from the bottom:

```{python}
tbl_preview(df, n_head=8, n_tail=3)
```
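
To make the head/tail split concrete, here is a small sketch of which row positions such a preview would keep. The helper `head_tail_indices` is hypothetical, and the rule for small tables (show every row once when head and tail together cover the table) is an assumption rather than documented `tbl_preview()` behavior.

```python
def head_tail_indices(n_rows, n_head, n_tail):
    """Return the row indices a head/tail preview would keep (hypothetical helper)."""
    # Assumption: when head and tail together cover the table, show every row once
    if n_head + n_tail >= n_rows:
        return list(range(n_rows))
    head = list(range(n_head))
    tail = list(range(n_rows - n_tail, n_rows))
    return head + tail


# 15 rows with n_head=8 and n_tail=3, as in the call above
indices = head_tail_indices(n_rows=15, n_head=8, n_tail=3)
```

With these arguments the sketch keeps positions 0 through 7 and 12 through 14, and skips the four rows in between.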

## Show All Rows

```{python}
tbl_preview(df, show_all=True)
```


## Polars DataFrame

Polars DataFrames are detected automatically and show a blue
**Polars** badge with precise dtype labels.

```{python}
import polars as pl
from great_docs import tbl_preview

df = pl.DataFrame({
    "id": range(1, 26),
    "value": [x * 1.1 for x in range(1, 26)],
    "category": ["A", "B", "C", "D", "E"] * 5,
    "flag": [True, False] * 12 + [True],
})

tbl_preview(df)
```

## Head Only (No Tail)

```{python}
tbl_preview(df, n_head=10, n_tail=0)
```


## Highlighted Missing Values

By default, `None` and `NaN` values are highlighted in red:

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_missing

tbl_preview(sample_missing(15))
```
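
As a rough illustration of what counts as "missing" here, the check below flags `None` and float `NaN`. The helper `is_missing` is hypothetical; treating empty strings and infinities as present values is an assumption of this sketch, not a statement about the library's rules.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (hypothetical helper)."""
    if value is None:
        return True
    # NaN is a float that is not equal to itself; math.isnan tests for it
    return isinstance(value, float) and math.isnan(value)


column = [1.5, None, float("nan"), 0.0, "", float("inf")]
flags = [is_missing(v) for v in column]
```

Only the second and third entries are flagged; `0.0`, the empty string, and `inf` count as regular values in this sketch.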

## Without Highlighting

Turn off missing-value highlighting with `highlight_missing=False`:

```{python}
tbl_preview(sample_missing(15), highlight_missing=False)
```

## Mixed Python Types

The `sample_types()` dataset covers Inf, NaN, None, empty strings,
HTML-unsafe characters, and large numbers:

```{python}
from gdtest_tbl_preview import sample_types

tbl_preview(sample_types(), show_all=True)
```


## Column Subset

Select and reorder columns with the `columns` parameter:

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_inventory

data = sample_inventory(25)
tbl_preview(data, columns=["product", "price", "rating"])
```
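
For a column-oriented dict, selecting and reordering amounts to rebuilding the dict in the requested key order. The sketch below shows the idea with a hypothetical helper, `select_columns`; raising `KeyError` for unknown names is an assumption, since this guide does not say how `tbl_preview()` handles them.

```python
def select_columns(data, columns):
    """Return a new dict with only `columns`, in the order given (hypothetical helper)."""
    missing = [name for name in columns if name not in data]
    if missing:
        # Assumption: unknown names are an error rather than silently ignored
        raise KeyError(f"unknown columns: {missing}")
    return {name: data[name] for name in columns}


data = {
    "product": ["Widget", "Gadget"],
    "category": ["Tools", "Office"],
    "price": [29.99, 49.50],
    "rating": [4.5, 3.8],
}
subset = select_columns(data, ["rating", "product"])
```

The result contains `rating` first and `product` second, which mirrors the order of the list passed in.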

## Wide Table

A table with 20 columns overflows and scrolls horizontally:

```{python}
from gdtest_tbl_preview import sample_wide

tbl_preview(sample_wide(12, 20))
```
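
The API reference states that `sample_wide()` names its columns `col_001` through `col_{n_cols:03d}`. A minimal sketch of that naming scheme follows; the helper `wide_column_names` is hypothetical and is not the library's implementation.

```python
def wide_column_names(n_cols):
    """Build zero-padded names col_001 ... col_{n_cols:03d} (hypothetical helper)."""
    # The :03d format spec pads the number to three digits with leading zeros
    return [f"col_{i:03d}" for i in range(1, n_cols + 1)]


names = wide_column_names(20)
```

Zero-padding keeps the names in the same order whether they are sorted as text or by number.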

## No Row Numbers

```{python}
tbl_preview(
    sample_inventory(10),
    show_row_numbers=False,
)
```

## No Dtype Labels

```{python}
tbl_preview(
    sample_inventory(10),
    show_dtypes=False,
)
```


## Minimal Chrome

Turn off every optional element — no row numbers, no dtypes,
no dimension badges:

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_scores

tbl_preview(
    sample_scores(8),
    show_row_numbers=False,
    show_dtypes=False,
    show_dimensions=False,
    show_all=True,
)
```

## Full Chrome with Caption

All default elements stay enabled, with a custom head/tail split and a caption:

```{python}
tbl_preview(
    sample_scores(50),
    n_head=10,
    n_tail=5,
    caption="Top & bottom of the class roster",
)
```

## Custom Column Width

Restrict columns to a 120px maximum width, here combined with `min_tbl_width=400`:

```{python}
tbl_preview(
    sample_scores(15),
    max_col_width=120,
    min_tbl_width=400,
)
```

## Side-by-Side Comparison

Default Pandas output vs. `tbl_preview()` on the same data:

::: {layout-ncol=2}

```{python}
#| echo: false
import pandas as pd
df = pd.DataFrame(sample_scores(10))
df
```

```{python}
#| echo: false
tbl_preview(df)
```

:::


## Long Strings (Default Width)

Cells with very long text are capped at `max_col_width` (250px
by default) and show an ellipsis instead of wrapping.

```{python}
from great_docs import tbl_preview

data = {
    "id": [1, 2, 3, 4, 5],
    "title": [
        "A short title",
        "A moderately long title that tests mid-range widths",
        "This title is intentionally very long so that it will definitely exceed the maximum column width and trigger text-overflow ellipsis behavior in the rendered table cell",
        "Brief",
        "Another extremely verbose title string that goes on and on to stress-test the truncation and overflow handling in the preview table renderer",
    ],
    "status": ["draft", "published", "review", "archived", "published"],
}

tbl_preview(data, show_all=True)
```

## Descriptions and Paragraphs

Real-world data often has paragraph-length text in columns.

```{python}
data = {
    "package": ["NumPy", "Pandas", "Polars", "Great Tables", "Pointblank"],
    "description": [
        "Fundamental package for scientific computing with Python. Provides N-dimensional arrays, linear algebra, Fourier transforms, and random number generation.",
        "Powerful data structures for data analysis, time series, and statistics. Built on NumPy with labeled axes, automatic alignment, and rich I/O.",
        "Lightning-fast DataFrame library in Rust with a Python API. Lazy evaluation, multi-threaded queries, and Apache Arrow memory format.",
        "Build beautiful, publication-quality tables in Python. Supports Polars and Pandas DataFrames with fine-grained styling, formatting, and export.",
        "Data validation library for Python. Define expectations, validate data, and generate detailed reports with table-level and column-level checks.",
    ],
    "version": ["1.26.0", "2.2.0", "0.20.0", "0.15.0", "0.14.0"],
}

tbl_preview(data, show_all=True)
```

## Narrow Max Width (120px)

Force aggressive truncation with a tight `max_col_width`:

```{python}
tbl_preview(data, show_all=True, max_col_width=120)
```

## Wide Max Width (500px)

Allow generous room — long text is still capped, but more is visible:

```{python}
tbl_preview(data, show_all=True, max_col_width=500)
```

## Mixed Short and Long Columns

Short numeric/code columns alongside verbose text — each column
gets its own computed width.

```{python}
data = {
    "code": ["E001", "E002", "E003", "W001", "W002", "I001", "I002", "E004"],
    "severity": ["error", "error", "error", "warning", "warning", "info", "info", "error"],
    "message": [
        "Undefined variable: foobar",
        "Type mismatch: expected int, got str in argument `count` of function process_batch()",
        "Division by zero in expression total / n_items where n_items evaluates to 0",
        "Unused import: os (imported but never referenced in module)",
        "Variable `tmp` assigned on line 42 but never used anywhere in the function body",
        "Module docstring missing: consider adding a module-level docstring",
        "Line too long: 127 characters (max 120). Consider breaking this into multiple lines for readability",
        "Syntax error: unexpected token ) at position 34 in expression parse(input))",
    ],
    "line": [12, 45, 78, 3, 42, 1, 99, 34],
}

tbl_preview(data, show_all=True)
```


## Read a TSV File

`tbl_preview()` auto-detects `.tsv` and `.tab` files and reads
them with tab-delimited parsing.

```{python}
#| echo: false
import pathlib

tsv_path = pathlib.Path('assets/cities.tsv')
tsv_path.parent.mkdir(parents=True, exist_ok=True)
tsv_path.write_text(
    'city\tcountry\tpopulation\tarea_km2\n'
    'Tokyo\tJapan\t13960000\t2194\n'
    'Delhi\tIndia\t11030000\t1484\n'
    'Shanghai\tChina\t24870000\t6341\n'
    'São Paulo\tBrazil\t12330000\t1521\n'
    'Mexico City\tMexico\t9210000\t1485\n'
    'Cairo\tEgypt\t9540000\t3085\n'
    'Mumbai\tIndia\t12440000\t603\n'
    'Beijing\tChina\t21540000\t16411\n'
)
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/cities.tsv', show_all=True)
```

The badge shows **TSV** and the header reports the file's
8 rows and 4 columns.
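
Tab-delimited parsing itself can be illustrated with the standard library. This is a sketch only: it uses Python's `csv` module on an in-memory string, and it is not a description of the reader `tbl_preview()` uses internally.

```python
import csv
import io

tsv_text = (
    "city\tcountry\tpopulation\tarea_km2\n"
    "Tokyo\tJapan\t13960000\t2194\n"
    "Delhi\tIndia\t11030000\t1484\n"
)

# csv.reader splits fields on tabs when delimiter="\t"
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
header = next(reader)
records = list(reader)

# Pivot the rows into a column-oriented dict; values stay as strings here
columns = {name: [row[i] for row in records] for i, name in enumerate(header)}
```

Note that `csv` returns every field as a string, so numeric columns such as `population` would still need type conversion before use.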


## Read a JSONL File

Newline-delimited JSON (`.jsonl` / `.ndjson`) is a common
format for streaming data and log records. Pass the file path
to `tbl_preview()` to preview it.

```{python}
#| echo: false
import json
import pathlib

records = [
    {'timestamp': '2025-01-15T08:30:00', 'level': 'INFO', 'module': 'auth', 'message': 'User login successful'},
    {'timestamp': '2025-01-15T08:31:12', 'level': 'WARNING', 'module': 'db', 'message': 'Slow query detected (3.2s)'},
    {'timestamp': '2025-01-15T08:32:45', 'level': 'ERROR', 'module': 'api', 'message': 'Request timeout on /v2/users'},
    {'timestamp': '2025-01-15T08:33:01', 'level': 'INFO', 'module': 'cache', 'message': 'Cache miss for key user:42'},
    {'timestamp': '2025-01-15T08:34:20', 'level': 'DEBUG', 'module': 'auth', 'message': 'Token refresh for session abc123'},
    {'timestamp': '2025-01-15T08:35:55', 'level': 'ERROR', 'module': 'db', 'message': 'Connection pool exhausted'},
    {'timestamp': '2025-01-15T08:36:10', 'level': 'INFO', 'module': 'api', 'message': 'Health check passed'},
    {'timestamp': '2025-01-15T08:37:30', 'level': 'WARNING', 'module': 'auth', 'message': 'Failed login attempt from 192.168.1.100'},
]

jsonl_path = pathlib.Path('assets/server_logs.jsonl')
jsonl_path.parent.mkdir(parents=True, exist_ok=True)
jsonl_path.write_text('\n'.join(json.dumps(r) for r in records) + '\n')
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/server_logs.jsonl', show_all=True)
```
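
In the JSONL format each line is one standalone JSON object. A minimal sketch of parsing it with the standard library is shown below; skipping blank lines is an assumption of the sketch, and the code does not represent how `tbl_preview()` reads the file.

```python
import json

jsonl_text = (
    '{"level": "INFO", "module": "auth"}\n'
    '{"level": "ERROR", "module": "db"}\n'
    "\n"  # blank lines are skipped in this sketch
)

# Parse one JSON object per non-empty line
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```

The result is a list of row dicts, the same shape accepted in "From a List of Dicts".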

## NDJSON Extension

The `.ndjson` extension is treated identically:

```{python}
#| echo: false
import shutil
shutil.copy('assets/server_logs.jsonl', 'assets/server_logs.ndjson')
```

```{python}
tbl_preview('assets/server_logs.ndjson', show_all=True)
```


## Read a Parquet File

Apache Parquet is a columnar storage format popular in data
engineering workflows.

```{python}
#| echo: false
import pathlib
import polars as pl

df = pl.DataFrame({
    'product': ['Widget', 'Gadget', 'Gizmo', 'Doohickey', 'Thingamajig'],
    'category': ['Electronics', 'Tools', 'Kitchen', 'Garden', 'Office'],
    'price': [29.99, 49.50, 12.00, 8.75, 199.99],
    'in_stock': [True, False, True, True, False],
    'rating': [4.5, 3.8, 4.9, 4.2, 2.1],
})

pq_path = pathlib.Path('assets/products.parquet')
pq_path.parent.mkdir(parents=True, exist_ok=True)
df.write_parquet(str(pq_path))
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/products.parquet', show_all=True)
```

The badge shows **Parquet** and dtype labels are preserved
from the original Polars schema.


## Feather File

Feather (Apache Arrow IPC format) is fast for local analytics.

```{python}
#| echo: false
import pathlib
import polars as pl

df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing', 'Sales'],
    'salary': [95000, 72000, 105000, 68000, 88000, 71000],
    'years': [5, 3, 8, 2, 6, 4],
})

feather_path = pathlib.Path('assets/employees.feather')
feather_path.parent.mkdir(parents=True, exist_ok=True)
df.write_ipc(str(feather_path))
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/employees.feather', show_all=True)
```

## Arrow IPC Extension

Files with `.arrow` or `.ipc` extensions are also read as
Arrow IPC, but get the **Arrow** badge instead of Feather:

```{python}
#| echo: false
import shutil
shutil.copy('assets/employees.feather', 'assets/employees.arrow')
```

```{python}
tbl_preview('assets/employees.arrow', show_all=True)
```
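
The file sections of this guide pair extensions with format families. As a sketch of extension-based detection, the lookup below collects those pairings in one place. The dict `FORMAT_BY_SUFFIX` and the helper `detect_format` are hypothetical; the case-insensitive match and the `None` fallback are assumptions, not documented library behavior.

```python
from pathlib import Path

# Extension families named in this guide; the dict itself is a hypothetical sketch
FORMAT_BY_SUFFIX = {
    ".tsv": "tab-delimited",
    ".tab": "tab-delimited",
    ".jsonl": "newline-delimited JSON",
    ".ndjson": "newline-delimited JSON",
    ".parquet": "Parquet",
    ".feather": "Arrow IPC",
    ".arrow": "Arrow IPC",
    ".ipc": "Arrow IPC",
}


def detect_format(path):
    """Return the format family for a path, or None if the suffix is not listed."""
    # Assumption: suffix matching ignores case
    return FORMAT_BY_SUFFIX.get(Path(path).suffix.lower())


fmt = detect_format("assets/employees.arrow")
```

Here `.feather`, `.arrow`, and `.ipc` share one format family even though the badge differs between Feather and Arrow.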


## In-Memory Arrow Table

`tbl_preview()` also accepts a `pyarrow.Table` directly —
no file needed.

```{python}
import pyarrow as pa
from great_docs import tbl_preview

tbl = pa.table({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mexico City',
             'Cairo', 'Mumbai', 'Beijing', 'Dhaka', 'Osaka'],
    'country': ['Japan', 'India', 'China', 'Brazil', 'Mexico',
                'Egypt', 'India', 'China', 'Bangladesh', 'Japan'],
    'population_m': [13.96, 11.03, 24.87, 12.33, 9.21,
                     9.54, 12.44, 21.54, 8.91, 2.75],
    'area_km2': [2194, 1484, 6341, 1521, 1485,
                 3085, 603, 16411, 306, 225],
})

tbl_preview(tbl, show_all=True)
```

## Arrow Table with Typed Columns

PyArrow preserves rich type information (booleans, dates, decimals,
and more), which `tbl_preview()` maps to short dtype labels. The
table below uses string, date, boolean, and integer columns.

```{python}
import pyarrow as pa
from datetime import date

tbl = pa.table({
    'event': ['Launch', 'Update', 'Hotfix', 'Deprecation'],
    'date': [date(2025, 1, 15), date(2025, 3, 1), date(2025, 3, 12), date(2025, 6, 30)],
    'critical': [True, False, True, False],
    'affected_users': [50000, 12000, 8500, 2000],
})

tbl_preview(tbl, show_all=True)
```