----------------------------------------------------------------------
This is the API documentation for the gdtest_tbl_preview library.
----------------------------------------------------------------------

## Functions

Public functions

sample_scores(n: 'int' = 20) -> 'dict[str, list]'

    Generate a student scores dataset.

    Parameters
    ----------
    n
        Number of rows.

    Returns
    -------
    dict[str, list]
        Column-oriented dict with name, subject, score, grade, and
        pass/fail columns.

    Examples
    --------
    >>> data = sample_scores(5)
    >>> len(data["name"])
    5

sample_inventory(n: 'int' = 30) -> 'dict[str, list]'

    Generate a product inventory dataset.

    Parameters
    ----------
    n
        Number of rows.

    Returns
    -------
    dict[str, list]
        Column-oriented dict with product, category, price, stock, and
        rating columns.

    Examples
    --------
    >>> data = sample_inventory(10)
    >>> len(data["product"])
    10

sample_wide(n_rows: 'int' = 15, n_cols: 'int' = 20) -> 'dict[str, list]'

    Generate a wide dataset with many columns.

    Parameters
    ----------
    n_rows
        Number of rows.
    n_cols
        Number of columns.

    Returns
    -------
    dict[str, list]
        Column-oriented dict with columns named ``col_001`` through
        ``col_{n_cols:03d}``.

    Examples
    --------
    >>> data = sample_wide(5, 8)
    >>> len(data)
    8

sample_missing(n: 'int' = 15) -> 'dict[str, list]'

    Generate a dataset riddled with missing values.

    Parameters
    ----------
    n
        Number of rows.

    Returns
    -------
    dict[str, list]
        Column-oriented dict where roughly 25 percent of values are
        ``None`` or ``float('nan')``.

    Examples
    --------
    >>> data = sample_missing(10)
    >>> None in data["alpha"]
    True

sample_types() -> 'dict[str, list]'

    Generate a dataset that exercises many Python types.

    Returns
    -------
    dict[str, list]
        Six rows with int, float, bool, string, None, and large-number
        columns.

    Examples
    --------
    >>> data = sample_types()
    >>> len(data["integer"])
    6

----------------------------------------------------------------------
This is the User Guide documentation for the package.
----------------------------------------------------------------------

## Default Settings

The simplest way to use `tbl_preview()`: pass a column-oriented dict and let the defaults do the work.

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_scores

tbl_preview(sample_scores(20))
```

## From a List of Dicts

You can also pass a list of row dicts:

```{python}
rows = [
    {"city": "Tokyo", "pop_m": 37.4, "country": "Japan"},
    {"city": "Delhi", "pop_m": 32.9, "country": "India"},
    {"city": "Shanghai", "pop_m": 29.2, "country": "China"},
    {"city": "São Paulo", "pop_m": 22.4, "country": "Brazil"},
    {"city": "Mexico City", "pop_m": 21.8, "country": "Mexico"},
]
tbl_preview(rows)
```

## With a Caption

```{python}
tbl_preview(
    sample_scores(12),
    caption="Student Performance — Fall 2025",
)
```

## Pandas DataFrame

Pass a Pandas DataFrame directly. The preview auto-detects the library and shows a **Pandas** badge.

```{python}
import pandas as pd
from great_docs import tbl_preview

df = pd.DataFrame({
    "name": [
        "Alice", "Bob", "Charlie", "Diana", "Eve",
        "Frank", "Grace", "Hank", "Iris", "Jack",
        "Kate", "Leo", "Mia", "Noah", "Olivia",
    ],
    "department": [
        "Eng", "Sales", "Eng", "HR", "Sales",
        "Eng", "HR", "Sales", "Eng", "HR",
        "Sales", "Eng", "HR", "Sales", "Eng",
    ],
    "salary": [
        95000, 72000, 88000, 65000, 78000,
        105000, 62000, 81000, 92000, 58000,
        74000, 110000, 67000, 83000, 97000,
    ],
    "years": [5, 3, 7, 2, 4, 10, 1, 6, 8, 3, 4, 12, 2, 5, 9],
})
tbl_preview(df)
```

## Custom Head and Tail

Show 8 rows from the top and 3 from the bottom:

```{python}
tbl_preview(df, n_head=8, n_tail=3)
```

## Show All Rows

```{python}
tbl_preview(df, show_all=True)
```

## Polars DataFrame

Polars DataFrames are detected automatically and show a blue **Polars** badge with precise dtype labels.
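As background on how this kind of automatic detection can work in general, the sketch below guesses the originating library from the module path of an object's type, so neither Pandas nor Polars has to be imported up front. It is an illustration under stated assumptions, not the `great_docs` implementation: `detect_backend()`, the badge strings in `KNOWN_BACKENDS`, and the `FakeFrame` stand-in are all hypothetical names.

```python
# Hypothetical sketch: NOT the great_docs implementation.
# Maps the top-level package of an object's type to a badge string.
KNOWN_BACKENDS = {"pandas": "Pandas", "polars": "Polars", "pyarrow": "Arrow"}


def detect_backend(obj):
    """Guess which table library produced `obj` from its type's module path."""
    root = type(obj).__module__.split(".")[0]
    if root in KNOWN_BACKENDS:
        return KNOWN_BACKENDS[root]
    # Plain Python containers fall through to simple isinstance checks.
    if isinstance(obj, dict):
        return "dict"
    if isinstance(obj, list):
        return "list"
    return "unknown"


class FakeFrame:
    """Stand-in that pretends to be defined inside the polars package."""
    __module__ = "polars.dataframe.frame"


print(detect_backend(FakeFrame()))       # Polars
print(detect_backend({"a": [1, 2]}))     # dict
```

The library's own handling of a real Polars DataFrame is shown in the block that follows.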
```{python}
import polars as pl
from great_docs import tbl_preview

df = pl.DataFrame({
    "id": range(1, 26),
    "value": [x * 1.1 for x in range(1, 26)],
    "category": ["A", "B", "C", "D", "E"] * 5,
    "flag": [True, False] * 12 + [True],
})
tbl_preview(df)
```

## Head Only (No Tail)

```{python}
tbl_preview(df, n_head=10, n_tail=0)
```

## Highlighted Missing Values

By default, `None` and `NaN` values are highlighted in red:

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_missing

tbl_preview(sample_missing(15))
```

## Without Highlighting

Turn off missing-value highlighting with `highlight_missing=False`:

```{python}
tbl_preview(sample_missing(15), highlight_missing=False)
```

## Mixed Python Types

Inf, NaN, None, empty strings, HTML-unsafe characters, and large numbers:

```{python}
from gdtest_tbl_preview import sample_types

tbl_preview(sample_types(), show_all=True)
```

## Column Subset

Select and reorder columns with the `columns` parameter:

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_inventory

data = sample_inventory(25)
tbl_preview(data, columns=["product", "price", "rating"])
```

## Wide Table

A table with 20 columns overflows and scrolls horizontally:

```{python}
from gdtest_tbl_preview import sample_wide

tbl_preview(sample_wide(12, 20))
```

## No Row Numbers

```{python}
tbl_preview(
    sample_inventory(10),
    show_row_numbers=False,
)
```

## No Dtype Labels

```{python}
tbl_preview(
    sample_inventory(10),
    show_dtypes=False,
)
```

## Minimal Chrome

Turn off every optional element (no row numbers, no dtypes, no dimension badges):

```{python}
from great_docs import tbl_preview
from gdtest_tbl_preview import sample_scores

tbl_preview(
    sample_scores(8),
    show_row_numbers=False,
    show_dtypes=False,
    show_dimensions=False,
    show_all=True,
)
```

## Full Chrome with Caption

Everything enabled plus a caption:

```{python}
tbl_preview(
    sample_scores(50),
    n_head=10,
    n_tail=5,
    caption="Top & bottom of the class roster",
)
```

## Custom Column Width

Restrict columns to 120px max width:

```{python}
tbl_preview(
    sample_scores(15),
    max_col_width=120,
    min_tbl_width=400,
)
```

## Side-by-Side Comparison

Default Pandas output vs. `tbl_preview()` on the same data:

::: {layout-ncol=2}

```{python}
#| echo: false
import pandas as pd

df = pd.DataFrame(sample_scores(10))
df
```

```{python}
#| echo: false
tbl_preview(df)
```

:::

## Long Strings (Default Width)

Cells with very long text are capped at `max_col_width` (250px by default) and show an ellipsis instead of wrapping.

```{python}
from great_docs import tbl_preview

data = {
    "id": [1, 2, 3, 4, 5],
    "title": [
        "A short title",
        "A moderately long title that tests mid-range widths",
        "This title is intentionally very long so that it will definitely exceed the maximum column width and trigger text-overflow ellipsis behavior in the rendered table cell",
        "Brief",
        "Another extremely verbose title string that goes on and on to stress-test the truncation and overflow handling in the preview table renderer",
    ],
    "status": ["draft", "published", "review", "archived", "published"],
}
tbl_preview(data, show_all=True)
```

## Descriptions and Paragraphs

Real-world data often has paragraph-length text in columns.

```{python}
data = {
    "package": ["NumPy", "Pandas", "Polars", "Great Tables", "Pointblank"],
    "description": [
        "Fundamental package for scientific computing with Python. Provides N-dimensional arrays, linear algebra, Fourier transforms, and random number generation.",
        "Powerful data structures for data analysis, time series, and statistics. Built on NumPy with labeled axes, automatic alignment, and rich I/O.",
        "Lightning-fast DataFrame library in Rust with a Python API. Lazy evaluation, multi-threaded queries, and Apache Arrow memory format.",
        "Build beautiful, publication-quality tables in Python. Supports Polars and Pandas DataFrames with fine-grained styling, formatting, and export.",
        "Data validation library for Python. Define expectations, validate data, and generate detailed reports with table-level and column-level checks.",
    ],
    "version": ["1.26.0", "2.2.0", "0.20.0", "0.15.0", "0.14.0"],
}
tbl_preview(data, show_all=True)
```

## Narrow Max Width (120px)

Force aggressive truncation with a tight `max_col_width`:

```{python}
tbl_preview(data, show_all=True, max_col_width=120)
```

## Wide Max Width (500px)

Allow generous room; long text is still capped, but more is visible:

```{python}
tbl_preview(data, show_all=True, max_col_width=500)
```

## Mixed Short and Long Columns

Short numeric/code columns alongside verbose text; each column gets its own computed width.

```{python}
data = {
    "code": ["E001", "E002", "E003", "W001", "W002", "I001", "I002", "E004"],
    "severity": ["error", "error", "error", "warning", "warning", "info", "info", "error"],
    "message": [
        "Undefined variable: foobar",
        "Type mismatch: expected int, got str in argument `count` of function process_batch()",
        "Division by zero in expression total / n_items where n_items evaluates to 0",
        "Unused import: os (imported but never referenced in module)",
        "Variable `tmp` assigned on line 42 but never used anywhere in the function body",
        "Module docstring missing: consider adding a module-level docstring",
        "Line too long: 127 characters (max 120). Consider breaking this into multiple lines for readability",
        "Syntax error: unexpected token ) at position 34 in expression parse(input))",
    ],
    "line": [12, 45, 78, 3, 42, 1, 99, 34],
}
tbl_preview(data, show_all=True)
```

## Read a TSV File

`tbl_preview()` auto-detects `.tsv` and `.tab` files and reads them with tab-delimited parsing.
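To make the idea of extension-based dispatch concrete, here is a minimal standard-library sketch that checks a file's suffix and parses it as tab-delimited text into a column-oriented dict. It only illustrates the technique and is not how `great_docs` reads files internally; `TAB_EXTENSIONS` and `read_tab_delimited()` are hypothetical names, and every cell is kept as a string because no type inference is attempted.

```python
import csv
from pathlib import Path

# Hypothetical set of suffixes treated as tab-delimited (taken from the text above).
TAB_EXTENSIONS = {".tsv", ".tab"}


def read_tab_delimited(path):
    """Read a .tsv/.tab file into a column-oriented dict of string lists."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in TAB_EXTENSIONS:
        raise ValueError(f"not a tab-delimited extension: {suffix!r}")

    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader, [])          # an empty file yields no columns
        columns = {name: [] for name in header}
        for row in reader:
            # zip() ignores cells beyond the header width in ragged rows
            for name, cell in zip(header, row):
                columns[name].append(cell)
    return columns
```

The resulting dict has the same column-oriented shape as the `sample_*` helpers return, which is why such a reader pairs naturally with a previewer. The blocks that follow show the library itself reading a small TSV file.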
```{python}
#| echo: false
import pathlib

tsv_path = pathlib.Path('assets/cities.tsv')
tsv_path.parent.mkdir(parents=True, exist_ok=True)
tsv_path.write_text(
    'city\tcountry\tpopulation\tarea_km2\n'
    'Tokyo\tJapan\t13960000\t2194\n'
    'Delhi\tIndia\t11030000\t1484\n'
    'Shanghai\tChina\t24870000\t6341\n'
    'São Paulo\tBrazil\t12330000\t1521\n'
    'Mexico City\tMexico\t9210000\t1485\n'
    'Cairo\tEgypt\t9540000\t3085\n'
    'Mumbai\tIndia\t12440000\t603\n'
    'Beijing\tChina\t21540000\t16411\n'
)
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/cities.tsv', show_all=True)
```

The badge shows **TSV** and the header reports the correct row and column counts.

## Read a JSONL File

Newline-delimited JSON (`.jsonl` / `.ndjson`) is a common format for streaming data and log records.

```{python}
#| echo: false
import pathlib, json

records = [
    {'timestamp': '2025-01-15T08:30:00', 'level': 'INFO', 'module': 'auth', 'message': 'User login successful'},
    {'timestamp': '2025-01-15T08:31:12', 'level': 'WARNING', 'module': 'db', 'message': 'Slow query detected (3.2s)'},
    {'timestamp': '2025-01-15T08:32:45', 'level': 'ERROR', 'module': 'api', 'message': 'Request timeout on /v2/users'},
    {'timestamp': '2025-01-15T08:33:01', 'level': 'INFO', 'module': 'cache', 'message': 'Cache miss for key user:42'},
    {'timestamp': '2025-01-15T08:34:20', 'level': 'DEBUG', 'module': 'auth', 'message': 'Token refresh for session abc123'},
    {'timestamp': '2025-01-15T08:35:55', 'level': 'ERROR', 'module': 'db', 'message': 'Connection pool exhausted'},
    {'timestamp': '2025-01-15T08:36:10', 'level': 'INFO', 'module': 'api', 'message': 'Health check passed'},
    {'timestamp': '2025-01-15T08:37:30', 'level': 'WARNING', 'module': 'auth', 'message': 'Failed login attempt from 192.168.1.100'},
]
jsonl_path = pathlib.Path('assets/server_logs.jsonl')
jsonl_path.parent.mkdir(parents=True, exist_ok=True)
jsonl_path.write_text('\n'.join(json.dumps(r) for r in records) + '\n')
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/server_logs.jsonl', show_all=True)
```

## NDJSON Extension

The `.ndjson` extension is treated identically:

```{python}
#| echo: false
import shutil

shutil.copy('assets/server_logs.jsonl', 'assets/server_logs.ndjson')
```

```{python}
tbl_preview('assets/server_logs.ndjson', show_all=True)
```

## Read a Parquet File

Apache Parquet is a columnar storage format popular in data engineering workflows.

```{python}
#| echo: false
import polars as pl, pathlib

df = pl.DataFrame({
    'product': ['Widget', 'Gadget', 'Gizmo', 'Doohickey', 'Thingamajig'],
    'category': ['Electronics', 'Tools', 'Kitchen', 'Garden', 'Office'],
    'price': [29.99, 49.50, 12.00, 8.75, 199.99],
    'in_stock': [True, False, True, True, False],
    'rating': [4.5, 3.8, 4.9, 4.2, 2.1],
})
pq_path = pathlib.Path('assets/products.parquet')
pq_path.parent.mkdir(parents=True, exist_ok=True)
df.write_parquet(str(pq_path))
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/products.parquet', show_all=True)
```

The badge shows **Parquet** and dtype labels are preserved from the original Polars schema.

## Feather File

Feather (Apache Arrow IPC format) is fast for local analytics.
```{python}
#| echo: false
import polars as pl, pathlib

df = pl.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing', 'Sales'],
    'salary': [95000, 72000, 105000, 68000, 88000, 71000],
    'years': [5, 3, 8, 2, 6, 4],
})
feather_path = pathlib.Path('assets/employees.feather')
feather_path.parent.mkdir(parents=True, exist_ok=True)
df.write_ipc(str(feather_path))
```

```{python}
from great_docs import tbl_preview

tbl_preview('assets/employees.feather', show_all=True)
```

## Arrow IPC Extension

Files with `.arrow` or `.ipc` extensions are also read as Arrow IPC, but get the **Arrow** badge instead of Feather:

```{python}
#| echo: false
import shutil

shutil.copy('assets/employees.feather', 'assets/employees.arrow')
```

```{python}
tbl_preview('assets/employees.arrow', show_all=True)
```

## In-Memory Arrow Table

`tbl_preview()` also accepts a `pyarrow.Table` directly, with no file needed.

```{python}
import pyarrow as pa
from great_docs import tbl_preview

tbl = pa.table({
    'city': ['Tokyo', 'Delhi', 'Shanghai', 'São Paulo', 'Mexico City',
             'Cairo', 'Mumbai', 'Beijing', 'Dhaka', 'Osaka'],
    'country': ['Japan', 'India', 'China', 'Brazil', 'Mexico',
                'Egypt', 'India', 'China', 'Bangladesh', 'Japan'],
    'population_m': [13.96, 11.03, 24.87, 12.33, 9.21, 9.54, 12.44, 21.54, 8.91, 2.75],
    'area_km2': [2194, 1484, 6341, 1521, 1485, 3085, 603, 16411, 306, 225],
})
tbl_preview(tbl, show_all=True)
```

## Arrow Table with Typed Columns

PyArrow preserves rich type information (booleans, dates, decimals), which `tbl_preview()` maps to short dtype labels.

```{python}
import pyarrow as pa
from datetime import date

tbl = pa.table({
    'event': ['Launch', 'Update', 'Hotfix', 'Deprecation'],
    'date': [date(2025, 1, 15), date(2025, 3, 1), date(2025, 3, 12), date(2025, 6, 30)],
    'critical': [True, False, True, False],
    'affected_users': [50000, 12000, 8500, 2000],
})
tbl_preview(tbl, show_all=True)
```
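The idea of mapping column types to short labels can be sketched with plain Python values, without PyArrow. The example below is a rough illustration only: the label strings, `LABELS`, `short_label()`, and `column_labels()` are hypothetical, and the labels that `tbl_preview()` actually displays may differ.

```python
from datetime import date, datetime

# Hypothetical label table; the real labels shown by tbl_preview() may differ.
# Order matters: bool is a subclass of int, and datetime is a subclass of date,
# so the more specific type has to be checked first.
LABELS = [
    (bool, "bool"),
    (int, "int"),
    (float, "float"),
    (str, "str"),
    (datetime, "datetime"),
    (date, "date"),
]


def short_label(values):
    """Label a column by the type of its first non-None value."""
    for value in values:
        if value is None:
            continue
        for py_type, label in LABELS:
            if isinstance(value, py_type):
                return label
        return "obj"   # a type this sketch does not know about
    return "null"      # empty column, or every value is None


def column_labels(table):
    """Map each column name of a column-oriented dict to a short label."""
    return {name: short_label(col) for name, col in table.items()}


events = {
    "event": ["Launch", "Update"],
    "date": [date(2025, 1, 15), date(2025, 3, 1)],
    "critical": [True, False],
    "affected_users": [50000, 12000],
}
print(column_labels(events))
```

Inferring from the first non-missing value is a simplification; a typed container such as a `pyarrow.Table` carries an explicit schema, so no inference is needed there.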