Generate a column-level summary table of a dataset.
The col_summary_tbl() function generates a summary table of a dataset that focuses on column-level information. The summary includes:
- the type of the table (e.g., "polars", "pandas", etc.)
- the number of rows and columns in the table
- column-level information, including:
  - the column name
  - the column type
  - measures of missingness and distinctness
  - descriptive statistics and quantiles
  - statistics for datetime columns
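The sketch below shows roughly what the missingness, distinctness, and string-length measures report, using plain Python lists. It is not Pointblank's implementation. The helper names are hypothetical, and several details are assumptions: missing values are represented as None, proportions use the total row count as the denominator, None is not counted as a distinct value, and SD is the sample standard deviation.

```python
from statistics import mean, stdev


def missing_and_unique(values):
    """Hypothetical helper: (count, proportion) pairs like the NA and UQ cells.

    Assumptions: missing values are None, proportions are taken over the
    total row count, and None is not counted as a distinct value.
    """
    n_rows = len(values)
    present = [v for v in values if v is not None]
    n_missing = n_rows - len(present)
    n_unique = len(set(present))
    return {
        "NA": (n_missing, round(n_missing / n_rows, 2)),
        "UQ": (n_unique, round(n_unique / n_rows, 2)),
    }


def string_length_stats(values):
    """Hypothetical helper: summary statistics computed on string lengths."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Mean": round(mean(lengths), 2),
        "SD": round(stdev(lengths), 2),  # sample standard deviation (assumption)
        "Min": min(lengths),
        "Max": max(lengths),
    }


print(missing_and_unique([3, None, 3, 8, None]))
print(string_length_stats(["low", "mid", "high", "low"]))
```

For string columns, computing statistics on lengths rather than on the values themselves is what makes mean, SD, and the quantiles meaningful.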
The summary table is returned as a GT object, which can be displayed in a notebook or saved to an HTML file.
Warning
The col_summary_tbl() function is still experimental. Please report any issues you encounter in the Pointblank issue tracker.
Parameters
data: FrameT | Any
The table to summarize, which could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types.
tbl_name: str | None = None
An optional name for the table, provided as tbl_name=.
Returns
GT
A GT object that displays the column-level summaries of the table.
Supported Input Table Types
The data= parameter can be given any of the following table types:
- Polars DataFrame ("polars")
- Pandas DataFrame ("pandas")
- DuckDB table ("duckdb")*
- MySQL table ("mysql")*
- PostgreSQL table ("postgresql")*
- SQLite table ("sqlite")*
- Parquet table ("parquet")*
The table types marked with an asterisk need to be prepared as Ibis tables (of type ibis.expr.types.relations.Table). Using col_summary_tbl() with these types of tables also requires the Ibis library (v9.5.0 or above) to be installed. Ibis is not required if the input table is a Polars or Pandas DataFrame.
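A script that may receive one of the asterisked table types could check the Ibis requirement up front. The helpers below are a hypothetical sketch using only the standard library. They assume Ibis is installed under its PyPI distribution name, ibis-framework, and they compare only the first three numeric version components.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def version_tuple(text: str) -> tuple[int, ...]:
    """Turn '9.5.0', '9.5', or '10.0.0.dev12' into a (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' becomes 0
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '9.5' to (9, 5, 0)
    return tuple(parts)


def ibis_meets_requirement(min_version: str = "9.5.0") -> bool:
    """Hypothetical helper: True if ibis-framework >= min_version is installed."""
    try:
        installed = version("ibis-framework")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(min_version)


if not ibis_meets_requirement():
    print("Ibis v9.5.0+ not found: pass a Polars or Pandas DataFrame instead.")
```

For Polars or Pandas input, this check can be skipped entirely.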
Examples
It’s easy to get a column-level summary of a table using the col_summary_tbl() function. Here’s an example using the small_table dataset (itself loaded using the load_dataset() function):
import pointblank as pb

small_table_polars = pb.load_dataset(dataset="small_table", tbl_type="polars")
pb.col_summary_tbl(data=small_table_polars)
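Polars · Rows: 13 · Columns: 8

| # | Column | Type | NA | UQ | Mean | SD | Min | P5 | Q1 | Med | Q3 | P95 | Max | IQR |
|---|--------|------|----|----|------|----|-----|----|----|-----|----|-----|-----|-----|
| 1 | date_time | Datetime(time_unit='us', time_zone=None) | 0 0.00 | 12 0.92 | — | — | 2016-01-04 00:32:00 | | | | | | 2016-01-30 11:23:00 | — |
| 2 | date | Date | 0 0.00 | 11 0.85 | — | — | 2016-01-04 | | | | | | 2016-01-30 | — |
| 3 | a | Int64 | 0 0.00 | 7 0.54 | 3.77 | 2.09 | 1.00 | 1.60 | 2.00 | 3.00 | 4.00 | 7.40 | 8.00 | 2.00 |
| 4 | b | String | 0 0.00 | 12 0.92 | 9.00 SL | 0.00 SL | 9 SL | — | — | 9 SL | — | — | 9 SL | — |
| 5 | c | Int64 | 2 0.15 | 6 0.46 | 5.73 | 2.72 | 2.00 | 2.50 | 3.00 | 7.00 | 8.00 | 9.00 | 9.00 | 5.00 |
| 6 | d | Float64 | 0 0.00 | 12 0.92 | 2305 | 2631 | 108 | 214 | 838 | 1036 | 3291 | 6335 | 10000 | 2453 |
| 7 | e | Boolean | 0 0.00 | T 0.61 F 0.39 | — | — | — | — | — | — | — | — | — | — |
| 8 | f | String | 0 0.00 | 3 0.23 | 3.46 SL | 0.52 SL | 3 SL | — | — | 3 SL | — | — | 4 SL | — |

NA: count and proportion of missing values. UQ: count and proportion of unique values (for Boolean columns, the proportions of True and False values). SL: the statistic is computed on string lengths. For date and datetime columns, only the range from Min to Max is reported. Cells showing a dash mark statistics that are not computed for that column type.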
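The NA and UQ proportions in this output appear to be relative to the total of 13 rows, including rows with missing values. A quick arithmetic check using only the counts shown in the table:

```python
n_rows = 13  # "Rows: 13" in the table header

# (cell, count shown, proportion shown), copied from the NA and UQ columns
cells = [
    ("date_time UQ", 12, 0.92),
    ("date UQ", 11, 0.85),
    ("a UQ", 7, 0.54),
    ("c NA", 2, 0.15),
    ("c UQ", 6, 0.46),
    ("f UQ", 3, 0.23),
]

for label, count, shown in cells:
    # Each proportion equals count / 13, rounded to two decimals
    assert round(count / n_rows, 2) == shown, label

# Column c has 2 missing values; dividing its 6 unique values by the
# 11 non-missing rows would give 0.55, not the 0.46 shown
print(round(6 / (n_rows - 2), 2))
```

Column c is the telling case: its UQ proportion only matches when the denominator is the full row count.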
The table used above was a Polars DataFrame, but the col_summary_tbl() function works with any table supported by Pointblank, including Pandas DataFrames and Ibis backend tables. Here’s an example using a DuckDB table handled by Ibis: