Create a comprehensive data summary table with visualizations.
The gt_plt_summary() function takes a DataFrame and generates a summary table showing key statistics and a visual representation for each column. Each row displays the column type, the percentage of missing data, descriptive statistics (mean, median, standard deviation), and a small overview plot suited to the data type: a histogram for numeric and datetime columns, and a categorical bar chart for string columns.
Inspired by the Observable team and the observablehq/SummaryTable function: https://observablehq.com/@observablehq/summary-table
Parameters
df : IntoDataFrame
A DataFrame to summarize. Can be any DataFrame type that you would pass into a GT.
title : str | None = None
Optional title for the summary table. If None, defaults to “Summary Table”.
Returns
GT
A GT object containing the summary table with columns for Type, Column name, Plot Overview, Missing percentage, Mean, Median, and Standard Deviation.
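The Missing, Mean, Median, and SD columns described above can be approximated by hand for a single numeric column. The following is a minimal sketch using only the standard library, not the gt_plt_summary() implementation: the helper name summarize_column is hypothetical, and the choices to skip None values and to use the sample standard deviation are assumptions.

```python
import statistics


def summarize_column(values):
    """Summarize one numeric column given as a list that may contain None.

    Hypothetical helper for illustration only. Assumes the list is non-empty
    and holds at least two non-null values.
    """
    # Assumption: missing entries are excluded before computing statistics.
    present = [v for v in values if v is not None]
    missing_pct = 100 * (len(values) - len(present)) / len(values)
    return {
        "missing_pct": missing_pct,
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "sd": statistics.stdev(present),
    }


column = [1.0, 2.0, None, 4.0, 5.0]
summary = summarize_column(column)
print(summary)
```

Comparing numbers like these against the rendered table can be a quick sanity check, keeping in mind that the library may use a different standard deviation convention.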
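import random

import polars as pl
import gt_extras as gte

n = 100
random.seed(23)

# Uniform values, with indices 2 through 9 set to missing
uniform = [random.uniform(0, 10) for _ in range(n)]
for i in range(2, 10):
    uniform[i] = None

# Normally distributed values, with two missing entries
normal = [random.gauss(5, 2) for _ in range(n)]
normal[4] = None
normal[10] = None

# Right-skewed values drawn from an exponential distribution
single_tailed = [random.expovariate(1 / 2) for _ in range(n)]

# Two clusters, centered at 2 and at 8
bimodal = [random.gauss(2, 0.5) for _ in range(n // 2)] + [
    random.gauss(8, 0.5) for _ in range(n - n // 2)
]

df = pl.DataFrame(
    {
        "uniform": uniform,
        "normal": normal,
        "single_tailed": single_tailed,
        "bimodal": bimodal,
    }
)

gte.gt_plt_summary(df)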
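Summary Table
100 rows x 4 cols

| Column | Missing | Mean | Median | SD |
|---|---|---|---|---|
| uniform | 8.0% | 5.13 | 5.21 | 2.96 |
| normal | 2.0% | 5.29 | 5.11 | 1.91 |
| single_tailed | 0.0% | 1.93 | 1.52 | 1.85 |
| bimodal | 0.0% | 4.86 | 4.86 | 3.03 |

(The Type and Plot Overview columns are rendered as graphics and are not reproduced here.)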
Note
The datatype (dtype) of each column in your DataFrame determines its classified type in the summary table. Keep in mind that pandas and polars can handle dtypes differently, especially when null values are present.