Display a table that shows the missing values in the input table.
The missing_vals_tbl() function generates a table that shows the missing values in the input table. The table is displayed using the Great Tables API, which allows for further customization of the table’s appearance if so desired.
Parameters
data: FrameT | Any
The table for which to display the missing values. This could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types.
Returns
GT
A GT object that displays the table of missing values in the input table.
Supported Input Table Types
The data= parameter can be given any of the following table types:
Polars DataFrame ("polars")
Pandas DataFrame ("pandas")
DuckDB table ("duckdb")*
MySQL table ("mysql")*
PostgreSQL table ("postgresql")*
SQLite table ("sqlite")*
Parquet table ("parquet")*
The table types marked with an asterisk must be prepared as Ibis tables (of type ibis.expr.types.relations.Table). Using missing_vals_tbl() with these table types also requires the Ibis library (v9.5.0 or above) to be installed. Ibis is not required if the input table is a Polars or Pandas DataFrame.
The Missing Values Table
The missing values table shows the proportion of missing values in each column of the input table. The rows of the input table are divided into sectors, each covering a contiguous range of rows, and the proportion of missing values is calculated for each column within each sector.
So that the report scales to input tables with many columns, each row in the reporting table represents one column of the input table. The report shows 10 sectors: the first sector covers the first 10% of the rows, the second covers the next 10%, and so on. A light blue sector has no missing values. Where values are missing, the proportion is shown as a shade of gray (light gray for low proportions, dark gray to black for very high proportions).
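The per-sector calculation described above can be sketched in plain Python. This is a minimal illustration and not Pointblank's implementation: the function name `sector_missing_proportions()` is hypothetical, missing values are represented by `None`, and the choice to let the last sector absorb any leftover rows is an assumption.

```python
def sector_missing_proportions(values, n_sectors=10):
    """Return the fraction of missing (None) entries in each row sector.

    Hypothetical helper for illustration only; not part of Pointblank.
    """
    n_rows = len(values)
    base = n_rows // n_sectors  # rows per sector (assumed: floor division)
    proportions = []
    for i in range(n_sectors):
        start = i * base
        # Assumption: the last sector takes any leftover rows
        end = n_rows if i == n_sectors - 1 else start + base
        sector = values[start:end]
        if not sector:
            # An empty sector has nothing missing
            proportions.append(0.0)
            continue
        n_missing = sum(v is None for v in sector)
        proportions.append(n_missing / len(sector))
    return proportions


# A toy 20-row input table, stored as one list per column
table = {
    "a": list(range(20)),                  # no missing values
    "b": [None, None] + list(range(18)),   # missing only in the first sector
    "c": [1, None] * 10,                   # half missing in every sector
}

# One report row per input column, one proportion per sector
report = {name: sector_missing_proportions(col) for name, col in table.items()}

for name, proportions in report.items():
    print(name, proportions)
```

In the report, a proportion of 0.0 corresponds to a light blue sector, and larger proportions correspond to progressively darker shades of gray.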
Examples
The missing_vals_tbl() function is useful for quickly identifying columns with missing values in a table. Here’s an example using the nycflights dataset (loaded as a Polars DataFrame using the load_dataset() function):
import pointblank as pb

nycflights = pb.load_dataset("nycflights", tbl_type="polars")

pb.missing_vals_tbl(nycflights)
Missing Values (46,595 in total)
Polars | Rows: 336,776 | Columns: 18
Columns (one report row each, shown across row sectors 1 to 10): year, month, day, dep_time, sched_dep_time, dep_delay, arr_time, sched_arr_time, arr_delay, carrier, flight, tailnum, origin, dest, air_time, distance, hour, minute
Legend: no missing values; proportion missing, from 0% to 100%
Row sectors:
1: 1 – 33677
2: 33678 – 67354
3: 67355 – 101031
4: 101032 – 134708
5: 134709 – 168385
6: 168386 – 202062
7: 202063 – 235739
8: 235740 – 269416
9: 269417 – 303093
10: 303094 – 336776
The table shows the proportion of missing values in each column of the nycflights dataset, calculated separately for each of the 10 row sectors (around 34,000 rows per sector). The shades of gray indicate the proportion of missing values in a sector. Many columns have no missing values at all, so their sectors are colored light blue.
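The row sector ranges listed for this example can be reproduced with a little arithmetic. The sketch below is inferred from the displayed ranges rather than taken from Pointblank's source: it assumes each sector holds `n_rows // 10` rows and that the final sector also takes the remainder. The function name `sector_bounds()` is hypothetical.

```python
def sector_bounds(n_rows, n_sectors=10):
    """Return 1-based, inclusive (first_row, last_row) pairs for each sector.

    Hypothetical helper; the splitting rule is inferred from the example output.
    """
    base = n_rows // n_sectors  # 336,776 // 10 = 33,677 rows per sector
    bounds = []
    for i in range(n_sectors):
        first = i * base + 1
        # Assumption: the final sector extends to the last row of the table
        last = n_rows if i == n_sectors - 1 else (i + 1) * base
        bounds.append((first, last))
    return bounds


bounds = sector_bounds(336_776)

for number, (first, last) in enumerate(bounds, start=1):
    print(f"{number}: {first} - {last}")
```

Under these assumptions, sectors 1 through 9 each contain 33,677 rows and sector 10 contains 33,683, which is consistent with the "around 34,000 rows per sector" figure.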