missing_vals_tbl

missing_vals_tbl(data)

Display a table that shows the missing values in the input table.

The missing_vals_tbl() function generates a table summarizing where missing values occur in the input table. It returns a Great Tables (GT) object, so the table’s appearance can be customized further with the Great Tables API if desired.

Parameters

data : FrameT | Any

The table for which to display the missing values. This can be a Polars or Pandas DataFrame, or an Ibis table object. See the Supported Input Table Types section for details on the supported table types.

Returns

: GT

A GT object that displays the table of missing values in the input table.

Supported Input Table Types

The data= parameter can be given any of the following table types:

  • Polars DataFrame ("polars")
  • Pandas DataFrame ("pandas")
  • DuckDB table ("duckdb")*
  • MySQL table ("mysql")*
  • PostgreSQL table ("postgresql")*
  • SQLite table ("sqlite")*
  • Parquet table ("parquet")*

The table types marked with an asterisk must be prepared as Ibis tables (of type ibis.expr.types.relations.Table). Using missing_vals_tbl() with these table types also requires the Ibis library (v9.5.0 or above) to be installed. Ibis is not required if the input table is a Polars or Pandas DataFrame.

The Missing Values Table

The missing values table shows the proportion of missing values in each column of the input table. The rows of the input table are divided into sectors, each covering a contiguous range of rows, and the proportion of missing values is calculated per sector for each column.

So that the report scales to input tables with many columns, each row in the reporting table represents a column in the input table. The table shows 10 sectors: the first sector covers the first 10% of the rows, the second sector covers the next 10%, and so on. Light blue sectors contain no missing values. Where there are missing values, their proportion is shown in gray (light gray for low proportions, dark gray to black for very high proportions).
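The sector calculation described above can be sketched in plain Python. This is only an illustration of the idea, not pointblank's implementation: the function name missing_proportions() is hypothetical, the table is represented as a dict of equal-length lists, only None counts as missing, and the table is assumed to have at least as many rows as sectors.

```python
def missing_proportions(columns, n_sectors=10):
    """Return {column_name: [proportion of missing values per row sector]}.

    Hypothetical helper for illustration; `columns` maps column names to
    equal-length lists, and only None is treated as a missing value.
    """
    n_rows = len(next(iter(columns.values())))
    size = n_rows // n_sectors  # rows per sector (assumes n_rows >= n_sectors)
    result = {}
    for name, values in columns.items():
        props = []
        for i in range(n_sectors):
            start = i * size
            # The last sector absorbs any leftover rows.
            stop = n_rows if i == n_sectors - 1 else start + size
            sector = values[start:stop]
            n_missing = sum(1 for v in sector if v is None)
            props.append(n_missing / len(sector))
        result[name] = props
    return result


# A 20-row toy table: "a" is complete, "b" is missing its first two
# values and its last value.
table = {
    "a": list(range(20)),
    "b": [None, None] + list(range(2, 19)) + [None],
}
props = missing_proportions(table)
print(props["a"])  # ten zeros: every sector would be shown in light blue
print(props["b"])  # first sector fully missing, last sector half missing
```

Each list in the result corresponds to one row of the reporting table, with one entry per sector.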

Examples

The missing_vals_tbl() function is useful for quickly identifying columns with missing values in a table. Here’s an example using the nycflights dataset (loaded as a Polars DataFrame using the load_dataset() function):

import pointblank as pb

nycflights = pb.load_dataset("nycflights", tbl_type="polars")

pb.missing_vals_tbl(nycflights)
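
[Rendered output: the "Missing Values" table (46,595 missing values in total) for a Polars table with 336,776 rows and 18 columns. Each row of the report is one column of the input table (year, month, day, dep_time, sched_dep_time, dep_delay, arr_time, sched_arr_time, arr_delay, carrier, flight, tailnum, origin, dest, air_time, distance, hour, minute), with one colored cell for each of the row sectors 1 to 10. Legend: NO MISSING VALUES; PROPORTION MISSING, from 0% to 100%.]
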
ROW SECTORS
  1. 1 – 33677
  2. 33678 – 67354
  3. 67355 – 101031
  4. 101032 – 134708
  5. 134709 – 168385
  6. 168386 – 202062
  7. 202063 – 235739
  8. 235740 – 269416
  9. 269417 – 303093
  10. 303094 – 336776

The table shows the proportion of missing values in each column of the nycflights dataset, broken down into 10 row sectors of roughly 33,700 rows each. The shades of gray indicate the proportion of missing values in a sector. Many columns have no missing values at all, and their sectors are colored light blue.
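
The ROW SECTORS listing in the output is consistent with splitting the rows into ten equal ranges using integer division, with the last sector absorbing the remainder. The sketch below reproduces those boundaries. The function name row_sectors() is hypothetical, and whether pointblank computes the ranges exactly this way internally is an assumption.

```python
def row_sectors(n_rows, n_sectors=10):
    """Return 1-based, inclusive (first_row, last_row) pairs for each sector.

    Hypothetical helper for illustration; not part of the pointblank API.
    """
    size = n_rows // n_sectors  # rows in every sector except possibly the last
    bounds = []
    for i in range(n_sectors):
        first = i * size + 1
        # The last sector extends to the final row of the table.
        last = n_rows if i == n_sectors - 1 else (i + 1) * size
        bounds.append((first, last))
    return bounds


sectors = row_sectors(336_776)
for number, (first, last) in enumerate(sectors, start=1):
    print(f"{number}. {first} - {last}")
```

For the 336,776 rows of nycflights, the first nine sectors hold 33,677 rows each and the tenth holds 33,683.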