MissingSpec

Specification for structured missing values in a column.

Usage

MissingSpec()

Real-world data rarely encodes missingness as a single null value. Survey data distinguishes among "refused", "don’t know", and "not applicable"; clinical data uses codes like "NOT DONE"; statistical packages use sentinel values such as -99, ".A", or "". A MissingSpec captures these sentinel values, the reason each one represents, and how they should be handled during validation and analysis.

This brings the idea of structured missingness (a missing value carries a reason for its absence) into Pointblank’s runtime validation layer. Once defined, a MissingSpec can be passed to validation methods (via missing=) to automatically exclude sentinel values from constraint checks, or used with dedicated methods like Validate.col_missing_coded() and Validate.col_pct_missing().

Parameters

reasons: dict[Any, str]

A dictionary mapping sentinel values to reason labels. Keys are the actual values present in the data (e.g., -99, "NA", ".A"). Values are human-readable reason identifiers (e.g., "refused", "not_asked").

categories: dict[str, list[str]] | None = None

Optional grouping of reasons into categories (e.g., an "item_nonresponse" category that groups "refused" and "dont_know"). Useful for aggregate reporting and for checking missingness rates by category. Each value is a list of reason labels that appear in reasons. Default is None.

null_is_missing: bool = True

Whether actual null/None/NaN values should also be treated as missing (with reason given by null_reason). Default is True.

null_reason: str = "unknown"

The reason label assigned to actual null values when null_is_missing=True. Default is "unknown".

description: str | None = None

Optional human-readable description of the overall missingness pattern. Default is None.

Returns

MissingSpec

A missing-value specification that can be attached to a Field (via missing=) or passed to validation methods.

Examples

Define the missing-value codes for a survey age variable:

import pointblank as pb

age_missing = pb.MissingSpec(
    reasons={
        -99: "not_asked",       # Question wasn't asked to this participant
        -98: "refused",         # Participant declined to answer
        -97: "dont_know",       # Participant didn't know
        -96: "not_applicable",  # Question doesn't apply
    },
    categories={
        "item_nonresponse": ["refused", "dont_know"],
        "design": ["not_asked", "not_applicable"],
    },
)

The spec can then answer questions about its own structure:

age_missing.sentinel_values()              # [-99, -98, -97, -96]
age_missing.reason_for(-98)                # "refused"
age_missing.values_for_reason("refused")   # [-98]
age_missing.values_for_category("item_nonresponse")  # [-98, -97]
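To illustrate why reasons and categories are kept separate, here is a minimal sketch in plain Python (it does not call Pointblank) that tallies missingness by reason and then rolls the counts up by category. The `ages` column is made-up illustrative data, and the dictionaries simply mirror the `age_missing` example above.

```python
from collections import Counter

# Same mappings as the age_missing example, written as plain dicts.
reasons = {-99: "not_asked", -98: "refused", -97: "dont_know", -96: "not_applicable"}
categories = {
    "item_nonresponse": ["refused", "dont_know"],
    "design": ["not_asked", "not_applicable"],
}

# Hypothetical raw column: real ages mixed with sentinel codes.
ages = [34, -98, 51, -99, -97, 28, -98]

# Count how often each reason occurs; non-sentinel values are skipped.
reason_counts = Counter(reasons[v] for v in ages if v in reasons)

# Roll the per-reason counts up to the category level.
category_counts = {
    name: sum(reason_counts[r] for r in members)
    for name, members in categories.items()
}

print(dict(reason_counts))
print(category_counts)
```

Keeping the reason labels as the join key between the two dictionaries means a sentinel value only has to be declared once, in `reasons`.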

Methods

Name                        Description
from_cdisc()                Alias for from_cdisc_null_flavors().
from_cdisc_null_flavors()   Create a MissingSpec for the standard HL7/CDISC null flavors.
from_sas()                  Create a MissingSpec for SAS special missing values.
from_spss()                 Create a MissingSpec from SPSS-style user-defined missing values.
from_variable_metadata()    Create a MissingSpec from an imported variable’s metadata.
is_missing()                Check whether a value should be considered missing under this spec.
reason_for()                Get the reason label for a specific value.
reasons_list()              Get the distinct reason labels defined by this spec.
sentinel_values()           Get all sentinel values that encode missingness.
values_for_category()       Get all sentinel values whose reason falls in a given category.
values_for_reason()         Get all sentinel values that correspond to a given reason.

from_cdisc()

Alias for from_cdisc_null_flavors().

Usage

from_cdisc(**kwargs)

from_cdisc_null_flavors()

Create a MissingSpec for the standard HL7/CDISC null flavors.

Usage

from_cdisc_null_flavors(
    null_is_missing=True,
    null_reason="no_information",
    description="CDISC/HL7 null flavors"
)

Clinical data uses standardized null flavor codes to record why a value is absent (e.g., "NASK" for “not asked”, "UNK" for “unknown”). This returns a ready-to-use spec mapping those codes to reason labels.

Parameters

null_is_missing: bool = True

Whether actual null values should also be treated as missing. Default is True.

null_reason: str = "no_information"

The reason label for actual null values. Default is "no_information".

description: str | None = "CDISC/HL7 null flavors"

Optional description. Default identifies the spec as CDISC/HL7 null flavors.

Returns

MissingSpec

A spec with the standard null flavor codes.

Examples

import pointblank as pb

cdisc_missing = pb.MissingSpec.from_cdisc_null_flavors()
cdisc_missing.reason_for("NASK")   # "not_asked"

from_sas()

Create a MissingSpec for SAS special missing values.

Usage

from_sas(
    reasons=None,
    include_underscore=True,
    null_is_missing=True,
    null_reason="system_missing",
    description="SAS special missing values"
)

SAS encodes missingness with "." (system missing) plus 27 special missing codes: "._" and ".A" through ".Z". This returns a spec covering all of them; you can override the reason label for any specific code via reasons=.

Parameters

reasons: dict[str, str] | None = None

Optional mapping of specific SAS missing codes to custom reason labels (e.g., {".A": "not_applicable", ".B": "below_detection"}). These override the defaults.

include_underscore: bool = True

Whether to include the "._" special missing code. Default is True.

null_is_missing: bool = True

Whether actual null values should also be treated as missing. Default is True.

null_reason: str = "system_missing"

The reason label for actual null values. Default is "system_missing".

description: str | None = "SAS special missing values"

Optional description. Default identifies the spec as SAS special missing values.

Returns

MissingSpec

A spec covering the SAS special missing values.

Examples

import pointblank as pb

sas_missing = pb.MissingSpec.from_sas(
    reasons={".A": "not_applicable", ".B": "below_detection"}
)
sas_missing.reason_for(".A")   # "not_applicable"
sas_missing.reason_for(".C")   # "user_missing_c"
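The default-plus-override behavior can be sketched in plain Python. This is a hypothetical stand-in, not Pointblank's implementation: the function name `sas_missing_reasons` is invented here, and the labels used for "." and "._" are assumptions. Only the "user_missing_c" pattern for lettered codes is taken from the example above.

```python
import string


def sas_missing_reasons(overrides=None, include_underscore=True):
    """Build a code -> reason mapping for SAS missing codes (illustrative only)."""
    # Assumed label for the system-missing code ".".
    reasons = {".": "system_missing"}
    if include_underscore:
        # Assumed label for "._"; the real default is not shown in this page.
        reasons["._"] = "user_missing_underscore"
    # ".A" through ".Z" follow the "user_missing_<letter>" pattern.
    for letter in string.ascii_uppercase:
        reasons[f".{letter}"] = f"user_missing_{letter.lower()}"
    # Caller-supplied labels replace the defaults for the given codes.
    reasons.update(overrides or {})
    return reasons


sas_reasons = sas_missing_reasons({".A": "not_applicable", ".B": "below_detection"})
print(sas_reasons[".A"], sas_reasons[".C"], len(sas_reasons))
```

Applying the overrides last keeps every code present in the mapping while letting a study-specific codebook rename only the codes it actually uses.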

from_spss()

Create a MissingSpec from SPSS-style user-defined missing values.

Usage

from_spss(
    missing_values,
    labels=None,
    null_is_missing=True,
    null_reason="unknown",
    description="SPSS user-defined missing values"
)

SPSS supports up to three discrete user-defined missing values per variable, or a range plus one discrete value. Pass the missing values (and optionally their value labels) to build a spec. Reason labels are derived from the labels when available; otherwise a "missing_<value>" placeholder is used.

Parameters

missing_values: list

The sentinel values that SPSS marks as missing for the variable (e.g., [-99, -98]).

labels: dict[Any, str] | None = None

Optional mapping of sentinel value to human-readable label (e.g., {-99: "Refused"}). Labels are slugified into reason identifiers (e.g., "Refused" -> "refused").

null_is_missing: bool = True

Whether actual null values should also be treated as missing. Default is True.

null_reason: str = "unknown"

The reason label for actual null values. Default is "unknown".

description: str | None = "SPSS user-defined missing values"

Optional description. Default identifies the spec as SPSS user-defined missing values.

Returns

MissingSpec

A spec built from the SPSS missing values.

Examples

import pointblank as pb

spss_missing = pb.MissingSpec.from_spss(
    missing_values=[-99, -98],
    labels={-99: "Not asked", -98: "Refused"},
)
spss_missing.reason_for(-98)   # "refused"
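How labels become reason identifiers can be sketched as follows. This is a rough approximation in plain Python, not the library's code: `slugify` and `spss_reasons` are hypothetical helpers, and the exact slug rules and placeholder formatting (for example how a negative number is rendered) are assumptions.

```python
import re


def slugify(label):
    """Lowercase a label and collapse non-alphanumeric runs into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", label.lower())
    return slug.strip("_")


def spss_reasons(missing_values, labels=None):
    """Map each missing value to a slugified label or a placeholder."""
    labels = labels or {}
    reasons = {}
    for value in missing_values:
        if value in labels:
            reasons[value] = slugify(labels[value])
        else:
            # No label available: fall back to a "missing_<value>" placeholder.
            reasons[value] = f"missing_{value}"
    return reasons


derived = spss_reasons([-99, -98, -97], {-99: "Not asked", -98: "Refused"})
print(derived)
```

Values without a label still get a distinct reason, so every sentinel stays traceable in later reporting.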

from_variable_metadata()

Create a MissingSpec from an imported variable’s metadata.

Usage

from_variable_metadata(variable, null_is_missing=True, null_reason="unknown")

This works with a VariableMetadata object (as produced by import_metadata() for SPSS, Stata, and SAS files). It reads the variable’s missing_values and derives reason labels from missing_value_labels or value_labels when available.

Parameters

variable: Any

A variable-metadata object exposing missing_values and (optionally) missing_value_labels / value_labels attributes.

null_is_missing: bool = True

Whether actual null values should also be treated as missing. Default is True.

null_reason: str = "unknown"

The reason label for actual null values. Default is "unknown".

Returns

MissingSpec | None

A spec built from the variable’s missing values, or None if the variable declares no missing values.
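The lookup described above can be sketched with a stand-in object. Everything here is hypothetical: `VariableInfo` only mimics the three attributes named in this section, `reasons_from_metadata` is an invented helper, and the precedence (missing_value_labels before value_labels) and the placeholder fallback are assumptions rather than documented behavior. Labels are left as-is rather than slugified to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class VariableInfo:
    """Hypothetical stand-in exposing the attributes this method reads."""
    missing_values: list = field(default_factory=list)
    missing_value_labels: dict = field(default_factory=dict)
    value_labels: dict = field(default_factory=dict)


def reasons_from_metadata(variable):
    """Return a value -> label mapping, or None if nothing is declared missing."""
    if not variable.missing_values:
        return None
    reasons = {}
    for value in variable.missing_values:
        # Assumed precedence: dedicated missing-value labels win over value labels.
        label = variable.missing_value_labels.get(value) or variable.value_labels.get(value)
        reasons[value] = label if label is not None else f"missing_{value}"
    return reasons


income = VariableInfo(
    missing_values=[-99, -98, -97],
    missing_value_labels={-99: "Refused"},
    value_labels={-98: "Don't know", 1: "Yes"},
)
print(reasons_from_metadata(income))
print(reasons_from_metadata(VariableInfo()))
```

Because the result may be None, callers would need to check it before attaching the spec to a column.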

is_missing()

Check whether a value should be considered missing under this spec.

Usage

is_missing(value)

Parameters

value: Any

A value from the data.

Returns

bool

True if value is a declared sentinel value, or if value is None and null_is_missing=True.
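The return rule above amounts to a membership test plus a null check. The following is a minimal sketch under that reading, written as a free function (a hypothetical stand-in, not the method itself), and then used to drop missing entries before computing a statistic. The `ages` list is made-up data, and NaN handling is deliberately left out.

```python
def is_missing(value, reasons, null_is_missing=True):
    """Hypothetical stand-in for the rule described above."""
    if value is None:
        # Actual nulls count as missing only when the flag is set.
        return null_is_missing
    # Otherwise the value is missing only if it is a declared sentinel.
    return value in reasons


reasons = {-99: "not_asked", -98: "refused"}
ages = [34, -98, None, 51, -99]

# Keep only observed values so sentinels do not distort the mean.
observed = [v for v in ages if not is_missing(v, reasons)]
mean_age = sum(observed) / len(observed)
print(observed, mean_age)
```

Without the filter, the sentinel codes would be averaged in as if they were real ages.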

reason_for()

Get the reason label for a specific value.

Usage

reason_for(value)

Parameters

value: Any

A value from the data.

Returns

str | None

The reason label if value is a declared sentinel value, null_reason if value is None and null_is_missing=True, or None if the value is not considered missing.

reasons_list()

Get the distinct reason labels defined by this spec.

Usage

reasons_list()

Returns

list[str]

The distinct reason labels (in first-seen order), including null_reason when null_is_missing=True.
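First-seen-order deduplication is straightforward in Python because dictionaries preserve insertion order. Here is a small sketch of that idea as a free function (hypothetical, not the library's code); placing null_reason after the sentinel reasons is an assumption.

```python
def reasons_list(reasons, null_is_missing=True, null_reason="unknown"):
    """Distinct reason labels in first-seen order (illustrative only)."""
    labels = list(reasons.values())
    if null_is_missing:
        # Assumed position: the null reason comes after the sentinel reasons.
        labels.append(null_reason)
    # dict.fromkeys keeps the first occurrence of each label, in order.
    return list(dict.fromkeys(labels))


reasons = {-99: "refused", -98: "refused", -97: "dont_know"}
labels = reasons_list(reasons)
print(labels)
```

Two sentinel values that share a reason therefore contribute a single entry, as does a null_reason that coincides with an existing label.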

sentinel_values()

Get all sentinel values that encode missingness.

Usage

sentinel_values()

Returns

list

The keys of reasons (the actual values in the data that represent missingness). Note that this does not include None even when null_is_missing=True; use is_missing() to test individual values.

values_for_category()

Get all sentinel values whose reason falls in a given category.

Usage

values_for_category(category)

Parameters

category: str

A category name defined in categories.

Returns

list

All sentinel values whose reason label is in the given category. Returns an empty list if categories is None or the category is undefined.
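The category lookup is a two-step join: category to reason labels, then reason labels back to sentinel values. Here is a minimal sketch of that logic as a free function (a hypothetical stand-in, not Pointblank's implementation), reusing the mappings from the survey age example.

```python
def values_for_category(category, reasons, categories=None):
    """Sentinel values whose reason belongs to the category (illustrative only)."""
    # Undefined categories, or no categories at all, yield an empty list.
    if not categories or category not in categories:
        return []
    members = set(categories[category])
    # Walk reasons in declaration order so the result order is predictable.
    return [value for value, reason in reasons.items() if reason in members]


reasons = {-99: "not_asked", -98: "refused", -97: "dont_know", -96: "not_applicable"}
categories = {
    "item_nonresponse": ["refused", "dont_know"],
    "design": ["not_asked", "not_applicable"],
}

nonresponse = values_for_category("item_nonresponse", reasons, categories)
print(nonresponse)
```

Returning an empty list rather than raising lets aggregate reports loop over category names without special-casing specs that define none.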

values_for_reason()

Get all sentinel values that correspond to a given reason.

Usage

values_for_reason(reason)

Parameters

reason: str

A reason label.

Returns

list

All sentinel values mapped to reason.