MissingSpec
Specification for structured missing values in a column.
Usage
MissingSpec()Real-world data rarely encodes missingness as a single null value. Survey data distinguishes refused from don’t know from not applicable; clinical data uses codes like "NOT DONE"; statistical packages use sentinel values such as -99, ".A", or "". A MissingSpec captures these sentinel values, the reason each one represents, and how they should be handled during validation and analysis.
This brings the idea of structured missingness (a missing value carries a reason for its absence) into Pointblank’s runtime validation layer. Once defined, a MissingSpec can be passed to validation methods (via missing=) to automatically exclude sentinel values from constraint checks, or used with dedicated methods like Validate.col_missing_coded() and Validate.col_pct_missing().
Parameters
reasons: dict[Any, str]-
A dictionary mapping sentinel values to reason labels. Keys are the actual values present in the data (e.g.,
-99,"NA",".A"). Values are human-readable reason identifiers (e.g.,"refused","not_asked"). categories: dict[str, list[str]] | None = None-
Optional grouping of reasons into categories (e.g., an
"item_nonresponse"category that groups"refused"and"dont_know"). Useful for aggregate reporting and for checking missingness rates by category. Each value is a list of reason labels that appear inreasons. Default isNone. null_is_missing: bool = True-
Whether actual null/
None/NaNvalues should also be treated as missing (with reason given by null_reason). Default isTrue. null_reason: str = "unknown"-
The reason label assigned to actual null values when
null_is_missing=True. Default is"unknown". description: str | None = None-
Optional human-readable description of the overall missingness pattern. Default is
None.
Returns
MissingSpec-
A missing-value specification that can be attached to a
Field(viamissing=) or passed to validation methods.
Examples
Define the missing-value codes for a survey age variable:
import pointblank as pb
age_missing = pb.MissingSpec(
reasons={
-99: "not_asked", # Question wasn't asked to this participant
-98: "refused", # Participant declined to answer
-97: "dont_know", # Participant didn't know
-96: "not_applicable", # Question doesn't apply
},
categories={
"item_nonresponse": ["refused", "dont_know"],
"design": ["not_asked", "not_applicable"],
},
)The spec can then answer questions about its own structure:
age_missing.sentinel_values() # [-99, -98, -97, -96]
age_missing.reason_for(-98) # "refused"
age_missing.values_for_reason("refused") # [-98]
age_missing.values_for_category("item_nonresponse") # [-98, -97]Methods
| Name | Description |
|---|---|
| from_cdisc() |
Alias for from_cdisc_null_flavors().
|
| from_cdisc_null_flavors() | Create a MissingSpec for the standard HL7/CDISC null flavors. |
| from_sas() | Create a MissingSpec for SAS special missing values. |
| from_spss() | Create a MissingSpec from SPSS-style user-defined missing values. |
| from_variable_metadata() | Create a MissingSpec from an imported variable’s metadata. |
| is_missing() | Check whether a value should be considered missing under this spec. |
| reason_for() | Get the reason label for a specific value. |
| reasons_list() | Get the distinct reason labels defined by this spec. |
| sentinel_values() | Get all sentinel values that encode missingness. |
| values_for_category() | Get all sentinel values whose reason falls in a given category. |
| values_for_reason() | Get all sentinel values that correspond to a given reason. |
from_cdisc()
Alias for from_cdisc_null_flavors().
Usage
from_cdisc(**kwargs)from_cdisc_null_flavors()
Create a MissingSpec for the standard HL7/CDISC null flavors.
Usage
from_cdisc_null_flavors(
null_is_missing=True,
null_reason="no_information",
description="CDISC/HL7 null flavors"
)Clinical data uses standardized null flavor codes to record why a value is absent (e.g., "NASK" for “not asked”, "UNK" for “unknown”). This returns a ready-to-use spec mapping those codes to reason labels.
Parameters
null_is_missing: bool = True-
Whether actual null values should also be treated as missing. Default is
True. null_reason: str = "no_information"-
The reason label for actual null values. Default is
"no_information". description: str | None = "CDISC/HL7 null flavors"- Optional description. Default identifies the spec as CDISC/HL7 null flavors.
Returns
MissingSpec- A spec with the standard null flavor codes.
Examples
import pointblank as pb
cdisc_missing = pb.MissingSpec.from_cdisc_null_flavors()
cdisc_missing.reason_for("NASK") # "not_asked"from_sas()
Create a MissingSpec for SAS special missing values.
Usage
from_sas(
reasons=None,
include_underscore=True,
null_is_missing=True,
null_reason="system_missing",
description="SAS special missing values"
)SAS encodes missingness with "." (system missing), "._", and ".A" through ".Z" (27 user-defined missing codes). This returns a spec covering all of them; you can override the reason label for any specific code via reasons=.
Parameters
reasons: dict[str, str] | None = None-
Optional mapping of specific SAS missing codes to custom reason labels (e.g.,
{".A": "not_applicable", ".B": "below_detection"}). These override the defaults. include_underscore: bool = True-
Whether to include the
"._"special missing code. Default isTrue. null_is_missing: bool = True-
Whether actual null values should also be treated as missing. Default is
True. null_reason: str = "system_missing"-
The reason label for actual null values. Default is
"system_missing". description: str | None = "SAS special missing values"- Optional description. Default identifies the spec as SAS special missing values.
Returns
MissingSpec- A spec covering the SAS special missing values.
Examples
import pointblank as pb
sas_missing = pb.MissingSpec.from_sas(
reasons={".A": "not_applicable", ".B": "below_detection"}
)
sas_missing.reason_for(".A") # "not_applicable"
sas_missing.reason_for(".C") # "user_missing_c"from_spss()
Create a MissingSpec from SPSS-style user-defined missing values.
Usage
from_spss(
missing_values,
labels=None,
null_is_missing=True,
null_reason="unknown",
description="SPSS user-defined missing values"
)SPSS supports up to 3 user-defined missing values per variable (plus a range). Pass the missing values (and optionally their value labels) to build a spec. Reason labels are derived from the labels when available, otherwise a "missing_<value>" placeholder is used.
Parameters
missing_values: list-
The sentinel values that SPSS marks as missing for the variable (e.g.,
[-99, -98]). labels: dict[Any, str] | None = None-
Optional mapping of sentinel value to human-readable label (e.g.,
{-99: "Refused"}). Labels are slugified into reason identifiers (e.g.,"Refused"->"refused"). null_is_missing: bool = True-
Whether actual null values should also be treated as missing. Default is
True. null_reason: str = "unknown"-
The reason label for actual null values. Default is
"unknown". description: str | None = "SPSS user-defined missing values"- Optional description. Default identifies the spec as SPSS user-defined missing values.
Returns
MissingSpec- A spec built from the SPSS missing values.
Examples
import pointblank as pb
spss_missing = pb.MissingSpec.from_spss(
missing_values=[-99, -98],
labels={-99: "Not asked", -98: "Refused"},
)
spss_missing.reason_for(-98) # "refused"from_variable_metadata()
Create a MissingSpec from an imported variable’s metadata.
Usage
from_variable_metadata(variable, null_is_missing=True, null_reason="unknown")This works with a VariableMetadata object (as produced by import_metadata() for SPSS, Stata, and SAS files). It reads the variable’s missing_values and derives reason labels from missing_value_labels or value_labels when available.
Parameters
variable: Any-
A variable-metadata object exposing
missing_valuesand (optionally)missing_value_labels/value_labelsattributes. null_is_missing: bool = True-
Whether actual null values should also be treated as missing. Default is
True. null_reason: str = "unknown"-
The reason label for actual null values. Default is
"unknown".
Returns
MissingSpec | None-
A spec built from the variable’s missing values, or
Noneif the variable declares no missing values.
is_missing()
Check whether a value should be considered missing under this spec.
Usage
is_missing(value)Parameters
value: Any- A value from the data.
Returns
bool-
Trueifvalueis a declared sentinel value, or ifvalueisNoneandnull_is_missing=True.
reason_for()
Get the reason label for a specific value.
Usage
reason_for(value)Parameters
value: Any- A value from the data.
Returns
str | None-
The reason label if
valueis a declared sentinel value, null_reason ifvalueisNoneandnull_is_missing=True, orNoneif the value is not considered missing.
reasons_list()
Get the distinct reason labels defined by this spec.
Usage
reasons_list()Returns
list[str]-
The distinct reason labels (in first-seen order), including null_reason when
null_is_missing=True.
sentinel_values()
Get all sentinel values that encode missingness.
Usage
sentinel_values()Returns
list-
The keys of
reasons(the actual values in the data that represent missingness). Note that this does not includeNoneeven whennull_is_missing=True; useis_missing()to test individual values.
values_for_category()
Get all sentinel values whose reason falls in a given category.
Usage
values_for_category(category)Parameters
category: str-
A category name defined in
categories.
Returns
list-
All sentinel values whose reason label is in the given category. Returns an empty list if
categoriesisNoneor the category is undefined.
values_for_reason()
Get all sentinel values that correspond to a given reason.
Usage
values_for_reason(reason)Parameters
reason: str- A reason label.
Returns
list-
All sentinel values mapped to
reason.