Step Reports: Diagnosing Validation Results in Detail
While validation reports provide a comprehensive overview of all validation steps, sometimes you need to focus on a specific validation step in greater detail. This is where step reports come in. A step report is a detailed examination of a single validation step, providing in-depth information about the test units that were validated and their pass/fail status.
Step reports are especially useful when debugging validation failures, investigating problematic data, or communicating detailed findings to colleagues who are responsible for specific data quality issues.
Creating a Step Report
To create a step report, you first need to run a validation and then use the get_step_report() method, specifying which validation step you want to examine:
import pointblank as pbimport polars as pl# Sample data as a Polars DataFramedata = pl.DataFrame({"id": range(1, 11),"value": [10, 20, 3, 35, 50, 2, 70, 8, 20, 4],"category": ["A", "B", "C", "A", "D", "F", "A", "E", "H", "G"],"ratio": [0.5, 0.7, 0.3, 1.2, 0.8, 0.9, 0.4, 1.5, 0.6, 0.2],"status": ["active", "active", "inactive", "active", "inactive","active", "inactive", "active", "active", "inactive"]})# Create a validationvalidation = ( pb.Validate(data=data, tbl_name="example_data") .col_vals_gt( columns="value", value=10 ) .col_vals_in_set( columns="category",set=["A", "B", "C"] ) .interrogate())# Get step report for the second validation step (i=2)step_report = validation.get_step_report(i=2)step_report
Report for Validation Step 2
ASSERTION category ∈ {A, B, C}
5 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
9
9
20
H
0.6
active
10
10
4
G
0.2
inactive
In this example, we first create and interrogate a validation object with two steps. We then generate a step report for the second validation step (i=2), which checks if the values in the category column are in the set ["A", "B", "C"].
Understanding Step Report Components
A step report consists of several key components that provide detailed information about the validation step:
Header: displays the validation step number, type of validation, and a brief description
Table Body: presents either the failing rows, a sample of completely passing data, or an expected/actual comparison (for a col_schema_match() step)
The step report table highlights passing and failing rows, making it easy to identify problematic data points. This is especially useful for diagnosing issues when dealing with large datasets.
Different Types of Step Reports
It’s important to note that step reports vary in appearance and structure depending on the type of validation method used:
Uniqueness checks (rows_distinct()): group together the duplicate records in order of appearance
Schema validations (col_schema_match()): display column-level information about expected vs. actual data types
Additionally, step reports for value-based validations and uniqueness checks operate in two distinct modes:
When errors are present: The report shows only the failing rows and, for value-based validations, clearly highlights the column under study
When no errors exist: The report header clearly indicates success, and a sample of the data is shown (along with the studied column highlighted, for value-based validations)
This variation in reporting style allows step reports to effectively communicate the specific type of validation being performed and display relevant information in the most appropriate format. When you’re working with different validation types, expect to see different step report layouts optimized for each context.
Getting Reports for Specific Validation Steps
You can get a step report for any validation step by specifying its step number in the get_step_report() method:
validation.get_step_report(i=1)
Report for Validation Step 1
ASSERTION value > 10
5 / 10 TEST UNIT FAILURES IN COLUMN 2
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
status
String
1
1
10
A
0.5
active
3
3
3
C
0.3
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
10
10
4
G
0.2
inactive
Note that step numbers in Pointblank start at 1, matching what you see in the validation report’s STEP column (i.e., not 0-based indexing). So the first step is referred to with i=1, the second step with i=2, and so on.
Customizing Step Reports
Step reports can be customized with several parameters to better focus your analysis and tailor the output to your specific needs. The get_step_report() method offers multiple customization options to help you create more effective reports.
When a dataset has many columns, you might want to focus on just those relevant to your analysis. You can create a step report containing only a subset of the columns in the target table:
validation.get_step_report( i=2, columns_subset=["id", "category", "status"] # Only show these columns)
Report for Validation Step 2
ASSERTION category ∈ {A, B, C}
5 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
category
String
status
String
5
5
D
inactive
6
6
F
active
8
8
E
active
9
9
H
active
10
10
G
inactive
This approach makes step reports much easier to interpret by highlighting just the essential columns that help understand the validation failures.
For large datasets with many failing rows, you might want to use limit= to set a cap on the number of rows shown in the report:
validation.get_step_report( i=2, limit=2# Only show up to 2 failing rows)
Report for Validation Step 2
ASSERTION category ∈ {A, B, C}
5 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF FIRST 2 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
The report header can also be extensively customized to provide more specific context. You can replace the default header with plain text or Markdown formatting:
For more advanced header customization, you can use the templating system with the {title} and {details} elements to retain parts of the default header while adding your own content. The {title} template is the default title whereas {details} provides information on the assertion, number of failures, etc. Let’s move away from the default template of {title}{details} and provide a custom title to go with the details text:
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
9
9
20
H
0.6
active
10
10
4
G
0.2
inactive
We can keep {title} and {details} and add some more context in between the two:
validation.get_step_report( i=2, header=("{title}<br>""<span style='font-size: 0.75em;'>""This validation is critical for our data quality standards.""</span><br>""{details}" ))
Report for Validation Step 2 This validation is critical for our data quality standards.
ASSERTION category ∈ {A, B, C}
5 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
9
9
20
H
0.6
active
10
10
4
G
0.2
inactive
You could always use more HTML and CSS to do a lot of customization:
EXTRACT OF ALL 5 ROWS (WITH TEST UNIT FAILURES IN RED):
Report for Validation Step 2
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
9
9
20
H
0.6
active
10
10
4
G
0.2
inactive
If you prefer no header at all, simply set header=None:
validation.get_step_report( i=2, header=None)
id
Int64
value
Int64
category
String
ratio
Float64
status
String
5
5
50
D
0.8
inactive
6
6
2
F
0.9
active
8
8
8
E
1.5
active
9
9
20
H
0.6
active
10
10
4
G
0.2
inactive
These customization options can be combined to create highly focused reports tailored to specific needs:
validation.get_step_report( i=2, columns_subset=["id", "category"], header="*Category Validation:* Top Issues", limit=2)
Category Validation: Top Issues
id
Int64
category
String
5
5
D
6
6
F
Through these customization options, you can craft step reports that effectively communicate the most important information to different audiences. Technical teams might benefit from seeing all columns but with a limited number of examples. Business stakeholders might prefer a focused view with only the most relevant columns. For documentation purposes, custom headers provide important context about what’s being validated.
Remember that customizing your step reports is about more than aesthetics: it’s about making complex validation information more accessible and actionable for all stakeholders involved in data quality.
Using Step Reports for Data Investigation
Step reports can be powerful tools for investigating data quality issues. Let’s look at a more complex example:
# Create a more complex dataset with multiple issuescomplex_data = pl.DataFrame({"id": range(1, 11),"value": [10, 20, 3, 40, 50, 2, 70, 80, 90, 7],"ratio": [0.1, 0.2, 0.3, 1.4, 0.5, 0.6, 0.7, 0.8, 1.2, 0.9],"category": ["A", "B", "C", "A", "D", "B", "A", "C", "B", "E"]})# Create a validation with multiple stepsvalidation_complex = ( pb.Validate(data=complex_data, tbl_name="complex_data") .col_vals_gt(columns="value", value=10) .col_vals_le(columns="ratio", value=1.0) .col_vals_in_set(columns="category", set=["A", "B", "C"]) .interrogate())# Get step report for the ratio validation (step 2)ratio_report = validation_complex.get_step_report(i=2)ratio_report
Report for Validation Step 2
ASSERTION ratio ≤ 1.0
2 / 10 TEST UNIT FAILURES IN COLUMN 3
EXTRACT OF ALL 2 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
ratio
Float64
category
String
4
4
40
1.4
A
9
9
90
1.2
B
In this example, we’re investigating issues with the ratio column by generating a step report specifically for that validation step. The step report shows exactly which rows have values that exceed our maximum threshold of 1.0.
Combining Step Reports with Extracts
For more advanced analysis, you can extract the actual data from a step report into a DataFrame:
# Extract the data from the step reportfailing_ratios = validation_complex.get_data_extracts(i=2)failing_ratios
This extracts the failing rows from the validation step, which you can then further analyze or fix as needed. Note that the parameter i=2 corresponds directly to the step number shown in the validation report; it’s the same numbering system used for get_step_report().
These extracts are particularly valuable for analysts who need to:
perform additional calculations on problematic data
feed failing records into correction pipelines
create visualizations of data patterns that led to validation failures
export problem records to share with data owners
It’s worth noting that the validation report itself includes export buttons on the far right of each row that allow you to download CSV files of the failing data directly. This serves as a convenient delivery mechanism for sharing extracts with colleagues who may not be working in Python, making the validation report not just a visual tool but also a practical means of distributing problematic data for further investigation.
Step Reports with Segmented Data
When working with segmented validation, step reports become even more valuable as they allow you to investigate issues within specific segments:
# Create data with different regionssegmented_data = pl.DataFrame({"id": range(1, 10),"value": [10, 20, 3, 40, 50, 2, 6, 8, 60],"region": ["North", "North", "South", "South", "East", "East", "West", "West", "West"]})# Create a validation with segmentssegmented_validation = ( pb.Validate(data=segmented_data, tbl_name="regional_data") .col_vals_gt( columns="value", value=10, segments="region"# Segment by region ) .interrogate())# Get step report for a specific segment (the 'West' region)# For segmented validations, each segment gets its own step numbernorth_report = segmented_validation.get_step_report(i=4)north_report
Report for Validation Step 4
ASSERTION value > 10
2 / 3 TEST UNIT FAILURES IN COLUMN 2
EXTRACT OF ALL 2 ROWS (WITH TEST UNIT FAILURES IN RED):
id
Int64
value
Int64
region
String
1
7
6
West
2
8
8
West
For segmented validations, each segment is treated as a separate validation step with its own step number. This allows you to investigate issues specific to each data segment using the appropriate step number from the validation report.
Best Practices for Using Step Reports
Here are some guidelines for effectively using step reports in your data validation workflow:
Generate step reports selectively: create reports only for steps that require detailed investigation rather than for all steps
Use the limit= parameter for large datasets: when working with large datasets, focus only on a subset of failing rows to avoid information overload
Share specific step reports with stakeholders: when collaborating with domain experts, share relevant step reports to help them understand and address specific data quality issues (and customize the header to improve clarity)
Combine with extracts for deeper analysis: use the get_data_extracts() method to extract the failing rows for further analysis or correction
Document findings from step reports: when you discover patterns or insights from step reports, document them to inform future data quality improvements
Remember that step reports are most valuable when used strategically as part of a broader data quality framework. By following these best practices, you can use step reports not just for troubleshooting, but to develop a deeper understanding of your data’s characteristics and quality patterns over time. This approach transforms step reports from simple debugging tools into strategic assets for continuous data quality improvement.
Conclusion
Step reports provide a focused lens into specific validation steps, allowing you to investigate data quality issues in detail. By generating targeted reports for specific validation steps, you can:
pinpoint exactly which data points are causing validation failures
communicate specific issues to relevant stakeholders
gather insights that might be missed in the aggregate validation report
track improvements in specific aspects of data quality over time
Whether you’re debugging validation failures, investigating edge cases, or communicating specific data quality issues to colleagues, step reports can give you the detailed information you need to understand and resolve data quality problems effectively.