Helper function for referencing a column in the input table.
Many of the validation methods (i.e., the col_vals_*() methods) in pointblank have a value= argument. These validations compare column values either with a literal value or with the values in the same rows of another column. The col() helper function is used to specify that a column is being referenced, not a literal value.
The col() function doesn’t check that the column exists in the input table. It only signals that the value being compared is a column value. During validation (i.e., when interrogate() is called), pointblank will then check that the column exists in the input table.
Either the name of a single column in the target table, provided as a string, or, an expression involving column selector functions (e.g., starts_with("a"), ends_with("e") | starts_with("a"), etc.). Please read the documentation for further details on which input forms are valid depending on the context.
Returns
Column
A Column object representing the column.
Usage with the columns= Argument
The col() function can be used in the columns= argument of the following validation methods:
col_vals_gt()
col_vals_lt()
col_vals_ge()
col_vals_le()
col_vals_eq()
col_vals_ne()
col_vals_between()
col_vals_outside()
col_vals_in_set()
col_vals_not_in_set()
col_vals_null()
col_vals_not_null()
col_vals_regex()
col_exists()
If specifying a single column with certainty (you have the exact name), col() is not necessary since you can just pass the column name as a string (though it is still valid to use col("column_name"), if preferred). However, if you want to select columns based on complex logic involving multiple column selector functions (e.g., columns that start with "a" but don’t end with "e"), you need to use col() to wrap expressions involving column selector functions and logical operators such as &, |, -, and ~.
Here is an example of such usage with the col_vals_gt() validation method:
If using only a single column selector function, you can pass the function directly to the columns= argument of the validation method, or, you can use col() to wrap the function (either is valid though the first is more concise). Here is an example of that simpler usage:
col_vals_gt(columns=starts_with("a"), value=10)
Usage with the value=, left=, and right= Arguments
The col() function can be used in the value= argument of the following validation methods:
col_vals_gt()
col_vals_lt()
col_vals_ge()
col_vals_le()
col_vals_eq()
col_vals_ne()
and in the left= and right= arguments (either or both) of these two validation methods:
col_vals_between()
col_vals_outside()
You cannot use column selector functions such as starts_with() in any of the value=, left=, or right= arguments, since there would be no guarantee that exactly one column would be resolved from the target table with this approach. In these arguments, the col() function is used with a single column name to signal that the value being compared is a column value and not a literal value.
Available Selectors
There is a collection of selectors available in pointblank, allowing you to select columns based on attributes of column names and positions. The selectors are:
starts_with()
ends_with()
contains()
matches()
everything()
first_n()
last_n()
Alternatively, we support selectors from the Narwhals library! Those selectors can additionally take advantage of the data types of the columns. The selectors are:
matches()
by_dtype()
boolean()
categorical()
datetime()
numeric()
string()
Suppose we have a table with columns a and b and we’d like to validate that the values in column a are greater than the values in column b. We can use the col() helper function to reference the comparison column when creating the validation step.
From the results in the validation table it can be seen that the values in a were greater than the values in b for every row (or test unit). Using value=pb.col("b") specified that the greater-than comparison is across columns, not with a fixed literal value.
If you want to select an arbitrary set of columns upon which to base a validation, you can use column selector functions (e.g., starts_with(), ends_with(), etc.) to specify columns in the columns= argument of a validation method. Let’s use the starts_with() column selector function to select columns that start with "paid" and validate that the values in those columns are greater than 10.
In the above example the col() function contains the invocation of the starts_with() column selector function. This is not strictly necessary when using a single column selector function, so columns=pb.starts_with("paid") would be equivalent usage here. However, the use of col() is required when using multiple column selector functions with logical operators. Here is an example of that more complex usage:
In the above example the col() function contains the invocation of the starts_with() and matches() column selector functions, combined with the & operator. This is necessary to specify the set of columns that start with "paid" and match the text "2023" or "2024".
If you’d like to take advantage of Narwhals selectors, that’s also possible. Here is an example of using the numeric() column selector function to select all numeric columns for validation, checking that their values are greater than 0.
In the above example the col() function contains the invocation of the numeric() column selector function from Narwhals. As with the other selectors, this is not strictly necessary when using a single column selector, so columns=ncs.numeric() would also be fine here.
Narwhals selectors can also use operators to combine multiple selectors. Here is an example of using the numeric() and matches() selectors together to select all numeric columns that fit a specific pattern.
In the above example the col() function contains the invocation of the numeric() and matches() column selector functions from Narwhals, combined with the & operator. This is necessary to specify the set of columns that are numeric and match the text "2023" or "2024".