Merge data from two or more columns into a single column.
This method takes input from two or more columns and allows the contents to be merged into a single column by using a pattern that specifies the arrangement. The first column in the columns= parameter operates as the target column (i.e., the column that will undergo mutation) whereas all following columns will be untouched. There is the option to hide the non-target columns. The formatting of values in different columns will be preserved upon merging.
Parameters
columns:SelectExpr
The columns for which the merging operations should be applied. The first column name resolved will be the target column (i.e., undergo mutation) and the other columns will serve to provide input. Can be a list of column names or a selection expression, though a list is preferred here to ensure the order of columns is exactly as intended (since order matters for the pattern= parameter).
hide_columns:SelectExpr | Literal[False]=None
Any column names provided here will have their state changed to hidden (via internal use of .cols_hide()) if they aren’t already hidden. This is convenient if the shared purpose of these specified columns is only to provide string input to the target column. To suppress any hiding of columns, False can be used here. By default, all columns other than the first one specified in columns= will be hidden.
rows:int | list[int] | None=None
In conjunction with columns=, we can specify which of their rows should participate in the merging process. The default is all rows, resulting in all rows in columns= being formatted. Alternatively, we can supply a list of row indices.
pattern:str | None=None
A formatting pattern that specifies the arrangement of the column values and any string literals. The pattern uses numbers (within {}) that correspond to the indices of columns provided in columns=. If two columns are provided in columns= and we would like to combine the cell data onto the first column, "{0} {1}" could be used. If a pattern isn’t provided then a space-separated pattern that includes all columns will be generated automatically. The pattern can also use <</>> to surround spans of text that will be removed if any of the contained {} yields a missing value. Further details are provided in the How the pattern works section.
The GT object is returned. This is the same object that the method is called on so that we can facilitate method chaining.
Details
How the pattern works
There are two types of templating for the pattern string:
{} for arranging single column values in a row-wise fashion
<<>> to surround spans of text that will be removed if any of the contained {} yields a missing value
Integer values are placed in {} and those values correspond to the columns involved in the merge, in the order they are provided in the columns= argument. So the pattern "{0} ({1}-{2})" corresponds to the target column value listed first in columns and the second and third columns cited (formatted as a range in parentheses). With hypothetical values, this might result as the merged string "38.2 (3-8)".
Because some values involved in merging may be missing, it is likely that something like "38.2 (3-None)" would be undesirable. For such cases, placing sections of text in <<>> results in the entire span being eliminated if there were to be an None value (arising from {} values). We could instead opt for a pattern like "{0}<< ({1}-{2})>>", which results in "38.2" if either columns {1} or {2} have a None value. We can even use a more complex nesting pattern like "{0}<< ({1}-<<{2}>>)>>" to retain a lower limit in parentheses (where {2} is None) but remove the range altogether if {1} is None.
One more thing to note here is that if .sub_missing() is used on values in a column, those specific values affected won’t be considered truly missing by .cols_merge() (since they have been explicitly handled with substitute text).
Examples
Let’s use a subset of the sp500 dataset to create a table. We’ll merge the open & close columns together, and the low & high columns (putting an em dash between both).
Now we’ll use a portion of the gtcars for the next example that accounts for missing values in the pattern= parameter. Use the .cols_merge() method twice to merge together the: (1) trq and trq_rpm columns, and (2) mpg_c & mpg_h columns. Given the presence of missing values, we can use patterns with <</>> to create conditional text spans, avoiding results where any of the merged columns have missing values.