A generic function that describes an object for use by an LLM.
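As a rough illustration of what "generic" means here, the sketch below defines a small S3 generic in base R that dispatches on the class of its argument and returns a character vector of description lines. The name `describe_for_llm()` and both of its methods are hypothetical and exist only for this example; they are not part of the package and do not reproduce its output format.

```r
# Hypothetical S3 generic, for illustration only (not the package's implementation).
describe_for_llm <- function(x, ...) {
  UseMethod("describe_for_llm")
}

# Fallback method: report the class and capture the str() summary as text.
describe_for_llm.default <- function(x, ...) {
  c(
    paste("class:", paste(class(x), collapse = ", ")),
    utils::capture.output(utils::str(x))
  )
}

# Data frame method: report dimensions, then one line per column with its type.
describe_for_llm.data.frame <- function(x, ...) {
  col_types <- vapply(x, function(col) class(col)[1], character(1))
  c(
    sprintf("data.frame with %d rows and %d columns", nrow(x), ncol(x)),
    sprintf("- %s <%s>", names(x), col_types)
  )
}

# Dispatch picks the data.frame method because mtcars is a data frame.
desc <- describe_for_llm(mtcars)
```

Adding support for another kind of object then only requires defining a further `describe_for_llm.<class>()` method; the calling code stays the same.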
See also
Other btw_this()
#> [1] "```json"
#> [2] "{\"n_cols\":11,\"n_rows\":32,\"groups\":[],\"class\":\"data.frame\",\"columns\":{\"mpg\":{\"variable\":\"mpg\",\"type\":\"numeric\",\"mean\":20.0906,\"sd\":6.0269,\"p0\":10.4,\"p25\":15.425,\"p50\":19.2,\"p75\":22.8,\"p100\":33.9},\"cyl\":{\"variable\":\"cyl\",\"type\":\"numeric\",\"mean\":6.1875,\"sd\":1.7859,\"p0\":4,\"p25\":4,\"p50\":6,\"p75\":8,\"p100\":8},\"disp\":{\"variable\":\"disp\",\"type\":\"numeric\",\"mean\":230.7219,\"sd\":123.9387,\"p0\":71.1,\"p25\":120.825,\"p50\":196.3,\"p75\":326,\"p100\":472},\"hp\":{\"variable\":\"hp\",\"type\":\"numeric\",\"mean\":146.6875,\"sd\":68.5629,\"p0\":52,\"p25\":96.5,\"p50\":123,\"p75\":180,\"p100\":335},\"drat\":{\"variable\":\"drat\",\"type\":\"numeric\",\"mean\":3.5966,\"sd\":0.5347,\"p0\":2.76,\"p25\":3.08,\"p50\":3.695,\"p75\":3.92,\"p100\":4.93},\"wt\":{\"variable\":\"wt\",\"type\":\"numeric\",\"mean\":3.2172,\"sd\":0.9785,\"p0\":1.513,\"p25\":2.5812,\"p50\":3.325,\"p75\":3.61,\"p100\":5.424},\"qsec\":{\"variable\":\"qsec\",\"type\":\"numeric\",\"mean\":17.8487,\"sd\":1.7869,\"p0\":14.5,\"p25\":16.8925,\"p50\":17.71,\"p75\":18.9,\"p100\":22.9},\"vs\":{\"variable\":\"vs\",\"type\":\"numeric\",\"mean\":0.4375,\"sd\":0.504,\"p0\":0,\"p25\":0,\"p50\":0,\"p75\":1,\"p100\":1},\"am\":{\"variable\":\"am\",\"type\":\"numeric\",\"mean\":0.4062,\"sd\":0.499,\"p0\":0,\"p25\":0,\"p50\":0,\"p75\":1,\"p100\":1},\"gear\":{\"variable\":\"gear\",\"type\":\"numeric\",\"mean\":3.6875,\"sd\":0.7378,\"p0\":3,\"p25\":3,\"p50\":4,\"p75\":4,\"p100\":5},\"carb\":{\"variable\":\"carb\",\"type\":\"numeric\",\"mean\":2.8125,\"sd\":1.6152,\"p0\":1,\"p25\":2,\"p50\":2,\"p75\":4,\"p100\":8}}}"
#> [3] "```"
#> [1] "mutate package:dplyr R Documentation"
#> [2] ""
#> [3] "Create, modify, and delete columns"
#> [4] ""
#> [5] "Description:"
#> [6] ""
#> [7] " ‘mutate()’ creates new columns that are functions of existing"
#> [8] " variables. It can also modify (if the name is the same as an"
#> [9] " existing column) and delete columns (by setting their value to"
#> [10] " ‘NULL’)."
#> [11] ""
#> [12] "Usage:"
#> [13] ""
#> [14] " mutate(.data, ...)"
#> [15] " "
#> [16] " ## S3 method for class 'data.frame'"
#> [17] " mutate("
#> [18] " .data,"
#> [19] " ...,"
#> [20] " .by = NULL,"
#> [21] " .keep = c(\"all\", \"used\", \"unused\", \"none\"),"
#> [22] " .before = NULL,"
#> [23] " .after = NULL"
#> [24] " )"
#> [25] " "
#> [26] "Arguments:"
#> [27] ""
#> [28] " .data: A data frame, data frame extension (e.g. a tibble), or a lazy"
#> [29] " data frame (e.g. from dbplyr or dtplyr). See _Methods_,"
#> [30] " below, for more details."
#> [31] ""
#> [32] " ...: <‘data-masking’> Name-value pairs. The name gives the name of"
#> [33] " the column in the output."
#> [34] ""
#> [35] " The value can be:"
#> [36] ""
#> [37] " • A vector of length 1, which will be recycled to the"
#> [38] " correct length."
#> [39] ""
#> [40] " • A vector the same length as the current group (or the"
#> [41] " whole data frame if ungrouped)."
#> [42] ""
#> [43] " • ‘NULL’, to remove the column."
#> [44] ""
#> [45] " • A data frame or tibble, to create multiple columns in the"
#> [46] " output."
#> [47] ""
#> [48] " .by: *[Experimental]*"
#> [49] ""
#> [50] " <‘tidy-select’> Optionally, a selection of columns to group"
#> [51] " by for just this operation, functioning as an alternative to"
#> [52] " ‘group_by()’. For details and examples, see ?dplyr_by."
#> [53] ""
#> [54] " .keep: Control which columns from ‘.data’ are retained in the"
#> [55] " output. Grouping columns and columns created by ‘...’ are"
#> [56] " always kept."
#> [57] ""
#> [58] " • ‘\"all\"’ retains all columns from ‘.data’. This is the"
#> [59] " default."
#> [60] ""
#> [61] " • ‘\"used\"’ retains only the columns used in ‘...’ to create"
#> [62] " new columns. This is useful for checking your work, as it"
#> [63] " displays inputs and outputs side-by-side."
#> [64] ""
#> [65] " • ‘\"unused\"’ retains only the columns _not_ used in ‘...’"
#> [66] " to create new columns. This is useful if you generate new"
#> [67] " columns, but no longer need the columns used to generate"
#> [68] " them."
#> [69] ""
#> [70] " • ‘\"none\"’ doesn't retain any extra columns from ‘.data’."
#> [71] " Only the grouping variables and columns created by ‘...’"
#> [72] " are kept."
#> [73] ""
#> [74] ".before, .after: <‘tidy-select’> Optionally, control where new columns"
#> [75] " should appear (the default is to add to the right hand side)."
#> [76] " See ‘relocate()’ for more details."
#> [77] ""
#> [78] "Value:"
#> [79] ""
#> [80] " An object of the same type as ‘.data’. The output has the"
#> [81] " following properties:"
#> [82] ""
#> [83] " • Columns from ‘.data’ will be preserved according to the"
#> [84] " ‘.keep’ argument."
#> [85] ""
#> [86] " • Existing columns that are modified by ‘...’ will always be"
#> [87] " returned in their original location."
#> [88] ""
#> [89] " • New columns created through ‘...’ will be placed according to"
#> [90] " the ‘.before’ and ‘.after’ arguments."
#> [91] ""
#> [92] " • The number of rows is not affected."
#> [93] ""
#> [94] " • Columns given the value ‘NULL’ will be removed."
#> [95] ""
#> [96] " • Groups will be recomputed if a grouping variable is mutated."
#> [97] ""
#> [98] " • Data frame attributes are preserved."
#> [99] ""
#> [100] "Useful mutate functions:"
#> [101] ""
#> [102] " • ‘+’, ‘-’, ‘log()’, etc., for their usual mathematical"
#> [103] " meanings"
#> [104] ""
#> [105] " • ‘lead()’, ‘lag()’"
#> [106] ""
#> [107] " • ‘dense_rank()’, ‘min_rank()’, ‘percent_rank()’,"
#> [108] " ‘row_number()’, ‘cume_dist()’, ‘ntile()’"
#> [109] ""
#> [110] " • ‘cumsum()’, ‘cummean()’, ‘cummin()’, ‘cummax()’, ‘cumany()’,"
#> [111] " ‘cumall()’"
#> [112] ""
#> [113] " • ‘na_if()’, ‘coalesce()’"
#> [114] ""
#> [115] " • ‘if_else()’, ‘recode()’, ‘case_when()’"
#> [116] ""
#> [117] "Grouped tibbles:"
#> [118] ""
#> [119] " Because mutating expressions are computed within groups, they may"
#> [120] " yield different results on grouped tibbles. This will be the case"
#> [121] " as soon as an aggregating, lagging, or ranking function is"
#> [122] " involved. Compare this ungrouped mutate:"
#> [123] ""
#> [124] " starwars %>%"
#> [125] " select(name, mass, species) %>%"
#> [126] " mutate(mass_norm = mass / mean(mass, na.rm = TRUE))"
#> [127] " "
#> [128] " With the grouped equivalent:"
#> [129] ""
#> [130] " starwars %>%"
#> [131] " select(name, mass, species) %>%"
#> [132] " group_by(species) %>%"
#> [133] " mutate(mass_norm = mass / mean(mass, na.rm = TRUE))"
#> [134] " "
#> [135] " The former normalises ‘mass’ by the global average whereas the"
#> [136] " latter normalises by the averages within species levels."
#> [137] ""
#> [138] "Methods:"
#> [139] ""
#> [140] " This function is a *generic*, which means that packages can"
#> [141] " provide implementations (methods) for other classes. See the"
#> [142] " documentation of individual methods for extra arguments and"
#> [143] " differences in behaviour."
#> [144] ""
#> [145] " Methods available in currently loaded packages: no methods found."
#> [146] ""
#> [147] "See Also:"
#> [148] ""
#> [149] " Other single table verbs: ‘arrange()’, ‘filter()’, ‘reframe()’,"
#> [150] " ‘rename()’, ‘select()’, ‘slice()’, ‘summarise()’"
#> [151] ""
#> [152] "Examples:"
#> [153] ""
#> [154] " # Newly created variables are available immediately"
#> [155] " starwars %>%"
#> [156] " select(name, mass) %>%"
#> [157] " mutate("
#> [158] " mass2 = mass * 2,"
#> [159] " mass2_squared = mass2 * mass2"
#> [160] " )"
#> [161] " "
#> [162] " # As well as adding new variables, you can use mutate() to"
#> [163] " # remove variables and modify existing variables."
#> [164] " starwars %>%"
#> [165] " select(name, height, mass, homeworld) %>%"
#> [166] " mutate("
#> [167] " mass = NULL,"
#> [168] " height = height * 0.0328084 # convert to feet"
#> [169] " )"
#> [170] " "
#> [171] " # Use across() with mutate() to apply a transformation"
#> [172] " # to multiple columns in a tibble."
#> [173] " starwars %>%"
#> [174] " select(name, homeworld, species) %>%"
#> [175] " mutate(across(!name, as.factor))"
#> [176] " # see more in ?across"
#> [177] " "
#> [178] " # Window functions are useful for grouped mutates:"
#> [179] " starwars %>%"
#> [180] " select(name, mass, homeworld) %>%"
#> [181] " group_by(homeworld) %>%"
#> [182] " mutate(rank = min_rank(desc(mass)))"
#> [183] " # see `vignette(\"window-functions\")` for more details"
#> [184] " "
#> [185] " # By default, new columns are placed on the far right."
#> [186] " df <- tibble(x = 1, y = 2)"
#> [187] " df %>% mutate(z = x + y)"
#> [188] " df %>% mutate(z = x + y, .before = 1)"
#> [189] " df %>% mutate(z = x + y, .after = x)"
#> [190] " "
#> [191] " # By default, mutate() keeps all columns from the input data."
#> [192] " df <- tibble(x = 1, y = 2, a = \"a\", b = \"b\")"
#> [193] " df %>% mutate(z = x + y, .keep = \"all\") # the default"
#> [194] " df %>% mutate(z = x + y, .keep = \"used\")"
#> [195] " df %>% mutate(z = x + y, .keep = \"unused\")"
#> [196] " df %>% mutate(z = x + y, .keep = \"none\")"
#> [197] " "
#> [198] " # Grouping ----------------------------------------"
#> [199] " # The mutate operation may yield different results on grouped"
#> [200] " # tibbles because the expressions are computed within groups."
#> [201] " # The following normalises `mass` by the global average:"
#> [202] " starwars %>%"
#> [203] " select(name, mass, species) %>%"
#> [204] " mutate(mass_norm = mass / mean(mass, na.rm = TRUE))"
#> [205] " "
#> [206] " # Whereas this normalises `mass` by the averages within species"
#> [207] " # levels:"
#> [208] " starwars %>%"
#> [209] " select(name, mass, species) %>%"
#> [210] " group_by(species) %>%"
#> [211] " mutate(mass_norm = mass / mean(mass, na.rm = TRUE))"
#> [212] " "
#> [213] " # Indirection ----------------------------------------"
#> [214] " # Refer to column names stored as strings with the `.data` pronoun:"
#> [215] " vars <- c(\"mass\", \"height\")"
#> [216] " mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])"
#> [217] " # Learn more in ?rlang::args_data_masking"
#> [218] " "
#> [1] "# Introduction to dplyr {#introduction-to-dplyr .title .toc-ignore}"
#> [2] ""
#> [3] "When working with data you must:"
#> [4] ""
#> [5] "- Figure out what you want to do."
#> [6] ""
#> [7] "- Describe those tasks in the form of a computer program."
#> [8] ""
#> [9] "- Execute the program."
#> [10] ""
#> [11] "The dplyr package makes these steps fast and easy:"
#> [12] ""
#> [13] "- By constraining your options, it helps you think about your data"
#> [14] " manipulation challenges."
#> [15] ""
#> [16] "- It provides simple \"verbs\", functions that correspond to the most"
#> [17] " common data manipulation tasks, to help you translate your thoughts"
#> [18] " into code."
#> [19] ""
#> [20] "- It uses efficient backends, so you spend less time waiting for the"
#> [21] " computer."
#> [22] ""
#> [23] "This document introduces you to dplyr's basic set of tools, and shows"
#> [24] "you how to apply them to data frames. dplyr also supports databases via"
#> [25] "the dbplyr package; once you've installed it, read `vignette(\"dbplyr\")` to"
#> [26] "learn more."
#> [27] ""
#> [28] "::: {#data-starwars .section .level2}"
#> [29] "## Data: starwars"
#> [30] ""
#> [31] "To explore the basic data manipulation verbs of dplyr, we'll use the"
#> [32] "dataset `starwars`. This dataset contains 87 characters and comes from"
#> [33] "the [Star Wars API](https://swapi.dev), and is documented in `?starwars`"
#> [34] ""
#> [35] "::: {#cb1 .sourceCode}"
#> [36] "``` {.sourceCode .r}"
#> [37] "dim(starwars)"
#> [38] "#> [1] 87 14"
#> [39] "starwars"
#> [40] "#> # A tibble: 87 × 14"
#> [41] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [42] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [43] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [44] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [45] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [46] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [47] "#> # ℹ 83 more rows"
#> [48] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [49] "#> # vehicles <list>, starships <list>"
#> [50] "```"
#> [51] ":::"
#> [52] ""
#> [53] "Note that `starwars` is a tibble, a modern reimagining of the data"
#> [54] "frame. It's particularly useful for large datasets because it only"
#> [55] "prints the first few rows. You can learn more about tibbles at"
#> [56] "<https://tibble.tidyverse.org>; in particular you can convert data"
#> [57] "frames to tibbles with `as_tibble()`."
#> [58] ":::"
#> [59] ""
#> [60] "::: {#single-table-verbs .section .level2}"
#> [61] "## Single table verbs"
#> [62] ""
#> [63] "dplyr aims to provide a function for each basic verb of data"
#> [64] "manipulation. These verbs can be organised into three categories based"
#> [65] "on the component of the dataset that they work with:"
#> [66] ""
#> [67] "- Rows:"
#> [68] " - `filter()` chooses rows based on column values."
#> [69] " - `slice()` chooses rows based on location."
#> [70] " - `arrange()` changes the order of the rows."
#> [71] "- Columns:"
#> [72] " - `select()` changes whether or not a column is included."
#> [73] " - `rename()` changes the name of columns."
#> [74] " - `mutate()` changes the values of columns and creates new"
#> [75] " columns."
#> [76] " - `relocate()` changes the order of the columns."
#> [77] "- Groups of rows:"
#> [78] " - `summarise()` collapses a group into a single row."
#> [79] ""
#> [80] "::: {#the-pipe .section .level3}"
#> [81] "### The pipe"
#> [82] ""
#> [83] "All of the dplyr functions take a data frame (or tibble) as the first"
#> [84] "argument. Rather than forcing the user to either save intermediate"
#> [85] "objects or nest functions, dplyr provides the `%>%` operator from"
#> [86] "magrittr. `x %>% f(y)` turns into `f(x, y)` so the result from one step"
#> [87] "is then \"piped\" into the next step. You can use the pipe to rewrite"
#> [88] "multiple operations that you can read left-to-right, top-to-bottom"
#> [89] "(reading the pipe operator as \"then\")."
#> [90] ":::"
#> [91] ""
#> [92] "::: {#filter-rows-with-filter .section .level3}"
#> [93] "### Filter rows with `filter()`"
#> [94] ""
#> [95] "`filter()` allows you to select a subset of rows in a data frame. Like"
#> [96] "all single verbs, the first argument is the tibble (or data frame). The"
#> [97] "second and subsequent arguments refer to variables within that data"
#> [98] "frame, selecting rows where the expression is `TRUE`."
#> [99] ""
#> [100] "For example, we can select all characters with light skin color and brown"
#> [101] "eyes with:"
#> [102] ""
#> [103] "::: {#cb2 .sourceCode}"
#> [104] "``` {.sourceCode .r}"
#> [105] "starwars %>% filter(skin_color == \"light\", eye_color == \"brown\")"
#> [106] "#> # A tibble: 7 × 14"
#> [107] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [108] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [109] "#> 1 Leia Org… 150 49 brown light brown 19 fema… femin…"
#> [110] "#> 2 Biggs Da… 183 84 black light brown 24 male mascu…"
#> [111] "#> 3 Padmé Am… 185 45 brown light brown 46 fema… femin…"
#> [112] "#> 4 Cordé 157 NA brown light brown NA <NA> <NA> "
#> [113] "#> # ℹ 3 more rows"
#> [114] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [115] "#> # vehicles <list>, starships <list>"
#> [116] "```"
#> [117] ":::"
#> [118] ""
#> [119] "This is roughly equivalent to this base R code:"
#> [120] ""
#> [121] "::: {#cb3 .sourceCode}"
#> [122] "``` {.sourceCode .r}"
#> [123] "starwars[starwars$skin_color == \"light\" & starwars$eye_color == \"brown\", ]"
#> [124] "```"
#> [125] ":::"
#> [126] ":::"
#> [127] ""
#> [128] "::: {#arrange-rows-with-arrange .section .level3}"
#> [129] "### Arrange rows with `arrange()`"
#> [130] ""
#> [131] "`arrange()` works similarly to `filter()` except that instead of"
#> [132] "filtering or selecting rows, it reorders them. It takes a data frame,"
#> [133] "and a set of column names (or more complicated expressions) to order by."
#> [134] "If you provide more than one column name, each additional column will be"
#> [135] "used to break ties in the values of preceding columns:"
#> [136] ""
#> [137] "::: {#cb4 .sourceCode}"
#> [138] "``` {.sourceCode .r}"
#> [139] "starwars %>% arrange(height, mass)"
#> [140] "#> # A tibble: 87 × 14"
#> [141] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [142] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [143] "#> 1 Yoda 66 17 white green brown 896 male mascu…"
#> [144] "#> 2 Ratts Ty… 79 15 none grey, blue unknown NA male mascu…"
#> [145] "#> 3 Wicket S… 88 20 brown brown brown 8 male mascu…"
#> [146] "#> 4 Dud Bolt 94 45 none blue, grey yellow NA male mascu…"
#> [147] "#> # ℹ 83 more rows"
#> [148] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [149] "#> # vehicles <list>, starships <list>"
#> [150] "```"
#> [151] ":::"
#> [152] ""
#> [153] "Use `desc()` to order a column in descending order:"
#> [154] ""
#> [155] "::: {#cb5 .sourceCode}"
#> [156] "``` {.sourceCode .r}"
#> [157] "starwars %>% arrange(desc(height))"
#> [158] "#> # A tibble: 87 × 14"
#> [159] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [160] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [161] "#> 1 Yarael P… 264 NA none white yellow NA male mascu…"
#> [162] "#> 2 Tarfful 234 136 brown brown blue NA male mascu…"
#> [163] "#> 3 Lama Su 229 88 none grey black NA male mascu…"
#> [164] "#> 4 Chewbacca 228 112 brown unknown blue 200 male mascu…"
#> [165] "#> # ℹ 83 more rows"
#> [166] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [167] "#> # vehicles <list>, starships <list>"
#> [168] "```"
#> [169] ":::"
#> [170] ":::"
#> [171] ""
#> [172] "::: {#choose-rows-using-their-position-with-slice .section .level3}"
#> [173] "### Choose rows using their position with `slice()`"
#> [174] ""
#> [175] "`slice()` lets you index rows by their (integer) locations. It allows"
#> [176] "you to select, remove, and duplicate rows."
#> [177] ""
#> [178] "We can get characters from row numbers 5 through 10."
#> [179] ""
#> [180] "::: {#cb6 .sourceCode}"
#> [181] "``` {.sourceCode .r}"
#> [182] "starwars %>% slice(5:10)"
#> [183] "#> # A tibble: 6 × 14"
#> [184] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [185] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [186] "#> 1 Leia Org… 150 49 brown light brown 19 fema… femin…"
#> [187] "#> 2 Owen Lars 178 120 brown, gr… light blue 52 male mascu…"
#> [188] "#> 3 Beru Whi… 165 75 brown light blue 47 fema… femin…"
#> [189] "#> 4 R5-D4 97 32 <NA> white, red red NA none mascu…"
#> [190] "#> # ℹ 2 more rows"
#> [191] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [192] "#> # vehicles <list>, starships <list>"
#> [193] "```"
#> [194] ":::"
#> [195] ""
#> [196] "It is accompanied by a number of helpers for common use cases:"
#> [197] ""
#> [198] "- `slice_head()` and `slice_tail()` select the first or last rows."
#> [199] ""
#> [200] "::: {#cb7 .sourceCode}"
#> [201] "``` {.sourceCode .r}"
#> [202] "starwars %>% slice_head(n = 3)"
#> [203] "#> # A tibble: 3 × 14"
#> [204] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [205] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [206] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [207] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [208] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [209] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [210] "#> # vehicles <list>, starships <list>"
#> [211] "```"
#> [212] ":::"
#> [213] ""
#> [214] "- `slice_sample()` randomly selects rows. Use the option prop to"
#> [215] " choose a certain proportion of the cases."
#> [216] ""
#> [217] "::: {#cb8 .sourceCode}"
#> [218] "``` {.sourceCode .r}"
#> [219] "starwars %>% slice_sample(n = 5)"
#> [220] "#> # A tibble: 5 × 14"
#> [221] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [222] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [223] "#> 1 Ayla Sec… 178 55 none blue hazel 48 fema… femin…"
#> [224] "#> 2 Bossk 190 113 none green red 53 male mascu…"
#> [225] "#> 3 San Hill 191 NA none grey gold NA male mascu…"
#> [226] "#> 4 Luminara… 170 56.2 black yellow blue 58 fema… femin…"
#> [227] "#> # ℹ 1 more row"
#> [228] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [229] "#> # vehicles <list>, starships <list>"
#> [230] "starwars %>% slice_sample(prop = 0.1)"
#> [231] "#> # A tibble: 8 × 14"
#> [232] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [233] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [234] "#> 1 Qui-Gon … 193 89 brown fair blue 92 male mascu…"
#> [235] "#> 2 Jango Fe… 183 79 black tan brown 66 male mascu…"
#> [236] "#> 3 Jocasta … 167 NA white fair blue NA fema… femin…"
#> [237] "#> 4 Zam Wese… 168 55 blonde fair, gre… yellow NA fema… femin…"
#> [238] "#> # ℹ 4 more rows"
#> [239] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [240] "#> # vehicles <list>, starships <list>"
#> [241] "```"
#> [242] ":::"
#> [243] ""
#> [244] "Use `replace = TRUE` to perform a bootstrap sample. If needed, you can"
#> [245] "weight the sample with the `weight` argument."
#> [246] ""
#> [247] "- `slice_min()` and `slice_max()` select rows with the lowest or highest"
#> [248] " values of a variable. Note that we first must choose only the values"
#> [249] " which are not NA."
#> [250] ""
#> [251] "::: {#cb9 .sourceCode}"
#> [252] "``` {.sourceCode .r}"
#> [253] "starwars %>%"
#> [254] " filter(!is.na(height)) %>%"
#> [255] " slice_max(height, n = 3)"
#> [256] "#> # A tibble: 3 × 14"
#> [257] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [258] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [259] "#> 1 Yarael P… 264 NA none white yellow NA male mascu…"
#> [260] "#> 2 Tarfful 234 136 brown brown blue NA male mascu…"
#> [261] "#> 3 Lama Su 229 88 none grey black NA male mascu…"
#> [262] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [263] "#> # vehicles <list>, starships <list>"
#> [264] "```"
#> [265] ":::"
#> [266] ":::"
#> [267] ""
#> [268] "::: {#select-columns-with-select .section .level3}"
#> [269] "### Select columns with `select()`"
#> [270] ""
#> [271] "Often you work with large datasets with many columns but only a few are"
#> [272] "actually of interest to you. `select()` allows you to rapidly zoom in on"
#> [273] "a useful subset using operations that usually only work on numeric"
#> [274] "variable positions:"
#> [275] ""
#> [276] "::: {#cb10 .sourceCode}"
#> [277] "``` {.sourceCode .r}"
#> [278] "# Select columns by name"
#> [279] "starwars %>% select(hair_color, skin_color, eye_color)"
#> [280] "#> # A tibble: 87 × 3"
#> [281] "#> hair_color skin_color eye_color"
#> [282] "#> <chr> <chr> <chr> "
#> [283] "#> 1 blond fair blue "
#> [284] "#> 2 <NA> gold yellow "
#> [285] "#> 3 <NA> white, blue red "
#> [286] "#> 4 none white yellow "
#> [287] "#> # ℹ 83 more rows"
#> [288] "# Select all columns between hair_color and eye_color (inclusive)"
#> [289] "starwars %>% select(hair_color:eye_color)"
#> [290] "#> # A tibble: 87 × 3"
#> [291] "#> hair_color skin_color eye_color"
#> [292] "#> <chr> <chr> <chr> "
#> [293] "#> 1 blond fair blue "
#> [294] "#> 2 <NA> gold yellow "
#> [295] "#> 3 <NA> white, blue red "
#> [296] "#> 4 none white yellow "
#> [297] "#> # ℹ 83 more rows"
#> [298] "# Select all columns except those from hair_color to eye_color (inclusive)"
#> [299] "starwars %>% select(!(hair_color:eye_color))"
#> [300] "#> # A tibble: 87 × 11"
#> [301] "#> name height mass birth_year sex gender homeworld species films vehicles"
#> [302] "#> <chr> <int> <dbl> <dbl> <chr> <chr> <chr> <chr> <lis> <list> "
#> [303] "#> 1 Luke Sk… 172 77 19 male mascu… Tatooine Human <chr> <chr> "
#> [304] "#> 2 C-3PO 167 75 112 none mascu… Tatooine Droid <chr> <chr> "
#> [305] "#> 3 R2-D2 96 32 33 none mascu… Naboo Droid <chr> <chr> "
#> [306] "#> 4 Darth V… 202 136 41.9 male mascu… Tatooine Human <chr> <chr> "
#> [307] "#> # ℹ 83 more rows"
#> [308] "#> # ℹ 1 more variable: starships <list>"
#> [309] "# Select all columns ending with color"
#> [310] "starwars %>% select(ends_with(\"color\"))"
#> [311] "#> # A tibble: 87 × 3"
#> [312] "#> hair_color skin_color eye_color"
#> [313] "#> <chr> <chr> <chr> "
#> [314] "#> 1 blond fair blue "
#> [315] "#> 2 <NA> gold yellow "
#> [316] "#> 3 <NA> white, blue red "
#> [317] "#> 4 none white yellow "
#> [318] "#> # ℹ 83 more rows"
#> [319] "```"
#> [320] ":::"
#> [321] ""
#> [322] "There are a number of helper functions you can use within `select()`,"
#> [323] "like `starts_with()`, `ends_with()`, `matches()` and `contains()`. These"
#> [324] "let you quickly match larger blocks of variables that meet some"
#> [325] "criterion. See `?select` for more details."
#> [326] ""
#> [327] "You can rename variables with `select()` by using named arguments:"
#> [328] ""
#> [329] "::: {#cb11 .sourceCode}"
#> [330] "``` {.sourceCode .r}"
#> [331] "starwars %>% select(home_world = homeworld)"
#> [332] "#> # A tibble: 87 × 1"
#> [333] "#> home_world"
#> [334] "#> <chr> "
#> [335] "#> 1 Tatooine "
#> [336] "#> 2 Tatooine "
#> [337] "#> 3 Naboo "
#> [338] "#> 4 Tatooine "
#> [339] "#> # ℹ 83 more rows"
#> [340] "```"
#> [341] ":::"
#> [342] ""
#> [343] "But because `select()` drops all the variables not explicitly mentioned,"
#> [344] "it's not that useful. Instead, use `rename()`:"
#> [345] ""
#> [346] "::: {#cb12 .sourceCode}"
#> [347] "``` {.sourceCode .r}"
#> [348] "starwars %>% rename(home_world = homeworld)"
#> [349] "#> # A tibble: 87 × 14"
#> [350] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [351] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [352] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [353] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [354] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [355] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [356] "#> # ℹ 83 more rows"
#> [357] "#> # ℹ 5 more variables: home_world <chr>, species <chr>, films <list>,"
#> [358] "#> # vehicles <list>, starships <list>"
#> [359] "```"
#> [360] ":::"
#> [361] ":::"
#> [362] ""
#> [363] "::: {#add-new-columns-with-mutate .section .level3}"
#> [364] "### Add new columns with `mutate()`"
#> [365] ""
#> [366] "Besides selecting sets of existing columns, it's often useful to add new"
#> [367] "columns that are functions of existing columns. This is the job of"
#> [368] "`mutate()`:"
#> [369] ""
#> [370] "::: {#cb13 .sourceCode}"
#> [371] "``` {.sourceCode .r}"
#> [372] "starwars %>% mutate(height_m = height / 100)"
#> [373] "#> # A tibble: 87 × 15"
#> [374] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [375] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [376] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [377] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [378] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [379] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [380] "#> # ℹ 83 more rows"
#> [381] "#> # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [382] "#> # vehicles <list>, starships <list>, height_m <dbl>"
#> [383] "```"
#> [384] ":::"
#> [385] ""
#> [386] "We can't see the height in meters we just calculated, but we can fix"
#> [387] "that using a select command."
#> [388] ""
#> [389] "::: {#cb14 .sourceCode}"
#> [390] "``` {.sourceCode .r}"
#> [391] "starwars %>%"
#> [392] " mutate(height_m = height / 100) %>%"
#> [393] " select(height_m, height, everything())"
#> [394] "#> # A tibble: 87 × 15"
#> [395] "#> height_m height name mass hair_color skin_color eye_color birth_year sex "
#> [396] "#> <dbl> <int> <chr> <dbl> <chr> <chr> <chr> <dbl> <chr>"
#> [397] "#> 1 1.72 172 Luke S… 77 blond fair blue 19 male "
#> [398] "#> 2 1.67 167 C-3PO 75 <NA> gold yellow 112 none "
#> [399] "#> 3 0.96 96 R2-D2 32 <NA> white, bl… red 33 none "
#> [400] "#> 4 2.02 202 Darth … 136 none white yellow 41.9 male "
#> [401] "#> # ℹ 83 more rows"
#> [402] "#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,"
#> [403] "#> # films <list>, vehicles <list>, starships <list>"
#> [404] "```"
#> [405] ":::"
#> [406] ""
#> [407] "`dplyr::mutate()` is similar to the base `transform()`, but allows you"
#> [408] "to refer to columns that you've just created:"
#> [409] ""
#> [410] "::: {#cb15 .sourceCode}"
#> [411] "``` {.sourceCode .r}"
#> [412] "starwars %>%"
#> [413] " mutate("
#> [414] " height_m = height / 100,"
#> [415] " BMI = mass / (height_m^2)"
#> [416] " ) %>%"
#> [417] " select(BMI, everything())"
#> [418] "#> # A tibble: 87 × 16"
#> [419] "#> BMI name height mass hair_color skin_color eye_color birth_year sex "
#> [420] "#> <dbl> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>"
#> [421] "#> 1 26.0 Luke Skyw… 172 77 blond fair blue 19 male "
#> [422] "#> 2 26.9 C-3PO 167 75 <NA> gold yellow 112 none "
#> [423] "#> 3 34.7 R2-D2 96 32 <NA> white, bl… red 33 none "
#> [424] "#> 4 33.3 Darth Vad… 202 136 none white yellow 41.9 male "
#> [425] "#> # ℹ 83 more rows"
#> [426] "#> # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,"
#> [427] "#> # films <list>, vehicles <list>, starships <list>, height_m <dbl>"
#> [428] "```"
#> [429] ":::"
#> [430] ""
#> [431] "If you only want to keep the new variables, use `.keep = \"none\"`:"
#> [432] ""
#> [433] "::: {#cb16 .sourceCode}"
#> [434] "``` {.sourceCode .r}"
#> [435] "starwars %>%"
#> [436] " mutate("
#> [437] " height_m = height / 100,"
#> [438] " BMI = mass / (height_m^2),"
#> [439] " .keep = \"none\""
#> [440] " )"
#> [441] "#> # A tibble: 87 × 2"
#> [442] "#> height_m BMI"
#> [443] "#> <dbl> <dbl>"
#> [444] "#> 1 1.72 26.0"
#> [445] "#> 2 1.67 26.9"
#> [446] "#> 3 0.96 34.7"
#> [447] "#> 4 2.02 33.3"
#> [448] "#> # ℹ 83 more rows"
#> [449] "```"
#> [450] ":::"
#> [451] ":::"
#> [452] ""
#> [453] "::: {#change-column-order-with-relocate .section .level3}"
#> [454] "### Change column order with `relocate()`"
#> [455] ""
#> [456] "Use a similar syntax as `select()` to move blocks of columns at once"
#> [457] ""
#> [458] "::: {#cb17 .sourceCode}"
#> [459] "``` {.sourceCode .r}"
#> [460] "starwars %>% relocate(sex:homeworld, .before = height)"
#> [461] "#> # A tibble: 87 × 14"
#> [462] "#> name sex gender homeworld height mass hair_color skin_color eye_color"
#> [463] "#> <chr> <chr> <chr> <chr> <int> <dbl> <chr> <chr> <chr> "
#> [464] "#> 1 Luke Skyw… male mascu… Tatooine 172 77 blond fair blue "
#> [465] "#> 2 C-3PO none mascu… Tatooine 167 75 <NA> gold yellow "
#> [466] "#> 3 R2-D2 none mascu… Naboo 96 32 <NA> white, bl… red "
#> [467] "#> 4 Darth Vad… male mascu… Tatooine 202 136 none white yellow "
#> [468] "#> # ℹ 83 more rows"
#> [469] "#> # ℹ 5 more variables: birth_year <dbl>, species <chr>, films <list>,"
#> [470] "#> # vehicles <list>, starships <list>"
#> [471] "```"
#> [472] ":::"
#> [473] ":::"
#> [474] ""
#> [475] "::: {#summarise-values-with-summarise .section .level3}"
#> [476] "### Summarise values with `summarise()`"
#> [477] ""
#> [478] "The last verb is `summarise()`. It collapses a data frame to a single"
#> [479] "row."
#> [480] ""
#> [481] "::: {#cb18 .sourceCode}"
#> [482] "``` {.sourceCode .r}"
#> [483] "starwars %>% summarise(height = mean(height, na.rm = TRUE))"
#> [484] "#> # A tibble: 1 × 1"
#> [485] "#> height"
#> [486] "#> <dbl>"
#> [487] "#> 1 175."
#> [488] "```"
#> [489] ":::"
#> [490] ""
#> [491] "It's not that useful until we learn the `group_by()` verb below."
#> [492] ":::"
#> [493] ""
#> [494] "::: {#commonalities .section .level3}"
#> [495] "### Commonalities"
#> [496] ""
#> [497] "You may have noticed that the syntax and function of all these verbs are"
#> [498] "very similar:"
#> [499] ""
#> [500] "- The first argument is a data frame."
#> [501] ""
#> [502] "- The subsequent arguments describe what to do with the data frame."
#> [503] " You can refer to columns in the data frame directly without using"
#> [504] " `$`."
#> [505] ""
#> [506] "- The result is a new data frame."
#> [507] ""
#> [508] "Together these properties make it easy to chain together multiple simple"
#> [509] "steps to achieve a complex result."
#> [510] ""
#> [511] "These five functions provide the basis of a language of data"
#> [512] "manipulation. At the most basic level, you can only alter a tidy data"
#> [513] "frame in five useful ways: you can reorder the rows (`arrange()`), pick"
#> [514] "observations and variables of interest (`filter()` and `select()`), add"
#> [515] "new variables that are functions of existing variables (`mutate()`), or"
#> [516] "collapse many values to a summary (`summarise()`)."
#> [517] ":::"
#> [518] ":::"
#> [519] ""
#> [520] "::: {#combining-functions-with .section .level2}"
#> [521] "## Combining functions with `%>%`"
#> [522] ""
#> [523] "The dplyr API is functional in the sense that function calls don't have"
#> [524] "side-effects. You must always save their results. This doesn't lead to"
#> [525] "particularly elegant code, especially if you want to do many operations"
#> [526] "at once. You either have to do it step-by-step:"
#> [527] ""
#> [528] "::: {#cb19 .sourceCode}"
#> [529] "``` {.sourceCode .r}"
#> [530] "a1 <- group_by(starwars, species, sex)"
#> [531] "a2 <- select(a1, height, mass)"
#> [532] "a3 <- summarise(a2,"
#> [533] " height = mean(height, na.rm = TRUE),"
#> [534] " mass = mean(mass, na.rm = TRUE)"
#> [535] ")"
#> [536] "```"
#> [537] ":::"
#> [538] ""
#> [539] "Or if you don't want to name the intermediate results, you need to wrap"
#> [540] "the function calls inside each other:"
#> [541] ""
#> [542] "::: {#cb20 .sourceCode}"
#> [543] "``` {.sourceCode .r}"
#> [544] "summarise("
#> [545] " select("
#> [546] " group_by(starwars, species, sex),"
#> [547] " height, mass"
#> [548] " ),"
#> [549] " height = mean(height, na.rm = TRUE),"
#> [550] " mass = mean(mass, na.rm = TRUE)"
#> [551] ")"
#> [552] "#> Adding missing grouping variables: `species`, `sex`"
#> [553] "#> `summarise()` has grouped output by 'species'. You can override using the"
#> [554] "#> `.groups` argument."
#> [555] "#> # A tibble: 41 × 4"
#> [556] "#> # Groups: species [38]"
#> [557] "#> species sex height mass"
#> [558] "#> <chr> <chr> <dbl> <dbl>"
#> [559] "#> 1 Aleena male 79 15"
#> [560] "#> 2 Besalisk male 198 102"
#> [561] "#> 3 Cerean male 198 82"
#> [562] "#> 4 Chagrian male 196 NaN"
#> [563] "#> # ℹ 37 more rows"
#> [564] "```"
#> [565] ":::"
#> [566] ""
#> [567] "This is difficult to read because the order of the operations is from"
#> [568] "inside to out. Thus, the arguments are a long way away from the"
#> [569] "function. To get around this problem, dplyr provides the `%>%` operator"
#> [570] "from magrittr. `x %>% f(y)` turns into `f(x, y)` so you can use it to"
#> [571] "rewrite multiple operations so that they read left-to-right,"
#> [572] "top-to-bottom (reading the pipe operator as \"then\"):"
#> [573] ""
#> [574] "::: {#cb21 .sourceCode}"
#> [575] "``` {.sourceCode .r}"
#> [576] "starwars %>%"
#> [577] " group_by(species, sex) %>%"
#> [578] " select(height, mass) %>%"
#> [579] " summarise("
#> [580] " height = mean(height, na.rm = TRUE),"
#> [581] " mass = mean(mass, na.rm = TRUE)"
#> [582] " )"
#> [583] "```"
#> [584] ":::"
#> [585] ":::"
#> [586] ""
#> [587] "::: {#patterns-of-operations .section .level2}"
#> [588] "## Patterns of operations"
#> [589] ""
#> [590] "The dplyr verbs can be classified by the type of operations they"
#> [591] "accomplish (we sometimes speak of their **semantics**, i.e., their"
#> [592] "meaning). It's helpful to have a good grasp of the difference between"
#> [593] "select and mutate operations."
#> [594] ""
#> [595] "::: {#selecting-operations .section .level3}"
#> [596] "### Selecting operations"
#> [597] ""
#> [598] "One of the appealing features of dplyr is that you can refer to columns"
#> [599] "from the tibble as if they were regular variables. However, the"
#> [600] "syntactic uniformity of referring to bare column names hides semantic"
#> [601] "differences across the verbs. A column symbol supplied to `select()`"
#> [602] "does not have the same meaning as the same symbol supplied to"
#> [603] "`mutate()`."
#> [604] ""
#> [605] "Selecting operations expect column names and positions. Hence, when you"
#> [606] "call `select()` with bare variable names, they actually represent their"
#> [607] "own positions in the tibble. The following calls are completely"
#> [608] "equivalent from dplyr's point of view:"
#> [609] ""
#> [610] "::: {#cb22 .sourceCode}"
#> [611] "``` {.sourceCode .r}"
#> [612] "# `name` represents the integer 1"
#> [613] "select(starwars, name)"
#> [614] "#> # A tibble: 87 × 1"
#> [615] "#> name "
#> [616] "#> <chr> "
#> [617] "#> 1 Luke Skywalker"
#> [618] "#> 2 C-3PO "
#> [619] "#> 3 R2-D2 "
#> [620] "#> 4 Darth Vader "
#> [621] "#> # ℹ 83 more rows"
#> [622] "select(starwars, 1)"
#> [623] "#> # A tibble: 87 × 1"
#> [624] "#> name "
#> [625] "#> <chr> "
#> [626] "#> 1 Luke Skywalker"
#> [627] "#> 2 C-3PO "
#> [628] "#> 3 R2-D2 "
#> [629] "#> 4 Darth Vader "
#> [630] "#> # ℹ 83 more rows"
#> [631] "```"
#> [632] ":::"
#> [633] ""
#> [634] "By the same token, this means that you cannot refer to variables from"
#> [635] "the surrounding context if they have the same name as one of the"
#> [636] "columns. In the following example, `height` still represents 2, not 5:"
#> [637] ""
#> [638] "::: {#cb23 .sourceCode}"
#> [639] "``` {.sourceCode .r}"
#> [640] "height <- 5"
#> [641] "select(starwars, height)"
#> [642] "#> # A tibble: 87 × 1"
#> [643] "#> height"
#> [644] "#> <int>"
#> [645] "#> 1 172"
#> [646] "#> 2 167"
#> [647] "#> 3 96"
#> [648] "#> 4 202"
#> [649] "#> # ℹ 83 more rows"
#> [650] "```"
#> [651] ":::"
#> [652] ""
#> [653] "One useful subtlety is that this only applies to bare names and to"
#> [654] "selecting calls like `c(height, mass)` or `height:mass`. In all other"
#> [655] "cases, the columns of the data frame are not put in scope. This allows"
#> [656] "you to refer to contextual variables in selection helpers:"
#> [657] ""
#> [658] "::: {#cb24 .sourceCode}"
#> [659] "``` {.sourceCode .r}"
#> [660] "name <- \"color\""
#> [661] "select(starwars, ends_with(name))"
#> [662] "#> # A tibble: 87 × 3"
#> [663] "#> hair_color skin_color eye_color"
#> [664] "#> <chr> <chr> <chr> "
#> [665] "#> 1 blond fair blue "
#> [666] "#> 2 <NA> gold yellow "
#> [667] "#> 3 <NA> white, blue red "
#> [668] "#> 4 none white yellow "
#> [669] "#> # ℹ 83 more rows"
#> [670] "```"
#> [671] ":::"
#> [672] ""
#> [673] "These semantics are usually intuitive. But note the subtle difference:"
#> [674] ""
#> [675] "::: {#cb25 .sourceCode}"
#> [676] "``` {.sourceCode .r}"
#> [677] "name <- 5"
#> [678] "select(starwars, name, identity(name))"
#> [679] "#> # A tibble: 87 × 2"
#> [680] "#> name skin_color "
#> [681] "#> <chr> <chr> "
#> [682] "#> 1 Luke Skywalker fair "
#> [683] "#> 2 C-3PO gold "
#> [684] "#> 3 R2-D2 white, blue"
#> [685] "#> 4 Darth Vader white "
#> [686] "#> # ℹ 83 more rows"
#> [687] "```"
#> [688] ":::"
#> [689] ""
#> [690] "In the first argument, `name` represents its own position `1`. In the"
#> [691] "second argument, `name` is evaluated in the surrounding context and"
#> [692] "represents the fifth column."
#> [693] ""
#> [694] "For a long time, `select()` understood only column positions. Since"
#> [695] "dplyr 0.6, it also understands column names. This"
#> [696] "makes it a bit easier to program with `select()`:"
#> [697] ""
#> [698] "::: {#cb26 .sourceCode}"
#> [699] "``` {.sourceCode .r}"
#> [700] "vars <- c(\"name\", \"height\")"
#> [701] "select(starwars, all_of(vars), \"mass\")"
#> [702] "#> # A tibble: 87 × 3"
#> [703] "#> name height mass"
#> [704] "#> <chr> <int> <dbl>"
#> [705] "#> 1 Luke Skywalker 172 77"
#> [706] "#> 2 C-3PO 167 75"
#> [707] "#> 3 R2-D2 96 32"
#> [708] "#> 4 Darth Vader 202 136"
#> [709] "#> # ℹ 83 more rows"
#> [710] "```"
#> [711] ":::"
#> [712] ":::"
#> [713] ""
#> [714] "::: {#mutating-operations .section .level3}"
#> [715] "### Mutating operations"
#> [716] ""
#> [717] "Mutate semantics are quite different from selection semantics. Whereas"
#> [718] "`select()` expects column names or positions, `mutate()` expects *column"
#> [719] "vectors*. We will set up a smaller tibble to use for our examples."
#> [720] ""
#> [721] "::: {#cb27 .sourceCode}"
#> [722] "``` {.sourceCode .r}"
#> [723] "df <- starwars %>% select(name, height, mass)"
#> [724] "```"
#> [725] ":::"
#> [726] ""
#> [727] "When we use `select()`, the bare column names stand for their own"
#> [728] "positions in the tibble. For `mutate()` on the other hand, column"
#> [729] "symbols represent the actual column vectors stored in the tibble."
#> [730] "Consider what happens if we give a string or a number to `mutate()`:"
#> [731] ""
#> [732] "::: {#cb28 .sourceCode}"
#> [733] "``` {.sourceCode .r}"
#> [734] "mutate(df, \"height\", 2)"
#> [735] "#> # A tibble: 87 × 5"
#> [736] "#> name height mass `\"height\"` `2`"
#> [737] "#> <chr> <int> <dbl> <chr> <dbl>"
#> [738] "#> 1 Luke Skywalker 172 77 height 2"
#> [739] "#> 2 C-3PO 167 75 height 2"
#> [740] "#> 3 R2-D2 96 32 height 2"
#> [741] "#> 4 Darth Vader 202 136 height 2"
#> [742] "#> # ℹ 83 more rows"
#> [743] "```"
#> [744] ":::"
#> [745] ""
#> [746] "`mutate()` gets length-1 vectors that it interprets as new columns in"
#> [747] "the data frame. These vectors are recycled so they match the number of"
#> [748] "rows. That's why it doesn't make sense to supply expressions like"
#> [749] "`\"height\" + 10` to `mutate()`. This amounts to adding 10 to a string!"
#> [750] "The correct expression is:"
#> [751] ""
#> [752] "::: {#cb29 .sourceCode}"
#> [753] "``` {.sourceCode .r}"
#> [754] "mutate(df, height + 10)"
#> [755] "#> # A tibble: 87 × 4"
#> [756] "#> name height mass `height + 10`"
#> [757] "#> <chr> <int> <dbl> <dbl>"
#> [758] "#> 1 Luke Skywalker 172 77 182"
#> [759] "#> 2 C-3PO 167 75 177"
#> [760] "#> 3 R2-D2 96 32 106"
#> [761] "#> 4 Darth Vader 202 136 212"
#> [762] "#> # ℹ 83 more rows"
#> [763] "```"
#> [764] ":::"
#> [765] ""
#> [766] "In the same way, you can use values from the context if these values"
#> [767] "represent a valid column. They must be either length 1 (they then get"
#> [768] "recycled) or have the same length as the number of rows. In the"
#> [769] "following example we create a new vector that we add to the data frame:"
#> [770] ""
#> [771] "::: {#cb30 .sourceCode}"
#> [772] "``` {.sourceCode .r}"
#> [773] "var <- seq(1, nrow(df))"
#> [774] "mutate(df, new = var)"
#> [775] "#> # A tibble: 87 × 4"
#> [776] "#> name height mass new"
#> [777] "#> <chr> <int> <dbl> <int>"
#> [778] "#> 1 Luke Skywalker 172 77 1"
#> [779] "#> 2 C-3PO 167 75 2"
#> [780] "#> 3 R2-D2 96 32 3"
#> [781] "#> 4 Darth Vader 202 136 4"
#> [782] "#> # ℹ 83 more rows"
#> [783] "```"
#> [784] ":::"
#> [785] ""
#> [786] "A case in point is `group_by()`. While you might think it has select"
#> [787] "semantics, it actually has mutate semantics. This is quite handy as it"
#> [788] "allows you to group by a modified column:"
#> [789] ""
#> [790] "::: {#cb31 .sourceCode}"
#> [791] "``` {.sourceCode .r}"
#> [792] "group_by(starwars, sex)"
#> [793] "#> # A tibble: 87 × 14"
#> [794] "#> # Groups: sex [5]"
#> [795] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [796] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [797] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [798] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [799] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [800] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [801] "#> # ℹ 83 more rows"
#> [802] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [803] "#> # vehicles <list>, starships <list>"
#> [804] "group_by(starwars, sex = as.factor(sex))"
#> [805] "#> # A tibble: 87 × 14"
#> [806] "#> # Groups: sex [5]"
#> [807] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [808] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <fct> <chr> "
#> [809] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [810] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [811] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [812] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [813] "#> # ℹ 83 more rows"
#> [814] "#> # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [815] "#> # vehicles <list>, starships <list>"
#> [816] "group_by(starwars, height_binned = cut(height, 3))"
#> [817] "#> # A tibble: 87 × 15"
#> [818] "#> # Groups: height_binned [4]"
#> [819] "#> name height mass hair_color skin_color eye_color birth_year sex gender"
#> [820] "#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> "
#> [821] "#> 1 Luke Sky… 172 77 blond fair blue 19 male mascu…"
#> [822] "#> 2 C-3PO 167 75 <NA> gold yellow 112 none mascu…"
#> [823] "#> 3 R2-D2 96 32 <NA> white, bl… red 33 none mascu…"
#> [824] "#> 4 Darth Va… 202 136 none white yellow 41.9 male mascu…"
#> [825] "#> # ℹ 83 more rows"
#> [826] "#> # ℹ 6 more variables: homeworld <chr>, species <chr>, films <list>,"
#> [827] "#> # vehicles <list>, starships <list>, height_binned <fct>"
#> [828] "```"
#> [829] ":::"
#> [830] ""
#> [831] "This is why you can't supply a column name as a string to `group_by()`."
#> [832] "Doing so creates a new column containing the string, recycled to the"
#> [833] "number of rows:"
#> [834] ""
#> [835] "::: {#cb32 .sourceCode}"
#> [836] "``` {.sourceCode .r}"
#> [837] "group_by(df, \"month\")"
#> [838] "#> # A tibble: 87 × 4"
#> [839] "#> # Groups: \"month\" [1]"
#> [840] "#> name height mass `\"month\"`"
#> [841] "#> <chr> <int> <dbl> <chr> "
#> [842] "#> 1 Luke Skywalker 172 77 month "
#> [843] "#> 2 C-3PO 167 75 month "
#> [844] "#> 3 R2-D2 96 32 month "
#> [845] "#> 4 Darth Vader 202 136 month "
#> [846] "#> # ℹ 83 more rows"
#> [847] "```"
#> [848] ":::"
#> [849] ":::"
#> [850] ":::"
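# Grouping by a column name held in a string ----
# A minimal sketch, assuming dplyr >= 1.0.0 is installed. The vignette text
# printed above notes that `group_by()` has mutate semantics, so a bare string
# becomes a constant column. Wrapping the string in `across(all_of())` asks
# for a selection instead. `df`, `col` and `by_sex` are illustrative names
# and are not part of btw.
library(dplyr)

df <- starwars %>% select(name, height, mass, sex)
col <- "sex" # column name supplied as a string

# Select the grouping column by name rather than creating a new column
by_sex <- group_by(df, across(all_of(col)))
group_vars(by_sex)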
# Files ----
btw_this("./") # list files in the current working directory
#> [1] "| path | type | size | modification_time |\n|------|------|------|-------------------|\n| btw-package.html | file | 6.2K | 2025-03-10 17:34:39 |\n| btw.html | file | 13.52K | 2025-03-10 17:34:39 |\n| btw_register_tools.html | file | 10.55K | 2025-03-10 17:34:39 |\n| index.html | file | 7.02K | 2025-03-10 17:34:38 |"