2 Manipulating Rows
| Task | Functions |
| --- | --- |
| Select rows | filter(), distinct(), slice() |
| Arrange rows | arrange() |
| Add rows | add_row(), bind_rows() |
Please copy the following code and paste it into a script in RStudio. We will walk through it with the visual explanations from the dplyr cheat sheet.
Input
library(tidyverse)
?mtcars
mtcars
2.1 Filter rows with filter()
Goal: Keep only cars with fuel efficiency above 20 mpg.
Input
over20 <- mtcars %>%
  filter(mpg > 20)
over20
What changed: You now have a smaller data frame that keeps only rows matching the condition.
Mini practice: Change the threshold to mpg > 25. How many rows remain?
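Conditions can also be combined. The sketch below is an illustrative extension of the example above, not part of the original walkthrough; the object names efficient_4cyl and light_or_efficient are made up for this example. Separating conditions with a comma requires all of them to hold, while | keeps rows that satisfy at least one.

```r
library(dplyr)

# Comma-separated conditions are combined with AND:
# keep cars that are above 20 mpg and have 4 cylinders
efficient_4cyl <- mtcars %>%
  filter(mpg > 20, cyl == 4)

# The | operator means OR:
# keep cars that are above 30 mpg or weigh less than 2 (wt is in 1000 lbs)
light_or_efficient <- mtcars %>%
  filter(mpg > 30 | wt < 2)

nrow(efficient_4cyl)
nrow(light_or_efficient)
```

Comparing the two row counts with nrow(over20) shows how each extra condition narrows or widens the selection.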
2.2 Keep unique rows with distinct()
Goal: List the unique values of a variable (here, gear).
Input
distinctmt <- mtcars %>%
  distinct(gear)
distinctmt
What changed: Duplicate gear values are removed, so each value appears once. Because only gear was named, the result contains just that one column.
Mini practice: Try distinct(cyl) and compare the result with distinct(gear).
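If the other columns should be kept, or if uniqueness should be judged on more than one variable, distinct() can handle both. The following is a minimal sketch using the same mtcars data; the object names are invented for illustration.

```r
library(dplyr)

# Only the gear column is returned
gears_only <- mtcars %>%
  distinct(gear)

# .keep_all = TRUE keeps every column, taking the first row found for each gear value
first_per_gear <- mtcars %>%
  distinct(gear, .keep_all = TRUE)

# Naming two variables returns each unique combination of gear and cyl
gear_cyl <- mtcars %>%
  distinct(gear, cyl)

gears_only
first_per_gear
gear_cyl
```

Note that with .keep_all = TRUE the values in the other columns come from whichever row appears first in the data, so the result depends on row order.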
2.3 Pick specific rows with slice() and helpers
Goal: Select rows by row position.
Input
slicemt <- mtcars %>%
  slice(10:15)
headmt <- mtcars %>%
  slice_head(n = 5)
tailmt <- mtcars %>%
  slice_tail(n = 5)
slicemt
headmt
tailmt
What changed: Rows are selected by position, not by a logical condition.
Mini practice: Use slice(1:3) and compare it with slice_head(n = 3).
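Two more helpers, slice_min() and slice_max(), select rows by the value of a variable rather than by position; the solutions to Practice 1 use them. The sketch below shows one way they might be applied to mtcars (the object names are illustrative only).

```r
library(dplyr)

# The 3 rows with the lowest mpg
lowest_mpg <- mtcars %>%
  slice_min(mpg, n = 3)

# The 3 rows with the highest mpg; with_ties = TRUE is the default,
# so more than 3 rows can come back when values tie at the cutoff
highest_mpg <- mtcars %>%
  slice_max(mpg, n = 3)

# Setting with_ties = FALSE returns exactly n rows
highest_mpg_exact <- mtcars %>%
  slice_max(mpg, n = 3, with_ties = FALSE)

lowest_mpg
highest_mpg
highest_mpg_exact
```

When n is omitted, both helpers default to n = 1 and return every row tied for the minimum or maximum.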
2.4 Reorder rows with arrange()
Goal: Sort rows by one variable.
Input
arrmt <- mtcars %>%
  arrange(mpg)
descmt <- mtcars %>%
  arrange(desc(mpg))
arrmt
descmt
What changed: The same rows are kept, but their order changes.
Mini practice: Sort by wt in ascending and descending order.
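arrange() also accepts several variables, using later ones to break ties in earlier ones. As a small sketch beyond the original example (by_cyl_mpg is a name chosen here for illustration):

```r
library(dplyr)

# Sort by cyl in ascending order, then by mpg in descending order within each cyl value
by_cyl_mpg <- mtcars %>%
  arrange(cyl, desc(mpg))

head(by_cyl_mpg)
```

The first rows are the 4-cylinder cars, listed from most to least fuel-efficient.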
2.5 Add rows with add_row() and bind_rows()
Goal: Add one row manually and combine multiple data frames by rows.
Input
# add one row to an existing restaurant menu
breakfast_menu <- tibble(
  item = c("Egg Sandwich", "Veggie Omelette"),
  category = c("Sandwich", "Omelette"),
  price = c(7.50, 8.25)
)
menu_updated <- breakfast_menu %>%
  add_row(item = "Mushroom Toast", category = "Toast", price = 6.75)
menu_updated
# combine two day-by-day order logs with overlapping columns
monday_orders <- tibble(
  order_id = c(101, 102),
  item = c("Egg Sandwich", "Veggie Omelette"),
  dine_in = c(TRUE, FALSE)
)
tuesday_orders <- tibble(
  order_id = c(201, 202),
  item = c("Mushroom Toast", "Egg Sandwich"),
  takeout = c(TRUE, TRUE)
)
bind_rows(monday_orders, tuesday_orders)
What changed: add_row() appends one row; bind_rows() stacks data frames and fills missing columns with NA.
Mini practice: Add one more menu item to menu_updated and set your own price.
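After stacking, it is often useful to know which data frame each row came from, or to insert a new row somewhere other than the end. The sketch below repeats the order logs so it runs on its own; the column name day, the object names, and the extra order (order_id 100) are assumptions made for illustration.

```r
library(dplyr)
library(tibble)

monday_orders <- tibble(
  order_id = c(101, 102),
  item = c("Egg Sandwich", "Veggie Omelette"),
  dine_in = c(TRUE, FALSE)
)
tuesday_orders <- tibble(
  order_id = c(201, 202),
  item = c("Mushroom Toast", "Egg Sandwich"),
  takeout = c(TRUE, TRUE)
)

# Naming the inputs and setting .id adds a column that records each row's source
all_orders <- bind_rows(monday = monday_orders, tuesday = tuesday_orders, .id = "day")
all_orders

# .before (or .after) controls where add_row() places the new row
monday_with_early_order <- monday_orders %>%
  add_row(order_id = 100, item = "Mushroom Toast", dine_in = TRUE, .before = 1)
monday_with_early_order
```

In all_orders, the dine_in column is NA for Tuesday's rows and takeout is NA for Monday's rows, because each log recorded only one of the two.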
Practice 1
iris is a data frame with 150 cases (rows) and 5 variables (columns), including Petal.Width and Species. In the iris data set, which species do the cases with the minimum and maximum petal width belong to?
Solutions
# solution 1
arrange(iris, Petal.Width)
# solution 2
slice_min(iris, Petal.Width)
slice_max(iris, Petal.Width)
# The cases with the minimum petal width (0.1) all belong to setosa.
# The cases with the maximum petal width (2.5) all belong to virginica.
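A possible third approach chains the functions from this chapter so that only the species names are printed. This is a sketch rather than an official solution, and min_species and max_species are names chosen here.

```r
library(dplyr)

# Species of the rows tied for the smallest petal width
min_species <- iris %>%
  slice_min(Petal.Width) %>%
  distinct(Species)

# Species of the rows tied for the largest petal width
max_species <- iris %>%
  slice_max(Petal.Width) %>%
  distinct(Species)

min_species
max_species
```

Because slice_min() and slice_max() keep ties by default, distinct(Species) would reveal it if the tied rows spanned more than one species.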