2 Manipulating Rows

Select rows: filter(), distinct(), slice()
Arrange rows: arrange()
Add rows: add_row(), bind_rows()

Copy the following code and paste it into a script in RStudio. We will walk through it alongside the visual explanations from the dplyr cheat sheet.

Input

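library(tidyverse)  # loads dplyr, tibble, and the pipe operator %>%
?mtcars             # open the help page for the built-in mtcars data set
mtcars              # print the data frame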

2.1 Filter rows with filter()

Goal: Keep only cars with fuel efficiency above 20 mpg.

Input

over20 <- mtcars %>%
  filter(mpg > 20)
over20

What changed: You now have a smaller data frame that keeps only rows matching the condition.
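filter() also accepts several conditions at once. As a minimal sketch (the object names efficient_4cyl and efficient_or_light are made up for illustration), conditions separated by commas must all hold, while | keeps rows that satisfy either one:

```r
library(tidyverse)

# Comma-separated conditions are combined with AND:
# keep cars above 20 mpg that also have 4 cylinders.
efficient_4cyl <- mtcars %>%
  filter(mpg > 20, cyl == 4)

# Use | for OR: keep cars above 20 mpg or lighter than 2,500 lb
# (wt is recorded in units of 1,000 lb).
efficient_or_light <- mtcars %>%
  filter(mpg > 20 | wt < 2.5)

efficient_4cyl
efficient_or_light
```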

Mini practice: Change the threshold to mpg > 25. How many rows remain?

2.2 Keep unique rows with distinct()

Goal: List the unique values of one variable (here, gear).

Input

distinctmt <- mtcars %>%
  distinct(gear)
distinctmt

What changed: Duplicate gear values are removed, so each value appears once. Only the gear column is returned; the other columns are dropped unless you add .keep_all = TRUE.
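To make the column behavior of distinct() concrete, here is a small sketch comparing the default with .keep_all = TRUE and with two variables. The object names are illustrative only:

```r
library(tidyverse)

# Default: only the column named in distinct() is returned.
gear_only <- mtcars %>%
  distinct(gear)

# .keep_all = TRUE keeps every column, using the first row
# found for each gear value.
gear_first_rows <- mtcars %>%
  distinct(gear, .keep_all = TRUE)

# Several columns: one row per unique cyl/gear combination.
cyl_gear_pairs <- mtcars %>%
  distinct(cyl, gear)

gear_only
gear_first_rows
cyl_gear_pairs
```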

Mini practice: Try distinct(cyl) and compare the result with distinct(gear).

2.3 Pick specific rows with slice() and helpers

Goal: Select rows by row position.

Input

slicemt <- mtcars %>%
  slice(10:15)
headmt <- mtcars %>%
  slice_head(n = 5)
tailmt <- mtcars %>%
  slice_tail(n = 5)

slicemt
headmt
tailmt

What changed: Rows are selected by position, not by a logical condition.
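Two more helpers, slice_min() and slice_max(), select rows by the rank of a variable's values rather than by position. The sketch below uses illustrative object names and shows how tied values are handled:

```r
library(tidyverse)

# The 3 highest-mpg cars. Ties are kept by default (with_ties = TRUE),
# so the result can contain more than 3 rows.
top_mpg <- mtcars %>%
  slice_max(mpg, n = 3)

# with_ties = FALSE returns exactly n rows.
top_mpg_exact <- mtcars %>%
  slice_max(mpg, n = 3, with_ties = FALSE)

# slice_min() works the same way for the smallest values.
lowest_mpg <- mtcars %>%
  slice_min(mpg, n = 1)

top_mpg
top_mpg_exact
lowest_mpg
```

Comparing nrow(top_mpg) with nrow(top_mpg_exact) shows whether any ties were kept.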

Mini practice: Use slice(1:3) and compare it with slice_head(n = 3).

2.4 Reorder rows with arrange()

Goal: Sort rows by one variable.

Input

arrmt <- mtcars %>%
  arrange(mpg)
descmt <- mtcars %>%
  arrange(desc(mpg))

arrmt
descmt

What changed: The same rows are kept, but their order changes. arrange() sorts in ascending order by default, and desc() reverses it.
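arrange() can also take more than one variable, with later variables breaking ties in earlier ones. A minimal sketch (the object name by_cyl_then_mpg is illustrative):

```r
library(tidyverse)

# Sort by cyl (ascending); within each cyl value, sort by mpg (descending).
by_cyl_then_mpg <- mtcars %>%
  arrange(cyl, desc(mpg))

by_cyl_then_mpg
```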

Mini practice: Sort by wt in ascending and descending order.

2.5 Add rows with add_row() and bind_rows()

Goal: Add one row manually and combine multiple data frames by rows.

Input

# add one row to an existing restaurant menu
breakfast_menu <- tibble(
  item = c("Egg Sandwich", "Veggie Omelette"),
  category = c("Sandwich", "Omelette"),
  price = c(7.50, 8.25)
)

menu_updated <- breakfast_menu %>%
  add_row(item = "Mushroom Toast", category = "Toast", price = 6.75)
menu_updated

# combine two daily order logs whose columns only partly overlap
monday_orders <- tibble(
  order_id = c(101, 102),
  item = c("Egg Sandwich", "Veggie Omelette"),
  dine_in = c(TRUE, FALSE)
)

tuesday_orders <- tibble(
  order_id = c(201, 202),
  item = c("Mushroom Toast", "Egg Sandwich"),
  takeout = c(TRUE, TRUE)
)

bind_rows(monday_orders, tuesday_orders)

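What changed: add_row() appends a row to an existing data frame; bind_rows() stacks data frames by matching column names and fills any column missing from one of them with NA (here, takeout for Monday and dine_in for Tuesday).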
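After stacking, it is often useful to know which data frame each row came from, or to insert a row somewhere other than the end. The sketch below repeats the two order logs so it runs on its own; the day column name, the object names, and order 100 are assumptions made up for illustration:

```r
library(tidyverse)

monday_orders <- tibble(
  order_id = c(101, 102),
  item = c("Egg Sandwich", "Veggie Omelette"),
  dine_in = c(TRUE, FALSE)
)

tuesday_orders <- tibble(
  order_id = c(201, 202),
  item = c("Mushroom Toast", "Egg Sandwich"),
  takeout = c(TRUE, TRUE)
)

# Name each input and pass .id to record which data frame a row came from.
all_orders <- bind_rows(
  monday = monday_orders,
  tuesday = tuesday_orders,
  .id = "day"
)
all_orders

# add_row() appends at the end by default; .before = 1 inserts at the top.
# Order 100 is invented for illustration.
monday_with_early_order <- monday_orders %>%
  add_row(order_id = 100, item = "Egg Sandwich", dine_in = TRUE, .before = 1)
monday_with_early_order
```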

Mini practice: Add one more menu item to menu_updated and set your own price.

Practice 1

iris is a data frame with 150 cases (rows) and 5 variables (columns), including Petal.Width and Species. In the iris data set, which species do the cases with the minimum and maximum petal width belong to?

Solutions

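# solution 1: sort by Petal.Width, then check the first and last rows
arrange(iris, Petal.Width)
# solution 2: return the rows with the smallest and largest Petal.Width (ties included)
slice_min(iris, Petal.Width)
slice_max(iris, Petal.Width)

# The cases with the minimum petal width belong to setosa.
# The cases with the maximum petal width belong to virginica.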

 
 

