
List of practice problems

All of these problems are set up in the same way. You are given code to set up the problem, and then it’s up to you to work from there to get to the solution.

We also use these practice problems to introduce topics not covered in the “lecture” part of the workshop, so you will need to look up how to do most of these exercises, which is what you will have to do when using your own dataset.

The idea is not to solve each problem the same way the answer code does. What matters is getting the same result, which is why we provide answer code. There are many ways to reach both correct and wrong answers, so check the output carefully.
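Checking the output can itself be done in code. The sketch below is not part of the workshop material: the vectors `codes` and `expected` are made up for illustration. It uses base R's identical() to compare a result with the expected vector, and shows an attempt that runs without error yet gives the wrong answer, because an unescaped "." in a regular expression matches any character.

```r
## made-up example data (not from the workshop)
codes    <- c("a.b", "c.d")
expected <- c("a-b", "c-d")

## attempt 1: "." is a regex wildcard, so every character gets replaced
wrong <- gsub(".", "-", codes)
identical(wrong, expected)   # FALSE, wrong is "---" "---"

## attempt 2: fixed = TRUE treats "." as a literal dot
right <- gsub(".", "-", codes, fixed = TRUE)
identical(right, expected)   # TRUE
```

Printing the result is often enough for small vectors, but identical() scales to data that is too long to inspect by eye.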

I - Single letter

P1 - Replace all “-” with “a” (vector)

## starting vector
fruits <- c("one -pple", "two pe-rs", "three b-n-n-s")
Answer
library(stringr)  ## str_replace_all() comes from the stringr package
fruits = str_replace_all(fruits, "-", "a")

P2 - Replace “-” with “e” in the count column only

## get vector
fruit = c("appl-", "apricot", "b-ll p-pp-r")
count = c("on-", "two", "thr--")
## make dataframe
fruitcount = as.data.frame(cbind(fruit, count))
Answer
fruitcount$count = gsub("-", "e", fruitcount$count)

II - Groups of letters

P1 - Replace all vowels with “-”

This is an example from stringr; type ?str_replace to find it in the help tab.

## get vector
fruits <- c("one apple", "two pears", "three bananas")
Answer
## str_replace() would change only the first vowel in each string
fruits = str_replace_all(fruits, "[aeiou]", "-")
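The difference between replacing the first match and replacing every match is easy to miss. As a sketch, base R's sub() and gsub() show the same distinction as stringr's str_replace() and str_replace_all(); the names `first_only` and `all_vowels` are illustrative.

```r
fruits <- c("one apple", "two pears", "three bananas")

## sub() replaces only the first vowel in each string
first_only <- sub("[aeiou]", "-", fruits)
first_only   # "-ne apple" "tw- pears" "thr-e bananas"

## gsub() replaces every vowel in each string
all_vowels <- gsub("[aeiou]", "-", fruits)
all_vowels   # "-n- -ppl-" "tw- p--rs" "thr-- b-n-n-s"
```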

P2 - Capitalize all letters

## get vector
fruits <- c("one apple", "two pears", "three bananas")
Answer
## str_to_title() would capitalize only the first letter of each word
fruits = str_to_upper(fruits)

III - Combinations of words

P1 - Remove the word endings that differ between American and British English and between singular and plural

Your goal is for length(unique(words)) to output 2. Right now, it outputs 8.

## initial vector of words
words = c("color", "colour", "flavor", "flavour", "colors", "colours", "flavors", "flavours")

## how many unique words?
length(unique(words))
        # 8 words
Answer
words = str_replace(words, "o(u)*r(s)*", "")
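The pattern "o(u)*r(s)*" reads as: an "o", any number of "u", an "r", then any number of "s". It is not tied to the end of the word, so it removes the first place in the string where that sequence occurs. A stricter variant, sketched here in base R, makes "u" and "s" optional with "?" and anchors the match with "$". The word "orchard" is a made-up test case, not part of the exercise.

```r
words <- c("color", "colour", "flavor", "flavour",
           "colors", "colours", "flavors", "flavours")

## anchored pattern: optional "u", optional "s", only at the end of the word
stems <- sub("ou?rs?$", "", words)
unique(stems)           # "col" "flav"
length(unique(stems))   # 2

## why the anchor matters (made-up word, not in the exercise)
sub("o(u)*r(s)*", "", "orchard")   # "chard", the leading "or" is removed
sub("ou?rs?$", "", "orchard")      # "orchard", nothing at the end to remove
```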

IV - Excluding potential matches from being changed

P1 - Convert all letters to “-” except “e”

fruits <- c("one apple", "two pears", "three bananas")
Answer
## no comma inside the brackets: a comma there would itself be matched
fruits = str_replace_all(fruits, "[a-df-z]", "-")
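Another way to exclude characters is a negated character class, which starts with "^" inside the brackets and matches anything not listed. The sketch below, in base R, keeps "e" and the space and replaces everything else. Note the assumption: unlike a range of lowercase letters, a negated class would also replace digits, punctuation, and uppercase letters if the data contained any.

```r
fruits <- c("one apple", "two pears", "three bananas")

## replace every character that is neither "e" nor a space
negated <- gsub("[^e ]", "-", fruits)
negated   # "--e ----e" "--- -e---" "---ee -------"

## on this all-lowercase data it matches the explicit letter ranges
ranges <- gsub("[a-df-z]", "-", fruits)
identical(negated, ranges)   # TRUE
```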

V - Most complex problems

P1 - Herbarium inventory dataset

## values
dates = c("2000-03-08", "2001-3-15", "2002-03-21", "2003-March -12", "2004-mar-3", "2004-0 3-17")
collection_id = c("c1 ", "c2 ", "c3", "c4 ", "c5", "c6")
deposited = c("yes", "y", "yes", " yes", "Yes ", "no")
## make dataframe 
herbarium_inventory = as.data.frame(cbind(dates, collection_id, deposited))

Fix herbarium_inventory as follows:

  1. Remove all spaces.
  2. Convert all the variations of yes to 1 and all the no to 0.
  3. Convert all the variations of the month March so that it is written as March.

Answer

1. Remove all spaces

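## str_replace_all() takes (string, pattern, replacement); apply it to each column
herbarium_inventory[] <- lapply(herbarium_inventory, str_replace_all, pattern = " ", replacement = "")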

2. Convert the "deposited" column to 1 for "yes" and 0 for "no"

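## "(es)?" makes "es" optional so "y" matches; ignore.case = TRUE catches "Yes"
herbarium_inventory$deposited <- ifelse(grepl("^y(es)?$", herbarium_inventory$deposited, ignore.case = TRUE), 1, 0)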

3. Fix month

## match "03", "3", "mar" or "March" between the two hyphens
herbarium_inventory$dates <- gsub("-(0?3|mar(ch)?)-", "-March-", herbarium_inventory$dates, ignore.case = TRUE)
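Because each step depends on the one before it (the yes/no and month patterns assume the spaces are already gone), it helps to run the three steps in order and inspect the result. The following is a sketch in base R only, so it needs no packages; `inv` is an illustrative name for a copy of the data frame.

```r
dates <- c("2000-03-08", "2001-3-15", "2002-03-21",
           "2003-March -12", "2004-mar-3", "2004-0 3-17")
collection_id <- c("c1 ", "c2 ", "c3", "c4 ", "c5", "c6")
deposited <- c("yes", "y", "yes", " yes", "Yes ", "no")
inv <- data.frame(dates, collection_id, deposited, stringsAsFactors = FALSE)

## 1. remove spaces in every column; inv[] keeps the data frame structure
inv[] <- lapply(inv, gsub, pattern = " ", replacement = "", fixed = TRUE)

## 2. "y", "yes" and "Yes" become 1, everything else becomes 0
inv$deposited <- ifelse(grepl("^y(es)?$", inv$deposited, ignore.case = TRUE), 1, 0)

## 3. any spelling of the third month between hyphens becomes "March"
inv$dates <- gsub("-(0?3|mar(ch)?)-", "-March-", inv$dates, ignore.case = TRUE)

inv$dates       # "2000-March-08" "2001-March-15" "2002-March-21"
                # "2003-March-12" "2004-March-3"  "2004-March-17"
inv$deposited   # 1 1 1 1 1 0
```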

P2 - Set up the data for the start of problem I-P2

Start with a clean dataset and change every “e” to “-”

## vectors
fruit = c("apple", "apricot", "bell pepper")
count = c("one", "two", "three")
## make dataframe 
fruitcount = as.data.frame(cbind(fruit, count))
Answer
fruitcount = data.frame(lapply(fruitcount, gsub, pattern = "e", replacement = "-", fixed = TRUE))

P3 - Edit URLs

## starting url list
url = c("https://www2.gov.bc.ca/gov/content/home", 
        "https://www.canada.ca/en.html", 
        "https://yukon.ca/", 
        "https://www.alberta.ca/")

Your target is this list: “gov.bc”, “canada”, “yukon”, “alberta”

Answer

1. Remove the protocol at the very start of the URLs

url.short = gsub("https://", "", url)

2. Remove the www. or www2. prefix, which is not of interest right now

url.short = gsub("(www|www2)(\\.)", "", url.short)

3. Remove .ca and everything after it

url.short = gsub("\\.ca.*", "", url.short)
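The same cleanup can be written with anchored patterns, so that each step only touches the part of the URL it is meant to. This is a sketch in base R, assuming every URL starts with https:// and uses a .ca domain, as in the list above; `host` and `short` are illustrative names.

```r
url <- c("https://www2.gov.bc.ca/gov/content/home",
         "https://www.canada.ca/en.html",
         "https://yukon.ca/",
         "https://www.alberta.ca/")

## "^" ties the match to the start; "(www2?\\.)?" makes the www prefix optional
host <- sub("^https://(www2?\\.)?", "", url)

## "$" ties the match to the end; "(/.*)?" covers any path after ".ca"
short <- sub("\\.ca(/.*)?$", "", host)
short   # "gov.bc" "canada" "yukon" "alberta"
```

sub() is enough here because each pattern can match at most once per URL.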