Activities for Voyant

voyant-austen-numbered.png

  1. Cirrus - What do we learn about the text from this Word Cloud?

Let’s spend some time exploring:

  • What happens when you hover your mouse over different parts of the Cirrus widget?
  • What happens when you click on a word?
  • What hidden buttons can you find?
  • Find the “Options” button and edit or view the list of “Stopwords.” Does this change your results?

  2. Trends - a line graph that shows how often selected words occur across the corpus or document

  • How are words in the “trends” widget chosen for inclusion?
  • What kind of information do the trend lines and the x- and y-axes convey?
  • What does the graph tell us about the story?
  • Bonus Question: What is the difference between relative and raw frequencies and how do you find that information?
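
As a starting point for the bonus question, here is a minimal sketch of the two measures in plain Python. The toy sentence and the helper name relative_frequency are made up for illustration, and Voyant may scale its relative values differently, so check the numbers you see in the tool itself.

```python
from collections import Counter

# Toy text standing in for a document; any string works.
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

# Raw frequency: how many times each word occurs.
raw_counts = Counter(tokens)
total_tokens = len(tokens)


def relative_frequency(word: str) -> float:
    """Share of all tokens in the text accounted for by `word`."""
    return raw_counts[word] / total_tokens


print(raw_counts["the"])          # raw frequency of "the"
print(relative_frequency("the"))  # the same count divided by the text length
```

Raw counts grow with the length of a text, so relative frequencies are the safer choice when you compare texts of different sizes.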

Activities for NLTK

  1. Select any paragraph of text and:
     a. tokenize by sentence
     b. tokenize by word
     c. identify all stop words

  2. Plot a frequency distribution of the words in your selected text. Change the number of words included in the distribution.

  3. Explore the Reuters corpus. Which five categories occur most frequently?