Activities for Voyant

[Figure: voyant-austen-numbered.png]

  1. Cirrus - What do we learn about the text from this Word Cloud?

Let’s spend some time exploring:

  • What happens when you hover your mouse over different parts of the Cirrus widget?
  • What happens when you click on a word?
  • What hidden buttons can you find?
  • Find the “Options” button and edit or view the list of “Stopwords.” Does this change your results?
  2. Trends: a line graph showing how word frequencies change across the text
  • How are words in the “trends” widget chosen for inclusion?
  • What kind of information do the trend lines and the x- and y-axes convey?
  • What does the graph tell us about the story?
  • Bonus Question: What is the difference between relative and raw frequencies and how do you find that information?
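
For the bonus question, the distinction can be sketched in a few lines of Python. This is only an illustration using the standard library, not how Voyant computes its figures: the `frequencies` helper and both sample texts are made up for this example, and Voyant's exact normalization may differ. A raw frequency is a plain count, while a relative frequency divides that count by the total number of words, so texts of different lengths can be compared.

```python
import re
from collections import Counter


def frequencies(text):
    """Return (raw counts, relative frequencies) for every word in text.

    Hypothetical helper for illustration; not part of Voyant or NLTK.
    """
    # Lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    raw = Counter(words)
    total = len(words)
    # Relative frequency = raw count divided by the total word count.
    relative = {word: count / total for word, count in raw.items()}
    return raw, relative


# Two made-up texts of different lengths that mention "emma" equally often.
short_text = "Emma smiled. Emma laughed."
long_text = ("Emma smiled at the idea, and Emma laughed "
             "when the others had gone home.")

raw_short, rel_short = frequencies(short_text)
raw_long, rel_long = frequencies(long_text)

print("raw:     ", raw_short["emma"], raw_long["emma"])
print("relative:", round(rel_short["emma"], 3), round(rel_long["emma"], 3))
```

The raw counts match, but the word makes up a much larger share of the shorter text, which is the kind of difference a relative-frequency view is meant to expose.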

Activities for NLTK

  1. Select any paragraph of text and:
     a. tokenize by sentence
     b. tokenize by word
     c. identify all stop words

  2. Plot a frequency distribution of the words in your selected text. Change the number of words included in the distribution.

  3. Explore the Reuters corpus. Identify the five most frequently occurring categories.
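
As a warm-up for activities 1 and 2, here is a rough sketch of what the steps do, written with only the Python standard library so the mechanics are visible. It is deliberately naive and is not NLTK's implementation: the splitting rules, the tiny `STOP_WORDS` set, the sample paragraph, and the helper names are all assumptions made for this example. In the activities themselves you would use NLTK's `sent_tokenize`, `word_tokenize`, `stopwords.words('english')`, and `FreqDist`, which handle abbreviations, punctuation, and a much longer stop word list.

```python
import re
from collections import Counter

# Hypothetical, very short stop word list; NLTK's English list is far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def split_sentences(text):
    """Naive sentence split: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Naive word split: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def show_top(freq, n):
    """Print the n most common words as a simple text bar chart."""
    for word, count in freq.most_common(n):
        print(f"{word:<10} {'#' * count} ({count})")


paragraph = ("The cat sat in the sun. It was a warm day! "
             "The cat slept, and the dog watched the cat.")

sentences = split_sentences(paragraph)               # activity 1a
words = split_words(paragraph)                       # activity 1b
stop_hits = [w for w in words if w in STOP_WORDS]    # activity 1c
content = [w for w in words if w not in STOP_WORDS]

# Activity 2: count the remaining words, then vary how many are displayed.
freq = Counter(content)
show_top(freq, 3)
show_top(freq, 5)
```

Changing the second argument of `show_top` plays the same role as changing the number of words included in a plotted frequency distribution.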

