Activities for Voyant
- Cirrus - What do we learn about the text from this Word Cloud?
Let’s spend some time exploring:
- What happens when you hover your mouse over different parts of the Cirrus widget?
- What happens when you click on a word?
- What hidden buttons can you find?
- Find the “Options” button and edit or view the list of “Stopwords.” Does this change your results?
- Trends: a line graph showing how often selected words occur across the document or corpus
- How are words in the “trends” widget chosen for inclusion?
- What kind of information do the trend lines and the x- and y-axes convey?
- What does the graph tell us about the story?
- Bonus Question: What is the difference between relative and raw frequencies and how do you find that information?
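For the bonus question, it can help to see the two measures computed by hand. The sketch below is plain Python, not Voyant's own code. It assumes a very simple lowercase word tokenizer and defines relative frequency as a term's raw count divided by the total number of tokens. Tools often scale that ratio (for example, per million words); the `per` parameter is a hypothetical stand-in for that scaling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def raw_frequencies(tokens):
    """Raw frequency: how many times each term occurs."""
    return Counter(tokens)


def relative_frequencies(tokens, per=1):
    """Relative frequency: raw count divided by total tokens, scaled by `per`."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total * per for term, count in Counter(tokens).items()}


# Two made-up texts of different lengths.
short_text = "the cat saw the dog"
long_text = "the cat saw the dog and then the dog ran far away from the cat"

for label, text in [("short", short_text), ("long", long_text)]:
    tokens = tokenize(text)
    raw = raw_frequencies(tokens)["the"]
    rel = relative_frequencies(tokens)["the"]
    print(f"{label}: raw={raw}, relative={rel:.3f}, total tokens={len(tokens)}")
```

In this toy example "the" has the higher raw count in the longer text but the higher relative frequency in the shorter one, which is why relative frequencies are usually the safer choice when comparing texts of different lengths.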
Activities for NLTK
- Select any paragraph of text and (a) tokenize it by sentence, (b) tokenize it by word, and (c) identify all stop words.
- Plot a frequency distribution of the words in your selected text. Change the number of words included in the distribution.
- Explore the Reuters corpus. Identify the five most frequently occurring categories.
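One possible starting point for the Reuters activity, assuming "most frequently occurring" means the categories attached to the most documents. The helper `top_categories` is a hypothetical name, not part of NLTK. It relies only on the `categories()` and `fileids(categories=...)` methods that NLTK's categorized corpus readers provide, so it can be tried on any such corpus.

```python
from collections import Counter


def top_categories(corpus, n=5):
    """Return the n categories with the most documents as (category, count) pairs.

    `corpus` is assumed to be an NLTK categorized corpus reader, or any object
    with the same `categories()` and `fileids(categories=...)` methods.
    """
    counts = Counter()
    for category in corpus.categories():
        # Number of documents tagged with this category.
        counts[category] = len(corpus.fileids(categories=category))
    return counts.most_common(n)


# Usage with NLTK (needs the corpus data, e.g. via nltk.download("reuters")):
#   from nltk.corpus import reuters
#   for category, count in top_categories(reuters):
#       print(category, count)
```

Because a Reuters document can carry more than one category, the per-category counts can add up to more than the number of documents in the corpus.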