Activity: Running a script and reading results
This activity is adapted Zoë Wilkinson Saldaña, “Sentiment Analysis for Exploratory Data Analysis,” The Programming Historian 7 (2018), https://doi.org/10.46430/phen0079.
What is VADER?
VADER, or Valence Aware Dictionary for Sentiment Reasoning, is an open-source model used for sentiment analysis. It can detect both polarity (positive / negative) and intensity of a sentiment. It is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
If you have not had a chance to install Anaconda and set up your environment, you can follow along by making a copy of this Collab notebook.
Getting Started
For this activity we’ll be making a file.
Create a new file from the terminal using the built-in text editor nano.
Input
nano touch myscript.py
Open the file. Input
nano myscript.py
Copy over the following:
Input
import nltk
nltk.download('vader_lexicon')
nltk.download('punkt')
Enter ESC and select Y for yes when prompted. Your file is now saved in the location you were in when you created it. You can double check by listing the files in your current location:
Input
ls
Run the file: Input
python myscript.py
Once you’ve taken these steps you are ready to run Vader commands.
Explore VADER lexicon: https://www.kaggle.com/nltkdata/vader-lexicon {.note}
Using VADER
To use VADER we need to import the SentimentIntensityAnalyzer module into our workspace.
Input
from nltk.sentiment.vader import SentimentIntensityAnalyzer
Next, we initialise VADER so we can use it within our Python script Input
sid = SentimentIntensityAnalyzer()
The variable ‘text’ now contains the text we will analyse. We’ll start with a famous tweet from that Mars Phoneix rover when it found ice on Mars in 2008.
Input
message_text = '''Are you ready to celebrate? Well, get ready: We have ICE!!!!! Yes, ICE, *WATER ICE* on Mars! w00t!!! Best day ever!!'''
print(message_text)
Now that we have VADER initialised and a message text to work with we can call on the polarity_scores method. This will read the message, process it, and output a dictionary with negative, neutral, positive, and compound scores for the input text.
Input
scores = sid.polarity_scores(message_text)
Finally we can loop through the keys contained in scores (pos, neu, neg, and compound scores) and print the key-value pairs as well
Input
for key in sorted(scores):
print('{0}: {1}, '.format(key, scores[key]), end='')
Putting it all together
You can save all of these steps in your script file.
Open it in nano again.
Input
nano myscript.py
Copy paste the following:
Input
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
text = '''Are you ready to celebrate? Well, get ready: We have ICE!!!!! Yes, ICE, *WATER ICE* on Mars! w00t!!! Best day ever!!'''
print(text)
scores = sid.polarity_scores(text)
for key in sorted(scores):
print('{0}: {1}, '.format(key, scores[key]), end='')
Again enter ESC and select Y for yes when prompted. Your file is now saved in the location you were in when you created it.
Now we can try running the script!
Input
python myscript.py
Try changing the text by removing some of the exclamation marks and rerunning the script – what happens?
The output should give you a polarity score for the text in the terminal. You can also print the output to a text file.
Input
python myscript.py > output.txt
Next Steps
Let’s look at how we might use this with a larger block of text. For instance, a paragraph. Below is the same script but with sentence-level analysis.
To look at the sentences in a paragraph we need to tokenize the text and we do this by importing a new module.
from nltk import word_tokenize
We use the “english.pickle” function from punkt to give it a short name.
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
Our text is coming from NASA’s official statement on the Phoenix Rover finding ICE.
text = '''Laboratory tests aboard NASA's Phoenix Mars Lander have identified water in a soil sample. The lander's robotic arm delivered the sample Wednesday to an instrument that identifies vapors produced by the heating of samples.
"We have water," said William Boynton of the University of Arizona, lead scientist for the Thermal and Evolved-Gas Analyzer, or TEGA. "We've seen evidence for this water ice before in observations by the Mars Odyssey orbiter and in disappearing chunks observed by Phoenix last month, but this is the first time Martian water has been touched and tasted."
'''
We break up the text by using our new tokenizer function which uses “english.pickle” function from punkt to split text into sentences. It’s not perfect and sometimes gets confused by spacing.
sentences = tokenizer.tokenize(message_text)
Finally, we need to move through all of the sentences, not just one, and use a for loop to do so.
for sentence in sentences:
print(sentence)
scores = sid.polarity_scores(sentence)
for key in sorted(scores):
print('{0}: {1}, '.format(key, scores[key]), end='')
print()
Input
nano touch sentence.py
Copy over this new script.
Input
import nltk.data
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import sentiment
from nltk import word_tokenize
sid = SentimentIntensityAnalyzer()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
text = '''Laboratory tests aboard NASA's Phoenix Mars Lander have identified water in a soil sample. The lander's robotic arm delivered the sample Wednesday to an instrument that identifies vapors produced by the heating of samples.
"We have water," said William Boynton of the University of Arizona, lead scientist for the Thermal and Evolved-Gas Analyzer, or TEGA. "We've seen evidence for this water ice before in observations by the Mars Odyssey orbiter and in disappearing chunks observed by Phoenix last month, but this is the first time Martian water has been touched and tasted."
'''
sentences = tokenizer.tokenize(text)
for sentence in sentences:
print(sentence)
scores = sid.polarity_scores(sentence)
for key in sorted(scores):
print('{0}: {1}, '.format(key, scores[key]), end='')
print()
Again enter ESC and select Y for yes when prompted. Your file is now saved in the location you were in when you created it.
Run the script as before.
Input
python sentence.py