Practical Problem

A Sample Python Program

We might skip this program depending on group needs

Fetching data from the UN and assessing the most discussed topics

Assume you have a function that is a black-box. You don’t need to understand it, just use it to get data from a web page. The function get_text_from takes a resource r and return the text from that resource as a list.

Input

import requests
from bs4 import BeautifulSoup
# https://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python
def get_text_from(r):
    page = requests.get(r)
    soup = BeautifulSoup(page.text)    
    for script in soup(["script", "style", "pre"]):
        script.extract()
    text = soup.get_text()

    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    txt = '\n'.join(chunk for chunk in chunks if chunk)

    return txt.split('\n')

For more details, we encourage you to register into the Introduction to Web Scraping Workshop

Input

def discusses_topic(speech, topic):
    cnt = 0
    for line in speech:
        words_in_sentence = line.split(' ')
        if any(word in topic for word in words_in_sentence):
            cnt += 1
    return cnt

We want to define some topics, e.g., climate change, economy, peace, etc. We will use the discusses_topic to create a dictionary with the topic and how often it appears in UN discussions (see references for list of statements).

Input

un_statements = [
  "https://www.un.org/sg/en/content/sg/speeches/2020-03-10/wmo-state-of-the-climate-2019-report-remarks",
  "https://www.un.org/sg/en/content/sg/speeches/2020-02-21/2020-session-of-special-committee-decolonization-remarks",
  "https://www.un.org/sg/en/content/sg/speeches/2020-02-16/remarks-sustainable-development-and-climate-change",
  "https://www.un.org/sg/en/content/sg/speeches/2020-02-08/remarks-the-blue-economy-event",
  "https://www.un.org/sg/en/content/sg/speeches/2020-03-31/remarks-launch-of-report-the-socio-economic-impacts-of-covid-19",
  "https://www.un.org/sg/en/content/sg/speeches/2020-03-26/remarks-g-20-virtual-summit-covid-19-pandemic",
  "https://www.un.org/sg/en/content/sg/speeches/2018-06-19/remarks-press-encounter-pm-solberg-norway",
  "https://www.un.org/sg/en/content/sg/speeches/2018-06-09/secretary-general-comments-44th-g7-summit",
  "https://www.un.org/sg/en/content/sg/speeches/2020-02-04/remarks-journalists-priorities-for-2020-and-work-of-organization",
  "https://www.un.org/sg/en/content/sg/speeches/2019-11-11/remarks-the-paris-peace-forum"
]

peace = ["peace", "war", "cease", "together", "heal", "truce", "reconciliation", "union", "treaty"]
environment = ["global", "environment", "climate", "change", "natural", "warming", "temperature", "resources"]
economy = ["economy", "business", "development", "employment", "progress", "job", "capital", "dollars"]

Input

cnt_peace = 0
cnt_environment = 0
cnt_economy = 0

for resource in un_statements:
  txt = get_text_from(resource)
  cnt_peace += discusses_topic(txt, peace)
  cnt_environment += discusses_topic(txt, environment)
  cnt_economy += discusses_topic(txt, economy)

data = dict(
  topic=["peace", "environment", "economy"],
  count=[cnt_peace, cnt_environment, cnt_economy]
)

Input

sns.barplot(x='topic', y='count', data=data)

Output will be similar to: