Introduction to Machine Learning - Classification and Clustering Models

https://ubc-library-rc.github.io/ml-classification-clustering

Land Acknowledgement

UBC Vancouver is located on the traditional, ancestral, and unceded territory of the xʷməθkʷəy̓əm (Musqueam) peoples.

Use the Zoom toolbar to engage

Participants window

CCDHHN Certificate
This workshop contributes towards the Canadian Certificate in Digital Humanities.

Learning Objectives

  • Understand classification and clustering models and their applications in machine learning
  • Train classification and clustering models on real-world datasets
  • Interpret and analyze classification and clustering model results

Pre-workshop setup

What is Machine Learning?

  • “Field of study that gives computers the capability to learn without being explicitly programmed” - Arthur Samuel

Exercise 1: Replace the question mark

X Y
0.1 A
0.4 A
4.3 B
4.2 B
3.2 ?
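
One way to answer Exercise 1 is to copy the label of the closest known point, which is the idea behind nearest-neighbour classification. The sketch below is a minimal illustration using only the numbers in the table; the function name predict_nearest is made up for this example, and in practice a library classifier would normally be used instead.

```python
# Toy data from Exercise 1: one feature X and a class label Y.
train = [(0.1, "A"), (0.4, "A"), (4.3, "B"), (4.2, "B")]


def predict_nearest(x, examples):
    """Return the label of the training example whose X is closest to x.

    predict_nearest is a hypothetical helper written for this sketch.
    """
    nearest_x, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
    return nearest_label


prediction = predict_nearest(3.2, train)
print(prediction)  # B (3.2 is 1.0 away from 4.2, but 2.8 away from 0.4)
```

Because the labels Y are given for the first four rows, this is a supervised (classification) task.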

Exercise 2: Group the points in two sets

X
11
10
21
22
9
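
Exercise 2 has no labels, so the grouping has to come from the values themselves. The sketch below is a small one-dimensional version of the k-means idea with two groups; the function name two_means and the choice to start from the smallest and largest value are assumptions made for this illustration, not part of the workshop material.

```python
# Values from Exercise 2.
points = [11, 10, 21, 22, 9]


def two_means(values, iterations=10):
    """A tiny 1-D k-means with k=2 (hypothetical helper for this sketch).

    Assumes the values contain at least two distinct numbers, so that
    neither group ends up empty.
    """
    # Start with one center at the smallest value and one at the largest.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each point to the closer of the two centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group.
        centers = [sum(g) / len(g) for g in groups]
    return groups, centers


groups, centers = two_means(points)
print(groups)   # [[11, 10, 9], [21, 22]]
print(centers)  # [10.0, 21.5]
```

This is an unsupervised (clustering) task: the algorithm is told only how many groups to find, not what they mean.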

Dataset Example

From colemanm.org

Building a Machine Learning Model

Types of Machine Learning

From javatpoint

Some other types:

  • Reinforcement Learning
  • Transfer Learning

Cake Analogy

From medium.com

Methods and Algorithms

Data Preparation

  • Types of Features (continuous/categorical)
  • Handling missing values
  • Feature scaling
  • Feature selection
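
Three of the steps above can be sketched in plain Python. The records, column meanings, and helper names below are invented for illustration; the sketch fills a missing value with the column mean, rescales a continuous feature to the range 0 to 1, and one-hot encodes a categorical feature. Feature selection is not shown.

```python
# Hypothetical toy columns: ages is continuous (one value missing),
# colours is categorical.
ages = [20.0, None, 40.0, 30.0]
colours = ["red", "blue", "red", "green"]


def impute_mean(values):
    """Replace missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Turn each category into a 0/1 vector, one position per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


filled = impute_mean(ages)
scaled = min_max_scale(filled)
encoded = one_hot(colours)

print(filled)   # [20.0, 30.0, 40.0, 30.0]
print(scaled)   # [0.0, 0.5, 1.0, 0.5]
print(encoded)  # [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Mean imputation and min-max scaling are only one choice each; other strategies (for example, dropping incomplete rows or standardizing to zero mean) may suit a given dataset better.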

Model Evaluation

Overfitting and Underfitting

From geeksforgeeks

Algorithms and Methods

From mathworks

Python Libraries

Classification and Clustering

From miro account on Medium

Evaluation - Classification

From Anuganti Suresh on Medium
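
Common classification metrics are all derived from the four cells of a confusion matrix. The sketch below counts those cells for a binary classifier and computes accuracy, precision, and recall; the label lists are made-up values for illustration only, not results from the workshop notebooks.

```python
# Hypothetical true and predicted labels for a binary classifier (1 = positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)  # share of all predictions that are correct
precision = tp / (tp + fp)         # share of predicted positives that are real
recall = tp / (tp + fn)            # share of real positives that were found

print(tp, tn, fp, fn)               # 3 3 1 1
print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

Accuracy alone can mislead when one class is much rarer than the other, which is why precision and recall are usually reported alongside it.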

Density-based Clustering

From ResearchGate
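
Density-based methods group points that have enough close neighbours and mark isolated points as noise. The sketch below is a simplified one-dimensional version of the DBSCAN idea; the data values, the function name dbscan_1d, and the settings eps=0.5 and min_pts=2 are assumptions chosen for illustration.

```python
def dbscan_1d(values, eps, min_pts):
    """Simplified 1-D DBSCAN sketch (hypothetical helper).

    Returns one label per value: 0, 1, ... for clusters and -1 for noise.
    A point counts as its own neighbour when checking min_pts.
    """
    labels = [None] * len(values)  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(len(values)) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(len(values)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # too few neighbours: noise (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a dense point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is dense too, so keep expanding the cluster
    return labels


# Hypothetical data: two dense groups and one isolated value.
data = [1.0, 1.2, 1.1, 5.0, 5.1, 5.3, 9.0]
labels = dbscan_1d(data, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike the two-group exercise earlier, the number of clusters is not chosen in advance here; it follows from the density settings.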

Hierarchical Clustering

From analyticsvidhya.com

A 2003 research team used hierarchical clustering to “support the idea that many…breast tumor subtypes represent biologically distinct disease entities.” To the human eye, the original data looked like noise, but the algorithm was able to find patterns.
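
The bottom-up (agglomerative) form of hierarchical clustering repeatedly merges the two closest clusters until only one remains. The sketch below applies it to the five values from Exercise 2 using single linkage; the function name agglomerate and the choice of single linkage are assumptions made for this illustration, and it is not the method used in the study quoted above.

```python
def agglomerate(values):
    """Single-linkage agglomerative clustering on 1-D values (hypothetical helper).

    Returns the merges in order, each as (cluster_a, cluster_b, distance).
    """
    clusters = [[v] for v in values]  # start with every value in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = agglomerate([11, 10, 21, 22, 9])
distances = [d for _, _, d in merges]
print(distances)  # [1, 1, 1, 10]
```

The jump from a merge distance of 1 to a merge distance of 10 is what a dendrogram would show as a tall final link, suggesting that two groups is a natural place to cut the tree.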

Limits of Machine Learning

  • Garbage In = Garbage Out
  • Data Limitation
  • Generalization and overfitting
  • Inability to explain answers
  • Ethics and Bias Limitations
  • Computational Limitations

Open Jupyter Notebooks

Open In Colab

Ethics

Image from: Lepri, Bruno, Nuria Oliver, and Alex Pentland. "Ethical machines: The human-centric use of artificial intelligence." IScience 24.3 (2021): 102249.

Where to go from here?

Future workshops

Title                                   Series 1
Regression models                       Tue, Mar 19, 2024 (1:00pm to 3:00pm) - Past
Classification and clustering models    Tue, Mar 26, 2024 (1:00pm to 3:00pm) - Today
Neural networks                         Tue, Apr 2, 2024 (1:00pm to 3:00pm)
LLMs                                    Tue, Apr 9, 2024 (1:00pm to 3:00pm)

Register here

More from the Research Commons (UBC-V)

And from the Center for Scholarly Communication (UBC-O)