
Data Organization in Spreadsheets for Social Scientists

Good data organization is the foundation of any research project. Most researchers keep their data in spreadsheets, so that is where many research projects start.

Typically, we organize data in spreadsheets in whatever way is most convenient for us as humans. However, computers require data to be organized in particular ways. To use tools that make computation more efficient, such as the programming languages R or Python, we need to structure our data the way computers need it. Since this is where most research projects start, this is where we want to start too!
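To make "the way that computers need the data" a little more concrete, here is a minimal Python sketch. It assumes a rectangular table with one row per observation and one column per variable. The column names and values are hypothetical and are not taken from the workshop dataset.

```python
import csv
import io

# Hypothetical example data (not the workshop dataset):
# one row per observation, one column per variable, a single header row.
TIDY_CSV = """village,household_size,interview_date
Village A,3,2016-11-17
Village B,7,2016-11-17
Village A,10,2016-11-18
"""


def mean_household_size(csv_text):
    """Average the household_size column of a rectangular CSV table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sizes = [int(row["household_size"]) for row in rows]
    return sum(sizes) / len(sizes)


print(mean_household_size(TIDY_CSV))
```

Because every row has the same columns, the program can select a variable by name. Merged cells, notes in the margins, or several tables on one sheet would each need special handling before this would work.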

This workshop will be hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow. These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions below to download data to your computer and follow any installation instructions.

Pre-Workshop Setup

Data

You need to download some files to follow this lesson:

  1. Download the following three files:
  2. Place these three files in a folder you can easily find and access on your computer (for instance, in a datacarpentry-spreadsheets folder on your Desktop or within your Home folder).
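If you already have Python installed, the optional sketch below shows one way to create the folder and check what has been saved in it. The helper names lesson_folder and list_downloads are hypothetical, and the default location assumes a Desktop folder inside your Home folder. Creating the folder by hand in your file manager works just as well.

```python
from pathlib import Path


def lesson_folder(base=None):
    """Return the lesson folder under `base`, creating it if needed.

    Assumption: when no base is given, use the Desktop folder
    inside the user's Home folder.
    """
    base = Path(base) if base is not None else Path.home() / "Desktop"
    folder = base / "datacarpentry-spreadsheets"
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def list_downloads(folder):
    """Return the sorted names of the files currently in the folder."""
    return sorted(p.name for p in Path(folder).iterdir() if p.is_file())


# Example use (uncomment to run against your own Desktop):
# folder = lesson_folder()
# print(folder, list_downloads(folder))
```

After downloading, the list printed by list_downloads should contain the files you saved for this lesson.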

About the data

For more information about the dataset and to download it from Figshare, check out the Social Sciences workshop data page.

Software

To interact with spreadsheets, we can use LibreOffice, Microsoft Excel, Gnumeric, ONLYOFFICE, WPS Office, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same.

For this lesson, we will mainly use Microsoft Excel. If you do not have access to Microsoft Excel, you can use LibreOffice instead, which is a free, open-source spreadsheet program. Many of its functions are similar to Excel's.

macOS users who use Apple’s Numbers application should note that it does not contain some of the features (particularly data validation) that we will be using. Please use LibreOffice or Microsoft Excel instead.

LibreOffice installation (optional)

Windows/macOS
  • Download the installer: go to the installation page. The version for Windows or macOS should automatically be selected. Click Download Version X.X.X (whichever is the most recent version).
  • Install LibreOffice: once the installer is downloaded, double-click it and LibreOffice should install.
Linux
  • Download the installer: go to the installation page. The version for Linux should automatically be selected. Click Download Version X.X.X (whichever is the most recent version).
  • Install LibreOffice: once the installer is downloaded, double-click it and LibreOffice should install.
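  • Package manager option: alternatively, install LibreOffice with your distribution's package manager. These commands usually need root privileges (for example, by prefixing them with sudo):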
    • pacman (Arch): pacman -S libreoffice
    • yum (Fedora, CentOS): yum install libreoffice
    • apt (Debian, Ubuntu): apt install libreoffice
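After installing, you can optionally confirm that LibreOffice is reachable from the command line. The Python sketch below assumes the launcher is named libreoffice or soffice, which is common on Linux. On Windows and macOS the launcher is often not on the PATH, so a "not found" result there does not mean the installation failed. The helper name find_libreoffice is hypothetical.

```python
import shutil


def find_libreoffice(candidates=("libreoffice", "soffice")):
    """Return the path of the first launcher found on PATH, or None.

    Assumption: the launcher uses one of the candidate names.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print(find_libreoffice() or "LibreOffice launcher not found on PATH")
```

If nothing is found, try opening LibreOffice from your applications menu instead.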