
Data dictionary

A data dictionary is variable-level documentation that records important information about each variable in a dataset. It is especially important if you are working with multiple tables or with a database. A data dictionary is typically formatted as a table in which each row corresponds to a variable in your dataset and each column represents a field of information about that variable. At a minimum, it should include the variable name, data type, description, and sample values.

The main goal of the data dictionary is to help people understand datasets. It will help your peers answer questions such as “what does this variable mean?”

Most long-standing research projects have a data dictionary. Below are some open-source examples:

As you can see, a data dictionary can be a simple table (spreadsheet or PDF) or a full-fledged web application. Some projects need only one data dictionary that a single person can create and maintain, while others require a whole team to create and maintain it.

Exercise 1

Please help us to make sense of a dataset.

Access a dataset:

Florida, Richard, 2013, “Class-Divided Cities, Detroit Edition Published in Atlantic Cities”, https://doi.org/10.5683/SP3/SNXXHQ, Borealis, V3, UNF:6:zsehCAz4agntvPwDZF03OA== [fileUNF]

Download the data file “Detroit Class Data.xlsx” in its original file format. Examine the data and try to answer the following questions:

  1. What do you think the columns STATEFP10 and COUNTYFP10 mean?
  2. Describe the different measures in this study.
  3. How was the data collected?

How to create a Data Dictionary

  • A data dictionary is typically structured so that each row corresponds to a column in your dataset and each column of the dictionary represents a field of information about that column.
  • Include the following fields:

    • Column name
    • Column name in plain English
    • Description of the column
    • Data type
    • Data usage type
    • Sample values
  • Optional fields to include:

    • Transformations (Was the column the result of a transformation?)
    • Example usage (SQL queries)
    • Missing values
    • Values (this is useful if a column uses a scale/test)
    • Other notes
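The structure above can be drafted with a short script, so that the mechanical fields are filled in automatically and a person only writes the plain-English name, description, and data usage type. This is a minimal sketch, not a tool provided by this guide: it assumes the dataset has already been read into a list of dictionaries (one per record, all values as text), and the helpers `infer_type` and `draft_dictionary` are hypothetical names chosen for illustration.

```python
# Required fields from the list above, in the order used by the template.
FIELDS = [
    "Column Name",
    "Business Name",    # column name in plain English, written by hand
    "Description",      # written by hand
    "Data Type",
    "Data Usage Type",  # e.g. Dimension or Fact, chosen by hand
    "Sample Values",
]


def infer_type(values: list[str]) -> str:
    """Guess a coarse data type (Number or String) from a column's text values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "String"
    # A digit string with a leading zero (e.g. "01") is treated as a code, not a number.
    if any(len(v) > 1 and v.isdigit() and v.startswith("0") for v in non_empty):
        return "String"
    for v in non_empty:
        try:
            float(v)
        except ValueError:
            return "String"
    return "Number"


def draft_dictionary(records: list[dict[str, str]], n_samples: int = 3) -> list[dict[str, str]]:
    """Draft one data dictionary row per dataset column; hand-written fields stay blank."""
    columns = list(records[0].keys()) if records else []
    rows = []
    for col in columns:
        values = [r.get(col, "") for r in records]
        # Keep the first few distinct, non-empty values as samples.
        samples: list[str] = []
        for v in values:
            if v != "" and v not in samples:
                samples.append(v)
            if len(samples) == n_samples:
                break
        row = dict.fromkeys(FIELDS, "")
        row["Column Name"] = col
        row["Data Type"] = infer_type(values)
        row["Sample Values"] = ", ".join(samples)
        rows.append(row)
    return rows


# Made-up example records, with every value held as text (as csv.DictReader would return them).
records = [
    {"STATEFP10": "06", "COUNTYFP10": "001", "CCPCT": "15.2"},
    {"STATEFP10": "06", "COUNTYFP10": "003", "CCPCT": "25.4"},
]
for row in draft_dictionary(records):
    print(row["Column Name"], "|", row["Data Type"], "|", row["Sample Values"])
```

Treat the type guess as a starting point only: a code without a leading zero, such as the state code "26", looks numeric and would be guessed as Number, so review every row before filling in the remaining fields.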

Template

Below is an example data dictionary for the Detroit dataset we looked at earlier in the chapter. This template was designed to capture most datasets and with . There are other data dictionary templates available for more specific needs. NOTE: The values here are made up for educational purposes; they do not reflect the actual study.

You can download this template here

| Column Name | Business Name | Description | Data Type | Data Usage Type | Sample Values |
|---|---|---|---|---|---|
| STATEFP10 | State code | The unique numeric code for the state. | String | Dimension Attribute | “01”, “02”, “06” |
| COUNTYFP10 | County code | The unique numeric code for the county. | String | Dimension Foreign Key | “001”, “003”, “005” |
| TRACTCE10 | Census Tract Code | Code identifying a specific census tract. | String | Dimension Attribute | “593300” |
| GEOID10 | Geographical ID | Combined state, county, and tract identifier. | String | Dimension Foreign Key | “26163593300” |
| NAMELSAD10 | Area Name | The full name of the census area. | String | Attribute | “Los Angeles County, CA”, “Cook County, IL” |
| class | Land Classification | Indicates land use or classification. | String | Dimension Attribute | “Residential”, “Commercial”, “Agricultural” |
| CCPCT | Child Care Percentage | Percentage of households using child care services. | Number | Fact | 15.2, 25.4, 32.7 |
| FFFPCT | Fast Food Percentage | Percentage of restaurants classified as fast food. | Number | Fact | 40.3, 55.8, 22.5 |
| SCPCT | Senior Citizen Percentage | Percentage of population over 65 years old. | Number | Fact | 10.4, 15.8, 20.1 |
| WCPCT | Working Class Percentage | Percentage of households in the working class. | Number | Fact | 45.2, 62.1, 51.3 |
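The GEOID10 row describes a combined identifier. For 2010 census tracts, a GEOID is the 2-digit state code, the 3-digit county code, and the 6-digit tract code written one after another, which is why the sample “26163593300” splits into “26”, “163”, and “593300”. This is also why the code columns are typed as String: stored as a number, “01” becomes 1 and the fixed-width layout breaks. The sketch below illustrates this; `build_geoid` and `split_geoid` are hypothetical helpers, not part of the template or the dataset.

```python
def build_geoid(statefp: str, countyfp: str, tractce: str) -> str:
    """Concatenate state (2), county (3) and tract (6) codes into an 11-digit tract GEOID."""
    # zfill restores leading zeros that were lost if a code was stored as a number.
    return statefp.zfill(2) + countyfp.zfill(3) + tractce.zfill(6)


def split_geoid(geoid: str) -> tuple[str, str, str]:
    """Split an 11-digit tract GEOID back into state, county and tract codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


print(build_geoid("26", "163", "593300"))  # 26163593300
print(split_geoid("26163593300"))          # ('26', '163', '593300')

# Storing a code as a number drops its leading zero, which zfill has to repair.
print(int("01"))                           # 1
print(build_geoid("1", "1", "20100"))      # 01001020100
```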

The Style

How you write your data dictionary is as important as the information you include. Always remember to be as clear as possible. Follow the style guide provided by your team to be consistent. The following are some general best practices related to data documentation:

The process

Document your work as you go, so you don’t lose track of any details. If you wait until the end of your project, you might already have lost or forgotten valuable information.
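One practical way to document as you go is to re-run a quick check whenever the dataset changes, comparing the dataset's column names with the names listed in the data dictionary. This is a minimal sketch assuming both are available as plain lists of names; `check_coverage` is a hypothetical helper, and the column lists in the example are made up.

```python
def check_coverage(dataset_columns: list[str], documented_columns: list[str]) -> dict[str, list[str]]:
    """Report dataset columns missing from the dictionary, and dictionary entries with no column."""
    data_set = set(dataset_columns)
    doc_set = set(documented_columns)
    return {
        # In the data but not yet documented.
        "undocumented": [c for c in dataset_columns if c not in doc_set],
        # Documented but no longer in the data (e.g. a renamed or dropped column).
        "stale": [c for c in documented_columns if c not in data_set],
    }


report = check_coverage(
    ["STATEFP10", "COUNTYFP10", "GEOID10", "class", "CCPCT"],
    ["STATEFP10", "COUNTYFP10", "GEOID10", "class", "WCPCT"],
)
print(report)  # {'undocumented': ['CCPCT'], 'stale': ['WCPCT']}
```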

You can create a data dictionary with any text editor, but we suggest using a spreadsheet program (Excel, Numbers, etc.). Even if you edit the data dictionary in spreadsheet software, we suggest saving it as a CSV or TSV file: these are non-proprietary formats that anyone can open, now and in the future.
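In a spreadsheet program, saving as CSV or TSV is usually a menu option. If the dictionary is produced by a script instead, Python's standard `csv` module can write both formats; only the delimiter differs. This is a sketch assuming the dictionary is held as a list of row dictionaries; `save_dictionary` is a hypothetical helper, and the file names are placeholders.

```python
import csv

FIELDS = ["Column Name", "Business Name", "Description", "Data Type", "Data Usage Type", "Sample Values"]

# Two rows taken from the example template above.
rows = [
    {"Column Name": "STATEFP10", "Business Name": "State code",
     "Description": "The unique numeric code for the state.", "Data Type": "String",
     "Data Usage Type": "Dimension Attribute", "Sample Values": '"01", "02", "06"'},
    {"Column Name": "CCPCT", "Business Name": "Child Care Percentage",
     "Description": "Percentage of households using child care services.", "Data Type": "Number",
     "Data Usage Type": "Fact", "Sample Values": "15.2, 25.4, 32.7"},
]


def save_dictionary(rows: list[dict[str, str]], path: str, delimiter: str = ",") -> None:
    # newline="" lets the csv module control line endings; UTF-8 keeps non-ASCII text intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=delimiter)
        writer.writeheader()
        # Fields containing the delimiter or quotes are quoted automatically.
        writer.writerows(rows)


save_dictionary(rows, "data_dictionary.csv")
save_dictionary(rows, "data_dictionary.tsv", delimiter="\t")

# Reading back returns every value as text, so codes such as "01" keep their leading zero.
with open("data_dictionary.csv", newline="", encoding="utf-8") as f:
    assert list(csv.DictReader(f)) == rows
```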

Congrats!

You are now ready to create your data dictionary so other researchers can understand your dataset with no problems!


Need help?

Please reach out to research.data@ubc.ca for assistance with any of your research data questions.