
Lesson 4e - Data Frames

Data frames are the last type of container that we’ll be covering.


Lesson Objectives

  • Use data frames to create two-dimensional containers with elements of multiple data types

What is a Data Frame?

A data frame is another type of container. Like a list, it can hold elements of different data types; however, all the data in a single column must have the same data type. Like a matrix, it is two-dimensional, with rows and columns.

Creating a Data Frame

The format to create a data frame is the following:

myDataFrame = data.frame(
  columnName1 = c(value, value1, value2, ...),
  columnName2 = c(value3, value4, value5, ...),
  ... # and so on
)

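Each column should contain the same number of values, because the first value of every column makes up row 1, the second value makes up row 2, and so on. Data frames behave a lot like tables or Excel spreadsheets.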

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)
myDataFrame

Output

      name age         faculty
1   Hannah  18     Engineering
2     John  19      Humanities
3 Mohammad  18 Social Sciences
4    Maria  18      Humanities
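To check the rule that each column holds a single data type, you can ask R for the class of every column. The sketch below is a minimal example, assuming R 4.0 or later (where text columns are stored as character rather than factor by default). sapply(), class(), dim(), names() and tryCatch() are standard base R functions that this lesson doesn't otherwise cover, and the three-value age vector is made up purely to trigger the length error.

```r
# Rebuild the example data frame from this lesson
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# Each column is a vector, so each one reports a single class
columnTypes = sapply(myDataFrame, class)
print(columnTypes)

# Number of rows and columns, then the column names
print(dim(myDataFrame))
print(names(myDataFrame))

# 4 names and 3 ages cannot line up into rows, so data.frame() raises an error
result = tryCatch(
  data.frame(name = c("Hannah", "John", "Mohammad", "Maria"), age = c(18, 19, 18)),
  error = function(e) conditionMessage(e)
)
print(result)
```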

You can also use the View() function to open the data frame in a separate tab (in RStudio), where it is shown as a spreadsheet-style table.

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)
View(myDataFrame) # Take note of the capital V in "View"

Output: the data frame opens in a new tab as a spreadsheet-style table.

Accessing Items in a Data Frame

Just like lists, data frames return different kinds of results depending on whether you use [ or [[.

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# accessing with a single [ returns a new data frame.
myDataFrame["name"]

Output

      name
1   Hannah
2     John
3 Mohammad
4    Maria

Using [[ returns a vector of the column rather than a data frame.

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# accessing with [[ returns the column itself as a vector
myDataFrame[["name"]]

Output

[1] "Hannah"   "John"     "Mohammad" "Maria"
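One quick way to see the difference between [ and [[ is to check the class() of what each one returns. This is a minimal sketch assuming R 4.0 or later; it also uses the $ operator, which this lesson doesn't cover, as a shorthand for [[ with a column name.

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

byBracket = myDataFrame["name"]          # a one-column data frame
byDoubleBracket = myDataFrame[["name"]]  # a character vector
byDollar = myDataFrame$name              # $ does the same job as [[ for a named column

print(class(byBracket))
print(class(byDoubleBracket))
print(identical(byDoubleBracket, byDollar))
```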

Accessing Items like a Matrix

You can also access items using matrix notation, [row, column]. Leaving the row or column position empty selects all rows or all columns.

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

myDataFrame[1,"age"]       # gets the first item in the "age" column
myDataFrame[2,"name"]      # gets the second item in the "name" column

myDataFrame[2,]            # gets all items in row 2 as a data frame
myDataFrame[,"age"]        # gets all items in the "age" column (column 2) as a vector

myDataFrame[2,, drop=FALSE]         # gets all items in row 2 as a data frame
myDataFrame[,"age", drop=FALSE]     # gets all items in the "age" column as a data frame

myDataFrame[c(1,2),]       # gets all items in rows 1 and 2
myDataFrame[,-2]           # gets all items except those in column 2 ("age")

myDataFrame[,]             # gets all items
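The sketch below stores a few of these results and checks what kind of object each one is. It also shows one way to drop a column by name: negative indices only work with column numbers (myDataFrame[,-"age"] is an error), so instead you can keep every column whose name is not "age". The variable names are just for illustration.

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

oneValue = myDataFrame[1, "age"]               # a single number
rowTwo = myDataFrame[2, ]                      # a one-row data frame
ageVector = myDataFrame[, "age"]               # a single column drops to a vector
ageFrame = myDataFrame[, "age", drop = FALSE]  # drop = FALSE keeps a data frame
withoutAge = myDataFrame[, -2]                 # every column except column 2

# Drop a column by name: keep the columns whose name is not "age"
withoutAgeByName = myDataFrame[, names(myDataFrame) != "age"]

print(class(rowTwo))
print(class(ageVector))
print(class(ageFrame))
print(names(withoutAgeByName))
```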

You can also refer to the columns by number (based on the order they come in).

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

myDataFrame[,1]            # gets all items in the "name" column, because it's the first column 
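Because column numbers follow the order shown by names(), selecting a column by position and by name gives the same result. A minimal check, which also uses [[ with a number as one more way to pull out a column as a vector:

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

print(names(myDataFrame))      # the column order that the numbers refer to

byNumber = myDataFrame[, 1]    # first column, by position
byName = myDataFrame[, "name"] # the same column, by name
print(identical(byNumber, byName))

thirdColumn = myDataFrame[[3]] # [[ also accepts a column number
print(thirdColumn)
```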

Modifying Values in a Data Frame

Modifying values in a data frame works just like it does for other containers: select the item with brackets and assign it a new value.

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

myDataFrame[3, "age"] = 19
myDataFrame

Output

      name age         faculty
1   Hannah  18     Engineering
2     John  19      Humanities
3 Mohammad  19 Social Sciences
4    Maria  18      Humanities
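The same assignment pattern works on more than one cell at a time. This sketch (assuming R 4.0 or later) updates a whole column and then two rows of another column; the "Science" faculty value is hypothetical. It also shows what happens if you break the one-type-per-column rule: assigning text into the numeric age column converts the entire column to character. That step is done on a copy, so the original stays numeric.

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# Update one value, as in the example above
myDataFrame[3, "age"] = 19

# Assign to a whole column at once: every age goes up by 1
myDataFrame[, "age"] = myDataFrame[, "age"] + 1

# Update several cells in one assignment ("Science" is a made-up value)
myDataFrame[c(1, 4), "faculty"] = "Science"
print(myDataFrame)

# Putting text into a numeric column coerces the whole column to character
brokenCopy = myDataFrame
brokenCopy[2, "age"] = "twenty"
print(class(brokenCopy[, "age"]))
```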

Adding Rows and Columns to a Data Frame

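Just like with matrices, you can add columns to a data frame with the cbind() function and rows with the rbind() function. When adding a row, put its values in a list() rather than a c() vector, so that each value can keep its own data type.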

Input

myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)
myDataFrame <- rbind(myDataFrame, list("Hank", 20, "Health Sciences"))
myDataFrame

Output

      name age         faculty
1   Hannah  18     Engineering
2     John  19      Humanities
3 Mohammad  18 Social Sciences
4    Maria  18      Humanities
5     Hank  20 Health Sciences
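The example above only shows rbind(). Here is a minimal sketch of cbind() adding a column instead; the year column and its values are hypothetical. The new vector needs one value for each existing row.

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# Attach a new column named "year" (hypothetical data, one value per row)
myDataFrame <- cbind(myDataFrame, year = c(1, 2, 1, 3))
print(myDataFrame)
```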

Reading from a CSV File

Since data frames are essentially tables of data, one useful feature of R is the ability to read data from a CSV file and save it as a data frame using the read.csv() function.

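To read CSV files, you need to supply a file path to read.csv(). That file path can be absolute (something like C:\Users\USER\Documents\R\data.csv), or relative to your current working directory. Inside an R string, a single backslash starts an escape sequence, so write an absolute Windows path with forward slashes ("C:/Users/USER/Documents/R/data.csv") or doubled backslashes ("C:\\Users\\USER\\Documents\\R\\data.csv"). You can find your current working directory using the getwd() function.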

Input

getwd()

Output

[1] "C:/Users/USER/Documents/R"

If your CSV file is in your working directory, you can simply pass the file name.

myData <- read.csv("data.csv")

This reads your CSV file into an R data frame called myData, using the first row of the file as the column names.
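If you don't have a CSV file handy, you can create one from an existing data frame and read it back. This sketch assumes R 4.0 or later and uses write.csv() and tempfile(), which this lesson doesn't cover; the temporary file stands in for data.csv so the example runs from any working directory.

```r
myDataFrame = data.frame(
  name = c("Hannah", "John", "Mohammad", "Maria"),
  age = c(18, 19, 18, 18),
  faculty = c("Engineering", "Humanities", "Social Sciences", "Humanities")
)

# A temporary file path stands in for "data.csv"
csvPath = tempfile(fileext = ".csv")

# row.names = FALSE stops write.csv() from adding a column of row numbers
write.csv(myDataFrame, csvPath, row.names = FALSE)

# Read the file back into a new data frame
myData <- read.csv(csvPath)
print(myData)
str(myData)
```

The str() output lists each column's type; whole numbers such as the ages may come back as integers rather than decimals, since read.csv() guesses each column's type from the text in the file.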

If you are interested in what else read.csv() has to offer, check out this resource that goes further in depth about the different parameters and settings it offers.

Key Points / Summary

  • You can use data frames to get a table-like data format.
  • Each column in a data frame must hold a single data type (like a vector), but a row can mix data types (like a list).