Section 1 Introduction

The materials available here were created to guide students through the CSDE workshop R and Relational Databaes on the use of relational databases with R , primarily with the use of the DBI (R Database Interface) , sqldf (Manipulate R Data Frames Using SQL) , RSQLite (‘SQLite’ Interface for R) , and RPostgreSQL (R Interface to the ‘PostgreSQL’ Database System) packages. The focus of the course is taking advantage of the power of SQL databases with the flexibility and utility of R. Particular attention will be paid to the use of PostgreSQL , arguably the most powerful open source relational database, which includes the PostGIS spatial storage, management, and analysis extension.

Please see the PowerPoint presentation introducing the workshop.

Rationale

R has become an ubiquitous environment for data management and analysis, powered by its extensible architecture supported by numerous packages designed for a myriad of uses. Despite its utility, base R suffers from a fundamental limitation of requiring data sets to be loaded into RAM. For very large data sets, this can be problematic. One work-around is to store data in relational databases (e.g., PostgreSQL and SQLite), which are efficient at storing and analyzing large tables. Where possible, much of the data wrangling needed for generating analytic tables can be performed within the database, with results converted to data frames for statistical analysis or visualization within R. In this workshop, we will cover basic functionality of the R DBI package, which controls the basic communication between R and SQL databases. We will use PostgreSQL/PostGIS and SQLite as the storage databases. Attention will be paid to (1) using SQL queries to distill large raw database tables into manageable R data frames and (2) creating reports with resultant tables and graphics within the RMarkdown documentation framework.

What this course is

This course will introduce students to the basic use of the R framework’s interface with SQL databases. By the end of the course, students will be able to connect and disconnect from SQL databases, perform queries on data stored within SQL databases.

What this course is not

This course is not intended to teach fundamentals of relational database management. We assume that students have at least a beginning to intermediate level of skill in the use of a relational database (e.g., creating databases, creating and modifying table structures, adding/deleting records, and performing SQL queries) This is also not an R course–it is expected that students will have a fundamental grasp of the use of R.

Overview

Each of the topics shown below will be covered during the course.

  1. Getting started
  2. Connecting and disconnecting to databases
  3. Performing queries on data within databases
  4. Using sqldf() on R data frames
  5. Using rpostgis to interact with PostGIS-enabled PostgreSQL databases
  6. Using sf to intersect with PostGIS-enabled PostgreSQL databases