Section 7 Spatial SQL
In this section we will import a “dump” file into the database and then perform a point-in-polygon analysis.
The dump file is the result of using the pg_dump
command. This can dump an entire database, a schema including all the tables within the schema, or a single table. For this specific data set, a single table was dumped from an existing data set representing ACS block groups from 2018.
Download the file census.acs5_2018_bg.dump to your data
folder.
This exercise provides one simple example of how GIS functions can be used within the database. PostGIS provides a plethora of analytic methods for processing spatially referenced data. See PostGIS Reference.
Copy this text to the end of your Rmd file:
<!--<start rmd_add_05.txt>-->
## Spatial SQL with PostGIS
First we will restore the dump file to the database.
```{r census}
# does the table exist?
if(!dbTableExists(conn = rsql, table_schema = "census", table_name = "acs5_2018_bg")){
# zip file
if(!file.exists("../data/census.acs5_2018_bg.dump") & file.exists("../data/census.acs5_2018_bg.dump.zip")){
unzip("../data/census.acs5_2018_bg.dump.zip", exdir = "../data/")
}
# read in a dump file
if(Sys.info()['sysname'] == "Windows"){
# formulate a horrible windows command string
mycmd <- "cmd /c C:\\PROGRA~1\\PostgreSQL\\12\\bin\\psql -U csde -f ../data/census.acs5_2018_bg.dump rsql"
# restore
system(mycmd)
} else {
# formulate a nicer looking POSIX command string
mycmd <- "psql -U csde -c -f ../data/census.acs5_2018_bg.dump rsql"
# restore
system(mycmd)
}
}
```
Let's check that the table exists in the database:
```{r list_tables_spatial}
dbListTables(conn = rsql)
```
Before we can measure the points, we need to create a geometry column within the GPS point table:
```{r gpspoint}
sql <- "
--add the geometry column
alter table gps.tracks add column if not exists geom geometry(point, 4326);
--update the value
update gps.tracks set geom = st_setsrid(st_makepoint(lon, lat), 4326);
--create the spatial index
create index idx_gps_tracks on gps.tracks using gist(geom);
"
res <- dbGetQuery(conn = rsql, statement = sql)
```
Now we formulate a query to create a new table with the count the number of GPS points per census block group, using `count()` to count the number of points aggregated using `group_by` based on containment tested by `st_contains()`:
```{r pip}
sql <- "
--create a new table
create table if not exists gps.tracks_bg as
--select the census BG ID and count of points
select geoid, count(*) as n_points
--from a subquery
from (select c.geoid, g.time_gps
--aliasing block groups as c
from census.acs5_2018_bg as c
--aliasing GPS as g
, gps.tracks as g
--the secret sauce, where the block group contains the GPS point
where st_contains(c.wkb_geometry, g.geom)) as foo
--aggregate count by block group
group by geoid
--order by number of points per BG
order by n_points desc;
--select the records
select * from gps.tracks_bg;
"
# run the query
gps.tracks_bg <- dbGetQuery(conn = rsql, statement = sql)
# print the table
DT::datatable(gps.tracks_bg)
```
This provided one simple examle of how GIS functions can be used within the database. PostGIS provides a plethora of analytic methods for processing spatially referenced data. See [PostGIS Reference](https://postgis.net/docs/reference.html){target="_blank"}.
<hr class="new4">
<!--<end rmd_add_05.txt>-->
and then render the file.
rmarkdown::render("rmd/r_sql.rmd")