Section 7 Spatial SQL

In this section we will import a “dump” file into the database and then perform a point-in-polygon analysis.

The dump file is the result of using the pg_dump command. This can dump an entire database, a schema including all the tables within the schema, or a single table. For this specific data set, a single table was dumped from an existing data set representing ACS block groups from 2018.

Download the file census.acs5_2018_bg.dump to your data folder.

This exercise provides one simple example of how GIS functions can be used within the database. PostGIS provides a plethora of analytic methods for processing spatially referenced data. See PostGIS Reference.

Copy this text to the end of your Rmd file:

<!--<start rmd_add_05.txt>-->
## Spatial SQL with PostGIS

First we will restore the dump file to the database.

```{r census}
# does the table exist?
if(!dbTableExists(conn = rsql, table_schema = "census", table_name = "acs5_2018_bg")){
    # zip file
    if(!file.exists("../data/census.acs5_2018_bg.dump") & file.exists("../data/census.acs5_2018_bg.dump.zip")){
        unzip("../data/census.acs5_2018_bg.dump.zip", exdir = "../data/")
    }

    # read in a dump file
    if(Sys.info()['sysname'] == "Windows"){
        # formulate a horrible windows command string
        mycmd <- "cmd /c C:\\PROGRA~1\\PostgreSQL\\12\\bin\\psql -U csde -f ../data/census.acs5_2018_bg.dump rsql"
        # restore
        system(mycmd)
    } else {
        # formulate a nicer looking POSIX command string
        mycmd <- "psql -U csde -c -f ../data/census.acs5_2018_bg.dump rsql"
        # restore
        system(mycmd)
    }
}
```

Let's check that the table exists in the database:

```{r list_tables_spatial}
dbListTables(conn = rsql)
```

Before we can measure the points, we need to create a geometry column within the GPS point table:

```{r gpspoint}
sql <- "
--add the geometry column
alter table gps.tracks add column if not exists geom geometry(point, 4326);
--update the value
update gps.tracks set geom = st_setsrid(st_makepoint(lon, lat), 4326);
--create the spatial index
create index idx_gps_tracks on gps.tracks using gist(geom);
"

res <- dbGetQuery(conn = rsql, statement = sql)
```

Now we formulate a query to create a new table with the count the number of GPS points per census block group, using `count()` to count the number of points aggregated using `group_by` based on containment tested by `st_contains()`:

```{r pip}
sql <- "
--create a new table
create table if not exists gps.tracks_bg as
--select the census BG ID and count of points
select geoid, count(*) as n_points
--from a subquery
from (select c.geoid, g.time_gps
      --aliasing block groups as c
      from census.acs5_2018_bg as c
      --aliasing GPS as g
         , gps.tracks as g
      --the secret sauce, where the block group contains the GPS point
      where st_contains(c.wkb_geometry, g.geom)) as foo
--aggregate count by block group
group by geoid
--order by number of points per BG
order by n_points desc;
--select the records
select * from gps.tracks_bg;
"

# run the query
gps.tracks_bg <- dbGetQuery(conn = rsql, statement = sql)

# print the table
DT::datatable(gps.tracks_bg)

```

This provided one simple examle of how GIS functions can be used within the database. PostGIS provides a plethora of analytic methods for processing spatially referenced data. See [PostGIS Reference](https://postgis.net/docs/reference.html){target="_blank"}.

<hr class="new4">
<!--<end rmd_add_05.txt>-->

and then render the file.

rmarkdown::render("rmd/r_sql.rmd")