1 Get Started–geocode a single point

Start ArcMap. Select Customize > Toolbars and click Geocoding.

This will open the toolbar; drag it to dock it into the main part of your ArcGIS GUI.

In the geocoding toolbar, you will see from left to right, the Select Address Locator (reference data), the Use Map Extent button, the Find Address text entry control, the Address Inspector tool, the Geocode Addresses button (looks like a little mailbox), and the Review/Rematch Addresses button.

We will see how these all work in a moment.

The first thing that we are going to do is use the ArcGIS world geocoding service to create a graphic point at an address.

  1. Open a web browser.
  2. Search for the Space Needle address; you will find 400 Broad Street Seattle Washington 98109.
  3. Tap on the Find Address single line input and paste in that value, or type it in if you like.
  4. After you enter the address, you should see cross hairs blink and a green dot that appears and then disappears. That means that a matching address was found.
  5. Right click on the address and select Add Callout.
  6. This puts a callout centered at the geocoded location.

You will notice that there is nothing else in the map, so we are going to add a base map for some context.

  1. Click on the pick list for the Add Data button to add data from ArcGIS online.
  2. Choose Add Basemap.  
     
  3. You will see a number of choices.  
     
  4. Add Streets.
  5. Repeat this process to add Imagery with Labels.

And after a few moments, we should see that the streets base map has been added.

If you are on terminal server 4 or 5, this may open another instance of ArcGIS. If this happens:

  1. Right click on the World Street Map layer, and tap copy.
  2. In the original map document, right-click Layers in the TOC and paste layers.
  3. Repeat for the World Imagery layer.

It seems that pasting in the basemap layers deletes the previous callout, so repeat the addition of the Space Needle address location callout.

When you turn on the imagery layer and zoom to the callout, you should see the Space Needle distorted by camera parallax.

This process is used to add a geocoded point interactively.

In this case the data that you are looking at exist only in memory, so you have not created a new spatial data set on the file system.


2 Prepare the file system

What we are going to do now is geocode a table of addresses, which will generate a feature layer stored on the file system. So the first thing we are going to do is create a folder to store our work.

Assuming that we are on the CSDE terminal server, everybody should have an H drive.

Open Windows Explorer and navigate to your H: drive.

Create a folder named geocoding_2022.

That is where we are going to store everything: all the downloaded data and all the data products that we generate for this class go into that geocoding_2022 folder.

Now, in ArcMap, the first thing that we are going to do is save this project, because we have done a little bit of work and we don’t want to lose it. Within ArcMap, tap on the save icon or press CTRL-S.

Because we will be accessing these files regularly, in ArcCatalog, select Connect to folder.

Navigate to the H drive and then into your geocoding_2022 folder.

We are also going to be creating some spatial data and we want to specify where those data are going to be saved. One of the potentially frustrating things about ArcMap is that every geoprocessing operation has to generate an output data set that is stored on disk.

We are going to set a default geodatabase, so that when we save files, the default location will be that geodatabase. This will save us some time searching around on disk. Within ArcMap go into ArcCatalog.

We will then make a geodatabase, so again go into ArcCatalog. You should now see that your folder connection is there. Right-click on that folder and then select New File Geodatabase.

It may take a few moments for that to get created. It might be given a strange name; if it is, rename it by tapping F2. We want to call this file GDB geocoding_data_20220310.gdb.

Now that we have that created we are going to set it as the default location, so right click on the file geodatabase and then tap Make Default Geodatabase.

Save your project again … in fact, save your project every time you have completed a step that adds any data or makes any systematic changes to the project.

Create the folder H:\geocoding_2022\data.

Download the file geocoding_source_data_20220310.zip and unzip to data. There are four files, representing the addresses for the BeanBox top 24 coffee shops in Seattle, the public schools in Seattle, the Yelp top 10 restaurants in Seattle, and reported rat sightings in New York City.

Length      Date    Time    Name
---------  ---------- -----   ----
     1441  03-07-2022 16:20   BeanBox_Top 24_Coffee Shops_2022_03_07.csv
     7428  03-07-2022 16:09   public_school_addresses.csv
      582  03-07-2022 16:03   Yelp_Top10_Seattle_Restaurants_2022_03_07.csv
 74362061  03-08-2022 17:34   Rat_Sightings_20220308.csv
---------                     -------
 74371512                     4 files
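
The folder creation and unzip step can also be scripted. The following is a minimal sketch in Python, assuming the zip file has already been downloaded; the function name `unpack_source_data` and the download location shown in the usage comment are illustrative assumptions, not part of the workshop materials.

```python
from pathlib import Path
import zipfile


def unpack_source_data(zip_path, data_dir):
    """Extract zip_path into data_dir and return (file name, size in bytes) pairs."""
    data_dir = Path(data_dir)
    # Create the data folder (and any missing parents) if it does not exist yet.
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Example call (the location of the downloaded zip is an assumption):
# for name, size in unpack_source_data(
#         r"H:\geocoding_2022\geocoding_source_data_20220310.zip",
#         r"H:\geocoding_2022\data"):
#     print(size, name)
```

The returned list can be compared against the listing above to confirm that all four files arrived with the expected sizes.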

3 Geocode a table of addresses

3.1 Top 10 Seattle Restaurants

We are now going to geocode those top 10 Seattle restaurants, using some default settings.

First bring these csv files into the ArcMap document by going into ArcCatalog.

You might have to refresh the view of your H: connection but then you should see the data folder.

Drag each one in, one at a time, into your ArcMap session.

And when you look in the table of contents you will see that they have a little icon that shows that they are tables.

Now to start geocoding we are going to click on the Geocode Addresses button.

A dialog with the currently loaded address locators will appear. The currently loaded address locators are the World Geocoding Service and Military Grid Reference system.

We do however have some additional reference data that we are going to use. Tap on add and then navigate into R:\Data\GIS\Geocoding\2020 Business Analyst Data\Geocoding Data. You should see Geocoding Data and Geocoding Data for ArcGIS Pro; we are going to go into Geocoding Data.

Choose the USA local composite and click Add. After a few moments, you should see the list of address locators.

Select USA_LocalComposite and tap Add.

The list should now be populated with the new address locator you just selected.

The next thing that will happen is that we will have the opportunity to select which file we want to use. We are going to use the Yelp top 10 Seattle restaurants.

First, open up the csv in Excel so that you can see the contents. On Terminal Server 4 or 5 it is necessary to open Excel first and then open the spreadsheet file rather than double-clicking the file in the Windows Explorer.

The file contains the ranking, the name of the restaurant, and the street address, which has house number, street, etc. There appear to be no comma delimiters after the street name.

Because the address is one field, within the geocode addresses dialog, select ADDRESS as the field that we are going to geocode.

The output we are going to create will be a static snapshot. We are not going to choose the default output data set name; instead, browse to the file geodatabase we created earlier. Copy the file name from the input file in the Windows Explorer and paste the file name (without .csv) as the output layer name.

It is worth taking your time to do this because we will be spending quite a lot of time on various functions with ArcGIS data, so it is worth taking an extra 10 or 15 seconds to give the layer a name you might recognize later.

Once the output file name has been entered, we will run the geocoding with no options; we just want to see how this turns out, so click OK.

Next you should see the Geocoding Addresses dialog appear. It will show progress as it is moving along.

When it completes, we see that all 10 records matched. Because they all matched, there is no reason to rematch, so we will just go ahead and close the interface.

You should see that the geocode point layer was added.

Let’s go ahead and turn off display of all of the base map layers so we can see the geocoded points more clearly.

We will change the layer point symbology. Open the layer properties (right click and select Properties or double-click the layer name in the TOC). For labels we are going to label features in this layer using the expression [NAME] & vbNewLine & [ADDRESS], which will print both the name and address of each restaurant near the point symbol.

Change the symbology to a big green circle. Dismiss the layer properties dialog.

You can view the entire extent of the data by right-clicking the layer name in the TOC and clicking Zoom to Layer. Then turn your base map display back on.

3.2 Top 24 Seattle coffee shops

It is highly unlikely that you would ever geocode a data set and not have some errors, so Matt and I have cooked up a data set that has some errors. We want to show you how these addresses geocode using the default composite locator compared to a composite locator we specify.

Click Geocode Addresses again; we will use the same local composite that we used before. But this time we are going to select not the Yelp addresses but the Bean Box top 24 coffee shops.

Open the csv in Excel. You can see that we have the name and the address in the file.

Open the address geocoder again. For the Address Input Field, choose Single Field and select the ADDRESS field.

Change the name of the output data set. Use the Windows Explorer to copy and paste the CSV file name, but add _run1, because we are going to do multiple runs using different composite address locators.

Again we will see the Geocoding Addresses dialog open. This time, of course, we have more addresses to match so it will take a little bit longer to complete.

When it is finished, change the layer styling so that the new point data set is distinct. For the single symbol we are going to choose a square with a different color from the restaurant points so that we do not get the coffee shops and restaurants confused.

Also, add the label.

Now we see the most popular coffee shops are spread out over Seattle.

Let’s look in detail at the results. Open the attribute table for the coffee shops.

We see that addresses were matched using locators including PointAddress and StreetAddress (as we had before), but now we also see StreetName, Postal, and AdminPlaces.

Looking at the three last records, they matched to a rather high score. In fact everything matched to scores above 85, which is commonly used as a minimum match score.

The StreetName and the Postal both matched with a score of 100.

So what is going on with these and the AdminPlaces match? Let’s scroll over to the right and look at the addresses.

General Porpoise seems to be missing the house number. Anchor Head Coffee has Century Link Plaza as the first part of the address, even though a full address follows it. The last one looks like it is missing one digit of the ZIP code, and there are perhaps some problems because these addresses are formatted without commas.

To have more control over the geocoding process we will create a composite address locator.

Before we do that I’m going to explain a little bit about what is going on here with the locator. So if you go into your catalog and then navigate into R:\Data\GIS\Geocoding\2020 Business Analyst Data\Geocoding Data you see all of the address locators that we have.

The one that we used is a local composite. We will examine its components. Open a separate ArcCatalog window. Right click on USA_LocalComposite and choose Properties.

We see that the composite locator is composed of USA_PointAddress, USA_StreetAddress, USA_StreetName, USA_postal, and USA_AdminPlaces. These are all different, with varying levels of precision.

The composite address locator will attempt to make a match with the highest precision component locator. If an address fails to geocode with a more precise locator, the composite will fail over to the next one. For this locator, it will fail over to very low precision component locators.
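
The failover behavior can be pictured with a small Python sketch. This is a conceptual model only, not how ArcGIS implements composite locators: the function `composite_geocode`, the toy lookup tables, and the shortened address strings are all hypothetical, and the default threshold of 85 is the commonly used minimum match score mentioned above.

```python
def composite_geocode(address, locators, min_score=85):
    """Try each component locator in order; return the first acceptable match.

    locators: list of (name, lookup) pairs, most precise first, where
    lookup(address) returns a match score (0-100) or None for no candidate.
    """
    for name, lookup in locators:
        score = lookup(address)
        if score is not None and score >= min_score:
            return name, score
    return None  # no component locator produced an acceptable match


# Toy reference data (hypothetical): each dict maps an input string to a score.
point_address = {"1020 E Union St 98122": 100}
street_name = {"E Union St 98122": 100}

locators = [
    ("USA_PointAddress", point_address.get),  # most precise, tried first
    ("USA_StreetName", street_name.get),      # much less precise fallback
]

print(composite_geocode("1020 E Union St 98122", locators))
print(composite_geocode("E Union St 98122", locators))
```

In this toy setup the complete address is matched by the first locator, while the address without a house number falls through to the street name locator and still reports a score of 100. A perfect score from a low precision locator is why the score alone does not tell you how good a geocode is.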

We will now create a higher precision composite locator. In ArcCatalog, navigate to your geocoding folder, then right click and select New > Composite Address Locator. Find the address locators in the R drive we used before (R:\Data\GIS\Geocoding\2020 Business Analyst Data\Geocoding Data). The first one we are going to select is USA_PointAddress, which has the same accuracy and precision as a GPS point. Next, add USA_StreetAddress, which uses the interpolated address along the street.

Name the new composite address locator USA_pointaddress_streetaddress_composite. Click OK; it seems to take a little while for that to get created. However, when the process completes, we will have another composite locator to use.
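
To make "interpolated address along the street" concrete, here is a minimal sketch of linear address interpolation in Python. It assumes a straight street segment with a known house number range and planar coordinates; the function name, the address range, and the coordinates are hypothetical. Real street address locators also handle odd and even sides of the street, offsets from the centerline, and curved segments, none of which are modeled here.

```python
def interpolate_address(house_number, low, high, start_xy, end_xy):
    """Place a house number along a straight segment by linear interpolation.

    low/high: house number range of the segment; start_xy/end_xy: coordinates
    of the ends of the segment where low and high are located.
    Returns (percent_along, (x, y)).
    """
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    # A single-number range has no length to interpolate; use the midpoint.
    fraction = 0.5 if high == low else (house_number - low) / (high - low)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return fraction * 100, (x, y)


# Hypothetical block: numbers 1000-1100 along a 100 m east-west segment.
percent_along, location = interpolate_address(1025, 1000, 1100, (0.0, 0.0), (100.0, 0.0))
print(percent_along, location)
```

The interpolated position is only an estimate of where the building sits on the block, which is why a point address match is generally preferred when one is available.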

Now we are going to geocode again, so tap Geocode Addresses and rather than using the other address locators, here we are going to add the local composite that we just created using only the higher quality reference data.

We will re-geocode the Bean Box data.

We will also go into Geocoding Options now to set the Minimum match score to 100 for both components; this basically means that only addresses with a perfect match to the reference data will geocode automatically. We will also add the Reference data ID and Percent along values in the output.

Click OK to start the geocoding process. When it completes, note the counts and percentages of matched and unmatched records.

Things are quite a bit different. In the previous run, all addresses obtained a match, even ones that seemed to be problematic; in this run there were five that did not match.

Close the Geocoding Addresses dialog and open the attribute table for the new layer. Scroll down and you will see the five records that did not match; the rest all matched with scores of 100.

Looking at these unmatched addresses again:

For these five addresses that did not geocode, we are going to attempt to rematch them.

Click on the review/rematch addresses button on the geocoding toolbar. If the currently selected layer in the TOC is the result of a geocoding run, the review/rematch dialog should open for that layer.

In the interface, you will see at the top a list of the addresses that are unmatched, and in the large panel in the lower right, the match candidates.

Because of the way that this interface is structured, scroll all the way to the right and resize the address and name columns. Then select both of these two columns. This will pop the focus back to the left part of the interface, so scroll to the right again and then click and drag those columns to the left. It is easier to look at the name and address with these columns at the left hand side of the interface.

Now, you will see in the large lower right panel, if there are candidates that could match, they will be displayed.

The location names of the candidates are shown; we have one PointAddress candidate with a match score of 97.3. Other candidates matched from StreetAddress.

The obvious error here is the misspelling of Fremont as Free Mont. In the Single Line Input control, change the street name to Fremont and then tap the TAB key to see that the candidate scores get refreshed. Because PointAddress is the best locator, select that one and click Match.

Next we move on to Cafe Vita Roasters, which shows that there were three candidates, one PointAddress and two StreetAddress candidates. It looks like the address was incomplete so there is a final 2 that should be added to the ZIP code. When we add that, then we can see our match scores increase. The other error is that Broadway is just “Broadway”, not Broadway Avenue East. Delete the Ave and see the point address has matched. Click Match.

Next is Anchor Head Coffee, where it looks like there is a preceding Century Link Plaza that should not be part of the address. Delete the extraneous characters and tap TAB; we can see that there is a perfect match with the PointAddress. Click Match.

Moving on to General Porpoise, it looks like the problem here is that we don’t have the house number. Here we will do something that you can do sometimes but it is not always recommended, which is to attempt finding the address using Google Maps. The reason this may not be recommended is if you are geocoding a patient database or a set of addresses that is otherwise sensitive, the data are being transferred to Google, which may be a violation of IRB rules.

In any case, we put General Porpoise into a Google Maps search. There are a few different stores, including 4520 Union Bay Place Northeast, and also 1020 East Union Street with ZIP code 98122, which is a match. Add the address to the Single Line Input control; now we see a 100 match score with USA_PointAddress.

The last one we have is the Boon Boona coffee, whose only problem seems to be the unit number in line with the address. Selecting and deleting the unit number brings us to a 92.43 match.

We can see from Google Maps that there is a 1223 Cherry Street, but the zip code is 98104.

What if we put in East Cherry Street to Google Maps? We see that puts us at Seattle University and there is the Boon Boona coffee. We will accept this as a match.

We are done geocoding this data set because we now see that all of our addresses have matched. Furthermore, they all matched with a score of 100.

Close the interactive rematch dialog and save the project.

4 Quality of geocodes

4.1 Locational error

Addresses that are geocoded to different reference data sets can have different locational errors. The Boon Boona Coffee shop had a match score of 100 using two different address locators. But we can see that they were placed at quite different locations.

To examine the error, use different symbology for each data set; here we can display the points using a blue triangle. We can use Zoom To Selected to see the two locations.

With the points selected in both layers, they will appear with a cyan symbol. The world street map base map layer shows that they are quite a distance from each other. We can use the Measure tool to estimate the distance between the two geocoded points that had the same address.

This shows a 2.5 km distance, which for most purposes is probably more error than you would want to deal with.

For a study looking at some characteristics on a statewide or regional basis, this amount of error might be acceptable. But for studies examining the local neighborhood retail environment, or if you were trying to get contextual information for the home location based on US census data, this kind of error could easily place the subject’s home location in an incorrect block, block group, or tract.
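
A distance read from the Measure tool can be cross-checked outside ArcMap if you have the latitude and longitude of both points. Below is a minimal sketch using the haversine formula, which treats the Earth as a sphere; the coordinates are hypothetical placeholders, not the actual Boon Boona geocodes, so substitute the values from your own attribute tables.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for two geocodes of the same address.
point_a = (47.6080, -122.3160)
point_b = (47.6080, -122.2830)
print(round(haversine_km(*point_a, *point_b), 2), "km")
```

The spherical approximation is adequate at city scale; for precise work, a geodesic calculation on an ellipsoid or a projected coordinate system would be the better choice.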

4.2 Summary of geocoding results

Here we will use R to quantify the descriptive statistics of some of the geocoding we performed.

Download and open geocoding_descriptives.Rmd. We will use the sf package to read the geocoding results and some simple summary functions to look at match rates by locator type and match type.
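
For readers who want to see the shape of such a summary without opening R, here is a rough Python sketch. It is not the contents of geocoding_descriptives.Rmd. It assumes the attribute table has been exported to a list of dictionaries and that the status and locator fields are named `Status` (with `M` for matched and `U` for unmatched) and `Loc_name`; check these names against your own attribute table. The rows shown are made up for illustration.

```python
from collections import Counter


def summarize_matches(rows):
    """Summarize geocoding results held as a list of dicts (one per address)."""
    total = len(rows)
    by_status = Counter(row["Status"] for row in rows)
    # Count locators only for matched records; unmatched rows have no locator.
    by_locator = Counter(row["Loc_name"] for row in rows if row["Status"] == "M")
    matched = by_status.get("M", 0)
    return {
        "total": total,
        "match_rate_pct": 100.0 * matched / total if total else 0.0,
        "by_status": dict(by_status),
        "by_locator": dict(by_locator),
    }


# Made-up rows for illustration only; they are not the workshop results.
rows = [
    {"Status": "M", "Loc_name": "USA_PointAddress", "Score": 100},
    {"Status": "M", "Loc_name": "USA_PointAddress", "Score": 100},
    {"Status": "M", "Loc_name": "USA_StreetAddress", "Score": 100},
    {"Status": "U", "Loc_name": "", "Score": 0},
]
print(summarize_matches(rows))
```

Reporting the match rate together with the breakdown by locator gives a reader a sense of both how many addresses were placed and how precisely.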

5 Additional geocoding (time permitting)

5.1 Seattle Schools

If you would like more practice geocoding, you can use the addresses for Seattle public schools. Use the same point/street address composite that we created.

The public school addresses are stored in multiple fields, rather than as a single field as we saw with the Yelp and BeanBox data. We have address and city. We do not have state, which can generally be ignored as long as there are ZIP code data.

Set the Geocoding Options (minimum match score 100, Reference data ID, and Percent along).

When geocoding completes, close the dialog and open the attribute table for that, just to have a glance.

There are 107 schools, and it looks like most of them matched to USA_PointAddress, whereas some of them matched to USA_StreetAddress. Let’s take a look at those: zoom to the extent of that layer and turn off the other geocoded layers.

They did all match to a score of 100. We can display these by categories to see which ones matched by point address, and which ones matched by street address.

That all the points were geocoded is unusual; this is probably because the data set came from the city, and the addresses were carefully recorded and vetted.

Typically there are more mismatches with volunteered data. For any other kind of data set where, after geocoding, you see some variation in the locator name, status, or match type, it is worth looking at and reporting match rates, scores, locators, etc. to get a sense of how well geocoding went, and possibly to change your geocoding preferences, tolerances, and so forth.

5.2 Rat sightings in New York City

We are going to use the same locator, but this time we are going to try to geocode those rat sightings from New York City. These data also have addresses in multiple fields: Incident.address, City, and Incident.Zip. These are all in New York, so we don’t really need the State.

The geocoding run took about 1/2 hour. If there is not time to run these, you can download the data set consisting of the initial pass of automated geocoding without any rematching: rat_sightings_20220304.gdb.zip.

There were a large number of sightings that did not have a street address, but did have cross streets. See the companion Rmd file for code to update the Incident.address for those sightings that only had cross streets.
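
The idea behind that update can be sketched in Python as follows. This is not the code from the companion Rmd file. The cross street field names `Cross.Street.1` and `Cross.Street.2` are assumptions about the data set, the example row is made up, and joining the two street names with `&` assumes that the locator recognizes that character as an intersection connector.

```python
def fill_intersection_address(row, connector=" & "):
    """Return a copy of row with Incident.address built from cross streets
    when the street address is blank; otherwise return the row unchanged."""
    address = (row.get("Incident.address") or "").strip()
    if address:
        return row  # a street address is present; leave it alone
    street1 = (row.get("Cross.Street.1") or "").strip()
    street2 = (row.get("Cross.Street.2") or "").strip()
    if street1 and street2:
        updated = dict(row)
        updated["Incident.address"] = street1 + connector + street2
        return updated
    return row  # not enough information to build an intersection


# Made-up example row; field names other than Incident.address are assumed.
sighting = {
    "Incident.address": "",
    "Cross.Street.1": "BROADWAY",
    "Cross.Street.2": "WEST 42 STREET",
}
print(fill_intersection_address(sighting)["Incident.address"])
```

Rows that have neither a street address nor both cross streets are left unchanged, so they will remain unmatched and can be reviewed separately.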

6 GIS analysis with results

6.1 Rat sightings kernel density estimator

6.2 Rat sightings: overlay with census tracts (intersect)