Introducing Query Tuning Workbooks to safely tune Postgres queries on production with pganalyze!

PostGIS vs. Geocoder in Rails

This article sets out to compare PostGIS in Rails with Geocoder and to highlight a couple of the areas where you'll want to (or need to) reach for one over the other. I will also present some of the terminology and libraries that I found along the way of working on this project and article as I set out to understand PostGIS better and how it is integrated with Rails.

If you are interested in learning how to work with geospatial data with PostGIS in Django I recommend having a look at our blog post Using GeoDjango and PostGIS in Django here.

PostGIS vs. Geocoder in Rails Picture via Annie Spratt on Unsplash

I have built a number of Rails applications over the years that show locations on a map, have nearby search functionality, and I had never used PostGIS before! How was this possible? The reason is that there is a Ruby gem named Geocoder which enables you to do these sorts of queries, and it's quite efficient! That said, there is a reason that PostGIS exists. For more complex geo queries I’d recommend reaching beyond Geocoder to PostGIS.

As an example, if you wanted to find homes which have a school within 1km of them, or if you wanted to draw an oddly shaped polygon on a map and search within it, this is the world where PostGIS shines and makes these complex geo queries possible.

In this article we will be covering:

  • PostGIS in Rails setup
  • Finding nearby records (Geocoder + PostGIS)
  • Finding records within a bounding box (Geocoder + PostGIS)
  • Finding records within a polygon (PostGIS)
  • Finding nearby related records (PostGIS)

The source code referenced in this article can be found here.

Installing PostGIS

Postgres comes with a number of built-in extensions that you can enable, but unfortunately PostGIS (Spatial and Geographic objects for Postgres) isn't one of them. In order to enable this extension, you will have to use a Postgres install with PostGIS support. I recommend using the official postgis docker image, but luckily many hosted Postgres solutions come with PostGIS already available. If you are not sure, you can query the available extensions with the following query:

select *
from pg_available_extensions
where name like '%postgis%'

If you'd like to see if the extension is already enabled, you can run this query:

select * from pg_extension

And finally, to enable this extension, you can use the command create extension postgis, but since we're working within Rails, there is a Gem that will take care of this step for us as we'll see below.

ActiveRecord PostGIS Adapter

If you have confirmed that your version of Postgres supports the postgis extension, you're ready to integrate it with your Rails application. This can be done by using the activerecord-postgis-adapter gem. Two things need to be done to get up and running. The first is to update the adapter within config/database.yml to be set to postgis. Next, if this is a new application, you can run rails db:create as normal, but if it is an existing one, you'll have to run the command rake db:gis:setup. This command is enabling the postgis extension in your database.

Our Example Data

We'll be working with sample data for a realtor website that allows us to find homes in a variety of ways, including homes that are nearby a local school. There are two models: homes and schools. The Rails migration to create these tables is below:

class CreateHomes < ActiveRecord::Migration[6.0]
  def change
    create_table :homes do |t|
      t.string :name, null: false
      t.string :status, null: false
      t.bigint :price, null: false
      t.integer :beds, null: false, default: 0
      t.integer :baths, null: false, default: 0
      t.st_point :coords, null: false, geographic: true
      t.float :longitude, null: false
      t.float :latitude, null: false
      t.timestamps

      t.index :coords, using: :gist
      t.index %i[latitude longitude]
      t.index :status
      t.index :price
    end

    create_table :schools do |t|
      t.st_point :coords, null: false, geographic: true

      t.index :coords, using: :gist
      t.timestamps
    end
  end
end

By using activerecord-postgis-adapter we are able to define PostGIS columns within our migration file. When working with PostGIS you can store a point (latitude + longitude) as a single column of type ts_point, whereas when working with Geocoder the latitude and longitude are stored as floats in separate columns. Because we are comparing the two approaches, we will store the data both ways, but typically you would choose one approach or the other.

PostGIS geographic columns can be indexed using GiST style indexes. GiST indexes are required over B-Tree indexes when working with geographic data because coordinates cannot be easily sorted along a single axis (such as numbers, letters, dates, etc...) in a way that would allow the database to speed up common geographic operations.

The example project for this article contains a seeds file (run with rake db:seed) which will generate 100k homes and 100 schools in and around the Atlanta, Georgia area.

Download Free eBook: Efficient Search in Rails with Postgres

Building a Geo Helper Class with PostGIS

The Rails PostGIS adapter is based on a library named RGeo, which while incredibly powerful, I found a little bit confusing due to a lack of documentation. I ended up building a small helper class to generate different geo objects for me. The first thing to point out is what SRID is. Just like the imperial and metric systems are used to measure and weigh amounts using an agreed upon reference point, coordinates also need a coordinate reference system to ensure that the latitude and longitude that one uses means the same thing to different people when referring to a single place on earth. 4326 is the spatial system used for GPS satellite navigation systems and the one we will be using within this article.

One last thing to define is what WKT is. Well-known Text representation of geometry is a string representation of a point, line string, and polygon (among other things) that we will be using in our examples in this article. This is the format Postgres (PostGIS) receives and displays geographic data types in.

class Geo
  SRID = 4326

  def self.factory
    @@factory ||= RGeo::Geographic.spherical_factory(srid: SRID)
  end

  def self.pairs_to_points(pairs)
    pairs.map { |pair| point(pair[0], pair[1]) }
  end

  def self.point(longitude, latitude)
    factory.point(longitude, latitude)
  end

  def self.line_string(points)
    factory.line_string(points)
  end

  def self.polygon(points)
    line = line_string(points)
    factory.polygon(line)
  end

  def self.to_wkt(feature)
    "srid=#{SRID};#{feature}"
  end
end

Finding Nearby Records with PostGIS and Geocoder

One of the most common geo queries used in applications is to find all records within X distance from a known point (the user's location, an event, a search, etc...). Because we installed Geocoder and added reverse_geocoded_by :latitude, :longitude to our Home class, we can use the nearby method to find all homes within 5km of this latitude and longitude (which happens to be Atlanta, Georgia). Geocoder likes to have arrays with latitude and then longitude, as opposed to PostGIS which prefers the exact opposite order!

Home.near([33.753746, -84.386330], 5).count(:all) # ~5ms

This query ran in about 5ms on my computer (searching through 100k records)... pretty fast! The reason it is fast is because we added an index on the latitude and longitude fields, but also because Geocoder applies a bounding box filter which utilises the index. Remember the Spatial Reference System (SRID) that we mentioned above? Because our coordinates do not take place on a Cartesian plane, we can’t use a standard distance formula to calculate the distance between two points. Although we won’t venture further into the math of this query below, it takes into consideration the Earth’s spherical nature when calculating the distance between two coordinates as specified by latitude and longitude. This article dives into more detail on these calculations if you are interested.

SELECT COUNT(*) FROM "homes" WHERE (homes.latitude BETWEEN 33.708779919704064 AND 33.798712080295935 AND homes.longitude BETWEEN -84.44041260768655 AND -84.33224739231345 AND (6371.0 * 2 * ASIN(SQRT(POWER(SIN((33.753746 - homes.latitude) * PI() / 180 / 2), 2) + COS(33.753746 * PI() / 180) * COS(homes.latitude * PI() / 180) * POWER(SIN((-84.38633 - homes.longitude) * PI() / 180 / 2), 2)))) BETWEEN 0.0 AND 5)

We'll have to build our own near query when working with PostGIS, but don't worry, it's pretty straight forward! The g_near method lives within the Home model, and takes advantage of the ST_DWithin function provided by PostGIS. Remember that we have to convert our point into the correct WKT format so that PostGIS understands the data we are passing it.

def self.g_near(point, distance)
  where(
    'ST_DWithin(coords, :point, :distance)',
    { point: Geo.to_wkt(point), distance: distance * 1000 } # wants meters not kms
  )
end

Home.g_near(Geo.point(-84.386330, 33.753746), 5).count # ~5ms

This query performs just about as fast as the Geocoder version (because of our GiST index on the coords column), but is definitely a little easier on the eyes to read.

SELECT COUNT(*) FROM "homes" WHERE (ST_DWithin(coords, 'srid=4326;POINT (-84.38633 33.753746)', 5000))

Finding Records Within a Bounding Box with PostGIS and Geocoder

Geocoder provides us a way to find all records within a bounding box (roughly a rectangle, ignoring projection onto a sphere), and we just have to pass it the bottom left (south west) and top right (north east) coordinates.

Home.within_bounding_box(
  [33.7250057553, -84.4224209302],
  [33.774350796, -84.3570139222]
).count # ~5ms

Because it can use the index on latitude and longitude, it is quite efficient.

SELECT COUNT(*) FROM "homes" WHERE (homes.latitude BETWEEN 33.7250057553 AND 33.774350796 AND homes.longitude BETWEEN -84.4224209302 AND -84.3570139222)

To perform a bounding box query using PostGis, we'll create a method named g_within_box inside of the Home model, and utilize a PostGIS function named ST_MakeEnvelope along with the && operator.

def self.g_within_box(sw_point, ne_point)
  where(
    "coords && ST_MakeEnvelope(:sw_lon, :sw_lat, :ne_lon, :ne_lat, #{
      Geo::SRID
    })",
    {
      sw_lon: sw_point.longitude,
      sw_lat: sw_point.latitude,
      ne_lon: ne_point.longitude,
      ne_lat: ne_point.latitude
    }
  )
end

Home.g_within_box(
  Geo.point(-84.4224209302, 33.7250057553),
  Geo.point(-84.3570139222, 33.774350796)
).count # ~5ms

Again, this version performs at about the same efficiency as the Geocoder version.

SELECT COUNT(*) FROM "homes" WHERE (coords && ST_MakeEnvelope(-84.4224209302, 33.7250057553, -84.3570139222, 33.774350796, 4326))

PostGIS vs. Geocoder in Rails - Bounding Box

Finding Records Within a Polygon with PostGIS and Geocoder

We're now into territory that requires PostGIS. To find records inside of a polygon, along with the help of our Geo class helper and the ST_Covers function from PostGIS, we can create a method named g_within_polygon in our Home model. This polygon is a triangle, where the last point is the same as the first one, thereby "closing" the shape of the polygon.

def self.g_within_polygon(points)
  polygon = Geo.polygon(points)
  where('ST_Covers(:polygon, coords)', polygon: Geo.to_wkt(polygon))
end

Home.g_within_polygon(
  Geo.pairs_to_points(
    [
      [-84.39731626974567, 33.75570358345219],
      [-84.33139830099567, 33.86524376001825],
      [-84.25243406759724, 33.770545357734925],
      [-84.39731626974567, 33.75570358345219]
    ]
  )
).count # ~5ms

This query remains efficient due to the use of our GiST index, searching through 100k records in about 5ms.

SELECT COUNT(*) FROM "homes" WHERE (ST_Covers('srid=4326;POLYGON ((-84.39731626974567 33.75570358345219, -84.33139830099567 33.86524376001825, -84.25243406759724 33.770545357734925, -84.39731626974567 33.75570358345219))', coords))

PostGIS vs. Geocoder in Rails - Bounding Box

Using PostGIS it is also possible to find related nearby records. What do I mean by that? Let's try to find available homes that are within 1km of a school. This can be done by joining to the schools table and utilizing ST_DWithin for the on clause. Starting with the SQL we'd like to produce:

SELECT
  count(DISTINCT homes.id)
FROM
  homes
  INNER JOIN schools ON ST_DWithin (homes.coords, schools.coords, 1000)
WHERE
  homes.status = 'available'

Within the Home model of our Rails application, we can create two scopes that allow us to find these homes. We're able to join 100k homes to the schools table (100 schools) based on their proximity in approximately 16ms.

class Home < ApplicationRecord
  scope :available, -> { where(status: 'available') }
  scope :near_school,
        lambda {
          select('DISTINCT ON (homes.id) homes.*').joins(
            'INNER JOIN schools ON ST_DWithin (homes.coords, schools.coords, 1000)'
          )
        }
end
# Example using the scopes declared above
Home.available.near_school.count('distinct homes.id') # 16ms

Conclusion

We've only scratched the surface of what you can do with PostGIS, yet we were able to cover a ton of functionality that is common among websites that allow you to filter results based on their location. That said, if PostGIS isn't available as an extension on your version of Postgres, or you aren't requiring the power that PostGIS provides, Geocoder offers you a great alternative.

Share this article: If you liked this article you might want to tweet it to your peers.

About the Author

Leigh Halliday is a guest author for the pganalyze blog. He is a developer based out of Canada who works at FlipGive as a full-stack developer. He writes about Ruby and React on his blog and publishes React tutorials on YouTube.


Enjoy blog posts like this?

Get them once a month to your inbox