Setting up a Read Only Follower Database on Heroku

As CoachUp has grown, the amount of statistics and reporting needed to support the core business has also grown.  Although it may not be evident to passersby, every business has back end reporting and statistics needs in addition to customer facing features.  There are some tools and services that make reporting easier, but I have yet to find one that actually covers all of our use cases (If you don’t believe me, I challenge you to leave a comment and make a recommendation or tweet me up @aradmand).  This means that we need to store metadata and statistical information ourselves.  If your application is backed by a database, you’re probably using the same database to store both customer and statistics related data.

When a business is just getting off the ground this is perfectly acceptable. However, this presents a couple of problems:

  1. As the business grows, the amount of application data/logic and statistical data/logic also grows. It makes sense to decouple and separate application data from statistical data and the larger each set grows, the more difficult this will become.
  2. Reports can often be taxing on the database.  If the same database is being used to run your application as well as statistics reports, it means that application performance can be degraded by running those reports.  Not ideal for your customers!

Solutions:

There several options when you get yourself into the aforementioned scenario (note: this is not an exhaustive list, but it enumerates the practical options that we realistically have on the table given time/effort/resources):

  1. Move all reporting and statistical data/logic out of your original application to a completely separate database and create a separate reporting application.
  2. Leave the statistical data/logic coupled in your database; set up a follower database; and alter your application code to have all reports pull data out of the follower database.
  3. Leave the statistical data/logic coupled in your database; set up a follower database; and create a separate application that runs using the follower database.

There are pros and cons of each approach, which I’m more than happy to discuss in the comments below or on social media.  Ultimately, we decided to take approach #3.

Why?

I’m glad you asked! We decided approach #3 was best for us because we ultimately want to implement scenario #1, with a separate application AND completely decoupled statistics data.  However, given time an resources available, we determined that the last approach would be a move in the right direction and could be achieved relatively quickly.

How?

Setting up this configuration was fairly straightforward:

1.  Create a follower Read-only database that mirrors your production database

Heroku makes this super simple.  A follower can be created for any Standard, Premium, or Enterprise tier database that is itself not a follower (that is, followers cannot be chained).

Use the heroku pg:info command to ensure your database is capable of supporting a follower. You also need to know the add-on name of the master database (i.e. HEROKU_POSTGRESQL_*COLOR*_URL):

heroku pg:info
=== HEROKU_POSTGRESQL_PURPLE_URL (DATABASE_URL)
...
Fork/Follow: Available
Status: available
...

Create a follower database by provisioning a new heroku-postgresql Standard, Production, or Enterprise add-on database and specify the master database to follow with the --follow flag.

heroku addons:add heroku-postgresql:ronin --follow HEROKU_POSTGRESQL_CHARCOAL_URL

That’s all!  Give the database some time to catch up to the master DB (depends on the amount of data you have).

For full details, you can visit the following article:

https://devcenter.heroku.com/articles/heroku-postgres-follower-databases

 

2. Provision a second Heroku app, which will run off the follower database just configured

Creating a new app on Heroku is fairly straightforward. Follow the instructions listed here:

https://devcenter.heroku.com/articles/creating-apps

 

3. Configure the new app to use the follower database

On Heroku, applications use the DATABASE_URL environment variable to determine the database to which they should connect. So setting up the new application to use the follower database is just a matter or setting the DATABASE_URL variable to the location of the follower database:

Use heroku config to determine the URL of the follower database:

heroku config
=== Config Vars
...
HEROKU_POSTGRESQL_GOLD_URL:  postgres://u83npgc802nbpu:p4vj4nee924qcc5b8d21dp9m9pf@ec2-54-235-122-116.compute-1.amazonaws.com:5682/d347pnrm01llrm
...

Set the value of DATABASE_URL in your new app to the URL of the follower database:

heroku config:add DATABASE_URL='postgres://u83npgc802nbpu:p4vj4nee924qcc5b8d21dp9m9pf@ec2-54-235-122-116.compute-1.amazonaws.com:5682/d347pnrm01llrm' -a new-app-name

Voila! Done!

You now have a new application running off a follower database. Now, just deploy and push code to this new app as normal.

The result?  We can now run reports on production data using this follower database and not put any burden at all on our production site. Perfect!

 

Caveats and Unexpected Issues

One unexpected issue encounter during this project was our use of the DelayedJob gem. We rely on this gem to asynchronously execute longer tasks for us.  However, due to the implementation of DelayedJob, it required a DB write to the Job table each time a task was scheduled. Since we were now running off a Read-only database, this was not an option.

The solution? Resque to the rescue! Resque is a Redis-based job queuing system.  Switching to Resque allowed us to maintain the same functionality we already had in our application (with minimal alterations) but also worked in the new Read-only DB environment.

Stay tuned for a follow-up blog post describing how to set up Resque for your Heroku application!

Has anyone else attempted similar solutions? Any suggestions for how to make this configuration better?