Rails & Postgres - Selectively Restoring From a Compressed pg_dump File

Problem

You have a backup of a postgres database from the pg_dump command that was output via the custom format resulting in a compressed file.

Now you want to take this file to another environment and don’t need certain tables fully restored with data.

Solution

namespace :db do
  desc "Selectively remove passed in tables data from being imported on restore"
  task selective_restore: :environment do |_t, args|
    raise("Don't run in production") if Rails.env.production?

    config = ActiveRecord::Base.configurations[Rails.env]
    database = config['database']
    user = config['username'] || nil
    password = config['password'] || nil
    port = config['port'] || 5432
    host = config['host'] || 'localhost'
    db_dump_filepath = Rails.root.join('tmp', 'db.dump')
    db_list_filepath = Rails.root.join('tmp', 'db.list')
    db_bak_filepath = Rails.root.join('tmp', 'db.list.bak')s

    `pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`

    args.extras.each do |arg|
      `sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
    end

    pg_restore_command = [].tap { |rc|
      rc << "PGUSER=#{user}" if user
      rc << "PGPASSWORD=#{password}" if password
      rc << "pg_restore -h #{host} -d #{database}"
      rc << "--verbose --clean --no-acl --no-owner"
      rc << "-L #{db_list_filepath} #{db_dump_filepath}"
    }.join(' ')

    `#{pg_restore_command}`

    `rm #{db_list_filepath} #{db_bak_filepath}`
  end
end

Details (If you’re interested…)

Here is what a compressed pg_dump file looks like.

compressed pg_dump file

We want to take this compressed file and push it through the pg_restore command with the -l flag to create a .list file.

`pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`

This will create a new file db.list and this file is a readable version that lists the contents of the archive.

db list file

Now we can take this newly created list file and run it through linux’s sed command to comment out each table passed in via the args to prevent that data from being imported on restore.

This is being done in the code block here:

args.extras.each do |arg|
  `sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
end

All this command is doing is using regex to look through the file to find the table passed in and if it is found add a ; to the beginning of the line. This comments the line out.

The sed command needs the -i.bak which creates a backup of the file before making these changes. Since this is used locally or in a non-production environment I’m not concerned with this being overwrote after each iteratiton. I delete this file once the task is complete.

When the db.list file is done being modified we can pass that file in to the pg_restore command with the -L option with the original db.dump file.

This is an example of what the command looks like that is built via the [].tap code block.

pg_restore -h localhost -d selective_db_restore_development --verbose --clean --no-acl --no-owner -L ./db.list ./db.dump

pg_restore will use that list file to only restore the elements and tables listed in that file.

Example Code

I have an example repo up on github if you wanted to check it and see this work.

In this repo I seeded a database with a bunch of data via the faker gem.

You can find the compressed db dump file lib/tasks folder.

You can run the following and it will restore all the tables.

bin/rake db:selective_restore

Then if you want to see it exclude tables a command like

the following can be run.

bin/rake db:selective_restore[hops,fermentables]

You will see when looking at the database after running this no data in the hops or fermentables table.