Rails & Postgres - Selectively Restoring From a Compressed pg_dump File
Problem
You have a backup of a postgres database from the pg_dump
command that was output via the custom format resulting
in a compressed file.
Now you want to take this file to another environment and don’t need certain tables fully restored with data.
Solution
namespace :db do
  desc "Selectively remove passed in tables data from being imported on restore"
  task selective_restore: :environment do |_t, args|
    raise("Don't run in production") if Rails.env.production?
    config = ActiveRecord::Base.configurations[Rails.env]
    database = config['database']
    user = config['username'] || nil
    password = config['password'] || nil
    port = config['port'] || 5432
    host = config['host'] || 'localhost'
    db_dump_filepath = Rails.root.join('tmp', 'db.dump')
    db_list_filepath = Rails.root.join('tmp', 'db.list')
    db_bak_filepath = Rails.root.join('tmp', 'db.list.bak')s
    `pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`
    args.extras.each do |arg|
      `sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
    end
    pg_restore_command = [].tap { |rc|
      rc << "PGUSER=#{user}" if user
      rc << "PGPASSWORD=#{password}" if password
      rc << "pg_restore -h #{host} -d #{database}"
      rc << "--verbose --clean --no-acl --no-owner"
      rc << "-L #{db_list_filepath} #{db_dump_filepath}"
    }.join(' ')
    `#{pg_restore_command}`
    `rm #{db_list_filepath} #{db_bak_filepath}`
  end
end
Details (If you’re interested…)
Here is what a compressed pg_dump file looks like.

We want to take this compressed file and push it through the
pg_restore command with the -l flag to create a .list file.
`pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`
This will create a new file db.list and this file is a readable version that lists
the contents of the archive.

Now we can take this newly created list file and run it through
linux’s sed command to comment out each table passed in via
the args to prevent that data from being imported on restore.
This is being done in the code block here:
args.extras.each do |arg|
  `sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
end
All this command is doing is using regex to look through the
file to find the table passed in and if it is found add a ; to
the beginning of the line. This comments the line out.
The sed command needs the -i.bak which creates a backup of the file before making these changes. Since this is used locally or in
a non-production environment I’m not concerned with this being
overwrote after each iteratiton. I delete this file once the
task is complete.
When the db.list file is done being modified we can pass that
file in to the pg_restore command with the -L option with
the original db.dump file.
This is an example of what the command looks like that is built
via the [].tap code block.
pg_restore -h localhost -d selective_db_restore_development --verbose --clean --no-acl --no-owner -L ./db.list ./db.dump
pg_restore will use that list file to only restore the elements and tables listed in that file.
Example Code
I have an example repo up on github if you wanted to check it and see this work.
In this repo I seeded a database with a bunch of data via the faker gem.
You can find the compressed db dump file lib/tasks folder.
You can run the following and it will restore all the tables.
bin/rake db:selective_restore
Then if you want to see it exclude tables a command like
the following can be run.
bin/rake db:selective_restore[hops,fermentables]
You will see when looking at the database after running this
no data in the hops or fermentables table.
