Rails & Postgres - Selectively Restoring From a Compressed pg_dump File
Problem
You have a backup of a postgres database from the pg_dump
command that was output via the custom format resulting
in a compressed file.
Now you want to take this file to another environment and don’t need certain tables fully restored with data.
Solution
namespace :db do
desc "Selectively remove passed in tables data from being imported on restore"
task selective_restore: :environment do |_t, args|
raise("Don't run in production") if Rails.env.production?
config = ActiveRecord::Base.configurations[Rails.env]
database = config['database']
user = config['username'] || nil
password = config['password'] || nil
port = config['port'] || 5432
host = config['host'] || 'localhost'
db_dump_filepath = Rails.root.join('tmp', 'db.dump')
db_list_filepath = Rails.root.join('tmp', 'db.list')
db_bak_filepath = Rails.root.join('tmp', 'db.list.bak')s
`pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`
args.extras.each do |arg|
`sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
end
pg_restore_command = [].tap { |rc|
rc << "PGUSER=#{user}" if user
rc << "PGPASSWORD=#{password}" if password
rc << "pg_restore -h #{host} -d #{database}"
rc << "--verbose --clean --no-acl --no-owner"
rc << "-L #{db_list_filepath} #{db_dump_filepath}"
}.join(' ')
`#{pg_restore_command}`
`rm #{db_list_filepath} #{db_bak_filepath}`
end
end
Details (If you’re interested…)
Here is what a compressed pg_dump
file looks like.
We want to take this compressed file and push it through the
pg_restore
command with the -l
flag to create a .list
file.
`pg_restore -l #{db_dump_filepath} > #{db_list_filepath}`
This will create a new file db.list
and this file is a readable version that lists
the contents of the archive.
Now we can take this newly created list file and run it through
linux’s sed
command to comment out each table passed in via
the args to prevent that data from being imported on restore.
This is being done in the code block here:
args.extras.each do |arg|
`sed -i.bak '/TABLE DATA public #{arg.downcase}/ s/^/;/' #{db_list_filepath}`
end
All this command is doing is using regex to look through the
file to find the table passed in and if it is found add a ;
to
the beginning of the line. This comments the line out.
The sed
command needs the -i.bak
which creates a backup of the file before making these changes. Since this is used locally or in
a non-production environment I’m not concerned with this being
overwrote after each iteratiton. I delete this file once the
task is complete.
When the db.list
file is done being modified we can pass that
file in to the pg_restore
command with the -L
option with
the original db.dump
file.
This is an example of what the command looks like that is built
via the [].tap
code block.
pg_restore -h localhost -d selective_db_restore_development --verbose --clean --no-acl --no-owner -L ./db.list ./db.dump
pg_restore
will use that list file to only restore the elements and tables listed in that file.
Example Code
I have an example repo up on github if you wanted to check it and see this work.
In this repo I seeded a database with a bunch of data via the faker gem.
You can find the compressed db dump file lib/tasks
folder.
You can run the following and it will restore all the tables.
bin/rake db:selective_restore
Then if you want to see it exclude tables a command like
the following can be run.
bin/rake db:selective_restore[hops,fermentables]
You will see when looking at the database after running this
no data in the hops
or fermentables
table.