SLURM, Django and Migratation to Postgres

by Sebastien Mirolo on Tue, 29 Jul 2014

When we picked SLURM because of the accounting feature, we also deployed a MySQL database as recommended. We also wanted to tightly integrate our Django project with SLURM so we created models out of the slurmdbd schemas and tweaked the foreign key definitions, etc. The following is the account of everything that went wrong.

Slurmdbd Caching

One of issue we encountered early yet took a long time to figure out is that writes to the slurmdbd daemon do make it to the database. More precisely they sometimes get committed right away, while sometimes do not for a very long time. The sacct commands communicate to the slurmdbd daemon using a purpose built protocol. Since the slurmdbd does not seem to export a REST API and relying on the database directly is unreliable, the tight integration approach started to look doomed.

South Migrations

Django South will not do anything with the apps that have no migration (require ./manage.py syncdb). South will try its best to create/update tables for apps which have a migrations module.

In fortylines web application, we are dealing with tables in a mysql database which are created and maintained by a third-party software stack (slurm). We have extracted models (./manage.py inspectdb) from those slurmdbd tables to access the information from within our Django site. We also have created Django-managed models to run the web service. Of course we have constraints between the "unmanaged" tables and the managed tables.

In retrospect, it seems obvious we should have created two apps:

  • One for third-party-managed tables which we will carefully avoid to create South schema migration for.
  • One for Django-managed models which we will create a South migrations module for.

The division seems arbitrary since third-party and pure django models and views are highly integrated. None-the-less once we deploy code and need to migrate the database in production, the separation became necessary in order to use South migrations.

Migrating from MySQL to Postgres

Since we picked MySQL because of SLURM but now will end-up with a lossy coupled architecture, it was time to migrate, at least our own code, to Postgres.

Postgres has a lot of features that support South migrations a lot better than MySQL. Postgres is also the default database engine for Heroku. With the sale of Sun to Oracle and the MariaDB fork of MySQL, it was just time.

I found a Django-related post that connected a Django app to both database engine to do the migration. Here, I preferred Converting Rails applications from MySQL to PostgreSQL as a starting point. I also relied on information here and here to

  1. Export the data as SQL statements
  2. Massage the data dump file
  3. Import the data file to Postgres

MySQL export

Here the shell script to do the export:

$ cat mysql-export.sh
#!/bin/sh

db_name=$1
echo "SET standard_conforming_strings = 'off';"
echo "SET backslash_quote = 'on';"
echo "UPDATE pg_cast set castcontext='a' where casttarget = 'boolean'::regtype;"
mysqldump -u root -p --no-create-info \
    --compatible=postgresql --complete-insert \
    --default-character-set=utf8 $db_name \
| sed -e 's/^\(UN\)\?LOCK.*$//g'
echo "UPDATE pg_cast set castcontext='e' where casttarget = 'boolean'::regtype;"

$ ./mysql-export.sh db_name > db_name.sql

We set standard_conforming_strings and backslash_quote to deal with escaped chars. We also set automatic coercion to boolean at the beginning since MySQL emits 0/1 while Postgres expects 't'/'f'. Coercion is restored to explicit at the end of the script. Finally we remove the LOCK/UNLOCK directive. Postgres does not like them.

Postgres Import

Postgres will complain on forward reference foreign keys so we had to massage the data dump a little. Hopefully we did not had reference circles in our original data.

$ psql -1 -U postgres -d db_name -f db_name.sql

Finally we must update the auto-increment to the latest value otherwise we will end up with duplicate key next time we try to insert a record.

$ SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
$ SELECT last_value FROM example_id_seq;
$ SELECT setval('example_id_seq', (SELECT MAX(id) from example));

Done.

by Sebastien Mirolo on Tue, 29 Jul 2014