I recently switched from Django 1.6 to 1.7, and I began using migrations (I never used South).
Before 1.7, I used to load initial data with a fixture/initial_data.json file, which was loaded with the python manage.py syncdb command (when creating the database).
Now I've started using migrations, and this behavior is deprecated:
If an application uses migrations, there is no automatic loading of fixtures. Since migrations will be required for applications in Django 2.0, this behavior is considered deprecated. If you want to load initial data for an app, consider doing it in a data migration. (https://docs.djangoproject.com/en/1.7/howto/initial-data/#automatically-loading-initial-data-fixtures)
The official documentation does not have a clear example of how to do it, so my question is:
What is the best way to import such initial data using data migrations:
1. Write Python code with multiple calls to mymodel.create(...), or
2. Use or write a Django function (like calling loaddata) to load data from a JSON fixture file?
I prefer the second option.
I don't want to use South, as Django seems to be able to do it natively now.
Update: See @GwynBleidD's comment below for the problems this solution can cause, and see @Rockallite's answer below for an approach that's more durable to future model changes.
Assuming you have a fixture file in <yourapp>/fixtures/initial_data.json
Create your empty migration. In Django 1.7:
python manage.py makemigrations --empty <yourapp>
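A minimal sketch of what the resulting data migration could look like (the app label yourapp, the model MyModel, and the dependency are placeholders; the serializers.deserialize(..., ignorenonexistent=True) call is discussed in the comments below):

# <yourapp>/migrations/0002_load_initial_data.py (sketch)
import os

from django.core import serializers
from django.db import migrations

fixture_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'fixtures'))
fixture_filename = 'initial_data.json'

def load_fixture(apps, schema_editor):
    fixture_file = os.path.join(fixture_dir, fixture_filename)
    with open(fixture_file, 'rb') as fixture:
        # ignorenonexistent=True skips fixture fields that no longer exist on the model
        objects = serializers.deserialize('json', fixture, ignorenonexistent=True)
        for obj in objects:
            obj.save()

def unload_fixture(apps, schema_editor):
    # Brute force: delete every row of the model the fixture populated
    MyModel = apps.get_model('yourapp', 'MyModel')
    MyModel.objects.all().delete()

class Migration(migrations.Migration):
    dependencies = [
        ('yourapp', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(load_fixture, reverse_code=unload_fixture),
    ]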
Short version
You should NOT use the loaddata management command directly in a data migration.
# Bad example for a data migration
from django.db import migrations
from django.core.management import call_command

def load_fixture(apps, schema_editor):
    # No, it's wrong. DON'T DO THIS!
    call_command('loaddata', 'your_data.json', app_label='yourapp')

class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]
Long version
loaddata utilizes django.core.serializers.python.Deserializer, which uses the most up-to-date models to deserialize historical data in a migration. That's incorrect behavior.
For example, suppose there is a data migration which utilizes the loaddata management command to load data from a fixture, and it's already been applied in your development environment.
Later, you decide to add a new required field to the corresponding model, so you do it and make a new migration against your updated model (possibly providing a one-off value for the new field when ./manage.py makemigrations prompts you). You run the next migration, and all is well.
Finally, you're done developing your Django application, and you deploy it on the production server. Now it's time to run all the migrations from scratch in the production environment.
However, the data migration fails. That's because the deserialized model from the loaddata command, which represents the current code, can't be saved with empty data for the new required field you added. The original fixture lacks the necessary data for it!
But even if you update the fixture with required data for the new field, the data migration still fails: when the data migration runs, the next migration, which adds the corresponding column to the database, has not been applied yet. You can't save data to a column which does not exist!
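To make the scenario concrete, a hypothetical model (the names are illustrative, not from the question):

# models.py, after the fixture-loading data migration already exists
from django.db import models

class Country(models.Model):
    name = models.CharField(max_length=100)
    # Added later as a required field: the old fixture has no value for it,
    # and when migrations replay from scratch, this column doesn't exist yet
    # at the point the data migration runs.
    iso_code = models.CharField(max_length=2)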
Conclusion: in a data migration, the loaddata command introduces potential inconsistency between the model and the database. You should definitely NOT use it directly in a data migration.
The Solution
The loaddata command relies on the django.core.serializers.python._get_model function to get the corresponding model from a fixture, which will return the most up-to-date version of a model. We need to monkey-patch it so it gets the historical model.
(The following code works for Django 1.8.x)
# Good example for a data migration
from django.db import migrations
from django.core.serializers import base, python
from django.core.management import call_command

def load_fixture(apps, schema_editor):
    # Save the old _get_model() function
    old_get_model = python._get_model

    # Define new _get_model() function here, which utilizes the apps argument to
    # get the historical version of a model. This piece of code is directly stolen
    # from django.core.serializers.python._get_model, unchanged. However, here it
    # has a different context, specifically, the apps variable.
    def _get_model(model_identifier):
        try:
            return apps.get_model(model_identifier)
        except (LookupError, TypeError):
            raise base.DeserializationError("Invalid model identifier: '%s'" % model_identifier)

    # Replace the _get_model() function on the module, so loaddata can utilize it.
    python._get_model = _get_model

    try:
        # Call loaddata command
        call_command('loaddata', 'your_data.json', app_label='yourapp')
    finally:
        # Restore old _get_model() function
        python._get_model = old_get_model

class Migration(migrations.Migration):
    dependencies = [
        # Dependencies to other migrations
    ]

    operations = [
        migrations.RunPython(load_fixture),
    ]
Does objects = serializers.deserialize('json', fixture, ignorenonexistent=True) suffer from the same issue as loaddata? Or does ignorenonexistent=True cover all possible issues?
The ignorenonexistent=True argument has two effects: 1) it ignores models in a fixture which are not in the most current model definitions, and 2) it ignores fields of a fixture's model which are not in the most current corresponding model definition. Neither handles the new-required-field-in-the-model situation. So, yes, I think it suffers from the same issue as plain loaddata.
One caveat: fixtures can reference related objects via natural_key(), which this method doesn't seem to support - I just replaced the natural_key value with the actual id of the referenced model.
Inspired by some of the comments (namely n__o's) and the fact that I have a lot of initial_data.* files spread out over multiple apps, I decided to create a Django app that would facilitate the creation of these data migrations.
Using django-migration-fixture you can simply run the following management command and it will search through all your INSTALLED_APPS for initial_data.* files and turn them into data migrations:
./manage.py create_initial_data_fixtures
Migrations for 'eggs':
  0002_auto_20150107_0817.py:
Migrations for 'sausage':
  Ignoring 'initial_data.yaml' - migration already exists.
Migrations for 'foo':
  Ignoring 'initial_data.yaml' - not migrated.
See django-migration-fixture for install/usage instructions.
In order to give your database some initial data, write a data migration. In the data migration, use the RunPython function to load your data.
Don't use the loaddata command, as that approach is deprecated.
Your data migrations will be run only once. Migrations are an ordered sequence: when the 003_xxxx.py migration is run, Django records in the database that this app is migrated up to this one (003), and will run only the following migrations.
So you have to write multiple calls to myModel.create(...) (or use a loop) in the RunPython function?
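A minimal sketch of that style (the app label yourapp, the Country model, and the row values are hypothetical):

from django.db import migrations

def create_initial_countries(apps, schema_editor):
    # Use the historical model, not a direct import from models.py
    Country = apps.get_model('yourapp', 'Country')
    for name in ['country 1', 'country 2', 'country 3']:
        Country.objects.create(name=name)

def delete_initial_countries(apps, schema_editor):
    Country = apps.get_model('yourapp', 'Country')
    Country.objects.filter(name__in=['country 1', 'country 2', 'country 3']).delete()

class Migration(migrations.Migration):
    dependencies = [
        ('yourapp', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(create_initial_countries, delete_initial_countries),
    ]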
The solutions presented above didn't work for me, unfortunately. I found that every time I change my models I have to update my fixtures. Ideally I would instead write data migrations to modify created data and fixture-loaded data similarly.
To facilitate this I wrote a quick function which will look in the fixtures directory of the current app and load a fixture. Put this function into a migration at the point in the model history that matches the fields in the migration.
Call it with, for example, RunPython(load_fixture('badger', 'stoat')); the full function is at gist.github.com/danni/1b2a0078e998ac080111, and a sketch of the idea follows below.
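The gist itself isn't reproduced here, but a helper along those lines might look like this (a sketch, not the gist's exact code; it assumes JSON fixtures named <name>.json in the app's fixtures/ directory, and it still deserializes against current models, hence the placement advice above):

import os
from functools import partial

from django.core import serializers

def _load_fixture(fixture_names, apps, schema_editor):
    # fixtures/ lives next to the migrations/ directory of the current app
    fixture_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'fixtures')
    for name in fixture_names:
        with open(os.path.join(fixture_dir, '%s.json' % name), 'rb') as f:
            for obj in serializers.deserialize('json', f, ignorenonexistent=True):
                obj.save()

def load_fixture(*fixture_names):
    # Returns a callable with the (apps, schema_editor) signature RunPython expects
    return partial(_load_fixture, fixture_names)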
In my opinion fixtures are a bit bad. If your database changes frequently, keeping them up to date will soon become a nightmare. Actually, it's not only my opinion; it's explained much better in the book "Two Scoops of Django".
Instead, I'd write a Python file to provide the initial setup. If you need something more, I'd suggest you look at Factory Boy.
If you need to migrate some data, you should use data migrations.
There's also "Burn Your Fixtures, Use Model Factories" about using fixtures.
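For reference, a minimal Factory Boy sketch (the app, model, and field names are hypothetical):

import factory

from myapp.models import Country  # hypothetical app and model

class CountryFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Country

    name = factory.Sequence(lambda n: 'Country %d' % n)

# Usage, e.g. in a setup script, shell session, or test:
# CountryFactory.create_batch(4)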
On Django 2.1, I wanted to load some models (like country names, for example) with initial data.
But I wanted this to happen automatically right after the execution of initial migrations.
So I thought that it would be great to have an sql/ folder inside each application that required initial data to be loaded. Then within that sql/ folder I would have .sql files with the required DMLs to load the initial data into the corresponding models, for example:
INSERT INTO appName_modelName (fieldName)
VALUES
    ('country 1'),
    ('country 2'),
    ('country 3'),
    ('country 4');
[Screenshot of the sql/ folder with numbered .sql files: https://i.stack.imgur.com/YJQ5K.png]
Also I found some cases where I needed the SQL scripts to be executed in a specific order, so I decided to prefix the file names with a consecutive number, as seen in the screenshot above.
Then I needed a way to load any SQL scripts available inside any application folder automatically when running python manage.py migrate.
So I created another application named initial_data_migrations and added it to the list of INSTALLED_APPS in the settings.py file. Then I created a migrations folder inside it and added a file called run_sql_scripts.py (which is actually a custom migration), as seen in the screenshot below:
[Screenshot of initial_data_migrations/migrations/run_sql_scripts.py: https://i.stack.imgur.com/4pjZv.png]
I created run_sql_scripts.py so that it takes care of running all SQL scripts available within each application. It fires when someone runs python manage.py migrate. This custom migration also adds the involved applications as dependencies, so that it attempts to run the SQL statements only after the required applications have executed their 0001_initial.py migrations (we don't want to attempt running a SQL statement against a non-existent table).
Here is the source of that script:
import os
import itertools

from django.db import migrations
from YourDjangoProjectName.settings import BASE_DIR, INSTALLED_APPS

SQL_FOLDER = "/sql/"

# (sql/ folder path, app name) pairs for every installed app that has an sql/ folder
APP_SQL_FOLDERS = [
    (os.path.join(BASE_DIR, app + SQL_FOLDER), app) for app in INSTALLED_APPS
    if os.path.isdir(os.path.join(BASE_DIR, app + SQL_FOLDER))
]

# Per-app lists of .sql file paths; sorting honors the numeric filename prefixes
SQL_FILES = [
    sorted([path + file for file in os.listdir(path) if file.lower().endswith('.sql')])
    for path, app in APP_SQL_FOLDERS
]

def load_file(path):
    with open(path, 'r') as f:
        return f.read()

class Migration(migrations.Migration):

    # Depend on each involved app's first migration, so the tables exist
    dependencies = [
        (app, '__first__') for path, app in APP_SQL_FOLDERS
    ]

    operations = [
        migrations.RunSQL(load_file(f)) for f in list(itertools.chain.from_iterable(SQL_FILES))
    ]
I hope someone finds this helpful; it worked just fine for me! If you have any questions, please let me know.
NOTE: This might not be the best solution, since I'm just getting started with Django, but I still wanted to share this how-to since I didn't find much information while googling about it.
python manage.py migrate load_initial_data will NOT detect any changes, so this is useful for REALLY static initial data with no changes allowed. Though, it's an improvement over the accepted answer.
call_command('loaddata', fixture_filename, app_label='<yourapp>') will also look directly in the app's fixtures dir (hence no need to build the fixture's full path).
Be careful: loaddata deserializes with models from the current models.py files, which can have some extra fields or other changes. If any changes were made after creating the migration, it will fail (so we can't even create schema migrations after that migration). To fix that, we can temporarily change the apps registry that the serializer works with to the registry provided to the migration function in its first parameter; the registry to patch is located at django.core.serializers.python.apps.
Ideally a solution would use the historical app registry without changing a global variable (which could cause problems in a hypothetical future with parallel database migrations).
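A sketch of that last idea - read the fixture manually and resolve models through the apps registry passed into the migration, so no global state is patched (it assumes a JSON fixture containing only simple, non-relational fields; the paths and names are hypothetical):

import json
import os

from django.db import migrations

def load_initial_data(apps, schema_editor):
    fixture_path = os.path.join(
        os.path.dirname(os.path.dirname(__file__)), 'fixtures', 'initial_data.json')
    with open(fixture_path) as f:
        records = json.load(f)
    for record in records:
        # record['model'] is e.g. 'yourapp.country'; resolve it through the
        # historical registry instead of importing from models.py.
        model = apps.get_model(record['model'])
        model.objects.create(pk=record.get('pk'), **record['fields'])

class Migration(migrations.Migration):
    dependencies = [
        ('yourapp', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(load_initial_data),
    ]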