Handle schema changes and data changes using migrations?
ashley [2:18 PM]
What are different strategies folks use for handling schema changes and data changes and using migrations? Once you have a production system running but need to do schema changes and data changes or initialization, what are the strategies folks have used, and why?
I definitely know and understand why doing changes to data in migrations is not a good idea. So, then, what are the ways you’ve handled it? How do you reconcile having code that might live ‘outside’ of your system? (e.g. rake files, etc.) Do you keep it incorporated through capistrano or the like?
We’re having good discussions about this on the project I’m working on and am interested in gathering ideas and considerations and lessons learned. (edited)
excid3 [2:45 PM]
@ashley i think the main concerns are 1. what if something fails halfway through, how do you make sure that doesn't break the current running version and 2. how do make sure there is no data loss if something happens
ashley
[2:47 PM]
yup. Also: How far back to you want to guarantee that you can do rollbacks? This is of particular concern with any kind of data manipulation.
excid3 [2:50 PM]
you probably want to make sure things are working in production for a week or so before making it unrollbackable
ashley
[2:57 PM]
Right. That addresses production stuff. And then you also must make the testing data and environments and the developer data and environments coherent. With this kind of strategy, you’re essentially saying “From this point [tag] forward, you must first start with this db schema and this initial data and move forward. And that rollbacks are not guaranteed.” You can only run db scheme migrations forward.
[3:01]
Still leaves open the question: Where do you put data changes if not in migrations? (ex: you add states to some list of all possible states for some object/model; ex: you change the initial state for some object/model)
What if that data change must happen after db migration a1
and then you have a migration later that depends on both the schema change and the data change?
(Yes, the system is still in relatively early stages; new business rules and concepts are being worked through so things can be in flux.)
[3:03]
I don’t think there is just one way (tm) to handle those things. I’m quite sure that there are different strategies that have costs/benefits that you must consider for each project. I guess I’m looking for a list or good write up of common strategies and real world experiences with them. (And also free donuts.)
esparkman [3:20 PM]
blue/green deployments help in some regard
[3:21]
you rollout to one set of server groups, pull them out of your production pool, verify, put them back in, rinse and repeat for the remaining instances
mclosson [3:30 PM]
Also its a good idea not to use ActiveRecord model classes in migrations to manipulate data. Instead you should use direct SQL to manipulate tables and records because the model names can be changed and refactored over time and this will cause the migrations to fail in the future for new developers or deployments. Example:
- In a migration after you update the schema you try to update all the records like so:
- Thing.all.each { |thing| thing.update(someattr: some_calculation) }
- Then you rename Thing to Item in the future, now your migration will fail. Instead you should use SQL like: schema change then… sql = “UPDATE things SET someattr …” ActiveRecord::Base.connection.execute(sql) This way the migration will never fail regardless over the class names.
inopinatus [3:57 PM]
tip re. @mclosson’s recommendation: if you’re not comfortable with SQL or just have a Really Complicated change to make as part of the migration, prototype the query in AR then call .to_sql
on it.
[4:00]
personally I am comfortable having data changes in the migration. But I generally prefer to do complex data changes via multiple migrations, in this sequence:
- Change code so that it will work with both the old and new schema. Commit, test, deploy.
- Write, commit and deploy the migration that sets up the new column or table structure or whatever
- Change code so that it only uses the new schema. Commit, test, deploy.
- Write, commit and deploy a migration that removes the now unused schema elements.
[4:03]
I never roll back in production. I don’t think it’s wise to even try. If you’ve blundered that badly and prod is hosed, it’s likely the rollback contains more landmines that you will miss because you are now panicking. Instead, put up your maintenance apology page, write a fix, and roll forwards. (edited)
[4:04]
Rollbacks are for dev, test, staging environments.
inopinatus [4:09 PM]
Complex migrations are like taking a corner at high speed on a two-wheeled vehicle. If you understand what you’re doing, you can amaze onlookers whilst remaining well inside your comfort zone. If you don’t understand … you’re risking a nasty injury.
[4:10]
@excid3 Maybe there’s scope for a short series on complicated migrations where data and code change in nontrivial ways.
ashley [5:41 PM]
“model obesity” excellent term, @inopinatus!
This is a software engineering issue not limited to Rails, for sure. It’s interesting to me that there’s never been a group of patterns that have been developed and generally accepted in the Rails arena.
@mclosson re: “Also its a good idea not to use ActiveRecord model classes in migrations to manipulate data. ….” use SQL instead.
I’m not sure I totally agree with that. IMHO, you’re just trading one set of tools for another (Ruby/Rails ActiveRecord vs. SQL), but the issues remains the same.