← Lessons |

Schema Migrations That Don't Drift

What Most Migration Systems Miss

Most teams start with a basic migration runner: numbered SQL files, a tracker table, a command that applies what’s pending. It works fine until the gaps start showing up.

A developer applies a schema change manually in staging because the runner was broken. They forget to record it. The next deploy tries to re-apply it, finds the table exists, and fails. Someone else edits an already-applied migration to fix a typo. Prod and staging drift apart silently. The tracker says a migration ran, but the schema doesn’t match what the migration produces.

Each of these gaps is small individually. Collectively, they erode confidence in the migration system until no one trusts it, and schema changes become the most anxiety-inducing part of a deploy.

The standard we built for UbixCore starts from a different premise: the migration system should be the authoritative source of truth for what every database looks like, with no gaps.

Timestamp Filenames Fix Merge Conflicts

The first decision is naming. Most systems use sequential integers: 001_create_users.sql, 002_add_email.sql. The problem surfaces when two engineers branch off the same commit and both write migrations. Engineer A writes 003_add_phone.sql. Engineer B also writes 003_add_index.sql. One of them has to renumber on merge. At scale, this happens constantly.

UbixCore uses UTC timestamps: YYYYMMDDHHMMSS_description.sql. Timestamps are unique by construction — two engineers writing migrations in the same second is implausible. Timestamps order naturally. No renumbering, no merge conflict.

sql/migrations/20260505143045_pre_attribution_referrer_tables.sql
sql/migrations/20260612091522_add_email_to_admin_users.sql

Every migration also begins with a structured header that the runner parses:

-- Migration: 20260505143045_pre_attribution_referrer_tables
-- Database: VSCASH
-- Description: Lookup tables for the pre-attribution chain.
-- Author: Christopher W. Olsen

One database per migration. If a logical change spans two databases, write two files and let the timestamp ordering handle sequencing.

One Tracker Table for Everything

Most migration frameworks put a schema_migrations table inside each application database. That works when you have N independent databases on N servers owned by N teams.

UbixCore’s reality is different: all databases are schemas on one shared cluster, maintained by one team. A single tracker table (SYSTEMS.Schema_Migrations) records every applied migration across every database. One place to look. One place for CI to check. One row to read when the question is “did this migration land on prod?”

The tracker records the migration ID, which database it targeted, when it was applied, who applied it (cli:<username>, ci:<deploy-id>, or manual:<username> for emergency manual applies), and how long it took. It also records a SHA-256 checksum of the migration’s body.

Applied Migrations Are Immutable

The checksum is how the system detects a class of mistake that manual processes can’t catch: someone editing a migration after it’s been applied.

On every migrate:up and on migrate:status --verify, the runner recomputes each applied file’s checksum and compares it against the recorded value. A mismatch fails immediately:

“Migration 20260505143045_pre_attribution_referrer_tables has been edited after application. Migrations are forward-only; write a new migration to amend the schema.”

CI runs migrate:status --verify as a gate on every deploy. A checksum mismatch blocks the deploy.

This enforces forward-only migrations in a way that policy alone can’t. To fix a mistake in an applied migration, you write a new migration. The old one stays as it was. Prod is always reproducible from the migration history — you can replay every applied migration into a clean database and get exactly what prod looks like.

Tiered CI Strictness

Not every environment needs the same strictness. Engineers iterate on schema changes locally, sometimes apply manually, sometimes revert and try again. Blocking every dev PR on drift makes the team slower without meaningful safety benefit.

The policy has three tiers:

EnvironmentChecksum mismatchSchema drift
devLoud, non-blockingLoud, non-blocking
stagingBlocks deployBlocks deploy
prodBlocks deploy + pagesBlocks deploy + pages

Dev is noisy by design. Staging and prod are strict. A migration that passes the staging gate will pass through to prod — the tiered policy means every migration has been validated in a strict environment before it touches production.

Schema vs Seed Split

Migrations and seed data are different things and belong in different places.

Migrations (sql/migrations/) are structural: CREATE TABLE, ALTER TABLE, CREATE INDEX. They’re applied exactly once. The runner fails loudly if you try to re-apply one.

Seeds (sql/seeds/) are reference data: country lists, default flag values, lookup tables. They’re idempotent — INSERT ... ON DUPLICATE KEY UPDATE — and can be re-run without breaking anything.

When a logical change requires both a new table and initial data, write two files: a migration for the structure, a seed for the rows. The runner handles them independently. Seeds can be refreshed on every deploy as a routine sync; migrations are applied exactly once.

Destructive Operations Get Five Layers of Protection

Schema migrations are powerful. DROP TABLE, TRUNCATE, ALTER TABLE ... DROP COLUMN, and unbounded DELETE FROM can all wipe prod data irreversibly. Five layers protect the platform, stacked from detection through process:

Layer 1 — Detection at parse time. The runner classifies any migration containing DROP TABLE, TRUNCATE TABLE, RENAME TABLE, ALTER TABLE ... DROP COLUMN, or unbounded DELETE FROM as destructive before any SQL hits the database.

Layer 2 — Required Destructive: header. A destructive migration must add a fifth header line explaining why the operation is safe:

-- Migration: 20260612091522_drop_legacy_voyeur_columns
-- Database: VSCASH
-- Description: Drop unused voyeur_* columns from Performer_Login.
-- Author: Christopher W. Olsen
-- Destructive: Columns last referenced Q1 2025; verified zero reads in prod.

The runner refuses to apply a destructive migration without this header. The point is to force the author to type the word destructive before they can drop a table.

Layer 3 — Runtime acknowledgement on staging and prod. On non-dev environments, the runner aborts when destructive migrations are pending and requires --i-acknowledge-destructive to proceed. The flag is logged into the tracker row permanently.

Layer 4 — Pre-apply backup. Before running a destructive migration on staging or prod, the runner takes a mysqldump --single-transaction snapshot of every table the migration touches. Retained for 90 days.

Layer 5 — Two-approver PR rule. Any PR touching a migration file with a Destructive: header requires two approvers in GitLab. The lint surfaces the destructive flag during review so the second approver knows what they’re approving.

Soft-Delete-First Convention

The strongest protection is behavioral: never drop in the same migration that retires usage. The convention:

  1. Migration A renames to a _deprecated_ prefix, making the object invisible to active code but recoverable.
  2. Migration B (at least 30 days later, after monitoring confirms zero reads) actually drops it.

Both operations are flagged destructive. The 30-day window gives the platform real recovery time — a rename can be undone in seconds; a drop cannot.

The policy doesn’t prevent a determined engineer from ignoring it. What it does is ensure that every step is deliberate and auditable. The Destructive: header stays in the git history. The +destructive-ack suffix stays in the tracker. The backup stays on disk for 90 days. Harm can still happen; it just can’t happen silently.

Manual Applies Are Reconcilable

In an emergency — runner broken, hotfix needed, environment offline from CI — manual schema changes are permitted. The rule: follow up immediately with migrate:reconcile <migration_id> --reason="...". This inserts the tracker row with applied_by = manual:<username>. The runner won’t try to re-apply it. CI’s drift check passes.

An applied migration without a tracker row is a P3 issue. It gets caught by CI’s drift check, but it creates work for whoever lands the next migration. The reconcile command exists so the friction of doing it right is lower than the friction of cleaning it up later.

What This Buys

The most important property of this system isn’t any individual feature — it’s that it can answer a simple question at any moment: exactly what state is this database in, how did it get there, and can it be reproduced?

Migration systems that can’t answer that question with confidence are systems the team learns not to trust. Once trust is lost, schema changes get done cautiously, manually, and with anxiety — which is the worst possible state for something that should be routine.

The timestamp filenames, the checksum enforcement, the tiered CI, the destructive operation layers: all of it is in service of one thing. The migration history is the truth. Prod is always reproducible from it.

Tags: Database Migrations DevOps UbixCore Architecture