
Prefixed base62 UUIDv7 Object IDs with Ecto

Ecto supports integer IDs and UUIDv4 out of the box, but I needed both UUIDv7 and human-readable Object IDs. This turned out to be a breeze to implement in Ecto.

ID, UUIDv7, and Object ID

Incremental integer IDs (e.g. 21) are easy to get started with. They take up little space and are sequential, so new rows are appended at the end of the index (aka index locality) and sorting is easy. However, they have many downsides: they are vulnerable to enumeration attacks, can leak sensitive information (such as how many records you have), cause key conflicts that make scaling and replication painful, and can’t be generated offline (in your application logic) because each ID depends on the previous one.

A UUID (e.g. 3bff9b42-f5e4-425e-aca2-39b490dfc878) solves all these issues. It is a 128-bit value, usually written as 32 hexadecimal digits separated by hyphens. UUIDs can be generated offline and inserted as-is into the database, and the risk of collision is extremely low. RFC 4122 defines five standard versions. UUIDv4 is generally the preferred one: it carries only random data, which keeps the collision risk low, and it is what Ecto.UUID generates. The downside is that UUIDv4 has no index locality: random IDs are scattered throughout the index, so sorting by ID gives no meaningful order.

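UUIDv7 (e.g. 0188aadc-f449-7818-8862-5eff12733f64) is a proposed standard (it has since been standardized in RFC 9562) that solves the index locality issue of UUIDv4: its most significant 48 bits are a Unix epoch timestamp in milliseconds, followed by the version bits and random data. It’s the recommended time-based version in the New UUID Formats IETF draft and a popular replacement for UUIDv4 where index locality matters. We’ll use UUIDv7 in our database.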
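To make that layout concrete, here is a minimal, dependency-free sketch of how the 128 bits of a UUIDv7 fit together, assuming the draft’s layout: a 48-bit millisecond timestamp, the version nibble 7, 12 random bits, the 2-bit variant 0b10, and 62 more random bits. UUIDv7Sketch is a hypothetical name and this is not how Uniq implements it; in the app, Uniq generates the IDs.

```elixir
defmodule UUIDv7Sketch do
  # Illustration only (hypothetical module); in the app, Uniq generates UUIDv7.
  #
  # Layout, most significant bits first:
  #   48-bit Unix timestamp (ms) | version 7 (4 bits) | 12 random bits |
  #   variant 0b10 (2 bits) | 62 random bits
  # A UUIDv4 puts random data in the 48 timestamp bits instead.

  # Returns a 16-byte binary. The timestamp is a parameter so the ordering
  # is easy to demonstrate.
  def generate(unix_ms \\ System.system_time(:millisecond)) do
    # 80 random bits; 74 are used and the last 6 are discarded.
    <<rand_a::12, rand_b::62, _discard::6>> = :crypto.strong_rand_bytes(10)
    <<unix_ms::48, 7::4, rand_a::12, 2::2, rand_b::62>>
  end

  # Reads the millisecond timestamp back out of the first 48 bits.
  def timestamp_ms(<<unix_ms::48, _rest::80>>), do: unix_ms
end
```

Because the timestamp occupies the most significant bits, IDs generated in a later millisecond compare greater byte for byte, which is where the index locality comes from. Decoding the example UUID above (0188aadc-f449-7818-8862-5eff12733f64) with timestamp_ms/1 gives 1686493787209 ms.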

Externally, in our API and web interface, we would like human-readable Object IDs. The UUID format doesn’t read well in URIs: /accounts/0188aadc-f449-7818-8862-5eff12733f64. We can use base62 encoding to shorten it and drop the hyphens: /accounts/02tRrww6GFm4urcMhyQpAS. Object IDs should also carry context, so we will add a prefix to each ID: /accounts/acct_02tRrww6GFm4urcMhyQpAS.

Why base62? Base62 uses the same alphanumeric characters as base64URL, minus - and _. We already use the underscore to separate the prefix from the ID, and I think hyphens make the ID harder to read. With base62 we get a short, alphanumeric ID without special characters.
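That base62 still gives a short ID can be checked with a little arithmetic: a UUID is 128 bits, so in a given base it needs the smallest number of digits d with base^d ≥ 2^128. A small sketch of that calculation (EncodedLength is a hypothetical helper name):

```elixir
defmodule EncodedLength do
  # Hypothetical helper: the smallest number of digits d with base^d >= 2^128,
  # i.e. how many characters any 128-bit UUID needs in the given base.
  @uuid_space Integer.pow(2, 128)

  def digits_for_uuid(base) when is_integer(base) and base >= 2, do: count(base, 1, base)

  # capacity is base^digits: how many distinct values fit in that many characters.
  defp count(_base, digits, capacity) when capacity >= @uuid_space, do: digits
  defp count(base, digits, capacity), do: count(base, digits + 1, capacity * base)
end
```

digits_for_uuid(16) is 32 (hex without hyphens), while digits_for_uuid(62) and digits_for_uuid(64) are both 22, so dropping - and _ costs no extra length compared to unpadded base64URL. The same 22 shows up as @base62_uuid_length, the padding width in the module.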

The Object ID becomes very useful with polymorphic fields and for debugging. We will know what type of resource we’re dealing with just by looking at the ID. There’s much more to this, which has been covered at length in the blog post Designing APIs for humans: Object IDs by Paul Asjes.
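As a small illustration of the polymorphic case, the prefix alone is enough to tell which kind of resource an ID refers to. This is a minimal sketch assuming a hypothetical registry: ObjectIdRouter, the "user" prefix, and the :account/:user atoms are illustrative, and a real app would map prefixes to schema modules instead.

```elixir
defmodule ObjectIdRouter do
  # Hypothetical registry; a real app would map prefixes to schema modules
  # such as MyApp.Accounts.Account.
  @prefixes %{"acct" => :account, "user" => :user}

  # Returns the resource type encoded in an Object ID's prefix.
  def resource_type(object_id) when is_binary(object_id) do
    with [prefix, _encoded] <- String.split(object_id, "_", parts: 2),
         {:ok, type} <- Map.fetch(@prefixes, prefix) do
      {:ok, type}
    else
      _ -> :error
    end
  end
end
```

For example, ObjectIdRouter.resource_type("acct_02tRrww6GFm4urcMhyQpAS") returns {:ok, :account}, while an unknown prefix or an ID without a prefix returns :error.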

Let us continue with the Ecto implementation.

Prefixed base62 UUIDv7 in Ecto

First, we’ll install the Uniq library to generate UUIDv7:

# mix.exs
defmodule MyApp.MixProject do
  use Mix.Project

  # ...

  def deps do
    [
      # ...
      {:uniq, "~> 0.5.0"}
    ]
  end

  #...
end

Now we’ll set up a custom Ecto.ParameterizedType module that base62-encodes and prefixes the UUIDs.

We’ll first set up the init/1 callback to require the prefix. init/1 is also called for associations, where the prefix option isn’t given, so we must distinguish between the primary key and a foreign key.

Uniq is configured to generate UUIDv7 and to dump it to the database in its raw 16-byte binary format.

defmodule MyApp.PrefixedUUID do
  @moduledoc """
  Generates prefixed base62 encoded UUIDv7.

  ## Examples

      @primary_key {:id, MyApp.PrefixedUUID, prefix: "acct", autogenerate: true}
      @foreign_key_type MyApp.PrefixedUUID
  """
  use Ecto.ParameterizedType

  @impl true
  def init(opts) do
    schema = Keyword.fetch!(opts, :schema)
    field = Keyword.fetch!(opts, :field)
    uniq = Uniq.UUID.init(schema: schema, field: field, version: 7, default: :raw, dump: :raw)

    case opts[:primary_key] do
      true ->
        prefix = Keyword.get(opts, :prefix) || raise "`:prefix` option is required"

        %{
          primary_key: true,
          schema: schema,
          prefix: prefix,
          uniq: uniq
        }

      _any ->
        %{
          schema: schema,
          field: field,
          uniq: uniq
        }
    end
  end

  @impl true
  def type(_params), do: :uuid

  # ...
end

Next, we’ll implement cast/2, which validates the slug, and dump/3, which decodes it into the UUID that gets stored. cast/2 is called on external input (not on data loaded from the database), and its output is what dump/3 later receives. dump/3 is called when data is written to the database.

To decode the slug, we first split it into the prefix and the encoded part, then decode the encoded UUID. We also make sure the prefix matches the expected one: for a primary key, that’s the :prefix option; for a belongs_to association, we must fetch the prefix from the related schema module.

Note that nil is handled because a belongs_to field can be optional.

  @impl true
  def cast(nil, _params), do: {:ok, nil}

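  def cast(data, params) when is_binary(data) do
    with {:ok, prefix, _uuid} <- slug_to_uuid(data, params),
         {prefix, prefix} <- {prefix, prefix(params)} do
      {:ok, data}
    else
      _ -> :error
    end
  end

  # Anything that isn't a string (or nil) can't be a valid Object ID
  def cast(_data, _params), do: :error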

  defp slug_to_uuid(string, _params) do
    with [prefix, slug] <- String.split(string, "_"),
         {:ok, uuid} <- decode_base62_uuid(slug) do
      {:ok, prefix, uuid}
    else
      _ -> :error
    end
  end

  defp prefix(%{primary_key: true, prefix: prefix}), do: prefix

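  # If we deal with a belongs_to association, we need to fetch the prefix from
  # the primary key type of the associated schema module
  defp prefix(%{schema: schema, field: field}) do
    %{related: related, related_key: related_key} = schema.__schema__(:association, field)

    case related.__schema__(:type, related_key) do
      # Parameterized type representation before Ecto 3.12
      {:parameterized, __MODULE__, %{prefix: prefix}} -> prefix
      # Representation since Ecto 3.12
      {:parameterized, {__MODULE__, %{prefix: prefix}}} -> prefix
    end
  end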

  @impl true
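  def dump(nil, _, _), do: {:ok, nil}

  def dump(slug, dumper, params) when is_binary(slug) do
    case slug_to_uuid(slug, params) do
      {:ok, _prefix, uuid} -> Uniq.UUID.dump(uuid, dumper, params.uniq)
      :error -> :error
    end
  end

  # Non-string values can't be decoded into a UUID
  def dump(_data, _dumper, _params), do: :error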
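For context on what ends up in the database: with dump: :raw, the UUID is stored as the 16-byte binary a Postgres uuid column holds, and that same binary is what comes back on load. A stdlib-only sketch of that conversion (RawUUIDSketch is hypothetical and not Uniq’s implementation):

```elixir
defmodule RawUUIDSketch do
  # Hypothetical helpers, not Uniq's implementation.

  # Hyphenated string -> 16-byte binary (what a Postgres uuid column stores).
  def to_raw(uuid) when is_binary(uuid) do
    case Base.decode16(String.replace(uuid, "-", ""), case: :mixed) do
      {:ok, <<_::128>> = raw} -> {:ok, raw}
      _ -> :error
    end
  end

  # 16-byte binary -> lowercase 8-4-4-4-12 string.
  def from_raw(<<a::binary-size(4), b::binary-size(2), c::binary-size(2),
                 d::binary-size(2), e::binary-size(6)>>) do
    {:ok, Enum.map_join([a, b, c, d, e], "-", &Base.encode16(&1, case: :lower))}
  end

  def from_raw(_other), do: :error
end
```

Round-tripping "7232b37d-fc13-44c0-8e1b-9a5a07e24921" through to_raw/1 and from_raw/1 returns the original string.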

Now we’ll implement load/3 and autogenerate/1, which encode the UUID into a slug. load/3 is called when we read the UUID from the database. Note that nil is handled here as well, because an optional belongs_to field can be nil in the database.

autogenerate/1 calls Uniq to generate a new UUIDv7. To encode the UUID, we fetch the prefix and prepend it to the base62-encoded UUID string.

  @impl true
  def load(data, loader, params) do
    case Uniq.UUID.load(data, loader, params.uniq) do
      {:ok, nil} -> {:ok, nil}
      {:ok, uuid} -> {:ok, uuid_to_slug(uuid, params)}
      :error -> :error
    end
  end

  defp uuid_to_slug(uuid, params) do
    "#{prefix(params)}_#{encode_base62_uuid(uuid)}"
  end

  @impl true
  def autogenerate(params) do
    uuid_to_slug(Uniq.UUID.autogenerate(params.uniq), params)
  end

Now we have prefixed, base62-encoded UUIDv7 working! All that’s left is to implement the base62 encoder/decoder (encode_base62_uuid/1 and decode_base62_uuid/1).
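The base62 step can be tried on its own in iex before wiring it into the module. This is a standalone sketch under the same assumptions as the module’s private helpers (same 0-9, A-Z, a-z alphabet, zero-padded to 22 characters), written tail-recursively; Base62UUIDSketch is a hypothetical name, and its integer input stands in for String.to_integer(hex, 16) of the UUID’s hex digits.

```elixir
defmodule Base62UUIDSketch do
  # Hypothetical standalone codec: same alphabet (0-9, A-Z, a-z) and
  # 22-character zero padding as the module's private helpers.
  @alphabet ~c"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
  @width 22
  @max Integer.pow(2, 128)

  # Encodes a 128-bit integer (e.g. String.to_integer(hex, 16)) as base62.
  def encode(number) when is_integer(number) and number >= 0 and number < @max do
    number
    |> digits([])
    |> Enum.map(&Enum.at(@alphabet, &1))
    |> List.to_string()
    |> String.pad_leading(@width, "0")
  end

  # Tail-recursive: collects base62 digits, most significant first.
  defp digits(number, acc) when number < 62, do: [number | acc]
  defp digits(number, acc), do: digits(div(number, 62), [rem(number, 62) | acc])

  # Decodes a base62 string with Horner's method; unknown characters give :error.
  def decode(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.reduce_while({:ok, 0}, fn char, {:ok, acc} ->
      case Enum.find_index(@alphabet, &(&1 == char)) do
        nil -> {:halt, :error}
        value -> {:cont, {:ok, acc * 62 + value}}
      end
    end)
  end
end
```

encode(0) gives 22 zeros, the same value the null UUID maps to in the tests, and the largest 128-bit value still fits in 22 characters.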

Let’s put it all together and then use it in our app. Here are the complete module and its test module:

defmodule MyApp.PrefixedUUID do
  @moduledoc """
  Generates prefixed base62 encoded UUIDv7.

  ## Examples

      @primary_key {:id, MyApp.PrefixedUUID, prefix: "acct", autogenerate: true}
      @foreign_key_type MyApp.PrefixedUUID
  """
  use Ecto.ParameterizedType

  @impl true
  def init(opts) do
    schema = Keyword.fetch!(opts, :schema)
    field = Keyword.fetch!(opts, :field)
    uniq = Uniq.UUID.init(schema: schema, field: field, version: 7, default: :raw, dump: :raw)

    case opts[:primary_key] do
      true ->
        prefix = Keyword.get(opts, :prefix) || raise "`:prefix` option is required"

        %{
          primary_key: true,
          schema: schema,
          prefix: prefix,
          uniq: uniq
        }

      _any ->
        %{
          schema: schema,
          field: field,
          uniq: uniq
        }
    end
  end

  @impl true
  def type(_params), do: :uuid

  @impl true
  def cast(nil, _params), do: {:ok, nil}

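  def cast(data, params) when is_binary(data) do
    with {:ok, prefix, _uuid} <- slug_to_uuid(data, params),
         {prefix, prefix} <- {prefix, prefix(params)} do
      {:ok, data}
    else
      _ -> :error
    end
  end

  # Anything that isn't a string (or nil) can't be a valid Object ID
  def cast(_data, _params), do: :error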

  defp slug_to_uuid(string, _params) do
    with [prefix, slug] <- String.split(string, "_"),
         {:ok, uuid} <- decode_base62_uuid(slug) do
      {:ok, prefix, uuid}
    else
      _ -> :error
    end
  end

  defp prefix(%{primary_key: true, prefix: prefix}), do: prefix

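  # If we deal with a belongs_to association, we need to fetch the prefix from
  # the primary key type of the associated schema module
  defp prefix(%{schema: schema, field: field}) do
    %{related: related, related_key: related_key} = schema.__schema__(:association, field)

    case related.__schema__(:type, related_key) do
      # Parameterized type representation before Ecto 3.12
      {:parameterized, __MODULE__, %{prefix: prefix}} -> prefix
      # Representation since Ecto 3.12
      {:parameterized, {__MODULE__, %{prefix: prefix}}} -> prefix
    end
  end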

  @impl true
  def load(data, loader, params) do
    case Uniq.UUID.load(data, loader, params.uniq) do
      {:ok, nil} -> {:ok, nil}
      {:ok, uuid} -> {:ok, uuid_to_slug(uuid, params)}
      :error -> :error
    end
  end

  defp uuid_to_slug(uuid, params) do
    "#{prefix(params)}_#{encode_base62_uuid(uuid)}"
  end

  @impl true
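  def dump(nil, _, _), do: {:ok, nil}

  def dump(slug, dumper, params) when is_binary(slug) do
    case slug_to_uuid(slug, params) do
      {:ok, _prefix, uuid} -> Uniq.UUID.dump(uuid, dumper, params.uniq)
      :error -> :error
    end
  end

  # Non-string values can't be decoded into a UUID
  def dump(_data, _dumper, _params), do: :error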

  @impl true
  def autogenerate(params) do
    uuid_to_slug(Uniq.UUID.autogenerate(params.uniq), params)
  end

  @impl true
  def embed_as(format, params), do: Uniq.UUID.embed_as(format, params.uniq)

  @impl true
  def equal?(a, b, params), do: Uniq.UUID.equal?(a, b, params.uniq)

  # UUID Base62 encoder/decoder

  @base62_uuid_length 22
  @uuid_length 32

  defp encode_base62_uuid(uuid) do
    uuid
    |> String.replace("-", "")
    |> String.to_integer(16)
    |> base62_encode()
    |> String.pad_leading(@base62_uuid_length, "0")
  end

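  # Only accept canonical 22-character slugs, so every UUID has exactly one
  # valid Object ID
  defp decode_base62_uuid(string) when byte_size(string) == @base62_uuid_length do
    with {:ok, number} <- base62_decode(string) do
      number_to_uuid(number)
    end
  end

  defp decode_base62_uuid(string), do: {:error, "got invalid base62 uuid length; #{inspect(string)}"}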

  defp number_to_uuid(number) do
    number
    |> Integer.to_string(16)
    |> String.downcase()
    |> String.pad_leading(@uuid_length, "0")
    |> case do
      <<g1::binary-size(8), g2::binary-size(4), g3::binary-size(4), g4::binary-size(4), g5::binary-size(12)>> ->
        {:ok, "#{g1}-#{g2}-#{g3}-#{g4}-#{g5}"}

      other ->
        {:error, "got invalid base62 uuid; #{inspect other}"}
    end
  end

  # Base62 encoder/decoder

  @base62_alphabet ~c"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

  for {digit, idx} <- Enum.with_index(@base62_alphabet) do
    defp base62_encode(unquote(idx)), do: unquote(<<digit>>)
  end

  defp base62_encode(number) do
    base62_encode(div(number, unquote(length(@base62_alphabet)))) <>
      base62_encode(rem(number, unquote(length(@base62_alphabet))))
  end

  defp base62_decode(string) do
    string
    |> String.split("", trim: true)
    |> Enum.reverse()
    |> Enum.reduce_while({:ok, {0, 0}}, fn char, {:ok, {acc, step}} ->
      case decode_base62_char(char) do
        {:ok, number} -> {:cont, {:ok, {acc + number * Integer.pow(unquote(length(@base62_alphabet)), step), step + 1}}}
        {:error, error} -> {:halt, {:error, error}}
      end
    end)
    |> case do
      {:ok, {number, _step}} -> {:ok, number}
      {:error, error} -> {:error, error}
    end
  end

  for {digit, idx} <- Enum.with_index(@base62_alphabet) do
    defp decode_base62_char(unquote(<<digit>>)), do: {:ok, unquote(idx)}
  end

  defp decode_base62_char(char), do: {:error, "got invalid base62 character; #{inspect char}"}
end

defmodule MyApp.PrefixedUUIDTest do
  use MyApp.DataCase

  alias MyApp.PrefixedUUID
  alias Uniq.UUID

  defmodule TestSchema do
    use Ecto.Schema

    @primary_key {:id, PrefixedUUID, prefix: "test", autogenerate: true}
    @foreign_key_type PrefixedUUID

    schema "test" do
      belongs_to :test, TestSchema
    end
  end

  @params PrefixedUUID.init(schema: TestSchema, field: :id, primary_key: true, autogenerate: true, prefix: "test")
  @belongs_to_params PrefixedUUID.init(schema: TestSchema, field: :test, foreign_key: :test_id)
  @loader nil
  @dumper nil

  @test_prefixed_uuid "test_3TUIKuXX5mNO2jSA41bsDx"
  @test_uuid UUID.to_string("7232b37d-fc13-44c0-8e1b-9a5a07e24921", :raw)
  @test_prefixed_uuid_with_leading_zero "test_02tREKF6r6OCO2sdSjpyTm"
  @test_uuid_with_leading_zero UUID.to_string("0188a516-bc8c-7c5a-9b68-12651f558b9e", :raw)
  @test_prefixed_uuid_null "test_0000000000000000000000"
  @test_uuid_null UUID.to_string("00000000-0000-0000-0000-000000000000", :raw)
  @test_prefixed_uuid_invalid_characters "test_" <> String.duplicate(".", 32)
  @test_uuid_invalid_characters String.duplicate(".", 22)
  @test_prefixed_uuid_invalid_format "test_" <> String.duplicate("x", 31)
  @test_uuid_invalid_format String.duplicate("x", 21)

  test "cast/2" do
    assert PrefixedUUID.cast(@test_prefixed_uuid, @params) == {:ok, @test_prefixed_uuid}
    assert PrefixedUUID.cast(@test_prefixed_uuid_with_leading_zero, @params) == {:ok, @test_prefixed_uuid_with_leading_zero}
    assert PrefixedUUID.cast(@test_prefixed_uuid_null, @params) == {:ok, @test_prefixed_uuid_null}
    assert PrefixedUUID.cast(nil, @params) == {:ok, nil}
    assert PrefixedUUID.cast("otherprefix" <> @test_prefixed_uuid, @params) == :error
    assert PrefixedUUID.cast(@test_prefixed_uuid_invalid_characters, @params) == :error
    assert PrefixedUUID.cast(@test_prefixed_uuid_invalid_format, @params) == :error
    assert PrefixedUUID.cast(@test_prefixed_uuid, @belongs_to_params) == {:ok, @test_prefixed_uuid}
  end

  test "load/3" do
    assert PrefixedUUID.load(@test_uuid, @loader, @params) == {:ok, @test_prefixed_uuid}
    assert PrefixedUUID.load(@test_uuid_with_leading_zero, @loader, @params) == {:ok, @test_prefixed_uuid_with_leading_zero}
    assert PrefixedUUID.load(@test_uuid_null, @loader, @params) == {:ok, @test_prefixed_uuid_null}
    assert PrefixedUUID.load(@test_uuid_invalid_characters, @loader, @params) == :error
    assert PrefixedUUID.load(@test_uuid_invalid_format, @loader, @params) == :error
    assert PrefixedUUID.load(@test_prefixed_uuid, @loader, @params) == :error
    assert PrefixedUUID.load(nil, @loader, @params) == {:ok, nil}
    assert PrefixedUUID.load(@test_uuid, @loader, @belongs_to_params) == {:ok, @test_prefixed_uuid}
  end

  test "dump/3" do
    assert PrefixedUUID.dump(@test_prefixed_uuid, @dumper, @params) == {:ok, @test_uuid}
    assert PrefixedUUID.dump(@test_prefixed_uuid_with_leading_zero, @dumper, @params) == {:ok, @test_uuid_with_leading_zero}
    assert PrefixedUUID.dump(@test_prefixed_uuid_null, @dumper, @params) == {:ok, @test_uuid_null}
    assert PrefixedUUID.dump(@test_uuid, @dumper, @params) == :error
    assert PrefixedUUID.dump(nil, @dumper, @params) == {:ok, nil}
    assert PrefixedUUID.dump(@test_prefixed_uuid, @dumper, @belongs_to_params) == {:ok, @test_uuid}
  end

  test "autogenerate/1" do
    assert prefixed_uuid = PrefixedUUID.autogenerate(@params)
    assert {:ok, uuid} = PrefixedUUID.dump(prefixed_uuid, nil, @params)
    assert {:ok, %UUID{format: :raw, version: 7}} = UUID.parse(uuid)
  end
end

With this, we can set the key types in our schema modules:

@primary_key {:id, MyApp.PrefixedUUID, prefix: "acct", autogenerate: true}
@foreign_key_type MyApp.PrefixedUUID

We can make it more DRY by setting up and using a MyApp.Schema module instead of Ecto.Schema in our schema modules:

defmodule MyApp.Schema do
  @moduledoc """
  This module defines the default schema settings for schema modules
  in MyApp.

  Primary and foreign keys are prefixed UUIDs.

  ## Usage

      defmodule MyApp.Accounts.Account do
        use MyApp.Schema, prefix: "acct"

        # ...
      end
  """

  defmacro __using__(opts \\ []) do
    prefix = Keyword.fetch!(opts, :prefix)

    quote do
      use Ecto.Schema

      @primary_key {:id, MyApp.PrefixedUUID, prefix: unquote(prefix), autogenerate: true}
      @foreign_key_type MyApp.PrefixedUUID

      @type t :: %__MODULE__{}
    end
  end
end
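The __using__/1 mechanism itself is plain Elixir. A dependency-free sketch of the same pattern shows the :prefix option being required at compile time and baked into each module that calls use; PrefixedSchemaSketch, AccountSketch, and __object_id_prefix__/0 are hypothetical names, not part of Ecto.

```elixir
defmodule PrefixedSchemaSketch do
  # Hypothetical stand-in for MyApp.Schema without Ecto: it requires a
  # :prefix option and bakes it into every module that calls `use`.
  defmacro __using__(opts) do
    prefix = Keyword.fetch!(opts, :prefix)

    quote do
      @object_id_prefix unquote(prefix)

      # Hypothetical helper so other code can look the prefix up at runtime.
      def __object_id_prefix__, do: @object_id_prefix
    end
  end
end

defmodule AccountSketch do
  use PrefixedSchemaSketch, prefix: "acct"
end
```

Leaving out :prefix makes Keyword.fetch!/2 raise a KeyError while the using module is compiled, just as it would with MyApp.Schema.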

Hi, I'm Dan Schultzer. I write this blog, work a lot in Elixir, maintain several open source projects, and help companies streamline their development process.