Published on

Hierarchical tree depth filter in Ecto

I was dealing with an interesting problem working on a hierarchical model. What if we want to exclude all nodes to a certain depth? We have a tree looking as seen below and want to exclude all nodes at depth 1 (which would be nodes D, F, and C):

       A
      / \
     B   C
    / \
   D   E
        \
         F

Our schema references the parent node:

defmodule MyApp.Tree.Node do
  use Ecto.Schema

  schema "nodes" do
    belongs_to :parent_node, Node

    timestamps()
  end
end

We’ll use CTE to recursively go from the leaf nodes up the branch until we’ve reached the depth we want to exclude. This will result in a list of all the nodes we must exclude in our query.

@doc """
This is a CTE that will fetch the node ids for n depth from leaf nodes.
"""
def filter_node_depth_query(depth) when depth > 0 do
  # Select all parent node ids
  parent_node_ids_query =
    Node
    |> where([n], not is_nil(n.parent_node_id))
    |> select([n], n.parent_node_id)

  # Select all leaf nodes as starting depth
  leaf_nodes_query =
    Node
    |> where([n], n.id not in subquery(parent_node_ids_query))
    |> select([n], %{depth: 1, id: n.id, parent_node_id: n.parent_node_id})

  nodes_recursion_query =
    Node
    |> join(:inner, [pn], n in "node_tree", on: n.parent_node_id == pn.id)
    |> select([pn, n], %{depth: n.depth + 1, id: pn.id, parent_node_id: pn.parent_node_id})

    # Cut off at n depth
    |> where([pn, n], n.depth < ^n)

  node_tree_query = union_all(leaf_nodes_query, ^nodes_recursion_query)

  {"node_tree", Node}
  |> recursive_ctes(true)
  |> with_cte("node_tree", as: ^node_tree_query)
  |> select([n], n.id)
end

We first select all the parent node IDs from our nodes table and then use that subquery to select all the nodes that don’t exist in the list. This is how we get all leaf nodes at starting depth 1 because these nodes don’t exist in any node’s parent_node_id.

Then we recursively go from the list of leaf nodes up the branch by joining the id with parent_node_id, increasing the depth by 1 each time.

Finally, we cut off when we have reached the desired depth.

This can be used in our query to filter out the resulting list of nodes:

node_ids_query = filter_node_depth_query(depth)

Node
|> where([n], n.id not in subquery(node_ids_query))
|> Repo.all()

One problem with this CTE is that the recursion will be greedy. To demonstrate let’s exclude all nodes with depth 2:

       A
      / \
     B   C
    / \
   D   E
        \
         F

This will exclude the nodes D, E, F, and C which is what we want, but also nodes B and A! In my case, it was easily solved because in my tree structure there there’s only one canonical branch. I just had to filter by canonical and non-canonical branches:

nodes_recursion_query =
  Node
  |> join(:inner, [pn], n in "node_tree", on: n.parent_node_id == pn.id)
  |> select([pn, n], %{depth: n.depth + 1, id: pn.id, parent_node_id: pn.parent_node_id, canonical: pn.canonical})

  # Parent node should match the node by canonical branch or non canonical branches
  |> where([pn, n], pn.canonical and n.canonical)
  |> or_where([pn, n], not pn.canonical and not n.canonical)

  # Cut off at n depth
  |> where([pn, n], n.depth < ^n)

If you don’t have this distinction you’ll have to go through the result set removing the nodes that have child nodes that don’t exist in the result set.

Hi, I'm Dan Schultzer, I write in this blog, work a lot in Elixir, maintain several open source projects, and help companies streamline their development process