Normalizing Lambda Terms

In this post we’ll look at how to normalize lambda terms in a purely syntactic way (i.e. without appealing to neat but fancy techniques like normalization by evaluation).

A lambda term is either:

data Term
  = Var Int
  | Abs Term
  | App Term Term
  deriving (Show)

Normalizing a term is then a process of repeatedly applying “reductions” wherever we can. A reduction is possible wherever an abstraction is the operator of an application (a so-called “beta redex”). In particular, when an abstraction is applied to something, we replace the entire application with the result of substituting the operand into the abstraction’s body wherever the bound variable occurs. In exceedingly pithy formal terms:

(\.b) t ==> b[0->t]

That is, the term (\.b) t is reducible to b, where all occurrences of the bound variable with index 0 (the one bound by this abstraction) in b are replaced with the operand term t.

So reduction is just a process of repeated substitution.

Substitution

Lambda terms are just fancy trees, so replacing occurrences of bound variables in a “host term” with another term requires recursing through the host term, doing the appropriate thing with each variant:

Here’s what that looks like:

-- | Replace all bound variables having the provided index with the indicated
-- term inside a host term.
--
-- That is, `subst i sub host` is morally equivalent to `host[i->sub]`.
subst :: Int -> Term -> Term -> Term
subst i sub t@(Var j)
  | i == j = sub
  | otherwise = t
subst i sub (Abs body) = Abs $ subst (i + 1) sub body
subst i sub (App rator rand) =
  let rator' = subst i sub rator
      rand' = subst i sub rand
   in App rator' rand'

Reduction

All of the action in the lambda calculus occurs when abstractions are applied, a situation referred to as a beta redex. Here’s a reduce function that recursively scours a lambda term, reducing beta redexes:

-- | Reduce beta redexes in the provided term.
reduce :: Term -> Term
reduce t@(Var _) = t
reduce (Abs body) = Abs (reduce body)
reduce (App rator rand) = case reduce rator of
  (Abs body) -> reduce $ subst 0 rand body
  op -> App op $ reduce rand

The only interesting case is the final one, in which we replace a beta redex with the appropriate substitution. Note that we call reduce on the result of the substitution, since it might have created new redexes to reduce.

Unfortunately it’s incorrect, and for a subtle reason. The problem is that we’re stripping off a binder without adjusting the indices of bound variables within the body of the operator abstraction. For instance, consider this somewhat contrived term:

\a.(\b.(b a) a)

The inner abstraction (\b.(b a)) can be applied to the variable a, resulting in:

\a.(a a)

However, reduce returns a different (and nonsensical) term:

bad = reduce $ Abs (App (Abs (App (Var 0) (Var 1))) (Var 0))
-- Abs (App (Var 0) (Var 1))
--                       ^ Should be 0

Luckily the fix is straightforward: after substituting, we need to decrement the indices of some bound variables within the result. Which bound variables? Only those that “point” to binders outside of the current term.

So instead of:

(\.b) t ==> b[0->t]

we’ll do:

(\.b) t ==> shift b[0->t]

We’ll implement a shift function in a second that does just this, but before we do so we need to think a little bit harder about our fix. Naively shifting the result of the substitution will fix the problem we’ve identified above, but it also introduces a new bug: it might incorrectly downshift indices of bound variables in the operand term.

What we need to do instead is preemptively upshift indices in the operand term, perform the substitution, then downshift the indices in the resulting term:

(\.b) t ==> downshift b[0 -> upshift t]

A single shift function can do the work of both downshift and upshift so long as it’s passed an integer argument indicating how much to shift by:

-- | Shift the indices of bound variables bound by binders outside of the
-- current term by the provided amount.
shift :: Int -> Term -> Term
shift n = shift' 0
  where
    -- bc = "binder count"
    shift' bc t@(Var i)
      | i >= bc = Var (i + n)
      | otherwise = t
    shift' bc (Abs body) = Abs $ shift' (bc + 1) body
    shift' bc (App rator rand) =
      let rator' = shift' bc rator
          rand' = shift' bc rand
       in App rator' rand'

With shift in hand, we can correct reduce:

reduce :: Term -> Term
-- ...
reduce (App rator rand) = case reduce rator of
  (Abs body) ->
    let arg = shift 1 rand
        subbed = subst 0 arg body
     in reduce $ shift (-1) subbed
  op -> App op $ reduce rand

And the indexing problem is fixed!

bad = reduce $ Abs (App (Abs (App (Var 0) (Var 1))) (Var 0))
-- Abs (App (Var 0) (Var 0))
--                       ^ Looks good!

There’s actually another lingering bug related to incorrect indices that’s not addressed by our changes. Specifically, subst needs to also keep track of how many binders have been crossed, and shift the substituted expression accordingly:

subst :: Int -> Term -> Term -> Term
subst i sub = subst' 0
  where
    subst' bc t@(Var j)
      | i + bc == j = shift bc sub
      | otherwise = t
    subst' bc (Abs body) = Abs $ subst' (bc + 1) body
    subst' bc (App rator rand) =
      let rator' = subst' bc rator
          rand' = subst' bc rand
       in App rator' rand'

A Stronger Encoding

As it stands, reduce makes a fairly weak claim:

reduce :: Term -> Term

namely, that it transforms a term into another term. This admits, among other things, the following incorrect definition:

reduce = id

We can do better.

Specifically, reducing a term produces another term in which no beta redexes occur, and this is the kind of thing we can express in a datatype:

data ReducedTerm
  = ReducedAbs ReducedTerm
  | Stuck StuckTerm
  deriving (Show)

data StuckTerm
  = StuckVar Int
  | StuckApp StuckTerm ReducedTerm
  deriving (Show)

The key observation is that it’s not possible to concoct a ReducedTerm containing a beta redex, since any term in the operator position of an application is “stuck” (either a variable, or an application that is itself stuck). Of course, every ReducedTerm is equivalent to some “regular” Term:

-- | Convert a reduced term to a term (the reduced terms are "embedded" in the
-- set of terms).
toTerm :: ReducedTerm -> Term
toTerm (ReducedAbs body) = Abs $ toTerm body
toTerm (Stuck s) = unStuck s
  where
    unStuck (StuckVar i) = Var i
    unStuck (StuckApp rator rand) =
      let rator' = unStuck rator
          rand' = toTerm rand
       in App rator' rand'

With ReducedTerm in hand, we can sharpen the type of reduce:

reduce :: Term -> ReducedTerm
reduce (Var i) = Stuck $ StuckVar i
reduce (Abs body) = ReducedAbs $ reduce body
reduce (App rator rand) = case reduce rator of
  (ReducedAbs body) ->
    let arg = shift 1 rand
        subbed = subst 0 arg (toTerm body)
     in reduce $ shift (-1) subbed
  (Stuck op) -> Stuck $ StuckApp op $ reduce rand

A Few Examples

We end this post with a few classics examples. First, the successor of the successor of the successor of zero is three:

suc = Abs $ Abs $ Abs $ App (Var 1) (App (App (Var 2) (Var 1)) (Var 0))

zero = Abs $ Abs (Var 0)

three = App suc $ App suc $ App suc zero

toTerm $ reduce three
-- Abs (Abs (App (Var 1) (App (Var 1) (App (Var 1) (Var 0)))))

De Bruijn indices are really a pain to work with by hand…

We can also construct pairs and project their components:

cons = Abs $ Abs $ Abs $ App (App (Var 0) (Var 2)) (Var 1)

fst' = Abs $ App (Var 0) (Abs $ Abs (Var 1))

snd' = Abs $ App (Var 0) (Abs $ Abs (Var 0))

pair = App (App cons zero) three

toTerm $ reduce $ App fst' pair
-- Abs (Abs (Var 0))

toTerm $ reduce $ App snd' pair
-- Abs (Abs (App (Var 1) (App (Var 1) (App (Var 1) (Var 0)))))

(I’ve applied toTerm to the output of reduce to make the results a little easier to read.)

Finally, we should check that the infamous Omega combinator can’t be normalized:

omega = Abs (App (Var 0) (Var 0))
reduce $ App omega omega
-- waiting...
-- ...
-- ...
-- Ctrl^C

Well that’s enough fun for now. In future posts we’ll see how to extend this system along a number of different dimensions.