Heron-Centric: Ruminations of a Language Designer
The Cat Type Inference Algorithm
by Christopher Diggins
November 9, 2006

Summary
I've posted an updated version of the Cat manual with a generalized description of the type inference algorithm used in Cat. This post contains an excerpt of the type inference algorithm description.

Here is an excerpt of the current Cat manual as it pertains to the current Cat release, version 0.9.7. I have to apologize in advance, it is a bit dense, but it should be useful if you are interested in the nitty gritty of how type inference can work in a stack-based language.

Type Checking and Type Inference

This section provides a description of the algorithm for checking well-typedness and inferring types of Cat functions.

The primary and auxiliary stacks in Cat can be viewed as a pair of tuples. Every function expresses a mapping from a pair of a tuples to another pair of tuples. We can express the consumption and production of a function type using the following short hand notation, where index 0 represents the type at the top of the stack.
    primary consumption   = PC = (pc0, pc1, ..., pcN)
    auxiliary consumption = AC = (ac0, ac1, ..., acN)
    primary production    = PP = (pp0, pc1, ..., pcN)
    auxiliary production  = AP = (ap0, ap1, ..., apN)
  
From this notation the types of two functions can be expressed as:
    f : (PC(f), AC(f)) -> (PP(f), AP(f))
    g : (PC(g), AC(g)) -> (PP(g), AP(g))
  
The superset relation of tuples as follows:
    (a0, a1, ..., aN) >= (b0, b1, ..., bM)
      iff (N > M) and forall i=0 to M (ai = b0)
  
A Cat function is well typed according to the following function definition:
    welltyped(f . g) iff
      PC(g) >= PP(f) or PP(f) >= PC(g)
  
Consider the following definitions of addition and subtraction:
    (a0, a1, ..., aN) + (b0, b1, ..., bM)
      = (a0, a1, ..., aN, b0, b1, ..., bM)

    (a0, a1, ..., aN) - (b0, b1, ..., bM)
      = M > N | (b[N+1], b[N+2], ..., bM)
        M <= N | ()
  
Using these definition we can define the consumption and production of the compose of the two functions as follows:
    PC(f . g) = PC(f) + (PC(g) - PP(f))
    AC(f . g) = AC(f) + (AC(g) - AP(f))
    PP(f . g) = PP(g) + (PP(f) - PC(g))
    AP(f . g) = AP(g) + (AP(f) - AC(g))
  
Putting this together the type of two consecutive functions is:
    f . g : (PC(f) + (PC(g) - PP(f)), AC(f) + (AC(g) - AP(f)))
      -> (PP(g) + (PP(f) - PC(g)), AP(g) + (AP(f) - AC(g)))
  

That is all there is to it. I know it's a bit dense, but hopefully it can still be of some use. If you have some suggestions on how I can make it more clear, and use more standardized language or notation I would be very appreciative. Even if you just point out the confusing bits, that would be very helpful!

Talk Back!

Have an opinion? Readers have already posted 6 comments about this weblog entry. Why not add yours?

RSS Feed

If you'd like to be notified whenever Christopher Diggins adds a new entry to his weblog, subscribe to his RSS feed.

Digg |

del.icio.us |

About the Blogger

Christopher Diggins is a software developer and freelance writer. Christopher loves programming, but is eternally frustrated by the shortcomings of modern programming languages. As would any reasonable person in his shoes, he decided to quit his day job to write his own ( www.heron-language.com ). Christopher is the co-author of the C++ Cookbook from O'Reilly. Christopher can be reached through his home page at www.cdiggins.com.


	Web Artima.com