I have been working on a typed relational algebra that allows to abstract and
compose queries that can be verified statically. This means that the type
system ensures all queries are well-formed and you never try to use a
non-existent column or table or an existent one with the wrong types.
There's no way a change to the DB schema can break your code silently.
Query composition is sorely missing in most ORMs; you normally do some query
and get in return a number of objects in some sort of container, which can be
as simple as an array. You cannot compose queries, all you get is rows.
In other words, they abstract at the wrong level, focusing on rows instead of
relations.
By applying ideas learned from the functional paradigm, in the code I have
been writing queries can be abstracted and composed freely. The system ensures
they are sound and will help me fix them if they get out of sync with the DB
schema.
The basic idea is to work with relations instead of rows. There are only a few
operations in relational algebra and they can be easily represented by
functions. Selection is a function of type
selection : relation_type_1 -> relation_type_1
where the result contains a subset of the elements in the initial set. The
cartesian product takes two relations and returns a third one
Here the output contains a subset of the attributes in the initial relation.
The remaining operations are defined in similar ways.
You have probably noted that I have been using different subscript indices in
the above type declarations. That's because the relational operators can be seen
not only as an operation on relations but also on their types. If you have a USER table with
two columns, NAME and PASSWORD, one projection could be
(relation with NAME and PASSWORD) -> (keep NAME) -> (relation with NAME)
That is, the type of a relation carries information about its attributes and
their types (name is a string, age is an integer...).
Since the relational operators operate on whole relations, not on rows, the
SQL query can be done lazily, and several optimizations are possible. The
system "understands" the queries since they're not hidden inside opaque
strings, and can be much smarter than a run-of-the-mill ORM.
Implementation
I make full usage of the OCaml type system to encode schema information in the
types. I'm using phantom types, existential types (encoded using polymorphic
records, rank-1 polymorphism), polymorphic variant types, structural polymorphism, (obviously) parametric polymorphism... But all the magic is encapsulated and none of it is seen by the developer; all you see is queries expressed in a simple language that are checked by the compiler.
Fortunately, OCaml includes a tool that allows you to extend the language
itself, camlp4. I have written a camlp4 extension that supports relational
queries expressed like this:
(* this function takes a relation with a user_age attribute and returns
another with the rows satisfying the predicate *)
let minors users = SELECT [Age < 18] users
The "minors" function only makes sense when the relation given in the user
argument has got an "age" attribute. This is encoded in the inferred type as
follows:
val minors :
([> `Age of int ] as 'a, [> `Age ] as 'b) Relational.relation ->
('a, 'b) Relational.relation
Relational.relation is a polymorphic type, and "minors" only operates on
relations with at least an "age" column of type int. If you try to apply it
to a relation that doesn't have that column, you'll get a very expressive compile-time error like this: