Is a statically-typed full Lisp variant possible? Does it even make sense for something like this to exist? I believe one of a Lisp language's virtues is the simplicity of its definition. Would static typing compromise this core principle?
Yes, it's very possible, although a standard HM-style type system is usually the wrong choice for most idiomatic Lisp/Scheme code. See Typed Racket for a recent language that is a "Full Lisp" (more like Scheme, actually) with static typing.
If all you wanted was a statically typed language that looked like Lisp, you could do that rather easily, by defining an abstract syntax tree that represents your language and then mapping that AST to S-expressions. However, I don't think I would call the result Lisp.
If you want something that actually has Lisp-y characteristics besides the syntax, it's possible to do this with a statically typed language. However, there are many characteristics to Lisp that are difficult to get much useful static typing out of. To illustrate, let's take a look at the list structure itself, called the cons, which forms the primary building block of Lisp.
Calling the cons a list, though (1 2 3) looks like one, is a bit of a misnomer. For example, it's not at all comparable to a statically typed list, like C++'s std::list or Haskell's list. Those are single-dimensional linked lists where all the cells are of the same type. Lisp happily allows (1 "abc" #\d 'foo). Plus, even if you extend your statically typed lists to cover lists of lists, the type of these objects requires that every element of the list be a sublist. How would you represent ((1 2) 3 4) in them?
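To make that concrete in Haskell, which we'll use for the rest of the examples as well (the name nested is just for illustration):

nested :: [[Int]]
nested = [[1, 2], [3], [4]]    -- fine, but 3 and 4 had to be wrapped in lists
-- bad = [[1, 2], 3, 4]        -- rejected: 3 is an Int, not an [Int]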
Lisp conses form a binary tree, with leaves (atoms) and branches (conses). Further, the leaves of such a tree may contain any atomic (non-cons) Lisp type at all! The flexibility of this structure is what makes Lisp so good at handling symbolic computation, ASTs, and transforming Lisp code itself!
So how would you model such a structure in a statically typed language? Let's try it in Haskell, which has an extremely powerful and precise static type system:
type Symbol = String
data Atom = ASymbol Symbol | AInt Int | AString String | Nil
data Cons = CCons Cons Cons
          | CAtom Atom
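With that type in hand, the ((1 2) 3 4) example from above can at least be written down, encoding a proper list as a chain of conses terminated by Nil (the name example is just for illustration):

example :: Cons
example = CCons (CCons (CAtom (AInt 1))
                       (CCons (CAtom (AInt 2)) (CAtom Nil)))
                (CCons (CAtom (AInt 3))
                       (CCons (CAtom (AInt 4)) (CAtom Nil)))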
Your first problem is going to be the scope of the Atom type. Clearly, we haven't picked an Atom type of sufficient flexibility to cover all the types of objects we want to sling around in conses. Instead of trying to extend the Atom data structure as listed above (which you can clearly see is brittle), let's say we had a magical type class Atomic that distinguished all the types we wanted to make atomic. Then we might try:
class Atomic a where ?????
data Atomic a => Cons a = CCons (Cons a) (Cons a)
                        | CAtom a
But this won't work because it requires all atoms in the tree to be of the same type. We want them to be able to differ from leaf to leaf. A better approach requires the use of Haskell's existential quantifiers:
class Atomic a where ?????
data Cons = CCons Cons Cons
          | forall a. Atomic a => CAtom a
But now you come to the crux of the matter. What can you do with atoms in this kind of structure? What structure do they have in common that could be modeled with Atomic a? What level of type safety are you guaranteed with such a type? Note we haven't added any functions to our type class, and there's a good reason: the atoms share nothing in common in Lisp. Their supertype in Lisp is simply called t (i.e. top).
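To see the problem concretely, here is a self-contained sketch that fills in the ????? with an empty class body purely for illustration: building a heterogeneous tree type-checks fine, but pattern matching only ever tells you that you have some Atomic value, which supports no operations at all.

{-# LANGUAGE ExistentialQuantification #-}

class Atomic a              -- deliberately empty: we have nothing to put in it
instance Atomic Int
instance Atomic Char

data Cons = CCons Cons Cons
          | forall a. Atomic a => CAtom a

-- Mixing atom types is no problem...
mixed :: Cons
mixed = CCons (CAtom (1 :: Int)) (CAtom 'd')

-- ...but on the way back out, all we learn is that there is an atom of
-- some type; we can't show it, compare it, or compute with it.
describe :: Cons -> String
describe (CCons _ _) = "a branch"
describe (CAtom _)   = "an atom of some unknown type"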
In order to use them, you'd have to come up with mechanisms to dynamically coerce the value of an atom to something you can actually use. And at that point, you've basically implemented a dynamically typed subsystem within your statically typed language! (One cannot help but note a possible corollary to Greenspun's Tenth Rule of Programming.)
Note that Haskell provides support for just such a dynamic subsystem: the Dynamic type, used in conjunction with the Typeable class in place of our Atomic class, allows arbitrary values to be stored along with their types and explicitly coerced back from those types. That's the kind of system you'd need to use to work with Lisp cons structures in their full generality.
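A minimal sketch of what that looks like with Data.Dynamic (the names DynCons, tree, and asInt are just for illustration):

import Data.Dynamic (Dynamic, toDyn, fromDynamic)

-- A cons tree whose atoms are dynamically typed values.
data DynCons = DCons DynCons DynCons
             | DAtom Dynamic

-- ((1 2) 3 4) again, with integers stored dynamically and () standing in for nil.
tree :: DynCons
tree =
  DCons (DCons (DAtom (toDyn (1 :: Int)))
               (DCons (DAtom (toDyn (2 :: Int))) (DAtom (toDyn ()))))
        (DCons (DAtom (toDyn (3 :: Int)))
               (DCons (DAtom (toDyn (4 :: Int))) (DAtom (toDyn ()))))

-- Using an atom means an explicit, runtime-checked coercion back to a static
-- type: exactly the dynamically typed subsystem described above.
asInt :: DynCons -> Maybe Int
asInt (DAtom d) = fromDynamic d
asInt _         = Nothing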
What you can also do is go the other way, and embed a statically typed subsystem within an essentially dynamically typed language. This allows you the benefit of static type checking for the parts of your program that can take advantage of more stringent type requirements. This seems to be the approach taken in CMUCL's limited form of precise type checking, for example.
Finally, there's the possibility of having two separate subsystems, dynamically and statically typed, that use contract-style programming to help navigate the transition between the two. That way the language can accommodate usages of Lisp where static type checking would be more of a hindrance than a help, as well as uses where static type checking would be advantageous. This is the approach taken by Typed Racket, as you will see from the comments that follow.
In Typed Racket, for example, there is a difference between (Listof Integer) and (Listof Any). Obviously, you'd suspect the latter to be useless because you know nothing about the element type, but in TR you can later use (if (integer? x) ...) and the system will know that x is an Integer in the first branch.
"dynamic" types are getting to be popular in static languages as a kind of workaround to get some of the benefits of dynamically typed languages, with the usual tradeoff of these values being wrapped in a way that makes the types identifiable. But here too Typed Racket is doing a very good job at making it convenient within the language -- the type checker uses occurrences of predicates to know more about the types. For example, see the typed example on the Racket page and see how string? "reduces" a list of strings and numbers to a list of strings.
My answer, without a high degree of confidence, is: probably. If you look at a language like SML, for example, and compare it with Lisp, the functional core of each is almost identical. As a result, it doesn't seem like you would have much trouble applying some kind of static typing to the core of Lisp (function application and primitive values).
Your question does say full, though, and where I see some of the problem coming in is the code-as-data approach. Types exist at a more abstract level than expressions. Lisp doesn't have this distinction - everything is "flat" in structure. If we consider some expression E : T (where T is some representation of its type), and then we consider this expression as being plain ol' data, then what exactly is the type of T here? Well, it's a kind! A kind is a higher-order type, so let's just go ahead and say something about that in our code:
E : T :: K
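In Haskell terms, for instance, the same three levels show up directly in GHCi:

ghci> :type [1, 2, 3 :: Int]
[1, 2, 3 :: Int] :: [Int]
ghci> :kind [Int]
[Int] :: *
ghci> :kind []
[] :: * -> *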
You might see where I'm going with this. I'm sure that by separating out the type information from the code it would be possible to avoid this kind of self-referentiality of types; however, that would make the types not very "lisp" in their flavour. There are probably many ways around this, though it's not obvious to me which would be the best one.
EDIT: Oh, so with a bit of googling, I found Qi, which appears to be very similar to Lisp except that it's statically typed. Perhaps it's a good place to start to see where they made changes to get the static typing in there.
Dylan: Extending Dylan's type system for better type inference and error detection
… Sexpr … coerce :: a -> b in terms of eval. Where's the type safety?
With eval you need to test the result to see what comes out, which is nothing new in Typed Racket (same deal as a function that takes a union type of String and Number). An implicit way to see that this can be done is the fact that you can write and use a dynamically typed language in an HM-statically-typed language.