April 2003
(This article is derived from a keynote talk given at PyCon 2003.)
It's hard to predict what life will be like in a hundred years. There are only a few things we can say with certainty. We know that everyone will drive flying cars, that zoning laws will be relaxed to allow buildings hundreds of stories tall, that it will be dark most of the time, and that women will all be trained in the martial arts. Today I want to zoom in on one detail of this picture. What kind of programming language will they use to write the software controlling those flying cars?
This is worth thinking about not so much because we'll actually get to use these languages as because, if we're lucky, we'll use languages on the path from this point to that.
I think that, like species, languages will form evolutionary trees, with dead-ends branching off all over. We can see this happening already. Cobol, for all its sometime popularity, does not seem to have any intellectual descendants. It is an evolutionary dead-end-- a Neanderthal language.
I predict a similar fate for Java. People sometimes send me mail saying, "How can you say that Java won't turn out to be a successful language? It's already a successful language." And I admit that it is, if you measure success by shelf space taken up by books on it (particularly individual books on it), or by the number of undergrads who believe they have to learn it to get a job. When I say Java won't turn out to be a successful language, I mean something more specific: that Java will turn out to be an evolutionary dead-end, like Cobol.
This is just a guess. I may be wrong. My point here is not to diss Java, but to raise the issue of evolutionary trees and get people asking, where on the tree is language X? The reason to ask this question isn't just so that our ghosts can say, in a hundred years, I told you so. It's because staying close to the main branches is a useful heuristic for finding languages that will be good to program in now.
At any given time, you're probably happiest on the main branches of an evolutionary tree. Even when there were still plenty of Neanderthals, it must have sucked to be one. The Cro-Magnons would have been constantly coming over and beating you up and stealing your food.
The reason I want to know what languages will be like in a hundred years is so that I know what branch of the tree to bet on now.
The evolution of languages differs from the evolution of species because branches can converge. The Fortran branch, for example, seems to be merging with the descendants of Algol. In theory this is possible for species too, but it's so unlikely that it has probably never happened.
Convergence is more likely for languages partly because the space of possibilities is smaller, and partly because mutations are not random. Language designers deliberately incorporate ideas from other languages.
It's especially useful for language designers to think about where the evolution of programming languages is likely to lead, because they can steer accordingly. In that case, "stay on a main branch" becomes more than a way to choose a good language. It becomes a heuristic for making the right decisions about language design.
Any programming language can be divided into two parts: some set of fundamental operators that play the role of axioms, and the rest of the language, which could in principle be written in terms of these fundamental operators.
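To make the split between axioms and "the rest of the language" concrete, here is a minimal Python sketch. It assumes a toy core of four list primitives (`cons`, `car`, `cdr`, `is_empty`; the names are borrowed from Lisp, but these are hypothetical Python functions over pairs) and then defines `append`, `reverse`, and `map_list` using only that core. The `from_py`/`to_py` converters sit outside the core and exist only so the example can run.

```python
# A toy core: four primitives over pairs. The derived operators below use
# only these. All names here are illustrative assumptions.
EMPTY = None

def cons(head, tail):
    """Primitive: build a pair from a first element and the rest."""
    return (head, tail)

def car(pair):
    """Primitive: the first element of a pair."""
    return pair[0]

def cdr(pair):
    """Primitive: the rest of a pair."""
    return pair[1]

def is_empty(lst):
    """Primitive: is this the empty list?"""
    return lst is EMPTY

# ---- Derived operators: no direct tuple access, only the primitives ----

def append(a, b):
    """Concatenate two lists."""
    return b if is_empty(a) else cons(car(a), append(cdr(a), b))

def reverse(lst, acc=EMPTY):
    """Reverse a list by moving elements onto an accumulator."""
    return acc if is_empty(lst) else reverse(cdr(lst), cons(car(lst), acc))

def map_list(f, lst):
    """Apply f to every element."""
    return EMPTY if is_empty(lst) else cons(f(car(lst)), map_list(f, cdr(lst)))

# ---- Conveniences outside the core, for running the example ----

def from_py(items):
    result = EMPTY
    for x in reversed(items):
        result = cons(x, result)
    return result

def to_py(lst):
    out = []
    while not is_empty(lst):
        out.append(car(lst))
        lst = cdr(lst)
    return out

print(to_py(append(from_py([1, 2]), from_py([3]))))        # [1, 2, 3]
print(to_py(reverse(from_py([1, 2, 3]))))                  # [3, 2, 1]
print(to_py(map_list(lambda x: x * 10, from_py([1, 2]))))  # [10, 20]
```

Because the derived functions depend only on the four primitives, any of them could be rewritten freely. Changing what the primitives mean, by contrast, would ripple through everything built on them.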
I think the fundamental operators are the most important factor in a language's long-term survival. The rest you can change. It's like the rule that in buying a house you should consider location first of all. Everything else you can fix later, but you can't fix the location.
I think it's important not just that the axioms be well chosen, but that there be few of them. Mathematicians have always felt this way about axioms-- the fewer, the better-- and I think they're onto something.
At the very least, it has to be a useful exercise to look closely at the core of a language to see if there are any axioms that could be weeded out. I've found in my long career as a slob that cruft breeds cruft, and I've seen this happen in software as well as under beds and in the corners of rooms.
I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better.
Of course, I'm making a big assumption in even asking what programming languages will be like in a hundred years. Will we even be writing programs in a hundred years? Won't we just tell computers what we want them to do?
There hasn't been a lot of progress in that department so far. My guess is that a hundred years from now people will still tell computers what to do using programs we would recognize as such. There may be tasks that we solve now by writing programs and which in a hundred years you won't have to write programs to solve, but I think there will still be a good deal of programming of the type that we do today.
It may seem presumptuous to think anyone can predict what any technology will look like in a hundred years. But remember that we already have almost fifty years of history behind us. Looking forward a hundred years is a graspable idea when we consider how slowly languages have evolved in the past fifty.
Languages evolve slowly because they're not really technologies. Languages are notation. A program is a formal description of the problem you want a computer to solve for you. So the rate of evolution in programming languages is more like the rate of evolution in mathematical notation than, say, transportation or communications. Mathematical notation does evolve, but not with the giant leaps you see in technology.
Whatever computers are made of in a hundred years, it seems safe to predict they will be much faster than they are now. If Moore's Law continues to put out, they will be 74 quintillion (73,786,976,294,838,206,464) times faster. That's kind of hard to imagine. And indeed, the most likely prediction in the speed department may be that Moore's Law will stop working. Anything that is supposed to double every eighteen months seems likely to run up against some kind of fundamental limit eventually. But I have no trouble believing that computers will be very much faster. Even if they only end up being a paltry million times faster, that should change the ground rules for programming languages substantially. Among other things, there will be more room for what would now be considered slow languages, meaning languages that don't yield very efficient code.
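The 74 quintillion figure is exactly 2^66. The quick Python check below assumes the estimate counts only whole eighteen-month doublings in a century (1,200 months / 18 is about 66.7, rounded down to 66):

```python
MONTHS_PER_CENTURY = 100 * 12
MONTHS_PER_DOUBLING = 18   # the "double every eighteen months" rule

# Count only whole doublings: 1200 // 18 == 66, with 12 months left over.
doublings = MONTHS_PER_CENTURY // MONTHS_PER_DOUBLING
speedup = 2 ** doublings

print(doublings)       # 66
print(f"{speedup:,}")  # 73,786,976,294,838,206,464
```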
And yet some applications will still demand speed. Some of the problems we want to solve with computers are created by computers; for example, the rate at which you have to process video images depends on the rate at which another computer can generate them. And there is another class of problems which inherently have an unlimited capacity to soak up cycles: image rendering, cryptography, simulations.
If some applications can be increasingly inefficient while others continue to demand all the speed the hardware can deliver, faster computers will mean that languages have to cover an ever wider range of efficiencies. We've seen this happening already. Current implementations of some popular new languages are shockingly wasteful by the standards of previous decades.
This isn't just something that happens with programming languages. It's a general historical trend. As technologies improve, each generation can do things that the previous generation would have considered wasteful. People thirty years ago would be astonished at how casually we make long-distance phone calls. People a hundred years ago would be even more astonished that a package would one day travel from Boston to New York via Memphis.
I can already tell you what's going to happen to all those extra cycles that faster hardware is going to give us in the next hundred years. They're nearly all going to be wasted.
I learned to program when computer power was scarce. I can remember taking all the spaces out of my Basic programs so they would fit into the memory of a 4K TRS-80. The thought of all this stupendously inefficient software burning up cycles doing the same thing over and over seems kind of gross to me. But I think my intuitions here are wrong. I'm like someone who grew up poor, and can't bear to spend money even for something important, like going to the doctor.
Some kinds of waste really are disgusting. SUVs, for example, would arguably be gross even if they ran on a fuel which would never run out and generated no pollution. SUVs are gross because they're the solution to a gross problem. (How to make minivans look more masculine.) But not all waste is bad. Now that we have the infrastructure to support it, counting the minutes of your long-distance calls starts to seem niggling. If you have the resources, it's more elegant to think of all phone calls as one kind of thing, no matter where the other person is.
There's good waste, and bad waste. I'm interested in good waste-- the kind where, by spending more, we can get simpler designs. How will we take advantage of the opportunities to waste cycles that we'll get from new, faster hardware?
The desire for speed is so deeply engrained in us, with our puny computers, that it will take a conscious effort to overcome it. In language design, we should be consciously seeking out situations where we can trade efficiency for even the smallest increase in convenience.
Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically, strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don't, really. Strings only exist for efficiency. But it's lame to clutter up the semantics of the language with hacks to make programs run faster. Having strings in a language seems to be a case of premature optimization.
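Here is a small Python sketch of treating a string as nothing more than a list of characters. Python itself keeps a separate `str` type, so the sketch converts with `list()`. The helpers `concat` and `take_while` are hypothetical names: each is written once for lists and reused unchanged on characters and on numbers.

```python
def as_list(s):
    """Represent a string as a plain list of one-character strings."""
    return list(s)

def concat(a, b):
    """Join two lists. Nothing here is specific to characters."""
    return a + b

def take_while(pred, xs):
    """Return the leading elements of xs for which pred holds."""
    out = []
    for x in xs:
        if not pred(x):
            break
        out.append(x)
    return out

word = as_list("hello world")
first = take_while(lambda c: c != " ", word)
print(first)                                  # ['h', 'e', 'l', 'l', 'o']
print("".join(concat(first, as_list("!"))))   # hello!
# The same functions work unchanged on a list of numbers.
print(take_while(lambda n: n < 3, [1, 2, 3, 1]))  # [1, 2]
```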
If we think of the core of a language as a set of axioms, surely it's gross to have additional axioms that add no expressive power, simply for the sake of efficiency. Efficiency is important, but I don't think that's the right way to get it.
The right way to solve that problem, I think, is to separate the meaning of a program from the implementation details. Instead of having both lists and strings, have just lists, with some way to give the compiler optimization advice that will allow it to lay out strings as contiguous bytes if necessary.
Since speed doesn't matter in most of a program, you won't ordinarily need to bother with this sort of micromanagement. This will be more and more true as computers get faster.
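One way to picture optimization advice that leaves meaning untouched is the Python sketch below. A hypothetical `Seq` type accepts a `layout` hint. With `"packed"`, it stores small integers (0 to 255, such as the character codes of ASCII text) in a contiguous `array('B')`. With `"list"`, it uses an ordinary list. The class, the hint, and their names are assumptions made for illustration. The only point is that every observable operation gives the same answer under either layout.

```python
from array import array

class Seq:
    """A sequence whose meaning does not depend on how it is stored.

    `layout` is a hypothetical optimization hint:
      "list"   -> an ordinary Python list (general, flexible)
      "packed" -> a contiguous array of unsigned bytes (ints 0..255 only)
    """

    def __init__(self, items, layout="list"):
        if layout == "packed":
            self._data = array("B", items)
        elif layout == "list":
            self._data = list(items)
        else:
            raise ValueError(f"unknown layout: {layout!r}")

    def __len__(self):
        return len(self._data)

    def __getitem__(self, i):
        return self._data[i]

    def __eq__(self, other):
        # Equality compares elements only; the layout hint is invisible.
        return isinstance(other, Seq) and list(self._data) == list(other._data)

    def __repr__(self):
        return f"Seq({list(self._data)!r})"

codes = [ord(c) for c in "hi"]
plain = Seq(codes)
packed = Seq(codes, layout="packed")
print(plain == packed, packed[1], len(packed))   # True 105 2
```

A real compiler would act on such a hint at compile time rather than through a runtime class. The sketch only illustrates keeping the hint separate from what the program means.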
Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only inevitable, but desirable.
The word "essay" comes from the French verb "essayer", which means "to try". An essay, in the original sense, is something you write to try to figure something out. This happens in software too. I think some of the best programs were essays, in the sense that the authors didn't know when they started exactly what they were trying to write.
Lisp hackers already know about the value of being flexible with data structures. We tend to write the first version of a program so that it does everything with lists. These initial versions can be so shockingly inefficient that it takes a conscious effort not to think about what they're doing, just as, for me at least, eating a steak requires a conscious effort not to think where it came from.
What programmers in a hundred years will be looking for, most of all, is a language where you can throw together an unbelievably inefficient version 1 of a program with the least possible effort. At least, that's how we'd describe it in present-day terms. What they'll say is that they want a language that's easy to program in.
Inefficient software isn't gross. What's gross is a language that makes programmers do needless work. Wasting programmer time is the true inefficiency, not wasting machine time. This will become ever more clear as computers get faster.
I think getting rid of strings is already something we could bear to think about. We did it in Arc, and it seems to be a win; some operations that would be awkward to describe as regular expressions can be described easily as recursive functions.
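The essay doesn't say which operations it has in mind. One standard example (chosen here, not taken from Arc) is checking that parentheses balance. Arbitrarily deep nesting is not a regular language, so no true regular expression can check it, yet a short recursive function over a list of characters can. A Python sketch:

```python
def balanced(chars, depth=0):
    """Return True if the parentheses in a list of characters balance.

    `depth` counts the currently open parentheses.
    """
    if not chars:
        return depth == 0
    head, rest = chars[0], chars[1:]
    if head == "(":
        return balanced(rest, depth + 1)
    if head == ")":
        # A close paren with nothing open can never be repaired later.
        return depth > 0 and balanced(rest, depth - 1)
    return balanced(rest, depth)

print(balanced(list("(a (b) c)")))   # True
print(balanced(list("(a (b c)")))    # False
print(balanced(list(")(")))          # False
```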
How far will this flattening of data structures go? I can think of possibilities that shock even me, with my conscientiously broadened mind. Will we get rid of arrays, for example? After all, they're just a subset of hash tables where the keys are vectors of integers. Will we replace hash tables themselves with lists?
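Python makes the first step easy to try, because tuples of integers can serve as dict keys. The sketch below uses a dict keyed by (row, col) pairs in place of a 2-D array. It then flattens the same table one step further into an association list: a plain list of (key, value) pairs searched linearly. `make_array` and `alist_get` are hypothetical helpers, not library functions.

```python
def make_array(rows, cols, fill=0):
    """A 2-D 'array' as a dict keyed by (row, col) index vectors."""
    return {(i, j): fill for i in range(rows) for j in range(cols)}

def alist_get(alist, key, default=None):
    """Look up key in an association list by scanning it in order."""
    for k, v in alist:
        if k == key:
            return v
    return default

grid = make_array(2, 3)
grid[(1, 2)] = 5                 # update an element, as with an array
print(grid[(1, 2)], grid[(0, 0)], len(grid))   # 5 0 6

# The same table, flattened further into a plain list of pairs.
pairs = list(grid.items())
print(alist_get(pairs, (1, 2)))  # 5
```

Each step trades lookup speed for one fewer kind of data structure. The association list takes time proportional to its length on every lookup.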
There are more shocking prospects even than that. The Lisp that McCarthy described in 1960, for example, didn't have numbers. Logically, you don't need to have a separate notion of numbers, because you can represent them as lists: the integer n could be represented as a list of n elements. You can do math this way. It's just unbearably inefficient.
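The Python sketch below represents numbers as lists, assuming n is encoded as a list of n identical placeholder elements. Addition becomes concatenation, multiplication becomes repeated concatenation, and comparison walks both lists in step. The function names are hypothetical, and Python integers appear only in `encode`/`decode` at the boundary.

```python
def encode(n):
    """Encode a non-negative integer n as a list of n placeholder elements."""
    return ["|"] * n

def decode(xs):
    """Recover the integer by counting the elements."""
    return len(xs)

def add(a, b):
    """Addition is list concatenation."""
    return a + b

def mul(a, b):
    """Multiplication: concatenate one copy of b for every element of a."""
    result = []
    for _ in a:
        result = add(result, b)
    return result

def less_than(a, b):
    """Drop one element from each list until one runs out."""
    while a and b:
        a, b = a[1:], b[1:]
    return not a and bool(b)

print(decode(add(encode(2), encode(3))))   # 5
print(decode(mul(encode(4), encode(6))))   # 24
print(less_than(encode(2), encode(5)))     # True
```

Multiplying 1,000 by 1,000 this way builds a list of a million elements, which is where "unbearably inefficient" comes in.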
No one actually proposed implementing numbers as lists in practice. In fact, McCarthy's 1960 paper was not, at the time, intended to be implemented at all. It was a theoretical exercise, an attempt to create a more elegant alternative to the Turing Machine. When someone did, unexpectedly, take this paper and translate it into a working Lisp interpreter, numbers certainly weren't represented as lists; they were represented in binary, as in every other language.
Could a programming language go so far as to get rid of numbers as a fundamental data type? I ask this not so much as a serious question as as a way to play chicken with the future. It's like the hypothetical case of an irresistible force meeting an immovable object-- here, an unimaginably inefficient implementation meeting unimaginably great resources. I don't see why not. The future is pretty long. If there's something we can do to decrease the number of axioms in the core language, that would seem to be the side to bet on as t approaches infinity. If the idea still seems unbearable in a hundred years, maybe it won't in a thousand.
Just to be clear about this, I'm not proposing that all numerical calculations would actually be carried out using lists. I'm proposing that the core language, prior to any additional notations about implementation, be defined this way. In practice any program that wanted to do any amount of math would probably represent numbers in binary, but this would be an optimization, not part of the core language semantics.
Another way to burn up cycles is to have many layers of software between the application and the hardware. This too is a trend we see happening already: many recent languages are compiled into byte code. Bill Woods once told me that, as a rule of thumb, each layer of interpretation costs a factor of 10 in speed. This extra cost buys you flexibility.
The very first version of Arc was an extreme case of this sort of multi-level slowness, with corresponding benefits. It was a classic "metacircular" interpreter written on top of Common Lisp, with a definite family resemblance to the eval function defined in McCarthy's original Lisp paper. The whole thing was only a couple hundred lines of code, so it was very easy to understand and change. The Common Lisp we used, CLisp, itself runs on top of a byte code interpreter. So here we had two levels of interpretation, one of them (the top one) shockingly inefficient, and the language was usable. Barely usable, I admit, but usable.
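For a feel of how small such an eval can be, here is a minimal Python sketch in the style of McCarthy's eval. It is written in Python rather than in Lisp, so it is not metacircular. The representation is an assumption: s-expressions are nested Python lists, symbols are strings, and `"t"` and `[]` stand for true and false. McCarthy's `label` is omitted. Recursion instead works by storing a closure in the same environment dict that the closure captured.

```python
T, NIL = "t", []

def is_atom(v):
    """Symbols and the empty list count as atoms."""
    return isinstance(v, str) or v == NIL

def lisp_eval(e, env):
    """Evaluate expression e in env, a dict from symbols to values."""
    if isinstance(e, str):                      # symbol: variable lookup
        return env[e]
    op, *args = e
    if op == "quote":
        return args[0]
    if op == "atom":
        return T if is_atom(lisp_eval(args[0], env)) else NIL
    if op == "eq":
        a, b = lisp_eval(args[0], env), lisp_eval(args[1], env)
        return T if is_atom(a) and is_atom(b) and a == b else NIL
    if op == "car":
        return lisp_eval(args[0], env)[0]
    if op == "cdr":
        return lisp_eval(args[0], env)[1:]
    if op == "cons":
        return [lisp_eval(args[0], env)] + lisp_eval(args[1], env)
    if op == "cond":                            # first clause whose test isn't nil
        for test, result in args:
            if lisp_eval(test, env) != NIL:
                return lisp_eval(result, env)
        return NIL
    if op == "lambda":                          # capture the defining environment
        params, body = args
        return ("closure", params, body, env)
    # Anything else is a function application.
    _, params, body, closure_env = lisp_eval(op, env)
    values = [lisp_eval(a, env) for a in args]
    return lisp_eval(body, {**closure_env, **dict(zip(params, values))})

env = {}
# A recursive `last`: the closure captures `env`, so it can find itself there.
env["last"] = lisp_eval(
    ["lambda", ["xs"],
     ["cond",
      [["eq", ["cdr", "xs"], ["quote", []]], ["car", "xs"]],
      [["quote", "t"], ["last", ["cdr", "xs"]]]]],
    env)

print(lisp_eval(["cons", ["quote", "a"], ["quote", ["b", "c"]]], env))  # ['a', 'b', 'c']
print(lisp_eval(["last", ["quote", ["a", "b", "c"]]], env))             # c
```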
Writing software as multiple layers is a powerful technique even within applications. Bottom-up programming means writing a program as a series of layers, each of which serves as a language for the one above. This approach tends to yield smaller, more flexible programs. It's also the best route to that holy grail, reusability. A language is by definition reusable. The more of your application you can push down into a language for writing that type of application, the more of your software will be reusable.
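The Python sketch below shows bottom-up layering, with hypothetical names throughout. At the bottom is a general text utility. Built from it is a tiny "table language." At the top is an application function that speaks only that table language. The bottom two layers know nothing about inventories, so a different application could reuse them unchanged.

```python
# Layer 1: a general-purpose utility with no knowledge of the application.
def pad(text, width):
    """Left-justify text in a field of the given width."""
    return text + " " * (width - len(text))

# Layer 2: a tiny language for describing tables, built on layer 1.
def table(rows):
    """Render rows (sequences of cells) as aligned, space-separated columns."""
    cells = [[str(c) for c in row] for row in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(cells[0]))]
    lines = [" ".join(pad(c, w) for c, w in zip(row, widths)).rstrip()
             for row in cells]
    return "\n".join(lines)

# Layer 3: the application, written entirely in the table language.
def inventory_report(items):
    """items: (name, quantity) pairs."""
    return table([("item", "qty")] + list(items))

print(inventory_report([("apples", 3), ("pears", 12)]))
# item   qty
# apples 3
# pears  12
```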
Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no amount of evidence to the contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its bottom-upness, not its object-orientedness. Consider libraries: they're reusable because they're language, whether they're written in an object-oriented style or not.
I don't predict the demise of object-oriented programming, by the way. Though I don't think it has much to offer good programmers, except in certain specialized domains, it is irresistible to large organizations. Object-oriented programming offers a sustainable way to write spaghetti code. It lets you accrete programs as a series of patches. Large organizations always tend to develop software this way, and I expect this to be as true in a hundred years as it is today.
As long as we're talking about the future, we had better talk about parallel computation, because that's where this idea seems to live. That is, no matter when you're talking, parallel computation seems to be something that is going to happen in the future.
Will the future ever catch up with it? People have been talking about parallel computation as something imminent for at least 20 years, and it hasn't affected programming practice much so far. Or hasn't it? Already chip designers have to think about it, and so must people trying to write systems software on multi-CPU computers.
The real question is, how far up the ladder of abstraction will parallelism go? In a hundred years will it affect even application programmers? Or will it be something that compiler writers think about, but which is usually invisible in the source code of applications?
One thing that does seem likely is that most opportunities for parallelism will be wasted. This is a special case of my more general prediction that most of the extra computer power we're given will go to waste. I expect that, as with the stupendous speed of the underlying hardware, parallelism will be something that is available if you ask for it explicitly, but ordinarily not used. This implies that the kind of parallelism we have in a hundred years will not, except in special applications, be massive parallelism. I expect for ordinary programmers it will be more like being able to fork off processes that all end up running in parallel.
And this will, like asking for specific implementations of data structures, be something that you do fairly late in the life of a program, when you try to optimize it. Version 1s will ordinarily ignore any advantages to be got from parallel computation, just as they will ignore advantages to be got from specific representations of data.
Except in special kinds of applications, parallelism won't pervade the programs that are written in a hundred years. It would be premature optimization if it did.
How many programming languages will there be in a hundred years? There seem to be a huge number of new programming languages lately. Part of the reason is that faster hardware has allowed programmers to make different tradeoffs between speed and convenience, depending on the application. If this is a real trend, the hardware we'll have in a hundred years should only increase it.
And yet there may be only a few widely used languages in a hundred years. Part of the reason I say this is optimism: it seems that, if you did a really good job, you could make a language that was ideal for writing a slow version 1, and yet with the right optimization advice to the compiler, would also yield very fast code when necessary. So, since I'm optimistic, I'm going to predict that despite the huge gap they'll have between acceptable and maximal efficiency, programmers in a hundred years will have languages that can span most of it.
As this gap widens, profilers will become increasingly important. Little attention is paid to profiling now. Many people still seem to believe that the way to get fast applications is to write compilers that generate fast code. As the gap between acceptable and maximal performance widens, it will become increasingly clear that the way to get fast applications is to have a good guide from one to the other.
When I say there may only be a few languages, I'm not including domain-specific "little languages". I think such embedded languages are a great idea, and I expect them to proliferate. But I expect them to be written as thin enough skins that users can see the general-purpose language underneath.
Who will design the languages of the future? One of the most exciting trends in the last ten years has been the rise of open-source languages like Perl, Python, and Ruby. Language design is being taken over by hackers. The results so far are messy, but encouraging. There are some stunningly novel ideas in Perl, for example. Many are stunningly bad, but that's always true of ambitious efforts. At its current rate of mutation, God knows what Perl might evolve into in a hundred years.
It's not true that those who can't do, teach (some of the best hackers I know are professors), but it is true that there are a lot of things that those who teach can't do. Research imposes constraining caste restrictions. In any academic field there are topics that are ok to work on and others that aren't. Unfortunately the distinction between acceptable and forbidden topics is usually based on how intellectual the work sounds when described in research papers, rather than how important it is for getting good results. The extreme case is probably literature; people studying literature rarely say anything that would be of the slightest use to those producing it.
Though the situation is better in the sciences, the overlap between the kind of work you're allowed to do and the kind of work that yields good languages is distressingly small. (Olin Shivers has grumbled eloquently about this.) For example, types seem to be an inexhaustible source of research papers, despite the fact that static typing seems to preclude true macros-- without which, in my opinion, no language is worth using.
The trend is not merely toward languages being developed as open-source projects rather than "research", but toward languages being designed by the application programmers who need to use them, rather than by compiler writers. This seems a good trend and I expect it to continue.
Unlike physics in a hundred years, which is almost necessarily impossible to predict, I think it may be possible in principle to design a language now that would appeal to users in a hundred years.
One way to design a language is to just write down the program you'd like to be able to write, regardless of whether there is a compiler that can translate it or hardware that can run it. When you do this you can assume unlimited resources. It seems like we ought to be able to imagine unlimited resources as well today as in a hundred years.
What program would one like to write? Whatever is least work. Except not quite: whatever would be least work if your ideas about programming weren't already influenced by the languages you're currently used to. Such influence can be so pervasive that it takes a great effort to overcome it. You'd think it would be obvious to creatures as lazy as us how to express a program with the least effort. In fact, our ideas about what's possible tend to be so limited by whatever language we think in that easier formulations of programs seem very surprising. They're something you have to discover, not something you naturally sink into.
One helpful trick here is to use the length of the program as an approximation for how much work it is to write. Not the length in characters, of course, but the length in distinct syntactic elements-- basically, the size of the parse tree. It may not be quite true that the shortest program is the least work to write, but it's close enough that you're better off aiming for the solid target of brevity than the fuzzy, nearby one of least work. Then the algorithm for language design becomes: look at a program and ask, is there any way to write this that's shorter?
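Python's standard `ast` module gives one rough way to measure "the size of the parse tree." Counting `ast` nodes is an assumption made for this sketch, and Python's node granularity is only one possible choice. The example compares an explicit loop with an equivalent list comprehension.

```python
import ast

def tree_size(source):
    """Count parse-tree nodes in Python source.

    Skips the Load/Store/Del context markers that Python attaches to
    names and other expressions, since the programmer never writes them.
    """
    return sum(
        1
        for node in ast.walk(ast.parse(source))
        if not isinstance(node, ast.expr_context)
    )

loop_version = """
result = []
for x in xs:
    result.append(x * x)
"""
comprehension_version = "result = [x * x for x in xs]"

print(tree_size(loop_version), tree_size(comprehension_version))
```

The comprehension yields the smaller tree, which matches the intuition that it is less work to write.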
In practice, writing programs in an imaginary hundred-year language will work to varying degrees depending on how close you are to the core. Sort routines you can write now. But it would be hard to predict now what kinds of libraries might be needed in a hundred years. Presumably many libraries will be for domains that don't even exist yet. If SETI@home works, for example, we'll need libraries for communicating with aliens. Unless of course they are sufficiently advanced that they already communicate in XML.
At the other extreme, I think you might be able to design the core language today. In fact, some might argue that it was already mostly designed in 1958.
If the hundred-year language were available today, would we want to program in it? One way to answer this question is to look back. If present-day programming languages had been available in 1960, would anyone have wanted to use them?
In some ways, the answer is no. Languages today assume infrastructure that didn't exist in 1960. For example, a language in which indentation is significant, like Python, would not work very well on printer terminals. But putting such problems aside-- assuming, for example, that programs were all just written on paper-- would programmers of the 1960s have liked writing programs in the languages we use now?
I think so. Some of the less imaginative ones, who had artifacts of early languages built into their ideas of what a program was, might have had trouble. (How can you manipulate data without doing pointer arithmetic? How can you implement flow charts without gotos?) But I think the smartest programmers would have had no trouble making the most of present-day languages, if they'd had them.
If we had the hundred-year language now, it would at least make a great pseudocode. What about using it to write software? Since the hundred-year language will need to generate fast code for some applications, presumably it could generate code efficient enough to run acceptably well on our hardware. We might have to give more optimization advice than users in a hundred years, but it still might be a net win.
Now we have two ideas that, if you combine them, suggest interesting possibilities: (1) the hundred-year language could, in principle, be designed today, and (2) such a language, if it existed, might be good to program in today. When you see these ideas laid out like that, it's hard not to think, why not try writing the hundred-year language now?
When you're working on language design, I think it is good to have such a target and to keep it consciously in mind. When you learn to drive, one of the principles they teach you is to align the car not by lining up the hood with the stripes painted on the road, but by aiming at some point in the distance. Even if all you care about is what happens in the next ten feet, this is the right answer. I think we can and should do the same thing with programming languages.
Notes
I believe Lisp Machine Lisp was the first language to embody the principle that declarations (except those of dynamic variables) were merely optimization advice, and would not change the meaning of a correct program. Common Lisp seems to have been the first to state this explicitly.