creator cover A64m
A64m

A64m 

history of functional programming

8subscribers

8posts

About

I have written 150,000 words about the history of functional programming in my native language here: https://klapaucius.github.io/fphistoryru/. I continue to write regularly, adding several thousand words each month. But the text is merely an invitation to discussion — and the number of people interested in discussing the history of functional programming is limited. Even fewer are willing to do so in my native language.
That’s why I’ve been planning to translate these texts into English for some time now. Writing about the history of functional programming is a fantastic hobby. However, translating — or even just editing machine translations — definitely feels like work. And so, my plans remained just plans.
This is why I ask for your support. Receiving even symbolic financial contributions would make me feel obligated to tackle this task.

Lambda the Ultimate Misunderstanding 5

Core War

The next page in the history of Lisp began with the arrival of a new computer at MIT. IBM machines were replaced by DEC computers. This decision had serious consequences and requires explanation.
How did it come about that MIT began using PDP-6/10 mainframes instead of the much more popular IBM mainframes, especially considering that Lisp development began on IBM machines? One reason was a delay in the release of IBM’s new line [^Chio2001]. The main reason was that relations between MIT and IBM deteriorated due to a patent dispute [^Stee96] [^Chio2001].
MIT claimed the invention of core memory and demanded IBM pay two cents for every bit. Meanwhile, in 1965, the production of core memory cost IBM 1–3 cents per bit, depending on its speed. IBM was so reluctant to pay that it developed several new types of memory to replace core. None of them were practical or were ready to be practical fast enough. But Jay Wright Forrester, to whom IBM demonstrated its developments, did not know this.
And MIT, mistakenly believing that IBM was about to render their patent worthless, agreed in February 1964 to a one-time payment of $13 million ($134 million in 2025). At the time this was a record patent settlement, but had MIT not been frightened by non-working inventions, it could have obtained much more [^Emer91].
Feeling cheated, MIT decided not to buy new IBM machines.
Why not IBM is clear. But why choose the DEC PDP-10? DEC was founded by MIT alumni, who had also been members of the Tech Model Railroad Club. The trains were controlled by a computer, so McCarthy and Minsky used the club to recruit programmers for the AI Laboratory.
It was there that they found Richard D. Greenblatt, the implementer of Lisp on the new DEC machine.
MIT was not yet offended by DEC’s founders. Not yet.
Another reason was probably the same as before with IBM. IBM donated an IBM 704 to MIT in 1957, and DEC donated a PDP-1 in 1960. And just as MIT later bought newer IBM machines, the AI Laboratory acquired the PDP-6, and then its newer, more reliable and more popular version, the PDP-10 [^Chio2001].
Yet another reason was that the PDP-6/10, in the opinion of Lisp programmers, was well suited for implementing Lisp—because a `cons` cell fit into a single word, and because of instructions needed to implement some new ideas for improving memory management in Lisp.

Lambda the Ultimate Misunderstanding 4

THE GARBAGE COLLECTOR HAS BEEN CALLED

> Once we decided on garbage collection, its actual implementation could be postponed, because only toy examples were being done.
> McCarthy J. History of LISP [^McCa78]
Early Lisp programmers did not want to manage memory manually as in IPL, and chose between reference counting and garbage collection. They chose garbage collection because they wanted a `cons` cell to fit into a single word. On the IBM 704, this was 36 bits. An address was 15 bits. Two groups of three bits were too small for a reference count [^McCa78].
This was a rather fortunate coincidence for functional programming, which often creates cyclic references. But this fortunate coincidence was probably not critical for the emergence of functional languages. For example, the authors of SIMULA eventually abandoned reference counting in favor of garbage collection because they conducted experiments and compared the performance of reference counting combined with garbage collection for collecting heap cycles with that of a compacting garbage collector. At least, they recall this [^Dahl78]. But perhaps it was still beneficial that Lisp programmers began working on garbage collection earlier.
Having chosen garbage collection over reference counting, Lisp programmers postponed its implementation because they were writing only toy programs for the time being. But they did not postpone it for long. Dan Edwards wrote a garbage collector in June–July 1959 [^Stoy84].
The first public demonstration of Lisp, in 1960 or 1961, also became the first public demonstration of garbage collection—although this was not planned. It didn't trigger during the rehearsal, but the rehearsal took up enough memory that the garbage collector started working during the demonstration and ran for the rest of the time allotted to it, printing statistics [^McCa78]. It is hard to say why Lisp programmers did not want to show off what was, in Turner's opinion [^Turn12], their greatest achievement. But fortunately, everything ended well and they did show it.
In the printed statistics, the collector is already proudly called a garbage collector, but in the first Lisp paper [^McCa60] they are still shy about calling it that. McCarthy is too serious there, but this is compensated by other Lisp programmers in other papers.
McCarthy writes that the "reclamation" process takes "a few seconds." This is true when memory is 8192 words. But by the time of the collector demonstration, it was already 32768 words, and the process took at least four times "a few" seconds. Not great.

Lambda the Ultimate Misunderstanding 3

LAMBDA does not construct a function

> We suspect that McCarthy simply had forgotten the functional arguments – a mistake of lasting importance.
> Herbert Stoyan. Early LISP history [^Stoy84]
The problem was first discovered in 1959 by James Slagle—the same one who took notes on "Fortran with lambdas." Slagle worked with the first Lisp implementation. In his history of Lisp [^McCa78], McCarthy even gives pseudocode corresponding to Slagle’s code, while debugging which Lisp programmers first realized that something was not working as expected:
```haskell
testr[x, p, f, u] <- if p[x] then f[x] else if atom[x] then u[] else
testr[cdr[x], p, f, λ:testr[car[x], p, f, u]].
```
Yes, as was customary in the first couple of decades of Lisp’s existence, code examples were usually pseudocode. After all, you wouldn’t publish all those parentheses in a paper.
Slagle thought he had written something like this:
```haskell
data Lisp = Atom String | Cons Lisp Lisp deriving (Show, Eq)
testr x p f u | p x = f x
testr (Atom _) p f u = u ()
testr (Cons h t) p f u = testr t p f $ \() -> testr h p f u
```
but in reality he had written something more like this:
```haskell
testr x p f u | p x = f x
testr h@(Atom _) p f u = u h p f u -- <==

Lambda the Ultimate Misunderstanding 2

Not a lot of silly parentheses (for now)

> Without any doubt it is hard to imagine that this program could ever run correctly. However, McCarthy himself had this feeling, too.
> Herbert Stoyan. Early LISP history [^Stoy84]
In the summer of 1958 McCarthy met a student from the University of Michigan who was looking for a programming job—Klim Maling. McCarthy hired Russell and Maling for his new joint project with Minsky in September 1958. As part of this project, the first LISP was designed and implemented.
What did McCarthy end up with after combining all the components of his language for programming artificial intelligence? In the very first report [^McCa58] of the MIT AI Laboratory, we see code with lists, conditional expressions, recursion, FORTRAN with its `goto` and mutability, as well as something resembling higher-order functions—but looking rather strange. The function `maplist` is applied like this:
```
maplist(cdr(J),K,diff(K))
```
and like this:
```
maplist (cdr(J),L,(L = K → diff(K), L ≠ K → copy (L)))
```
which means roughly:
```haskell
maplist (tail j) (\l -> if l == k then diff l else copy l)
```
but the “lambda” is somehow passed to `maplist` via two parameters: the bound variable separately and the body of the lambda separately. How does that even work? You need to know which variables are bound and which are free at the point of function construction.
But such oddities did not last long. In later reports, `map` looks like this:

Lambda the Ultimate Misunderstanding 1

Why was there no functional programming? Because there were no usable implementations and no machines suitable for running such implementations.
And when the past, in which functional programming did not exist, was replaced by a present in which functional programming exists (for now), this present was unevenly distributed.
By choosing a narrower definition of a functional language, we also limited the resources of the resulting research program.
The Edinburgh research program brought together a relatively small number of researchers, so it is quite natural to expect that they used many of the developments of other research programs—developments that, as it happened, were also useful for implementing the functional languages of the Edinburgh program.
And in order to build on a foundation already laid by researchers and implementers, to take advantage of the successes of an earlier and richer programme, there was a suitable candidate: LISP.

John McCarthy

> the FORTRAN idea of writing programs algebraically was attractive.
> McCarthy J. History of LISP [^McCa78]
John McCarthy received a bachelor’s degree in mathematics from the California Institute of Technology in 1948, and in 1949 he accidentally walked into a symposium on cybernetics held at that university. There he heard a talk by von Neumann and became interested in what he would later call “artificial intelligence” [^Stoy79].
McCarthy received his PhD in mathematics from Princeton in 1951 and taught mathematics first at Princeton, then at Stanford, and finally at Dartmouth.
McCarthy was a mathematician, but when, no later than 1955, he began thinking about a programming language for artificial intelligence, he described it not as a mathematical notation like the lambda calculus with extensions. He described it by drawing parallels with natural language. This is a fairly common view of programming languages among their authors—but probably not among the authors of the languages whose history we are writing.
However, in 1955, they didn't know how to make programming languages that looked like natural languages (whatever that meant), nor languages based on mathematical notation. But soon, some ideas appeared.

What is the purpose of this work, and how does it differ from others


> The development of the theory of types that eventually led to the Standard ML type system goes back more than a hundred years.
> David MacQueen, Robert Harper, and John Reppy. "The history of Standard ML." [^MacQ20]
It's not that there's a lack of literature on the history of functional programming. There are already general overviews of functional programming [^Huda89] [^Turn12], as well as histories of specific languages or language families [^Huda07] [^MacQ20] [^McCa78] [^Stee96], and biographies of researchers [^Camp85] [^MacQ14]. So why do we need yet another one?
### Cryptolambdian
When, yet again, gigabytes of memory are insufficient for compiling Haskell code, the question naturally arises: in what _sense_ did functional programming exist in, say, 1973? Unfortunately, materials on the history of functional programming usually don't pay much attention to this. For histories of functional programming, they often contain too little history of _programming_.
We don't aim to define "programming," but in this work, we assume it's the process of writing programs. And our great predecessors, in their works on the history of functional programming, don't particularly like to write about what programs resulted from this process.
Worse, programming historians often venture so far back that it's questionable whether "programming" existed at all in those periods.
For example, MacQueen, Harper, and Reppy start the prehistory of Standard ML in 1874 [^MacQ20]. Obviously, functional programming didn't exist in 1874. It's not so obvious whether it existed in 1974. What programs had been written in functional languages by that year? What implementations were available, and for whom? Until what year could functional programming languages be said to exist only in the same sense as in 1874, as notations in books, notebooks, on chalkboards, and so on?
For example, it's often claimed that ML appeared in 1973, but what exactly happened that year? That year, Robin Milner had the desire to write an interactive theorem prover not in Lisp. Did Milner succeed, and in what year? What part of the interactive theorem prover was written in ML? How many lines of code did it have? Who wrote an interactive theorem prover entirely in ML, and how many years later? And who wrote an interactive theorem prover entirely using ML implementation, in turn, itself written in ML, and when? And how many lines of code were in that implementation? Our work will answer these and other questions.

What was when functional programming was not? What was not?

> Functional programming is an engineering branch of constructive mathematics.  
>              Oleg Nizhnikov.  
We do not aim to define functional programming. However, we are compelled to establish practical boundaries for our study — to choose the history of _what exactly_ we are writing about and to focus on something manageable.  
Many definitions of functional programming are impractical from this perspective. The history of languages with first-class functions today is virtually synonymous with the history of programming languages. Even the few languages that still lack them — such as C++ or Rust — are historically connected in various ways to languages that do have first-class functions. 
Similarly unworkable is the less vast but still unwieldy group of languages historically regarded as functional, which is often the focus of those writing about history of functional programming.  
The main reason this definition of functional languages does not suit us is that this group includes Lisp. And we do not want to write a history of Lisp. Primarily because Lisp’s history is too monumental a topic to allow room for the histories of other languages in this traditional group. Moreover, there is a modern complication that did not exist when the conventional list of functional languages was formed: Lisp is now a typical example of a broad category of languages that were not functional at their inception but became so over time. Just like Java, Lisp had first-class functions added to it. Writing a history of Lisp's functionalization would require comparing it with other languages that underwent the same process — that is, nearly all modern programming languages.  
We would like to set ourselves a realistic goal: to focus on a narrower definition of functional languages and explore their history in greater depth than would be possible if we covered all languages.  
To narrow down our selection, we will add several additional features to first-class functions. Parametric polymorphism and type inference (reconstruction) significantly reduce the scope of our study, though not enough. Therefore, we will also include algebraic data types and pattern matching. This last feature, finally, gives us a reasonably sized family of languages.   
Unfortunately, this combination of features appears rather arbitrary. Furthermore, if algebraic data types and pattern matching ever become as ubiquitous as first-class functions (we wouldn’t bet on it), the problem of an all-encompassing language history will return to haunt our successors — if we have any.   
Subscription levels1

Basic level

$6 per month
Go up