Ratchet Programming

May 26, 2021

Machine learning companies are already moving to persistent memoization. Right now they're doing it on a large scale with things like complete Docker images. I think this should be done at a small scale with individual macros built into your programming language. Naturally, this means doing your machine learning in Lisp.
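To make "small scale" concrete, here is a minimal sketch of persistent memoization at the level of a single function. It is written in Python rather than as a Lisp macro, and it is not the interface from Place-Based Programming. The name `persistent_memoize`, the `cache_dir` parameter, and the on-disk layout are all assumptions made for illustration. It also assumes the arguments and the result can be pickled, and that pickling the same arguments always yields the same bytes.

```python
import functools
import hashlib
import os
import pickle


def persistent_memoize(cache_dir):
    """Cache a function's results on disk so they survive across runs.

    Hypothetical helper: the name and cache layout are illustrative only.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            # Key the cache on the function name and its positional arguments.
            key = hashlib.sha256(pickle.dumps((fn.__name__, args))).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                # Cache hit: load the stored result instead of recomputing.
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args)
            os.makedirs(cache_dir, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Decorating an expensive training step with `@persistent_memoize("some/cache/dir")` would mean a second call with the same arguments reads the result from disk. A real system would also need to invalidate the cache when the function's code changes, which this sketch does not attempt.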

I explored what the interface to such a memoization system ought to look like in Place-Based Programming part 1 and part 2. But granular persistent memoization is only the first step toward taking machine learning to the next level. The next step toward statistical metaprogramming is to combine our memoization system with Bayesian search and ensemble learning.

Centralized Bayesian Ensemble Learning

There are many methods of classification: random forests, support vector machines, neural networks. Suppose you have $n$ classification algorithms to choose from, exactly one of which is best for your problem. If you pick one at random then the probability is $\frac{n-1}{n}$ that you pick a less-than-optimal algorithm. If you instead do a naïve ensemble search you'll consume $\log n$ entropy from your dataset just to find out which algorithm is best.
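The two quantities above are easy to compute. This small sketch assumes a uniform random pick and a single best algorithm, and it measures entropy in bits (base-2 logarithm), a choice the text does not specify. The function names are made up for illustration.

```python
import math


def suboptimal_pick_probability(n: int) -> float:
    """Chance that a uniform random pick among n algorithms misses the one best."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) / n


def selection_cost_bits(n: int) -> float:
    """Entropy, in bits, of a uniform choice among n candidate algorithms."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.log2(n)
```

With 8 candidate algorithms, a random pick is wrong with probability 0.875, and singling out the best one costs 3 bits.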

With transfer learning, you can spread out the $\log n$ price tag across many different problems. From a user interface standpoint, the way to do this is to create a single classification function which (to the programmer) does an ensemble classification. The more problems this function is applied to the better it gets. Does this look like a winner-take-all economy? Yes. Yes it does.

At first our classification function need only adjust its prior probabilities that each algorithm is the best one to use. As it collects more data it can infer more complex relationships. Perhaps the best algorithm to use depends on your dataset size. You can find the equations for how data-expensive this is in Hypothesis Space Entropy.
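One simple way to "adjust prior probabilities" is to keep a count of how often each algorithm turned out best on past problems and smooth those counts with a pseudocount (a Dirichlet-categorical model). This is only a sketch of the simplest stage described above, not a full Bayesian search. The class name `AlgorithmPrior`, its methods, and the default pseudocount of 1 are assumptions for illustration.

```python
from collections import Counter


class AlgorithmPrior:
    """Running estimate of the probability that each algorithm is the best one.

    Hypothetical helper: uses a symmetric Dirichlet prior over algorithms.
    """

    def __init__(self, names, pseudocount=1.0):
        self.names = list(names)
        self.pseudocount = pseudocount  # prior weight given to every algorithm
        self.wins = Counter()           # how many problems each algorithm has won

    def record_winner(self, name):
        """Record that `name` was the best algorithm on one more problem."""
        if name not in self.names:
            raise KeyError(name)
        self.wins[name] += 1

    def probabilities(self):
        """Posterior mean probability that each algorithm is best on a new problem."""
        total = sum(self.wins[n] + self.pseudocount for n in self.names)
        return {n: (self.wins[n] + self.pseudocount) / total for n in self.names}
```

Every problem the shared function solves feeds one more `record_winner` call, so the estimate sharpens as the function is applied to more problems. Conditioning on features such as dataset size would require a richer model than these plain counts.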

I call this "ratchet programming" because the data efficiency of functions like classification increases monotonically over time.