Origin Story
Right now, I am deeply immersed in the mechanics of machine learning — the math, the programming, the hardware, everything. It’s interesting that at my age I should be just diving into this sort of thing. But it didn’t come entirely out of the blue. Here is my origin story.
My story started back when I was still an undergrad at Princeton majoring in electrical engineering. I was working the summer after my sophomore year as an intern at a large defense contractor for an engineering scholarship that I had received from that same company. Specifically, I was designing logic circuits for a hardware implementation of a functioning software neural network. The gentleman for whom I was working (whose name I’ve long since forgotten), had built a software neural net that, hooked up to a speech synthesizer, had learned to pronounce english words as read from ASCII text. I have no idea how much computation and time it took to train that software network, but I know it was a very small and shallow network, and I can assume that it was very, very slow. After all, this was back in 1987 when computing power was many orders of magnitude smaller than it is today. If I recall correctly, state of the art at the time (for consumer computing at least) was somewhere in the vicinity of the 5Mhz 68000 microprocessor in the original Apple Macintosh. I also don’t recall — and possibly never really completely understood — the algorithms he had used. But what I did know is that he needed these networks to run much faster if they were to have practical application. His answer to this was rather than to use a general purpose computing architecture to run (presumably) propagation and back propagation in software, he was attempting to design and build a highly specialized system to do this directly at the hardware level and thus much faster. I was helping him at the time by designing specialized logic circuits for components in his system.
As part of this, I was also able to get a beginners tutorial on how neural nets worked. I became familiar with the functions of propagation and back propagation, if not the particular algorithms for performing those operations. More importantly, though, I saw the resulting software system in action. I got to see how this very simple neural net performed at various stages of training. It blew my mind.
What was most fascinating to me was the kinds of pronunciation errors this machine had made at various stages of the training. They struck me as not unlike the sorts of errors a small child might make when learning to pronounce words. And even though what he had built was minuscule when compared to what needs to be brought to bear to synthesize what we understand as consciousness and “real” intelligence — a grain of sand on a beach — it was apparent to me that the essence of true intelligence was there in this system. It seemed obvious that building the same mechanics, layer upon layer upon layer, would (or at least could) form the basis for something indistinguishable from real intelligence and perhaps even real consciousness. Short of that, it also seemed apparent that if even much more basic and narrow “intelligence” functions could become practical, it would revolutionize so many aspects of our lives.
That was back in 1987, and thanks to the tumult and vicissitudes of an unfocused young adulthood, I didn’t pursue that field, nor engineering in general. That may have been just as well. Given where things were back them from both a computational and a theoretical standpoint, working with neural networks was not something that had much value outside academia. Needless to say a lot has changed since then, and even though my career has not been specifically in engineering, I have watched the ascendence of machine learning in the last decade or so with great interest. My main reaction has been “I knew it.” I remember one of the specific thoughts I had back in 1987 was that a neural net of whatever complexity could be trained once for any demanding task and then copied — or manufactured — infinitely.
Lo and behold. An autonomous, mass-producible flying drone running on deep learning:
And if you’re a nerd about this stuff as I am, here is some of how this technology works:
This is amazing stuff.
However, I’ve also gotten wind of dire concerns from a few smart, rich people like Elon Musk and Bill Gates about what seem to be apocalyptic scenarios in which AI becomes super intelligent and runs amok. Not having read in detail what their concerns were, they seemed silly. But then I read this primer on the subject and those concerns didn’t seem quite as silly.
With this dark understanding, there was a new moral/ethical/philosophical dimension to AI for me. It’s one that I find fascinating because it sits right in the border area between what we understand and what we don’t, and therefore, what we control and what we don’t. Because now, even with simple neural nets, while we can build these things with known and well understood math — I’ve already done one or two of these myself — the internal workings of these machines are largely inscrutable.
For example, a few weeks ago, I completed my first simple neural net project, one which would recognize handwritten numbers from 0-9. It is comprised of 400 inputs, one hidden layer with 25 nodes, and one output layer of 10 nodes. That’s a mere 435 nodes and 10,285 connections. Compare my tiny little net to Google’s large scale Tensor Flow models which are built with upwards of 10 billion connections. It’s minuscule by comparison, and yet it’s not even completely clear what the function of each of my 25 feature-classifying hidden nodes is. As part of the course in which I did this exercise, it was explained that the visualizations of the hidden nodes should roughly correspond to handwriting strokes that the node has learned to recognize — not manually learned as the result of some coder designing heuristics, mind you, but automatically learned as a result of the mathematics of the learning algorithms and optimization functions. Here’s that visualization:
I suppose it’s possible to see how some of those visualizations correspond to strokes, but not with any great clarity. However depending upon the always-randomized initial values of theta, training the exact same network again with the exact same training examples can result in a completely different configuration of hidden nodes:
Yet both of these perform the same classification function with near-identical accuracy.
Here’s the takeaway. Once these systems — even the tiny ones like mine — have been “taught” and therefore have “learned” some skill, there’s simply no way to map out and understand exactly what’s going on in the system itself. When you get into huge networks like Google’s with millions of nodes and hundreds of billions of connection, we’ll never know exactly what’s going on.
The AI machines we’ll create in the future will make Google’s current best efforts look like my tiny, character recognition net. These machines will be far more analogous to biological systems than they will be to how we understand computer systems. And yet we will have built these systems ourselves, presumably to serve us and make our lives better.
I love these sort of moral/ethical/philosophical issues in general, and especially when they are directly related to technology. Even more exciting, I’m now learning the technology itself. I understand, for the time being, the math being brought to bear and how to design the algorithms to create these machines and can see the foundations of a technology that may ultimately be the home to an intelligence greater than our own.
In these posts, I will be sharing my own journey in understanding the present state-of-the-art. But also, I’m looking forward to being able to analyze and discuss some of what this means in a larger context. Where are these machines going? How will we use them? How will we control them? What are the problems we can expect to have to deal with? What will this mean for our society?
It should be a fun trip.
And while I’m on it, I may play a few video games from time to time.