The Rise of Machine Intelligence

April 10, 2018

Do Machines Have the Ability to Think?

The question may seem straightforward, and our innate capacity for introspection gives each of us an intuitive sense of what thinking is. However, answering it without relying on personal, subjective experience, such as the voice in our minds that narrates our thoughts, requires a formal definition of what thinking actually entails.

Humans have long been fascinated with building objects that mimic animal and human behavior, from toys that simulate bird song and flight to Leonardo Torres y Quevedo’s chess-playing automaton, El Ajedrecista, in 1912.

However, there is a marked difference between machines that simulate the physical characteristics of humans and machines that simulate the intellectual ones.

Historical Milestones that led to AI and Machine Learning

While the developments that made it possible to address this question more formally are too numerous to exhaustively list here, it is worth noting some broad milestones:

In mathematics and logic:

Gottlob Frege’s development of modern logic in the late 19th century
Bertrand Russell and Alfred North Whitehead’s 1910 publication of Principia Mathematica, which attempts to show that mathematics is reducible to symbolic logic
David Hilbert’s introduction of the Entscheidungsproblem, which asks for a method that can take any mathematical (logical) proposition and, from a set of axioms, determine its validity
Gödel’s incompleteness theorems in 1931, which show that no sufficiently expressive formal system can be both complete and consistent
Claude Shannon’s introduction of the field of information theory in 1948

In psychology and neuroscience (animal and human learning):

The classical and operant conditioning of behaviorists such as Ivan Pavlov and B.F. Skinner
Kenneth Craik’s 1943 notion of mental models and their use in human reasoning
Theories of synaptic and neural plasticity proposed by Donald Hebb in 1949
The neural organization of serial order in behavior, described by Karl Lashley in 1951

In engineering:

The field of cybernetics, introduced by Norbert Wiener in 1948 to study control systems with environmental feedback
Dynamic programming and its relation to optimal control theory, studied by Richard Bellman, among others, in 1953

McCulloch and Pitts and the first neural network

One major development that occurred after, and was inspired in part by, the Turing machine was the introduction of the first neural network by Warren McCulloch and Walter Pitts in their seminal paper, A Logical Calculus of the Ideas Immanent in Nervous Activity. In fact, the work by McCulloch and Pitts would arguably have far greater influence over early artificial intelligence (AI) researchers than Turing’s work.

The story of how McCulloch and Pitts came to work with one another is itself fascinating [1]. Urban legend holds that the film Good Will Hunting was based on the life of Pitts.

Almost equally amazing is the fact that the first neural network was developed in 1943, contrary to contemporary portrayals of deep learning as a more recent breakthrough technology.

Building on the propositional logic of Russell and Whitehead’s Principia Mathematica, and drawing on their knowledge of neuroanatomy, McCulloch and Pitts developed a theory of how neurons can be interconnected through a set of synaptic weights in a way that recreates the functioning of logic gates.

With a set of such gates, it is possible to construct a neural network that computes the truth value of an arbitrary logical proposition.
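To make the idea concrete, here is a minimal Python sketch of a threshold unit in the spirit of McCulloch and Pitts. The function names, weights and thresholds are illustrative assumptions rather than anything taken from the original paper, and the negative weight used for NOT is a simplification of how inhibition was treated there.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Weights and thresholds are set by hand; nothing here is learned.
def AND(a, b):
    return mp_neuron([a, b], weights=[1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], weights=[1, 1], threshold=1)

def NOT(a):
    # An inhibitory (negative) weight with threshold 0 inverts the input.
    return mp_neuron([a], weights=[-1], threshold=0)

def implies(p, q):
    # Gates compose into larger propositions: "p implies q" is (NOT p) OR q.
    return OR(NOT(p), q)

# Print the truth table of the composed proposition.
for p in (0, 1):
    for q in (0, 1):
        print(p, q, implies(p, q))
```

Because each gate is just a unit with fixed weights, wiring units together is enough to evaluate a compound proposition such as the implication above.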

Their model grossly oversimplified the structure and function of neurons, and it could not learn or adapt (the synaptic weights had to be set by hand). However, it inspired John von Neumann’s computer architecture and was a huge inspiration to the cadre of researchers who would later introduce the phrase artificial intelligence.

AI: The Dartmouth workshop that named it all

Beating out alternatives like machine intelligence, thinking machines and cybernetics, the phrase artificial intelligence was coined by John McCarthy in 1955.

He used it to describe plans for a summer workshop that would bring together a small group of researchers from diverse backgrounds who were studying concepts related to machine intelligence. The goal, as described by McCarthy, Claude Shannon, Marvin Minsky and Nathaniel Rochester in their proposal for the workshop, was defined as follows:

The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.

McCarthy chose the name in part to differentiate the burgeoning field they were creating from the many fields the researchers were coming from. It should be noted that McCarthy also wanted to avoid the title cybernetics for fear of having to deal with an overbearing Norbert Wiener [2].

The proposal itself contained seven themes and a call for individual researchers to propose their own topics. Among the themes were “How can a computer be programmed to use a language?”, “Neuron nets,” “Self-improvement” and “Abstractions,” the last of which refers to learning abstractions from sensory inputs.

The topics laid out at the workshop would largely shape the future direction of AI, uniting researchers from disparate fields towards common goals and creating acrimonious divisions between researchers who disagreed about the best method to achieve them.

Development of AI

Since the Dartmouth workshop, the progression of AI has seen various techniques wax and wane. For instance, the current deep-learning revolution is actually the third period of relative popularity for neural networks.

The first period, from the 1940s to 1960s, began with the invention of neural networks by McCulloch and Pitts and extended to the development of the perceptron.

The perceptron was a simple neural network developed by Frank Rosenblatt in 1957 that could adapt and learn, and was capable of simple forms of optical character recognition.

Despite their promising capabilities, neural networks were effectively killed as a field when Marvin Minsky, an early proponent of neural nets, and Seymour Papert published their book, Perceptrons, in 1969.

In it, they detailed the limitations of Rosenblatt’s perceptron by proving that it was incapable of learning solutions to whole classes of mathematical problems. The most famous was the XOR function, where a network would have to learn to output the result of an “exclusive or” on two inputs.

Although it was realized later that this limitation could be overcome with relatively small changes, such as adding a hidden layer of units with non-linear threshold functions, the book was persuasive enough to eliminate funding for, and interest in, learning algorithms inspired by the brain.
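The XOR limitation is easy to reproduce. The sketch below is an illustration under stated assumptions, not Rosenblatt’s original implementation: it trains a single threshold unit with the standard perceptron learning rule on AND and on XOR, then shows a hand-wired network with one hidden layer that computes XOR. All names, weights and the choice of 50 epochs are assumptions made for the example.

```python
def step(z):
    # Threshold activation: fire when the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def train_perceptron(samples, epochs=50, lr=1):
    """Perceptron learning rule for a single threshold unit with two inputs."""
    w1 = w2 = b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + b)
            w1 += lr * error * x1
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

def accuracy(params, samples):
    w1, w2, b = params
    hits = sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(train_perceptron(AND_DATA), AND_DATA)  # linearly separable
xor_acc = accuracy(train_perceptron(XOR_DATA), XOR_DATA)  # not linearly separable

def xor_two_layer(x1, x2):
    # Hand-set weights: two hidden threshold units feed one output unit.
    h_or = step(x1 + x2 - 1)        # hidden unit 1 behaves like OR
    h_and = step(x1 + x2 - 2)       # hidden unit 2 behaves like AND
    return step(h_or - h_and - 1)   # output fires for "OR but not AND"

print(and_acc, xor_acc)
print([xor_two_layer(x1, x2) for (x1, x2), _ in XOR_DATA])
```

The single unit classifies every AND example correctly but can never get all four XOR examples right, because no single straight line separates the two XOR classes. The two-layer version succeeds only because its weights were chosen by hand; learning such weights automatically is a separate problem.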

The void left by the disappearance of neural networks was filled by what would later be referred to as good old-fashioned AI (GOFAI). The techniques that defined GOFAI were largely based on symbolic logic. This contrasts with the sub-symbolic processing of a neural network, where processing is spread across many neurons or nodes, and where representations can be distributed and continuous.

GOFAI made use of production rules, such as If-Then, and search techniques where possible hypotheses about actions and their resulting consequences could be laid out, evaluated and compared. Expert systems were developed that attempted to formalize the knowledge of topic experts into representations suitable for computers and algorithms to operate on.
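As a rough illustration of how If-Then production rules can drive reasoning, here is a small forward-chaining sketch in Python. The rules and fact names are made up for the example and are not drawn from any real expert system.

```python
# Each rule pairs a set of conditions (the "If" part) with a conclusion (the "Then" part).
# These rules are hypothetical and exist only to illustrate the mechanism.
RULES = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions are all known, until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

derived = forward_chain({"has_fever", "has_cough", "short_of_breath"}, RULES)
print(sorted(derived))
```

The knowledge lives entirely in explicit, human-readable rules, which is the defining contrast with the distributed weights of a neural network.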

Despite the success of GOFAI, the trend towards symbolic AI was met with resistance by the first revival of neural networks in the late 1970s and 1980s. During this period, neural networks were known as connectionist systems because of their widely interconnected systems of neurons.

This revival was due mainly to the introduction of techniques like adaptive resonance theory (ART), a biologically plausible neural network, and backpropagation, a learning algorithm that adapts the weights of an artificial neural network and showed how a solution to the XOR problem could be easily learned.
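For readers who want to see backpropagation at work on XOR, the following is a minimal pure-Python sketch. The network size (three hidden units), learning rate, epoch count, random seed and squared-error loss are all assumptions chosen for illustration; it is not the formulation from any particular paper, and convergence to a perfect XOR solution is not guaranteed for every initialisation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
HIDDEN = 3  # number of hidden units (an arbitrary choice for this sketch)

def init_params(seed=0):
    rng = random.Random(seed)
    # Each hidden unit has two input weights plus a bias.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(HIDDEN)]
    # The output unit has one weight per hidden unit plus a bias (last entry).
    w_out = [rng.uniform(-1, 1) for _ in range(HIDDEN + 1)]
    return w_hidden, w_out

def forward(params, x):
    w_hidden, w_out = params
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)) + w_out[-1])
    return h, y

def loss(params):
    # Mean squared error over the four XOR examples.
    return sum((forward(params, x)[1] - t) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)

def train(params, epochs=5000, lr=0.5):
    w_hidden, w_out = params
    for _ in range(epochs):
        for x, t in XOR_DATA:
            h, y = forward(params, x)
            # Output delta: prediction error times the sigmoid derivative.
            delta_out = (y - t) * y * (1 - y)
            # Hidden deltas: the output delta propagated back through w_out.
            delta_h = [delta_out * w_out[j] * h[j] * (1 - h[j]) for j in range(HIDDEN)]
            for j in range(HIDDEN):
                w_out[j] -= lr * delta_out * h[j]
                w_hidden[j][0] -= lr * delta_h[j] * x[0]
                w_hidden[j][1] -= lr * delta_h[j] * x[1]
                w_hidden[j][2] -= lr * delta_h[j]
            w_out[-1] -= lr * delta_out
    return params

params = init_params()
before = loss(params)
train(params)
after = loss(params)
print(before, after)
print([round(forward(params, x)[1], 2) for x, _ in XOR_DATA])
```

The key step is the computation of delta_h, which passes the output error backwards through the output weights so that the hidden units can be adjusted too. That is what a single-layer perceptron rule cannot do.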

The era was ushered in with a book by James McClelland and David Rumelhart called Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Although highly technical, the book was a hit and was written about in the New York Times Book Review.

Despite this newfound glory, the second era of popularity for neural networks was also short-lived due to limitations in computing power and the scarcity of data with which to train models.

Consequently, neural networks were limited to toy problems, again leaving them open to critique from proponents of symbolic approaches. A second AI winter would set in and last until the early 2000s.

The current deep learning revolution has given neural networks their third act. Developments like the long short-term memory (LSTM) model developed in 1997 by Hochreiter and Schmidhuber, as well as Hinton’s 2006 introduction of deep belief networks (DBNs), showed how to overcome some limitations of earlier models.

Coupled with increasing computational power from graphics processing units (GPUs), along with the ever-increasing availability of data, deep learning models began to see dramatic improvements in error rates for common machine learning tasks.

The sudden gains made by neural networks in speech recognition, computer vision and natural language processing have had a far-reaching impact. Google, Facebook, Microsoft and other large companies with a strong interest in processing speech, image and textual data began to invest significant resources in research and development, which accelerated the pace of development in AI.