
The Evolution of AI

Over the past few years, Artificial Intelligence (AI) has quietly become an integral part of our lives.

Consider how common AI-infused products and services such as smart speakers (Amazon Echo, Google Home), self-driving cars (Tesla), smile recognition (smartphone camera app), and turn-by-turn directions have become.

Since the definition is based on the current capabilities of computers, this means that AI will change subtly from year to year and dramatically from decade to decade.

While many in our day and age of smartphones and smart toothbrushes might find it hard to consider anything from centuries ago as smart, we only need to recall Elaine Rich’s definition of AI to realize how magical or intelligent some of these devices might have appeared at the time.

Before anyone complains about my inclusion of the Mechanical Turk, I’ll note upfront that this contraption wasn’t an automaton or a robot of any kind.

In Pamela McCorduck’s 1979 book Machines Who Think (updated in 2004), she explores the idea and origins of AI and provides a succinct summary: “artificial intelligence in one form or another is an idea that has pervaded Western intellectual history, a dream in urgent need of being realized, expressed in humanity’s myths, legends, stories, speculation and clockwork automatons.”

AI was being widely researched for the enterprise and for defense, so my role in technical sales afforded me a unique opportunity to see a wide variety of AI applications across multiple industries and dozens of companies.

Over the course of five years and three companies, I witnessed both boom and bust with seemingly endless growth possibilities initially followed by closures and substantial layoffs.

The history of AI is long and intertwined with the history of computers. To make this article more comprehensible, I’ve divided the evolution of AI into three releases or periods – AI 0.5, AI 1.0 and AI 2.0.

Given the state of computing at this time, AI 0.5 consisted primarily of early research offering ways for researchers to represent the problems they were trying to solve, the invention of various tools, languages and techniques, and of course, the first gathering of AI scientists.

To appear as a human, the computer would have to understand natural language, converse on a wide variety of topics and, most important of all, understand the subtle nuances of human languages – definitely an aspirational goal given the state of computing at the time.

With a capacitor holding the weighting of each synapse, each synapse was trainable – that is, the weightings were adjustable and determined whether a signal was more or less likely to propagate further.

When the machine produced the right answer, the operator could reward all the synapses that fired, increasing their weights and the likelihood that those neurons would fire again.

The Mark I Perceptron had an input of 400 photocells (a 20×20 pixel image) that could connect to a layer of 512 neurons, which could then be connected to eight output units.

The connections between the inputs, the neurons, and the outputs could be dynamically configured with a patch panel so his team could try different models and learning algorithms.
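
To make the training idea concrete, here is a minimal, modern sketch of a perceptron-style update rule in Python. It is not the Mark I’s electromechanical implementation; the sizes, data and labeling rule are illustrative assumptions.

```python
import numpy as np

# A minimal, modern sketch of a single-layer perceptron in the spirit of the
# Mark I: 400 inputs (a 20x20 image) feeding one output unit. The sizes and
# training data here are illustrative, not the original hardware's behavior.
rng = np.random.default_rng(0)
n_inputs = 400                      # 20 x 20 "photocells"
weights = rng.normal(0, 0.01, n_inputs)
bias = 0.0
learning_rate = 0.1

def predict(pixels):
    """Fire (1) if the weighted sum of inputs crosses the threshold."""
    return 1 if pixels @ weights + bias > 0 else 0

def train_step(pixels, target):
    """Reward or punish the connections that contributed to the answer."""
    global weights, bias
    error = target - predict(pixels)           # +1, 0, or -1
    weights += learning_rate * error * pixels  # strengthen or weaken "synapses"
    bias += learning_rate * error

# Toy training loop on random "images" with a made-up labeling rule.
for _ in range(100):
    image = rng.integers(0, 2, n_inputs)       # binary 20x20 image
    label = 1 if image[:200].sum() > image[200:].sum() else 0
    train_step(image, label)
```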

While the Perceptron’s input of 400 pixels seems insignificant next to the roughly 2 million pixels of a 1080p HDTV display, it was still impressive given that most computers of the time didn’t even have monitors.

The Logic Theorist was successful on many fronts including solving 38 of 52 mathematical theorems and introducing some new AI foundational concepts – heuristics, list processing, and reasoning as search.

With the understanding that intelligence or reasoning is a smart search, they used a search tree in which the root represents a hypothesis and each branch a logical deduction.

Since search trees can grow exponentially, resulting in a combinatorial explosion that becomes impractical to search, they used rules of thumb or heuristics to reduce the search space and prune any potentially fruitless branches.
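
As a rough illustration of reasoning as search with heuristic pruning, here is a toy sketch; the tree, goal test and “rule of thumb” below are invented for illustration and are far simpler than the Logic Theorist’s symbolic logic.

```python
# A toy illustration of "reasoning as search" with heuristic pruning.
# The expansion rule, goal test, and heuristic below are made up; the Logic
# Theorist worked with symbolic logic rules rather than numeric states.

def search(node, expand, is_goal, promising, depth=0, max_depth=6):
    """Depth-first search that prunes branches the heuristic deems fruitless."""
    if is_goal(node):
        return [node]
    if depth == max_depth or not promising(node):
        return None                      # prune: don't explore further
    for child in expand(node):           # each branch is one deduction/move
        path = search(child, expand, is_goal, promising, depth + 1, max_depth)
        if path:
            return [node] + path
    return None

# Example: find a path of +1/+2 steps from 0 to 7, pruning overshoots.
path = search(
    0,
    expand=lambda n: [n + 1, n + 2],
    is_goal=lambda n: n == 7,
    promising=lambda n: n <= 7,          # rule of thumb: overshooting is fruitless
)
print(path)   # a valid path such as [0, 1, 2, 3, 4, 5, 7]
```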

Finally, in trying to represent search spaces and to support the application of lists of deductions on this search space, they created a new programming language for list processing called Information Processing Language (IPL).

For comparison, this Mark I’s 20KB of memory (4K words of 40 bits, or 5 bytes, each – 5 × 4K = 20K) is roughly a 400,000th of the memory in today’s computers (8GB of RAM) and about a 50-millionth of today’s hard disks (1TB).

As it turns out, choosing your next move based on all possible moves so that you’ll win a game is similar to solving a mathematical proof.

At this point, the ends of the branches (or leaves) would represent all the possible outcomes after two moves – eight for the first move and 8², or 64, for the second.

Unfortunately, after 10 moves (8¹⁰) you’d have to look through more than one billion possible outcomes to determine the best possible path.

While the first couple of moves would result in a manageable number of outcomes, each subsequent move multiplies the number of possible positions resulting in an exponential explosion.

This took 18 years and more than 200 processors to calculate, but by knowing all the possible moves and their possible outcomes, Chinook will never lose a game of checkers.

For example, after 16 moves in Sudoku, there are 5.96 × 10⁹⁸ possible outcomes, dwarfing the total possible outcomes in checkers.

In many ways, this first AI conference was aspirational as they understood that “the speeds and memory capacities of present computers may be insufficient to simulate many of the higher functions of the brain…”

To put computer hardware in perspective, computer memory on $100K research computers was measured in single digit kilobytes and was stored on relatively slow spinning magnetic drums.

Invented at MIT by John McCarthy, it carried on the innovations (list processing and recursion) of IPL at RAND while adding automatic memory management (including garbage collection) and self-modifying code.

List processing is important to AI as it allows programmers to easily store data in collections of lists and nested lists and apply operations wholesale to every item or atom in these lists – kind of like Microsoft Excel spreadsheet formulas working with every cell in a row or column.

The ease of applying operations to lists and sub-lists, along with automatic memory management (a somewhat administrative task otherwise left to the programmer), let AI researchers focus more on the problem at hand, making this an attractive language.

If you have a list of 64 outcomes after two moves, to get all the possible outcomes after three moves, you would only need to apply all possible choices to that list of 64 moves.
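
Here is that idea expressed as a short sketch (in Python rather than LISP): one list operation applied wholesale to every outcome produces the next layer of the game tree. The eight “moves” are abstract placeholders.

```python
# List processing in the LISP spirit (shown here in Python): apply all
# possible moves to every outcome in a list to get the next layer of the
# game tree.
moves = range(8)                                # 8 possible moves per turn

def next_outcomes(outcomes):
    """Expand every outcome by every possible move."""
    return [outcome + [move] for outcome in outcomes for move in moves]

layer = [[]]                                    # the empty starting position
for turn in range(3):
    layer = next_outcomes(layer)
    print(f"after move {turn + 1}: {len(layer)} outcomes")
# after move 1: 8 outcomes
# after move 2: 64 outcomes
# after move 3: 512 outcomes
```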

Finally, the list structure was used to represent both the data and the code, so writing a program that modified itself and then self-executed that modified code opened up some interesting types of applications.

In the 60 years since its inception, the LISP language has spawned over 20 dialects, and as part of AI 1.0, several companies even architected computers – LISP Machines – to run the language optimally.

Whereas previous expert systems intertwined the knowledge and heuristics with the program that used that knowledge, DENDRAL separated the domain-specific rules into a knowledge base and the reasoning mechanism into an inference engine.

This architectural separation would also be the basis of many successful commercial expert systems and decision support systems in AI 1.0 where these companies could focus on providing fast and efficient inference engines.
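
A minimal sketch of that separation might look like the following; the rules are invented placeholders, not DENDRAL’s chemistry knowledge, and a commercial inference engine would be far more sophisticated.

```python
# A minimal sketch of the expert-system split: domain-specific rules live in
# a knowledge base, while a generic inference engine applies them. The rules
# below are invented for illustration and are not DENDRAL's chemistry rules.

# Knowledge base: (set of required facts, fact to conclude)
knowledge_base = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "see_doctor"),
]

def infer(facts, rules):
    """Forward-chaining inference engine: fire rules until nothing new is learned."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"has_fever", "has_cough", "short_of_breath"}, knowledge_base))
# includes the derived facts 'possible_flu' and 'see_doctor'
```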

Modelled after a Rogerian psychotherapist, ELIZA created an illusion that it understood you by pattern matching your responses, finding the subject or object of your statement, and using them in Mad Libs-like responses.
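
A couple of toy patterns show the trick; these are simplified stand-ins, not Weizenbaum’s original script.

```python
import re

# A tiny ELIZA-style exchange: match a pattern, capture the subject/object,
# and slot it into a canned, Mad Libs-like reply.
rules = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"my (.*) hates me", re.IGNORECASE), "Tell me more about your {0}."),
]

def respond(statement):
    for pattern, template in rules:
        match = pattern.search(statement)
        if match:
            return template.format(match.group(1))
    return "Please go on."

print(respond("I feel anxious about work"))   # Why do you feel anxious about work?
print(respond("My boss hates me"))            # Tell me more about your boss.
```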

Several centuries after Da Vinci, in 1970, WABOT-1 seems significantly more human-like with multiple systems for vision, hearing (language understanding and speech), limb control (balance and movement) and hand control (tactile sensors).

While not as agile as the door opening robotic dogs built by Boston Dynamics or as conversational as Hanson Robotic’s Sophia, WABOT-1 did walk and interact in Japanese providing us with a preview of modern humanoid robots.

Essentially, the critique centered around the single layer of neurons, the need for multiple layers and the inadequacy of learning algorithms to address multiple layers.

Light detected by the rods and cones is grouped and abstracted into lines of different angles, which become shapes, which become facial features (nose, eye), which, placed together in the right positions, become a face.

They started to see the boom and bust cycle repeating itself in the 80’s as they had lived through the optimistic and well-funded 50’s and 60’s followed by the pessimistic and waning interest (and funding) of the 70’s.

The recessionary 1970’s might share some blame for the first AI Winter, but most researchers attribute its cause to the hype and unrealistic expectations set during the earlier boom years.

Given the hardware limitations of the time and the need to build tooling and new computer languages, it’s easy to see how the researchers’ software aspirations were light years ahead of the capabilities of the day.

Additionally, others would point to the 1969 Mansfield Amendment which stopped undirected research, and instead, directed research funding towards military-related applications.

Configuring a computer for a customer included determining the right set of CPUs, memory, backplanes, cabinets, power supplies, disks, tapes, printers, cables, and the right software drivers for each of these.

With 30,000 possible configurations, costly mistakes were not uncommon: a computer system delivered without the right cables, print drivers or software meant lost time and lower customer satisfaction.

While none of these national programs resulted in machine intelligence or a change in technology dominance, they did result in significant progress as they funded and collaborated with universities, research institutions, and commercial companies in AI related areas, such as Expert Systems (IntelliCorp), Lisp Machines (Symbolics), Autonomous Vehicles, Natural Language Understanding and Super Computers (Thinking Machines).

As the performance of general computing platforms like Sun Microsystems, HP workstations, and other engineering workstations improved enough for LISP compilers to be hosted on these computers, the role of LISP machines diminished.

Specialized Hardware Wasn’t Just for AI

In the 80’s and 90’s, it was not unusual for computers to be specialized to run a particular language or address a specific type of problem.

While they toiled to organize, coordinate and maintain how their data interacted with their functions within their programs, a lucky set of programmers with access to research computers or a LISP machine used object-oriented (OO) languages such as Flavors or Smalltalk and let the language organize data and code interaction for them.

Instead, the window system tells the file that a right mouse click occurred and the file responds with all the types of possible actions.

The new flavor has a new set of characteristics (vanilla bean and chocolate) and behavior (crunchy) resulting from the combination of the base flavor and the Oreo mix-ins.

If you wanted to create another flavor of ice cream with chewiness and strawberry taste but still have the characteristics of the new flavor you just created, you’d add more flavors (strawberry) and mix-ins (gummy bears).

Believe it or not, this describes the OO concept of inheritance from classes/flavors (deriving characteristics and behavior from a base flavor) and the injection of new behavior via mix-ins.

Using a programming example, if you wanted a window with a scroll bar, you would just add a scroll bar mix-in to a base window class or flavor.
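
In Python’s multiple inheritance, the same idea might be sketched like this; the class names are illustrative, and Flavors expressed mix-ins in its own syntax.

```python
# A sketch of the mix-in idea using Python's multiple inheritance; Flavors
# and other Lisp-era object systems expressed the same concept in their own
# ways. The class names here are illustrative.

class Window:
    def draw(self):
        return ["window frame"]

class ScrollBarMixin:
    def draw(self):
        # add scroll-bar behavior on top of whatever the base class draws
        return super().draw() + ["scroll bar"]

class BorderMixin:
    def draw(self):
        return super().draw() + ["border"]

class ScrollableWindow(ScrollBarMixin, Window):
    pass

class FancyWindow(BorderMixin, ScrollBarMixin, Window):
    pass

print(ScrollableWindow().draw())   # ['window frame', 'scroll bar']
print(FancyWindow().draw())        # ['window frame', 'scroll bar', 'border']
```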

You can see the results of this research in the schooling behavior of the fish and the flocking behavior of the birds in the computer animated short Stanley and Stella (debuted at Siggraph 1987).

Inspired by Digital’s success with expert systems and with the promise that they could replicate their domain experts, many governmental agencies and commercial companies started applying expert systems to a wide variety of problems.

If a similar card with minimal purchases suddenly sees big purchases in a country where the cardholder doesn’t reside, the purchase should be questioned as well.

But if that same card saw travel-related purchases like rental cars, airline tickets or hotels preceding the questionable charge, it probably wouldn’t be questioned, as that’s likely the card owner vacationing.

Finally, if the card holder is a celebrity or a high-level government official, the purchase is less likely to be questioned at the risk of embarrassing them or creating an embarrassing story they might relay on national TV the next time they’re on a talk show.
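
Those heuristics could be sketched as a toy scoring function like the one below; the fields, thresholds and weights are invented for illustration and are not American Express’s actual rules.

```python
# A toy scoring version of the authorization heuristics just described.
# The fields, thresholds, and weights are invented for illustration only.

def suspicion_score(card, purchase):
    score = 0
    # A big purchase abroad on a normally quiet card looks suspicious.
    if (purchase["amount"] > 10 * card["typical_amount"]
            and purchase["country"] != card["home_country"]):
        score += 2
    # Recent travel purchases (flights, hotels, rental cars) explain it away.
    if card["recent_travel_purchases"]:
        score -= 1
    # Avoid embarrassing a VIP cardholder over a borderline charge.
    if card["vip"]:
        score -= 1
    return score

card = {"typical_amount": 40, "home_country": "US",
        "recent_travel_purchases": True, "vip": False}
purchase = {"amount": 900, "country": "FR"}
action = "question" if suspicion_score(card, purchase) >= 2 else "authorize"
print(action)   # authorize (the travel purchases explain the foreign charge)
```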

In 1974, Paul Werbos was the first to propose using Backpropagation to train multi-layer neural nets after studying it as part of his PhD thesis – “Beyond regression: new tools for prediction and analysis in the behavioral sciences.”

Single Layer Learning versus Multi-Layer Learning

With a single layer, training simply meant changing the weighting of the connections to the neuron that produced the correct or incorrect outcome.

Since these hidden layers were partly responsible for the last neuron to fire, both directly (layer X-1) and indirectly (layer X-2 and earlier), it makes sense to strengthen (correct answer) or weaken (wrong answer) those connections as well.
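
A minimal NumPy sketch of backpropagation through one hidden layer shows how the correction reaches the hidden weights as well as the output weights; the network size, data and learning rate are illustrative.

```python
import numpy as np

# A minimal two-layer network trained with backpropagation, to show how the
# correction reaches the hidden layer's weights as well as the output's.
rng = np.random.default_rng(1)
X = rng.random((32, 4))                                  # 32 samples, 4 features
y = (X.sum(axis=1, keepdims=True) > 2).astype(float)     # toy target

W1 = rng.normal(0, 0.5, (4, 8))             # input -> hidden
W2 = rng.normal(0, 0.5, (8, 1))             # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(2000):
    # forward pass
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # backward pass: the output error ...
    d_output = (output - y) * output * (1 - output)
    # ... is pushed back through W2 to blame/credit the hidden layer too
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    W2 -= lr * hidden.T @ d_output
    W1 -= lr * X.T @ d_hidden

print(np.round(output[:5].ravel(), 2), y[:5].ravel())
```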

During this time, numerous defense companies were researching applications of AI for autonomous vehicles, battle management systems, pilot assistants and filtering intercepted communication.

While it’s unclear how many of these resulted in actual deployments, there is no doubt that the funding helped advance AI fields such as voice recognition, natural language understanding, machine vision, planning and image recognition.

From expert systems that reorient the station to avoid deadly space debris to more general needs like diagnosing system malfunctions, having an expert system on the station was the next best thing to having the expert there in person.

This flying, AI assistant robot will float in zero gravity and propel itself towards the astronaut when called to help with experiment procedures.

Speaking from experience, developing software on the Symbolics with its high-resolution monitors, optical mouse and Lisp debugging environment was like driving a Porsche down the hairpin turns of Pacific Coast Highway – it just felt right.

As military systems finished research and considered deployment, the memory, power and cost constraints of deploying in an embedded environment or in a hardened environment such as the Space Station, a fighter jet cockpit, or an armored vehicle became an issue.

Unfortunately, between the standardization of Ada for DoD projects and the strategy of using Expert System Shells that compiled into standardized languages, the Lisp Machine never broke out of its role as a research and development machine.

Two macro events also caused a reduction in AI spending – freezing conditions that caused the failure of an O-Ring on the Space Shuttle and warming conditions that brought down a wall.

Due to the effects of unusually cold conditions on critical O-Rings, what started as a routine mission turned into a spectacular and deadly explosion 73 seconds into the launch taking the lives of seven brave astronauts.

On November 9, 1989, East and West Berliners were allowed to freely cross the Berlin wall and by the following weekend, citizens were tearing down the wall brick by brick.

While the combined worldwide effort didn’t result in the sought-after breakthroughs or any significant change in the balance of power (technology wise), it did push Computer Technology and AI further along than it would have been otherwise.

While some high-profile AI companies suffered some significant setbacks, this did not diminish the value of individual AI technologies that saw commercial or enterprise success.

Also, as AI-focused computers such as Lisp Machines fell out of vogue (along with the big specialized hardware budgets that supported them), many of the Lisp environments and Expert System shells simply ported their solutions to standard workstations and standard languages.

From our smartphones to self-driving cars to home appliances, we’ll find claims of AI tech improving performance or enhancing our experience in some way.

For example, a smart speaker uses multiple AI technologies (voice recognition, speech recognition, language understanding, language translation, neural networks) and would be hard to describe other than to aggregate the multiple AI components as “The AI.”

Let’s look at the Apollo guidance computer versus the Samsung Galaxy S9. Again, this is not a fair comparison for the Samsung Galaxy S9, since the Apollo computer was custom built by MIT, only did one thing and cost a lot more than $110 (the S9’s $720 retail price today expressed in 1969 dollars).

Plus, the S9 is a general purpose mobile computer that communicates wirelessly at 4G speeds, records 4K video and contains its own power source.

Just look at the cost of a 286-based PC with an EGA color monitor in the late 1980’s (about $2,000 if you built it yourself) versus a mid-range 8″ tablet today.

Between the displays (640×350 with 16 colors vs 1024×768 with 16M colors), the processors (eight fast cores versus one slow core), the memory (GigaBytes versus MegaBytes) and the size, the computing capabilities have become much more affordable and accessible now than ever.

All the miniaturization of components, processors, memory and wireless antennas has put small, connected computers or if you will, IoT devices (wearables, AR glasses) and sensors everywhere at relatively affordable costs.

Just count all the devices you use every day to gain an idea of how accessible they are – fitness tracker, smartphone, home smart thermostat, TV, car, smartwatch.

For your e-commerce website to calculate sales tax on its own, you’d have to take into consideration the type of product or service sold and the customer’s state, county and possibly municipality.

You’d provide the type of product or service, area code and possibly address, and voila, the web service would return the current tax rate.
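
The pattern might look something like this sketch; the endpoint, parameters and response fields are hypothetical, and a real tax service would document its own API.

```python
import requests

# Calling a hypothetical tax-rate web service instead of maintaining tax
# tables yourself. The URL, parameters, and response fields below are made
# up to illustrate the pattern; a real provider documents its own API.
response = requests.get(
    "https://api.example-tax-service.com/v1/rate",      # hypothetical endpoint
    params={
        "product_type": "clothing",
        "postal_code": "94103",
        "street_address": "123 Market St",
    },
    timeout=5,
)
response.raise_for_status()
rate = response.json()["combined_rate"]                  # hypothetical field
print(f"Charge {rate:.2%} sales tax on this order")
```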

With cloud services, if you’re putting together an online service, you don’t have to buy the servers, hire the computer system operators and lease out the space.

Additionally, there is no significant start-up cost or long lead time typically associated with setting up data centers and hiring and training staff.

For AI, cloud services mean access to proven machine learning frameworks, so you can train your own AI and then host that trained AI for others to use (or subscribe to) without buying hardware, hiring computer operators or leasing server space.

With web services, cloud services and mobility, your AI service can be hosted on a super computer so that it’s accessible from your app on smartphones.

The three ingredients are the availability of the large amounts of data needed to train neural nets, the breakthroughs in deep learning and the speed gains from using GPUs.

Data is essential to training neural networks – that is, to turning a blank neural network with zero knowledge into an AI with expertise, whether in image recognition or the ability to predict or advise.

While enterprises and government entities might have shared data within their organizations, the information in standalone PC applications generally stayed with the applications.

The Internet

With the internet, websites like Amazon, YouTube, Facebook and others now have an enormous amount of data that can be used for predictive and user behavioral analysis, such as recommendation engines.

But these PCs and laptops weren’t truly mobile, so they weren’t always on and didn’t see into all aspects of our lives – they still sat at home or at a coffee shop.

IoT

Add to that as many as 10 billion IoT devices such as cars, smartwatches, smart speakers and home security systems, and the volume of generated data becomes incomprehensible.

One doesn’t have to look far for free, open data sets for training your AI for image, natural language or speech processing or free data sets from government agencies.

Teaching or training a neural network involves examining the output produced by the neural net versus the desired output, and then adjusting the weights or states of the connections with the intention of getting closer to producing the correct answer.

With deep neural nets, the correction needs to be distributed through all the hidden layers and paths that helped select the right or wrong answer – not just the ones close to the results.

As pointed out in AI 0.5, Minsky’s 1969 book Perceptrons concluded that the learning algorithms available at the time didn’t address deep neural nets.

That said, we were still in the heart of AI 1.0, where Expert Systems were still seeing significant success and most AI research was directed away from neural networks as a result of Minsky’s 1969 critique.

Yes, there would be some significant milestones like IBM Deep Blue’s defeat of the world chess champion in 1997, but that victory was primarily a result of sheer computing muscle and not based on neural networks.

This along with additional changes to heuristics resulted in Deep Blue’s ability to search 100 million to 200 million positions per second.

Raina et al. explained how GPUs are naturally suited to neural networks when they said GPUs “are able to exploit finer-grained parallelism … designed to maintain thousands of active threads at any time … with very low scheduling overhead.”

In a 2014 interview, he attributed the recent AI explosion to what had changed since the 1990’s with this statement: “What’s different is that we can run very large and very deep networks on fast GPUs (sometimes with billions of connections, and 12 layers) and train them on large datasets with millions of examples.”
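
The speedup comes from the fact that a neural-network layer is essentially a large matrix multiply, which maps naturally onto thousands of GPU threads. A small sketch, assuming PyTorch and falling back to the CPU if no CUDA-capable GPU is present:

```python
import torch

# The core of a neural-network layer is a big matrix multiply, exactly the
# kind of fine-grained parallel work GPUs are built for.
device = "cuda" if torch.cuda.is_available() else "cpu"

activations = torch.randn(4096, 4096, device=device)    # a batch of activations
weights = torch.randn(4096, 4096, device=device)        # one layer's weights

outputs = activations @ weights                          # runs on the GPU if available
print(outputs.shape, outputs.device)
```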

From Rosenblatt’s Perceptron through the 1980’s (Hopfield Network, Radial Network, Boltzmann machine) to the 90’s (Recurrent networks, long short-term memory) to the 2010’s (Deconvolutional network, Deep convolutional inverse graphics networks), there have been new variants and ultimately combinations of different neural network types to form architectures for specific tasks or problems.

With faster computing power that’s affordable locally and through web services, dozens of neural network platforms including ones with pre-trained models are now widely available.

In early 2011, IBM Watson easily beat the two most successful Jeopardy champions over two matches with total winnings of $112K – the two runners-up totaled only $34K.

Go, a 2,500-year-old game, is considered one of the world’s most complex games with more possible moves than chess – that means a larger search space with an exponential explosion that isn’t easily addressed by brute force.

While it’ll be hard to predict the eventual winner or winners here, these chips and possibly their descendants will go a long way to providing the power needed to approach the 86 billion neurons in a human brain.

While primarily focused on commercial applications, some of those advances were coupled with military defense systems such as super computers to reduce radar signatures of stealth jets or to increase image/terrain recognition accuracy for cruise missiles.

Whoever excels in AI will have a totally epic experience surfing the next wave of innovations such as autonomous vehicles (cars, planes, taxis, delivery), predicting consumer behavior, targeting ads, workforce automation, nanotechnology, and IoT.

Even without watching the dark Slaughterbot short film that predicts a bleak future with AI assassin drones, it’s not hard to imagine how AI might enhance weapons systems.

Here are a few reports on current and aspirational projects: swarming airborne drones, naval autonomous ships, unmanned weapons, unmanned fighter jets and robotic soldiers.

One of the guiding principles as printed in their August 31, 2018 report is “Human responsibility for decisions on the use of weapons systems must be retained since accountability cannot be transferred to machines.”

As for you and me, we must have absolute confidence in our self-driving car’s ability to avoid pedestrians and other cars before we take a nap or watch a movie on our commute home.

In Siddhartha Mukherjee’s 2017 New Yorker article “A.I. Versus M.D.”, he summarizes an ANN as a self-taught black box that has looked at vast numbers of examples (in this case, pictures of moles versus melanoma).

The ANN can easily diagnose a picture as a result of all that training, but other than pointing to some pictures that your mole resembles, it’ll never know why or give any reason based on medical knowledge.

Hinton follows up with “the more powerful the deep-learning system becomes, the more opaque it becomes … Why these features were extracted out of millions of other features, however, remains an unanswerable question.”

From AI 0.5, the DENDRAL system could produce new chemical structures, and from AI 1.0, the American Express Authorization Assistant was successful at recommending whether to deny or authorize purchases to minimize fraud.

After training an AI on existing movie and TV scripts, Goodwin and his colleague, film director Oscar Sharp, generated a script and, based on that script, directed a short movie (Sunspring) for a film challenge.

A similar project presented by Suwajanakorn et al. at SIGGRAPH 2017 shows the generation of talking-head video based on spoken audio.

Now that AI can write scripts from ideas or images, create musicians from audio and create character mouth motions to deliver dialog, what about creating believable human speech?

Although they’re only using it to assist you with restaurant reservations, there’s no reason why this same technology couldn’t be used to create natural speech for AI-created characters and dialog in an AI-generated script.

Given what we’ve seen, AI can already create the script, voice the actors, generate the lip movement, write the soundtrack and generate the extras.

The evolution of AI has been a long journey and over the course of 60 plus years, we’ve seen significant foundational work and many surprising twists and turns.

From Marvin Minsky’s 1951 grand piano-sized, 40-node neural network to GPU chips to the recent neuromorphic supercomputer approaching one billion neurons, the hardware advancements have been astonishing. Equally impressive have been the advancements in software and software tooling – e.g. lists to LISP to object-oriented languages.

For more than 60 years, each software and hardware development has laid the foundation for subsequent discoveries. Today’s neural networks would not be possible without inexpensive memory chips, the speed of GPUs or the massive amounts of data from the internet and our mobile devices. Essentially, AI has grown as fast as computer hardware and software have allowed it to.

While hardware speeds have been predictable (Moore’s law), determining which AI technologies would bear fruit has not. Through AI 0.5, AI 1.0 and AI 2.0, we’ve seen promising technologies such as neural networks blossom, go dormant and then, surprisingly, come back to life, proliferating into many aspects of our lives.

AI owes this latest revival to ML and DL researchers like Hinton. Now, some 30 years later, because of their work, AI has touched everyone’s lives with applications such as AI assistants, turn-by-turn directions and internet searches.

Just as most AI industry experts during AI 1.0 could not have predicted today’s dominance of neural networks, we should also expect to be equally surprised at what blossoms in the next 10 to 20 years.
