
Copyright and Licensing

An old version of the text is also available on Amazon.com Kindle (also available for free as a .mobi file you can email to your Kindle); this is not recommended for students taking courses -- see the PDF / bound book options above.

See Contributors for information about how to contribute and who has already contributed, Status for the current status and future plans for the text, and Pedagogy for pedagogical information for teachers (and students), including Syllabi from courses using this book.

The content is organized into chapters (below) but is also massively hyperlinked, so it can go very 'deep' while also offering a quick, relatively high-level view that captures the key points, suitable for college undergraduates or even high school curricula.

The authors maintain stringent editorial control over the contents of this book: it is publication-quality material from scientific experts, not anonymous crowd-sourced material as on other wikis (e.g., Wikipedia).


This book is aimed at advanced undergraduates and graduate students in the areas of computer science, mathematics, engineering, and the sciences.

Artificial Neural Networks/Neural Network Basics

Artificial neural networks, also known as “artificial neural nets,” “neural nets,” or ANNs for short, are a computational tool modeled on the interconnection of neurons in the nervous systems of the human brain and of other organisms. The term “neural net” refers to both the biological and artificial variants, although it is typically used for artificial systems only. Artificial neural networks are very different from biological networks, although many of the concepts and characteristics of biological systems are faithfully reproduced in the artificial systems.

Artificial neural nets are a type of non-linear processing system that is ideally suited to a wide range of tasks, especially tasks for which no existing algorithm for task completion is available.

With proper training, ANNs are capable of generalization: the ability to recognize similarities among different input patterns, especially patterns that have been corrupted by noise.

Each neuron is a multiple-input, multiple-output (MIMO) system that receives signals from the inputs, produces a resultant signal, and transmits that signal to all outputs.

However, to reproduce the effect of the synapse, the connections between processing elements (PEs) are assigned multiplicative weights, which can be calibrated or “trained” to produce the proper system output.

The output of the neuron is y = σ(ζ), where ζ is the weighted sum of the inputs (the inner product of the input vector and the tap-weight vector), and σ(ζ) is the activation function applied to that weighted sum.

If we recognize that the weight and input elements form vectors w and x, the weighted sum ζ becomes a simple dot product: ζ = w · x.

The dotted line in the center of the neuron represents the division between the calculation of the input sum using the weight vector, and the calculation of the output value using the activation function.
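Although the book's worked examples use MATLAB, a minimal NumPy sketch of this single-neuron computation may help make the notation concrete; the input values, the weights, and the choice of a log-sigmoid activation here are illustrative assumptions, not taken from the text:

```python
import numpy as np

def neuron_output(x, w, beta=1.0):
    """Single neuron: weighted sum followed by an activation function."""
    zeta = np.dot(w, x)                          # inner product of inputs and tap weights
    return 1.0 / (1.0 + np.exp(-beta * zeta))    # sigma(zeta), here a log-sigmoid

x = np.array([0.5, -1.0, 2.0])   # example input vector
w = np.array([0.1, 0.4, -0.2])   # example tap weights
print(neuron_output(x, w))
```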

Neural networks tend to have one input per degree of freedom in the input space, and one output per degree of freedom in the output space.

Expert systems, by contrast, are used in situations where there is insufficient data and theoretical background to create any kind of a reliable problem model.

Expert systems emulate the deduction processes of a human expert, by collecting information and traversing the solution space in a directed manner.

Though such assumptions are not required, it has been found that the addition of such a priori information as the statistical distribution of the input space can help to speed training.

During training, the neural network performs the necessary analytical work, which would require non-trivial effort on the part of the analyst if other methods were to be used.

The learning paradigm is supervised, unsupervised, or a hybrid of the two, and reflects the method by which training data is presented to the neural network. A learning rule is a model for the types of methods used to train the system, and also a goal for the types of results that are to be produced. During training, care must be taken not to provide too many input examples; different numbers of training examples can produce very different results in the quality and robustness of the network.

Some of the more important parameters in terms of training and network capacity are the number of hidden neurons, the learning rate and the momentum parameter.

These neurons are essentially hidden from view, and their number and organization can typically be treated as a black box to people who are interfacing with the system.

The root-mean-square error (RMSE) is the square root of the sum of squared differences between the network targets and the actual outputs, divided by the number of patterns (used only when training by minimum error).
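In symbols, transcribing that prose definition directly, with targets t_p, outputs y_p, and P patterns:

RMSE = sqrt( Σ_p (t_p − y_p)² / P )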

In the case of a biological neural net, neurons are living cells with axons and dendrites that form interconnections through electro-chemical synapses.

Neurotransmitters released across the synapse, along with other chemicals present there, form the message that is received by the post-synaptic membrane of the dendrite of the next cell, which in turn converts it to an electrical signal.

This page provides only a brief overview of biological neural networks; the reader will have to consult a better source for more in-depth coverage of the subject.

Neurons utilize a threshold mechanism, so that signals below a certain threshold are ignored, but signals above the threshold cause the neuron to fire.

The random interconnection at the cellular level is rendered into a computational tool by the learning process of the synapse, and the formation of new synapses between nearby neurons.

The history of neural networking arguably started in the late 1800s with scientific attempts to study the workings of the human brain.

The Mark I was a two-layer perceptron; Hecht-Nielsen showed in 1990 that a three-layer machine (a multilayer perceptron, or MLP) was capable of solving nonlinear separation problems.

Even though this book focuses on MATLAB for its problems and examples, a number of other tools can be used for constructing, testing, and implementing neural networks.

The output is one value, A1, if the input sum is above a given threshold, and another value, A0, if the input sum is below that threshold.

In a linear combination, the weighted input sum of the neuron, plus a linearly dependent bias, becomes the system output.

In these cases, the sign of the output is considered to be equivalent to the 1 or 0 of the step-function systems, which makes the two methods equivalent if the step function's threshold equals the negative of the bias.

This is called the log-sigmoid because a sigmoid can also be constructed using the hyperbolic tangent function instead of this relation, in which case it would be called a tan-sigmoid.

Written out, the relation is:

σ(t) = 1 / (1 + e^(−βt))
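As a hedged illustration (in NumPy rather than the book's MATLAB), the output-determination rules described above can be written as follows; the threshold θ, the levels A1/A0, and the β values are arbitrary placeholders:

```python
import numpy as np

def step(zeta, theta=0.0, a1=1.0, a0=0.0):
    """Step activation: A1 above the threshold theta, A0 below it."""
    return a1 if zeta > theta else a0

def linear(zeta, bias=0.0):
    """Linear combination: weighted input sum plus a bias."""
    return zeta + bias

def log_sigmoid(t, beta=1.0):
    """Log-sigmoid: sigma(t) = 1 / (1 + exp(-beta * t))."""
    return 1.0 / (1.0 + np.exp(-beta * t))

def tan_sigmoid(t, beta=1.0):
    """Tan-sigmoid: a sigmoid built from the hyperbolic tangent."""
    return np.tanh(beta * t)
```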

If ρ_l is a vector of activation functions [σ1 σ2 … σn] that acts on each row of input, and b_l is an arbitrary offset vector (included for generalization), then the total output of layer l is given as:

x_l = ρ_l(W_l · x_(l−1) + b_l)

where W_l is the weight matrix of layer l and x_(l−1) is the output of the previous layer.
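A minimal NumPy rendering of this layer equation (a sketch only: the layer sizes are arbitrary, and a log-sigmoid applied elementwise stands in for ρ):

```python
import numpy as np

def layer_forward(W, x_prev, b, beta=1.0):
    """Total layer output: x_l = rho_l(W_l x_{l-1} + b_l),
    with an elementwise log-sigmoid standing in for rho."""
    z = W @ x_prev + b
    return 1.0 / (1.0 + np.exp(-beta * z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))      # 4 neurons, 3 inputs each
b = np.zeros(4)                  # offset vector
x = np.array([0.5, -1.0, 2.0])   # output of the previous layer
print(layer_forward(W, x, b))
```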

Each hidden-layer neuron represents a basis function of the output space, with respect to a particular center in the input space.

In a recurrent network, the weight matrix for each layer l contains input weights from all other neurons in the network, not just neurons from the previous layer.

The context layer feeds the hidden layer at iteration N with a value computed from the output of the hidden layer at iteration N−1, providing a short-term memory effect.
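A sketch of this Elman-style context mechanism in NumPy; the weight names W_in and W_ctx and the tanh activation are hypothetical choices for illustration:

```python
import numpy as np

def elman_step(x, context, W_in, W_ctx, b):
    """One iteration: the hidden layer sees the current input plus the
    context layer's copy of the previous iteration's hidden output."""
    hidden = np.tanh(W_in @ x + W_ctx @ context + b)
    return hidden, hidden.copy()   # output, and the new context value

rng = np.random.default_rng(0)
W_in, W_ctx, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
context = np.zeros(4)
for x in [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]:
    out, context = elman_step(x, context, W_in, W_ctx, b)
```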

Because the only tap weights modified during training are the output layer tap weights, training is typically quick and computationally efficient in comparison to other multi-layer networks that are not sparsely connected.

This means that mathematical minimization or optimization problems can be solved automatically by the Hopfield network if that problem can be formulated in terms of the network energy.
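For concreteness, the energy function conventionally used for Hopfield networks (a standard form from the literature, not stated explicitly in this text) is, for neuron states s_i, weights w_ij, and thresholds θ_i:

E = −(1/2) Σ_i Σ_j w_ij s_i s_j + Σ_i θ_i s_i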

Each attractor represents a different data value that is stored in the network, and a range of associated patterns can be used to retrieve the data pattern.

SOMs are modeled on biological neural networks, where groups of neurons appear to self-organize into specific regions with common functionality.

The Euclidean distance from each input sample to the weight vector of each neuron is computed, and the neuron whose weight vector is most similar to the input is declared the best match unit (BMU).
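A NumPy sketch of this best-match-unit computation (the array shapes are illustrative; `weights` holds one weight vector per neuron):

```python
import numpy as np

def best_match_unit(sample, weights):
    """Return the index of the neuron whose weight vector is closest
    (in Euclidean distance) to the input sample."""
    distances = np.linalg.norm(weights - sample, axis=1)
    return int(np.argmin(distances))

weights = np.random.default_rng(1).normal(size=(10, 3))  # 10 neurons, 3-D inputs
sample = np.array([0.2, -0.5, 1.0])
print(best_match_unit(sample, weights))
```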

In adaptive resonance theory (ART) networks, an overabundance of neurons leads some neurons to be committed (active) and others to be uncommitted (inactive).

The weight update is Δw_ij = η (p_ij − q_ij), where p_ij is the probability that elements i and j will both be on when the system is in its training phase (the positive phase), and q_ij is the probability that both elements will be on during the production phase (the negative phase).

Because Boltzmann machine weight updates only require looking at the expected distributions of surrounding neurons, it is a plausible model for how actual biological neural networks learn.

For instance, the most common answer among a discrete set of answers in the committee can be taken as the overall answer, or the average answer can be taken.

Committee of machines (COM) systems tend to be more robust than the individual component systems, but they can also lose some of the “expertise” of the individual systems when answers are averaged out.

Given an input set x and a cost function g(x, y) of the input and output sets, the goal is to minimize the cost function through a proper selection of f (the relationship between x and y).

Unsupervised learning is useful in situations where a cost function is known, but a data set that minimizes that cost function over a particular input space is not known.

Error-Correction Learning, used with supervised learning, is the technique of comparing the system output to the desired output value, and using that error to direct the training.

In the most direct route, the error values can be used to directly adjust the tap weights, using an algorithm such as the backpropagation algorithm.

By following the path of steepest descent at each iteration, we will either find a minimum, or the algorithm could diverge if the weight space is infinitely decreasing.
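As a sketch of error-correction learning by steepest descent, here is a delta-rule update for a single linear neuron in NumPy; the data set, learning rate, and epoch count are invented for illustration:

```python
import numpy as np

def train_delta_rule(X, targets, eta=0.05, epochs=100):
    """Adjust tap weights along the negative gradient of the squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = np.dot(w, x)          # linear neuron output
            w += eta * (t - y) * x    # step down the error surface
    return w

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([1.0, 1.0, 2.0])   # target relation: sum of the inputs
print(train_delta_rule(X, targets))   # weights approach [1, 1]
```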

The backpropagation algorithm, in combination with a supervised error-correction learning rule, is one of the most popular and robust tools in the training of artificial neural networks.

When talking about backpropagation, it is useful to define the term interlayer to be a layer of neurons, and the corresponding input tap weights to that layer.

Where x_i^(l−1) are the outputs from the previous interlayer (the inputs to the current interlayer), and w_ij^l is the tap weight from input i of the previous interlayer to element j of the current interlayer.

The backpropagation algorithm specifies that the tap weights of the network are updated iteratively during training to approach the minimum of the error function.
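A compact NumPy sketch of one such iterative update for a two-interlayer network; the sigmoid activations, squared-error criterion, sizes, and learning rate are illustrative assumptions rather than the book's MATLAB treatment:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.1):
    """One tap-weight update toward the minimum of the squared error."""
    # Forward pass through both interlayers.
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Backward pass: propagate the output error toward the input.
    delta_out = (y - t) * y * (1.0 - y)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    # Gradient-descent updates of the tap weights.
    W2 -= eta * np.outer(delta_out, h)
    W1 -= eta * np.outer(delta_hid, x)
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
W1, W2 = backprop_step(np.array([0.5, -0.5]), np.array([1.0]), W1, W2)
```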

Because Hebbian learning updates each weight using only locally available signals, it is a plausible theory for biological learning methods, and it also makes Hebbian learning processes ideal for VLSI hardware implementations, where local signals are easier to obtain.

Neurons become trained to be individual feature detectors, and a combination of feature detectors can be used to identify large classes of features from the input space.
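A minimal sketch of a Hebbian update in NumPy; the weight normalization at the end is a common practical safeguard against unbounded growth, not something the text prescribes:

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """Hebb's rule: strengthen weights in proportion to the product of
    pre-synaptic input x and post-synaptic output y -- purely local signals."""
    y = np.dot(w, x)
    w = w + eta * y * x
    return w / np.linalg.norm(w)   # normalize to keep weights bounded

w = np.array([0.2, -0.1, 0.3])
x = np.array([1.0, 0.5, -1.0])
print(hebbian_update(w, x))
```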

Adaptive Resonance Theory (ART) learning algorithms compare the weight vector, known as the prototype, to the current input vector to produce a distance, r.

When a new input sequence is detected that does not resonate with any committed nodes, an uncommitted node is committed, and its prototype vector is set to the current input vector.
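A heavily simplified sketch of this commit-or-resonate logic in NumPy; the cosine-style similarity and the vigilance value are assumptions for illustration, not the full ART algorithm:

```python
import numpy as np

def art_classify(x, prototypes, vigilance=0.8):
    """Compare the input to committed prototypes; commit a new node
    if nothing resonates above the vigilance threshold."""
    for i, p in enumerate(prototypes):
        r = np.dot(p, x) / (np.linalg.norm(p) * np.linalg.norm(x) + 1e-12)
        if r >= vigilance:          # resonance: input matches this prototype
            return i
    prototypes.append(x.copy())     # commit an uncommitted node
    return len(prototypes) - 1

prototypes = [np.array([1.0, 0.0])]
print(art_classify(np.array([0.9, 0.1]), prototypes))  # resonates with node 0
print(art_classify(np.array([0.0, 1.0]), prototypes))  # commits a new node
```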


Feature detection or “association” networks are trained using non-noisy data, in order to recognize similar patterns in noisy or incomplete data.

Meteorological prediction is a difficult process because current atmospheric models rely on highly recursive sets of differential equations which can be difficult to calculate, and which propagate errors through the successive iterations.

Once the relation has been modeled to the necessary accuracy by the network, it can be used for a variety of tasks, such as series prediction, function approximation, and function optimization.

Function approximation or modeling is the act of training a neural network using a given set of input-output data (typically through supervised learning) in order to deduce the relationship between the input and the output.

Because of the modular and non-linear nature of artificial neural nets, they are considered to be able to approximate any arbitrary function to an arbitrary degree of accuracy.

The purpose of this License is to make a manual, textbook, or other functional and useful document 'free' in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does.

A 'Secondary Section' is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject.

(Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

A 'Transparent' copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters.

A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent.

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification.

Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The 'Title Page' means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page.

For works in formats which do not have any title page as such, 'Title Page' means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License.

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material.

If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it.

You may add a section Entitled 'Endorsements', provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number.

You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an 'aggregate' if the copyright resulting from the compilation is not used to limit the legal rights of the compilation's users beyond what the individual works permit.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document's Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

If the Document specifies that a particular numbered version of this License 'or any later version' applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation.

If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Document.

'CC-BY-SA' means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

An MMC is 'eligible for relicensing' if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

How to use this License for your documents

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Cognitive science

Mental faculties of concern to cognitive scientists include language, perception, memory, attention, reasoning, and emotion; to understand these faculties, cognitive scientists borrow from fields such as linguistics, psychology, artificial intelligence, philosophy, neuroscience, and anthropology.[3]

The fundamental concept of cognitive science is that 'thinking can best be understood in terms of representational structures in the mind and computational procedures that operate on those structures.'[3]

The goal of cognitive science is to understand the principles of intelligence with the hope that this will lead to a better comprehension of the mind and of learning, and to the development of intelligent devices.

Even if the technology to map out every neuron in the brain in real-time were available, and it were known when each neuron was firing, it would still be impossible to know how a particular firing of neurons translates into the observed behavior.

The Embodied Mind: Cognitive Science and Human Experience says, “the new sciences of the mind need to enlarge their horizon to encompass both lived human experience and the possibilities for transformation inherent in human experience.”[4]

Cognitive science is an interdisciplinary field with contributors from various fields, including psychology, neuroscience, linguistics, philosophy of mind, computer science, anthropology, sociology, and biology.

The field regards itself as compatible with the physical sciences and uses the scientific method as well as simulation or modeling, often comparing the output of models with aspects of human cognition.

Many, but not all, who consider themselves cognitive scientists hold a functionalist view of the mind—the view that mental states and processes should be explained by their function, that is, by what they do.

This conceptualization is very broad, and should not be confused with how 'cognitive' is used in some traditions of analytic philosophy, where 'cognitive' has to do only with formal rules and truth conditional semantics.

Among philosophers, classical cognitivists have largely de-emphasized or avoided social and cultural factors, emotion, consciousness, animal cognition, and comparative and evolutionary psychologies.

With the newfound emphasis on information processing, observable behavior was no longer the hallmark of psychological theory; the modeling or recording of mental states was.

One way to view the issue is whether it is possible to accurately simulate a human brain on a computer without accurately simulating the neurons that make up the human brain.

Some of the driving research questions in studying how the brain itself processes language include: (1) To what extent is linguistic knowledge innate or learned?, (2) Why is it more difficult for adults to acquire a second language than it is for infants to acquire their first language?, and (3) How are humans able to understand novel sentences?

In the last fifty years or so, more and more researchers have studied knowledge and use of language as a cognitive phenomenon, the main problems being how knowledge of language can be acquired and used, and what precisely it consists of.[10]

Linguists have found that, while humans form sentences in ways apparently governed by very complex systems, they are remarkably unaware of the rules that govern their own speech.

Infants are born with little or no knowledge (depending on how knowledge is defined), yet they rapidly acquire the ability to use language, walk, and recognize people and objects.

Although clearly both genetic and environmental input is needed for a child to develop normally, considerable debate remains about how genetic information might guide cognitive development.

Some (such as Steven Pinker) have argued that specific information containing universal grammatical rules must be contained in the genes, whereas others (such as Jeffrey Elman and colleagues in Rethinking Innateness) have argued that Pinker's claims are biologically unrealistic.

Declarative memory—grouped into subsets of semantic and episodic forms of memory—refers to our memory for facts and specific knowledge, specific meanings, and specific experiences.

Some questions in the study of visual perception, for example, include: (1) How are we able to recognize objects?, (2) Why do we perceive a continuous visual environment, even though we only see small bits of it at any one time?

As the field is highly interdisciplinary, research often cuts across multiple areas of study, drawing on research methods from psychology, neuroscience, computer science and systems theory.

Lewandowski and Strohmetz (2009) review a collection of innovative uses of behavioral measurement in psychology including behavioral traces, behavioral observations, and behavioral choice.[12]

Behavioral traces are pieces of evidence that indicate behavior occurred, but the actor is not present (e.g., litter in a parking lot or readings on an electric meter).

Cognitive science follows two main computational approaches: the first is focused on abstract mental functions of an intelligent mind and operates using symbols; the second, which follows the neural and associative properties of the human brain, is called subsymbolic.

All the above approaches tend to be generalized to the form of integrated computational models of a synthetic/abstract intelligence, in order to be applied to the explanation and improvement of individual and social/organizational decision-making and reasoning.[13]

Cognitive science has given rise to models of human cognitive bias and risk perception, and has been influential in the development of behavioral finance, part of economics.

Fields of cognitive science have been influential in understanding the brain's particular functional systems (and functional deficits) ranging from speech production to auditory processing and visual perception.

It has made progress in understanding how damage to particular areas of the brain affect cognition, and it has helped to uncover the root causes and results of specific dysfunction, such as dyslexia, anopia, and hemispatial neglect.

However, although these early writers contributed greatly to the philosophical discovery of mind and this would ultimately lead to the development of psychology, they were working with an entirely different set of tools and core concepts than those of the cognitive scientist.

The modern culture of cognitive science can be traced back to the early cyberneticists in the 1930s and 1940s, such as Warren McCulloch and Walter Pitts, who sought to understand the organizing principles of the mind.

Researchers such as Marvin Minsky would write computer programs in languages such as LISP to attempt to formally characterize the steps that human beings went through, for instance, in making decisions and solving problems, in the hope of better understanding human thought, and also in the hope of creating artificial minds.

While both connectionism and symbolic approaches have proven useful for testing various hypotheses and exploring approaches to understanding aspects of cognition and lower-level brain functions, neither is biologically realistic, and therefore both suffer from a lack of neuroscientific plausibility.[21][22][23][24][25][26][27]

Connectionism has proven useful for exploring computationally how cognition emerges in development and occurs in the human brain, and has provided alternatives to strictly domain-specific / domain general approaches.

Anthropologists Dan Sperber, Edwin Hutchins, and Scott Atran have been involved in collaborative projects with cognitive and social psychologists, political scientists, and evolutionary biologists in attempts to develop general theories of culture formation, religion, and political association.

