AI News, Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition


This task is a well-studied subproblem of visual perception and tests one of its core problems: context-independent, basic-level object recognition within a brief visual presentation.

The task is to determine the category of an object instance presented under image variation due to object exemplar, geometric transformation (position, scale, and rotation/pose), and background.

This task is therefore computationally difficult, yet it is performed with high proficiency by primates, with evidence that the primate ventral visual stream produces an effective representation in IT cortex.

An image is constructed by first choosing one of seven categories, then one of seven 3D object exemplars from that category, then a randomly chosen background image (each background image is used only once); finally, the variation parameters are drawn from a distribution that spans two full octaves of scale variation, the full width of the image for translation, and the full sphere for pose.
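As a concrete illustration, the sampling step can be sketched as follows; the parameter ranges (expressed as fractions of the image width), the integer category/exemplar labels, and the quaternion-based pose sampling are illustrative assumptions rather than the authors' rendering pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_image_spec(background_pool, n_categories=7, n_exemplars=7):
    """Draw one image specification: object identity plus variation parameters."""
    category = rng.integers(n_categories)          # one of seven categories
    exemplar = rng.integers(n_exemplars)           # one of seven 3D exemplars
    background = background_pool.pop()             # each background image used only once
    scale = 2.0 ** rng.uniform(-1.0, 1.0)          # two full octaves of scale variation
    tx, ty = rng.uniform(-0.5, 0.5, size=2)        # translation over the full image width
    q = rng.normal(size=4)                         # uniform pose over the full sphere,
    pose = q / np.linalg.norm(q)                   # via a random unit quaternion
    return dict(category=category, exemplar=exemplar, background=background,
                scale=scale, translation=(tx, ty), pose=pose)
```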

Advantageously, this procedure eliminates dependencies between objects and backgrounds that may be found in real-world images [38], and introduces a controlled amount of variability or difficulty in the task, which we have used to produce image datasets that are known to be difficult for algorithms [35], [36], [39].

The behavioral context that we seek to address is a sub-problem of general visual behavior: vision in a natural duration fixation, or visual object recognition within one fixation without contextual influence, eye movements, or shifts in attention (also called “core visual object recognition” [6]).

As a first step to evaluate the neural representation, we recorded multi-unit and single-unit neural activity from awake behaving rhesus macaques during passive fixation.

To create a neural feature vector, which we use to assess object representational performance, we presented each image (1960 images in total) for 100 ms and measured the normalized, background-subtracted firing rate in a window from 70 ms to 170 ms after image onset, averaged over 47 repetitions (see Methods).
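A minimal sketch of how such a feature vector could be assembled from raw spike counts; the array shapes, the baseline estimate, and the per-site normalization constants are assumptions rather than the exact pipeline from the Methods:

```python
import numpy as np

def neural_feature_vector(spike_counts, baseline, norm):
    """
    spike_counts : (n_images, n_repetitions, n_sites) spike counts, 70-170 ms post onset
    baseline     : (n_sites,) background firing rate per site
    norm         : (n_sites,) per-site normalization constant
    """
    rate = spike_counts.mean(axis=1)    # average over repetitions (47 per image)
    rate = rate - baseline              # background-subtract
    return rate / norm                  # normalize per site -> (n_images, n_sites)
```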

Kernel analysis evaluates the efficacy of the representation by measuring how the precision of the category regression problem changes as we allow the complexity of the regression function to increase [30].

Intuitively, more effective representations will achieve higher precision at the same level of complexity because they have removed irrelevant variability from the original representational space (here irrelevant variability in the original space is due to object exemplar, geometric transformation, and background).

We define complexity as the inverse of the regularization parameter and precision as one minus the normalized mean-squared leave-one-example-out generalization error, such that a precision of 0 is chance performance and 1 is perfect performance.
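The precision-versus-complexity curve can be sketched with kernel ridge regression and its closed-form leave-one-out residuals; the kernel choice, the use of one-hot category targets, and the normalization by target variance below are assumptions, and [30] specifies the exact procedure:

```python
import numpy as np

def kernel_analysis_curve(K, Y, lambdas):
    """
    Precision (1 - normalized leave-one-out MSE) as a function of complexity
    (1 / lambda) for kernel ridge regression.
    K : (n, n) kernel matrix over the representation (e.g. K = X @ X.T)
    Y : (n, n_categories) one-hot category indicator matrix
    """
    n = K.shape[0]
    chance = Y.var(axis=0).mean()                            # error of predicting the mean
    precisions = []
    for lam in lambdas:
        H = K @ np.linalg.inv(K + lam * np.eye(n))           # kernel-ridge "hat" matrix
        loo = (Y - H @ Y) / (1.0 - np.diag(H))[:, None]      # closed-form LOO residuals
        precisions.append(1.0 - (loo ** 2).mean() / chance)  # 0 = chance, 1 = perfect
    return np.array(precisions)
```

Sweeping lambdas from large to small traces the curve from low to high complexity.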

We also evaluated an instantiation of the HMAX model of invariant object recognition that uses sparse localized features [41] and has previously been shown to be a relatively high performing model among artificial systems [16].

The 4096-dimensional feature representation was produced by taking the penultimate-layer features and averaging them over 10 image crops (the 4 corners, the center, and the horizontal flip of each).
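Ten-crop averaging of this kind can be sketched as below; extract_features stands in for the network's penultimate-layer forward pass and is an assumption:

```python
import numpy as np

def ten_crop_features(image, extract_features, crop_size):
    """Average features over the 4 corner crops, the center crop, and their horizontal flips."""
    h, w = image.shape[:2]
    c = crop_size
    offsets = [(0, 0), (0, w - c), (h - c, 0), (h - c, w - c),
               ((h - c) // 2, (w - c) // 2)]
    feats = []
    for top, left in offsets:
        crop = image[top:top + c, left:left + c]
        feats.append(extract_features(crop))
        feats.append(extract_features(crop[:, ::-1]))   # horizontal flip
    return np.mean(feats, axis=0)                        # e.g. a 4096-d vector
```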

HMO [27] is an extension of the high-throughput optimization strategy described in [16]; it produces a heterogeneous combination of hierarchical convolutional models optimized on a supervised object recognition task through hyperparameter optimization using boosting and error-based reweighting (see [27] for details).

Before comparing the representational performance of the neural and model representations, we first evaluate the absolute representational performance of these models on the task to verify that the task we have chosen is computationally difficult.

These models are known to perform well on this task if we reduce its difficulty by reducing the magnitude range of the variations we introduce (not shown here, but see [35], [43] for such an analysis).

The experimental procedure that we used to measure the neural representation is limited by the number of neural samples (sites or number of neurons) that we can measure and by noise induced by uncontrolled experimental variability and/or intrinsic neural noise.

To equalize the sampling between the neural representation and the model representations we fix the number of neural samples (80 for the multi-unit analysis and 40 for the single-unit analysis) and model features (we will vary this number in later experiments).

Following the observation that spike counts of neurons are approximately Poisson [44], [45] and similar analyses of our own recordings, we model response variability as being proportional to the mean response.

To produce noise-matched model representations, we sample response-dependent noise for the model features and measure the representational performance of the resulting representation using kernel analysis.
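A minimal sketch of this noise-matching step, assuming Gaussian noise whose variance is proportional to the mean response (a Poisson-like model) and averaging over the same 47 trials as the neural data; the proportionality constant would be fit to the recordings, and its value here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_matched_features(model_features, noise_factor=1.0, n_trials=47):
    """Add response-dependent noise to model features and average over trials."""
    mean = model_features                                  # (n_images, n_features)
    sigma = np.sqrt(noise_factor * np.abs(mean))           # variance proportional to mean
    trials = mean[None] + sigma[None] * rng.standard_normal((n_trials,) + mean.shape)
    return trials.mean(axis=0)                             # trial-averaged noisy features
```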

Note that we do not attempt to correct the V4 sample to the noise level observed in IT because we observed similar noise between the V4 and IT neural measurements and each sample is averaged over the same number of trials (47 trials).

As shown in Fig. 2, model performance is reduced both by the subsampling and by the added noise correction (without added noise and subsampling, maximum precision is above 0.5; with noise and subsampling, it does not exceed 0.35).

Because of the increased noise and fewer trials collected for the single-unit measurements compared to our multi-unit measurements, the single-unit noise- and sample-corrected model representations achieve lower precision than their multi-unit counterparts.

Such analyses are complementary because representational performance relates to the task goals (in this case category labels) and encoding models and representational similarity metrics are informative about a model's ability to capture image-dependent neural variability, even if this variability is unrelated to task goals.

To compute the representational similarity between the IT multi-unit and model representations, we computed object-level representational dissimilarity matrices (RDMs) for the model and neural representations (the matrices are 49×49, as there are 49 objects in total).
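An object-level RDM of this kind can be computed as below; using 1 minus the Pearson correlation as the dissimilarity measure is a common choice and is assumed here rather than taken from the Methods:

```python
import numpy as np

def object_level_rdm(features, object_ids):
    """features: (n_images, n_features); object_ids: (n_images,) with 49 distinct objects."""
    objects = np.unique(object_ids)
    means = np.stack([features[object_ids == o].mean(axis=0) for o in objects])
    return 1.0 - np.corrcoef(means)      # 49 x 49 dissimilarity matrix
```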

In addition, we provide results following the methodology in [27], which first predicts the IT multi-unit site responses from the model representation and then uses these predictions to form a new representation.
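A sketch of that two-step procedure, assuming ridge regression with cross-validated predictions; the exact fitting and cross-validation scheme is described in [27]:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def it_fit_representation(model_features, it_responses):
    """Predict each IT multi-unit site from model features; use the predictions as a new representation."""
    reg = RidgeCV(alphas=np.logspace(-3, 3, 13))
    preds = [cross_val_predict(reg, model_features, it_responses[:, i], cv=5)
             for i in range(it_responses.shape[1])]
    return np.stack(preds, axis=1)       # (n_images, n_sites) predicted IT representation
```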

Our measurements of the HMO + IT-fit representation are in general agreement with the results in [27] but vary slightly because of differences in the image set used to produce these measurements and details of the methodology used to produce the IT predictions.

Lecture 8: Recurrent Neural Networks and Language Models

Lecture 8 covers traditional language models, RNNs, and RNN language models. Also reviewed are important training problems and tricks, RNNs for other ...

Lecture 9: Machine Translation and Advanced Recurrent LSTMs and GRUs

Lecture 9 recaps the most important concepts and equations covered so far followed by machine translation and fancy RNN models tackling MT. Key phrases: ...

Faster Recurrent Neural Network Language Modeling Toolkit - Anton Bakhtin

Yandex School of Data Analysis Conference, Machine Learning: Prospects and Applications. Language models help to ..

Lecture 10: Neural Machine Translation and Models with Attention

Lecture 10 introduces translation, machine translation, and neural machine translation. Google's new NMT is highlighted followed by sequence models with ...

Lecture 16 | Adversarial Examples and Adversarial Training

In Lecture 16, guest lecturer Ian Goodfellow discusses adversarial examples in deep learning. We discuss why deep networks and other machine learning ...

Lecture 10 | Recurrent Neural Networks

In Lecture 10 we discuss the use of recurrent neural networks for modeling sequence data. We show how recurrent neural networks can be used for language ...

Lecture 13 | Generative Models

In Lecture 13 we move beyond supervised learning, and discuss generative modeling as a form of unsupervised learning. We cover the autoregressive ...


Lecture 12 | Visualizing and Understanding

In Lecture 12 we discuss methods for visualizing and understanding the internal mechanisms of convolutional networks. We also discuss the use of ...

Leon Gatys shares a neural algorithm of artistic style

Presentation from the 2016 DeepDream Symposium at Gray Area in San Francisco.