Neural Networks, Manifolds, and Topology

Recently, there’s been a great deal of excitement and interest in deep neural networks because they’ve achieved breakthrough results in areas such as computer vision.1 However, there remain a number of concerns about them.

While it is challenging to understand the behavior of deep neural networks in general, it turns out to be much easier to explore low-dimensional deep neural networks – networks that only have a few neurons in each layer.

This perspective will allow us to gain deeper intuition about the behavior of neural networks and observe a connection linking neural networks to an area of mathematics called topology.

The obvious way to visualize the behavior of a neural network – or any classification algorithm, for that matter – is to simply look at how it classifies every possible data point.

A tanh layer \(\tanh(Wx+b)\) consists of a linear transformation by the weight matrix \(W\), a translation by the vector \(b\), and pointwise application of \(\tanh\). We can visualize this as a continuous transformation. The story is much the same for other standard layers, consisting of an affine transformation followed by pointwise application of a monotone activation function.
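Here is a minimal numpy sketch of the layer as that three-step composition; the particular \(W\), \(b\), and input point are arbitrary illustrative values, not anything taken from a trained network.

```python
import numpy as np

def tanh_layer(x, W, b):
    """A tanh layer tanh(Wx + b), written as the three steps described above."""
    linear = W @ x           # 1. linear transformation by the weight matrix W
    shifted = linear + b     # 2. translation by the vector b
    return np.tanh(shifted)  # 3. pointwise application of tanh

# Arbitrary 2D example: the layer is a continuous map from R^2 to R^2.
W = np.array([[1.0, 0.5],
              [-0.5, 1.0]])
b = np.array([0.1, -0.2])
x = np.array([0.3, 0.7])

print(tanh_layer(x, W, b))
```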

(Andrej Karpathy has made a nice demo based on ConvnetJS that allows you to interactively explore networks with this sort of visualization of training!) Each layer stretches and squishes space, but it never cuts, breaks, or folds it.

As mentioned previously, classification with a sigmoid unit or a softmax layer is equivalent to trying to find a hyperplane (or in this case a line) that separates \(A\) and \(B\) in the final representation.
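As a quick sanity check of that equivalence, here is a small numpy sketch (the weights are arbitrary): a sigmoid output unit \(\sigma(w \cdot h + b)\) assigns probability above \(\frac{1}{2}\) exactly when \(w \cdot h + b > 0\), that is, on one side of the hyperplane \(w \cdot h + b = 0\) in the final representation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary final-layer weights: the decision boundary is the line w . h + b = 0.
w = np.array([2.0, -1.0])
b = 0.5

h = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])  # points in the final representation
scores = h @ w + b
probs = sigmoid(scores)

# The sigmoid assigns probability > 0.5 exactly where the affine score is positive,
# so classification reduces to which side of the hyperplane each point lies on.
print(np.all((probs > 0.5) == (scores > 0)))  # True
```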

Since we’re dealing with something homeomorphic to the original dataset, \(A\) is surrounded by \(B\), and collapsing on any axis means points of \(A\) and \(B\) will mix and become impossible to distinguish.

To get a better sense of what’s going on, let’s consider an even simpler dataset that’s 1-dimensional: \[A = [-\frac{1}{3}, \frac{1}{3}]\] \[B = [-1, -\frac{2}{3}] \cup [\frac{2}{3}, 1]\] Without using a layer of two or more hidden units, we can’t classify this dataset.
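To make the "two hidden units suffice" part concrete, here is a hand-built two-hidden-unit tanh network in numpy that does separate this dataset; the specific weights are my own illustrative choice, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_unit_net(x, k=20.0):
    """Hand-picked 2-hidden-unit tanh network that labels the middle interval A as 1."""
    h1 = np.tanh(k * (x + 0.5))   # ~ +1 to the right of -0.5, ~ -1 to the left
    h2 = np.tanh(k * (0.5 - x))   # ~ +1 to the left of +0.5, ~ -1 to the right
    return sigmoid(5.0 * (h1 + h2) - 5.0)  # high only when both units are "on"

A = np.linspace(-1/3, 1/3, 50)
B = np.concatenate([np.linspace(-1, -2/3, 25), np.linspace(2/3, 1, 25)])

print(np.all(two_unit_net(A) > 0.5))  # True: every point of A gets the A label
print(np.all(two_unit_net(B) < 0.5))  # True: every point of B gets the B label
```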

Sometimes when we see a link, it isn’t immediately obvious whether it’s an unlink (a bunch of things that are tangled together, but can be separated by continuous deformation) or not.

(Question: Can all unlinks be classified by a network with only 3 units, theoretically?) From this knot perspective, our continuous visualization of the representations produced by a neural network isn’t just a nice animation, it’s a procedure for untangling links.

Theorem: There is an ambient isotopy between the input and a network layer’s representation if: a) \(W\) isn’t singular, b) we are willing to permute the neurons in the hidden layer, and c) there is more than 1 hidden unit.
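A rough sketch of why these conditions are enough (my reconstruction of the standard argument, not a full proof): each of the layer's three steps can be connected to the identity through a path of homeomorphisms, parameterized by \(t \in [0, 1]\). If \(\det W \neq 0\) and we may swap two hidden neurons to flip its sign, we can assume \(\det W > 0\); the positive-determinant matrices are path-connected, so there is a continuous path \(p\) with \(p(0) = I\) and \(p(1) = W\), giving \[x \mapsto p(t)\,x.\] The translation is handled by \[x \mapsto x + t\,b,\] and the pointwise activation, for a strictly monotone continuous \(\sigma\) such as \(\tanh\), by \[x \mapsto (1-t)\,x + t\,\sigma(x),\] which is strictly monotone in each coordinate for every \(t\) and hence a homeomorphism. Composing the three paths gives the isotopy; needing to swap two neurons to fix the sign of the determinant is why conditions (b) and (c) appear.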

I imagine there is probably interest in programs automatically discovering such ambient isotopies and automatically proving the equivalence of certain links, or that certain links are separable.

(No efficient algorithm is known even for recognizing whether a knot is trivial; this doesn’t bode well for neural networks.) The sort of links we’ve talked about so far don’t seem likely to turn up in real-world data, but there are higher-dimensional generalizations.

All \(n\)-dimensional manifolds can be untangled in \(2n+2\) dimensions.6 (I know very little about knot theory and really need to learn more about what's known regarding dimensionality and links. If we know a manifold can be embedded in \(n\)-dimensional space, instead of the dimensionality of the manifold, what limit do we have?)

The natural thing for a neural net to do, the very easy route, is to try and pull the manifolds apart naively and stretch the parts that are tangled as thin as possible.

We know these things happen.7 Contractive penalties, penalizing the derivatives of the layers at data points, are the natural way to fight this.8 Since this sort of local minimum is absolutely useless from the perspective of trying to solve topological problems, topological problems may provide a nice motivation to explore fighting these issues.
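To make the contractive penalty concrete, here is an illustrative numpy sketch in the spirit of contractive autoencoders (not this post's own implementation): for a tanh layer \(h = \tanh(Wx + b)\), the Jacobian at \(x\) is \(\mathrm{diag}(1 - h^2)\,W\), and the penalty is its squared Frobenius norm averaged over the data points.

```python
import numpy as np

def contractive_penalty(X, W, b):
    """Squared Frobenius norm of the tanh layer's Jacobian, averaged over data points X.

    For h = tanh(W x + b), the Jacobian at x is diag(1 - h**2) @ W, so its squared
    Frobenius norm is sum_j (1 - h_j**2)**2 * sum_i W[j, i]**2.
    """
    H = np.tanh(X @ W.T + b)             # (n_points, n_hidden) activations
    row_norms = np.sum(W ** 2, axis=1)   # sum_i W[j, i]**2 for each hidden unit j
    per_point = np.sum(((1.0 - H ** 2) ** 2) * row_norms, axis=1)
    return per_point.mean()

# Arbitrary illustrative layer and data.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
X = rng.normal(size=(100, 2))

# Adding this term to the loss discourages the layer from stretching space
# violently near the data points.
print(contractive_penalty(X, W, b))
```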

(In particular, in an optimization problem where local minima are a big problem, picking an architecture that can't genuinely solve the problem seems like a recipe for bad performance.) The more I think about standard neural network layers – that is, with an affine transformation followed by a point-wise activation function – the more disenchanted I feel.

The thing that feels natural to me is to learn a vector field with the direction we want to shift the manifold, and then deform space based on it. One could learn the vector field at fixed points (just take some fixed points from the training set to use as anchors) and interpolate in some manner.

The vector field above is of the form: \[F(x) = \frac{v_0 f_0(x) + v_1 f_1(x)}{1 + f_0(x) + f_1(x)}\] where \(v_0\) and \(v_1\) are vectors and \(f_0(x)\) and \(f_1(x)\) are \(n\)-dimensional Gaussians.
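Here is a small numpy sketch of that vector field in 2D. The centers, scales, and vectors \(v_0\), \(v_1\) are arbitrary choices for illustration, and the final deformation step \(x \mapsto x + F(x)\) is one possible way to deform space with the field rather than a rule fixed by the text.

```python
import numpy as np

def gaussian(x, mu, sigma=0.5):
    """Unnormalized isotropic Gaussian bump centered at mu; x has shape (n, d)."""
    return np.exp(-np.sum((x - mu) ** 2, axis=1) / (2.0 * sigma ** 2))

def vector_field(x, v0, v1, mu0, mu1):
    """F(x) = (v0 f0(x) + v1 f1(x)) / (1 + f0(x) + f1(x))."""
    f0 = gaussian(x, mu0)
    f1 = gaussian(x, mu1)
    num = np.outer(f0, v0) + np.outer(f1, v1)
    return num / (1.0 + f0 + f1)[:, None]

# Arbitrary anchors and shift directions for illustration.
mu0, v0 = np.array([-1.0, 0.0]), np.array([0.0, 1.0])   # push points near mu0 upward
mu1, v1 = np.array([1.0, 0.0]), np.array([0.0, -1.0])   # push points near mu1 downward

x = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
F = vector_field(x, v0, v1, mu0, mu1)

# One possible deformation of space driven by the field (an assumption, not a
# prescribed rule): shift each point along the field.
x_deformed = x + F
print(F)
print(x_deformed)
```

Points far from both anchors barely move, since both Gaussians decay to zero there, so the deformation is local to the regions we chose to shift.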

I think a nice approach is to classify each element of the mini-batch based on the classes of other elements of the mini-batch, giving each one a weight of 1/(distance from classification target).9 Sadly, even with a sophisticated architecture, using k-NN only gets down to 4–5% test error – and using simpler architectures gets worse results.
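A rough numpy sketch of that idea as I read it (details such as the exact distance, tie-breaking, and whether to cap the number of neighbors are my own guesses): classify each element of the mini-batch from the labels of the other elements, weighting every neighbor's vote by 1/distance in the representation space.

```python
import numpy as np

def in_batch_knn_predict(reps, labels, n_classes, eps=1e-8):
    """Predict each batch element's class from the *other* elements' labels,
    weighting every neighbor's vote by 1 / distance in representation space."""
    n = reps.shape[0]
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    weights = 1.0 / (dists + eps)
    np.fill_diagonal(weights, 0.0)        # an element does not vote for itself
    votes = np.zeros((n, n_classes))
    for c in range(n_classes):
        votes[:, c] = weights[:, labels == c].sum(axis=1)
    return votes.argmax(axis=1)

# Tiny illustrative batch: two well-separated clusters in a 2D representation.
reps = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])

print(in_batch_knn_predict(reps, labels, n_classes=2))  # [0 0 0 1 1 1]
```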
