
More efficient security for cloud-based machine learning

A novel encryption method devised by MIT researchers secures data used in online neural networks, without dramatically slowing their runtimes.

Major tech firms have launched cloud platforms that conduct computation-heavy tasks, such as running data through a convolutional neural network (CNN) for image classification. Encrypting the data users upload to such services protects their privacy, but conventional encryption methods have performance drawbacks that make neural network evaluation (testing and validating) sluggish, sometimes as much as a million times slower, limiting their wider adoption.

In a paper presented at this week’s USENIX Security Conference, MIT researchers describe a system that blends two conventional techniques — homomorphic encryption and garbled circuits — in a way that helps the networks run orders of magnitude faster than they do with conventional approaches.

Throughout the process, the system ensures that the server never learns any uploaded data, while the user never learns anything about the network parameters. GAZELLE ran 20 to 30 times faster than state-of-the-art models, while reducing the required network bandwidth by an order of magnitude.

“In this work, we show how to efficiently do this kind of secure two-party communication by combining these two techniques in a clever way,” says the paper’s lead author, a PhD student in the Department of Electrical Engineering and Computer Science (EECS). “The next step is to take real medical data and show that, even when we scale it for applications real users care about, it still provides acceptable performance.”

Co-authors on the paper are Vinod Vaikuntanathan, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory, and Anantha Chandrakasan, dean of the School of Engineering and the Vannevar Bush Professor of Electrical Engineering and Computer Science.

A CNN processes image data through alternating linear and nonlinear layers of computation. The linear layers do the heavy math, known as linear algebra; at a certain threshold, the data is output to nonlinear layers that do some simpler computation, make decisions (such as identifying image features), and send the data to the next linear layer.

Garbled circuits are a form of secure two-party computation, but the communication they require renders complex neural networks inefficient, “so you wouldn’t use them for any real-world application.” Homomorphic encryption, used in cloud computing, receives and executes computation all in encrypted data, called ciphertext, and generates an encrypted result that can then be decrypted by a user. When applied to neural networks, this technique is particularly fast and efficient at computing linear algebra, but over multiple layers noise accumulates, and the computation needed to filter that noise grows increasingly complex, slowing computation speeds.
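To make the homomorphic-encryption step concrete, here is a minimal sketch using the Paillier cryptosystem as a stand-in (it is not the scheme the paper uses, and the parameters below are far too small to be secure). It shows the property a linear layer needs: the server can add encrypted values and multiply them by its plaintext weights without ever decrypting them.

```python
import math, random

# Toy Paillier cryptosystem: additively homomorphic public-key encryption.
# Demo-sized primes only; this illustrates the idea, it is not secure.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                 # valid because we take g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

# Homomorphic properties used below:
#   Enc(a) * Enc(b) mod n^2  decrypts to a + b
#   Enc(a) ** k     mod n^2  decrypts to k * a
def encrypted_linear_layer(cipher_inputs, weights, bias):
    """Compute Enc(w . x + b) from encrypted inputs and plaintext weights."""
    acc = encrypt(bias % n)
    for c, w in zip(cipher_inputs, weights):
        acc = (acc * pow(c, w % n, n2)) % n2
    return acc

x = [3, 1, 4]                        # the user's private inputs
w, b = [2, 5, 7], 11                 # the server's plaintext weights and bias
enc_x = [encrypt(v) for v in x]
result = decrypt(encrypted_linear_layer(enc_x, w, b))
assert result == 2*3 + 5*1 + 7*4 + 11
print("encrypted dot product decrypts to", result)
```

The server only ever sees ciphertexts, yet the user can decrypt the correct linear-layer output; the nonlinear activation that follows is then handled by the garbled-circuit side.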

GAZELLE therefore splits the workload between the two techniques: homomorphic encryption handles the math-heavy linear layers, while garbled circuits, which work well in the nonlinear layers where computation is minimal but whose bandwidth becomes unwieldy in linear layers, handle the rest. By splitting and sharing the workload this way, the system restricts the homomorphic encryption to doing complex math one layer at a time, so data doesn’t become too noisy.

Secret sharing

The final step was ensuring both homomorphic and garbled circuit layers maintained a common randomization scheme, called “secret sharing.” In this scheme, data is divided into separate parts that are given to separate parties.
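A minimal sketch of two-party additive secret sharing (a generic construction for illustration, not necessarily the exact scheme in the paper): each value is split into two random shares that are individually meaningless but sum to the original value modulo a prime.

```python
import secrets

P = 2**61 - 1                          # a prime modulus; shares are uniform in [0, P)

def share(value):
    """Split `value` into two additive shares with s0 + s1 = value (mod P)."""
    s0 = secrets.randbelow(P)
    s1 = (value - s0) % P
    return s0, s1

def reconstruct(s0, s1):
    return (s0 + s1) % P

# Each party holds one share and learns nothing about the value on its own,
# yet the parties can add shared values without communicating.
a0, a1 = share(42)
b0, b1 = share(100)
c0, c1 = (a0 + b0) % P, (a1 + b1) % P  # shares of 42 + 100
assert reconstruct(c0, c1) == 142
print(reconstruct(a0, a1), reconstruct(c0, c1))
```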

Additionally, “the first party learns nothing about the parameters of the model.”

“Gazelle looks like a very elegant and carefully chosen combination of two advanced cryptographic primitives, homomorphic encryption and multiparty secure computation, that have both seen tremendous progress in the last decade,” says Bryan Parno, an associate professor of computer science and electrical engineering at Carnegie Mellon University. “Despite these advances, each primitive still has limitations; hence the need to combine them in a clever way to achieve good performance for critical applications like machine-learning inference, and indeed, Gazelle achieves quite impressive performance gains relative to previous work in this area.”


Convolutional neural network

In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.

They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.[2][3]

The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers.
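As a concrete sketch, here is that typical layer stack written with the Keras Sequential API; the layer sizes, counts, and input shape are arbitrary illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small image classifier showing the typical CNN layer types:
# convolution, normalization, pooling, and fully connected layers.
model = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),            # e.g. CIFAR-10-sized RGB images
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),       # fully connected layer
    layers.Dense(10, activation="softmax"),     # 10 output classes
])
model.summary()
```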

Although fully connected feedforward neural networks can be used to learn features as well as classify data, it is not practical to apply this architecture to images.

A very high number of neurons would be necessary, even in a shallow (opposite of deep) architecture, due to the very large input sizes associated with images, where each pixel is a relevant variable.

For instance, a fully connected layer for a (small) image of size 100 x 100 has 10000 weights for each neuron in the second layer.

The convolution operation brings a solution to this problem as it reduces the number of free parameters, allowing the network to be deeper with fewer parameters.[8]

For instance, regardless of image size, tiling regions of size 5 x 5, each with the same shared weights, requires only 25 learnable parameters.
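The arithmetic behind these two counts, as a quick sketch:

```python
# Fully connected: every neuron in the next layer sees every input pixel.
image_pixels = 100 * 100              # a 100 x 100 image has 10,000 inputs
weights_per_fc_neuron = image_pixels
print(weights_per_fc_neuron)          # 10000 weights for EACH neuron in the next layer

# Convolutional: one 5 x 5 filter is tiled across the whole image, so the
# same 25 weights are reused at every position, regardless of image size.
filter_height, filter_width = 5, 5
weights_per_conv_filter = filter_height * filter_width
print(weights_per_conv_filter)        # 25 learnable parameters (plus an optional bias)
```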

In this way, it resolves the vanishing or exploding gradients problem in training traditional multi-layer neural networks with many layers by using backpropagation.

Convolutional networks may include local or global pooling layers, which combine the outputs of neuron clusters at one layer into a single neuron in the next layer.[9][10]

Each neuron in a neural network computes an output value by applying some function to the input values coming from the receptive field in the previous layer.

This reduces memory footprint because a single bias and a single vector of weights is used across all receptive fields sharing that filter, rather than each receptive field having its own bias and vector of weights.

Work by Hubel and Wiesel in the 1950s and 1960s showed that cat and monkey visual cortexes contain neurons that individually respond to small regions of the visual field.

Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its receptive field.

Receptive field size and location vary systematically across the cortex to form a complete map of visual space.

This idea appears in 1986 in the book version of the original backpropagation paper (Figure 14).[15] Neocognitrons were developed in 1988 for temporal signals.[16]

Learning was thus fully automatic, performed better than manual coefficient design, and was suited to a broader range of image recognition problems and image types.

LeNet-5, a pioneering convolutional network that classifies digits, was applied by several banks to recognise hand-written numbers on checks (cheques) digitized in 32x32 pixel images.

The ability to process higher resolution images requires larger and more layers of convolutional neural networks, so this technique is constrained by the availability of computing resources.

The resulting recurrent convolutional network allows for the flexible incorporation of contextual information to iteratively resolve local ambiguities.

CNNs significantly improved on the best performance in the literature for multiple image databases, including the MNIST database, the NORB database, the HWDB1.0 dataset (Chinese characters), and the CIFAR10 dataset (60,000 labeled 32x32 RGB images).[11]

While traditional multilayer perceptron (MLP) models were successfully used for image recognition, due to the full connectivity between nodes they suffer from the curse of dimensionality, and thus do not scale well to higher resolution images.

For example, in CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully connected neuron in a first hidden layer of a regular neural network would have 32*32*3 = 3,072 weights.

Also, such network architecture does not take into account the spatial structure of data, treating input pixels which are far apart in the same way as pixels that are close together.

Convolutional neural networks are biologically inspired variants of multilayer perceptrons that are designed to emulate the behavior of a visual cortex.

These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images.

Weight sharing dramatically reduces the number of free parameters learned, thus lowering the memory requirements for running the network and allowing the training of larger, more powerful networks.

The layer's parameters consist of a set of learnable filters (or kernels), which have a small receptive field, but extend through the full depth of the input volume.

During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter.

Every entry in the output volume can thus also be interpreted as an output of a neuron that looks at a small region in the input and shares parameters with neurons in the same activation map.

When dealing with high-dimensional inputs such as images, it is impractical to connect neurons to all neurons in the previous volume because such a network architecture does not take the spatial structure of the data into account.

Convolutional networks exploit spatially local correlation by enforcing a local connectivity pattern between neurons of adjacent layers: each neuron is connected to only a small region of the input volume.

If this number is not an integer, then the strides are set incorrectly and the neurons cannot be tiled to fit across the input volume in a symmetric way.
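The number in question is the spatial output size given by the standard formula (W - F + 2P)/S + 1, where W is the input size, F the filter (receptive field) size, P the zero padding, and S the stride; a small sketch of the check:

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a convolutional layer: (W - F + 2P) / S + 1."""
    size = (w - f + 2 * p) / s + 1
    if not size.is_integer():
        raise ValueError(f"stride {s} does not tile a width-{w} input evenly "
                         f"with filter size {f} and padding {p} (got {size})")
    return int(size)

print(conv_output_size(w=32, f=5, p=0, s=1))    # 28
print(conv_output_size(w=227, f=11, p=0, s=4))  # 55
# conv_output_size(w=10, f=3, p=0, s=2) raises: (10 - 3)/2 + 1 = 4.5
```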

A parameter sharing scheme is used in convolutional layers to control the number of free parameters: denoting a single 2-dimensional slice of depth as a depth slice, we constrain the neurons in each depth slice to use the same weights and bias.

Since all neurons in a single depth slice share the same parameters, then the forward pass in each depth slice of the CONV layer can be computed as a convolution of the neuron's weights with the input volume (hence the name: convolutional layer).

The result of this convolution is an activation map, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume.
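A direct (unoptimized) numpy sketch of this forward pass; the input and filter shapes below are hypothetical.

```python
import numpy as np

def conv_forward(x, filters, stride=1):
    """Convolve each filter across the input volume.

    x:       input volume, shape (H, W, C)
    filters: learnable kernels, shape (K, F, F, C) -- K filters of size F x F x C
    returns: output volume of stacked activation maps, shape (H_out, W_out, K)
    """
    H, W, C = x.shape
    K, F, _, _ = filters.shape
    H_out = (H - F) // stride + 1
    W_out = (W - F) // stride + 1
    out = np.zeros((H_out, W_out, K))
    for k in range(K):                           # one activation map per filter
        for i in range(H_out):
            for j in range(W_out):
                patch = x[i*stride:i*stride+F, j*stride:j*stride+F, :]
                out[i, j, k] = np.sum(patch * filters[k])   # dot product over the receptive field
    return out

x = np.random.randn(32, 32, 3)                   # e.g. a CIFAR-10-sized image
filters = np.random.randn(8, 5, 5, 3)            # 8 shared 5x5x3 kernels
print(conv_forward(x, filters).shape)            # (28, 28, 8)
```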

Sometimes, however, the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centered structure, in which we expect completely different features to be learned on different spatial locations.

One practical example is when the input are faces that have been centered in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image.

In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a locally connected layer.

The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters and amount of computation in the network, and hence to also control overfitting.

The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations.
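A numpy sketch of that 2x2, stride-2 max pooling (assuming the input height and width are even):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a volume of shape (H, W, K)."""
    H, W, K = x.shape
    blocks = x.reshape(H // 2, 2, W // 2, 2, K)
    return blocks.max(axis=(1, 3))        # keep only the largest of each 2x2 block

x = np.random.randn(28, 28, 8)
print(max_pool_2x2(x).shape)              # (14, 14, 8): 3 of every 4 activations discarded
```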

Finally, after several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers.

Since feature map size decreases with depth, layers near the input layer will tend to have fewer filters while higher layers can have more.

Preserving more information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next.

DropConnect is similar to dropout as it introduces dynamic sparsity within the model, but differs in that the sparsity is on the weights, rather than the output vectors of a layer.

In other words, the fully connected layer with DropConnect becomes a sparsely connected layer in which the connections are chosen at random during the training stage.
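A numpy sketch of a fully connected layer with DropConnect; the shapes and drop probability are illustrative, and the inference-time scaling is a rough simplification of the averaging the original method prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_dense(x, W, b, drop_prob=0.5, training=True):
    """Fully connected layer that randomly zeroes individual WEIGHTS during
    training (DropConnect), rather than zeroing the layer's outputs (dropout)."""
    if training:
        mask = rng.random(W.shape) >= drop_prob   # keep each weight with prob 1 - drop_prob
        W = W * mask
    else:
        W = W * (1.0 - drop_prob)                 # crude expected-value scaling at inference
    return x @ W + b

x = np.random.randn(4, 16)                        # a batch of 4 input vectors
W = np.random.randn(16, 8)                        # weights of the fully connected layer
b = np.zeros(8)
print(dropconnect_dense(x, W, b).shape)           # (4, 8)
```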

In stochastic pooling, the conventional deterministic pooling operations are replaced with a stochastic procedure, where the activation within each pooling region is picked randomly according to a multinomial distribution, given by the activities within the pooling region.

An alternate view of stochastic pooling is that it is equivalent to standard max pooling but with many copies of an input image, each having small local deformations.

Using stochastic pooling in a multilayer model gives an exponential number of deformations since the selections in higher layers are independent of those below.
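A numpy sketch of stochastic pooling over 2x2 regions, assuming non-negative (e.g. post-ReLU) activations:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool_2x2(x):
    """Within each 2x2 region, pick one activation at random with probability
    proportional to its value (activations are assumed non-negative)."""
    H, W, K = x.shape
    out = np.zeros((H // 2, W // 2, K))
    for i in range(H // 2):
        for j in range(W // 2):
            for k in range(K):
                region = x[2*i:2*i+2, 2*j:2*j+2, k].ravel()
                total = region.sum()
                if total == 0:
                    continue                       # all-zero region: output stays 0
                out[i, j, k] = rng.choice(region, p=region / total)
    return out

x = np.abs(np.random.randn(4, 4, 1))               # toy non-negative activations
print(stochastic_pool_2x2(x).shape)                # (2, 2, 1)
```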

Since the degree of model overfitting is determined by both its power and the amount of training it receives, providing a convolutional network with more training examples can reduce overfitting.

Since these networks are usually trained with all available data, one approach is to either generate new data from scratch (if possible) or perturb existing data to create new ones.

Another simple way to prevent overfitting is to limit the number of parameters, typically by limiting the number of hidden units in each layer or limiting network depth.

Limiting the number of parameters restricts the predictive power of the network directly, reducing the complexity of the function that it can perform on the data, and thus limits the amount of overfitting.

A simple form of added regularizer is weight decay, which simply adds an additional error, proportional to the sum of the absolute values of the weights (L1 norm) or the squared magnitude (L2 norm) of the weight vector, to the error at each node.

The level of acceptable model complexity can be reduced by increasing the proportionality constant, thus increasing the penalty for large weight vectors.

In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the noisy inputs.

Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use projected gradient descent to enforce the constraint.

In practice, this corresponds to performing the parameter update as normal and then clamping the weight vector w of every neuron so that its L2 norm ‖w‖₂ stays below a fixed constant c.
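A numpy sketch of both ideas, the weight-decay penalty and the max-norm projection; the shapes and the constant c are illustrative.

```python
import numpy as np

def l2_penalty(W, lam=1e-4):
    """Weight decay: add lam * ||W||^2 to the loss (an L1 version would use np.abs(W).sum())."""
    return lam * np.sum(W ** 2)

def clip_max_norm(W, c=3.0):
    """Max-norm constraint: rescale any neuron's incoming weight vector
    (a column of W) whose L2 norm exceeds c, after each gradient update."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))

W = np.random.randn(256, 10) * 2.0        # 10 neurons, each with 256 incoming weights
print(l2_penalty(W))
W = clip_max_norm(W, c=3.0)
print(np.linalg.norm(W, axis=0).max())    # no column norm exceeds 3.0
```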

The alternative is to use a hierarchy of coordinate frames and to use a group of neurons to represent a conjunction of the shape of the feature and its pose relative to the retina.

The vectors of neuronal activity that represent pose ('pose vectors') allow spatial transformations modeled as linear operations that make it easier for the network to learn the hierarchy of visual entities and generalize across viewpoints.

The winning network, GoogLeNet (the foundation of DeepDream), increased the mean average precision of object detection to 0.439329, and reduced classification error to 0.06656, the best result to date.

The best algorithms still struggle with objects that are small or thin, such as a small ant on a stem of a flower or a person holding a quill in their hand.

For example, humans are not good at classifying objects into fine-grained categories such as the particular breed of dog or species of bird, whereas convolutional neural networks handle this.

In 2015 a many-layered CNN demonstrated the ability to spot faces from a wide range of angles, including upside down, even when partially occluded, with competitive performance.

The network was trained on a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces.

CNNs can be naturally tailored to analyze a sufficiently large collection of time series representing week-long human physical activity streams augmented by rich clinical data (including the death register, as provided by, e.g., the NHANES study).

A simple CNN was combined with Cox-Gompertz proportional hazards model and used to produce a proof-of-concept example of digital biomarkers of aging in the form of all-causes-mortality predictor.[77]

From 1999 to 2001, Fogel and Chellapilla published papers showing how a convolutional neural network could learn to play checkers using co-evolution.

The learning process did not use prior human professional games, but rather focused on a minimal set of information contained in the checkerboard: the location and type of pieces, and the piece differential.

In December 2014, Clark and Storkey published a paper showing that a CNN trained by supervised learning from a database of human professional games could outperform GNU Go and win some games against Monte Carlo tree search Fuego 1.1 in a fraction of the time it took Fuego to play.[81]

Later it was announced that a large 12-layer convolutional neural network had correctly predicted the professional move in 55% of positions, equalling the accuracy of a 6 dan human player.

When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional search program GNU Go in 97% of games, and matched the performance of the Monte Carlo tree search program Fuego simulating ten thousand playouts (about a million positions) per move.[82]

A couple of CNNs for choosing moves to try ('policy network') and evaluating positions ('value network') driving MCTS were used by AlphaGo, the first program to beat the best human player at the time.[83]

Once the network parameters have converged, an additional training step is performed using the in-domain data to fine-tune the network weights.
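A sketch of that two-stage recipe in Keras: the pre-trained base network, the five in-domain classes, the learning rates, and the data arrays `in_domain_images` and `in_domain_labels` are all illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Start from a network pre-trained on a large generic dataset (here ImageNet).
base = keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                      input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                       # stage 1: train only the new head

model = keras.Sequential([
    base,
    layers.Dense(5, activation="softmax"),   # hypothetical in-domain classes
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
# model.fit(in_domain_images, in_domain_labels, epochs=5)

# Stage 2: once the head has converged, unfreeze the base and fine-tune all
# weights with a much smaller learning rate so they shift only slightly.
base.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(in_domain_images, in_domain_labels, epochs=5)
```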

With recent advances in visual salience, spatial and temporal attention, the most critical spatial regions/temporal instants could be visualized to justify the CNN predictions.[86][87]

Convolutional deep belief networks (CDBN) have structure very similar to convolutional neural networks and are trained similarly to deep belief networks.

A time delay neural network allows speech signals to be processed time-invariantly, analogous to the translation invariance offered by CNNs.[92]

Deep Learning Lecture 11: Using CNN's with Keras


How we teach computers to understand pictures | Fei Fei Li

When a very young child looks at a picture, she can identify simple elements: "cat," "book," "chair." Now, computers are getting smart enough to do that too.

FBI warns cyber criminals are plotting a mass hack against bank ATMs | Colorful Life

America's intelligence chiefs have warned banks of a major hacking threat to cash machines worldwide in the next few days. The FBI sent out a confidential alert ...

But what is the Fourier Transform? A visual introduction.

An animated introduction to the Fourier Transform, winding graphs around circles.

Facebook CEO Mark Zuckerberg testifies before Congress on data scandal

Facebook CEO Mark Zuckerberg will testify today before a U.S. congressional hearing about the use of Facebook data to target voters in the 2016 election.

Siddha Ganju Interview - Embedded Deep Learning at Deep Vision

In this episode we hear from Siddha Ganju, data scientist at computer vision startup Deep Vision. Siddha joined me at the AI Conference a while back to chat ...

How Facebook surveillance state tracks and manipulates everyone, everything, and everywhere

Facebook tracks you online and offline, creating profile for targeted advertisements. Facebook knows what you buy and what sites you visit. Do you know where ...

The Future of Computational Journalism

Data science and algorithms are reshaping how the news is discovered and reported. At a recent event bringing together voices from the School of Engineering ...