AI News, Auto-Generating Clickbait With Recurrent Neural Networks
What if we could automate the writing of clickbait headlines, thus freeing up clickbait writers to do useful work? If this sort of writing truly is formulaic and unoriginal, we should be able to produce it automatically.
Recently, as people have figured out how to train deep (multi-layered) neural nets, very powerful models have been created, fueling the hype around so-called deep learning.
So, given a string of words like “Which Disney Character Are __”, we want the network to produce a reasonable guess like “You”, rather than, say, “Spreadsheet”. If this model can learn to predict the next word with some accuracy, we get a language model that tells us something about the texts we trained it on.
If we ask this model to guess the next word, and then add that word to the sequence and ask it for the next word after that, and so on, we can generate text of arbitrary length.
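To make the predict, append, repeat loop concrete, here is a minimal sketch in Python. It swaps the RNN for a much simpler count-based bigram model (an assumption for illustration, not the model described here), which only looks at the previous word. The toy headlines and the helper names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(headlines):
    """Count, for every word, which words follow it in the training headlines."""
    counts = defaultdict(Counter)
    for line in headlines:
        words = ["<s>"] + line.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Greedy prediction: the most frequent word seen after `word`."""
    followers = counts.get(word)
    if not followers:
        return "</s>"
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_words=10):
    """Guess the next word, append it to the sequence, and repeat."""
    words = prompt.lower().split()
    for _ in range(max_words):
        nxt = predict_next(counts, words[-1] if words else "<s>")
        if nxt == "</s>":  # the model predicts the end of the headline
            break
        words.append(nxt)
    return " ".join(words)


# Hypothetical toy training data.
headlines = [
    "Which Disney Character Are You",
    "Which Pizza Topping Are You",
    "You Won't Believe What Happened Next",
]
counts = train_bigram(headlines)
completion = generate(counts, "Which Disney Character Are")
```

Because `generate` takes a prompt, the same loop also completes a partial headline. A trained RNN would plug in where `predict_next` sits, conditioning on the whole sequence rather than only on the last word.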
During training, we tweak the weights of this network so as to minimize the prediction error, maximizing its ability to guess the right next word.
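The text does not name the loss function. A common choice for next-word prediction is cross-entropy: the negative log-probability the network assigns to the correct next word after a softmax over the vocabulary. The sketch below assumes that choice, with made-up scores for a hypothetical three-word vocabulary.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_loss(scores, target_index):
    """Cross-entropy: negative log-probability of the correct next word."""
    return -math.log(softmax(scores)[target_index])


vocab = ["You", "Spreadsheet", "Cat"]  # hypothetical vocabulary; the right answer is "You"
good = next_word_loss([3.0, 0.0, 1.0], target_index=0)  # network favours "You"
bad = next_word_loss([0.0, 3.0, 1.0], target_index=0)   # network favours "Spreadsheet"
```

Minimizing this loss pushes up the probability of the right next word, which is the sense in which training maximizes the network's ability to guess it.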
The hope is that having a continuous rather than discrete representation for words will allow the network to make better mistakes, as long as similar words get similar vectors.
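One way to see why similar vectors lead to "better mistakes" is to compare words with cosine similarity. In this sketch the three-dimensional vectors are invented for illustration (real word vectors typically have tens to hundreds of dimensions): guessing "dog" when the answer is "cat" is a smaller error in vector space than guessing "spreadsheet".

```python
import math


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical word vectors, made up for this example only.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "spreadsheet": [0.0, 0.1, 0.95],
}

near_miss = cosine(vectors["cat"], vectors["dog"])
far_miss = cosine(vectors["cat"], vectors["spreadsheet"])
```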
Whereas traditional neural nets are built around stacks of simple units that do a weighted sum followed by some simple non-linear function (like a tanh), we’ll use a more complicated unit called Long Short-Term Memory (LSTM).
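To contrast the two kinds of unit, here is a minimal scalar sketch of a plain tanh unit next to one step of an LSTM cell. It assumes the common formulation with input, forget, and output gates and no peephole connections (the text does not say which variant was used), and a hidden size of 1 so every weight is a plain number. The parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simple_unit(x, h_prev, w_x, w_h, b):
    """Traditional unit: weighted sum followed by a tanh non-linearity."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w_x, w_h, b)."""
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # new cell state (the "memory")
    h = o * math.tanh(c)             # new hidden state / output
    return h, c


# Hypothetical parameters: input gate nearly shut, forget gate nearly open.
params = {
    "input": (0.0, 0.0, -10.0),
    "forget": (0.0, 0.0, 10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.5, -0.3, 0.8, 0.1, -0.9]:
    h, c = lstm_step(x, h, c, params)
# c stays close to its initial value of 1.0
```

With the forget gate held open and the input gate held shut, the cell state is carried almost unchanged across steps. This gated, additive update of `c` is what lets an LSTM keep information around much longer than the plain tanh unit, whose state is squashed through tanh and overwritten at every step.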
Even if it can learn to generate text with correct syntax and grammar, it surely can’t produce headlines that contain any new knowledge of the real world?
It’s not clear that these headlines are much more than a semi-random concatenation of topics their userbase likes, and as seen in the latter case, 100% correct grammar is not a requirement.
Early in training, after it has seen about 40,000 headlines, the model's output is still rough. However, after multiple passes through the data, the training converges and the results are remarkably better.
By having the RNN complete prompts such as “Barack Obama Says” and “Kim Kardashian Says”, we can effectively ask questions of the model.
During training, we can follow the gradient down into these word vectors and fine-tune the vector representations specifically for the task of generating clickbait, thus further improving the generalization accuracy of the complete model.
It turns out that if we then take the word vectors learned by this model with 2 recurrent layers, plug them into an architecture with 3 recurrent layers, and freeze them, we get even better performance.
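Fine-tuning versus freezing the word vectors comes down to whether the gradient step is applied to the embedding table. The sketch below assumes the gradients for each word's vector have already been computed by backpropagation through the network, and uses plain SGD with made-up two-dimensional vectors. `update_embeddings` and its `trainable` flag are hypothetical names, similar in spirit to setting a layer's `trainable` attribute to `False` in Keras.

```python
def update_embeddings(embeddings, batch_words, grads, lr, trainable=True):
    """One SGD step on the word vectors used in a batch.

    embeddings: dict word -> vector (e.g. initialised from pretrained vectors)
    batch_words: words that appeared in the batch
    grads: dict word -> gradient of the loss w.r.t. that word's vector
    If trainable is False, the vectors are frozen and returned unchanged.
    """
    if not trainable:
        return embeddings
    updated = dict(embeddings)  # copy so the pretrained table is not modified
    for word in set(batch_words):
        updated[word] = [v - lr * g for v, g in zip(embeddings[word], grads[word])]
    return updated


# Hypothetical pretrained vectors and gradients.
pretrained = {"obama": [0.5, 0.1], "kardashian": [0.2, 0.7]}
grads = {"obama": [1.0, -1.0]}

tuned = update_embeddings(pretrained, ["obama"], grads, lr=0.1)
frozen = update_embeddings(pretrained, ["obama"], grads, lr=0.1, trainable=False)
```

Only the vectors of words that occur in a batch move during fine-tuning; freezing keeps every vector exactly as it was learned by the earlier model.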
To summarize the word vector story: Initially, some good guys at Stanford invented GloVe, ran it over 6 billion tokens, and got a bunch of vectors.
I found this to be a Big Deal: It cut the training time almost in half, and found better optima, compared to using rmsprop with exponential decay.
It’s possible that similar results could be obtained with rmsprop had I found better values for the learning rate and the decay rate, but I’m very happy not having to do that tuning.
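For reference, here is the standard RMSprop update combined with an exponentially decaying learning rate, minimizing the toy function f(w) = (w - 3)^2. It is a sketch, not the training code used here. The starting learning rate, decay rate, and smoothing constant are made-up values, and they are exactly the knobs that would need tuning.

```python
import math


def rmsprop_minimize(grad_fn, w0, lr0=0.1, decay_rate=0.99, rho=0.9, eps=1e-8, steps=500):
    """Minimize a 1-D function with RMSprop and an exponentially decaying learning rate."""
    w = w0
    mean_sq = 0.0
    for t in range(steps):
        lr = lr0 * decay_rate ** t                     # exponential learning-rate decay
        g = grad_fn(w)
        mean_sq = rho * mean_sq + (1 - rho) * g * g    # running average of squared gradients
        w -= lr * g / (math.sqrt(mean_sq) + eps)       # step scaled by the RMS of recent gradients
    return w


# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3.
w_final = rmsprop_minimize(lambda w: 2 * (w - 3.0), w0=0.0)
```

Because `lr0` and `decay_rate` interact (decay too fast and training stalls before reaching the optimum, too slowly and it keeps bouncing around it), finding good values usually takes several trial runs.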
The articles are the result of three separate language models: one for the headlines, one for the article bodies, and one for the author name.
The article body neural network was seeded with the words from the headline, so that the body text has a chance to be thematically consistent with the headline.
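The three-model setup can be sketched as follows. Each model is assumed to be a function from the words so far to the next word (or `None` to stop). The tiny lookup tables standing in for the trained networks, and the names `generate`, `table_model`, and `write_article`, are hypothetical. Note that these tables only look at the last word, whereas an RNN seeded with the headline would carry all of its words in its hidden state.

```python
def generate(next_word, seed, max_words=20):
    """Greedily extend `seed` with a next-word function; return only the new words."""
    words = list(seed)
    for _ in range(max_words):
        nxt = next_word(words)
        if nxt is None:  # the model signals the end of the text
            break
        words.append(nxt)
    return words[len(seed):]


def table_model(table):
    """Hypothetical stand-in for a trained network: looks up the next word
    from the last word only (or from "<start>" when there are no words yet)."""
    return lambda words: table.get(words[-1] if words else "<start>")


def write_article(headline_model, body_model, author_model, headline_prompt):
    headline = headline_prompt + generate(headline_model, headline_prompt)
    # The body model is seeded with the headline words so the body can follow on from it.
    body = generate(body_model, headline)
    author = generate(author_model, [])
    return " ".join(headline), " ".join(body), " ".join(author)


# Made-up lookup tables, for illustration only.
headline_model = table_model({"obama": "says", "says": "yes"})
body_model = table_model({"yes": "the", "the": "president", "president": "agreed"})
author_model = table_model({"<start>": "jane", "jane": "doe"})

article = write_article(headline_model, body_model, author_model, ["barack", "obama"])
```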
If I remember correctly from economics class, this should drive the market value of useless journalism down to zero, forcing other producers of useless journalism to produce something else.
Getting started with the Keras functional API
The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.
The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted.
At this point, we feed our auxiliary input data into the model by concatenating it with the LSTM output. This defines a model with two inputs and two outputs. We compile the model and assign a weight of 0.2 to the auxiliary loss.
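A sketch of such a two-input, two-output model with the Keras functional API follows. The sequence length (100), vocabulary size (10,000), layer sizes, the five auxiliary features, and the choice of binary cross-entropy for both outputs are assumptions for illustration.

```python
import keras
from keras import layers

# Main input: a headline as a sequence of 100 word indices (assumed sizes).
main_input = keras.Input(shape=(100,), dtype="int32", name="main_input")
x = layers.Embedding(input_dim=10000, output_dim=512)(main_input)
lstm_out = layers.LSTM(32)(x)

# Auxiliary output placed directly on the LSTM output.
auxiliary_output = layers.Dense(1, activation="sigmoid", name="aux_output")(lstm_out)

# Auxiliary input (e.g. time of day), concatenated with the LSTM output.
auxiliary_input = keras.Input(shape=(5,), name="aux_input")
x = layers.concatenate([lstm_out, auxiliary_input])
x = layers.Dense(64, activation="relu")(x)
main_output = layers.Dense(1, activation="sigmoid", name="main_output")(x)

model = keras.Model(inputs=[main_input, auxiliary_input],
                    outputs=[main_output, auxiliary_output])

# The auxiliary loss counts with weight 0.2 relative to the main loss.
model.compile(optimizer="rmsprop",
              loss=["binary_crossentropy", "binary_crossentropy"],
              loss_weights=[1.0, 0.2])
```

The auxiliary output gives the LSTM its own training signal, while the down-weighted loss keeps the main output as the primary objective.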
The input is a sequence of 280 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent characters).
To share a layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want. Let's pause to look at how to read the shared layer's output or output shape.
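Here is a minimal sketch of layer sharing, assuming two inputs of the shape described above (280 vectors of size 256) and an LSTM with 64 units (an assumed size). Because `shared_lstm` is instantiated once and called twice, both branches use the same weights.

```python
import keras
from keras import layers

# Two inputs, each a sequence of 280 vectors of size 256.
input_a = keras.Input(shape=(280, 256))
input_b = keras.Input(shape=(280, 256))

# Instantiate the layer once...
shared_lstm = layers.LSTM(64)

# ...then call it on as many inputs as you want; both calls reuse the same weights.
encoded_a = shared_lstm(input_a)
encoded_b = shared_lstm(input_b)

merged = layers.concatenate([encoded_a, encoded_b], axis=-1)
prediction = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model(inputs=[input_a, input_b], outputs=prediction)
```

The model ends up with only five trainable weight tensors (three for the LSTM, two for the final Dense layer), since the second call to `shared_lstm` adds a node to the layer but no new weights.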
Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a 'node' to the layer, linking the input tensor to the output tensor.
The same is true for the properties input_shape and output_shape: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of 'layer output/input shape' is well defined, and that one shape will be returned by layer.output_shape/layer.input_shape.
But if, for instance, you apply the same Conv2D layer to an input of shape (32, 32, 3) and then to an input of shape (64, 64, 3), the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to.
- On Thursday, March 21, 2019