AI News, You thought fake news was bad? Deep fakes are where truth goes to die
“As you know, I had the balls to withdraw from the Paris climate agreement,” he said, looking directly into the camera, “and so should you.” The video was created by a Belgian political party, Socialistische Partij Anders, or sp.a, and posted on sp.a’s Twitter and Facebook.
One woman wrote: “Humpy Trump needs to look at his own country with his deranged child killers who just end up with the heaviest weapons in schools.” Another added: “Trump shouldn’t blow so high from the tower because the Americans are themselves as dumb.” But this anger was misdirected.
The use of this machine learning technique was mostly limited to the AI research community until late 2017, when a Reddit user who went by the moniker “Deepfakes” – a portmanteau of “deep learning” and “fake” – started posting digitally altered pornographic videos.
As well as considering the threat to privacy and national security, both scholars became increasingly concerned that the proliferation of deep fakes could catastrophically erode trust between different factions of society in an already polarized political climate.
Anyone with access to this technology – from state-sanctioned propagandists to trolls – would be able to skew information, manipulate beliefs, and in so doing, push ideologically opposed online communities deeper into their own subjective realities.
“Now, I know that this would be easily refutable, but if this drops the night before, you can’t debunk it before serious damage has spread.” She added: “I’m starting to see how a well-timed deep fake could very well disrupt the democratic process.” While these disturbing hypotheticals might be easy to conjure, Tim Hwang, director of the Harvard-MIT Ethics and Governance of Artificial Intelligence Initiative, is not willing to bet on deep fakes having a high impact on elections in the near future.
“Right now, a crude Photoshop job could be just as effective as something created with machine learning.” At the same time, Hwang acknowledges that as deep fakes become more realistic and easier to produce in the coming years, they could usher in an era of forgery qualitatively different from what we have seen before.
In August, an international team of researchers affiliated with Germany’s Max Planck Institute for Informatics unveiled a technique for producing what they called “deep video portraits”, a sort of facial ventriloquism, where one person can take control of another person’s face and make it say or do things at will.
Christian Theobalt, a researcher involved in the study, told me via email that he imagines deep video portraits will be used most effectively for accurate dubbing in foreign films, advanced face editing techniques for post-production in film, and special effects.
In a press release that accompanied the original paper, the researchers acknowledged potential misuse of their technology, but emphasized how their approach – capable of synthesizing faces that look “nearly indistinguishable from ground truth” – could make “a real difference to the visual entertainment industry”.
Hany Farid, professor of computer science at the University of California, Berkeley, believes that although the machine learning-powered breakthroughs in computer graphics are impressive, researchers should be more cognizant of the broader social and political ramifications of what they’re creating.
“But outside of this world, outside of Hollywood, it is not clear to me that the positive implications outweigh the negative.” Farid, who has spent the past 20 years developing forensic technology to identify digital forgeries, is currently working on new detection methods to counteract the spread of deep fakes.
“All the programmer has to do is update the algorithm to look for, say, changes of color in the face that correspond with the heartbeat, and then suddenly, the fakes incorporate this once imperceptible sign.” (For this reason, Farid chose not to share some of his more recent forensic breakthroughs with me.)
“It is that the social processes by which we collectively come to know things and hold them to be true or untrue are under threat.” Indeed, as the fake video of Trump that spread through social networks in Belgium earlier this year demonstrated, deep fakes don’t need to be undetectable or even convincing to be believed and do damage.
Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms.
Deep learning architectures such as deep neural networks, deep belief networks and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design and board game programs, where they have produced results comparable to and in some cases superior to human experts.
Deep learning models are vaguely inspired by information processing and communication patterns in biological nervous systems, yet they differ in various ways from the structural and functional properties of biological brains (especially human brains), which makes them incompatible with neuroscience evidence.
Most modern deep learning models are based on an artificial neural network, although they can also include propositional formulas or latent variables organized layer-wise in deep generative models such as the nodes in deep belief networks and deep Boltzmann machines.
No universally agreed upon threshold of depth divides shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth >
For supervised learning tasks, deep learning methods obviate feature engineering, by translating the data into compact intermediate representations akin to principal components, and derive layered structures that remove redundancy in representation.
The universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous functions.
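In symbols, the classical single-hidden-layer form of this result can be written as below. The notation (K, f, N, a_i, w_i, b_i) is chosen here for illustration and does not come from the source, and the statement assumes a continuous sigmoidal activation σ. The theorem only asserts that suitable weights exist; it says nothing about how to find them by training.

```latex
% Assumptions: K \subset \mathbb{R}^n is compact, f : K \to \mathbb{R} is continuous,
% and \sigma is a continuous sigmoidal activation function.
\forall \varepsilon > 0 \;\; \exists\, N \in \mathbb{N},\;
a_i, b_i \in \mathbb{R},\; w_i \in \mathbb{R}^n \;:
\qquad
\sup_{x \in K} \left| \, f(x) - \sum_{i=1}^{N} a_i \,
\sigma\!\left( w_i^{\top} x + b_i \right) \right| < \varepsilon
```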
By 1991 such systems were used for recognizing isolated 2-D hand-written digits, while recognizing 3-D objects was done by matching 2-D images with a handcrafted 3-D object model.
But while Neocognitron required a human programmer to hand-merge features, Cresceptron learned an open number of features in each layer without supervision, where each feature is represented by a convolution kernel.
In 1994, André de Carvalho, together with Mike Fairhurst and David Bisset, published experimental results of a multi-layer boolean neural network, also known as a weightless neural network, composed of a three-layer self-organising feature extraction neural network module (SOFT) followed by a multi-layer classification neural network module (GSN), which were independently trained.
In 1995, Brendan Frey demonstrated that it was possible to train (over two days) a network containing six fully connected layers and several hundred hidden units using the wake-sleep algorithm, co-developed with Peter Dayan and Hinton.
Simpler models that use task-specific handcrafted features such as Gabor filters and support vector machines (SVMs) were a popular choice in the 1990s and 2000s, because of ANNs' computational cost and a lack of understanding of how the brain wires its biological networks.
These methods never outperformed non-uniform internal-handcrafting Gaussian mixture model/Hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively.
The principle of elevating 'raw' features over hand-crafted optimization was first explored successfully in the architecture of deep autoencoder on the 'raw' spectrogram or linear filter-bank features in the late 1990s,
Many aspects of speech recognition were taken over by a deep learning method called long short-term memory (LSTM), a recurrent neural network published by Hochreiter and Schmidhuber in 1997.
showed how a many-layered feedforward neural network could be effectively pre-trained one layer at a time, treating each layer in turn as an unsupervised restricted Boltzmann machine, then fine-tuning it using supervised backpropagation.
The impact of deep learning in industry began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US, according to Yann LeCun.
was motivated by the limitations of deep generative models of speech, and the possibility that, given more capable hardware and large-scale data sets, deep neural nets (DNN) might become practical.
However, it was discovered that replacing pre-training with large amounts of training data for straightforward backpropagation when using DNNs with large, context-dependent output layers produced error rates dramatically lower than then-state-of-the-art Gaussian mixture model (GMM)/Hidden Markov Model (HMM) and also than more-advanced generative model-based systems.
offering technical insights into how to integrate deep learning into the existing highly efficient, run-time speech decoding system deployed by all major speech recognition systems.
In 2010, researchers extended deep learning from TIMIT to large vocabulary speech recognition, by adopting large output layers of the DNN based on context-dependent HMM states constructed by decision trees.
In 2009, Nvidia was involved in what was called the “big bang” of deep learning, “as deep-learning neural networks were trained with Nvidia graphics processing units (GPUs).”
In 2014, Hochreiter's group used deep learning to detect off-target and toxic effects of environmental chemicals in nutrients, household products and drugs and won the 'Tox21 Data Challenge' of NIH, FDA and NCATS.
Although CNNs trained by backpropagation had been around for decades, and GPU implementations of NNs for years, including CNNs, fast implementations of CNNs with max-pooling on GPUs in the style of Ciresan and colleagues were needed to progress on computer vision.
In November 2012, Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer detection, and in the following year also the MICCAI Grand Challenge on the same topic.
In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced, following a similar trend in large-scale speech recognition.
For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as 'cat' or 'no cat' and using the analytic results to identify cats in other images.
Over time, attention focused on matching specific mental abilities, leading to deviations from biology such as backpropagation, or passing information in the reverse direction and adjusting the network to reflect that information.
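To make "passing information in the reverse direction" concrete, here is a minimal sketch of backpropagation on a tiny network with two inputs, two tanh hidden units, and one linear output. The network size, the squared-error loss, the starting weights, and every function name are assumptions made for illustration; none of them come from the source.

```python
import math

# Hypothetical starting weights, chosen by hand for this illustration.
PARAMS = (
    [[0.1, -0.2], [0.4, 0.3]],  # W1: hidden-layer weights (2 units x 2 inputs)
    [0.0, 0.1],                 # b1: hidden-layer biases
    [0.5, -0.3],                # W2: output weights
    0.2,                        # b2: output bias
)
X = [1.0, 2.0]   # one example input
TARGET = 1.0     # its desired output


def forward(params, x):
    """Forward pass: return hidden activations h and the output y."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Backward pass: gradients of the loss with respect to every parameter."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    dy = y - target                               # error at the output
    dW2 = [dy * hj for hj in h]
    db2 = dy
    # Send the error back through tanh, whose derivative is 1 - tanh(a)^2.
    dz = [dy * w * (1.0 - hj ** 2) for w, hj in zip(W2, h)]
    dW1 = [[dzj * xi for xi in x] for dzj in dz]
    db1 = list(dz)
    return dW1, db1, dW2, db2


def sgd_step(params, x, target, lr):
    """Adjust every parameter a small step against its gradient."""
    W1, b1, W2, b2 = params
    dW1, db1, dW2, db2 = backprop(params, x, target)
    new_W1 = [[w - lr * g for w, g in zip(row, grow)]
              for row, grow in zip(W1, dW1)]
    new_b1 = [b - lr * g for b, g in zip(b1, db1)]
    new_W2 = [w - lr * g for w, g in zip(W2, dW2)]
    return new_W1, new_b1, new_W2, b2 - lr * db2


def numerical_dW1(params, x, target, j, i, eps=1e-6):
    """Central-difference estimate of one W1 gradient, used as a sanity check."""
    W1, b1, W2, b2 = params
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[j][i] += eps
    minus[j][i] -= eps
    return (loss((plus, b1, W2, b2), x, target)
            - loss((minus, b1, W2, b2), x, target)) / (2 * eps)
```

Calling `sgd_step(PARAMS, X, TARGET, 0.1)` returns updated weights with a lower loss on this example, and `numerical_dW1` offers a way to check that the hand-derived gradients agree with a finite-difference estimate.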
Neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis.
Despite this number being several orders of magnitude less than the number of neurons in a human brain, these networks can perform many tasks at a level beyond that of humans (e.g., recognizing faces, playing 'Go').
The goal is that eventually, the network will be trained to decompose an image into features, identify trends that exist across all samples and classify new images by their similarities without requiring human input.
The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.
The training process can be guaranteed to converge in one step with a new batch of data, and the computational complexity of the training algorithm is linear with respect to the number of neurons involved.
that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms.
All major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products, etc.) are based on deep learning.
DNNs have proven themselves capable, for example, of a) identifying the style period of a given painting, b) 'capturing' the style of a given painting and applying it in a visually pleasing manner to an arbitrary photograph, and c) generating striking imagery based on random visual input fields.
Word embedding, such as word2vec, can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the dataset;
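As a rough illustration of a word's "position relative to other words", the sketch below compares word vectors by cosine similarity. The three-dimensional vectors are invented for this example; real word2vec embeddings are learned from a corpus and typically have hundreds of dimensions. The names `EMBEDDINGS`, `cosine_similarity`, and `most_similar` are likewise hypothetical.

```python
import math

# Hypothetical toy embeddings, made up for illustration only.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other vocabulary word whose vector is closest in direction."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )
```

With these made-up vectors, `most_similar("king")` picks "queen" rather than "apple", because the first two vectors point in nearly the same direction.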
Finding the appropriate mobile audience for mobile advertising is always challenging, since many data points must be considered and assimilated before a target segment can be created and used in ad serving by any ad server.
'Deep anti-money laundering detection system can spot and recognize relationships and similarities between data and, further down the road, learn to detect anomalies or classify and predict specific events'.
Deep learning is closely related to a class of theories of brain development (specifically, neocortical development) proposed by cognitive neuroscientists in the early 1990s.
These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of nerve growth factor) support the self-organization somewhat analogous to the neural networks utilized in deep learning models.
Like the neocortex, neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers.
Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical generative models and deep belief networks, may be closer to biological reality.
researchers at The University of Texas at Austin (UT) developed a machine learning framework called Training an Agent Manually via Evaluative Reinforcement, or TAMER, which proposed new methods for robots or computer programs to learn how to perform tasks by interacting with a human instructor.
Such techniques lack ways of representing causal relationships (...) have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used.
systems, like Watson (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of Bayesian inference to deductive reasoning.'
As an alternative to this emphasis on the limits of deep learning, one author speculated that it might be possible to train a machine vision stack to perform the sophisticated task of discriminating between 'old master' and amateur figure drawings, and hypothesized that such a sensitivity might represent the rudiments of a non-trivial machine empathy.
In further reference to the idea that artistic sensitivity might inhere within relatively low levels of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20-30 layers) neural networks attempting to discern within essentially random data the images on which they were trained
Learning a grammar (visual or linguistic) from training data would be equivalent to restricting the system to commonsense reasoning that operates on concepts in terms of grammatical production rules and is a basic goal of both human language acquisition
Such a manipulation is termed an “adversarial attack.” In 2016 researchers used one ANN to doctor images in trial and error fashion, identify another's focal points and thereby generate images that deceived it.
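A minimal sketch of the idea follows. Note that it swaps in a different technique from the trial-and-error approach described above: it uses a gradient-sign perturbation, which assumes the attacker can read the classifier's weights. The classifier is a hypothetical three-feature logistic model, and its weights, the input, and the step size are all invented for illustration.

```python
import math

# Hypothetical linear classifier, invented for this illustration.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.2


def score(x):
    """Raw classifier score; positive means class 1."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict_prob(x):
    """Probability of class 1 under the logistic model."""
    return 1.0 / (1.0 + math.exp(-score(x)))


def predict(x):
    """Hard class decision: 1 if the score is positive, else 0."""
    return 1 if score(x) > 0 else 0


def input_gradient(x, label):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_prob(x)
    return [(p - label) * w for w in WEIGHTS]


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def gradient_sign_attack(x, label, epsilon):
    """Nudge every feature by at most epsilon in the direction that raises the loss."""
    grad = input_gradient(x, label)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


original = [0.3, 0.2, 0.4]                       # classified as class 1
adversarial = gradient_sign_attack(original, label=1, epsilon=0.15)
```

Here each feature moves by only 0.15, yet the score drops from 0.4 to about -0.125, so `predict` flips from class 1 to class 0. The small, uniformly bounded change is what makes such perturbations hard to notice in high-dimensional inputs like images.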
Another group showed that certain psychedelic spectacles could fool a facial recognition system into thinking ordinary people were celebrities, potentially allowing one person to impersonate another.
ANNs can however be further trained to detect attempts at deception, potentially leading attackers and defenders into an arms race similar to the kind that already defines the malware defense industry.
ANNs have been trained to defeat ANN-based anti-malware software by repeatedly attacking a defense with malware that was continually altered by a genetic algorithm until it tricked the anti-malware while retaining its ability to damage the target.
Real or Fake? AI Is Making It Very Hard to Know
Powerful machine-learning techniques (see “The Dark Secret at the Heart of AI”) are making it increasingly easy to manipulate or generate realistic video and audio, and to impersonate anyone you want with amazing accuracy.
A smartphone app called FaceApp, released recently by a company based in Russia, can automatically modify someone’s face to add a smile, add or subtract years, or swap genders.
“Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries,” reads an ethics statement posted to the company’s website.
This means the company is applying a technique that has emerged in recent years as a way of getting algorithms to go beyond just learning to classify things and generate plausible data of their own.
“It’s more challenging because there is a lot of variability in the high dimensional space representing videos, and current models for it are still not perfect.” Given the technologies that are now emerging, it may become increasingly important to be able to detect fake video and audio.
All one has to do, supposedly, is rename “Julia Roberts’s facelet file to arnold_schwarzenegger.m4f”, and one can watch “Notting Hill” with the Austrian tough guy in the lead role.
It began in December of last year: a Reddit user with the pseudonym “deepfakes” (apparently a portmanteau of “deep learning” and “fake”) published a porn clip featuring “Wonder Woman” star Gal Gadot.
in general, however, as in the old debate about photomontage and Photoshop, ethical questions took center stage: may an algorithm assemble freely available video and photo data in such a way that a face ends up on someone else’s head?
Those who suffer from the technology are not only media consumers, who will in future have to look even more closely at “video evidence”, but above all the involuntary (porn) performers.
“It shows that some men see women exclusively as objects that they can manipulate and force to do anything,” Vice magazine quotes longtime porn performer Alia Janine as saying. “These men lack respect for actresses and porn performers.”
Anyone who ends up involuntarily in embarrassing videos can, moreover, report the producers for insult, slander, and defamation, so deepfakes can even be relevant under criminal law.
“I am not prone to hysteria or exaggeration, but we can probably agree that something like this cannot be completely ruled out,” says Hany Farid.
But also because one needs to know the technical background in order to form a nuanced opinion on the topic, and to recognize faked videos.
The Difference Between A.I. and Machine Learning and Deep Learning
There's a discussion going on about the topic we are covering today: what’s the difference between AI and machine learning and deep learning.
In this video, we are going to break this down for you, giving you examples of use cases that make the difference between AI, machine learning, and deep learning clearer. Any device that perceives its environment and takes actions to maximize its chances of success can be said to have some kind of artificial intelligence, more frequently referred to as A.I.
It is called “deep” because it makes use of deep artificial neural networks.
- On Wednesday, January 16, 2019
The Rise of the Machines – Why Automation is Different this Time
Automation in the Information Age is different. Books we used for this video: The Rise of the Robots: The Second Machine Age: ..
How we teach computers to understand pictures | Fei Fei Li
When a very young child looks at a picture, she can identify simple elements: "cat," "book," "chair." Now, computers are getting smart enough to do that too.
Artificial Intelligence: it will kill us | Jay Tuck | TEDxHamburgSalon
For more information on Jay Tuck, please visit our website. US defense expert Jay Tuck was news director of the daily news program ..
What happens when our computers get smarter than we are? | Nick Bostrom
Artificial intelligence is getting smarter by leaps and bounds — within this century, research suggests, a computer AI could be as "smart" as a human being.
Hybrid Deep Learning for Anomaly Detection with Sung-Bae Cho
In the field of deep learning, generative models trained via an adversarial process have attracted great attention owing to their striking demonstrated performance.
Prof. Nick Bostrom Speaks to the UK Parliament's Artificial Intelligence Committee
Recorded: October 10th, 2017 Witnesses: Professor Wendy Hall, Professor of Computer Science, University of Southampton Professor Nick Bostrom, Director, ...
You are a Simulation & Physics Can Prove It: George Smoot at TEDxSalford
Astrophysicist, cosmologist and Nobel Prize winner George Smoot studies the cosmic microwave background radiation — the afterglow of the Big Bang.
Can we build AI without losing control over it? | Sam Harris
Scared of superintelligent AI? You should be, says neuroscientist and philosopher Sam Harris -- and not just in some theoretical way. We're going to build ...
How Does Your Phone Know This Is A Dog?
Check out this other video about machine learning! What do voice search, machine translation, and .