Artificial intelligence in action

A person watching videos that show things opening — a door, a book, curtains, a blooming flower, a yawning dog — easily understands that the same type of action is depicted in each clip.

Learning from dynamic scenes

The goal is to provide deep-learning algorithms with broad coverage of an ecosystem of visual and auditory moments, which may enable models to learn information that isn't necessarily taught in a supervised manner and to generalize to novel situations and tasks, the researchers say.

“This dataset can serve as a new challenge to develop AI models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis,” Oliva adds.

Oliva and Gutfreund, along with additional researchers from MIT and IBM, met weekly for more than a year to tackle technical issues, such as how to choose the action categories for annotations, where to find the videos, and how to put together a sufficiently wide array of clips so the AI system learns without bias.

“Now we have reached the milestone of 1 million videos for visual AI training, and people can go to our website, download the dataset and our deep-learning computer models, which have been taught to recognize actions.”

Qualitative results so far have shown that models can recognize moments well when the action is well-framed and close up, but they misfire when the category is fine-grained or there is background clutter, among other things.
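
As a rough illustration of how a pretrained action-recognition model can be applied to a short clip, here is a minimal sketch in PyTorch. It does not use the models released with the dataset: as a stand-in it loads torchvision's generic r3d_18 video classifier (pretrained on Kinetics-400), and the clip path is hypothetical.

```python
# Minimal sketch: scoring a short clip with a pretrained 3D CNN.
# NOTE: this is a stand-in using torchvision's r3d_18 (Kinetics-400),
# not the models released with Moments in Time; "example_clip.mp4" is
# a hypothetical path.
import torch
from torchvision.io import read_video
from torchvision.models.video import r3d_18, R3D_18_Weights

# Load a pretrained video classifier and switch to inference mode.
model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1).eval()

# read_video returns frames as a (T, H, W, C) uint8 tensor.
frames, _, _ = read_video("example_clip.mp4", pts_unit="sec")

# Rearrange to (C, T, H, W), scale to [0, 1], and normalize with the
# per-channel statistics the pretrained weights expect.
clip = frames.permute(3, 0, 1, 2).float() / 255.0
mean = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
std = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)
clip = (clip - mean) / std

# Resize every frame to the 112x112 resolution the network was trained on.
clip = torch.nn.functional.interpolate(
    clip, size=(112, 112), mode="bilinear", align_corners=False
)

# Add a batch dimension -> (1, C, T, H, W) and score the clip.
with torch.no_grad():
    probs = model(clip.unsqueeze(0)).softmax(dim=1)

top5 = probs.topk(5)
print("top-5 class indices:", top5.indices.squeeze(0).tolist())
print("top-5 probabilities:", [round(p, 3) for p in top5.values.squeeze(0).tolist()])
```

The preprocessing shown here (channel order, normalization statistics, 112x112 frames) is what the torchvision stand-in expects; the models released with the dataset may require different inputs.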

This first version of the Moments in Time dataset is one of the largest human-annotated video datasets capturing short visual and audible events, each tagged with one of 339 action or activity classes covering a wide range of common verbs.
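
For readers who want to explore the annotations, the sketch below tallies how clips are distributed across the action classes. The file name and the one-row-per-clip "path,label" CSV layout are assumptions made for illustration; check the files that ship with the actual release for the exact format.

```python
# Minimal sketch: inspecting the label distribution of an annotated
# action-recognition dataset. Assumes a CSV with one "clip_path,label"
# row per video -- a hypothetical layout and file name.
import csv
from collections import Counter

def load_annotations(csv_path):
    """Return a list of (clip_path, label) pairs from an annotation CSV."""
    pairs = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) >= 2:
                pairs.append((row[0], row[1]))
    return pairs

annotations = load_annotations("trainingSet.csv")  # hypothetical file name
labels = Counter(label for _, label in annotations)

print(f"{len(annotations)} clips across {len(labels)} action classes")
for action, count in labels.most_common(10):
    print(f"{action:>15s}: {count} clips")
```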

The researchers intend to produce more datasets with a variety of levels of abstraction to serve as stepping stones toward the development of learning algorithms that can build analogies between things, imagine and synthesize novel events, and interpret scenarios.

Helping AI master video understanding

I am part of the team at the MIT-IBM Watson AI Lab that is carrying out fundamental AI research to push the frontiers of the core technologies that will advance the state of the art in AI video comprehension.

Great progress has been made, and I am excited to share that we are releasing the Moments in Time Dataset, a large-scale collection of one million three-second annotated video clips for action recognition, intended to accelerate the development of technologies and models that enable automatic video understanding for AI.

A lot can happen in a moment of time: a girl kicks a ball, behind her on the path a woman walks her dog, on a nearby park bench a man reads a book, and high above a bird flies in the sky.

When asked to describe such a moment, a person can quickly identify objects (girl, ball, bird, book), the scene (park) and the actions that are taking place (kicking, walking, reading, flying).

While new algorithmic ideas have emerged over the years, the recent success of deep learning can be largely credited to two other factors: massive labeled datasets and significant improvements in computational capacity, which made it possible to process these datasets and train models with millions of parameters on reasonable time scales.

We have been working over the past year in close collaboration with Dr. Aude Oliva and her team at MIT, tackling the specific challenge of action recognition, an important first step in helping computers understand activities, which can ultimately be used to describe complex events.

Each clip is three seconds long: a relatively short period of time, but still long enough for humans to process consciously (as opposed to the time spans associated with sensory memory, which unconsciously processes events that occur in fractions of a second).

I encourage you to leverage the dataset for your own research and share your experiences to foster progress and new thinking. Visit the website to obtain the dataset, read our technical paper explaining the approach we took in designing it, and see examples of annotated videos on which our system was tested.

An Update from the MIT-IBM Watson AI Lab

One key goal at the lab is the development of AI systems that move beyond specialized tasks to tackle more complex problems and benefit from robust and continuous learning.

In addition to pairing the unique technical and scientific strengths of the two organizations, IBM is bringing MIT researchers an influx of resources, signaled by its $240 million investment over the next 10 years in AI efforts dedicated to the MIT-IBM Watson AI Lab.

“Now people can go to our website and download the dataset and our deep-learning computer models.” In addition, Oliva says, MIT and IBM researchers have published an article describing the performance of neural network models trained on the dataset, which itself was deepened by shared viewpoints.
