AI News, Inputting Image Data into TensorFlow for Unsupervised Deep Learning

In order to help people more rapidly leverage their own data and the wealth of unsupervised models that are being created with TensorFlow, I developed a solution that (1) translates image datasets into a file structured similarly to the MNIST datasets (github repo) and (2) loads these datasets for use in new models.

To solve the first part, I modified existing solutions that decode the MNIST binary file into a CSV file, and added the option of saving the data as images in a directory (which also worked well for testing the decoding and encoding process).
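As a rough illustration of the decoding step, here is a minimal sketch of the MNIST-style IDX image layout: a big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The function names and the toy 2x2 images are hypothetical and are not taken from the author's repository.

```python
import csv
import io
import struct

IMAGE_MAGIC = 2051  # magic number of an IDX3 image file (0x00000803)


def encode_idx_images(images):
    """Pack a list of equally sized 2-D pixel grids (values 0-255) into IDX3 bytes."""
    rows, cols = len(images[0]), len(images[0][0])
    header = struct.pack(">IIII", IMAGE_MAGIC, len(images), rows, cols)
    body = bytes(pixel for image in images for row in image for pixel in row)
    return header + body


def decode_idx_images(data):
    """Unpack IDX3 bytes into a list of flat pixel lists, one per image."""
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    size = rows * cols
    return [list(data[16 + i * size:16 + (i + 1) * size]) for i in range(count)]


def images_to_csv(flat_images, labels):
    """Write one CSV line per image: the label followed by its pixel values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for label, pixels in zip(labels, flat_images):
        writer.writerow([label] + pixels)
    return buffer.getvalue()


# Two tiny 2x2 "images" stand in for real 28x28 digits (illustrative data only).
toy_images = [[[0, 255], [128, 64]], [[1, 2], [3, 4]]]
toy_labels = [7, 3]
decoded = decode_idx_images(encode_idx_images(toy_images))
csv_text = images_to_csv(decoded, toy_labels)
print(csv_text)
```

Round-tripping a small synthetic dataset like this is also a cheap way to test the encoder and decoder against each other before running them on real files.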

25 Open Datasets for Deep Learning Every Data Scientist Must Work With

The key to getting better at deep learning (or most fields in life) is practice.

In this article, we have listed a collection of high-quality datasets that every deep learning enthusiast should work on to apply and improve their skill set. Working on these datasets will make you a better data scientist, and what you learn will be invaluable in your career.

Several of these datasets are large, so make sure you have a fast internet connection with no limit (or a very high one) on the amount of data you can download.

You can use them to hone your skills, understand how to identify and structure each problem, think of unique use cases and publish your findings for everyone to see!

MNIST is a dataset of handwritten digits and contains a training set of 60,000 examples and a test set of 10,000 examples.

It’s a good database for trying learning techniques and pattern recognition methods on real-world data while spending minimal time and effort on data preprocessing.

It has several features:
Size: ~25 GB (compressed)
Number of records: 330K images, 80 object categories, 5 captions per image, 250,000 people with key points
SOTA: Mask R-CNN

The dataset contains a training set of 9,011,219 images, a validation set of 41,260 images and a test set of 125,436 images.

Size: 500 GB (compressed)
Number of records: 9,011,219 images with more than 5k labels
SOTA: ResNet 101 image classification model (trained on V2 data): Model checkpoint, Checkpoint readme, Inference code.

Some of the interesting features of this dataset are:
Size: 25 GB (compressed)
Number of records: 265,016 images, at least 3 questions per image, 10 ground truth answers per question
SOTA: Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Size: 80 MB
Number of records: 25,000 highly polar movie reviews for training, and 25,000 for testing
SOTA: Learning Structured Text Representations

It consists of millions of user reviews, business attributes and over 200,000 pictures from multiple metropolitan areas.

Size: 2.66 GB JSON, 2.9 GB SQL and 7.5 GB photos (all compressed)
Number of records: 5,200,000 reviews, 174,000 business attributes, 200,000 pictures and 11 metropolitan areas
SOTA: Attentive Convolution

A few characteristic excerpts of many dance styles are provided in real audio format. Below are a few characteristics of the dataset:
Size: 14 GB (compressed)
Number of records: ~700 audio samples
SOTA: A Multi-Model Approach To Beat Tracking Considering Heterogeneous Music Styles

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. The core of the dataset is the feature analysis and metadata for one million songs.

If you’re looking for a starting point, check out the already prepared acoustic models trained on this data set, as well as the language models suitable for evaluation.

Size: 150 MB
Number of records: 100,000 utterances by 1,251 celebrities
SOTA: VoxCeleb: a large-scale speaker identification dataset

Hate speech in the form of racism and sexism has become a nuisance on Twitter, and it is important to segregate these sorts of tweets from the rest.

The dataset contains thousands of images of Indian actors, and your task is to identify their age. All the images are manually selected and cropped from video frames, resulting in a high degree of variability in terms of scale, pose, expression, illumination, age, resolution, occlusion, and makeup.

This dataset consists of more than 8000 sound excerpts of urban sounds from 10 classes. This practice problem is meant to introduce you to audio processing in the usual classification scenario.

If you are aware of other open datasets that you would recommend to people starting their journey in deep learning or with unstructured datasets, please feel free to suggest them, along with the reasons why they should be included.

Transfer Learning: The Art of Using Pre-trained Models in Deep Learning

Neural networks are a different breed of models compared to classical supervised machine learning algorithms.

So I am picking up on a concept touched on by Tim Urban in one of his recent articles. Tim explains that before language was invented, every generation of humans had to re-invent knowledge for themselves, and this is how knowledge grew from one generation to the next.

So, transfer learning by passing on weights is the equivalent of the language used to disseminate knowledge over generations in human evolution.

Instead of building a model from scratch to solve a similar problem, you use a model trained on another problem as a starting point.

You can spend years building a decent image recognition algorithm from scratch, or you can take the Inception model (a pre-trained model) from Google, which was built on ImageNet data, to identify what is in your pictures.

A pre-trained model may not be 100% accurate in your application, but it saves the huge effort required to re-invent the wheel.

This was an image classification problem where we were given 4591 images in the training dataset and 1200 images in the test dataset.

To simplify the above architecture, after flattening the input image [224 x 224 x 3] into [150528], I used three hidden layers with 500 neurons each.
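To get a feel for how large such a fully connected network is, the parameter count can be worked out by hand. The sketch below assumes plain dense layers with one bias per neuron and leaves out the output layer, since its size is not stated at this point; the helper name is hypothetical.

```python
def dense_parameter_count(layer_sizes):
    """Count weights plus biases for a stack of fully connected layers."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


input_size = 224 * 224 * 3                   # length of the flattened image
layer_sizes = [input_size, 500, 500, 500]    # input followed by three hidden layers

first_layer = dense_parameter_count(layer_sizes[:2])
all_hidden = dense_parameter_count(layer_sizes)

print(input_size)    # 150528
print(first_layer)   # 75264500
print(all_hidden)    # 75765500
```

Almost all of the roughly 75.8 million parameters sit in the first layer, which is one reason flattening a full-resolution image into an MLP is expensive.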

Increasing the number of hidden layers and neurons meant that a single epoch took 20 seconds to run on my Titan X GPU with 12 GB VRAM.

I used 3 convolutional blocks, with each block following the architecture below. The result obtained after the final convolutional block was flattened into a vector of size [256] and passed into a single hidden layer with 64 neurons.

Though my accuracy increased in comparison to the MLP output, it also increased the time taken to run a single epoch – 21 seconds.

The only change that I made to the existing VGG16 architecture was replacing the softmax layer with 1000 outputs by one with 16 outputs, matching the categories in our problem, and re-training the dense layer.

Also, the biggest benefit of using the VGG16 pre-trained model was the almost negligible time needed to train the dense layer, together with greater accuracy.

So, I moved forward with this approach of using a pre-trained model, and the next step was to fine-tune my VGG16 model to suit this problem.
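A quick calculation hints at why retraining only the output layer is cheap. The sketch below assumes the standard VGG16 layout, in which the last hidden dense layer has 4096 units feeding the softmax layer; the function names are hypothetical, and the exact layers the author retrained are not shown in the text.

```python
import math

FC2_UNITS = 4096  # width of the last hidden dense layer in standard VGG16 (assumption)


def head_parameters(num_classes, in_features=FC2_UNITS):
    """Weights plus biases of a dense softmax layer with num_classes outputs."""
    return in_features * num_classes + num_classes


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


original_head = head_parameters(1000)   # ImageNet head
new_head = head_parameters(16)          # head for the 16 categories

# Made-up scores: the last of the 16 classes has the highest logit.
probabilities = softmax([0.0] * 15 + [2.0])

print(original_head, new_head)
print(probabilities.index(max(probabilities)))
```

Under these assumptions the new head has 65,552 parameters instead of 4,097,000, so very little has to be learned from scratch.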

By using pre-trained models which have been previously trained on large datasets, we can directly use the weights and architecture obtained and apply the learning on our problem statement.

If the pre-trained model was trained on a very different problem, the predictions we get will be very inaccurate. For example, a model previously trained for speech recognition would work horribly if we tried to use it to identify objects.

The ImageNet data set has been widely used to build various architectures, since it is large enough (1.2M images) to create a generalized model. The problem statement is to train a model that can correctly classify the images into 1,000 separate object categories.

These 1,000 image categories represent object classes that we come across in our day-to-day lives, such as species of dogs, cats, various household objects, vehicle types etc.

These pre-trained networks demonstrate a strong ability to generalize to images outside the ImageNet dataset via transfer learning.

The diagram below should help you decide how to proceed with using the pre-trained model in your case.

In this case, all we do is modify the dense layers and the final softmax layer to output 2 categories instead of 1,000.

Since the new data set has low similarity, it is important to retrain and customize the higher layers according to the new dataset.

The small size of the data set is compensated for by the fact that the initial layers are kept pre-trained (they were trained on a large dataset previously) and the weights for those layers are frozen.

    train_img.append(temp_img)

# converting train images to array and applying mean subtraction processing
train_img = np.array(train_img)
train_img = preprocess_input(train_img)

# applying the same procedure with the test dataset
test_img = []
for i in range(len(test)):

# Extracting features from the train dataset using the VGG16 pre-trained model
features_train = model.predict(train_img)

# Extracting features from the test dataset using the VGG16 pre-trained model
features_test = model.predict(test_img)

# flattening the layers to conform to MLP input
train_x = features_train.reshape(49000, 25088)

# converting target variable to array
train_y = np.asarray(train['label'])

# performing one-hot encoding for the target variable
train_y = pd.get_dummies(train_y)
train_y = np.array(train_y)

# creating training and validation set
from sklearn.model_selection import train_test_split
X_train,
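The 25088 in the reshape above is not arbitrary. Assuming 224 x 224 inputs and a VGG16 without its dense top layers (neither is shown in the excerpt), the five 2x2 max-pooling stages shrink each side by a factor of 32, leaving a 7 x 7 x 512 feature map per image. The sketch below checks that arithmetic and mimics one-hot encoding in plain Python; the helper names and toy labels are hypothetical.

```python
def vgg16_feature_shape(height, width):
    """Shape of VGG16's last pooling output: five 2x2 max-pools, 512 channels."""
    return (height // 2 ** 5, width // 2 ** 5, 512)


def one_hot(labels):
    """Encode labels as 0/1 rows, with one column per class in sorted order."""
    classes = sorted(set(labels))
    column = {name: i for i, name in enumerate(classes)}
    rows = [[1 if column[label] == j else 0 for j in range(len(classes))]
            for label in labels]
    return classes, rows


h, w, c = vgg16_feature_shape(224, 224)
flat_length = h * w * c
print(flat_length)  # 25088

# Toy labels, not the ones from the dataset discussed in the text.
classes, encoded = one_hot(["cat", "dog", "cat", "bird"])
print(classes)   # ['bird', 'cat', 'dog']
print(encoded)   # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```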

Freeze the weights of the first few layers – Here we freeze the weights of the first 8 layers of the VGG16 network and retrain the subsequent layers. This is because the first few layers capture universal features, like curves and edges, that are also relevant to our new problem.
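Conceptually, freezing just means the optimizer skips those layers when it updates weights. The framework-free sketch below illustrates the idea with a toy list of layers and made-up gradients; in Keras the same effect is usually obtained by setting `layer.trainable = False` on the chosen layers before compiling the model. All names and numbers here are illustrative.

```python
def sgd_step(layers, gradients, learning_rate=0.1):
    """Apply one gradient step, skipping every layer marked as frozen."""
    for layer, grad in zip(layers, gradients):
        if not layer["trainable"]:
            continue  # frozen: keep the pre-trained weights untouched
        layer["weights"] = [w - learning_rate * g
                            for w, g in zip(layer["weights"], grad)]


# A toy 10-layer "network" in which every layer holds two weights.
layers = [{"name": f"layer_{i}", "weights": [1.0, 1.0], "trainable": True}
          for i in range(10)]

# Freeze the first 8 layers, as described in the text.
for layer in layers[:8]:
    layer["trainable"] = False

gradients = [[0.5, -0.5] for _ in layers]  # made-up gradients
sgd_step(layers, gradients)

frozen_unchanged = all(layer["weights"] == [1.0, 1.0] for layer in layers[:8])
print(frozen_unchanged)
print(layers[9]["weights"])
```

Only the last two layers move after the update, which is the behaviour wanted when the early layers already encode generic features.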

    return model

train_y = np.asarray(train['label'])
le = LabelEncoder()
train_y = le.fit_transform(train_y)
train_y = to_categorical(train_y)
train_y = np.array(train_y)

from sklearn.model_selection import train_test_split
X_train,

There are various architectures people have tried on different types of data sets and I strongly encourage you to go through these architectures and apply them on your own problem statements.

Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6

Monet or Picasso? In this episode, we'll train our own image classifier, using TensorFlow for Poets. Along the way, I'll introduce Deep Learning, and add context ...

A Guide to Running Tensorflow Models on Android

Let's create an Android app that uses a pre-trained Tensorflow image classifier for MNIST digits to recognize what the user draws on the screen. We'll use ...

Build a TensorFlow Image Classifier in 5 Min

In this episode we're going to train our own image classifier to detect Darth Vader images. The code for this repository is here: ...

How to Make an Image Classifier - Intro to Deep Learning #6

We're going to make our own Image Classifier for cats & dogs in 40 lines of Python! First we'll go over the history of image classification, then we'll dive into the ...

YOLO Object Detection (TensorFlow tutorial)

You Only Look Once - this object detection algorithm is currently the state of the art, outperforming R-CNN and its variants. I'll go into some different object ...

PyTorch in 5 Minutes

I'll explain PyTorch's key features and compare it to the current most popular deep learning framework in the world (Tensorflow). We'll then write out a short ...

Intro - TensorFlow Object Detection API Tutorial p.1

Hello and welcome to a miniseries and introduction to the TensorFlow Object Detection API. This API can be used to detect, with bounding boxes, objects in ...

Football data to apply machine learning to!

We can access loads of football data, for FREE, from here: Thanks to the people at This is the ..

Fast, flexible, and easy-to-use input pipelines (TensorFlow Dev Summit 2018)

Derek Murray discusses the recommended API for building input pipelines in TensorFlow. In this talk, he introduces the library, and presents some ...

Google Colaboratory now lets you use GPUs for Deep Learning

Google colab: Snippet to check: Please .