Every time I write a blog, we start a journey. A journey where we get to know each other and understand ourselves; we make a start and finish a journey. And whenever I write a #CodeMe, we start a journey towards an algorithm: we explore it, learn it, implement it, and master it. Today is the third day of CodeMe.
Wait … wait … aaaahh
Wait!!! What? We are in playground mode. Don’t worry, once we complete this journey we can break out of it and come back to our world. Thank god I did my homework, or the teacher would scold me. Okay, okay, not talking about my life. Let’s start our journey; as we proceed, we will learn about CNNs.
In this blog, we will get introduced to the fundamental ideas of computer vision. Our goal is to learn how a neural network can “understand” a natural image well enough to solve the same kinds of problems the human visual system can solve.
The neural networks that are best at this task are called convolutional neural networks (sometimes we say convnet or CNN instead). Convolution is the mathematical operation that gives the layers of a convnet their unique structure. In future lessons, you’ll learn why this structure is so effective at solving computer vision problems.
We will apply these ideas to the problem of image classification: given a picture, can we train a computer to tell us what it’s a picture of? You may have seen apps that can identify a species of plant from a photograph. That’s an image classifier! In this blog, you’ll learn how to build image classifiers just as powerful as those used in professional applications.
While our focus will be on image classification, what you’ll learn in this blog is relevant to every kind of computer vision problem. By the end, you’ll be ready to move on to more advanced applications like generative adversarial networks and image segmentation.
A convnet used for image classification consists of two parts: a convolutional base and a dense head.
The base is used to extract features from the image. The head uses those features to determine the class of the image. It is formed primarily of dense layers, but might include other layers like dropout.
What do we mean by visual feature? A feature could be a line, a colour, a texture, a shape, a pattern — or some complicated combination.
The whole process goes something like this:
The features actually extracted look a bit different, but it gives you the idea.
The goal of the network during training is to learn two things:
- which features to extract from an image (base),
- which class goes with what features (head).
These days, convnets are rarely trained from scratch. More often, we reuse the base of a pre-trained model and attach an untrained head to it. In other words, we reuse the part of a network that has already learned to (1) extract features, and attach to it some fresh layers that learn to (2) classify.
Because the head usually consists of only a few dense layers, very accurate classifiers can be created from relatively little data.
Reusing a pre-trained model is a technique known as transfer learning. It is so effective that almost every image classifier these days makes use of it.
In this blog, I will use a VGG16 model, also known as OxfordNet.
A sample pseudo-code:-

model = pipeline(pretrained_base, untrained_head)

A very simple model in pseudo-code format.
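To make the frozen-base / trainable-head idea concrete, here is a toy NumPy sketch. Everything in it is a hypothetical stand-in: the “pre-trained base” is just a frozen random projection with a ReLU, the head is a single logistic layer trained by gradient descent, and the data is synthetic. No real pre-trained weights are involved; the point is only that training touches the head’s weights while the base stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained base: a frozen random projection
# from flattened "pixels" to an 8-number feature vector. In a real convnet
# these would be learned convolutional layers.
base_weights = rng.normal(size=(64, 8)) * 0.1

def base(x):
    return np.maximum(x @ base_weights, 0.0)  # frozen features + ReLU

# Untrained head: one dense (logistic) layer, trained from scratch.
head_w = np.zeros(8)
head_b = 0.0

# Tiny synthetic binary dataset: 200 "images" of 64 pixels each.
x = rng.normal(size=(200, 64))
y = (x[:, 0] > 0).astype(float)

feats = base(x)  # extract features once; the base is never updated

def log_loss():
    p = 1.0 / (1.0 + np.exp(-(feats @ head_w + head_b)))
    return -np.mean(y * np.log(p + 1e-9) + (1.0 - y) * np.log(1.0 - p + 1e-9))

loss_before = log_loss()
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(feats @ head_w + head_b)))
    grad = p - y  # gradient of the log loss w.r.t. the logits
    head_w -= lr * (feats.T @ grad) / len(y)
    head_b -= lr * grad.mean()

loss_after = log_loss()
print(loss_before, loss_after)
```

Only head_w and head_b ever change during the loop; that is the whole idea of attaching a fresh head to a pre-trained base.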
Now let’s look at VGG16 and explore it. Let’s go!
The above code is of a VGG16 model. Here you can see a class named Conv2D; it is our convolution layer, and MaxPool is our pooling layer.
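One way to see what those Conv2D and MaxPool layers do to an image is to track its shape through VGG16’s convolutional base. Per the published VGG16 configuration, the base stacks five blocks of 3×3 “same”-padded convolutions (2, 2, 3, 3, 3 of them, with 64, 128, 256, 512, 512 filters) each followed by a 2×2 max pool. A few lines of plain Python can follow a 224×224×3 input down to the final feature map:

```python
# Walk through VGG16's convolutional base, tracking the shape of the image.
# Every conv is 3x3 with "same" padding (spatial size unchanged);
# every MaxPool is 2x2 with stride 2 (spatial size halved).
h, w, channels = 224, 224, 3  # VGG16's standard input size

blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (convs, filters)

for n_convs, filters in blocks:
    for _ in range(n_convs):
        channels = filters        # Conv2D: changes depth, keeps h x w
    h, w = h // 2, w // 2         # MaxPool: halves h and w

print(h, w, channels)  # -> 7 7 512
```

Five halvings take 224 down to 7, which is why VGG16’s base ends in a 7×7×512 feature map before the dense head.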
Wait, what are all these terms?
Okay okay, I know. Let’s jot it all down in a nutshell.
The feature extraction performed by the base consists of three basic operations:
- Filter an image for a particular feature (convolution)
- Detect that feature within the filtered image (ReLU)
- Condense the image to enhance the features (maximum pooling)
The next figure illustrates this process. You can see how these three operations are able to isolate some particular characteristic of the original image (in this case, horizontal lines).
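The same filter-detect-condense pipeline can be sketched with plain NumPy (a minimal toy, not a library implementation): a tiny image containing a horizontal line, a hand-made 3×3 horizontal-line kernel, ReLU, then 2×2 max pooling.

```python
import numpy as np

# A tiny 6x6 "image" containing one horizontal line of bright pixels.
image = np.zeros((6, 6))
image[2, :] = 1.0

# 1. Filter: convolve with a 3x3 horizontal-line kernel (valid padding).
kernel = np.array([[-1., -1., -1.],
                   [ 2.,  2.,  2.],
                   [-1., -1., -1.]])

filtered = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        filtered[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

# 2. Detect: ReLU keeps the positive responses and zeroes the rest.
detected = np.maximum(filtered, 0.0)

# 3. Condense: 2x2 max pooling halves the map, keeping strong responses.
pooled = detected.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(pooled)  # the horizontal line survives as strong activations
```

The kernel responds strongly only where a bright row sits between dark rows, ReLU throws away the negative responses, and pooling shrinks the map while the line’s activations survive, which is exactly the isolation the figure shows.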
Hope things are clearer now.
So these are the basics of CNNs.
You can find the link to my custom model on kaggle.com.
A little click-bait history lesson on CNNs.
Neocognitron, the origin of the CNN architecture
The “neocognitron” was introduced by Kunihiko Fukushima in 1980. It was inspired by the above-mentioned work of Hubel and Wiesel. The neocognitron introduced the two basic types of layers in CNNs: convolutional layers and downsampling layers. A convolutional layer contains units whose receptive fields cover a patch of the previous layer. The weight vector (the set of adaptive parameters) of such a unit is often called a filter. Units can share filters. Downsampling layers contain units whose receptive fields cover patches of previous convolutional layers. Such a unit typically computes the average of the activations of the units in its patch. This downsampling helps to correctly classify objects in visual scenes even when the objects are shifted.
In a variant of the neocognitron called the cresceptron, instead of using Fukushima’s spatial averaging, J. Weng et al. introduced a method called max-pooling where a downsampling unit computes the maximum of the activations of the units in its patch. Max-pooling is often used in modern CNNs.
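The difference between the neocognitron’s spatial averaging and the cresceptron’s max-pooling is easy to see on a single 2×2 patch of activations:

```python
import numpy as np

# One 2x2 patch of activations: a single strong feature response
# surrounded by weak ones.
patch = np.array([[0.0, 9.0],
                  [1.0, 2.0]])

avg_pooled = patch.mean()  # neocognitron-style spatial averaging -> 3.0
max_pooled = patch.max()   # cresceptron-style max-pooling        -> 9.0

print(avg_pooled, max_pooled)
```

Averaging dilutes the strong response, while max-pooling preserves it, which is one reason max-pooling became the default in modern CNNs.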
Several supervised and unsupervised learning algorithms have been proposed over the decades to train the weights of a neocognitron. Today, however, the CNN architecture is usually trained through backpropagation.
The neocognitron was the first CNN that required units located at multiple network positions to have shared weights. Neocognitrons were adapted in 1988 to analyze time-varying signals.
A great problem starts he….r…e
What? Finishing our journey is not the only part of the game; we also have to implement what we learned.
We are given a task: using the CNN architecture, we need to create our own custom convnet to solve a problem. We need to classify 104+ types of flowers from a given dataset.
Well, don’t worry I have my model ready.
After some practice you can also make models of your own.
Well, we are back. Home feels sweet, doesn’t it?
Before we finish
It was a fun-filled session. We learned about convnets, solved a problem with a pre-trained model, explored VGG16, and took a click-bait detour into the history of convnets.
Thank you for reading.