 # This is what makes deep learning so powerful


Deep learning is expected to be a \$93 billion market by 2028, according to Emergence Research.

But what is deep learning and how does it work?

Deep learning is a subset of machine learning that uses neural networks to learn and make predictions. It has performed remarkably well across a variety of tasks, whether text, time series or computer vision. Its success comes mainly from the availability of big data and compute power. However, there is more to it than that, and this is what makes deep learning better than classical machine learning algorithms.

## Deep Learning: Neural Networks and Functions

A neural network is an interconnected network of neurons in which each neuron is a limited function approximator. In this way, neural networks are considered universal function approximators. If you remember from high school math, a function is a mapping from an input space to an output space. A simple sin(x) function maps from angular space (-180° to 180°, or 0° to 360°) to real number space (-1 to 1).

Let us see why neural networks are considered universal function approximators. Each neuron learns a limited function: f(.) = g(W * X), where W is the weight vector to be learned, X is the input vector and g(.) is a non-linear transformation. W * X can be visualized as a line (being learned) in a high-dimensional space (a hyperplane), and g(.) can be any non-linear differentiable function such as sigmoid, tanh or ReLU (all commonly used in the deep learning community). Learning in a neural network is nothing more than finding the best weight vector W. For example, in y = mx + c, we have 2 weights: m and c. Based on the distribution of points in 2D space, we find the optimal values of m and c that satisfy some criterion: the difference between the predicted y and the actual points, taken over all data points, is minimal.
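The two ideas above can be sketched in a few lines of plain Python. This is only an illustration: the function names `neuron` and `fit_line`, the weights and the data points are made up for the example, and the line is fitted with the closed-form least-squares solution, which is one possible choice for the "minimal difference" criterion.

```python
import math

def sigmoid(z):
    """One possible non-linear transformation g(.)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, g=sigmoid):
    """Compute f(X) = g(W * X), where W * X is a dot product."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return g(z)

def fit_line(xs, ys):
    """Closed-form least squares for the two weights of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical weights and inputs; here W * X = 0.5 - 2.0 + 1.0 = -0.5
W = [0.5, -1.0, 2.0]
X = [1.0, 2.0, 0.5]
out = neuron(W, X)

# Points that lie exactly on y = 2x + 1, so the fit recovers m = 2, c = 1
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

A real network does not solve for its weights in closed form; it searches for them iteratively, but the goal of minimizing the gap between prediction and data is the same.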

## The layered effect

Now that each neuron is a non-linear function, we stack several such neurons into a “layer” where each neuron receives the same set of inputs but learns different weights W. Therefore, each layer has a set of learned functions: [f1, f2, …, fn], which are called the hidden layer values. These values are combined again in the next layer: h(f1, f2, …, fn), and so on. In this way, each layer is composed of functions from the previous layer (something like h(f(g(x)))).

Deep learning is a neural network with many hidden layers (usually identified as more than 2 hidden layers). Effectively, though, deep learning is a complex composition of functions from layer to layer, which finds the function that defines the mapping from input to output. For example, if the input is an image of a lion and the output is the classification that the image belongs to the class of lions, then deep learning is learning a function that maps image vectors to classes. Similarly, if the input is a sequence of words and the output is whether the input sentence has a positive, neutral or negative sentiment, then deep learning is learning a mapping from input text to the output classes: neutral, positive or negative.
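As a rough sketch of this stacking, the snippet below builds a tiny two-layer network by hand. The weights, the layer sizes and the helper names `layer` and `forward` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def relu(z):
    """ReLU non-linearity: max(0, z)."""
    return max(0.0, z)

def identity(z):
    """No non-linearity, used here for the output neuron."""
    return z

def layer(weight_rows, inputs, g):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [g(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

def forward(layers, inputs):
    """Feed the values of each layer into the next one: h(f(x))."""
    values = inputs
    for weight_rows, g in layers:
        values = layer(weight_rows, values, g)
    return values

# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron
network = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], relu),
    ([[1.0, 1.0, 1.0]], identity),
]

# Hidden values are [1.0, 1.5, 0.0]; the output neuron sums them
y = forward(network, [2.0, 1.0])
```

Adding more entries to `network` deepens the composition without changing `forward`.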

## Deep learning as interpolation

From a biological perspective, humans process images of the world by interpreting them hierarchically, from low-level features such as edges and contours to high-level features such as objects and scenes. Function composition in neural networks corresponds to this, with each function composition learning more complex features of the image. The most common neural network architecture used for images is the convolutional neural network (CNN), which learns these features hierarchically; a fully connected neural network then classifies the image features into different classes.

Using high school math again: given a set of data points in 2D, we try to fit a curve through interpolation that somewhat represents the function defining those data points. The more complex the function we fit (in interpolation, for example, determined by the polynomial degree), the more closely it fits the data; however, the less it generalizes to new data points. This is where deep learning faces challenges, and it is what is commonly referred to as the overfitting problem: fitting the data as closely as possible, but compromising on generalization. Almost all architectures in deep learning have had to handle this important factor in order to learn a general function that performs equally well on unseen data.

A deep learning pioneer, Yann LeCun (creator of the convolutional neural network and winner of the ACM Turing Award), posted on his Twitter handle (based on a paper): “Deep learning is not as impressive as you think because it is mere interpolation resulting from glorified curve fitting. But in high dimensions, there is no such thing as interpolation. In high dimensions, everything is extrapolation.” Thus, as part of function learning, deep learning does nothing but interpolation or, in some cases, extrapolation. That’s all!
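A small, self-contained sketch can make the trade-off concrete. It assumes five made-up training points that follow y = 2x + 1 plus a fixed alternating noise, and compares a straight-line fit with a degree-4 polynomial that passes through every training point exactly (computed here with Lagrange interpolation, which is just one convenient way to get such a polynomial). All names and numbers are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c (a low-complexity model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def lagrange(xs, ys, x):
    """Evaluate the polynomial of degree len(xs) - 1 through all points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def true_fn(x):
    """The underlying function the data was (hypothetically) drawn from."""
    return 2.0 * x + 1.0

train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5]  # fixed, made-up measurement noise
train_y = [true_fn(x) + e for x, e in zip(train_x, noise)]

m, c = fit_line(train_x, train_y)

# The high-degree polynomial reproduces the training data exactly...
poly_train_error = max(
    abs(lagrange(train_x, train_y, x) - y) for x, y in zip(train_x, train_y)
)

# ...but is further from the truth than the simple line at an unseen point
new_x = 3.5
line_error = abs((m * new_x + c) - true_fn(new_x))
poly_error = abs(lagrange(train_x, train_y, new_x) - true_fn(new_x))
```

The polynomial has memorized the noise, so its zero training error says little about how it behaves between the training points.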

## The learning aspect

So, how do we learn this complex function? Well, it depends entirely on the problem at hand, which in turn determines the neural network architecture. If we are interested in image classification, we use a CNN. If we are interested in time-dependent predictions or text, we use RNNs or transformers, and if we have a dynamic environment (such as car driving), we use reinforcement learning. Apart from this, learning involves handling various challenges:

• Ensuring that the model learns a general function and does not just fit the training data; this is handled using regularization.
• Depending on the problem at hand, a loss function is chosen; loosely speaking, a loss function is an error function between what we want (the true value) and what we currently have (the current prediction).
• Gradient descent is the algorithm used for converging to an optimal function; deciding the learning rate is challenging because when we are far from the optimum, we want to move quickly towards it, and when we are close to the optimum, we want to move more slowly to make sure we converge to the optimum and the global minimum.
• A large number of hidden layers brings the vanishing gradient problem, which needs to be handled; architectural changes such as skip connections and an appropriate non-linear activation function help solve it.
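Several of the points above (a loss function, gradient descent, a learning rate and regularization) can be shown together on the earlier y = mx + c example. The sketch below is a minimal illustration, assuming mean squared error as the loss, a constant learning rate and an optional L2 penalty on m; the data, the hyperparameter values and the function names are hypothetical.

```python
def mse_loss(m, c, xs, ys):
    """Mean squared error between predictions m*x + c and true values."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000, l2=0.0):
    """Minimize MSE (+ l2 * m**2) over m and c with plain gradient descent."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        # Partial derivatives of the (regularized) loss with respect to m and c
        grad_m = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n + 2.0 * l2 * m
        grad_c = 2.0 * sum(errors) / n
        # Step against the gradient, scaled by the learning rate
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c

# Made-up data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

m, c = gradient_descent(xs, ys)
final_loss = mse_loss(m, c, xs, ys)

# With an L2 penalty the slope is pulled towards zero
m_reg, c_reg = gradient_descent(xs, ys, l2=1.0)
```

A learning rate that is too large makes the updates diverge on this data, which is why practical training often uses schedules or adaptive optimizers rather than a single fixed value.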

## Computational challenges

Now that we know that deep learning is simply learning a complex function, it brings other computational challenges:

• To learn a complex function, we need a large amount of data
• To process big data, we need a fast computation environment
• We need infrastructure that supports such an environment

Parallel processing on CPUs is not enough to compute millions or billions of weights (also called the parameters of DL). Learning the weights of a neural network requires vector (or tensor) multiplications. That is where GPUs come in handy, as they can perform parallel vector multiplications very quickly. Depending on the deep learning architecture, the data size and the task at hand, we sometimes need 1 GPU and sometimes many of them; this is a decision the data scientist needs to make based on the known literature or by measuring performance on 1 GPU.
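To get a feel for how quickly the number of weights grows, the sketch below counts the parameters of a fully connected network from its layer sizes. The sizes are hypothetical, and the count assumes one weight per input-output pair plus one bias per neuron.

```python
def count_parameters(layer_sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each pair of layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Hypothetical network: 784 inputs, two hidden layers of 512, 10 outputs
sizes = [784, 512, 512, 10]
total = count_parameters(sizes)  # 401920 + 262656 + 5130
```

Each layer's output is a set of independent dot products over the same input vector, which is the kind of work that parallel hardware handles well.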

With a suitable neural network architecture (number of layers, number of neurons, non-linear function, etc.) and sufficiently large data, a deep learning network can learn any mapping from one vector space to another. This is what makes deep learning such a powerful tool for any machine learning task.

Abhishek Gupta is a leading data scientist at Talentica Software.