What is Transfer Learning?

    Rajeshwar Reddy

    In machine learning, transfer learning is the reuse of a pre-trained model on a new problem: the model applies knowledge learned on an earlier task to improve its predictions on a new one. The general idea is to use knowledge learned from tasks for which a lot of labeled data is available in settings where only a little labeled data is available. Creating labeled data is expensive, so making the most of existing datasets is key.

    In a traditional machine learning model, the primary goal is to generalize to unseen data based on patterns learned from the training data. With transfer learning, you attempt to kickstart this generalization process: instead of starting the learning process from an (often randomly initialized) blank sheet, you start from patterns that have been learned to solve a different task.



    Understanding of transfer learning

    Transfer learning is essential in any kind of learning. Humans are not taught every single task or problem in order to be successful at it. Everyone runs into situations they have never encountered before, and we still manage to solve problems in an ad-hoc manner. The ability to learn from a large number of experiences and to export ‘knowledge’ into new environments is exactly what transfer learning is all about.

    Transfer learning is key to bringing deep learning techniques to the large number of small-data settings. Deep learning is pretty much everywhere in research, but many real-life scenarios do not have millions of labeled data points to train a model, and deep learning techniques require massive amounts of data to tune the millions of parameters in a neural network.

    Especially in the case of supervised learning, this means that you need a lot of (highly expensive) labeled data. Labeling images may sound trivial, but in other fields, for example Natural Language Processing (NLP), expert knowledge is required to create a large labeled dataset. Transfer learning is one way of reducing the required dataset size enough for neural networks to be a viable option. Another option is to move towards more probabilistically inspired models, which are typically better suited to limited datasets.

    Transfer learning has significant advantages as well as drawbacks, and understanding the drawbacks is vital for successful machine learning applications. Transfer of knowledge is only possible when it is ‘appropriate’. Exactly defining what appropriate means in this context is not easy, and experimentation is typically required. You should not trust a toddler who drives around in a toy car to be able to drive a Ferrari. The same principle holds for transfer learning: although it is hard to quantify, there is an upper limit to what transfer learning can achieve. It is not a solution that fits all problem cases.


    The requirements of transfer learning

    Transfer learning, as the name states, requires the ability to transfer knowledge from one domain to another. It can be interpreted at a high level: for example, NLP model architectures can be reused for sequence prediction problems, since a lot of NLP problems can inherently be reduced to sequence prediction. It can also be interpreted at a low level, where you actually reuse parameters from one model in a different model.
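
    The low-level interpretation, reusing parameters from one model in another, can be illustrated with a minimal NumPy sketch. The helper names (init_params, transfer_params), the layer names, and the layer sizes are hypothetical, and the randomly initialised "source" weights only stand in for a model that would have been trained on the original task.

```python
import numpy as np

def init_params(layer_sizes, seed):
    """Randomly initialise one weight matrix per layer, keyed by layer name."""
    rng = np.random.default_rng(seed)
    return {
        f"layer{i}": rng.normal(size=(n_in, n_out))
        for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:]))
    }

def transfer_params(source, target, layers_to_copy):
    """Return a copy of `target` whose named layers hold the source weights."""
    result = {name: w.copy() for name, w in target.items()}
    for name in layers_to_copy:
        if source[name].shape != result[name].shape:
            raise ValueError(f"shape mismatch for {name}")
        result[name] = source[name].copy()
    return result

# Source model: 8 inputs -> 16 -> 16 -> 10 classes (stand-in for a trained model).
source = init_params([8, 16, 16, 10], seed=0)
# Target model: same body, but a new 3-class output layer for the new task.
target = init_params([8, 16, 16, 3], seed=1)

# Reuse the first two layers; the output layer keeps its fresh initialisation.
target = transfer_params(source, target, layers_to_copy=["layer0", "layer1"])
```

    Only layers whose shapes match can be copied this way, which is why the output layer, sized for the new task, is left untouched.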


    How does transfer learning work?

    In computer vision, neural networks typically learn to detect edges in the first layers, shapes in the middle layers, and task-specific features in the later layers. In transfer learning, the early and middle layers are reused and only the later layers are retrained. This lets the new model benefit from the labelled data of the task the network was originally trained on.


    Consider a model that was trained to identify a backpack in an image and will now be used to detect sunglasses. Because the earlier layers have already learned to recognise objects, we simply retrain the later layers so that the model learns what distinguishes sunglasses from other objects.
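
    The idea of keeping the early layers fixed and retraining only the later ones can be sketched in NumPy. This is a toy illustration under stated assumptions, not a real vision model: W_frozen is a random matrix standing in for pre-trained layers, the data and labels are synthetic, and the "later layer" is a single logistic output unit trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained early layers (assumption: in practice these
# weights would come from a model trained on the original task).
W_frozen = rng.normal(size=(4, 8))
W_frozen_before = W_frozen.copy()

def features(x):
    """Frozen feature extractor: linear layer followed by ReLU."""
    return np.maximum(x @ W_frozen, 0.0)

# Synthetic data for the new task: the label depends on the first input.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# New, trainable output layer.
w_head = np.zeros(8)
b_head = 0.0

def loss_and_grads(w, b):
    """Binary cross-entropy of the head and its gradients w.r.t. w and b."""
    h = features(X)
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grad_w = h.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    return loss, grad_w, grad_b

initial_loss, _, _ = loss_and_grads(w_head, b_head)
for _ in range(500):
    _, gw, gb = loss_and_grads(w_head, b_head)
    w_head -= 0.1 * gw  # only the head is updated
    b_head -= 0.1 * gb  # W_frozen never receives an update
final_loss, _, _ = loss_and_grads(w_head, b_head)
```

    Because no gradient is ever applied to W_frozen, the knowledge stored in the early layers is preserved while the head adapts to the new labels.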

    Why should you use transfer learning?

    Transfer learning offers a number of advantages, the most important of which are reduced training time, improved neural network performance (in most circumstances), and no need for a large amount of data.

    Training a neural network from scratch typically requires a lot of data, but access to that data isn’t always possible, and this is where transfer learning comes in handy.

    Because the model has already been pre-trained, transfer learning can produce a good machine learning model from fairly little training data. This is especially useful in natural language processing, where creating huge labelled datasets requires a lot of expert knowledge. Training time is also reduced, because training a deep neural network from scratch on a complex task can take days or even weeks.

    When should you use transfer learning?

    Transfer learning is a good fit when you don’t have enough annotated data to train your model and a pre-trained model exists that was trained on similar data and tasks. If you used TensorFlow to train the original model, you can simply restore it and retrain some layers for your task. Transfer learning only works, however, if the features learnt in the first task are general, meaning they are also useful for the new task. Furthermore, the model’s input must be the same size as it was when the model was first trained. If your input has a different size, add a step that resizes it to the required size.
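
    A resizing step can be as simple as the nearest-neighbour sketch below. The function name resize_nearest and the 224 x 224 target size are assumptions chosen for illustration; the required size depends on the model being reused, and a real pipeline would more likely call the resize routine of an image library, which also offers smoother interpolation.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an (H, W) or (H, W, C) array with nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # For every output row/column, pick the index of the source row/column.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# A small 6 x 4 RGB "image" resized to the input size the model expects.
image = np.arange(6 * 4 * 3).reshape(6, 4, 3)
resized = resize_nearest(image, 224, 224)
```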


    Consider the situation in which you wish to tackle task A but lack the data needed to train a deep neural network. One way around this is to find a related task B for which a lot of data is available.

    Train the deep neural network on task B and then use the model as a starting point for task A. Whether you need the entire model or just a few layers depends on the problem you are trying to solve.

    If both tasks have the same input, you may be able to reuse the model as is and make predictions for your new input. Alternatively, you can change and retrain the task-specific layers and the output layer.
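
    The task A / task B recipe can be made concrete with a deliberately simple sketch. It assumes two synthetic linear regression tasks whose true weights are close to each other, so it stands in for "related tasks" rather than for a real deep network; all names, data sizes, and learning rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 20

def train(X, y, w_init, lr, steps):
    """Plain gradient descent on mean squared error, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

# Task B: plenty of labelled data.
w_b_true = rng.normal(size=n_features)
X_b = rng.normal(size=(1000, n_features))
y_b = X_b @ w_b_true + 0.1 * rng.normal(size=1000)

# Task A: related (true weights close to task B's), but only 5 examples.
w_a_true = w_b_true + 0.1 * rng.normal(size=n_features)
X_a = rng.normal(size=(5, n_features))
y_a = X_a @ w_a_true
X_test = rng.normal(size=(500, n_features))
y_test = X_test @ w_a_true

# Train on task B, then compare two ways of learning task A.
w_b = train(X_b, y_b, np.zeros(n_features), lr=0.1, steps=200)
w_scratch = train(X_a, y_a, np.zeros(n_features), lr=0.01, steps=200)
w_transfer = train(X_a, y_a, w_b, lr=0.01, steps=200)

def mse(w):
    """Mean squared error on held-out task A data."""
    return float(np.mean((X_test @ w - y_test) ** 2))

print(f"from scratch: {mse(w_scratch):.3f}, with transfer: {mse(w_transfer):.3f}")
```

    With only five examples, training from zero cannot pin down twenty weights, whereas starting from the task B solution only has to correct the small difference between the tasks. The benefit relies on the two tasks really being related, which this synthetic setup guarantees by construction.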


    The second option is to use a model that has already been trained by someone else. There are a number of these models out there, so do some research beforehand. How many layers to reuse and how many to retrain depends on the task.

    Keras, for example, provides a number of pre-trained models (Keras Applications) that can be used for transfer learning, prediction, and fine-tuning. These models, along with short tutorials on how to use them, are described in the Keras documentation. Many research institutions also make trained models available.

    This form of transfer learning is the one most commonly used in deep learning.


    Another option is to use deep learning to discover the best representation of your problem, that is, to identify its most important features. This method is known as representation learning, and it can often produce significantly better results than hand-designed representations.

    In traditional machine learning, features are mostly crafted by hand by researchers and domain specialists. Deep learning, fortunately, can extract features automatically. Of course, this does not diminish the importance of feature engineering and domain knowledge; you must still choose which features to feed into your network.

    Neural networks, on the other hand, have the ability to learn which features are critical and which aren’t. Even for complicated tasks that would otherwise require a lot of human effort, a representation learning algorithm can find a good combination of features in a short amount of time.

    The learned representation can then be applied to a variety of other problems. Simply use the initial layers to find the appropriate feature representation, but avoid using the network’s output because it is too task-specific. Instead, feed data into your network and take the output of one of the intermediate layers.

    The output of this layer can then be interpreted as a representation of the raw data.

    This method is commonly used in computer vision because it can reduce the size of your dataset, which cuts computation time and makes the data more suitable for classical algorithms.
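
    Feature extraction from an intermediate layer can be sketched as follows. The weights W1, W2 and W_out are random stand-ins for a trained network, the data and labels are synthetic, and nearest-centroid classification is just one example of a classical algorithm that could consume the extracted features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained network (assumption: random weights are used here
# only to keep the sketch self-contained).
W1 = rng.normal(size=(64, 32))
W2 = rng.normal(size=(32, 8))
W_out = rng.normal(size=(8, 10))  # task-specific output layer, not used below

def intermediate_representation(x):
    """Run the input through the first two layers and stop before the output layer."""
    h1 = np.maximum(x @ W1, 0.0)
    return np.maximum(h1 @ W2, 0.0)

# Raw data with 64 values per example is reduced to 8 features per example.
raw = rng.normal(size=(100, 64))
labels = rng.integers(0, 2, size=100)
feats = intermediate_representation(raw)

# Classical algorithm on the compact features: nearest-centroid classification.
centroids = np.stack([feats[labels == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Assign each example to the class whose feature centroid is closest."""
    f = intermediate_representation(x)
    dists = np.linalg.norm(f[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

preds = predict(raw)
```

    Here each example shrinks from 64 raw values to 8 features, which is the kind of reduction that makes classical algorithms cheaper to run on the extracted representation.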