Parameter-Efficient Fine-Tuning

#DeepLearning #Fine-Tuning #PEFT

Method

  • Modify a small part of the parameters of the pretrained model
  • Add extra parameters
  • Add increments to the parameters of the pretrained model
    • LoRA, FacT

Modify a small part of the parameters of the pretrained model

Bias-term Fine-tuning (BitFit): fine-tune the bias terms

Only modify (retrain) a small part of the parameters of the pretrained model

  • The bias terms and the final linear layer
Pros
  1. Simple but efficient; performance is comparable to full-model fine-tuning
  2. Downstream tasks can be learned sequentially, which helps deploy models efficiently
  3. For each downstream task, only a very small number of parameters needs to be stored
  • For a BERT backbone, only 0.1% of the parameters are modified (retrained); see the sketch after this list
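
A minimal PyTorch sketch of the BitFit idea. The checkpoint name and the head attribute name `classifier` are illustrative assumptions (they match Hugging Face's BERT sequence-classification model); the point is simply to freeze everything except bias terms and the final linear layer.

```python
from transformers import AutoModelForSequenceClassification

# Illustrative checkpoint; in this model class the task head is named "classifier".
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# BitFit-style freezing: train only bias terms and the final linear layer.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.2f}%)")
```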

Add extra parameters

Adapter Tuning

Idea

Add an Adapter module to each layer of the pretrained model
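
A minimal sketch of a bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection). The bottleneck size is an illustrative choice; in the original Adapter Tuning design, one such module is inserted after the attention and feed-forward sublayers of each Transformer block while the backbone stays frozen.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a residual path."""

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()
        # Near-identity initialization so the pretrained behavior is preserved at the start.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```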

Prefix Tuning

Idea

Use prefixes, which are learnable and specific to the given task, to mimic prompts
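
A minimal sketch of the prefix idea, not the paper's exact implementation: one learnable (key, value) prefix per layer, expanded to the batch and handed to a frozen backbone as past key/values. The `(batch, heads, prefix_len, head_dim)` layout and the omission of the reparameterization MLP used during training are simplifying assumptions.

```python
import torch
import torch.nn as nn

class PrefixEncoder(nn.Module):
    """Task-specific learnable prefix: one (key, value) pair of length prefix_len per layer."""

    def __init__(self, num_layers: int, num_heads: int, head_dim: int, prefix_len: int = 16):
        super().__init__()
        # Shape: (layers, 2 for key/value, prefix_len, heads, head_dim).
        self.prefix = nn.Parameter(
            torch.randn(num_layers, 2, prefix_len, num_heads, head_dim) * 0.02
        )

    def forward(self, batch_size: int):
        past_key_values = []
        for layer in self.prefix:              # (2, prefix_len, heads, head_dim)
            k, v = layer[0], layer[1]          # (prefix_len, heads, head_dim)
            k = k.permute(1, 0, 2).unsqueeze(0).expand(batch_size, -1, -1, -1)
            v = v.permute(1, 0, 2).unsqueeze(0).expand(batch_size, -1, -1, -1)
            past_key_values.append((k, v))     # each: (batch, heads, prefix_len, head_dim)
        return past_key_values
```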

Prompt Tuning

Idea
  • Construct a prompt for each task and concatenate it with the input data before feeding it to the large model
  • Only add prompt tokens at the input layer, with no reparameterization MLP: a simplified version of Prefix Tuning (see the sketch below)
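
A minimal sketch of a soft prompt: a few learnable embedding vectors prepended to the input embeddings, with everything else frozen. Initializing from random vocabulary embeddings is a common heuristic, not a requirement; the prompt length is illustrative.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to the input embeddings (input layer only)."""

    def __init__(self, embed_layer: nn.Embedding, prompt_len: int = 20):
        super().__init__()
        # Initialize the prompt from randomly chosen vocabulary embeddings.
        init_ids = torch.randint(0, embed_layer.num_embeddings, (prompt_len,))
        self.prompt = nn.Parameter(embed_layer(init_ids).detach().clone())

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model) -> (batch, prompt_len + seq_len, d_model)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```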

Add increments to the parameters of the pretrained model

LoRA

Cons of previous tuning methods
  • Adapter Tuning adds an Adapter layer inside each Transformer block, which makes the model deeper and increases inference latency
  • Prompt-based methods, such as Prefix Tuning and Prompt Tuning, are hard to train. In addition, the prompt tokens occupy part of the input sequence, reducing the number of tokens available for the actual input (LoRA avoids both issues; see the sketch below)
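
A minimal sketch of a LoRA-style linear layer: the pretrained weight is frozen and a trainable low-rank increment B·A, scaled by alpha/r, is added in parallel. No extra depth is introduced, and the increment can be merged into the frozen weight for inference. The rank and scaling values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank increment B @ A, scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: identity at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scaling * x A^T B^T; B @ A can later be merged into the frozen weight.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```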

FacT