Fine-Tuning LLaMA with LoRA: A Lightweight Approach to Sentiment Analysis

Introduction

Large Language Models like LLaMA, GPT, and Mistral are exceptionally powerful, but training or fully fine-tuning their billions of parameters demands enormous compute and storage. That puts full fine-tuning out of reach for students, researchers, and anyone with modest hardware. That's why we use techniques like LoRA.

This is where LoRA (Low-Rank Adaptation) comes into play. Instead of fine-tuning the whole model, LoRA represents the change in the weights with small low-rank matrices and trains only those extra parameters. It is often confused with quantization, which converts a model from a higher-precision memory format to a lower-precision one; the two are complementary (combined they give QLoRA), but LoRA itself is about training fewer parameters, not storing them in fewer bits.

LoRA

In this blog I'll show you how I fine-tuned LLaMA for sentiment analysis using LoRA. There are various methods to fine-tune a model, e.g., LoRA, QLoRA, etc., but in this blog we'll be focusing on LoRA.

What is LoRA?

LoRA is based on the concept of low-rank decomposition. Instead of updating all the weights, it injects adapter layers into specific modules and trains only those, leaving the rest of the model's parameters untouched. Let's talk about adapter layers. These are small trainable layers that are plugged into large LLMs like LLaMA. The original weights of the LLM stay frozen while only the adapter layers are trained, adapting the model to the required task, i.e., sentiment analysis. A minimal sketch of the idea is shown below.
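To make this concrete, here is a tiny, illustrative PyTorch sketch of a LoRA-style linear layer (a simplified stand-in, not the actual PEFT implementation): the base weight is frozen, and only the two low-rank matrices A and B are trained.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: frozen base weight plus a trainable low-rank update."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                           # freeze original weights
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)   # low-rank "down" matrix
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))         # low-rank "up" matrix, starts at zero
        self.scaling = alpha / r                                         # same alpha/r scaling LoRA uses

    def forward(self, x):
        # Frozen path + trainable low-rank path
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

During fine-tuning, only lora_A and lora_B receive gradients, which is exactly what keeps the adapter so small.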

Benefits of LoRA:

  • Highly efficient: Rather than training billions of parameters, only a few million are trained (see the rough calculation after this list).
  • Storage friendly: The LoRA adapter weights take up only a few megabytes of storage, compared to the gigabytes needed for a full model checkpoint.
  • Faster training: Because only the adapter layers are updated, training runs much faster, resulting in time efficiency.
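To put a rough number on that, assume LLaMA-7B's hidden size of 4096. Fully fine-tuning a single 4096 × 4096 attention projection means updating 4096 × 4096 ≈ 16.8 million parameters, whereas LoRA with rank r = 8 trains only 4096 × 8 + 8 × 4096 = 65,536 parameters for that same matrix, about 256 times fewer per adapted layer.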

Dataset and Task

For this project I have used the IMDB Reviews Dataset, a very popular dataset that contains around 50k movie reviews, each labelled as either positive or negative. Since I don't have a very good GPU, I extracted a subset of 10k samples from the IMDB dataset.

from datasets import load_dataset
import json
import os
from tqdm import tqdm

# Load IMDb dataset from Hugging Face
dataset = load_dataset("imdb")

# Function to convert example to prompt/completion format
def format_example(example):
    sentiment = "Positive" if example["label"] == 1 else "Negative"
    prompt = f"Classify the sentiment: {example['text'].strip()[:500]} Sentiment:"
    return {"prompt": prompt, "completion": f" {sentiment}"}

# Create output folder if needed
os.makedirs("data", exist_ok=True)

# Prepare training data (you can change to "test" or use both)
train_data = dataset["train"].select(range(10000))  # use 10k for quick fine-tuning

# Convert and save
formatted_data = [format_example(ex) for ex in tqdm(train_data)]
with open("data/train.json", "w", encoding="utf-8") as f:
    json.dump(formatted_data, f, indent=2)

print("✅ Dataset saved to data/train.json")

As you can see in the code above, each input movie review is truncated to 500 characters and paired with its output, a sentiment label of 'Positive' or 'Negative'.
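After formatting, each record in data/train.json looks roughly like this (review text abbreviated here for illustration):

{
  "prompt": "Classify the sentiment: <first 500 characters of the review> Sentiment:",
  "completion": " Positive"
}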

Project Setup

pip install transformers peft datasets accelerate

These are the libraries that we’ll be using:

  • Transformers: The Hugging Face library we use to load the base model (LLaMA) and its tokenizer.
  • PEFT: Parameter-Efficient Fine-Tuning, a library developed by Hugging Face that provides modern fine-tuning techniques like LoRA, AdaLoRA, etc.
  • Datasets: The library we'll use to load the IMDB dataset.
  • Accelerate: A Hugging Face library that makes it easy to run your code across different hardware setups; it detects whether to use the CPU, a single GPU, or multiple GPUs.

LoRA Configuration

Now the real fun begins: let's have a look at the LoRA configuration, which is the core of this project.

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS,
)
  • r (rank): I've kept r = 8. This is the LoRA dimension that keeps the adapter layers lightweight. Increasing it gives the adapters more trainable parameters, which takes longer to train but may give more accurate results.
  • lora_alpha: This scales the low-rank update added to a weight matrix: W′ = W + (α/r)·(A·B), where W is the original frozen weight matrix, α = lora_alpha, A and B are the two low-rank matrices LoRA trains, and r is the rank. With r = 8 and lora_alpha = 16, the update is scaled by 16/8 = 2.
  • target_modules: The specific modules into which the adapter layers are injected. In transformer models like LLaMA, the attention block has four common projections. I have set target_modules to q_proj and v_proj because they are known to yield good adaptation with minimal extra parameters. You are free to experiment with other combinations.
    • q_proj → query projection
    • k_proj → key projection
    • v_proj → value projection
    • o_proj → output projection
  • task_type: As the name suggests, this tells PEFT which task you are fine-tuning the model for; the right choice depends on your use case. E.g., TaskType.TOKEN_CLS → token classification (e.g., NER, POS tagging); TaskType.SEQ_CLS → sequence classification (e.g., sentiment analysis, spam detection). We have set task_type to SEQ_CLS. If you're unsure which module names your model actually exposes, you can list them as shown below.
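As an optional sanity check, here is a small snippet to list the attention projection modules so you can confirm the names you pass to target_modules; it assumes the base model from the next section is already loaded as model.

# List the attention projection layers so we know valid target_modules names
for name, module in model.named_modules():
    if name.endswith(("q_proj", "k_proj", "v_proj", "o_proj")):
        print(name, type(module).__name__)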

Training Process

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset
from peft import get_peft_model

# Load dataset
dataset = load_dataset("imdb")

# Load tokenizer (LLaMA has no pad token by default, so we reuse the EOS token)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.pad_token = tokenizer.eos_token

# Load base model with a 2-class classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "huggyllama/llama-7b",
    num_labels=2,
)
model.config.pad_token_id = tokenizer.pad_token_id

# Inject LoRA adapters
model = get_peft_model(model, lora_config)

I have loaded the LLaMA-7B model with a sequence classification head. num_labels is set to 2 for binary classification, since we want the output to be either 'Positive' or 'Negative'. Note that LLaMA's tokenizer has no padding token by default, so we reuse the end-of-sequence token for padding and tell the model about it.

In the last line, the LoRA adapters, i.e., the two low-rank matrices A and B, are injected into the model so that only they are trained. You can verify just how few parameters are trainable, as shown below.
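PEFT provides a helper that reports the trainable versus total parameter counts; it's worth running as a quick sanity check (the exact numbers depend on your configuration):

# Print trainable vs. total parameter counts for the PEFT-wrapped model
model.print_trainable_parameters()
# Example output (illustrative): trainable params: 4,194,304 || all params: ... || trainable%: ...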

Training Arguments

# Tokenize the raw reviews so the Trainer receives input_ids and attention_mask
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(5000)),
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(1000)),
)

trainer.train()
  • output_dir: Where the training checkpoints are saved.
  • evaluation_strategy="epoch": Evaluation runs once per epoch, i.e., after each epoch completes.
  • save_strategy="epoch": The model is saved after every epoch.
  • learning_rate=2e-4: The learning rate used to update the adapter weights.
  • per_device_train_batch_size=8: The training batch size per device.
  • per_device_eval_batch_size=8: The evaluation batch size per device.
  • num_train_epochs=3: Train for 3 epochs.
  • weight_decay=0.01: For regularization.
  • logging_dir="./logs": Where the training logs are stored.

trainer.train() is the call that starts the training process. We take a random subset of 5,000 samples from the training split to keep things lightweight and 1,000 samples from the test split for evaluation. By default the Trainer only reports the evaluation loss; if you also want accuracy, you can pass a compute_metrics function, as sketched below.
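Here is a minimal, optional compute_metrics function you could hand to the Trainer to report accuracy at each evaluation (this is my addition, not part of the original script):

import numpy as np

# Optional: compute accuracy from logits and labels at each evaluation
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

# Pass it when building the Trainer: Trainer(..., compute_metrics=compute_metrics)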

In summary, this script loads the IMDb dataset, loads LLaMA with a classification head added, injects the LoRA adapters for efficient fine-tuning, sets the training hyperparameters (learning rate, batch size, logging), fine-tunes on an IMDb subset, and saves checkpoints of the weights. Once training finishes, you can also save just the lightweight adapter weights, as shown below.
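To keep only the adapter (a few megabytes) rather than full checkpoints, you can save the PEFT model after training. The path models/adapter simply matches the one in the results below; exact file names may vary with your PEFT version.

# Save only the LoRA adapter weights and their config
model.save_pretrained("models/adapter")
# Writes adapter_config.json and the adapter weights into models/adapter/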

Results

Once the training process is done, your output will look something like this:

***** Running training *****
  Num examples = 10000
  Num Epochs = 3
  Instantaneous batch size per device = 2
  Total train batch size (w. accumulation) = 8
  Gradient Accumulation steps = 4
  Total optimization steps = 3750
  Number of trainable parameters = 4,194,304  (LoRA adapters only)

Epoch 1/3: 100%|█████████████████████| 1250/1250 [00:58<00:00, 21.45it/s, loss=0.43]
Epoch 2/3: 100%|█████████████████████| 1250/1250 [00:55<00:00, 22.69it/s, loss=0.28]
Epoch 3/3: 100%|█████████████████████| 1250/1250 [00:54<00:00, 23.12it/s, loss=0.19]

Training completed. Saving model checkpoint to models/adapter
Configuration saved in models/adapter/adapter_config.json
Model weights saved in models/adapter/adapter_model.bin

You can see that the loss decreases with each epoch, which is a good sign, with no obvious overfitting or underfitting so far. The best part is that only 4,194,304 LoRA adapter parameters were trained to fine-tune the model; instead of billions, only a few million parameters were updated, resulting in much higher efficiency. With the adapter saved, you can load it back on top of the base model for inference, as sketched below.
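Here is a minimal inference sketch, assuming the adapter was saved to models/adapter as above: it reloads the frozen base model, attaches the LoRA adapter with PEFT, and classifies a new review (the review text is just an example).

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

# Reload the base model and tokenizer
base = AutoModelForSequenceClassification.from_pretrained("huggyllama/llama-7b", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.pad_token = tokenizer.eos_token
base.config.pad_token_id = tokenizer.pad_token_id

# Attach the trained LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base, "models/adapter")
model.eval()

review = "The plot was predictable, but the acting made it worth watching."
inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
label = "Positive" if logits.argmax(dim=-1).item() == 1 else "Negative"
print(label)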

Conclusion

In this blog, I discussed how I fine-tuned LLaMA for sentiment analysis using the LoRA technique. Here are some of the ways LoRA really helped me:

  • The adapter layers were lightweight yet very effective.
  • Even without a good GPU, it allowed me to train on limited hardware.
  • The results were genuinely competitive despite the small number of parameters trained.

This was rewarding to tinker with because it shows that LLM fine-tuning is no longer the province of big technology labs.

What’s Next

Apart from LoRA, there are many other fine-tuning techniques that I'm looking forward to exploring. I'll also be deploying the model to solve real-world problems. Instruction tuning with LoRA is a very interesting topic I'm keen to explore, and you may soon see a blog on it.

📬 Want to connect or collaborate? Head over to the Contact page or find me on GitHub or LinkedIn
