
Understanding Project Stanford Alpaca

Thematic Artificial Intelligence Data Modelling



How Projects work

The project is built on the following components:

  • Data Release

  • Data Generation Process

  • Fine-tuning

  • Key Principles

The Basics

This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500). In a preliminary study, we also find our 52K generated data to be much more diverse than the data released by self-instruct. We plot the below figure (in the style of Figure 2 in the self-instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.
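
The diversity analysis above is based on the root verb and direct object of each instruction. Below is a minimal sketch (not the authors' analysis code) of how those two categories could be extracted with spaCy; it assumes the released alpaca_data.json with its "instruction", "input", and "output" fields, and that the en_core_web_sm model is installed.

```python
# Sketch: extract (root verb, direct object) pairs from the 52K instructions,
# the same categories plotted in the diversity figure. Assumes:
#   pip install spacy && python -m spacy download en_core_web_sm
import json
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

def verb_object(instruction: str):
    """Return the (root verb, direct object) lemmas for one instruction, if any."""
    doc = nlp(instruction)
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            dobj = next((c.lemma_ for c in token.children if c.dep_ == "dobj"), None)
            return token.lemma_, dobj
    return None, None

# Each record in alpaca_data.json has "instruction", "input", and "output" fields.
with open("alpaca_data.json") as f:
    data = json.load(f)

pairs = Counter(verb_object(ex["instruction"]) for ex in data)
print(pairs.most_common(10))  # most frequent (verb, object) pairs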

Fine-tuning Parameters

Hyperparameter | Value
-------------- | -----
Batch size     | 128
Learning rate  | 2e-5
Epochs         | 3
Max length     | 512
Weight decay   | 0
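
As a rough sketch, the table maps onto Hugging Face TrainingArguments as shown below. This is an assumption based on the command in the next section, not the project's train.py: the batch size of 128 is reached through per-device batch size × gradient accumulation × number of GPUs, and the max length of 512 is applied to the tokenizer rather than to TrainingArguments.

```python
# Hypothetical mapping of the hyperparameter table to TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./alpaca-out",       # hypothetical output path
    num_train_epochs=3,              # Epochs
    per_device_train_batch_size=4,   # 4 GPUs x 4 per device x 8 accum steps = 128
    gradient_accumulation_steps=8,
    learning_rate=2e-5,              # Learning rate
    weight_decay=0.0,                # Weight decay
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    bf16=True,
)
```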

Organizing your OwnGPT

Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LLaMADecoderLayer' \
    --tf32 True

Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep the global batch size at 128. The global batch size has not been tested for optimality.
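
As a quick sanity check (an assumption based on the command above, not official guidance), the global batch size is num_gpus × per_device_train_batch_size × gradient_accumulation_steps, so the accumulation steps can be recomputed when the GPU count changes:

```python
# Keep the global batch size at 128 when changing the number of GPUs.
def grad_accum_steps(global_batch: int, num_gpus: int, per_device_batch: int) -> int:
    assert global_batch % (num_gpus * per_device_batch) == 0
    return global_batch // (num_gpus * per_device_batch)

print(grad_accum_steps(128, 4, 4))  # 8, as in the command above
print(grad_accum_steps(128, 8, 4))  # 4, when moving to 8 GPUs
```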
