TinyLlama 1.1B LLM Fine-tuning
This project focuses on enhancing the conversational abilities of TinyLlama, a 1.1-billion-parameter open-source language model (LLM). Through supervised fine-tuning (SFT), the project aims to improve TinyLlama's ability to answer questions and engage in conversations effectively. Training uses the OpenOrca question-answering dataset, whose examples pair questions in GPT-4 instruction formats with GPT-4-generated responses, and the Hugging Face TRL (Transformer Reinforcement Learning) library guides the fine-tuning process.
Completion Date: Feb 2024 | Tools: Torch, HuggingFace, TinyLlama LLM, Accelerate, TRL, Link
Goal:
- Improve TinyLlama's (1.1B version) conversational capabilities, with a specific focus on question answering.
- Train TinyLlama to handle questions posed in the GPT-4 instruction format used by OpenOrca (a sketch of this format follows the list).
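The snippet below is a hypothetical illustration of that format: an OpenOrca record carries a system prompt, a user question, and a GPT-4-written response, which TinyLlama's chat template renders into a single training string. The field names follow the public Open-Orca/OpenOrca dataset; the record shown and the checkpoint name are illustrative assumptions, not taken from the project itself.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Made-up example in the OpenOrca field layout (system_prompt / question / response).
record = {
    "system_prompt": "You are a helpful assistant.",
    "question": "Summarize the water cycle in one sentence.",
    "response": "Water evaporates, condenses into clouds, and returns as precipitation.",
}

messages = [
    {"role": "system", "content": record["system_prompt"]},
    {"role": "user", "content": record["question"]},
    {"role": "assistant", "content": record["response"]},
]

# Renders the turns with TinyLlama's <|system|>/<|user|>/<|assistant|> chat tags.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```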
Methodology:
- Supervised Fine-tuning (SFT): Rather than collecting human-written labels for this project, fine-tuning reuses the OpenOrca dataset, in which questions are already paired with GPT-4-generated responses. TinyLlama is trained to reproduce those responses when given the corresponding GPT-4-format instructions.
- Transformer Reinforcement Learning (TRL): As in the Gemma LLM project, the Hugging Face TRL library drives the training loop, most likely through its SFTTrainer, optimizing TinyLlama to produce accurate and engaging responses to OpenOrca questions posed in GPT-4 instruction formats (see the training sketch after this list).
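The following is a minimal sketch of such a fine-tuning run with TRL's SFTTrainer, assuming the public Open-Orca/OpenOrca dataset and the TinyLlama 1.1B chat checkpoint. The dataset slice, hyperparameters, and output path are illustrative, and exact keyword arguments vary across TRL versions; this is not the project's actual training script.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A small slice keeps the illustration quick; a real run would use far more data.
dataset = load_dataset("Open-Orca/OpenOrca", split="train[:1%]")

def format_batch(batch):
    # Collapse each OpenOrca record into one chat-formatted training string.
    texts = []
    for sys_prompt, question, response in zip(
        batch["system_prompt"], batch["question"], batch["response"]
    ):
        messages = [
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": question},
            {"role": "assistant", "content": response},
        ]
        texts.append(tokenizer.apply_chat_template(messages, tokenize=False))
    return texts

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    formatting_func=format_batch,
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="tinyllama-openorca-sft",  # hypothetical output path
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50,
    ),
)
trainer.train()
```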
Technical Stack:
- Hardware: An AWS g5.12xlarge instance (4× NVIDIA A10G GPUs) provides the compute needed to fine-tune the model in a reasonable time.
- Software:
- Hugging Face (Transformers): Used to load the TinyLlama checkpoint and its tokenizer from the Hugging Face Hub.
- Accelerate: Hugging Face's library for distributing training across devices; here it spreads the fine-tuning run across the instance's GPUs.
- Torch (PyTorch): The deep learning framework underlying Transformers and TRL, supplying the tensors, autograd, and optimizer used during fine-tuning.
- TRL: Hugging Face's Transformer Reinforcement Learning library, which provides the high-level trainers (SFTTrainer for supervised fine-tuning, plus reward-based trainers such as PPO and DPO) used to run the fine-tuning. A sketch of how these pieces fit together follows.
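Below is a small conceptual sketch of how the stack fits together: Transformers loads TinyLlama, PyTorch supplies the tensors and optimizer, and Accelerate wraps both so the same script can run on one GPU or across all four A10G GPUs when started with `accelerate launch`. The checkpoint name and learning rate are assumptions for illustration, not the project's configuration.

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Transformers loads the TinyLlama weights; PyTorch provides the optimizer.
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# prepare() places the model and optimizer on the right device(s) and wires up
# distributed data parallelism when multiple processes are launched.
model, optimizer = accelerator.prepare(model, optimizer)
```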