Transformer-based large language models (LLMs) have shown excellent performance across a wide range of tasks, especially when fine-tuned for specific domains. Recent studies have shown that the resources required for fine-tuning LLMs can be reduced by parameter-efficient methods such as low-rank adaptation (LoRA). While LoRA effectively lowers the computational burden and resource requirements, it currently supports only single-job fine-tuning. In this paper, we introduce ASPEN, a high-throughput framework for fine-tuning LLMs. ASPEN uses the LoRA approach to efficiently train multiple jobs on a single GPU by sharing the frozen pre-trained model among jobs and scheduling them adaptively. ASPEN is compatible with transformer-based language models such as LLaMA and ChatGLM. Experiments show that ASPEN saves 53% of GPU memory when training multiple LLaMA-7B models on an NVIDIA A100 80GB GPU, and improves training throughput by about 17% compared to existing methods when training with various pre-trained models on different GPUs. ASPEN's adaptive scheduling algorithm, which prioritizes jobs and prevents out-of-memory issues, improves turnaround time by 24% and reduces end-to-end training latency by 12%.
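The memory savings from sharing one frozen base model across jobs follow directly from LoRA's formulation: each job trains only a pair of small low-rank factors while the large pre-trained weight stays fixed and shared. The following is a minimal sketch of that idea in plain Python; the function names, matrix shapes, and values are illustrative assumptions, not ASPEN's actual implementation.

```python
# Hypothetical sketch of a LoRA-adapted linear layer.
# W (d_out x d_in) is the frozen pre-trained weight, shared by all jobs;
# each job owns only its low-rank factors A (r x d_in) and B (d_out x r),
# so many jobs can fine-tune on one GPU holding a single copy of W.

def matmul(X, Y):
    """Plain-Python matrix multiply, kept dependency-free for the sketch."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha / r) * (x A^T) B^T  -- LoRA's adapted output."""
    base = matmul(x, transpose(W))          # frozen path, shared weight
    low_rank = matmul(matmul(x, transpose(A)), transpose(B))
    scale = alpha / r                       # standard LoRA scaling
    return [[b + scale * l for b, l in zip(br, lr)]
            for br, lr in zip(base, low_rank)]
```

Because B is conventionally initialized to zeros, a freshly added adapter leaves the base model's output unchanged; training then moves only A and B, which for rank r much smaller than the hidden size is a tiny fraction of the full weight's parameters.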