`transformers>=4.31` is required for 16K versions.

Size | Chat Command | Hugging Face Repo |
---|---|---|
7B | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5` | lmsys/vicuna-7b-v1.5 |
7B-16k | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5-16k` | lmsys/vicuna-7b-v1.5-16k |
13B | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5` | lmsys/vicuna-13b-v1.5 |
13B-16k | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5-16k` | lmsys/vicuna-13b-v1.5-16k |
33B | `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-33b-v1.3` | lmsys/vicuna-33b-v1.3 |

Size | Chat Command | Hugging Face Repo |
---|---|---|
7B | `python3 -m fastchat.serve.cli --model-path lmsys/longchat-7b-32k-v1.5` | lmsys/longchat-7b-32k |

Size | Chat Command | Hugging Face Repo |
---|---|---|
3B | `python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0` | lmsys/fastchat-t5-3b-v1.0 |

Add `--style rich` to enable rich text output and better text streaming quality for some non-ASCII content. This may not work properly on certain terminals.

`--model-path` can be a local folder or a Hugging Face repo name.

Use `--max-gpu-memory` to specify the maximum memory per GPU for storing model weights. Capping the weights leaves more memory for activations, so you can use longer context lengths or larger batch sizes.
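
As a minimal sketch of how the flag fits into a full command, the snippet below assembles a two-GPU invocation. The `build_chat_command` helper is hypothetical (it is not part of FastChat), `--num-gpus` is assumed here as the multi-GPU switch, and the `8GiB` cap is an illustrative value to adjust for your hardware.

```python
import shlex


def build_chat_command(model_path, num_gpus=1, max_gpu_memory=None):
    """Assemble argv for the FastChat CLI (hypothetical helper, not part of FastChat)."""
    argv = ["python3", "-m", "fastchat.serve.cli", "--model-path", model_path]
    if num_gpus > 1:
        # Assumed multi-GPU switch; only added when more than one GPU is requested.
        argv += ["--num-gpus", str(num_gpus)]
    if max_gpu_memory is not None:
        # Cap the memory used for model weights on each GPU, e.g. "8GiB".
        argv += ["--max-gpu-memory", max_gpu_memory]
    return argv


command = build_chat_command("lmsys/vicuna-7b-v1.5", num_gpus=2, max_gpu_memory="8GiB")
print(shlex.join(command))
```

The printed line can be pasted into a shell. Building the command as a list keeps each flag and its value paired, which helps when the same launcher is reused across machines with different GPU counts.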
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0). Use `--load-8bit` to turn on 8-bit compression.

Use `--device xpu` to enable XPU/GPU acceleration.

Use `--device npu` to enable NPU acceleration.

To enable 8-bit compression, add `--load-8bit` to the commands above.
This can reduce memory usage by around half, with slightly degraded model quality.
It is compatible with the CPU, GPU, and Metal backends.
Vicuna-13B with 8-bit compression can run on a single GPU with at least 16 GB of VRAM, like an Nvidia RTX 3090, RTX 4080, T4, V100 (16GB), or an AMD RX 6800 XT.
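
The "around half" figure follows from simple arithmetic on weight storage. The sketch below is a rough estimate only: it assumes the uncompressed weights are stored in 16-bit precision (an assumption, since the baseline is not stated here) and ignores activations, the KV cache, and any layers left uncompressed. `estimate_weight_gib` is a hypothetical helper.

```python
def estimate_weight_gib(num_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params_13b = 13e9  # nominal parameter count for a 13B model
fp16_gib = estimate_weight_gib(params_13b, 16)  # assumed 16-bit baseline
int8_gib = estimate_weight_gib(params_13b, 8)   # with 8-bit compression

print(f"16-bit: {fp16_gib:.1f} GiB, 8-bit: {int8_gib:.1f} GiB")
```

Under these assumptions the 8-bit weights come to roughly 12 GiB, which is consistent with a 13B model fitting on a 16 GB card with some room left for activations.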

To offload weights that don’t fit on your GPU into CPU memory, add `--cpu-offloading` to the commands above.
This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux operating systems.
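
The prerequisites above can be checked before launching. This is a minimal sketch using only the standard library; `can_use_cpu_offloading` is a hypothetical helper, not a FastChat function, and it only tests whether the package is importable, not whether it works with your GPU.

```python
import importlib.util
import platform


def can_use_cpu_offloading(load_8bit):
    """Return True if the stated prerequisites for --cpu-offloading appear to be met."""
    if not load_8bit:
        return False  # offloading requires 8-bit compression to be enabled
    if platform.system() != "Linux":
        return False  # bitsandbytes is described above as Linux-only
    # Checks that the package can be found, without importing it.
    return importlib.util.find_spec("bitsandbytes") is not None


print(can_use_cpu_offloading(load_8bit=True))
```

A launcher script could call this check and drop `--cpu-offloading` from the command when it returns `False`, rather than failing at model load time.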