Using FastChat
vLLM Integration
Instructions
- Install vLLM.
- When you launch a model worker, replace the normal worker (fastchat.serve.model_worker) with the vLLM worker (fastchat.serve.vllm_worker). All other commands, such as the controller, the Gradio web server, and the OpenAI API server, stay the same.

If you see tokenizer errors, try