VLM deployment
To deploy and use the VLM, you will need a machine with more than 40 GB of VRAM and more than 30 GB of RAM.
You will also have to install Docker and the NVIDIA Container Toolkit by following their official installation instructions.
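Before going further, you can verify that Docker can access the GPU. This is the sample workload from the NVIDIA Container Toolkit documentation; the exact flags and base image may vary with your setup:
# Run nvidia-smi inside a throwaway container; the toolkit injects the host driver utilities
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi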
Once you have selected the host machine for the model, proceed with the following steps.
Install the deployment files
Optional - Clone only the vLLM-hosting folder
This works with Git > 2.27, and the remote server needs to support partial clone filtering.
git clone --filter=blob:none --no-checkout https://github.com/convince-project/sit-aw-aip.git
cd sit-aw-aip
git sparse-checkout init --cone
git sparse-checkout set /vLLM-hosting
git checkout
Otherwise, clone the whole project and work only in the vLLM-hosting folder
git clone https://gitlab.lri.cea.fr/razane.azrou/convincesitaw-mllm.git
cd convincesitaw-mllm/vLLM-hosting
Build and run
Then execute the bash script:
deploy_model.sh
You may need to give it execute permissions first (on Linux), as shown below.
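A minimal sketch, assuming you are in the vLLM-hosting directory:
# Give the deployment script execute permissions, then run it
chmod +x deploy_model.sh
./deploy_model.sh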
If you are not using bash, you can build and then run the Docker Compose stack directly in your terminal:
docker compose build
docker compose up -d
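Once the stack is up, you can follow the container logs and query the server as a quick health check. The /v1/models route is an assumption based on vLLM's standard OpenAI-compatible API, and 23333 is the default port from the .env file:
# Follow the logs until the server reports it is ready
docker compose logs -f
# List the served models through the OpenAI-compatible endpoint
curl http://localhost:23333/v1/models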
The .env file allows you to change some parameters
MODEL : The model to deploy - default Qwen2.5-VL
PORT : Exposed and container port - default 23333
GPU_MEMORY_USAGE : The fraction (< 1) of GPU memory the model is allowed to use - default 0.85
DOWNLOAD_MODEL_CACHE_DIR : Host machine directory where the model cache is downloaded - default "./.cache/huggingface"
⚠ Changing the model may require a different GPU memory usage setting or more disk space at the cache download location. These requirements are model-specific, so you may need a more powerful machine.
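For reference, a .env built from the defaults above could look like this. The MODEL value is only illustrative; the exact identifier expected by the deployment depends on the model repository name:
# Model to deploy (illustrative value, taken from the default above)
MODEL=Qwen2.5-VL
# Port exposed on the host and used inside the container
PORT=23333
# Fraction of GPU memory the model is allowed to use
GPU_MEMORY_USAGE=0.85
# Host directory where the model cache is downloaded
DOWNLOAD_MODEL_CACHE_DIR=./.cache/huggingface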