Using this WebUI, you can run LLMs in GGUF, EXL2, and GPTQ formats, as well as unquantized models. There was also support for AWQ models, but it currently appears to be non-functional.
The docker version of Oobabooga Text-Generation-Webui is the easiest way to get the program up and running:
git clone https://github.com/Atinoda/text-generation-webui-docker
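Cloning creates a directory named after the repository; change into it, since the docker-compose.yml file referenced in the following steps lives there:
cd text-generation-webui-docker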
The configuration for Nvidia cards is already prepared in the docker-compose.yml file. You may potentially achieve higher speeds by using the variant with the pre-built TensorRT-LLM library; to do so, change the image: line to:
image: atinoda/text-generation-webui:default-nvidia-tensorrtllm
However, this did not always work in my tests.
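As a rough sketch, only the image: line of the service in docker-compose.yml changes; the service name and the other lines (abbreviated here) may differ in your copy of the file:
services:
  text-generation-webui:   # service name taken from the stock file; may differ
    image: atinoda/text-generation-webui:default-nvidia-tensorrtllm   # replaces the previously set image tag
    # ...rest of the service definition stays unchanged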
If you want to use an AMD graphics card instead of an Nvidia one, use the image
image: atinoda/text-generation-webui:default-rocm
Furthermore, you have to comment out the following Nvidia block (add a # at the start of each line; the result is shown below the block):
### Nvidia (default) ###
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0','1']
          capabilities: [gpu]
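After commenting out, the block should look roughly like this (the relative indentation is reconstructed; the exact leading whitespace in your file may differ):
### Nvidia (default) ###
# deploy:
#   resources:
#     reservations:
#       devices:
#         - driver: nvidia
#           device_ids: ['0','1']
#           capabilities: [gpu]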
and uncomment the following AMD/ROCm block (remove the # at the start of each line; the result is shown below the block):
# stdin_open: true
# group_add:
#   - video
# tty: true
# ipc: host
# devices:
#   - /dev/kfd
#   - /dev/dri
# cap_add:
#   - SYS_PTRACE
# security_opt:
#   - seccomp=unconfined
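After removing the # signs, the AMD section should look roughly like this:
stdin_open: true
group_add:
  - video
tty: true
ipc: host
devices:
  - /dev/kfd
  - /dev/dri
cap_add:
  - SYS_PTRACE
security_opt:
  - seccomp=unconfined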
ROCm has to be installed on the host as explained here.
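A quick way to check whether the host's ROCm installation can see your card is rocm-smi (the exact output depends on the ROCm version):
rocm-smi    # should list your AMD GPU(s)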
If you want to run on the CPU only (without a GPU), use this image:
image: atinoda/text-generation-webui:default-cpu
Furthermore, you have to comment out the Nvidia block (add a # at the start of each line) so that everything in the lower part of the file is commented out; a minimal sketch of the resulting CPU-only service follows below the block:
### Nvidia (default) ###
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0','1']
          capabilities: [gpu]
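Putting it together, a minimal sketch of the CPU-only service might look like this; the service name and port are assumptions based on the stock file and may differ in your copy, and the volumes: entries from the stock file should be kept unchanged:
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:default-cpu
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose"   # example parameters, explained below
    ports:
      - 7860:7860                                # web UI port
    # volumes: entries from the stock file stay unchanged
    # the GPU-related deploy:/devices: sections further down stay commented out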
Under environment: you can put the parameters for OTGW, e.g.:
- EXTRA_LAUNCH_ARGS="--api --listen --verbose --model GGUF-c4ai-command-r-08-2024.Q8_0_mradermacher.gguf --loader llama.cpp --n_ctx 32768 --cache_type fp16 --flash-attn --gradio-auth karlheinz:Stinkepups"
The names of the parameters correspond to the labels in the UI, except for “Flash attention”: here a hyphen must be used instead of the underscore, i.e. --flash-attn.
If you want to use OTGW with SillyTavern or other programs, remember to uncomment the API port 5000 (remove the #) and specify --api in the parameter list!
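In the ports: section of docker-compose.yml this means uncommenting the entry for port 5000; the exact wording of the entries may differ slightly in your copy of the file:
ports:
  - 7860:7860    # web UI
  - 5000:5000    # API (for SillyTavern etc.) - remove the leading # from this line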
Then build and start the container:
docker compose up --build -d
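To verify that the container started correctly and to watch the model being loaded, you can follow the logs:
docker compose logs -f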
In general, models downloaded through OTGW will be stored in the ./config/models directory. This is integrated into the container via a bind mount so that the downloaded models are persisted.
If you have already downloaded models and saved them in a different directory, you can adjust the directory used for the aforementioned bind mount in the docker-compose.yml file.
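A sketch of such an adjustment, assuming your models already live in /data/llm-models on the host; only the host-side part (left of the colon) of the existing models entry is changed, while the container-side path must be kept exactly as it appears in the stock file:
volumes:
  - /data/llm-models:/app/models    # host path is an example; keep the container path from the stock entry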
You can update the container / the image with the following commands:
docker compose down
docker compose pull
Then execute:
docker compose up --build -d