Docker

This WebUI lets you run LLMs in GGUF, EXL2, and GPTQ formats, as well as unquantized models. AWQ models were also supported at one point, but that support currently appears to be broken.

The Docker version of Oobabooga Text-Generation-Webui is the easiest way to get the program up and running:


git clone https://github.com/Atinoda/text-generation-webui-docker
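
All further steps take place in the cloned directory, whose docker-compose.yml you will edit in the sections below:


cd text-generation-webui-docker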

Nvidia

The configuration for Nvidia cards comes ready-made in the docker-compose.yml file. You may achieve higher speeds with the variant that includes the pre-built TensorRT-LLM library, by changing the image: line to:


image: atinoda/text-generation-webui:default-nvidia-tensorrtllm

However, this did not always work in my tests.

Nvidia drivers and CUDA have to be installed, as explained here and here.
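
As a quick sanity check that Docker can actually see the GPU, you can run nvidia-smi inside a CUDA container (a sketch; the exact CUDA image tag is an assumption and may need adjusting to your installed driver):


nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi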

AMD

If you want to use an AMD graphics card instead of an Nvidia one, use this image:


image: atinoda/text-generation-webui:default-rocm

Furthermore, you have to comment out the Nvidia block (add a # at the start of each line):

    ### Nvidia (default) ###
    deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                device_ids: ['0','1']
                capabilities: [gpu]

and uncomment the AMD block (remove the # at the start of each line):

#    stdin_open: true
#    group_add:
#      - video
#    tty: true
#    ipc: host
#    devices:
#      - /dev/kfd
#      - /dev/dri
#    cap_add:
#      - SYS_PTRACE
#    security_opt:
#      - seccomp=unconfined
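
After both edits, the relevant part of docker-compose.yml should look roughly like this (the device_ids and the exact option set may differ between versions of the repository):

#    deploy:
#        resources:
#          reservations:
#            devices:
#              - driver: nvidia
#                device_ids: ['0','1']
#                capabilities: [gpu]

    stdin_open: true
    group_add:
      - video
    tty: true
    ipc: host
    devices:
      - /dev/kfd
      - /dev/dri
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined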

ROCm has to be installed, as explained here.
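
To verify the ROCm setup on the host (a quick sketch; rocminfo ships with the ROCm installation), you can check that the GPU and its device nodes are visible:


rocminfo | grep -i gfx
ls -l /dev/kfd /dev/dri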

CPU

Use this image:


image: atinoda/text-generation-webui:default-cpu

Furthermore, you have to comment out the Nvidia block (add a # at the start of each line), so that everything in the lower part of the file is commented out:

    ### Nvidia (default) ###
    deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                device_ids: ['0','1']
                capabilities: [gpu]
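
With all GPU-specific options commented out, the service definition reduces to roughly the following (a sketch; the service name is taken from the repository's compose file and may differ between versions):

    services:
      text-generation-webui:
        image: atinoda/text-generation-webui:default-cpu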

Parameters

Under environment: you can put the launch parameters for OTGW (Oobabooga Text-Generation-Webui), e.g.:


- EXTRA_LAUNCH_ARGS="--api --listen --verbose --model GGUF-c4ai-command-r-08-2024.Q8_0_mradermacher.gguf --loader llama.cpp --n_ctx 32768 --cache_type fp16 --flash-attn --gradio-auth karlheinz:Stinkepups"

The parameter names correspond to the labels in the UI, except for “Flash attention”: here a hyphen (-) must be used instead of the underscore (_).
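
In docker-compose.yml this sits under the environment: key of the service, for example (a shortened variant of the line above):

    environment:
      - EXTRA_LAUNCH_ARGS="--api --listen --verbose --loader llama.cpp --n_ctx 32768 --flash-attn"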

Ports

If you want to use OTGW with SillyTavern or other programs, remember to uncomment
the API port 5000 (remove the #) and add --api to the parameter list!
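
The ports: section of docker-compose.yml then looks roughly like this (port 7860 for the web UI is enabled by default; the 5000 mapping is the one you uncomment):

    ports:
      - 7860:7860  # Web UI
      - 5000:5000  # API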

Create the container

Then build and start the container:


docker compose up --build -d
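
To follow the startup and see when the UI is ready, you can tail the container logs:


docker compose logs -f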

In general, models downloaded through OTGW are stored in the ./config/models directory. This directory is mounted into the container via a bind mount so that downloaded models persist across container rebuilds.

If you have already downloaded models to a different directory, you can adjust the host path of the aforementioned bind mount in the docker-compose.yml file.
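
For example, to reuse an existing model directory, change the host side of the models entry under volumes: (the host path is a placeholder; the container-side path /app/models follows the image's layout and may differ between versions):

    volumes:
      - /path/to/your/models:/app/models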

Updates

You can update the container and the image with the following commands:


docker compose down
docker compose pull

Then execute:


docker compose up --build -d
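
Old image versions remain on disk after a pull; you can remove unused ones with:


docker image prune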

Source: