Text-Generation-WebUI


Oobabooga Text Generation WebUI is an application that allows you to run LLMs in various formats (GGUF, EXL2, EXL3, Transformers). It also provides an OpenAI-compatible API through which the model can be made available on the LAN.

First, clone the application from GitHub and adjust the configuration files.

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
# symlink the NVIDIA Dockerfile, compose file and .dockerignore into the repo root
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
# the compose file reads its variables from .env
cp docker/.env.example .env
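
This guide uses the NVIDIA variant; at the time of writing the docker/ directory also ships amd/, cpu/ and intel/ subfolders with their own Dockerfile and docker-compose.yml (verify the folder names against your checkout), e.g.:

# CPU-only variant, assuming docker/cpu exists in your checkout
ln -s docker/{cpu/Dockerfile,cpu/docker-compose.yml,.dockerignore} .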

docker-compose.yml:

version: "3.3"
services:
  text-generation-webui:
    restart: unless-stopped
    build:
      context: .
      args:
        # Requirements file to use:
        # | GPU           | requirements file to use         |
        # |---------------|----------------------------------|
        # | NVIDIA        | `requirements.txt`               |
        # | AMD           | `requirements_amd.txt`           |
        # | CPU only      | `requirements_cpu_only.txt`      |
        # | Apple Intel   | `requirements_apple_intel.txt`   |
        # | Apple Silicon | `requirements_apple_silicon.txt` |
        # Default: `requirements.txt`
        # BUILD_REQUIREMENTS: requirements.txt

        # Extension requirements to build: 
        # BUILD_EXTENSIONS: 

        # specify the CUDA compute capability of your card: https://developer.nvidia.com/cuda-gpus
        TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-8.9} 
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-1000} 
        APP_UID: ${APP_UID:-1000} 
    env_file: .env
    user: "${APP_RUNTIME_UID:-1000}:${APP_RUNTIME_GID:-1000}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    tty: true
    volumes:
      - ./user_data/cache:/home/app/text-generation-webui/user_data/cache
      - ./user_data/characters:/home/app/text-generation-webui/user_data/characters
      - ./user_data/extensions:/home/app/text-generation-webui/user_data/extensions
      - ./user_data/grammars:/home/app/text-generation-webui/user_data/grammars
      - ./user_data/instruction-templates:/home/app/text-generation-webui/user_data/instruction-templates
      - ./user_data/logs:/home/app/text-generation-webui/user_data/logs
      - ./user_data/loras:/home/app/text-generation-webui/user_data/loras
      - ./user_data/mmproj:/home/app/text-generation-webui/user_data/mmproj
      - /media/kilo/ki/data/models/text/models:/home/app/text-generation-webui/user_data/models
      - ./user_data/presets:/home/app/text-generation-webui/user_data/presets
      - ./user_data/training:/home/app/text-generation-webui/user_data/training
      - ./user_data/CMD_FLAGS.txt:/home/app/text-generation-webui/user_data/CMD_FLAGS.txt
      - ./user_data/settings.yaml:/home/app/text-generation-webui/user_data/settings.yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
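
The deploy block hands all host GPUs to the container, which only works if the NVIDIA Container Toolkit is installed on the host. A quick sanity check before building; the CUDA image tag is just an example, any recent one works:

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If this prints the same GPU table as on the host, compose will be able to reserve the devices as well.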

Dockerfile:

# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX}"
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-1000}"
ARG APP_GID="${APP_GID:-1000}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=A LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
COPY /user_data/CMD_FLAGS.txt /home/app/text-generation-webui/user_data
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000} ${CONTAINER_API_STREAM_PORT:-5005}
WORKDIR /home/app/text-generation-webui
# set umask to ensure group read / write at runtime
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen
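
To iterate on the image without involving compose, the same Dockerfile can also be built directly; the tag and build-arg values here are illustrative:

docker build -t text-generation-webui \
  --build-arg TORCH_CUDA_ARCH_LIST=8.9 \
  --build-arg APP_UID=1000 \
  --build-arg APP_GID=1000 .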

.env:

# By default the Dockerfile specifies these versions: 3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX
# However, for it to work I had to specify the exact version for my card (a 2060), which was 7.5.
# You can find the version for your card here: https://developer.nvidia.com/cuda-gpus
# Or, for a programmatic approach, run `nvidia-smi --query-gpu=name,compute_cap --format=csv`
TORCH_CUDA_ARCH_LIST=8.9
# the port the webui binds to on the host
HOST_PORT=7860
# the port the webui binds to inside the container
CONTAINER_PORT=7860
# the port the api binds to on the host
HOST_API_PORT=5000
# the port the api binds to inside the container
CONTAINER_API_PORT=5000
# Comma separated extensions to build
BUILD_EXTENSIONS=""
# Set APP_RUNTIME_GID to an appropriate host system group to enable access to mounted volumes 
# You can find your current host user group id with the command `id -g`
APP_RUNTIME_GID=1000
# override default app build permissions (handy for deploying to cloud)
APP_GID=1000
APP_UID=1000
# Set cache env
TRANSFORMERS_CACHE=/home/app/text-generation-webui/cache/
HF_HOME=/home/app/text-generation-webui/cache/

TORCH_CUDA_ARCH_LIST=8.9 must be adjusted to the CUDA compute capability of your GPU. See the CUDA GPU Compute Capability list: https://developer.nvidia.com/cuda-gpus
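
Both values can be read straight off the host instead of guessing; the card name in the comment below is only an example:

# prints e.g. "NVIDIA GeForce RTX 2060, 7.5"
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
# host uid/gid for APP_UID, APP_GID and APP_RUNTIME_GID
id -u; id -g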


user_data/CMD_FLAGS.txt:

# Add persistent flags here to use every time you launch the web UI.
# Example:
--listen
--api
--verbose
--gradio-auth username:password

# model to load on startup; the file must already be in the mounted models directory
# (see the download sketch below)
--model GGUF-Mistral-Small-3.2-24B-Instruct-2506-Q8_0_bartowski.gguf
--loader llama.cpp
# context window in tokens
--ctx-size 131072
# quantize the KV cache to 8 bit to save VRAM
--cache-type q8_0
# split the model by rows across multiple GPUs
--row-split
# enable flash attention
--flash-attn
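
The model file referenced above has to be placed in the models volume before startup. A hedged sketch using the Hugging Face CLI; the repository id and include pattern are assumptions, so check the actual quant repo you want, and point --local-dir at the host path you mounted as the models volume:

# hypothetical repo id; verify the real name on Hugging Face first
huggingface-cli download bartowski/Mistral-Small-3.2-24B-Instruct-2506-GGUF \
  --include "*Q8_0*" --local-dir user_data/models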

Create an empty placeholder for the configuration file (otherwise Docker creates a directory named settings.yaml when mounting the volume):

touch user_data/settings.yaml

Next, build and run the container as a service with `docker compose up --build -d`.
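
Once the container reports ready, the --api flag from CMD_FLAGS.txt exposes the OpenAI-compatible endpoint on port 5000. A minimal smoke test (prompt and token budget are arbitrary):

curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'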
