Text-Generation-WebUI


Oobabooga Text-Generation WebUI is an application that lets you run LLMs in various formats (GGUF, EXL2, EXL3, Transformers). In addition, it exposes an OpenAI-compatible API, which also allows you to serve the model on your LAN.
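
Once the stack described below is up and a model is loaded, the API behaves like any other OpenAI-compatible endpoint. A minimal sketch, assuming the defaults used in this guide (API on port 5000):

curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 64}'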

First, clone the application from GitHub and adjust the configuration files.

git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
ln -s docker/{nvidia/Dockerfile,nvidia/docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
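
Optionally, verify that the symlinks and the copied .env are in place before editing them:

ls -l Dockerfile docker-compose.yml .dockerignore .env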

docker-compose.yml

version: "3.3"
services:
  text-generation-webui:
    restart: unless-stopped
    build:
      context: .
      args:
        # Requirements file to use: 
        # | GPU | requirements file to use |
        # |--------|---------|
        # | NVIDIA | `requirements.txt` |
        # | AMD | `requirements_amd.txt` |
        # | CPU only | `requirements_cpu_only.txt` |
        # | Apple Intel | `requirements_apple_intel.txt` |
        # | Apple Silicon | `requirements_apple_silicon.txt` |
        # Default: `requirements.txt`
        # BUILD_REQUIREMENTS: requirements.txt

        # Extension requirements to build: 
        # BUILD_EXTENSIONS: 

        # specify the CUDA compute capability your card supports: https://developer.nvidia.com/cuda-gpus
        TORCH_CUDA_ARCH_LIST: ${TORCH_CUDA_ARCH_LIST:-8.9} 
        BUILD_EXTENSIONS: ${BUILD_EXTENSIONS:-}
        APP_GID: ${APP_GID:-1000} 
        APP_UID: ${APP_UID:-1000} 
    env_file: .env
    user: "${APP_RUNTIME_UID:-1000}:${APP_RUNTIME_GID:-1000}"
    ports:
      - "${HOST_PORT:-7860}:${CONTAINER_PORT:-7860}"
      - "${HOST_API_PORT:-5000}:${CONTAINER_API_PORT:-5000}"
    stdin_open: true
    tty: true
    volumes:
      - ./user_data/cache:/home/app/text-generation-webui/user_data/cache
      - ./user_data/characters:/home/app/text-generation-webui/user_data/characters
      - ./user_data/extensions:/home/app/text-generation-webui/user_data/extensions
      - ./user_data/grammars:/home/app/text-generation-webui/user_data/grammars
      - ./user_data/instruction-templates:/home/app/text-generation-webui/user_data/instruction-templates
      - ./user_data/logs:/home/app/text-generation-webui/user_data/logs
      - ./user_data/loras:/home/app/text-generation-webui/user_data/loras
      - ./user_data/mmproj:/home/app/text-generation-webui/user_data/mmproj
      - /media/kilo/ki/data/models/text/models:/home/app/text-generation-webui/user_data/models
      - ./user_data/presets:/home/app/text-generation-webui/user_data/presets
      - ./user_data/training:/home/app/text-generation-webui/user_data/training
      - ./user_data/CMD_FLAGS.txt:/home/app/text-generation-webui/user_data/CMD_FLAGS.txt
      - ./user_data/settings.yaml:/home/app/text-generation-webui/user_data/settings.yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
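
The user_data volumes are bind mounts; if the directories do not exist on the host yet, Docker creates them as root-owned, which can clash with the non-root container user. A small sketch to pre-create them with your own user (bash brace expansion; the model path from the compose file above is host-specific and omitted, and CMD_FLAGS.txt and settings.yaml are created further down):

mkdir -p user_data/{cache,characters,extensions,grammars,instruction-templates,logs,loras,mmproj,presets,training}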

Dockerfile

# BUILDER
FROM ubuntu:22.04
WORKDIR /builder
ARG TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX}"
ARG BUILD_EXTENSIONS="${BUILD_EXTENSIONS:-}"
ARG APP_UID="${APP_UID:-1000}"
ARG APP_GID="${APP_GID:-1000}"

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked,rw \
    apt update && \
    apt install --no-install-recommends -y git vim build-essential python3-dev pip bash curl && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /home/app/
RUN git clone https://github.com/oobabooga/text-generation-webui.git 
WORKDIR /home/app/text-generation-webui
RUN GPU_CHOICE=A LAUNCH_AFTER_INSTALL=FALSE INSTALL_EXTENSIONS=TRUE ./start_linux.sh --verbose
COPY /user_data/CMD_FLAGS.txt /home/app/text-generation-webui/user_data
EXPOSE ${CONTAINER_PORT:-7860} ${CONTAINER_API_PORT:-5000} ${CONTAINER_API_STREAM_PORT:-5005}
WORKDIR /home/app/text-generation-webui
# set umask to ensure group read / write at runtime
CMD umask 0002 && export HOME=/home/app/text-generation-webui && ./start_linux.sh --listen
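
The image can also be built on its own, without starting the service. Shell environment variables take precedence over the values in .env for the ${...} interpolation in docker-compose.yml, so the compute capability can be overridden for a single build; a minimal sketch:

docker compose build
# or, overriding the compute capability just for this build:
TORCH_CUDA_ARCH_LIST=7.5 docker compose build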

.env

# By default the Dockerfile specifies these architectures: 3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX
# However, to get it working I had to specify the exact compute capability for my card (2060): 7.5
# You can find the value for your card at https://developer.nvidia.com/cuda-gpus
# Or, for a programmatic approach, run `nvidia-smi --query-gpu=name,compute_cap --format=csv`
TORCH_CUDA_ARCH_LIST=8.9
# the port the webui binds to on the host
HOST_PORT=7860
# the port the webui binds to inside the container
CONTAINER_PORT=7860
# the port the api binds to on the host
HOST_API_PORT=5000
# the port the api binds to inside the container
CONTAINER_API_PORT=5000
# Comma separated extensions to build
BUILD_EXTENSIONS=""
# Set APP_RUNTIME_GID to an appropriate host system group to enable access to mounted volumes 
# You can find your current host user group id with the command `id -g`
APP_RUNTIME_GID=1000
# override default app build permissions (handy for deploying to cloud)
APP_GID=1000
APP_UID=1000
# Set cache env
TRANSFORMERS_CACHE=/home/app/text-generation-webui/cache/
HF_HOME=/home/app/text-generation-webui/cache/
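
The UID/GID values should match your host user so that files written into the mounted user_data directories remain accessible from the host. A minimal sketch for looking them up (assuming a POSIX shell on the host):

id -u   # value for APP_UID (and APP_RUNTIME_UID, if you set it)
id -g   # value for APP_GID and APP_RUNTIME_GID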

TORCH_CUDA_ARCH_LIST=8.9 must be adjusted to the CUDA compute capability of your graphics card. See 🔗CUDA GPU Compute Capability


user_data/CMD_FLAGS.txt

# Add persistent flags here to use every time you launch the web UI.
# Example:
--listen 
--api
--verbose
--gradio-auth user:password

--model GGUF-Qwen3-Coder-30B-A3B-Instruct.Q8_0_mradermacher.gguf
--loader llama.cpp
--ctx-size 131072
--cache-type q8_0
--row-split
--flash-attn

Create a dummy file for the configuration (otherwise Docker will try to create a directory named settings.yaml):

touch user_data/settings.yaml

Finally, build the container and start it as a service with docker compose up --build -d.
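
To check that the service came up and that the GPU is visible inside the container (a quick sanity sketch, assuming the NVIDIA Container Toolkit is installed on the host):

docker compose logs -f text-generation-webui
docker compose exec text-generation-webui nvidia-smi

The web UI is then reachable on port 7860 and the OpenAI-compatible API on port 5000 of the host, matching the defaults from .env above.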

Source: