Plugging KataGo Into the 5x5 Demo

The previous post shipped a 2-ply minimax bot for 5x5 Go. It plays better than a beginner because two plies is enough to avoid placing stones into capture, but it cannot read sequences. KataGo can. This post swaps the local bot for KataGo running in a container behind a small HTTP wrapper.

Play the KataGo version. Source: server, frontend.

Why server-side

KataGo can compile to WebAssembly. The blocker is the network weights: the smallest distributed full-strength network is around 50 MB, and the one I am running here is 83 MB (g170e-b20c256x2). Add the WASM binary plus the runtime memory KataGo wants for MCTS, and a client-side build means a 100+ MB initial download to play on 25 board points.

Hosting cost is one container on the same Hetzner VPS that runs everything else on this site. RAM use is ~250 MB at idle and peaks around 600 MB during a 32-visit search.

The KataGo analysis engine

KataGo has two interactive modes. katago gtp speaks the GTP protocol, designed for human-style game clients. katago analysis speaks JSON over stdin/stdout, designed for programs. Analysis mode is the right fit: each request is a self-contained JSON object with the full move history, the rules, and the search budget; each response is a JSON object with the candidate moves sorted by play-selection value.

{"id":"q1","moves":[["B","C3"]],"rules":"chinese","komi":2.5,
 "boardXSize":5,"boardYSize":5,"analyzeTurns":[1],"maxVisits":32}

Response (truncated):

{"id":"q1","moveInfos":[
  {"move":"B3","winrate":0.038,"scoreLead":-17.3,"visits":7},
  {"move":"D3","winrate":0.038,"scoreLead":-17.3,"isSymmetryOf":"B3"},
  ...],
 "rootInfo":{"visits":33,"winrate":0.116}}

moveInfos[0] is KataGo’s pick. Note the winrate of 0.038 for White: 5x5 Go is solved as a Black win, even with 2.5 komi, so KataGo correctly thinks White is losing badly from move one. The bot still plays the best move it can find.
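For illustration, building that request and reading the reply might look like the Python sketch below. The helper names build_query and pick_move are made up for this explanation and are not part of KataGo; the field names are the ones that appear in the request and response above.

```python
import json

def build_query(moves, max_visits=32, komi=2.5, size=5):
    """Build one analysis-engine request for the position after `moves`.

    `moves` is a list of [colour, coord] pairs such as [["B", "C3"]].
    """
    return {
        "id": "q1",  # placeholder; a real caller would use a unique id per request
        "moves": moves,
        "rules": "chinese",
        "komi": komi,
        "boardXSize": size,
        "boardYSize": size,
        # Turn n is the position after the first n moves, so this asks
        # for the position the bot has to move in.
        "analyzeTurns": [len(moves)],
        "maxVisits": max_visits,
    }

def pick_move(response):
    """Return the coordinate of the first candidate, or None if there are none."""
    infos = response.get("moveInfos", [])
    return infos[0]["move"] if infos else None

# The truncated response from above, with the elided entries left out.
sample = json.loads(
    '{"id":"q1","moveInfos":['
    '{"move":"B3","winrate":0.038,"scoreLead":-17.3,"visits":7},'
    '{"move":"D3","winrate":0.038,"scoreLead":-17.3,"isSymmetryOf":"B3"}],'
    '"rootInfo":{"visits":33,"winrate":0.116}}'
)
print(json.dumps(build_query([["B", "C3"]])))
print(pick_move(sample))
```

Tying analyzeTurns to len(moves) means the caller never has to track a turn counter separately from the move history.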

The Python wrapper

KataGo runs as a subprocess for the lifetime of the FastAPI process. One reader thread pulls JSON lines off stdout and routes them to per-request queues by id. POST /move serialises the request, writes to stdin, and waits for the matching response.

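import json
import os
import subprocess
import threading
import uuid
from queue import Queue

# Paths are set via ENV in the Dockerfile.
KATAGO_BIN = os.environ["KATAGO_BIN"]
KATAGO_MODEL = os.environ["KATAGO_MODEL"]
KATAGO_CONFIG = os.environ["KATAGO_CONFIG"]

class _KataGo:
    def __init__(self):
        # Line-buffered text pipes: one JSON object per line in each direction.
        self.proc = subprocess.Popen(
            [KATAGO_BIN, "analysis", "-model", KATAGO_MODEL, "-config", KATAGO_CONFIG],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            bufsize=1, text=True,
        )
        self._lock = threading.Lock()
        self._responses: dict[str, Queue] = {}
        # _read_loop (not shown here) routes each stdout line to its queue by id.
        threading.Thread(target=self._read_loop, daemon=True).start()

    def query(self, payload, timeout=30.0):
        qid = uuid.uuid4().hex
        q = Queue()
        self._responses[qid] = q
        try:
            with self._lock:
                # Copy rather than mutate the caller's dict.
                self.proc.stdin.write(json.dumps({**payload, "id": qid}) + "\n")
                self.proc.stdin.flush()
            return q.get(timeout=timeout)  # raises queue.Empty on timeout
        finally:
            # Drop the entry even on timeout so the dict does not grow forever.
            self._responses.pop(qid, None)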

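The lock covers stdin writes only, so two requests cannot interleave their bytes on the pipe. KataGo handles concurrent queries internally and may answer them in any order, which is why the reader thread routes responses by id instead of assuming they arrive in request order.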
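The snippet above does not show _read_loop. Here is a minimal sketch of the routing idea, pulled out into a standalone class so it runs without KataGo. ResponseRouter and its method names are hypothetical, and the real _read_loop may differ.

```python
import io
import json
from queue import Queue

class ResponseRouter:
    """Routes newline-delimited JSON responses to per-request queues by id."""

    def __init__(self):
        self._responses = {}  # id -> Queue awaiting that response

    def register(self, qid):
        q = Queue()
        self._responses[qid] = q
        return q

    def read_loop(self, stream):
        # Iterating a text stream yields one line at a time until EOF.
        for line in stream:
            line = line.strip()
            if not line:
                continue
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if not isinstance(msg, dict):
                continue
            # pop() so a finished request does not stay in the dict
            q = self._responses.pop(msg.get("id"), None)
            if q is not None:
                q.put(msg)

# Stand-in for KataGo's stdout: two responses arriving out of order.
fake_stdout = io.StringIO('{"id":"b","moveInfos":[]}\n{"id":"a","moveInfos":[]}\n')
router = ResponseRouter()
qa, qb = router.register("a"), router.register("b")
router.read_loop(fake_stdout)
print(qa.get(timeout=1)["id"], qb.get(timeout=1)["id"])
```

Popping on the first response assumes one reply per id, which holds here because each request analyses a single turn. A request listing several analyzeTurns would need the entry kept until all of its replies had arrived.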

Coordinate translation

The frontend uses (x, y) indices. KataGo expects letter-number coordinates with the column letter I skipped (legacy from human Go scoresheets). On a 5x5 board only A B C D E appear, so the skip never matters, but the mapping still needs to flip the y axis: KataGo’s row 1 is the bottom of the board, my y=0 is the top.

const LETTERS = "ABCDE";
const xyToKata = (x, y) => LETTERS[x] + (N - y);

That is the entire bridge. The frontend stores the move history as [[colourLetter, kataCoord], ...] and posts it as-is.
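KataGo’s reply comes back in the same letter-number form, so placing its stone on the board needs the inverse mapping. Both directions are sketched below in Python to match the server code, although the frontend itself is JavaScript. kata_to_xy and its handling of "pass" are assumptions about how the reverse direction could look, not code from the demo.

```python
N = 5
LETTERS = "ABCDE"  # no "I" to skip on a board this small

def xy_to_kata(x, y):
    """(x, y) with y=0 at the top -> KataGo coordinate such as "C3"."""
    return f"{LETTERS[x]}{N - y}"

def kata_to_xy(coord):
    """KataGo coordinate -> (x, y), or None for a pass."""
    if coord.lower() == "pass":
        return None
    x = LETTERS.index(coord[0].upper())
    y = N - int(coord[1:])  # flip back: row 1 is the bottom of the board
    return (x, y)

print(xy_to_kata(0, 0))   # top-left corner
print(kata_to_xy("A1"))   # bottom-left corner
```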

Dockerfile

The first attempt downloaded KataGo’s prebuilt eigenavx2-linux-x64 binary from the GitHub releases page. The build succeeded, the container started, and the first /move call came back with Exec format error: '/usr/local/bin/katago'. The Hetzner box is a CAX (ARM) instance, and KataGo’s prebuilt Linux binaries are x64 only.

The fix is a multi-stage build that compiles KataGo from source for whichever architecture Docker is building on. Stage 1 installs cmake and libeigen3-dev and runs the standard cmake -DUSE_BACKEND=EIGEN -DBUILD_DISTRIBUTED=0 build. Stage 2 starts from python:3.11-slim-bookworm and copies just the binary across, so the runtime image does not carry the build toolchain.

FROM debian:bookworm-slim AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
      build-essential cmake git zlib1g-dev libzip-dev libeigen3-dev ca-certificates
ARG KATAGO_VERSION=v1.16.4
RUN git clone --depth 1 --branch ${KATAGO_VERSION} https://github.com/lightvector/KataGo.git /src
WORKDIR /src/cpp
RUN cmake . -DUSE_BACKEND=EIGEN -DBUILD_DISTRIBUTED=0 -DUSE_AVX2=0 \
    && make -j$(nproc)

FROM python:3.11-slim-bookworm
RUN apt-get update && apt-get install -y --no-install-recommends libzip4 zlib1g
COPY --from=builder /src/cpp/katago /usr/local/bin/katago
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY analysis.cfg server.py katago-network.bin.gz /app/
ENV KATAGO_BIN=/usr/local/bin/katago \
    KATAGO_MODEL=/app/katago-network.bin.gz \
    KATAGO_CONFIG=/app/analysis.cfg
CMD ["uvicorn","server:app","--host","0.0.0.0","--port","8000"]

The 83 MB network sits in the image rather than being fetched at startup, which keeps cold starts fast and avoids a runtime dependency on KataGo’s distribution server. Compilation adds about 4 minutes to the build but only happens when the Dockerfile changes, since Dokploy caches the builder stage.

Search budget

maxVisits: 32 is the parameter that decides response time. On my M3 Max with the Metal backend a 32-visit query takes ~150 ms; on the ARM VPS with no SIMD acceleration it takes 2.5 to 4.5 seconds depending on tree depth. That is slow enough that the frontend shows a “KataGo thinking” indicator on the board. Higher visit counts barely change the chosen move on a board this small; KataGo’s policy network already concentrates probability on a few candidate points and MCTS just confirms its choice.

Scoring

The frontend still does scoring locally. There is no point round-tripping to KataGo for area counting on a 5x5 board when the JavaScript implementation from the previous post already works. KataGo only runs when it is the bot’s turn to move.
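Area counting itself is short. The sketch below is not a port of the JavaScript from the previous post, only a rough Python illustration of the usual rule: a player's area is their stones plus every empty region that touches only their colour. The board representation and the name area_score are chosen here for illustration, and all stones are assumed alive (no dead-stone removal).

```python
def area_score(board, komi=2.5):
    """Return (black, white) area scores for a square board.

    `board` is a list of strings using "B", "W" and "." for empty.
    All stones on the board are treated as alive.
    """
    n = len(board)
    score = {"B": 0.0, "W": komi}
    seen = set()
    for y in range(n):
        for x in range(n):
            cell = board[y][x]
            if cell in score:
                score[cell] += 1
            elif (x, y) not in seen:
                # Flood-fill this empty region, noting which colours border it.
                region, borders, stack = 0, set(), [(x, y)]
                seen.add((x, y))
                while stack:
                    cx, cy = stack.pop()
                    region += 1
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < n and 0 <= ny < n:
                            c = board[ny][nx]
                            if c == ".":
                                if (nx, ny) not in seen:
                                    seen.add((nx, ny))
                                    stack.append((nx, ny))
                            else:
                                borders.add(c)
                # A region counts only if exactly one colour touches it.
                if len(borders) == 1:
                    score[borders.pop()] += region
    return score["B"], score["W"]

board = [
    "..B.W",
    "..BW.",
    "..BW.",
    "..BW.",
    "..BW.",
]
print(area_score(board))
```

In this position the empty point at the top between the two walls touches both colours, so it counts for neither side.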

What changed between the two demos

The minimax bot played stones into capture early in the game until I added the second ply of lookahead. KataGo never does this: it has read enough Go positions to know that an unsupported stone next to two opponent stones is dead. Watching it play 5x5 is closer to playing a strong club player than playing a search algorithm. It picks moves that look obvious in retrospect and crushes anyone who is not playing the perfect Black sequence.

The server itself is small enough that it shares the same Hetzner VPS as the rest of the site, alongside the Astro build, the previous-post demos, and a few unrelated containers.