Why We Chose 4 Languages for 1 App: ChatML's Polyglot Architecture
Go backend, React frontend, Rust desktop shell, Node.js agent runner. Here's why ChatML uses four languages.
ChatML Team
Every engineer who hears about ChatML's stack for the first time has the same reaction: "Four languages? Why?"
It is a fair question. Polyglot architecture carries real costs -- four build systems, four dependency managers, four sets of idioms to keep in your head. We did not arrive here by accident or by resume-driven development. We arrived here because each layer of a desktop AI coding tool has fundamentally different requirements, and no single language satisfies all of them well.
This post explains the reasoning behind each choice, the trade-offs we accepted, and why we believe clean interfaces between layers matter more than language uniformity.
If you have not read the post on the core problem ChatML solves, start there for context on what we are building and why.
Why not just one language?
The two obvious single-language approaches for a desktop app are Electron (all JavaScript) and a fully native stack (all Rust or all Swift).
The Electron path would give us one language across the entire stack. Build the backend in Node.js, the frontend in React, the agent runner in JavaScript -- everything is JS. The problem is well-documented: Electron ships an entire Chromium browser and a full Node.js runtime in every binary. A minimal Electron app starts at 150-200MB on disk and 200-300MB of RAM at idle. For a tool that developers keep open all day alongside their editor, IDE, browser, and Docker containers, that memory overhead is not a rounding error.
The all-Rust path would give us native performance and small binaries. Rust GUI frameworks like Dioxus and Iced are improving, but they are not at the level where you can iterate on complex UI as fast as you can with React. When your application has a rich, interactive interface with animations, real-time streaming content, diff viewers, and terminal emulators, the speed of UI iteration matters enormously. We would spend weeks in Rust achieving what takes days in React.
The all-Go path has similar UI limitations. Go excels at backend services but has no competitive desktop GUI story.
So we made a different bet: use each language where it is genuinely the best tool, and invest heavily in the interfaces between them.
Layer 1: Rust and Tauri -- the desktop shell
Rust handles everything that touches the operating system directly. In ChatML, the Tauri 2 layer is responsible for:
- The application shell. Window management, system tray, native menus, dock integration on macOS.
- Encrypted credential storage. API keys are stored using Tauri's Stronghold module, which provides encrypted-at-rest storage backed by the OS keychain. No plaintext secrets on disk.
- PTY terminal management. Each agent session needs a pseudo-terminal. Rust's memory safety guarantees matter here -- PTY management involves raw file descriptors and concurrent I/O.
- Deep links for OAuth. When a user authenticates with GitHub, the OS routes the chatml://callback URL to the Rust layer, which completes the OAuth flow.
- Sidecar process lifecycle. The Rust layer spawns and manages the Go backend as a sidecar process, handling startup, health checks, and graceful shutdown.
The result is a binary under 15MB. For comparison, that is smaller than many npm packages. Memory usage at idle hovers around 40-60MB for the entire application -- Rust shell, web view, and Go backend combined.
Tauri 2 is the critical piece that makes this polyglot architecture work. It provides the bridge between the native Rust layer and the web-based UI through a command system:
```rust
#[tauri::command]
async fn get_workspaces(state: State<'_, AppState>) -> Result<Vec<Workspace>, String> {
    let workspaces = state.workspace_manager
        .list()
        .await
        .map_err(|e| e.to_string())?;
    Ok(workspaces)
}
```

The frontend calls these commands as if they were async function calls. Tauri handles the serialization, the thread boundary, and the bridge between the web view and native code. For a deeper look at why we chose Tauri 2 over Electron and what that decision entails, see our Tauri deep dive.
Layer 2: Next.js and React -- the UI
The entire ChatML user interface is a Next.js application rendered inside Tauri's web view. This includes:
- The workspace selector and session list
- The real-time agent conversation view with streaming output
- The diff viewer showing file changes across worktrees
- The pull request creation and review interface
- Settings, onboarding, and all configuration screens
- Animations and transitions (powered by Framer Motion)
React was chosen for one overriding reason: the UI is the layer that changes the most, so it needs the fastest iteration cycle. During development, we use Vite's hot module replacement. Change a component, see it reflected instantly. No recompilation, no relaunch. When you are tuning the feel of a streaming agent output display or adjusting the layout of a multi-pane diff viewer, this feedback loop is the difference between shipping in a week and shipping in a month.
The React ecosystem also gives us leverage that does not exist in other GUI frameworks:
- Framer Motion for physics-based animations on session transitions
- MDX for rendering rich content (like changelogs and documentation) as components
- A vast component ecosystem that means we are not building date pickers and dropdown menus from scratch
One question we get asked: "If it is a Next.js app, why not just ship it as a web app?" The answer is that a web app cannot manage local git repositories, spawn terminal processes, store credentials securely, or integrate with the OS file system. The web view handles presentation. Everything that requires system access goes through Tauri commands to the Rust layer or HTTP calls to the Go backend.
Layer 3: Go -- the backend engine
The Go backend is the operational core of ChatML. It runs as a sidecar process managed by the Tauri shell and handles:
- HTTP API server. RESTful endpoints for workspace management, session CRUD, configuration, and git operations.
- WebSocket hub. Real-time bidirectional communication for streaming agent output to the UI. When an agent writes a file or runs a command, the result flows from the Node.js agent runner through the Go backend's WebSocket hub to the React frontend in real time.
- Git operations. Worktree creation, branch management, diff generation, status tracking. Go shells out to git for complex operations but handles parsing and orchestration.
- Session management. Each agent session is a managed process with its own worktree, environment, and lifecycle. The Go backend tracks session state, handles restarts, and ensures clean teardown.
- Process lifecycle. Spawning and managing Node.js agent runner processes, multiplexing their output, and routing messages to the correct WebSocket connections.
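The git side of that list is mostly careful orchestration of the git CLI. As an illustrative sketch (the helper name, paths, and branch-naming scheme below are hypothetical, not ChatML's actual code), creating an isolated worktree for a session looks roughly like:

```go
package main

import (
	"fmt"
	"os/exec"
)

// worktreeAddArgs builds the git invocation that creates an isolated
// worktree on a fresh branch for one agent session.
func worktreeAddArgs(worktreePath, branch string) []string {
	return []string{"worktree", "add", "-b", branch, worktreePath}
}

func main() {
	args := worktreeAddArgs("/tmp/chatml/session-abc123", "chatml/session-abc123")
	// The real backend would run this inside the repository root and
	// parse stderr on failure; here we only show the command shape.
	cmd := exec.Command("git", args...)
	fmt.Println(cmd.Args)
}
```

Keeping the argument construction in a pure function makes it testable without a git checkout; the exec boundary stays thin.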
Why Go specifically? Three reasons:
Concurrency model. A ChatML user might have five or ten agent sessions running simultaneously. Each session involves managing a child process, reading from its stdout and stderr, writing to its stdin, tracking file system changes in its worktree, and streaming results over a WebSocket. Goroutines handle this naturally. The WebSocket hub alone manages concurrent reads and writes across all connected clients with a straightforward fan-out pattern:
```go
type Hub struct {
	sessions   map[string]*Session
	clients    map[*Client]bool
	broadcast  chan *Message
	register   chan *Client
	unregister chan *Client
}

func (h *Hub) run() {
	for {
		select {
		case client := <-h.register:
			h.clients[client] = true
		case client := <-h.unregister:
			delete(h.clients, client)
			close(client.send)
		case message := <-h.broadcast:
			for client := range h.clients {
				if client.sessionID == message.SessionID {
					select {
					case client.send <- message:
					default:
						close(client.send)
						delete(h.clients, client)
					}
				}
			}
		}
	}
}
```

For a detailed breakdown of the WebSocket streaming architecture and how agent output flows through the system, see our streaming architecture post.
Compilation speed and single binary output. The Go backend compiles in under two seconds on an M-series Mac. It produces a single static binary with no runtime dependencies, which makes it trivial to bundle as a Tauri sidecar. No JVM, no runtime, no shared libraries to manage.
Operational simplicity. Go's standard library covers HTTP servers, JSON handling, process management, and file I/O without external dependencies for the core functionality. The dependency tree stays small, and the binary stays small.
Layer 4: Node.js -- the agent runner
The Node.js layer exists for a single, pragmatic reason: the Anthropic Agent SDK is JavaScript-first.
Each agent session spawns a Node.js process that:
- Initializes a Claude agent using the Anthropic SDK
- Manages the conversation loop (sending user messages, receiving assistant responses)
- Executes tools that the agent requests (reading files, writing files, running bash commands, searching code)
- Streams results back to the Go backend
The communication between the Node.js agent runner and the Go backend uses a JSON protocol over stdout and stderr. Each line of stdout is a JSON message with a type field:
```json
{"type": "assistant_message", "session": "abc123", "content": "I'll update the function..."}
{"type": "tool_use", "session": "abc123", "tool": "write_file", "path": "src/utils.ts"}
{"type": "tool_result", "session": "abc123", "tool": "write_file", "success": true}
{"type": "stream_delta", "session": "abc123", "delta": "Let me check the test"}
```

The Go backend reads this stream line by line, parses each JSON message, and routes it to the appropriate WebSocket client. Stderr is reserved for logging and error reporting, keeping the protocol channel clean.
We considered writing the agent runner in Go or Rust. The problem is that the Claude Agent SDK -- including its tool execution framework, conversation management, and streaming support -- is built for JavaScript. Wrapping it via FFI or reimplementing it in another language would mean maintaining a fork that drifts from upstream. Using Node.js directly means we get SDK updates on day one, and the agent runner code stays straightforward:
```javascript
const agent = new Agent({
  model: "claude-sonnet-4-20250514",
  tools: [fileReadTool, fileWriteTool, bashTool, grepTool],
  onStreamEvent: (event) => {
    process.stdout.write(JSON.stringify({
      type: "stream_delta",
      session: sessionId,
      delta: event.text,
    }) + "\n");
  },
});
```

The Node.js layer is intentionally thin. It does not manage state, serve HTTP, or handle UI concerns. It is a process that runs a conversation and streams structured output. This containment is what makes using a fourth language acceptable -- the blast radius is small.
How the layers communicate
The communication patterns between layers are deliberate and consistent:
React UI <--Tauri Commands--> Rust Shell
React UI <--HTTP/WebSocket--> Go Backend
Rust Shell --Sidecar Mgmt--> Go Backend
Go Backend --stdin/stdout--> Node.js Agent Runner
React to Rust (Tauri Commands): Used for anything that requires native OS access -- credential storage, file dialogs, deep link handling, window management. These are typed async function calls.
React to Go (HTTP and WebSocket): The bulk of application logic flows here. REST endpoints for CRUD operations, WebSocket connections for real-time streaming. The React app connects to the Go backend's local HTTP server on a dynamic port.
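Grabbing a dynamic port in Go amounts to binding to port 0 and reading back the OS's choice. A minimal sketch (how the chosen port is then handed to the frontend, via the Tauri shell, is outside this snippet):

```go
package main

import (
	"fmt"
	"net"
)

// listenDynamic binds the backend's HTTP listener to an OS-assigned
// free port on loopback, so nothing outside the machine can reach it.
func listenDynamic() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenDynamic()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("backend bound to 127.0.0.1, port", port)
	// The real backend would now call http.Serve(ln, mux) and report
	// the chosen port so the React app knows where to connect.
}
```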
Rust to Go (Sidecar Management): The Rust layer spawns the Go binary as a child process, monitors its health, and handles restart logic. Communication is minimal -- mostly process lifecycle signals.
Go to Node.js (stdout/stderr JSON Protocol): The Go backend spawns agent runner processes and communicates via newline-delimited JSON over standard I/O. This is the simplest possible IPC mechanism: no sockets to manage, no serialization libraries, no protocol buffers. A line of JSON in, a line of JSON out.
The key architectural principle: the interfaces between layers are more important than the languages themselves. Each boundary is a well-defined protocol. The Go backend does not care that the agent runner is written in Node.js -- it reads JSON from stdout. The React frontend does not care that credentials are stored by Rust -- it calls a Tauri command and gets a result. This decoupling means any layer can be rewritten without affecting the others, as long as the interface contract is maintained.
Trade-offs and lessons learned
We will not pretend this is free. Here is what a polyglot architecture actually costs:
Build complexity. Our build pipeline compiles Rust (via Cargo), Go (via go build), TypeScript (via Next.js/Vite), and bundles Node.js agent runner code. The CI matrix is non-trivial. A new contributor needs Rust, Go, and Node.js toolchains installed before they can build locally.
Onboarding cost. An engineer who knows React but not Go will need time to become productive in the backend. We mitigate this by keeping each layer focused -- you rarely need to work across all four languages for a single feature.
Debugging across boundaries. When a bug spans layers -- say, a message is malformed between the Go backend and the Node.js agent runner -- you are debugging across language boundaries, log formats, and process boundaries simultaneously. We invested in structured logging and correlation IDs early to make this bearable.
Dependency management. Four manifests to keep updated -- Cargo.toml, go.mod, and two package.json files (frontend and agent runner). Four sets of security advisories to monitor. Three ecosystems' worth of breaking changes to track.
But here is what we got in return:
Each layer is simple. The Rust layer does not contain business logic. The Go backend does not render UI. The Node.js layer does not manage state. The React app does not touch the file system directly. Each codebase is small, focused, and comprehensible on its own.
Performance where it matters. The binary is 15MB, not 200MB. Memory usage is 50MB, not 500MB. The Go backend handles concurrent sessions without breaking a sweat. The Rust shell launches instantly.
Fast iteration where it matters. UI changes -- the most frequent kind -- have sub-second feedback loops. Backend API changes compile in two seconds. Only the Rust layer has non-trivial compile times, and it changes the least.
Future-proofing. If a better agent SDK emerges in Python, we swap the Node.js runner. If a Rust GUI framework matures to the point where it is competitive with React, we could migrate the UI. Each layer is replaceable independently.
The lesson we would pass on to other teams considering polyglot architecture: do not start polyglot, arrive there. We did not sit down on day one and decide to use four languages. We started with React (because that is what we knew), chose Tauri for the shell (because Electron was too heavy), added Go for the backend (because we needed excellent concurrency), and accepted Node.js for the agent runner (because the SDK required it). Each decision was made at the boundary where a new requirement appeared.
If your app spans native OS integration, rich interactive UI, concurrent backend services, and third-party SDK integration -- no single language is the best choice for all of them. Define clean interfaces. Let each layer use the best tool. Accept the build complexity as the cost of shipping a better product.
When to go polyglot vs. monoglot
Not every app needs four languages. Here is a framework for deciding:
Stay monoglot when:
- Your app fits comfortably in one runtime (pure web app, pure CLI tool, pure mobile app)
- Your team is small and cross-language context switching would slow you down
- The ecosystem of your primary language covers all your requirements adequately
- Build and deployment simplicity is a top priority
Consider polyglot when:
- Different layers have fundamentally different performance or capability requirements
- A critical SDK or library only exists in a specific language (like our Node.js agent runner)
- You need native OS integration that your primary language cannot provide efficiently
- The layers change at very different rates (UI changes daily, native shell changes monthly)
- You can define clean, stable interfaces between layers that limit cross-language coupling
The key question: Is the complexity cost of multiple languages less than the compromise cost of forcing one language into roles it was not designed for? For ChatML, the answer was clearly yes. A 15MB native app idling around 50MB of RAM would not have been possible with a single-language approach that also delivered the UI richness and SDK compatibility we needed.
ChatML is a native macOS app for AI-assisted development with parallel agent sessions and isolated git worktrees. Download it here or check out the source on GitHub.