Agent Sandboxing

Safely execute LLM-generated code on OpenShift with defense-in-depth isolation.


When agents execute LLM-generated code — tool calls, data analysis, or code-interpreter tasks — that code runs with the same permissions as the agent process. Without isolation, a single prompt injection can read mounted secrets, exfiltrate data over the network, or escalate privileges on the host.

The code sandbox solves this by wrapping execution in multiple independent security layers. Each layer assumes the one above it has been bypassed. No single layer is sufficient; together they make exploitation impractical even when the attacker controls the input source code.

The sandbox implementation is available at github.com/eformat/code-sandbox. The repo contains the Landlock, seccomp, and guardrails code shown on this page. The fips-agents/examples repo walks through deploying it on OpenShift step by step.

Defense in Depth

The sandbox uses five layers of defense, each operating at a different level of the stack:

| Layer | Technology | What it blocks |
|---|---|---|
| 1. Static analysis | AST visitor | Dangerous calls, blocked imports, dunder traversal, SQL injection |
| 2. Isolated subprocess | python3 -I + preamble | Runtime import bypass, builtin abuse, memory exhaustion |
| 3. Filesystem restriction | Landlock LSM | Reading app source, secrets, config files outside /tmp |
| 4. Syscall filtering | seccomp BPF | Network sockets (TCP + UDP), io_uring, splice |
| 5. Cluster enforcement | NetworkPolicy + SeccompProfile + SCC | All egress traffic, container escape, privilege escalation |

Layers 1-4 are applied in-process by the sandbox application: AST analysis, Python subprocess isolation, Landlock filesystem restriction, and seccomp syscall filtering. Layer 5 uses OpenShift platform features (NetworkPolicy, SeccompProfile via the Security Profiles Operator, and a custom SCC) to enforce a final security boundary that the sandbox cannot bypass, even if fully compromised. The two sets of controls are independent: if an attacker defeats the in-process guardrails, the cluster-enforced policies still block egress traffic, prevent privilege escalation, and deny container escape.

Agent Frameworks

The sandbox supports three deployment modes:

| Mode | When to use | SANDBOX_URL |
|---|---|---|
| Standalone service (recommended) | Production on OpenShift: own Deployment + Service with NetworkPolicy and independent scaling | http://code-sandbox.<ns>.svc:8000 |
| Sidecar | Simpler networking: sandbox shares the pod network with the agent, no cross-pod traffic or client labels needed | http://localhost:8000 (default) |
| Local container | Development: run the sandbox image on your workstation | http://localhost:8000 (default) |

In all modes your agent sends code to POST /execute and gets back stdout, stderr, and exit code. Each framework wraps this in a run_code tool that the LLM calls to execute code safely. The available imports depend on the sandbox profile — minimal for stdlib only, or data-science for numpy, pandas, and scipy. Pick your framework below to jump to a working example.
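
Every framework example below repeats the same response handling, so it can help to see that step on its own. The following is a minimal sketch, not code from the sandbox repo: the field names (stdout, result, stderr, error) are taken from the run_code examples on this page, and the helper name format_sandbox_result is hypothetical.

```python
"""Sketch: turn a sandbox /execute response into text for the LLM."""

from typing import Any


def format_sandbox_result(result: dict[str, Any]) -> str:
    """Join the non-empty response fields, one per line.

    format_sandbox_result is a hypothetical name; the field names
    mirror the run_code examples on this page.
    """
    parts: list[str] = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    # "is not None" keeps falsy but meaningful results such as 0 or False.
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


print(format_sandbox_result({"stdout": "55", "stderr": ""}))
```

Keeping this in one helper would let each framework's run_code reduce to the HTTP call plus a single formatting call.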

Environment variables

Set SANDBOX_URL to point at your sandbox. For standalone service mode, use the in-cluster service URL (e.g. http://code-sandbox.<namespace>.svc:8000). For local development or sidecar mode the default http://localhost:8000 works without changes.

Environment variables
# Sandbox service URL
# Standalone: export SANDBOX_URL="http://code-sandbox.<namespace>.svc:8000"
# Sidecar / local: SANDBOX_URL defaults to http://localhost:8000

# LLM endpoint (OpenAI or any compatible API)
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL_NAME="gpt-4o-mini"

# Optional: use a local or hosted model instead
# export OPENAI_BASE_URL="http://maas.apps.my-cluster.example.com/v1"

LangGraph

LangGraph wraps the sandbox call as a plain Python function registered with create_react_agent. The LLM generates Python code, calls run_code, and summarizes the output.

run_code tool — Sends LLM-generated code to the sandbox via HTTP
"""Code interpreter agent with sandbox — LangGraph."""

import os
import httpx
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Sandbox service URL — standalone service or localhost for sidecar/local
SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"

    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = ChatOpenAI(
    model=os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)
agent = create_react_agent(
    llm,
    tools=[run_code],
    prompt="You are a code interpreter. Write Python code to solve "
           "the user's problem, execute it with run_code, and "
           "summarize the output.",
)

result = agent.invoke(
    {"messages": [{"role": "user",
                   "content": "Calculate the first 20 Fibonacci numbers"}]}
)

for msg in result["messages"]:
    print(f"{msg.type}: {msg.content}")

CrewAI

CrewAI uses the @tool decorator to register the sandbox call. The agent's role and goal instruct the LLM to write and execute code for every task.

@tool("run_code") — CrewAI tool decorator wrapping the sandbox call
"""Code interpreter agent with sandbox — CrewAI."""

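# Workaround for hosts whose system SQLite is too old for CrewAI's
# dependencies: swap in pysqlite3 before anything imports sqlite3.
__import__("pysqlite3")
import sys
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")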

import os
import httpx
from crewai import Agent, Task, Crew, LLM
from crewai.tools import tool

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


@tool("run_code")
def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"

    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = LLM(
    model=f"openai/{os.environ.get('OPENAI_MODEL_NAME', 'gpt-4o-mini')}",
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

coder = Agent(
    role="Code Interpreter",
    goal="Write and execute Python code to solve problems",
    backstory="You are an expert Python programmer. Write code, "
              "run it in the sandbox via the run_code tool, and "
              "report the results.",
    llm=llm,
    tools=[run_code],
    max_iter=5,
)

task = Task(
    description="Calculate the first 20 Fibonacci numbers using "
                "Python. Use the run_code tool to execute your code.",
    expected_output="The first 20 Fibonacci numbers",
    agent=coder,
)

crew = Crew(agents=[coder], tasks=[task])
result = crew.kickoff()
print(result.raw)

AutoGen

AutoGen registers the sandbox call as a tool function on an AssistantAgent. The agent generates code and executes it in a single turn.

AssistantAgent tools=[run_code] — Tool function registered on the agent
"""Code interpreter agent with sandbox — AutoGen."""

import os
import asyncio
import httpx
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"

    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


model_name = os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini")
model_client = OpenAIChatCompletionClient(
    model=model_name,
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "structured_output": True,
        "family": "unknown",
    },
)

agent = AssistantAgent(
    name="code_interpreter",
    model_client=model_client,
    tools=[run_code],
    system_message="You are a code interpreter. Write Python code "
                   "to solve problems, execute it with run_code, "
                   "and summarize the output.",
)


async def main():
    result = await agent.run(
        task="Calculate the first 20 Fibonacci numbers"
    )
    print(result.messages[-1].content)


asyncio.run(main())

LlamaIndex

LlamaIndex wraps the function with FunctionTool.from_defaults and passes it to a ReActAgent. The agent reasons about what code to write, executes it, and interprets the results.

FunctionTool.from_defaults(fn=run_code) — ReAct agent with sandbox code execution
"""Code interpreter agent with sandbox — LlamaIndex."""

import os
import asyncio
import httpx
from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"

    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = OpenAILike(
    model=os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini"),
    api_base=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
    is_chat_model=True,
    is_function_calling_model=False,
    context_window=128000,
)

react_agent = ReActAgent(
    name="code_interpreter",
    description="Executes Python code in a sandbox",
    tools=[FunctionTool.from_defaults(fn=run_code)],
    llm=llm,
)

agent = AgentWorkflow(
    agents=[react_agent], root_agent="code_interpreter"
)


async def main():
    response = await agent.run(
        "Calculate the first 20 Fibonacci numbers"
    )
    print(response)


asyncio.run(main())

Google ADK

Google Agent Development Kit (ADK) passes the function directly as a tool. The agent instruction tells the LLM to write Python and use run_code for execution.

tools=[run_code] — ADK agent with sandbox code execution
"""Code interpreter agent with sandbox — Google ADK."""

import os
import asyncio
import httpx
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"

    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


model = LiteLlm(
    model=f"openai/{os.environ.get('OPENAI_MODEL_NAME', 'gpt-4o-mini')}",
    api_base=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

agent = Agent(
    name="code_interpreter",
    model=model,
    description="A code interpreter that runs Python safely",
    instruction="You are a code interpreter. Write Python code to "
                "solve problems, execute it with run_code, and "
                "summarize the output.",
    tools=[run_code],
)

session_service = InMemorySessionService()
runner = Runner(
    agent=agent, app_name="code_interpreter",
    session_service=session_service,
)


async def main():
    session = await session_service.create_session(
        app_name="code_interpreter", user_id="user1",
    )
    message = types.Content(
        role="user",
        parts=[types.Part(
            text="Calculate the first 20 Fibonacci numbers"
        )],
    )
    async for event in runner.run_async(
        user_id="user1", session_id=session.id,
        new_message=message,
    ):
        if event.is_final_response():
            print(event.content.parts[0].text)


asyncio.run(main())

Static Analysis

Before any code executes, the sandbox parses it into an AST and walks every node looking for policy violations. This catches the obvious attacks — calling eval(), importing subprocess, traversing __globals__ — before the code ever reaches a Python interpreter.

AST guardrails

The AST visitor checks bare function calls, attribute access, subscript access, string literals, f-strings, and %-format operations in a single pass:

AST blocked patterns
# Blocked bare function calls
BLOCKED_CALLS = frozenset({
    "eval", "exec", "compile", "__import__", "open",
    "getattr", "setattr", "delattr", "breakpoint", "input",
    "globals", "locals", "vars",
})

# Blocked attribute access on any object
BLOCKED_DUNDERS = frozenset({
    "__subclasses__", "__globals__", "__builtins__",
    "__class__", "__bases__", "__mro__",
    "__dict__", "__code__", "__closure__",
    "__getattribute__", "__getattr__", "__self__",
    "__loader__", "__spec__", "__func__", "__wrapped__",
})

# Frame/generator attributes that expose execution frames
BLOCKED_FRAME_ATTRS = frozenset({
    "f_globals", "f_locals", "f_builtins", "f_code",
    "gi_frame", "gi_code", "cr_frame", "cr_code",
})

# Private module references (e.g. random._os -> os)
BLOCKED_MODULE_ALIASES = frozenset({
    "_os", "_sys", "_subprocess", "_socket", "_signal",
    "_ctypes", "_multiprocessing", "_pickle",
})

The visitor also scans for credential patterns, path traversal sequences, and SQL injection via .format(), f-strings, and %-formatting.
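
To make the single-pass idea concrete, here is a minimal sketch of an AST visitor that flags blocked bare calls and dunder attribute access. It is not the repo's visitor: the class and function names are hypothetical, the blocklists are small subsets of the ones shown above, and the subscript, string-literal, and format-string checks are left out.

```python
"""Sketch: flag blocked calls and dunder attribute access with ast."""

import ast

# Small illustrative subsets of the blocklists shown above.
BLOCKED_CALLS = frozenset({"eval", "exec", "open", "getattr"})
BLOCKED_DUNDERS = frozenset({"__globals__", "__class__", "__subclasses__"})


class GuardrailVisitor(ast.NodeVisitor):
    """Collect violations in one pass over the tree (hypothetical class)."""

    def __init__(self) -> None:
        self.violations: list[str] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Only bare names are checked here, e.g. eval(...), not obj.eval(...).
        if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            self.violations.append(f"line {node.lineno}: call to {node.func.id}()")
        self.generic_visit(node)  # keep walking into arguments

    def visit_Attribute(self, node: ast.Attribute) -> None:
        if node.attr in BLOCKED_DUNDERS:
            self.violations.append(f"line {node.lineno}: access to .{node.attr}")
        self.generic_visit(node)


def find_violations(source: str) -> list[str]:
    """Parse source and return a list of human-readable violations."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    visitor = GuardrailVisitor()
    visitor.visit(tree)
    return visitor.violations


print(find_violations("x = eval('1+1')\ny = ().__class__"))
```

Because the check runs on the parse tree, the submitted code is never executed at this stage; anything the visitor cannot see statically is left to the later layers.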

Validation flow
from sandbox.guardrails import validate_code

# Validate LLM-generated code before execution
violations = validate_code(source, allowed_imports=profile.allowed_imports)

if violations:
    return {"error": "Code rejected", "violations": violations}

# Safe to execute — pass to the sandbox
result = await execute_code(source, timeout=30.0)

Import allowlist

Only stdlib modules with no filesystem or network capabilities are permitted. Profiles can extend the allowlist — the data-science profile adds numpy, pandas, and scipy with additional blocklist rules for dangerous attributes on those libraries.

Import profiles
# Minimal profile — stdlib only, no filesystem or network access
ALLOWED_IMPORTS = frozenset({
    "math", "statistics", "itertools", "functools",
    "re", "datetime", "collections", "json", "csv",
    "string", "textwrap", "decimal", "fractions",
    "random", "operator", "typing",
})

# Data-science profile — extends minimal with numpy/pandas/scipy
DATA_SCIENCE_IMPORTS = ALLOWED_IMPORTS | frozenset({
    "numpy", "pandas", "scipy",
})
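
A static allowlist check of this kind could look roughly like the sketch below. It assumes the check compares only the top-level package name of each import statement; the function name disallowed_imports and the shortened allowlist are illustrative and not taken from the repo.

```python
"""Sketch: check import statements against an allowlist with ast."""

import ast

ALLOWED = frozenset({"math", "json", "statistics"})  # illustrative subset


def disallowed_imports(source: str, allowed: frozenset[str] = ALLOWED) -> list[str]:
    """Return top-level module names imported by source but not allowed."""
    rejected: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no meaning in a single script, so
            # they are reported under a placeholder name.
            names = [node.module or ""] if node.level == 0 else ["<relative>"]
        else:
            continue
        for name in names:
            top = name.split(".")[0]  # "os.path" is judged as "os"
            if top not in allowed:
                rejected.append(top)
    return rejected


print(disallowed_imports("import math\nimport os.path\nfrom subprocess import run"))
```

A check like this only sees literal import statements. Module names assembled at runtime are invisible to it, which is why the subprocess also installs a runtime import hook.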

Isolated Subprocess

Validated code runs in a separate python3 -I subprocess. The -I flag enables isolated mode: user site-packages are disabled and PYTHON* environment variables are ignored. Code is written to a temporary file under /tmp and cleaned up unconditionally after execution.

Subprocess executor
async def execute_code(
    code: str,
    timeout: float = 10.0,
    *,
    memory_limit_mb: int = 512,
    allowed_imports: frozenset[str] | None = None,
    subprocess_landlock: bool = True,
    subprocess_seccomp: bool = True,
) -> ExecutionResult:
    # Build the defense-in-depth preamble
    code = build_memory_preamble(memory_limit_mb) + code
    code = build_preamble(
        allowed_imports=allowed_imports,
        landlock=subprocess_landlock,
        seccomp=subprocess_seccomp,
    ) + code

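    # Write to a temp file and execute in isolated mode.
    # mode="w" is needed because code is a str and NamedTemporaryFile
    # opens in binary mode by default.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", dir="/tmp", delete=False
    ) as tmp:
        tmp.write(code)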

    process = await asyncio.create_subprocess_exec(
        "python3", "-I", tmp.name,  # -I = isolated mode
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    raw_stdout, raw_stderr = await asyncio.wait_for(
        process.communicate(), timeout=timeout
    )
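
The excerpt above stops before the timeout handling and cleanup that the text describes. The sketch below shows one way those two steps might fit together. It is a simplified stand-in rather than the repo's executor: run_isolated is a hypothetical name, no preamble is injected, and sys.executable is used in place of python3 so the example runs anywhere.

```python
"""Sketch: run a script under python -I with a timeout and cleanup."""

from __future__ import annotations

import asyncio
import os
import sys
import tempfile


async def run_isolated(code: str, timeout: float = 10.0) -> tuple[int | None, str, str]:
    """Return (exit_code, stdout, stderr); exit_code is None on timeout."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
    try:
        process = await asyncio.create_subprocess_exec(
            sys.executable, "-I", tmp.name,  # -I = isolated mode
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            out, err = await asyncio.wait_for(process.communicate(), timeout)
        except asyncio.TimeoutError:
            process.kill()        # wait_for cancels the wait, not the child
            await process.wait()  # reap the killed process
            return None, "", "timed out"
        return process.returncode, out.decode(), err.decode()
    finally:
        os.unlink(tmp.name)       # remove the script on every code path


if __name__ == "__main__":
    print(asyncio.run(run_isolated("print(1 + 1)")))
```

Putting the unlink in a finally block is what makes the cleanup unconditional: it runs on success, on timeout, and if spawning the process fails.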

Runtime preamble

A defense-in-depth preamble is prepended to every execution. Even if an attacker constructs an import name dynamically (via chr(), bytes.decode(), etc.) to bypass the AST check, the runtime import hook blocks it. The preamble also removes dangerous builtins and monkey-patches operator.attrgetter to reject dunder access:

Runtime preamble (injected before user code)
# Preamble injection order in the subprocess:
# 1. RLIMIT_AS memory limit (before any imports)
# 2. Pre-imports (pandas, numpy — need full builtins to init)
# 3. Landlock second ruleset (tighter filesystem)
# 4. Seccomp BPF filter (block networking + io_uring)
# 5. Runtime import hook + dunder blocking + builtin purging
# 6. User code executes

# Runtime import hook — blocks any module not in the allowlist
def _rimp(name, gl=None, lo=None, fromlist=(), level=0):
    top = name.split('.')[0]
    if level == 0 and top not in _allowed:
        caller = (gl or {}).get('__name__', '__main__')
        if caller == '__main__':
            raise ImportError(f"import of '{name}' blocked by sandbox")
    return _orig(name, gl, lo, fromlist, level)

# Remove dangerous builtins
for _name in ('open', 'breakpoint', 'input'):
    builtins.pop(_name, None)

# Monkey-patch operator to reject dunder attribute access
_orig_ag = operator.attrgetter
def _safe_attrgetter(*attrs):
    for a in attrs:
        for part in str(a).split('.'):
            if _dunder_re.match(part):
                raise RuntimeError('dunder access blocked by sandbox')
    return _orig_ag(*attrs)
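
The excerpt relies on names defined elsewhere in the preamble (_allowed, _orig, _dunder_re). As a self-contained illustration of the import hook alone, here is a minimal sketch; make_import_hook is a hypothetical helper, and the hook is called directly rather than installed so the example does not alter the interpreter it runs in.

```python
"""Sketch: a runtime import hook that only restricts user code."""

import builtins
from typing import Callable


def make_import_hook(
    allowed: frozenset[str],
    original: Callable = builtins.__import__,
) -> Callable:
    """Build a replacement for builtins.__import__ (hypothetical helper)."""

    def hook(name, gl=None, lo=None, fromlist=(), level=0):
        top = name.split(".")[0]
        caller = (gl or {}).get("__name__", "__main__")
        # Only absolute imports issued by the user script are checked, so
        # allowed libraries can still import their own dependencies.
        if level == 0 and caller == "__main__" and top not in allowed:
            raise ImportError(f"import of '{name}' blocked by sandbox")
        return original(name, gl, lo, fromlist, level)

    return hook


hook = make_import_hook(frozenset({"math"}))
print(hook("math", {"__name__": "__main__"}).sqrt(16.0))
try:
    hook("socket", {"__name__": "__main__"})
except ImportError as exc:
    print(exc)
```

In a real preamble the hook would be assigned to builtins.__import__ before user code runs. Because the check happens when the import is executed, it sees the final module name no matter how the string was constructed.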

Resource limits

The subprocess applies RLIMIT_AS before any imports to cap memory usage (default 512 MB, configurable per profile). A wall-clock timeout (default 10s, max 30s) kills the process if it hangs. Stdout and stderr are capped at 50 KB each to prevent output flooding.
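
As a rough illustration of two of these limits, the sketch below generates a memory-limit preamble and truncates captured output. Both function names are hypothetical; memory_preamble only approximates what the repo's build_memory_preamble might emit, and the truncation marker text is an assumption.

```python
"""Sketch: cap captured output and build an RLIMIT_AS preamble."""

MAX_OUTPUT_BYTES = 50 * 1024  # 50 KB, matching the cap described above


def cap_output(raw: bytes, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Decode subprocess output, truncating anything past the limit."""
    text = raw[:limit].decode("utf-8", errors="replace")
    if len(raw) > limit:
        text += "\n[output truncated]"  # marker text is an assumption
    return text


def memory_preamble(limit_mb: int) -> str:
    """Return source that caps the address space (Unix only).

    Hypothetical stand-in for the repo's build_memory_preamble.
    """
    limit = limit_mb * 1024 * 1024
    return (
        "import resource as _r\n"
        f"_r.setrlimit(_r.RLIMIT_AS, ({limit}, {limit}))\n"
        "del _r\n"
    )


print(memory_preamble(512))
print(cap_output(b"x" * 10, limit=4))
```

Setting the soft and hard limits to the same value means the sandboxed code cannot raise the limit again from inside the subprocess.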

Landlock Filesystem Restriction

Landlock is a Linux Security Module (LSM) that restricts filesystem access without requiring root privileges. It is available on RHEL 9.2+ and enabled by default on OpenShift 4.18+.

Landlock rules are inherited by child processes — applying them to the FastAPI app at startup automatically restricts the code execution subprocess. The subprocess then applies a second, tighter Landlock ruleset that drops paths the parent process needed but the subprocess should never access.

Parent process

The parent process allows read-only access to system paths and read-write access to /tmp only:

Landlock filesystem paths
# Parent process Landlock paths (applied at FastAPI startup)
READ_ONLY_PATHS = [
    "/usr",           # Python binary, stdlib, system tools
    "/lib",           # Shared libraries
    "/lib64",         # 64-bit shared libraries
    "/etc",           # Timezone, locale, ld.so.cache
    "/opt/app-root",  # UBI app directory (FastAPI app)
    "/proc/self",     # Python reads /proc/self/fd, /proc/self/status
]
READ_WRITE_PATHS = ["/tmp"]

# Subprocess Landlock paths (tighter — drops /opt/app-root, /etc)
SUBPROCESS_READ_ONLY = ["/usr", "/lib", "/lib64", "/proc/self"]
SUBPROCESS_READ_WRITE = ["/tmp"]

Applying Landlock at startup
from sandbox.landlock import apply_sandbox_landlock

# Apply at FastAPI startup — inherited by all subprocesses
status = apply_sandbox_landlock()

# status.applied    → True if Landlock is active
# status.abi_version → ABI version reported by the kernel (1 and up)
# status.rules_applied → ["ro:/usr", "ro:/lib", ..., "rw:/tmp"]

# Landlock requires no_new_privs (set automatically by
# OpenShift restricted-v2 SCC via allowPrivilegeEscalation: false)

# ABI version matrix:
#   v1: filesystem restrictions (Linux 5.13, RHEL 9.2+)
#   v2: cross-directory rename/link (Linux 5.19)
#   v3: TRUNCATE right (Linux 6.2)
#   v4: TCP bind/connect restrictions (Linux 6.7)
#   v5: ioctl restrictions on device files (Linux 6.10)
#   v6: abstract Unix socket + signal scope (Linux 6.12)

Landlock requires the no_new_privs bit, which OpenShift's restricted-v2 SCC sets automatically via allowPrivilegeEscalation: false. No extra privileges are needed.
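
Which features can be applied depends on the ABI version the running kernel reports. The sketch below probes that version directly through the landlock_create_ruleset syscall. It is independent of the repo's sandbox.landlock module: the function name is hypothetical, and the syscall number and flag come from the Linux UAPI headers.

```python
"""Sketch: ask the kernel which Landlock ABI version it supports."""

from __future__ import annotations

import ctypes
import sys

SYS_LANDLOCK_CREATE_RULESET = 444    # same number on x86_64 and aarch64
LANDLOCK_CREATE_RULESET_VERSION = 1  # flag: return the ABI version


def landlock_abi_version() -> int | None:
    """Return the Landlock ABI version, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    version = libc.syscall(
        ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
        ctypes.c_void_p(None),  # attr must be NULL for a version query
        ctypes.c_size_t(0),     # size must be 0 for a version query
        ctypes.c_uint(LANDLOCK_CREATE_RULESET_VERSION),
    )
    # -1 means the kernel lacks Landlock (ENOSYS) or has it disabled
    # (EOPNOTSUPP); either way there is nothing to apply.
    return version if version >= 1 else None


print(landlock_abi_version())
```

A startup routine could use the returned number to decide which optional rules to request, falling back to filesystem-only restrictions on older kernels.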

Subprocess tightening

The subprocess applies a second Landlock ruleset that drops /opt/app-root (application source) and /etc (mounted secrets and config). Even if all Python-level defenses are bypassed, the kernel prevents reading application code or credentials. On ABI v4+ kernels, TCP bind and connect are also denied. On ABI v6+, abstract Unix sockets and cross-process signals are scoped.
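
The effect of the second ruleset can be modelled by comparing the two path lists. The sketch below is only a string-level model for reasoning about the configuration; the kernel enforces Landlock on the real file hierarchy, not on path strings, and is_under is a hypothetical helper.

```python
"""Sketch: compare parent and subprocess Landlock path sets."""

import posixpath

PARENT_READ_ONLY = ["/usr", "/lib", "/lib64", "/etc", "/opt/app-root", "/proc/self"]
SUBPROCESS_READ_ONLY = ["/usr", "/lib", "/lib64", "/proc/self"]


def is_under(path: str, roots: list[str]) -> bool:
    """True if path equals or sits beneath one of the allowed roots."""
    path = posixpath.normpath(path)  # collapse ".." before comparing
    return any(posixpath.commonpath([path, root]) == root for root in roots)


# Paths the parent may read but the subprocess may not.
dropped = sorted(set(PARENT_READ_ONLY) - set(SUBPROCESS_READ_ONLY))
print(dropped)
print(is_under("/etc/passwd", PARENT_READ_ONLY))
print(is_under("/etc/passwd", SUBPROCESS_READ_ONLY))
```

Comparing whole path components with commonpath, rather than using a plain string prefix test, avoids treating /lib64 as if it were inside /lib.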

Seccomp Syscall Filtering

The subprocess installs a seccomp BPF filter that blocks all networking syscalls, io_uring (a container escape vector), and splice() (used in CVE-2026-31431). This closes the UDP gap that Landlock v4 does not cover — Landlock restricts TCP but not UDP, while seccomp blocks socket() entirely:

Blocked syscalls (x86_64)
# Syscalls blocked by the subprocess BPF filter
BLOCKED_SYSCALLS = {
    # All networking — closes the UDP gap Landlock v4 doesn't cover
    "socket": 41,  "connect": 42,  "accept": 43,
    "sendto": 44,  "recvfrom": 45, "sendmsg": 46,
    "recvmsg": 47, "bind": 49,     "listen": 50,
    "setsockopt": 54, "getsockopt": 55, "accept4": 288,

    # io_uring — container escape vector
    "io_uring_setup": 425,
    "io_uring_enter": 426,
    "io_uring_register": 427,

    # splice — CVE-2026-31431 (Copy Fail privilege escalation)
    "splice": 275,
}

# Filter uses SECCOMP_RET_ERRNO (EPERM) for clean errors
# Wrong-architecture processes are killed immediately

The filter uses SECCOMP_RET_ERRNO with EPERM so the subprocess receives a clean error rather than being killed. Wrong-architecture processes are killed immediately with SECCOMP_RET_KILL_PROCESS. Both x86_64 and aarch64 are supported.
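
To show how such a filter is laid out, the sketch below assembles a deny-list program as raw sock_filter bytes for x86_64. It only builds the bytes and does not install them, which would additionally need prctl(PR_SET_NO_NEW_PRIVS) and the seccomp syscall. The layout is one straightforward option and not necessarily the one the repo uses; the opcode and return-value constants are from the Linux UAPI headers.

```python
"""Sketch: assemble a seccomp BPF deny-list program as raw bytes."""

import struct

# Classic BPF opcodes and seccomp return values from the Linux UAPI headers.
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
BPF_JMP_JEQ_K = 0x15  # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06      # BPF_RET | BPF_K
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
EPERM = 1


def insn(code: int, jt: int, jf: int, k: int) -> bytes:
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k)


def build_filter(blocked: dict[str, int]) -> bytes:
    """Return a program that denies the given syscall numbers with EPERM."""
    prog = [
        insn(BPF_LD_W_ABS, 0, 0, 4),                      # load seccomp_data.arch
        insn(BPF_JMP_JEQ_K, 1, 0, AUDIT_ARCH_X86_64),     # expected arch: skip the kill
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),  # wrong arch: kill
        insn(BPF_LD_W_ABS, 0, 0, 0),                      # load seccomp_data.nr
    ]
    for number in blocked.values():
        prog.append(insn(BPF_JMP_JEQ_K, 0, 1, number))    # no match: skip the deny
        prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM))
    prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))  # everything else passes
    return b"".join(prog)


program = build_filter({"socket": 41, "splice": 275})
print(len(program) // 8, "instructions")
```

Each blocked syscall costs two instructions in this layout, a compare and a return. A production filter for x86_64 would also need to account for x32 syscall numbers, which this sketch ignores.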

OpenShift Deployment

The recommended deployment pattern runs the sandbox as a standalone service — its own Deployment and Service in the cluster. The agent reaches it via the internal service URL (http://code-sandbox.<namespace>.svc:8000). Agent pods need the code-sandbox-client: "true" label to pass the NetworkPolicy. The cluster enforces a final layer of security that the sandbox cannot bypass, even if fully compromised.

Alternatively, the sandbox can run as a sidecar container in the agent pod, sharing the pod network namespace so the agent reaches it at localhost:8000. Use sidecar mode when you want simpler networking (no cross-pod traffic, no client labels needed) and are willing to couple the sandbox lifecycle to the agent pod.

Sidecar deployment (alternative)

If you prefer to run the sandbox as a sidecar instead of a standalone service, enable it in your Helm chart values. The template adds a second container to the agent pod with a read-only root filesystem, all capabilities dropped, and /tmp as the only writable path (10 Mi limit). Since the sidecar shares the pod network, the agent reaches it at localhost:8000 with no NetworkPolicy client labels required.

values.yaml — sandbox sidecar
# chart/values.yaml — enable the sandbox sidecar (alternative to standalone)
sandbox:
  enabled: true
  # Profile controls which imports are allowed:
  #   minimal      — stdlib only (math, json, csv, etc.)
  #   data-science — adds numpy, pandas, scipy
  profile: minimal
  image:
    repository: code-sandbox
    tag: latest
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
  # Requires Security Profiles Operator + custom SCC
  seccomp:
    enabled: false

deployment.yaml — sidecar container
# chart/templates/deployment.yaml — sandbox sidecar container (alternative)
containers:
  - name: agent
    image: "my-agent:latest"
    env:
      - name: SANDBOX_URL
        value: "http://localhost:8000"
    # ... agent container config ...

  - name: sandbox
    image: "code-sandbox:latest"
    ports:
      - containerPort: 8000
    env:
      - name: SANDBOX_PROFILE
        value: "minimal"
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8000
    volumeMounts:
      - name: sandbox-tmp
        mountPath: /tmp
volumes:
  - name: sandbox-tmp
    emptyDir:
      sizeLimit: 10Mi

The SANDBOX_URL environment variable is injected into the agent container automatically when sandbox.enabled is true.

Profiles

Profiles control which imports are available and which scan stages run. The SANDBOX_PROFILE environment variable selects the active profile:

Profile definitions
# profiles/minimal.yaml — default, stdlib-only
name: minimal
imports:
  allowed:
    - math
    - statistics
    - itertools
    - functools
    - re
    - datetime
    - collections
    - json
    - csv
    - string
    - textwrap
    - decimal
    - fractions
    - random
    - operator
    - typing
blocklist: []
resources:
  memory: 256Mi
  cpu: 500m
  timeout_max: 30.0

---
# profiles/data-science.yaml — extends minimal
name: data-science
extends: minimal
preimport:
  - numpy
  - pandas
  - scipy
imports:
  additional:
    - numpy
    - pandas
    - scipy
blocklist:
  - [numpy, ctypeslib]
  - [numpy, frompyfunc]
  - [pandas, read_pickle]
  - [pandas, read_sql]
  - [pandas, read_html]
  - [pandas, read_excel]
  - [pandas, read_parquet]
  - [scipy.io, loadmat]
  - [scipy.io, savemat]
resources:
  memory: 512Mi
  subprocess_memory_mb: 800

| Profile | Imports | Memory | Use case |
|---|---|---|---|
| minimal | stdlib only (math, json, csv, etc.) | 256 Mi | Computation, formatting, string manipulation |
| data-science | + numpy, pandas, scipy | 512 Mi | Data analysis, numerical computation, statistics |

The data-science profile adds a blocklist audit stage that blocks dangerous attributes on the allowed libraries — numpy.ctypeslib, pandas.read_pickle, scipy.io.loadmat, and other filesystem/deserialization methods. Libraries are pre-imported before restrictions are applied so their initialization calls (which use open() and internal imports) succeed normally.
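
One plausible reading of the extends and imports.additional keys is that a child profile inherits its parent's imports and adds its own. The sketch below resolves a profile under that assumption, using plain dicts in place of the YAML files. The merge semantics, the function names, and the shortened lists are illustrative and not confirmed by the repo.

```python
"""Sketch: resolve profile inheritance and check a blocklist entry."""

PROFILES = {
    "minimal": {
        "imports": {"allowed": ["math", "json", "csv"]},  # shortened list
        "blocklist": [],
    },
    "data-science": {
        "extends": "minimal",
        "imports": {"additional": ["numpy", "pandas", "scipy"]},
        "blocklist": [["numpy", "ctypeslib"], ["pandas", "read_pickle"]],
    },
}


def resolve_imports(name: str, profiles: dict = PROFILES) -> frozenset[str]:
    """Merge a profile's imports with those of the profile it extends."""
    profile = profiles[name]
    imports = profile.get("imports", {})
    resolved = set(imports.get("allowed", [])) | set(imports.get("additional", []))
    parent = profile.get("extends")
    if parent:
        resolved |= resolve_imports(parent, profiles)  # walk up the chain
    return frozenset(resolved)


def is_blocked(name: str, module: str, attr: str, profiles: dict = PROFILES) -> bool:
    """True if [module, attr] appears in the named profile's blocklist."""
    return [module, attr] in profiles[name].get("blocklist", [])


print(sorted(resolve_imports("data-science")))
print(is_blocked("data-science", "pandas", "read_pickle"))
```

The resolved set is what an allowlist check or runtime import hook would consult, while the blocklist pairs feed the separate attribute audit.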

Pod security context

The deployment runs as non-root with a read-only root filesystem, all capabilities dropped, and a localhost seccomp profile managed by the Security Profiles Operator:

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-sandbox
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - name: sandbox
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: Localhost
              localhostProfile: operator/code-sandbox-sandbox.json
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 10Mi

The /tmp mount is an emptyDir with a 10 Mi size limit — the only writable path in the container.

NetworkPolicy

In standalone service mode (the recommended pattern), a zero-egress NetworkPolicy prevents the sandbox from making any outbound connections. Ingress is restricted to pods with the code-sandbox-client: "true" label on port 8000. This is the default configuration deployed by the Helm chart.

When using sidecar mode instead, NetworkPolicy is not needed — the sandbox container only listens on localhost inside the pod.

networkpolicy.yaml (standalone mode only)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: code-sandbox
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: code-sandbox
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # Only pods with this label can connect
        - podSelector:
            matchLabels:
              code-sandbox-client: "true"
      ports:
        - port: 8000
          protocol: TCP
  # Zero egress — no outbound traffic allowed
  egress: []

For standalone mode, agent pods must include the client label:

metadata:
  labels:
    code-sandbox-client: "true"

SeccompProfile (SPO)

The Security Profiles Operator deploys a syscall allowlist as a SeccompProfile custom resource. The default action is SCMP_ACT_ERRNO — any syscall not explicitly allowed is denied:

seccomp-profile.yaml
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: code-sandbox-sandbox
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    # Allow: process mgmt, file I/O, memory, signals, timing
    - action: SCMP_ACT_ALLOW
      names: [fork, vfork, clone, clone3, execve, wait4, exit, exit_group]
    - action: SCMP_ACT_ALLOW
      names: [read, write, openat, close, lseek, dup, pipe2, fcntl]
    - action: SCMP_ACT_ALLOW
      names: [mmap, mprotect, munmap, mremap, brk, madvise]
    - action: SCMP_ACT_ALLOW
      names: [stat, fstat, newfstatat, access, getcwd, getdents64]
    # Allow: networking (uvicorn needs it, subprocess BPF blocks it)
    - action: SCMP_ACT_ALLOW
      names: [socket, bind, listen, accept4, connect, sendto, recvfrom]
    # Allow: Landlock LSM syscalls
    - action: SCMP_ACT_ALLOW
      names: [landlock_create_ruleset, landlock_add_rule, landlock_restrict_self]
    # Block: dangerous syscalls
    - action: SCMP_ACT_ERRNO
      names:
        - io_uring_setup      # container escape vector
        - io_uring_enter
        - io_uring_register
        - splice              # CVE-2026-31431
        - ptrace
        - process_vm_readv
        - process_vm_writev
        - mount
        - umount2
        - chroot
        - bpf
        - unshare
        - setns

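The explicit deny list covers io_uring, splice, ptrace, cross-process memory access (process_vm_readv/process_vm_writev), mount operations, chroot, namespace manipulation (unshare/setns), and BPF, all common privilege escalation and container escape vectors. Kernel module loading (init_module, finit_module) does not appear in the listing above; with a default action of SCMP_ACT_ERRNO it is denied anyway unless it is allowlisted. Networking syscalls are allowed at the container level because uvicorn needs them; the subprocess BPF filter blocks them for user code.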
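The allowlist logic can be modeled in a few lines. The sketch below is a simplified illustration, not how the kernel or SPO evaluates filters: it ignores argument conditions, uses an abbreviated copy of the profile, and `resolve_action` is a hypothetical name.

```python
# Abbreviated copy of the profile above, for illustration only.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"action": "SCMP_ACT_ALLOW", "names": ["read", "write", "openat", "close"]},
        {"action": "SCMP_ACT_ALLOW", "names": ["socket", "bind", "listen", "accept4"]},
        {"action": "SCMP_ACT_ERRNO", "names": ["io_uring_setup", "ptrace", "mount"]},
    ],
}


def resolve_action(profile: dict, syscall: str) -> str:
    """Return the action this simplified model assigns to `syscall`."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"]
    # No explicit rule matched, so the default action applies.
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "ptrace", "init_module"):
        print(f"{name}: {resolve_action(PROFILE, name)}")
```

In this model `read` resolves to SCMP_ACT_ALLOW, `ptrace` is denied by an explicit rule, and `init_module` is denied by falling through to the default action. That last case is why an allowlist profile stays safe when a new syscall is introduced.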

Custom SCC

OpenShift's restricted-v2 SCC permits only the runtime/default seccomp profile, so a custom SCC is needed to allow the SPO localhost profile:

SecurityContextConstraints
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: code-sandbox-seccomp
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities: []
defaultAddCapabilities: []
requiredDropCapabilities:
  - ALL
readOnlyRootFilesystem: true
runAsUser:
  type: MustRunAsRange
fsGroup:
  type: MustRunAs
  ranges:
    - min: 1
      max: 65534
seLinuxContext:
  type: MustRunAs
seccompProfiles:
  - runtime/default
  - localhost/operator/code-sandbox-sandbox.json
volumes:
  - emptyDir
  - projected
  - configMap
  - secret
  - downwardAPI
  - persistentVolumeClaim

Bind it to the default service account in the sandbox namespace:

oc adm policy add-scc-to-user code-sandbox-seccomp -z default -n code-sandbox
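Once the pod is running under this SCC, the result can be checked from inside the container by reading /proc/self/status. The kernel reports the seccomp mode there (0 disabled, 1 strict, 2 filter) along with the no_new_privs flag, which Kubernetes sets when allowPrivilegeEscalation is false. The helper names below are hypothetical; this is a sketch, not part of the sandbox.

```python
from pathlib import Path

# Values of the Seccomp field in /proc/<pid>/status.
SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(text: str) -> dict:
    """Extract the Seccomp and NoNewPrivs fields from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("Seccomp", "NoNewPrivs"):
            fields[key] = value.strip()
    return fields


def describe(fields: dict) -> str:
    """Summarize the parsed fields in one line."""
    mode = SECCOMP_MODES.get(fields.get("Seccomp", ""), "unknown")
    no_new_privs = fields.get("NoNewPrivs") == "1"
    return f"seccomp={mode} no_new_privs={no_new_privs}"


if __name__ == "__main__":
    print(describe(parse_status(Path("/proc/self/status").read_text())))
```

A result of `seccomp=filter` confirms that some filter is active, but not which profile was loaded; runtime/default produces the same value.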

Deploying

The sandbox deploys to OpenShift via an in-cluster binary build and Helm chart, following the same pattern as the agent build. Landlock LSM support requires nodes running RHEL 9.6+ or RHCOS based on RHEL 9.6+; on older kernels the sandbox degrades gracefully.

Clone the repo

Clone the sandbox source. The build uses a Containerfile (not Dockerfile) with a UBI 9 Python 3.12 base image:

Clone
# Clone the sandbox source
git clone https://github.com/eformat/code-sandbox
cd code-sandbox

Build the image

Create a binary BuildConfig, patch it to use the Containerfile, and start the build. The image is pushed to the internal registry:

Build steps
# Create an OpenShift project
oc new-project code-sandbox

# Create a binary build config
oc new-build --name=code-sandbox --binary --strategy=docker -n code-sandbox

# Patch for Containerfile (not Dockerfile)
oc patch bc/code-sandbox -n code-sandbox \
  -p '{"spec":{"strategy":{"dockerStrategy":{"dockerfilePath":"Containerfile"}}}}'

# Start the build and follow the logs
oc start-build code-sandbox --from-dir=. -n code-sandbox --follow

Deploy with Helm

The Helm chart deploys the sandbox as a Deployment + Service with the security context, NetworkPolicy, and optional SeccompProfile. The values-standalone.yaml file configures a single-replica standalone deployment:

Helm install
# Deploy with Helm (standalone mode)
helm install code-sandbox ./chart \
  -f chart/values-standalone.yaml \
  --set image.repository=image-registry.openshift-image-registry.svc:5000/code-sandbox/code-sandbox \
  --set image.tag=latest \
  -n code-sandbox

# Wait for rollout
oc rollout status deployment/code-sandbox -n code-sandbox --timeout=120s

Verify

Run a health check and execute code from an ephemeral pod. The pod needs the code-sandbox-client: "true" label to pass the NetworkPolicy:

Verify
# Health check
oc run test-client --rm -i --restart=Never \
  --labels="code-sandbox-client=true" \
  --image=registry.access.redhat.com/ubi9/ubi-minimal:latest \
  -n code-sandbox -- \
  curl -s http://code-sandbox.code-sandbox.svc:8000/healthz

# Execute code in the sandbox
oc run test-exec --rm -i --restart=Never \
  --labels="code-sandbox-client=true" \
  --image=registry.access.redhat.com/ubi9/ubi-minimal:latest \
  -n code-sandbox -- \
  curl -s -X POST http://code-sandbox.code-sandbox.svc:8000/execute \
  -H 'Content-Type: application/json' \
  -d '{"code":"import math\nprint(f\"pi = {math.pi}\")"}'
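An agent would typically make the same call from code rather than curl. The sketch below mirrors the request above using only the Python standard library. The function names are hypothetical, and because the response schema is not described here, the body is returned as raw text.

```python
import json
import urllib.request

# Service URL taken from the curl examples above.
SANDBOX_URL = "http://code-sandbox.code-sandbox.svc:8000"


def build_execute_request(code: str, base_url: str = SANDBOX_URL) -> urllib.request.Request:
    """Build the POST /execute request without sending it."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute(code: str, timeout: float = 30.0) -> str:
    """Send code to the sandbox and return the raw response body."""
    with urllib.request.urlopen(build_execute_request(code), timeout=timeout) as resp:
        # Response format is an unknown here, so it is returned undecoded.
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(execute('import math\nprint(f"pi = {math.pi}")'))
```

The calling pod still needs the code-sandbox-client: "true" label, otherwise the NetworkPolicy drops the connection and `urlopen` times out.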

Clean up

Delete sandbox
# Delete the sandbox deployment
helm uninstall code-sandbox -n code-sandbox

# Delete the SeccompProfile (if SPO was used)
oc delete seccompprofile code-sandbox-sandbox -n code-sandbox 2>/dev/null

# Delete the build and image stream
oc -n code-sandbox delete bc,is code-sandbox