Agent Sandboxing
Safely execute LLM-generated code on OpenShift with defense-in-depth isolation.
When agents execute LLM-generated code (tool calls, data analysis, or code-interpreter tasks), that code runs with the same permissions as the agent process. Without isolation, a single prompt injection can read mounted secrets, exfiltrate data over the network, or escalate privileges on the host.
The code sandbox solves this by wrapping execution in multiple independent security layers. Each layer assumes the one above it has been bypassed. No single layer is sufficient; together they make exploitation impractical even when the attacker controls the input source code.
The sandbox implementation is available at github.com/eformat/code-sandbox. The repo contains the Landlock, seccomp, and guardrails code shown on this page. The fips-agents/examples repo walks through deploying it on OpenShift step by step.
Defense in Depth
The sandbox uses five layers of defense, each operating at a different level of the stack:
| Layer | Technology | What it blocks |
|---|---|---|
| 1. Static analysis | AST visitor | Dangerous calls, blocked imports, dunder traversal, SQL injection |
| 2. Isolated subprocess | python3 -I + preamble | Runtime import bypass, builtin abuse, memory exhaustion |
| 3. Filesystem restriction | Landlock LSM | Reading app source, secrets, config files outside /tmp |
| 4. Syscall filtering | seccomp BPF | Network sockets (TCP + UDP), io_uring, splice |
| 5. Cluster enforcement | NetworkPolicy + SeccompProfile + SCC | All egress traffic, container escape, privilege escalation |
Layers 1-4 are applied in-process by the sandbox application — AST analysis, Python subprocess isolation, Landlock filesystem restriction, and seccomp syscall filtering. Layer 5 uses OpenShift platform features — NetworkPolicy, SeccompProfile (via Security Profiles Operator), and a custom SCC — to enforce a final security boundary that the sandbox cannot bypass, even if fully compromised. The application-level and platform-level controls work in harmony: if an attacker defeats the in-process guardrails, the cluster-enforced policies still block egress traffic, prevent privilege escalation, and deny container escape.
Agent Frameworks
The sandbox supports three deployment modes:
| Mode | When to use | SANDBOX_URL |
|---|---|---|
| Standalone service (recommended) | Production on OpenShift — own Deployment + Service with NetworkPolicy and independent scaling | http://code-sandbox.<ns>.svc:8000 |
| Sidecar | Simpler networking — sandbox shares the pod network with the agent, no cross-pod traffic or client labels needed | http://localhost:8000 (default) |
| Local container | Development — run the sandbox image on your workstation | http://localhost:8000 (default) |
In all modes your agent sends code to POST /execute and gets back stdout,
stderr, and exit code. Each framework wraps this in a run_code tool that the LLM
calls to execute code safely. The available imports depend on the sandbox
profile — minimal for stdlib only, or data-science for numpy, pandas,
and scipy. Pick your framework below to jump to a working example.
Environment variables
Set SANDBOX_URL to point at your sandbox. For standalone service mode,
use the in-cluster service URL (e.g. http://code-sandbox.<namespace>.svc:8000).
For local development or sidecar mode the default http://localhost:8000 works
without changes.
# Sandbox service URL
# Standalone: export SANDBOX_URL="http://code-sandbox.<namespace>.svc:8000"
# Sidecar / local: SANDBOX_URL defaults to http://localhost:8000
# LLM endpoint (OpenAI or any compatible API)
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL_NAME="gpt-4o-mini"
# Optional: use a local or hosted model instead
# export OPENAI_BASE_URL="http://maas.apps.my-cluster.example.com/v1"

LangGraph
LangGraph wraps the sandbox
call as a plain Python function registered with create_react_agent. The LLM
generates Python code, calls run_code, and summarizes the output.
"""Code interpreter agent with sandbox — LangGraph."""
import os

import httpx
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Sandbox service URL: standalone service, or localhost for sidecar/local
SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"
    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = ChatOpenAI(
    model=os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

agent = create_react_agent(
    llm,
    tools=[run_code],
    prompt="You are a code interpreter. Write Python code to solve "
           "the user's problem, execute it with run_code, and "
           "summarize the output.",
)

result = agent.invoke(
    {"messages": [{"role": "user",
                   "content": "Calculate the first 20 Fibonacci numbers"}]}
)
for msg in result["messages"]:
    print(f"{msg.type}: {msg.content}")

CrewAI
CrewAI uses the @tool decorator to register
the sandbox call. The agent's role and goal instruct the LLM to write and
execute code for every task.
"""Code interpreter agent with sandbox — CrewAI."""
# Use pysqlite3 in place of the stdlib sqlite3 module (must run before importing crewai)
__import__("pysqlite3")
import sys
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

import os

import httpx
from crewai import Agent, Task, Crew, LLM
from crewai.tools import tool

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


@tool("run_code")
def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"
    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = LLM(
    model=f"openai/{os.environ.get('OPENAI_MODEL_NAME', 'gpt-4o-mini')}",
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

coder = Agent(
    role="Code Interpreter",
    goal="Write and execute Python code to solve problems",
    backstory="You are an expert Python programmer. Write code, "
              "run it in the sandbox via the run_code tool, and "
              "report the results.",
    llm=llm,
    tools=[run_code],
    max_iter=5,
)

task = Task(
    description="Calculate the first 20 Fibonacci numbers using "
                "Python. Use the run_code tool to execute your code.",
    expected_output="The first 20 Fibonacci numbers",
    agent=coder,
)

crew = Crew(agents=[coder], tasks=[task])
result = crew.kickoff()
print(result.raw)

AutoGen
AutoGen registers the sandbox call
as a tool function on an AssistantAgent. The agent generates code and
executes it in a single turn.
"""Code interpreter agent with sandbox — AutoGen."""
import os
import asyncio

import httpx
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"
    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


model_name = os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini")
model_client = OpenAIChatCompletionClient(
    model=model_name,
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
    model_info={
        "vision": False,
        "function_calling": True,
        "json_output": True,
        "structured_output": True,
        "family": "unknown",
    },
)

agent = AssistantAgent(
    name="code_interpreter",
    model_client=model_client,
    tools=[run_code],
    system_message="You are a code interpreter. Write Python code "
                   "to solve problems, execute it with run_code, "
                   "and summarize the output.",
)


async def main():
    result = await agent.run(
        task="Calculate the first 20 Fibonacci numbers"
    )
    print(result.messages[-1].content)


asyncio.run(main())

LlamaIndex
LlamaIndex wraps the function with
FunctionTool.from_defaults and passes it to a ReActAgent. The agent
reasons about what code to write, executes it, and interprets the results.
"""Code interpreter agent with sandbox — LlamaIndex."""
import os
import asyncio

import httpx
from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai_like import OpenAILike

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"
    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


llm = OpenAILike(
    model=os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini"),
    api_base=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
    is_chat_model=True,
    is_function_calling_model=False,
    context_window=128000,
)

react_agent = ReActAgent(
    name="code_interpreter",
    description="Executes Python code in a sandbox",
    tools=[FunctionTool.from_defaults(fn=run_code)],
    llm=llm,
)

agent = AgentWorkflow(
    agents=[react_agent], root_agent="code_interpreter"
)


async def main():
    response = await agent.run(
        "Calculate the first 20 Fibonacci numbers"
    )
    print(response)


asyncio.run(main())

Google ADK
Google Agent Development Kit (ADK)
passes the function directly as a tool. The agent instruction tells the LLM
to write Python and use run_code for execution.
"""Code interpreter agent with sandbox — Google ADK."""
import os
import asyncio

import httpx
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

SANDBOX_URL = os.environ.get("SANDBOX_URL", "http://localhost:8000")


def run_code(code: str) -> str:
    """Execute Python code in a secure sandbox.

    Use this for computation, data analysis, or any task that
    benefits from running code. Available imports depend on the
    sandbox profile (minimal: stdlib only; data-science: adds
    numpy, pandas, scipy). No filesystem or network access.
    """
    response = httpx.post(
        f"{SANDBOX_URL}/execute",
        json={"code": code},
        timeout=35.0,
    )
    if response.status_code != 200:
        return f"Sandbox error (HTTP {response.status_code}): {response.text}"
    result = response.json()
    parts = []
    if result.get("stdout"):
        parts.append(result["stdout"])
    if result.get("result") is not None:
        parts.append(f"Result: {result['result']}")
    if result.get("stderr"):
        parts.append(f"Stderr: {result['stderr']}")
    if result.get("error"):
        parts.append(f"Error: {result['error']}")
    return "\n".join(parts) if parts else "(no output)"


model = LiteLlm(
    model=f"openai/{os.environ.get('OPENAI_MODEL_NAME', 'gpt-4o-mini')}",
    api_base=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY"),
)

agent = Agent(
    name="code_interpreter",
    model=model,
    description="A code interpreter that runs Python safely",
    instruction="You are a code interpreter. Write Python code to "
                "solve problems, execute it with run_code, and "
                "summarize the output.",
    tools=[run_code],
)

session_service = InMemorySessionService()
runner = Runner(
    agent=agent, app_name="code_interpreter",
    session_service=session_service,
)


async def main():
    session = await session_service.create_session(
        app_name="code_interpreter", user_id="user1",
    )
    message = types.Content(
        role="user",
        parts=[types.Part(
            text="Calculate the first 20 Fibonacci numbers"
        )],
    )
    async for event in runner.run_async(
        user_id="user1", session_id=session.id,
        new_message=message,
    ):
        if event.is_final_response():
            print(event.content.parts[0].text)


asyncio.run(main())

Static Analysis
Before any code executes, the sandbox parses it into an AST and walks every
node looking for policy violations. This catches the obvious attacks — calling
eval(), importing subprocess, traversing __globals__ — before the
code ever reaches a Python interpreter.
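As a rough illustration of the single-pass idea, the sketch below uses the standard-library ast module to flag bare calls to blocked functions and access to blocked dunder attributes. The names find_violations and _Visitor, and the shortened policy sets, are assumptions made for this example; they are not the repo's validate_code implementation.

```python
import ast

# Shortened policy sets for illustration only (assumed, not the full lists).
BLOCKED_CALLS = frozenset({"eval", "exec", "compile", "__import__", "open", "getattr"})
BLOCKED_DUNDERS = frozenset({"__subclasses__", "__globals__", "__class__", "__dict__"})


class _Visitor(ast.NodeVisitor):
    """Collects policy violations while walking the tree once."""

    def __init__(self):
        self.violations = []

    def visit_Call(self, node):
        # Only bare names are checked here, e.g. eval(...), not obj.eval(...).
        if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
            self.violations.append(f"line {node.lineno}: call to {node.func.id}()")
        self.generic_visit(node)

    def visit_Attribute(self, node):
        if node.attr in BLOCKED_DUNDERS:
            self.violations.append(f"line {node.lineno}: access to {node.attr}")
        self.generic_visit(node)


def find_violations(source):
    """Parse source and return a list of human-readable violations."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    visitor = _Visitor()
    visitor.visit(tree)
    return visitor.violations


if __name__ == "__main__":
    print(find_violations("x = eval('1 + 1')"))
    print(find_violations("().__class__.__subclasses__()"))
    print(find_violations("print(sum(range(10)))"))
```

Because the check works on syntax alone, it cannot see names built at runtime (for example via chr()), which is why later layers do not rely on it.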
AST guardrails
The AST visitor checks bare function calls, attribute access, subscript access,
string literals, f-strings, and %-format operations in a single pass:
# Blocked bare function calls
BLOCKED_CALLS = frozenset({
    "eval", "exec", "compile", "__import__", "open",
    "getattr", "setattr", "delattr", "breakpoint", "input",
    "globals", "locals", "vars",
})
# Blocked attribute access on any object
BLOCKED_DUNDERS = frozenset({
    "__subclasses__", "__globals__", "__builtins__",
    "__class__", "__bases__", "__mro__",
    "__dict__", "__code__", "__closure__",
    "__getattribute__", "__getattr__", "__self__",
    "__loader__", "__spec__", "__func__", "__wrapped__",
})
# Frame/generator attributes that expose execution frames
BLOCKED_FRAME_ATTRS = frozenset({
    "f_globals", "f_locals", "f_builtins", "f_code",
    "gi_frame", "gi_code", "cr_frame", "cr_code",
})
# Private module references (e.g. random._os -> os)
BLOCKED_MODULE_ALIASES = frozenset({
    "_os", "_sys", "_subprocess", "_socket", "_signal",
    "_ctypes", "_multiprocessing", "_pickle",
})

The visitor also scans for credential patterns, path traversal sequences,
and SQL injection via .format(), f-strings, and %-formatting.
from sandbox.guardrails import validate_code

# Validate LLM-generated code before execution
violations = validate_code(source, allowed_imports=profile.allowed_imports)
if violations:
    return {"error": "Code rejected", "violations": violations}
# Safe to execute: pass to the sandbox
result = await execute_code(source, timeout=30.0)

Import allowlist
Only stdlib modules with no filesystem or network capabilities are permitted. Profiles can extend the allowlist — the data-science profile adds numpy, pandas, and scipy with additional blocklist rules for dangerous attributes on those libraries.
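A static check against an allowlist might look like the following sketch. It walks import and from-import statements with ast and compares the top-level package name against the allowed set. The helper name disallowed_imports and the three-module ALLOWED set are hypothetical and shortened for the example; the repo's own check may be structured differently.

```python
import ast

ALLOWED = frozenset({"math", "json", "re"})  # shortened, assumed set for the example


def disallowed_imports(source, allowed=ALLOWED):
    """Return the imported module names in source that are not allowed."""
    rejected = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports keep their leading dots, so their top-level
            # name is empty and they are always rejected.
            names = ["." * node.level + (node.module or "")]
        else:
            continue
        for name in names:
            # Compare on the top-level package: "os.path" is judged as "os".
            if name.split(".")[0] not in allowed:
                rejected.append(name)
    return rejected


if __name__ == "__main__":
    print(disallowed_imports("import math\nimport json"))
    print(disallowed_imports("import os\nfrom subprocess import run"))
```

This only sees literal import statements. Imports assembled dynamically are outside its reach and have to be caught at runtime.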
# Minimal profile: stdlib only, no filesystem or network access
ALLOWED_IMPORTS = frozenset({
    "math", "statistics", "itertools", "functools",
    "re", "datetime", "collections", "json", "csv",
    "string", "textwrap", "decimal", "fractions",
    "random", "operator", "typing",
})
# Data-science profile: extends minimal with numpy/pandas/scipy
DATA_SCIENCE_IMPORTS = ALLOWED_IMPORTS | frozenset({
    "numpy", "pandas", "scipy",
})

Isolated Subprocess
Validated code runs in a separate python3 -I subprocess. The -I flag
enables isolated mode: user site-packages are disabled and PYTHON*
environment variables are ignored. Code is written to a temporary file under
/tmp and cleaned up unconditionally after execution.
import asyncio
import tempfile


async def execute_code(
    code: str,
    timeout: float = 10.0,
    *,
    memory_limit_mb: int = 512,
    allowed_imports: frozenset[str] | None = None,
    subprocess_landlock: bool = True,
    subprocess_seccomp: bool = True,
) -> ExecutionResult:
    # Build the defense-in-depth preamble
    code = build_memory_preamble(memory_limit_mb) + code
    code = build_preamble(
        allowed_imports=allowed_imports,
        landlock=subprocess_landlock,
        seccomp=subprocess_seccomp,
    ) + code
    # Write to temp file (text mode, since code is a str) and close it
    # so the contents are flushed before the subprocess reads them
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", dir="/tmp", delete=False
    ) as tmp:
        tmp.write(code)
    # Execute in isolated mode
    process = await asyncio.create_subprocess_exec(
        "python3", "-I", tmp.name,  # -I = isolated mode
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    raw_stdout, raw_stderr = await asyncio.wait_for(
        process.communicate(), timeout=timeout
    )

Runtime preamble
A defense-in-depth preamble is prepended to every execution. Even if an attacker
constructs an import name dynamically (via chr(), bytes.decode(), etc.)
to bypass the AST check, the runtime import hook blocks it. The preamble also
removes dangerous builtins and monkey-patches operator.attrgetter to reject
dunder access:
# Preamble injection order in the subprocess:
#   1. RLIMIT_AS memory limit (before any imports)
#   2. Pre-imports (pandas, numpy: need full builtins to init)
#   3. Landlock second ruleset (tighter filesystem)
#   4. Seccomp BPF filter (block networking + io_uring)
#   5. Runtime import hook + dunder blocking + builtin purging
#   6. User code executes

# Runtime import hook: blocks any module not in the allowlist
def _rimp(name, gl=None, lo=None, fromlist=(), level=0):
    top = name.split('.')[0]
    if level == 0 and top not in _allowed:
        caller = (gl or {}).get('__name__', '__main__')
        if caller == '__main__':
            raise ImportError(f"import of '{name}' blocked by sandbox")
    return _orig(name, gl, lo, fromlist, level)

# Remove dangerous builtins
for _name in ('open', 'breakpoint', 'input'):
    builtins.pop(_name, None)

# Monkey-patch operator to reject dunder attribute access
_orig_ag = operator.attrgetter
def _safe_attrgetter(*attrs):
    for a in attrs:
        for part in str(a).split('.'):
            if _dunder_re.match(part):
                raise RuntimeError('dunder access blocked by sandbox')
    return _orig_ag(*attrs)

Resource limits
The subprocess applies RLIMIT_AS before any imports to cap memory usage
(default 512 MB, configurable per profile). A wall-clock timeout (default 10s,
max 30s) kills the process if it hangs. Stdout and stderr are capped at 50 KB
each to prevent output flooding.
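The timeout and output caps can be sketched with the standard library alone. The version below is synchronous and deliberately simplified: the helper names clamp_timeout, cap_output, and run_isolated are hypothetical, it uses sys.executable instead of a fixed python3 path, and it does not apply RLIMIT_AS, Landlock, or seccomp. It only shows the clamping, truncation, and unconditional cleanup described above.

```python
import os
import subprocess
import sys
import tempfile
from typing import Optional

DEFAULT_TIMEOUT = 10.0           # seconds, matching the default described above
MAX_TIMEOUT = 30.0
MAX_OUTPUT_BYTES = 50 * 1024     # 50 KB per stream


def clamp_timeout(requested: Optional[float]) -> float:
    """Fall back to the default and never exceed the maximum."""
    if requested is None or requested <= 0:
        return DEFAULT_TIMEOUT
    return min(requested, MAX_TIMEOUT)


def cap_output(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Truncate raw output to limit bytes and decode it leniently."""
    return data[:limit].decode("utf-8", errors="replace")


def run_isolated(code: str, timeout: Optional[float] = None) -> dict:
    """Run code with the -I flag, enforcing the timeout and output caps."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(code)
        try:
            proc = subprocess.run(
                [sys.executable, "-I", path],
                capture_output=True,
                timeout=clamp_timeout(timeout),
            )
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": "", "exit_code": None, "error": "timeout"}
        return {
            "stdout": cap_output(proc.stdout),
            "stderr": cap_output(proc.stderr),
            "exit_code": proc.returncode,
            "error": None,
        }
    finally:
        os.unlink(path)  # runs whether execution succeeded, failed, or timed out


if __name__ == "__main__":
    print(run_isolated("print(2 + 2)"))
```

Note that capture_output buffers everything before truncating, so this sketch limits what is returned rather than what is held in memory; a production version would more likely cap while reading from the pipes.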
Landlock Filesystem Restriction
Landlock is a Linux Security Module (LSM) that restricts filesystem access without requiring root privileges. It is available on RHEL 9.2+ and enabled by default on OpenShift 4.18+.
Landlock rules are inherited by child processes — applying them to the FastAPI app at startup automatically restricts the code execution subprocess. The subprocess then applies a second, tighter Landlock ruleset that drops paths the parent process needed but the subprocess should never access.
Parent process
The parent process allows read-only access to system paths and read-write access
to /tmp only:
# Parent process Landlock paths (applied at FastAPI startup)
READ_ONLY_PATHS = [
    "/usr",           # Python binary, stdlib, system tools
    "/lib",           # Shared libraries
    "/lib64",         # 64-bit shared libraries
    "/etc",           # Timezone, locale, ld.so.cache
    "/opt/app-root",  # UBI app directory (FastAPI app)
    "/proc/self",     # Python reads /proc/self/fd, /proc/self/status
]
READ_WRITE_PATHS = ["/tmp"]
# Subprocess Landlock paths (tighter: drops /opt/app-root, /etc)
SUBPROCESS_READ_ONLY = ["/usr", "/lib", "/lib64", "/proc/self"]
SUBPROCESS_READ_WRITE = ["/tmp"]

from sandbox.landlock import apply_sandbox_landlock

# Apply at FastAPI startup; inherited by all subprocesses
status = apply_sandbox_landlock()
# status.applied → True if Landlock is active
# status.abi_version → 1-5 depending on kernel
# status.rules_applied → ["ro:/usr", "ro:/lib", ..., "rw:/tmp"]
# Landlock requires no_new_privs (set automatically by
# OpenShift restricted-v2 SCC via allowPrivilegeEscalation: false)
# ABI version matrix:
#   v1: filesystem restrictions (Linux 5.13, RHEL 9.2+)
#   v2: cross-directory rename/link (Linux 5.19)
#   v3: TRUNCATE right (Linux 6.2)
#   v4: TCP bind/connect restrictions (Linux 6.7)
#   v5: abstract Unix socket + signal scope (Linux 6.10)

Landlock requires the no_new_privs bit, which OpenShift's restricted-v2 SCC
sets automatically via allowPrivilegeEscalation: false. No extra privileges
are needed.
Subprocess tightening
The subprocess applies a second Landlock ruleset that drops /opt/app-root
(application source) and /etc (mounted secrets and config). Even if all
Python-level defenses are bypassed, the kernel prevents reading application
code or credentials. On ABI v4+ kernels, TCP bind and connect are also denied.
On ABI v5+, abstract Unix sockets and cross-process signals are scoped.
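Since the protections depend on the kernel's Landlock ABI version, it can help to make that dependency explicit. The sketch below encodes the ABI matrix from this page as data and derives which restrictions a given kernel can enforce. The function names features_for_abi and describe_subprocess_rules are hypothetical and not part of the sandbox package.

```python
# Features introduced by each Landlock ABI version, as listed in the matrix above.
ABI_FEATURES = {
    1: "filesystem restrictions",
    2: "cross-directory rename/link",
    3: "truncate right",
    4: "TCP bind/connect restrictions",
    5: "abstract Unix socket and signal scoping",
}


def features_for_abi(abi_version):
    """Return every feature available at or below the given ABI version."""
    return [name for version, name in sorted(ABI_FEATURES.items())
            if version <= abi_version]


def describe_subprocess_rules(abi_version):
    """Summarise what the tighter subprocess ruleset can enforce on this kernel."""
    return {
        "filesystem": abi_version >= 1,
        "tcp_denied": abi_version >= 4,
        "scoped_sockets_and_signals": abi_version >= 5,
    }


if __name__ == "__main__":
    for abi in (0, 1, 4, 5):
        print(abi, describe_subprocess_rules(abi))
```

An ABI version of 0 here stands for a kernel without Landlock, where none of these restrictions apply and the remaining layers carry the load.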
Seccomp Syscall Filtering
The subprocess installs a seccomp BPF filter that blocks all networking syscalls,
io_uring (a container escape vector), and splice() (used in
CVE-2026-31431). This closes the UDP gap that Landlock v4 does not cover — Landlock
restricts TCP but not UDP, while seccomp blocks socket() entirely:
# Syscalls blocked by the subprocess BPF filter
# (numbers shown are for x86_64)
BLOCKED_SYSCALLS = {
    # All networking: closes the UDP gap Landlock v4 doesn't cover
    "socket": 41, "connect": 42, "accept": 43,
    "sendto": 44, "recvfrom": 45, "sendmsg": 46,
    "recvmsg": 47, "bind": 49, "listen": 50,
    "setsockopt": 54, "getsockopt": 55, "accept4": 288,
    # io_uring: container escape vector
    "io_uring_setup": 425,
    "io_uring_enter": 426,
    "io_uring_register": 427,
    # splice: CVE-2026-31431 (Copy Fail privilege escalation)
    "splice": 275,
}
# Filter uses SECCOMP_RET_ERRNO (EPERM) for clean errors
# Wrong-architecture processes are killed immediately

The filter uses SECCOMP_RET_ERRNO with EPERM so the subprocess receives a
clean error rather than being killed. Wrong-architecture processes are killed
immediately with SECCOMP_RET_KILL_PROCESS. Both x86_64 and aarch64 are supported.
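To make the filter's shape concrete, here is a minimal sketch that assembles such a classic BPF program as raw bytes: check the architecture, kill on mismatch, return EPERM for each blocked syscall number, and allow everything else. It only builds the program and does not install it (that would require prctl or the seccomp syscall). The function names insn and build_filter are assumptions; the opcode and return-value constants are the standard Linux UAPI values.

```python
import struct

# Classic BPF opcodes and seccomp return values (Linux UAPI constants).
BPF_LD_W_ABS = 0x20                  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15                     # BPF_JMP | BPF_JEQ | BPF_K
BPF_RET_K = 0x06                     # BPF_RET | BPF_K
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E
EPERM = 1

# Offsets into struct seccomp_data: syscall number at 0, architecture at 4.
OFFSET_NR, OFFSET_ARCH = 0, 4


def insn(code, jt, jf, k):
    """Pack one struct sock_filter: u16 code, u8 jt, u8 jf, u32 k."""
    return struct.pack("=HBBI", code, jt, jf, k)


def build_filter(blocked, arch=AUDIT_ARCH_X86_64):
    """Return the BPF program bytes for a deny-list of syscall numbers."""
    prog = [
        insn(BPF_LD_W_ABS, 0, 0, OFFSET_ARCH),
        insn(BPF_JEQ_K, 1, 0, arch),               # match: skip the kill below
        insn(BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS),
        insn(BPF_LD_W_ABS, 0, 0, OFFSET_NR),
    ]
    for nr in blocked:
        prog.append(insn(BPF_JEQ_K, 0, 1, nr))     # no match: skip the errno return
        prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ERRNO | EPERM))
    prog.append(insn(BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    return b"".join(prog)


if __name__ == "__main__":
    program = build_filter([41, 42, 275])  # socket, connect, splice on x86_64
    print(len(program) // 8, "instructions")
```

Each blocked syscall costs two instructions, so the jump offsets stay local and never exceed the 8-bit range of jt and jf, no matter how long the deny-list grows.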
OpenShift Deployment
The recommended deployment pattern runs the sandbox as a standalone service
— its own Deployment and Service in the cluster. The agent reaches it via the
internal service URL (http://code-sandbox.<namespace>.svc:8000). Agent pods
need the code-sandbox-client: "true" label to pass the NetworkPolicy. The
cluster enforces a final layer of security that the sandbox cannot bypass,
even if fully compromised.
Alternatively, the sandbox can run as a sidecar container in the agent
pod, sharing the pod network namespace so the agent reaches it at
localhost:8000. Use sidecar mode when you want simpler networking (no
cross-pod traffic, no client labels needed) and are willing to couple the
sandbox lifecycle to the agent pod.
Sidecar deployment (alternative)
If you prefer to run the sandbox as a sidecar instead of a standalone
service, enable it in your Helm chart values. The template adds a second
container to the agent pod with a read-only root filesystem, all capabilities
dropped, and /tmp as the only writable path (10 Mi limit). Since the
sidecar shares the pod network, the agent reaches it at localhost:8000
with no NetworkPolicy client labels required.
# chart/values.yaml: enable the sandbox sidecar (alternative to standalone)
sandbox:
  enabled: true
  # Profile controls which imports are allowed:
  #   minimal: stdlib only (math, json, csv, etc.)
  #   data-science: adds numpy, pandas, scipy
  profile: minimal
  image:
    repository: code-sandbox
    tag: latest
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
  # Requires Security Profiles Operator + custom SCC
  seccomp:
    enabled: false

# chart/templates/deployment.yaml: sandbox sidecar container (alternative)
containers:
  - name: agent
    image: "my-agent:latest"
    env:
      - name: SANDBOX_URL
        value: "http://localhost:8000"
    # ... agent container config ...
  - name: sandbox
    image: "code-sandbox:latest"
    ports:
      - containerPort: 8000
    env:
      - name: SANDBOX_PROFILE
        value: "minimal"
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8000
    volumeMounts:
      - name: sandbox-tmp
        mountPath: /tmp
volumes:
  - name: sandbox-tmp
    emptyDir:
      sizeLimit: 10Mi

The SANDBOX_URL environment variable is injected into the agent container
automatically when sandbox.enabled is true.
Profiles
Profiles control which imports are available and which scan stages run.
The SANDBOX_PROFILE environment variable selects the active profile:
# profiles/minimal.yaml (default, stdlib-only)
name: minimal
imports:
  allowed:
    - math
    - statistics
    - itertools
    - functools
    - re
    - datetime
    - collections
    - json
    - csv
    - string
    - textwrap
    - decimal
    - fractions
    - random
    - operator
    - typing
  blocklist: []
resources:
  memory: 256Mi
  cpu: 500m
  timeout_max: 30.0
---
# profiles/data-science.yaml (extends minimal)
name: data-science
extends: minimal
preimport:
  - numpy
  - pandas
  - scipy
imports:
  additional:
    - numpy
    - pandas
    - scipy
  blocklist:
    - [numpy, ctypeslib]
    - [numpy, frompyfunc]
    - [pandas, read_pickle]
    - [pandas, read_sql]
    - [pandas, read_html]
    - [pandas, read_excel]
    - [pandas, read_parquet]
    - [scipy.io, loadmat]
    - [scipy.io, savemat]
resources:
  memory: 512Mi
  subprocess_memory_mb: 800

| Profile | Imports | Memory | Use case |
|---|---|---|---|
| minimal | stdlib only (math, json, csv, etc.) | 256 Mi | Computation, formatting, string manipulation |
| data-science | + numpy, pandas, scipy | 512 Mi | Data analysis, numerical computation, statistics |
The data-science profile adds a blocklist audit stage that blocks dangerous
attributes on the allowed libraries — numpy.ctypeslib, pandas.read_pickle,
scipy.io.loadmat, and other filesystem/deserialization methods. Libraries are
pre-imported before restrictions are applied so their initialization calls
(which use open() and internal imports) succeed normally.
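One way such a blocklist audit could work is sketched below: record import aliases, then resolve dotted attribute chains back to their module path and compare against (module, attribute) pairs. The function names blocked_attribute_uses and _dotted and the three-entry BLOCKLIST are hypothetical; the repo's audit stage may differ.

```python
import ast

# (module, attribute) pairs, mirroring the shape of the profile blocklist.
BLOCKLIST = frozenset({
    ("numpy", "ctypeslib"),
    ("pandas", "read_pickle"),
    ("scipy.io", "loadmat"),
})


def _dotted(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        inner = _dotted(node.value)
        return None if inner is None else f"{inner}.{node.attr}"
    return None


def blocked_attribute_uses(source, blocklist=BLOCKLIST):
    """Flag attribute access or from-imports that hit a blocklisted pair."""
    tree = ast.parse(source)
    aliases = {}  # local name -> real module path
    hits = []

    # Pass 1: record import aliases and catch `from pandas import read_pickle`.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    # `import scipy.io` binds only the top-level name "scipy".
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                if (node.module, alias.name) in blocklist:
                    hits.append(f"{node.module}.{alias.name}")
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # Pass 2: resolve attribute chains such as pd.read_pickle to module paths.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            base = _dotted(node.value)
            if base is None:
                continue
            head, _, rest = base.partition(".")
            if head in aliases:
                module = aliases[head] + ("." + rest if rest else "")
                if (module, node.attr) in blocklist:
                    hits.append(f"{module}.{node.attr}")
    return hits


if __name__ == "__main__":
    print(blocked_attribute_uses("import pandas as pd\ndf = pd.read_pickle('x')"))
```

The sketch does not follow reassignments (for example x = pd followed by x.read_pickle), so it is a first filter rather than a complete defense.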
Pod security context
The deployment runs as non-root with a read-only root filesystem, all capabilities dropped, and a localhost seccomp profile managed by the Security Profiles Operator:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-sandbox
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
      containers:
        - name: sandbox
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: Localhost
              localhostProfile: operator/code-sandbox-sandbox.json
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir:
            sizeLimit: 10Mi

The /tmp mount is an emptyDir with a 10 Mi size limit, the only writable
path in the container.
NetworkPolicy
In standalone service mode (the recommended pattern), a zero-egress
NetworkPolicy prevents the sandbox from making any outbound connections.
Ingress is restricted to pods with the code-sandbox-client: "true" label
on port 8000. This is the default configuration deployed by the Helm chart.
When using sidecar mode instead, NetworkPolicy is not needed — the sandbox container only listens on localhost inside the pod.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: code-sandbox
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: code-sandbox
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        # Only pods with this label can connect
        - podSelector:
            matchLabels:
              code-sandbox-client: "true"
      ports:
        - port: 8000
          protocol: TCP
  # Zero egress: no outbound traffic allowed
  egress: []

For standalone mode, agent pods must include the client label:
metadata:
  labels:
    code-sandbox-client: "true"
SeccompProfile (SPO)
The Security Profiles Operator deploys a syscall allowlist as a
SeccompProfile custom resource. The default action is SCMP_ACT_ERRNO — any
syscall not explicitly allowed is denied:
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: code-sandbox-sandbox
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    # Allow: process mgmt, file I/O, memory, signals, timing
    - action: SCMP_ACT_ALLOW
      names: [fork, vfork, clone, clone3, execve, wait4, exit, exit_group]
    - action: SCMP_ACT_ALLOW
      names: [read, write, openat, close, lseek, dup, pipe2, fcntl]
    - action: SCMP_ACT_ALLOW
      names: [mmap, mprotect, munmap, mremap, brk, madvise]
    - action: SCMP_ACT_ALLOW
      names: [stat, fstat, newfstatat, access, getcwd, getdents64]
    # Allow: networking (uvicorn needs it, subprocess BPF blocks it)
    - action: SCMP_ACT_ALLOW
      names: [socket, bind, listen, accept4, connect, sendto, recvfrom]
    # Allow: Landlock LSM syscalls
    - action: SCMP_ACT_ALLOW
      names: [landlock_create_ruleset, landlock_add_rule, landlock_restrict_self]
    # Block: dangerous syscalls
    - action: SCMP_ACT_ERRNO
      names:
        - io_uring_setup  # container escape vector
        - io_uring_enter
        - io_uring_register
        - splice  # CVE-2026-31431
        - ptrace
        - process_vm_readv
        - process_vm_writev
        - mount
        - umount2
        - chroot
        - bpf
        - unshare
        - setns

This blocks io_uring, splice, ptrace, module loading, mount operations, namespace manipulation (unshare/setns), and BPF, all of which are common privilege escalation and container escape vectors. Networking syscalls are allowed at the container level because uvicorn needs them; the subprocess BPF filter blocks them for user code.
Custom SCC
OpenShift's restricted-v2 SCC only allows runtime/default seccomp profiles.
A custom SCC is needed to permit the SPO localhost profile:
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: code-sandbox-seccomp
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities: []
defaultAddCapabilities: []
requiredDropCapabilities:
  - ALL
readOnlyRootFilesystem: true
runAsUser:
  type: MustRunAsRange
fsGroup:
  type: MustRunAs
  ranges:
    - min: 1
      max: 65534
seLinuxContext:
  type: MustRunAs
seccompProfiles:
  - runtime/default
  - localhost/operator/code-sandbox-sandbox.json
volumes:
  - emptyDir
  - projected
  - configMap
  - secret
  - downwardAPI
  - persistentVolumeClaim

Bind it to the default service account in the sandbox namespace:
oc adm policy add-scc-to-user code-sandbox-seccomp -z default -n code-sandbox
Deploying
The sandbox deploys to OpenShift via an in-cluster binary build and Helm chart, following the same pattern as the agent build. Nodes must run RHEL 9.6+ or RHCOS based on RHEL 9.6+ for Landlock LSM support (the sandbox degrades gracefully on older kernels).
Clone the repo
Clone the sandbox source. The build uses a Containerfile (not Dockerfile)
with a UBI 9 Python 3.12 base image:
# Clone the sandbox source
git clone https://github.com/eformat/code-sandbox
cd code-sandbox

Build the image
Create a binary BuildConfig, patch it to use the Containerfile, and start
the build. The image is pushed to the internal registry:
# Create an OpenShift project
oc new-project code-sandbox
# Create a binary build config
oc new-build --name=code-sandbox --binary --strategy=docker -n code-sandbox
# Patch for Containerfile (not Dockerfile)
oc patch bc/code-sandbox -n code-sandbox \
-p '{"spec":{"strategy":{"dockerStrategy":{"dockerfilePath":"Containerfile"}}}}'
# Start the build and follow the logs
oc start-build code-sandbox --from-dir=. -n code-sandbox --follow

Deploy with Helm
The Helm chart deploys the sandbox as a Deployment + Service with the
security context, NetworkPolicy, and optional SeccompProfile. The
values-standalone.yaml file configures a single-replica standalone
deployment:
# Deploy with Helm (standalone mode)
helm install code-sandbox ./chart \
-f chart/values-standalone.yaml \
--set image.repository=image-registry.openshift-image-registry.svc:5000/code-sandbox/code-sandbox \
--set image.tag=latest \
-n code-sandbox
# Wait for rollout
oc rollout status deployment/code-sandbox -n code-sandbox --timeout=120s

Verify
Run a health check and execute code from an ephemeral pod. The pod needs
the code-sandbox-client: "true" label to pass the NetworkPolicy:
# Health check
oc run test-client --rm -i --restart=Never \
--labels="code-sandbox-client=true" \
--image=registry.access.redhat.com/ubi9/ubi-minimal:latest \
-n code-sandbox -- \
curl -s http://code-sandbox.code-sandbox.svc:8000/healthz
# Execute code in the sandbox
oc run test-exec --rm -i --restart=Never \
--labels="code-sandbox-client=true" \
--image=registry.access.redhat.com/ubi9/ubi-minimal:latest \
-n code-sandbox -- \
curl -s -X POST http://code-sandbox.code-sandbox.svc:8000/execute \
-H 'Content-Type: application/json' \
-d '{"code":"import math\nprint(f\"pi = {math.pi}\")"}'

Clean up
# Delete the sandbox deployment
helm uninstall code-sandbox -n code-sandbox
# Delete the SeccompProfile (if SPO was used)
oc delete seccompprofile code-sandbox-sandbox -n code-sandbox 2>/dev/null
# Delete the build and image stream
oc -n code-sandbox delete bc,is code-sandbox