Sandbox implemented in GO including containers (namespace, cgroup), ptrace, seccomp


criyle/go-sandbox


The original goal was to replicate uoj-judger/run_program in Go using libseccomp. Over time, the project has also adopted newer technologies, including Linux namespaces and cgroups.

The idea of rootfs and interval CPU usage checking comes from syzoj/judge-v3, and the pooled pre-forked container comes from vijos/jd4.

If you are looking for a sandbox implementation exposed through a REST / gRPC API, please check go-judge.

Notice: Only works on Linux, since ptrace, unshare and cgroup are available only on Linux.

Build & Install

  • install the latest Go compiler from golang/download
  • download repository: git clone https://github.com/criyle/go-sandbox
  • build: go build ./cmd/runprog
  • or install directly: go install github.com/criyle/go-sandbox/cmd/runprog@latest

Technologies

libseccomp + ptrace (improved UOJ sandbox)

  1. Restricted computing resource by POSIX rlimit: Time & Memory (Stack) & Output
  2. Restricted syscall access (by libseccomp & ptrace)
  3. Restricted file access (read & write & access & exec). Evaluated by UOJ FileSet

Improvements:

  1. Precise resource limits (s -> ms, mb -> kb)
  2. More architectures (arm32, arm64)
  3. Allow multiple traced programs in different threads
  4. Allow pipes as input / output files
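
Limits expressed in milliseconds and kilobytes still have to be mapped onto POSIX rlimits, and RLIMIT_CPU only has one-second resolution. The following is a minimal sketch of such a mapping, not the repository's rlimit package: the names Limits, Rlimits and toRlimits are hypothetical, and rounding the CPU limit up is an assumption about how a finer-grained check could sit on top of it.

```go
package main

import "fmt"

// Limits is a hypothetical description of per-run limits in fine-grained units.
type Limits struct {
	CPUMillis uint64 // CPU time limit in milliseconds
	StackKB   uint64 // stack limit in KiB
	OutputKB  uint64 // output (file size) limit in KiB
}

// Rlimits holds the values that would be handed to setrlimit(2).
type Rlimits struct {
	CPUSeconds uint64 // RLIMIT_CPU, whole seconds only
	StackBytes uint64 // RLIMIT_STACK
	FsizeBytes uint64 // RLIMIT_FSIZE
}

// toRlimits rounds the CPU limit up to whole seconds so that the kernel limit
// never fires before the requested millisecond limit; the exact limit would
// then be enforced by the caller from the measured CPU time.
func toRlimits(l Limits) Rlimits {
	return Rlimits{
		CPUSeconds: (l.CPUMillis + 999) / 1000,
		StackBytes: l.StackKB << 10,
		FsizeBytes: l.OutputKB << 10,
	}
}

func main() {
	r := toRlimits(Limits{CPUMillis: 1500, StackKB: 8192, OutputKB: 64})
	fmt.Printf("%+v\n", r)
}
```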

Default file access syscall check:

  • check file read / write: open, openat
  • check file read: readlink, readlinkat
  • check file write: unlink, unlinkat, chmod, rename
  • check file access: stat, lstat, access, faccessat
  • check file exec: execve, execveat
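
The syscall-to-check mapping above can be sketched as a small lookup plus an allow-list test. This is an illustration only: kindOf, fileSet and allowed are hypothetical names, the prefix-matching rule is an assumption, and the real UOJ file set logic is more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// kindOf returns the kind of file check applied to a syscall, following the
// list above. For open / openat the kind depends on whether the flags ask
// for write access. ok is false for syscalls without a file check.
func kindOf(name string, wantsWrite bool) (kind string, ok bool) {
	switch name {
	case "open", "openat":
		if wantsWrite {
			return "write", true
		}
		return "read", true
	case "readlink", "readlinkat":
		return "read", true
	case "unlink", "unlinkat", "chmod", "rename":
		return "write", true
	case "stat", "lstat", "access", "faccessat":
		return "access", true
	case "execve", "execveat":
		return "exec", true
	}
	return "", false
}

// fileSet is a hypothetical allow-list: exact paths, or directory prefixes
// written with a trailing "/".
type fileSet []string

func (s fileSet) allows(path string) bool {
	for _, p := range s {
		if p == path || (strings.HasSuffix(p, "/") && strings.HasPrefix(path, p)) {
			return true
		}
	}
	return false
}

// allowed reports whether the syscall may touch path under the given sets.
// Syscalls without a file check are passed through in this sketch.
func allowed(name, path string, wantsWrite bool, sets map[string]fileSet) bool {
	kind, ok := kindOf(name, wantsWrite)
	if !ok {
		return true
	}
	return sets[kind].allows(path)
}

func main() {
	sets := map[string]fileSet{
		"read":  {"/usr/lib/", "/etc/ld.so.cache"},
		"write": {"/tmp/work/"},
	}
	fmt.Println(allowed("openat", "/usr/lib/libc.so.6", false, sets))
	fmt.Println(allowed("unlink", "/etc/passwd", false, sets))
}
```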

linux namespace + cgroup

  1. Unshare & bind mount rootfs based on hostfs (eliminated ptrace)
  2. Use Linux Control Groups to limit and account for CPU & memory usage (eliminated wait4.rusage)
  3. Container tech with execveat memfd, sethostname, setdomainname
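
For a rough feel of what unsharing namespaces looks like from Go, the standard library can request new namespaces through SysProcAttr.Cloneflags. This sketch is an assumption-laden stand-in: it uses os/exec rather than the repository's forkexec package, it does not bind mount a rootfs or set up cgroups, and unsharedCommand is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// unsharedCommand prepares a command that starts in new user, mount, PID,
// UTS, IPC and network namespaces. The current user is mapped to root inside
// the new user namespace, so no privileges are needed on kernels that allow
// unprivileged user namespaces.
func unsharedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWNS | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWNET,
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
	}
	return cmd
}

func main() {
	cmd := unsharedCommand("id", "-u")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// Common in restricted environments where user namespaces are disabled.
		fmt.Println("run failed (unprivileged namespaces may be unavailable):", err)
	}
}
```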

prefork containers

Uses Linux namespaces and cgroups, but creates containers in advance to avoid repeating the work of setting up mount points. See Pre-forked Container Protocol and Pre-forked Container Environment for design details.

On kernel >= 5.7 with cgroup v2, the new clone3(CLONE_INTO_CGROUP) with vfork is also available, which reduces the cost of creating new address spaces.
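
Go's standard library exposes the same kernel feature through SysProcAttr.UseCgroupFD and SysProcAttr.CgroupFD (Go 1.20 and later). The sketch below uses those fields instead of the repository's own fork-exec code; startInCgroup and the cgroup path are hypothetical, and the directory must already exist on a cgroup v2 mount and be writable by the caller.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// startInCgroup starts a command directly inside the cgroup v2 directory
// cgroupDir, so the child never runs in the parent's cgroup.
func startInCgroup(cgroupDir, name string, args ...string) (*exec.Cmd, error) {
	dir, err := os.Open(cgroupDir)
	if err != nil {
		return nil, fmt.Errorf("open cgroup dir: %w", err)
	}
	// The directory fd is only needed until the child has been created.
	defer dir.Close()

	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		UseCgroupFD: true,
		CgroupFD:    int(dir.Fd()),
	}
	if err := cmd.Start(); err != nil {
		return nil, fmt.Errorf("start in cgroup: %w", err)
	}
	return cmd, nil
}

func main() {
	// Hypothetical path, used only for illustration.
	cmd, err := startInCgroup("/sys/fs/cgroup/sandbox-demo", "true")
	if err != nil {
		fmt.Println("not started:", err)
		return
	}
	fmt.Println("wait result:", cmd.Wait())
}
```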

Design

Result Status

  • Normal (no error)
  • Program Error
    • Resource Limit Exceeded
      • Time
      • Memory
      • Output
    • Unauthorized Access
      • Disallowed Syscall
    • Runtime Error
      • Signalled
        • SIGXCPU / SIGKILL are treated as TimeLimitExceeded (raised by rlimit or by the caller's kill)
        • SIGXFSZ is treated as OutputLimitExceeded (raised by rlimit)
        • SIGSYS is treated as Disallowed Syscall (raised by seccomp)
        • Potential runtime errors include SIGSEGV (segmentation fault)
      • Nonzero Exit Status
  • Program Runner Error
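
The signal handling part of the status tree above can be sketched as a single classification function. The Status type and constant names here are hypothetical, and memory limit detection is left out because it comes from resource accounting rather than from the terminating signal.

```go
package main

import (
	"fmt"
	"syscall"
)

// Status is a hypothetical result status for this sketch.
type Status string

const (
	Normal              Status = "Normal"
	TimeLimitExceeded   Status = "Time Limit Exceeded"
	OutputLimitExceeded Status = "Output Limit Exceeded"
	DisallowedSyscall   Status = "Disallowed Syscall"
	Signalled           Status = "Signalled"
	NonzeroExitStatus   Status = "Nonzero Exit Status"
)

// classify maps how a program ended to a status, following the tree above.
func classify(signaled bool, sig syscall.Signal, exitCode int) Status {
	if signaled {
		switch sig {
		case syscall.SIGXCPU, syscall.SIGKILL:
			return TimeLimitExceeded
		case syscall.SIGXFSZ:
			return OutputLimitExceeded
		case syscall.SIGSYS:
			return DisallowedSyscall
		default:
			return Signalled // e.g. SIGSEGV
		}
	}
	if exitCode != 0 {
		return NonzeroExitStatus
	}
	return Normal
}

func main() {
	fmt.Println(classify(true, syscall.SIGXFSZ, 0))
	fmt.Println(classify(true, syscall.SIGSEGV, 0))
	fmt.Println(classify(false, 0, 1))
}
```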

Result Structure

    type Result struct {
        Status            // result status
        ExitStatus int    // exit status (signal number if signalled)
        Error      string // potential detailed error message (for program runner error)

        Time   time.Duration // used user CPU time  (underlying type int64 in ns)
        Memory Size          // used user memory    (underlying type uint64 in bytes)

        // metrics for the program runner
        SetUpTime   time.Duration
        RunningTime time.Duration
    }

Runner Interface

A configured runner runs the program. The Context is used to cancel the run (for example, to enforce the time limit) and must not be nil.

    type Runner interface {
        Run(context.Context) <-chan runner.Result
    }
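
To show how a caller might drive this interface, here is a self-contained sketch with simplified local stand-ins for Result and Runner. fakeRunner and runWithLimit are hypothetical: the fake only simulates a program that needs a given amount of time, whereas a real runner would kill the sandboxed process when the context is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Result and Runner are simplified local stand-ins for the types above.
type Result struct {
	Status string
	Time   time.Duration
}

type Runner interface {
	Run(context.Context) <-chan Result
}

// fakeRunner simulates a program that needs `work` time to finish.
type fakeRunner struct{ work time.Duration }

func (r fakeRunner) Run(ctx context.Context) <-chan Result {
	ch := make(chan Result, 1) // buffered so the goroutine never blocks on send
	go func() {
		start := time.Now()
		select {
		case <-time.After(r.work):
			ch <- Result{Status: "Normal", Time: time.Since(start)}
		case <-ctx.Done():
			// A real runner would kill the running program here.
			ch <- Result{Status: "Time Limit Exceeded", Time: time.Since(start)}
		}
	}()
	return ch
}

// runWithLimit runs r and cancels it once limit has elapsed.
func runWithLimit(r Runner, limit time.Duration) Result {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return <-r.Run(ctx)
}

func main() {
	fmt.Println(runWithLimit(fakeRunner{work: 5 * time.Millisecond}, time.Second).Status)
	fmt.Println(runWithLimit(fakeRunner{work: time.Second}, 20*time.Millisecond).Status)
}
```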

Pre-forked Container Protocol

  1. Pre-fork container to run programs inside
  2. Unix socket to pass fd inside / outside
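
Passing a file descriptor over a Unix socket relies on SCM_RIGHTS control messages. The sketch below demonstrates the mechanism with the standard syscall package inside a single process; sendFd, recvFd and roundTrip are hypothetical helpers and not the API of the repository's unixsocket package.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFd sends one file descriptor over a unix socket as an SCM_RIGHTS message.
func sendFd(sock, fd int) error {
	// One byte of ordinary data accompanies the control message.
	return syscall.Sendmsg(sock, []byte{0}, syscall.UnixRights(fd), nil, 0)
}

// recvFd receives one file descriptor sent by sendFd.
func recvFd(sock int) (int, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit fd
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return -1, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return -1, err
	}
	if len(msgs) != 1 {
		return -1, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return -1, err
	}
	if len(fds) != 1 {
		return -1, fmt.Errorf("expected 1 fd, got %d", len(fds))
	}
	return fds[0], nil
}

// roundTrip passes the write end of a pipe through a socket pair, writes
// payload to the received descriptor and reads it back from the pipe.
func roundTrip(payload string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close()

	if err := sendFd(pair[0], int(w.Fd())); err != nil {
		return "", err
	}
	fd, err := recvFd(pair[1])
	if err != nil {
		return "", err
	}
	received := os.NewFile(uintptr(fd), "received")
	defer received.Close()

	if _, err := received.WriteString(payload); err != nil {
		return "", err
	}
	buf := make([]byte, len(payload))
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello through a passed fd")
	fmt.Println(got, err)
}
```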

Container / Host Communication Protocol (single thread):

  • ping (alive check):
    • reply: pong
  • conf (set configuration):
    • reply: pong
  • open (open files in given mode inside container):
    • send: []OpenCmd
    • reply: "success", file fds / "error"
  • delete (unlink file / rmdir dir inside container):
    • send: path
    • reply: "finished" / "error"
  • reset (clean up container for later use (clear workdir / tmp)):
    • send:
    • reply: "success"
  • execve: (execute file inside container):
    • send: argv, env, rLimits, fds
    • reply:
      • success: "success", pid
      • failed: "failed"
    • send (success): "init_finished" (as cmd)
      • reply: "finished" / send: "kill" (as cmd)
      • send: "kill" (as cmd) / reply: "finished"
    • reply:

Any socket-related error causes the container to exit (together with all processes inside the container).
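
The request / reply shape of the simpler commands can be sketched as a single-threaded loop. This is an illustration under stated assumptions: the cmd and reply message types are hypothetical, encoding/gob over an in-memory net.Pipe stands in for the real Unix socket, and fd passing and execve handling are omitted.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// cmd and reply are hypothetical message shapes for this sketch.
type cmd struct{ Name string }
type reply struct{ Msg string }

// serve plays the container side: it handles one command at a time and
// returns on any connection error, which is where the container would exit.
func serve(conn net.Conn) {
	defer conn.Close()
	dec, enc := gob.NewDecoder(conn), gob.NewEncoder(conn)
	for {
		var c cmd
		if err := dec.Decode(&c); err != nil {
			return
		}
		var r reply
		switch c.Name {
		case "ping", "conf":
			r.Msg = "pong"
		case "reset":
			r.Msg = "success"
		case "delete":
			r.Msg = "finished"
		default:
			r.Msg = "error"
		}
		if err := enc.Encode(r); err != nil {
			return
		}
	}
}

// client plays the host side of the connection.
type client struct {
	conn net.Conn
	enc  *gob.Encoder
	dec  *gob.Decoder
}

// newClient starts a serving goroutine and returns the host end.
func newClient() *client {
	host, container := net.Pipe()
	go serve(container)
	return &client{conn: host, enc: gob.NewEncoder(host), dec: gob.NewDecoder(host)}
}

// call sends one command and waits for its reply.
func (c *client) call(name string) (string, error) {
	if err := c.enc.Encode(cmd{Name: name}); err != nil {
		return "", err
	}
	var r reply
	if err := c.dec.Decode(&r); err != nil {
		return "", err
	}
	return r.Msg, nil
}

func main() {
	c := newClient()
	defer c.conn.Close()
	for _, name := range []string{"ping", "reset", "unknown"} {
		msg, err := c.call(name)
		fmt.Println(name, "->", msg, err)
	}
}
```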

Pre-forked Container Environment

The restricted container environment is accessed through the RPC interface defined by the protocol above.

Provides:

  • File access
    • Open: create / access files
    • Delete: remove file
  • Management
    • Ping: alive check
    • Reset: remove temporary files
    • Destroy: destroy the container environment
  • Run program
    • Execve: execute program with given parameters

    type Environment interface {
        Ping() error
        Open([]OpenCmd) ([]*os.File, error)
        Delete(p string) error
        Reset() error
        Execve(context.Context, ExecveParam) <-chan runner.Result
        Destroy() error
    }

Packages (/pkg)

  • seccomp: provides seccomp type definitions
    • libseccomp: provides utility functions that wrap libseccomp
  • forkexec: fork-exec that performs mount, unshare, ptrace, seccomp and capset before exec
  • memfd: reads a regular file and creates a sealed memfd for its contents
  • unixsocket: sends / receives OOB messages over a Unix socket
  • cgroup: creates cgroup directories and collects resource usage / limits
  • mount: provides utility functions that wrap the mount syscall
  • rlimit: provides utility functions that define rlimit settings for the rlimit syscall
  • pipe: provides a wrapper to collect all content written through a pipe
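
As an illustration of the pipe idea, the sketch below collects what a writer sends through a pipe and stops once an output limit is exceeded. It uses only the standard library; collectOutput and writeString are hypothetical names and do not reflect the package's actual API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// collectOutput gathers everything written to the write end of a pipe, up to
// limit bytes, and reports whether the writer tried to produce more.
func collectOutput(limit int64, write func(io.Writer)) (string, bool, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", false, err
	}
	var buf bytes.Buffer
	done := make(chan error, 1)
	go func() {
		// Read one byte past the limit so that overflow can be detected.
		_, cerr := io.Copy(&buf, io.LimitReader(r, limit+1))
		r.Close() // further writes fail with EPIPE instead of blocking
		done <- cerr
	}()
	write(w)
	w.Close()
	if err := <-done; err != nil {
		return "", false, err
	}
	if int64(buf.Len()) > limit {
		buf.Truncate(int(limit))
		return buf.String(), true, nil
	}
	return buf.String(), false, nil
}

// writeString returns a writer function that emits s; write errors are
// ignored because the reader may stop early once the limit is reached.
func writeString(s string) func(io.Writer) {
	return func(w io.Writer) { io.WriteString(w, s) }
}

func main() {
	out, exceeded, err := collectOutput(5, writeString("hello world"))
	fmt.Printf("%q exceeded=%v err=%v\n", out, exceeded, err)
}
```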

Packages

  • cmd/runprog/config: defines architecture- and language-specific trace conditions for the ptrace runner, taken from UOJ
  • container: creates pre-forked containers to run programs inside
  • runner: interface to run programs
    • ptrace: wrapper that calls forkexec and ptracer
      • filehandler: an example implementation of the UOJ file set
    • unshare: wrapper that calls forkexec with unshared namespaces
  • ptracer: ptrace tracer that provides a syscall trap filter context

Executable

  • runprog: safely run program by unshare / ptrace / pre-forked containers

Configurations

  • config/config.go: all configs toward running specs (similar to UOJ)

Kernel Versions

  • 6.1: pids.peak in cgroup v2
  • 5.19: memory.peak in cgroup v2
  • 5.7: clone3 with CLONE_INTO_CGROUP
  • 5.3: clone3
  • 4.15: cgroup v2 (also need support in the Linux distribution)
  • 4.14: SECCOMP_RET_KILL_PROCESS
  • 4.6: CLONE_NEWCGROUP
  • 3.19: execveat()
  • 3.17: seccomp, memfd_create
  • 3.10: CentOS 7
  • 3.8: CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID
  • 3.5: prctl(PR_SET_NO_NEW_PRIVS)
  • 2.6.36: prlimit64
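
A caller could use the table above to decide at runtime which features to rely on. The following sketch parses the kernel release string and compares it against a few of the listed versions; parseRelease and atLeast are hypothetical helpers, and reading /proc/sys/kernel/osrelease is one common way to obtain the release on Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseRelease extracts major and minor from a release string such as
// "5.15.0-91-generic".
func parseRelease(s string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(s), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected release %q", s)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor part may carry a suffix, e.g. "1-rc1"; keep leading digits only.
	digits := parts[1]
	for i, c := range digits {
		if c < '0' || c > '9' {
			digits = digits[:i]
			break
		}
	}
	if minor, err = strconv.Atoi(digits); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor is at least wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	major, minor, err := parseRelease(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	features := []struct {
		name         string
		major, minor int
	}{
		{"pids.peak in cgroup v2", 6, 1},
		{"memory.peak in cgroup v2", 5, 19},
		{"clone3 with CLONE_INTO_CGROUP", 5, 7},
	}
	for _, f := range features {
		fmt.Printf("%-32s %v\n", f.name, atLeast(major, minor, f.major, f.minor))
	}
}
```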

Benchmarks

ForkExec

    $ go test -bench . -benchtime 10s
    goos: linux
    goarch: amd64
    pkg: github.com/criyle/go-sandbox/pkg/forkexec
    BenchmarkSimpleFork-4                 12409    996096 ns/op
    BenchmarkUnsharePid-4                 10000   1065168 ns/op
    BenchmarkUnshareUser-4                10000   1061770 ns/op
    BenchmarkUnshareUts-4                 10000   1056558 ns/op
    BenchmarkUnshareCgroup-4              10000   1049446 ns/op
    BenchmarkUnshareIpc-4                   709  16114052 ns/op
    BenchmarkUnshareMount-4                 745  16207754 ns/op
    BenchmarkUnshareNet-4                  3643   3492924 ns/op
    BenchmarkFastUnshareMountPivot-4        612  20967318 ns/op
    BenchmarkUnshareAll-4                   837  14047995 ns/op
    BenchmarkUnshareMountPivot-4            488  24198331 ns/op
    PASS
    ok  github.com/criyle/go-sandbox/pkg/forkexec 147.186s

Container

    $ go test -bench . -benchtime 10s
    goos: linux
    goarch: amd64
    pkg: github.com/criyle/go-sandbox/container
    BenchmarkContainer-4       5907   2062070 ns/op
    PASS
    ok  github.com/criyle/go-sandbox/container 21.763s

Cgroup

    $ go test -bench . -benchtime 10s
    goos: linux
    goarch: amd64
    pkg: github.com/criyle/go-sandbox/pkg/cgroup
    BenchmarkCgroup-4      50283    245094 ns/op
    PASS
    ok  github.com/criyle/go-sandbox/pkg/cgroup 14.744s

Socket

Blocking:

    $ go test -bench . -benchtime 10s
    goos: linux
    goarch: amd64
    pkg: github.com/criyle/go-sandbox/pkg/unixsocket
    cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
    BenchmarkBaseline-8             12170148              1048 ns/op
    BenchmarkGoroutine-8             2658846              4910 ns/op
    BenchmarkChannel-8               8454133              1431 ns/op
    BenchmarkChannelBuffed-8         8767264              1357 ns/op
    BenchmarkChannelBuffed4-8        9670935              1230 ns/op
    BenchmarkEmptyGoroutine-8       34927512               342.8 ns/op
    PASS
    ok      github.com/criyle/go-sandbox/pkg/unixsocket     83.669s

Non-block:

    $ go test -bench . -benchtime 10s
    goos: linux
    goarch: amd64
    pkg: github.com/criyle/go-sandbox/pkg/unixsocket
    cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
    BenchmarkBaseline-8             11609772              1001 ns/op
    BenchmarkGoroutine-8             2470767              4788 ns/op
    BenchmarkChannel-8               8488646              1427 ns/op
    BenchmarkChannelBuffed-8         8876050              1345 ns/op
    BenchmarkChannelBuffed4-8        9813187              1212 ns/op
    BenchmarkEmptyGoroutine-8       34852828               342.2 ns/op
    PASS
    ok      github.com/criyle/go-sandbox/pkg/unixsocket     81.679s
