[Original] What is runC?
Posted: 2022-1-11 11:19

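A container runtime is the tool that manages and runs containers. There are many container tools today, such as docker and rkt, but if every tool used its own runtime, the container field as a whole would struggle to develop. Several container vendors therefore jointly defined standards for the container image format and the container runtime, namely the Open Container Initiative (OCI).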

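An OCI Bundle is a set of files that satisfies the OCI standard. These files contain all the data needed to run a container and are stored in one shared directory, which contains two items: the config.json configuration file and the container's root filesystem.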

runC's main code logic is built on libcontainer, which was one of the foundations of early docker and was wrapped a second time to fit the OCI format.

The rest of this post takes runc create as the example and walks through its main operations.

The data structures behind a container start with the BaseContainer interface, which declares every operation a container needs.

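On the Linux platform this interface is wrapped further: the Container interface embeds BaseContainer and adds Linux-specific methods such as Checkpoint, Restore, Pause and Resume.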

Another important interface is Factory.

It also has an implementation for the Linux platform, LinuxFactory.

The concrete implementation of Create in LinuxFactory simply builds a linuxContainer, which corresponds to the Linux container interface described above.

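As createContainer shows, the config is loaded first, then loadFactory creates the LinuxFactory, and finally factory.Create(id, config) is called and returns a linuxContainer. loadFactory is the key step: at its end it calls libcontainer.New() to return the LinuxFactory, and New is where InitPath is set (InitPath matters a great deal later on).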
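The way New applies its option functions and the way Create hands the init settings to the container can be sketched in a few lines of Go. This is a simplified illustration, not the libcontainer code: demoFactory, demoContainer, newFactory and withInitArgs are hypothetical names, and the real Create also validates the config and sets up state directories.

```go
package main

import (
	"errors"
	"fmt"
)

// demoFactory is a hypothetical stand-in for LinuxFactory with only the
// fields this sketch needs.
type demoFactory struct {
	Root     string
	InitPath string
	InitArgs []string
}

// demoContainer is a hypothetical stand-in for linuxContainer.
type demoContainer struct {
	id       string
	initPath string
	initArgs []string
}

// newFactory fills in defaults, then lets each option override them.
func newFactory(root string, options ...func(*demoFactory) error) (*demoFactory, error) {
	f := &demoFactory{
		Root:     root,
		InitPath: "/proc/self/exe",
		InitArgs: []string{"runc", "init"},
	}
	for _, opt := range options {
		if opt == nil {
			continue
		}
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

// withInitArgs is a hypothetical option that replaces the init arguments.
func withInitArgs(args ...string) func(*demoFactory) error {
	return func(f *demoFactory) error {
		if len(args) == 0 {
			return errors.New("init args must not be empty")
		}
		f.InitArgs = args
		return nil
	}
}

// Create copies the factory's init settings into the container it returns.
func (f *demoFactory) Create(id string) (*demoContainer, error) {
	if id == "" {
		return nil, errors.New("container id must not be empty")
	}
	return &demoContainer{id: id, initPath: f.InitPath, initArgs: f.InitArgs}, nil
}

func main() {
	f, err := newFactory("/run/demo", withInitArgs("demo", "init"))
	if err != nil {
		fmt.Println("factory error:", err)
		return
	}
	c, err := f.Create("c1")
	if err != nil {
		fmt.Println("create error:", err)
		return
	}
	fmt.Println(c.id, c.initPath, c.initArgs)
}
```

The point of the pattern is that the defaults ("/proc/self/exe" plus "init") live in one place, and every container created by the factory inherits them.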

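During LinuxFactory's Create, InitPath and InitArgs are passed on to the linuxContainer. Now that we know how a linuxContainer is created, we return to startContainer. At its end this function builds a runner struct and calls its run method with spec.Process as the argument; spec.Process is the process information from config.json.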

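In the run method, newProcess first builds a libcontainer.Process struct using config.json as the template; process settings such as rlimits and Capabilities are filled in at this point. run then performs one of three operations depending on the action: Start for CT_ACT_CREATE, Restore for CT_ACT_RESTORE, and Run for CT_ACT_RUN.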

Most of the content of the Process struct comes from the config.json file.

The Start method mainly creates a fifo pipe (it is used for blocking and comes up again later) and then calls the internal start method.
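Why a fifo is useful for blocking can be seen in a small Go sketch. Opening a fifo for writing blocks until another party opens it for reading, so one side can park itself until the other side decides to continue. The assumptions here are loud ones: both sides are goroutines in a single process (in runC they are separate processes), the helper names are hypothetical, and the code runs only on Linux or another Unix that provides mkfifo.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// waitOnFifo plays the role of the waiting side: opening a fifo for writing
// blocks until some other party opens it for reading.
func waitOnFifo(path string, released chan<- error) {
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		released <- err
		return
	}
	defer f.Close()
	// Once unblocked, send a single byte to signal that we are moving on.
	_, err = f.Write([]byte("0"))
	released <- err
}

// releaseFifo plays the role of a later "start" step: opening the fifo for
// reading unblocks the writer, and the byte it sends is returned.
func releaseFifo(path string) (string, error) {
	f, err := os.OpenFile(path, os.O_RDONLY, 0)
	if err != nil {
		return "", err
	}
	defer f.Close()
	buf := make([]byte, 1)
	n, err := f.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

// runFifoDemo creates the fifo, starts the blocked writer, then releases it.
func runFifoDemo() (string, error) {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "exec.fifo")
	if err := syscall.Mkfifo(path, 0o600); err != nil {
		return "", err
	}

	released := make(chan error, 1)
	go waitOnFifo(path, released)

	got, err := releaseFifo(path)
	if err != nil {
		return "", err
	}
	if err := <-released; err != nil {
		return "", err
	}
	return got, nil
}

func main() {
	got, err := runFifoDemo()
	fmt.Println(got, err)
}
```

Until releaseFifo runs, the writer goroutine stays stuck inside OpenFile, which is the blocking behaviour the fifo is created for.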

The first step of that method obtains an initProcess struct, which implements the parentProcess interface and is created by linuxContainer's newInitProcess function.

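Throughout newParentProcess, a socket pair and a pipe pair are created first, and the child ends (childsock and childpipe) are used to build a cmd template. The command this template executes is exactly the path set earlier in InitPath ("/proc/self/exe" with "init", which means runC executes itself with the argument init). The sock and pipe exist to carry data between cmd and the parent process: they are placed in cmd.ExtraFiles, and the corresponding file descriptor numbers are placed in environment variables. Next comes a check on whether the process is the init process. If it is not, newSetnsProcess is called and returns a setnsProcess struct, which also implements the parentProcess interface; newSetnsProcess is mainly used to create a new process inside an existing container.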

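Next the includeExecFifo() method runs: it opens the exec.fifo file created earlier and stores it in cmd.ExtraFiles and in the environment variables. Finally the most important function, newInitProcess, is called to create the initProcess struct.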

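That function first sets the environment variable that marks the init type as standard, then reads from config.json the namespaces that need to be created and stores this data, and then creates the initProcess struct. The shouldSendMountSources step in the middle needs no special attention; it exists to support mounting certain directories. At this point the parentProcess struct is essentially fully set up.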

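The start method next calls parentProcess's start() function, which here is the start function implemented by the initProcess struct. It launches the /proc/self/exe process configured earlier with the argument init, applies the cgroup settings to that process, and then sends information to the child over the sock. The crucial part is that a runC init child process is started. The container being created may need new namespaces, and running runC init in a child process makes it easy to switch namespaces with setns(). However, setns is not allowed to be used in a multithreaded process, and the Go runtime is multithreaded, so the namespaces must be set before the Go runtime starts. This is why cgo is used: C code sets the namespaces before the Go runtime starts.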

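In the cgo code, the pipe is first obtained from the environment variable (which the parent process set earlier, as shown above), and the config information sent by the parent is read in netlink message format. The code then creates another socket pair so that it can communicate with the grandchild process. After that, working as a state machine, it uses clone to create a process with the namespaces specified in config.json, and the original child process exits with exit(0).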

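Back in create, once the init process has been started it is placed under cgroup limits, which also helps prevent the child process from escaping those limits later. The parent process then sends the bootstrapData to the init process. After that, create obtains the pid of the child process that init created, and receives over the pipe the fds that the child opened and saves them. After a series of further settings, sendConfig sends the information about the process to execute from config.json. What follows is container initialization and the execution of the process configured in config.json; for the details, see the Init function of linuxStandardInit in standard_init_linux.go. That concludes this rough analysis of how a container starts.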

References:

https://segmentfault.com/a/1190000017576314#item-1

https://github.com/opencontainers/runc

 
 
type BaseContainer interface {
    // Returns the ID of the container
    ID() string
 
    // Returns the current status of the container.
    Status() (Status, error)
 
    // State returns the current container's state information.
    State() (*State, error)
 
    // OCIState returns the current container's state information.
    OCIState() (*specs.State, error)
 
    // Returns the current config of the container.
    Config() configs.Config
 
    // Returns the PIDs inside this container. The PIDs are in the namespace of the calling process.
    //
    // Some of the returned PIDs may no longer refer to processes in the Container, unless
    // the Container state is PAUSED in which case every PID in the slice is valid.
    Processes() ([]int, error)
 
    // Returns statistics for the container.
    Stats() (*Stats, error)
 
    // Set resources of container as configured
    //
    // We can use this to change resources when containers are running.
    //
    Set(config configs.Config) error
 
    // Start a process inside the container. Returns error if process fails to
    // start. You can track process lifecycle with passed Process structure.
    Start(process *Process) (err error)
 
    // Run immediately starts the process inside the container.  Returns error if process
    // fails to start.  It does not block waiting for the exec fifo  after start returns but
    // opens the fifo after start returns.
    Run(process *Process) (err error)
 
    // Destroys the container, if its in a valid state, after killing any
    // remaining running processes.
    //
    // Any event registrations are removed before the container is destroyed.
    // No error is returned if the container is already destroyed.
    //
    // Running containers must first be stopped using Signal(..).
    // Paused containers must first be resumed using Resume(..).
    Destroy() error
 
    // Signal sends the provided signal code to the container's initial process.
    //
    // If all is specified the signal is sent to all processes in the container
    // including the initial process.
    Signal(s os.Signal, all bool) error
 
    // Exec signals the container to exec the users process at the end of the init.
    Exec() error
}
// Container is a libcontainer container object.
//
// Each container is thread-safe within the same process. Since a container can
// be destroyed by a separate process, any function may return that the container
// was not found.
type Container interface {
    BaseContainer
 
    // Methods below here are platform specific
 
    // Checkpoint checkpoints the running container's state to disk using the criu(8) utility.
    Checkpoint(criuOpts *CriuOpts) error
 
    // Restore restores the checkpointed container to a running state using the criu(8) utility.
    Restore(process *Process, criuOpts *CriuOpts) error
 
    // If the Container state is RUNNING or CREATED, sets the Container state to PAUSING and pauses
    // the execution of any user processes. Asynchronously, when the container finished being paused the
    // state is changed to PAUSED.
    // If the Container state is PAUSED, do nothing.
    Pause() error
 
    // If the Container state is PAUSED, resumes the execution of any user processes in the
    // Container before setting the Container state to RUNNING.
    // If the Container state is RUNNING, do nothing.
    Resume() error
 
    // NotifyOOM returns a read-only channel signaling when the container receives an OOM notification.
    NotifyOOM() (<-chan struct{}, error)
 
    // NotifyMemoryPressure returns a read-only channel signaling when the container reaches a given pressure level
    NotifyMemoryPressure(level PressureLevel) (<-chan struct{}, error)
}
type Factory interface {
    // Creates a new container with the given id and starts the initial process inside it.
    // id must be a string containing only letters, digits and underscores and must contain
    // between 1 and 1024 characters, inclusive.
    //
    // The id must not already be in use by an existing container. Containers created using
    // a factory with the same path (and filesystem) must have distinct ids.
    //
    // Returns the new container with a running process.
    //
    // On error, any partially created container parts are cleaned up (the operation is atomic).
    Create(id string, config *configs.Config) (Container, error)
 
    // Load takes an ID for an existing container and returns the container information
    // from the state.  This presents a read only view of the container.
    Load(id string) (Container, error)
 
    // StartInitialization is an internal API to libcontainer used during the reexec of the
    // container.
    StartInitialization() error
 
    // Type returns info string about factory type (e.g. lxc, libcontainer...)
    Type() string
}
// LinuxFactory implements the default factory interface for linux based systems.
type LinuxFactory struct {
    // Root directory for the factory to store state.
    Root string
 
    // InitPath is the path for calling the init responsibilities for spawning
    // a container.
    InitPath string
 
    // InitArgs are arguments for calling the init responsibilities for spawning
    // a container.
    InitArgs []string
 
    // CriuPath is the path to the criu binary used for checkpoint and restore of
    // containers.
    CriuPath string
 
    // New{u,g}idmapPath is the path to the binaries used for mapping with
    // rootless containers.
    NewuidmapPath string
    NewgidmapPath string
 
    // Validator provides validation to container configurations.
    Validator validate.Validator
 
    // NewIntelRdtManager returns an initialized Intel RDT manager for a single container.
    NewIntelRdtManager func(config *configs.Config, id string, path string) intelrdt.Manager
}
type linuxContainer struct {
    id                   string
    root                 string
    config               *configs.Config
    cgroupManager        cgroups.Manager
    intelRdtManager      intelrdt.Manager
    initPath             string
    initArgs             []string
    initProcess          parentProcess
    initProcessStartTime uint64
    criuPath             string
    newuidmapPath        string
    newgidmapPath        string
    m                    sync.Mutex
    criuVersion          int
    state                containerState
    created              time.Time
    fifo                 *os.File
}
func createContainer(context *cli.Context, id string, spec *specs.Spec) (libcontainer.Container, error) {
    rootlessCg, err := shouldUseRootlessCgroupManager(context)
    if err != nil {
        return nil, err
    }
    config, err := specconv.CreateLibcontainerConfig(&specconv.CreateOpts{
        CgroupName:       id,
        UseSystemdCgroup: context.GlobalBool("systemd-cgroup"),
        NoPivotRoot:      context.Bool("no-pivot"),
        NoNewKeyring:     context.Bool("no-new-keyring"),
        Spec:             spec,
        RootlessEUID:     os.Geteuid() != 0,
        RootlessCgroups:  rootlessCg,
    })
    if err != nil {
        return nil, err
    }
 
    factory, err := loadFactory(context)
    if err != nil {
        return nil, err
    }
    return factory.Create(id, config)
}
// New returns a linux based container factory based in the root directory and
// configures the factory with the provided option funcs.
func New(root string, options ...func(*LinuxFactory) error) (Factory, error) {
    if root != "" {
        if err := os.MkdirAll(root, 0o700); err != nil {
            return nil, err
        }
    }
    l := &LinuxFactory{
        Root:      root,
        InitPath:  "/proc/self/exe",
        InitArgs:  []string{os.Args[0], "init"},
        Validator: validate.New(),
        CriuPath:  "criu",
    }
 
    for _, opt := range options {
        if opt == nil {
            continue
        }
        if err := opt(l); err != nil {
            return nil, err
        }
    }
    return l, nil
}
 
switch r.action {
case CT_ACT_CREATE:
    err = r.container.Start(process)
case CT_ACT_RESTORE:
    err = r.container.Restore(process, r.criuOpts)
case CT_ACT_RUN:
    err = r.container.Run(process)
default:
    panic("Unknown action")
}