[原创]Linux内核PWN [BPF模块整数溢出] 漏洞分析-Pwn-看雪-安全社区|安全招聘|kanxue.com

[原创]Linux内核PWN [BPF模块整数溢出] 漏洞分析

发表于: 2021-2-27 23:45 13618

[原创]Linux内核PWN [BPF模块整数溢出] 漏洞分析

Roland_

2021-2-27 23:45

13618

写在前面：

本篇文章带来一个Linux内核的模块的整数溢出，非常适合CTF选手与实战接轨，考察点较为全面，也有可以深挖的地方～

本漏洞不涉及正式发行版，主要涉及Linux Kernel 4.20rc1-4.20rc4，我们使用Linux 4.20-rc3 进行分析。

首先在：https://gitlab.freedesktop.org/seanpaul/kernel/-/tags

下载对应rc版本源代码。编译。

BPF全称Berkeley Packet Filter，主要涉及包过滤这一部分，分析网络流量。他在数据链路层上提供了接口。BPF支持包过滤，允许用户态进程提供一个过滤程序，此程序指定了我们想要接收到那种包。比如当我们使用 tcpdump时可能只想收到tcp连接初始化的包等。BPF只返回进程提供的能通过过滤器的包。

这种机制避免了拷贝一些不需要的包从内核态到进程，极大的提升了性能。

For example, a tcpdump process may want to receive only packets that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies. This avoids copying unwanted packets from the operating system kernel) to the process, greatly improving performance.

常见的一些抓包工具的实现都与其有关。

值得一提的是，BPF模块也可以基于此对用户态的一些程序进行一些tracing或者访问控制之类的。

比如PWN手熟悉的seccomp沙箱，在设计时也借用了其思想，不同的是当seccomp使用时与BPF做packet过滤不同的是，过滤操作当前的user_regs_struct数据结构，需要我们预先定义好的一套syscall的过滤规则，以及对应的表示。

更加具体的，可以看一下：

Linux Socket Filtering aka Berkeley Packet Filter (BPF)

Seccomp BPF (SECure COMPuting with filters)

LINUX系统安全_SANDBOX_SECCOMP-BPF

源码位置：https://elixir.bootlin.com/linux/v4.20-rc3/source/kernel/bpf/syscall.c#L2466

我们想调用BPF功能：

追踪到其系统调用：

主要的，我们首先关心

当 cmd 为 BPF_MAP_CREATE 时，首先调用 map_create(&attr) 来创建map。

接下来看一下 map_create(&attr) 的源码。

可以看到调用了 find_and_alloc_map 来分配空间。

首先取了bpf_attr结构体中的 map_type 字段。赋值到u32的type。

而 bpf_map_types 是一个数组，在linux/bpf_types.h中储存了合法的 BPF_MAP_TYPE

如：BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)

ARRAY_SIZE是获取数组中的元素个数。__must_be_array(arr) 要求其必须被用在数组上。

这个宏在驱动编程时常常用来获取设备结构体中设备的个数。

所以此函数在开始阶段判断了type的合法性。若不合法return err。

接下来通过我们当前的type计算了一个数组下标，作为访问数组 bpf_map_types 的下标，找到对应的 bpf_map_type，然后获取对应的函数虚表为ops。

接下来有一个对于虚表中函数的调用：ops->map_alloc(attr)

注意，此时的map_alloc是一个虚表（bpf_map_ops）中的函数。

到此为止，一切正常，我们回过来看这个虚表ops。

虚表ops，类型为 const struct bpf_map_ops

初始化于：ops = bpf_map_types[type];

而数组中的每一项是以如下的一一对应的关系存在：

[_id] = &_ops,

数组中的每一项都由宏 BPF_MAP_TYPE 封装起来，对应id与ops

此时我们关注最下面的：

queue_map_ops为：

当我们将 attr->map_type 初始化为22时，对应着BPF_MAP_TYPE_QUEUE，此时会使用虚表 queue_map_ops

而这张虚表中的 .map_alloc = queue_stack_map_alloc 函数中，即存在我们的整数溢出。

ps：网上的分析在这里讲的很模糊，直接一下就到了对于 queue_stack_map_alloc 的分析，我这里尽量讲详细一些，实际上如果你不调整map type是到达不了此处的。

总结一下想要到达漏洞函数的条件：

设置 attr->map_type 为 22

通过 map_alloc_check 的检查。

此时是：

attr->map_ifindex 为空。

比如我们按照如下设置：

就可以正常到达 queue_stack_map_alloc

我们关注到这样一行：

size 本身作为一个unsigned int 32的类型，如果我们令 attr->max_entries = -1 ，那么size就会产生一个整数溢出。

可以看到size溢出为0了。

更进一步的，他控制了 bpf_map_area_alloc 分配的大小：

而 bpf_map_area_alloc有这样的一个调用：

以size为参数，通过kmalloc_node 分配内核空间为area。

这个size在正常情况下应该是：struct bpf_queue_stac 的大小加上 value_size * size 相当于是map中每一项的大小（value_size）乘项数+1（attr->max_entries + 1），而溢出导致了这一部分为0，也就是分配空间时只分配了struct bpf_queue_stack的空间而没有分配对应的map所需的空间。导致实际分配的内核堆空间过小。

bpf_queue_stack结构体定义如下：

关于bpf map的实际结构与作用可以看：BPF数据传递的桥梁——BPF Map（一）

总之这一步导致了堆空间分配过小，没有给map留够空间。

接下来调用了 bpf_map_init_from_attr(&qs->map, attr) 进行初始化：

最后从 find_and_alloc_map(attr) 中返回我们的map（struct bpf_map *）。

已知现在我们有了一个map，此map的value_size、max_entries均是可以由我们自定义的attr结构体控制。

并且由于整数溢出，导致只分配了struct bpf_queue_stack 的空间，而没有分配map对应的空间。（个人理解这个bpf_queue_stack类似报文头部，而map对应的空间类似payload）。

在分配结束后，返回了的map对象的地址就是我们通过kmalloc申请出的内核堆的位置。而这个申请的大小是0x100.

接下来，我们需要找到一块可以造成堆溢出的位置。我们将视角移出 map_create 函数。

在bpf系统调用中的大switch里，有这样一个分支：

根据名字进行合理猜测，在 BPF_MAP_CREATE 分支中我们创建了map对象。

而在BPF_MAP_UPDATE_ELEM 分支中我们对此对象进行更新等操作。

这个函数被我简化后如下：

在最后的时候调用虚表中的 map_push_elem 将value中的内容放入map。

他对应的虚表函数是queue_stack_map_push_elem

想要理解这个堆溢出，最好通过一个简图来说明。

这是bpf_queue_stack、bpf_map对应的结构。

我们在整数溢出时，只分配了map header的空间，而没有正常的分配map payload的空间。

所以，对于0x100大小的map header+map payload，如果我们要拷贝的大小（value_size）大于（0x100-map header）的大小，就会造成堆溢出

我们将目光放在整个map header的 struct bpf_map上。

可以看到其第一个成员就是虚表指针 ops ，换句话说，在我们kamlloc出的slab中的第一个位置就是指向当前map虚表的指针，如果我们能通过上方的slab堆溢出，劫持下方slab的虚表指针，再fake相应的vtable，就可以实现一套内核的执行流劫持。

经过我们测试

一开始对map的不完全分配的slab地址是 0xffff8880059fb300 ，然后我们memcpy的时候是向 0xffff8880059fb3d0 拷贝，也就是map payload是从偏移0xd0放置的。差值为0x100-0xd0 = 0x30，也就是只要我们拷贝大于0x30的数据，就可以实现溢出，而我们每update一次，拷贝的大小是最初的 attr->value_size 。

最终的攻击点我们选择在在fake vtable上伪造fake map_release函数指针，通过close对应的map id触发fake map_releas完成执行流劫持。

首先给出我自己写的exp

效果：

个人觉得我写的这个比网上的exp要略微优雅一些（bypass smep）

几个关键点：

实际上本例我认为还可以做到一个smap bypass，说一下想法orz。

注意到一开始的时候，我们使用了这样一条在内核中常用的栈迁移gadgets：xchg eax esp; ret 原本的想法是将其直接放到用户态mmap出来的fake ops上。但是这样是没法bypass smap的，因为涉及到向用户态取数据。

但是，如果我们借用ret2dir的思想，在用户态大量mmap喷射内存到phymaps上，然后将spray出来的空间填上我们的gadgets，然后在内核态找到phymaps，那么就可以直接从内核态取数据了（gadgets），这样就实现了一个smap的bypass～

我这里写了一个不完全版的bypass smap的exp

说一下思想，首先我借由https://bbs.pediy.com/thread-263992.htm的思路大致定位到了phymaps的位置。调整qemu内存为120M，增大几率。

接下来我尝试在用户态进行喷射，将喷射出的每一个堆块memset成其对应的编号。用这种方法，更进一步的，我们定位到了 0xffff888003015000处的phymaps，这里是phymaps最初的位置。

在定位结束后，调整参数正式进行喷射，将喷射出的空间部分初始化xchg_eax_esp成相关的地址，此时0xffff888003015000处的phymaps已经被喷满了gadgets的地址。

以上步骤做完后，我们可以将fake ops相当于放在内核态而非mmap出来的用户态，bypass了第一个会触发smap的地方。

具体的可以看这个函数：

但是由于使用了xchg这条gadgets，esp会被换到一个用户态的地址，而后面又接了ret指令，所以这里也会触发smap，可能需要换gadgets，但是暂时我没有找到比较好的，如果后面找到了会进行更新。理想情况下是找到一个不会碰到用户态的gadgets，然后把rop链也喷射到phymaps上。

不过借此温习一下ret2dir的思想还是非常有用的 : )

#include <linux/bpf.h>

　　　　int bpf(int cmd, union bpf_attr *attr, unsigned int size);

#include <linux/bpf.h>

　　　　int bpf(int cmd, union bpf_attr *attr, unsigned int size);

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{

    union bpf_attr attr = {};

    int err;
 
    if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))

        return -EPERM;
 
    err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);

    if (err)

        return err;

    size = min_t(u32, size, sizeof(attr));
 
    /* copy attributes from user space, may be less than sizeof(bpf_attr) */

    if (copy_from_user(&attr, uattr, size) != 0)

        return -EFAULT;
 
    err = security_bpf(cmd, &attr, size);

    if (err < 0)

        return err;
 
    switch (cmd) {

    case BPF_MAP_CREATE:

        err = map_create(&attr);

        break;

    case BPF_MAP_LOOKUP_ELEM:

        err = map_lookup_elem(&attr);

        break;

    case BPF_MAP_UPDATE_ELEM:

        err = map_update_elem(&attr);

        break;

    case BPF_MAP_DELETE_ELEM:

        err = map_delete_elem(&attr);

        break;

    case BPF_MAP_GET_NEXT_KEY:

        err = map_get_next_key(&attr);

        break;

    case BPF_PROG_LOAD:

        err = bpf_prog_load(&attr);

        break;

    case BPF_OBJ_PIN:

        err = bpf_obj_pin(&attr);

        break;

    case BPF_OBJ_GET:

        err = bpf_obj_get(&attr);

        break;

    case BPF_PROG_ATTACH:

        err = bpf_prog_attach(&attr);

        break;

    case BPF_PROG_DETACH:

        err = bpf_prog_detach(&attr);

        break;

    case BPF_PROG_QUERY:

        err = bpf_prog_query(&attr, uattr);

        break;

    case BPF_PROG_TEST_RUN:

        err = bpf_prog_test_run(&attr, uattr);

        break;

    case BPF_PROG_GET_NEXT_ID:

        err = bpf_obj_get_next_id(&attr, uattr,

                      &prog_idr, &prog_idr_lock);

        break;

    case BPF_MAP_GET_NEXT_ID:

        err = bpf_obj_get_next_id(&attr, uattr,

                      &map_idr, &map_idr_lock);

        break;

    case BPF_PROG_GET_FD_BY_ID:

        err = bpf_prog_get_fd_by_id(&attr);

        break;

    case BPF_MAP_GET_FD_BY_ID:

        err = bpf_map_get_fd_by_id(&attr);

        break;

    case BPF_OBJ_GET_INFO_BY_FD:

        err = bpf_obj_get_info_by_fd(&attr, uattr);

        break;

    case BPF_RAW_TRACEPOINT_OPEN:

        err = bpf_raw_tracepoint_open(&attr);

        break;

    case BPF_BTF_LOAD:

        err = bpf_btf_load(&attr);

        break;

    case BPF_BTF_GET_FD_BY_ID:

        err = bpf_btf_get_fd_by_id(&attr);

        break;

    case BPF_TASK_FD_QUERY:

        err = bpf_task_fd_query(&attr, uattr);

        break;

    case BPF_MAP_LOOKUP_AND_DELETE_ELEM:

        err = map_lookup_and_delete_elem(&attr);

        break;

    default:

        err = -EINVAL;

        break;

    }
 
    return err;
}

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)

{

union bpf_attr attr = {};

int err;

if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))

return -EPERM;

err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);

if (err)

return err;

size = min_t(u32, size, sizeof(attr));

/* copy attributes from user space, may be less than sizeof(bpf_attr) */

if (copy_from_user(&attr, uattr, size) != 0)

return -EFAULT;

err = security_bpf(cmd, &attr, size);

if (err < 0)

return err;

switch (cmd) {

case BPF_MAP_CREATE:

err = map_create(&attr);

break;

case BPF_MAP_LOOKUP_ELEM:

err = map_lookup_elem(&attr);

break;

case BPF_MAP_UPDATE_ELEM:

err = map_update_elem(&attr);

break;

case BPF_MAP_DELETE_ELEM:

err = map_delete_elem(&attr);

break;

case BPF_MAP_GET_NEXT_KEY:

err = map_get_next_key(&attr);

break;

case BPF_PROG_LOAD:

err = bpf_prog_load(&attr);

break;

case BPF_OBJ_PIN:

err = bpf_obj_pin(&attr);

break;

case BPF_OBJ_GET:

err = bpf_obj_get(&attr);

break;

case BPF_PROG_ATTACH:

err = bpf_prog_attach(&attr);

break;

case BPF_PROG_DETACH:

err = bpf_prog_detach(&attr);

break;

case BPF_PROG_QUERY:

err = bpf_prog_query(&attr, uattr);

break;

case BPF_PROG_TEST_RUN:

err = bpf_prog_test_run(&attr, uattr);

break;

case BPF_PROG_GET_NEXT_ID:

err = bpf_obj_get_next_id(&attr, uattr,

&prog_idr, &prog_idr_lock);

break;

case BPF_MAP_GET_NEXT_ID:

err = bpf_obj_get_next_id(&attr, uattr,

&map_idr, &map_idr_lock);

break;

case BPF_PROG_GET_FD_BY_ID:

err = bpf_prog_get_fd_by_id(&attr);

break;

case BPF_MAP_GET_FD_BY_ID:

err = bpf_map_get_fd_by_id(&attr);

break;

case BPF_OBJ_GET_INFO_BY_FD:

err = bpf_obj_get_info_by_fd(&attr, uattr);

break;

case BPF_RAW_TRACEPOINT_OPEN:

err = bpf_raw_tracepoint_open(&attr);

break;

case BPF_BTF_LOAD:

err = bpf_btf_load(&attr);

break;

case BPF_BTF_GET_FD_BY_ID:

err = bpf_btf_get_fd_by_id(&attr);

break;

case BPF_TASK_FD_QUERY:

err = bpf_task_fd_query(&attr, uattr);

break;

case BPF_MAP_LOOKUP_AND_DELETE_ELEM:

err = map_lookup_and_delete_elem(&attr);

break;

default:

err = -EINVAL;

break;

}

return err;

}

union bpf_attr attr = {};
......
case BPF_MAP_CREATE:

        err = map_create(&attr);

        break;

union bpf_attr attr = {};

......

case BPF_MAP_CREATE:

err = map_create(&attr);

break;

static int map_create(union bpf_attr *attr)
{

    int numa_node = bpf_map_attr_numa_node(attr);

    struct bpf_map *map;

    int f_flags;

    int err;
 
    err = CHECK_ATTR(BPF_MAP_CREATE);

    if (err)

        return -EINVAL;
 
    f_flags = bpf_get_file_flag(attr->map_flags);

    if (f_flags < 0)

        return f_flags;
 
    ......
 
    /* find map type and init map: hashtable vs rbtree vs bloom vs ... */

    map = find_and_alloc_map(attr);

    if (IS_ERR(map))

        return PTR_ERR(map);

  ......
}

static int map_create(union bpf_attr *attr)

{

int numa_node = bpf_map_attr_numa_node(attr);

struct bpf_map *map;

int f_flags;

int err;

err = CHECK_ATTR(BPF_MAP_CREATE);

if (err)

return -EINVAL;

f_flags = bpf_get_file_flag(attr->map_flags);

if (f_flags < 0)

return f_flags;

......

/* find map type and init map: hashtable vs rbtree vs bloom vs ... */

map = find_and_alloc_map(attr);

if (IS_ERR(map))

return PTR_ERR(map);

......

}

static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
{

    const struct bpf_map_ops *ops;

    u32 type = attr->map_type;

    struct bpf_map *map;

    int err;
 
    if (type >= ARRAY_SIZE(bpf_map_types))    

        return ERR_PTR(-EINVAL);

    type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));

    ops = bpf_map_types[type];

    if (!ops)

        return ERR_PTR(-EINVAL);
 
    if (ops->map_alloc_check) {

        err = ops->map_alloc_check(attr);

        if (err)

            return ERR_PTR(err);

    }

    if (attr->map_ifindex)

        ops = &bpf_map_offload_ops;

    map = ops->map_alloc(attr);

    if (IS_ERR(map))

        return map;

    map->ops = ops;

    map->map_type = type;

    return map;
}

static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)

{

const struct bpf_map_ops *ops;

u32 type = attr->map_type;

struct bpf_map *map;

int err;

if (type >= ARRAY_SIZE(bpf_map_types))

return ERR_PTR(-EINVAL);

type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));

ops = bpf_map_types[type];

if (!ops)

return ERR_PTR(-EINVAL);

if (ops->map_alloc_check) {

err = ops->map_alloc_check(attr);

if (err)

return ERR_PTR(err);

}

if (attr->map_ifindex)

ops = &bpf_map_offload_ops;

map = ops->map_alloc(attr);

if (IS_ERR(map))

return map;

map->ops = ops;

map->map_type = type;

return map;

}

__u32 map_type; /* one of enum bpf_map_type */

static const struct bpf_map_ops * const bpf_map_types[] = {
#define BPF_PROG_TYPE(_id, _ops)
#define BPF_MAP_TYPE(_id, _ops) \

    [_id] = &_ops,
#include <linux/bpf_types.h>
#undef BPF_PROG_TYPE
#undef BPF_MAP_TYPE
};

static const struct bpf_map_ops * const bpf_map_types[] = {

#define BPF_PROG_TYPE(_id, _ops)

#define BPF_MAP_TYPE(_id, _ops) \

[_id] = &_ops,

#include <linux/bpf_types.h>

#undef BPF_PROG_TYPE

#undef BPF_MAP_TYPE

};

/**

 * ARRAY_SIZE - get the number of elements in array @arr

 * @arr: array to be sized

 */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

/**

* ARRAY_SIZE - get the number of elements in array @arr

* @arr: array to be sized

*/

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

if (attr->map_ifindex)

        ops = &bpf_map_offload_ops;

map = ops->map_alloc(attr);

    if (IS_ERR(map))

        return map;

    map->ops = ops;

    map->map_type = type;

    return map;

if (attr->map_ifindex)

ops = &bpf_map_offload_ops;

map = ops->map_alloc(attr);

if (IS_ERR(map))

return map;

map->ops = ops;

map->map_type = type;

return map;

#define BPF_MAP_TYPE(_id, _ops)

BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) //BPF_MAP_TYPE_QUEUE为 22 (0x16)

const struct bpf_map_ops queue_map_ops = {

    .map_alloc_check = queue_stack_map_alloc_check,    

    .map_alloc = queue_stack_map_alloc,                            //关键函数

    .map_free = queue_stack_map_free,

    .map_lookup_elem = queue_stack_map_lookup_elem,

    .map_update_elem = queue_stack_map_update_elem,

    .map_delete_elem = queue_stack_map_delete_elem,

    .map_push_elem = queue_stack_map_push_elem,

    .map_pop_elem = queue_map_pop_elem,

    .map_peek_elem = queue_map_peek_elem,

    .map_get_next_key = queue_stack_map_get_next_key,
};

const struct bpf_map_ops queue_map_ops = {

.map_alloc_check = queue_stack_map_alloc_check,

.map_alloc = queue_stack_map_alloc, //关键函数

.map_free = queue_stack_map_free,

.map_lookup_elem = queue_stack_map_lookup_elem,

.map_update_elem = queue_stack_map_update_elem,

.map_delete_elem = queue_stack_map_delete_elem,

.map_push_elem = queue_stack_map_push_elem,

.map_pop_elem = queue_map_pop_elem,

.map_peek_elem = queue_map_peek_elem,

.map_get_next_key = queue_stack_map_get_next_key,

};

/* Called from syscall */

static int queue_stack_map_alloc_check(union bpf_attr *attr)
{

  /* check sanity of attributes */

  if (attr->max_entries == 0 || attr->key_size != 0 ||

      attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)

      return -EINVAL;
 
  if (attr->value_size > KMALLOC_MAX_SIZE)

      /* if value_size is bigger, the user space won't be able to

       * access the elements.

       */

      return -E2BIG;
 
  return 0;    //要到达这里
}

/* Called from syscall */

static int queue_stack_map_alloc_check(union bpf_attr *attr)

{

/* check sanity of attributes */

if (attr->max_entries == 0 || attr->key_size != 0 ||

attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)

return -EINVAL;

if (attr->value_size > KMALLOC_MAX_SIZE)

/* if value_size is bigger, the user space won't be able to

* access the elements.

*/

return -E2BIG;

return 0; //要到达这里

}

union bpf_attr *attr;

attr->map_type = BPF_MAP_TYPE_QUEUE;    //BPF_MAP_TYPE_QUEUE

attr->max_entries = -1;

attr->map_flags = 0;

attr->value_size = 0x20;

int ret = syscall(__NR_bpf,0,attr,0x30);

union bpf_attr *attr;

attr->map_type = BPF_MAP_TYPE_QUEUE; //BPF_MAP_TYPE_QUEUE

attr->max_entries = -1;

attr->map_flags = 0;

attr->value_size = 0x20;

int ret = syscall(__NR_bpf,0,attr,0x30);

static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
{

    int ret, numa_node = bpf_map_attr_numa_node(attr);

    struct bpf_queue_stack *qs;

    u32 size, value_size;

    u64 queue_size, cost;
 
    size = attr->max_entries + 1;

    value_size = attr->value_size;
 
    queue_size = sizeof(*qs) + (u64) value_size * size;
 
    cost = queue_size;

    if (cost >= U32_MAX - PAGE_SIZE)

        return ERR_PTR(-E2BIG);
 
    cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
 
    ret = bpf_map_precharge_memlock(cost);

    if (ret < 0)

        return ERR_PTR(ret);
 
    qs = bpf_map_area_alloc(queue_size, numa_node);

    if (!qs)

        return ERR_PTR(-ENOMEM);
 
    memset(qs, 0, sizeof(*qs));
 
    bpf_map_init_from_attr(&qs->map, attr);
 
    qs->map.pages = cost;

    qs->size = size;
 
    raw_spin_lock_init(&qs->lock);
 
    return &qs->map;
}

static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)

{

int ret, numa_node = bpf_map_attr_numa_node(attr);

struct bpf_queue_stack *qs;

u32 size, value_size;

u64 queue_size, cost;

size = attr->max_entries + 1;

value_size = attr->value_size;

queue_size = sizeof(*qs) + (u64) value_size * size;

cost = queue_size;

if (cost >= U32_MAX - PAGE_SIZE)

return ERR_PTR(-E2BIG);

cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;

ret = bpf_map_precharge_memlock(cost);

if (ret < 0)

return ERR_PTR(ret);

qs = bpf_map_area_alloc(queue_size, numa_node);

if (!qs)

return ERR_PTR(-ENOMEM);

memset(qs, 0, sizeof(*qs));

bpf_map_init_from_attr(&qs->map, attr);

qs->map.pages = cost;

qs->size = size;

raw_spin_lock_init(&qs->lock);

return &qs->map;

}

u32 size;

size = attr->max_entries + 1;

u32 size;

size = attr->max_entries + 1;

  ......

  struct bpf_queue_stack *qs;    

  u32 size, value_size;

  u64 queue_size, cost;
 
  size = attr->max_entries + 1;

  value_size = attr->value_size;
 
  queue_size = sizeof(*qs) + (u64) value_size * size;

  ......

qs = bpf_map_area_alloc(queue_size, numa_node);
......

bpf_map_init_from_attr(&qs->map, attr);

  ......

qs->size = size;

  return &qs->map;

......

struct bpf_queue_stack *qs;

u32 size, value_size;

u64 queue_size, cost;

size = attr->max_entries + 1;

value_size = attr->value_size;

queue_size = sizeof(*qs) + (u64) value_size * size;

......

qs = bpf_map_area_alloc(queue_size, numa_node);

......

bpf_map_init_from_attr(&qs->map, attr);

......

qs->size = size;

return &qs->map;

void *bpf_map_area_alloc(size_t size, int numa_node)
{

    ......

  area = kmalloc_node(size, GFP_USER | flags, numa_node);

        if (area != NULL)

            return area;

    ......
}

void *bpf_map_area_alloc(size_t size, int numa_node)

{

......

area = kmalloc_node(size, GFP_USER | flags, numa_node);

if (area != NULL)

return area;

......

}

struct bpf_queue_stack {

    struct bpf_map map;

    raw_spinlock_t lock;

    u32 head, tail;

    u32 size; /* max_entries + 1 */
 
    char elements[0] __aligned(8);
};

struct bpf_queue_stack {

struct bpf_map map;

raw_spinlock_t lock;

u32 head, tail;

u32 size; /* max_entries + 1 */

char elements[0] __aligned(8);

};

void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
{

    map->map_type = attr->map_type;

    map->key_size = attr->key_size;

    map->value_size = attr->value_size;

    map->max_entries = attr->max_entries;

    map->map_flags = attr->map_flags;

    map->numa_node = bpf_map_attr_numa_node(attr);
}

void bpf_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)

{

map->map_type = attr->map_type;

map->key_size = attr->key_size;

map->value_size = attr->value_size;

map->max_entries = attr->max_entries;

map->map_flags = attr->map_flags;

map->numa_node = bpf_map_attr_numa_node(attr);

}

/* find map type and init map: hashtable vs rbtree vs bloom vs ... */

    map = find_and_alloc_map(attr);

    if (IS_ERR(map))

        return PTR_ERR(map);

/* find map type and init map: hashtable vs rbtree vs bloom vs ... */

map = find_and_alloc_map(attr);

if (IS_ERR(map))

return PTR_ERR(map);

static int map_update_elem(union bpf_attr *attr)
{

    void __user *ukey = u64_to_user_ptr(attr->key);

    void __user *uvalue = u64_to_user_ptr(attr->value);

    int ufd = attr->map_fd;

    struct bpf_map *map;

    void *key, *value;

    u32 value_size;

    struct fd f;

    int err;

  ......

    f = fdget(ufd);

    map = __bpf_map_get(f);

    if (IS_ERR(map))

        return PTR_ERR(map);

    ......

    value_size = map->value_size;

    value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);    //kmalloc申请一块堆空间

    if (!value)

        goto free_key;

    err = -EFAULT;

    if (copy_from_user(value, uvalue, value_size) != 0)    //将用户的uvalue拷贝value_size字节到申请出的value上。

        goto free_value;

    ......

    else if (map->map_type == BPF_MAP_TYPE_QUEUE ||

           map->map_type == BPF_MAP_TYPE_STACK) {

        err = map->ops->map_push_elem(map, value, attr->flags);//将value的内容放入map

    ......
}

static int map_update_elem(union bpf_attr *attr)

{

void __user *ukey = u64_to_user_ptr(attr->key);

void __user *uvalue = u64_to_user_ptr(attr->value);

int ufd = attr->map_fd;

struct bpf_map *map;

void *key, *value;

u32 value_size;

struct fd f;

int err;

......

f = fdget(ufd);

map = __bpf_map_get(f);

if (IS_ERR(map))

return PTR_ERR(map);

......

value_size = map->value_size;

value = kmalloc(value_size, GFP_USER | __GFP_NOWARN); //kmalloc申请一块堆空间

if (!value)

goto free_key;

err = -EFAULT;

if (copy_from_user(value, uvalue, value_size) != 0) //将用户的uvalue拷贝value_size字节到申请出的value上。

goto free_value;

......

else if (map->map_type == BPF_MAP_TYPE_QUEUE ||

map->map_type == BPF_MAP_TYPE_STACK) {

err = map->ops->map_push_elem(map, value, attr->flags);//将value的内容放入map

......

}

/* Called from syscall or from eBPF program */

static int queue_stack_map_push_elem(struct bpf_map *map, void *value,

                     u64 flags)
{

    struct bpf_queue_stack *qs = bpf_queue_stack(map);

    unsigned long irq_flags;

    int err = 0;

    void *dst;
 
    /* BPF_EXIST is used to force making room for a new element in case the

     * map is full

     */

    bool replace = (flags & BPF_EXIST);
 
    /* Check supported flags for queue and stack maps */

    if (flags & BPF_NOEXIST || flags > BPF_EXIST)

        return -EINVAL;
 
    raw_spin_lock_irqsave(&qs->lock, irq_flags);
 
    if (queue_stack_map_is_full(qs)) {

        if (!replace) {

            err = -E2BIG;

            goto out;

        }

        /* advance tail pointer to overwrite oldest element */

        if (unlikely(++qs->tail >= qs->size))

            qs->tail = 0;

    }
 
    dst = &qs->elements[qs->head * qs->map.value_size]; //qs->head代表当前是第几个entry

    memcpy(dst, value, qs->map.value_size);                            //堆溢出位置。
 
    if (unlikely(++qs->head >= qs->size))

        qs->head = 0;
 
out:

    raw_spin_unlock_irqrestore(&qs->lock, irq_flags);

    return err;
}

/* Called from syscall or from eBPF program */

static int queue_stack_map_push_elem(struct bpf_map *map, void *value,

u64 flags)

{

struct bpf_queue_stack *qs = bpf_queue_stack(map);

unsigned long irq_flags;

int err = 0;

void *dst;

/* BPF_EXIST is used to force making room for a new element in case the

* map is full

*/

bool replace = (flags & BPF_EXIST);

/* Check supported flags for queue and stack maps */

if (flags & BPF_NOEXIST || flags > BPF_EXIST)

return -EINVAL;

raw_spin_lock_irqsave(&qs->lock, irq_flags);

if (queue_stack_map_is_full(qs)) {

if (!replace) {

err = -E2BIG;

goto out;

}

/* advance tail pointer to overwrite oldest element */

if (unlikely(++qs->tail >= qs->size))

qs->tail = 0;

}

dst = &qs->elements[qs->head * qs->map.value_size]; //qs->head代表当前是第几个entry

memcpy(dst, value, qs->map.value_size); //堆溢出位置。

if (unlikely(++qs->head >= qs->size))

qs->head = 0;

out:

raw_spin_unlock_irqrestore(&qs->lock, irq_flags);

return err;

}

struct bpf_queue_stack {

    struct bpf_map map;

    raw_spinlock_t lock;

    u32 head, tail;

    u32 size; /* max_entries + 1 */
 
    char elements[0] __aligned(8);
};
 
struct bpf_map {

    /* The first two cachelines with read-mostly members of which some

     * are also accessed in fast-path (e.g. ops, max_entries).

     */

    const struct bpf_map_ops *ops ____cacheline_aligned;

    struct bpf_map *inner_map_meta;
#ifdef CONFIG_SECURITY

    void *security;
#endif

    enum bpf_map_type map_type;

    u32 key_size;

    u32 value_size;

    u32 max_entries;

    u32 map_flags;

    u32 pages;

    u32 id;

    int numa_node;

    u32 btf_key_type_id;

    u32 btf_value_type_id;

    struct btf *btf;

    bool unpriv_array;

    /* 55 bytes hole */
 
    /* The 3rd and 4th cacheline with misc members to avoid false sharing

     * particularly with refcounting.

     */

    struct user_struct *user ____cacheline_aligned;

    atomic_t refcnt;

    atomic_t usercnt;

    struct work_struct work;

    char name[BPF_OBJ_NAME_LEN];
};

struct bpf_queue_stack {

struct bpf_map map;

raw_spinlock_t lock;

u32 head, tail;

u32 size; /* max_entries + 1 */

char elements[0] __aligned(8);

};

struct bpf_map {

/* The first two cachelines with read-mostly members of which some

* are also accessed in fast-path (e.g. ops, max_entries).

*/

const struct bpf_map_ops *ops ____cacheline_aligned;

struct bpf_map *inner_map_meta;

登录后可查看完整内容

[招生]系统0day安全班，企业级设备固件漏洞挖掘，Linux平台漏洞挖掘！

最后于 2021-2-28 21:17 被Roland_编辑，原因：

收藏・8

免费・11

支持

打赏 + 2.00雪花

kanxue

打赏次数 1

雪花 + 2.00

kanxue

+2.00

2021/02/28

精品文章～

最新回复 (4)
r0Cat 雪币： 8715 活跃值： (8619) 能力值： ( LV13，RANK：570 ) 在线值：发帖 40 回帖 190 粉丝 196 关注私信	r0Cat 7 2 楼针不戳 2021-2-28 10:03 0
kanxue 雪币： 50121 活跃值： (20750) 能力值： (RANK：350 ) 在线值：发帖 2375 回帖 17049 粉丝 553 关注私信	kanxue 8 3 楼感谢分享！ 2021-2-28 10:33 0
pureGavin 雪币： 14735 活跃值： (17849) 能力值： ( LV12，RANK：290 ) 在线值：发帖 85 回帖 1423 粉丝 277 关注私信	pureGavin 3 4 楼感谢分享 2021-2-28 22:05 0
xkaneiki 雪币： 220 能力值： ( LV1，RANK：0 ) 在线值：发帖 0 回帖 1 粉丝 0 关注私信	xkaneiki 5 楼大佬，你的4.20rc3的内核怎么编译的，我这里编译一直出错。 2021-4-12 22:32 0
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复

Roland_

发帖

102

回帖

710

RANK

关注

私信

他的文章

关于我们

联系我们

企业服务

看雪公众号

最新回复 (4)
r0Cat 雪币： 8715 活跃值： (8619) 能力值： ( LV13，RANK：570 ) 在线值：发帖 40 回帖 190 粉丝 196 关注私信	r0Cat 7 2 楼针不戳 2021-2-28 10:03 0
kanxue 雪币： 50121 活跃值： (20750) 能力值： (RANK：350 ) 在线值：发帖 2375 回帖 17049 粉丝 553 关注私信	kanxue 8 3 楼感谢分享！ 2021-2-28 10:33 0
pureGavin 雪币： 14735 活跃值： (17849) 能力值： ( LV12，RANK：290 ) 在线值：发帖 85 回帖 1423 粉丝 277 关注私信	pureGavin 3 4 楼感谢分享 2021-2-28 22:05 0
xkaneiki 雪币： 220 能力值： ( LV1，RANK：0 ) 在线值：发帖 0 回帖 1 粉丝 0 关注私信	xkaneiki 5 楼大佬，你的4.20rc3的内核怎么编译的，我这里编译一直出错。 2021-4-12 22:32 0
	游客登录 \| 注册方可回帖回帖表情雪币赚取及消费高级回复