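64-bit processors use more advanced address translation schemes. For example, Itanium processors allow so-called hardware-enforced Data Execution Prevention (DEP). Many people mistakenly believe that DEP is a feature of Windows XP SP2. It is not - DEP is a CPU feature that Windows XP SP2 can take advantage of. If Windows XP SP2 runs on a CPU that does not support DEP, DEP has no effect - on a processor without DEP there is no way to manually block the execution of machine instructions. Our discussion is based on 32-bit processors and a 4KB page size.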
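struct CallGateDescriptor {
    WORD offset_low;       // bits 0..15 of the target routine's offset
    WORD selector;         // code segment selector of the target routine
    BYTE param_count : 5;  // number of DWORD parameters copied to the new stack
    BYTE unused : 3;
    BYTE type : 5;         // descriptor type, including the S bit
    BYTE dpl : 2;          // privilege level required to use the gate
    BYTE present : 1;
    WORD offset_high;      // bits 16..31 of the target routine's offset
};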
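To make the bit layout of the structure above more tangible, here is a minimal sketch that packs the same fields into the two DWORDs of an 8-byte GDT entry. The helper name pack_call_gate() and the GateWords type are hypothetical, and the type value 0x0C (32-bit call gate) is taken from the Intel manuals rather than from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two DWORDs of an 8-byte GDT entry. */
typedef struct {
    uint32_t low;   /* offset_low | selector << 16 */
    uint32_t high;  /* param_count, type, dpl, present, offset_high */
} GateWords;

/* Packs a 32-bit call gate. The field positions mirror CallGateDescriptor. */
static GateWords pack_call_gate(uint32_t offset, uint16_t code_selector,
                                unsigned param_count, unsigned dpl)
{
    GateWords g;
    assert(param_count < 32 && dpl < 4);
    g.low  = (offset & 0xFFFFu) | ((uint32_t)code_selector << 16);
    g.high = (param_count & 0x1Fu)    /* bits 0..4: parameter count        */
           | (0x0Cu << 8)             /* bits 8..12: type 0x0C, S bit = 0  */
           | ((uint32_t)dpl << 13)    /* bits 13..14: DPL                  */
           | (1u << 15)               /* bit 15: present                   */
           | (offset & 0xFFFF0000u);  /* bits 16..31: offset_high          */
    return g;
}
```

With a DPL of 3 and one parameter, the second byte of the high DWORD comes out as 0xEC, which is the familiar access byte of a call gate that user-mode code is allowed to use.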
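Windows NT does not use LDTs and call gates. Although, for performance reasons, all user processes run within the context of a single task under Windows NT, the GDT still contains a few TSS descriptors. They are mainly reserved for "abnormal situations", i.e. system crashes - their task is to make sure that the system stays alive long enough to throw a blue screen before the CPU resets. We are not interested in them. What about segment descriptors? You may think that, since Windows uses the flat memory model, we do not have to care about segment descriptors.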
How does the system map IRQs to interrupt vectors and define their priority? It depends on whether your machine supports the Advanced Programmable Interrupt Controller (APIC). This can be discovered with the CPUID instruction and by reading the APIC_BASE_MSR model-specific register. If APIC is present and you execute CPUID with 1 in EAX, bit 9 of the EDX register will be set by this instruction. In order to find out whether APIC is enabled, you have to read the APIC_BASE_MSR model-specific register - bit 11 of it must be set if APIC is enabled. Unless your computer is completely outdated, I am 99.9% sure that APIC is present and enabled on your machine. If it is not, then the interrupt vector corresponding to a given IRQ equals 0x30+IRQ, so that the timer (IRQ0) interrupt vector is 0x30, the keyboard (IRQ1) interrupt vector is 0x31, etc. This is how Windows NT maps hardware interrupts if APIC is not present or is disabled. In such cases interrupt priority is implied by the IRQ - there is nothing that can be done here.
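The two checks and the fallback mapping described above boil down to a few bit tests. The sketch below works on values that have already been obtained with CPUID and RDMSR, so it can be read without any privileged code; the function names are hypothetical, and the limit of 16 IRQs in the fallback case is my assumption about the legacy interrupt controller pair.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_EDX_APIC     (1u << 9)   /* CPUID with EAX=1: EDX bit 9   */
#define APIC_BASE_ENABLE   (1u << 11)  /* APIC_BASE_MSR: bit 11         */
#define LEGACY_VECTOR_BASE 0x30u       /* vector of IRQ0 without APIC   */

/* Returns nonzero only if APIC is both present and enabled. */
static int apic_usable(uint32_t cpuid1_edx, uint32_t apic_base_msr_low)
{
    return (cpuid1_edx & CPUID_EDX_APIC) != 0 &&
           (apic_base_msr_low & APIC_BASE_ENABLE) != 0;
}

/* Vector that corresponds to an IRQ when APIC is absent or disabled. */
static uint32_t legacy_vector(uint32_t irq)
{
    assert(irq < 16);  /* assumption: 16 IRQ lines without IO APIC */
    return LEGACY_VECTOR_BASE + irq;
}
```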
If APIC is present and enabled, things become much more interesting to program. Every CPU in the system has its own local APIC, whose physical address is specified by the APIC_BASE_MSR model-specific register. The local APIC can be programmed by reading from and writing to its registers. For example, the processor's IRQL can be manipulated via the Task Priority register, which is located at the offset of 0x80 from the local APIC's base address - this is what KeRaiseIrql() and KeLowerIrql() do. If you want to raise an interrupt, you can do it via the Interrupt Command register, which is located at the offset of 0x300 from the local APIC's base address - this is what HalRequestSoftwareInterrupt() does. You can also specify whether you want the CPU to interrupt itself or whether you want the interrupt to be dispatched to all CPUs in the system. Local APIC programming is quite an extensive topic, so it is well beyond the scope of this article. If you need more information, I would strongly advise you to read Volume 3 of Intel Developer's Manual.
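As a small illustration of the offsets mentioned above, the following sketch derives the physical addresses of the Task Priority and Interrupt Command registers from the low DWORD of APIC_BASE_MSR. The helper names are hypothetical, and the mask assumes that the base address occupies bits 12 and above of the register, as the Intel manual describes it. Nothing here touches the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_APIC_TPR_OFFSET 0x80u   /* Task Priority register     */
#define LOCAL_APIC_ICR_OFFSET 0x300u  /* Interrupt Command register */

/* Strips the flag bits (0..11) and keeps the page-aligned base address. */
static uint32_t local_apic_base(uint32_t apic_base_msr_low)
{
    return apic_base_msr_low & 0xFFFFF000u;
}

static uint32_t local_apic_tpr_address(uint32_t apic_base_msr_low)
{
    return local_apic_base(apic_base_msr_low) + LOCAL_APIC_TPR_OFFSET;
}

static uint32_t local_apic_icr_address(uint32_t apic_base_msr_low)
{
    return local_apic_base(apic_base_msr_low) + LOCAL_APIC_ICR_OFFSET;
}
```

These are physical addresses, so kernel code would still have to map them to non-cached virtual memory before reading or writing the registers.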
All local APICs communicate with the IO APIC, which is located on the motherboard, via the APIC bus. The IO APIC maps IRQs to interrupt vectors, and it is able to map up to 24 interrupts. The IO APIC can be programmed by reading from and writing to its registers. These are the 32-bit ID Register (located at the offset of 0), the 32-bit Version Register (located at the offset of 0x1), the 32-bit Arbitration Register (located at the offset of 0x2), and 24 64-bit Redirection Table Registers, with every Redirection Table Register corresponding to some given IRQ. The location of the Redirection Table Register corresponding to any given IRQ can be calculated as 0x10+2*IRQ. If you want to know the binary layout of the Redirection Table, I suggest you read the Intel IOAPIC manual - we are interested only in the 8 low-order bits of a Redirection Table Register, because they indicate the interrupt vector that corresponds to the given IRQ. Interrupt priority can be calculated as vector/16, and, since operating system designers can map IRQs to interrupt vectors in any way they wish, they can assign any interrupt priority level to any given IRQ.
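The arithmetic from the paragraph above fits into three one-line helpers. This is only a sketch with hypothetical function names; it computes register indexes and priorities and does not access the IO APIC itself.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the low DWORD of the Redirection Table Register for an IRQ. */
static uint32_t redirection_index(uint32_t irq)
{
    assert(irq < 24);  /* IO APIC maps up to 24 interrupts */
    return 0x10u + 2u * irq;
}

/* The 8 low-order bits of the register hold the interrupt vector. */
static uint32_t vector_from_redirection_low(uint32_t low_dword)
{
    return low_dword & 0xFFu;
}

/* Interrupt priority is vector/16. */
static uint32_t vector_priority(uint32_t vector)
{
    assert(vector < 256);
    return vector / 16u;
}
```

For example, a Redirection Table Register whose low DWORD is 0x000100B1 describes vector 0xB1, which has priority 11.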
The IO APIC uses an indirect addressing scheme, which means that none of the above-mentioned registers can be accessed directly. How can they be accessed then? The IO APIC provides 2 direct access registers for this purpose. These are the IOREGSEL and IOWIN registers, located at the offsets of respectively 0 and 0x10 from the IO APIC's base address. The IO APIC is mapped to physical memory at the address 0xFEC00000. Although Intel allows operating system designers to relocate the IO APIC to some other physical address, Windows NT does not relocate it. Therefore, we will make a bold assumption that the IO APIC is located at the physical address 0xFEC00000 on your machine, so that the physical addresses of the IOREGSEL and IOWIN registers are respectively 0xFEC00000 and 0xFEC00010. In order to access these registers, you have to map them to non-cached memory. In order to read any indirect access register, you have to write its offset to the IOREGSEL register - a subsequent read of the IOWIN register will return the value of the target indirect access register. All reads are 32-bit. If you want to read the 32 low-order or the 32 high-order bits of the Redirection Table Register that corresponds to some given IRQ, you have to write respectively 0x10+2*IRQ or 0x10+2*IRQ+1 to the IOREGSEL register, and then read the IOWIN register in order to get the sought information.
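The select-then-read protocol is easier to follow on a model than on real hardware. The sketch below replaces the device with a hypothetical MockIoApic structure, so it can run in user mode; only the order of operations (write the index to IOREGSEL, then read IOWIN, once for each 32-bit half) reflects the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: 64 indirect 32-bit registers
   plus the index that was last written to IOREGSEL. */
typedef struct {
    uint32_t regs[64];
    uint32_t selected;
} MockIoApic;

static void mock_write_ioregsel(MockIoApic *dev, uint32_t index)
{
    assert(index < 64);
    dev->selected = index;
}

static uint32_t mock_read_iowin(const MockIoApic *dev)
{
    return dev->regs[dev->selected];
}

/* Reads both halves of the 64-bit Redirection Table Register of an IRQ. */
static uint64_t read_redirection_entry(MockIoApic *dev, uint32_t irq)
{
    uint32_t low, high;
    assert(irq < 24);
    mock_write_ioregsel(dev, 0x10u + 2u * irq);       /* low 32 bits  */
    low = mock_read_iowin(dev);
    mock_write_ioregsel(dev, 0x10u + 2u * irq + 1u);  /* high 32 bits */
    high = mock_read_iowin(dev);
    return ((uint64_t)high << 32) | low;
}
```

On a real machine the two mock accessors would become a 32-bit write to the mapped IOREGSEL register and a 32-bit read from the mapped IOWIN register.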
How are we going to map the IO APIC to virtual memory? If we used a regular driver, we would call MmMapIoSpace(). However, in our case things are slightly different. If the CPU treats our code as privileged, it does not necessarily imply that Windows always shares its opinion on the subject - everything depends on what you want to do. Some of ntoskrnl.exe's exports (for example, ExAllocatePool()) can be called by our code without the slightest problem, but MmMapIoSpace() is not among them - if our code calls MmMapIoSpace(), we will get a blue screen with the IRQL_NOT_LESS_OR_EQUAL error code. What are we going to do then? This is when our trick with mapping some page to the virtual address 0 comes in handy, so we are going to use it.
The code below maps IO APIC to the virtual address 0, and obtains interrupt vector that corresponds to some given IRQ:
//map ioapic - make sure that we map
//it to non-cached memory.
_asm {
    mov ebx,0xfec00000
    or ebx,0x13
    mov eax,0xc0000000
    mov dword ptr[eax],ebx
}
//now we are about to get
//interrupt vector
PULONG array=NULL;
//write 0x10+2*irq to IOREGSEL
array[0]=0x10+2*irq;
// subsequent read from IOWIN returns 32
// low-order bits of Redirection Table
// that corresponds to our IRQ.
// 8 low-order bits are interrupt vector,
// corresponding to our IRQ
DWORD vector=(array[4]&0xff);
As you can see, IO APIC programming is among those things that are more easily done than explained - so much explanation and only a few simple lines of code. But why did we choose to map the IO APIC to 0, rather than to some more conventional address? Just because the address 0 is guaranteed to be unused, so mapping the IO APIC to this address is the very first thing that comes to mind.
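The constants in the listing above are less magical than they look. The sketch below shows where 0x13 and 0xc0000000 come from, assuming 32-bit non-PAE paging with 4KB pages and assuming that the page tables are visible as a flat array starting at 0xC0000000. The helper names are hypothetical, and the code only computes values - it does not modify any page table.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT        (1u << 0)
#define PTE_WRITABLE       (1u << 1)
#define PTE_CACHE_DISABLE  (1u << 4)     /* PCD bit: non-cached page */
#define PTE_ARRAY_BASE     0xC0000000u   /* assumption: non-PAE layout */

/* Virtual address of the 4-byte page table entry that describes va. */
static uint32_t pte_address(uint32_t va)
{
    return PTE_ARRAY_BASE + (va >> 12) * 4u;
}

/* Page table entry that maps a physical page as present, writable
   and non-cached. */
static uint32_t make_uncached_pte(uint32_t physical_page)
{
    assert((physical_page & 0xFFFu) == 0);  /* must be page-aligned */
    return physical_page | PTE_PRESENT | PTE_WRITABLE | PTE_CACHE_DISABLE;
}
```

So writing make_uncached_pte(0xFEC00000), i.e. 0xFEC00013, to pte_address(0), i.e. 0xC0000000, is exactly what the ASM block does. Once the page is mapped at 0, array[0] is IOREGSEL and array[4] is IOWIN, because every element of a PULONG array is 4 bytes long and IOWIN sits at the byte offset 0x10.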
Putting it all together
Now let's put it all together. Look at the code below - it calls the kernel function:
// now we will get interrupt vectors
DWORD res;
DWORD resultarray[24];
ZeroMemory(resultarray,sizeof(resultarray));
for (x=0;x<25;x++){
    //let's call the function via the
    //call gate. Are you ready???
    WORD farcall[3];
    //the offset part is ignored when we
    //call via the call gate
    farcall[0] = farcall[1] = 0;
    farcall[2] = (selector<<3);
    _asm {
        mov ebx,x
        push ebx
        call fword ptr [farcall]
        mov res,eax
    }
    if(x==24)break;
    //if the return value is 500 and this
    //was not the final invocation,
    //apic is not present. Inform the user
    //about it, and that's it
    if(res==500) {
        MessageBox(GetDesktopWindow(), "APIC is not supported", "IRQs",MB_OK);
        break;
    }
    resultarray[x]=res;
}
There is no way to make a far call via the call gate in C, so we have no option other than calling the kernel function from ASM block. The client code in itself is straightforward - it pushes the value of IRQ on the stack, calls the kernel function via the call gate, and saves the result in the array. It does so for IRQs 0 to 23, plus makes a final invocation with non-existent IRQ24. Upon the receipt of 24 as a parameter, in order to make sure that no traces of our experiments are left anywhere, the kernel function cleans up the call gate in GDT. After having obtained all the information about all IRQs, we will inform the user about each IRQ with MessageBox(). I hope there is no need to list this code here.
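The farcall array used by the client code is simply a 6-byte far pointer: a 32-bit offset followed by a 16-bit selector. The sketch below shows how such a pointer could be assembled; the helper names are hypothetical, and the selector is built for a GDT entry (table indicator bit cleared), which is what shifting the index left by 3 amounts to.

```c
#include <assert.h>
#include <stdint.h>

/* Selector = index << 3 | table indicator (0 = GDT) << 2 | RPL. */
static uint16_t make_gdt_selector(unsigned gdt_index, unsigned rpl)
{
    assert(gdt_index < 8192 && rpl < 4);
    return (uint16_t)((gdt_index << 3) | rpl);
}

/* Fills a 3-word far pointer: words 0 and 1 hold the 32-bit offset,
   word 2 holds the selector. */
static void build_far_pointer(uint16_t out[3], uint32_t offset,
                              uint16_t selector)
{
    out[0] = (uint16_t)(offset & 0xFFFFu);
    out[1] = (uint16_t)(offset >> 16);
    out[2] = selector;
}
```

When the selector refers to a call gate, the CPU takes the target address from the gate descriptor, so the offset stored in the far pointer plays no role.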
void kernelfunction(DWORD usercs,DWORD irq){
    DWORD absent=0;
    BYTE gdtr[8];
    //check if APIC is
    //present and enabled
    if(irq<=23) {
        _asm {
            mov eax,1
            cpuid
            and edx,0x00000200
            cmp edx,0
            jne skip1
            mov absent,1
        skip1:
            mov ecx,0x1b
            rdmsr
            and eax,0x00000800
            cmp eax,0
            jne skip2
            mov absent,1
        }
        //if APIC is enabled, get vector
        //from it and return
    skip2:
        if(!absent) {
            //map ioapic - make sure that we
            //map it to non-cached memory.
            //Certainly, we have to do it only upon the
            //function's very first invocation,
            //i.e. when irq is 0
            if(!irq) {
                _asm {
                    mov ebx,0xfec00000
                    or ebx,0x13
                    mov eax,0xc0000000
                    mov dword ptr[eax],ebx
                }
            }
            //now we are about to get
            //interrupt vector
            PULONG array=NULL;
            //write 0x10+2*irq to IOREGSEL
            array[0]=0x10+2*irq;
            // subsequent read from IOWIN returns
            // 32 low-order bits of Redirection Table
            // that corresponds to our IRQ.
            // 8 low-order bits are interrupt vector,
            // corresponding to our IRQ
            DWORD vector=(array[4]&0xff);
            // return interrupt vector. Don't forget
            // that we must return with RETF,
            // and pop 4 bytes off the stack
            _asm {
                mov eax,vector
                mov esp,ebp
                pop ebp
                retf 4
            }
        }
    }
    //either apic is not supported, or irq is
    //above 23, i.e. this is the last invocation
    //therefore, clean up gdt and return 500
    _asm {
        //clean up gdt
        sgdt gdtr
        lea eax,gdtr
        mov ebx,dword ptr[eax+2]
        mov eax,0
        mov ax,selector
        shl eax,3
        add ebx,eax
        mov dword ptr[ebx],0
        mov dword ptr[ebx+4],0
        // adjust stack and return
        mov eax,500
        mov esp,ebp
        pop ebp
        retf 4
    }
}
Since our kernel function declares local variables, and, hence, needs a standard function prolog, it does not make sense to write it as a naked routine. Our function is supposed to take only 1 parameter, but, since it is going to get invoked via the call gate, the CPU will push the value of user-mode CS on the stack below the return address. Do you know a way of explaining it to the compiler? Me neither. Therefore, to make sure that the compiler generates the code properly, we present this extra value on the stack as a function parameter - we are going to ignore it anyway. If the IRQ parameter is 24, i.e. this is the function's final invocation, or if APIC is absent or disabled, kernelfunction() cleans up the GDT and returns with the error code 500. If everything is OK, it maps the IO APIC to the virtual address 0, obtains the interrupt vector corresponding to the IRQ parameter, and returns this vector. There is nothing special here. The only thing worth mentioning is that we have to restore the EBP and ESP registers before we return - this is very important. It is understandable that we have to return with the RETF instruction, and to pop 4 bytes off the stack.
There is one more thing left to deal with - we have to make sure that our code is suitable for running on both uni-processor and SMP machines. With the advent of hyperthreading technology (HT), we should always assume that our code may run on an SMP machine - a CPU that supports HT is treated as two independent processors by Windows, and, as far as I know, Intel does not produce CPUs without support for HT any more. Under Windows running on an SMP machine, every CPU has its own GDT, and any thread may run on any CPU in the system by default. I hope you can imagine the mess we are guaranteed to create by allowing our code to be executed by different processors - we may set up a call gate while running on CPU A and try to enter the kernel while running on CPU B. Therefore, we have to prevent our code from running on more than one processor. This can be done in the following way (go() is the function that runs all the user-mode code that you've seen in this article):
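A minimal sketch of such pinning is given here, under a few assumptions: the helper names lowest_cpu_mask() and run_on_single_cpu() are hypothetical, go() is assumed to take no arguments and return nothing, and the choice of the lowest processor allowed for the process is mine. The Windows-specific part is guarded so that the mask arithmetic can be compiled and checked anywhere.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Keeps only the lowest set bit of the mask, i.e. selects one CPU. */
static uintptr_t lowest_cpu_mask(uintptr_t allowed)
{
    assert(allowed != 0);
    return allowed & (~allowed + 1);
}

#ifdef _WIN32
extern void go(void);  /* assumption: go() has this signature */

/* Pins the current thread to a single CPU, then runs go().
   Returns 1 on success, 0 if the affinity could not be set. */
static int run_on_single_cpu(void)
{
    DWORD_PTR process_mask, system_mask;
    if (!GetProcessAffinityMask(GetCurrentProcess(),
                                &process_mask, &system_mask))
        return 0;
    if (SetThreadAffinityMask(GetCurrentThread(),
                              lowest_cpu_mask(process_mask)) == 0)
        return 0;
    go();
    return 1;
}
#endif
```

A simpler variant would pass a fixed mask of 1 to SetThreadAffinityMask(), which selects the first processor in the system; asking for the process affinity mask first only guards against the case where that processor is not available to the process.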
If the target machine has more than one processor, calling SetThreadAffinityMask() makes sure that our code is allowed to run only on the processor that we have specified. Calling SetThreadAffinityMask() on a uni-processor machine does not result in an error - this call just has no effect. Therefore, the above adjustment is suitable for both uni-processor and SMP machines.
Conclusion
In conclusion, I want to say a few words of warning. First of all, the functionality of our privileged code will always be limited, compared to that of a conventional kernel-mode driver. You already know that not all of ntoskrnl.exe's exports may be safely called by our code (MmMapIoSpace() is just one example). Therefore, you should use this approach sparingly. Second, I would not advise you to use any of these tricks in production applications - they are intended to be used only in the development of analysis tools, intrusion detection systems and other "unsupported" software. The problem with all unsupported tricks is that they may be system-specific. To make things even worse, they may be hardware-specific - code that does not pose even the slightest problem on machine A can crash machine B, even if they both run the same version of Windows. Therefore, if your code works perfectly well on your development machine, it is too early to celebrate victory - you never know how it may behave on some other platform.
The sample application has been thoroughly tested on my machine, which runs Windows XP SP2 - it works perfectly well and does not seem to pose even the slightest problem. However, I really don't know how it is going to behave on your system - it is your task to find out. If something goes wrong, don't hesitate to inform me about it. In such cases, it would be great if you could provide me with some info about your machine (CPU, motherboard, OS version, etc.), as well as a description of the problem - who knows, maybe there are some more hidden bugs to be fixed. In order to run the sample, the only thing you have to do is click on its bitmap, and then wait for the message boxes - it may take a few seconds before they pop up, so you have to be patient.
I would highly appreciate it if you sent me an e-mail with your comments and suggestions.