Professional Diary: January 2012

Structure of an IRP

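MdlAddress (PMDL) is the address of a memory descriptor list (MDL) that describes the user-mode buffer associated with this request. The I/O Manager creates this MDL for IRP_MJ_READ and IRP_MJ_WRITE requests if the topmost device object’s flags indicate DO_DIRECT_IO.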

Flags (ULONG) contains flags that a device driver can read but not directly alter. None of these flags are relevant to a Windows Driver Model (WDM) driver.

AssociatedIrp (union) is a union of three possible pointers. The alternative that a typical WDM driver might want to access is named AssociatedIrp.SystemBuffer. The SystemBuffer pointer holds the address of a data buffer in nonpaged kernel-mode memory.

RequestorMode will equal one of the enumeration constants UserMode or KernelMode, depending on where the I/O request originated.

Map of the Tail union in an IRP

I/O stack location data structure

The “Standard Model” for IRP Processing

Creating an IRP:
You can use any of four functions to create a new IRP:

IoBuildAsynchronousFsdRequest builds an IRP on whose completion you don’t plan to wait. This function and the next are appropriate for building only certain types of IRP.
IoBuildSynchronousFsdRequest builds an IRP on whose completion you do plan to wait.
IoBuildDeviceIoControlRequest builds a synchronous IRP_MJ_DEVICE_CONTROL or IRP_MJ_INTERNAL_DEVICE_CONTROL request.
IoAllocateIrp builds an asynchronous IRP of any type.

Creating Synchronous IRPs
1. If the owning thread terminates, the I/O Manager automatically cancels any pending synchronous IRPs that belong to that thread.
2. Because the creating thread owns a synchronous IRP, you shouldn’t create one in an arbitrary thread. You most emphatically do not want the I/O Manager to cancel the IRP because that thread happens to terminate.
3. Following a call to IoCompleteRequest, the I/O Manager automatically cleans up a synchronous IRP and signals an event that you must provide.
4. You must take care that the event object still exists at the time the I/O Manager signals it.

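You must call IoBuildSynchronousFsdRequest and IoBuildDeviceIoControlRequest at PASSIVE_LEVEL only. In particular, you must not be at APC_LEVEL (for example, because you own a fast mutex), since the I/O Manager then can’t deliver the special kernel APC that completes a synchronous IRP.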

If you need to synchronize IRPs sent to another driver, consider the following alternatives:

Use a regular kernel mutex instead of an executive fast mutex. The regular mutex leaves you at PASSIVE_LEVEL and doesn’t inhibit special kernel APCs.
Use KeEnterCriticalRegion to inhibit all but special kernel APCs, and then use ExAcquireFastMutexUnsafe to acquire the mutex.
Use an asynchronous IRP. Signal an event in the completion routine.

A final consideration in calling the two synchronous IRP routines is that you can’t create just any kind of IRP using these routines. See Table 5-1 for the details. A common trick for creating another kind of synchronous IRP is to ask for an IRP_MJ_SHUTDOWN, which has no parameters, and then alter the MajorFunction code in the first stack location.

Table 5-1. Synchronous IRP Types
Support Function	Types of IRP You Can Create
IoBuildSynchronousFsdRequest	IRP_MJ_READ, IRP_MJ_WRITE, IRP_MJ_FLUSH_BUFFERS, IRP_MJ_SHUTDOWN, IRP_MJ_PNP, IRP_MJ_POWER (but only for IRP_MN_POWER_SEQUENCE)
IoBuildDeviceIoControlRequest	IRP_MJ_DEVICE_CONTROL, IRP_MJ_INTERNAL_DEVICE_CONTROL

Creating Asynchronous IRPs

The other two IRP creation functions, IoBuildAsynchronousFsdRequest and IoAllocateIrp, create an asynchronous IRP. Asynchronous IRPs don’t belong to the creating thread, and the I/O Manager neither schedules an APC nor cleans up when the IRP completes.

When a thread terminates, the I/O Manager doesn’t try to cancel any asynchronous IRPs that you happen to have created in that thread.
It’s OK to create asynchronous IRPs in an arbitrary or nonarbitrary thread.
Because the I/O Manager doesn’t do any cleanup when the IRP completes, you must provide a completion routine that will release buffers and call IoFreeIrp to release the memory used by the IRP.
Because the I/O Manager doesn’t automatically cancel asynchronous IRPs, you might have to provide code to do that when you no longer want the operation to occur.
Because you don’t wait for an asynchronous IRP to complete, you can create and send one at IRQL <= DISPATCH_LEVEL (assuming, that is, that the driver to which you send the IRP can handle the IRP at elevated IRQL; you must check the specifications for that driver!). Furthermore, it’s OK to create and send an asynchronous IRP while owning a fast mutex.

Refer to Table 5-2 for a list of the types of IRP you can create using the two asynchronous IRP routines. Note that IoBuildSynchronousFsdRequest and IoBuildAsynchronousFsdRequest support the same IRP major function codes.

Table 5-2. Asynchronous IRP Types
Support Function	Types of IRP You Can Create
IoBuildAsynchronousFsdRequest	IRP_MJ_READ, IRP_MJ_WRITE, IRP_MJ_FLUSH_BUFFERS, IRP_MJ_SHUTDOWN, IRP_MJ_PNP, IRP_MJ_POWER (but only for IRP_MN_POWER_SEQUENCE)
IoAllocateIrp	Any (but you must initialize the MajorFunction field of the first stack location)

Forwarding to a Dispatch Routine

After you create an IRP, you call IoGetNextIrpStackLocation to obtain a pointer to the first stack location. Then you initialize just that first location. If you’ve used IoAllocateIrp to create the IRP, you need to fill in at least the MajorFunction code. If you’ve used another of the four IRP-creation functions, the I/O Manager might have already done the required initialization. You might then be able to skip this step, depending on the rules for that particular type of IRP. Having initialized the stack, you call IoCallDriver to send the IRP to a device driver.

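The I/O Manager initializes the stack location pointer in the IRP to point one location before the actual first location. That is why you ask for the “next” stack location when you want to initialize the first one.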
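
The stack-pointer convention can be illustrated with a small user-mode model. This is only a sketch: ModelIrp, ModelStackLocation, and the model_* functions are hypothetical stand-ins rather than WDK definitions, and the three-entry stack size is an arbitrary assumption. The model keeps the detail that the I/O stack grows toward lower addresses, so the slot "before" the first location is one past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STACK_SIZE 3   /* hypothetical number of stack locations */

/* Simplified stand-ins for IO_STACK_LOCATION and IRP (not the WDK types). */
typedef struct {
    unsigned char MajorFunction;
} ModelStackLocation;

typedef struct {
    ModelStackLocation stack[MODEL_STACK_SIZE];
    int current;             /* index of the current stack location */
} ModelIrp;

/* A fresh IRP points one slot before the first usable location.
   The stack grows toward lower indexes, so that slot is index MODEL_STACK_SIZE. */
static void model_init_irp(ModelIrp *irp)
{
    irp->current = MODEL_STACK_SIZE;
}

/* Models IoGetNextIrpStackLocation: the location the next driver will see. */
static ModelStackLocation *model_get_next_location(ModelIrp *irp)
{
    assert(irp->current > 0);    /* a location must still be available */
    return &irp->stack[irp->current - 1];
}

/* Models the pointer adjustment IoCallDriver makes before the target
   driver's dispatch routine runs; returns the location that driver sees. */
static ModelStackLocation *model_call_driver(ModelIrp *irp)
{
    assert(irp->current > 0);
    irp->current--;
    return &irp->stack[irp->current];
}
```

In this model, the location returned by model_get_next_location immediately after model_init_irp is the same one that model_call_driver hands to the first driver, which mirrors why the creator of an IRP initializes the "next" location.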

Locating Device Objects

If you know the name of the target device object, you can call IoGetDeviceObjectPointer to obtain a pointer to it.

The StartIo Routine
A StartIo routine generally receives control at DISPATCH_LEVEL, meaning that it must not generate any page faults.

Programming the Microsoft Windows Driver Model - Walter Oney

The Two Basic Data Structures

The driver object represents the driver itself and contains pointers to all the driver subroutines that the system will ever call on its own motion.

The device object represents an instance of hardware and contains data to help you manage that instance.

The DRIVER_OBJECT data structure

The DRIVER_EXTENSION data structure.

The DEVICE_OBJECT data structure

Flags in a DEVICE_OBJECT Data Structure

Flag	Description
DO_BUFFERED_IO	Reads and writes use the buffered method (system copy buffer) for accessing user-mode data.
DO_EXCLUSIVE	Only one thread at a time is allowed to open a handle.
DO_DIRECT_IO	Reads and writes use the direct method (memory descriptor list) for accessing user-mode data.
DO_DEVICE_INITIALIZING	Device object isn’t initialized yet.
DO_POWER_PAGABLE	IRP_MJ_POWER must be handled at PASSIVE_LEVEL.
DO_POWER_INRUSH	Device requires large inrush of current during power-on.

Characteristics Flags in a DEVICE_OBJECT Data Structure

Flag	Description
FILE_REMOVABLE_MEDIA	Media can be removed from device.
FILE_READ_ONLY_DEVICE	Media can only be read, not written.
FILE_FLOPPY_DISKETTE	Device is a floppy disk drive.
FILE_WRITE_ONCE_MEDIA	Media can be written once.
FILE_REMOTE_DEVICE	Device accessible through network connection.
FILE_DEVICE_IS_MOUNTED	Physical media is present in device.
FILE_VIRTUAL_VOLUME	This is a virtual volume.
FILE_AUTOGENERATED_DEVICE_NAME	I/O Manager should automatically generate a name for this device.
FILE_DEVICE_SECURE_OPEN	Force security check during open.

The DriverEntry Routine

The global initialization of the driver is done in DriverEntry:

extern "C" NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject,
  IN PUNICODE_STRING RegistryPath)

The AddDevice Routine

NTSTATUS AddDevice(PDRIVER_OBJECT DriverObject, 
  PDEVICE_OBJECT pdo)

The basic responsibility of AddDevice in a function driver is to create a device object and link it into the stack rooted in this PDO.

Call IoCreateDevice to create a device object and an instance of your own device extension object.
Register one or more device interfaces so that applications know about the existence of your device. Alternatively, give the device object a name and then create a symbolic link.
Next initialize your device extension and the Flags member of the device object.
Call IoAttachDeviceToDeviceStack to put your new device object into the stack.

Should you discover an error after the call to IoCreateDevice, you should release the device object and return a status code.

Chapter 4 - Synchronization

Interrupt Request Level

Most of the time, the computer executes in user mode at PASSIVE_LEVEL. All of your knowledge about how multitasking operating systems work applies at PASSIVE_LEVEL.

Expiration of a time slice eventually invokes the thread scheduler at DISPATCH_LEVEL. The scheduler can then make a different thread current. When the IRQL returns to PASSIVE_LEVEL, a different thread is running.

An activity on a given CPU can be interrupted only by an activity that executes at a higher IRQL. An activity at or above DISPATCH_LEVEL cannot be suspended to perform another activity at or below the then-current IRQL.

In addition, certain driver subroutines, such as DriverEntry and AddDevice, execute at PASSIVE_LEVEL in the context of a system thread. In all of these cases, the driver code can be preempted just as a user-mode application can be.

StartIo routines and deferred procedure call (DPC) routines execute at DISPATCH_LEVEL.

Code executing at or above DISPATCH_LEVEL must not cause page faults.

One implication of this rule is that any of the subroutines in your driver that execute at or above DISPATCH_LEVEL must be in nonpaged memory. Furthermore, all the data you access in such a subroutine must also be in nonpaged memory. Finally, as IRQL rises, fewer and fewer kernel-mode support routines are available for your use.

It’s a mistake (and a big one!) to lower IRQL below whatever it was when a system routine called your driver, even if you raise it back before returning. Such a break in synchronization might allow some activity to preempt you and interfere with a data object that your caller assumed would remain inviolate.

Spin Locks

Some Facts About Spin Locks

If a CPU already owns a spin lock and tries to obtain it a second time, the CPU will deadlock.

Acquiring a spin lock raises the IRQL to DISPATCH_LEVEL automatically. Consequently, code that acquires a lock must be in nonpaged memory and must not block the thread in which it runs. (There is an exception in Windows XP and later systems. KeAcquireInterruptSpinLock raises the IRQL to the DIRQL for an interrupt and claims the spin lock associated with the interrupt.)

On a uniprocessor system, acquiring a spin lock raises the IRQL to DISPATCH_LEVEL and does nothing else.
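
The uniprocessor behavior can be sketched as a user-mode model. Everything here is a hypothetical stand-in: model_current_irql plays the role of the processor's IRQL, and the model_* functions only imitate what acquiring and releasing a spin lock does to that value. It is not driver code and performs no real synchronization.

```c
#include <assert.h>

/* Local stand-ins for the kernel's three lowest IRQL values. */
enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_APC_LEVEL      = 1,
    MODEL_DISPATCH_LEVEL = 2
};

/* Hypothetical per-CPU IRQL for the single processor in this model. */
static int model_current_irql = MODEL_PASSIVE_LEVEL;

/* Uniprocessor model of acquiring a spin lock: remember the previous
   IRQL and raise to DISPATCH_LEVEL. There is nothing to spin on. */
static void model_acquire_spin_lock(int *old_irql)
{
    assert(model_current_irql <= MODEL_DISPATCH_LEVEL);
    *old_irql = model_current_irql;
    model_current_irql = MODEL_DISPATCH_LEVEL;
}

/* Releasing the lock restores the IRQL saved at acquisition time. */
static void model_release_spin_lock(int old_irql)
{
    assert(model_current_irql == MODEL_DISPATCH_LEVEL);
    model_current_irql = old_irql;
}

/* Code at or above DISPATCH_LEVEL must not page-fault or block. */
static int model_may_block_or_page_fault(void)
{
    return model_current_irql < MODEL_DISPATCH_LEVEL;
}
```

While the modelled lock is held, model_may_block_or_page_fault returns 0, which is the reason code that acquires a spin lock has to live in nonpaged memory.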

Kernel Dispatcher Objects

Your DriverEntry and AddDevice routines are called in a system thread that you can block if you need to. You receive IRP_MJ_PNP requests in a system thread too.

Block only the thread that originated the request you’re working on, and only when executing at IRQL strictly less than DISPATCH_LEVEL.

Waiting on a Single Dispatcher Object

KeWaitForSingleObject

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER timeout;
NTSTATUS status = KeWaitForSingleObject(object, WaitReason,
  WaitMode, Alertable, &timeout);

The object must be in nonpaged memory.

WaitReason is a purely advisory value chosen from the KWAIT_REASON enumeration. No code in the kernel actually cares what value you supply here, so long as you don’t specify WrQueue.

WaitMode is one of the two values of the MODE enumeration: KernelMode or UserMode. Alertable is a simple Boolean value.

The bottom line: you should probably always wait in KernelMode and specify FALSE for the Alertable parameter.

The last parameter to KeWaitForSingleObject is the address of a 64-bit timeout value, expressed in 100-nanosecond units. A positive number for the timeout is an absolute timestamp relative to the January 1, 1601, epoch of the system clock. You can determine the current time by calling KeQuerySystemTime, and you can add a constant to that value. A negative number is an interval relative to the current time. If you specify an absolute time, a subsequent change to the system clock alters the duration of the timeout you might experience. That is, the timeout doesn’t expire until the system clock equals or exceeds whatever absolute value you specify. In contrast, if you specify a relative timeout, the duration of the timeout you experience is unaffected by changes in the system clock.

If you’re executing at DISPATCH_LEVEL, you must specify a zero timeout because blocking is not allowed.

Specifying a NULL pointer for the timeout parameter is OK and indicates an infinite wait.
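
The arithmetic behind the timeout value can be sketched in portable C. The helper names below are hypothetical, and a plain int64_t stands in for the 64-bit value a driver would store in the QuadPart member of a LARGE_INTEGER.

```c
#include <assert.h>
#include <stdint.h>

/* 1 millisecond = 10,000 units of 100 nanoseconds. */
#define TICKS_PER_MILLISECOND 10000LL

/* Relative timeout: a negative count of 100-ns units measured from now. */
static int64_t relative_timeout_from_ms(int64_t milliseconds)
{
    assert(milliseconds >= 0);
    return -(milliseconds * TICKS_PER_MILLISECOND);
}

/* Absolute timeout: a positive timestamp. now_ticks stands for the
   current system time, as KeQuerySystemTime would report it. */
static int64_t absolute_timeout_from_ms(int64_t now_ticks, int64_t milliseconds)
{
    assert(now_ticks > 0 && milliseconds >= 0);
    return now_ticks + milliseconds * TICKS_PER_MILLISECOND;
}

/* The sign of the value tells the kernel how to interpret it. */
static int timeout_is_relative(int64_t timeout)
{
    return timeout < 0;
}
```

For example, a one-second relative timeout is relative_timeout_from_ms(1000), which is -10,000,000. A relative value is usually the safer choice because it is unaffected by changes to the system clock.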

Besides STATUS_SUCCESS (the wait was satisfied) and STATUS_TIMEOUT (the timeout expired first), two other return values are possible. STATUS_ALERTED and STATUS_USER_APC mean that the wait has terminated without the object having been signaled because the thread has received an alert or a user-mode APC, respectively.

Note that STATUS_TIMEOUT, STATUS_ALERTED, and STATUS_USER_APC all pass the NT_SUCCESS test. Therefore, don’t simply use NT_SUCCESS on the return code from KeWaitForSingleObject in the expectation that it will distinguish between cases in which the object was signaled and cases in which the object was not signaled.
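
The pitfall is easy to demonstrate outside the kernel. In the sketch below, the MODEL_ names are local copies of the status values and of the NT_SUCCESS test, repeated so the example compiles without the WDK; wait_was_satisfied is a hypothetical helper for a wait on a single object.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t MODEL_NTSTATUS;

/* Local copies of values from ntstatus.h (assumed here so that the
   example does not need the WDK headers). */
#define MODEL_STATUS_SUCCESS   ((MODEL_NTSTATUS)0x00000000)
#define MODEL_STATUS_USER_APC  ((MODEL_NTSTATUS)0x000000C0)
#define MODEL_STATUS_ALERTED   ((MODEL_NTSTATUS)0x00000101)
#define MODEL_STATUS_TIMEOUT   ((MODEL_NTSTATUS)0x00000102)

/* Same comparison NT_SUCCESS makes: any non-negative status passes. */
#define MODEL_NT_SUCCESS(status) ((MODEL_NTSTATUS)(status) >= 0)

/* For a single-object wait, only STATUS_SUCCESS means the object
   was actually signaled. */
static int wait_was_satisfied(MODEL_NTSTATUS status)
{
    return status == MODEL_STATUS_SUCCESS;
}
```

All three non-signaled codes are small positive numbers, so they pass the success test. Comparing the status against STATUS_SUCCESS explicitly is what separates a satisfied wait from the other outcomes.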

Waiting on Multiple Dispatcher Objects

ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);
LARGE_INTEGER timeout;
NTSTATUS status = KeWaitForMultipleObjects(count, objects, 
  WaitType, WaitReason, WaitMode, Alertable, &timeout, waitblocks);

Here objects is the address of an array of pointers to dispatcher objects, and count is the number of pointers in the array. The count must be less than or equal to the value MAXIMUM_WAIT_OBJECTS, which currently equals 64.

The array, as well as each of the objects to which the elements of the array point, must be in nonpaged memory. WaitType is one of the enumeration values WaitAll or WaitAny and specifies whether you want to wait until all of the objects are simultaneously in the signaled state or whether, instead, you want to wait until any one of the objects is signaled.

The waitblocks argument points to an array of KWAIT_BLOCK structures that the kernel will use to administer the wait operation. You don’t need to initialize these structures in any way—the kernel just needs to know where the storage is for the group of wait blocks that it will use to record the status of each of the objects during the wait. If you’re waiting for a small number of objects (specifically, a number no bigger than THREAD_WAIT_OBJECTS, which currently equals 3), you can supply NULL for this parameter. If you supply NULL, KeWaitForMultipleObjects uses a preallocated array of wait blocks that lives in the thread object. If you’re waiting for more objects than this, you must provide nonpaged memory that’s at least count * sizeof(KWAIT_BLOCK) bytes in length.
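
The sizing rule for the wait block array can be written down as a small decision function. This is a sketch with hypothetical names; the two limits are copied from the text above, and treating a count of zero as invalid is an assumption of the sketch rather than something the text states.

```c
#include <assert.h>
#include <stddef.h>

/* Limits quoted in the text; local copies, not the WDK definitions. */
#define MODEL_MAXIMUM_WAIT_OBJECTS 64
#define MODEL_THREAD_WAIT_OBJECTS   3

typedef enum {
    WAIT_PLAN_INVALID,            /* count out of range */
    WAIT_PLAN_USE_THREAD_BLOCKS,  /* NULL is fine for waitblocks */
    WAIT_PLAN_NEED_OWN_BLOCKS     /* caller must supply nonpaged storage */
} WaitPlan;

/* Decide how the waitblocks argument should be supplied for count objects. */
static WaitPlan plan_wait_blocks(size_t count)
{
    if (count == 0 || count > MODEL_MAXIMUM_WAIT_OBJECTS)
        return WAIT_PLAN_INVALID;
    if (count <= MODEL_THREAD_WAIT_OBJECTS)
        return WAIT_PLAN_USE_THREAD_BLOCKS;
    return WAIT_PLAN_NEED_OWN_BLOCKS;
}

/* Bytes of storage the caller must provide, or 0 when none is needed.
   wait_block_size stands for sizeof(KWAIT_BLOCK) on the target system. */
static size_t wait_block_bytes(size_t count, size_t wait_block_size)
{
    if (plan_wait_blocks(count) != WAIT_PLAN_NEED_OWN_BLOCKS)
        return 0;
    return count * wait_block_size;
}
```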

Kernel Events

ASSERT(KeGetCurrentIrql() == PASSIVE_LEVEL);
KeInitializeEvent(event, EventType, initialstate);

Event is the address of the event object. EventType is one of the enumeration values NotificationEvent and SynchronizationEvent. A notification event has the characteristic that, when it is set to the signaled state, it stays signaled until it’s explicitly reset to the not-signaled state. Furthermore, all threads that wait on a notification event are released when the event is signaled. This is like a manual-reset event in user mode. A synchronization event, on the other hand, gets reset to the not-signaled state as soon as a single thread gets released. This is what happens in user mode when someone calls SetEvent on an auto-reset event object. The only operation performed on an event object by KeWaitXxx is to reset a synchronization event to not-signaled. Finally, initialstate is TRUE to specify that the initial state of the event is to be signaled and FALSE to specify that the initial state is to be not-signaled.
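
The difference between the two event types can be modelled in a few lines of single-threaded C. ModelEvent and the model_* functions are hypothetical; model_try_wait is a non-blocking stand-in that only reports whether a wait would be satisfied at that moment.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MODEL_NOTIFICATION_EVENT,     /* stays signaled until explicitly reset */
    MODEL_SYNCHRONIZATION_EVENT   /* resets when one waiter is released */
} ModelEventType;

typedef struct {
    ModelEventType type;
    bool signaled;
} ModelEvent;

static void model_initialize_event(ModelEvent *event, ModelEventType type,
                                   bool initial_state)
{
    event->type = type;
    event->signaled = initial_state;
}

static void model_set_event(ModelEvent *event)   { event->signaled = true; }
static void model_clear_event(ModelEvent *event) { event->signaled = false; }

/* Returns true if a wait would be satisfied right now. Satisfying a wait
   resets a synchronization event and leaves a notification event alone. */
static bool model_try_wait(ModelEvent *event)
{
    if (!event->signaled)
        return false;
    if (event->type == MODEL_SYNCHRONIZATION_EVENT)
        event->signaled = false;
    return true;
}
```

After one model_set_event call, a notification event satisfies any number of model_try_wait calls until it is cleared, whereas a synchronization event satisfies exactly one.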

Kernel Semaphores

You initialize a semaphore at PASSIVE_LEVEL:

KeInitializeSemaphore(semaphore, count, limit);

LONG wassignaled = KeReleaseSemaphore(semaphore, boost, delta, wait);

Kernel Mutexes

When a thread gains control of a mutex after calling one of the KeWaitXxx routines, the kernel also prevents delivery of any but special kernel APCs to help avoid possible deadlocks.

It’s generally better to use an executive fast mutex rather than a kernel mutex. The main difference between the two is that acquiring a fast mutex raises the IRQL to APC_LEVEL, whereas acquiring a kernel mutex doesn’t change the IRQL. Among the reasons you care about this fact is that completion of so-called synchronous IRPs requires delivery of a special kernel-mode APC, which cannot occur if the IRQL is higher than PASSIVE_LEVEL. Thus, you can create and use synchronous IRPs while owning a kernel mutex but not while owning an executive fast mutex. Another reason for caring arises for drivers that execute in the paging path, as elaborated later on in connection with the “unsafe” way of acquiring an executive fast mutex.

Another, less important, difference between the two kinds of mutex object is that a kernel mutex can be acquired recursively, whereas an executive fast mutex cannot. A thread that acquires a kernel mutex recursively must release it an equal number of times before the mutex will be considered free.
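
The recursive-acquisition rule amounts to a counter per mutex. The sketch below is a single-threaded model with hypothetical names; thread identities are plain integers, and model_mutex_try_acquire returns false where a real thread would have to wait.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int owner;   /* hypothetical thread id; 0 means no owner */
    int depth;   /* number of acquisitions not yet released */
} ModelMutex;

static void model_mutex_init(ModelMutex *mutex)
{
    mutex->owner = 0;
    mutex->depth = 0;
}

/* Returns true if thread now owns the mutex, false if it would block. */
static bool model_mutex_try_acquire(ModelMutex *mutex, int thread)
{
    assert(thread != 0);
    if (mutex->depth == 0) {          /* free: take ownership */
        mutex->owner = thread;
        mutex->depth = 1;
        return true;
    }
    if (mutex->owner == thread) {     /* recursive acquisition by the owner */
        mutex->depth++;
        return true;
    }
    return false;                     /* owned by another thread */
}

/* Each release undoes one acquisition; the mutex is free at depth 0. */
static void model_mutex_release(ModelMutex *mutex, int thread)
{
    assert(mutex->depth > 0 && mutex->owner == thread);
    if (--mutex->depth == 0)
        mutex->owner = 0;
}

static bool model_mutex_is_free(const ModelMutex *mutex)
{
    return mutex->depth == 0;
}
```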

Kernel Timers

There are several usage scenarios for timers:
- A timer used like a self-signalling event.
- A timer with a DPC routine to be called when the timer expires.
- A periodic timer used to call a DPC routine over and over again.

Using Threads for Synchronization
