Wednesday, September 14, 2011

Disk I/O


RAM capacity has increased unremittingly over the years and its cost has decreased enough to allow us to be lavish in its use for SQL Server, to help minimize disk I/O. Also, CPU speed has increased to the point where many systems have substantial spare capacity that can often be used to implement data compression and backup compression, again to help reduce I/O pressure. The common factor here is "helping to reduce disk I/O". While disk capacity has improved greatly, disk speed has not, and this poses a great problem; most large, busy OLTP systems end up running into I/O bottlenecks.
The main factor limiting how quickly data is returned from a single traditional magnetic disk is the overall disk latency, which breaks down as follows:
  • Seek time – the time it takes the head to physically move across the disk to find the data. This is a limiting factor in the number of I/O operations per second (IOPS) that a single disk, and therefore your system, can support.
  • Rotational latency – the time it takes for the platter to spin until the required data passes under the head. This is a limiting factor in the amount of data a single disk can read per second (usually measured in MB/s), in other words the I/O throughput of that disk.
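To see how these two latencies combine, here is a back-of-the-envelope sketch. The figures used (a 15,000 RPM spindle with a 3.5 ms average seek) are illustrative assumptions, not measurements from any particular drive:

```python
# Back-of-the-envelope estimate of random IOPS for a single magnetic disk.
# Assumed figures: 15,000 RPM, 3.5 ms average seek time.

def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter must rotate half a revolution."""
    return 0.5 * 60_000 / rpm  # 60,000 ms per minute

def theoretical_random_iops(seek_ms: float, rpm: float) -> float:
    """Each random I/O pays one seek plus the average rotational delay."""
    latency_ms = seek_ms + avg_rotational_latency_ms(rpm)
    return 1000 / latency_ms

print(round(avg_rotational_latency_ms(15_000), 1))   # 2.0 ms
print(round(theoretical_random_iops(3.5, 15_000)))   # ~182 IOPS
```

Roughly 180 random IOPS per 15K spindle is why a busy OLTP system needs many disks working together.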
Typically, you will have multiple magnetic disks working together in some level of RAID to increase both performance and redundancy. Having more disk spindles (i.e. more physical disks) in a RAID array increases both throughput performance and IOPS performance.
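The RAID level also changes how much of the array's raw IOPS your workload actually sees, because redundant writes cost extra back-end I/Os. The sketch below uses the commonly cited write penalties (RAID 1/10 = 2, RAID 5 = 4); real arrays will vary with controller cache and stripe size:

```python
# Rough effective IOPS for a RAID array under a mixed read/write workload.
# Write penalties are the textbook values; treat the result as an estimate.

WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4}

def effective_iops(disks: int, per_disk_iops: float,
                   read_fraction: float, raid_level: str) -> float:
    raw = disks * per_disk_iops
    penalty = WRITE_PENALTY[raid_level]
    # Each read costs one back-end I/O; each write costs `penalty` I/Os.
    return raw / (read_fraction + (1 - read_fraction) * penalty)

# Eight 180-IOPS disks, 70% reads, RAID 10:
print(round(effective_iops(8, 180, 0.7, "RAID10")))  # ~1108
```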
However, a complicating factor here is the performance limitations of your RAID controllers, for direct-attached storage, or Host Bus Adapters (HBAs), for a storage area network. The throughput of such controllers, usually measured in gigabits per second, e.g. 3Gbps, will dictate the upper limit for how much data can be written to or read from disk per second. This can have a huge effect on your overall IOPS and disk throughput capacity for each logical drive that is presented to your host server in Windows.
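Note that a controller's quoted line rate is not its usable bandwidth. A 3 Gbps SATA/SAS link uses 8b/10b encoding, so ten bits on the wire carry eight bits of data, and protocol overhead reduces the ceiling further. A quick conversion:

```python
# Converting a controller's line rate to an approximate usable bandwidth.
# The 0.8 factor reflects 8b/10b encoding on 3 Gbps SATA/SAS links;
# real-world throughput will be somewhat lower still.

def usable_mb_per_sec(gbps: float, encoding_efficiency: float = 0.8) -> float:
    bits_per_sec = gbps * 1_000_000_000
    return bits_per_sec * encoding_efficiency / 8 / 1_000_000

print(usable_mb_per_sec(3.0))  # 300.0 MB/s
```

So a single 3Gbps channel tops out around 300 MB/s, which matters a great deal for the sequential workloads discussed next.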
The relative importance of each of these factors depends on the type of workload being supported; OLTP or DSS/DW. This in turn will determine how you provision the disk storage subsystem.
OLTP workloads are characterized by a high number of short transactions, where the data tends to be rather volatile (modified frequently). There is usually much higher write activity in an OLTP workload than in a DSS workload. As such, most OLTP systems generate more input/output operations per second (IOPS) than an equivalently sized DSS system.
Furthermore, in most OLTP databases, the read/write activity is largely random, meaning that each transaction will likely require data from a different part of the disk. All of this means that in most OLTP applications, the hard disks will spend most of their time seeking data, and so the seek time of the disk is a crucial bottleneck for an OLTP workload. The seek time for any given disk is determined by how far away from the required data the disk heads are at the time of the read/write request.
A DSS or DW system is usually characterized by longer-running queries than a similarly sized OLTP system. The data in a DSS system is usually more static, with much higher read activity than write activity. The disk activity with a DSS workload also tends to be more sequential and less random than with an OLTP workload. Therefore, for a DSS type of workload, sequential I/O throughput is usually more important than IOPS performance. Adding more disks will increase your sequential throughput until you run into the throughput limitations of your RAID controller or HBA. This is especially true when a DSS/DW system is being loaded with data, and when certain types of complex, long-running queries are executed.
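The point at which the controller becomes the bottleneck arrives quickly for sequential workloads. Assuming an illustrative ~125 MB/s of sequential throughput per spindle and a 3Gbps channel good for roughly 300 MB/s:

```python
# How many sequential-streaming disks saturate one I/O channel?
# Per-disk and channel figures are illustrative assumptions.

def channel_limited_throughput(disks: int, mb_per_disk: float,
                               channel_mb: float) -> float:
    """Aggregate sequential throughput, capped by the channel's ceiling."""
    return min(disks * mb_per_disk, channel_mb)

for n in (1, 2, 3, 6):
    print(n, channel_limited_throughput(n, 125.0, 300.0))
```

Under these assumptions the channel is saturated by the third disk, which is why DW/DSS systems need multiple I/O channels rather than just more spindles.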
Generally speaking, while OLTP systems are characterized by lots of fast disks, to maximize IOPS to overcome disk latency issues with high numbers of random reads and writes, DW/DSS systems require lots of I/O channels, in order to handle peak sequential throughput demands. An I/O channel is an individual RAID controller or an individual HBA; either of which gives you a dedicated, separate path to either a DAS array or a SAN. The more I/O channels you have, the better.
With all of this general advice in mind, let's now consider each of the major hardware and architectural choices that must be made when provisioning the storage subsystem, including the type of disks used, the type of storage array, and the RAID configuration of the disks that make up the array.
