by D. John Shakshober
The Linux kernel, the core of the operating system, controls disk access through kernel I/O scheduling. Red Hat Enterprise Linux 3, with its 2.4 kernel base, uses a single, robust, general-purpose I/O elevator. The 2.4 I/O scheduler offers a reasonable number of tuning options through the elvtune command, which controls how long a request remains in an I/O queue before being serviced. While Red Hat Enterprise Linux 3 gives most workloads excellent performance, it does not always provide the best I/O characteristics for the wide range of applications Linux users run today. The I/O schedulers provided in Red Hat Enterprise Linux 4, embedded in the 2.6 kernel, have advanced the I/O capabilities of Linux significantly. With Red Hat Enterprise Linux 4, kernel I/O can now be optimized for an application at boot time by selecting one of four different I/O schedulers to accommodate different I/O usage patterns:
- Completely Fair Queuing—elevator=cfq (default)
- Deadline—elevator=deadline
- NOOP—elevator=noop
- Anticipatory—elevator=as
Add the elevator options from Table 1 to your kernel command line in the GRUB boot loader configuration file (/boot/grub/grub.conf) or on the eLILO command line. Red Hat Enterprise Linux 4 has all four elevators built in, so there is no need to rebuild your kernel.
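As a rough illustration of that edit, the Python sketch below appends an elevator option to every kernel line in grub.conf-style text. The function name set_elevator and the sample boot entry (kernel version, root label) are hypothetical and only for illustration; the four option values are the ones listed above. It works on a string rather than on the real file, so the change can be reviewed before it is written to /boot/grub/grub.conf.

```python
import re

# The four elevator names listed in this article.
VALID_ELEVATORS = ("cfq", "deadline", "noop", "as")

def set_elevator(grub_conf_text, elevator):
    """Return grub.conf text with elevator=<name> set on every kernel line."""
    if elevator not in VALID_ELEVATORS:
        raise ValueError(f"unknown elevator: {elevator!r}")
    out = []
    for line in grub_conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("kernel ") or stripped.startswith("kernel\t"):
            # Drop any existing elevator= option, then append the new one.
            line = re.sub(r"\s+elevator=\S+", "", line).rstrip()
            line = f"{line} elevator={elevator}"
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical grub.conf entry, for illustration only.
sample = (
    "title Red Hat Enterprise Linux AS (2.6.9-5.EL)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.9-5.EL ro root=LABEL=/\n"
    "\tinitrd /initrd-2.6.9-5.EL.img\n"
)
print(set_elevator(sample, "deadline"))
```

Running the function a second time with a different name replaces the earlier option instead of stacking a second one, which makes it easier to try one scheduler per reboot.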
The 2.6 kernel incorporates the best I/O algorithms that developers and researchers have shared with the open-source community as of mid-2004. These schedulers have been available in Fedora Core 3 and will continue to be used in Fedora Core 4. There have been several good characterization papers evaluating the Linux 2.6 I/O schedulers. A few are referenced at the end of this article. This article details our own study, based on running Oracle 10G in both OLTP and DSS workloads with EXT3 file systems.
Red Hat Enterprise Linux 4 I/O schedulers
Included in Red Hat Enterprise Linux 4 are four custom-configured schedulers from which to choose. They each offer a different combination of optimizations.
The Completely Fair Queuing (CFQ) scheduler is the default algorithm in Red Hat Enterprise Linux 4. As the name implies, CFQ maintains a scalable per-process I/O queue and attempts to distribute the available I/O bandwidth equally among all I/O requests. CFQ is well suited for mid-to-large multi-processor systems and for systems which require balanced I/O performance over multiple LUNs and I/O controllers.
The Deadline elevator uses a deadline algorithm to minimize I/O latency for a given I/O request. The scheduler provides near real-time behavior and uses a round robin policy to attempt to be fair among multiple I/O requests and to avoid process starvation. Using five I/O queues, this scheduler will aggressively re-order requests to improve I/O performance.
The NOOP scheduler is a simple FIFO queue and uses the minimal number of CPU instructions per I/O to accomplish the basic merging and sorting functionality needed to complete the I/O. It assumes performance of the I/O has been or will be optimized at the block device (memory-disk) or with an intelligent HBA or externally attached controller.
The Anticipatory elevator introduces a controlled delay before dispatching the I/O in order to aggregate and/or re-order requests, improving locality and reducing disk seek operations. This algorithm is intended to optimize systems with small or slow disk subsystems. One artifact of using the AS scheduler can be higher I/O latency.
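To see why re-ordering requests can reduce seek operations, consider the toy model below. It is not the kernel's code and does not implement any of the four schedulers. It only compares total head movement when a queue is serviced in arrival order with a simple one-way sweep sorted by sector. The queue contents and the starting head position are hypothetical.

```python
def seek_distance(start, sectors):
    """Total head movement when servicing sectors in the given order."""
    total, pos = 0, start
    for s in sectors:
        total += abs(s - pos)
        pos = s
    return total

def elevator_order(start, sectors):
    """One-way sweep: ascending sectors at or past the head, then the rest ascending."""
    ahead = sorted(s for s in sectors if s >= start)
    behind = sorted(s for s in sectors if s < start)
    return ahead + behind

# Hypothetical request queue (sector numbers), head starting at sector 50.
queue = [95, 10, 60, 20, 80]
fifo = seek_distance(50, queue)
swept = seek_distance(50, elevator_order(50, queue))
print(fifo, swept)  # prints "280 140"
```

In this small case the sorted sweep halves the head movement. The cost is that a request may wait while others are served ahead of it, which is the kind of latency trade-off the deadline and anticipatory designs manage in different ways.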
Choosing an I/O elevator
The definitions above may give enough information to make a choice for your I/O scheduler. The other extreme is to actually test and tune your workload on each I/O scheduler by simply rebooting your system and measuring your exact environment. We have done just that for Red Hat Enterprise Linux 3 and all four Red Hat Enterprise Linux 4 I/O schedulers using Oracle 10G I/O workloads.
Figure 1 shows the results of an Oracle 10G OLTP workload running on a 2-CPU/2-HT Xeon system with 4 GB of memory across 8 LUNs on an LSI Logic MegaRAID controller. The OLTP load ran mostly 4k random I/O with a 50% read/write ratio. The DSS workload consists of 100% sequential read queries using large transfer sizes of 32k to 256k bytes.
Figure 1. Red Hat Enterprise Linux 4 I/O schedulers vs. Red Hat Enterprise Linux 3 for Oracle 10G database OLTP/DSS (relative performance)
The CFQ scheduler was chosen as the default since it offers the highest performance for the widest range of applications and I/O system designs. We have seen CFQ excel in both throughput and latency on multi-processor systems with up to 16 CPUs and on systems with 2 to 64 LUNs, for both UltraSCSI and Fibre Channel disk farms. In addition, CFQ is easy to tune by adjusting the nr_requests parameter in the /proc/sys/scsi subsystem to match the capabilities of any given I/O subsystem.
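Before changing nr_requests it helps to record its current value. The sketch below is a minimal reader and rests on an assumption that differs from the location named above: it expects the parameter as a per-device file at /sys/block/<device>/queue/nr_requests, the sysfs layout used by 2.6 kernels. Check where your kernel exposes it before relying on this. The helper name read_nr_requests and the device name "sda" are placeholders.

```python
from pathlib import Path

def read_nr_requests(device, sysfs_root="/sys/block"):
    """Return the request-queue depth for a block device, or None if unavailable."""
    # Assumed layout: <sysfs_root>/<device>/queue/nr_requests
    path = Path(sysfs_root) / device / "queue" / "nr_requests"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None

# "sda" is a placeholder device name.
print(read_nr_requests("sda"))
```

Recording the value before and after each tuning step fits the advice later in this article to make only one change at a time.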
The Deadline scheduler excelled at reducing the latency of any given single I/O for real-time-like environments. A workload which depends on an even balance of transactions across multiple HBAs, drives, or file systems may not always do best with the Deadline scheduler. The Oracle 10G OLTP load using 10 simultaneous users spread over eight LUNs showed improvement using Deadline relative to Red Hat Enterprise Linux 3's I/O elevator, but was still 12.5% lower than CFQ.
The NOOP scheduler indeed freed up CPU cycles but performed 23% fewer transactions per minute when using the same number of clients driving the Oracle 10G database. The reduction in CPU cycles was proportional to the drop in performance, so this scheduler may work well for systems which drive their databases into CPU saturation. But CFQ or Deadline yield better throughput for the same client load than the NOOP scheduler.
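The percentages quoted above can be put on one scale. The short sketch below normalizes CFQ to 100 and applies the reported shortfalls. It assumes that NOOP's 23% figure is also measured against CFQ, which the text does not state explicitly, and the function name is only illustrative.

```python
def relative_throughput(baseline, percent_lower):
    """Throughput that is percent_lower percent below the baseline."""
    return baseline * (1 - percent_lower / 100)

cfq = 100.0                                 # CFQ normalized to 100
deadline = relative_throughput(cfq, 12.5)   # Deadline: 12.5% lower than CFQ
noop = relative_throughput(cfq, 23)         # NOOP: assumed 23% lower than CFQ

for name, value in (("cfq", cfq), ("deadline", deadline), ("noop", noop)):
    print(f"{name:<9}{value:6.1f}")
```

On that scale Deadline lands at about 87.5 and NOOP at about 77 for this OLTP load.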
The AS scheduler excels on small systems which have limited I/O configurations and have only one or two LUNs. By design, the AS scheduler is a nice choice for client and workstation machines where interactive response time is a higher priority than I/O latency.
Summary: Have it your way!
The short summary of our study is that there is no single answer to which I/O scheduler is best. The good news is that with Red Hat Enterprise Linux 4 an end user can choose a scheduler with a simple boot option. Our data suggests the default Red Hat Enterprise Linux 4 I/O scheduler, CFQ, provides the most scalable algorithm for the widest range of systems, configurations, and commercial database users. However, we have also measured other workloads in which the Deadline scheduler outperformed CFQ for large, sequential, read-mostly DSS queries. Other studies listed in the "References" section explored using the AS scheduler to help interactive response times. In addition, NOOP has proven to free up CPU cycles and provide adequate I/O performance for systems with intelligent I/O controllers that provide their own I/O ordering capabilities.
In conclusion, we recommend baselining an application with the default CFQ. Use this article and its references to match your application to one of the studies. Then, if seeking additional performance, adjust the I/O scheduler via the simple boot-time command line option and reboot. Make only one change at a time, and use performance tools to validate the results.
References
Axboe, J., "Deadline I/O Scheduler Tunables", SuSE, EDF R&D, 2003.
Braswell, B., Ciliendo, E., "Tuning Red Hat Enterprise Linux on IBM eServer xSeries Servers", ibm.com/redbooks.
Corbet, J., "The Continuing Development of I/O Scheduling", http://lwn.net/Articles/21274.
Heger, D., Pratt, S., "Workload Dependent Performance Evaluation of the Linux 2.6 I/O Schedulers", Linux Symposium, Ottawa, Canada, July 2004.
Likins, A., "System Tuning Info for Linux Servers", http://people.redhat.com/alikins/system_tuning.html
About the author
D. John Shakshober is a Consulting Engineer for Red Hat in Westford, MA, focusing on kernel and benchmark performance. Prior to Red Hat, John was Technical Director of Performance Engineering at HP, Compaq, and Digital, working on Linux and Tru64 Unix benchmark performance engineering in Nashua, NH. He has an M.S. in Electrical Engineering from Cornell University and a B.S. in Computer Engineering from Rochester Institute of Technology.