Friday, April 27, 2007

What's caching, really?

In the last post I told you that E2K7 (that’s short for Exchange 2007) uses lots and lots of caching to improve performance, especially on 64-bit architectures where lots and lots of memory is available.

One way to improve performance is to keep all data in zippy-fast RAM (random access memory) as opposed to on slug-like hard drives. The trouble with RAM is that when you reboot the machine, all contents are wiped away, whereas hard drives keep everything between reboots. So you can't use only RAM. The other problem with RAM is that there is never enough of it. 16GB is the practical limit of RAM you can load into a reasonably priced machine nowadays, whereas that same machine could take several terabytes of disk.

So a compromise is needed. That compromise is caching. Caching means "fronting" the slow disk with fast RAM, and it relies on a property of most applications called "locality of access": the number of disk pages actually accessed during any small period of time is much smaller than the total number of pages on disk, especially with Exchange. Think about the 1GB of data in your mailbox - how much of it do you ever use at the same time? That means that the whole "working set" of disk pages can be mirrored in RAM, if you have enough of it.

On the first read operation, Exchange can bring the 8 KB disk page containing your data (and some nearby data) into RAM. All subsequent reads and writes can then go to the mirrored page in RAM. When the user moves on to something else 10 or 12 seconds later (an eternity to a CPU running with a 3GHz clock speed), the RAM-mirrored page begins "ageing", and if it is not touched in a while, it is written back to disk so that the RAM can be used to mirror a different disk page that is in more active use.
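
To make the mechanics concrete, here is a minimal sketch of that kind of write-back cache. It is a toy, not how Exchange does it: the "disk" is just a Python dict, the names (PageCache, capacity and so on) are made up for illustration, and pages are pushed out when the cache is full (least recently used first) rather than by an ageing timer.

```python
from collections import OrderedDict

PAGE_SIZE = 8 * 1024  # 8 KB pages, as described above


class PageCache:
    """Toy write-back cache in front of a slow 'disk' (a dict: page number -> bytes)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # stand-in for the hard drive (assumption)
        self.capacity = capacity    # maximum number of pages mirrored in RAM
        self.pages = OrderedDict()  # page number -> bytearray, least recently used first
        self.dirty = set()          # pages changed in RAM but not yet written to disk
        self.disk_reads = 0
        self.disk_writes = 0

    def _load(self, page_no):
        if page_no in self.pages:
            # Cache hit: mark the page as recently used, no disk access needed.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        if len(self.pages) >= self.capacity:
            self._evict()
        # Cache miss: one read from the slow disk brings in the whole page.
        self.disk_reads += 1
        data = bytearray(self.disk.get(page_no, bytes(PAGE_SIZE)))
        self.pages[page_no] = data
        return data

    def _evict(self):
        # Drop the least recently used page, writing it back only if it changed.
        old_no, old_data = self.pages.popitem(last=False)
        if old_no in self.dirty:
            self.disk[old_no] = bytes(old_data)
            self.disk_writes += 1
            self.dirty.discard(old_no)

    def read(self, page_no, offset, length):
        return bytes(self._load(page_no)[offset:offset + length])

    def write(self, page_no, offset, payload):
        page = self._load(page_no)
        page[offset:offset + len(payload)] = payload
        self.dirty.add(page_no)  # the disk copy is now stale until eviction
```

With capacity=2, writing to page 0 and then reading it a hundred times costs a single disk read. The change only reaches the disk once two other pages have been touched and page 0 gets pushed out.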

This technique vastly reduces the number of iops to disk in favor of mops to RAM, which are much, much faster and don't require that you pay for all those expensive disk spindles to supply you with your necessary dose of iops.
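
A back-of-the-envelope way to see the saving, as a rough sketch: only requests that miss the cache have to go to disk. The function names, the hit ratio and the per-spindle figure below are all hypothetical inputs you would supply yourself, not measurements from Exchange, and the arithmetic ignores write-backs and log writes.

```python
import math


def disk_iops(requests_per_sec, hit_ratio):
    """Requests per second that miss the cache and must be served from disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return requests_per_sec * (1.0 - hit_ratio)


def spindles_needed(iops, iops_per_spindle):
    """Whole disks needed to serve the given iops (iops_per_spindle is an assumption)."""
    return math.ceil(iops / iops_per_spindle)
```

For example, spindles_needed(disk_iops(1000, 0.95), 100) asks how many disks you need if 95% of 1000 requests per second are served from RAM and each disk is assumed to handle 100 iops. Rerun it with a hit ratio of 0.0 to see what the cache is saving you.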

One caveat with Exchange is that it is a database (albeit a cheesy one, but more on that in another post), and so there is a requirement to write data back to disk. In databases, this is handled by writing every change to a "transaction log" that goes straight to disk without any pesky caching getting in the way. That way, if the server suddenly crashes, the combination of whatever is on the disk plus the transaction log can bring the database right back up to date. So in the case of Exchange, big caches don't save on log write operations, but they sure save on store iops, which are by far the most common operation.
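
Here is a small sketch of the transaction log idea. It is a generic illustration, not the actual Exchange log format: the database is a plain dict, the log is a text file of JSON lines, and append_change, recover and demo are names invented for this example.

```python
import json
import os
import tempfile


def append_change(log_path, key, value):
    """Record one change in the log and force it onto disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # bypass OS buffering so the record survives a crash


def recover(database, log_path):
    """Rebuild the current state: what reached disk, plus every logged change."""
    state = dict(database)
    if os.path.exists(log_path):
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                if line.strip():
                    record = json.loads(line)
                    state[record["key"]] = record["value"]
    return state


def demo():
    with tempfile.TemporaryDirectory() as folder:
        log_path = os.path.join(folder, "changes.log")
        database = {"inbox": 3}               # what had reached disk before the crash
        append_change(log_path, "inbox", 4)   # these changes lived only in RAM...
        append_change(log_path, "drafts", 1)  # ...and in the log
        return recover(database, log_path)
```

Calling demo() replays the two logged changes on top of the stale on-disk copy, which is the "disk plus log" recovery described above. Note that every change costs a log write no matter how big the cache is.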

There are ways of speeding up access to data besides caching, but they all require reorganizing the data radically and moving away from a more standard relational database organization to a super-customized disk organization, specifically geared towards your application. Even then, customized application-driven caching can still dramatically improve matters from there.

Taking a sub-optimal storage layout and throwing a 64-bit address space worth of caching at the problem, as was done for E2K7, may not be elegant, but it sure gets the job done.

Comments:

Anonymous said...

How do you take advantage of cache mode in a Citrix environment? I have read you cannot, but then I have heard from others in the industry that you can. Does anyone have any first-hand experience with this?

Ceryx said...

Windows Terminal Services (or Citrix) allows you to run Microsoft Windows–based programs on a server and display the programs remotely on client computers. For example, you can install a single copy of Office Outlook 2007 on a Windows Terminal Services (or Citrix) server instead of running Outlook locally. This allows multiple users to connect to the server and run Outlook from their Windows Terminal Services (or Citrix) server session.

When the Outlook client is configured to use Cached Exchange Mode, a copy of your mailbox is stored locally on your computer. This copy provides quick access to your data and is frequently synchronized with the mail server. If you work offline, whether by choice or due to a connection problem, your data is still available to you instantly wherever you are. If a connection from your computer to the server running Exchange isn't available, Microsoft Office Outlook 2007 switches to Trying to connect or Disconnected. If the connection is restored, Microsoft Office Outlook 2007 automatically switches back to Connected or Connected (Headers). Any changes you make while a connection to the server isn't available are synchronized automatically when a connection is available. You can continue to work while changes are synchronized.

Unfortunately, there are some limitations when you use Terminal Services (or Citrix) with Outlook. First and foremost, you cannot use Outlook with Cached Exchange Mode when you run Outlook on Windows Terminal Services (or Citrix). This is a known Terminal Services (and Citrix) limitation.
