To begin, I would like to thank all of you who have shown interest in this blog and posted comments. The site launched one week ago and has already drawn over 13,000 unique visitors and 40,000 page views. As I write this article, my previous post is on the front page of del.icio.us. I look forward to expanding and improving the site; if you have any comments or suggestions, please contact me.

Following up on last week's post about the eAccelerator PHP optimizer and cache manager, I thought I would write a general introduction to caching. This article focuses on caching in an Apache/PHP environment, but the principles are the same for any platform or language you work with.

Browser Cache

Every browser has its own cache, but the size and behavior of each varies. Fortunately, we are not entirely at the mercy of the browser when it comes to handling our data. Through HTTP headers you can tell the browser when to request updated content and when to serve files from its local cache. I highly suggest installing the Firefox add-ons Firebug and YSlow to help you analyze your HTTP headers. Below you can see a screenshot of the headers tab that Firebug provides for every HTTP request.

Firebug Headers

The difficulty of caching lies in deciding how you want the browser to cache and then putting it all together on the server side. Some file types, for instance, are generally more static than others and therefore safer to cache for long periods. Other content may be updated regularly and should be cached only briefly, if at all. Because the browser is closest to the user, and a file served from its cache may never touch your server at all, effective browser caching can deliver the biggest performance boost.

Below is an example of setting a header in PHP. It tells the browser to cache the page's response until one year from the date of this post.

<?php
// Must be called before any output is sent to the browser
header('Expires: Sat, 25 Apr 2009 00:00:00 GMT');
?>

Here is an example of setting the same header with Apache (the Header directive requires mod_headers). It sets an expiration date one year from the date of this post on all image files.

<FilesMatch "\.(jpg|jpeg|png|gif)$">
Header set Expires "Sat, 25 Apr 2009 00:00:00 GMT"
</FilesMatch>
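Both examples above hardcode the expiration date, so the one-year window shrinks every day and stops helping entirely once the date passes. Below is a minimal sketch, assuming PHP 7.1 or later, that computes the date relative to the current time instead. The helper name `expires_header_value` is hypothetical; `gmdate()` is used because HTTP dates are always expressed in GMT.

```php
<?php
// Hypothetical helper: build an HTTP date string $seconds after $now.
// HTTP dates are always in GMT, so gmdate() is used rather than date().
function expires_header_value(int $seconds, ?int $now = null): string
{
    $now = $now ?? time();
    return gmdate('D, d M Y H:i:s', $now + $seconds) . ' GMT';
}

// Usage in a page (must run before any output is sent):
// header('Expires: ' . expires_header_value(365 * 24 * 60 * 60));
```

For files served directly by Apache, the mod_expires module (for example `ExpiresByType image/png "access plus 1 year"`) performs the same relative calculation without any PHP.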

There are a number of header fields you can send in response to HTTP requests. Here is a breakdown of the ones you should be familiar with for caching purposes…

Last-modified: Fri, 25 Apr 2008 00:00:00 GMT
This header tells the browser when the requested file was last changed. On later requests, the browser “asks” your server (by sending the stored timestamp in an If-Modified-Since header) whether the file has been modified since the version it currently has stored. If the file has changed, the server sends the new version in its response; otherwise it replies with a short “304 Not Modified” status and the browser uses its cached copy. Although this exchange adds a little overhead, it is much more efficient than serving the same unmodified content over and over.
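Apache performs this If-Modified-Since check on its own for static files, but a PHP script has to do it itself. Here is a minimal sketch of that decision, assuming PHP 7.1 or later; the helper `not_modified_since` is hypothetical, and the commented usage assumes `$file` holds the path being served.

```php
<?php
// Hypothetical helper: can a conditional GET be answered with 304?
// $mtime is the file's modification time (Unix seconds);
// $ifModifiedSince is the raw If-Modified-Since request header, or null.
function not_modified_since(int $mtime, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null) {
        return false;            // plain GET: send the full response
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return false;            // unparseable date: ignore the header
    }
    return $mtime <= $since;     // unchanged since the browser's copy
}

// Usage sketch:
// $mtime = filemtime($file);
// header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
// if (not_modified_since($mtime, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null)) {
//     http_response_code(304);   // no body is sent
//     exit;
// }
// readfile($file);
```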

ETag: "28ff-44aee6630f900"
ETags are essentially unique identifiers attached to your files that the browser can use to check its cached copies. The browser sends the stored tag back in an If-None-Match header, and the server answers “304 Not Modified” if the tag still matches. This works much like the “Last-modified” header, and there has been quite a bit of debate over whether one is better than the other or whether to include both. I personally suggest including both, as there may be rare situations where the ETag detects changes that are not reflected in the timestamp.
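A minimal sketch of the ETag exchange in PHP, assuming PHP 7.1 or later. The helpers `make_etag` and `etag_matches` are hypothetical; building the tag from file size and modification time is similar in spirit to Apache's default tags, but it is not Apache's exact format, and weak tags (`W/"..."`) are not handled.

```php
<?php
// Hypothetical helper: a quoted ETag built from file size and mtime (hex).
function make_etag(int $size, int $mtime): string
{
    return sprintf('"%x-%x"', $size, $mtime);
}

// Hypothetical helper: does the If-None-Match header (which may list several
// comma-separated tags, or "*") match the current ETag?
function etag_matches(string $etag, ?string $ifNoneMatch): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        if ($candidate === '*' || $candidate === $etag) {
            return true;
        }
    }
    return false;
}

// Usage sketch:
// $etag = make_etag(filesize($file), filemtime($file));
// header('ETag: ' . $etag);
// if (etag_matches($etag, $_SERVER['HTTP_IF_NONE_MATCH'] ?? null)) {
//     http_response_code(304);
//     exit;
// }
```

Note that a tag built only from size and modification time misses the same changes the timestamp misses; a hash of the content (for example `md5_file()`) is stricter, at the cost of reading the file on each request.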

Expires: Sat, 25 Apr 2009 00:00:00 GMT
The Expires header is ideal when you know how long your content can safely be cached. Why is it superior to the previous two mechanisms? Until the expiration date passes, the browser does not need to contact the server to verify that its copy is fresh. It simply serves the file from the local cache, giving the fastest user experience with zero server overhead.

Cache-Control: max-age=86400
The max-age directive, much like the Expires header, eliminates the need to check for updated content while the cached file is within the specified age limit. The value assigned to max-age is the number of seconds the file will be considered fresh. During that time, the locally stored file is served. It is important to note that HTTP/1.1 lets caches store any successful response unless a “Cache-Control” directive says otherwise.
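Because max-age is measured in seconds, 86400 works out to 60 × 60 × 24, or one day. The sketch below models the freshness test a cache applies, simplified to ignore the Age header and clock skew that real caches account for; both helper names are hypothetical and PHP 7.1 or later is assumed.

```php
<?php
// Simplified model: a stored response is fresh while its age (seconds since
// it was received) is below max-age. Age header and clock skew are ignored.
function is_fresh(int $receivedAt, int $maxAge, int $now): bool
{
    return ($now - $receivedAt) < $maxAge;
}

// Hypothetical helper: extract max-age (in seconds) from a Cache-Control value.
function parse_max_age(string $cacheControl): ?int
{
    if (preg_match('/(?:^|,)\s*max-age\s*=\s*(\d+)/i', $cacheControl, $m)) {
        return (int) $m[1];
    }
    return null;
}
```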


For static content, I highly suggest serving files from a cookie-free subdomain (e.g. static.domain.com), so that requests for those files carry no cookie data, and establishing a “never expires” policy. Many larger sites already take advantage of this tactic, but smaller sites can benefit from it too. When you need to change a file, simply change the reference to point to an incremented version of it (javascript_1.js -> javascript_2.js); the browser will then download and cache the newer version. There are even ways to automate the versioning process.
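One way to automate the versioning is to stamp each asset's filename with its modification time whenever a page references it. A minimal sketch, assuming PHP 7.1 or later; the helper `versioned_asset` is hypothetical, and the server would also need a rewrite rule (not shown) that maps a name like `app.1209081600.js` back to the real `app.js`.

```php
<?php
// Hypothetical helper: embed a version stamp in an asset's filename,
// so "js/app.js" becomes "js/app.1209081600.js". A new version means a new
// URL, which is what makes a far-future expiration safe.
function versioned_asset(string $path, int $version): string
{
    $info = pathinfo($path);
    // pathinfo() reports "." when the path has no directory part.
    $dir = ($info['dirname'] === '.') ? '' : rtrim($info['dirname'], '/') . '/';
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    return $dir . $info['filename'] . '.' . $version . $ext;
}

// Usage sketch:
// echo '<script src="/' . versioned_asset('js/app.js', filemtime('js/app.js')) . '"></script>';
```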

For dynamic files, it is best to use the Cache-Control: no-cache header, and for more static files the Last-modified header is appropriate. Another way to make sure content is refreshed is to append a unique query string value to the URI.

Here is an example of setting a “never expires” policy for all static files with Apache…
<FilesMatch "\.(jpg|jpeg|png|gif|swf|css|js|ico|pdf)$">
Header set Expires "Mon, 01 Jan 2018 00:00:00 GMT"
Header set Cache-Control "max-age=31536000"
</FilesMatch>

or with PHP…
<?php
header('Expires: Mon, 01 Jan 2018 00:00:00 GMT');
header('Cache-Control: max-age=31536000');
?>

Server Caching

Server-side caching can have a huge impact on performance. Since dynamic PHP pages should be browser-cached with no more than a Last-modified check, requests for them reach your server frequently, which is exactly where server-side caching pays off. Third-party tools make this much easier: opcode caches such as XCache and eAccelerator keep compiled PHP scripts in memory, and memcached lets you keep generated data, or even whole pages, in memory for an even faster client/server transaction.

You are not limited to these third-party tools, however. You can also cache pages manually in PHP with code like the following…

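<?php
// Absolute path of the requested script; the cache is rebuilt whenever
// this file is newer than the cached copy.
$current = $_SERVER['SCRIPT_FILENAME'];
// Cache directory (must exist and be writable by the web server).
$store = 'cache/';
// One cache file per host + URI (the query string is part of REQUEST_URI).
$page = $_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$cache = $store.md5($page).'.cache';
// Serve the cached copy if it is newer than the script that produced it.
if (file_exists($cache) && (filemtime($current) < filemtime($cache))) {
    readfile($cache);
    exit();
}
// Capture everything the script prints.
ob_start();
// YOUR PHP SCRIPT //
// Save the captured output (LOCK_EX prevents interleaved writes), then send it.
file_put_contents($cache, ob_get_contents(), LOCK_EX);
ob_end_flush();
?>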
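As written, the script only rebuilds a cached page when the script file itself changes, so a page built from database content can stay stale indefinitely. Below is a minimal sketch of a stricter validity check that also enforces a maximum age, assuming PHP 7.1 or later; the function name and the `$ttl` parameter are hypothetical additions, not part of the original code.

```php
<?php
// Hypothetical replacement for the freshness test in the script above:
// a cache file is usable only if it exists, is newer than the script that
// generated it, and is younger than $ttl seconds.
function cache_is_valid(string $cacheFile, string $scriptFile, int $ttl, ?int $now = null): bool
{
    $now = $now ?? time();
    if (!is_file($cacheFile)) {
        return false;
    }
    $cachedAt = filemtime($cacheFile);
    return filemtime($scriptFile) < $cachedAt && ($now - $cachedAt) < $ttl;
}
```

Replacing the original `if` condition with `cache_is_valid($cache, $current, 300)` would rebuild each page at least every five minutes.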

Intermediary Caching

By intermediary caching I mean anything between the browser and the server that caches your content. Perhaps you are using a third-party CDN (content delivery network) such as Akamai or EdgeCast to speed up delivery of HTTP requests. These services are tailored to high-volume websites with widespread user bases and are generally too expensive for small and medium-sized websites.

There are also caches you may not even know exist. Many large corporations, educational institutions and even countries cache content coming into their networks. These proxies often respect HTTP headers much as browsers do, but they do not always abide by your rules, so be sure to identify private content with unique query string parameters; otherwise sensitive information may be served to multiple recipients.
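HTTP/1.1 also defines Cache-Control directives for exactly this case: `private` tells shared caches such as these proxies not to store a response while still letting the user's own browser cache it, and `no-store` asks every cache not to store it at all. A minimal sketch of choosing between them, assuming PHP 7.1 or later; the helper and its flags are hypothetical, and a misbehaving proxy may still ignore the header, so the query string precaution remains worthwhile.

```php
<?php
// Hypothetical helper: choose a Cache-Control value for a response.
// - sensitive content (e.g. account pages): no cache should store it
// - per-user content: only the user's own browser may cache it
// - everything else: any cache, including shared proxies, may store it
function cache_control_for(bool $perUser, bool $sensitive, int $maxAge): string
{
    if ($sensitive) {
        return 'no-store';
    }
    if ($perUser) {
        return 'private, max-age=' . $maxAge;
    }
    return 'public, max-age=' . $maxAge;
}

// Usage sketch: header('Cache-Control: ' . cache_control_for(true, false, 600));
```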


Hopefully, if you are not caching now, you will be motivated to implement a caching policy. I plan to follow up soon with a more in-depth Apache/PHP caching post.

Posted by Michael in API,Apache,PHP,Performance on April 25, 2008

24 Responses

Nice write-up. I’d also recommend using gzip in addition to a cache policy to speed up transfer times… maybe in part 2 of your article? ;-)

Tim on April 28, 2008 at 5:02 am

@Tim – Absolutely, there are other aspects to performance that I did not mention. I tried to limit the article to only caching. I do intend to follow up with more performance articles, one specifically on Gzip. Thanks.

Michael on April 28, 2008 at 6:55 am

I found this article to be a good reference on how to speed up transfer times.

http://www.samaxes.com/2008/04/20/htaccess-gzip-and-cache-your-site-for-faster-loading-and-bandwidth-saving/

Can’t wait for your next post :-)

Tim on April 30, 2008 at 4:48 pm

I found your article very useful. I am working on a tool that will be used on automated naval terminals. It contains a simple web server that handles HTTP requests and builds and sends HTTP responses. I am in the final stage of development and was looking for some good caching tutorials (using HTTP headers).

Thanks a lot !

Bohonyi Balazs - Zsolt on June 09, 2008 at 1:43 pm

@Bohonyi – Sounds good. Shoot me an email anytime if you have any questions.

Michael on June 09, 2008 at 1:49 pm

Wow! Great topic. I have introduced it in phpBB.

Weiechan on June 16, 2008 at 1:17 am

Thanks for the write up. It helped me understand caching a little better. I did want to point out that the unit for the max-age value is seconds, not milliseconds.

Mike Bosse on August 25, 2008 at 12:58 pm

Thanks. It’s really essential.

freelance_bangladesh on August 27, 2008 at 12:31 pm

This article made me realize how much unnecessary traffic my site generates. Thanks

Cat Michaels on September 08, 2008 at 7:13 pm

This is a really cool resource… I hope you will keep adding more things like this in the coming days. Really appreciated.

Rinxsona on September 11, 2008 at 12:49 am

This is relevant to the topic (for visitors):
I had a client who often modifies his web pages.
When he modified his index.html file and browsed to the site, he couldn’t see any change at all (we generally don't type index.html, default.php or home.asp).
He could only get around the problem by typing the full URL, including the name of the modified file “index.html”; then the page appeared as he had modified it.

If you want a file to be cached, first ask yourself: “Does this file or web page need regular updates?”

Webmasters should always be conscious of CACHING.
It will help you.

This is a good post; visitors should take advantage of it.

www.Raaj.com.np on September 13, 2008 at 6:22 pm

Good article. If you are interested in some LAMP benchmarks with and without MySQL and PHP caching, you can find them on my site. I used APC, and as far as I know, APC will be included in the core of PHP 6. With PHP caching, your site can be 150% faster.
Thank you!

LAMP setup: Make it faster on December 25, 2008 at 8:50 am

Akamai is very expensive, and unless you’re a big corporation, don’t even think of using it.
My latest development work makes full use of memory- and file-based caching… maybe better known as content generation. It can deliver complete pages in an instant with zero CPU utilization… Once you’re into caching, you’ll realize that the DB has moved from a front-line tool to an intermediary data management tool… and it’s the content pregeneration that will deliver super-fast pages on a server getting a few hundred thousand requests per day…

YC Wee on February 02, 2009 at 8:58 pm

Very good article, thanks. Just been looking into caching for my CMS.

Stu Green on March 16, 2009 at 4:39 pm

very nice post! thanks.

Myfacefriends on May 08, 2009 at 9:30 pm

Very nice post, thanks. It’s easy to forget how much caching affects a website’s responsiveness (and a computer’s, for that matter), which ultimately leads to bad end-user experiences/loss of sales… you get the picture.

Robert K on May 13, 2009 at 2:07 am

You folks are performing a great service, for those in the know, and for those seeking to learn.

Many thanks, us grasshoppers will be watching!

Steve R on May 15, 2009 at 7:36 pm

Very good article, thanks.

flyer on August 13, 2009 at 5:48 am

nice article with simple language

Divyesh Karelia on February 08, 2010 at 10:05 pm

Hey… great info indeed! Can you also add gzip to your list and write about it? I think it’s another great way of managing caching.

web development Kolkata on February 17, 2010 at 5:56 am

simple language is important for sure :)

angular cheilitis on July 23, 2010 at 4:59 pm