Why using a URL as a cache id is inherently dangerous

By Klas Berlič

While this article focuses on Joomla, the same applies to all sorts of web applications - from Drupal to nearly all cache classes on phpclasses.org - because they all use the same, inherently faulty approach.

Background

Caching is used in all sorts of software and hardware to speed up access to data that has been previously retrieved or generated: the data is stored, and subsequent requests for the same data are served quickly from the stored copy. The longer it originally takes to retrieve or generate the data, the greater the speed gain when it is served from the cached copy.

With PHP web applications such gains are especially important, as PHP and the surrounding web technologies are not particularly efficient, and due to the nature of the web there are a lot of requests for the same data (multiple users viewing the same page). This inefficiency becomes significantly worse when complex calculations or remote data access are involved, which is why nearly all advanced web applications make extensive use of various caching techniques.

Practical dilemmas with caching

The basic principle of any storage and retrieval system is that every stored unit must be assigned a unique identifier and be stored according to predefined classification rules, so that it can be easily found and retrieved later. It doesn't matter whether this unit is a cache item, a book in a library or a nail in a hardware shop.

As a cache is temporary by nature, unique identifiers are normally accompanied by timestamps that tell the caching system whether the cached data is recent enough that the original is not expected to have changed since the copy was generated. This technique is further refined by manually removing cache items on operations that might affect the original data (e.g. on article save in Joomla).
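The timestamp check described above can be sketched in a few lines. This is a minimal illustration only, not Joomla's actual cache code: the function names cache_is_fresh and cache_get, the in-memory array used as storage, and the injectable $now argument are all assumptions made for the example.

```php
<?php
// Hypothetical helper: is an item stored at $storedAt still younger than $lifetime seconds?
// $now can be passed in explicitly, which makes the check easy to test.
function cache_is_fresh($storedAt, $lifetime, $now = null)
{
    $now = ($now === null) ? time() : $now;
    return ($now - $storedAt) < $lifetime;
}

// Hypothetical in-memory store: cache id => array('time' => ..., 'data' => ...).
// Returns the cached data, or null when the item is missing or expired.
function cache_get(array $store, $id, $lifetime, $now = null)
{
    if (!isset($store[$id])) {
        return null; // never cached
    }
    if (!cache_is_fresh($store[$id]['time'], $lifetime, $now)) {
        return null; // expired: the caller has to regenerate the data
    }
    return $store[$id]['data'];
}
```

A null result means the caller regenerates the data and stores it again with a fresh timestamp.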

It is common practice for web applications to base cache identifiers on request URLs, as URLs identify individual web pages in the first place. Further classification is done on the basis of the application's internal structure.

Unique cache identifiers (called cacheids) in Joomla are also created from request URLs (view cache, module cache), while cache classification is based on extension names (folders with file storage, or prefixes with other storage types).

Why URL as cache id is inherently wrong and dangerous

Short answer: because it opens the door to DoS attacks.

Anyone can change the request URL for a page. If the manipulated URL does not follow the web application's internal logic, it will end in 404 - Page not found, and that's the end of the story. The same happens if we follow the application logic but enter nonexistent identifiers - e.g. we change the article id number in the URL and the requested article does not exist.

But what happens if we add an arbitrary, unused parameter to a Joomla request URL?

This is where the surprise comes in - nothing happens on the surface. Valid URL parameters are still accepted and the page is served based on those parameters. But under the hood a new cache item has been created.
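To see why, consider a deliberately naive scheme that hashes the whole request URL. This is a sketch of the general pattern, not a copy of Joomla's code; the function name naive_cacheid and the example URLs are made up for illustration.

```php
<?php
// Hypothetical naive scheme: the cache id is simply a hash of the full request URL.
function naive_cacheid($url)
{
    return md5($url);
}

$original = '/index.php?option=com_content&view=article&id=5';
$padded   = $original . '&randomparameter=12345';

// The visitor gets the same page for both URLs, but the ids differ,
// so the cache ends up holding two separate copies of it.
$sameId = (naive_cacheid($original) === naive_cacheid($padded)); // false
```

Every new value of the junk parameter produces yet another id, and therefore yet another stored copy.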

DoS attackers welcome, fill my storage and suck my server resources

A simple &randomparameter=randomnumber is enough to create a new cache item, and the creation of each such item sucks resources out of your hosting. For each request the page has to be generated anew, and the cache storage is filled with one more item, each of which can take 150 kB of storage space when we are dealing with the view cache and large pages. Repeat 1,000 times and 150 MB is gone; send 100,000 requests and 15 GB is gone. Your disk is full and your CPU is burned. With memory-based storage (APC, eAccelerator, memcached..) storage is depleted even faster.

Notice that the attack does not need to be fast - with no automatic cleanup implemented in Joomla it can take days (or however long it takes to stay under firewall limits) with the same end result.
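The growth is plain multiplication, which a small helper makes easy to play with. The function name cache_growth_mb is hypothetical, the 150 kB default is the per-item figure quoted above, and decimal units (1 MB = 1000 kB) are assumed.

```php
<?php
// Rough estimate of storage consumed when every junk request stores one more copy.
// $kbPerItem defaults to the 150 kB quoted for large pages in the view cache.
function cache_growth_mb($requests, $kbPerItem = 150)
{
    return $requests * $kbPerItem / 1000; // kB -> MB, decimal units
}

$afterThousand = cache_growth_mb(1000);    // 150 MB
$afterMillion  = cache_growth_mb(1000000); // 150000 MB, i.e. 150 GB
```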

Solutions

The problem must be tackled at both ends simultaneously - at the entrance, and by limiting its effects at the exit.

Safer Cache id bases

The first rule is clear - never use the plain URL as the base for a cache id. Use a limited set of URL parameters and their values instead. The cacheid should be based on the set of parameters that are actually used by your Joomla component (or by any web application) and their filtered (!) values. These few additional lines of code can save you a lot of headaches.

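// Whitelist of URL parameters the component actually uses, mapped to their filter types.
$registeredurlparams = array('param1' => 'parametertypefilter', 'param2' => 'parametertypefilter');

// Collect only the whitelisted parameters, each filtered to its declared type.
$cacheidpar = new stdClass();
foreach ($registeredurlparams as $key => $value) {
    $cacheidpar->$key = JRequest::getVar($key, null, 'default', $value);
}

// Parameters that are not on the whitelist never reach the cache id.
$cacheid = serialize($cacheidpar);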

If you are caching some remote data or the result of an internal function, don't use the URL at all - use the parameters used to retrieve that data!
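For applications outside Joomla, the same idea can be sketched in plain PHP. Everything here is an assumption made for illustration: the function name safe_cacheid, the two filter types 'int' and 'word', and the example parameter names. A real application would plug in its own filters.

```php
<?php
// Hypothetical helper: build a cache id from whitelisted request parameters only.
// $allowed maps parameter name => filter type ('int' or 'word' in this sketch).
function safe_cacheid(array $request, array $allowed)
{
    $parts = array();
    foreach ($allowed as $name => $type) {
        $value = isset($request[$name]) ? $request[$name] : null;
        if (!is_scalar($value)) {
            $value = null; // missing, or an array such as id[]=1: treat as not set
        } elseif ($type === 'int') {
            $value = (int) $value;
        } else {
            // 'word': keep letters, digits and underscores only
            $value = preg_replace('/[^A-Za-z0-9_]/', '', (string) $value);
        }
        $parts[$name] = $value;
    }
    // Iterating over $allowed fixes the order, so parameter order in the URL is irrelevant.
    return md5(serialize($parts));
}

$allowed = array('view' => 'word', 'id' => 'int');
$clean   = array('view' => 'article', 'id' => '5');
$padded  = array('view' => 'article', 'id' => '5', 'randomparameter' => '12345');

// Both requests map to the same cache id; the junk parameter is ignored.
$sameId = (safe_cacheid($clean, $allowed) === safe_cacheid($padded, $allowed)); // true
```

When caching remote data or a function result, the same helper can be fed the arguments of that call instead of the request parameters.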

Limiting effects - automated cleanup and limitations

While most of the damage can be avoided by using safer cacheids, something must also be done to limit the potential damage if things go wrong.

Automated cleanup: clean up expired items from storage periodically, triggered from a local or remote crontab. The other option is to run the cleanup on every Xth access, but this approach significantly slows things down for that particular user and should therefore be avoided.

Resource limits: impose limits on storage space on the server side (max number of files, max disk or memory usage). This is OS and storage specific, so it must be done by hosting administrators.
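A cron-driven cleanup for file-based storage could look roughly like this. It is a sketch under stated assumptions, not Joomla's cleanup code: the function name cache_cleanup is hypothetical, cache items are assumed to be plain files in a single directory, and the file modification time is assumed to be the time the item was stored.

```php
<?php
// Hypothetical cleanup routine, meant to be run from cron rather than during a page request.
// Deletes files directly inside $dir that are at least $lifetime seconds old.
// Returns the number of files removed.
function cache_cleanup($dir, $lifetime, $now = null)
{
    $now = ($now === null) ? time() : $now;
    $removed = 0;
    $files = glob(rtrim($dir, '/') . '/*');
    if ($files === false) {
        return 0; // directory could not be read
    }
    foreach ($files as $file) {
        if (is_file($file) && ($now - filemtime($file)) >= $lifetime) {
            if (unlink($file)) {
                $removed++;
            }
        }
    }
    return $removed;
}
```

Because it runs outside any page request, no visitor has to wait for the cleanup to finish.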

Note: The Joomla security team was notified about this issue, together with patches to fix it, around Christmas 2009. As of today (24.3.2010) there has been no release to close the hole. Exploits for this issue are also publicly available.

Burning disc image by Danilo Rizzuti / FreeDigitalPhotos.net