ted serbinski – a blog about drupal, macs, productivity, health, and bmws

a blog about drupal, macs, productivity, health, and bmws

Preventing Drupal from Handling 404s for Performance

The .htaccess file included with Drupal tells Apache to send all 404 requests to Drupal to handle. While this is great in some cases, the performance degradation can have a huge impact on a site that has millions of users.

When Drupal processes a 404, it has to bootstrap Drupal, which includes Apache loading up the PHP process, gathering all of the Drupal PHP files, connecting to the database, and running some queries. This is quite expensive when Apache can be told to simply say "Page not found" without having to incur any of that overhead.

Now you might say your site doesn't have any broken URLs as you haven't changed any. Well that's great, but as your site grows, it is going to be a target for spammers and hackers. They are going to start requesting all sorts of file to see if they can find an exploit. Instead of bootstrapping Drupal each time to tell them that DLL file doesn't exist, it would be much better if Apache could just say that, to save resources for your real users.

So, what can you do? How can you stop Drupal from handling 404s but not break modules like imagecache?

Imagecache is one of the few modules that relies on Drupal's 404 handling. It is a very smart module that automatically resizes images. Instead of resizing every single image as they are uploaded, it only resizes them when they are requested, which is great. So if we're going to tell Drupal not to handle 404s, we need to be careful not to break this highly useful module.

To see this in action, visit the ParentsClick Network and test out some 404s. You'll notice that 404s for files and Drupal paths show the same page. The following is the procedure we used to prevent Drupal from handling 404s.

A note, this functionality should really be in core, and this patch is where the necessary .htaccess code used below comes from, only being slightly modified to prevent Drupal from handling 404s completely. The below code is tested and working on Drupal 5.

Step 1 - Update your .htaccess file

-ErrorDocument 404 /index.php
+ErrorDocument 404 /sites/all/themes/foundation/404.php # path to your 404 file

~~~

+  RewriteCond %{REQUEST_FILENAME} !-f
+  RewriteCond %{REQUEST_URI} !^/files/ # this makes it work with imagecache
+  RewriteCond %{REQUEST_URI} \.(png|gif|jpe?g|s?html?|css|js|cgi|ico|swf|flv|dll)$
+  RewriteCond %{REQUEST_URI} !^404.%1$
+  RewriteRule ^(.*)$ 404.%1 [L]

   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

What this basically does is it removes Drupal from handling 404s (removing the /index.php part) and tells Apache to use a specific file if it encounters a 404 (like a missing image or CSS file).

Step 2 - Tell Drupal to stop on 404s too

In your template.php inside of the _phptemplate_variables(), add in this code:

// show custom 404 page
$headers = drupal_get_headers();
if (strpos($headers, 'HTTP/1.1 404') !== FALSE) {
  // make sure this path = ErrorDocument in .htaccess above
  include_once './sites/all/themes/foundation/404.php';
  exit();
}  

This tell Drupal to serve up this 404 page if it can't find the path. The benefit of this is your designers can work on the same file that handles 404s for both Apache and Drupal. It also stops Drupal from fully executing. In Drupal 6 this could happen much earlier using the preprocess templating functions.

Step 3 - Create a 404 file

Create a 404.php file (or 404.html or whatever you want) and place the file where ever you want. Make sure to update the ErrorDocument in the .htaccess to point to this file along with the Drupal template code.

And voilá!

3 comments

 
swentel (not verified) wrote 11 weeks 4 days ago

I was just looking for this too, thanks for the post and I’ll definitely help out testing the patch on d.o.!

 
pbull (not verified) wrote 11 weeks 4 days ago

Great idea. I get tired of seeing “randomexploited.dll” show up in my watchdog logs.

You could also maintain the content of your 404 page within Drupal, and periodically write the output (via wget/curl) to a 404.html file in your root directory. This way your 404 file will have an up-to-date version of your menu/navigation, and if desired, the content of the page can be managed by someone else.

 
NikLP (not verified) wrote 10 weeks 5 days ago

Ted, good thought. Also, pbull – I think that’s a cool idea.

I’ve filed an issue against customerror.module here regarding that. In my opinon, customerror is the error module of choice.

Add your comment

The content of this field is kept private and will not be shown publicly.
  • You can use Textile markup to format text.
  • You may post code using <code>...</code> (generic) or <?php ... ?> (highlighted PHP) tags.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd> <p> <img> <pre>

More information about formatting options

Subscribe to updates

don't worry, spam free!

Recent comments