Symfony, Varnish and HTTP: practical considerations

One of the biggest drawbacks of Symfony is its performance. Given the massive amount of features that come with it, and the structure of the framework itself, this is to some extent understandable. However, if we want to deploy enterprise applications built with Symfony, we must, at some point, make use of a reverse proxy cache. This really applies to any application on the internet: a caching strategy is a must, and both your servers and your users will greatly appreciate it.

I’m going to take for granted that you already know how HTTP caching works. This article is going to be focused on specific practical considerations when using Varnish with Symfony.

Multi-language site example

Consider the following example: we have an application translated into multiple languages. We have a number of static cacheable pages (each available in every language), an administration panel, a login system and a user-customized page:

example.com/
    /{locale}/static-page-1/
    /{locale}/static-page-2/
    /admin/
    /{locale}/custom-page/
    /login/
    /logout/

We need to define rules for how these pages are cached. We will use a few Symfony configuration rules and some bundles along the way. Hang tight!

Configuring Symfony and LiipCacheControlBundle

There are many ways of telling Symfony how a response should be cached. You can use the @Cache annotation, set the headers directly on the Response, etc. There is a very good tutorial in the official docs that you should definitely read before continuing. However, for the sake of not reinventing the wheel, I recommend using the LiipCacheControlBundle, which substantially eases the task of setting cache headers on responses. Once the bundle is installed, we configure it by listing our route paths, controllers or domains together with their cache constraints.
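
A rough sketch of what this configuration could look like for our example site (option names may differ between bundle versions, and the en/es/fr/de locales are made up for the example):

    # app/config/config.yml (sketch: option names may differ between
    # versions of LiipCacheControlBundle)
    liip_cache_control:
        rules:
            # internal URLs: never let Varnish cache anything under these paths
            -
                path: ^/(admin|login|logout)
                controls:
                    public: false
                    max_age: 0
                    s_maxage: 0

            # the root redirect: one cached variant per Vary combination
            -
                path: ^/$
                controls:
                    public: true
                    s_maxage: 3600
                vary: [Accept-Encoding, Accept-Language]

            # locale-prefixed static pages: the language is already in the URL,
            # so Accept-Language is no longer needed in Vary
            -
                path: ^/(en|es|fr|de)/static-page
                controls:
                    public: true
                    max_age: 300
                    s_maxage: 86400
                vary: [Accept-Encoding]

            # ESI partials can also be targeted with the controller notation,
            # as discussed in the ESI section below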

Handling redirection from the root /

The first thing we need to do is to properly redirect the root of our site. Since we are redirecting the user based on the detected language (typically taken from the Accept-Language request header, or with a more sophisticated method like GeoIP), we need to include Accept-Language in our Vary response header. This tells Varnish to store a separate cached version for each combination of the headers listed in Vary, so it will cache one 301 redirect response per Accept-Language and Accept-Encoding combination. This is OK in most cases, but as you have surely noticed, if our site only has 4 languages, why are we caching a variant for every possible Accept-Language value? We can optimize this by normalizing the language into a custom header limited to our supported values and using that header in Vary instead. Whether a custom, non-standardized header is really worth it is up to you.
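
The usual trick for the custom-header variant is to have Varnish collapse the incoming Accept-Language into a small request header that the backend then lists in Vary instead of Accept-Language. A sketch in Varnish 3 syntax (the X-Language header name and the locale list are made up for the example):

    sub vcl_recv {
        # Collapse whatever Accept-Language the browser sends into one of our
        # four supported locales, so "Vary: X-Language" only ever produces
        # four variants per URL.
        if (req.http.Accept-Language ~ "^es") {
            set req.http.X-Language = "es";
        } elsif (req.http.Accept-Language ~ "^fr") {
            set req.http.X-Language = "fr";
        } elsif (req.http.Accept-Language ~ "^de") {
            set req.http.X-Language = "de";
        } else {
            set req.http.X-Language = "en";
        }
    }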

Handling language specific cacheable pages

The rest of the configuration deals with the language-specific pages. As you can see, we no longer add the Accept-Language header to the response’s Vary, as the user language is already part of the URL. Varnish uses the URL (and the Host) to calculate a unique hash for the request, and then stores one variant of the cached object per distinct combination of the headers listed in Vary. On subsequent requests it simply looks the hash up in memory to see whether a matching response is already there. If you put a header like Accept-Language in Vary, you lose performance, because Varnish has to cache many variants of the same page based on a value that isn’t very consistent across browsers. That’s why I always recommend putting essential user information (for example, the language) in the URL, so you control exactly how many possible combinations there are.
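
For reference, the default hashing logic looks roughly like this in Varnish 3; the varied variants are then stored under this hash, one per combination of the headers named in Vary:

    sub vcl_hash {
        # Hash on the URL plus the Host header (or the server IP if the
        # request carries no Host header).
        hash_data(req.url);
        if (req.http.host) {
            hash_data(req.http.host);
        } else {
            hash_data(server.ip);
        }
        return (hash);
    }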

Ignoring internal and administration URLs

You can probably guess what the first rule means: it prevents Varnish from caching anything under those paths. As a general rule you should list here your administration panels and internal URLs. Maybe you are wondering why I added the /login and /logout routes. In the former case we need to prevent caching because we are serving a login form, and chances are this form has some sort of server-side protection, like a CSRF token. If we were to cache that token, only the first visitor to hit the page could actually log in; every subsequent request would fail, because the cached page serves the same token to everyone. So you should list here any path whose forms rely on something similar.

Handling a special case: non-cacheable routes with user personalization

There is an edge case I would like to talk about. In HTTP-based applications, personalization is achieved with the Cookie and Set-Cookie request and response headers respectively. This way the server knows exactly who is making the request and can serve personalized content from the backend. Unless you are using a Heartbleed-affected OpenSSL version, this method is safe and widely used across the internet. However, what happens when Varnish receives a request with a Cookie header? As you can guess, by default the request is passed directly to the backend. This behaviour is OK; after all, that’s why cookies exist in the first place: to keep user sessions on top of HTTP, a stateless protocol.

But what if this cookie is not important at all? In our example we don’t need the PHPSESSID cookie for anything, except for one specific route which uses sessions. The typical answer to this problem is to have Varnish strip the Cookie sent by the user and to list in default.vcl the special routes whose Cookie headers must not be removed from the request, as sketched below. Doing so is again OK, and you can even split the configuration per virtual host to keep complex setups organized, but this can quickly lead to a configuration mess if you are not careful.
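
That typical approach looks roughly like this (Varnish 3 syntax; the route list is simply our example site’s):

    sub vcl_recv {
        # Strip cookies everywhere except the routes that actually need them,
        # so anonymous traffic stays cacheable.
        if (req.http.Cookie &&
            req.url !~ "^/(admin|login|logout)" &&
            req.url !~ "^/(en|es|fr|de)/custom-page") {
            unset req.http.Cookie;
        }
    }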

I am more interested in the general case. For that we can use the powerful (but admittedly cumbersome) Varnish configuration language (VCL). Here is the configuration I wrote to get exactly the behaviour I want; it’s explained below.
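
A sketch of that configuration in Varnish 3 syntax (the X-Cookie and X-Force-Backend header names are the ones used in the explanation below; treat the details as illustrative rather than copy-paste ready):

    sub vcl_recv {
        # Second pass after a restart: the backend said the response is
        # private, so pass the request through with its original Cookie.
        if (req.http.X-Force-Backend) {
            return (pass);
        }

        # Only GET and HEAD are cacheable. For those, stash the Cookie in
        # X-Cookie and remove it, so Varnish does a normal cache lookup.
        if ((req.request == "GET" || req.request == "HEAD") && req.http.Cookie) {
            set req.http.X-Cookie = req.http.Cookie;
            unset req.http.Cookie;
        }
    }

    sub vcl_deliver {
        # If the backend marked the response as private/no-cache/no-store,
        # the Cookie mattered after all: put it back and restart the request.
        if (req.http.X-Cookie && !req.http.X-Force-Backend &&
            resp.http.Cache-Control ~ "(private|no-cache|no-store)") {
            set req.http.X-Force-Backend = "1";
            set req.http.Cookie = req.http.X-Cookie;
            return (restart);
        }
    }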

As you can see, we first check whether the request is a cacheable one (only HEAD and GET should be cached) and whether it contains a Cookie header. We override Varnish’s default behaviour of passing such requests straight to the backend by deleting the Cookie header, but not before saving it in a custom header: X-Cookie.

When it’s time to deliver the response (whether it was found in the cache or not), we inspect it: if its Cache-Control header contains private, no-cache or no-store, we restart the request, but first copy the Cookie contents back from X-Cookie. When the restarted request comes through vcl_recv again, we see the X-Force-Backend header we just set and pass the request to the backend untouched, including the original Cookie header.

This way, we delegate to the backend the decision of whether a request with a Cookie header should be cached or not. In our example, the custom-page/ route returns a Cache-Control: private response. Varnish will see that and return the backend’s response untouched, including the Set-Cookie header on the first request.
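
On the Symfony side, the custom page only has to mark its response as private. A minimal sketch, with made-up controller and template names:

    <?php
    // src/AppBundle/Controller/CustomPageController.php (hypothetical names)

    namespace AppBundle\Controller;

    use Symfony\Bundle\FrameworkBundle\Controller\Controller;

    class CustomPageController extends Controller
    {
        public function customPageAction()
        {
            $response = $this->render('AppBundle:Default:customPage.html.twig', array(
                'user' => $this->getUser(),
            ));

            // Cache-Control: private makes Varnish restart and pass the
            // request straight to the backend, cookies included
            $response->setPrivate();

            return $response;
        }
    }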

ESI and URL canonicalization

A couple of final notes. ESI is a very powerful feature that Symfony and Varnish support out of the box, so you should use it extensively. You may have noticed that I used the controller notation in the previous YAML config snippet. That’s my preferred way of referring to partial actions that can be cached and used as inline subrequests inside Twig templates. This way you can even use wildcards to refer to those partial actions in your config, making it much less verbose and easier to follow. The following kind of controller partial action can be used both from the render function in Twig and when calling it via AJAX.
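
For instance (with a made-up AppBundle:Partial:recentPosts action standing in for the real partial):

    {# ESI include: Varnish fetches and caches this fragment independently #}
    {{ render_esi(controller('AppBundle:Partial:recentPosts', { 'max': 5 })) }}

The same action can also be exposed through a regular route and requested via AJAX, in which case the usual cache rules apply to it.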

Also, if you want to take even more load off the backend, you should look into URL canonicalization. You don’t really want duplicate content between www.example.com and example.com, or between www.example.com/url and www.example.com/url/. This is not only bad for SEO: since Varnish and the backend see them as two different requests, the same content gets cached and generated twice, which quickly hurts performance on both sides. You should be consistent with your URLs and redirect all the variants to their unique counterparts. You can do this with Apache rewrite rules, for example, or whatever your web server configuration is. You can also use the RouterUnslashBundle, which automatically redirects (using the router and Symfony events) your routes with or without a trailing slash to their real unique ones. This is very convenient, as routing consistency can be difficult in Symfony, especially when you are using @Route annotations or importing routes from other bundles.
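
For the Apache case, a typical .htaccess sketch covering both kinds of duplicates might look like this (hypothetical host name; adjust to your site):

    RewriteEngine On

    # example.com -> www.example.com
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

    # strip trailing slashes, except for real directories
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ /$1 [R=301,L]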

Conclusion

Use caching whenever possible. Take ESI into account if you really want to achieve superb performance, and fine-tune your reverse proxy cache configuration to fit the needs of your application. Make use of these technologies and best practices to deliver a better experience to your users.