Chapter 13: Deploying Django
Last updated: 2022-04-01 04:48:02
This chapter covers the last essential step of building a Django application: deploying it to a production server.
If you've been following along with our ongoing examples, you've likely been using `runserver`, which makes things very easy: with `runserver`, you don't have to worry about Web server setup. But `runserver` is intended only for development on your local machine, not for exposure on the public Web. To deploy your Django application, you'll need to hook it into an industrial-strength Web server such as Apache. In this chapter, we'll show you how to do that, but first we'll give you a checklist of things to do in your codebase before you go live.
## Preparing Your Codebase for Production
### Deployment checklist
The Internet is a hostile environment. Before deploying your Django project, you should take some time to review your settings, with security, performance, and operations in mind.
Django includes many security features. Some are built-in and always enabled. Others are optional because they aren't always appropriate, or because they're inconvenient for development. For example, forcing HTTPS may not be suitable for all websites, and it's impractical for local development.
Performance optimizations are another category of trade-offs with convenience. For instance, caching is useful in production, less so for local development. Error reporting needs are also widely different.
The following checklist includes settings that:
* must be set properly for Django to provide the expected level of security;
* are expected to be different in each environment;
* enable optional security features;
* enable performance optimizations; and
* provide error reporting.
Many of these settings are sensitive and should be treated as confidential. If you're releasing the source code for your project, a common practice is to publish suitable settings for development, and to use a private settings module for production.
Some of the checks described below can be automated using the `--deploy` option of the `check` command. Be sure to run it against your production settings file as described in the options documentation.
## Critical settings
### SECRET_KEY
The secret key must be a large random value and it must be kept secret.
Make sure that the key used in production isn't used anywhere else, and avoid committing it to source control. This reduces the number of vectors from which an attacker may acquire the key.
Instead of hardcoding the secret key in your settings module, consider loading it from an environment variable:
~~~
import os
SECRET_KEY = os.environ['SECRET_KEY']
~~~
or from a file:
~~~
with open('/etc/secret_key.txt') as f:
    SECRET_KEY = f.read().strip()
~~~
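One way to produce a suitably large random value is Python's standard `secrets` module. The sketch below is only an illustration, not something Django requires: the `generate_secret_key` helper, the 50-character length, and the alphabet are all assumptions made for the example.

```python
import secrets
import string

# Hypothetical helper: the name, length, and alphabet are illustrative choices.
ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a random string drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Print a fresh key so it can be stored outside source control.
    print(generate_secret_key())
```

The printed value can then be placed in the environment variable or file that your settings module reads.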
### DEBUG
You must never enable debug in production.
When we created a project in Chapter 1, the command `django-admin startproject` created a `settings.py` file with `DEBUG` set to `True`. Many internal parts of Django check this setting and change their behavior if `DEBUG` mode is on. For example, if `DEBUG` is set to `True`, then:
* All database queries will be saved in memory as the object `django.db.connection.queries`. As you can imagine, this eats up memory!
* Any 404 error will be rendered by Django's special 404 error page (covered in Chapter 3) rather than returning a proper 404 response. This page contains potentially sensitive information and should *not* be exposed to the public Internet.
* Any uncaught exception in your Django application (from basic Python syntax errors to database errors to template syntax errors) will be rendered by the Django pretty error page that you've likely come to know and love. This page contains even *more* sensitive information than the 404 page and should *never* be exposed to the public.
In short, setting `DEBUG` to `True` tells Django to assume only trusted developers are using your site. The Internet is full of untrustworthy hooligans, and the first thing you should do when you're preparing your application for deployment is set `DEBUG` to `False`.
## Environment-specific settings
### ALLOWED_HOSTS
When `DEBUG = False`, Django doesn't work at all without a suitable value for `ALLOWED_HOSTS`.
This setting is required to protect your site against some CSRF attacks. If you use a wildcard, you must perform your own validation of the `Host` HTTP header, or otherwise ensure that you aren't vulnerable to this category of attacks.
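As a minimal sketch, the host list can be kept out of the settings file and read from the environment. The `DJANGO_ALLOWED_HOSTS` variable name and the `parse_allowed_hosts` helper are hypothetical; only the `ALLOWED_HOSTS` setting itself comes from Django.

```python
import os


def parse_allowed_hosts(raw):
    """Split a comma-separated string into a clean list of host names."""
    return [host.strip() for host in raw.split(",") if host.strip()]


# DJANGO_ALLOWED_HOSTS is a hypothetical variable name. If it is unset the
# list is empty, so no host will match once DEBUG is False.
ALLOWED_HOSTS = parse_allowed_hosts(os.environ.get("DJANGO_ALLOWED_HOSTS", ""))
```

With `DJANGO_ALLOWED_HOSTS="www.example.com,.example.com"`, this yields two entries; a leading dot matches the domain and all of its subdomains.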
### CACHES
If you're using a cache, connection parameters may be different in development and in production.
Cache servers often have weak authentication. Make sure they only accept connections from your application servers.
If you're using Memcached, consider using cached sessions to improve performance.
### DATABASES
Database connection parameters are probably different in development and in production.
Database passwords are very sensitive. You should protect them exactly like `SECRET_KEY`.
For maximum security, make sure database servers only accept connections from your application servers.
If you haven't set up backups for your database, do it right now!
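The following sketch shows one way to keep the database password out of the settings file, in the same spirit as the `SECRET_KEY` advice. The `DB_PASSWORD` and `DB_HOST` variable names, the database name, and the user are placeholders chosen for illustration.

```python
import os

DATABASES = {
    "default": {
        # On Django 1.8 and earlier the engine path is
        # "django.db.backends.postgresql_psycopg2".
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "production",
        "USER": "app",
        # DB_PASSWORD and DB_HOST are hypothetical environment variable names.
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": "",
    }
}
```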
### EMAIL_BACKEND and related settings
If your site sends emails, these values need to be set correctly.
### STATIC_ROOT and STATIC_URL
Static files are automatically served by the development server. In production, you must define a `STATIC_ROOT` directory where `collectstatic` will copy them.
### MEDIA_ROOT and MEDIA_URL
Media files are uploaded by your users. They're untrusted! Make sure your web server never attempts to interpret them. For instance, if a user uploads a `.php` file, the web server shouldn't execute it.
Now is a good time to check your backup strategy for these files.
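As a rough sketch, the static and media settings might look like the following in a production settings module. The directory paths are placeholders that follow the `/path/to/mysite.com` convention used elsewhere in this chapter; keeping the two roots separate means user uploads never mix with the files that `collectstatic` gathers.

```python
# URL prefixes under which the web server exposes each directory.
STATIC_URL = "/static/"
MEDIA_URL = "/media/"

# Placeholder paths: collectstatic copies static files into STATIC_ROOT,
# while user uploads are written to MEDIA_ROOT.
STATIC_ROOT = "/path/to/mysite.com/static/"
MEDIA_ROOT = "/path/to/mysite.com/media/"
```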
## HTTPS
Any website which allows users to log in should enforce site-wide HTTPS to avoid transmitting access tokens in the clear. In Django, access tokens include the login/password, the session cookie, and password reset tokens. (You can't do much to protect password reset tokens if you're sending them by email.)
Protecting sensitive areas such as the user account or the admin isn't sufficient, because the same session cookie is used for HTTP and HTTPS. Your web server must redirect all HTTP traffic to HTTPS, and only transmit HTTPS requests to Django.
Once you've set up HTTPS, enable the following settings.
### `CSRF_COOKIE_SECURE`
Set this to `True` to avoid transmitting the CSRF cookie over HTTP accidentally.
### `SESSION_COOKIE_SECURE`
Set this to `True` to avoid transmitting the session cookie over HTTP accidentally.
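In a production settings module, the two settings above amount to a short fragment like this one. It is a sketch that assumes HTTPS is already working end to end; with these values set, browsers will not send either cookie over plain HTTP.

```python
# Only send these cookies over HTTPS connections.
CSRF_COOKIE_SECURE = True
SESSION_COOKIE_SECURE = True
```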
## Performance optimizations
Setting `DEBUG = False` disables several features that are only useful in development. In addition, you can tune the following settings.
### `CONN_MAX_AGE`
Enabling persistent database connections can result in a nice speed-up when connecting to the database accounts for a significant part of the request processing time.
This helps a lot on virtualized hosts with limited network performance.
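`CONN_MAX_AGE` is set per database inside `DATABASES`. The sketch below uses an illustrative lifetime of 60 seconds; the engine and database name are placeholders, and the right value depends on your database server's connection limits.

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "production",
        # Keep each connection open for up to 60 seconds (illustrative value).
        # 0 closes the connection after every request; None keeps it open
        # indefinitely.
        "CONN_MAX_AGE": 60,
    }
}
```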
### `TEMPLATES`
Enabling the cached template loader often improves performance drastically, as it avoids compiling each template every time it needs to be rendered. See the template loaders docs for more information.
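A minimal sketch of a `TEMPLATES` setting that wraps the usual loaders in the cached loader is shown below. The `DIRS` path is a placeholder. Note that `APP_DIRS` is left out on purpose, because Django does not accept it together with an explicit `loaders` option.

```python
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        # Placeholder path to project-level templates.
        "DIRS": ["/path/to/mysite.com/templates"],
        "OPTIONS": {
            "loaders": [
                # The cached loader wraps the loaders listed inside it and
                # keeps compiled templates in memory.
                (
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
        },
    },
]
```

Because compiled templates stay in memory, template edits only take effect after the server process restarts, which is why this is usually reserved for production.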
## Error reporting
By the time you push your code to production, it's hopefully robust, but you can't rule out unexpected errors. Thankfully, Django can capture errors and notify you accordingly.
### `LOGGING`
Review your logging configuration before putting your website in production, and check that it works as expected as soon as you have received some traffic.
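The `LOGGING` setting uses the dictionary format of Python's standard `logging.config` module, so a configuration can be checked outside Django before it goes live. The sketch below is an assumption-laden example: the formatter, the single console handler, the `WARNING` level, and the `deploy.check` logger name are all illustrative.

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "root": {"handlers": ["console"], "level": "WARNING"},
}

# Django applies LOGGING itself at startup; calling dictConfig directly here
# only checks that the dictionary is well formed.
logging.config.dictConfig(LOGGING)
logging.getLogger("deploy.check").warning("logging is configured")
```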
### `ADMINS` and `MANAGERS`
`ADMINS` will be notified of 500 errors by email.
`MANAGERS` will be notified of 404 errors. `IGNORABLE_404_URLS` can help filter out spurious reports.
Error reporting by email doesn't scale very well
Consider using an error monitoring system such as [Sentry](http://sentry.readthedocs.org/en/latest/) before your inbox is flooded by reports. Sentry can also aggregate logs.
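For reference, the email-related settings above might be sketched as follows. The names, addresses, and URL patterns are placeholders. Also note that Django only sends broken-link reports to `MANAGERS` when `BrokenLinkEmailsMiddleware` is enabled.

```python
import re

# Names and addresses are placeholders.
ADMINS = [("Site Admin", "admin@example.com")]
MANAGERS = ADMINS

# Requests matching any of these patterns are not reported as broken links.
IGNORABLE_404_URLS = [
    re.compile(r"^/favicon\.ico$"),
    re.compile(r"^/robots\.txt$"),
    re.compile(r"\.(cgi|php|pl)$"),
]
```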
### Customize the default error views
Django includes default views and templates for several HTTP error codes. You may want to override the default templates by creating the following templates in your root template directory: `404.html`, `500.html`, `403.html`, and `400.html`. The default views should suffice for 99% of Web applications, but if you want to customize them, see these instructions, which also contain details about the default templates:
* `http_not_found_view`
* `http_internal_server_error_view`
* `http_forbidden_view`
* `http_bad_request_view`
## Using a virtualenv
If you install your project's Python dependencies inside a [virtualenv](http://www.virtualenv.org/), you'll need to add the path to this virtualenv's `site-packages` directory to your Python path as well. To do this, add an additional path to your `WSGIPythonPath` directive, with multiple paths separated by a colon (`:`) if using a UNIX-like system, or a semicolon (`;`) if using Windows. If any part of a directory path contains a space character, the complete argument string to `WSGIPythonPath` must be quoted:
~~~
WSGIPythonPath /path/to/mysite.com:/path/to/your/venv/lib/python3.X/site-packages
~~~
Make sure you give the correct path to your virtualenv, and replace `python3.X` with the correct Python version (e.g. `python3.4`).
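If you are unsure of the exact `site-packages` path, one way to find it is to ask the virtualenv's own interpreter. This short sketch uses the standard `sysconfig` module; run it with the `python` executable that lives inside the virtualenv.

```python
import sysconfig

# "purelib" is where pure-Python packages are installed for this interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```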
## Using Different Settings for Production
So far in this book, we've dealt with only a single settings file: the `settings.py` generated by `django-admin.py startproject`. But as you get ready to deploy, you'll likely find yourself needing multiple settings files to keep your development environment isolated from your production environment. (For example, you probably won't want to change `DEBUG` from `False` to `True` whenever you want to test code changes on your local machine.) Django makes this very easy by allowing you to use multiple settings files.
If you'd like to organize your settings files into production and development settings, you can accomplish this in one of three ways:
* Set up two full-blown, independent settings files.
* Set up a base settings file (say, for development) and a second (say, production) settings file that merely imports from the first one and defines whatever overrides it needs to define.
* Use only a single settings file that has Python logic to change the settings based on context.
We'll take these one at a time.
First, the most basic approach is to define two separate settings files. If you're following along, you've already got `settings.py`. Now, just make a copy of it called `settings_production.py`. (We made this name up; you can call it whatever you want.) In this new file, change `DEBUG`, etc.
The second approach is similar but cuts down on redundancy. Instead of having two settings files whose contents are mostly similar, you can treat one as the base file and create another file that imports from it. For example:
~~~
# settings.py

DEBUG = True
TEMPLATE_DEBUG = DEBUG
DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'devdb'
DATABASE_USER = ''
DATABASE_PASSWORD = ''
DATABASE_PORT = ''
# ...

# settings_production.py

from settings import *

DEBUG = TEMPLATE_DEBUG = False
DATABASE_NAME = 'production'
DATABASE_USER = 'app'
DATABASE_PASSWORD = 'letmein'
~~~
Here, `settings_production.py` imports everything from `settings.py` and just redefines the settings that are particular to production. In this case, `DEBUG` is set to `False`, but we've also set different database access parameters for the production setting. (The latter goes to show that you can redefine *any* setting, not just the basic ones like `DEBUG`.)
Finally, the most concise way of accomplishing two settings environments is to use a single settings file that branches based on the environment. One way to do this is to check the current hostname. For example:
~~~
# settings.py
import socket
if socket.gethostname() == 'my-laptop':
    DEBUG = TEMPLATE_DEBUG = True
else:
    DEBUG = TEMPLATE_DEBUG = False
# ...
~~~
Here, we import the `socket` module from Python's standard library and use it to check the current system's hostname. We can check the hostname to determine whether the code is being run on the production server.
A core lesson here is that settings files are *just Python code*. They can import from other files, they can execute arbitrary logic, etc. Just make sure that, if you go down this road, the Python code in your settings files is bulletproof. If it raises any exceptions, Django will likely crash badly.
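Because settings are plain Python, the same branching can be driven by an environment variable instead of the hostname. The following is a sketch under stated assumptions: `DJANGO_DEBUG` is a hypothetical variable name and `env_flag` is a helper written for this example, not part of Django. It is written so that a missing or malformed value never raises an exception.

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


# DJANGO_DEBUG is a hypothetical variable name; leaving it unset keeps
# DEBUG off, which is the safe default for production.
DEBUG = env_flag("DJANGO_DEBUG")
```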
Renaming settings.py
Feel free to rename your `settings.py` to `settings_dev.py` or `settings/dev.py` or `foobar.py`; Django doesn't care, as long as you tell it what settings file you're using.
But if you *do* rename the `settings.py` file that is generated by `django-admin.py startproject`, you'll find that `manage.py` will give you an error message saying that it can't find the settings. That's because it tries to import a module called `settings`. You can fix this either by editing `manage.py` to change `settings` to the name of your module, or by using `django-admin.py` instead of `manage.py`. In the latter case, you'll need to set the `DJANGO_SETTINGS_MODULE` environment variable to the Python path to your settings file (e.g., `'mysite.settings'`).
## Deploying Django to a production server
Headache free deployment
If you are serious about deploying a live website, there is really only one sensible option – find a host that explicitly supports Django. Not only will you get a separate media server out of the box (usually Nginx), but they will also take care of the little things like setting up Apache correctly and setting a cron job that restarts the Python process periodically (to prevent your site hanging up). With the better hosts, you are also likely to get some form of one-click deployment. Save yourself the headache and pay the few bucks a month for a host who knows Django.
## Deploying Django with Apache and mod_wsgi
Deploying Django with [Apache](http://httpd.apache.org/) and [mod_wsgi](http://code.google.com/p/modwsgi/) is a tried and tested way to get Django into production.
mod_wsgi is an Apache module which can host any Python [WSGI](http://www.wsgi.org/) application, including Django. Django will work with any version of Apache which supports mod_wsgi.
The [official mod_wsgi documentation](http://code.google.com/p/modwsgi/) is fantastic; it's your source for all the details about how to use mod_wsgi. You'll probably want to start with the [installation and configuration documentation](http://code.google.com/p/modwsgi/wiki/InstallationInstructions).
### Basic configuration
Once you've got mod_wsgi installed and activated, edit your Apache server's `httpd.conf` file and add the following. If you are using a version of Apache older than 2.4, replace `Require all granted` with `Allow from all` and also add the line `Order deny,allow` above it.
~~~
WSGIScriptAlias / /path/to/mysite.com/mysite/wsgi.py
WSGIPythonPath /path/to/mysite.com
<Directory /path/to/mysite.com/mysite>
    <Files wsgi.py>
        Require all granted
    </Files>
</Directory>
~~~
The first bit in the `WSGIScriptAlias` line is the base URL path you want to serve your application at (`/` indicates the root URL), and the second is the location of a WSGI file (see below) on your system, usually inside your project package (`mysite` in this example). This tells Apache to serve any request below the given URL using the WSGI application defined in that file.
The `WSGIPythonPath` line ensures that your project package is available for import on the Python path; in other words, that `import mysite` works.
The `<Directory>` piece just ensures that Apache can access your `wsgi.py` file.
Next we'll need to ensure this `wsgi.py` with a WSGI application object exists. As of Django version 1.4, `startproject` will have created one for you; otherwise, you'll need to create it. See the WSGI overview for the default contents you should put in this file, and what else you can add to it.
Warning
If multiple Django sites are run in a single mod_wsgi process, all of them will use the settings of whichever one happens to run first. This can be solved by changing:
~~~
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "{{ project_name }}.settings")
~~~
in `wsgi.py`, to:
~~~
os.environ["DJANGO_SETTINGS_MODULE"] = "{{ project_name }}.settings"
~~~
or by using mod_wsgi daemon mode and ensuring that each site runs in its own daemon process.
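The difference between the two lines in the warning comes down to how `setdefault` treats a key that already exists. The sketch below reproduces the effect with the standard library only; `site_a.settings` and `site_b.settings` are made-up module names standing in for two projects sharing one process.

```python
import os

KEY = "DJANGO_SETTINGS_MODULE"

# The first site loaded in the process sets the variable.
os.environ[KEY] = "site_a.settings"

# setdefault() leaves an existing value alone, so the second site's
# wsgi.py would silently inherit the first site's settings.
os.environ.setdefault(KEY, "site_b.settings")
after_setdefault = os.environ[KEY]

# Plain assignment always overwrites the inherited value.
os.environ[KEY] = "site_b.settings"
after_assignment = os.environ[KEY]

print(after_setdefault, after_assignment)
```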
### Using mod_wsgi daemon mode
Daemon mode is the recommended mode for running mod_wsgi (on non-Windows platforms). To create the required daemon process group and delegate the Django instance to run in it, you will need to add appropriate `WSGIDaemonProcess` and `WSGIProcessGroup` directives. A further change required to the above configuration if you use daemon mode is that you can't use `WSGIPythonPath`; instead you should use the `python-path` option to `WSGIDaemonProcess`, for example:
~~~
WSGIDaemonProcess example.com python-path=/path/to/mysite.com:/path/to/venv/lib/python2.7/site-packages
WSGIProcessGroup example.com
~~~
See the official mod_wsgi documentation for [details on setting up daemon mode](http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide#Delegation_To_Daemon_Process).
### Serving files
Django doesn't serve files itself; it leaves that job to whichever Web server you choose.
We recommend using a separate Web server (i.e., one that's not also running Django) for serving media. Here are some good choices:
* [Nginx](http://wiki.nginx.org/Main)
* A stripped-down version of [Apache](http://httpd.apache.org/)
If, however, you have no option but to serve media files on the same Apache `VirtualHost` as Django, you can set up Apache to serve some URLs as static media, and others using the mod_wsgi interface to Django.
This example sets up Django at the site root, but explicitly serves `robots.txt`, `favicon.ico`, and anything in the `/static/` and `/media/` URL space as a static file. All other URLs will be served using mod_wsgi:
~~~
Alias /robots.txt /path/to/mysite.com/static/robots.txt
Alias /favicon.ico /path/to/mysite.com/static/favicon.ico
Alias /media/ /path/to/mysite.com/media/
Alias /static/ /path/to/mysite.com/static/
<Directory /path/to/mysite.com/static>
    Require all granted
</Directory>
<Directory /path/to/mysite.com/media>
    Require all granted
</Directory>
WSGIScriptAlias / /path/to/mysite.com/mysite/wsgi.py
<Directory /path/to/mysite.com/mysite>
    <Files wsgi.py>
        Require all granted
    </Files>
</Directory>
~~~
If you are using a version of Apache older than 2.4, replace `Require all granted` with `Allow from all` and also add the line `Order deny,allow` above it.
### Serving the admin files
When `django.contrib.staticfiles` is in `INSTALLED_APPS`, the Django development server automatically serves the static files of the admin app (and any other installed apps). This is, however, not the case when you use any other server arrangement. You're responsible for setting up Apache, or whichever Web server you're using, to serve the admin files.
The admin files live in (`django/contrib/admin/static/admin`) of the Django distribution.
We strongly recommend using `django.contrib.staticfiles` to handle the admin files (along with a Web server as outlined in the previous section; this means using the `collectstatic` management command to collect the static files in `STATIC_ROOT`, and then configuring your Web server to serve `STATIC_ROOT` at `STATIC_URL`), but here are three other approaches:
1. Create a symbolic link to the admin static files from within your document root (this may require `+FollowSymLinks` in your Apache configuration).
2. Use an `Alias` directive, as demonstrated above, to alias the appropriate URL (probably `STATIC_URL` + `admin/`) to the actual location of the admin files.
3. Copy the admin static files so that they live within your Apache document root.
### If you get a UnicodeEncodeError
If you're taking advantage of the internationalization features of Django and you intend to allow users to upload files, you must ensure that the environment used to start Apache is configured to accept non-ASCII file names. If your environment is not correctly configured, you will trigger `UnicodeEncodeError` exceptions when calling functions like the ones in `os.path` on filenames that contain non-ASCII characters.
To avoid these problems, the environment used to start Apache should contain settings analogous to the following:
~~~
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
~~~
Consult the documentation for your operating system for the appropriate syntax and location to put these configuration items; `/etc/apache2/envvars` is a common location on Unix platforms. Once you have added these statements to your environment, restart Apache.
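To see which encodings the Python process actually picked up, a small diagnostic like the one below can be run in the same environment that starts Apache. It is a sketch that uses only the standard library; `can_encode` is a helper invented for this example, and the reported values depend on the Python version as well as the locale.

```python
import locale
import sys

print("filesystem encoding:", sys.getfilesystemencoding())
print("preferred encoding:", locale.getpreferredencoding(False))


def can_encode(filename):
    """Report whether a file name survives the filesystem encoding."""
    try:
        filename.encode(sys.getfilesystemencoding())
    except UnicodeEncodeError:
        return False
    return True
```

If `can_encode` returns `False` for a name your users are likely to upload, the environment probably still needs the locale settings shown above.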
## Serving static files in production
The basic outline of putting static files into production is simple: run the `collectstatic` command when static files change, then arrange for the collected static files directory (`STATIC_ROOT`) to be moved to the static file server and served. Depending on `STATICFILES_STORAGE`, files may need to be moved to a new location manually or the `post_process` method of the `Storage` class might take care of that.
Of course, as with all deployment tasks, the devil's in the details. Every production setup will be a bit different, so you'll need to adapt the basic outline to fit your needs. Below are a few common patterns that might help.
### Serving the site and your static files from the same server
If you want to serve your static files from the same server that's already serving your site, the process may look something like:
* Push your code up to the deployment server.
* On the server, run `collectstatic` to copy all the static files into `STATIC_ROOT`.
* Configure your web server to serve the files in `STATIC_ROOT` under the URL `STATIC_URL`. For example, the "Serving files" section above shows how to do this with Apache and mod_wsgi.
You'll probably want to automate this process, especially if you've got multiple web servers. There are any number of ways to do this automation, but one option that many Django developers enjoy is [Fabric](http://fabfile.org/).
Below, and in the following sections, we'll show off a few example fabfiles (i.e. Fabric scripts) that automate these file deployment options. The syntax of a fabfile is fairly straightforward but won't be covered here; consult [Fabric's documentation](http://docs.fabfile.org/) for a complete explanation of the syntax.
So, a fabfile to deploy static files to a couple of web servers might look something like:
~~~
from fabric.api import *
# Hosts to deploy onto
env.hosts = ['www1.example.com', 'www2.example.com']
# Where your project code lives on the server
env.project_root = '/home/www/myproject'
def deploy_static():
    with cd(env.project_root):
        run('./manage.py collectstatic -v0 --noinput')
~~~
### Serving static files from a dedicated server
Most larger Django sites use a separate Web server (i.e., one that's not also running Django) for serving static files. This server often runs a different type of web server, one that is faster but less full-featured. Some common choices are:
* [Nginx](http://wiki.nginx.org/Main)
* A stripped-down version of [Apache](http://httpd.apache.org/)
Configuring these servers is out of scope of this document; check each server's respective documentation for instructions.
Since your static file server won't be running Django, you'll need to modify the deployment strategy to look something like:
* When your static files change, run `collectstatic` locally.
* Push your local `STATIC_ROOT` up to the static file server into the directory that's being served. [rsync](https://rsync.samba.org/) is a common choice for this step since it only needs to transfer the bits of static files that have changed.
Here's how this might look in a fabfile:
~~~
from fabric.api import *
from fabric.contrib import project
# Where the static files get collected locally. Your STATIC_ROOT setting.
env.local_static_root = '/tmp/static'
# Where the static files should go remotely
env.remote_static_root = '/home/www/static.example.com'
@roles('static')
def deploy_static():
    local('./manage.py collectstatic')
    project.rsync_project(
        remote_dir=env.remote_static_root,
        local_dir=env.local_static_root,
        delete=True,
    )
~~~
### Serving static files from a cloud service or CDN
Another common tactic is to serve static files from a cloud storage provider like Amazon's S3 and/or a CDN (content delivery network). This lets you ignore the problems of serving static files and can often make for faster-loading webpages (especially when using a CDN).
When using these services, the basic workflow would look a bit like the above, except that instead of using `rsync` to transfer your static files to the server you'd need to transfer the static files to the storage provider or CDN.
There are any number of ways you might do this, but if the provider has an API, a custom file storage backend will make the process much simpler. If you've written or are using a 3rd party custom storage backend, you can tell `collectstatic` to use it by setting `STATICFILES_STORAGE` to the storage engine.
For example, if you've written an S3 storage backend in `myproject.storage.S3Storage` you could use it with:
~~~
STATICFILES_STORAGE = 'myproject.storage.S3Storage'
~~~
Once that's done, all you have to do is run `collectstatic` and your static files would be pushed through your storage package up to S3. If you later needed to switch to a different storage provider, it could be as simple as changing your `STATICFILES_STORAGE` setting.
There are 3rd party apps available that provide storage backends for many common file storage APIs. A good starting point is the [overview at djangopackages.com](https://www.djangopackages.com/grids/g/storage-backends/).
## Scaling
Now that you know how to get Django running on a single server, let's look at how you can scale out a Django installation. This section walks through how a site might scale from a single server to a large-scale cluster that could serve millions of hits an hour.
It's important to note, however, that nearly every large site is large in different ways, so scaling is anything but a one-size-fits-all operation. The following coverage should suffice to show the general principle, and whenever possible we'll try to point out where different choices could be made.
First off, we'll make a pretty big assumption and exclusively talk about scaling under Apache and mod_wsgi. Though we know of a number of successful medium- to large-scale FastCGI deployments, we're much more familiar with Apache.
### Running on a Single Server
Most sites start out running on a single server, with an architecture that looks something like Figure 13-1.
[![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2015-11-10_5641b71466ee7.png)](http://masteringdjango.com/wp-content/uploads/2015/09/scaling-1.png)
Figure 13-1: a single server Django setup.
However, as traffic increases you'll quickly run into *resource contention* between the different pieces of software. Database servers and Web servers *love* to have the entire server to themselves, so when run on the same server they often end up fighting over the same resources (RAM, CPU) that they'd prefer to monopolize.
This is solved easily by moving the database server to a second machine, as explained in the following section.
### Separating Out the Database Server
As far as Django is concerned, the process of separating out the database server is extremely easy: you'll simply need to change the `DATABASE_HOST` setting (the `HOST` key of `DATABASES` in current versions of Django) to the IP or DNS name of your database server. It's probably a good idea to use the IP if at all possible, as relying on DNS for the connection between your Web server and database server isn't recommended.
With a separate database server, our architecture now looks like Figure 13-2.
[![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2015-11-10_5641b71475a5c.png)](http://masteringdjango.com/wp-content/uploads/2015/09/scaling-2.png)
Figure 13-2: Moving the database onto a dedicated server.
Here we're starting to move into what's usually called *n-tier* architecture. Don't be scared by the buzzword; it just refers to the fact that different tiers of the Web stack get separated out onto different physical machines.
At this point, if you anticipate ever needing to grow beyond a single database server, it's probably a good idea to start thinking about connection pooling and/or database replication. Unfortunately, there's not nearly enough space to do those topics justice in this book, so you'll need to consult your database's documentation and/or community for more information.
### Running a Separate Media Server
We still have a big problem left over from the single-server setup: the serving of media from the same box that handles dynamic content.
Those two activities perform best under different circumstances, and by smashing them together on the same box you end up with neither performing particularly well. So the next step is to separate out the media, that is, anything *not* generated by a Django view, onto a dedicated server (see Figure 13-3).
[![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2015-11-10_5641b71481a4d.png)](http://masteringdjango.com/wp-content/uploads/2015/09/scaling-3.png)
Figure 13-3: Separating out the media server.
Ideally, this media server should run a stripped-down Web server optimized for static media delivery. [Nginx](http://wiki.nginx.org/Main) is the preferred option here, although lighttpd is another option, or a heavily stripped down Apache could work too.
For sites heavy in static content (photos, videos, etc.), moving to a separate media server is doubly important and should likely be the *first* step in scaling up.
This step can be slightly tricky, however. If your application involves file uploads, Django needs to be able to write uploaded media to the media server. If media lives on another server, you'll need to arrange a way for that write to happen across the network.
### Implementing Load Balancing and Redundancy
At this point, we've broken things down as much as possible. This three-server setup should handle a very large amount of traffic (we served around 10 million hits a day from an architecture of this sort), so if you grow further, you'll need to start adding redundancy.
This is a good thing, actually. One glance at Figure 13-3 shows you that if even a single one of your three servers fails, you'll bring down your entire site. So as you add redundant servers, not only do you increase capacity, but you also increase reliability.
For the sake of this example, let's assume that the Web server hits capacity first. It's relatively easy to get multiple copies of a Django site running on different hardware: just copy all the code onto multiple machines, and start Apache on each of them.
However, you'll need another piece of software to distribute traffic over your multiple servers: a *load balancer*. You can buy expensive and proprietary hardware load balancers, but there are a few high-quality open source software load balancers out there.
Apache's `mod_proxy` is one option, but we've found Perlbal ([http://www.djangoproject.com/r/perlbal/](http://www.djangoproject.com/r/perlbal/)) to be fantastic. It's a load balancer and reverse proxy written by the same folks who wrote `memcached` (see Chapter 17).
With the Web servers now clustered, our evolving architecture starts to look more complex, as shown in Figure 13-4.
[![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2015-11-10_5641b71496caa.png)](http://masteringdjango.com/wp-content/uploads/2015/09/scaling-4.png)
Figure 13-4: A load-balanced, redundant server setup.
Notice that in the diagram the Web servers are referred to as a cluster to indicate that the number of servers is basically variable. Once you have a load balancer out front, you can easily add and remove back-end Web servers without a second of downtime.
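To make the idea of distributing traffic concrete, here is a purely conceptual sketch of round-robin selection, the simplest balancing strategy. It is not how Perlbal or `mod_proxy` are implemented or configured; the back-end addresses and the `round_robin` helper are invented for illustration.

```python
import itertools

# Placeholder back-end addresses.
BACKENDS = ["10.0.0.11:8000", "10.0.0.12:8000", "10.0.0.13:8000"]


def round_robin(backends):
    """Return an iterator that repeats the back-ends in order, forever."""
    return itertools.cycle(backends)


picker = round_robin(BACKENDS)

# Each incoming request is handed to the next back-end in the cycle.
first_five = [next(picker) for _ in range(5)]
print(first_five)
```

A real load balancer also tracks which back-ends are healthy, which is what lets you add or remove servers without downtime.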
### Going Big
At this point, the next few steps are pretty much derivatives of the last one:
* As you need more database performance, you might want to add replicated database servers. MySQL includes built-in replication; PostgreSQL users should look into Slony ([http://www.djangoproject.com/r/slony/](http://www.djangoproject.com/r/slony/)) and pgpool ([http://www.djangoproject.com/r/pgpool/](http://www.djangoproject.com/r/pgpool/)) for replication and connection pooling, respectively.
* If the single load balancer isn't enough, you can add more load balancer machines out front and distribute among them using round-robin DNS.
* If a single media server doesn't suffice, you can add more media servers and distribute the load with your load-balancing cluster.
* If you need more cache storage, you can add dedicated cache servers.
* At any stage, if a cluster isn't performing well, you can add more servers to the cluster.
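On the Django side, replicated database servers are typically put to use with a database router. The following is a minimal sketch, assuming a `DATABASES` setting that defines the aliases `default` (the primary), `replica1`, and `replica2`; those alias names and the class name are our own assumptions. The four method names are the hooks Django's router API looks for.

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary (illustrative sketch)."""

    # Hypothetical aliases; they must match keys in your DATABASES setting.
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread read queries across the replicas at random.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # All writes go to the primary so the replicas stay consistent.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Run migrations on the primary only; replication carries them over.
        return db == "default"
```

A router like this is activated by listing its dotted path in the `DATABASE_ROUTERS` setting. Bear in mind that replicas may lag slightly behind the primary, so a read that immediately follows a write can return stale data.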
After a few of these iterations, a large-scale architecture might look like Figure 13-5.
[![](image/5641ace31ae94.png)](http://masteringdjango.com/wp-content/uploads/2015/09/scaling-5.png)
Figure 13-5: An example large-scale Django setup.
Though we've shown only two or three servers at each level, there's no fundamental limit to how many you can add.
## Performance Tuning
If you have a huge amount of money, you can just keep throwing hardware at scaling problems. For the rest of us, though, performance tuning is a must.
Note
Incidentally, if anyone with monstrous gobs of cash is actually reading this book, please consider a substantial donation to the Django Software Foundation. We accept uncut diamonds and gold ingots, too.
Unfortunately, performance tuning is much more of an art than a science, and it is even more difficult to write about than scaling. If you're serious about deploying a large-scale Django application, you should spend a great deal of time learning how to tune each piece of your stack.
The following sections, though, present a few Django-specific tuning tips we've discovered over the years.
### There's No Such Thing As Too Much RAM
Even the really expensive RAM is relatively affordable these days. Buy as much RAM as you can possibly afford, and then buy a little bit more.
Faster processors won't improve performance all that much; most Web servers spend up to 90% of their time waiting on disk I/O. As soon as you start swapping, performance will just die. Faster disks might help slightly, but they're so much more expensive than RAM that it doesn't really matter.
If you have multiple servers, the first place to put your RAM is in the database server. If you can afford it, get enough RAM to fit your entire database into memory. This shouldn't be too hard; we've developed a site with more than half a million newspaper articles, and it took under 2GB of space.
Next, max out the RAM on your Web server. The ideal situation is one where neither server ever swaps. If you get to that point, you should be able to withstand most normal traffic.
### Turn Off Keep-Alive
`Keep-Alive` is a feature of HTTP that allows multiple HTTP requests to be served over a single TCP connection, avoiding the TCP setup/teardown overhead.
This looks good at first glance, but it can kill the performance of a Django site. If you're properly serving media from a separate server, each user browsing your site will only request a page from your Django server every ten seconds or so. This leaves HTTP servers waiting around for the next keep-alive request, and an idle HTTP server just consumes RAM that an active one should be using.
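A rough back-of-the-envelope calculation shows the scale of the effect. The numbers below are illustrative assumptions, not measurements: 1,000 simultaneous visitors, one page request per visitor every ten seconds (as above), 100 ms to serve a page, a five-second keep-alive timeout, and a server model in which each open connection ties up one server process.

```python
def workers_needed(users, busy_ms, hold_ms, interval_ms):
    """Average number of server processes tied up at once, rounded up.

    Each user occupies a process for busy_ms (serving the page) plus
    hold_ms (idle keep-alive wait) out of every interval_ms.
    """
    occupied_ms = users * (busy_ms + hold_ms)
    return -(-occupied_ms // interval_ms)  # integer ceiling division


# All values below are assumed for illustration only.
USERS = 1000          # simultaneous visitors
BUSY_MS = 100         # time spent actually serving one page
INTERVAL_MS = 10_000  # one page request per user every ten seconds
KEEPALIVE_MS = 5_000  # idle wait before the connection is closed

with_keepalive = workers_needed(USERS, BUSY_MS, KEEPALIVE_MS, INTERVAL_MS)
without_keepalive = workers_needed(USERS, BUSY_MS, 0, INTERVAL_MS)
print(with_keepalive, without_keepalive)  # 510 10
```

Under these assumptions, keep-alive multiplies the number of resident server processes by about fifty, and almost all of them are idle. In Apache, the relevant directives are `KeepAlive` and `KeepAliveTimeout`.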
### Use memcached
Although Django supports a number of different cache back-ends, none of them even come *close* to being as fast as memcached. If you have a high-traffic site, don't even bother with the other back-ends; go straight to memcached.
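Pointing Django at memcached takes only a few lines in your settings module. The fragment below is a sketch that assumes a memcached daemon listening on the default port on the local machine. The backend class shown ships with recent Django releases; older releases, including those contemporary with this book, used `django.core.cache.backends.memcached.MemcachedCache` instead, so check the documentation for your version.

```python
# settings.py fragment (assumed: memcached running locally on port 11211)
CACHES = {
    "default": {
        # Backend class name depends on your Django version; see the note above.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```

With dedicated cache servers, `LOCATION` can instead be a list of `host:port` strings, and Django will treat them as one shared cache.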
### Use memcached Often
Of course, selecting memcached does you no good if you don't actually use it. Chapter 17 is your best friend here: learn how to use Django's cache framework, and use it everywhere possible. Aggressive, preemptive caching is usually the only thing that will keep a site up under major traffic.
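The pattern behind most caching code is "look in the cache first, compute only on a miss". Here is a minimal sketch of it. The helper `get_or_compute` and the in-memory `DictCache` are hypothetical stand-ins so that the example runs without a Django project; the `get(key, default)` and `set(key, value, timeout)` calls mirror the shape of Django's low-level cache API covered in Chapter 17.

```python
_MISSING = object()  # sentinel, so a cached None still counts as a hit


class DictCache:
    """In-memory stand-in for a cache back-end (ignores the timeout)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, timeout=None):
        self._data[key] = value


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key, _MISSING)
    if value is _MISSING:
        value = compute()  # the expensive part: queries, rendering, etc.
        cache.set(key, value, timeout)
    return value


def render_front_page():
    # Placeholder for an expensive database query or template render.
    return "<html>front page</html>"


cache = DictCache()
first = get_or_compute(cache, "front-page", render_front_page)   # computed
second = get_or_compute(cache, "front-page", render_front_page)  # from cache
```

In a real project you would pass Django's own cache object (`from django.core.cache import cache`) rather than the stand-in, or reach for the per-view and template-fragment caching tools that build on the same idea.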
### Join the Conversation
Each piece of the Django stack, from Linux to Apache to PostgreSQL or MySQL, has an awesome community behind it. If you really want to get that last 1% out of your servers, join the open source communities behind your software and ask for help. Most free-software community members will be happy to help.
Also be sure to join the Django community. Your humble authors are only two members of an incredibly active, growing group of Django developers. Our community has a huge amount of collective experience to offer.
## What's Next?
The remaining chapters focus on other Django features that you may or may not need, depending on your application. Feel free to read them in any order you choose.