
A Tutorial for Deploying a Django Application that Uses Numpy and Scipy to Google Compute Engine Using Apache2 and modwsgi

by Sean Patrick Murphy

Introduction

This longer-than-initially planned article walks one through the process of deploying a non-standard Django application on a virtual instance provisioned not from Amazon Web Services but from Google Compute Engine. This means we will be creating our own virtual machine in the cloud and installing all necessary software to have it serve content, run the Django application, and handle the database all in one. Clearly, I do not expect an overwhelming amount of traffic to this site. Also, note that Google Compute Engine is very different from Google App Engine.

What makes this app “non-standard” is its use of both the Numpy and Scipy packages to perform fast computations. Numpy and Scipy are based on C and Fortran respectively and both have complicated compilation dependencies. Binaries may be available in some cases but are not always available for your preferred deployment environment. Most importantly, these two libraries prevented me from deploying my app to either Google App Engine (GAE) or to Heroku. I’m not saying that it is impossible to deploy Numpy- or Scipy-dependent apps on either service. However, neither service supports apps dependent on both Scipy and Numpy out-of-the-box although a limited amount of Googling suggests it should be possible.

In fact, GAE could have been an ideal solution if I had re-architected the app, separating the Django application from the computational code. I could run the Django application on GAE and allow it to spin up a GCE instance as needed to perform the computations. One concern with this idea is the latency involved in spinning up the virtual instance for computation. Google Compute Engine instances spring to life quickly but not instantaneously. Maybe I’ll go down this path for version 2.0 if there is a need.

Just in case you are wondering, the Django app in question is here https://github.com/murphsp1/ppi-css.com and the live site is here www.ppi-css.com.

If you have any questions or comments or suggestions, please leave them in the comments section below.

Google Compute Engine (GCE)

I am a giant fan of Google Compute Engine and love the fact that Amazon’s EC2 finally has a strong competitor. With that said, GCE definitely does not have the same number of tutorials or help content available online.

I will assume that you can provision your own instance in GCE either using gcutil at the command line or through the cloud services web interface provided by Google.

Once you have your project up and running, you will need to configure the firewall settings for your project. You can do this at the command line of your local machine using the command line below:

gcutil addfirewall http2 --description="Incoming http allowed." --allowed="tcp:http" --project="XXXXXXXXXXXXXX"

Update the Instance and Install Tools

Next, boot the instance and ssh into it from your local machine. The command line parameters required to ssh in can be daunting but fortunately Google gives you a simple way to copy and paste the command from the web-based cloud console. The command line could look something like this:

gcutil --service_version="v1beta16" --project="XXXXXXXXXX" ssh  --zone="GEOGRAPHIC_ZONE" "INSTANCE_NAME"

Next, we need to update the default packages installed on the GCE instance:

sudo apt-get update
sudo apt-get upgrade

and install some needed development tools:

sudo apt-get --yes install make
sudo apt-get --yes install wget
sudo apt-get --yes install git

and install some basic Python-related tools:

sudo apt-get --yes install python-setuptools
sudo easy_install pip
sudo pip install virtualenv

Note that in many of my sudo apt-get commands I include --yes. This flag just prevents me from having to type “Y” to agree to the file download.

Install Numpy and Scipy (SciPy requires Fortran compiler)

To install SciPy, Python’s general purpose scientific computing library from which my app needs a single function, we need the Fortran compiler:

sudo apt-get --yes install gfortran

and then we need Numpy and Scipy and everything else:

sudo apt-get --yes install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

Finally, we need to add ProDy, a protein dynamics and sequence analysis package for Python.

sudo pip install prody
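Before moving on, it can be worth confirming that the interpreter can actually find the packages that were just installed. Below is a minimal sketch, assuming a Python 3 interpreter (the instance in this article ran Python 2.7, where importlib.util.find_spec is not available). The helper name missing_packages is my own and not part of any library. Note that it only locates the packages; it does not import them, so it will not catch a broken compiled extension.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that the interpreter cannot locate."""
    # find_spec returns None when a top-level module is not on the import path.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_packages(["numpy", "scipy", "prody"]) on the instance should return an empty list if everything above installed cleanly.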

Install and Configure the Database (MySQL)

The Django application needs a database and there are many to choose from, most likely either Postgres or MySQL. Here, I went with MySQL for the simple reason that it took fewer steps to get the MySQL server up and running on the GCE instance than the Postgres server did. I actually run Postgres on my development machine.

sudo apt-get --yes install mysql-server

The installation process should prompt you to create a root password. Please do so for security purposes.

Next, we are going to execute a script to secure the MySQL installation:

mysql_secure_installation

You already have a root password from the installation process but otherwise answer “Y” to every question.

With the DB installed, we now need to create our database for Django (mine is creatively called django_test). Please note that there must not be a space between “--password=” and your password on the command line.

mysql --user=root --password=INSERT_PASSWORD
mysql> create database django_test;
mysql> quit;

Finally for this step we need the MySQL database connector for Python which will be used by our Django app:

sudo apt-get install python-mysqldb

Install the Web Server (Apache2)

You have two main choices for your web server, either the tried and true Apache (now up to version 2+) or nginx. Nginx is supposed to be the new sexy when it comes to web servers but this newness comes at the price of fewer tutorials and less documentation online. Thus, let’s play it safe and go with Apache2.

First Attempt

First things first, we need to install apache2 and mod_wsgi. Mod_wsgi is an Apache HTTP server module that provides a WSGI compliant interface for web applications developed in Python.

sudo apt-get --yes install apache2 libapache2-mod-wsgi

This seems to be causing a good number of problems. In my Django error logs I see:

[Mon Nov 25 13:25:27 2013] [error] [client 108.21.2.20] Premature end of script headers: wsgi.py

and in:

cat /var/log/apache2/error.log

I see things like:

[Sun Nov 24 16:15:02 2013] [warn] mod_wsgi: Compiled for Python/2.7.2+.
[Sun Nov 24 16:15:02 2013] [warn] mod_wsgi: Runtime using Python/2.7.3.

with the occasional segfault:

[Mon Nov 25 00:02:55 2013] [notice] child pid 12532 exit signal Segmentation fault (11)
[Mon Nov 25 00:02:55 2013] [notice] seg fault or similar nasty error detected in the parent process

which is a strong indicator that something isn’t quite working correctly.
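The two mod_wsgi warnings above are easy to miss in a long log. Here is a small sketch that pulls the “compiled for” and “runtime” Python versions out of log text so they can be compared. The function name mod_wsgi_versions is hypothetical, and the regular expression is written only against the two line formats shown above.

```python
import re

# Matches the two mod_wsgi warning lines quoted above.
_VERSION_RE = re.compile(r"mod_wsgi: (Compiled for|Runtime using) Python/([0-9.]+)")


def mod_wsgi_versions(log_text):
    """Return (compiled, runtime) Python versions; None for any not found."""
    found = {"Compiled for": None, "Runtime using": None}
    for kind, version in _VERSION_RE.findall(log_text):
        # Drop the sentence-ending period that follows the version number.
        found[kind] = version.rstrip(".")
    return found["Compiled for"], found["Runtime using"]
```

If the two values differ, as they do in my log, the prebuilt module was compiled against a different Python than the one it is running under.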

Second Attempt

A little bit of Googling suggests that this could be the result of a number of issues with a prebuilt mod_wsgi. The solution seems to be to grab the source code and compile it on my GCE instance. To do that, I first installed the Apache development package:

sudo apt-get install apache2-prefork-dev

Now, we need to grab mod_wsgi while ssh’ed into the GCE instance:

wget https://modwsgi.googlecode.com/files/mod_wsgi-3.4.tar.gz
tar -zxvf mod_wsgi-3.4.tar.gz
cd mod_wsgi-3.4
./configure
make
sudo make install

Once mod_wsgi is installed, the apache server needs to be told about it. On Apache 2, this is done by adding the load declaration and any configuration directives to the /etc/apache2/mods-available/ directory.

The load declaration for the module needs to go in a file named wsgi.load (in the /etc/apache2/mods-available/ directory), which contains only this:

LoadModule wsgi_module /usr/lib/apache2/modules/mod_wsgi.so

Then you have to activate the wsgi module with:

sudo a2enmod wsgi

Note: a2enmod stands for “apache2 enable mod”. This executable creates the symlinks for you. In fact, a2enmod wsgi is equivalent to:

cd /etc/apache2/mods-enabled
ln -s ../mods-available/wsgi.load
ln -s ../mods-available/wsgi.conf # if it exists

Now we need to update the virtual hosts settings on the server. For Debian, this is here:

/etc/apache2/sites-enabled/000-default 

Restart the service:

sudo service apache2 restart 

and also change the owner of the directory on the GCE instance that will contain the files to be served by apache:

sudo chown -R www-data:www-data /var/www

Now that we have gone through all of that, it is nice to see things working. By default, the following page is served by the install:

/usr/share/apache2/default-site/index.html

If you go to the URL of the server (obtainable from the Google Cloud console), you should see a very simple example html page.
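If the page does not load, it helps to know whether port 80 is reachable at all, since both the firewall rule and a running Apache are needed for that. The following is a rough sketch using only the standard library; port_is_open is a name I made up for illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from your local machine, port_is_open("INSTANCE_IP", 80) should return True once the firewall rule is in place and Apache is up, where INSTANCE_IP is a placeholder for the external IP address shown in the cloud console.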

Setup the Overall Django Directory Structure on the Remote Server

I have seen many conflicting recommendations in online tutorials about how to best lay out the directory structure of a Django application in development. It would appear that after you have built your first dozen or so Django projects, you start formulating your own opinions and create a standard project structure for yourself.

Obviously, this experiential knowledge is not available to someone building and deploying one of their first sites. And your directory structure directly impacts your app’s URL routing and the daunting-at-first settings.py file. If you move around a few directories, things tend to stop working and the resulting error messages aren’t necessarily the most helpful.

The picture gets even murkier when you go from development to production and I have found much less discussion on best practices here. Luckily, I could ping my friend Ben Bengfort and tap into his devops knowledge. The directory structure on the remote server, as recommended by Mr. Bengfort, looks like this:

/var/www/ppi-css.com
/var/www/ppi-css.com/htdocs/static
/var/www/ppi-css.com/htdocs/media
/var/www/ppi-css.com/django
/var/www/ppi-css.com/logs

Apache will see the htdocs directory as the main directory from which to serve files.

/static will contain the collected set of static files (images, css, javascript, and more) and /media will contain uploaded documents.

/logs will contain relevant apache log files.

/django will contain the cloned copy of the Django project from GitHub.

The following shell commands get things setup correctly:

sudo mkdir /var/www/ppi-css.com
sudo mkdir /var/www/ppi-css.com/htdocs
sudo mkdir /var/www/ppi-css.com/htdocs/static
sudo mkdir /var/www/ppi-css.com/htdocs/media
sudo mkdir /var/www/ppi-css.com/django
sudo mkdir /var/www/ppi-css.com/logs
cd /var/www/ppi-css.com/django
sudo git clone https://github.com/murphsp1/ppi-css.com.git
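The same directory layout can also be created from Python, which is handy if the setup ever needs to be repeated on a fresh instance. This is only a sketch: make_site_tree and SITE_SUBDIRS are my own names, and writing under /var/www would still require running the script with root privileges.

```python
import os

# Subdirectories named in the layout above, relative to the site root.
SITE_SUBDIRS = ("htdocs/static", "htdocs/media", "django", "logs")


def make_site_tree(site_root):
    """Create the site root and its subdirectories; existing ones are left alone."""
    created = []
    for sub in SITE_SUBDIRS:
        path = os.path.join(site_root, *sub.split("/"))
        # exist_ok=True makes the call safe to repeat.
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

For this site the call would be make_site_tree("/var/www/ppi-css.com").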

Configuring Apache for Our Django Project

With the directory structure of our Django application sorted, let’s continue configuring apache.

First, let’s disable the default virtual host for apache:

sudo a2dissite default

There will be aliases in the virtual host configuration file that let the apache server know about this structure. Fortunately, I have included the ppi-css.conf file in the repository and it must be moved into position:

sudo cp /var/www/ppi-css.com/django/ppi-css.com/ppi-css.conf /etc/apache2/sites-available/ppi-css.com

Next, we must enable the site using the following command:

sudo a2ensite ppi-css.com

and we must reload the apache2 service (remember this command as you will probably be using it a lot):

sudo service apache2 reload

Now, when I restarted or reloaded the apache2 service, I got the following error message:

ERROR MESSAGES:
[....] Restarting web server: apache2apache2: apr_sockaddr_info_get() failed for (none)
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
 ... waiting apache2: apr_sockaddr_info_get() failed for (none)
apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.0.1 for ServerName
. ok 
Setting up ssl-cert (1.0.32) ...
hostname: Name or service not known

To remove this, I simply added the following line:

ServerName localhost

to the /etc/apache2/apache2.conf file using vi. A quick

sudo service apache2 reload

shows that the error message has been banished.

Install a Few More Python Packages

The Django application contains a few more dependencies that were captured in the requirements file included in the repository. Note that since the installation of Numpy and Scipy has already been taken care of, those lines in the requirements.txt file can be removed.

sudo pip install -r /var/www/ppi-css.com/django/ppi-css.com/requirements.txt

Database Migrations

Before we can perform the needed database migrations, we need to update the database section of settings.py. It should look like below:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql', 
        'NAME': 'django_test',   # The database created in the MySQL step.
        'USER': 'root',    
        'PASSWORD': 'INSERT_YOUR_PASSWORD',
        'HOST': '',         # Set to empty string for localhost. 
        'PORT': '',         # Set to empty string for default.
    }
}
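Since settings.py lives in a public repository, one option is to keep the password out of the file and read it from the environment instead. This is a sketch of that idea, not what the project actually does. DJANGO_DB_PASSWORD is a hypothetical variable name, and under Apache the variable would have to be present in the server process’s environment, which is a separate step.

```python
import os


def build_databases(environ=os.environ):
    """Build a Django DATABASES dict, reading the password from the environment."""
    return {
        "default": {
            "ENGINE": "django.db.backends.mysql",
            "NAME": "django_test",  # database created earlier in this tutorial
            "USER": "root",
            # DJANGO_DB_PASSWORD is a hypothetical name, not from the original project.
            "PASSWORD": environ.get("DJANGO_DB_PASSWORD", ""),
            "HOST": "",  # empty string means localhost
            "PORT": "",  # empty string means the default port
        }
    }
```

In settings.py this would be used as DATABASES = build_databases().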

From the GCE instance, issue the following commands:

python /var/www/ppi-css.com/django/ppi-css.com/manage.py syncdb
python /var/www/ppi-css.com/django/ppi-css.com/manage.py migrate

Deploying Your Static Files

Static files (your css, javascript, images, and other unchanging files) can be problematic for new Django developers. During development, Django is more than happy to serve your static files for you through its local development server. However, this does not work for production settings.

The key to this is your settings.py file. In this file, we see:

# Absolute path to the directory static files should be collected to.
# Don't put anything in this directory yourself; store your static files
# in apps' "static/" subdirectories and in STATICFILES_DIRS.
# Example: "/home/media/media.lawrence.com/static/"
STATIC_ROOT = ''      #os.path.join(PROJECT_ROOT, 'static')

# URL prefix for static files.
# Example: "http://media.lawrence.com/static/"
STATIC_URL = '/static/'

# Additional locations of static files
STATICFILES_DIRS = (
    #os.path.join(PROJECT_ROOT,'static'),
    # Put strings here, like "/home/html/static" or "C:/www/django/static".
    # Always use forward slashes, even on Windows.
    # Don't forget to use absolute paths, not relative paths.
)      

For production, STATIC_ROOT must contain the directory where Apache2 will serve static content from. In this case, it should look like this:

STATIC_ROOT = '/var/www/ppi-css.com/htdocs/static'

For development, STATIC_ROOT looked like:

STATIC_ROOT = ''

Next, Django comes with a handy mechanism to round up all of your static files (in the case that they are spread out in separate app directories if you have a number of apps in a single project) and push them to a single parent directory when you go into production.

./manage.py collectstatic

Be very careful when going into production. If any of the directories listed in the STATICFILES_DIRS variable do not exist on your production server, collectstatic will fail and will not do so gracefully. The official Django documentation has a pretty good description of the entire process.
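To avoid that failure, the directories can be checked before collectstatic is run. Here is a minimal sketch; missing_static_dirs is a helper name I invented, and it accepts either plain paths or the (prefix, path) pairs that Django also allows in STATICFILES_DIRS.

```python
import os


def missing_static_dirs(staticfiles_dirs):
    """Return the entries of STATICFILES_DIRS that are not existing directories."""
    missing = []
    for entry in staticfiles_dirs:
        # Django also allows (prefix, path) pairs; take the path in that case.
        path = entry[1] if isinstance(entry, (tuple, list)) else entry
        if not os.path.isdir(path):
            missing.append(path)
    return missing
```

An empty list means every listed directory exists on the server.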

More Settings.py Updates

We aren’t quite done with the settings.py file and need to update the MEDIA_ROOT variable with the appropriate directory on the server:

MEDIA_ROOT = "/var/www/ppi-css.com/htdocs/media/" #os.path.join(PROJECT_ROOT, 'media')

Next, the ALLOWED_HOSTS variable must be set as shown below when the Django application is run in production mode and not in debug mode:

ALLOWED_HOSTS = [
        '.ppi-css.com', 
        'ppi-css.com',
]
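The leading dot in the first entry acts as a subdomain wildcard. The sketch below is a simplified re-implementation of that matching rule for illustration only. It is not Django’s actual code, and it skips details such as stripping the port from the Host header.

```python
def host_matches(host, pattern):
    """Simplified version of the rule Django applies to one ALLOWED_HOSTS entry."""
    host = host.lower()
    pattern = pattern.lower()
    if pattern == "*":
        return True
    if pattern.startswith("."):
        # ".example.com" matches example.com itself and any subdomain.
        return host == pattern[1:] or host.endswith(pattern)
    return host == pattern


def host_allowed(host, allowed_hosts):
    """Return True if any pattern in allowed_hosts matches the host."""
    return any(host_matches(host, pattern) for pattern in allowed_hosts)
```

Under this rule, '.ppi-css.com' already covers both www.ppi-css.com and the bare ppi-css.com, so listing the second entry is harmless but probably not required.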

And finally, check to make sure that the paths listed in the wsgi.py reflect the actual paths on the GCE instance.
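For reference, the path handling in a wsgi.py file usually boils down to two things: putting the project directory on sys.path and naming the settings module. Here is a sketch of that logic. The helper prepare_environment and the settings module name mysite.settings are placeholders of mine, not names taken from the repository.

```python
import os
import sys


def prepare_environment(project_dir, settings_module):
    """Put the project on sys.path and name the settings module for Django."""
    if project_dir not in sys.path:
        sys.path.insert(0, project_dir)
    # setdefault leaves an already-set value untouched.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)


# In wsgi.py this would be followed by Django's own loader, for example:
#   prepare_environment("/var/www/ppi-css.com/django/ppi-css.com", "mysite.settings")
#   from django.core.wsgi import get_wsgi_application
#   application = get_wsgi_application()
# "mysite.settings" is a placeholder; use the project's real settings module.
```

If the directory passed in does not match where the project was cloned on the instance, the import of the settings module fails and Apache reports an error for wsgi.py.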

A Very Nasty Bug

After having gone through all of that work, I found a strange bug where the website would work fine but then become unresponsive. After extensive Googling, I found the cause, best explained below:

Some third party packages for Python which use C extension modules, and this includes scipy and numpy, will only work in the Python main interpreter and cannot be used in sub interpreters as mod_wsgi by default uses. The result can be thread deadlock, incorrect behaviour or process crashes. This is detailed here.

The workaround is to force the WSGI application to run in the main interpreter of the process using:

WSGIApplicationGroup %{GLOBAL}

If running multiple WSGI applications on same server, you would want to start investigating using daemon mode because some frameworks don’t allow multiple instances to run in same interpreter. This is the case with Django. Thus use daemon mode so each is in its own process and force each to run in main interpreter of their respective daemon mode process groups.

The ppi-css.conf file with the required changes is now part of the repository.
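To make the pieces concrete, here is a sketch that renders what a virtual host file along these lines might contain, combining the directory layout from earlier with daemon mode and the WSGIApplicationGroup fix. It is an illustration under my own assumptions and not a copy of the ppi-css.conf in the repository. The process group name, the log file names, and the wsgi.py location are placeholders, and the access-control <Directory> blocks a real file needs are left out.

```python
# Doubled braces in the template produce a literal %{GLOBAL} after formatting.
VHOST_TEMPLATE = """\
<VirtualHost *:80>
    ServerName {server_name}
    DocumentRoot {site_root}/htdocs

    Alias /static/ {site_root}/htdocs/static/
    Alias /media/ {site_root}/htdocs/media/

    WSGIDaemonProcess {group} python-path={project_dir}
    WSGIProcessGroup {group}
    WSGIApplicationGroup %{{GLOBAL}}
    WSGIScriptAlias / {wsgi_script}

    ErrorLog {site_root}/logs/error.log
    CustomLog {site_root}/logs/access.log combined
</VirtualHost>
"""


def render_vhost(server_name, site_root, project_dir, wsgi_script, group="ppi_css"):
    """Fill in the template; every argument is a plain string."""
    return VHOST_TEMPLATE.format(
        server_name=server_name,
        site_root=site_root,
        project_dir=project_dir,
        wsgi_script=wsgi_script,
        group=group,
    )
```

The Alias lines are what let Apache serve /static and /media straight from htdocs, while everything else is handed to the Django application through WSGIScriptAlias.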

Some Debugging Hints

Inevitably, things won’t work on your remote server. Obviously leaving your application in Debug mode is ok for only the briefest time while you are trying to deploy but there are other things to check as well.

Is the web server running?

sudo service apache2 status

If it isn’t or you need to restart the server:

sudo service apache2 restart

What do the apache error logs say?

cat /var/log/apache2/error.log
cat /var/www/ppi-css.com/logs/error.log 
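These logs grow quickly, and usually only the last few entries matter. As an alternative to cat, here is a small sketch that returns just the end of a file. The function name tail is my own, and reading the Apache logs may require elevated permissions.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

For example, tail("/var/www/ppi-css.com/logs/error.log", 10) returns the ten most recent entries from the site’s error log.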

Also, it is never a bad idea to log into MySQL and take a look at the django_test database.

Virtual Environment – Where Did It Go?

As you may have noticed, I do have a requirements.txt file in my project. When I started doing local development on my trusty MacBook Air, I used virtualenv, an amazing tool. However, I had some difficulties getting Numpy and Scipy properly compiled and included in the virtualenv on my server whereas it was pretty simple to get them up and running in the system’s default Python installation. When I talked this over with some of my more Django-experienced friends, they reassured me that while this wasn’t a best practice, it wasn’t a mortal sin either.

Getting to Know Git and GitHub

Git or another code versioning tool is a fact of life for any developer. While the learning curve for the novice may be steep (or vertical), it is essential to climb this mountain as quickly as possible.

As powerful as Git can be, I found myself using only a few commands on this small project.

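First, I used git add with several different flags to stage files before committing. To stage all new and modified files (but not deleted files, at least in the Git 1.x releases current when this was written; Git 2.0 and later also stage deletions here), use: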

git add .

To stage all modified and deleted files (but not new files), use:

git add -u

Or, if you want to be lazy and stage everything every time (new, modified, and deleted files), use:

git add -A

Next, the staged files must be committed and then pushed to GitHub.

git commit -m "insert great string for documentation here"
git push -u origin master

Commands in Local Development Environment

While Django isn’t the most lightweight web framework in Python (hello Flask and others), “launching” the site in the local development environment is pretty simple. Compare the commands needed below to the rest of this post. (Note that I am running OS X 10.9 Mavericks on a MacBook Air with 8 GB of 1600 MHz DDR3.)

First, start the local postgres server:

postgres -D /usr/local/var/postgres 

Next, start the local development web server using the runserver_plus command from django-command-extensions, which enables debugging of the site in the browser.

python manage.py runserver_plus   

Once a model has changed, we need to make a migration using South and then apply it with the two commands below:

./manage.py schemamigration DJANGO_APP_NAME --auto
./manage.py migrate DJANGO_APP_NAME   

References

There are a ton of different tutorials out there to help you with all aspects of deployment. Of course, piecing together the relevant parts may take some time and this tutorial was assembled from many different sources.

The post A Tutorial for Deploying a Django Application that Uses Numpy and Scipy to Google Compute Engine Using Apache2 and modwsgi appeared first on Data Community DC.

