Being a data scientist often means wasting a lot of time on repetitive tasks, and deploying to production environments is one of them. It is part of the typical (boring) development life cycle:

  1. Write some code;
  2. Test all your code;
  3. Commit to your version control system (e.g. git);
  4. Upload to the production server;
  5. Test it again, you know, just in case;
  6. Wonder how the hell that nasty bug managed to sneak past all the tests;
  7. Go to 1 and restart.

This is typically done manually on each update to the code…again, again and again. So, the computer engineer in me started thinking about a way to automate this tedious process. In this tutorial I’m going to explain a solution which can be used in a very wide range of applications. In fact, I use it to deploy my R scripts and packages to production, but also for the WordPress theme of the website you’re looking at, to automatically build my markdown docs into Github Pages and so on. The applications are endless.

In this tutorial I’m going to host the code on a public Github repository and build it with the free version of Travis CI. This means that, in theory, your source code must be open source for this trick to work for you, unless you’re going to use the paid version of Travis CI, which allows building from private repositories. But…well…if you don’t believe in Karma, there’s a 100% black hat solution which lets you use private repos on free Travis CI. It is not allowed by the Travis CI terms and conditions, so don’t do that. Really, don’t.

Anyway, what we’re going to do here is take the old school development life cycle and turn it into a modern continuous integration life cycle:

  1. Write some code;
  2. Commit on your version control system;
  3. Sit down and watch the continuous integration server build the code, run the tests and deploy the fresh code to the production environment;
  4. Good Job! You deserve a beer! Then, go to 1 and improve your code.

There are very good continuous integration solutions out there, like Capistrano, but their setup is often overkill for the typical user, and we don’t need that much power for small projects, after all. As a matter of fact, mere mortals need an easier solution, something doable in 15 minutes without much hassle. So, start the timer now, you’re on your way!

Production server set-up

First of all, we’re going to set up the production environment. The idea here is to add a Git repository on the server for Travis CI to push the code into. For Travis CI to log in to the production server, we need to set up public key authentication. On your local machine, create a private/public authentication key pair with ssh-keygen (on Windows, you can use Puttygen, or you can provision a data science VM in a heartbeat with Vagrant).

Don’t use the default filename id_rsa, because other applications are likely to use that default. Use a unique name instead (deploy-key would work well here). When prompted for a passphrase, skip it with Enter. At the end of the procedure, you’re going to have a public/private pair in the .ssh directory: deploy-key and deploy-key.pub. Install the public key on the production server with ssh-copy-id.
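A minimal sketch of these two steps, assuming OpenSSH's ssh-keygen and ssh-copy-id are available locally. KEY_DIR and the install_public_key helper are illustrative names of mine, and the key size and comment are arbitrary choices.

```shell
#!/bin/sh
# Sketch: create a passphrase-less RSA key pair called deploy-key.
# KEY_DIR is an assumed variable; it defaults to ~/.ssh as in the text.
KEY_DIR="${KEY_DIR:-$HOME/.ssh}"
mkdir -p "$KEY_DIR"
chmod 700 "$KEY_DIR"

# Generate the pair only if it is missing, so an existing key is never overwritten.
# -N "" sets an empty passphrase (the "skip it with Enter" step), -f sets the file name.
if [ ! -f "$KEY_DIR/deploy-key" ]; then
    ssh-keygen -q -t rsa -b 4096 -C "travis-deploy" -N "" -f "$KEY_DIR/deploy-key"
fi

# Hypothetical helper: install the public key for a user on the production server.
# It is only defined here; call it with your own user ($1) and address ($2).
install_public_key() {
    ssh-copy-id -i "$KEY_DIR/deploy-key.pub" "$1@$2"
}
```

Calling install_public_key with your remote server user and remote server address should ask for that user's password once and then register the key.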

ssh-copy-id appends the content of the deploy-key.pub file to .ssh/authorized_keys on the server. Please note that the <remote server user> is the one who is going to perform the deployment on the production server, so be sure to use a user who has access to the working tree.

Automatic Deployment ProTip - Enable SSH to read the authorized_keys file

Try to connect to your server using the command ssh <remote server user>@<remote server address> -i ~/.ssh/deploy-key. You should now be logged in to the production server through the key pair, without entering a password.

Creating a Git Repository on Production Server

Since you’ve started reading this tutorial, I assume you have a local git repository for your code and a Github remote to push the files to. The idea here is almost the same: we’re going to create a bare git repository on the production server. Travis CI will pull the files from Github and push them into the production server git repo through ssh. A good place for these repositories could be ~/github.
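A minimal sketch of that step, to be run on the production server. REPO_ROOT and REPO_NAME are assumed names, and myproject.git is a placeholder for your own repository.

```shell
#!/bin/sh
# Sketch: create a bare repository under ~/github on the production server.
# REPO_ROOT and REPO_NAME are assumed names; pick your own.
REPO_ROOT="${REPO_ROOT:-$HOME/github}"
REPO_NAME="${REPO_NAME:-myproject.git}"

mkdir -p "$REPO_ROOT"
# --bare creates a repository without a working tree: it only stores history.
git init --quiet --bare "$REPO_ROOT/$REPO_NAME"
```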

We have created a bare repository for security reasons. This way we will later be able to set the working (checkout) directory outside the repository itself, thus not exposing the .git folder to the open world.

Thanks to Travis CI, the Github repo and the production repo will always be kept in sync, but we still need to automate the deploy to the production folder. The best way to solve this is to activate the post-receive git hook: a shell script which git executes automatically after a push and which basically checks out the repo (and all its submodules) into a specific directory.
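To make the mechanism concrete, here is a self-contained sketch that can be tried on any machine with git installed. It is a deliberately tiny stand-in for a real hook: it only checks out master, ignores submodules, and all paths live in a throwaway temporary directory. The names DEMO, REPO, DEPLOY_ROOT and SRC are my own for this demo.

```shell
#!/bin/sh
# Sketch: a bare repository whose post-receive hook checks out master into a
# separate deploy directory. Everything lives in a throwaway temp directory.
set -e
DEMO="$(mktemp -d)"
REPO="$DEMO/github/demo.git"   # stand-in for the bare repo on the server
DEPLOY_ROOT="$DEMO/www"        # stand-in for the production folder
mkdir -p "$DEPLOY_ROOT"
git init --quiet --bare "$REPO"
mkdir -p "$REPO/hooks"

# Minimal post-receive hook: git runs it after every successful push.
# The paths are expanded now, while the hook file is being written.
cat > "$REPO/hooks/post-receive" <<EOF
#!/bin/sh
# Check out the tip of master into the deploy directory, outside the repo.
git --git-dir="$REPO" --work-tree="$DEPLOY_ROOT" checkout -f master
EOF
chmod +x "$REPO/hooks/post-receive"

# Simulate the push that Travis CI would perform.
SRC="$DEMO/src"
mkdir -p "$SRC"
cd "$SRC"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "hello production" > hello.txt
git add hello.txt
git -c user.name="Demo" -c user.email="demo@example.invalid" \
    commit --quiet -m "First deploy"
git push --quiet "$REPO" master
cd "$DEMO"
```

After the push, hello.txt appears in the deploy directory while the git metadata stays inside the bare repository, which is the separation described above.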

We don’t even need to create this script from scratch, as I’ve found a great one on Gist that needs just some minor changes.

If you have submodules in your git repo, make them work by adding the submodule checkout lines just before the done that follows the see you soon… line, near the end of the script.

(For the most curious among you: the sed commands convert the origin URLs of the submodules from ssh to https where needed, as Travis CI can only access public submodules.)
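The original lines are not reproduced here, so the following is only a sketch of the kind of URL rewrite described, assuming the submodules live on Github and are listed in .gitmodules with ssh-style URLs. The rewrite_submodule_urls helper and the example/lib repository are hypothetical.

```shell
#!/bin/sh
# Sketch: rewrite ssh-style Github submodule URLs to https in a .gitmodules file.
# rewrite_submodule_urls is a hypothetical helper, not the original lines.
rewrite_submodule_urls() {
    gitmodules="$1"
    # git@github.com:user/repo.git  ->  https://github.com/user/repo.git
    sed -e 's#git@github\.com:#https://github.com/#' "$gitmodules" > "$gitmodules.tmp"
    mv "$gitmodules.tmp" "$gitmodules"
}

# Small demonstration on a throwaway file.
DEMO_DIR="$(mktemp -d)"
cat > "$DEMO_DIR/.gitmodules" <<'EOF'
[submodule "lib"]
    path = lib
    url = git@github.com:example/lib.git
EOF
rewrite_submodule_urls "$DEMO_DIR/.gitmodules"
cat "$DEMO_DIR/.gitmodules"
```

In a real checkout, a rewrite like this would typically be followed by git submodule sync and git submodule update --init --recursive, so that the new URLs are actually used.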

Of course, don’t forget to also change the value of the DEPLOY_ROOT variable to suit your needs (line 34). You may also want to edit or comment out line 89 if you prefer not to echo potentially sensitive data to the Travis CI log (which is public).

Travis CI Setup

Finally, we need to set up a Travis CI environment. If you use Travis already, you should know that you can add a Github repo automatically from your Travis control panel. A Travis VM is then spawned automatically whenever a push reaches the repository, and it follows the directives in the .travis.yml file located in the root of the repository.

In our case, these directives will be the ones needed to push the code into the production repository (where the post-receive hook will do the rest). But for this to work, we must give Travis CI the private deploy-key used to access the production server on our behalf.

Allow Travis CI to authenticate against the Production Server

First of all, create a file config.txt in the root of your repository to tell ssh which RSA key to use when connecting to a given host as a specific user.
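A sketch of what such a file can contain, written here as a small script that generates it. The host name and user are placeholders, and the IdentitiesOnly line is my own addition to keep ssh from trying other keys first.

```shell
#!/bin/sh
# Sketch: generate config.txt, an ssh client configuration for the deploy host.
# SERVER_ADDRESS and SERVER_USER are placeholders here; use your own values.
SERVER_ADDRESS="${SERVER_ADDRESS:-production.example.com}"
SERVER_USER="${SERVER_USER:-deploy}"

# ~ is left untouched by the here-document; ssh expands it when reading the file.
cat > config.txt <<EOF
Host $SERVER_ADDRESS
    User $SERVER_USER
    IdentityFile ~/.ssh/deploy-key
    IdentitiesOnly yes
EOF
```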

This step is very often skipped, which leaves Travis CI unable to use the deploy-key, so git asks for a password during execution. Don’t forget that Travis CI is headless, so you won’t be able to enter a password there. That’s why we are using RSA key pair authentication in the first place.

Travis CI cannot access files outside your repository, which is public (with the free version, at least). Of course, you don’t want to commit your private key to your repository (everyone would be able to access your server with the private key!). Fortunately, Travis has a command-line tool that provides a way to encrypt sensitive information and files. Run it from the root of your repository, so that a .travis.yml is created in place (if you don’t have one already; otherwise it will just be updated).

I had to create a .tar file with the two files I need for authentication because the decryption key is invalidated by Travis CI after the first use, for security reasons. This means you won’t be able to restore more than one encrypted file on Travis. On the flip side, creating one tar archive with multiple files overcomes this limit. You can untar it on Travis and put the original files wherever you need them.
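The commands involved can be sketched as follows, assuming the travis command-line client is installed and that you have already run travis login. Both helper names are hypothetical, and only the two variables named in this tutorial are shown.

```shell
#!/bin/sh
# Sketch: bundle the two authentication files and encrypt them for Travis CI.
# Both helper names are hypothetical.

# Pack deploy-key and config.txt from directory $1 into stuff.tar
# in the current directory.
bundle_secrets() {
    tar -cf stuff.tar -C "$1" deploy-key config.txt
}

# Requires the travis command-line client and a prior `travis login`.
# Run from the repository root so .travis.yml is created or updated in place.
# $1 is the server address, $2 the server user.
encrypt_secrets() {
    travis encrypt SERVER_ADDRESS="$1" --add
    travis encrypt SERVER_USER="$2" --add
    travis encrypt-file stuff.tar --add
}
```

Only the resulting stuff.tar.enc belongs in the repository. The plain stuff.tar, deploy-key and config.txt should stay out of version control.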

Now you should see a .travis.yml file in the repository root which references the encrypted variables (they will be available as environment variables once decrypted on Travis) and the stuff.tar.enc file (which will be automatically restored in place). These are all safe to commit, as they are RSA-protected with a private key hosted on Travis CI (that’s why you need to run travis login beforehand).

Ok, now that all the authentication needs are fulfilled, let’s create the actual build and deploy scripts.

Travis CI Configuration Scripts

The first file to set up is .travis.yml. I’m going to walk through mine and comment on it.

  • Lines 7-10 hold the encrypted references to the four variables we set from the command line (SERVER_ADDRESS, SERVER_USER…).
  • Line 12 prevents a warning which would expose sensitive data in the log, while line 14 disables git submodules, because the origin probably needs to be changed from ssh to https before the submodules are checked out by hand (lines 16-17).
  • Lines 18-20 decrypt and untar stuff.tar.enc, then a bunch of shell scripts from the _scripts directory are executed at various stages of the Travis VM execution (lines 21+).

While build.sh and test.sh could be empty sometimes (and they actually are, but I keep them around so I can fill them in later during the project life cycle), I’m going to focus on install.sh and deploy.sh instead. Let’s have a look at the first one.

As you can see, the install script simply puts the files needed for the authentication process where they are supposed to be. Note that you can comment out line 2 to prevent output on stdout, but if you take care not to log sensitive data, you can leave it on for easier debugging.
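The original script is not reproduced here, so its line numbers do not apply to the following sketch. It shows one way such an install step could look, assuming the archive has already been unpacked. install_keys is a hypothetical helper of mine.

```shell
#!/bin/sh
# Sketch of an install step: move the decrypted files where ssh expects them.
# install_keys is a hypothetical helper; $1 is where the archive was unpacked,
# $2 is the target ssh directory (~/.ssh on the Travis VM).
install_keys() {
    src="$1"
    ssh_dir="$2"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    mv "$src/deploy-key" "$ssh_dir/deploy-key"
    mv "$src/config.txt" "$ssh_dir/config"
    # ssh refuses private keys that are readable by other users.
    chmod 600 "$ssh_dir/deploy-key" "$ssh_dir/config"
}
```

On the Travis VM this would be called as install_keys . "$HOME/.ssh" from the directory where stuff.tar was extracted.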

In the deploy script, we simply create a volatile git repository on the Travis instance. Then we add a remote pointing to the production server (using the secure variables we set before) and finally commit and push to it. Note that this happens only when the branch being built is master. I think this is the way it should work, but if you need a different behaviour you can change line 3 accordingly.
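As with the install step, the original script is not shown, so this is a sketch of the described behaviour under my own assumptions. The deploy helper, the remote name production and the commit identity are all illustrative.

```shell
#!/bin/sh
# Sketch of a deploy step: commit the build directory to a throwaway repository
# and push it to the production remote, but only for the master branch.
# deploy is a hypothetical helper: $1 branch, $2 remote URL, $3 directory to ship.
deploy() {
    branch="$1"
    remote_url="$2"
    src_dir="$3"

    if [ "$branch" != "master" ]; then
        echo "Branch $branch: nothing to deploy."
        return 0
    fi

    (
        cd "$src_dir" &&
        git init --quiet &&
        { git remote remove production 2>/dev/null || true; } &&
        git remote add production "$remote_url" &&
        git add --all &&
        git -c user.name="Travis CI" -c user.email="travis@example.invalid" \
            commit --quiet --allow-empty -m "Deploy from Travis CI" &&
        git push --quiet --force production HEAD:refs/heads/master
    )
}
```

On Travis, the branch and directory would come from the built-in TRAVIS_BRANCH and TRAVIS_BUILD_DIR variables, and the remote URL would be assembled from SERVER_USER, SERVER_ADDRESS and the path of the bare repository on your server.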

Conclusions

With a few simple steps we were able to automate a lot of tedious tasks using only free platforms and tools, in a robust and reliable way thanks to RSA authentication. The approach fits data science tasks perfectly, but it is generally usable. The only limit of this technique is that the code being pushed must be hosted on Github, and thus be open source. But technically speaking you can *COUGH COUGH*… bind a private git submodule to a public bait repository and give Travis the key to check it out. But as I said, you must be really evil to do so and I don’t like evilness (and neither does Travis CI, actually). Wanna see this in action? Have a look at this Github repository and this Travis CI instance. Aloha and cya next!

 
