Version Control for Qlik Projects with Git

By Ben Simmonds, Mon 05 December 2016, in category Business intelligence

git, qlikview

  

This blog post is a summary of an ongoing project of mine. If you're interested in reading more, see my Qlikview Skeleton Repository on GitHub.


Some kind of version control tool is a pretty common feature of most development work, and building BI tools need not be an exception. Qlik does support a couple of different ways of version controlling your projects - a bit of Googling turns up articles on Git, SVN, and Microsoft TFS. However, a lot of the results you'll find are from 2014 or even longer ago.

I thought I'd summarise my own research into using my favourite version control tool, Git, with Qlikview (version 11 at the time of writing), and what has worked for me personally. I've had some really useful results from using Git in my Qlik development work, but I also still haven’t arrived at what I feel is a "complete" solution, so I'll cover the basics, as well as some of the challenges I'm thinking about.

What is version control/Git anyway?

There are better places to get an introduction to Git and version control than I can manage in a short blog post, but I'll summarise here. Git (like other version control tools) stores the history of your project (a project being no more than a directory where you keep the files that make up your work) as you work on it. Any time you make a change to your project, you commit those changes, and Git takes a snapshot of what your project looked like at that time.

You can also push (upload) these changes to a remote repository, which acts as a backup of your work and as a place where other team members can pull (download) the latest state of the project and push their own changes for you to pull down in turn. Without going into detail, this allows you to do some cool stuff:

Getting to grips with everything Git has to offer can be a bit of a learning curve (one I'm still on myself), but getting some basic benefits out of Git is actually pretty easy. If you're interested in learning more, try out this interactive tutorial in your browser. Code School also has a free short course on Git that's really worthwhile.

How can I apply this to Qlikview?

This is where we need to start thinking about what we need to track, what we don't, and how best to accomplish our aims. Let's start with the assumption that a typical Qlikview project is made up of a few components. In very broad terms, a Qlikview project breaks down as follows:

Don't Track Data

The first decision we need to make is not to track the data itself. The data that goes into your project is probably the single biggest component going by space, and it's also usually either static (in a flat file) or external (in a database). Either way you probably aren't manually editing it on a regular basis.

Cutting it out of the equation with respect to version control therefore saves us both disk space and the mental overhead of tracking a bunch of data files that are most likely either a) from a database that you just grab an extract from, b) generated by our ETL process from a), or c) an Excel file that someone sent you in an email six months ago that you've already deleted... [cough]... I mean securely backed up elsewhere.

How to Track the Rest

That leaves us with our load script and the front end of our application. Here is where we run into our first hurdle - Git loves text files. It can handle binary files fine, but you lose a lot of the most useful features of Git - like the ability to track and merge changes line by line. Qlik on the other hand bundles up the data, load script, and layout in a single, binary artefact - the qvw.

This is great for quickly transferring everything you need for your dashboard as a single file, but not so great for Git. Not only does it contain a bunch of data we don't want to track, but it's also bundled up inside an inscrutable, proprietary binary file. We need some way of breaking the qvw down into its constituent parts so that we can track them individually with Git. The way to do this is using Qlikview's -prj folder feature.

Let's say you have a qvw file called Dashboard.qvw. If you create a folder in the same directory named Dashboard-prj and re-save your qvw, you'll see this folder fill up with files. These files mirror the structure of the qvw file - the objects, sheets, load script and so on. Mostly it saves them as machine (and fairly human) readable XML files. You can then track these with Git. After doing this with your project, you might have a directory that looks like this.

MyProject
  |-- App.qvw
  |-- App-prj
  |     |-- Lots of files!
  |-- Data.xlsx
  |-- .gitignore
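
Creating the -prj folder is a one-liner from Git Bash. Below is a minimal sketch, assuming a scratch directory and an empty placeholder file in place of a real App.qvw (both are made up for illustration). Qlikview only fills the folder once you re-save the document from the desktop client, so the folder stays empty here.

```shell
# Scratch directory standing in for MyProject (hypothetical location).
project_dir="$(mktemp -d)"
cd "$project_dir"

# Empty placeholder; in practice your real qvw already exists.
touch App.qvw

# The folder sits next to the qvw and is named <document name>-prj.
qvw="App.qvw"
prj_dir="${qvw%.qvw}-prj"
mkdir -p "$prj_dir"

# List the directory to confirm the folder is alongside the qvw.
ls -1
```

Deriving the folder name from the qvw name avoids typos, since the two have to match for the feature to work.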

To start tracking your project with Git, you just need to a) write a quick .gitignore text file to tell Git not to track files you don't want it to (i.e. data files), and b) initialize your directory (MyProject in the example above) as a Git repository. The contents of our .gitignore file would look like this:

# A .gitignore file lets you define file types you don't want to track. The '*' character acts as a wildcard.
# This is a comment by the way.

# Don't track data files:
*.qvd
*.xlsx

# Don’t track qvw's (our prj folder takes care of that).
*.qvw
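
If you want to confirm that the patterns above do what you expect, git check-ignore reports which pattern (if any) matches a given path. The sketch below uses a throwaway repository and empty placeholder files, all of which are assumptions for illustration rather than part of a real project.

```shell
# Throwaway repository, purely for illustration.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# The same three patterns as the .gitignore above.
printf '%s\n' '*.qvd' '*.xlsx' '*.qvw' > .gitignore

# Empty placeholder files standing in for a real project.
mkdir -p App-prj
touch App.qvw Data.xlsx Extract.qvd App-prj/LoadScript.txt

# Print each ignored path together with the pattern that matched it.
git check-ignore -v App.qvw Data.xlsx Extract.qvd

# A path that is not ignored gives no output and a non-zero exit status.
git check-ignore -q App-prj/LoadScript.txt || echo "App-prj/LoadScript.txt will be tracked"
```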

To initialize your Git repository, open up a terminal (for example Windows cmd, or better still Git Bash), navigate to your project directory, and run git init.

> cd D:/Path/To/MyProject
> git init

Then add your files (the -A option here just adds anything not filtered out by gitignore, but you can add files by name too).

> git add -A

And make your first commit (with a meaningful commit message)!

> git commit -m "Initialised project, added a basic dashboard and load script."

Thus begins the cycle: make some changes, commit them, share your work, and repeat. Of course there's more to it than that, but those are the basics.
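
The "share your work" step can be sketched as follows. To keep the example self-contained, a bare repository in a scratch folder plays the part of the remote; in practice it would more likely be a URL on GitHub or a shared drive. The paths, file name, and identity below are all placeholders.

```shell
# A bare repository in a scratch folder stands in for the remote (hypothetical).
remote="$(mktemp -d)/MyProject.git"
git init --quiet --bare "$remote"

# A scratch working copy with a single commit in it.
work="$(mktemp -d)"
cd "$work"
git init --quiet
git config user.name "Example Developer"   # placeholder identity
git config user.email "dev@example.com"
echo "// load script placeholder" > LoadScript.txt
git add -A
git commit --quiet -m "Initialised project."

# Register the remote under the conventional name 'origin', then push the
# current branch; -u remembers the pairing for later pushes and pulls.
git remote add origin "$remote"
git push --quiet -u origin HEAD

# A colleague would clone the remote to get the latest state of the project.
colleague="$(mktemp -d)/MyProject"
git clone --quiet "$remote" "$colleague"
ls "$colleague"
```

After the first push, a plain git push and git pull are enough to keep the two copies in step.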

Challenges

I'm going to skip ahead here to where I am currently with the Qlikview projects that I've used Git with. There are things that work well, and things that don't. If this article has been your introduction to Git, the below might not make much sense.

The Good

The Bad

The Ugly

Version Control 'Light'

Because of some of the points above, I haven’t started using Git for most of my day-to-day Qlikview development. However, I do have a half-way-house strategy that I've been using for a little while now that has paid dividends nonetheless. I work with a couple of colleagues on an application with a fairly involved ETL layer - lots of different data sources, a good amount of business logic - and it can be tricky to keep abreast of all the changes that have been made when it comes time to promote our work up to our production environment.

To make this easier, without version controlling our whole project, I wrote a little VBA script that opens each qvw file in the project and saves the load script to a text file in a separate directory. This directory is set up as a Git repository, in which I can tag each of our releases, inspect what changes have been made since the last release, and make sure that any updated files get promoted, even if I might not otherwise have been aware they'd changed. Whilst this doesn't feed back into our main development files, it still provides a really useful reference (and is also way easier to search through for a field whose origin I can't remember, or a filepath that's moved, affecting several different scripts).
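
The Git side of that workflow might look something like the sketch below. It is not my actual setup: the script contents, file names, and tag names (v1.0, v1.1) are made up for illustration, and the export step itself is replaced by writing two small text files by hand.

```shell
# Scratch repository standing in for the exported-scripts directory.
scripts="$(mktemp -d)"
cd "$scripts"
git init --quiet
git config user.name "Example Developer"   # placeholder identity
git config user.email "dev@example.com"

# First release: two exported load scripts (made-up contents).
echo "LOAD CustomerID, Name FROM Customers.qvd (qvd);" > Customers.txt
echo "LOAD OrderID, CustomerID FROM Orders.qvd (qvd);" > Orders.txt
git add -A
git commit --quiet -m "Export load scripts for release 1.0"
git tag -a v1.0 -m "Release 1.0"

# Second release: only one of the scripts has changed.
echo "LOAD OrderID, CustomerID, OrderDate FROM Orders.qvd (qvd);" > Orders.txt
git add -A
git commit --quiet -m "Export load scripts for release 1.1"
git tag -a v1.1 -m "Release 1.1"

# Which files changed between the two releases?
changed="$(git diff --name-only v1.0 v1.1)"
echo "$changed"

# Which scripts mention a given field, and on which line?
git grep -n "CustomerID"
```

The list of changed files doubles as a checklist of what needs promoting, and git grep covers the "where does this field come from" question without opening a single qvw.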

Further Reading