I’m pleased to publish an article I have been working on for the last few weeks. It is, in essence, an overview of version control, but I start from the premise that it is badly misunderstood by a large number of technology practitioners. Because it is badly misunderstood, I set out to address the “justifications” I have heard for not using version control, and to explain why those arguments are largely incorrect.

I make no apologies for being perhaps a little aggressive in some of my arguments: I honestly feel that version/source control is as vital a part of the software development cycle as hiring good developers, making good backups, and delivering the requirements to the business. But many people, including senior employees, can put everything at risk when a version control solution is ignored, not adopted, or - just as bad - not enforced.

Version control is not, and never has been, a panacea. To the initiated, my arguments may well seem something of a statement of the obvious. But I’d simply ask that you conduct a straw poll of technologists in your organisation or social circle. How many of them, from programmers to system administrators to managers, are enthusiastic about version control? If it’s more than twenty percent, I’d be surprised.

So my article provides arguments for and justifications of version control. It will hopefully be a useful resource both for the casual reader and for any proponent of source control inside an organisation who is struggling against the sort of apathy or hostility I have frequently encountered.

I look forward to any feedback, of course. Many thanks are due to the friends who helped with comments on the article.

A printable PDF version is available at the base of the article.

Introduction

The conjecture in this article is that version control is a powerful approach that is misunderstood by a large proportion of the working IT population. Of the entire community of IT professionals supporting a business’s activities, it is my experience that as many as eighty percent of an organisation’s technologists simply haven’t heard of, don’t like, don’t consider appropriate, have had problems with, or simply don’t have time for a version control solution.

In my experience as a systems administrator who advocates version control as the best tool for managing many IT activities, whether software development, configuration file management, or creative development, the benefits far outweigh the difficulties. Frequently I encounter users who have simply not had exposure to the approach, but more often I meet users whose limited exposure has been complicated by poor support, training, and implementation. More frequently than I’d like, they simply take the attitude that it’s not appropriate in their particular context. This ignores the substantial risks that can be posed to a business through unmanaged releases, fluid baselines, premature feature releases, reduced accountability and an inability to respond effectively to urgent problems.

This is, to me, immensely puzzling. So much of an organisation’s business relies heavily on Information Technology, and so much of that is data, and effort put in by staff at computer workstations. We have databases for business critical data and filing cabinets for office workers, but more often than not we have a mish-mash of solutions for managing IT source code, configuration files and documentation. Huge reliance is placed on large, inflexible, misunderstood and often unreliable backup solutions which - at best - provide a very coarse rollback granularity, and certainly provide no means for the organisation to analyse the changes that have occurred to its systems.

Everything from the source code for a business solution, to a critical management report, a firewall configuration file, a web server configuration or database definitions is vital for a part of the business, or a critical service on which it depends. Yet it is often this information that is treated with most disdain. Trust placed on periodic backups with only occasional testing for reliability is often inappropriate for frequently changing files and requirements.

The purpose of this article is therefore to provide an outline of what version control is, and how it can benefit any organisation. It seeks to tackle head on the reasons given for NOT using version control, and demonstrate they are short-sighted views often based on misunderstandings of key concepts. Where problems do exist, the article suggests they can be resolved through simple communication, team work and a renewed effort to understand tools.

What is version control?

In its simplest form version control (or often, and interchangeably, source control) is purely a mechanism for tracking changes to a file. This could be a rarely changing configuration file, or a more important file in a system, or some key reference data. The goal in this scenario is merely to ensure that change can be accounted for, backtracked easily and the changes that have occurred over time understood. This is more properly known as “revision control”.
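
To make “tracking changes to a file” concrete, here is a minimal sketch in Python using the standard difflib module. The file name firewall.conf and its contents are invented purely for illustration; the point is only that, given two revisions, the change between them can be shown exactly rather than remembered.

```python
import difflib


def describe_change(old_text, new_text, name="firewall.conf"):
    """Return a unified diff (a list of lines) between two revisions of a file."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{name} (revision 1)",
            tofile=f"{name} (revision 2)",
            lineterm="",
        )
    )


# Two hypothetical revisions of a small configuration file.
rev1 = "allow 22\nallow 80\n"
rev2 = "allow 22\nallow 443\n"

for line in describe_change(rev1, rev2):
    print(line)
```

Lines prefixed with “-” were removed and lines prefixed with “+” were added, which is the kind of differential analysis that a bare copy of the file cannot offer.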

As soon as the number of related files grows, particularly in the context of a software engineering project, so do the complexities. Changes need to be grouped together in particular ways and builds, versions, releases and concurrency issues all need to be managed. This is more properly “version control”: managing the complexities and evolution of a particular group of files over a period of time.

Before we embark on an analysis of the version control approach, it is worth looking at the alternative approaches used by many in freely and loosely managed software development projects. These vary from the simple “stick it in one place” to more involved processes that simply evolve over time. The general backdrop is version control as applied to software source code, but it could just as easily apply to documents, images, configuration files or business data.

One golden repository will do

The simple solution, more often practised than many would prefer to admit, is the idea that a user’s main computer hosts all code, and that is simply the “golden repository” (a master resting place for the project files). If a change is necessary, then the change is made on that machine, by that user, and the results shipped out as appropriate.

It is easy to see the attraction of this approach: it is very easy to set up and maintain, and backups are “just” a case of backing up the machine on a regular basis.

But what about the problems that can occur? What if the machine fails and backups weren’t automatic? What if a virus infects the computer and damages (rather than deletes) the files, corrupting backups in the process? What if a customer wants to “roll back” to a previous version, or some functionality is discovered to not work anymore? How is this managed, and how would investigations take place with this solution?

The short answer is that it can be very difficult: when left to their own devices most people simply forget about backups. Even if a writable/removable media device is available, it is very easy to forget to load media, or change tapes: backups can quickly become useless. That is exactly why good system administrators spend time setting up an infrastructure to do everything automatically: we are all prone to forgetting after all, administrators included.

Even if backups are taken care of, what if a rollback is required to uncover why some functionality has stopped working? To demonstrate some piece of code that is currently in pieces undergoing some refactoring? Without a structured way of accessing the changes that have been made this becomes a difficult process to go through. Try finding or remembering when a particular change was made to a particular function or sub-component more than a few weeks ago. Was that change related to anything else in the code base? Can it be rolled back without causing further problems?

What if there are many users adopting this approach? With lots of golden repositories, each for a particular part of the project, failure can rapidly become orders of magnitude more severe if the backups are not made consistently well. Lose just one system and the impact is likely to be felt at least project wide, at worst business wide.

The reality here is that a single golden repository, possibly even WITH simple external backups, cannot provide many of the basic facilities that are necessary to do more than move along a simple development path, ALWAYS shipping new features, never revisiting problems. Because that process is very difficult, a single golden repository is not the approach to use for anything more than projects of a few hours’ duration.

  • Positives: Easy to set up; Quick compilation.

  • Negatives: Backups can be forgotten; Rollbacks difficult; Differential analysis impossible; Does not scale to two or more users; Fixing issues retrospectively extremely difficult.

A central shared drive

The next best thing, perhaps, is to put all of the code on a central shared drive. Something that is available, backed up and reliable. Surely that resolves many of the problems of having all the code in one golden repository that is prone to failure and forgetfulness?

Whilst it may be true that it is marginally better than a single golden repository, it is really not that much better for anything else. In fact, a shared drive can become a problem to work with, as shared drives are often much MUCH slower than local drives, so compile times can suffer. More problematic still is when more than one person is working on the code. Rather than just copying around chunks of code for use by local users, it might happen that people overlap and overwrite each other’s work. Not good, especially if you do not realise it is happening!

What is most problematic however is moving back to previous revisions and releases, potentially even applying changes to that particular point.

Imagine the following scenario: Having demonstrated a recent build, it becomes clear that there are some minor bugs that marketing could do without having to explain or work around. They want them fixed. It is made worse still if you are resource starved, with only a single machine for your purposes. Stopping work underway, and reverting to a previous build can be very involved and troublesome.

As to fixing problems, and then retrofitting the fixes to the main code-line, this can rapidly become a very troublesome exercise. Whilst not insurmountable, success is usually dependent on clear code, well delineated problems and fixes, and a well understood source code base. As many projects have none of these in large quantities, the task can quickly become time consuming and frustrating.

One last point: we have assumed that backups are being made. But even if they are being made, do you understand the retention policy for backups? Some organisations recycle tapes, and only make backups to recover from accidental and recent deletion. As time passes, some backup cuts become obsolete and are removed. It is therefore important to understand the granularity of backups, and the retention period. All backups are not the same!

  • Positives: Marginally more reliable, assuming administrator-managed backups.

  • Negatives: Rollbacks difficult; Differential analysis impossible; Requires care for multiple users; Issue fixing just as difficult.

Structured directory structure

Rather than a single shared drive, a structured directory structure often appears - whether local or shared is irrelevant - for each user. By introducing naming structures for releases and repositories, it can overcome some of the concurrency and rollback issues identified earlier.

For example, when some code is ready, developer A puts their code in a directory “ReleaseN/devA”, and developer B in “ReleaseN/devB”. Whilst careful segregation would work, it does not overcome the issues of incorporating changes, identifying overlaps and merging changes. It requires segmentation of work, and good communication. And what if something more complex is required after the merge and release have already been taken care of: a build with all but a certain developer’s changes, say, or a demonstration of the client code but not the server code? Does that mean unwinding the work they submitted, or re-merging the other submissions?

The reality is that whilst this approach is better than a single golden repository, it is still some way from a useful approach for managing the process by which software - in any effort - develops. It clearly can be done, but a lot of unnecessary work and adherence to standards is necessary.

  • Positives: Enables release management; Segregation of users’ efforts.

  • Negatives: Requires careful effort to integrate changes; Differential analysis difficult; Standards must be followed; Careful communication required.

It should be clear at this point that an automated approach is really necessary to draw out the positives of each approach, and to address the negatives as robustly as possible.

Enter version control…

Version - or source - control is therefore a tool that:

  • Enables changes to a particular file to be tracked.
  • Enables multiple changes to be bundled together in various ways.
  • Enables those bundles of changes to be defined as larger “releases”.
  • Enables multiple users to work together constructively, and with ease.
  • Assists users in bringing their changes together, and identifies overlapping changes.
  • Enables previous releases to be accessible.
  • Stores data in a space efficient way.
  • Takes the manual steps of managing files and code away from people.
  • Makes it easy to back up all critical data for all users, in one go.

All this makes sense, but it should also be EASY TO UNDERSTAND. Version control tools are - after all - written by developers, for developers. A developer who can understand the nuances of a programming language should also be able to appreciate the subtleties of how a software project works in reality. Regardless of the particular management-speak approach du jour of software projects, code is still - and always will be - code. There are only so many ways that code changes over time, and only so many ways that developers interact with each other.

The architecture of a version control system

There are a few basic components of a version control system, regardless of the actual implementation:

  • Central repositories
  • Local working copies
  • Developer/User tools
  • Communication

Repositories

In almost all version control systems, there is a solitary central repository. This is the master repository for all files and meta-data. It is manifestly NOT the same as the golden repository we talked about earlier, for that was a raw copy of source code that was under the control of the user. Here, the repository is generally not directly accessible to the user: it is much like a real-world document archive or safe where master copies are stored, only accessed when necessary.

The repository stores all of the data that is contained within files, as well as valuable meta-information such as the modifications applied, the number of checked out copies, file ownership and other information. It is accessed, through the version control tools, by all the users, so a backup of the master repository is vital. Such a backup also ensures that a copy is made of ALL files which, by virtue of the meta-information, includes historic releases.
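
As a rough illustration of what “data plus meta-information” means, the following Python sketch models a repository as a list of revisions per file, each carrying an author, a message and a timestamp. The class and field names are my own invention for the example, and no real tool stores its data this way; it is a toy that shows why backing up the repository captures the whole history in one go.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One stored version of a file plus the metadata recorded with it."""
    number: int
    author: str
    message: str
    content: str
    committed_at: datetime


@dataclass
class Repository:
    """Toy central repository: every revision of every file is kept."""
    history: dict = field(default_factory=dict)  # path -> list of Revision

    def store(self, path, content, author, message):
        """Append a new revision of path and return it."""
        revisions = self.history.setdefault(path, [])
        revision = Revision(
            number=len(revisions) + 1,
            author=author,
            message=message,
            content=content,
            committed_at=datetime.now(timezone.utc),
        )
        revisions.append(revision)
        return revision

    def log(self, path):
        """Return (number, author, message) for each revision, oldest first."""
        return [(r.number, r.author, r.message) for r in self.history[path]]


# Hypothetical users and file, for illustration only.
repo = Repository()
repo.store("httpd.conf", "Listen 80\n", "alice", "Initial web server config")
repo.store("httpd.conf", "Listen 8080\n", "bob", "Move to port 8080")
print(repo.log("httpd.conf"))
```

Because the older content is never overwritten, a single backup of this one structure contains every historic version as well as the current one.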

Local working copies

If the central repository is the master resting place of the files, then the working copies are where they are actively put to work. A file that is in a local working copy can be changed by a user. They will have complete control over their local working copy in just the same way as in the “golden repository” we talked about earlier. Each file is clearly under the auspices of the version control system, and so can be reintegrated to the main repository at a later date without additional work.

Local working copies are likely to be different for each user. Depending on how the version control tool is put to use, changes to working copies may benefit from frequent backups if they are not frequently integrated.

They also provide an excellent “sandbox”, in that changes can be rolled back rapidly, and to any level of granularity. On discovering an entirely inappropriate approach has been used, or that an input file has been corrupted or changed, it is easy to revert to the last committed copy. Frequency of commits is something that is decided on by the user, and by the group as a whole as part of their general version control policy.

Developer/User tools

The user tools are the means by which a local working copy is updated from the central repository, and vice versa. When a user has finished making changes to a local copy, they would want to put that change back into the repository - either because they have reached a point where they would like it made safe, or because they need to combine their changes with another developer’s.

Sometimes if two users have separately worked on a piece of code, remediation work is necessary to make sure their changes are both taken forward. This can often be where developers become frustrated with version control, because complex merging of involved code is less than enjoyable.

But it is worth remembering that even outside of the context of version control, merging is still necessary where there is more than one user. There are plenty of tools to help in this regard, including simple file copying. But where version control really comes into its own is the identification of conflicts and clashes. This helps to ensure that changes do not go missing, and even in the event of a poor merge exercise, it is possible to replay the process without risk.

Communication

As soon as there are multiple users on a project - true for all but the smallest of projects - it becomes important that frequent and even informal communication occurs. Better an “I’m working on blah and blah” than an “Oh. You’ve made changes to blah and blah!?” just as users rush to integrate changes ahead of a deadline, or are finishing work for the day.

This is one of the most undervalued parts of any version control system. When users work in unity, discussing their workloads, it almost always saves a great deal of time and frustration later. Changes are better managed, often avoiding the need for difficult merges at all, and the working atmosphere is invariably better.

It should be obvious to anybody who has worked in such an environment that dynamics within teams are complex. “Religious” differences can quickly undermine the best efforts of any team effort (Where should curly braces be placed? Which is better, C# or Java? If you didn’t before, you get it now!). Outright loathing of an approach or toolset by just one member can seriously disrupt efforts - especially if they ignore a team decision. Striving for agreement and putting aside differences are the hallmarks of good teamwork.

Variations

The outline of components above is just that, a general outline and overview. There are notable situations where some of these components are not present. For example, some version control solutions support multiple master repositories (BitKeeper, for instance), and in some teams communication does not exist or is difficult because the team is geographically dispersed.

Tools vary quite substantially too, with some integrated into IDEs and others offering separate command line operations. However the principles outlined should still be generally valid and serve to demonstrate the underlying concepts that are important.

What happens

There are several basic operations that occur with a version control system. In the rough order that they are likely to happen to any new setup, they are:

  • Adding files
  • Changes are committed
  • Changes are checked out
  • Changes are merged
  • Tags are applied
  • Releases are made
  • Bug fixes to previous releases are made

Adding files

This is similar to “checkin/commit” in some scenarios, but for our purposes we will treat them as distinct. With a new version control system installed, of course the first thing that must be done is to put files into the system. As the files are added, the central repository will make changes behind the scenes to allocate initial version numbers, ownership, permissions and other meta-operations.

Files of course can and should be held in directories appropriate to the task/language/environment at hand. In fact, no matter how complex the hierarchy, the version control system should be able to manage the structure effectively as part of the system.

“Commit”

As the project moves forward, particular pieces of the software and files will reach various points of “completeness”. This might be the end of a day’s work, or when a particular bug has been resolved. At this point, changes need to be made permanent in some way. This is the commit step. The local changes made by a user are pushed back into the central repository, such that they then receive an incremental - and unique - version identifier.

Once the changes have been committed it then becomes possible to roll back a particular file to a particular revision. This save point is permanent for the duration of the project, and it can be as fine or coarse grained as the user wishes. If a feature was known to be available in version 1.4, and the current release is 1.7, it is straightforward to move backwards to the earlier release temporarily.
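
The commit and roll-back cycle can be sketched in a few lines of Python. This is a toy model under my own assumptions (plain integer revision numbers and a single file), not a description of how any particular tool numbers or stores revisions; real systems use their own schemes, such as the dotted 1.4 and 1.7 in the example above.

```python
class FileHistory:
    """Toy per-file history: each commit gets the next revision number."""

    def __init__(self):
        self._revisions = []  # index 0 holds revision 1, and so on

    def commit(self, content):
        """Store content permanently and return its new revision number."""
        self._revisions.append(content)
        return len(self._revisions)

    def checkout(self, revision=None):
        """Return the content at a revision, or the latest if none is given."""
        if not self._revisions:
            raise LookupError("nothing has been committed yet")
        if revision is None:
            revision = len(self._revisions)
        if not 1 <= revision <= len(self._revisions):
            raise LookupError(f"no such revision: {revision}")
        return self._revisions[revision - 1]


history = FileHistory()
first = history.commit("timeout = 30\n")
second = history.commit("timeout = 60\n")

# Temporarily step back to the earlier revision; the later one is untouched.
working_copy = history.checkout(first)
print(first, second, repr(working_copy))
```

Stepping back does not destroy anything: the latest revision is still there, and checking it out again is a single call.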

Structuring any comments associated with commits (as well as tags, dealt with shortly) can also increase the ease by which a particular change can be identified, or a particular piece of work or resolution identified.

Working practices are of course flexible depending on preferences of the team members. Some prefer only to commit when a release is required, whilst others do so more frequently. Ultimately it is purely a matter of team preference, and a frequency is adopted that best suits the purposes of those involved.

Check-out

There is little point to it all if the changes cannot be retrieved, and this is just what a check-out does. It allows either the latest committed revision to be extracted, or a particular tag or version to be retrieved for analysis or further work.

Some systems enable “exclusive locks” to be placed on files, and the term “check-out” can become confusing through this dual use. It is therefore very important to understand the distinctions, and any locking implications within the system in use for general operations.

Tag

As soon as more than a handful of files are present in a version control system, it can become very difficult to keep track of which version of a particular file is associated with a particular release of the system. A logical identifier, or tag, can then be associated with a group of file revisions.

Specified as part of a committed change, a tag can associate some identifier that indicates the release, its purpose, and its increment, or just an arbitrary code-name from management. It might be as simple as “betarelease” or as involved as “betabuild53_release1”. Whatever the convention, the result is the same: a mark in the sand which brings all the changes of a particular set of files together. This might apply to a small component of a project, or the project as a whole.

At a later date a check-out can be carried out specifying the tag, and the release of the software at that particular point is retrieved. This is where users can benefit from multiple local working copies, such that various versions are available to them at a particular time.
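
At heart a tag is just a named record of which revision of each file belongs together. The Python sketch below shows that idea with invented file names and contents, and with the per-file histories held as simple lists; it is an illustration of the concept rather than of any tool’s internals.

```python
def make_tag(tags, name, current_revisions):
    """Record which revision of each file belongs to the named tag."""
    if name in tags:
        raise ValueError(f"tag already exists: {name}")
    tags[name] = dict(current_revisions)  # copy, so later commits do not alter it


def checkout_tag(tags, histories, name):
    """Return {path: content} exactly as it was when the tag was applied."""
    return {path: histories[path][rev - 1] for path, rev in tags[name].items()}


# Hypothetical histories: list index 0 holds revision 1 of each file.
histories = {
    "client.c": ["client v1", "client v2"],
    "server.c": ["server v1"],
}
tags = {}
make_tag(tags, "betarelease", {"client.c": 2, "server.c": 1})

# Development carries on after the tag has been applied...
histories["client.c"].append("client v3")
histories["server.c"].append("server v2")

# ...but the tagged release can still be retrieved unchanged.
print(checkout_tag(tags, histories, "betarelease"))
```

However far the individual files move on, checking out by tag returns the same consistent set every time.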

Tagging is an amazingly simple idea but hugely under-utilised in many smaller software development projects. Whilst many users are happy using the basics of file versioning as a means to roll-back, it becomes an order of magnitude more sophisticated with the introduction of tags. If the tags are made directly relevant to a ticketing or change control system, it can be enhanced yet further.

Merge

With two users working separately on a file, and wishing to commit the changes, it is necessary to bring their changes together to ensure their changes both feature in the master copy. Whilst the first user to commit may not have a problem, the second user may well do. Perhaps a constant value is adjusted in distinct ways, or a function modified in different ways.

The process of bringing the changes together is merging, and it can become quite a difficult exercise if many merges are required, or if it has been left for too long. Some of the solutions to this are simply to merge more frequently, communicate more effectively with colleagues, or embrace branching. However it is managed, the changes are brought together to ensure the “tip” release of a file represents the efforts of all users.
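
The “constant adjusted in distinct ways” case can be sketched as a three-way merge: compare each user’s version with the common starting point, accept changes made by only one side, and flag anything both sides changed differently. The Python below works on named constants rather than on lines of text, which is a deliberate simplification of mine; real tools merge text line by line, and the constant names are invented.

```python
_MISSING = object()  # marks a key that is absent from one of the versions


def three_way_merge(base, mine, theirs):
    """Merge two sets of changes made from a common base.

    Returns (merged, conflicts). Keys listed in conflicts were changed by
    both sides in different ways; they are left out of merged so that a
    person can resolve them.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b = base.get(key, _MISSING)
        m = mine.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if m == t:        # both sides agree (or neither changed it)
            chosen = m
        elif m == b:      # only the other user changed it
            chosen = t
        elif t == b:      # only this user changed it
            chosen = m
        else:             # both changed it, differently
            conflicts.append(key)
            continue
        if chosen is not _MISSING:
            merged[key] = chosen
    return merged, conflicts


base = {"MAX_USERS": 10, "TIMEOUT": 30, "RETRIES": 3}
mine = {"MAX_USERS": 20, "TIMEOUT": 30, "RETRIES": 5}
theirs = {"MAX_USERS": 10, "TIMEOUT": 45, "RETRIES": 4}

merged, conflicts = three_way_merge(base, mine, theirs)
print(merged, conflicts)
```

Here the two independent changes are combined automatically, and only the constant that both users altered needs a conversation.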

Releases

Tags are useful for retrieving a particular release: Management ask for another demonstration build but with a certain bug fixed or necessary feature added. If the code has moved on even a little since the build this might be difficult outside of version control. With version control, simply identify the release associated with the particular tag (agreed naming conventions can help clarify here) and check it out.

It can be useful to remember that there is usually no limitation to the number of local working copies of the source code that can be held by a particular user. As well as perhaps a local copy of the “tip” version that represents the “state of the art” of the project, a number of other copies that contain previous master builds, test releases and so forth can make retrospective work such as this comparatively straightforward.

Branching

In the context of version control, no word is known to cause more panicked and pained expressions than “Branching”. It is often considered to be anathema.

With the previous example of a demonstration release, a particular bug needs to be fixed. It was fixed in the tip release, but that is unsuitable for demonstration use right now. So it needs to be retrospectively fixed. The solution is branching. By making a retrospective branch, the code in a local working copy is taken back to a particular point, and then a branch is added to enable the code to move in a specific - new - direction: in this case, fixing a particular bug. With the fix applied to the branch, a new build is made and provided for demonstration.
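
The retrospective branch can be pictured as a second line of development that starts from a tagged point instead of from the tip. The Python sketch below is a toy model with names I have made up (Tree, “demo1”, “demo1-fixes”) and a whole-project “content” string standing in for the files; it is not how any particular tool represents branches.

```python
class Tree:
    """Toy source tree: named lines of development sharing one history."""

    def __init__(self, initial):
        self.commits = [(None, initial)]  # (parent commit index, content)
        self.heads = {"main": 0}          # branch name -> its latest commit
        self.tags = {}                    # tag name -> commit index

    def commit(self, branch, content):
        """Add a commit on top of the given branch."""
        self.commits.append((self.heads[branch], content))
        self.heads[branch] = len(self.commits) - 1

    def tag(self, name, branch):
        """Mark the current head of a branch with a tag."""
        self.tags[name] = self.heads[branch]

    def branch_from_tag(self, new_branch, tag):
        """Start a new branch at the commit the tag points to."""
        self.heads[new_branch] = self.tags[tag]

    def content(self, branch):
        """Return the content at the head of a branch."""
        return self.commits[self.heads[branch]][1]


tree = Tree("demo build")
tree.tag("demo1", "main")
tree.commit("main", "demo build + half-finished feature")

# Go back to the demonstrated build and fix the bug there.
tree.branch_from_tag("demo1-fixes", "demo1")
tree.commit("demo1-fixes", "demo build + bug fix")

print(tree.content("main"))
print(tree.content("demo1-fixes"))
```

The tip keeps its half-finished feature, while the branch holds exactly the demonstrated build plus the fix, ready for the next demonstration.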

So why the fear? The reason is most likely associated with branches at the tip of the source tree. Various developers, all working on their code for their particular parts of a project, all needing to bring their code into the main tree can then be faced with a complex merge exercise.

The solution proposed in this article is to adopt a subtly different methodology. Rather than mayhem at the tip of the source tree, users adopt a similar approach to that used for retrofitting a bugfix. EVERYTHING occurs on branches, even new features. And every developer is then responsible for merging their code back into the main tree only when their features are finished and complete. This is sometimes known as “merging forward”.

So from the perspective of the main tree, the code advances only with complete features or bug fixes. Because a user’s work is isolated on a branch of which they are in complete control, other components do not change without their knowledge. They can and probably SHOULD merge the main, stable, head code into their branch as they are ready to accept new features developed (and tested) by others. Only once their changes are ready are they merged into the main tree, with the user communicating at all times with their colleagues.

Where a particular deadline occurs which necessitates a number of merges in close proximity with each other it is of course important to proceed with care. Dumping changes and running away, without explanation, can result in changes needing more complex merging by a user who may not understand the subtleties. But a merge of self-contained (and working) branches into the tip is much easier to manage than a mish-mash of changes, especially with tags on the tip to reflect each incremental change in functionality.

This is, of course, where the most powerful part of extreme programming is useful. Write test cases, and only commit when they are working. It is then much easier to merge in working feature sets or bug fixes than it is to merge in a night’s changes, because the context is clear.

Where is this useful for the solitary user, or small team? You always know where your changes are. The tip is always the most complete version, the branches always where the work-in-progress is. Finding an approach or feature just is not working? Back out to the last good tag, and branch. Urgently needing a working release to demonstrate the state-of-the-art product? Go back to the last good demo tag, and check it out.

The bottom line in all of this is that branching brings a wealth of power to the user that outweighs the perceived complexities. Given a little effort, preferably even a testbed, it is quickly apparent that this really is not rocket science. Most importantly of all it enables users to focus on new features, safe in the knowledge they can roll back, add bug fixes retrospectively, and simply develop more efficiently, with only a small overhead.

None of the above is necessarily without complexities. Tight deadlines and large numbers of near-simultaneous commits can still result in merge mayhem. Developing on the tip is possibly practical in emergency fix scenarios where the head of the source tree reflects a production or test environment. However, it is an axiom of development environments that what works well in one environment may not work well in another. The most important item to take away from this is that there are approaches to every perceived problem, but that the version control approach is much better than a free-for-all environment outside of any constraints.

Advanced operations

There are of course a number of more advanced topics the above skims over, which are in themselves cause for confusion. Renaming or moving files inside CVS is an excellent example, especially if the historical changes need to be maintained. CVS simply does not cleanly support the process, and it is entirely likely that people may wish to perform such an action. It is impossible to give examples for each and every scenario as it will differ between solutions and versions used. But many of the complex operations can be addressed with careful and considered changes to the master repository, or through use of administrator privilege or tools.

Bear in mind too that no solution is without its problems, and it is encouraging to note that there are efforts underway in the open-source community to address many of them. Subversion, for example, seeks to be a replacement for CVS, addressing many of the identified shortcomings.

So now what?

Most, if not all, users of computer systems will have heard of version control and be aware of the concepts outlined above. But actual enthusiasm for version control remains with a minority.

This is concerning. Because the advantages are so hugely in favour of the approach, there must be other reasons for it not being taken up. In my time as a proponent of version control, I have heard some great excuses, but none ever stands up to scrutiny.

Here are some of the steps that can be taken, problems that are often raised, and suggested rebuffs to criticism or attitudes.

Setting up your own solution

It is all very well to write an article espousing the argument that version control is the best approach to use, but what if there is no solution available at your place of work? What if the administration team has not implemented it, or does not have the time to do so?

So what is the solution? If you are convinced that version control effectively implemented could make your job easier, more enjoyable and more productive, it is important to do something about it. Of course, you could go on as you are, accepting all of the risks I have outlined above. But if you have got this far then you probably already know it is the wrong attitude. So there are two options:

  • Do it yourself
  • Nag your SA/Manager to ensure it gets done

Assuming the second option is just not likely, it might be a surprise to hear that doing it yourself is actually rather easy. Even if you install the software on your own computer rather than a backed-up server, it is a major step forward over running a golden repository. You get all of the benefits outlined, and the only thing that needs to be done is to ensure backups are being made - and that can be automated.

The first place to look for the easiest and quickest solution, given you are probably on a PC platform, is Cygwin. Download the Cygwin toolset, and get the CVS server installed. If you would rather have an easier life, then check out the CVSNT project ( freshmeat.net/projects/cvsnt/ ).

If you have access to a Linux server, then it will probably come with a CVS server either as a package or installed by default. Check it out, and get it installed.

The critical thing though is to make sure the repository you define is properly backed up. If the repository is on a network share then so much the better: backups will be taken care of automatically, at the cost of slightly slower check-outs and check-ins.
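
Automating that backup need not be complicated. As a minimal sketch, assuming the repository is an ordinary directory on disk, the following Python function writes a timestamped compressed archive of it. The function name and the paths shown are purely illustrative and are not part of CVS or any other product.

```python
import tarfile
import time
from pathlib import Path


def backup_repository(repo_dir, backup_dir):
    """Archive repo_dir into a timestamped .tar.gz inside backup_dir.

    Both arguments are hypothetical example paths; point them at your
    own repository and at a location that is itself backed up.
    """
    repo = Path(repo_dir)
    if not repo.is_dir():
        raise FileNotFoundError(f"repository not found: {repo}")

    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target_dir / f"{repo.name}-{stamp}.tar.gz"

    # Store the tree under its own directory name so it restores cleanly.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(repo), arcname=repo.name)
    return archive
```

A call such as backup_repository("/var/cvsroot", "/mnt/backups/cvs") could then be run nightly from a scheduler. It is probably wise to run it at a quiet time, since an archive taken while somebody is mid-commit may capture a half-written state.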

Integrating with an IDE

Many software development tools come integrated with various version control systems. However it is important to remember that these tools are there to (in theory) MAKE YOUR JOB EASIER. If the tools that are provided in the package are not doing that then you need to either review how you are using those tools, or CHANGE THE TOOLS.

It is also not unheard of that an organisation might prefer a version control tool that differs from the one supported by the IDE currently in use. Some version control products are proprietary, and cannot be integrated with certain IDEs. Whilst this might be an initial inconvenience, it is important to separate the software development process from the version control process.

IDEs and version control are closely related, but if the hill to be climbed is “the tools I use don’t work very well”, then review the tools and take stock of the situation. Version control IS EASY, but some tools make it harder than others. The conjecture put forward here is that version control is very important, but that the integration with your IDE is actually just a “nice to have”.

The worst case scenario is that you follow this sort of process:

  • Crank up your source control client
  • Check out the code you need
  • Start up your IDE
  • Do your thing. Develop, debug, drink coffee.
  • Save and test your changes
  • Go back to your source control client
  • Check in your changes, tag your release

Granted, it is not as easy as “Tools->Source Control->Commit”. But the next time you need to roll back to a particular release and find you cannot, because you ditched the integrated tool set over all the hassle, think of how few extra steps a pure client to the version control system would have required.
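
The version control steps in the list above amount to three client commands. As a rough sketch, assuming the command-line cvs client, the Python function below only builds the command lines for one check-out, commit and tag cycle. The repository root, module name, message and tag are hypothetical placeholders, and nothing is actually executed.

```python
import shlex


def cvs_cycle_commands(cvsroot, module, message, tag):
    """Return the cvs command lines for a check-out / commit / tag cycle.

    The work in the IDE happens between the first command and the second.
    The commit and tag commands are meant to be run from inside the
    checked-out working directory.
    """
    return [
        ["cvs", "-d", cvsroot, "checkout", module],
        ["cvs", "commit", "-m", message],
        ["cvs", "tag", tag],
    ]


if __name__ == "__main__":
    # All four values are made-up examples.
    commands = cvs_cycle_commands(
        "/var/cvsroot", "myproject", "Fix login timeout", "REL_1_2"
    )
    for command in commands:
        print(shlex.join(command))
```

Each list can be handed to subprocess.run(command, cwd=working_dir, check=True) if you would rather script the cycle than type it. Keeping the commands as data makes them easy to review before anything touches the repository.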

Overcoming bad experiences

It is of course possible that a user will have had a bad experience with version control in the past, so it is worth spending time analysing precisely where the difficulty lies. Many users complain it is simply “confusing” or “just gets in the way”.

It can actually be a fair accusation to level against version control. Many systems have problems, lack certain features, or are frustratingly limited in a particular regard. But taking a holistic view is important: whilst problems may abound, the attitude and approach of version control is still worth working with. As outlined earlier, the benefits of even basic version control outweigh the alternatives.

The view that version control can be confusing or obstructive is actually understandable - I have seen some convoluted version control structures and mechanisms in the past that were complicated for no particularly good reason. With the advent of plenty of excellent and mature GUIs, particularly in the open-source world, as well as excellent reference manuals, the view that version control “gets in the way” should not stand up to some solid re-evaluation of perceptions.

Confusing? Take another look over how you are developing and think about what you would personally like to see in a version control system. The chances are that what you would like to happen is already available to you if you dig into the manuals.

If it “gets in the way”, ask yourself why it is that the larger a project gets, the more likely it is to use version control. It is necessary because it simplifies the process of managing software evolution, and with a bit of effort actually makes massively complex development projects easy to manage. There is no inherent difference between your project and, say, the Linux kernel, Apache web server or Firefox browser beyond the number of users that are working on it. Most, if not all, of their needs are the same as yours.

Do try and analyse where the bad experience emanates from, as many are self-perpetuating. Merging is potentially complicated, but it should be clear that the benefits outweigh the problems, and many of the complications can be overcome by simply assessing the matter afresh. Commit more often, and merge in changes from others more frequently.

“It’s different from my previous experience”

A user may join an organisation to find that a version control solution they previously used cannot be deployed, probably because the organisation has made a decision of its own. Whilst you might have lots of experience with the other solution, it is important not to lose sight of the benefits any version control solution has over the alternatives.

Go back to basics. Revisit the critical things any version control system needs to be able to do, and step by step, build up your understanding of the solution you have to use. Check in code? Check. Merge changes? Check. Branch a release? Check. View your local working copy as a part of the file system? Nope, sorry dude. That’s not there, but have a look around the support forums - particularly of open source version control - and you might be surprised to find something exists to help you out.

Possessiveness

It is only natural that any significant, successful effort by an individual results in feelings of pride. Having spent days on a particular project, a developer will feel proud of the result, particularly if it solves a problem in an elegant and graceful way.

Pride is of course not a problem, but it can - particularly with a golden-repository model of “control” - manifest itself as possessiveness over particular blocks of code: “It’s fine as it is, you can’t faff with it!” If development efforts are highly segregated and self-contained, it might actually be a logical attitude, but it is important to ensure that it is placed in context. Developers work for an organisation, and therefore the code is no more theirs than the computer on their desk.

As well as undermining any team attitude and group ownership, such a guardian approach to a particular section of code works to undermine other important aspects of development, such as code reviews and external critique. It might be that such criticism is intimidating, for fear of demonstrating some misunderstanding or lack of ability with a particular toolset or language.

By adopting a version control model, with frequent check-in and no divisions, delimitation and ownership become blurred, and for the better. “Many eyes make all bugs shallow” is a famous open-source mantra, and such an approach makes it possible to learn from others and thereby improve the abilities of the team and the end result in the process.

Bringing order to the mayhem

Even if you are already safely using version control solutions, it might be that the approach in use creates more problems than it solves. You can usually tell this because you will identify with one or more of the following:

  • Users are cursing and cussing at the hassle of it all.
  • Approaches taken are not “scaling” well. You start seeing some of the alternative approaches outlined at the start of this article appearing /within/ a version control system (multiple directories instead of versioning of files).
  • Requests to fix issues or retrospectively apply changes are met with fear.
  • Things are breaking or strange things are happening a little too frequently.
  • There is a burden associated with just using the system. It is perceived that the costs and hassles associated with it outweigh the benefits.

In such a scenario, it is really important to go back to basics:

  • Are the users fully conversant with the solution in use? Is there a one-page summary available that outlines the commands/tools to be used? If not, could somebody write one?

  • Could the users benefit from a little one-on-one coaching? Some users might be support staff that are struggling in a sea of concepts that are alien to their previous experience. A handout outlining the approaches and reasons why could be useful.

  • Are some simple FUBARs occurring? Treating local version control workspaces as “normal” files in a file system is often troublesome (particularly with CVS), as meta-information contained in the directories must not be copied about outside the context of a version control client. In this case, a bit of one-on-one coaching can help immensely.

  • Are some less obvious mistakes occurring? Are developers making local copies and backups? Unintended commits? Rollbacks made by overlaying locally made copies on a later revision, rather than rolling back? Committing unrelated work as one unit? Are commits being missed? These imply a level of understanding, but the system is being circumvented, thereby losing many - if not all - of its benefits. Some more detailed education, training, reference sheets and coaching could be very beneficial here.

  • Is there a naming convention in place? Before running away in horror at the implied bureaucracy of that statement, remember that a naming convention can be as simple as incremental numerics and plain English descriptions. Simple standardisation can help. If a request tracker or bug tracking solution is in use, it might make sense to incorporate its identifiers.

  • What hassles do users feel they face? If they are associated with merging and branching (they often are), then sit down and agree an approach or solution. If merging is perceived to be a hot potato, then examine the process. Is one person always left holding the can, and dropping changes they do not understand or do not feel are necessary? Is the strategy working out?

  • There is no such thing as a free lunch. Working with a version control system WILL take up some of your time, and require some additional learning or support. It might even be a bit of a burden initially. However, this minor day-to-day burden should be put in the context of being able to rapidly recover from problems and save a LOT of associated hassle. Over time, of course, familiarity and understanding negate much of the burden.

  • Above all remember that the reason version control exists is to MAKE THE DEVELOPER’S JOB EASIER. It is supposed to save time. It is supposed to be worth any hassle for the multitude of benefits it can bring.
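
The naming convention suggested in the list above can be captured in a few lines. The sketch below is one possible scheme, not a standard: it joins a hypothetical tracker identifier, an incrementing number and a plain English description into a tag name. It assumes CVS-style tag rules (a leading letter, then only letters, digits, hyphens and underscores); other systems may be more permissive.

```python
import re


def make_tag(tracker_id, sequence, description):
    """Build a tag name from a tracker id, a counter and a description.

    The three-part layout is an illustrative convention only.
    """
    # Collapse anything that is not a letter or digit into an underscore.
    words = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_").lower()
    tag = f"{tracker_id}_{sequence:03d}_{words}"

    # Assumed CVS-style rule: a leading letter, then letters, digits, - or _
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_-]*", tag):
        raise ValueError(f"not a valid tag name: {tag!r}")
    return tag
```

For instance, make_tag("BUG1234", 3, "Fix login timeout") gives "BUG1234_003_fix_login_timeout", which sorts sensibly and can be traced straight back to the tracker entry. An identifier that starts with a digit is rejected, which is a prompt to agree a prefix.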

When it all works well…

Integrating with large numbers of developers, reacting to requests to develop features, managing releases effectively, and pushing forward with the new features that are necessary. A fluid development process that enables users to work at their own pace, on their own tasks, without interference and complication. Where the tools do not get in the way, and the benefits of adopting an approach result in better productivity and co-operation.

New ways of working can be achieved too. With an effective version control solution, agile methods using continuous (or automatic) build systems become possible, and it’s much easier to work remotely when the process is easy to manage, and the progress you’re making is visible to all.

It sounds like some picture of perfection, but it is achievable. Effort is, of course, required to understand the subtleties of particular approaches and accept a team-based decision, but much like a scratch on a record can ruin the enjoyment of a piece of music, misunderstanding parts of a version control solution can ruin the best efforts of all involved.

In summary…

Whilst writing this article, I have drawn on a number of environments that I have supported or worked with to make the main point: version control really is supposed to make a developer’s job easier. Much like Unix, the systems used were actually written by developers, for developers. It really is not a management curse or a bureaucratic nuisance, despite what you may think.

The core of the problem in my experience is simply that users are often left to their own devices, and feel that the costs of dealing with a source control system massively outweigh the benefits they might achieve. There are features to add, bugs to fix: version control just is not a priority at the outset, and before long working without it becomes the standard practice.

So the one thing I would encourage any reader to do is just take a step back. Look at the way you work. Ask yourself where the problems are that stop you doing your job more effectively. Chances are there are plenty of ways in which version control really can help you work more efficiently. But such benefits do not come for free. For all the GUI innovations in the last few years, for all the ease-of-use improvements that have been added, there are still conceptual issues to understand, working practices to review, and - regrettably in many ways - product specific concepts to appreciate.

So spend a little time working to understand the basic principles, and how they are implemented. When faced with a particular challenge or request, take a step back and think how you would like it to work. Then dig out the manual and read around the concepts.

Given time, and not a little effort, even the most obscure system can be mastered. Remember how you felt when first learning a new programming language? So much to master, so little time to do it. But you got there in the end. So it is with any tool set, and version control is just that: a tool to be used, but one that really can make your job easier.

More links

Sourceforge CVS guide

About the Author

Richard Leyton is an independent consultant, specialising in the field of database systems, and has a wide amount of experience implementing open source solutions in commercial environments. He has over ten years of experience supporting development and production technology environments, and is always very happy to discuss engagements of any size or complexity. For more information, please visit http://www.leyton.org

This article is Copyright © 2005 Richard Leyton, all rights reserved. It is not to be re-distributed or used without permission. Please e-mail versioncontrol@leyton.org or visit http://www.leyton.org for further information.

Acknowledgements

I’m very grateful for the comments and suggestions I have received from many people whilst writing this article, most notably the following: Richard Burgess, Ivan Cronyn, Derek Baum, Richard Jeans, Frances Flood, Matthew Proctor, Michael Shaw and Keith Wall.

$Id: versioncontrol.txt 1.20 2005/06/08 09:51:54 rleyton Exp rleyton $

A formatted PDF version is available here

You can of course link to this article using the above URL, or might prefer simply to use: http://www.leyton.org/versioncontrol - which will redirect to the above URL.

3 Responses to “Version control: A misunderstood technology”

  1.
    Amy Chong Yew ee Says:

    This article has been very enlightening and not only help me to understand version control system, but it also makes me realize how important to reflect on how we have been doing, especially on managing our work.
    Great job!

  2.
    Richard Says:

    Thanks for your comments - That’s made my morning (that’s involved last minute Christmas shopping, in the rain, so not exactly enjoyable!).

    Please feel free to pass on the website, or the PDF, and if you’ve any questions, comments or suggestions for improvement, I’d be only too happy to try and help out.

  3.
    Cronan Says:

    Any chance of a more detailed article targetted on Subversion, given that it gets such a brief mention here?
