2010-06-22

Subversion Backup-Restore

So we are now mostly migrated from CVS to Subversion. And we are (mostly) happy.

We have around 150,000 revisions split across two repositories (125,000 and 25,000 respectively). Performance is fair, and much better than CVS in any event. We run it on Windows Server 2003 on a virtual server claiming 2 x 2.4GHz Xeons and 3GB of RAM.

The "mostly" stems from the trouble people have run into now that they have started branching (the trouble typically appears at merge time), and from a few weird things we see in the Eclipse Subversive plugin. Still, much better than CVS IMHO.


Anyway, this blog entry is about our backup-restore strategy for Subversion.

We have two servers; the primary in active use, and a secondary that is used for daily backup restores.

Every afternoon at 16:00, a backup script (verify-repositories.rb) runs on the primary. It first verifies the integrity of the repositories (this takes around 10 hours!). If the repositories are sound, a hot backup is taken while Subversion remains available. This nightly backup is then copied to the secondary server.
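The verify-then-copy gating could be sketched in Ruby roughly as below. This is not verify-repositories.rb itself: it assumes `svnadmin hotcopy` as the hot-backup mechanism, a fresh (for example date-stamped) destination folder per night, and a hypothetical injectable `runner` so the logic can be exercised without Subversion installed.

```ruby
require 'open3'

# Default runner: execute the command and report [success?, stderr text].
DEFAULT_RUNNER = lambda do |*cmd|
  _out, err, status = Open3.capture3(*cmd)
  [status.success?, err]
end

def verify_command(repo)
  ['svnadmin', 'verify', '-q', repo]
end

# Assumption: dest_root is a fresh folder; svnadmin hotcopy expects the
# destination repository folder not to exist yet.
def hotcopy_command(repo, dest_root)
  ['svnadmin', 'hotcopy', repo, File.join(dest_root, File.basename(repo))]
end

# Returns a list of [repo, error_text] pairs; an empty list means every
# repository verified cleanly and was hot-copied.
def nightly_backup(repos, dest_root, runner = DEFAULT_RUNNER)
  failures = repos.each_with_object([]) do |repo, acc|
    ok, err = runner.call(*verify_command(repo))
    acc << [repo, err] unless ok
  end
  # A corrupt repository means no new backup, so the last good one survives.
  return failures unless failures.empty?

  repos.each do |repo|
    ok, err = runner.call(*hotcopy_command(repo, dest_root))
    failures << [repo, err] unless ok
  end
  failures
end
```

Keeping the verification and the copy in one place makes it easy to mail the failure list instead of silently copying a damaged repository to the secondary.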

In addition to the nightly (full) backups, each commit to Subversion is written to an individual dump file on the primary server by a commit hook (the hook runs as part of every commit, so we do not want it to depend on the secondary server). These are basically incremental backup files.
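A minimal sketch of such a post-commit hook, assuming one incremental `svnadmin dump` per revision written below a hypothetical folder (`BACKUP_DIR`) on the primary server. Subversion runs post-commit with the repository path and the new revision number as its two arguments; the zero-padded file naming is an illustrative choice, not the actual script.

```ruby
# Hypothetical location of the per-commit dumps on the primary server.
BACKUP_DIR = 'E:/svn-incremental'

# Zero-padding keeps the files in revision order when listed alphabetically.
def dump_file_for(repo, rev, backup_dir = BACKUP_DIR)
  File.join(backup_dir, File.basename(repo), format('%08d.dump', rev.to_i))
end

# Dump just this one revision, as a delta against the previous one.
def dump_command(repo, rev)
  ['svnadmin', 'dump', repo, '-r', rev.to_s, '--incremental', '-q']
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  require 'fileutils'
  repo, rev = ARGV
  target = dump_file_for(repo, rev)
  FileUtils.mkdir_p(File.dirname(target))
  # stdout of svnadmin goes straight into the dump file.
  ok = system(*dump_command(repo, rev), out: target)
  exit(ok ? 0 : 1)
end
```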

Every morning at 05:00, the secondary server runs the restore script (restore_backup.rb), which does a full restore of the backup set from the same night. It then applies all available incremental backups (those not included in the full backup). This takes around six hours in total.
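The incremental step of such a restore might look like the following sketch. It assumes (hypothetically) that each per-commit dump file is named after its zero-padded revision number, reads the restored repository's youngest revision with `svnlook youngest`, and loads the newer dumps in order with `svnadmin load`. It refuses to continue if the chain of revisions has a gap, since loading past a missing revision would leave the secondary's revision numbers out of step with the primary.

```ruby
# Select the dumps newer than `youngest`, in revision order.
def pending_dumps(dump_files, youngest)
  pending = dump_files
            .map { |f| [File.basename(f, '.dump').to_i, f] }
            .select { |rev, _| rev > youngest }
            .sort_by(&:first)
  # Incremental dumps only make sense as an unbroken chain after `youngest`.
  pending.each_with_index do |(rev, _), i|
    raise "missing dump for r#{youngest + 1 + i}" unless rev == youngest + 1 + i
  end
  pending.map(&:last)
end

def apply_incrementals(repo, dump_dir)
  youngest = IO.popen(['svnlook', 'youngest', repo], &:read).to_i
  pending_dumps(Dir.glob(File.join(dump_dir, '*.dump')), youngest).each do |file|
    # Load strictly in order and stop at the first failure.
    ok = system('svnadmin', 'load', '-q', repo, in: file)
    raise "svnadmin load failed for #{file}" unless ok
  end
end
```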


That is, once the combined full and incremental backups are restored, the secondary server matches the state of the primary as of sometime around lunchtime. At that point, a mail is sent to our administration list containing the status and instructions for switching from the primary to the secondary server. So the manual for a crash recovery is readily available.

In theory we should have no more than half a day of downtime in case of a crash in the morning (where we would have to wait for the secondary to complete its restore). A crash in the afternoon should (in theory) only cost us the 15 minutes it takes to switch the DNS entry.

Worst case, a new full restore will take around a day (if the previous automatic restore cannot be used for some reason).

This will work for a good while. But eventually, as the repositories (and thus the backup and restore times) grow, I will have to add a third server.

For now, I am pretty happy with the setup though. I hope the description and scripts can inspire your Subversion backup/restore scheme.


The backup and restore scripts are placed in the Buildmeister repository at Kenai.

2010-02-19

Rational Deployment

We use a number of IBM Rational products (RAD, RSA, RSM, RDZ, and RDI at last count).

These are supposed to be installed using IBM's Installation Manager (IM). Unfortunately, IM and I have issues.

One Of Us Had To Go...

When I started at the bank, I had to make a new RAD7 (version 7.0.0.3 IIRC) deployment to our 75 developers.

These were early days for IM, so it did not do too well. In fact, its non-interactive installation was pretty much useless - and was not even properly documented until RAD 7.0.0.4. But I tried. I really did.

As a result, when the installation package hit the first 20 developer machines, it failed on almost all of them for various reasons. To be fair, we use a package installation system from Siemens (Sniik) that takes care of launching IM. And Sniik adds complications of its own, so it may not all have been IM's fault.

In any event, we reverted to the old package, and all was fine. All was fine, because it installed RAD7 (the initial release) and then let IM do an update to the latest version.

However, while I was wasting time trying to get IM and Sniik to install RAD in peaceful cooperation, IBM made the next release. This release was so large that IM timed out during the download, which effectively prevented RAD installation on new machines using the old installation package.


This circus had lasted for 7-8 months. Fortunately, I was then allowed to do it my way.

An Alternative Approach

"My way" being an idea fostered at Systematic (Hi Anders!): install the product on one machine, zip it up, and unpack it on all target machines. After the data is unpacked, replace occurrences of the host, domain and user names in all file names and file contents.

As far as I know, the idea was dropped at Systematic because of problems with the Windows Registry.
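The core of the replacement idea can be sketched as a single-pass substitution. The host, domain and user names are hypothetical; matching the longest tokens first keeps a fully qualified host name from being split by a shorter token, and doing a single pass avoids re-replacing text that an earlier replacement just produced. The sketch is case-sensitive, which may not be enough for Windows paths.

```ruby
# Replace every packaging-time token in `text` with its target-machine value.
# `replacements` maps source strings (packaging host, user, domain) to the
# values for the target machine; all names used with it are hypothetical.
def replace_tokens(text, replacements)
  return text if replacements.empty?
  # Longest first: Ruby alternation is leftmost-first, not longest-match.
  pattern = Regexp.union(replacements.keys.sort_by { |k| -k.length })
  text.gsub(pattern) { |match| replacements.fetch(match) }
end
```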

At the bank, the developers do not have administrative rights on their machines. So it was imperative that I could make the solution work for non-administrators; anything that used the Windows Registry would be a problem.

After some digging, it appeared to me that the Rational products used the Registry only for the Installation Manager. In addition, WebSphere Application Server can be set up to start as a Windows service. But those were the only features requiring administrative rights I could find, and as I was happy to drop both, I could continue with the implementation.

Testing The Waters

I made a first simple implementation of the alternative installer using a bash script (for packaging) and a Ruby script for unpacking/replacing.

It worked well, and it did not take long before we could replace the broken Sniik/IM package with a new JB custom installer package for RAD.

As the months passed and more Rational products were deployed with the simple installer implementation, its problems started to show: packages took a long time to create, the installation packages took up a lot of space (because I had to clone the whole thing even for minor installation script changes), and the users did not get good feedback while installing a package.

Then I volunteered to make an installation package for WebSphere Portal Server (WPS), because it took hours to install with IM - and I was pretty sure I could improve on that.

I knew I would have to re-implement the installer - both to address the difficulties with the old implementation, and to allow more flexibility for the problems I knew I would have with WPS. Both parts were rewritten in Ruby.

Rational Installer


The Rational Installer has two parts: the package creation script and the installer script.

create_package.rb

The create package script reads recipes for package creation. A recipe includes the Windows rights the package should be created with, whether to search archives for strings to replace, and a list of folders to include in the package.

For each folder, the recipe describes the strings to search for (host, user, and domain) and which sub-folders to exclude from the package (such as temp folders). It can also specify a list of files/patterns to exclude from string replacement (such as binary files).
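A recipe might be represented roughly like the sketch below. The real recipe format is not shown in this post, so the keys and patterns are hypothetical (the path is borrowed from the installer output further down); the two helpers show how the folder exclusions and the no-replacement patterns could be applied to a path relative to the folder root.

```ruby
# Hypothetical recipe for one package.
RECIPE = {
  'package' => 'was/was-7.0.0.7',
  'search_archives' => false,
  'folders' => [
    {
      'path' => 'c:/udvikler/rational/rad75/SDP/runtimes/base_v7',
      'search' => %w[host user domain],      # which kinds of strings to replace
      'exclude_dirs' => %w[temp logs],       # relative to 'path'; not packaged
      'no_replace' => %w[*.jar *.dll *.exe]  # packaged, but never text-replaced
    }
  ]
}.freeze

# Should this path (relative to the folder root) go into the archive?
def include_path?(rel_path, folder)
  folder['exclude_dirs'].none? { |d| rel_path == d || rel_path.start_with?("#{d}/") }
end

# Should string replacement be attempted on this file?
def replace_strings_in?(rel_path, folder)
  folder['no_replace'].none? do |pattern|
    File.fnmatch?(pattern, File.basename(rel_path), File::FNM_CASEFOLD)
  end
end
```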

The outcome is an archive (ZIP) file and a meta-data file used by the installer script.

Pre- and post-install scripts can be placed next to the recipe; they are copied into the resulting package.

install.rb

The installation script takes a list of packages to install. It can replace existing installations (if the --force argument is used).

It presents an installation summary with the expected installation time to the user. It then checks that the user has the necessary Windows rights and that the destination disks have enough free space, before running any pre-install scripts.
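The disk-space check could be sketched as summing the unpacked size per target drive (as in the "Required space" line of the installer output further down) and comparing the totals with the free space. The `Package` struct and its fields are hypothetical stand-ins for the package meta-data, and how the free space is obtained on Windows is deliberately left out.

```ruby
# Hypothetical stand-in for the per-package meta-data.
Package = Struct.new(:name, :drive, :size_mib)

# Total MiB required per (upper-cased) drive letter.
def required_space(packages)
  packages.each_with_object(Hash.new(0)) { |pkg, acc| acc[pkg.drive.upcase] += pkg.size_mib }
end

# free_mib maps drive letter => free MiB. Returns one message per drive
# that is too small; an empty list means the installation can proceed.
def space_problems(packages, free_mib)
  required_space(packages)
    .select { |drive, need| need > free_mib.fetch(drive, 0) }
    .map { |drive, need| "#{drive}: needs #{need}MiB, only #{free_mib.fetch(drive, 0)}MiB free" }
end
```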

Then the archives are unpacked and necessary text replacements and file renamings are made. The installed folders are given the required Windows rights.
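Renaming is slightly trickier than replacing text, because renaming a folder invalidates the paths of everything below it. One way to handle that, sketched here with hypothetical `renamed` and `rename_tokens` helpers, is to collect all paths first and rename the deepest ones first.

```ruby
# Apply the token map to a single file or folder name, in one pass with the
# longest tokens first, so one replacement cannot feed the next.
def renamed(name, replacements)
  return name if replacements.empty?
  name.gsub(Regexp.union(replacements.keys.sort_by { |k| -k.length }), replacements)
end

# Rename everything below `root` whose name contains a token.
# Returns the number of renamed entries.
def rename_tokens(root, replacements)
  paths = Dir.glob(File.join(root, '**', '*'), File::FNM_DOTMATCH)
             .reject { |p| %w[. ..].include?(File.basename(p)) }
             .sort_by { |p| -p.count('/') } # deepest paths first
  paths.count do |path|
    new_name = renamed(File.basename(path), replacements)
    next false if new_name == File.basename(path)
    File.rename(path, File.join(File.dirname(path), new_name))
    true
  end
end
```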

Finally it runs any post-install scripts.

In Use

First I manually install the product I need to make a package for. RAD and friends are pretty simple to install manually. For WPS, I use scripts with IM. After IM has run, I have the base material to make my own packages.

Packaging RAD (without WAS) takes something like 6 minutes; packaging WAS takes around 3 minutes. Both skip archive string searching, because I happen to know I can get away with it. For WPS, it is necessary to scan archives for strings to be replaced. Combined with its large size, this results in a packaging time of 18 minutes.

When making an installation package for our RAD deployment, I break it down into one package for RAD itself and one for each of the nested WebSphere installations. This means I can update each WAS, or RAD itself, separately from the other elements by substituting the relevant packages in the installer.

The resulting packages are much smaller than the IM media, partly because our packages only contain what we really want to have installed, and partly because, well, IBM's installation media is just beyond obese.

For example, our RAD 7.5.5 package is 1.9GB, while IBM's media is 3.1GB (7.5) + 2.9GB (7.5.5 update), or 6.0GB in total. Obviously, that alone results in a massive time reduction when copying data to the machines.

Here is an example output from running the installer, installing RAD 7.5.5, WAS 6.0.2.33 and WAS 7.0.0.7 (the im/nodelock package contains the license files):

Loading information for package im/nodelock-rad75+3-servers
Forcing deletion of existing folder c:/udvikler/rational/rad75
Loading information for package rad/rad-7.5.5
Loading information for package was/was-6.0.2.33
Running preinstall script...
Loading information for package was/was-7.0.0.7
Running preinstall script...
-------------------------------------------------------------
Installing 4 packages:
Required space C: 4654MiB, D: 0MiB
Expected installation time 38 minutes
-------------------------------------------------------------
Installing package im/nodelock-rad75+3-servers starting at 2009.12.11 10:24
unpacking installation-manager-certificates to c:/udvikler/rational/rad75
startet at 2009.12.11 10:24, completion expected at 2009.12.11 10:25
replacing strings in 0 files...
renaming 0 files...
setting permissions...
creating excluded (empty) folders
Installing package rad/rad-7.5.5 starting at 2009.12.11 10:24
unpacking rad75 to c:/udvikler/rational/rad75
startet at 2009.12.11 10:24, completion expected at 2009.12.11 10:38
replacing strings in 0 files...
renaming 0 files...
setting permissions...
creating excluded (empty) folders
Making P2 backup
Installing package was/was-6.0.2.33 starting at 2009.12.11 10:35
unpacking was6 to c:/udvikler/rational/rad75/SDP/runtimes/base_v6
startet at 2009.12.11 10:35, completion expected at 2009.12.11 10:39
replacing strings in 5 files...
renaming 0 files...
setting permissions...
creating excluded (empty) folders
Running postinstall script...
Installing package was/was-7.0.0.7 starting at 2009.12.11 10:40
unpacking was7 to c:/udvikler/rational/rad75/SDP/runtimes/base_v7
startet at 2009.12.11 10:40, completion expected at 2009.12.11 10:49
replacing strings in 3 files...
renaming 0 files...
setting permissions...
creating excluded (empty) folders
Running postinstall script...
Installation successfully completed at 2009.12.11 10:46


So RAD 7.5.5 plus two WebSphere instances were installed in less than 30 minutes (22 minutes, from 10:24 to 10:46). Not bad, if I may say so myself :)

The post-install scripts can be used to tweak the installed product further. For instance, WebSphere contains certificates created on the installation host; these can be replaced by new certificates created on the client (we only do it for WPS though).

I also use the pre- and post-install scripts together to back up and restore WebSphere's profile database, so profiles survive a RAD/WAS reinstallation.
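A minimal sketch of such a pre/post-install pair, assuming the profile data lives in a `profiles` folder under the WebSphere installation and that a backup folder outside the installation is available. Both names are hypothetical; the actual scripts are not shown here.

```ruby
require 'fileutils'

# Hypothetical location of the profile data, relative to the WAS install.
PROFILE_REL = 'profiles'

# Pre-install: copy the profiles aside before the old installation is removed.
def backup_profiles(install_dir, backup_dir)
  src = File.join(install_dir, PROFILE_REL)
  return false unless File.directory?(src)
  FileUtils.rm_rf(backup_dir)
  FileUtils.mkdir_p(backup_dir)
  FileUtils.cp_r(src, backup_dir) # creates backup_dir/profiles
  true
end

# Post-install: put the saved profiles back over the freshly unpacked ones.
def restore_profiles(install_dir, backup_dir)
  saved = File.join(backup_dir, PROFILE_REL)
  return false unless File.directory?(saved)
  FileUtils.mkdir_p(install_dir)
  FileUtils.rm_rf(File.join(install_dir, PROFILE_REL))
  FileUtils.cp_r(saved, install_dir) # creates install_dir/profiles
  true
end
```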

Push Or Pull

So I replaced a "push" deployment model with a "pull" variant; the developer has to manually install the Rational tools.

There were some concerns about this from the outset, and understandably so: whenever we deploy a new Rational product or version, each "receiving" developer has to spend ~30 minutes installing it. Before, that happened at night at no cost to the developer.

Of course, what everybody tends to forget is that almost everybody has had to redo their installation during working hours for one reason or another. And that pain has now been reduced from half a wasted day to something that can be done over lunch.

The biggest advance, seen from my chair, is that I can do truly fine-grained staged deployments. Before, I had two stages: the first 20 developers, and then the rest.
Now I can ask individuals to update and test, and if they are happy, ask the remaining developers to install when it fits into their schedule. With two stages, there was always someone with a looming deadline who did not appreciate "push" deployments. That is not an issue anymore.


Scripts

The scripts will be made available soon(ish). I want to wait until Oracle/Sun has decided what to do about Kenai. I may have to switch to SourceForge.

2009-12-23

Busy times...

Yikes, Christmas is upon us. And I have not got around to posting anything since late summer.

Apologies! It is not that this blog is dead (it just smells a bit peculiar), but I have been hung up on a number of high priority issues.

I (seriously) hope the issues will be resolved early in the new year, so I can blog a little again.

In the pipeline (yes, let's make some vapor-blogging):
  • Enterprise deployment of IBM Rational products.
    As the name suggests, the "rational" approach would probably be to use IBM's installer. But no - I just had to go and invent our own deployment strategy. It is slim, it is fast, and we have full control of the deployment. What's not to like?
  • Authorized Job Service
    We do not use our Continuous Integration builds in production for various reasons. Instead we have a manually invoked build server. To make it work I needed a way to authenticate and authorize users between servers.
    And as I couldn't figure out how to make a single sign-on solution that would use the user's existing Windows session credentials (with JAAS), I had to roll my own using SSPI.
    I suspect this will be a blog entry where someone points out the obvious to me...
  • Our SVN backup and restore setup.
    The repositories are backed up at night and restored to a different server during the day. Lovely stuff. I hope we'll never have to make use of it - but if we do, I am much more comfortable with it than with our previous CVS setup.
  • And before the new year expires, something about our new documentation system (it needs to mature a bit yet).
    We use the Mylyn WikiText plugin to write documentation placed in the repository with the source. On project builds, the documentation is translated to HTML. We have extended the syntax (Confluence) a bit to add extra features...
Hopefully there will be more than four postings next year. But consider this an informal contract about deliveries for the next year :)

Have a nice one!

2009-08-26

Migrating from CVS to SVN

We have started a migration of our projects from CVS to SVN.

Choice of VCS and RAD plugin

We looked at other VCS tools and would have liked a distributed VCS. But that would have required retraining our developers in a different configuration management model, so we went for Subversion instead.

For the same reason, we chose the Subversive plugin for RAD instead of the Tigris Subclipse plugin. While Subclipse is truer to how Subversion works, Subversive gives the developer behavior that is much more like the CVS Team integration in RAD.

There are still a few minor bugs in the plugin, but on the whole it is quite usable.

The biggest change for our developers will be the new tagging behavior: previously they could tag projects in the Project Explorer, but now they need to tag in the Subversion Repositories Explorer (because only it knows about the folder hierarchy in Subversion).


Repositories and Performance

Today we have 18 CVS repositories on a single server (in use by some 100 developers and Hudson).

While it has always been stable and running without problems, its performance is not good. Synchronization time for most projects is a minute or more, which is aggravating.

We expect Subversion to give us a much faster VCS experience - and that is certainly true for the 50-odd projects that have already been migrated.

In the new setup, we want to use fewer repositories, making it easier to move projects between departments when that is necessary.

However, as we have not been able to find any good data on Subversion performance, I am a little concerned about how the server will perform once all the projects have been migrated to SVN, into fewer repositories.

Fortunately, the ability (as admins) to move data with full history between repositories will save us (that is the assumption, at least :)
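Such a move is typically done with a dump/filter/load pipeline. The sketch below assumes the project lives under a single path, that its parent (department) folder already exists in the target repository, and that the original is removed with `svn delete` afterwards. Note that `svndumpfilter` can fail if the project was ever copied from a path outside the filter. The helper names are hypothetical.

```ruby
require 'open3'

# The three stages of the pipeline: dump everything, keep only the project's
# paths (dropping revisions that become empty), and load into the target.
def move_project_commands(src_repo, project_path, dst_repo)
  [
    ['svnadmin', 'dump', '-q', src_repo],
    ['svndumpfilter', 'include', '--drop-empty-revs', '--renumber-revs', project_path],
    ['svnadmin', 'load', '-q', dst_repo]
  ]
end

# Runs dump | svndumpfilter | load; true only if every stage succeeded.
def move_project(src_repo, project_path, dst_repo)
  statuses = Open3.pipeline(*move_project_commands(src_repo, project_path, dst_repo))
  statuses.all?(&:success?)
end
```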

If many projects in few repositories becomes a problem, we can introduce more repositories. If load on the server becomes too high, we can use redirection in Apache to move some repositories to other machines.

So we feel safe at the moment (in our glorified ignorance).

Repository Layout

In CVS, a number of modules constitute a single RAD project. That makes it a (small) nuisance to check out a project. It may look like:

REPO_DEPT_A/
  projectA.cfg
  projectA.ear
  projectA.ejb
  projectA.web

REPO_DEPT_B/
  projectB.ear
  projectB.ejb
  projectB.web


In Subversion, we address that by introducing a new logical layer in the folder hierarchy. We also introduce a layer for the departments:

REPO/
  DEPT_A/
    projectA/
      trunk/
        projectA.cfg
        projectA.ear
        projectA.ejb
        projectA.web
      branches/
      tags/
  DEPT_B/
    projectB/
      trunk/
        projectB.ear
        projectB.ejb
        projectB.web
      branches/
      tags/


The layout is ensured with a commit hook.


Subversion Hooks

We have a number of hook scripts implemented in Ruby. The scripts are launched from DOS batch (.bat) files.

Layout Validator

The Layout Validator ensures the REPO/DEPT/LOGICAL/trunk|branches|tags/ hierarchy in Subversion.
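The layout rule itself can be sketched as a handful of path patterns. In a pre-commit hook the touched paths would come from `svnlook changed -t TXN REPOS`; this sketch assumes the path starts in the fifth column of each output line and that directories are printed with a trailing slash. The rule set is an illustration, not the actual hook-layout-validator.rb.

```ruby
# Allowed shapes for any path touched by a commit.
LAYOUT_RULES = [
  %r{\A[^/]+/\z},                                      # DEPT/
  %r{\A[^/]+/[^/]+/\z},                                # DEPT/LOGICAL/
  %r{\A[^/]+/[^/]+/(?:trunk|branches|tags)(?:/.*)?\z}  # DEPT/LOGICAL/trunk|branches|tags/...
].freeze

def layout_ok?(path)
  LAYOUT_RULES.any? { |rule| rule.match?(path) }
end

# Returns the offending paths from `svnlook changed` output.
def layout_violations(svnlook_changed_output)
  svnlook_changed_output.each_line
                        .map { |line| line.chomp[4..-1].to_s }
                        .reject(&:empty?)
                        .reject { |path| layout_ok?(path) }
end
```

The hook would print the violations to stderr and exit non-zero, which makes Subversion reject the commit and show the message to the user.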

Launcher pre-commit.bat, Script hook-layout-validator.rb

Commit Allower

In general, all employees have read access to Subversion. But our build users are not allowed to commit (because a commit made by a build user would hide who actually started the job it runs).

The users who are not allowed to commit are listed in the script itself.

Launcher start-commit.bat, Script hook-commit-allower.rb
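A minimal sketch of the commit allower. Subversion passes the repository path and the committing user name as the first two arguments to start-commit (newer versions pass further arguments); the blocked account names below are made up.

```ruby
# Hypothetical build accounts that may read but never commit.
BLOCKED_USERS = %w[build-hudson build-release].freeze

def commit_allowed?(user)
  !BLOCKED_USERS.include?(user.to_s.downcase)
end

if __FILE__ == $PROGRAM_NAME && ARGV.size >= 2
  user = ARGV[1]
  unless commit_allowed?(user)
    # When the hook fails, its stderr is passed back to the client.
    warn "User #{user} is a build account and may not commit."
    exit 1
  end
end
```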

Repository Locker

When doing administrative work directly on a repository (imports, for example), we block write access for the developers.

The script looks for a file named jb.txt in the repository's root folder. If it exists, commit is prevented and the text in the file is presented to the user.

Launcher start-commit.bat, Script hook-repository-locker.rb
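The lock check is small enough to sketch in full. This assumes the hook receives the repository path as its first argument (as start-commit does) and that jb.txt sits directly in the repository folder on disk; the fallback message is made up.

```ruby
LOCK_FILE = 'jb.txt'

# Returns the lock text if the repository is locked, otherwise nil.
def lock_message(repo_path)
  lock = File.join(repo_path, LOCK_FILE)
  File.file?(lock) ? File.read(lock).strip : nil
end

if __FILE__ == $PROGRAM_NAME && ARGV.size >= 1
  message = lock_message(ARGV[0])
  if message
    # Shown to the committing user because the hook exits non-zero.
    warn(message.empty? ? 'Repository is locked for maintenance.' : message)
    exit 1
  end
end
```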


Integrity Checking

Each night we run a cron job that iterates over all repositories and runs this command:

svnadmin.exe verify -q $repo

The output is sent to a mailbox that is (more or less) constantly monitored.

So if there is data corruption on the server, we will know within a short time. Unlike with CVS, where we would only find out when someone tried to build the project containing the corrupted file.


Per-Commit Backups

The next thing I will be looking at is writing per-commit diff files to another drive.

That should allow us to get up and running in a hurry if the primary drive of the main server is hit by logical errors.

Interesting, and an improvement over CVS, where we only had the nightly backups to rely on...