Your legacy system may be in generally good shape, apart from the occasional undesirable issue. However, even small problems can limit user adoption and revenue if they are left unaddressed. If small frustrations have led you to consider a full rewrite, we would advise you to think again: rewrites often run into exactly the same issues that were encountered the first time round.
A better solution is usually simply to fix the original code. This can seem a daunting task if error messages are cryptic and the code appears incomprehensible; however, this is where our experience with legacy systems comes into play. We have the expertise to deliver the fixes you need quickly, without compromising existing functionality.
We are generally asked to fix bugs after in-house repair attempts have already been unsuccessful. This means we are used to dealing with complex problems that may be difficult to trace. We believe that the most effective weapon in any bug hunt can be expressed in one word: visibility. Bugs are difficult to track when there are dark areas where the code appears inaccessible or is not well understood. Our first line of attack is usually to open up those areas to inspection and gain a full understanding of the overall system mechanics. This may involve developing specific tools or setting up customised environments.
An early goal of any bug investigation should be to determine the minimal test case that reproduces the error. Sometimes the steps to reproduce the error are already known. In many cases, however, the error appears intermittently, and it is not clear what sequence of events leads to the unexpected behaviour. In these circumstances, attempts to reproduce the error via the user interface frequently prove unsuccessful. For such problems we may use more technical methods: in essence, inducing the error by intercepting the code as it executes, then working backwards to determine practical reproduction steps.
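Here is a minimal Python sketch of what intercepting the code can look like in practice, assuming the application is written in Python. The `record_calls` decorator and the `apply_discount` function are hypothetical. The decorator records each call's arguments and outcome. When an intermittent failure occurs, the inputs that triggered it can then be read back and turned into reproduction steps.

```python
import functools

# Hypothetical in-memory trace; a real system would more likely write to a log file.
CALL_LOG = []

def record_calls(func):
    """Record the arguments and outcome (result or exception) of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"func": func.__name__, "args": args, "kwargs": kwargs}
        try:
            result = func(*args, **kwargs)
            entry["result"] = result
            return result
        except Exception as exc:
            entry["error"] = repr(exc)
            raise  # re-raise so the application behaves exactly as before
        finally:
            CALL_LOG.append(entry)
    return wrapper

@record_calls
def apply_discount(price, percent):
    # Hypothetical function that fails only for certain inputs.
    if percent > 100:
        raise ValueError("discount exceeds 100%")
    return round(price * (1 - percent / 100), 2)

def failing_calls():
    """Return the recorded calls that raised: candidate reproduction steps."""
    return [entry for entry in CALL_LOG if "error" in entry]
```

For example, suppose `apply_discount(50.0, 10)` succeeds and `apply_discount(50.0, 150)` then fails. `failing_calls()` returns a single entry whose `args` are `(50.0, 150)`. In a real system the trace would usually be limited to the suspect areas to keep the overhead down.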
Establishing reproduction steps is often the hardest part; once a test case is known, a solution frequently follows naturally. However, it is important to be sure how any code changes will affect other features and the application as a whole. We do not want to fix one bug only to introduce another. If a reasonably comprehensive test framework already exists, this can simply be a matter of extending the test coverage. Otherwise we need to determine which areas are likely to be affected, then (a) work to reduce those areas, normally by implementing the fix in as targeted a way as possible, and (b) develop sets of tests for the areas that may be influenced, to make sure the code changes do not produce side-effects.
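Where no test framework exists, one common way to build such tests is a characterisation ("golden master") test. You record what the code currently does for a range of inputs before changing it. Afterwards, you check that only the intended outputs have changed. Below is a minimal Python sketch. `calculate_shipping` and the sample inputs are hypothetical.

```python
def calculate_shipping(weight_kg, country):
    # Hypothetical legacy function whose current behaviour we want to pin down.
    base = 5.0 if country == "GB" else 12.0
    return base + 1.5 * max(0, weight_kg - 2)

def capture(func, inputs):
    """Record func's current output for each input tuple (the 'golden master')."""
    return {args: func(*args) for args in inputs}

def changed_outputs(func, golden):
    """Return {inputs: (recorded, actual)} for every input whose output has changed."""
    diffs = {}
    for args, expected in golden.items():
        actual = func(*args)
        if actual != expected:
            diffs[args] = (expected, actual)
    return diffs

# Hypothetical sample of inputs covering the areas the fix may influence.
INPUTS = [(1, "GB"), (5, "GB"), (1, "FR"), (10, "FR")]
```

In use, `golden = capture(calculate_shipping, INPUTS)` would be run before the fix. Afterwards, `changed_outputs(fixed_function, golden)` should list only the inputs the fix was meant to change. Anything else it reports is a candidate side-effect.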
Bugs that are reported by users, or discovered by operations or admin staff, usually require more initial analysis, because the issue must be reproduced in a testing environment before any solution can be proposed. Reproducing a reported issue is often the most difficult and time-consuming part of the fix: user reports may be unreliable, and it may not be clear exactly what steps were taken to trigger the issue. We use a range of inductive and deductive methods, including log file inspection, process bisection, system reduction (“delta debugging”) and various code analysis techniques, to get to the bottom of the problem as quickly as possible.
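System reduction works by repeatedly removing parts of a failing input while the failure still reproduces, leaving a much smaller test case. Below is a simplified Python sketch of the idea behind Andreas Zeller's ddmin algorithm. This variant only tests complements, that is, the input with one chunk removed. The `fails` predicate is an assumption: in practice it would run the program or a test against the candidate input and report whether the bug appears.

```python
def ddmin(items, fails):
    """Shrink items to a 1-minimal list for which fails() still returns True."""
    items = list(items)
    if not fails(items):
        raise ValueError("the full input must reproduce the failure")
    n = 2  # number of chunks to split the input into
    while len(items) >= 2:
        n = min(n, len(items))
        size = len(items) // n
        chunks = [items[i:i + size] for i in range(0, len(items), size)]
        for i in range(len(chunks)):
            # Try the input with chunk i removed, preserving the original order.
            complement = [x for j, chunk in enumerate(chunks) if j != i for x in chunk]
            if fails(complement):
                items = complement
                n = max(n - 1, 2)
                break
        else:
            # No chunk could be removed: stop at single elements, else split finer.
            if n == len(items):
                break
            n = min(n * 2, len(items))
    return items
```

For example, `ddmin(list(range(10)), lambda xs: 3 in xs and 7 in xs)` reduces ten inputs to `[3, 7]`, the only two that matter for this failure.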
[This] is a small website for a local home furniture retailer… [The client] recently discovered that the online purchase process was not functioning, after a lengthy period without any online sales. It is not clear how long the system has been non-functional, as online sales have historically been infrequent. It may be that online purchases never worked correctly, so understandably [the client] is keen to get to the bottom of the problem as quickly as possible… In initial tests, the UI appeared to display the purchase confirmation page after credit card details had been entered, but with an additional message: “An error occurred. Please contact support.” The order number and payment amount are blank, and there is no opportunity to retry the payment… The relevant modules appear overly complex, with substantial code repetition. A first step towards gaining a clear picture of the process may simply be to tidy these modules…
Payment was found to execute successfully when a user paid as a guest (i.e. not logged in). When a logged-in user attempted to make a payment, a separate code branch was executed, which involved an SQL query containing a syntax error. This error was not caught and propagated to the confirmation page… Unfortunately, the system requires sign-up as part of a guest sale, which means a customer’s first purchase would succeed, but none after that…
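A lesson from this case is that testing only the guest path would never have exposed the fault. Below is a minimal Python sketch of a check that exercises both branches, using the standard-library `sqlite3` module. The schema, `record_order` and `check_branches` are hypothetical. The syntax error is planted deliberately to mirror the kind of bug described, not copied from the client's code.

```python
import sqlite3

def make_db():
    """Create a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    return conn

def record_order(conn, amount, customer=None):
    """Hypothetical payment step with separate guest and logged-in branches."""
    if customer is None:
        cur = conn.execute(
            "INSERT INTO orders (customer, amount) VALUES ('guest', ?)", (amount,))
    else:
        # Planted bug: missing ')' after the column list, so SQLite raises a syntax error.
        cur = conn.execute(
            "INSERT INTO orders (customer, amount VALUES (?, ?)", (customer, amount))
    return cur.lastrowid

def check_branches(conn):
    """Drive every branch and report failures instead of letting them propagate."""
    results = {}
    for label, customer in [("guest", None), ("logged_in", "alice")]:
        try:
            record_order(conn, 19.99, customer)
            results[label] = "ok"
        except sqlite3.Error as exc:
            results[label] = f"failed: {exc}"
    return results
```

Running `check_branches(make_db())` reports the guest branch as `"ok"` and the logged-in branch as failed. A guest-only test would have missed this.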
A regression happens when a recent code change introduces unanticipated, undesirable behaviour, often in an unexpected area of functionality. If administrators make statements like “since upgrade X, feature Y no longer works properly”, you are probably looking at a regression. If the bug went unnoticed for a while, it may be difficult to pin down exactly which change caused the regression, in which case the statement may simply be “feature Y used to work but now it doesn’t”.
Regressions are usually the result of an inadequate testing framework. Reasonably well-written unit tests should catch unexpected problems with a given code change. However, testing every possible scenario is impractical, so even the best thought-out systems can still experience regressions.
The quickest route to solving a regression is to identify and reverse the change that introduced the problem. Once this is done, the change can be re-engineered and re-implemented so that the intended modification can go ahead without reintroducing the problem.
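When it is unclear which change caused a regression, a binary search over the change history narrows it down in a logarithmic number of checks. Version control tools automate this search; Git, for example, provides `git bisect`. The Python sketch below shows the underlying idea. It assumes a hypothetical `is_good(k)` that builds the system with the first `k` changes applied and reports whether feature Y still works. It also assumes that the state before the first change was good, that the latest state is bad, and that the bug persists once introduced.

```python
def first_bad_change(changes, is_good):
    """Return the earliest change after which is_good() starts failing.

    changes: list of changes in chronological order.
    is_good(k): True if the system with changes[:k] applied behaves correctly.
    """
    lo, hi = 0, len(changes)  # invariant: is_good(lo) is True, is_good(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(mid):
            lo = mid
        else:
            hi = mid
    # changes[:lo] is good and changes[:lo + 1] is bad, so changes[lo] is the culprit.
    return changes[hi - 1]
```

With eight changes and a bug introduced by the fifth, the search needs only three calls to `is_good` to return the fifth change.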
A latent bug is a problem that has existed in your code from an early stage but has only recently become apparent. This can occur if code was designed to execute in a given scenario that was never previously encountered in practice. A change in environment, users behaving differently or the addition of new use cases may cause latent bugs to surface. However, they can also come to light when no easily identifiable external change has been made, simply because a very specific set of inputs has been encountered for the first time (e.g. a particular combination of country and credit card type for a user). Latent bugs can be distinguished from regressions because they do not usually appear as a result of a code change. They can be tricky to trace, and again we use a full range of methods to root them out.
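One way to flush out latent bugs of the country/card-type kind is to enumerate input combinations systematically rather than waiting for users to hit them. Below is a minimal Python sketch using `itertools.product`. The country and card lists, the `FEES` table with its missing entry, and `card_fee` are all hypothetical.

```python
import itertools

COUNTRIES = ["GB", "IE", "US"]
CARD_TYPES = ["visa", "mastercard", "amex"]

# Hypothetical fee table with one combination accidentally missing: a latent bug.
FEES = {(c, t): 0.02 for c in COUNTRIES for t in CARD_TYPES if (c, t) != ("IE", "amex")}

def card_fee(country, card_type, amount):
    return amount * FEES[(country, card_type)]

def find_failing_combinations(func, *dimensions):
    """Call func with every combination of inputs and collect those that raise."""
    failures = []
    for combo in itertools.product(*dimensions):
        try:
            func(*combo)
        except Exception as exc:
            failures.append((combo, type(exc).__name__))
    return failures
```

Here `find_failing_combinations(lambda c, t: card_fee(c, t, 100.0), COUNTRIES, CARD_TYPES)` returns `[(("IE", "amex"), "KeyError")]`. Exhaustive enumeration grows multiplicatively with each input dimension. For larger input spaces, a sampling or pairwise strategy would usually be used instead.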