Thursday, July 2, 2009

Static analysis of code to improve its performance

We recently published a Société Générale testimonial that explained that the company had successfully reduced some of its processing time from 20 minutes to 20 seconds by using our Quality Cockpit on one of its projects.

Too good to be true? Let's see how static analysis of code helps to detect performance problems earlier, whether they are related to the CPU or to RAM.

One Example

Here is the first example. It happens often, and it is very easy to follow:

User findUser(UserManager userManager, String id)
{
...
if (userManager.retrieveUser(id) != null)
{
 User user = userManager.retrieveUser(id);
 ...
}
...
}

This is a typical violation of the DRY (Don't Repeat Yourself) principle: the same processing is performed twice in the code. The developer might have thought that declaring the user variable before the if statement was too much trouble, or he simply made a careless copy/paste. Perhaps he thought that the call to the retrieveUser method would place a negligible load on the system. However, this code presents several problems:

  • The developer might consult the code for the retrieveUser method at a given moment, but he does not know how this method will evolve afterwards. For example, the implementation might use a fast in-memory cache at first and later change to a database lookup. Unnecessary calls that went unnoticed before may then impact performance.
  • Following the same principle, we cannot control where our procedure will be called. If the findUser method is inefficient, it is not necessarily serious if it is called only once, such as when the user logs in. If the method comes to be called regularly, such as with each webpage that opens on a very active site, then its performance becomes critical.
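
As a minimal sketch of the fix, the result of the lookup can be stored once and reused. The User and UserManager classes below are hypothetical stand-ins written only for this illustration (the original ones are not shown), and the lookups counter exists only to make the cost of each variant visible.

```java
import java.util.HashMap;
import java.util.Map;

class FindUserDemo {
    // Original shape: the lookup runs twice when the user exists.
    static User findUserTwice(UserManager userManager, String id) {
        if (userManager.retrieveUser(id) != null) {
            return userManager.retrieveUser(id);
        }
        return null;
    }

    // Corrected shape: the result is stored once and reused.
    static User findUserOnce(UserManager userManager, String id) {
        User user = userManager.retrieveUser(id);
        return user;
    }

    public static void main(String[] args) {
        UserManager first = new UserManager();
        findUserTwice(first, "42");
        UserManager second = new UserManager();
        findUserOnce(second, "42");
        System.out.println(first.lookups + " lookups vs " + second.lookups); // 2 lookups vs 1
    }
}

// Hypothetical stand-in for the User type of the example.
class User {
    final String id;

    User(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for UserManager, backed by a map.
class UserManager {
    private final Map<String, User> users = new HashMap<>();
    int lookups = 0; // counts calls to retrieveUser

    UserManager() {
        users.put("42", new User("42"));
    }

    User retrieveUser(String id) {
        lookups++;
        return users.get(id); // could just as well be a slow database query
    }
}
```

If retrieveUser later switches from a cache to a database query, the second variant pays for that query once instead of twice.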

This problem is inherent to a core principle of object-oriented programming: encapsulation. We cannot rely on how a service is implemented, only on its contract (e.g., a method's signature). This is especially true with dependency injection (IoC), where the implementation is chosen outside the calling code.

This is one of the most common pitfalls in performance problems. A real problem might be hidden in less visible code for a while before becoming truly critical. That is why it is important to correct these problems as early as possible.

Some Recurring Causes

In performance-related anomalies affecting our platform, we can identify some categories of recurring problems:

Poorly managed concurrent access

Without even addressing the problem of deadlocks, we regularly find unnecessary synchronizations. For example, instead of synchronizing on a field, the whole method is synchronized, which can slow down processing. This is also why we suggest having a rule to prohibit synchronizations at the method level so as to encourage developers to target and control their synchronizations better.
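
To illustrate the difference, here is a small sketch that contrasts a method-level lock with a lock on a private field. The class names (CoarseCounter, FineCounter, SyncDemo) and the hammer helper are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SyncDemo {
    // Runs several threads against a FineCounter and returns the final count.
    static long hammer(int threadCount, int callsPerThread) {
        FineCounter counter = new FineCounter();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    counter.record("home");
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.hits();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000)); // 4000
    }
}

class CoarseCounter {
    private long hits = 0;

    // Method-level lock: the string formatting also runs while holding the monitor.
    synchronized String record(String page) {
        hits++;
        return "hit " + hits + " on " + page;
    }
}

class FineCounter {
    private final Object lock = new Object();
    private long hits = 0;

    String record(String page) {
        long current;
        synchronized (lock) { // only the shared field update is guarded
            current = ++hits;
        }
        return "hit " + current + " on " + page; // runs outside the lock
    }

    long hits() {
        synchronized (lock) {
            return hits;
        }
    }
}
```

Both counters are correct. The second one simply holds its lock for a shorter time, and the private lock object cannot be acquired by outside code.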

Similarly, thread-safe classes are sometimes used where their non-synchronized counterparts, which consume fewer resources, would be sufficient (e.g., in Java, java.util.Vector where java.util.ArrayList would do).

Poor memory management

Just because a language has a garbage collector to automatically clean objects in memory, that doesn't mean that we don't need to pay attention to memory management.

First of all, there are often direct calls to the garbage collector. This practice is not recommended because it can skew its internal algorithms and thereby reduce performance for the next cleanings.

However, resources still need to be freed. This includes removing references to objects that are no longer being used and, for C#, managing IDisposable properly, such as by releasing disposable fields when the owning object is disposed and by declaring classes that hold native resources as IDisposable.

We also find bad practices related to unnecessary instantiations:

  • Declaration of non-static loggers. Generally, a logger object is associated with a class, and there is no need to create a specific instance for each of the class's objects. The logger should be declared as static.
  • Instantiations of objects that use only static methods
  • Redundant instantiations in loops
  • ...
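
The first and third points can be sketched as follows, assuming java.util.logging as the logging library (the list above does not name one). The IdValidator class and its ID format are invented for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;

class IdValidator {
    // One logger per class, shared by every instance.
    private static final Logger LOGGER = Logger.getLogger(IdValidator.class.getName());

    // Compiled once instead of once per loop iteration.
    private static final Pattern ID_FORMAT = Pattern.compile("[a-z]+-\\d+");

    static int countValid(List<String> ids) {
        int valid = 0;
        for (String id : ids) {
            // Avoided here: calling Pattern.compile(...) inside the loop body.
            if (ID_FORMAT.matcher(id).matches()) {
                valid++;
            }
        }
        LOGGER.fine("validation finished");
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(countValid(Arrays.asList("user-1", "bad id", "team-42"))); // 2
    }
}
```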

Unnecessary code

A simple way to improve performance is to track down unnecessary code:

  • Redundant casting
    if (o is User)
    handleUser(o as User);
    
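    The is operator already performs the type check, and the as operator then repeats it. A single as conversion followed by a null test does the same work once.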
  • Instantiated, but unused variables (dead code)
  • Writing to logs without checking the trace level
    User findUser(UserManager userManager, String id)
    {
    User user = ...
    
    List projects = projectManager.findProjects(user);
    LOGGER.debug("User found: " + user + ",available projects:" + projects);
    
    return user;
    }
    
    Here, a project list is retrieved only to be displayed in a debug log. This operation should only run if debug logging is enabled, so the applicable code should be surrounded by a trace-level check, such as if (LOGGER.isDebugEnabled()) with log4j.
  • Unnecessary tests
    if (true)
    {
    ...
    }
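
The trace-level check from the logging point above can be sketched as follows. This version assumes java.util.logging, where Level.FINE plays the role of the debug level. The findProjects helper and the projectLookups counter are hypothetical and only serve to show whether the expensive call ran.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogDemo {
    private static final Logger LOGGER = Logger.getLogger(GuardedLogDemo.class.getName());
    static int projectLookups = 0; // counts the expensive calls

    // Hypothetical stand-in for projectManager.findProjects(user).
    static List<String> findProjects(String user) {
        projectLookups++;
        return Arrays.asList("alpha", "beta");
    }

    static void setLevel(Level level) {
        LOGGER.setLevel(level);
    }

    static String findUser(String id) {
        String user = "user-" + id;
        // The lookup and the string concatenation only run when FINE is enabled.
        if (LOGGER.isLoggable(Level.FINE)) {
            LOGGER.fine("User found: " + user + ", available projects: " + findProjects(user));
        }
        return user;
    }

    public static void main(String[] args) {
        setLevel(Level.INFO); // debug-level output disabled
        findUser("42");
        System.out.println(projectLookups); // 0
    }
}
```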
    

Insufficient knowledge of the language

Some performance problems are quite simply due to a lack of knowledge of the language and basic classes. Here are a few examples:

  • In C#: if (someString == ""). The test for an empty character string should be done with System.String.IsNullOrEmpty(System.String), which generates lighter IL code.
  • In C#: public static readonly Int32 someConstant=128. A constant should be declared with the keyword const: public const Int32 someConstant=128. The generated IL code will then use the constant value and will therefore perform better.
  • In Java: String s = new String("kalistick"). This always instantiates a new object, even though the JVM keeps a pool of string literals that is used when writing String s = "kalistick".
  • In Java: Integer i = new Integer(args[0]). Same problem: a new object is always created. Since Java 5, the Integer class keeps a cache of small values (-128 to 127 by default). This cache is used by writing Integer i = Integer.valueOf(args[0]).
  • In Java: String s = "value = " + args[0]. A classic problem that often shows up in profiling when such concatenations are repeated, especially in loops. Character strings built up in several steps should be assembled using the StringBuilder class, or StringBuffer when thread safety is required (when the concatenated terms are constants, the compiler optimizes the concatenation itself).
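
The three Java points can be checked with a short sketch. The class and method names are invented for the illustration, and the identity comparisons (==) are used here only to show which expressions share an object. Ordinary code should compare with equals.

```java
class LanguageBasicsDemo {
    // Builds the result with one StringBuilder instead of repeated "+" in the loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String literal = "kalistick";
        String copy = new String("kalistick");
        System.out.println(literal == "kalistick"); // true: both come from the literal pool
        System.out.println(copy == literal);        // false: new String always allocates

        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);                 // true: 100 lies in the cached range

        System.out.println(joinWithBuilder(new String[] {"a", "b"})); // a;b;
    }
}
```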

How to prevent performance problems

Now that we've discussed some easily identifiable problems, the question is how to avoid them as early as possible. The first answer is to use trained and experienced developers! Every developer has the right to youthful indiscretions, but one would hope that they would only commit them once. :-) Training is key in our approach. The developer can find errors, document them in the best practices, and avoid reproducing the same errors in the future.

The second solution is to use a specialized tool to analyze performance: a profiling tool. Such tools are designed to trace the execution of an application in order to provide a detailed view of its performance, whether in real time or afterwards, including CPU usage, memory used, threads, garbage collector activity, etc. A drill-down mechanism is generally provided for targeting the faulty code. Learning how to use these tools may not always be simple, but the real challenge is in running the application with test scenarios that are exhaustive enough to cover all of the code to be tested.

References: JProfiler (Java), YourKit (Java and C#), and dotTrace (C#). Since version 6, Java has also shipped with its own integrated profiling tool: VisualVM.

Conclusion

Static analysis is less thorough and less exact than a profiling session because it doesn't know the context of execution, but it can be used to quickly and easily identify obvious defects, well upstream of the problems they cause. And this is a key point of our approach: the earlier you correct problems, the less expensive it will be!
