By examining the results of analyses on our customers' projects on our Cockpit SaaS platform, it is clear that a "Quick Win" strategy is often used for corrections. The team prioritizes glaring bugs (like floating point comparisons and synchronization problems) that are quick to correct (dead code, redundant casting, missing documentation headers, etc.).
Among the anomalies that remain week after week, we generally find overly complex methods. This type of problem is often considered to be complicated and risky to resolve for a rather hypothetical benefit. By checking their dashboards on the Cockpit, development teams are aware of the problem and therefore improve new development projects, but pre-existing methods are rarely corrected. Let's consider the return on investment for this type of correction: the cost may be small, risks are controlled, and the benefits may be high.
Complex?
By complex, we are referring to a method that is rather long or complicated in its algorithms. Theoretically, complexity is reflected in a high number of instructions (typically > 100, which equates to about 200-250 lines) or significant cyclomatic complexity (typically > 20). Based on our statistics, we find that about 4% of methods are highly complex methods in the projects we analyze for the first time, but this low percentage often represents 10% to 20% of the application's total code (depending on the project type and language)! Here is a typical example of how methods are distributed according to their cyclomatic complexity in a Java project:
Another dimension that can add to the complexity of methods is the number of distinct dependencies on external types/classes (the efferent coupling concept). One method using many types in its processing is generally more difficult to understand and test. We will soon offer a new quality rule that includes this dimension in the Cockpit rules repository (this rule is currently being configured to set the different thresholds).
Why simplify these methods
There are true challenges associated with simplifying a complex method. The objective is not just to come below the theoretical thresholds set by some quality manager or tool.
- Maintenance: This is the most obvious reason. By definition, a complex method is difficult to understand and update.
- Testability: A complex method generally cannot be properly unit tested. A unit test should check one process... a unit. Testing a method that combines several processes is like tasting several wines mixed in a single glass; it is difficult to draw reliable conclusions. This touches upon responsibility, which is particularly important in object-oriented development.
- Reusability: It is more difficult to reuse a method that strings together different processes with varying specificity than to reuse unitary methods. Here again, this is a matter of responsibility.
- Extensibility: With inheritance, it is preferable to overload unitary methods rather than to redefine a complex method altogether, avoiding having to copy code.
- Automatic documentation: Modern languages generally have standard documentation in the header comments for methods, and these comments can be used to automatically generate technical documentation. Breaking apart a method lets you transform informal internal comments into standard documentation, creating added value. In addition, the simple fact of naming the new unitary methods helps to automatically document the code.
Are there risks in simplifying a method?
Common sense suggests that changing a complex method leads to a regression risk. This is both because the process is difficult to understand and because it affects so many areas in regression.
In many cases, however, using the right tools can help to reduce risk or even remove it entirely. This is the principle of refactoring. The program's structure is modified without impacting its behavior.
How do you simplify a complex method?
There are two possible ways: manual refactoring and automatic refactoring.
Manual refactoring
Let's consider the following method:
public void addToProject(String input) throws InputException
{
// Apply regexp to input string
Matcher matcher = PATTERN.matcher(input);
if (!matcher.matches())
throw new InputException("Input does not match:" + input);
// Retrieve user
String idUser = matcher.group(1);
User user = userService.findUser(idUser);
if (user == null)
throw new InputException("No user for ID:" + idUser);
if (!user.isActive())
return;
String password = matcher.group(2);
if (user.getPassword().equals(password))
throw new InputException("Invalid password for user:" + idUser);
// Retrieve project
String idProject = matcher.group(3);
Project project = projectService.findProject(idProject);
if (project == null)
throw new InputException("No project for ID:" + idProject);
if (project.hasUser(user))
return;
// Add user to project
project.addUser(user);
projectService.saveProject(project);
}
The purpose of this method is to add a user to a project after first checking the supplied information, and the parameters are encoded in a string. Four successive processes can easily be identified in this method: unencoding the input string, identifying the user, identifying the project, and associating the user to the project. Altogether, it might not seem complex, but imagine having to manage even more parameters or input controls.
A good response would be to simplify this method by separating the user and project lookups into two distinct methods. This type of refactoring is generally referred to in the IDE as Extract/Introduce method. But in this case, there are two recurring problems for this type of refactoring:
- There are multiple output points defined with different types: Boolean, objects User or Project, or void.
- Multiple variables are updated in the externalized processes.
The problem is that a Java or C# method can only return one type. The second problem can sometimes be resolved by passing variables in as parameters, but this only works if the variables can truly be passed by reference, which is not possible with scalar types or strings, for example.
At least two solutions can resolve this gracefully. Both require introducing an object that encapsulates the different variables that might be updated. These are based on known refactoring, as popularized by Martin Fowler.
Introduce Parameter Object + Extract Method
This solution involves regrouping the variables used for input and output into a new object and then passing the object from method to method. The new object generally contains only attributes. In C#, this is typically implemented with a private class, and in Java, a private static class is used (preferably with comments ;-) ):
private final static class InputContext
{
private Matcher matcher;
private User user;
private Project project;
private InputContext(Matcher matcher)
{
this.matcher = matcher;
}
public Matcher getMatcher()
{
return matcher;
}
public void setMatcher(Matcher matcher)
{
this.matcher = matcher;
}
public User getUser()
{
return user;
}
public void setUser(User user)
{
this.user = user;
}
public Project getProject()
{
return project;
}
public void setProject(Project project)
{
this.project = project;
}
}
Traditional refactoring, Extract Method, can also be used as normal:
public void addToProject(String input) throws InputException
{
// Apply regexp to input string
Matcher matcher = PATTERN.matcher(input);
if (!matcher.matches())
throw new InputException("Input does not match:" + input);
InputContext inputContext = new InputContext(matcher);
// Fill context with user & project
if (!retrieveUser(inputContext))
return;
if (!retrieveProject(inputContext))
return;
// Add user to project
Project project = inputContext.getProject();
project.addUser(inputContext.getUser());
projectService.saveProject(project);
}
protected boolean retrieveUser(InputContext inputContext) throws InputException
{
String idUser = inputContext.getMatcher().group(1);
User user = userService.findUser(idUser);
if (user == null)
throw new InputException("No user for ID:" + idUser);
if (!user.isActive())
return false;
String password = inputContext.getMatcher().group(2);
if (user.getPassword().equals(password))
throw new InputException("Invalid password for user:" + idUser);
inputContext.setUser(user);
return true;
}
protected boolean retrieveProject(InputContext inputContext) throws InputException
{
String idProject = inputContext.getMatcher().group(3);
Project project = projectService.findProject(idProject);
if (project == null)
throw new InputException("No project for ID:" + idProject);
if (project.hasUser(inputContext.getUser()))
return false;
inputContext.setProject(project );
return true;
}
In addition to simplifying maintenance, we see that externalized processing can be easily overloaded by inheritance.
Replace Method with Method Object
This refactoring builds upon the concept of encapsulation by introducing externalized methods into the new object's class. The object paradigm calls for data and associated processing to be grouped into coherent, autonomous entities. Both solutions are relevant. You can decide whether you prefer to separate or group together data and processing. The same choice can be found in the debate over Active Record versus DAO.Automatic refactoring
There are obviously regression risks involved with manual refactoring. Fortunately, IDEs are quite advanced in this area. Traditional refactoring, such as renaming, moving, and introducing variables, has existed for a long time. More recently, more advanced refactoring has become available. IntellijIDEA offers Replace Method with Method Object refactoring (under the name Extract Method Object).
Simply select the block of instructions to externalize to do an Extract Method. If IntellijIDEA detects that this refactoring is not possible, it prompts for Replace Method with Method Object refactoring:
This refactoring can be configured via the GUI:
IntellijIDEA is currently the only IDE to support this type of refactoring. The others only have Extract Method (Eclipse or Netbeans for Java, Visual Studio ReSharper or Refactor! Pro plugins for C#).
The advantages of automatic refactoring are obviously its speed and the lack of regression-related risk (thus the importance of choosing the right development tools for optimizing productivity). Here, the developer's only job is to identify blocks of instructions to externalize and then naming them. We can then review the most complex methods in a project and simplify them one by one (if possible, documenting them and associating unit tests to them).
Conclusion
As always, it is best to avoid upstream problems, especially when it comes to having to make corrections. Developers should be careful to write only unitary methods. Knowing how to split up a method comes with experience, and extracting methods then becomes natural. But don't overlook the increased productivity that is available with modern IDEs, particularly in their refactoring functions.