Monday, January 30, 2012

Parallel Software Development Worlds

This is a story about Joe, a software developer living in parallel worlds.

In one world, Joe follows the traditional approach to software development: he writes his code, calls it “complete”, and hands it over to the Quality Assurance organization for testing. He writes some automated tests after he’s written the code if he has extra time on his hands, which rarely happens, because he’s constantly under pressure from his management chain to fix defects that customers have found in the software he developed. The little time he has left is spent fighting fires to unblock his QA partners by fixing critical defects in the “complete” code he turned over for testing. This, of course, means that, in order to ship any software to customers, Joe leaves in many defects that QA found but deemed non-critical, in the hope that customers won’t run into them. Often, however, they do, and they are not exactly thrilled, which means that the criticality of such defects is escalated well above the original assessment. The debugger is Joe’s most used tool, and his code is full of instructions that dump various messages to a log file that only Joe can understand (and only on his better days). Joe’s customers are irritated by the defects sprinkled throughout his software, and are always on the lookout for alternatives coming onto the market, in the hope that they’ll be of higher quality.

In a parallel world, Joe is also a software developer, but he’s using a modern, test-driven approach to software development (TDD). This means he writes an automated test before he writes any production code, runs the test to verify that it fails, and only then proceeds to make the test pass by writing production code that exhibits the behavior the test expects. He then refactors both the test and the production code to keep them clean (free of redundancy) and readable. He repeats this process until the tests demonstrate all the behavior required of the software he’s writing. This way, he finds the majority of defects as early as possible, right after they are written, and is able to give his QA partners code that actually works. This allows them to focus on finding integration defects, which are typically the product of communication breakdowns between Joe and his fellow developers, or between Joe and the actual users of the software he’s writing. Such defects inevitably exist, but they are more easily found, since the code is at least behaving the way Joe thought it should. As a consequence, Joe can reproduce such defects more easily, as doing so typically requires simply modifying some of the automated tests to expect different behavior. Occasionally, it also involves adding more complicated tests that cover corner cases Joe hadn’t originally thought of. Overall, there are fewer fires to fight, and the software delivered to customers is of much higher quality. Consequently, customers are thrilled to use Joe’s software and eager to use new versions and features, which Joe has more time to work on, as he rarely gets interrupted by defects escalated by customers.

Now let’s take a peek into some of the code that Joe is writing. In both of the parallel worlds that we’ll be observing, he’s writing a set of Java classes representing data about products and customers (it looks like Joe has just started writing a new business application). So far, he’s written two classes, Product and Customer:

import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

public class Product {
 private List<Customer> customers = new LinkedList<Customer>();

 public void addCustomer(Customer customer) {
  customers.add(customer);
  customer.addProduct(this);
 }

 public Collection<Customer> getCustomers() {
  return customers;
 }

}
 
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

public class Customer {
 private String name;
 private List<Product> products = new LinkedList<Product>();
 
 public Customer(String name) {
  this.name = name;
 }
 
 public String getName() {
  return name;
 }

 public void addProduct(Product product) {
  products.add(product);
 }

 public Collection<Product> getProducts() {
  return products;
 }

}

So far so good: he’s an experienced Java programmer, and has been able to implement the many-to-many relationship between the Product and Customer classes without introducing any defects in the code.

In the parallel world where Joe is practicing TDD, he has also already written the following automated test, using version 4 of the JUnit framework:

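import static org.junit.Assert.assertEquals;

import java.util.Collection;

import org.junit.Before;
import org.junit.Test;

public class ProductCustomerUnitTest {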
 private static final String CUST_0_NAME = "Francis"; 
 private static final String CUST_1_NAME = "Joanne";

 private Product product;
 private Customer[] customers;

 @Before
 public void setUp() {
  product = new Product();
  customers = new Customer[2];
  customers[0] = new Customer(CUST_0_NAME);
  customers[1] = new Customer(CUST_1_NAME);
  product.addCustomer(customers[0]);
  product.addCustomer(customers[1]);
 }

 @Test
 public void customersForProduct() {
  Collection<Customer> custs = product.getCustomers();
  assertEquals(2, custs.size());
  for (Customer c : custs) {
   Collection<Product> prods = c.getProducts();
   assertEquals(1, prods.size());
   if (c.getName().equals(CUST_0_NAME)) {
    assertEquals(customers[0], c);
   } else {
    assertEquals(customers[1], c);
   }
  }
 } 
}
Incidentally, this test exercises every single line of the production code listed above, which gives Joe in this (TDD) world additional peace of mind, knowing (rather than guessing) that his code actually works as he expects it to. Note that he has already refactored the test setup into a separate method, to make it easier to add the forthcoming tests.

Now Joe is working on data serialization and deserialization using the Jackson JSON processor, so he decides to add a new method that allows setting all of the customers for a product at once. To leverage the existing code, he decides to implement the new method of the Product class as follows:
 public void setCustomers(Collection<Customer> customers) {
  for (Customer c : customers) {
   addCustomer(c);
  }  
 }
In yet another parallel universe, Joe would simply check this code into the source repository and move on. However, in this universe, Joe is being extra careful. Reviewing this code, Joe realizes that the method’s behavior does not quite fulfill the expectations its name implies: it only adds new customers, while any pre-existing customers stay in place. Following the principle of least astonishment, he decides that this method should remove any customers previously added to this product, and changes the implementation to:
 /**
  * Changes the collection of customers to the provided one.
  * Any previously registered customers are removed.
  * @param customers new collection of customers for this product
  */
 public void setCustomers(Collection<Customer> customers) {
  customers.clear();
  for (Customer c : customers) {
   addCustomer(c);
  }  
 }
At this point, Joe is satisfied with the behavior, which he took the time to clarify in the associated API documentation, and moves on to implement the next feature. Deserialization from JSON works great and the product is delivered to the customers. However, some time thereafter, a customer discovers a huge defect and escalates it to Joe’s CEO: in a particular product sales scenario, Joe’s software has been causing their production system to fail to deliver ordered products, resulting in financial loss due to penalties and customer dissatisfaction. After seven weeks of painful tracing and debugging of the whole production system, the root cause of the issue is finally isolated and traced right to the code above: one of Joe’s colleagues had used this method to implement an alternative scenario that both he and QA failed to test! Joe has lost countless nights debugging, and his reputation as a solid, experienced programmer is forever lost, at least in the eyes of his management chain. Joe’s company has lost yet another head of its QA department, as well as credibility with one of its most important customers and in the marketplace in general.

Meanwhile, in the parallel world, the Joe who practices TDD added the following test before changing any code in the Product class:
 @Test
 public void setCustomers() {
  Collection<Customer> custs = new LinkedList<Customer>();
  custs.add(customers[1]);
  product.setCustomers(custs);
  Collection<Customer> result = product.getCustomers(); 
  assertEquals(custs, result);
 }
Joe then added an empty stub for the setCustomers() method to the Product class so that the test would compile and fail. Only then did he proceed to implement the method in exactly the same way as the Joe in the parallel world who had not been practicing TDD. However, in this parallel universe, the test above ruthlessly indicated there was an issue with the implementation: the assertion at the end of the test was failing. This made Joe take a second look at the production code he had just written and notice that he was clearing the collection passed as the method argument instead of, as intended, the collection stored as a member variable of the class, because the two happened to share the same name. The fix is as simple as qualifying the collection to be cleared with the keyword ‘this’:
 /**
  * Changes the collection of customers to the provided one.
  * Any previously registered customers are removed.
  * @param customers new collection of customers for this product
  */
 public void setCustomers(Collection<Customer> customers) {
  this.customers.clear();
  for (Customer c : customers) {
   addCustomer(c);
  }  
 }
Now the setCustomers test happily passes, and Joe knows he’s done with the implementation of this method and can move on with his work. The defect that ruined Joe’s career in the parallel universe has been found and fixed within seconds of its creation, when its impact was no worse than a minor surprise to Joe when a test that was supposed to pass failed to do so. Fixing this defect took Joe less than a minute because he knew exactly what code was causing the test to fail, and he had written it only seconds ago, so he knew exactly what it was supposed to do.
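
The shadowing behind this defect can be reproduced in isolation. What follows is only a minimal sketch, not Joe’s actual code: ShadowingDemo is a hypothetical stand-in for the Product class that stores customers as plain strings, and setCustomersBuggy and setCustomersFixed are illustrative names for the two variants of setCustomers shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for Product: customers are plain strings
// so that the example stays self-contained.
public class ShadowingDemo {
    private final List<String> customers = new ArrayList<>();

    public void addCustomer(String customer) {
        customers.add(customer);
    }

    public List<String> getCustomers() {
        return customers;
    }

    // Buggy variant: the parameter shadows the field, so clear() empties the
    // caller's collection and the loop then has nothing left to add.
    public void setCustomersBuggy(Collection<String> customers) {
        customers.clear();
        for (String c : customers) {
            addCustomer(c);
        }
    }

    // Fixed variant: 'this.' selects the field, so the old entries are removed
    // and the argument is left untouched.
    public void setCustomersFixed(Collection<String> customers) {
        this.customers.clear();
        for (String c : customers) {
            addCustomer(c);
        }
    }

    public static void main(String[] args) {
        ShadowingDemo buggy = new ShadowingDemo();
        buggy.addCustomer("Francis");
        buggy.addCustomer("Joanne");
        List<String> buggyArgument = new ArrayList<>(Arrays.asList("Joanne"));
        buggy.setCustomersBuggy(buggyArgument);
        System.out.println(buggy.getCustomers()); // [Francis, Joanne]
        System.out.println(buggyArgument);        // []

        ShadowingDemo fixed = new ShadowingDemo();
        fixed.addCustomer("Francis");
        fixed.addCustomer("Joanne");
        List<String> fixedArgument = new ArrayList<>(Arrays.asList("Joanne"));
        fixed.setCustomersFixed(fixedArgument);
        System.out.println(fixed.getCustomers()); // [Joanne]
        System.out.println(fixedArgument);        // [Joanne]
    }
}
```

The buggy variant compiles without complaint and throws nothing at runtime: it silently leaves the old customers in place and empties the caller’s collection, which is why only a test that checks the resulting state exposes it. Giving the parameter a name different from the field (newCustomers, say) would also avoid the shadowing in the first place.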

In contrast, in the parallel universe where Joe was not practicing TDD, by the time the escalation came back to him, he had completely forgotten everything about this piece of code, so it took him much longer to reconstruct his mental state and come up with even the simplest solution. Worst of all, by then the vast majority of the damage had already been done: the defect had been silently ruining Joe’s career, and finding it was akin to finding a needle in a haystack, since the scope of the search was the full, complex system.

So, next time you set out to write some production code, remember you actually have the power to choose the parallel universe you want to live in! Which one will you choose? I welcome your thoughts and related experiences in the comments section!