Sunday, December 13, 2020

Serverless Nirvana

This year, 2020, I took some well-deserved time off from enterprise software. That does not mean, however, that I did not design and code any software. On the contrary!

During our shelter-in-place, my youngest daughter discovered the ancient board game of Mancala, and we started playing this game daily. After a while, we discovered some patterns in the game, which inspired me to start coding a back-tracking algorithm in Python that would be able to play Mancala against me and other members of my household and win (at least some of the time).

After a couple of days of disciplined test-driven development (TDD), I was already playing against my freshly minted algorithm, looking 7 steps ahead, on my laptop using a simple command line interface.
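To give a flavor of what such a look-ahead algorithm involves, here is a minimal sketch of a depth-limited, back-tracking game search. This is illustrative only, not my actual Mancala code: the state interface (legal_moves, apply, is_over, score) is hypothetical, and the sketch assumes turns strictly alternate (real Mancala's extra-turn rule would need special handling).

```python
def negamax(state, depth):
    """Value of `state` for the player to move, searching `depth` plies ahead.

    Assumes a duck-typed state object offering is_over(), score(),
    legal_moves(), and apply(move) -> new state (all hypothetical names).
    """
    if depth == 0 or state.is_over():
        return state.score()  # score from the perspective of the player to move
    # Back-track over every legal move; the opponent's best reply is our worst case.
    return max(-negamax(state.apply(m), depth - 1)
               for m in state.legal_moves())

def best_move(state, depth=7):
    """Pick the move that maximizes our value after looking `depth` steps ahead."""
    return max(state.legal_moves(),
               key=lambda m: -negamax(state.apply(m), depth - 1))
```

The same pure functions can be exercised from a command line interface, a unit test, or (as described below) a Lambda handler, since they have no I/O dependencies.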

Then I figured it would be cool if I could make this available to other members of my family, either to challenge my algorithm, or each other. But I also did not want to permanently dedicate any hardware to support it. Wouldn’t it be nice if this game would be 100% available, scalable to any imaginable number of players, and at the same time easy and affordable to operate?

Having previously used Amazon Web Services (AWS) Lambda functions and API Gateway, I already knew this was possible for the back-end services. So I used these two serverless technologies to build back-end REST APIs for playing Mancala, managed via the Serverless Application Model (SAM). By leveraging principles of hexagonal architecture, I was able to reuse the Python code that was already supporting the command line interface by simply adding a Lambda handler adapter on top of it.
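Such an adapter can be remarkably thin. The sketch below is hypothetical (the `play_move` function stands in for the real game logic, and the request fields are made up), but the event/response shape is the standard API Gateway proxy integration that Lambda handlers use:

```python
import json

def play_move(board, pit):
    """Stand-in for the core game logic the CLI already uses (hypothetical).
    Sows the stones from `pit` counter-clockwise around the board."""
    board = list(board)
    stones, board[pit] = board[pit], 0
    for i in range(pit + 1, pit + 1 + stones):
        board[i % len(board)] += 1
    return board

def lambda_handler(event, context):
    """Hexagonal-architecture adapter: translate an API Gateway proxy
    event into a call to the pure game logic, and wrap the result."""
    request = json.loads(event["body"])
    new_board = play_move(request["board"], request["pit"])
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"board": new_board}),
    }
```

Because the core logic knows nothing about HTTP or Lambda, the same module keeps working behind the command line interface unchanged.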

Then I did some more research and found out that it is very much possible to also have the front end be serverless, as long as one uses one of the modern front-end frameworks to build the web client (e.g., React, Angular, or Vue.js) and publishes the front end via Amazon Simple Storage Service (S3).

After some further research, I picked Vue.js because it's the most recent of those three, it seems to be picking up mindshare, it seems to strike a nice balance between design philosophies of React and Angular, and it seems to have optimal start-up and run-time performance. But perhaps most importantly, Vue.js is also being touted as the framework that allows the quickest developer ramp-up of the three. Indeed, within a week, I had already deployed an initial version of the web user interface that allowed me to play against my algorithm.

Once I started using this early version of my web UI, I realized it was important to allow playing a single game over multiple browser sessions, for which I added persistence in DynamoDB (AWS's NoSQL serverless database), again by adding an adapter in the back-end Python code. That means I could switch to a different data store without any changes to the rest of either the front-end (JavaScript) or the back-end (Python) code. But considering DynamoDB's single-digit-millisecond latency (for both reading and writing), I doubt I'll need to switch any time soon.
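A persistence adapter in this ports-and-adapters style might look like the sketch below. The repository class, table name, and attribute names are hypothetical; `put_item`/`get_item` are the actual boto3 DynamoDB Table calls, but here the table is injected, so any object with the same two methods (including a test fake) can stand in for DynamoDB:

```python
class GameRepository:
    """Hypothetical persistence port: the rest of the code sees only
    save() and load(), so the data store behind it can be swapped freely."""

    def __init__(self, table):
        # `table` is duck-typed; in production it would be something like
        # boto3.resource("dynamodb").Table("mancala-games")  (name assumed)
        self._table = table

    def save(self, game_id, board, next_player):
        # Persist the full game state under its id.
        self._table.put_item(Item={
            "game_id": game_id,
            "board": board,
            "next_player": next_player,
        })

    def load(self, game_id):
        # Returns the stored item, or None if the game does not exist.
        return self._table.get_item(Key={"game_id": game_id}).get("Item")
```

Swapping DynamoDB for another store then means writing one new class with the same two methods, with no changes rippling into the game logic or the web client.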

It took another week (of 2-3 hours of development per day) to implement a few other features (like animation of computer moves, allowing full control over who plays against whom, and listing the rules of the game) before I could say that this serverless Mancala was ready to provide some joy not only to me, but to others as well. If you'd like to use it, feel free to do so here.

Mancala Web GUI
The link above should work in any modern browser; I've tested it on current versions of Chrome and Edge on Windows, Chrome and Safari on macOS, and Chrome on Android.

My early adopters (i.e., the members of my family) have so far played over 200 games. How much did AWS charge me for hosting those games? Nada! Zip! Zero dollars! All of the services used (S3, Amazon API Gateway, AWS Lambda, Amazon DynamoDB, CloudWatch, and CodeCommit, AWS's Git repository service storing my source code) are available within the AWS free tier, and so far I have not used more than 1% of the available monthly quotas.

Besides being a lot of fun, this little project of mine has demonstrated that web (and even mobile) applications today can and should be built using managed services (a.k.a. serverless). Using dedicated machines, even if they are virtual and managed by your cloud infrastructure provider, is quite obsolete, and extremely uneconomical!

Sunday, August 30, 2020

Tom Siebel's Dilemma

I still vividly remember entering Peninsula Golf & Country Club in San Mateo, California, the venue that was hosting Siebel Systems' 1999 Holiday Party. The best Russian caviar and vodka on ice greeted us at the very entrance, followed by countless offerings of gourmet food and open bars. A live band entertained over a thousand Siebel Systems employees and their guests, who danced well into the night.

Fast forward three years, and I was celebrating the holidays with my colleagues over hot dogs and hamburgers, no guests invited. And I still worked for Siebel Systems! What happened? How could the fortunes of a company change so drastically within just a few years?

I’ve thought of several different answers to this question over the last two decades, but I’ve recently read two books that allow me to provide the answer based on a solid scientific theory: The Innovator's Dilemma by Clayton M. Christensen, published back in 1997, and its follow-up from 2003: The Innovator's Solution, which Christensen co-wrote with Michael E. Raynor.

The Innovator's Dilemma and The Innovator's Solution are just as relevant today as they were when they were written, because they explain market forces that are caused by human behavior, which evolves much, much more slowly than technology. Both talk about disruption, a word that has become commonly used by technologists, not only in Silicon Valley, but all around the world. Yet, while virtually every start-up seeking funding from a venture capitalist is likely to claim its product or technology is disruptive, most start-ups fail in their effort to disrupt the well-established market leaders. These books define disruptive technologies as those that have the potential to bring the established market leaders down, and new entrants up, and help identify such disruptive technologies by recognizing their common properties. The Innovator's Solution goes beyond that in distinguishing strategies that can be used to successfully pursue disruptive opportunities from those that are doomed at the outset.

It is intuitive to think that every technology that makes a particular kind of product significantly better would be disruptive, but Christensen found, through his research of hard-disk market evolution over several decades, that this is not so, and that most technologies that improve existing products are what he calls sustaining technologies. It turns out that market leaders have an excellent track record of adopting sustaining technologies, regardless of how different they are from their predecessors. Thus, a start-up trying to disrupt the market based on a sustaining technology is doomed to fail.

The distinction between sustaining and disruptive technology lies primarily in its effect, which is visible only after the market forces have played out, so how can this distinction be used to predict the future, and choose the best strategy for dealing with a particular new technology?

It turns out that any disruptive technology, when it first emerges, is never good enough to satisfy the needs of the existing market. It is always inferior to the established technology of the day, at least when looking at the quality attributes that the current market values most. Disruptive technology thus gets shunned by the market leaders, not because these companies don't want to adopt it, but because their current customers are not ready to buy products based on this technology. So, a company that embraces such a disruptive technology early needs to find a new market for it, where quality attributes are ranked differently than in the main market (e.g., size instead of speed). That emerging and growing market can then be used to drive and finance the continuous improvement of the chosen disruptive technology.

The rate of improvement of almost any technology (regardless of whether it is sustaining or disruptive) is higher than the rate at which the needs of the market it is serving increase. Therefore, sooner or later, a disruptive technology becomes good enough not just for the emerging market, but also for the low end of the previously established market. Once that happens, the company that established itself as the leader using the disruptive technology in the emerging market has an insurmountable advantage over the incumbents in the established market that have not already mastered this disruptive technology. This causes the incumbents to start losing their market share at the lower end of the established market, and to move towards the upper end of the market, which typically also happens to be more profitable. But this trend inevitably continues, and eventually the incumbents have no upper market to migrate to, and are driven out of business. This process takes time, but its course seems to be very predictable.

The Innovator's Dilemma outlines, and The Innovator's Solution dives deep into, the strategies that managers of established market leaders can use to cope with disruptive technologies. They explain that the only way incumbents were able to avoid disruption was to create an autonomous organization chartered exclusively with the task of finding an emerging market that can be served by the disruptive technology and establishing itself as a leader in this new market. The best way for this organization to have a decent chance of achieving its goal is to start with a small, cross-functional team capable not only of innovation, but also of establishing its own processes, values, and culture, typically very different from the incumbent's.

Now let us apply this theory to analyze and explain how Siebel Systems got disrupted and why its strategy in defending its market leadership turned out to be completely ineffective.

Siebel Systems was founded in 1993 by Tom Siebel, who had previously been a successful salesman at Oracle Corporation. Siebel Systems disrupted established packaged enterprise software vendors by establishing an emerging market for Customer Relationship Management (CRM), riding the wave of ever-increasing performance of personal desktop and especially laptop computers that allowed salespeople to bring all the relevant information about their existing and potential customers with them, and share it with their coworkers, even when travelling. Siebel Systems based its architecture on an easy-to-use graphical user interface leveraging Microsoft Windows (and the underlying MFC library), connecting directly to a central database server, powered by Oracle's relational database.

This product was radically different from existing packaged enterprise software products, dominated at the time by SAP and Oracle, which were designed to run on mainframes and minicomputers, and so their user interfaces were designed to accommodate character-based terminals. Siebel Systems created and heavily patented its proprietary data replication technology, which allowed a relevant subset of data to be synchronized between individual salespeople's laptop computers and the central database.

All that ensured Siebel Systems' virtually unchallenged leadership in the fast-growing CRM market throughout the 1990s, resulting in exponential sales growth, with its large and growing direct sales force focused on large, Global Fortune 500 clients as the most profitable segment of the CRM market. This market position allowed Siebel Systems to become a publicly traded company in 1996. Life was good, and the company was also growing its number of employees exponentially, both organically and through mergers and acquisitions, reaching 1,500 employees in 1999, and up to 8,000 in 2000.

When I joined Siebel Systems in the spring of 1999, the first thing I noticed was that the entire code base was in C++. At the time, that was still the dominant programming language in the software industry, but a new programming language called Java, with its underlying Java Virtual Machine (JVM) technology, was gaining in popularity. Back then, Java indeed had the properties of a disruptive, rather than sustaining, technology: its run-time engine was significantly slower than C++'s. It also had three different, very attractive advantages over C++:

  1. the same code ran without any modifications on a wide variety of operating systems and, thus, hardware platforms
  2. development sped up, thanks to the JVM automatically managing the memory of a running program
  3. a vibrant ecosystem of open-source libraries and tools for emerging technologies like XML, web services, TDD, etc.

Having looked at Java around version 1.1, Tom Siebel decided it was too slow for Siebel Systems' customers (who valued the performance and scalability of their CRM system at a premium), and even though the advantages of the Java platform had the potential to significantly improve the productivity of Siebel Systems' engineering efforts, Tom Siebel made the executive decision that Java was not to be used. After that, all product development at Siebel Systems continued exclusively in C++.

In the meantime, Sun Microsystems, having found plenty of early adopters for its Java platform (which was initially designed for cable-TV receivers), kept improving the JVM's performance, and in April 1999 came up with its HotSpot JVM, whose run-time performance got much closer to C++'s by virtue of just-in-time compilation.

Incidentally, while Tom Siebel was dismissing the Java platform as unattractive, another ex-Oracle sales and marketing executive, Marc Benioff, started a small start-up called Salesforce. He and his co-founders were building a scaled-down version of CRM focused on sales automation using none other than Sun's Java platform.

However, Benioff and his co-founders were not simply betting on the Java platform as the disruptive technology that would allow their product to disrupt Siebel. They created a product that, beyond using the JVM at the back end (which customers did not see), had four key differentiators very visible to their customers:

  1. Software-as-a-Service (SaaS): hosted on Salesforce.com
  2. accessed exclusively through a web-based GUI
  3. charging a monthly subscription fee
  4. no customization allowed

This was all in stark contrast with Siebel Systems' CRM product, which was being:

  1. deployed by the customers on their premises
  2. accessed via thick Windows client
  3. licensed for indefinite use by charging a large up-front fee
  4. heavily customized by system integrators that would defer deployment by many months and often not bring the promised benefits

Considering all these dimensions together, Salesforce's CRM product was definitely a disruptive technology, because it did not satisfy the needs of Siebel Systems' existing customers, which valued performance, scalability, usability and customizability over speed of implementation, ease of upgrading and low cost of ownership in general.

Because of this, Salesforce.com was forced to find a new, emerging market for its CRM product. It turned out that small businesses had not been able to use Siebel Systems’ CRM because of high initial cost. Salesforce.com appealed to those businesses because it required a small initial investment. They also valued that it could be up-and-running within days, because it did not require provisioning of any hardware or software. Translated to Christensen's terminology: Salesforce.com entered the CRM market by competing against non-consumption on its low end, rather than directly competing against Siebel Systems' CRM product.

While Salesforce.com was establishing itself as the undisputed leader at the emerging low end of the CRM market, Siebel Systems was reaping the benefits of dominating the high end: revenue was growing exponentially, and Fortune magazine recognized it as the fastest-growing company of 1999. Ferraris were flooding the parking lot at Siebel Systems' headquarters, while Tom Siebel was dumping fuel from his private jet over Los Barriles in Baja California in preparation for a crash landing. His ever-larger ego had pushed him to attempt a take-off from the middle of the runway at the local small-aircraft airport, during which he damaged his landing gear.

It's not as if Tom Siebel did not have a clue that Software-as-a-Service might be a disruptive technology that needed to be invested in. In fact, as early as the 1st quarter of 1999 (9 months before Salesforce.com was launched), he ordered Siebel Systems' engineers to build a web site called Sales.com. It was released to the public in March 1999, and shortly thereafter spun off as a separate entity, funded by Siebel Systems in partnership with a couple of other large businesses. This seems to follow the recommendations from The Innovator's Dilemma to the letter, right? Well, not exactly.

Siebel's culture was basically copied from Oracle's, with all major decisions being made by the founder and CEO, in this case Tom Siebel. The first release of Siebel Systems' Sales Force Automation (SFA) product was developed from the exact specification that Tom Siebel personally wrote. That was possible because Tom Siebel had been a salesman for a long time before he wrote the specification, and he had already supervised the building of a very similar system for internal use at Oracle. But neither Tom Siebel, nor anybody else at Siebel Systems, could know what exactly Sales.com should look like to gain traction with customers who were not yet Siebel Systems' customers. Indeed, nobody in the world at the time knew exactly how that new market would emerge, not even Mr. Benioff. Regardless, Tom Siebel had Sales.com built exactly according to his specification, which of course was based on completely wrong assumptions. It turns out that what this new market needed was a simple tool for enabling collaboration among a relatively small sales team. Instead, what Sales.com initially offered was a productivity tool for individual salespeople, with no collaboration capabilities! Unfortunately, once this mistake became clear, Sales.com was slow to adapt, because it had inherited its culture and processes from Siebel Systems, which meant big, slow, and infrequent releases.

It also did not help that Sales.com had no clear vision of how to generate revenue, let alone make a profit, which again goes straight against Christensen's and Raynor's recommendations. To put things in perspective, Sales.com was launched at the very peak of the so-called dot-com bubble, when the words revenue and profit were becoming quite exotic, and even frowned upon. But then, after the bubble burst in April of 2000, Sales.com could not get any more cash to burn. Having utterly failed to establish itself in the emerging CRM market for small business, Sales.com was folded back into Siebel at the end of 2000. The site was shut down soon thereafter, and Sales.com staff were again working full time on Siebel Systems' main product. Around that time, Tom Siebel publicly told analysts on an earnings call that providing software-as-a-service made a lot of sense to him, but his customers just didn't want to buy it! To anyone who had read The Innovator's Dilemma, this would have been a clear sign that disruption was just around the corner.

Salesforce.com, meanwhile, was using a much more agile approach, iterating quickly and often, which allowed it to tap into the unmet needs of the emerging market. At the same time, it was actually making money by charging subscription fees for its product, steadily and exponentially growing its revenue, which enabled it to do an initial public offering of its stock in June 2004.

While Salesforce.com was slowly but steadily growing, Siebel Systems was busy making yet another mistake warned about in The Innovator's Dilemma: trying to apply a disruptive technology (in this case, the web-based GUI) as a sustaining technology. After a very successful release of Siebel 6.0 in Q1 of 2000 and the signing of a significant partnership deal with IBM, which pushed Siebel Systems' stock price to its all-time high, at a valuation of $40 billion, Tom Siebel decided in the summer of 2000 that the next major release, Siebel 7.0, was going to have all the functionality of the Siebel 6.x thick Windows client packed into a web-based GUI, and that the good old Windows client was no longer to be available. He justified this decision primarily by client needs, citing the problem of deploying 60,000 thick clients at IBM. And, by the way, all of this had to happen within 6 months!!! At the engineering all-hands meeting kicking off the Siebel 7.0 release, Tom Siebel said:

I know I'm asking for impossible, but I don't care!

In order to provide such functionality in a web-based GUI, Siebel Systems' front-end engineers decided to run a large amount (more than 1 MB) of JavaScript code in the web browser. But web browsers at the time were not optimized for running large amounts of JavaScript. That caused Siebel 7.0, when it was finally released in September 2001, to be completely rejected by customers, as its web-based GUI was unacceptably slow! At this point, the only way Siebel Systems' front-end engineers could think of to speed up the web-based GUI was to rewrite most of its functionality in C++, packaged as ActiveX controls. Again, because of Siebel Systems' culture of big, slow releases, it took them almost a full year to make this Siebel 7.2 release generally available. This significant delay of a usable Siebel 7 release caused Siebel Systems to start losing a lot of sales deals. And not only was it losing market share to the disruptor, Salesforce.com, on the lower end of the CRM market, it now also started losing high-end customers to PeopleSoft, an incumbent that had invested heavily in rebuilding its enterprise product suite on top of the Java platform! Notice that PeopleSoft adopted the Java platform as a sustaining technology, because its performance was by now good enough for enterprise customers.

But even once Siebel 7.2, with a fast-enough web-based GUI, was finally available to its customers, it could only run on Windows, in the Internet Explorer web browser. And because of the long release cycle, Siebel 7.2 ended up targeting the older version 5 of Internet Explorer, even though version 6 came out before Siebel 7.2 was released. At the time, this seemed like a perfectly acceptable compromise, as Microsoft had just won the First Browser War, with Internet Explorer 5 and 6 together reaching 96% market share. However, Siebel's dependence on Internet Explorer became its Achilles heel in the long term. When, a few years later, Internet Explorer lost its market share to Google's Chrome, which provided a simpler and faster user experience, Siebel CRM's inability to leverage Chrome's optimized JavaScript engine became one of the main reasons for the replacement of Siebel 7+ installations with Salesforce.com in recent years.

With 20/20 hindsight from 2020, it is quite clear to me now that Tom Siebel's failure to see the web-based GUI as a disruptive, rather than sustaining, technology (because it was not yet able to satisfy the needs of Siebel Systems' existing customers) caused Siebel Systems to reach an inflection point, where revenue growth slowed down. This then caused Siebel Systems to lose more than 70% of its market valuation, which then forced Tom Siebel to do a series of lay-offs, which caused employee morale to plummet in a vicious cycle that eventually led to the acquisition by Oracle.

In the 4th quarter of 2003, Tom Siebel made his last attempt to defend against disruption: he acquired a financially struggling direct competitor to Salesforce.com called Upshot. At the time, Upshot had roughly 1,000 customers, while Salesforce.com already had 7,500. Siebel merged Upshot engineers with engineers from an earlier acquisition, a Canadian company called Janna, to form a semi-autonomous, physically distributed engineering organization chartered with building and hosting Siebel OnDemand. This was a direct response to Salesforce.com, which by then had become good enough to start making inroads into the mid-market, and even the high end, of the sales automation market. Unfortunately for Siebel Systems, this was too little, too late: even though Siebel Systems at the time had an order of magnitude more resources than Salesforce.com, Siebel's newly formed OnDemand organization could use only an order of magnitude fewer resources than Salesforce.com, so competing head-to-head against it was a losing proposition from the outset. The reason Siebel could not use all of its engineering, product management, marketing, sales, and other resources to fight the disruption was that the majority of its revenue was still coming from the product built on a technology stack (C++, MFC and its awkward port to Unix platforms, a proprietary web engine, ActiveX, Internet Explorer 5) that was already obsolete, but switching to a newer stack was by now prohibitively expensive.

At this point, Siebel Systems was investing roughly 96% of its research and development budget in its mainstream product and its proprietary, obsolete technology stack, ~3% in its OnDemand product, and ~1% in building a brand new platform. The latter was kicked off as an engineering effort by a team of its ten most senior engineers and a single product manager, tucked away several blocks from the headquarters. While this again sounds very much like following the advice from The Innovator's Dilemma on how to embrace a disruptive technology, it was a completely misplaced strategy: first of all, just like Sales.com a few years earlier, it was trying to solve a problem that only Tom Siebel saw as relevant: how to write applications that can run on both Sun's JVM and Microsoft's .Net platform. Second, at that time, neither of these technologies was disruptive to Siebel's customers; those customers had typically already adopted one or the other, but not both. Given this artificially created problem, this group of engineers came up with a compromise solution: use a very old version of Java (1.1.3), which predated Microsoft's .Net fork from Java, so that the same source code could run on both the JVM and the .Net run-time engine. They did not seem to care that, just like the usage of ActiveX controls in Siebel 7, this would prevent this "wrapper" platform from leveraging any new features introduced in either the Java or .Net platforms, both of which were at the time (and for many years afterwards) still being enhanced at a very high rate of innovation.

It should then come as no surprise that this new platform development effort did not yield any business value or useful artifacts, even though by the summer of 2005 it was wasting the time of ~30% of Siebel Systems' engineers, until Oracle executives swiftly killed it upon the acquisition of Siebel Systems in early 2006.

If I were to abstract this whole story of Siebel Systems into a single sentence, I would say it is indeed a classic example of disruption. Neither the first, nor the last, for sure. But it could have been avoided if Tom Siebel hadn't been so arrogant, thinking he knew all there was to know about both business and technology. This caused him to completely miss the opportunity to recognize that the web-based GUI and Software-as-a-Service, sold under a subscription model, were indeed complementary disruptive technologies that had to be groomed together in an autonomous, cross-functional organization, with a different culture, values, processes, and cost structure than Siebel Systems', which would make it capable of finding an emerging market for those technologies. Had he used Sales.com in such a way toward such a goal, he would have avoided the disaster of the Siebel 7 release trying to use a web-based GUI on top of a C++ platform as a sustaining technology. If Sales.com engineers had been given enough business and cultural autonomy in 1999, they would have easily figured out that Java at the time was already a better platform for a web-based GUI, and that the appropriate emerging market for the web-based GUI consisted of small and medium-sized businesses. And by 2003, Sales.com, powered by enhancements in the Java platform and web browser technology, would have enhanced its product enough to fight off Salesforce.com, and to be good enough for some of Siebel's high-end customers to adopt it as well.

After acquiring Siebel Systems for $5.85 billion in early 2006, Oracle continued selling Siebel Systems' flagship product and kept Siebel as a trademark, easily recovering its investment with handsome dividends. But by the summer of 2020, just as the theory in The Innovator's Dilemma predicts, the disruptor (Salesforce.com) has moved up in the previously established (CRM) market (more or less abandoning the low end of the market amid fierce competition), and the vast majority of the incumbent's (Siebel Systems') customers have become the disruptor's customers. Despite two major recessions since its IPO, Salesforce.com has continued growing its revenue exponentially, with still no end in sight. I don't know for sure, but I'm guessing that Marc Benioff, unlike Tom Siebel, had actually read The Innovator's Dilemma!

Saturday, September 14, 2019

Test Driven Chat Bot Development

Hi
Hello, what can I do for you today?
I'd like to know the meaning of life
I'm afraid I did not understand that. Can you, please, rephrase it?
Can you tell me the meaning of life?
I'm afraid I did not understand that. Can you, please, rephrase it?
Forget it
I'm sorry I could not help you. It seems my developers have some more work to do!

Just in case you haven't already noticed, let me state it here: we are living in the age of robots, and, more and more often, less than perfect chat robots (a.k.a. chat bots).

In the fall of 2017, I was about to make myself a hot beverage, exact kind yet to be determined, when my CEO walked into the kitchen.
"So, how was the conference?" I asked, waiting for the water to boil.
"Very exciting, everyone seems to be talking about AI. It's about time we started putting some AI into our product offerings," said the CEO while pouring milk into his tea.
"As a matter of fact, I've acquired quite a bit of knowledge in the area of machine learning. Maybe I can help," I offered while stirring sugar into my coffee.
"Good to know!" said the CEO, and walked away with his tea in hand to yet another sales call.

One small disadvantage of working in a small company is that you have to make your own coffee. One large advantage of working in a small company is that, every now and then, you get to do it while chatting with your CEO.

Several weeks after this serendipitous kitchen chat, the same CEO asked me to start building a chat bot that would be able to take over some of our customers' load in supporting their customers. After a week of intense research, I decided to give Amazon Lex a shot and build a proof-of-concept chat bot using this new technology that Amazon had released to the general public in April 2017.

At this point, I had to make a critical decision: do I treat it as a throw-away prototype and just hack a bunch of code as quickly as I can, and hope it works? Or do I use the best way I know to build high-quality software product, and hope that the code sticks?

I decided to do the latter, i.e., to build a chat bot using the test-driven development methodology. The first obvious question was: is this even possible? After all, I'm building software that talks to real people. It must be pretty hard to write automated tests that simulate a real person and verify that the chat bot is responding with a voice that a real person can understand.

The short answer: yes, it is possible. The longer answer: a chat bot can be developed in a test-driven fashion, but not by using end-to-end tests that simulate a real person. That approach would be doomed from the start, because the feedback loop would be too slow and too unreliable. Instead, one has to take a look at the 3rd-party tools one is using and make sure that the code one writes on top of them is developed using the TDD methodology.

In the case of Amazon Lex, AWS has provided speech-to-text and text-to-speech capabilities, along with a rudimentary framework for driving a linear conversation flow via configuration. These capabilities require only simple configuration and no coding, so TDD does not apply to them; manual testing suffices to ensure that the configuration is done correctly.

However, any non-linear conversation flow in Amazon Lex does require writing Lambda functions, which turns out to be best done in Python, and (I dare say) any Python code can and should be developed using TDD.

Despite the fact that AWS Lambda functions are marketed as serverless magic, where the developer doesn't even know where and how exactly the code runs, it turns out that writing unit tests for Lambda functions in Python is fairly easy: Lambda functions are called by the container using a simple, well-defined interface, and the code has no dependencies on the container itself. So, one can test the Python code implementing a Lambda function by running it locally, e.g., from the PyCharm IDE using Python's unittest module.
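To make this concrete, here is a minimal sketch of the idea (the handler logic, intent names, and event shape are hypothetical, simplified from the Lex event format, and are not the actual bot's code): a Lambda handler is just a function taking an event dict, so it can be exercised directly with Python's unittest module, with no container or AWS account involved.

```python
import unittest

def handler(event, context):
    """A minimal, hypothetical Lex-style fulfillment handler."""
    intent = event.get("currentIntent", {}).get("name")
    if intent == "CheckBalance":
        message = "Your balance is being looked up."
    else:
        message = "Sorry, I did not understand that."
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {"contentType": "PlainText", "content": message},
        }
    }

class HandlerUnitTest(unittest.TestCase):
    def test_known_intent_is_fulfilled(self):
        response = handler({"currentIntent": {"name": "CheckBalance"}}, None)
        self.assertEqual(
            response["dialogAction"]["fulfillmentState"], "Fulfilled")

    def test_unknown_intent_gets_fallback_message(self):
        response = handler({"currentIntent": {"name": "Nonsense"}}, None)
        self.assertIn("did not understand",
                      response["dialogAction"]["message"]["content"])
```

Running `python -m unittest` against the file containing such tests gives sub-second feedback, which is exactly what a TDD loop needs.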

So, TDD of the code driving the flow of conversation on the Amazon Lex platform is pretty straightforward, as long as the code can do everything by itself. However, unless one is developing a purely philosophical chat bot, this code will sooner or later need to connect to some external system, e.g., a payment processing platform.

At this point, TDD gets a bit trickier, because one can never have a precise enough specification of the external system's behavior and its APIs. If one simply assumes the expected behavior and encodes it in the unit tests that then drive the code being developed, one can easily get into a situation where all unit tests happily succeed, but the integration with the external system doesn't work at all.

The way I approach this problem is by writing a minimal set of automated integration tests, which actually call the external system's APIs and pass information in the required direction. I make it very clear from the start that these are integration tests (by using the naming convention I had developed). So, in the example of a payment processing API, I may create one integration test for a successful payment using a credit card, one for a bank account, and several for unsuccessful payment attempts. Once I write just enough production code to have these integration tests passing, I then encode the discovered details of the API's behavior, beyond the available specification, in a separate unit test, which will likely have the same inputs and outputs as the integration test, but will mock the actual calls to the external APIs. Once I have the external APIs mocked, I can add tests that drive the production code to be more robust by covering secondary scenarios and corner cases.
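As a sketch of that second step (the `charge` call, its response fields, and the `pay_bill` function are all hypothetical, not the actual payment API): the unit test uses a mock to encode the response shape that was discovered by running the integration test against the real system.

```python
from unittest.mock import Mock

def pay_bill(payment_client, account_id, amount_cents):
    """Production code under test: charges via an external payment API.

    `payment_client` is a hypothetical wrapper around the real API; the
    integration test passes a live client, while here it is mocked.
    """
    response = payment_client.charge(account_id, amount_cents)
    # Detail discovered via the integration test (assumed for this sketch):
    # the API signals success with status "settled", not an HTTP code.
    if response["status"] == "settled":
        return {"paid": True, "confirmation": response["confirmation_id"]}
    return {"paid": False, "reason": response.get("decline_reason", "unknown")}

# Unit-test double encoding the behavior observed in the integration test.
client = Mock()
client.charge.return_value = {"status": "settled", "confirmation_id": "c-123"}
assert pay_bill(client, "acct-42", 5000) == {"paid": True,
                                             "confirmation": "c-123"}

# A secondary scenario, now cheap to cover because the API is mocked.
client.charge.return_value = {"status": "declined",
                              "decline_reason": "insufficient_funds"}
assert pay_bill(client, "acct-42", 5000)["reason"] == "insufficient_funds"
```

The mock-based test runs in milliseconds and never touches the payment provider, so it can safely live in the fast unit-test suite.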

This step of using integration tests to design the unit tests is the key to effective TDD of any code that integrates with 3rd party APIs: if you skip the integration tests, it's going to be very hard to get the integration to work. But if you keep only the integration tests (which is very tempting), you will be stuck with slow and unreliable tests, which over the long haul will become a black hole sucking in more and more of your development effort. The small overhead of building true unit tests that encode the essence of the external API's behavior, as you extract it from the integration tests, is a very good investment.

Once I have fast and reliable unit tests, I can run them all within a few seconds from the PyCharm IDE, and get instant feedback if any further change to the production code accidentally alters its expected behavior. This allows me to keep the code clean without fear of introducing regressions.

Even though I don't run them often, and they are almost certain to get broken over time, I usually do keep the integration tests I wrote - they tend to be quite useful once in a while, e.g., when an external API makes a backward-incompatible change, or when I need to make a major refactoring that changes internal APIs. I just don't expect them to work without some manual set-up. This is especially true if the integration test verifies some non-idempotent behavior, e.g., recording that a bill is paid.

An interesting challenge to this TDD-ed code base was presented by our decision, some 6 months into development of the bot, to switch from Amazon Lex to IBM Watson Assistant, because some advanced functionality was deemed critical. Because the architectures of the two platforms are quite different, some of the peripheral production code, together with the related unit and integration tests, had to be thrown away. But since the core functionality didn't change, the core code could easily be reused. This is not to say that TDD alone was responsible for the flexible code architecture; rather, the fact that I could incrementally refactor and reorganize the code while keeping it operational, instantaneously detecting any regressions, allowed me to achieve a flexible architecture over time, albeit under constant pressure to demonstrate progress in supported functionality.

The switch to the IBM platform did bring along an interesting challenge: unlike Amazon Lex's simple configuration, IBM Watson Assistant provides a fairly complicated proprietary programming language for managing the conversation flow, without much support for debugging. So, the development team (which in the meantime had grown to 4 developers and a tester) soon found out that TDD-ing only the Python code was no longer enough. Regressions were possible, and were indeed happening, in Watson Assistant's conversation flow.

The team responded by building a simple test automation framework in Python, in which automated tests passed text input to the chat bot API over HTTP and verified the text output returned by the API. This avoided the complication of dealing with voice, while still allowing the development in IBM's proprietary language to be driven by tests. I sometimes call those end-to-end (E2E) tests, even though that's not quite correct, as the conversion of voice to text and back is not covered by them. And sure enough, over Labor Day weekend 2019, IBM managed to break the Speech-To-Text component in production, which our E2E tests could not detect. But, at the end of the day, this breakage had nothing to do with the code my team had produced, so having our tests detect it would not have helped much.
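A minimal sketch of what such a framework boils down to (the endpoint, the bot's replies, and all names here are invented for illustration): the transport is injected as a callable, so the same assertion code can run against the live HTTP endpoint or, as in this self-contained example, a local fake.

```python
class ChatBotTextTest:
    """Sends text to the chat bot and checks the text it returns.

    `send` is any callable taking an utterance and returning the bot's
    reply; in real runs it would POST to the bot's HTTP API, while in
    this demo it is a local fake.
    """
    def __init__(self, send):
        self.send = send

    def expect_reply_contains(self, utterance, expected_fragment):
        reply = self.send(utterance)
        assert expected_fragment in reply, (
            f"bot replied {reply!r}, expected {expected_fragment!r} in it")
        return reply

# A fake bot standing in for the HTTP call in this example.
def fake_bot(utterance):
    if "balance" in utterance.lower():
        return "Your current balance is $12.50."
    return "Sorry, I did not understand that."

test = ChatBotTextTest(fake_bot)
test.expect_reply_contains("What is my balance?", "$12.50")
test.expect_reply_contains("gibberish", "did not understand")
```

Because only text goes over the wire, these tests stay an order of magnitude faster and more reliable than anything involving speech synthesis and recognition.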

Regardless of whether you call them E2E or integration tests, they unfortunately have all the negative properties of E2E tests: they are slow and brittle. That's why they are best developed using TDD, because it implies white-box testing and (by developers' inertia) minimizes the number of tests. Give a dedicated tester the task of creating a suite of such tests in isolation, after the bot has already been developed, and the tester's inevitable treatment of the chat bot as a black box will ensure that you end up with a pile of tests that are impossible to maintain.

Over time, through extensive retrospection, my team discovered that such E2E tests can also be used to collaborate on the specification of the functionality to be built, which is much more effective than any other specification format. This has led us towards adopting Specification by Example.

Tuesday, February 2, 2016

Code Review or Pair Programming?

I've been taking a few programming courses on Coursera lately. In case you're not familiar with Coursera, it allows tens of thousands of people to simultaneously take university-level courses for free (on-line, of course).

In particular, I've been learning the programming language R and its application in machine learning. Naturally, there are a number of programming exercises used to reinforce the student's learning by allowing the theory from the lectures to meet the reality of actual code. Most of them provide immediate feedback by running fully automated tests against the student's code. However, there are a few that rely on peer review.

At first, I was quite skeptical of the idea that students, who are still learning the subject matter, are asked to grade and provide feedback on other students' work. Usually that's a job for the experts, e.g., the university's teaching assistants. Plus, from my experience in the software industry, peer reviews are typically very bureaucratic and add very little value; in other words, a waste of time.

As I went through several rounds of peer reviews, both reviewing other people's work and reading other people's feedback on my own, I started realizing I had been completely mistaken. It turns out that the opportunity to review other people's solutions to the same problem that I had already independently solved is actually an opportunity to learn, and to connect the dots that may have been left unconnected by simply listening to the lectures and doing the homework and practical exercises on my own.

How come, I wondered, does this peer review mechanism work so well in the global, massive open on-line learning environment, yet utterly fail to deliver value in the software engineering workplace? Then it dawned on me: what the students are doing on Coursera in the form of peer reviews is actually much closer to pair programming than peer reviews in the workplace are!

The key constraint on Coursera is that peer reviewers are only allowed to review their peers' work after they submit their own work for review. That means they are already intimately familiar with the problem at hand, as well as with at least one solution to it. This pre-condition enables the reviewer to easily understand the intricacies of an individual approach, recognize important differences, and ultimately benefit from the exchange of ideas and experiences that ensues.

When one compares this process to the typical peer review process in the workplace, where reviewers are typically familiar with neither the problem being solved nor the solution space, it becomes clear that Coursera's peer review is actually a surrogate pair programming practice, merely compensating for the spatial and temporal distances of the global learning community.

Monday, May 4, 2015

Why xUnit Should Be Called xTest

JUnit is one of the simplest, yet also perhaps the most effective of all the software frameworks I've ever seen: it solves a very specific class of problems (writing and executing automated tests in Java) very well, with very little code and very little ceremony, which is especially true of version 4, which takes great advantage of Java annotations. Consequently, I use it whenever I'm writing Java code using Test Driven Development (TDD) method.
There is one big problem with JUnit, however, that keeps popping up over and over again: its name.
Every now and then, I will see Joe the developer broadcasting an e-mail to explain that Mary from the other group broke several unit tests of Joe's component by changing behavior of her component that Joe's component depends on. This is typically followed by a comment by Bob saying that he's very well aware of Mary's unfortunate change, because it also broke five unit tests of Bob's component.
What's going on here is that a typical developer these days seems to think that the very fact that a test is written using JUnit, or any other xUnit framework (where x is typically substituted by the first letter, or an acronym, of the programming language that the framework supports), automatically makes that test a "unit test".
The majority of automated tests written using JUnit or any other xUnit testing framework that I've had the opportunity to see are in reality integration tests, which test not only the unit that the author is interested in testing (typically a single class, or a cluster of tightly coupled classes), but also every bit of code (including 3rd party libraries, services and systems) that this unit happens to depend on. That's probably because xUnit tests become integration tests by mere inertia, while making them unit tests requires a focused effort to mock out dependencies.
Why does incorrectly categorizing such tests matter? Because the properties of unit and integration tests are extremely different! For example, by calling the JUnit-based test you've written a "unit test", you are implicitly signalling to other developers that it can be run quickly (in a matter of milliseconds) and reliably, without impacting any database or other system. But if that test turns out to be an integration test, these implicit assumptions are going to be violated and cause a lot of grief, potentially not just to the person who ran the test (e.g., by having to wait 30 minutes for it to finish), but also to the people who depend on this test (e.g., by corrupting their database, and thus conflicting with the automated tests they've grown accustomed to depending on).
Over the years, I've adopted a simple but effective naming scheme that minimizes confusion about what kind of test a particular xUnit class implements: if the class is a collection of true unit tests, then it has the suffix "UnitTest", and if it's a collection of integration tests, then its suffix is "IntegrationTest". This does make the test class names a bit longer than the usual xUnit suffix "Test" does, but it takes the guesswork out of the question. It also allows automated running of only unit tests, or of all automated tests, depending on their type, by simple pattern matching.
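The idea is language-independent; here is a Python sketch of it (the class names and the `select_tests` helper are invented for illustration, not from any build tool): because "UnitTest" and "IntegrationTest" are distinct suffixes, a trivial pattern match is enough to pick out one category, while the plain suffix "Test" still matches everything.

```python
import re
import unittest

class ParserUnitTest(unittest.TestCase):
    """Fast, isolated tests: safe to run anywhere, anytime."""
    def test_sorting_is_stable_for_small_input(self):
        self.assertEqual(sorted([2, 1, 3]), [1, 2, 3])

class ParserIntegrationTest(unittest.TestCase):
    """Would talk to a real database; slow and environment-dependent."""
    def test_round_trip_through_database(self):
        pass  # placeholder: real version needs a configured database

def select_tests(classes, suffix):
    """Pick only the test classes whose names end with the given suffix."""
    return [cls for cls in classes if re.search(suffix + r"$", cls.__name__)]

all_classes = [ParserUnitTest, ParserIntegrationTest]
assert select_tests(all_classes, "UnitTest") == [ParserUnitTest]
assert select_tests(all_classes, "Test") == all_classes
```

The same suffix matching works in a Maven surefire/failsafe configuration or any other test runner that filters classes by name pattern.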
Perhaps the biggest advantage of this approach, when applied with discipline, is that it forces me to think about the type of the test I'm about to write before I write a single line.
If I have the choice, I will almost always write a unit test, because it's always going to be faster, more reliable and easier to maintain than an integration test of the same functionality. But when the code I want to test depends on a library or framework that's hard to replace with a testing dummy or a mock, an integration test may just be the right kind of test to write.
Recently, I ran into such a case when using the Apache Velocity templating engine, which allows adding a few lines of its (quite awkward) proprietary scripting language to an HTML page in order to populate it with data. This is accompanied by a few lines of Java code that provide the data to the Velocity framework. While it might even be possible to mock Velocity's Java APIs using an advanced mocking technology, that kind of unit test would not be particularly valuable, as it would not test the script code embedded in the HTML. So, in this case, I decided to create a class that hides the presence of the Velocity library from the rest of the system, and then write an integration test for this class, which tested its implementation together with its interactions with Velocity and the script embedded in the HTML. It did not have to use any database, but it did end up reading and writing files on the local file system. I used the HtmlUnit testing framework to verify that the output HTML was as expected for a simplified, but still non-trivial, set of inputs.
This testing strategy was useful because it addressed the highest risk of the component being tested: ensuring that all the pieces of the puzzle, each simple enough in its own right, fit together and produce the expected output. This integration test was definitely not as fast as a unit test: one full run would take up to a minute. It was also harder to write and maintain than a unit test: having a non-trivial set of input data meant there was a non-trivial set of outputs, which required a lot of assertions in the test. But a minute-long feedback loop was still useful enough to allow repeated incremental execution, and this integration test was still an order of magnitude faster than an equivalent end-to-end test would have been. It was also much easier to write and maintain, since the input data set was still an order of magnitude simpler than it would have been in an end-to-end test (where the input data set for this component would have had the full complexity of a real-world system).
This integration test paid off very quickly, when it allowed me to detect several issues, always caused by a single missing or misplaced character (did I mention that the syntax of the Velocity scripting language is really awkward?). Being able to quickly detect such an issue, see the error message produced by the Velocity framework, and iterate to find the fix was priceless. If I had been depending on a unit test, I would never have noticed the problem in the first place.
And if I had relied solely on an end-to-end test, it could have easily taken several days, instead of minutes, to find the right fix and verify it.
In conclusion, integration tests lie between unit tests on one end of the test automation spectrum and end-to-end tests on the other. Most of the time, they provide the least bang for the buck, and developers are better off covering their code with unit tests and addressing the highest project risks with a few well-chosen end-to-end tests. But there are some special circumstances where integration tests may be the most cost-effective choice. The key is to be clear about what kind of automated test one is writing, and to make that decision consciously, by assessing the project risks and associated constraints before actually starting to write the test. Following a naming scheme like the one I've described above helps to enforce that, and to avoid the trap of automatically considering every test written using an xUnit framework a unit test.

Monday, March 2, 2015

Cucumber Tip: Dry Run

Working with Cucumber, you run scenarios all the time. Typically, you care about the results of a set of scenarios, so you want Cucumber to execute each matched scenario step by step and report the results of this execution.

However, there are a few tasks that come up quite regularly where actually executing the step definitions is not necessary. One example: you'd like to find all scenarios with a particular tag. A second example: you'd like to find the location of the step definition that matches a particular step. Yet another: you'd like to verify that a step definition's regular expression matches the text in the scenario you're working on.

When working on such tasks, you can save a lot of time by using the dryRun option in the Cucumber runner. It instructs Cucumber to find the scenarios to run by matching the given tags, just like it does during a normal run, but then to skip each step, while still passing the step info to the formatters, again just like in a normal run.

Using Cucumber-JVM, you can get Cucumber to do a dry run by setting the dryRun parameter in the CucumberOptions annotation, like this:

import org.junit.runner.RunWith;

import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;

/**
 * Executes only a dry run of released scenarios!
 *
 * @author Sanjin Tulac
 */

@RunWith(Cucumber.class)
@CucumberOptions(
    dryRun = true,
    tags = { "@released" }, 
    monochrome = true, 
    format = { "pretty" },
    glue = { "com.tulac.stepDefs" },
    features = { "src/main/resources/com/tulac/features" })

public class DryRunOfReleasedScenarios {}

In this example, I'm using the "pretty" formatter to list all the matched scenarios and their step definitions, including their locations. This can be very useful in case your IDE does not support finding matching step defs on its own.

To give you a sense of the output without too many details, here's a partially shortened example of a dry run output for a simple secondary scenario:

Feature: ... (the rest of feature name is listed here)
  
  This feature ... (the rest of feature description is listed here)

  Background:                                 # campaignApis.feature:6
    Given subscription with ReST APIs enabled # FeatureStepDefs.subscription_with_ReST_APIs_enabled()

  @released ... (the rest of scenario tags are listed here)
  Scenario: asking a trigger campaign to process more than 100 leads results in a 1003 # campaignApis.feature:27
    Given a brand new global static list "101 leads"                                   # ListStepDefs.a_brand_new_global_static_list(String)
    And 101 leads in the list "101 leads"                                              # ImportLeadStepDefs.leads_are_imported_into_list(Integer,String)
    Then deleting leads in list "101 leads" via a trigger campaign fails with "1003"   # CampaignStepDefs.deleting_leads_in_list_via_a_trigger_campaign_fails_with(String,String)

1 Scenarios (1 skipped)
4 Steps (4 skipped)
0m0.000s

Notice that both the background and the scenario steps have comments (starting with '#') listing the file and function name of the matching step def. Similarly, the feature file that contains the matching scenario, and the line number at which it begins, are listed as a comment for each scenario and its associated background.

In conclusion, the dryRun option can save you a lot of time when running Cucumber scenarios by skipping all matched step defs while still producing the specified reports, so remember to use it when appropriate!

Tuesday, February 24, 2015

Jenkins Plug-In Development: 10 Lessons Learned

Jenkins is by far the most popular continuous integration server around, with hundreds of plug-ins extending its core functionality, all of it license-free and open-sourced. Sooner or later, however, one encounters some needed functionality that is not readily available, at which point it's time to build a new Jenkins plug-in. Recently, I spent several weeks developing such a plug-in, and despite the many tutorials available (e.g., by the Jenkins team, Miel Donkers, Anthony Dahanne), this experience taught me several lessons that were anything but obvious from the tutorials, which I'd like to share here with you.

1. Plan to spend at least a few weeks

Even though you likely won't need to write a lot of Jenkins-specific glue code, it's going to take quite a bit of experimentation to get it right. Jenkins provides hundreds of different, scarcely documented ways for your code to plug into it, and figuring out the best one is going to take some time, so don't hope to be done in a day or two!

2. Learn from the source code of existing plug-ins

The Jenkins code base is huge, fairly complex, and neither particularly well organized nor well documented. But there are several hundred existing plug-ins available as open source software. So, the best way to minimize your time to market is to pick one or more existing plug-ins that do something similar to what your plug-in should do, and try to understand how they do it by studying their source code before you start writing your own.

3. Pick an appropriate base version of Jenkins

Once you're ready to start writing your own plug-in code, you'll need to pick a version of Jenkins on which to base it. This is an important decision, so don't take it lightly. Your plug-in will not be able to run on Jenkins instances running any version older than the one you pick as your base, so you may want to be conservative and pick a fairly old version. On the other hand, new plug-in APIs are constantly being added, so if you pick too old a version, you won't be able to take advantage of them. Also, old Jenkins APIs are constantly being deprecated and replaced by new ones, so basing your plug-in on a version that's too old may leave you using deprecated APIs. This in turn will make your plug-in much harder to maintain, should it need to upgrade to a newer Jenkins base version to take advantage of new features. Your target installed base of Jenkins servers definitely needs to be factored into this trade-off. stats.jenkins-ci.org shows the cumulative view of the installed base for each published Jenkins version, but your target set of Jenkins servers may have a different version distribution.

4. Consider 3rd party dependencies and their versions

Jenkins depends on a significant set of 3rd party open source Java libraries (e.g., JFreeChart), which get deployed as part of the Jenkins WAR file. Should your plug-in want to use some of these 3rd party libraries, the simplest way to make it work is to use the same version that Jenkins depends on. In this case, your pom.xml file should not explicitly declare such a dependency, but rather inherit it from the parent pom.xml. Failing to do so will result in run-time exceptions, as the version of the 3rd party library actually used will be the one specified and deployed by Jenkins, unless you override the class loader used by your plug-in by following these instructions.

5. Be conservative when choosing target Java version

Jenkins is written entirely in Java, so the plug-in code you write will also need to be in Java. Jenkins executes all of its plug-in code in the same JVM in which it runs, which means that if Jenkins runs in a JVM whose major version is lower than the major version of Java your plug-in code targets, the JVM will throw run-time exceptions instead of executing your code. So, if you're targeting the general installed base of Jenkins servers, you'll want to forgo the bells and whistles of the latest version of Java and use the lowest version still officially supported.

6. Develop and test in short iterations

Due to the scarce documentation, a lot of your development time will be lost on experimentation, e.g., figuring out exactly which directory to put your Jelly files in. To minimize development time, be disciplined in sticking to the shortest development iterations possible. In other words, try to make only a single change between any two test runs of your glue code and configuration. Only this way will you be able to catch any regressions as soon as they happen, and avoid costly "ghost busting".

7. Test plug-in code using hpi:run Maven target

Short iterations can be quite expensive if you're constantly uploading the latest version of your plug-in to a stand-alone Jenkins server by hand, even if it's running on your machine. Use the Maven target hpi:run, provided by the Jenkins plug-in pom.xml, to minimize the overhead of testing the latest version. It automatically deploys the latest version of the plug-in you're developing on a freshly started instance of your base Jenkins version.

8. Identify your plug-in with a precise version number

Jenkins APIs allow plug-in code to write to standard output, which is captured as part of the job's console output. It's a good idea to identify any output from your plug-in with a prefix of the plug-in name in angle brackets, so that there's no confusion about which output is coming from your plug-in, and which from Jenkins or some other plug-in. Also, identifying the exact version of the plug-in (which you increment in each short iteration) will turn out to be very useful in case some deployment issue prevents the latest version of the plug-in from being deployed. Such deployment issues are quite common, so this disciplined approach is likely to pay off handsomely sooner rather than later.

9. Use java.util.logging

Console output is useful for giving general status information to the end user, but it does not scale to debugging information, where you need the ability to choose the level of detail of the debugging data displayed. Thus, for debugging purposes, your plug-in code needs to use the java.util.logging APIs, which are the only logging APIs supported by Jenkins. So, make sure you're using these APIs from the start; otherwise you'll be in for an ugly surprise when the alternative logging APIs you may be used to working with turn out to be disabled by Jenkins.

10. Loosely couple plug-in code and the main code

Chances are that beyond simply plugging into Jenkins, your plug-in will need to implement some logic and/or computations. Testing such code using manual testing cycles that include restarting Jenkins will cause you to waste a lot of time unnecessarily. And while this manual testing is pretty much necessary for the glue code, because Jenkins APIs do not allow writing simple unit tests for it, this limitation should not extend to the core of your code just because the glue code calls it. Instead, make sure you strictly separate the glue code, which depends on Jenkins, from the core of your plug-in. The glue code should depend on both Jenkins and the core code, but the core code should depend on neither Jenkins nor the glue code. Such loose coupling will allow you to write comprehensive unit tests for your core code, and thus speed up your development cycle (provided, of course, you're practicing Test-Driven Development (TDD), which I strongly recommend).
In other words, the limited testability of Jenkins APIs should not prevent you from developing the vast majority of your plug-in in short iterations, starting with an automated unit test whose execution does not require starting a Jenkins server. As a secondary benefit, porting your plug-in to another platform (should the need arise in the future) will be much easier than if all the plug-in code were tightly coupled with Jenkins APIs.
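A Jenkins plug-in is of course written in Java, but the dependency direction is language-independent, so here is the glue/core split as a tiny Python sketch (every name in it is invented for illustration): the core is pure logic with no platform imports, while the glue is the only layer that would touch the host's APIs, received here as an injected logger.

```python
# "core" layer: no platform imports; fully unit-testable in isolation.
def summarize_builds(durations_ms):
    """Pure logic: compute the stats the plug-in will display."""
    if not durations_ms:
        return {"count": 0, "avg_ms": 0}
    return {"count": len(durations_ms),
            "avg_ms": sum(durations_ms) / len(durations_ms)}

# "glue" layer: the only code that would depend on the host platform.
class PluginGlue:
    def __init__(self, host_logger):
        self.log = host_logger  # stands in for the platform's logging API

    def on_builds_finished(self, durations_ms):
        stats = summarize_builds(durations_ms)
        # Prefixed, versioned output, per lesson 8 above.
        self.log(f"[my-plugin 1.0.3] {stats['count']} builds, "
                 f"avg {stats['avg_ms']:.0f} ms")
        return stats

# The glue can be exercised without any server: inject a list as the logger.
messages = []
glue = PluginGlue(messages.append)
assert glue.on_builds_finished([100, 200, 300]) == {"count": 3,
                                                    "avg_ms": 200.0}
assert messages[0].startswith("[my-plugin 1.0.3]")
```

The core function can be driven entirely by fast unit tests, and even the glue needs only a trivial test double, leaving the slow restart-Jenkins cycle for the last mile of integration.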