Are Scenario Tests Worth It?

Summary

In my last article, I talked about the refactoring work I did in the previous week. This week, I talk about my progress on adding scenario tests for the nested container scenarios.

Introduction

When I am doing any action that is even remotely technical, I think about three things in quick succession. What is the cost of doing that action? What is the benefit of doing that action? What are the risks of doing that action?

Those questions have been drilled into my brain for years, and they form the basis of what is known as Risk-Cost-Benefit Analysis. A quick search with any search engine turns up multiple hits for articles on the subject, with more hits for the simplified Cost-Benefit Analysis.

As a Software Development Engineer in Test, one of my primary tasks is to do these analyses and use automation to mitigate the risks where possible. Therefore, it should not be a surprise that when I am working on my own projects, these questions are ones that I ask myself. Especially after my efforts during the last week, I believe that I need a satisfactory answer to those questions regarding the addition of more scenario tests to the PyMarkdown project.

What Are Scenario Tests?

While there are many definitions of what scenario tests are, the simplest answer is the one that I give to people who ask about my work. It starts with user stories.

A user story walks through the actions that a specific user takes to accomplish a given goal. A good example of a user story is “Fred the manager logs on to the website and requests an activity report for their direct reports.” That user story is good because it contains useful information on the user and what they want to accomplish.

From a project level, user stories are great. They communicate the intent and goal of a set of actions that are typical of that user. But those same stories lack enough specificity for a software developer to act on. That is where scenarios come in. A scenario is a constrained action that accomplishes a specific goal. My general rule is that a good scenario avoids the word “and” where possible. Therefore, breaking down the user story from above, I include scenarios such as “The user logs on to the website,” “Manager requests information on direct reports,” and “Manager requests activity report.” Each one of these scenarios is integral to the user story, and together they spell out how the application satisfies that user story.

From there, the jump from a scenario to a scenario test is a simple one. A scenario test is just a test that clearly focuses on that one scenario. If possible, interactions with any other scenarios are removed and the focus is solely on that one scenario. The success and usefulness of any scenario test is related to what kind of scenarios it covers and how frequently it needs modification for unrelated changes. If it is focused enough on the scenario, those modifications are usually minimal.
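As a minimal sketch of that jump, here is what a scenario test for “The user logs on to the website” might look like in Python. The `FakeSite` class and its `log_on` method are hypothetical stand-ins invented for this illustration; they are not part of any real application or of the PyMarkdown project.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSite:
    """Hypothetical stand-in for the website in the user story."""

    passwords: dict
    sessions: set = field(default_factory=set)

    def log_on(self, user: str, password: str) -> bool:
        """Record a session and return True only if the password matches."""
        if self.passwords.get(user) == password:
            self.sessions.add(user)
            return True
        return False


def test_user_logs_on_to_website() -> None:
    """Scenario: "The user logs on to the website." Nothing else."""
    # Arrange: a site that knows about exactly one user.
    site = FakeSite(passwords={"fred": "s3cret"})

    # Act: perform the single action that the scenario describes.
    logged_on = site.log_on("fred", "s3cret")

    # Assert: only the outcome of that one scenario is checked.
    assert logged_on
    assert "fred" in site.sessions
```

The test says nothing about direct reports or activity reports. Those belong to the other scenarios, so a change to how reports are generated should not force a change to this test.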

Why Are They Important To The Project?

In a web application or a desktop application, the generation of scenarios deals more with the user interacting with the application than with the various parts of the application itself. The basis for that assumption is that those applications are primarily created to interact with the end user to provide a desired result. Therefore, it is that interaction that is central to the scenarios that will make that application a success.

For backend applications such as the PyMarkdown project, the focus is still on the user interaction. However, that user interaction takes place using files or payloads to be parsed or acted upon. Accordingly, the focus is placed on the input that is presented to that application on behalf of the user. For the PyMarkdown project, that input is in the form of Markdown files which have a clearly defined specification that must be adhered to.

The PyMarkdown Linter is a linting rules engine built on top of a GitHub Flavored Markdown compliant parser. The starting point for the scenario tests was the specification itself. However, since the specification focuses on HTML output and the parser focuses on Markdown itself, I felt that expanding the testing effort to include other Markdown inputs was warranted. As each different input is a slightly different way to “phrase” the Markdown document, I felt that associating each input with a scenario and a scenario test was appropriate.

And while I would love for there to be no issues at all with the PyMarkdown project, I am still finding scenario test failures that I need to deal with.

The Work

When it comes to those scenario test failures, I have three buckets that I file those failures into. The first bucket is that the application aborts, either from an assert statement or any other exception that is thrown. This bucket is a high priority as it will stop the application from processing anything else in the document. That, and it also looks bad to the end user. The second bucket is for parsing errors that result in an incorrect Markdown token stream being generated. These errors are caught when generating HTML from the tokens as a double-check. This bucket is a medium priority: these errors are very visible to the end user, so they look bad and feel bad, but they are typically low impact. Finally, the third bucket is reserved for whitespace errors that affect the tokens themselves, but only in a minor way. The tests catch these errors when generating the original Markdown from the Markdown tokens. This bucket is low priority because while there are rules that are dependent on whitespace in tokens, there are only a handful of them.
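The three buckets can be sketched as a small triage function. This is only an illustration of the ordering described above; the `Bucket` enum, the `triage` function, and its three boolean parameters are names invented here and are not taken from the PyMarkdown test code.

```python
from enum import Enum
from typing import Optional


class Bucket(Enum):
    """Failure buckets, with the priority each one is given."""

    ABORT = "high"       # assert statement or any other exception
    PARSING = "medium"   # wrong token stream, caught by the HTML check
    WHITESPACE = "low"   # caught by regenerating the original Markdown


def triage(
    raised: bool, html_matches: bool, markdown_round_trips: bool
) -> Optional[Bucket]:
    """Return the bucket for a failing scenario test, or None if it passes.

    The checks run from the most severe failure to the least severe, so a
    test that fails in more than one way lands in its highest-priority bucket.
    """
    if raised:
        return Bucket.ABORT
    if not html_matches:
        return Bucket.PARSING
    if not markdown_round_trips:
        return Bucket.WHITESPACE
    return None
```

For example, `triage(False, True, False)` returns `Bucket.WHITESPACE`: nothing was thrown and the HTML was correct, but the regenerated Markdown did not match the original document.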

Getting back to my work on the project, I started adding a new set of scenario tests to the project. After four or five days of work, I had 224 new scenario tests implemented and committed, with only nine tests failing. Compared to the work I did in the spring, where I had over fifty tests failing, I was happy with only having to fix nine tests in two groups. Honestly, there is a third group that I need to look at more closely, but I will get to double-checking those results after I deal with the first two groups.

The good news? Only two of the failing tests were due to parsing errors; the rest all dealt with whitespace errors. I found no crashing scenarios and only a couple of parsing scenarios, with fewer than ten whitespace scenarios to fix. That was a good result.

But should I fix them? That is the question that I started to ask myself.

The Risk

Of all three questions, this was the easiest one for me to answer. The risk of not having a test for a given scenario is that a user decides not to use the project for their needs. However, that risk is balanced out by the frequency of a given scenario failing in normal use.

Applying those metrics to these scenarios, I would like to reduce the risk if possible. But nothing that I found made me feel like I had to stop all work and fix those scenarios right now. If I make steady progress toward mitigating the risk of all three-level nested container scenarios, I am good. I would feel better if I had more diversity in the scenarios that I am missing, and I need to be able to factor that into my risk analysis. More on that later.

The Cost

The next question for me to answer was the cost. Based on my experience of adding those 224 new scenario tests, I know that a similar group will take me approximately 11 or 12 hours to complete. That time is not wall-clock time, but active task time. It does not include any breaks that I take to do work around the house or to relax between sets of scenario tests. That time is spent following a recipe that I determined beforehand. For this latest group of scenario tests, that meant taking each existing scenario test and creating three (or four) additional tests where any indentation for the containers was removed on the last line of the Markdown.
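To put that estimate in per-test terms, here is a quick back-of-the-envelope calculation. It assumes that the 11 to 12 hours of task time covers a group of the same size as the 224 tests just added; the `minutes_per_test` helper is a name made up for this sketch.

```python
TESTS_ADDED = 224  # scenario tests added in the group described above


def minutes_per_test(task_hours: float, tests: int = TESTS_ADDED) -> float:
    """Average active task minutes spent per scenario test."""
    return task_hours * 60 / tests


for hours in (11, 12):
    print(f"{hours} hours -> {minutes_per_test(hours):.1f} minutes per test")
# 11 hours -> 2.9 minutes per test
# 12 hours -> 3.2 minutes per test
```

Under that assumption, each new scenario test costs roughly three minutes of focused work, which includes creating the variation and checking its output.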

Another part of the cost is the mental fatigue and boredom. The task is not suited for automation, meaning the generation of the scenarios must be done manually. For this past week, that meant taking over seventy existing scenario tests and performing that transformation on each test. That also meant verifying the HTML output against the canonical commonmark.js parser and fixing any typing errors as I went. It was boring, but it needed to be done.

There is also the cost of not adding other scenario tests or addressing other issues. This is often referred to as opportunity cost: there may be other issues that I could be working on that would solidify the application more. That one is harder to quantify, so I keep it in mind as a bit of a tiebreaker.

The Benefit

The final question that I need to answer is about the benefit of adding more scenario coverage. For me personally, this is more difficult to gauge. When I am using a tool like this, I expect three major things to be in place: a decent application, a decent issues process, and a decently responsive application team. I do not expect applications to be perfect, but the application needs to show me that the team cares and is being honest about what they believe they can accomplish. An “everything working fine” solution that is filled with issues is a big turn-off. However, the same application with a “we are working to get this to a better application” sign is perfectly acceptable. For me, it is all about setting reasonable expectations.

I have a decent issues process in place and try to be responsive to any users who file issues. Based on that, I hope that I do not need to worry about disappointing any users in that way. Focusing on the application and its expectations, I am also in decent shape there. I believe I have decent documentation and a nice section near the top of the main page reading:

Note

This project is currently in beta, and some of these documented things may not work 100% as advertised until after the final release. However, everything should be close enough to done that if you find something missing, please let us know.

I do know that users have asked for improvements and pointed out issues that they have asked to be fixed, so I have confidence that I am setting the expectations with my users correctly.

That leaves the determination of value of the benefit in my court. I will have to think about that some more.

The Result

To summarize, from a risk point of view, the amount of risk associated with hitting significant issues in the remaining “nested three container” scenarios is medium-to-low. From a cost point of view, I am confident that adding another one of the “nested three container” scenario groups is going to take approximately 12 hours of task time to complete. I am also aware that I am going to get bored with that process, and that I am going to have to take extra breaks to make sure I stay on top of my game. As for the benefit, I am not sure where I am going to land on that, but working through this exercise has helped me out a bit.

I know that there are more than thirty issues in the old issues list, with some more urgent issues in the current issues list. As issues in the current issues list are suggestions from users, I feel that any user issues must have a higher priority than the ones I entered. From that point of view, there is an opportunity cost that I am paying in not getting those issues dealt with. I just do not know how to weigh any interest in those issues.

To balance that out, I also feel that addressing three groups of scenarios in the “nested three container” scenarios group would help me reduce my estimated risk from medium-to-low to low. The first group of these scenarios revolves around the whitespace before list items. For example, does this Markdown parse correctly:

   1. line one
1. line two
  1. line three

Those scenarios, an add-on to the work that I just completed, would increase my confidence that I have the list item support working properly in the parser. The other two groups of scenarios are variations on:

> 1. >

and

1. > 1.

For me, replicating my recent work (with the addition of the above group) on those two sets of scenarios would give me confidence that I have addressed the highest risk scenarios.

Reducing the risk that I have missed something is worth the cost, as long as I can still respond to user requests. I think.

And That Means…

So given all that hand-waving… I am going to think about this for a week or two as I try to fix the issues that came up during the last week. That will allow me to make progress that I know will bring the project benefit while giving me some more time to figure out how I feel about the above decision.

There is lots of work to do, and I do not have any pressure to make a speedy decision, just a good decision.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
