Summary

In my last article, I talked about the first Project Summarizer plugin that I created as part of my PyLint Utilities project. In this article, I talk about the changes that I incorporated into both projects to get their code coverage percentages up to 100%.

Introduction

From past articles, longtime readers will understand that I view quality, and the metrics that help me understand that quality, on a sliding scale. One of the most basic and useful metrics that I use is tracking the code coverage of the tests that are in place for the components being tested. To be clear, I agree with people who argue that code coverage is not the be-all and end-all of quality. But I also argue very confidently that it is not a metric to be easily dismissed.

Code coverage does not measure whether a project is working properly. It measures whether each line of code has been executed at least once. A project can have 100% code coverage and still fail to meet its intended goal. From my point of view, that is where scenario tests come in. And whether those tests paint a complete picture of how the team envisions the project being used is a trickier thing to measure. That is why many teams construct individual acceptance criteria for each piece of work or maintain a collection of rules called a Definition of Done.

To be honest, I have not seen any automated way to review either of these. As a professional, I can create scenario tests that exercise these two concepts with respect to what I am testing, but those scenario tests need human review to determine whether they are doing what they are supposed to do. Still, by combining the exactness of code coverage with the observable scenario tests, I believe it is possible to get a product that is well-tested and that does what it is supposed to do.

And most of the time, getting there is half the fun.

Slight Adjustments to Project Summarizer

To start off, before I made the adjustments needed to deal with this new plugin, I had confidence that the scenario tests were testing the right things and I knew that the code coverage was at one hundred percent. From my viewpoint, it was a well-tested project that was doing what I needed it to do. That also meant I wanted to keep those confidence levels where they were going forward.

Cue the hard work!

Nice Side Effect - Finding A Setup Issue

At the start of testing the new plugin, I ran into an issue right away: I could not execute the Project Summarizer from its package. I tried executing it locally from source, and there were no problems, but when I looked inside the local package that I was using, some of the files were missing.

After a bit of work, I found myself looking at these lines in the setup.py module:

PACKAGE_MODULES = [
    "project_summarizer",
]

Looking at the previously uploaded package, everything was fine, so I knew something had changed. Examining the contents of the new package more closely, I noticed that only the files that I had refactored into their own directories were missing. Given that observation, I tried this change:

PACKAGE_MODULES = [
    "project_summarizer",
    "project_summarizer.plugin_manager",
    "project_summarizer.plugins",
]

And I was greeted with success! I was able to execute the Project Summarizer project without any issues. Talk about a bit of a testing hole that I need to address in the future!
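
For reference, here is a minimal sketch of how a list like that typically feeds into the setup call; the exact keyword arguments in the real setup.py may differ:

from setuptools import setup

PACKAGE_MODULES = [
    "project_summarizer",
    "project_summarizer.plugin_manager",
    "project_summarizer.plugins",
]

setup(
    name="project_summarizer",
    # Each subpackage must be listed here, or its files are left out of the build.
    packages=PACKAGE_MODULES,
)

An alternative worth noting is setuptools.find_packages(), which discovers subpackages automatically and avoids this class of mistake, at the cost of being less explicit about what ships in the package.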

Debugging The Dictionary Issue

One command line argument that I have in other projects, but had not added to this project, was the --stack-trace argument. It is a simple argument that instructs the error handling to also print out a stack trace. It is not useful in everyday use, but when I need a stack trace of a failure, it is invaluable.

And this was the case with the new plugin and the save_summary_file function. When the exception occurred, it reported that it was having issues generating a report, but nothing more specific. That was intentional and by design. When things fail, I do not want users getting cryptic error messages that confuse them. As such, I keep the error messages as generic and easy to read as possible.
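
The general pattern looks something like the following minimal sketch; it is not the actual Project Summarizer code, and the handler and message wording are assumptions on my part:

import argparse
import sys
import traceback

parser = argparse.ArgumentParser()
parser.add_argument(
    "--stack-trace",
    dest="show_stack_trace",
    action="store_true",
    default=False,
    help="if an error occurs, print out the stack trace of the error",
)

def report_error(args, caught_exception):
    # Everyday behavior: a short, generic, readable message.
    print("Unable to generate the summary report.", file=sys.stderr)
    # Debugging behavior: only print the full stack trace when asked for it.
    if args.show_stack_trace:
        traceback.print_exception(
            type(caught_exception), caught_exception, caught_exception.__traceback__
        )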

But in this case, I needed to debug the issue, and I just wanted something simple. Remembering that I have the --stack-trace argument in other projects, I quickly added it to the Project Summarizer project, where it immediately pointed out that the issue was with save_summary_file and that the function was being passed a dictionary object to save. As the object to save was already a dictionary, calling to_dict on it was causing the error. A quick fix and some added tests, and that issue was cleared up.
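
The fix itself was small. Something along these lines, with the helper name being my own rather than the project's:

def _as_dictionary(summary_object):
    # The caller may hand us either a plain dictionary or an object that
    # knows how to convert itself; only call to_dict in the second case.
    if isinstance(summary_object, dict):
        return summary_object
    return summary_object.to_dict()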

While I hope to not need that argument again, I now know it is there in case I need it for future debugging sessions.

Easy Is Not Always Easy

When I designed the plugin architecture, I wanted something that was simple and easy to use. I did not anticipate any large manipulation of data within the plugins: they were supposed to summarize data that was already present.

But when I started testing the PyLint_Utils plugin, I hit a snag. The current design allowed each plugin to format the data and print it out in its own format. Since I wanted to keep things simple, I coded all three implemented plugins to use the columnar package to format the data. I did not have any issues with this approach for the first two plugins, so I did not anticipate any issues using it again for the PyLint_Utils plugin. Until it failed. It was able to load the plugin module but failed to load the columnar package.

I researched this for two nights before determining that it was likely not possible. What I mean by that is that it may be possible, but after two nights and five hours of research and experimentation, I was not able to find a way. I needed a Plan B. Seeing as every plugin was going to be using some manner of package for outputting its summary… I cheated. While there are other columnizers out there, I already have columnar installed for the base package. As such, I just changed the interface of the generate_report function. If a tuple is returned, it is used as the three primary parameters for columnar: justification, title, and rows. Problem solved. Not pretty, but problem solved.
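
On the base project's side, the handoff looks roughly like this sketch; the exact tuple ordering and the columnar keyword arguments used in the real code are assumptions:

from columnar import columnar

def print_plugin_summary(plugin_instance):
    report = plugin_instance.generate_report()
    if isinstance(report, tuple):
        # The plugin returns the raw pieces; the base project, which already
        # depends on columnar, takes care of the formatting.
        justification, column_headers, rows = report
        print(columnar(rows, headers=column_headers, justify=justification))
    else:
        print(report)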

Switching To PyLint_Utilities

With those issues dealt with, the Project Summarizer project was back at 100% code coverage and all relevant scenario tests were in place. So, it was time to get back to the thing that initiated those changes, the PyLint_Utils project. It turns out that more than half of the changes I needed to execute the PyLint_Utils project were in the Project Summarizer, so I thought I was home free.

And then I got down to improving the code coverage.

There Is Only So Much You Can Cover Normally

First off, I want to stress that I am a bit of a fanatic about scenario test coverage and code coverage. If it is a normal application, I have a general rule that it should have at least 75% code coverage, scenario tests for all “good” paths, and scenario tests for any “bad” paths that a team feels will be hit a fair amount. And yes, “a fair amount” is a judgement call.

The good paths are an easy goal for me to justify. If everything goes properly, you know that users will hit those paths. The bad paths are a bit more nuanced. From my viewpoint, I start with this question: what are the things that I would mess up or have messed up when using the application? Things like missing parameters and bad or wrong file names are easy targets. I mess those up all the time. That is usually a good starting point for error-related or “bad” pieces of the project to include.

From there, the cost of covering the remaining paths starts going uphill very quickly. For each path, it comes down to the question of whether the benefit of covering that “bad” path is enough to warrant the cost involved. And those costs can vary.

Hiding Things In Plain Sight

The first thing that I do to determine cost is to see whether something simple will let me write a new scenario test without too many changes. Can I change the file from a JSON file to a non-JSON file or a directory to set off the error handling? Can I use two parameters together and make sure they do not conflict? Is there an existing path that I can leverage?

But sometimes, I must be sneaky. Take this “hidden” parameter that I have in the PyLint_Utils project:

parser.add_argument(
    "--x-display",
    dest="x_test_display",
    action="store_true",
    default="",
    help=argparse.SUPPRESS,
)

That parameter is not visible using -h or --help, but if you use it on a normal command line, the application will not complain. How is that? Setting help to argparse.SUPPRESS keeps that argument from being shown in the help text while still accepting it.

So why take this approach? In this case, that flag sets off this logic:

self.__display_progress = (
    sys.stdout.isatty() or args.x_test_display
) and not self.__verbose_mode

I have no control over what sys.stdout.isatty() returns, at least not yet. As such, this is a simple and easy-to-read way to alter the result of that check.
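
In a scenario test, that makes forcing the progress-display path as simple as adding the flag to the supplied arguments; the other arguments here are purely hypothetical:

# "--x-display" forces the progress display even though stdout is not a
# terminal under the test runner; the remaining arguments are made up for illustration.
supplied_arguments = ["--x-display", "some_module.py"]
execute_results = scanner.invoke_main(arguments=supplied_arguments)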

What About Mocks?

This is where the “not yet” from the previous section kicks in.

From my experience, mocks are most useful when you want to test something that has many moving pieces to it. To use the equation I postulated a couple of sections ago: is the cost of making the change worth the benefit of that change?

For me, the answer to that question is heavily based on experience. The benefit side of that equation is the easy part to define: either using some other method or covering the block will properly test that part of the code. The cost is much more difficult to define.

A good example of this difficulty is the PatchBuiltinOpen class that I use for mocking file open calls, located here. I have used it for testing various smaller Python projects for almost as long as I have been working in Python, and I find this object really good at getting into tight places for code coverage. This mock class patches the built-in open call for files and provides the register_text_content function and the register_exception function to control what gets returned. If the filename passed to one of those functions matches the argument of the open call, the corresponding behavior is triggered. If not, the mock object needs to carefully un-patch itself, call the original function, and then patch itself again.

Even without looking at the source code for the PatchBuiltinOpen class, it is obvious that the class needs a significant amount of code to accomplish those tasks. Why? Because it takes a sizeable number of sentences just to describe what it does. And then there is the invocation of this behavior. To mock an open function call that is buried within the code called from the line:

execute_results = scanner.invoke_main(arguments=supplied_arguments)

this is the code required:

try:
    pbo = PatchBuiltinOpen()
    pbo.register_exception(
        test_file_to_scan_path, "wt", exception_message=mock_exception_message
    )
    pbo.start()

    execute_results = scanner.invoke_main(arguments=supplied_arguments)
finally:
    pbo.stop()

This is not simple. This is a sledgehammer. Granted, a nice sledgehammer that has been well used, well-polished, and carefully taken care of, but it is a sledgehammer. And in my experience, code sledgehammers increase cost.

But Sometimes There Are No Other Options

Having taken care of the majority of the code coverage for the project, I was left with a small amount of code to cover. This was the code executed after the subprocess.Popen function is called when PyLint_Utils invokes PyLint itself. In terms of benefit, those were the only handful of lines that were not covered. In terms of cost, the only option was to mock out the function call.

The PatchSubprocessPopen class was the result. Copying the bulk of the functionality from the PatchBuiltinOpen class saved me a lot of time. Another thing that kept the cost down was that, as of right now, I only needed the mock object for this one scenario test. As such, I was able to tailor it very specifically to the tests in which it was used.

But even then, it was a headache to get right. I had to make sure I read each value from the args and kwargs parameters properly, translating them into values I could use. As Popen can be called multiple times, I had to make sure I had a passthrough in there. And debugging it was not the best experience.
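
To give a sense of the shape of that mock, here is a minimal sketch of the same idea built on unittest.mock. It is not the actual PatchSubprocessPopen code; the class name, parameters, and simplified behavior are my own assumptions:

import subprocess
from unittest import mock

class SimplePatchedPopen:
    """Sketch: intercept subprocess.Popen for one command, pass everything else through."""

    def __init__(self, command_to_intercept, fake_stdout=b"", return_code=0):
        self.__command_to_intercept = command_to_intercept
        self.__fake_stdout = fake_stdout
        self.__return_code = return_code
        self.__original_popen = subprocess.Popen
        self.__patcher = None

    def __fake_popen(self, *args, **kwargs):
        # Read the command from either args or kwargs, mirroring how Popen accepts it.
        # This sketch assumes the command is passed as a list, e.g. ["pylint", ...].
        command = args[0] if args else kwargs.get("args")
        if command and command[0] == self.__command_to_intercept:
            fake_process = mock.MagicMock()
            fake_process.communicate.return_value = (self.__fake_stdout, b"")
            fake_process.returncode = self.__return_code
            return fake_process
        # Passthrough: any other Popen call goes to the real implementation.
        return self.__original_popen(*args, **kwargs)

    def start(self):
        self.__patcher = mock.patch("subprocess.Popen", self.__fake_popen)
        self.__patcher.start()

    def stop(self):
        self.__patcher.stop()

It would be used the same way as the PatchBuiltinOpen example above: call start() before invoking the scanner and stop() in a finally block.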

But in the end, to get from over 99.5% to 100% was worth it to me. I was able to get more experience with mock objects in Python, and I was able to close the gap in code coverage.

For me, it was worth it. But it was costly.

What Is Next?

With that bulk of work wrapped up, I am hoping to get some time back on the PyMarkdown project in the next week. Here is hoping for that! Stay tuned!


Comments

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.

