Introduction

As part of the process of creating a Markdown Linter to use with my personal website, I believe it is imperative that I have solid testing for that linter and for the tools used to build it. This testing includes executing those Python tool scripts from start to finish and verifying that everything works properly. From my experience, one of the most efficient ways to scenario test a project's Python scripts is to use an in-process framework for running them.

Because of the way that Python works, it is very feasible to scenario test Python scripts using the in-process framework that I describe in this article. To show how the framework works in practice, I reference my PyScan project and illustrate how I use the framework to test the scenarios in that project. Specifically, I talk about the pytest_execute.py file, which contains the bulk of the code that I use to write scenario tests.¹

Determine the Requirements

As with most of my projects, the first thing I do for any new project is to cleanly determine and document its requirements. Even though this project is a single component used to test the tools and other components, I feel strongly that it is still important to follow those guidelines to ensure the right component is built in the right way.

The basic requirements are pretty easy to define for this in-process test component: execute the Python script independently and capture all relevant information about its execution, verifying that information against expected values. The devil is in the details, however. I believe that a good definition of “execute the Python script” must include the ability to set the current working directory and the arguments for the command line. For a good definition of “capture all relevant information”, I believe the requirements must include capturing the script’s return code as well as any output to standard out (stdout) and standard error (stderr). As this component executes the script in-process, any attempt to exit the script prematurely must be properly captured, and the state of the test must be returned to what it was at the beginning of the test. Finally, to satisfy the “verifying” requirement, the component must have easy-to-use comparison functions, with informative output on any differences that arise during verification.

Finding the balance between too many bulky requirements and too few lean requirements is tough to achieve. In this case, I feel that I have achieved that balance by ensuring all of the major parts of the requirements are specified at a high enough level to communicate clearly without ambiguity. Here’s hoping I get the balance right!

Capture Relevant Information

The first thing to take care of is a class that will contain the information needed to satisfy the “capture all relevant information” requirement above. As the requirement specifies the three things that need to be captured, all that is left to do is create a class to encapsulate those values as follows:

class InProcessResult:
    """
    Class to provide for an encapsulation of the results of an execution.
    """

    def __init__(self, return_code, std_out, std_err):
        self.return_code = return_code
        self.std_out = std_out
        self.std_err = std_err

Executing the Script

Now that there is an object to collect the information about the script’s execution, a simple function is needed to collect that information. In the InProcessExecution base class, the invoke_main function serves this purpose.

    def invoke_main(self, arguments=None, cwd=None):
        """
        Invoke the mainline so that we can capture results.
        """

        saved_state = SystemState()

        std_output = io.StringIO()
        std_error = io.StringIO()
        try:
            returncode = 0
            sys.stdout = std_output
            sys.stderr = std_error

            if arguments:
                sys.argv = arguments.copy()
            else:
                sys.argv = []
            sys.argv.insert(0, self.get_main_name())

            if cwd:
                os.chdir(cwd)

            self.execute_main()
        except SystemExit as this_exception:
            returncode = self.handle_system_exit(this_exception, std_error)
        except Exception:
            returncode = self.handle_normal_exception()
        finally:
            saved_state.restore()

        return InProcessResult(returncode, std_output, std_error)

Before changing any of the existing system values, changes that by their very nature are made across the entire Python interpreter, the original values of those system values are kept safely in an instance of the SystemState class in the saved_state variable. As I want to ensure that the saved system state is restored regardless of what happens, a try-finally block is used to ensure that the saved_state.restore function is called to return the system to its original state.
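
The SystemState class itself is not shown in this article, but its job follows directly from the above description: remember the values that invoke_main is about to change, and put them back on request. A minimal sketch of what such a class might look like, assuming it only needs to track stdout, stderr, sys.argv, and the current working directory, is:

import os
import sys


class SystemState:
    """
    Class to provide a snapshot of the system state that invoke_main changes.
    """

    def __init__(self):
        # Capture the interpreter-wide values before the test alters them.
        self.saved_stdout = sys.stdout
        self.saved_stderr = sys.stderr
        self.saved_argv = sys.argv
        self.saved_cwd = os.getcwd()

    def restore(self):
        """
        Restore the system state captured in the constructor.
        """
        sys.stdout = self.saved_stdout
        sys.stderr = self.saved_stderr
        sys.argv = self.saved_argv
        os.chdir(self.saved_cwd)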

Once the system state is safely stored away, changes to those system values can be made. Instances of the io.StringIO class are used to provide alternative streams for stdout and stderr. A new array is assigned to sys.argv: either an empty array if no arguments are provided or a copy of the provided array otherwise. The name of the main script is then inserted at the start of that array, to ensure that libraries expecting a properly formatted array of system arguments are happy. Finally, if an alternate working directory is provided to the function, the script changes to that directory.

To reiterate, the reason it is acceptable to make all of these changes to the system state is that we have a safe copy of the system state stored away that we will revert to when this function completes.

After the execute_main function is called to execute the script in the specified manner, there are three possibilities for which the function needs to capture information. In the case of a normal fall-through execution, the returncode = 0 statement at the start of the try block sets the return code. If a SystemExit exception is raised, the handle_system_exit function does a bit of processing to figure out the return code based on the contents of the exception. Finally, if the execution is terminated by any other exception, the handle_normal_exception function makes sure to print out decent debug information and sets the return code to 1. In all three cases, the captured values for stdout and stderr are combined with the return code determined above, and a new instance of the InProcessResult class is returned with those values.
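
The two handler functions are not reproduced from the project here, but a rough sketch of how they might behave, based on the description above, looks something like the following. The details, such as echoing a non-integer SystemExit payload to the captured stderr stream, are my assumptions rather than the project's actual code.

import sys
import traceback

    @classmethod
    def handle_system_exit(cls, exit_exception, std_error):
        """
        Turn a captured SystemExit into a return code. Libraries such as
        argparse raise SystemExit with either an integer code or a message.
        """
        exit_code = exit_exception.code
        if isinstance(exit_code, int):
            return exit_code
        if exit_code is not None:
            # Assumed behavior: echo the non-integer payload to the captured
            # stderr stream so the test can verify it.
            print(exit_code, file=std_error)
        return 1

    @classmethod
    def handle_normal_exception(cls):
        """
        Print decent debug information for an unexpected exception and
        report failure with a return code of 1.
        """
        # sys.stderr is still redirected at this point, so the traceback is
        # captured along with the rest of the script's error output.
        traceback.print_exc(file=sys.stderr)
        return 1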

Verifying Actual Results Against Expected Results

When I started with the assert_results function, it was only three assert statements in quick succession, asserting that the actual values for stdout, stderr, and the return code matched the expected values. However, as I started using that function, it quickly became apparent that when something did fail, there was a certain amount of repetitive debugging I had to perform to determine why the assert was triggered. At first I added some extra information to the assert statements, and that worked for the return code. But there were still two issues.

The first issue was that, in the case where all three expected values were different from the actual values, it took three iterations of cleaning up the test before it passed. Only when I cleared up the first failure did I see the second failure, and only after the second failure was dealt with did I see the third. While this was workable, it was far from efficient. The second issue was that if there were any differences in the contents of the stdout or stderr stream, the differences between the expected value and the actual value were hard to discern by just looking at them.

To address the first issue, I changed the simple assert_results function to the following:

    def assert_results(
        self, stdout=None, stderr=None, error_code=0
    ):
        """
        Assert the results are as expected in the "assert" phase.
        """

        stdout_error = self.assert_stream_contents("stdout", self.std_out, stdout)
        stderr_error = self.assert_stream_contents(
            "stderr", self.std_err, stderr
        )
        return_code_error = self.assert_return_code(self.return_code, error_code)

        combined_error_msg = ""
        if stdout_error:
            combined_error_msg = combined_error_msg + "\n" + str(stdout_error)
        if stderr_error:
            combined_error_msg = combined_error_msg + "\n" + str(stderr_error)
        if return_code_error:
            combined_error_msg = combined_error_msg + "\n" + str(return_code_error)
        assert not combined_error_msg, (
            "Either stdout, stderr, or the return code was not as expected.\n"
            + combined_error_msg
        )

The key to resolving the first issue is capturing the information about all of the differences that occur, and then asserting only once if any differences are encountered. To accomplish this, several comparison functions are required that capture individual assert failures and relay that information back to the assert_results function, where it can be aggregated. These comparison functions are at the heart of the assert_results function.

The easiest of these comparison functions is the assert_return_code function, which simply compares the actual return code and the expected return code. If there is any difference, the error message for the assert statement is descriptive enough to provide a clear indication of what the difference is. That raised AssertionError is then captured and returned from the function so the assert_results function can report on it.

    @classmethod
    def assert_return_code(cls, actual_return_code, expected_return_code):
        """
        Assert that the actual return code is as expected.
        """

        result = None
        try:
            assert actual_return_code == expected_return_code, (
                "Actual error code ("
                + str(actual_return_code)
                + ") and expected error code ("
                + str(expected_return_code)
                + ") differ."
            )
        except AssertionError as ex:
            result = ex
        return result

A slightly more complicated function is the assert_stream_contents comparison function. To ensure that helpful information is returned in the assert failure message, it checks to see if expected_stream is set and calls compare_versus_expected if so. (More about that function in a minute.) If expected_stream is not set, the assert clearly states that the stream was expected to be empty but the actual stream is not.

    def assert_stream_contents(
        self, stream_name, actual_stream, expected_stream
    ):
        """
        Assert that the contents of the given stream are as expected.
        """

        result = None
        try:
            if expected_stream:
                self.compare_versus_expected(
                    stream_name, actual_stream, expected_stream
                )
            else:
                assert not actual_stream.getvalue(), (
                    "Expected "
                    + stream_name
                    + " to be empty. Not:\n---\n"
                    + actual_stream.getvalue()
                    + "\n---\n"
                )
        except AssertionError as ex:
            result = ex
        finally:
            actual_stream.close()
        return result

Addressing the second issue with the initial assert_results function, the differences between the two streams being difficult to discern, is the compare_versus_expected function. My first variation on this function simply used the statement assert actual_stream.getvalue() == expected_text, producing the same assert result, but lacking any description of why the assert failed. The second variation added a better assert failure message, but left the task of identifying the difference between the two strings to the reader of the failure message. The final variation uses the difflib module and the difflib.ndiff function to provide a detailed line-by-line comparison between the actual stream contents and the expected stream contents. By using difflib.ndiff, the assert failure message now contains a very easy-to-read list of the differences between the two streams.

import difflib

    @classmethod
    def compare_versus_expected(
        cls, stream_name, actual_stream, expected_text
    ):
        """
        Do a thorough comparison of the actual stream against the expected text.
        """
        if actual_stream.getvalue() != expected_text:
            diff = difflib.ndiff(
                expected_text.splitlines(), actual_stream.getvalue().splitlines()
            )

            diff_values = "\n".join(list(diff))
            assert False, (
                stream_name + " not as expected:\n---\n" + diff_values + "\n---\n"
            )

Using it all together

To start using the work completed in the sections above, a proper subclass of the InProcessExecution class is required. Because that class is an abstract base class, a new class, MainlineExecutor, is required to resolve the execute_main function and the get_main_name function.

class MainlineExecutor(InProcessExecution):
    def __init__(self):
        super().__init__()
        resource_directory = os.path.join(os.getcwd(), "test", "resources")
        self.resource_directory = resource_directory

    def execute_main(self):
        PyScan().main()

    def get_main_name(self):
        return "main.py"

The MainlineExecutor class implements those two required functions. The get_main_name function returns the name of the module entry point for the project. This name is inserted into the array of arguments to ensure that any functions based off of the command-line sys.argv array resolve properly. The execute_main function implements the actual code to invoke the main entry point for the script. In the case of the PyScan project, the entry point at the end of the main.py script is:

if __name__ == "__main__":
    PyScan().main()

Therefore, the body of the execute_main function is simply PyScan().main().

In addition to those two required functions, there is some extra code in the constructor for the class. Instead of recomputing the resource directory in each test that requires it, the MainlineExecutor class computes it in the constructor to keep the test functions as clean as possible. While this is not required when subclassing from InProcessExecution, it has proven very useful in practice.
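
For example, a test that needs one of those resource files can simply join the file name to the precomputed directory. (The file name below is made up for illustration and is not an actual file from the PyScan project.)

import os

executor = MainlineExecutor()
# Hypothetical usage: "sample-report.xml" is an invented file name, used only
# to illustrate building a path from the precomputed resource directory.
report_path = os.path.join(executor.resource_directory, "sample-report.xml")
supplied_arguments = [report_path]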

To validate the use of the MainlineExecutor class with the project, I created a simple scenario test to verify that the version of the scanner is correct. This is a very simple test, and verifying that the framework passes such a simple test increases confidence in the framework itself. At the start of the scenario test, the executor variable is created and assigned an instance of the new MainlineExecutor class, and the arguments to use for the script are specified as ["--version"] in the supplied_arguments array. In keeping with the Arrange-Act-Assert pattern, I then specify the expected behaviors for stdout (in expected_output), stderr (in expected_error), and the return code from the script (in expected_return_code).

Having set everything up in the Arrange section of the test, the Act section simply invokes the script using the executor.invoke_main function with the supplied_arguments variable assigned previously, collecting the results. Once collected, the execute_results.assert_results function verifies those actual results against the expected results, asserting if there are any differences.

def test_get_summarizer_version():
    """
    Make sure that we can get information about the version of the summarizer.
    """

    # Arrange
    executor = MainlineExecutor()
    supplied_arguments = ["--version"]

    expected_output = """\
main.py 0.1.0
"""
    expected_error = ""
    expected_return_code = 0

    # Act
    execute_results = executor.invoke_main(arguments=supplied_arguments, cwd=None)

    # Assert
    execute_results.assert_results(
        expected_output, expected_error, expected_return_code
    )

What Does Using This Look Like?

In terms of writing scenario tests, the tests are usually as simple to write as the test_get_summarizer_version function in the last section. If there are parts of the output that have a non-constant value, such as the full path of the directory in which the test is executed, the expected_output variable has to be set to compensate for that variability, but that is an expected complexity.

For the PyScan project, a quick scan of the project's test_scenarios.py file reveals that the non-constant values most often occur in failure messages, especially ones that relay path information. When that happens, such as with the test_summarize_junit_report_with_bad_source test function, the extra complexity is not overwhelming and does not make the test function unreadable. A sketch of that kind of compensation follows.
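
As a sketch of what that compensation looks like (the failure message wording and return code below are invented for illustration, not copied from PyScan), the expected text is simply assembled around the non-constant path before the assert:

import os

# Sketch only: the wording of the failure message and the return code are
# assumptions made for illustration; the real project's values may differ.
bad_report_path = os.path.join("does", "not", "exist.xml")
expected_output = ""
expected_error = (
    "Provided report file '"
    + os.path.abspath(bad_report_path)
    + "' does not exist.\n"
)
expected_return_code = 1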

In terms of the test output for a passing test, there is no difference. If executing pipenv run pytest produced a . for a successful test before, it remains a . now. The big difference is in what is displayed when there is a difference in the test output.

In the case where there is a single character difference in the test output, such as changing the expected output for the test_get_summarizer_version test to main.py 0.1.1, the output below clearly shows where the actual output and expected output differ. Note that in these comparisons, the line that starts with the - character is the expected output and the line that starts with the + character is the actual output.

E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E       - main.py 0.1.1
E       ?             ^
E
E       + main.py 0.1.0
E       ?             ^
E
E       ---

In the case where a line in the test output is completely different, such as changing the expected output to This is another line, the output below clearly reflects that difference:

E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E       - This is another line
E       + main.py 0.1.0
E       ---

Finally, in the case where the actual output contains either more lines or fewer lines than the expected output, such as adding the line This is another line to the expected output, the output below clearly shows that difference. In this example, as the first line is at the start of both the actual output and the expected output, it is shown without any prefix.

E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E         main.py 0.1.0
E       - This is another line
E       ---

Summary

While the pytest_execute.py file that I use as the base for my scenario tests isn’t rocket science, it is invaluable to me in creating simple, easy-to-read scenario tests. At the heart of the module is the base requirement (as stated above) to execute the Python script independently, capture all relevant information about its execution, and then verify that information against expected values. Based on my experience with this module and its evolution, I believe that it satisfies those requirements with ease.


  1. To keep things simple for the article, the additional_error parameter from a number of the functions has been removed. This parameter is used in the PyMarkdown project and will be documented as part of my articles on that project. 
