As part of the process of creating a Markdown linter to use with my personal website, I believe it is imperative that I have solid testing for that linter and for the tools necessary to test it. This testing includes executing those Python tool scripts from start to finish and verifying that everything works properly. From my experience, one of the most efficient ways to scenario test a project's Python scripts is to use an in-process framework for running them.
Because of the way that Python works, it is very feasible to scenario test Python scripts using the in-process framework which I describe in this article. To show how the framework works in practice, I reference my PyScan project to illustrate how I use this framework to test the scenarios in that project. Specifically, I talk about the pytest_execute.py file, which contains the bulk of the code that I use to write scenario tests.¹
Determine the Requirements
As with most of my projects, the first thing I do for any new project is to cleanly determine and document the requirements for the project. Even though this project is a single component used to test the tools and other components, I feel strongly that it is still important to follow those guidelines to ensure the right component is built in the right way.
The basic requirements are pretty easy to define for this in-process test component: execute the Python script independently and capture all relevant information about its execution, verifying that information against expected values. The devil is in the details, however. I believe that a good definition of "execute the Python script" must include the ability to set the current working directory and arguments for the command line. For a good definition of "capture all relevant information", I believe the requirements must include capturing the script's return code as well as any output to standard out (stdout) and standard error (stderr). As this component executes the script in-process, any attempts to exit the script prematurely must be properly captured, and the state of the test must be returned to what it was at the beginning of the test. Finally, to satisfy the "verifying" requirement, the component must have easy-to-use comparison functions, with informative output on any differences that arise during verification.
Finding the balance between too many bulky requirements and too few lean requirements is tough to achieve. In this case, I feel that I have achieved that balance by ensuring all of the major parts of the requirements are specified at a high enough level to communicate clearly without ambiguity. Here's hoping I got the balance right!
Capture Relevant Information
The first thing to take care of is a class that will contain the information to satisfy the “capture all relevant information” requirement above. As the requirement specifies the 3 things that need to be captured, all that is left to do is to create a class to encapsulate these variables as follows:
```python
class InProcessResult:
    """
    Class to provide for an encapsulation of the results of an execution.
    """

    def __init__(self, return_code, std_out, std_err):
        self.return_code = return_code
        self.std_out = std_out
        self.std_err = std_err
```
Executing the Script
Now that there is an object to collect the information about the script's execution, a simple function is needed to collect that information. In the InProcessExecution base class, the invoke_main function serves this purpose.
```python
def invoke_main(self, arguments=None, cwd=None):
    """
    Invoke the mainline so that we can capture results.
    """
    saved_state = SystemState()

    std_output = io.StringIO()
    std_error = io.StringIO()
    try:
        returncode = 0
        sys.stdout = std_output
        sys.stderr = std_error
        if arguments:
            sys.argv = arguments.copy()
        else:
            sys.argv = []
        sys.argv.insert(0, self.get_main_name())
        if cwd:
            os.chdir(cwd)
        self.execute_main()
    except SystemExit as this_exception:
        returncode = self.handle_system_exit(this_exception, std_error)
    except Exception:
        returncode = self.handle_normal_exception()
    finally:
        saved_state.restore()
    return InProcessResult(returncode, std_output, std_error)
```
Before changing any of the existing system values, changes that by their very nature are made across the entire Python interpreter, the original values of those system values are kept safely in an instance of the SystemState class in the saved_state variable. As I want to ensure that the saved system state is reverted to regardless of what happens, a try-finally block is used to guarantee that the saved_state.restore function is called to restore the system back to its original state.
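The SystemState class itself is not shown in this excerpt. As a rough, hypothetical sketch, assuming it only needs to snapshot the values that invoke_main mutates (sys.stdout, sys.stderr, sys.argv, and the current working directory), it might look something like this:

```python
import os
import sys


class SystemState:
    """Hypothetical sketch: snapshot the interpreter state that invoke_main mutates."""

    def __init__(self):
        # Capture the current values at construction time.
        self.saved_stdout = sys.stdout
        self.saved_stderr = sys.stderr
        self.saved_argv = sys.argv
        self.saved_cwd = os.getcwd()

    def restore(self):
        # Put every captured value back, regardless of what the test changed.
        sys.stdout = self.saved_stdout
        sys.stderr = self.saved_stderr
        sys.argv = self.saved_argv
        os.chdir(self.saved_cwd)
```

Whatever the real implementation looks like, the pattern is the same: capture in the constructor, revert in restore.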
Once the system state is safely stored away, changes to those system values can be made. Instances of the StringIO class are used to provide alternative streams for stdout and stderr. A new array is assigned to sys.argv: either an empty array if no arguments are provided, or a copy of the provided array. The name of the main script is then inserted at the start of that array, to ensure that libraries expecting a properly formatted array of system arguments are happy. Finally, if an alternate working directory is provided to the function, the script changes to that directory.
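This capture technique works because print and most logging paths write through sys.stdout, so rebinding that attribute to a StringIO instance silently redirects their output. A tiny standalone demonstration of the idea:

```python
import io
import sys

# Save the real stream, swap in a StringIO, and restore afterwards.
saved_stdout = sys.stdout
std_output = io.StringIO()
try:
    sys.stdout = std_output
    print("captured text")
finally:
    sys.stdout = saved_stdout

# The print call above landed in the StringIO, not on the console.
assert std_output.getvalue() == "captured text\n"
```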
To reiterate, the reason it is acceptable to make all of these changes to the system state is that we have a safe copy of the system state stored away that we will revert to when this function completes.
Once the execute_main function is called to execute the script in the specified manner, there are three possibilities that the function needs to capture information for. In the case of a normal fall-through execution, the returncode = 0 statement at the start of the try-finally block sets the return code. If a SystemExit exception is thrown, the handle_system_exit function does a bit of processing to figure out the return code based on the contents of the exception. Finally, if the execution is terminated by any other exception, the handle_normal_exception function makes sure to print out decent debug information and sets the return code to 1. In all three cases, the collected values for stdout and stderr are combined with the return code determined earlier in this paragraph, and a new instance of the InProcessResult class is returned with these values.
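The handle_system_exit function is not shown here, but based on Python's documented semantics for SystemExit.code (an integer, a message string, or None), a hedged sketch of such a function might look like:

```python
import io


def handle_system_exit(exception, std_error):
    """Hypothetical sketch: translate a SystemExit into a numeric return code."""
    # Per Python's semantics, SystemExit.code is an int, a message string, or None.
    if exception.code is None:
        # sys.exit() with no argument means a successful exit.
        return 0
    if isinstance(exception.code, int):
        return exception.code
    # A non-integer code means "print the message to stderr and exit with 1".
    std_error.write(str(exception.code) + "\n")
    return 1
```

This mirrors what the interpreter itself would have done with the exception, while keeping the test process alive.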
Verifying Actual Results Against Expected Results
When I started with the
assert_results function, it was only 3 statements in quick
succession: 3 assert statements asserting that the actual values for stdout, stderr and
the return code matched the expected values. However, as I started using that function,
it was quickly apparent that when something did fail, there was a certain amount of
repetitive debugging that I performed to determine why the assert was triggered. At
first I added some extra information to the assert statements, and that worked for the
return code. But there were still two issues.
The first issue was that, in the case where all 3 expected values were different than the actual values, it took 3 iterations of cleaning up the test before it passed. Only when I cleared up the first failure did I see the second failure, and only after the second failure was dealt with did I see the third. While this was workable, it was far from efficient. The second issue was that if there were any differences with the contents of the stdout or stderr stream, the differences between the expected value and the actual value were hard to discern by just looking at them.
To address the first issue, I changed the simple assert_results function to the following:

```python
def assert_results(self, stdout=None, stderr=None, error_code=0):
    """
    Assert the results are as expected in the "assert" phase.
    """
    stdout_error = self.assert_stream_contents("stdout", self.std_out, stdout)
    stderr_error = self.assert_stream_contents("stderr", self.std_err, stderr)
    return_code_error = self.assert_return_code(self.return_code, error_code)

    combined_error_msg = ""
    if stdout_error:
        combined_error_msg = combined_error_msg + "\n" + str(stdout_error)
    if stderr_error:
        combined_error_msg = combined_error_msg + "\n" + str(stderr_error)
    if return_code_error:
        combined_error_msg = combined_error_msg + "\n" + str(return_code_error)
    assert not combined_error_msg, (
        "Either stdout, stderr, or the return code was not as expected.\n"
        + combined_error_msg
    )
```
The key to resolving the first issue is in capturing the information about all differences that occur, and then asserting only once if any differences are encountered. To accomplish this, several comparison functions are required that capture individual asserts and relay that information back to the assert_results function, where they can be aggregated together. The simplest of these comparison functions is the assert_return_code function, which simply compares the actual return code and the expected return code. If there is any difference, the error message for the assert statement is descriptive enough to provide a clear indication of what the difference is. That raised AssertionError is then captured and returned from the function so the assert_results function can report on it.
```python
@classmethod
def assert_return_code(cls, actual_return_code, expected_return_code):
    """
    Assert that the actual return code is as expected.
    """
    result = None
    try:
        assert actual_return_code == expected_return_code, (
            "Actual error code ("
            + str(actual_return_code)
            + ") and expected error code ("
            + str(expected_return_code)
            + ") differ."
        )
    except AssertionError as ex:
        result = ex
    return result
```
A slightly more complicated function is the assert_stream_contents function. To ensure that helpful information is returned in the assert failure message, it checks to see if expected_stream is set and calls compare_versus_expected if so. (More about that function in a minute.) If not set, the assert used clearly states that the stream was expected to be empty, and shows the actual stream contents that are not empty.
```python
def assert_stream_contents(self, stream_name, actual_stream, expected_stream):
    """
    Assert that the contents of the given stream are as expected.
    """
    result = None
    try:
        if expected_stream:
            self.compare_versus_expected(
                stream_name, actual_stream, expected_stream
            )
        else:
            assert not actual_stream.getvalue(), (
                "Expected "
                + stream_name
                + " to be empty. Not:\n---\n"
                + actual_stream.getvalue()
                + "\n---\n"
            )
    except AssertionError as ex:
        result = ex
    finally:
        actual_stream.close()
    return result
```
Addressing the second issue with the initial assert_results function, the differences between the two streams being difficult to discern, is the compare_versus_expected function. My first variation on this function simply used the statement assert actual_stream.getvalue() == expected_text, producing the same assert result, but lacking any description of why the assert failed. The second variation of this function added a better assert failure message, but left the task of identifying the difference between the two strings to the reader of the failure message. The final variation of this function uses the difflib module and the difflib.ndiff function to provide a detailed line-by-line comparison between the actual stream contents and the expected stream contents. By using the difflib.ndiff function in this final variation, the assert failure message now contains a very easy-to-read list of the differences between the two streams.
```python
import difflib


@classmethod
def compare_versus_expected(cls, stream_name, actual_stream, expected_text):
    """
    Do a thorough comparison of the actual stream against the expected text.
    """
    if actual_stream.getvalue() != expected_text:
        diff = difflib.ndiff(
            expected_text.splitlines(), actual_stream.getvalue().splitlines()
        )
        diff_values = "\n".join(list(diff))
        assert False, (
            stream_name + " not as expected:\n---\n" + diff_values + "\n---\n"
        )
```
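To get a feel for why the ndiff output is so readable, here is a standalone example comparing two single-line strings that differ by one character:

```python
import difflib

expected_text = "main.py 0.1.1"
actual_text = "main.py 0.1.0"

# ndiff emits expected lines prefixed with "-", actual lines prefixed
# with "+", and "?" guide lines that point at the changed characters.
diff = difflib.ndiff(expected_text.splitlines(), actual_text.splitlines())
diff_values = "\n".join(diff)
print(diff_values)
```

The "-" and "+" lines show the two versions side by side, which is exactly what makes the failure messages quick to act on.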
Using it all together
To start using the work completed in the sections above, a proper subclass of the InProcessExecution class is required. Because that class is an abstract base class, a new class MainlineExecutor is required to resolve the execute_main function and the get_main_name function.
```python
class MainlineExecutor(InProcessExecution):
    def __init__(self):
        super().__init__()
        resource_directory = os.path.join(os.getcwd(), "test", "resources")
        self.resource_directory = resource_directory

    def execute_main(self):
        PyScan().main()

    def get_main_name(self):
        return "main.py"
```
The MainlineExecutor class implements those two required functions. The get_main_name function returns the name of the module entry point for the project. This name is inserted into the array of arguments to ensure that any functions based off of the command line sys.argv array resolve properly. The execute_main function implements the actual code to invoke the main entry point for the script. In the case of the PyScan project, the entry point at the end of the main.py script is:
```python
if __name__ == "__main__":
    PyScan().main()
```
Therefore, the contents of the execute_main function mirror that entry point: a single call to PyScan().main().
In addition to those two required functions, there is some extra code in the constructor for the class. Instead of recomputing the resource directory in each test that requires it, the MainlineExecutor class computes it in the constructor to keep the test functions as clean as possible. While this is not required when subclassing from InProcessExecution, it has proven very useful in practice.
To validate the use of the MainlineExecutor class with the project, I created a simple scenario test to verify that the version of the scanner is correct. This is a very simple test, and verifying that the framework passes such a simple test increases the confidence in the framework itself. At the start of the scenario test, the executor variable is created and assigned an instance of our new MainlineExecutor class, and the arguments to use for the script are specified as ["--version"] in the suppplied_arguments array. In keeping with the Arrange-Act-Assert pattern, I then specify the expected behaviors for stdout (in expected_output), stderr (in expected_error), and the return code from the script (in expected_return_code). Having set everything up in the Arrange section of the test, the Act section simply invokes the script using the executor.invoke_main function with the suppplied_arguments variable assigned previously, and collects the results. Once complete, the execute_results.assert_results function verifies those actual results against the expected results, asserting if there are differences.
```python
def test_get_summarizer_version():
    """
    Make sure that we can get information about the version of the summarizer.
    """

    # Arrange
    executor = MainlineExecutor()
    suppplied_arguments = ["--version"]

    expected_output = """\
main.py 0.1.0
"""
    expected_error = ""
    expected_return_code = 0

    # Act
    execute_results = executor.invoke_main(arguments=suppplied_arguments, cwd=None)

    # Assert
    execute_results.assert_results(
        expected_output, expected_error, expected_return_code
    )
```
What Does Using This Look Like?
In terms of writing scenario tests, the tests are usually as simple to write as the test_get_summarizer_version function in the last section. If there are parts of the output that have a non-constant value, such as the full path of the directory in which the test is executed, the expected_output variable has to be set to compensate for that variability, but that is an expected complexity.
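One way to handle that variability, sketched here with a purely hypothetical failure-message format, is to compose the expected text at runtime from the same values the script itself would use:

```python
import os

# Hypothetical failure message containing a path that varies per machine.
# The real message format depends on the script under test.
report_path = os.path.join(os.getcwd(), "test", "resources", "missing.json")
expected_error = "Project report file '" + report_path + "' does not exist.\n"
```

Because the expected text is built from os.getcwd() at test time, the same test passes regardless of where the repository is checked out.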
For the PyScan project, a quick scan of the test_scenarios.py file reveals that the non-constant values most often occur with failure messages, especially ones that relay path information. When that happens, such as with the test_summarize_junit_report_with_bad_source test function, the extra complexity is not overwhelming and does not make the test function unreadable.
In terms of the test output for a passing test, there is no difference. If executing pipenv run pytest produced a . for a successful test before, it remains a . afterwards. The big difference is in what is displayed when there is a difference in the test output.
In the case where there is a single character difference in the test output, such as
changing the expected output for the
test_get_summarizer_version test to
main.py 0.1.1, the output below
clearly shows where the actual output and expected output differ. Note that in these
comparisons, the line that starts with the
- character is the expected output and
the line that starts with the
+ character is the actual output.
```
E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E       - main.py 0.1.1
E       ?             ^
E
E       + main.py 0.1.0
E       ?             ^
E
E       ---
```
In the case where a line in the test output is completely different, such as changing the expected output to This is another line, the output below clearly reflects that difference.
```
E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E       - This is another line
E       + main.py 0.1.0
E       ---
```
Finally, in the case where the actual output contains either more or fewer lines than the expected output, such as adding the line This is another line to the expected output, the output below clearly shows that difference. In this example, as the first line is at the start of both the actual output and the expected output, it is shown without any prefix.
```
E       AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E       stdout not as expected:
E       ---
E         main.py 0.1.0
E       - This is another line
E       ---
```
While the pytest_execute.py file that I use as the base for my scenario tests isn't rocket science, it is invaluable to me in creating simple, easy-to-read scenario tests. At the heart of the module is the base requirement (as stated above) to execute the Python script independently, capture all relevant information about its execution, and then verify that information against expected values. Based on my experience with the evolution of this module, I believe that it handily satisfies those requirements.
1. To keep things simple for the article, the additional_error parameter has been removed from a number of the functions. This parameter is used in the PyMarkdown project and will be documented as part of my articles on that project. ↩
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.