Introduction¶
As part of the process of creating a Markdown Linter to use with my personal website, I firmly believe it is imperative that I have solid testing for that linter and for the tools necessary to test it. This testing includes executing those Python tool scripts from start to finish and verifying that everything works properly. From my experience, one of the most efficient ways to scenario test a project's Python scripts is to use an in-process framework for running them.
Because of the way Python works, it is very feasible to scenario test Python scripts using the in-process framework that I describe in this article. To show how the framework works in practice, I reference my PyScan project and how I use this framework to test the scenarios in that project. Specifically, I talk about the pytest_execute.py file, which contains the bulk of the code I use to write scenario tests.1
Determine the Requirements¶
As with most of my projects, the first thing I do is to cleanly determine and document the requirements. Even though this project is a single component used to test the tools and other components, I feel strongly that it is still important to follow those guidelines to ensure the right component is built in the right way.
The basic requirements are pretty easy to define for this in-process test component: execute the Python script independently and capture all relevant information about its execution, verifying that information against expected values. The devil is in the details, however. I believe that a good definition of “execute the Python script” must include the ability to set the current working directory and the command line arguments. For a good definition of “capture all relevant information”, I believe the requirements must include capturing the script’s return code as well as any output to standard output (stdout) and standard error (stderr). As this component executes the script in-process, any attempts to exit the script prematurely must be properly captured, and the state of the test must be returned to what it was at the beginning of the test. Finally, to satisfy the “verifying” requirement, the component must have easy-to-use comparison functions, with informative output on any differences that arise during verification.
Finding a balance between too many bulky requirements and too few lean requirements is tough to achieve. In this case, I feel that I have achieved that balance by ensuring all of the major parts of the requirements are specified at a high enough level to communicate clearly and without ambiguity. Here’s hoping I got the balance right!
Capture Relevant Information¶
The first thing to take care of is a class that will contain the information to satisfy the “capture all relevant information” requirement above. As the requirement specifies the 3 things that need to be captured, all that is left to do is to create a class to encapsulate these variables as follows:
class InProcessResult:
    """
    Class to provide for an encapsulation of the results of an execution.
    """

    def __init__(self, return_code, std_out, std_err):
        self.return_code = return_code
        self.std_out = std_out
        self.std_err = std_err
Executing the Script¶
Now that there is an object to collect the information about the script’s execution, a simple function is needed to collect that information. In the InProcessExecution base class, the invoke_main function serves this purpose.
def invoke_main(self, arguments=None, cwd=None):
    """
    Invoke the mainline so that we can capture results.
    """
    saved_state = SystemState()
    std_output = io.StringIO()
    std_error = io.StringIO()
    try:
        returncode = 0
        sys.stdout = std_output
        sys.stderr = std_error
        if arguments:
            sys.argv = arguments.copy()
        else:
            sys.argv = []
        sys.argv.insert(0, self.get_main_name())
        if cwd:
            os.chdir(cwd)
        self.execute_main()
    except SystemExit as this_exception:
        returncode = self.handle_system_exit(this_exception, std_error)
    except Exception:
        returncode = self.handle_normal_exception()
    finally:
        saved_state.restore()
    return InProcessResult(returncode, std_output, std_error)
Before changing any of the existing system values, changes that by their very nature are made across the entire Python interpreter, the original values of those system values are kept safely in an instance of the SystemState class in the saved_state variable. As I want to ensure that the saved system state is reverted to regardless of what happens, a try-finally block is used to ensure that the saved_state.restore function is called to restore the system back to its original state.
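The SystemState class itself is not shown in this article, but its job follows directly from the description above: remember the interpreter-wide values that invoke_main changes and put them back on request. A minimal sketch of what such a class might contain, and only a sketch based on that description rather than the real implementation in pytest_execute.py, could look like this:

import os
import sys

class SystemState:
    """
    Sketch of a class that preserves the interpreter-wide state changed
    by invoke_main. The real class may capture more than these values.
    """

    def __init__(self):
        # Remember the values that are about to be replaced.
        self.saved_stdout = sys.stdout
        self.saved_stderr = sys.stderr
        self.saved_argv = sys.argv
        self.saved_cwd = os.getcwd()

    def restore(self):
        # Put every remembered value back, no matter how the script ended.
        sys.stdout = self.saved_stdout
        sys.stderr = self.saved_stderr
        sys.argv = self.saved_argv
        os.chdir(self.saved_cwd)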
Once the system state is safely stored away, changes to those system values can be made. Instances of the io.StringIO class are used to provide alternative streams for stdout and stderr. A new array is assigned to sys.argv: an empty array if no arguments are provided, or a copy of the provided array if they are. The name of the main script is inserted at the start of that array, to ensure that libraries expecting a properly formatted array of system arguments are happy. Finally, if an alternate working directory is provided to the function, the script changes to that directory.
To reiterate, the reason it is acceptable to make all of these changes to the system state is that we have a safe copy of the system state stored away that we will revert to when this function completes.
After the execute_main function is called to execute the script in the specified manner, there are three possibilities that the function needs to capture information for. In the case of a normal fall-through execution, the returncode = 0 statement at the start of the try block sets the return code. If a SystemExit exception is thrown, the handle_system_exit function does a bit of processing to figure out the return code based on the contents of the exception. Finally, if the execution is terminated by any other exception, the handle_normal_exception function makes sure to print out decent debug information and sets the return code to 1. In all three cases, the captured values for stdout and stderr are combined with the return code determined above, and a new instance of the InProcessResult class is returned with these values.
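The two exception handlers are not shown in this article, so the following is only a sketch of how they might behave, inferred from the description above. The return code logic here is an assumption: SystemExit.code may be None, an integer, or a message string, and anything that is not an integer is treated as a failure.

import sys
import traceback

def handle_system_exit(self, exit_exception, std_error):
    # SystemExit.code may be None, an integer, or a message string;
    # normalize it into an integer return code. (Assumed behavior.)
    exit_code = exit_exception.code
    if exit_code is None:
        return 0
    if isinstance(exit_code, int):
        return exit_code
    # A non-integer code is a message; echo it to the captured stderr.
    std_error.write(str(exit_code) + "\n")
    return 1

def handle_normal_exception(self):
    # Print the traceback to the real stderr so that the failure is
    # visible in the pytest log, then report a generic failure code.
    traceback.print_exc(file=sys.__stderr__)
    return 1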
Verifying Actual Results Against Expected Results¶
When I started with the assert_results function, it was only three assert statements in quick succession, asserting that the actual values for stdout, stderr, and the return code matched the expected values. However, as I started using that function, it quickly became apparent that when something did fail, there was a certain amount of repetitive debugging that I performed to determine why the assert was triggered. At first I added some extra information to the assert statements, and that worked for the return code. But there were still two issues.
The first issue was that, in the case where all three expected values were different from the actual values, it took three iterations of cleaning up the test before it passed. Only when I cleared up the first failure did I see the second failure, and only after the second failure was dealt with did I see the third. While this was workable, it was far from efficient. The second issue was that if there were any differences in the contents of the stdout or stderr stream, the differences between the expected value and the actual value were hard to discern by just looking at them.
To address the first issue, I changed the simple assert_results function to the following:
def assert_results(
    self, stdout=None, stderr=None, error_code=0
):
    """
    Assert the results are as expected in the "assert" phase.
    """
    stdout_error = self.assert_stream_contents("stdout", self.std_out, stdout)
    stderr_error = self.assert_stream_contents(
        "stderr", self.std_err, stderr
    )
    return_code_error = self.assert_return_code(self.return_code, error_code)

    combined_error_msg = ""
    if stdout_error:
        combined_error_msg = combined_error_msg + "\n" + str(stdout_error)
    if stderr_error:
        combined_error_msg = combined_error_msg + "\n" + str(stderr_error)
    if return_code_error:
        combined_error_msg = combined_error_msg + "\n" + str(return_code_error)
    assert not combined_error_msg, (
        "Either stdout, stderr, or the return code was not as expected.\n"
        + combined_error_msg
    )
The key to resolving the first issue is in capturing the information about all differences that occur, and then asserting only once if any differences are encountered. To accomplish this, several comparison functions are required that capture individual asserts and relay that information back to the assert_results function, where it can be aggregated. It is these comparison functions that are at the heart of the assert_results function.
The easiest of these comparison functions is the assert_return_code function, which simply compares the actual return code and the expected return code. If there is any difference, the error message for the assert statement is descriptive enough to provide a clear indication of what the difference is. That raised AssertionError is then captured and returned from the function so the assert_results function can report on it.
@classmethod
def assert_return_code(cls, actual_return_code, expected_return_code):
    """
    Assert that the actual return code is as expected.
    """
    result = None
    try:
        assert actual_return_code == expected_return_code, (
            "Actual error code ("
            + str(actual_return_code)
            + ") and expected error code ("
            + str(expected_return_code)
            + ") differ."
        )
    except AssertionError as ex:
        result = ex
    return result
A slightly more complicated function is the assert_stream_contents comparison function. To ensure that helpful information is returned in the assert failure message, it checks to see if expected_stream is set and calls compare_versus_expected if so. (More about that function in a minute.) If expected_stream is not set, the assert clearly states that the stream was expected to be empty and that the actual stream is not.
def assert_stream_contents(
    self, stream_name, actual_stream, expected_stream
):
    """
    Assert that the contents of the given stream are as expected.
    """
    result = None
    try:
        if expected_stream:
            self.compare_versus_expected(
                stream_name, actual_stream, expected_stream
            )
        else:
            assert not actual_stream.getvalue(), (
                "Expected "
                + stream_name
                + " to be empty. Not:\n---\n"
                + actual_stream.getvalue()
                + "\n---\n"
            )
    except AssertionError as ex:
        result = ex
    finally:
        actual_stream.close()
    return result
Addressing the second issue with the initial assert_results function, the differences between the two streams being difficult to discern, is the compare_versus_expected function. My first variation on this function simply used the statement assert actual_stream.getvalue() == expected_text, producing the same assert result, but lacking any description of why the assert failed. The second variation of this function added a better assert failure message, but left the task of identifying the difference between the two strings to the reader of the failure message. The final variation of this function uses the difflib module and the difflib.ndiff function to provide a detailed line-by-line comparison between the actual stream contents and the expected stream contents. By using the difflib.ndiff function in this final variation, the assert failure message now contains a very easy-to-read list of the differences between the two streams.
import difflib

@classmethod
def compare_versus_expected(
    cls, stream_name, actual_stream, expected_text
):
    """
    Do a thorough comparison of the actual stream against the expected text.
    """
    if actual_stream.getvalue() != expected_text:
        diff = difflib.ndiff(
            expected_text.splitlines(), actual_stream.getvalue().splitlines()
        )
        diff_values = "\n".join(list(diff))
        assert False, (
            stream_name + " not as expected:\n---\n" + diff_values + "\n---\n"
        )
Using It All Together¶
To start using the work completed in the sections above, a proper subclass of the InProcessExecution class is required. Because that class is an abstract base class, a new class, MainlineExecutor, is required to resolve the execute_main function and the get_main_name function.
class MainlineExecutor(InProcessExecution):
    def __init__(self):
        super().__init__()
        resource_directory = os.path.join(os.getcwd(), "test", "resources")
        self.resource_directory = resource_directory

    def execute_main(self):
        PyScan().main()

    def get_main_name(self):
        return "main.py"
The MainlineExecutor class implements those two required functions. The get_main_name function returns the name of the module entry point for the project. This name is inserted into the array of arguments to ensure that any functions based off of the command line sys.argv array resolve properly. The execute_main function implements the actual code to invoke the main entry point for the script. In the case of the PyScan project, the entry point at the end of the main.py script is:
if __name__ == "__main__":
    PyScan().main()
Therefore, the execute_main function contains nothing more than PyScan().main().
In addition to those two required functions, there is some extra code in the constructor for the class. Instead of recomputing the resource directory in each test that requires it, the MainlineExecutor class computes it in the constructor to keep the test functions as clean as possible. While this is not required when subclassing from InProcessExecution, it has proven very useful in practice.
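As an illustration only, a hypothetical test that scans a file kept under test/resources might look something like the following. The file name, arguments, and expected values here are placeholders I made up; they are not tests from the PyScan project.

import os

def test_scan_sample_resource_file():
    """
    Hypothetical test showing how the precomputed resource directory keeps
    the Arrange section short. The file name and expectations are placeholders.
    """
    # Arrange
    executor = MainlineExecutor()
    sample_file = os.path.join(executor.resource_directory, "sample-report.xml")
    supplied_arguments = [sample_file]
    expected_output = "...output specific to sample-report.xml..."
    expected_error = ""
    expected_return_code = 0

    # Act
    execute_results = executor.invoke_main(arguments=supplied_arguments, cwd=None)

    # Assert
    execute_results.assert_results(
        expected_output, expected_error, expected_return_code
    )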
To validate the use of the MainlineExecutor class with the project, I created a simple scenario test to verify that the version of the scanner is correct. This is a very simple test, and verifying that the framework passes such a simple test increases the confidence in the framework itself. At the start of the scenario test, the executor variable is created and assigned an instance of our new MainlineExecutor class, and the arguments to use for the script are specified as ["--version"] in the suppplied_arguments array. In keeping with the Arrange-Act-Assert pattern, I then specify the expected behaviors for stdout (in expected_output), stderr (in expected_error), and the return code from the script (in expected_return_code).
Having set everything up in the Arrange section of the test, the Act section simply invokes the script using the executor.invoke_main function with the suppplied_arguments variable assigned previously, and collects the results. Once collected, the execute_results.assert_results function verifies those actual results against the expected results, asserting if there are any differences.
def test_get_summarizer_version():
    """
    Make sure that we can get information about the version of the summarizer.
    """
    # Arrange
    executor = MainlineExecutor()
    suppplied_arguments = ["--version"]

    expected_output = """\
main.py 0.1.0
"""
    expected_error = ""
    expected_return_code = 0

    # Act
    execute_results = executor.invoke_main(arguments=suppplied_arguments, cwd=None)

    # Assert
    execute_results.assert_results(
        expected_output, expected_error, expected_return_code
    )
What Does Using This Look Like?¶
In terms of writing scenario tests, the tests are usually as simple to write as the test_get_summarizer_version function in the last section. If there are parts of the output that have a non-constant value, such as the full path of the directory in which the test is executed, the expected_output variable has to be set to compensate for that variability, but that is an expected complexity.
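For example, if the script echoed the directory it was scanning (a made-up message, purely for illustration), the machine-specific part of the expected output could be composed when the test runs rather than hard-coded:

import os

# Hypothetical expected output; the wording of the message is made up.
# The machine-specific part of the line is composed at test time.
expected_output = (
    "Scanning directory: " + os.getcwd() + "\n"
    + "3 files scanned, 0 failures\n"
)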
For the PyScan project, a quick scan of the PyScan test_scenarios.py file reveals that the non-constant values most often occur with failure messages, especially ones that relay path information. When that happens, such as with the test_summarize_junit_report_with_bad_source test function, that extra complexity is not overwhelming and does not make the test function unreadable.
In terms of the test output for a passing test, there is no difference. If executing pipenv run pytest produced a . for a successful test before, it remains a . now. The big difference is in what is displayed when there is a difference in the test output.
In the case where there is a single character difference in the test output, such as changing the expected output for the test_get_summarizer_version test to main.py 0.1.1, the output below clearly shows where the actual output and expected output differ. Note that in these comparisons, the line that starts with the - character is the expected output and the line that starts with the + character is the actual output.
E AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E stdout not as expected:
E ---
E - main.py 0.1.1
E ? ^
E
E + main.py 0.1.0
E ? ^
E
E ---
In the case where a line in the test output is completely different, such as changing the expected output to This is another line, the output below clearly reflects that difference:
E AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E stdout not as expected:
E ---
E - This is another line
E + main.py 0.1.0
E ---
Finally, in the case where the actual output contains either more or fewer lines than the expected output, such as adding the line This is another line to the expected output, the output below clearly shows that difference. In this example, as the first line is at the start of both the actual output and the expected output, it is shown without any prefix.
E AssertionError: Either stdout, stderr, or the return code was not as expected.
E
E stdout not as expected:
E ---
E main.py 0.1.0
E - This is another line
E ---
Summary¶
While the pytest_execute.py file that I use as the base for my scenario tests isn’t rocket science, it is invaluable to me in creating simple, easy-to-read scenario tests. At the heart of the module is the base requirement (as stated above) to execute the Python script independently, capture all relevant information about its execution, and then verify that information against expected values. Based on my experience and the evolution of this module, I believe that it handily satisfies those requirements.
1. To keep things simple for the article, the additional_error parameter has been removed from a number of the functions. This parameter is used in the PyMarkdown project and will be documented as part of my articles on that project. ↩
Comments
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.