My apologies for this being a day or two later than usual. My son brought home a cold that knocked the stuffing out of me, I needed to take some personal time to ensure I was feeling better before writing. Thanks for your patience.
As a reminder, the requirements from the last article boil down to three big bullet points: command line driven, GitHub Flavored Markdown (for now), and preserving all tokens. Setting these requirements as part of the project was pivotal in giving me a solid set of goals to work towards. Now that I have that touchstone, I need to move forward with defining how to test the parser at the core of the linter.
Why Write a Parser?¶
In looking at the kind of rules that linters support, I have observed that there are typically two categories of rules: general rules and grammar rules. For general rules such as “tabs should not be used”, it is easy to look at any line in the document being scanned and look for a tab character. For grammatical rules such as “headings should always be properly capitalized”, that scan is more difficult. The most difficult part of that rule is identifying whether any given piece of text is considered part of a header, thus engaging the rest of the rule.
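To make the distinction concrete, a general rule like “tabs should not be used” needs no grammar at all; a plain line scan is enough. The sketch below is a hypothetical illustration, not the linter's actual rule API; the function name and the (line, column) return shape are my own invention.

```python
def find_hard_tabs(document_text):
    """Hypothetical general rule: report every line containing a tab character.

    Returns a list of (line_number, column_number) tuples, both 1-based.
    """
    violations = []
    for line_number, line in enumerate(document_text.split("\n"), start=1):
        if "\t" in line:
            violations.append((line_number, line.index("\t") + 1))
    return violations


# A tab on the second line is reported as line 2, column 5:
print(find_hard_tabs("no tabs here\nbad:\there"))  # [(2, 5)]
```

A grammatical rule like “headings should always be properly capitalized” cannot be written this way, because no single line tells you whether the text is inside a heading; that is exactly the knowledge a parser provides.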
From experience, to properly determine which part of grammar maps to which part of text requires a capable parser, written to the specifications of the language to be parsed. Based on my research from the last article, all of the parsers that I found only translated Markdown into HTML, not any intermediate form. Since I need a clean stream of tokens before translation to HTML, the only option is to write my own parser which will output a clean stream of parsed Markdown tokens.
As I am writing my own parser, I need to have a good set of tests to ensure that the parser works properly. But where to start?
Where to Start With The Tests?¶
Referring back to my article on software reliability, the two main types of tests that I need to decide between are scenario tests and unit tests. In a nutshell, the purpose of a scenario test is to test the inputs and outputs of the project, and the purpose of a unit test is to test a specific function of a specific component of the project. Figuring out how to balance the quantity of tests that I need to write between these two types is my priority.
As one of the initial requirements is to support the GitHub Flavored Markdown specification, it is useful to note that the specification itself has 637 individual examples. Each example provides the input, in Markdown, and the output, in HTML. While the output is not at the token level needed to satisfy my project’s third requirement, it should be close enough. In looking at each of these examples, I need a solid set of rules that I can apply to the tokens to get from my desired token-based output to an HTML-based output that matches the examples. It is reasonable to collect these rules as I go, while developing the various types of elements to be parsed. If I tried to work them all out too far ahead of time, it would invariably lead to a lot of rework. Just in time is the way to go for these rules.
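As a rough sketch of what such a rule might look like for the paragraph tokens shown later in this article, the function below translates a small subset of token strings into HTML. This is my own illustrative guess at the mapping, assuming the bracketed token format used in the tests below; it is not the project's actual translation code.

```python
def render_paragraph_tokens(tokens):
    """Hypothetical rule set: translate a small subset of token strings to HTML."""
    parts = []
    for token in tokens:
        if token.startswith("[para"):
            parts.append("<p>")
        elif token == "[end-para]":
            parts.append("</p>\n")
        elif token.startswith("[text:"):
            # "[text:aaa:]" -> "aaa"
            parts.append(token[len("[text:"):-1].rstrip(":"))
        elif token.startswith("[BLANK"):
            pass  # blank lines between blocks do not appear in the HTML output


    return "".join(parts)


tokens = [
    "[para:]", "[text:aaa:]", "[end-para]",
    "[BLANK:]",
    "[para:]", "[text:bbb:]", "[end-para]",
]
print(render_paragraph_tokens(tokens))  # two <p> elements, one per paragraph
```

A rule like “a paragraph token pair becomes a `<p>` element, and blank-line tokens are dropped” is exactly the kind of observation I can collect one element type at a time.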
Taking another look at the types of tests that I need to write, I realized that this project’s test viewpoint was inverted from the usual ratio of scenario tests to unit tests. In most cases, if I have anything more than 20-30 scenario tests, I usually think that I have not properly scoped the project. However, with 637 scenarios already defined for me, it would be foolish not to write at least one scenario test for each of those scenarios, adding extra scenario tests and supportive unit tests where needed. In this case, it makes more sense to focus on the scenario tests as the major set of tests to write.
The balance of scenario tests to unit tests?
Given 637 scenarios ready to go, I need to create at least 637 scenario tests. The most efficient way forward seemed to be experimenting with the first couple of scenario tests to find a process that worked. Once I had a simple and solid template for a scenario test, I had the confidence to use that template for each scenario test that I tackled.
And the unit tests? In implementing any parsing code, I knew that I needed helper functions that parsed a specific type of foundational thing, like a tag in an HTML block or skipping ahead over any whitespace. The unit tests are used to verify that those kinds of foundational functions are operating properly, ensuring that the rest of the code can depend on those foundations with confidence. As a bonus, more combinations of the various sequences to parse could be tested without inflating the number of scenario tests.
Ground rules set? Check. On to the first scenario test.
Starting With the First Scenario Test¶
While it might not seem like the obvious place to begin, the first test I wrote was for GitHub Flavored Markdown example 189, the first example included in the specification’s section on paragraph blocks. After a solid read of the specification, the general rule seemed to be that if it does not fit into any other category, it is a paragraph. If everything is going to be a paragraph until the other features are written, I felt that starting with the default case was the right choice.
After a few passes at cleaning up the test for this first case, it boiled down to the following Python code.
```python
""" https://github.github.com/gfm/#paragraphs """
from pymarkdown.tokenized_markdown import TokenizedMarkdown

from .utils import assert_if_lists_different


def test_paragraph_blocks_189():
    """ Test case 189: simple case of paragraphs """

    # Arrange
    tokenizer = TokenizedMarkdown()
    source_markdown = """aaa

bbb"""
    expected_tokens = [
        "[para:]",
        "[text:aaa:]",
        "[end-para]",
        "[BLANK:]",
        "[para:]",
        "[text:bbb:]",
        "[end-para]",
    ]

    # Act
    actual_tokens = tokenizer.transform(source_markdown)

    # Assert
    assert_if_lists_different(expected_tokens, actual_tokens)
```
Breaking Down the Scenario Test¶
It might be a lot to take in all at once, so let us break it down step by step.
Start of the Module¶
The start of the module needs to perform two important tasks: provide useful documentation to someone examining the tests and import any libraries needed.
```python
""" https://github.github.com/gfm/#paragraphs """
from pymarkdown.tokenized_markdown import TokenizedMarkdown

from .utils import assert_if_lists_different
```
The most useful and relevant information about the module that I was able to think of was the actual source for the test cases themselves. That being the case, I felt that including the URI to the specific section in the GitHub Flavored Markdown specification was the right choice for the module documentation. For anyone reading the tests, it provides a solid reference point that answers most of the questions about why the tests are there and whether the tests are relevant.
Next are the import statements. The first statement imports the TokenizedMarkdown class, the class that I set up to handle the parsing. Initially this class was a quick and simple skeleton, especially for the first paragraph case. However, it provided the framework for me to support more use cases while maintaining a uniform interface. The second import statement includes a function that provides a good comparison between the list returned from the transform function of the TokenizedMarkdown class and a simple text list of the expected tokens.
Arrange the Data For The Test¶
From all the useful pieces of information that I have learned about testing, the most useful bits about actually writing tests are the K.I.S.S. principle and the use of the Arrange-Act-Assert pattern. The K.I.S.S principle constantly reminds me to not overcomplicate things, reducing the tests to what is really relevant for that thing or task. The Arrange-Act-Assert pattern reminds me that when writing tests, each test I write breaks down into setup, action, and verification (with cleanup occasionally being added if needed). As such, I always start writing my tests by adding a comment for each of those sections, with the rest of the function blank. Once there, it is easy to remember which parts of the tests go where!
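Starting from that habit, every one of my scenario tests begins life as the same minimal skeleton, just the doc-string and the three section comments, before any of the sections are filled in:

```python
def test_example_scenario():
    """
    Skeleton that every scenario test starts from: only the section comments.
    """

    # Arrange

    # Act

    # Assert
```

From that empty shell, filling in each section in order keeps the test focused on exactly one scenario.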
```python
def test_paragraph_blocks_189():
    """ Test case 189: simple case of paragraphs """

    # Arrange
    tokenizer = TokenizedMarkdown()
    source_markdown = """aaa

bbb"""
    expected_tokens = [
        "[para:]",
        "[text:aaa:]",
        "[end-para]",
        "[BLANK:]",
        "[para:]",
        "[text:bbb:]",
        "[end-para]",
    ]
```
The Arrange part of this test is simple, consisting mostly of easy-to-read assignments. The object to test needs to be set up in a way that it is completely enclosed within the test function. The tokenizer object with no options is assigned to tokenizer, so a simple assignment takes care of its setup. The source_markdown variable is set up within one of Python’s triple-quoted strings to preserve newlines and provide an accurate look at the string being fed to the tokenizer. This string is copied verbatim from the example represented by the function, in this case example 189.

The final setup, the array assigned to the expected_tokens variable, takes a bit more work. When I wrote these, I sometimes wrote the expected tokens ahead of time, but often started with a known “bad” set of tokens and adjusted the tokens as I went.
Act (Tokenize) and Assert (Verify Results)¶
With all the work on the setup of the tests, the Act and Assert parts of the test are very anticlimactic.
```python
    # Act
    actual_tokens = tokenizer.transform(source_markdown)

    # Assert
    assert_if_lists_different(expected_tokens, actual_tokens)
```
Using the information that was established in the Arrange section of the test, the Act section simply applies the input (source_markdown) to the object to test (tokenizer) and collects the output in actual_tokens. The Assert section then takes the output tokens and compares them against the expected list of tokens in expected_tokens.
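For illustration, a comparison helper like this one can be sketched in a few lines. This is my own minimal guess at what such a function might look like, assuming each actual token renders to its bracketed text form via str(); the real helper lives in the project's .utils module and may differ.

```python
def assert_if_lists_different(expected_tokens, actual_tokens):
    """Minimal sketch of a list-comparison helper: fail on the first
    difference, with enough context to fix either the test or the parser."""
    assert len(expected_tokens) == len(actual_tokens), (
        f"List lengths differ: expected {len(expected_tokens)}, "
        f"actual {len(actual_tokens)}"
    )
    for index, (expected, actual) in enumerate(zip(expected_tokens, actual_tokens)):
        assert str(expected) == str(actual), (
            f"Tokens differ at index {index}: expected '{expected}', got '{actual}'"
        )
```

The value of a dedicated helper over a bare `assert expected == actual` is the failure message: with hundreds of scenario tests, knowing exactly which token diverged saves a lot of debugging time.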
Why Not Use Pure Test Driven Development?¶
In a normal project, I usually follow Test Driven Development practices quite diligently, either writing the tests first and code second, or writing both tests and code at the same time. As this was my first version of my first Markdown parser, I was aware that I was going to be adapting the tokens and token formats as I went, eventually arriving at a set of tokens that worked for all scenarios. Knowing that this churn was part of the development process for this project, I decided that a true Test Driven Development process would not be optimal.
For this project, it was very useful to adjust the process. The balance that I struck
with myself was to make sure that as I coded the parser to respond to a given scenario,
I adjusted the tokens assigned to the
expected_tokens variable based on the example’s
HTML output for the equivalent scenario test. This process gave me the confidence to
know that as I made tests pass by enabling the code behind the scenario, each individual
passing test was both moving towards a fully functioning parser and protecting the work
that I had already done in that direction.
To be clear, as I copied the template over, I adjusted the function name, the
function’s doc-string, and the Markdown source text based on the scenario test that I
was implementing. The list of tokens in
expected_tokens were then populated with
a “best guess” before I started working on the code to make that scenario pass.
In a microscopic sense, as I updated the test and the test tokens before starting on the code, I was still adhering to Test Driven Development on a scenario-by-scenario basis. To me, this was a good balance to strike, evaluating the correct tokens as I went instead of trying to work out all 637 sets of tokens ahead of time.
How Did This Help?¶
Getting a good process to deal with the large bulk of scenario tests was a welcome relief. While I still needed a strategy for the sheer number of scenario tests I would have to write (see the next article for details on that), I had a solid template that was simple (see the K.I.S.S. principle), easy to follow (see the Arrange-Act-Assert pattern), and would scale. This was indeed something that I was able to work with.
What About the Unit Tests?¶
Compared to the scenario tests, writing unit tests for the parser’s foundation functions was easy. In each case, there is a function to test with a very cleanly specified interface, providing for a clean definition of expected input and output.
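As an example of what one of these looks like, here is a sketch of a unit test for a whitespace-skipping helper. The helper's name and signature are hypothetical, my own stand-in for the kind of foundational function the parser relies on, shown inline so the test is self-contained:

```python
def collect_whitespace(line, start_index):
    """Hypothetical foundational helper: return the index of the first
    non-space character at or after start_index."""
    index = start_index
    while index < len(line) and line[index] == " ":
        index += 1
    return index


def test_collect_whitespace_skips_leading_spaces():
    # Arrange
    line = "   abc"

    # Act
    new_index = collect_whitespace(line, 0)

    # Assert
    assert new_index == 3


def test_collect_whitespace_with_no_spaces():
    # Arrange
    line = "abc"

    # Act
    new_index = collect_whitespace(line, 0)

    # Assert
    assert new_index == 0
```

Because the helper's contract is so narrow, many input combinations can be covered cheaply here, without adding a single scenario test.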
What Comes Next?¶
In the next article, I look at the work that needs to be done and come up with general strategies that I use to implement the parser required for the linter. With the specification’s 637 examples as a base for the scenario tests, good planning is needed to ensure the work can progress forward.
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.