Markdown Linter - Delving Into the Issues - 8

Summary

In my last article, I continued in my quest to reduce the size of the issues list. In this article, I take a bit of time to focus on adding some depth to the scenario cases table.

Introduction

As I get closer to the end of this initial phase of the PyMarkdown project, I find myself measuring the project’s success differently than I did at various points in the past. Initially the success criteria statement was “does it work at all”. Then it moved on to “is it implemented for most circumstances”. Finally, in the last couple of weeks, it has moved on to the “what did I miss” stage. And wow, does it seem like it has taken both a couple of weeks and almost a year at the same time.

While this phase might seem boring to other people, people who are Testers or Automation Developers¹ often enjoy times like these because of two words: exploratory testing. Our job is to make sure the thing we are testing is working properly. To a lot of us, exploratory testing is basically like leaving a kid in a room filled with hundreds of opened LEGO sets and saying to them “show me what you can build!” It is in those times that we get to “play around” and experiment. We use that time to try and understand where the issues are, and which scenarios give us the most benefit to test for the least cost. And as this phase is closing, this type of testing is pivotal in being able to close out the phase cleanly and with confidence.

And as I have mentioned before, testing and test automation are not about trying to break something; they are about reducing the risk that the user of the product will experience that thing breaking. That is where my recording of bulk testing in the scenario cases tables comes into play. Instead of looking for one issue at a time, those tables take a group of concepts and test them as a single group.

I have found that the benefits of that approach are twofold. The first benefit that I have experienced is an increase in confidence. This is an easy one to explain, as I can concretely point to a collection of tests along a theme and know that any scenario along that theme is working properly. The second benefit is one of economy. Finding an individual issue is expensive. It takes exploration or debugging to find the issue in the first place, then extra debugging and logging to figure out what the issue really is, followed by the extra work required to fix it. That ends up being a lot of time. By amortizing that work over an entire group of tests, the cost is drastically reduced.

Having experienced these benefits on this project, I decided to dedicate a week’s worth of work to adding to the table, to increase my confidence and to accelerate my journey to a shippable project.

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 27 Oct 2020 and 31 Oct 2020.

Links/Images and Other Inline Tokens

The easy part about coming up with a theme is the title of the theme itself. For the Series J theme, it was an easy theme to identify: Link/image elements followed by various other inline tokens. In the scenario-cases.md file, that description is right there after the name of the series. But the hard part of a theme is making sure that each scenario that you want in that theme is present. And often, I miss the mark.

When I originally added Series J to the document, I thought that I had added each of the qualifying inline elements to the group. Going through the list in my head, I thought I had each of those newlines added when I created the group. But in retrospect, I did not have the right viewpoint as I missed a large part of that group: internal versions of the scenarios I had already added.

I came across this when looking at the J8 test and experimenting by creating two new tests:

    |J9  |inline link with split emphasis in label| `abc\n[a*li\nnk*a](/uri "title")\ndef` | test_paragraph_extra_e1 |
    |J9i |inline image with split emphasis in label| `abc\n![a*li\nnk*a](/uri "title")\ndef` | test_paragraph_extra_e2 |

Whereas the other tests in the Series J group focus on the inline elements after the Link elements and Image elements, I wanted to experiment with performing the same process on those tokens’ link labels, inside of the tokens. And that experimentation bore fruit. The J9 test failed in the consistency check with an overcount on the line number. After a quick debugging session, I discovered that the rehydrate_index that I have mentioned in previous articles was being incremented twice: once for the link label and once for the Link’s encapsulated tokens. It was just a simple fix from:

    last_token.rehydrate_index += 1

to:

    if not link_stack:
        last_token.rehydrate_index += 1
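To see why that guard matters, here is a toy model in Python. It is not PyMarkdown’s consistency-check code: the `(kind, text)` token tuples, the `count_lines` function, and the assumption that the Link token accounts for the newlines in its own label are all hypothetical, chosen only to show how a newline inside a link label gets counted twice without the `link_stack` check.

```python
def count_lines(tokens, guard_inside_links=True):
    """Toy line counter: how many newlines do these tokens account for?

    Hypothetical model: a ("link_start", label) token accounts for every
    newline in its own label, and the tokens between link_start and
    link_end repeat that label text in processed form.
    """
    link_stack = []
    rehydrate_index = 0
    for kind, text in tokens:
        if kind == "link_start":
            link_stack.append(text)
            rehydrate_index += text.count("\n")
        elif kind == "link_end":
            link_stack.pop()
        elif kind == "text":
            # Inside a link, the Link token has already counted these
            # newlines, so only count text that is outside of any link.
            if not guard_inside_links or not link_stack:
                rehydrate_index += text.count("\n")
    return rehydrate_index


# Roughly the J9 scenario: abc\n[a*li\nnk*a](/uri "title")\ndef
j9_tokens = [
    ("text", "abc\n"),
    ("link_start", "a*li\nnk*a"),
    ("text", "a"),
    ("emphasis_start", ""),
    ("text", "li\nnk"),
    ("emphasis_end", ""),
    ("text", "a"),
    ("link_end", ""),
    ("text", "\ndef"),
]

print(count_lines(j9_tokens))                            # 3
print(count_lines(j9_tokens, guard_inside_links=False))  # 4
```

The source Markdown for J9 contains three newlines; without the guard, the newline inside the emphasized label text is counted a second time, which is the same kind of overcount the consistency check reported.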

After that quick fix, the issue was addressed. But it highlighted something that I wanted to get back to before the end of the week: inline elements within the link label. More on that near the end of this article!

Adding the Series L Theme

This work was the major focus of the week: links or images contained within the link label section of another link or image.

Origin Story

While I came up with the idea for this group recently, I have been thinking about this topic since at least 31 July 2020. It was at that time that I did the research that I would document in a section labelled Example 528, followed by the work undertaken to fix that issue, documented in the section “creatively” labelled “Fixing” Example 528.

That scenario test, encapsulating the GFM Specification’s example 528, is a bit contrived but a good example nonetheless. Given the Markdown document:

    ![[[foo](uri1)](uri2)](uri3)

the expected output is the HTML document:

    <p><img src="uri3" alt="[foo](uri2)" /></p>

The reason I say that this example is contrived is that, while I can visualize useful cases of an Image element within a Link element, I have a hard time coming up with a similar example for a Link element within a Link element.

A practical instance of that kind of composition is the following Markdown:

    [![moon](https://nssdc.gsfc.nasa.gov/imgcat/midres/gal_p37329.gif)](https://en.wikipedia.org/wiki/Moon)

which is rendered as:

    <p><a href="https://en.wikipedia.org/wiki/Moon"><img src="https://nssdc.gsfc.nasa.gov/imgcat/midres/gal_p37329.gif" alt="moon" /></a></p>

At the visual level, this HTML displays a picture of the Moon from the NASA archives. When that image is clicked, the browser goes to the Wikipedia article on the Moon.

Useful element compositions like this are probably why there are multiple examples of an Image element within a Link element in the GFM Specification. However, in that same specification, only Example 528 provides for a Link element within a Link element within an Image element. As the GFM Specification provides a unified interpretation of Markdown, Example 528 is presented as a recipe for handling cases like that. My guess was that if that example were anything other than an edge case, there would be more examples outlining that pattern.

Formulating the Test Group

With the help of John MacFarlane, I was able to figure out the part of the algorithm that I had misunderstood and fix the error. Having invested all that research and work to fix that one issue, I wondered if there was a better way to handle issues with examples like that. That was when I really started thinking about how to cover all the cases that would lead to having a good group of tests around Example 528.

The downside of that exercise was that, as soon as I thought about how to cover all those scenario cases, a couple of negative things got in the way. The first big one was example 583 and the paragraph that follows it:

Though this spec is concerned with parsing, not rendering, it is recommended that in rendering to HTML, only the plain string content of the image description be used. Note that in the above example, the alt attribute’s value is foo bar, not foo [bar](/url) or foo <a href="/url">bar</a>. Only the plain string content is rendered, without formatting.

Basically, given the Markdown:

    ![foo [bar](/url)](/url2)

the specification suggests that the only content that should be used is the foo text contained at the start of the Image element’s link label and the bar text from the link label of the inner Link element. Therefore, after processing, the resultant HTML is:

    <p><img src="/url2" alt="foo bar" /></p>

The downside of this information is that there are at least 64 “simple” combinations of links inside of links, images inside of images, links inside of images, and images inside of links. Those simple combinations are the 4 types of links (inline, full reference, collapsed reference, and shortcut reference) inside of the same 4 types of links, for each of the 4 combinations of Link and Image elements: 4 × 4 × 4 = 64. That lays the groundwork for determining which combinations should be tested, but it does not address example 528-like scenarios, where the elements are nested three deep.
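The arithmetic behind that count can be sketched in a few lines of Python. The four link types are the ones the GFM Specification defines; the `LINK_TYPES` and `PAIRINGS` names and the (outer, inner) tuple layout are my own shorthand for this illustration, not anything taken from the scenario-cases.md file.

```python
from itertools import product

# The four kinds of link defined by the GFM Specification.
LINK_TYPES = ["inline", "full", "collapsed", "shortcut"]

# The four ways of nesting one element inside another, as (outer, inner).
PAIRINGS = [
    ("link", "link"),
    ("image", "image"),
    ("image", "link"),  # a link inside of an image
    ("link", "image"),  # an image inside of a link
]

# Every (outer element, outer type, inner element, inner type) combination.
combinations = [
    (outer_elem, outer_type, inner_elem, inner_type)
    for (outer_elem, inner_elem), outer_type, inner_type in product(
        PAIRINGS, LINK_TYPES, LINK_TYPES
    )
]

print(len(combinations))  # 64
print(combinations[0])    # ('link', 'inline', 'link', 'inline')
```

Each tuple corresponds to one row that needs a scenario test, which is why verification, rather than creation, ends up dominating the cost.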

Beyond the work required to create a single test for each combination, the bigger downside was going to be verifying each of those tests. Adding to that cost was the possibility of finding issues that needed to be addressed while verification was still ongoing.

It was daunting, but I felt strongly that it needed to be done. So, I started working on identifying the combinations that were needed, and added them to the scenario-cases.md file. It was then that the hard work for this issue would start.

Working the Issue

The bulk of the work on resolving this issue was done over 4 days of lengthy sessions. To reduce the cost of completing this work, I decided early on to come up with a simple strategy that would let me copy and paste tests where possible, avoiding extra work. To that end, I figured that Link elements inside of Link elements were the best combination to start with. I just hoped that I could reuse a lot of the test code.

The table that I created in the scenario-cases.md file was a good tool to create the tests from, but it lacked any Markdown that I could use as a template. Keeping it simple, I started with the Markdown a[foo [bar](/uri)](/uri)a, and transformed the Markdown for each scenario from there. Once I started working with non-inline Link elements, I added in a simple Link Reference Definition, including links that referred to that Link Reference Definition and links that referred to a non-existent Link Reference Definition.
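Generating those variants from the template can be sketched in Python. This is a hypothetical helper, not code from the project: `make_element`, `make_scenario`, and the single Link Reference Definition `[bar]: /uri` are assumptions chosen to keep the example short. It only produces the Markdown text; whether each reference form actually resolves (a collapsed or shortcut reference only matches a definition with the same label) is exactly the part that still needs careful manual verification.

```python
def make_element(element, link_type, label, ref="bar"):
    """Hypothetical helper: one link or image of the given link type."""
    prefix = "!" if element == "image" else ""
    suffixes = {
        "inline": "(/uri)",
        "full": f"[{ref}]",
        "collapsed": "[]",
        "shortcut": "",
    }
    if link_type not in suffixes:
        raise ValueError(f"unknown link type: {link_type}")
    return f"{prefix}[{label}]{suffixes[link_type]}"


def make_scenario(outer_elem, outer_type, inner_elem, inner_type):
    """Build one test document from the a[foo [bar](/uri)](/uri)a template."""
    inner = make_element(inner_elem, inner_type, "bar")
    outer = make_element(outer_elem, outer_type, f"foo {inner}")
    document = f"a{outer}a\n"
    if outer_type != "inline" or inner_type != "inline":
        # Reference-style links need a Link Reference Definition to target.
        document += "\n[bar]: /uri\n"
    return document


print(repr(make_scenario("link", "inline", "link", "inline")))
# 'a[foo [bar](/uri)](/uri)a\n'
print(repr(make_scenario("image", "inline", "link", "shortcut")))
# 'a![foo [bar]](/uri)a\n\n[bar]: /uri\n'
```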

Following my usual pattern, I executed that new test and manually verified the tokens, before copying them into the test. After that, I executed the test again and copied the HTML output into the test, after once again manually verifying that it looked right. Even after that step, I used BabelMark against the Markdown for each test, comparing my parser’s output against the commonmark.js output. This process was long, drawn out, and tedious… but it worked.

The hard part about mentally processing a lot of these combinations is that, because of the rule that Link elements cannot contain Link elements, I needed to do a lot of tedious parsing of each combination. It was not as simple as just looking at the Markdown and quickly knowing what the answer was. I kept a copy of the GFM Specification’s implementation guide open in another window, just to make sure I was doing things in the right order. Even then, I double checked and triple checked each transformation before running the tests, just to make sure I had things done correctly.
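The rule that makes these combinations so tedious to work out by hand lives in the “look for link or image” step of the GFM Specification’s parsing strategy appendix: when a link is matched, every `[` opener before it is marked inactive, while `![` openers are not. The Python below is a deliberately small sketch of that one step, written for illustration and not taken from PyMarkdown. It only understands inline destinations of the form `(dest)`, with no titles, reference links, emphasis, backslash escapes, or HTML escaping, and the function names are hypothetical.

```python
def parse_inline(text):
    """Turn [label](dest) and ![label](dest) into nested nodes.

    A node is either a plain string or a tuple of
    (kind, destination, children), where kind is "link" or "image".
    """
    nodes = []
    brackets = []  # openers: {"index": ..., "image": ..., "active": ...}
    i = 0
    while i < len(text):
        if text.startswith("![", i):
            brackets.append({"index": len(nodes), "image": True, "active": True})
            nodes.append("![")
            i += 2
        elif text[i] == "[":
            brackets.append({"index": len(nodes), "image": False, "active": True})
            nodes.append("[")
            i += 1
        elif text[i] == "]" and brackets:
            opener = brackets.pop()  # the topmost "[" or "![" opener
            close = text.find(")", i + 2)
            has_destination = text.startswith("(", i + 1) and close != -1
            if opener["active"] and has_destination:
                kind = "image" if opener["image"] else "link"
                children = nodes[opener["index"] + 1:]
                del nodes[opener["index"]:]
                nodes.append((kind, text[i + 2:close], children))
                if kind == "link":
                    # Links may not contain links: every earlier "[" opener
                    # becomes inactive, while "![" openers stay usable.
                    for earlier in brackets:
                        if not earlier["image"]:
                            earlier["active"] = False
                i = close + 1
            else:
                # Inactive or unmatched opener: the "]" is just text.
                nodes.append("]")
                i += 1
        else:
            nodes.append(text[i])
            i += 1
    return nodes


def plain_text(nodes):
    """Alt text: only the plain string content, without any markup."""
    return "".join(
        plain_text(node[2]) if isinstance(node, tuple) else node for node in nodes
    )


def to_html(nodes):
    parts = []
    for node in nodes:
        if isinstance(node, tuple):
            kind, dest, children = node
            if kind == "link":
                parts.append(f'<a href="{dest}">{to_html(children)}</a>')
            else:
                parts.append(f'<img src="{dest}" alt="{plain_text(children)}" />')
        else:
            parts.append(node)
    return "".join(parts)


for markdown in [
    "![[[foo](uri1)](uri2)](uri3)",  # Example 528
    "![foo [bar](/url)](/url2)",     # example 583
    "[foo [bar](/uri)](/uri)",       # a link inside of a link
]:
    print(f"<p>{to_html(parse_inline(markdown))}</p>")
# <p><img src="uri3" alt="[foo](uri2)" /></p>
# <p><img src="/url2" alt="foo bar" /></p>
# <p>[foo <a href="/uri">bar</a>](/uri)</p>
```

In the last case, matching the inner link deactivates the outer `[`, so its `]` and `(/uri)` fall back to plain text. That single rule is what has to be replayed by hand for each nested combination.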

After a couple of days of work in the evenings, I had finished this first part. For the other three parts, I was hoping I could leverage that work heavily to shave some time off the process.

Completing Work On The Issue

With Link elements inside of Link elements out of the way and committed to the repository, I started to work on Image elements inside of Link elements. The big change here was that while nested Link elements need to be parsed carefully, the parsing of Image elements inside of Link elements was more natural to me. The Link token’s link label field contained the “raw” form of the link label, while the tokens between that token and the end Link token contained a processed version. With a decent amount of experience in reading Markdown due to this project, I quickly became proficient at making those changes. It therefore followed that the verification part of the process went a lot smoother than with nested Link elements.

Moving on to nested Image elements was a relatively easy step to take from there. As the Image elements create their alt attribute values by processing the link label instead of encapsulating it (as with Link elements), the two big changes were easy to consistently apply across each of the new tests. The first change was to remove any tokens that were being encapsulated between the start Link token and the end Link token, replacing them with a single Image token. The second change was to look at an example nested Image element and determine what the alt attribute was going to be. After the first two or three tests, I started to get pretty good at doing that before I started verifying the tokens, saving a lot of time.

Finally, completing the group with Link elements inside of Image elements was almost trivial. As the difference between a Link element inside of an Image element and an Image element inside of an Image element is one character (!), the link labels remained constant between the tests. As such, only minor changes were required to these tests after copying them from the previous group.

Dealing with Relatively Minor Issues

Getting all the tests passing and verified was a chore, but the good news was that most of the work was contained within the scenario test process that I had already defined. Considering the scope of the group of tests, the number of issues found in the non-test parts of the project was very small.

To be specific, there was only one change required. When adding the tests for Image elements within a Link element, the only thing I needed to do was change the expression:

    if last_token.token_name == MarkdownToken.token_paragraph:

to:

    if last_token and last_token.token_name == MarkdownToken.token_paragraph:

To be blunt, it was both confirming and unsettling at the same time. The confirming part of the process was that I had done the work on the project properly, with only a very slight change required. And hopefully it does not sound like I lack confidence, but it was also unsettling. After working on scenario tests across an entire theme, taking three to four days in the process, I somewhat expected the new scenario tests to find something that I missed.

I was happy that it did not find anything, do not get me wrong. It just took a bit of getting used to. And it was still a validation of the parser code itself, as the change was only required in the consistency checks. After some thought, it sank in that at this late stage of the project’s initial push, I wanted the results to be exactly this: the parser being proven to be validly constructed, again and again.

Rounding Out Series J

Based on the research that I did at the start of the week, I wanted to close out the week by rounding out the Series J group. As with my recent work in adding the Series L group of tests, I started out by scribbling down the combinations that I thought needed to be covered, looking for gaps that I had missed. While it was not a big gap, I added tests J2a and J2ai to cover the case of a newline inside of a Raw HTML element, which I had not tested.

With that initial fix made, the rest of the changes closely followed the pattern of the new tests that I documented at the start of this article. Starting with emphasized text, I added scenario descriptions and scenario tests encompassing a wide range of inline tokens, including Hard Line Break elements. I double checked everything and then began my usual process of executing and verifying the tests. And boy was I glad that I did!

While it was not a lot of code, I made changes to the __collect_text_from_blocks function and the __consume_text_for_image_alt_text function to properly handle these new cases. In the case of both functions, most of the inline tokens were handled, but the two Autolink inline tokens and the Hard Line Break tokens were not handled. While the extra code to remedy these issues was relatively small, it was a good find. It felt good that these issues were found directly because of this new group of scenario tests. It felt like validation of the approach I had taken.
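The shape of that kind of fix can be sketched with a dispatch table. This is not the code from `__consume_text_for_image_alt_text`; the token names, the `(token_type, payload)` tuples, the `collect_alt_text` function, and the choice of a newline for a hard line break are all assumptions for illustration, and the last one in particular is worth confirming against BabelMark. The design point is the fallback: raising on an unknown token type means a missing case fails a test instead of quietly vanishing from the alt text.

```python
def collect_alt_text(tokens):
    """Return the plain string content of an image label's tokens.

    Hypothetical token model: a list of (token_type, payload) tuples.
    """
    handlers = {
        "text": lambda payload: payload,
        "code_span": lambda payload: payload,
        "uri_autolink": lambda payload: payload,
        "email_autolink": lambda payload: payload,
        # Assumption: a hard line break contributes a newline to alt text.
        "hard_break": lambda payload: "\n",
        # Emphasis markers contribute nothing; their inner text tokens do.
        "emphasis_start": lambda payload: "",
        "emphasis_end": lambda payload: "",
    }
    parts = []
    for token_type, payload in tokens:
        handler = handlers.get(token_type)
        if handler is None:
            # Fail loudly so an unhandled token type shows up in a test.
            raise ValueError(f"unhandled token type in image label: {token_type}")
        parts.append(handler(payload))
    return "".join(parts)


label_tokens = [
    ("text", "see "),
    ("uri_autolink", "https://example.com"),
    ("hard_break", ""),
    ("text", "now"),
]
print(repr(collect_alt_text(label_tokens)))  # 'see https://example.com\nnow'
```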

From a consistency check point of view, there were only a couple of issues that were found. Mirroring the change made for split emphasis at the start of this article, the __verify_next_inline_hard_break function was changed to only increase the rehydrate_index if the token was not inside of an active Link token. The other issue was a small, nitpicky thing: adding the text + 1 to the output for the main assert in the __verify_next_inline_code_span function. Other than those two changes, the consistency checks had a clean bill of health.

What Was My Experience So Far?

I have to admit that, once the week had ended, I wondered (out loud, to my dog, who did not help in the discussion one way or the other) whether this was a good investment of time. The broad, sweeping groups that I added confirmed that the parser code was in good shape, as were the consistency checks that watched over that code. Maybe it was me still thinking I was in the middle part of the “game” of creating the project, and not the end game where I believe I currently am. But as I mentioned above, I had both positive and negative emotions about the results: happy that things were going well, but not trusting those results as much as the tests had earned.

Taking some time to think about it as I am writing this article, I do think my descriptions of “middle game” and “end game” are appropriate metaphors for where I am on the project. After a long time spent in the middle of the project, I believe it is just taking me some time to switch into the mode where I am wrapping things up to complete this first phase of the project. As such, when I start a week’s work, I expect to find more issues than I actually do, and then I end up happy when I do not find many. I truly believe that when I properly switch to an end game mentality, I will expect the tests to verify the work already done.

Does that mean the project will be properly tested? Nope. If you ask any person experienced with testing a project that question, they will not give you a solid answer. That is not due to a lack of confidence, but because there is not one to give. There will always be edge cases that are not thought of and weird things that can show up. It is my belief that you can find most of the issues with any given project, but it is always a question of when that next issue will show up, and who will find that issue.

In both professional life and for this project, my goal is the same: find that issue before the customer does or, failing that, make sure I have a solid assessment of the risk, outlined and evaluated. And with the changes from the past week, I see that measure of risk going down, which is a good thing.

What is Next?

With a solid week’s worth of “big ticket item” issues resolved, I decided to try and tackle a lot of the smaller issues. I just started with a couple of easy ones and kept on going until I stopped.

¹ Nomenclature can be everything, and it changes from job to job. From my viewpoint, Testers are people who are very methodical and document what they did, in what order, and what the results were. Automation Developers like me, an SDET or Software Development Engineer in Test, take documented patterns and results and write test code and test frameworks so that those scenarios can be executed as code that automatically validates the results. There are exceptions, but the general rule is that most Testers only have a small amount of the skills required for an Automation Developer, while most Automation Developers are very adept at most of the skills required for a Tester. Both skill sets are very useful on projects from a quality point of view.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
