Markdown Linter - Double Checking My Double Checks

Summary¶

In my last article, I talked about my continuing efforts to clean up the scenario tests and the existing parser issues. In this article, I talk about my efforts to bolster the consistency checks I have built into the project.

Introduction¶

By now, all the low hanging fruit from the Issues List have been picked, preserved, and stored away. What are left are the more difficult to deal with issues. Depending on how one feels about challenges, this is either a good thing or a bad thing. I choose to interpret it as a good thing. But I also know that a challenge usually means more work. And in the case of getting the PyMarkdown project’s consistency checks working at full strength, it was going to take a lot of intense debugging work.

What Is the Audience for This Article?¶

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 26 Oct 2021 and 31 Oct 2021.

Why Do I Care About Consistency Checks?¶

As I started to implement the parser part of the PyMarkdown project, I quickly came to one unshakeable conclusion: I was undertaking something that would be huge in scope. I knew that the 673 examples that came with the GitHub Flavored Markdown specification were only the start of the testing. I figured that those cases would be most useful in verifying that the correct HTML was being generated, and that verification would lend some confidence to the parser processing the documents properly. But as those examples only include HTML output, I would need to verify the tokenized output independently of those examples. And my guess was that the other verification would explode the number of scenario tests to somewhere over 2500, which is what I thought was a decent estimate at the time.

To overcome this perceived testing deficiency, I added two levels of checks to verify the consistency of the parsed Markdown tokens. The first level of checks, the Markdown generator, tests the ability to parse from Markdown into a token stream and back again. While the presence of container blocks requires extra code to manage the complexity, this is meant to check prove that that parsing is a relatively simple bi-directional process without any loss of information. The second level check, the position checker, is a more intense look at the information provided with each token. Basically, it specifically focused on the size of each token and whether the position of the token plus the size of the token equals the position of the next token.

These two types of checks account for around 95% of the discovered and reported issues that I have found so far. And as the easy-to-find issues are mostly gone, it is the difficult-to-find issues that are now showing up. Knowing this, I can easily guess that if any new scenario tests uncover issues, they will probably uncover issues with either nested List elements or Link Reference Definition elements. And in the last couple of months, I believe that only a handful of parser errors have been found, a slightly larger number of Markdown generation errors, with the position checked having the most issues. Put together, there are two problem areas that are well tested in the parser, but the consistency checks are still finding issues, though mostly within their own code. Not perfect, that that knowledge helps.

So why do I care about these consistency checks? Because they are thorough in finding issues. That is, when I do not disable them.

So… Why Do You Disable Them?¶

The truth is that the parser and the Markdown generator are the most stable components of the project. Because of its very nature, the position checker is very precise on where it expects tokens to be positioned. As such, I have often disabled the checks with disable_consistency_checks=True when adding a new feature or fixing an existing issue. I usually try for between one hour and three hours to get the checks working, but after the two-hour mark, I go increasingly with disabling the check. I honestly admit that this is a bad habit, and one that I am trying to stop following. But that process does help me from getting stuck, so as with anything, it is not all bad.

In looking at what I wanted to do this week, I decided to focus on examining the existing tests with consistency checks disabled¹, with a goal to enable those checks. Following a search-and-remove of that text from the scenario tests, I had eighteen tests that I needed to get passing. And as I was removing that code, I did notice that the tests were in at least six groups:

2 dealing with HTML Blocks
1 dealing with Paragraphs
1 dealing with Link Reference Definitions
8 dealing with List Blocks
1 dealing with Block Quotes
5 from the “extra” scenario tests

Even with this limited information, I knew that this work would likely take me the rest of the week to complete. With at least six distinct groups, it was highly likely that I was going to look at six or more different areas of the position checker code and address those issues. That was going to be time consuming, as that meant I was going to be looking all over the position checker code.

As I started fixing issues with the position checker, I was not surprised when fixing those issues uncovered other errors with the Markdown generator and even the Markdown parser. It just made sense to me. If there was an issue with one of the two checkers, there was a decent chance that the issue would affect the other checked. And with both consistency checks disabled, there was also a decent chance that a boundary condition failure was being missed. From my experience, I had a feeling that once I was able to clear up any issues with the consistency checkers, I had a high degree of confidence that it would uncover at least two to three issues in the other components. After all, that is what those consistency checks are designed to do.

Finishing the work for Issue 87, the issue I created to contain this work, it took me until right around noon on Sunday to get everything completed. I am not in the least bit surprised that it took that long to complete. The amount of debugging I did this week was just immense. If I had to guess, I probably added as much debug code as non-debug code, just to be able to follow what was going on in the code. But the goal was reached: all eighteen scenario tests are now fully enabled.

The Journey¶

The journey to get there was not an easy one, but it was one that paid off in the end. While these are in order of where they were found, the order in which they were found was all over the place. In one case, I found three issues at the same time and had to focus on one before getting to the others. In my mind, it was more important to fix things and note how long they took to fix, not the order in which they were fixed. That, and I arranged my notes by where they were found and not the order in which they were fixed.

It Just So Happened That This Was First¶

Picking one of the tests at random, I settled on test function test_list_blocks_237g. While I was there for problems with the position checked, I noticed something almost at once upon looking at the tokenized output for the test: the Blank Line token was repeated when there was only one Blank Line present in the Markdown document. Enabling debug and rerunning the test, it was obvious to see why. As far as I could tell, the Blank Line element was being properly processed within the Block Quote element. But once that was complete, that same Blank Line was also being handed off to the Leaf Block Processor where it was processed again, yielding another Blank Line token. It was a parser issue!

Fortunately for me, I was able to figure out a quick way to resolve this issue. The following code was already in place to handle the Blank Line within the Block Quote element:

if did_blank:
    POGGER.debug(">>already handled blank line. returning.")
    container_level_tokens.extend(leaf_tokens)
    return container_level_tokens, line_to_parse, this_bq_count, None

I just made one minor change to that code:

    return container_level_tokens, line_to_parse, this_bq_count, None, True

as well as other changes to the returns in that function and the functions that call that function. While it was correctly parsing the Blank Line the first time, that function was not passing back any information that would allow the second parsing of the Blank line to be avoided. By returning that True from that code, downstream code was notified that the Blank Line had already been processed, and that it did not need to continue.

That was one of the easier issues to debug.

Double Check Your Output… Twice¶

Another test that looked weird when I looked at it was test function test_extra_007cx. When I look at a scenario test that is failing, the first thing that I do is go over the Markdown and the tokens to make sure that I have familiarity with the content that is being parsed. As I did that for this test function, I noticed that trailing whitespace was missing. After I double checked my initial findings, I did indeed have another parser issue. In this case, any whitespace within Text elements within Link elements was being omitted from the Link token. Not an easy find, but a good one.

Tracking through the code with the debugger, I came to the __complete_inline_loop function, where I was firmly convinced the issue was. Looking through the logs, the Link token and the encompassed Text token agreed up to this point, and they diverged after it. And while it was not much, the space at the end of the line was present in the Link’s Text element, but not in the Link element itself.

Doing a bit of experimentation, I added the following code to correct this behavior.

if new_string == "\n" and end_string:
    split_end_string = end_string.split(ParserHelper.newline_character)
    POGGER.debug(
        "split_end_string>>$>>",split_end_string,
    )
    if len(split_end_string) >= 2:
        new_string = split_end_string[len(split_end_string)-2] + new_string

As the trailing space was present in text block, but not in link element, I knew that I should not touch the new_string variable, only the new_string_unresolved variable. It took a while to get the second if conditional right, adding an extra scenario test to help tune the code properly.

The interesting part of trying to solve this issue was not the challenge in fixing the issue itself, that was the easy part. The interesting part was that the definition of the issue was tricky and often nuanced. As I said above:

any whitespace within Text elements within Link elements was being omitted from the Link token

This was easily a boundary case, and a good find. But not all the solutions were as easy as these two.

Transforming Into Markdown¶

When I was looking for a next scenario test to clean, function test_block_quotes_extra_02ae and its related scenario tests looked like they were an excellent choice. At some point in the resolution work, I noticed that there were four scenario tests that were failing their consistency checks due to Markdown generation. Before I spent more time digging deeper into the position check, I felt that my time would be best spent getting these four functions working properly.

As I examined each of the failing scenario tests, I noticed that they all focused on one single are: extra whitespace at the start of a Block Quote element line. There were slightly different variations on this issue, but underneath they all had the same cause. Digging into the cause, it was a simple matter to overlapping data due to nested container elements. Specifically, in the case of test function test_block_quotes_extra_02ae, it was the overlapping data of the List element and the Paragraph element.

During the development of the List elements and the Block Quote elements, I made some decisions that have affected later work. The easy decision to explain is the decision about the Block Quote element. Looking across all the examples in the GitHub Flavored Markdown specification, there are a good collection of properly spaced Block Quote characters and improperly spaced Block Quote characters. Given that information, I made the hard choice to include those starting characters in the field leading_spaces for the Block Quote token that owns that line. Managing that field and tracking along with it will parsing and checking has been frustrating, but it still makes sense to me. However, because of the more rigorous spacing requirements and a reduced quantity of improperly indented List element examples, for List tokens I decided to record their indentation levels in the indent_level and dealing with any outliers as they surfaced.

Based on the data that I had at the time, I sincerely believe that was the correct choice to make. Adding that same leading_spaces field to the List tokens would have been too ambitious given the requirements for it. And whether I liked it or not, this situation was a direct result of that decision. But because of those decisions when this Markdown is parsed:

> - start
> - quote
>
>   end

the last line gets parsed into a Paragraph token starting with two spaces inside of a List token with an indent of 4.

if extracted_whitespace:
    found_block_token = None
    was_list_found = False
    for i in range(len(self.container_token_stack) - 1, -1, -1):
        if self.container_token_stack[i].is_block_quote_start:
            found_block_token = self.container_token_stack[i]
            break
        was_list_found = True
    if found_block_token and was_list_found:
        split_leading_spaces = found_block_token.leading_spaces.split(
            ParserHelper.newline_character)
        specific_start = split_leading_spaces[
            found_block_token.leading_text_index - 1]
        if len(specific_start) <= len(extracted_whitespace):
            extracted_whitespace = extracted_whitespace[len(specific_start) :]

It took a bit of time to get there, but finally I arrived at the above code to mitigate this issue. Because of the overlap the whitespace in the Paragraph token and the List indent, this code detects that very circumstance with an existing Block Quote element and changes the extracted_whitespace variable to reduce the amount of whitespace. With a decent amount of testing, this was able to knock a couple of issues off the list all by itself!

Position Checker Woes¶

At this point in the debugging process, I was keenly aware of two things. The first was that I probably had any existing parser issues and Markdown generator issues resolved. While that meant that I needed to be vigilant about my work in getting the position checker issues resolved, it gave me the confidence to know that I probably would get most of the serious issues resolved. At the time and while I was writing this, I was incredibly careful to think “most” and not “all”. Issues are tricky things, and my experience makes me believe that you never get all.

The second thing was that this work was taking a lot of time. Just for the analysis, debugging, fixing, and testing of the three prior issues, I was up over twelve hours in total. While that included one or two of the issues in this section, the point was clear. This was taking a lot of time to get right. But I still felt strongly that it needed to get done, so I continued to push forward.

Providing Foundational Support¶

The most obvious thing that I needed to add was a sense of whether a Block Quote element was in effect when validating a given set of tokens. While the actual code to use it would come later, this code was added in a couple of places, such as in the __validate_block_token_height function, to calculate that token:

container_token_index = actual_tokens.index(last_token)
block_container_token = None
if container_token_index > 0 and actual_tokens[container_token_index-1].is_block_quote_start:
    block_container_token = actual_tokens[container_token_index-1]

While I would normally be more carefully about copy-and-paste code, I already have plans to revisit the consistency check modules to see about improving them. As such, this was both quicker and would give me a chance to evaluate some refactoring tools in the new future.

Along the same path, I added proper support for Block Quotes elements within List elements in the __maintain_block_stack function. It was not much, but I knew that I needed to have that value calculated in multiple places. This was providing that support while adding the necessary code only in a single place.

List Elements and Block Quote Elements - Redux¶

Like the work detailed in the Transforming Into Markdown section, this change to the __validate_new_line function needed to account for Block Quote tokens and List tokens being used at the same time. Although the code looks slightly different, both changes are accomplishing the same goals, just in different ways. In this function, because of the position checks that go on, the code was a lot more convoluted and nuanced. But with a solid group of five scenario tests that all looked similar:

FAILED test/test_markdown_block_quotes.py::test_block_quotes_extra_02ae - AssertionError: Line:4:[para(4,5):  ]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237x - AssertionError: Line:3:[para(3,8):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237f - AssertionError: Line:4:[para(4,8):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237g - AssertionError: Line:4:[para(4,8):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_238a - AssertionError: Line:3:[para(3,10):]

it was easy to go through and look for commonalities. It did take a fair amount of time to get the solution dialed in properly, but it was worth it.

Block Quotes and Blank Lines¶

Not yet finished with those tests, I resolved the issue in the previous section, only for me to be greeted by another issue with those same five scenario tests:

FAILED test/test_markdown_block_quotes.py::test_block_quotes_extra_02ae - AssertionError: Line:3:[BLANK(3,2):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237x - AssertionError: Line:2:[BLANK(2,3):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237f - AssertionError: Line:3:[BLANK(3,3):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_237g - AssertionError: Line:3:[BLANK(3,3):]
FAILED test/test_markdown_list_blocks.py::test_list_blocks_238a - AssertionError: Line:2:[BLANK(2,3):]

I had taken care of one issue, only to have another one replace it. I cannot say that I was surprised because I was not. This entire week was about things getting uncovered that also needed fixing. And like the other issues, this one was obvious when I looked at it. When computing the estimated column number for a Blank Line element, the estimated was sometimes off by two within Block Quote elements.

While this was obviously the problem, the issue with this issue was in defining the circumstances in which this was the problem. This code:

elif container_block_stack and current_token.is_blank_line:
    if container_block_stack[-1].is_block_quote_start:
        leading_text, did_x = (
            container_block_stack[-1].calculate_next_leading_space_part(),
            True,)

was obviously being used in all cases, and there was at least one case where it was being used improperly. After performing handful of experiments, it turned out that there were three cases. The first was if the Blank Line token was followed by end tokens. If it was only followed by end tokens, then that recalculation needs to be applied. The second was if the “block” of end tokens after the Blank Line did not include an end List token. Finally, the block part of the logic depended on whether the last block token was a Block Quote token or if the first block token was a Block Quote token.

After about four hours of experimentation and debugging, I arrived at this code:

elif container_block_stack and current_token.is_blank_line:
    top_block = None
    if container_block_stack[-1].is_block_quote_start:
        top_block = container_block_stack[-1]
    elif top_block_token:
        top_block = top_block_token

    needs_recalculation = False
    was_end_list_end = False
    next_token_index = actual_tokens.index(current_token) + 1
    while (
        next_token_index < len(actual_tokens)
        and actual_tokens[next_token_index].is_end_token
    ):
        if actual_tokens[next_token_index].is_list_end:
            was_end_list_end = True
        next_token_index += 1
    if next_token_index < len(actual_tokens):
        needs_recalculation = True
    else:
        needs_recalculation = not was_end_list_end

    if top_block and needs_recalculation:
        leading_text, did_x = (
            container_block_stack[-1].calculate_next_leading_space_part(),
            top_block.calculate_next_leading_space_part(),
            True,
        )

It backed up my earlier suspicion that the code was right, just not being used properly. It did take a while to nail down the nuances of this code, but it all made sense then I got there!

Second Verse, Same as The First¶

And it seems like a continuing theme, having to adjust for Block Quote idents in various pieces of code, and the changes to the __process_previous_token function and the __verify_next_inline_handle_previous_end were no different. The change for the __process_previous_token function was to add this code:

if block_container_token and old_line_number != estimated_line_number:
    delta = estimated_line_number - old_line_number
    if not current_inline_token.is_inline_link_end:
        block_container_token.leading_text_index += delta
    split_leading_spaces = block_container_token.leading_spaces.split(
        ParserHelper.newline_character
    )
    estimated_column_number += len(
        split_leading_spaces[block_container_token.leading_text_index]
    )

and the change to the __verify_next_inline_handle_previous_end function was to add this code:

if block_container_token:
    block_container_token.leading_text_index += new_lines
    split_leading_spaces = block_container_token.leading_spaces.split(
        ParserHelper.newline_character
    )
    new_estimated_column_number += len(
        split_leading_spaces[block_container_token.leading_text_index]
    )

Both sets of changes are almost identical, except for a couple of slight changes. But that was always expected, so it was comforting in a way to find these last two sets of changes to be so alike. If the triggering condition was met, update the leading_text_index field, grab the part of the leading_spaces field relating to the index, and add the length of that value to the current column number.

And with that, things were clean. Well, almost. As I mentioned at least once in this article, nested container blocks are nuanced. As such, when I tried to test something against alternating container blocks that were four levels deep, I ran into an issue. While I do want to tackle that as time allows, this work was about cleaning up disabled consistency checks, not nested blocks. As such, left those two tests in the code and disabled them for now.

I must admit that I wanted to deal with them right away, but I knew that I had enough to do as it was. Taking that on would just delay things. I believe I made the right move in delaying that work, but sometimes it does not feel that way.

Doing My Usual Cleanup¶

When I was going through that work, I did notice that there were unusual cases where I was still using string concatenation instead of f-strings. While gearing up for the actual check of the reported issues, I thought it would be good to go through and clean those up. While it does not show us as increased code coverage or increased scenario coverage, it just felt good to get it done and have everything using f-strings.

The Main Event¶

Yes, after all this work, there was one more thing to do: The Main Event. With a solid set of passing tests, with all consistency checks working properly, I could now start verifying that the Paragraph token’s rehydrate_index field and the Block Quote token’s leading_text_index field were working as intended. This was the plan all along, but I had to be sure everything else was in place before I could start on this.

Working through all the scenario tests listed in the issue, I was able to verify that any paragraph hydration concerns with those scenario tests had been fixed as I was unable to reproduce even one failure. There were now multiple checks to verify that the rehydrate_index field was being used properly, and all those checks were passing properly. That was the focus, and after a solid week of work to get there, I was able to resolve those issues from the Issues List. That was the good news.

The bad news was that the same could not be said for the checks for the leading_text_index field. At the end of the __rehydrate_block_quote_end function was an assert that was commented out:

# assert (
#     leading_text_index == expected_leading_text_index
# ), f"leading_text_index={str(leading_text_index)};expected_leading_text_index={str(expected_leading_text_index)}"

After two hours of trying to resolve all the issues, I added an issue to the project’s Issues List and cleaned things up. What I was able to do however, was to get it closer to the final state:

assert leading_text_index in [
    expected_leading_text_index,
    expected_leading_text_index + 1,
], f"leading_text_index={leading_text_index};expected_leading_text_index={expected_leading_text_index}"

This uncovered an unnoticed issue in the parse_paragraph function that caused the parser to emit an extra newline character. This issue is a weird boundary condition, and without the above assert, the likelihood of me finding it was remote at best. Things must align properly for this to happen, as the code to detect this condition shows:

apply_paragraph_adjustment = True
if (
    container_index + 1 == len(parser_state.token_stack)
    and position_marker.line_number == top_block_token.matching_markdown_token.line_number
    and container_index > 0
    and parser_state.token_stack[container_index - 1].is_list
):
    if (
        position_marker.line_number
        == parser_state.token_stack[
            container_index - 1
        ].matching_markdown_token.line_number
    ):
        apply_paragraph_adjustment = False

if apply_paragraph_adjustment:

And, once again, it took a while to figure it out, but I got there.

What Was My Experience So Far?¶

Sometimes the solution hits you right away, and sometimes you must fight for every part of that solution. This week, there was a lot of fighting. While some of the time it was about the problem and not the solution, most of the time it was around discovering the nuances of the problem. And that was challenging, but the work was exceedingly difficult to focus on. It required me to keep a level head and to clearly write things out for myself, so that I could look at the problem from different angles.

And that is what ultimately saved me. Writing things out. Yes, lots of scribbles and pages of weird drawings with arrows and boxes, but it worked. What I found useful about my usual practice of using paper to design was that in trying to figure out the actual problem, I was not establishing a clean visualization of the problem. Only when I was able to get enough test data represented in a cohesive form on those pieces of paper, all visible at the same time, was I able to properly view the problems and make my breakthroughs.

Now, am I saying that everyone should use paper? Nope. But for me, the pivotal thing is that it helps me to visualize things. It does not matter if it works for anyone else, but that helps me to visualized. And the more difficult things get, the more important it is to find a way to visualize things in a cohesive manner that is digestible. For me, that formula is a decent amount of paper scribbles mixed in with good logging. That just allows me to keep state and assumptions on paper while testing them as I go through the log files and recording the actual changes. As I said at the start of the paragraph, it just must work for me.

So, was it hard? Yes… it was hard. But I also moved the project forward while starting to refine another skill to add to my toolbox. That was unexpected, but cool!

What is Next?¶

Looking ahead in the Issues List, I would like to make more progress on getting things cleared out. And with me removing the disabled consistency checks this past week, I at least want to try and tackle the pragma issues. Stay tuned!

Except for pragmas, that hopefully will be next week. ↩

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.

Comments