Summary

In my last article, I talked about my efforts to streamline the rule implementation process and how that went. In this article, I talk about how that effort continues to pay dividends for the project.

Introduction

Having looked at the tasks that I need to complete before I can even remotely think about another beta release for the PyMarkdown project, it was obvious to me that I needed to implement more linting rules. After a big push on foundational work over the last six months, the foundation of the project was looking stronger than ever. But without a good set of rules to provide a decent linting experience, the project is essentially just a very “expensive” GitHub Flavored Markdown parser with some extra features on top. From my point of view, it was essential to get more rules completed.

So, after a couple of weeks of working on this task under some strict development rules for myself, just over one third of the remaining rules are now implemented. Not bad for two weeks’ worth of work! But I had to keep that enthusiasm in check, making sure it did not become complacency. It was good progress, but there were still a lot more rules that needed to be implemented. I wanted to keep my enthusiasm, while tempering it with some solid pragmatism.

Given that, some confidence, and a somewhat clear calendar for the week, I started to work on more linting rules!

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 05 Aug 2021 and 08 Aug 2021.

Working Fast and Efficiently - Part Two

After adding seven new rules last week, I was eager to get back to work on the PyMarkdown project and see if I could implement another seven rules this week. I was confident that I had the energy and the drive, but I was not sure if the remaining rules would lend themselves to the rapid design and implementation that made the previous week’s progress possible.

Regardless of what those remaining rules looked like, I wanted to keep to my two personal rules from last week, the 30-minute design rule and the 2-hour implementation rule, to see how many more rules I could implement. If nothing else, I knew that I could use small variations of those personal rules to keep me centered and focused on the tasks at hand. I know that I occasionally have problems with focus, so the decision to stick with a process that helps me focus on the right tasks was not a difficult one to make.

So, with some optimism in mind, I started working on the new set of rules for this week!

Rules Md039 and Md040 - Getting Lucky

Picking up the next rule, Md039 - Spaces inside link text, I was lucky enough to complete the design within five minutes. When I say that, I am not being flippant or exaggerating about my design skills… the design of the rule was just that easy. It was easy enough that I decided to look ahead at the following rule, Md040 - Fenced code blocks should have a language specified, to see if it followed the same pattern. Luckily enough, it did! In both scenarios, the rule reacts to information that is completely stored within a single token. Based on that information, working on these two rules together and getting them completed as a pair just seemed like the correct thing to do.

Design

As I mentioned in the last section, the design for these two rules was trivial compared to other rules. In the case of Rule Md039, all the text used for the link label is stored within the token’s text_from_blocks field. For Rule Md040, the text after the Fenced Code Block boundary characters is stored within the token’s extracted_text field. That made the design simple: look for the specified token and check the specified field to see if it matches the requirements for triggering the rule.

Implementation and Testing

Proper testing of every scenario was the hard part for Rule Md039. Because both Link elements and Image elements are impacted, along with the four types of notations allowed for each, I created a total of twenty scenario tests to cover each possibility. Compared to that, there was little effort required to create the four scenarios for Rule Md040.
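
For reference, the four link notations look something like this; these are my own illustrative samples, not the project’s test data, with an Image element simply adding a leading ! to each. The spaces just inside the brackets are what the rule triggers on:

[ a link ](/url)         inline
[ a link ][label]        full reference
[ a link ][]             collapsed reference
[ a link ]               shortcut reference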

Once those scenarios were completed, the rest of the implementation went off without any problems. For Rule Md039, once a Link element or an Image element is detected, a simple check is required to see if there are any spaces on either side of the link label. With that information stored in the token’s text_from_blocks field, the following code provides that check:

def next_token(self, context, token):
    if token.is_inline_link or token.is_inline_image:
        # If stripping whitespace changes the text, spaces surround the label.
        if token.text_from_blocks != token.text_from_blocks.strip():
            self.report_next_token_error(context, token)

Similarly for Rule Md040, given the start of a Fenced Code Block element, a simple check was added to determine whether the extracted_text field was empty after removing any whitespace:

def next_token(self, context, token):
    if token.is_fenced_code_block:
        # Nothing left after stripping means no language was specified.
        if not token.extracted_text.strip():
            self.report_next_token_error(context, token)

While I know I will not always get as lucky as with the design and implementation of these two rules, it was nice to know it could happen every so often.

Rule Md042 - No Empty Links

From the start, I knew that this rule was going to be another one handled with a simple implementation. In this case, I noticed a lot of similarities with Rule Md039. But instead of checking for spaces on either side of the link label, this rule checks for a link URI that has not been specified. Leaving the URI empty is a trick that authors often use to ensure that they fill in link URIs with the proper URLs before publishing a document. As I use this trick myself, I had a personal stake in making sure that this rule worked properly.
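
For example (my own samples), both of these links would trigger the rule, the first for an empty URI and the second for an empty URI fragment:

[a link]()
[another link](#)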

Implementation and Testing

As the design of this rule was so similar to the designs for both Rule Md039 and Rule Md040, I leveraged their design instead of creating a new one. For the testing of this rule, I was fortunate that it only applies to Inline Link elements and Inline Image elements.1 As such, I only needed to provide scenarios that deal with those two inline element types.

With the information that this rule only fires if there is an empty URI or an empty URI fragment reference, the code for this rule was quickly implemented:

def next_token(self, context, token):
    if token.is_inline_link or token.is_inline_image:
        stripped_link_uri = token.active_link_uri.strip()
        # Trigger on a missing URI or on the bare "#" placeholder fragment.
        if not stripped_link_uri or stripped_link_uri == "#":
            self.report_next_token_error(context, token)

Rule Md045 - Alternate Image Text

At this point, I was hoping that I would find more of these simple rules to implement, but also dreading them at the same time. For whatever reason, I felt that the more of these simple rules I found, the greater the chance that the remaining rules would be orders of magnitude more difficult. However irrational those feelings might seem, I worked through them as distractions and moved on.

But after looking at this rule, it was obvious that it fell into the same pattern as the last three rules, and I was grateful for the chance to keep things going forward.

Implementation and Testing

With three scenarios to cover Inline Image elements and four scenarios to cover the other three types of Image elements, the scenario tests were written, and I was ready to start implementation. As the Image element’s link label specifies the alternate image text stored in the img tag’s alt attribute, the implementation simply verifies that there is text in the text_from_blocks field, as follows:

def next_token(self, context, token):
    if token.is_inline_image:
        # No text in the link label means no alternate text for the image.
        if not token.text_from_blocks.strip():
            self.report_next_token_error(context, token)

As I looked ahead to the next rule to implement, I breathed a sigh of relief. It was not a simple one, but it was not an extremely difficult one either. Just a good solid rule to work on, nothing more.

Rule Md046 - Code Block Style

While this rule was not as simple as the last four, I quickly found out that it followed the pattern of the style rules that I had implemented before. Looking at what I did for Rule Md003 and Rule Md004, I did not believe I had to do a complete rewrite of a rule, just a massaging of the previous work from those old rules into a new rule. Not as easy as I had gotten used to, but not as difficult as some other rules either.

Design

In Rule Md003, there is a lot of extra code to determine the heading levels for both the Atx Heading element and the SetExt Heading element. Rule Md004’s implementation, by contrast, was a lot simpler. In that rule, most of the code is geared towards checking if the applied style is correct, with only a handful of lines used to look for Unordered List elements and to track the style at each nesting level of those elements. As such, it seemed prudent to base the design for Rule Md046 on Rule Md004, albeit with a couple of changes.

Those changes were simple and easy to implement. Instead of tracking Unordered List elements, Code Block elements were tracked. And instead of tracking the multiple levels required for the nesting of List elements, a single field containing a single style was sufficient for this rule.

Implementation and Testing

Going through the permutations in my head, there were only three scenario tests that I needed to write: one with two Fenced Code Block elements, one with two Indented Code Block elements, and a final test with one of each. Every other combination of Code Blocks that I came up with reduced to one of those three base scenarios. Wanting to be sure that I did not miss a combination, I worked through each of my scenarios again and arrived at the same result. Three scenarios it was.
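
As an illustration (my own sample, not one of the actual test documents), the mixed scenario looks something like this; with the default consistent style, the Fenced Code Block triggers the rule because an Indented Code Block established the style first:

this is a paragraph

    an indented code block

this is a paragraph

~~~text
a fenced code block
~~~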

As indicated in the design, I copied the source for Rule Md004 into rule_md046.py, with only a couple of changes being required. Instead of the five styles available for Rule Md004, I defined a new set of three styles to use:

    __consistent_style = "consistent"
    __fenced_style = "fenced"
    __indented_style = "indented"
    __valid_styles = [
        __consistent_style,
        __fenced_style,
        __indented_style,
    ]

Then, in the next_token function, instead of tracking Unordered List elements, I modified the code to track Code Block elements. With that done, I only needed to set the current_style variable to the current style, and the code carried over from Rule Md004 did the rest of the heavy lifting.

def next_token(self, context, token):
    if token.is_code_block:
        current_style = (
            RuleMd046.__fenced_style
            if token.is_fenced_code_block
            else RuleMd046.__indented_style
        )
        if not self.__actual_style_type:
            self.__actual_style_type = current_style
        if self.__actual_style_type != current_style:
            extra_data = (
                "Expected: " + str(self.__actual_style_type)
                + "; Actual: " + str(current_style)
            )
            self.report_next_token_error(
                context, token, extra_error_information=extra_data
            )

Rule Md048 - Code Fence Style

Having just completed the code for Rule Md046, I was fortunate to look ahead and find that Rule Md048 was almost identical in composition to Rule Md046. The only difference was that instead of verifying the style of Code Block element used, this rule verifies the style of the character used to define the Fenced Code Block element itself.

Implementation and Testing

I am sure that no reader will be surprised to find out that the implementation and testing of this rule were almost exact copies of the work done for Rule Md046. Besides the available style names changing, the only other code that changed was replacing this code:

    current_style = (
        RuleMd046.__fenced_style
        if token.is_fenced_code_block
        else RuleMd046.__indented_style
    )

with this code:

    current_style = (
        RuleMd048.__backtick_style
        if token.fence_character == "`"
        else RuleMd048.__tilde_style
    )

But having implemented a handful of these low-cost rules in a row, I was getting a bit restless. I did not have any issues in getting these rules completed, but I just felt like I needed a break to shake things up a bit. I did not realize that it would be a longer break than I intended.

Rule Md044 - Capitalization Of Proper Names

Having kept to my two efficiency rules for a while now, I wanted to give myself a chance to let loose and pick up a medium difficulty rule. If nothing else, every rule in the list needs to be implemented, so the work was not going to be wasted. With only a small deviation from my efficiency path, I thought that this rule would be a good one to shake things up a bit.

Honestly, I did not realize how much it would shake things up until it was all over, with a total of thirty-one tests required to validate the rule. I was in for quite the surprise.

Design

The thing that drew me to this rule from the beginning is that it felt like a “simple” search for proper name strings within another string. Or at least that is what I thought at the start. As I read through the rule a bit more, there was one caveat: a configuration value that allows or prevents this rule from looking inside Code Blocks.

That caveat was important. Without that configuration value, this rule was a simple string-in-string search. With that configuration value, the design would need to deal with individual types of elements that may contain the proper names that are being searched for. While I knew that list contained the Text element under certain conditions, I was not sure how many other elements would require similar treatment.

To combat that uncertainty, I decided to use an iterative design approach. I started the design process by narrowing my scope to the Text element scenario. For each additional element that needed the same approach, I planned to revisit the design to resolve any additional issues that crept up. It was not ideal, but unless I wanted to spend a lot of time designing everything up front, I knew it would work.

The basic design was simple: use a search-find-next loop on the lower-cased equivalent of the Text element’s text to find every potential candidate for examination. For each candidate, ensure that the candidate is isolated on both sides by whitespace before checking to see if the capitalization of the candidate matches the requirements for the specified proper name. Without that isolation check, a proper name of AD would trigger on the ad in the middle of readme.md: the re before it and the me.md after it would not stop the match.
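
To make that concrete, here is a small standalone sketch of the candidate check, using the alphanumeric-neighbour test that the implementation below settles on; the strings are my own examples:

text = "see the readme.md file"
name = "AD"

# A case-insensitive search finds the "ad" hiding inside "readme.md".
found_index = text.lower().find(name.lower())
after_index = found_index + len(name)

# The candidate only counts if no alphanumeric character touches either side.
is_isolated = (found_index == 0 or not text[found_index - 1].isalnum()) and (
    after_index >= len(text) or not text[after_index].isalnum()
)
print(found_index, is_isolated)  # 10 False -> no trigger inside "readme.md"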

Implementation and Testing

Working through all the input permutations that I needed to test against this rule, I ended up with thirty-one different scenario tests. While nine of them deal with Paragraph elements, the rest deal with each of the other elements that can conceivably contain text that needs to be scanned. Of those remaining tests, fifteen dealt with the various types of links and how they can be put together. It was quite the list of tests to complete.

Starting with the simple cases, the first iteration of the next_token function was very simple and just focused on Text elements:

def next_token(self, context, token):
    if self.__proper_name_list:
        if token.is_text:
            # Only scan text inside Code Blocks if configuration allows it.
            if not self.__is_in_code_block or self.__check_in_code_blocks:
                self.__search_for_matches(token.token_text, context, token)
        elif token.is_code_block:
            self.__is_in_code_block = True
        elif token.is_code_block_end:
            self.__is_in_code_block = False

From there, the applicable text was passed to the __search_for_matches function. That function performed the search-find-next loop through the text for each of the proper names:

def __search_for_matches(
    self,
    string_to_check,
    context,
    token,
):
    string_to_check = ParserHelper.remove_all_from_text(string_to_check)
    string_to_check_lower = string_to_check.lower()
    for next_name in self.__proper_name_list:
        next_name_lower = next_name.lower()
        search_start = 0
        found_index = string_to_check_lower.find(next_name_lower, search_start)
        while found_index != -1:
            self.__check_for_proper_match(
                string_to_check,
                found_index,
                next_name,
                context,
                token,
            )

            search_start = found_index + len(next_name)
            found_index = string_to_check_lower.find(next_name_lower, search_start)

For each candidate found, the __check_for_proper_match function was called to see if the candidate was properly isolated before checking the capitalization against the required capitalization:

def __check_for_proper_match(
    self,
    original_source,
    found_index,
    required_capitalization,
    context,
    token,
):
    original_found_text = original_source[
        found_index : found_index + len(required_capitalization)
    ]
    after_found_index = found_index + len(required_capitalization)

    is_character_before_match = False
    if found_index > 0:
        is_character_before_match = original_source[found_index - 1].isalnum()

    is_character_after_match = False
    if after_found_index < len(original_source):
        is_character_after_match = original_source[after_found_index].isalnum()

    if not is_character_after_match and not is_character_before_match:
        assert len(original_found_text) == len(required_capitalization)
        if original_found_text != required_capitalization:
            extra_data = (
                "Expected: " + required_capitalization
                + "; Actual: " + original_found_text
            )
            self.report_next_token_error(
                context, token,
                extra_error_information=extra_data,
            )

After checking the simple scenario tests involving Text elements and making sure they were all working, I knew it was time to move on to the other elements. But how hard were they going to be to implement? That I did not know.

Next Iteration: Other Elements

Starting with the Code Span element, I was quickly able to add the required code to search for any matches. The only issue was that the line/column number for any failures pointed to the start of the token, not to where the failure occurred. To adjust for those failures, I added the same_line_offset parameter to the __search_for_matches function. While I knew it would not handle any cases where the source data has newlines in it, it was a quick way to adjust the line/column number in the simple cases without newlines. For the Code Span element, I set this parameter to the length of the parts of the element that occur in the Markdown before the text:

    same_line_offset = len(token.extracted_start_backticks) + \
        len(token.leading_whitespace)
    self.__search_for_matches(
        token.span_text, context, token, same_line_offset)

Once the Code Span element was working, the other elements were somewhat easy to add. For the Link element, the text in the Link Label is already represented with a Text element, so I just had to worry about other text that was exposed. After checking out all four types of links, only the Inline link type has a component that needs to be checked, the active_link_title field.

That required a bit of work to set up, as there are plenty of scenarios where an Inline Link element has newlines in it. Taking a slightly different approach, I created the __adjust_for_newlines_and_search function to compute any offsets for newlines before performing the search. That function relies heavily on the __adjust_for_newlines function to accurately compute the proper offsets for the line/column number that indicates where the failure occurred. Having created those functions, I was able to quickly calculate the variables that represent the text occurring in the Link element before the Link Title field. Not without some testing errors to resolve, but the changes were quick to implement and test.
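
I will not reproduce the real helper here, but a minimal sketch of the newline adjustment, reconstructed from how the helper is called later in this article and assuming the negative-column-means-absolute convention covered in the Wrap Up section, might look like this:

def adjust_for_newlines(source_text, start_index, found_index):
    # If no newlines occur before the match, a relative column offset works.
    line_number_delta = source_text.count("\n", start_index, found_index)
    if not line_number_delta:
        return found_index - start_index, 0
    # Otherwise, measure the column from the last newline and negate it,
    # marking the value as an absolute column for the reporting code.
    last_newline_index = source_text.rindex("\n", start_index, found_index)
    return -(found_index - last_newline_index), line_number_delta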

Once the Inline Link element was up and working, the cases for both the Inline Image element and the Link Reference Definition element were quickly completed. In both cases, it was the same recipe as with Inline Link elements: perform the calculations of what the Markdown element looks like before the specified field and pass that in to the __adjust_for_newlines_and_search function.

Wrap Up

This rule was not difficult because of any of the individual tasks required to create it. It was difficult to implement because of its breadth and the adjustments required for the line/column number. Maybe it was because of the late hours that I used to work on this rule, but the line/column adjustments in the __search_for_matches function always seemed to have a glitch that I needed to work out.

It was only after I sat down with my trusty paper and pen, sketching out every scenario, that I was able to clearly see the complexities. Getting the proper line number was easy. But because the reporting code uses a column number of 3 to mean “add 3 to the token’s column” and -3 to mean “absolute column 3”, I had to do some coding gymnastics. I am not sure if I am going to try and clear that up in the future, but it is something to consider.
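
In code form, that column convention amounts to something like this; a sketch of my understanding of the reporting behaviour, not the actual reporting code:

def compute_reported_column(token_column_number, column_number_delta):
    # Positive deltas shift the reported column relative to the token;
    # negative deltas encode an absolute column position.
    if column_number_delta >= 0:
        return token_column_number + column_number_delta
    return -column_number_delta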

Rule Md034 - Bare URL Used

After the ease of many of the previous rules and the difficulty of the last rule, it was nice to get a rule that seemed like it had a medium difficulty. But honestly speaking, from the description that I read of this rule, I had more questions than answers.

There was one interesting question that I needed to answer, as the documentation added to Rule Md034 notes: what is a URL? Following the formal definition, URLs can be anything from https://google.com to #fragment and everything in between. It all depends on the given context as to which definition of a URL is most applicable. Even once that decision is made, trying to come up with anything resembling a complete algorithm for detecting all valid URLs can be troublesome at best. I needed to narrow down the URL context that I was looking for if I had any hope of creating a decent algorithm to find those URLs. Otherwise, I would need to deal with a nasty Regular Expression like this one. I wanted to avoid that at all costs.

I experimented with Visual Studio Code and the markdownlint plugin for about an hour, trying to get more information on what it considered valid URLs to be. My best guess is that the original rule is very tightly focused on two specific types of URLs: base URLs for HTTP and FTP. Every other type of URL that I tried to get the rule to recognize failed. However, almost every HTTP URL that I tried worked, even some of the wacky ones. Either way, I needed a good place to start from, and I felt that this information helped me find that starting point. So off I went into my design phase.

Design

Since I found the original rule too difficult to read clearly, I decided to approach the design for this rule from a more foundational viewpoint. I would take cues from the original rule, but I needed to be able to design a rule that had clear goals and triggering conditions that I could easily defend in the documentation.

Starting at the beginning, the first part of this design was easy: prevent this rule from firing within a Code Block element, an HTML Block element, or a Link element. Within those elements, it did not make sense to look for Bare URLs, as text within those types of elements intrinsically means something different from a normal section of text. It is those normal blocks of text that I needed the design to focus on.

Once I had a method to eliminate scanning those types of Text elements, I had to design a way to properly scan text for a series of characters that represented a URL. Having done the research noted in the previous section, I had an answer. I was not sure it was the right one, but I was confident it was a decent answer to start with. For this rule, URLs were only going to be HTTP and FTP URLs specifying a path to a resource. Basically, the rule recognizes what most people using a browser consider URLs, but only the ones starting with http:, https:, ftp:, or ftps:.

I felt that the best option to find these URLs was a simple search-find-next loop. While I would have liked to avoid having to repeat the loop for every valid base URL, I did not see any way to avoid it. Small optimizations like looking for http and then looking for either : or s: had their merits, but I did not believe they would increase the performance of the algorithm at all. So, if any one of those bases matched, I would pass on that information to another function that would further evaluate the URL. Primarily, it would look for the sequence // after the base URL prefix and proceed from there.

A Good Course Change

This is the point where the “course” of this rule changed the most from the original design. Instead of the previous rule’s complicated sets of boundary conditions, I decided to create a simpler set. The first condition is that, if there is a character before the base URL’s prefix, it must be a whitespace character. The second is that the base URL’s prefix must be followed by the sequence // and at least one non-whitespace character.

I did this for a few reasons, but the most basic one was simplicity. Explaining the above triggering conditions required just two sentences of documentation. By keeping the triggering conditions simple, I hopefully also keep the implementation that detects them simple. A win for the rule and a win for the user. I can live with that.

Implementation and Testing

Coming up with the scenarios required to test this rule was like peeling back the layers of an onion. I started with the scenario of a valid URL within a Paragraph’s Text element and worked outward from there. I just asked myself how a URL might escape proper detection and worked through each scenario in turn. When I exhausted those variations, I made sure that URLs would only be detected in normal Text elements by creating scenarios with a valid URL inside non-normal elements, such as Code Blocks. When I was done, I had fourteen scenarios ready to go.

Moving on to the implementation, it proceeded rather quickly. After eliminating any Text elements within Code Block elements, HTML Block elements, or Link elements, a simple search-find-next loop was added to look for multiple occurrences of a base URL prefix within the provided text. Once an occurrence was found, it was handed off to the yet-to-be-written __evaluate_possible_url function for evaluation.

def next_token(self, context, token):
    if (
        token.is_text
        and not self.__in_code_block
        and not self.__in_html_block
        and not self.__in_link
    ):
        # Search-find-next loop for every occurrence of each base URL prefix.
        for url_prefix in RuleMd034.__valid_uri_types:
            start_index = 0
            found_index = token.token_text.find(url_prefix, start_index)
            while found_index != -1:
                self.__evaluate_possible_url(
                    token.token_text, url_prefix, found_index, context, token
                )
                start_index = found_index + len(url_prefix)
                found_index = token.token_text.find(url_prefix, start_index)
    elif token.is_code_block:
        self.__in_code_block = True
    elif token.is_code_block_end:
        self.__in_code_block = False
...

Following the design, the __evaluate_possible_url function was easy to code. If there is no character preceding the location of the base URL prefix, or if that character is a whitespace character, then the algorithm continues. From there, it grabs the characters after the found base URL prefix, verifying that the first two are //. Given that verification, the only thing left was to make sure that the character after the // sequence is a non-whitespace character, which was added with ease.

def __evaluate_possible_url(
    self, source_text, url_prefix, found_index, context, token
):
    if found_index == 0 or source_text[found_index - 1] in (" ", "\n"):
        url_start_sequence = source_text[found_index + len(url_prefix) :]
        if (
            len(url_start_sequence) >= 3
            and url_start_sequence.startswith("//")
            and url_start_sequence[2] not in (" ", "\n")
        ):
            (
                column_number_delta, line_number_delta,
            ) = ParserHelper.adjust_for_newlines(source_text, 0, found_index)
            self.report_next_token_error(
                context,
                token,
                line_number_delta=line_number_delta,
                column_number_delta=column_number_delta,
            )

In verifying that all scenario tests were passing for this rule, I quickly noticed that while the rule was triggering properly, it was triggering with the wrong line/column pair. Having written the __adjust_for_newlines function as part of the work for Rule Md044, it seemed wasteful to have to write that function again. As such, I refactored it into the parser_helper.py module to make it accessible to both rules.

And as it was Sunday morning when I finished this rule, I hoped I had enough time to work on one more rule before starting to write this article. With that hope, I started on Rule Md028.

Rule Md028 - Blanks In Block Quotes

No matter how many times I am asked what the most important construct or algorithm I have learned is, the answer is always the same: the finite state machine. While there are a lot of second-place finishes for that title, my learning and experience with finite state machines have paid for themselves many times over. Even though the heart of a finite state machine is the simple concept of tracking transitions, it is a useful tool in my toolbox. And sometimes the most useful tools are the simple tools applied properly.

It was with a bit of a smile on my face that I looked at this rule and determined right from the start that it was going to need a finite state machine.

Design

Perhaps it is the many parsers and other systems that I have written over the years, but I instinctively knew that this problem would require a finite state machine. To properly detect this scenario, the rule needed to first look for an end Block Quote token, then one or more Blank Line tokens, and finally a start Block Quote token. If at any point it does not find the type of token it needs to move on, the rule needs to reset its state to look for the end Block Quote token again. To me, those all looked like simple state transitions that needed something to guide them. A finite state machine it was.

Implementation and Testing

Knowing that this rule would require a finite state machine made the task of creating the scenario tests easier. Like my approach with the last rule, I started with the simple case and worked outwards, following the transitions in the state machine. When I was done, I had eleven scenario tests, including a couple that covered nesting: Block Quotes within Block Quotes and Block Quotes within Lists.

As with all finite state machines, the implementation was all about transitions:

__look_for_end_of_block_quote = 0
__look_for_blank_lines = 1
__look_for_start_of_block_quote = 2

...

if self.__current_state == RuleMd028.__look_for_end_of_block_quote:
    if token.is_block_quote_end:
        self.__current_state = RuleMd028.__look_for_blank_lines
        self.__found_blank_lines = []
elif self.__current_state == RuleMd028.__look_for_blank_lines:
    if token.is_blank_line:
        self.__current_state = RuleMd028.__look_for_start_of_block_quote
        self.__found_blank_lines.append(token)
    elif not token.is_block_quote_end:
        self.__current_state = RuleMd028.__look_for_end_of_block_quote
else:
    assert self.__current_state == RuleMd028.__look_for_start_of_block_quote
    if token.is_block_quote_start:
        for next_blank_lines in self.__found_blank_lines:
            self.report_next_token_error(context, next_blank_lines)
        self.__current_state = RuleMd028.__look_for_end_of_block_quote
    elif token.is_blank_line:
        self.__found_blank_lines.append(token)
    else:
        self.__current_state = RuleMd028.__look_for_end_of_block_quote

This implementation followed my design to the letter, with some tweaks added in later to address nested Block Quotes: look for the end of a Block Quote, then one or more Blank Lines, and then the start of another Block Quote. If anything fails, reset to looking for the end of a Block Quote.

The tweak that I added to my design was to allow for multiple Blank Lines to be detected and stored in the __found_blank_lines list. Then, if a Block Quote start is found after those Blank Lines, the rule failure can be reported using the tokens for the Blank Lines instead of the start Block Quote token. Nothing too big, but a good tweak to ensure the reporting was clear about where the failure was.

Nice and simple. Did I mention I love finite state machines?

An Interesting Side Note

During the implementation process, an interesting thing happened: I found a parser bug with nesting Lists and Block Quotes.

- > This is one section of a block quote

  > This is the other section.

Thinking that the rule’s failure to fire was caused by something in the rule itself, I added debug information to figure things out. For whatever reason, when that Markdown document is parsed, it ends up creating an empty Block Quote element followed by a Paragraph element. I added that one to the Issues List and wrapped things up for the week.

What Was My Experience So Far?

Wow! This article went on quite a bit longer than I thought it would. Believe it or not, when I started writing this article, I was worried that I would not have enough content for a proper article.

But the reality is that in the last week, I was able to knock 9 rules off the To-Do list. That brings the totals from 10 rules completed to 19 rules completed and from 21 rules left to implement down to 10 rules left to implement. That honestly is a lot more rules than I expected out of this last week. It was a good surprise though, and it just feels good to be making more progress!

Nothing more than that this week… just trying to chew through the list of rules to implement as fast and efficiently as possible.

What is Next?

I must be honest, dear readers… the fact that I passed on Rule Md027 last week is starting to get to me. I think I will try and work on that this week. Stay tuned!


  1. For the other three types of links, a Link Reference Definition element must be used. As a Link Reference Definition element must contain a URI, and only an Inline Link element does not require one, only the Inline Link element was required for testing.


