Markdown Linter - Getting Back To New Rules

Summary¶

In my last article, I talked about continuing that work and dealing with the remaining nested block scenario tests. In this article, I talk about starting to tackle the long list of rules that are not yet implemented.

Introduction¶

It feels like forever since I have done any development work on the PyMarkdown project. And I must say, it really does feel good to be back. It is involving a lot of hard work and a lot of pushing through cobwebs, but it still feels good. It also helps that during that time, I was able to document what I was feeling and work some more through those issues. While it was not coding, it did help me work through things, and that was good.

Below, I start off the main part of the article talking about how I decided where to start working on the project. I will spoil the surprise and say that it is expanding the number of implemented rules. Going into more detail below, I quickly found it obvious that in a linter that is essentially a rule engine, I need to get more rules implemented to verify that everything is working properly. And with 31 rules slated to be implemented before the version 1.0 release, I had some work to do. And with me looking for something fresh to work on, it was the obvious choice!

What Is the Audience for This Article?¶

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 17 Jul 2021 and 25 Jul 2021.

Where To Begin?¶

As I mentioned in my articles Autism, Stress, and Anxiety and Developers, Give Yourself Permission, I have been out of it for a couple of weeks, recovering from something I caught while doing some normal work around my house. It really knocked me down for a while, and I am still partially recovering to this day. Mix into that the usual stuff I deal with at work and around my house, and I just have not had a lot of time to focus on the project lately. It took me a bit of time to build up momentum, but I am now at the point where I feel that I can devote some time to the project while in the right mindset.

But where to begin? Looking at the various items in the Issues List, I am glad that I have kept them together in a single list. Why? Because of the way I analyze things, I find that it gives me a more complete picture when I can see everything in a simple overview mode. And it is there that I started looking.

Looking at that list, I quickly noticed that there were three main categories of issues that I need to eventually tackle. The first section of the list contains features that need to be added to the project. Without these features in the project, I do not feel that the project should ship a full 1.0 release, so they are very important. The second section contains issues that need to be examined and dealt with to make sure that things are working properly. If I had to sum up this section, these were typically added from tests that were mostly passing, with me punting the extra work until a later time. Finally, the remaining items are “nice to haves”, issues that I would like to see completed and dealt with but have a lower chance of being hit by users. I hope.

When I was taking a high-level look at the issues list like that, the big thing that hit me was that a lot of missing features have to deal with rules. Basically, I see developer documentation, one item for pragmas, one item for front matter as YAML, then the rest are about rules. That made the decision obvious to me. I needed to make some good headway into implementing those missing rules. And what better place to start than at the start of the list. Away I went.

Unordered List Elements And Styles¶

As soon as I started to look at this rule, I knew I was rusty. It probably did not help that I was still recovering, or at least that is what I told myself. Whether that was true or not, I cannot tell at this point. I can only say that it is what I felt at the time that I started working on this rule.

Design¶

Due to me taking some time off, coming up with the design for this rule took me a fair bit longer than I had hoped. I tried a couple of times to do something very clever, but those iterations of the design quickly found their way to my trashcan. In the end, the easy solution won out. I guess with some patterns like that, I have a bit more progress to go on keeping things simple from the start.

The easy part of the design was that one part of the configuration enabled a static mode, and the other part enabled a dynamic mode. For the static mode, the Unordered List element start character was set for the entire document, no exceptions. The two dynamic modes allowed for a different starting character to be specified. For the default mode, consistent, once any Unordered List element is started, that starting character must be used for all Unordered List elements in the document. A slight variation, sublist, differs in behavior in that once any Unordered List element is started at that sublist level, that starting character must be used for all Unordered List elements at that sublist level.

To that extent, the design ended up being very simple. I first needed an array to capture the starting character to use for each level, and an index to specify what level of sublists the rule was currently at. If the entry for a level does not exist, it needs to be initialized with either the current starting character if in dynamic mode or the configured character if in static mode. Once that was all set up, the rest of the design is simply to compare the start character of the current List element against that array and, if they do not match, report an error.

Coding¶

I knew I was still trying to build momentum, so I was probably more cautious than I needed to be, checking everything three or four times. In the end, I implemented the rule almost exactly as I had designed it:

def starting_new_file(self):
    self.__actual_style_type = {}
    self.__current_list_level = 0
    if self.__style_type not in (
        RuleMd004.__consistent_style,
        RuleMd004.__sublist_style,
    ):
        self.__actual_style_type[0] = self.__style_type

@classmethod
def __get_sequence_type(cls, token):
    if token.list_start_sequence == "*":
        return RuleMd004.__asterisk_style
    if token.list_start_sequence == "+":
        return RuleMd004.__plus_style
    assert token.list_start_sequence == "-"
    return RuleMd004.__dash_style

def next_token(self, context, token):
    if token.is_unordered_list_start:
        if self.__current_list_level not in self.__actual_style_type:
            self.__actual_style_type[
                self.__current_list_level
            ] = self.__get_sequence_type(token)

        this_start_style = self.__get_sequence_type(token)
        if self.__actual_style_type[self.__current_list_level] != this_start_style:
            self.report_next_token_error(context, token)
        self.__current_list_level += 1
    elif token.is_unordered_list_end:
        self.__current_list_level -= 1

Instead of using an array to hold the start characters, I decided to use a dictionary to see if it made the implementation any more readable.¹ In the starting_new_file function, I made sure that the __actual_style_type and __current_list_level member variables were properly set to ensure that the static cases were taken care of without any extra code.

Leaving the heavy lifting to the next_token function, the biggest part of its algorithm is to ensure that the correct list level is tracked in the __current_list_level member variable. If the __actual_style_type dictionary does not contain an entry for the current list level, one is created. Then the current start character for the current List Item element is compared against the entry for the current list level. If those two objects do not match, a rule error is reported.

The hard part here was not getting ahead of myself with test scenarios and rule implementation. I started with the simple static scenarios and the consistent setting before moving on to the sublist setting. Even so, instead of taking the time to do things in the proper order, I kept wanting to jump ahead. For me, that was not a good idea. I had to take the time to relearn the patience that I have with my development process, making sure I practiced using the process until I was more comfortable with it.

Testing¶

As I have mentioned many times in my articles, I have often found that one of the most powerful tools in my testing arsenal is documentation. This was no exception.

As I was documenting this rule, I walked through the different scenarios in my head, and I discovered that I had missed something during my design phase. Basically, the first part of the next_token function is this:

    if self.__current_list_level not in self.__actual_style_type:
        self.__actual_style_type[
            self.__current_list_level
        ] = self.__get_sequence_type(token)

This code is meant to initialize the current list level’s type if it was not already set. The problem was it was not doing that.

Working through my design, I found that I had missed a set of scenarios that only came into play without sublists. In other words, I was so focused on getting sublists correct, I did not pay enough attention to getting the other scenarios worked out properly. It was time to fix that!

Reworking those scenarios, I started to see what the problem was. In the above code snippet, if the style was not set, the function sets it to the sequence type of the current token. For the sublist setting this worked fine and for the very first invocation with the consistent setting this worked fine, but only in those scenarios. In every other case, the self.__actual_style_type[0] variable held the value to compare to.

With that knowledge, I adjusted my design, followed through with the scenarios, and came up with the following code change:

    if self.__style_type == RuleMd004.__sublist_style or (
        self.__style_type == RuleMd004.__consistent_style
        and not self.__actual_style_type
    ):
        self.__actual_style_type[
            self.__current_list_level
        ] = self.__get_sequence_type(token)
    else:
        self.__actual_style_type[
            self.__current_list_level
        ] = self.__actual_style_type[0]

Paying extra attention to scenarios, I walked through them all individually, taking my time to make sure I did not miss another scenario. As far as I can tell, this time I did not. But I know that I am going to have that feeling that I missed something else go through my mind for a while.
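
As a minimal, self-contained sketch of that corrected logic, the function below replays the same decisions over a simplified event stream instead of real parser tokens. The event tuples, the function name, and the use of the marker character itself as the static style value are all assumptions made for illustration; they are not part of the PyMarkdown code base.

```python
from typing import Dict, List, Tuple

# Hypothetical event stream: ("start", marker) opens an unordered list and
# ("end", "") closes the most recently opened one.
Event = Tuple[str, str]


def find_style_violations(events: List[Event], style: str = "consistent") -> List[int]:
    """Return the indexes of the "start" events whose marker breaks the style."""
    expected: Dict[int, str] = {}
    if style not in ("consistent", "sublist"):
        # Static mode: in this sketch, `style` is itself the required marker.
        expected[0] = style
    level = 0
    violations: List[int] = []
    for index, (kind, marker) in enumerate(events):
        if kind == "start":
            if level not in expected:
                if style == "sublist" or (style == "consistent" and not expected):
                    # First list seen at this level (sublist) or in the
                    # whole document (consistent): it sets the marker.
                    expected[level] = marker
                else:
                    # Every other case inherits the document-wide marker.
                    expected[level] = expected[0]
            if expected[level] != marker:
                violations.append(index)
            level += 1
        elif kind == "end":
            level -= 1
    return violations


nested = [("start", "*"), ("start", "-"), ("end", ""), ("end", "")]
print(find_style_violations(nested, "consistent"))
print(find_style_violations(nested, "sublist"))
print(find_style_violations(nested, "-"))
```

With the nested example, the consistent setting flags the inner list, the sublist setting flags nothing, and a static setting of `-` flags only the outer list.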

Consistent List Element Indentation¶

The last rule took a lot longer to design and get coded than I had hoped it would. So my big goal for this rule was to get back into a better cadence of designing the rule and implementing it. And as far as I know, the only way to do that is to start designing, start testing, and start coding.

Design¶

The design for this rule rapidly emerged in my mind without much effort. The fundamental principle for this rule is that the indentation required for each level of a list, ordered or unordered, must be the same. At that point, I knew I needed to maintain a stack of the current indentation for that level. Reading a bit more and experimenting with the original rule, it became obvious that each top-level list resets the information for itself and any contained list. That meant that a new top-level list should clear the stack of any indentation measurements from a previous list.

From there, the design got a bit tricky. If the list is an Unordered List element, then the indentation must be maintained for this rule not to fire. That was the easy one. For an Ordered List element, there are two options: align to the left or align to the right. The difference is that if you align the Ordered Lists to the left, the list looks like:

...
8. Item
9. Item
10. Item
...

But, if you align that same list to the right, the list looks like:

...
 8. Item
 9. Item
10. Item
...

That was a bit tricky to design around, but after some debugging, I had it ironed out. The left aligned list is easy, with the column_number member of the token matching up if the list is left aligned. If the list is right aligned, then any extracted whitespace plus the list item content should be the same size. It took a bit to verify that, but everything seemed good, so I started working on coding the rule.
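
To make the two alignment checks concrete, here is a small sketch that works on raw Markdown lines rather than on parser tokens. The regular expression, the helper names, and the choice to count the delimiter as part of the marker are assumptions for illustration only; the rule itself reads this information from the token members described above.

```python
import re
from typing import Tuple

# Leading spaces, then an ordered list marker such as "8." or "10)".
_ORDERED_ITEM = re.compile(r"^( *)(\d+[.)])")


def prefix_parts(line: str) -> Tuple[str, str]:
    """Split a line into its leading whitespace and its ordered list marker."""
    match = _ORDERED_ITEM.match(line)
    if match is None:
        raise ValueError(f"not an ordered list item: {line!r}")
    return match.group(1), match.group(2)


def ordered_items_align(first_line: str, line: str) -> bool:
    """Check one item against the first item of the same list."""
    first_ws, first_marker = prefix_parts(first_line)
    ws, marker = prefix_parts(line)
    if len(first_ws) == len(ws):
        # Left aligned: both markers start in the same column.
        return True
    # Right aligned: whitespace plus marker has the same total width.
    return len(first_ws) + len(first_marker) == len(ws) + len(marker)


print(ordered_items_align("8. Item", "10. Item"))
print(ordered_items_align(" 8. Item", "10. Item"))
print(ordered_items_align("8. Item", "  9. Item"))
```

The first two calls correspond to the left aligned and right aligned lists shown earlier and both pass, while the third call is neither and fails.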

Coding¶

Using Test Driven Development, I started at the beginning with tests such as test_md005_good_unordered_list_single_level and test_md005_bad_unordered_list_single_level. From there I was able to get a simple implementation of the design in place which satisfied those two tests. Once those were passing, I added tests that contained two levels of lists, once again providing both good and bad examples of each. Taking the same approach, I implemented the design into code and worked through some issues.

The rest of the implementation followed the same pattern without any issues. While the implementation was a bit more verbose than the design itself, I was able to iterate and get it to work cleanly in short order:

def next_token(self, context, token):
    if token.is_unordered_list_start or token.is_ordered_list_start:
        self.__list_stack.append(token)
    elif token.is_unordered_list_end or token.is_ordered_list_end:
        del self.__list_stack[-1]
    elif token.is_new_list_item:
        if self.__list_stack[-1].is_unordered_list_start:
            if self.__list_stack[-1].indent_level != token.indent_level:
                self.report_next_token_error(context, token)
        elif self.__list_stack[-1].column_number != token.column_number:
            original_text = self.__list_stack[-1].list_start_content
            if self.__list_stack[-1].extracted_whitespace:
                original_text += self.__list_stack[-1].extracted_whitespace
            original_text_length = len(original_text)
            current_prefix_length = len(
                token.list_start_content + token.extracted_whitespace
            )
            if original_text_length == current_prefix_length:
                assert token.indent_level == self.__list_stack[-1].indent_level
            else:
                self.report_next_token_error(context, token)

Because this rule handles both Ordered and Unordered List elements, the first two parts of the if statement make sure to manage the stack properly. As the first List Item for each list sets the indentation level, there is no need to do any processing with those start List elements, just recording them.

From there, the easiest thing to do was to get the easy case, Unordered List elements, out of the way, as it was a trivial check. Similarly, the left aligned case of Ordered List elements is just as trivial, so I used the inverted condition (the column numbers are NOT equal) to check for a right alignment issue. Following the debugging and the design, the original_text_length is computed from the start List element and the current_prefix_length is computed from the current List Item element.

Testing¶

While there were a couple of typing errors on my part that caused errors, the design for this rule was correct from the start, with no changes required. As such, the big part for this rule was coming up with different data combinations and comparing their results against the original rule.

And it was good that I took that step to try and find more scenarios for this, as I found an issue with the parser. Specifically, while this Markdown is a valid list containing a valid sublist with two items, the parser does not believe so:

1. Item 1
    1. Item 1a
   100. Item 1b

Babelmark2 confirmed my analysis that the sublist should contain two items. But when I looked at the parser output, it was wrong. What happened? In this case, while the parser is flexible about a later list item starting further to the right than the first one, it is not as flexible about one starting further to the left. As a result, starting the second list item with more indentation than the first item works:

1. Item 1
   1. Item 1a
    10. Item 1b

but starting it with less indentation does not work. I needed to figure out what the issue was.

But after taking a good hour looking at the debug output, I was not able to see what the issue was. At that point, I did not want to stop my momentum, so I added a new item to the Issues List, started to keep notes on other test scenarios to try, and moved on.

Starting Top-Level Unordered Lists At The Start Of The Line¶

Knowing that I was starting to get back “in the groove”, I decided to tackle another rule and get it completed before writing this week’s article. Having completed the rule for the previous section, I was pretty sure that a similar approach would be useful in designing a solution.

This rule is disabled by default in MarkdownLint, the description of which I am using as the inspiration for the rule, and at first I was not sure why. Taking a solid look at both rules MD006 and MD007, I found that there was overlap, but not a duplication of function. Taking a closer look at the description, I then noticed that there were two parameters for this rule: indent and start_indented. With the default values, start_indented would be assigned a value of False, requiring that the top-level list did not start with any indentation.

With that taken care of, everything was good in the world, and I proceeded to plan out a design for this rule.

Design¶

Having just completed the design, coding, and testing for MD005, the design for this rule was easy. I needed to keep a stack for most of the same reasons as with the previous rule. The only difference was that I only needed to be concerned with Unordered List elements that were at the top-level. From a design point of view, I still needed to know what kind of list the List Item element was in, so that part of the design remained the same.

The part that really changed is the trigger condition. Instead of a complicated calculation, this calculation was simple. When processing either a start element or a List Item element, if it was the first element on the stack and it was a part of an Unordered List element, further checking was required. That further checking was also simple: did the token for the element start in the first column? If not, the rule was violated.

I worked this out on paper a couple of times, just to make sure, but it was sound. It seemed too simple, so I just double checked that I did not take any shortcuts that would hurt the algorithm in the end. After that extra checking, I proceeded on to the next phase.

Coding¶

Sometimes in Test Driven Development, I find myself iterating between generating new tests, coding to meet those new tests, and then generating more tests and starting over again. In this case the rule was simple enough that I was able to derive five different tests before I started coding. Between that and my design phase, the coding once again went off without any major issues.

def next_token(self, context, token):
    if token.is_unordered_list_start or token.is_ordered_list_start:
        self.__list_stack.append(token)
        if (
            len(self.__list_stack) == 1
            and self.__list_stack[-1].is_unordered_list_start
            and self.__list_stack[-1].column_number != 1
        ):
            self.report_next_token_error(context, token)
    elif token.is_unordered_list_end or token.is_ordered_list_end:
        del self.__list_stack[-1]
    elif token.is_new_list_item:
        if (
            len(self.__list_stack) == 1
            and self.__list_stack[-1].is_unordered_list_start
            and token.column_number != 1
        ):
            self.report_next_token_error(context, token)
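
To try that logic outside of the project, the sketch below drives the same checks with a stand-in token type. The FakeToken class, its kind values, and the function name are hypothetical and exist only for this illustration; they are not the PyMarkdown token classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeToken:
    """Hypothetical stand-in for a parser token."""

    kind: str  # "ul-start", "ol-start", "list-end", or "item"
    column_number: int = 1


def find_indented_top_level_items(tokens: List[FakeToken]) -> List[int]:
    """Return indexes of top-level unordered list tokens not in column 1."""
    stack: List[FakeToken] = []
    failures: List[int] = []
    for index, token in enumerate(tokens):
        if token.kind in ("ul-start", "ol-start"):
            stack.append(token)
            if (
                len(stack) == 1
                and token.kind == "ul-start"
                and token.column_number != 1
            ):
                failures.append(index)
        elif token.kind == "list-end":
            stack.pop()
        elif token.kind == "item":
            if (
                len(stack) == 1
                and stack[-1].kind == "ul-start"
                and token.column_number != 1
            ):
                failures.append(index)
    return failures


indented = [
    FakeToken("ul-start", 2),  # top-level list indented by one space
    FakeToken("item", 2),      # second item of that same list
    FakeToken("ul-start", 4),  # nested list: ignored by this rule
    FakeToken("list-end"),
    FakeToken("list-end"),
]
print(find_indented_top_level_items(indented))
```

Only the two top-level tokens are reported. The nested list is skipped because the stack depth is two when it is seen, and a top-level Ordered List element is never reported.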

Why Am I Stressing Designing, Testing, and Coding Together?¶

To be honest, part of my stressing of that process is for myself, and part of it is for the readers out there. Having benched myself while I got better, I felt a bit of an urge to jump into the coding process without doing any design. Think about it from my point of view. I have been working on this project for over a year, I was at a standstill for a week, and I want to make progress. Who would not want to go fast?

But even so, I talked myself out of it and wanted to be very pedantic about following these steps, even with simple rules like these three rules. And to be honest, in these cases, I could have probably skipped the design step and winged it, as I would consider these three rules to be either medium-level or low-level difficulty. From my experience, it is exactly in those scenarios that you want to keep to the process. You want to get that muscle memory for the process set, so that when you get to the harder problems, it is just second nature.

Now, Test Driven Development may not be everybody’s best way to develop, but it is for me. I strongly urge any readers out there to figure out a small number of development strategies, one if possible, and follow that strategy no matter what. For me, I have seen that it just helps me know what is coming next, and my mind has that muscle memory in place, ready to go when I need it.

Basically, to each reader, find some process that works, and follow that every single time if possible. If your results are anything like mine, you will see improvements.

What Was My Experience So Far?¶

Out of 31 rules, I got 3 tested and implemented. Seeing as I just came back from dealing with being sick, that is not too bad. I was hoping to get four or five rules designed, implemented, and tested, but three is a good start.

And yes, I had to deal with a bit of… well… is it ego or pride or work ethic? I wanted to do more work on the project, but my body did not have enough drive or momentum to get me there. So, I do not think it is ego. And I am not feeling hurt that someone told on me that I did not do enough work. Therefore, I am guessing that it is not pride. So hopefully it is my work ethic. Who really knows sometimes?

But I know that I need to gain more momentum. Unless I want to take nine weeks to finish all the rules², I need to get through the different phases more efficiently and get more rules done. To do this, hopefully I will notice different patterns that I can reuse and cut that time down. With that and a solid work ethic, I am hoping to cut down an estimated nine-week time frame to complete all the rules into a block of about five weeks. To be honest, five weeks would be nice, anything under seven weeks and I will be happy. I really want to make this project shine and get it out there!

But three rules done on my week back? That is not too bad… for now.

What is Next?¶

What else? More rules. Though hopefully, after getting some momentum going, I can get more than three done in the next week. Stay tuned!

¹ As of the writing of this article, I am still undecided. At the very least, it is as readable.
² 31 rules at 3 rules per week is 10.333 weeks, with one week already completed.
