In the previous articles in this series, I discussed the requirements for the Markdown linter that I am writing. From a development point of view, the main requirement is the need for an accurate stream of tokens emitted by the parser. Due to the absence of any Markdown-to-token parsers out there, I need to write a new parser that outputs an accurate stream of tokens instead of a stream of HTML text. With the last article showing the patterns I am using to test the parser, it is now time to figure out a set of good strategies for the project, to ensure I can complete it without losing my confidence (and sanity).
Why Is Strategy Important When Testing?¶
When my son was younger, like most boys in his age group, he loved playing with LEGO and he loved the idea of robots. I mean, come on! I am a lot older than him and I still like LEGO and the idea of robots! Anyhow, his school advertised for 5th grade students who were interested in participating in a local FIRST Lego League robotics team. From the first mention of it, he was hooked. As they needed some parents to help, I participated with him as a coach. That position was a very rewarding, very humbling, and very frustrating experience. Rewarding because I got to help 5th graders get a little taste of what I did every day at work. Humbling because the look in the kids' eyes when they really understood something reminded me of the benefits of being a coach. Frustrating because of almost all the rest of the time between those two types of moments.
I am not sure which parent, coach, or teacher helped me with a little gem of wisdom, but I remember it as clear as day: people have problems moving boulders, people have success moving pebbles. The idea behind that phrase is that when a team is confronted with a problem, it is like encountering a boulder that needs to be moved out of the way. Upon seeing a big boulder, many people look at it and say something like “Wow! That is too big to move!” But if you break that boulder down into smaller rocks, such as pebbles, most people will happily move those rocks, even if they must do it one at a time. In the same fashion, breaking a big problem down into smaller problems is a necessary part of solving it. The boulders-to-pebbles phrase is one I still use to this day when coaching people in both my professional and personal lives.
Writing a parser that handles anything more significant than a single line of text is definitely “a boulder”. I have been writing parsers for the better part of 25 years, and those parsers are still boulders to me. However, I know from experience that breaking down that “boulder-sized” task into more “pebble-sized” tasks is something that works and works well. So here are the various items of my strategy for this project.
Strategy 0: Define and Execute Testing, Linting, and Formatting¶
For me this is a strategy that I bring to almost every project, with very few exceptions. I always start with a workflow template that I apply to the project that performs formatting of the source code, linting of the source code, and executes the testing framework. Since I am a stickler for this approach, the setup for this workflow usually takes 5 minutes or less, as I usually have at least one example project lying around. By consistently executing this workflow before committing any changes, I keep the quality reasonably high as I go.
Knowing that I had this framework in place for the Markdown parser was a godsend. My preference is to find frequent small break points during the implementation of a feature, and to use those points to run the workflow. For me, it increases my confidence that I am either establishing a new “last known good point” or that I need to retrace my steps to the last known good point to address an issue. That confidence helps me go forward with a positive attitude.
Strategy 0A: Suppress Major Issues Until Later¶
This may seem like somewhat of a counter to Strategy 0, but I see it more as allowing the project to grow while being reminded that there is work to do. Minor issues, such as stylistic and documentation issues, are handled right away, as they have a direct impact on the maintainability of the code as it moves forward. Major issues usually involve a larger amount of code, and changing that much code usually has a fair number of side effects unless you work to prevent them.
Major issues are usually of the “too many/much” type, such as “too much complexity”, “too many statements”, or “too many boolean expressions”. When I get to a good and stable point in the project, I know I will deal with these. If I deal with them before I get to such a point, I am taking a chance that I won't have the stability to make the change while limiting any potential side effects in a clean and efficient manner.
What is a good and stable point? For me, such a point must have two dominant characteristics. The first is that I need to have a solid collection of tests in place that I can execute. These tests make sure that any refactoring does not negatively affect the quality of the code. The second characteristic is that the source code for the project is at a point where there is a large degree of confidence that the code in the section that I want to refactor is very solid and very well defined. This ensures that I can start looking for commonalities and efficiencies for refactoring that will enhance the source code, but not prematurely.
Strategy 1: Break Tests and Development into Task Groups¶
Following the principle of keeping things at a good size, don't plan the entire project out ahead of time, but make sure to break things down into the groups of tasks that are needed, as you need them. Following an agile approach, make sure you have a good idea of what needs to be done for a given task group, and do not worry about any more detail until you need to. When you reach that point, reverify the tasks before going forward and fleshing out the details.
For this parser, the GitHub Flavored Markdown specification delineates its groups by the Markdown features being implemented. Aligning the groups specified in that document with the groups for tests and development was a solid choice from a tracking point of view. One of the reasons I feel this worked well is that these feature groups have anywhere between 1 and 50 examples each. While some of the larger ones were a tiny bit too big, for the most part each group contained a manageable number of scenarios.
Strategy 2: Organize Those Task Groups Themselves¶
Once the task groups have been identified, take a step back and organize the task groups themselves. There are almost always task groups with a natural affinity for each other, so group them together. Working on similar task groups in sequence helps identify refactorings that can be accomplished later, and brings the efficiency benefits of repeating similar processes. Especially with a larger project, those little efficiency benefits can add up quickly.
As with the previous strategy, the GitHub Flavored Markdown specification comes to the rescue again. There are some implementation notes near the end of the specification that provide guidance on grouping. The groups that I recognized were container blocks, normal blocks, and inline parsing. Normal blocks are the foundation of the parsing, so it made sense to schedule those first. Container blocks (lists and block quotes) add nesting requirements, so I scheduled those second. Finally, once all the block-level tasks are done, inline parsing (such as for emphasis) can be performed on the text blocks that remain after the processing of the normal and container blocks. After re-reading the end of the specification, the example given there seemed to indicate that order as well, so I was probably on a decent path.
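A minimal sketch of that ordering, with a hypothetical `Token` class and deliberately simplified rules (a real parser handles far more cases than these two):

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    """Hypothetical token: a kind, optional raw text, and child tokens."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)

def parse_blocks(lines):
    # Block phases: normal and container blocks build the document
    # structure, keeping any text raw for the later inline phase.
    tokens = []
    for line in lines:
        if line.startswith("> "):
            tokens.append(Token("block-quote", children=parse_blocks([line[2:]])))
        elif line.strip():
            tokens.append(Token("paragraph", text=line.strip()))
    return tokens

def parse_inlines(tokens):
    # Inline phase: runs only on text gathered by the block phases,
    # never on the raw document lines themselves.
    for token in tokens:
        if token.text.startswith("*") and token.text.endswith("*") and len(token.text) > 2:
            token.children.append(Token("emphasis", text=token.text[1:-1]))
        parse_inlines(token.children)
    return tokens
```

For example, `parse_inlines(parse_blocks(["> quoted", "*emphasis*"]))` yields a block quote containing a paragraph, followed by a paragraph with an emphasis child, which mirrors the blocks-then-inlines scheduling above.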
Strategy 3: K.I.S.S.¶
As I mentioned in the last article, I am a big proponent of the K.I.S.S. principle. While I usually arrive at a finished project that has lots of nice classes and functions, worrying about that at an early stage can often be counter-productive. Even if it means doing ugly string manipulations with variable names that you know you will change, that approach can often lead to cleaner code faster. Worry about getting the logic and the algorithms right first, and then worry about making it “look pretty”.
A good example of this is my traditional development practice of giving variables and functions “garbage names” until I am finished with a set of functions. Yes, that means during development I have variable names like “foobar”, “abc”, “sdf”, and “ghi”, just to name a few. While I am creating a function, I maintain a good understanding of what the variables are doing, and I want to concentrate on the logic. Once the logic is solid, I can then rename each variable to a descriptive name that accurately reflects its purpose and use.
I am not sure if this process works for everyone, but for me, not focusing on the names helps me focus on the logic itself. I also find that making a “naming pass” over each function when I am done with the work helps me give each variable a more meaningful name before I commit the changes. Once again, this is one of my development practices that helps boost my productivity, and I acknowledge it might not work for everyone.
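As an illustration of this practice (the function itself is hypothetical), the first pass might look like the top function, with the naming pass producing the bottom one:

```python
# First pass: concentrate on the logic, using throwaway names.
def abc(sdf, ghi):
    foobar = []
    for x in sdf:
        if x.startswith(ghi):
            foobar.append(x[len(ghi):])
    return foobar

# Naming pass: identical logic, with names reflecting purpose and use.
def strip_matching_prefixes(lines, prefix):
    stripped_lines = []
    for line in lines:
        if line.startswith(prefix):
            stripped_lines.append(line[len(prefix):])
    return stripped_lines
```

Both functions behave identically; only the second one tells a reader what it is for.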
For the parser, I employed this strategy whole-heartedly. The first couple of groups of work on the parser were performed by dealing with strings, with the parser's only class being the one containing the parsing logic. Once I got to a good point (see above), I moved a few of the parsing functions and HTML functions into their own static helper modules. Up until that point, it was just simpler to be creative with the logic in a raw form. After that point, it made more sense to identify and solidify the logic that encapsulated some obvious patterns, moving those algorithms into their own classes for easy identification.
As with many things, finding the right points to perform changes like this is difficult to describe. I can only say that “it felt like the right time for that change”. And as I stage and commit code frequently, if I made a mistake, I could easily rewind and either retry the change or abandon it altogether.
Strategy 4: Use Lots of Debug Output¶
There is a phrase that we use at work: “TTR”, or Time-To-Resolution. This is usually measured as the time from knowing that you have a problem until the problem is resolved and its solution is published. Liberal debug output added during development and debugging can provide a journal or log of what is happening in the project, allowing a comprehensive, side-by-side comparison of the output of a passing test with the output of a failing test, which keeps that TTR low.
To be clear, using a debugger to load the code and step through it as it executes is another way to debug the code. In fact, in a decent number of situations I recommend that. However, I find that the downside is that I do not get to see the flow through the code in the same way as with lots of debug statements. As with a lot of things, determining the balance between debug output and using a debugger will differ for individual developers and for individual projects.
Another benefit of the debug output approach is the transition from debug output to logging. Once the project has been sufficiently stabilized and completed, one of the tasks that arises is usually to output useful log messages at various points throughout the code. I personally find that a certain percentage of the debug output that was good enough to emit during development can become quality log messages with only small changes.
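As a sketch of that transition (the function and messages here are illustrative, not taken from the parser), a bare `print()` used during development can graduate into a logging call with only small changes:

```python
import logging

LOGGER = logging.getLogger(__name__)

def determine_block_kind(line, line_number):
    # During development this started as: print(f"line {line_number}: ...")
    # Once stable, the same message became a proper debug log statement.
    LOGGER.debug("line %d: checking for block starts in %r", line_number, line)
    if line.startswith("#"):
        LOGGER.debug("line %d: atx heading detected", line_number)
        return "atx-heading"
    return "paragraph"
```

The message text that was useful while debugging is usually exactly the message text that is useful in a production log.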
The parser development benefitted from this strategy. Within a given task group, there were often two Markdown patterns that were almost the same. Sometimes it looked like they should be parsed differently, and sometimes I couldn't figure out why they weren't parsed differently. By examining the debug output for both cases, I was able to verify whether the correct paths were followed and, if not, where the divergences occurred. Sure, the debug output was cryptic, and most of it never made it into the final version of the parser. But when I needed to debug or verify during development, it was invaluable.
Strategy 5: Run Tests Frequently¶
Don’t only run tests when a set of changes is ready to commit, run those tests frequently during the development of each task. If done properly, most tests are there to verify things are as they should be, and to warn of changes or situations that fall outside of the requirements. If something is wrong, it is better to look through the last feature added to determine what the problem is, rather than trying to determine which of the last 5 features introduced that bad behavior. Therefore, by executing the tests frequently, either the confidence that the project is working properly increases or there are early and frequent indications that something is wrong.
During the development of the parser, the tests were instrumental in making sure that I knew what features were “locked down” and which features needed work. By keeping track of that when adding a new feature, I could easily see when work on a new feature caused a previously completed feature to fail its tests. At that point, I knew I did not have the right solution, but I also had confidence that the changes were small enough to handle.
Also, as the specification is large, there were often cases that were present but not spelled out in the documentation as well as they could have been. However, time and time again, the saving grace of the specification was its examples, now scenarios and scenario tests in my project: sterling examples of what to expect. And as I took care to make sure they ran quickly, I was able to run all of the scenario tests in less than 10 seconds. For me, taking 10 seconds to ensure things were not broken was well worth the cost.
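Here is a hypothetical sketch of what such a scenario harness looks like. The token text and the stand-in tokenizer are invented for illustration, but the shape, a Markdown input paired with its expected token stream, matches the approach:

```python
# Each scenario pairs a Markdown input with its expected token stream.
SCENARIOS = [
    ("# hello", ["[atx:1]", "[text:hello]", "[end-atx]"]),
    ("plain", ["[para]", "[text:plain]", "[end-para]"]),
]

def fake_tokenize(markdown_text):
    # Stand-in for the real parser, just to make the harness runnable.
    if markdown_text.startswith("# "):
        return ["[atx:1]", f"[text:{markdown_text[2:]}]", "[end-atx]"]
    return ["[para]", f"[text:{markdown_text}]", "[end-para]"]

def run_scenarios(scenarios, tokenize):
    """Return the Markdown inputs whose token streams did not match."""
    return [md for md, expected in scenarios if tokenize(md) != expected]
```

Because each scenario is a pure input/output comparison, hundreds of them can run in seconds.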
Strategy 6: Do Small Refactors Only at Good Points¶
While this strategy may look like a repeat of Strategy 0A: Suppress Major Issues Until Later, the scope of this strategy is on a smaller, more local level. Where Strategy 0A talks about refactoring major issues later, there are often obvious minor refactors that can be done at a local level. These changes are often made right after a function is written to fulfill a feature and rarely include more than one function. A good example is taking a function that performs a given action twice with small variations and rewriting it so that the repeated action is encapsulated in its own well-named function.
While such refactors almost always improve the code, care must be taken to strike a good balance between making each function more readable and trying to optimize it ahead of time. For me, it is often easier to recognize patterns in raw code than in code that has already been refactored. Unless I am the author of the refactored code, I find that I do not see the same patterns that I do in raw code. As with many things, “Your Mileage May Vary”.
When implementing the parser, this strategy was applied at the local level to improve readability and maintainability. There were quite a few cases where the logic to detect a given case and the processing of that case were both complicated. By assigning the detection of a given case to one function and the processing of that case to another, the border between the two concepts was sharpened, making the calling function more readable.
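A simplified sketch of that detection/processing split, using thematic breaks as the case (the detection rule here is deliberately cruder than the real specification, and the token text is invented):

```python
def is_thematic_break(line):
    # Detection only: at least three of the same character from -, _, or *.
    stripped = line.strip().replace(" ", "")
    return len(stripped) >= 3 and len(set(stripped)) == 1 and stripped[0] in "-_*"

def process_thematic_break(tokens):
    # Processing only: emit the token for an already-detected break.
    tokens.append("[tbreak]")

def handle_line(line, tokens):
    # The calling function now reads as plain detection-then-processing.
    if is_thematic_break(line):
        process_thematic_break(tokens)
        return True
    return False
```

Each piece can now be read, tested, and changed on its own, which is exactly the readability gain described above.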
How Did This Help?¶
For one, I had a plan and a strategy to deal with things. As always, something would happen during development which would require me to re-assess something. Given the above strategy, I had confidence that I would be able to deal with it, adjusting the different parts of the project as I went.
Basically, I took a boulder (writing a parser) and not only broke it down into pebbles (tasks needed to write the parser), but came up with a set of rules (strategy) on what to do if I found some rocks that were previously unknown or larger than a pebble. As I mentioned at the start of the article, it is a fairly simple bit of wisdom that I was taught, but what a gem it is!
What Comes Next?¶
In the next article, I take the requirements, scenarios, and strategies and put them together to start writing the parser. As one of the test groups that I came up with was normal Markdown blocks, I will describe how I implemented those blocks as well as the issues I had in doing so cleanly.
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.