Markdown Linter - Road To Initial Release - Learned and Useful Things

Summary

In my last article, I documented how I worked hard to complete the unprioritized items in my issues list. This article takes a break from all that hard work to look back over the last year’s work, and what I learned on my path to this point.

Introduction

Just over a year ago, I came across a problem that I really wanted to solve. I had been writing articles and posting them on my personal blog for about five months and I was not happy with some parts of that process. I was improving my writing process and finding my voice, so that was not the problem. It was the lack of process around the consistency of the articles that I felt was the problem. I needed a tool that would help me maintain the consistency of my articles as a group, ensuring they all followed some basic rules that I wanted to enforce. It was then that I knew I wanted to write my own Markdown linter to do exactly that.

During the time that I have been working on this project, I have learned a lot about myself and some good practices to keep while undertaking such a project. I will not be so bold as to call these “Best Common Practices and Learnings”, but I will definitely step up and own them as “What Jack Learned and Finds Useful”. I know it is not as shiny of a title as the first one, but it is one that I feel I can own while remaining honest and keeping a relatively level ego.

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. Unlike past articles in this and related series, the articles used as inspiration for this article range from my first article on PyMarkdown titled Markdown Linter - Collecting Requirements to the article before this titled Markdown Linter - Delving Into Issues 19.

Learning #1: Find The Passion For Your Project

It might seem silly to some, but when it comes to a lot of things in my life, I like a good, solid, dependable process. It is okay with me if that process grows, matures, and changes in response to perceived issues with that process, but not having a dependable process is just one more thing I must worry about managing. Basically, if I know I am going to stress out about something, I try and have a process in place to help me mitigate that stress. So, when it comes to my personal blog, I had a simple process for writing my articles and the beginning of a manual process for publishing those articles, but I did not feel that I had any process around maintaining a solid consistency level throughout those articles. That bothered me.

Looking around for a tool that would fill this gap, the only tool that I found that was close to what I wanted was the JavaScript Markdown Lint program, by David Anson. This is a plugin for Visual Studio Code (VSCode) that is easy to install and easy to get started with for linting Markdown documents in VSCode. But even though that project is in wide use and obviously means well, I did not feel that it was the right tool for the job. I wanted to be able to run a finely tuned process over my articles, ensuring their consistency. As Markdown Lint was written to be applied to text being processed by the many Markdown processors out in the wild, there are a fair number of rules that are not as fine-tuned as I would like them. On top of that, I did not feel that Node.js was a good language for developing a parser in. To me, the NPM library system is good for websites, but lacks the accountability and stability I believe is a requirement for any good parser. For those reasons, I decided to start and write my own Markdown to HTML parser, with a Markdown Linter built on top of that.

At that time, it was relatively easy for me to come up with the basic list of requirements for the project:

- must be able to see an accurate tokenization of the markdown document before translating to HTML
- all whitespace must be encoded in that token stream as-is
- initial tokenization for GitHub Flavored Markdown only, add others later
- must be able to provide a consistent lexical scan of the Markdown document from the command line
- extending the base linting rules should require very little effort
- written in Python

With only one small modification, these requirements have remained the same since I wrote them down in that article on 2019 Dec 08. That one change? It is a recent one, changing “for GitHub Flavored Markdown only” to “for GitHub Flavored Markdown only (as defined by the CommonMark reference implementation CommonMark.Js version 0.29.2).”

This desire to have a tool that meets these requirements is a large part of the passion that drives me forward on this project. The other part of the passion is made up of my desire to learn and grow. While it might seem (and usually is) dry reading, by writing about the work I did on the project EVERY week, I give myself the ability to look back and see how much I have changed along the way. Writing an article back at the beginning of the project took almost 24 hours before I was happy with the content and style. Now I can usually write an article that I am happy with in 4-6 hours, with breaks in between to spend time with my family. That kind of growth is the other part of my passion, so that works out very well!

When I have trouble making progress forward or sitting down and doing work, these are the main drivers that keep me moving forward through the many obstacles in my way. These are critical to any project that you want to see succeed. I now have my own clear proof that confirms that is the case!

Learning #2: Your Project Is Not Your Life

I will easily admit that I worked harder in the first three months of the project (to get it off the ground) than I have at any other point in the project. But even then, I was careful to not work too hard on the project at the expense of other things in my life. Even when the pandemic hit and I was at home all the time, I tried to make sure that I balanced my work on the PyMarkdown project with the other projects that I wanted to do and with spending time with the people in my life. Even though it was often difficult to resist the siren’s call of the latest issue that needed solving, I put them on hold when I needed to.

Do not get me wrong. I often chose to spend some extra time working on the project for one reason or another. I like challenges, and I have a passion for the project, so it makes sense that I would spend extra time on the project. But balancing that passion were the other times where I did not feel that I could give the project my best work, and specifically chose not to work on the project. Sometimes the reason was that I was not feeling well. Sometimes the reason was that the feeling of being stuck at home ALL THE TIME during a pandemic was getting to me. Sometimes, it was just that I did not feel like it. Whatever the reason was, if I did not feel that I could give the project the same level of professional respect that I give to my full-time job, I did not work on the project.

For me, I believe that is one of the reasons that I have stuck with this project. Sure, there are times that I do not want to work on “yet another #@$%^& parsing error”, and I have to grit my teeth and work through them. But I also know that if I need to walk away from the issue to rethink my approach to it, that option is also on the table. I am the only one making the decision when to work and what to work on, and that is both empowering and a responsibility. At the same time, that gives me the freedom to live my life properly, helping me decide how to balance this project and other projects with my life.

Learning #3: Be Honest About Your Goals

If you are in a team, make sure you are honest with your team about your goals. If you are the team, make sure you are honest with yourself. I know that might not make a lot of sense or might seem too basic, but I believe that it is important to my success with this project. Getting from ground zero to a fully compliant GitHub Flavored Markdown parser was not something that occurred overnight. It took a lot of hard work and a lot of goal setting along the way.

It all started with my first goal on 2019 Dec 16: getting my first scenario test coded. That was followed by my writing down the parser testing strategy that I intended to use to move forward, along with how I intended to capture the scenario information for each test. By setting those goals at the start of the project, I have been able to use them as the North Star for the project, foundational goals that I can refer to if I get disoriented or lost along the way. And from my experience, everyone gets lost at some point, needing some form of light to find their way back. It is just a matter of being prepared for it when it happens.

From that point forwards, I set realistic goals on what I wanted to achieve in the next block of work for the project. Having a single scenario test but no source code to test against, my first goal was to write the parser code behind that first scenario. With that accomplished, I broadened the goal to get the other Markdown elements parsing, so I started in January with the straightforward blocks, and reached the final Markdown elements, links and images, at the end of April. My next goal was to prove that I could easily write a rule that would take advantage of the PyMarkdown parser, so I spent my time in May working on that initial rule support.

Having come across approximately ten bugs in how the parser handled certain situations, I decided that the next goal needed to be one that would bring stability to the scenario tests. As I am always concerned about the quality of any project that I work on, I was honest with myself about this goal: I would not finish it until I was confident that my solution could catch any failures that I missed. To accomplish that, I started working on the main bulk of the consistency checks in June, stopping at the end of September. Even though I have since completed that work, I still maintain a goal that ensures that any source code changes are accompanied by any required changes to ensure that the consistency checks remain current.

With those checks in place, my next goal was to do experimental testing, adding new scenario tests for any issues I found or areas that I wanted to make sure were tested. That task took me from the last week of September to the first week of February 2021. There were many times in that time frame where I questioned if I was being too picky about the scenario tests. At those times, I took another look at the GFM specification and the breadth of scenarios that it tries to cover. And while it might sound counterproductive, I also looked at the output from the Babelmark 2 tool. While I spent most of my time looking at the output for the commonmark.js 0.29.2 entries produced by that tool, I also looked at the output provided for other Markdown parsers. That output reminded me of why I felt that the testing of all these scenarios was important. A couple of lines of Markdown can be interpreted into a multitude of different HTML outputs by different Markdown parsers. But only one set of that HTML output was the right one for this project: the CommonMark one that is the reference parser for the GFM specification.

It is not always easy to get to the next goal, but being honest with myself about the amount of work required to get to that next goal helps me deal with it in a concrete fashion. For me, that honesty is represented in the project’s readme.md file. While I will rename it before I release the project, that file provides me with a simple and stark view of what I need to accomplish before releasing the project. Not a list of issues that are easily dismissed, but a cohesive list of issues that I look at every time I open the project.

That level of honesty, about what needs to be done and the values I have, keeps me honest with myself. There is no pushing off the release for years, and there is no skimping on quality. As the television actor is misquoted as saying:

Just the facts, ma’am.

Jack Webb

Learning #4: Find A Process That Works For You

After working on this project for over a year, I have a clearly defined process that I use with every change. For any group of scenario tests that I add, I use the North Star processes mentioned in the last section to set up those tests. I then pick one of those tests and first run it without any debug active, forming an initial observation on what I am seeing and why that may be happening. Either to clarify that observation or to confirm it, I then enable debugging and follow the flow of the data for the specific lines where I believe the issue manifests. If needed, I then refine my observations and examine the output again until I have the clarity that I require to fix the issue. Sometimes it means adding more debug, sometimes it is so obvious that I almost laugh.

With that clarified observation in hand, I then start looking at the source code, making small debug modifications to verify that I am in the right part of the source code for the issue I am observing. I then use that information to help me make a change to the source code to change the behavior of that function from non-compliant results to compliant results. That often takes a number of iterations that can take anywhere from 5 minutes to 5 hours. Once I have that one specific scenario test working with the new code, I execute the entire collection of scenario tests to determine if any of those tests were negatively affected. If so, I note that negative effect as an observation, and take another look at the change I made, altering that change into one that does not have that negative effect on the other tests. Only when all active tests are passing do I consider the change as “good”. And, while it is rare, there are cases where my observations are totally wrong, and I need to back out any changes I made and start from the beginning. Part of my process is that I need to be able to make that call if I find myself hopelessly lost.

When I am finished with a group of changes, I go through the changes that I made with my editor, looking for funny variable names and function names that I used as shortcuts during the debugging process. Then I execute my clean.cmd script to start running the Black code formatter and the Flake8 and PyLint linters on the Python source code. As the final stage of that script is to re-run the scenario tests to ensure that they are all passing, I have a double check in place to ensure that I do not commit changes that break existing tests. When those checks pass, I double check the changes, and then run the clean.cmd script with the -p flag, publishing the number of tests and the test coverage to the publish directory, where I can examine them at a later date.
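As a rough illustration of the steps above, here is a minimal Python sketch of a cleanup script along those lines. It is not the actual clean.cmd script, whose contents are not shown in this article. The tool arguments, the `pymarkdown` target directory, the use of pytest as the test runner, and the names `STEPS`, `run_command`, and `run_steps` are all assumptions made for the example.

```python
import subprocess
import sys

# Hypothetical step list: the article names the tools but not their
# arguments, so the target paths and the pytest runner are assumptions.
STEPS = [
    ("black", [sys.executable, "-m", "black", "."]),
    ("flake8", [sys.executable, "-m", "flake8", "."]),
    ("pylint", [sys.executable, "-m", "pylint", "pymarkdown"]),
    ("tests", [sys.executable, "-m", "pytest"]),
]


def run_command(command):
    """Run one command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_steps(steps, runner=run_command):
    """Run each step in order, stopping at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        print(f"Running {name}...")
        if runner(command) != 0:
            return name
    return None
```

Calling `run_steps(STEPS)` and refusing to commit when it returns a step name gives the same kind of double check described above: the scenario tests run last, so a formatting or linting fix that breaks a test is caught before the commit. Passing the runner as a parameter is only there so the ordering logic can be exercised without the tools installed.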

While that entire process is long, it works for me. For me, it is not too complicated or too bulky, it is just right. I keep things at a granular level, so I do not have to worry about big blocks of features or issues to work on, just small changes that need to be completed and verified. And while I might occasionally skip running the clean.cmd script, I know that I will most likely execute it the next time, catching anything that I missed from the time before. As such, I try and run it every time, just to keep the scope of the changes small and manageable.

The big thing here is that this process works for me. It gives me a solid framework to focus on, and it gives me a plan on how to attack each issue that I am working on. Until there are more people on the project, that process does not have to make sense to anyone else or work for anyone else other than me. And I am confident that the process works well for me!

Learning #5: …But Know When To Deviate

But even though I like process, there are times that I know I need to deviate from that process for my own good. During the project, there have been a few times where I knew that there was a group of issues that were all going to change code in one specific area. As such, I followed my usual process for the first two steps of my development process, but then delayed the third step of that process. I made that decision with the intent to delay that third step but not to omit it. I felt that it would produce better results if I delayed that third step until all the changes were completed, rather than trying to clean up intermediate steps. So, I deviated.

Another good example is the recent learning I have been doing on Python performance profiling. While I will be delving into what I learned in future articles, I needed some space to start working with the performance tools and learn how to use them effectively. To accomplish this, I took a couple of my usual nights off and spent an hour or so of each night going through some tutorials on cProfile and SnakeViz. Any future performance work will follow the normal process but getting to the point where I felt comfortable enough with the tools took some work. So, I deviated.
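For readers who have not used these tools, here is a minimal sketch of the kind of first experiment such tutorials tend to start with. It uses only the standard library `cProfile` and `pstats` modules; the `count_words` workload and the `profile_call` helper are hypothetical stand-ins, not code from the PyMarkdown project.

```python
import cProfile
import io
import pstats


def count_words(lines):
    """Toy workload standing in for real parsing work."""
    total = 0
    for line in lines:
        total += len(line.split())
    return total


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


sample = ["# Heading", "some *emphasis* here", ""] * 1000
word_count, report = profile_call(count_words, sample)
print(report)
```

To browse the same data graphically, one option is to call `stats.dump_stats("profile.prof")` inside the helper and then run `snakeviz profile.prof` from the command line, assuming SnakeViz is installed. The file name is arbitrary.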

Having a process is good, and for me, it is a necessity. It helps me release my mind from figuring out what order to do things in, as I already have a process for that. But being bound to that process with no escape valve is not a good thing. I know I have a personal high bar on when I deviate from the project’s process. But I also know I am honest with myself on when and why I believe I need to deviate. And the two of those concepts working together is what makes my escape valve work nicely.

Learning #6: Know How You Rubber Duck You

For reference, I have a little rubber duck on my desk called “Duckie” (named after the character in the movie Pretty in Pink). While the development process of rubber ducking does not require an actual rubber duck to talk to, I had one available and I found rubber ducking to an actual rubber duck amusing. Duckie also serves as a visual reminder to me to stop and think through what I am doing. If the concept that I am trying to get working is too complicated, how can I make it less complicated? If I can make it less complicated, it should mean that I can solve the problem by breaking it down into smaller, easier to understand problems. And I know it sounds silly, but if I cannot break the problem down into a simple enough problem that I can explain it to Duckie¹ and have Duckie understand it, then I have some work to do.

For me personally, I also know that sometimes I need to walk our dog or just get out and do something else and rubber duck with myself. From my experience, if I stare at a problem for long enough, the answer does not materialize, I just get sore eyes and a headache. By taking a break from the problem that I am trying to solve, I find my mind wanders and just naturally starts sifting through the problem in the back of my head. If you are around me when I am doing this, it will appear that I am going mad, mumbling to myself. But this process helps me sort through stuff, allowing me to then re-engage with the problem with a fresh set of eyes, hopefully observing something new that I missed before. As a plus, I think our dog likes the extra attention he gets during those walks, as I think he thinks I am talking to him when I am actually just muttering to myself.

My final level of rubber ducking is my wife. At various times during the day, we visit each other and ask each other how the other’s day is going. If she needs to talk through something with me, I give her the floor. If I need to talk through something with her, she gives me the floor. We both know that the other is seldom going to be able to provide actual advice on the subject, but the mere act of talking it through helps. We both know a bit about what the other is doing, enough that we can ask simple questions to seek clarity on certain things that we heard. Between the listening and the simple questions, my wife is one heck of a rubber duck!

The important thing for me is not talking about how I rubber duck to give other people ideas on how they can rubber duck. Instead, I am trying to communicate that everyone rubber ducks in their own way, and if it gets the job done, it is a good rubber duck process. It must be something that either helps you organize your mind until you see the picture more clearly, or focuses your mind on something completely different to give it a chance to reset. It can be as simple as breathing techniques, as complicated as solving some manner of puzzle, or as exhausting as a 10 kilometer run. It is just something that works for you.

Learning #7: Ask For The Right Kind of Help

This final bit of learning is one that should be obvious but is not always so: ask for help, but ask for the right kind of help and in the right way.

I have always used the forums for the CommonMark reference implementation as a resource but have recently started to ask questions in those forums. For the first 95% of the parser, the specification has great examples and great explanations for each scenario that I came across. Which has higher precedence: a link sequence or a code span sequence? See example 533 and the text around it. Are empty links allowed? See example 559 and example 560. But the questions are not always that easy to answer.

Now that I am into that last 5% of scenarios for the parser, I often find myself outside in the weeds as far as the specification and reference implementation goes. In some cases, the parser is wrong, and I need to understand how it is wrong to fix it properly. In some cases, the specification is poorly worded or did not include some text required to resolve the specific outside case that I found. And in some rare cases, I find issues with the reference implementation that I am not convinced are issues until I talk through them with the forum. In those cases, after discussion with the members of that forum, I add issue reports to the respective GitHub repository for that parser.

I believe that part of my success in communicating with that forum is based on a handful of principles. The first principle is that no parser is wrong, it just depends on what the specific requirements of that parser are. The second principle is that I am not there to point fingers at anyone or their implementation, but to ask questions and get some help. This means that I try and always word my questions using the word “I”. “I am having trouble understanding…” “I am not sure where I went wrong with…” “I am reading the specification and…” The third principle is that I always do my homework. Before I post something to the forums, I make sure I have looked at the problem from multiple angles and try and document the relevant research in my post. I do not want anyone to do more work than they have to in order to help me. Maybe it is just me, but I find that rude. And finally, the last principle is an easy one. Be gracious. Everyone on the forum is participating because they want to, and many of them are responding in their own free time.

Every time I ask for help, it is with those principles in mind. And so far, it seems to be working quite well!

What Was My Experience So Far?

While this entire article is about my experiences and how I have learned from them, the one thing that I have learned a lot more about in the last year of working on this project is my ability to be patient. I am not sure to what extent it comes from surviving this year of the pandemic or from developing the project to meet my goals, but I know both have contributed to a renewed sense of patience that I now have.

In both cases, I think that things just are the way they are, and they must play out. Sure, I can rush the PyMarkdown project and release early, but then I would not feel good about the quality. Yes, I can decide to not be careful with the pandemic around, but I would feel terrible if someone near me got sick because of my actions. With the project, I have requirements and goals, and I just need to follow those as my North Star. With my life, I know that helping to ensure the health and safety of others is one of my personal North Stars. In both cases, I must be patient and follow what I believe to be right.

It took me a lot of effort, both project-wise and life-wise, to get to this point in the project. Now that I am on a clearly defined road to release, I know that I can be patient for just a little bit longer, without sacrificing any of my requirements. For me, that is a great thing to realize and learn about myself!

What is Next?

Having exhausted the list of unprioritized items in the issues list, it was time to get to work on the Priority 1 and Priority 2 items on that list. Stay tuned!

¹ To be clear, Duckie is not sentient and does not speak to me. It really is a test of whether or not I think that Duckie could understand it, if he had a decent enough amount of sentience.
