In my last article, I talked about finding middle ground within myself. In this article, I talk about whether the painstaking work that I am doing to enumerate every use case is worth it.
The PyMarkdown project is something that I started over two years ago. As time passes, the one question that often comes to mind is: is it worth it?
It turns out that this was not as easy a question to answer as I thought it would be.
What Is the Audience for This Article?
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 26 Jan 2022 and 06 Feb 2022.
Before We Start…
This is a bit embarrassing. When I finish my article for the week, I double check that everything is correct and then publish it. Following that, I upload the new article to my blog, and double check again to make sure that everything looks good. If for some reason I do not do that last step on Monday night, I do it on Tuesday evening. While it might seem pedantic, it works. It has caught the two or three times where everything was published, but I forgot to upload to the blog.
Well, it worked… except for last week. Not only did I forget on Monday night to publish the article, but I forgot on Tuesday night as well. Now, I do have a particularly good excuse for missing Tuesday night: I was watching my home country’s Team Canada Men’s Soccer Team win another CONCACAF World Cup Qualifier with my family. After watching a well-fought game between two good teams, played remarkably close to where I grew up, we then watched a couple of episodes of Disney Plus’s Hawkeye.
For the family, it was a good night. For my blog, an oops night.
Why Am I Working On PyMarkdown?
For those readers that are new to my blog, as described in this article:
I want my blog to inspire people and help them learn, like people have inspired and helped me in the past. For technical articles, I feel that I can best do that by focusing more on the why and how of the decisions leading up to the solutions rather than the what of the solutions themselves. For other articles, I feel that I can best do that by being an honest and believable storyteller, helping people to understand issues and situations as I see them.
Whether I am detailing the tough decisions that I make to move the PyMarkdown project further along or describing the details of how I moved the project along, I feel that both types of articles are meeting these goals. As of late, I have felt that it was important to balance those topics with more “squishy” topics, such as Finding Something That Works, Deciding What Is Important, or The Bug That Almost Knocked Me Down. As an author, it is easy to talk about the more technical aspects of developing a project, but as my experiences have taught me, they are only part of the equation.
Regardless of anything others might say about how I am progressing with the PyMarkdown project, it is my project and my contribution to the open-source community. To that end, it means I have the flexibility to work how I want, but I also bear the responsibility when things do not go the way I want them to.
And for me, that is part of the reason I decided to start working on this. When I started authoring articles, I noticed that there was only one other Markdown linter out there. While I do not knock that linter for what it is doing, I honestly wanted something that was easier to use, easier to extend, and more correct. On that last point, please understand that the other linter is doing the best job it can given its constraints. I just wanted to write something that did not have those constraints. Or at least try to create something that did not have those constraints.
What are those constraints? There are two. The first one is that it is a linter that tries to match the specifications for multiple Markdown engines as they were around four years ago. To that extent, some of the rules are more generalized than they could be if a single specification were chosen. The second one is that it is a very general linter that relies on pattern matching. From my point of view, that design choice was very good at getting the linter to the 90-95% coverage mark, but it stopped there.
For a handful of specific rules, context is needed. There are just some patterns that require context for a linter to properly understand them. For example, there are a handful of rules where knowing the context of being within a nested container block affects how the leaf blocks inside of those containers should be treated. Without that context, the pattern is incomplete, creating either false negatives or false positives.
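To make the point about context concrete, here is a toy sketch in Python. It is not how PyMarkdown or the other linter is implemented. The rule (a heading marker must be followed by a space), the function names, and the sample document are all made up for illustration, and the "context" tracked here is far simpler than real nested container blocks.

```python
import re

# Hypothetical rule: an ATX heading marker must be followed by a space.
MISSING_SPACE = re.compile(r"^#{1,6}[^#\s]")


def pattern_only(lines):
    """Flag line numbers using the regex alone, with no knowledge of block structure."""
    return [n for n, line in enumerate(lines, 1) if MISSING_SPACE.match(line)]


def context_aware(lines):
    """Same check, but track fenced code blocks and strip a block quote prefix first."""
    flagged, in_fence = [], False
    for n, line in enumerate(lines, 1):
        # Deliberately simplified: only a single level of block quote is handled.
        content = re.sub(r"^> ?", "", line)
        if content.startswith("```"):
            in_fence = not in_fence
            continue
        if not in_fence and MISSING_SPACE.match(content):
            flagged.append(n)
    return flagged


document = [
    "#Title",              # line 1: a real violation
    "",
    "```c",
    "#include <stdio.h>",  # line 4: code, not a heading
    "```",
    "",
    "> #Quoted heading",   # line 7: a violation inside a block quote
]

print(pattern_only(document))   # [1, 4]
print(context_aware(document))  # [1, 7]
```

The pattern-only version reports line 4 (a false positive) and misses line 7 (a false negative). Even this small amount of context changes both answers, and real documents nest containers much more deeply than this sketch handles.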
So, I started with those things in mind. I picked the GitHub Flavored Markdown (GFM) specification because it is the de facto standard. It fills that role so well that most of the more popular Markdown-to-HTML parsers have gravitated towards it in the last three years. This resolved the first constraint for me. The second constraint was resolved by my decision to base the PyMarkdown linter on top of a parser that is fully compliant with both GFM and CommonMark. By using the parser's tokens as an intermediate form of the document, I can determine the proper context to use for each of the rules.
But with those decisions comes a price.
Paying The Piper
The good point that I keep on reminding myself of is that I have a fully GFM-compliant parser at the heart of the project. However, it would be more correct to say that “as far as I have tested”, the parser is compliant. With over four thousand scenario tests and climbing, I feel comfortable in claiming that I am close to achieving my goal. But I know that one area I still have concerns about is nested container blocks.
To back that viewpoint up, all I need to do is look back over the past three months at the different issues that I have fixed in the project. As far as I can tell from a quick look at the project commit logs, I have mostly been checking in changes to one of the container block modules, the Markdown generator used to verify the tokenization, or the tests for those changes. And while only minor changes are needed to ensure the parser is working properly, I am still finding those small issues.
Whether it is an issue such as Issue 262 or Issue 263 dealing with bad HTML, which is parser related, or Issue 252, which looks like a simple whitespace problem but is really parser related, I am still finding little issues that need to be fixed. When I say little issues, I mean in terms of their final change area. For example, outside of the scenario tests, Issue 262 required only fourteen lines to be changed before it was resolved. However, it took around five hours of debugging and testing to arrive at those changes.
And yes, I do get tired of testing the project and finding parsing issues. But as someone who has written a decent variety of parsers over the last thirty years, I know that it comes with the territory.
How Do I Figure Out If It Is Worth It?
As I was coming up with the ideas for this article, I started to think about this and had a hard time at first coming up with an answer. My first answer was “because it is the right thing to do”, which I thought was a cop-out. Given differing requirements from customers, project management, resources, and technology, I am decently adept at coming up with a good balance between those opposing forces. But since I am in control of all those variables in this project, it seemed hollow when I looked at it.
So, I dug deeper over the next couple of days. As I was working on the remaining issues for this last week, I started jotting down reasons on one of my ever-present sheets of paper. Sometimes I would cross an idea out to replace it with an even better version of that idea, and sometimes I would cross them out without replacing them. As I started to draft this article, I looked at them, and found a common theme among them: because it is what I would expect from any other project.
Hopefully, that does not sound as wishy-washy as the reason above, but it was the overwhelming theme of the ideas I had written down. The way I look at it is this. If I publish a hastily written piece of code to help someone out, I am going to make sure to include a note that essentially says: “I didn’t really test this. Don’t rely on this for critical stuff!” Likewise, if I look at something like PyLint, which defaults to the Python PEP 8 style guidelines, I expect PyLint to work as advertised. With its long history and wide user base, I also expect it to be well tested and debugged at this point.
Following those two lines of thought, the README.md file of the project states clearly at the top:
This project is currently in beta, and some of these documented things may not work 100% as advertised until after the final release. However, everything should be close enough to done that if you find something missing, please let us know.
And until I can get a sufficiently large user base, I feel it is up to me to provide the bulk of the test cases.
What Is My Answer?
Yes. The challenging work that I am putting in is worth it.
On a personal level, I started this project with specific requirements in mind and a desire to take this project to a full release. That has not changed. On a professional level, if I were evaluating this project as a user, I would expect to see a wide range of users, a large set of scenario tests, or both. The degree to which that expectation was satisfied would directly feed into my confidence in the project.
And yes, it is often mind-numbing work coming up with variations on scenario tests, and then implementing them. But as I mentioned above, it is part of what I would expect from any other project I would use. And, not wanting to be a hypocrite, if I expect it from others, I need to hold myself to the same standard.
What Is Next?
Having hit the limit I wanted to reach for the next numbered release, I plan to publish that release in the next two days. Then on to more scenarios. Stay tuned!
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.