Summary
In my last article, I talked about my continuing work on the PyMarkdown project, aiming to get it closer to a solid release. In this article, I talk about my thoughts around adding new features to the project.
Adding Features That Make Sense
Usually, most new features make sense. Adding support to the PyMarkdown project to scan for other extensions besides .md was an easy decision to make. Adding support for reading from standard input was a bit more nuanced, but still a relatively easy decision to make.
Then there was the decision to fix a long-standing issue, issue 330. That I wanted to fix this was obvious to me, but until recently, the cost of fixing that issue always outweighed the benefit. To make things easier in the initial stages of development, I added a couple of replace("\\", "/") statements in the file scanning code to allow Windows backslash separators to be treated as the Posix slash separator. Why? Because then the output always dealt with slashes, regardless of the operating system involved. Therefore, the test output always used slashes.
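As a minimal sketch of what that shortcut does, assuming a hypothetical helper name (normalize_separators is mine, not a function from the PyMarkdown source), the replacement turns any Windows-style path into its slash form and leaves Posix-style paths alone:

```python
def normalize_separators(path: str) -> str:
    """Hypothetical helper mirroring the replace("\\", "/") shortcut."""
    # Every backslash becomes a forward slash; existing slashes are untouched.
    return path.replace("\\", "/")


windows_style = "test\\resources\\rules\\md007\\some_file_to_scan.md"
print(normalize_separators(windows_style))
# prints: test/resources/rules/md007/some_file_to_scan.md
```

The convenience is that reported paths look identical on every operating system. The drawback is that the reported path no longer matches what the user actually typed on Windows.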
But after adding the support for standard input, it just felt wrong to leave those replacement statements in the code. Paying the cost of fixing that issue was not going to be easy, though. Properly fixing the source code for that issue took thirty minutes, including rudimentary testing. After a quick run of all tests, the impact of that fix on the tests was clear: over 450 scenario tests were failing because of that change. Digging in a bit more, that was only the failure impact. If I wanted to do things properly, all of the tests for rules plugins needed to be changed to use proper pathing instead of the Posix pathing. I believe when I sketched things out, I took a guess that over 700 scenario tests would need to change.
That is not a small number: it is roughly fifteen percent of the scenario tests. In each test, I would need to change the path of the file to scan from something like:
source_path = "test/resources/rules/md007/some_file_to_scan.md"
to:
source_path = os.path.join("test","resources","rules","md007","some_file_to_scan.md")
adding that assignment at the top of the scenario test if it was not already there.
With that change done, I then needed to look for any instances of the first string in the rest of the scenario test, replacing it with either source_path or {source_path}, depending on whether it was already inside a string or not.
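To illustrate the shape of that transformation, here is a rough sketch of the top of one converted scenario test. The source_path assignment comes from the text above; the supplied_arguments and expected_output names and the message format are assumptions made purely for illustration, not the project's actual test layout:

```python
import os

# Build the path with the separator of the operating system running the test.
source_path = os.path.join(
    "test", "resources", "rules", "md007", "some_file_to_scan.md"
)

# Where the old literal stood on its own, use the variable directly.
# (Hypothetical argument list, for illustration only.)
supplied_arguments = ["scan", source_path]

# Where the old literal was embedded in a larger string, use {source_path}
# inside an f-string. (Hypothetical message format, for illustration only.)
expected_output = f"{source_path}:1:1: (hypothetical rule message)"
```

On Windows the expected text now contains backslashes, and on Posix systems it contains slashes, so the test matches what the scanner reports once the replacement statements are gone.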
But, when I weighed everything out, it was worth it. This was something that I had put off for long enough, and the increase in quality was worth the cost to me. Granted, I thought I could make the transformation in ten hours, but it was still worth the cost after eighteen hours of changes. Each change was manual. It was not fun, but it was a good change.
The next change was going to be different. It was going to require deep thinking on my part.
Sometimes, The Decision Is Not An Easy One
As I mentioned in the last article, one of the users reached out with Issue 382 and asked if I could add support for scanning Markdown in Jupyter notebooks. Needing to think things over a bit, I added the support for processing standard input as it was a good feature and I thought I might need it for this new feature.
But the big question in my mind was: was this actually a new feature? Not that someone could not use this functionality, but was it a feature of PyMarkdown? Or was it a separate utility? This was not an easy question for me to answer. I could see both sides of this question, and in my mind, they were balanced almost equally.
On the side of adding this support as a PyMarkdown feature, there were two good points to support it. The first of those points was that it fit in with the main goal of the project: to provide a solid Markdown scanner. It just so happened that every Markdown sample that I have seen up to this point was either the entire contents of a file or the entire contents of the standard input. In my line of work, that is what we call an implementation detail. The second of these points was the ease of use of the scanner. Programmatically, I could see connecting the various file types to “input filters”, selecting the correct filter based on the provided input. I had done variations on that for extensions and rules plugins, so would one more plugin type hurt?
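To make the “input filters” idea concrete, here is a small sketch of what selecting a filter by file extension could look like. Everything in it (InputFilter, passthrough_filter, INPUT_FILTERS, select_filter) is a hypothetical name invented for this illustration; none of it exists in PyMarkdown:

```python
import os
from typing import Callable, Dict

# Hypothetical: a filter takes the raw file text and returns Markdown to scan.
InputFilter = Callable[[str], str]


def passthrough_filter(raw_text: str) -> str:
    """Plain Markdown files need no conversion."""
    return raw_text


# Hypothetical registry mapping a lowercase extension to its filter.
INPUT_FILTERS: Dict[str, InputFilter] = {".md": passthrough_filter}


def select_filter(path: str) -> InputFilter:
    """Pick the filter for a path, based only on its extension."""
    _, extension = os.path.splitext(path)
    try:
        return INPUT_FILTERS[extension.lower()]
    except KeyError:
        raise ValueError(f"No input filter registered for '{extension}'")
```

A notebook filter would then be one more entry in the registry, which is exactly the extra plugin surface being weighed here.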
The opposing side was a bit more nuanced. Depending on how I read the main
goals of the PyMarkdown project, the target could be seen in one of two ways.
Going all the way back to an article I wrote on 2019 Dec 08,
there is one line that summarizes this intent:
- must be able to provide a consistent lexical scan of the Markdown document from the command line
If I take that statement literally, then the scan should be dealing with documents, not smaller units of text. Of course, an argument could be made that any text “blob” is a document, but I feel that it is just bending words to fit a scenario, and not honoring the intent. Throughout the PyMarkdown project, files are referenced as a proxy for documents, instead of using terms like file fragments.
Less nuanced and more pronounced is my sense that including this kind of support in the PyMarkdown project will make the project too big. More precisely, I feel that expanding the scope of the project to include the concept of “input filters” is not called for. From a cost-benefit perspective, the cost of supporting another plugin interface, or even a hard-wired interface, does not match the perceived benefit. At least not to me.
And The Decision Is…
So, after thinking about it and weighing the pros and cons of this support, I decided to support these features, but in their own project. That new project will take care of doing the necessary management to convert the Jupyter notebook Markdown fragments into a scannable form for PyMarkdown to handle. To that end, I will need to enhance the main PyMarkdown module, but I believe those enhancements will be minor.
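As a rough idea of what that conversion involves, a notebook file is JSON with a list of cells, and each cell carries a cell_type and a source that is either a string or a list of strings. The sketch below pulls out the Markdown fragments using only the standard library; the function name extract_markdown_cells is hypothetical, and this is not the design of the new project:

```python
import json
from typing import List


def extract_markdown_cells(notebook_text: str) -> List[str]:
    """Return the text of each Markdown cell, in notebook order."""
    notebook = json.loads(notebook_text)
    fragments = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code and raw cells
        source = cell.get("source", "")
        # The source may be stored as a list of lines; join it into one string.
        if isinstance(source, list):
            source = "".join(source)
        fragments.append(source)
    return fragments


sample_notebook = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title\n", "Some text."]},
            {"cell_type": "code", "source": "print(1)"},
        ]
    }
)
print(extract_markdown_cells(sample_notebook))
```

Each returned fragment could then be handed to the scanner as its own small document, which is where the standard input support may come in handy.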
And after worrying about a satisfactory solution for this issue for a couple of weeks, I am pleased with my decision. I feel that this will provide a template for any other such features in the future, allowing me to keep a steady handle on the PyMarkdown project. But at the same time, it will allow me to grow the project family while meeting the needs of the users.