Markdown Linter - Elevating Extensions · Jack's Digital Workbench

Summary

In my last article, I talked about getting back to PyMarkdown and my efforts to remove items from the issues list. In this article, I talk about elevating the extension object support in the PyMarkdown project to the same level as plugin rules.

Introduction

This is not going to be a terribly long article. Not for lack of content, but for lack of cleanly reportable content. For me, cleanly reportable content is anything where I can say “here is what I did” or “I did this because”. And for this week’s work, that content is mostly about documentation and reorganization.

It was in the weeks leading up to this work that I started thinking about extensions in a different light. That light was that extensions and plugin rules were similar concepts, but for different foundation objects. As soon as that idea settled into my brain, I knew I needed to elevate extensions to the same level as plugins.

But here is the hard part for a writer. Writing about documentation is boring. Writing about refactoring is boring. “Hey, I refactored this function from this module to this other module and made it work” is not exactly something that screams “read me!” Neither is talking about how I spent hours agonizing over trying to get the right theme and voice for the documentation. I mean, readers that are writers may sympathize, but I am very sure that is all. And while I do talk about why I made certain moves or enhancements, there is only so much of that content available.

Here is to hoping that I can make refactoring and documentation more interesting!

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 17 Jun 2021 and 20 Jun 2021.

Making Extensions First-Class Citizens

It was a while ago when I started asking myself if extensions needed to be on the same level as plugin rules. In any kind of rules engine, such as a linter, the object holding the rules is a first-class citizen by default. Basically, you need the engine, and you need the rule. Without either one, the other one is useless. As the rules are contained within plugin objects, those plugin rules are first-class citizens. As one of my professors in college would say “Q.E.D.”

But what about extensions? Was there a similar argument to be made for extensions being first-class citizens? For the last few months, I was convinced that similar arguments could not be made for the extensions. After all, what did extensions do for the rules engine? How did they enhance the behavior of the linter? At best, they were scan-time switches that had a bit of an algorithm behind them.

And then one evening, after one of the most embarrassing face palms in my professional life, I realized that I was getting that answer because I had been asking the wrong question. I had been looking for a similar argument, starting at the same starting point as with plugin rules: the linter. What I needed to ask myself was whether there were any other major components that I could use as a foundation element. I needed to think about whether one of the other components had a similar relationship with extensions that the plugin rules have with the linter. At that point, it was obvious to me that there was a good solid answer: the PyMarkdown parser.

While I do not need the same flexibility with the extensions that I do with the plugin rules, I do need some of the same options. The big options I currently need for the extensions are configurability and observability. Extensions require configuration to allow them to be enabled or disabled, and not much more. As such, the configuration aspect was already dealt with, but could be made more transparent to the user.

The first half of the observability option was already taken care of: the token stream. If enabled and if their conditions are met, both the Pragma extension and the Front-Matter extension place a token in the token stream generated by the parser. But the other half was important as well, and that was observability from the command line. The Plugin Manager presents that information to the command line through the list and info subcommands. And from my use of the command line to check to see if a plugin rule is enabled, it worked well.

But was it really a good model? Would it work for users other than myself? It was an interesting idea, but I needed to give it more time to develop. Luckily enough, I needed to work on documentation first.

Documentation Is Important

I felt that it was the right decision to increase the project’s support for extensions to the same level as with the rule plugins. I also knew that the first step on that path was to create a landing page for extensions, linking the existing two placeholder pages for each extension to that page. With that accomplished, the task to fill out those two placeholder pages was next on the list.

To show how seriously I took this effort, I did not want to commit something that was half done. So, even with about 90% of the work done by the time I started writing last week’s article, I elected to not commit the work that I had completed. It just did not feel right, so it was not until I spent some time on Wednesday and Thursday to complete those two documents that I committed those changes. And I was glad that I made that decision.

If any reader is under the delusion that documentation is easy, let me address that notion. To be blunt, adding documentation is easy. Sit down, write some stuff, and save it into a file. Done. But that will not produce good quality documentation that is well thought out, easy to read, and addresses the concepts that readers expect. I am not sure about other writers, but for me that usually involves at least 5 passes through the document, from a rough note pass in the beginning to a grammar/spelling/fine-tuning pass at the end. The adage is true: Garbage in, garbage out.

So, I take my documentation tasks seriously. If I want the project to have a high level of quality, every part needs to have a high level of quality, including documentation. For me, creating a project is about the completed picture that is presented by the project, not just the source code. So, without reservations, I worked on the documentation, giving it the time that it needed and not compromising.

Improving Through Documentation

As I have talked about before, walking through documentation is also a great way to see if you have properly implemented and tested a feature. This was no exception. Between both extensions, I added 10 new scenario tests to make sure the extensions were tested properly. There were not any really serious omissions in the tests, just “interesting” corner cases that might come up in everyday usage. It was not until I was walking through the documentation, writing the words, that I thought “well, what if I…”.

Along the way, to make things easier, I introduced the ParseBlockPassProperties class to contain any parsing related properties that I needed to pass around. While I only had two extensions at this point, I knew that other ones were coming. My intention is to use this class as a simple data class, allowing me to pass around properties that are moderately static. At this point, I use the term “moderately static” to refer to instances that are not going to change during the parsing of the document. As the flags to see if the extensions are enabled do not change during the parsing, this is a single place to put them that can easily be passed into functions.

Adding The Extension Manager

The next part of the journey to make extensions first-class citizens was to create an Extension Manager. My vision of the Plugin Manager was that it was the object completely responsible for anything to do with plugin rules. If that was so, then it made sense that I would create an Extension Manager to serve as the same type of foundational object, an object responsible for anything to do with the extensions.

That idea also made sense to me from a more practical point of view. By going along the path that extensions and the Extension Manager were in the same mold as plugins and the Plugin Manager, it meant that there were common code and concepts that I could use from the Plugin Manager, saving some time. From my viewpoint, if I honestly agreed that there was overlap, and did not force myself to believe there was overlap, I could probably repurpose any code that implemented overlapping functionality.

With that in mind, I created a copy of Plugin Manager and started pulling anything that was plugin specific out of the new Extension Manager. Instead of scanning a directory and loading any found plugins, I decided that extensions would only be added from a list maintained by the class. While this resulted in the removal of the scanning and loading code, it was done for a more practical reason. Unlike the plugin rules, currently each extension requires a hard-coded entry point from the parser. I hope to change that in the future, but that is where it is currently.
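As a rough illustration of that idea, and not the project's actual code, the sketch below registers extensions from a list kept on the manager class instead of scanning a directory. The names `ExtensionManagerSketch`, `KNOWN_EXTENSIONS`, and the `extension_id` attribute are assumptions made for this example, and the two extension classes are empty stand-ins.

```python
class FrontMatterExtension:
    """Hypothetical stand-in for an extension class."""

    extension_id = "front-matter"


class PragmaExtension:
    """Hypothetical stand-in for an extension class."""

    extension_id = "linter-pragmas"


class ExtensionManagerSketch:
    """Illustrative only: registers extensions from a fixed list."""

    # Maintained by hand (an assumption for this sketch), because each
    # extension still needs a hard-coded entry point in the parser.
    KNOWN_EXTENSIONS = [FrontMatterExtension, PragmaExtension]

    def __init__(self):
        self.__extension_objects = {}
        for extension_class in self.KNOWN_EXTENSIONS:
            instance = extension_class()
            if instance.extension_id in self.__extension_objects:
                raise ValueError(f"Duplicate extension id: {instance.extension_id}")
            self.__extension_objects[instance.extension_id] = instance

    @property
    def extension_ids(self):
        """Return the registered identifiers in sorted order."""
        return sorted(self.__extension_objects)


manager = ExtensionManagerSketch()
print(manager.extension_ids)
```

With no directory to scan, the set of extensions is known when the manager is constructed, so the only loading failure to guard against in this sketch is a duplicated identifier.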

What was left? The apply_configuration function was slightly changed to handle the extensions, but the same idea of an “enabled extensions” list was maintained. That made it easy to keep the command line logic for listing the extensions, with only slight modifications. Similarly, the logic for displaying the information on a specific extension only required slight changes, mostly in the formatting of the data to be displayed.

The determination of the enabled state of the extensions was also kept mostly intact, but with two key changes. Whereas plugins have multiple identifiers, the plugin id and one or more plugin names, extensions only have one identifier. That simplified some code from six lines down to one line. At least for now, another change is that I commented out the code to allow the extensions to be directly enabled and disabled from the command line. To be honest, I am not sure if I feel the need to enable extensions with the same flexibility and frequency as with plugins. I will think about that and get back to that later.
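To make the single-identifier lookup concrete, here is a minimal sketch that assumes the configuration is a plain nested dictionary. The helper `is_extension_enabled` is hypothetical. The key shape follows the `extensions.front-matter.enabled` and `extensions.pragmas.enabled` property names and the defaults that appear in the project code quoted later in this article.

```python
def is_extension_enabled(configuration, extension_id, enabled_by_default):
    """Look up `extensions.<id>.enabled`, falling back to the default."""
    section = configuration.get("extensions", {}).get(extension_id, {})
    value = section.get("enabled", enabled_by_default)
    # Ignore non-boolean values rather than guessing at their meaning.
    return value if isinstance(value, bool) else enabled_by_default


# Hypothetical configuration: only front-matter is set explicitly.
configuration = {"extensions": {"front-matter": {"enabled": True}}}

# Default enabled states, keyed by each extension's single identifier.
defaults = {"front-matter": False, "pragmas": True}

enabled_extensions = [
    extension_id
    for extension_id, default in defaults.items()
    if is_extension_enabled(configuration, extension_id, default)
]
print(enabled_extensions)
```

Because there is only one identifier per extension, the lookup is a single dictionary path. A plugin, with an id plus one or more names, would have to check each of them.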

Wrapping It Up

All in all, it took a good solid six to eight hours to get everything coded and the tests all passing. To round everything out, I decided to include a debug extension (extension_one.py) and a roughed-out module for each extension listed in the GitHub Flavored Markdown specification. The debug extension was just a tricky, hidden way in which I could test some of the more difficult to reach places in the Extension Manager.

As to the placeholder extensions, I just felt that it was a good time to get those features, or something standing in for those features, in the project. I did not have any plans of adding them anytime soon, but I did want to show that I had plans to add them. I also did some mental exercises and walked through how I might implement each one of them. I did not do this out of a need to design those extensions, but to ensure that the work I was doing on the Extension Manager could support those hypothetical designs.

There really was not much to test, because this was largely a reorganization of features. As such, there were only a couple of small changes to the copies of the tests from the Plugin Manager, and a couple of extra tests to fill in some code coverage blind spots.

With that hard work done, it was on to the next thing!

Welcoming Code Into The Fold

Next up was a simple set of refactorings to try and come up with a set of behaviors that would help constrain the extensions so that they could be treated as a class of objects instead of a collection of distinct objects.

To start, I looked at the PluginDetails object from the Plugin Manager and created an ExtensionDetails class.

    def __init__(
        self,
        extension_id,
        extension_name,
        extension_description,
        extension_enabled_by_default,
        extension_version,
        extension_interface_version,
        extension_url=None,
        extension_configuration=None,
    ):

Other than replacing the text plugin with extension and replacing plugin_names with extension_name, no changes were needed. In addition, instead of using the pattern of having a FoundPlugin class as the intermediary for this information, I decided to implement two separate member variables, one for each concern. The self.__extension_objects variable contains a dictionary of the extension classes, while the self.__extension_details variable contains a dictionary of their ExtensionDetails classes.
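The sketch below shows one way the two-dictionary arrangement could look. It reuses the constructor signature shown above, but everything else is an assumption made for illustration: the attribute assignments, the `ExtensionRegistrySketch` class, the `get_details` method on the extension, and the choice to store extension instances.

```python
class ExtensionDetails:
    """Sketch of a details holder using the constructor shown above."""

    def __init__(
        self,
        extension_id,
        extension_name,
        extension_description,
        extension_enabled_by_default,
        extension_version,
        extension_interface_version,
        extension_url=None,
        extension_configuration=None,
    ):
        self.extension_id = extension_id
        self.extension_name = extension_name
        self.extension_description = extension_description
        self.extension_enabled_by_default = extension_enabled_by_default
        self.extension_version = extension_version
        self.extension_interface_version = extension_interface_version
        self.extension_url = extension_url
        self.extension_configuration = extension_configuration


class DebugExtension:
    """Hypothetical extension; get_details() is an assumed method name."""

    def get_details(self):
        return ExtensionDetails(
            extension_id="debug-extension",
            extension_name="Debug Extension",
            extension_description="Illustration only.",
            extension_enabled_by_default=False,
            extension_version="0.0.0",
            extension_interface_version=1,
        )


class ExtensionRegistrySketch:
    """Keeps objects and details in two dictionaries keyed by extension id."""

    def __init__(self):
        self.__extension_objects = {}
        self.__extension_details = {}

    def register(self, extension):
        details = extension.get_details()
        self.__extension_objects[details.extension_id] = extension
        self.__extension_details[details.extension_id] = details

    def get_object(self, extension_id):
        return self.__extension_objects[extension_id]

    def get_details(self, extension_id):
        return self.__extension_details[extension_id]


registry = ExtensionRegistrySketch()
registry.register(DebugExtension())
print(registry.get_details("debug-extension").extension_name)
```

Both dictionaries share the same key, so the pair stays consistent as long as every write goes through a single method such as `register`.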

To be honest, I am not sure which approach is cleaner: having a class that is the same as another class with one extra field, or having two dictionaries. At the very least, I think I want to come to a resolution on these approaches and unify them going forward. But to do that, I need to see how both perform and decide.

And To Finish Up

Given all that work to get everything extension related into the new Extension Manager class, there was one small bit of work left to do. None of it was terribly difficult, but for the sake of neatness, I believe it was all required.

First, I took the existing content of the __init__ function of the ParseBlockPassProperties class:

    def __init__(self, properties):
        self.__properties = properties
        self.__front_matter_enabled = self.__properties.get_boolean_property(
            "extensions.front-matter.enabled", default_value=False
        )
        self.__pragmas_enabled = self.__properties.get_boolean_property(
            "extensions.pragmas.enabled", default_value=True
        )

and replaced it with a more Extension Manager friendly:

    def __init__(self, extension_manager):
        self.__front_matter_enabled = extension_manager.is_front_matter_enabled
        self.__pragmas_enabled = extension_manager.is_linter_pragmas_enabled
        self.pragma_lines = None

Once again, nothing difficult, but it was important to me to get this main switch for the two existing extensions moved over.
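For readers who want to see the new constructor in action, here is a small self-contained sketch. `StubExtensionManager` and the two read-only properties are assumptions added for the example, since the article only shows the `__init__` function.

```python
class StubExtensionManager:
    """Hypothetical stand-in exposing only the two flags read below."""

    def __init__(self, front_matter_enabled, linter_pragmas_enabled):
        self.is_front_matter_enabled = front_matter_enabled
        self.is_linter_pragmas_enabled = linter_pragmas_enabled


class ParseBlockPassProperties:
    """Holds flags that do not change while a document is parsed."""

    def __init__(self, extension_manager):
        self.__front_matter_enabled = extension_manager.is_front_matter_enabled
        self.__pragmas_enabled = extension_manager.is_linter_pragmas_enabled
        self.pragma_lines = None

    # The accessors below are assumptions; the article shows only __init__.
    @property
    def is_front_matter_enabled(self):
        return self.__front_matter_enabled

    @property
    def is_pragmas_enabled(self):
        return self.__pragmas_enabled


properties = ParseBlockPassProperties(StubExtensionManager(True, False))
print(properties.is_front_matter_enabled, properties.is_pragmas_enabled)
```

The flags are copied once at construction, which fits the “moderately static” idea: the parsing functions receive one object and never have to ask the manager again during the pass.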

After that, I moved the compile_pragmas function from the Plugin Manager and the look_for_pragmas function from the Container Block Processor over to the PragmaExtension class. In the process of moving that code over, I changed the identifier for the extension from pragma to linter-pragmas.

Why Was This Important?

There is a saying “if it ain’t broke, don’t fix it.” Most often, this is used to indicate that if things do not require any effort to keep going forward, do not disturb them. But that saying really relies on one important decision point: is the thing in question broken?

At some point soon, I do want to implement the other extensions outlined in the GFM Specification. While some of them do not have a lot of benefit in my mind (strikethrough), there are others that I do assign a large benefit to (tables). It makes sense to do some of that leg work now, knowing that I will use it later.

In my mind, there is another, more important factor to consider. That factor concerns the cost to implement some of the features in the Plugin Manager without going all in. One of the reasons that I created the Extension Manager in the way I did was the low cost associated with copying it from the Plugin Manager. I was able to reuse most of the application code and the scenario test code in the process. Sure, I had to change it to work with extensions instead of plugins, but I saved a lot of time and effort by taking that approach.

If I did not take that approach, I would have had to develop another type of manager with its own quirks. That means starting out with a new set of requirements and tests that I needed to satisfy. That would take time and effort. And in the end, unless plugin rules and extensions do not have the overlap that I believe they do, I would probably want to collapse them into one paradigm anyways.

I was confident that the overlap was sufficient to make copying the Plugin Manager code a smart move. Now it is just a matter of time to prove to myself that it was the right choice.

What Was My Experience So Far?

A friend of mine who is a writer has often reminded me that it is the things that we find most difficult to do that are the most rewarding. I am seriously not a writer, though I do try and write to the best of my ability. While I am on the road to being a “capital-W” writer at some point, should I choose to do so, I still have a lot to learn.

But I do find it rewarding. Yes, I even find writing documentation rewarding. I find writing these weekly articles rewarding. Part of that is because I find satisfaction in helping people. What else is documentation than helping people understand or use a particular object or project?

And that is the key for me: it helps people… including myself. This process of adding an Extension Manager during this beta release period was not a decision that I made lightly. It could have gone horribly wrong. But it did not because I had a number of support structures in place, including documentation on the extensions. It was in walking through them that I figured out I needed to elevate extensions, and it was in walking through them again that I arrived at a coherent implementation of extensions.

So now I have an Extension Manager that cleanly takes care of the extensions. It exposes a common command line interface with the Plugin Manager, so there is synergy there. It is not as full featured as the Plugin Manager, but it does not need to be. And I am happy with where it landed!

What is Next?

With that refactoring out of the way, I thought I would have some “fun” in the next week and try to solve an issue I have had for a while: making sure that the parser can handle transitions back and forth between lists and block quotes.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
