Summary

In my last article, I talked about some of my requirements for configuration and how I applied those requirements to the PyMarkdown project. In this article, I talk about how I integrated that configuration object more readily into the PyMarkdown project, revamping the command line interface to match that newly upgraded configuration object.

Introduction

It might seem to others that I left the command line for last because I forgot about it. Nothing could be further from the truth. For any kind of non-graphical utility, the command line is the central hub from which the application gets executed in various ways and forms. Therefore, if a utility does not have the right command line interface, it can seriously impact the usability of that application. It is for that exact reason that I usually add a “placeholder” command line interface during my development phase, only adding the real interface when I have a solid idea of which features have made it into the final version of the project.

While that placeholder provides access to the features that the project needs, it often does so in a haphazard way. Now, with everything else dropping into place, it was time for me to finalize how configuration was being applied to the project and to start thinking seriously about what command line interface I wanted the project to have. After all, I am getting really close to having met my own requirements for an initial beta release!

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 11 Mar 2021 and 25 Mar 2021.

Wiring Up The New Properties Class To The Project

After the work detailed in my last article, the project now has a file-based configuration via the ApplicationProperties class that can handle the more complex needs of the project. While I did refer to the process of applying ordering to the different sources of configuration in the section on Ordering, I had not wired it up at that point. I had added the code mentioned in that section as a simple test of whether the process worked, but I had not yet added fully functional code that followed that process to the project. It was time to do that.

Logging

The obvious first choice of configuration to add to the project was logging. It was also the trickiest in a certain sense. From the point at which the configuration is properly initialized and the log file and log level have been determined, everything works as everyone expects. However, there is a small window of time between the start of the application and that point where things are in a grey zone. During that window, loading the configuration file or determining the logging properties can itself log information that may be useful when debugging a scenario. With those systems dependent on configuration that was not yet initialized when I needed to debug them, I was left to hardwire logging workarounds directly into the code. I wanted a better way to handle those debug scenarios.

Looking at what I had available to use, I decided to imply extra context for the command line’s --stack-trace flag. Normally, my utilities use this flag to enable a mode where an exception that stops the application displays the full stack trace of the exception. As this flag defaults to disabled, the user just gets a simple message indicating what the error is. Most of the time, users do not care about where in the application something broke down, they just want it to work. If it is not working, they want to know why and if they can fix it themselves. However, when I look for root causes and I need to debug those issues, having a full stack trace of how that error was generated is always helpful. Following that logic, I determined that the only time a user might need to debug the loading of the configuration file is if they were debugging one of those scenarios that the --stack-trace flag was meant for.
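As a rough illustration of that behavior, here is a minimal sketch of how such a flag can drive error reporting. The function name and structure are my own invention for this example, not the project's actual code:

```python
import traceback


def format_error(raised_exception, show_stack_trace):
    # With --stack-trace enabled, report the full traceback for debugging;
    # otherwise, keep the output to a simple, user-friendly message.
    if show_stack_trace:
        return "".join(
            traceback.format_exception(
                type(raised_exception),
                raised_exception,
                raised_exception.__traceback__,
            )
        )
    return f"Error: {raised_exception}"


print(format_error(ValueError("bad configuration"), False))
```

Most users only ever see the short second form; the first form exists for the rare debugging session where the full context matters.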

As such, I added some code at the front of the main function to handle the setup for the logging system, using the --stack-trace flag in a useful manner:

    base_logger = logging.getLogger()
    base_logger.setLevel(
        logging.DEBUG if self.__show_stack_trace else logging.WARNING
    )

    if args.configuration_file:
        ApplicationPropertiesJsonLoader.load_and_set(
            self.__properties, args.configuration_file, self.__handle_error
        )

    effective_log_level = (
        args.log_level if args.log_level else self.default_log_level
    )
    log_level_to_enact = PyMarkdownLint.available_log_maps[effective_log_level]

To enable debug logging before the log level and log file are set up properly, the log level is set to DEBUG if the --stack-trace flag is present and WARNING otherwise. Then the configuration file is loaded and the final log level to enact is calculated. The important part of this is that I now have a simple switch that allows me to debug the configuration initialization without resorting to rewriting code. That is a bonus I like!

With that code in place, I removed the __load_configuration_and_apply_to_plugins function and the load_json_configuration function. I replaced those functions with the new __apply_configuration_to_plugins function that performed most of the same work. The big change here was that instead of being passed a simple dictionary object, the apply_configuration function of the PluginManager class was now being passed an instance of the ApplicationProperties class. It was time to wire that up as well.

Plugin Manager

Starting with the FoundPlugin class, I decided to make all those properties read-only by making the fields private and exposing those fields using properties. At the same time, I renamed arguments for the constructor from enable_rules and disable_rules to enable_rules_from_command_line and disable_rules_from_command_line. While the old names were okay, I wanted to make sure the source of their values was clearer. I also cleaned up the registration of the plugins to ensure that things were consistent. I added the following regular expressions:

    __id_regex = re.compile("^[a-z]{2,3}[0-9]{3,3}$")
    __name_regex = re.compile("^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]$")

to make the id field two or three lower case letters followed by three numbers and to limit the characters for the name field. I figured that since I was there, I might as well leave the code better than I found it.
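To show what those expressions accept and reject, here is a quick check against some sample identifiers; the sample values are mine, chosen to exercise the rules described above:

```python
import re

id_regex = re.compile("^[a-z]{2,3}[0-9]{3,3}$")
name_regex = re.compile("^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]$")

# two or three lower case letters followed by exactly three digits
assert id_regex.match("md002")
assert not id_regex.match("MD002")

# alphanumerics and hyphens, with no leading or trailing hyphen
assert name_regex.match("first-heading-h1")
assert not name_regex.match("-first-heading")
```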

Changing The Plugins

After that cleanup, I proceeded to make the necessary changes to accompany the changes made in the main function. As the PluginManager instance is just a container used to provide access to the various registered plugins, the instance itself has no need for configuration. This kept things simple:

    plugin_specific_facade = None
    for next_key_name in next_plugin.plugin_identifiers:
        plugin_section_title = f"{PluginManager.__plugin_prefix}{properties.separator}{next_key_name}{properties.separator}"
        section_facade_candidate = ApplicationPropertiesFacade(
            properties, plugin_section_title
        )
        if section_facade_candidate:
            plugin_specific_facade = section_facade_candidate
            break

Going through the list of identifiers for the plugin, the PluginManager instance creates a test instance of an ApplicationPropertiesFacade object, initializing it with the property string leading up to that point. For the md002 plugin, if the id field is used, the section title is plugins.md002., with similar titles for each of the comma-separated values in the name field, plugins.first-heading-h1. and plugins.first-header-h1.. Once created, if that test facade instance reports that it has any properties, it is considered to be the best candidate, and any other configuration for that plugin by its other names is ignored.1 That best candidate is then passed in to the plugin’s set_configuration_map function, where it is used by the initialize_from_config function to set the actual configuration for that plugin.

The Fallout From Those Changes

With a new type of object being passed for configuration, the plugins themselves required other changes to work properly. The first change that this required was changing every plugin to utilize the ApplicationPropertiesFacade instance that was now being passed in as an argument instead of the old dictionary instance. Using the changes for rule md002 as an example, the code in the initialize_from_config function changed from:

    self.__start_level = self.get_configuration_value("level", default_value=1)

to:

    self.__start_level = self.plugin_configuration.get_integer_property(
        "level", default_value=1
    )

The second change that happened was an appropriate change in test data. Prior to this change, the test configuration for the md002 plugin was usually2 presented in the form:

{"MD002": {"level": 2}}

To accommodate the addition of the plugins namespace, each test configuration provided in the supplied_configuration variable needed to be changed to add that namespace, as follows:

{"plugins": {"md002": {"level": 2}}}
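That mechanical change amounts to wrapping the old configuration under the new plugins key. A hypothetical helper (not part of the project, just an illustration of the transformation) would look like:

```python
def add_plugins_namespace(old_configuration):
    # Wrap the legacy {"MD002": {...}} style configuration under the new
    # "plugins" namespace, lowercasing the rule identifiers along the way.
    return {
        "plugins": {
            rule_id.lower(): rule_settings
            for rule_id, rule_settings in old_configuration.items()
        }
    }


print(add_plugins_namespace({"MD002": {"level": 2}}))
# {'plugins': {'md002': {'level': 2}}}
```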

With both changes in place, the only thing left to hook up was the enabling and disabling of plugins from the command line. This change replaced the contents of the __determine_if_plugin_enabled function with the slightly more complicated:

    new_value = None
    for next_identifier in plugin_object.plugin_identifiers:
        if next_identifier in command_line_disabled_rules:
            new_value = False
            break
    if new_value is None:
        for next_identifier in plugin_object.plugin_identifiers:
            if next_identifier in command_line_enabled_rules:
                new_value = True
                break
    if new_value is not None:
        plugin_enabled = new_value

On entry into the __determine_if_plugin_enabled function, the plugin_enabled argument is already set to its default value. As such, this function only needs to determine if there is an override to use and to apply that override. This function was changed to treat both the id field and each comma-separated value of the name field as equal in terms of being identifiers. While this initially only applied to the name of the configuration section in the JSON file, it made sense to use it here as well. In this case, if any of those identifiers are found in the command_line_disabled_rules set, the plugin is disabled. If no matches are found in the disabled set, the same process is repeated for the command_line_enabled_rules set. Only if a matching value is found in one of those sets is the plugin_enabled variable overridden with the new value.

To me, this just made sense. If a user prefers to disable a rule using its id, they can do that. If a user prefers to enable a rule using one of its names, they can also do that. That flexibility appealed to me, and I hope it appeals to users.

Testing

Hopefully, any readers of my website know that I use test-driven development as my main process for writing quality code. Given that process, I added the new tests as I went and before adding the functionality required to complete those tests. While it was a bit more difficult to do with all these changes going on, it was pivotal to the way in which I work. And it was because of that process that I remembered an imbalance that I had put off until somewhere near this point in the project. I had done a lot of work on configuration files, but almost none on command line configuration.

What About The Command Line?

With the ApplicationProperties mechanism wired into the project, it was time to tackle the other part of that mechanism: the command line. Up to this point, I had a command line interface to the project, but it was a command line that was put in place by default. I had never sat down and really thought through what style of command line interface I wanted for the project. It was time to do that.

Which Way To Go?

Using various utilities on my system, I found a handful of different styles of interacting with the user on the command line. But the one that really stuck with me is what I refer to as the “Git” style of command line organization. Similar to my discussion in my last article on complex configuration, this style of command line organization is complex: it is built from building blocks of commands and options, allowing more complex command lines to be constructed.

For example, take this command:

git -c ab=cd status --long setup.py

The main part of the command is the git at the start of the line. The -c ab=cd part of that line is a modifier to that git command. Following that, the status part of that line specifies the status subcommand, which in turn has its own modifier --long to alter how that status subcommand is presented. Finally, the text setup.py is another modifier to the subcommand, asking that only information for the file setup.py be provided.

To me, this makes very good sense for the PyMarkdown project and the command line interface that I want for it. While the main “scanning” workflow will be the most used workflow, there are other workflows that I want to expose to allow users to find out information about the results of the scan. The one that came immediately to mind was being able to show the user more information about the installed plugins. As such, a simple version of the command line interface would not work. I needed a command line interface that supported the “scanning” workflow as well as the “show information about plugins” workflow.

To be clear, this type of command line interface is not required for every project that I do, but it was required for this one. The interface had to support multiple workflows, which in turn made the command line processing complex, just like the configuration.

Wiring Up Subcommands

Once I had that figured out, it was time to go to work. When I took my initial look at the __parse_arguments function, I was afraid that I was going to have to rewrite the entire function. However, on a closer look, I discovered that it was going to be more of a reorganization than a rewrite. Except for the --version and --list-files options, the existing options were all still valid for the application itself. It was only once those options were added to the argparse object that things changed.

To handle this kind of scenario, the argparse object handles subcommands by using subparsers. So, to add a subcommand to handle the scan workflow, I used the following code:

    subparsers = parser.add_subparsers(dest="primary_subparser")
    new_sub_parser = subparsers.add_parser(
        "scan", help="scan the Markdown files in the specified paths"
    )

Once I had that subparser created, I added the rest of the arguments that were specific to that subcommand:

    new_sub_parser.add_argument(
        "-l",
        "--list-files",
        dest="list_files",
        action="store_true",
        default=False,
        help="list the markdown files found and exit",
    )
    new_sub_parser.add_argument(
        "paths",
        metavar="path",
        type=str,
        nargs="+",
        help="one or more paths to scan for eligible Markdown files",
    )

For this subcommand, the --list-files option that was removed from the base argparse parser was placed under the scan subparser along with the paths variable to hold the paths of Markdown files to scan. The plugins subcommand was added in the same manner:

    new_sub_parser = subparsers.add_parser("plugins", help="plugin commands")
    new_sub_parser.add_argument(
        "-p",
        "--list-plugins",
        dest="list_plugins",
        action="store_true",
        default=False,
        help="list the available plugins and exit",
    )
    new_sub_parser.add_argument(
        "paths",
        metavar="path",
        type=str,
        nargs="+",
        help="one or more paths to scan for eligible Markdown files",
    )

Finally, to implement the --version option (which was really a subcommand disguised as an option in the previous version of the command line interface), a subcommand was added for it as well:

    subparsers.add_parser("version", help="version of the application")

With all those options in place, I executed the application from the command line and noticed that I needed to do a bit more work myself now that I was using subcommands. I needed to handle the case where no subcommand was present, as well as to handle the version subcommand. After a bit of experimentation, I came up with the following code:

    if not parse_arguments.primary_subparser:
        parser.print_help()
        sys.exit(2)
    elif parse_arguments.primary_subparser == "version":
        print(f"{self.__version_number}")
        sys.exit(0)
    return parse_arguments

Back in the main function, implementing the scan and plugins subcommands was easy, as all the other subcommands were already taken care of. Adding the following code:

    if args.primary_subparser == 'plugins':
        self.__initialize_plugins(args)
        return_code = self.__plugins.handle_argparse_subparser(args)
        sys.exit(return_code)

allowed me to redirect the plugins workflow to the PluginManager class. If the program’s flow was not redirected at that point, the scan workflow would execute by default.

With the command line now organized in a more orderly fashion, it was time to clean up the use of those workflows.

Revisiting Logging

In the Ordering section of my last article, I provided an example of how ordering was important when considering how to apply configuration to a project. Specifically, that code snippet was from the following function:

def __initialize_logging(self, args):

    new_handler = None
    effective_log_file = args.log_file
    if effective_log_file is None:
        effective_log_file = self.__properties.get_string_property("log.file")

    if effective_log_file:
        new_handler = logging.FileHandler(effective_log_file)
        logging.getLogger().addHandler(new_handler)

    effective_log_level = args.log_level if args.log_level else None
    if effective_log_level is None:
        effective_log_level = self.__properties.get_string_property(
            "log.level", valid_value_fn=PyMarkdownLint.log_level_type
        )
    if effective_log_level is None:
        effective_log_level = self.default_log_level

To keep things neat, I moved all the logging-related initialization code into one of two functions, __set_initial_state and __initialize_logging. The __set_initial_state function simply took the initial logger setup and the loading of the configuration file data and placed them in a much simpler function that is easier to read. The __initialize_logging function was where all the interesting log initialization really happened, working out what the actual values to be used were.

This new function followed the process that I had outlined in my last article. Both the log level and the log file can be set from either the command line or the configuration file. In both cases, the command line value is processed first, followed by the configuration property, followed by any system default.
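That ordering can be captured in one small helper. This is a sketch of the precedence logic only, under my own naming, not the project's actual function:

```python
def resolve_setting(command_line_value, configuration_value, default_value):
    # The command line wins first, then the configuration file property,
    # and finally the system default.
    if command_line_value is not None:
        return command_line_value
    if configuration_value is not None:
        return configuration_value
    return default_value


# command line overrides everything else
assert resolve_setting("DEBUG", "INFO", "WARNING") == "DEBUG"
# with no command line value, the configuration file value is used
assert resolve_setting(None, "INFO", "WARNING") == "INFO"
# with neither supplied, the default applies
assert resolve_setting(None, None, "WARNING") == "WARNING"
```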

Looking at these values and their sources after I completed them, they just looked right. More than that, they looked usable. If no configuration was provided, there were defaults. If a configuration was provided, any supplied values would be taken from there, if they were present.

Fixing An Old Issue

Back in the section titled Changing The Plugins, I presented something that was just a little off. I presented it that way because at the time that I wrote that code, it looked right and was passing every existing test. It was only at this point, when I was more thoroughly testing the PluginManager configuration, that I noticed that the first ApplicationPropertiesFacade tried was always being selected.

Yeah, I needed to fix that. The first thing I did was to thin out that for loop and move most of that code into a new __find_configuration_for_plugin function. That changed the relevant code to:

for next_plugin in self.__enabled_plugins:
    try:
        plugin_specific_facade = self.__find_configuration_for_plugin(
            next_plugin, properties, always_return_facade=True
        )

...

def __find_configuration_for_plugin(
    cls, next_plugin, properties, always_return_facade=False
):
    plugin_specific_facade = None
    first_facade = None
    for next_key_name in next_plugin.plugin_identifiers:
        plugin_section_title = f"{PluginManager.__plugin_prefix}{properties.separator}{next_key_name}{properties.separator}"
        section_facade_candidate = ApplicationPropertiesFacade(
            properties, plugin_section_title
        )
        if not first_facade:
            first_facade = section_facade_candidate
        if section_facade_candidate.property_names:
            plugin_specific_facade = section_facade_candidate
            break

    if always_return_facade and not plugin_specific_facade:
        plugin_specific_facade = first_facade
    return plugin_specific_facade

From there, two small changes were required to fix this function. The first change was to fix the problem of the first facade always being selected. This was accomplished by adding the property_names function to the facade and checking to see if at least one property was found. If one property was found, then the initial intention of the search was satisfied, and a configuration entry was found. This is what I had intended to do with the initial implementation, so it felt good to correct this issue.

But that caused another issue to manifest, one that was never an issue before the first issue was fixed: what if no configuration values were ever found? As there are cases where the application needs to know nothing was found and cases where it just wants some default behavior, I added the always_return_facade argument. With this argument set to True, the first facade is returned instead of a None object.

With that part of the configuration fixed, it was now time to add the plugin workflow.

Boring Support For Plugins

One of the main reasons that I added a complex command line interface was to support multiple workflows. As I felt that the scan workflow was now developed enough to test, it was time to implement the plugins workflow. In my design for this workflow, I wanted to support a listing of all plugins and summary information, as well as more focused information for a specific plugin. Much like how I added the scan and plugins subparsers to the main argparse instance, I added the list and info subparsers to the plugins subparser.

From there, it was just a matter of handing off control at the right spot to each of those processors, as such:

def handle_argparse_subparser(self, args):
    """
    Handle the parsing for this subparser.
    """
    subparser_value = getattr(args, PluginManager.__root_subparser_name)
    return_code = 0
    if subparser_value == "list":
        self.__handle_argparse_subparser_list(args)
    elif subparser_value == "info":
        return_code = self.__handle_argparse_subparser_info(args)
    else:
        PluginManager.__argparse_subparser.print_help()
        sys.exit(2)
    return return_code

So why did I start the title for this section with the word “Boring”? It is because once I had added subparsers to the main argparse instance, this was just copying that pattern and pasting it onto the plugins subparser. The list subcommand was a simple loop through all available plugins, adding values to a string to display. The info command was just locating a given plugin and displaying the information for it. After everything else that has been added to this project, that was quite boring. Necessary, but boring.

Almost Boring

Two exceptions did stand out though. The first one was the use of the Columnar library for output. This library is very effective at determining how much screen real estate the application has and sizing columns to fit that screen size. That was very useful, as it saved me from having to create that functionality myself.

The second exception was mimicking the behavior of the glob library on in-memory strings. Given a command line of pymarkdown plugins list plugin-id, it should be obvious that only the plugin with the id or name plugin-id should be listed. But what about the command line pymarkdown plugins list md??9? If the strings were filenames, then any plugins that started with md, contained two more arbitrary characters, and ended with the digit 9 would match. Why could I not do that with plugins and their identifiers?

Thankfully, due to my earlier work to reduce the allowable text for plugin ids and plugin names to a smaller subset of values, this effort was made easier. As I did not have to worry about the * or ? characters appearing natively in the strings that I was looking at, the regular expression to satisfy my required glob functionality was:

    list_re = re.compile(
        "^" + args.list_filter.replace("*", ".*").replace("?", ".") + "$"
    )

For those readers not familiar with regular expressions, the ^ at the start of the expression and the $ at the end of the expression are anchors. Together, these specify that the expression between those characters must match the entire string, rather than just being found somewhere inside of it. For the rest of the expression, I used the . character in two forms. As the . character represents any single character in the expression, it was a perfect match for the ? glob character. Extending that, the * glob character represents any number of arbitrary characters; to match that in the expression, the sequence .* matches zero or more instances of any character.
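Putting that together, the translation and its anchoring behave like this; the wrapper function is my own, built around the expression shown above:

```python
import re


def build_list_matcher(list_filter):
    # Translate the glob-style filter into an anchored regular expression:
    # "*" matches any run of characters, "?" matches exactly one character.
    return re.compile("^" + list_filter.replace("*", ".*").replace("?", ".") + "$")


matcher = build_list_matcher("md??9")
assert matcher.match("md019")       # md + two characters + the digit 9
assert not matcher.match("md012")   # does not end with 9
assert not matcher.match("xmd019")  # the ^ anchor rejects a prefixed match
```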

After satisfying all the scenario tests I had written, and adjusting output to match the columnar output, the plugins subcommand tests were all passing. Things were good with those scenario tests. I now had to expand my scope to all the scenario tests in the project.

Altered Tests

When the change was made to the command line, I expected most of the existing scenario tests that deal with the command line to fail, and I was not disappointed. For most of the existing tests, I just needed to add the scan keyword to the right position in the arguments list, as that was the workflow that already had tests before this point. I stopped counting those changes after I passed the count of 20.

But it was a very useful task. As I progressed through the tests, I was able to see how specifying the workflow or subcommand was enhancing the tests. It just gave each test a bit more context that I felt was missing before. I was able to find a couple of small issues that I would have otherwise missed because of the organization.

To be blunt, between the command line changes and tests to cover them, it just felt like the project was that much more complete.

Lather, Rinse, Repeat

After all that work, there were still little things that I needed to work on. I disliked how some of the code was organized, so I went back and tried to organize the code more thoughtfully along usage lines. I pored over the coverage report and looked for missing coverage, adding scenario tests to cover any missed areas. Just general cleanup stuff.

Adding Version Information For Plugins

With that work in place, it was time to see if I was able to add something useful to the project’s configuration without many issues. The something I had in mind was better version information for the plugins. From a plugin point of view, I figured it would be useful to know what specific version was being attributed to that specific plugin. From a plugin manager point of view, it would also be useful to be able to version the actual interface itself. Both would be useful going forward, and useful to relate to the user of the application.

Changing the FoundPlugin class to accept these two new version fields was easy, adding private fields and property functions to access them. From there, I added those two parameters to each of the existing plugins, setting active plugins to 0.5.0 and 1 and setting inactive or “templated” plugins to 0.0.0 and 1. After adding some support in the __register_individual_plugin function to ensure that both versions were in acceptable formats, I was ready to make the big change to expose that information.
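I do not show the actual __register_individual_plugin check here. As a sketch of what “acceptable formats” could mean, a simple regular expression check along these lines would do the job; the function name and the expression are my assumptions, not the project's code:

```python
import re

# accept dotted sequences of digits, such as "0.5.0" or "1"
version_regex = re.compile(r"^[0-9]+(\.[0-9]+)*$")


def is_acceptable_version(version_text):
    # Reject anything that is not digits separated by single dots.
    return bool(version_regex.match(version_text))


assert is_acceptable_version("0.5.0")
assert is_acceptable_version("1")
assert not is_acceptable_version("0.5.")
```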

Okay, maybe big change is a bit misleading. In the __handle_argparse_subparser_list function, I changed the setting of the display_row variable to:

display_row = [
    next_plugin_id,
    ", ".join(next_plugin.plugin_names),
    str(next_plugin.plugin_enabled_by_default),
    str(is_enabled_now),
    next_plugin.plugin_version,
]

and the setting of the headers variable was changed to:

headers = ["id", "names", "enabled (default)", "enabled (current)", "version"]

These were not big changes, but that was the point. This was a relatively small change and if things were organized properly, it should have required a relatively small number of changes. If that were not the case, I would have been worried.

And I was happy, as I had a good workflow to get output on the plugins like the following:

  ID     NAMES                                ENABLED (DEFAULT)  ENABLED (CURRENT)  VERSION

  md001  heading-increment, header-increment  True               True               0.5.0
  md002  first-heading-h1, first-header-h1    False              False              0.5.0
  md003  heading-style, header-style          True               True               0.5.0

What Was My Experience So Far?

While my effort to have a completed configuration system with command line support is not fully realized, this work moves the command line system a lot further along toward being the beacon of the PyMarkdown project. I realize that I still need a generic way of specifying override values for those values specified in the configuration file, but this is still a big step forward. Now that I have the configuration and command line systems in place, I can spend a bit of reflective time looking for pieces of those systems that I missed while working on other issues.

I really am feeling that the project is coming together. While it has never felt like a hodgepodge of functionality, I always had a feeling that it was just not quite ready to show to the public. With each of these little subsystems that I am focusing on, that feeling is going away and is being replaced with a feeling of confidence that it is ready. In my books, that is a really good feeling to have.

What is Next?

After nicely wrapping up configuration and the command line, there was only one thing left to do before I seriously thought about a beta release. That was to resolve the remaining items in the Priority 1 section of the issues list.


  1. Yes, I know that is not what this code snippet actually does. I found that out by adding tests later on in the process. I fixed this issue later, in the section named Fixing An Old Issue.

  2. Unless I was purposefully creating bad data to test failure cases, that is the form that I passed into the plugins for configuration. 
