Summary
In my last article, I talked about some of my requirements for configuration and how I applied those requirements to the PyMarkdown project. In this article, I talk about how I integrated that configuration object more readily into the PyMarkdown project, revamping the command line interface to match that newly upgraded configuration object.
Introduction
It might seem to others that I left the command line to last because I forgot about it. Nothing could be further from the truth. For any kind of non-graphical utility, the command line is the central hub from which the application gets executed in its various ways and forms. Therefore, if the utility does not have the right command line interface, it can seriously impact the usability of that application. It is for that exact reason that I usually add a “placeholder” command line interface during my development phase, only adding the real interface when I have a solid idea of which features have made it into the final version of the project.
While that placeholder provides access to the features that the project needs, it often does so in a haphazard way. Now, with everything else dropping into place, it was time for me to finalize how configuration was being applied to the project and to start thinking seriously about what command line interface I wanted the project to have. After all, I am getting really close to having met my own requirements for an initial beta release!
What Is the Audience for This Article?
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 11 Mar 2021 and 25 Mar 2021.
Wiring Up The New Properties Class To The Project
After the work detailed in my last article, the project now has a file-based configuration
via the ApplicationProperties
class that can handle the more complex needs
of the project. While I did refer to the process of applying ordering to the different
sources of configuration in the section on
Ordering,
I had not wired it up at that point. I had added the code mentioned in that section as
a simple test of whether the process worked, but I had not yet added fully functional code
that followed that process to the project. It was time to do that.
Logging
The obvious first choice for configuration to add to the project was logging. It was also the trickiest in a certain sense. From the point at which the configuration is properly initialized and the log file and log level have been determined, everything works as everyone expects. However, there is a slight amount of time between the start of the application and that point where things are in a grey zone. During that time, the loading of the configuration file itself or the determination of the logging properties can log information that may be of use when debugging a scenario. With those systems dependent on configuration that was not yet initialized when I needed to debug them, I was left to hardwire logging workarounds directly into the code. I wanted a better way to handle those debug scenarios.
Looking at what I had available to use, I decided to imply extra context for the command
line’s --stack-trace
flag. Normally, my utilities use this flag to enable a mode
where an exception that stops the application displays the full stack trace of the
exception. As
this flag defaults to disabled, the user just gets a simple message indicating
what the error is. Most of the time, users do not care about where in the application
something broke down, they just want it to work. If it is not working, they want to know
why and if they can fix it themselves.
However, when I look for root causes and I need to debug those issues, having a full
stack trace of how that error was generated is always helpful. Following that logic, I
determined that the only time a user might need to debug the loading of the configuration
file is if they were debugging one of those scenarios that the --stack-trace
flag was
meant for.
As such, I added some code at the front of the main
function to handle the setup for the
logging system, using the --stack-trace
flag in a useful manner:
base_logger = logging.getLogger()
base_logger.setLevel(
    logging.DEBUG if self.__show_stack_trace else logging.WARNING
)
if args.configuration_file:
    ApplicationPropertiesJsonLoader.load_and_set(
        self.__properties, args.configuration_file, self.__handle_error
    )
effective_log_level = (
    args.log_level if args.log_level else self.default_log_level
)
log_level_to_enact = PyMarkdownLint.available_log_maps[effective_log_level]
To enable debug logging before the log level and log file are set up properly, the log
level is set to DEBUG
if the --stack-trace
flag is present and WARNING
otherwise. Then,
the configuration file is loaded and the appropriate final log level is calculated.
However, the important part of this is that I now have a
simple switch that allows me to debug the configuration initialization without resorting
to rewriting code. That is a bonus I like!
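As an aside, the lookup at the end of that snippet implies a table that maps log level names onto the logging module’s numeric constants. The real available_log_maps table lives inside the PyMarkdownLint class and is not shown in the article, so the following is only a plausible sketch of its shape:

```python
import logging

# A plausible shape for the available_log_maps table used above; the real
# mapping lives inside the PyMarkdownLint class, so this is an assumption.
available_log_maps = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}

# Mirrors the last two lines of the snippet above.
effective_log_level = "WARNING"
log_level_to_enact = available_log_maps[effective_log_level]
assert log_level_to_enact == logging.WARNING
```

Keeping the valid names in one table also makes it easy to validate a user-supplied log level before trying to apply it.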
With that code in place, I removed the __load_configuration_and_apply_to_plugins
function
and the load_json_configuration
function. I replaced those functions with the new
__apply_configuration_to_plugins
function that performed most of the same tasks.
The big change here was that instead of being passed a simple dictionary object, the
apply_configuration
function of the PluginManager
class was now being passed an
instance of the ApplicationProperties
class. It was time to wire that up as well.
Plugin Manager
Starting with the FoundPlugin
class, I decided to make all those properties read-only by
making the fields private and exposing those fields using properties. At the same time,
I renamed the arguments for the constructor from enable_rules
and disable_rules
to
enable_rules_from_command_line
and disable_rules_from_command_line
. While the old
names were okay, I wanted to make sure the source of their values was clearer. I also
cleaned up the registration of the plugins to ensure that things were consistent. I
added the following regular expressions:
__id_regex = re.compile("^[a-z]{2,3}[0-9]{3,3}$")
__name_regex = re.compile("^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]$")
to make the id field two or three lower case letters followed by three numbers and to limit the characters allowed in the name field. I figured that since I was there, I might as well leave the code better than when I found it.
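To show what those two patterns accept and reject, here is a short runnable check using the expressions exactly as shown above:

```python
import re

# The two validation patterns from the text above, copied verbatim.
id_regex = re.compile("^[a-z]{2,3}[0-9]{3,3}$")
name_regex = re.compile("^[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]$")

# An id is two or three lowercase letters followed by exactly three digits.
assert id_regex.match("md002")
assert not id_regex.match("MD002")  # uppercase letters are rejected
assert not id_regex.match("md02")   # only two digits

# A name is alphanumeric with interior hyphens, never leading or trailing.
assert name_regex.match("first-heading-h1")
assert not name_regex.match("-first-heading")
```

Note that the name pattern also implies a minimum length of two characters, since the first and last characters are matched separately.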
Changing The Plugins
After that cleanup, I proceeded to make the necessary changes to accompany
the changes made in the main
function. As the PluginManager
instance is just a
container used to provide access to the various registered plugins, the instance itself
has no need for configuration. This kept things simple:
plugin_specific_facade = None
for next_key_name in next_plugin.plugin_identifiers:
    plugin_section_title = f"{PluginManager.__plugin_prefix}{properties.separator}{next_key_name}{properties.separator}"
    section_facade_candidate = ApplicationPropertiesFacade(
        properties, plugin_section_title
    )
    if section_facade_candidate:
        plugin_specific_facade = section_facade_candidate
        break
Going through the list of identifiers for the plugin, the PluginManager
instance creates
a test instance of an ApplicationPropertiesFacade
object, initializing it with the
property string leading up to that point. For the md002
plugin, if the id field is
used, the section title is plugins.md002.
, with similar titles for each of the
comma-separated values in the name field, plugins.first-heading-h1.
and
plugins.first-header-h1.
. Once created, if that test facade instance
reports that it has any properties, it is considered to be the best candidate, and any
other configuration for that plugin by its other names is ignored.1 That best
candidate is then passed in to the plugin’s set_configuration_map
function, where it is
used by the initialize_from_config
function to set the actual configuration for that
plugin.
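To make the facade idea concrete, here is a simplified stand-in: a read-only view over a flat property dictionary, restricted to keys that start with a given section prefix. The real ApplicationPropertiesFacade class has a richer API; this is only an illustrative sketch.

```python
# A simplified, hypothetical stand-in for the facade idea described above.
class SectionFacade:
    def __init__(self, properties, prefix):
        self._properties = properties
        self._prefix = prefix

    @property
    def property_names(self):
        # Every stored key that falls under this section's prefix.
        return [key for key in self._properties if key.startswith(self._prefix)]

    def get_integer_property(self, name, default_value=None):
        return self._properties.get(self._prefix + name, default_value)

flat_properties = {"plugins.md002.level": 2}

# A facade over a section that has configuration reports its properties...
facade = SectionFacade(flat_properties, "plugins.md002.")
assert facade.property_names == ["plugins.md002.level"]
assert facade.get_integer_property("level", default_value=1) == 2

# ...while a facade over an unused section reports none.
empty_facade = SectionFacade(flat_properties, "plugins.first-heading-h1.")
assert not empty_facade.property_names
```

The key point is that the facade itself is cheap to construct; whether a section is “present” is decided by asking it for its properties, not by building it.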
The Fallout From Those Changes
With a new type of object being passed for configuration, the plugins themselves required
other changes to work properly.
The first change that this required was changing every plugin to utilize the
ApplicationPropertiesFacade
instance that was now being passed in as an argument instead
of the old dictionary instance. Using the changes for rule md002
as an example, the code
in the initialize_from_config
function changed from:
self.__start_level = self.get_configuration_value("level", default_value=1)
to:
self.__start_level = self.plugin_configuration.get_integer_property(
    "level", default_value=1
)
The second change that happened was an appropriate change in test data. Prior to this
change, the test configuration for the md002
plugin was usually2 presented
in the form:
{"MD002": {"level": 2}}
To accommodate the addition of the plugins
namespace, each test configuration provided in
the supplied_configuration
variable needed to be changed to add that namespace, as
follows:
{"plugins": {"md002": {"level": 2}}}
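The relationship between that nested test dictionary and the dotted property names the facade consumes can be sketched with a small flattening helper. This helper is my own illustration, not project code:

```python
# Hypothetical helper showing how a nested test configuration maps onto
# dotted property names such as "plugins.md002.level".
def flatten(config, prefix=""):
    flat = {}
    for key, value in config.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{full_key}."))
        else:
            flat[full_key] = value
    return flat

supplied_configuration = {"plugins": {"md002": {"level": 2}}}
assert flatten(supplied_configuration) == {"plugins.md002.level": 2}
```

Seen this way, adding the plugins namespace to every test dictionary is just prepending `plugins.` to every property that a plugin reads.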
With both changes in place, the only thing left to hook up was the enabling
and disabling of plugins from the command line. This change replaced the contents of
the __determine_if_plugin_enabled
function with the slightly more complicated:
new_value = None
for next_identifier in plugin_object.plugin_identifiers:
    if next_identifier in command_line_disabled_rules:
        new_value = False
        break
if new_value is None:
    for next_identifier in plugin_object.plugin_identifiers:
        if next_identifier in command_line_enabled_rules:
            new_value = True
            break
if new_value is not None:
    plugin_enabled = new_value
On entry into the __determine_if_plugin_enabled
function, the plugin_enabled
argument is already
set to its default value. As such, this function only needs to determine if there is an
override to use and to apply that override. This function was changed to treat both the
id field
and each comma-separated value of the name field as equal in terms of being identifiers.
While this initially only
applied to the name of the configuration section in the JSON file, it made sense to use it
here as well. In this case, if any of those identifiers are found in the
command_line_disabled_rules
set, the plugin is disabled. If no matches are found in
the disabled set, the same process is repeated for the command_line_enabled_rules
set.
Only if a matching value is found in one of those sets is the plugin_enabled
variable
overridden with the new value.
To me, this just made sense. If a user prefers to disable a rule using its id, they can do that. If a user prefers to enable a rule using one of its names, they can also do that. That flexibility appealed to me, and I hope it appeals to users.
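The precedence described above, where a disable always wins over an enable and either may match any of a plugin’s identifiers, can be wrapped into a runnable sketch:

```python
# A runnable sketch of the precedence described above. The function name
# and free-standing form are mine; the project keeps this logic inside
# __determine_if_plugin_enabled.
def determine_if_enabled(plugin_identifiers, default_enabled,
                         disabled_rules, enabled_rules):
    new_value = None
    for next_identifier in plugin_identifiers:
        if next_identifier in disabled_rules:
            new_value = False
            break
    if new_value is None:
        for next_identifier in plugin_identifiers:
            if next_identifier in enabled_rules:
                new_value = True
                break
    # Only override the default when one of the sets matched.
    return default_enabled if new_value is None else new_value

identifiers = ["md002", "first-heading-h1", "first-header-h1"]

# Enabling by any name overrides a disabled-by-default plugin.
assert determine_if_enabled(identifiers, False, set(), {"first-heading-h1"}) is True
# Disabling by id wins, even if a name is also in the enabled set.
assert determine_if_enabled(identifiers, True, {"md002"}, {"first-heading-h1"}) is False
# With no matches, the default stands.
assert determine_if_enabled(identifiers, True, set(), set()) is True
```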
Testing
Hopefully, any readers of my website know that I use test driven development as my main process for writing quality code. Given that process, I added the new tests as I went, before adding the functionality required to complete those tests. While it was a bit more difficult to do with all these changes going on, it was pivotal to the way in which I work. And it was because of that process that I remembered an imbalance that I had put off until somewhere near this point in the project. I had done a lot of work on configuration files, but almost none on command line configuration.
What About The Command Line?
With the ApplicationProperties
mechanism wired into the project, it was time to tackle
the other part of that mechanism: the command line. Up to this point, I had a command
line interface to the project, but it was a command line that was put in place by default.
I had never sat down and really thought through what style of command line interface
I wanted for the project. It was time to do that.
Which Way To Go?
Using various utilities on my system, I found a handful of different styles of interacting with the user on the command line. But the one that really stuck with me is what I refer to as the “Git” style of command line organization. Similar to my discussion in my last article on complex configuration, this style of command line organization is complex. This style is built from building blocks of commands and options, allowing more complex command lines to be constructed.
For example, take this command:
git -c ab=cd status --long setup.py
The main part of the command is the git
at the start of the line. The -c ab=cd
part
of that line is a modifier to that git
command. Following that, the status
part of
that line specifies the status subcommand, which in turn has its own modifier --long
to alter how that status subcommand is presented. Finally, the text setup.py
is
another modifier to the subcommand, asking that only information for the file setup.py
be provided.
To me, this makes very good sense for the PyMarkdown project and the command line interface that I want for it. While the main “scanning” workflow will be the most used workflow, there are other workflows that I want to expose to allow users to find out information about the results of the scan. The one that came immediately to mind was being able to show the user more information about the installed plugins. As such, a simple version of the command line interface would not work. I needed a command line interface that supported the “scanning” workflow as well as the “show information about plugins” workflow.
To be clear, this type of a command line interface is not required for every project that I do, but it was required for this one. This interface was complex enough to have to support multiple workflows, which in turn made this command line processing complex, just like the configuration.
Wiring Up Subcommands
Once I had that figured out, it was time to go to work. When I took my initial look
at the __parse_arguments
function, I was afraid that I was going to have to rewrite
the entire function. However, on a closer look, I discovered that it was going to be
more of a reorganization than a rewrite. Except for the --version
option
and the --list-files
options, the options were all still valid for the application
itself. It was once those options were added to the argparse
object that things
changed.
To handle this kind of scenario, the argparse
object handles subcommands by using
subparsers. So, to add a subcommand to handle the scan
workflow, I used the
following code:
subparsers = parser.add_subparsers(dest="primary_subparser")
new_sub_parser = subparsers.add_parser(
    "scan", help="scan the Markdown files in the specified paths"
)
Once I had that subparser created, I added the rest of the arguments that were specific to that subcommand:
new_sub_parser.add_argument(
    "-l",
    "--list-files",
    dest="list_files",
    action="store_true",
    default=False,
    help="list the markdown files found and exit",
)
new_sub_parser.add_argument(
    "paths",
    metavar="path",
    type=str,
    nargs="+",
    help="one or more paths to scan for eligible Markdown files",
)
For this subcommand, the --list-files
option that was removed from the base argparse
parser was placed under the scan
subparser along with the paths
variable to hold
the paths of Markdown files to scan. The plugins
subcommand was added in the
same manner:
new_sub_parser = subparsers.add_parser("plugins", help="plugin commands")
new_sub_parser.add_argument(
    "-p",
    "--list-plugins",
    dest="list_plugins",
    action="store_true",
    default=False,
    help="list the available plugins and exit",
)
new_sub_parser.add_argument(
    "paths",
    metavar="path",
    type=str,
    nargs="+",
    help="one or more paths to scan for eligible Markdown files",
)
Finally, to implement the --version
option (which was really a subcommand disguised
as an option in the previous version of the command line interface), a subcommand was
added for it as well:
subparsers.add_parser("version", help="version of the application")
With all those options in place, I executed the application from the command line and
noticed that I needed to do a bit more work myself now that I was using subcommands.
I needed to handle the case where no subcommands were present as well
as handling the version
subcommand. After a bit of experimentation, I came up with the
following code:
if not parse_arguments.primary_subparser:
    parser.print_help()
    sys.exit(2)
elif parse_arguments.primary_subparser == "version":
    print(f"{self.__version_number}")
    sys.exit(0)
return parse_arguments
Back in the main
function, implementing the scan
and plugins
subcommands was easy,
as all the other subcommands were already taken care of. Adding the following code:
if args.primary_subparser == "plugins":
    self.__initialize_plugins(args)
    return_code = self.__plugins.handle_argparse_subparser(args)
    sys.exit(return_code)
allowed me to redirect the plugins
workflow to the PluginManager
class. If the
program’s flow was not redirected at that point, the scan
workflow would execute by
default.
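Pulling the pieces above together, the overall subcommand layout can be modelled in a few self-contained lines. This is a simplified model, not the project’s actual parser, but it shows the structure:

```python
import argparse

# A minimal, self-contained model of the subcommand layout described above.
# Option names mirror the article; everything else is simplified.
parser = argparse.ArgumentParser(prog="pymarkdown")
parser.add_argument(
    "--stack-trace", dest="show_stack_trace", action="store_true", default=False
)
subparsers = parser.add_subparsers(dest="primary_subparser")

scan_parser = subparsers.add_parser("scan", help="scan the Markdown files")
scan_parser.add_argument(
    "-l", "--list-files", dest="list_files", action="store_true", default=False
)
scan_parser.add_argument("paths", metavar="path", type=str, nargs="+")

subparsers.add_parser("version", help="version of the application")

# Global options come before the subcommand, like `git -c ab=cd status`.
args = parser.parse_args(["--stack-trace", "scan", "--list-files", "README.md"])
assert args.show_stack_trace is True
assert args.primary_subparser == "scan"
assert args.list_files is True
assert args.paths == ["README.md"]

assert parser.parse_args(["version"]).primary_subparser == "version"
```

The `dest="primary_subparser"` argument is what makes the selected subcommand visible afterwards, which is exactly the value the main function switches on.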
With the command line now organized in a more orderly fashion, it was time to clean up the use of those workflows.
Revisiting Logging
In the Ordering section of my last article, I provided an example of how ordering was important when considering how to apply configuration to a project. Specifically, that code snippet was from the following function:
def __initialize_logging(self, args):
    new_handler = None
    effective_log_file = args.log_file
    if effective_log_file is None:
        effective_log_file = self.__properties.get_string_property("log.file")
    if effective_log_file:
        new_handler = logging.FileHandler(effective_log_file)
        logging.getLogger().addHandler(new_handler)
    effective_log_level = args.log_level if args.log_level else None
    if effective_log_level is None:
        effective_log_level = self.__properties.get_string_property(
            "log.level", valid_value_fn=PyMarkdownLint.log_level_type
        )
    if effective_log_level is None:
        effective_log_level = self.default_log_level
To keep things neat, I moved all of the logging related initialization code into one of
two functions, __set_initial_state
and __initialize_logging
. The __set_initial_state
function simply took the initial logger settings and the loading of the configuration
file data and placed them in a single, much easier to read function. The
__initialize_logging
function was where all the interesting log initialization really
happened, working out the actual values to be used.
This new function followed the process that I had outlined in my last article. Both the log level and the log file can be set from the command line or from the configuration file. In both cases, the command line is processed first, followed by the configuration property, followed by any system default.
Looking at these values and their sources after I completed them, they just looked right. More than that, they looked usable. If no configuration was provided, there were defaults. If a configuration was provided, any supplied values would be taken from there, if they were present.
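That three-step precedence chain, command line, then configuration, then default, reduces to a tiny function, shown here as my own distillation of the logic above:

```python
# The precedence chain described above, reduced to one function: the
# command line value wins, then the configuration property, then the default.
def resolve_setting(command_line_value, configuration_value, default_value):
    if command_line_value is not None:
        return command_line_value
    if configuration_value is not None:
        return configuration_value
    return default_value

assert resolve_setting("DEBUG", "INFO", "WARNING") == "DEBUG"
assert resolve_setting(None, "INFO", "WARNING") == "INFO"
assert resolve_setting(None, None, "WARNING") == "WARNING"
```

Both the log level and the log file follow this same shape, which is why the __initialize_logging function reads as two copies of one pattern.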
Fixing An Old Issue
Back in the section titled
Changing The Plugins,
I presented something that was just a little off. I presented it that way because at the
time that I wrote that code, it looked right and was passing every existing test. It was
only at this point, when I was more thoroughly testing the PluginManager
configuration,
that I noticed that the first ApplicationPropertiesFacade
instance tried was always being
selected.
Yeah, I needed to fix that. The first thing I did was to thin out that for loop and
move most of that code into a new __find_configuration_for_plugin
function. That
changed the relevant code to:
for next_plugin in self.__enabled_plugins:
    try:
        plugin_specific_facade = self.__find_configuration_for_plugin(
            next_plugin, properties, always_return_facade=True
        )
        ...
@classmethod
def __find_configuration_for_plugin(
    cls, next_plugin, properties, always_return_facade=False
):
    plugin_specific_facade = None
    first_facade = None
    for next_key_name in next_plugin.plugin_identifiers:
        plugin_section_title = f"{PluginManager.__plugin_prefix}{properties.separator}{next_key_name}{properties.separator}"
        section_facade_candidate = ApplicationPropertiesFacade(
            properties, plugin_section_title
        )
        if not first_facade:
            first_facade = section_facade_candidate
        if section_facade_candidate.property_names:
            plugin_specific_facade = section_facade_candidate
            break
    if always_return_facade and not plugin_specific_facade:
        plugin_specific_facade = first_facade
    return plugin_specific_facade
From there, two small changes were required to fix this function. The first change was
to fix the problem of the first facade always being selected. This was accomplished
by adding the property_names
property to the facade and checking to see if at least
one property was found. If at least one property was found, then the initial intention
of the search was satisfied: a configuration entry had been found. This is what I had
intended to do with the initial implementation, so it felt good to correct this issue.
But that caused another issue to manifest, one that was never an issue before the first
issue was fixed: what if no configuration values were ever found? As there are cases
where the application needs to know nothing was found and cases where it just wants
some default behavior, I added the always_return_facade
argument. With this argument
set to True
, the first facade is returned instead of a None
object.
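The corrected search can be modelled with plain dictionaries standing in for ApplicationPropertiesFacade instances; the function name matches the article, but the dictionary-based facades are my simplification:

```python
# A runnable model of the corrected search, using dictionaries as facades.
def find_configuration_for_plugin(properties, identifiers,
                                  prefix="plugins.", always_return_facade=False):
    plugin_specific_facade, first_facade = None, None
    for next_key_name in identifiers:
        section_title = f"{prefix}{next_key_name}."
        candidate = {key: value for key, value in properties.items()
                     if key.startswith(section_title)}
        if first_facade is None:
            first_facade = candidate
        if candidate:  # at least one property: this is the best candidate
            plugin_specific_facade = candidate
            break
    if always_return_facade and not plugin_specific_facade:
        plugin_specific_facade = first_facade
    return plugin_specific_facade

properties = {"plugins.first-heading-h1.level": 2}
identifiers = ["md002", "first-heading-h1"]

# A later identifier with real configuration is now found...
assert find_configuration_for_plugin(properties, identifiers) == {
    "plugins.first-heading-h1.level": 2
}
# ...no configuration at all yields None by default...
assert find_configuration_for_plugin({}, identifiers) is None
# ...unless the caller asks for the first (empty) facade instead.
assert find_configuration_for_plugin({}, identifiers,
                                     always_return_facade=True) == {}
```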
With that part of the configuration fixed, it was now time to add the plugin
workflow.
Boring Support For Plugins
One of the main reasons that I added a complex command line interface was to support
multiple workflows. As I felt that the scan
workflow was now developed enough to
test, it was time to implement the plugins
workflow. In my design for this workflow,
I wanted to support a listing of all plugins and summary information, as well as more
focused information for a specific plugin. Much like how I added the scan
and
plugins
subparsers to the main argparse
instance, I added the list
and info
subparsers to the plugins
subparser.
From there, it was just a matter of handing off control at the right spot to each of those processors, as such:
def handle_argparse_subparser(self, args):
    """
    Handle the parsing for this subparser.
    """
    subparser_value = getattr(args, PluginManager.__root_subparser_name)
    return_code = 0
    if subparser_value == "list":
        self.__handle_argparse_subparser_list(args)
    elif subparser_value == "info":
        return_code = self.__handle_argparse_subparser_info(args)
    else:
        PluginManager.__argparse_subparser.print_help()
        sys.exit(2)
    return return_code
So why did I start the title for this section with the word “Boring”? It is because
once I had added subparsers to the main argparse
instance, this was just a matter of copying that
pattern and pasting it onto the plugins
subparser. The list
subcommand was a simple loop
through all available plugins, adding values to a string to display. The info
subcommand
just located a given plugin and displayed its information. After everything
else that has been added to this project, that was quite boring. Necessary,
but boring.
Almost Boring
Two exceptions did stand out though. The first one was the use of the Columnar
library for output. This library is very effective at determining how much screen
real estate the application has and sizing columns to fit that screen size. That
was very useful, as it meant I did not have to create that functionality myself.
The second exception was mimicking the behavior of the glob
library on
in-memory strings. Given a command line of pymarkdown plugins list plugin-id
, it
should be obvious that only the plugin with the id or name plugin-id
should be
listed. But what about the command line pymarkdown plugins list md??9
? If the
strings were filenames, then any plugin name that started with md
, contained two
more arbitrary characters, and ended with the digit 9
would match. Why could I not
do that with plugins and their identifiers?
Thankfully, due to my earlier work to reduce the allowable text for plugin ids and plugin
names to a smaller subset of values, this effort was made easier. As I did not
have to worry about the *
or ?
characters appearing natively in the strings that
I was looking at, the regular expression to satisfy my required glob functionality was:
list_re = re.compile(
    "^" + args.list_filter.replace("*", ".*").replace("?", ".") + "$"
)
For those readers not familiar with regular expressions, the ^
at the start of the
expression and the $
at the end of the expression are anchors. Together, they specify
that the expression between those characters must match the entire string, rather than
just being found somewhere inside of it. For the rest of the expression, I used the
.
character in two forms. As the .
character represents any single character in
the expression, it was a perfect match for the ?
glob character. Extending
that, the *
glob character represents any number of arbitrary characters; to match
that in the expression, the sequence .*
matches zero or more instances of any character.
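Wrapping that translation in a helper makes it easy to check against the md??9 example from earlier; the helper function is my own framing of the one-liner above:

```python
import re

# The glob-to-regex translation from the snippet above, as a helper.
def build_list_matcher(list_filter):
    return re.compile(
        "^" + list_filter.replace("*", ".*").replace("?", ".") + "$"
    )

list_re = build_list_matcher("md??9")
assert list_re.match("md019")        # md + two characters + 9
assert not list_re.match("md012")    # last character must be 9
assert not list_re.match("xmd019")   # anchors require a full-string match

# The * glob works the same way on plugin names.
assert build_list_matcher("head*").match("heading-style")
```

This only works safely because of the earlier id and name restrictions: no plugin identifier can natively contain `*` or `?`, so the replacements never collide with literal text.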
After satisfying all the scenario tests I had written, and adjusting output to match
the columnar output, the plugins
subcommand tests were all passing. Things were good
with those scenario tests. I now had to expand my scope to all the scenario tests in
the project.
Altered Tests
When the change was made to the command line, I expected most of the existing scenario
tests that deal with the command line to fail, and I was not disappointed. For most of
the existing tests, I just needed to add the scan
keyword to the right position in the
arguments list, as that was the workflow that already had tests before this point. I
stopped counting those changes after I passed the count of 20.
But it was a very useful task. As I progressed through the tests, I was able to see how specifying the workflow or subcommand was enhancing the tests. It just gave each test a bit more context that I felt was missing before. I was able to find a couple of small issues that I would have otherwise missed because of the organization.
To be blunt, between the command line changes and tests to cover them, it just felt like the project was that much more complete.
Lather, Rinse, Repeat
After all that work, there were still little things that I needed to work on. I disliked how some of the code was organized, so I went back and tried to organize the code more thoughtfully along usage lines. I pored through the coverage report and looked for missing coverage, adding scenario tests to cover any missed areas. Just general cleanup stuff.
Adding Version Information For Plugins
With that work in place, it was time to see if I was able to add something useful to the project’s configuration without many issues. The something I had in mind was better version information for the plugins. From a plugin point of view, I figured it would be useful to know what specific version was being attributed to that specific plugin. From a plugin manager point of view, it would also be useful to be able to version the actual interface itself. Both would be useful going forward, and useful to relate to the user of the application.
Changing the FoundPlugin
class to accept these two new version fields was easy, adding
private
fields and property functions to access them. From there, I added those two parameters
to each of the existing plugins, setting active plugins to 0.5.0
and 1
and setting
inactive or “templated” plugins to 0.0.0
and 1
. After adding some support in the
__register_individual_plugin
function to ensure that both versions were in acceptable
formats, I was ready to make the big change to expose that information.
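As a hedged sketch of the kind of format checks __register_individual_plugin might apply, the two version fields can be validated with simple patterns; the exact patterns here are my assumption, not the project’s:

```python
import re

# Hypothetical format checks for the two new version fields: a semantic
# plugin version like "0.5.0" and an integer interface version like "1".
plugin_version_regex = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")
interface_version_regex = re.compile(r"^[0-9]+$")

assert plugin_version_regex.match("0.5.0")   # active plugins
assert plugin_version_regex.match("0.0.0")   # templated plugins
assert not plugin_version_regex.match("0.5")
assert interface_version_regex.match("1")
```

Rejecting malformed versions at registration time keeps the later display code trivially simple, since it can trust every value it is handed.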
Okay, maybe big change is a bit misleading. In the __handle_argparse_subparser_list
function, I changed the setting of the display_row
variable to:
display_row = [
    next_plugin_id,
    ", ".join(next_plugin.plugin_names),
    str(next_plugin.plugin_enabled_by_default),
    str(is_enabled_now),
    next_plugin.plugin_version,
]
and the setting of the headers
variable was changed to:
headers = ["id", "names", "enabled (default)", "enabled (current)", "version"]
These were not big changes, but that was the point. This was a relatively small change and if things were organized properly, it should have required a relatively small number of changes. If that were not the case, I would have been worried.
And I was happy, as I had a good workflow to get output on the plugins like the following:
ID     NAMES                                ENABLED (DEFAULT)  ENABLED (CURRENT)  VERSION
md001  heading-increment, header-increment  True               True               0.5.0
md002  first-heading-h1, first-header-h1    False              False              0.5.0
md003  heading-style, header-style          True               True               0.5.0
What Was My Experience So Far?
While my efforts to have a completed configuration system with command line support are not fully realized, this work moves the command line system a lot further along toward being the beacon of the PyMarkdown project. I realize that I still need a generic way of specifying override values for those values specified in the configuration file, but this is still a big step forward. Now that I have the configuration and command line systems in place, I can spend a bit of reflective time looking for pieces of those systems that I missed while working on other issues.
I really am feeling that the project is coming together. While it has never felt like a hodgepodge of functionality, I always had a feeling that it was just not quite ready to show to the public. With each of these little subsystems that I am focusing on, that feeling is going away and is being replaced with a feeling of confidence that it is ready. In my books, that is a really good feeling to have.
What is Next?
After nicely wrapping up configuration and the command line, there was only one thing left to do before I seriously thought about a beta release. That was to resolve the remaining items in the Priority 1 section of the issues list.
-
Yes, I know that is not what this code snippet actually does. I found that out by adding tests later on in the process. I fixed this issue later, in the section named Fixing An Old Issue. ↩
-
Unless I was purposefully creating bad data to test failure cases, that is the form that I passed into the plugins for configuration. ↩
Comments
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.