In my last article, I talked about more Type Hint work across the multiple projects that I support. In this article, I talk about starting to add proper plugin support to the Project Summarizer project.
I am not superstitious in the least, but now that I have started at my new job, I feel I can talk more freely about it. It is a fantastic opportunity and I get to help people. For me, those are two main pluses that I was looking for. That is what has been occupying most of my time in the last 4 weeks: getting that job and then starting at that job. Can you say “firehose of information” five times real fast?
But in the evenings, it was useful to have something I can work on and move forward. I know my day job is going to have good days and bad days, but I find the lack of pressure for my open-source projects to be gratifying. But I still feel that I want to move the bar forward with them. I want to do things that help people write better Python projects, and to that end, I continue working towards that goal. And this week, it was mostly about the Project Summarizer project.
Pull Request, Merge, Wait, and Repeat¶
Having spread the Type Hint work across the various projects that I maintain,
I spent a fair amount of time this week getting those projects upgraded to their most recent versions.
The actual process is easy: just click the merge button on the Pull Requests
in GitHub and confirm it worked. But while the
Pipfile has only a single line change, the
Pipfile.lock contains more information. And as those changes conflicted,
I had to merge each change, wait for the old Pull Requests to resolve again, and
then see if I can merge them.
For the most part, the task is that simple. Except when it is not. One of those
“not” cases surfaced this past week: a merge error with the new PyLint and PyMarkdown. The
scan increased the version from
2.13.5. An otherwise
simple change, but it had effects on scanning of the code. Where things were fine
before, it was now complaining about
too-many-branches in fourteen different
script files. It was time for me to knuckle down and get to my research.
Around three hours later, I had an observation and a partial solution. The reason
I say partial is that I had to change my
clean.cmd script to work around the
issue. It was better than nothing and I was able to move forward with that work
around, and that is what really matters. And of course, I logged a new issue against
PyLint on GitHub.
The issue itself? For some reason, PyLint version
2.13.0 and higher behaves differently
when I specify each individual package distinctly on the command line, such as:
pipenv run pylint --rcfile=setup.cfg pymarkdown pymarkdown/plugins
pipenv run pylint --rcfile=setup.cfg --recursive=y pymarkdown
pipenv run pylint --rcfile=setup.cfg pymarkdown/plugins/rule_md_033.py
rule_md_033.py was one of the failure cases. When I went to
check and module
rule_md_033.py by using the third command line, PyLint
did not report any errors. However, when I used the first command line, I got:
pymarkdown\plugins\rule_md_033.py:68:4: R0912: Too many branches (14/12) (too-many-branches)
Finally, after doing research on PyLint’s command line, I came across the
--recursive=y flag, and tried it out, producing the second command line. This
was the change I ultimately made to the
clean.cmd script to solve the issue.
While there was still an issue with PyLint, a usage point of view, the new approach
is a cleaner approach and solves the issue, so it is good. Not really three hours
of time wasted, but it was three hours that was used up just the same.
Adding Plugins To Project Summarizer’s Command Line¶
With that work being completed as a background task, I started developing a good command line interface for the plugins that matched my rough designs. While I had a couple of loose ideas on how to do it, none of them seemed right to me. Since I love experimenting with code, I just spent some “fun” time trying different approaches out until I found one that felt right and worked.
Brute Force - Just Get It Done¶
My first attempt was a brute force attempt at parsing the command line. In Python,
the command line is presented as an array in
sys.argv. The script that is
invoked from the command line is stored at index 0 with the rest of the items in
the array comprising the remaining arguments. To keep things simple, I added a temporary constraint that
new plugins could only be added at the start of the command line using the
But this brute force method just did not work for me. The first thing I had to
add to this method was a check that there was another argument after the
argument. Then I had to add a check to see if that argument started with a dash.
Then I had to add an option that the argument could include an
= sign and separate
the argument and the value that way. It just seemed like needless work.
Revise and Conquer?¶
I decided to look to refine that approach a bit more, and to see if that would
help me feel more positive about that approach. Instead of looking only at the start of
the array, I tried to implement a more complex algorithm to find that pattern at
any point in the array. The basic part of the algorithm was easy. It was the
boundary cases that were the problem.
Some of the boundary cases were already managed using the extra work that I did
in the first iteration of this approach. But I had other boundary cases to
consider. The big one was a nasty one to solve: what would my code to if another
argument wanted to have
--add-plugin as its value?
I started playing around to figure out what
argparse did for those situations
and found that any value in the command line that starts with a
is assumed to be its own argument. As such, I was safe in assuming that any instance
--add-plugin argument in the command line array would be a valid one.
But that finding got me thinking. What other cases was I missing? Were there combinations
argparse package was made to deal with that I did not know about? Weird
boundary conditions that I would have to mitigate later? To me, it just made more
sense to try and let
argparse oversee the heavy lifting, and for me to take advantage
of the package whichever way I could. But how?
Thinking Out Of The Box¶
I cannot remember exactly how I ended up with the idea for the solution, but it was just
one of those “why not? let’s try it and see if it works!” ideas. In this approach,
I decided to use the
argparse package twice, once for the
and once for the remaining arguments.
remaining_arguments = sys.argv[1:] additional_plugins_to_load =  while remaining_arguments: last_args = remaining_arguments[:] parser = argparse.ArgumentParser(allow_abbrev=False, add_help=False) PluginManager.add_plugin_arguments(parser) known_args = parser.parse_known_args(args=remaining_arguments) if known_args.add_plugin: additional_plugins_to_load.extend(known_args.add_plugin) remaining_arguments = known_args if len(last_args) == len(remaining_arguments): break
The Main Loop¶
It is not pretty to look. But let me break the code down.
variable is set to a slice of the command line arguments array, starting at
index 1. This ensures that only workable arguments and interpreted, and not the
name of the script. The next part to get right is the loop started on line 3.
For the main conditional, I decided to just condition the loop off whether there are still arguments to process. If all arguments have been processed, there is nothing left to do. The assumption that I made here is that the process of parsing the command line would consume the desired arguments as it goes. I was hoping to go with a “consuming” approach for the inner processing, so this fell in line with the rest of my thinking.
For the second conditional, I decided to use an
if statement and a
statement. I could have put this conditional into the main loop, but I felt that
it read better with these two statements at the end of the loop. To power this
conditional, I created a copy of the
remaining_arguments variable in
at the start of the loop. By making the inner processing consume the arguments,
I was able to have the conditional check if the last set of arguments (
has the same length as the current
remaining_arguments variable (
If those two lengths are equal, then no arguments were consumed by the inner processing,
and the loop ends using the
As I said above, I am not sure if the design of the loop is “correct”, but for me, it is very readable. There are two major conditions for exiting the loop, and each condition is distinct from each other. From my viewpoint, that just makes it more readable.
I must admit, I did resort to a bit of trickery to get this part working properly,
and not on my first try either. The first two lines were standard to each approach
that I tried: create a parser and add the
--add-path argument to it using a
call to the
add_plugin_arguments function. Why? By doing it this way, I was
able to parse that argument independently of the main parser, but still have the
help for the
--add-path argument show up normally. Basically, even though any
eligible instances of that argument would be removed by the inner processing, the
main processing help would still report on that argument properly.
The next line was the pivotal line:
known_args = parser.parse_known_args(args=remaining_arguments)
By invoking the
argparser package using the
parse_known_args function, I got
the benefit of return values were different than normal. With this function, the
argparse.Namespace typed value is returned as the first value in a
but the second value returned is an array
with all unused arguments. This made the rest of the inner processing easy, as
remaining_arguments = known_args
was all that was needed to figure out the remaining arguments and assign them to the proper variable. After adding the lines:
if known_args.add_plugin: additional_plugins_to_load.extend(known_args.add_plugin)
to add the parsed
--add-path argument to the array of plugins to load, the only
other change that was required was to wire up the
main function with the code:
remaining_arguments = self.__initialize_plugins() args = self.__parse_arguments(remaining_arguments)
and change the code in the
__parse_arguments function to parse the list that
was passed in instead of the default
sys.argv value. Some quick testing and
adding of use cases, and everything looked good.
Not Giving Up¶
That process was by no means easy. As my son often says “sometimes you have to
throw a lot of spaghetti against the wall just to find one that sticks.” The
important thing to me was that the approaches that I took did not feel right, and I needed
to keep on searching until I found something that did feel right. While that is
not always the right thing to do, I believe it was the right thing to do in this
scenario. It was because of that feeling that I kept on changing my approach
until I found that unorthodox solution with the
What is Next?¶
With the command line parsing for adding new plugins completed, I knew that I need to spend time adding a good array of tests. Following that, there was the adaptation of the plugin logic from the PyMarkdown project that needed to be completed. Not a clean copy, but an adaptation to suit the Project Summarizer project. Stay tuned!
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.