Project Summarizer

Summary¶

In my last article, I talked about more Type Hint work across the multiple projects that I support. In this article, I talk about starting to add proper plugin support to the Project Summarizer project.

Introduction¶

I am not superstitious in the least, but now that I have started at my new job, I feel I can talk more freely about it. It is a fantastic opportunity and I get to help people. For me, those are two main pluses that I was looking for. That is what has been occupying most of my time in the last 4 weeks: getting that job and then starting at that job. Can you say “firehose of information” five times real fast?

But in the evenings, it was useful to have something I can work on and move forward. I know my day job is going to have good days and bad days, but I find the lack of pressure for my open-source projects to be gratifying. But I still feel that I want to move the bar forward with them. I want to do things that help people write better Python projects, and to that end, I continue working towards that goal. And this week, it was mostly about the Project Summarizer project.

Pull Request, Merge, Wait, and Repeat¶

Having spread the Type Hint work across the various projects that I maintain, I spent a fair amount of time this week getting those projects upgraded to their most recent versions. The actual process is easy: just click the merge button on the Pull Requests in GitHub and confirm it worked. But while the Pipfile has only a single line change, the Pipfile.lock contains more information. And as those changes conflicted, I had to merge each change, wait for the old Pull Requests to resolve again, and then see if I can merge them.

For the most part, the task is that simple. Except when it is not. One of those “not” cases surfaced this past week: a merge error with the new PyLint and PyMarkdown. The Dependabot scan increased the version from 2.12.2 to 2.13.5. An otherwise simple change, but it had effects on scanning of the code. Where things were fine before, it was now complaining about too-many-branches in fourteen different script files. It was time for me to knuckle down and get to my research.

Around three hours later, I had an observation and a partial solution. The reason I say partial is that I had to change my clean.cmd script to work around the issue. It was better than nothing and I was able to move forward with that work around, and that is what really matters. And of course, I logged a new issue against PyLint on GitHub.

The issue itself? For some reason, PyLint version 2.13.0 and higher behaves differently when I specify each individual package distinctly on the command line, such as:

pipenv run pylint --rcfile=setup.cfg pymarkdown pymarkdown/plugins

instead of:

pipenv run pylint --rcfile=setup.cfg --recursive=y pymarkdown

pipenv run pylint --rcfile=setup.cfg pymarkdown/plugins/rule_md_033.py

The module rule_md_033.py was one of the failure cases. When I went to check and module rule_md_033.py by using the third command line, PyLint did not report any errors. However, when I used the first command line, I got:

pymarkdown\plugins\rule_md_033.py:68:4: R0912: Too many branches (14/12) (too-many-branches)

Finally, after doing research on PyLint’s command line, I came across the --recursive=y flag, and tried it out, producing the second command line. This was the change I ultimately made to the clean.cmd script to solve the issue. While there was still an issue with PyLint, a usage point of view, the new approach is a cleaner approach and solves the issue, so it is good. Not really three hours of time wasted, but it was three hours that was used up just the same.

Adding Plugins To Project Summarizer’s Command Line¶

With that work being completed as a background task, I started developing a good command line interface for the plugins that matched my rough designs. While I had a couple of loose ideas on how to do it, none of them seemed right to me. Since I love experimenting with code, I just spent some “fun” time trying different approaches out until I found one that felt right and worked.

Brute Force - Just Get It Done¶

My first attempt was a brute force attempt at parsing the command line. In Python, the command line is presented as an array in sys.argv. The script that is invoked from the command line is stored at index 0 with the rest of the items in the array comprising the remaining arguments. To keep things simple, I added a temporary constraint that new plugins could only be added at the start of the command line using the --add-plugin argument.

But this brute force method just did not work for me. The first thing I had to add to this method was a check that there was another argument after the --add-plugin argument. Then I had to add a check to see if that argument started with a dash. Then I had to add an option that the argument could include an = sign and separate the argument and the value that way. It just seemed like needless work.

Revise and Conquer?¶

I decided to look to refine that approach a bit more, and to see if that would help me feel more positive about that approach. Instead of looking only at the start of the array, I tried to implement a more complex algorithm to find that pattern at any point in the array. The basic part of the algorithm was easy. It was the boundary cases that were the problem. Some of the boundary cases were already managed using the extra work that I did in the first iteration of this approach. But I had other boundary cases to consider. The big one was a nasty one to solve: what would my code to if another argument wanted to have --add-plugin as its value?

I started playing around to figure out what argparse did for those situations and found that any value in the command line that starts with a - character is assumed to be its own argument. As such, I was safe in assuming that any instance of the --add-plugin argument in the command line array would be a valid one.

But that finding got me thinking. What other cases was I missing? Were there combinations that the argparse package was made to deal with that I did not know about? Weird boundary conditions that I would have to mitigate later? To me, it just made more sense to try and let argparse oversee the heavy lifting, and for me to take advantage of the package whichever way I could. But how?

Thinking Out Of The Box¶

I cannot remember exactly how I ended up with the idea for the solution, but it was just one of those “why not? let’s try it and see if it works!” ideas. In this approach, I decided to use the argparse package twice, once for the --add-plugin argument and once for the remaining arguments.

    remaining_arguments = sys.argv[1:]
    additional_plugins_to_load = []
    while remaining_arguments:
        last_args = remaining_arguments[:]

        parser = argparse.ArgumentParser(allow_abbrev=False, add_help=False)
        PluginManager.add_plugin_arguments(parser)
        known_args = parser.parse_known_args(args=remaining_arguments)
        if known_args[0].add_plugin:
            additional_plugins_to_load.extend(known_args[0].add_plugin)
        remaining_arguments = known_args[1]

        if len(last_args) == len(remaining_arguments):
            break

The Main Loop¶

It is not pretty to look. But let me break the code down. Initially, the remaining_arguments variable is set to a slice of the command line arguments array, starting at index 1. This ensures that only workable arguments and interpreted, and not the name of the script. The next part to get right is the loop started on line 3.

For the main conditional, I decided to just condition the loop off whether there are still arguments to process. If all arguments have been processed, there is nothing left to do. The assumption that I made here is that the process of parsing the command line would consume the desired arguments as it goes. I was hoping to go with a “consuming” approach for the inner processing, so this fell in line with the rest of my thinking.

For the second conditional, I decided to use an if statement and a break statement. I could have put this conditional into the main loop, but I felt that it read better with these two statements at the end of the loop. To power this conditional, I created a copy of the remaining_arguments variable in last_args at the start of the loop. By making the inner processing consume the arguments, I was able to have the conditional check if the last set of arguments (len(last_args)) has the same length as the current remaining_arguments variable (len(remaining_arguments)). If those two lengths are equal, then no arguments were consumed by the inner processing, and the loop ends using the break statement.

As I said above, I am not sure if the design of the loop is “correct”, but for me, it is very readable. There are two major conditions for exiting the loop, and each condition is distinct from each other. From my viewpoint, that just makes it more readable.

Inner Processing¶

I must admit, I did resort to a bit of trickery to get this part working properly, and not on my first try either. The first two lines were standard to each approach that I tried: create a parser and add the --add-path argument to it using a call to the add_plugin_arguments function. Why? By doing it this way, I was able to parse that argument independently of the main parser, but still have the help for the --add-path argument show up normally. Basically, even though any eligible instances of that argument would be removed by the inner processing, the main processing help would still report on that argument properly.

The next line was the pivotal line:

known_args = parser.parse_known_args(args=remaining_arguments)

By invoking the argparser package using the parse_known_args function, I got the benefit of return values were different than normal. With this function, the standard argparse.Namespace typed value is returned as the first value in a Tuple, but the second value returned is an array with all unused arguments. This made the rest of the inner processing easy, as this line:

remaining_arguments = known_args[1]

was all that was needed to figure out the remaining arguments and assign them to the proper variable. After adding the lines:

if known_args[0].add_plugin:
    additional_plugins_to_load.extend(known_args[0].add_plugin)

to add the parsed --add-path argument to the array of plugins to load, the only other change that was required was to wire up the main function with the code:

    remaining_arguments = self.__initialize_plugins()
    args = self.__parse_arguments(remaining_arguments)

and change the code in the __parse_arguments function to parse the list that was passed in instead of the default sys.argv value. Some quick testing and adding of use cases, and everything looked good.

Not Giving Up¶

That process was by no means easy. As my son often says “sometimes you have to throw a lot of spaghetti against the wall just to find one that sticks.” The important thing to me was that the approaches that I took did not feel right, and I needed to keep on searching until I found something that did feel right. While that is not always the right thing to do, I believe it was the right thing to do in this scenario. It was because of that feeling that I kept on changing my approach until I found that unorthodox solution with the argparse package.

What is Next?¶

With the command line parsing for adding new plugins completed, I knew that I need to spend time adding a good array of tests. Following that, there was the adaptation of the plugin logic from the PyMarkdown project that needed to be completed. Not a clean copy, but an adaptation to suit the Project Summarizer project. Stay tuned!

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.

Comments

Project Summarizer - Loading Plugins