Summary

In my last article, I focused on the refactoring I performed to clean up the code base of the PyMarkdown project. In this article, I talk about how I started to learn about finding and implementing performance improvements in the PyMarkdown project using cProfile and SnakeViz.

Introduction

While I have mentioned before that I was starting to investigate performance measurements for the project, it was time to bring those measurements and my observations on them into the light. Based on initial measurements, I knew there was a big performance degradation going on with the logging library and my use of it. If I wanted the PyMarkdown project to be even remotely performant, I knew I needed to deal with at least that performance refactoring. At the same time, I knew I would need to apply other performance improvements to the project down the road. So, I figured it was a good time to learn about the analyze-deduce-change cycle of performance tuning and how to perform it with a Python program.

Basically, I needed to spend some time learning how to properly measure and improve the performance of the PyMarkdown project. I was stoked!

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 03 Feb 2021 and 13 Feb 2021.

Dipping My Toes Into the Water

Having upgraded the PyMarkdown project to Python 3.8, I was able to use two Python tools that I was eager to learn from: cProfile and SnakeViz. The best way to think about these tools is that cProfile allows the Python interpreter to collect information on what happened during the execution of a program, and SnakeViz is a useful way to present that information. While I have tried to use cProfile before to analyze this project, I found its raw output too cumbersome to read effectively. For my use, I needed a GUI display that I could more easily navigate through. For me, SnakeViz provided just that interface.
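
Getting the tools themselves is trivial: cProfile ships with Python, and SnakeViz installs from PyPI. For a pipenv-managed project like this one, the installation would look something like this (my assumption of a typical invocation, not a command recorded in the project’s history):

pipenv install --dev snakeviz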

Getting Started

Getting started using these tools was easy. In the project’s home directory, I have a script, named run_blog.cmd, that I use to run the PyMarkdown project over all my articles for this site. This script is very simple:

pipenv run python main.py --stack-trace -d MD018,MD020 ../blog-content/website/content/articles/SoftwareQuality

So, I copied that script to a file named prof_blog.cmd and changed the contents of the script file to:

pipenv run python -m cProfile -o p0.prof main.py --stack-trace -d MD018,MD020 ../blog-content/website/content/articles/SoftwareQuality

That was it! Nothing else was needed. When I ran that new script, a new p0.prof file was deposited in the project directory. To save time, I created a new file (with yet another imaginative name) named show_profile.cmd and placed the following text into that file:

pipenv run snakeviz p0.prof

Running that script started a web server, as noted on the command line, and then opened a new tab in my browser to show me the results.

Looking For The Obvious Starting Point

I will not go into how to use SnakeViz, as I believe the SnakeViz homepage does a really good job of walking users through the required steps. The tool was presenting a lot of information, and I needed to find a way to focus and walk through the data. While I did confirm that I needed a GUI to help me walk through the data, it was not the graphs that were helping me out. It was being able to click on various parts of the tables and dig down into the performance of the items that I clicked on. That is what really helped me understand things!

After getting more comfortable with the output, I cleaned everything up, reran the prof_blog.cmd script followed by the show_profile.cmd script, and was greeted with this information:

Before Logging

As this learning was being done in the background while I was working on the project, I let that picture sink in for a bit while I thought about the implications in my spare time. But the results were obvious to me. Based on what I was seeing, the first 5 items in the performance list were taking 54 seconds of execution time, with none of those items having anything to do with the actual parsing. That time was being spent preparing all the debug information that I had in place to help me diagnose issues. But as far as I knew, with the project at its default settings, the project was using the WARN logging level and should not be outputting anything to the logs. As it should not be outputting anything to the logs, it logically followed that it should not be creating the debug strings used to place information in those logs. I was stumped.

Another Try With A Fresh Mind

After fixing some issues in the project, I returned to this problem with a fresh mind. This time, as I looked over the code, I noticed the issue almost right away. While the information was not being output to the logs, it was still being prepared and formatted as if it were being added to the logs. A good example is this snippet of code:

    LOGGER.debug(
        "coalesce_text_blocks>>>>%s<<",
        ParserHelper.make_value_visible(first_pass_results[coalesce_index]),
    )
Whether the information gets logged or not, the interpreter is going to evaluate every argument expression passed to the debug function before the call is made. In this case, it is going to dereference the first_pass_results list using the index coalesce_index and pass that value into the ParserHelper.make_value_visible function. Yup, that is the same make_value_visible function that is in the number 3 position on the above list. To make matters worse, the first_pass_results list contains tokens that made it through the first pass of parsing. This means that the debug_string function of the MarkdownToken class is being called for the token inside of make_value_visible. And yes, that is the same debug_string that is in the number 1 position of that same list.

Basically, every time that line is hit, the interpreter prepares the data, whether it is needed or not. For this line, that contributes to the debug_string function being in position 1 of that list and the make_value_visible function being in position 3 of that same list. It was then that I knew that I needed to find a way to log the information but delay evaluation of any parameters passed to it until the logger knows that it will be logging that information.
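
To see that eager evaluation in action, consider this small standalone sketch, with a hypothetical expensive_repr function standing in for make_value_visible:

import logging

logging.basicConfig(level=logging.WARNING)
LOGGER = logging.getLogger(__name__)

def expensive_repr(value):
    # Stand-in for make_value_visible; the print call proves this ran.
    print("expensive_repr was evaluated")
    return repr(value)

# The WARNING level suppresses this record, yet expensive_repr still
# runs: its result must exist before debug() can decide to drop it.
LOGGER.debug("value>>%s<<", expensive_repr([1, 2, 3]))

Running this prints “expensive_repr was evaluated” and nothing else: the argument was computed, then thrown away.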

The Benefits Of Experience

Having faced issues like this before, I had some ideas on how to proceed; I just did not know if Python supported any of them. The best approach that I had in my developer’s toolkit was to use a Facade pattern to create a class that would handle any transformations of the parameters that I required, but only apply those transformations if the information was actually going to be logged.

Throwing something together quickly, I came up with the following:

from logging import DEBUG

# ParserHelper is the project's own helper class, home of make_value_visible.

class ParserLogger:
    def __init__(self, my_logger):
        self.__my_logger = my_logger

    def debug(self, log_format, *args):
        if self.__my_logger.isEnabledFor(DEBUG):
            msg = self.__munge(log_format, args)
            self.__my_logger.debug(msg)

    @classmethod
    def __munge(cls, log_format, args):
        split_log_format = log_format.split("$")
        recipient_array = [None] * (len(split_log_format) + len(args))
        for next_array_index in range(0, len(recipient_array)):
            if next_array_index % 2 == 0:
                recipient_array[next_array_index] = \
                    split_log_format[int(next_array_index / 2)]
            else:
                recipient_array[next_array_index] = \
                    ParserHelper.make_value_visible(
                        args[int(next_array_index / 2)])
        return "".join(recipient_array)

Hopefully, the __init__ and debug functions are self-explanatory, as they were the simplest parts of the class to write. The rest took a bit of work to get right, but it was worth it. Instead of passing in logger-formatted strings, I decided to use my own formatting. Therefore, in place of using the %s sequence to key a substitution, as with normal logging calls, I opted for the simpler $ sequence. For debugging uses, I do not want to worry about what type of object is being passed or about making sure it is cast to a string type. That is why I created the make_value_visible function. But seamlessly incorporating those requirements together was going to take some work.

The first thing that I needed to do was to create a list that would hold the information. This meant that I had to have enough space for each part of the log_format string between $ characters and for each argument passed into the function.1 With that list created and initialized to None instances, I then needed to populate it. The function iterates from 0 to the index that is 1 less than the size of the list. When the index is an even number (if next_array_index % 2 == 0:), the next item is populated with the information from the log_format string split on the $ character. When the index is odd (the else clause), the next item is populated with the result of the next argument being processed through the make_value_visible function. Essentially, between the modulo operator % and the division operator /, the loop takes elements from each list, alternating which list it is taking a value from.
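
To make that alternation concrete, here is a worked example of the interleaving for a format string with two $ substitutions (a sketch, not code from the project):

log_format = "first>>$<<second>>$<<"
args = ("one", "two")

split_log_format = log_format.split("$")
# split_log_format == ["first>>", "<<second>>", "<<"]
# recipient_array needs len(split_log_format) + len(args) == 5 slots:
#   index 0 (even) <- split_log_format[0] == "first>>"
#   index 1 (odd)  <- args[0]             == "one"
#   index 2 (even) <- split_log_format[1] == "<<second>>"
#   index 3 (odd)  <- args[1]             == "two"
#   index 4 (even) <- split_log_format[2] == "<<"
# "".join(recipient_array) yields "first>>one<<second>>two<<"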

For me, the important part to concentrate on is that the formatting of the information, including the call to the make_value_visible function, is only done after the debug function has determined that debug logging is enabled. If it is not enabled, the only overhead is the call to the debug function with the specified parameters and the if statement at the start of that function. For me, that was ideal!

And to be honest, in the first couple of iterations, I had to work on how the list was populated with the information from the other two lists. I tried to be fancy with how I was populating the list, but those attempts did not pan out. In the end, the modulo approach was the most efficient and the tidiest, so it won out. The clear winner was the above solution that used little code and basic high school math.

Almost There

It only took a couple of hours to come up with the ParserLogger prototype, which was encouraging. Writing a couple of simple throwaway scenario tests, I was able to verify that it was working properly almost right away. There was just one small problem: the source of the information.

When any good logger logs information, it includes information that describes where the log entry was submitted from. For example, with a typical configuration, the Python logging library output for our sample logging line is:

DEBUG    pymarkdown.coalesce_processor:coalesce_processor.py:26 coalesce_text_blocks>>>>[text(4,1):end:]<<
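
For reference, a logging format along these lines reproduces that output shape (a sketch; the exact configuration used by the project may differ):

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)-8s %(name)s:%(filename)s:%(lineno)d %(message)s",
)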

But with the current implementation of the ParserLogger class, the same location was being reported for every line of information: the line in the ParserLogger class where the lower-level logger was being called. From my experience, I expected this, so I turned to the Python logging documentation to look for something very specific. On every platform where I have seen this supported, the relevant parameter to the debug function has had “stack” somewhere in its name. That knowledge helped me to weed out parameters that were not helpful, landing on: stacklevel. Directly from that documentation:

The third optional keyword argument is stacklevel, which defaults to 1. If greater than 1, the corresponding number of stack frames are skipped when computing the line number and function name set in the LogRecord created for the logging event. This can be used in logging helpers so that the function name, filename and line number recorded are not the information for the helper function/method, but rather its caller. The name of this parameter mirrors the equivalent one in the warnings module.

Loggers are not a new feature in programming languages, and logging helpers are almost as old as the logging features themselves. It stood to reason that Python would have support for this, but I was surprised to find out that it was only added in Python 3.8. Lucky for me, as the alternatives are workarounds that I would prefer not to resort to.
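
A minimal standalone demonstration of the effect, assuming Python 3.8 or later:

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(filename)s:%(lineno)d %(message)s",
)
LOGGER = logging.getLogger(__name__)

def log_helper(message):
    # stacklevel=2 skips this frame, so the record reports the
    # caller's file and line instead of this helper's.
    LOGGER.debug(message, stacklevel=2)

log_helper("reported from the caller's line")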

Armed with this information, I changed this line in the sample:

    self.__my_logger.debug(msg)

to this line:

    self.__my_logger.debug(msg, stacklevel=2)

and everything just worked. After that, I did a bit of cleaning up to make the class more production-ready before applying it on a grand scale.

The Big Changeover

While this would not be the last big performance changeover in the history of the project, it was the first. As changeovers go, it was easy to do but large in scope. I searched for the text:

LOGGER = logging.getLogger(__name__)

at the top of each file and replaced it with the text:

POGGER = ParserLogger(logging.getLogger(__name__))

From there, it was a simple matter to search through the files, looking for any instances of LOGGER.debug and changing them to POGGER.debug. As I changed the name of the variable in each module from LOGGER to POGGER, I also looked for any instances of %s in the string and transferred them over to the new $ syntax. Finally, if there were any calls to str or make_value_visible, I removed them, leaving only the bare minimum required to log the requested information.

A great example of this change is the sample logging line from above:

    LOGGER.debug(
        "coalesce_text_blocks>>>>%s<<",
        ParserHelper.make_value_visible(first_pass_results[coalesce_index]),
    )

which was changed to:

    POGGER.debug(
        "coalesce_text_blocks>>>>$<<", first_pass_results[coalesce_index],
    )

Smaller, more compact, and no need to call the make_value_visible function when passing in the value. A win from many viewpoints!

It was a long process, but I was able to check along the way to see if the scenario tests were still passing, correcting any errors as I went. This helped me by breaking up that long, boring task into smaller sections. Then, after a lot of search-and-replace work, I completed the last change, ran the performance metrics again, and was greeted with this information:

After Logging

Phew! All that work paid off. From experience, I knew I was very lucky to have something so obviously out of balance as my first performance refactoring target. I went from the first 5 items in the performance list taking 54 seconds to a slightly different set of 5 items taking 1.446 seconds to complete. That was a win.

I want to stress again for any readers: this was luck. In my mind, there was a greater chance that I would be met with a different set of 5 items at the start of the list that were harder to improve. This was a slam dunk of a change because it was so obvious. Keeping that in mind, it was time to move on to other tasks.

About That Logger Name?

Oh… POGGER? Why POGGER? I thought that PERFORMANCE_LOGGER was too long. I thought that PLOGGER was too short, and I felt it might be confused with LOGGER on a quick read. After a couple of simple experiments, with me as the guinea pig, I was clearly able to distinguish between POGGER and LOGGER in some test files. Because of the different replacement sequence ($) and the lack of any need to call str around non-string parameters, I felt that having that distinguishable difference was important.

Those Pesky Link Reference Definitions

I wish I could say that I could keep away from these tokens, but as Link Reference Definitions are the most troublesome tokens in the project, I guess I do not get my wish. During the refactoring that was documented in the last article, I suspected that things might not be good with Link Reference Definitions if they were inside of Fenced Code Blocks or HTML Code Blocks. Based on a couple of quick tests, those suspicions were confirmed. For some reason, the logic for preventing a Link Reference Definition from starting within one of those two blocks was either missing or not working properly. Time to debug!

Since this article is mostly about me learning about performance tuning in Python, I will not go into too many details. Let’s just say that the code that I thought I had in place to prevent that from happening was not there. Even more than that, after working on it for a good solid four hours in the evening, I only got most of the scenario tests working, with nine scenario tests still left disabled. As these scenarios are very much off the beaten path, I felt okay marking them as disabled and handling them in the future.

The Merging of the Tokens

While it probably sounded better in my head than when I typed it in, that section title did feel like a cheesy name for an old 1980s series that was thankfully never produced. This task was a simple one: as much as possible, merge the code that handles link tokens and image tokens. The interesting thing was that this was already partially done. In a fair number of areas, such as the __calculate_link_and_image_deltas function, there was already code that handled both tokens:

        if current_token.is_inline_image:
            active_link_uri = current_token.image_uri
            if current_token.pre_image_uri:
                active_link_uri = current_token.pre_image_uri
            ...
        else:
            active_link_uri = current_token.link_uri
            if current_token.pre_link_uri:
                active_link_uri = current_token.pre_link_uri
            ...

It just made sense to make it official and merge them, instead of keeping this artificial divide between the two tokens.

The first part was relatively easy. Starting with a new base class for both tokens, I carefully moved any member field that had the exact same name in both classes up to the base class. At that point, there were only two fields that needed special attention and one field that was distinct to the Image token: the image_alt_text field. The two fields, image_uri and image_title, were renamed to match their Link token counterparts, link_uri and link_title, with a global search and replace taking care of any occurrences of the old names.

But I was not done yet! I had also wanted to simplify calls to those pairs of fields for quite some time. In most cases, the processed form of the data is needed, and the link_title field is used. In some cases, the original form of the data is needed, and the pre_link_title field is used. To save space in the token, if both fields have the same value, the pre_link_title field is set to None. So, to make it easier to deal with that, I added the active_link_title property and the active_link_uri property to fetch the value that I want.
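
In outline, the merged classes looked something like this sketch, with hypothetical class names and only the fields discussed here (the project’s actual token classes carry more state):

class ReferenceTokenBase:
    # Fields that had the exact same name in both classes, moved up.
    def __init__(self, link_uri, pre_link_uri, link_title, pre_link_title):
        self.link_uri = link_uri
        self.pre_link_uri = pre_link_uri
        self.link_title = link_title
        self.pre_link_title = pre_link_title

    @property
    def active_link_uri(self):
        # Prefer the preserved original form; fall back to the processed
        # form when pre_link_uri was collapsed to None to save space.
        return self.pre_link_uri if self.pre_link_uri else self.link_uri

    @property
    def active_link_title(self):
        return self.pre_link_title if self.pre_link_title else self.link_title

class LinkToken(ReferenceTokenBase):
    pass

class ImageToken(ReferenceTokenBase):
    # image_alt_text is the one field distinct to the Image token.
    def __init__(self, image_alt_text, **kwargs):
        super().__init__(**kwargs)
        self.image_alt_text = image_alt_text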

After all that work, the code above was replaced with this one line:

        active_link_uri = current_token.active_link_uri

Single Call Special Character Resolution

This is another task that I have wanted to do for a while, and I am pleased with how well it proceeded. Along the way, I have needed to define special character sequences to deal with the dual representation of data in the tokens: the actualized form for the HTML rendering and the source form for the Markdown rendering. To present the required aspect of that data, I have developed a series of functions in the remove_* and resolve_* namespace of the ParserHelper class.

But as I have been looking at the code over the last couple of weeks, one thought occurred to me about the various invocations of those functions to remove those characters. Did I really need specific functions called all the time, or could I just call a “remove everything” function most of the time and be safe?

To answer that question, I created the resolve_all_from_text function and the remove_all_from_text function. For each function, I located what I thought was the most consistent ordering of the main set of functions and placed that ordering in those two functions. I then went through the source code, replacing one or more instances of the main set of functions with these “shorthand” functions. Of the 41 instances where these functions were called, either singly or in groups, there were only 3 instances where calling the “everything” function was detrimental to the operation of the parser. In the other 38 instances, multiple trips through the text were avoided while cleaning up the code at the same time.
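
In shape, the shorthand functions simply chain the individual helpers in one fixed order. Here is a self-contained sketch of the idea, with stand-in resolver functions rather than the project’s real resolve_* helpers:

def resolve_markers(text):
    # Stand-in: strip one kind of special marker character.
    return text.replace("\x02", "")

def resolve_replacements(text):
    # Stand-in: strip another kind of special marker character.
    return text.replace("\x03", "")

def resolve_all_from_text(text_to_resolve):
    # One call applies every individual resolver, in the one ordering
    # that was verified against the existing call sites.
    for resolve_function in (resolve_markers, resolve_replacements):
        text_to_resolve = resolve_function(text_to_resolve)
    return text_to_resolve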

And Of Course, Some Cleanup

And what would a set of tasks be for me without some cleanup tasks? There really was not anything too spectacular that I changed in these tasks, just some little things here and there. I used my first list comprehension in the __calculate_for_container_blocks function, and I believe I have determined that I cannot use any more of them due to the nature of the PyMarkdown project. It was not as hard as I thought to create, but I still need to experiment some more to really understand it. In the HtmlHelper class, there were three functions that were using relatively big conditionals to determine whether a character was valid for the given HTML construct. After some research confirming that I had mishandled the _ character and the - character in one of those conditionals, I replaced all three with a simpler in clause, as sketched below.
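
The shape of that change, sketched with a hypothetical validity check rather than the project’s exact conditionals:

import string

# Before: a sprawling conditional that is easy to get wrong (the
# original mishandled the "_" and "-" characters).
def is_valid_character_old(ch):
    return (
        ("a" <= ch <= "z")
        or ("A" <= ch <= "Z")
        or ("0" <= ch <= "9")
        or ch == "-"
        or ch == "_"
    )

# After: a simple membership test against a precomputed string.
VALID_CHARACTERS = string.ascii_letters + string.digits + "-_"

def is_valid_character(ch):
    return ch in VALID_CHARACTERS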

As always, nothing too stellar, but a number of small items that made me feel better about the state of the project.

What Was My Experience So Far?

When I said in the Introduction section that I was stoked about this, I was not kidding. Two constants in my development career that have always kept me young and looking forward are that I love challenges and I love to learn. But instead of learning finely tuned concepts in isolation, I try to learn those concepts and the “meta” ideas behind them.

Take performance tuning on a program. To be honest, with few exceptions, the language that the tuning is being applied to and the program that the tuning is being applied to are not as important as the approach to tuning. And that is where performance tuning scares faint-hearted developers, because of one glaring truth about performance tuning: there is a honeymoon period of tuning, where there is some low-hanging “performance” fruit that is easy to pick. After that, the tuning is more miss than hit. From my experience, and the experience of others, once you are in that second phase, you must be prepared to fail far more often than you succeed.

And that scares some people. I know it sometimes scares me. I like to be able to make a change and see the effect of that change. But performance tuning can be a lot of “nah, the readability suffers too much for such a small improvement” or “nah, that difference in performance is not significant”. And even for developers well versed in performance work, the percentage of “good” experiments often does not exceed 10% once past the honeymoon period.

That is where my experience comes in. I knew I was lucky to have such an obvious gift presented to me in terms of where the performance hit for the project was. I was also appreciative that previous rounds of performance improvements on prior projects let me know the approximate name of the parameter that I wanted to look for. And I am also aware that the honeymoon period for finding the easy solutions was not going to last much longer.

But, due to my experience, I was okay with that. I knew I was off to a good start, and I now had good solid experience in the performance measuring cycle with Python. So, as I said in the Introduction section:

I was stoked!

What is Next?

With one foot in performance improvements, it seems like a good time to go all in and try to improve the performance of the PyMarkdown project some more. Stay tuned!


  1. Yes, I am not doing any safety checking on this. I was doing a simple test to see if this would work. That kind of safety checking was added before I checked in the actual version of this class that made it into the repository. 
