Summary¶
In my last article, I talked about my efforts to bolster the consistency checks I have built into the project. In this article, I decided to focus on Nested Container blocks.
Introduction¶
While I have confidence that most of the PyMarkdown parser is working the way that I want it to, there are two areas where I feel that I still need to shore up my confidence: Link Reference Definitions and Nested Container Blocks. Most of my lack of confidence in the Link Reference Definition code is in how it handles boundary conditions. As they are boundary conditions that are not hit that often, I am not too worried about users hitting those conditions too much.
Nested Container Blocks are a different story. I personally use these when authoring documents like this article. While I do not usually go more than two levels deep, they are still Markdown constructs that I use quite often. I would expect that, outside of Markdown power users, most authors will probably encounter the same issues. As such, increasing my confidence on Nested Container Blocks was something I wanted to work on.
What Is the Audience for This Article?¶
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 01 Nov 2021 and 06 Nov 2021.
Issue 90 - Verifying Existing Rules¶
The first part of Issue 90 was a thorough look through the existing tests for Rule Md005, Basically, I was wondering if those tests were really fixed or if it just looked like they were fixed. Probably due to earlier work that I have done in the area, each of these tests worked flawlessly. They worked so well that I started to look at other tests in the area to verify if they were also working properly. An obvious place to look for me to look was Rule Md027. I have been doing a fair amount of work there in the last couple of months, and it is a meaty module with a decent number of moving parts. Basically, a good place to look.
It was hard for me to figure out exactly when I noticed something was wrong and what
the exact thing was that I noticed was wrong, but there was something there. I
could not figure out exactly what it was, only that something was “off”. As I have
valuable experience following my gut feelings about code and finding something,
I decided to try experimenting, and as a result created eight new scenario tests.
To do this, I just started
looking through different test files and started making changes. I then went to
execute those new scenario tests, and a couple of them failed, raising AssertError
exceptions in Rule Md027. Both had the same issue: the Block Quote index
was off.
Taking A Good Look… And Making A Decision¶
Digging into what I had just found, I noticed a couple of things. The first thing was that the handling of the Block Quote index was off. The second was that everything else seemed to be in decent shape. Trying extra variations on nested containers elements and other elements inside of them helped me figure that out. It seemed to be that the index field for the Block Quotes were the only things that were not correct. If that was true, then I could get a lot of mileage out of fixing the index logic.
The further I debugged into the code, the stronger that my confidence in that observation’s accuracy increased. There were a couple of small issues that, if I could get them resolved, should get me most of the way to stabilizing Rule Md027 and Block Quotes. That would also raise my confidence in Nested Container Blocks, so there was a direct payoff. But it was going to be arduous work for a couple of days, instead of addressing other issues. Was the tradeoff worth it?
In the end, I decided yes. Improving my confidence in Nested Container Blocks by adding more scenarios for Block Quote elements with other nested elements was a win for both my confidence and the project. As such, I decided that was the way I wanted the week to play out. For better or worse!
Getting Down To Work¶
Before starting on the actual changes, I wanted to start the week with some cleanup.
The main next_token
function for Rule Md027 was getting big, and since I was going
to be spending the week in that code, I wanted to split it up
to make it more maintainable. To that extent, I took the code to manage
the start of the Block Quotes and moved it into the new __handle_block_quote_start
function, the code to handle the end of the Block Quotes into the new
__handle_block_quote_end
function and so on. It made things
neater and easier to read, even with the many comments with debug code scattered
throughout the module.
After that, the first thing I noticed when I was debugging was that the Blank Line elements were not updating the index properly. From experience, I know that the order of the Blank Line tokens and the end List tokens are backwards from what they are expected to be. It was therefore no surprise to find something like this in the Rule Md027 code. To properly handle these Blank Lines, I figured out that I needed to wait until the right point after the any end List tokens were processed.
To deal with that, once a Blank Line token is found within an active
Block Quote element, I set the new __delayed_blank_line
variable to True
and added the following function:
def __process_delayed_blank_line(
self, context, token, num_container_tokens, allow_block_quote_end
):
if (
self.__delayed_blank_line
and not (
token.is_leaf_end_token
or (allow_block_quote_end and token.is_block_quote_end)
)
and (not self.__have_incremented_for_this_line or token.is_blank_line)
):
self.__have_incremented_for_this_line = False
self.__handle_blank_line(
context, self.__delayed_blank_line, num_container_tokens
)
self.__delayed_blank_line = None
Taking a while to get to this point, the debugging was worth it. Called from the
top of the __handle_within_block_quotes
function and the top of the
__handle_block_quote_end
function, this function delays the processing of a
Blank Line token until after any processing of required end tokens is completed.
The setting of __have_incremented_for_this_line
to False
? I will get to that
in a bit.
Like the logic for the Blank Line tokens, there was also a need to delay the
processing of the end Paragraph token. Working through the debugger and examining
the log files took a while to get right. But going through that, I noticed that
there were a few cases where the incrementing of the Block Quote
index from the end Paragraph token was doubling up with an increment for the token
that followed it. Specifically, this doubling was occurring for start Block Quote
tokens, List Item tokens, and end List tokens. To counter this effect, the
__process_delayed_paragraph_end
function was added that delays the increment
for the end of the Paragraph element until such time as any increments for the
listed tokens are dealt with.
Finally, after all that work to handle those two cases, there were a couple of
boundary cases that needed to be addressed. While not as __have_incremented_for_this_line
simple.
Issue 92 - Staying The Course¶
While on the path of cleaning up Rule Md027, and Nested Blocks in particular, I decided to stay on that path and clean up other code dealing with nested blocks. Having cleaned up the Block Quote index in the last section, I was hoping that I had found all the issues. But if I am honest, I was not convinced that I had. For me to be convinced that I had found them all, I really needed to do a deep dive and start throwing some serious levels of nesting container examples against the parser, and Rule Md027 specifically.
So, starting with the scenario tests and their data, I created forty-five new scenario tests and their test data. To do that, I started with nesting containers within each other up to three levels of nesting. When I was done with that, I added variations on each of the Leaf elements inside of a simple Block Quote element. Finally, I added variations on each of those Leaf elements to embed them within a List element within a Block Quote.
As I worked through those scenarios, I ended up adding eight new tests to the “extra” scenario tests to verify that I had found a new issue. While not related to those eight scenario tests, eight issues were added to the project because of working through those issues. I do not like to see parser issues in general, but it was not strange to me to see them here. As I have said at least a couple of times before, groups of alternating Nested Block elements are one area where I am not convinced that I have found all the issues.
And before I get into the work done to address the issues that I could, I want to point out that I did a bit of work with the debug strings to ensure that they are using f-strings where possible. Normally I do not expect debug strings to stay around for a long time, so I am not too fussy with them. However, with this rule and its complexities, I know that I have at least eight more visits to this code for each of the eight issues that are related to this issue. It just seemed prudent to clean them up if they are staying around for a while.
Refactoring The Code¶
As I was going through the new code for Rule Md027, I noticed that the code
I added for handling start List tokens was almost the same as the code that
I added to handle List Item tokens. While it was not a big effort, I decided
that it was best if I refactored the code into the new __check_list_starts
function.
Upon further examination, it also made sense to create a new __get_current_block_quote_prefix
function and a new __get_last_block_quote
function to deal with the
work in the __check_list_starts
function. This allowed me to streamline
the __check_list_starts
even more, simplifying the code to a good clean
level.
One of the other reasons that I went so far to do some refactoring was that I expect that I am going to need to work with those concepts after I deal with the issues that I found during the recent examination of Rule Md027. With eight issues open, I am confident that at least one or two of them are going to require me to change the code for the rule, and similarly confident that one of those is going to require one of the two new functions. And if I am wrong, the code has been tightened a bit and looks that much better.
Fixing Up The Remaining Scenario Tests¶
With the bulk of the other issues out of the way, the issues that were left were
ones where one of the elements fixed in the earlier sections were also firing
on any of those elements within List elements within Block Quote elements.
Taking a couple of minutes to think about these scenarios clearly, I was relieved that I
came up with a solution that was easy to implement. At the top of the
__handle_within_block_quotes
function, I added the following statement:
is_directly_within_block_quote = self.__container_tokens[
-1
].is_block_quote_start
From there, I added that new variable to the argument list of most of the handlers, using it to opt-out of any reporting of errors. With that in place, I started running tests, and watched as most of the tests started passing. Investigating the failing tests, I started to notice something. A decent number of scenario tests were not passing, even with these changes. What was the issue?
The Issue Was…¶
Lack of scenario coverage. As I mentioned earlier, I do not have confidence that I have all the required scenarios to test nested container blocks. After the work that I did this week, I know I am closer to that goal. However, even after that work, I still believe that I have not found every scenario.
As I know I found some new scenarios with this work, I believe that it was reasonable to expect that some percentage of those scenario tests would fail. And a small percentage did fail. To wrap up the work, I filed eight issues to investigate later. It was not much, but it was more than I had optimistically hoped for. But even though it was more than I had hoped for, I was still optimistic. After everything I have thrown at the parser so far, eight scenario tests failing was still a good metric.
But having met my goal of discovering what I had missed, I now had a good count of the issues to fix. Things were good for now, so I decided to move on and do something different for the last day of the week.
Issue 104 - Dial Home Device¶
Having been a science fiction fan since an early age, when Stargate SG-1 started airing in the last 1990s, I was hooked. Sure, it was a simple premise and did not have the international support of Doctor Who and Star Trek, but it had a certain flair of its own. With the show came its own plot device: the Dial Home Device. Without the correct coordinates to input into this device, the team was locked out of gate travel, stuck wherever they were. And yes, as those were the days of DVDs, I waited to get each season of the series as it came out on DVD, just to see the show and the highlights.
I believe it was there that I got the idea that any kind of decent software should have some way of checking to see if there are any upgrades to it, especially to solve issues. If there is a problem with one of the components on my car, I get mailed a recall notice and I must go to my local dealer to get it fixed. If one of the services is changing its terms, they must send out snail mail or email with the specifics on those changes. Following along from there, why should not an application be able to find out if there is a newer version and suggest upgrading it?
For me, the answer was a simple one. It should. As someone who uses editors like VSCode and Notepad++, repository browsers like SourceTree, and web browsers such as Chrome, I am kind of used to it. With each of those programs, a periodic check is made to figure out if there is an available upgrade. While the actual upgrade is left up to me, I am presented with that information to allow me to know to look and see if the upgrade is beneficial.
Design¶
The first part of the design for this feature was a quite simple rule: unless something was setup exceptionally wrong, any errors should be captured and logged, but should not stop the application. The purpose of this feature is to supply extra information about improvements to itself as an aside, not as a focus. That was the only one iron rule. Other than that, there were three simple parts that quickly fell into place: find the current version of the application, find the published version of the application, and find out how often to check.
Finding the current version was a simple part to design. As the setup code
already had a version of that code implemented, I just needed to make a couple of adjustments to that
code to make it capture all errors. Other than that, it remained pretty much the
same. Finding the published version was almost as easy to design. The pages
over at PyPi.org are static, so scraping the pages once every seven days
should not be a big deal. Since the request
package is one of the common ones,
its interface is very well known, and easy to work with.
That left the how often
question to be designed. Opting for simplicity, I
decided to borrow from greatness and create a .pymarkdownlnt
file in the user
directory with the last checked timestamp. If there were any errors reading this
file and interpreting the timestamp, the code should assume that a check is
needed. Otherwise, a simple comparison between the current timestamp and the
written timestamp will inform the module if seven days have passed since the
last check. To round that out, I decided to add a --force
type flag to the
command line to force the how often
algorithm to skip the check and check
anyway. To balance that out, the --disable
type flag would never check and
leave it up to the user.
Even though the designs of the individual components were simple, I took the time to do my usual due diligence and walk through the scenarios. And even with something so simple, there were a couple of tweaks that I was able to make to the design to make everything work better.
Implementation¶
Given that design, it was simple and methodical to implement. As far as the PyMarkdown core was concerned, this was the interface to it:
def __check_for_current_version(self, args):
if args.disable_version_check:
return
package_name = self.__package_name
if args.x_test_version_fault:
package_name += "xxxxxx"
helper = DialHomeHelper(package_name, PyMarkdownLint.__dial_home_expiry_in_days)
(
current_version,
version_error,
) = helper.get_semantic_version_from_version_module()
assert (
current_version
), f"Version information was not in the expected location: {version_error}"
update_message = helper.verify_version_is_currrent(
current_version, args.force_version_check
)
if update_message:
LOGGER.warning(update_message)
print(update_message)
Following from the design, the only thing that will cause this check to have a
critical error is during the extraction of the version from the PyMarkdown module.
Because this should only happen if something gets very messed up, the assert
statement is there to prevent that. Other than that, either nothing gets printed
out because everything is fine, or a simple warning message is presented to the
user and logged in the log file.
While I will not go into it in detail, the new code is the DialHomeHelper
module
was created and tested to try and ensure that all error paths were thought of
and mitigated. The get_semantic_version_from_version_module
function uses simple
techniques to get the __version__
field from the version.py
module compiled
within the project. Once that version is figured out, the verify_version_is_currrent
function is called to check to see if that version is the same as the version that
is currently registered at PyPi.org for
PyMarkdown.
The code itself is not the most complicated code I have written recently, but it does hit a mark within me for being some of the most satisfying work to complete. As the preface to the section indicates, having a process in place that lets a user know when there is an update available just makes sense to me. This work just made me feel that the project itself was that much more complete. If I had to pin it down to a concrete concept and not a feeling, I would say that I believe that useful applications should be able to dial home. As such, know that PyMarkdown can dial home, it is that much closer to being a useful application.
What Was My Experience So Far?¶
This week was interesting and fulfilling, but in weird and unusual ways. One
of the fun metrics that I keep in my head is that there are now only thirty-three lines
of issues in the Issues List between Priority 2
and Priority 3
. That was a
good improvement. And looking at that section of the list, at least half of the
items there will probably move to the Priority 3
section of the list or
to the GitHub issues list. Not bad. I also added over fifty new scenario
tests to the project and came away with only eight added issues that were not
resolvable right away. Decent for my work over the last week.
But my best output from the week was my improved confidence that I had the right design and right implementation for Nested Container Blocks. Sure, there were eight added issues that I needed to fix, but that is par for the course when talking about software development. But to have most of the 3 level and less variations on Nested Container Blocks working without problems is something solid. More importantly, it helps me reframe my confidence with Nested Container Blocks to a more positive level.
And that is what made this week interesting and fulfilling. Just run of the mill working hard to get things working together properly.
What is Next?¶
Having been highly creative in the last few weeks, I need to take at least a week off any do a tiny bit of brainless stuff. As such, I am going to start looking at a few products that are available to analyze open-source projects. Stay tuned!
Comments
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.