Summary

In my last article, I talked about my struggles to regain my confidence, since my crash at the end of May. This week, I got back to work and was able to deal with issues on my PyMarkdown project.

Introduction

It certainly has been a roller-coaster of a ride since my crash on my side projects at the end of May. My personal life has been great and my professional life has been great. And while I usually try my best to keep my side projects separate from the rest of my life, I decided to borrow a bit of positivity over the last couple of weeks to get me over the rough spot. Luckily, it was with good results.

Easing Back Into Addressing Issues

Looking at the commit list for this article, I was initially a bit disappointed that I have not done more to fix issues since the version 0.9.7 release. After all, it has been three weeks since that happened. But as I sat back and thought about things, it has been a productive few weeks. Most of the work has been about me trying to get back to where I was before, but that still counts as work.

More Testing Of Nested Blocks

One easy thing that I did was to go through the tests in the nested_three directory and split them out into two separate files. Initially, I did not think the collection of scenarios would grow to be so large, but it did. Splitting each of those files in half helped a bit, which was good enough for a while. At the very least, it was easier to find things when I was looking for samples by nesting types instead of specific scenarios.

Looking at a series of tests to get cleaned up, I started working on the Block/Block/Ordered tests, otherwise known as the tests in the test_markdown_nested_three_block_block_ordered.py file. Looking at the size of that file, it took me less than five seconds to decide to split it into the test_markdown_nested_three_block_block_ordered_max.py file and the test_markdown_nested_three_block_block_ordered_nomax.py file. That was an easy decision.

Issue 408 - Cleaning Up Tests

Given those files, I worked through and validated every test and filed clear issues for the things that needed to be fixed. That also was an easy decision. While the commit just shows the file being split into two, there was a lot of work that went on to get there. I went through and plugged each of the Markdown inputs into the CommonMark parser, ensuring that the produced HTML output in the tests was correct. Visually, I checked to see if the tokens looked correct, especially noting down a few instances where I thought the tokenization was off a bit. While the main parts of the tokens were all fine, I was looking for issues with the recorded whitespace. There were some surprises, so I noted them down, and moved on.

Issue 413 - Filling Out The Scenarios

This issue is where a lot of the initial time was spent on this block of work. During the cleanup noted above, I noticed that there were a couple of experimental tests that were failing. Nothing major, but enough to cause me concern. Both scenarios involved the line following either the main list start or a new list item, and a line with not enough indent to meet the list constraints. As an example:

>
> >
>   1. list
  > item

In this case, line 3 starts a new list that is contained within a single block quote. The following line keeps the block quote active (starts are by indent count, not column number), but does not have any indentation to keep the list going. Thankfully, line 4 is treated as a paragraph continuation of the “paragraph” started on line 3, and the list item and the ordered list are both closed after line 4.

I cannot remember exactly which one of these situations forced me to look at these combinations, but it was enough to put some work in to find out the health of these scenarios. For the example above, since I was dropping the indent to the level of the list item, I added _drop_ordered to the test function name. The function with the suffix _drop_ordered_block took back the indents to the visual level of the block quote character and the function with the suffix _drop_ordered_block_block removed all indentation.

After adding 185 tests, all the combinations were covered and only seven tests in those 185 were marked as skipped. It was exhausting, but I had a complete picture of how things looked with dropping of the indentation. Part of that was due to some work I did along the way. Those issues were simple enough that I thought it was best to fix them as I went.

Empty Lists And Nested Blocks

The first of the two issues that I found and fixed dealt with an empty list item:

   >    >    1.
   >    >    item

The example might look simple, but there was an issue. If there is text at the end of line 1, then the text in line 2 becomes part of a paragraph continuation. Because that text is not present, the paragraph continuation does not take effect, leaving the text item to be part of a paragraph outside of the list. At least, that is what was supposed to be parsed. Due to the benefits from the increased logging that I mentioned a couple of weeks ago, I was able to diagnose this issue rather quickly, making a change to the __calculate_current_indent_level function to properly shut down the list before the text item.

We All Start Together

The second issue was a rather tricky issue with two block quotes and an ordered list all starting on the same line. While the tokens were being parsed correctly, the recombining code from the verification was off. Doing a bit of legwork, I was able to come up with a way to detect the bad whitespace and account for it.

To be honest, the whitespace issue is a bit of a tricky issue for me. The fact that I must adjust my recombining code to account for weird situations does not sit well with my values. If possible, I want there to be clear guidelines on how the whitespace gets put back together. If I have those guidelines, I can document them and explain them to anyone who wants to write plugin rules. Right now, I come close to that goal, but especially when it comes to containers, I fall short of it.

But for me, that is a slippery slope. Based on a quick scan through the transform_to_markdown.py module, I would guess that approximately 1500 lines of the 2500 lines are dedicated to handling containers and their special cases. Some of that code is necessary, and some of that code is hacks to deal with improper tokenization. However, if all the special cases only deal with whitespace and not the rest of the content of the rehydrated Markdown, I am somewhat okay with that. Sure, I would like to remove it, but I am not sure if the benefit is there to justify the cost. At least, not yet.

Issue 410 - Cleaning Up Whitespace

While I was able to fix a couple of instances of whitespace issues, there were other instances where I knew I was going to need more than fifteen minutes to diagnose the failures and fix them. With seven tests to address, it took a bit of time to find the first case, and luckily enough, the rest of the cases were all variations of the first.

In all cases, the parser was applying the leading space, resulting in twice the whitespace appearing in the rehydrated Markdown. They all tracked down to one of two scenarios. The first was that the leading space was being stored in the list token, but the internal variable representing the line was not being updated to account for that. Without that updating, the parser saw the whitespace and rightfully added it to the text. The other scenario was that the string was fine, but the index pointing into the string was not properly adjusted. Different cause, but same effect.
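To make the first scenario concrete, here is a minimal sketch of how storing the whitespace in a token without updating the line doubles it on the way back out. The ListToken class and both functions are hypothetical toys written for this illustration; they are not the actual PyMarkdown code.

```python
from dataclasses import dataclass


@dataclass
class ListToken:
    """Toy stand-in for a list token that records the whitespace it consumed."""

    leading_space: str


def parse_line(line: str, update_line: bool) -> tuple[ListToken, str]:
    """Split a line into a token holding the leading whitespace and the text.

    When update_line is False, the whitespace is stored in the token but the
    line variable is left untouched, so the text still carries the whitespace.
    """
    stripped = line.lstrip(" ")
    leading = line[: len(line) - len(stripped)]
    token = ListToken(leading_space=leading)
    text = stripped if update_line else line
    return token, text


def rehydrate(token: ListToken, text: str) -> str:
    """Rebuild the Markdown line from the token and the text."""
    return token.leading_space + text


original = "   item"
good = rehydrate(*parse_line(original, update_line=True))
bad = rehydrate(*parse_line(original, update_line=False))
```

With the line updated, good matches the original. Without the update, bad comes back with six leading spaces instead of three, which is the "twice the whitespace" symptom described above.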

Issue 420 and Issue 421 - Revisiting Old Friends

After going through those fixes, I looked at the other scenario tests that were currently disabled and ran through them to see if they were fixed or easy to fix. The first one I addressed was test test_list_blocks_271a, followed by tests test_list_blocks_271c, test_list_blocks_270c, and test_list_blocks_270a. While these tests were not at once fixed by the previous work, they looked like they were close enough that it made sense to fix them.

Following good practices, I went through each of the tokens by hand and verified that they were correct, or at least looked correct. After doing silly things such as counting the number of > characters in the whitespace area of the tokens, everything looked good. Looking at the debugging from the Markdown rehydration, it looked okay as well, but something was off.

It took me about an hour of work, with futzing on a project outside and grabbing some food, before I figured it out. In certain cases, when the container text was being added back into the Markdown, the index into container token whitespace was off. And that little discrepancy was just enough to point to the wrong whitespace part, which in turn added the wrong whitespace to the rehydrated token.
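A small sketch shows why this kind of bug is so quiet. It assumes, purely for illustration, that a container token keeps the prefix for each line as one newline-separated string; the function name and the sample data are made up for this example.

```python
def rehydrate_line(leading_spaces: str, line_index: int, text: str) -> str:
    """Prefix text with the container whitespace recorded for one line.

    leading_spaces is assumed to hold one prefix per line, joined by newlines.
    """
    parts = leading_spaces.split("\n")
    return parts[line_index] + text


# Prefixes recorded for three consecutive lines inside nested containers.
recorded = "> \n> > \n>   "

correct = rehydrate_line(recorded, 1, "second")
off_by_one = rehydrate_line(recorded, 2, "second")
```

Nothing fails when the index is off by one. The text and the tokens both look fine, and the only visible difference is that the rebuilt line starts with the prefix belonging to its neighbor.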

Issue 407 - Adding Alternate Extensions

Looking over the work to do in the issues list, I reviewed each issue and came across this request from a user. In his case, due to a preprocessing need, the Markdown he wants to scan is in a file that has a different extension than .md. As such, he asked if it would be possible to support alternate extensions to scan.

I do remember thinking “why doesn’t he just change the extension, scan, and then change it back?” I also remember me following up that thought with “I don’t think I would do that unless I REALLY had to.” As such, I started working on supplying support for alternate extensions. It is currently in the main code base with the --alternate-extensions argument, and I hope to get some time next week to properly document it for the next release. It is just a simple argument that takes a comma-separated list of extensions, with a default of .md. It was easy to write, easy to test, and hopefully the user will like it.
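For anyone curious what an argument like that can look like, here is a minimal sketch using Python's argparse module. It is not the code in the PyMarkdown repository: the parse_extensions helper, the validation rule, and the scan-sketch program name are all assumptions made for this illustration.

```python
import argparse


def parse_extensions(value: str) -> list[str]:
    """Turn '.md,.qmd' into ['.md', '.qmd'], rejecting malformed entries."""
    extensions = [part.strip() for part in value.split(",")]
    for extension in extensions:
        # Assumed rule: each entry is a period followed by at least one character.
        if len(extension) < 2 or not extension.startswith("."):
            raise argparse.ArgumentTypeError(
                f"Extension '{extension}' must start with a period."
            )
    return extensions


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a single comma-separated extensions argument."""
    parser = argparse.ArgumentParser(prog="scan-sketch")
    parser.add_argument(
        "--alternate-extensions",
        dest="alternate_extensions",
        type=parse_extensions,
        default=[".md"],
        help="comma-separated list of file extensions to scan",
    )
    return parser


args = build_parser().parse_args(["--alternate-extensions", ".md,.qmd"])
defaults = build_parser().parse_args([])
```

Parsing the value in a type function keeps the validation next to the argument definition, so a bad entry is reported by argparse itself before any scanning starts.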

Despite My Earlier Opinion

Despite what I had thought previously, I believe I did have a good couple of weeks fixing issues. It was not glamorous and exciting, but there was a lot of good, solid work.

The one thing that I want to call out is the set of changes I made to the Container Block Processor and the grab bag object. When I added support for that object a couple of weeks ago, I was not sure how much difference it would make. I had hoped it would make a significant difference, but I was not sure.

I am now sure. Instead of having to add debug statements to keep track of variable state, it was all there in the log file. It is a small amount of work to look back in the log file to see what the last value is. But that amount of work is nothing compared to adding in a log statement, making sure it has the right information, and running the tests again to see what the value is. In my opinion, it was just an order of magnitude better.
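The general idea can be sketched in a few lines. This is only an illustration of the pattern, assuming an object that logs every assignment made to it; the GrabBag class, the logger name, and the attribute names below are hypothetical and are not the real implementation.

```python
import logging

LOGGER = logging.getLogger("grab_bag_sketch")


class GrabBag:
    """Holds parser state and logs every assignment at DEBUG level."""

    def __setattr__(self, name: str, value: object) -> None:
        # Record the new value before storing it, so the log always shows
        # the most recent state of each variable.
        LOGGER.debug("grab_bag.%s = %r", name, value)
        super().__setattr__(name, value)


bag = GrabBag()
bag.current_indent = 3
bag.line_text = "item"
```

Because the logging lives in one place, every variable moved into the object gets its history recorded for free, with no extra debug statements scattered through the parser.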

A Final Note

While it is true that I do the work on these projects and these articles, it is a group of people that help me in many ways that enable me to do these things. I have known for 51 years a certain guy who, despite our issues in our first twenty years, has stuck by me and I by him. We have been through some rough patches together, but I know I can just send an email, a text, or a phone call, and he will listen with a wisdom that I would have never guessed at. Yup, I am talking about my brother, Mike.

I am proud to say that I am going to be spending time with that gentleman and his fiancé this weekend, as they tie the knot and make it official. And as Mike is family and a dear friend who I have not seen since this pandemic thing started, my wife and I are going to take some time to enjoy the socialization. Best wishes to Mike, and I will be back in a couple of weeks! Stay Tuned!


