It certainly has been a roller-coaster of a ride since my crash on my side projects at the end of May. My personal life has been great, and my professional life has been great. And while I usually try my best to keep my side projects separate from the rest of my life, I decided to borrow a bit of positivity over the last couple of weeks to get me over the rough spot. Luckily, it worked out well.
Easing Back Into Addressing Issues¶
Looking at the commit list for this article, I was initially a bit disappointed that I have not done more to fix issues since the version 0.9.7 release. After all, it has been three weeks since that happened. But as I sat back and thought about things, it has been a productive few weeks. Most of the work has been about me trying to get back to where I was before, but that still counts as work.
More Testing Of Nested Blocks¶
One easy thing that I did was to go through the tests in the nested_three directory and split them out into two separate files. Initially, I did not think the collection of scenarios would grow to be so large, but it did. Splitting each of those files in half helped a bit, which was good enough for a while. At the very least, it was easier to find things when I was looking for samples by nesting types instead of specific scenarios.
Looking at a series of tests to get cleaned up, I started working on the Block/Block/Ordered tests, otherwise known as the tests in the test_markdown_nested_three_block_block_ordered.py file. Looking at the size of that file, it took me less than five seconds to decide to split it into the test_markdown_nested_three_block_block_ordered_max.py file and the test_markdown_nested_three_block_block_ordered_nomax.py file. That was an easy decision.
Given those files, I worked through and validated every test and filed clear issues for the things that needed to be fixed. That also was an easy decision. While the commit just shows the file being split into two, there was a lot of work that went on to get there. I went through and plugged each of the Markdown inputs into the CommonMark parser, ensuring that the produced HTML output in the tests was correct. Visually, I checked to see if the tokens looked correct, especially noting down a few instances where I thought the tokenization was off a bit. While the main parts of the tokens were all fine, I was looking for issues with the recorded whitespace. There were some surprises, so I noted them down and moved on.
This is where a lot of the initial time in this block of work was spent. During the cleanup noted above, I noticed that there were a couple of experimental tests that were failing. Nothing major, but enough to cause me concern. Both scenarios involved the line following either the main list start or a new list item: a line without enough indent to meet the list's indentation requirements. As an example:
```
> >
>
> 1. list
> item
```
In this case, line 3 starts a new list that is contained within a single block quote. The following line keeps the block quote active (starts are by indent count, not column number), but does not have any indentation to keep the list going. Thankfully, line 4 is treated as a paragraph continuation of the “paragraph” started on line 3, and the list item and the ordered list are both closed after line 4.
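To make the "starts are by indent count, not column number" point concrete, here is a toy helper (not PyMarkdown's actual code) that counts how many block quotes a line keeps open, following the CommonMark rule that each `>` marker may be preceded by up to three spaces and followed by one optional space:

```python
def block_quote_depth(line: str) -> int:
    """Count how many block quotes a line keeps open: each '>' marker may
    be preceded by up to three spaces of indent and may consume one
    optional following space."""
    depth = 0
    index = 0
    while True:
        spaces = 0
        # skip up to three spaces of allowed indentation before a marker
        while index < len(line) and line[index] == " " and spaces < 3:
            index += 1
            spaces += 1
        if index < len(line) and line[index] == ">":
            index += 1
            # one optional space after the marker belongs to it
            if index < len(line) and line[index] == " ":
                index += 1
            depth += 1
        else:
            return depth
```

With the example above, line 4 keeps one block quote open but supplies no extra indentation for the list, so the only way its text survives inside the list item is as a lazy paragraph continuation.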
I cannot remember exactly which one of these situations forced me to look at these combinations, but it was enough to put some work in to find out the health of these scenarios. For the example above, since I was dropping the indent to the level of the list item, I added _drop_ordered to the test function name. The function with the suffix _drop_ordered_block pulled the indents back to the visual level of the block quote character, and the function with the suffix _drop_ordered_block_block removed all indentation.
After adding 185 tests, all the combinations were covered, and only seven of those 185 tests were marked as skipped. It was exhausting, but I had a complete picture of how things looked when dropping the indentation. Part of that was due to some work I did along the way. Those issues were simple enough that I thought it was best to fix them as I went.
Empty Lists And Nested Blocks¶
The first of the two issues that I found and fixed dealt with an empty list item:
```
> > 1.
> > item
```
The example might look simple, but there was an issue. If there is text at the end of line 1, then the text in line 2 becomes part of a paragraph continuation. Because that text is not present, the paragraph continuation does not take effect, leaving item to be part of a paragraph outside of the list. At least, that is what was supposed to be parsed. Due to the benefits of the increased logging that I mentioned a couple of weeks ago, I was able to diagnose this issue rather quickly, making a change to the __calculate_current_indent_level function to properly shut down the list before the text.
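As a toy model of that decision (the real logic lives inside the much larger __calculate_current_indent_level function, and these parameter names are illustrative), the question comes down to whether an under-indented line can lazily continue a paragraph that is already open inside the list:

```python
def list_stays_open(text_indent: int, content_indent: int, paragraph_open: bool) -> bool:
    """Toy model: an under-indented line keeps a list open only if it can
    lazily continue a paragraph already open inside the list. An empty
    list item has no open paragraph, so the list must be closed before
    the following text is processed."""
    if text_indent >= content_indent:
        return True          # properly indented: normal list continuation
    return paragraph_open    # under-indented: only lazy continuation saves it
```

In the example above, the empty `1.` item leaves no paragraph open, so the text on line 2 closes the list and starts a paragraph of its own.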
We All Start Together¶
The second issue was a rather tricky issue with two block quotes and an ordered list all starting on the same line. While the tokens were being parsed correctly, the recombining code from the verification was off. Doing a bit of legwork, I was able to come up with a way to detect the bad whitespace and account for it.
To be honest, the whitespace issue is a bit of a tricky one for me. The fact that I must adjust my recombining code to account for weird situations does not sit well with my values. If possible, I want there to be clear guidelines on how the whitespace gets put back together. If I have those guidelines, I can document them and explain them to anyone who wants to write plugin rules. Right now, I approach that goal, but especially when it comes to containers, I fall short of it.
But for me, that is a slippery slope. Based on a quick scan through the transform_to_markdown.py module, I would guess that approximately 1500 of its 2500 lines are dedicated to handling containers and their special cases. Some of that code is necessary, and some of it is hacks to deal with improper tokenization. However, if all the special cases only deal with whitespace and not the rest of the content of the rehydrated Markdown, I am somewhat okay with that. Sure, I would like to remove it, but I am not sure if the benefit is there to justify the cost. At least, not yet.
While I was able to fix a couple of instances of whitespace issues, there were other instances where I knew I was going to need more than fifteen minutes to diagnose the failures and fix them. With seven tests to address, it took a bit of time to find the first case, and luckily enough, the rest of the cases were all variations of the first.
In all cases, the parser was applying the leading space twice, resulting in double the whitespace appearing in the rehydrated Markdown. The failures all tracked down to one of two scenarios. The first was that the leading space was being stored in the list token, but the internal variable representing the line was not being updated to account for that. Without that update, the parser saw the whitespace and rightfully added it to the text. The other scenario was that the string was fine, but the index pointing into the string was not properly adjusted. Different cause, but same effect.
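Both scenarios boil down to the same invariant: when leading whitespace is captured into a token, the position tracking must move past it in the same step. A minimal sketch of that pairing (the function name is hypothetical, not the parser's actual code):

```python
def capture_leading_space(line: str, index: int):
    """Capture the leading spaces at `index` and return them together with
    the advanced index. The bugs described above were two ways of breaking
    this pairing: storing the whitespace but leaving the line (or the
    index) unchanged, so downstream code emitted the whitespace a second
    time."""
    start = index
    while index < len(line) and line[index] == " ":
        index += 1
    return line[start:index], index
```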
After going through those fixes, I looked at the other scenario tests that were currently disabled and ran through them to see if they were fixed or easy to fix. The first one I addressed was test test_list_blocks_271a, followed by its related scenario tests. While these tests were not immediately fixed by the previous work, they looked close enough that it made sense to fix them.
Following good practices, I went through each of the tokens by hand and verified that they were correct, or at least looked correct. After doing silly things such as counting the number of > characters in the whitespace area of the tokens, everything looked good. Looking at the debugging output from the Markdown rehydration, it looked okay as well, but something was off.
It took me about an hour of work, including some futzing on a project outside and grabbing some food, before I figured it out. In certain cases, when the container text was being added back into the Markdown, the index into the container token's whitespace was off. And that little discrepancy was just enough to point to the wrong whitespace part, which in turn added the wrong whitespace to the rehydrated Markdown.
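A simplified model of that discrepancy (the class and field names are illustrative, not PyMarkdown's): a container token stores one leading-whitespace entry per line it owns, and rehydration must consume exactly one entry per emitted line.

```python
class ContainerTokenSketch:
    """Simplified container token: leading_spaces holds one whitespace
    prefix per line the container owns."""

    def __init__(self, leading_spaces):
        self.leading_spaces = leading_spaces
        self.rehydrate_index = 0

    def next_leading_space(self) -> str:
        """Each rehydrated line consumes exactly one entry. If this index
        drifts by even one, every later line gets pasted back together
        with the wrong whitespace prefix."""
        space = self.leading_spaces[self.rehydrate_index]
        self.rehydrate_index += 1
        return space
```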
Supporting Alternate Extensions¶
Looking over the work to do in the issues list, I reviewed each issue and came across this request from a user. In his case, due to a preprocessing need, the Markdown he wants to scan is in a file that has a different extension than .md. As such, he asked if it would be possible to support alternate extensions.
I do remember thinking "why doesn't he just change the extension, scan, and then change it back?" I also remember following up that thought with "I don't think I would do that unless I REALLY had to." As such, I started working on support for alternate extensions. It is currently in the main code base with the --alternate-extensions argument, and I hope to get some time next week to properly document it for the next release. It is just a simple argument that takes a comma-separated list of extensions, with a default of .md. It was easy to write, easy to test, and hopefully the user will like it.
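A minimal sketch of how such an argument could be wired up with argparse (this is an illustration under those assumptions, not the project's actual command-line code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an --alternate-extensions argument that accepts
    a comma-separated list of extensions, defaulting to '.md'."""
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    parser.add_argument(
        "--alternate-extensions",
        default=".md",
        help="comma-separated list of file extensions to treat as Markdown",
    )
    return parser


def allowed_extensions(raw_value: str) -> list:
    """Split the raw comma-separated value, normalizing any missing dot."""
    return [
        ext if ext.startswith(".") else "." + ext
        for ext in raw_value.split(",")
    ]
```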
Despise My Earlier Opinion¶
Despite what I had thought previously, I believe I did have a good couple of weeks fixing issues. It was not glamorous and exciting, but there was a lot of good, solid work.
The one thing that I want to call out is the set of changes I made to the Container Block Processor and the grab bag object. When I added support for that object a couple of weeks ago, I was not sure how much difference it would make. I had hoped it would make a significant difference, but I was not sure.
I am now sure. Instead of having to add debug statements to keep track of variable state, it was all there in the log file. It is a small amount of work to look back in the log file to see what the last value is. But that amount of work is nothing compared to adding in a log statement, making sure it has the right information, and running the tests again to see what the value is. In my opinion, it was just an order of magnitude better.
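To show why that pays off, here is a toy version of such a "grab bag" object (the field names are made up for illustration): because every piece of per-parse state lives in one object, a single log statement dumps all of it instead of needing one statement per variable.

```python
import logging


class GrabBagSketch:
    """Toy 'grab bag': all per-parse state in one object, so one log
    statement captures everything at once."""

    def __init__(self):
        self.line_number = 0
        self.indent_level = 0
        self.container_depth = 0

    def __str__(self):
        # dump every field, sorted by name, in one line
        return ", ".join(
            f"{name}={value}" for name, value in sorted(vars(self).items())
        )


bag = GrabBagSketch()
bag.line_number = 4
logging.getLogger(__name__).debug("state: %s", bag)
```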
A Final Note¶
While it is true that I do the work on these projects and these articles, it is a group of people that help me in many ways that enable me to do these things. I have known for 51 years a certain guy who, despite our issues in our first twenty years, has stuck by me and I by him. We have been through some rough patches together, but I know I can just send an email, a text, or a phone call, and he will listen with a wisdom that I would have never guessed at. Yup, I am talking about my brother, Mike.
I am proud to say that I am going to be spending time with that gentleman and his fiancé this weekend, as they tie the knot and make it official. And as Mike is family and a dear friend who I have not seen since this pandemic thing started, my wife and I are going to take some time to enjoy the socialization. Best wishes to Mike, and I will be back in a couple of weeks! Stay Tuned!
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.