In my last article, I talked about how my personal life is encountering issues that I need to take extra time to deal with. This week, I am talking about the continuing work on the PyMarkdown project to get it closer to a solid release.
Things Just Take Longer
With all the easy items out of the way, the items that I do pick up seem to take more time. But when I think about that, it does make sense. There is more experimentation needed, more debugging needed, and often a lot more thinking about the problem is needed. And while I wish I could get through to the finished stage without as much “experimental failure”, I know that it is part of the process, and just comes with the territory.
Okay, maybe a better title would be helpful. But honestly, it was hard to try and summarize it into four words or less. This was an issue that I found when I was doing experimentation with the last batch of nested container tests. Just for laughs, I added an extra space after an empty list start (- on its own line) expecting everything to be fine. It was not, hence, the issue.
To start with, I was having a bit of an issue visualizing the code for handling leaf nodes, which is what I originally thought the problem was. To help myself, I decided to split the paragraph handling functions out into their own module, since I have been thinking about doing that for a while. Did not help me solve the issue, but it was another small thing off the list.
After that, I went back and forth between the list processor and the leaf processor, trying to figure out where the issue was. I would spend a couple of hours on one, document for myself what I found during those hours, and revert the changes for the next day. A couple of iterations into this, I found something interesting. I had been convinced that it was not the leaf processing of the paragraphs, but I was starting to doubt that. It was just the way the variables were changing that led me to double check my assumptions. And behold, all it took was one slight change to the call to the parser_state.close_open_blocks_fn function within the function, setting the include_block_quotes parameter to True. And it worked.
Hrm. For those who may not be aware, when I do something I consider stupid, I say something like “hrm” which is effectively me saying the word “hum” with my mouth closed. It is my way of asking myself “why did I not see that before?” In this case, when I moved the paragraph code over, I did some checks to see if it could be that code, and determined the answer was no. Instead of including it in the code paths that I reverified, I assumed it was good and left it out. As I said at the start of the paragraph, hrm.
I logged this issue while doing the previous work on the nested containers, and I filed it only because I thought there might be a problem. To be blunt, during that work, I was happy to log anything that might be a problem after doing only a small amount of work to show that it was probably broken.
Specifically, this issue dealt with pairs of the nested container scenarios that included dropping of container blocks for the next line. During the debugging phase, it looked like the tokens produced after dropping the outermost container block on the next line were the same as for keeping the block in the Markdown. It just looked weird, so I decided to save it for later when I could give it the time that I thought it needed.
I was probably deep in thought when I logged this issue, as the token streams mentioned in the issue were in fact different. The normal case had three space characters in the paragraph token, and the drop case did not have those characters, as I would have expected all along. But, in fairness, I did log this issue thinking that there may be an issue, and I wanted to check it. Just to be sure, I spent time looking at the code, verifying that result. After I made assumptions with the last issue, I did not want to repeat that!
This issue was an interesting one in that I had to really dig into some areas of the rule to make sure that I had made the right decision. As the comments for this issue show, the submitter thought that the Markdown:
1. Ordered item - Sub unordered item
with a command line of --set=plugins.md007.indent='$#4' should not trigger the rule as it did:
test.md:7:5: MD007: Unordered list indentation [Expected: 4, Actual=5] (ul-indent)
To save readers the trouble of looking up the documentation for the rule, the indent configuration informs rule Md007 that indents for unordered lists should occur every 4 characters. And with an apparent indent of 5, it did not look right. So, to be honest, I agreed with him. I have been looking at this issue on and off since it was logged in February, but I never really dedicated time to figuring it out, so I was never sure whether the rule's behavior was correct or not.
In taking a concentrated look at this issue, the first thing that I noticed was that the tokens looked off. Upon further examination, the parsing of the Markdown text that exists before the above sample was not closing both lists, only the outermost list. As a result, the two lists started with the above sample were considered to be a second and third level list, not a first and second level list. Making some changes to the __close_next_level_of_lists function to properly close the lists solved that issue, but did not solve the main question: was the triggered rule correct?
After going outside and doing some yardwork, I came back inside with a nice cool glass of water and started to look at the problem again. Keeping in mind that assumptions got me in trouble before, I decided to throw them all out and start fresh. It was then that I re-read the documentation for the rule and came across the following text near the end:
The original rule did not work for Unordered List elements within Ordered List elements. For example, the original rule does not fire on the following sample:
along with a sample that includes a pair of unordered list elements within an ordered list. Perfect for this case!
Digging into the code for the rule a bit more, the changes I made to support this rule firing within an ordered list item became clear. If the new ordered list item was contained in anything other than an ordered list, it reset the depth to 0. In the case of the above example, the rule considered the depth of the unordered list item to be 0. It took me a while to get there, but I agreed that the rule was correct. But how to change the documentation?
To properly document this issue, I added that information to the documentation in a new section named Notes. I am probably going to go back and see if I can write a better description in a while, but I think it is good enough for now.
One of the users reached out with Issue 382, asking if it would be possible to use the PyMarkdown linter on Markdown within a Jupyter notebook. Working with him, I was able to get the context that one of the types of information within the notebook is the simple Markdown cell, which usually contains notes or instructions for the reader of the notebook.
To make a move in the direction of making that possible, I knew that the first thing that I needed to do was to uncouple the linting engine from the file system. At that point, the PyMarkdown project only worked on existing text files with the correct extension. While I am not sure yet what the best way to support Jupyter notebooks is, I do know that it will probably involve passing the cells into the linter without writing a file. Or at the very least, I want to have that option available.
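To picture what "passing the cells into the linter without writing a file" could look like, here is a minimal sketch. It is not the project's code and not a decision about how notebook support will work. It assumes the version 4 notebook JSON layout (a top-level cells list whose entries carry cell_type and source), and the names iter_markdown_cells, lint_notebook, and lint_text are hypothetical stand-ins for whatever string-based entry point the linter ends up exposing.

```python
import json
from typing import Callable, Iterator, List


def iter_markdown_cells(notebook_json: str) -> Iterator[str]:
    """Yield the text of each Markdown cell found in the notebook JSON."""
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # The notebook format allows "source" to be one string or a list of lines.
        yield source if isinstance(source, str) else "".join(source)


def lint_notebook(
    notebook_json: str, lint_text: Callable[[str], List[str]]
) -> List[str]:
    """Apply a hypothetical string-based lint function to every Markdown cell.

    The index in each finding counts Markdown cells only, not all cells.
    """
    findings: List[str] = []
    for index, text in enumerate(iter_markdown_cells(notebook_json)):
        findings.extend(f"markdown cell {index}: {msg}" for msg in lint_text(text))
    return findings
```

The point of the sketch is only that nothing in it touches the file system once the notebook text is in memory, which is the kind of uncoupling described above.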
I started work on this on Sunday morning, and after working on that around other things I needed to do, I came to a startling conclusion: I loved the argparse library, but I also hated it. I had spent over four hours trying to get my way with the library, with so many Google and Bing searches that I lost count. It was frustrating because I had the actual standard input handling written; I just needed to finish the work on the command line.
All I wanted to do was to have groups of options that were mutually exclusive from each other. Ideally, I wanted an -s option to trigger the reading from standard input, raising an error if any of the file-based options were specified. From a concept point of view, it was clear: if I was using -s to read from standard input, I did not need options like -r to specify recursion through the directories.
The implementation was another story. I would like to think that I came close to the answer a few times, but that is probably just my ego talking. Using the argparse library, it is possible to specify that single options are mutually exclusive, such as -s and -r. However, that is as far as it goes. I found, through trial, error, and Google, that argument groups of any kind do not nest. As such, I could not tell it that -s was mutually exclusive with a specified group of arguments.
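For reference, the part that argparse does handle looks roughly like the sketch below. It is an illustration only: the long option names --stdin and --recurse and the function name build_exclusive_parser are assumptions made for the example, not the project's actual command line.

```python
import argparse


def build_exclusive_parser() -> argparse.ArgumentParser:
    """Build a parser where two single options exclude each other."""
    parser = argparse.ArgumentParser(prog="sketch")
    group = parser.add_mutually_exclusive_group()
    # Long option names are hypothetical; only -s and -r come from the article.
    group.add_argument("-s", "--stdin", action="store_true",
                       help="read from standard input")
    group.add_argument("-r", "--recurse", action="store_true",
                       help="recurse through directories")
    return parser


# A single option from the group parses normally.
args = build_exclusive_parser().parse_args(["-s"])
print(args.stdin, args.recurse)
```

Passing both -s and -r to this parser makes argparse report an error and exit. What the sketch cannot express is "-s versus a whole group of file-based options", which is the limitation described above.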
While it was not my first pick, I decided to go with a subcommand for the argument parsing implementation. I do not think it is as elegant as my proposed solution, but it works. Instead of using the scan command and its arguments, I created a new subcommand scan-stdin that had no arguments. It seems a bit awkward from my point of view, but it accomplishes the goals, which is the important part.
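The general shape of that subcommand approach can be sketched with argparse as follows. The subcommand names scan and scan-stdin and the -r option come from the description above; everything else (the paths positional, the --recurse long name, the function name) is assumed for the sake of a runnable example and is not the project's real parser.

```python
import argparse


def build_subcommand_parser() -> argparse.ArgumentParser:
    """Build a parser with a file-based subcommand and a stdin subcommand."""
    parser = argparse.ArgumentParser(prog="sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # File-based scanning keeps its own options (hypothetical subset).
    scan = subparsers.add_parser("scan", help="scan files and directories")
    scan.add_argument("-r", "--recurse", action="store_true")
    scan.add_argument("paths", nargs="+")

    # Standard input scanning takes no arguments at all.
    subparsers.add_parser("scan-stdin", help="scan text from standard input")
    return parser


args = build_subcommand_parser().parse_args(["scan", "-r", "docs"])
print(args.command, args.recurse, args.paths)
```

Because scan-stdin defines no options of its own, passing something like -r after it is rejected by argparse as an unrecognized argument, which gives the "error if any file-based option is specified" behavior without needing nested groups.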
Thinking About the Notebook
I am still thinking about how to do the notebook. I am sure that I am going to find a solution for the user, I am just not sure what it is yet. I know that I now have more options since I have added standard input support, but I will be taking time this week to think through it properly and talk with the user.
It was frustrating at times, and I feel like I could have gotten more done, but it is still a good feeling to knock some items off the list!
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.