Summary¶
In my last article, I talked about the work I put into getting Rule Md027 implemented. In this article, I talk about the next three rules I worked on.
Introduction¶
It felt good to get Rule Md027 off of the Issues List. I hated to put it on there a couple of weeks ago, but it was the right call at the time. And since I had to tackle it at some point, reserving a week to work on it and deal with it last week was also the right call.
But with that work now completed, I needed to get back on track. And the closer I am getting to having all the rules implemented, the more I want them to be done. I know that is normal to feel that way, but I need to make sure I temper that feeling with patience and keep following the rules that got me here.
Once more into the breach I go!
What Is the Audience for This Article?¶
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 18 Aug 2021 and 22 Aug 2021.
Fenced Code Block Elements and Blank Lines¶
Having taken the entirety of last week to finish Rule Md027, I was hoping that I could get this rule completed in less time than one week. Just from looking at the rule, it sure looked a lot simpler. Instead of a whole bunch of complicated stuff, just make sure that any Fenced Code Block elements are surrounded by at least one Blank Line.
The rule looked simple. But was it really going to be simple?
Design¶
Based on my initial looks at this rule, I determined that there were going to be two groups of elements that I needed to deal with: the ones that do not cross container boundaries and the ones that do.
As soon as I figured out that the second group would cross container boundaries, I knew that I would need to keep track of the container tokens and which of those tokens was active. Thanks to Rule Md027 and the other rules before it, I had some good design and code that I could reuse. That was one part of the design taken care of. Now on to the main part of the design: dealing with both groups.
Designing for that first group was easy. For the opening of the Fenced Code Block element, keep track of the token used the last time the next_token function was called. If a start Fenced Code Block token is encountered, check that “last” token to see if it is a Blank Line token. The closing sequence is almost the same but in reverse. Once the end Fenced Code Block token is encountered, set a variable to keep on looking for tokens. In the next pass through the next_token function, check to see if that token is a Blank Line token.
Logically, those two patterns came easy to me. Track the Blank Lines in case we find a start Fenced Code Block element that needs them and track the Blank Lines after we found an end Fenced Code Block element. Both made sense. That was easy… almost too easy. Then the design for the second group came into focus.
For the second group, that design was going to require adjustments to the existing design. For the opening part of the design, the “last” token variable should only be set if the token is not related to a container open or closing. In that way, it can span those container boundaries without any issues. After I ran through some simple examples in my head, I was sure that I had the right design for the opening part.
For the closing part, I started thinking that the same consideration applies. But after some thought, I realized it did not. From the point of view of the Fenced Code Block element, if it exists inside of a container element, it makes sense to look past any end tokens to determine whether any Blank Lines follow the Fenced Code Block element. The element does not care if those end tokens are there, as they simply offer a logical encapsulation of an element. In most cases, the end tokens do not represent actual text in the document. Thus, if a new element and its token are present instead of the required Blank Line, it should trigger the rule.
Taking a second look at those designs, things looked solid. It was time to move on.
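The design for the first group (no containers) can be sketched on its own. This is only an illustration under stated assumptions: the `Tok` class and its `kind` labels are hypothetical stand-ins for the parser's tokens, and the function returns indices instead of reporting errors through the rule framework.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a parser token; `kind` is an assumed label."""

    kind: str  # e.g. "para", "end-para", "blank", "fence", "text", "end-fence"

    @property
    def is_end(self) -> bool:
        return self.kind.startswith("end-")


def find_fence_violations(tokens):
    """Return indices of fence tokens that lack a neighboring blank line."""
    violations = []
    last_non_end = None  # most recent token that was not an end token
    awaiting_after_fence = None  # index of an end fence whose follower is unchecked

    for index, tok in enumerate(tokens):
        if tok.kind == "fence":
            # Before case: the last non-end token must be a blank line.
            if last_non_end is not None and last_non_end.kind != "blank":
                violations.append(index)
        elif tok.kind == "end-fence":
            # After case, step 1: remember that the next token must be checked.
            awaiting_after_fence = index
        elif awaiting_after_fence is not None:
            # After case, step 2: the token following the end fence.
            if tok.kind != "blank":
                violations.append(awaiting_after_fence)
            awaiting_after_fence = None
        if not tok.is_end:
            last_non_end = tok
    return violations


bad = [Tok(k) for k in ["para", "end-para", "fence", "text", "end-fence", "para", "end-para"]]
good = [Tok(k) for k in ["para", "end-para", "blank", "fence", "text", "end-fence", "blank", "para", "end-para"]]
print(find_fence_violations(bad))
print(find_fence_violations(good))
```

In the first stream both the opening and the closing fence are flagged; in the second, the surrounding blank tokens satisfy both checks.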
Testing and Implementation¶
After the massive 85 scenario tests last week for Rule Md027, I was happy when I counted the scenario tests for this rule and ended up with 16. They started with the simple tests without containers and quickly morphed into those simple tests with containers. Sixteen scenario tests was a quantity that I could easily deal with.
def next_token(self, context, token):
    if token.is_fenced_code_block:
        self.__handle_fenced_code_block(context, token)
    elif token.is_fenced_code_block_end:
        self.__end_fenced_code_block_token = token
    elif self.__end_fenced_code_block_token:
        self.__handle_end_fenced_code_block(context, token)

    if not token.is_end_token:
        self.__last_non_end_token = token
Starting to work on the rule implementation, getting the tests for the first group passing was the priority. Following the design, I created a simple function that dealt with the before and after cases properly, as well as setting the last token.
Handing off the processing of the before case to the __handle_fenced_code_block function, that function proved easy to implement:
def __handle_fenced_code_block(self, context, token):
    if (
        self.__last_non_end_token
        and not self.__last_non_end_token.is_blank_line
    ):
        self.report_next_token_error(context, token)
Just like the design specified, if the last token before the start Fenced Code Block token is not a Blank Line, trigger the rule. Furthermore, because the design stipulates that end tokens were not important, only non-end tokens will be in the __last_non_end_token variable.
While the before case with the first function was easy to implement, the handling of the after case and the __handle_end_fenced_code_block function was a bit trickier:
def __handle_end_fenced_code_block(self, context, token):
    if not token.is_blank_line:
        line_number_delta = self.__last_non_end_token.token_text.count("\n") + 2
        column_number_delta = (
            self.__end_fenced_code_block_token.start_markdown_token.column_number
        )
        if (
            self.__end_fenced_code_block_token.start_markdown_token.extracted_whitespace
        ):
            column_number_delta -= len(
                self.__end_fenced_code_block_token.start_markdown_token.extracted_whitespace
            )
        if self.__end_fenced_code_block_token.extracted_whitespace:
            column_number_delta += len(
                self.__end_fenced_code_block_token.extracted_whitespace
            )
        column_number_delta = -(column_number_delta)
        self.report_next_token_error(
            context,
            self.__end_fenced_code_block_token.start_markdown_token,
            line_number_delta=line_number_delta,
            column_number_delta=column_number_delta,
        )
    self.__end_fenced_code_block_token = None
The detection part of the function was straightforward, taken care of by the first statement and the last two statements of the function. The focus for this function was calculating the deltas to apply to the token’s line number and column number. As end tokens do not contain any position information, the position of the end token needs to be reconstructed using information from the start Fenced Code Block token and the Text token within the block. Once that calculation was performed, a small variance was needed to alter the column delta to compensate for any indent of the end Fenced Code Block token, and it was done.
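To make the delta arithmetic concrete, here is the same calculation with plain values instead of tokens. The function name and its four parameters are hypothetical stand-ins for the token fields used above; how report_next_token_error consumes the negated column value is not shown in this article, so the sketch only reproduces the numbers.

```python
def end_fence_deltas(block_text, start_column, start_indent, end_indent):
    """Reproduce the line and column delta arithmetic with plain values.

    block_text   -- text inside the fenced block (stand-in for token_text)
    start_column -- column number of the opening fence token
    start_indent -- whitespace before the opening fence
    end_indent   -- whitespace before the closing fence
    """
    # The text spans count("\n") + 1 lines after the opening fence, and the
    # closing fence sits one line after the last text line.
    line_number_delta = block_text.count("\n") + 2

    # Remove the opening fence's indent, then add the closing fence's indent.
    column_number_delta = start_column - len(start_indent) + len(end_indent)

    # The original code negates the column value before reporting.
    return line_number_delta, -column_number_delta


# Two lines of text, opening fence indented by two spaces (column 3),
# closing fence indented by one space.
print(end_fence_deltas("a\nb", 3, "  ", " "))
```

With an opening fence on line 3, a line delta of 3 places the closing fence on line 6, which matches two text lines on lines 4 and 5.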
Then it was on to the container tokens.
The Second Group¶
Having tracked container tokens and their scopes for a couple of rules, I have an established pattern for handling containers. Therefore, that code was easy to add at the start of the main function:
def next_token(self, context, token):
    if token.is_block_quote_start:
        self.__container_token_stack.append(token)
    elif token.is_block_quote_end:
        del self.__container_token_stack[-1]
    elif token.is_list_start:
        self.__container_token_stack.append(token)
    elif token.is_list_end:
        del self.__container_token_stack[-1]
    elif token.is_fenced_code_block:
        ...
To complete the handling of the before case, I needed to change the if statement around the setting of the __last_non_end_token variable. To ensure that a new container does not disrupt this check, the if statement was changed slightly to ensure that it does not set the __last_non_end_token variable on either the end container tokens or the start container tokens:
if (
    not token.is_end_token
    and not token.is_block_quote_start
    and not token.is_list_start
):
    self.__last_non_end_token = token
With those in place, the Blank Line checks in both helper functions only required a simple change to ensure that they would fire properly with start List elements:
if not token.is_blank_line and self.__container_token_stack and \
        self.__container_token_stack[-1].is_list_start:
Doing my usual due diligence, I went through the code and tightened up variable names and organized the code to my standards. I then went through and experimented with about ten different container scenarios, checking to see if the rule fired properly for each of them. In each case, the rule fired properly, and I was able to reduce the experimental scenario down to a scenario that was already present in a scenario test.
Adding Configuration Support¶
With everything working and double checked, I added code to respond to the list_items configuration value. Once loaded in the initialize_from_config function, responding to it in the code was easy:
can_trigger = True
if (
    self.__container_token_stack
    and self.__container_token_stack[-1].is_list_start
):
    can_trigger = self.__trigger_in_list_items
if not token.is_blank_line and can_trigger:
    ...
The last example from the previous section was simply altered to allow the rule to trigger in a general case. In the case where the trigger is occurring within a List element, it allows triggering based on the configured value. Not a big change, but it was nice that the scope was small.
Checking everything again, I committed the change and started looking at the next rule.
List Elements and Blank Lines¶
Knowing that there is efficiency in working on tasks that share a common theme, I decided to find another rule that was not yet implemented and deals with Blank Lines like Rule Md031. I did not have to look far before I found Rule Md032. Dealing with List elements and Blank Lines, I hoped that I could leverage my work from Rule Md031 to get this rule implemented quickly. But, as always, design first before implementation.
Design¶
To be honest, I really did not do that much design on this rule. Once I started figuring out the design on paper, I quickly realized that it was a watered-down version of Rule Md031. There was the same concern about working properly within containers, and the same concern about tracking what happened at the start of the element and at the end of the element. Other than the token being tracked, they looked the same.
Therefore, the change of token and my previous design for Rule Md031 made it trivial to design. Remove the logic for Fenced Code Block elements, replace them with logic for the List elements, and I was practically done.
From there, it was time for testing and implementation.
Testing and Implementation¶
Following the same process for generating test scenarios as with Rule Md031, I quickly came up with a solid group of ten scenario tests for this rule. Like the design work for this rule, almost all the scenarios were just slightly changed versions of the scenarios from the previous rule.
From there, it just made sense to use the code for Rule Md031 as a base, modifying it as needed. This was not a difficult task. The Block Quote token related portions of the rule did not change, and the code for the Fenced Code Block tokens and List tokens were merged. Between those changes, it was less than five minutes before I got the first scenario test to pass. The rest of the scenario tests were passing in quick order as well.
Everything was quickly working properly in all scenarios, except for two of the more unusual scenarios. In these scenarios, there is a transition from a Block Quote element to a List element and then to another element. To be honest, I do not remember ever having written a document where those elements were nested in that fashion, but as it is allowable by the specification, I knew that I needed to support it.
So, after doing some debugging with these scenarios, the solution was to move one block of code:
elif self.__end_list_end_token:
    if not token.is_blank_line and not token.is_new_list_item:
        self.report_next_token_error(context, token, line_number_delta=-1)
    self.__end_list_end_token = None
to the start of the function:
def next_token(self, context, token):
    if self.__end_list_end_token:
        if not token.is_blank_line and not token.is_new_list_item:
            self.report_next_token_error(context, token, line_number_delta=-1)
        self.__end_list_end_token = None
Based on my debugging, the if conditions that preceded the original position of the statement were causing the other handlers for the container tokens to be executed instead of that block of code. By placing it at the start of the function, the if block was guaranteed to be executed without the container tokens getting in the way.
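The effect of that move can be shown with a stripped-down sketch. The two functions and the token kind labels are hypothetical, and the second function assumes that the container handling continues in its own if statement after the moved block, which the excerpt above does not show. Each function returns the names of the branches that ran.

```python
CONTAINER_KINDS = {"block-quote-start", "block-quote-end", "list-start", "list-end"}


def branches_run_original(kind, pending):
    """Pending check sits at the end of the chain, after the container branches."""
    ran = []
    if kind in CONTAINER_KINDS:
        ran.append("container")
    elif pending:
        # Never reached when the token is a container token.
        ran.append("pending-check")
    return ran


def branches_run_moved(kind, pending):
    """Pending check runs first, independent of the container branches."""
    ran = []
    if pending:
        ran.append("pending-check")
    if kind in CONTAINER_KINDS:
        ran.append("container")
    return ran


# A container token arrives while a check is still pending.
print(branches_run_original("block-quote-end", True))
print(branches_run_moved("block-quote-end", True))
```

In the original layout the pending check is silently skipped for container tokens; after the move, both the pending check and the container handling run.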
Having discovered and fixed this issue for Rule Md032, I started to wonder if I had missed it as well in other rules that I had recently worked on. I needed to check it out.
Cleaning Up¶
In going over the scenario tests for this rule, I realized that I may have missed some scenario tests in some of the rules that I recently worked on. Adding additional scenario tests to those projects, I soon had a small list of things that I fixed.
Adding four scenario tests to the tests for Rule Md031, I found that the Fenced Code Block elements within certain containers were not behaving properly, as was the case for Rule Md032. In that case, I solved the issue in the same way by moving the end Fenced Code Block handling to the top of the function. A couple of extra passes over the code, and everything looked good.
Likewise, after adding four scenario tests to Rule Md022, I noticed that I had missed some scenarios with the handling of the end Block Quote tokens. Taking some time to debug, the problems with these scenarios proved easy enough to fix, simply requiring the addition of and not token.is_block_quote_end to two of the existing conditions. While I initially added another three scenario tests for a total of seven new scenario tests, I eventually decided that those scenarios were duplicates of the base four scenarios and removed them.
Having done that work, I looked at the scenario tests for Rule Md027 to see if there was anything that I missed for that rule and found nothing obvious. However, during that exploration, I was not happy with the way that the code looked and did some refactoring on the code to make it clearer. Nothing drastic, just little changes here and there to make it read better.
Taking The Time¶
Sometimes I wonder if things like this are just a waste of my time. I mean, instead of taking time to look at issues like these, I could be starting on a new rule or addressing some other issue. But then I take a breath and realize that it is the right thing for me to do. In the above cases, I found some new scenarios for what I was working on and wondered if the more recent rules handled them properly. The absolute worst scenario was that I consumed time and did not get much in return. The absolute best was that I found issues in all three rules and fixed them. As it was, I found issues in two of the rules, and code that I was not 100% happy with in the third rule.
Call me an optimist, but I think that was a good use of my time. But now that I had that finished, it was time to get one last rule in this week.
Unordered List Indentation¶
Sometimes, I plan an order to my tasks to make my work more efficient. Sometimes, I look back and wish that I had planned my work in a different order so that I could be more efficient. At my first glance of this rule, I was not sure which of those categories this rule was in.
The only way to find out? Dig right in and get to work.
Design¶
At a high level, the design of this rule is simple: make sure that there is an expected level of indentation for Unordered List elements. By itself, that was an easy concept to design for. The algorithm would assume that an item in an Unordered List element would be indented by 2 space characters as a default. Therefore, the very first List element would be a level 1 List element, indented by 0 space characters, the List element within that List would be a level 2 List element and indented by 2 space characters, and so on. From a design point of view, that means checking the current indentation against the equivalent of (level - 1) * 2, and triggering if the indentation is different.
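As a quick sketch of that formula, the following helper computes the expected indentation for a nesting level. The function names are hypothetical; the indent_basis parameter mirrors the default of 2 space characters described above.

```python
def expected_indent(level, indent_basis=2):
    """Expected number of leading spaces for an Unordered List at `level` (1-based)."""
    if level < 1:
        raise ValueError("list levels start at 1")
    return (level - 1) * indent_basis


def is_properly_indented(level, actual_indent, indent_basis=2):
    """True when the actual indentation matches the expected indentation."""
    return actual_indent == expected_indent(level, indent_basis)


for level in (1, 2, 3):
    print(level, expected_indent(level))
```

A level 2 List element indented by 3 space characters would therefore trigger the rule, since the expected indentation is 2.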
The interesting parts of that design were what followed that initial design. And yes, they are all about container tokens.
Container Tokens¶
The above design assumes that there is a simple definition of what the current indentation is, and that is not always the case. In the case of this example:
* level 1
  * level 2
the definition of current indentation on the first level is 0, and on the second level is 2. Those indentations are easy to calculate because there is literally nothing before that List element on each line.
But taking one step into container blocks, this simple example changes that definition somewhat:
1. level 1
   + level 2
In this example, the indentation for the first line is still 0, but the indentation for the second line is 3. This is because the level 1 List element is an Ordered List element whose text and spacing creates a base indentation of 3. This means that the indentation of 3 for the Level 1 Unordered List element is correct. I had to remind myself (many times) that while Ordered List elements and Unordered List elements are both List elements, they are not the same. And as they are not the same, they are treated differently.
This idea extends to Block Quote elements as well:
> + level 1
Because the Block Quote element includes the space after the > character, it has a base indentation of 2. Therefore, the Unordered List element has an indentation of 0 spaces after the Block Quote sequence.
Once nesting of container elements comes into play, things get messy quickly. But at their basis are those three examples. I worked through a couple of the more complex nesting examples and made sure that they were taken care of. From what I could see from the ones I chose, things looked good.
Block Quotes And Leading Spaces¶
Having written a good design and implementation of traversal of Block Quote tokens and their leading_spaces lines, I knew that this design should leverage that work.
One of my refactoring rules is “Code it twice, on the fence; code it thrice, don’t do it again”. As this was the second time, I wanted to try and capture the previous work in one or two functions that I could refactor into the old rules with later. What was important to me was that I did not have to write it “one more time” after this.
Keeping that in mind, I continued to design for the container tokens.
Designing For Container Tokens¶
Given those constraints and that research, I was now ready to design the tough part of this rule: the definition of the current indentation.
For the purposes of this rule, the current indentation is the number of spaces required to get back to the last meaningful container token that serves as an anchor for the Unordered List element. If there are no such tokens, then the anchor is the start of the line. Because Unordered List elements nest nicely, this means that the algorithm needs to go back to before any such nesting occurs. From there, the algorithm needs to leverage the data stored within the other container tokens to establish that anchor token. Once the anchor token is established, the indentation for a token is the number of space characters required to get back to that anchor token’s column. Phew!
While I had the general design figured out, I knew in advance that I was going to have to be fluid with the second half of the implementation of finding the anchor token. I have written enough rules to know that sometimes the List elements and Block Quote elements can play off each other in weird ways, and I needed to consider that from the outset. To me, this was not going “meh, I’ll design it later”, it was “I cannot design it until I get there”. Seeing as I have been very good at doing design before implementation, one intentional deviation with a good reason behind it wouldn’t hurt.
Testing and Implementation¶
Working through all the scenarios, I ended up with 28 scenarios to test. The good part of that is that 9 of those scenario tests were from the refactored leading_spaces code, so hopefully this would be one of the last times those needed to be tested.
Moving on to the next_token function, it started simple and remains simple:
def next_token(self, context, token):
    if token.is_unordered_list_start or (
        token.is_new_list_item
        and self.__container_token_stack[-1].is_unordered_list_start
    ):
        self.__check(context, token)

    self.manage_container_tokens(token)
The core of this rule is very simple. If the token is starting an Unordered List element or is a new List Item within an existing Unordered List element, the rule needs to check if it is properly indented. In either case, the manage_container_tokens function then maintains the __container_token_stack variable and any required index into a Block Quote token’s leading_spaces field.
For the small portion of tokens that get selected for further checking, the __check function handles that:
def __check(self, context, token):
    (
        container_base_column,
        block_quote_base,
        list_depth,
    ) = self.__calculate_base_column()
    if token.is_new_list_item:
        list_depth -= 1

    adjusted_column_number = token.column_number - 1 - container_base_column
    if block_quote_base:
        container_base_column -= block_quote_base
    elif container_base_column:
        container_base_column += 1
    calculated_column_number = list_depth * self.__indent_basis
    if adjusted_column_number != calculated_column_number:
        self.report_next_token_error(context, token)
Per the design, the base column and two other variables are calculated based on what is currently in the __container_token_stack variable. With that information, the token’s column_number field can be adjusted to switch from a column number based on the line to an indentation based on the anchor token. Then, using the list_depth variable, the calculated_column_number variable can be calculated, leading to a comparison between the calculated_column_number variable and the adjusted_column_number variable.
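The comparison can be worked through for the three container examples from the design section. This sketch is not the rule's code: the function name is hypothetical, column numbers are assumed to be 1-based positions of the list marker, and the base column and depth values are supplied by hand rather than computed from a token stack.

```python
def indentation_matches(column_number, container_base_column, list_depth, indent_basis=2):
    """Compare the anchor-relative indentation against the expected indentation."""
    # Switch from a 1-based column on the line to an indentation from the anchor.
    adjusted_column_number = column_number - 1 - container_base_column
    # Expected indentation for the number of enclosing Unordered Lists.
    calculated_column_number = list_depth * indent_basis
    return adjusted_column_number == calculated_column_number


# "  * level 2" nested in "* level 1": marker in column 3, one enclosing list.
print(indentation_matches(3, 0, 1))
# "   + level 2" nested in "1. level 1": marker in column 4, base column 3.
print(indentation_matches(4, 3, 0))
# "> + level 1": marker in column 3, Block Quote base column 2.
print(indentation_matches(3, 2, 0))
# " * level 1" at the top level with one stray leading space.
print(indentation_matches(2, 0, 0))
```

The first three cases match their expected indentation, while the last one is off by one space and would be reported.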
Calculating The Base Column¶
Calculating the base column turned out to be the function that took the most time to figure out. As I noted before, I allowed for a certain amount of fluidity in the design for this function, and I used most of it up getting this function right.
def __calculate_base_column(self):
    container_base_column = 0
    block_quote_base = 0
    list_depth = 0
    if self.__container_token_stack:
        # Count the Unordered List tokens at the end of the stack.
        stack_index = len(self.__container_token_stack) - 1
        while stack_index >= 0:
            if not self.__container_token_stack[stack_index].is_unordered_list_start:
                break
            list_depth += 1
            stack_index -= 1

        # Walk the remaining container tokens back to the start of the stack.
        ignore_list_starts = False
        while stack_index >= 0:
            if self.__container_token_stack[stack_index].is_ordered_list_start:
                if not ignore_list_starts:
                    container_base_column += (
                        self.__container_token_stack[stack_index].indent_level
                    )
                ignore_list_starts = True
            elif self.__container_token_stack[stack_index].is_block_quote_start:
                bq_index = self.__bq_line_index[stack_index + 1]
                split_leading_spaces = (
                    self.__container_token_stack[stack_index].leading_spaces.split("\n")
                )
                if not block_quote_base:
                    block_quote_base = container_base_column + len(
                        split_leading_spaces[bq_index]
                    )
                container_base_column += len(split_leading_spaces[bq_index])
                ignore_list_starts = False
            stack_index -= 1
    return container_base_column, block_quote_base, list_depth
For the scenarios that do not have any tokens in the token stack, this function returns simple default values. Otherwise, this function starts at the end of the stack and works its way to the start of the stack.
The first while loop gets rid of the easy tokens on the stack: the Unordered List tokens. As these tokens stack together nicely, nothing special is needed in processing these tokens other than incrementing the list_depth variable for later.
The second while loop takes care of the other container tokens. Once again, if there is nothing left on the stack (i.e. stack_index < 0), no more processing is needed. However, if there are more tokens left on the stack, they are Block Quote tokens and Ordered List tokens. To properly figure out what the base column is, those tokens need to be examined until the beginning of the stack is reached. At that point, the proper base column should be in the container_base_column variable.
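The two loops can be exercised outside of the rule with a small stand-in for the container stack. Everything here is an assumption for illustration: the `Container` class, its `kind` labels, and the `line_index` field (which replaces the rule's separate Block Quote line index) are hypothetical, and the sample stacks are hand-built rather than produced by the parser.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical stand-in for a container token on the stack."""

    kind: str  # "ul", "ol", or "bq"
    indent_level: int = 0  # for "ol": width of the marker plus its spacing
    leading_spaces: str = ""  # for "bq": one prefix per line, joined by newlines
    line_index: int = 0  # for "bq": which prefix applies to the current line


def calculate_base_column(stack):
    container_base_column = 0
    block_quote_base = 0
    list_depth = 0
    stack_index = len(stack) - 1

    # First loop: count the Unordered Lists at the end of the stack.
    while stack_index >= 0 and stack[stack_index].kind == "ul":
        list_depth += 1
        stack_index -= 1

    # Second loop: accumulate the widths of the remaining containers.
    ignore_list_starts = False
    while stack_index >= 0:
        container = stack[stack_index]
        if container.kind == "ol":
            if not ignore_list_starts:
                container_base_column += container.indent_level
            ignore_list_starts = True
        elif container.kind == "bq":
            prefix = container.leading_spaces.split("\n")[container.line_index]
            if not block_quote_base:
                block_quote_base = container_base_column + len(prefix)
            container_base_column += len(prefix)
            ignore_list_starts = False
        stack_index -= 1
    return container_base_column, block_quote_base, list_depth


print(calculate_base_column([Container("ul"), Container("ul")]))
print(calculate_base_column([Container("ol", indent_level=3)]))
print(calculate_base_column([Container("bq", leading_spaces="> ")]))
```

The three calls correspond to the nested Unordered List, Ordered List, and Block Quote examples from the design section.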
Taking The Long Way¶
It took quite a bit of effort and time to figure out the base column part of this rule, even if it was theoretically simple. I was right in thinking that the List elements and the Block Quote elements would play off each other. As it is, I went a couple of levels deep in the container elements and got those right, but I didn’t go deeper. In hindsight, I am hesitant to say that I got all the combinations of container elements. However, I am confident enough to say that I believe I got most of the combinations that matter.
And with that, I stopped working on the rule and started working on this article. Taking a couple of breaks, I did my usual cleaning up and linting of the rule, before committing it to the project.
What Was My Experience So Far?¶
From a pure numerical standpoint, the number of completed rules is now 23 and the number of rules remaining is now 8. As my expected benchmark was 3 completed rules per week, I was successful in meeting that goal. From a quality point of view, I was able to find a handful of new parser issues, logging them for future fixing. While they will require future work to fix them, I see those issues as issues that users of the project will not find. And from an efficiency point of view, things were going well. My adherence to my design rules was serving me well and keeping me focused.
The only negative? I still have 8 rules left to finish. I know it might sound like whining, but I really want to get those rules implemented so I can release and fix the issues that I have found. And because of that impatience, I had to take a couple of extra breaks this week to make sure that I was working on the project with the right mindset.
And yes, I am a glass-half-full type of guy. How did you guess?
What is Next?¶
It felt good to get more than one rule done. For next week, I know I am going to be close to finishing off the rules, but I won’t know if I get there until next Sunday. Stay tuned!