Summary¶
In my last article, I worked diligently to resolve all the tests that I had marked as disabled in the previous week. After cleaning up those issues, I finished cleaning up the remaining List Block issues before getting back up to speed on Block Quote issues.
Introduction¶
After a week of digging deep to resolve the disabled tests, I checked the issues list. With only a handful of items left in the List Blocks section, I figured it was a good time to make a push to finish with the List Blocks and start with the Block Quotes.
Knowing that I was about to do that transition made me happy and filled me with dread at the same time. It made me happy as I was more aware of how close I was getting to the end of the first phase of the project. It also filled me with a sense of dread because I had not done any serious work with Block Quotes in a while. As such, I am not sure if the effort to get the Block Quotes to the same level as List Blocks will be the same. I hope it is significantly less, but we will see.
If I think about it, I believe that sense of dread comes from not knowing how long it will take to address any issues that arise. I do know that, unless I finish up the List Blocks and start the Block Quotes, that feeling will stay stuck at dread. So, time to buckle down and get stuff done!
What Is the Audience for This Article?¶
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 10 Dec 2020 and 13 Dec 2020.
Closing Out List Blocks¶
Having finished cleaning up the disabled scenario tests, there were a few items on the issues list that I needed to get out of the way. While not high-profile items, those items were standing between me and a list with no List Block related items on it. That was motivation enough to put these items at the top of my list.
Adding Variations of Existing Patterns¶
During the previous three weeks, I had taken the Series M scenario tests and filled them out a lot. As I filled them out, I did find parsing and transformation issues, all of which got fixed. But in creating that series, I sometimes needed to take an alternate path to properly exercise the specific patterns that I was trying to cover with the tests. In doing so, I often veered away from the most correct variation of the Markdown to faithfully create the right pattern in the group of scenario tests that I was working on.
Confused? Hopefully, a concrete example will explain. I started going through the tests in each group, starting with Fenced Code Blocks and the scenario test function `test_paragraph_series_m_ol_nl_fb`:
1.
```
foo
```
The purpose of that specific test was to make sure that a Fenced Code Block following an empty Ordered List start element was parsed properly. However, from my point of view, there was another variation of that test that was possible. Creating the new scenario test function `test_paragraph_series_m_ol_nl_all_i3_fb`, I altered the Markdown slightly:
1.
   ```
   foo
   ```
Technically that was not a big change, but it was a significant one to me. Instead of testing a variation where the Fenced Code Block terminates the List Block, it tests the inclusion of that Fenced Code Block in the List Block. I consider that the most correct form of that test. At its base, the reason for that classification is that if I said I wanted “an empty list with a fenced code block”, that is what I would expect. Nothing more, nothing less, just a simple example that had that Markdown in its simplest form.
I could have repeated that exercise with a lot of the scenario tests, but I stayed with four tests from each group: two tests with a single list and two tests with a sublist. Starting with Fenced Code Blocks, I then proceeded to HTML Blocks, Indented Code Blocks, and SetExt Headings. I hoped that the new scenario tests were covering old scenarios, but I was not 100% confident that they were. But as I executed those tests, one by one, I became convinced that they were old scenarios that were already being covered by the Series M tests as a group. Not one of the scenario tests failed! After my usual process of adopting the scenario tests, each test passed on its first try, requiring no changes to the parser, the transformers, or the consistency checks.
In the end, I was able to mark all these items from the issues list as resolved:
- variations of list+fenced block with proper indent for actual block
- variations of list+html block with proper indent for actual block
- variations of list+indent block with proper indent for actual block
- variations of list+set ext with proper indent for actual block
Lazy Continuations and Nested List Blocks¶
Another easy issue to get off the list was this one:
- 2-3 levels of lists with lazy continuation lines
Based on the work I had just completed to round out the Series M tests, I had confidence that these would also not require any changes. It was a bit of an educated guess, but I believe that all I wanted to do here was to provide different levels of List Blocks, ensuring that the principle of lazy continuation lines in lists was being adhered to.
Starting with test function `test_list_blocks_extra_5a`, I added the following Markdown:
1. abc
def
ensuring that the text `def` continued the paragraph started in the level 1 List Block.
Three tests later, the test function `test_list_blocks_extra_5d` was testing the level 4 List Block with the Markdown:
1. abc
   1. abc
      1. abc
         1. abc
def
This was a simple set of scenarios that passed right away, as I had expected them to. However, with some of the issues that I have had with List Blocks and lazy continuation lines, it was good to have some tests explicitly covering these cases.
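As a small illustration of how that family of test inputs is shaped, here is a sketch that builds the Markdown for any nesting depth. The function name and its parameters are hypothetical, and the three-space step per level is an assumption based on the width of the `1. ` marker; the test data in the repository was written by hand.

```python
def nested_list_with_lazy_line(levels: int, item_text: str = "abc",
                               lazy_text: str = "def") -> str:
    """Build `levels` nested ordered lists, followed by one unindented line.

    Hypothetical helper for illustration only.
    """
    lines = []
    for depth in range(levels):
        # "1. " is three characters wide, so each sublist sits 3 columns deeper.
        lines.append(" " * (3 * depth) + f"1. {item_text}")
    # No indentation at all: this line can only belong to the innermost
    # paragraph as a lazy continuation line.
    lines.append(lazy_text)
    return "\n".join(lines)


print(nested_list_with_lazy_line(4))
```

Calling it with `levels=1` gives the two-line input used for the first test, and `levels=4` gives the five-line input used for the last one.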
Simple Cleanup¶
Having done some rewriting of code lately, I was debugging and found a couple of lines in the `__handle_blank_line` function that were not being used anymore. With the breadth of scenario tests and summaries of the code coverage of those tests also in place, it was easy to determine and test to make sure this code was no longer being used. With that observation verified, that code was removed.
Getting Clarity¶
For a while, I have been convinced that I coded something wrong, but I have not been able to prove it one way or the other. The only thing that I was sure of was that I needed to deal with this at some point. With this being the last issue in the List Block section, it was time to deal with it.
- blank line ending a list is parsed wrong into tokens
- >>stack_count>>0>>#9:[end-ulist]
- should be end and then blank, as the blank is outside of the list
- 233 and 235, should blank and end-list tokens be reversed?
The prototypical example of this was test function `test_list_blocks_233`:
- one

 two
While the HTML output for that Markdown was correct, I had questions about whether I was emitting the following tokens in the correct order:
expected_tokens = [
    "[ulist(1,1):-::2:]",
    "[para(1,3):]",
    "[text(1,3):one:]",
    "[end-para:::True]",
    "[BLANK(2,1):]",
    "[end-ulist:::True]",
    "[para(3,2): ]",
    "[text(3,2):two:]",
    "[end-para:::True]",
]
Specifically, my question was around the Blank token and the end Unordered List token. Should the tokens be in Blank/Unordered order, or Unordered/Blank order? Over the weeks that I have looked at this case, I had never taken the time to sit down and work through it. As it was the last item for List Blocks in the Issue List, it was time.
Doing the Dirty Work¶
This may appear to be an easy case to some people, but I had issues with it. Thinking about it at length, it felt that my understanding of this problem was influenced by which part of the GFM Specification I had last dealt with. So, to deal with that influence head-on, I re-read the parts of the specification dealing with List Blocks, Paragraph Blocks, Blank Lines, and lazy continuation lines. With that information in my head, I started to work through the problem logically.
Starting at the beginning, line 1 starts the tokens off with the first three tokens of the document, leaving an Unordered List item active and a Paragraph element open. When the Blank Line element in line 2 is encountered, it closes the Paragraph element but leaves the Unordered List and Unordered List item open. Therefore, the fourth and fifth tokens are generated and added to the document.
It was at this point in working the problem that everything crystallized in my mind. I am not sure why, but I had wrongly believed that the Blank Line element on line 2 not only closed the Paragraph Block but closed the List Block and List Block item as well. The Markdown for example 240 clearly shows that this is not the case:
- foo

  bar
as the Markdown is translated into a single Unordered List Block with a single item that contains two paragraphs:
<ul>
<li>
<p>foo</p>
<p>bar</p>
</li>
</ul>
This meant that when line 3 is interpreted, the List Block and List Block item are still open, but the previous Paragraph element was closed by the Blank Line. As such, line 3 is not eligible for consideration as a lazy continuation line. With that option removed, the single leading space character is not enough leading space to keep line 3 in the List Block, so that block is closed, and a new Paragraph element is opened with the contents of line 3.
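To make that chain of reasoning easier to check, here is a minimal sketch of the decision as I understand it. It is not the parser's code: the function name, its parameters, and the returned strings are all hypothetical, and it deliberately ignores new list markers, Block Quotes, and every other block start that a real parser must consider.

```python
def classify_line_in_open_list_item(line: str, content_indent: int,
                                    paragraph_open: bool) -> str:
    """Decide what a non-blank line does while a list item is still open.

    content_indent is the column offset at which the item's content starts
    (2 for "- one", 3 for "1. abc"). Hypothetical and heavily simplified.
    """
    indent = len(line) - len(line.lstrip(" "))
    if indent >= content_indent:
        # Enough leading space: the line belongs to the list item.
        return "stays in list item"
    if paragraph_open:
        # Too little indent, but an open paragraph can absorb the line lazily.
        return "lazy continuation"
    # Too little indent and nothing to continue: the list must close.
    return "closes list, starts new paragraph"


# Line 3 of test_list_blocks_233: the blank line already closed the paragraph.
print(classify_line_in_open_list_item(" two", 2, paragraph_open=False))
# Line 3 of example 240: two spaces are enough to stay in the item.
print(classify_line_in_open_list_item("  bar", 2, paragraph_open=False))
```

The only thing that separates the two calls is a single space of indentation, which is exactly why the Blank token has to come before the end Unordered List token: the list is still open when the blank line is processed.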
It took a bit of work and a clear head to work through, but I had my answer! To make sure I did not forget about it, I added a comment to function `test_list_blocks_232` to make sure I can look back at it when I need to. While this was not something that required a code solution, knowing that this issue was finally (and definitively) resolved brought a smile to my face!
Starting with Block Quotes¶
With that last issue, all the specifically List Block related items were crossed off the issues list. It was now time to ease myself into work on Block Quotes and to get them up to a level comparable to the one I had reached with List Blocks.
Starting Out Easy¶
I decided to start with an easy item:
- "# TODO add case with >" for tests
In the beginning, each of the scenario functions in the range `test_block_quotes_212` to `test_block_quotes_216` was a simple test that showed how lazy continuation lines work with Block Quote elements. One of the observations that I made when adding those tests was that, to properly test lazy continuation lines, the removed `>` character that makes the line “lazy” should be able to be inserted without changing the HTML output. Basically, according to the GFM Specification:
If a string of lines Ls constitute a block quote with contents Bs, then the result of deleting the initial block quote marker from one or more lines in which the next non-whitespace character after the block quote marker is paragraph continuation text is a block quote with Bs as its content.
To properly test this, I created a variant function for each of those five scenario tests, and in each case I added a variation with the `>` character at the start of the line. As I worked through the scenarios, all the variant tests were working fine except for function `test_block_quotes_213a`. Looking at what made that test different, the answer was obvious: it involved List Blocks. Even after adding other variants of this test, I was unable to get any of them working.
I was not 100% sure it was the right thing to do, but in the name of progress, I marked the test functions `test_block_quotes_213a` to `test_block_quotes_213d` as disabled, knowing I would get back to them when testing Block Quotes and List Blocks and how they interacted.
Three Quick Reviews¶
The next three items were all easy to resolve.
The first item was the removal of a piece of code that was no longer being used:
if is_in_paragraph and at_end_of_line and is_first_item_in_list:
    is_start = False
The next item was to remove the poorly worded item from the list and replace it with one that specified the problem more clearly:
- unify 2 separate calculations in `__pre_list` function
And finally, the last item was just to review the existing tests and make sure that I agreed with their current state:
- 228 and 229 - what is the proper line/col for ">>>"?
None of these were tough tasks to undertake, but they were all helping me to get back up to speed on parsing Block Quotes.
Mixed Levels of Block Quotes and Serendipity¶
Going through the list looking for other easy items to resolve, this one caught my attention:
- block quotes that start and stop i.e. > then >> then > then >>>, etc.
To me, this looked like an easy issue to tackle. The test function `test_block_quotes_229` was a good base to start with. However, I felt there needed to be a bit more data, so instead of three lines with varying numbers of Block Quote start characters, I created test function `test_block_quotes_229a` as follows:
> 1
>> 2
> 1
>> > 3
> > 2
and test function `test_block_quotes_229b` with the same content, just with blank lines between each of the original lines. Basically, the first test would verify how the different lines worked together, and the blank lines in the second test would verify how each line worked isolated from any other lines.
Serendipity¶
Except for moving some code around to make sure it looked correct, only one real change was needed to get the tests working. It was an interesting thing to run into, but it was also a lucky break for me. If I had selected any other text for each line, things would have worked fine, and I would be none the wiser. However, with the given content for each line, the parser thought that the number on each line was a possible start for an Ordered List Block. As such, it consumed the digit and then looked for the `.` or `)` character to follow it. When the end of the line was encountered instead of one of those characters, an `IndexError: string index out of range` error was thrown.
While this was quickly fixed by only setting that variable if the start of the List Block had been confirmed, it was a good issue to find!
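For readers who want to see that failure mode in isolation, here is a small sketch of an ordered list marker check that guards against running off the end of the line. It is not the fix that went into the parser, which changed where a variable was set; the function name and return convention are hypothetical, and only the marker rules (one to nine digits, then `.` or `)`, then whitespace or end of line) come from the specification.

```python
def ordered_list_marker_end(line: str, start: int = 0) -> int:
    """Return the index just past a marker such as "1." or "2)", or -1.

    Hypothetical helper; a sketch of the bounds checks, not the parser's code.
    """
    index = start
    while index < len(line) and line[index] in "0123456789":
        index += 1
    digit_count = index - start
    # The specification allows between one and nine digits.
    if not 1 <= digit_count <= 9:
        return -1
    # Check the bounds first: a line such as "1" ends right after the digit,
    # and indexing line[index] there is what raises the IndexError.
    if index >= len(line) or line[index] not in ".)":
        return -1
    index += 1
    # The marker must be followed by whitespace or by the end of the line.
    if index < len(line) and line[index] not in " \t":
        return -1
    return index


for sample in ("1", "1. abc", "2) abc", "1.abc"):
    print(repr(sample), ordered_list_marker_end(sample))
```

With the bounds check in place, a line containing only a digit is simply rejected as a list start and falls through to being treated as ordinary text.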
Building Up Test Coverage¶
These issues were ones that I used to start the process of building up to the same level of coverage for various Block Quote scenario groups as I had done with List Blocks. This effort was in response to the following issue list items:
- tests like cov2 with blank before, after, and both for html blocks and other blocks
- tests like cov2 with multiple lines for block items, like html
- all leaf in all container
To accomplish this, variations of Paragraph Blocks were added to function `test_block_quotes_211`, Thematic Breaks to function `test_block_quotes_212`, Indented Code Blocks to function `test_block_quotes_214`, and Fenced Code Blocks to function `test_block_quotes_215`. After adding 10 new scenario tests to address these issues, I felt that this was a good start to addressing the issue of coverage for Block Quotes.
With all those changes completed, I followed my usual process of verifying each scenario test. Except for one issue with the Markdown transformer, the tests all passed without incident. That one issue was that the Markdown rehydration for test function `test_block_quotes_214d` included an extra `\x03` character in the output. As I have mentioned in previous articles, that character is a NOOP character, and is used to essentially place a “Blank” in the tokens that can be removed if not needed. In this case, that NOOP character was added to the content of the Text token to indicate that a Blank Line was part of that content.
As that information was not needed for the translation back to Markdown, I eventually added a call to the function `ParserHelper.resolve_noops_from_text` at the end of the `__perform_container_post_processing_lists` function to remove that extra character.
While I knew that I needed to add a call to that function at some point in the processing chain, it took an hour or so of experimentation to find the right place to insert that call. Until I found and tested that location, I found lots of locations where test function `test_block_quotes_214d` was passing, but other test functions started failing. It was frustrating, but I was able to work through all that noise and find the right place, which was satisfying!
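To show what stripping that character amounts to, here is a stand-in sketch. It is not the body of `ParserHelper.resolve_noops_from_text`, which I am not reproducing here; the constant name, the function name, and the sample text are all hypothetical.

```python
# Hypothetical stand-in for illustration; the project's helper may do more.
NOOP_CHARACTER = "\x03"


def strip_noops(text: str) -> str:
    """Remove every NOOP placeholder character from the text."""
    return text.replace(NOOP_CHARACTER, "")


# A made-up Text token body where a NOOP marks a blank line inside the content.
token_text = "foo\n\x03\nbar"
print(repr(strip_noops(token_text)))
```

The removal itself is trivial. The hard part, as described above, was choosing the point in the processing chain where nothing downstream still depended on the placeholder being present.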
Sunday Morning Relaxing¶
With a solid amount of work completed during the week, I found myself sitting in front of my computer on another Sunday morning, wanting to get another issue resolved. I do not want anyone thinking that I am a workaholic, as I am not. Sunday mornings in our household are mostly for whatever personal projects we are working on. As such, I choose to get up early on Sundays and try to get a couple more issues resolved from one of the projects that I am working on, before the family projects start taking control of my day.
Knowing that it was going to take a couple of hours, and having a couple of hours of peace and quiet available, I started looking at the following item:
- blank lines as part of bquote
- compare test_block_quotes_218 vs test_blank_lines_197a
- already fixed test_list_blocks_260, 257
While it may not seem like much, the positioning and whitespace of those blank lines are just slightly off. Looking at this back at the end of June 2020, I noticed that within containers, the column number for blank lines was off. Specifically, given this Markdown:
- foo
-\a\a\a
- bar
(where the `\a` character is a visual indicator for a space character), the token for the blank line was being calculated as `[BLANK(2,5):]`. While that is one possible answer for the Blank Line token for that line, it has issues. Specifically, because it is within a List Block, the consistency checking had issues with that line because it did not appear to have the correct indentation.
After thinking about it, I eventually settled on the correct form of that token being `[BLANK(2,2): ]`. As the spaces were all that was on the line, I figured that it was more correct to say that the column number was 2 followed by three space characters than a column number of 3 followed by two space characters. Influenced by example 257, I have confidence that I made the correct choice, backed up by the commit I made on 28 Jun 2020 with that choice and the fallout from that choice.
At that time, I was focusing on List Blocks, and I added the item into the issues list to fix this for Block Quotes, and it was now time to fix that. Unlike the fixes required to resolve this for List Blocks, the fixes required to resolve this for Block Quotes were relatively small. The first part of that change was to set the column number to the length of the text removed by the owning container blocks. This firmly set that column number to the first character after the container processing, removing the determination of the column number from the leaf block processing. To balance that out, the calculation for initial whitespace to allow for a Blank Line token within a Block Quote was set to the amount of whitespace that was extracted. Other than that, no other changes were made.
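The two halves of that change can be sketched in a few lines. This is only an illustration of the rule as described, not the parser's code: the function name and parameters are hypothetical, and adding one to the removed length is my assumption that column numbers are 1-based.

```python
from typing import Tuple


def blank_line_position(removed_by_containers: str,
                        remaining_text: str) -> Tuple[int, str]:
    """Return (column number, leading whitespace) for a blank line in a container.

    Hypothetical helper; assumes 1-based column numbers.
    """
    # The column points at the first character after whatever the owning
    # containers (List Blocks or Block Quotes) consumed from the line.
    column_number = len(removed_by_containers) + 1
    # On a blank line everything that remains is whitespace, so it is kept
    # exactly as it was extracted.
    extracted_whitespace = remaining_text
    return column_number, extracted_whitespace


# The List Block example: "-" is consumed, leaving three space characters.
print(blank_line_position("-", "   "))
```

For the List Block example, this gives a column number of 2 followed by three space characters, which matches the choice described earlier. The same calculation applies when the consumed prefix is a Block Quote marker instead of a list marker.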
What Was My Experience So Far?¶
As I mentioned earlier, it was a relief to wrap up the verification of the List Blocks and move on to Block Quotes. But with that transition, there was also a sense of dread that I felt as I started on Block Quotes. Would getting a solid amount of coverage for Block Quotes take a couple of months as it had for List Blocks? Would it be more? Would it be less? I just did not know. I was hoping it would be less, but that not knowing was just driving me nuts.
But I also realized that the sense of dread would not disappear until I started doing something about it. Even by working on easy Block Quote items, I was getting a clearer picture of the effort it will take to cover Block Quotes properly. Instead of a sense of dread, I believe I am at a place where I am confident that it will be less than four months, and probably more than two weeks. Not sure where in that range it will land, but pretty confident it will be in there.
And that is okay for now. The important thing is that I did not let that dread knock me down. I took it, channeled it, and got some more information that helped me deal with it. Cool!
What is Next?¶
I am still feeling a small amount of dread, but mostly I am feeling optimistic about the progress I am making! As such, I expect to be working with Block Quotes for a while, possibly dipping into how Block Quotes and List Blocks interact every so often.