In my last article, I talked about making hard choices and the follow-through that comes with making those choices. In this article, I talk about getting back to work on testing the PyMarkdown project.
There have been a few things on my mind this week, so I was not able to get as much work done as I had hoped to. But I still made progress on the PyMarkdown project, which is good. With a bit of expectation setting and without too much added fanfare, on to the rest of the article.
What Is the Audience for This Article?
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 14 Dec 2021 and 19 Dec 2021.
Life Versus Side Projects
There have been a handful of articles out this past week about the Log4J vulnerabilities and how they are being handled. On one hand, as a professional software developer who relies on projects from other people, I want those projects to work properly. On the other hand, as a software developer who created the PyMarkdown project completely on his own time, I understand the issues that have been brought up by open-source maintainers. As with a lot of things, it is a matter of finding a good balance between two or more things.
Every open-source project that I have had the pleasure of using is clear on where the developer team for that project believes that line is for that project. With very few exceptions, we are all working on these projects on our own time. As such, I know that if I must decide between making an improvement on the project or dealing with life, that life is going to win somewhere above 90% of the time. If an issue is especially important to me, I may decide to shuffle things around to deal with that issue. But that choice is my choice to make, not anyone else’s choice.
This week is a good example of those statements in practice. As someone who has Autism Spectrum Disorder, sometimes my environment gets to be more than I can handle, causing me to get headaches that are often debilitating. I struggled all week to work through a mild headache that only got worse towards the end of the week. And while I was able to get some “easy” work done on the project, my health, my family, and my professional work had to take priority. In my mind, there was not even a question about it. It was just a fact.
Taking it slower during the week helped me to have a successful week at work and to take Friday off and try and deal with getting rid of my headache. Making sure I was dealing with that properly made sure that I was not (too) grumpy to my family. And from a job enjoyment viewpoint, I was able to take part in a handful of very interesting conversations with my colleagues at work. My decision to focus my energy where it needed to be, not on the PyMarkdown project, was the right choice.
To take this back to the topic of open-source software: RTFM. Read The Full Manual.[1] Most open-source software is provided without any strings attached for commercial use. If you intend to use it in any kind of commercial or mission-critical software for your company, it is a risk that must be evaluated. If you decide to accept that risk, it is up to you and your company to mitigate the risk to your company, not the people maintaining the project. Most open-source projects encourage users to share their enhancements of the project with other users. This is part of the open-source community. This is part of how things work.
While it was not explicitly stated for open-source, a quote from my mother on topics like this is very fitting:
Don’t go and take something from your neighbor, only to complain that when you went to use it, it did not work like you expected it to.
If you use open-source, be prepared to maintain it if something breaks. If that happens, please be considerate and offer any improvements to the project maintainers. If you do not fix it yourself, do not demand that the maintainers fix something for you on your schedule. Remember that it is called the open-source community.
When it comes down to it, if you would not normally ask people on the team at your workplace to do something in a given manner, you probably should not ask any open-source maintainer to do the same thing.
With my health context from the previous section in place, the stage is set for the work that went on last week. Having completed the bulk of the refactoring work, it was time to create a release and get the refactoring-heavy fixes out to users.
Cleaning Up For The Release
There was only one thing stopping me from releasing the project: my scribbles. Over the last two weeks, I kept a set of notes on things that I wanted to check on before the release. During a normal week, I probably could have taken care of these before the weekend, but this last week was not a normal week. As such, it took me until Saturday evening to get everything cleaned up. It did take longer to clean everything up than I thought it would, but it also felt like it was the right amount of time to focus on those issues as well.
What did I clean up? Most of the things that I cleaned up were simple patterns that I was not sure that I followed while doing the refactorings. A good example of this is my casual usage of `True` and `False`. When I am writing code the first time, I may decide to write the code as:

    some_value = False
    ...
    if some_condition:
        some_value = True
    ...
That is a perfectly logical construct to use, and I use it often. When I am writing code, I may be concerned about adding some extra information to the function that will affect either `some_condition` or `some_value`. As such, I often decide to write constructs like the one above to give me flexibility while implementing the algorithm. But once I have completed development, I would easily argue that it is not as readable as it could be. I would argue that, if possible:

    ...
    some_value = some_condition
    if some_value:
        ...
is a better pattern to follow. Instead of being spread out within the function, the information needed to understand the `if` statement is in the immediate vicinity of the `if` statement. From my point of view, that is good!
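As a minimal sketch of the two forms side by side, the functions below return the same result for any input. The function names and the `startswith` condition are hypothetical, chosen only to have something concrete to test; they are not taken from the PyMarkdown source.

```python
def describe_first_draft(line: str) -> str:
    # First-draft form: default the flag, then flip it further down.
    is_heading = False
    if line.startswith("#"):
        is_heading = True
    if is_heading:
        return "heading"
    return "text"


def describe_refactored(line: str) -> str:
    # Refactored form: the flag is assigned directly from the condition,
    # right next to the `if` statement that consumes it.
    is_heading = line.startswith("#")
    if is_heading:
        return "heading"
    return "text"
```

The behavior is unchanged; the only difference is how far a reader must look to find out where `is_heading` gets its value.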
However, that improved pattern comes with a couple of caveats attached to it. The first is that the construct works in simple cases, but not in the more complicated cases. A good example of that is a nested `if` statement:

    some_value = False
    ...
    if some_condition:
        ...
        if some_other_condition:
            some_value = True
    ...
In this case, it may be possible to use that improved pattern on the second `if` statement, but it depends on the other logic in that function. Those other dependencies in the function are the second caveat. If the scope of the function is small enough, the possible dependencies on the `if` statement decrease, making it more likely that the improved pattern can be used. But the larger the function, the more likely it is that the pattern will fail.
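To illustrate the nested case, here is a small sketch in which only the inner `if` statement can be collapsed, because other work still depends on the outer condition. All names here, including the `log` list, are hypothetical and exist only to make the dependency visible.

```python
from typing import List


def flag_nested(has_marker: bool, has_text: bool, log: List[str]) -> bool:
    # Original nested form.
    found = False
    if has_marker:
        log.append("marker seen")  # other work tied to the outer condition
        if has_text:
            found = True
    return found


def flag_partially_refactored(has_marker: bool, has_text: bool, log: List[str]) -> bool:
    # Only the inner `if` collapses into an assignment; the outer `if`
    # must stay because the logging still depends on it.
    found = False
    if has_marker:
        log.append("marker seen")
        found = has_text
    return found
```

If the logging line did not exist, the whole construct could shrink to `found = has_marker and has_text`, which is the small-function situation where the pattern is most likely to apply.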
Along with that pattern, there were also some refactorings to optimize how I use `for` statements. Once again, I was not using them improperly, but I felt I could rewrite them to be more readable and maintainable. However, along a similar line of thinking but with `if` statements, I had decent room for improvement there. Refactoring `if` statements into `if` assignments where possible helped, as did moving any variables used in `if` constructs closer to where they were being used.
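Assuming that "`if` assignments" refers to Python's conditional expressions (that reading is an assumption, as is every name below), a small before-and-after sketch might look like this:

```python
def indent_prefix_before(width: int, use_tabs: bool) -> str:
    # Statement form: four lines to pick one of two values.
    if use_tabs:
        prefix = "\t"
    else:
        prefix = " " * width
    return prefix


def indent_prefix_after(width: int, use_tabs: bool) -> str:
    # Conditional-expression form: the choice and the assignment sit together.
    prefix = "\t" if use_tabs else " " * width
    return prefix
```

This form reads well when both branches are short expressions. Once either branch needs more than one step, the full `if` statement is usually the clearer choice.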
And in each case, it was a simple change, but it took time to work through it. Look through the source files for patterns that I wanted to improve on. For each change, I made the change and executed the full set of scenario tests against the change before allowing it to remain. If it did not pass, it was time for some debugging. When it did pass, make sure it looked right and stage it in the project’s Git repository before moving on.
Lather, Rinse, and Repeat
Do that repeatedly. If I had to guess, I repeated that action about 500 times over the course of the week. But it was decent work that I could easily do when I had any energy and available time during a slow project week.
And having crossed out all the scribbles on my work sheet, it was around noon on Saturday when I was able to sign off on Release 0.9.3.
Issue 159 – Weird Indents
Feeling better on Saturday evening than I had all week, I decided to start looking at Issue 159. At first glance it might not seem correct, but the following Markdown document:
    1. Item 1
       1. Item 1a
      100. Item 1b

should parse into a level-one ordered list with two items and a level-two ordered list nested within the first item of the level-one list. Because the `1` on the third line occurs before the `I` on the first line, the third line is interpreted as a new list item for the level-one list instead of the level-two list.
The problem was that this document was being parsed as a `1-2-2` instead of a `1-2-1`. It took a bit of debugging to figure this one out, but I was able to resolve it within a couple of hours. When checking for possible parent lists for line 3, the code was using the `indent_level` field of the list tokens to determine which list was the parent list. However, because of the long number for the list item on line 3, the `indent_level` for the new token on line 3 was 7, greater than the `indent_level` of 6 for the list token from line 2.
To properly figure out which token was the parent list token, I did my usual scribbling on paper and came up with some very simple cases. At that point, it became obvious to me what the solution was. For the first line's list token, the effective range for list item starts is between column 1 and column 3, creating an `indent_level` value of 3. The second line's list token range is between column 4 and column 6, creating an `indent_level` of 6. So, while the right side of the start for line 3's list token is close to the range for line 2, the left side of the start for line 3 is firmly within the range for line 1.
Once I changed the algorithms to check the start of the list item against the ranges of the lists, the problem was solved!
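The range check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not PyMarkdown's actual code: the `ListToken` class, its `start_column` field, and the `find_parent_list` helper are all hypothetical, and only the column numbers and `indent_level` values come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ListToken:
    """Hypothetical stand-in for a list token; not PyMarkdown's real class."""

    name: str
    start_column: int  # first column (1-based) where a list item start may begin
    indent_level: int  # last column (1-based) of that range


def item_start_column(line: str) -> int:
    # 1-based column of the first non-space character on the line.
    return len(line) - len(line.lstrip(" ")) + 1


def find_parent_list(open_lists: List[ListToken], column: int) -> Optional[ListToken]:
    # Check the left edge of the new list item against each open list's
    # range, innermost list first.
    for token in reversed(open_lists):
        if token.start_column <= column <= token.indent_level:
            return token
    return None


# Ranges taken from the description: columns 1-3 and columns 4-6.
level_one = ListToken("level-one", start_column=1, indent_level=3)
level_two = ListToken("level-two", start_column=4, indent_level=6)
open_lists = [level_one, level_two]

line_2_parent = find_parent_list(open_lists, item_start_column("   1. Item 1a"))
line_3_parent = find_parent_list(open_lists, item_start_column("  100. Item 1b"))
```

With these values, line 3 starts at column 3 and so lands in the level-one range, even though the width of `100.` pushes its own `indent_level` to 7.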
Starting on Sunday morning, I was able to make progress on setting up the scenario tests based on the various combinations of containers. While I did mark this down in my Issues List as `Nested Lists`, I understood that to mean not only nested list elements, but any kind of nested container elements. Confident that the two-level nesting combinations were all tested by the specification itself, I decided to start with the three-level nesting combinations.
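One way to enumerate such combinations is sketched below. It assumes three container kinds (unordered list, ordered list, and block quote) and allows the same kind to repeat at every level; the scenario tests in the project may be organized differently.

```python
from itertools import product
from typing import List, Tuple

# Assumed container kinds; this tuple is illustrative, not taken from the project.
CONTAINERS = ("unordered list", "ordered list", "block quote")


def nesting_combinations(depth: int) -> List[Tuple[str, ...]]:
    # Every ordered choice of container for each nesting level,
    # outermost container first.
    return list(product(CONTAINERS, repeat=depth))


three_level = nesting_combinations(3)
```

Under those assumptions there are 3 × 3 × 3 = 27 three-level combinations, and a fourth level would bring that to 81.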
It took a while, but by the late afternoon I had all scenario tests coded and the preliminary results for those tests. There were a small handful of tests that failed outright, mostly due to transitions between one container and the other container and back again. I quickly took care of those and fixed them so that they would not assert and fail. With those out of the way, the remaining failures were both about whitespace.
In both cases, the scenario tests pass their parsing requirements and their HTML generation requirements but fail on putting the document back together again. In each case, it is because the calculated whitespace is not correct, leading to regenerated Markdown code that is misaligned. While it is important to get these issues taken care of, it is just a matter of finding the right whitespace to add at the right time.
But, with Sunday evening approaching, I had to put off further exploration of those failures until later in the week. This article was only partially written, and I knew it would take up the rest of the night getting it close to the point where I could finish proofreading it on Monday night. But I will be working towards taking care of that soon.
What Was My Experience So Far?
I know that this may seem like a trivial measure of where the project is at the moment, but I am pleased that the count of serious issues is in the low single digits. I am also pleased that I am finding some issues with the nested containers. Well, I am not pleased that I am finding them, but I am pleased that I am finding them before they are being reported. And I am quite sure that I can fix them relatively quickly.
Another side effect of testing the three-level nested containers is that I am fairly confident that it will have a positive effect on the four-level nested container testing as well. From what I was able to discern from the whitespace failures, it looks like the whitespace that came before certain container elements is not being properly added to the whitespace for more nested containers. That means if I properly address those issues now, it should cut down on similar issues with extra nesting. At least that is my hope.
As always, keeping a positive attitude, and working towards getting the remaining items on the Issues List resolved.
What is Next?
Since I started working on Nested Lists, it is a good bet that I will probably be working on that this week. Stay tuned!
[1] Yes, I know that the `F` in RTFM stands for something else.