In my last article, I talked about my love of film, especially short film, and how it inspires me. In this article, I am going to talk again about the art form called debugging.
If It Was Easy, Everyone Would Do It
That title just about sums it up for me. Debugging is not always an easy task to undertake. It takes brute force, luck, and stubbornness. I need to make sure that I clear my mind of distractions, but also know when I need distractions to reset my mind so I can start again. At times it can feel like you are mentally balancing on a rope, without knowing how thick the rope is or how far off the ground it is. And you have to be okay with that.
For any readers who think I may be exaggerating with that opening paragraph, I am going to try to prove that I am not.
Start With Something Simple
To start proving my point, I am going to start with something that is relatively easy to figure out. When running this code, it looks like it should succeed, but it does not:
    x = 1.1 + 2.2
    assert x == 3.3
As a matter of fact, when I run this through the Python terminal, I get the following output:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError
For any developer with experience, one look at this code will quickly lead them to question how Python handles floating point numbers. In this case, the variable x is assigned the value 3.3000000000000003, which does not match the value in the assert statement’s conditional. From the Python documentation on floating point values:
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.
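A minimal sketch of how that approximation plays out, using only the standard library. The use of math.isclose is one common way to compare floats with a tolerance; it is shown here as an option, not as the fix this article settles on.

```python
import math

x = 1.1 + 2.2

# The sum is stored as the nearest binary fraction, which is not the same
# float that the literal 3.3 is stored as.
print(repr(x))   # 3.3000000000000003
print(x == 3.3)  # False

# Comparing with a tolerance succeeds where the exact comparison fails.
print(math.isclose(x, 3.3))  # True
```

The default tolerance of math.isclose is relative, so values that are expected to be near zero usually need an explicit abs_tol argument.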
Adding In Some Complexity
Graduating from that example, the next example is a bit more nuanced:
    x = 5
    y = x / 2
    assert y == 2
Running this through the Python terminal, we get the same output as above, with the assertion failing on the last line. Even as someone who has used Python for over five years, this one initially stumped me. But thanks to my debugging skills, I added one line to that example:
    x = 5
    y = x / 2
    print(type(y))
    assert y == 2
That one line provided output that was pivotal:
    <class 'float'>
That one piece of information led me to this article on Real Python that talks about division and integer division. The article talks about various aspects of division and remainders in Python, and was an interesting read. It also gave me something to think about as I am debugging other scenarios.
Taking Steps To Tame The Chaos
The first example outlined a situation that occurs in most programming languages. Most languages do not have built-in support for exact floating point values. Instead, the default is to approximate floating point values using binary values. The key there is the word “approximate”, as various calculations end up being off by an extremely small amount. But that amount is just enough to make conditionals fail when they look like they should succeed.
The second example is something that strikes polyglot developers more than single-language developers, but it can also be a blind spot that hits a junior developer. Different languages have specific ways of handling specific situations. In Python, dividing one integer by another integer with the / operator results in a float. However, in many other languages, such as C and Java, the result would be an integer. It depends on the viewpoint as to whether that is a good thing or not. To remedy this problem, the developer needs to know about the // integer division operator, which provides the behavior that most developers expect.
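The difference between the two division operators can be sketched in a few lines. The variable names here are only for illustration.

```python
x = 5

true_quotient = x / 2    # "/" always produces a float in Python 3
floor_quotient = x // 2  # "//" floors the result and keeps int operands as int

print(true_quotient, type(true_quotient).__name__)    # 2.5 float
print(floor_quotient, type(floor_quotient).__name__)  # 2 int

# "//" rounds toward negative infinity, not toward zero, which differs
# from integer division in languages such as C and Java.
print(-5 // 2)  # -3
```

That last line is worth remembering when porting code between languages, as the two conventions only disagree for negative results.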
But there are concepts that are handled coherently across languages. Sticking to examples related to floating point math, decimal arithmetic is supported in Python by the decimal module. Similar types are available under similar names in other modern languages, and one part of the Python module's description stands out clearly in the documentation:
Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.”
While the default numeric behavior does not handle these two examples the way a reader might expect, modules like the decimal module provide easy-to-use alternatives that address these issues. But it takes learning and experience to know what the alternatives are and when to use them.
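As a small sketch of that alternative, here is the first example redone with the decimal module. Building each Decimal from a string is a deliberate choice in this sketch, because building one from a float would carry the float's binary approximation along with it.

```python
from decimal import Decimal

# Build the values from strings so that they are exact decimal values;
# Decimal(1.1) would copy the float's binary approximation instead.
x = Decimal("1.1") + Decimal("2.2")

print(x)                    # 3.3
print(x == Decimal("3.3"))  # True
```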
Using Language Support
One thing that Python has going for it is that it does not statically type every variable that is created. Instead, each value is assigned a type that “makes the most sense”. Given a value of "xxx", a string makes the most sense. Given a value of 1, an integer makes the most sense. Given a parenthesized, comma-separated value such as (1, 2), a tuple makes the most sense.
But while that is often a boon in Python development, it can make things a bit crazy when debugging. When debugging in Python, I have either had to use debug output or traverse multiple functions to determine the type and value of a variable. While I do spend some time in Python’s real-time debugger, the volume of tests in my projects often prevents me from spending any significant time in the debugger. Instead, I needed an alternative that would at least help me figure out those values.
Coming up with proposals to determine the value of variables was an easy problem to solve. Python has a very robust built-in logger that is easy to set up and use. But that part of the solution requires me to complete parts of the code before I can run the code to produce the log entries. And that did not help me much with my type issues. Then I started paying attention to MyPy.
Mypy is an optional static type checker for Python that aims to combine the benefits of dynamic (or “duck”) typing and static typing.
The MyPy type checking is run on Python code as a separate step, similar to a “compile phase”, before the code is executed. While it is optional, once a module has started to implement static typing in one or more functions, MyPy can be configured (for example, with its --disallow-untyped-calls option) to emit warnings if typed Python code calls into an untyped function. But the good part of that process is that most of the type annotations in MyPy are simple and easy to use. Rewriting the above example using type annotations leads to a function looking somewhat similar to this:
    def my_func(x: int) -> int:
        # x / 2 is a float, so MyPy flags this assignment to an int.
        y: int = x / 2
        assert y == 2
        return y
In this example, I have been more verbose than I usually am, but that is only to highlight a point. Because every variable in the function is annotated as an integer, the attempt to assign x / 2 to y will still run in Python, but MyPy will report a type error. Just having that capability alone saves time.
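For comparison, here is a sketch of what a corrected function might look like. The name halve is hypothetical and only for illustration. Because the // operator returns an int when both operands are ints, the annotations and the runtime behavior now agree, so a type checker such as MyPy should accept it.

```python
def halve(x: int) -> int:
    """Hypothetical corrected variant of my_func: integer in, integer out."""
    # "//" on two ints returns an int, so the annotation on y is now accurate.
    y: int = x // 2
    return y


print(halve(5))  # 2
```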
Multiplying That By Orders Of Multitude
One of the reasons that I have strived to keep the various parts of the PyMarkdown project separate from each other is because understanding and debugging complex systems suck. When I say that they suck, what I am really saying is that as humans, we all have our limits in keeping multiple values and multiple trains of thought going in our heads at the same time.
Given a simple system, such as the my_func function in the above example, keeping each concept in my head while I figure out the problem is relatively easy. With my experience to supercharge my debugging skills, determining the issue in a small system like that is trivial.
But as complexity grows in the source code, the complexity of debugging that source code grows at an even faster rate. When writing code, I typically work towards a single goal before moving on to the next goal. If I try to keep too much in my head with respect to the code, I often fail at that task. By keeping things simple, I retain a clear picture in my head which allows me to develop the source code with the same clarity.
I take a similar approach when debugging complex systems. To ensure that I keep that clarity, I make it a habit in my projects to use logging effectively. This means outputting the current state of a variable at a crucial point and also logging crucial paths taken due to conditional statements. By performing these actions, I provide a trail of breadcrumbs that I can refer back to in order to clarify the picture that I am debugging from. I can then focus on the local picture, and determine if the picture that the logs are telling is the same as the picture that I have in my head.
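A minimal sketch of that breadcrumb habit, using Python's built-in logging module. The classify_indent function is a hypothetical helper invented for this sketch and is not taken from the PyMarkdown project. It logs the state of a variable at a crucial point, and then logs which conditional path was taken.

```python
import logging

LOGGER = logging.getLogger(__name__)


def classify_indent(line: str) -> str:
    """Hypothetical helper: classify a line by its leading whitespace."""
    leading = line[: len(line) - len(line.lstrip(" \t"))]
    # Breadcrumb 1: the state of a variable at a crucial point.
    LOGGER.debug("leading whitespace=%r (length %d)", leading, len(leading))

    # Breadcrumb 2: which conditional path was taken.
    if "\t" in leading:
        LOGGER.debug("path: leading whitespace contains a tab")
        return "tabbed"
    if leading:
        LOGGER.debug("path: leading whitespace is spaces only")
        return "spaced"
    LOGGER.debug("path: no leading whitespace")
    return "flush"


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
    classify_indent("\t> another block quote")
```

Passing the values as arguments, instead of formatting the string up front, means the message is only built when the debug level is actually enabled.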
Debugging effectively for me is all about keeping a clear picture in my head and making sure that I can shift that picture as needed. But that is not always enough to get the job done.
Debugging Is A Lot of Hard Work
How does this relate to the whitespace work of the last couple of weeks? The honest answer is that I paid a lot of attention to the above concepts that I have developed over many years, and they were still failing me. Sometimes systems are complex enough that it is hard to decompose them into small enough pictures that make for effective debugging.
That is what happened with the last remaining issue dealing with whitespace. With all the other issues resolved, I had a single issue remaining where the following Markdown generated an error:
    > block quote
    >\t> another block quote
In trying to keep things simple, I was able to quickly determine what the issue was. When the parser recursed to handle the text past the first block quote on the second line, it was not recognizing the remaining text as another block quote with its own text. Furthermore, because that parsing was off, an assert was failing because the indent calculations for the produced tokens were off as well. Basically, it was not working properly, and I had at least two possible reasons for the failure.
In the two weeks that I was working on the other whitespace issues, I went back to this issue multiple times. Even with the various parts of the system somewhat isolated from each other, the parsing of tokens is a somewhat complex system. Part of that complexity is just because parsing in itself tends to be complex, with most parsers supporting forward scanning as a primary mode and backward scanning where needed.
The PyMarkdown project is no different, with multiple possible areas to check on what could be causing that behavior. I have found through experience that sometimes the most effective debugging behavior is not giving up: just sticking to my confidence that I can debug the code and determine what the issue is. And sometimes, that means trying twenty different paths just to find the one path that works.
And it was on that twentieth try that I found my answer. In that one specific example above, the leading_space variable for that Block Quote token was assigned the leading spaces for both the first and second lines. When that token’s leading_space variable was then used in a subsequent calculation, the result was the length of all the leading space for that Block Quote token, not just the leading space for the last line.
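To illustrate the shape of that bug, here is a small sketch. It is not the PyMarkdown code. It assumes, purely for illustration, that the leading space for each line is stored in one string with a newline between the lines, and both function names and the stored value are hypothetical.

```python
def length_of_all_lines(leading_space: str) -> int:
    """Buggy behavior: measures the leading space of every line at once."""
    return len(leading_space)


def length_of_last_line(leading_space: str) -> int:
    """Intended behavior: measures only the most recent line's leading space."""
    return len(leading_space.split("\n")[-1])


# Hypothetical stored value: "> " for the first line, ">\t" for the second.
stored = "> \n>\t"

print(length_of_all_lines(stored))  # 5
print(length_of_last_line(stored))  # 2
```

With only one line stored, both functions return the same number, which is one reason a bug of this shape can hide until a second line shows up.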
Having talent is essential in developing systems that meet the needs they were designed for. Even without a lot of talent, a person can still be taught different concepts and ideas that build their development experience. But when it comes to debugging, I put more faith in having a clear picture in my head and knowing that I am stubborn enough to not give up easily than in banking on experience and talent.