In my last article, I talked about taking some time to do things right with the PyMarkdown project. In this article, I talk about how that went and what got me through it.


There is no way for me to describe the effort I have put into the PyMarkdown project over the last two weeks without using phrases like “heavy lift”, “slogging”, or “huge”. But I went into it knowing that I had a solid strategy to succeed and a solid personal process to get me through. And for me, that made all the difference. I went in confident that I could tackle the work and come out the other side, all because I have figured out how I work best.

What Is the Audience for This Article?

While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the commits that occurred between 14 Dec 2021 and 03 Jan 2022.

Cleaning Up the Mess I Made

When writing the heading for this section, I did not feel like sugarcoating my opinion of what I was doing. In adding better testing of the various nesting combinations, I had found a lot of issues that I needed to deal with. Thankfully, only one of those issues involved a failing assert, which was good. But as the PyMarkdown project is a Markdown linter, making sure that it parses the Markdown correctly and points to the correct line and column is essential. And most of the issues fell into that second category.

While I could easily bore any readers with a play-by-play of everything that was fixed, I do not believe it would serve any purpose. Unless a reader understands how the project is put together, it would be a lot of gobbledygook. However, describing my personally tuned process that got me through those issues is something that I believe would be beneficial, so I am going to focus on that.

Find What Works For You

What I am going to talk about in this article is largely what works for me. Even tuned to how I work, this process is not perfect. But it gets the work done just over 90% of the time, and that is good enough for me. If I land in the other 10%, it is a special case, and I can figure out an alternate process to work through that situation.

From my experience, having that developer-tuned process is important. As a code developer and test automation developer, I do not want to spend time trying to figure out how to do something. I know that is not where I shine the brightest. As such, I have refined my process over the years to keep the noise of developing software as low as possible. This allows me to focus on the parts of the process that I know I can leverage to produce the greatest impact on a project.

So here are some thoughts I have had about my personal process, as I worked through things in the last two weeks.

Interactive Debugging Versus Logging

One thing that works for me is a focus on application logging. Now, in the last decade or so, I have heard many conversations saying that debugging through logging is very last-millennium and antiquated. In those conversations, the people involved typically agree with the speaker because they do not want to appear old-fashioned and behind the curve. But from my point of view, that stance tries to fit everything into a single category instead of focusing on the benefits of both tools.

That is what I believe the discussion should be about: the right tool for the right problem. Any kind of interactive debugger is a tool. I would also argue that writing stuff to output and log files is also a tool, though a more process-oriented tool. Both tools have their use and their place. Focusing on one and downgrading the other is usually not a winning strategy. And those narrow strategies can often cost time and money.

Interactive debuggers are great for pinpointing exactly what the problem is. As a developer walks through the code, they can pause and examine any data structure that the debugger has access to at that time. But if the developer needs to figure out what just happened, they are out of luck. Because they can only see the state of things at that moment, they need to restart the application they are debugging and try to get back to the desired point in the application as quickly as possible. And that exercise is not always easy.

Debugging through logging supplies a lot of information, depending on what is logged and at what log level. By executing the program with a specific log level, a lot of information can be stored for later examination. But the problem with this approach is that the developer needs to take the time to place log statements in their code to output the desired information. There is also a non-zero execution cost to adding logging to an application.

Which is better for me? The truth is it depends. If I am working on something small and focused, usually I prefer using an interactive debugger. In most other cases, I prefer logging. I just find that it is easier for me to visualize the entire picture using logging, instead of the more limited view I get from a debugger. There is also an added benefit here in that many microservices are deployed in environments where interactive debugging is not allowed. I have found that having healthy experience with logging has put me ahead in those situations.
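To make the comparison concrete, here is a minimal sketch of logging-based debugging using Python's standard logging module. The `parse_line` function is a made-up stand-in for illustration, not PyMarkdown code:

```python
import logging

# Configure once at startup: the chosen level decides how much detail is
# captured, without changing any of the log statements themselves.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s:%(name)s:%(message)s")
LOGGER = logging.getLogger("parser")

def parse_line(line):
    # DEBUG-level statements record what "just happened" for later review.
    LOGGER.debug("parsing line: %r", line)
    tokens = line.split()
    LOGGER.debug("produced %d tokens", len(tokens))
    return tokens

parse_line("some *emphasized* text")
```

Rerunning with `level=logging.WARNING` silences all that detail without touching the code, which is what makes the captured-for-later style of debugging cheap to leave in place.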

Understand How You Typically Work… And Optimize It

This one may seem like an easy decision, but it took me a while to figure out what that process was for myself. It took a certain amount of honest soul searching, observation, and tool creation to get something that just works for me. And this is what I have found.

Write Tests First - Test Driven Development

For me, good project work starts with Test Driven Development. I find great utility in setting up the goals for a given part of a project before writing the code to satisfy that goal. I can honestly state that it has helped me improve my design skills by fleshing out all the combinations that I need to handle before I start writing the code. By having those combinations laid out in front of me, I can then visualize what I need to do at a high level and see if I have missed anything large in my design.
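As a sketch of what laying out the combinations first can look like, here is a hypothetical example (the names are mine, not from the PyMarkdown code base) where the cases are written down as a test table before the function exists:

```python
# The combinations are written down first, as a test table; the function
# below is then implemented to satisfy that table. Both names here are
# hypothetical, not taken from the PyMarkdown code base.
LEADING_SPACE_CASES = [
    ("no leading spaces", 0),
    ("  two leading spaces", 2),
    ("\ta tab is not a space", 0),
    ("", 0),
]

def count_leading_spaces(text):
    # Just enough code to satisfy the cases laid out above.
    return len(text) - len(text.lstrip(" "))

for sample_text, expected_count in LEADING_SPACE_CASES:
    assert count_leading_spaces(sample_text) == expected_count
```

Seeing the table before the implementation is what surfaces the awkward cases (tabs, empty input) while they are still cheap to think about.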

Develop In Small Steps

The next part of my process is implementing features in small steps. While it may seem counterintuitive, I usually start with the negative tests first. As I implement the code to satisfy those negative tests, I typically add just enough of the actual code to get each test passing. When it comes time for the positive tests, I have a good amount of foundation work done that paves the way for the rest of the code.
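A small hedged sketch of what negative-first looks like, with a hypothetical `parse_heading` function standing in for real parser code:

```python
# The negative test is written first: bad input should raise before any
# happy-path code exists. `parse_heading` is a hypothetical example.
def parse_heading(line):
    if not line.startswith("#"):
        raise ValueError("not a heading: " + repr(line))
    # Positive-path code added later, once the negative test passes.
    hash_count = len(line) - len(line.lstrip("#"))
    return hash_count, line[hash_count:].strip()

def test_parse_heading_rejects_plain_text():
    try:
        parse_heading("plain paragraph text")
    except ValueError:
        return
    raise AssertionError("expected ValueError")

test_parse_heading_rejects_plain_text()
```

By the time the positive tests arrive, the validation and error-handling skeleton is already in place, which is the foundation work the paragraph above describes.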

What helps me in this part of my process is a good test framework and scripts that let me execute tests quickly, precisely, and efficiently. To reduce cognitive overhead, all my personal Python projects have a ptest.cmd script that is templated from project to project. It has a -k option for when I only want to run a subset of the tests. It has a -m option to use the multiple cores on my development system to speed up execution. It has a -a option to show all failed tests, instead of the default of stopping after the first five failures. It has a -c option to execute the tests with coverage tracking enabled. And to make sure I am seeing a good summary of this information, the test script uses my Project Summarizer package to display a succinct summary of those tests and their code coverage.
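The real ptest.cmd is a batch script whose exact contents I can only guess at, but the idea of mapping a few wrapper options onto pytest flags can be sketched in Python. The flag choices below assume the pytest-xdist and pytest-cov plugins for parallelism and coverage:

```python
# A sketch of how a ptest.cmd-style wrapper might translate its options
# into a pytest command line. The mapping is my guess, not the author's
# actual script; -n needs pytest-xdist and --cov needs pytest-cov.
def build_pytest_command(keyword=None, parallel=False, show_all=False, coverage=False):
    command = ["pytest"]
    if keyword:
        command.extend(["-k", keyword])   # like -k: run a subset of tests
    if parallel:
        command.extend(["-n", "auto"])    # like -m: use all available cores
    if not show_all:
        command.append("--maxfail=5")     # default: stop after five failures
    if coverage:
        command.append("--cov")           # like -c: coverage tracking
    return command

print(build_pytest_command(keyword="nested", parallel=True))
```

The point is not the specific flags but that the wrapper makes the common invocations one short, memorable command per project.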

I then basically travel in a small loop over the resolution for whatever issue or feature I am working on. I implement towards getting a test to pass and use the ptest.cmd script in various forms to move towards resolution. I do not worry about any metric other than one: is the test I am working on passing? From my point of view, given a decent design, there is no benefit in working to make the code better until I have code that solidly passes the test I am working on.

Set A High Bar for Quality - Clean And Polish The Code

However, when I get to that point, my clean.cmd script comes into play. Like the ptest.cmd script, I try to keep a single script across projects, with minor changes where needed. I do not worry about which tools are executed against which code bases. No matter how small the project, it is essentially the same tools for each project. For me, that reduces the cognitive overhead of trying to remember what is being executed against which project. It is always the same.
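As a sketch of the "same tools, every project" idea, here is a hypothetical clean.cmd-style runner in Python. The tool list is illustrative; I do not know which tools the author's script actually invokes:

```python
import subprocess

# One fixed list of quality tools, identical across projects; the specific
# tools and arguments here are placeholders, not the author's actual set.
QUALITY_TOOLS = [
    ["black", "."],            # code formatting
    ["flake8", "."],           # style checks
    ["pylint", "my_package"],  # static analysis; package name is a placeholder
]

def run_quality_tools(tools, dry_run=True):
    # dry_run lets the command list be inspected without the tools installed.
    results = []
    for command in tools:
        if dry_run:
            results.append((command, "skipped"))
        else:
            completed = subprocess.run(command)
            results.append((command, completed.returncode))
    return results
```

Because the list never changes from project to project, there is nothing to remember: the script either passes cleanly or tells you exactly which tool objected.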

And yes, sometimes it takes multiple passes to pass the quality bar I have set for myself. But I am okay with that. If I have a set of tools that points out when my code is not at the quality level that I want, I can deal with it. From my perspective, it means that I can focus on getting the code working cleanly before I start to worry about making sure the code itself is clean and maintainable.

External Validation - Do Not Be Afraid to Seek Advice

I am constantly looking for ways in which I can hone my design skills, my testing skills, and my implementation skills. But I also realize that my brain has a finite size and I need to rely on external products to help me in curating my skills in small steps. As such, I will often search the GitHub Marketplace for interesting integrations that I can execute against the project code bases. As most of my projects are open-source, I can freely experiment with various integrations without worrying about exposing private repositories.

And, like all experiments, that experimentation is hit and miss. Some integrations occur in the VSCode editor that I use for writing Python, and some occur when a Pull Request is created on GitHub. Sometimes there is a large benefit to an integration, and sometimes it is just a tiny improvement, but in an area I had not thought of before. At worst, I try an integration out and discard it a month later as not having enough benefit to justify its cost. However, I try to learn at least one thing from each integration that I try.

But the important thing for me is to look for things that can help me and to try them out. I am honest with myself that I am not the world’s best Python developer. However, I can use integration tools to help me benefit from the knowledge behind them. I would rather be a decent developer who experiments with various tools than just sit in my chair and go “Yeah, I am good enough!”.

Be Realistic With Yourself

When I was reviewing this article, I decided that I wanted to add a section here about being realistic with yourself. I know that I often think things will take a certain amount of time, only to have the actual time be some multiple of that first estimate. But that is just how things are.

This recent upgrade to handling whitespaces properly, and the testing of that change, is a great example of this. I have somewhat lost track of the exact number of changes I needed to make to resolve the issues, but I am confident it was somewhere between ten and twenty. For each issue, there was a certain amount of other work involved, but the focus was on the debug-code-test part of the process. If I was lucky, that part of the development effort took less than two hours. Most of the time it was in the five to seven hour range. It was usually the case that I got the resolution coded, only to find that the changes I introduced negatively affected other parts of the code. There were also times when I made a change to make things work properly, only to then need to update over 50 tests that were verifying behavior based on the previous output.

I had to be realistic with myself that it all took time. A good example of this was the numerous changes to properly note where the newline (\n) character was in tokens. One small three-line change to the parser would produce many small changes to individual tests. The ptest.cmd script would point out the newly failing tests. I would then visually examine each test and the newly proposed behavior to see if it made sense. If so, I would copy the added information into the test and run the ptest.cmd script against that specific test to make sure the change was the right one.

That round trip usually took five to ten minutes per test. If I was working on a group of tests like that, I could usually get the time down to three minutes, but it took mental effort to do so.

I was hoping that I would be able to get all the tests working by the middle of my holiday vacation. I was grateful that I was able to get them working by the end of that same vacation. All it took was a bit of a “cognitive reset”.

Keep On Learning

It should be obvious that there are two threads running through these sections: incremental learning and automation. I find that learning through repetition is an uncomplicated, low-friction way for me to learn. And by using automation as the vehicle for that learning, I ensure that the code measures up against my quality bar. As I learn more, I meet that bar with less effort. It is a win-win as far as I am concerned.

What Was My Experience So Far?

It was a long slog. A really long slog. But a good side effect of that work is that my personal development process is getting leaner and more efficient. I can cleanly define the stages of my development and adjust my use of tools to focus more efficiently on my goals for each stage. And I reaped the benefit of that during these last two weeks of work. If I had to guess, the development time could easily have doubled without the knowledge and tools that I used.

I am also aware that I pushed through a lot of slight changes with large effects in the last two weeks. I do not believe that would have been possible without properly understanding myself, taking the time I need to recharge, and ensuring that I have a solid set of tools that works for me. And I find all that to be cool!

What is Next?

Having written down a fair number of new scenarios that I want to test, I am going to triage them tomorrow, and hopefully I can start testing and resolving them by the end of the week! Stay tuned!



So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
