Markdown Linter - Back To Project Summarizer... The Long Way

Summary

In my last article, I talked about applying Python Type Hints to the PyMarkdown project. In this article, I talk about more Type Hint work across the multiple projects that I maintain.

Introduction

Sometimes projects can take on a mind of their own. That is how I felt about the PyMarkdown project and adding Python Type Hints to it. It was just soaking up time like a dry sponge. It is true that I could have taken a break from adding the hints for a while, but in that case, I was sure that I was going to feel like the job was incomplete. While it may not have been the most pressing task to complete, I decided to move forward with it anyway.

However, there was a silver lining. Along the way, I was able to start providing updates to my other projects and getting them all coordinated with scripts and supporting Type Hints. To me, that was a plus!

Issue 319 - The Final Commit?

Having rare available time during the week, I decided to move forward with Type Hints and the PyMarkdown project, or at least move ahead as much as time would allow. It wasn’t always pretty work, as I detailed in my last article, but it was good work nonetheless. Starting at 1147 issues, I was able to get the number down to around 125 by the middle of the week, and then the fun began.

Sometimes, You Just Need To Be Literal

As I winnowed down the issues to that group of 125, I started to encounter a number of interesting scenarios. One of these scenarios was with the generate_close_markdown_token_from_markdown_token function in the MarkdownToken module:

    def generate_close_markdown_token_from_markdown_token(
        self,
        extracted_whitespace: str,
        extra_end_data: str,
        line_number: int = 0,
        column_number: int = 0,
    ) -> MarkdownToken:

When I coded it up, everything looked fine. And then it hit me. I had just added a return type hint of MarkdownToken within the class MarkdownToken itself. The yellow error line in VSCode under the class name confirmed it. Therefore, it was no surprise that a quick run of the project tests gave me the confirmation that I expected:

pymarkdown\markdown_token.py:22: in <module>
    class MarkdownToken:
pymarkdown\markdown_token.py:585: in MarkdownToken
    ) -> MarkdownToken:
E   NameError: name 'MarkdownToken' is not defined

Doing quick research on the subject, my thoughts were spot on. Until the body of a Python class has finished executing, the class name is not yet defined. As such, any type hint from the class to itself is not legitimate. That was the bad news. The good news is that when the type hints framework was being designed, this scenario had been prepared for. By changing the return annotation on the last line from the bare type name MarkdownToken to the string literal "MarkdownToken", the error disappeared. Covered in the documentation of PEP 484 - Type Hints as forward references, these string literals exist to resolve thorny cyclic imports and cases where a class holds references to itself.
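As a minimal sketch of that fix, here is a stripped-down stand-in for the class. Only the method name and its parameters come from the project; the constructor, the token_name field, and the method body are assumptions made up for illustration.

```python
from typing import get_type_hints


class MarkdownToken:
    """Hypothetical, stripped-down stand-in for the real class."""

    def __init__(self, token_name: str) -> None:
        self.token_name = token_name

    # The quoted return type is a forward reference: it is stored as a plain
    # string when the method is defined, so the not-yet-defined class name is
    # never looked up while the class body is still executing.
    def generate_close_markdown_token_from_markdown_token(
        self,
        extracted_whitespace: str,
        extra_end_data: str,
        line_number: int = 0,
        column_number: int = 0,
    ) -> "MarkdownToken":
        # Illustrative body only; the real method does more than this.
        return MarkdownToken("end-" + self.token_name)


# Once the class statement has completed, the string resolves to the class.
resolved_hints = get_type_hints(
    MarkdownToken.generate_close_markdown_token_from_markdown_token
)
```

Removing the quotes around the return type should reproduce the NameError shown above, since the name is evaluated before the class statement has finished.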

After fixing the handful of issues like this, it was then on to the last major set of issues: stubs.

Stubs: Application_Properties

Looking at what was left, one group of issues stood out: problems with imported types. Specifically, there were three imported packages that did not have any type information: columnar, its dependent package wcwidth, and the application_properties package. To effectively use Mypy to verify the types used by the PyMarkdown project, Mypy needed to know the types of the classes in those three external packages.

After doing my usual research, I found that for the first two packages, columnar and wcwidth, the best solution was to use the stubgen command. Packaged with the Mypy package, stubgen tries to generate a set of importable stubs that are somewhat close to the actual types that should have been included with the package. If that sounds like I am choosing my words carefully, it is because I am. Based on the Mypy documentation and my experience, using stubgen in the following way:

stubgen --output stubs -p columnar
stubgen --output stubs -p wcwidth

generated stub files into the stubs directory for those two packages. Those stub files were not spot on, but they were reasonably close. If I had to associate a percentage with their accuracy, I would say their initial accuracy was in the 80-90% range. For the remainder, there was far more usage of the Any type than I was comfortable with, along with numerous cases where the type hints were missing altogether. But with extra research and experimentation, including a healthy dose of trial and error, I was able to quickly tune the stubs into a usable format.
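To give a feel for that kind of tuning, the sketch below contrasts a generated-style signature with a hand-tuned one. The render_table function and its parameters are hypothetical and are not taken from columnar or wcwidth. In a real stub, there would be a single function in a .pyi file; two names are used here only so that both versions can sit side by side in one runnable block.

```python
from typing import Any, List, Optional, get_type_hints


# Roughly the flavour of a generated stub: the names are right, but the
# types are either missing (rows) or fall back to Any.
def render_table_generated(rows, headers: Optional[Any] = ...) -> Any: ...


# The same hypothetical function after tuning the stub by hand.
def render_table_tuned(
    rows: List[List[str]], headers: Optional[List[str]] = ...
) -> str: ...


# The tuned version gives a type checker something concrete to verify.
generated_hints = get_type_hints(render_table_generated)
tuned_hints = get_type_hints(render_table_tuned)
```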

That left the application_properties package. I know that I could have taken the same route as with the other packages, but I decided to spend time to improve the application_properties project, setting it up properly with the correct type information.

However, once I had that type information dialed in, I found out that there were additional changes that I needed to make to allow Mypy to see that type information. The first change was to add a py.typed file in the package’s directory to let Mypy know that type information was available. This also meant adding that file to the MANIFEST.in file to ensure it was copied as part of the package. Then I assigned a list of names to the __all__ variable in the package’s __init__.py file to ensure that Mypy had a good list of all the classes to load. That took a couple of hours and a healthy dose of trial and error, but it was all worth it!
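One quick way to confirm that the marker file actually made it into an installed package is to look for it next to the package’s __init__.py file. The helper below is a sketch under that assumption. The ships_type_marker name is made up for this example, and the check only covers the py.typed file, not the __all__ list.

```python
from importlib.util import find_spec
from pathlib import Path


def ships_type_marker(package_name: str) -> bool:
    """Return True if the importable package contains a py.typed file."""
    spec = find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package.
        return False
    # A package can span several directories, so check each of them.
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )
```

Calling ships_type_marker("application_properties") in the project’s virtual environment is one way to check that the MANIFEST.in change took effect after installing a freshly built package.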

After that, deciding to leave things better than I found them (an old scouting and hiking habit), I upgraded a handful of support files in the project. Most of the scripts were over six months old and out of date with my other projects, so I just went ahead and updated them. At the same time, the Pipfile that I use with pipenv was complete, but had two original packages whose version was * and six copied packages with the same * version. To be consistent, I looked up their versions and set those looked-up versions into the Pipfile.

Why did I do that? Call it covering all bases. From my experience, it is particularly useful to remove any variables where possible. A version identifier of * means that the version used is always the latest version, which is a moving target. I ran across a good example of how that can affect things this week, with a weird change to the black package. As I updated the black package’s version, the project’s clean.cmd script (which uses black) started reporting errors. Basically, there was a disconnect between the black package and the click package, as documented here. If I had specified the black package with a * version, I would not have been able to execute the black formatter for as long as this fix took to create and release.
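Hunting down those * entries by eye is easy to get wrong, so a small scan can help. The following is a rough sketch, not part of any of the projects: it only recognizes simple one-line entries of the form name = "*" and ignores the inline-table form. The find_wildcard_pins name and the version number in the sample are made up for illustration.

```python
import re
from typing import List

# Matches simple one-line entries such as:  black = "*"
WILDCARD_ENTRY = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"\*"\s*$')


def find_wildcard_pins(pipfile_text: str) -> List[str]:
    """Return package names whose version specifier is the bare "*"."""
    names = []
    for line in pipfile_text.splitlines():
        match = WILDCARD_ENTRY.match(line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical Pipfile contents; the pinned version is invented.
sample_pipfile = '''
[packages]
columnar = "*"

[dev-packages]
black = "==1.2.3"
pytest = "*"
'''

unpinned = find_wildcard_pins(sample_pipfile)
```

For the sample above, unpinned contains columnar and pytest, the two entries that would still float to whatever version is newest.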

I made a small handful of other changes, which I considered bookkeeping. I added a main.yml file that was a close copy of the one from the PyMarkdown project. Nothing fancy, just there to execute scenario tests and lint tests as part of the merge process. I added some pre-commit configuration and set it to scan the project root and project docs directory using PyMarkdown. With all that work completed, I did not want it to fall into disrepair, so I added a dependabot.yml file to automatically scan for newer package versions.

With a quick release of version 0.5.2 of the project, it was then back to the other work.

Back to Issue 319

With a healthy amount of testing done on the new release of application_properties, I switched the package version in PyMarkdown’s Pipfile from 0.5.0 to 0.5.2 and removed the test files that I had in the stubs/application_properties directory. As I started my clean.cmd script, I held my breath. I had put in the effort to get to this point, but this is what it all came down to. And after a good minute or two, I was rewarded with a complete execution of the clean.cmd script with no errors.

I was close to being done. Turning on the strict mode for Mypy, I had around thirty issues that I had yet to resolve, each resolved within minutes of the last. Within forty-five minutes, those issues were gone and Mypy was being called with strict mode enabled and no issues being reported!

PyMarkdown Release 0.9.6

I decided that I had a good, healthy number of changes queued up, and a point release was overdue. Putting together a quick mental list of things to clean up before I released the project, I made quick work of those issues.

While I do try to maintain the changelog.md file, I often fall behind. Knowing that, I took some time and cleaned up that file and double-checked it for accuracy. Repeating the work that I did on the application_properties project and its Pipfile, I went through PyMarkdown’s Pipfile and removed any * version identifiers. I also added the same dependabot.yml file to the PyMarkdown project that was in the application_properties project. After changing the version to 0.9.6, setting the tag, creating the package, and uploading the package, I was at a good point with the PyMarkdown project.

Getting Back To The Project Summarizer Project

Having done all that work by Friday night, I was faced with a weekend with no planned work ahead of me. I had honestly thought it was going to take me all week to resolve the Mypy issues, so I was a bit lost on what to do when Saturday afternoon came around. Taking a page from the work that I did during the week, I went through and applied the same kind of updates to the Project Summarizer project as I had to the other two projects.

With all projects updated to my current specifications, I started to work on my design for adding plugin support to the Project Summarizer project. From my perspective, the key to having a good summarizer was allowing end users to add functionality for their own situations. As such, I was confident that this meant supporting plugins in a fashion like what I had done with the PyMarkdown project. But I also knew that this was going to be different because the requirements were slightly different. In the case of the Project Summarizer project, I figured out that the plugins needed to be parsed and at least partially acted upon before the normal command line processing.

The reasoning for that was quite clear to me. Unlike the other projects I have added plugin support to, the plugins for the Project Summarizer project needed to be able to inject command line arguments into the normal command line processing workflow. The summarizer works by taking existing reporting files and supplying summaries of the content contained within. Simply put, if I wanted the plugins to be able to specify the report files to act on, I needed to evaluate which plugins to use before the normal processing occurred.
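One possible shape for that ordering is a two-pass parse with argparse: a first pass that only looks for plugin selection, and a second pass that runs after the selected plugins have added their own arguments. This is only a sketch of the general idea, not the design that the Project Summarizer project uses. The --add-plugin and --sample-report arguments, the sample_report_plugin function, and the in-memory registry are all hypothetical.

```python
import argparse
from typing import Callable, Dict, List


# Hypothetical plugin: a function that adds its own arguments to the parser.
def sample_report_plugin(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("--sample-report", dest="sample_report_file", metavar="FILE")


# Hypothetical registry; a real tool would discover plugins from files.
AVAILABLE_PLUGINS: Dict[str, Callable[[argparse.ArgumentParser], None]] = {
    "sample-report": sample_report_plugin,
}


def parse_command_line(argv: List[str]) -> argparse.Namespace:
    # Pass 1: look only for plugin selection, ignoring everything else.
    bootstrap = argparse.ArgumentParser(add_help=False)
    bootstrap.add_argument("--add-plugin", action="append", default=[])
    early_args, _ = bootstrap.parse_known_args(argv)

    # Pass 2: build the real parser, let each selected plugin extend it,
    # and only then parse the full command line.
    parser = argparse.ArgumentParser(parents=[bootstrap])
    for plugin_name in early_args.add_plugin:
        AVAILABLE_PLUGINS[plugin_name](parser)
    return parser.parse_args(argv)


args = parse_command_line(
    ["--add-plugin", "sample-report", "--sample-report", "coverage.xml"]
)
```

Without the --add-plugin argument, the second pass would reject --sample-report as unknown, which is the behavior that makes the early evaluation necessary.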

Doing a bit of experimentation with a few options, I was sure I would be able to come up with something decent. It was just a matter of which one of those options worked better than the others.

But, with my research completed and a long week ahead of me, I decided to leave it there. I was sure to have more time in the middle of the week to work on things, and that was good enough for me. I know that I want to make progress with plugins for the Project Summarizer project, and I want to keep working on my other projects, not just the PyMarkdown project. With that focus in mind, I put my computer to sleep on an early Sunday night.

What is Next?

Having focused exclusively on the PyMarkdown project for a couple of months, I am going to try harder to split my time between my different projects. At the very least, that is the plan. Stay tuned!

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
