Summary
In my last article, I talked about applying Python Type Hints to the PyMarkdown project. In this article, I talk about more Type Hint work across the multiple projects that I maintain.
Introduction
Sometimes projects can take on a mind of their own. That is how I felt about the PyMarkdown project and adding Python Type Hints to it. It was just soaking up time like a dry sponge. It is true that I could have taken a break from adding the hints for a while, but in that case, I was sure that I was going to feel like the job was incomplete. While it may not have been the most pressing task to complete, I decided to move forward with it anyway.
However, there was a silver lining. Along the way, I was able to start providing updates to my other projects and getting them all coordinated with scripts and supporting Type Hints. To me, that was a plus!
Issue 319 - The Final Commit?
Having some rare free time during the week, I decided to move forward with Type Hints and the PyMarkdown project, or at least move ahead as much as time would allow. It wasn’t always pretty work, as I detailed in my last article, but it was good work nonetheless. Starting at 1147 issues, by the middle of the week, I was able to get the number down to around 125 issues before the fun began.
Sometimes, You Just Need To Be Literal
As I winnowed down the issues to that group of 125, I started to encounter a number of interesting scenarios. One of these scenarios was with the generate_close_markdown_token_from_markdown_token function in the MarkdownToken class:
    def generate_close_markdown_token_from_markdown_token(
        self,
        extracted_whitespace: str,
        extra_end_data: str,
        line_number: int = 0,
        column_number: int = 0,
    ) -> MarkdownToken:
When I coded it up, everything looked fine. And then it hit me. I had just added a return type hint of MarkdownToken within the MarkdownToken class itself. The yellow error line in VSCode under the class name confirmed it. Therefore, it was no surprise that, after starting a quick run of the project tests, I got the confirmation I expected:
    pymarkdown\markdown_token.py:22: in <module>
        class MarkdownToken:
    pymarkdown\markdown_token.py:585: in MarkdownToken
        ) -> MarkdownToken:
    E   NameError: name 'MarkdownToken' is not defined
Some quick research on the subject confirmed that my thoughts were spot on. Until the body of a Python class has finished executing, the class name is not yet bound to the class. As such, any type hint from the class to itself cannot be resolved. That was the bad news. The good news is that when the type hints framework was being designed, this scenario had been prepared for. By changing the return type hint on the last line from the bare name MarkdownToken to the string literal "MarkdownToken", the error disappeared.
Covered in PEP 484 - Type Hints under the name forward references, these string literals exist to resolve thorny cyclic imports and cases where a class holds references to itself.
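As a minimal sketch of the same situation outside of PyMarkdown, the class below returns an instance of itself from one of its methods. The Token class and its make_close_token method are hypothetical stand-ins, not code from the project.

```python
from typing import get_type_hints


class Token:
    """Hypothetical stand-in for a class whose method returns its own type."""

    def __init__(self, name: str) -> None:
        self.name = name

    # Writing `-> Token` here raises NameError while the class is being
    # defined, because the name `Token` is not bound until the body finishes.
    def make_close_token(self) -> "Token":
        return Token(f"end-{self.name}")


closing = Token("emphasis").make_close_token()

# The annotation is stored as a plain string and only resolved on request.
stored = Token.make_close_token.__annotations__["return"]
resolved = get_type_hints(Token.make_close_token)["return"]
```

On Python 3.7 and later, adding `from __future__ import annotations` at the top of the module is another way to get the same effect without quoting each hint by hand.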
After fixing the handful of issues like this, it was then on to the last major set of issues: stubs.
Stubs: Application_Properties
Looking at what was left, one remaining group of issues involved imported types. Specifically, there were three imported packages that did not have any type information: columnar, its dependent package wcwidth, and the application_properties package. For Mypy to effectively verify the types used by the PyMarkdown project, it needed to know the types of the classes in those three external packages.
After doing my usual research, I found that for the first two packages, columnar and wcwidth, the best solution was to use the stubgen command. Packaged with the Mypy package, stubgen tries to generate a set of importable stubs that are somewhat close to the actual types that should have been included with the package. If that sounds like I am choosing my words carefully, it is because I am. Based on their documentation and my experience, using stubgen in the following way:
    stubgen --output stubs -p columnar
    stubgen --output stubs -p wcwidth
generated stub files into the stubs directory for those two packages. Those stub files were not spot on, but they were reasonably close. If I had to associate a percentage with their accuracy, I would say their initial accuracy was in the 80-90% range. For the remainder, there was far more usage of the Any type than I was comfortable with, along with numerous cases where the type hints were missing. But with extra research and experimentation, including a healthy dose of trial and error, I was able to quickly tune the stubs into a usable format.
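To give a feel for the kind of tuning involved, here is a minimal sketch written in stub syntax. The function names and signatures are hypothetical and are not taken from columnar or wcwidth; they only contrast a loosely typed, generated-style entry with a hand-tightened one.

```python
from typing import Any, List, Optional, Sequence


# Roughly what an untuned, generated-style entry can look like (hypothetical):
# anything that could not be inferred falls back to Any.
def render_table_generated(data: Any, headers: Optional[Any] = ...) -> Any: ...


# The same hypothetical function after tightening the hints by hand.
def render_table_tuned(
    data: Sequence[Sequence[str]],
    headers: Optional[List[str]] = ...,
) -> str: ...
```

In a real project, entries like these would live in a .pyi file under the stubs directory rather than in a regular module.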
That left the application_properties package. I know that I could have taken the same route as with the other packages, but I decided to spend time to improve the application_properties project, setting it up properly with the correct type information.
However, once I had that type information dialed in, I found out that there were additional changes that I needed to make to allow Mypy to see that type information. The first change was to add a py.typed file in the package’s directory to let Mypy know that type information was available. This also meant adding that file to the MANIFEST.in file to ensure it was copied as part of the package. Then I assigned a list of names to the __all__ variable in the package’s __init__.py file to ensure that Mypy had a good list of all the classes to load. That took a couple of hours and a healthy dose of trial and error, but it was all worth it!
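One way to confirm that the marker file actually made it into an installed package is a small check like the following. This is only a sketch: the ships_py_typed helper is hypothetical and is not part of Mypy or application_properties.

```python
import importlib.util
from pathlib import Path


def ships_py_typed(package_name: str) -> bool:
    """Return True if the importable package contains a py.typed marker file."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package.
        return False
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )
```

Running it against the package name after a fresh install into a clean environment is a quick sanity check that the MANIFEST.in change had the intended effect.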
After that, deciding to leave things better than I found them (an old scouting and hiking habit), I upgraded a handful of support files in the project. Most of the scripts were over six months old and out of date with my other projects, so I just went ahead and updated them. At the same time, the Pipfile that I use with pipenv was complete, but had two original packages whose version was * and six copied packages with the same * version. To be consistent, I looked up their versions and set those looked-up versions in the Pipfile.
Why did I do that? Call it covering all bases. From my experience, it is particularly useful to remove any variables where possible. A version identifier of * means that the version used is always the latest version, which is a moving target. I ran across a good example of how that can affect things this week, with a weird change to the black package. As I updated the black package’s version, the project’s clean.cmd script (which uses black) started reporting errors. Basically, there was a disconnect between the black package and the click package, as documented here. If I had specified the black package with a * version, I would not have been able to execute the black formatter for as long as this fix took to create and release.
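Finding those moving targets by hand is easy enough for one Pipfile, but a small script can do it across projects. The following is a rough sketch, not a tool I am claiming exists: the find_unpinned helper is hypothetical, the sample Pipfile content and its version numbers are made up for illustration, and only the simple name = "*" form is handled.

```python
import re
from typing import List

# Matches simple entries such as:  black = "*"
_WILDCARD_ENTRY = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"\*"\s*$')


def find_unpinned(pipfile_text: str) -> List[str]:
    """List packages whose version is the wildcard "*" in a Pipfile.

    Inline tables such as name = {version = "*"} are deliberately
    left out of this sketch.
    """
    unpinned: List[str] = []
    in_package_section = False
    for line in pipfile_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Only [packages] and [dev-packages] hold dependency entries.
            in_package_section = stripped in ("[packages]", "[dev-packages]")
            continue
        match = _WILDCARD_ENTRY.match(line)
        if in_package_section and match:
            unpinned.append(match.group(1))
    return unpinned


# Made-up sample content, for illustration only.
sample = """
[packages]
columnar = "*"
application-properties = ">=0.5.2"

[dev-packages]
black = "==22.3.0"
mypy = "*"
"""
print(find_unpinned(sample))  # prints ['columnar', 'mypy']
```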
I did a small handful of other changes, which I considered bookkeeping. I added a main.yml file that was a close copy of the one from the PyMarkdown project. Nothing fancy, just there to execute scenario tests and lint tests as part of the merge process. I added some pre-commit configuration and set it to scan the project root and project docs directory using PyMarkdown. With all that work completed, I did not want it to fall into disrepair, so I added a dependabot.yml file to automatically scan for newer package versions.
With a quick release of version 0.5.2 of the project, it was then back to the other work.
Back to Issue 319
With a healthy amount of testing done on the new release of application_properties, I switched the package version in PyMarkdown’s Pipfile from 0.5.0 to 0.5.2 and removed the test files that I had in the stubs/application_properties directory. As I started my clean.cmd script, I held my breath. I had done the work to hopefully make this work, but this is what it all came down to. And after a good minute or two, I was rewarded with a complete execution of the clean.cmd script with no errors.
I was close to being done. Turning on the strict mode for Mypy, I had around thirty issues that I had yet to resolve, each of them taking only a minute or two to fix. Within forty-five minutes, those issues were gone and Mypy was being called with strict mode enabled and no issues being reported!
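For a sense of what strict mode tends to ask for, here is a small sketch. The count_names function is a hypothetical example and not code from PyMarkdown; the option names in the comments are two of the checks that Mypy's --strict flag turns on.

```python
from typing import Dict, List


# Without annotations, strict mode reports the function (disallow-untyped-defs):
#     def count_names(names): ...
# Bare generic types are also reported (disallow-any-generics):
#     def count_names(names: list) -> dict: ...
def count_names(names: List[str]) -> Dict[str, int]:
    """Hypothetical helper, fully annotated so strict mode has nothing to flag."""
    counts: Dict[str, int] = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return counts
```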
PyMarkdown Release 0.9.6
I decided that I had a good, healthy number of changes queued up, and a point release was overdue. Putting together a quick mental list of things to clean up before I released the project, I made quick work of those issues.
While I do try to maintain the changelog.md file, I often fall behind. Knowing that, I took some time and cleaned up that file and double-checked it for accuracy. Repeating the work that I did on the application_properties project and its Pipfile, I went through and adjusted PyMarkdown’s Pipfile to remove any * version identifiers. I also added the same dependabot.yml file to the PyMarkdown project that was in the application_properties project.
After changing the version to 0.9.6, setting the tag, creating the package, and uploading the package, I was at a good point with the PyMarkdown project.
Getting Back To The Project Summarizer Project
Having done all that work by Friday night, I was faced with a weekend with no planned work ahead of me. I had honestly thought it was going to take me all week to resolve the Mypy issues, so I was a bit lost on what to do when Saturday afternoon came around. Taking a page from the work that I did during the week, I went through and applied the same kind of updates to the Project Summarizer project as I had to the other two projects.
With all projects updated to my current specifications, I started to work on my design for adding plugin support to the Project Summarizer project. From my perspective, the key to having a good summarizer was allowing for end users to add functionality for their own situations. As such, I was confident that this meant supporting plugins in a fashion like what I had done with the PyMarkdown project. But I also knew that this was going to be different because the requirements were slightly different. In the case of the Project Summarizer project, I figured out that the plugins needed to be parsed and at least partially acted upon before the normal command line processing.
The reasoning for that was quite clear to me. Unlike the other projects I have added plugin support to, the plugins for the Project Summarizer project needed to be able to inject command line arguments into the normal command line processing workflow. The summarizer works by taking existing reporting files and supplying summaries of the content contained within. Simply put, if I wanted the plugins to be able to specify the report files to act on, I needed to evaluate the plugins to use before the normal processing occurred.
After a bit of experimentation, I was sure that I would be able to come up with something decent. It was just a matter of which of the options I had experimented with worked better than the others.
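One possible shape for that kind of two-phase processing, and not necessarily the one the project will end up with, is sketched below using the standard argparse module. Everything project-specific here is an assumption made for illustration: the --add-plugin and --my-report arguments, the DemoReportPlugin class, and the AVAILABLE_PLUGINS registry are all hypothetical.

```python
import argparse
from typing import Dict, Sequence, Type


class DemoReportPlugin:
    """Hypothetical plugin that contributes one report-file argument."""

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("--my-report", dest="my_report", default=None)


# Hypothetical registry; a real tool would discover and load plugin modules.
AVAILABLE_PLUGINS: Dict[str, Type[DemoReportPlugin]] = {"demo": DemoReportPlugin}


def parse_command_line(argv: Sequence[str]) -> argparse.Namespace:
    # Phase 1: look only for plugin selections and ignore everything else.
    bootstrap = argparse.ArgumentParser(add_help=False)
    bootstrap.add_argument("--add-plugin", action="append", default=[])
    early, _unparsed = bootstrap.parse_known_args(argv)

    # Phase 2: build the full parser, let each selected plugin extend it,
    # and only then run the normal command line processing.
    parser = argparse.ArgumentParser(parents=[bootstrap])
    for name in early.add_plugin:
        AVAILABLE_PLUGINS[name]().add_arguments(parser)
    return parser.parse_args(argv)


args = parse_command_line(["--add-plugin", "demo", "--my-report", "cov.xml"])
```

The first pass relies on parse_known_args to tolerate arguments it does not recognize yet, which is what lets a plugin's own arguments appear on the same command line that selects the plugin.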
But, with my research completed and a long week ahead of me, I decided to leave it there. I was sure to have more time in the middle of the week to work on things, and that was good enough for me. I know that I want to make progress with plugins for the Project Summarizer project, and I want to keep working on my other projects, not just the PyMarkdown project. With that focus in mind, I put my computer to sleep on an early Sunday night.
What is Next?
Having focused exclusively on the PyMarkdown project for a couple of months, I am going to try harder to split my time between my different projects. At the very least, I hope to try harder to do so. Stay tuned!