As I mentioned last week, after the latest release of my PyMarkdown project, I wanted to take time to do some decent refactoring on the project. But refactoring can be addictive.
In this case, I am doing some refactoring that I have been putting off for a long time. That is to say that I have let a lot of the individual files grow to a large enough size that they routinely have multiple responsibilities. While I often allow that in smaller files where it makes sense, on the larger files it just makes things harder to work with.
So let me start with an easy file that does not need much refactoring: the file main.py. At just over 450 lines, it is a bit too long for me to be comfortable with it, but it takes care of the main orchestration of the application as its single responsibility. Sure, it takes care of command orchestration, initialization, and parsing of the command line, but I consider each of those concepts to be part of the main application orchestration task. Given that context, the main.py file does not require any “reduction refactoring” to reduce the size of the file by properly refactoring the code according to the responsibilities.
Next on the list are files such as leaf_markdown_token.py. Files like this are technically not too long, but they do contain many responsibilities. In the case of the leaf_markdown_token.py file, it contains a collection of Markdown token classes grouped by their token type. While not ideal, it was convenient at the time. These types of files are slated for long term refactoring. In the long run, I want to pull as much information about a given token as possible into a single file for that one token. But for right now, things are good as they are.
That leaves me with the files that are too large and have too many responsibilities. A good example of that is the inline_processor.py file from the beginning of February 2023. At 2795 lines, it was a beast to work with and to scan through. It was not created that way initially, but it did grow that way unabated. As such, it is a file that was on my list of files to look at for “reduction refactoring”. And by the time this article is published, the file should be broken up into a few distinct files, each with a single responsibility.
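As a rough illustration of how files like that could be spotted, here is a minimal sketch that lists Python files over a line-count threshold along with their number of top-level classes and functions. This is not the tooling used on the PyMarkdown project; the names `summarize_source` and `find_large_files` and the 1000-line default threshold are assumptions made for the example, and a count of top-level definitions is only a crude stand-in for "number of responsibilities".

```python
import ast
from pathlib import Path


def summarize_source(source: str) -> tuple[int, int]:
    """Return (line count, number of top-level classes and functions)."""
    tree = ast.parse(source)
    top_level = [
        node
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    return len(source.splitlines()), len(top_level)


def find_large_files(root: str, line_threshold: int = 1000) -> list[tuple[str, int, int]]:
    """List (path, lines, top-level definitions) for files over the threshold.

    The threshold is an arbitrary, hypothetical cut-off; pick one that
    matches what feels comfortable for the project at hand.
    """
    results = []
    for path in sorted(Path(root).rglob("*.py")):
        lines, definitions = summarize_source(path.read_text(encoding="utf-8"))
        if lines > line_threshold:
            results.append((str(path), lines, definitions))
    # Largest files first, since those are the likeliest candidates.
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for file_path, line_count, definition_count in find_large_files("."):
        print(f"{line_count:6d} lines  {definition_count:4d} definitions  {file_path}")
```

A file that is long but has few top-level definitions may still have a single responsibility, as with main.py above, so the output is a starting point for a closer look and not a verdict.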
The reason that I bring this up is that I find there is a fine line between trying to optimize everything through refactoring and making progress on other goals. Refactoring is addictive. Making things better can put a spring in your step, just knowing you made things better. But there is a point of diminishing returns.
This kind of refactoring is something I should have been doing all along, but I focused on the tab reintegration and nested container blocks, not paying attention to how disorganized the files were getting. For me, this is not just making things better for its own sake; this is paying down technical debt that I owed the project.
Hope that helps people figure out why I am spending so much time on the current refactoring, even though I want to get back to adding more scenario tests! Please stay tuned!