My website is now up and running, even though in my mind it took forever. To make sure everything was “just so”, I went through each article with a fine-toothed comb multiple times, each pass looking for something slightly different. In the end, it worked out, but I wished I could have automated at least some of that work to reduce the time it took. And I still have a lingering question: did I catch everything, or did I miss something?
What Is a Linter?
A long time ago, when I first heard the term “lint”, I thought someone was referring to the stuff that you find in the clothes dryer trap that you need to clean out. According to Wikipedia, my guess was close. Like the “undesirable bits of fiber and fluff found in sheep’s wool” from the Wikipedia article, software linters are used to detect undesirable practices and patterns in the objects they scan. Once pointed out, the development team can then decide whether to address the issue or ignore the issue.
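To make that concrete, here is a minimal sketch of what a linter does, using one hypothetical rule: flag any line of a Markdown document that ends with stray whitespace. (A real Markdown linter would be smarter, for example allowing the two-space hard line break; this is only an illustration of the scan-and-report pattern.)

```python
# Hypothetical lint rule: report lines that end with trailing whitespace.
# A real linter bundles many such rules and lets the team enable or
# ignore each one.

def lint_trailing_whitespace(text):
    """Return (line_number, message) pairs for lines with trailing spaces."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            findings.append((number, "trailing whitespace"))
    return findings

print(lint_trailing_whitespace("# Title \n\nSome text.\n"))
# → [(1, 'trailing whitespace')]
```

The team then decides, finding by finding, whether to fix the document or suppress the rule.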
Doing My Research
I started looking around, and even though there are a few Markdown-to-HTML command
line programs out there, finding a solid Markdown linter was another story. I did find
a couple of early attempts at one, but not a finished tool that I could use. The only
exception was the NPM-based Markdownlint
by David Anson. This VSCode plugin is pretty much a standard for anyone creating
content in Markdown using VSCode, with well over 1.3 million downloads as of the writing
of this article. By default, as you save articles, this linter executes and produces a
list of issues in the
Problems section at the bottom of the VSCode editor.
This tool is indeed handy while writing an article, but verifying multiple
articles becomes a bit of a chore. My general process was to open a document I wanted to
inspect, make a small whitespace change, save the file, and examine the Problems
section to see what the linter came up with. Two things were more annoying about this
process than others. The first issue is that issues for every open file are
displayed in that section. If I wanted to be efficient, it meant closing every other
file and working on a single file at a time. The second issue is that other
plugins write their problems there as well. As a lot of my content is technical, there
are a fair number of spelling issues that arise that I need to ignore. Once again,
neither of these issues is a bad thing, just annoying.
What Are the Requirements?
After thinking about this during the couple of weeks that I worked on the website, a decent set of requirements crystallized:
- must be able to see an accurate tokenization of the markdown document before translating to HTML
- working with an accurate tokenization remedies any translation problems instead of translating from HTML
- all whitespace must be encoded in that token stream as-is
- for consistency, want an exact interpretation
- initial tokenization for GitHub Flavored Markdown only, add others later
- initial tests against the GitHub Flavored Markdown specs
- plans to later add other flavors of parser
- must be able to provide a consistent lexical scan of the Markdown document from the command line
- clean feedback on violations
- extending the base linting rules should require very little effort
- clear support for adding custom linting rules
- written in Python
- good cross-platform support
- same language as Pelican, used as the Static Site Generator for my website
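The first three requirements can be illustrated with a hypothetical token stream. The token names and layout below are purely illustrative, not the real format of any tool; the point is that when every character, including whitespace, is encoded in the tokens, the original document can be reconstructed exactly, which is what makes lint rules trustworthy.

```python
# Hypothetical token stream for the document "# My Title\n\nSome text.\n".
# Each token records (kind, leading marker, trailing text/whitespace).
tokens = [
    ("atx-heading", "#", " "),    # the "#" marker plus the space after it
    ("text", "My Title", ""),     # the heading's text, exactly as typed
    ("end-heading", "", "\n"),    # the newline that closed the heading
    ("blank-line", "", "\n"),     # the blank line, preserved as-is
    ("paragraph", "Some text.", "\n"),
]

# Because whitespace is encoded as-is, joining the pieces restores
# the source document character for character.
source = "".join(marker + text for _, marker, text in tokens)
print(repr(source))  # → '# My Title\n\nSome text.\n'
```

If a tokenization cannot round-trip like this, a rule about, say, heading spacing has no reliable way to know what the author actually typed.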
While there are only 5 requirements, they are important. The first two requirements speak to reliability: the parsed Markdown tokens should be complete. The third requirement is for stability: write against one specification with a solid set of test cases before moving on to others. The fourth requirement is all about usability: the linter can be run from any appropriate command line. Finally, the fifth requirement is about extensibility: add any needed custom rules.
From my point of view, these requirements help me visualize a project that will help me maintain my website by ensuring that any articles that I write conform to a simple set of rules. Those rules can be checked by a script before I commit them, without having to load up a text editor. Simple.
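That pre-commit check might look something like the sketch below. The `lint_file` rule here is a hypothetical stand-in that only flags hard tabs; the point is the shape of the batch workflow, which runs over every article without opening an editor.

```python
# Sketch of linting a whole tree of Markdown articles before a commit.
# lint_file() is a hypothetical stand-in rule (hard tabs only), not a
# real tool's rule set.
from pathlib import Path

def lint_file(path):
    """Flag hard tabs, one common Markdown style issue."""
    findings = []
    for number, line in enumerate(path.read_text().splitlines(), start=1):
        if "\t" in line:
            findings.append(f"{path}:{number}: hard tab found")
    return findings

def lint_tree(root):
    """Lint every Markdown file under root, returning all findings."""
    findings = []
    for path in sorted(Path(root).rglob("*.md")):
        findings.extend(lint_file(path))
    return findings
```

A thin command-line wrapper around `lint_tree` that exits non-zero on any finding is all a pre-commit hook would need.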
Why Is This Important to Me?
Writing this section, it took me a couple of tries to word this properly. In the end, I settled on a basic phrase: It is a tool that I can use to make a software project better.
In other parts of my professional life, I take a look at things such as a Java project and try and improve the quality of that project. The input is mainly Java source code and the output is mainly JAR files that are executed by a JVM. My website is no different. Remove Java source code and insert Markdown documents. Remove JAR files executed by a JVM and insert HTML files presented by a browser. There are a few differences between the two types of projects, but in all the important ways, they are the same.
I took the time to manually scan each article for my website multiple times before I
did my website’s soft release. To me, it just makes sense that there should be an
easier way to perform that process. Easier in terms of time, and easier in terms of
consistency. Unless I am missing something out there on the Internet, the only project
that came close to fulfilling my requirements was
Markdownlint, and it still had some
things missing. I came to the realization that to be able to lint a Markdown file
against a set of rules, I was going to have to write my own Markdown parser.
In the last couple of decades of professional life, I have written many parsers, so that part of the project does not scare me. Thanks to the great work of the people at the GFM site, there were a solid number of test cases that I could test the parser against. The extensibility requirement would make me look at different ways to integrate code into my tool, so a plus there. All in all, a decent number of things I must get right, but nothing too far out of my field of experience.
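Spec-driven testing of that kind can be sketched as a simple loop. This assumes the spec examples have been exported to JSON records with `markdown`, `html`, and `example` fields, a dump the CommonMark tooling can produce; the `naive_to_html` stand-in below handles only a bare paragraph and exists just to show the loop running.

```python
# Sketch of testing a Markdown translator against spec examples.
import json

def run_spec_examples(spec_json, to_html):
    """Return the example numbers where to_html() disagrees with the spec."""
    failures = []
    for case in json.loads(spec_json):
        if to_html(case["markdown"]) != case["html"]:
            failures.append(case["example"])
    return failures

# Tiny stand-in translator: handles only a single bare paragraph.
def naive_to_html(markdown):
    return "<p>" + markdown.strip() + "</p>\n"

spec = '[{"example": 1, "markdown": "hello\\n", "html": "<p>hello</p>\\n"}]'
print(run_spec_examples(spec, naive_to_html))  # → []
```

Run against the full set of GFM examples, a loop like this pinpoints exactly which spec cases regress after any parser change.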
Sure, it would be hard in places… but also a challenge! Just the kind of thing I like!
What Comes Next?
In the next article, I start breaking down the requirements for the Markdown parser and document how I will set up the tests for it. As I am parsing a well-known format with varying implementations already available, it is important to stay focused on one implementation and have a solid set of tests to ensure I don’t go backwards in my development.
So what do you think? Did I miss something? Is any part unclear? Leave your comments below.