In my last article, I talked about resolving the remaining Priority 1 items from the issues list. In this article, I talk about how I worked through the issues in creating an installable package for the project.
Having invested a lot of time getting the PyMarkdown project to a place where I feel confident in creating an initial release of the project, it was now time for me to create that release. To be honest, I was not sure what to expect out of the Python setup process. With other languages, creating releases is usually handled by an add-on to the language, not by something that is part of the core language as it is with Python. As such, I was genuinely interested in how the process would differ between Python and the other languages I have written installers for.
Like everything else in this project, this was going to be a learning experience, and I was eager to get underway!
What Is the Audience for This Article?
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 01 Apr 2021 and 03 Apr 2021.
Where To Start?
While the changes that I needed to perform on the project to get it from its then state to a packaged state were small, the path to get there was anything but short. Having done my usual research, I ended up finding three sources that I thought would be helpful to my effort:
I liked Nicolas’ article on creating Python packages because it was the first article that I found in my searches that seemed to lay everything out on the table. It felt that it provided me with a lot of useful information in a concrete, easy to digest form. While I did have a couple of issues with his examples, I do believe that they were because I was trying to adapt his example as I went and messed things up. The FreeCodeCamp article was useful in filling in the gaps that I found in Nicolas’ article, especially when it came to what to do after you had a package. Finally, having the Python 3.8 library documentation helped me fill in the last bit of the knowledge that I needed to complete the setup process. Together, with just a dash of experimentation thrown in for good measure, I was confident that I could create a Python package. Even if that effort took a while.
Creating a New Setup.py
While I have had a local setup.py file on my machine for months, it was always something that I was toying around with, nothing concrete. As such, I found that it was more efficient to start from scratch and create a new setup.py file based mostly on Nicolas’s article. I do not have any issues with his use of mainline function calls, such as the ones that he uses to read files from the directory, but I prefer things in functions. From my perspective, it just helps me to keep things readable. I did like the way he was organizing some of the values at the start of the module and decided to follow that approach. Furthermore, I decided that it was more readable to have every value in variables, instead of being somewhat hidden in the call to the setup function, so I also made that change.
The Most Important Parts of Setup
For me, the four most important parts of any setup are: name of the package, version of the package, minimum required platform, and a declaration of any dependencies. Others can disagree with me on whether these things are the most important parts of any setup script, but I believe I have a strong argument in my favor. It is a simple argument: without these four parts, the rest of the setup script is useless. Any documentation without something to document is pointless. Similarly, any declaration of what needs to be included in the package and how to access it is useless without that base declaration. At least in my mind, those four properties are always the foundation of any installation script.
Using the setup.py module from the article Nicolas wrote as a guide, I created a very basic module:

    import runpy

    from setuptools import setup

    PACKAGE_NAME = "PyMarkdown"
    MINIMUM_PYTHON_VERSION = "3.8.0"


    def parse_requirements():
        # Keep every line that is neither blank nor a comment.
        lineiter = (line.strip() for line in open("install-requirements.txt", "r"))
        return [line for line in lineiter if line and not line.startswith("#")]


    def get_semantic_version():
        # Execute version.py and pull out its __version__ value.
        version_meta = runpy.run_path("./version.py")
        return version_meta["__version__"]


    # Assigned after the helper is defined; calling it any earlier raises a NameError.
    SEMANTIC_VERSION = get_semantic_version()

    setup(
        name=PACKAGE_NAME,
        version=SEMANTIC_VERSION,
        python_requires=">=" + MINIMUM_PYTHON_VERSION,
        install_requires=parse_requirements(),
    )

It was not much, but it was a good start. Both the package name and the minimum Python
version required are hardwired in as they are almost never going to change. The function
get_semantic_version was written to encompass the code from the article to fetch the
version number, and the
parse_requirements function was written to encompass the
requirements for the project.
Since I decided to specify the installation requirements for the project in the file install-requirements.txt, I added a very simple version of this file with a single entry.
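As a minimal sketch of the same parsing idea, the version below reads the requirements file inside a with block so that the file handle is closed as soon as parsing is done. The helper name parse_requirement_lines is hypothetical and exists only so the filtering logic can be tried on a plain list of lines; it is not part of the project.

```python
def parse_requirement_lines(lines):
    """Return requirement specifiers, skipping blank lines and comments."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def parse_requirements(path="install-requirements.txt"):
    """Read the requirements file, closing it once parsing has finished."""
    with open(path, "r", encoding="utf-8") as requirements_file:
        return parse_requirement_lines(requirements_file)
```

Splitting the filtering from the file access is a design choice for illustration: the first function can be exercised without any file on disk.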
Moving Version Information Into A Single Module
It took me a bit to warm up to this, but after reading the relevant PEP, it just made sense. If there is any reason to know the exact version of a Python library, the __version__ field applied to the library name should contain the definitive version for that library. Following this PEP was an easy decision but required some rearrangement of code in the project.
Previously, the only place where the version information was kept was in the __version_number field of the PyMarkdownLint class. While I debated an approach that would leverage that existing code, the simplicity of having a single version.py file just made more sense to me. With the get_semantic_version function already present in the setup.py module, as detailed in the last section, I added the following code to the PyMarkdownLint class to reference that same file:

    @staticmethod
    def __get_semantic_version():
        file_path = __file__
        assert os.path.isabs(file_path)
        file_path = file_path.replace(os.sep, "/")
        last_index = file_path.rindex("/")
        second_last_index = file_path.rindex("/", 0, last_index)
        file_path = file_path[0 : second_last_index + 1] + "version.py"
        version_meta = runpy.run_path(file_path)
        return version_meta["__version__"]

This code is effectively the same code as in the get_semantic_version function of the setup.py module. The only changes present were to deduce the executable path and to determine the relative location of the version.py file from where that executable is located.
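The string slicing used to walk up from the module's path can also be expressed with pathlib. The following is only a sketch under that assumption, not the code the project uses; the function names read_version and version_file_for are hypothetical.

```python
import runpy
from pathlib import Path


def read_version(version_file):
    """Execute a version module and return its __version__ value."""
    version_meta = runpy.run_path(str(version_file))
    return version_meta["__version__"]


def version_file_for(module_file):
    """Locate version.py one directory above the given module's directory."""
    return Path(module_file).resolve().parent.parent / "version.py"
```

Inside a module, calling read_version(version_file_for(__file__)) would mirror the two rindex calls: the first parent drops the module's file name and the second parent moves up one directory.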
After all this work, the only thing that was needed was a new version.py file:

    """
    Library version information.
    """
    __version__ = "0.5.0"

and a small change to the test_markdown_with_dash_dash_version test function to fetch the version from the version.py file.
With those basics out of the way, it was time to add the documentation basics to the setup.py module:

    def load_readme_file():
        with open("README.md", "r") as readme_file:
            return readme_file.read()


    AUTHOR = "Jack De Winter"
    AUTHOR_EMAIL = "email@example.com"
    ONE_LINE_DESCRIPTION = "A GitHub Flavored Markdown compliant Markdown linter."
    LONG_DESCRIPTION = load_readme_file()
    LONG_DESCRIPTION_CONTENT_TYPE = "text/markdown"
    KEYWORDS = ["markdown", "linter", "markdown linter"]
    PROJECT_CLASSIFIERS = [
        "Development Status :: 4 - Beta",
        "Programming Language :: Python :: 3.7",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ]

    setup(
        ...
        author=AUTHOR,
        author_email=AUTHOR_EMAIL,
        description=ONE_LINE_DESCRIPTION,
        long_description=LONG_DESCRIPTION,
        long_description_content_type=LONG_DESCRIPTION_CONTENT_TYPE,
        keywords=KEYWORDS,
        classifiers=PROJECT_CLASSIFIERS,
        ...

Most of these fields are self-explanatory and are simple string objects or lists of string objects. The three fields that stand apart from that are the LONG_DESCRIPTION field, the LONG_DESCRIPTION_CONTENT_TYPE field, and the PROJECT_CLASSIFIERS field. The LONG_DESCRIPTION_CONTENT_TYPE field is the easiest of the three as it assumes that the README file for the project will always be README.md. As such, the MIME content type for the long description will always be text/markdown. For my projects, I feel that it is a good assumption to make, so that was an easy one to get out of the way. Then, to ensure that the LONG_DESCRIPTION field is always up to date, the load_readme_file function reads the contents of the README.md file and places them into the LONG_DESCRIPTION field. For me, these fields just make sense as I can maintain the package description of the project and the GitHub description of the project in one place.
Finding the right values for the PROJECT_CLASSIFIERS field was the task that I had the hardest time with out of the three fields. With a seemingly endless list of available classifiers, it was hard to narrow down the classifiers to a small set. While I am not confident that I have the right set of classifiers for the project, I believe I have a good set to start with.
Looking at that work, the one thing that I needed to do to wrap it up was to
make sure that the
README.md file only contained information I wanted someone to
see when they were having their initial look at the project. While I do not want
to hide the project’s issues list, I did not want it to be the first thing people
saw. As such, I moved it over into a new file of its own.
Rounding Out The Setup Properties
According to my research, the only two other fields that I needed to add were the scripts field and the packages field. The packages field was the easy one to define out of those two: I simply needed to list all the packages for the project.1 While both examples use the setuptools module and its find_packages function, I wanted to maintain fine-grained control over the packages. As such, I specified each package name explicitly:

    setup(
        ...
        scripts=ensure_scripts(["scripts/pymarkdown"]),
        packages=[
            "pymarkdown",
            "pymarkdown.extensions",
            "pymarkdown.plugins",
            "pymarkdown.resources",
        ],
    )

For the specification on how to start the PyMarkdown application, it took me a while to
decide on an action to use for that. During my research phase, I had three possibilities
for how to interact with the project itself:
There was barely any information on
entry_points and how to use them, so I decided to
not use those unless I found enough information to warrant changing to them. Looking to
my third reference source, the Python library documentation, I found an article that covered the scripts approach. As that is what the standard libraries used, I decided that was the best way forward for this project.
Looking at the example that Nicolas provided in his article, I quickly created my own script:

    #!/usr/bin/env python
    from pymarkdown import PyMarkdownLint
    PyMarkdownLint.main()

but came across one glaring problem right away. That script would work well on Linux systems, but my development environment is a Windows machine. As I use the PyLint scanner on all my Python projects, I decided to look at how they solved this problem, and used their ensure_scripts function verbatim2:

    from distutils import util  # provides get_platform()


    def ensure_scripts(linux_scripts):
        """
        Creates the proper script names required for each platform (taken from PyLint)
        """
        if util.get_platform()[:3] == "win":
            return linux_scripts + [script + ".bat" for script in linux_scripts]
        return linux_scripts

It is wonderful in its simplicity! If the first three characters of the platform are win, then the function assumes that the list of scripts must also include scripts that will work on a Windows machine. It accomplishes this by adding another list of scripts to the list, this new list being comprised of every element of the original list, but with a .bat appended to the end. With that, the last thing was to copy the .bat batch file format over from PyLint:

    @echo off
    rem Use python to execute the python script having the same name as this batch
    rem file, but without any extension, located in the same directory as this
    rem batch file
    "%~dpn0" %*

I was not sure if that batch script was going to work, but if it was good enough for PyLint, I figured it was a good enough starting place for me.
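The distutils module was removed from the standard library in Python 3.12, so the platform check needs a different source on newer interpreters. As a hedged sketch and not the PyLint code, the same decision can be made from sys.platform, which is the string win32 on Windows. The optional platform parameter is hypothetical and only exists so the Windows branch can be tried on any machine.

```python
import sys


def ensure_scripts(linux_scripts, platform=None):
    """Add a .bat companion for every script when targeting Windows."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return linux_scripts + [script + ".bat" for script in linux_scripts]
    return linux_scripts
```

Calling ensure_scripts(["scripts/pymarkdown"], platform="win32") returns both the original entry and scripts/pymarkdown.bat, while any other platform value returns the list untouched.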
Two simple things were left before my first attempt to compile my first Python package.
The first thing was to add a simple LICENSE.txt file to the project to establish the use of the project. The other was to add an __init__.py file to the pymarkdown directory to make sure that the base of the project was considered a module for setup to pick up.
With those two things addressed and out of the way, it was time to compile the setup for the project!
The Fun Begins: Getting Packaging To Work
To start compiling the setup, I included the twine and setuptools packages into my development environment using pipenv install twine setuptools. Once that was complete, I added the following package.cmd script to the repository to make things easier:

    rmdir /s /q dist
    rmdir /s /q build
    rmdir /s /q PyMarkdown.egg-info
    pipenv run python setup.py sdist bdist_wheel
    pipenv run twine check dist/*

It was nothing fancy, but it allowed me to repeatedly repackage the project to test
any changes in an efficient manner. Basically, it removes any signs of a previous
build before running the
setup.py script and then the
twine script. While it
is not as fancy as the Gradle scripts I have for Java projects at work, I found that it
is uncomplicated and works very well. I purposefully did not add any error handling to
the batch script as I wanted to make sure I saw all the information that was reported.
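For anyone not on Windows, the cleanup half of that script can be approximated in Python. This is only a sketch of the idea, assuming the same three directory names; the function name clean_build_artifacts is hypothetical and this helper is not part of the project.

```python
import shutil
from pathlib import Path

# Directories left behind by a previous packaging run.
BUILD_ARTIFACTS = ("dist", "build", "PyMarkdown.egg-info")


def clean_build_artifacts(project_root="."):
    """Delete leftovers from a previous build and return the names removed."""
    removed = []
    for name in BUILD_ARTIFACTS:
        target = Path(project_root) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```

Returning the list of removed names keeps the helper quiet by default while still making it easy to print what was cleaned.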
To assist in testing those changes, I created a new project pymtest at the same level as the PyMarkdown project and left it almost empty for now. I created that project to be my test installation environment, useful once I had a package to install. For now, I just wanted to get it ready for later. Thus, I created a simple refresh_package.cmd script with these contents:

    pipenv uninstall PyMarkdown
    pipenv install ..\pymarkdown\dist\PyMarkdown-0.5.0.tar.gz

Simply, uninstall any existing PyMarkdown package and install a new one right from the dist directory of the PyMarkdown project.
Now on to the real work: debugging the install script.
Pass 1: Getting The Version Right
Running the package.cmd script, everything worked fine, and I had a new package to test! Switching over to my test project, I executed the refresh_package.cmd script… and waited. Looking at the output, the
uninstall command was completing
in under a second, but the install command was taking its time on the Resolving
phase of installing the package. It was agonizing!
But when it was done, it displayed the following error:

    ERROR: Command errored out with exit status 1:
    ...
    FileNotFoundError: [Errno 2] No such file or directory: '..pip-req-build-mfg5j1bu\\version.py'
    ----------------------------------------
    ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I tried a couple of different things with no luck before I opened the
PyMarkdown-0.5.0.tar.gz archive file from the project’s
dist directory and examined
its contents. When I did that, I noticed that there was no version.py file anywhere in the archive.
At that point, I spent about an hour or so trying to figure out how to get that version.py file into the archive at the right place before deciding to go with a more intuitive approach. After looking at how the files were installed after the install command was completed, it was obvious that my current approach would necessitate copying the version.py file to a location outside of the installed pymarkdown directory. So, instead of trying to figure out how to do that “complicated” action, I decided on the “simple” action to move the version.py file into the pymarkdown directory.
With that decision made, I rewrote the get_semantic_version function in the setup.py module as follows:

    def get_semantic_version():
        version_meta = runpy.run_path("./pymarkdown/version.py")
        return version_meta["__version__"]

I also rewrote the
__get_semantic_version function in the
main.py module as follows:

    def __get_semantic_version():
        file_path = __file__
        if not os.path.isabs(file_path):
            assert False
        file_path = file_path.replace(os.sep, "/")
        last_index = file_path.rindex("/")
        file_path = file_path[0 : last_index + 1] + "version.py"
        version_meta = runpy.run_path(file_path)
        return version_meta["__version__"]

With the version.py file moved into the pymarkdown directory, and with both references to that file now looking for it in the new location, that error was now resolved.
Pass 2: File Name Casing Matters
After packaging the project again, I ran the
refresh_package.cmd script and
was now greeted with this error:
FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
I examined the directory structure of the archive a good four or five times without any
ideas coming to mind. I even looked at the Python install pages to see if I could find
anything. But all I could find was a list of the files to distribute. This included other types of readme files, but not specifically the README.md file. Double checking the project that Nicolas set up, I saw that he was using a README.md file as a source for his long description without any apparent extra setup needed to include
that file. So, I figured it must be something else.
That is when it hit me. Windows has many uses as an operating system3, but one of the things I do not like about it is the case-insensitivity of the file system. In this case, I had called the readme file readme.md instead of README.md. Simply correcting the case of the file name resolved this issue.
After a simple case of “cannot see the forest because of the trees”, it was on to the next issue.
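A casing mismatch like this one is easy to miss on Windows because opening the file succeeds under either spelling. One way to catch it, sketched here with a hypothetical helper name, is to compare against the names that the directory listing reports, since those keep the casing that is stored on disk.

```python
import os


def exists_with_exact_case(directory, file_name):
    """Return True only if the directory lists file_name with identical casing."""
    return file_name in os.listdir(directory)
```

A check such as exists_with_exact_case(".", "README.md") run before packaging would have flagged the readme.md spelling even on a case-insensitive file system.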
Pass 3: Making Sure The Right Files Are Included
This time, when I executed the
refresh_package.cmd script after repackaging the project,
I was greeted with this error:
FileNotFoundError: [Errno 2] No such file or directory: 'install-requirements.txt'
With some newfound experience under my belt, I immediately opened the archive and
found that the
install-requirements.txt file was not in the archive. Thankfully, in
looking for solutions for the last error, I came across a solution to include data
files into the setup process using a
MANIFEST.in file. Located in the same section where
I found the information detailing which files were
automatically included in the setup archive,
there is information on the
MANIFEST.in file near the end of that section.
Following those instructions, I was quickly able to create a new MANIFEST.in file that included the install-requirements.txt file.
After a quick repackaging and reinstalling, this error was indeed solved.
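Opening the archive by hand after every packaging run gets old quickly. As a sketch of how that inspection could be automated, the hypothetical helper below uses the standard tarfile module to report which expected files are absent from a source distribution. It matches on the trailing part of each member name, so a file found in any directory of the archive counts as present.

```python
import tarfile


def missing_from_sdist(archive_path, expected_names):
    """Return the expected file names that no archive member ends with."""
    with tarfile.open(archive_path, "r:gz") as archive:
        member_names = archive.getnames()
    return [
        name
        for name in expected_names
        if not any(
            member == name or member.endswith("/" + name)
            for member in member_names
        )
    ]
```

Run against the archive in the dist directory with a list such as README.md, install-requirements.txt, and version.py, an empty result would mean all three made it into the package.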
Pass 4: Lather, Rinse, Repeat
While that file was now present in the archive, the new error was complaining about a missing directory:
error: package directory 'pymarkdown\resources' does not exist
The main reason for this directory is to host the entities.json file. That file contains each of the named entities, with the corresponding Unicode character that each entity maps to. I tried adding an __init__.py file and other such workarounds to get the file included, but nothing worked. Convinced that I had tried the other approaches, I followed the same approach as the last section, and added it to the MANIFEST.in file.
I do not want to make it sound that I dislike the
MANIFEST.in approach to including files
in the setup archive. I don’t. But to me, it feels like that file is the last option to
include files, with all other options having been exhausted. For me, that is my own sniff test
for whether the use of the
MANIFEST.in file is warranted. For example, I would rather
figure out that I need to change the
readme.md file into the
README.md file before I
thought about adding it to the
MANIFEST.in file. In this case, I was convinced that
there was no other way to include the file, and as such, I had passed my own sniff test.
And It Was Done
With that change made, I was now seeing the refresh of the packaging complete without any errors:

    Installing ..\pymarkdown\dist\PyMarkdown-0.5.0.tar.gz...
    Adding PyMarkdown to Pipfile's [packages]...
    Installation Succeeded
    Pipfile.lock (db4242) out of date, updating to (29513d)...
    Locking [dev-packages] dependencies...
    Locking [packages] dependencies...
    Locking...Building requirements...
    Resolving dependencies...
    Success!
    Updated Pipfile.lock (29513d)!
    Installing dependencies from Pipfile.lock (29513d)...
    ================================ 1/1 - 00:00:05
    To activate this project's virtualenv, run pipenv shell.
    Alternatively, run a command inside the virtualenv with pipenv run.

And Now, Verifying The Usage
With everything looking good in the packaging and installation, the next step was
to test the usage of the newly installed library. With optimism in my heart, I went to
execute my first test command,
pipenv run pymarkdown --help, and I waited. After a good
couple of minutes, I killed the script, checked things again, and everything seemed fine.
It seemed like I was not done debugging the setup process quite yet.
Pass 1: Proper Script Files
Having “imported” the script files from the PyLint project, I hoped they would work
out of the box, but assumed that I would have to do some work to get them operational.
I liked the idea of calling the pymarkdown script from the pymarkdown.bat script, but after 45 minutes and approximately 4 attempts at rewriting the scripts, I gave up. Just like before, I decided to go with simplicity for both files, the pymarkdown script:

    #!/usr/bin/env python
    from pymarkdown import PyMarkdownLint
    PyMarkdownLint().main()

and the pymarkdown.bat script:

    python -c "from pymarkdown import PyMarkdownLint; PyMarkdownLint().main()" %*

Instead of having one script call the other, I opted for matching the contents of both scripts as closely as possible. In the shell version, the #!/usr/bin/env python line at the start of the script takes care of invoking Python and Python itself takes care of the command line arguments. In the batch script version, I needed to explicitly call Python with the -c argument to tell Python to execute the next argument as a Python script. Finally, the %* at the end of that line causes any arguments passed to the batch script to be passed to the Python program specified with the -c argument.
After a couple of tries, mostly due to small typing mistakes, when I executed the command pipenv run pymarkdown --help, I was welcomed with the help documentation
for the project. Success!
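To see how arguments that follow the -c program end up in sys.argv, the sketch below launches a child interpreter with the standard subprocess module. The helper name run_inline is hypothetical and is used only for this illustration.

```python
import subprocess
import sys


def run_inline(code, *args):
    """Run code through python -c and return what it printed."""
    completed = subprocess.run(
        [sys.executable, "-c", code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()
```

Calling run_inline("import sys; print(sys.argv[1:])", "--help") returns the text ['--help'], because everything after the inline program is handed to that program instead of being interpreted by Python itself. That is the same hand-off the batch script relies on.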
Pass 2: Init Files
With the batch
script issue in the last section resolved, the execution of the test command
pipenv run pymarkdown --help now yielded this error:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: cannot import name 'PyMarkdownLint' from 'pymarkdown' (C:\Users\jackd\.virtualenvs\pymtest-W-bOTTm6\lib\site-packages\pymarkdown\__init__.py)

Perhaps it is my knowledge of other programming languages, but I favor direct imports
in the files that need them over the use of
__init__.py modules. For me, it just
seems like overkill in 98% of the cases, leading to a hard-to-understand view of
dependencies between files. In the case of creating a setup package, this
turned out to be one of the 2% cases that I had not come across yet.
But, seeing as this was an obvious request for a proper
__init__.py module, I added
one to the
pymarkdown package with the contents:
from pymarkdown.main import PyMarkdownLint
I do not use it in any of the other modules for the project, but it is there for the
setup.py module and any others that need it. As such, I can stay true to how I use
import statements while providing the information that the setup scripts need.
For me, that is a win-win.
Pass 3: Including Data Files
With the pipenv run pymarkdown --help command now running without any issues, I wanted to include some more complex examples to test in the refresh_package.cmd script. To that end, I added the following lines to the end of that file:

    pipenv run pymarkdown plugins list
    pipenv run pymarkdown plugins info md048
    pipenv run pymarkdown plugins info md047
    pipenv run pymarkdown scan ..\blog-content\website\content\articles

Going through the reinstall process with the new version of this script, the installation
and the first three commands all went off without any issues. However, when it got to the
scan command, the following error was emitted:
BadTokenizationError encountered while initializing tokenizer: Named character entity map file '..\lib\site-packages\pymarkdown\resources\entities.json' was not loaded ([Errno 2] No such file or directory: '..\\lib\\site-packages\\pymarkdown\\resources\\entities.json').
Going back to the useful
files to distribute section,
I quickly noticed that one of the items in the list was labelled
Installing Additional Files.
This seemed to fit the situation that I had before me exactly. Reading the information
on the other side of that link, I knew what to do within a couple of minutes. Within a
couple more minutes, I had this change coded up and inserted at the end of the setup function call in the setup.py module:

    setup(
        ...
        data_files=[('Lib/site-packages/pymarkdown/resources', ['pymarkdown/resources/entities.json'])]
    )

Going through the entire process again, everything worked fine, and I was now done with the test scenarios I had in mind. I tried a handful of additional scenarios to make sure I had them all covered, and each scenario worked as I expected it to. I had a fully functioning install script!
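The Lib/site-packages prefix in that data_files entry matches the layout of a Windows installation, so it may not land in the right place elsewhere. A possible alternative, shown here only as a sketch and not as what the project does, is the package_data argument that setuptools provides, which installs data files alongside the package's own modules. The SETUP_KWARGS dictionary and the entities_file helper are hypothetical names for illustration.

```python
from pathlib import Path

# Hypothetical arguments for setup(): ship the JSON file as package data so
# that it is installed inside the pymarkdown package directory.
SETUP_KWARGS = {
    "packages": ["pymarkdown", "pymarkdown.extensions", "pymarkdown.plugins"],
    "package_data": {"pymarkdown": ["resources/entities.json"]},
}


def entities_file(package_module_file):
    """Build the path to entities.json relative to a module in the package."""
    return Path(package_module_file).parent / "resources" / "entities.json"
```

With this layout, a module inside the package could call entities_file(__file__) to find the file wherever the package ends up being installed.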
Pass 3.1: Cleanup
This was not really a pass on its own, but a little bit of cleanup that I wanted to
do. While looking at various other Python setup articles and library packages, I
decided to add three more arguments to the
setup function call:

    setup(
        ...
        maintainer=AUTHOR,
        maintainer_email=AUTHOR_EMAIL,
        url=PROJECT_URL,
        ...

Since I am both the author and the maintainer, it just made sense to set the maintainer fields to the same values as with the author fields. I also wanted people to be able to
get more information on the project, so setting the
url field also made sense.
What Was My Experience So Far?
Based on my experience with other languages, creating an installation package for the project in Python was a walk in the park. There was no fancy extra packaging required, everything was written in Python. While it took me about four hours to make sure everything was working properly, I would estimate that a similar installer for C# or Java would easily take at least eight hours to get into a similarly finished form. For me, that is a win.
In general, I am very pleased with how this work went on getting the setup code into proper shape. There were some very good examples that I could lean on to get my code working, and the starting points were all well-defined. That made the distance I needed to travel from sample code to working code very short, which was very pleasant for once. During the creation of the setup script, I did notice a couple of extra things that I want to clean up before the initial release. But like before, they are all small and reasonable, so I am confident I can make short work of them.
What is Next?
With the setup packaging complete for now, I move on to simplifying the output from some of the commands and starting to update the rules for the initial release.
I almost feel that a “duh?!” would be warranted here, but do not feel that it is appropriate.
This comment is not meant to start a religious war. I firmly believe that there are many different jobs that need to be done, with some tools being the obvious choice for a given job. There are other jobs where the choice of tools depends more on personal preference combined with the job at hand. For myself, operating systems are just that: tools.