Summary
In my last article, I talked about resolving the remaining Priority 1 items from the issues list. In this article, I talk about how I worked through the issues in creating an installable package for the project.
Introduction
Having invested a lot of time getting the PyMarkdown project to a place where I feel confident in creating an initial release of the project, it was now time for me to create that release. To be honest, I was not sure what to expect out of the Python setup process. Creating releases for other languages is usually done as an add-on to the language, not as part of the core language, as it is with Python. As such, I was genuinely interested in how the process would differ between Python and the other languages I have written installers for.
Like everything else in this project, this was going to be a learning experience, and I was eager to get underway!
What Is the Audience for This Article?
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please go to this project’s GitHub repository and consult the commits between 01 Apr 2021 and 03 Apr 2021.
Where To Start?
While the changes that I needed to perform on the project to get it from its then state to a packaged state were small, the path to get there was anything but short. Having done my usual research, I ended up finding three sources that I thought would be helpful to my effort:
I liked Nicolas’ article on creating Python packages because it was the first article that I found in my searches that seemed to lay everything out on the table. I felt that it provided me with a lot of useful information in a concrete, easy to digest form. While I did have a couple of issues with his examples, I do believe that they were because I was trying to adapt his example as I went and messed things up. The FreeCodeCamp article was useful in filling in the gaps that I found in Nicolas’ article, especially when it came to what to do after you had a package. Finally, having the Python 3.8 library documentation helped me fill in the last bit of the knowledge that I needed to complete the setup process. Together, with just a dash of experimentation thrown in for good measure, I was confident that I could create a Python package, even if that effort took a while.
Creating a New Setup.py
While I have had a local setup.py file on my machine for months, it was always something that I was toying around with, nothing concrete. As such, I found that it was more efficient to start from scratch and create a new setup.py file based mostly on Nicolas’s article. I do not have any issues with his use of mainline function calls, such as the ones that he uses to read the readme.md file from the directory, but I prefer things in functions. From my perspective, it just helps me to keep things readable. I did like the way he was organizing some of the values at the start of the module and decided to follow that approach. Furthermore, I decided that it was more readable to have every value in variables, instead of being somewhat hidden in the call to the setup function, so I also made that change.
The Most Important Parts of Setup
For me, the four most important parts of any setup are: name of the package, version of the package, minimum required platform, and a declaration of any dependencies. Others can disagree with me on whether these things are the most important parts of any setup script, but I believe I have a strong argument in my favor. It is a simple argument: without these four parts, the rest of the setup script is useless. Any documentation without something to document is pointless. Similarly, any declaration of what needs to be included in the package and how to access it is useless without that base declaration. At least in my mind, those four properties are always the foundation of any installation script.
Using the setup.py module from the article Nicolas wrote as a good set of crib notes, I created a very basic module:
import runpy

from setuptools import setup

PACKAGE_NAME = "PyMarkdown"
MINIMUM_PYTHON_VERSION = "3.8.0"


def parse_requirements():
    # One requirement per line; blank lines and comment lines are skipped.
    lineiter = (line.strip() for line in open("install-requirements.txt", "r"))
    return [line for line in lineiter if line and not line.startswith("#")]


def get_semantic_version():
    # Execute version.py on its own and read its __version__ value.
    version_meta = runpy.run_path("./version.py")
    return version_meta["__version__"]


# Assigned after the function is defined so the call does not raise a NameError.
SEMANTIC_VERSION = get_semantic_version()

setup(
    name=PACKAGE_NAME,
    version=SEMANTIC_VERSION,
    python_requires=">=" + MINIMUM_PYTHON_VERSION,
    install_requires=parse_requirements(),
)
It was not much, but it was a good start. Both the package name and the minimum Python version required are hardwired in as they are almost never going to change. The function get_semantic_version was written to encompass the code from the article to fetch the version number, and the parse_requirements function was written to read the requirements for the project.
Since I decided to specify the installation requirements for the project in the file install-requirements.txt, I added a very simple version of this file with a single line present:
Columnar
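The requirements parsing described above can be sketched in a form that closes the file explicitly and is easy to exercise on its own. This is only an illustration: the optional path parameter is an assumption added here so the function can be pointed at any file, and it is not part of the project's setup.py.

```python
from pathlib import Path


def parse_requirements(path="install-requirements.txt"):
    """Return requirement strings, skipping blank lines and # comment lines.

    The path parameter is an assumption made for this sketch; the version in
    the article always reads install-requirements.txt.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```

With the single-line file shown above, a call to this function would produce a list holding just that one requirement name.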
Moving Version Information Into A Single Module
It took me a bit to warm up to this, but after reading PEP 396, it just made sense. If there is any reason to know the exact version of a Python library, the __version__ field applied to the library name should contain the definitive version for that library. Following this PEP required some rearrangement of code in the project.
Previously, the only place where the version information was kept was in the __version_number field of the PyMarkdownLint class. While I debated an approach that would leverage that existing code, the simplicity of having a single version.py file just made more sense to me. With the get_semantic_version function already present in the setup.py module, as detailed in the last section, I added the following code to the PyMarkdownLint class to reference that same file:
@staticmethod
def __get_semantic_version():
    # Relies on "import os" and "import runpy" at the top of the module.
    file_path = __file__
    assert os.path.isabs(file_path)
    file_path = file_path.replace(os.sep, "/")
    # Drop the module name and its directory to land one level up.
    last_index = file_path.rindex("/")
    second_last_index = file_path.rindex("/", 0, last_index)
    file_path = file_path[0 : second_last_index + 1] + "version.py"
    version_meta = runpy.run_path(file_path)
    return version_meta["__version__"]
This code is effectively the same code as in the get_semantic_version function of the setup.py module. The only changes present were to deduce the executable path from the __file__ variable and to determine the relative location of the version.py file from where that executable is located.
After all this work, the only thing that was needed was a new version.py module:
"""
Library version information.
"""
__version__ = "0.5.0"
and a small change to the test_markdown_with_dash_dash_version test function to fetch the version from the version.py module.
Adding Documentation
With those basics out of the way, it was time to add the documentation basics to the setup.py module:
def load_readme_file():
    with open("README.md", "r") as readme_file:
        return readme_file.read()


AUTHOR = "Jack De Winter"
AUTHOR_EMAIL = "jack.de.winter@outlook.com"
ONE_LINE_DESCRIPTION = "A GitHub Flavored Markdown compliant Markdown linter."
LONG_DESCRIPTION = load_readme_file()
LONG_DESCRIPTION_CONTENT_TYPE = "text/markdown"
KEYWORDS = ["markdown", "linter", "markdown linter"]
PROJECT_CLASSIFIERS = [
    "Development Status :: 4 - Beta",
    # Matches the MINIMUM_PYTHON_VERSION of 3.8.0 declared earlier.
    "Programming Language :: Python :: 3.8",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]

setup(
    ...
    author=AUTHOR,
    author_email=AUTHOR_EMAIL,
    description=ONE_LINE_DESCRIPTION,
    long_description=LONG_DESCRIPTION,
    long_description_content_type=LONG_DESCRIPTION_CONTENT_TYPE,
    keywords=KEYWORDS,
    classifiers=PROJECT_CLASSIFIERS,
    ...
)
Most of these fields are self-explanatory and are simple string objects or lists of string objects. The three fields that stand apart from that are the LONG_DESCRIPTION field, the LONG_DESCRIPTION_CONTENT_TYPE field, and the PROJECT_CLASSIFIERS field.
The LONG_DESCRIPTION_CONTENT_TYPE field is the easiest of the three as it assumes that the README file for the project will always be README.md. As such, the MIME content type for the long description will always be text/markdown. For my projects, I feel that it is a good assumption to make, so that was an easy one to get out of the way. Then, to ensure that the LONG_DESCRIPTION field is always up to date, the load_readme_file function reads the contents of the README.md file and places them into the LONG_DESCRIPTION field. For me, these fields just make sense as I can maintain the package description of the project and the GitHub description of the project in one place.
Finding the right values for the PROJECT_CLASSIFIERS field was the task that I had the hardest time with out of the three fields. With a seemingly endless page of available classifiers, it was hard to narrow down the classifiers to a small set. While I am not certain that I have the right set of classifiers for the project, I believe I have a good set to start with.
Looking at that work, the one thing that I needed to do to wrap it up was to make sure that the README.md file only contained information I wanted someone to see when they were having their initial look at the project. While I do not want to hide the project’s issues list, I did not want it to be the first thing people saw. As such, I moved it over into the new issues.md file.
Rounding Out The Setup Properties
According to my research, the only two other fields that I needed to add were the scripts field and the packages field. The packages field was the easy one to define out of those two: I simply needed to list all the packages for the project.1 While both examples use the setuptools module and its find_packages function, I wanted to maintain fine-grained control over the packages. As such, I specified each package name separately.
setup(
    ...
    scripts=ensure_scripts(["scripts/pymarkdown"]),
    packages=[
        "pymarkdown",
        "pymarkdown.extensions",
        "pymarkdown.plugins",
        "pymarkdown.resources",
    ],
)
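One cost of listing packages by hand is that a newly added package can be forgotten. As a rough sketch, the helper below walks a source tree and reports every directory that contains an __init__.py file, which is the same convention that find_packages relies on. The list_packages name and its parameters are hypothetical and exist only for this illustration; the result could be compared against the hand-maintained list as a sanity check.

```python
import os


def list_packages(root, top_level):
    """Return dotted names for top_level and each nested directory with an __init__.py.

    Hypothetical helper for illustration; it is not part of setuptools or
    of the project's setup.py.
    """
    found = []
    base = os.path.join(root, top_level)
    for current, dirs, files in os.walk(base):
        if "__init__.py" not in files:
            # A directory without __init__.py is not a regular package,
            # so nothing beneath it is reported either.
            dirs[:] = []
            continue
        relative = os.path.relpath(current, root)
        found.append(relative.replace(os.sep, "."))
    return sorted(found)
```

A directory that holds only data files and no __init__.py would be absent from the result, which is worth knowing before naming it in the packages list.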
For the specification on how to start the PyMarkdown application, it took me a while to decide on which approach to use. During my research phase, I had three possibilities for how to interact with the project itself: py_modules, scripts, and entry_points. There was barely any information on entry_points and how to use them, so I decided to not use those unless I found enough information to warrant changing to them. Looking to my third reference source, the Python library documentation, I found this article on setup scripts. As the scripts field is what that documentation used, I decided that it was the best way for this project.
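For readers weighing the same three options, here is a minimal sketch of what the entry_points alternative tends to look like. Each console_scripts entry is a string of the form "command = module:callable", and setuptools generates a launcher for that command on each platform. The run_cli target named below is purely hypothetical: the article does not show a module-level function with that name, so treat the whole entry as an assumption.

```python
# Hypothetical: assumes pymarkdown/main.py exposes a module-level function
# named run_cli() that wraps PyMarkdownLint().main(). The article shows no
# such function, so this target is an illustration only.
ENTRY_POINTS = {
    "console_scripts": [
        "pymarkdown = pymarkdown.main:run_cli",
    ],
}


def describe_console_script(spec):
    """Split a "command = module:callable" spec into its three parts."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module_name, callable_name = target.split(":", 1)
    return command, module_name, callable_name
```

A dictionary like ENTRY_POINTS would be passed to setup through its entry_points argument in place of the scripts argument.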
Looking at the example that Nicolas provided in his article, I quickly created my own script:
#!/usr/bin/env python
from pymarkdown import PyMarkdownLint
PyMarkdownLint.main()
but came across one glaring problem right away. That script would work well on Linux systems, but my development environment is a Windows machine. As I use the PyLint scanner on all my Python projects, I decided to look at how they solved this problem, and used their ensure_scripts function verbatim2:
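# Import needed by the function; distutils was removed from the standard
# library in Python 3.12.
from distutils import util


def ensure_scripts(linux_scripts):
    """
    Creates the proper script names required for each platform (taken from PyLint)
    """
    if util.get_platform()[:3] == "win":
        return linux_scripts + [script + ".bat" for script in linux_scripts]
    return linux_scripts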
It is wonderful in its simplicity! If the first three characters of the platform are win, then the function assumes that the list of scripts must refer to scripts that will work on a Windows machine. It accomplishes this by adding another list of scripts to the list, this new list being comprised of every element of the original list, but with a .bat appended to the end. With that, the last thing was to copy the .bat batch file format over from PyLint:
@echo off
rem Use python to execute the python script having the same name as this batch
rem file, but without any extension, located in the same directory as this
rem batch file
"%~dpn0" %*
I was not sure if that batch script was going to work, but if it was good enough for PyLint, I figured it was a good enough starting place for me.
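To make the behavior of ensure_scripts easier to see on any machine, here is a small variant that takes the platform string as an optional argument. This is a sketch rather than the PyLint code: the platform parameter is an assumption added for the illustration, and sysconfig.get_platform() stands in for the distutils call.

```python
import sysconfig


def ensure_scripts(linux_scripts, platform=None):
    """Add a .bat twin for every script when the platform looks like Windows.

    The platform parameter is an assumption for this sketch; when omitted,
    the value reported by sysconfig for the running interpreter is used.
    """
    if platform is None:
        platform = sysconfig.get_platform()
    if platform.startswith("win"):
        return linux_scripts + [script + ".bat" for script in linux_scripts]
    return linux_scripts
```

Passing a Windows-style platform string such as "win-amd64" returns both the original script and its .bat twin, while a Linux-style string returns the list unchanged.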
Almost Finished
Two simple things were left before my first attempt to compile my first Python package. The first thing was to add a simple LICENSE.txt file to the project to establish the terms of use for the project. The other was to add an __init__.py module to the pymarkdown directory to make sure that the base of the project was considered a package for setup to pick up.
With those two things addressed and out of the way, it was time to compile the setup for the project!
The Fun Begins: Getting Packaging To Work
To start compiling the setup, I installed the twine and setuptools packages into my development environment using pipenv install twine setuptools. Once that was complete, I added the following package.cmd script to the repository to make things easier:
rmdir /s /q dist
rmdir /s /q build
rmdir /s /q PyMarkdown.egg-info
pipenv run python setup.py sdist bdist_wheel
pipenv run twine check dist/*
It was nothing fancy, but it allowed me to repeatedly repackage the project to test any changes in an efficient manner. Basically, it removes any signs of a previous build before running the setup.py script and then the twine script. While it is not as fancy as the Gradle scripts I have for Java projects at work, I found that it is uncomplicated and works very well. I purposefully did not add any error handling to the batch script as I wanted to make sure I saw all the information that was reported, unfiltered.
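For anyone who is not on Windows, the same clean-then-build flow can be sketched in Python. This is an illustration under a few assumptions: it is run from inside the project's virtual environment (so there is no pipenv run prefix), twine is invoked as a module, and, unlike the batch script, check=True stops at the first failing command. The function names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# The same three directories that package.cmd removes.
BUILD_ARTIFACTS = ("dist", "build", "PyMarkdown.egg-info")


def clean_build_artifacts(project_dir):
    """Remove leftovers from a previous build and return the names removed."""
    removed = []
    for name in BUILD_ARTIFACTS:
        target = Path(project_dir) / name
        if target.exists():
            shutil.rmtree(target)
            removed.append(name)
    return removed


def build_and_check(project_dir):
    """Build the sdist and wheel, then run twine's metadata check on them."""
    subprocess.run(
        [sys.executable, "setup.py", "sdist", "bdist_wheel"],
        cwd=project_dir,
        check=True,
    )
    dist_files = [str(path) for path in (Path(project_dir) / "dist").iterdir()]
    subprocess.run(
        [sys.executable, "-m", "twine", "check", *dist_files],
        cwd=project_dir,
        check=True,
    )
```

Calling clean_build_artifacts followed by build_and_check on the project directory mirrors the five lines of the batch script.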
To assist in testing those changes, I created a new project pymtest at the same level as the PyMarkdown project and left it almost empty for now. I created that project to be my test installation environment, useful once I had a package to install. For now, I just wanted to get it ready for later. Thus, I created a simple refresh_package.cmd script with these contents:
pipenv uninstall PyMarkdown
pipenv install ..\pymarkdown\dist\PyMarkdown-0.5.0.tar.gz
Simply put, it uninstalls any existing PyMarkdown package and installs a new one right from the dist directory of the PyMarkdown project.
Now on to the real work: debugging the install script.
Pass 1: Getting The Version Right
Executing the package.cmd script, everything worked fine, and I had a new package to test! Switching over to my test project, I executed the refresh_package.cmd batch script… and waited. Looking at the output, the uninstall command was completing in under a second, but the install command was taking its time on the Resolving phase of installing the package. It was agonizing!
But when it was done, it displayed the following error:
ERROR: Command errored out with exit status 1:
...
FileNotFoundError: [Errno 2] No such file or directory: '..pip-req-build-mfg5j1bu\\version.py'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
I tried a couple of different things with no luck before I opened the PyMarkdown-0.5.0.tar.gz archive file from the project’s dist directory and examined its contents. When I did that, I noticed that there was no version.py file anywhere in the archive.
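Opening the archive by hand works, but the same check is easy to script with the standard tarfile module. The sketch below lists every member of a .tar.gz source distribution whose file name matches the one being hunted for; the find_in_archive name is hypothetical and used only for this illustration.

```python
import posixpath
import tarfile


def find_in_archive(archive_path, file_name):
    """Return the member paths in a .tar.gz archive whose base name is file_name."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [
            name
            for name in archive.getnames()
            if posixpath.basename(name) == file_name
        ]
```

An empty list for version.py would confirm the same thing that examining the archive manually did.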
At that point, I spent about an hour or so trying to figure out how to get that version.py file into the archive at the right place before deciding to go with a more intuitive approach. After looking at how the files were installed after the install pymarkdown command was completed, it was obvious that my current approach would necessitate copying the version.py file into the pymarkdown directory. So, instead of trying to figure out how to do that “complicated” action, I decided on the “simple” action of moving the version.py file into the pymarkdown directory.
With that decision made, I rewrote the get_semantic_version function in the setup.py module as follows:
def get_semantic_version():
    version_meta = runpy.run_path("./pymarkdown/version.py")
    return version_meta["__version__"]

I also rewrote the __get_semantic_version function in the main.py module as follows:

def __get_semantic_version():
    file_path = __file__
    if not os.path.isabs(file_path):
        assert False
    file_path = file_path.replace(os.sep, "/")
    last_index = file_path.rindex("/")
    file_path = file_path[0 : last_index + 1] + "version.py"
    version_meta = runpy.run_path(file_path)
    return version_meta["__version__"]
With the version.py file moved into the pymarkdown directory, and with both references to that file now looking for it in the new location, that error was now resolved.
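Now that version.py sits beside the module that reads it, the string slicing can also be expressed with pathlib. The following is a sketch of that alternative, not the project's code: the read_sibling_version name and its module_file parameter are assumptions, with the parameter standing in for __file__ so the function can be tried against any directory.

```python
import runpy
from pathlib import Path


def read_sibling_version(module_file):
    """Read __version__ from the version.py located next to module_file.

    Hypothetical helper; inside main.py the argument would be __file__.
    """
    version_path = Path(module_file).resolve().parent / "version.py"
    version_meta = runpy.run_path(str(version_path))
    return version_meta["__version__"]
```

Because resolve() produces an absolute path, the separate check for an absolute __file__ is not needed in this form.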
Pass 2: File Name Casing Matters
After packaging the project again, I ran the refresh_package.cmd script and was now greeted with this error:
FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
I examined the directory structure of the archive a good four or five times without any ideas coming to mind. I even looked at the Python install pages to see if I could find anything. But all I could find was a list of the files to distribute. This included other types of readme files, but not specifically the README.md file. Double checking the project that Nicolas set up, I saw that he was using README.md as a source for his long description without any apparent extra setup needed to include that file. So, I figured it must be something else.
That is when it hit me. Windows has many uses as an operating system3, but one of the things I do not like about it is the case-insensitivity of the file system. In this case, I had called the readme file readme.md instead of README.md. Simply correcting the case of the file name resolved this issue.
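A case mismatch like this is hard to spot on a case-insensitive file system, because opening README.md succeeds even when the file on disk is readme.md. One way to catch it early, sketched below with a hypothetical helper name, is to compare against the names that the directory listing reports, since those keep the casing the file was created with.

```python
import os


def exists_with_exact_case(directory, file_name):
    """Return True only if directory holds an entry spelled exactly as file_name."""
    return file_name in os.listdir(directory)
```

A check like this could be run against the project root before packaging to confirm that README.md is spelled the way the setup script expects.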
After a simple case of “cannot see the forest because of the trees”, it was on to the next issue.
Pass 3: Making Sure The Right Files Are Included
This time, when I executed the refresh_package.cmd script after repackaging the project, I was greeted with this error:
FileNotFoundError: [Errno 2] No such file or directory: 'install-requirements.txt'
With some newfound experience under my belt, I immediately opened the archive and found that the install-requirements.txt file was not in the archive. Thankfully, in looking for solutions for the last error, I came across a solution to include data files into the setup process using a MANIFEST.in file. The information on the MANIFEST.in file is located near the end of the same section where I found the information detailing which files were automatically included in the setup archive.
Following those instructions, I was quickly able to create a new MANIFEST.in file with the following contents:
include install-requirements.txt
After a quick repackaging and reinstalling, this error was indeed solved.
Pass 4: Lather, Rinse, Repeat
While that file was now present in the archive, the new error was complaining about a missing directory:
error: package directory 'pymarkdown\resources' does not exist
The main reason for this directory is to host the entities.json file. That file contains each of the named entities, with the corresponding Unicode character that each entity maps to. I tried adding an __init__.py and other such workarounds to get the file included, but nothing worked. Convinced that I had exhausted the other approaches, I followed the same approach as the last section, and added it to the MANIFEST.in file:
include pymarkdown/resources/entities.json
I do not want to make it sound like I dislike the MANIFEST.in approach to including files in the setup archive. I don’t. But to me, it feels like that file is the last option to include files, with all other options having been exhausted. For me, that is my own sniff test for whether the use of the MANIFEST.in file is warranted. For example, I would rather figure out that I need to change the readme.md file into the README.md file before I thought about adding it to the MANIFEST.in file. In this case, I was convinced that there was no other way to include the file, and as such, I had passed my own sniff test.
And It Was Done
With that change made, I was now seeing the refresh of the packaging complete without any errors:
Installing ..\pymarkdown\dist\PyMarkdown-0.5.0.tar.gz...
Adding PyMarkdown to Pipfile's [packages]...
Installation Succeeded
Pipfile.lock (db4242) out of date, updating to (29513d)...
Locking [dev-packages] dependencies...
Locking [packages] dependencies...
Locking...Building requirements...
Resolving dependencies...
Success!
Updated Pipfile.lock (29513d)!
Installing dependencies from Pipfile.lock (29513d)...
================================ 1/1 - 00:00:05
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
And Now, Verifying The Usage
With everything looking good in the packaging and installation, the next step was to test the usage of the newly installed library. With optimism in my heart, I went to execute my first test command, pipenv run pymarkdown --help, and I waited. After a good couple of minutes, I killed the script, checked things again, and everything seemed fine. It seemed like I was not done debugging the setup process quite yet.
Pass 1: Proper Script Files
Having “imported” the script files from the PyLint project, I hoped they would work out of the box, but assumed that I would have to do some work to get them operational. I liked the idea of calling the pymarkdown script from the pymarkdown.bat script, but after 45 minutes and approximately 4 attempts at rewriting the scripts, I gave up. Just like before, I decided to go with simplicity for both files, the pymarkdown file:
#!/usr/bin/env python
from pymarkdown import PyMarkdownLint
PyMarkdownLint().main()
and the pymarkdown.bat file:
python -c "from pymarkdown import PyMarkdownLint; PyMarkdownLint().main()" %*
Instead of having one script call the other, I opted for matching the contents of both scripts as closely as possible. In the shell version, the shebang at the start of the script takes care of invoking Python and Python itself takes care of the command line arguments. In the batch script version, I needed to explicitly call Python with the -c argument to tell Python to execute the next argument as a Python script. Finally, the %* at the end of that line causes any arguments passed to the batch script to be passed to the Python program specified with the -c argument.
After a couple of tries, mostly due to small typing mistakes, when I executed the command line pipenv run pymarkdown --help, I was welcomed with the help documentation for the project. Success!
Pass 2: Init Files
With the batch script issue in the last section resolved, the execution of the test command pipenv run pymarkdown --help now yielded this error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'PyMarkdownLint' from 'pymarkdown' (C:\Users\jackd\.virtualenvs\pymtest-W-bOTTm6\lib\site-packages\pymarkdown\__init__.py)
Perhaps it is my knowledge of other programming languages, but I favor direct imports in the files that need them over the use of __init__.py modules. For me, it just seems like overkill in 98% of the cases, leading to a hard-to-understand view of dependencies between files. In the case of creating a setup package, this turned out to be one of the 2% cases that I had not come across yet.
But, seeing as this was an obvious request for a proper __init__.py module, I added one to the pymarkdown package with the contents:
from pymarkdown.main import PyMarkdownLint
I do not use it in any of the other modules for the project, but it is there for the setup.py module and any others that need it. As such, I can stay true to how I use import statements while providing the information that the setup scripts need. For me, that is a win-win.
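The effect of that one-line __init__.py can be reproduced in isolation. The sketch below builds a throwaway package on disk, with a class defined in main.py and re-exported from __init__.py, and then imports the class from the package root the way the launcher scripts do. Every name in it (demo_pkg, DemoLint, and both helper functions) is made up for the illustration.

```python
import importlib
import sys
from pathlib import Path


def build_demo_package(root):
    """Create demo_pkg/main.py plus an __init__.py that re-exports its class."""
    package_dir = Path(root) / "demo_pkg"
    package_dir.mkdir()
    (package_dir / "main.py").write_text(
        "class DemoLint:\n    def main(self):\n        return 'ran'\n",
        encoding="utf-8",
    )
    # Without this line, "from demo_pkg import DemoLint" raises ImportError.
    (package_dir / "__init__.py").write_text(
        "from demo_pkg.main import DemoLint\n", encoding="utf-8"
    )
    return package_dir


def import_from_package_root(root):
    """Import DemoLint through the package root, as a launcher script would."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_pkg")
        return module.DemoLint
    finally:
        sys.path.remove(str(root))
```

The rest of the package can keep importing directly from demo_pkg.main; the re-export only serves callers that go through the package name.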
Pass 3: Including Data Files
With the pipenv run pymarkdown --help command now running without any issues, I wanted to include some more complex examples to test in the refresh_package.cmd script. To that end, I added the following lines to the end of that file:
pipenv run pymarkdown plugins list
pipenv run pymarkdown plugins info md048
pipenv run pymarkdown plugins info md047
pipenv run pymarkdown scan ..\blog-content\website\content\articles
Going through the reinstall process with the new version of this script, the installation and the first three commands all went off without any issues. However, when it got to the scan command, the following error was emitted:
BadTokenizationError encountered while initializing tokenizer:
Named character entity map file '..\lib\site-packages\pymarkdown\resources\entities.json' was not loaded ([Errno 2] No such file or directory: '..\\lib\\site-packages\\pymarkdown\\resources\\entities.json').
Going back to the useful files to distribute section, I quickly noticed that one of the items in the list was labelled Installing Additional Files. This seemed to fit the situation that I had before me exactly. Reading the information on the other side of that link, I knew what to do within a couple of minutes. Within a couple more minutes, I had this change coded up and inserted at the end of the setup function call in the setup.py module:
setup(
    ...
    data_files=[
        ('Lib/site-packages/pymarkdown/resources', ['pymarkdown/resources/entities.json'])
    ]
)
Going through the entire process again, everything worked fine, and I was now done with the test scenarios I had in mind. I tried a handful of additional scenarios to make sure I had them all covered, and each scenario worked as I expected it to. I had a fully functioning install script!
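The error message earlier in this pass shows the tokenizer looking for entities.json in a resources directory beneath the installed package. The project's actual loading code is not shown in this article, so the following is only a sketch, under that assumption, of how a module can locate such a file relative to its own location. The load_entities name and the module_file parameter (a stand-in for __file__) are hypothetical.

```python
import json
from pathlib import Path


def load_entities(module_file):
    """Load resources/entities.json from the directory that holds module_file.

    Hypothetical helper; the real loader in the project is not shown here.
    """
    resource_path = Path(module_file).resolve().parent / "resources" / "entities.json"
    with open(resource_path, encoding="utf-8") as resource_file:
        return json.load(resource_file)
```

A lookup built this way works for any install location, provided the data file really is placed next to the package, which is what the data_files entry above arranges.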
Pass 3.1: Cleanup
This was not really a pass on its own, but a little bit of cleanup that I wanted to do. While looking at various other Python setup articles and library packages, I decided to add three more arguments to the setup function call:
setup(
    ...
    maintainer=AUTHOR,
    maintainer_email=AUTHOR_EMAIL,
    url=PROJECT_URL,
    ...
)
Since I am both the author and the maintainer, it just made sense to set the maintainer fields to the same values as the author fields. I also wanted people to be able to get more information on the project, so setting the url field also made sense.
What Was My Experience So Far?
Based on my experience with other languages, creating an installation package for the project in Python was a walk in the park. There was no fancy extra packaging required, everything was written in Python. While it took me about four hours to make sure everything was working properly, I would estimate that a similar installer for C# or Java would easily take at least eight hours to get into a similarly finished form. For me, that is a win.
In general, I am very pleased with how this work went on getting the setup code into proper shape. There were some very good examples that I could lean on to get my code working, and the starting points were all well-defined. That made the distance I needed to travel from sample code to working code very short, which was very pleasant for once. During the creation of the setup script, I did notice a couple of extra things that I want to clean up before the initial release. But like before, they are all small and reasonable, so I am confident I can make short work of them.
What is Next?
With the setup packaging complete for now, I move on to simplifying the output from some of the commands and starting to update the rules for the initial release.
1. I almost feel that a “duh?!” would be warranted here, but do not feel that it is appropriate.
2. Since I took a look, someone refactored the setup code. Please look at this code, which is the code I cribbed from.
3. This comment is not meant to start a religious war. I firmly believe that there are many different jobs that need to be done, with some tools being the obvious choice for that job. There are other jobs where the choice of tools depends more on personal preference combined with the job at hand. For myself, operating systems are just that: tools.