Summary¶
In my last article, I talked about the final changes that I needed to make to get the PyMarkdown project ready for its initial release. In this article, I take a bit of a break from pushing towards release to work on some refactoring.
Introduction¶
Having taken up a lot of energy in the past month or so to get the initial release of the PyMarkdown project done, I was tired. Not so tired that I could not look at a computer screen, but tired of the push towards the goal of the initial release. And like anyone pushing towards a goal who then reaches that goal, I knew that the healthy thing to do was to take a bit of needed downtime before I get back to the issues waiting for me on the PyMarkdown project.
There is a catch though: I do not like sitting still. Even when I am relaxing, I like doing something like reading, working on a puzzle or a puzzle book, or trying something new out. It is not that I do not care for relaxing, quite the contrary. It is just that getting something done helps me feel more at peace, even something as simple as taking care of one of my chores around the house. One more thing done, one less thing to worry about. And I knew that I did not want a big break from writing Python code; I just wanted a small break from writing code for the PyMarkdown project.
Looking at the various things that I could spend time on Python-wise, there was one thing that caught my eye: the application_properties.py module.
What Is the Audience for This Article?¶
While detailed more eloquently in this article, my goal for this technical article is to focus on the reasoning behind my solutions, rather than the solutions themselves. For a full record of the solutions presented in this article, please consult the PyMarkdown commit of 06 Jun 2021 and the application_properties commits between 02 Jun 2021 and 06 Jun 2021.
Why That Module?¶
When I was looking over my list of various things that I could work on during my
break, there was one prevalent theme among the Python oriented projects:
foundational objects. In each case, they needed simple configuration support,
simple logging support, and simple file determination support. While I needed
to think about how to better deal with logging and file determination in the
future, I already had a good solution for configuration that I had developed
for the PyMarkdown project: the application_properties.py module.
But what was the best way to move that to the other projects that I wanted to start working on?
The Basics¶
Written back in March, an earlier article details the thinking around why I created that module. My decision to create “another” properties system was not an easy one to make. However, after listing out my requirements for any properties system, I was not able to find any existing system that had the level of quality and readability that I wanted. As such, I felt the best alternative was to create a properties system that met those requirements.
With the PyMarkdown project, I felt strongly about documenting what I was thinking about and going through as I was creating the project. As the properties system was an important part of the interface, I dedicated an entire article to explain why I created that system. And while I will not repeat every single point made in that article, I believe there is benefit in highlighting two of the important points of that article.
The first point that I made is that I believe that every configuration system can be evaluated on five core concepts: basic property support, command line support and environment support, validation support, grouping support, and finally hierarchy support. Basic property support is the provision of basic operations to get the property value out of the configuration system. Command line support and environment support assume that some manner of file support is a given; the extra is ensuring that the user can override the configuration from the command line, the environment, or both. Validation support speaks to any functionality that helps the caller determine the correctness of a configuration value without having to further handle the value returned from the system. Grouping support is the ability of the configuration system to recognize some manner of grouping together configuration items that have a common purpose. Finally, hierarchy support provides an extra level of understanding of which configuration items belong together with which other configuration items.
While that is by no means a comprehensive list, it was a good list of requirements to start off with. Using that as a foundation, I then presented a good argument that those requirements help define three basic types of configurations: the Simple Configuration, the Grouped Configuration, and the Complex Configuration. The main difference is not the data that is stored within each configuration, as that data is identical in all cases. It is the organization of the data that is the main difference, with the Complex Configuration type providing the best organization of all three types.
With those types now defined, how do we pick one for any given project?
Picking The Best Option For A Project¶
From my experience, if a project’s needs are for five or fewer configuration items, a Simple Configuration type typically works best. For between five and ten configuration items, it is usually best to start organizing those configuration items into groups, and hence a Grouped Configuration type is more fitting. Following that logic even further, when there are more than ten configuration items, it is usually a good idea to organize those groups into a meaningful hierarchy, lending itself to the Complex Configuration type.
While I do not use those ranges as concrete guidelines that I strictly follow, to me they follow my personal common-sense rule. The first question that I ask myself is:
What are the configuration requirements for the project that I am working on?
The follow up question to that is:
What is the most obvious way to present that data to make it clear, understandable, and maintainable?
Examples¶
I believe that the best way to show why these ranges make sense is with some concrete examples. While this set of examples is completely fictitious, it follows patterns and situations that I have observed and helped mitigate over the years.
Example 1: A Simple Webservice¶
One of the simplest cases that I can think of is the configuration for a simple webservice that presents data from a simple data file. In this case, I would probably create a data file that looked like this:
port=8080
file=my_data.json
From a common-sense point of view, I believe this layout makes the most sense. Both property values relate to the webservice, but that seems to be their only link to each other. So, partially due to the low number of property items and partially due to their lack of connection, a Simple Configuration system seems to be the most logical choice for representing this data. I believe that this file presents the intention of the data in a way that satisfies all three of the criteria mentioned above.
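To make the Simple Configuration type concrete, here is a minimal sketch of reading a file like the one above. The `parse_simple_config` helper is hypothetical and written only for this illustration; it is not part of the application_properties package.

```python
def parse_simple_config(text):
    """Parse 'key=value' lines into a flat dictionary of strings.

    Hypothetical helper for illustration only.
    """
    properties = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and (as an assumption) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"Line is not in key=value form: {raw_line!r}")
        properties[key.strip()] = value.strip()
    return properties


SIMPLE_CONFIG = """
port=8080
file=my_data.json
"""

config = parse_simple_config(SIMPLE_CONFIG)
port = int(config["port"])  # values arrive as strings; the caller converts
data_file = config["file"]
print(port, data_file)
```

With only two unrelated items, a flat dictionary is all the structure the data needs.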
Example 2: Growing The Webservice¶
As with all things simple, a webservice like this seems to grow organically as people try to make it do “just one more simple thing”. For the purpose of this example, I am going to grow the configuration by allowing it to also specify the endpoint for the webservice, the input type of the data file, and pagination values to limit the size of items being hosted. Just adding these values to the configuration file could result in this orderly file:
port=8080
endpoint=/api
file=my_data.json
mime_type=application/json
page_item_count=5
page_maximum=10
or it could result in this disorganized file:
port=8080
endpoint=/api
page_item_count=5
mime_type=application/json
file=my_data.json
page_maximum=10
Using the Simple Configuration type, there just is not any good way to organize this information. If someone decides to add a file configuration item, the only rule is that it must be within the file. That is where the Grouped Configuration type comes into play. Switching over to that type would allow us to reorganize the configuration file into a form such as:
[rest]
port=8080
endpoint=/api
[source-data]
file=my_data.json
mime_type=application/json
[pagination]
item_count=5
item_maximum=10
While the exact format of the file is not important, the grouping of the data within that file is important. Instead of a group of six seemingly connected configuration items, the file is now organized to show that there are three groups of configuration items.
Once again, from a common-sense point of view, I believe this Grouped Configuration type layout is logical. The items in the rest section deal with how the REST endpoint for the webservice is set up. The items in the source-data section deal with the information that is being presented and the format in which it is stored. Finally, the items in the pagination section are used to provide instructions on how to deal with presenting large amounts of data through the webservice.
To reiterate a point that I made above, both files present the same configuration items. The only differences are that some of the names changed and the addition of groups for similar items adds more context to each item within that group.
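As a rough sketch of how the grouped file above might be consumed, the example below uses the `configparser` module from the Python standard library. This is only one possible loader for the Grouped Configuration type and is not how the application_properties package reads its data.

```python
import configparser

GROUPED_CONFIG = """
[rest]
port=8080
endpoint=/api
[source-data]
file=my_data.json
mime_type=application/json
[pagination]
item_count=5
item_maximum=10
"""

parser = configparser.ConfigParser()
parser.read_string(GROUPED_CONFIG)

# Each item is addressed by its group (section) and then by its name.
rest_port = parser.getint("rest", "port")
source_file = parser.get("source-data", "file")
pagination_items = dict(parser["pagination"])

print(parser.sections())
print(rest_port, source_file, pagination_items)
```

The group name supplies the context that the `page_` prefix had to carry in the flat file, which is why the item names could be shortened.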
Example 3: This Really Should Be Multiple Webservices¶
Following the rule of “do one thing well”, this example really should be implemented as multiple webservices. But due to external pressures, it is commonplace to overload an already overloaded system just a bit more, even if it is only a stopgap measure. For this example, I am going to create a fictional requirement that the webservice also hosts another data file at a second endpoint.[1]
Given that requirement, I started with the Grouped Configuration file that was presented in the last section, and then made some modifications:
[rest]
port=8080
endpoint=/api
alternate_endpoint=/data
[source-data]
file=my_data.json
mime_type=application/json
[alternate-source-data]
file=my_other_data.csv
mime_type=text/csv
[pagination]
item_count=5
item_maximum=10
Going strictly by my own guidelines, nine configuration items means that this should be a good configuration file, but it looks disjointed to me. When I read that file, the big question that I have is whether the item alternate_endpoint is related to the items under the alternate-source-data group. And if I must remind myself of that each time that I read the file, it means the maintainability of the configuration is not where it could be.
That configuration file’s lack of maintainability presents a good reason for bumping the file up to a Complex Configuration type, such as:
{
    "port" : 8080,
    "endpoint" : {
        "path": "/api",
        "source" : {
            "path" : "my_data.json",
            "mime_type" : "application/json"
        }
    },
    "alternate_endpoint" : {
        "path": "/data",
        "source" : {
            "path" : "my_other_data.csv",
            "mime_type" : "text/csv"
        }
    },
    "pagination" : {
        "item_count" : 5,
        "item_maximum" : 10
    }
}
While there is a bit more text in this file, that text adds meaningful hierarchical context to the data. Speaking directly to my previous question regarding the item alternate_endpoint, it is now clear that the endpoint’s path and the endpoint’s source data are directly connected. Additionally, it allows the file to group the path and mime_type configuration items under the source entry, making it clear that both of those items are related to the source used for the webservice.
I may be reiterating this point too many times, but this kind of formatting just seems like common sense to me. This data is more complex, so the configuration type must evolve with the data, or understanding and maintainability suffer. To me, the format of this file makes the data clear, understandable, and maintainable.
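To show what the hierarchy buys the reader of the data, here is a minimal sketch of walking that JSON structure with a dotted name. The `get_property` function and the use of `.` as a separator are assumptions made for this illustration, not the interface of the application_properties package.

```python
import json

COMPLEX_CONFIG = """
{
    "port": 8080,
    "endpoint": {
        "path": "/api",
        "source": {"path": "my_data.json", "mime_type": "application/json"}
    },
    "alternate_endpoint": {
        "path": "/data",
        "source": {"path": "my_other_data.csv", "mime_type": "text/csv"}
    },
    "pagination": {"item_count": 5, "item_maximum": 10}
}
"""


def get_property(config, dotted_name, default=None):
    """Walk nested dictionaries one name at a time (hypothetical helper)."""
    current = config
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


config = json.loads(COMPLEX_CONFIG)
print(get_property(config, "alternate_endpoint.path"))
print(get_property(config, "alternate_endpoint.source.mime_type"))
print(get_property(config, "endpoint.timeout", default=30))
```

The full name of each item now states which endpoint it belongs to, so the question about alternate_endpoint answers itself.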
Looking For Something Out There¶
Other than sounding like the opening line to a Hair Rock ballad, the section title describes what I initially did when looking for a solution that provided all three levels of configuration. While there were some solutions out there that provided for the first two types of configuration, I could not find anything that handled the Complex Configuration type. I know that the above sample is a simple JSON file and that loading JSON files is easy, but the traversal of the configuration data should be easy as well.
It was then that I realized that I wanted to pull the application_properties.py module out of the PyMarkdown project and turn it into its own package. My plan is to start with what I have, and quickly add on other loaders to address the Simple Configuration type and the Grouped Configuration type. The thing that ties them together? Regardless of how the configuration is loaded, the Python interface should remain the same.
Basically, I believe that I have a good way to present properties to Python developers, and I believe that releasing application_properties as a package will provide some benefit to others. Now I just had to do the work to get it done!
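The idea of "different loaders, same interface" can be sketched as follows. Everything here (the `Properties` class, `flatten`, and the two loader functions) is hypothetical and exists only to illustrate the design goal; the real package has its own classes and loaders.

```python
import configparser
import json


class Properties:
    """Hypothetical read-only view over flattened, dotted property names."""

    def __init__(self, flat_values):
        self._values = dict(flat_values)

    def get(self, name, default=None):
        return self._values.get(name, default)


def flatten(nested, prefix=""):
    """Turn nested dictionaries into a single level of dotted names."""
    flat = {}
    for key, value in nested.items():
        full_name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_name}."))
        else:
            flat[full_name] = value
    return flat


def load_from_ini(text):
    """Loader for a Grouped Configuration (INI-style) source."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    nested = {section: dict(parser[section]) for section in parser.sections()}
    return Properties(flatten(nested))


def load_from_json(text):
    """Loader for a Complex Configuration (JSON) source."""
    return Properties(flatten(json.loads(text)))


ini_properties = load_from_ini("[pagination]\nitem_count=5\n")
# The JSON value is a string here so that both sources hold identical data.
json_properties = load_from_json('{"pagination": {"item_count": "5"}}')

for properties in (ini_properties, json_properties):
    print(properties.get("pagination.item_count"))
```

The calling code in the final loop never needs to know which file format supplied the value, which is the property the text above is after.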
Getting It Done¶
The first two commits to the new repository, jackdewinter/application_properties, were simple with only a couple of changes. The only changes that I made before those commits were to add new test functions for any functionality that was previously uncovered. That effort was nothing serious, just the addition of a handful of test functions to cover lines that previously had been covered by the scenario tests for PyMarkdown. Due to the simple nature of the package, I was able to get the code coverage to 100 percent with little effort. And, like the PyMarkdown project, maintaining a coverage percentage near that value is a worthy goal that I hope to be able to maintain.
Starting With Cleanup¶
Before I could package the project up, I felt that I needed to clean up two things. The first bit of cleanup that I needed to do was to split up the various classes in the application_properties.py file, reorganizing them to follow the one class one module rule. When the module was in the PyMarkdown project, that module was only one module in a group of other modules. As such, keeping all three classes in one module made sense. Now that it was in a package dedicated to providing access to properties, splitting them up just seemed like the right thing to do.
Once that task was completed, I took some time to properly understand the purpose of an __init__.py file in a package, and then used that knowledge to create an __init__.py file for the package. I was confused at first as to how to properly construct a good __init__.py file that would work properly. Looking at other Python packages and how they constructed their __init__.py files helped me learn a lot. Initially, I was also concerned it would be a lot of effort to create, but it turned out to be a straightforward process. As an added benefit, it made accessing the package from the test modules a lot easier.
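As a small demonstration of that pattern, the sketch below builds a throwaway package in a temporary directory, with one class per module and an `__init__.py` that re-exports both classes. The package, module, and class names are all made up for this example and do not match the layout of the real package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, created on disk only for this demonstration.
package_root = Path(tempfile.mkdtemp())
package_dir = package_root / "demo_properties"
package_dir.mkdir()

# One class per module.
(package_dir / "properties.py").write_text("class DemoProperties:\n    pass\n")
(package_dir / "json_loader.py").write_text("class DemoJsonLoader:\n    pass\n")

# The __init__.py file re-exports the classes at the package level.
(package_dir / "__init__.py").write_text(
    "from demo_properties.json_loader import DemoJsonLoader\n"
    "from demo_properties.properties import DemoProperties\n"
)

sys.path.insert(0, str(package_root))
importlib.invalidate_caches()
demo_properties = importlib.import_module("demo_properties")

# Callers import from the package and never need to know the module names.
print(demo_properties.DemoProperties.__module__)
print(demo_properties.DemoJsonLoader.__module__)
```

Because callers only reference the package name, the classes can later move between modules without breaking any import statements, including the ones in the test modules.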
Then Make A Local Package¶
From that point, my next goal was to create a distributable package that I could test locally. Like the way I created a simple pymtest project to test the installation of the PyMarkdown project, I decided to use the PyMarkdown project to test the installation of the application_properties package. From my experience with PyMarkdown, I knew that I could install the package locally using the command line:
pipenv install ../pymarkdown/dist/pymarkdownlnt-0.8.0.tar.gz
Being able to test that package locally before I published it was fantastic. I could fiddle with whatever settings I wanted to until everything looked just right.
So, with the project itself cleaned up, I decided to use the PyMarkdown project as a “cheat sheet” of what to do. Since I created the packaging for that project and I knew how well it worked, I figured that using the PyMarkdown project as a template was a smart move. Therefore, I started copying files from the PyMarkdown project as examples of what I needed to do in the current project.
The list of files that I needed to copy over and change from the PyMarkdown project was quite small. The obvious file is the setup.py file, and it required around ten changes to work properly. Other than that, the version.py file, the MANIFEST.in file, the install-requirements.txt file, and the package.cmd file were the only other files that I copied over. After making small changes to the version.py file and the install-requirements.txt file, the package was building within five minutes of starting the work on this section.
Increasing Readability?¶
Outside of changing the name of the package, the biggest change was my introduction of a new file called the pypi.md file. Since my creation of the PyMarkdown project’s readme.md file, I have started to wonder if the PyPi.org page for the package has too much information. While I believe that the PyMarkdown readme.md file is a solid GitHub repository readme.md file, I am not sure if it is the right length for a PyPi.org project description page.
I am not sure whether a copy of the readme.md file or a shorter version of that information is the best thing for either project, so I decided to experiment. I created the pypi.md file to be used as the source for the application_properties package’s PyPi.org project description page. Into that file, I copied the first three sections, with reference links to the main readme.md file. This way I can see both in action, solicit feedback, and make an educated decision at a later date.
As I said, it is an experiment. Not sure how it will turn out, but time will tell.
Testing With PyMarkdown¶
Going back to the PyMarkdown project, I was easily able to add the newly built package to my project using the following command line:
pipenv install ../application_properties/dist/application_properties-0.5.0.tar.gz
Once I did that, I went to the PyMarkdown source files and removed the application_properties.py module from the project. Trying to run the project, I noticed that I needed to make some small changes to the import statements for the application_properties module. Other than that, everything just worked. I would like to think that a certain amount of that was luck, but I believe that it was simply good organization. Other than the new __init__.py module in the project taking care of the import responsibilities, everything was the same as before. The classes had the same names and the same function names; the only difference was that they were in a different package.
After double checking, I removed the test_application_properties.py module, as it had also been moved to the new project. Running all the PyMarkdown scenario tests, everything worked fine, so it was time to move on.
Documentation¶
With the thorough tests in place, zero PyLint & Flake8 warnings, a clean build of the Python package itself, and the testing of that package in the PyMarkdown project accomplished, there was only one thing left to do: documentation.
If I am being honest, I was mostly looking forward to working on the documentation. Based on my usage of the module in the PyMarkdown project and the few changes that I already performed in this project, I was confident that I had a good set of modules to document. For me, that is always the bulk of the battle. The more confident and interested I am in the subject that I am documenting, the easier the words flow out of my fingers.
The other reason that I look forward to working on documentation is that it has a habit of forcing me to walk through the entire user interface for the purposes of documenting it. Along the way, if something is not done right, it quickly becomes obvious to me. From my experience, I have found that the effort required to document an object is directly proportional to how difficult that object is to understand. I knew there were probably going to be a couple of things that I missed or could be simplified, and I was eager to get to them.
As the module had been previously tested, there was not that much to find. One thing that was missing was a function, which I called property_names_under, to complement the property_names property that listed every property name in the configuration instance. This new function took that same concept but altered the returned list slightly to only return those property names that were under a given prefix. That way, instead of the caller having to filter the return value of property_names to only include the values under rest, the caller could ask the property_names_under function for values under rest.
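The filtering idea can be sketched in a few lines. The free function below is only an illustration of the concept; its signature, the `.` separator, and the sample names are assumptions, and the package's own property_names_under may differ in its details.

```python
def property_names_under(property_names, prefix, separator="."):
    """Return only the names that sit below the given prefix (sketch)."""
    # Append the separator so that "rest" does not also match "restful".
    qualified_prefix = prefix if prefix.endswith(separator) else prefix + separator
    return [name for name in property_names if name.startswith(qualified_prefix)]


all_names = [
    "rest.port",
    "rest.endpoint",
    "restful.flag",
    "pagination.item_count",
]

rest_names = property_names_under(all_names, "rest")
print(rest_names)
```

Matching on the prefix plus the separator, rather than on the bare prefix, is the detail a caller is most likely to get wrong when filtering by hand.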
Along those same lines, I needed to fix up the handle_error_fn parameter of the ApplicationPropertiesJsonLoader class. Since the PyMarkdown project wanted a consistent response to any load exception, I created this parameter to pass failure text along with the exception. That way, the project could decide how to best display that information. The only problem was that it did not have any default behavior, so it required an inner function to be created in the test functions. I decided to clear that up by adding the following code to the load_and_set function:
if not handle_error_fn:

    def print_error_to_stdout(formatted_error, thrown_exception):
        _ = thrown_exception
        print(formatted_error)

    handle_error_fn = print_error_to_stdout
A simple write of the formatted_error to standard out, but a simple enough default.
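To show that default in context, here is a self-contained sketch of a loader that accepts an optional error callback. The `load_json_text` function is a hypothetical stand-in with a simplified signature; only the fallback handler mirrors the snippet above.

```python
import json


def load_json_text(json_text, handle_error_fn=None):
    """Parse JSON text, reporting failures through a callback (sketch)."""
    if not handle_error_fn:

        def print_error_to_stdout(formatted_error, thrown_exception):
            _ = thrown_exception
            print(formatted_error)

        handle_error_fn = print_error_to_stdout

    try:
        return json.loads(json_text)
    except json.JSONDecodeError as this_exception:
        formatted_error = f"Configuration is not valid JSON: {this_exception}"
        handle_error_fn(formatted_error, this_exception)
        return None


# With no callback, the error text is simply printed.
default_result = load_json_text("{not json")

# A caller that wants control, such as a test, supplies its own callback.
collected_errors = []
custom_result = load_json_text(
    "{not json", lambda text, exception: collected_errors.append(text)
)
valid_result = load_json_text('{"port": 8080}')
print(len(collected_errors), valid_result)
```

With a default in place, callers that do not care how errors are displayed can omit the parameter, while tests can still capture the text for their assertions.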
As Always… Clean Up¶
Before I published, I took a quick look through the code and documentation, and just did some small cleanup tasks here and there. Nothing big, just some rewording of the documentation, creating GitHub issues to track what needs to be done, and adding placeholder sections to some of the documents. There was some stuff left to do, but for an initial release, it would be good enough.
And with that, I ran the publish_to_pypi.cmd script and published the package. I then went over to the PyMarkdown project and updated its dependency from a local package to application_properties==0.5.0, and reran every test.
With things looking good, and all tests passing, it was time to call it a day!
What Was My Experience So Far?¶
While it might seem weird to some people, it was a fun and relaxing week working on the initial release of the application_properties package. No pressure from myself to do anything, just take something I already have and clean it up for publication.
It was also nice to know that I was thinking about future projects in Python. Looking at the PyMarkdown directory on my computer, there are a couple of helper scripts that I use every so often for various small tasks. It would be nice to get them out of that directory and into their own project. That would allow me to clean them up and make them easier to use. Nothing serious, just some small helper applications, but nice fun projects to work on. And having a good start on the configuration for those projects, that was a nice thing to have in my pocket.
Yes, next week was going to be about getting more work done on PyMarkdown. But for this week, it was a nice and relaxing small break. I felt refreshed.
What is Next?¶
Getting back to PyMarkdown, I start to go through the issues list, with the goal to knock a couple more things off that list.
[1] While I would push back on this requirement from a quality point of view, I would also acknowledge that there may well be a business benefit to this overload for a variety of reasons.