Static Websites: Publishing To GitHub Pages

This is the fifth article in a series about setting up my own website using a Static Site Generator. For other articles in the series, click on the title of the article under the heading “Static Site Generators” on the right side of the webpage.

Introduction¶

In the previous articles, I used Pelican as my Static Site Generator, generating a nicely themed site with some test articles in it. To make sure I can speak with my own voice, I took my time determining what theming I wanted for the website. By taking these deliberate steps, I was able to arrive at my choice for the site’s theme with confidence and clarity.

This entire journey has been geared towards generating an externally hosted website that is primarily a blog. This article talks about the final step on that journey: publishing my website to the free GitHub Pages platform.

Before We Start¶

Please read Operating System Paths from the second article in this series for my view on operating system pathing (e.g. /mount/my/path vs. c:\my\path). To understand the term base project directory and the script text %%%MY_DIRECTORY%%%, please read the explanation under Step 2: Create a Project Directory For The Site, also from the second article in this series.

Why GitHub Pages?¶

In looking at a number of sites built on various Static Site Generators (SSGs), it became obvious that a majority of them were hosted on GitHub Pages. With that in mind, I looked into GitHub Pages to figure out if it was the best solution for me.

The article What is GitHub Pages? goes into a decent amount of detail, but the summary boils down to:

it’s 100% free
only static content can be hosted
don’t do anything illegal
don’t create an excessively large site (over 1 GB in size)
don’t create a site that is incredibly popular (over 100 GB per month in bandwidth)

For cases where the last two items occur, their website even mentions that they will send “a polite email from GitHub Support or GitHub Premium Support suggesting strategies for reducing your site’s impact on our servers”.

To me, this seemed like a good place to start. I already use Git for source management, so familiarity with the website and tooling is already there. Their documentation is good, and it looks relatively easy to implement. Another plus. Most importantly, there are no fees for uploading or serving the content, so I can experiment with various things and not worry about incurring extra charges.

Branches on GitHub Pages¶

After doing my research on GitHub, specifically about publishing on GitHub Pages, I was confused about one point. From my experience with Git, most people and companies do either repository based development or branch based development. Even less frequent is something called monolith based development. The approach for GitHub Pages is not one of those.

Repository based development uses Git’s distributed nature to create changes on your own repository, only merging the changes into the “main” repository when you are sure of your changes. Branch based development is similar, except the branching feature of Git is used on a single remote repository, only merging changes into the “main” branch when you are sure of your changes. Monolith development is more dangerous, as all changes are committed to a single repository with a single branch. For all three types of development, there is one common thread: you are keeping versions of a single type of thing in your repository.

In a number of sites that I researched, it appeared that they were using a tool called ghp-import. This tool allows for the content for the site to be stored in the content branch of the repository, while the produced website is pushed to the master branch of the same repository. While I can wrap my mind around it, to me it didn’t seem like a good idea. As this is outside of my normal workflows, I was pretty sure that at some point I would forget and push the wrong thing to the wrong branch. To keep things simple, I wanted my website content in one repository, and my website output in another repository.

That itself raised some issues with my current setup, having the output directory at the same level as the content directory. During my research, I came across the statement that Git repositories cannot contain other repositories. If you do need to have this kind of format, a concept called submodules was recommended. The plugins and themes repositories for Pelican make heavy use of submodules, so I knew it could be done. But after some experimentation with some sample repositories, I was unable to make this work reliably. Also, while I can learn to wrap my mind around it, it seemed like a lot of extra work to go through.

In the end, I decided that it was best to keep things simple, keeping 2 repositories that were 100% separate from each other. If I do more research and figure out how to make submodules work reliably, I am confident that I can condense these distinct repositories into one physical repository.

With that decision made, I needed to create a new output directory outside of the blog-content directory. I decided to call this new directory blog-output and have it at the same level as blog-content. To make sure it was initialized properly with a local repository, I entered the following commands:

mkdir ..\blog-output
cd ..\blog-output
git init

Once that was complete, I had to ensure that the pelican-* scripts were changed to point to the new location, which took a simple search and replace over all of the script files. With that completed, I executed each of my pelican-* scripts to verify the changes were correct, and encountered no problems. To further ensure things looked good, I performed a git status -s on both repositories to be sure I didn’t miss anything. While this approach wasn’t as elegant as the other solution, in my mind it was simpler, and therefore more maintainable.

Adding Remote Repositories¶

Now that I had two local repositories, one for content and one for output, it was time to make remote repositories on GitHub for each of them. I already had a GitHub account for some other projects I was looking at, so no worry there. Even if I didn’t have one set up, GitHub makes it simple to set up a new account on their home page.

From there, it was a simple matter of clicking the plus icon at the top right of the main window, and selecting New Repository from the drop down list. The first repository I created was for the content, and I simply called it blog-content, which I entered in the edit box under the Repository Name label. As I wanted my content to be private, I changed the selection from Public to Private and clicked on the Create Repository button.

For the other repository, I followed the same instructions with two exceptions. The first exception is that, as the output of Pelican needs to be public to be seen, I kept the selection on Public. The second exception was the name of the repository. According to the User Pages page, to publish any committed pages you need to use a site of the form user-name.github.io and push any changes to the master branch. As my user name on GitHub is jackdewinter, this made my repository name jackdewinter.github.io.

If you are using this article as a guide, please note that you will need to change the repository name to match your own GitHub user name.
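As a minimal sketch of that naming rule, the following Python helper derives the repository name and the resulting site address from a GitHub user name. The function name user_pages_names is hypothetical and only for illustration; it is not part of GitHub's or Pelican's tooling.

```python
def user_pages_names(user_name):
    """Return the repository name and site URL for a GitHub user site.

    Hypothetical helper: it only encodes the user-name.github.io
    naming rule described above.
    """
    repository = f"{user_name}.github.io"
    site_url = f"https://{repository}"
    return repository, site_url


# Example with the user name used in this article.
print(user_pages_names("jackdewinter"))
```

Substituting your own user name shows the repository name you would enter in the Repository Name edit box.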

Securing The GitHub Access¶

The first time that I added my remote repositories to their local counterparts, I encountered a problem almost right away. When I went to interact with the remotes, I was asked to enter my user id and password for GitHub each time. This was more than annoying. Having faced this issue before on other systems, I knew there were solutions, so back to the research!

Now, keep in mind that my main machine is a Windows machine, so of course this is a bit more complicated than when I am working on a Linux machine. If I were on a Linux machine, I would follow the instructions at Connecting to GitHub with SSH and things would probably work with no changes. To start with, I would want to make sure that GitHub has its own private/public key pair, so I would follow the instructions under Generating a New SSH Key and adding it to the ssh-agent. I would then follow the instructions under Adding a new SSH key to your GitHub account to make sure GitHub had the public half of the key. A couple of Git commands later, and it would be tested.

In this case, I needed to get it running on Windows, and the Windows 10 instance of SSH takes a bit more finessing. To make sure the service was installed and operational, I followed the instructions on Starting the SSH-Agent. Once that was done, I was able to execute ssh-agent, and only then could I use ssh-add to add the newly created private key to ssh-agent.

In a nutshell, I needed to execute these commands to set up the key on my local machine:

ssh-agent
ssh-keygen -f %USERPROFILE%\.ssh\blog-publish-key -C "jack.de.winter@outlook.com"
ssh-add %USERPROFILE%\.ssh\blog-publish-key

Attaching Remote Repositories to Local Repositories¶

This was the real point where I would see if things flowed together properly. First, I needed to specify the remote for the blog-content repository. Looking at my GitHub account, I browsed over to my blog-content repository, and clicked on the clone or download button. Making sure the SSH form of the link was shown rather than the HTTPS form, I pressed the clipboard icon to copy the link into the clipboard.

Back in my shell, I changed directory to blog-content and entered the following:

git remote add origin %%%PASTE HERE%%%

where %%%PASTE HERE%%% was the text I copied into the clipboard. As my user id is jackdewinter and the repository is blog-content, the actual text was:

git remote add origin git@github.com:jackdewinter/blog-content.git

This process was then repeated for the blog-output directory and the jackdewinter.github.io repository.
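To make the pairing of local directories and remotes explicit, here is a small Python sketch that builds the git remote add command for each directory. It assumes the SSH form of the link, which GitHub presents as git@github.com:user/repository.git. The helper names ssh_remote_url and remote_add_commands are hypothetical, chosen only for this illustration.

```python
def ssh_remote_url(user_name, repository):
    """Build the SSH clone URL GitHub shows for a repository."""
    return f"git@github.com:{user_name}/{repository}.git"


def remote_add_commands(user_name):
    """Map each local directory to the command that attaches its remote.

    Assumption: the directory and repository names are the ones used
    in this article (blog-content and <user>.github.io).
    """
    repositories = {
        "blog-content": "blog-content",
        "blog-output": f"{user_name}.github.io",
    }
    return {
        directory: f"git remote add origin {ssh_remote_url(user_name, repository)}"
        for directory, repository in repositories.items()
    }


for directory, command in remote_add_commands("jackdewinter").items():
    print(f"{directory}: {command}")
```

Each command is meant to be run from inside the matching local directory.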

Publish the Content to Output¶

Until this point, when I wanted to look at the website, I would make sure to have the windows from the pelican-devserver.bat script up and running. Behind the scenes, the pelican-autobuild.bat and pelican-server.bat scripts were being run in their own windows, the first script building the site on any changes and the second script serving the newly changed content. As long as I am developing the site or an article, that workflow is good and efficient.

When generating the output for the actual website, I felt that I needed a different workflow. As that act of publishing is a very deliberate act, my feeling is that it should be more controlled than automatically building the entire site on each change. Ideally, I want to be able to proof a group of changes to the website before making those changes public.

One of the major reasons for the deliberate workflow is that, from experience, the generation of anything production grade relies on some form of configuration that is specific to the thing you are producing. For my website, this needs extra testing specifically around that production configuration before I am confident in publishing it.

The most immediate example of such configuration is the SITEURL configuration variable. While it was not obvious in the examples that I researched, this variable must be set to the actual base URL of the hosting site. When it is left empty with the Elegant theme, if you click on the Categories button in the header, and then the Home button, it will stay on the Categories page. Looking more closely at the source for the generated page, the Home button contains a URL that defaults to ''. Digging into the template for the base.html page, the value being set for the anchor of that button is href="{{ SITEURL }}".

Hence, for the Home button to work properly, SITEURL needs a proper value. The default value of SITEURL in pelicanconf.py is '', so that needed to be changed. For the developer website to work properly, SITEURL must be set to 'http://localhost:8000' in pelicanconf.py. This however introduces a new issue: how do I make sure this variable is set properly when I publish the output?

Luckily, the Pelican developers thought of situations like this. Back in the second article of this series, Step 4: Create a Generic Web Site, I mentioned a file called publishconf.py. This file was generated as part of the output of pelican-quickstart and has not been mentioned since. This file is intended to be used as part of a publish workflow, allowing the variables from pelicanconf.py to be overridden.

Specifically, in that file, the following code imports the settings from pelicanconf.py before defining alternate values for them:

import os
import sys
sys.path.append(os.path.abspath(os.curdir))
from website.pelicanconf import *

Below this part of the configuration, in the same manner as in pelicanconf.py, the SITEURL variable in publishconf.py is set to ''. Therefore, when I publish the website with the publish configuration, it will use '' for SITEURL. To make sure the website publishes properly, I needed to change the SITEURL variable in publishconf.py to reflect the site I am publishing to, namely https://jackdewinter.github.io .
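To illustrate how that import-then-override pattern behaves, here is a small self-contained Python sketch. The module names demo_pelicanconf and demo_publishconf, the SITENAME value, and the load_publish_settings helper are stand-ins I made up for illustration; they are not the real Pelican files, which live in the website directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for pelicanconf.py: the development settings.
BASE_SETTINGS = (
    "SITEURL = 'http://localhost:8000'\n"
    "SITENAME = 'Example Site'\n"
)

# Hypothetical stand-in for publishconf.py: import everything from the
# development settings, then redefine only what differs when publishing.
PUBLISH_SETTINGS = (
    "from demo_pelicanconf import *\n"
    "SITEURL = 'https://jackdewinter.github.io'\n"
)


def load_publish_settings():
    """Write both demo modules to a temp directory and import the publish one."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_pelicanconf.py").write_text(BASE_SETTINGS)
        Path(tmp, "demo_publishconf.py").write_text(PUBLISH_SETTINGS)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_publishconf")
        finally:
            sys.path.remove(tmp)
        # SITEURL comes from the override, SITENAME from the base settings.
        return module.SITEURL, module.SITENAME


print(load_publish_settings())
```

The point of the sketch is that any setting not redefined after the import keeps its development value, so the publish file only needs to list the differences.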

Now that I took care of that, I just needed to come up with a batch script that makes use of publishconf.py. To accomplish that, I simply copied the pelican-build.bat script to pelican-publish.bat, and edited the file, removing the --debug flag and referring to publishconf.py instead of pelicanconf.py:

pelican --verbose --output ..\blog-output --settings website\publishconf.py website\content

To test this out, I stopped the pelican-autobuild.bat script and executed the pelican-publish.bat script. By leaving the pelican-server.bat script running, I was able to double check the published links, verifying that they were based on the jackdewinter.github.io site where I wanted to publish them.
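The manual double check could also be backed by a small script. The following Python sketch, under the assumption that the development URL is http://localhost:8000 and the output lives in ..\blog-output, lists any generated HTML files that still mention the development URL. The function name find_dev_links is hypothetical.

```python
from pathlib import Path


def find_dev_links(output_dir, dev_url="http://localhost:8000"):
    """Return generated HTML files that still contain the development URL."""
    offenders = []
    for html_file in sorted(Path(output_dir).rglob("*.html")):
        # Undecodable bytes are replaced so one odd file does not stop the scan.
        text = html_file.read_text(encoding="utf-8", errors="replace")
        if dev_url in text:
            offenders.append(html_file)
    return offenders
```

Calling find_dev_links(Path("..") / "blog-output") after running pelican-publish.bat should return an empty list if every link was generated from the publish configuration.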

Pushing the Content To The Remote¶

At this point, I had two local repositories, one with commits and one without, and two remote repositories with no information. While I wanted to see the results and work on the blog-output repository first, it was more important to make sure my work was safe in the blog-content repository. So that one would be first.

Changing into the blog-content directory and doing a git status -s, I noticed a couple of changes that were not committed. A quick git add --all and a git commit later, all of the changes were committed to the local repository. At this point, the changes are present in the local repository, but not in the remote repository. The following command will push those changes up to the remote repository’s master branch.

git push --set-upstream origin master

At this point, I did a quick check on the blog-content repository in GitHub and made sure that all of the repository was up there. Now, in the future, I knew I would be more selective than using git add --all most of the time, but for now it was a good start. So I carefully went through the files that GitHub listed and verified them manually against what was in the directory. I didn’t expect any issues, but a quick check helped with my confidence that I had set up the repository correctly.

Pushing the Output To The Remote¶

Once that was verified, I carefully repeated the same actions with the blog-output directory but with one small change. In the blog-content directory, I want to save any changes. However, with the blog-output directory, I want to commit everything, even if there are conflicts. This is something that is done with quite a few static sites, so the workflow is decently documented.

As this is an action that I am going to repeat every time I publish, I placed it in a script file called pelican-upload.bat:

pushd ..\blog-output
git add --all .
git commit -m "new files"
ssh-agent
git push origin master --force
popd

In order: switch to the blog-output directory, add all of the files, commit them with a simple reason, ensure the ssh-agent is up and running, push the committed files to the remote repository, and go back to our original directory.
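For readers who prefer Python to batch files, the same sequence can be sketched as below. This is only an approximation of pelican-upload.bat: the function names are hypothetical, the ssh-agent step is left out on the assumption that the agent is already running, and the runner parameter exists solely so the sequence can be inspected without calling Git.

```python
import subprocess


def build_upload_commands(message="new files"):
    """Return the Git commands from the upload script, in order."""
    return [
        ["git", "add", "--all", "."],
        ["git", "commit", "-m", message],
        # Overwrites the remote master branch with the local one.
        ["git", "push", "origin", "master", "--force"],
    ]


def upload(output_dir, runner=subprocess.run):
    """Run each command inside output_dir, stopping at the first failure."""
    for command in build_upload_commands():
        runner(command, cwd=output_dir, check=True)
```

Running the commands with cwd set to the output directory plays the role of pushd and popd. One difference to be aware of: with check=True, a git commit that finds nothing to commit exits with a non-zero status and stops the sequence, whereas the batch file carries on to the push.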

If that last git push looks weird, it is. It is so weird and destructive that there are a number of posts like git push --force and how to deal with it and GIT: To force-push or not to force-push. However, even after I looked at the manual page for git push, I was still trying to figure it out. It wasn’t until I came across The Dark Side of the Force Push, and specifically the Force Push Pitfalls section of that article, that things made sense.

With the new script in place, I ran pelican-upload.bat to push the published output to the remote repository.

Viewing the Webpage¶

To make sure things looked right, I wanted to do a side by side comparison of what I could see in my browser both locally and on the new website. To do that, I opened up one tab of my browser and pointed it to http://localhost:8000/, and another tab beside it and pointed it to https://jackdewinter.github.io/. To be honest, while I was hoping there would be no issues, I was expecting at least 1-2 items to be different. However, as I went through the comparison, there was 100% parity between the two versions of the website.

What Was Accomplished¶

At the beginning of this article, I had most of what I needed to start selecting a theme. It took some small updates to the configuration to make sure I had a good test site available. This was critical to allowing me to go through each theme I was interested in and see if it was what I was looking for. While one of the themes proved to be a handful, the experience was good in advising me of possible issues I might have in customizing my own site.

In the end, I made a confident choice of the Elegant theme, which, as added benefits, is actively being developed and has great documentation.

So what do you think? Did I miss something? Is any part unclear? Leave your comments below.
