Nov 2021
If you're a Django developer and want to publish a website without the hassle (and costs) of deploying a web app, then this post may give you some useful tips.
I found myself in this situation several times, so I've created a time-saving workflow/set of tools for extracting a dynamic Django website into a static website (= a website that does not require a web application, just plain simple HTML pages).
Disclaimer: this method is not suited for all types of websites. E.g. if your Django application is updated frequently (more than once a day, say), or if it has keyword search (or faceted search) pages that inherently rely on dynamic queries to the Django back-end based on user input, then a static site most likely won't cut it for you.
In a nutshell, this is how it works:

- The site content is authored as Markdown files (blog posts) plus the usual Django templates (database-driven pages).
- The Django app runs on localhost and is used to review/style these pages.
- Wget extracts all the pages from the local site into a plain static-HTML copy.
- The static copy is published online via GitHub Pages.

All of the steps happen on your local computer. Keep reading to find out more about each of these steps.
It's useful to think of a concrete use-case, so I'll be talking about my personal site www.michelepasin.org (the site you're on right now) which happens to be the first site I've published following this method.
My website generally has two types of pages: static text pages (e.g. blog posts, which I author as Markdown files) and dynamic pages (e.g. projects and papers, which are generated from database records via Django templates).
For the static text pages, I basically have a folder on my computer full of Markdown files, which I can edit using any text editor I want. E.g. lately I've been using Obsidian (screenshot below) for taking notes and editing Markdown documents. It's excellent (see this review). So I keep all the blog posts in a folder (outside the Django app) that is monitored by Obsidian (a 'vault', as Obsidian calls it).
As a result, I can make use of Obsidian's advanced Markdown editing features.
Plus, it fits nicely into my daily routine, because the posts live alongside other daily notes that are not meant for sharing. This is super cool because I have my entire blog archive accessible in my note-taking app, which kind of blurs the line between published and private notes.
For the dynamic pages, I just use good old Django templates (the usual Django way of doing things).
There's nothing new here - if you are a Django developer, you'll know immediately what I'm talking about: HTML, template tags, etc.
This approach allows me to have a clean separation between content I need a database for - that is, content I want to catalog and organize methodically (e.g. projects, papers, etc…) - and other textual content that I'd rather edit outside the web app, for example using my text editor of choice.
So, in a nutshell: the Django web app is where things come together. The idea is that you run the Django app as usual and see the results on localhost.
$ python manage.py runserver
The interesting bit has to do with how the Markdown / static text pages get integrated into the website. There are two parts: markdown indexing and markdown rendering.
A command line script creates an index of all the Markdown files in the blog posts folder. This index is basically a collection of metadata about these files (e.g. title, publication date, tags, file location on the computer), which gets stored in the Django back-end database.
As a result, this metadata can be used by the Django web app in order to retrieve / search / filter the Markdown files, via the usual model-view-controller machinery Django provides.
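As a rough illustration, the core of the indexing step could look something like this. This is a hypothetical sketch, not the actual script: the Publication fields (permalink, md_file) match the view code shown further down, but the filename convention, the import path, and the omission of the other metadata (title, date, tags) are simplifying assumptions.

```python
import os

from myapp.models import Publication  # hypothetical import path

BLOGS_ROOT = "/path/to/my/markdown/blog/"

def reindex_blogs():
    """Scan the blog folder and upsert one Publication record per file."""
    for fname in sorted(os.listdir(BLOGS_ROOT)):
        if not fname.endswith(".md"):
            continue
        # assume filenames look like 2021-10-29-django-wget-static-site.md
        year, month, day, slug = fname[:-len(".md")].split("-", 3)
        Publication.objects.update_or_create(
            permalink=f"{year}/{month}/{day}/{slug}",
            defaults={"md_file": fname},
        )
```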
Here is a sample run of the interactive script, called blogs-reindex:
$ ./tools/blogs-reindex
+++++++++++++++++
REFRESHING BLOGS INDEX...
+++++++++++++++++
Environment: local
DEBUG:True
+++++++++++++++++
Reading... </Users/michele.pasin/markdown/Blog/>
=> Found new markdown file: 2021-10-29-django-wget-static-site.md.
Add to database? [y/N]: y
=> Created new obj: {pub}
# Files read: 340
# Records added: 1
# Records modified: 0
Cleaning db...
# Records in db : 340
# Markdown files: 340
----------
Done
See the full source code on GitHub.
The second part of this architecture simply consists of the Python functions that transform Markdown into HTML. Of course, that follows the usual Django pattern, i.e. a url controller and related view functions that know how to handle the Markdown format:
#
# in settings.py
#
BLOGS_ROOT = "/path/to/my/markdown/blog/"

#
# in views.py
#
import markdown
from django.shortcuts import get_object_or_404, render

# Publication (the model holding the posts' metadata), parse_markdown
# (the author's helper, linked below) and APP (the app name) are
# imported/defined elsewhere in the full source.

def blog_detail(request, year="", month="", day="", namedetail=""):
    """
    Generate the blog entry page from markdown files
    """
    permalink = f"{year}/{month}/{day}/{namedetail}"
    return_item = get_object_or_404(Publication, permalink=permalink)
    # get the contents from the source MD file
    # NOTE the filepath is stored in the `md_file` field
    TITLE, DATE, REVIEW, PURE_MARKDOWN = parse_markdown(BLOGS_ROOT +
                                                        return_item.md_file)
    html_blog_entry = markdown.markdown(PURE_MARKDOWN,
                                        extensions=['fenced_code', 'codehilite'])
    admin_change_url = ""  # link to the admin edit page (elided in this excerpt)
    context = {
        'return_item': return_item,
        'admin_change_url': admin_change_url,
        'blog_entry': html_blog_entry,
    }
    template = "detail-blogs.html"
    return render(request, APP + '/pages/' + template, context)

#
# in urls.py (inside urlpatterns)
#
url(r'^blog/(?P<year>[\w-]+)/(?P<month>[\w-]+)/(?P<day>[\w-]+)/(?P<namedetail>[\w-]+)/$',
    views_pubs.blog_detail, name='blogs-detail'),
The code above uses the python-markdown library in order to transform Markdown text to HTML. For more details, see the source code of the parse_markdown function.
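Just to give an idea, a parse_markdown-style helper could be as simple as the following. This is a minimal sketch assuming each post starts with a 'key: value' metadata header terminated by a blank line; the author's actual implementation is in the linked source.

```python
def parse_markdown(filepath):
    """Return (title, date, review, body) from a Markdown post.
    Hypothetical sketch: assumes a 'key: value' header block
    terminated by the first blank line."""
    meta = {}
    with open(filepath, encoding="utf-8") as f:
        lines = f.read().splitlines()
    body_start = 0
    for i, line in enumerate(lines):
        if not line.strip():  # a blank line ends the metadata header
            body_start = i + 1
            break
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return (meta.get("title", ""), meta.get("date", ""),
            meta.get("review", ""), "\n".join(lines[body_start:]))
```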
The final step is publishing the website online. For this I'm using Wget to generate a static version of the site, and GitHub pages to make it available online at no cost.
This is done via another command line script: ./tools/site-dump-and-publish. The command does four main things:

1. Backs up the current static site, the database contents, and the Markdown source files.
2. Runs wget on the Django app running locally, so as to pull all the site pages as a static website (note: hyperlinks must be 'mirrored').
3. Rsyncs the freshly downloaded pages into the docs folder (which I configured in GitHub Pages as the site source location).
4. Commits and pushes the result to GitHub.

See the full command source code.
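For a sense of how those four steps could be orchestrated, here is a rough Python sketch. All paths, flags, and directory names are assumptions based on the sample run below, not the author's actual script.

```python
import os
import shutil
import subprocess
from datetime import datetime

# Hypothetical sketch of the dump-and-publish steps.
BACKUP_DIR = "backups/" + datetime.now().strftime("%Y_%m_%d__%I_%M_%S_%p").lower() + "/"

def run(cmd):
    """Run a shell command, echoing it first."""
    print(">", " ".join(cmd))
    subprocess.run(cmd, check=False)  # e.g. wget exits non-zero on minor errors

# 1. back up the current static site, the DB data and the Markdown sources
os.makedirs(BACKUP_DIR + "data", exist_ok=True)
run(["rsync", "-av", "docs/", BACKUP_DIR + "docs/"])
run(["python", "src/manage.py", "dumpdata", "-o", BACKUP_DIR + "data/dump.json"])
run(["rsync", "-av", "/path/to/markdown/Blog/", BACKUP_DIR + "md/"])

# 2. pull the locally running site into a clean temp build directory
#    (see the wget section below for what the various flags do)
shutil.rmtree("_build", ignore_errors=True)
run(["wget", "--mirror", "--convert-links", "--adjust-extension",
     "--page-requisites", "--no-host-directories",
     "--directory-prefix=_build", "http://127.0.0.1:8000/"])

# 3. sync the build into the GitHub Pages source folder
run(["rsync", "-av", "--delete", "_build/", "docs/"])

# 4. commit and push
run(["git", "add", "docs"])
run(["git", "commit", "-m", "live site auto update"])
run(["git", "push"])
```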
Here is a sample run:
$ ./tools/site-dump-and-publish
=================
PREREQUISITE
Ensure site is running on 127.0.0.1:8000...
=================
>>>>>>> DJANGO is RUNNING!
=================
=================
Dumping site..
=================
CREATE BACKUP DIRS in: backups/2021_10_28__05_31_49_pm/
+++++++++++++++++
BACKING UP SITE....RSYNC /DOCS TO /backups/2021_10_28__05_31_49_pm/docs
+++++++++++++++++
building file list ... done
...
... [OMITTED]
...
sent 101359148 bytes received 25982 bytes 40554052.00 bytes/sec
total size is 101243272 speedup is 1.00
BACKING UP DB DATA ....src/manage.py dumpdata TO /backups/2021_10_28__05_31_49_pm/data
+++++++++++++++++
Environment: local
DEBUG:True
+++++++++++++++++
DONE: backups/2021_10_28__05_31_49_pm/data/dump.json
BACKING UP MARKDOWN FILES from /Users/michele.pasin/Dropbox/Apps/NVALT/001/Blog TO /backups/2021_10_28__05_31_49_pm/md
+++++++++++++++++
DONE: backups/2021_10_28__05_31_49_pm/md
CLEAN UP TEMP BUILD DIR..
+++++++++++++++++
WGET SITE INTO TEMP BUILD DIRECTORY..
+++++++++++++++++
...
... [OMITTED]
...
FINISHED --2021-10-28 17:34:37--
Total wall clock time: 2m 44s
Downloaded: 878 files, 96M in 0.8s (120 MB/s)
RSYNC TEMP BUILD DIRECTORY INTO FINAL LOCATION: /docs
+++++++++++++++++
...
... [OMITTED]
...
sent 19574621 bytes received 17352 bytes 13061315.33 bytes/sec
total size is 101243252 speedup is 5.17
=================
Commit and push
=================
[master dd479c70] live site auto update
32 files changed, 1363 insertions(+), 49 deletions(-)
delete mode 100755 tools/data-dump
Enumerating objects: 220, done.
Counting objects: 100% (220/220), done.
Delta compression using up to 8 threads
Compressing objects: 100% (85/85), done.
Writing objects: 100% (116/116), 20.55 KiB | 1.37 MiB/s, done.
Total 116 (delta 63), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (63/63), completed with 51 local objects.
To https://github.com/lambdamusic/portfolio-site.git
8556da94..dd479c70 master -> master
Done
=================
wget
GNU Wget is your friend. It's nearly as old as the web itself (it dates back to 1996) and is still one of the most powerful tools for extracting web pages.
Trivia
In 2010, Chelsea Manning used Wget to download the 250,000 U.S. diplomatic cables and 500,000 Army reports that were sent to WikiLeaks and came to be known as the Iraq War logs and Afghan War logs. (Wikipedia)
Here are some powerful options as described in the manual:
- --mirror – makes (among other things) the download recursive.
- --convert-links – converts all the links (including to things like CSS stylesheets) to relative ones, so the result is suitable for offline viewing.
- --adjust-extension – adds suitable extensions to filenames (html or css) depending on their content-type.
- --page-requisites – downloads things like CSS style-sheets and images required to properly display the page offline.
- --no-parent – when recursing, do not ascend to the parent directory. It's useful for restricting the download to only a portion of the site.
- --no-host-directories – disables the generation of host-prefixed directories. By default, invoking Wget with '-r http://fly.srk.fer.hr/' will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
- --wait – waiting time between calls. Set reasonable or random waiting times between two downloads, to avoid the 'Connection closed by peer' error.
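Combined, the options above amount to an invocation along these lines (sketched here via Python's subprocess; the build directory name is hypothetical and the author's exact flags may differ):

```python
import subprocess

subprocess.run([
    "wget",
    "--mirror",               # recursive download (with timestamping etc.)
    "--convert-links",        # rewrite links as relative, for offline viewing
    "--adjust-extension",     # add .html/.css extensions based on content-type
    "--page-requisites",      # also fetch CSS, images, and other page assets
    "--no-parent",            # never ascend above the start URL
    "--no-host-directories",  # don't nest everything under a 127.0.0.1:8000/ dir
    "--wait=1",               # pause between requests to avoid dropped connections
    "--directory-prefix=_build",  # hypothetical temp build directory
    "http://127.0.0.1:8000/",
], check=False)  # wget exits non-zero if any single page fails
```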
I used GitHub Pages to host my static site, but there are many other no-cost options out there, e.g. GitLab or Netlify. So take your pick.
Of course, there are size limits too. E.g. with GitHub Pages the maximum size for a repo (and hence a site) is 1 GB.
Also, there might be constraints on how frequently you can publish the site, although I haven't run into any problems with that so far.
In this post I've shown how to turn a Django site into a static website, so that it can be published online without the hassle (and costs) of deploying a web app.
Most likely, this method will resonate primarily with Django developers, but even if you're not one, I hope some of these ideas can be easily transposed to other Python web frameworks.
I found a couple of other options online that allow turning a Django site into a static website. None of them worked for me, but they may work for you, so they're worth having a look at:
Feedback/comments are welcome, here or on GitHub as usual :-)