Changelog

I have begun keeping a sporadic list of things I change on the site as a timeline/chronology of its evolution. This is primarily for my own use so I can quickly glance back and glean some context about when and why I might have made a change. I’ve opted to post it publicly to the site in case anyone else finds it interesting.

Add dataclasses to ratchet up the strictness

In continuing to ratchet up the strictness of the build system I landed on the concept of dataclasses, which are new to this novice programmer, though the concept of a schema for validation is not.

Anyway, I ended up creating one dataclass for Documents and another for Assets.

@dataclass
class DocumentMetadata:
    filepath: Path
    uid: str
    slug: str
    title: str
    primary: str
    secondary: str
    available: datetime.datetime
    created: datetime.datetime
    updated: datetime.datetime
    creator: str = ""
    note: str = ""
    favourite: bool = False
    parent: str = ""
    description: str = ""
    layout: str = TEMPLATE_DEFAULT
    source: Dict = field(default_factory=dict)
    via: Dict = field(default_factory=dict)
    location: Dict[str, Any] = field(
        default_factory=lambda: {
            "continent": "",
            "country": "",
            "region": "",
            "city": "",
            "note": "",
            "lat": int,
            "lng": int,
        }
    )
    collection: Dict[str, Any] = field(
        default_factory=lambda: {
            "style": "title",
            "order": "chronological",
            "include": [],
        }
    )
    attribution: Dict[str, str] = field(
        default_factory=lambda: {
            "plain": "",
            "djot": "",
            "html": "",
        }
    )
    media: str = "application/toml"
    words: Dict[str, Any] = field(
        default_factory=lambda: {
            "self": 0,
            "code": {
                "lines": 0,
                "words": 0,
            },
            "references": 0,
        }
    )
    status: str = ""
    links: LinksDict = field(
        default_factory=lambda: {
            "internal": list(),
            "external": list(),
            "backlinks": list(),
        }
    )
    options: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    content: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Validate links dictionary structure
        required_link_types = {"internal", "external", "backlinks"}
        if (
            not isinstance(self.links, dict)
            or set(self.links.keys()) != required_link_types
        ):
            raise ValueError(
                f"links must be a dictionary with exactly these keys: {required_link_types}"
            )
        for key in self.links:
            if not isinstance(self.links[key], set):
                self.links[key] = set(self.links[key])


@dataclass
class AssetMetadata:
    filepath: Path
    media: str
    uid: str
    slug: str
    title: str
    available: datetime.datetime
    created: datetime.datetime
    updated: datetime.datetime
    creator: str = ""
    note: str = ""
    favourite: bool = False
    source: Dict = field(default_factory=dict)
    via: Dict = field(default_factory=dict)
    hash: str = ""
    output_width: int = 0
    output_height: int = 0
    location: Dict[str, Any] = field(
        default_factory=lambda: {
            "continent": "",
            "country": "",
            "region": "",
            "city": "",
            "note": "",
            "lat": int,
            "lng": int,
        }
    )
    attribution: Dict[str, str] = field(
        default_factory=lambda: {
            "plain": "",
            "djot": "",
            "html": "",
        }
    )
    words: Dict[str, Any] = field(
        default_factory=lambda: {
            "self": 0,
            "code": {
                "lines": 0,
                "words": 0,
            },
            "references": 0,
        }
    )
    links: LinksDict = field(
        default_factory=lambda: {
            "internal": list(),
            "external": list(),
            "backlinks": list(),
        }
    )
    tags: List[str] = field(default_factory=list)
    content: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Validate links dictionary structure
        required_link_types = {"internal", "external", "backlinks"}
        if (
            not isinstance(self.links, dict)
            or set(self.links.keys()) != required_link_types
        ):
            raise ValueError(
                f"links must be a dictionary with exactly these keys: {required_link_types}"
            )
        for key in self.links:
            if not isinstance(self.links[key], set):
                self.links[key] = set(self.links[key])
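
To make the payoff concrete, constructing an instance now either normalises the links lists into sets or fails loudly. A small usage sketch, with invented values (the real values come from each document’s TOML front matter, and this assumes the class and imports exactly as defined above):

import datetime
from pathlib import Path

# Hypothetical front-matter values, purely to exercise __post_init__
doc = DocumentMetadata(
    filepath=Path("notes/example.md"),
    uid="0a1b2c3d-0000-4000-8000-000000000000",
    slug="notes/example",
    title="An example note",
    primary="notes",
    secondary="example",
    available=datetime.datetime(2025, 1, 1),
    created=datetime.datetime(2025, 1, 1),
    updated=datetime.datetime(2025, 1, 1),
)

# The default links lists are coerced to sets by __post_init__ ...
assert isinstance(doc.links["backlinks"], set)

# ... and a links dict without exactly internal/external/backlinks raises ValueError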

This turned up a couple of oversights in the metadata of existing documents, and also encouraged me to restructure the final built metadata; see the entries on backlinks and word counting.

Speaking of backlinks, I coupled this work with my ongoing effort to make the building of the site completely deterministic. The builds are already deterministic in every way that matters, in that the same inputs produce the same output, except in one respect: because documents are built asynchronously, it has been possible for the order of backlink references to change between builds. This doesn’t alter any functionality or correctness, but it does mean that the order of the backlinks in the hidden build summary of each page could change, which makes diffing the whole site to spot regressions noisier, since many of the differences were not really differences at all.

Solving this turned out to be easy once I realised that a few pages kept seeing the order shift because they referenced pages that had the same publish (available) date, which is the key for the sort order. With that remedied, determinism is assured.
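
For reference, the ordering itself is just a sort on the publish (available) date of each linking page, as the diff below shows. A sketch, not the code as committed, of how a uid tie-breaker would have pinned the order even with duplicate dates:

# Backlinks are ordered by publish (available) date, newest first. Python's
# sort is stable, so pages sharing a date kept whatever order asyncio.gather
# happened to produce. Adding the uid as a secondary key removes the tie:
documents[key].links["backlinks"] = sorted(
    documents[key].links["backlinks"],
    key=lambda x: (documents[x].available, x),
    reverse=True,
)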

For a glimpse at the surface area of the changes, click here to see a diff from implementing the first dataclass, though more changes have landed since.
 # Imports
 from collections import Counter
-from dataclasses import dataclass, field
+from dataclasses import dataclass, field, asdict
 from hashlib import md5
 from pathlib import Path
 from shutil import copyfile
 from subprocess import run
 from typing import List, Dict, Any, Tuple
+from typing_extensions import TypedDict
 import asyncio
 import datetime
 import logging
@@ -73,11 +74,11 @@ class SiteMetadata:
     words: Dict[str, int] = field(
         default_factory=lambda: {"drafts": 0, "references": 0, "self": 0}
     )
-    links: Dict[str, set] = field(
+    links: Dict[str, Any] = field(
         default_factory=lambda: {
-            "internal": set(),
+            "internal": list(),
+            "backlinks": list(),
             "external": set(),
-            "backlinks": set(),
         }
     )
     pagecount: int = 0
@@ -91,9 +92,15 @@ class SiteMetadata:
     slug_to_title_lookup: Dict[str, str] = field(default_factory=dict)


+class LinksDict(TypedDict):
+    internal: list[str]
+    external: list[str]
+    backlinks: list[str]
+
+
 @dataclass
 class DocumentMetadata:
-    filename: Path
+    filepath: Path
     uid: str
     slug: str
     created: datetime.datetime
@@ -102,20 +109,78 @@ class DocumentMetadata:
     title: str
     primary: str
     secondary: str
+    creator: str = ""
+    note: str = ""
+    favourite: bool = False
+    parent: str = ""
+    description: str = ""
+    layout: str = TEMPLATE_DEFAULT
+    source: Dict = field(default_factory=dict)
+    via: Dict = field(default_factory=dict)
+    location: Dict = field(default_factory=dict)
+    collection: Dict[str, Any] = field(
+        default_factory=lambda: {
+            "style": "title",
+            "order": "chronological",
+            "include": [],
+        }
+    )
+    attribution: Dict[str, str] = field(
+        default_factory=lambda: {
+            "plain": "",
+            "djot": "",
+            "html": "",
+        }
+    )
+    media: str = "application/toml"
     words: int = 0
-    links: Dict[str, set] = field(
+    status: str = ""
+    links: LinksDict = field(
         default_factory=lambda: {
-            "internal": set(),
-            "external": set(),
-            "backlinks": set(),
+            "internal": list(),
+            "external": list(),
+            "backlinks": list(),
         }
     )
     options: List[str] = field(default_factory=list)
     tags: List[str] = field(default_factory=list)
-    interlinks: List[str] = field(default_factory=list)
-    backlinks: List[str] = field(default_factory=list)
     content: Dict[str, str] = field(default_factory=dict)
-    html: str = ""
+
+    def __post_init__(self):
+        # Validate required string fields are not empty
+        # for field_name in ["uid", "slug", "title", "primary", "secondary"]:
+        #    value = getattr(self, field_name)
+        #    if not isinstance(value, str) or not value.strip():
+        #        raise ValueError(f"\n\n{self}\n\n{field_name} {value} must be a non-empty string"\n)
+
+        # Validate filepath is a Path object
+        if not isinstance(self.filepath, Path):
+            self.filepath = Path(self.filepath)
+
+        ## Validate datetime fields
+        # for field_name in ["created", "updated", "available"]:
+        #    value = getattr(self, field_name)
+        #    if not isinstance(value, datetime.datetime):
+        #        raise ValueError(f"{field_name} must be a datetime object")
+        #    # Ensure timezone is None
+        #    setattr(self, field_name, value.replace(tzinfo=None))
+
+        # Validate words is non-negative
+        if not isinstance(self.words, int) or self.words < 0:
+            raise ValueError("words must be a non-negative integer")
+
+        # Validate links dictionary structure
+        required_link_types = {"internal", "external", "backlinks"}
+        if (
+            not isinstance(self.links, dict)
+            or set(self.links.keys()) != required_link_types
+        ):
+            raise ValueError(
+                f"links must be a dictionary with exactly these keys: {required_link_types}"
+            )
+        for key in self.links:
+            if not isinstance(self.links[key], set):
+                self.links[key] = set(self.links[key])


 def init_site():
@@ -132,12 +197,47 @@ def init_site():
     )


-def load_assets():
+def preprocess_asset_metadata(
+    uid: str, asset_data: Dict[str, Any], manifest_path: Path
+) -> Dict[str, Any]:
+    """Preprocess asset metadata to ensure it meets DocumentMetadata requirements."""
+    processed = asset_data.copy()
+
+    # Handle dates
+    for date_field in ["created", "updated", "available"]:
+        if isinstance(processed.get(date_field), str):
+            processed[date_field] = _parse_date(processed[date_field])
+        elif isinstance(processed.get(date_field), datetime.datetime):
+            processed[date_field] = processed[date_field].replace(tzinfo=None)
+        else:
+            processed[date_field] = datetime.datetime.now()
+
+    # Set required fields with defaults if not present
+    processed.setdefault("uid", uid)
+    processed.setdefault("primary", "asset")
+    processed.setdefault("secondary", processed["media"])
+
+    return processed
+
+
+def load_assets() -> Dict[str, DocumentMetadata]:
+    """Load asset manifests and convert them to DocumentMetadata instances."""
     assets = {}
     asset_manifests = list(ASSET_DIR.glob("manifests/*.toml"))
+
     for manifest in asset_manifests:
         with open(manifest, "rb") as f:
-            assets.update(tomllib.load(f))
+            manifest_data = tomllib.load(f)
+
+        for uid, asset_data in manifest_data.items():
+            try:
+                processed_data = preprocess_asset_metadata(uid, asset_data, manifest)
+                processed_data["filepath"] = ASSET_DIR / processed_data["filepath"]
+                assets[uid] = DocumentMetadata(**processed_data)
+            except Exception as e:
+                print(f"Error processing asset {uid} from {manifest}: {str(e)}")
+                continue
+
     return assets


@@ -176,7 +276,7 @@ def get_files() -> List[Path]:
     return [f for f in NOTES_DIR.glob("**/*.md") if "available = " in f.read_text()]


-def extract_external_links(text: str) -> List:
+def extract_external_links(text: str, site) -> List:
     url_pattern = r"(https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+(?:/[^)\s]*)?)"
     matches = re.findall(url_pattern, text)

@@ -186,18 +286,21 @@ def extract_external_links(text: str) -> List:
         parsed_url = urlparse(url)
         if parsed_url.netloc.lower() != "silasjelley.com":
             external_links.add(url)
+            site.links["external"].add(url)

-    return list(external_links)
+    return sorted(external_links)


 async def process_document(
-    filename: Path, site: SiteMetadata
-) -> Tuple[str, Dict[str, Any]]:
-    with open(filename, "rb") as f:
+    filepath: Path, site: SiteMetadata
+) -> Tuple[str, DocumentMetadata]:
+    """Process a document file and return its UID and metadata."""
+
+    with open(filepath, "rb") as f:
         try:
             parsed_toml = tomllib.load(f)
         except:
-            print(filename)
+            print(filepath)
             import sys

             sys.exit(1)
@@ -205,52 +308,48 @@ async def process_document(
     # The UID is now the top-level table name
     uid = parsed_toml["uid"]

-    # Preprocess metadata (assuming this function exists and works with the new format)
-    document = preprocess_metadata(filename, parsed_toml)
+    # Process metadata into DocumentMetadata instance
+    document = preprocess_metadata(filepath, parsed_toml)

     # Calculate and update word counts
     try:
-        document["words"] = len(document["content"]["plain"].split())
+        document.words = len(document.content.get("plain", "").split())
     except:
-        document["words"] = 0
+        document.words = 0

-    if document.get("status", "") == "draft":
-        site.words["drafts"] += document["words"]
+    if document.status == "draft":
+        site.words["drafts"] += document.words
     else:
         try:
-            document["source"]["words"] = len(document["source"]["text"].split())
-            site.words["references"] += document["source"]["words"]
+            source_words = len(document.source.get("text", "").split())
+            site.words["references"] += source_words
         except KeyError:
             pass
         try:
-            site.words["self"] += document["words"]
+            site.words["self"] += document.words
         except:
             pass

     # Extract external links from the plain text content
     try:
-        plain_text = document.get("content", {}).get("plain", "") + " "
-        plain_text += document.get("source", {}).get("url", "") + " "
-        plain_text += document.get("via", {}).get("url", "") + " "
-
-        external_links = extract_external_links(plain_text)
+        plain_text = (
+            document.content.get("plain", "")
+            + " "
+            + document.source.get("url", "")
+            + " "
+            + document.via.get("url", "")
+        )

-        # Store the external links in document["links"]["external"]
-        document["links"] = {
-            "internal": set(),
-            "external": set(),
-            "backlinks": set(),
-        }
-        document["links"]["external"].update(external_links)
-        site.links["external"].update(external_links)
+        external_links = extract_external_links(plain_text, site)
+        document.links["external"] = external_links
     except KeyError:
-        print(f"KeyError while compiling external links from {document['filename']}")
+        print(f"KeyError while compiling external links from {document.filepath}")
         pass

     return uid, document


-async def ingest_documents(site: SiteMetadata) -> Dict[str, Dict[str, Any]]:
+async def ingest_documents(site: SiteMetadata) -> Dict[str, Any]:
     logger.info("Ingesting files")
     file_list = get_files()
     documents = {}
@@ -258,11 +357,9 @@ async def ingest_documents(site: SiteMetadata) -> Dict[str, Dict[str, Any]]:
     slug_to_uid_lookup = {}
     site_primaries = set()
     site_secondaries = set()
-    site_series = set()
-    tags = set()
     uuid_collision_lookup = []

-    tasks = [process_document(filename, site) for filename in file_list]
+    tasks = [process_document(filepath, site) for filepath in file_list]
     results = await asyncio.gather(*tasks)
     site.words["total"] = (
         site.words["drafts"] + site.words["references"] + site.words["self"]
@@ -270,13 +367,11 @@ async def ingest_documents(site: SiteMetadata) -> Dict[str, Dict[str, Any]]:

     for uid, doc in results:
         documents[uid] = doc
-        slug_to_title_lookup[doc["slug"]] = doc["title"]
-        slug_to_uid_lookup[doc["slug"]] = uid
-        site_primaries.add(doc["primary"])
-        site_secondaries.add(doc["secondary"])
-        if "series" in doc:
-            site_series.add(doc["series"])
-        tags.update(doc.get("tags") or [])
+        slug_to_title_lookup[doc.slug] = doc.title
+        slug_to_uid_lookup[doc.slug] = uid
+        site_primaries.add(doc.primary)
+        site_secondaries.add(doc.secondary)
+        site.tags += doc.tags
         uuid_collision_lookup.append(uid)

     site.slug_to_uid_lookup = slug_to_uid_lookup
@@ -284,7 +379,6 @@ async def ingest_documents(site: SiteMetadata) -> Dict[str, Dict[str, Any]]:
     check_uuid_collisions(uuid_collision_lookup)
     site.primaries = list(site_primaries)
     site.secondaries = list(site_secondaries)
-    site.tags = list(tags)
     site.pagecount = len(documents)

     logger.info(f"Ingested {site.pagecount} files")
@@ -373,14 +467,14 @@ def process_image_parallel(input_data: Tuple[Path, Path, int, str]) -> None:


 def process_assets(
-    assets: Dict[str, Dict[str, Any]], asset_dir: Path, output_dir: Path
+    assets: Dict[str, DocumentMetadata], asset_dir: Path, output_dir: Path
 ) -> None:
     logger.info("Processing assets")
     manifest_images = []

     for asset_identifier, asset_metadata in assets.items():
-        source_path = Path(asset_metadata["filepath"])
-        output_path = output_dir / asset_metadata["slug"]
+        source_path = Path(asset_metadata.filepath)
+        output_path = output_dir / asset_metadata.slug
         os.makedirs(output_path.parent, exist_ok=True)

         if not source_path.exists():
@@ -422,74 +516,52 @@ def _parse_date(date_str: str) -> datetime.datetime:
         return datetime.datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=None)


-def preprocess_metadata(filename: Path, metadata: Dict[str, Any]) -> Dict[str, Any]:
-    """Preprocesses metadata for a document, ensuring required fields exist and formatting data."""
-
-    metadata["filename"] = filename
-
-    # Validate required fields
-    required_fields = ["uid", "slug", "available", "created", "primary", "secondary"]
-    missing_fields = [field for field in required_fields if field not in metadata]
-    if missing_fields:
-        raise ValueError(
-            f"[ERROR] Document missing {', '.join(missing_fields)}\n  {filename}"
-        )
-
-    # Set default values
-    metadata.setdefault("updated", metadata["created"])
+def preprocess_metadata(filepath: Path, metadata: Dict[str, Any]) -> DocumentMetadata:
+    """Preprocesses metadata for a document and converts it to a DocumentMetadata instance."""
+    # Create a working copy to avoid modifying the input
+    processed = metadata.copy()

     # Parse date fields
     for date_field in ["created", "updated", "available"]:
-        if date_field in metadata:
-            if isinstance(metadata[date_field], str):
-                metadata[date_field] = _parse_date(metadata[date_field])
-            elif isinstance(metadata[date_field], datetime.datetime):
-                metadata[date_field] = metadata[date_field].replace(tzinfo=None)
-
-    # Process source information
-    if "source" in metadata:
-        if "via" in metadata:
-            metadata.update(
-                process_source_information(metadata["source"], metadata["via"])
-            )
-        else:
-            metadata.update(process_source_information(metadata["source"], {}))
+        if isinstance(processed.get(date_field), str):
+            processed[date_field] = _parse_date(processed[date_field])
+        elif isinstance(processed.get(date_field), datetime.datetime):
+            processed[date_field] = processed[date_field].replace(tzinfo=None)
+
+    # Set default updated time if not provided
+    processed.setdefault("updated", processed.get("created"))
+
+    # Process source information if present
+    if "source" in processed:
+        processed["attribution"] = process_source_information(
+            processed["source"], processed.get("via", {})
+        )
+    else:
+        processed["attribution"] = {}
+        processed["source"] = {}
+
+    if "via" not in processed:
+        processed["via"] = {}

     # Determine title
-    metadata["title"] = (
-        metadata.get("title")
-        or metadata.get("attribution", {}).get("plain")
-        or metadata["created"].strftime("%B %e, %Y %-I.%M%p")
+    processed["title"] = (
+        processed.get("title")
+        or processed.get("attribution", {}).get("plain")
+        or processed["created"].strftime("%B %e, %Y %-I.%M%p")
     )
-    # Ensure title and slug are strings
-    metadata["title"] = str(metadata["title"])
-    if metadata.get("status") == "draft":
-        metadata["slug"] = "drafts/" + metadata["uid"]
-        metadata["parent"] = "a26221ae-c742-4b73-8dc6-7f8807456a1b"
-    else:
-        metadata["slug"] = str(metadata["slug"])

-    # Initialize interlinks, backlinks, and tags
-    metadata["interlinks"] = set()
-    metadata["backlinks"] = set()
-    metadata["tags"] = metadata.get("tags") or []
+    # Handle draft status
+    if processed.get("status") == "draft":
+        processed["slug"] = f"drafts/{processed['uid']}"

-    # Strip whitespace from plain content
-    try:
-        metadata["content"]["plain"] = metadata["content"]["plain"].strip()
-    except KeyError:
-        pass
-    try:
-        metadata["source"]["text"] = metadata["source"]["text"].strip()
-    except KeyError:
-        pass
+    # Add filepath as it's required but comes from function parameter
+    processed["filepath"] = filepath

-    return metadata
+    # Create and return DocumentMetadata instance
+    return DocumentMetadata(**processed)


-def process_source_information(
-    source: Dict[str, Any], via
-) -> Dict[str, Dict[str, str]]:
+def process_source_information(source: Dict[str, Any], via) -> Dict[str, str]:
     creator = source.get("creator") or source.get("director")
     title = source.get("title") or (
         " ♫ " + str(source.get("track"))
@@ -587,13 +659,11 @@ def process_source_information(
             partsvia = f" ([via]({escape_url(via["url"])}))"

     return {
-        "attribution": {
-            "plain": f"{speaker}{', '.join(partsplain + partsshared)}",
-            "djot": f"{speaker}{', '.join(partsdjot + partsshared)}",
-            "html": format_rich_attribution(
-                " — " + f"{speaker}{', '.join(partshtml + partsshared) + partsvia}"
-            ),
-        }
+        "plain": f"{speaker}{', '.join(partsplain + partsshared)}",
+        "djot": f"{speaker}{', '.join(partsdjot + partsshared)}",
+        "html": format_rich_attribution(
+            " — " + f"{speaker}{', '.join(partshtml + partsshared) + partsvia}"
+        ),
     }


@@ -619,8 +689,8 @@ def check_uuid_collisions(uuid_list):


 def insert_substitutions(
-    documents: Dict[str, Dict[str, Any]],
-    assets: Dict[str, Dict[str, Any]],
+    documents: Dict[str, DocumentMetadata],
+    assets: Dict[str, DocumentMetadata],
     site: SiteMetadata,
 ) -> None:
     logger.info("Performing substitutions")
@@ -635,53 +705,56 @@ def insert_substitutions(
     merged_data = {**documents, **assets}

     for key, page in documents.items():
-        logger.debug(f"  {key}, {page['title'][:40]}")
+        logger.debug(f"  {key}, {page.title[:40]}")

-        text = page.get("content", {}).get("plain")
+        text = page.content.get("plain")
         if text:
             text = replace_import_references(text, REF_IMPT_RE, merged_data, key, page)
             text = replace_cite_references(text, REF_CITE_RE, merged_data)
             text = replace_title_references(text, REF_TITLE_RE, merged_data)
             text = replace_slug_references(text, REF_SLUG_RE, merged_data)
             text = process_reference_links(text, REF_LINK_RE, merged_data, key)
-            page["content"]["plain"] = text.strip()
+            page.content["plain"] = text.strip()


 def replace_slug_references(
-    text: str, regex: re.Pattern, merged_data: Dict[str, Dict[str, Any]]
+    text: str, regex: re.Pattern, merged_data: Dict[str, DocumentMetadata]
 ) -> str:
     for match in regex.finditer(text):
         ref_type, ref_short_id = match.groups()
         full_match = match.group(0)
         ref_id = next((k for k in merged_data if k.startswith(ref_short_id)), None)
         if ref_id:
-            replacement = f"/{merged_data[ref_id]['slug']}"
+            try:
+                replacement = f"/{merged_data[ref_id].slug}"
+            except AttributeError:
+                replacement = f"/{merged_data[ref_id].slug}"
             text = text.replace(full_match, replacement)
     return text


 def replace_title_references(
-    text: str, regex: re.Pattern, merged_data: Dict[str, Dict[str, Any]]
+    text: str, regex: re.Pattern, merged_data: Dict[str, DocumentMetadata]
 ) -> str:
     for match in regex.finditer(text):
         opening, ref_type, ref_short_id, comment, closing = match.groups()
         full_match = match.group(0)
         ref_id = next((k for k in merged_data if k.startswith(ref_short_id)), None)
         if ref_id:
-            replacement = merged_data[ref_id]["title"]
+            replacement = merged_data[ref_id].title
             text = text.replace(full_match, replacement)
     return text


 def replace_cite_references(
-    text: str, regex: re.Pattern, merged_data: Dict[str, Dict[str, Any]]
+    text: str, regex: re.Pattern, merged_data: Dict[str, DocumentMetadata]
 ) -> str:
     for match in regex.finditer(text):
         opening, ref_type, ref_short_id, comment, closing = match.groups()
         full_match = match.group(0)
         ref_id = next((k for k in merged_data if k.startswith(ref_short_id)), None)
         if ref_id:
-            replacement = f"[{merged_data[ref_id]["attribution"]["djot"]}] (/{merged_data[ref_id]["slug"]})"
+            replacement = f"[{merged_data[ref_id].attribution["djot"]}] (/{merged_data[ref_id].slug})"
             text = text.replace(full_match, replacement)
     return text

@@ -689,29 +762,32 @@ def replace_cite_references(
 def replace_import_references(
     text: str,
     regex: re.Pattern,
-    merged_data: Dict[str, Dict[str, Any]],
+    merged_data: Dict[str, DocumentMetadata],
     key: str,
-    page: Dict,
+    page: DocumentMetadata,
 ) -> str:
     for match in regex.finditer(text):
         opening, ref_type, ref_short_id, comment, closing = match.groups()
         full_match = match.group(0)
         ref_id = next((k for k in merged_data if k.startswith(ref_short_id)), None)
         if ref_id:
-            ref_text = merged_data[ref_id]["content"]["plain"]
+            ref_text = merged_data[ref_id].content["plain"]
             if ref_type == "import::":
                 replacement = ref_text
             elif ref_type == "aside::":
-                ref_title = merged_data[ref_id]["title"]
-                ref_slug = merged_data[ref_id]["slug"]
-                ref_location = merged_data[ref_id].get("location", "")
-                location_string = (
-                    " ⚕ "
-                    + ref_location.get("city")
-                    + ", "
-                    + ref_location.get("country")
-                    or ""
-                )
+                ref_title = merged_data[ref_id].title
+                ref_slug = merged_data[ref_id].slug
+                ref_location = merged_data[ref_id].location
+                if ref_location:
+                    location_string = (
+                        " ⚕ "
+                        + ref_location.get("city")
+                        + ", "
+                        + ref_location.get("country")
+                        or ""
+                    )
+                else:
+                    location_string = ""
                 replacement = (
                     f'{{.aside}}\n{':'*78}\n'
                     f'{ref_text}\\\n'
@@ -722,18 +798,22 @@ def replace_import_references(
                 )
             else:
                 raise ValueError(f"Unrecognised reference type: {ref_type}")
-            if not page.get("status", "") == "draft":
-                merged_data[ref_id]["backlinks"].add(key)
+            if not page.status == "draft":
+                merged_data[ref_id].links["backlinks"].add(key)
             text = text.replace(full_match, replacement)
     return text


 def process_reference_links(
-    text: str, regex: re.Pattern, merged_data: Dict[str, Dict[str, Any]], key: str
+    text: str, regex: re.Pattern, merged_data: Dict[str, DocumentMetadata], key: str
 ) -> str:
     for ref_text_match, _, ref_type, ref_short_id in regex.findall(text):
         match = f"[{ref_text_match}]({ref_type}{ref_short_id})"
-        ref_id = next((k for k in merged_data if k.startswith(ref_short_id)), None)
+        ref_id = next(
+            (k for k in merged_data.keys() if k.startswith(ref_short_id)), None
+        )
+        if ref_id is None:
+            print(f"No match found for {ref_short_id}")

         if not ref_id:
             raise ValueError(
@@ -746,16 +826,16 @@ def process_reference_links(
             )

         ref_text = get_reference_text(ref_text_match, merged_data[ref_id])
-        ref_slug = f"/{merged_data[ref_id]['slug']}"
+        ref_slug = f"/{merged_data[ref_id].slug}"

         if ref_type == "link::":
             try:
                 # Double quotes within a page title are escaped so that they don't break the HTML 'title' element
-                ref_title = f"{merged_data[ref_id]['title']} | {merged_data[ref_id]['created'].strftime('%B %Y')}".replace(
+                ref_title = f"{merged_data[ref_id].title} | {merged_data[ref_id].created.strftime('%B %Y')}".replace(
                     '"', '\\"'
                 )
             except KeyError:
-                ref_title = merged_data[ref_id]["title"].replace('"', '\\"')
+                ref_title = merged_data[ref_id].title.replace('"', '\\"')
             replacement = f'[{ref_text}]({ref_slug}){{title="{ref_title}"}}'
         elif ref_type == "img::":
             replacement = f"[{ref_text}]({ref_slug})"
@@ -778,18 +858,18 @@ def process_reference_links(
     return text


-def get_reference_text(ref_text_match: str, ref_data: Dict) -> str:
+def get_reference_text(ref_text_match: str, ref_data) -> str:
     if ref_text_match.startswith("::") or ref_text_match == "":
-        return ref_data.get("title")
+        return ref_data.title
     return ref_text_match


-def create_quote_replacement(ref_data: Dict, ref_slug: str) -> str:
-    ref_src = ref_data["attribution"]["djot"]
+def create_quote_replacement(ref_data: DocumentMetadata, ref_slug: str) -> str:
+    ref_src = ref_data.attribution["djot"]
     try:
-        ref_text = ref_data["source"]["text"].replace("\n\n", "\n> \n> ").strip()
+        ref_text = ref_data.source["text"].replace("\n\n", "\n> \n> ").strip()
     except:
-        print(f"Error creating quote replacement: {ref_data["uid"]}")
+        print(f"Error creating quote replacement: {ref_data.uid}")
         import sys

         sys.exit()
@@ -808,10 +888,10 @@ Your browser does not support the video tag.
 def generate_html(documents):
     logger.info("Generating HTML")
     for key, page in documents.items():
-        if page.get("content", {}).get("plain"):
-            page["content"]["html"] = run_jotdown(page["content"]["plain"])
-        if page.get("source", {}).get("text"):
-            page["source"]["html"] = run_jotdown(page["source"]["text"])
+        if page.content.get("plain"):
+            page.content["html"] = run_jotdown(page.content["plain"])
+        if page.source.get("text"):
+            page.source["html"] = run_jotdown(page.source["text"])


 class LedgerLexer(RegexLexer):
@@ -841,7 +921,7 @@ class LedgerLexer(RegexLexer):
     }


-def highlight_code(code: str, language: str = None) -> str:
+def highlight_code(code: str, language: str) -> str:
     """
     Highlight code using Pygments with specified or guessed language.
     """
@@ -921,20 +1001,17 @@ def build_backlinks(documents, site):
     FOOTNOTE_LINK_URL_RE = re.compile(r"\[.+?\]:\s\/(.*)", re.DOTALL)
     interlink_count = 0
     for key, page in documents.items():
-        if (
-            "nobacklinks" in page.get("options", "")
-            or page.get("status", "") == "draft"
-        ):
+        if "nobacklinks" in page.options or page.status == "draft":
             continue

-        logger.debug(page["filename"])
+        logger.debug(page.filepath)

-        text = page.get("content", {}).get("plain")
+        text = page.content.get("plain")
         # Skip if no main content
         if not text:
             continue

-        interlinks = set(documents[key]["interlinks"])
+        interlinks = set(documents[key].links["internal"])

         combined_refs = INLINE_LINK_RE.findall(text) + FOOTNOTE_LINK_URL_RE.findall(
             text
@@ -950,11 +1027,28 @@ def build_backlinks(documents, site):
                     continue
                 logger.warning(f"\nKeyError in {page['title']} ({key}): {slug}")

-        documents[key]["interlinks"] = list(interlinks)
+        documents[key].links["internal"] = interlinks
         for interlink_key in interlinks:
-            documents[interlink_key]["backlinks"].add(key)
+            documents[interlink_key].links["backlinks"].add(key)

-    return interlink_count
+    for key, page in documents.items():
+        # Sort interlinks based on published dates
+        documents[key].links["internal"] = sorted(
+            documents[key].links["internal"],
+            key=lambda x: documents[x].available,
+            reverse=True,  # Most recent first
+        )
+        # Sort backlinks based on published dates
+        documents[key].links["backlinks"] = sorted(
+            documents[key].links["backlinks"],
+            key=lambda x: documents[x].available,
+            reverse=True,  # Most recent first
+        )
+
+    """
+    TODO: REMOVE SITE.BACKLINKS in favour a 'stats' or 'count' (templates will need updating
+    """
+    site.backlinks += interlink_count


 def should_ignore_slug(slug):
@@ -966,7 +1060,7 @@ def should_ignore_slug(slug):


 def build_collections(
-    documents: Dict[str, Dict[str, Any]], site: SiteMetadata
+    documents: Dict[str, DocumentMetadata], site: SiteMetadata
 ) -> Tuple[Dict[str, List[Dict[str, Any]]], List[Dict[str, Any]]]:
     collections = {
         primary: []
@@ -978,24 +1072,24 @@ def build_collections(
     sitemap = []

     for key, page in sorted(
-        documents.items(), key=lambda k_v: k_v[1]["available"], reverse=True
+        documents.items(), key=lambda k_v: k_v[1].available, reverse=True
     ):
-        if page.get("status", "") == "draft":
+        if page.status == "draft":
             collections["cd68b918-ac5f-4d6c-abb5-a55a0318846b"].append(page)
             continue
-        elif "nofeed" in page.get("options", []):
+        elif "nofeed" in page.options:
             sitemap.append(page)
             continue
         else:
             sitemap.append(page)
             collections["everything"].append(page)
-            collections[page["primary"]].append(page)
-            collections[page["secondary"]].append(page)
+            collections[page.primary].append(page)
+            collections[page.secondary].append(page)

-            for tag in page.get("tags", []):
+            for tag in page.tags:
                 collections[tag].append(page)

-            if page["secondary"] in [
+            if page.secondary in [
                 "essays",
                 "wandering",
                 "rambling",
@@ -1008,8 +1102,8 @@ def build_collections(


 def output_html(
-    assets: Dict[str, Dict[str, Any]],
-    documents: Dict[str, Dict[str, Any]],
+    assets: Dict[str, DocumentMetadata],
+    documents: Dict[str, DocumentMetadata],
     collections: Dict[str, List[Dict[str, Any]]],
     site: SiteMetadata,
     env: Environment,
@@ -1018,7 +1112,7 @@ def output_html(
     logger.info("Generating Hypertext")

     for key, page in documents.items():
-        template_file = page.get("layout", TEMPLATE_DEFAULT)
+        template_file = page.layout
         template = env.get_template(template_file)

         collection = build_page_collection(page, collections)
@@ -1028,28 +1122,29 @@ def output_html(
             assets=assets,
             collections=collections,
             collection=collection,
-            page=page,
+            page=asdict(page),
             site=site,
         )

-        output_path = output_dir / page["slug"] / "index.html"
+        output_path = output_dir / page.slug / "index.html"
         output_path.parent.mkdir(parents=True, exist_ok=True)

         with open(output_path, "w") as f:
             f.write(output)

-        logger.debug(f"  {page['filename']} >> {output_path}")
+        logger.debug(f"  {page.filepath} >> {output_path}")


 def build_page_collection(page, collections):
     try:
         collection = [
             item
-            for include in page["collection"]["include"]
+            for include in page.collection["include"]
             for item in collections[include]
         ]
-        return sorted(collection, key=lambda x: x["available"], reverse=True)
+        return sorted(collection, key=lambda x: x.available, reverse=True)
     except KeyError:
+        print(f"Failed collection for {page.filepath}")
         return []


@@ -1126,8 +1221,7 @@ async def main():
     generate_html(documents)

     # Build backlinks and collections
-    interlink_count = build_backlinks(documents, site)
-    site.backlinks += interlink_count
+    build_backlinks(documents, site)
     collections, sitemap = build_collections(documents, site)

     # Output HTML, feeds, and sitemap
@@ -1142,7 +1236,8 @@ async def main():
     logger.info("Build complete!")
     logger.info(f"Pages: {site.pagecount}")
     logger.info(f"Words: {site.words["total"]}")
-    logger.info(f"Interlinks: {interlink_count}")
+    logger.info(f"Internal links: {site.backlinks}")
+    logger.info(f"External links: {len(site.links["external"])}")


 if __name__ == "__main__":

CSS for responsive image galleries at all viewport widths

CSS is pure mayhem, but whatever, it’s what we’ve got. For a long time I’ve had reasonably good responsive image galleries. Really the last remaining roadblock has been with those galleries where I’ve decided I need to vary the width of images for aesthetic reasons.

Well, today I improved the style behaviour in half of those cases.

The only caveat to this set of styles is that you have to be careful when using the wide-first or wide-last classes, as they can leave an orphan image out in space if you don’t consider how many images (odd or even) are in the gallery, or if they are of very different aspect ratios. But besides that I’ve found this to be pretty rock solid, if a little verbose.

Here’s the relevant cascade,

:root {
  --body-width: 820px;
  --gallery-spacing: clamp(2px, 4px, 1rem);
  --gallery-transition: 0.3s ease-in-out;
}

.gallery {
  display: grid;
  gap: var(--gallery-spacing);
  width: 100%;
  margin: 2rem 0;
}

.gallery > img,
.gallery > picture {
  width: 100%;
  height: 100%;
}

.gallery > img,
.gallery > picture img {
  width: 100%;
  height: 100%;
  object-fit: cover;
  border-radius: var(--image-radius);
  transition: transform var(--gallery-transition);
}

.gallery > img:hover,
.gallery > picture:hover img {
  transform: scale(1.01);
  cursor: zoom-in;
}

/* Dynamic columns based on number of children */
.gallery:has(> :nth-child(1):nth-last-child(1)) {
  grid-template-columns: 1fr;
}

.gallery:has(> :nth-child(1):nth-last-child(2)),
.gallery:has(> :nth-child(2):nth-last-child(1)) {
  grid-template-columns: repeat(2, 1fr);
}

.gallery:has(> :nth-child(1):nth-last-child(3)),
.gallery:has(> :nth-child(2):nth-last-child(2)),
.gallery:has(> :nth-child(3):nth-last-child(1)) {
  grid-template-columns: repeat(3, 1fr);
}

.gallery:has(> :nth-child(n + 4)) {
  grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
}

/* Layout variants */
.gallery.one-wide {
  grid-template-columns: 1fr !important;
}

.gallery.two-wide {
  grid-template-columns: repeat(2, 1fr) !important;
}

.gallery.four-wide {
  grid-template-columns: repeat(auto-fit, minmax(185px, 1fr));
}

.gallery.wide-first > :first-child,
.gallery.wide-last > :last-child {
  grid-column: 1 / -1;
  max-height: 80vh;
}

@media (max-width: 768px) {
  .gallery:has(> :nth-child(1):nth-last-child(3)),
  .gallery:has(> :nth-child(2):nth-last-child(2)),
  .gallery:has(> :nth-child(3):nth-last-child(1)) {
    grid-template-columns: repeat(2, 1fr);
  }
  .gallery.four-wide,
  .gallery:has(> :nth-child(n + 4)) {
    grid-template-columns: repeat(2, 1fr);
  }
  .gallery:has(> :nth-child(3):nth-last-child(1)) > :last-child {
    grid-column: span 2;
  }
}

@media (max-width: 480px) {
  .gallery {
    grid-template-columns: 1fr;
  }
  .gallery.four-wide,
  .gallery:has(> :nth-child(n + 4)) {
    grid-template-columns: repeat(2, 1fr);
  }

  /* Disable hover effects on touch devices */
  .gallery > img:hover,
  .gallery > picture:hover img {
    transform: none;
  }

}

/* Full viewport width variant */
.gallery.full-viewport {
  width: 100vw;
  margin-left: calc(50% - 50vw);
  margin-right: calc(50% - 50vw);
}

.gallery.full-viewport img,
.gallery.full-viewport picture {
  transform: none !important;
  border-radius: 0;
}

/* Square aspect ratio class */
.gallery.square-items > img,
.gallery.square-items > picture img {
  aspect-ratio: 1;
}
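
For context, a gallery in the rendered pages is simply a div carrying these classes around the images. A minimal, invented example of the markup these styles target:

<!-- Three images; wide-first makes the first span the full row -->
<div class="gallery wide-first">
  <picture>
    <source srcset="/photos/one.avif" type="image/avif" />
    <img alt="First image" src="/photos/one.jpg">
  </picture>
  <img alt="Second image" src="/photos/two.jpg">
  <img alt="Third image" src="/photos/three.jpg">
</div>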

Distinguishing between prose and code when counting words

I’m including more and larger code snippets in documents throughout the site — such as this one — and I didn’t like that these were being counted amongst the ‘wordcount’ of the site. I’ve always distinguished between my words and reference words — when you see a quote in one of my documents, that isn’t counted in my own wordcount — but now it’s time to distinguish more granularly between my prose and my code.

The first change was to the SiteMetadata dataclass.

@dataclass
class SiteMetadata:
...
    words: Dict[str, Any] = field(
        default_factory=lambda: {
            "self": 0,
            "drafts": 0,
            "code": {
                "lines": 0,
                "words": 0,
            },
            "references": 0,
        }
    )
...

The same block was also added to the DocumentMetadata dataclass, just without the drafts key.

The logic for tallying up the totals isn’t very elegant (I consider this a first draft), but it works. First, each document’s totals are summed,

    for key, page in documents.items():
        prose_wordcount = len(page.content.get("plain", "").split())
        references_wordcount = len(page.source.get("text", "").split())
        if page.status == "draft":
            page.words["self"] += prose_wordcount
            site.words["drafts"] += prose_wordcount
        else:
            page.words["self"] += prose_wordcount
            site.words["self"] += prose_wordcount

            site.words["references"] += references_wordcount
            page.words["references"] += references_wordcount

        logger.debug(f"  {key}, {page.title[:40]}")

And then, during syntax highlighting, the code words and lines are tallied up, and the code words are subtracted from the prose counts.

def save_code_block(match):
    leading_space = match.group(1)
    raw_html_marker = match.group(2)
    language = match.group(3)
    code = match.group(4).rstrip()
    trailing_space = match.group(5)
    code_words = len(code.split())
    code_lines = len(code.splitlines())
    page.words["code"]["lines"] += code_lines
    page.words["code"]["words"] += code_words
    site.words["code"]["lines"] += code_lines
    site.words["code"]["words"] += code_words
    # Remove the wordcount of codeblocks from the prose wordcounts
    page.words["self"] -= code_words
    site.words["self"] -= code_words

    ...

The result is exactly what I wanted, but the method isn’t super elegant.

Including this post, the current counts are as follows,

222,852 words

    136,784 of my own published words
     21,441 words of unpublished drafts
     64,627 words of quotes and reference material
      6,659 lines of code
     24,182 words of code

Include all external URLs in site statistics

The stats page has long shown a count of the number of unique external links referenced on the site. Before this change the stats showed 483 unique external links; the relevant code is contained in the snippet below,

def extract_external_links(text: str) -> List:
    url_pattern = r"(https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+(?:/[^)\s]*)?)"
    matches = re.findall(url_pattern, text)

    external_links = set()
    for url in matches:
        parsed_url = urlparse(url)
        if parsed_url.netloc.lower() != "silasjelley.com":
            external_links.add(url)

    return list(external_links)

...

try:
    plain_text = document["content"]["plain"]
    external_links = extract_external_links(plain_text)

    document["links"]["external"].update(external_links)
    site.links["external"].update(external_links)
except KeyError:
    pass

Until now I’d only included links within the main body of a page (content["plain"]), but my document format has evolved, with more and more links being structured into the [source] and [via] tables.

First I wanted to grasp what kind of a difference this would make so, as usual, I started with a shell pipeline search through the output of the build script,

rg --no-filename \
   --only-matching \
   --no-line-number \
   'http.*?://[^ \[\]"&<>)]*' \
   **/*.html | \
   sed \
     -e "s/'.*//" \
     -e '/silasjelley\.com/d' \
     -e 's,/$,,' \
     -e 's/#.*//' | \
     sort | \
     uniq | \
     wc -l

The regex for link matching was pretty coarse, so I tidied/normalised the output and removed internal links with some sed patterns, before sorting, de-duping, and counting (sort | uniq | wc -l).

The result: 907 unique external links, almost double the 483 contained purely in the main body content.

Now that number is slightly inflated as it includes links to external APIs such as https://api.maptiler.com/maps/topo-v2/{z}/{x}/{y}.png?key=APIKEY for my walk map, so our final number should be slightly lower.

This change includes the [source] and [via] tables when searching for links,

try:
    plain_text = document.get("content", {}).get("plain", "") + " "
    plain_text += document.get("source", {}).get("url", "") + " "
    plain_text += document.get("via", {}).get("url", "") + " "

    external_links = extract_external_links(plain_text)

    document["links"]["external"].update(external_links)
    site.links["external"].update(external_links)
except KeyError:
    print(f"KeyError while compiling external links from {document['filename']}")
    pass

After the change, the stats reflect 869 unique external links.

Switch image references to use picture elements and AVIF encoded files

Up to now, image references, eg. [::ImageAlt](img:​:5f39dd4e) have functioned like so:

  • The source section (between parentheses) is substituted directly from the asset manifest, in this case referenced by the first 8-chars of its UUIDv4.
  • If the Alt-text [between square brackets] begins with ‘::’, it is also replaced by the description from the corresponding asset manifest, else it is left as is.

This has worked well, but I’ve been wanting to add AVIF support in order to take advantage of its better compression performance. AVIF is not universally supported, so a fallback (eg. JPEG) is necessary, and the HTML <picture> element is well suited to this. The Djot image syntax has been bypassed; image references are now replaced directly with a final HTML snippet like so,

replacement = f"""`​`​`=html
<picture>
<source srcset="{Path(ref_slug).with_suffix(".avif")}" type="image/avif" />
<img alt="{ref_text}" src="{ref_slug}">
</picture>
`​`​`"""

The actual AVIF creation support was added using pillow-avif-plugin, a plugin for the Pillow library I was already using for compressing JPEGs. Most of the relevant code is in the process_image_parallel function,

def process_image_parallel(input_data: Tuple[Path, Path, int, str]) -> None:
    workaround_import = pillow_avif.AvifImagePlugin
    input_image, output_path, output_width, uid = input_data
    lock_path = output_path.with_suffix(".lock")
    lock = FileLock(str(lock_path))

    # Define AVIF output path
    avif_output_path = output_path.with_suffix(".avif")

    # Check if AVIF support is available
    avif_available = "AVIF" in Image.SAVE

    try:
        with lock:
            if output_path.exists() and avif_output_path.exists():
                return

            os.makedirs(output_path.parent, exist_ok=True)

            with Image.open(input_image) as im:
                original_format = im.format
                im = ImageOps.exif_transpose(im)
                output_height = int(im.size[1] * (output_width / im.size[0]))

                with im.resize(
                    (output_width, output_height), Image.Resampling.LANCZOS
                ) as output_image:
                    # Save JPEG version
                    if (
                        original_format != "JPEG"
                        and str(output_path).endswith("jpg")
                        and output_image.mode in ("RGBA", "P")
                    ):
                        output_image = output_image.convert("RGB")

                    output_image.save(output_path, quality=85, optimize=True)

                    # Save AVIF version only if support is available
                    if avif_available:
                        try:
                            if output_image.mode in ("RGBA", "P"):
                                avif_image = output_image.convert("RGB")
                            else:
                                avif_image = output_image.copy()

                            avif_image.save(
                                avif_output_path,
                                format="AVIF",
                                quality=60,  # Lower quality for better compression, still maintains good visual quality
                                speed=5,  # Encoder speed/effort trade-off (0 is slowest/best compression, 10 is fastest)
                                bits=10,  # Use 10-bit color depth for better quality-to-size ratio
                                compress_level=8,  # Highest compression level (range 0-8)
                                color_space="bt709",  # Use YUV BT.709 color space
                                chroma=0,  # 4:4:4 chroma sampling (0=4:4:4, 1=4:2:0, 2=4:2:2)
                                num_threads=0,  # Use all available CPU threads for encoding
                            )
                            logger.debug(
                                f"Processed image: {input_image} -> {output_path} and {avif_output_path}"
                            )
                        except Exception as e:
                            logger.error(
                                f"Error saving AVIF version of {input_image}: {e}"
                            )
                    else:
                        logger.error(
                            "AVIF support not available. Skipping AVIF conversion."
                        )
                        logger.debug(f"Processed image: {input_image} -> {output_path}")

    except OSError as e:
        logger.error(f"OS error processing {input_image}: {e}")
    except Exception as e:
        logger.error(f"Error processing {input_image}: {e}")
    finally:
        if lock_path.exists():
            try:
                lock_path.unlink()
            except OSError:
                pass

It was also necessary to update my gallery/lightbox JavaScript code, as the previous version was loading the JPEG fallback when the lightbox opened, wasting bandwidth.

class Lightbox {
  constructor(galleryElement) {
    this.galleryElement = galleryElement;
    this.galleryItems = this.galleryElement.querySelectorAll('img, picture');
    this.currentIndex = 0;
    this.overlay = null;
    this.content = null;
    this.image = null;
    this.caption = null;
    this.closeButton = null;
    this.prevButton = null;
    this.nextButton = null;
    this.isOpen = false;
    this.init();
    this.imageCount = galleryElement.querySelectorAll('img').length;
  }

  init() {
    this.createLightbox();
    this.addEventListeners();
  }

  createLightbox() {
    this.overlay = document.createElement('div');
    this.overlay.className = 'lightbox-overlay';
    this.overlay.innerHTML = `
      <div class="lightbox-content">
        <img src="" alt="" class="lightbox-image">
        <p class="lightbox-caption"></p>
        <button class="lightbox-prev">&lt;</button>
        <button class="lightbox-next">&gt;</button>
      </div>
      <button class="lightbox-close">&times;</button>
    `;
    document.body.appendChild(this.overlay);

    this.content = this.overlay.querySelector('.lightbox-content');
    this.image = this.overlay.querySelector('.lightbox-image');
    this.caption = this.overlay.querySelector('.lightbox-caption');
    this.closeButton = this.overlay.querySelector('.lightbox-close');
    this.prevButton = this.overlay.querySelector('.lightbox-prev');
    this.nextButton = this.overlay.querySelector('.lightbox-next');
  }

  getImageDetails(element) {
    if (element.tagName.toLowerCase() === 'picture') {
      const img = element.querySelector('img');
      return {
        // currentSrc gives us the source that the browser actually loaded
        src: img.currentSrc || img.src,
        alt: img.alt
      };
    } else {
      return {
        src: element.currentSrc || element.src,
        alt: element.alt
      };
    }
  }

  addEventListeners() {
    this.galleryItems.forEach((item, index) => {
      // For picture elements, we need to attach the listener to the img inside
      const clickTarget = item.tagName.toLowerCase() === 'picture'
        ? item.querySelector('img')
        : item;

      clickTarget.addEventListener('click', (e) => {
        e.preventDefault();
        this.openLightbox(index);
      });
    });

    this.overlay.addEventListener('click', (e) => {
      if (!this.content.contains(e.target) || e.target === this.content) {
        this.closeLightbox();
      }
    });

    this.closeButton.addEventListener('click', () => this.closeLightbox());

    document.addEventListener('keydown', (e) => {
      if (this.isOpen) {
        if (e.key === 'Escape') this.closeLightbox();
        if (e.key === 'ArrowLeft') this.showPrevious();
        if (e.key === 'ArrowRight') this.showNext();
      }
    });

    window.addEventListener('popstate', () => {
      if (this.isOpen) {
        this.closeLightbox(false);
      }
    });

    this.prevButton.addEventListener('click', () => this.showPrevious());
    this.nextButton.addEventListener('click', () => this.showNext());
  }

  openLightbox(index) {
    this.currentIndex = index;
    this.updateLightboxContent();
    this.overlay.style.display = 'block';
    document.body.style.overflow = 'hidden';
    this.isOpen = true;
    history.pushState({ lightboxOpen: true }, '');
  }

  closeLightbox(pushState = true) {
    this.overlay.style.display = 'none';
    document.body.style.overflow = '';
    this.isOpen = false;
    if (pushState) {
      history.pushState({ lightboxOpen: false }, '');
    }
  }

  updateLightboxContent() {
    const currentItem = this.galleryItems[this.currentIndex];
    const { src, alt } = this.getImageDetails(currentItem);

    this.image.src = src;
    this.image.alt = alt;
    this.caption.textContent = alt;

    // Use imageCount instead of galleryItems.length
    if (this.imageCount <= 1) {
      this.prevButton.style.display = 'none';
      this.nextButton.style.display = 'none';
    } else {
      this.prevButton.style.display = 'block';
      this.nextButton.style.display = 'block';
    }
  }

  showPrevious() {
    this.currentIndex = (this.currentIndex - 1 + this.galleryItems.length) % this.galleryItems.length;
    this.updateLightboxContent();
  }

  showNext() {
    this.currentIndex = (this.currentIndex + 1) % this.galleryItems.length;
    this.updateLightboxContent();
  }
}

// Initialize the lightbox for each gallery
document.addEventListener('DOMContentLoaded', () => {
  const galleries = document.querySelectorAll('.gallery');
  galleries.forEach(gallery => new Lightbox(gallery));
});

The result: higher image quality (raised the default image width from 1400px to 1600px) and half the file size, eg. my nonsense page has been cut from 7.8MB transferred to 3.45MB. Or, comparing only the actual images, 6.8MB to 2.94MB, a 57% saving. At higher quality! Madness.

Coupled with the previous change — Convert heavy PNGs to JPEG where appropriate — that page has been cut from 15.9 to 3.45MB. Happy with that.

Small problem with the new image reference solution: classes specified in Djot are no longer passed through as the Djot rendering is bypassed. Most images on the site are wrapped in a ‘gallery’ div, so I can still class that, but it does create a small annoyance for bare images.

Switched to non-breaking spaces in Chapter, Volume, Edition, and Page attribution

A very minor change to resolve a bit of acute visual nit-picking. The space in eg. ‘p. 56’, ‘Ch. 7’, ‘Vol. 9’, or ‘(3rd edition)’, created by my attribution logic, could cause the two parts to appear on separate lines in a long attribution string or a narrow viewport. Switching to a non-breaking space ‘ ’ (visually indistinguishable from a normal space) means they will always appear side by side.

Unicode codepoint U+00A0, commonly abbreviated as NBSP.
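
For reference, the fix itself is tiny; a minimal sketch in the spirit of the attribution logic, assuming the strings are built in Python (the helper name is mine, not the actual function):

NBSP = "\u00a0"  # Unicode codepoint U+00A0, the non-breaking space

def join_unbreakable(label: str, value: str) -> str:
    """Join an abbreviation and its value so a line break can never separate them."""
    return f"{label}{NBSP}{value}"

# join_unbreakable("p.", "56")  -> "p.\u00a056", which renders as 'p. 56'
# join_unbreakable("Vol.", "9") -> "Vol.\u00a09"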

Convert heavy PNGs to JPEG where appropriate

While viewing my nonsense page with the network tab of the developer tools open I noticed the page was transferring a total of 15.8MB to load. The page loads hundreds of posts so this isn’t completely absurd, but it felt excessive. Turns out most of the weight was coming from a handful of PNG posters and screenshots.

In the general case, JPEGs are smaller than PNGs, but I didn’t want to hardcode conversion of all PNGs because there are times when lossless or transparent images are wanted.

I toyed around with a couple of implementations. My first prototype added a convert_to key to the asset manifests, but in the end I decided to simply convert on the basis of the output_path suffix being different than the input_path suffix in a manifest entry, allowing every stage of the build to seamlessly pick up the correct path to reference.
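
In practice the check is just a comparison of the two suffixes; a rough sketch, assuming a manifest entry with the filepath and slug fields shown in the example below (variable and function names are illustrative):

from pathlib import Path

def needs_conversion(entry: dict) -> bool:
    """An output suffix that differs from the input suffix signals a format conversion."""
    input_suffix = Path(entry["filepath"]).suffix.lower()  # e.g. ".png"
    output_suffix = Path(entry["slug"]).suffix.lower()     # e.g. ".jpg"
    return input_suffix != output_suffix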

Switching those few heavy-weight PNGs to JPEG output cut the page weight from 15.8MB down to 5.18MB.

The most egregious offender was a lossless poster for Blade Runner (1982) coming in at 2.8MB initially, down to 240kB after conversion to JPEG. Shown below, the only change required in a manifest entry is to change the suffix/file-extension.

[27991f33-f40f-48ab-9690-0003db7b2038]
type = "poster"
title = "The poster for the film Blade Runner (1982)"
filepath = "/home/silas/library/images/posters/films/1982_Blade-Runner.png"
slug = "library/images/posters/films/1982_Blade-Runner.jpg"

The only change I needed to make to the build script was to check the colour-space when converting to JPEG, switching it to RGB when the source mode was either ‘RGBA’ or ‘P’ (palette):

def process_image_parallel(input_data: Tuple[Path, Path, int]) -> None:
    input_image, output_path, output_width = input_data
    lock_path = output_path.with_suffix(".lock")
    lock = FileLock(str(lock_path))

    try:
        with lock:
            if output_path.exists():
                return

            os.makedirs(output_path.parent, exist_ok=True)

            with Image.open(input_image) as im:
                original_format = im.format
                im = ImageOps.exif_transpose(im)
                output_height = int(im.size[1] * (output_width / im.size[0]))

                with im.resize(
                    (output_width, output_height), Image.Resampling.LANCZOS
                ) as output_image:
                    if (
                        original_format != "JPEG"
                        and str(output_path).endswith("jpg")
                        and output_image.mode in ("RGBA", "P")
                    ):
                        output_image = output_image.convert("RGB")

                    output_image.save(output_path, quality=85, optimize=True)

            logger.debug(f"Processed image: {input_image} -> {output_path}")

    except OSError as e:
        logger.error(f"OS error processing {input_image}: {e}")
    except Exception as e:
        logger.error(f"Error processing {input_image}: {e}")
    finally:
        if lock_path.exists():
            try:
                lock_path.unlink()
            except OSError:
                pass

Import referenced documents at build time

Added import references, restoring and improving the functionality of the ‘transclude’ reference that I previously deprecated and removed.

Has two modes:

  • import, transcludes the content of the referenced document as is, without any additional markup or visual cue.
  • aside, creates an aside-type block that also includes an outbound link.

Both support an optional comment at the reference-site that is discarded during substitution.

The following snippet (minus the backslash),

  {{ aside::501108be#getting arrested again }\}

would be substituted with this markup,

  {.aside}
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
The first time you get arrested your friends ask you, "What happened?!"\
The second time you get arrested it's already, "What did you do?"
`<time class="smallcaps">`{=html}[November  3, 2024 3.13PM  Istanbul, Turkey](/2024/11/03/151330)`</time>`{=html}
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Which would then be rendered into the following later in the build,

The first time you get arrested your friends ask you, “What happened?!”
The second time you get arrested it’s already, “What did you do?”

Both forms of import create a backlink in the referenced document, but only aside creates a visible outward link at the reference-site. This allows documents to be assembled seamlessly where that is a better fit than a call-out, but generally I expect I’ll use the aside mode more often, as I generally prefer for connections to be bi-directional.
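
A minimal sketch of how such a reference might be matched and substituted; the pattern mirrors the syntax above, but the function, document lookup, and aside markup here are simplified stand-ins rather than the actual implementation:

import re

# {{ mode::uid#optional comment }} -- the comment is discarded during substitution
IMPORT_RE = re.compile(r"\{\{\s*(import|aside)::([0-9a-f-]+)(?:#[^}]*)?\s*\}\}")

def substitute_imports(plaintext: str, documents: dict) -> str:
    def replace(match):
        mode, uid = match.group(1), match.group(2)
        content = documents[uid]["content"]  # hypothetical lookup by UID
        if mode == "import":
            return content                   # transclude as-is, no extra markup
        # 'aside' wraps the content in a Djot div; the real version also appends an outbound link
        fence = ":" * 78
        return f"{{.aside}}\n{fence}\n{content}\n{fence}"
    return IMPORT_RE.sub(replace, plaintext)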

Add syntax highlighting to code blocks using Pygments

Implementation was pretty straightforward after getting the regex right to treat ‘=html’ blocks as raw.

Changes are located in the run_jotdown function and the new highlight_code function, as well as some fiddling with the CSS.

def highlight_code(code: str, language: str = None) -> str:
    """
    Highlight code using Pygments with specified or guessed language.
    """
    try:
        if language:
            lexer = get_lexer_by_name(language.lower())
        else:
            lexer = guess_lexer(code)

        formatter = HtmlFormatter(
            style=config["syntax_highlighting"]["style"],
            linenos="table"
            if config["syntax_highlighting"].get("line_numbers", False)
            else False,
            cssclass="highlight",
        )
        return highlight(code, lexer, formatter)
    except ClassNotFound:
        # If language isn't found, return code wrapped in pre tags
        return f"<pre><code>{code}</code></pre>"


def run_jotdown(plaintext: str) -> str:
    """
    Modified to handle code blocks with syntax highlighting.
    Fixed to properly handle both raw HTML and HTML code blocks.
    """
    CODE_BLOCK_RE = re.compile(r"\`\`\`\`*(=html|\s*(?:(\w+)\n))?(.*?)\`\`\`\`*", re.DOTALL)
    code_blocks = []
    marker_template = "§CODE_BLOCK_{}§"

    def save_code_block(match):
        raw_html_marker = match.group(1)
        language = match.group(2)
        code = match.group(3).strip()

        # Check if this is a raw HTML block
        if raw_html_marker == "=html":
            return f"\`\`\`=html\n{code}\n\`\`\`"

        # For all other cases, including 'html' language, highlight the code
        highlighted = highlight_code(code, language)
        marker = marker_template.format(len(code_blocks))
        code_blocks.append(highlighted)
        return f"\`\`\`=html\n{marker}\n\`\`\`"

    # First, replace all code blocks with markers
    processed_text = CODE_BLOCK_RE.sub(save_code_block, plaintext)

    # Run through jotdown
    html = run("jotdown", input=processed_text, text=True, capture_output=True).stdout

    # Replace markers with actual highlighted code
    for i, code in enumerate(code_blocks):
        marker = marker_template.format(i)
        html = html.replace(marker, code)

    return html

Note that in the above snippet the backticks (`) have been escaped to prevent the markup on this page from breaking; remove the backslashes if you use this code yourself.

Click here to see the CSS I settled on
:root {
  --highlight-font-size: 0.9em;
  --highlight-border-radius: 6px;
  --highlight-padding: 1rem;
  --highlight-line-number-color: #6e7681;
}

.highlight {
  font-size: var(--highlight-font-size, 0.9em);
  border-radius: var(--highlight-border-radius, 6px);
  padding: var(--highlight-padding, 1rem);
}

.highlight pre {
  margin: 0;
  overflow-x: auto;
}

.highlight table {
  border-spacing: 0;
  border: none;
  margin: 0;
}

.highlight table td {
  padding: 0;
  border: none;
}

.highlight .linenos {
  user-select: none;
  padding-right: 1rem;
  color: var(--highlight-line-number-color);
  text-align: right;
  width: 1%;
  white-space: nowrap;
}

@media (prefers-color-scheme: dark) {
  .highlight {
    background: var(--code-background, #262220) !important;
  }
  pre {
    line-height: 125%;
  }
  td.linenos .normal {
    color: #4e4e4e;
    background-color: transparent;
    padding-left: 5px;
    padding-right: 5px;
  }
  span.linenos {
    color: #4e4e4e;
    background-color: transparent;
    padding-left: 5px;
    padding-right: 5px;
  }
  td.linenos .special {
    color: #8f9494;
    background-color: #ffffc0;
    padding-left: 5px;
    padding-right: 5px;
  }
  span.linenos.special {
    color: #8f9494;
    background-color: #ffffc0;
    padding-left: 5px;
    padding-right: 5px;
  }
  .highlight .hll {
    background-color: #ddd0c0;
  }
  .highlight {
    background: #262220;
    color: #ddd0c0;
  }
  .highlight .c {
    color: #70757a;
  } /* Comment */
  .highlight .err {
    color: #af5f5f;
  } /* Error */
  .highlight .esc {
    color: #ddd0c0;
  } /* Escape */
  .highlight .g {
    color: #ddd0c0;
  } /* Generic */
  .highlight .k {
    color: #919191;
  } /* Keyword */
  .highlight .l {
    color: #af875f;
  } /* Literal */
  .highlight .n {
    color: #ddd0c0;
  } /* Name */
  .highlight .o {
    color: #878787;
  } /* Operator */
  .highlight .x {
    color: #ddd0c0;
  } /* Other */
  .highlight .p {
    color: #ddd0c0;
  } /* Punctuation */
  .highlight .ch {
    color: #8f9f9f;
  } /* Comment.Hashbang */
  .highlight .cm {
    color: #70757a;
  } /* Comment.Multiline */
  .highlight .cp {
    color: #fdd0c0;
  } /* Comment.Preproc */
  .highlight .cpf {
    color: #c9b98f;
  } /* Comment.PreprocFile */
  .highlight .c1 {
    color: #70757a;
  } /* Comment.Single */
  .highlight .cs {
    color: #af5f5f;
  } /* Comment.Special */
  .highlight .gd {
    color: #bb6868;
  } /* Generic.Deleted */
  .highlight .ge {
    color: #ddd0c0;
    font-style: italic;
  } /* Generic.Emph */
  .highlight .ges {
    color: #ddd0c0;
  } /* Generic.EmphStrong */
  .highlight .gr {
    color: #af5f5f;
  } /* Generic.Error */
  .highlight .gh {
    color: #ddd0c0;
  } /* Generic.Heading */
  .highlight .gi {
    color: #849155;
  } /* Generic.Inserted */
  .highlight .go {
    color: #ddd0c0;
  } /* Generic.Output */
  .highlight .gp {
    color: #ddd0c0;
  } /* Generic.Prompt */
  .highlight .gs {
    color: #ddd0c0;
    font-weight: bold;
  } /* Generic.Strong */
  .highlight .gu {
    color: #ddd0c0;
  } /* Generic.Subheading */
  .highlight .gt {
    color: #af5f5f;
  } /* Generic.Traceback */
  .highlight .kc {
    color: #875f5f;
  } /* Keyword.Constant */
  .highlight .kd {
    color: #875f5f;
  } /* Keyword.Declaration */
  .highlight .kn {
    color: #875f5f;
  } /* Keyword.Namespace */
  .highlight .kp {
    color: #919191;
  } /* Keyword.Pseudo */
  .highlight .kr {
    color: #b46276;
  } /* Keyword.Reserved */
  .highlight .kt {
    color: #af875f;
  } /* Keyword.Type */
  .highlight .ld {
    color: #af875f;
  } /* Literal.Date */
  .highlight .m {
    color: #87afaf;
  } /* Literal.Number */
  .highlight .s {
    color: #c9b98f;
  } /* Literal.String */
  .highlight .na {
    color: #ddd0c0;
  } /* Name.Attribute */
  .highlight .nb {
    color: #ddd0c0;
  } /* Name.Builtin */
  .highlight .nc {
    color: #875f5f;
  } /* Name.Class */
  .highlight .no {
    color: #af8787;
  } /* Name.Constant */
  .highlight .nd {
    color: #fdd0c0;
  } /* Name.Decorator */
  .highlight .ni {
    color: #ddd0c0;
  } /* Name.Entity */
  .highlight .ne {
    color: #877575;
  } /* Name.Exception */
  .highlight .nf {
    color: #fdd0c0;
  } /* Name.Function */
  .highlight .nl {
    color: #ddd0c0;
  } /* Name.Label */
  .highlight .nn {
    color: #ddd0c0;
  } /* Name.Namespace */
  .highlight .nx {
    color: #ddd0c0;
  } /* Name.Other */
  .highlight .py {
    color: #dfaf87;
  } /* Name.Property */
  .highlight .nt {
    color: #87afaf;
  } /* Name.Tag */
  .highlight .nv {
    color: #ddd0c0;
  } /* Name.Variable */
  .highlight .ow {
    color: #878787;
  } /* Operator.Word */
  .highlight .pm {
    color: #ddd0c0;
  } /* Punctuation.Marker */
  .highlight .w {
    color: #ddd0c0;
  } /* Text.Whitespace */
  .highlight .mb {
    color: #87afaf;
  } /* Literal.Number.Bin */
  .highlight .mf {
    color: #87afaf;
  } /* Literal.Number.Float */
  .highlight .mh {
    color: #87afaf;
  } /* Literal.Number.Hex */
  .highlight .mi {
    color: #87afaf;
  } /* Literal.Number.Integer */
  .highlight .mo {
    color: #87afaf;
  } /* Literal.Number.Oct */
  .highlight .sa {
    color: #dfaf87;
  } /* Literal.String.Affix */
  .highlight .sb {
    color: #c9b98f;
  } /* Literal.String.Backtick */
  .highlight .sc {
    color: #c9b98f;
  } /* Literal.String.Char */
  .highlight .dl {
    color: #c9b98f;
  } /* Literal.String.Delimiter */
  .highlight .sd {
    color: #878787;
  } /* Literal.String.Doc */
  .highlight .s2 {
    color: #c9b98f;
  } /* Literal.String.Double */
  .highlight .se {
    color: #af5f5f;
  } /* Literal.String.Escape */
  .highlight .sh {
    color: #c9b98f;
  } /* Literal.String.Heredoc */
  .highlight .si {
    color: #af5f5f;
  } /* Literal.String.Interpol */
  .highlight .sx {
    color: #fdd0c0;
  } /* Literal.String.Other */
  .highlight .sr {
    color: #af5f5f;
  } /* Literal.String.Regex */
  .highlight .s1 {
    color: #c9b98f;
  } /* Literal.String.Single */
  .highlight .ss {
    color: #af5f5f;
  } /* Literal.String.Symbol */
  .highlight .bp {
    color: #87afaf;
  } /* Name.Builtin.Pseudo */
  .highlight .fm {
    color: #fdd0c0;
  } /* Name.Function.Magic */
  .highlight .vc {
    color: #ddd0c0;
  } /* Name.Variable.Class */
  .highlight .vg {
    color: #ddd0c0;
  } /* Name.Variable.Global */
  .highlight .vi {
    color: #ddd0c0;
  } /* Name.Variable.Instance */
  .highlight .vm {
    color: #ddd0c0;
  } /* Name.Variable.Magic */
  .highlight .il {
    color: #87afaf;
  } /* Literal.Number.Integer.Long */
}

@media (prefers-color-scheme: light) {
  .highlight {
    background: var(--code-background, #e7daca) !important;
  }
  pre {
    line-height: 125%;
  }
  td.linenos .normal {
    color: inherit;
    background-color: transparent;
    padding-left: 5px;
    padding-right: 5px;
  }
  span.linenos {
    color: inherit;
    background-color: transparent;
    padding-left: 5px;
    padding-right: 5px;
  }
  td.linenos .special {
    color: #000000;
    background-color: #ffffc0;
    padding-left: 5px;
    padding-right: 5px;
  }
  span.linenos.special {
    color: #000000;
    background-color: #ffffc0;
    padding-left: 5px;
    padding-right: 5px;
  }
  .highlight .hll {
    background-color: #ffffcc;
  }
  .highlight {
    background: #ffffff;
  }
  .highlight .c {
    color: #999988;
    font-style: italic;
  } /* Comment */
  .highlight .err {
    color: #a61717;
    background-color: #e3d2d2;
  } /* Error */
  .highlight .k {
    font-weight: bold;
  } /* Keyword */
  .highlight .o {
    font-weight: bold;
  } /* Operator */
  .highlight .ch {
    color: #999988;
    font-style: italic;
  } /* Comment.Hashbang */
  .highlight .cm {
    color: #999988;
    font-style: italic;
  } /* Comment.Multiline */
  .highlight .cp {
    color: #999999;
    font-weight: bold;
  } /* Comment.Preproc */
  .highlight .cpf {
    color: #999988;
    font-style: italic;
  } /* Comment.PreprocFile */
  .highlight .c1 {
    color: #999988;
    font-style: italic;
  } /* Comment.Single */
  .highlight .cs {
    color: #999999;
    font-weight: bold;
    font-style: italic;
  } /* Comment.Special */
  .highlight .gd {
    color: #000000;
    background-color: #ffdddd;
  } /* Generic.Deleted */
  .highlight .ge {
    font-style: italic;
  } /* Generic.Emph */
  .highlight .ges {
    font-weight: bold;
    font-style: italic;
  } /* Generic.EmphStrong */
  .highlight .gr {
    color: #aa0000;
  } /* Generic.Error */
  .highlight .gh {
    color: #999999;
  } /* Generic.Heading */
  .highlight .gi {
    color: #000000;
    background-color: #ddffdd;
  } /* Generic.Inserted */
  .highlight .go {
    color: #888888;
  } /* Generic.Output */
  .highlight .gp {
    color: #555555;
  } /* Generic.Prompt */
  .highlight .gs {
    font-weight: bold;
  } /* Generic.Strong */
  .highlight .gu {
    color: #aaaaaa;
  } /* Generic.Subheading */
  .highlight .gt {
    color: #aa0000;
  } /* Generic.Traceback */
  .highlight .kc {
    font-weight: bold;
  } /* Keyword.Constant */
  .highlight .kd {
    font-weight: bold;
  } /* Keyword.Declaration */
  .highlight .kn {
    font-weight: bold;
  } /* Keyword.Namespace */
  .highlight .kp {
    font-weight: bold;
  } /* Keyword.Pseudo */
  .highlight .kr {
    font-weight: bold;
  } /* Keyword.Reserved */
  .highlight .kt {
    color: #445588;
    font-weight: bold;
  } /* Keyword.Type */
  .highlight .m {
    color: #009999;
  } /* Literal.Number */
  .highlight .s {
    color: #bb8844;
  } /* Literal.String */
  .highlight .na {
    color: #008080;
  } /* Name.Attribute */
  .highlight .nb {
    color: #999999;
  } /* Name.Builtin */
  .highlight .nc {
    color: #445588;
    font-weight: bold;
  } /* Name.Class */
  .highlight .no {
    color: #008080;
  } /* Name.Constant */
  .highlight .ni {
    color: #800080;
  } /* Name.Entity */
  .highlight .ne {
    color: #990000;
    font-weight: bold;
  } /* Name.Exception */
  .highlight .nf {
    color: #990000;
    font-weight: bold;
  } /* Name.Function */
  .highlight .nn {
    color: #555555;
  } /* Name.Namespace */
  .highlight .nt {
    color: #000080;
  } /* Name.Tag */
  .highlight .nv {
    color: #008080;
  } /* Name.Variable */
  .highlight .ow {
    font-weight: bold;
  } /* Operator.Word */
  .highlight .w {
    color: #bbbbbb;
  } /* Text.Whitespace */
  .highlight .mb {
    color: #009999;
  } /* Literal.Number.Bin */
  .highlight .mf {
    color: #009999;
  } /* Literal.Number.Float */
  .highlight .mh {
    color: #009999;
  } /* Literal.Number.Hex */
  .highlight .mi {
    color: #009999;
  } /* Literal.Number.Integer */
  .highlight .mo {
    color: #009999;
  } /* Literal.Number.Oct */
  .highlight .sa {
    color: #bb8844;
  } /* Literal.String.Affix */
  .highlight .sb {
    color: #bb8844;
  } /* Literal.String.Backtick */
  .highlight .sc {
    color: #bb8844;
  } /* Literal.String.Char */
  .highlight .dl {
    color: #bb8844;
  } /* Literal.String.Delimiter */
  .highlight .sd {
    color: #bb8844;
  } /* Literal.String.Doc */
  .highlight .s2 {
    color: #bb8844;
  } /* Literal.String.Double */
  .highlight .se {
    color: #bb8844;
  } /* Literal.String.Escape */
  .highlight .sh {
    color: #bb8844;
  } /* Literal.String.Heredoc */
  .highlight .si {
    color: #bb8844;
  } /* Literal.String.Interpol */
  .highlight .sx {
    color: #bb8844;
  } /* Literal.String.Other */
  .highlight .sr {
    color: #808000;
  } /* Literal.String.Regex */
  .highlight .s1 {
    color: #bb8844;
  } /* Literal.String.Single */
  .highlight .ss {
    color: #bb8844;
  } /* Literal.String.Symbol */
  .highlight .bp {
    color: #999999;
  } /* Name.Builtin.Pseudo */
  .highlight .fm {
    color: #990000;
    font-weight: bold;
  } /* Name.Function.Magic */
  .highlight .vc {
    color: #008080;
  } /* Name.Variable.Class */
  .highlight .vg {
    color: #008080;
  } /* Name.Variable.Global */
  .highlight .vi {
    color: #008080;
  } /* Name.Variable.Instance */
  .highlight .vm {
    color: #008080;
  } /* Name.Variable.Magic */
  .highlight .il {
    color: #009999;
  } /* Literal.Number.Integer.Long */
}

Changed body font to Source Serif Pro

I was playing around with serif fonts a bit and, landing on Source Serif, I found something that fit with my style in a way that no others did.

Visually it's quite a big change; the previous body font was the very modern Inter, a sans-serif font, which I have kept for eg. titles for now.

Just noticed that Source Serif Pro isn’t handling left-single-smart-quotes correctly when they’re in italics, instead falling back to the right-quote/apostrophe. A pretty niche case, but I care about those little details. I’ll have to investigate whether it’s just missing from the font altogether, or if I inadvertently removed it when sub-setting it for my uses.

’example’

(If the above is no longer wrong, it’s because I’ve fixed the issue.)

Swapped out map tile provider

Switched map tile provider from Thunderforest Outdoors to MapTiler.

Much prefer the topographic styling of these maps. They even offer a set of satellite tiles if I decide I want that option in the future.

Before and after:

  .then(() => {
    var map = L.map('map');
    L.tileLayer('https://tile.thunderforest.com/outdoors/{z}/{x}/{y}.png?apikey=APIKEY', {
        maxZoom: 19,
        attribution: null
    }).addTo(map);
  .then(() => {
    var map = L.map('map');
    L.tileLayer('https://api.maptiler.com/maps/topo-v2/{z}/{x}/{y}.png?key=APIKEY', {
        maxZoom: 19,
        attribution: null
    }).addTo(map);

Created a 'main' feed to supplement automatic collections

Previously the main feed on the site, linked to from the <head>, was ‘everything’ but this is a bit of a firehose with all my quotes, notes, nonsense, and links et al, so I’ve created a ‘main’ feed with a few chosen collections included. Currently it mimics the front page.

You can of course still pick and choose which specific collection/tag feeds you want to subscribe to from the feeds page.

Switched to TOML for document format and asset manifests

Previously I was using a pseudo-YAML for document frontmatter and JSON for the asset manifests. I’ve switched both to TOML, partly for its simpler parsing but really entirely because I much prefer its readability.

Now every document and asset is contained in TOML documents. A blog post is itself a completely valid TOML document that easily serialises into a Python struct. This has the advantage of bringing stricter validation to all data, and also opens up opportunities to build documents from structured data — something I’ve been wanting to do with data for the walk among other things.
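
A rough sketch of what that ingestion looks like with the standard library's tomllib (Python 3.11+); the function name is mine, and the validation comment stands in for whatever schema the build enforces:

import tomllib  # standard library as of Python 3.11
from pathlib import Path

def load_document(path: Path) -> dict:
    """A post is itself a completely valid TOML document, so ingestion is one call."""
    with open(path, "rb") as f:
        data = tomllib.load(f)
    # Strict validation / serialisation into the build's metadata struct would happen here
    return data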

The only issue I have with TOML is that there is no spec compliant way to have multiple TOML documents in a single file. Eventually this limitation may drive me back to YAML, but I don’t regret the migration at all. Now that I’m using a strict schema it will be trivial to migrate to JSON5, YAML, or even XML in the future if I wanted to.

September 22, 2024 8.05PM

Replaced my previous hand-rolled site search that brute-forced an ungainly blob of JSON.

Opted for pagefind as I knew I wanted to stick with static search that I can compile at build time. Will tweak styles later but you can see it working already at the search page.

Much faster and more robust.

By default it grabs the title from the first <h1> on the page, which strikes me as odd. Not all my pages — like those I call nonsense — have an <h1>. Workaround: added one of its data attributes to the title element in my document head template and set the result icon to my site favicon for all pages that have no images:

<head>
    {% block head -%}
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <meta name="generator" content="The humble hands of Silas" />
    <meta data-pagefind-default-meta="image[content]" content="/favicon.png" />
    <title data-pagefind-meta="title">
    {%- block title -%}
        {% if page -%}
            {{ page.title -}}
Added dependency:    pagefind
Removed dependency:  my terrible javascript skills

Monday, 19th November 2024 20:44:22-08:00

Added the data-pagefind-ignore attribute to collections so that collection pages don’t rank in search results.

<section data-pagefind-ignore class="collection-section">
{% for item in items %}
<article class="collection {{ style }}">

August 25, 2024 8.36PM

Used Anthropic's Claude 3.5 Sonnet LLM to iterate on a new JavaScript lightbox/gallery for my site that solves the quirks and annoyances of my previous attempt. Namely it now closes on browser-back as users expect, prevents scroll when the lightbox is open, and a couple of other things. Simon Willison is right, this level of assistance is a magical boost when it comes to quick development of discrete code/tools.

Removed dependency: lightboxjs

September 20, 2023 6.58PM

Deprecated and removed the somewhat ambiguous id:: substitution reference in favour of slug:: so that it is always clear what a reference will be substituted with. This change also obviated the need for the misc:: handler, so that was culled from the code too.

September 20, 2023 10.50AM

Replaced author metadata element with creator as I gradually formalise the metadata terms used in my own infrastructure. creator generalises much better to a broader media set. For example: if I’m referencing a video then author does not make a great deal of sense.

August 18, 2023 6.01PM

Added the lightgallery library to the site for multi-image galleries, replacing my previous self built solution.

While I slightly mourn adding a dependency, it’s all self-hosted, and the resulting lightbox is much more robust and suited to the growing audience of this site.

Add HEIF image support

Added pillow-heif to build requirements to support compressing/processing HEIC images. For the time being HEIC/HEIF images are turned into JPEGs as browsers don’t natively support them.
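
A minimal sketch of the hookup; register_heif_opener is pillow-heif's documented entry point, while the file names and quality setting here are placeholders:

from PIL import Image
from pillow_heif import register_heif_opener

register_heif_opener()  # teaches Pillow to open .heic/.heif files transparently

with Image.open("photo.heic") as im:
    # Browsers don't natively support HEIF, so write out a JPEG instead
    im.convert("RGB").save("photo.jpg", quality=85, optimize=True)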

Switched from markdown to Djot

Switched from markdown to Djot (See SEP 13: Switch to a more reasonable source markup language) for lightweight document markup. Djot offers many conveniences in terms of consistency and reasonability.

This change required syntactical tweaks to be made to virtually all documents, but the payoff in terms of having a thoughtfully considered, simple, robust syntax feels significant.

In Djot raw HTML has to be explicitly declared. This has numerous advantages, again mostly having to do with predictability. No longer is it a guessing game as to whether output will be mangled or needs escaping.

Speed of HTML generation has been improved despite a rough implementation that currently shells out to an external Djot parser (Jotdown) as there is no Djot parser currently written for Python.

April 20, 2023 10.58AM

Not logging a specific change, simply recording here that the average full site build time now exceeds 2.5 seconds. This is with a total of 84457 words published to the site across 5 document types, 17 document classes, 28 document tags, and with 182 internal links between documents. Not any kind of problem, but thinking long term it makes sense to finish implementing SEP 35: Incremental builds in order to maintain the tight, iterative feedback loop of revise/rebuild that works so well for me.

10:52:29 Imported modules in 0.143 seconds
10:52:29 Setup environment in 0.0 seconds
10:52:29 Defined functions in 0.0 seconds
10:52:30 Ingested documents in 0.552 seconds
10:52:30 Inserted substitutes in 0.139 seconds
10:52:31 Generated HTML in 1.191 seconds
10:52:31 Built backlinks in 0.003 seconds
10:52:31 Built collections in 0.002 seconds
10:52:32 Wrote out HTML files in 0.403 seconds
10:52:32 Built feed stylesheet in 0.002 seconds
10:52:32 Built feeds in 0.129 seconds
10:52:32 Built sitemap in 0.003 seconds
10:52:32 Copied assets in 0.015 seconds
10:52:32 Processed assets in 0.015 seconds
10:52:32 Built in 2.604 seconds

April 8, 2023 11.51PM

Partial rewrite, significant changes to major subsystems.

Overhaul backlink extraction in order to drop BeautifulSoup as a dependency (see ✅ SEP 16: Extract backlinks from plaintext). The new system works on the plaintext documents directly, which offers greater flexibility and consistency, as well as being lighter and faster.
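
A rough sketch of the plaintext approach; the reference pattern and function name are illustrative, not the exact implementation:

import re

# Match a full 8-4-4-4-12 UUID; the real pattern may also accept shorter prefixes
UUID_REF_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")

def extract_references(plaintext: str) -> set:
    """Collect every document UUID referenced in a plaintext source."""
    return set(UUID_REF_RE.findall(plaintext))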

Add collision checking (✅ SEP 14: UUID collision checking) amongst UUIDs, necessary for several reasons including effects on SEPs 12 and 16.

Add insert_substitutes(), a function for all pre-processing of plaintext prior to HTML conversion. The major component within is currently the implementation of internal UUID referencing.

Overhaul build_html() function into a new, leaner, simpler generate_html() function made possible by the changes above.

Secondary benefit: 30% faster builds

June 12, 2022 1.08PM

Added the Inter font; I prefer its spacing and readability. Currently only supporting WOFF2 as unsupported clients can always fall back to system defaults. My only gripe with Inter is I feel its italic type is much too subtle for a sans-serif font.

April 21, 2022 4.42PM

Reverted change to feeds made on Mar 20, 2022.

Take silasjelley.com/feeds/notes for example: why not place its feed at silasjelley.com/notes/feed? In that example I agree that the latter structure is just as elegant and useful as the former. But what if I want to create arbitrary feeds, such as a feed that combines the notes and glossary collections? Where would I put that? Situating all feeds beneath silasjelley.com/feeds allows for taxonomical freedom. In this case: silasjelley.com/feeds/notes+glossary.