SEP 45: Transclusion of documents

Status: Finished but then removed

Date: 2023-04-18
Commit: dd90cb05d8362d869252587e054df23d8d479daf
Code complete

Date: 2023-04-20
Added silent include handler 'sin::' for transcluding without a visible break
in styling, as opposed to the visible transclusion block element transcluded
with 'in::'

Date: 2024-09-24T11:31:04+03:00
Removed the 'sin::' and 'in::' handlers. I never use them, and I'm preparing to
overhaul the way I do substitutions and transclusions, so I am first minimising
the surface area of this feature in the current implementation. May revisit in
the future, but for now linking is sufficient; this is a hypertext after all.

Building out SEP 44: Short Reference Syntax presented an opportunity to develop an idea I’ve had in mind for some time: transclusion. This proposal has since been implemented to a ‘live beta’ standard.

transclusion

In computer science, transclusion is the inclusion of part or all of an electronic document into one or more other documents by reference via hypertext. Transclusion is usually performed when the referencing document is displayed, and is normally automatic and transparent to the end user. The result of transclusion is a single integrated document made of parts assembled dynamically from separate sources, possibly stored on different computers in disparate places.

— Wikipedia, Transclusion

Transclusion has been implemented as a ‘handler’ which finds links whose target is prefixed with in:: and replaces each link with the full content of the referenced document.
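
As a small illustration of the link form the handler looks for, the sketch below runs the pattern from the code listing further down over a made-up line of source text. The sample text and the `1a2b3c4d` ID prefix are hypothetical.

```python
import re

# Same pattern as in the listing: link text, a prefix ending in '::',
# then the reference target.
REF_RE = re.compile(r"\[([^\]]*?)\]\((.*?::)([^)]+)\)")

# Hypothetical source text containing one transclusion link.
sample = "See [](in::1a2b3c4d) for the definition."

# Each match is a (linktext, prefix, target) tuple.
matches = REF_RE.findall(sample)
for linktext, prefix, target in matches:
    print(repr(linktext), prefix, target)
```

An empty link text, as here, is the case where the listing falls back to the title of the referenced document.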

NOTE: The code below does not reflect the current implementation, which has since been altered and refactored. See the main build script source for an up-to-date view of the code.

See the relevant code here


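import re

# Excerpt from the build script: `data`, `site`, `verbosity` and the
# `markdown()` renderer are assumed to be defined at module level elsewhere.


def insertSubstitutions():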
    if verbosity > 1:
        print("Performing substitutions")
    transclusion_count = 0
    transclusion_list = []
    for key, page in data.items():
        text = page["plaintext"]
        # Replace instances of {{ site.pagecount }} and {{ site.wordcount }}
        # accordingly. This is a bodge
        text = text.replace("{{ site.pagecount }}", str(site["pagecount"]))
        text = text.replace("{{ site.wordcount }}", "{:,}".format(site["wordcount"]))
        # Replace UUID document references in the source text
        # See "SEP: Address documents by their URN/UUID" for the rationale
        # Summary: Documents can be referenced from within other documents using
        #   their unique ID (UUIDv4), or an 8 digit prefix of that ID. The
        #   below code searches for markdown style links that contain a
        #   linktarget beginning with '!'. A lookup is performed to match that
        #   UID (or partial) UID with a document from which the slug and the
        #   title of that document can be discovered. This information is then
        #   used to replace the link with one that now points to the public URL
        #   of the document instead of the UID. This substitution is performed
        #   on plaintext so that both the plaintext and HTML documents reflect
        #   the authors intent.
        #
        #
        # Find all valid UUID reference links, eg:
        # UUID_REF_RE = re.compile(r"\[([^\]]*)\]\((![^)]+)\)")
        UUID_REF_RE = re.compile(r"\[([^\]]*?)\]\((.*?::)([^)]+)\)")
        inref_links = list(UUID_REF_RE.findall(text))
        # Process each match
        for linktext_match, inref_type, inref_match in inref_links:
            match = f"[{linktext_match}]({inref_type}{inref_match})"
            if inref_type in ["::", "in::"]:
                inref_uuid = ""
                for k in data:
                    if k.startswith(inref_match):
                        inref_uuid = k
            else:
                raise Exception(
                    f"Unexpected Internal Reference type '{inref_type}' in document:\n"
                    f"  {key}\n"
                    f"  match: {match}"
                )

            # If no matching UUID found for reference, raise Exception
            if inref_uuid == "":
                raise Exception(
                    "\nUnmatched UUID reference:\n"
                    f"  document: {key}\n"
                    f"  {inref_match} does not reference an existing document"
                )

            # If linktext is blank OR begins with a ':' lookup the title of
            # the linked document. If linktext is not blank AND does NOT begin
            # with a ':' leave it as is.
            if linktext_match.startswith("::") or linktext_match == "":
                linktext = data[inref_uuid]["title"]
            else:
                linktext = linktext_match
            # Lookup slug of linked document
            linktarget = f"/{data[inref_uuid]['slug']}"
            if inref_type == "::":
                # Assemble payload
                replacement = f"[{linktext}]({linktarget})"
                # Find and replace the original link with the updated one
                text = text.replace(match, replacement)

            elif inref_type == "in::":
                transclude_data = {
                    "insrc_uuid": key,
                    "insrc_match": match,
                    "inref_uuid": inref_uuid,
                    "inref_linktext": linktext,
                    "inref_linktarget": linktarget,
                }
                transclusion_list.append(transclude_data)

        # Write modified plaintext back to document variable once all
        # substitutions have been carried out.
        page["plaintext"] = text

    # Carry out transclusions once every page has been processed. These must
    # occur after all other link references have been substituted so that
    # such links are present in transcluded elements.
    for transclude in transclusion_list:
        insrc_uuid = transclude["insrc_uuid"]
        insrc_match = transclude["insrc_match"]
        inref_uuid = transclude["inref_uuid"]
        inref_transclude = markdown(data[inref_uuid]["plaintext"])
        linktext = transclude["inref_linktext"]
        linktarget = transclude["inref_linktarget"]
        # Assemble payload
        replacement = (
            '<figure class="transclusion">'
            f"<p>{inref_transclude}</p>"
            "<figcaption> <em>from</em> "
            f'<a href="{linktarget}">{linktext}</a>'
            "</figcaption></figure>"
        )
        # Find and replace the original link with the assembled transclusion
        data[insrc_uuid]["plaintext"] = data[insrc_uuid]["plaintext"].replace(
            insrc_match, replacement
        )
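
The prefix lookup in the listing above keeps the last key that starts with the given prefix, so a short prefix shared by two documents resolves without warning. A stricter lookup might look like the following sketch. `resolve_prefix` and the sample IDs are hypothetical and not part of the build script.

```python
def resolve_prefix(prefix, data):
    """Return the single key in `data` that starts with `prefix`.

    Hypothetical helper: raises instead of silently picking one key
    when the prefix matches no document or more than one.
    """
    candidates = [k for k in data if k.startswith(prefix)]
    if not candidates:
        raise KeyError(f"{prefix} does not reference an existing document")
    if len(candidates) > 1:
        raise ValueError(f"{prefix} is ambiguous: {candidates}")
    return candidates[0]


# Made-up documents keyed by UUID, mirroring the shape of `data` above.
docs = {
    "1a2b3c4d-0000-4000-8000-000000000001": {"title": "Transclusion", "slug": "transclusion"},
    "1a2bffff-0000-4000-8000-000000000002": {"title": "Hypertext", "slug": "hypertext"},
}

resolved = resolve_prefix("1a2b3c4d", docs)
```

Whether failing on ambiguity is preferable to last-match-wins is a design choice; with 8-digit prefixes a collision is unlikely on a small site.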

Drawbacks and pitfalls:

Transclusions do not create backlinks because they are injected as HTML, not markdown.
No type of substitution can currently skip text in, e.g., code blocks. In some ways this could be seen as an advantage, in that I can perform substitutions anywhere in the text, but this behaviour will need refining.
In the current implementation, all substitutions (ID referencing and transclusion) must be carried out before HTML is generated. This means that for transclusions to be rich, i.e., to include proper links and formatting, the transcluded page has to be generated twice: once during transclusion and again when the build reaches the true generateHTML() function.
Only whole documents can currently be transcluded. Transcluding parts of documents would require a dramatic overhaul of the build system and would arguably offer very little value. This site favours atomicity: small documents, heavily interlinked. I don’t envision transclusion being used often, and when it is, I am likely to want only the full content of a short document, simply to save the reader from leaving the page to read, e.g., a term definition. Transclusion is therefore mostly for that use: it spares me repeating definitions across many documents, which would make a definition harder to refine later. In this respect, I am happy with the current implementation.
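
One way the code-block limitation noted above might be addressed is to split the text on fenced code blocks and apply substitutions only to the segments outside them. The sketch below assumes triple-backtick fences. `substitute_outside_code` is a hypothetical helper, and indented or inline code is not handled.

```python
import re

FENCE = "`" * 3  # a triple-backtick fence marker
# One capturing group, so re.split keeps the fenced blocks in its output.
FENCE_RE = re.compile(f"({FENCE}.*?{FENCE})", re.DOTALL)


def substitute_outside_code(text, substitute):
    """Apply `substitute` to every part of `text` outside fenced code."""
    parts = FENCE_RE.split(text)
    # Even indices hold ordinary text, odd indices hold fenced blocks.
    return "".join(
        part if i % 2 else substitute(part) for i, part in enumerate(parts)
    )


# Made-up page text: one placeholder in prose, one inside a code fence.
page = (
    "Pages: {{ site.pagecount }}\n"
    + FENCE + "\n{{ site.pagecount }}\n" + FENCE + "\n"
)
result = substitute_outside_code(
    page, lambda s: s.replace("{{ site.pagecount }}", "42")
)
```

Only the placeholder in the prose is replaced; the one inside the fence is left as written.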