✅ SEP 12: Reference documents by their URN/UUID

RFC: A Universally Unique IDentifier (UUID) URN Namespace

Status: Finished

Date: 2023-04-07
Commit: e521f96aa448b83f61e25b9f986e17456934ec15
Code complete

Date: 2023-04-15
Commit: 00147859efc5a8547b47b099cb93c729e358888f
All internal reference links across all documents updated to use the new UUID
referencing system. All 'slug' referencing discontinued.

Summary: Documents can be referenced from within other documents using their unique ID (UUIDv4), or an 8-digit prefix of that ID. The implementation searches for links that contain a refSlug including ‘::’. A lookup is performed to match that UID (or partial UID) with a document, from which the slug and title of that document can be discovered. This information is then used to replace the link with one that points to the public URL of the document instead of the UID. The substitution is performed on the plaintext so that both the plaintext and HTML documents reflect the author’s intent.

The implementation of this proposal prompted the subsequent proposal ✅ SEP 14: UUID collision checking.

It would be useful for several reasons to be able to address any document, from within any other document, simply by way of its stable globally unique identifier (GUID). In this project these IDs take the form of Version 4 UUIDs and represent the primary key for every document.

The UUID of a document will never change (unlike its title and, in rare circumstances, its URL), and as such it is by far the most useful and durable key for addressing documents.

The proposal has two parts, listed here in order of priority.

Primarily what I want is to be able to reference any document from within any other document simply using its UUID and have the build system transform that into a more desirable representation in the resulting HTML. There should be a simple, reasonable syntax for expressing the desired markup from within the source text.
Secondarily, I want to support referencing documents using their UUID in a URL.

Syntax

I’m currently leaning toward using the standard markdown/djot link syntax with a few thin rules for translation.

[]($UUID)  > [Address documents by their URN/UUID](\!$UUID)
[In line ref text]($UUID)  > [In line ref text](\!$UUID)

In the first case, where no link text is given, the link text is populated automatically with the referenced document’s title. In the second, where link text is given, it is left as is. This provides a ‘shorthand’ for referencing a document by its title that survives a change to that title, while still permitting ‘editorial’ control over link text where it is explicitly not meant to reflect the title, such as in an in-line reference.
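The link text rule can be sketched as follows. This is a minimal illustration, not the project's code: the `Document` record and the `render_link` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical record holding the fields the substitution needs."""
    uuid: str
    title: str
    url: str


def render_link(link_text: str, doc: Document) -> str:
    """Build the substituted link for a reference to `doc`.

    Empty link text falls back to the document's current title;
    author-supplied link text is kept exactly as written.
    """
    text = link_text if link_text else doc.title
    return f"[{text}]({doc.url})"
```

Because the title is read at build time, renaming the referenced document changes every shorthand link on the next build without touching the source of the documents that reference it.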

It should be made explicit that the two examples below should yield distinct output.

[]($UUID) > [$TITLE]($URL)
[](\$UUID) > [$TITLE]($UUID)

This is so that it does not become impossible to ‘escape’ the transformation of UUIDs into their respective URLs when desired. The UUID should only be replaced when it is bare within the parentheses. If it is preceded or trailed by a slash it should be left as is.
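One possible reading of the escape rule, sketched in Python. The regular expression, the `substitute` function, and the shape of the `docs` mapping (lowercase UUID to a title and URL pair) are assumptions made for illustration only.

```python
import re

# Version 4 UUID: 8-4-4-4-12 hex digits with the version and variant nibbles fixed.
UUID4 = r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}"

# A link whose target is a UUID, optionally preceded by a backslash.
# Anything else inside the parentheses (such as a slash) prevents a match.
LINK = re.compile(rf"\[([^\]]*)\]\((\\?)({UUID4})\)", re.IGNORECASE)


def substitute(source: str, docs: dict) -> str:
    """Replace bare UUID targets with URLs; keep escaped UUIDs as the target."""
    def replace(match: re.Match) -> str:
        text, escaped, uuid = match.groups()
        title, url = docs[uuid.lower()]  # an unknown UUID raises KeyError
        target = uuid if escaped else url
        return f"[{text or title}]({target})"

    return LINK.sub(replace, source)
```

With this pattern `[]($UUID)` becomes `[$TITLE]($URL)`, `[](\$UUID)` becomes `[$TITLE]($UUID)`, and a target such as `/$UUID/` is not matched at all and so passes through unchanged.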

This syntax could be made more explicit with the use of a special character. This has the appeal of removing any last shred of ambiguity as to whether or not a UUID should be substituted. I think the most apt character would be an exclamation mark. Thus the parser can simply and confidently make substitutions on the following syntax (with or without link text) and ignore everything else:

[](!$UUID)
[](!$UUID_first_eight)
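A sketch of how a parser might pick out only the marked references. The pattern and the `find_references` name are illustrative assumptions; the actual implementation may match differently.

```python
import re

# Matches [text](!ref) where ref is a full 36-character UUID or its
# first eight hex digits. Links without the '!' marker are never matched.
REF = re.compile(
    r"\[(?P<text>[^\]]*)\]"
    r"\(!(?P<ref>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}|[0-9a-f]{8})\)",
    re.IGNORECASE,
)


def find_references(source: str) -> list:
    """Return (link_text, reference) pairs for every marked link in `source`."""
    return [(m["text"], m["ref"].lower()) for m in REF.finditer(source)]
```

Ordinary links, including ones whose target merely looks like a UUID prefix, are ignored because they lack the exclamation mark.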

Considerations and miscellaneous notes

RESOLVED Order of parsing

Should the substitution of link text and target happen before or after the document is parsed into markup?

I’m inclined to say before, as I think it simplifies both the ‘lookup’ and the substitution. I think it will become necessary to push the conversion of source text into markup further back in the build order, so that the substitution happens once the build system has all the information it needs about every document. The result will be cleaner, as both the intermediate and final markup will be up to date and correct during the build.

UNRESOLVED Should the substitution take place in HTML source blocks?

IMPLEMENTED Shorter referencing

A full UUID is 32 hexadecimal digits split by four hyphens, for a 36-character string. This could be cumbersome for regular referencing and may benefit from a shorter, derived ID. This should be approached cautiously so as to avoid having many separate and ill-defined ID systems. I’m inclined to use the first 8 digits of a UUID as a suitable shorthand reference. This will slightly complicate the implementation, but I believe that is preferable to littering source with many full-length UUIDs.
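The shorthand can be supported with a prefix index built once per build. The sketch below is an assumption about how that might look; `build_prefix_index` and `colliding_prefixes` are hypothetical names.

```python
from collections import defaultdict

PREFIX_LENGTH = 8  # the first eight hex digits of the UUID


def build_prefix_index(uuids) -> dict:
    """Map each 8-character prefix to the full UUIDs that share it."""
    index = defaultdict(list)
    for uuid in uuids:
        index[uuid[:PREFIX_LENGTH].lower()].append(uuid)
    return dict(index)


def colliding_prefixes(index: dict) -> dict:
    """Return only the prefixes that are shared by more than one document."""
    return {prefix: ids for prefix, ids in index.items() if len(ids) > 1}
```

The first eight hex digits of a version 4 UUID are 32 random bits, so a shared prefix is unlikely in a small collection but not impossible, which is presumably why a check of this kind is worth running during the build.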
IGNORED An XML/HTML-like syntax

I’m making a note here to record that I considered, but for the time being have dismissed, the possibility of using an XML-like representation for these references. I think it makes more sense to consider this a feature of the ‘lightweight markup’ used for authoring documents rather than the ‘heavyweight markup’ (HTML) used for displaying them.

IGNORED Should the parser be block aware?

Should the parser check whether a candidate substitution sits inside a code block? For the sake of simplicity of implementation I’m inclined to say no.

⚠ Bug discovered and fixed 2023-04-09. If the referenced UUID did not exist, the build would insert the UUID from the last loop iteration instead: a silent failure. It must instead fail, and fail loudly!
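The class of bug described here, and one way to avoid it, can be sketched as below. This is not the project's code: `lookup`, `UnknownReferenceError`, and the tuple shape of `documents` are assumptions for illustration.

```python
class UnknownReferenceError(LookupError):
    """Raised when a reference matches no known document (hypothetical name)."""


def lookup(ref: str, documents):
    """Return (title, url) for the document whose UUID starts with `ref`.

    `documents` is an iterable of (uuid, title, url) tuples, an assumed shape.
    """
    for uuid, title, url in documents:
        if uuid.startswith(ref):
            # Return from inside the loop so only a real match is ever used.
            return title, url
    # No match: never fall back to whatever the loop variables last held.
    raise UnknownReferenceError(f"no document matches reference {ref!r}")
```

Raising an exception stops the build at the offending reference rather than publishing a link that silently points at the wrong document.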