SEP 3: Define arbitrary collections in logic

Currently, the data model and pipeline I have built can build feeds/collections from one or more tags. I want to make this feature more granular, so that feeds/collections can be defined by multiple requirements. For example, I might want a feed/collection of all documents published last year that are tagged with both ‘proposals’ AND ‘design’. The current model can only assess whether a document is tagged with ‘proposals’ OR ‘design’.
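
To make the AND/OR distinction concrete, here is a minimal Python sketch of the two matching rules. It makes some hypothetical assumptions. Each processed document is a dict with a `tags` list and a `published` timestamp in the `YYYY-MM-DD HH:MM:SS` form used in Figure 1. The function names `matches_any` and `matches_all` are illustrative, not taken from the existing pipeline.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical document shape: {"title": ..., "tags": [...], "published": "YYYY-MM-DD HH:MM:SS"}

def matches_any(doc: dict, tags: set[str]) -> bool:
    """Current behaviour (OR): the document carries at least one of the tags."""
    return bool(tags & set(doc["tags"]))

def matches_all(doc: dict, tags: set[str], year: int | None = None) -> bool:
    """Proposed behaviour (AND): the document carries every tag and, if a
    year is given, was published in that year."""
    if not tags <= set(doc["tags"]):
        return False
    if year is None:
        return True
    published = datetime.strptime(doc["published"], "%Y-%m-%d %H:%M:%S")
    return published.year == year

docs = [
    {"title": "A", "tags": ["proposals", "design"], "published": "2021-03-01 10:00:00"},
    {"title": "B", "tags": ["proposals"], "published": "2021-05-02 09:30:00"},
    {"title": "C", "tags": ["proposals", "design"], "published": "2022-07-12 10:06:27"},
]

wanted = {"proposals", "design"}
print([d["title"] for d in docs if matches_any(d, wanted)])             # ['A', 'B', 'C']
print([d["title"] for d in docs if matches_all(d, wanted, year=2021)])  # ['A']
```

The AND rule plus a date condition is just a conjunction of predicates. Any further requirement can be added the same way without changing how documents are stored.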

There are two necessary parts to this: changes to the document metadata, and a rewrite of the program logic that attaches context to each document as it is processed.

It would probably be preferable to give each unique collection a name.

Getting full use out of this feature, and implementing it elegantly, will require a complete overhaul of the existing data model.

Possible implementations


    uid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXXXXX
    title: Proposals for the improvement of this site
    creator: Silas Jelley
    created: 2022-07-12 09:25:39
    published: 2022-07-12 10:06:27
    location: Nelson, New Zealand
    slug: proposals
    options:
      - nofeed
      - collection-title
    collections:
      - propdes:
          - proposals
          - design
      - notes
    tags:
      - meta
      - design
    
Figure 1: In this example, multiple collections are defined in the document metadata. 'propdes' is a custom, named collection of documents tagged with both 'proposals' and 'design', while 'notes' is the ordinary collection of that name.
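
A minimal sketch of what the pipeline would need in order to read per-document definitions like these. It assumes the `collections` entry arrives from whatever YAML parser the pipeline uses as a list. Each item is either a bare tag name or a one-key mapping from a custom name to a flat list of required tags. `normalize_collections` is a hypothetical name, and anything nested deeper than one level is rejected rather than interpreted.

```python
from __future__ import annotations

def normalize_collections(entry: list) -> dict[str, frozenset[str]]:
    """Map each collection name to the set of tags a document must carry (AND).

    A bare string such as "notes" is an ordinary single-tag collection;
    a one-key mapping such as {"propdes": ["proposals", "design"]} is a custom one.
    """
    result: dict[str, frozenset[str]] = {}
    for item in entry:
        if isinstance(item, str):
            name, tags = item, [item]
        elif isinstance(item, dict) and len(item) == 1:
            name, tags = next(iter(item.items()))
        else:
            raise ValueError(f"unrecognised collection entry: {item!r}")
        # Only a flat list of tag names is accepted: no deeper nesting.
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            raise ValueError(f"collection {name!r} must be a flat list of tags")
        if name in result:
            raise ValueError(f"collection {name!r} is defined twice")
        result[name] = frozenset(tags)
    return result

# The collections entry from Figure 1, as a parser might hand it over.
entry = [{"propdes": ["proposals", "design"]}, "notes"]
for name, tags in normalize_collections(entry).items():
    print(name, sorted(tags))
# propdes ['design', 'proposals']
# notes ['notes']
```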

There are a few issues with the proposal in Figure 1:

  1. Defining collections in this way places a high burden of correctness on the user, because a nested data format is easy to get wrong: a misplaced indent or a missing colon can silently change the structure or stop it parsing at all.

  2. The logic necessary to unpack arbitrarily deep nesting of collection definitions would be complex, and unnecessary complexity should be avoided. We don’t need Turing-complete logic here.

  3. The model would encourage arbitrary rather than consistent naming of collections, which could lead to confusion or even name collisions.
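
Issue 3 is easy to illustrate. When each document defines its own collections, nothing stops two documents from giving the same name to different tag sets. Below is a sketch of a check the pipeline could run across all documents. It assumes, hypothetically, that each document's definitions have already been reduced to a name → tag-set mapping keyed by document slug. The function name and data are illustrative.

```python
from __future__ import annotations

from collections import defaultdict

def find_name_collisions(
    per_document: dict[str, dict[str, frozenset[str]]],
) -> dict[str, set[frozenset[str]]]:
    """Return collection names that documents define with more than one distinct tag set."""
    seen: dict[str, set[frozenset[str]]] = defaultdict(set)
    for definitions in per_document.values():
        for name, tags in definitions.items():
            seen[name].add(tags)
    # A name is ambiguous if it has been given more than one distinct definition.
    return {name: defs for name, defs in seen.items() if len(defs) > 1}

per_document = {
    "proposals": {"propdes": frozenset({"proposals", "design"})},
    "roadmap": {"propdes": frozenset({"proposals"})},
    "journal": {"notes": frozenset({"notes"})},
}
print(sorted(find_name_collisions(per_document)))  # ['propdes']
```

A check like this can only report collisions after the fact. It cannot say which of the conflicting definitions was intended.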

An alternative that may address some or all of these issues would be to define all custom collections in one place.
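
A minimal sketch of the "one place" idea, under assumed names. A single hypothetical `REGISTRY` maps each collection name to the tags a document must carry. It is validated once against the tags actually in use, and every collection's membership is then computed in one pass over the documents. The registry entries, document data and function names are illustrative only.

```python
from __future__ import annotations

# Hypothetical central registry: name -> tags a document must carry (AND).
REGISTRY: dict[str, frozenset[str]] = {
    "propdes": frozenset({"proposals", "design"}),
    "meta-design": frozenset({"meta", "design"}),
}

def check_registry(registry: dict[str, frozenset[str]], known_tags: set[str]) -> None:
    """Reject names that shadow an ordinary tag collection, and unknown tags."""
    for name, tags in registry.items():
        if name in known_tags:
            raise ValueError(f"{name!r} collides with an existing tag collection")
        unknown = tags - known_tags
        if unknown:
            raise ValueError(f"{name!r} refers to unknown tags: {sorted(unknown)}")

def build_collections(documents: list[dict], registry: dict[str, frozenset[str]]) -> dict[str, list[str]]:
    """Compute the member titles of every registered collection in one pass."""
    members: dict[str, list[str]] = {name: [] for name in registry}
    for doc in documents:
        doc_tags = set(doc["tags"])
        for name, required in registry.items():
            if required <= doc_tags:
                members[name].append(doc["title"])
    return members

documents = [
    {"title": "A", "tags": ["meta", "design"]},
    {"title": "B", "tags": ["proposals", "design"]},
]
known = {tag for doc in documents for tag in doc["tags"]}
check_registry(REGISTRY, known)
print(build_collections(documents, REGISTRY))
# {'propdes': ['B'], 'meta-design': ['A']}
```

Because every name lives in one mapping, a duplicate or clashing name is caught in one place rather than scattered across document metadata.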

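This would also avoid the combinatorial explosion that would arise if I attempted to compute every possible combination of tags up front. With n tags there are 2^n - 1 non-empty combinations, whereas only the explicitly defined collections would need to be built.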
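
To put numbers on the cost of precomputing every combination, this sketch enumerates all non-empty tag combinations with `itertools.combinations`. It then prints how the count grows with the number of tags. The tag list is made up for illustration.

```python
from itertools import combinations

def all_tag_combinations(tags: list[str]) -> list[tuple[str, ...]]:
    """Every non-empty combination of tags: what an up-front approach would have to build."""
    return [combo for k in range(1, len(tags) + 1) for combo in combinations(tags, k)]

tags = ["design", "meta", "notes", "proposals"]
print(len(all_tag_combinations(tags)))  # 15, i.e. 2**4 - 1

# The count doubles with every tag added.
for n in (10, 20, 30):
    print(n, 2**n - 1)
# 10 1023
# 20 1048575
# 30 1073741823
```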

One solution, though not necessarily the most conceptually elegant one, would be to hard-code this set of custom collections in the build pipeline.
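
If the custom collections were hard-coded, the step that attaches context to each document as it is processed could stay small. Below is a sketch under hypothetical assumptions. Documents are dicts, ordinary tags still yield their own single-tag collections, and the `CUSTOM_COLLECTIONS` constant and the added `collections` key are illustrative names rather than the pipeline's own.

```python
from __future__ import annotations

# Hypothetical hard-coded set of custom collections: name -> required tags (AND).
CUSTOM_COLLECTIONS: tuple[tuple[str, frozenset[str]], ...] = (
    ("propdes", frozenset({"proposals", "design"})),
)

def attach_collection_context(doc: dict) -> dict:
    """Return a copy of `doc` listing every collection it belongs to, sorted by name.

    Each tag still gives an ordinary single-tag collection; a custom
    collection is added only when the document carries all of its tags.
    """
    tags = set(doc.get("tags", []))
    names = set(tags)
    names.update(name for name, required in CUSTOM_COLLECTIONS if required <= tags)
    return {**doc, "collections": sorted(names)}

doc = {"title": "Example", "tags": ["proposals", "design"]}
print(attach_collection_context(doc)["collections"])  # ['design', 'propdes', 'proposals']
```

Keeping the definitions as plain data in one module means they can be validated once per build. The per-document step then stays a simple subset test.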