SEP 34: Overhaul image pipeline

Most documents on the site do not contain images, but images being images, they will always take up more space than text files.

At the time of writing (2023-04-11) there are 178 images on the site, most of them in documents related to my ‘journeys’. Together they weigh in at 31.7MB, for an average image size of 178KB. Not an absurd figure, but I’m not above trying to crunch it down.

91.3% of the site’s size on disk is images. That being said, I’m not really waging a war on images; I think they’re important. I don’t want to squash my photos into a pixelated mess just to save a bunch of theoretical bandwidth[1]. But if optimisations can be made while maintaining the level of quality I want, then why not?
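Figures like the image count, average size and share of disk usage can be reproduced with a few lines of standard-library Python. This is only a sketch: `image_stats` and `IMAGE_SUFFIXES` are hypothetical names, and it assumes the whole site lives under a single directory and that images are identified by file extension alone.

```python
from pathlib import Path

# Assumption: these extensions cover every image format on the site
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def image_stats(site_root):
    """Summarise how much of a directory tree is taken up by images."""
    files = [p for p in Path(site_root).rglob("*") if p.is_file()]
    images = [p for p in files if p.suffix.lower() in IMAGE_SUFFIXES]
    total_bytes = sum(p.stat().st_size for p in files)
    image_bytes = sum(p.stat().st_size for p in images)
    return {
        "count": len(images),
        "image_bytes": image_bytes,
        # Guard against empty directories to avoid dividing by zero
        "average_bytes": image_bytes / len(images) if images else 0.0,
        "image_share": image_bytes / total_bytes if total_bytes else 0.0,
    }
```

Running it before and after any change to the pipeline gives a like-for-like comparison, which matters more than the absolute numbers.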

Currently I use the Pillow package to process images; the function is reproduced in Appendix 1. It’s a crude effort but it gets the job done.

Prep for this proposal will require a review of the state of the art in image optimisation. Prospective tools:

  1. pngnq/advpng for PNGs

  2. mozjpeg for JPEGs
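To make the review concrete, here is a rough sketch of how such tools might be driven from Python. Everything in it is an assumption rather than a decision: `optimise_command` and `optimise` are hypothetical names, the quality setting of 80 is arbitrary, and the flags reflect my understanding of the `cjpeg` and `advpng` command lines, so they should be checked against the installed versions. Note that re-encoding a JPEG with `cjpeg` is lossy, and that a `cjpeg` found on the PATH may be libjpeg-turbo's rather than mozjpeg's.

```python
import shutil
import subprocess
from pathlib import Path


def optimise_command(src, dst):
    """Return the command line for src's format, or None if it is not covered."""
    src, dst = Path(src), Path(dst)
    suffix = src.suffix.lower()
    if suffix in {".jpg", ".jpeg"}:
        # mozjpeg's cjpeg, re-encoding at an assumed quality of 80
        return ["cjpeg", "-quality", "80", "-outfile", str(dst), str(src)]
    if suffix == ".png":
        # advpng recompresses a file in place, so it is pointed at dst
        return ["advpng", "-z", "-4", str(dst)]
    return None


def optimise(src, dst):
    """Run the matching tool if it is installed; return True if it ran."""
    cmd = optimise_command(src, dst)
    if cmd is None or shutil.which(cmd[0]) is None:
        return False
    if cmd[0] == "advpng":
        # Copy first, because advpng rewrites the file it is given
        shutil.copyfile(src, dst)
    subprocess.run(cmd, check=True)
    return True
```

Returning `False` when a tool is missing means the existing Pillow output can stay as the fallback while the candidates are being compared.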

Appendix 1: Current Image Processing Function

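import os
from pathlib import Path

from PIL import Image

# Process images (relies on the module-level verbosity, images_dir and output_dir settings)
def process_images():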
    status = "Used cached images"
    if verbosity > 1:
        print("Processing Images")
    image_assets = []
    image_assets.extend(Path(images_dir).glob('**/*.jpg'))
    image_assets.extend(Path(images_dir).glob('**/*.jpeg'))
    image_assets.extend(Path(images_dir).glob('**/*.JPG'))
    image_assets.extend(Path(images_dir).glob('**/*.png'))
    image_assets.extend(Path(images_dir).glob('**/*.PNG'))
    image_assets.extend(Path(images_dir).glob('**/*.webp'))
    for image in image_assets:
        rel = os.path.relpath(image, images_dir)
        ext = os.path.splitext(rel)[1]
        base = os.path.splitext(rel)[0]
        path_large = os.path.join(output_dir + 'images/' + rel)
        path_small = os.path.join(output_dir + 'images/' + base + '.small' + ext)
        # Skip images whose large and small versions already exist in output_dir;
        # otherwise process and output them
        if not (os.path.isfile(path_large) and os.path.isfile(path_small)):
            status = ''
            # Print source file path
            if verbosity > 2:
                print(f"  {image} >> ", end='', flush=True)
            # Open image, derive height from new width as a proportion of original width
            im = Image.open(image)
            width_large = 1000
            height_large = int(im.size[1]*float((width_large/float(im.size[0]))))
            large = im.resize((width_large, height_large), Image.Resampling.LANCZOS)
            width_small = 300
            height_small = int(im.size[1]*float((width_small/float(im.size[0]))))
            small = im.resize((width_small, height_small), Image.Resampling.LANCZOS)
            # Create output directory tree
            os.makedirs(os.path.dirname(path_large), exist_ok=True)
            # Output resized images
            large.save(path_large)
            small.save(path_small)
            # Print written file path
            if verbosity > 2:
                print(f"{path_large}")
    if status and verbosity > 1:
        print(f"  {status}")
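
As a sketch, the height calculation in the function above can be pulled out into a small helper. `scaled_size` is a hypothetical name, not part of the current code. It uses integer arithmetic instead of floats, which should give the same truncated result in ordinary cases while avoiding floating-point rounding surprises.

```python
def scaled_size(size, target_width):
    """Return (width, height) for target_width, preserving the aspect ratio."""
    width, height = size
    if width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    # Multiply first, then floor-divide; never let the height drop below 1px
    return target_width, max(1, height * target_width // width)


print(scaled_size((4000, 3000), 1000))  # (1000, 750)
print(scaled_size((4000, 3000), 300))   # (300, 225)
```

The returned tuple is in the (width, height) order that Pillow's `Image.resize` expects, so it could be passed straight through for both the large and small variants.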

  1. I say theoretical because the number of people who actually read this site is probably a rounding error on a rounding error, but hey, gotta pinch those bytes.