I've always been annoyed by the way Calibre organises ebooks. Today I decided to dig into its export templating syntax though, and I've come up with something I'm happy with. As with MusicBrainz previously, I'm preserving this here so I never have to fiddle with its arcane syntax again!
program:
first_author = list_item(field('authors'),0,'&');
author = re(re(re(re(re(re(re(re( first_author, '\.', ''), '&', 'and'), '^\s+|\s+$', ''), ':', ''), '[\[\]\(\)\{\}]', ''), "'", ''), ',', ''), '\s+', '-');
title = re(re(re(re(re(re(re( field('title'), '&', 'And'), '^\s+|\s+$', ''), ':', ''), '[\[\]\(\)\{\}]', ''), "'", ''), ',', ''), '\s+', '-');
publisher = re(re(re(re(re(re(re( field('publisher'), '&', 'And'), '^\s+|\s+$', ''), ':', ''), '[\[\]\(\)\{\}]', ''), "'", ''), ',', ''), '\s+', '-');
year = format_date(field('pubdate'),'yyyy');
titleyear = "_" & year;
series_part = test(
field('series'),
'-' & re(re(re(re(re(re(re( field('series'), '&', 'And'), '^\s+|\s+$', ''), ':', ''), '[\[\]\(\)\{\}]', ''), "'", ''), ',', ''), '\s+', '-')
& '-' & format_number(field('series_index'), '0') & '_',
''
);
publisher_part = test(
field('publisher'),
'_' & publisher,
''
);
author & '/' & year & '/' & series_part & title & titleyear & publisher_part
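The nested re() calls are easiest to read inside-out: the innermost call runs first. Here is a rough Python equivalent of the author chain, written as an ordered list of (pattern, replacement) pairs. It assumes calibre's re() behaves like Python's re.sub for these particular patterns, and AUTHOR_RULES and apply_rules are illustrative names, not anything in Calibre:

```python
import re

# Illustrative Python version of the nested re() chain used for the author
# field above, listed innermost-first (i.e. in the order the calls run).
# Assumes calibre's re() matches Python's re.sub for these patterns.
AUTHOR_RULES = [
    (r"\.", ""),              # drop full stops ("Viktor E. Frankl")
    (r"&", "and"),            # "&" -> "and"
    (r"^\s+|\s+$", ""),       # trim leading/trailing whitespace
    (r":", ""),               # drop colons
    (r"[\[\]\(\)\{\}]", ""),  # drop every kind of bracket
    (r"'", ""),               # drop apostrophes
    (r",", ""),               # drop commas
    (r"\s+", "-"),            # remaining whitespace runs -> hyphen
]


def apply_rules(value: str, rules) -> str:
    """Apply (pattern, replacement) pairs in order, like the nested re() calls."""
    for pattern, replacement in rules:
        value = re.sub(pattern, replacement, value)
    return value


print(apply_rules("Viktor E. Frankl", AUTHOR_RULES))  # Viktor-E-Frankl
print(apply_rules("  Lao-tzu ", AUTHOR_RULES))        # Lao-tzu
```

Writing the steps as a flat list also makes it obvious that the title, publisher and series chains are the same list minus the first rule, with "And" instead of "and".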
Of course, just as soon as I'm done fiddling I learn that Calibre also supports Python templating, so I've since rewritten it in Python's far preferable syntax and expanded it to cover a lot more edge cases. The code is pretty straightforward.
NOTE: The "python:" at the beginning is a necessary sigil for Calibre to enter Python Template Mode.
python:
def evaluate(book, context):
    import re
    def clean(s):
        # Normalise a metadata value into a hyphenated, filesystem-safe slug
        if not s:
            return ""
        s = re.sub(r"&", "and", str(s))
        s = re.sub(r"[\.\[\]\(\)\{\}\'\":,]", "", s)
        s = re.sub(r"[\\/<>|?*]", "-", s)
        s = re.sub(r"\s+", "-", s.strip())
        return re.sub(r"[-_]+", lambda m: m.group()[0], s).strip("-_")
    def first_author_name(book):
        # author_sort looks like "Last, First & Last2, First2"; use the first entry
        if a_sort := (book.get("author_sort") or "").strip():
            first = re.split(r"\s*[&;]\s*|\s+and\s+", a_sort)[0].strip()
            if "," in first:
                last, firsts = [p.strip() for p in first.split(",", 1)]
                return f"{firsts} {last}".strip()
            return first
        return ""  # no author_sort; the caller falls back to "Unknown"
    # Build components (named author_clean, not author, to dodge calibre's template rewriting)
    author_clean = clean(first_author_name(book) or "Unknown")
    title = clean(book.get("title", ""))
    publisher = clean(book.get("publisher", ""))
    year = ""
    if pubdate := book.get("pubdate"):
        # Ignore implausible years, e.g. calibre's placeholder for an unknown date
        if hasattr(pubdate, "strftime") and (y := pubdate.strftime("%Y")).isdigit() and int(y) >= 1000:
            year = y
    series_part = ""
    if series := book.get("series"):
        series_part = f"{clean(series)}-{int(book.get('series_index', 0)):0d}_"
    # Assemble path
    path = f"{author_clean}/"
    if year:
        path += f"{year}/"
    path += f"{series_part}{title}"
    if year:
        path += f"_{year}"
    if publisher:
        path += f"_{publisher}"
    return re.sub(r"[-_]+", lambda m: m.group()[0], path).strip("-_")
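Because the template only ever calls book.get(), it can be exercised outside Calibre by passing a plain dict in place of calibre's book object. This is a minimal sketch that assumes a dict is a good enough stand-in for that object for these keys; the sample metadata is made up for illustration:

```python
import re
from datetime import datetime


def evaluate(book, context):
    # Same logic as the template above, minus the "python:" sigil.
    def clean(s):
        if not s:
            return ""
        s = re.sub(r"&", "and", str(s))
        s = re.sub(r"[\.\[\]\(\)\{\}\'\":,]", "", s)
        s = re.sub(r"[\\/<>|?*]", "-", s)
        s = re.sub(r"\s+", "-", s.strip())
        return re.sub(r"[-_]+", lambda m: m.group()[0], s).strip("-_")

    def first_author_name(book):
        if a_sort := (book.get("author_sort") or "").strip():
            first = re.split(r"\s*[&;]\s*|\s+and\s+", a_sort)[0].strip()
            if "," in first:
                last, firsts = [p.strip() for p in first.split(",", 1)]
                return f"{firsts} {last}".strip()
            return first
        return ""

    author_clean = clean(first_author_name(book) or "Unknown")
    title = clean(book.get("title", ""))
    publisher = clean(book.get("publisher", ""))
    year = ""
    if pubdate := book.get("pubdate"):
        if hasattr(pubdate, "strftime") and (y := pubdate.strftime("%Y")).isdigit() and int(y) >= 1000:
            year = y
    series_part = ""
    if series := book.get("series"):
        series_part = f"{clean(series)}-{int(book.get('series_index', 0)):0d}_"
    path = f"{author_clean}/"
    if year:
        path += f"{year}/"
    path += f"{series_part}{title}"
    if year:
        path += f"_{year}"
    if publisher:
        path += f"_{publisher}"
    return re.sub(r"[-_]+", lambda m: m.group()[0], path).strip("-_")


# Hypothetical sample metadata; a dict stands in for calibre's book object.
sample = {
    "author_sort": "Dumas, Alexandre",
    "title": "Count of Monte Cristo (Abridged)",
    "publisher": "Barnes & Noble",
    "pubdate": datetime(2004, 1, 1),
}
print(evaluate(sample, None))
# Alexandre-Dumas/2004/Count-of-Monte-Cristo-Abridged_2004_Barnes-and-Noble
```

Feeding it a few awkward dicts (no author_sort, a series, odd punctuation) is a much faster loop than re-running Save to Disk.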
Things weren't as easy and smooth as they could have been, however. After banging my head against the wall for nearly an hour, having got my Python script exactly where I wanted it, I hit a persistent bug: Calibre kept insisting that one of my variables didn't exist, for no discernible reason…
I found the cause.
For no bloody good reason, Kovid Goyal, the benevolent dictator of Calibre, has opted to run a few arbitrary string replacements on custom templates at runtime.
Buried in a 464-line Python file installed by Calibre at /usr/lib/calibre/calibre/library/save_to_disk.py is this function:
def preprocess_template(template):
    template = template.replace('//', '/')
    template = template.replace('{author}', '{authors}')
    template = template.replace('{tag}', '{tags}')
    if not isinstance(template, str):
        template = template.decode(preferred_encoding, 'replace')
    return template
WHAT! Why are you replacing strings in a user's template, Kovid?! What's most insidious about it is that he's not replacing author, but {author}, braces included, so this bug only bites you when that literal text appears in your template, which in practice means using an author variable inside an f-string (e.g. path = f"{author}/{year}" in my case).
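The rewrite is easy to reproduce outside Calibre. This sketch copies the three replacements from preprocess_template (dropping the bytes branch, on the assumption that only str is passed), runs a tiny made-up template through exec() with and without them, and shows that a renamed variable survives. run_template and the sample template are hypothetical, not Calibre code:

```python
# Sketch: reproduce calibre's template rewrite outside calibre.


def preprocess_template(template: str) -> str:
    # The three replace() calls quoted above; bytes handling omitted.
    template = template.replace('//', '/')
    template = template.replace('{author}', '{authors}')
    template = template.replace('{tag}', '{tags}')
    return template


def run_template(src: str, preprocess: bool):
    """Compile a template body, optionally rewriting it first, and call evaluate()."""
    if preprocess:
        src = preprocess_template(src)
    namespace = {}
    exec(src, namespace)
    return namespace["evaluate"](None, None)


# Hypothetical minimal template using an author variable in an f-string.
BROKEN = '''
def evaluate(book, context):
    author = "Thomas-De-Quincey"
    year = "1821"
    return f"{author}/{year}"
'''
# The same template with the variable renamed.
RENAMED = BROKEN.replace("author", "author_clean")

print(run_template(BROKEN, preprocess=False))  # Thomas-De-Quincey/1821
print(run_template(RENAMED, preprocess=True))  # Thomas-De-Quincey/1821
try:
    run_template(BROKEN, preprocess=True)
except NameError as exc:
    print(exc)                                 # name 'authors' is not defined
```

The NameError only appears at call time, because the rewritten f-string is still valid Python; it just refers to a name nothing defines.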
Now that I know the source of this weird feckin' bug, it's a trivial fix: rename my author variable to anything else. I went with author_clean.
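Since preprocess_template does blind substring replacement, the same trap applies to the other two patterns it rewrites: {tag} becomes {tags}, and // becomes /, which would silently turn Python's floor division into true division. A hypothetical pre-flight check (risky_substrings is an illustrative name, not a Calibre function) could flag all three before a template gets pasted into Calibre:

```python
# Hypothetical pre-flight check: list every line of a template containing a
# substring that calibre's preprocess_template (quoted above) would rewrite.
REWRITTEN = {"//": "/", "{author}": "{authors}", "{tag}": "{tags}"}


def risky_substrings(template: str) -> list[tuple[int, str]]:
    """Return (1-based line number, substring) pairs that calibre would rewrite."""
    hits = []
    for lineno, line in enumerate(template.splitlines(), start=1):
        for needle in REWRITTEN:
            if needle in line:
                hits.append((lineno, needle))
    return hits


template = 'python:\ndef evaluate(book, context):\n    author = "x"\n    return f"{author}/"\n'
for lineno, needle in risky_substrings(template):
    print(f"line {lineno}: {needle!r} -> {REWRITTEN[needle]!r}")
# line 4: '{author}' -> '{authors}'
```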
Bugger me.
For future me, here's the full error that eventually led me back to the source of this mayhem:
calibre, version 8.7.0
ERROR: Error while saving: Failed to save any books to disk, click "Show details" for more information
Failed to save: Confessions of an English Opium-Eater by Thomas De Quincey to disk, with error:
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/utils/formatter.py", line 1770, in _run_python_template
    rslt = compiled_template(self.book, self.python_context_object)
  File "<string>", line 94, in evaluate
NameError: name 'authors' is not defined. Did you mean: 'author'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/library/save_to_disk.py", line 286, in get_path_components
    components = get_components(opts.template, mi, book_id, opts.timefmt, path_length,
        ascii_filename if opts.asciiize else sanitize_file_name,
        to_lowercase=opts.to_lowercase,
        replace_whitespace=opts.replace_whitespace, safe_format=False,
        last_has_extension=False, single_dir=opts.single_dir)
  File "/usr/lib/calibre/calibre/library/save_to_disk.py", line 251, in get_components
    components = Formatter().unsafe_format(template, format_args, mi)
  File "/usr/lib/calibre/calibre/utils/formatter.py", line 1978, in unsafe_format
    return self.evaluate(fmt, [], kwargs, self.global_vars)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/calibre/calibre/utils/formatter.py", line 1891, in evaluate
    ans = self._eval_python_template(fmt[7:], self.column_name)
  File "/usr/lib/calibre/calibre/utils/formatter.py", line 1758, in _eval_python_template
    return self._run_python_template(func, arguments=None)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/calibre/calibre/utils/formatter.py", line 1784, in _run_python_template
    raise ValueError(_('Error in function {0} on line {1} : {2} - {3}').format(
        ss.name, ss.lineno, type(e).__name__, str(e)))
ValueError: Error in function evaluate on line 94 : NameError - name 'authors' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/gui2/save.py", line 138, in do_one_collect
    self.collect_data(book_id)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/lib/calibre/calibre/gui2/save.py", line 146, in collect_data
    components = get_path_components(self.opts, mi, book_id, self.path_length)
  File "/usr/lib/calibre/calibre/library/save_to_disk.py", line 292, in get_path_components
    raise ValueError(_('Failed to calculate path for '
        'save to disk. Template: %(templ)s\n'
        'Error: %(err)s')%dict(templ=opts.template, err=e))
ValueError: Failed to calculate path for save to disk. Template: python:
def evaluate(book, context):
    import re
    import ast
    def clean(s):
        if not s:
            return ""
        s = str(s)
        # Replace & with "and", remove dots, brackets, quotes, colons, commas
        s = re.sub(r"&", "and", s)
        s = re.sub(r"\.", "", s)
        s = re.sub(r"[\[\]\(\)\{\}]", "", s)
        s = re.sub(r"[\'\":,]", "", s)
        s = s.strip()
        # Whitespace -> hyphen
        s = re.sub(r"\s+", "-", s)
        # Collapse repeated separators
        s = re.sub(r"-+", "-", s)
        s = re.sub(r"_+", "_", s)
        # Trim edges
        return s.strip("-_")
    def first_author_name(book):
        # Prefer author_sort: "Last, First & Last2, First2"
        a_sort = book.get("author_sort", "") or ""
        if a_sort:
            first = re.split(r"\s*&\s*|\s*;\s*|\s+and\s+", a_sort)[0].strip()
            if "," in first:
                last, firsts = [p.strip() for p in first.split(",", 1)]
                return f"{firsts} {last}".strip()
            return first
        # Fallback: authors (may be list or a stringified list)
        authors = book.get("authors", [])
        if isinstance(authors, (list, tuple)):
            return authors[0] if authors else ""
        if isinstance(authors, str):
            s = authors.strip()
            if s.startswith("[") and s.endswith("]"):
                try:
                    # Attempt to parse it as a Python literal
                    parsed = ast.literal_eval(s)
                    if isinstance(parsed, (list, tuple)) and parsed:
                        return str(parsed[0])
                except (ValueError, SyntaxError):
                    # If ast fails, fall back to string manipulation.
                    s_no_brackets = s[1:-1].strip()
                    first_author_str = s_no_brackets.split(",")[0].strip()
                    return first_author_str.strip("'\"")
            # If it's a regular string with multiple authors, split by common delimiters
            if "&" in s or "," in s or ";" in s:
                return re.split(r"\s*&\s*|\s*,\s*|\s*;\s*", s)[0].strip()
            return s # It's a single author name as a string
        return ""
    # ---- AUTHOR ----
    author_name = first_author_name(book) or "Unknown"
    author = clean(author_name)
    # ---- TITLE ----
    title = clean(book.get("title", ""))
    # ---- PUBLISHER ----
    publisher = clean(book.get("publisher", ""))
    # ---- YEAR ----
    pubdate = book.get("pubdate", None)
    year = ""
    if pubdate and hasattr(pubdate, "strftime"):
        y = pubdate.strftime("%Y")
        if y.isdigit() and int(y) >= 1000:
            year = y
    # ---- SERIES ----
    series = book.get("series", "")
    if series:
        series_index = book.get("series_index", 0)
        series_part = f"-{clean(series)}-{int(series_index):0d}_"
    else:
        series_part = ""
    # ---- PUBLISHER PART ----
    publisher_part = f"_{publisher}" if publisher else ""
    # ---- TITLE+YEAR ----
    titleyear = f"_{year}" if year else ""
    # ---- BUILD PATH ----
    path = f"{authors}/"
    if year:
        path += f"{year}/"
    path += f"{series_part}{title}{titleyear}{publisher_part}"
    # Final safety pass
    path = re.sub(r"-+", "-", path)
    path = re.sub(r"_+", "_", path)
    return path.strip("-_")
Error: Error in function evaluate on line 94 : NameError - name 'authors' is not defined
And lastly, here's a sample of my ebooks nicely organised.
library/documents/books
├── Alexandre-Dumas
│   └── 2004
│       └── Count-of-Monte-Cristo-Abridged_2004_Barnes-and-Noble.epub
├── Arundhati-Roy
│   └── 2020
│       └── Azadi-Freedom-Fascism-Fiction_2020_Haymarket-Books.epub
├── Avvaiyar
│   └── 2009
│       └── Give-Eat-and-Live-Poems-of-Avvaiyar_2009_Red-Hen-Press.pdf
├── Ellen-Lupton
│   ├── 2010
│   │   └── Thinking-with-Type-A-Critical-Guide-for-Designers-Writers-Editors-and-Students-2nd-Edition_2010.epub
├── George-Saunders
│   └── 2021
│       └── A-Swim-in-a-Pond-in-the-Rain_2021_Random-House-Publishing-Group.epub
├── Julia-Cameron
│   └── 2016
│       └── The-Artists-Way-25th-Anniversary-Edition_2016_Penguin-Publishing-Group.epub
├── Kyle-Siemens
│   └── 2022
│       └── Piranha-Fishing-in-the-Amazon_2022.pdf
├── Lao-tzu
│   └── 1996
│       └── Taoteching-With-Selected-Commentaries-from-the-Past-2000-Years_1996_Red-Pine.pdf
├── Martha-Beck
│   └── 2021
│       └── The-Way-of-Integrity-Finding-the-Path-to-Your-True-Self_2021_Penguin-Publishing-Group.epub
├── Raynor-Winn
│   └── 2018
│       └── The-Salt-Path_2018_Penguin-Books-Limited.epub
├── Sir-Ernest-Henry-Shackleton
│   └── 2012
│       └── South-The-Story-of-Shackletons-Last-Expedition-1914-1917_2012_Duke-Classics.epub
├── Susan-Sontag
│   └── 2021
│       └── On-Photography_2021.pdf
├── Ta-Nehisi-Coates
│   └── 2015
│       └── Between-the-World-and-Me_2015_Random-House-Publishing-Group.epub
└── Viktor-E-Frankl
    └── 2006
        └── Mans-Search-for-Meaning_2006_Beacon-Press.epub
I also dug out my PDF fix-up tricks for a couple of PDFs. The first was just missing its %%EOF end-of-file marker, so:
pdftk \
Cappadocia-A-Travel-Guide-Through-the-Land-of-Fairychimneys-and-Rock-Castles_2010_Books-on-Demand.pdf \
output \
Cappadocia-A-Travel-Guide-Through-the-Land-of-Fairychimneys-and-Rock-Castles_2010_Books-on-Demand-Repaired.pdf
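To check which PDFs actually lack the marker before (or after) running pdftk, a quick scan of the end of each file is enough. The PDF specification expects the last line of a file to be the %%EOF marker; this sketch only looks within the final 1024 bytes, which is an assumed tolerance for trailing junk rather than anything pdftk itself does. has_eof_marker is a hypothetical helper:

```python
import os


def has_eof_marker(path: str, tail_bytes: int = 1024) -> bool:
    """Return True if b"%%EOF" occurs within the last `tail_bytes` bytes of the file.

    The 1024-byte window is an assumed tolerance for trailing junk after the
    marker, not a value taken from pdftk.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Only read the tail; large PDFs never need to be loaded whole.
        fh.seek(max(0, size - tail_bytes))
        return b"%%EOF" in fh.read()
```

For example, has_eof_marker on the original Cappadocia file should come back False, and on the -Repaired copy True.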
The second had major xref (cross-reference table) issues throughout, and even using Ghostscript to try to fully rewrite it proved insufficient :(
ghostscript \
-o Cool-Tools-A-Catalog-of-Possibilities_2014-Repaired.pdf \
-sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
Cool-Tools-A-Catalog-of-Possibilities_2014.pdf
I'll have to revisit that one another time. The file still opens fine because PDF readers are tolerant, but since it's malformed I can't fix up all the metadata I've added.
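When I do, one cheap first diagnostic is whether the final startxref offset actually lands on a cross-reference section: either a classic table beginning with the xref keyword, or (PDF 1.5 and later) a cross-reference stream object. This is a hypothetical helper and a deliberately shallow check; a file can pass it and still have broken individual entries, and it repairs nothing:

```python
import re
from pathlib import Path


def startxref_points_at_xref(data: bytes) -> bool:
    """Shallow check: does the last startxref offset land on an xref section?

    A classic cross-reference table begins with the keyword "xref"; a PDF 1.5+
    cross-reference stream begins with an indirect-object header "N G obj".
    """
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if not offsets:
        return False
    offset = int(offsets[-1])  # the last startxref is the one readers use
    if offset >= len(data):
        return False
    head = data[offset:offset + 32]
    return head.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj\b", head) is not None


def check_file(path: str) -> bool:
    # Reads the whole file into memory, which is fine for ebook-sized PDFs.
    return startxref_points_at_xref(Path(path).read_bytes())
```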