Another Way of Integrating Mozilla Readability in Emacs EWW

A different Reader View experience

📅 10 Apr 2024 | ~3 min read
Tags: #emacs

I was browsing Planet Emacslife yesterday when I came across Anand Tamariya’s post Mozilla Readability in GNU Emacs. I recently wrote a minor mode that achieves the same thing, but Anand and I took very different approaches to get there.

Anand used the official Mozilla Readability package, whereas I use rdrview, a C port of Mozilla’s original JavaScript library. The developer claims that “the code is closely adapted from the Firefox version and the output is expected to be mostly equivalent.”

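He also modified the source code of eww.el, whereas my approach was to use the eww-retrieve-command user option, added in Emacs 28.1, which tells EWW to fetch pages with an external command (the page URL is appended as the last argument) instead of url-retrieve. The following is enough to get started, but I wasn’t entirely happy with the results.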

(setq eww-retrieve-command '("rdrview" "-H"))
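If the same configuration is shared across machines, it may be worth guarding this setting so EWW keeps its default behaviour where rdrview isn’t installed. A minimal sketch, assuming rdrview is expected to be found on exec-path:

```emacs-lisp
;; Use rdrview only when the executable can be found on `exec-path'.
;; Otherwise `eww-retrieve-command' stays at its default (nil), so EWW
;; keeps fetching pages with `url-retrieve'.
(when (executable-find "rdrview")
  (setq eww-retrieve-command '("rdrview" "-H")))
```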

Since Emacs 29.1, EWW can automatically rename its buffers to something more descriptive through the eww-auto-rename-buffer user option. I like each buffer name to show the title of the web page, which the following code accomplishes.

(defun my-eww-rename-buffer ()
  (when (eq major-mode 'eww-mode)
    (when-let ((string (or (plist-get eww-data :title)
                           (plist-get eww-data :url))))
      (format "%s *eww*" string))))

(setq eww-auto-rename-buffer 'my-eww-rename-buffer)
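For comparison, eww-auto-rename-buffer also accepts the symbols title and url directly. A custom function like the one above is only needed for extras such as falling back to the URL and appending an *eww* suffix; the built-in setting is simply:

```emacs-lisp
;; Built-in alternative (Emacs 29.1+): name each EWW buffer after the page title.
(setq eww-auto-rename-buffer 'title)
```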

However, rdrview outputs only the main content of a web page, which does not include the <head> or <title> elements, so EWW has no page title to use. I often have many EWW buffers open, so it is very important that I can quickly distinguish them without having to open each one.

By changing the flags passed to rdrview, I could prepend the page title to its output. Then I just had to write a function that copies that first line into the :title entry of eww-data, and add the function to eww-after-render-hook.

(define-minor-mode eww-rdrview-mode
  "Toggle whether to use `rdrview' to make eww buffers more readable."
  :lighter " rdrview"
  (if eww-rdrview-mode
      ;; Enabling: fetch pages through rdrview (title first) and fix up the title.
      (progn
        (setq eww-retrieve-command '("rdrview" "-T" "title,sitename,body" "-H"))
        (add-hook 'eww-after-render-hook #'eww-rdrview-update-title))
    ;; Disabling: restore EWW's default retrieval and drop the hook.
    (setq eww-retrieve-command nil)
    (remove-hook 'eww-after-render-hook #'eww-rdrview-update-title)))

(defun eww-rdrview-update-title ()
  "Set the :title key in `eww-data' to the first line of the buffer.
With `eww-rdrview-mode' enabled, that line should be the title of
the web page as returned by `rdrview'."
  (save-excursion
    (goto-char (point-min))
    (when-let ((line (thing-at-point 'line t)))
      (plist-put eww-data :title (string-trim line)))))

That’s all you really need to use this minor mode, but my preferred entry point is toggling it on and off from within EWW. The regular eww-readable command is bound to a single key (R) in EWW, and I wanted a command that could be called the same way.

(defun eww-rdrview-toggle-and-reload ()
  "Toggle `eww-rdrview-mode' and reload page in current eww buffer."
  (interactive)
  (if eww-rdrview-mode
      (eww-rdrview-mode -1)
    (eww-rdrview-mode 1))
  ;; Re-fetch the page so the new retrieval command takes effect.
  (eww-reload))
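To call the toggle with a single key press, it can be bound in eww-mode-map. A minimal sketch: the key C-c r is an arbitrary, hypothetical choice (C-c followed by a letter is reserved for users), and R is left alone because it already runs eww-readable.

```emacs-lisp
;; Bind the toggle once eww.el is loaded, so `eww-mode-map' exists.
;; "C-c r" is a hypothetical choice; use any key that is free in your setup.
(with-eval-after-load 'eww
  (define-key eww-mode-map (kbd "C-c r") #'eww-rdrview-toggle-and-reload))
```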

Generally eww-readable is good enough at getting rid of the distracting parts of a web page, but there are still times when it doesn’t work exactly as I want it to. Then again, rdrview doesn’t always work as I’d hoped either.

Since writing the above code, I have found myself using both methods for removing clutter from web pages. I’ll try one, and if I’m not satisfied, I’ll switch to the other.
