Extracting A Light-Weight Key-Sequence Reader By Progressive Simplification

1 Background

In the previous article entitled On Defining Keys In Emacs I covered the issue of declaring key-sequences when defining keyboard short-cuts. During the last week, I took the underlying Emacs Lisp function edmacro-parse-key and via a process of progressive simplification derived a new new-kbd function that is much simpler and consequently easier to understand. You can see the step-by-step simplification via the Git history for file new-kbd.el (note: used to be called ems-kbd.el). That file contains the final version of the simplified function, along with a test-suite that verifies that it's behavior is consistent with the solution built into Emacs. The updated function is now part of Emacspeak and is named ems-kbd in that package.

The next section gives a high-level overview of the steps that led to the final version.

2 Steps Toward Simplification

2.1 Separate Tokenization From Processing

Function edmacro-parse-keys interweaves the process of tokenizing its input string and how various parts of that string are processed in a single while loop.

The first step in simplification was to separate these steps, by using function split-string to split the input string on whitespace to generate a list of words.

A simple cl-loop is then used to turn each word into a key that is accumulated into a result vector.

2.2 Refactoring Case Analysis

Once tokenization is factored out, the remainder of function edmacro-parse-keys converts each key-specification into either the corresponding string or vector representation.

The original requirement of parsing the serialization of keyboard-macros brought along additional logic that I first eliminated, since my goal was to create a function to be used in defining keyboard-shortcuts.

  • I eliminated code that handled invocation of M-x execute-extended-command during a keyboard-macro.
  • I eliminated processing of comments within the keyboard-macro serialization.

2.3 Rearranging Conditionals

Next, I rearranged conditionals and in that process eliminated cond clauses that were now effectively dead-code.

In the process, I also eliminated test-predicates that had side-effects to hopefully result in less fragile code.

2.4 Lexically Bind Regex Patterns

To improve readability, I created let-bindings to some of the regex patterns used to identify key-sequence patterns. In the process, I also made these more readable by using [:space:] for white-space tests.

2.5 Always Return A Vector

Finally, I setup the new function to always return a vector; function edmacro-parse-keys returns either a string or a vector based on how it is called. Since Emacs now takes a vector in every context where a key-sequence is expected, this simplification does not break when using our simplified function for defining keys.

   (defun new-kbd (string )
  "Simplified and hopefully more robust kbd function.
Always returns a vector i.e. like passing need-vector to edmacro-parse-keys. "
  (let ((res [])
        (special-char-reg "^\\(NUL\\|RET\\|LFD\\|ESC\\|SPC\\|DEL\\)$")
        (modifier+angle-reg "^\\(\\([ACHMsS]-\\)*\\)<\\(.+\\)>$"))
     for word in (split-string string)
     (let* ((key nil))
        ((and ;;; modifier+-<key> without DEL etc
          (not (string-match special-char-reg word))
          (string-match modifier+angle-reg word))
         (setq key
                 (concat ;;; strip < and >
                  (substring word (match-beginning 1) (match-end 1))
                  (substring word (match-beginning 3) (match-end 3)))))))
         (let ((prefix 0)
               (bits 0))
           (while ;;; calculate modifier bits
               (string-match "^[ACHMsS]-." word)
             (cl-incf bits
                       (assq (aref word 0)
                             '((?A . ?\A-\^@)
                               (?C . ?\C-\^@)
                               (?H . ?\H-\^@)
                               (?M . ?\M-\^@)
                               (?s . ?\s-\^@)
                               (?S . ?\S-\^@)))))
             (cl-incf prefix 2)
             (cl-callf substring word 2))
           (when (string-match "^\\^.$" word)
             (cl-incf bits ?\C-\^@)
             (cl-incf prefix)
             (cl-callf substring word 1))
                (assoc word
                       '(("NUL" . "\0")
                         ("RET" . "\r")
                         ("LFD" . "\n")
                         ("TAB" . "\t")
                         ("ESC" . "\e")
                         ("SPC" . " ")
                         ("DEL" . "\177"))))
             (setq word (cdr found)))
           (cond ;;; apply modifiers 
            ((= bits 0) (setq key word))
            ((/= (length word) 1)
             (error "%s: Prefix  must precede a single character, not %s"
                    string word))
              (/= (logand bits ?\C-\^@) 0)
              (string-match "[@-_a-z]" word))
             (setq key
                   (list (+ bits (- ?\C-\^@)
                            (logand (aref word 0) 31)))))
            (t (setq key (list (+ bits (aref word 0)))))))))
;;; push key on to the result vector 
       (when key (cl-callf vconcat res key))))

You can verify the code above by running the tests found at the end of file new-kbd.el — the tests were extracted from the various patterns described in the Elisp Reference, as well as by reading the code in edmacro-parse-keys.

2.6 Closing Thoughts

The above simplification exercise was done by:

  1. Starting with the original edmacro-parse-keys copied over to a new file and renamed to function new-kbd.
  2. Adding a set of tests at the end of file, essentially this is a let that binds a set of tests, then compares the result of calling our new function on each value with that returned by the original.
  3. Modifying and simplifying our new function and running eval-buffer after each step.
  4. It was a fun exercise to see order emerge from chaos at each step!

Date: 2020-10-09 Fri 00:00

Author: T.V Raman

Created: 2020-10-10 Sat 08:48