Generating Elisp via Babashka
March 18, 2021

Generating Emacs Lisp code with Babashka

Recently, while working on a YAML parser in Emacs Lisp, I realized that YAML was much more complicated than I had imagined. To complete the task I would need to follow the formal specification exactly as written. Looking around, I found a project that expresses the YAML grammar as data in JSON (https://github.com/yaml/yaml-grammar). There were even examples of projects using this grammar to generate parsers. The following is an example of one of the rules:

  "s-indent-le": {
    "(...)": "n",
    "(<<<)": {
      "(all)": [
        {
          "(***)": "s-space"
        },
        {
          "(<=)": [
            {
              "(len)": "(match)"
            },
            "n"
          ]
        }
      ]
    }
  },

I wanted to do something similar with my parser, so I decided to use Clojure to generate the Elisp code that parses the various grammar rules. The final result looks as follows:

(cond
 ((eq state 's-indent-le)
  (let ((n (nth 0 args)))
    (yaml--frame "s-indent-le"
      (yaml--may (yaml--all (yaml--rep2 0 nil (lambda () (yaml--parse-from-grammar 's-space)))
                            (<= (length (yaml--match)) n))))))
 ...)

Overall the project was a success, and I wanted to go into more detail about it. You can find the generation script here: https://github.com/zkry/yaml.el/blob/master/grammargen.bb

S-Expressions

The defining feature of Emacs Lisp and Clojure that makes such a solution simple is that both are built on S-expressions. While they are interpreted differently, and Emacs in particular would have difficulty dealing with some of Clojure’s syntax, the two have very similar syntax for lists. This allowed me to build Clojure lists representing Emacs Lisp code and print them. Some items of interest are:

  • Emacs characters can be defined by creating a symbol out of the Emacs character read syntax: (symbol "?\\x20").
  • Emacs true is just the symbol t in Clojure. Emacs nil can be emitted as Clojure's nil, which also prints as nil.
  • The rest is just nested lists of symbols!
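
To make the points above concrete, here is a minimal sketch of building Elisp forms as Clojure data and printing them. The names char-form and if-form are made up for this illustration and do not come from grammargen.bb. Note that Clojure prints a quoted symbol in its long form, (quote s-space), which Emacs reads the same way as 's-space.

```clojure
;; Build Elisp forms as plain Clojure data, then print them as text.
;; char-form and if-form are hypothetical names used only for this sketch.

(def char-form
  ;; ?\x20 cannot be written as a Clojure literal (the backslash starts a
  ;; character literal), so the symbol is built from a string instead.
  (list 'yaml--chr (symbol "?\\x20")))

(def if-form
  ;; t is an ordinary symbol; Clojure's nil prints as nil.
  (list 'if 't nil (list 'quote 's-space)))

(println (pr-str char-form)) ; prints (yaml--chr ?\x20)
(println (pr-str if-form))   ; prints (if t nil (quote s-space))
```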

When implementing this, I found that recursively calling multimethods that dispatch on the type of each grammar node worked really well. Here is a condensed example:

(defmulti gen-elisp-parse-expr #(.getName (class %)))

(defmethod gen-elisp-parse-expr "java.lang.String" [chr]
  (cond (= "<start-of-line>" chr)
        (list 'yaml--start-of-line)
        (#{"in-flow" "block-key" "flow-out" "flow-in" "block-in" "block-out"} chr)
        chr
        (or (= (count chr) 1) (= (first chr) \x))
        (list 'yaml--chr (gen-elsip-char-symbol chr))
        (= "N" chr)
        (list (prefix-package-symbol "\n"))
        :else
        (list (symbol fn-name) (prefix-package-symbol chr))))

(defmethod gen-elisp-parse-expr "clojure.lang.PersistentVector" [[min max]]
  (list 'yaml--chr-range
        (gen-elsip-char-symbol min)
        (gen-elsip-char-symbol max)))

(defmethod gen-elisp-parse-expr "clojure.lang.PersistentArrayMap" [m]
  (cond
    (get m "(all)")
    (concat (list 'yaml--all) (map gen-elisp-parse-expr (get m "(all)")))

    (get m "(any)")
    (concat (list 'yaml--any) (map gen-elisp-parse-expr (get m "(any)")))


    (get m "(<<<)")
    (list 'yaml--may (gen-elisp-parse-expr (get m "(<<<)")))

    (get m "(<=)")
    (let [[a b] (get m "(<=)")]
      (list '<= (gen-elisp-fn-arg a) (gen-elisp-fn-arg b)))

    (get m "(max)")
    (list 'yaml--max (get m "(max)"))

    ;; else funcall with args
    :else
    (let [[f args] (first m)]
      ;;(println "[debug-2]" (pr-str f) (pr-str args))
      (concat (list (symbol fn-name) (prefix-package-symbol f))
              (map gen-elisp-fn-arg (flatten (list args)))))))
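
The condensed example above relies on helpers that are not shown here. As a self-contained illustration of the same recursive multimethod idea, the following sketch handles only three grammar operators. Everything in it (node-kind, demo-gen, and dispatching on keywords rather than on class names) is an assumption made for illustration, not the code from grammargen.bb.

```clojure
;; Hypothetical, simplified generator covering only a few grammar operators.

(defn node-kind
  "Classify a grammar node so the multimethod can dispatch on it."
  [node]
  (cond (string? node) :rule-name
        (map? node)    :combinator
        :else          :unknown)) ; no method is defined for :unknown, so it throws

(defmulti demo-gen node-kind)

;; A bare string is treated as a reference to another grammar rule.
(defmethod demo-gen :rule-name [rule]
  (list 'yaml--parse-from-grammar (list 'quote (symbol rule))))

;; A map holds a single operator and its argument(s).
(defmethod demo-gen :combinator [m]
  (let [[op args] (first m)]
    (case op
      "(all)" (cons 'yaml--all (map demo-gen args))
      "(any)" (cons 'yaml--any (map demo-gen args))
      "(***)" (list 'yaml--rep2 0 nil (list 'lambda () (demo-gen args)))
      (throw (ex-info "unsupported operator in this sketch" {:op op})))))

(println (pr-str (demo-gen {"(all)" [{"(***)" "s-space"} "ns-char"]})))
```

Dispatching on keywords from a small classifier function is one possible choice here; it avoids depending on the concrete Java class of a map, which can differ between small and large maps.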

Macros

One thing I found that helped was macros. For example, in the following generated Elisp code,

(yaml--any (yaml--but (lambda () (yaml--parse-from-grammar 'ns-plain-safe c))
                      (lambda () (yaml--chr ?\:))
                      (lambda () (yaml--chr ?\#)))
           (yaml--all (yaml--chk "<=" (yaml--parse-from-grammar 'ns-char))
                      (yaml--chr ?\#))
           (yaml--all (yaml--chr ?\:)
                      (yaml--chk "=" (yaml--parse-from-grammar 'ns-plain-safe c))))

yaml--any and yaml--all are macros that run the code with the correct semantics. yaml--any saves the parsing position and runs the forms in order, returning the first success, or failing and resetting the parsing position if none of the sub-rules succeed. Doing this with just functions would be more verbose and harder to follow:

(lambda ()
  (yaml--any
   (lambda ()
     (yaml--but (lambda () (yaml--parse-from-grammar 'ns-plain-safe c))
                (lambda () (yaml--chr ?\:))
                (lambda () (yaml--chr ?\#))))
   (lambda ()
     (yaml--all (lambda () (yaml--chk "<=" (yaml--parse-from-grammar 'ns-char)))
                (lambda () (yaml--chr ?\#))))
   (lambda ()
     (yaml--all (lambda () (yaml--chr ?\:))
                (lambda () (yaml--chk "=" (yaml--parse-from-grammar 'ns-plain-safe c)))))))
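
On the generator side, the difference between the two styles comes down to whether each sub-form has to be wrapped before it is emitted. The sketch below shows that difference with made-up helper names (wrap-lambda, gen-all-with-macros, gen-all-with-functions); it is an illustration under those assumptions rather than code from grammargen.bb.

```clojure
;; Hypothetical helpers contrasting macro-style and function-style output.

(defn wrap-lambda
  "Wrap an already-generated Elisp form in a zero-argument lambda."
  [form]
  (list 'lambda () form))

(defn gen-all-with-macros
  "If yaml--all is a macro, sub-forms can be emitted as they are."
  [forms]
  (cons 'yaml--all forms))

(defn gen-all-with-functions
  "If yaml--all were a function, each sub-form would need its own lambda."
  [forms]
  (cons 'yaml--all (map wrap-lambda forms)))

(def sub-forms
  [(list 'yaml--chr (symbol "?\\:"))
   (list 'yaml--chr (symbol "?\\#"))])

(println (pr-str (gen-all-with-macros sub-forms)))
;; prints (yaml--all (yaml--chr ?\:) (yaml--chr ?\#))
(println (pr-str (gen-all-with-functions sub-forms)))
;; prints (yaml--all (lambda () (yaml--chr ?\:)) (lambda () (yaml--chr ?\#)))
```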

Conclusion

If you ever need to generate Emacs Lisp code and already enjoy using Clojure, Babashka is a great tool for writing the generation script. Granted, S-expressions are so simple that you can easily print them from other languages as well.