Generating Emacs Lisp code with Babashka
Recently, while working on a YAML parser in Emacs Lisp, I realized that YAML was much more complicated than I had imagined. To complete the task I would need to follow the formal specification exactly as written. Looking around, I found a project that expresses the YAML grammar as data in JSON (https://github.com/yaml/yaml-grammar). There were even examples of projects using this grammar to generate parsers. The following is an example of one of the rules:
"s-indent-le": {
  "(...)": "n",
  "(<<<)": {
    "(all)": [
      {
        "(***)": "s-space"
      },
      {
        "(<=)": [
          {
            "(len)": "(match)"
          },
          "n"
        ]
      }
    ]
  }
},
I wanted to do something similar with my parser, so I decided to use Clojure to generate the Elisp code that parses the various grammar rules. The final result looks as follows:
(cond
 ((eq state 's-indent-le)
  (let ((n (nth 0 args)))
    (yaml--frame "s-indent-le"
      (yaml--may (yaml--all (yaml--rep2 0 nil (lambda () (yaml--parse-from-grammar 's-space)))
                            (<= (length (yaml--match)) n))))))
 ...)
Overall the project was a success, and I want to go into more detail about it. You can find the generation script here: https://github.com/zkry/yaml.el/blob/master/grammargen.bb
S-Expressions
The defining feature of Emacs Lisp and Clojure that makes such a solution simple is that both rely on S-expressions. While the two languages interpret them differently, and Emacs in particular would have difficulty reading some of Clojure’s syntax, their list syntax is very similar. This allowed me to print Clojure lists representing Emacs Lisp code. Some items of interest are:
- Emacs characters can be defined by creating a symbol out of the Emacs character reading syntax: (symbol "?\\x20").
- Emacs true is just the symbol t in Clojure. Emacs nil is the symbol nil.
- The rest is just nested lists of symbols!
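As a small illustration of the points above, here is a minimal sketch of building an Elisp form as a Clojure list and printing it. The names space-char, form, and elisp-text are made up for this example and do not come from the generation script.

```clojure
;; Build Elisp forms as ordinary Clojure lists of symbols.
(def space-char (symbol "?\\x20"))          ; prints as the Elisp character ?\x20

(def form
  (list 'if (list 'yaml--chr space-char)   ; Elisp call: (yaml--chr ?\x20)
        't                                  ; Elisp true
        nil))                               ; Elisp nil

;; pr-str renders the nested list as text that Emacs can read.
(def elisp-text (pr-str form))

(println elisp-text)
;; prints: (if (yaml--chr ?\x20) t nil)
```

Note that a quoted Elisp symbol such as 's-space can be emitted as (list 'quote 's-space), which prints as (quote s-space) and reads the same way in Emacs.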
When implementing this, I found that recursive calls to a multimethod dispatching on the type of each grammar node worked really well. Here is a condensed example:
(defmulti gen-elisp-parse-expr #(.getName (class %)))

(defmethod gen-elisp-parse-expr "java.lang.String" [chr]
  (cond (= "<start-of-line>" chr)
        (list 'yaml--start-of-line)
        (#{"in-flow" "block-key" "flow-out" "flow-in" "block-in" "block-out"} chr)
        chr
        (or (= (count chr) 1) (= (first chr) \x))
        (list 'yaml--chr (gen-elsip-char-symbol chr))
        (= "N" chr)
        (list (prefix-package-symbol "\n"))
        :else
        (list (symbol fn-name) (prefix-package-symbol chr))))

(defmethod gen-elisp-parse-expr "clojure.lang.PersistentVector" [[min max]]
  (list 'yaml--chr-range
        (gen-elsip-char-symbol min)
        (gen-elsip-char-symbol max)))

(defmethod gen-elisp-parse-expr "clojure.lang.PersistentArrayMap" [m]
  (cond
    (get m "(all)")
    (concat (list 'yaml--all) (map gen-elisp-parse-expr (get m "(all)")))
    (get m "(any)")
    (concat (list 'yaml--any) (map gen-elisp-parse-expr (get m "(any)")))
    (get m "(<<<)")
    (list 'yaml--may (gen-elisp-parse-expr (get m "(<<<)")))
    (get m "(<=)")
    (let [[a b] (get m "(<=)")]
      (list '<= (gen-elisp-fn-arg a) (gen-elisp-fn-arg b)))
    (get m "(max)")
    (list 'yaml--max (get m "(max)"))
    ;; else funcall with args
    :else
    (let [[f args] (first m)]
      ;;(println "[debug-2]" (pr-str f) (pr-str args))
      (concat (list (symbol fn-name) (prefix-package-symbol f))
              (map gen-elisp-fn-arg (flatten (list args)))))))
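The condensed example relies on helpers that are not shown, so here is a self-contained sketch of the same recursive idea that can be run on its own. It is not the real script: the names kind, gen-expr, and gen-arg are hypothetical, it dispatches on a keyword computed from predicates rather than on the class name, and it handles only the four keys needed for the body of the s-indent-le rule shown earlier. The mapping of "(***)" to yaml--rep2 0 nil is inferred from the generated output above.

```clojure
(defn kind
  "Classify a grammar node for dispatch (an assumption, not the script's approach)."
  [x]
  (cond (string? x) :string
        (map? x)    :map))

(defmulti gen-expr kind)

(defn gen-arg
  "Translate a value used as an argument to a comparison."
  [x]
  (cond
    (= x {"(len)" "(match)"}) '(length (yaml--match))
    (string? x)               (symbol x)
    :else                     x))

(defmethod gen-expr :string [rule-name]
  ;; A bare string names another grammar rule.
  (list 'yaml--parse-from-grammar (list 'quote (symbol rule-name))))

(defmethod gen-expr :map [m]
  (cond
    (contains? m "(all)")
    (cons 'yaml--all (map gen-expr (get m "(all)")))

    (contains? m "(<<<)")
    (list 'yaml--may (gen-expr (get m "(<<<)")))

    (contains? m "(***)")
    (list 'yaml--rep2 0 nil (list 'lambda () (gen-expr (get m "(***)"))))

    (contains? m "(<=)")
    (let [[a b] (get m "(<=)")]
      (list '<= (gen-arg a) (gen-arg b)))))

;; The body of the s-indent-le rule, written as Clojure data.
(def rule
  {"(<<<)" {"(all)" [{"(***)" "s-space"}
                     {"(<=)" [{"(len)" "(match)"} "n"]}]}})

(println (pr-str (gen-expr rule)))
;; prints (on one line):
;; (yaml--may (yaml--all (yaml--rep2 0 nil (lambda () (yaml--parse-from-grammar (quote s-space)))) (<= (length (yaml--match)) n)))
```

Each method only knows how to emit its own form and calls gen-expr again on its children, so supporting a new grammar key means adding one more branch.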
Macros
One thing I found helpful was macros. For example, in the following generated Elisp code,
(yaml--any (yaml--but (lambda () (yaml--parse-from-grammar 'ns-plain-safe c))
                      (lambda () (yaml--chr ?\:))
                      (lambda () (yaml--chr ?\#)))
           (yaml--all (yaml--chk "<=" (yaml--parse-from-grammar 'ns-char))
                      (yaml--chr ?\#))
           (yaml--all (yaml--chr ?\:)
                      (yaml--chk "=" (yaml--parse-from-grammar 'ns-plain-safe c))))
the yaml--any and yaml--all forms are macros that run the code with the correct semantics. yaml--any saves the parsing position and runs the forms in order, returning the first success; if none of the sub-rules succeeds, it fails and resets the parsing position.
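The real macros are written in Emacs Lisp and are not shown here. As a rough illustration of the semantics just described, here is an analogous macro sketched in Clojure. Everything in it (input, pos, chr, any) is hypothetical and only mimics the described behavior; it is not a port of the yaml.el code.

```clojure
;; Hypothetical parser state: an input string and a mutable position.
(def input "a#b")
(def pos (atom 0))

(defn chr
  "Consume character c at the current position and return it, or return nil."
  [c]
  (when (and (< @pos (count input))
             (= c (nth input @pos)))
    (swap! pos inc)
    c))

(defmacro any
  "Evaluate forms in order and return the first truthy result.
  If every form fails, restore the saved position and return nil."
  [& forms]
  `(let [start# (deref pos)]
     (or ~@forms
         (do (reset! pos start#) nil))))

;; Because `any` is a macro, later alternatives are not evaluated once one succeeds.
(any (chr \x) (chr \a))          ; => \a, position advances to 1

;; The only alternative consumes "#" and then fails on "z", so the position is restored.
(any (and (chr \#) (chr \z)))    ; => nil, position is back at 1
```

The key point is that the macro receives its arguments unevaluated, so it can decide when to run each alternative without the caller wrapping anything in a function.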
Doing this with plain functions would be more verbose and harder to follow:
(lambda ()
  (yaml--any
   (lambda ()
     (yaml--but (lambda () (yaml--parse-from-grammar 'ns-plain-safe c))
                (lambda () (yaml--chr ?\:))
                (lambda () (yaml--chr ?\#))))
   (lambda ()
     (yaml--all (lambda () (yaml--chk "<=" (yaml--parse-from-grammar 'ns-char)))
                (lambda () (yaml--chr ?\#))))
   (lambda ()
     (yaml--all (lambda () (yaml--chr ?\:))
                (lambda () (yaml--chk "=" (yaml--parse-from-grammar 'ns-plain-safe c)))))))
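On the generator side, a function-only design would also mean wrapping every sub-form in a lambda before emitting it. A minimal sketch of that extra step, assuming hypothetical helpers named thunk and gen-any-with-thunks that are not part of the actual script:

```clojure
(defn thunk
  "Wrap an Elisp form in a zero-argument lambda so it can be passed as a function."
  [form]
  (list 'lambda () form))

(defn gen-any-with-thunks
  "Emit a yaml--any call whose alternatives are all wrapped in lambdas."
  [forms]
  (cons 'yaml--any (map thunk forms)))

(def alternatives
  [(list 'yaml--chr (symbol "?\\:"))
   (list 'yaml--chr (symbol "?\\#"))])

(println (pr-str (gen-any-with-thunks alternatives)))
;; prints: (yaml--any (lambda () (yaml--chr ?\:)) (lambda () (yaml--chr ?\#)))
```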
Conclusion
If you ever need to generate Emacs Lisp code and already enjoy using Clojure, Babashka is a great tool for writing the generation script. Granted, S-expressions are so simple that you can easily print them from other languages too.