Syntax

This chapter describes how Hy source code is understood at the level of text, as well as the abstract syntax objects that the reader (a.k.a. the parser) turns text into, as when invoked with hy.read. The basic units of syntax at the textual level are called forms, and the basic objects representing forms are called models.

An introduction to models

Reading a Hy program produces a nested structure of model objects. Models can be very similar to the kind of value they represent (such as Integer, which is a subclass of int) or they can be somewhat different (such as Set, which is ordered, unlike actual sets). All models inherit from Object, which stores textual position information, so tracebacks can point to the right place in the code. The compiler takes whatever models are left over after parsing and macro expansion and translates them into Python ast nodes (e.g., Integer becomes ast.Constant), which can then be evaluated or rendered as Python code. Macros (that is, regular macros, as opposed to reader macros) operate on the model level, taking some models as arguments and returning more models for compilation or further macro expansion; they’re free to do quite different things with a given model than the compiler does, if it pleases them to, like using an Integer to construct a Symbol.

In general, a model doesn’t count as equal to the value it represents. For example, (= (hy.models.String "foo") "foo") returns False. But you can promote a value to its corresponding model with hy.as-model, or you can demote a model with the usual Python constructors like str or int, or you can evaluate a model as Hy code with hy.eval.
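The relationship between a model and the value it represents can be pictured with a small Python stand-in. This is only a sketch under assumptions: the classes `Object` and `Integer` below are hypothetical imitations written for illustration, not Hy's own classes, and the attribute names `start_line` and `start_column` are assumed.

```python
class Object:
    """Hypothetical stand-in for a model base class that carries source position."""
    start_line = None    # assumed attribute name
    start_column = None  # assumed attribute name

    def __eq__(self, other):
        # A model only compares equal to an object of exactly the same class.
        if type(self) is not type(other):
            return False
        return super().__eq__(other)

    def __hash__(self):
        return super().__hash__()


class Integer(Object, int):
    """Hypothetical stand-in for an integer model: an int subclass with position info."""


five = Integer(5)
five.start_line, five.start_column = 1, 1

print(isinstance(five, int))  # the model is still an int
print(five == 5)              # but it is not equal to the plain value
print(int(five) == 5)         # demoting with int() yields a plain int
print(five == Integer(5))     # two models of the same class and value are equal
```

The point of the sketch is the asymmetry: subclassing keeps the model usable wherever an `int` is expected, while the strict equality check keeps models and plain values distinct.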

Models can be created with the constructors, with the quote or quasiquote macros, or with hy.as-model. Explicit creation is often not necessary, because the compiler will autopromote (via hy.as-model) any object it’s trying to evaluate.

Note that when you want plain old data structures and don’t intend to produce runnable Hy source code, you’ll usually be better off using Python’s basic data structures (tuple, list, dict, etc.) than models. Yes, “homoiconicity” is a fun word, but a Hy List won’t provide any advantage over a Python list when you’re managing a list of email addresses or something.

The default representation of models (via hy.repr) uses quoting for readability, so (hy.models.Integer 5) is represented as '5. Python representations (via repr()) use the constructors, and by default are pretty-printed; you can disable this globally by setting hy.models.PRETTY to False, or temporarily with the context manager hy.models.pretty. You can also color these Python representations with colorama by setting hy.models.COLORED to True.

class hy.models.Object

An abstract base class for Hy models, which represent forms.

Non-form syntactic elements

Shebang

If a Hy program begins with #!, Hy assumes the first line is a shebang line and ignores it. It’s up to your OS to do something more interesting with it.

Whitespace

Hy has lax whitespace rules less similar to Python’s than to those of most other programming languages. Whitespace can separate forms (e.g., a b is two forms whereas ab is one) and it can occur inside some forms (like string literals), but it’s otherwise ignored by the reader, producing no models.

The reader only grants this special treatment to the ASCII whitespace characters, namely U+0009 (horizontal tab), U+000A (line feed), U+000B (vertical tab), U+000C (form feed), U+000D (carriage return), and U+0020 (space). Non-ASCII whitespace characters, such as U+2009 (THIN SPACE), are treated as any other character. So yes, you can have exotic whitespace characters in variable names, although this is only especially useful for obfuscated code contests.
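As a rough illustration of the rule above, the following Python sketch splits text on the six ASCII whitespace characters only. The function name `split_on_ascii_whitespace` is made up for this example, and the sketch ignores string literals and every other part of real reading.

```python
import re

# Only the six ASCII whitespace characters listed above act as separators.
_ASCII_WS = re.compile(r"[\t\n\x0b\x0c\r ]+")


def split_on_ascii_whitespace(text):
    """Split text into chunks, treating only ASCII whitespace as a separator."""
    return [chunk for chunk in _ASCII_WS.split(text) if chunk]


print(split_on_ascii_whitespace("a b"))       # two chunks
print(split_on_ascii_whitespace("a\u2009b"))  # one chunk containing THIN SPACE
print("a\u2009b".split())                     # Python's str.split, by contrast, splits here
```

The contrast with `str.split` is the reason to spell the character class out: Python's default notion of whitespace is Unicode-wide, whereas the rule described here is ASCII-only.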

Comments

Comments begin with a semicolon (;) and continue through the end of the line.

There are no multi-line comments in the style of C’s /* */, but you can use the discard prefix or string literals for similar purposes.

Discard prefix

Like Clojure, Hy supports the Extensible Data Notation discard prefix #_, which can be thought of as another kind of comment. When the reader encounters #_, it reads and then discards the following form. Thus #_ is similar to ; except that normal parsing resumes after the next form ends rather than at the start of the next line: [dilly #_ and krunk] is equivalent to [dilly krunk], whereas [dilly ; and krunk] is equivalent to just [dilly. Comments indicated by ; can be nested within forms discarded by #_, but #_ has no special meaning within a comment indicated by ;.

Identifiers

Identifiers are a broad class of syntax in Hy, comprising not only variable names, but any nonempty sequence of characters that are neither ASCII whitespace nor one of the following: ()[]{};"'. The reader will attempt to read each identifier as a numeric literal, then attempt to read it as a keyword if that fails, then fall back on reading it as a symbol if that fails.

Numeric literals

All of Python’s syntax for numeric literals is supported in Hy, resulting in an Integer, Float, or Complex. Hy also provides a few extensions:

  • Commas (,) can be used like underscores (_) to separate digits without changing the result. Thus, 10_000_000_000 may also be written 10,000,000,000.

  • NaN, Inf, and -Inf are understood as literals. Each produces a Float.

  • Hy allows complex literals as understood by the constructor for complex, such as 5+4j. (This is also legal Python, but Hy reads it as a single Complex, and doesn’t otherwise support infix addition or subtraction, whereas Python parses it as an addition expression.)
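The reading order for identifiers (numeric literal, then keyword, then symbol) and the numeric extensions listed above can be sketched in plain Python. This is a simplified illustration, not the reader's actual code: `classify` and its helpers are hypothetical names, results are returned as `(kind, value)` tuples rather than models, and several details are assumptions flagged in the comments.

```python
import math


def _strip_separators(text):
    # Drop "_" and "," used as digit separators. Assumption: the first
    # character is left alone, so something like "_42" is not turned into a number.
    return text[:1] + text[1:].replace("_", "").replace(",", "")


def _caps_ok(text, value):
    # Assumption: infinities and NaNs only count when spelled "Inf" and "NaN".
    if math.isinf(value) and "i" in text.lower() and "Inf" not in text:
        return False
    if math.isnan(value) and "NaN" not in text:
        return False
    return True


def classify(ident):
    """Return (kind, value) for an identifier, trying numbers first."""
    cleaned = _strip_separators(ident)
    for base in (10, 0):  # base 0 lets int() accept 0x, 0o and 0b prefixes
        try:
            return ("Integer", int(cleaned, base))
        except ValueError:
            pass
    try:
        value = float(cleaned)
        if _caps_ok(ident, value):
            return ("Float", value)
    except ValueError:
        pass
    if ident not in ("j", "J"):  # assumption: a bare "j" stays a symbol
        try:
            value = complex(cleaned)
            if _caps_ok(ident, value.real) and _caps_ok(ident, value.imag):
                return ("Complex", value)
        except ValueError:
            pass
    if ident.startswith(":"):
        return ("Keyword", ident[1:])
    return ("Symbol", ident)


for example in ["10,000,000,000", "-Inf", "nan", "5+4j", ":foo", "3fiddy"]:
    print(example, "->", classify(example))
```

Trying the numeric constructors in turn and falling through on `ValueError` mirrors the "attempt, then fall back" wording of the text; anything that no constructor accepts and that lacks a leading colon ends up as a symbol.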

class hy.models.Integer(number, *args, **kwargs)

Represents a literal integer (int).

class hy.models.Float(num, *args, **kwargs)

Represents a literal floating-point real number (float).

class hy.models.Complex(real, imag=0, *args, **kwargs)

Represents a literal floating-point complex number (complex).

Keywords

An identifier starting with a colon (:), such as :foo, is a Keyword.

Literal keywords are most often used for their special treatment in expressions that aren’t macro calls: they set keyword arguments, rather than being passed in as values. For example, (f :foo 3) calls the function f with the parameter foo set to 3. The keyword is also mangled at compile-time. To prevent a literal keyword from being treated specially in an expression, you can quote the keyword, or you can use it itself as a keyword argument, as in (f :foo :bar).

Otherwise, keywords are simple model objects that evaluate to themselves. Users of other Lisps should note that it’s often a better idea to use a string than a keyword, because the rest of Python uses strings for cases in which other Lisps would use keywords. In particular, strings are typically more appropriate than keywords as the keys of a dictionary. Notice that (dict :a 1 :b 2) is equivalent to {"a" 1 "b" 2}, which is different from {:a 1 :b 2} (see Dictionary literals).
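For readers who think in Python, the calls mentioned above correspond roughly to the following. The function `f` is a throwaway defined only for this illustration, and the `my-arg` example is an added assumption showing the hyphen-to-underscore part of mangling.

```python
def f(*args, **kwargs):
    """Throwaway function that reports what it was called with."""
    return args, kwargs


# Hy: (f :foo 3)
print(f(foo=3))

# Hy: (f :my-arg 3), with the keyword name mangled at compile time
print(f(my_arg=3))

# Hy: (dict :a 1 :b 2) builds the same dictionary as {"a" 1 "b" 2}
print(dict(a=1, b=2) == {"a": 1, "b": 2})
```

In each case the keyword never reaches the function as a value; it only names the parameter, which is why the resulting dictionary has string keys.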

The empty keyword : is syntactically legal, but you can’t compile a function call with an empty keyword argument.

class hy.models.Keyword(value, from_parser=False)

Represents a keyword, such as :foo.

Variables

name – The string content of the keyword, not including the leading :. No mangling is performed.

__bool__()

The empty keyword : is false. All others are true.

__call__(data, default=<object object>)

Get the element of data named (hy.mangle self.name). Thus, (:foo bar) is equivalent to (get bar (hy.mangle "foo")).

The optional second parameter is a default value; if provided, any KeyError from get will be caught, and the default returned instead.
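The truthiness and call behavior described above can be imitated in a few lines of Python. This is a sketch only: the `Keyword` class here is a hypothetical stand-in, and `simple_mangle` is a deliberately reduced placeholder that performs just the hyphen-to-underscore step of mangling.

```python
_MISSING = object()  # sentinel meaning "no default was given"


def simple_mangle(name):
    # Simplifying assumption: only the hyphen-to-underscore step is modeled.
    return name.replace("-", "_")


class Keyword:
    """Hypothetical stand-in for a keyword model."""

    def __init__(self, name):
        self.name = name  # content without the leading colon

    def __bool__(self):
        # Only the empty keyword is false.
        return bool(self.name)

    def __call__(self, data, default=_MISSING):
        try:
            return data[simple_mangle(self.name)]
        except KeyError:
            if default is _MISSING:
                raise
            return default


bar = {"foo": 1, "my_key": 2}
print(Keyword("foo")(bar))          # looks up "foo"
print(Keyword("my-key")(bar))       # looks up the mangled name "my_key"
print(Keyword("absent")(bar, "?"))  # falls back to the default
print(bool(Keyword("")))            # the empty keyword is false
```

Using a private sentinel rather than `None` as the default lets a caller pass `None` as a legitimate fallback value.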

Symbols

Symbols are the catch-all category of identifiers. In most contexts, symbols are compiled to Python variable names, after being mangled. You can create symbol objects with the quote operator or by calling the Symbol constructor (thus, Symbol plays a role similar to the intern function in other Lisps). Some example symbols are hello, +++, 3fiddy, $40, just✈wrong, and 🦑.

As a special case, the symbol ... compiles to the Ellipsis object, as in Python.

class hy.models.Symbol(s, from_parser=False)

Represents a symbol.

Symbol objects behave like strings under operations like get, len(), and bool; in particular, (bool (hy.models.Symbol "False")) is true. Use hy.eval to evaluate a symbol.

Mangling

Since the rules for Hy symbols and keywords are much more permissive than the rules for Python identifiers, Hy uses a mangling algorithm to convert its own names to Python-legal names. The steps are as follows:

  1. Remove any leading underscores. Underscores are typically the ASCII underscore _, but they may also be any Unicode character that normalizes (according to NFKC) to _. Leading underscores have special significance in Python, and Python normalizes all Unicode before this test, so we’ll process the remainder of the name and then add the leading underscores back onto the final mangled name.

  2. Convert ASCII hyphens (-) to underscores (_). Thus, foo-bar becomes foo_bar. If the name at this step starts with a hyphen, this first hyphen is not converted, so that we don’t introduce a new leading underscore into the name. Thus --has-dashes? becomes -_has_dashes? at this step.

  3. If the name ends with ASCII ?, remove it and prepend is_. Thus, tasty? becomes is_tasty and -_has_dashes? becomes is_-_has_dashes.

  4. If the name still isn’t Python-legal, make the following changes. A name could be Python-illegal because it contains a character that’s never legal in a Python name or it contains a character that’s illegal in that position.

    • Prepend hyx_ to the name.

    • Replace each illegal character with XfooX, where foo is the Unicode character name in lowercase, with spaces replaced by underscores and hyphens replaced by H. Replace leading hyphens and X itself the same way. If the character doesn’t have a name, use U followed by its code point in lowercase hexadecimal.

    Thus, green☘ becomes hyx_greenXshamrockX and is_-_has_dashes becomes hyx_is_XhyphenHminusX_has_dashes.

  5. Take any leading underscores removed in the first step, transliterate them to ASCII, and add them back to the mangled name. Thus, (hy.mangle '_tasty?) is "_is_tasty" instead of "is__tasty" and (hy.mangle '__-_has-dashes?) is "__hyx_is_XhyphenHminusX_has_dashes".

  6. Finally, normalize any leftover non-ASCII characters. The result may still not be ASCII (e.g., α is already Python-legal and normalized, so it passes through the whole mangling procedure unchanged), but it is now guaranteed that any names are equal as strings if and only if they refer to the same Python identifier.
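The six steps above can be followed in a compact Python sketch. It is an illustration of the listed steps only, not Hy's implementation: `mangle_sketch` and `_escape` are hypothetical names, the legality test in step 4 is assumed to be applied with the leading underscores reattached, and anything the steps do not mention is left out.

```python
import unicodedata

DELIM = "X"


def _escape(char):
    # XfooX, where foo is the lowercase Unicode name with spaces -> "_" and
    # hyphens -> "H", or "U" plus the hexadecimal code point for unnamed characters.
    name = unicodedata.name(char, "")
    if name:
        body = name.lower().replace("-", "H").replace(" ", "_")
    else:
        body = "U{:x}".format(ord(char))
    return DELIM + body + DELIM


def mangle_sketch(name):
    # Step 1: set aside leading underscores (anything that NFKC-normalizes to "_").
    count = 0
    while count < len(name) and unicodedata.normalize("NFKC", name[count]) == "_":
        count += 1
    leading, s = "_" * count, name[count:]
    # Step 2: hyphens become underscores, except a hyphen in first position.
    s = s[:1] + s[1:].replace("-", "_")
    # Step 3: a trailing "?" becomes an "is_" prefix.
    if s.endswith("?"):
        s = "is_" + s[:-1]
    # Step 4: if still not a legal Python name, prefix "hyx_" and escape
    # every character that could not appear after the first position.
    if s and not (leading + s).isidentifier():
        s = "hyx_" + "".join(
            c if c != DELIM and ("a" + c).isidentifier() else _escape(c)
            for c in s
        )
    # Step 5: put the leading underscores back, now as ASCII.
    s = leading + s
    # Step 6: normalize whatever non-ASCII characters remain.
    return unicodedata.normalize("NFKC", s)


for example in ["foo-bar", "tasty?", "--has-dashes?", "green☘", "_tasty?"]:
    print(example, "->", mangle_sketch(example))
```

Running the sketch on `foo-bar` and `foo_bar` gives the same result, which is a quick way to see that the mapping is not one-to-one.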

You can invoke the mangler yourself with the function hy.mangle, and try to undo this (perhaps not quite successfully) with hy.unmangle.

Mangling isn’t something you should have to think about often, but you may see mangled names in error messages, the output of hy2py, etc. A catch to be aware of is that mangling, as well as the inverse “unmangling” operation offered by hy.unmangle, isn’t one-to-one. Two different symbols can mangle to the same string and hence compile to the same Python variable. The chief practical consequence of this is that (non-initial) - and _ are interchangeable under mangling, so you can’t use e.g. foo-bar and foo_bar as separate variables.

String literals

Hy allows double-quoted strings (e.g., "hello"), but, unlike Python, not single-quoted strings. The single-quote character ' is reserved for preventing the evaluation of a form (e.g., '(+ 1 1)), as in most Lisps (see Additional sugar). Python’s so-called triple-quoted strings (e.g., '''hello''' and """hello""") aren’t supported. However, in Hy, unlike Python, any string literal can contain newlines; furthermore, Hy has bracket strings. For consistency with Python’s triple-quoted strings, all literal newlines in literal strings are read as "\n" (U+000A, line feed) regardless of the newline style in the actual code.

String literals support a variety of backslash escapes. Unrecognized escape sequences are a syntax error. To create a “raw string” that interprets all backslashes literally, prefix the string with r, as in r"slash\not".

Like Python, Hy treats all string literals as sequences of Unicode characters by default. The result is the model type String. You may prefix a string literal with b to treat it as a sequence of bytes, producing Bytes instead.

Unlike Python, Hy only recognizes string prefixes (r, b, and f) in lowercase, and doesn’t allow the no-op prefix u.

class hy.models.String(s=None, brackets=None)

Represents a literal string (str).

Variables

brackets – The custom delimiter used by the bracket string that parsed to this object, or None if it wasn’t a bracket string. The outer square brackets and # aren’t included, so the brackets attribute of the literal #[[hello]] is the empty string.

class hy.models.Bytes

Represents a literal bytestring (bytes).

Bracket strings

Hy supports an alternative form of string literal called a “bracket string” similar to Lua’s long brackets. Bracket strings have customizable delimiters, like the here-documents of other languages. A bracket string begins with #[FOO[ and ends with ]FOO], where FOO is any string not containing [ or ], including the empty string. (If FOO is exactly f or begins with f-, the bracket string is interpreted as a format string.) For example:

=> (print #[["That's very kind of yuo [sic]" Tom wrote back.]])
"That's very kind of yuo [sic]" Tom wrote back.
=> (print #[==[1 + 1 = 2]==])
1 + 1 = 2

Bracket strings are always raw Unicode strings, and don’t allow the r or b prefixes.

A bracket string can contain newlines, but if it begins with one, the newline is removed, so you can begin the content of a bracket string on the line following the opening delimiter with no effect on the content. Any leading newlines past the first are preserved.
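The delimiter rules for bracket strings can be approximated with a regular expression in Python. This is a sketch under assumptions: `read_bracket_string` is a hypothetical helper, it expects the bracket string to start at the beginning of its input, and it assumes newlines have already been normalized to "\n".

```python
import re

# "#[" DELIM "[" content "]" DELIM "]", where DELIM contains no square brackets.
# A single newline directly after the opening delimiter is dropped.
_BRACKET = re.compile(r"#\[([^\[\]]*)\[\n?(.*?)\]\1\]", re.DOTALL)


def read_bracket_string(source):
    """Return (delimiter, content, is_format_string) for a bracket string."""
    match = _BRACKET.match(source)
    if match is None:
        raise ValueError("not a bracket string")
    delimiter, content = match.group(1), match.group(2)
    is_format = delimiter == "f" or delimiter.startswith("f-")
    return delimiter, content, is_format


print(read_bracket_string("#[==[1 + 1 = 2]==]"))
print(read_bracket_string("#[[\n\nhi]]"))      # only the first newline is removed
print(read_bracket_string("#[f-x[{a}]f-x]"))   # delimiter starting with "f-"
```

The backreference `\1` is what makes the closing delimiter match the opening one, so a `]` or even `]]` inside the content is harmless as long as it is not followed by the same delimiter text and a final `]`.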

Format strings

A format string (or “f-string”, or “formatted string literal”) is a string literal with embedded code, possibly accompanied by formatting commands. The result is an FString. Hy f-strings work much like Python f-strings except that the embedded code is in Hy rather than Python.

=> (print f"The sum is {(+ 1 1)}.")
The sum is 2.

Since =, !, and : are identifier characters in Hy, Hy decides where the code in a replacement field ends (and any debugging =, conversion specifier, or format specifier begins) by parsing exactly one form. You can use do to combine several forms into one, as usual. Whitespace may be necessary to terminate the form:

=> (setv foo "a")
=> (print f"{foo:x<5}")
…
NameError: name 'hyx_fooXcolonXxXlessHthan_signX5' is not defined
=> (print f"{foo :x<5}")
axxxx

Unlike Python, whitespace is allowed between a conversion and a format specifier.

Also unlike Python, comments and backslashes are allowed in replacement fields. The same reader is used for the form to be evaluated as for elsewhere in the language. Thus e.g. f"{"a"}" is legal, and equivalent to "a".

class hy.models.FString(s=None, brackets=None)

Represents a format string as an iterable collection of hy.models.String and hy.models.FComponent. The design mimics ast.JoinedStr.

Variables

brackets – As in hy.models.String.

class hy.models.FComponent(s=None, conversion=None)

An analog of ast.FormattedValue. The first node in the contained sequence is the value being formatted. The rest of the sequence contains the nodes in the format spec (if any).

Sequential forms

Sequential forms (Sequence) are nested forms comprising any number of other forms, in a defined order.

class hy.models.Sequence(iterable=(), /)

An abstract base class for sequence-like forms. Sequence models can be operated on like tuples: you can iterate over them, index into them, and append them with +, but you can’t add, remove, or replace elements. Appending a sequence to another iterable object reuses the class of the left-hand-side object, which is useful when e.g. you want to concatenate models in a macro.
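The tuple-like behavior and the class-preserving `+` can be imitated with a small Python sketch. The classes below are hypothetical stand-ins built on `tuple`, written only to illustrate the paragraph above.

```python
class Sequence(tuple):
    """Hypothetical stand-in: an immutable sequence whose + keeps the left class."""

    def __add__(self, other):
        # Concatenate as tuples, then rewrap in the class of the left operand.
        return self.__class__(super().__add__(tuple(other)))


class Expression(Sequence):
    pass


class List(Sequence):
    pass


call = Expression(["print", "x"])
extended = call + ["y"]            # a plain list on the right-hand side
mixed = List(["a"]) + Expression(["b"])

print(type(extended).__name__, tuple(extended))
print(type(mixed).__name__, tuple(mixed))

try:
    call[0] = "display"            # elements cannot be replaced
except TypeError as error:
    print("TypeError:", error)
```

Keeping the left operand's class is what makes `+` convenient in a macro: extending an expression with a plain list of extra arguments still yields an expression.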

Expressions

Expressions (Expression) are denoted by parentheses: ( ). The compiler evaluates expressions by checking the first element. If it’s a symbol, and the symbol has the name of a currently defined macro, the macro is called. Otherwise, the expression is compiled into a Python-level call, with the first element being the calling object. The remaining forms are understood as arguments. Use unpack-iterable or unpack-mapping to break up data structures into individual arguments at runtime.

The empty expression () is legal at the reader level, but has no inherent meaning. Trying to compile it is an error.

class hy.models.Expression(iterable=(), /)

Represents a parenthesized Hy expression.

List, tuple, and set literals

Literal lists (List), tuples (Tuple), and sets (Set) are denoted respectively by [ ], #( ), and #{ }.

class hy.models.List(iterable=(), /)

Represents a literal list.

Many macros use this model type specially, for something other than defining a list. For example, defn expects its function parameters as a square-bracket-delimited list, and for expects a list of iteration clauses.

class hy.models.Tuple(iterable=(), /)

Represents a literal tuple.

class hy.models.Set(iterable=(), /)

Represents a literal set. Unlike actual sets, the model retains duplicates and the order of elements.

Dictionary literals

Literal dictionaries (dict, Dict) are denoted by { }. Counting the first child form as 1, odd-numbered child forms become the keys whereas even-numbered child forms become the values. For example, {"a" 1 "b" 2} produces a dictionary mapping "a" to 1 and "b" to 2. Trying to compile a Dict with an odd number of child models is an error.

As in Python, calling dict with keyword arguments is often more convenient than using a literal dictionary.

class hy.models.Dict(iterable=(), /)

Represents a literal dict. keys, values, and items methods are provided, each returning a list, although this model type does none of the normalization of a real dict. In the case of an odd number of child models, keys returns the last child whereas values and items ignore it.
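The way a flat run of child models splits into keys and values, including the odd-length case, can be sketched in Python. The `Dict` class below is a hypothetical stand-in based on `tuple`, not the model class itself.

```python
class Dict(tuple):
    """Hypothetical stand-in: a flat sequence read as alternating keys and values."""

    def keys(self):
        return list(self[0::2])

    def values(self):
        return list(self[1::2])

    def items(self):
        # zip stops at the shorter list, so an unpaired last key is dropped.
        return list(zip(self.keys(), self.values()))


even = Dict(["a", 1, "b", 2])
odd = Dict(["a", 1, "b", 2, "c"])
repeated = Dict(["a", 1, "a", 2])

print(even.items())
print(odd.keys(), odd.values(), odd.items())
print(repeated.keys())  # no deduplication, unlike a real dict
```

Slicing with a step of two is all the structure there is: nothing is hashed or deduplicated, which is what "none of the normalization of a real dict" amounts to in this sketch.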

Additional sugar

Syntactic sugar is available to construct two-item expressions with certain macros. When the sugary characters are encountered by the reader, a new expression is created with the corresponding macro as the first element and the next parsed form as the second. Thus, since ' is short for quote, 'FORM is read as (quote FORM). No parentheses are required. This is all resolved at the reader level, so the model that gets produced is the same whether you take your code with sugar or without.

Macro            Syntax
annotate         ^FORM
quasiquote       `FORM
quote            'FORM
unpack-iterable  #* FORM
unpack-mapping   #** FORM
unquote          ~FORM
unquote-splice   ~@FORM
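The rewriting that the reader performs for several of the macro and syntax pairs above can be sketched as a prefix lookup in Python. This is a toy under assumptions: `desugar` is a hypothetical helper, it covers only some of the pairs, it leaves the following form as unparsed text, and it represents the resulting two-item expression as a Python tuple.

```python
# Sugar prefix -> macro name, for a subset of the pairs listed above.
SUGAR = {
    "'": "quote",
    "`": "quasiquote",
    "~": "unquote",
    "~@": "unquote-splice",
    "#*": "unpack-iterable",
    "#**": "unpack-mapping",
}


def desugar(text):
    """Rewrite PREFIX FORM as (macro-name, FORM); return other text unchanged."""
    # Try longer prefixes first so "~@" wins over "~" and "#**" over "#*".
    for prefix in sorted(SUGAR, key=len, reverse=True):
        if text.startswith(prefix):
            return (SUGAR[prefix], text[len(prefix):].lstrip(" "))
    return text


print(desugar("'FORM"))
print(desugar("~@xs"))
print(desugar("#** kwargs"))
print(desugar("plain"))
```

Checking the longest prefix first matters because two of the sugar prefixes are themselves prefixes of others.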

Reader macros

A hash (#) followed by a symbol invokes the reader macro named by the symbol. (Trying to call an undefined reader macro is a syntax error.) Parsing of the remaining source code is under control of the reader macro until it returns.