Lexical Structure

This chapter specifies the lexical structure of Dada programs. A Dada source file is a sequence of Unicode characters, which the lexer converts into a sequence of tokens.

Source Encoding

syntax.lexical-structure.source-encoding

Dada source files are encoded as UTF-8.

Tokens

syntax.lexical-structure.tokens

The lexer produces a sequence of tokens. A token Token is one of the following kinds:


Token ::= Identifier
          | Keyword
          | Literal
          | Operator
          | Delimiter

syntax.lexical-structure.tokens.preceding-whitespace

Each token records whether it was preceded by whitespace, a newline, or a comment. Whitespace, newlines, and comments do not produce tokens of their own; the parser consults this recorded information instead.

Whitespace and Comments

Whitespace

syntax.lexical-structure.whitespace-and-comments.whitespace

Whitespace characters (spaces, tabs, and other Unicode whitespace excluding newlines) separate tokens but are otherwise not significant.

syntax.lexical-structure.whitespace-and-comments.whitespace.newlines

Newline characters (\n) are tracked by the lexer. Whether a token is preceded by a newline may affect how the parser interprets certain constructs.

Comments

syntax.lexical-structure.whitespace-and-comments.comments

A comment begins with # and extends to the end of the line.

syntax.lexical-structure.whitespace-and-comments.comments.content

The content of a comment, including the leading #, is ignored by the lexer. A comment implies a newline for the purpose of preceding-whitespace tracking.
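
The whitespace, newline, and comment rules above can be sketched as a single "skip" step that runs before each token. This is an illustrative Python sketch, not the Dada lexer itself: the names Preceding and skip_trivia are hypothetical, and keeping the three flags independent of one another is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Preceding:
    """Hypothetical record of what appeared before a token."""
    whitespace: bool = False
    newline: bool = False
    comment: bool = False


def skip_trivia(src: str, pos: int) -> tuple[int, Preceding]:
    """Skip whitespace and comments starting at pos.

    Returns the position of the next token (or len(src)) together with
    flags describing what was skipped.
    """
    seen = Preceding()
    while pos < len(src):
        ch = src[pos]
        if ch == "\n":
            seen.newline = True
            pos += 1
        elif ch.isspace():
            # Any other Unicode whitespace character.
            seen.whitespace = True
            pos += 1
        elif ch == "#":
            seen.comment = True
            # A comment implies a newline for tracking purposes.
            seen.newline = True
            while pos < len(src) and src[pos] != "\n":
                pos += 1
        else:
            break
    return pos, seen
```

For the input "  # note\nlet", calling skip_trivia at position 0 stops at the l of let with all three flags set.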

`Identifier` definition

syntax.lexical-structure.identifier-definition

An identifier Identifier begins with a Unicode alphabetic character or underscore (_), followed by zero or more Unicode alphanumeric characters or underscores, provided it is not a keyword Keyword:


Identifier ::= (Alphabetic | _) (Alphanumeric | _)*    (not a Keyword)

syntax.lexical-structure.identifier-definition.case-sensitivity

Identifiers are case-sensitive.
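
As a rough illustration of the identifier rule, the following Python sketch classifies a word as an identifier, a keyword, or neither. The function names are hypothetical, and Python's str.isalpha and str.isalnum are used only as an approximation of "Unicode alphabetic" and "Unicode alphanumeric"; the exact character classes used by the Dada lexer may differ.

```python
# Keyword set copied from the Keyword grammar in this chapter.
KEYWORDS = frozenset({
    "as", "async", "await", "class", "else", "enum", "export", "false",
    "fn", "give", "given", "if", "is", "let", "match", "mod", "mut", "my",
    "our", "perm", "pub", "ref", "return", "self", "share", "shared",
    "struct", "true", "type", "unsafe", "use", "where",
})


def is_identifier_start(ch: str) -> bool:
    return ch == "_" or ch.isalpha()


def is_identifier_continue(ch: str) -> bool:
    return ch == "_" or ch.isalnum()


def classify_word(word: str) -> str:
    """Return "keyword", "identifier", or "invalid" for a whole word."""
    if not word or not is_identifier_start(word[0]):
        return "invalid"
    if not all(is_identifier_continue(c) for c in word[1:]):
        return "invalid"
    # The comparison is exact, so keywords are matched case-sensitively.
    return "keyword" if word in KEYWORDS else "identifier"
```

Because the lookup is case-sensitive, classify_word("self") reports a keyword while classify_word("Self") reports an identifier.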

`Keyword` definition

syntax.lexical-structure.keyword-definition

The following words are reserved as keywords:


Keyword ::= as
            | async
            | await
            | class
            | else
            | enum
            | export
            | false
            | fn
            | give
            | given
            | if
            | is
            | let
            | match
            | mod
            | mut
            | my
            | our
            | perm
            | pub
            | ref
            | return
            | self
            | share
            | shared
            | struct
            | true
            | type
            | unsafe
            | use
            | where

`Operator` definition

syntax.lexical-structure.operator-definition

The following single characters are recognized as operator tokens:


Operator ::= + | - | * | / | % | = | !
           | < | > | & | | | : | , | . | ; | ?

.plus +
.minus -
.star *
.slash /
.percent %
.equals =
.bang !
.less-than <
.greater-than >
.ampersand &
.pipe |
.colon :
.comma ,
.dot .
.semicolon ;
.question ?

syntax.lexical-structure.operator-definition.multi-character

Multi-character operators such as &&, ||, ==, <=, >=, and -> are formed by the parser from adjacent operator tokens.
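
One way a parser might form multi-character operators from single-character tokens is sketched below in Python. Several details are assumptions made for illustration: the Op record and combine_operators function are hypothetical, "adjacent" is read as "the second token has no preceding whitespace, newline, or comment", and the MULTI set contains only the examples listed above rather than a complete operator table.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical single-character operator token."""
    char: str
    preceded_by_trivia: bool  # whitespace, newline, or comment came before


# Only the examples named in the text; not an exhaustive list.
MULTI = {"&&", "||", "==", "<=", ">=", "->"}


def combine_operators(ops: list[Op]) -> list[str]:
    """Greedily merge adjacent operator tokens into two-character operators."""
    out = []
    i = 0
    while i < len(ops):
        if (
            i + 1 < len(ops)
            and not ops[i + 1].preceded_by_trivia
            and ops[i].char + ops[i + 1].char in MULTI
        ):
            out.append(ops[i].char + ops[i + 1].char)
            i += 2
        else:
            out.append(ops[i].char)
            i += 1
    return out
```

Under these assumptions, - immediately followed by > combines into ->, while - followed by a space and then > stays as two separate operators.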

`Delimiter` definition

syntax.lexical-structure.delimiter-definition

A delimited token contains a matched pair of brackets and their contents:


Delimiter ::= ( Token* ) | [ Token* ] | { Token* }

.parentheses Parentheses: ( and ).
.square-brackets Square brackets: [ and ].
.curly-braces Curly braces: { and }.

syntax.lexical-structure.delimiter-definition.balanced

Delimiters must be balanced. An opening delimiter without a matching closing delimiter is an error.

syntax.lexical-structure.delimiter-definition.nesting

The lexer tracks delimiter nesting. Content between matching delimiters is treated as a unit, which enables deferred parsing of function bodies and other nested structures.
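
The nesting rule can be illustrated with a small stack-based grouping pass, written here in Python over plain token strings. This is a sketch under stated assumptions rather than the actual lexer: nest and UnbalancedDelimiter are hypothetical names, and rejecting a stray or mismatched closing delimiter is an assumption, since the text above only names the unclosed-opening case as an error.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


class UnbalancedDelimiter(Exception):
    """Raised when delimiters do not balance."""


def nest(tokens: list[str]) -> list:
    """Group a flat token list into nested (opener, children) units."""
    root: list = []
    # Each stack entry is (opening delimiter or None, list of children).
    stack = [(None, root)]
    for tok in tokens:
        if tok in PAIRS:
            children: list = []
            stack[-1][1].append((tok, children))
            stack.append((tok, children))
        elif tok in CLOSERS:
            opener, _ = stack[-1]
            if opener is None or PAIRS[opener] != tok:
                # Assumption: stray or mismatched closers are also errors.
                raise UnbalancedDelimiter(f"unexpected {tok!r}")
            stack.pop()
        else:
            stack[-1][1].append(tok)
    if len(stack) > 1:
        raise UnbalancedDelimiter(f"unclosed {stack[-1][0]!r}")
    return root
```

Each (opener, children) pair is a self-contained unit, so a later pass could parse a function body's children on demand without rescanning the source for the matching brace.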

`Literal` definition

syntax.lexical-structure.literal-definition

A literal Literal is one of the following:


Literal ::= IntegerLiteral
            | BooleanLiteral
            | StringLiteral

`IntegerLiteral` definition

syntax.lexical-structure.literal-definition.integerliteral-definition

An integer literal IntegerLiteral is a sequence of one or more ASCII decimal digits (0–9), optionally separated by underscores (_) that do not affect the value:


IntegerLiteral ::= Digit (_? Digit)*
Digit ::= 0 | 1 | ... | 9
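
The IntegerLiteral grammar maps directly onto a regular expression. The Python sketch below is illustrative only; lex_integer is a hypothetical helper, and the character class [0-9] is used instead of \d because \d in Python also matches non-ASCII digits.

```python
import re

# Digit (_? Digit)* : at most one underscore between digits,
# and no leading or trailing underscore.
INTEGER_LITERAL = re.compile(r"[0-9](?:_?[0-9])*")


def lex_integer(src: str, pos: int = 0):
    """Return (value, end) for an integer literal at pos, or None."""
    m = INTEGER_LITERAL.match(src, pos)
    if m is None:
        return None
    # Underscores are separators only and do not affect the value.
    return int(m.group().replace("_", "")), m.end()
```

With this pattern, 1_000_000 is read as a single literal, while in 12__3 only the leading 12 matches, because the grammar allows at most one underscore between digits.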

`BooleanLiteral` definition

syntax.lexical-structure.literal-definition.booleanliteral-definition

The keywords true and false are boolean literals:


BooleanLiteral ::= true | false

`StringLiteral` definition

syntax.lexical-structure.literal-definition.stringliteral-definition

String literal syntax is specified in String Literals.

Lexical Errors

syntax.lexical-structure.lexical-errors

Characters that do not begin a valid token are accumulated and reported as a single error spanning the invalid sequence.
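
The accumulation rule can be approximated by collecting maximal runs of unacceptable characters into spans. The Python sketch below works character by character and is only meant for fragments without comments or string literals; find_invalid_runs and is_ok are hypothetical names, and the is_ok predicate is a simplified stand-in for "begins a valid token", not the lexer's real test.

```python
def find_invalid_runs(src: str, is_valid) -> list[tuple[int, int]]:
    """Return (start, end) spans of maximal runs rejected by is_valid."""
    spans = []
    start = None
    for i, ch in enumerate(src):
        if is_valid(ch):
            if start is not None:
                # A valid character ends the current invalid run.
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(src)))
    return spans


def is_ok(ch: str) -> bool:
    """Simplified predicate: whitespace, word characters, operators, brackets."""
    return (
        ch.isspace()
        or ch == "_"
        or ch.isalnum()
        or ch in "+-*/%=!<>&|:,.;?()[]{}"
    )
```

Each returned span would correspond to one reported error, so a run such as @@$ yields a single diagnostic instead of three.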

Dada Language Specification