REP Language Reference
The REP Language is the main glue for supporting various capabilities of the Juji platform. Instead of boring you with BNF or other formal grammars, this reference attempts to illustrate the language with examples and give some intuition behind the design.
The REP language design has evolved slowly over time, mainly driven by use cases. However, there are a few design goals that we strive to achieve.
- As a domain specific language (DSL), REP itself is not designed to be general purpose. However, the complexity of the domain, human conversation, requires the language to be expressive enough to easily specify a large percentage of normal conversations, and to make the rest possible.
- The concepts and constructs of the language should not involve too much incidental complexity. The basis of the language is a rule engine. A small core set of orthogonal and consistent rules should cover the vast majority of cases. A simple language is easier to learn and helps with adoption.
- The code should be easy to write and read, without too much noise or boilerplate. Writing a chatbot should be a fun process, not a chore. In addition to programmers, the target audience includes line-of-business people who are used to writing scripts for applications such as Excel, hardcore gamers who are used to doing game modifications, and technology hobbyists who like to tinker.
- In order to support a graphical user interface layer on top of the DSL, the code should be easily generated and manipulated by programs. Smaller components should be readily composed together to form larger components. Components should be reusable. The most composable solution is to use pure data, and we take this approach.
- To enable functionality beyond rules, the system is designed to be easily extensible by directly embedding user defined functions in the script. The goal is to have a system framework into which advanced technology components such as natural language processing, machine learning and others can be plugged.
Here we summarize the basic syntactic elements used in our language.
Anything following ; is a comment. We use the following data types:
These are primitive values that can be composed into collections.
These are the Boolean logic values.
Every value other than nil and false counts as true in the context of a logic expression.
We parse a number literal into long or double number based on whether there’s a dot in it.
A string literal is enclosed by double quotation marks.
Keywords are symbolic identifiers that evaluate to themselves. They start with a colon (:).
When used inside a pair of parentheses, symbols are identifiers that are used to refer to something else. They often return the value bound to them when evaluated.
These are composite data structures. The following collection types are used in REP extensively.
A vector literal starts with [ and ends with ]. It contains an ordered collection of elements. Each element of the vector can be anything: scalar values or other collections.
These are hashes that map keys to values. A map is enclosed by { and }. The key value pairs inside a map are not ordered.
Lists start with ( and end with ). They are also ordered collections, but they often represent executable code, and the first element of the list tells us what the execution is about, e.g. a control flow construct, a function, or a declaration, and so on.
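As a quick recap of the literals described above, here is a minimal sketch in plain Clojure syntax, which REP scripts share. All concrete values, symbols and key names are made up for illustration and are not taken from any Juji script.

```clojure
;; Everything after a semicolon is a comment.

;; Scalars
nil true false            ; nil and the Boolean logic values
42                        ; no dot in the literal: read as a long
3.14                      ; a dot in the literal: read as a double
"a string"                ; string literal in double quotation marks
:a-keyword                ; keyword, evaluates to itself

;; Collections (quoted here so that the symbols inside are not evaluated)
'[hello "world" 3 [nested vector]]   ; vector: ordered, elements can be anything
'{:name "Juji" :tags [chat bot]}     ; map: unordered key value pairs
'(greet "Juji")                      ; list: the first element says what to execute
```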
Like most natural language processing software, REP breaks up an utterance into a sequence of words, called tokens. In languages such as English, punctuation such as spaces, periods, colons, and so on are the natural boundary between tokens. In REP, the punctuation marks that are not blanks are also regarded as tokens.
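The tokenization rules of this section, including the hyphen exception and the grouping of digits shown in the examples that follow, can be approximated with a single regular expression. This is only a sketch for building intuition: the function name tokenize and the regular expression are assumptions, it handles ASCII letters only, and it does not lower-case or lemmatize anything. It is not the actual REP tokenizer.

```clojure
(defn tokenize
  "Rough approximation of the tokenization rules described in this section:
  consecutive digits form one token, words may contain inner hyphens, and
  every other non-blank punctuation mark is a token of its own."
  [utterance]
  (vec (re-seq #"\d+|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d-]" utterance)))

(tokenize "Hello, world!")         ; => ["Hello" "," "world" "!"]
(tokenize "twenty-five-year-old")  ; => ["twenty-five-year-old"]
(tokenize "2:30pm")                ; => ["2" ":" "30" "pm"]
```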
For example, the string "Hello, world!" is converted into a sequence of four tokens: Hello , world !
The only exception is -, which is not considered a token of its own. For example, "twenty-five-year-old" is a single token.
In addition, we group consecutive digits together as a single token. For example, "2:30pm" is converted into a sequence of four tokens: 2 : 30 pm
In REP, a token could be represented with a symbol, a string, or a regex.
Symbol tokens are first converted to lower case, then into the canonical form (lemma) of their names, so different forms of the same word are treated as the same token. For example, bike, bikes, and BIKES are the same token.
Symbol containing / is not a token
Although / is a token of its own, a symbol containing a / will be treated as a namespaced REP language construct instead of a token.
String tokens are not lemmatized; they are only case insensitive. For example, "bike" and "Bike" are the same token.
For maximal specificity, a token can be specified by a regular expression (regex). A tag #token/regex is used to designate a string as a regex. The string has the same syntax as Java's regex pattern. See Regex tag below for more details.
Token Conversion Precedence
Strings, symbols, and regex can be freely mixed in a pattern, as expected.
Pattern of Rule
REP at its core is a rule language. The patterns of the rules are the basic abstraction of REP. A rule pattern can be used to infer the meaning of the user’s input, or to specify the bot's actions. A pattern used in the former case is called a trigger pattern; one used in the latter case is called an action pattern.
Trigger patterns are similar in concept to regular expressions, but the focus here is on capturing natural language patterns. Therefore, they take tokens instead of characters as the basic units of pattern matches.
There are many types of rule patterns. The following are the basic types of patterns that can be composed together to form a complex pattern.
The simplest type of pattern is an ordered sequence of tokens separated by spaces, represented as a vector of symbols. The name of each symbol suggests the string to be matched. For example, [I love pizza] is a sequence pattern of three tokens.
Used as an action pattern, the sequence pattern will be output as strings separated by spaces.
If we only want to match the literal form of the text, without lemmatization or skipping, we should make the pattern a string, such as "I love pizza".
A string pattern is case insensitive.
Another common type of pattern specifies multiple alternatives that are equivalent for the match. However, there are several cases of matching behaviours for alternatives. For example, should we match zero or more of them (:*), zero or one of them (:?), one or more of them (:+), only one (:1), one to three of them (:1-3), all of them (:a), or anything but them (:0)? We use a keyword to indicate the desired case.
The case indicator keyword has to be the first element of the vector. The order of the remaining elements is ignored, since they are alternatives.
When used in action patterns, the system randomly picks among the alternatives using semantics compatible with matching. For example, :1 will randomly pick one alternative as output; :? will pick one or zero alternatives at random; and so on. The one exception is the :0 case, as it does not make sense in actions. The choices are made at run time.
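To make the case indicator keywords concrete, the following sketch lists one alternative pattern per keyword. The words inside the vectors are invented for illustration, and the patterns are quoted only so that they also read as plain Clojure data; in a REP script they would appear unquoted inside a rule.

```clojure
;; Alternative patterns: the case indicator keyword comes first,
;; the order of the remaining elements does not matter.
'[:*   very really]       ; zero or more of the alternatives
'[:?   please kindly]     ; zero or one of them
'[:+   ha haha]           ; one or more of them
'[:1   born located]      ; exactly one of them
'[:1-3 red green blue]    ; one to three of them
'[:a   fast cheap]        ; all of them
'[:0   never no]          ; anything but them (does not make sense in actions)

;; An alternative can itself be a pattern, e.g. a sequence of two tokens.
'[:1 where [which place] [what place]]
```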
Sometimes we do not know the alternatives and wish to match any words; wildcard patterns are needed for these cases. Similar to regular expressions, we have four wildcard symbols:
- * matches any number of any words, including zero words,
- . matches any one word,
- ? matches zero or any one word, and
- + matches any one or more words.
For convenience, in a sequence pattern such as [I love pizza], the system automatically inserts * between the regular tokens (i.e. symbols or strings), so that the pattern is the same as [I * love * pizza]. However, in other cases, such as between a regular token and a vector pattern, or between two vector patterns, * must be written explicitly if it is desired. For example, the pattern [[:1 where [which place] [what place]] you [:1 born located]] will not match the input “where are you located” due to the extra token “are”, whereas [[:1 where [which place] [what place]] * you [:1 born located]] will match.
If the four wildcard literals, *, ., ? and +, need to appear as a part of text, one needs to double quote them as strings.
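The automatic insertion of * between regular tokens can be modelled with a few lines of Clojure. This is a toy model under stated assumptions, not the REP implementation: the names wildcard?, regular-token? and insert-wildcards are hypothetical, and only the top level of the pattern is processed.

```clojure
;; The four wildcard symbols; (symbol ".") builds the symbol named "."
(def wildcard? #{'* '+ '? (symbol ".")})

(defn regular-token?
  "A regular token is a symbol or a string that is not a wildcard."
  [x]
  (and (or (symbol? x) (string? x))
       (not (wildcard? x))))

(defn insert-wildcards
  "Inserts * between neighbouring regular tokens of a sequence pattern.
  Nothing is inserted next to a vector pattern or an explicit wildcard."
  [pattern]
  (reduce (fn [acc x]
            (if (and (regular-token? (peek acc)) (regular-token? x))
              (conj acc '* x)
              (conj acc x)))
          []
          pattern))

(insert-wildcards '[I love pizza])
;; => [I * love * pizza]

(insert-wildcards '[[:1 where [which place] [what place]] you [:1 born located]])
;; => unchanged, because no two regular tokens are adjacent
```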
If we want to specify concrete numbers of wildcard words or a range of numbers, we need to be explicit.
Wildcard patterns do not make sense in actions, and thus are not allowed there.
The patterns we introduced so far will only match if the input strictly conforms to the prescribed regular grammar. However, it is often desirable to specify a loosely defined containment relationship, such as: the input must contain all the specified patterns (:a), the input must not contain any of the specified patterns (:!), or the input must contain some of the specified patterns. The order of the sub-patterns does not matter for containment.
Containment patterns are not allowed in actions.
On occasions when we need to refine a given pattern to impose further restrictions, two refinement patterns can be used. These patterns start with one of two refinement keywords, the second of which is -. The first part of the pattern following the refinement keyword is the main pattern to be matched, and the rest are the refinements. In addition to matching the first (main) pattern, a requirement pattern requires the subsequent patterns to match as well; conversely, an exclusion pattern excludes the subsequent patterns from matching.
Refinement patterns are not allowed in actions.
We sometimes require a pattern to be at the start or the end of the sentence to match. As can be extrapolated from the above, keyword :0. placed at the beginning (meaning there should be no more tokens in front) or at the end (meaning there should be no more tokens behind) of a pattern can be used to signal these.
For certain syntactic or semantic classes of content, some pre-defined tags can also be used to annotate a pattern, requiring its content to fit the class. Tags are prefixed with #, and are placed in front of the pattern to be annotated.
Parts of Speech tag

| Tag | Meaning | Examples |
|---|---|---|
| #pos/noun | Noun | desk, books, water |
| #pos/verb | Verb | go, enjoy, love |
| #pos/adj | Adjective | superior, one-of-a-kind, the most |
| #pos/adv | Adverb | very, later, lovely |
| #pos/pronoun | Pronoun | she, her, you |
| #pos/preposition | Preposition | on, for, after |
| #pos/particle | Particle | so, up, let |
| #pos/number | Number token | two, third |
| #pos/modal | Modal verb (takes no s ending in the 3rd person) | can, may, must |
| #pos/determiner | Determiner | a, no, the, any, each, that |
| #pos/conjunction | Conjunction | and, but, nor, or, plus, minus |
| #pos/interjection | Interjection | my, oh, please, see, uh, well, yes |

Phrase tag

| Tag | Meaning | Examples |
|---|---|---|
| #phrase/NP | Noun phrase | the police officer's dog, a yellow house |
| #phrase/VP | Verb phrase | was walking, must go, let the fresh air in |
| #phrase/PP | Preposition phrase | in the storefront window, by the river |
| #phrase/ADJP | Adjective phrase | smarter than me, extremely delighted |
| #phrase/ADVP | Adverb phrase | in total silence, quite easily |
| #phrase/sub | Subordinate clause | that, because, while |
| #phrase/other | Not part of any chunk | |

Entity tag

| Tag | Meaning | Examples |
|---|---|---|
| #entity/person | Person name | John, Mary |
| #entity/org | Organization name | UN, IBM |
| #entity/location | Location name | Canada, Main St. |
| #entity/time | Time | tomorrow, around 10:30 |
| #entity/duration | Duration | 5 years, 3 hours |
When a pattern requires sub-token variations, we can use character based regular expressions. A regular expression is represented as a string with a tag #token/regex in front. The syntax of the string follows Java's regular expression syntax.
In REP, the regex tag pattern is restricted to represent a single token only. A regex representing multiple tokens will never match since the input to the pattern is always a single token.
When a token’s case-sensitivity is important, e.g. when matching acronyms, regex tag can be useful.
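The single-token restriction and the case sensitivity of regex matching can be tried out in plain Clojure, whose regex literals compile to Java patterns. The acronym pattern below is a hypothetical example, and the sketch assumes that a regex has to match the whole token, which is why re-matches is used; how REP applies the regex internally is not stated here.

```clojure
;; A hypothetical acronym regex, written in Java regex syntax.
(def acronym #"[A-Z]{2,5}")

(re-matches acronym "IBM")      ; => "IBM"  (an upper case token matches)
(re-matches acronym "ibm")      ; => nil    (the regex is case sensitive)
(re-matches acronym "IBM UN")   ; => nil    (text spanning two tokens cannot match)
```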
Tag patterns are not allowed in actions.
With the exception of the regex tag, the same set of tags indicated above can be specified using namespaced keywords, which represent placeholders for the specified class of content. For example, the tag #pos/noun corresponds to the keyword :pos/noun.
Essentially, a class pattern can be thought of as shorthand for a special case of a tag pattern, where the tagged content is a wildcard pattern.
Patterns can contain lists representing things that can be called to produce results. We evaluate lists recursively using Clojure’s
There are two types of callable in REP.
Clojure Built-in Form
In order to support proper logic branching behavior in action patterns, we implemented some special forms, including if, ourselves to match Clojure’s semantics. This also enables us to support most of Clojure’s branching macros, including condp. The only exception is when-first, due to the way it is implemented in Clojure.
Two types of function call can be included in the patterns.
Functions with names starting with _ allow the generation of dynamic patterns at runtime. Such function calls can appear anywhere in place of a token, as long as they return an appropriate data structure for a pattern. This applies to both trigger and action patterns.
Because function calls are executed during live chat, the calls should not take too long to complete if a good response time is desired.
Regular functions do not generate patterns, but can be used for two purposes:
- Producing side effects, such as displaying a visualization, processing user actions, querying a database, and so on;
- Serving as an additional condition for the trigger pattern. That is to say, if the function return value is nil, the whole match fails. In other words, there’s an implicit and logic relation among the functions within a trigger pattern.
What about or? Well, all rules are implicitly ored together in a topic (see below). One can still explicitly use Clojure logic forms, such as not, within the patterns.
The Juji platform provides a set of built-in functions; see System Functions for details.
Juji system functions have special calling conventions: 1. No namespace is necessary. 2. The first argument should be omitted, for it refers to the chatbot itself.
To define a function, the defn form of Clojure can be used in the script. The defined function resides in the namespace of the script (see below).
In addition to plain compositions of rule patterns, we introduce some important constructs that are useful for writing more sophisticated scripts.
Often we want to reuse a pattern in different places, so we want to assign the pattern a name to refer to it. Such a named pattern is indicated by a symbol.
The visibility of a named pattern depends on where it is defined. If the assignments are done with a top level named-pattern form, the named patterns defined therein are globally accessible. If a named pattern is defined inside a topic (see below), it is visible only within that topic.
The bindings of named patterns happen in the order they appear, so later bindings can refer to previous named patterns. Topic specific named patterns can refer to global named patterns. Topic specific named patterns can also override the global named patterns with the same name.
We encourage the use of named patterns as they promote code reuse and lead to better organized and more readable scripts.
The substitution of a pattern name by the actual pattern it refers to happens at compile time. Named patterns can contain function calls (see below), but not captured content (see below).
We often want to name the content matched by a pattern. A form that looks like (?captured-content-name pattern) can be used to do that, where a symbol starting with ? will be assigned the content matched by the pattern.
The captured content can then be referred to later by its name symbol, for example, ?kind. The reference to captured content is visible within the containing rule, including the remaining parts of the trigger pattern, the action pattern and anonymous followup topics.
For the most common case of capturing the + wildcard pattern, i.e. capturing any one or more tokens, we can use a shorthand: just the ?-starting symbol itself, for example ?kind.
It is often desirable to normalize the captured content into a standard format before storing it in the captured variable. We can add an additional argument to the capture form to do this. This third element of the capture form can be either a function or a value. If it is a value, it will simply replace the captured content. If it is a function, it must be a variadic function with more than one parameter: the first parameter is the REP instance, and the remaining parameters each correspond to a captured token.
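A normalizing function of the required shape could look like the sketch below. The function name normalize-city is hypothetical, and the sketch assumes that the captured tokens arrive as strings; the actual token representation passed by REP may differ.

```clojure
(require '[clojure.string :as str])

(defn normalize-city
  "Hypothetical normalizer for captured content. The first parameter is the
  REP instance (unused here); the remaining parameters are the captured
  tokens, assumed in this sketch to be strings."
  [rep & tokens]
  (str/join " " (map str/capitalize tokens)))

;; Called directly, with nil standing in for the REP instance:
(normalize-city nil "new" "york")   ; => "New York"
```

Such a function would be given as the third element of a capture form, so that the captured variable holds the normalized value instead of the raw tokens.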
Capturing content is not allowed in actions, but referring to the captured content in action patterns is its intended use, where we gather user input.
A useful use case for the class pattern is to combine it with capturing.
?-starting Symbol Resolution Precedence
The resolution of a symbol starting with ? uses the following search precedence. First, check if it is an argument of the containing topic. Then, check if it is a reference to captured content. Next, if this is part of an anonymous followup topic (see below), check whether it is inherited from parent topics. If all of the above fail, it is treated as a shorthand for capturing the + wildcard pattern.
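The search precedence can be pictured as a chain of lookups. The following is a toy model only: the function name and the three lookup tables (topic-args, captured, inherited) are hypothetical plain maps invented for this illustration and do not reflect how REP stores this information.

```clojure
(defn resolve-?symbol
  "Returns a pair [source value] for a ?-starting symbol, trying each
  source in the order described above."
  [sym {:keys [topic-args captured inherited]}]
  (cond
    (contains? topic-args sym) [:topic-argument (get topic-args sym)]
    (contains? captured sym)   [:captured-content (get captured sym)]
    (contains? inherited sym)  [:inherited (get inherited sym)]
    ;; nothing found: the symbol is a shorthand for capturing the + wildcard
    :else                      [:capture-shorthand '+]))

(resolve-?symbol '?kind {:topic-args {'?kind "pizza"}
                         :captured   {'?kind "pasta"}})
;; => [:topic-argument "pizza"]

(resolve-?symbol '?food {})
;; => [:capture-shorthand +]
```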
In the Conceptual Overview, we have seen that topics are the building blocks of REP script.
Define and Use
The declaration of a topic is represented by a list with deftopic as the first element, then a symbol as the name of the topic, followed by a vector of parameters. A topic may take zero or more parameters, e.g. useful for passing contextual values to followup topics. Each parameter is represented by a symbol starting with ?.
The body of the topic consists of an optional map and a number of rules. A rule is a pair of a trigger and an action, optionally followed by one or more followup topic invocations.
The followup topics invoked in a rule, such as followup-topic2-1, must all be defined elsewhere already.
The invocation of a topic is simply represented by a list, with the first element being the name of the topic, followed by a number of parameter values that match the topic definition. For example, an invocation of a topic a-topic that takes two parameters, ?para-1 and ?para-2, could look like this: (a-topic 2 5), where ?para-1 is bound to value 2 and ?para-2 to value 5.
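The positional binding of invocation values to topic parameters can be modelled in one line of Clojure. The helper bind-params is a hypothetical illustration and not part of REP.

```clojure
(defn bind-params
  "Pairs a topic's parameter symbols with the values of an invocation list,
  skipping the topic name in the first position."
  [params invocation]
  (zipmap params (rest invocation)))

(bind-params '[?para-1 ?para-2] '(a-topic 2 5))
;; => {?para-1 2, ?para-2 5}
```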
A topic can be seen as essentially a collection of rules. To generate a response, rules in a topic are tried in the order that they are written.
Two types of rules can be defined, the simple rule and the branched rule.
We have seen a few examples of simple rules, which consist of a trigger pattern, an action pattern, and optionally a number of followup topic invocations.
If a simple rule’s trigger pattern is
, it means that this rule is a
proactive rule and will fire regardless when the rule is tested.
Branched rules allow further refinement for a trigger pattern, which will be refined into a tree of more trigger patterns, with corresponding actions and followup topics. That is to say, instead of a single action, a trigger will be paired with a list of sub-rules, each can be a rule of its own. The list can optionally end with a default action (optionally with a list of followup topics), which will be returned when all the sub-rules fail to trigger.
Only one of the branch triggers can be triggered. If none of them matches, the default action is generated as output.
Sub-rules inherit the captured content of the ancestor matches, allowing a path of matches to capture multiple pieces of information necessary for a complex action.
A topic may optionally include an option map to control its behavior. The options take default values if not specified in the option map. These are the options:
With :default-rules, a topic may be thought of as a composition of two sets of rules that are tried in order, with an intermission of ad-lib topics:
- rules in the main body of the topic
- rules in :default-rules, which only apply after none of the above fires and none of the ad-lib topics fires.
Both sets of rules may include rules of other topics.
The :include-before and :include-after options enable the rules of a topic to become parts of another topic, allowing topics to become composable. Rules in the topics of :include-before are tried before the main body of rules, and rules in the topics of :include-after are tried after the main body of rules.
Taken together, the order of trying rules of a topic is the following:
- include-before of the topic
- main body of the topic
- include-after of the topic
- ad-lib topics
- include-before of the default rules
- main body of the default rules
- include-after of the default rules
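The order above amounts to a simple concatenation of rule sets. In the sketch below, the map keys and the function rule-order are hypothetical names chosen to mirror the list; REP's real option map and internal data structures are not shown in this document.

```clojure
(defn rule-order
  "Concatenates the rule sets of a topic in the order they are tried."
  [{:keys [include-before main include-after ad-lib default]}]
  (concat include-before main include-after
          ad-lib
          (:include-before default) (:main default) (:include-after default)))

(rule-order '{:include-before [r1]
              :main           [r2 r3]
              :include-after  [r4]
              :ad-lib         [r5]
              :default        {:include-before [r6] :main [r7] :include-after [r8]}})
;; => (r1 r2 r3 r4 r5 r6 r7 r8)
```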
Sometimes it is necessary to use the set of rules in the main body only. These rules can be referred to with a special topic name: an earmuff enclosed topic name with a -main suffix. For example, for a topic named my.ns/handle-favorite-things, the system automatically creates a corresponding my.ns/*handle-favorite-things-main* topic to refer to the rules in the main body of the topic. Similarly, my.ns/*handle-favorite-things-default* refers to the set of rules in the default-rules of the topic.
Since the followup topics of a topic can contain any topic, including the parent topic itself, REP supports arbitrary recursion among topics. This mechanism can be used for repeating, looping and any other purpose that requires going back to a prior topic.
In addition to using the topic name as the recursion target, REP provides a special followup topic, (*recur*), which always recurs back to the current topic. This is especially useful when a topic is included in another topic, so that the followup topic of the included topic can go back to the including topic, which is usually the intended behavior.
Any named topic can be a followup topic of another topic. However, sometimes we do not want to come up with names for some one-shot followup topics. In these cases, we can define an anonymous followup topic and use it in place.
Anonymous followup topics use _ as the topic name. They do not take parameters and do not have their own option maps. Instead they inherit the parameters and option map of the ancestor topics, as well as the captured contents of the ancestor rules.
Variables are symbols that can be used to track information or store results during system run. Variables are scoped within a running REP instance and are available only during the run-time.
Syntactically, variables should always appear inside a pair of parentheses.
Sometimes it is necessary to track some information local to a topic. Local variables can be set using the function (set-var name value), where name can be any symbol and value any Clojure data. This function always returns nil, so it should only be used in actions. <- is a shorthand function name.
In trigger patterns, use (set-var-ret name value) instead, because it returns the value itself and will not short-circuit the match. This function also has a shorthand name.
Local variables of a topic are inherited by its followup topics. The local variable of a followup topic takes precedence over that of the parent topic with the same name.
It is often useful to use some global variables to track conversational state that spans multiple conversation topics. We can set a global variable value with
(set-global-var name value) or
(set-global-var-ret name value) system function calls, and their short-hands are
-<-| respectively. Global variables are accessible by all topics and live until the end of the conversation.
A local variable takes precedence over a global variable when their names collide.
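The precedence among followup-topic locals, parent-topic locals and globals behaves like a merge in which the innermost scope wins. The scope maps and the function lookup-var below are hypothetical and serve only to illustrate the rule; REP's own variable storage is not described here.

```clojure
(defn lookup-var
  "Looks up a variable, preferring the innermost scope: the followup topic's
  locals first, then the parent topic's locals, then the globals."
  [{:keys [followup parent global]} var-name]
  (get (merge global parent followup) var-name))

(def scopes
  {:global   '{mood "neutral" lang "en"}
   :parent   '{mood "happy"}
   :followup '{}})

(lookup-var scopes 'mood)   ; => "happy"   (the local value shadows the global one)
(lookup-var scopes 'lang)   ; => "en"      (falls back to the global value)
```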
REP reuses Clojure namespace constructs. Each script has a unique namespace, and may require other namespaces.
A symbol containing / is considered namespaced; the part before the /, e.g. abc, is the namespace prefix. Namespaced symbols can be used to refer to things defined in other namespaces. Clojure core functions and macros, as well as Juji system functions, can be called without a namespace prefix.
In order to support conducting surveys and reporting results well, REP treats questions specially. Questions need to be defined before being asked in the topics. Similar to named patterns, a top level (question ...) form is used to define named questions with a binding form.
:single-choice questions are radio buttons; :multiple-choice questions are check boxes.
The value of the :choices attribute may be given a name, defined beforehand, so that it is reusable, e.g. yes-on, or it may be included inline.
An :open-ended question is normally presented as sentences in a conversational turn. It may optionally have a :wording attribute that is an action pattern.
Questions, once defined, can be used in special functions to be displayed to the user. :open-ended questions have their own display function, while :multiple-choice questions are displayed with the (ask-qui-question question-name) function.
To record the user’s answer, use the function (record-answer question-name content), where content should be captured content of user input or user choices.
REP can present information and accept user input via GUI displays. Displays are specified in a top level form gui, which binds some GUI elements to the corresponding symbols.
The function (display-gui display) can be called to display a GUI element. Normally this should happen in the action pattern of some rule.
REP is designed as a declarative language. Developers do not control the execution flow directly, as the conversation may progress in a non-deterministic fashion due to user's responses. Developers only write down the rules, and code execution is handled by the system.
Users can influence the system behaviours by specifying some control directives.
In addition to topic specific directives in option map, some global directives for the bot can be declared in a global map called
:release-action allows a vector of function calls to run right after compilation, so some setup for the release can be done, e.g. to prepare some read-only data resources.
:pre-action allows a vector of function calls to run before a chat session begins. This allows some session specific setup, e.g. to initialize some global variables.
:post-action allows a vector of function calls to run after a chat session ends.
The :agenda vector specifies the desired conversation progression in terms of topics. It uses a format similar to that of action patterns, except that the basic unit is a topic invocation instead of a token. It supports sequence patterns, alternative patterns, wildcard patterns, exclusion patterns, and start and end patterns. Also, parameters for top level topics are given here.
REP may also use some topics as conversational fillers, e.g. to initiate small
talks unrelated to the agenda, or to quickly dispatch user digression. These
topics are declared in
REP may also actively check some topics in the background, where some external conditions are the
triggers. When the conditions are met, the REP may notify the users about the
external events. These topics are in
Users may also give unexpected input to REP. Such exceptional user input is handled by topics declared in :exception.
The declaration of :exception uses the same format.