Parser Components

Pavilion Parsing (test_config.parsers)

Pavilion uses several LALR parsers to interpret the value strings in test configs.

For most of these values, Pavilion applies its StringParser, which works in three steps:

  1. Pavilion-specific escapes (such as \{) are handled.

  2. Expressions (text in {{<expr>}} blocks) are extracted, then parsed and resolved by the Expression Parser.

  3. Iterations (text in [~ {{repeat}} ~]) are applied. Their contents are repeated for every permutation of the contained variables.
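
As a rough illustration of step 2 only, the toy sketch below swaps each {{name}} block for a variable's value. It is not Pavilion's implementation: it assumes expressions are bare variable names, ignores escapes and iterations, and uses a hypothetical helper, substitute_exprs, with a plain dict standing in for a variable manager.

```python
import re

# Hypothetical toy model of step 2, not Pavilion's parser: only bare variable
# names are allowed inside {{ }} blocks, and escapes/iterations are ignored.
EXPR_BLOCK = re.compile(r"\{\{(.*?)\}\}")


def substitute_exprs(text: str, variables: dict) -> str:
    """Replace each {{name}} block with str(variables[name])."""

    def resolve(match: re.Match) -> str:
        name = match.group(1).strip()
        if name not in variables:
            # Report where the bad reference sits, as a real parser would.
            raise KeyError(f"unknown variable {name!r} at offset {match.start()}")
        return str(variables[name])

    return EXPR_BLOCK.sub(resolve, text)


example = substitute_exprs("mpirun -n {{ procs }} ./app", {"procs": 16})
# example == "mpirun -n 16 ./app"
```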

If you need to parse an individual Pavilion value string, use the parse_text() function defined in this module.

The exception to this is result evaluation strings, which are interpreted directly as a ResultExpression.

class pavilion.parsers.ErrorCat(message, examples, disambiguator=None)

Bases: object

Instances of this class are used to categorize syntax errors.

pavilion.parsers.check_expression(expr: str) → List[str]

Check that expr is valid, returning the variables used.

Raises

StringParserError – When the expression can’t be parsed.

pavilion.parsers.match_examples(exc, parse_fn, examples, text)

Given a parser function and a collection of labeled, malformed syntax examples, return the label of the example that best matches the current error.

Parameters
  • exc (Union[_lark.UnexpectedCharacters,_lark.UnexpectedToken]) – The lark exception raised while parsing text.

  • parse_fn – A lark parser function.

  • examples (list[ErrorCat]) – The error categories, each with its own malformed syntax examples.

  • text – The text that triggered the error.

pavilion.parsers.parse_text(text, var_man) → str

Parse the given text and return the parsed result. If parsing fails, it tries, to the best of its ability, to determine exactly what caused the error and reports that as part of the StringParserError.

Parameters
  • text (str) – The text to parse.

  • var_man (pavilion.test_config.variables.VariableSetManager) – The variable manager used to resolve variable references.

Raises
  • variables.DeferredError – When a deferred variable is used.

  • StringParserError – For syntax and other errors.

pavilion.parsers.re_compare(str1: str, str2: str, regex) → bool

Return True if both strings match the given regex.

pavilion.parsers.state_stack_dist(stack1: List, stack2: List) → float

Compare two stacks of states, where each stack is a list of state IDs. The returned number is the sum of mismatches in each direction, divided by the total possible mismatches, so 0.0 means a perfect match and 1.0 means the stacks are completely different.
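
A minimal sketch of that metric, assuming a mismatch is a state ID present in one stack but absent from the other, and that the total possible mismatches is the combined length of both stacks. The name state_stack_dist_sketch is hypothetical, and Pavilion's own counting rule may differ.

```python
from typing import List


def state_stack_dist_sketch(stack1: List[int], stack2: List[int]) -> float:
    """Hypothetical: count states missing from the other stack, normalized."""
    total = len(stack1) + len(stack2)
    if total == 0:
        return 0.0  # two empty stacks are treated as identical
    set1, set2 = set(stack1), set(stack2)
    # Mismatches from stack1's side, then from stack2's side.
    mismatches = sum(1 for state in stack1 if state not in set2)
    mismatches += sum(1 for state in stack2 if state not in set1)
    return mismatches / total
```

With a score like this, an error could presumably be categorized by choosing the ErrorCat whose examples produce the lowest distance to the current error's state stack.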

Shared Parser Components

This module contains base classes and exceptions shared by the various Pavilion parsers.

class pavilion.parsers.common.PavTransformer(visit_tokens: bool = True)

Bases: Transformer

Transformers walk the parse tree and modify it. In our case, we’ll be resolving it into a final value. Our transformer always passes up tokens to better track where in the syntax things went wrong, and it handles exceptions more carefully than the base Transformer.

The String Parser

Grammar and transformation for Pavilion string syntax.

String LALR Grammar

// All strings resolve to this token. 
start: string TRAILING_NEWLINE?

TRAILING_NEWLINE: /\n/

// It's important that each of these start with a terminal, rather than 
// a reference back to the 'string' rule. A 'STRING' terminal (or nothing) 
// is definite, but a 'string' would be non-deterministic.
string: STRING?
      | STRING? iter string
      | STRING? expr string

iter: _ITER_STRING_START iter_inner SEPARATOR
_ITER_STRING_START: "[~"
SEPARATOR.2: _TILDE _STRING_ESC _CLOSE_BRACKET
_TILDE: "~"
_CLOSE_BRACKET: "]"

iter_inner: STRING?
          | STRING? expr iter_inner


expr: _START_EXPR EXPR? (ESCAPED_STRING EXPR?)* FORMAT? _END_EXPR
_START_EXPR: "{{"
_END_EXPR: "}}"
EXPR: /[^}~{":]+/
// Match anything enclosed in quotes as long as the last 
// escape doesn't escape the close quote.
// A minimal match, but the required close quote will force this to 
// consume most of the string.
_STRING_ESC_INNER: /.*?/
// If the string ends in a backslash, it must end with an even number
// of them.
_STRING_ESC: _STRING_ESC_INNER /(?<!\\)(\\\\)*?/
ESCAPED_STRING : "\"" _STRING_ESC "\""

// This regex matches the whole format spec for python.
FORMAT: /:(.?[<>=^])?[+ -]?#?0?\d*[_,]?(.\d+)?[bcdeEfFgGnosxX%]?/

// Strings must start with:
//  - A closing expression '}}', a closing iteration '.]', an opening
//    iteration '[~', or the start of input.
//    - Look-behind assertions must be equal length static expressions,
//      which is why we have to match '.]' instead of just ']', and why
//      we can't match the start of the string in the look-behind.
//  - Strings can contain anything, but they can't start with an open
//    expression '{{' or open iteration '[~'.
//  - Strings cannot end in an odd number of backslashes (that would 
//    escape the closing characters).
//  - Strings must end with the end of string, an open expression '{{',
//    an open iteration '[~', or a tilde.
//  - If this is confusing, look at ESCAPED_STRING above. It uses the
//    same basic structure, but is only bookended by quotes.
STRING: /((?<=}}|.\]|\[~)|^)/ _STRING_INNER /(?=$|}}|{{|\[~|~)/
_STRING_INNER: /(?!{{|\[~|~|}})(.|\s)+?(?<!\\)(\\\\)*/
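
The even-backslash rule used by ESCAPED_STRING (and echoed in STRING) can be checked with Python's re module. This sketch assumes Lark inlines the pieces of ESCAPED_STRING into one regular expression roughly equivalent to the pattern below; it exercises the regex alone, not Lark's lexer, and lex_escaped_string is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: Lark inlines ESCAPED_STRING's pieces ('"', _STRING_ESC_INNER,
# the backslash-pair rule, '"') into a regex roughly equivalent to this one.
ESCAPED_STRING = re.compile(r'".*?(?<!\\)(\\\\)*?"')


def lex_escaped_string(text: str) -> Optional[str]:
    """Return the quoted string a lexer would match at the start of text."""
    match = ESCAPED_STRING.match(text)
    return match.group(0) if match else None


print(lex_escaped_string('"a\\"b" rest'))  # the escaped quote stays inside
```

Because the body is a minimal match anchored at the current position, matching stops at the first quote that is not escaped by an odd number of backslashes, so '"a" "b"' yields only '"a"'.
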

class pavilion.parsers.strings.ExprToken(type_, value, start_pos=None, line=None, column=None, end_line=None, end_column=None, end_pos=None)

Bases: Token

Denotes a special token that represents an expression.

column: int
end_column: int
end_line: int
end_pos: int
line: int
start_pos: int
type: str
value: Any

class pavilion.parsers.strings.StringTransformer(var_man)

Bases: PavTransformer

Dynamically transform parsed strings into their final value.

  • string productions always return a list of tokens.

  • ExprTokens are generated for expressions.

  • These lists are collapsed by both ‘start’ and ‘iter’ productions.

    • The collapsed result is a single token.

    • The collapse process resolves all ExprTokens.

  • All other productions collapse their components immediately.

EXPRESSION = '<expression>'
classmethod expr(items: List[Token]) → Token

Grab the expression and format spec and combine them into a single token. They can’t be resolved until an iteration or the start production is reached. The merged expression tokens are given the self.EXPRESSION type for later identification, and their value is a dict of {‘format_spec’: <spec>, ‘expr’: <expression_string>}.

Parameters

items – The expr components and possibly a format_spec.
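
Once an expression has been resolved, its stored format spec still has to be applied. A minimal sketch, assuming the spec keeps the leading ':' captured by the FORMAT terminal and is handed to Python's built-in format(); apply_format is a hypothetical name, not part of Pavilion.

```python
import re
from typing import Any, Optional

# The FORMAT terminal from the string grammar, copied verbatim.
FORMAT = re.compile(r":(.?[<>=^])?[+ -]?#?0?\d*[_,]?(.\d+)?[bcdeEfFgGnosxX%]?")


def apply_format(value: Any, format_spec: Optional[str]) -> str:
    """Format a resolved value, assuming the spec still starts with ':'."""
    if not format_spec:
        return str(value)
    if FORMAT.fullmatch(format_spec) is None:
        raise ValueError(f"invalid format spec: {format_spec!r}")
    # Python's format() expects the spec without the leading ':'.
    return format(value, format_spec[1:])


# Shape of the merged token's value, per the description above.
token_value = {"format_spec": ":.2f", "expr": "pi"}
resolved = 3.14159  # stand-in for whatever the expression "pi" resolves to
formatted = apply_format(resolved, token_value["format_spec"])  # "3.14"
```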

iter(items: List[Token]) → Token

Handle an iteration section. These can contain anything except nested iteration sections. This part of the string will be repeated for every combination of used multi-valued variables (that don’t specify an index). The returned result is a single token that is fully resolved and combined into a single string.

Parameters

items – The ‘iter_inner’ token and a separator token. The value of ‘iter_inner’ will be a list of Tokens including strings, escapes, and expressions.
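
The repetition rule can be sketched with itertools.product. This toy version is hypothetical: it recognizes only bare {{name}} references, ignores indexed references, and assumes the text between the closing '~' and ']' (the SEPARATOR terminal) is placed between repetitions.

```python
import itertools
import re
from typing import Dict, List

# Hypothetical toy model of iteration semantics, not Pavilion's transformer.
VAR_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_iteration(body: str, separator: str,
                     variables: Dict[str, List[str]]) -> str:
    """Repeat body once per combination of the variables it references."""
    names = sorted(set(VAR_REF.findall(body)))
    copies = []
    for combo in itertools.product(*(variables[name] for name in names)):
        bindings = dict(zip(names, combo))
        # Substitute this combination's values into one copy of the body.
        copies.append(VAR_REF.sub(lambda m: bindings[m.group(1)], body))
    return separator.join(copies)


print(expand_iteration("-n {{n}}", " ", {"n": ["1", "2"]}))  # -n 1 -n 2
```

A body with no variable references is emitted once, because itertools.product() with no arguments yields a single empty combination.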

iter_inner(items)

Works just like a string production, but nested iterations aren’t allowed.

static parse_expr(expr: Token) → Tree

Parse the given expression token and return the tree.

start(items) → str

Resolve the final string components, and return just a string.

Parameters

items (list[lark.Token]) – A single token of string components.

string(items) → Token

Strings are merged into a single token whose value is the list of all component tokens. We’re essentially just preserving the tree.

Parameters

items (list[lark.Token]) – The component tokens of the string.

class pavilion.parsers.strings.StringVarRefVisitor(*args, **kwds)

Bases: VarRefVisitor

Parse expressions and get all used variables.

static expr(tree: Tree) → List[str]

Parse the expression, and return any used variables.

pavilion.parsers.strings.get_string_parser(debug=False)

Return a string parser, from cache if possible.
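
Building an LALR parser means computing its parse tables from the grammar, so it makes sense to build each parser once and reuse it. A minimal sketch of that caching pattern, using a hypothetical StandInParser in place of a real Lark parser and a hypothetical get_parser_sketch function:

```python
from typing import Dict


class StandInParser:
    """Hypothetical placeholder for an expensive-to-build parser."""

    builds = 0  # counts constructions so the caching is observable

    def __init__(self, debug: bool) -> None:
        StandInParser.builds += 1
        self.debug = debug


# One cached parser per debug setting.
_PARSER_CACHE: Dict[bool, StandInParser] = {}


def get_parser_sketch(debug: bool = False) -> StandInParser:
    """Return the cached parser for this debug setting, building it on first use."""
    if debug not in _PARSER_CACHE:
        _PARSER_CACHE[debug] = StandInParser(debug)
    return _PARSER_CACHE[debug]
```

Keying the cache on the debug flag keeps a debug-mode parser from being handed to callers that asked for a normal one.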

pavilion.parsers.strings.should_parse(text)

Returns True if text is a string that needs to be parsed. This errs on the side of parsing some strings unnecessarily, but the check is much faster than actually calling the parser.
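
A check of this kind can be as simple as a character scan. The sketch below is hypothetical and not Pavilion's actual test: it flags any string containing a character the string grammar treats specially ('{', '}', '[', '~', or a backslash), accepting false positives so that no string that needs the parser is skipped.

```python
# Hypothetical set of characters that might give a string special meaning.
SPECIAL_CHARS = frozenset("{}[~\\")


def should_parse_sketch(text: str) -> bool:
    """Cheap scan; True means 'hand this string to the real parser'."""
    return not SPECIAL_CHARS.isdisjoint(text)
```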

The Expression Parser

Grammar and transformer for Pavilion expression syntax.

Expression LALR Grammar

// All expressions will resolve to the start expression.
start: expr _WS?
     |          // An empty string is valid

// Trailing whitespace is ignored. Whitespace between tokens is
// ignored below.
_WS: /\s+/

expr: or_expr

// These set order of operations.
// See https://en.wikipedia.org/wiki/Operator-precedence_parser
or_expr: and_expr ( OR and_expr )*
and_expr: not_expr ( AND not_expr )*
not_expr: NOT? compare_expr
compare_expr: add_expr ((EQ | NOT_EQ | LT | GT | LT_EQ | GT_EQ ) add_expr)*
add_expr: mult_expr ((PLUS | MINUS) mult_expr)*
mult_expr: pow_expr ((TIMES | DIVIDE | INT_DIV | MODULUS) pow_expr)*
pow_expr: primary ("^" primary)?
primary: literal
       | var_ref
       | negative
       | "(" expr ")"
       | function_call
       | list_

// A function call can contain zero or more arguments.
function_call: NAME "(" (expr ("," expr)*)? ")"

negative: (MINUS|PLUS) primary

// A literal value is just what it appears to be.
literal: INTEGER
       | FLOAT
       | BOOL
       | ESCAPED_STRING

// Allows for trailing commas
list_: L_BRACKET (expr ("," expr)* ","?)? R_BRACKET

// Variable references are kept generic. We'll use this both
// for Pavilion string variables and result calculation variables.
var_ref: NAME ("." var_key)*
var_key: NAME
        | INTEGER
        | TIMES

// Strings can contain anything as long as they don't end in an odd
// number of backslashes, as that would escape the closing quote.
_STRING_INNER: /.*?/
_STRING_ESC_INNER: _STRING_INNER /(?<!\\)(\\\\)*?/
ESCAPED_STRING : "\"" _STRING_ESC_INNER "\""

L_BRACKET: "["
R_BRACKET: "]"
PLUS: "+"
MINUS: "-"
TIMES: "*"
DIVIDE: "/"
INT_DIV: "//"
MODULUS: "%"
AND: /and(?![a-zA-Z_])/
OR: /or(?![a-zA-Z_])/
NOT.2: /not(?![a-zA-Z_])/
EQ: "=="
NOT_EQ: "!="
LT: "<"
GT: ">"
LT_EQ: "<="
GT_EQ: ">="
INTEGER: /\d+/
FLOAT: /\d+\.\d+/
// This will be prioritized over 'NAME' matches
BOOL.2: "True" | "False"

// Names can be lower-case or capitalized, but must start with a letter or
// underscore
NAME.1: /[a-zA-Z_][a-zA-Z0-9_]*/

// Ignore all whitespace between tokens.
%ignore  / +(?=[^.])/
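
The negative look-ahead on AND, OR, and NOT keeps names that merely begin with a keyword, such as 'order' or 'nothing', from being split into a keyword plus a name. The sketch below tests those three patterns, copied from the grammar, with Python's re module; keyword_at_start is a hypothetical helper, and the demo does not reproduce Lark's terminal priorities.

```python
import re

# The AND, OR and NOT terminal patterns, copied from the grammar.
KEYWORDS = (
    ("AND", re.compile(r"and(?![a-zA-Z_])")),
    ("OR", re.compile(r"or(?![a-zA-Z_])")),
    ("NOT", re.compile(r"not(?![a-zA-Z_])")),
)


def keyword_at_start(text: str) -> str:
    """Return the keyword terminal text starts with, or '' if there is none."""
    for name, pattern in KEYWORDS:
        if pattern.match(text):
            return name
    return ""
```
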

class pavilion.parsers.expressions.BaseExprTransformer(visit_tokens: bool = True)

Bases: PavTransformer

Transforms the expression parse tree into an actual value. The resolved value will be one of the literal types.

BOOL(tok: Token) → Token

Convert to a boolean.

ESCAPED_STRING(tok: Token) → Token

Remove quotes from the given string.

FLOAT(tok: Token) → Token

Convert to a float.

Parameters

tok (lark.Token) – The token to convert.

INTEGER(tok) → Token

Convert to an int.

Parameters

tok (lark.Token) – The token to convert.

NUM_TYPES = (<class 'int'>, <class 'float'>, <class 'bool'>)

add_expr(items) → Token

Pass single items up; otherwise, perform the chain of math operations. This function will be used for all binary math operations with a tokenized operator.

Parameters

items (list[lark.Token]) – An odd number of tokens. Every second token is an operator.
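
The chain of math operations amounts to a left-to-right fold over alternating operands and operators. A minimal sketch, assuming plain Python values and operator strings in place of lark Tokens; MATH_OPS and fold_math_chain are hypothetical names.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical table mapping operator text to the function that applies it.
MATH_OPS: Dict[str, Callable[[Any, Any], Any]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "//": operator.floordiv,
    "%": operator.mod,
}


def fold_math_chain(items: List[Any]) -> Any:
    """Evaluate [operand, op, operand, op, operand, ...] left to right."""
    if len(items) % 2 == 0:
        raise ValueError("expected an odd number of items")
    result = items[0]  # a single item is simply passed up
    for op_text, operand in zip(items[1::2], items[2::2]):
        result = MATH_OPS[op_text](result, operand)
    return result
```

Folding from the left makes these operators left-associative, so 10 - 3 - 2 evaluates to 5 rather than 9.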

and_expr(items)

Pass a single item up. Otherwise, apply 'and' logical operations.

Parameters

items (list[lark.Token]) – Tokens to logically 'and'. The ‘and’ terminals are not included.

compare_expr(items) → Token

Pass a single item up. Otherwise, perform the chain of comparisons. Chained comparisons such as '3 < 7 < 10' are evaluated as '3 < 7 and 7 < 10', just like in Python.

Parameters

items (list[lark.Token]) – An odd number of tokens. Every second token is a comparison operator ('==', '!=', '<', '>', '<=', '>=').
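
The chained-comparison rule can be sketched by pairing each operand with its right-hand neighbor. This sketch assumes plain Python values and operator strings in place of lark Tokens; COMPARE_OPS and fold_comparison_chain are hypothetical names.

```python
import operator
from typing import Any, List

# Hypothetical table mapping comparison text to its function.
COMPARE_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def fold_comparison_chain(items: List[Any]) -> Any:
    """Evaluate a < b < c ... as (a < b) and (b < c) and ..., like Python."""
    if len(items) % 2 == 0:
        raise ValueError("expected an odd number of items")
    if len(items) == 1:
        return items[0]  # a lone operand is passed up unchanged
    operands, ops = items[0::2], items[1::2]
    return all(
        COMPARE_OPS[op](left, right)
        for left, op, right in zip(operands, ops, operands[1:])
    )
```

all() stops at the first false comparison, much as Python short-circuits a chained comparison.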

expr(items)

Simply pass up the expression result.

function_call(items) → Token

Look up the function call, and call it with the given argument values.

Parameters

items (list[lark.Token]) – A function name token and zero or more argument tokens.
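
Looking up and calling a function amounts to a dictionary dispatch. A minimal sketch with a hypothetical registry: the registered names are ordinary Python builtins chosen for illustration, not Pavilion's expression function library, and call_function is a hypothetical helper.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry; these builtins are illustrative stand-ins.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "max": max,
    "len": len,
    "round": round,
}


def call_function(name: str, args: List[Any]) -> Any:
    """Look up name and call it with already-resolved argument values."""
    try:
        func = FUNCTIONS[name]
    except KeyError:
        raise ValueError(f"no such function {name!r}") from None
    return func(*args)
```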

list_(items) → Token

Handle explicit lists.

Parameters

items (list[lark.Token]) – The list item tokens.

literal(items) → Token

Just pass up the literal value.

Parameters

items (list[lark.Token]) – A single token.

math_expr(items) → Token

Pass single items up; otherwise, perform the chain of math operations. This function will be used for all binary math operations with a tokenized operator.

Parameters

items (list[lark.Token]) – An odd number of tokens. Every second token is an operator.

mult_expr(items) → Token

Pass single items up; otherwise, perform the chain of math operations. This function will be used for all binary math operations with a tokenized operator.

Parameters

items (list[lark.Token]) – An odd number of tokens. Every second token is an operator.

negative(items) → Token

Apply a unary minus or plus sign to the value that follows it.

Parameters

items (list[lark.Token]) – The sign token and the value token.

not_expr(items) → Token

Apply a logical not, if 'not' is present.

Parameters

items (list[lark.Token]) – One or two tokens

or_expr(items)

Pass a single item up. Otherwise, apply 'or' logical operations.

Parameters

items (list[lark.Token]) – Tokens to logically 'or'. The ‘or’ terminals are not included.

pow_expr(items) → Token

Pass single items up; otherwise, raise the first item to the power of the second item.

Parameters

items (list[lark.Token]) – One or two tokens.

primary(items) → Token

Simply pass the value up to the next layer.

Parameters

items (list[Token]) – Will only be a single item.

start(items)

Returns the final value of the expression.

class pavilion.parsers.expressions.EvaluationExprTransformer(results: Dict)

Bases: BaseExprTransformer

Transform result evaluation expressions into their final value. The result dictionary referenced for values is updated in place, so subsequent uses of this transformer will see the cumulative results.

static var_key(items) → Token

Just return the key component.

var_ref(items) → Token

Iteratively traverse the results structure to find a value given a key. A ‘*’ in the key returns a list of all values located by the remaining key (‘foo.*.bar’ returns a list of all ‘bar’ elements under the ‘foo’ key).

Parameters

items – The components of the variable reference: the variable name followed by its keys.
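
The wildcard traversal can be sketched recursively. This version is hypothetical: it assumes results are nested dicts and lists, that integer keys index lists, and that '*' fans out over list items and dict values alike; nested wildcards produce nested lists. split_ref and resolve_key are illustrative names, not Pavilion functions.

```python
from typing import Any, List, Union

Key = Union[str, int]


def split_ref(ref: str) -> List[Key]:
    """Split 'foo.*.bar' into keys, turning all-digit parts into list indexes."""
    return [int(part) if part.isdigit() else part for part in ref.split(".")]


def resolve_key(data: Any, keys: List[Key]) -> Any:
    """Follow keys into nested dicts/lists; '*' fans out over every child."""
    if not keys:
        return data
    key, rest = keys[0], keys[1:]
    if key == "*":
        # Assumption: '*' covers dict values as well as list items.
        children = data.values() if isinstance(data, dict) else data
        return [resolve_key(child, rest) for child in children]
    return resolve_key(data[key], rest)


results = {"foo": [{"bar": 1}, {"bar": 2}]}
bars = resolve_key(results, split_ref("foo.*.bar"))  # [1, 2]
```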

class pavilion.parsers.expressions.ExprTransformer(var_man)

Bases: BaseExprTransformer

Convert Pavilion string expressions into their final values given a variable manager.

static var_key(items) → Token

Just return the key component.

var_ref(items) → Token

Resolve a Pavilion variable reference.

Parameters

items – The components of the variable reference: the variable name followed by its keys.

class pavilion.parsers.expressions.VarRefVisitor(*args, **kwds)

Bases: Visitor

Finds all of the variable references in an expression parse tree.

static var_ref(tree: Tree) → List[str]

Assemble and return the given variable reference.

visit(tree) → List[str]

Visit the tree bottom up and return all the variable references found.

visit_topdown = None
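
A bottom-up collection pass can be sketched without Lark by using a minimal stand-in for lark's Tree. This is hypothetical: Node and collect_var_refs are illustrative names, and the sketch assumes a var_ref node's children are its name followed by var_key nodes holding one key each.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical minimal stand-in for a lark Tree: a rule name and children."""
    data: str
    children: List[Union["Node", str]] = field(default_factory=list)


def collect_var_refs(tree: Node) -> List[str]:
    """Visit children before parents and return each var_ref as a dotted name."""
    found: List[str] = []
    for child in tree.children:
        if isinstance(child, Node):
            found.extend(collect_var_refs(child))
    if tree.data == "var_ref":
        # Name parts are plain strings; keys sit inside single-child var_key nodes.
        parts = [c if isinstance(c, str) else c.children[0] for c in tree.children]
        found.append(".".join(parts))
    return found
```

Children are visited before their parent, matching the bottom-up order described above.
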

pavilion.parsers.expressions.get_expr_parser(debug=False)

Return an expression parser (cached if possible).