Result Parsers

Result parsers are fairly simple in practice, yet contain a wide variety of options to help you collect results from test output.

General Operations

To see what result parsers are available, run:

$ pav show result_parsers

To see full documentation for one of these:

$ pav show result_parsers --doc regex

Base Option Overview

Result parser plugins define their own configuration options, but many options are handled at a higher level. These options are available for every result parser, though some don’t allow for any settings other than the default.

for_lines_matching

A regular expression that tells Pavilion which lines in the file should be examined with the result parser.

Default: match every line.

preceded_by

A list of regular expressions that must match, in order, the lines immediately before a line on which the result parser is called.

Default: No pre-conditions

match_select

Which match to use as the result when multiple lines match.

Default: Use the first matched result.

files

One or more filename globs (*.txt, test.out) that select which files to parse results from. These are relative to the test build directory (which is the working directory when the test runs).

Default: ‘../run.log’ (the run script output)

per_file

What to do with results from multiple files.

Default: Keep the results from the first file with matches.

action

Manage the output type of the result parser.

Default: ‘store’ (auto-convert to a numeric type, if possible). Default for the ‘result’ key: ‘store_true’

Parsing Process

For each key in result_parse, the results are parsed using the following steps.

  1. A list of files is generated from the globs in the files option, in order.
    • If a glob matches no files, the glob is given an ‘_unmatched_glob’ entry in our list of results per file. These entries start with an underscore, so they won’t be in the final results, but they will be evaluated when using ‘per_file: all’ or ‘per_file: any’.
  2. Each file is searched using the for_lines_matching AND preceded_by options. The result parser is called to look at any lines that match all of these conditions.
    • The parser can look at more than just the matched line. The file is reset such that we continue looking at lines starting at the line after the matched line, regardless of what the result parser does.
  3. The result parser returns matched data, or not.
  4. The match_select option determines which result to return from the list of matches.
  5. The matches for each file found are modified and stored according to the per_file option.
    • This generally involves type manipulation according to action.
    • It may involve storing results in multiple keys (see below).

Result Keys

By default, the value found by the result parser is simply stored in the result json under the given key. Key names can be alpha-numeric with underscores.

Keys that begin with an underscore are temporary. They will not be present in the final results.
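
As a minimal sketch of that rule (plain Python, not Pavilion’s actual code; the function name strip_temporary_keys is hypothetical), dropping temporary keys from a results dictionary could look like this:

```python
def strip_temporary_keys(results: dict) -> dict:
    """Return a copy of the results without keys that begin with an underscore."""
    return {key: value for key, value in results.items()
            if not key.startswith("_")}


# Hypothetical parsed values, including two temporary keys.
parsed = {"speed": 3.5, "_scratch": "tmp", "_unmatched_glob": None}
final = strip_temporary_keys(parsed)
```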

Multiple Result Keys

(New in 2.3)

Result parsers can produce a list of values, and you can assign them to multiple keys all at once. This is most common with the ‘split’ and ‘regex’ result parsers.

result_parse:
    regex:
        # When you use multiple groupings in a regex, the
        # matches are returned in a list.
        "speed, runtime, points":
            regex: 'results: ([0-9.]+) ([0-9.]+) (\d+)'

    split:
        # You can use underscore as the key for values you want to discard.
        # If there are more values than keys, that's fine too (the extras
        # will be dropped).
        "_, speed2, boats":
            sep: ","
            # Our comma separated line is after this one.
            preceded_by: ['^Results2']
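
The key handling in the ‘split’ example can be sketched roughly as follows. This is illustrative Python rather than Pavilion’s implementation: assign_to_keys is a hypothetical helper, and leaving keys unset when there are fewer values than keys is an assumption.

```python
def assign_to_keys(key_spec: str, values: list) -> dict:
    """Pair a comma-separated key spec with a list of parsed values."""
    keys = [key.strip() for key in key_spec.split(",")]
    stored = {}
    # zip() stops at the shorter sequence, so extra values are dropped.
    for key, value in zip(keys, values):
        if key == "_":
            # A bare underscore marks a value to discard.
            continue
        stored[key] = value
    return stored


# Four values, three keys: the first is discarded, the last is dropped.
stored = assign_to_keys("_, speed2, boats", ["x", 12.5, 3, "extra"])
```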

Result Value Types

Result parsers can return any sort of json compatible value. This can be a string, number (int or float), boolean, or a complex structure that includes lists and dictionaries. Pavilion, in handling result values, groups these into a few internal categories.

  • empty - An empty result is a json null, or an empty list. Everything else is non-empty.
  • match - A match is a non-empty result that is also not json false.
  • false - False is special, in that it is neither empty nor a match.

The actions and per_file sections below work with these categories when deciding how to handle result parser values.
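
The three categories can be expressed as a small classifier. This is only a sketch of the definitions above, with json null written as Python’s None; the function name categorize is hypothetical.

```python
def categorize(value) -> str:
    """Classify a result value as 'empty', 'false', or 'match'."""
    if value is None or value == []:
        return "empty"
    if value is False:
        return "false"
    # Everything else, including '' and 0, is non-empty and not false.
    return "match"
```

Note that an empty string and the number zero both count as matches under these definitions, even though Python treats them as falsy.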

Result Parser Defaults

(New in 2.3)

So you’re parsing out 300 different bits of information with the regex parser, and they all use the same, non-default, settings:

result_parse:
    regex:
        normal_key:
            regex: 'normal_key: (\S+)'
        mykey1:
            regex: 'mykey: (\S+)'
            per_file: name
            files: '*.out'
        mykey2:
            regex: 'mykey: (\S+)'
            per_file: name
            files: '*.out'
        # etc...

You can use the ‘_default’ key to set defaults for all keys under that result parser. Be careful with keys that don’t need your new defaults though:

result_parse:
    regex:
        # Note that there is no order to these keys.
        _default:
            per_file: name
            files: '*.out'
        normal_key:
            # You have to go back to the defaults here, unfortunately.
            regex: 'normal_key: (\S+)'
            per_file: first
            files: '../run.log'
        mykey1:
            regex: 'mykey: (\S+)'
        mykey2:
            regex: 'mykey: (\S+)'
        # etc...
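
One way to picture how ‘_default’ interacts with per-key settings is a dictionary merge in which the key’s own options win. The sketch below is an assumption about the behavior described here, not Pavilion’s code; apply_defaults is a hypothetical name and the regex values are placeholders.

```python
def apply_defaults(parser_section: dict) -> dict:
    """Merge the '_default' options into every other key's options."""
    defaults = parser_section.get("_default", {})
    merged = {}
    for key, options in parser_section.items():
        if key == "_default":
            continue
        # Options set on the key itself override the shared defaults.
        merged[key] = {**defaults, **options}
    return merged


section = {
    "_default": {"per_file": "name", "files": "*.out"},
    "normal_key": {"regex": r"normal_key: (\d+)",
                   "per_file": "first", "files": "../run.log"},
    "mykey1": {"regex": r"mykey1: (\d+)"},
}
merged = apply_defaults(section)
```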

Preceded_By and For_Lines_Matching

As mentioned above, these are used to select which lines to call the result parser on. They are combined to form a ‘sliding window’ of regexes that are applied, in order, to check that a sequence of lines matches each of them. The result parser is then called on the line matching the ‘for_lines_matching’ regex.

Given:

result_parse:
    regex:
        foo:
            preceded_by:
                - '^a'
                - '^b'
            for_lines_matching: '^flm'

and a file that looks like:

c
a
a
b
flm
a
b
flm

We’ll match like:

c       ^a   X |      |        |
a              | ^a ✓ |        |
a              | ^b X | ^a ✓   |
b                     | ^b ✓   |
flm                   | ^flm ✓ |
a                              | ^a ✓
b                              | ^b ✓
flm                            | ^flm ✓

Resulting in the result parser being called twice.

  • We resume checking from the line after any positive selection.
  • Since the default ‘for_lines_matching’ is '' (which matches everything), and ‘preceded_by’ is empty, by default Pavilion calls the result parser on every line.
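
The sliding window can be sketched in a few lines of Python. This is an illustration of the matching described above, not Pavilion’s implementation: find_parser_lines is a hypothetical name, and using re.search (rather than a full-line match) is an assumption.

```python
import re


def find_parser_lines(lines, preceded_by, for_lines_matching=""):
    """Return the indexes of the lines the result parser would be called on."""
    window = [re.compile(regex) for regex in preceded_by]
    window.append(re.compile(for_lines_matching))

    hits = []
    start = 0
    while start + len(window) <= len(lines):
        # Every regex in the window must match its corresponding line.
        if all(regex.search(lines[start + offset])
               for offset, regex in enumerate(window)):
            hit = start + len(window) - 1
            hits.append(hit)
            # Resume from the line after the positive selection.
            start = hit + 1
        else:
            start += 1
    return hits


lines = ["c", "a", "a", "b", "flm", "a", "b", "flm"]
hits = find_parser_lines(lines, ["^a", "^b"], "^flm")
```

With the example file, hits contains the indexes of the two ‘flm’ lines (4 and 7, counting from zero). With no preceded_by and the default for_lines_matching, every line index is returned.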

Match_Select

Pavilion calls each result parser for every preceded_by/for_lines_matching match found. Match select allows us to control which match to use.

This is typically the first one (which is default), in which case Pavilion stops searching the file after a single successful match is found.

You can also give an integer index (counting from zero, or backwards from -1) to select the Nth match. If the match at that index doesn’t exist, an error is noted. The keywords ‘first’ and ‘last’ also work.

The ‘all’ keyword causes the full list of matches to be returned, including instances where the result parser returned nothing.
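
A rough sketch of these selection rules follows. The function select_match is hypothetical, and so is the error handling: Pavilion notes the error in its own way, while this sketch simply raises LookupError for any missing index (including ‘first’ on an empty list, which is an assumption).

```python
def select_match(matches, match_select="first"):
    """Pick a result from the list of per-line parser results."""
    if match_select == "all":
        # The full list, including entries where the parser returned nothing.
        return list(matches)
    if match_select == "first":
        index = 0
    elif match_select == "last":
        index = -1
    else:
        index = int(match_select)
    try:
        return matches[index]
    except IndexError:
        # Hypothetical error type, used here only for illustration.
        raise LookupError(f"No match at index {match_select}") from None


matches = ["a", None, "c"]
```

For example, select_match(matches, 1) returns None (the parser found nothing on the second matched line), while select_match(matches, -1) returns "c".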

Actions

Actions change how Pavilion stores the final result value in the results.

  • store - (Default for every key except ‘result’) Store the auto-type-converted result in the given key(s). Strings that look like ints/floats/True/False will become that native type.
  • store_str - Don’t auto-convert strings, just store them.
  • store_true - (Default for ‘result’ key) Store true if the result is a match (non-empty and not false).
  • store_false - Stores true if the result is not a match.
  • count - Count the length of list matches, regardless of contents. Non-list matches are 1 if a match, 0 otherwise.

Some ‘per_file’ settings bypass the action step, namely ‘name_list’, which doesn’t store the value at all. Others, like ‘all’, will apply the ‘action’ before the ‘all’ calculation.
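
To make the differences between the actions concrete, here is a sketch of each one as a plain Python function. The names (is_match, auto_convert, count, ACTIONS) are hypothetical, and the exact conversion rules Pavilion applies may differ; this only mirrors the descriptions in the list above.

```python
def is_match(value):
    """A match is non-empty (not null, not an empty list) and not false."""
    return value is not None and value != [] and value is not False


def auto_convert(value):
    """Turn strings that look like ints, floats, True, or False into that type."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    if text in ("True", "False"):
        return text == "True"
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return value


def count(value):
    """Length of a list result; otherwise 1 for a match and 0 for anything else."""
    if isinstance(value, list):
        return len(value)
    return 1 if is_match(value) else 0


ACTIONS = {
    "store": auto_convert,
    "store_str": lambda value: value,
    "store_true": is_match,
    "store_false": lambda value: not is_match(value),
    "count": count,
}
```

Under these definitions ACTIONS["store"]("42") gives the integer 42, while ACTIONS["store_str"]("42") keeps the string.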

Files

By default, each result parser reads through the test’s run.log file. You can specify a different file, a file glob, or even multiple file globs to match an assortment of files. The files are parsed in the order given, though ordering for files matched by glob wildcards is filesystem dependent.

Relative paths are relative to the test run’s build directory, which is the working directory when the run/build scripts are run. If you need to reference the run log in addition to other files, it is one directory up from there, in ../run.log.

This test runs across a bunch of nodes, and produces an output file for each. The regex parser runs across each of these, and (because it defaults to returning the first found item only) returns that item or null for each of the files found. What it does with those values depends on the per_file attribute for the result parser.

hugetlb_check:
    scheduler: slurm
    slurm:
      num_nodes: 4

    run:
        cmds:
            # Use the srun --output option to specify that results are
            # to be written to separate files.
            - '{{sched.test_cmd}} --output="%N.out" env'

    result_parse:
        regex:
          # The matched values will be stored under the 'huge_size' key,
          # but that will vary based on the 'per_file' value.
          huge_size:
              regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
              # Run the parser against all files that end in .out
              files: '*.out'
              per_file: # We'll demonstrate these settings below

per_file: Manipulating Multiple File Results

The per_file option lets you manipulate how results are stored on a file-by-file basis. Since the choice here will have a drastic effect on your results, we’ll demonstrate each from the standpoint of the test config above.

Let’s say the test ran on four nodes (node1, node2, node3, and node4), but only node2 and node3 found a match. The results would be:

  • node1 - <null>
  • node2 - 2M
  • node3 - 4K
  • node4 - <null>

first - Keep the first result (Default)

result_parse:
    regex:
      huge_size:
        regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
        files: '*.out'
        per_file: first

Only the result from the first file with a match is kept. In this case, the value from node1 would be ignored in favor of that of node2. The results would contain:

{
  "huge_size": "2M"
}

In the simple case of only specifying one file, the ‘first’ result is the only result. That’s why this is the default; the first is all you normally need.

last - Keep the last result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: last

Just like ‘first’, except we work backwards through the files and get the last match value. In this case, that means ignoring node4’s result (because it is null) and taking node3’s:

{
  "huge_size": "4K"
}
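
The ‘first’ and ‘last’ settings can be sketched as a scan over the per-file values in file order. This is not Pavilion’s code: per_first and per_last are hypothetical names, and skipping json false along with empty values follows the category definitions earlier in this document but is an assumption for these settings.

```python
def is_match(value):
    """Non-empty (not null, not an empty list) and not false."""
    return value is not None and value != [] and value is not False


def per_first(file_values: dict):
    """Return the value from the first file that produced a match."""
    for value in file_values.values():
        if is_match(value):
            return value
    return None


def per_last(file_values: dict):
    """Return the value from the last file that produced a match."""
    for value in reversed(list(file_values.values())):
        if is_match(value):
            return value
    return None


# The four-node example, keyed by file name in the order the files were parsed.
file_values = {"node1.out": None, "node2.out": "2M",
               "node3.out": "4K", "node4.out": None}
```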

name - Stores in a filename based dict.

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: name

Put the result under the key, but in a dictionary specific to that file. All the file specific dictionaries are stored under the per_file key.

{
  "per_file": {
    "node1": {"huge_size": null},
    "node2": {"huge_size": "2M"},
    "node3": {"huge_size": "4K"},
    "node4": {"huge_size": null}
  }
}

  • When using the name per_file setting, the key cannot be result.
  • The final extension is removed from the filename.
  • The names are normalized and made unique. Non alphanumeric characters are changed to underscores. Ex: ‘node%3.foo.out’ -> ‘node_3_foo’.
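
The name normalization can be approximated with the standard library. The helper normalize_name below is hypothetical, and it does not attempt the “made unique” step, since how Pavilion resolves collisions is not described here.

```python
import re
from pathlib import Path


def normalize_name(filename: str) -> str:
    """Drop the final extension and replace non-alphanumerics with underscores."""
    stem = Path(filename).stem
    return re.sub(r"[^a-zA-Z0-9]", "_", stem)


name = normalize_name("node%3.foo.out")
```

Here name is ‘node_3_foo’, matching the example above.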

name_list - Stores the name of the files that matched.

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: name_list

Stores a list of the names of the files that matched. The actual matched values aren’t saved. This also normalizes the names and removes the extension as with ‘per_file: name’.

{
  "huge_size": ["node2", "node3"]
}

all - True if each file returned a True result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: all

By itself, ‘all’ sets the key to True if the result values for all the files evaluate to True. Setting action: store_true produces more predictable results.

                    value    t/f value   action: store_true
No result           <null>   false       false
Non-empty strings   '2M'     true        true
Empty strings       ''       false       true
Non-zero numbers    5        true        true
Zero                0        false       true
Literal true        true     true        true
Literal false       false    false       false

In our example, the result is false because some of our files had no matches (a <null> result).

{
  "huge_size": false
}

any - True if any file returned a True result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: any

Like ‘all’, but is true if any of the results evaluates to True. In the case of our example, since at least one file matched, the key will be set to ‘true’.

{
  "huge_size": true
}
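
Both ‘all’ and ‘any’ reduce the per-file values to a single boolean after the action has been applied. The sketch below assumes plain truthiness is used for that reduction, as the ‘all’ table suggests; per_all, per_any, and is_match are hypothetical names rather than Pavilion’s API.

```python
def is_match(value):
    """Stand-in for 'action: store_true': non-empty and not false."""
    return value is not None and value != [] and value is not False


def per_all(file_values: dict, action=None) -> bool:
    """True if every file's value (after the action) is truthy."""
    values = list(file_values.values())
    if action is not None:
        values = [action(value) for value in values]
    return all(bool(value) for value in values)


def per_any(file_values: dict, action=None) -> bool:
    """True if at least one file's value (after the action) is truthy."""
    values = list(file_values.values())
    if action is not None:
        values = [action(value) for value in values]
    return any(bool(value) for value in values)


file_values = {"node1": None, "node2": "2M", "node3": "4K", "node4": None}
```

The difference between the two columns of the table shows up with values such as an empty string: per_all({"a": "", "b": "2M"}) is false, but passing action=is_match makes it true.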

list - Merge the file results into a single list

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: list

For each result from each file, add them into a single list. Empty values are not added, but false is. If the result value is a list already, then each of the values in the list is added.

{
  "huge_size": ["2M", "4K"]
}
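
The merge rule for ‘list’ can be sketched as follows. This is illustrative only: per_list is a hypothetical name, and adding the items of a list result without filtering them individually is an assumption.

```python
def per_list(file_values: dict) -> list:
    """Merge the per-file values into one flat list, in file order."""
    merged = []
    for value in file_values.values():
        if isinstance(value, list):
            # A list result contributes each of its items; an empty list adds nothing.
            merged.extend(value)
        elif value is not None:
            # json false is kept; only null is skipped.
            merged.append(value)
    return merged


file_values = {"node1": None, "node2": "2M", "node3": "4K", "node4": None}
merged = per_list(file_values)
```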