Test Results

Every successful test run generates a set of results in JSON. These are saved with the test, and are also logged to a central results.log file formatted in a Splunk-compatible manner.

This page covers how results are gathered for each test run, as well as how to use the results system to pull results from test output.

Gathering Results

Result gathering steps:

  1. Generate the base result field values.
  2. Use result parser plugins to parse values from result files.
  3. Use result evaluations to modify results or generate additional values.
  4. Convert the ‘result’ key from True/False to ‘PASS’/’FAIL’.

All results are stored in a json mapping that looks like this:

{"avg_speed": 3.7999,
 "command_test": "PASS",
 "created": "2020-07-06 15:28:08.528899",
 "duration": "0:00:06.130584",
 "finished": "2020-07-06 15:28:08.528617",
 "id": 16176,
 "job_id": "123",
 "name": "mytests.base",
 "pav_result_errors": [],
 "result": "PASS",
 "return_value": 0,
 "sched": {"avail_mem": "80445",
           "cpus": "36",
           "free_mem": "76261",
           "min_cpus": "36",
           "min_mem": "131933.144",
           "total_mem": "125821"},
 "started": "2020-07-06 15:28:02.398033",
 "sys_name": "pav-test",
 "n": {
    "node01": {"raw_speed": 33},
    "node03": {"raw_speed": 39},
    "node05": {"raw_speed": 42}
 },
 "user": "bob"}
  • Most values are stored at the top level of the mapping.
  • Most scheduler variables are included to track exactly how the test ran.
  • pav_result_errors stores a list of errors encountered when gathering results.
  • The ‘n’ and ‘fn’ sections store per-file information. When a test produces an output file per node, result parsers can use these sections to gather separate per-node results.
  • The avg_speed value in this case is calculated from the per-node raw_speed values using result evaluations (see the sketch below).
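
For instance, a minimal sketch of how such a value might be computed (the test name here is hypothetical; avg() and the 'n.*.raw_speed' lookup use the evaluation syntax described later on this page):

speed_test:
    result_evaluate:
        # Average the per-node raw_speed values stored under 'n'.
        avg_speed: 'avg(n.*.raw_speed)'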

The Results Log

The results log records every step of the results gathering process in great detail. It is stored in each test run directory in results.log, and is available via the pav log results <test_id> command.
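
For example, to view the detailed log for the test run shown above (using its id of 16176):

$ pav log results 16176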

The general results log keeps a record of the results of every test run under your instance of Pavilion. Its location is configured via the general pavilion.yaml config, but it defaults to residing in the working directory. Its format is designed to be easily read by Splunk and similar tools.

Basic Result Keys

These keys are present in the results for every test, whether the test passed or failed. To see the latest list of base result values, run pav show result_base. All of these keys, as well as ‘result’, are reserved.

$ pav show result_base
 Name              | Doc
-------------------+-------------------------------------------------
 name              | The test run name
 id                | The test run id
 created           | When the test was created.
 started           | When the test run itself started.
 finished          | When the test run finished.
 duration          | Duration of the test run (finished - started)
 user              | The user that started the test.
 job_id            | The scheduler plugin's jobid for the test.
 sched             | Most of the scheduler variables.
 sys_name          | The system name '{{sys.sys_name}}'
 pav_result_errors | Errors from processing results.
 n                 | Per file results (the filename sans extension).
 fn                | Per filename results.
 return_value      | The return value of run.sh

All time fields are in ISO8601 format.

Basic Result Parsing

The result_parse section of the config lets you configure result parsers to pull useful information out of files generated by your test run.

Result parsers are plugins. Pavilion comes with several that are useful in common scenarios, but tests with more complex results may warrant the development of a custom result parser to specially handle that test’s output. To see the currently available result parsers, use pav show result_parsers.

$ pav show result_parsers
 Available Result Parsers
-----------+----------------------------------------------------------
 Name      | Description
-----------+----------------------------------------------------------
 command   | Runs a given command.
 filecheck | Checks the working directory for a given file. The
           | parser will tell the user if the filename exists or not.
 constant  | Set a constant as result.
 table     | Parses tables.
 regex     | Find matches to the given regex in the given file. The
           | matched string or strings are returned as the result.

Each of these is configured using its own configuration section under result_parse. To see the full configuration documentation for a parser, use pav show result_parsers --config <parser_name>.
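
For example, to print the full option documentation for the regex parser (output omitted here):

$ pav show result_parsers --config regex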

mytest:
  scheduler: raw
  run:
    cmds:
      - ping -c 10 google.com

  result_parse:
      # The result_parse section is comprised of configs for result parsers,
      # identified by name. Each parser type section is further comprised
      # of a mapping of result keys to their parser configs.
      regex:
        # Each result parser can have multiple configs.

        # The value matched will be stored in this key
        loss:
          # This tells the regex parser what regular expression to use.
          # Single quotes are recommended, as they are literal in yaml.
          regex: '\d+% packet loss'
  • Each result parser type can define multiple keys.
  • The found value is stored at the top level of the result JSON (by default).
  • All result parser keys must be unique.
  • For more information on result parsers, see Result Parsers.

Result Evaluations

Result evaluations allow you to perform math operations or call various functions on your results to further process them. This allows you to perform normalization, set constants, calculate averages, and other useful operations. The syntax is identical to Expressions (the part inside the double curly braces), except variables are looked up from the results rather than Pavilion variables.

eval_example:
    run:
        cmds:
            - time wget google.com

    # This will produce results that include the line:
    # real   0m3.256s

    result_parse:
        regex:
            # Parse out the minutes and seconds
            real_m:
                regex: 'real\s+(\d+)m'
            real_s:
                regex: 'real\s+\d+m([0-9.]+)s'

    result_evaluate:
        # combine the minutes and seconds that we parsed out
        # into one number of seconds.
        real: 'real_m * 60 + real_s'
        # These can reference values set by other evaluations.
        real_hours: 'real/60/60'
  • Result evaluations may reference each other. Order doesn’t matter.
  • Values are automatically treated as the type they most resemble. This conversion is applied both to constants and the contents of variables (a brief sketch follows this list).
    • 3.59 -> float
    • 3 -> int
    • True -> bool
    • “hello” -> string
    • hello -> variable
  • You can refer to deeply nested values using dot notation:
    • “table.col.row” - Would refer to the ‘row’ key of the ‘col’ dict, which is itself a key in the ‘table’ dict.
    • “sched.test_node_list.5” - Would refer to the sixth item in the node list (indices start at 0).
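
A minimal sketch of how those types might appear in an evaluation (the test name and keys here are hypothetical; only the right-hand expressions illustrate the typing rules):

typed_example:
    result_evaluate:
        # Constants are converted to the type they resemble.
        ratio: '3.59'             # stored as a float
        count: '3'                # stored as an int
        flag: 'True'              # stored as a bool
        label: '"hello"'          # quoted, so stored as the string "hello"
        # An unquoted word is looked up as a result variable instead.
        run_duration: 'duration'  # copies the value of the 'duration' result key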

Complex Variable References

The result JSON isn’t necessarily a flat dictionary. It may contain lists, other dictionaries, lists of dicts containing lists, and so on. To get to deeper keys, simply provide each step in the path as a dot-separated name.

For example, given a result structure that looks like:

{
    "name": "example_results",
    "table": {
        "node1": {"speed": 30.0, "items": 5},
        "node2": {"speed": 50.0, "items": 4},
        "node3": {"speed": 70.0, "items": 3}
    },
    "sched": {
        "test_nodes": 3,
        "test_node_list": ["node1", "node2", "node3"]
    }
}
  • “table.node1.speed” - Would refer to the ‘speed’ key of the ‘node1’ dict which is itself a key in the ‘table’ dict. (It has value 30.0).
  • “sched.test_node_list.1” - Would refer to the second item in the node list. (‘node2’ in this case)

Pulling Lists of Values

You can also use a single ‘*’ in a variable name to return a list of every matching value.

  • “table.*.speed” -> [30.0, 50.0, 70.0]
  • “test_node_list.*” -> [“node1”, “node2”, “node3”]
    • Which is the same as “test_node_list” by itself, actually.

To get the keys of a dictionary, use the keys function. The keys are guaranteed to be in the same order as the values produced when using a ‘*’.
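
For example, continuing with the result structure above, a brief (hypothetical) evaluation sketch:

example_results:
    result_evaluate:
        # All of the per-node speeds, as a list.
        node_speeds: 'table.*.speed'   # -> [30.0, 50.0, 70.0]
        # The node names, in the same order as node_speeds.
        node_names: 'keys(table)'      # -> ["node1", "node2", "node3"]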

A Complex Example

Given a test that produces values per node and gathers them using the result parser per_file option (see per_file: Manipulating Multiple File Results), you may want to use all of those values to calculate an average or find outliers.

multi_val:
    slurm:
        num_nodes: all

    run:
        # Get the root filesystem usage per node.
        cmds: '{{test_cmd}} -o "%N.out" df /'

    result_parse:
        regex:
            used:
                regex: '^rootfs\s+\d+\s+(\d+)'
                files: '*.out'
                per_file: 'name'
  1. This will give us a <node_name>.out file for each node with the command output.
  2. The result parser will parse out the 3rd column from the ‘rootfs’ line from each of these files.
  3. The ‘per_file’ option of ‘name’ will store these results in the ‘n’ dictionary by the root filename.

The results will look like:

{
    "name": "examples.multi_val",
    "n": {
        "node01": {"used": "709248"},
        "node03": {"used": "802218"},
        "node04": {"used": "699320"},
        "node05": {"used": "2030531"},
    },
    "etc": "...",
}

We could then add the following to our test config to perform calculations on these values.

multi_val:
    # ...

    result_evaluate:
        mean_used: 'avg(n.*.used)'
        outlier_data: 'outliers(n.*.used, keys(n), 1.5)'
        outliers: 'keys(outlier_data)'

Would give us results like:

{
    "name": "examples.multi_val",
    "n": {
        "node01": {"used": "709248"},
        "node03": {"used": "802218"},
        "node04": {"used": "699320"},
        "node05": {"used": "2030531"},
    },
    "mean_used": 1060329.25,
    "outlier_data": {"node05": 1.7276},
    "outliers": {"node05"},
}

String Expressions in Result Evaluations

Pavilion variables and string expressions (‘{{ stuff }}’) can be used, carefully, in result evaluations. Keep in mind that they are evaluated as a separate step (using pavilion variables), and will become the string that is later evaluated in result evaluations.

expr_eval:

    variables:
        num_var: 'num'
        num: 8
        base: 2
        message: "hello world"

    result_evaluate:
        # Set the result 'num' key to 50
        num: '50'

        half_num: '{{num_var}}/2'
        # After strings are resolved, this will become:
        # 'num/2'
        # This will then be evaluated, and the 'num' result value will be
        # used (50).
        # The result 'half_num' key will thus be 25. (ie: ``50/2`` )

        # If we want to include a Pavilion variable as a string, it must
        # be put in quotes.
        msg_len: 'len("{{message}}")'
        # This will become: 'len("hello world")'
        # Which will evaluate to ``11``.

        # You can actually include more complex expressions in both
        # the string expression and the evaluation, but this should be
        # avoided
        complicated: '(num * {{ base^10 }})/100'
        # The value string resolves to: '(num * 1024)/100'
        # Which evaluates to: ``(50 * 1024)/100`` -> ``512.0``

The Test Result

The ‘result’ key denotes the final test result, and will always be either ‘PASS’, ‘FAIL’ or ‘ERROR’. ERROR in this case means the test had a non-recoverable error when checking whether the test passed or failed.

You can set the ‘result’ using either result parsers or result evaluations. It must be set as a single True or False value.

  • For result parsers, this means you should use an ‘action’ of ‘store_true’ (the default) or ‘store_false’ (See Actions). You will also need to use a ‘per_file’ setting that produces a single value, like ‘first’ or ‘all’ (See per_file: Manipulating Multiple File Results).
  • For result evaluations this simply means ensuring that the evaluation returns a boolean, typically by way of a comparison operator.

If you don’t set the ‘result’ key yourself, Pavilion defaults to adding the evaluation: result: 'return_value == 0'. This is why, by default, Pavilion test runs PASS if the run script returns 0.
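
For example, a rough sketch of overriding that default with a result evaluation (avg_speed is a hypothetical key parsed or computed elsewhere in the test):

speed_test:
    result_evaluate:
        # The test passes only when the computed average speed is high enough.
        result: 'avg_speed > 3.5'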

Result Parsers

The result_parse section of each test config lets us configure additional result parsers that can pull data out of test output files. By default each parser reads from the run log, which contains the stdout and stderr from the run script and your test.

mytest:
  scheduler: raw
  run:
    cmds:
      - ping -c 10 google.com

  result_parse:
    # The result_parse section is comprised of configs for result parsers,
    # identified by name. Each parser can have one or more configs,
    # each of which will parse a different result value from
    # the test output.
    regex:
      # Each result parser can have multiple configs.

      # The value matched will be stored in this key.
      loss:
        # This tells the regex parser what regular expression to use.
        # Single quotes are recommended, as they are literal in yaml.
        regex: '\d+% packet loss'

      # We're storing this value in the 'result' key. If a match is found
      # (and the stored value is thus 'True'), the test will 'PASS'.
      result:
        regex: '10 received'
        # The action denotes how to handle the parser's data. In this case
        # a successful match will give a 'True' value.
        action: store_true

The results for this test run might look like:

{
  "name": "mytest",
  "id": 51,
  "created": "2019-06-18 16:00:35.692878-06:00",
  "started": "2019-06-18 16:00:36.744221-06:00",
  "finished": "2019-06-18 16:01:39.997299-06:00",
  "duration": "0:01:04.304421",
  "result": "PASS",
  "loss": "0% packet loss"
}

Result Value Types

Result parsers can return any sort of json compatible value. This can be a string, number (int or float), boolean, or a complex structure that includes lists and dictionaries. Pavilion, in handling result values, groups these into a few internal categories.

  • empty - An empty result is a json null, or an empty list. Everything else is non-empty.
  • match - A match is a non-empty result that is also not json false.
  • false - False is special, in that it is neither empty nor a match.

The actions and per_file sections below work with these categories when deciding how to handle result parser values.

Actions

We saw in the above example that we can use an action to change how the results are stored. There are several additional actions that can be selected:

  • store - (Default) Simply store the result parser’s output.
  • store_true - Store true if the result is a match (non-empty and not false).
  • store_false - Stores true if the result is not a match.
  • count - Count the length of list matches, regardless of contents. Non-list matches are 1 if a match, 0 otherwise.
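
For instance, a brief sketch using store_false (the key name and regex here are hypothetical): the no_errors key becomes true only when no line containing 'error' is found in the parsed file (the run log, by default).

result_parse:
    regex:
        # True only when no 'error' line was found.
        no_errors:
            regex: 'error'
            action: store_false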

Files

By default, each result parser reads through the test’s run.log file. You can specify a different file, a file glob, or even multiple file globs to match an assortment of files. The files are parsed in the order given.

If you need to reference the run log in addition to other files, it is one directory up from the test’s run directory, in ../run.log.
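
As a rough sketch (the key and regex here are hypothetical, and this assumes the files option accepts a YAML list of globs, as the text above implies), parsing both the per-node output files and the run log might look like:

result_parse:
    regex:
        mem_free:
            regex: 'MemFree:\s+(\d+)'
            # Files are parsed in the order given here.
            files:
                - '*.out'
                - '../run.log'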

The test below runs across several nodes and produces an output file for each. The regex parser runs against each of these files, and (because it defaults to returning only the first match) returns that match or null for each file. What is done with those values depends on the per_file attribute for the result parser.

hugetlb_check:
    scheduler: slurm
    slurm:
      num_nodes: 4

    run:
      cmds:
        # Use the srun --output option to specify that results are
        # to be written to separate files.
        - '{{sched.test_cmd}} --output="%N.out" env'

    result_parse:
        regex:
          # The matched values will be stored under the 'huge_size' key,
          # but that will vary based on the 'per_file' value.
          huge_size:
              regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
              # Run the parser against all files that end in .out
              files: '*.out'
              per_file: # We'll demonstrate these settings below

per_file: Manipulating Multiple File Results

The per_file option lets you manipulate how results are stored on a file-by-file basis. Since the choice here will have a drastic effect on your results, we’ll demonstrate each from the standpoint of the test config above.

Let’s say the test ran on four nodes (node1, node2, node3, and node4), but only node2 and node3 found a match. The results would be:

  • node1 - <null>
  • node2 - 2M
  • node3 - 4K
  • node4 - <null>

first - Keep the first result (Default)

result_parse:
    regex:
      huge_size:
        regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
        files: '*.out'
        per_file: first

Only the result from the first file with a match is kept. In this case, node1's empty result is skipped in favor of node2's match. The results would contain:

{
  "huge_size": "2M"
}

In the simple case of only specifying one file, the ‘first’ result is the only result. That’s why this is the default; the first is all you normally need.

last - Keep the last result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: last

Just like ‘first’, except we work backwards through the files and get the last match value. In this case, that means ignoring node4’s result (because it is null) and taking node3’s:

{
  "huge_size": "4K",
}

all - True if each file returned a True result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: all

By itself, ‘all’ sets the key to True if the result values for all the files evaluate to True. Setting action: store_true produces more predictable results.

                    | value  | t/f value | action: store_true
 -------------------+--------+-----------+--------------------
  No result         | <null> | false     | false
  Non-empty strings | '2M'   | true      | true
  Empty strings     | ''     | false     | true
  Non-zero numbers  | 5      | true      | true
  Zero              | 0      | false     | true
  Literal true      | true   | true      | true
  Literal false     | false  | false     | false

In our example, the result is false because some of our files had no matches (a <null> result).

{
  "huge_size": false,
}

any - True if any file returned a True result

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: any

Like ‘all’, but is true if any of the results evaluates to True. In the case of our example, since at least one file matched, the key will be set to ‘true’:

{
  "huge_size": true,
}

list - Merge the file results into a single list

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: list

The results from each file are merged into a single list. Empty values are not added, but false is. If a result value is already a list, each of its values is added individually.

{
  "huge_size": ["2M", "4K"],
}

fullname - Stores in a filename based dict.

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: fullname

Put the result under the key, but in a dictionary specific to that file. All the file specific dictionaries are stored under the fn key by filename.

{
  "fn": {
    "node1.out": {"huge_size": null},
    "node2.out": {"huge_size": "2M"},
    "node3.out": {"huge_size": "4K"},
    "node4.out": {"huge_size": null}
  }
}
  • When using the fullname per_file setting, the key cannot be result.
  • The rest of the file’s path is ignored, so there is potential for file name collisions, as the same filename could exist in multiple places. Pavilion will report such collisions in the results under the error key.

name - Stores in a filename (without extension) based dict.

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: name

Just like fullname, except the file extension is removed from the filename. These are stored under the n key in the results.

{
  "n": {
    "node1": {"huge_size": null},
    "node2": {"huge_size": "2M"},
    "node3": {"huge_size": "4K"},
    "node4": {"huge_size": null}
  }
}

fullname_list - Stores the names of the files that matched.

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: fullname_list

Stores a list of the names of the files that matched. The actual matched values aren’t saved.

{
  "huge_size": ["node2.out", "node3.out"],
}

name_list - Stores the names of the files that matched (minus extension).

result_parse:
    regex:
      huge_size:
          regex: 'HUGETLB_DEFAULT_PAGE_SIZE=(.+)'
          files: '*.out'
          per_file: name_list

Stores a list of the names of the files that matched, minus extension. The actual matched values aren’t saved.

{
  "huge_size": ["node2", "node3"],
}

Errors

If a recoverable error occurs while parsing results, a description of the error is recorded under the errors key. Each of these is a dictionary with some useful values:

{
  "errors": [{
    # The error happened under this parser.
    "result_parser": "regex",
    # The file being processed.
    "file": "node3.out",
    # The key being processed
    "key": "hugetlb",
    "msg": "Error reading file 'node3.out': Permission error"
  }]
}