Scheduler Plugins

Scheduler plugins take care of the scheduling part of testing. They provide tests with a set of variables that can be used in the test configuration, and they hand test runs off to the scheduler.

Everything in Plugin Basics applies here, so you should read that first.

This may seem quite daunting at first. The hard part, however, is typically in parsing the information you get back from the scheduler itself. Interfacing that with Pavilion is fairly easy.

Scheduler Requirements

For a scheduler to work with Pavilion, it must:

  • Produce jobs with a unique (for the moment), trackable job id

  • Produce jobs that can be cancelled

  • Allow a job to be started asynchronously.

The Pavilion Scheduler plugin system was designed to be flexible in order to support as many schedulers as possible.

Pavilion also provides an advanced scheduler class with several additional features:

  • Allows tests to auto-size relative to available/up nodes.

  • Will automatically break the system into discrete ‘chunks’ of nodes, allowing for tests that run over the whole system in a piecemeal fashion.

Advanced schedulers must be able to get an accurate inventory of nodes, including:

  • Whether each node is currently ‘up’ or ‘allocated’.

  • System information about each node (CPUs, memory info, etc.)

  • The scheduler ‘groups’ that the node belongs to, such as reservations and partitions. Pavilion must be able to filter nodes according to the allocation parameters the same way the scheduler would.

Advanced schedulers must also be able to dictate to the scheduler exactly which nodes to use.
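To make the filtering requirement concrete, here is a minimal sketch that picks usable nodes from an inventory. The per-node keys ('up', 'allocated', 'partitions', 'reservations') and the filter_nodes name are hypothetical, not Pavilion's required keys, and the rules are deliberately simplified; real schedulers apply more nuanced rules, for example around reserved nodes.

```python
from typing import Dict, List, Optional

# Hypothetical inventory layout: {node_name: {key: value}}.
Inventory = Dict[str, Dict[str, object]]


def filter_nodes(nodes: Inventory, partition: str,
                 reservation: Optional[str] = None,
                 include_allocated: bool = True) -> List[str]:
    """Return the sorted names of nodes usable for a request on `partition`
    (and `reservation`, if one is given)."""
    usable = []
    for name, info in nodes.items():
        if not info.get('up', False):
            continue  # Jobs placed on down nodes would wait in the queue forever.
        if partition not in info.get('partitions', []):
            continue  # The scheduler would reject nodes outside the partition.
        if reservation is not None and reservation not in info.get('reservations', []):
            continue
        if not include_allocated and info.get('allocated', False):
            continue  # Skip busy nodes when only idle ones are wanted.
        usable.append(name)
    return sorted(usable)


inventory = {
    'n001': {'up': True, 'allocated': False, 'partitions': ['normal'], 'reservations': []},
    'n002': {'up': False, 'allocated': False, 'partitions': ['normal'], 'reservations': []},
    'n003': {'up': True, 'allocated': True, 'partitions': ['normal', 'debug'],
             'reservations': ['maint']},
}
```

Here filter_nodes(inventory, 'normal') keeps n001 and n003 and drops the down node n002. If this filtering disagrees with the scheduler's own rules, jobs get rejected or stuck, which is why the inventory must be accurate.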

Scheduler Plugins

The Scheduler Plugin

Your plugin class inherits from either ‘pavilion.schedulers.SchedulerPluginBasic’ or ‘pavilion.schedulers.SchedulerPluginAdvanced’. The interface they share is fully documented in the ‘pavilion.schedulers.scheduler.SchedulerPlugin’ class.

All scheduler plugins require that you extend the base class by providing:

  1. A _kickoff() method - a means to acquire an allocation given the scheduler parameters and run a script on it. It must also return a ‘serializable’ job id that uniquely identifies the scheduler job.

  2. A _job_status() method, which asks the scheduler whether a given job id is scheduled, had a scheduling error, was cancelled, or is running.

  3. A cancel() method, to cancel a given job id.

  4. A _get_alloc_nodes() method, to get the list of nodes in an allocation that Pavilion is currently running under.

  5. An available() method, to tell Pavilion if your scheduler can be used at all.

Advanced schedulers must also override the following. They are fully documented in the ‘pavilion.schedulers.advanced.SchedulerPluginAdvanced’ class.

  1. _get_raw_node_data() - Should fetch and return a list of information about each node.

    This is the per-node information mentioned above.

  2. _transform_raw_node_data() - Converts that data into a ‘{node: info_dict}’ dictionary.

    Each node’s info_dict must contain several required keys; see the method documentation for the required and optional keys.
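As an illustration of the raw-to-dict transformation, here is a minimal sketch that parses one-record-per-line ‘Key=Value’ node data, loosely modeled on the output of Slurm's ‘scontrol show node -o’. The function name mirrors _transform_raw_node_data() but is a stand-alone hypothetical, and the output keys ('cpus', 'mem_mb', 'up', 'allocated', 'partitions') are assumptions; the real required keys are listed in the method's docstring.

```python
from typing import Dict, List


def transform_raw_node_data(raw_lines: List[str]) -> Dict[str, dict]:
    """Convert 'Key=Value Key=Value ...' node records into {node: info_dict}."""
    nodes = {}
    for line in raw_lines:
        # Naive split: values containing spaces (e.g. a Reason field) would
        # need a real parser.
        fields = dict(item.split('=', 1) for item in line.split() if '=' in item)
        name = fields.get('NodeName')
        if name is None:
            continue  # Skip records we can't attribute to a node.
        state = fields.get('State', '').upper()
        nodes[name] = {
            'cpus': int(fields.get('CPUTot', '0')),
            'mem_mb': int(fields.get('RealMemory', '0')),
            # Treat drained nodes as unusable, like down ones.
            'up': 'DOWN' not in state and 'DRAIN' not in state,
            'allocated': 'ALLOC' in state or 'MIX' in state,
            'partitions': [p for p in fields.get('Partitions', '').split(',') if p],
        }
    return nodes


raw = [
    'NodeName=n001 CPUTot=36 RealMemory=128000 State=IDLE Partitions=normal,debug',
    'NodeName=n002 CPUTot=36 RealMemory=128000 State=DOWN+DRAIN Partitions=normal',
    'NodeName=n003 CPUTot=48 RealMemory=256000 State=ALLOCATED Partitions=normal',
]
nodes = transform_raw_node_data(raw)
```

Splitting fetching (_get_raw_node_data()) from transformation like this keeps the parser testable against captured output, without a live scheduler.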

Basic scheduler plugins don’t require any extra methods, but are limited in functionality. See Scheduler Plugin Types for more info.

Scheduler Variables

Every scheduler should also include a scheduler variables class, assigned to your class’s ‘VAR_CLASS’ class variable. This provides information from the scheduler for each test to use in its configuration, such as sched.test_nodes (the number of nodes in the test’s allocation). The base class uses information given by the scheduler plugin and the test’s configuration to figure out 99% of these on its own. You’ll only need to override a few.

Writing a Scheduler Plugin Class

Handling Errors

Your scheduler class should catch any errors it can reasonably expect to occur. This includes OSError when making system calls, ValueError when manipulating values (like converting strings to ints), etc. Once an error is caught, raise a Pavilion-specific error in its place; for scheduler plugins, this should always be a SchedulerPluginError. Pavilion exceptions take a message about the local context as their first argument, and the prior exception as the second (optional) argument.

from pavilion.schedulers import SchedulerPluginError

try:
    int(foo)
except ValueError as exc:
    raise SchedulerPluginError("Invalid value for foo.", exc)

This allows Pavilion to catch and handle predictable errors, and pass them directly to the user.
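A slightly fuller sketch of the same pattern: parsing a job id out of submission output, where both an empty output (IndexError) and a non-numeric token (ValueError) become a SchedulerPluginError carrying the original exception. SchedulerPluginError here is a local stand-in so the example runs without Pavilion, and parse_job_id is a hypothetical helper; the ‘Submitted batch job <id>’ format is what Slurm's sbatch prints.

```python
class SchedulerPluginError(Exception):
    """Local stand-in for pavilion's SchedulerPluginError; in a real plugin,
    import it from pavilion.schedulers instead."""

    def __init__(self, msg, prior_error=None):
        super().__init__(msg)
        self.prior_error = prior_error


def parse_job_id(submit_output: str) -> str:
    """Extract the numeric job id from 'Submitted batch job <id>' output."""
    try:
        # The id is the last whitespace-separated token; int() validates it.
        return str(int(submit_output.split()[-1]))
    except (IndexError, ValueError) as err:
        raise SchedulerPluginError(
            "Could not parse a job id from submit output: {!r}"
            .format(submit_output), err)
```

Returning the id as a string keeps it ready to store in a JobInfo, whose values must all be strings.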

Init

Scheduler plugins initialize much like other Pavilion plugins:

from pavilion import schedulers

class Slurm(schedulers.SchedulerPluginAdvanced):

    def __init__(self):
        super().__init__(
            name='slurm',
            description='Schedules tests via the Slurm scheduler.'
        )

Most customization is through method overrides and a few class variables that we’ll cover later. There is also a SchedulerPluginBasic which allows for working with schedulers with a much reduced feature set.

Configuration

Pavilion has unified scheduler plugin configuration into the ‘schedule’ section. Not all keys from this section will apply to your scheduler, and that’s ok. Most keys are handled automatically given the information gathered on nodes.

You can also, optionally, add a scheduler specific configuration section. To do this, you’ll need to override the _get_config_elems() method. This method returns three items:

  1. A list of YamlConfig Elements.

  2. A dictionary of validation/normalization functions. These will be called to transform the data for each key to a standard format.

  3. A dictionary of default values for each key.

Pavilion uses the Yaml Config library to manage its configuration format. Yaml Config uses ‘config elements’ to describe each component of the configuration and their relationships.

The Slurm scheduler plugin provides a solid example of this, but in general:

  • You should only use yaml_config StrElem, ListElem, KeyedElem (a dict with specific key and value formats), and CategoryElem (a dict with mostly unlimited keys, and a shared value format).

  • Validators for individual keys are optional, but you should use them for str to int conversion and value range checking. These can take several forms; see the SchedulerPlugin._get_config_elems() method documentation.

  • Don’t use the built-in validation and default options for the yaml_config objects, use the validation callbacks/objects and defaults dictionary returned by the function instead.
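The sketch below shows only the shape of the validators and defaults dictionaries, using plain callables. The config keys are hypothetical, and normalize() is a hypothetical stand-in for the processing Pavilion performs for you; the full set of validator forms Pavilion accepts is described in the _get_config_elems() docstring.

```python
from typing import Callable, Dict, Optional


def int_in_range(low: int, high: Optional[int] = None) -> Callable[[str], int]:
    """Build a validator that converts a string to an int and range-checks it."""
    def validate(value: str) -> int:
        num = int(value)  # Raises ValueError for non-integer strings.
        if num < low or (high is not None and num > high):
            raise ValueError("Value {} is out of range.".format(num))
        return num
    return validate


# Hypothetical keys for an imaginary scheduler-specific config section.
VALIDATORS = {
    'max_tasks': int_in_range(1),
    'priority': int_in_range(0, 100),
}
DEFAULTS = {
    'max_tasks': '1',
    'priority': '50',
    'account': None,
}


def normalize(raw: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Hypothetical stand-in for Pavilion's processing: apply defaults for
    missing or unset keys, then run each key's validator."""
    config: Dict[str, object] = dict(DEFAULTS)
    config.update({key: val for key, val in raw.items() if val is not None})
    for key, validator in VALIDATORS.items():
        if config.get(key) is not None:
            config[key] = validator(config[key])
    return config
```

Keeping defaults as strings means they pass through the same validators as user-supplied YAML values, so there is a single conversion path.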

Kicking Off Tests

Pavilion scheduler plugins generate a kickoff script for each job - a script that will be handed to the scheduler to be run within the allocation. That script will run Pavilion one or more times within that allocation, starting a run.sh script for each test. It’s the responsibility of the run.sh script to actually run applications under MPI, either with mpirun, srun, or similar.

Many schedulers rely on header information in that kickoff script to tell the scheduler what the settings for the allocation should be. This header is optional - the default header adds nothing to the file except a #!/bin/bash line. If you need to define header lines, you’ll need to create a class that inherits from pavilion.schedulers.scheduler.KickoffScriptHeader, and override the _kickoff_lines() method. This method simply returns a list of header lines to add.
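For a Slurm-like scheduler, the header lines are ‘#SBATCH’ directives. The sketch below shows what such a list might look like and where it lands in the script. kickoff_lines() and build_script() are hypothetical stand-alone functions with explicit arguments; a real _kickoff_lines() override gets its inputs as described in the KickoffScriptHeader documentation.

```python
from typing import List, Optional


def kickoff_lines(nodes: int, partition: Optional[str] = None,
                  time_limit_min: Optional[int] = None) -> List[str]:
    """Return scheduler directive lines for the top of a kickoff script."""
    lines = ['#SBATCH --nodes={}'.format(nodes)]
    if partition:
        lines.append('#SBATCH --partition={}'.format(partition))
    if time_limit_min is not None:
        # A bare number is interpreted by sbatch as minutes.
        lines.append('#SBATCH --time={}'.format(time_limit_min))
    return lines


def build_script(header: List[str], body: List[str]) -> str:
    """Assemble the script: shebang first, then directives, then commands."""
    return '\n'.join(['#!/bin/bash'] + header + [''] + body) + '\n'
```

The directives must come before the first command, since sbatch stops reading ‘#SBATCH’ lines once it reaches the first non-comment line.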

Alternatively, when writing your _kickoff method, you can simply pass any relevant information about the job to the scheduler directly through the command line or API calls.

Either way, there are a set of parameters that must be passed on to the scheduler. These are described in the SchedulerPlugin._kickoff docstring. You can safely ignore parameters that aren’t supported by your scheduler.
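When passing parameters on the command line instead, a small lookup table makes "ignore what the scheduler doesn't support" fall out naturally. This is a sketch under assumptions: the parameter names in FLAG_MAP are hypothetical (the real names are in the SchedulerPlugin._kickoff docstring), and build_submit_cmd is a hypothetical helper; the flags themselves are standard sbatch options.

```python
from typing import Dict, List

# Hypothetical mapping from kickoff parameter names to command-line flags.
FLAG_MAP = {
    'nodes': '--nodes',
    'partition': '--partition',
    'reservation': '--reservation',
    'account': '--account',
}


def build_submit_cmd(submit_cmd: str, params: Dict[str, object],
                     script_path: str) -> List[str]:
    """Translate supported, set parameters into flags; ignore everything else."""
    cmd = [submit_cmd]
    for name, flag in FLAG_MAP.items():
        value = params.get(name)
        if value is None:
            continue  # Unset parameters add nothing to the command.
        cmd.append('{}={}'.format(flag, value))
    cmd.append(script_path)
    return cmd
```

Because only keys in FLAG_MAP are ever looked up, parameters the scheduler can't express (such as a hypothetical 'gpus' key) are skipped without special handling.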

Composing Commands

Your scheduler plugin will most likely require that you run commands in a subshell. This section provides guidance on how to do so reliably under Pavilion.

# These should be at the top of the file, as standard
import shutil
import subprocess

from pavilion.schedulers import SchedulerPluginError

# Use shutil.which to find the path to your executable, if needed
srun_cmd = shutil.which('srun')
if srun_cmd is None:
    raise SchedulerPluginError("Could not find srun command path.")

my_cmd = [srun_cmd]

# Building your commands with a list is simple and flexible.
# ('config' is assumed to hold this scheduler's configuration.)
if config['account']:
    my_cmd.extend(['-A', config['account']])

# subprocess.check_output will run your command to completion and simultaneously
# redirect and gather the output.
try:
    # You should also redirect stderr, as is appropriate for your command.
    run_output = subprocess.check_output(my_cmd, stderr=subprocess.STDOUT)
# A CalledProcessError will be raised if the command returns a non-zero exit code.
except subprocess.CalledProcessError as err:
    raise SchedulerPluginError(
        "Error calling srun. Return code '{}', msg:\n{}"
        .format(err.returncode, err.output.decode()), err)
# An OSError is raised if the command can't be executed at all.
except OSError as err:
    raise SchedulerPluginError("Could not run srun.", err)

# The output will be binary, and will need to be decoded
run_output = run_output.decode()

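To find commands on a system, ‘shutil.which()’ is essentially an in-Python version of ‘which’. Avoid the older ‘distutils.spawn.find_executable’: distutils is deprecated and was removed from the standard library in Python 3.12.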
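The snippet above depends on Pavilion and on a plugin config. As a self-contained sketch of the same pattern, the function below uses subprocess.run() with check=True and a timeout. SchedulerPluginError is a local stand-in for Pavilion's class and run_cmd is a hypothetical helper name; the Python interpreter (sys.executable) stands in for a scheduler command so the example runs anywhere.

```python
import shutil
import subprocess
import sys
from typing import List


class SchedulerPluginError(Exception):
    """Local stand-in for pavilion.schedulers.SchedulerPluginError."""

    def __init__(self, msg, prior_error=None):
        super().__init__(msg)
        self.prior_error = prior_error


def run_cmd(cmd: List[str], timeout: float = 60) -> str:
    """Run cmd to completion and return its decoded stdout+stderr."""
    exe = shutil.which(cmd[0])
    if exe is None:
        raise SchedulerPluginError("Could not find '{}' in PATH.".format(cmd[0]))
    try:
        proc = subprocess.run([exe] + cmd[1:], stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, check=True,
                              timeout=timeout)
    except subprocess.CalledProcessError as err:
        raise SchedulerPluginError(
            "'{}' returned {}:\n{}".format(
                cmd[0], err.returncode, err.output.decode(errors='replace')), err)
    except (OSError, subprocess.TimeoutExpired) as err:
        raise SchedulerPluginError("Error running '{}'.".format(cmd[0]), err)
    return proc.stdout.decode()


# The interpreter stands in for a scheduler command here.
output = run_cmd([sys.executable, '-c', 'print("hello")'])
```

Resolving the path first turns a missing command into a clear plugin error rather than a bare FileNotFoundError, and the timeout keeps a hung scheduler command from blocking forever.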

Environment Variables

You can also add to the environment through the env argument. Note that env replaces the child process's environment entirely, so in most cases you should start from a copy of the current environment (os.environ) and modify that.

import os
import subprocess

myenv = dict(os.environ)
myenv['MY_ENV_VAR'] = 'Hiya!'
myenv['PATH'] = '{}:/opt/share/something/bin'.format(os.environ['PATH'])

subprocess.run(my_cmd, env=myenv)
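A runnable variant of the example above, which confirms that the child process sees the added variable and that the parent's os.environ is untouched. It uses os.pathsep rather than a hard-coded ‘:’, and the Python interpreter stands in for a real scheduler command; the variable name and PATH entry are the same placeholders as above.

```python
import os
import subprocess
import sys

# Copy the current environment so the parent's os.environ is left untouched.
child_env = dict(os.environ)
child_env['MY_ENV_VAR'] = 'Hiya!'
# Extend PATH using the platform separator (':' on POSIX) instead of replacing it.
child_env['PATH'] = os.pathsep.join(
    [os.environ.get('PATH', os.defpath), '/opt/share/something/bin'])

# The interpreter stands in for a scheduler command; it echoes the variable.
result = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["MY_ENV_VAR"])'],
    env=child_env, stdout=subprocess.PIPE, check=True)
child_value = result.stdout.decode().strip()
```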

Job IDs

Regardless of how you kick off a test, you must capture a job id for it and return it as part of a JobInfo object (which is really just a dict). All scheduler commands that act on a job, like cancel, will have access to this object, either directly or through an attached test.

The JobInfo dict can contain any keys and values you like, as long as they’re all strings. It’s useful to include the ‘sys_name’ of the machine you’re on (via ‘sys_vars.get_vars(True)["sys_name"]’), so that you can check whether the system that started the job is the same as the one that’s manipulating it.
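A minimal sketch of building and checking such a dict, assuming plain dicts in place of Pavilion's JobInfo class. The 'id' key and both function names are hypothetical; in a real plugin the system name would come from sys_vars as described above.

```python
from typing import Dict


def make_job_info(job_id: str, sys_name: str, **extra: str) -> Dict[str, str]:
    """Build a JobInfo-style dict, enforcing that every value is a string."""
    info = {'id': job_id, 'sys_name': sys_name}
    info.update(extra)
    for key, value in info.items():
        if not isinstance(value, str):
            raise TypeError("JobInfo value for '{}' must be a str, got {!r}"
                            .format(key, value))
    return info


def started_here(job_info: Dict[str, str], current_sys_name: str) -> bool:
    """True if the job was kicked off on the system we're running on now."""
    return job_info.get('sys_name') == current_sys_name


job_info = make_job_info('12345', 'cluster1', partition='debug')
```

Checking started_here() before cancelling or querying avoids acting on a job id that only means something on a different cluster.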

Job Status

The ‘_job_status()’ method takes the Pavilion base config (Pavilion’s configuration, rather than a test configuration) and the JobInfo for the job whose status is needed. It returns a ‘TestStatusInfo’ object describing the job state returned by the scheduler.

Its job is to translate all the complicated potential job states for any particular scheduler into one of four more basic states understood by Pavilion:

  • SCHED_ERROR - There was an error in scheduling the job

  • SCHED_CANCELLED - The job was cancelled (usually externally to Pavilion)

  • SCHED_RUNNING - The job is running (but not necessarily the particular test).

  • SCHEDULED - The job is simply waiting for an allocation.

Note that this will only be called if the cached job status in the plugin’s internal ‘_job_statuses’ dictionary is out of date. In fact, you can (as the Slurm plugin does) simply use the first call of this function to update the status of all the jobs on the system at once in that dictionary.
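A minimal sketch of that batch approach: parse one listing of every job (something like the output of ‘squeue -h -o "%i %T"’) and map each native state onto Pavilion's four. The string constants stand in for pavilion.status_file.STATES, and the (state, note) tuples stand in for TestStatusInfo objects; STATE_MAP and parse_queue are hypothetical, and a real plugin would cover many more native states.

```python
from typing import Dict, Tuple

# Stand-ins for STATES.SCHED_* constants from pavilion.status_file.
SCHED_ERROR = 'SCHED_ERROR'
SCHED_CANCELLED = 'SCHED_CANCELLED'
SCHED_RUNNING = 'SCHED_RUNNING'
SCHEDULED = 'SCHEDULED'

# Hypothetical mapping of native, Slurm-like job states to Pavilion's four.
STATE_MAP = {
    'PENDING': SCHEDULED,
    'CONFIGURING': SCHEDULED,
    'RUNNING': SCHED_RUNNING,
    'COMPLETING': SCHED_RUNNING,
    'CANCELLED': SCHED_CANCELLED,
    'FAILED': SCHED_ERROR,
    'NODE_FAIL': SCHED_ERROR,
}


def parse_queue(output: str) -> Dict[str, Tuple[str, str]]:
    """Parse '<job_id> <NATIVE_STATE>' lines into {job_id: (state, note)}."""
    statuses = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # Skip blank or malformed lines.
        job_id, native = parts[0], parts[1].upper()
        # Unknown native states are reported as errors so they get noticed.
        state = STATE_MAP.get(native, SCHED_ERROR)
        statuses[job_id] = (state, 'Scheduler reports job state {}.'.format(native))
    return statuses


statuses = parse_queue('101 PENDING\n102 RUNNING\n103 CANCELLED\n104 MYSTERY\n\n')
```

One scheduler query per refresh, instead of one per job, keeps load on the scheduler low when many tests are in flight.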

# The STATES object has attributes for each valid Pavilion test state,
# but you'll only be using those with the 'SCHED_' prefix.
from pavilion.status_file import STATES
from pavilion.status_file import TestStatusInfo

my_status = TestStatusInfo(
    STATES.SCHED_ERROR,     # Simply pass one of the valid scheduler state constants.
    "Cthulhu ate my test.")  # Along with a longer message describing the state.

Cancelling Runs

To write the ‘cancel()’ method, all you need to do is use the job id you saved when you kicked a test off. If there’s an error doing so, return a message why, otherwise simply return ‘None’ to denote success.

All the more complicated parts of cancelling are handled by functions that will wrap your method, so there really isn’t too much to worry about here. The Slurm plugin cancel command is a good example of how simple this can be.
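A minimal sketch of that contract: return None on success and a message on failure, never raising for expected problems. The 'id' key and the stand-alone cancel() signature are assumptions (the real method is defined on your plugin class), ‘scancel’ is Slurm's cancel command, and the dry run substitutes the Python interpreter for it.

```python
import subprocess
import sys
from typing import Dict, Optional, Sequence


def cancel(job_info: Dict[str, str],
           cancel_cmd: Sequence[str] = ('scancel',)) -> Optional[str]:
    """Cancel the job described by job_info.

    Returns None on success, or a message explaining why cancelling failed."""
    job_id = job_info.get('id')
    if not job_id:
        return "JobInfo has no job id to cancel."
    try:
        subprocess.run(list(cancel_cmd) + [job_id], stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT, check=True)
    except subprocess.CalledProcessError as err:
        return "Cancelling job {} failed (return code {}): {}".format(
            job_id, err.returncode, err.output.decode(errors='replace').strip())
    except OSError as err:
        return "Could not run the cancel command: {}".format(err)
    return None


# Dry run: the Python interpreter stands in for scancel and simply exits 0.
dry_run_result = cancel({'id': '12345'}, cancel_cmd=(sys.executable, '-c', 'pass'))
```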

Finding the Allocation Nodes

The _get_alloc_nodes() method needs to be overridden to find the list of nodes for a test’s allocation. This will always be called only from within the allocation - typically the scheduler sets an environment variable with this information.

Note that this may not always be called. If chunking is used, the scheduler plugin will know the exact list of allocation nodes before the test is kicked off.
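Under Slurm, for example, the allocation's nodes are in the SLURM_JOB_NODELIST environment variable, in a compressed form such as ‘n[001-003,007]’. The sketch below expands that form; it is a simplified parser that supports at most one bracket group per host pattern (Slurm's own ‘scontrol show hostnames’ handles the general case), and the function names are hypothetical.

```python
import os
import re
from typing import List, Mapping


def _split_top_level(hostlist: str) -> List[str]:
    """Split on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, ''
    for char in hostlist:
        if char == '[':
            depth += 1
        elif char == ']':
            depth -= 1
        if char == ',' and depth == 0:
            parts.append(current)
            current = ''
        else:
            current += char
    if current:
        parts.append(current)
    return parts


def expand_hostlist(hostlist: str) -> List[str]:
    """Expand a compressed hostlist such as 'n[001-003,007],login1'."""
    nodes = []
    for part in _split_top_level(hostlist):
        match = re.fullmatch(r'([^\[]*)\[([^\]]+)\](.*)', part)
        if match is None:
            nodes.append(part)  # Plain hostname, nothing to expand.
            continue
        prefix, ranges, suffix = match.groups()
        for rng in ranges.split(','):
            start, _, end = rng.partition('-')
            width = len(start)  # Keep zero padding, e.g. '001'.
            for num in range(int(start), int(end or start) + 1):
                nodes.append(prefix + str(num).zfill(width) + suffix)
    return nodes


def get_alloc_nodes(environ: Mapping[str, str] = os.environ,
                    var: str = 'SLURM_JOB_NODELIST') -> List[str]:
    """Return the nodes of the current allocation from the environment."""
    hostlist = environ.get(var)
    if not hostlist:
        # A real plugin would raise SchedulerPluginError here.
        raise RuntimeError("{} is not set; are we inside an allocation?".format(var))
    return expand_hostlist(hostlist)
```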

Scheduler Availability

The ‘available()’ method simply tells Pavilion if the scheduler is available to run jobs on the given system. It’s not a measure of operability, simply a True/False value saying whether the basic commands (or API modules) needed to use the plugin exist.
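Since availability is just an existence check, it usually reduces to one of the two helpers sketched below: one for command-line tools, one for Python API modules. The default command names assume a Slurm-like scheduler, and both function names are hypothetical.

```python
import importlib.util
import shutil
from typing import Iterable


def commands_available(required: Iterable[str] = ('sbatch', 'squeue', 'scancel')) -> bool:
    """True if every command the plugin depends on can be found in PATH."""
    return all(shutil.which(cmd) is not None for cmd in required)


def module_available(name: str) -> bool:
    """True if a Python module (e.g. a scheduler API binding) can be imported."""
    return importlib.util.find_spec(name) is not None
```

Neither helper contacts the scheduler, which matches the intent that available() reports whether the plugin can be used at all, not whether the scheduler is currently healthy.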

Advanced Scheduler Methods

If you’re trying to write an advanced scheduler plugin using the ‘SchedulerPluginAdvanced’ parent class, there are a couple more methods to override. These are:

  • _get_raw_node_data() - A method to gather raw information on the cluster’s nodes.

  • _transform_raw_node_data() - A method that translates that same data into a dictionary of information about each node.

For information on overriding each of these, refer to the doc strings for each as defined in the ‘pavilion.schedulers.advanced.SchedulerPluginAdvanced’ class. They will tell you everything you need to know about how to write those methods.

The purpose of these methods is to provide Pavilion with the information it needs to make decisions about what nodes to schedule on itself, rather than relying on the scheduler to do so. This allows Pavilion to partition the system in ways that the scheduler might not support on its own. These include the ability to specify ‘all’ as the number of nodes requested, and the ability to chunk the system into multiple, evenly sized pieces.

The downside is that the per-node information must be perfectly accurate or jobs may be rejected by the scheduler (such as when improperly requesting nodes not in the selected partition) or simply wait in the queue forever (such as when selecting nodes that are down).
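To illustrate the chunking idea, here is one simple policy: split the usable nodes, in the order given, into chunks of exactly chunk_size and discard any leftovers. This is an illustrative sketch, not Pavilion's chunking implementation, which has its own settings for chunk size and leftover nodes; chunk_nodes is a hypothetical name.

```python
from typing import List, Sequence


def chunk_nodes(nodes: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split nodes, in the order given, into chunks of exactly chunk_size.

    Leftover nodes that can't fill a whole chunk are discarded."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks = len(nodes) // chunk_size
    return [list(nodes[i * chunk_size:(i + 1) * chunk_size])
            for i in range(full_chunks)]
```

This only works if the node list fed into it is accurate: a single down node inside a chunk can leave that chunk's job waiting in the queue.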

Scheduler Variables

The second part of creating a scheduler plugin is adding a set of variables that test configs can use to manipulate their test. The vast majority of these are automatically derived from the information you gathered about the nodes for Advanced scheduler plugins or via the schedule.cluster_info test configuration information for Basic scheduler plugins.

Pavilion provides a framework for creating these variables: the pavilion.schedulers.vars.SchedulerVariables class. By inheriting from this class, you can define scheduler variables simply by adding decorated methods to your child class. The decorators do most of the hard work; you simply have to create and return the value. The class itself provides good documentation on how to do this.

The most important variable in all of these is the test_cmd variable, which is probably the only variable that will need to be customized for your scheduler plugin. It provides tests with an MPI startup command, such as mpirun, with arguments automatically set according to the test’s settings. Pavilion tests generally use this variable to prefix their MPI runs when writing their run scripts:

test_cmd_example:

  scheduler: slurm
  schedule:
    nodes: 32

  run:
    cmds:
      - '{{test_cmd}} ./my_mpi_cmd'

How to write a test_cmd variable is documented in the SchedulerVariables.test_cmd() method’s doc string.
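The core of a test_cmd value is composing a launcher prefix from the test's allocation settings. The sketch below shows that composition for srun-style flags (-N for the node count, -n for the total task count), quoting each piece for the shell. build_test_cmd and its parameters are hypothetical; the real variable is a decorated SchedulerVariables method whose inputs are described in its docstring.

```python
import shlex
from typing import List, Optional


def build_test_cmd(launcher: str, nodes: int, tasks_per_node: int,
                   extra_args: Optional[List[str]] = None) -> str:
    """Build an srun-style MPI launch prefix such as 'srun -N 2 -n 8'."""
    args = [launcher, '-N', str(nodes), '-n', str(nodes * tasks_per_node)]
    args.extend(extra_args or [])
    # Quote each piece so arguments with spaces survive the shell in run.sh.
    return ' '.join(shlex.quote(arg) for arg in args)
```

For the 32-node example above with, say, 4 tasks per node, this would yield ‘srun -N 32 -n 128’ as the prefix for ./my_mpi_cmd. Other launchers use different flags (for example, Open MPI's mpirun uses -N for processes per node), so the flag mapping must match your scheduler.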

Adding the Scheduler Vars to the Scheduler Plugin

To add your scheduler variable class to your scheduler plugin, simply set the variable class as the VAR_CLASS attribute on your scheduler.

from pavilion import schedulers

class MyVarClass(schedulers.SchedulerVariables):
    """Your scheduler variable class."""

class MySchedPlugin(schedulers.SchedulerPlugin):
    VAR_CLASS = MyVarClass