RELEASE NOTES FOR SLURM VERSION 1.3
27 June 2008


IMPORTANT NOTE:
SLURM state files in version 1.3 are different from those of version 1.2.
After installing SLURM version 1.3, plan to restart without preserving 
jobs or other state information. While SLURM version 1.2 is still running, 
cancel all pending and running jobs (e.g.
"scancel --state=pending; scancel --state=running"). Then stop and restart 
daemons with the "-c" option or use "/etc/init.d/slurm startclean".

There are substantial changes in the slurm.conf configuration file. It 
is recommended that you rebuild your configuration file using the tool
doc/html/configurator.html that comes with the distribution. The node 
information is unchanged and the partition information only changes for 
the Shared and Priority parameters, so those portions of your old 
slurm.conf file may be copied into the new file.

Two areas of substantial change are accounting and job scheduling.
Slurm is now able to save accounting information in a database, 
either MySQL or PostGreSQL. We have written a new daemon, slurmdbd 
(Slurm DataBase Daemon), to serve as a centralized data manager for 
multiple Slurm clusters. A new tool, sacctmgr, is available to manage
user accounting information through SlurmDBD, and a variety of 
other tools are still under development to generate assorted 
accounting reports, including graphics and a web interface. Slurm 
now supports gang scheduling (time-slicing of parallel jobs for 
improved responsiveness and system utilization). Many related 
scheduling changes have also been made. 

There are changes in SLURM's RPMs. "slurm-auth-munge" was changed to 
"slurm-munge" since it now contains the authentication and cryptographic 
signature plugins for Munge. The SLURM plugins have been moved out of 
the main "slurm" RPM to a new RPM called "slurm-plugins". There is a 
new RPM called "slurm-slurmdbd" (SLURM DataBase Daemon). Slurmdbd is 
used to provide a secure SLURM database interface for accounting purposes 
(more information about that below). The "slurm-slurmdbd" RPM requires 
the "slurm-plugins" RPM, but none of the other SLURM RPMs. The main 
"slurm" RPM also requires the "slurm-plugins" RPM.

To archive accounting records in a database, the database RPMs must be 
installed both where the SLURM RPMs are built and where the database is used. 
You have a choice of database: either the "mysql" plus "mysql-devel" RPMs 
or the "postgres" plus "postgres-devel" RPMs.

Many enhancements have been made for better Slurm integration with 
Moab and Maui schedulers. Moab version 5.2.3 or higher should be 
used with SLURM version 1.3. In the Moab configuration file, moab.cfg,
change the SUBMITCMD option from "srun --batch" to "sbatch" since the 
"srun --batch" option is no longer valid (use of full pathnames to 
the commands is recommended, e.g. "/usr/local/bin/sbatch").
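For example, the relevant moab.cfg line would change roughly as follows 
(a minimal sketch using only the option named above; the surrounding Moab 
configuration syntax is unchanged):
  Old:  SUBMITCMD=/usr/local/bin/srun --batch
  New:  SUBMITCMD=/usr/local/bin/sbatch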

Major changes in Slurm version 1.3 are described below. Some changes
made after the initial release of Slurm version 1.2 are also noted.
Many less significant changes are not identified here. A complete list 
of changes can be found in the NEWS file. Man pages should be consulted
for more details about command and configuration parameter changes.


COMMAND CHANGES
===============
* The srun options --allocate, --attach and --batch have been removed.
  Use the new commands added in SLURM version 1.2 for this functionality:
  salloc  - Create a job allocation (functions like "srun --allocate")
  sattach - Attach to an existing job step (functions like "srun --attach")
  sbatch  - Submit a batch job script (functions like "srun --batch")
  These commands generally have the same options as the srun command.
  See the individual man pages for more information. 
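  For example, former srun invocations map to the new commands roughly as
  follows (a hedged sketch; consult the man pages for complete option lists):
    srun --batch myscript.sh   ->  sbatch myscript.sh
    srun --allocate -N4        ->  salloc -N4
    srun --attach=<jobid>      ->  sattach <jobid>.<stepid>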

* The slaunch command has been removed. Use the srun command instead.

* The srun option --exclusive has been added for job steps to be 
  allocated processors not already assigned to other job steps. This 
  can be used to execute multiple job steps simultaneously within a 
  job allocation and have SLURM perform resource management for the 
  job steps much like it does for jobs. If dedicated resources are 
  not immediately available, the job step will be executed later 
  unless the --immediate option is also set.
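  For example, a batch script might run two steps side by side within its
  allocation and let SLURM manage the step resources (a minimal sketch;
  program names are illustrative):
    #!/bin/sh
    srun --exclusive -n2 ./app_one &
    srun --exclusive -n2 ./app_two &
    wait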

* Support is now provided for feature counts in job constraints. For 
  example: srun --nodes=16 --constraint=graphics*4 ...

* The srun option --pty has been added to start the job with a pseudo 
  terminal attached to task zero (all other tasks have I/O discarded).

* Job time limits can be specified using the following formats: min, 
  min:sec, hour:min:sec, and days-hour:min:sec (formerly only minutes were 
  supported).
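  For example, using the --time option (accepted by srun, salloc and sbatch):
    --time=30          (30 minutes)
    --time=45:30       (45 minutes, 30 seconds)
    --time=2:00:00     (2 hours)
    --time=1-12:00:00  (1 day, 12 hours)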

* scontrol now shows job TimeLimit and partition MaxTime in the format of
  [days-]hours:minutes:seconds or "UNLIMITED". The scontrol update options 
  for times now accept minutes, minutes:seconds, hours:minutes:seconds, 
  days-hours, days-hours:minutes, days-hours:minutes:seconds or "UNLIMITED".
  This new format also applies to partition MaxTime in the slurm.conf file.

* scontrol "notify" command added to send message to stdout of srun for 
  specified job id. 
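  For example (a hedged sketch of the syntax; the job id and message text
  are illustrative):
    scontrol notify 1234 "input staging complete"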

* Support has been added for a much richer job dependency specification 
  including testing of exit codes and multiple dependencies.
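  For example (a hedged sketch; see the --dependency description in the
  sbatch man page for the complete syntax):
    sbatch --dependency=afterok:1234 post_process.sh
    sbatch --dependency=afterany:1234:1235 cleanup.sh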

* The srun options --checkpoint=<interval> and --checkpoint-path=<file_path>
  have been added.

* Event trigger support was added in Slurm v1.2.2. The command strigger
  was added to manage the triggers.

* Added a --task-mem option and removed --job-mem option from srun, salloc, 
  and sbatch commands. Memory limits are applied on a per-task basis.
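  For example (the memory value is assumed here to be in megabytes):
    srun --task-mem=1024 -n16 ./my_app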


SCHEDULING CHANGES
==================
* The sched/backfill plugin has been largely re-written. It now supports 
  select/cons_res and all job options (required nodes, excluded nodes, 
  contiguous, etc.).

* Added a new partition parameter, Priority. A job's scheduling priority is 
  based upon two factors: first, the priority of its partition and, second, 
  the job's own priority. Since nodes can be configured in multiple 
  partitions, this 
  can be used to configure high priority partitions (queues).
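  For example, the same nodes might appear in both a normal and a high
  priority partition (a minimal slurm.conf sketch; names are illustrative):
    PartitionName=batch Nodes=tux[0-31] Priority=1
    PartitionName=rush  Nodes=tux[0-31] Priority=10 MaxTime=60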

* The partition parameter Shared now has a job count. For example:
  Shared=YES:4     (Up to 4 jobs may share each resource, user control)
  Shared=FORCE:2   (Up to 2 jobs may share each resource, no user control)

* Added new parameters DefMemPerTask and MaxMemPerTask to control the default
  and maximum memory per task. Any task that exceeds the specified size will 
  be terminated (enforcement requires job accounting to be enabled with a 
  non-zero value for JobAcctGatherFrequency).
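  For example (a minimal sketch; the memory values are assumed to be in
  megabytes):
    DefMemPerTask=1024
    MaxMemPerTask=2048
    JobAcctGatherFrequency=30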

* The select linear plugin (allocating whole nodes to jobs) can treat memory 
  as a consumable resource with SelectTypeParameter=CR_Memory configured.

* A new scheduler type, gang, was added for gang scheduling (time-slicing of 
  parallel jobs). Note: The Slurm gang scheduler is not compatible with the
  LSF, Maui or Moab schedulers.

* The new parameter, SchedulerTimeSlice, controls the length of gang scheduler 
  time slices.
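  For example, gang scheduling might be enabled with (a minimal sketch;
  sched/gang is assumed as the plugin name and the time slice is assumed
  to be in seconds):
    SchedulerType=sched/gang
    SchedulerTimeSlice=30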

* Added a new parameter, Licenses, to support cluster-wide consumable 
  resources. The --licenses option was also added to salloc, sbatch, 
  and srun.
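  For example (a hedged sketch; the name*count form and the names are
  illustrative):
    In slurm.conf:    Licenses=matlab*10,fluent*20
    At submit time:   sbatch --licenses=matlab*2 myscript.sh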

* The Shared=exclusive option in conjunction with SelectType=select/cons_res
  can be used to dedicate whole nodes to jobs in specific partitions while
  allocating sockets, cores, or hyperthreads in other partitions.

* Changes in the interface with the Moab and Maui schedulers have been 
  extensive, providing far better integration between the systems.
  * Many more parameters are shared between the systems.
  * A new wiki.conf parameter, ExcludePartitions, can be used to enable 
    Slurm-based scheduling of jobs in specific partitions to achieve
    better responsiveness while losing Moab or Maui policy controls.
  * Another new wiki.conf parameter, HidePartitionJobs, can be used to 
    hide jobs in specific partitions from Moab or Maui as well. See
    the wiki.conf man page for details. 
  * Moab relies upon Slurm to get a user's environment variables upon 
    job submission. If this can not be accomplished within a few seconds 
    (see the GetEnvTimeout parameter) then cache files can be used. Use
    contribs/env_cache_builder.c to build these cache files.


ACCOUNTING CHANGES
==================
* The job accounting plugin has been split into two components: gathering
  of data and storing the data. The JobAcctType parameter has been replaced by
  JobAcctGatherType (AIX or Linux) and AccountingStorageType (MySQL, PostGreSQL,
  filetext, and SlurmDBD). Storing the accounting information into a database
  will provide you with greater flexibility in managing the data.
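  For example, a Linux cluster storing its accounting data through SlurmDBD
  might use the following slurm.conf settings (a minimal sketch):
    JobAcctGatherType=jobacct_gather/linux
    JobAcctGatherFrequency=30
    AccountingStorageType=accounting_storage/slurmdbd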

* A new daemon SlurmDBD (Slurm DataBase Daemon) has been added. This can 
  be used to securely manage the accounting data for several Slurm clusters
  in a central location. Several new parameters have been added to support
  SlurmDBD, all starting with SlurmDBD. Note that the SlurmDBD daemon itself 
  relies upon a Slurm accounting storage plugin, currently MySQL, to store 
  the accounting data. It also uses existing Slurm authentication plugins.

* A new command, sacctmgr, has been added for managing user accounts in
  SlurmDBD. This information is required for use of SlurmDBD
  to manage job accounting data. Information is maintained based upon 
  an "association", which has four components: cluster name, Slurm partition, 
  user name and bank account. This tool can also be used to maintain 
  scheduling policy information that can be uploaded to Moab (various 
  resource limits and fair-share values). See the sacctmgr man page and 
  accounting web page for more information. Additional tools to generate 
  accounting reports are currently under development and will be released 
  soon.
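  For example, a basic association might be created as follows (a hedged
  sketch; the cluster, account and user names are illustrative):
    sacctmgr add cluster tux
    sacctmgr add account chemistry cluster=tux
    sacctmgr add user brian account=chemistry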

* A new command, sreport, is available for generating accounting reports.
  While the sacct command can be used to generate information about 
  individual jobs, sreport can combine this data to report utilization 
  information by cluster, bank account, user, etc. 
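  For example (a hedged sketch; see the sreport man page for the report
  types and time formats):
    sreport cluster utilization start=2008-06-01 end=2008-07-01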

* Job completion records can now be written to a MySQL or PostGreSQL
  database in addition to a text file as controlled using the JobCompType
  parameter.


OTHER CONFIGURATION CHANGES
===========================
* A new parameter, JobRequeue, controls the default job behavior after a node 
  failure (requeue or kill the job). The sbatch --requeue option can be used to
  override the system default.

* Added new parameters HealthCheckInterval and HealthCheckProgram to 
  automatically test the health of compute nodes.
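  For example (a minimal sketch; the interval is assumed to be in seconds
  and the program path is illustrative):
    HealthCheckProgram=/usr/sbin/node_health_check
    HealthCheckInterval=300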

* New parameters UnkillableStepProgram and UnkillableStepTimeout offer
  better control when user processes can not be killed. For example, 
  nodes can be automatically rebooted (added in Slurm v1.2.12).

* A new parameter, JobFileAppend, controls how to proceed when a job's
  output or error file already exists (truncate the file or append to it, 
  added in slurm v1.2.13). Users can override this using the --open-mode
  option when submitting a job.
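  For example (assuming "append" as one of the option's values):
    sbatch --open-mode=append myscript.sh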

* A new parameter, EnforcePartLimits, was added. If set, a job that exceeds 
  a partition's size and/or time limits is rejected immediately rather 
  than queued for a later change in the partition's limits. NOTE: Not 
  reported by "scontrol show config" to avoid changing RPCs. It will be 
  reported in SLURM version 1.4.

* Checkpoint plugins have been added for XLCH and OpenMPI.

* A new parameter, PrivateData, can be used to prevent users from being 
  able to view jobs or job steps belonging to other users.

* A new parameter, CryptoType, specifies the digital signature plugin to be 
  used. Options are crypto/openssl (default) or crypto/munge (for a GPL license).

* Several Slurm MPI plugins were added to support srun launch of MPI tasks
  including mpich1_p4 (Slurm v1.2.10) and mpich-mx (Slurm v1.2.11). 

* Cpuset logic was added to the task/affinity plugin in Slurm v1.2.3. 
  Set TaskPluginParam=cpusets to enable.


OTHER CHANGES
=============
* Perl APIs and Torque wrappers for Torque/PBS to SLURM migration were 
  added in Slurm v1.2.13 in the contribs directory. SLURM now works 
  directly with Globus using the PBS GRAM.

* Support was added for several additional PMI functions to be used by 
  MPICH2 and MVAPICH2. Support for a PMI_TIME environment variable was
  also added for users to control how PMI communications are spread out 
  in time. Scalability up to 16k tasks has been achieved. 

* New node state FAILING has been added along with an event trigger for it.
  This is similar to DRAINING, but is intended for fault prediction work.
  A trigger was also added for nodes becoming DRAINED.

