Setup
Digichem reads configuration information from two main locations. The main configuration file is located in your home directory at ~/.config/digichem{VERSION}/digichem.yaml, where {VERSION} is the major Digichem version. For example, for all versions of Digichem 6, the main config file is located at ~/.config/digichem6/digichem.yaml.
Meanwhile, information about the available computational engines and server queues is stored in subdirectories in the same folder as the main config file. Engines are stored in the Programs folder, while queues are stored in the Destinations folder.
To change the settings of Digichem for just a single user (yourself), edit these files:
~/.config/digichem6/digichem.yaml
~/.config/digichem6/Destinations
~/.config/digichem6/Programs
For a multi-user installation, changes affecting all users can instead (or additionally) be made to these files:
/etc/digichem6/digichem.yaml
/etc/digichem6/Destinations
/etc/digichem6/Programs
Before you can use Digichem to submit calculations for the first time, you need to tell it which computational engines and server queues are available on your cluster. You'll also want to set up a rendering engine (VMD) to generate 3D molecular orbital plots, and optionally links to external programs to generate certain types of reports.
Automated setup
The digichem setup program will automatically search your server for available server queues, computational programs, and rendering engines (VMD installations). The search is normally extremely fast. Server queues are queried using SLURM's sinfo and PBS's qstat commands, which are typically instantaneous. Computational programs are searched for using the locate command, which relies on an internal database to find files instead of trawling through the filesystem. If locate is not available on your system (or is otherwise unable to find anything), then Digichem will fall back to a traditional filesystem search with the find command. For very large filesystems, especially those distributed over a network (NFS etc.), find may be too slow, but you can drastically increase the speed of the search by specifying file hints to start searching from (see below).
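If you want to check in advance whether locate is available (and refresh its database), you can do so from the shell. A minimal sketch, assuming a standard locate/updatedb installation (updatedb typically requires root):
$ command -v locate || echo "locate is not installed"
$ sudo updatedb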
In most cases, you can set up everything Digichem needs by simply calling:
$ digichem setup
See the installation walkthrough for example output from the program.
By default, setup will write new configuration files to ~/.config/digichem6/; these will apply to the current user only. If you wish to instead run the setup for multiple users, use the -o option to specify a different output location:
$ digichem setup -o /etc/digichem6
Digichem will normally refuse to overwrite previously written setup files. If you attempt to run digichem setup again, you'll see an error similar to this:
$ digichem setup
digichem: ERROR: Output directory '/home/oliver/.config/digichem6/Programs/SETUP' is not empty! Refusing to overwrite existing files. Try 'digichem setup --force' if you're sure.
digichem: ERROR: Output directory '/home/oliver/.config/digichem6/Destinations/SETUP' is not empty! Refusing to overwrite existing files. Try 'digichem setup --force' if you're sure.
digichem: ERROR: stopped with error
Digichem_exception: Refusing to overwrite existing config files
If you are sure you want to run the setup procedure, specify the --force option:
$ digichem setup --force
Digichem will write the new files to a SETUP directory in the given output location. If the default location is used, these files can be found at:
~/.config/digichem6/Destinations/SETUP/
~/.config/digichem6/Programs/SETUP/
You are free to edit and/or remove the config files afterwards as you see fit.
Specifying file hints
On most systems, the default command of digichem setup is all you need. However, if your file system is particularly large and/or slow, and the fast search program locate is unavailable, then the filesystem search performed with find might take a long time. In this case, you can increase the speed of the setup program by specifying hints for where to look for computational programs. Hints are given after the digichem setup command. For instance, to limit the search to the /opt and /usr directories:
$ digichem setup /opt /usr
The file hints can be as specific as you need. If you know the exact location of the programs on your cluster, you can give these paths directly to digichem setup. For example, if you have two Gaussian installations at /opt/software/g16 and /opt/software/g09, an Orca installation at /opt/software/ORCA, and a VMD installation at /usr/local/bin/vmd (which is the default for VMD):
$ digichem setup /opt/software/g16 /opt/software/g09 /opt/software/ORCA /usr/local/bin/vmd
Note that the order of the file hints does not matter.
Manual setup
External programs
VMD
The options to control VMD are set in ~/.config/digichem6/digichem.yaml, under the render heading. Three options are required:
render:
    engine: vmd
    vmd:
        executable: vmd
        tachyon: tachyon
executable is the path to the main VMD executable, while tachyon is the path to the Tachyon ray tracer library. Tachyon is normally included in the VMD installation itself.
If both vmd and tachyon are in your PATH, then the above configuration is all you need. Otherwise, set the executable and tachyon options to the appropriate file paths. For example:
render:
    engine: vmd
    vmd:
        executable: /usr/local/bin/vmd
        tachyon: /usr/local/lib/vmd/tachyon_LINUXAMD64
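If you are unsure where Tachyon was installed, a quick search inside the VMD installation directory usually finds it. A sketch using the common default location from the example above:
$ find /usr/local/lib/vmd -name 'tachyon*'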
Gaussian utilities
Note
These options are deprecated and will likely be removed in a future version.
To generate reports from Gaussian calculations, Digichem needs to know the location of both the formchk and cubegen utilities. These can be set in ~/.config/digichem6/digichem.yaml, under the external heading:
external:
    formchk: /opt/gaussian/g16/formchk
    cubegen: /opt/gaussian/g16/cubegen
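If the Gaussian environment is already initialised in your shell, you can discover the full paths to both utilities with a standard shell lookup (this assumes formchk and cubegen are on your PATH):
$ command -v formchk cubegen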
Default report programs
Note
These options are deprecated and will likely be removed in a future version.
When generating a report at the end of a calculation, Digichem will use the same program definition that ran the calculation to perform any requested post analysis (natural transition orbitals, difference density etc.). However, when a report is generated manually, Digichem needs to know which program definition to use. These can be set in ~/.config/digichem6/digichem.yaml, under the report heading:
report:
    turbomole:
        program: Turbomole
    gaussian:
        program: Gaussian 16
        scratch_path: /tmp
    orca:
        program: Orca
Each program option corresponds to the name of a program, as set during setup. For example, if a Gaussian engine was set up with the following options:
link:
    tag: Gaussian 16
    ...
Then the following option would be set:
report:
    gaussian:
        program: Gaussian 16
Computational engines
The installation script will create an example config file for each of the supported calculation engines in ~/.config/digichem6/Programs.
$ ls ~/.config/digichem6/Programs
Gaussian.yaml.off Orca.yaml.off Turbomole.yaml.off
To activate a config, first rename the file to remove the .off suffix. For example, to set up a Gaussian backend, rename Gaussian.yaml.off to Gaussian.yaml.
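From the shell, assuming the default single-user config location:
$ mv ~/.config/digichem6/Programs/Gaussian.yaml.off ~/.config/digichem6/Programs/Gaussian.yaml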
The file can then be edited according to the guide below:
Gaussian
An example Gaussian 16 config file looks like this:
link:
    tag: Gaussian 16
    previous:
        - Program Base
    # The top config for the calculations we support.
    next:
        - Gaussian Auto
        - [Calculation Base, Gaussian]
# The name of the program this config is for. These are pre-built into Digichem.
meta:
    class_name: gaussian
# 'Path' to the Gaussian executable.
executable: g16
# The name of the environmental variable Gaussian looks for to find 'root'.
# This changes between versions, for Gaussian 16 it is g16root.
# For Gaussian 09 it is g09root etc.
root_environ_name: g16root
# Path to the directory where gaussian is installed.
root: /software/g16
# Path to the gaussian .profile script which is run to set-up gaussian.
init_file: /software/g16/bsd/g16.profile
To set up Gaussian, change the root and init_file options to point to the location where Gaussian is installed. root is the main Gaussian installation folder: if the g16 binary is located at /software/g16/g16, then root should be set to /software/g16. init_file is the script used to initialise Gaussian; it is normally located at bsd/g16.profile (although the name will change appropriately for other versions, e.g. g09.profile). In the same example, the init script would be located at /software/g16/bsd/g16.profile.
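Before saving the config, it is worth confirming that both paths actually exist on your cluster. A quick check using the example locations above:
$ ls /software/g16/g16 /software/g16/bsd/g16.profile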
Note
Gaussian includes two setup scripts: one for SH-like shells (ending in .profile) and one for CSH-like shells (ending in .login). Make sure to use the SH setup script (.profile), not the CSH one (.login)!
For other versions of Gaussian, the root_environ_name and executable options should also be changed appropriately:

For Gaussian 16 (g16): root_environ_name: g16root, executable: g16
For Gaussian 09 (g09): root_environ_name: g09root, executable: g09
For Gaussian 03 (g03): root_environ_name: g03root, executable: g03
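For instance, a Gaussian 09 config would differ from the Gaussian 16 example above only in these options (the /software/g09 paths here are purely illustrative):
# 'Path' to the Gaussian 09 executable.
executable: g09
root_environ_name: g09root
# Illustrative paths; use the actual install location on your cluster.
root: /software/g09
init_file: /software/g09/bsd/g09.profile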
Turbomole
An example Turbomole config file looks like this:
# Config for a Turbomole submission
link:
    tag: Turbomole
    previous:
        - Program Base
    # The top config for the calculations we support.
    next:
        - Turbomole Auto
        - [Calculation Base, Turbomole]
# The name of the program this config is for. These are pre-built into Digichem.
meta:
    class_name: Turbomole
# Path to the main folder in which Turbomole is installed.
# Inside this folder the 'bin' folder can be located.
root: /software/TURBOMOLE/
# Path to the Turbomole init file (for SH compliant shells, not CSH).
# This file is normally called 'Config_turbo_env'.
init_file: /software/TURBOMOLE/Config_turbo_env
To set up Turbomole, change the root and init_file options to point to the location where Turbomole is installed. root is the main Turbomole installation folder: if the bin folder is located at /software/TURBOMOLE/bin, then root should be set to /software/TURBOMOLE. init_file is the script used to initialise Turbomole; it is normally located directly inside the main Turbomole directory and is called Config_turbo_env. In the same example, the init script would be located at /software/TURBOMOLE/Config_turbo_env.
Note
Turbomole also includes two setup scripts: one for SH-like shells (Config_turbo_env) and one for CSH-like shells (Config_turbo_env.csh). Make sure to use the SH setup script (Config_turbo_env), not the CSH one (Config_turbo_env.csh)!
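To see which setup scripts your installation actually provides, you can list them using the example path from above:
$ ls /software/TURBOMOLE/Config_turbo_env*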
Orca
An example Orca config file looks like this:
# Config for Orca
link:
    tag: Orca
    previous:
        - Program Base
    # The top config for the calculations we support.
    next:
        - Orca Auto Characterisation
        - [Calculation Base, Orca]
# The name of the program this config is for. These are pre-built into Digichem.
meta:
    class_name: Orca
# Path to the main Orca executable.
executable: /software/ORCA/orca
# Path to the main folder in which Orca is installed.
# The main Orca executable is typically located inside this directory.
root: /software/ORCA
# Required for parallel execution, the path to the root directory of your MPI installation.
# Inside this root directory can be found the 'bin' and 'lib' directories.
mpi_root: /software/openmpi/
Each Orca installation is actually made up of two parts: 1) the Orca installation itself, and 2) OpenMPI, which is needed to run calculations in parallel. OpenMPI is sometimes installed inside the Orca installation folder, and sometimes it is separate.
To set up Orca, the executable and root options must be set appropriately. executable is the full, absolute path to the main Orca binary, while root is the main Orca installation directory. executable is normally immediately inside root. For example, if the Orca binary is located at /software/ORCA/orca, then root should be set to /software/ORCA.
Note
Unlike the other computational engines, the executable option for Orca must include the full path, not just the executable name.
The path to the OpenMPI installation is set using the mpi_root option. This is technically optional, but strongly recommended, as otherwise Orca calculations will be limited to a single CPU. mpi_root should point to the main OpenMPI directory, inside of which are the bin and lib sub-directories. For example, if mpirun is located at /software/openmpi/bin/mpirun, then mpi_root should be set to /software/openmpi.
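If mpirun is already on your PATH, you can derive a suitable mpi_root from its location. A shell sketch, assuming the conventional bin/ layout:
$ command -v mpirun
/software/openmpi/bin/mpirun
$ dirname "$(dirname "$(command -v mpirun)")"
/software/openmpi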
Server queues
The installation script will also create example config files for both SLURM and PBS. These can be found in ~/.config/digichem6/Destinations.
$ ls ~/.config/digichem6/Destinations
PBS.yaml.off SLURM.yaml.off
These configs can be activated in the same way as the calculation engines, by removing the .off suffix.
PBS
An example PBS config file looks like this:
link:
    tag: PBS
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: PBS
# The queue/partition name.
# Leave commented if your PBS chooses a queue automatically.
#queue: Main Queue
# The maximum job time (dd-hh:mm:ss)
#time: 7-00:00:00
The name of the PBS queue can be set using the queue option. If your cluster automatically fills different queues based on the requested resources, this option can be left unset.
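If you are unsure which queues your cluster defines, PBS can list them directly:
$ qstat -Q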
The maximum allowed job execution time can be set using the time option, in dd-hh:mm:ss format. Some example time formats are:

7 days: 7-00:00:00
24 hours: 24:00:00
30 minutes: 00:30:00
SLURM
An example SLURM config file looks like this:
link:
    tag: SLURM
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
# The queue/partition name.
# Leave commented if your SLURM chooses a queue automatically.
#queue: Main Queue
# The maximum job time (dd-hh:mm:ss)
#time: 7-00:00:00
# Uncomment to set up auto mail.
#mail:
#    events: [BEGIN, END, FAIL]
#    # Where to send the mail (user@host)
#    # If not set, the default is to send to the current user.
#    user: user
#    host: host
The options for SLURM and PBS are nearly identical, and both queue and time have the same meaning for each.
Note
In SLURM, queues are often referred to as 'partitions', but these are the same thing. The partition name can be set using the queue option, or left blank to let SLURM pick automatically (if configured).
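To see which partitions are available on your cluster, you can query SLURM directly:
$ sinfo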
SLURM additionally supports sending automated emails when certain calculation milestones are hit. To enable this functionality, uncomment the mail option and set events to the milestones you'd like to be notified of. See the SLURM manual for a full list of supported options.
By default, the email will be sent to your account on the cluster. To change this, set user and host appropriately. For example, if your email is joe.blogs@mail.com, set:
user: joe.blogs
host: mail.com
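Putting this together, the fully uncommented mail section for that address would look like:
mail:
    events: [BEGIN, END, FAIL]
    user: joe.blogs
    host: mail.com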
Multiple Queues
Often, it is desirable to set up multiple queues to pick from. This is particularly useful if the cluster has more than one partition/queue and does not support automatic queue assignment, and/or to optimise wait times by reducing the maximum job duration for shorter calculations.
Additional queues can be specified by writing additional files to the Destinations folder, or they can be added to the existing example config files. Multiple queues are separated in the same file by three hyphens (---), for example:
link:
    tag: SLURM 1
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
---
link:
    tag: SLURM 2
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
Important
Each queue must be given a unique tag, otherwise Digichem will produce an error.
Note
Configs for both SLURM and PBS can be included in the same file; the file name is only used for your reference.
In this example config, three SLURM queues are defined, one with a maximum job time of 24h, one with 48h, and one with 72h.
link:
    tag: SLURM 24h
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
# The maximum job time (dd-hh:mm:ss)
time: 24:00:00
---
link:
    tag: SLURM 48h
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
# The maximum job time (dd-hh:mm:ss)
time: 48:00:00
---
link:
    tag: SLURM 72h
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
# The maximum job time (dd-hh:mm:ss)
time: 72:00:00
Meanwhile, in this example, three different PBS queues are exposed (each with a different queue name):
link:
    tag: PBS Normal
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: PBS
queue: standard
---
link:
    tag: PBS Slow
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: PBS
queue: long
---
link:
    tag: PBS Testing
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: PBS
queue: debug
Array Submission
By default, each Digichem calculation is submitted as a separate job. This is simpler and easier to manage, but some computational clusters discourage submitting large numbers of individual jobs simultaneously for performance reasons.
As an alternative, Digichem supports array submission for both PBS and SLURM. When submitting multiple molecules in this mode, only a single job is submitted to the scheduler, with each molecule being represented as an individual array task in the larger array.
To switch from single job submission to array submission, change the meta: class_name option to either PBS-Array or SLURM-Array, for PBS or SLURM respectively. All other options function the same as for single job submission.
For example, this config exposes both single job submission and array submission to SLURM:
link:
    tag: SLURM
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM
queue: standard
---
link:
    tag: SLURM Array
    previous:
        - Destination Base
    next:
        - Program Base
meta:
    class_name: SLURM-Array
queue: standard