.. _setup:

Setup
=====

Digichem reads configuration information from two main locations. The main configuration file is located in your home directory at ``~/.config/digichem{VERSION}/digichem.yaml``, where ``{VERSION}`` is the major Digichem version. For example, for all versions of Digichem |major_version|, the main config file is located at: ``~/.config/digichem|major_version|/digichem.yaml``.

Meanwhile, information about the available computational engines and server queues is stored in subdirectories of the same folder as the main config file. Engines are stored in the ``Programs`` folder, while queues are stored in the ``Destinations`` folder.

To change the settings of Digichem for just a single user (yourself), edit these files:

- ``~/.config/digichem|major_version|/digichem.yaml``
- ``~/.config/digichem|major_version|/Destinations``
- ``~/.config/digichem|major_version|/Programs``

For a multi-user installation, changes can be made to these files instead (or in addition) to affect all users:

- ``/etc/digichem|major_version|/digichem.yaml``
- ``/etc/digichem|major_version|/Destinations``
- ``/etc/digichem|major_version|/Programs``
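For orientation, a typical single-user configuration folder contains just these three items (an illustrative listing; the exact contents depend on what has been set up so far):

.. code-block:: console

    $ ls ~/.config/digichem|major_version|/
    Destinations  Programs  digichem.yaml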
Before you can use Digichem to submit calculations for the first time, you need to tell it what :ref:`computational engines <setup_programs>` and :ref:`server queues <setup_destinations>` are available on your cluster. You'll also want to set up a rendering engine (:ref:`VMD <setup_vmd>`) to generate 3D molecular orbital plots, and optionally links to external programs to generate certain types of reports.

Automated setup
_______________

The ``digichem setup`` program will automatically search your server for available server queues, computational programs, and rendering engines (VMD installations). The search is normally extremely fast. Server queues are queried using SLURM's ``sinfo`` and PBS' ``qstat`` commands, which are typically instantaneous. Computational programs are searched for using the ``locate`` command, which relies on an internal database to locate files instead of trawling through the filesystem. If ``locate`` is not available on your system (or is otherwise unable to find anything), then Digichem will fall back to a traditional filesystem search with the ``find`` command. For very large filesystems, especially those distributed over a network (NFS etc.), ``find`` may be too slow, but you can drastically increase the speed of the search by specifying file hints to start searching from (see below).

In most cases, you can set up everything Digichem needs by simply calling:

.. code-block:: console

    $ digichem setup

See the :ref:`installation walkthrough` for example output from the program.

By default, ``setup`` will write new configuration files to ``~/.config/digichem|major_version|/``; these will apply to the current user only. If you wish to instead run the setup for multiple users, use the ``-o`` option to specify a different output location:

.. code-block:: console

    $ digichem setup -o /etc/digichem6

Digichem will normally refuse to overwrite previously written setup files. If you attempt to run ``digichem setup`` again, you'll see an error similar to this:

.. code-block:: console

    $ digichem setup
    digichem: ERROR: Output directory '/home/oliver/.config/digichem|major_version|/Programs/SETUP' is not empty! Refusing to overwrite existing files.
    Try 'digichem setup --force' if you're sure.
    digichem: ERROR: Output directory '/home/oliver/.config/digichem|major_version|/Destinations/SETUP' is not empty! Refusing to overwrite existing files.
    Try 'digichem setup --force' if you're sure.
    digichem: ERROR: stopped with error Digichem_exception: Refusing to overwrite existing config files

If you are sure you want to run the setup procedure again, specify the ``--force`` option:

.. code-block:: console

    $ digichem setup --force

Digichem will write the new files to a ``SETUP`` directory in the given output location. If the default location is used, these files can be found at:

- ``~/.config/digichem|major_version|/Destinations/SETUP/``
- ``~/.config/digichem|major_version|/Programs/SETUP/``

You are free to edit and/or remove the config files afterwards as you see fit.

Specifying file hints
---------------------

On most systems, the default ``digichem setup`` command is all you need. However, if your file system is particularly large and/or slow, and the fast search program ``locate`` is unavailable, then the filesystem search performed with ``find`` might take a long time. In this case, you can increase the speed of the setup program by specifying hints for where to look for computational programs.

Hints are given after the ``digichem setup`` command. For instance, to limit the search to the ``/opt`` and ``/usr`` directories:

.. code-block:: console

    $ digichem setup /opt /usr

The file hints can be as specific as you need. If you know the exact location of the programs on your cluster, you can give these exactly to ``digichem setup``. For example, if you have two Gaussian installations at ``/opt/software/g16`` and ``/opt/software/g09``, an Orca installation at ``/opt/software/ORCA``, and a VMD installation at ``/usr/local/bin/vmd`` (which is the default for VMD):

.. code-block:: console

    $ digichem setup /opt/software/g16 /opt/software/g09 /opt/software/ORCA /usr/local/bin/vmd

Note that the order of the file hints does not matter.

Manual setup
____________

External programs
-----------------

.. _setup_vmd:

VMD
~~~

The options to control VMD are set in ``~/.config/digichem|major_version|/digichem.yaml``, under the ``render`` heading. Three options are required:

.. code-block:: yaml

    render:
        engine: vmd
        vmd:
            executable: vmd
            tachyon: tachyon

``executable`` is the path to the main VMD executable, while ``tachyon`` is the path to the Tachyon ray tracer library. Tachyon is normally included in the VMD installation itself. If both ``vmd`` and ``tachyon`` are in your ``PATH``, then the above configuration is all you need. Otherwise, set the ``executable`` and ``tachyon`` options to the appropriate file paths. For example:

.. code-block:: yaml

    render:
        engine: vmd
        vmd:
            executable: /usr/local/bin/vmd
            tachyon: /usr/local/lib/vmd/tachyon_LINUXAMD64
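If you are unsure where these files are on your system, standard shell tools can track them down. An illustrative session, using the example paths from above (your locations will differ):

.. code-block:: console

    $ which vmd
    /usr/local/bin/vmd
    $ find /usr/local/lib/vmd -name "tachyon_*"
    /usr/local/lib/vmd/tachyon_LINUXAMD64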
Gaussian utilities
~~~~~~~~~~~~~~~~~~

.. note::

    These options are redundant and will likely be removed in a future version.

To generate reports from Gaussian calculations, Digichem needs to know the location of both the ``formchk`` and ``cubegen`` utilities. These can be set in ``~/.config/digichem|major_version|/digichem.yaml``, under the ``external`` heading:

.. code-block:: yaml

    external:
        formchk: /opt/gaussian/g16/formchk
        cubegen: /opt/gaussian/g16/cubegen

Default report programs
~~~~~~~~~~~~~~~~~~~~~~~

.. note::

    These options are redundant and will likely be removed in a future version.

When generating a report at the end of a calculation, Digichem will use the same program definition that ran the calculation to perform any requested post analysis (natural transition orbitals, difference density etc.). However, when a report is generated manually, Digichem needs to know which program definition to use. These can be set in ``~/.config/digichem|major_version|/digichem.yaml``, under the ``report`` heading:

.. code-block:: yaml

    report:
        turbomole:
            program: Turbomole
        gaussian:
            program: Gaussian 16
            scratch_path: /tmp
        orca:
            program: Orca

Each ``program`` option corresponds to the name of a program, as set during :ref:`setup <setup_programs>`. For example, if a Gaussian engine was set up with the following options:

.. code-block:: yaml

    link:
        tag: Gaussian 16
    ...

Then the following option would be set:

.. code-block:: yaml

    report:
        gaussian:
            program: Gaussian 16

.. _setup_programs:

Computational engines
---------------------

The installation script will create an example config file for each of the supported calculation engines in ``~/.config/digichem|major_version|/Programs``.

.. code-block:: console

    $ ls ~/.config/digichem|major_version|/Programs
    Gaussian.yaml.off  Orca.yaml.off  Turbomole.yaml.off

To activate a config, first rename the file to remove the ``.off`` suffix. For example, to set up a Gaussian backend, rename ``Gaussian.yaml.off`` to ``Gaussian.yaml``. The file can then be edited according to the guide below.

.. _gaussian_setup:

Gaussian
~~~~~~~~

An example Gaussian 16 config file looks like this:

.. code-block:: yaml

    link:
        tag: Gaussian 16

        # The top config for the calculations we support.
        previous:
            - Program Base
        next:
            - Gaussian Auto
            - [Calculation Base, Gaussian]

    # The name of the program this config is for. These are pre-built into Digichem.
    meta:
        class_name: gaussian

    # 'Path' to the Gaussian executable.
    executable: g16

    # The name of the environmental variable Gaussian looks for to find 'root'.
    # This changes between versions; for Gaussian 16 it is g16root,
    # for Gaussian 09 it is g09root etc.
    root_environ_name: g16root

    # Path to the directory where Gaussian is installed.
    root: /software/g16

    # Path to the Gaussian .profile script which is run to set up Gaussian.
    init_file: /software/g16/bsd/g16.profile

To set up Gaussian, change the ``root`` and ``init_file`` options to point to the location where Gaussian is installed. ``root`` is the main Gaussian installation folder. If the ``g16`` binary is located at ``/software/g16/g16``, then ``root`` should be set to ``/software/g16``. ``init_file`` is the script used to set up Gaussian; it is normally located at ``bsd/g16.profile`` (although the name will change appropriately for other versions, e.g. ``g09.profile``). In the same example, the init script would be located at ``/software/g16/bsd/g16.profile``.

.. note::

    Gaussian includes two setup scripts, one for SH-like shells (ending in ``.profile``) and one for CSH-like shells (ending in ``.login``). Make sure to use the SH setup script (``.profile``), not the CSH one (``.login``)!

For other versions of Gaussian, the ``root_environ_name`` and ``executable`` options should also be changed appropriately, as shown in the sketch after this list:

- g16: ``root_environ_name: g16root``, ``executable: g16``
- g09: ``root_environ_name: g09root``, ``executable: g09``
- g03: ``root_environ_name: g03root``, ``executable: g03``
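For example, a complete Gaussian 09 config might look like this (a minimal sketch mirroring the Gaussian 16 example above; the ``/software/g09`` install location is assumed for illustration):

.. code-block:: yaml

    link:
        tag: Gaussian 09
        previous:
            - Program Base
        next:
            - Gaussian Auto
            - [Calculation Base, Gaussian]

    meta:
        class_name: gaussian

    # Everything below follows the g09 naming scheme.
    executable: g09
    root_environ_name: g09root
    root: /software/g09
    init_file: /software/g09/bsd/g09.profile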
.. _turbomole_setup:

Turbomole
~~~~~~~~~

An example Turbomole config file looks like this:

.. code-block:: yaml

    # Config for a Turbomole submission
    link:
        tag: Turbomole
        previous:
            - Program Base

        # The top config for the calculations we support.
        next:
            - Turbomole Auto
            - [Calculation Base, Turbomole]

    # The name of the program this config is for. These are pre-built into Digichem.
    meta:
        class_name: Turbomole

    # Path to the main folder in which Turbomole is installed.
    # Inside this folder the 'bin' folder can be located.
    root: /software/TURBOMOLE/

    # Path to the Turbomole init file (for SH compliant shells, not CSH).
    # This file is normally called 'Config_turbo_env'.
    init_file: /software/TURBOMOLE/Config_turbo_env

To set up Turbomole, change the ``root`` and ``init_file`` options to point to the location where Turbomole is installed. ``root`` is the main Turbomole installation folder. If the ``bin`` folder is located at ``/software/TURBOMOLE/bin``, then ``root`` should be set to ``/software/TURBOMOLE``. ``init_file`` is the script used to set up Turbomole; it is normally located directly inside the main Turbomole directory and is called ``Config_turbo_env``. In the same example, the init script would be located at ``/software/TURBOMOLE/Config_turbo_env``.

.. note::

    Turbomole also includes two setup scripts, one for SH-like shells (``Config_turbo_env``) and one for CSH-like shells (``Config_turbo_env.csh``). Make sure to use the SH setup script (``Config_turbo_env``), not the CSH one (``Config_turbo_env.csh``)!

.. _orca_setup:

Orca
~~~~

An example Orca config file looks like this:

.. code-block:: yaml

    # Config for Orca
    link:
        tag: Orca
        previous:
            - Program Base

        # The top config for the calculations we support.
        next:
            - Orca Auto Characterisation
            - [Calculation Base, Orca]

    # The name of the program this config is for. These are pre-built into Digichem.
    meta:
        class_name: Orca

    # Path to the main Orca executable.
    executable: /software/ORCA/orca

    # Path to the main folder in which Orca is installed.
    # The main Orca executable is typically located inside this directory.
    root: /software/ORCA

    # Required for parallel execution, the path to the root directory of your MPI installation.
    # Inside this root directory can be found the 'bin' and 'lib' directories.
    mpi_root: /software/openmpi/

Each Orca installation is actually made up of two parts: 1) the Orca installation itself, and 2) OpenMPI, which is needed to run calculations in parallel. OpenMPI is sometimes installed inside the Orca installation folder; sometimes it is separate.

To set up Orca, the ``executable`` and ``root`` options must be set appropriately. ``executable`` is the full, absolute path to the main Orca binary, while ``root`` is the main Orca installation directory. ``executable`` is normally immediately inside ``root``. For example, if the Orca binary is located at ``/software/ORCA/orca``, then ``root`` should be set to ``/software/ORCA``.

.. note::

    Unlike the other computational engines, the ``executable`` option for Orca must include the full path, not just the executable name.

The path to the OpenMPI installation is set using the ``mpi_root`` option. This is technically optional, but is strongly recommended, as otherwise Orca calculations will be limited to 1 CPU only. ``mpi_root`` should point to the main OpenMPI directory, inside of which are the ``bin`` and ``lib`` sub-directories. For example, if ``mpirun`` is located at ``/software/openmpi/bin/mpirun``, then ``mpi_root`` should be set to ``/software/openmpi``.

.. _setup_destinations:

Server queues
-------------

The installation script will also create example config files for both SLURM and PBS. These can be found in ``~/.config/digichem|major_version|/Destinations``.

.. code-block:: console

    $ ls ~/.config/digichem|major_version|/Destinations
    PBS.yaml.off  SLURM.yaml.off

These configs can be activated in the same way as the calculation engines, by removing the ``.off`` suffix, as shown below.
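For example, to activate the SLURM destination (an illustrative session; the engine configs in ``Programs`` are enabled the same way):

.. code-block:: console

    $ cd ~/.config/digichem|major_version|/Destinations
    $ mv SLURM.yaml.off SLURM.yaml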
PBS
~~~

An example PBS config file looks like this:

.. code-block:: yaml

    link:
        tag: PBS
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS

    # The queue/partition name.
    # Leave commented if your PBS chooses a queue automatically.
    #queue: Main Queue

    # The maximum job time (dd-hh:mm:ss)
    #time: 7-00:00:00

The name of the PBS queue can be set using the ``queue`` option. If your cluster automatically fills different queues based on the requested resources, this option can be left unset. The maximum allowed job execution time can be set using the ``time`` option, in dd-hh:mm:ss format. Some example time formats are:

- 7 days: ``7-00:00:00``
- 24 hours: ``24:00:00``
- 30 minutes: ``00:30:00``

SLURM
~~~~~

An example SLURM config file looks like this:

.. code-block:: yaml

    link:
        tag: SLURM
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    # The queue/partition name.
    # Leave commented if your SLURM chooses a queue automatically.
    #queue: Main Queue

    # The maximum job time (dd-hh:mm:ss)
    #time: 7-00:00:00

    # Uncomment to setup auto mail.
    #mail:
    #    events: [BEGIN, END, FAIL]
    #
    #    # Where to send the mail (user@host).
    #    # If not set, the default is to send to the current user.
    #    user: user
    #    host: host

The options for SLURM and PBS are nearly identical, and both ``queue`` and ``time`` have the same meaning for both.

.. note::

    In SLURM, queues are often referred to as 'partitions', but these are the same thing.

The partition name can be set using the ``queue`` option, or left blank to let SLURM pick automatically (if configured). SLURM additionally supports sending automated emails when certain calculation milestones are hit. To enable this functionality, uncomment the ``mail`` option and set ``events`` to the milestones you'd like to be notified of. See the `SLURM manual <https://slurm.schedmd.com/sbatch.html>`_ for a full list of supported options.

By default, the email will be sent to your account on the cluster. To change this, set ``user`` and ``host`` appropriately, as in the sketch after this list. For example, if your email is ``joe.blogs@mail.com``, set:

- ``user: joe.blogs``
- ``host: mail.com``
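Putting this together, a SLURM destination with mail notifications enabled might look like the following sketch (reusing the example address from above):

.. code-block:: yaml

    link:
        tag: SLURM
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    # Send mail to joe.blogs@mail.com when jobs begin, end or fail.
    mail:
        events: [BEGIN, END, FAIL]
        user: joe.blogs
        host: mail.com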
Multiple Queues
~~~~~~~~~~~~~~~

Often, it is desirable to set up multiple queues to pick from. This is particularly useful if the cluster has more than one partition/queue and does not support automatic queue assignment, and/or to optimise wait times by reducing the maximum job duration for shorter calculations.

Additional queues can be specified by writing additional files to the ``Destinations`` folder, or they can be added to the existing example config files. Multiple queues are separated in the same file by three hyphens (``---``), for example:

.. code-block:: yaml

    link:
        tag: SLURM 1
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM
    ---
    link:
        tag: SLURM 2
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

.. important::

    Each queue must be given a unique ``tag``, otherwise Digichem will produce an error.

.. note::

    Configs for both SLURM and PBS can be included in the same file; the file name is only used for your reference.

In this example config, three SLURM queues are defined, one with a maximum job time of 24h, one with 48h, and one with 72h:

.. code-block:: yaml

    link:
        tag: SLURM 24h
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    # The maximum job time (dd-hh:mm:ss)
    time: 24:00:00
    ---
    link:
        tag: SLURM 48h
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    # The maximum job time (dd-hh:mm:ss)
    time: 48:00:00
    ---
    link:
        tag: SLURM 72h
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    # The maximum job time (dd-hh:mm:ss)
    time: 72:00:00

While in this example, three different PBS queues are exposed (each with a different ``queue`` name):

.. code-block:: yaml

    link:
        tag: PBS Normal
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS

    queue: standard
    ---
    link:
        tag: PBS Slow
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS

    queue: long
    ---
    link:
        tag: PBS Testing
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS

    queue: debug

Array Submission
~~~~~~~~~~~~~~~~

By default, each Digichem calculation is submitted as a separate job. This is simpler and easier to manage, but some computational clusters discourage submitting large numbers of individual jobs simultaneously for performance reasons. As an alternative, Digichem supports array submission for both PBS and SLURM. When submitting multiple molecules in this mode, only a single job is submitted to the scheduler, with each molecule represented as an individual task in the larger array.

To switch from single job submission to array submission, change the ``meta: class_name`` option to either ``PBS-Array`` or ``SLURM-Array``, for PBS or SLURM respectively. All other options function the same as for single job submission. For example, this config exposes both single job submission and array submission to SLURM:

.. code-block:: yaml

    link:
        tag: SLURM
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM

    queue: standard
    ---
    link:
        tag: SLURM Array
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: SLURM-Array

    queue: standard
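An equivalent PBS config follows the same pattern (a sketch assuming a PBS queue named ``standard``, as in the earlier examples):

.. code-block:: yaml

    link:
        tag: PBS
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS

    queue: standard
    ---
    link:
        tag: PBS Array
        previous:
            - Destination Base
        next:
            - Program Base

    meta:
        class_name: PBS-Array

    queue: standard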