.. _digichem_convert: Digichem Convert ============================= ``digichem convert`` is used to convert different coordinate file types. .. code-block:: console $ digichem convert INPUT_FILE -O OUTPUT_FILE By default, the format of both the input and output files is determined by their extension. For example: .. code-block:: console $ digichem convert Benzene.com -O Benzene.xyz Would convert a Gaussian input file (.com) to a platform independent XYZ (.xyz) file. The file formats can also be explicitly specified using the ``--input`` (or ``-i``) and ``--output`` (or ``-o``) options: .. code-block:: console $ digichem convert Benzene.in -O Benzene.out -i com -o xyz $ cat Benzene.out 12 C 1.38314 -0.22144 0.00537 C 0.50694 -1.30651 -0.00792 C -0.87093 -1.09053 -0.01470 C -1.37290 0.21095 -0.00441 C -0.49670 1.29607 0.01060 C 0.88118 1.07996 0.01366 H 2.45680 -0.38978 0.00923 H 0.89795 -2.32061 -0.01321 H -1.55354 -1.93593 -0.02737 H -2.44648 0.37928 -0.00825 H -0.88777 2.31003 0.01974 H 1.56378 1.92549 0.02297 See :ref:`below` for a full list of supported input and output formats. Certain coordinate formats, including the :ref:`Digichem native format (.si)` and Gaussian input format (.com), natively support molecular charge and multiplicity. These can be set using the ``--charge`` (or ``-C``) and ``--multiplicity`` (or ``-M``) options respectively: .. code-block:: console $ digichem convert Benzene.xyz -O Benzene_rad_cation.si -C 1 -M 2 $ digichem convert Benzene.xyz -O Benzene_rad_anion.si -C -1 -M 2 .. warning:: Most coordinate formats do not retain electron information (charge or multiplicity). In these cases, any values set with the ``-C`` or ``-M`` options will be silently ignored. When working with radicals and/or charged species, make sure to use a format that supports lossless conversion of charge and multiplicity (see **C&M** column :ref:`below`), or set the charge and multiplicity appropriately with the ``digichem submit`` :ref:`command`. Some formats only store coordinates in two dimensions. This is common in formats used by 2D drawing programs, such as ChemDraw (.cdx, .cdxml, etc.) and Marvin Sketch (.cml, .mrv, .etc). Other formats may contain coordinates in either two or three dimensions, depending on how the structure was drawn (.cml, etc.), while yet others do not use coordinates to represent the molecule at all (SMILES, etc.). When converting between formats, ``digichem convert`` will automatically detect whether the input file contains 3D coordinates. If it does not, then a rapid force-field optimisation (using the ``obabel --gen3D`` command) will be performed to generate 3D coordinates. In most cases, this is beneficial because the resulting 3D coordinates will be a better starting point for the full optimisation using your chosen calculation engine and method. However, the preoptimisation will change the geometry between the input and output files. This may be unexpected, and in some cases undesirable, in which case the preoptimisation can be disabled by using the ``-gen3D False`` option: .. code-block:: console $ digichem convert Benzene.cdx -O Benzene.xyz --gen3D False .. warning:: The ``--gen3D`` option should not be considered a substitute for a true optimisation with a computational chemistry method. When drawing molecules from scratch, whether in two dimensions or three, **always** perform a geometry optimisation before calculating other properties. .. _coord_types: Supported File Formats ---------------------- .. csv-table:: :file: /_static/convert_ref/Formats.csv :header-rows: 1 **Read**: This format can be read. **Write**: This format can be written. **C&M**: This format supports lossless charge and multiplicity information (the charge and multiplicity can be both read and written). The Digichem native format (.si), the program-independent XYZ (.xyz), and the Gaussian input format (.com, .gjf, etc.) are handled natively. Calculation log files are parsed using the cclib library :cite:p:`cclib`. All other formats are converted by calling the external ``obabel`` command, and thus require a functioning installing of Open Babel :cite:p:`Openbabel`. .. _si_format: Digichem Native Format (.si) ---------------------------- The Digichem coordinate format is written in YAML, and contains six properties: .. code-block:: console $ digichem convert Benzene.xyz -o si version: 2.1.0 name: null charge: 0 multiplicity: 1 atoms: - {atom: C, x: 1.38314, y: -0.22144, z: 0.00537} - {atom: C, x: 0.50694, y: -1.30651, z: -0.00792} - {atom: C, x: -0.87093, y: -1.09053, z: -0.0147} - {atom: C, x: -1.3729, y: 0.21095, z: -0.00441} - {atom: C, x: -0.4967, y: 1.29607, z: 0.0106} - {atom: C, x: 0.88118, y: 1.07996, z: 0.01366} - {atom: H, x: 2.4568, y: -0.38978, z: 0.00923} - {atom: H, x: 0.89795, y: -2.32061, z: -0.01321} - {atom: H, x: -1.55354, y: -1.93593, z: -0.02737} - {atom: H, x: -2.44648, y: 0.37928, z: -0.00825} - {atom: H, x: -0.88777, y: 2.31003, z: 0.01974} - {atom: H, x: 1.56378, y: 1.92549, z: 0.02297} history: null **version** The version string of the input file, used by the digichem parser for backwards compatibility. **name** An optional string describing the name of the molecule. The ``name`` is only used for descriptive purposes, and can be omitted. If it is omitted, the name of the file itself is used as the name of the molecule. **charge** Integer charge of the molecule. **multiplicity** Integer multiplicity of the molecule. **atoms** A list of dictionaries describing the geometry of the molecule. Each dictionary contains four key:value pairs: * **atom**: The atom type as its standard element symbol. * **x**: The x coordinate, in angstrom. * **y**: The y coordinate, in angstrom. * **z**: The z coordinate, in angstrom. **history** An optional unique identifier describing the calculation that generated the geometry in this file.