Digichem Convert

digichem convert is used to convert between different coordinate file formats.

$ digichem convert INPUT_FILE -O OUTPUT_FILE

By default, the format of both the input and output files is determined by their extension. For example:

$ digichem convert Benzene.com -O Benzene.xyz

This converts a Gaussian input file (.com) to a platform-independent XYZ (.xyz) file.
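The idea of extension-based detection can be illustrated with a minimal Python sketch. This is not Digichem's implementation; guess_format is a hypothetical helper that simply returns the file extension without its leading dot, which is the kind of code that would then be looked up in the format table.

```python
from pathlib import Path


def guess_format(filename: str) -> str:
    """Return the extension of filename without the leading dot.

    Hypothetical helper for illustration only.
    """
    suffix = Path(filename).suffix
    if not suffix:
        # No extension: the format would have to be given explicitly.
        raise ValueError(f"cannot infer a format from '{filename}'")
    return suffix[1:]


print(guess_format("Benzene.com"))  # com
print(guess_format("Benzene.xyz"))  # xyz
```

When a file has no extension, or a misleading one, the format has to be supplied explicitly instead.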

The file formats can also be explicitly specified using the --input (or -i) and --output (or -o) options:

$ digichem convert Benzene.in -O Benzene.out -i com -o xyz
$ cat Benzene.out
12

C          1.38314       -0.22144        0.00537
C          0.50694       -1.30651       -0.00792
C         -0.87093       -1.09053       -0.01470
C         -1.37290        0.21095       -0.00441
C         -0.49670        1.29607        0.01060
C          0.88118        1.07996        0.01366
H          2.45680       -0.38978        0.00923
H          0.89795       -2.32061       -0.01321
H         -1.55354       -1.93593       -0.02737
H         -2.44648        0.37928       -0.00825
H         -0.88777        2.31003        0.01974
H          1.56378        1.92549        0.02297
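The XYZ layout shown above (atom count, a comment line that may be empty, then one line per atom) is simple enough to read by hand. The following is a minimal sketch of such a reader; parse_xyz is a hypothetical function written for this example and is not part of Digichem.

```python
BENZENE_XYZ = """12

C          1.38314       -0.22144        0.00537
C          0.50694       -1.30651       -0.00792
C         -0.87093       -1.09053       -0.01470
C         -1.37290        0.21095       -0.00441
C         -0.49670        1.29607        0.01060
C          0.88118        1.07996        0.01366
H          2.45680       -0.38978        0.00923
H          0.89795       -2.32061       -0.01321
H         -1.55354       -1.93593       -0.02737
H         -2.44648        0.37928       -0.00825
H         -0.88777        2.31003        0.01974
H          1.56378        1.92549        0.02297
"""


def parse_xyz(text):
    """Parse XYZ text into (comment, [(symbol, x, y, z), ...])."""
    lines = text.splitlines()
    n_atoms = int(lines[0].strip())   # line 1: number of atoms
    comment = lines[1].strip()        # line 2: free-form comment, may be empty
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the header")
    return comment, atoms


comment, atoms = parse_xyz(BENZENE_XYZ)
print(len(atoms))  # 12
print(atoms[0])    # ('C', 1.38314, -0.22144, 0.00537)
```

Note that nothing in this layout records charge or multiplicity, which is why XYZ cannot carry that information through a conversion.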

See below for a full list of supported input and output formats.

Certain coordinate formats, including the Digichem native format (.si) and Gaussian input format (.com), natively support molecular charge and multiplicity. These can be set using the --charge (or -C) and --multiplicity (or -M) options respectively:

$ digichem convert Benzene.xyz -O Benzene_rad_cation.si -C 1 -M 2
$ digichem convert Benzene.xyz -O Benzene_rad_anion.si -C -1 -M 2

Warning

Most coordinate formats do not retain electron information (charge or multiplicity). In these cases, any values set with the -C or -M options will be silently ignored. When working with radicals and/or charged species, make sure to use a format that supports lossless conversion of charge and multiplicity (see the C&M column below), or set the charge and multiplicity appropriately with the digichem submit command.
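Because charge and multiplicity can be lost silently, it can be worth checking that a given pair is at least physically possible before submitting a calculation. The sketch below is an independent illustration, not a Digichem feature: it counts electrons from the atomic numbers and the charge, then checks that the number of unpaired electrons (multiplicity minus one) has the same parity as the electron count. The function names and the small ATOMIC_NUMBERS table are assumptions made for this example.

```python
# Only the elements needed for the example; extend as required.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def electron_count(symbols, charge):
    """Total number of electrons for the given atoms and net charge."""
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


def is_valid_multiplicity(symbols, charge, multiplicity):
    """True if the multiplicity is possible for this electron count."""
    n_electrons = electron_count(symbols, charge)
    unpaired = multiplicity - 1
    if unpaired < 0 or unpaired > n_electrons:
        return False
    # The remaining electrons must be able to pair up.
    return (n_electrons - unpaired) % 2 == 0


benzene = ["C"] * 6 + ["H"] * 6
print(electron_count(benzene, 0))              # 42
print(is_valid_multiplicity(benzene, 0, 1))    # True  (neutral singlet)
print(is_valid_multiplicity(benzene, 1, 2))    # True  (radical cation)
print(is_valid_multiplicity(benzene, -1, 2))   # True  (radical anion)
print(is_valid_multiplicity(benzene, 1, 1))    # False (cation cannot be a singlet)
```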

Some formats only store coordinates in two dimensions. This is common in formats used by 2D drawing programs, such as ChemDraw (.cdx, .cdxml, etc.) and Marvin Sketch (.cml, .mrv, etc.). Other formats may contain coordinates in either two or three dimensions, depending on how the structure was drawn (.cml, etc.), while yet others do not use coordinates to represent the molecule at all (SMILES, etc.).

When converting between formats, digichem convert automatically detects whether the input file contains 3D coordinates. If it does not, a rapid force-field optimisation (using the obabel --gen3D command) is performed to generate them. In most cases this is beneficial, because the resulting 3D coordinates are a better starting point for the full optimisation with your chosen calculation engine and method. However, the preoptimisation changes the geometry between the input and output files. This may be unexpected, and in some cases undesirable, in which case the preoptimisation can be disabled with the --gen3D False option:

$ digichem convert Benzene.cdx -O Benzene.xyz --gen3D False
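How Digichem decides whether a file contains 3D coordinates is not described here. As a rough illustration of the idea only, one simple heuristic is to treat a structure as two-dimensional when every z coordinate is zero. The function has_3d_coordinates and its tolerance parameter are assumptions made for this sketch.

```python
def has_3d_coordinates(atoms, tolerance=1e-6):
    """Heuristic: True if any atom lies measurably out of the z = 0 plane.

    atoms is an iterable of (symbol, x, y, z) tuples.
    """
    return any(abs(z) > tolerance for _symbol, _x, _y, z in atoms)


# A structure as a 2D drawing program might store it: all z values are zero.
drawn_2d = [("C", 0.0, 0.0, 0.0), ("C", 1.4, 0.0, 0.0), ("C", 2.1, 1.2, 0.0)]

# The first three atoms of the benzene geometry shown earlier.
optimised = [
    ("C", 1.38314, -0.22144, 0.00537),
    ("C", 0.50694, -1.30651, -0.00792),
    ("C", -0.87093, -1.09053, -0.01470),
]

print(has_3d_coordinates(drawn_2d))   # False
print(has_3d_coordinates(optimised))  # True
```

A heuristic like this would misclassify a genuinely planar molecule that happens to lie exactly in the xy plane, which is one situation where disabling the preoptimisation explicitly is useful.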

Warning

The --gen3D option should not be considered a substitute for a true optimisation with a computational chemistry method. When drawing molecules from scratch, whether in two dimensions or three, always perform a geometry optimisation before calculating other properties.

Supported File Formats

Code          Description                                Read  Write  C&M
abinit        ABINIT Output Format
acesin        ACES input format
acesout       ACES output format
acr           ACR format
adf           ADF cartesian input format
adfband       ADF Band output format
adfdftb       ADF DFTB output format
adfout        ADF output format
alc           Alchemy format
aoforce       Turbomole AOFORCE output format
arc           Accelrys/MSI Biosym/Insight II CAR format
ascii         ASCII format
axsf          XCrySDen Structure Format
bgf           MSI BGF format
box           Dock 3.5 Box format
bs            Ball and Stick format
c09out        Crystal 09 output format
c3d1          Chem3D Cartesian 1 format
c3d2          Chem3D Cartesian 2 format
cac           CAChe MolStruct format
caccrt        Cacao Cartesian format
cache         CAChe MolStruct format
cacint        Cacao Internal format
can           Canonical SMILES format
car           Accelrys/MSI Biosym/Insight II CAR format
castep        CASTEP format
ccc           CCC format
cdjson        ChemDoodle JSON
cdx           ChemDraw binary format
cdxml         ChemDraw CDXML format
cht           Chemtool format
cif           Crystallographic Information File
ck            ChemKin format
cml           Chemical Markup Language
cmlr          CML Reaction format
cof           Culgi object file format
com           Gaussian Input
confabreport  Confab report format
CONFIG        DL-POLY CONFIG
CONTCAR       VASP format
CONTFF        MDFF format
crk2d         Chemical Resource Kit diagram(2D)
crk3d         Chemical Resource Kit 3D format
csr           Accelrys/MSI Quanta CSR format
cssr          CSD CSSR format
ct            ChemDraw Connection Table format
cub           Gaussian cube format
cube          Gaussian cube format
dallog        DALTON output format
dalmol        DALTON input format
dat           Generic Output file format
dmol          DMol3 coordinates format
dx            OpenDX cube format for APBS
ent           Protein Data Bank format
exyz          Extended XYZ cartesian coordinates format
fa            FASTA format
fasta         FASTA format
fch           Gaussian formatted checkpoint file format
fchk          Gaussian formatted checkpoint file format
fck           Gaussian formatted checkpoint file format
feat          Feature format
fh            Fenske-Hall Z-Matrix format
fhiaims       FHIaims XYZ format
fix           SMILES FIX format
fps           FPS text fingerprint format (Dalke)
fpt           Fingerprint format
fract         Free Form Fractional format
fs            Fastsearch format
fsa           FASTA format
g03           Gaussian Output
g09           Gaussian Output
g16           Gaussian Output
g92           Gaussian Output
g94           Gaussian Output
g98           Gaussian Output
gal           Gaussian Output
gam           GAMESS Output
gamess        GAMESS Output
gamin         GAMESS Input
gamout        GAMESS Output
gau           Gaussian Input
gjc           Gaussian Input
gjf           Gaussian Input
got           GULP format
gpr           Ghemical format
gr96          GROMOS96 format
gro           GRO format
gukin         GAMESS-UK Input
gukout        GAMESS-UK Output
gzmat         Gaussian Z-Matrix Input
hin           HyperChem HIN format
HISTORY       DL-POLY HISTORY
inchi         InChI format
inchikey      InChIKey
inp           GAMESS Input
ins           ShelX format
jin           Jaguar input format
jout          Jaguar output format
k             Compare molecules using InChI
lmpdat        The LAMMPS data format
log           Generic Output file format
lpmd          LPMD format
mcdl          MCDL format
mcif          Macromolecular Crystallographic Info
MDFF          MDFF format
mdl           MDL MOL format
ml2           Sybyl Mol2 format
mmcif         Macromolecular Crystallographic Info
mmd           MacroModel format
mmod          MacroModel format
mna           Multilevel Neighborhoods of Atoms (MNA)
mol           MDL MOL format
mol2          Sybyl Mol2 format
mold          Molden format
molden        Molden format
molf          Molden format
molreport     Open Babel molecule report
moo           MOPAC Output format
mop           MOPAC Cartesian format
mopcrt        MOPAC Cartesian format
mopin         MOPAC Internal
mopout        MOPAC Output format
mp            Molpro input format
mpc           MOPAC Cartesian format
mpd           MolPrint2D format
mpo           Molpro output format
mpqc          MPQC output format
mpqcin        MPQC simplified input format
mrv           Chemical Markup Language
msi           Accelrys/MSI Cerius II MSI format
msms          M.F. Sanner’s MSMS input format
nul           Outputs nothing
nw            NWChem input format
nwo           NWChem output format
orca          ORCA output format
orcainp       ORCA input format
out           Generic Output file format
outmol        DMol3 coordinates format
output        Generic Output file format
paint         Painter format
pc            PubChem format
pcjson        PubChem JSON
pcm           PCModel Format
pdb           Protein Data Bank format
pdbqt         AutoDock PDBQT format
png           PNG 2D depiction
pointcloud    Point cloud on VDW surface
pos           POS cartesian coordinates format
POSCAR        VASP format
POSFF         MDFF format
pov           POV-Ray input format
pqr           PQR format
pqs           Parallel Quantum Solutions format
prep          Amber Prep format
pwscf         PWscf format
qcin          Q-Chem input format
qcout         Q-Chem output format
report        Open Babel report format
res           ShelX format
rinchi        RInChI
rsmi          Reaction SMILES format
rxn           MDL RXN format
sd            MDL MOL format
sdf           MDL MOL format
si            Silico Input Format
siesta        SIESTA format
smi           SMILES format
smiles        SMILES format
smy           SMILES format using Smiley parser
stl           STL 3D-printing format
svg           SVG 2D depiction
sy2           Sybyl Mol2 format
t41           ADF TAPE41 format
tdd           Thermo format
therm         Thermo format
tmol          TurboMole Coordinate format
txyz          Tinker XYZ format
unixyz        UniChem XYZ format
VASP          VASP format
vmol          ViewMol format
wln           Wiswesser Line Notation
xed           XED format
xml           General XML format
xsf           XCrySDen Structure Format
xtc           XTC format
xyz           XYZ cartesian coordinates format
yob           YASARA.org YOB format
zin           ZINDO input format

Read: This format can be read. Write: This format can be written. C&M: This format supports lossless charge and multiplicity information (the charge and multiplicity can be both read and written).

The Digichem native format (.si), the program-independent XYZ format (.xyz), and the Gaussian input format (.com, .gjf, etc.) are handled natively. Calculation log files are parsed using the cclib library [12]. All other formats are converted by calling the external obabel command, and thus require a functioning installation of Open Babel [11].

Digichem Native Format (.si)

The Digichem coordinate format is written in YAML, and contains six properties:

$ digichem convert Benzene.xyz -o si
version: 2.1.0
name: null
charge: 0
multiplicity: 1
atoms:
- {atom: C, x: 1.38314, y: -0.22144, z: 0.00537}
- {atom: C, x: 0.50694, y: -1.30651, z: -0.00792}
- {atom: C, x: -0.87093, y: -1.09053, z: -0.0147}
- {atom: C, x: -1.3729, y: 0.21095, z: -0.00441}
- {atom: C, x: -0.4967, y: 1.29607, z: 0.0106}
- {atom: C, x: 0.88118, y: 1.07996, z: 0.01366}
- {atom: H, x: 2.4568, y: -0.38978, z: 0.00923}
- {atom: H, x: 0.89795, y: -2.32061, z: -0.01321}
- {atom: H, x: -1.55354, y: -1.93593, z: -0.02737}
- {atom: H, x: -2.44648, y: 0.37928, z: -0.00825}
- {atom: H, x: -0.88777, y: 2.31003, z: 0.01974}
- {atom: H, x: 1.56378, y: 1.92549, z: 0.02297}
history: null

version

The version string of the input file, used by the digichem parser for backwards compatibility.

name

An optional string describing the name of the molecule. The name is only used for descriptive purposes, and can be omitted. If it is omitted, the name of the file itself is used as the name of the molecule.

charge

Integer charge of the molecule.

multiplicity

Integer multiplicity of the molecule.

atoms

A list of dictionaries describing the geometry of the molecule. Each dictionary contains four key:value pairs:

  • atom: The atom type as its standard element symbol.

  • x: The x coordinate, in angstrom.

  • y: The y coordinate, in angstrom.

  • z: The z coordinate, in angstrom.

history

An optional unique identifier describing the calculation that generated the geometry in this file.
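Since the .si format is plain YAML with a small, fixed set of properties, a file in the layout shown above can be produced with a few lines of code. The following is a minimal sketch that assumes the field layout matches the example output exactly; format_si is a hypothetical helper, not part of Digichem, and the default version string is simply copied from the example.

```python
def format_si(atoms, charge=0, multiplicity=1, name=None, version="2.1.0"):
    """Return .si-style YAML text for a list of (symbol, x, y, z) tuples.

    Hypothetical helper; the layout mirrors the example output above.
    """
    lines = [
        f"version: {version}",
        f"name: {name if name is not None else 'null'}",
        f"charge: {charge}",
        f"multiplicity: {multiplicity}",
        "atoms:",
    ]
    for symbol, x, y, z in atoms:
        # One flow-style mapping per atom; coordinates are in angstrom.
        lines.append(f"- {{atom: {symbol}, x: {x}, y: {y}, z: {z}}}")
    lines.append("history: null")
    return "\n".join(lines) + "\n"


text = format_si([("C", 1.38314, -0.22144, 0.00537)], charge=1, multiplicity=2)
print(text)
```

Hand-formatting like this skips YAML quoting rules, so names or symbols that a YAML parser would interpret specially (for example No, which many YAML 1.1 parsers read as a boolean) need quoting; a YAML library is the safer choice for real use.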