Digichem Convert
digichem convert is used to convert between different coordinate file formats.
$ digichem convert INPUT_FILE -O OUTPUT_FILE
By default, the format of both the input and output files is determined by their extension. For example:
$ digichem convert Benzene.com -O Benzene.xyz
This converts a Gaussian input file (.com) to a platform-independent XYZ (.xyz) file.
The file formats can also be explicitly specified using the --input (or -i) and --output (or -o) options:
$ digichem convert Benzene.in -O Benzene.out -i com -o xyz
$ cat Benzene.out
12
C 1.38314 -0.22144 0.00537
C 0.50694 -1.30651 -0.00792
C -0.87093 -1.09053 -0.01470
C -1.37290 0.21095 -0.00441
C -0.49670 1.29607 0.01060
C 0.88118 1.07996 0.01366
H 2.45680 -0.38978 0.00923
H 0.89795 -2.32061 -0.01321
H -1.55354 -1.93593 -0.02737
H -2.44648 0.37928 -0.00825
H -0.88777 2.31003 0.01974
H 1.56378 1.92549 0.02297
See below for a full list of supported input and output formats.
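The format selection described above can be sketched in a few lines of Python. This is only an illustration of the rule (an explicit format code takes priority, otherwise the file extension is used); resolve_format is a hypothetical helper and is not part of the Digichem API.

```python
from pathlib import Path
from typing import Optional


def resolve_format(path: str, explicit: Optional[str] = None) -> str:
    """Return the format code for a file (hypothetical helper, not Digichem's own)."""
    # An explicitly given code (as with -i or -o) always wins.
    if explicit:
        return explicit
    # Otherwise fall back to the file extension, without the leading dot.
    suffix = Path(path).suffix
    if not suffix:
        raise ValueError(f"cannot infer the format of {path!r}; specify it explicitly")
    return suffix[1:]


print(resolve_format("Benzene.com"))        # extension is used
print(resolve_format("Benzene.in", "com"))  # explicit code overrides the extension
```

Files with no extension at all cannot be resolved this way, which is one situation where an explicit format code is needed.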
Certain coordinate formats, including the Digichem native format (.si) and Gaussian input format (.com), natively support molecular charge and multiplicity.
These can be set using the --charge (or -C) and --multiplicity (or -M) options respectively:
$ digichem convert Benzene.xyz -O Benzene_rad_cation.si -C 1 -M 2
$ digichem convert Benzene.xyz -O Benzene_rad_anion.si -C -1 -M 2
Warning
Most coordinate formats do not retain electron information (charge or multiplicity). In these cases, any values set with the -C or -M options will be silently ignored.
When working with radicals and/or charged species, make sure to use a format that supports lossless conversion of charge and multiplicity (see the C&M column below), or set the charge and multiplicity appropriately with the digichem submit command.
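Because an ignored or mistyped charge or multiplicity is easy to miss, it can help to sanity-check the values before converting. The sketch below checks only the basic parity rule: the number of electrons left after pairing the unpaired ones must be even. The function names and the small element table are illustrative assumptions and are not part of Digichem.

```python
# Atomic numbers for a few common elements (extend as needed).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9, "S": 16, "Cl": 17}


def electron_count(symbols, charge):
    """Total number of electrons for the given element symbols and charge."""
    return sum(ATOMIC_NUMBERS[symbol] for symbol in symbols) - charge


def is_consistent(symbols, charge, multiplicity):
    """Return True if the multiplicity is possible for this electron count."""
    electrons = electron_count(symbols, charge)
    unpaired = multiplicity - 1  # multiplicity = 2S + 1
    if multiplicity < 1 or electrons < 0 or unpaired > electrons:
        return False
    # The remaining electrons must all be paired.
    return (electrons - unpaired) % 2 == 0


benzene = ["C"] * 6 + ["H"] * 6
print(is_consistent(benzene, 0, 1))   # neutral singlet
print(is_consistent(benzene, 1, 2))   # radical cation, doublet
print(is_consistent(benzene, 1, 1))   # cation with singlet multiplicity
```

Neutral benzene has 42 electrons, so the radical cation (41 electrons) and radical anion (43 electrons) in the commands above both require an even multiplicity such as 2.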
Some formats only store coordinates in two dimensions. This is common in formats used by 2D drawing programs, such as ChemDraw (.cdx, .cdxml, etc.) and Marvin Sketch (.cml, .mrv, etc.). Other formats may contain coordinates in either two or three dimensions, depending on how the structure was drawn (.cml, etc.), while yet others do not use coordinates to represent the molecule at all (SMILES, etc.).
When converting between formats, digichem convert will automatically detect whether the input file contains 3D coordinates.
If it does not, then a rapid force-field optimisation (using the obabel --gen3D command) will be performed to generate 3D coordinates.
In most cases, this is beneficial because the resulting 3D coordinates will be a better starting point for the full optimisation using your chosen calculation engine and method.
However, the preoptimisation will change the geometry between the input and output files. This may be unexpected, and in some cases undesirable, in which case the preoptimisation can be disabled by using the --gen3D False option:
$ digichem convert Benzene.cdx -O Benzene.xyz --gen3D False
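To give a feel for what "detecting 3D coordinates" can mean, the sketch below uses a simple heuristic: a structure is treated as 3D only if its atoms do not all share the same z value. This is an assumption made for illustration; the check that Digichem actually performs is not described here and may differ.

```python
def has_3d_coordinates(atoms, tolerance=1e-6):
    """Heuristic check on a list of (symbol, x, y, z) tuples.

    Returns False for an empty list (for example a SMILES string, which
    carries no coordinates) and for structures whose atoms all share one z.
    """
    if not atoms:
        return False
    z_values = [z for _symbol, _x, _y, z in atoms]
    return max(z_values) - min(z_values) > tolerance


# A flat 2D drawing: every z coordinate is zero.
drawing = [("C", 0.0, 0.0, 0.0), ("C", 1.4, 0.0, 0.0), ("O", 2.1, 1.2, 0.0)]
# Three atoms taken from the Benzene.xyz example above.
optimised = [
    ("C", 1.38314, -0.22144, 0.00537),
    ("C", 0.50694, -1.30651, -0.00792),
    ("H", 2.45680, -0.38978, 0.00923),
]

print(has_3d_coordinates(drawing))
print(has_3d_coordinates(optimised))
```

A heuristic like this cannot distinguish a 2D drawing from a genuinely planar molecule that happens to lie exactly in the xy plane, which is one reason an explicit switch such as --gen3D False is useful.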
Warning
The --gen3D option should not be considered a substitute for a true optimisation with a computational chemistry method.
When drawing molecules from scratch, whether in two dimensions or three, always perform a geometry optimisation before calculating other properties.
Supported File Formats
Code | Description | Read | Write | C&M |
---|---|---|---|---|
abinit | ABINIT Output Format | ✅ | ❌ | ❌ |
acesin | ACES input format | ❌ | ✅ | ❌ |
acesout | ACES output format | ✅ | ❌ | ❌ |
acr | ACR format | ✅ | ❌ | ❌ |
adf | ADF cartesian input format | ❌ | ✅ | ❌ |
adfband | ADF Band output format | ✅ | ❌ | ❌ |
adfdftb | ADF DFTB output format | ✅ | ❌ | ❌ |
adfout | ADF output format | ✅ | ❌ | ❌ |
alc | Alchemy format | ✅ | ✅ | ❌ |
aoforce | Turbomole AOFORCE output format | ✅ | ❌ | ❌ |
arc | Accelrys/MSI Biosym/Insight II CAR format | ✅ | ❌ | ❌ |
ascii | ASCII format | ❌ | ✅ | ❌ |
axsf | XCrySDen Structure Format | ✅ | ❌ | ❌ |
bgf | MSI BGF format | ✅ | ✅ | ❌ |
box | Dock 3.5 Box format | ✅ | ✅ | ❌ |
bs | Ball and Stick format | ✅ | ✅ | ❌ |
c09out | Crystal 09 output format | ✅ | ❌ | ❌ |
c3d1 | Chem3D Cartesian 1 format | ✅ | ✅ | ❌ |
c3d2 | Chem3D Cartesian 2 format | ✅ | ✅ | ❌ |
cac | CAChe MolStruct format | ❌ | ✅ | ❌ |
caccrt | Cacao Cartesian format | ✅ | ✅ | ❌ |
cache | CAChe MolStruct format | ❌ | ✅ | ❌ |
cacint | Cacao Internal format | ❌ | ✅ | ❌ |
can | Canonical SMILES format | ✅ | ✅ | ❌ |
car | Accelrys/MSI Biosym/Insight II CAR format | ✅ | ❌ | ❌ |
castep | CASTEP format | ✅ | ❌ | ❌ |
ccc | CCC format | ✅ | ❌ | ❌ |
cdjson | ChemDoodle JSON | ✅ | ✅ | ❌ |
cdx | ChemDraw binary format | ✅ | ❌ | ❌ |
cdxml | ChemDraw CDXML format | ✅ | ✅ | ❌ |
cht | Chemtool format | ❌ | ✅ | ❌ |
cif | Crystallographic Information File | ✅ | ✅ | ❌ |
ck | ChemKin format | ✅ | ✅ | ❌ |
cml | Chemical Markup Language | ✅ | ✅ | ❌ |
cmlr | CML Reaction format | ✅ | ✅ | ❌ |
cof | Culgi object file format | ✅ | ✅ | ❌ |
com | Gaussian Input | ✅ | ✅ | ✅ |
confabreport | Confab report format | ❌ | ✅ | ❌ |
CONFIG | DL-POLY CONFIG | ✅ | ✅ | ❌ |
CONTCAR | VASP format | ✅ | ✅ | ❌ |
CONTFF | MDFF format | ✅ | ✅ | ❌ |
crk2d | Chemical Resource Kit diagram(2D) | ✅ | ✅ | ❌ |
crk3d | Chemical Resource Kit 3D format | ✅ | ✅ | ❌ |
csr | Accelrys/MSI Quanta CSR format | ❌ | ✅ | ❌ |
cssr | CSD CSSR format | ❌ | ✅ | ❌ |
ct | ChemDraw Connection Table format | ✅ | ✅ | ❌ |
cub | Gaussian cube format | ✅ | ✅ | ❌ |
cube | Gaussian cube format | ✅ | ✅ | ❌ |
dallog | DALTON output format | ✅ | ❌ | ❌ |
dalmol | DALTON input format | ✅ | ✅ | ✅ |
dat | Generic Output file format | ✅ | ❌ | ❌ |
dmol | DMol3 coordinates format | ✅ | ✅ | ❌ |
dx | OpenDX cube format for APBS | ✅ | ✅ | ❌ |
ent | Protein Data Bank format | ✅ | ✅ | ❌ |
exyz | Extended XYZ cartesian coordinates format | ✅ | ✅ | ❌ |
fa | FASTA format | ✅ | ✅ | ❌ |
fasta | FASTA format | ✅ | ✅ | ❌ |
fch | Gaussian formatted checkpoint file format | ✅ | ❌ | ❌ |
fchk | Gaussian formatted checkpoint file format | ✅ | ❌ | ❌ |
fck | Gaussian formatted checkpoint file format | ✅ | ❌ | ❌ |
feat | Feature format | ✅ | ✅ | ❌ |
fh | Fenske-Hall Z-Matrix format | ❌ | ✅ | ❌ |
fhiaims | FHIaims XYZ format | ✅ | ✅ | ❌ |
fix | SMILES FIX format | ❌ | ✅ | ❌ |
fps | FPS text fingerprint format (Dalke) | ❌ | ✅ | ❌ |
fpt | Fingerprint format | ❌ | ✅ | ❌ |
fract | Free Form Fractional format | ✅ | ✅ | ❌ |
fs | Fastsearch format | ✅ | ✅ | ❌ |
fsa | FASTA format | ✅ | ✅ | ❌ |
g03 | Gaussian Output | ✅ | ❌ | ❌ |
g09 | Gaussian Output | ✅ | ❌ | ❌ |
g16 | Gaussian Output | ✅ | ❌ | ❌ |
g92 | Gaussian Output | ✅ | ❌ | ❌ |
g94 | Gaussian Output | ✅ | ❌ | ❌ |
g98 | Gaussian Output | ✅ | ❌ | ❌ |
gal | Gaussian Output | ✅ | ❌ | ❌ |
gam | GAMESS Output | ✅ | ❌ | ❌ |
gamess | GAMESS Output | ✅ | ❌ | ❌ |
gamin | GAMESS Input | ✅ | ✅ | ❌ |
gamout | GAMESS Output | ✅ | ❌ | ❌ |
gau | Gaussian Input | ✅ | ✅ | ✅ |
gjc | Gaussian Input | ✅ | ✅ | ✅ |
gjf | Gaussian Input | ✅ | ✅ | ✅ |
got | GULP format | ✅ | ❌ | ❌ |
gpr | Ghemical format | ✅ | ✅ | ❌ |
gr96 | GROMOS96 format | ❌ | ✅ | ❌ |
gro | GRO format | ✅ | ✅ | ❌ |
gukin | GAMESS-UK Input | ✅ | ✅ | ❌ |
gukout | GAMESS-UK Output | ✅ | ✅ | ❌ |
gzmat | Gaussian Z-Matrix Input | ✅ | ✅ | ✅ |
hin | HyperChem HIN format | ✅ | ✅ | ❌ |
HISTORY | DL-POLY HISTORY | ✅ | ❌ | ❌ |
inchi | InChI format | ✅ | ✅ | ❌ |
inchikey | InChIKey | ❌ | ✅ | ❌ |
inp | GAMESS Input | ✅ | ✅ | ❌ |
ins | ShelX format | ✅ | ❌ | ❌ |
jin | Jaguar input format | ✅ | ✅ | ❌ |
jout | Jaguar output format | ✅ | ❌ | ❌ |
k | Compare molecules using InChI | ❌ | ✅ | ❌ |
lmpdat | The LAMMPS data format | ❌ | ✅ | ❌ |
log | Generic Output file format | ✅ | ❌ | ❌ |
lpmd | LPMD format | ✅ | ✅ | ❌ |
mcdl | MCDL format | ✅ | ✅ | ❌ |
mcif | Macromolecular Crystallographic Info | ✅ | ✅ | ❌ |
MDFF | MDFF format | ✅ | ✅ | ❌ |
mdl | MDL MOL format | ✅ | ✅ | ❌ |
ml2 | Sybyl Mol2 format | ✅ | ✅ | ❌ |
mmcif | Macromolecular Crystallographic Info | ✅ | ✅ | ❌ |
mmd | MacroModel format | ✅ | ✅ | ❌ |
mmod | MacroModel format | ✅ | ✅ | ❌ |
mna | Multilevel Neighborhoods of Atoms (MNA) | ❌ | ✅ | ❌ |
mol | MDL MOL format | ✅ | ✅ | ❌ |
mol2 | Sybyl Mol2 format | ✅ | ✅ | ❌ |
mold | Molden format | ✅ | ✅ | ❌ |
molden | Molden format | ✅ | ✅ | ❌ |
molf | Molden format | ✅ | ✅ | ❌ |
molreport | Open Babel molecule report | ❌ | ✅ | ❌ |
moo | MOPAC Output format | ✅ | ❌ | ❌ |
mop | MOPAC Cartesian format | ✅ | ✅ | ❌ |
mopcrt | MOPAC Cartesian format | ✅ | ✅ | ❌ |
mopin | MOPAC Internal | ✅ | ✅ | ❌ |
mopout | MOPAC Output format | ✅ | ❌ | ❌ |
mp | Molpro input format | ❌ | ✅ | ❌ |
mpc | MOPAC Cartesian format | ✅ | ✅ | ❌ |
mpd | MolPrint2D format | ❌ | ✅ | ❌ |
mpo | Molpro output format | ✅ | ❌ | ❌ |
mpqc | MPQC output format | ✅ | ❌ | ❌ |
mpqcin | MPQC simplified input format | ❌ | ✅ | ❌ |
mrv | Chemical Markup Language | ✅ | ✅ | ❌ |
msi | Accelrys/MSI Cerius II MSI format | ✅ | ❌ | ❌ |
msms | M.F. Sanner’s MSMS input format | ❌ | ✅ | ❌ |
nul | Outputs nothing | ❌ | ✅ | ❌ |
nw | NWChem input format | ❌ | ✅ | ❌ |
nwo | NWChem output format | ✅ | ❌ | ❌ |
orca | ORCA output format | ✅ | ❌ | ❌ |
orcainp | ORCA input format | ❌ | ✅ | ❌ |
out | Generic Output file format | ✅ | ❌ | ❌ |
outmol | DMol3 coordinates format | ✅ | ✅ | ❌ |
output | Generic Output file format | ✅ | ❌ | ❌ |
paint | Painter format | ❌ | ✅ | ❌ |
pc | PubChem format | ✅ | ❌ | ❌ |
pcjson | PubChem JSON | ✅ | ✅ | ❌ |
pcm | PCModel Format | ✅ | ✅ | ❌ |
pdb | Protein Data Bank format | ✅ | ✅ | ❌ |
pdbqt | AutoDock PDBQT format | ✅ | ✅ | ❌ |
png | PNG 2D depiction | ✅ | ✅ | ❌ |
pointcloud | Point cloud on VDW surface | ❌ | ✅ | ❌ |
pos | POS cartesian coordinates format | ✅ | ❌ | ❌ |
POSCAR | VASP format | ✅ | ✅ | ❌ |
POSFF | MDFF format | ✅ | ✅ | ❌ |
pov | POV-Ray input format | ❌ | ✅ | ❌ |
pqr | PQR format | ✅ | ✅ | ❌ |
pqs | Parallel Quantum Solutions format | ✅ | ✅ | ❌ |
prep | Amber Prep format | ✅ | ❌ | ❌ |
pwscf | PWscf format | ✅ | ❌ | ❌ |
qcin | Q-Chem input format | ❌ | ✅ | ❌ |
qcout | Q-Chem output format | ✅ | ❌ | ❌ |
report | Open Babel report format | ❌ | ✅ | ❌ |
res | ShelX format | ✅ | ❌ | ❌ |
rinchi | RInChI | ❌ | ✅ | ❌ |
rsmi | Reaction SMILES format | ✅ | ✅ | ❌ |
rxn | MDL RXN format | ✅ | ✅ | ❌ |
sd | MDL MOL format | ✅ | ✅ | ❌ |
sdf | MDL MOL format | ✅ | ✅ | ❌ |
si | Silico Input Format | ✅ | ✅ | ✅ |
siesta | SIESTA format | ✅ | ❌ | ❌ |
smi | SMILES format | ✅ | ✅ | ❌ |
smiles | SMILES format | ✅ | ✅ | ❌ |
smy | SMILES format using Smiley parser | ✅ | ❌ | ❌ |
stl | STL 3D-printing format | ❌ | ✅ | ❌ |
svg | SVG 2D depiction | ❌ | ✅ | ❌ |
sy2 | Sybyl Mol2 format | ✅ | ✅ | ❌ |
t41 | ADF TAPE41 format | ✅ | ❌ | ❌ |
tdd | Thermo format | ✅ | ✅ | ❌ |
therm | Thermo format | ✅ | ✅ | ❌ |
tmol | TurboMole Coordinate format | ✅ | ✅ | ❌ |
txyz | Tinker XYZ format | ✅ | ✅ | ❌ |
unixyz | UniChem XYZ format | ✅ | ✅ | ❌ |
VASP | VASP format | ✅ | ✅ | ❌ |
vmol | ViewMol format | ✅ | ✅ | ❌ |
wln | Wiswesser Line Notation | ✅ | ❌ | ❌ |
xed | XED format | ❌ | ✅ | ❌ |
xml | General XML format | ✅ | ❌ | ❌ |
xsf | XCrySDen Structure Format | ✅ | ❌ | ❌ |
xtc | XTC format | ✅ | ❌ | ❌ |
xyz | XYZ cartesian coordinates format | ✅ | ✅ | ❌ |
yob | YASARA.org YOB format | ✅ | ✅ | ❌ |
zin | ZINDO input format | ❌ | ✅ | ❌ |
Read: This format can be read. Write: This format can be written. C&M: This format supports lossless charge and multiplicity information (the charge and multiplicity can be both read and written).
The Digichem native format (.si), the program-independent XYZ (.xyz), and the Gaussian input format (.com, .gjf, etc.) are handled natively.
Calculation log files are parsed using the cclib library [12].
All other formats are converted by calling the external obabel command, and thus require a functioning installation of Open Babel [11].
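For readers who want to reproduce a conversion by hand, the sketch below assembles an obabel command line in the general form documented by Open Babel (-i and -o followed by a format code, -O for the output file). The helper name build_obabel_command is hypothetical, and the exact arguments Digichem passes to obabel are an assumption; the command is only built and printed here, not executed.

```python
import shlex


def build_obabel_command(input_path, input_format, output_path, output_format,
                         gen3d=False):
    """Assemble an obabel command as a list of arguments (illustrative only)."""
    command = [
        "obabel",
        f"-i{input_format}", input_path,   # input format code, then input file
        f"-o{output_format}",              # output format code
        "-O", output_path,                 # output file
    ]
    if gen3d:
        # Ask Open Babel to generate 3D coordinates with a force field.
        command.append("--gen3D")
    return command


command = build_obabel_command("Benzene.cdx", "cdx", "Benzene.xyz", "xyz", gen3d=True)
print(shlex.join(command))
```

The resulting list could be passed to subprocess.run on a machine where Open Babel is installed.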
Digichem Native Format (.si)
The Digichem coordinate format is written in YAML, and contains six properties:
$ digichem convert Benzene.xyz -o si
version: 2.1.0
name: null
charge: 0
multiplicity: 1
atoms:
- {atom: C, x: 1.38314, y: -0.22144, z: 0.00537}
- {atom: C, x: 0.50694, y: -1.30651, z: -0.00792}
- {atom: C, x: -0.87093, y: -1.09053, z: -0.0147}
- {atom: C, x: -1.3729, y: 0.21095, z: -0.00441}
- {atom: C, x: -0.4967, y: 1.29607, z: 0.0106}
- {atom: C, x: 0.88118, y: 1.07996, z: 0.01366}
- {atom: H, x: 2.4568, y: -0.38978, z: 0.00923}
- {atom: H, x: 0.89795, y: -2.32061, z: -0.01321}
- {atom: H, x: -1.55354, y: -1.93593, z: -0.02737}
- {atom: H, x: -2.44648, y: 0.37928, z: -0.00825}
- {atom: H, x: -0.88777, y: 2.31003, z: 0.01974}
- {atom: H, x: 1.56378, y: 1.92549, z: 0.02297}
history: null
- version
  The version string of the input file, used by the Digichem parser for backwards compatibility.
- name
  An optional string describing the name of the molecule. The name is only used for descriptive purposes, and can be omitted. If it is omitted, the name of the file itself is used as the name of the molecule.
- charge
  Integer charge of the molecule.
- multiplicity
  Integer multiplicity of the molecule.
- atoms
  A list of dictionaries describing the geometry of the molecule. Each dictionary contains four key:value pairs:
    atom: The atom type as its standard element symbol.
    x: The x coordinate, in angstrom.
    y: The y coordinate, in angstrom.
    z: The z coordinate, in angstrom.
- history
  An optional unique identifier describing the calculation that generated the geometry in this file.
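As an illustration of the layout described above, the following sketch renders XYZ-style atom lines as .si text using only the Python standard library. It is a minimal sketch and not Digichem's own writer: the function name is hypothetical, the version string is simply copied from the example above, and a name containing special YAML characters would need quoting that is not handled here.

```python
def xyz_lines_to_si(atom_lines, charge=0, multiplicity=1, name=None,
                    version="2.1.0"):
    """Render 'symbol x y z' lines as .si-style YAML text (illustrative only)."""
    lines = [
        f"version: {version}",
        f"name: {'null' if name is None else name}",
        f"charge: {charge}",
        f"multiplicity: {multiplicity}",
        "atoms:",
    ]
    for line in atom_lines:
        symbol, x, y, z = line.split()
        # Coordinates are in angstrom; float() drops trailing zeros.
        lines.append(
            f"- {{atom: {symbol}, x: {float(x)}, y: {float(y)}, z: {float(z)}}}"
        )
    lines.append("history: null")
    return "\n".join(lines) + "\n"


atoms = [
    "C 1.38314 -0.22144 0.00537",
    "C -0.87093 -1.09053 -0.01470",
    "H 2.45680 -0.38978 0.00923",
]
print(xyz_lines_to_si(atoms, charge=1, multiplicity=2))
```

In practice, reading or writing the file with a YAML library is more robust than assembling the text by hand.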