Function Repository Resource:

SmilesString


Get a SMILES string for a molecule

Contributed by: Jason Biggs

ResourceFunction["SmilesString"][mol]

returns the SMILES string for the molecule mol.

Details and Options

SMILES is an acronym for simplified molecular-input line-entry system.

ResourceFunction["SmilesString"] works on "Chemical" entities as well as molecules.

ResourceFunction["SmilesString"] has the following options:

"AllBondsExplicit"          False       whether to explicitly show all bonds
"Canonical"                 True        whether to list atoms in canonical order
IncludeAromaticBonds        Automatic   whether to use aromatic or Kekulé form
"IncludedAtoms"             All         which atoms to include in the string
IncludeHydrogens            Automatic   whether to include hydrogens as distinct atoms
"Isomeric"                  True        whether to include stereochemistry and isotope information
"RootedAtom"                Automatic   the atom at which to begin the string
"WriteImplicitHydrogens"    False       whether to show all implicit hydrogens with their heavy atom

With the default options, ResourceFunction["SmilesString"][mol] is equivalent to MoleculeValue[mol,"SMILES"].

Examples

Basic Examples (3)

Get the SMILES string from a molecule:

In[1]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/da725a70-bf69-4598-9550-d1ab313c4d48"]

Out[2]=

This is equivalent to the molecule property "SMILES":

In[3]:=

Out[3]=
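
The input for this comparison is not reproduced on this page. A minimal sketch of the equivalence, assuming L-alanine built from a SMILES string as a stand-in molecule (the molecule is hypothetical, not the one used in the original example):

```wolfram
(* Hypothetical stand-in molecule: L-alanine from a SMILES string *)
mol = Molecule["C[C@H](N)C(=O)O"];

(* With default options, both calls should produce the same string *)
ResourceFunction["SmilesString"][mol] === MoleculeValue[mol, "SMILES"]
```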

Get the SMILES string without stereochemistry information:

In[4]:=

Out[4]=
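
The input cell is missing here as well. One way to drop stereochemistry, sketched with the "Isomeric" option from the table above and an assumed stand-in molecule with one chiral center:

```wolfram
(* Hypothetical stand-in molecule with a tetrahedral stereocenter *)
mol = Molecule["C[C@H](N)C(=O)O"];

(* "Isomeric" -> False omits stereochemistry and isotope information *)
ResourceFunction["SmilesString"][mol, "Isomeric" -> False]
```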

Scope (3)

Get the SMILES string for a chemical entity:

In[5]:=

Out[5]=

The SMILES string returned can be used to construct a new Molecule object:

In[6]:=

Out[6]=

Use ToEntity to get back to the entity:

In[7]:=

Out[7]=
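
The three Scope inputs are not shown on this page. The round trip they describe might look like the following sketch, where the choice of caffeine is an assumption made only for illustration:

```wolfram
(* Hypothetical example entity; any "Chemical" entity should work the same way *)
caffeine = Entity["Chemical", "Caffeine"];

(* SMILES string for the entity *)
smiles = ResourceFunction["SmilesString"][caffeine];

(* Build a Molecule from the string, then map it back to an entity *)
mol = Molecule[smiles];
ToEntity[mol]
```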

Options (8)

By default, single bonds are omitted from the string. Use the "AllBondsExplicit" option to control this:

In[8]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/73f5f9e8-0cbd-4b28-abf7-c99f1e98eb22"]

Out[8]=
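
As a self-contained illustration of the same option, the sketch below compares both settings on ethanol, which is an assumed stand-in for the molecule in the cloud-hosted input:

```wolfram
(* Hypothetical stand-in molecule: ethanol *)
ethanol = Molecule["CCO"];

(* Compare the default with explicitly written single bonds *)
ResourceFunction["SmilesString"][ethanol, "AllBondsExplicit" -> #] & /@ {False, True}
```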

Two equivalent molecules will give the same SMILES string even if their atom ordering is different:

In[9]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/d636761b-47e3-43bb-983e-9a574ed0afa7"]

Out[11]=

To disable canonicalization of the atom ordering, use "Canonical"→False:

In[12]:=

Out[12]=
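
A minimal sketch of the difference, assuming two ethanol molecules entered with opposite atom orderings (these inputs are illustrative and not the ones from the original notebook):

```wolfram
(* The same molecule written with two different atom orderings *)
m1 = Molecule["OCC"];
m2 = Molecule["CCO"];

(* Canonical strings, which should agree for equivalent molecules *)
ResourceFunction["SmilesString"] /@ {m1, m2}

(* Non-canonical strings, which follow each molecule's own atom ordering *)
ResourceFunction["SmilesString"][#, "Canonical" -> False] & /@ {m1, m2}
```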

With the default setting of IncludeAromaticBonds→Automatic, aromaticity in the SMILES string reflects the aromaticity in the Molecule expression:

In[13]:=

benzene = Molecule[{
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"]}, {
Bond[{1, 2}, "Aromatic"],
Bond[{2, 3}, "Aromatic"],
Bond[{3, 4}, "Aromatic"],
Bond[{4, 5}, "Aromatic"],
Bond[{5, 6}, "Aromatic"],
Bond[{6, 1}, "Aromatic"]}];
benzeneKekule = MoleculeModify[benzene, "Kekulize"];
ResourceFunction["SmilesString"] /@ {benzene, benzeneKekule}

Out[15]=

Giving an explicit setting for the IncludeAromaticBonds option will override this behavior:

In[16]:=

Out[16]=
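
The input for this cell is not shown. A sketch of an explicit setting, assuming benzene entered as an aromatic SMILES string rather than the atom-and-bond form used above:

```wolfram
(* Hypothetical stand-in: benzene from an aromatic SMILES string *)
benzene = Molecule["c1ccccc1"];

(* True requests the aromatic form, False the Kekulé form *)
ResourceFunction["SmilesString"][benzene, IncludeAromaticBonds -> #] & /@ {True, False}
```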

The "IncludedAtoms" option allows finding the SMILES string for a molecule fragment. The value of the option should be All or a list of atom indices:

In[17]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/951d7b53-7e79-4f08-a3e3-618d2fd92644"]

Out[17]=

Note that the SMILES for a fragment will not necessarily be a valid SMILES string:

In[18]:=

Out[18]=

When the included atoms are not bonded, the fragment SMILES will be disconnected:

In[19]:=

(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/334ebf84-95ee-435f-b011-44ca5256ca16"]

Out[19]=
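
The fragment examples above rely on cloud-hosted inputs. A self-contained sketch of both cases, assuming ethanol as a stand-in and assuming its heavy atoms are indexed in the order they appear in the input string (carbon 1, carbon 2, oxygen 3):

```wolfram
(* Hypothetical stand-in molecule: ethanol *)
ethanol = Molecule["CCO"];

(* Bonded fragment: the two carbon atoms *)
ResourceFunction["SmilesString"][ethanol, "IncludedAtoms" -> {1, 2}]

(* Non-bonded selection: atoms 1 and 3 share no bond, so the result is expected to be disconnected *)
ResourceFunction["SmilesString"][ethanol, "IncludedAtoms" -> {1, 3}]
```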

With the default setting of IncludeHydrogens→Automatic, hydrogen atoms explicitly present in a Molecule expression appear in the resulting string:

In[20]:=

Out[22]=

Giving an explicit setting for the IncludeHydrogens option will override this behavior:

In[23]:=

Out[23]=
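
The input is not reproduced here. A sketch of explicit settings for this option, with ethanol as an assumed stand-in molecule:

```wolfram
(* Hypothetical stand-in molecule: ethanol *)
ethanol = Molecule["CCO"];

(* True writes hydrogens as distinct atoms, False leaves them out *)
ResourceFunction["SmilesString"][ethanol, IncludeHydrogens -> #] & /@ {True, False}
```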

Use the "Isomeric" option to control whether isotope information is encoded:

In[24]:=

Out[25]=
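
The original input is missing. A sketch of the isotope case, assuming carbon-13 methane written with standard SMILES isotope notation as the stand-in molecule:

```wolfram
(* Hypothetical stand-in molecule: methane with a carbon-13 atom *)
labeled = Molecule["[13CH4]"];

(* With "Isomeric" -> False the isotope label should be dropped *)
ResourceFunction["SmilesString"][labeled, "Isomeric" -> #] & /@ {True, False}
```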

Double-bond and tetrahedral stereochemistry are controlled by this option as well:

In[26]:=

ResourceFunction["SmilesString"][Molecule[{
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"]}, {
Bond[{1, 2}, "Single"],
Bond[{2, 3}, "Single"],
Bond[{3, 4}, "Single"],
Bond[{4, 5}, "Double"],
Bond[{5, 6}, "Single"],
Bond[{6, 7}, "Single"],
Bond[{7, 8}, "Single"],
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "Isomeric" -> #] & /@ {True, False}

Out[26]=

Use "RootedAtom"→n to create a SMILES string starting at the atom with index n:

In[27]:=

listOfSmiles = ResourceFunction["SmilesString"][Molecule[{
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"],
Atom["C"]}, {
Bond[{1, 2}, "Single"],
Bond[{2, 3}, "Single"],
Bond[{3, 4}, "Single"],
Bond[{4, 5}, "Double"],
Bond[{5, 6}, "Single"],
Bond[{6, 7}, "Single"],
Bond[{7, 8}, "Single"],
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "RootedAtom" -> #] & /@ Range[9];
Column[listOfSmiles]

Out[28]=

These SMILES strings all create equivalent molecules:

In[29]:=

Out[29]=
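
The check itself is not shown on this page. One way to verify the claim is sketched below on a smaller, assumed stand-in molecule (isopropanol), using MoleculeEquivalentQ to compare each rebuilt molecule with the original:

```wolfram
(* Hypothetical stand-in molecule: isopropanol, with four heavy atoms *)
mol = Molecule["CC(C)O"];

(* One SMILES string per choice of starting heavy atom *)
rooted = Table[ResourceFunction["SmilesString"][mol, "RootedAtom" -> n], {n, 4}];

(* Rebuild a molecule from each string and compare it with the original *)
MoleculeEquivalentQ[mol, Molecule[#]] & /@ rooted
```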

Implicit hydrogens are not included in a SMILES string when their presence can be inferred from normal valence rules. Use "WriteImplicitHydrogens"→True to write all implicit hydrogens:

In[30]:=

Out[31]=
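
The input cell is missing here. A sketch comparing both settings, with ethanol as an assumed stand-in molecule:

```wolfram
(* Hypothetical stand-in molecule: ethanol *)
ethanol = Molecule["CCO"];

(* True writes each heavy atom together with its implicit hydrogen count *)
ResourceFunction["SmilesString"][ethanol, "WriteImplicitHydrogens" -> #] & /@ {False, True}
```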

Publisher

JasonB

Version History

1.0.0 – 29 July 2020

Related Resources

License Information

This work is licensed under a Creative Commons Attribution 4.0 International License
