
Function Repository Resource:

SmilesString


Get a SMILES string for a molecule

Contributed by: Jason Biggs

ResourceFunction["SmilesString"][mol]

returns the SMILES string for the molecule mol.

Details and Options

SMILES is an acronym for simplified molecular-input line-entry system.
ResourceFunction["SmilesString"] works on "Chemical" entities as well as molecules.
ResourceFunction["SmilesString"] has the following options:
"AllBondsExplicit"          False       whether to explicitly show all bonds
"Canonical"                 True        whether to list atoms in canonical order
IncludeAromaticBonds        Automatic   whether to use aromatic or Kekulé form
"IncludedAtoms"             All         which atoms to include in the string
IncludeHydrogens            Automatic   whether to include hydrogens as distinct atoms
"Isomeric"                  True        whether to include stereochemistry and isotope information
"RootedAtom"                Automatic   the atom at which to begin the string
"WriteImplicitHydrogens"    False       whether to show all implicit hydrogens with their heavy atom
With the default options, ResourceFunction["SmilesString"][mol] is equivalent to MoleculeValue[mol,"SMILES"].
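
Options from the table above can be combined in a single call. The following is a minimal sketch, not taken from the original notebook; the molecule (ethanol, built from the SMILES string "CCO") is chosen only for illustration:

```wolfram
(* Hypothetical example molecule: ethanol, chosen only for illustration *)
mol = Molecule["CCO"];

(* Write every bond symbol and every implicit hydrogen explicitly *)
ResourceFunction["SmilesString"][mol,
 "AllBondsExplicit" -> True,
 "WriteImplicitHydrogens" -> True]
```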

Examples

Basic Examples (3) 

Get the SMILES string from a molecule:

In[1]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/da725a70-bf69-4598-9550-d1ab313c4d48"]
Out[2]=
BERJAYA

This is equivalent to the molecule property "SMILES":

In[3]:=
SameQ[%, MoleculeValue[m, "SMILES"]]
Out[3]=
True

Get the SMILES string without stereochemistry information:

In[4]:=
ResourceFunction["SmilesString"][m, "Isomeric" -> False]
Out[4]=
BERJAYA

Scope (3) 

Get the SMILES string for a chemical entity:

In[5]:=
ResourceFunction["SmilesString"][
 Entity["Chemical", "QuercetinDihydrate"], IncludeHydrogens -> False]
Out[5]=
BERJAYA

The SMILES string returned can be used to construct a new Molecule object:

In[6]:=
Molecule[%]
Out[6]=
BERJAYA

Use ToEntity to get back to the entity:

In[7]:=
ToEntity@%
Out[7]=
BERJAYA

Options (8) 

By default, single bonds are omitted from the string. Use the "AllBondsExplicit" option to control this:

In[8]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/73f5f9e8-0cbd-4b28-abf7-c99f1e98eb22"]
Out[8]=
BERJAYA

Two equivalent molecules will give the same SMILES string even if their atom ordering is different:

In[9]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/d636761b-47e3-43bb-983e-9a574ed0afa7"]
Out[11]=
BERJAYA

To disable canonicalization of the atom ordering, use "Canonical" -> False:

In[12]:=
ResourceFunction["SmilesString"][#, "Canonical" -> False] & /@ {m, m2}
Out[12]=
BERJAYA

With the default setting of IncludeAromaticBonds -> Automatic, aromaticity in the SMILES string reflects the aromaticity in the Molecule expression:

In[13]:=
benzene = Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Aromatic"], 
Bond[{2, 3}, "Aromatic"], 
Bond[{3, 4}, "Aromatic"], 
Bond[{4, 5}, "Aromatic"], 
Bond[{5, 6}, "Aromatic"], 
Bond[{6, 1}, "Aromatic"]}];
benzeneKekule = MoleculeModify[benzene, "Kekulize"];
ResourceFunction["SmilesString"] /@ {benzene, benzeneKekule}
Out[15]=
BERJAYA

Giving an explicit setting for the IncludeAromaticBonds option will override this behavior:

In[16]:=
Table[ResourceFunction["SmilesString"][mol, IncludeAromaticBonds -> bool], {bool, {True, False}}, {mol, {benzene, benzeneKekule}}]
Out[16]=
BERJAYA

The "IncludedAtoms" option allows finding the SMILES string for a molecule fragment. The value of the option should be All or a list of atom indices:

In[17]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/951d7b53-7e79-4f08-a3e3-618d2fd92644"]
Out[17]=
BERJAYA

Note that the SMILES string for a fragment will not necessarily be valid on its own:

In[18]:=
Molecule@%
BERJAYA
Out[18]=
BERJAYA
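
One way to test whether a fragment string parses, without printing messages, is to wrap the construction in Quiet and check the result with MoleculeQ. This is a minimal sketch; validSmilesQ is a hypothetical helper name and is not part of the resource function:

```wolfram
(* validSmilesQ is a hypothetical helper, not part of ResourceFunction["SmilesString"] *)
(* Molecule stays unevaluated for strings it cannot interpret, so MoleculeQ then gives False *)
validSmilesQ[s_String] := Quiet[MoleculeQ[Molecule[s]]]

(* Try it on a well-formed string (ethanol) *)
validSmilesQ["CCO"]
```

The same helper can be applied to any string returned with the "IncludedAtoms" option before passing it on to Molecule.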

When the included atoms are not bonded, the fragment SMILES will be disconnected:

In[19]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/334ebf84-95ee-435f-b011-44ca5256ca16"]
Out[19]=
BERJAYA

With the default setting of IncludeHydrogens -> Automatic, hydrogen atoms explicitly present in a Molecule expression will be in the resulting string:

In[20]:=
benzene = Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Aromatic"], 
Bond[{2, 3}, "Aromatic"], 
Bond[{3, 4}, "Aromatic"], 
Bond[{4, 5}, "Aromatic"], 
Bond[{5, 6}, "Aromatic"], 
Bond[{6, 1}, "Aromatic"]}];
benzeneWithH = MoleculeModify[benzene, "AddHydrogens"];
ResourceFunction["SmilesString"] /@ {benzene, benzeneWithH}
Out[22]=
BERJAYA

Giving an explicit setting for the IncludeHydrogens option will override this behavior:

In[23]:=
Table[ResourceFunction["SmilesString"][mol, IncludeHydrogens -> bool], {bool, {True, False}}, {mol, {benzene, benzeneWithH}}]
Out[23]=
BERJAYA

Use the "Isomeric" option to control whether isotope information is encoded:

In[24]:=
m = Molecule[{Entity["Isotope", "Hydrogen2"], "O", Entity["Isotope", "Hydrogen3"]}, {Bond[{1, 2}], Bond[{2, 3}]}];
ResourceFunction["SmilesString"][m, "Isomeric" -> #] & /@ {True, False}
Out[25]=
BERJAYA

Double-bond and tetrahedral stereochemistry are controlled by this option as well:

In[26]:=
ResourceFunction["SmilesString"][Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Single"], 
Bond[{2, 3}, "Single"], 
Bond[{3, 4}, "Single"], 
Bond[{4, 5}, "Double"], 
Bond[{5, 6}, "Single"], 
Bond[{6, 7}, "Single"], 
Bond[{7, 8}, "Single"], 
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "Isomeric" -> #] & /@ {True, False}
Out[26]=
BERJAYA

Use "RootedAtom" -> n to create a SMILES string starting at the atom with index n:

In[27]:=
listOfSmiles = ResourceFunction["SmilesString"][Molecule[{
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"], 
Atom["C"]}, {
Bond[{1, 2}, "Single"], 
Bond[{2, 3}, "Single"], 
Bond[{3, 4}, "Single"], 
Bond[{4, 5}, "Double"], 
Bond[{5, 6}, "Single"], 
Bond[{6, 7}, "Single"], 
Bond[{7, 8}, "Single"], 
Bond[{6, 9}, "Single"]}, StereochemistryElements -> {<|"StereoType" -> "Tetrahedral", "ChiralCenter" -> 6, "Direction" -> "Counterclockwise", "FiducialAtom" -> 5, "Ligands" -> {7, 9}|>, <|"StereoType" -> "DoubleBond", "StereoBond" -> {4, 5}, "Ligands" -> {3, 6}, "Value" -> "Together"|>}], "RootedAtom" -> #] & /@ Range[9];
Column[listOfSmiles]
Out[28]=
BERJAYA

These SMILES strings all create equivalent molecules:

In[29]:=
MoleculeEquivalentQ @@ Molecule /@ listOfSmiles
Out[29]=
True

Implicit hydrogens are not included in a SMILES string when their presence can be inferred from normal valence rules. Use "WriteImplicitHydrogens" -> True to write all implicit hydrogens:

In[30]:=
mol = Molecule["hexane", IncludeHydrogens -> False];
ResourceFunction["SmilesString"][mol, "WriteImplicitHydrogens" -> #] & /@ {True, False}
Out[31]=
BERJAYA

Publisher

JasonB

Version History

  • 1.0.0 – 29 July 2020
