SPIDeR, the structure generation module of the SPROUT Toolkit, aims to
generate skeletons or molecular graphs that
satisfy steric constraints.
[Figure: an example of the constraints]
The steric constraints consist of a boundary, usually defined by the solvent
accessible surface of a receptor site, and a set of
target sites, which are small regions
of space that model localised interactions between the growing ligand and the
receptor site. See HIPPO for more details
about the target sites.
Skeletons are built by the stepwise joining of small molecular
fragments, called templates.
Templates are 3D molecular graphs where the edges of a graph represent chemical
bonds and the vertices of a graph represent generalised atoms, i.e., they are
defined by hybridisation state but not element type.
[Figure: templates as 3D molecular graphs]
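As an illustration of this representation, the following Python sketch stores a template as a list of generalised atoms (hybridisation state and 3D position, no element type) plus a set of bonds. The class and field names (`Hybridisation`, `Vertex`, `Template`) and the coordinates are assumptions made for this example; they are not taken from the SPROUT source code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List, Set, Tuple


class Hybridisation(Enum):
    """Hybridisation states of a generalised atom (illustrative subset)."""
    SP3 = "sp3"
    SP2 = "sp2"


@dataclass(frozen=True)
class Vertex:
    """A generalised atom: hybridisation and position, but no element type."""
    hybridisation: Hybridisation
    position: Tuple[float, float, float]


@dataclass
class Template:
    """A 3D molecular graph: vertices are generalised atoms, edges are bonds."""
    name: str
    vertices: List[Vertex]
    bonds: Set[FrozenSet[int]] = field(default_factory=set)

    def add_bond(self, i: int, j: int) -> None:
        """Add an undirected bond between vertices i and j."""
        n = len(self.vertices)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"invalid bond ({i}, {j})")
        self.bonds.add(frozenset((i, j)))

    def neighbours(self, i: int) -> List[int]:
        """Return the indices of the vertices bonded to vertex i."""
        return sorted(j for bond in self.bonds if i in bond for j in bond if j != i)


# Hypothetical three-atom sp3 chain template with made-up coordinates.
chain = Template(
    name="sp3-chain-3",
    vertices=[
        Vertex(Hybridisation.SP3, (0.00, 0.00, 0.00)),
        Vertex(Hybridisation.SP3, (1.26, 0.87, 0.00)),
        Vertex(Hybridisation.SP3, (2.52, 0.00, 0.00)),
    ],
)
chain.add_bond(0, 1)
chain.add_bond(1, 2)
print(chain.neighbours(1))
```

Because the vertices carry no element type, one such template can stand for every substitution pattern that shares the same geometry and hybridisation.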
Template joining operations include the fusing of templates,
spiro joining templates and forming a new single bond between two templates.
[Figure: template joining]
When a new single bond is formed, a number of conformations are
produced about the new bond. The symmetry of templates is taken into account
and a number of template joining rules exist to increase the efficiency of
the program and also to prevent the formation of unlikely substructures.
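One way to picture the conformations produced about a new single bond is to rotate the atoms on one side of the bond around the bond axis in equal angular steps. The sketch below does this with Rodrigues' rotation formula. The function names and the choice of equally spaced angles are assumptions for illustration; the text does not state how SPIDeR samples the torsion angle.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_about_axis(point: Point, axis_start: Point, axis_end: Point,
                      angle: float) -> Point:
    """Rotate `point` by `angle` radians about the line axis_start -> axis_end."""
    axis = [axis_end[i] - axis_start[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:
        raise ValueError("the bond axis must have non-zero length")
    k = [c / norm for c in axis]                      # unit vector along the bond
    v = [point[i] - axis_start[i] for i in range(3)]  # point relative to the axis
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(k[i] * v[i] for i in range(3))
    # Rodrigues' formula: v cos(a) + (k x v) sin(a) + k (k . v) (1 - cos(a))
    rotated = [v[i] * cos_a + cross[i] * sin_a + k[i] * dot * (1.0 - cos_a)
               for i in range(3)]
    return (rotated[0] + axis_start[0],
            rotated[1] + axis_start[1],
            rotated[2] + axis_start[2])


def conformations(moving_points: Sequence[Point], axis_start: Point,
                  axis_end: Point, n_steps: int) -> List[List[Point]]:
    """Return n_steps copies of the moving atoms, rotated in equal angular steps."""
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return [
        [rotate_about_axis(p, axis_start, axis_end, 2.0 * math.pi * step / n_steps)
         for p in moving_points]
        for step in range(n_steps)
    ]


# Three conformations (0, 120 and 240 degrees) of one atom about a bond along z.
for conformation in conformations([(1.0, 0.0, 1.5)], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 3):
    print(conformation)
```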
The template library consists of chain and ring templates. Acyclic templates
of between 1 and 4 atoms are included in the library and they can be joined
into larger fragments by
forming new bonds between them. Thus, any chain structure of sp3 and
sp2 atoms can be built.
The ring templates are listed below. Some of them
are represented by more than one conformation. In these cases,
the number of conformations
is displayed inside the ring.
The method of structure generation used in the first version of SPROUT
has been published [2].
More recently, we have developed a new method for structure generation.
This method is summarised here and will be described in more detail
in the near future [3]. The program has been
tested and is currently being used by several pharmaceutical companies.
Templates are positioned at all of the target sites prior to skeleton
generation. These templates become partial skeletons (see EleFAnT for more details).
Partial skeletons are grown outwards from each of the target sites by
the stepwise joining of templates.
Partial skeletons originating from different target sites are connected
by superimposing a template that is common to both partial skeletons.
The partial skeletons are oriented by a geometrical
docking method after each joining or connection operation.
The generated structures undergo a series of tests to check if they
satisfy the user defined parameters.
The problem space for skeleton generation is represented by a number of trees,
i.e. a forest of trees. Each tree in the forest is
associated with either a single target site or a group of target sites.
Each node (branch junction or leaf)
of a tree represents a partial skeleton that satisfies the target sites
covered by the tree. The roots of the trees represent the target sites.
[Figure: schematic representation of the problem space as a forest of trees]
The trees of the forest are explored in tree pair connection phases.
Each tree pair connection phase takes two trees as input and
results in a single combined tree. Thus, as the search progresses, the number of
trees used to represent the problem decreases until finally, when all of the
search space has been explored, the results are represented by one tree, the
solution tree.
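The way the forest shrinks to a single solution tree can be sketched as a simple loop. In this minimal Python sketch each tree is reduced to the set of target sites it covers, and `connect_pair` is a hypothetical stand-in for a whole tree pair connection phase. Taking the first two trees in the list is also an assumption, since the selection criterion is not specified here.

```python
from typing import Callable, FrozenSet, List, Sequence

Tree = FrozenSet[str]  # toy stand-in: a tree is just the target sites it covers


def reduce_forest(trees: Sequence[Tree],
                  connect_pair: Callable[[Tree, Tree], Tree]) -> Tree:
    """Repeatedly replace two trees by their combined tree until one remains."""
    forest: List[Tree] = list(trees)
    if not forest:
        raise ValueError("the forest must contain at least one tree")
    while len(forest) > 1:
        first = forest.pop(0)    # assumed selection rule: take the first two trees
        second = forest.pop(0)
        forest.append(connect_pair(first, second))
    return forest[0]


phases = []


def toy_connect(first: Tree, second: Tree) -> Tree:
    """Hypothetical connection phase: record the pair and merge the site sets."""
    phases.append((first, second))
    return first | second


solution = reduce_forest(
    [frozenset({"site1"}), frozenset({"site2"}), frozenset({"site3"})],
    toy_connect,
)
print(sorted(solution), len(phases))
```

Each phase removes two trees and adds one, so a forest of n trees always needs n - 1 connection phases.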
A tree pair connection phase consists of:
selecting the two trees to connect;
performing a Breadth First Search (BFS) on the first tree (the BF tree);
performing a Depth First Search (DFS) on the second tree (the DF tree);
replacing the two source trees by the combined tree of the connection phase.
In the BFS phase, the BF tree is grown by applying join operations in each
node expansion until all the leaves span at least half the distance between
the target sites of the BF and DF trees.
[Figure: node expansion]
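The stopping rule of the BFS phase can be illustrated with a toy model in which each template simply adds a fixed length to the reach of a partial skeleton. This is a deliberate simplification: the real program works in 3D and docks every skeleton, and the template names and lengths below are invented for the example.

```python
from collections import deque
from typing import Dict, List, Tuple

Path = Tuple[str, ...]  # the templates joined so far, in order


def grow_bf_tree(template_lengths: Dict[str, float],
                 site_distance: float) -> List[Path]:
    """Expand nodes breadth first until every leaf spans half the site distance."""
    if any(length <= 0 for length in template_lengths.values()):
        raise ValueError("template lengths must be positive for the search to end")
    half = site_distance / 2.0
    queue = deque([((), 0.0)])       # (path, reach) pairs; the root has reach 0
    leaves: List[Path] = []
    while queue:
        path, reach = queue.popleft()
        if reach >= half:
            leaves.append(path)      # far enough: this node stays a leaf
            continue
        for name, length in sorted(template_lengths.items()):
            queue.append((path + (name,), reach + length))  # one join per template
    return leaves


# Hypothetical library: a chain template and a ring template.
leaves = grow_bf_tree({"chain": 1.5, "ring": 2.5}, site_distance=6.0)
for leaf in leaves:
    print(leaf)
```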
During the DFS phase the nodes in the DF tree are expanded.
Following each node expansion,
a connection is attempted between each expanded node in the DF tree
and all the nodes
of the BF tree that have a template common to the expanded node.
The successful connections result in new nodes in the combined tree.
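Finding the BF tree nodes that share a template with a newly expanded DF node can be sketched with an index from template name to node. The node identifiers, template names and function names below are hypothetical. The sketch covers only the candidate lookup, not the geometric connection attempt that follows it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_by_template(bf_nodes: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each template name to the BF tree nodes whose skeleton contains it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for node_id, templates in bf_nodes.items():
        for name in set(templates):
            index[name].append(node_id)
    return index


def connection_candidates(expanded_templates: Iterable[str],
                          index: Dict[str, List[str]]) -> List[str]:
    """Return the BF nodes that have a template in common with the expanded node."""
    candidates = set()
    for name in set(expanded_templates):
        candidates.update(index.get(name, ()))
    return sorted(candidates)


# Hypothetical BF tree nodes and the templates in their partial skeletons.
bf_index = index_by_template({
    "bf1": ["benzene", "ethane"],
    "bf2": ["cyclohexane"],
    "bf3": ["benzene"],
})
print(connection_candidates(["methane", "benzene"], bf_index))
```

Building the index once per phase means that each expanded DF node is matched without scanning every node of the BF tree.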
The skeletons resulting from the connection of partial skeletons
are positioned
to satisfy the geometric constraints of the target sites by a
geometric docking process. A connected skeleton
must be positioned so that: it covers all the target sites that
were covered by the individual partial skeletons; none of its vertices
violate the boundary; and its position is optimised relative to the
remaining target sites.
The algorithm that is applied is a
Directional Least Squares Fit (DLSF). It is based on
[1], but is extended to optimise the
directions of bonds as well as the positions of the vertices.
The algorithm also applies some wriggling (small rotations) and translations
to attempt to reach the target sites that are still uncovered.
Both the connected and the expanded skeletons are optimised within the cavity
to avoid violating the solvent accessible surface and to reach the closest
possible position to the goal target site(s) without losing contact with
the satisfied target sites. This positioning procedure consists of the same
steps outlined above, but also uses additional conditions and parameters.
The speed of the docking process is very important as it is applied to each of
the partial skeletons generated during the combinatorial search.
The method is very fast (it processes hundreds of skeletons per second on a
Silicon Graphics Indy 4000) because it is purely geometry based and does not
perform any energy calculation. The distance calculation is accelerated
by a precalculated quasi-cubic distance grid.
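The idea of a precalculated distance grid can be sketched as follows: the distance from every grid point to the nearest boundary point is computed once, and each later query is a table lookup. The sketch uses a plain cubic grid with nearest-point lookup, which is an assumption; the details of the quasi-cubic grid used by the program are not given here, and all names are illustrative.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float, float]
Index = Tuple[int, int, int]


def build_distance_grid(boundary_points: Sequence[Point], origin: Point,
                        spacing: float, shape: Index) -> Dict[Index, float]:
    """Precompute the distance from each grid point to the nearest boundary point."""
    if not boundary_points or spacing <= 0:
        raise ValueError("need at least one boundary point and a positive spacing")
    grid: Dict[Index, float] = {}
    for idx in product(range(shape[0]), range(shape[1]), range(shape[2])):
        point = tuple(origin[k] + idx[k] * spacing for k in range(3))
        grid[idx] = min(math.dist(point, b) for b in boundary_points)
    return grid


def lookup_distance(grid: Dict[Index, float], origin: Point, spacing: float,
                    shape: Index, position: Point) -> float:
    """Approximate the boundary distance at `position` by the nearest grid point."""
    idx = tuple(
        min(max(round((position[k] - origin[k]) / spacing), 0), shape[k] - 1)
        for k in range(3)
    )
    return grid[idx]


# One boundary point at the origin and a 4 x 4 x 4 grid with 0.5 spacing.
origin, spacing, shape = (0.0, 0.0, 0.0), 0.5, (4, 4, 4)
grid = build_distance_grid([(0.0, 0.0, 0.0)], origin, spacing, shape)
print(lookup_distance(grid, origin, spacing, shape, (1.4, 0.1, 0.0)))
```

The grid trades memory and some accuracy (set by the spacing) for constant-time distance queries, which matters when the check is repeated for every vertex of every partial skeleton.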
There are a number of parameters that can be used to limit the large diversity of
structures that are possible solutions.
These limits also help to reduce the combinatorial explosion.
A brief description of some of these parameters is
given below.
Vertex limit
The maximum number of heavy (non-hydrogen) atoms in a skeleton
Ring-3 limit
The maximum number of 3-membered rings in a skeleton
Ring-4 limit
The maximum number of 4-membered rings in a skeleton
Ring-5 limit
The maximum number of 5-membered rings in a skeleton
Ring-6 limit
The maximum number of 6-membered rings in a skeleton
Chain length
The maximum number of consecutive acyclic bonds in a skeleton
Rotatable bonds
The maximum number of rotatable bonds in a skeleton
Spiro joins
The maximum number of spiro joins in a skeleton
Fuse joins
The maximum number of fused bonds in a skeleton
Ring ratio
The minimum percentage of ring vertices required in a skeleton
Van der Waals energy cut-off
The maximum allowed intra-molecular VdW interaction energy (kJ/mol)
Strain energy cut-off
The maximum allowed conformational strain energy (kJ/mol)
Rotatable bond penalty
The energy penalty for each rotatable bond (added to the strain energy)
Accessible surface tolerance
The probe radius of the accessible surface for skeleton generation
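A few of these parameters can be sketched as a simple filter applied to each generated skeleton. The Python example below covers only the vertex limit, the ring-5 and ring-6 limits and the ring ratio. The class and field names are hypothetical, and the ring ratio is treated as a fraction between 0 and 1 (an assumption, chosen to match the value 0.33 quoted for the test run).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    """A subset of the user-defined parameters (names are illustrative)."""
    max_vertices: int
    max_ring5: int
    max_ring6: int
    min_ring_ratio: float  # assumed to be a fraction, not a percentage


@dataclass(frozen=True)
class SkeletonSummary:
    """Counts describing one generated skeleton."""
    vertices: int
    ring_vertices: int
    ring5: int
    ring6: int


def violated_limits(skeleton: SkeletonSummary, limits: Limits) -> List[str]:
    """Return the names of the parameters that the skeleton violates."""
    violated = []
    if skeleton.vertices > limits.max_vertices:
        violated.append("vertex limit")
    if skeleton.ring5 > limits.max_ring5:
        violated.append("ring-5 limit")
    if skeleton.ring6 > limits.max_ring6:
        violated.append("ring-6 limit")
    ratio = skeleton.ring_vertices / skeleton.vertices if skeleton.vertices else 0.0
    if ratio < limits.min_ring_ratio:
        violated.append("ring ratio")
    return violated


# Values quoted for the trypsin test run: 20 vertices, one 5-ring, two 6-rings, 0.33.
limits = Limits(max_vertices=20, max_ring5=1, max_ring6=2, min_ring_ratio=0.33)
print(violated_limits(SkeletonSummary(vertices=18, ring_vertices=11, ring5=1, ring6=1), limits))
print(violated_limits(SkeletonSummary(vertices=22, ring_vertices=6, ring5=0, ring6=1), limits))
```

Checking the limits on partial skeletons as well as on complete ones is what allows them to cut down the combinatorial explosion, since a partial skeleton that already exceeds a maximum cannot lead to a valid solution.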
The user can interrupt the search process, browse the search trees with
graphical tools to monitor progress, and interact with
the search process itself. The search can be stopped after any node expansion,
and there are then many possibilities for guiding the search. For example:
The order in which the trees are processed can be altered.
To speed up the search, a tree pair connection phase can be aborted
before completion. The current BF
and DF trees are then deleted and processing resumes with the next tree pair connection,
giving a subset of the possible solutions.
Nodes can be pruned from any of the trees at any time. Pruning a node also
results in the pruning of all its successor nodes.
The set of spacer templates
currently in use can be altered at any time during the search by removing
templates or adding new ones to the set.
The operations that can be performed
on a node, or on several nodes, in a tree can be altered, e.g. by preventing
certain types of joining (fusion, spiro or new bond) of new templates to partial
skeletons.
At one extreme, a specific skeleton can be interactively
built by manually specifying individual templates and joining operations.
In practice, interaction is more useful at a higher level.
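Of the interactions listed above, pruning has the simplest behaviour to sketch: removing a node also removes every node below it. The Python example below stores a tree as a mapping from each node to its list of children. This data layout and the node names are assumptions for illustration only.

```python
from typing import Dict, List, Set


def prune(children: Dict[str, List[str]], node: str) -> Set[str]:
    """Remove `node` and all of its successors from the tree; return what was removed."""
    removed: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        removed.add(current)
        # Dropping the entry detaches the node; its children are visited next.
        stack.extend(children.pop(current, []))
    for kids in children.values():
        kids[:] = [kid for kid in kids if kid not in removed]  # unlink from parents
    return removed


# Hypothetical search tree: root -> a, b; a -> a1, a2; b -> b1.
tree = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [], "a2": [], "b1": [],
}
removed = prune(tree, "a")
print(sorted(removed), tree["root"])
```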
A test run was performed for generating ligands that can bind to the
APPA binding site of Trypsin. Five target sites (generated by HIPPO) were
used: two compound donor sites where the amidine group nitrogen atoms of APPA
are bound, an acceptor and a dual (donor and acceptor) site
at the carboxy group of APPA and a covalent site. The selected template set
included chair cyclohexane, five- and six-membered aromatic rings and acyclic
templates. The minimum ring ratio was set to 0.33, the number of vertices
was limited to 20, the number of 5-membered rings was limited to 1, and
the number of 6-membered rings was limited to 2.
The CPU requirement of the run was 31 minutes and 18 seconds on a Silicon
Graphics Indy 4000 machine. The program generated 20 solutions that
satisfy all the target sites as well as the steric and parametric constraints.
The first version of SPROUT [2] used a
different algorithm for structure generation that samples the problem space
by choosing
discrete but fixed orientations for the skeletons. Therefore it was limited
to an arbitrary subset of the problem space, and was not able to find
solutions for some (theoretically solvable) problems.
The version outlined here (SPROUT2) is more exhaustive as it explores structure space
as a continuum. It is able to
generate a large number of solutions for a diverse set of problems and
is currently being used in a number of laboratories for structure-based
drug design.
The version of SPROUT that is currently under development uses a more
suitable target site representation
derived from the output of HIPPO, so that the structures that are
generated are restricted to those that are most likely to bind strongly
to the receptor site.
Other future developments of the program will include
improvements to the pharmacophore mode representation.