De Novo Ligand Generation and Docking

Poster presented at the 36th Buffalo Medicinal Conference, May 1995.


Introduction

An automatic, interactive computer system, called SPROUT, for de novo structure-based molecular design is currently under development at the University of Leeds. The system consists of several modules addressing different subproblems of structure-based drug design: detection of protein clefts, identification of potential interaction sites, primary molecular structure generation, conversion of primary structures into molecules, and analysis of the solutions. This poster outlines the primary structure generation method together with the docking of the generated molecular structures into a receptor site.

The structure generation is exhaustive within the bounds defined by the constraints and uses a novel systematic graph search rather than random techniques. The best solution consistent with the constraints is therefore guaranteed to be found.

The generated partial structures are docked into the receptor site in every step of the search using a very fast (several hundred structures per second), purely geometric rigid body docking process which focuses on localised interaction sites, called target sites. These sites represent regions in space where ligand atoms with constrained directionality should be found. The docking method consists of several algorithms including a binary search technique combined with least squares fit, analytical and numerical optimisations.

Structure generation by bidirectional graph searching

3D chemical structure generation is a combinatorial problem. The problem space is explored by graph searching techniques, which use heuristics to prune the graph and so limit the combinatorial explosion. The number of examined graph nodes is further reduced by applying a novel bidirectional search technique.

Some functional groups are docked at the target sites prior to structure generation. Then two groups of target sites are selected and structures are generated to connect them. The resulting set of structures is used as the starting point for another connection phase. The pairwise connection phases are applied until a final set of structures satisfies all the desired target sites.

In a connection phase the structures are grown from two opposite directions at the same time and the halves are connected in the middle of the cavity. A Breadth First Search (BFS) is applied from one side, then a Depth First Search (DFS) is applied from the other side to generate all the structures that can be connected to any of the structures on the first side.

The figure above illustrates the part of the problem space that is explored by the method (grey shaded) and the saving (stripes) compared to a single graph search. Let n denote the number of levels and s the number of successors of a node. A single graph search would examine s^n nodes, whereas the simultaneous search examines only 2s^(n/2) nodes. For example, if s = 20 and n = 6, then s^n = 64,000,000 and 2s^(n/2) = 16,000.
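
The node counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not SPROUT code: the function names are invented for illustration, and n is assumed to be even so that the two half-depth searches meet exactly in the middle.

```python
def single_search_nodes(s: int, n: int) -> int:
    """Nodes examined by a one-directional search over n levels: s^n."""
    return s ** n


def bidirectional_search_nodes(s: int, n: int) -> int:
    """Nodes examined by two half-depth searches: 2 * s^(n/2).

    Assumption of this sketch: n is even, so both halves have equal depth.
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of levels")
    return 2 * s ** (n // 2)


if __name__ == "__main__":
    s, n = 20, 6
    print(single_search_nodes(s, n))         # 64000000
    print(bidirectional_search_nodes(s, n))  # 16000
```

For these values the bidirectional search examines 4,000 times fewer nodes, and the ratio grows as s^(n/2) / 2 with increasing depth.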

Growing structures by template joining

The structures are represented by vertices, which are defined by hybridisation (hence geometry) but not atom types, and bonds, which are defined by bond type (single, double, aromatic, etc.). The partial structures are grown by joining small fragments, called templates, to the seed vertex of the existing structure (initially it is the docked starting functional group, later a partial structure that already consists of several templates). Three different join types are applied in SPROUT:

A predefined discrete sampling of dihedral angles (representing low energy conformations) is applied about each new bond join. The template library consists of 3-6 membered rings and sp3 and sp2 atoms (for building chains).

Connecting partial structures

A connection is made between a pair of partial structures originating from different target sites. The candidate partial structures must have a template in common, which is overlapped during the connection. All possible pairs are examined in turn. Each resulting structure is positioned by the geometric docking method to satisfy all the target sites of both partial structures and to avoid violating the steric constraints of the receptor site. If no such position and orientation exists, the structure is rejected. The figure below shows an example of a connection:

Target sites represented by geometric regions

The interaction sites are represented by 3D geometric regions. The regions are calculated according to distance and angle tolerances for an expected interaction to a certain receptor atom. For example, a hydrogen bond acceptor region is generated for each hydrogen bond donor of the receptor site to ensure the complementarity required for molecular recognition. The minimum and maximum distance between the acceptor atom and the hydrogen together with the minimum hydrogen bond angle define the volume within which the acceptor ligand atom should lie. Similarly, a geometric region is defined for ligand donors and hydrogens to interact with acceptor atoms in the receptor site. Geometric regions for metal ion interactions are also defined by bond angle and distance tolerances. Examples of these geometric regions are shown below:

Specific, strong interactions are observed when a ligand atom forms multicentred or bifurcated hydrogen bonds to the receptor site. SPROUT can represent 8 different compound hydrogen bonding situations by appropriate geometric regions. One of these cases is a double interaction when an OH group donates its proton to an acceptor atom of the receptor site while accepting another hydrogen bond from a donor atom of the receptor. A geometric region, representing the volume within which the OH oxygen can be placed to provide this situation, is shown on the right.
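
The simple acceptor region described in this section (a distance range from the donor hydrogen plus a minimum hydrogen bond angle) can be expressed as a point membership test. The following is a minimal sketch under stated assumptions: the function name and the default tolerances (1.6 to 2.4 Å, 120 degrees) are illustrative placeholders and are not values taken from SPROUT.

```python
import math


def in_acceptor_region(donor, hydrogen, point,
                       d_min=1.6, d_max=2.4, min_angle_deg=120.0):
    """Return True if `point` could hold a ligand acceptor atom.

    donor, hydrogen: coordinates of the receptor donor atom and its hydrogen.
    point: candidate position of the ligand acceptor atom.
    The tolerances are hypothetical defaults chosen for illustration.
    """
    dist = math.dist(hydrogen, point)
    if not (d_min <= dist <= d_max):
        return False
    # Angle donor-hydrogen-acceptor, measured at the hydrogen.
    to_point = [p - h for p, h in zip(point, hydrogen)]
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    cos_angle = sum(a * b for a, b in zip(to_point, to_donor)) / (
        dist * math.dist(hydrogen, donor))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) >= min_angle_deg


if __name__ == "__main__":
    donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(in_acceptor_region(donor, hydrogen, (3.0, 0.0, 0.0)))  # linear bond
    print(in_acceptor_region(donor, hydrogen, (1.0, 2.0, 0.0)))  # 90 degree angle
```

Sampling such a test on a grid around each receptor donor would give one possible discrete representation of the region; regions for ligand donors or metal ions would use analogous distance and angle checks.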


Hierarchical Least Squares Fit

The Least Squares Fit (LSF) technique is suitable for overlapping two sets of points in 3D. It can also be used to place some atoms within given spheres by applying a weighting scheme which reflects the differences in the radii of the spheres.

A hierarchical enclosing sphere system is defined for each target site region. The outermost sphere encloses the whole region. The region is cut into two halves by a plane perpendicular to the longest dimension (see figure below). Two enclosing spheres are generated for the halves. The procedure is repeated for both parts until the radius of the enclosing sphere is smaller than the desired resolution value (e.g. a radius of 0.1 Å).
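
The recursive subdivision can be sketched as a binary tree of spheres. The code below is an illustration under several assumptions that the poster does not state: the region is represented by sample points, the enclosing sphere is centred on the centroid (so it encloses the points but is not necessarily minimal), and the cutting plane is taken perpendicular to the longest axis-aligned extent. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class SphereNode:
    centre: Point
    radius: float
    left: Optional["SphereNode"] = None
    right: Optional["SphereNode"] = None


def enclosing_sphere(points: List[Point]) -> Tuple[Point, float]:
    """Centroid-centred sphere containing all points (not the minimal one)."""
    n = len(points)
    centre = tuple(sum(p[i] for p in points) / n for i in range(3))
    return centre, max(math.dist(centre, p) for p in points)


def build_hierarchy(points: List[Point], resolution: float = 0.1) -> SphereNode:
    """Halve the region recursively until the sphere radius is below resolution."""
    centre, radius = enclosing_sphere(points)
    node = SphereNode(centre, radius)
    if radius < resolution:
        return node
    lows = [min(p[i] for p in points) for i in range(3)]
    highs = [max(p[i] for p in points) for i in range(3)]
    axis = max(range(3), key=lambda i: highs[i] - lows[i])  # longest dimension
    cut = (lows[axis] + highs[axis]) / 2.0
    lower = [p for p in points if p[axis] <= cut]
    upper = [p for p in points if p[axis] > cut]
    if lower and upper:  # guard against a degenerate split
        node.left = build_hierarchy(lower, resolution)
        node.right = build_hierarchy(upper, resolution)
    return node


def count_leaves(node: SphereNode) -> int:
    if node.left is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


if __name__ == "__main__":
    region = [(0.5 * i, 0.0, 0.0) for i in range(9)]  # points from 0.0 to 4.0
    root = build_hierarchy(region)
    print(root.centre, root.radius, count_leaves(root))
```

Because every split puts at least one point in each half, the recursion terminates, and each leaf sphere is smaller than the requested resolution.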

The positioning of the structure to satisfy the target sites is carried out by iterative application of LSF to fit the covering vertices to the centres of the spheres in the hierarchical representation. In the first iteration, the outermost enclosing sphere is used for each target region. Then, for each site, the second-level sphere that is closer to the current position of the covering vertex is selected. The procedure is iterated, searching down through the hierarchy until the leaf nodes are reached. The number of iterations is equal to the number of levels in the deepest hierarchy, i.e. the logarithm of the number of smallest spheres.
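
The descent through the hierarchy can be illustrated with a deliberately simplified fit. In the sketch below the least squares step is translation-only (the optimal shift is the mean of the vertex-to-centre differences); a full rigid-body LSF would also solve for a rotation, and the weighting by sphere radius is omitted. The names and the tiny two-level hierarchies are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    centre: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def fit_translation(vertices: List[Point], targets: List[Point]) -> Point:
    """Least squares optimal translation: mean of (target - vertex)."""
    n = len(vertices)
    return tuple(sum(t[i] - v[i] for v, t in zip(vertices, targets)) / n
                 for i in range(3))


def closer_child(node: Node, point: Point) -> Node:
    """Pick the child sphere whose centre is nearer to the vertex."""
    return min((node.left, node.right),
               key=lambda child: math.dist(child.centre, point))


def hierarchical_fit(vertices: List[Point], roots: List[Node]):
    """Fit to sphere centres level by level until every site is at a leaf."""
    nodes = list(roots)
    iterations = 0
    while True:
        shift = fit_translation(vertices, [nd.centre for nd in nodes])
        iterations += 1
        if all(nd.left is None for nd in nodes):
            return shift, nodes, iterations
        moved = [tuple(v[i] + shift[i] for i in range(3)) for v in vertices]
        nodes = [closer_child(nd, p) if nd.left is not None else nd
                 for nd, p in zip(nodes, moved)]


if __name__ == "__main__":
    site_a = Node((0.0, 0.0, 0.0), Node((-1.0, 0.0, 0.0)), Node((1.0, 0.0, 0.0)))
    site_b = Node((10.0, 0.0, 0.0), Node((9.0, 0.0, 0.0)), Node((11.0, 0.0, 0.0)))
    covering = [(0.0, 5.0, 0.0), (10.6, 5.0, 0.0)]
    shift, leaves, iterations = hierarchical_fit(covering, [site_a, site_b])
    print(shift, [leaf.centre for leaf in leaves], iterations)
```

In this toy case the loop runs once per hierarchy level, which matches the iteration count described above.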

Position optimisation by penalty function minimisation

The second phase of the docking resolves boundary violations and orients the mobile structures as close as possible to the goal target sites. A structure is mobile if it is anchored to fewer than three target sites.

The method is based on numerical optimisation techniques applied to a penalty function. The penalty function consists of three components:

S_t = distance between the covering vertex and target site t (zero if satisfied).

B_v = distance of vertex v from the boundary surface if it is outside, otherwise zero.

G_v = distance of vertex v from the goal target site.

The distances are precalculated and stored in a grid. The shape of the cavity is taken into account, i.e. the shortest route is calculated within the available volume (avoiding boundary violations). The precalculation uses a flood algorithm combined with Dijkstra's algorithm to calculate 'quasi cubic' distances. Cubic distances are measured along the main axis directions only, which can be a coarse over-estimate of the Euclidean distance. The novel quasi cubic distances are generated using a flood that also progresses in diagonal directions, with different step increments for axis directions (10 units), plane diagonals (14 units) and 3D diagonals (17 units).
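
A flood of this kind can be sketched with a priority queue. The code below is an assumption-laden illustration rather than the SPROUT implementation: the cavity is represented as a set of free integer grid cells, all 26 neighbours are allowed without any corner-cutting check, and the function name is hypothetical. Only the step increments (10, 14, 17) come from the text.

```python
import heapq
from itertools import product

# Step cost by number of axes changed: axis move, plane diagonal, 3D diagonal.
STEP_COST = {1: 10, 2: 14, 3: 17}
MOVES = [m for m in product((-1, 0, 1), repeat=3) if m != (0, 0, 0)]


def quasi_cubic_distances(free_cells, source):
    """Dijkstra flood from `source` over the free cells of a 3D grid.

    free_cells: set of (i, j, k) cells inside the available volume.
    Returns a dict mapping each reachable cell to its quasi cubic distance.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue  # stale queue entry
        for move in MOVES:
            nxt = (cell[0] + move[0], cell[1] + move[1], cell[2] + move[2])
            if nxt not in free_cells:
                continue  # outside the cavity or blocked by the receptor
            nd = d + STEP_COST[sum(abs(c) for c in move)]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


if __name__ == "__main__":
    cube = set(product(range(3), repeat=3))
    d = quasi_cubic_distances(cube, (0, 0, 0))
    print(d[(1, 0, 0)], d[(1, 1, 0)], d[(1, 1, 1)], d[(2, 2, 2)])
```

The increments approximate 10 times the Euclidean step lengths 1, √2 ≈ 1.41 and √3 ≈ 1.73, so the integer distances stay close to the true path length while remaining cheap to compute and store.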

Test results for the APPA binding site of Trypsin (PDB code 1TTP)

p-amidino-phenyl-pyruvate (APPA):

SPROUT was set up to generate molecular structures that have an amidino and a carboxy group in the key interaction positions involved in APPA binding. The bound state conformation (sp3) of the carbonyl group (at O1), which provides a covalent bond to serine oxygen, was also required. Using chair cyclohexane and benzene as the only ring spacers, the program generated 7275 partial structures for this highly constrained input in 2.5 minutes. 524 structures were successfully docked and 9 final solutions were found. Some of the final solutions are shown below:

Solution #2 has an equivalent 2D structure to the bound state of APPA. The generated 3D conformation is shown (blue) in the following figure, overlaid with the bound conformation of APPA (yellow). The receptor atoms around the binding site are also shown.

Test results for the ras P21 protein (crystallised with GDP, code 1Q21)

The natural ligand, GDP (guanosine diphosphate):

SPROUT was set up to mimic the guanine and β-phosphate groups of GDP, because these groups are known to interact strongly with the receptor. The program generated 525432 partial structures during the run (3 hours and 23 minutes), of which 50855 were successfully docked. 177 final solutions were found that had atoms in appropriate positions and orientations for the expected interactions. Some of these are shown below:

Solution #117 (blue) is shown below in the context of the receptor site, overlaid on GDP (yellow). The target site regions are highlighted in the figure.

Conclusions

A fast de novo structure generation method coupled with a geometric docking method has been developed and implemented. The generation is based on graph searching and applies a combination of BFS and DFS strategies. The search is exhaustive within the bounds defined by the template set, conformational sampling and limiting parameters. This feature is unique to SPROUT among the large number of de novo drug design programs. Although methods using random steps can quickly give some promising solutions, there is no guarantee that they will find all reasonable solutions, hence the optimal drug candidate might easily be missed.

It was shown that the program is able to generate solutions that have equivalent 2D structures and very similar 3D conformations to known bound ligands. SPROUT can also generate a large variety of promising novel structures.

Future work plans for the project include the ability to start the search from known fragments and let the program extend the structure to satisfy additional target sites, and also to provide better handling of hydrophobic sites.




Copyright © 2010 SimBioSys Inc., All rights reserved.