

Presentation at 3DSig 2008: Structural Bioinformatics and Computational Biophysics

Toronto, Canada  Jul 18-19, 2008

A novel scoring function in eHiTS and LASSO 

Z. Zsoldos

SimBioSys Inc., 135 Queen's Plate Dr, Unit 520, Toronto, ON M9W 6V1, Canada


The primary goal of most virtual screening experiments is to find new lead compounds as starting points for the drug discovery pipeline. There are two typical approaches, which are sometimes combined into a screening funnel: ligand-based methods (2D similarity, 3D pharmacophore, fingerprint, surface or other QSAR descriptors) and structure-based flexible ligand docking and scoring. The latter is often considered too slow for large-scale screening (databases of millions of structures), while the former does not provide 3D coordinates or estimated binding energies.

The fragment-based exhaustive flexible ligand docking engine of eHiTS has been published previously [1]. Here we focus on the scoring function of eHiTS, which departs from traditional atom-based interaction scoring (typical of most empirical, force-field-based and statistical scoring methods) and introduces a novel concept: scoring interactions based on Interacting Surface Points (ISP), each represented by a 3D position, a normal vector and a chemical feature type (23 types, including H-bond donor/acceptor, aromatic Pi electron, hydrophobic etc.). A statistically derived empirical scoring function is constructed using a 4-parameter geometric description of the relationship between ISP pairs. The geometry parameters are the distance between the pair of ISP, the angles between the normal vectors and the direction of the interaction, and a dihedral angle between the normals. The energy associated with each possible ISP pair is deduced from the statistics by reverse application of the Boltzmann equation. During statistics collection, the temperature factors were taken into account by applying the corresponding Gaussian functions to the atom positions, reflecting the variable uncertainty of atom positions in PDB X-ray structures. More accurate geometric statistics have been collected from CrystalEye and recently added to the PDB-derived data. Certain atoms (e.g. nitrogen in the imidazole ring) may participate in very different types of interactions at the same time (e.g. H-bonding and aromatic Pi-stacking). The ISP representation can capture these interactions better than the atom-based approach by associating multiple ISP with the same atom in different directions.
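The two ingredients described above, a 4-parameter pair geometry and an inverse-Boltzmann energy, can be illustrated with a minimal Python sketch. This is not the eHiTS implementation: the function names, the exact angle and dihedral conventions, the histogram binning, the reference distribution and the pseudo-count are all assumptions made for illustration, and the temperature-factor Gaussian smearing is left out.

```python
import math

import numpy as np

# Assumption: energies in kcal/mol at room temperature.
R_KCAL_PER_MOL_K = 0.0019872
TEMPERATURE_K = 298.15
KT = R_KCAL_PER_MOL_K * TEMPERATURE_K


def _angle(u, v):
    """Angle in radians between two vectors."""
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return math.acos(float(np.clip(cos_t, -1.0, 1.0)))


def isp_pair_geometry(p1, n1, p2, n2):
    """Return (distance, angle1, angle2, dihedral) for two surface points.

    p1, p2 are 3D positions and n1, n2 the normal vectors. Each angle is
    measured between a normal and the line joining the two points; the
    dihedral is the torsion of n1 and n2 about that line (assumed convention).
    """
    p1, n1, p2, n2 = (np.asarray(a, dtype=float) for a in (p1, n1, p2, n2))
    d = p2 - p1
    dist = float(np.linalg.norm(d))
    angle1 = _angle(n1, d)
    angle2 = _angle(n2, -d)
    # Project both normals onto the plane perpendicular to the joining line.
    axis = d / dist
    m1 = n1 - np.dot(n1, axis) * axis
    m2 = n2 - np.dot(n2, axis) * axis
    if np.linalg.norm(m1) < 1e-9 or np.linalg.norm(m2) < 1e-9:
        dihedral = 0.0  # undefined when a normal is collinear with the line
    else:
        x = float(np.dot(m1, m2))
        y = float(np.dot(np.cross(m1, m2), axis))
        dihedral = math.atan2(y, x)
    return dist, angle1, angle2, dihedral


def inverse_boltzmann_energy(observed, reference, pseudo=1e-6):
    """Per-bin energy E = -kT * ln(p_observed / p_reference)."""
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    p_obs = observed / observed.sum()
    p_ref = reference / reference.sum()
    return -KT * np.log((p_obs + pseudo) / (p_ref + pseudo))
```

In this sketch, a geometry bin that is observed more often than the reference expects receives a negative (favourable) energy, and a bin that is under-represented receives a positive one.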

The advantage of the statistically driven ISP scoring is demonstrated in a case study using the Acetylcholine Binding Protein (AChBP), which has a key cation-Pi interaction observed crystallographically for several substrates (e.g. CCE, Nicotine, Lobeline, Epibatidine) [2]. Empirical and force-field-based scoring functions fail to rank the correct binding pose highest, even when using DFT (B3LYP/6-31G**) charges. In contrast, eHiTS produces the correct pose with the best score even when using the default statistical table and weighting scheme, for which no example from this protein family was included. When the automated training script is run to include the family in the knowledge base, the energy separation between the correct pose and the other generated poses improves, giving very cleanly distinguished clusters. Furthermore, the eHiTS score correlates well with the experimentally measured log(Kd) values for the series, correctly rank-ordering the actives.
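As background for the comparison with log(Kd): if a docking score is read as an estimate of the binding free energy (an assumption here; the abstract does not state the units of the eHiTS score), the standard thermodynamic relation explains why a linear correlation with log(Kd) is the natural expectation. With Kd expressed relative to the 1 M standard state:

```latex
\Delta G^{\circ}_{\mathrm{bind}} = RT \ln K_d
\quad\Longrightarrow\quad
\log_{10} K_d = \frac{\Delta G^{\circ}_{\mathrm{bind}}}{2.303\,RT},
\qquad
2.303\,RT \approx 1.36\ \mathrm{kcal/mol} \ \text{at}\ T = 298\ \mathrm{K}
```

Under that assumption, each tenfold change in Kd corresponds to roughly 1.36 kcal/mol of binding free energy at room temperature.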

A simple count of the various ISP types present on a ligand provides a very compact descriptor of the ligand's interaction activity profile. We have used these descriptors with a machine learning technique to create a very rapid ligand-based VHTS filter called LASSO (Ligand Activity by Surface Similarity Order) [3]. The descriptor is independent of 3D conformation and focuses on interaction properties rather than connectivity or structural similarity; it is therefore capable of scaffold hopping, i.e. retrieving active ligands with a different underlying structure. LASSO is demonstrated to achieve high enrichment rates for all families included in the DUD benchmark set [4]. LASSO offers extremely rapid filtering, in excess of a million ligands per minute on a single CPU.
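The count-based descriptor can be sketched in a few lines of Python. The feature-type labels below are a hypothetical four-item subset (the abstract mentions 23 types but names only a few), and cosine similarity to a known active is used purely as a stand-in for the machine learning model in LASSO, which the abstract does not describe.

```python
import math
from collections import Counter

# Hypothetical subset of ISP feature types, for illustration only.
FEATURE_TYPES = ("hbond_donor", "hbond_acceptor", "aromatic_pi", "hydrophobic")


def isp_count_descriptor(isp_types):
    """Count how many surface points of each feature type a ligand carries.

    Only the type labels are used, so the result does not depend on the
    ligand's 3D conformation or on its bond connectivity.
    """
    counts = Counter(isp_types)
    return [counts.get(t, 0) for t in FEATURE_TYPES]


def cosine_similarity(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_types, library):
    """Rank library ligands (name -> list of ISP type labels) against a query."""
    query = isp_count_descriptor(query_types)
    scored = [
        (cosine_similarity(query, isp_count_descriptor(types)), name)
        for name, types in library.items()
    ]
    # Highest similarity first; ties broken by name for a stable order.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Because two ligands with unrelated scaffolds can carry the same mix of feature types, they can receive near-identical descriptors, which is the property that makes scaffold hopping possible.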

The eHiTS flexible docking has proved to be among the most accurate pose prediction tools [5], and combined with the LASSO ligand-based filter it provides one of the highest enrichment factors in comparative evaluation studies [6]. While LASSO can rapidly and efficiently reduce the number of candidates to be docked to a few percent of the total database, accurate flexible docking with eHiTS used to take several minutes of CPU time per ligand on traditional hardware architectures. The algorithm has recently been redesigned and reimplemented to take advantage of the Cell BE accelerator architecture, providing a 30-100 fold speed-up [7] and bringing the run time for the most accurate flexible docking down to a few seconds per ligand on a PS3 or an IBM Cell Blade.
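For readers unfamiliar with the metric, the enrichment factor mentioned above is commonly defined as the hit rate in the top-ranked fraction of a screened library divided by the hit rate of the whole library. The sketch below uses that common definition; the function name and the rounding of the cut-off are assumptions, and the cited studies may use slightly different variants.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Enrichment factor of a ranked screening list.

    ranked_is_active: booleans ordered from best to worst score, True for
    known actives. top_fraction: fraction of the list inspected, in (0, 1].
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_top = max(1, round(n_total * top_fraction))
    hits = sum(ranked_is_active[:n_top])
    # Hit rate in the selected subset relative to the overall hit rate.
    return (hits / n_top) / (n_actives / n_total)
```

A value of 1 means the ranking performs no better than random selection; the maximum attainable value is bounded by the inverse of the inspected fraction and by the inverse of the overall active rate.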

The revolutionary hardware technology requires new computational methods, replacing approximate precomputed grids with proximity look-up and explicit pair-wise interaction computation. As a result, the calculation is not only orders of magnitude faster but also provides more accurate energy predictions. The emerging technologies presented could also be applied to speed up other molecular modeling problems, e.g. QM or MD simulations and protein folding, by multiple orders of magnitude.
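The general idea of proximity look-up followed by explicit pair-wise evaluation can be sketched with a simple cell list (spatial hash). This is a generic illustration in Python, not the Cell BE implementation, whose data structures the abstract does not describe; all function names and the `pair_energy` callback are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_cell_list(points, cell_size):
    """Hash each point index into a cubic cell of edge length cell_size."""
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in point)
        cells[key].append(idx)
    return cells


def neighbors_within(query, points, cells, cell_size, cutoff):
    """Indices of points within cutoff of query (requires cutoff <= cell_size)."""
    cx, cy, cz = (math.floor(c / cell_size) for c in query)
    found = []
    # Only the 27 cells around the query can contain points inside the cutoff.
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for idx in cells.get((cx + dx, cy + dy, cz + dz), ()):
            if math.dist(query, points[idx]) <= cutoff:
                found.append(idx)
    return sorted(found)


def pairwise_energy(ligand_points, receptor_points, cell_size, cutoff, pair_energy):
    """Sum an explicit pair term over all ligand-receptor pairs within cutoff."""
    cells = build_cell_list(receptor_points, cell_size)
    total = 0.0
    for lp in ligand_points:
        for idx in neighbors_within(lp, receptor_points, cells, cell_size, cutoff):
            total += pair_energy(math.dist(lp, receptor_points[idx]))
    return total
```

Unlike a precomputed grid, which interpolates energies stored at fixed lattice points, this approach evaluates each interaction at the exact pair geometry, while the cell list keeps the number of candidate pairs per query small.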

[1] Z. Zsoldos, D. Reid, A. Simon, S.B. Sadjad, A.P. Johnson: eHiTS: a new fast, exhaustive flexible ligand docking system. Journal of Molecular Graphics and Modelling (2007) 26(1), 198-212. doi:10.1016/j.jmgm.2006.06.002
[2] S.B. Hansen, G. Sulzenbacher, T. Huxford, P. Marchot, P. Taylor, Y. Bourne: Structures of Aplysia AChBP complexes with nicotinic agonists and antagonists reveal distinctive binding interfaces and conformations. The EMBO Journal (2005) 24, 3635-3646. doi:10.1038/sj.emboj.7600828
[3] D. Reid, B.S. Sadjad, Z. Zsoldos, A. Simon: LASSO - ligand activity by surface similarity order: a new tool for ligand based virtual screening. Journal of Computer-Aided Molecular Design. doi:10.1007/s10822-007-9164-5
[4] N. Huang, B.K. Shoichet, J.J. Irwin: Benchmarking sets for molecular docking. J. Med. Chem. 49(23), 6789-6801.
[5] M. Kontoyianni, L.M. McClellan, G.S. Sokol: Evaluation of Docking Performance: Comparative Data on Docking Algorithms. J. Med. Chem. (2004) 47(3), 558-565.
[6] G.B. McGaughey, R.P. Sheridan, C.I. Bayly, C. Culberson, C. Kreatsoulas, S. Lindsley, V. Maiorov, J. Truchon, W.D. Cornell: Comparison of Topological, Shape, and Docking Methods in Virtual Screening. J. Chem. Inf. Model. (2007) 47(4), 1504-1519. doi:10.1021/ci700052x
[7] Bio-IT World article (


Copyright © 2011 SimBioSys Inc., All rights reserved.