In Memoriam: Peter Csizmadia
December 15th, 2009
I just learnt some terrible news that deeply saddened me. Peter Csizmadia, one of the founders of ChemAxon and the father of Marvin, disappeared on a mountain-climbing expedition in China in October 2009. I learnt about Peter a few months ago, when I saw his breathtaking picture on one of the summits of the world with a piece of ChemAxon memorabilia:
http://picasaweb.google.com/real.csizi/PeterCsizmadia#5409623424222466162
Very brave, and very talented - I thought - when I read about his background at the time. Since I know his brother Csizi, the other founder of ChemAxon, and many other exceptionally talented people at ChemAxon, this did not come as a surprise to me. ChemAxon IS a great team of exceptionally talented and hard-working people, conquering even the most difficult peaks.
My sincere condolences to the family, the company, and to science. The loss of Peter is a great tragedy. However, his short life was not in vain: the fruits of his work, like Marvin, will make his memory live forever.
posted by Aniko
ARChem 2009.1 is released
December 10th, 2009
2009 has been a year of major progress for ARChem, and the system has hit a number of significant milestones that secured its leading position in the field. We wanted to share a few of our achievements, and to extend our gratitude to the many users whose comments have made an impact on the system.
- Chemistry – Several changes to the chemical perception algorithms have been implemented. They improve the way target molecules are treated, and the way reaction rules are extracted and clustered from reaction databases. Those improvements have made a small set of manually coded reaction rules obsolete, and have enhanced the system’s capability to deal with some of the challenging aspects of organic synthesis, such as chemical interference, stereochemistry and regioselectivity.
- Data – As a knowledge-based system, ARChem is highly dependent on the quality and quantity of the reaction data encapsulated in commercial databases. We are therefore grateful and proud to have further tightened our relationships with two leaders of the chemical information publishing industry: Elsevier and Symyx. Both the CrossFire Beilstein and ChemInform databases have been fully integrated into the system, covering a vast spectrum of chemical reactions and offering valuable supporting information.
- Breaking up starting materials – The search down a branch of the retrosynthetic tree stops whenever a starting material from the educts database is found. Sometimes it is desirable to break such compounds down into even simpler precursors, because they are expensive to purchase, not in stock, etc. The user can now exclude selected starting materials, and find synthetic routes to those compounds.
- Viewing solutions – The ability to browse through the manifold of generated solutions has been dramatically improved by a synoptic view of reaction steps. The user can see a “preview” of the various solutions by inspecting the list of the next proposed precursors, and jump directly to the associated solutions.
- System design – ARChem is now a more complete system which can be used not only as a local installation, but also as an online service. A queueing system, security features, accelerated search times and many other features have upgraded the system's performance, accessibility and usability.
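The "Breaking up starting materials" behaviour in the list above can be illustrated with a toy sketch. It is not ARChem's algorithm: the function name `starting_materials`, the one-rule-per-compound table `RULES`, the `STOCK` set and the compound labels are all hypothetical. It only shows the idea that a branch of the retrosynthetic search stops at a stocked compound unless the user has excluded that compound.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical toy data: each product maps to the precursors of ONE reaction.
RULES: Dict[str, List[str]] = {
    "target": ["A", "B"],
    "A": ["C", "D"],
}
# Hypothetical stand-in for a starting-materials (educts) database.
STOCK: Set[str] = {"A", "B", "C", "D"}


def starting_materials(
    compound: str,
    rules: Dict[str, List[str]],
    stock: Set[str],
    excluded: FrozenSet[str] = frozenset(),
    max_depth: int = 6,
) -> Optional[List[str]]:
    """Return the sorted starting materials of a route, or None if no route exists."""
    # A branch ends as soon as a stocked compound is reached,
    # unless the user asked for that compound to be broken up further.
    if compound in stock and compound not in excluded:
        return [compound]
    # Give up on this branch if it is too deep or no rule makes the compound.
    if max_depth == 0 or compound not in rules:
        return None
    leaves: List[str] = []
    for precursor in rules[compound]:
        sub = starting_materials(precursor, rules, stock, excluded, max_depth - 1)
        if sub is None:
            return None  # one unreachable precursor invalidates the step
        leaves.extend(sub)
    return sorted(set(leaves))


print(starting_materials("target", RULES, STOCK))
print(starting_materials("target", RULES, STOCK, excluded=frozenset({"A"})))
```

With nothing excluded the toy search stops at A and B; excluding A forces the search one level deeper, to C and D. A real system would consider many alternative rules per compound and rank the resulting routes, which this sketch leaves out.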
Below is an example of a synthetic route found by ARChem for Maraviroc – an HIV drug that was developed in Pfizer’s labs in Sandwich, UK, and received FDA approval in 2007. ARChem’s solution includes 9 reactions, with 6 steps in the two longest paths. In this case, the retrosynthetic analysis leads all the way back to commercially available starting materials, shown with their corresponding providers and catalog numbers. ARChem supplies a lot more information to complete the experimental details of the synthetic scheme, such as reaction conditions, bibliographic references, and additional starting material providers and catalog numbers.
The synthetic route suggested above was generated completely automatically, with no user intervention. It is a strong demonstration of the huge potential of this concept, and of the accomplishments so far. We look forward to 2010 with plenty of items in the ARChem pipeline, and we are particularly eager to continue the dialogue with our industrial and academic users – a scientific exchange that guarantees that the development process maintains continuous, rigorous and coherent progress.
posted by Orr Ravitz
eHiTS 2009 as a Blind Docking Tool
December 1st, 2009
As the molecular docking paradigm solidifies its status as a significant tool for drug discovery, chemists explore additional applications of the methods in ways that sometimes stretch the existing algorithms to their limits. Most docking programs, including eHiTS, have not been designed or optimized to perform blind docking. In structure-based drug discovery, the user is typically expected to define, at some level of accuracy, the binding pocket in the target of interest. The binding site is determined either from known binding modes of ligands found in crystal structures of complexes, or from an educated hypothesis. There are cases, however, in which assumptions about the possible locations of binding hot spots are difficult to make or should be avoided altogether. This is the case, for example, when the existence of secondary binding sites is suspected, or when one would like to screen active ligands and other compounds against a range of targets to estimate the possibility of drug side-effects, toxicity, and other types of biological activity.
The standard eHiTS usage requires a rough definition of the binding pocket. This is done through the clip file, which should contain at least two sets of coordinates (i.e. two spatial points) located in the designated binding pocket. eHiTS then draws a box around those points, expands it to some extent in all directions and places the search grid inside that box. Then, the box is “flooded” with a virtual fluid to detect all the cavities which will define the binding surface. This is a highly automated process, but it still relies on that user-defined clipping. Commonly the native ligand, amino acids from the binding pocket, or a few atoms from either are chosen as a clip file. If eHiTS is run with the -complex option, the coordinates of the native ligand are used for clipping. However, eHiTS can also be used without any clipping. In this case, the entire receptor is considered for docking. The whole protein is flooded, and sufficiently deep clefts are searched for on its surface. The final space in which docking is performed is defined by the interconnected pockets found on the target. The search grid in such scenarios is typically large, and extensive sampling is required. Nevertheless, the computational efficiency of the eHiTS algorithm allows good sampling in reasonable timescales.
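The clipping step described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the function names `clip_box` and `inside_box`, the tuple representation of points and the 5.0 Å default margin are hypothetical, and the code does not reproduce the eHiTS clip file format, its actual box expansion, or the cavity flooding.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def clip_box(points: Iterable[Point], margin: float = 5.0) -> Box:
    """Axis-aligned box around the clip points, grown by `margin` on every side.

    The default margin is an assumed value, not the one eHiTS uses.
    """
    pts = list(points)
    if len(pts) < 2:
        raise ValueError("at least two clip points are needed")
    lower = tuple(min(p[axis] for p in pts) - margin for axis in range(3))
    upper = tuple(max(p[axis] for p in pts) + margin for axis in range(3))
    return lower, upper


def inside_box(point: Point, box: Box) -> bool:
    """True if the point lies within the box (boundaries included)."""
    lower, upper = box
    return all(lower[i] <= point[i] <= upper[i] for i in range(3))


# Two clip points, for example two atoms of a native ligand.
box = clip_box([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)], margin=1.0)
print(box)
```

In a blind docking run there are no clip points at all, so the equivalent of this box would simply enclose the whole receptor.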
Several eHiTS users have expressed specific interest in blind docking in recent months, and we therefore decided to evaluate eHiTS’ performance in this context. We used the test set from an earlier blind docking evaluation (Hetenyi and van der Spoel, 2006 [1]). We focused on the 43 complexes used in the paper and did not attempt to use the apo structures. Three codes (1B70, 1FIW and 1QIZ) were left out because of uncertainty regarding the exact structure used for docking in the paper. The default accuracy (3) was used throughout the study. The average blind docking time was 9 minutes per receptor for this set.
Results:
77.5% of the cases gave at least one conformation under 2 Å RMSD in the top 10 poses. In the other cases, one accumulative docking round using poses from the first round as clip files produced successful binding modes in the top 5 poses. The top-ranked pose is in most cases in the correct binding pocket, offering a good starting point for pose refinement.
The table here details the results for the specific codes. The Job# column describes whether the results were obtained with a single blind docking run, or with 2 cycles. The Rank# and RMSD columns indicate the rank of the first pose under 2 Å and its RMSD from the crystallographic conformation. The last two columns give the RMSDs of the top-ranked pose and of the closest pose.
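For readers who want to reproduce this kind of bookkeeping on their own docking output, the criterion used above (rank and RMSD of the first pose under 2 Å within the top 10) can be sketched as follows. The function names `rmsd` and `first_hit` are hypothetical and the code is not part of eHiTS. It assumes that pose and crystal coordinates list the same atoms in the same order, and it applies no superposition and no symmetry correction.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between matched atoms, without alignment."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference need the same, non-zero atom count")
    total = sum(
        (a - b) ** 2
        for atom_pose, atom_ref in zip(pose, reference)
        for a, b in zip(atom_pose, atom_ref)
    )
    return math.sqrt(total / len(pose))


def first_hit(
    ranked_rmsds: Sequence[float], cutoff: float = 2.0, top_n: int = 10
) -> Optional[Tuple[int, float]]:
    """Return (rank, RMSD) of the first pose under `cutoff` in the top `top_n`."""
    for rank, value in enumerate(ranked_rmsds[:top_n], start=1):
        if value < cutoff:
            return rank, value
    return None


# RMSD values of poses listed in score order (made-up numbers).
print(first_hit([3.1, 2.4, 1.2, 0.8]))
```

A complex counts as a success under this criterion when `first_hit` returns something other than `None`.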
The blind docking of phenol into insulin (1MPJ) is shown in Picture1 below. The crystallographic pose is shown in cyan, and sample poses are shown in the “hot spots” detected during docking. Those poses can be used to clip the receptor in accumulative docking runs in which the sampling is finer, and the binding pockets are better modelled. It should be noted that this code generates an unusually large number (5) of hot spots. In most cases in the set we observed three, two and often only one hot spot, reflecting the detection of the correct binding pocket.

Picture1: Phenol binding to insulin. Several potential binding pockets are detected for this small ligand.
1NGP (N1G9 FAB fragment) is a case where the majority of poses are generated far from the native ligand. Picture 2 below shows that most of the poses are located in the big cavity between chains L and H of the crystal structure. Several poses, however, reproduce the x-ray binding mode (in cyan) with close to 1 Å RMSD.

Picture2: 2-(4-hydroxy-3-nitrophenyl)acetic acid docked into N1G9 FAB fragment. The majority of poses are located in the big cavity between chains L and H.
Conclusions:
The above results clearly demonstrate the viability of eHiTS as a blind docking tool. In all cases the correct binding pocket was identified in the top 32 solutions, and in most cases good poses under 2 Å, and even under 1 Å, were found at the top of the generated poses. The conformations may be further refined by clipping the receptor for subsequent runs, and by working at higher accuracies. As always in eHiTS, the jobs are extremely easy to set up with a simple command line, and with no required preparation of the receptor or the ligand. This, together with the speed of the calculations, makes eHiTS a high-throughput blind docking solution.
Reference:
[1] Hetenyi, C.; Van der Spoel, D.: “Blind docking of drug-sized compounds to proteins with up to a thousand residues.” 2006 Feb 20; 580(5):1447-50. Epub 2006 Jan 31.
Table: Blind docking results for eHiTS 2009.1, using the test set of Hetenyi et al. [1]
by Orr Ravitz
eHiTS 2009.1 is released
November 27th, 2009
We are pleased to announce the release of the 2009.1 docking and screening portfolio, which includes the eHiTS, Tune, Score and LASSO packages for the Intel and Cell platforms. This is an important bug-fix release that resolves instability issues, improves the accuracy of docking, and introduces several new features.
One illustration of the progress is the Top Ranking and Closest Pose accuracy analysis on the Astex 85 [1, 2] test set (figures below), where a 5-10% improvement can be observed in almost every category:


Overall, the improvement between 2009.0 and 2009.1 is 0.36 Å (i.e. 17%) in the top-ranking average RMSD, and 0.17 Å (i.e. 18%) in the closest-pose average RMSD.

For a more detailed report on what is new in the package, please see the release notes under the general docs pages: http://www.simbiosys.ca/docs/
References:
Ref 2: Astex diverse dataset http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/astex_diverse/
[http://www.ccdc.cam.ac.uk/products/life_sciences/gold/validation/downloads/download.php4]
Useful scripts available as free download on the SimBioSys website
November 24th, 2009
We recently posted on the CCL (Computational Chemistry List) that some useful scripts for the Linux / Unix molecular modelling community are available on our website. Some of these are specific to eHiTS users, while others are more general-purpose. They are all available free of charge here: http://www.simbiosys.ca/download/scripts/index.html
Bookmark the above site, as we’ll keep updating it with more and more scripts.
A novel BACE-1 inhibitor discovered using eHiTS
November 4th, 2009
It always feels good when your product is successful in the hands of the end users, even more so when it comes to scientific software and drug discovery.
A new article in Elsevier’s Bioorganic & Medicinal Chemistry Letters describes how researchers at the University of Leeds discovered novel non-peptide leads for β-secretase (BACE-1) - one of the key enzymes involved in the pathogenesis of Alzheimer’s disease, and a major target for drug discovery.
It is particularly exciting for us to know that our tools may play an instrumental role in finding a cure for a disease that affects so many beloved people in our lives.
The paper:
Interesting read: London Stock Exchange dumps Windows for Linux
October 15th, 2009
ComputerWorld reported on Oct 7th, 2009:
When it comes to business computer systems, nothing is more mission-critical than the massive trading software systems that underlie stock markets. A failure of an hour here can mean billions of dollars of lost trades….
See the article: London Stock Exchange dumps Windows for Linux
Bottom line: the London Stock Exchange (LSE) had so many troubles and scandals in the past due to software problems (crashes, slowness, etc.), all related to its Windows 2003-based servers, that it decided to look for a Linux replacement, which appears to be a more reliable, faster and much cheaper solution. Its counterpart in the USA, the NYSE, did so a long time ago.
I hope the pharmaceutical companies do not consider their operations less mission critical.
posted by: Aniko
SimBioSys and Symyx team up to enhance computer aided synthesis design capabilities
July 16th, 2009
The field of computer-aided synthetic design is once again capturing the interest of the chemistry community after years of status quo. SimBioSys’ ARChem delivers one of the most advanced solutions for synthesis design by exhaustively enumerating routes from readily available (in-house or purchasable) starting materials to the target molecule of interest. The program performs retrosynthetic analysis using reaction rules deduced from an artificial-intelligence (machine learning) generalization of millions of rules in reaction databases. The success or failure of such a machine learning approach depends, in part, on the quality and comprehensiveness of the reaction databases supplied by the user. Therefore, it is with great excitement that we announce a new partnership between Symyx and SimBioSys under which the ChemInform Reaction Library (CIRX) will be made available for use in ARChem. CIRX is derived from the well-respected journal of current reaction data published by FIZ Chemie, Berlin. This database is updated semiannually to keep abreast of the latest developments in organic synthesis, with roughly 60,000 new reactions added every year to a database that already has well over a million reactions. All areas of organic chemistry are abstracted, including heterocyclic and natural product chemistry, enzymatic processes, and reactions involving new catalysts. The high quality of the CIRX database provides ARChem with a solid basis for the reaction rules that will be generated from it.
We look forward to integrating Symyx’s database into ARChem, and to the exploration of other areas of common interest.
CLiDE for Converting Structure Images to Structure Files
June 15th, 2009
SimBioSys is a distributor of CLiDE, a software package for converting chemical structure images into chemical structures that can be read and interpreted by chemistry software packages such as ChemDraw and ISIS Draw. The package was developed by two of our founders: Aniko Simon PhD, Computer Scientist, currently VP of Business Development at SimBioSys, and, over many years, Prof A. Peter Johnson (http://www.chem.leeds.ac.uk/People/Johnson.html), an expert in the field of de-novo structure design, synthetic chemistry and the applications of software to chemical structures. CLiDE is installed in organizations around the world and for many years held a unique position. A new publication on CLiDE came out a few weeks ago, by the current development team headed by Aniko T. Valko; see the full citation at:
SimBioSys scientific publications page or ACS - JCIM page
This recent paper systematically evaluates CLiDE Pro’s performance on a large and varied set of structures that surpasses our previous validation set for CLiDE. The authors are offering this new, carefully selected test set for baselining and testing other optical chemical structure recognition (OCSR) tools. They suggest that this test set could be the starting point for a community-based effort to establish a benchmarking test set that would include different categories of images, each dealing with specific problem types.
This new OCSR baseline test set is available from the publisher of the CLiDE paper as supporting information to the paper, and can also be downloaded from our website: http://www.simbiosys.ca/clide/validation.html

