eHiTS®: Electronic High Throughput Screening
Frequently Asked Questions:
- Does the software predict the Active Site?
- If the protein is a hypothetical one, is there any provision to
define the active/binding site or can it perform a blind docking?
- Do I need to add hydrogens to the protein?
- Should I add hydrogens to the input ligand?
- What does eHiTS do about protein flexibility?
- Can eHiTS work with peptide ligands?
- Why does eHiTS say "cannot convert" for some of my peptide ligands in PDB format?
- Why does eHiTS give me an error when it tries to convert the PDB file?
- eHiTS is giving strange results when I use PDB files generated with Sybyl, why?
- What kind of CHARGES should I save in Sybyl as MOL2 input file for eHiTS?
- What input formats does eHiTS accept?
- Do I need to convert my 2D ligands into 3D?
- How does eHiTS identify the file type?
- How are cofactors, water and metal ions considered during docking?
- How does it handle Water Molecules? (Freely rotatable, displaceable or oriented?)
- Can eHiTS handle metal ions including metalloproteins?
- Do I need to prepare the correct protonation state of the ligand?
- Can I assign protonation states to the molecules?
- Why are hydrogen atoms not included in the resulting SDF files?
- Why do I get bond type 4 (aromatic) in my SDF output files?
- What does eHiTS do with chiral molecules?
- Why do I not get the ligand name in the output score file?
- Does the program give the RMS value in Docking results?
- How can I visualise the results of eHiTS?
- eHiTS on Ubuntu Linux: why is it not working there?
- Install says repeatedly: "Package_Linux.bin: 32: cut_relative: not found". Can eHiTS (or any other SimBioSys package like CheVi, Lasso etc.) be installed on Ubuntu Linux?
- License Expired?
Tune package related questions:
- If I don't use the "-active" flag, how many complexes should I put into the list file for the Tune package? If I describe 2 complexes in the list file, Tune automatically trains with only two actives and 400 decoys.
- If I use "-active", how many active compounds should I prepare at least? And in that case, how many complexes does Tune need in the list file?
- Can Tune train pose validation like eHiTS 6.2? If not, is the purpose of Tune to improve enrichment?
- What are the properties of the decoy compounds?
Known Issues:
- Why do I get charge changes in SDF output? (for eHiTS 6.2)
- Why does the stand-alone split application from eHiTS 2009.1 not work for me?
Frequently Asked Questions
Q: Does the software predict the Active Site?
A: Predicting the active site may have two meanings:
1. predicting where in the protein the ligand can dock, and
2. predicting the exact geometry of a prespecified binding pocket.
Item (1) is covered in the next answer, on blind docking.
Regarding item (2), eHiTS certainly does this. There are two main ways to
run eHiTS in this context: the -complex keyword and the
-ligand -receptor -clip combination. In the first case, the user lets
eHiTS separate the ligand from the receptor, and eHiTS "clips" the protein
around the ligand it finds. In the second case, the user specifies the general
location of the binding pocket using the clip file, and eHiTS clips the
protein around the coordinates supplied in that file. Clipping
means that a box is created around the relevant coordinates and the
search grid is placed in that box. The rest of the protein is
ignored for the remainder of the calculation. After clipping, eHiTS "floods"
the clip box and determines the surface of the binding pocket by detecting
the interconnected cavities. This continuous cavity constitutes the binding
pocket.
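The clipping step described above can be pictured with a small sketch. This is only an illustration of the idea of a box around reference coordinates, not the eHiTS implementation; the function names and the margin value are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]


def clip_box(reference: List[Point], margin: float) -> Box:
    """Axis-aligned box around the reference coordinates, padded by `margin`.

    `margin` is a hypothetical padding value; the FAQ does not say what eHiTS uses.
    """
    if not reference:
        raise ValueError("need at least one reference coordinate")
    lo = tuple(min(p[i] for p in reference) - margin for i in range(3))
    hi = tuple(max(p[i] for p in reference) + margin for i in range(3))
    return lo, hi


def atoms_in_box(atoms: List[Point], box: Box) -> List[Point]:
    """Keep only the atoms inside the box; everything else is ignored."""
    lo, hi = box
    return [a for a in atoms if all(lo[i] <= a[i] <= hi[i] for i in range(3))]
```

With ligand (or clip file) coordinates as `reference`, `atoms_in_box` returns the part of the receptor that would remain relevant for the search grid in this simplified picture.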
Q: If the protein is a hypothetical one, is there any provision to
define the active/binding site or can it perform a blind docking?
A: eHiTS was not designed for binding pocket detection, but it actually does a
good job in that respect. If the user has separate files for the ligand and
for the receptor, then running eHiTS with the following command:
ehits.sh -receptor protein_file -ligand ligand_file
without the '-clip' option will invoke blind docking. eHiTS will
attempt to bind the ligand everywhere in the protein and in most cases will
find at least one possible binding site. We have seen cases where it correctly
detected the main site, and secondary sites as well.
For more information about blind docking, see our 2009 Technical Note or the Dec 2009 blog posting.
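For scripted runs, the difference between blind and clipped docking comes down to whether '-clip' is present. The helper below is a hypothetical sketch that only assembles an argument list from the flags named in this FAQ; the function name, the argument order, and the assumption that '-clip' is followed by a clip file path are ours.

```python
from typing import List, Optional


def ehits_command(receptor: str, ligand: str, clip: Optional[str] = None,
                  script: str = "ehits.sh") -> List[str]:
    """Build an argument list for a docking run (hypothetical helper).

    Leaving `clip` as None corresponds to blind docking as described above.
    """
    cmd = [script, "-receptor", receptor, "-ligand", ligand]
    if clip is not None:
        # Assumption: -clip takes the path of the clip file.
        cmd += ["-clip", clip]
    return cmd
```

The resulting list can be passed to `subprocess.run` once the paths point at a real installation.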
Q: Do I need to add hydrogens to the protein?
A: No need to add hydrogens - eHiTS will do that.
Q: Should I add hydrogens to the input ligand?
A: No need to add hydrogens to the ligand either - eHiTS will do
that. But if the input already HAS H atoms, then eHiTS will use
the given POSITIONS, although it may switch the protonation states, treating
some of those positions as a lone pair rather than a hydrogen.
This feature ("use H when given, generate when not given") gives the user more
control. If the user has generated the H positions from a reliable source
(e.g. QM modelling, minimization), then it is better to use those.
However, it is not worth using OpenBabel, Corina or another simple
modelling tool to generate them, because eHiTS' internal knowledge base
will do as good a job or better, and its output will match its own
training better.
Q: What does eHiTS do about protein flexibility?
A: We consider that eHiTS provides a soft
representation of the receptor, because of the following three
algorithmic solutions:
- The eHiTS scoring function takes advantage of
the temperature factor information provided in the PDB files to give a
more complete picture of the interaction. The program also uses the
probability of the atom positions to create a derived empirical scoring
function.
- eHiTS rotates the -OH groups of Ser, Thr and
Tyr residues of the protein and also the -NH3+ group of Lys, i.e. the
interaction flexibility of these groups is considered. Note: the heavy
atoms of the main or side chains are not moved during the process.
- The steric clash, or van der Waals potential,
is not modelled with a hard 6-12 potential, as in most
force fields, but with a softer quadratic potential.
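To see why a quadratic clash term is "softer" than a 6-12 potential, the two can be compared numerically. The sketch below uses the textbook Lennard-Jones form and a generic quadratic overlap penalty; the parameter values and the exact quadratic form are illustrative assumptions, not the eHiTS scoring function.

```python
def lennard_jones_12_6(r: float, epsilon: float = 1.0, sigma: float = 3.4) -> float:
    """Standard 12-6 Lennard-Jones energy at distance r (illustrative parameters)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def soft_quadratic_clash(r: float, r_contact: float = 3.4, k: float = 1.0) -> float:
    """Quadratic penalty on overlap only; zero beyond the contact distance.

    r_contact and k are hypothetical values chosen for the comparison.
    """
    overlap = max(0.0, r_contact - r)
    return k * overlap * overlap


if __name__ == "__main__":
    for r in (2.0, 2.5, 3.0, 3.4):
        print(r, lennard_jones_12_6(r), soft_quadratic_clash(r))
```

At short distances the 12-6 term grows by orders of magnitude while the quadratic penalty stays small, so a slightly misplaced atom is tolerated rather than rejected outright.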
Q: Can eHiTS work with peptide ligands?
A: Yes, eHiTS can work with peptide ligands, and we see very good results
with them (we have a pharmaceutical partner that uses eHiTS exclusively
for peptides). They have seen very good performance with 8-10
residues, although the more residues you add, the longer the computation will
take. eHiTS does not limit the number of rigid fragments,
but results will be useless beyond 12 or so fragments.
Q: Why does eHiTS say "cannot convert" for some of my peptide ligands in PDB file format?
A: This happens only with input ligands in PDB files, because we do not
yet properly handle some of the end residues (N-terminal portions
of the peptide ligands) for peptides in PDB file
format. The problem can be circumvented by supplying the peptide ligand in MOL2 file format
(e.g. by converting from PDB to MOL2 format using OpenBabel).
Q: Why does eHiTS give me an error when it tries to convert the PDB file?
A: If you used some molecular visualization software to save the PDB file...
The Protein Data Bank has a very strict file format definition for PDB
files: http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html.
Unfortunately, most PDB files that are publicly available DO NOT follow
the written standard of the format. In fact, several commercially
available molecular visualization programs DO NOT adhere to these PDB
file conventions. Software such as Quanta does not include the "CONECT"
records when it saves a PDB file, although they are essential for determining
the correct connectivity of the atoms. Such programs use simple
distance-based criteria on the coordinates to GUESS the connectivity
of the atoms. eHiTS, however, requires the "CONECT" records to
ensure the accuracy and integrity of the molecule.
So the take-home message is: if you use some molecular
visualization software to manipulate the PDB file, DO NOT save it as a
.pdb but instead save it in one of the other formats that eHiTS
accepts: .mol, .mol2, .sd
If you downloaded the PDB file and used it as is,
without changing or saving it in another program...
Unfortunately, even some of the PDB files in the Data Bank
violate its own standards and contain inconsistencies or errors.
Some automated error correction is already implemented in
eHiTS, but there can be error scenarios that we have not yet
discovered. For instance, there are some files in which the
connections of the atoms are not correct. This will inherently
cause an error in eHiTS. Our development team is currently working
on this problem, and a fix will be announced as soon as it is available.
Please inform us about the PDB code if you run into input file
conversion problems with original PDB files from the Data Bank.
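Before submitting a PDB file, it can help to check whether it carries any CONECT records at all. The following is a minimal sketch that relies only on the PDB convention that the record name occupies columns 1-6; the function names are ours, and the check says nothing about whether the connectivity that is present is correct.

```python
def count_records(pdb_text: str, record: str) -> int:
    """Count lines whose record name (columns 1-6) equals `record`."""
    return sum(1 for line in pdb_text.splitlines()
               if line[:6].strip() == record)


def missing_conect(pdb_text: str) -> bool:
    """True if the file has HETATM atoms but no CONECT records at all."""
    return (count_records(pdb_text, "HETATM") > 0
            and count_records(pdb_text, "CONECT") == 0)
```

A file for which `missing_conect` returns True was probably written by a tool that drops connectivity, and is better re-saved in one of the other accepted formats.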
Q: eHiTS is giving strange results when I use PDB files generated with Sybyl, why?
A: Sybyl, as well as some other molecular visualization tools, does not
always stick to the strict PDB standard formatting. In the case of
Sybyl, it does not write the element symbols in columns 77-78, and
therefore the output does not distinguish between atoms with ambiguous
atom names, such as alpha carbons (CA) and calcium (CA). In this case
it is better to use the default MOL2 output from Sybyl as input to
eHiTS.
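A quick way to spot files with this problem is to look at columns 77-78 of each ATOM/HETATM line. The sketch below assumes standard fixed-column PDB records (serial number in columns 7-11); the function names are hypothetical and the check is only a diagnostic, not part of eHiTS.

```python
from typing import List


def element_symbol(pdb_line: str) -> str:
    """Element symbol from columns 77-78 (empty string if the field is absent)."""
    return pdb_line[76:78].strip().upper()


def atoms_without_element(pdb_text: str) -> List[str]:
    """Serial numbers (columns 7-11) of atoms that lack an element symbol."""
    missing = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")) and not element_symbol(line):
            missing.append(line[6:11].strip())
    return missing
```

If the returned list is non-empty, atom names such as CA cannot be resolved from the file alone, which matches the situation described above.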
Q: What kind of CHARGES should I save in Sybyl as MOL2 input file for eHiTS?
A: According to Tripos' MOL2 file format definition (from 2005), the charge type
associated with a molecule in a mol2 file can have one of the following
values:
NO_CHARGES, DEL_RE, GASTEIGER, GAST_HUCK, HUCKEL, PULLMAN, GAUSS80_CHARGES,
AMPAC_CHARGES, MULLIKEN_CHARGES, DICT_CHARGES, MMFF94_CHARGES, USER_CHARGES
eHiTS accepts ALL of the above charge types and uses the
charge values from the mol2 input file, except for the type:
NO_CHARGES <- this is the charge type field of the mol2 file
In that case, eHiTS completely ignores all the charge values in the file.
We have recently seen some input files coming from Sybyl 8.0 that looked like
this:
USER_CHARGES <- this is the charge type
INVALID_CHARGES <- this is the INTERNAL status bit for Sybyl
Files of this kind will be perceived incorrectly by the eHiTS v9 series, because
eHiTS will still use the charges even though the second line says not
to. To avoid this, either set the charge values correctly in your mol2 file or,
if you do not have such information about your system, use NO_CHARGES and
eHiTS will automatically calculate the charges for you.
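Files with the USER_CHARGES / INVALID_CHARGES combination can be detected before docking. This sketch assumes the usual layout of the @<TRIPOS>MOLECULE record (name, counts, molecule type, charge type, optional status bits, one per line); the function names are ours and the parser is deliberately minimal.

```python
from typing import Tuple


def mol2_charge_info(mol2_text: str) -> Tuple[str, str]:
    """Return (charge_type, status_bits) from the first MOLECULE record."""
    lines = [ln.strip() for ln in mol2_text.splitlines()]
    start = lines.index("@<TRIPOS>MOLECULE")
    record = []
    for ln in lines[start + 1:]:
        if ln.startswith("@<TRIPOS>"):
            break  # next record type begins
        record.append(ln)
    # record[0]=name, [1]=counts, [2]=molecule type, [3]=charge type, [4]=status bits
    charge_type = record[3] if len(record) > 3 else ""
    status_bits = record[4] if len(record) > 4 else ""
    return charge_type, status_bits


def has_conflicting_charges(mol2_text: str) -> bool:
    """True if charges are declared but the status bits mark them invalid."""
    charge_type, status_bits = mol2_charge_info(mol2_text)
    return charge_type != "NO_CHARGES" and "INVALID_CHARGES" in status_bits
```

For a file flagged this way, either correct the charges or change the charge type line to NO_CHARGES, as recommended above.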
Q: What input formats does eHiTS accept?
A: The following input file formats are supported:
- MDL Molecular files (mol or sdf) - 3D only;
- Protein Data Bank files (pdb);
- Tripos Mol2 files (mol2) - 3D only;
- Tagged Molecule Ascii (tma) - native eHiTS format;
- Tagged Molecule Binary (tmb) - native eHiTS format.
Q: Do I need to convert my 2D ligands into 3D?
A: Yes, eHiTS works only with 3D ligand files. So, if your input has only
2D coordinates, please convert it to 3D with a tool such as Corina.
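If you are unsure whether an MDL file is 2D, one simple heuristic is to check whether every z coordinate is zero. The sketch below assumes a single V2000 molfile block with fixed-width fields (atom count in columns 1-3 of the counts line, z in columns 21-30 of each atom line); the function name is ours, and a genuinely planar 3D molecule lying in the xy plane would also be reported as flat.

```python
def molfile_is_flat(molblock: str, tol: float = 1e-4) -> bool:
    """True if every atom in a V2000 molfile has z == 0 within `tol`."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line is the 4th line of the block
    z_values = [float(line[20:30]) for line in lines[4:4 + n_atoms]]
    return all(abs(z) <= tol for z in z_values)
```

Files reported as flat are candidates for 3D conversion before being passed to eHiTS.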
Q: How does eHiTS identify the file type?
A: The input file format is identified by the extension
of the file name. Some examples:
file.pdb - PDB file
file.mol - MDL Molecular file
Please DO NOT use any additional "." in the file name, because this will cause errors,
e.g. file.name.tma
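The naming rule can be checked up front in a preparation script. The sketch below maps the extensions listed in this FAQ to format names and rejects file names with more than one "."; the function name and the lower-casing of the extension are assumptions, since the FAQ does not say whether eHiTS is case-sensitive here.

```python
import os

# Extensions taken from the list of supported input formats above.
FORMATS = {
    "pdb": "Protein Data Bank",
    "mol": "MDL Molecular file",
    "sdf": "MDL Molecular file",
    "mol2": "Tripos Mol2",
    "tma": "Tagged Molecule Ascii",
    "tmb": "Tagged Molecule Binary",
}


def identify_format(path: str) -> str:
    """Return the format name implied by the file extension.

    Raises ValueError for names with extra dots or unknown extensions.
    """
    name = os.path.basename(path)
    if name.count(".") != 1:
        raise ValueError(f"file name must contain exactly one '.': {name!r}")
    _, ext = name.split(".")
    ext = ext.lower()  # assumption: treat extensions case-insensitively
    if ext not in FORMATS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return FORMATS[ext]
```

Only the base name is checked, so dots in directory names do not trigger the error in this sketch.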
Q: How are cofactors, water and metal ions considered during docking?
A: There are two cases to consider:
- If you use the "-receptor" option, then ALL ATOMS of the receptor are treated equally:
they all contribute to the steric grid that defines the shape of the
cavity; perception
assigns activity to the atoms (H-bond, hydrophobicity etc.); surface points are generated
for those atoms that are at the cavity surface; the types of the surface
points are based on the activity assigned in perception; and docking
positions fragments and scores ligand interactions against them.
So, in a nutshell, it does not matter if an atom is part of a protein
residue, cofactor, salt ion, metal ion, water molecule or anything
else. All atoms are perceived
correctly (based on the connection table), have their properties assigned to the surface, and
so participate fully in the docking as long as they are atoms of
the receptor molecule.
- If you use the "-complex" option to split a PDB
file, we have a list of recognized co-factors, water and metals that
will not be considered as ligand but kept as part of the receptor.
For example, the residue IDs that are recognized as co-factors are:
"NAD", "NAP", "NAG", "CNA", "NDP", "FAD", "FMN", "HE0", "TYS", "BTB",
"COA", "MAN", "LMU", "PLP", "HEM", "BTN", "HEA", "HAS", "MES".
Any other co-factor name that is not listed here may be treated as
"ligand" by split (depending on its size relative to the real ligand),
which could potentially lead to incorrect splitting. However, this
problem is completely bypassed if the -receptor option is used instead
of -complex.
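To decide in advance whether "-complex" is safe for a given PDB file, the hetero residue names can be compared against the co-factor IDs listed above. This is a small sketch using only that list; the function names are ours, and water and metal names are not covered because the FAQ does not list them.

```python
from typing import Iterable, List

# Co-factor residue IDs quoted in the answer above.
RECOGNIZED_COFACTORS = frozenset({
    "NAD", "NAP", "NAG", "CNA", "NDP", "FAD", "FMN", "HE0", "TYS", "BTB",
    "COA", "MAN", "LMU", "PLP", "HEM", "BTN", "HEA", "HAS", "MES",
})


def is_recognized_cofactor(residue_name: str) -> bool:
    """True if the residue ID is in the co-factor list quoted above."""
    return residue_name.strip().upper() in RECOGNIZED_COFACTORS


def not_in_cofactor_list(residue_names: Iterable[str]) -> List[str]:
    """Sorted residue IDs that are absent from the co-factor list.

    These may be treated as "ligand" by split; water and metals are
    recognized separately by eHiTS and are not modelled here.
    """
    return sorted({r.strip().upper() for r in residue_names
                   if not is_recognized_cofactor(r)})
```

If a co-factor of your system shows up in `not_in_cofactor_list`, preparing a separate receptor file and using "-receptor" avoids the splitting question entirely.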
Q: How does it handle Water Molecules? (Freely rotatable, displaceable or oriented?)
A: Water molecules in eHiTS are freely rotatable.
Q: Can eHiTS handle metal ions including metalloproteins?
A: Yes. eHiTS performs well in a variety of situations involving metal ions and
metal ion chelating ligands.
Q: Do I need to prepare the correct protonation state of the ligand?
A: No, it does not need
to be prepared. eHiTS handles all possible protonation states of the
receptor and ligand in a single run!
The issue of
protonation state is very important to the docking problem. Ligands and receptors with different
protonation states can have dramatically different binding positions.
However, it is common practice for many docking programs to ignore this issue
and require that the user define a particular protonation state prior
to running a docking experiment.
This approach may be fine for a re-docking experiment
where there is experimental data to help the user identify the correct
protonation state. However in many cases there is no way for
the user to know this.
Protonation states of ligands and receptors are
determined by the interaction between the two. Thus for any
particular receptor-ligand pair there will generally be one correct
protonation state (although there are cases where multiple valid docking poses
exist for different protonation states of a particular receptor/ligand
pair). However, for a different ligand, the protonation state
of the receptor may be altered, to reflect the characteristics of the
ligand. If a docking program were to pre-set the protonation
state of the receptor then possible interactions with a ligand could be
lost. A better solution, with a more appropriate score, can
be found only if the program is run with different protonation states (not
necessarily the neutral or the lowest energy form).
eHiTS takes a unique approach to the protonation
problem. eHiTS systematically evaluates all possible
protonation states for the receptor and ligands, automatically for every
receptor-ligand pair. It does this through the use of
ambiguous property flags for positions that could be either protonated or
deprotonated (i.e. have a lone pair). Then during the docking
algorithm each state is evaluated and scored. The result is
the only docking program that evaluates all possible protonation states for
the receptor and ligand in a single run.

On a more practical level, this means that eHiTS may
alter the protonation state of the input receptor and ligand to achieve
the best possible binding score. For example, if a user were
to enter a molecule with a carboxylic acid group in
its neutral protonation state, then depending on the receptor environment eHiTS
may output the deprotonated carboxylate form, as this is the
form often seen under physiological conditions.
For more info see the technical notes on the Automatic
Protonation state handling in eHiTS
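The idea of scoring every protonation state and keeping the best one can be illustrated with a brute-force sketch. This is not how eHiTS does it internally (eHiTS uses ambiguous property flags during docking); the site names and the scoring callback are hypothetical, and lower scores are taken to be better.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def best_protonation(
    sites: Sequence[str],
    score: Callable[[Dict[str, bool]], float],
) -> Tuple[Dict[str, bool], float]:
    """Try every protonated/deprotonated combination and keep the lowest score.

    `sites` names the ambiguous positions; `score` is a user-supplied
    (hypothetical) function mapping {site: is_protonated} to a score.
    """
    best_state: Dict[str, bool] = {}
    best_score = float("inf")
    for flags in product((True, False), repeat=len(sites)):
        state = dict(zip(sites, flags))
        value = score(state)
        if value < best_score:
            best_state, best_score = state, value
    return best_state, best_score
```

The number of combinations doubles with each ambiguous site, which is one reason a real implementation would evaluate sites locally rather than enumerate whole-system states as this sketch does.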
Q: Can I assign protonation states to the molecules?
A: Yes, you can.
Whenever the input files do not contain any hydrogen atoms, eHiTS will
evaluate the protonation state on the fly as described in the answer to
the previous question. If
the user provides a specific protonation state in the input files, the
automatic protonation state handling mechanism will still be invoked as long
as the command line argument "-fixproto" has not been used. However, even
when -fixproto is not used, eHiTS will use the coordinates of the hydrogens
in the input files as optional locations for protons, and will assess whether
those are populated or not.
The user may wish to run the docking with the pre-assigned protonation state.
In this case the user should add to the command line:
-fixproto ligand/receptor/both
where the user can choose whether to fix the
protonation state for the entire system, or only for the ligand or the
receptor.
Q: Why are hydrogen atoms not included in the resulting SDF files?
A: In the current version of eHiTS, we are not
outputting the protonation state that we are using in the scoring. The
way we sample protonation is by using a local model only, no
consideration for pH at all. If a location can be protonated or
de-protonated, we score it as if it were either, then choose the
protonation state that scores the best. Essentially, for a given pose
we are giving it the "best score possible".
There are several problems when outputting the
protonation states. One is that a protonation state is a property of
the complex not just the ligand, therefore we would need to have the
corresponding receptor information as well. We are working on
incorporating this in CheVi (we will show both the receptor and ligand
protonation).
The second, is that as you change protonation,
tautomers could also change, and we currently do not have a mechanism
of fixing the tautomers. We are currently finishing up a project to
address this issue, and it should be in the next release.
So what we are currently doing is outputting (in
default mode) either TMA's or SDF files with no hydrogens. We then let
the user "assume" the appropriate protonation state. We know this is
not the best solution.
Note: If you are using the convert utility
after a docking
run, convert sees just the ligand and doesn't have any information
about the receptor or the scores, so it just puts hydrogens everywhere
(where they "normally" or "typically" are).
Q: Why do I get bond type 4 (aromatic) in my SDF output files?
A: The answer depends on your input file type. If the
input was an MDL SDF or MOL file, eHiTS will keep the same bond types as
in the input file. If the input was MOL2, the same thing happens,
except that the MOL2 definition has an explicit aromatic bond
type ("ar"), so if an "ar" bond was in the input MOL2 file,
the output SDF file will have the aromatic (i.e. "4") bond type. If the
input was a PDB file, eHiTS will perceive all the aromatic bonds and save
them as type "4" in the output SDF file.
Note: if the aromatic bond type in SDF causes a
problem for a tool
that you use after eHiTS, there are programs (like MOE) which have the
ability to readjust the aromatic bond type to single and double
alternating bonds. Please contact us for more details.
Q: Does the program give the RMS value in Docking results?
A: Yes, eHiTS automatically reports RMS values when used for self-docking with
the -complex option. If the receptor and ligand are provided separately, the
user should use the -rms flag to get RMS values reported. The order of atoms
in the ligand file should be identical to that in the rms file.
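The requirement on atom order follows from how an RMS deviation is computed: atoms are paired by index. The sketch below is the plain textbook formula with no superposition and no symmetry correction; it is shown to explain the requirement and is not claimed to be the exact eHiTS calculation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two poses with identical atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("both poses need the same non-zero number of atoms")
    total = sum((a[i] - b[i]) ** 2
                for a, b in zip(coords_a, coords_b)
                for i in range(3))
    return math.sqrt(total / len(coords_a))
```

If the two files list the same atoms in a different order, the pairing is wrong and the value is meaningless even for identical poses.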
Q: What does eHiTS do with chiral molecules?
A: Chirality of molecules is not changed in eHiTS. The algorithm preserves the chirality of the rigid fragments, and handles it at join points. So, whatever was the input chirality is preserved in the output as well.
Q: Why do I not get the ligand name in the output score file?
A: The answer again depends on your ligand input file type. If
the input was a Tripos MOL2 file, the name is automatically detected and included in the scores.txt and best_scores.txt files.
If the input ligand file was an MDL SDF or MOL file, eHiTS needs to be told which tag name
to use for the name. The syntax is:
ehits.sh [your regular parameters] -tagname TAG_NAME
where "TAG_NAME" is the label (or tag) that identifies the molecule's name in the sdf file;
it must be enclosed in quotation marks if it contains spaces.
Q: How can I visualise the results of eHiTS?
A: With our own viewer - FREE for everyone:
- CheVi® (Chemical Visualiser)
Available for Linux only for now; a Web plugin and support for other platforms
are coming soon.
With other tools:
Note: make sure that you use the "-out myresults.sdf" command line
argument to produce an output file that can be viewed by most of the
standard molecular modelling programs. Once you have the
"myresults.sdf" file, you can use most standard 3D viewing
programs:
- MarvinView - from ChemAxon;
- Pymol;
- CACTVS;
- MOE - from CCG (Note: when opening eHiTS output SDF files in MOE you
get one database entry for each result along with its eHiTS
pose and score);
- Insight II - from Accelrys (Note: when opening eHiTS
output files in Insight II, you get all the structures, but no data,
i.e. no score and no pose number);
- Maestro - from Schrodinger;
- Sybyl - from Tripos (Note: select MACCS as the type of file to be
opened).
Q: eHiTS on Ubuntu Linux: why is it not working there?
A: If you get a strange message such as:
1246: Syntax error. Bad substitution.
this is a problem with Ubuntu 6.10 and above, where the
distributors have changed the default system shell
to "dash" instead of bash, and dash is not fully compatible with bash. Try
running eHiTS with:
/bin/bash INSTALL_PATH/ehits.sh
and that should solve the problem.
Q: Install says repeatedly: "Package_Linux.bin: 32: cut_relative: not found". Can eHiTS (or any other SimBioSys package like CheVi, Lasso etc.) be installed on Ubuntu Linux?
A: Yes. The problem is that Ubuntu does not use bash as the default shell,
so please try installing the SimBioSys software package with:
[path_to_bash] [SimBioSys_package.bin] [WHERE_TO_INSTALL]
e.g.
/bin/bash /home/user/Download/CheVi_9.0_Linux.bin /home/user/SimBioSys/
and that should solve the problem.
Q: License Expired?
A: If a user has had the license extended by SimBioSys support staff but
does not run eHiTS before the original expiry date, then eHiTS will
think the license has expired. The user will get the following message:
WARNING! Your license will expire in -X days.
If you wish to use the software after the expiry, you need to contact
SimBioSys Inc. to request an extension, e.g. email support@simbiosys.com
To solve this problem, you must delete the grant
file found in your ehits_work/license directory:
rm ~/ehits_work/license/*.grant
Tune package related questions
Q: If I don't use the "-active" flag, how many complexes should I put into the list file for the Tune package? If I describe 2 complexes in the list file, Tune automatically trains with only two actives and 400 decoys.
A: In principle, one should use as many complexes as possible. Two complexes are
not a sufficient set for tuning because the data is split into a set that is
used for training and a set that is used for validation. You can look at the
receptors.rkba file and see that most families have around 10 PDB codes, and
I believe 5 or 6 complexes is roughly the minimum required to have a sensible
tuned weight set. The ligands in the complexes are used as actives and with
an automatically selected decoy set they are used to rescale the score with a
LASSO-like scheme.
Q: If I use "-active", how many active compounds should I prepare at least? And in that case, how many complexes does Tune need in the list file?
A: The supplied list of actives supplements the actives in the complexes. The
structural information is used to tune the relative weights of the various
terms in the scoring function so that more faithful (low RMSD) poses get
better scores. The actives are used to rescale the entire score based on a
LASSO-like filter. In principle, the more actives, the better. But it is also
important to have a diverse set of actives, and there is little use in adding
ligands that are just small variations of others on the list.
Q: Can Tune train pose validation like eHiTS 6.2? If not, is the purpose of Tune to improve enrichment?
A: eHiTS 6.2 had facilities to carry out deeper tuning that affected PoseMatch
and DockOptim. Those tuning modes have proved to have little effect, and on
the other hand they were very expensive computationally. The tuning in eHiTS
2009 is designed to generate a better score differentiation between good and
bad poses. Before and after tuning eHiTS will generate the same poses, but
after tuning it will be able to give the better poses lower scores. In
addition, the LASSO rescaling improves the enrichment capabilities, and once
again, this is done through scoring and not through different pose
generation.
Q: What are the properties of the decoy compounds?
A: The decoy compounds in the Tuning package, as in LASSO 2009, are chosen
automatically from a set of decoys that was assembled from various sources,
such as the DUD set. In each tuning run, a subset of decoys is chosen such
that it forms as diverse a set as possible in terms of the LASSO descriptors,
and such that it does not overlap with the set of actives, since there is
always the risk that a decoy from the set may in truth have some activity.
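One common way to pick a diverse subset is a greedy max-min (farthest-point) selection. The sketch below illustrates that generic technique on made-up descriptor vectors; it is not the selection procedure used by the Tuning package, whose details are not given here, and excluding actives by name is a simplification of "does not overlap with the set of actives".

```python
import math
from typing import Dict, List, Sequence, Set


def pick_diverse(candidates: Dict[str, Sequence[float]],
                 exclude: Set[str], n: int) -> List[str]:
    """Greedy max-min selection of up to n decoys, skipping excluded names.

    `candidates` maps a compound name to a hypothetical descriptor vector.
    """
    pool = {name: vec for name, vec in candidates.items() if name not in exclude}
    if not pool or n <= 0:
        return []
    names = sorted(pool)
    chosen = [names[0]]  # deterministic starting point
    while len(chosen) < min(n, len(names)):
        remaining = [m for m in names if m not in chosen]
        # Pick the candidate whose nearest chosen neighbour is farthest away.
        nxt = max(remaining,
                  key=lambda m: min(math.dist(pool[m], pool[c]) for c in chosen))
        chosen.append(nxt)
    return chosen
```

Each step adds the compound that is least similar to everything already selected, so near-duplicates are picked last.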
Known Issues
Q: Why do I get charge changes in SDF output for multi-ligand runs? (in eHiTS 6.2)
A: There is a known bug in eHiTS version 6.2 that can
change the charge of nitrogens or oxygens in the SDF output of a
multiple ligand screen if you are using the "-out myresults.sdf"
command line option. The problem is due to a miscalculation of the
partial charges in the final writing of the output file. It
does not affect ligand pose or score, as it happens only while
the final SDF file is written.
Workaround: run eHiTS
multi-ligand runs without the "-out myresults.sdf" flag and
look for the results in the file:
$HOME/ehits_work/results/<RECEPTOR>/<LIGAND>/ehits_best.sdf.
This file is not affected by the bug.
Single ligand docking runs are not affected by this bug at all.
Note: if you have already completed a multi-ligand run without
reading this first, we do have scripts to correct the problem without
re-running the job. Please contact us for more details.
Q: Why does the stand-alone "split" application from eHiTS 2009.1 not work for me?
A: The command line argument "-config path/parameters.cfg"
must be specified, but it was omitted from the help usage text of the application.
Thus the correct syntax for the stand-alone "split" application is:
Usage:
PATH/eHiTS_2009.1/Linux/bin/split input_file_name -config PATH/eHiTS_2009.1/data/parameters.cfg [MARGIN] [-keep_water] [debug-options]