My previous post about errors in crystal structures has triggered strong reactions from CCDC (not only a response post and a direct email, but also an email to my former PhD supervisor in the UK asking him for remedy and explanation). Apparently, they have interpreted my post as an attack on the quality of their services. Let me first clarify that I never intended to imply anything negative or derogatory about the CCDC services or software. My sincere apologies if my post came across that way. All I wanted to do was raise awareness in the docking/scoring community that small-molecule crystallographic data is not free of errors. My understanding is that the data deposited in CSD has been determined by thousands of people all over the world and published in various scientific journals, while CCDC aggregates the data and creates a comprehensive, validated and value-added database known as the Cambridge Structural Database (CSD). The complete CSD System (CSDS) includes the CSD itself and associated software for search, visualisation and analysis of the stored information. I acknowledge that CCDC provides a valuable service to the community and that any error in the data is not their fault.
They have also sent us a “friendly reminder” that since our license to CSD has expired, according to the signed agreement we are not allowed to retain or use any data downloaded from CSD, nor any derived information or data. As I already stated in the update added to the previous blog entry, we have ceased using any data derived from CSD to comply with the license. I have even removed the image of the molecule from the post (since that can also be considered derived data). We have not incorporated any CSD data into our software. As I mentioned in the previous post, we intended to improve our scoring function with statistics collected from CSD (while we had the license during 2007), but this did not prove useful, so we abandoned that approach and continue to use publicly available PDB crystal structure data, which has been used for all released versions of the software. For this reason we have not renewed our (rather pricey) license for 2008.
One lesson I learned from this exchange is the importance of Open Data for scientific advancement (some scientists believe that research data must be free), such as the data available from CrystalEye. When even non-profit organizations (registered as charities) use draconian license agreements to protect data created and published by others, fully commercial entities (like pharmaceutical companies) must be guarding their own data even more tightly. It is difficult to make scientific progress if a single blog mention of an error in a data entry invites the wrath of the company that sells services built on the data. It is ironic that the links about the need for open data and to the open repository happen to point to a web site within the same University where CCDC resides.