Crystallographic techniques provide huge amounts of information about molecular structures. This information is stored in the Cambridge Structural Database and the Protein Data Bank, and both have proved invaluable to drug discovery researchers.


Computational techniques underlie many of the procedures in modern drug discovery. Many of these are based on modelling of small molecules (for example, potential active entities), others on modelling the structures of macromolecular systems (such as the potential therapeutic target), and the rest on the interactions between the two. But modelling has to be based ultimately on experimental observation; modelling procedures are validated by their success in reproducing those observations. Much of this reference data consists of energetic measurements, and some comes from ab initio quantum mechanical calculations, but at the root of all modelling work, most importantly, lie experimental measurements of molecular structure.

Many spectroscopic techniques provide molecular structural information. Infrared (IR) spectroscopy can tell which chemical groups are present in a small molecule, mass spectrometry shows molecular connectivity, and nuclear magnetic resonance (NMR) provides some information about solution structure for both small molecules and proteins. But the most information-rich technique is X-ray crystallography.

Less than one century old, this technique has developed quickly to a state where, given a suitable crystal, the structure of a small molecule can routinely be determined in a couple of hours, and the structure of a macromolecular system in a couple of days, depending on the nature of the system. For small molecules, there is a database of structures that has been collated and updated since the early days of the technique. This database, the Cambridge Structural Database (CSD), celebrates its 40th anniversary in 2005 and contains almost 350,000 organic and organometallic crystal structures.

Cambridge Structural Database

Over the years, the CSD has been used as a unique source of knowledge on intramolecular structure (for example, conformation) and intermolecular interactions (such as the forces that hold the molecules together in the crystal form). More than 1,200 papers have been published referencing the CSD, and knowledge extracted from it forms the basis of some well-known modelling programs used for drug discovery applications. The forms and values of many force-field parameters commonly used in modelling are derived from, and have been tested against, crystal structures in the CSD. The database remains a growing, invaluable resource for chemists and modellers working in drug discovery.

Protein Data Bank

Of even greater importance to drug discovery researchers is the Protein Data Bank (PDB), originally conceived at the Brookhaven Laboratory, but now curated and made available by a consortium of US and international institutions called RCSB. Where the CSD stops, the PDB starts: it contains the structures of (at the time of writing) nearly 32,000 native proteins and protein-ligand complexes, determined mostly by crystallography but also by NMR. Due to the nature of crystallographic techniques, the macromolecular structures in the PDB are less precise and more prone to error than the small-molecule structures in the CSD. They therefore need to be approached and used with some care, but the PDB is nevertheless proving to be a gold mine for drug discovery researchers, as the mode of action of drugs may be revealed therein.
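One simple way of approaching PDB entries "with some care" is to check how each structure was determined and at what resolution before using it for modelling. The Python sketch below is illustrative only. It assumes an entry in the legacy PDB text format, where the method is given in the EXPDTA record and the resolution in a REMARK 2 line. The function names and the 2.5 angstrom cutoff are hypothetical choices made for this example, not a recommended standard.

```python
import re
from typing import Optional

# Matches a line such as "REMARK   2 RESOLUTION.    1.74 ANGSTROMS."
# NMR entries carry "REMARK   2 RESOLUTION. NOT APPLICABLE." and do not match.
_RESOLUTION = re.compile(
    r"^REMARK\s+2\s+RESOLUTION\.\s+([0-9]+(?:\.[0-9]+)?)\s+ANGSTROMS"
)


def read_method(pdb_text: str) -> Optional[str]:
    """Return the experimental method from the EXPDTA record, if present."""
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            return line[6:].strip()
    return None


def read_resolution(pdb_text: str) -> Optional[float]:
    """Return the reported resolution in angstroms, or None if not given."""
    for line in pdb_text.splitlines():
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None


def passes_resolution_cutoff(pdb_text: str, cutoff: float = 2.5) -> bool:
    """Hypothetical screen: keep entries reporting a resolution <= cutoff.

    The default cutoff is an assumption for illustration only.
    """
    resolution = read_resolution(pdb_text)
    return resolution is not None and resolution <= cutoff


# Usage with two made-up header fragments (not real PDB entries).
xray_entry = (
    "EXPDTA    X-RAY DIFFRACTION\n"
    "REMARK   2\n"
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.\n"
)
nmr_entry = (
    "EXPDTA    SOLUTION NMR\n"
    "REMARK   2 RESOLUTION. NOT APPLICABLE.\n"
)

print(read_method(xray_entry), read_resolution(xray_entry))
print(read_method(nmr_entry), read_resolution(nmr_entry))
```

A resolution filter of this kind is only a first pass. It says nothing about local errors in a binding site, and an entry that passes it still needs to be inspected.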

Most commercial enterprises engaged in drug discovery research, be they large or small, have organised access to protein crystallographic facilities on a proprietary basis, either in-house, through service companies, through academic institutions, or through government laboratories for specialised techniques such as the use of synchrotron radiation. The work done at these facilities supplements the structures in the publicly available PDB and often acts as the basis for modelling projects and screening campaigns.

The importance of knowledge of molecular structure, be it for small molecules, proteins, or complexes of the two, cannot be overstated. Almost all drug discovery research work is based on it.

Company profile

CCG’s flagship product, MOE, is an integrated suite of drug discovery software tools used by computational chemists, medicinal chemists and biologists at pharmaceutical companies, biotech companies and universities worldwide.