Bioinformatics Projects

Biological application areas

It is a characteristic of our research program that we expect all CHIP investigators to focus on application areas in medicine or biology. For some, this focus arises from a long-standing clinical or scientific interest, but for all it serves as the proving ground on which quantitative or computational approaches are tested for their performance and their relevance. Currently, in addition to the application areas that have their own rubrics on this page, other biomedical applications include neuroscience, development, diabetes, aging, transplantation biology, and cancer.

Across all these biological application areas, availability of tissue, primarily human, in sufficient sample sizes has been a persistent problem. This problem has been the focus of our NCI-funded Shared Pathology Informatics Network project.


Public tools, data models and data exchange

We believe that investigations of the massive amount of information generated by the human genome project could benefit from a higher level of integration and from increased ease of access. We aim to develop tools that present genetic data in a format that more closely matches the requirements of the researchers who will use them. An example is SNPper, a web-based application that we developed to assist researchers in designing and performing large-scale association studies. By integrating various databases of genes and Single Nucleotide Polymorphisms (SNPs), SNPper provides the user with powerful search capabilities that make it possible to retrieve sets of SNPs according to their position in the genome. In addition, we are developing tools to automatically link together databases that contain related information but come from different sources (for example, linking a gene symbol to its PubMed entry, its OMIM code, the microarray labels that refer to it, clinical annotations, etc.). Like others, we firmly believe that these tools are important but not an end unto themselves. We are equally convinced that the continued evolution of these tools will depend on a critical mass of investigators with expertise in the computational or statistical sciences and deep knowledge of the biomedical sciences. The necessary expertise cannot come from biology alone (see this survey of biomedical informatics expertise).
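As an illustration of retrieving SNPs by genomic position, the following is a minimal sketch and not SNPper's actual implementation. The record layout, the identifiers, the coordinates, and the function names `build_index` and `snps_in_region` are all assumptions made for the example. It keeps the SNPs of each chromosome sorted by position so that a region query becomes a binary search.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical records: (SNP identifier, chromosome, position in base pairs).
# Identifiers and coordinates are invented for illustration only.
SNPS = [
    ("snp_a", "chr1", 1050),
    ("snp_b", "chr1", 2300),
    ("snp_c", "chr1", 7800),
    ("snp_d", "chr2", 1900),
]


def build_index(snps):
    """Group SNPs by chromosome and sort each group by position."""
    by_chrom = defaultdict(list)
    for snp_id, chrom, pos in snps:
        by_chrom[chrom].append((pos, snp_id))
    for entries in by_chrom.values():
        entries.sort()
    return by_chrom


def snps_in_region(index, chrom, start, end):
    """Return identifiers of SNPs on `chrom` with start <= position <= end."""
    entries = index.get(chrom, [])
    positions = [pos for pos, _ in entries]
    lo = bisect_left(positions, start)   # first position >= start
    hi = bisect_right(positions, end)    # one past the last position <= end
    return [snp_id for _, snp_id in entries[lo:hi]]


index = build_index(SNPS)
print(snps_in_region(index, "chr1", 1000, 3000))
```

A production system would more likely sit on top of a database with an index on chromosome and position; the sorted-list approach here only shows the shape of the query.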

Pharmacogenomics

We link pharmacological measurements to expression measurements for drug discovery and for generating hypotheses about pharmacological mechanisms of action. More recently, we have begun to investigate what is more classically thought of as pharmacogenomics: individual variation in drug response as a function of individual genomic variation.

Population Genomics

We are involved with several large epidemiological studies to understand the genetic component of several diseases. In addition to implementing the databases, we are developing novel methods of finding robust correlations between phenotype and genotype.

Gene Expression in Inflammatory Myopathies

The goals here are: (1) to formulate and validate hypotheses relevant to the pathogenesis of the inflammatory myopathies through the use of DNA microarrays and measurement of large-scale muscle tissue gene expression, (2) to characterize patterns of muscle gene expression among distinct clinical subtypes of inflammatory myopathies and correlate these patterns with clinical phenotypes, and (3) to explore possible gene function for genes, cDNAs, and expressed sequence tags (ESTs) of unknown function through computational techniques applied to these expression profiles.

Similarity Measures in DNA Microarray Dataset Analysis for Functional Genomics

Several methodologies are available to explore functional relationships among genes as inferred from DNA microarray expression analysis. The choice of similarity measure in these varied techniques of functional genomic clustering is the most significant determinant of the resulting hypothesized relationships (e.g. ones based on dynamics or signal coherence). Consideration of the distinct mathematical properties of similarity measures provides insight into their appropriate use in gene expression datasets, as well as allowing for abstractions to other, less intuitive similarity measures.
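To make the dependence on the similarity measure concrete, here is a small sketch comparing two common choices, Euclidean distance and Pearson correlation, on three invented expression profiles. The gene names and values are hypothetical, and these two measures are chosen only as familiar examples, not as the ones studied in this project. The same query gene ends up with a different nearest neighbor under each measure.

```python
from math import sqrt


def euclidean_distance(x, y):
    """Straight-line distance; sensitive to absolute expression level."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Linear correlation; sensitive to profile shape, not magnitude."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical expression profiles across four conditions.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [10.0, 20.0, 30.0, 40.0],  # same shape as gene_a, 10x the level
    "gene_c": [1.5, 1.5, 3.5, 3.5],      # similar level, step-like shape
}

query = profiles["gene_a"]
others = ["gene_b", "gene_c"]

# Smallest distance wins under Euclidean; largest correlation wins under Pearson.
nearest_by_distance = min(others, key=lambda g: euclidean_distance(query, profiles[g]))
nearest_by_correlation = max(others, key=lambda g: pearson_correlation(query, profiles[g]))

print("Euclidean nearest:", nearest_by_distance)
print("Pearson nearest:", nearest_by_correlation)
```

Under Euclidean distance gene_c is closest to gene_a because their levels are similar, while under Pearson correlation gene_b is closest because its profile is a scaled copy of gene_a. A clustering built on either measure would therefore suggest different functional groupings from the same data.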

Noise, Error and Reproducibility

Comparisons between expression measurements from repeated samples on duplicate identical microarrays. Analysis of quantitation algorithms in microarray scanning software. Comparisons of expression measurements from repeated samples across differing microarray technologies. Quality control analysis of microarrays.
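One simple way to quantify reproducibility across replicate arrays is the per-gene coefficient of variation (standard deviation divided by mean). The sketch below assumes this metric purely for illustration; it is not necessarily the method used in this project, and the gene names, intensities, threshold, and the helper `flag_noisy` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical intensities for each gene across three replicate arrays.
replicates = {
    "gene_1": [100.0, 110.0, 90.0],
    "gene_2": [20.0, 60.0, 40.0],
}


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean of the replicates."""
    return stdev(values) / mean(values)


def flag_noisy(replicate_table, max_cv):
    """Return, in sorted order, the genes whose CV exceeds `max_cv`."""
    return sorted(
        gene
        for gene, values in replicate_table.items()
        if coefficient_of_variation(values) > max_cv
    )


for gene, values in replicates.items():
    print(gene, round(coefficient_of_variation(values), 3))

# The threshold of 0.25 is an arbitrary choice for the example.
print(flag_noisy(replicates, max_cv=0.25))
```

The coefficient of variation is undefined when the mean is zero and unstable for low-intensity spots, so a real analysis would typically filter or transform the data (for example, with a log transform) first.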