Core B
Data Management and Modeling

This CICB core is responsible for providing researchers with state-of-the-art computational and data management support, including the maintenance and management of an interactive database. Core B also provides consultation and bioinformatics support for the analysis and reporting of microarray data produced by CICB investigators. CICB studies use several types of experiments, including chromatin immunoprecipitation microarray (ChIP-chip), differential methylation hybridization (DMH), and methylated DNA immunoprecipitation (MeDIP). The datasets obtained from these experiments are analyzed and integrated with other types of data. Designing and executing these experiments and analyzing their results form a workflow that involves collecting, integrating, and analyzing information from multiple data sources. The diverse datasets produced must be deposited in a database system where they can be queried by gene annotation, available clinicopathological features, and any additional genetic and/or epigenetic data. Core B activities focus on developing and deploying the techniques, tools, and middleware systems needed to meet these requirements. These activities span several areas:
  • We develop innovative Bayesian methods to predict outcomes from epigenetic and genetic variables. Both supervised and unsupervised classification methods are used to mine epigenomic results. To improve the quality of our predictions, we combine cross-validation and permutation testing to produce robust statistical models.
  • We provide methods to address problems inherent to the analysis of large, complex epigenomic data sets.
  • We have developed a web-based query support tool called QUEST. This tool allows researchers to compose a query through a graphical user interface and download the results as a CSV or Excel file. The datasets are maintained in a relational database back end. QUEST provides a simple yet flexible mechanism for researchers to query and retrieve their data from this database without writing complex SQL queries.
  • We have developed the Genome Data Visualization Toolkit (GDVTK), which consists of a set of data structures and core classes. GDVTK provides a sound framework for developing web-based applications that present genomic annotations in visual form. We will use GDVTK to develop a robust, flexible data management system for storing and querying promoter CpG islands and the associated methylation and genetic changes, histone modifications, and chromatin status in cancer cell lines, neoplastic epithelium, and tumor stroma.
  • We are developing Grid-enabled components to facilitate the integration of the data and tools developed in this project across collaborating institutions and, more generally, the sharing of information and analytical resources in epigenetic studies. We use the caGrid infrastructure from the NCI-funded caBIG™ (cancer Biomedical Informatics Grid) program. We have developed a prototype caGrid application that supports 1) development of an oligonucleotide library, 2) selection of appropriate oligonucleotides for an epigenomic CpG island microarray platform, and 3) annotation of a pre-built microarray for further analysis. This application consists of annotation methods and data sources wrapped as caGrid analytical and data services, and it demonstrates how caGrid can be applied in epigenetic studies.
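The Bayesian models and the exact validation protocol are not specified above. As a minimal sketch of how cross-validation and permutation testing fit together, the code below uses a simple nearest-centroid classifier as a stand-in (it is not the core's Bayesian method). It estimates k-fold cross-validated accuracy, then reshuffles the class labels many times to estimate how often chance alone reaches that accuracy. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Stand-in classifier: one mean feature vector per class label."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; every sample is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


def permutation_test(X, y, n_perm=200, k=5, seed=0):
    """Compare observed CV accuracy with accuracies on shuffled labels.

    Returns (observed_accuracy, p_value); the +1 terms keep the p-value > 0.
    """
    observed = cv_accuracy(X, y, k, seed)
    rng = random.Random(seed)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if cv_accuracy(X, shuffled, k, seed) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic illustration only: two groups of 3-feature profiles
    # whose means differ by 3 units per feature.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)] + \
        [[rng.gauss(3, 1) for _ in range(3)] for _ in range(20)]
    y = ["unmethylated"] * 20 + ["methylated"] * 20
    acc, p = permutation_test(X, y)
    print(f"CV accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```

Because the labels are shuffled while the features stay fixed, the permutation distribution reflects what the whole modeling pipeline achieves when there is no real association. Reusing the same fold seed keeps each permuted run directly comparable to the observed one.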
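QUEST's internals are not described here. A minimal sketch of the general idea, assuming a hypothetical table `methylation_results` with columns `gene_symbol`, `tumor_type`, and `methylation_ratio`: selections made in a query form are represented as (column, operator, value) triples, checked against a whitelist, turned into a parameterized SQL statement, and the result set is written out as CSV. SQLite stands in for the relational back end; the table, columns, and function names are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical schema: only whitelisted identifiers are ever spliced into SQL;
# user-supplied values are always passed as bound parameters.
TABLE = "methylation_results"
ALLOWED_COLUMNS = ("gene_symbol", "tumor_type", "methylation_ratio")
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">=", "LIKE"}


def build_query(filters, columns=ALLOWED_COLUMNS):
    """Turn form selections [(column, operator, value), ...] into SQL + params."""
    for col in columns:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")
    clauses, params = [], []
    for col, op, value in filters:
        if col not in ALLOWED_COLUMNS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter: {col} {op}")
        clauses.append(f"{col} {op} ?")
        params.append(value)
    sql = f"SELECT {', '.join(columns)} FROM {TABLE}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += f" ORDER BY {columns[0]}"  # deterministic row order for export
    return sql, params


def query_to_csv(conn, filters, columns=ALLOWED_COLUMNS):
    """Run the composed query and return the result set as CSV text."""
    sql, params = build_query(filters, columns)
    cursor = conn.execute(sql, params)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([desc[0] for desc in cursor.description])  # header row
    writer.writerows(cursor.fetchall())
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up rows for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {TABLE} "
                 "(gene_symbol TEXT, tumor_type TEXT, methylation_ratio REAL)")
    conn.executemany(f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
                     [("GENE_A", "breast", 0.82),
                      ("GENE_B", "ovarian", 0.31),
                      ("GENE_C", "breast", 0.67)])
    print(query_to_csv(conn, [("tumor_type", "=", "breast"),
                              ("methylation_ratio", ">", 0.5)]))
```

Keeping identifiers on a whitelist and values in bound parameters is what lets a form-driven tool accept arbitrary user input without exposing the database to SQL injection.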
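GDVTK itself is a Java framework, and its classes are not reproduced here. As a hedged illustration of the kind of core data structure that storing and querying promoter CpG islands calls for, the sketch below indexes genomic features per chromosome, sorted by start coordinate, and answers "which features overlap this region?" with binary search. The class and field names are hypothetical, coordinates are assumed to be 0-based and half-open, and a production system might instead use an interval tree or the database's own range index.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic feature such as a CpG island (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    name: str


class FeatureIndex:
    """Per-chromosome list of features sorted by start, for overlap queries."""

    def __init__(self, features):
        self._by_chrom = {}
        for f in features:
            self._by_chrom.setdefault(f.chrom, []).append(f)
        self._starts = {}
        self._max_len = {}
        for chrom, feats in self._by_chrom.items():
            feats.sort(key=lambda f: f.start)
            self._starts[chrom] = [f.start for f in feats]
            self._max_len[chrom] = max(f.end - f.start for f in feats)

    def overlapping(self, chrom, start, end):
        """Features with f.start < end and f.end > start on chrom."""
        feats = self._by_chrom.get(chrom)
        if not feats:
            return []
        starts = self._starts[chrom]
        # A feature that reaches past `start` cannot begin earlier than
        # start - max_len, so only this slice of the sorted list is scanned.
        lo = bisect.bisect_left(starts, start - self._max_len[chrom])
        hi = bisect.bisect_left(starts, end)
        return [f for f in feats[lo:hi] if f.end > start]


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    index = FeatureIndex([
        Feature("chr1", 100, 300, "island_1"),
        Feature("chr1", 1000, 1500, "island_2"),
        Feature("chr2", 100, 200, "island_3"),
    ])
    print([f.name for f in index.overlapping("chr1", 250, 1100)])
```

Sorting once at load time and tracking the longest feature keeps each query to two binary searches plus a short scan, which suits read-heavy annotation lookups such as "CpG islands within this promoter window".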
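The criteria the caGrid prototype uses to select oligonucleotides are not given here. A minimal sketch, assuming probes are screened with the Gardiner-Garden and Frommer CpG island thresholds (GC content of at least 0.5 and an observed/expected CpG ratio of at least 0.6). Those thresholds were defined for windows of 200 bp or more, so applying them to individual probe sequences is a simplification; the function names and probe IDs are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c * g)


def select_probes(probes, min_gc=0.5, min_obs_exp=0.6):
    """Return IDs of probes (dict: id -> sequence) passing both thresholds."""
    return [pid for pid, seq in probes.items()
            if gc_fraction(seq) >= min_gc and cpg_obs_exp(seq) >= min_obs_exp]


if __name__ == "__main__":
    candidates = {
        "probe_1": "CGCGCGCGCG",  # CpG-rich
        "probe_2": "ATATATATAT",  # AT-rich
        "probe_3": "GGGGGCCCCC",  # GC-rich but contains no CpG
    }
    print(select_probes(candidates))  # prints ['probe_1']
```

The third candidate shows why both tests are applied: a sequence can be entirely G and C yet contain no CpG dinucleotides, so GC content alone does not identify CpG island targets.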
Inquiries regarding Core B should be addressed to Joel Saltz or Ramana Davuluri.

The Ohio State University:

Joel Saltz, Principal Investigator
Ramana Davuluri, Co-Investigator
Dustin Potter, Research Scientist
Greg Singer, Research Scientist
Terry Camerlengo, System Analyst/Programmer
Tahsin Kurc, Research Assistant Professor, Biomedical Informatics
Hao Sun, Research Scientist
Saranyan K. Palaniswamy, Bioinformatics Programmer/Analyst
Sandya Liyanarachchi, Statistician
Jeffrey Huang, Consultant

Indiana University:
Lang Li, Co-Investigator
Qianqian Zhou, Biostatistician

Database Group
Meeting Minutes - Jan 18, 2007

Relevant Publications:

  • B. Woods, B. Clymer, J. Heverhagen, M. Knopp, J. Saltz, and T. Kurc, "Parallel 4-D Haralick Texture Analysis for Disk-resident Image Datasets", Concurrency and Computation: Practice and Experience, 2007.
  • J. Han, D. Potter, T. Kurc, G. Singer, S. Hao, S. Hastings, S. Langella, S. Oster, P. S. Yan, R. Davuluri, T. H.-M. Huang, and J. Saltz, "A Grid-Enabled Array Annotator to Support Custom Chip Design for Epigenomic Analysis", Journal of the American Medical Informatics Association (JAMIA), under revision, 2007.
  • S. Langella, S. Oster, S. Hastings, D. Ervin, F. Siebenlist, T. Kurc, and J. Saltz, "Enabling the Provisioning and Management of a Federated Grid Trust Fabric", the 6th Annual PKI Research and Development Workshop, accepted for publication, 2007.
  • T. Pan, M. Gurcan, S. Langella, S. Oster, S. Hastings, A. Sharma, B. Rutt, D. Ervin, T. Kurc, K. Siddiqui, J. Saltz, and E. Siegel, "GridCAD: A Grid-Based Computer-Aided Detection System", to appear in the July 2007 issue of Radiographics.
  • J. Saltz, S. Oster, S. Hastings, S. Langella, T. Kurc, W. Sanchez, M. Kher, A. Manisundaram, K. Shanbhag, and P. Covitz, "caGrid: Design and Implementation of the Core Architecture of the Cancer Biomedical Informatics Grid", Bioinformatics, 2006.
  • T. Kurc, D. A. Janies, A. D. Johnson, S. Langella, S. Oster, S. Hastings, F. Habib, T. Camerlengo, D. Ervin, U. Catalyurek, and J. Saltz, "An XML-based System for Synthesis of Data from Disparate Databases", Journal of the American Medical Informatics Association (JAMIA), 2006.
  • H. Sun, S. K. Palaniswamy, T. T. Pohar, V. X. Jin, T. H.-M. Huang, and R. V. Davuluri, "MPromDb: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-chip experimental data", Nucleic Acids Research, 34:D98-D103, January 1, 2006.
  • S. Langella, S. Oster, S. Hastings, F. Siebenlist, T. Kurc, and J. Saltz, "Dorian: Grid Service Infrastructure for Identity Management and Federation", The 19th IEEE Symposium on Computer-Based Medical Systems, Special Track: Grids for Biomedical Informatics, Salt Lake City, Utah, June 22-23, 2006.
  • V. S. Kumar, B. Rutt, T. Kurc, U. Catalyurek, S. Chow, S. Lamont, M. Martone, and J. Saltz, "Large Image Correction and Warping in a Cluster Environment", Supercomputing 2006 (SC 2006), Tampa, FL, November 2006.
  • S. Hastings, S. Oster, S. Langella, D. Ervin, T. Kurc, and J. Saltz, "Introduce: An Open Source Toolkit for Rapid Development of Strongly-Typed Grid Services", Journal of Grid Computing, under review. Also available as OSU Biomedical Informatics Department Technical Report, BMI-OSU-0001-0706, 2006.
  • B. Rutt, V. S. Kumar, T. Pan, T. Kurc, U. Catalyurek, and J. Saltz, "Distributed Out-of-core Preprocessing of Very Large Microscopy Images for Efficient Querying", Proceedings of the 2005 IEEE International Conference on Cluster Computing (Cluster 2005), September 2005.
  • S. Hastings, S. Oster, S. Langella, T. Kurc, T. Pan, U. Catalyurek, and J. Saltz, "A Grid-based Image Archival and Analysis System", Journal of the American Medical Informatics Association (JAMIA), 2005.
  • S. K. Palaniswamy, V. X. Jin, H. Sun, and R. V. Davuluri, "OMGProm: A Database of Orthologous Mammalian Gene Promoters", Bioinformatics, 21(6):835-836, March 15, 2005.
  • V. X. Jin, H. Sun, T. T. Pohar, S. Liyanarachchi, S. K. Palaniswamy, T. H.-M. Huang, and R. V. Davuluri, "ERTargetDb: An integral information resource of transcription regulation of ER Targ…", Journal of Molecular Endocrinology, 35:225-23, 2005.
  • H. Sun and R. V. Davuluri, "Genome Data Visualization Tool Kit (GDVTK): Java-based Application Framework for Visualization of Gene Regulatory Region Annotations", Bioinformatics, 20:727-734, 2004.