In the era of precision medicine, enormous amounts of data are being generated from disparate sources, including omics, imaging, sensing and beyond. Today, computational scientists need to develop better tools to manage, integrate and share data to make it clinically actionable. The Bioinformatics for Big Data conference at the Molecular Medicine Tri-Conference 2018 will showcase how medical centers and the pharma industry are developing such tools and software to meet this goal.
Who should attend: Directors, Managers, Researchers, and Scientists from Pharma, Biotechs, Academia, Government and Healthcare Organizations working in Research, Biomedical Informatics, Information Technology, Data Science, Modeling & Simulation, R&D Informatics, Software Engineering, Translational Genomics, Predictive Medicine, Biostatistics, Computational Biology, and Bioinformatics.
Monday, February 12
10:30 am Conference Program Registration Open
11:50 Chairperson’s Opening Remarks
Elizabeth Worthey, Ph.D., Faculty Investigator, Clinical Informatics Director, and Adjunct Associate Professor, Software Development and Informatics, Pediatrics and Genetics, HudsonAlpha Institute for Biotechnology
12:00 pm How Data Commons Are Changing the Way That Large Biomedical Datasets Are Analyzed and Shared
Robert Grossman, Ph.D., Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Dept. of Medicine, University of Chicago
Biomedical datasets from large projects have grown too large for most research groups to host and analyze themselves. Data commons provide an alternative by co-locating data, storage and computing resources with commonly used software services, applications and tools for analyzing, harmonizing and sharing data, creating an interoperable resource for the research community. We give an overview of data commons and describe some lessons learned from the NCI Genomic Data Commons, the BloodPAC Data Commons and the BRAIN Commons. We also describe how an organization can set up its own commons.
12:30 Molecular Diagnostics in the Era of Big Data and Precision Medicine
Elizabeth Worthey, Ph.D., Faculty Investigator, Clinical Informatics Director, and Adjunct Associate Professor, Software Development and Informatics, Pediatrics and Genetics, HudsonAlpha Institute for Biotechnology
Genome-wide sequencing is used as a standard molecular diagnostic test. The major bottleneck in identifying causal variants is not sequencing or initial analysis, but interpretation. Interpreting genetic findings is hardly a new challenge, but the task today can be more complex given the increase in dataset size and complexity. Commoditizing interpretation requires developing and applying appropriately scaled tools and methods. I will discuss the challenges faced during implementation as well as the solutions in place at our institution.
1:00 Session Break
1:10 Luncheon Presentation: Applications of AI in Drug Discovery
Alix Lacoste, Ph.D., Lead Technical Solution Specialist, IBM Watson Health Life Sciences
With millions of scientific research articles published each year, innovation in the life sciences suffers from knowledge waste and a lack of knowledge integration. IBM Watson for Drug Discovery addresses this issue by mining large corpora of literature and data to help scientists accelerate biomedical research. Using advanced analytics and machine learning, the platform can also predict novel relationships, as demonstrated in our recent work with Barrow Neurological Institute on ALS and with Pfizer in immuno-oncology, among many other projects.
1:40 Session Break
2:30 Chairperson’s Remarks
Nathan D. Price, Ph.D., Professor & Associate Director, Institute for Systems Biology
2:40 Mining Personal, Dense, Dynamic, Data Clouds for Health and Disease Insights
Nathan D. Price, Ph.D., Professor & Associate Director, Institute for Systems Biology
We have generated personal, dense, dynamic data clouds (PD3) for thousands of people (and growing), consisting of genomics, proteomics, transcriptomics, microbiomes, clinical chemistries and wearable devices of the quantified self to monitor wellness and disease. I will present results from our proof-of-concept pilot study in a set of 108 individuals (Price et al., Nature Biotechnology 2017) as well as from the next thousand individuals. I will show how the interpretation of these data leads to actionable findings that help individuals improve their health and reduce risk drivers of disease.
3:10 Systematic Functional Annotation of Somatic Mutations in Clinically Actionable Genes
Han Liang, Ph.D., Associate Professor and Deputy Chair, Department of Bioinformatics and Computational Biology, Associate Professor, Department of Systems Biology, The University of Texas MD Anderson Cancer Center
Understanding the functional effects of somatic mutations in cancer cells is a fundamental issue in cancer research, since mutated proteins have been widely used as biomarkers and therapeutic targets. We developed a systems-biology approach that integrates high-throughput mutant ORF construction, high-throughput sensitive cell viability assays, high-throughput functional proteomics, and drug sensitivity screens, and applied it to >1,000 mutations in clinically actionable genes. Our study provides a valuable resource for identifying clinically actionable mutations for precision cancer medicine.
3:40 LinkedOmics: Analyzing Multi-Omics Data within and across 32 Cancer Types
Bing Zhang, Ph.D., Professor, Department of Molecular and Human Genetics, Lester & Sue Smith Breast Center, Baylor College of Medicine
LinkedOmics is a web platform to explore associations between different types of molecular and clinical attributes, to compare associations discovered from different omics platforms or sample cohorts, and to interpret identified associations in the context of biological pathways and molecular networks. The current version of LinkedOmics includes all cancer genomic and proteomic data from TCGA and CPTAC, and it can be easily extended to support other cohort-based multi-omics studies.
4:10 Selected Poster Presentation: Novel Computational Method Integrating Disparate Data Types for Drug Candidate MoA Profiling
Timothy J. Cardozo, MD, PhD, Associate Professor, Department of Biochemistry and Molecular Pharmacology, NYU School of Medicine
4:40 Refreshment Break and Transition to Plenary Session
5:00 Plenary Keynote Session
6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing
7:30 Close of Day
Tuesday, February 13
7:30 am Registration Open and Morning Coffee
8:00 Plenary Keynote Session
9:00 Refreshment Break in the Exhibit Hall with Poster Viewing
10:05 Chairperson’s Remarks
Hongzhe Li, Ph.D., Professor of Biostatistics and Statistics, Chair, Graduate Program in Biostatistics, Director, Center for Statistics in Big Data (CSBD), University of Pennsylvania
10:15 Novel Feature Selection Strategies for Enhanced Predictive Modeling and Deep Learning in the Biosciences
Tom Chittenden, Ph.D., D.Phil., Lecturer and Senior Biostatistics and Mathematical Biology Consultant, Harvard Medical School
Artificial Intelligence (AI) is the single most transformative technology in history. Advancements in medicine depend upon furthering our understanding of how genetic variation and somatic mutation regulate aberrant gene activity and subsequent disease biology. Our advanced deepCODE feature selection strategies quantitatively integrate multiple types of high-throughput omics data. These approaches improve the performance of classification methods and the subsequent identification of genes and molecular pathways that are more predictive of disease etiology.
10:45 CancerLocator: Non-Invasive Cancer Diagnosis and Tissue-of-Origin Prediction Using Methylation Profiles of Cell-Free DNA
Xianghong Jasmine Zhou, Ph.D., Professor, Pathology and Laboratory Medicine, University of California, Los Angeles
We propose a probabilistic method, CancerLocator, which exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportion and the tissue of origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. CancerLocator outperforms two established multi-class classification methods on simulated and real data, even when tumor-derived DNA makes up only a small proportion of the cell-free DNA. CancerLocator also achieves promising results on patient plasma samples.
11:05 Vetting Integrated ‘Big Data’ Approaches to Precision Health Care
Nicholas J. Schork, Ph.D., Professor, Quantitative Medicine, The Translational Genomics Research Institute
Vetting big data and machine learning techniques meant to enable precision medicine is not trivial. However, a few strategies are emerging for proving the utility of integrated, data-intensive approaches to advancing precision health care. These include aggregating N-of-1 trials, pursuing drug matching trials, and developing clinical learning systems. In addition, recent trends in regulatory oversight may accommodate novel strategies such as these.
11:25 Analyzing Genomic Data at Scale with Google Cloud
Jonathan Sheffi, Product Manager, Genomics & Life Sciences, Google Cloud
Google Cloud enables scientists to change the way they perform research and collaborate with one another. This presentation will highlight how Google Cloud is accelerating life sciences research and finding new ways to innovate.
11:55 Observational Data for Biomedical Discovery
Nicholas Tatonetti, Ph.D., Herbert Irving Assistant Professor of Biomedical Informatics, Director of Clinical Informatics, Herbert Irving Comprehensive Cancer Center, Columbia University
Observation is the starting point of discovery. Based on observations, scientists form hypotheses that are then tested. In the information age, trillions of observations are made and recorded every day, from online social interactions to emergency room visits. With so much data available, relying on a single scientist’s mind to generate hypotheses is no longer sufficient. Data mining is about training algorithms to recognize patterns in enormous datasets and automatically identify new hypotheses. In this talk, I will discuss how we use data mining algorithms to identify unexpected effects of drugs used singly and in combination with other drugs. Using integrative informatics methods, we are able to discover drug-drug interactions that no one previously considered possible. Finally, I will demonstrate how to use simple and efficient laboratory experiments to validate these hypotheses. In many cases these experiments can be executed in high throughput by robotic systems, with the ultimate goal of automating the scientific method.
12:15 pm Session Break
12:25 Enjoy Lunch on Your Own
1:25 Refreshment Break in the Exhibit Hall with Poster Viewing
2:00 Chairperson’s Remarks
Matthew Trunnell, Vice President and CIO, Fred Hutchinson Cancer Center
2:10 The NCI Cancer Research Data Commons: Integrating Heterogeneous Data for Knowledge Discovery
Anthony R. Kerlavage, Ph.D., Chief, Cancer Informatics Branch, National Cancer Institute, Center for Biomedical Informatics & Information Technology
Precision medicine requires identifying the molecular basis for disease and matching targeted therapies to each patient’s unique biology. Cancer researchers need to access, integrate, and analyze data from genomics, metabolomics, proteomics, microbiomics, imaging, clinical research and outcomes, population-based data, and data collected by health care providers and patients themselves. Building upon current systems, we are defining an integrated, cloud-based Cancer Research Data Commons necessary to fully leverage these data.
2:40 Converged IT and Data Commons
Simon Twigger, Ph.D., Senior Scientific Consultant, BioTeam Inc.
Data management is an ongoing and growing challenge in the life sciences. The Data Commons approach aims to streamline access to the right data and the right analytics tools and resources by creating a converged platform, from the foundational infrastructure to the user interface. This talk will cover industry trends in developing a strategy for, and implementing, Data Commons solutions, and the role converged IT plays in the process.
3:10 PANEL DISCUSSION: Data Commons
Moderator: Matthew Trunnell, Vice President and CIO, Fred Hutchinson Cancer Center
Panelists:
Lucila Ohno-Machado, M.D., Ph.D., Associate Dean, Informatics and Technology, University of California, San Diego Health
Lara Mangravite, Ph.D., President, Sage Bionetworks
Simon Twigger, Ph.D., Senior Scientific Consultant, BioTeam Inc.
Robert Grossman, Ph.D., Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Dept. of Medicine, University of Chicago
- What is a data commons?
- Challenges in data commons
- Data commons and open science
- Technology innovations
4:10 Valentine’s Day Celebration in the Exhibit Hall with Poster Viewing
5:00 Breakout Discussions in the Exhibit Hall
These interactive discussion groups are open to all attendees, speakers, sponsors and exhibitors. Participants choose a specific breakout discussion group to join. Each group has a moderator to ensure focused discussions around key issues within the topic. This format allows participants to meet potential collaborators, share examples from their work, vet ideas with peers, and be part of a group problem-solving endeavor. The discussions provide an informal exchange of ideas and are not meant to be corporate or product-specific discussions.
Creating FAIR (Findable, Accessible, Interoperable, Reusable) Data
Lara Mangravite, Ph.D., President, Sage Bionetworks
- Importance of FAIR data in biomedical research
- How to minimize the effort required as a data generator to ensure that data is FAIR
- Standards and systems for implementing FAIR data practices
Machine Learning Techniques and Big Data to Enable Precision Medicine
Nicholas J. Schork, Ph.D., Professor, Quantitative Medicine, The Translational Genomics Research Institute
- How can machine learning be leveraged in very early, pre-clinical drug development initiatives, e.g., in drug screening studies, that might enable precision medicine?
- What changes to current clinical trials infrastructure would need to be made to accommodate emerging big data and machine learning techniques?
- What machine learning and big data-oriented strategies might complement, or even replace, traditional late phase (e.g., phase IV) clinical trials infrastructure?
6:00 Close of Day
Wednesday, February 14
7:30 am Registration Open and Morning Coffee
8:00 Plenary Keynote Session
10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall
10:50 Chairperson’s Remarks
Ajay Shah, Director, Research Informatics, Office of the Chief Informatics Officer, Beckman Research Institute and City of Hope National Medical Center
11:00 Using Human Genetics to Drive Drug Discovery: The Industry Perspective
Anna Podgornaia, Ph.D., Associate Principal Scientist, Genetics and Pharmacogenomics, Translational Medicine, Merck
The Merck Genetics and Pharmacogenomics (GpGx) group uses human genetics and genomics across the entire drug development pipeline to make decisions anchored in human genetics. During the presentation, I will provide three vignettes about how we use human genetics during the drug discovery process: 1) using human genetics to get inspiration for novel drug programs; 2) using human genetics to gain insight into potential safety issues; and 3) pharmacogenomics. I will close with a section on challenges and opportunities in using human genetics to drive drug discovery.
11:30 Immune-Mediated Dermatological Conditions: Target Identification
Deepak K. Rajpal, Senior Scientific Director, Computational Biology, Target Sciences, GSK
We share a framework for developing new therapeutic intervention strategies for immune-mediated dermatological conditions by utilizing publicly available clinical transcriptomics datasets. We propose a strategy based on developing disease signatures and using these signatures to identify potential drug repurposing opportunities, and we present novel target identification approaches. We anticipate that the conceptual methodology shared here, or similar approaches, will further support not only biomarker discovery efforts but also the development of new drugs.
12:00 pm bStyle: A Graphical, Integrated and Modular Systems Biology Platform
Corrado Priami, Ph.D., President & CEO, COSBI
bStyle is a graphical platform for running systems biology analyses in the field of systems pharmacology. It uses multi-omics data to detect active networks and ultimately to perform in silico experiments for drug design and development. All of the mathematical technicalities are hidden behind the graphical interface, so the platform is easy to use even for non-experts in modeling and data analysis.
12:30 Session Break
12:40 Enjoy Lunch on Your Own
1:10 Dessert Break in the Exhibit Hall and Last Chance for Poster Viewing
1:50 Chairperson’s Remarks
Michael N. Liebman, Ph.D., Managing Director, IPQ Analytics, LLC; Professor, Drexel College of Medicine; Professor, Wenzhou First University Medical School
2:00 Pharma and Physician Perspective - The Future of Drug Development and Health Care
Charles E. Barr, M.D., MPH, Group Medical Director & Head, RWE Strategy & External Relationships, US Medical Affairs, Genentech
Science advances the knowledge of disease mechanisms, enabling the creation of transformative therapies. However, both health care and drug development face serious challenges including unsustainable growth in costs. Feasible solutions will require new ways for patients, physicians and researchers to leverage advanced technologies to accelerate both research and health care cost-effectively.
2:30 Healthcare Perspective – Limitations of Big Data Approaches and Clinical Needs
Hal Wolf, Director and Practice Leader for Information and Digital Health Strategy, The Chartis Group
Genomics has quickly become a broad topic spanning both academic and consumer medical/health models. But limited access to meaningful big datasets that can be turned into useful knowledge, together with the lack of clear medical needs, has left many approaches at a crossroads on how to proceed. Where will genomics set its path, and what dependencies must be in place to support its useful integration into the healthcare ecosystem?
3:00 PANEL DISCUSSION
Moderator: Michael N. Liebman, Ph.D., Managing Director, IPQ Analytics, LLC; Professor, Drexel College of Medicine; Professor, Wenzhou First University Medical School
- Complexity of disease(s): Disease stratification; limitations in diagnosis
- Complexity of patients: Clinical history; co-morbidities; genomics
- Clinical guidelines: Quality of guidelines; compliance
- Trial populations vs. real world patients
- Translation of clinical trial results into clinical practice
- Unmet vs. unstated unmet clinical needs
3:30 Session Break
3:40 Chairperson’s Remarks
Lara Mangravite, Ph.D., President, Sage Bionetworks
3:45 Collaborative Ecosystems in Data-Intensive Science for Precision Medicine
Lara Mangravite, Ph.D., President, Sage Bionetworks
An advanced understanding of the dynamic nature of disease is necessary to meaningfully implement precision medicine, but several barriers exist. In particular, approaches to understanding dynamic fluctuations in disease are highly data-intensive and require bioinformatic inquiry for which standard methodologies do not exist. These issues can be systematically addressed by combining resources, benchmarking methods, and establishing community consensus around well-supported research findings.
4:15 Novel Approaches to Participant Engagement in Genetic Research and Translating Big Data into Action
David Verbel, MPH, Director, Translational Data Science, Human Biology and Data Science, Eisai, Inc.
To identify the right medicines and the right patients to receive them, Eisai is exploring ways to identify individuals who carry genetic variants of interest. In two such studies, biological samples from genetically and clinically selected individuals will be characterized to learn more about the cellular and molecular consequences of changes in the function of particular genes. The first study uses a novel research platform; the second involves working with a leading academic center.
4:45 Scientific Informatics for Translational Oncology
Ronghua Chen, Director, Scientific Informatics, Global Research IT, R&D IT, Merck
The application of molecular profiling technologies, including next-generation sequencing, in translational oncology offers unprecedented opportunities to discover new drug targets and biomarkers as well as to understand tumor biology. This presentation will elaborate on the complexities of oncology datasets and highlight an integrated scientific informatics approach to analyzing data and supporting translational research.
5:15 Close of Conference Program