Catherine Blake, Ph.D.

Research Interests

My primary research goal is to accelerate scientific discovery by synthesizing evidence from text. I have developed several methods that automate the synthesis process, ranging from term co-occurrence to shallow and, more recently, deep language processing. I have also modeled the processes that scientists use as they generate and test new hypotheses; the underlying motivation is that by understanding how scientists already synthesize evidence, we will be in a better position to develop technologies that accelerate that process. A third theme centers on medical and health informatics, where there is also a need to structure and organize health-related information. You can find a summary of my research contributions in my CV (PDF) (last updated Sept 2008).

Language Processing to Automate Information Synthesis

Topic-Focused Multi-Document Summarization: The goal of this work is to generate a fluent, well-organized 250-word summary from a topic, a query, and 25 news stories containing relevant information. The paper describes the query expansion, lexical compression, and sentence selection strategies we used.

  • Blake, C., Kampov, J., Orphanides, A., West, D., & Lown, C. (2007) UNC-CH at DUC 2007: Query Expansion, Lexical Simplification, and Sentence Selection strategies for Multi-Document Summarization, Presentation at Document Understanding Conference (DUC) 2007, Rochester, NY PDF

Textual Entailment and Paraphrasing: The goal of this work is to accurately predict when the information in one sentence entails the information in another. Although this was my first participation in the Recognizing Textual Entailment (RTE) challenge, the system achieved the 10th highest score of 45 systems. The second paper describes how to resolve contradictory evidence in a question answering setting.

  • Blake, C. (2007) The Role of Sentence Structure in Recognizing Textual Entailment. The Third Recognizing Textual Entailment Challenge Preprint
  • Blake, C. (2003) A Technique to Resolve Contradictory Answers. AAAI Spring Symposium on New Directions in Question Answering, Stanford, CA. PDF

Modeling Language in Scientific Literature: This paper compares terminology from different journals within a collection of more than 100,000 full-text chemistry articles. The results suggest that a random sample of text gives similar IDF weights, but that terminology differs between journals.

  • Blake, C. (2006) A Comparison of Document, Sentence and Term Event Spaces, The Joint 21st International Conference on Computational Linguistics (Coling) and the 44th Annual Meeting of the Association for Computational Linguistics (ACL), Sydney, Australia. PDF and presentation

Multi-User Information Extraction for Information Synthesis (METIS): This research uses information extraction and meta-analytic techniques to synthesize secondary evidence from scientific literature.

  • Blake, C. (2005) Information Synthesis: A New Approach to Explore Secondary Information in Scientific Literature, The Joint Conference on Digital Libraries, Denver, CO, USA. PDF
  • Blake, C. (2004) A Text Mining Approach to Enable Detection of Candidate Risk Factors. In Proceedings of Medinfo2004 Building High Performance Health Care Organizations, San Francisco, CA. PDF
  • Blake, C. (2003). Information Synthesis: A Mixed-initiative Meta-Analytic Approach to Facilitate Knowledge Discovery from Scientific Text. Doctoral Dissertation, School of Information and Computer Science, University of California, Irvine
  • Blake, C. & Pratt, W. (2002). Automated Information Extraction and Analysis for Information Synthesis. Proceedings of the Annual American Medical Informatics Association Conference (AMIA 2002), San Antonio, TX. PPT
  • Blake, C. (2002). Information Synthesis: A Process used by Scientists in Medicine and Public Health to Overcome Information Overload. In Fourth International Conference on Conceptions of Library and Information Science: Emerging Frameworks and Methods (CoLIS 4), Doctoral Forum, Seattle, WA. PDF
  • Blake, C. & Pratt, W. (2002). Collaborative Information Synthesis. In Proceedings of Annual Conference of the American Society for Information Science and Technology (ASIST 2002), Philadelphia, PA. PDF

Knowledge Discovery from Literature: This research attempts to automate Swanson’s ABC model using the Unified Medical Language System (UMLS) representation of text and advocates a shift from retrieval to synthesis.

  • Blake, C. & Pratt, W. (2002). A Semantic Approach to Identify Candidate Treatments from Existing Medical Literature. In AAAI Symposium on Knowledge-based Approaches, Stanford, CA. PDF
  • Pratt, W., Srinivasan, P., Smalheiser, N. & Blake, C. (2004), Mining the Literature to Promote Biomedical Discoveries. Panel In Proceedings of Medinfo2004 Building High Performance Health Care Organizations, CA. PDF

Classic Text Mining: I have also worked on what I consider classic text mining projects: creating a landscape of terms using Themescape and generating association rules from text. The latter suggests that changing only the text representation can lead to more plausible and useful association rules.

  • Blake, C. and Tengs, T. (2001). The Nation’s Breast Cancer Research Portfolio: A View from 30,000 ft. Presentation at the Avon Symposium, University of California, Irvine, CA. The presentation is not available, but the generated self-organizing maps of breast cancer research from different funding agencies are.
  • Blake, C. & Pratt, W. (2001). Better rules fewer features: A semantic approach to selecting features from text. In Proceedings of the Institute of Electrical and Electronics Engineers Data Mining Conference (IEEE DM 2001), San Jose, CA. PDF. This paper was also published in the Workshop on Text Mining (TextDM-2001) at the same conference.
  • Blake, C. & Pratt, W. (2000). Multiple Categorizations of Search Results: An extension to Dynamic Categorization. In Proceedings of the Annual American Medical Informatics Association Conference (AMIA 2000), Los Angeles, CA. PDF

Modeling Human Synthesis

How scientists synthesize evidence: This work models the processes used by scientists in public health and in complementary and alternative medicine as they attempt to resolve contradictions and redundancy in the medical literature. It relates closely to the systematic review and meta-analysis processes.

  • Blake, C. and Pratt, W. (2006) Collaborative Information Synthesis I: A Model of Information Behaviors of Scientists in Medicine and Public Health, Journal of the American Society for Information Science and Technology. 1740-9 PDF *2007 JASIST Best Paper Award
  • Blake, C. and Pratt, W. (2006) Collaborative Information Synthesis II: Recommendations for Information Systems to Support Synthesis Activities, Journal of the American Society for Information Science and Technology. 1888-95 PDF *2007 JASIST Best Paper Award

How scientists arrive at and verify new hypotheses: This work elicited from experienced scientists in chemistry and chemical engineering the process they used to arrive at new research questions and to take those questions from the initial idea through to final publication. The findings are based on hour-long, one-on-one interviews with scientists.

  • Blake, C. and Rendall, M. (2006) Scientific Discovery: A View from the trenches, in Proceedings of the Ninth International Conference on Discovery Science (DS-2006), Barcelona, Spain (Long paper acceptance rate 27%) PDF and presentation
  • Tenopir, C., Brown, A., Brown, C., Blake, C. (2006), "How chemists are really finding and using information in our digital environment", In Proceedings of Annual Conference of the American Society for Information Science and Technology (ASIST 2006), Austin, TX PDF
  • Blake, C. (In Preparation) "Designing Knowledge Discovery Systems to support Science: A Case Study of Chemists and Chemical Engineers"
  • Blake, C. (In Preparation) "Methods used by Academic Scientists in Chemistry and Chemical Engineering to Verify a New Hypothesis"

Position papers that describe the transition from Information Retrieval to Information Synthesis

  • Blake, C. (2007) In support of e-science - Shifting from Information Retrieval to Information Synthesis, Presentation at the 2007 Microsoft eScience Workshop at RENCI, Chapel Hill, NC. Abstract
  • Blake, C. and Nassar, N. (2007) Using Concepts and Entailment for Passage Retrieval from Biomedical Literature, Abstract in 2007 Microsoft eScience Workshop at RENCI, Chapel Hill, NC. Abstract
  • Blake, C., & Anderson, C. (2005) The Shift from Information Retrieval to Synthesis, The First i-Conference of the i-school Community Bringing Disciplines to Confront Grand Challenges, State College, PA PDF

Medical and Health Informatics

Methods to detect newly diagnosed diabetes cases: This project combines both structured and unstructured information sources to identify early onset of diabetes.

  • West, S., Blake, C., Liu, Z., McKoy, N., Oertel, M., Carey, T. (In Press). Reflections on using Electronic Health Record Data for Clinical Research. Chapel Hill. Accepted to Health Informatics Journal. PDF Pre-print
  • West, S., Liu, Z., McKoy, N., Oertel, M., Blake, C., Schwartz, B., Ochart, F., Carey, T. (2007) Use of electronic medical records and administrative claims data for assessing type 2 diabetes care, UNC DEcIDE project report for Grant AHRQ 290-05-0040-1.

Information Extraction from Medical Records: Much of the information in a hospital setting is unstructured and therefore not amenable to data mining. This project was conducted as part of the DEcIDE project and uses shallow natural language processing methods to identify drugs, their amounts, and their methods of delivery from transcribed medical notes.

  • Kraus, S., Blake, C. & West, S.L. (2007) Information Extraction from Medical Notes, Accepted to MEDINFO, Brisbane, Australia. PDF

Consumer Health Information: This project explores the manual processes used to catalog online health information.

  • Blake, C., West, D., Luo, L., Marchionini, G. (2005) Cataloging On-Line Health information: A Content Analysis of the NC Health Info Portal. Proceedings of the Annual American Medical Informatics Association Conference (AMIA 2005), Washington DC, USA. PDF
  • Luo, L., West, D., Marchionini, G., Blake, C. (2005) A Study of Annotations for a Consumer Health Portal. Extended Abstract in the Joint Conference on Digital Libraries, Denver, CO, USA. PDF

Personal Health Records: This project uses a grounded theory approach to identify characteristics that patients would require in a personal health record system.

  • Wildemuth, B.M., Blake, C.L., Spurgin, K., Oh, S., Zhang, Y. (2006) Patients’ Perspectives on Personal Health Records: An Assessment of Needs and Concerns, Critical Issues in eHealth Research, Bethesda, Maryland. Doc and Poster

Support

I am grateful to the study participants and to the following organizations for providing financial support that has made much of this research possible.

  • Renaissance Computing Institute (RENCI) - 2007 Faculty Fellowship
  • The Computing Research Association (CRAW) - 2007 grant from the Multidisciplinary Research Opportunity for Women (joint work with Sue West in the School of Public Health)
  • Center for Environmentally Responsible Solvents and Processes (CERSP) - 2004-2006 project on 'Text Mining Chemistry Literature'
  • IBM SUR - 2005 grant for equipment
  • University of North Carolina at Chapel Hill - 2006 UNC Faculty Fellowship
  • University of North Carolina at Chapel Hill, School of Information and Library Science - Start-up funds & ongoing support (2004-current)
  • Lineberger Cancer Center - Start-up funds and ongoing support (2004-current)
  • The California Breast Cancer Research Program - 2002 Dissertation award
  • University of California, Irvine - 2001 Interdisciplinary grant to study information extraction
  • University of California, Irvine, School of Information and Computer Science - 1997 and 2003 Dissertation Fellowship and RA and TA support throughout my doctoral studies.
  • University of Wollongong, Australia - 1994 Education Abroad Scholarship

Previous Research Interests

Intelligent Manufacturing Systems (IMS): Before returning to graduate school, I was a research scientist at the BHP Research Laboratories in Melbourne (MRL) where I applied machine learning techniques to manufacturing processes. The approach leveraged work in the Global Manufacturing (GLOBEMAN) and Holonics Consortia. These papers describe a data and learning framework that would be used by agents in an IMS.

  • Blake, C. (1996). An Overview of Quinlan's C4.5 Algorithm. BHP TechNote BHPR/CP/N/015, Melbourne Research Laboratories, Australia.
  • Blake, C. (1996). Data Access and Data Integrity in a Geographically Distributed Environment-'Global Information Warehouse' BHP TechNote BHPR/CP/N/029, Melbourne Research Laboratories, Australia.