Text analytics for life science using the Unstructured Information Management Architecture
IBM Systems Journal, Sept, 2004 by R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, L.V. Subramaniam
In this sense, BioTeKS is a middleware system that can be used to build applications for many application-level tasks. Many important details of how BioTeKS is used will vary with the requirements of the target application. For example, different sets of annotators will be needed for different application-level functions. We also anticipate that text annotators built for BioTeKS will need to be customized or extended for specific research or potential customer applications. We know, for example, that pharmaceutical companies have internal knowledge resources, including document collections and thesauri of terms and concepts, that are proprietary to how these companies conduct research and development. Using BioTeKS to solve specific problems would require incorporating such additional resources (dictionaries, etc.) into the BioTeKS annotators.
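To make the customization point concrete, here is a minimal dictionary-lookup annotator whose term list can be extended with an additional, in-house thesaurus. It is a hypothetical Python illustration, not the BioTeKS or UIMA API. The class names (`DictionaryAnnotator`, `Annotation`), the case-insensitive whole-word matching, and the example terms (including the compound code `XYZ-123`) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    label: str   # semantic type, e.g. "GENE"
    text: str    # covered text as it appears in the document


class DictionaryAnnotator:
    """Hypothetical dictionary-lookup annotator; not the BioTeKS or UIMA API."""

    def __init__(self, terms: dict[str, str]) -> None:
        # Lower-cased surface form -> semantic label.
        self.terms = {t.lower(): label for t, label in terms.items()}
        self._compile()

    def _compile(self) -> None:
        # Try longer terms first so a multi-word term beats its own prefix.
        alternatives = sorted(self.terms, key=len, reverse=True)
        if not alternatives:
            self.regex = None
            return
        pattern = "|".join(re.escape(t) for t in alternatives)
        self.regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)

    def extend(self, extra_terms: dict[str, str]) -> None:
        # Merge an additional (e.g. proprietary) thesaurus and rebuild the matcher.
        self.terms.update({t.lower(): label for t, label in extra_terms.items()})
        self._compile()

    def annotate(self, text: str) -> list[Annotation]:
        if self.regex is None:
            return []
        return [
            Annotation(m.start(), m.end(), self.terms[m.group(0).lower()], m.group(0))
            for m in self.regex.finditer(text)
        ]


annotator = DictionaryAnnotator({"BRCA1": "GENE", "breast cancer": "DISEASE"})
text = "Mutations in BRCA1 and XYZ-123 are linked to breast cancer."
print([(a.text, a.label) for a in annotator.annotate(text)])
# [('BRCA1', 'GENE'), ('breast cancer', 'DISEASE')]

annotator.extend({"XYZ-123": "COMPOUND"})  # hypothetical in-house compound code
print([(a.text, a.label) for a in annotator.annotate(text)])
# [('BRCA1', 'GENE'), ('XYZ-123', 'COMPOUND'), ('breast cancer', 'DISEASE')]
```

Rebuilding a single alternation pattern after each `extend` call keeps the sketch short. A production annotator would more likely use a trie or finite-state matcher and handle spelling variants and synonyms.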
The value of BioTeKS in this applied setting is twofold. First, it casts each application scenario into a similar sequence of application-design considerations: each application implies one or more types of documents (we have focused on MEDLINE abstracts), one or more types of text data, a configuration of one or more text annotators that can extract the relevant types of text data, and one or more CAS consumers that translate annotations into persistent forms for convenient access by upstream applications. The database model implied by the text-mining applications discussed earlier may or may not suit other applications. Second, the value of BioTeKS lies in the quality of its specific text-annotation methods. Developing and refining these methods is an ongoing process that involves both software engineering and basic research.
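The design sequence just described (documents, annotators, CAS consumers, persistent store) can be sketched as follows. This is a hypothetical Python stand-in, not UIMA's actual interfaces. `SimpleCAS`, the two toy annotators, `SqliteCasConsumer`, and the SQLite table layout are all assumptions, chosen only to show how annotations accumulate on a per-document structure and are then translated into relational rows that an application can query.

```python
import re
import sqlite3
from dataclasses import dataclass, field


@dataclass
class SimpleCAS:
    """Hypothetical stand-in for a CAS: one document plus the annotations on it."""
    doc_id: str
    text: str
    # Each annotation is (start, end, label, covered_text).
    annotations: list = field(default_factory=list)


def gene_annotator(cas: SimpleCAS) -> None:
    # Toy rule, not a real gene tagger: capital letters followed by digits ("BRCA1").
    for m in re.finditer(r"\b[A-Z]{2,}[0-9]+\b", cas.text):
        cas.annotations.append((m.start(), m.end(), "GENE", m.group(0)))


def disease_annotator(cas: SimpleCAS) -> None:
    # Toy dictionary lookup over a hard-coded two-term list.
    for term in ("breast cancer", "leukemia"):
        for m in re.finditer(re.escape(term), cas.text, re.IGNORECASE):
            cas.annotations.append((m.start(), m.end(), "DISEASE", m.group(0)))


class SqliteCasConsumer:
    """Hypothetical CAS consumer that persists annotations as relational rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS annotation "
            "(doc_id TEXT, start_pos INTEGER, end_pos INTEGER, label TEXT, covered TEXT)"
        )

    def process(self, cas: SimpleCAS) -> None:
        rows = [(cas.doc_id, *a) for a in cas.annotations]
        self.conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
        self.conn.commit()


def run_pipeline(docs, annotators, consumer) -> None:
    # One CAS per document: every annotator adds to it, then the consumer persists it.
    for doc_id, text in docs:
        cas = SimpleCAS(doc_id, text)
        for annotate in annotators:
            annotate(cas)
        consumer.process(cas)


conn = sqlite3.connect(":memory:")
docs = [
    ("doc-1", "BRCA1 mutations raise breast cancer risk."),
    ("doc-2", "TP53 is altered in some leukemia cases."),
]
run_pipeline(docs, [gene_annotator, disease_annotator], SqliteCasConsumer(conn))

# An application now sees plain SQL rows: gene/disease co-mentions per document.
pairs = conn.execute(
    "SELECT g.covered, d.covered FROM annotation g "
    "JOIN annotation d ON g.doc_id = d.doc_id "
    "WHERE g.label = 'GENE' AND d.label = 'DISEASE' ORDER BY g.doc_id"
).fetchall()
print(pairs)  # [('BRCA1', 'breast cancer'), ('TP53', 'leukemia')]
```

Keeping annotators and the consumer as separate, swappable pieces mirrors the point that each application picks its own annotator configuration and its own persistent form; the relational table here is just one possible choice.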
Conclusions
BioTeKS is a significant technical initiative within the IBM Research laboratories to integrate and customize a broad suite of text-analysis projects and technologies targeting problems in the domain of biomedical text analysis. The goal of the BioTeKS text-analysis methods is to convert initially unstructured text information into structured text data, commensurate with structured data derived from sources other than text (e.g., gene names in micro-array experiments, or drug, treatment, and disease references in clinical records). The domain of text-analysis problems that BioTeKS addresses is very broad, and many problems are the subject of basic research initiatives in industry and academic institutions.
Accordingly, both BioTeKS and UIMA are works in progress. Future work will focus on research issues and on solving real-world problems. Research will focus on improving the quality of BioTeKS annotators for both biomedical entities and the facts and relations associated with them. We need to better exploit emerging biomedical ontology resources in the process of IE. In addition, we believe machine learning techniques hold great promise for improving the quality of IE. Improving text-analysis quality also requires the development of, or access to, test beds or "gold standards" for correct identification of biomedical entities and relations. Lack of such test beds is a problem in the bioinformatics research community, as we noted earlier (see Reference 10), and it is equally problematic in analyzing clinical records and patents. Finally, a key requirement for making progress is the use of text analysis for the kinds of real-world drug-discovery and biomedical-discovery tasks surveyed in the introduction. The domain of potential IE problems is very large, and progress will require focus and feedback that we believe implies significant collaboration with domain experts (and problem solvers) in biomedical research and development organizations in both academia and industry.
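As an illustration of what a gold standard enables, the following sketch scores an annotator's output against reference annotations using exact-match precision, recall, and F1. The metric choice, the tuple format `(doc_id, start, end, label)`, the function name `score`, and the toy data are assumptions. The paper does not specify how BioTeKS annotators are evaluated, and partial-overlap matching is a common alternative.

```python
def score(predicted, gold):
    """Exact-match precision, recall and F1 of predicted annotations against a gold standard.

    Each annotation is a hashable tuple such as (doc_id, start, end, label); a
    prediction counts as correct only if an identical tuple is in the gold set.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: the annotator finds two entities correctly but labels "TP53" as a disease.
gold = {("doc-1", 0, 5, "GENE"), ("doc-1", 22, 35, "DISEASE"), ("doc-2", 0, 4, "GENE")}
predicted = {("doc-1", 0, 5, "GENE"), ("doc-1", 22, 35, "DISEASE"), ("doc-2", 0, 4, "DISEASE")}
p, r, f = score(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # precision=0.667 recall=0.667 f1=0.667
```

Under exact matching, a mislabeled span counts as both a false positive and a false negative, which is why a single labeling error lowers precision and recall at the same time.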
