BMC Bioinformatics. 2009 Feb 5;10 Suppl 2(Suppl 2):S6.
doi: 10.1186/1471-2105-10-S2-S6.

Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text

Yael Garten et al. BMC Bioinformatics.

Abstract

Background: Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed across many journals. It is therefore challenging to identify important associations between drugs and molecular entities, particularly genes and gene variants, and these critical connections are often lost. Text mining techniques can convert free text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input to text mining engines is key, as abstracts often do not contain sufficient information to identify these pharmacogenomic associations.

Results: Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively.
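The percentages above read as per-category recall against the curated gold standard, although the abstract does not name the metric. Under that assumption, a minimal Python sketch of the calculation follows. The function name `recall` and the example gene sets are hypothetical and are not data from the paper.

```python
def recall(gold, extracted):
    """Fraction of gold-standard concepts that also appear in the extracted set."""
    gold = {g.lower() for g in gold}
    extracted = {e.lower() for e in extracted}
    if not gold:
        return 0.0
    return len(gold & extracted) / len(gold)


# Hypothetical curated annotations for one article (not data from the paper).
gold_genes = {"ABCB1", "CYP2C9", "CYP2D6", "VKORC1"}
# Hypothetical tool output: three correct genes plus one not in the gold set.
found_genes = {"ABCB1", "CYP2C9", "CYP2D6", "TPMT"}

gene_recall = recall(gold_genes, found_genes)
print(f"gene recall: {gene_recall:.0%}")
```

Extra concepts in the tool output (here `TPMT`) do not lower recall; they would only affect precision, which the abstract does not report.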

Conclusion: Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.

Figures

Figure 1
Pharmspresso pipeline for data processing. Full text PDFs of articles are downloaded, converted to text, and tokenized into individual words and sentences. Next, the text is parsed to identify words or phrases that are members of specific categories within the ontology. These are marked as such and indexed for future search accessibility.
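The tokenize, categorize, and index stages described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Pharmspresso or Textpresso implementation: the `LEXICON` contents, the `POLYMORPHISM` pattern, the naive sentence splitter, and all function names are hypothetical, and PDF download and conversion are omitted.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon standing in for the ontology categories.
LEXICON = {
    "gene": {"abcb1", "cyp2c9", "cyp2d6"},
    "drug": {"tacrolimus", "warfarin", "codeine"},
}
# Assumed pattern for nucleotide-substitution polymorphisms such as C3435T.
POLYMORPHISM = re.compile(r"^[ACGT]\d+[ACGT]$")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens (keeping '*' for star alleles)."""
    return re.findall(r"[A-Za-z0-9*]+", sentence)


def categories_of(token):
    """Return the set of category names this token belongs to."""
    cats = {c for c, terms in LEXICON.items() if token.lower() in terms}
    if POLYMORPHISM.match(token):
        cats.add("polymorphism")
    return cats


def build_index(text):
    """Map each category to the numbers of the sentences that contain a member."""
    sentences = split_sentences(text)
    index = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            for cat in categories_of(token):
                index[cat].add(i)
    return sentences, index


# Hypothetical input text, not taken from the corpus.
text = ("ABCB1 variants alter tacrolimus exposure. "
        "The C3435T polymorphism was genotyped. "
        "No drug was given to controls.")
sentences, index = build_index(text)
print(dict(index))
```

Indexing by sentence number rather than by document is what later allows queries to require that a keyword and a category member co-occur in the same sentence.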
Figure 2
Pharmspresso search page. The user is searching for text that includes the keyword 'ABCB1' as well as a member of the {drug} category and a member of the {polymorphism} category, within the abstract or full text.
Figure 3
Pharmspresso results page. Results for the search shown in Figure 2. There are eight publications (from the corpus of 1025 in Pharmspresso) that include a total of 20 sentences fulfilling the query conditions. Users may view the sentences in each of these articles that match the query. The number of matches indicates the number of sentences containing the query keywords and categories.
Figure 4
Marked-up sentences found in the corpus that match the user query. Sentences matching the query are color-coded with keywords and categories highlighted. In this example, 'tacrolimus' is a member of the {drug} category, and 'G2677T' and 'C3435T' are members of the {polymorphism} category. Pharmspresso displays the title and sentence number within the text.
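A sentence-level query of the kind shown in Figures 2 to 4 (one keyword plus one member each of the {drug} and {polymorphism} categories) could be sketched as follows. Everything here is an assumption made for illustration: the `DRUGS` set, the regular expressions, the function names, the plain-text tags that stand in for color highlighting, and the example sentences, which are invented rather than taken from the corpus.

```python
import re

# Hypothetical category lexicon; the real tool draws these from its ontology.
DRUGS = {"tacrolimus", "warfarin", "codeine"}
# Assumed pattern for substitutions written like G2677T or C3435T.
POLYMORPHISM = re.compile(r"\b[ACGT]\d+[ACGT]\b")
DRUG = re.compile(r"\b(" + "|".join(sorted(DRUGS)) + r")\b", re.IGNORECASE)


def keyword_pattern(keyword):
    """Whole-word, case-insensitive pattern for a literal keyword."""
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def matches(sentence, keyword):
    """True if the sentence has the keyword, a drug, and a polymorphism."""
    return bool(
        keyword_pattern(keyword).search(sentence)
        and DRUG.search(sentence)
        and POLYMORPHISM.search(sentence)
    )


def mark_up(sentence, keyword):
    """Wrap keyword and category hits in plain-text tags instead of colors."""
    out = POLYMORPHISM.sub(
        lambda m: f"<polymorphism>{m.group(0)}</polymorphism>", sentence)
    out = DRUG.sub(lambda m: f"<drug>{m.group(0)}</drug>", out)
    out = keyword_pattern(keyword).sub(
        lambda m: f"<keyword>{m.group(0)}</keyword>", out)
    return out


# Invented example sentences, numbered by position as in the results view.
sentences = [
    "ABCB1 G2677T and C3435T genotypes were related to tacrolimus dose.",
    "ABCB1 expression was measured in all patients.",
    "Tacrolimus was given twice daily.",
]
hits = [(i, mark_up(s, "ABCB1"))
        for i, s in enumerate(sentences) if matches(s, "ABCB1")]
for number, marked in hits:
    print(number, marked)
```

Only the first sentence satisfies all three conditions; the other two each miss at least one category, which mirrors how the match count in the results page refers to sentences rather than to articles.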
Figure 5
Pharmspresso retrieves sentences from full text that are not found when scanning the abstract only. The user queried for the keyword 'warfarin' plus a member of the {polymorphism} category. Results show that the article titled 'Relative impact of covariates in prescribing warfarin according to CYP2C9 genotype' contains such a sentence, but this sentence would not be found by reading the abstract only, as it is sentence number 132 in the article, which appears in the 'Discussion' section. Although the 'star notation' (*2, *3) is used earlier in the article to describe gene variants, explicit genomic location information that can be used to map this polymorphism is first given in sentence 132.
Figure 6
Pharmspresso retrieves a fact from a referenced article. The user queried for both keywords 'CYP2D6' and 'codeine' and a member of the {polymorphism} category. Although the article ('Functional Analysis of Six Different Polymorphic CYP1B1 Enzyme Variants Found in an Ethiopian Population') discusses the gene 'cytochrome P450 1B1' and not 2D6, it cites knowledge from a referenced article regarding a polymorphism in CYP2D6 (not in CYP1B1) and its effect on affinity for codeine. Thus, this article is retrieved in response to the query.