PubMed Mapper: A Python library that maps PubMed XML to Python objects

Last update: Dec 08, 2022

Overview

pubmed-mapper: A Python library that maps PubMed XML to Python objects

Chinese documentation

1. Philosophy

Programmatically accessing PubMed articles is a common task for me. Luckily, NCBI E-utilities (eutils) gives access to full article data in XML format. What I need is Python objects, not just XML strings, so pubmed-mapper was born.

2. Installation

pip install pubmed-mapper

3. Usage

3.1 Use as a library

3.1.1 Parse a PubMed ID

from pubmed_mapper import Article


article = Article.parse_pmid('32329900')

# PubMed ID
print(article.pmid)  # 32329900

# ids
print(article.ids)  # [pubmed: 32329900, doi: 10.1111/jgs.16467]
print(article.ids[1].id_type)  # doi
print(article.ids[1].id_value)  # 10.1111/jgs.16467

# title
print(article.title)  # Associations of Coffee...

# abstract
print(article.abstract)  # <p><strong>Background: </strong>Coffee and tea...

# keywords
print(article.keywords)  # ['aging', 'coffee; diet; longevity', 'tea']

# MeSH headings
print(article.mesh_headings)  # ['Aged', 'Body Mass Index', '...']

# authors
print(article.authors)  # [Shadyab AH Aladdin H, Manson JE JoAnn E, ...]
print(article.authors[0].last_name)  # Shadyab
print(article.authors[0].forename)  # Aladdin H
print(article.authors[0].initials)  # AH
print(article.authors[0].affiliation)  # Department of Family...

# journal
print(article.journal)  # Journal of the American Geriatrics Society
print(article.journal.issn)  # 1532-5415
print(article.journal.issn_type)  # Electronic
print(article.journal.title)  # Journal of the American Geriatrics Society
print(article.journal.abbr)  # J Am Geriatr Soc

# volume
print(article.volume)  # 68

# issue
print(article.issue)  # 9

# references
print(article.references)  # [n. 2013;129:643-659....]
print(article.references[0].citation)  # Lotfield E, Freedman ND...
print(article.references[0].ids)  # []

# pubdate
print(article.pubdate)  # 2020-09-01
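
The attributes listed above can be combined into derived values such as a short citation string. The following is a minimal sketch, not part of pubmed-mapper: format_citation is a hypothetical helper, the stand-in object only mimics the attribute names shown above so the example runs offline, and pubdate is assumed to be a datetime.date (see FAQ 4.1).

```python
from datetime import date
from types import SimpleNamespace


def format_citation(article):
    """Build a short citation string from the attributes shown above."""
    names = ', '.join(f'{a.last_name} {a.initials}' for a in article.authors)
    year = article.pubdate.year  # assumes pubdate is a datetime.date
    return (f'{names}. {article.title} {article.journal.abbr}. '
            f'{year};{article.volume}({article.issue}).')


# Stand-in object (hypothetical) carrying the sample values printed above.
# With the library installed, an object returned by Article.parse_pmid(...)
# could be passed instead.
demo = SimpleNamespace(
    authors=[SimpleNamespace(last_name='Shadyab', initials='AH'),
             SimpleNamespace(last_name='Manson', initials='JE')],
    title='Associations of Coffee...',
    journal=SimpleNamespace(abbr='J Am Geriatr Soc'),
    volume='68',
    issue='9',
    pubdate=date(2020, 9, 1),
)

print(format_citation(demo))
```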

3.1.2 Parse a downloaded XML file

from lxml import etree
from pubmed_mapper import Article


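infile = 'xxx.xml'
# Open in binary mode so lxml can apply the encoding declared in the XML file
with open(infile, 'rb') as fp:
    root = etree.parse(fp)


articles = []
# Each PubmedArticle element maps to one Article object
for pubmed_article_element in root.xpath('/PubmedArticleSet/PubmedArticle'):
    article = Article.parse_element(pubmed_article_element)
    articles.append(article)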
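
To make clear what the XPath in the loop above selects, here is a small sketch that walks the same PubmedArticleSet/PubmedArticle structure using only the standard library. It does not use pubmed-mapper; the sample XML is hand-written with made-up PMIDs and titles, and list_articles is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample; real PubMed records carry many more elements.
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>11111111</PMID>
      <Article><ArticleTitle>First sample title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>22222222</PMID>
      <Article><ArticleTitle>Second sample title</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def list_articles(xml_text):
    """Return (pmid, title) pairs, one per PubmedArticle element."""
    root = ET.fromstring(xml_text)  # root is the PubmedArticleSet element
    pairs = []
    for element in root.findall('PubmedArticle'):
        pmid = element.findtext('MedlineCitation/PMID')
        title = element.findtext('MedlineCitation/Article/ArticleTitle')
        pairs.append((pmid, title))
    return pairs


print(list_articles(SAMPLE))
```

Each element yielded by the loop corresponds to what Article.parse_element receives in the example above.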

3.2 Use as a command-line tool

3.2.1 Parse a PubMed ID

pubmed-mapper pmid -p 32329900

3.2.2 Parse a single PubMed XML file

pubmed-mapper file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

3.2.3 Parse a directory that contains multiple PubMed XML files

pubmed-mapper directory -i data/ -o output/pubmed-mapper.jl
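
The commands above write their results to .jl files. Assuming that extension denotes JSON Lines (one JSON object per line), which this document does not state explicitly, the output could be read back along these lines. read_json_lines is a hypothetical helper, and no particular field names are assumed.

```python
import json


def read_json_lines(path):
    """Yield one decoded object per non-empty line of a JSON Lines file."""
    with open(path, encoding='utf-8') as fp:
        for line in fp:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

For example, iterating over read_json_lines('output/pubmed-mapper.jl') would then give one record per parsed article, without loading the whole file into memory.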

4. FAQs

4.1 There are many types of PubMed publication date. How do you convert them to datetime.date objects?

Parsing publication dates is hard work, and pubmed-mapper cannot yet parse every format. The formats it can parse, and the values they are parsed to, are:

publication date	parsed value
2021-03-13	2021-03-13
2021-03	2021-03-01
2021 Spring	2021-04-01
2021	2021-01-01
2021 Jan-Feb	2021-01-01
2021 Mar 13-15	2021-03-13
2021 Mar-2022 Jan	2021-03-01
2021-2022	2021-01-01
2021 Mar 13-Dec 15	2021-03-13
1976-1977 Winter	1976-01-01
1977-1978 Fall-Winter	1977-10-01
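
One way to read the table above: take the start of any range, and default a missing month or day to 1. The sketch below follows that reading and reproduces every row of the table. It is not pubmed-mapper's actual implementation; parse_pubdate is a hypothetical name, and the Summer and Autumn mappings are assumptions, since the table only shows Spring, Fall and Winter.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], start=1)}
# Winter, Spring and Fall follow the table; Summer and Autumn are assumed.
SEASONS = {'Winter': 1, 'Spring': 4, 'Summer': 7, 'Fall': 10, 'Autumn': 10}

# Numeric forms: 2021-03-13 or 2021-03
ISO_RE = re.compile(r'^(\d{4})-(\d{2})(?:-(\d{2}))?$')
# Start of a textual form: year, optional "-year", optional word, optional day
START_RE = re.compile(
    r'^(\d{4})(?:-\d{4})?(?:\s+([A-Za-z]+))?(?:\s+(\d{1,2}))?')


def parse_pubdate(text):
    """Return the start date of a publication date string, or None."""
    text = text.strip()
    match = ISO_RE.match(text)
    if match:
        year, month, day = match.groups()
        month, day = int(month), int(day or 1)
    else:
        match = START_RE.match(text)
        if not match:
            return None
        year, word, day = match.groups()
        month = 1
        if word:
            month = SEASONS.get(word.title()) or MONTHS.get(word[:3].title())
            if month is None:
                return None
        day = int(day or 1)
    try:
        return date(int(year), month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None


print(parse_pubdate('1977-1978 Fall-Winter'))
```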

4.2 What is the pubmed-mapper.log file generated by pubmed-mapper?

pubmed-mapper.log is the default log file generated by pubmed-mapper. You can change the file with the --log-file option:

pubmed-mapper --log-file my-custom.log file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl

You can consult this log file for more details about parsing.

4.3 How do I log more detailed messages to my log file?

Use the --log-level option to log more detailed messages:

pubmed-mapper --log-file my-custom.log --log-level DEBUG file -i data/pubmed21n0001.xml -o output/pubmed21n0001.jl
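
For readers curious how global options such as --log-file and --log-level can sit in front of a subcommand, here is a generic sketch using argparse and logging from the standard library. It is not pubmed-mapper's source code: the program name demo-mapper, the function names, the default level and the log format are all assumptions made for illustration.

```python
import argparse
import logging


def build_parser():
    """Parser with global logging options followed by a 'file' subcommand."""
    parser = argparse.ArgumentParser(prog='demo-mapper')
    parser.add_argument('--log-file', default='demo-mapper.log')
    # The default level is an assumption made for this sketch.
    parser.add_argument('--log-level', default='INFO',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR'])
    subparsers = parser.add_subparsers(dest='command', required=True)
    file_parser = subparsers.add_parser('file')
    file_parser.add_argument('-i', dest='infile', required=True)
    file_parser.add_argument('-o', dest='outfile', required=True)
    return parser


def configure_logging(log_file, log_level):
    """Send messages at or above log_level to log_file."""
    logger = logging.getLogger('demo-mapper')
    logger.setLevel(getattr(logging, log_level))
    handler = logging.FileHandler(log_file, encoding='utf-8')
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger
```

With this layout, a DEBUG level lets logger.debug(...) calls reach the file, while a higher level filters them out.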

Owner

灵魂工具人
