Python wrapper for Stanford CoreNLP.

Overview

stanfordcorenlp


stanfordcorenlp is a Python wrapper for Stanford CoreNLP. It provides a simple API for text-processing tasks such as tokenization, part-of-speech tagging, named-entity recognition, constituency parsing, dependency parsing, and more.

Prerequisites

Java 1.8+ (Check with command: java -version) (Download Page)

Stanford CoreNLP (Download Page)

Py Version              CoreNLP Version
v3.7.0.1, v3.7.0.2      CoreNLP 3.7.0
v3.8.0.1                CoreNLP 3.8.0
v3.9.1.1                CoreNLP 3.9.1

Installation

pip install stanfordcorenlp
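
A quick, optional check that the package is importable (assuming pip installed it into the active environment):

$ python -c "import stanfordcorenlp"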

Example

Simple Usage

# Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print('Tokenize:', nlp.word_tokenize(sentence))
print('Part of Speech:', nlp.pos_tag(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parsing:', nlp.parse(sentence))
print('Dependency Parsing:', nlp.dependency_parse(sentence))

nlp.close()  # Do not forget to close! The backend server will consume a lot of memory.

Output format:

# Tokenize
[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']

# Part of Speech
[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]

# Named Entities
[(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]

# Constituency Parsing
(ROOT
  (S
    (NP
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))

# Dependency Parsing
[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]
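
Each dependency triple has the form (relation, head_index, dependent_index); token indices are 1-based and 0 denotes the virtual ROOT. A small hedged sketch (run before nlp.close()) that pairs each triple with its words:

# Render dependency arcs with the words they connect
tokens = nlp.word_tokenize(sentence)
for rel, head, dep in nlp.dependency_parse(sentence):
    head_word = 'ROOT' if head == 0 else tokens[head - 1]
    print('%s(%s-%d, %s-%d)' % (rel, head_word, head, tokens[dep - 1], dep))

For example, the triple (u'nsubjpass', 7, 2) renders as nsubjpass(located-7, University-2).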

Support for Other Human Languages

Note: you must download an additional model file and place it in the .../stanford-corenlp-full-2018-02-27 folder. For example, you should download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.

# -*- coding: utf-8 -*-
from stanfordcorenlp import StanfordCoreNLP

# Support for other human languages, e.g. Chinese
sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))
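
For reference, the Chinese pipeline should produce output along these lines (a hedged reconstruction based on the parse tree; tags may vary with the model version):

['清华', '大学', '位于', '北京', '。']
[('清华', 'NR'), ('大学', 'NN'), ('位于', 'VV'), ('北京', 'NR'), ('。', 'PU')]
[('清华', 'ORGANIZATION'), ('大学', 'ORGANIZATION'), ('位于', 'O'), ('北京', 'GPE'), ('。', 'O')]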

General Stanford CoreNLP API

Since this loads all the models, which requires more memory, initialize the server with more memory; 8 GB is recommended.

# General JSON output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print(nlp.annotate(sentence))  # annotate() returns the raw response string
nlp.close()

You can specify properties:

  • annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref (See Detail)

  • pipelineLanguage: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)

  • outputFormat: json, xml, text

text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'xml'}
print(nlp.annotate(text, properties=props))
nlp.close()
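
Because annotate() returns the server response as a raw string, JSON output has to be decoded by the caller. A minimal sketch, assuming the standard CoreNLP JSON layout (a sentences list whose tokens carry word and pos fields):

import json
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'path_to_corenlp')
props = {'annotators': 'tokenize,ssplit,pos', 'pipelineLanguage': 'en', 'outputFormat': 'json'}
result = json.loads(nlp.annotate(text, properties=props))  # text as defined above
for s in result['sentences']:
    # each token dict carries the surface form and its part-of-speech tag
    print([(t['word'], t['pos']) for t in s['tokens']])
nlp.close()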

Use an Existing Server

Start a CoreNLP server with the command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

And then:

# Use an existing server
nlp = StanfordCoreNLP('http://localhost', port=9000)
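
The wrapper then sends its requests to that server over HTTP instead of starting its own; a quick hedged check:

# Assumes the server started above is still listening on port 9000
print(nlp.word_tokenize('Hello world.'))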

Debug

import logging
from stanfordcorenlp import StanfordCoreNLP

# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)

# Show more info from the CoreNLP server itself
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()

Build

We use setuptools to package our project. You can build from the latest source code with the following command:

$ python setup.py bdist_wheel --universal

You will find the .whl file under the dist directory.

Comments
  •  INFO:root:Waiting until the server is available.

    Hi, I'm trying to use this library for a project where I need coreference resolution. I was testing the test.py file, and this message kept showing indefinitely. How do I fix this?

    Thanks in advance.

    opened by antoineChammas 13
  • Chinese coreference resolution produces no output

    When I add dcoref in the general API, there is no output at all.

    But when I remove dcoref from the pipeline, the output is correct. The English pipeline does not have this problem. In the source code, the Chinese and English handling is almost identical apart from the model files. Does this wrapper simply not support Chinese coreference resolution yet?

    The test code below works without problems.

    opened by chenjiaxiang 10
  • Problems with NER

    When I tried to run the demo code for NER (print 'Named Entities:', nlp.ner(sentence)), I got:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 146, in ner
        r_dict = self._request('ner', sentence)
      File "/usr/local/lib/pypy2.7/dist-packages/stanfordcorenlp/corenlp.py", line 171, in _request
        r_dict = json.loads(r.text)
      File "/usr/lib/pypy/lib-python/2.7/json/__init__.py", line 347, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 363, in decode
        obj, end = self.raw_decode(s, idx=WHITESPACE.match(s, 0).end())
      File "/usr/lib/pypy/lib-python/2.7/json/decoder.py", line 381, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded

    Does anyone know how to solve this problem? I use StanfordCoreNLP version 3.7.0

    opened by TwinkleChow 10
  • No JSON object error in test.py

    Hi

    When I ran the test with python 2.7, I got the following error:

    Initializing native server...
    java -Xmx4g -cp "/home/ehsan/Java/JavaLibraries/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
    The server is available.
    Traceback (most recent call last):
      File "test.py", line 9, in <module>
        print('Tokenize:', nlp.word_tokenize(sentence))
      File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 78, in word_tokenize
        r_dict = self._request('ssplit,tokenize', sentence)
      File "/home/ehsan/Python/stanford-corenlp/stanfordcorenlp/corenlp.py", line 114, in _request
        r_dict = json.loads(r.text)
      File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/__init__.py", line 339, in loads
        return _default_decoder.decode(s)
      File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 364, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/home/ehsan/anaconda3/envs/py27-test-corenlp/lib/python2.7/json/decoder.py", line 382, in raw_decode
        raise ValueError("No JSON object could be decoded")
    ValueError: No JSON object could be decoded
    
    opened by ehsanmok 10
  • JSONDecodeError: Expecting value: line 1 column 1

    I was trying to do NER with my text:

    import pandas as pd
    from stanfordcorenlp import StanfordCoreNLP

    documents = pd.read_csv('some csv file')['documents'].values.tolist()
    text = documents[0]  # text is a string

    nlp = StanfordCoreNLP('my_path\stanford-corenlp-full-2018-02-27')  # latest version
    print(nlp.ner(text))

    But I keep getting this error:

    raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    opened by yceny 6
  • FileNotFoundError

    I'm getting a file-not-found error as shown below:

    Traceback (most recent call last):
      File "D:/Users/[user]/Documents/NLP/arabic_tagger/build_model.py", line 6, in <module>
        nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g')
      File "C:\Users\[user]\AppData\Roaming\Python\Python36\site-packages\stanfordcorenlp\corenlp.py", line 46, in __init__
        if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0:
      File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 267, in call
        with Popen(*popenargs, **kwargs) as p:
      File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
        restore_signals, start_new_session)
      File "C:\Users\[user]\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
        startupinfo)
    FileNotFoundError: [WinError 2] The system cannot find the file specified

    I have used this wrapper before and am using it in the same way as always:

    corenlp_path = 'D:/Users/[user]/Desktop/StanfordCoreNLP/Full_CoreNLP_3.8.0'
    nlp = StanfordCoreNLP(corenlp_path, lang='ar', memory='4g')

    Just to be sure, I downloaded version 3.8.0 as well as the Arabic models and made sure they are in the path specified. I'm wondering if the FileNotFoundError is not referring to the CoreNLP path but to something else... subprocess.py is in the correct directory. So yeah... not sure what's wrong/what to do.

    Thanks!

    opened by mjrinker 5
  • Batch processing

    It is a great wrapper. Can you make it run as a batch process? It is too slow to run the pipeline separately for each new sentence, and I need to dependency-parse several sentences within seconds. Please look into the issue.

    opened by Subh1m 5
  • How to use stanfordcorenlp to replace all pronouns in a sentence with the nouns

    How can I use stanfordcorenlp to replace all pronouns in a sentence with their nouns? For example, the sentence: Fred Rogers lives in a house with pets. It is two stories, and he has a dog, a cat, a rabbit, three goldfish, and a monkey.

    This needs to be converted to: Fred Rogers lives in a house with pets. House is two stories, and Fred Rogers has a dog, a cat, a rabbit, three goldfish, and a monkey.

    I had used the code below (a hedged sketch of iterating the corefs output appears after this list):

    import json
    from stanfordcorenlp import StanfordCoreNLP

    nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2017-06-09', quiet=False)
    props = {'annotators': 'coref', 'pipelineLanguage': 'en'}

    text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.'
    result = json.loads(nlp.annotate(text, properties=props))

    mentions = result['corefs'].items()

    However, I cannot understand how to read through mentions and perform what I want to do.

    opened by manavpolikara 4
  • Connecting to an unavailable server doesn't throw an exception

    When trying to connect to an existing server that is unavailable, the code throws no exception and becomes unresponsive while trying to annotate, leading to timeout errors.

    opened by GanadiniAkshay 3
  • How to extract relations from stanford annotatedText?

    I am able to annotate text successfully using stanford-corenlp as follows:

    nlp = StanfordCoreNLP('http://localhost', port=9000)
    sentence = '''Michael James editor of Publishers Weekly,
                     Bill Gates is the owner of Microsoft,
                     Obama is the owner of Microsoft,
                     Satish lives in Hyderabad'''
    
    props={'annotators': 'tokenize, ssplit, pos, lemma, ner, regexner,coref',
           'regexner.mapping':'training.txt','pipelineLanguage':'en'}
    
    annotatedText = json.loads(nlp.annotate(sentence, properties=props))
    

I am trying to extract the relations from annotatedText, but it returns nothing:

    roles = """
        (.*(                   
        analyst|
        owner|
        lives|
        editor|
        librarian).*)|
        researcher|
        spokes(wo)?man|
        writer|
        ,\sof\sthe?\s*  # "X, of (the) Y"
        """
    ROLES = re.compile(roles, re.VERBOSE)
    for rel in nltk.sem.extract_rels('PERSON', 'ORGANIZATION', annotatedText, corpus='ace', pattern=ROLES):
        print(nltk.sem.rtuple(rel))
    
    

Can you please help with how to extract the relations from Stanford-annotated text using NLTK's nltk.sem.extract_rels?

    opened by satishkumarkt 3
  • AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH'

    Exception ignored in: <bound method StanfordCoreNLP.__del__ of <stanfordcorenlp.corenlp.StanfordCoreNLP object at 0x7f5d1e4856d8>>
    Traceback (most recent call last):
      File "/usr/local/lib/python3.5/dist-packages/stanfordcorenlp/corenlp.py", line 111, in __del__
      File "/usr/lib/python3/dist-packages/psutil/__init__.py", line 349, in __init__
      File "/usr/lib/python3/dist-packages/psutil/__init__.py", line 370, in _init
      File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 849, in __init__
      File "/usr/lib/python3/dist-packages/psutil/_pslinux.py", line 151, in get_procfs_path
    AttributeError: 'NoneType' object has no attribute 'PROCFS_PATH'

    opened by yzho0907 3
  • Unable to print word_tokenize Chinese word

    I used the example code:

    from stanfordcorenlp import StanfordCoreNLP
    
    # Other human languages support, e.g. Chinese
    sentence = '清华大学位于北京。'
    
    with StanfordCoreNLP(r'install_packages/stanford-corenlp-full-2016-10-31', lang='zh') as nlp:
        print(nlp.word_tokenize(sentence))
        print(nlp.pos_tag(sentence))
        print(nlp.ner(sentence))
        print(nlp.parse(sentence))
        print(nlp.dependency_parse(sentence))
    

    And my output is:

    ['', '', '', '', '']
    [('', 'NR'), ('', 'NN'), ('', 'VV'), ('', 'NR'), ('', 'PU')]
    [('', 'ORGANIZATION'), ('', 'ORGANIZATION'), ('', 'O'), ('', 'GPE'), ('', 'O')]
    (ROOT
      (IP
        (NP (NR 清华) (NN 大学))
        (VP (VV 位于)
          (NP (NR 北京)))
        (PU 。)))
    [('ROOT', 0, 3), ('compound:nn', 2, 1), ('nsubj', 3, 2), ('dobj', 3, 4), ('punct', 3, 5)]
    

It seems that parse prints the correct result, but the other methods don't. I don't know why this happens.

    opened by PolarisRisingWar 0
  • 'json.decoder.JSONDecodeError' Error

    With the latest version of the Chinese language package, the wrapper raises json.decoder.JSONDecodeError, caused by an HTTP 500 from the server. The function _request should be updated as follows (change the annotators configuration):

    def _request(self, annotators=None, data=None, *args, **kwargs):
        if sys.version_info.major >= 3:
            data = data.encode('utf-8')
        if annotators:
            # NEW language packages ({version} > 3.9.1.1) require the request
            # to begin with these two annotators: tokenize,ssplit
            annotators = 'tokenize,ssplit' + ',' + annotators
        properties = {'annotators': annotators, 'outputFormat': 'json'}
        params = {'properties': str(properties), 'pipelineLanguage': self.lang}
        if 'pattern' in kwargs:
            params = {"pattern": kwargs['pattern'], 'properties': str(properties), 'pipelineLanguage': self.lang}

        logging.info(params)
        r = requests.post(self.url, params=params, data=data, headers={'Connection': 'close'})
        print(self.url, r)
        r_dict = json.loads(r.text)
    
        return r_dict
    
    opened by mvllwong 0
  • How to extract data from an annotator

    How can I handle this kind of error? When it occurs, I use the syntax below.

    After several trials, I noticed that the JSON has to be loaded with this function, following this issue:

    import nltk, json, pycorenlp, stanfordcorenlp
    from clause import clauseSentence
    
    url_stanford = 'http://corenlp.run'
    model_stanford = "stanford-corenlp-4.0.0"
    try:
      nlp = pycorenlp.StanfordCoreNLP(url_stanford) # pycorenlp
    except:
      nlp = stanfordcorenlp.StanfordCoreNLP(model_stanford) # stanford nlp
    
    props={"annotators":"parse","outputFormat": "json"}
    sent = 'The product shall plot the data points in a scientifically correct manner'
    dataform = nlp.annotate(sent, properties=props)
    try:
      dt = clauseSentence().print_clauses(dataform['sentences'][0]['parse'])
    except:
      parser = json.loads(dataform)
      dt = clauseSentence().print_clauses(parser['sentences'][0]['parse'])
    print(dt)
    

But there is a problem: it results in JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    opened by asyrofist 0
  • Error with "%"

    When I use this tool to analyse a text which includes "%", it raises an error.

    I fixed it by changing the POST code in corenlp.py.

    opened by liuxin99 0
  • How to use the 4-class NER classifier instead of the default classifier

    Hi, I am trying to use the 4-class NER classifier but am not sure how to do this. I searched online and found a post, but I was unable to make it work with this Python package. Any help will be greatly appreciated. I tried to follow the link below without success: https://stackoverflow.com/questions/31711995/how-to-load-a-specific-classifier-in-stanfordcorenlp

    opened by KRSTD 0
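
Several of the questions above ask how to consume the corefs output (e.g. replacing pronouns with their antecedents). The sketch below is hedged, not an official recipe: it assumes the standard CoreNLP JSON coref layout, with 1-based sentNum and startIndex and an isRepresentativeMention flag; verify the field names against your CoreNLP version.

import json
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
text = 'Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008.'
props = {'annotators': 'tokenize,ssplit,pos,lemma,ner,parse,coref', 'pipelineLanguage': 'en'}
result = json.loads(nlp.annotate(text, properties=props))

# Rebuild the token matrix, then overwrite each pronominal mention with the
# representative mention of its coreference chain.
tokens = [[t['word'] for t in s['tokens']] for s in result['sentences']]
for chain in result['corefs'].values():
    rep = next(m['text'] for m in chain if m['isRepresentativeMention'])
    for m in chain:
        if m['type'] == 'PRONOMINAL':  # pronouns span a single token
            tokens[m['sentNum'] - 1][m['startIndex'] - 1] = rep
print(' '.join(' '.join(s) for s in tokens))
nlp.close()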
Releases: v3.9.1.1