A fast streaming JSON parser for Python that generates SAX-like events using yajl

json-streamer

jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities of any JSON object. It is based on the fast C library 'yajl'. It is well suited to parsing JSON that streams in over a network, or JSON documents that are too large to hold in memory all at once.

Dependencies

git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install

Setup

pip3 install jsonstreamer

Also available on PyPI - https://pypi.python.org/pypi/jsonstreamer

Example

Shell

python -m jsonstreamer.jsonstreamer < some_file.json

Code

variables which contain the input we want to parse

json_object = """
    {
        "fruits":["apple","banana", "cherry"],
        "calories":[100,200,50]
    }
"""
json_array = """[1,2,true,[4,5],"a"]"""

a catch-all event listener function which prints the events

def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))

JSONStreamer Example

Event listeners get events in their parameters and must have appropriate signatures for receiving their specific event of interest.

JSONStreamer provides the following events:

  • doc_start
  • doc_end
  • object_start
  • object_end
  • array_start
  • array_end
  • key - this also carries the name of the key as a string param
  • value - this also carries the value of an object member as a string|int|float|boolean|None param
  • element - this also carries an array item as a string|int|float|boolean|None param

Listener signatures must match the event they handle.

For the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener takes no parameters:

def listener():
    pass

Or, if your listener is a method of a class, it takes the usual additional 'self' parameter:

def listener(self):
    pass

For the events key, value and element, the listener must also accept the payload as a parameter:

def key_listener(key_string):
    pass
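
Why the signatures have to match can be seen in a minimal dispatcher sketch. This is not jsonstreamer's own code; `MiniEmitter`, `add_listener` and `fire` as written here are hypothetical stand-ins that only assume each listener is called with exactly the payload of its event.

```python
from collections import defaultdict


class MiniEmitter:
    """Hypothetical stand-in for an event source; not jsonstreamer's code."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def add_listener(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def fire(self, event_name, *args):
        # Each listener receives exactly the event's payload, so its
        # parameter list has to accept that many arguments.
        for listener in self._listeners[event_name]:
            listener(*args)


seen = []
emitter = MiniEmitter()
emitter.add_listener('doc_start', lambda: seen.append('doc_start'))
emitter.add_listener('key', lambda key_string: seen.append(('key', key_string)))

emitter.fire('doc_start')      # no payload
emitter.fire('key', 'fruits')  # one payload argument

# A listener with the wrong arity fails as soon as its event fires.
emitter.add_listener('doc_end', lambda payload: None)
try:
    emitter.fire('doc_end')
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
```

A listener declared with a parameter for a payload-free event raises a TypeError at dispatch time, which is the failure mode the signature rules above are meant to prevent.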

import and run jsonstreamer on 'json_object'

from jsonstreamer import JSONStreamer 

print("\nParsing the json object:")
streamer = JSONStreamer() 
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10]) #note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()

output

Parsing the json object:
    doc_start : ()
    object_start : ()
    key : ('fruits',)
    array_start : ()
    element : ('apple',)
    element : ('banana',)
    element : ('cherry',)
    array_end : ()
    key : ('calories',)
    array_start : ()
    element : (100,)
    element : (200,)
    element : (50,)
    array_end : ()
    object_end : ()
    doc_end : ()
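
The event sequence printed above carries enough information to rebuild the original document. The sketch below is one possible way to do that; `build_from_events` is a hypothetical helper, not part of jsonstreamer, and it assumes the events arrive as (event_name, args) tuples in the shape the catch-all listener receives them.

```python
def build_from_events(events):
    """Rebuild a Python value from (event_name, args) tuples."""
    root = None
    stack = []  # each entry: [open container, pending key or None]

    def attach(value):
        nonlocal root
        if not stack:
            root = value
            return
        container, pending_key = stack[-1]
        if isinstance(container, dict):
            container[pending_key] = value
        else:
            container.append(value)

    for name, args in events:
        if name in ('object_start', 'array_start'):
            container = {} if name == 'object_start' else []
            attach(container)
            stack.append([container, None])
        elif name in ('object_end', 'array_end'):
            stack.pop()
        elif name == 'key':
            stack[-1][1] = args[0]
        elif name in ('value', 'element'):
            attach(args[0])
        # doc_start and doc_end carry no data and are ignored
    return root


# The events shown in the output above, written out as tuples.
events = [
    ('doc_start', ()), ('object_start', ()),
    ('key', ('fruits',)), ('array_start', ()),
    ('element', ('apple',)), ('element', ('banana',)), ('element', ('cherry',)),
    ('array_end', ()),
    ('key', ('calories',)), ('array_start', ()),
    ('element', (100,)), ('element', (200,)), ('element', (50,)),
    ('array_end', ()),
    ('object_end', ()), ('doc_end', ()),
]
rebuilt = build_from_events(events)
```

With a real streamer, the list could be filled by a catch-all listener such as `lambda name, *args: events.append((name, args))`. Rebuilding the whole document defeats the purpose of streaming for large inputs, so this is mainly useful for checking one's understanding of the event order.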

run jsonstreamer on 'json_array'

print("\nParsing the json array:")
streamer = JSONStreamer() #can't reuse old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()

output

Parsing the json array:
    doc_start : ()
    array_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    array_start : ()
    element : (4,)
    element : (5,)
    array_end : ()
    element : ('a',)
    array_end : ()
    doc_end : ()
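
Both runs above pass the input in two slices. For a file or socket-like source, the same pattern can be wrapped in a small loop. This is a sketch under assumptions: `feed_in_chunks` and `RecordingStreamer` are hypothetical names, and the only things taken from the examples above are the `consume()` and `close()` calls.

```python
import io


def feed_in_chunks(streamer, source, chunk_size=4096):
    """Read `source` piece by piece and push each piece into `streamer`."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        streamer.consume(chunk)
    streamer.close()


class RecordingStreamer:
    """Hypothetical stand-in exposing the same consume()/close() calls."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def consume(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


text = '[1,2,true,[4,5],"a"]'
recorder = RecordingStreamer()
feed_in_chunks(recorder, io.StringIO(text), chunk_size=5)
```

In real use, a fresh JSONStreamer or ObjectStreamer with listeners attached would take the place of `RecordingStreamer`, and `source` would be an open file.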

ObjectStreamer Example

ObjectStreamer provides the following events:

  • object_stream_start
  • object_stream_end
  • array_stream_start
  • array_stream_end
  • pair
  • element

import and run ObjectStreamer on 'json_object'

from jsonstreamer import ObjectStreamer

print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()

output

Parsing the json object:
    object_stream_start : ()
    pair : (('fruits', ['apple', 'banana', 'cherry']),)
    pair : (('calories', [100, 200, 50]),)
    object_stream_end : ()

run the ObjectStreamer on the 'json_array'

print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()

output - note that the events are different for an array

Parsing the json array:
    array_stream_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    element : ([4, 5],)
    element : ('a',)
    array_stream_end : ()
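
The payload shapes in the two outputs differ: a pair event carries one (key, value) tuple, while an element event carries the item itself. A small collector makes this explicit. `TopLevelCollector` is a hypothetical class for illustration; here its methods are called directly with the payloads printed above rather than by a real ObjectStreamer.

```python
class TopLevelCollector:
    """Hypothetical listener holder for ObjectStreamer-style events."""

    def __init__(self):
        self.pairs = {}
        self.elements = []

    def on_pair(self, pair):
        # A pair arrives as a single 2-tuple, not as two arguments.
        key, value = pair
        self.pairs[key] = value

    def on_element(self, element):
        self.elements.append(element)


collector = TopLevelCollector()

# Payloads copied from the object output above.
collector.on_pair(('fruits', ['apple', 'banana', 'cherry']))
collector.on_pair(('calories', [100, 200, 50]))

# Payloads copied from the array output above.
for item in (1, 2, True, [4, 5], 'a'):
    collector.on_element(item)

total_calories = sum(collector.pairs['calories'])
```

Note that nested containers such as `[4, 5]` arrive fully assembled, so each top-level entity must fit in memory even though the document as a whole need not.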

Example on attaching listeners for various events

ob_streamer = ObjectStreamer()

def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0],pair[1]))
    
ob_streamer.add_listener('pair', pair_listener) #same for JSONStreamer
ob_streamer.consume(json_object)

ob_streamer.remove_listener(pair_listener) #if you need to remove the listener explicitly

Even easier way of attaching listeners

class MyClass:
    
    def __init__(self):
        self._obj_streamer = ObjectStreamer() #same for JSONStreamer
        
        # this automatically finds listeners in this class and attaches them if they are named
        # using the convention '_on_eventname'; see the method names in this class
        self._obj_streamer.auto_listen(self) 
    
    def _on_object_stream_start(self):
        print ('Root Object Started')
        
    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0],pair[1]))
        
    def parse(self, data):
        self._obj_streamer.consume(data)
        
        
m = MyClass()
m.parse(json_object)
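
The '_on_eventname' convention can be pictured as a lookup over an object's attributes. The sketch below shows one way such discovery could work; `find_listeners` is a hypothetical helper and is not claimed to be how auto_listen is implemented.

```python
def find_listeners(obj, prefix='_on_'):
    """Map event names to bound methods named '<prefix><event_name>'."""
    listeners = {}
    for attr_name in dir(obj):
        if attr_name.startswith(prefix):
            candidate = getattr(obj, attr_name)
            if callable(candidate):
                listeners[attr_name[len(prefix):]] = candidate
    return listeners


class Handler:
    def __init__(self):
        self.log = []

    def _on_object_stream_start(self):
        self.log.append('start')

    def _on_pair(self, pair):
        self.log.append(pair)

    def helper(self):
        # Not picked up: the name does not follow the convention.
        pass


handler = Handler()
listeners = find_listeners(handler)
listeners['object_stream_start']()
listeners['pair'](('fruits', ['apple']))
```

Because the lookup is purely name-based, a typo in a method name means the listener is silently never attached, which is worth checking first when an expected event does not seem to arrive.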

Troubleshooting

  • If you get an OSError('Yajl cannot be found.'), please ensure that libyajl is available in the relevant directory. For example, on Mac (OS X), /usr/local/lib should contain "libyajl.dylib". The equivalent file is libyajl.so on Linux and yajl.dll on Windows.
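
A quick way to check whether the operating system can locate yajl at all is to ask the standard library. This is a general diagnostic sketch using `ctypes.util.find_library` and `ctypes.CDLL`; the helper name `locate_yajl` is hypothetical, and jsonstreamer's own loader may search different names or paths, so a positive result here does not guarantee that the import succeeds.

```python
import ctypes
import ctypes.util


def locate_yajl():
    """Return (name, loadable) for the yajl shared library, if any."""
    name = ctypes.util.find_library('yajl')
    if name is None:
        return None, False
    try:
        ctypes.CDLL(name)
    except OSError:
        return name, False
    return name, True


name, loadable = locate_yajl()
print('yajl library:', name, '| loadable:', loadable)
```

If `name` is None, the library is not on the default search path and installing it (or adjusting the path) is the first thing to try.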
Comments
  • Trouble using 'jsonstreamer` with 'yajl-2' on Ubuntu 14.04

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API..

    As directed, I have installed yajl on my Ubuntu 14.04 system and also verified its presence and correct installation (refer: [1] & [2])

    Still, on running the command python3 -m jsonstreamer.jsonstreamer < test.json i.e. using it with jsonstreamer gives me the following :

      File "/usr/local/lib/python3.4/dist-packages/jsonstreamer/yajl/parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.
    

    Following up in https://github.com/lloyd/yajl/issues/190 it seems that there might be an issue in the parse.py file itself ? Maybe it's looking for yajl1 and not yajl2.

    Any pointers on this one ? Help appreciated.


    [1] Running gcc -lyajl yields:

    $ gcc -lyajl
    ....
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: error: ld returned 1 exit status
    

    [2] And sudo ldconfig -p | grep yajl results in:

    $ sudo ldconfig -p | grep yajl
        libyajl.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libyajl.so.2
    
    opened by jigyasa-grover 10
  • Ensure exception __str__ methods return strings

    Hi there,

    Issues that throw JSONStreamerException classes are difficult to debug because there is no expectation that a str will be returned. This makes debugging a PITA.

    awesome_module.py", line 51, in map_step
        url + '\n' + str(e))
    TypeError: __str__ returned non-string (type bytes)
    
    opened by mach-kernel 3
  • Missing tests & tags

    PyPI has 1.3.6 , and no tests.

    GitHub only has a tag for v1.0.0 , so I cant use that.

    Could you tag v1.3.6 in GitHub, so I can use it to get tests, and finish https://build.opensuse.org/package/show/home:jayvdb:py-new/python-jsonstreamer after https://github.com/kashifrazzaqui/again/issues/8 is also fixed.

    opened by jayvdb 2
  • SyntaxError: invalid syntax

    Traceback (most recent call last):
      File "test_jsonstreamer.py", line 3, in <module>
        from jsonstreamer import JSONStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/jsonstreamer.py", line 12, in <module>
        from again import events
      File "/usr/local/lib/python2.7/dist-packages/again/__init__.py", line 4, in <module>
        from .events import EventSource, AsyncEventSource
      File "/usr/local/lib/python2.7/dist-packages/again/events.py", line 49
        yield from each(*args, **kwargs)
                 ^
    SyntaxError: invalid syntax

    python --version
    Python 2.7.3

    opened by tuhaolam 2
  • Want to split a 22M JSON file into smaller files to track a problem

    I have a large JSON file that has an error somewhere. I want to split up the JSON file into smaller files that are also JSON so that I can find out where the error is. Is this possible with your package?

    opened by winash12 1
  • Trouble using 'jsonstreamer` with 'yajl' on Windows 10

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API..

    As directed, I have installed yajl on my Windows 10 system and installed it as below:

    C:\Users\mianand\Downloads\lloyd-yajl-2.1.0-0-ga0ecdde\lloyd-yajl-66cb08c\build>nmake install

    Microsoft (R) Program Maintenance Utility Version 14.00.24210.0 Copyright (C) Microsoft Corporation. All rights reserved.

    [ 30%] Built target yajl_s
    [ 60%] Built target yajl
    [ 66%] Built target yajl_test
    [ 72%] Built target gen-extra-close
    [ 78%] Built target json_reformat
    [ 84%] Built target json_verify
    [ 90%] Built target parse_config
    [100%] Built target perftest
    Install the project...
    -- Install configuration: "Release"
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.dll
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl_s.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_parse.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_gen.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_common.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_tree.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_version.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/share/pkgconfig/yajl.pc
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_reformat.exe
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_verify.exe

    Still, on running the conda with python 3.6 gives me the following :

    from jsonstreamer import JSONStreamer
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\jsonstreamer.py", line 14, in <module>
        from .yajl.parse import YajlParser, YajlListener, YajlError
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 32, in <module>
        yajl = load_lib()
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.

    Any pointers on this one ? Help appreciated.

    opened by mitendraanand 1
  • Not looking for yajl.dll when loading Yajl

    In the method load_lib(), there is never an attempt to load Yajl from yajl.dll, which is the name of Yajl on windows. I think it would be rather easy to add this, and make this package useful on Windows as well.

    opened by Groomtar 1
  • pypi version ahead of master branch

    Please update the PyPI entry of json-streamer https://pypi.python.org/pypi/jsonstreamer/1.3.6 and consider linking there from the short text description here.

    opened by johnyf 1
  • outdated pypi package

    Hi,

    Could you update the pypi package? As far as I see, there were some commits since the last pypi upload. Also, I think it is a bit confusing that there is one tagged release, which is 1.0, while pypi package has 1.3.6 version number, but both of them almost a year older than some important fixes, e.g. the exponential floats. (I can install the file on my own, but I think it would be nice to update the releases.)

    opened by dvolgyes 0
Releases(v1.3.8)
Owner
Kashif Razzaqui
https://medium.com/@kashifrazzaqui