Find graph motifs using intuitive notation

Overview

d o t m o t i f

Find graph motifs using intuitive notation

PyPI Codecov


DotMotif is a library that identifies subgraphs or motifs in a large graph. It looks like this:

# Look for all motifs of the form,

# Neuron A excites B:
A -> B [type = "excitatory"]
# ...and B inhibits C:
B -> C [type = "inhibitory"]

Or like this:

TwitterInfluencer(person) {
    # An influencer has more than a million
    # followers and is verified.
    person.followers > 1000000
    person.verified = true
}

InfluencerAwkward(person1, person2) {
    # Two people who are both influencers...
    TwitterInfluencer(person1)
    TwitterInfluencer(person2)
    # ...where one follows the other, but...
    person1 -> person2
    # ...the other doesn't follow back
    person2 !> person1
}

# Search for all awkward twitter influencer
# relationships in the dataset:
InfluencerAwkward(X, Y)

Get Started

To follow along in an interactive Binder without installing anything, launch a Jupyter Notebook here:

Binder

If you have DotMotif, a NetworkX graph, and a curious mind, you already have everything you need to start using DotMotif:

from dotmotif import Motif, GrandIsoExecutor

executor = GrandIsoExecutor(graph=my_networkx_graph)

triangle = Motif("""
A -> B
B -> C
C -> A
""")

results = executor.find(triangle)

Parameters

You can also pass optional parameters into the constructor for the dotmotif object. Those arguments are:

Argument Type, Default Behavior
ignore_direction bool: False Whether to disregard direction when generating the database query
limit int: None A limit (if any) to impose on the query results
enforce_inequality bool: False Whether to enforce inequality; in other words, whether two nodes should be permitted to be aliases for the same node. For example, in A->B->C; if A!=C, then set to True
exclude_automorphisms bool: False Whether to return only a single example for each detected automorphism. See more in the documentation

For more details on how to write a query, see Getting Started.


Citing

If this tool is helpful to your research, please consider citing it with:

# https://doi.org/10.1038/s41598-021-91025-5
@article{Matelsky_Motifs_2021, 
    title={{DotMotif: an open-source tool for connectome subgraph isomorphism search and graph queries}},
    volume={11}, 
    ISSN={2045-2322}, 
    url={http://dx.doi.org/10.1038/s41598-021-91025-5}, 
    DOI={10.1038/s41598-021-91025-5}, 
    number={1}, 
    journal={Scientific Reports}, 
    publisher={Springer Science and Business Media LLC}, 
    author={Matelsky, Jordan K. and Reilly, Elizabeth P. and Johnson, Erik C. and Stiso, Jennifer and Bassett, Danielle S. and Wester, Brock A. and Gray-Roncal, William},
    year={2021}, 
    month={Jun}
}
Comments
  • Neuprint Executor - Labeling Edges by ROI

    Neuprint Executor - Labeling Edges by ROI

    Hi Jordan,

    Do you see an easy way to assign ROI labels to edges in the neuprint executor? Let's say I want to query something like this:

    A -> B [weight > 20, ROI == "CX"]
    A -> B [weight > 30, ROI == "CRE(L)"] 
    

    So basically, there are two things here—multigraphs, which you address already in the docs, and encoding edge ROIs. I wonder if that's rather a hard thing to do or not. The data should be there as neuprint-python fetch_synapse_connections returns something like this

        bodyId_pre  bodyId_post roi_pre roi_post  x_pre  y_pre  z_pre  x_post  y_post  z_post  confidence_pre  confidence_post
    0    792368888    754547386  PED(R)   PED(R)  14013  27747  19307   13992   27720   19313           0.996         0.401035
    1    792368888    612742248  PED(R)   PED(R)  14049  27681  19417   14044   27662   19408           0.921         0.881487
    2    792368888   5901225361  PED(R)   PED(R)  14049  27681  19417   14055   27653   19420
    ...
    

    According to this issue it looks like it's possible. My observation is that the physical location of a connection between two neurons is an important feature of a motif. Looking forward to hearing what you say.

    EDIT: Maybe an indirect way to support multiple edges between two nodes is by grouping edge attributes. Does something like this seem plausible. You are doing smth similar in the multigraph docs already: A -> B [synapse_count > 2]. But what exactly is synapse_count?

    A -> B [[weight >= 20, ROI == "CX"], [weight > 30, ROI == "CRE(L)"]]
    

    Best, Jakob

    enhancement cypher Neo4jExecutor NeuPrintExecutor 
    opened by jakobtroidl 9
  • Error on first query

    Error on first query

    Tried to run the query from the tutorial:

    motif = Motif("""
    # My Awesome Motif
    
    Nose_Cell -> Brain_Cell
    Brain_Cell -> Arm_Cell
    """)
    

    But got this error:

    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-1-3a88159c0a0c> in <module>
    ----> 1 import dotmotif
          2 import networkx
          3 
          4 motif = Motif("""
          5 # My Awesome Motif
    
    ~\anaconda3\lib\site-packages\dotmotif\__init__.py in <module>
         24 from networkx.algorithms import isomorphism
         25 
    ---> 26 from .parsers.v2 import ParserV2
         27 from .validators import DisagreeingEdgesValidator
         28 
    
    ~\anaconda3\lib\site-packages\dotmotif\parsers\v2\__init__.py in <module>
         11 
         12 
    ---> 13 dm_parser = Lark(open(os.path.join(os.path.dirname(__file__), "grammar.lark"), "r"))
         14 
         15 
    
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\xxxx\\anaconda3\\lib\\site-packages\\dotmotif\\parsers\\v2\\grammar.lark'
    
    bug parser install 
    opened by lix2k3 9
  • Filtering By Properties w/ Invalid Characters in the Name

    Filtering By Properties w/ Invalid Characters in the Name

    Hey There, I'm using dotmotif to query the neuPrint dataset and have found some of the neurons have properties that aren't accepted in the query string format e.g. 'AVLP(R)': True,

    Is there a way to still query w/ these params? I tried adding directly to the _node_constraints but that doesn't seem to work either e.g.

    motif._node_constraints['A']['AVLP(R)'] = {}
    motif._node_constraints['A']['AVLP(R)']['='] = [True]
    
    Variable `R` not defined (line 2, column 83 (offset: 156))
    "    WHERE B.status = "Traced" AND A.status = "Orphan" AND A.INP = True AND A.AVLP(R) = True"
    
    parser cypher 
    opened by simonwarchol 7
  • fix: weight edge attribute doesn't throw errors anymore (#127)

    fix: weight edge attribute doesn't throw errors anymore (#127)

    The edge attribute in the neuprint executor threw an error with the new JSON feature implementation. I also made the neuprint executor tests more rigorous.

    opened by jakobtroidl 3
  • Upgrade grandiso version to use limits and iterable

    Upgrade grandiso version to use limits and iterable

    In grandiso v1.1.0 and above, there is an optional limit argument to the find_motifs call which short-circuits motif counting if a certain number of valid mappings are found.

    Right now, NetworkX and GrandIso executors implement the dotmotif limit parameter by finding all motifs and then downselecting, which is super inefficient and lame. We could pretty substantially improve performance by supporting the GrandIso limit arg.

    A notable challenge: We perform an additional downselect after running grandiso (to double-check attribute filters). So we may need to store a list of mappings temporarily in order to backfill the results list if candidate mappings are filtered out.

    enhancement GrandIsoExecutor 
    opened by j6k4m8 2
  • Non-string ids not supported by Neo4jExecutor

    Non-string ids not supported by Neo4jExecutor

    Ingesting a NetworkX graph with integer ids results in an error: ValueError: Could not export graph: unsupported operand type(s) for +: 'int' and 'str'. It should be straightforward to handle integers, though A node can be any hashable Python object except None. Maybe just cast with repr.

    question Neo4jExecutor 
    opened by jtpdowns 2
  • Support n constraints on each edge value-operator pair

    Support n constraints on each edge value-operator pair

    Currently, the parser overwrites previous operators if it's redefined:

    A -> B [value<=5, value<=2]
    

    ...will yield a constraint operator of

    { "value": { "lte": 2.0 } }
    

    (i.e. overwriting the first rule).

    bug parser 
    opened by j6k4m8 2
  • Node- and edge-attribute support in DSL

    Node- and edge-attribute support in DSL

    Proposed syntax concepts:

    Nodes

    Inline maplike:

    Node1 { type="GABA", z<12 } -> Node2
    

    Pros:

    • Succinct

    Cons:

    • Possible duplication or conflicting attributes if map is included on multiple lines for the same node

    Postfix where-like:

    Node1 -> Node2 | Node1.type = "GABA", Node1.z < 12
    

    Pros:

    • Succinct

    Cons:

    • Possible duplication or conflicting attributes if attrs are included on multiple lines for the same node

    Footnote constraints

    Node1 -> Node2
    
    Node1.type = "GABA"
    Node1.z < 12
    

    Pros:

    • Reduces possibility of conflicting constraints
    • Clear syntax; can be standalone in its own macro

    Cons:

    • Linecount verbose
    • Decouples attributes from connectivity clauses

    Edges

    Inline maplike:

    A ->{type: "excitatory", neurotransmitter: "ACh"} B
    

    Pros:

    • Inline

    Cons:

    • Reduces clarity of language

    Postfix where-like:

    A -> B | [type: "excitatory", neurotransmitter: "ACh"]
    

    Pros:

    • Inline

    Cons:

    • Reduces clarity of language

    Infix maplike:

    A -[type: "excitatory", neurotransmitter: "ACh"]> B
    

    Pros:

    • Inline

    Cons:

    • Reduces clarity of language
    enhancement DSL 
    opened by j6k4m8 2
  • Add macro edge aliases

    Add macro edge aliases

    This adds support for complex edge constraints in macros:

    decreasing_edge_weights(a, b, c) {
        a -> b as ab
        b -> c as bc
        ab.weight > bc.weight
    }
    
    ...
    

    In increasing levels of challengingness:

    • [x] Add support for simple (edge-value) edge constraints in macros
    • [x] Add support for dynamic (edge-edge) edge constraints in macros
    • [x] Extend support for recursive calls to macros with simple constraints
    • [x] Extend support for recursive calls to macros with dynamic constraints
    • [x] Update documentation

    This fixes #110 and finishes work started in #119.

    enhancement DSL parser 
    opened by j6k4m8 1
  • Add edge aliasing and edge constraints

    Add edge aliasing and edge constraints

    This PR adds support for edge aliases (first described in #110) and comparisons between edge attributes with values and with other edges.

    This enables syntax like this:

    A -> B as ab
    B -> A as ba
    
    ab.weight > ba.weight
    
    • [x] Add support in the DSL
    • [x] Add support in the parser + transformer
    • [x] Add support in the executors:
      • [x] GrandIso
      • [x] NetworkX
      • [x] NeuPrint
      • [x] Neo4j

    I am going to push macro support in a separate PR, since this one is getting pretty lengthy already!

    enhancement DSL parser cypher Neo4jExecutor NetworkXExecutor NeuPrintExecutor GrandIsoExecutor 
    opened by j6k4m8 1
  • Add node attribute bracket syntax

    Add node attribute bracket syntax

    Adds support for "bracket" syntax for node attributes. An attribute like XYZ(ABC) or ABC DEF used to be disallowed because of illegal characters in the attribute name, particularly when using the "dot-attribute" notation:

    # broken:
    A -> B
    A.ABC DEF > 10
    

    The new syntax uses bracket-attribute notation to "escape" these names:

    # working:
    import dotmotif
    from dotmotif.executors.NeuPrintExecutor import NeuPrintExecutor
    
    HOSTNAME = "neuprint.janelia.org"
    DATASET = "hemibrain:v1.2.1"
    TOKEN = "[YOUR TOKEN HERE]"
    
    motif = dotmotif.Motif("""
    A -> B
    A['AVLP(R)'] = True
    """)
    
    E = NeuPrintExecutor(HOSTNAME, DATASET, TOKEN)
    
    E.find(motif, limit=2)
    

    Fixes #111.

    parser cypher Neo4jExecutor NeuPrintExecutor 
    opened by j6k4m8 1
  • Add Impossible Constraints validator

    Add Impossible Constraints validator

    We should be able to automatically catch things like this:

    A.type = 4
    A.type != 4
    

    Right now, we'll catch them in certain instances, but not when constraints are inherited from automorphisms (see #118). Getting smarter about this will likely improve runtime considerably.

    enhancement validator 
    opened by j6k4m8 0
  • Anonymous motif participants

    Anonymous motif participants

    Anonymous motif participants:

    A -> _hidden
    _hidden -> B
    

    Anonymous node participants in macros:

    two_hop(A, B) {
        A -> _i
        _i -> B
    }
    
    two_hop(neuron1, neuron2)
    
    
    
    enhancement DSL parser 
    opened by j6k4m8 0
Releases(v0.13.0)
A collection of awesome sqlite tools, scripts, books, etc

Awesome Series @ Planet Open Data World (Countries, Cities, Codes, ...) • Football (Clubs, Players, Stadiums, ...) • SQLite (Tools, Books, Schemas, ..

Planet Open Data 205 Dec 16, 2022
MySQLdb is a Python DB API-2.0 compliant library to interact with MySQL 3.23-5.1 (unofficial mirror)

==================== MySQLdb Installation ==================== .. contents:: .. Prerequisites ------------- + Python 2.3.4 or higher * http://ww

Sébastien Arnaud 17 Oct 10, 2021
Pony Object Relational Mapper

Downloads Pony Object-Relational Mapper Pony is an advanced object-relational mapper. The most interesting feature of Pony is its ability to write que

3.1k Jan 04, 2023
A simple password manager I typed with python using MongoDB .

Python with MongoDB A simple python code example using MongoDB. How do i run this code • First of all you need to have a python on your computer. If y

31 Dec 06, 2022
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The library supports b

Nigel Small 1.2k Jan 02, 2023
Async ODM (Object Document Mapper) for MongoDB based on python type hints

ODMantic Documentation: https://art049.github.io/odmantic/ Asynchronous ODM(Object Document Mapper) for MongoDB based on standard python type hints. I

Arthur Pastel 732 Dec 31, 2022
A wrapper for SQLite and MySQL, Most of the queries wrapped into commands for ease.

Before you proceed, make sure you know Some real SQL, before looking at the code, otherwise you probably won't understand anything. Installation pip i

Refined 4 Jul 30, 2022
pandas-gbq is a package providing an interface to the Google BigQuery API from pandas

pandas-gbq pandas-gbq is a package providing an interface to the Google BigQuery API from pandas Installation Install latest release version via conda

Google APIs 348 Jan 03, 2023
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

AWS Data Wrangler Pandas on AWS Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretMana

Amazon Web Services - Labs 3.3k Dec 31, 2022
Py2neo is a comprehensive toolkit for working with Neo4j from within Python applications or from the command line.

Py2neo v3 Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line. The core library ha

64 Oct 14, 2022
A Redis client library for Twisted Python

txRedis Asynchronous Redis client for Twisted Python. Install Install via pip. Usage examples can be found in the examples/ directory of this reposito

Dorian Raymer 127 Oct 23, 2022
A database migrations tool for SQLAlchemy.

Alembic is a database migrations tool written by the author of SQLAlchemy. A migrations tool offers the following functionality: Can emit ALTER statem

SQLAlchemy 1.7k Jan 01, 2023
MongoX is an async python ODM for MongoDB which is built on top Motor and Pydantic.

MongoX MongoX is an async python ODM (Object Document Mapper) for MongoDB which is built on top Motor and Pydantic. The main features include: Fully t

Amin Alaee 112 Dec 04, 2022
CouchDB client built on top of aiohttp (asyncio)

aiocouchdb source: https://github.com/aio-libs/aiocouchdb documentation: http://aiocouchdb.readthedocs.org/en/latest/ license: BSD CouchDB client buil

aio-libs 53 Apr 05, 2022
Neo4j Bolt driver for Python

Neo4j Bolt Driver for Python This repository contains the official Neo4j driver for Python. Each driver release (from 4.0 upwards) is built specifical

Neo4j 762 Dec 30, 2022
dask-sql is a distributed SQL query engine in python using Dask

dask-sql is a distributed SQL query engine in Python. It allows you to query and transform your data using a mixture of common SQL operations and Python code and also scale up the calculation easily

Nils Braun 271 Dec 30, 2022
ClickHouse Python Driver with native interface support

ClickHouse Python Driver ClickHouse Python Driver with native (TCP) interface support. Asynchronous wrapper is available here: https://github.com/myma

Marilyn System 957 Dec 30, 2022
Example Python codes that works with MySQL and Excel files (.xlsx)

Python x MySQL x Excel by Zinglecode Example Python codes that do the processes between MySQL database and Excel spreadsheet files. YouTube videos MyS

Potchara Puttawanchai 1 Feb 07, 2022
aiosql - Simple SQL in Python

aiosql - Simple SQL in Python SQL is code. Write it, version control it, comment it, and run it using files. Writing your SQL code in Python programs

Will Vaughn 1.1k Jan 08, 2023
High level Python client for Elasticsearch

Elasticsearch DSL Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built o

elastic 3.6k Jan 03, 2023