INDRA Network Search¶
The INDRA Network Search service is available at https://network.indra.bio/
Available Service Endpoints¶
There are several endpoints served by the service, shown in the table below. Each endpoint is described further in the sections below.
Endpoint | Available method | Purpose |
---|---|---|
/ and /query | GET | Main page, path search with options |
/query/submit | POST | Handles path queries |
/multi_interactors | POST | Handles queries for immediate common up- or downstream interactors from a list of targets or regulators |
/node and /nodes | POST | Check if a node or a list of nodes is in the network |
/bfs_search | POST | Handles breadth first queries from a source or target node. |
Node Existence Lookup¶
The /node and /nodes endpoints serve the purpose of quickly looking up whether a particular node exists in the network. To use the /node endpoint, POST a JSON to https://network.indra.bio/node:
Method: POST
JSON: {'node': '<name of node>'}
The return will be a json of the following format:
{'node': '<name of node>', 'in_network': True/False}
To check multiple nodes at the same time, the /nodes endpoint is a better choice. This endpoint expects a list of node names in the incoming JSON:
Method: POST
JSON: {'nodes': ['<name1>', '<name2>', ...]}
The resulting JSON is a dictionary that maps each node name passed in the request to a boolean:
{'<name1>': True/False, '<name2>': True/False, ...}
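As a minimal sketch of the two lookups above, the following Python snippet builds the POST requests using only the standard library. The helper names `build_post` and `post_json` are illustrative and not part of the service, and 'BRCA1' is just an arbitrary second node name. Nothing is sent over the network unless `post_json` is called.

```python
import json
import urllib.request

BASE_URL = "https://network.indra.bio"


def build_post(endpoint, payload):
    """Build (but do not send) a JSON POST request for a service endpoint."""
    return urllib.request.Request(
        url=BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, timeout=30):
    """Send the request and decode the JSON body of the response."""
    request = build_post(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Single node lookup; documented response shape:
# {'node': '<name of node>', 'in_network': True/False}
single_request = build_post("/node", {"node": "ACE2"})

# Multi node lookup; documented response shape:
# {'<name1>': True/False, '<name2>': True/False, ...}
multi_request = build_post("/nodes", {"nodes": ["ACE2", "BRCA1"]})

# Uncomment to query the live service:
# print(post_json("/node", {"node": "ACE2"}))
```

Separating request construction from sending makes it possible to inspect the URL and body before any call is made.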
Direct Multi Interactors Lookup¶
The /multi_interactors endpoint searches for common upstream or downstream interactors of the provided list of targets or regulators. If regulators are provided, the search is for their common targets; if targets are provided, the search is for their common regulators. The only required option is a list of regulators or targets. The remaining options are not required and each has its own default value. The table below describes the available options.
Option | Default | Required | Allowed values | Action |
---|---|---|---|---|
regulators OR targets | | Yes | List of node names | Input for search |
allowed_ns | any ns | No | see below | Skip nodes with namespaces not in the provided list |
belief_cutoff | 0 | No | 0 <= n <= 1 | Skip statements below threshold |
skip_stmt_types | [ ] | No | see below | Skip statements of provided types |
db_only | False | No | True/False | Only allow statements from database sources |
The POST request to the endpoint should look like this:
Method: POST
JSON: {'regulators' OR 'targets': ['<name1>', '<name2>', ...],
'allowed_ns': ['<name space1>', '<name space2>', ...],
'belief_cutoff': 0 <= float <=1,
'skip_stmt_types': ['<statement type>', ...],
'db_only': bool}
For allowed_ns, the following values are allowed:
- HGNC
- FPLX
- CHEBI
- PUBCHEM
- MIRBASE
- GO
- MESH
- HMDB
For skip_stmt_types, valid statement types need to be provided. To see a full list of statement types, see https://indra.readthedocs.io/en/latest/modules/statements.html#module-indra.statements.statements
By POSTing {'help': ''} to the endpoint, a small JSON is returned that describes the available options.
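The options above can be assembled into a request body along these lines. This is a client-side sketch only: the function name `multi_interactors_payload` is hypothetical, 'TMPRSS2' is an arbitrary example node, and the check that exactly one of 'regulators' and 'targets' is given is an assumption based on the "regulators OR targets" row of the table.

```python
import json

ALLOWED_NS = {"HGNC", "FPLX", "CHEBI", "PUBCHEM", "MIRBASE", "GO", "MESH", "HMDB"}


def multi_interactors_payload(regulators=None, targets=None, allowed_ns=None,
                              belief_cutoff=0.0, skip_stmt_types=None,
                              db_only=False):
    """Assemble a request body for /multi_interactors (client-side sketch)."""
    # Assumption: exactly one of 'regulators' and 'targets' is provided
    if (regulators is None) == (targets is None):
        raise ValueError("Provide either 'regulators' or 'targets'")
    if not 0 <= belief_cutoff <= 1:
        raise ValueError("belief_cutoff must be between 0 and 1")
    if allowed_ns is not None:
        unknown = set(allowed_ns) - ALLOWED_NS
        if unknown:
            raise ValueError(f"Unknown namespaces: {sorted(unknown)}")

    key = "regulators" if regulators is not None else "targets"
    payload = {
        key: list(regulators if regulators is not None else targets),
        "belief_cutoff": belief_cutoff,
        "skip_stmt_types": list(skip_stmt_types or []),
        "db_only": db_only,
    }
    # Leaving 'allowed_ns' out keeps the documented default (any namespace)
    if allowed_ns is not None:
        payload["allowed_ns"] = list(allowed_ns)
    return payload


# Common regulators of two targets, skipping 'Complex' statements
example = multi_interactors_payload(
    targets=["ACE2", "TMPRSS2"],
    allowed_ns=["HGNC", "FPLX"],
    skip_stmt_types=["Complex"],
)
print(json.dumps(example, indent=2))
```

Validating on the client side catches misspelled namespaces before a request is sent.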
Breadth First Search Endpoint¶
The /bfs_search endpoint performs breadth first searches. The only required option is a node name to start the search from. The other options are the following:
Option | Default | Required | Allowed values | Action |
---|---|---|---|---|
source | | Yes | Any node name | Input for search |
reverse | False | No | True/False | Reverse the search to search upstream instead of downstream |
allowed_ns | any ns | No | see below | Skip nodes with namespaces not in the provided list |
belief_cutoff | 0 | No | 0 <= float <= 1 | Skip statements below threshold |
skip_stmt_types | [ ] | No | see below | Skip statements of provided types |
depth_limit | 2 | No | 1 < int | Only allow paths up to the provided length |
path_limit | 100 | No | 0 < int | The maximum number of paths to yield from the generator |
max_results | 50 | No | int | The maximum number of results to return |
node_blacklist | [] | No | str | Node names to skip in the search |
terminal_ns | ['CHEBI', 'PUBCHEM'] | No | see below | Node namespaces to end paths on |
max_per_node | 5 | No | int | Maximum number of paths from the same leaf parent node to yield |
sign | None | No | '+'/'-' | Perform a signed search. See below |
‘allowed_ns’ specifies which namespaces the nodes in the resulting paths are allowed to have. The following values are allowed:
- HGNC
- FPLX
- CHEBI
- PUBCHEM
- MIRBASE
- GO
- MESH
- HMDB
‘terminal_ns’ lists namespaces that end a path: when a node in one of these namespaces is encountered, no further paths are yielded from it. For example, if one is only interested in paths whose terminal node is a chemical, ‘PUBCHEM’ and ‘CHEBI’ would be specified here.
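To illustrate how ‘depth_limit’ and ‘terminal_ns’ interact, here is a toy breadth first search over a small made-up graph. It is not the service's implementation: the graph, the 'NS:name' node naming scheme and the function `bfs_paths` are all assumptions made for this example.

```python
from collections import deque

# Toy graph: node -> downstream neighbours. Node names carry a made-up
# 'NS:name' prefix so that the namespace can be read off directly.
GRAPH = {
    "HGNC:A": ["HGNC:B", "CHEBI:X"],
    "HGNC:B": ["HGNC:C"],
    "CHEBI:X": ["HGNC:D"],
    "HGNC:C": [],
    "HGNC:D": [],
}


def bfs_paths(graph, source, depth_limit=2, terminal_ns=("CHEBI", "PUBCHEM")):
    """Return paths of up to depth_limit edges, never expanding terminal nodes."""
    paths = []
    queue = deque([(source,)])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= depth_limit:
            continue  # path already has the maximum number of edges
        for neighbour in graph.get(path[-1], []):
            if neighbour in path:
                continue  # avoid cycles
            new_path = path + (neighbour,)
            paths.append(new_path)
            # A node in a terminal namespace is reported as a path end
            # but is not used as a starting point for longer paths
            if neighbour.split(":")[0] not in terminal_ns:
                queue.append(new_path)
    return paths


with_terminal = bfs_paths(GRAPH, "HGNC:A")
without_terminal = bfs_paths(GRAPH, "HGNC:A", terminal_ns=())
```

With the default terminal namespaces, the path through 'CHEBI:X' stops at that node, so 'HGNC:D' is never reached. With an empty ‘terminal_ns’, the search continues through it.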
Signed Breadth First Search¶
By providing a sign with the keyword ‘sign’, the search is done over a signed directed graph instead of a standard directed graph. The sign determines whether the net effect of a path is an up- or downregulation. For example, to search for paths that upregulate the gene ‘ACE2’, the following JSON would have to be sent:
{'source': 'ACE2',
'reverse': True, # If reverse is True, search is upstream from 'source'
'sign': '+'}
To search for any gene or gene family or protein complex that is upstream of ‘ACE2’, the following JSON should be used:
{'source': 'ACE2',
'reverse': True,
'allowed_ns': ['FPLX', 'HGNC'],
'terminal_ns': []} # 'terminal_ns' defaults to ['CHEBI', 'PUBCHEM']
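The two request bodies above can also be built programmatically. The sketch below uses a hypothetical helper, `bfs_payload`, that only accepts option names listed in the table and only sends the options that are explicitly overridden. That the service falls back to the documented defaults for omitted options is an assumption here.

```python
import json

# Option names from the table above; 'source' is the only required one
BFS_OPTIONS = {
    "reverse", "allowed_ns", "belief_cutoff", "skip_stmt_types",
    "depth_limit", "path_limit", "max_results", "node_blacklist",
    "terminal_ns", "max_per_node", "sign",
}


def bfs_payload(source, **options):
    """Build a /bfs_search request body from a source node and overrides."""
    unknown = set(options) - BFS_OPTIONS
    if unknown:
        raise ValueError(f"Unknown options: {sorted(unknown)}")
    if options.get("sign") not in (None, "+", "-"):
        raise ValueError("sign must be None, '+' or '-'")
    return {"source": source, **options}


# Signed search: what is upstream of ACE2 and upregulates it?
signed_query = bfs_payload("ACE2", reverse=True, sign="+")

# Unsigned search: genes, gene families and complexes upstream of ACE2.
# 'terminal_ns' is emptied because it defaults to ['CHEBI', 'PUBCHEM'].
upstream_query = bfs_payload(
    "ACE2", reverse=True, allowed_ns=["FPLX", "HGNC"], terminal_ns=[]
)

print(json.dumps(signed_query))
print(json.dumps(upstream_query))
```

Rejecting unknown option names early helps catch typos such as a misspelled keyword before the request reaches the service.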
Path Query Endpoint¶
Note: This section describes the behavior of the POST endpoint /query/submit handling path queries. For documentation of the web UI endpoint /query, go to the web ui documentation. The following list describes the available options for the POST endpoint:
- source (str): the source node for the path.
- target (str): the target for the path.
- stmt_filter ([str]): a list of valid indra statement types or FamPlex child-parent connections (as ‘fplx’) to exclude in the path.
- node_filter ([str]): A list of node namespaces to include in the path
- node_blacklist ([str]): A list of node names to ignore. If a path contains a node in this list, the path will be discarded.
- edge_hash_blacklist ([str/int]): A list of statement hashes (as strings or ints) to ignore. If an edge statement hash is found in this list, it will be discarded from the assembled edge list.
- cull_best_node (int): A positive integer. Every x valid paths, cull the node with the highest (weighted) degree from the network. This increases the variety of paths found and reduces the impact of nodes with extremely high connectivity in the network.
- path_length (int|False): a positive integer stating the number of edges that should be in the returned path. If False, return paths with any number of edges.
- sign (None|str): If ‘no_sign’ or None, do a regular unsigned search over the directed graph. If ‘plus’/‘minus’, only paths with overall up/down regulation will be returned.
- weighted (Bool): If True, do a weighted path search. Weights in the network are assigned as -log(belief score).
- bsco (0 <= float <= 1.0): Belief Score Cut-Off, a decimal number between 0 and 1.0. Edge statements with a belief score below this value are ignored.
- curated_db_only (Bool): Filter results to only allow edges that are sourced from curated databases.
- fplx_expand (Bool): If True, when no path is found in the initial search, look for paths between the parents of the source and target.
- k_shortest (Bool|int): An integer stating the maximum number of directed paths to return in the result. The maximum allowed value is 50. If False, the maximum number of paths returned will be set to the maximum allowed value.
- user_timeout (float): A decimal specifying the number of seconds to use for timeout. If not provided, the default of 30 seconds is used.
- two_way (Bool): If True, search path both ways, i.e. search A->B and B->A.
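The effect of the -log(belief score) weighting used by the ‘weighted’ option can be seen in a small worked example. The belief scores below are made up, and the natural logarithm is assumed since the base is not stated. Because the weights of a path add up, the total equals -log of the product of the belief scores along the path, so a path of several high-belief edges can be cheaper than a single low-belief edge.

```python
import math


def edge_weight(belief):
    """Weight of a single edge: -log(belief score), natural log assumed."""
    return -math.log(belief)


def path_weight(beliefs):
    """Total weight of a path given the belief scores of its edges."""
    return sum(edge_weight(b) for b in beliefs)


one_weak_edge = [0.5]          # A -> C directly, low belief
two_strong_edges = [0.9, 0.9]  # A -> B -> C, high belief on both edges

# Summing -log(belief) equals -log of the product of the beliefs
assert math.isclose(path_weight(two_strong_edges), -math.log(0.9 * 0.9))

# The two-edge path has the lower total weight, so a weighted search
# ranks it ahead of the shorter but less believed path
assert path_weight(two_strong_edges) < path_weight(one_weak_edge)
```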
The JSON body of the request would look like this:
Method: POST
{'source': <str>,
'target': <str>,
'stmt_filter': [str],
'node_filter': [str],
'node_blacklist': [str],
'edge_hash_blacklist': [str/int],
'cull_best_node': <int>,
'path_length': <int|False>,
'sign': <None|str>,
'weighted': <Bool>,
'bsco': <float>,
'curated_db_only': <Bool>,
'fplx_expand': <Bool>,
'k_shortest': <Bool|int>,
'user_timeout': <float>,
'two_way': <Bool>}
You can also read about the specific settings in the docstring of the IndraNetwork.handle_query method in depmap_analysis/indra_depmap_service/api.py.
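Before submitting a path query, it can help to check the body against the constraints stated in the option list. The following is a sketch of such a client-side check, not code from the service. The function `check_path_query` is hypothetical, the node names and option values in the example are arbitrary rather than service defaults, and the uppercase namespace names in 'node_filter' are an assumption.

```python
VALID_SIGNS = (None, "no_sign", "plus", "minus")
MAX_K_SHORTEST = 50  # maximum stated in the option list


def check_path_query(query):
    """Compare a query dict against the documented constraints.

    Returns a list of problems; an empty list means that no documented
    constraint is violated.
    """
    problems = []
    for key in ("source", "target"):
        if not isinstance(query.get(key), str):
            problems.append(f"'{key}' must be a node name (str)")
    if query.get("sign") not in VALID_SIGNS:
        problems.append("'sign' must be None, 'no_sign', 'plus' or 'minus'")
    if not 0 <= query.get("bsco", 0) <= 1:
        problems.append("'bsco' must be between 0 and 1")
    k = query.get("k_shortest", False)
    if k is not False and (isinstance(k, bool) or not isinstance(k, int)
                           or not 1 <= k <= MAX_K_SHORTEST):
        problems.append(f"'k_shortest' must be False or 1..{MAX_K_SHORTEST}")
    return problems


# Illustrative values only; they are not the defaults of the service
example_query = {
    "source": "BRCA1",
    "target": "BRCA2",
    "stmt_filter": [],
    "node_filter": ["HGNC", "FPLX"],
    "node_blacklist": [],
    "edge_hash_blacklist": [],
    "cull_best_node": 100,
    "path_length": False,
    "sign": "no_sign",
    "weighted": False,
    "bsco": 0.0,
    "curated_db_only": False,
    "fplx_expand": False,
    "k_shortest": 10,
    "user_timeout": 30,
    "two_way": False,
}

print(check_path_query(example_query))
```

Note that this endpoint spells the sign as ‘plus’/‘minus’, whereas the /bfs_search endpoint uses ‘+’/‘-’.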
Running the Service Locally¶
To run the service locally, two things are needed:
- Fetch the latest update to the branch ‘master’ of the depmap_analysis repository from one of the maintainers.
- Download the latest network representations of the INDRA network (might require AWS S3 login):
  - indranet_dir_graph_latest.pkl
  - indranet_sign_edge_graph_latest.pkl (optional)
  - indranet_sign_node_graph_latest.pkl (optional)
The signed representations of the graph are only needed for signed path search.
Dependencies are Python 3.6+, but otherwise the same as for INDRA and INDRA_DB. In the depmap_analysis.indra_depmap_service directory, run api.py with the following arguments:
python api.py [-h] [--host HOST] [--port PORT] [--cache DG_GRAPH
MDG_GRAPH|None SIGN_EDGE_GRAPH|None SIGN_NODE_GRAPH|None]
where HOST is the address to use (default is 127.0.0.1), PORT is the port to use (default is 5000) and DG_GRAPH, MDG_GRAPH, SIGN_EDGE_GRAPH and SIGN_NODE_GRAPH are pickled graphs representing the INDRA knowledge network in DiGraph, MultiDiGraph and SignedGraph representations, respectively. The --cache flag overrides the defaults in the file so that any file can be provided. If default settings are used for HOST and PORT, a web UI is hosted on http://localhost:5000/query and query submissions are done to http://localhost:5000/query/submit.
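When scripting against a locally running instance, the two URLs can be derived from the host and port settings. The helper below is a hypothetical convenience function, not part of the service. It only mirrors the default HOST and PORT values and the /query and /query/submit paths given above.

```python
def local_service_urls(host="127.0.0.1", port=5000):
    """Return the web UI and query submission URLs of a local instance."""
    base = f"http://{host}:{port}"
    return {"ui": f"{base}/query", "submit": f"{base}/query/submit"}


urls = local_service_urls()
print(urls["ui"])
print(urls["submit"])
```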