TigerGraphX Quick Start: Using TigerGraph for Graph and Vector Database

Follow this guide to quickly get started with TigerGraph for storing both graph data and vectors. This guide assumes that you have already installed TigerGraphX and its dependencies, as outlined in the Installation Guide.

To run this Jupyter Notebook, download the original .ipynb file from quick_start_both.ipynb.

In this quick start guide, we will work with the following graph:

Financial Graph

Create a Graph

Define the TigerGraph Connection Configuration

Since our data is stored in a TigerGraph instance—whether on-premise or in the cloud—we need to configure the connection settings. The recommended approach is to use environment variables, such as setting them with the export command in the shell. Here, to illustrate the demo, we configure them within Python using the os.environ method. You can find more methods for configuring connection settings in Graph.__init__.

>>> import os
>>> os.environ["TG_HOST"] = "http://127.0.0.1"
>>> os.environ["TG_USERNAME"] = "tigergraph"
>>> os.environ["TG_PASSWORD"] = "tigergraph"

Define a Graph Schema

TigerGraph is a schema-based database, which requires defining a schema to structure your graph. This schema specifies the graph name, nodes (vertices), edges (relationships), and their respective attributes.

In this example, we create a graph called "FinancialGraph" with three node types: "Account," "City," and "Phone," and three edge types: "transfer," "hasPhone," and "isLocatedIn."

Nodes:
Account: Primary key name, attributes name (string) and isBlocked (boolean), vector attribute emb1 (3).
City: Primary key name, attribute name (string).
Phone: Primary key number, attributes number (string) and isBlocked (boolean), vector attribute emb1 (3).
Edges:
transfer: Directed multi-edge between "Account" nodes, identified by date (datetime) as the unique identifier for each edge between a pair of source and target nodes. The edge has an attribute amount (integer).
hasPhone: Undirected edge between "Account" and "Phone" nodes.
isLocatedIn: Directed edge between "Account" and "City" nodes.

This schema defines the structure of the "FinancialGraph" with nodes and edges and their respective attributes.

>>> graph_schema = {
...     "graph_name": "FinancialGraph",
...     "nodes": {
...         "Account": {
...             "primary_key": "name",
...             "attributes": {
...                 "name": "STRING",
...                 "isBlocked": "BOOL",
...             },
...             "vector_attributes": {"emb1": 3},
...         },
...         "City": {
...             "primary_key": "name",
...             "attributes": {
...                 "name": "STRING",
...             },
...         },
...         "Phone": {
...             "primary_key": "number",
...             "attributes": {
...                 "number": "STRING",
...                 "isBlocked": "BOOL",
...             },
...             "vector_attributes": {"emb1": 3},
...         },
...     },
...     "edges": {
...         "transfer": {
...             "is_directed_edge": True,
...             "from_node_type": "Account",
...             "to_node_type": "Account",
...             "discriminator": "date",
...             "attributes": {
...                 "date": "DATETIME",
...                 "amount": "INT",
...             },
...         },
...         "hasPhone": {
...             "is_directed_edge": False,
...             "from_node_type": "Account",
...             "to_node_type": "Phone",
...         },
...         "isLocatedIn": {
...             "is_directed_edge": True,
...             "from_node_type": "Account",
...             "to_node_type": "City",
...         },
...     },
... }

TigerGraphX offers several methods to define the schema, including a Python dictionary, YAML file, or JSON file. Above is an example using a Python dictionary. For other methods, please refer to Graph.__init__ for more details.

Create a Graph

Running the following command will create a graph using the user-defined schema if it does not already exist. If the graph exists, the command will return the existing graph. To overwrite the existing graph, set the drop_existing_graph parameter to True. Note that creating the graph may take several seconds.

>>> from tigergraphx import Graph
>>> G = Graph(graph_schema)

2025-02-28 14:17:12,187 - tigergraphx.core.managers.schema_manager - INFO - Creating schema for graph: FinancialGraph...
2025-02-28 14:17:15,857 - tigergraphx.core.managers.schema_manager - INFO - Graph schema created successfully.
2025-02-28 14:17:15,857 - tigergraphx.core.managers.schema_manager - INFO - Adding vector attribute(s) for graph: FinancialGraph...
2025-02-28 14:18:31,787 - tigergraphx.core.managers.schema_manager - INFO - Vector attribute(s) added successfully.

Retrieve a Graph and Print Its Schema

Once a graph has been created in TigerGraph, you can retrieve it without manually defining the schema using the Graph.from_db method, which requires only the graph name:

>>> G = Graph.from_db("FinancialGraph")

Now, let's print the schema of the graph in a well-formatted manner:

>>> import json
>>> schema = G.get_schema()
>>> print(json.dumps(schema, indent=4, default=str))

{
    "graph_name": "FinancialGraph",
    "nodes": {
        "Account": {
            "primary_key": "name",
            "attributes": {
                "name": {
                    "data_type": "DataType.STRING",
                    "default_value": null
                },
                "isBlocked": {
                    "data_type": "DataType.BOOL",
                    "default_value": null
                }
            },
            "vector_attributes": {
                "emb1": {
                    "dimension": 3,
                    "index_type": "HNSW",
                    "data_type": "FLOAT",
                    "metric": "COSINE"
                }
            }
        },
        "City": {
            "primary_key": "name",
            "attributes": {
                "name": {
                    "data_type": "DataType.STRING",
                    "default_value": null
                }
            },
            "vector_attributes": {}
        },
        "Phone": {
            "primary_key": "number",
            "attributes": {
                "number": {
                    "data_type": "DataType.STRING",
                    "default_value": null
                },
                "isBlocked": {
                    "data_type": "DataType.BOOL",
                    "default_value": null
                }
            },
            "vector_attributes": {
                "emb1": {
                    "dimension": 3,
                    "index_type": "HNSW",
                    "data_type": "FLOAT",
                    "metric": "COSINE"
                }
            }
        }
    },
    "edges": {
        "transfer": {
            "is_directed_edge": true,
            "from_node_type": "Account",
            "to_node_type": "Account",
            "discriminator": "{'date'}",
            "attributes": {
                "date": {
                    "data_type": "DataType.DATETIME",
                    "default_value": null
                },
                "amount": {
                    "data_type": "DataType.INT",
                    "default_value": null
                }
            }
        },
        "hasPhone": {
            "is_directed_edge": false,
            "from_node_type": "Account",
            "to_node_type": "Phone",
            "discriminator": "set()",
            "attributes": {}
        },
        "isLocatedIn": {
            "is_directed_edge": true,
            "from_node_type": "Account",
            "to_node_type": "City",
            "discriminator": "set()",
            "attributes": {}
        }
    }
}

Add Nodes and Edges

Add Nodes

The following code adds three types of nodes to the graph:

Account Nodes:
Each account node is identified by a name and includes two attributes:
isBlocked: A boolean indicating if the account is blocked.
emb1: A three-dimensional embedding vector.
Phone Nodes:
Each phone node is identified by a phone number and has the same attributes as account nodes (isBlocked and emb1).
City Nodes:
Each city node is identified by its name. No additional attributes are required.

For each node type, the code prints the number of nodes inserted.

>>> nodes_for_adding = [
...     ("Scott", {"isBlocked": False, "emb1": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]}),
...     ("Jenny", {"isBlocked": False, "emb1": [-0.019265105947852135, 0.0004929182468913496, 0.006711316294968128]}),
...     ("Steven", {"isBlocked": True, "emb1": [-0.01505514420568943, -0.016819344833493233, -0.0221870020031929]}),
...     ("Paul", {"isBlocked": False, "emb1": [0.0011193430982530117, -0.001038988004438579, -0.017158523201942444]}),
...     ("Ed", {"isBlocked": False, "emb1": [-0.003692442551255226, 0.010494389571249485, -0.004631792660802603]}),
... ]
>>> print("Number of Account Nodes Inserted:", G.add_nodes_from(nodes_for_adding, node_type="Account"))

>>> nodes_for_adding = [
...     ("718-245-5888", {"isBlocked": False, "emb1": [0.0023173028603196144, 0.018836047500371933, 0.03107452765107155]}),
...     ("650-658-9867", {"isBlocked": True, "emb1": [0.01969221793115139, 0.018642477691173553, 0.05322211980819702]}),
...     ("352-871-8978", {"isBlocked": False, "emb1": [-0.003442931454628706, 0.016562696546316147, 0.012876809574663639]}),
... ]
>>> print("Number of Phone Nodes Inserted:", G.add_nodes_from(nodes_for_adding, node_type="Phone"))

>>> nodes_for_adding = ["New York", "Gainesville", "San Francisco"]
>>> print("Number of City Nodes Inserted:", G.add_nodes_from(nodes_for_adding, node_type="City"))

Number of Account Nodes Inserted: 5
Number of Phone Nodes Inserted: 3
Number of City Nodes Inserted: 3

Add Edges

Next, we add relationships between nodes by creating edges:

hasPhone Edges:
These edges connect account nodes to phone nodes. Each tuple specifies an edge from an account to a phone number.
isLocatedIn Edges:
These edges connect account nodes to city nodes. Each tuple specifies an edge from an account to a city.
transfer Edges:
These edges connect one account node to another, representing a transfer relationship. Each edge tuple includes additional attributes:
date: The date of the transfer.
amount: The amount transferred.

For each relationship type, the code prints the number of edges inserted.

>>> ebunch_to_add = [
...     ("Scott", "718-245-5888"),
...     ("Jenny", "718-245-5888"),
...     ("Jenny", "650-658-9867"),
...     ("Paul", "650-658-9867"),
...     ("Ed", "352-871-8978"),
... ]
>>> print("Number of hasPhone Edges Inserted:", G.add_edges_from(ebunch_to_add, "Account", "hasPhone", "Phone"))

>>> ebunch_to_add = [
...     ("Scott", "New York"),
...     ("Jenny", "San Francisco"),
...     ("Steven", "San Francisco"),
...     ("Paul", "Gainesville"),
...     ("Ed", "Gainesville"),
... ]
>>> print("Number of isLocatedIn Edges Inserted:", G.add_edges_from(ebunch_to_add, "Account", "isLocatedIn", "City"))

>>> ebunch_to_add = [
...     ("Scott", "Ed", {"date": "2024-01-04", "amount": 20000}),
...     ("Scott", "Ed", {"date": "2024-02-01", "amount": 800}),
...     ("Scott", "Ed", {"date": "2024-02-14", "amount": 500}),
...     ("Jenny", "Scott", {"date": "2024-04-04", "amount": 1000}),
...     ("Paul", "Jenny", {"date": "2024-02-01", "amount": 653}),
...     ("Steven", "Jenny", {"date": "2024-05-01", "amount": 8560}),
...     ("Ed", "Paul", {"date": "2024-01-04", "amount": 1500}),
...     ("Paul", "Steven", {"date": "2023-05-09", "amount": 20000}),
... ]
>>> print("Number of transfer Edges Inserted:", G.add_edges_from(ebunch_to_add, "Account", "transfer", "Account"))

Number of hasPhone Edges Inserted: 5
Number of isLocatedIn Edges Inserted: 5
Number of transfer Edges Inserted: 8

For larger datasets, consider using load_data for more efficient handling of large-scale data.

Exploring Nodes and Edges in the Graph

Display the Number of Nodes and Edges

You can verify that the data has been inserted into the graph by running the following commands. For this example, the graph contains 11 nodes and 18 edges.

>>> print(G.number_of_nodes())

>>> print(G.number_of_edges())

Check if Nodes and Edges Exist

To confirm the presence of specific nodes and edges, you can use the following checks:

>>> print(G.has_node("Scott", "Account"))

True

>>> print(G.has_node("Ed", "Account"))

True

>>> print(G.has_edge("Scott", "Ed", src_node_type="Account", edge_type="transfer", tgt_node_type="Account"))

True

Display Node and Edge Attributes

Node Attributes

To view all attributes of a specific node, you can use:

>>> print(G.nodes[("Account", "Scott")])

{'name': 'Scott', 'isBlocked': False}

To retrieve a particular attribute, you can run:

>>> print(G.nodes[("Account", "Scott")]["isBlocked"])

False

Edge Attributes

You can fetch and display all attributes associated with the source-to-target relationship for a specific edge type. If multiple edges exist between the start node and the target node, the result will be a dictionary where each key represents the edge's discriminator, and each value is a dictionary of attribute name-value pairs. If there is only one edge, the result will be a single dictionary of attribute name-value pairs.

>>> edges = G.get_edge_data("Scott", "Ed", src_node_type="Account", edge_type="transfer", tgt_node_type="Account")
>>> for key, value in edges.items():
...     print(key, value)

2024-02-14 00:00:00 {'date': '2024-02-14 00:00:00', 'amount': 500}
2024-01-04 00:00:00 {'date': '2024-01-04 00:00:00', 'amount': 20000}
2024-02-01 00:00:00 {'date': '2024-02-01 00:00:00', 'amount': 800}

Display Node's Vector Attributes

To retrieve a vector attribute for a single node:

>>> vector = G.fetch_node(
...     node_id="Scott",
...     node_type="Account",
...     vector_attribute_name="emb1",
... )
>>> print(vector)

[-0.01773397, -0.01019224, -0.01657188]

For multiple nodes:

>>> vectors = G.fetch_nodes(
...     node_ids=["Scott", "Jenny"],
...     node_type="Account",
...     vector_attribute_name="emb1",
... )
>>> for vector in vectors.items():
...     print(vector)

('Scott', [-0.01773397, -0.01019224, -0.01657188])
('Jenny', [-0.01926511, 0.0004929182, 0.006711317])

Filter the Nodes

To retrieve nodes that match a specific condition, return only selected attributes, and limit the results:

>>> df = G.get_nodes(
...     node_type="Account",
...     node_alias="s", # "s" is the default value, so you can remove this line
...     filter_expression="s.isBlocked == False",
...     return_attributes=["name", "isBlocked"],
...     limit=2
... )
>>> print(df)

    name  isBlocked
0   Paul      False
1  Scott      False

Display the Degree of Nodes

To check the degree (number of connections) of a specific node:

>>> print(G.degree("Ed", "Account"))

To filter by edge type:

>>> print(G.degree("Ed", "Account", edge_types="transfer"))

>>> print(G.degree("Ed", "Account", edge_types="reverse_transfer"))

>>> print(G.degree("Ed", "Account", edge_types="hasPhone"))

>>> print(G.degree("Ed", "Account", edge_types="isLocatedIn"))

Graph Traversal

Retrieve a Node’s Neighbors

Retrieve the first two “Account” nodes that Paul has transferred to, applying a filter to edges where the amount is greater than 653. Return the target node’s “name” and “isBlocked” attributes:

>>> df = G.get_neighbors(
...     start_nodes="Paul",
...     start_node_type="Account",
...     start_node_alias="s", # "s" is the default value, so you can remove this line
...     edge_types="transfer",
...     edge_alias="e", # "e" is the default value, so you can remove this line
...     target_node_types="Account",
...     target_node_alias="t", # "t" is the default value, so you can remove this line
...     filter_expression="e.amount > 653",
...     return_attributes=["name", "isBlocked"],
...     limit=2,
... )
>>> print(df)

     name  isBlocked
0  Steven       True

Breadth-First Search

Below is an example of multi-hop neighbor traversal:

>>> df = G.bfs(
...     start_nodes=["Paul"], 
...     node_type="Account", 
...     edge_types=["transfer", "reverse_transfer"], 
...     max_hops=2
... )
>>> print(df)

    name  isBlocked
1  Scott      False

Perform Vector Search

Top-k Vector Search on a Given Vertex Type's Vector Attribute

To find the top 3 most similar accounts to "Scott" based on the embedding, we use the following code. As expected, "Scott" will appear in the list with a distance of 0.

>>> results = G.search(
...     data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2
... )
>>> for result in results:
...     print(result)

{'id': 'Scott', 'distance': 0, 'name': 'Scott', 'isBlocked': False}
{'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}

Top-k Vector Search on a Set of Vertex Types' Vector Attributes

The code below performs a multi-vector attribute search on "Account" and "Phone" node types using two vector attributes (emb1). It retrieves the top 5 similar nodes and fetches the isBlocked attribute for each result.

>>> results = G.search_multi_vector_attributes(
...     data=[-0.003442931454628706, 0.016562696546316147, 0.012876809574663639],
...     vector_attribute_names=["emb1", "emb1"],
...     node_types=["Account", "Phone"],
...     limit=5,
...     return_attributes_list=[["isBlocked"], ["isBlocked"]]
... )
>>> for result in results:
...     print(result)

{'id': '352-871-8978', 'distance': 1.788139e-07, 'isBlocked': False}
{'id': '718-245-5888', 'distance': 0.09038806, 'isBlocked': False}
{'id': '650-658-9867', 'distance': 0.2705743, 'isBlocked': True}
{'id': 'Ed', 'distance': 0.5047379, 'isBlocked': False}
{'id': 'Jenny', 'distance': 0.6291004, 'isBlocked': False}

Top-k Vector Search Using a Vertex Embedding as the Query Vector

This code performs a top-k vector search for similar nodes to a specified node "Scott". It searches within the "Account" node type using the "emb1" embedding attribute and retrieves the top 2 similar node.

>>> G.search_top_k_similar_nodes(
...     node_id="Scott",
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2
... )

[{'id': 'Paul', 'distance': 0.3933879, 'name': 'Paul', 'isBlocked': False},
 {'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}]

Top-k Vector Search with Specified Candidates

This code performs a top-2 vector search on the "Account" node type using the "emb1" embedding attribute. It limits the search to the specified candidate nodes: "Jenny", "Steven", and "Ed".

>>> results = G.search(
...     data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2,
...     candidate_ids=["Jenny", "Steven", "Ed"]
... )
>>> for result in results:
...     print(result)

{'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}
{'id': 'Jenny', 'distance': 0.5804119, 'name': 'Jenny', 'isBlocked': False}

Filtered Vector Search

Let's first retrieves all "Account" nodes where the isBlocked attribute is False and returns their name attributes in a Pandas DataFrame.

>>> nodes_df = G.get_nodes(
...     node_type="Account",
...     node_alias="s", # The alias "s" is used in filter_expression. You can remove this line since the default node alias is "s"
...     filter_expression='s.isBlocked == False AND s.name != "Ed"',
...     return_attributes=["name"],
... )
>>> print(nodes_df)

    name
0  Scott
1   Paul
2  Jenny

Then convert the name column of the retrieved DataFrame into a set of candidate IDs and performs a top-2 vector search on the "Account" node type using the "emb1" embedding attribute, restricted to the specified candidate IDs.

>>> candidate_ids = set(nodes_df['name'])
... results = G.search(
...     data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2,
...     candidate_ids=candidate_ids
... )
>>> for result in results:
...     print(result)

{'id': 'Paul', 'distance': 0.393388, 'name': 'Paul', 'isBlocked': False}
{'id': 'Scott', 'distance': 0, 'name': 'Scott', 'isBlocked': False}

Clear and Drop a Graph

Clear the Graph

To clear the data in the graph without dropping it, use the following code:

>>> print(G.clear())

True

Afterwards, you can confirm that there are no nodes in the graph by checking:

>>> print(G.number_of_nodes())

Drop the Graph

To clear the data and completely remove the graph—including schema, loading jobs, and queries—use the following code:

>>> G.drop_graph()

2025-02-28 14:22:33,656 - tigergraphx.core.managers.schema_manager - INFO - Dropping graph: FinancialGraph...
2025-02-28 14:22:36,658 - tigergraphx.core.managers.schema_manager - INFO - Graph dropped successfully.

What’s Next?

Now that you've learned how to use TigerGraph for storing both graph data and vectors, you can dive into more advanced features of TigerGraphX:

GraphRAG Overview: Learn about integrating graphs with LLMs.
API Reference: Dive deeper into TigerGraphX APIs.

Start unlocking the power of graphs with TigerGraphX today!