TigerGraphX Quick Start: Using TigerGraph as Vector Database

TigerGraph has supported vector storage since version 4.2. In this guide, we will demonstrate how to use TigerGraph as a pure vector database, without storing edges. This setup can be useful when you want to leverage TigerGraph solely as a vector database. However, to fully unlock the potential of TigerGraph, you can also use it as both a graph and vector storage solution. For more details, refer to the next guide.

This guide assumes that you have already installed TigerGraphX and its dependencies, as outlined in the Installation Guide.

To run this Jupyter Notebook, download the original .ipynb file from quick_start_vector.ipynb.

Create a Graph

Define the TigerGraph Connection Configuration

Since our data is stored in a TigerGraph instance—whether on-premise or in the cloud—we need to configure the connection settings. The recommended approach is to use environment variables, such as setting them with the export command in the shell. Here, to illustrate the demo, we configure them within Python using the os.environ method. You can find more methods for configuring connection settings in Graph.__init__.

>>> import os
>>> os.environ["TG_HOST"] = "http://127.0.0.1"
>>> os.environ["TG_USERNAME"] = "tigergraph"
>>> os.environ["TG_PASSWORD"] = "tigergraph"

Define a Graph Schema

TigerGraph is a schema-based database, which requires defining a schema to structure your graph. A typical schema includes the graph name, nodes (vertices), edges (relationships), and their respective attributes. However, when using TigerGraph as a pure vector database, you only need to define the graph name, the node (vertex) type, and its attributes, including vector attributes.

In this example, we create a graph called "FinancialGraph" with one node type: "Account." This node type has a primary key name, attributes name (string) and isBlocked (boolean), and a vector attribute emb1 (3-dimensional).

>>> graph_schema = {
...     "graph_name": "FinancialGraph",
...     "nodes": {
...         "Account": {
...             "primary_key": "name",
...             "attributes": {
...                 "name": "STRING",
...                 "isBlocked": "BOOL",
...             },
...             "vector_attributes": {"emb1": 3},
...         },
...     },
...     "edges": {}
... }

TigerGraphX offers several methods to define the schema, including a Python dictionary, YAML file, or JSON file. Above is an example using a Python dictionary. For other methods, please refer to Graph.__init__ for more details.

Create a Graph

Running the following command will create a graph using the user-defined schema if it does not already exist. If the graph exists, the command will return the existing graph. To overwrite the existing graph, set the drop_existing_graph parameter to True. Note that creating the graph may take several seconds.

>>> from tigergraphx import Graph
>>> G = Graph(graph_schema)

2025-02-27 17:35:49,124 - tigergraphx.core.managers.schema_manager - INFO - Creating schema for graph: FinancialGraph...
2025-02-27 17:35:52,641 - tigergraphx.core.managers.schema_manager - INFO - Graph schema created successfully.
2025-02-27 17:35:52,642 - tigergraphx.core.managers.schema_manager - INFO - Adding vector attribute(s) for graph: FinancialGraph...
2025-02-27 17:36:52,825 - tigergraphx.core.managers.schema_manager - INFO - Vector attribute(s) added successfully.

Retrieve a Graph and Print Its Schema

Once a graph has been created in TigerGraph, you can retrieve it without manually defining the schema using the Graph.from_db method, which requires only the graph name:

>>> G = Graph.from_db("FinancialGraph")

Now, let's print the schema of the graph in a well-formatted manner:

>>> import json
>>> schema = G.get_schema()
>>> print(json.dumps(schema, indent=4, default=str))

{
    "graph_name": "FinancialGraph",
    "nodes": {
        "Account": {
            "primary_key": "name",
            "attributes": {
                "name": {
                    "data_type": "DataType.STRING",
                    "default_value": null
                },
                "isBlocked": {
                    "data_type": "DataType.BOOL",
                    "default_value": null
                }
            },
            "vector_attributes": {
                "emb1": {
                    "dimension": 3,
                    "index_type": "HNSW",
                    "data_type": "FLOAT",
                    "metric": "COSINE"
                }
            }
        }
    },
    "edges": {}
}

Add Nodes

In this example, we add multiple nodes representing accounts to the graph. Each node is uniquely identified by a name and comes with two attributes:

isBlocked: A Boolean indicating whether the account is blocked.
emb1: A three-dimensional embedding vector.

>>> nodes_for_adding = [
...     ("Scott", {"isBlocked": False, "emb1": [-0.017733968794345856, -0.01019224338233471, -0.016571875661611557]}),
...     ("Jenny", {"isBlocked": False, "emb1": [-0.019265105947852135, 0.0004929182468913496, 0.006711316294968128]}),
...     ("Steven", {"isBlocked": True, "emb1": [-0.01505514420568943, -0.016819344833493233, -0.0221870020031929]}),
...     ("Paul", {"isBlocked": False, "emb1": [0.0011193430982530117, -0.001038988004438579, -0.017158523201942444]}),
...     ("Ed", {"isBlocked": False, "emb1": [-0.003692442551255226, 0.010494389571249485, -0.004631792660802603]}),
... ]
>>> print("Number of Account Nodes Inserted:", G.add_nodes_from(nodes_for_adding, node_type="Account"))

Number of Account Nodes Inserted: 5

For larger datasets, consider using load_data for efficient handling of large-scale data.

Exploring Nodes in the Graph

Display the Number of Nodes

Next, let's verify that the data has been inserted into the graph by using the following command. As expected, the number of nodes is 5.

>>> print(G.number_of_nodes())

Check if Nodes Exist

Use the following commands to check whether specific nodes are present in the graph:

>>> print(G.has_node("Scott"))

True

>>> print(G.has_node("Jenny"))

True

Display Node Attributes

To display all attributes of a given node, use the following command:

>>> print(G.nodes["Scott"])

{'name': 'Scott', 'isBlocked': False}

To display a specific attribute, use the command below:

>>> print(G.nodes["Scott"]["isBlocked"])

False

Filter the Nodes

Retrieve "Account" nodes that match a specific filter expression, request only selected attributes, and limit the results:

>>> df = G.get_nodes(
...     node_type="Account",
...     node_alias="s", # "s" is the default value, so you can remove this line
...     filter_expression="s.isBlocked == False",
...     return_attributes=["name", "isBlocked"],
...     limit=2
... )
>>> print(df)

    name  isBlocked
0   Paul      False
1  Scott      False

Display Node's Vector Attributes

Retrieve the vector attribute of a specific node:

>>> vector = G.fetch_node(
...     node_id="Scott",
...     vector_attribute_name="emb1",
... )
>>> print(vector)

[-0.01773397, -0.01019224, -0.01657188]

Retrieve vector attributes for multiple nodes:

>>> vectors = G.fetch_nodes(
...     node_ids=["Scott", "Jenny"],
...     vector_attribute_name="emb1",
... )
>>> for vector in vectors.items():
...     print(vector)

('Scott', [-0.01773397, -0.01019224, -0.01657188])
('Jenny', [-0.01926511, 0.0004929182, 0.006711317])

Perform Vector Search

Top-k Vector Search on a Given Vertex Type's Vector Attribute

To find the top 2 most similar accounts to "Scott" based on the embedding, we use the following code. As expected, "Scott" will appear in the list with a distance of 0.

>>> results = G.search(
...    data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...    vector_attribute_name="emb1",
...    node_type="Account",
...    limit=2
... )
>>> for result in results:
...     print(result)

{'id': 'Scott', 'distance': 0, 'name': 'Scott', 'isBlocked': False}
{'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}

Top-k Vector Search Using a Vertex Embedding as the Query Vector

This code performs a top-k vector search for similar nodes to a specified node "Scott". It searches within the "Account" node type using the "emb1" embedding attribute and retrieves the top 2 similar node.

>>> results = G.search_top_k_similar_nodes(
...     node_id="Scott",
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2
... )
>>> for result in results:
...     print(result)

{'id': 'Paul', 'distance': 0.3933879, 'name': 'Paul', 'isBlocked': False}
{'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}

Top-k Vector Search with Specified Candidates

This code performs a top-2 vector search on the "Account" node type using the "emb1" embedding attribute. It limits the search to the specified candidate nodes: "Jenny", "Steven", and "Ed".

>>> results = G.search(
...     data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2,
...     candidate_ids=["Jenny", "Steven", "Ed"]
... )
>>> for result in results:
...     print(result)

{'id': 'Steven', 'distance': 0.0325563, 'name': 'Steven', 'isBlocked': True}
{'id': 'Jenny', 'distance': 0.5804119, 'name': 'Jenny', 'isBlocked': False}

Filtered Vector Search

Let's first retrieves all "Account" nodes where the isBlocked attribute is False and returns their name attributes in a Pandas DataFrame.

>>> nodes_df = G.get_nodes(
...     node_type="Account",
...     node_alias="s", # The alias "s" is used in filter_expression. You can remove this line since the default node alias is "s"
...     filter_expression='s.isBlocked == False AND s.name != "Ed"',
...     return_attributes=["name"],
... )
>>> print(nodes_df)

    name
0   Paul
1  Scott
2  Jenny

Then convert the name column of the retrieved DataFrame into a set of candidate IDs and performs a top-2 vector search on the "Account" node type using the "emb1" embedding attribute, restricted to the specified candidate IDs.

>>> candidate_ids = set(nodes_df['name'])
... results = G.search(
...     data=[-0.017733968794345856, -0.01019224338233471, -0.016571875661611557],
...     vector_attribute_name="emb1",
...     node_type="Account",
...     limit=2,
...     candidate_ids=candidate_ids
... )
>>> for result in results:
...     print(result)

{'id': 'Paul', 'distance': 0.393388, 'name': 'Paul', 'isBlocked': False}
{'id': 'Scott', 'distance': 0, 'name': 'Scott', 'isBlocked': False}

Clear and Drop a Graph

Clear the Graph

To clear the data in the graph without dropping it, use the following code:

>>> print(G.clear())

True

Afterwards, you can confirm that there are no nodes in the graph by checking:

>>> print(G.number_of_nodes())

Drop the Graph

To clear the data and completely remove the graph—including schema, loading jobs, and queries—use the following code:

>>> G.drop_graph()

2025-02-27 17:38:44,545 - tigergraphx.core.managers.schema_manager - INFO - Dropping graph: FinancialGraph...
2025-02-27 17:38:47,882 - tigergraphx.core.managers.schema_manager - INFO - Graph dropped successfully.

What’s Next?

Now that you've learned how to use TigerGraph for storing both graph data and vectors, you can dive into more advanced features of TigerGraphX:

TigerGraph Quick Start Guide for Graph and Vector Storage: Quickly get started with TigerGraph for storing both graph and vector data.
API Reference: Dive deeper into TigerGraphX APIs.

Start unlocking the power of graphs with TigerGraphX today!