The SAP HANA database keeps evolving to meet the varied needs of its users. As a database, HANA supports more than primitive data types and the standard set of operations on them. In a world of connected data, defining the relationships within the available data set is one of the most important aspects. RDBMSs are one common choice for storing information such as financial records, manufacturing and logistics information, personnel data, and data for other applications. At its core, SAP HANA is a column store optimized for relational records, which meets these needs, but that is not all: it is now also possible to model and query the relationships between records in the same deployment as a graph, without having to use an external graph store for the purpose.
Starting with SPS12, HANA can be used as a graph database. What do we mean by "graph database" here? Let us take a quick look at what it is, and then proceed to the capabilities HANA offers to achieve it.
In this connected world there are no isolated pieces of information, but rich, connected domains all around us. A graph database embraces relationships as a core aspect of its data model in order to store, process, and query connections efficiently. Conventional data storage in a relational database computes relationships expensively at query time, through joins; a graph database, on the other hand, stores connections as first-class citizens, readily available for any "join-like" navigation operation. Following these already-persisted connections is an efficient, constant-time operation, which allows a graph database to traverse millions of connections per second per core. Graph databases excel at managing highly connected data and complex queries, largely independent of the total size of the data set: armed only with a pattern and a set of starting points, a graph database explores the neighborhood around those starting points, collecting and aggregating information from millions of nodes and relationships, while leaving the billions outside the search perimeter untouched.
Thus, instead of writing queries that are highly recursive, or that span multiple tables and take a long time to return results in a relational structure, we turn to a design built for quickly traversing the relationships between entities: the graph database.
Now that we understand the basic nature of a graph database, let us look at the capabilities HANA provides to implement one.
SAP HANA Graph is an integral part of SAP HANA core functionality. It expands the SAP HANA platform with native support for graph processing and allows us to execute typical graph operations on the data stored in an SAP HANA system.
In SAP HANA, a graph is a set of vertices and a set of edges. Each edge connects two vertices; one vertex is denoted as the source and the other as the target. Edges are always directed and there can be two or more edges connecting the same two vertices. Vertices and edges can have an arbitrary number of attributes. A vertex attribute consists of a name that is associated with a data type and a value. Edge attributes consist of the same information.
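To make this model concrete, here is a minimal Python sketch of a directed multigraph whose vertices and edges carry arbitrary attributes. It is only an illustration of the concepts, not how HANA stores graphs internally; the class names are hypothetical, and the second edge between John and BMW is invented purely to show that parallel edges are allowed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    key: str                                   # vertex key: must be unique
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    key: int                                   # edge key: must be unique
    source: str                                # key of the source vertex
    target: str                                # key of the target vertex
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def out_edges(self, key: str) -> List[Edge]:
        """Edges whose source is the given vertex."""
        return [e for e in self.edges if e.source == key]

    def in_edges(self, key: str) -> List[Edge]:
        """Edges whose target is the given vertex."""
        return [e for e in self.edges if e.target == key]


g = Graph()
g.vertices["John"] = Vertex("John", {"TYPE": "Person"})
g.vertices["BMW"] = Vertex("BMW", {"TYPE": "Car"})
g.edges.append(Edge(2, "John", "BMW", {"TYPE": "Witnesses"}))
# Hypothetical second edge between the same two vertices (not in the sample data).
g.edges.append(Edge(99, "John", "BMW", {"TYPE": "Drives"}))

print(len(g.out_edges("John")))  # 2: parallel edges are allowed
print(len(g.out_edges("BMW")))   # 0: edges are directed, so BMW has no outgoing edge
```

Keeping edges in a list rather than keyed by (source, target) is what permits several edges between the same pair of vertices.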
SAP HANA provides several built-in graph algorithms that operate on data defined in a graph structure, to be used according to the user's requirement.
Let us understand some of these algorithms by working through an example.
Fraud Detection:
Banks and insurance companies lose billions of dollars every year to fraud. Traditional methods of fraud detection play an important role in minimizing these losses. However, increasingly sophisticated fraudsters have developed a variety of ways to elude discovery, both by working together and by constructing false identities through various other means. Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high level of accuracy, and can help stop advanced fraud scenarios in real time. Understanding the connections between data, and deriving meaning from these links, doesn't necessarily mean gathering new data. Significant insights can be drawn from one's existing data simply by reframing the problem and looking at it in a new way: as a graph.
Insurance Fraud: Insurance fraud attracts sophisticated criminal rings that are often very effective at circumventing fraud detection measures. Here too, graph databases can be a powerful tool for combating collusive fraud. In a typical hard-fraud scenario, rings of fraudsters work together to stage fake accidents and claim soft-tissue injuries; these accidents never really happen. Such rings normally include a number of roles:
1. Providers: Collusions typically involve participation from professionals in several categories:
a. Doctors, who diagnose false injuries
b. Lawyers, who file fraudulent claims, and
c. Body shops, which misrepresent damage to cars
2. Participants: These are the people involved in the (false) accident, and normally include:
a. Drivers
b. Passengers
c. Pedestrians
d. Witnesses
Fraudsters often create and manage rings by "recycling" participants so as to stage many accidents.
Thus, in one accident a particular person may play the role of the driver; in another accident the same person may be a passenger or a pedestrian, and in yet another a witness. Clever use of roles can generate a large number of costly fake accidents, even with a small number of participants, as shown below:
![Fraud.PNG]()
The traditional approach to discovering such a ring requires joining a number of tables in a complex schema (Accidents, Vehicles, Owners, Drivers, Passengers, Pedestrians, Witnesses, Providers), and joining them together multiple times, once per potential role, to uncover the full picture. Because such operations are complex and costly, particularly for very large data sets, this crucial form of analysis is often overlooked.
Graph databases are well suited to this task: finding a fraud ring becomes a simple matter of walking the graph.
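As a rough illustration of "walking the graph", the Python sketch below (not HANA code) treats every connected group of people, cars, and accidents as a candidate ring and collects it with a breadth-first search, ignoring edge direction. It reuses the edges that are inserted into the RELATIONSHIPS table later in this post; using connected components as the ring criterion is a simplifying assumption, and real fraud analytics would add further conditions.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# (source, target) pairs from the sample RELATIONSHIPS data used in this post.
EDGES: List[Tuple[str, str]] = [
    ("John", "Maruti"), ("John", "BMW"), ("Williams", "Benz"),
    ("Williams", "BMW"), ("Jane", "Audi"), ("Tony", "John"),
    ("Tony", "Jane"), ("Peter", "Jane"), ("Robert", "Audi"),
    ("Maruti", "Accident1"), ("Benz", "Accident1"),
    ("BMW", "Accident2"), ("Audi", "Accident2"),
]


def fraud_rings(edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Return the connected components of the graph, ignoring edge direction."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen: Set[str] = set()
    rings: List[Set[str]] = []
    for start in list(neighbours):
        if start in seen:
            continue
        ring: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:                      # breadth-first walk of one component
            vertex = queue.popleft()
            ring.add(vertex)
            for other in neighbours[vertex]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        rings.append(ring)
    return rings


rings = fraud_rings(EDGES)
print(len(rings), len(rings[0]))  # 1 12
```

On the sample data all 12 vertices fall into a single ring: one traversal assembles the picture that the join-based approach would have to rebuild role by role.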
The figure below shows how the insurance-fraud ring scenario can be modeled as a graph.
![Graph_Model.PNG]()
In this insurance-fraud scenario there are multiple vertices (nodes): the people involved in the act, the cars used, and the accidents for which insurance was claimed. The edges (relationships) represent the role played by each of these vertices.
Let us create the tables for these vertices and edges using the SQL statements below:
Vertex Table:
CREATE COLUMN TABLE "INSURANCE_FRAUD"."MEMBERS" (
"NAME" VARCHAR(100) PRIMARY KEY,
"TYPE" VARCHAR(100)
);
Similarly, let us create a table for the edges, which define the various roles played by these vertices:
Edge Table:
CREATE COLUMN TABLE "INSURANCE_FRAUD"."RELATIONSHIPS" (
"KEY" INT UNIQUE NOT NULL,
"SOURCE" VARCHAR(100) NOT NULL
REFERENCES "INSURANCE_FRAUD"."MEMBERS" ("NAME")
ON UPDATE CASCADE ON DELETE CASCADE,
"TARGET" VARCHAR(100) NOT NULL
REFERENCES "INSURANCE_FRAUD"."MEMBERS" ("NAME")
ON UPDATE CASCADE ON DELETE CASCADE,
"TYPE" VARCHAR(100)
);
The edge table must have a key column that uniquely identifies each edge (here "KEY", declared UNIQUE NOT NULL), plus two columns, "SOURCE" and "TARGET", that are foreign keys referencing the key column of the vertex table and identify the source and target vertex of each edge (relationship).
Now let us insert data into the vertex table to define the people, cars, and accidents involved in the insurance-fraud act:
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('John', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Williams', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Jane', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Tony', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Peter', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Robert', 'Person');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Maruti', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Benz', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('BMW', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Audi', 'Car');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Accident1', 'Accident');
INSERT INTO "INSURANCE_FRAUD"."MEMBERS" VALUES('Accident2', 'Accident');
Next, let us insert data into the edge table to define the relationships between the vertices, such as Drives, Medicate, and Witnesses, so as to expose the multiple fake roles (edges) played by the same people (vertices) in the act:
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(1,'John','Maruti','Drives');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(2,'John','BMW','Witnesses');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(3,'Williams','Benz','Drives');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(4,'Williams','BMW','Is_Passenger');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(5,'Jane','Audi','Drives');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(6,'Tony','John','Advocates');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(7,'Tony','Jane','Advocates');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(8,'Peter','Jane','Medicate');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(9,'Robert','Audi','Repairs');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(10,'Maruti','Accident1','Involves');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(11,'Benz','Accident1','Involves');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(12,'BMW','Accident2','Involves');
INSERT INTO "INSURANCE_FRAUD"."RELATIONSHIPS" VALUES(13,'Audi','Accident2','Involves');
The maximum number of attributes is bounded by the maximum number of columns of the underlying tables. One of the vertex attributes must uniquely identify vertices; this attribute is referred to as the vertex key. Similarly, one of the edge attributes must uniquely identify edges and is referred to as the edge key. The edge table contains two additional columns referencing the key column of the vertex table: one identifies the source vertex and the other the target vertex of an edge.
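These key rules can be restated as a small check over the sample rows. In the tables above, the PRIMARY KEY/UNIQUE, NOT NULL, and REFERENCES constraints already make the database enforce them; the Python sketch below is only an illustration, and the function names as well as the bad rows used in the usage lines (edge 14 pointing at a nonexistent "Accident3", a duplicate edge key 1) are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

MEMBERS: List[Tuple[str, str]] = [
    ("John", "Person"), ("Williams", "Person"), ("Jane", "Person"),
    ("Tony", "Person"), ("Peter", "Person"), ("Robert", "Person"),
    ("Maruti", "Car"), ("Benz", "Car"), ("BMW", "Car"), ("Audi", "Car"),
    ("Accident1", "Accident"), ("Accident2", "Accident"),
]
RELATIONSHIPS: List[Tuple[int, str, str, str]] = [
    (1, "John", "Maruti", "Drives"), (2, "John", "BMW", "Witnesses"),
    (3, "Williams", "Benz", "Drives"), (4, "Williams", "BMW", "Is_Passenger"),
    (5, "Jane", "Audi", "Drives"), (6, "Tony", "John", "Advocates"),
    (7, "Tony", "Jane", "Advocates"), (8, "Peter", "Jane", "Medicate"),
    (9, "Robert", "Audi", "Repairs"), (10, "Maruti", "Accident1", "Involves"),
    (11, "Benz", "Accident1", "Involves"), (12, "BMW", "Accident2", "Involves"),
    (13, "Audi", "Accident2", "Involves"),
]


def check_keys(label: str, keys: Sequence[Optional[object]]) -> List[str]:
    """A key column must be NOT NULL and UNIQUE."""
    problems = []
    if any(k is None for k in keys):
        problems.append(f"{label} key contains NULL")
    non_null = [k for k in keys if k is not None]
    if len(set(non_null)) != len(non_null):
        problems.append(f"{label} key is not unique")
    return problems


def validate(members: Sequence[Tuple[str, str]],
             relationships: Sequence[Tuple[int, str, str, str]]) -> List[str]:
    """Check vertex/edge keys and that SOURCE/TARGET reference existing vertices."""
    problems = check_keys("vertex", [name for name, _ in members])
    problems += check_keys("edge", [row[0] for row in relationships])
    vertex_keys = {name for name, _ in members}
    for key, source, target, _ in relationships:
        for column, value in (("SOURCE", source), ("TARGET", target)):
            if value not in vertex_keys:
                problems.append(f"edge {key}: {column} {value!r} is not a vertex key")
    return problems


print(validate(MEMBERS, RELATIONSHIPS))  # []
print(validate(MEMBERS, RELATIONSHIPS + [(14, "Jane", "Accident3", "Drives")]))
# ["edge 14: TARGET 'Accident3' is not a vertex key"]
```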
SAP HANA Graph provides a dedicated catalog object, called a graph workspace, to define a graph in terms of the SAP HANA tables created above.
Graph Workspaces:
A graph workspace is a catalog object that defines a graph in terms of tables and columns:
- vertex table
- edge table
- key column in the vertex table
- key column in the edge table
- source vertex column in the edge table
- target vertex column in the edge table
Both vertex key and edge key columns need to be flagged as unique and NOT NULL. A graph workspace can be uniquely identified by the database schema it resides in and the workspace name. An SAP HANA instance can contain multiple graph workspaces in the same schema (with different workspace names) or different database schemas.
Graph workspace information is stored in the GRAPH_WORKSPACES system view.
We now create the graph by defining a graph workspace that connects the vertex and edge tables created above, so that we can derive the insights we need:
CREATE GRAPH WORKSPACE "INSURANCE_FRAUD"."GRAPH"
EDGE TABLE "INSURANCE_FRAUD"."RELATIONSHIPS"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
KEY COLUMN "KEY"
VERTEX TABLE "INSURANCE_FRAUD"."MEMBERS"
KEY COLUMN "NAME";
With the graph workspace in place, let us find out whether a person (vertex) is repeatedly involved in, that is, connected by edges to, accidents (vertices).
Neighborhood Search Algorithm:
To achieve this, let us create a calculation scenario that uses the "Neighborhood Search" algorithm (the GET_NEIGHBORHOOD action) to discover the connections between the person John and the accidents, with the following statement:
CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."GET_NEIGHBORHOOD_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="GET_NEIGHBORHOOD_EXAMPLE">
<calculationViews>
<graph name="get_neighborhood_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_NEIGHBORHOOD">
<expression>
<![CDATA[{
"parameters": {
"startVertices": ["John"],
"direction":"outgoing",
"minDepth": 0,
"maxDepth": 2
}
}]]>
</expression>
<viewAttributes>
<viewAttribute name="NAME" datatype="string"/>
<viewAttribute name="DEPTH" datatype="int"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>
' WITH PARAMETERS ('EXPOSE_NODE'=('get_neighborhood_node', 'GET_NEIGHBORHOOD_EXAMPLE'));
In this calculation scenario we specify the name of the action (the graph algorithm) and its parameters: the start vertices from which neighboring vertices are collected, the minimum and maximum depth of the connections to consider, and the direction, which determines whether incoming or outgoing edges are followed.
This creates a calculation scenario with a single graph node that applies the GET_NEIGHBORHOOD action to the graph workspace "INSURANCE_FRAUD"."GRAPH".
The scenario traverses the underlying graph along all outgoing edges, starting from vertex 'John', and returns the vertices at a minimum depth of 0 and a maximum depth of 2 from the start vertex.
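The semantics of this scenario can be mimicked with a breadth-first search. The Python sketch below is an illustration, not HANA's implementation: `get_neighborhood` is a hypothetical function whose arguments mirror the startVertices, direction, minDepth, and maxDepth parameters, and the "incoming" and "any" direction values are assumptions included only for comparison.

```python
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Tuple

EDGES: List[Tuple[str, str]] = [
    ("John", "Maruti"), ("John", "BMW"), ("Williams", "Benz"),
    ("Williams", "BMW"), ("Jane", "Audi"), ("Tony", "John"),
    ("Tony", "Jane"), ("Peter", "Jane"), ("Robert", "Audi"),
    ("Maruti", "Accident1"), ("Benz", "Accident1"),
    ("BMW", "Accident2"), ("Audi", "Accident2"),
]


def get_neighborhood(edges: Sequence[Tuple[str, str]], start_vertices: Sequence[str],
                     min_depth: int, max_depth: int,
                     direction: str = "outgoing") -> List[Tuple[str, int]]:
    """BFS; returns (vertex, depth) pairs with min_depth <= depth <= max_depth."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        if direction in ("outgoing", "any"):
            adjacency[source].append(target)
        if direction in ("incoming", "any"):
            adjacency[target].append(source)

    depth = {vertex: 0 for vertex in start_vertices}
    queue = deque(start_vertices)
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:        # do not expand beyond maxDepth
            continue
        for neighbour in adjacency[vertex]:
            if neighbour not in depth:        # first visit = smallest depth in BFS
                depth[neighbour] = depth[vertex] + 1
                queue.append(neighbour)

    result = [(v, d) for v, d in depth.items() if d >= min_depth]
    return sorted(result, key=lambda pair: (pair[1], pair[0]))


for name, d in get_neighborhood(EDGES, ["John"], min_depth=0, max_depth=2):
    print(name, d)
```

Run on the sample data, it lists John at depth 0, BMW and Maruti at depth 1, and Accident1 and Accident2 at depth 2.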
Execute the query below on the calculation scenario to find the relationships of John (a person vertex) with the other vertices (cars and accidents):
SELECT * FROM "INSURANCE_FRAUD"."GET_NEIGHBORHOOD_EXAMPLE" ORDER BY "DEPTH";
![Nearest_Neighbourhood.PNG]()
Thus, with the neighborhood search algorithm we can find the vertices near a start vertex, up to a given depth.
In our example, we fetched the cars and accidents that the person 'John' is connected to in the insurance-fraud ring.
The result set shows that John has a direct relationship to the Maruti and BMW cars (depth 1) and an indirect relationship, via those cars, to Accident1 and Accident2 (depth 2). The row with depth 0 is John himself, the start vertex.
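The same idea extends to the "repeatedly involved" question for every person at once. The Python sketch below is a hypothetical helper, not a HANA feature; it assumes that an outgoing path of at most two hops (Person -> Car -> Accident) counts as involvement, and flags each person who reaches more than one accident.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

VERTEX_TYPE: Dict[str, str] = {
    "John": "Person", "Williams": "Person", "Jane": "Person",
    "Tony": "Person", "Peter": "Person", "Robert": "Person",
    "Maruti": "Car", "Benz": "Car", "BMW": "Car", "Audi": "Car",
    "Accident1": "Accident", "Accident2": "Accident",
}
EDGES: List[Tuple[str, str]] = [
    ("John", "Maruti"), ("John", "BMW"), ("Williams", "Benz"),
    ("Williams", "BMW"), ("Jane", "Audi"), ("Tony", "John"),
    ("Tony", "Jane"), ("Peter", "Jane"), ("Robert", "Audi"),
    ("Maruti", "Accident1"), ("Benz", "Accident1"),
    ("BMW", "Accident2"), ("Audi", "Accident2"),
]


def accidents_within(person: str, adjacency: Dict[str, List[str]],
                     vertex_type: Dict[str, str], max_depth: int) -> Set[str]:
    """Accidents reachable from person along at most max_depth outgoing edges."""
    seen = {person}
    frontier = {person}
    accidents: Set[str] = set()
    for _ in range(max_depth):
        frontier = {n for v in frontier for n in adjacency[v]} - seen
        seen |= frontier
        accidents |= {v for v in frontier if vertex_type[v] == "Accident"}
    return accidents


def repeat_participants(vertex_type: Dict[str, str], edges: List[Tuple[str, str]],
                        max_depth: int = 2) -> Dict[str, List[str]]:
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    flagged: Dict[str, List[str]] = {}
    for vertex, kind in vertex_type.items():
        if kind == "Person":
            accidents = accidents_within(vertex, adjacency, vertex_type, max_depth)
            if len(accidents) > 1:
                flagged[vertex] = sorted(accidents)
    return flagged


print(repeat_participants(VERTEX_TYPE, EDGES))
# {'John': ['Accident1', 'Accident2'], 'Williams': ['Accident1', 'Accident2']}
```

On the sample data this flags Williams as well as John, since Williams drives the Benz (Accident1) and is a passenger in the BMW (Accident2).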
Shortest Path Algorithm:
This action provides the shortest paths from a start vertex to all reachable vertices.
The action GET_SHORTEST_PATHS_ONE_TO_ALL returns the shortest paths from the provided start vertex to all reachable vertices in the graph, a problem also known as single-source shortest path (SSSP). The resulting shortest paths form a tree with the start vertex at the root, and every other vertex carries its shortest distance (smallest total weight) from the start. Non-negative edge weights are read from a column of the edge table, if one is specified.
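For intuition, here is Dijkstra's algorithm, the textbook method for SSSP with non-negative weights, as a Python sketch; it is not necessarily what HANA uses internally. It assumes that each edge weighs 1 when no weights are supplied (the scenario below specifies no weight column) and that only outgoing edges are followed; HANA's actual defaults may differ. The function names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

EDGES: List[Tuple[str, str]] = [
    ("John", "Maruti"), ("John", "BMW"), ("Williams", "Benz"),
    ("Williams", "BMW"), ("Jane", "Audi"), ("Tony", "John"),
    ("Tony", "Jane"), ("Peter", "Jane"), ("Robert", "Audi"),
    ("Maruti", "Accident1"), ("Benz", "Accident1"),
    ("BMW", "Accident2"), ("Audi", "Accident2"),
]


def shortest_paths_one_to_all(
    edges: List[Tuple[str, str]], start: str,
    weights: Optional[Dict[Tuple[str, str], float]] = None,
) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Dijkstra from start; without weights every edge costs 1 (hop count)."""
    adjacency: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for source, target in edges:
        cost = 1 if weights is None else weights[(source, target)]
        if cost < 0:
            raise ValueError("edge weights must be non-negative")
        adjacency[source].append((target, cost))

    distance: Dict[str, float] = {start: 0}
    parent: Dict[str, Optional[str]] = {start: None}   # shortest-path tree
    heap = [(0, start)]
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distance[vertex]:
            continue                                   # stale heap entry
        for neighbour, cost in adjacency[vertex]:
            candidate = dist + cost
            if candidate < distance.get(neighbour, float("inf")):
                distance[neighbour] = candidate
                parent[neighbour] = vertex
                heapq.heappush(heap, (candidate, neighbour))
    return distance, parent


def path_to(parent: Dict[str, Optional[str]], vertex: str) -> List[str]:
    """Walk the shortest-path tree back to the root."""
    path: List[str] = []
    current: Optional[str] = vertex
    while current is not None:
        path.append(current)
        current = parent[current]
    return path[::-1]


distance, parent = shortest_paths_one_to_all(EDGES, "Tony")
print(sorted(distance.items(), key=lambda item: (item[1], item[0])))
print(path_to(parent, "Accident1"))  # ['Tony', 'John', 'Maruti', 'Accident1']
```

Under these assumptions, Williams, Benz, Peter, and Robert cannot be reached from Tony along outgoing edges, so they do not appear in the result.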
Let us now create a calculation scenario with the "Shortest Path" algorithm to find the connections from the person Tony to all other vertices (cars, people, accidents) in the detected fraud ring:
CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."SSSP_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="SSSP_EXAMPLE">
<calculationViews>
<graph name="sssp_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_SHORTEST_PATHS_ONE_TO_ALL">
<expression>
<![CDATA[{
"parameters": {
"startVertex": "Tony",
"outputWeightColumn": "DISTANCE"
}
}]]>
</expression>
<viewAttributes>
<viewAttribute name="NAME" datatype="string"/>
<viewAttribute name="DISTANCE" datatype="int"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>
' WITH PARAMETERS ('EXPOSE_NODE'=('sssp_node', 'SSSP_EXAMPLE'));
Here we specify the name of the action (the graph algorithm) and its parameters: the start vertex from which all connected vertices are reached along their shortest paths, with no depth limit, and the name of the output column that holds the distance. This creates a calculation scenario with a single graph node that applies the GET_SHORTEST_PATHS_ONE_TO_ALL action to the graph workspace "INSURANCE_FRAUD"."GRAPH". The scenario traverses the underlying graph starting from vertex 'Tony' and returns every reachable vertex together with the length of its shortest path from the start vertex.
Execute the query below on the calculation scenario to find the relationships of Tony (a person vertex) with all other vertices (people, cars, and accidents) in the fraud ring:
SELECT * FROM "INSURANCE_FRAUD"."SSSP_EXAMPLE" ORDER BY "DISTANCE";
![SP.PNG]()
Isomorphic Subgraph Algorithm:
The action GET_ISOMORPHIC_SUBGRAPHS returns a projection of the subgraphs within a given graph workspace that are isomorphic to a given subgraph pattern.
With the graph workspace in place, let us find the subgraphs of the insurance-fraud ring that are isomorphic to a subgraph pattern.
To achieve this, let us create a calculation scenario with the "Isomorphic Subgraph" algorithm to find the matching subgraphs.
Here we specify the name of the action (the graph algorithm) and a subgraph pattern, which can contain a set of vertex variables, a set of edge variables, a condition clause, a projection list, an order-by list, a limit, and an offset.
Let us pass the pattern Person --(Advocates)--> Person --(any relationship)--> Car --(Involves)--> Accident as a parameter to the isomorphic subgraph algorithm. The condition below constrains only the first edge (its TYPE must be 'Advocates'); with this sample data, the next two hops happen to lead from a person to a car and from that car to an accident.
CREATE CALCULATION SCENARIO "INSURANCE_FRAUD"."PATTERN_EXAMPLE" USING '
<?xml version="1.0"?>
<cubeSchema version="2" operation="createCalculationScenario" defaultLanguage="en">
<calculationScenario schema="INSURANCE_FRAUD" name="PATTERN_EXAMPLE">
<calculationViews>
<graph name="get_iso_subgraph_node" defaultViewFlag="true" schema="INSURANCE_FRAUD" workspace="GRAPH" action="GET_ISOMORPHIC_SUBGRAPHS">
<expression>
<![CDATA[{
"parameters": {
"pattern": {"version": "01.00.00",
"subgraph" : {
"vertexVariables" : ["A","B","C","D"],
"edgeVariables":["E1","E2","E3"],
"projection":[
{
"variable" :"A",
"attribute":"NAME",
"alias":"ANAME"
},
{
"variable" :"B",
"attribute":"NAME",
"alias":"BNAME"
},
{
"variable" :"C",
"attribute":"NAME",
"alias":"CNAME"
},
{
"variable" :"D",
"attribute":"NAME",
"alias":"DNAME"
}
],
"condition":{
"type":"AND",
"arguments" : [
{
"type":"-->",
"arguments" : ["E1","A","B"]
},
{
"type":"-->",
"arguments" : ["E2","B","C"]
},
{
"type":"-->",
"arguments" : ["E3","C","D"]
},
{
"type":"=",
"arguments" : [
{
"type" : ".",
"arguments" : ["E1","TYPE"]
},
{
"type" : "literal",
"datatype":"string",
"value":"Advocates"
}
]
}
]
}
}
}
}
}]]>
</expression>
<viewAttributes>
<viewAttribute name="ANAME" datatype="string"/>
<viewAttribute name="BNAME" datatype="string"/>
<viewAttribute name="CNAME" datatype="string"/>
<viewAttribute name="DNAME" datatype="string"/>
</viewAttributes>
</graph>
</calculationViews>
</calculationScenario>
</cubeSchema>'
WITH PARAMETERS ('EXPOSE_NODE'=('get_iso_subgraph_node','PATTERN_EXAMPLE'));
This pattern returns every subgraph that starts with a person who advocates another person, then follows the advocated person's relationships to cars, and from those cars to the accidents they are involved in.
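To see why the pattern produces the rows it does, here is a minimal backtracking matcher in Python. It is an illustration, not HANA's algorithm: it assumes isomorphism semantics (each vertex variable binds a distinct vertex and each edge variable a distinct edge), and the tuple encoding of the pattern is a hypothetical simplification of the JSON condition above.

```python
from typing import Dict, List, Optional, Tuple

RELATIONSHIPS: List[Tuple[int, str, str, str]] = [
    (1, "John", "Maruti", "Drives"), (2, "John", "BMW", "Witnesses"),
    (3, "Williams", "Benz", "Drives"), (4, "Williams", "BMW", "Is_Passenger"),
    (5, "Jane", "Audi", "Drives"), (6, "Tony", "John", "Advocates"),
    (7, "Tony", "Jane", "Advocates"), (8, "Peter", "Jane", "Medicate"),
    (9, "Robert", "Audi", "Repairs"), (10, "Maruti", "Accident1", "Involves"),
    (11, "Benz", "Accident1", "Involves"), (12, "BMW", "Accident2", "Involves"),
    (13, "Audi", "Accident2", "Involves"),
]

# (edge variable, source variable, target variable, required TYPE or None)
PATTERN: List[Tuple[str, str, str, Optional[str]]] = [
    ("E1", "A", "B", "Advocates"),
    ("E2", "B", "C", None),
    ("E3", "C", "D", None),
]


def _try_bind(var: str, vertex: str, bound: Dict[str, str], pending: Dict[str, str]) -> bool:
    """Bind var to vertex unless that contradicts an existing binding or reuses a vertex."""
    current = bound.get(var, pending.get(var))
    if current is not None:
        return current == vertex
    if vertex in bound.values() or vertex in pending.values():
        return False                     # isomorphism: variables map to distinct vertices
    pending[var] = vertex
    return True


def isomorphic_subgraphs(pattern, relationships) -> List[Dict[str, str]]:
    matches: List[Dict[str, str]] = []
    bound: Dict[str, str] = {}           # vertex variable -> vertex key
    used: Dict[str, int] = {}            # edge variable -> edge key

    def extend(i: int) -> None:
        if i == len(pattern):
            matches.append(dict(bound))
            return
        edge_var, source_var, target_var, required_type = pattern[i]
        for key, source, target, edge_type in relationships:
            if key in used.values():
                continue                 # each edge variable binds a distinct edge
            if required_type is not None and edge_type != required_type:
                continue
            pending: Dict[str, str] = {}
            if not (_try_bind(source_var, source, bound, pending)
                    and _try_bind(target_var, target, bound, pending)):
                continue
            bound.update(pending)
            used[edge_var] = key
            extend(i + 1)
            for var in pending:          # undo this step before trying the next edge
                del bound[var]
            del used[edge_var]

    extend(0)
    return matches


for m in isomorphic_subgraphs(PATTERN, RELATIONSHIPS):
    print(m["A"], m["B"], m["C"], m["D"])
```

Applied to the sample data, it prints three matches: Tony/John/Maruti/Accident1, Tony/John/BMW/Accident2, and Tony/Jane/Audi/Accident2.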
Execute the query below on the calculation scenario to find the relationships of the people John and Jane with the other vertices (cars and accidents) in the fraud ring:
SELECT * FROM "INSURANCE_FRAUD"."PATTERN_EXAMPLE";
The result set contains every such subgraph: the person who advocates (Tony, in our case) for other people (John and Jane) who sought his support, extended by John's and Jane's relationships to cars and by those cars' involvement in accidents, as shown below:
![Isomorphic.PNG]()
Thus, with the isomorphic subgraph algorithm we can find all subgraphs that match a required pattern within the complete graph.
In our example, we fetch all subgraphs in which people advocated by Tony are in turn connected, through any relationship (such as driving or witnessing), to cars that were involved in accidents.
With the sample data inserted above, the pattern matches three subgraphs:
- John, advocated by Tony, drove the Maruti car involved in Accident1.
- John, advocated by Tony, witnessed the BMW car involved in Accident2.
- Jane, advocated by Tony, drove the Audi car involved in Accident2.
The calculation scenarios above for the various graph algorithms can also be created with the Graph node of an .hdbcalculationview in the database (HDB) module of an XS Advanced application.
This completes our discussion of how HANA works as a graph database, along with a few of the graph algorithms that can be applied to a graph data set.
For more information, please refer to the SAP HANA Graph Reference.
I hope this information is useful. Any suggestions or feedback for improvement would be much appreciated.
Thank you