Neo4j Graph Database

Author: Gao Date: 2019-10-11
Neo4j Graph Database
  1. Neo4j
    1. Graph Database
      1. Concepts
        1. Node
        2. Label
        3. Relationships
        4. Properties
        5. Traversals and paths
        6. Schema
          1. Indexes
          2. Constraints
      2. Comparison with SQL
      3. Features
      4. Technical View
        1. Object Cache
        2. Protocol
        3. Index-free adjacency
    2. CQL
      1. NULL
      2. Parameters
        1. String literal
        2. Regular expression
        3. Create node with properties
        4. Setting all properties on a node
        5. SKIP and LIMIT
        6. Multiple node ids
        7. Calling procedures
      3. MATCH
        1. Returns all the nodes in database
        2. Get all the nodes under a specific label
        3. Match by Relationship
        4. Variable length relationships
        5. Match a path
        6. OPTIONAL MATCH
      4. WHERE
        1. Filter by property
        2. WHERE Clause with Multiple Conditions
        3. Using Relationship with Where Clause
        4. String matching
        5. In
      5. CREATE
        1. Creating Nodes
          1. Create a simple node
          2. Create multiple nodes
          3. Create node with a label
          4. Create a node with multiple labels
          5. Create multiple nodes with multiple labels
          6. Create node with Properties
          7. RETURN newly created node
        2. Creating Relationships
          1. Creating a new relationship
          2. Creating a Relationship Between the Existing Nodes
          3. Creating a Relationship with Label and Properties
          4. Creating a Complete Path
      6. RETURN
        1. Returning Created Node
      7. Unique results
        1. Returning Multiple Nodes
        2. Returning Relationships
        3. Returning Properties
        4. Returning All Elements
        5. Returning a Variable With a Column Alias
      8. ORDER BY
        1. Ordering By a Property
        2. Ordering Nodes by Multiple Properties
        3. Ordering Nodes by Descending Order
      9. LIMIT
        1. Limit with expression
      10. SKIP
        1. Skipping the first 3 nodes
        2. Skip Using Expression
      11. WITH
      12. UNWIND
      13. MERGE
        1. Merging a Node with a Label
        2. Merging a Node with Properties
        3. OnCreate and OnMatch
        4. Merge a Relationship
      14. DELETE
        1. Deleting All Nodes and Relationships
        2. Deleting a Particular Node
      15. REMOVE
        1. Removing a Property
        2. Removing Label From a Node
      16. SET
        1. Setting a Property
        2. Setting Multiple Properties
        3. Setting a Label on a Node
        4. Setting Multiple Labels on a Node
      17. FOREACH
      18. Index
        1. Creating an Index
        2. Deleting an Index
      19. Constraint
        1. Unique node property constraints
        2. Node property existence constraints
        3. Relationship property existence constraints
      20. String Functions
        1. UPPER
        2. LOWER
        3. SUBSTRING
        4. Replace
      21. Aggregation Function
        1. COUNT
          1. Group Count
        2. MAX
        3. MIN
        4. SUM
        5. AVG
      22. List Functions
        1. EXTRACT
        2. FILTER
        3. KEYS
        4. LABELS
        5. NODES
        6. RANGE
        7. REDUCE
        8. RELATIONSHIPS
        9. REVERSE
        10. TAIL
      23. Scalar functions
        1. COALESCE
        2. STARTNODE
        3. ENDNODE
        4. HEAD
        5. LAST
        6. SIZE
        7. LENGTH
        8. ID
        9. TYPE
        10. PROPERTIES
        11. RANDOMUUID
        12. TIMESTAMP
        13. TOBOOLEAN
        14. TOFLOAT
        15. TOINTEGER
      24. Procedures
        1. Check which procedures are available
    3. Data Import
    4. Query management
      1. Profiling a query
        1. Explain
        2. PROFILE
      2. Kill Queries
    5. UIs
      1. Neo4j Desktop
        1. Plugin
      2. Neo4j Browser
        1. Browser Sync
      3. Neo4j Bloom
    6. Neo4j Sandbox
    7. Neo4j Cluster

Neo4j

Neo4j is an open source, NoSQL, native graph database. In Neo4j, data is persisted for long-term durability. Neo4j can be used under either an open-source license (Community Edition) or a commercial license (Enterprise Edition), which includes technical support.

Graph Database

A graph database is a database that models data in the form of a graph. The model represents data as Nodes, Relationships and Properties. Unlike relational databases, graph databases store relationships and connections as first-class entities.

Concepts

[Figure: Simple graph diagram]

Node

Nodes are often used to represent entities.

[Figure: Node]

Label

Labels are used to shape the domain by grouping nodes into sets. With labels in place, you can restrict operations to the nodes that carry a given label. Since labels can be added and removed at runtime, they can also be used to mark temporary states for nodes. A node can have zero or more labels. Labels are also used to associate indexes and constraints with groups of nodes.

Relationships

A relationship is a structure with a name and a direction that describes the connection between two nodes and provides structure and context to the graph. A relationship connects two nodes, or one node to itself (self-reference), and must have exactly one relationship type. Relationships always have a direction: each relationship has a From node and a To node.

[Figure: Relationship]

Properties

Nodes and relationships define the graph, while properties add context. Properties are key-value pairs stored on nodes and relationships that describe their qualities.

Property values can be:

  • Number, an abstract type, which has the subtypes Integer and Float
  • String
  • Boolean
  • The spatial type Point
  • Temporal types: Date, Time, LocalTime, DateTime, LocalDateTime and Duration
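The concepts above can be modelled in a few lines of Python. This is only an illustrative sketch of the property graph model, not how Neo4j stores data; the class and variable names (Node, Relationship, alice, acme) are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set = field(default_factory=set)        # zero or more labels
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    start: Node   # the "From" node
    end: Node     # the "To" node
    type: str     # exactly one relationship type
    properties: dict = field(default_factory=dict)


alice = Node({"Person"}, {"name": "Alice", "age": 30})
acme = Node({"Company"}, {"name": "Acme"})
works_at = Relationship(alice, acme, "WORKS_AT", {"since": 2015})

# A self-referencing relationship uses the same node at both ends.
knows_self = Relationship(alice, alice, "KNOWS")
```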

Traversals and paths

Traversing a graph means visiting nodes by following relationships according to some rules. The traversal result could be returned as a path.

[Figure: Traversal]
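As a rough illustration of a traversal that returns a path, the Python sketch below walks a small hypothetical graph breadth-first, following outgoing relationships only. The graph and the function name find_path are assumptions for the example; Neo4j's own traversal machinery is more elaborate.

```python
from collections import deque

# Hypothetical graph: node name -> nodes reachable via one outgoing relationship
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def find_path(graph, start, goal):
    """Breadth-first traversal; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = find_path(graph, "a", "d")
```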

Schema

A schema in Neo4j refers to indexes and constraints.

Indexes

Indexes are used to speed up the lookup of nodes by label and property.

Constraints

Constraints are used to make sure that the data adheres to the rules of the domain. A schema is optional: you can create data without defining one.

Comparison with SQL

| RDBMS | Graph Database |
| ------------- |:-------------:|
| Tables | Graphs |
| Rows | Nodes |
| Columns | Properties |
| Cells | Values |
| Constraints | Relationships |
| Joins | Traversal |

Features

  • Constant time traversals
  • Scalability: Users can scale the database by increasing the number of reads/writes and the data volume without affecting query speed or data integrity.
  • Cypher Query Language (CQL): a declarative query language similar to SQL, but designed for graph pattern matching and traversals
  • Flexible Schema: a property graph schema that can easily change with the requirements and adapt over time
  • ACID Properties: Atomicity, Consistency, Isolation, and Durability
  • Reliability: Neo4j also supports replication for data safety and reliability.
  • Built-in Neo4j Browser web application to create and retrieve graph data
  • Driver Support: Java, Go, Python

Technical View

Object Cache

The object cache caches individual nodes and relationships, together with their properties, in a form that is optimized for traversing the graph.

Protocol

The Neo4j Browser and the official Neo4j Drivers use the Bolt database protocol to communicate with Neo4j. From an application or from the Neo4j Browser, you can execute query statements.

| Connector name | Protocol | Default port number |
| ------------- |:-------------:|:-------------:|
| dbms.connector.bolt | Bolt | 7687 |
| dbms.connector.http | HTTP | 7474 |
| dbms.connector.https | HTTPS | 7473 |

Bolt is an efficient binary protocol for access to the database layer. It compresses the data sent over the wire and can also encrypt it.

Index-free adjacency

Index-free adjacency means that the query engine follows pointers to traverse paths (nodes connected by relationships) in the graph instead of looking relationships up in an index, which makes traversals very fast.
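The idea can be sketched in Python by contrasting a node that holds direct references to its neighbours with a lookup that scans a global relationship table. This is a conceptual illustration only; the names (Node, connect, neighbours_by_pointer, neighbours_by_scan) are hypothetical and do not reflect Neo4j's storage format.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # direct references to neighbouring Node objects


def connect(start, end):
    start.outgoing.append(end)


def neighbours_by_pointer(node):
    # Cost depends only on this node's degree, not on the size of the graph.
    return [n.name for n in node.outgoing]


def neighbours_by_scan(node_name, edge_table):
    # Contrast: a join-style lookup scans a global (start, end) table.
    return [end for start, end in edge_table if start == node_name]


a, b, c = Node("a"), Node("b"), Node("c")
connect(a, b)
connect(a, c)
connect(b, c)
edge_table = [("a", "b"), ("a", "c"), ("b", "c")]
```

Both functions return the same neighbours, but the scan has to look at every relationship in the table, while the pointer version only touches the relationships of the node itself.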

CQL

Neo4j uses CQL, the Cypher Query Language, as its query language. You can practise your CQL skills in Neo4j Sandbox.

NULL

NULL is used to represent missing or undefined values.

Parameters

Cypher supports querying with parameters for:

  • literals and expressions
  • node and relationship ids
  • for explicit indexes only: index values and queries

String literal

MATCH (n:LabelOfNode) WHERE n.property = $value RETURN n

Regular expression

MATCH (n:LabelOfNode) WHERE n.property =~ $regex RETURN n

Create node with properties

CREATE ($props)

Setting all properties on a node

MATCH (n:LabelOfNode) WHERE n.property=value SET n = $props

SKIP and LIMIT

MATCH (n:LabelOfNode) RETURN n.property SKIP $s LIMIT $l

Multiple node ids

MATCH (n) WHERE id(n) IN $ids RETURN n.property

Calling procedures

CALL db.resampleIndex($indexname)

MATCH

Returns all the nodes in database

MATCH (node) RETURN node

Get all the nodes under a specific label

MATCH (node:LabelOfNode) RETURN node

Match by Relationship

MATCH (node1)-[:Relationship_Of]->(node2) RETURN node1
MATCH (node1)-[:Relationship_Of]->(node2) RETURN node2
MATCH (node1)<-[:Relationship_Of]-(node2) RETURN node1
MATCH (node1)<-[:Relationship_Of]-(node2) RETURN node2
MATCH (node1)-[:Relationship_Of]-(node2) RETURN node1
MATCH (node1)-[:Relationship_Of]-(node2) RETURN node2

Variable length relationships

Returns every node2 that can be reached from node1 in 1 to 3 hops.

MATCH (node1:LabelOfNode1)-[:Relationship_Of*1..3]->(node2:LabelOfNode2) RETURN node2
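A rough Python model of the *1..3 quantifier, assuming a small acyclic example graph (the graph and the function name reachable are made up). Note that Cypher additionally never traverses the same relationship twice within one path, which this sketch does not enforce.

```python
# Hypothetical directed graph: node -> nodes reached by one Relationship_Of hop
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": []}


def reachable(graph, start, min_hops, max_hops):
    """Return the end node of every path from start with min_hops..max_hops hops."""
    results = []
    frontier = [start]
    for hops in range(1, max_hops + 1):
        # Advance every partial path by exactly one hop.
        frontier = [nxt for node in frontier for nxt in graph[node]]
        if hops >= min_hops:
            results.extend(frontier)
    return results


ends = reachable(graph, "a", 1, 3)
```

Node "e" is four hops away from "a", so it is not part of the result.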

Match a path

MATCH p =(a)-->(b)-->(c)
RETURN p

OPTIONAL MATCH

The OPTIONAL MATCH clause searches for the pattern described in it and uses nulls for the missing parts of the pattern. It can be considered the equivalent of an outer join in SQL.

MATCH (node1:LabelOfNode {properties...}) 
OPTIONAL MATCH (node1)-[:Relationship_Of]->(node2)
RETURN node2

MATCH (node1:LabelOfNode {properties...}) 
OPTIONAL MATCH (node1)-->(node2)
RETURN node2
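The difference between MATCH and OPTIONAL MATCH can be sketched in Python, with None standing in for null. The data and function names below are hypothetical and only illustrate the row semantics.

```python
# Hypothetical data: each person and the people they follow
follows = {"alice": ["bob", "carol"], "dave": []}


def match(people):
    # MATCH: a row is produced only when the pattern is found.
    return [(p, q) for p in people for q in follows[p]]


def optional_match(people):
    # OPTIONAL MATCH: keep the row and use None (null) when nothing matches.
    rows = []
    for p in people:
        targets = follows[p] or [None]
        rows.extend((p, q) for q in targets)
    return rows


strict_rows = match(["alice", "dave"])
optional_rows = optional_match(["alice", "dave"])
```

In this model "dave" disappears from the MATCH result but is kept, paired with None, in the OPTIONAL MATCH result.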

WHERE

Filter by property

MATCH (node)
WHERE node.property = value
RETURN node

WHERE Clause with Multiple Conditions

MATCH (node)  
WHERE node.property1 = value1 AND node.property2 = value2 
RETURN node

Using Relationship with Where Clause

MATCH (node)
WHERE (node)-[:Relationship_Of]->({properties}) 
RETURN node

String matching

MATCH (n) WHERE n.property STARTS WITH value RETURN n
MATCH (n) WHERE n.property ENDS WITH value RETURN n
MATCH (n) WHERE n.property CONTAINS value RETURN n

In

MATCH (n) WHERE n.property IN [value1, value2[,...]] RETURN n

CREATE

Creating Nodes

Create a simple node

Node names in CQL are actually variables. You can assign a node (or a relationship) to a variable to refer to it in the rest of the Cypher query.

CREATE (node);

Create multiple nodes

CREATE (node1), (node2);

Create node with a label

CREATE (node:LabelOfNode)

Create a node with multiple labels

CREATE (node[:LabelOfNode1[:LabelOfNode2[...]]])

Create multiple nodes with multiple labels

CREATE (node1[:LabelOfNode1[:LabelOfNode2[...]]]), (node2[:LabelOfNode1[:LabelOfNode2[...]]])[,...];

Create node with Properties

CREATE (node:LabelOfNode {key1:value[, key2:value,[...]]})

RETURN newly created node

CREATE (node:LabelOfNode {properties}) RETURN node

Creating Relationships

Creating a new relationship

CREATE (node1)
CREATE (node2)
CREATE (node1)-[:Relationship_Of]->(node2)

Creating a Relationship Between the Existing Nodes

MATCH (node1:LabelOfNode1), (node2:LabelOfNode2)
WHERE node1.property1 = value1 AND node2.property2 = value2
CREATE (node1)-[:Relationship_Of]->(node2)

Creating a Relationship with Label and Properties

CREATE (node1)-[variable:Rel_Type {key1:value1, key2:value2[, ...]}]->(node2)

Creating a Complete Path

CREATE p = (node1 {properties})-[:Relationship_Of1]->(node2 {properties})-[:Relationship_Of2]->(node3 {properties})
RETURN p

RETURN

Returning Created Node

CREATE (node:LabelOfNode {properties})
RETURN node

Unique results

CREATE (node:LabelOfNode {properties})
RETURN DISTINCT node

Returning Multiple Nodes

CREATE (node1:LabelOfNode1 {properties}), (node2:LabelOfNode2 {properties})
RETURN node1, node2

Returning Relationships

CREATE (node1)-[r:Relationship_Of]->(node2) RETURN r

Returning Properties

MATCH (node:LabelOfNode {properties}) RETURN node.property

Returning All Elements

MATCH p = (node1)-[:Relationship_Of]-(node2) RETURN *

MATCH p = (node1)-[:Relationship_Of]->(node2) RETURN *

Returning a Variable With a Column Alias

MATCH (node:LabelOfNode {properties}) RETURN node.property AS alias

ORDER BY

Ordering By a Property

MATCH (n)  
RETURN n.property1, n.property2[,...]
ORDER BY n.property1

Ordering Nodes by Multiple Properties

MATCH (n)
RETURN n 
ORDER BY n.property1, n.property2

Ordering Nodes by Descending Order

MATCH (n)
RETURN n
ORDER BY n.property DESC

LIMIT

The LIMIT clause is used to limit the number of rows returned from the query or passed to other parts of a query.

MATCH (n) 
RETURN n 
ORDER BY n.property
LIMIT 1

Limit with expression

MATCH (n) 
RETURN n 
ORDER BY n.property
LIMIT toInteger(3 * rand()) + 1

SKIP

The SKIP clause is used to define from which row to start including the rows in the output.

Skipping the first 3 nodes

MATCH (n)  
RETURN n.property1, n.property2 
ORDER BY n.property1 DESC 
SKIP 3

Skip Using Expression

MATCH (n)  
RETURN n.property1, n.property2 
ORDER BY n.property1 DESC 
SKIP toInteger(3 * rand()) + 1
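ORDER BY, SKIP and LIMIT together behave much like sorting a list and then slicing it. A small Python sketch with made-up values (the function name page is hypothetical):

```python
rows = [5, 3, 9, 1, 7]  # hypothetical values of n.property1


def page(rows, skip, limit):
    ordered = sorted(rows, reverse=True)  # ORDER BY ... DESC
    return ordered[skip:skip + limit]     # SKIP skip LIMIT limit


first_page = page(rows, 0, 2)
second_page = page(rows, 2, 2)
```

This is the usual way to page through a result: keep the ordering fixed and move the SKIP offset forward by the page size.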

WITH

The WITH clause chains query parts together: it passes the results of one part on as the input of the next, so that you can filter, sort or aggregate intermediate results before the query continues.

MATCH (n)
WITH n
ORDER BY n.property
RETURN collect(n.property)

UNWIND

The UNWIND clause is used to unwind a list into a sequence of rows.

UNWIND [a, b, c, d] AS x 
RETURN x
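In Python terms, UNWIND behaves like a nested loop that emits one row per list element. The sketch below uses hypothetical rows stored as dictionaries; the function name unwind is made up for the example.

```python
def unwind(rows, key, alias):
    """Expand the list stored under `key` into one row per element."""
    return [{**row, alias: item} for row in rows for item in row[key]]


rows = [{"name": "a", "tags": [1, 2]}, {"name": "b", "tags": []}]
result = unwind(rows, "tags", "x")
```

A row whose list is empty produces no output rows at all, which matches how UNWIND treats an empty list.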

MERGE

The MERGE command is a combination of the CREATE command and the MATCH command. It ensures that a pattern exists in the graph: either the pattern already exists, or it is created.

Merging a Node with a Label

MERGE (node:LabelOfNode) RETURN node

Merging a Node with Properties

MERGE (node:LabelOfNode {key1:value[, key2:value, key3:value[...]]})

OnCreate and OnMatch

Whenever we execute a MERGE query, a node is either matched or created. Using ON CREATE and ON MATCH, you can set properties that indicate whether the node was created or matched.

MERGE (node:LabelOfNode {properties}) 
ON CREATE SET node.isCreated ="true" 
ON MATCH SET node.isFound ="true"
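MERGE with ON CREATE and ON MATCH is essentially a get-or-create operation. The Python sketch below models it on a hypothetical in-memory list of nodes. It is a simplification: it stops at the first matching node, whereas a real MERGE binds every node that matches the pattern.

```python
nodes = []  # hypothetical in-memory node store: a list of property dicts


def merge(nodes, label, props):
    """Return (node, created) for the node matching label and props."""
    for node in nodes:
        if node["label"] == label and all(node.get(k) == v for k, v in props.items()):
            node["isFound"] = "true"  # ON MATCH SET
            return node, False
    node = {"label": label, **props, "isCreated": "true"}  # ON CREATE SET
    nodes.append(node)
    return node, True


first, created_first = merge(nodes, "Person", {"name": "Ann"})
second, created_second = merge(nodes, "Person", {"name": "Ann"})
```

Running the same merge twice leaves a single node in the store: the first call creates it and the second call only matches it.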

Merge a Relationship

MATCH (node1:LabelOfNode1), (node2:LabelOfNode2)
WHERE node1.property1 = value1 AND node2.property2 = value2
MERGE (node1)-[:Relationship_Of]->(node2)
RETURN node1, node2

DELETE

Nodes cannot be deleted if they still have relationships attached to them. Use DETACH DELETE to delete a node together with its relationships.

Deleting All Nodes and Relationships

MATCH (node) DETACH DELETE node

Deleting a Particular Node

MATCH (node:LabelOfNode)
WHERE node.property = value 
DETACH DELETE node
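The difference between a plain DELETE and DETACH DELETE can be sketched with a tiny in-memory graph in Python. The Graph class and its method names are hypothetical and only model the rule described above.

```python
class Graph:
    def __init__(self):
        self.nodes = set()
        self.rels = set()  # (start, type, end) tuples

    def delete(self, node):
        # Plain DELETE: refuse while relationships are still attached.
        if any(node in (start, end) for start, _, end in self.rels):
            raise ValueError(f"{node} still has relationships")
        self.nodes.discard(node)

    def detach_delete(self, node):
        # DETACH DELETE: drop the attached relationships first, then the node.
        self.rels = {(s, t, e) for s, t, e in self.rels if node not in (s, e)}
        self.nodes.discard(node)


g = Graph()
g.nodes.update({"a", "b"})
g.rels.add(("a", "KNOWS", "b"))
```

Calling g.delete("a") raises an error here, while g.detach_delete("a") removes both the KNOWS relationship and the node.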

REMOVE

Removing a Property

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 AND node.property2 = value2 
REMOVE node.property3
RETURN node

Removing Label From a Node

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 AND node.property2 = value2 
REMOVE node:LabelOfNode1[:LabelOfNode2[:LabelOfNode3...]]
RETURN node

SET

Using the SET clause, you can add new properties to an existing node or relationship, and also update the values of existing properties.

Setting a Property

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 [AND node.property2 = value2[...]]
SET node.property = value
RETURN node

Setting Multiple Properties

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 [AND node.property2 = value2[...]]
SET node.property3 = value3, node.property4 = value4
RETURN node

Setting a Label on a Node

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 [AND node.property2 = value2[...]]
SET node:LabelOfNode
RETURN node

Setting Multiple Labels on a Node

MATCH (node:LabelOfNode)
WHERE node.property1 = value1 [AND node.property2 = value2[...]]
SET node:LabelOfNode1:LabelOfNode2
RETURN node

FOREACH

The FOREACH clause is used to update data within a list, whether the list holds the components of a path or the result of an aggregation.

MATCH p = (node1)-[*]->(node2) 
WHERE node1.property1 = value1 AND node2.property2 = value2 
FOREACH (n IN nodes(p)| SET n.marked = TRUE)

Index

A database index is a redundant copy of some of the data in the database for the purpose of making searches of related data more efficient. This comes at the cost of additional storage space and slower writes.

Creating an Index

CREATE INDEX ON :LabelOfNode(property[, property...])

Deleting an Index

DROP INDEX ON :LabelOfNode(property[, property...])
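The trade-off described at the start of this section can be illustrated in Python: an index is a redundant map from property value to nodes that makes lookups cheap but has to be maintained on every write. The data and names below are made up, and the sketch says nothing about how Neo4j implements its indexes.

```python
from collections import defaultdict

# Hypothetical data: 1000 nodes sharing 100 distinct names
nodes = [{"id": i, "name": f"user{i % 100}"} for i in range(1000)]


def find_scan(name):
    # Without an index: every lookup scans all nodes.
    return [n["id"] for n in nodes if n["name"] == name]


# With an index: a redundant name -> ids map, updated on every write.
name_index = defaultdict(list)
for n in nodes:
    name_index[n["name"]].append(n["id"])


def find_indexed(name):
    return name_index.get(name, [])


scan_hits = find_scan("user7")
index_hits = find_indexed("user7")
```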

Constraint

Constraints enforce data integrity. They can be applied to either nodes or relationships.

Unique node property constraints

A unique property constraint is a rule that ensures that property values are unique for all nodes with a specific label.

CREATE CONSTRAINT ON (n:LabelOfNode) ASSERT n.property IS UNIQUE
DROP CONSTRAINT ON (n:LabelOfNode) ASSERT n.property IS UNIQUE
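What a unique property constraint checks can be modelled in Python as follows. The class UniqueConstraint and the email property are hypothetical. The sketch assumes, as in Neo4j, that nodes lacking the property are not restricted by the constraint.

```python
class UniqueConstraint:
    """Rejects a second node with the same value for the constrained property."""

    def __init__(self, prop):
        self.prop = prop
        self.seen = set()

    def check_and_add(self, node):
        value = node.get(self.prop)
        if value is None:
            return  # nodes without the property are not restricted
        if value in self.seen:
            raise ValueError(f"duplicate {self.prop}: {value!r}")
        self.seen.add(value)


constraint = UniqueConstraint("email")
constraint.check_and_add({"email": "a@example.org"})
constraint.check_and_add({"name": "no email"})
```

Adding a second node with the email a@example.org would now raise an error.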

Node property existence constraints

CREATE CONSTRAINT ON (n:LabelOfNode) ASSERT exists(n.property)
DROP CONSTRAINT ON (n:LabelOfNode) ASSERT exists(n.property)

Relationship property existence constraints

CREATE CONSTRAINT ON ()-[r:Relationship_Of]-() ASSERT exists(r.property)
DROP CONSTRAINT ON ()-[r:Relationship_Of]-() ASSERT exists(r.property)

String Functions

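UPPER

TOUPPER returns the original string in uppercase. In current Cypher the function is named toUpper.

TOUPPER(original)

LOWER

TOLOWER returns the original string in lowercase. In current Cypher the function is named toLower.

TOLOWER(original)

SUBSTRING

SUBSTRING returns a substring of the original string, beginning at the zero-based index start and with the given length.

SUBSTRING(original, start [, length])

Replace

REPLACE returns a string in which all occurrences of search in the original string have been replaced by replace.

REPLACE(original, search, replace)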

Aggregation Function

COUNT

The COUNT function is used to count the number of rows.

MATCH (node) RETURN COUNT(*)

The following statement does not count null values.

MATCH (node) RETURN COUNT(node)

Group Count

COUNT can also be used to count relationships grouped by type. We can count in both directions or in one direction only.

MATCH (node1)-[r]-(node2)
RETURN type(r), count(*)

MATCH (node1)-[r]->(node2)
RETURN type(r), count(*)
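The three flavours of COUNT shown above can be mimicked in Python on a hypothetical list of result rows, with None standing in for null:

```python
from collections import Counter

# Hypothetical result rows; None models a null value
rows = [
    {"n": "a", "type": "KNOWS"},
    {"n": None, "type": "KNOWS"},
    {"n": "c", "type": "LIKES"},
]

count_star = len(rows)                                # COUNT(*): counts every row
count_n = sum(1 for r in rows if r["n"] is not None)  # COUNT(n): skips nulls
by_type = Counter(r["type"] for r in rows)            # type(r), count(*): grouped
```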

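MAX

MAX returns the maximum value in a set of values.

MAX(expression)

MIN

MIN returns the minimum value in a set of values.

MIN(expression)

SUM

SUM returns the sum of a set of numeric values.

SUM(expression)

AVG

AVG returns the average of a set of numeric values.

AVG(expression)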

List Functions

EXTRACT

EXTRACT returns a list containing the values that result from applying an expression to each element of a list.

EXTRACT(variable IN list | expression)

FILTER

FILTER returns a list containing all the elements of a list that satisfy the given predicate.

FILTER(variable IN list WHERE predicate)

KEYS

KEYS returns a list containing the string representations for all the property names of a node, relationship, or map.

KEYS(expression)

LABELS

LABELS returns a list containing the string representations for all the labels of a node.

LABELS(node)

NODES

NODES returns a list containing all the nodes in a path.

NODES(path)

RANGE

RANGE returns a list of all integer values from start to end, both inclusive, where the difference between any two consecutive values is the constant step (1 by default).

RANGE(start, end [, step])

REDUCE

REDUCE returns the value resulting from the application of an expression on each successive element in a list in conjunction with the result of the computation thus far.

REDUCE(accumulator = initial, variable IN list | expression)

RELATIONSHIPS

RELATIONSHIPS returns a list containing all the relationships in a path.

RELATIONSHIPS(path)

REVERSE

REVERSE returns a list in which the order of all elements in the original list has been reversed.

REVERSE(original)

TAIL

TAIL returns a list containing all the elements of a list except the first one.

TAIL(list)
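For readers who know Python, several of the list functions above have close equivalents. The mapping below is only an analogy on a made-up list; cypher_range is a hypothetical helper that handles positive steps only.

```python
from functools import reduce

values = [1, 2, 3, 4]

extracted = [x * 10 for x in values]               # EXTRACT(x IN values | x * 10)
filtered = [x for x in values if x % 2 == 0]       # FILTER(x IN values WHERE x % 2 = 0)
total = reduce(lambda acc, x: acc + x, values, 0)  # REDUCE(acc = 0, x IN values | acc + x)
tail = values[1:]                                  # TAIL(values)
reversed_values = values[::-1]                     # REVERSE(values)


def cypher_range(start, end, step=1):
    # Cypher's RANGE includes the end value, unlike Python's range().
    return list(range(start, end + 1, step))
```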

Scalar functions

COALESCE

COALESCE returns the first non-null value in the given list of expressions.

COALESCE(expression [, expression]*)
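A Python equivalent of COALESCE, with None standing in for null, might look like this (the function name coalesce is simply chosen to mirror the Cypher one):

```python
def coalesce(*values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


nickname = coalesce(None, None, "unknown")
zero = coalesce(None, 0, 5)
```

Only null is skipped: values such as 0 or an empty string count as non-null and are returned.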

STARTNODE

STARTNODE returns the start node of a relationship.

STARTNODE(relationship)

ENDNODE

ENDNODE returns the end node of a relationship.

ENDNODE(relationship)

HEAD

HEAD returns the first element in a list.

HEAD(list)

LAST

LAST returns the last element in a list.

LAST(expression)

SIZE

SIZE returns the number of elements in a list or set of results or string.

SIZE(list or pattern expression or string)

LENGTH

LENGTH returns the length of a path.

LENGTH(path)

ID

ID returns the id of a relationship or node.

MATCH (n) WHERE ID(n)=id RETURN n

TYPE

TYPE returns the string representation of the relationship type.

TYPE(relationship)

PROPERTIES

PROPERTIES returns a map containing all the properties of a node or relationship.

PROPERTIES(expression)

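RANDOMUUID

RANDOMUUID returns a randomly generated Universally Unique Identifier (UUID) as a string.

RANDOMUUID()

TIMESTAMP

TIMESTAMP returns the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC.

TIMESTAMP()

TOBOOLEAN

TOBOOLEAN converts a string value to a boolean value.

TOBOOLEAN(expression)

TOFLOAT

TOFLOAT converts an integer or string value to a floating point number.

TOFLOAT(expression)

TOINTEGER

TOINTEGER converts a floating point or string value to an integer value.

TOINTEGER(expression)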

Procedures

Check which procedures are available

CALL dbms.procedures()

Some useful built-in procedures:

  • db.indexes(): List all indexes in the database.
  • db.labels(): List all labels in the database.
  • db.propertyKeys(): List all property keys in the database.
  • db.relationshipTypes(): List all relationship types in the database.
  • db.schema(): Show the schema of the data.
  • db.constraints(): List all constraints in the database.

Data Import

We can use LOAD CSV to import large amounts of data from CSV files in batches. Alternatively, we can use the Neo4j ETL Tool or the Neo4j Import tool.

LOAD CSV FROM "url" AS row FIELDTERMINATOR ";"
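To see what FIELDTERMINATOR ";" means for the rows, the same kind of file can be parsed with Python's csv module. The file contents below are made up. As with LOAD CSV without WITH HEADERS, each row arrives as a list of strings, so numeric fields still need converting.

```python
import csv
import io

data = "1;Alice;30\n2;Bob;25\n"  # hypothetical file contents

rows = list(csv.reader(io.StringIO(data), delimiter=";"))
ages = [int(row[2]) for row in rows]  # fields are strings until converted
```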

Query management

Profiling a query

Explain

If you want to see the execution plan but not run the statement, prefix the statement with EXPLAIN.

PROFILE

If you want to run the statement and see which operators are doing most of the work, use PROFILE. This will run your statement and keep track of how many rows pass through each operator, and how much each operator needs to interact with the storage layer to retrieve the necessary data.

Kill Queries

When a query is taking too long to execute on your system, you can execute

CALL dbms.listQueries()

which returns the list of queries that are currently running. You can then kill the query by its id:

CALL dbms.killQuery('query-id')

UIs

Neo4j Desktop

In Neo4j Desktop, you can only run a single database at a time.

Plugin

If you need additional functionality from a specialized library, you can add the library as a plugin for your project.

Neo4j Browser

You can only use Neo4j Browser to connect to a running Neo4j instance, and you can only connect to a single database at a time.

Browser Sync

When you log in to Browser Sync from your Neo4j Browser, you can add folders and Cypher scripts in the cloud for use in a different Neo4j Browser session.

Neo4j Bloom

Neo4j Sandbox

It is a temporary Neo4j instance in the cloud where you can access a database for 3 to 10 days. By default it will also install APOC, Graph Algorithms and GraphQL.

Neo4j Cluster

Neo4j clusters make your application highly available, so that if a server goes down another server takes over, and highly scalable, so that applications that read the data can be distributed across many places.