Design of Formal Query Languages and Schemas for Graph Databases

aut.embargoNoen_NZ
aut.thirdpc.containsNoen_NZ
dc.contributor.advisorSinha, Roopak
dc.contributor.advisorLitchfield, Alan
dc.contributor.advisorJohnson, Kenneth
dc.contributor.authorSharma, Chandan
dc.date.accessioned2021-11-21T23:31:58Z
dc.date.available2021-11-21T23:31:58Z
dc.date.copyright2021
dc.date.issued2021
dc.date.updated2021-11-19T10:05:35Z
dc.description.abstractThe industry-wide adoption of graph databases has been hindered due to the lack of a standard query language. Hence projects such as ISO/IEC 39075 have been proposed to integrate features from existing graph query languages, including Cypher, PGQL and G-Core. Integrating existing query languages requires a systematic comparison so that exclusive characteristics can be identified. Comparisons can be conducted by using graph query language benchmarks which are built on theoretical language formalisms. Literature suggests that existing theoretical language formalisms for graph databases are not expressive enough in formulating different data retrieval queries. Existing benchmarks also utilize the topological information stored in the graph schema to formulate queries for comparing graph query languages. Contemporary graph databases including Neo4j, Oracle and, Apache Tinkerpop, are either schema-less or schema optional to support frequent changes in the structure of data found in domains requiring high flexibility. However, the absence of robust graph schema impacts data consistency, integrity, and analytics in graph databases. Lack of expressive theoretical language formalisms and robustly-defined graph schemas are the two open problems in current graph database research. This thesis contributes towards solving these problems. We propose novel formalisms of conjunctive and union of conjunctive queries extended with Tarski’s algebra that are more expressive than existing theoretical language formalisms. The formalisms are then used to formulate benchmark queries for comparing the expressiveness of Cypher and PGQL on the two core features of graph pattern matching and graph navigation, revealing the standard and exclusive characteristics for these languages. We present a formal algebra FLASc that assists in formulating robust graph schemas. We consider three case studies related to domains such as cyber-physical systems, big data analytics and tourism. These case studies illustrate the use of FLASc for transforming and loading data-sets for heterogeneous sources into graph databases such as Neo4j, thereby ensuring data consistency and integrity. Findings from this research suggest that formally defined graph schemas help generate efficient benchmark queries, facilitating the comparison of existing graph query languages. Furthermore, graph schemas are vital for ensuring better data manageability and developing future graph query languages to support data definition and data retrieval mechanisms from graph databases. Overall, our study serves as a formal basis for generating robust graph schemas and developing benchmarks for comparing existing graph query languages. This study assists in moving towards query language integration and interoperability between available graph database technologies; therefore, it serves as a basis for upcoming standards such as ISO/IEC 39075.en_NZ
dc.identifier.urihttps://hdl.handle.net/10292/14691
dc.language.isoenen_NZ
dc.publisherAuckland University of Technology
dc.rights.accessrightsOpenAccess
dc.subjectFormal query langaugesen_NZ
dc.subjectGraph schemasen_NZ
dc.subjectGraph databasesen_NZ
dc.subjectFormal algebraen_NZ
dc.titleDesign of Formal Query Languages and Schemas for Graph Databasesen_NZ
dc.typeThesisen_NZ
thesis.degree.grantorAuckland University of Technology
thesis.degree.levelDoctoral Theses
thesis.degree.nameDoctor of Philosophyen_NZ
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SharmaC.pdf
Size:
3.06 MB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
889 B
Format:
Item-specific license agreed upon to submission
Description:
Collections