Graph databases are playing a growing role in improving fraud detection, recommendation engines, lead prioritization, digital twins and old-fashioned analytics. But they suffer from performance, scalability and reliability issues compared to traditional databases. Emerging graph database benchmarks are already helping to overcome these hurdles.
For example, TigerGraph recently used these benchmarks to scale its database to support 30 terabytes (TB) of graph data, up from 1 TB in 2019 and 5 TB in 2020. David Ronald, director of product marketing at TigerGraph, told VentureBeat that the company uses the LDBC benchmarks to test its engine performance and storage footprint after each release. If it sees a degradation, the results help it figure out where to look for problems. The TigerGraph team also collaborates with hardware vendors to run the benchmarks on their hardware.
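The per-release check described above can be sketched as a comparison of benchmark metrics between two releases. This is a minimal illustration, not TigerGraph's actual tooling: the function name `find_regressions`, the metric names, the 5% tolerance and all numbers are hypothetical, and the sketch assumes lower values are better.

```python
def find_regressions(baseline, current, tolerance=0.05):
    """Return metrics that got worse by more than `tolerance` (relative).

    Assumes every metric is "lower is better", e.g. query latency or
    storage footprint. All names and numbers here are hypothetical.
    """
    regressions = {}
    for name, old_value in baseline.items():
        new_value = current.get(name)
        if new_value is None or old_value <= 0:
            continue  # metric missing in the new run, or unusable baseline
        relative_change = (new_value - old_value) / old_value
        if relative_change > tolerance:
            regressions[name] = relative_change
    return regressions


# Illustrative numbers only, not real benchmark results.
baseline = {"query_latency_ms": 120.0, "storage_gb": 400.0}
current = {"query_latency_ms": 150.0, "storage_gb": 404.0}
flagged = find_regressions(baseline, current)
```

Here only the latency metric is flagged, because storage grew by 1%, which is inside the tolerance.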
This is important, particularly as enterprises look for ways to operationalize the data currently sitting in databases, data warehouses and data lakes. That data represents entities, known as vertices, and the connections between them, known as edges. “With the ongoing digital transformation, more and more enterprises have hundreds of billions of vertices and hundreds of billions of edges,” Ronald said.
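To make the terminology concrete, here is a small sketch of vertices and edges held in an adjacency structure. The entity names and the helper `build_graph` are invented for illustration, and production graph databases use far more elaborate storage layouts.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (vertex, vertex) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Hypothetical data: two customers connected to accounts.
edges = [("alice", "acct_1"), ("bob", "acct_1"), ("bob", "acct_2")]
graph = build_graph(edges)

vertex_count = len(graph)
# Each undirected edge appears in two adjacency sets, so halve the total.
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
```

A question such as "which customers share an account?" then becomes a lookup of a vertex's neighbors, for example `graph["acct_1"]`.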
Dawn of the graph benchmark
The European Union tasked researchers with creating the Linked Data Benchmark Council (LDBC) to evaluate how graph databases perform on the tasks required to address these limitations. The benchmarks help graph database vendors identify weaknesses in their current architectures and in how they implement queries, and pinpoint problems in scaling to solve common business problems. They can also help enterprises test how the databases they are considering perform on common business problems.
Peter Boncz, a professor at Vrije Universiteit and founder of LDBC, told VentureBeat that these benchmarks help systems achieve and maintain performance. LDBC members include leading graph database vendors such as TigerGraph, Neo4j, Oracle, AWS and Ant Group. These companies continue to use the benchmarks as an internal test for their systems. The benchmarks also point to difficult areas, such as finding paths in graphs, pattern matching in graphs, join ordering and query optimization. “In order to perform well on these benchmarks, systems need to at least adopt the state of the art in these areas, if not extend it,” Boncz said.
Boncz also sees various other benefits from the LDBC collaboration. For example, it has helped drive standardization of the graph data model and of query languages. This standardization simplifies the definition of benchmarks, is valuable to users and accelerates the maturity of the field. LDBC members have also ventured beyond benchmarking to launch task forces on a graph schema language and a graph query language. LDBC has also begun collaborating with the ISO working group on SQL standards. As a result of these efforts, Boncz expects the updated SQL:2023 standard to include graph query functionality (SQL/PGQ, for Property Graph Queries), along with the release of a completely new standard graph query language called GQL.
Types of benchmarks
LDBC has developed three types of benchmarks for different use cases:
The Social Network Benchmark (SNB) suite applies most directly to general enterprise use cases. It targets general graph database management systems and supports both interactive and business intelligence workloads. It mimics the kinds of analytics enterprises run with fraud detection, product recommendation and lead generation algorithms. The largest SNB dataset, at scale factor 30k, contains 36 TB of data with 72.6 billion vertices and 533.5 billion edges.
The Graphalytics benchmark is an industrial-grade benchmark for graph analysis. It can test datasets with 100 million vertices and 9.4 billion edges, and is well suited to measuring classic graph algorithms such as PageRank and community detection. The machine learning and AI community is adopting these algorithms to improve model accuracy.
The Semantic Publishing Benchmark uses an older web data schema called RDF. It is based on a use case from the BBC, an early adopter of RDF. “Most graph system growth has revolved around property graph data models, not RDF,” Boncz said. As a result, the SNB, which targets property graph data, has gained significantly more attention.
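As an illustration of the kind of classic algorithm an analytics benchmark exercises, the following is a minimal PageRank sketch using plain power iteration. It is not taken from the LDBC specification: the function name, the damping factor of 0.85, the fixed iteration count and the toy graph are all assumptions chosen for readability.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    `out_links` maps each node to the list of nodes it links to.
    Rank held by nodes with no outgoing links is spread evenly
    over all nodes, so the scores always sum to 1.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph, for illustration only.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

In this toy graph, node `c` ends up ranked above node `b` because it receives links from both `a` and `b`. A benchmark run would execute the same kind of computation over billions of edges and measure runtime and resource use.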
Plan for real-world use cases
Graph database benchmarks are an excellent tool for helping vendors improve their products and for helping enterprises evaluate the veracity of vendor claims using apples-to-apples comparisons. “But raw performance doesn’t tell the whole story of any technology, especially in the granular world of graph databases,” said Greg Seat, VP of product at blockchain graph database Fluree.
For example, small to medium enterprises do not need to regularly process millions of graph structures, called triples, every second. They may see more benefit from advanced value-added features such as a transactional blockchain, level-2 off-chain storage, data rejection, interoperability, standards support, provenance and time-travel query capabilities, which require more processing than plain graph databases or other NoSQL stores.
As long as the performance of the graph storage platform is right-sized for the enterprise and its capabilities also meet the enterprise's needs, performance beyond a certain point is less crucial than fit. “Not every graph database needs to be a Formula One race car,” Seat said. “There are many industry needs and domain use cases that are better served by trucks and panel vans with features and functionality to support the required enterprise operations.”
Preparing for graph data
Machine learning and database benchmarks have played a tremendous role in shaping those tools. Graph database experts hope that better benchmarks can play a similar role in the evolution of graph databases. Ronald sees the need for more vertical-specific graph database benchmarks. For example, there are many interesting query patterns in the financial sector that are not captured by the LDBC SNB benchmark. “We hope that there will be more benchmark studies in the future, as this will result in more awareness about the related features of various graph databases and faster adoption of graph technology,” he said.
Boncz wants to see more audited benchmark results for the existing Social Network Benchmark. LDBC has already seen interesting results for the interactive workload and is now finalizing a second benchmark for the business intelligence workload. Boncz suggested that interested parties look into the upcoming LDBC Technical User Community meeting, which coincides with the ACM SIGMOD 2022 conference in Philadelphia. “This event is the perfect place to give feedback on the benchmarks and learn about new trends,” he said.