StreamNative releases report with insights into data streaming ecosystem


The appeal of real-time data processing is growing. Historically, organizations adopting the streaming data paradigm have been driven by use cases such as application monitoring, log aggregation, and data transformation (ETL).

Organizations like Netflix were early adopters of the streaming data paradigm. Today, there are more and more drivers for adoption. According to Lightbend’s 2019 survey, Streaming Data and the Future Tech Stack, artificial intelligence (AI) and machine learning (ML), the integration of multiple data streams, and analytics are beginning to rival these historic use cases.

The streaming analytics market (which could be considered just one segment of the streaming data market) is projected to grow from $15.4 billion in 2021 to $50.1 billion in 2026, a compound annual growth rate (CAGR) of 26.5% over the forecast period, according to MarketsandMarkets.
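As a quick sanity check on that forecast, CAGR ties a start value to an end value through end = start × (1 + r)^n, where n is the number of years. The sketch below uses only the figures quoted above (2021 to 2026 is five years); the helper names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from start to end over years."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding start at a constant annual rate for the given years."""
    return start * (1 + rate) ** years


# Figures quoted in the forecast, in billions of US dollars.
print(f"implied CAGR: {cagr(15.4, 50.1, 5):.1%}")          # implied CAGR: 26.6%
print(f"2026 at 26.5%: ${project(15.4, 0.265, 5):.1f}B")   # 2026 at 26.5%: $49.9B
```

The implied rate lands within about a tenth of a percentage point of the quoted 26.5%, a gap small enough to be consistent with rounding in the published figures.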

Historically, there has been a de facto standard for streaming data: Apache Kafka. Kafka and Confluent, the company that commercializes it, have been a consistent success story, with Confluent confidentially filing for an IPO in 2021.

In 2018, more than 90% of respondents to Confluent’s survey rated Kafka as mission-critical to their data infrastructure, and Kafka questions on Stack Overflow grew by more than 50% over the year. However successful and widely adopted Kafka is, the fact remains that it dates back to 2008.

A number of streaming data alternatives, each with its own focus and approach, have emerged over the last few years. One of them is Apache Pulsar. In 2021, Pulsar ranked among the top 5 Apache Software Foundation projects and surpassed Apache Kafka in monthly active contributors.

StreamNative, a company founded by the original developers of Apache Pulsar and Apache BookKeeper, has just released a report comparing the performance of Apache Pulsar and Apache Kafka. StreamNative offers a fully managed Pulsar-as-a-service cloud and enables enterprises to “access data as real-time event streams”.

Pulsar vs. Kafka

StreamNative is not the first company to be founded around Pulsar. Streamlio, another company founded by Pulsar core committers, was acquired by Splunk in 2019. Today, two of Streamlio’s founders, Sijie Guo and Matteo Merli, serve as StreamNative’s CEO and CTO, respectively.

As Addison Higham, StreamNative’s chief architect and head of cloud engineering, explained, the company focuses on a bottom-up, community-driven approach, as well as on technology development, documentation and training. Pulsar is used by the likes of Tencent, Verizon, Intuit and Flipkart; the latter two are also StreamNative customers.

StreamNative grew significantly in 2021. It raised $23.7 million in Series A funding, grew its team from 30 to more than 60 people across North America, EMEA and Asia, and saw sixfold revenue growth and threefold growth in adoption. It also added AWS Marketplace integration, SQL support and other updates. Its community doubled, and Pulsar passed the 10,000-star mark on GitHub.

Higham said he is asked a lot how Pulsar compares to Kafka. The last widely published Pulsar vs. Kafka benchmark was run in 2020, and much has changed since then. That’s why StreamNative’s engineering team revisited the benchmark using the Linux Foundation’s OpenMessaging Benchmark.

According to StreamNative’s benchmark, Pulsar can achieve 2.5 times the maximum throughput of Kafka. Pulsar also delivers consistent single-digit-millisecond publish latency, which is 100 times lower than Kafka’s at P99.99. Lower publish latency matters because it lets the system get messages onto the message bus quickly.
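A P99.99 latency is the value that 99.99% of publish operations complete at or below, so it describes the extreme tail rather than the typical case. The sketch below uses the nearest-rank definition of a percentile; benchmarking tools may compute percentiles differently, and the latency samples are hypothetical, not data from the report.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical publish latencies in ms: 9,990 fast sends and 10 slow outliers.
latencies = [2.0] * 9_990 + [250.0] * 10
print(percentile(latencies, 50))     # 2.0
print(percentile(latencies, 99))     # 2.0
print(percentile(latencies, 99.99))  # 250.0
```

Here the median and P99 both look excellent, while P99.99 exposes the rare slow sends; this is why tail percentiles are the usual yardstick for claims about consistent latency.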

With a historical read rate 1.5 times faster than Kafka’s, applications that use Pulsar as their messaging system could catch up after an unexpected interruption in half the time. That said, benchmarks in general, and vendor benchmarks in particular, should be viewed as indicative rather than conclusive.
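Why a faster historical read rate shortens recovery is easier to see with a simple model: after an outage, a consumer drains its backlog only at the rate by which reads outpace ongoing writes. The sketch below assumes constant rates and uses hypothetical numbers (a 30-minute outage, 100k msg/s of writes); it is an illustration, not the report’s methodology.

```python
def catch_up_seconds(backlog_msgs: float, read_rate: float, write_rate: float) -> float:
    """Time for a consumer to drain a backlog while producers keep writing."""
    surplus = read_rate - write_rate  # net msgs/s by which the backlog shrinks
    if surplus <= 0:
        return float("inf")  # the consumer never catches up
    return backlog_msgs / surplus


# Hypothetical scenario: producers keep writing 100k msg/s through a 30-minute outage.
write = 100_000
backlog = write * 30 * 60  # messages accumulated while the consumer was down

baseline = catch_up_seconds(backlog, read_rate=200_000, write_rate=write)
faster = catch_up_seconds(backlog, read_rate=1.5 * 200_000, write_rate=write)
print(baseline / 60, faster / 60)  # 30.0 15.0 (minutes)
```

Because only the surplus over the write rate drains the backlog, a 1.5x read rate can cut catch-up time by more than a third; in this particular scenario it halves it.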

Furthermore, as StreamNative itself notes, the report focuses on comparing technical performance. While obviously important, performance is not the only consideration when evaluating options, as Higham acknowledged. Many third parties have also published Pulsar vs. Kafka comparisons.

In many situations, Pulsar and Kafka can perform similarly, Higham said. Where StreamNative tries to differentiate with Pulsar is in the areas of management and developer experience.

Pulsar’s architecture and positioning

Higham pointed to Pulsar’s heritage as a messaging-oriented platform that was later extended to address streaming and events. This is reflected in Pulsar’s API, which Higham thinks makes it easy for developers to adopt. While Pulsar is not natively compatible with Kafka, a feature called protocol handlers lets it speak other systems’ APIs, with the Kafka implementation as the flagship example.
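The protocol handler idea can be pictured as a broker that loads plug-ins keyed by protocol name and routes each incoming request to the matching plug-in, which translates it into native operations. Pulsar’s actual plug-in interface lives in its Java broker; the Python classes below (`ProtocolHandler`, `KafkaHandler`, `Broker`) are a hypothetical, simplified model, and mapping Kafka topics into `public/default` is an assumption made for illustration.

```python
from abc import ABC, abstractmethod


class ProtocolHandler(ABC):
    """Hypothetical stand-in for a broker plug-in that speaks a foreign wire protocol."""

    @abstractmethod
    def protocol_name(self) -> str: ...

    @abstractmethod
    def handle(self, request: dict) -> dict: ...


class KafkaHandler(ProtocolHandler):
    def protocol_name(self) -> str:
        return "kafka"

    def handle(self, request: dict) -> dict:
        # Translate a Kafka-style produce request into a native publish
        # (assumed mapping: Kafka topics land in the public/default namespace).
        return {"publish_to": f"persistent://public/default/{request['topic']}",
                "payload": request["value"]}


class Broker:
    def __init__(self, handlers: list[ProtocolHandler]) -> None:
        # Index loaded handlers by the protocol each one claims to speak.
        self._handlers = {h.protocol_name(): h for h in handlers}

    def dispatch(self, protocol: str, request: dict) -> dict:
        handler = self._handlers.get(protocol)
        if handler is None:
            raise ValueError(f"no protocol handler loaded for {protocol!r}")
        return handler.handle(request)


broker = Broker([KafkaHandler()])
print(broker.dispatch("kafka", {"topic": "orders", "value": b"hello"}))
```

Supporting another protocol such as MQTT or AMQP would mean registering one more handler class; the broker’s storage and routing stay unchanged.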

Higham said he regularly talks with companies that use Kafka and finds that they have a huge sprawl of hundreds or even thousands of Kafka clusters, roughly one per application, which is not very cost-effective. Pulsar’s built-in multi-tenancy is designed to share workloads securely and is extremely valuable at scale, Higham added, while also highlighting features such as geo-replication.
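Pulsar expresses multi-tenancy in its topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://…`), and a short name such as `my-topic` expands to `persistent://public/default/my-topic`. The parser below is a minimal sketch of that hierarchy; it handles only the three-level form and the short form, ignoring legacy formats and partition suffixes, and the `TopicName`/`parse_topic` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # top-level isolation unit, e.g. one team or customer
    namespace: str  # administrative grouping within a tenant
    topic: str


def parse_topic(name: str) -> TopicName:
    """Parse a Pulsar-style topic name; short names expand to public/default."""
    if "://" in name:
        domain, rest = name.split("://", 1)
    else:
        domain, rest = "persistent", name
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain: {domain!r}")
    parts = rest.split("/")
    if len(parts) == 1:
        tenant, namespace, topic = "public", "default", parts[0]
    elif len(parts) == 3:
        tenant, namespace, topic = parts
    else:
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    return TopicName(domain, tenant, namespace, topic)


# Two teams can both own an "orders" topic on one shared cluster.
print(parse_topic("persistent://payments/prod/orders"))
print(parse_topic("analytics/etl/orders"))
```

Because every topic sits under a tenant and namespace, one cluster can host many applications side by side instead of dedicating a cluster to each.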

Pulsar also offers SQL access to streaming data through Trino, as well as data transformation through Pulsar Functions written in languages like Go, Java and Python. The latest version of Pulsar is 2.9.1; when version 2.8 was released, the Pulsar team published a technical blog post detailing Pulsar’s architecture, and we refer interested readers there.
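To give a flavor of Pulsar Functions in Python, Pulsar’s documentation describes a “language-native” interface in which a plain `process` function receives each input message and returns the value to publish to the output topic. The sketch below assumes that interface; the JSON schema (`sensor`, `celsius`) and the Fahrenheit enrichment are hypothetical examples.

```python
import json


def process(input):
    """Transform one message: parse JSON, add a derived field, re-serialize.

    Hypothetical input schema: '{"sensor": "s1", "celsius": 21.5}'.
    The returned string is what would be published to the output topic.
    """
    record = json.loads(input)
    record["fahrenheit"] = record["celsius"] * 9 / 5 + 32
    return json.dumps(record)


print(process('{"sensor": "s1", "celsius": 100}'))
# {"sensor": "s1", "celsius": 100, "fahrenheit": 212.0}
```

Keeping the function free of Pulsar imports makes it easy to unit-test locally before deploying it.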

StreamNative claims that its protocol handler framework provides not only a clear migration path from Kafka, but also integration with other systems and protocols such as RocketMQ, AMQP and MQTT. Higham noted that, given the emphasis on Kafka API support, this capability is coming soon to StreamNative Cloud.

StreamNative Cloud is StreamNative’s main revenue driver. Beyond running Apache Pulsar as a managed service, the offering adds value through security and integration features, including integrations with platforms such as Flink, Spark and Delta Lake.

Asked how Pulsar compares with other offerings in the space, such as Apache Flink or Spark Streaming, Higham said Pulsar is not really trying to build something similar to those stream processing engines.

What they are focusing on instead is “a great integration story: best-of-breed connectors that are very flexible and easy to use, plus easy single-message transformations for 80% of use cases,” Higham said. Pulsar bears more resemblance to Redpanda, as both aim to solve some of Kafka’s pain points, but some of those pain points lie not only in the implementation but also in the underlying protocol, Higham claims.
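“Single-message transformations” (a term also used in Kafka Connect) are stateless edits applied to each record as it passes through a connector, such as dropping a field, renaming one, or filtering records out. The chain below is a hypothetical Python sketch of that idea, not Pulsar’s connector API; all function names are illustrative.

```python
from typing import Callable, Optional

Record = dict
Transform = Callable[[Record], Optional[Record]]


def drop_field(name: str) -> Transform:
    return lambda r: {k: v for k, v in r.items() if k != name}


def rename_field(old: str, new: str) -> Transform:
    return lambda r: {(new if k == old else k): v for k, v in r.items()}


def filter_where(pred: Callable[[Record], bool]) -> Transform:
    return lambda r: r if pred(r) else None  # None means "drop this message"


def apply_chain(record: Record, chain: list[Transform]) -> Optional[Record]:
    """Run each transform in order; stop early if one drops the message."""
    for transform in chain:
        record = transform(record)
        if record is None:
            return None
    return record


chain = [drop_field("ssn"), rename_field("amt", "amount"),
         filter_where(lambda r: r["amount"] > 0)]
print(apply_chain({"id": 1, "amt": 42, "ssn": "xxx"}, chain))  # {'id': 1, 'amount': 42}
print(apply_chain({"id": 2, "amt": 0, "ssn": "yyy"}, chain))   # None
```

Because each step looks at one record in isolation, such transforms need no state or windowing, which is what keeps them simple compared with a full stream processing engine.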

