Apache Spark

An open-source, in-memory engine for large-scale data processing. It supports data engineering, data science, and SQL workloads on a single node or across a cluster.

Added Perspectives

The Spark platform prepares the data in micro-batches to be consumed by the HDInsight data lake, SQL data warehouse, and various other internal and external subscribers. These targets subscribe to topics that are categorized by source tables. With this CDC-based architecture, StartupBackers is now efficiently supporting real-time analysis without affecting production operations.

- Kevin Petrie in Best Practices for Real Time Data Pipelines with Change Data Capture and Spark

August 8, 2018 (Blog)
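
As a rough illustration of the pattern described in the quotation above, the plain-Python sketch below cuts a stream of change events into micro-batches, groups each batch into topics named after the source table, and delivers each topic only to the targets that subscribe to it. It does not use Spark or any messaging system. The event shape, the `cdc.<table>` topic naming, the subscriber names, and the functions `micro_batches`, `route_by_table`, and `publish` are all assumptions made for illustration, not details taken from the cited article.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical CDC event shape: {"table": ..., "op": ..., "row": {...}}
ChangeEvent = Dict[str, object]


def micro_batches(events: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items from the stream."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def route_by_table(batch: List[ChangeEvent]) -> Dict[str, List[ChangeEvent]]:
    """Group one micro-batch into topics keyed by source table (assumed naming)."""
    topics: Dict[str, List[ChangeEvent]] = defaultdict(list)
    for event in batch:
        topics[f"cdc.{event['table']}"].append(event)
    return dict(topics)


def publish(
    events: Iterable[ChangeEvent],
    batch_size: int,
    subscriptions: Dict[str, Set[str]],
) -> Dict[str, List[ChangeEvent]]:
    """Deliver each topic's events to every subscriber of that topic.

    subscriptions maps a subscriber name to the set of topics it follows.
    Returns the events each subscriber received, in arrival order per batch.
    """
    delivered: Dict[str, List[ChangeEvent]] = {name: [] for name in subscriptions}
    for batch in micro_batches(events, batch_size):
        for topic, topic_events in route_by_table(batch).items():
            for name, topics in subscriptions.items():
                if topic in topics:
                    delivered[name].extend(topic_events)
    return delivered


if __name__ == "__main__":
    sample_events = [
        {"table": "orders", "op": "insert", "row": {"id": 1}},
        {"table": "customers", "op": "update", "row": {"id": 7}},
        {"table": "orders", "op": "delete", "row": {"id": 1}},
    ]
    sample_subscriptions = {
        "data_lake": {"cdc.orders", "cdc.customers"},
        "warehouse": {"cdc.orders"},
    }
    for name, received in publish(sample_events, 2, sample_subscriptions).items():
        print(name, [(e["table"], e["op"]) for e in received])
```

Because the changes are read from a captured stream rather than queried from the source tables, the production database does no extra work when a subscriber is added. In a real deployment, Spark's streaming APIs and a message broker would take the place of these in-memory functions.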

Relevant Content

Blog

Streaming, Spark, and Governance Top Themes at Strata+Hadoop World NYC

Oct 05, 2015 - The east coast confab for big data—otherwise known as Strata+Hadoop World in New York City—was abuzz with the digital literati who were treated to...

Report

Streaming-First Architectures: Building the Real-Time Organization

Jun 30, 2019 - Having the right data at the right time is essential for organizations to compete.

Related Terms

Enterprise Service Bus

Change Data Capture Replication
