Apache Spark

An open-source, in-memory engine for processing large volumes of data. It supports data science, data engineering, and SQL workloads on a single node or a cluster.

Added Perspectives

The Spark platform prepares the data in micro-batches to be consumed by the HDInsight data lake, SQL data warehouse, and various other internal and external subscribers. These targets subscribe to topics that are categorized by source tables. With this CDC-based architecture, StartupBackers is now efficiently supporting real-time analysis without affecting production operations.

- Kevin Petrie in Best Practices for Real Time Data Pipelines with Change Data Capture and Spark

August 8, 2018 (Blog)
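
The pattern described in the quote above, change records grouped into micro-batches and delivered to topics named after their source tables, can be illustrated with a minimal sketch in plain Python. This is not Spark's API and not the architecture's actual code: the `ChangeEvent` shape, the function names, the table names, and the count-based batch size are all assumptions made for illustration (Spark's streaming engine typically forms micro-batches on a time trigger rather than a fixed count).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class ChangeEvent:
    """One CDC record (hypothetical shape, for illustration only)."""
    source_table: str   # table the change was captured from
    operation: str      # "insert", "update", or "delete"
    key: int            # primary key of the changed row


def micro_batches(events: Iterable[ChangeEvent], batch_size: int) -> Iterator[List[ChangeEvent]]:
    """Yield consecutive groups of at most batch_size events."""
    batch: List[ChangeEvent] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


def route_by_topic(batch: List[ChangeEvent]) -> Dict[str, List[ChangeEvent]]:
    """Split one micro-batch into per-topic lists, one topic per source table."""
    topics: Dict[str, List[ChangeEvent]] = defaultdict(list)
    for event in batch:
        topics[event.source_table].append(event)
    return dict(topics)


# Made-up change stream from two hypothetical source tables.
events = [
    ChangeEvent("orders", "insert", 1),
    ChangeEvent("customers", "update", 7),
    ChangeEvent("orders", "update", 1),
    ChangeEvent("orders", "delete", 2),
    ChangeEvent("customers", "insert", 8),
]

routed = [route_by_topic(batch) for batch in micro_batches(events, batch_size=2)]
for batch_number, topics in enumerate(routed):
    for topic, changes in topics.items():
        print(batch_number, topic, [(c.operation, c.key) for c in changes])
```

A subscriber such as a data lake or warehouse loader would read only the topics for the tables it cares about, which is what lets downstream analysis proceed without querying the production database directly.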

Relevant Content

Oct 05, 2015 - The east coast confab for big data—otherwise known as Strata+Hadoop World in New York City—was abuzz with the digital literati who were treated to...

Jun 30, 2019 - Having the right data at the right time is essential for organizations to compete. 
