---
title: Spark Connectors for Pravega
sidebar_label: Overview
---

This documentation describes the connector API and its usage for reading and writing Pravega streams with Apache Spark.

Build end-to-end stream processing and batch pipelines that use Pravega as the stream storage and message bus, and Apache Spark for computation over the streams.
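
As a first taste of the API, the sketch below shows a minimal end-to-end streaming pipeline in PySpark: it reads one Pravega stream as a streaming DataFrame and writes the transformed events to another. The `pravega` format name and the `controller`, `scope`, and `stream` options follow the connector's documented usage, but the controller URI, scope and stream names, payload column name (`event`), and checkpoint path are placeholders; the target scope and streams are assumed to already exist. Verify the details against the connector version you deploy.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, upper

# Placeholder connection settings; replace with your own environment.
CONTROLLER = "tcp://127.0.0.1:9090"
SCOPE = "examples"

spark = SparkSession.builder.appName("pravega-streaming-sketch").getOrCreate()

# Micro-batch reader: expose a Pravega stream as an unbounded streaming DataFrame.
events = (
    spark.readStream
    .format("pravega")
    .option("controller", CONTROLLER)
    .option("scope", SCOPE)
    .option("stream", "input-stream")       # assumed to already exist
    .load()
)

# Events arrive as a binary payload column (named "event" in the connector's examples);
# cast it to a string and apply a trivial transformation.
transformed = events.select(upper(col("event").cast("string")).alias("event"))

# Writer: send results to another Pravega stream. The checkpoint location lets Spark
# recover the last committed position (stream cut) after a failure.
query = (
    transformed.writeStream
    .format("pravega")
    .option("controller", CONTROLLER)
    .option("scope", SCOPE)
    .option("stream", "output-stream")      # assumed to already exist
    .option("checkpointLocation", "/tmp/pravega-example-checkpoint")
    .start()
)

query.awaitTermination()
```

Because the writer commits its output in step with Spark's checkpoints, a restarted query resumes from the last recorded stream cut rather than dropping or duplicating events.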

## Features & Highlights

- Exactly-once processing guarantees for both Reader and Writer, supporting end-to-end exactly-once processing pipelines.
- A Spark micro-batch reader connector allows Spark streaming applications to read Pravega streams. Pravega stream cuts (i.e., offsets) are used to reliably recover from failures and provide exactly-once semantics.
- A Spark batch reader connector allows Spark batch applications to read Pravega streams (see the sketch after this list).
- A Spark writer allows Spark batch and streaming applications to write to Pravega streams. Writes are optionally contained within Pravega transactions, providing exactly-once semantics.
- Seamless integration with Spark's checkpoints.
- Parallel Readers and Writers supporting high-throughput, low-latency processing.
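
The following sketch illustrates the batch reader mentioned above: it reads a bounded slice of a Pravega stream into an ordinary DataFrame. The `start_stream_cut` and `end_stream_cut` option names are assumptions based on the connector's stream-cut support and should be confirmed against the documentation for your release; the connection options are placeholders as in the previous example.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pravega-batch-sketch").getOrCreate()

# Batch reader: the slice of the stream between two stream cuts becomes a finite DataFrame.
# The start_stream_cut/end_stream_cut option names and the "earliest"/"latest" values are
# assumptions to be verified against the connector documentation for your release.
df = (
    spark.read
    .format("pravega")
    .option("controller", "tcp://127.0.0.1:9090")   # placeholder controller URI
    .option("scope", "examples")                    # placeholder scope
    .option("stream", "input-stream")               # placeholder stream name
    .option("start_stream_cut", "earliest")
    .option("end_stream_cut", "latest")
    .load()
)

# The payload column is binary; cast it to a string for display.
df.selectExpr("cast(event as string) as event").show(truncate=False)
```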

## Releases

The latest releases can be found on the project's GitHub Releases page.

## Pre-Built Artifacts

Releases are published to Maven Central. Spark and Gradle will automatically download the required artifacts. However, if you wish, you may also download the artifacts manually.

The pre-built artifacts are available on Maven Central.
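
Because the artifacts are on Maven Central, one common way to pull them into a Spark application is Spark's standard `spark.jars.packages` mechanism, sketched below. The Maven coordinate shown is only a placeholder, not the connector's actual artifact ID; substitute the group, artifact, and version published for your Spark and Scala versions.

```python
from pyspark.sql import SparkSession

# The Maven coordinate below is a placeholder; substitute the group:artifact:version
# published on Maven Central for your Spark and Scala versions.
spark = (
    SparkSession.builder
    .appName("pravega-connector-dependency-sketch")
    .config("spark.jars.packages", "io.pravega:<spark-connector-artifact>:<version>")
    .getOrCreate()
)
```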

## Support

Don't hesitate to ask! If you need any help, contact the developers and community on Slack (sign up). If you find a bug, open an issue on GitHub Issues.

## About

Spark Connectors for Pravega is 100% open source and community-driven. All components are available under the Apache 2.0 License on GitHub.