Spark Operator aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications.
Spark Operator currently supports the following features:
- Supports Spark 2.3 and up.
- Enables declarative specification and management of applications through custom resources.
- Automatically runs spark-submit on behalf of users for each SparkApplication eligible for submission.
- Provides native cron support for running scheduled applications.
- Supports customization of Spark pods beyond what Spark natively is able to do, through a mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.
- Supports automatic re-submission of updated SparkApplication objects with the updated specification.
- Supports automatic application restart with a configurable restart policy.
- Supports automatic retries of failed submissions with optional linear back-off.
- Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via sparkctl.
- Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via sparkctl.
- Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
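To illustrate the declarative specification described above, here is a minimal sketch of a SparkApplication manifest, including a restart policy with retry back-off. The image name, namespace, jar path, and service account are illustrative assumptions, not fixed values:

```yaml
# Illustrative SparkApplication manifest; image, jar path, and
# service account are assumptions for this sketch.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.1.1        # assumed image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  restartPolicy:
    type: OnFailure                # restart only when the application fails
    onFailureRetries: 3
    onFailureRetryInterval: 10     # seconds between retries
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark          # assumed service account name
  executor:
    instances: 2
    cores: 1
    memory: "512m"
```

Once applied with kubectl, the operator runs spark-submit on the user's behalf and surfaces the application's status in the custom resource.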
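The native cron support mentioned above is exposed through a separate ScheduledSparkApplication resource, which wraps a SparkApplication spec in a schedule. A hedged sketch, with the schedule and name chosen for illustration:

```yaml
# Illustrative ScheduledSparkApplication; the schedule, name, and
# template fields are assumptions for this sketch.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
spec:
  schedule: "@every 5m"        # cron-style schedule; runs every 5 minutes
  concurrencyPolicy: Allow     # allow overlapping runs
  template:                    # a regular SparkApplication spec
    type: Scala
    mode: cluster
    image: gcr.io/spark-operator/spark:v3.1.1   # assumed image
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
    sparkVersion: "3.1.1"
    driver:
      cores: 1
      memory: "512m"
    executor:
      instances: 1
      cores: 1
      memory: "512m"
```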