- Big Data
- 20. Sep
Zeppelin is a completely open web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
Apache Zeppelin is a new, incubating, multi-purpose web-based notebook that brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark.
- Web-based notebook style editor.
- Built-in Apache Spark support
WHAT IS APACHE ZEPPELIN?
The Notebook is the place for all your needs
- Data Ingestion
- Data Discovery
- Data Analytics
- Data Visualization & Collaboration
Multiple Language Backend
The Apache Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC, Markdown and Shell.
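To make the plug-in idea concrete, here is a minimal conceptual sketch in Python. It is not Zeppelin's actual interpreter API; the registry, the `register` decorator and the toy `md` and `upper` backends are all hypothetical names invented for illustration. It only shows how a paragraph that starts with `%name` could be routed to whichever backend was registered under that name.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: interpreter name (the text after '%') -> handler.
INTERPRETERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that plugs a new backend into the registry."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        INTERPRETERS[name] = fn
        return fn
    return wrap


@register("md")
def run_markdown(body: str) -> str:
    # Stand-in for a Markdown renderer: only handles a leading '# ' heading.
    if body.startswith("# "):
        return "<h1>" + body[2:] + "</h1>"
    return "<p>" + body + "</p>"


@register("upper")
def run_upper(body: str) -> str:
    # A toy backend, to show that adding one is just another registration.
    return body.upper()


def split_paragraph(text: str) -> Tuple[str, str]:
    """Split '%name rest...' into (name, rest)."""
    if not text.startswith("%"):
        raise ValueError("paragraph must start with %<interpreter>")
    parts = text[1:].split(None, 1)
    name = parts[0]
    body = parts[1].strip() if len(parts) > 1 else ""
    return name, body


def run_paragraph(text: str) -> str:
    """Route a paragraph to the backend named in its first token."""
    name, body = split_paragraph(text)
    if name not in INTERPRETERS:
        raise KeyError("no interpreter registered for %" + name)
    return INTERPRETERS[name](body)


print(run_paragraph("%md # Hello"))
print(run_paragraph("%upper\nselect 1"))
```

The point of the design is that the notebook front end never needs to know about a specific language: supporting a new backend means registering one more handler.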
Apache Spark integration
In particular, Apache Zeppelin provides built-in Apache Spark integration. You don't need to build a separate module, plugin or library for it.
Apache Zeppelin with Spark integration provides:
- Automatic SparkContext and SQLContext injection
- Runtime jar dependency loading from the local filesystem or a Maven repository
- Job cancellation and progress display
Some basic charts are already included in Apache Zeppelin. Visualizations are not limited to SparkSQL queries: any output from any language backend can be recognized and visualized.
Apache Zeppelin aggregates values and displays them in a pivot chart with simple drag and drop. You can easily create a chart with multiple aggregated values, including sum, count, average, min and max.
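As a rough illustration of how output from any backend can become a chart: to my understanding, Zeppelin's display system treats printed output that begins with `%table`, with columns separated by tabs and rows by newlines, as tabular data it can visualize. The helper below is a hypothetical sketch (the function name and sample data are made up) that formats rows that way in plain Python.

```python
from typing import Iterable, Sequence


def to_zeppelin_table(header: Sequence[str],
                      rows: Iterable[Sequence[object]]) -> str:
    """Format rows as '%table' text: tab-separated cells, one row per line.

    Assumption: cells contain no tabs or newlines; this sketch does not
    escape them.
    """
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


output = to_zeppelin_table(["name", "size"], [["sun", 100], ["moon", 10]])
print(output)
```

Printed from a notebook paragraph, a string in this shape should show up as a table that can be switched to the built-in chart types.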
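The aggregation behind such a pivot chart can be sketched in a few lines of plain Python. This is only an illustration of the group-and-aggregate idea, not Zeppelin's implementation; the `aggregate` function and the sample `sales` rows are assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group (key, value) rows by key and compute the five aggregates."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {
        key: {
            "sum": sum(values),
            "count": len(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


# Hypothetical sample data: (region, amount).
sales = [("east", 10.0), ("west", 4.0), ("east", 6.0)]
summary = aggregate(sales)
print(summary["east"])
```

In the notebook, dragging a column into the keys area and another into the values area plays the role of choosing `key` and `value` here.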
Apache Zeppelin can dynamically create some input forms in your notebook.
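One way these forms are declared is with a `${name=default}` placeholder inside the paragraph text, which Zeppelin turns into a text input. The sketch below imitates that substitution in plain Python to show the idea. It is not Zeppelin's parser, it covers only simple text inputs, and the function names and the sample query are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches ${name} or ${name=default}; select and checkbox forms are not covered.
FORM_PATTERN = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def find_forms(text: str) -> List[Tuple[str, str]]:
    """Return (name, default) for every placeholder found in the text."""
    return [(m.group(1), m.group(2) or "") for m in FORM_PATTERN.finditer(text)]


def fill_forms(text: str, values: Optional[Dict[str, str]] = None) -> str:
    """Substitute user-supplied values, falling back to each default."""
    supplied = values or {}

    def replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2) or ""
        return supplied.get(name, default)

    return FORM_PATTERN.sub(replace, text)


query = "select * from bank where age < ${maxAge=30}"
print(find_forms(query))
print(fill_forms(query))
print(fill_forms(query, {"maxAge": "45"}))
```

The first call lists the inputs a notebook would need to draw, and the other two show the paragraph text before and after a user changes the value.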
Collaborate by sharing your Notebook & Paragraph
Your notebook URL can be shared among collaborators. Apache Zeppelin then broadcasts any changes in real time, just like collaboration in Google Docs. Apache Zeppelin also provides a URL that displays only the result; that page does not include any of the notebook's menus or buttons, so you can easily embed it as an iframe inside your website.
Apache Zeppelin is licensed under the Apache License 2.0. Please check out the source repository and how to contribute. Apache Zeppelin has a very active development community. Join our mailing list and report issues on the Jira issue tracker.
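Embedding the result-only page comes down to wrapping its link in an iframe tag. The helper below is a small hypothetical sketch of that step. The URL is a placeholder, not a real Zeppelin address; in practice you would paste the link Zeppelin gives you for the paragraph. The default width and height are arbitrary.

```python
from html import escape


def iframe_snippet(result_url: str, width: int = 600, height: int = 400) -> str:
    """Build an HTML iframe tag around a result-only link."""
    # Escape the URL so characters such as '&' are valid inside the attribute.
    safe_url = escape(result_url, quote=True)
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        safe_url, width, height
    )


# Placeholder link; replace it with the one copied from your own notebook.
snippet = iframe_snippet("http://zeppelin.example.com/RESULT_LINK")
print(snippet)
```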
What Zeppelin Does
Interactive browser-based notebooks enable data engineers, data analysts and data scientists to be more productive by developing, organizing, executing, and sharing data code and visualizing results without referring to the command line or needing the cluster details. Notebooks allow these users not only to execute long workflows but also to work with them interactively. There are a number of notebooks available with Spark. IPython remains a mature choice and a great example of a data science notebook. The Hortonworks Gallery provides an Ambari stack definition to help our customers quickly set up IPython on their Hadoop clusters.
Apache Zeppelin is a new and upcoming web-based notebook that brings data exploration, visualization, sharing and collaboration features to Spark. It supports Python as well as a growing list of languages and backends such as Scala, Hive, SparkSQL, shell, and Markdown.
Data discovery, exploration, reporting, and visualization are key components of the data science workflow. Zeppelin provides a “Modern Data Science Studio” that supports Spark and Hive out of the box. Zeppelin supports multiple language backends, which in turn support a growing ecosystem of data sources. Zeppelin’s notebooks provide an interactive, snippet-at-a-time experience to data scientists. You can see a collection of Zeppelin notebooks in the Hortonworks Gallery.
When you are done with your notebook and have found an insight you want to share, you can easily create a report from it and either print it or send it out.