Enables interactive data analytics

Apache Spark & Zeppelin

An open-source, web-based "notebook" that enables interactive data analytics and collaborative documents.

Apache Spark and Zeppelin

Apache Zeppelin is an open-source, web-based "notebook" that enables interactive data analytics and collaborative documents. The notebook integrates with distributed, general-purpose data processing systems such as Apache Spark (large-scale data processing) and Apache Flink (stream processing), among many others. Apache Zeppelin lets you build data-driven, interactive documents with SQL, Scala, R, or Python right in your browser.

Data Ingestion

Data ingestion in Zeppelin can be done with Hive, HBase, and the other interpreters that Zeppelin provides.

Data Discovery

Zeppelin provides Postgres, HAWQ, Spark SQL, and other data discovery tools. With Spark SQL, for example, the data can be explored interactively.

Data Analytics

Spark, Flink, R, Python, and other useful tools are already available in Zeppelin, and its functionality can be extended simply by adding a new interpreter.

Data Visualization & Collaboration

All the basic visualizations, such as bar, pie, area, line, and scatter charts, are available in Zeppelin.

Apache Spark

In FileGPS, we use the Spark Streaming component, integrated with Kafka, for data computation.

Apache Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. It can ingest data from Kafka, Flume, Kinesis, or TCP sockets and process it with algorithms expressed through high-level functions such as map, reduce, join, and window. Finally, the processed data is pushed out to file systems, databases, and live dashboards. Spark uses micro-batching for real-time streaming.
Micro-batching is a technique that lets a process or task treat a stream as a sequence of small batches of data. Spark Streaming therefore groups the live data into small batches and delivers each batch to the batch engine for processing. This approach also gives streaming jobs the fault tolerance of batch processing.
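
The micro-batching idea described above can be sketched in plain Python. This is an illustration of the technique only, not Spark Streaming's API: the event format, the `micro_batches` and `process_batch` helpers, and the two-second batch interval are all assumptions made for the example, and the sketch groups an already collected list rather than a live stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group events into consecutive batches of `batch_interval` seconds."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for arrival_time, payload in events:
        # Events whose arrival times fall in the same interval share a batch.
        buckets[int(arrival_time // batch_interval)].append(payload)
    # Emit batches in time order; intervals without events produce no batch here.
    return [buckets[index] for index in sorted(buckets)]


def process_batch(batch: List[str]) -> Dict[str, int]:
    """A stand-in batch job: count occurrences of each payload."""
    counts: Dict[str, int] = {}
    for payload in batch:
        counts[payload] = counts.get(payload, 0) + 1
    return counts


# Hypothetical stream: status events arriving over roughly five seconds.
stream = [(0.2, "ok"), (0.9, "ok"), (1.4, "failed"), (2.1, "ok"), (4.7, "failed")]
results = [process_batch(batch) for batch in micro_batches(stream, batch_interval=2.0)]
```

Each entry in `results` is the output of one small batch job, which mirrors how a streaming computation is reduced to a series of ordinary batch computations.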

Apache Spark & Zeppelin - FAQs

How do you use Spark on Zeppelin?

Let us take a closer look at using Zeppelin with Spark through an example:

Create a new note from the Zeppelin home page with "spark" as the default interpreter.
Before you start with the example, download the sample CSV file.
Transform the CSV into an RDD.
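
As a rough sketch of the last step, the function below turns one raw CSV line into a typed record, which is the kind of function you would map over the file's lines to build an RDD. The column names, the sample rows, and the file path are hypothetical, since the sample CSV is not specified here; adjust them to the file you downloaded.

```python
import csv
from typing import Dict, List

# Hypothetical column layout; replace with the header of your sample CSV.
COLUMNS = ["id", "name", "size_bytes"]


def parse_line(line: str) -> Dict[str, object]:
    """Turn one raw CSV line into a typed record."""
    fields: List[str] = next(csv.reader([line]))
    record: Dict[str, object] = dict(zip(COLUMNS, fields))
    record["id"] = int(fields[0])
    record["size_bytes"] = int(fields[2])
    return record


# Local check on a small in-memory sample, with the header row skipped.
sample_lines = ["id,name,size_bytes", "1,invoice.xml,2048", '2,"report, final.pdf",512']
records = [parse_line(line) for line in sample_lines[1:]]

# In a Zeppelin %spark.pyspark paragraph, where `sc` is the SparkContext that
# Zeppelin provides, the same function could be mapped over the file
# (the path is a placeholder):
#   lines = sc.textFile("/path/to/sample.csv")
#   header = lines.first()
#   rdd = lines.filter(lambda l: l != header).map(parse_line)
```

Keeping the parsing logic in a plain function makes it easy to check locally before running it across the whole file in the notebook.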

Why do we need Apache Spark?

Spark has been called a "general purpose distributed data processing engine"¹ and "a lightning-fast unified analytics engine for big data and machine learning"². It lets you process big data sets faster by splitting the work up into chunks and assigning those chunks across computational resources.
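
The split-and-distribute pattern can be illustrated with a small plain-Python sketch. It is not Spark code: the thread pool merely stands in for the executors of a cluster, and the `split_into_chunks` and `sum_of_squares` helpers and the choice of four chunks are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(data: List[int], num_chunks: int) -> List[List[int]]:
    """Split `data` into at most `num_chunks` contiguous, roughly equal chunks."""
    chunk_size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(chunk: List[int]) -> int:
    """The per-chunk work that each worker performs independently."""
    return sum(x * x for x in chunk)


data = list(range(1, 101))
chunks = split_into_chunks(data, num_chunks=4)

# Threads stand in for cluster executors; each one processes a single chunk.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum_of_squares, chunks))

# Combine the partial results into the final answer.
total = sum(partial_sums)
```

Because each chunk is processed independently and only the small partial results are combined at the end, the same pattern scales from threads on one machine to executors spread across a cluster.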

What is the Apache Spark framework?

Spark is an open-source framework focused on interactive queries, machine learning, and real-time workloads. It does not have its own storage system; instead, it runs analytics on other storage systems such as HDFS and other popular stores like Amazon Redshift, Amazon S3, Couchbase, and Cassandra.

What is Zeppelin Spark?

Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. Apache Spark is supported in Zeppelin through the Spark interpreter group, which consists of several interpreters, including ones for Scala, PySpark, SparkR, and Spark SQL.

How do you reset the Spark interpreter on Zeppelin?

You can restart the interpreter for a notebook from the interpreter bindings (the gear icon in the upper right-hand corner) by clicking the restart icon to the left of the interpreter in question (in this case, the Spark interpreter).

