Data volumes are growing rapidly in every field. How can we use this data efficiently? This book introduces Apache Spark, an open source cluster computing system that speeds up both the implementation and the execution of data analysis. With Spark, you can quickly manipulate large data sets through simple APIs in Python, Java, and Scala. Written by Spark developers, this book lets data scientists and engineers get started right away. You will learn how to implement complex parallel jobs in a few lines of code, and explore applications ranging from simple batch jobs to stream processing and machine learning.

Table of Contents
Chapter 1 Introduction to Spark Data Analysis
Chapter 2 Download and Get Started with Spark
Chapter 3 RDD Programming
Chapter 4 Key-Value Pair Operations
Chapter 5 Data Reading and Saving
Chapter 6 Advanced Spark Programming
Chapter 7 Running Spark on a Cluster
Chapter 8 Spark Tuning and Debugging
Chapter 9 Spark SQL
Chapter 10 Spark Streaming
Chapter 11 Machine Learning Based on MLlib