This seminar introduces the core concepts and functionality of Apache Spark for big data analytics. Spark is an open-source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is an alternative. Spark can run up to 100 times faster than MapReduce for iterative algorithms and interactive data mining. It achieves this speed through in-memory cluster computing, and it offers Java, Scala, and Python APIs for ease of development.
A set of research and concept papers about Apache Spark for analytics will be offered in the introductory session of the seminar. Each student is required to choose a paper, read it thoroughly and analytically, and present it during the seminar.