- Anand Rajaraman, Jure Leskovec, and Jeffrey Ullman, "Mining of Massive Datasets"
- Tom White, "Hadoop: The Definitive Guide"
- Yucheng Low et al., "Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud", PVLDB, 2012.
Storing, retrieving, analyzing, and mining huge amounts of data is a challenging problem that has had a significant impact across several domains in both industry and academia. This implementation-oriented course offers hands-on experience with state-of-the-art tools and techniques that the big data industry uses to analyze massive data sets. In particular, we cover the following main topics:
- Large-scale distributed file systems and data storage frameworks
- Computational models for large-scale data (e.g., MapReduce and GraphLab)
- Data stream analysis
- Statistical learning techniques for large-scale data