Advancements in Big Data Analytics: A Comprehensive Review of Tools and Technologies, from Hadoop to Spark

Authors

  • Prabu Ravichandran, Sr. Data Architect, Amazon Web Services Inc., Raleigh, NC, USA

Keywords:

Big Data Analytics, Hadoop, Spark, Distributed Computing, Tools, Technologies, Advancements, Comparative Analysis, Data Processing, Insights

Abstract

This research paper provides a comprehensive review of advancements in big data analytics, focusing on the evolution of tools and technologies from Hadoop to Spark. Big data analytics has changed the way organizations process, analyze, and derive insights from massive volumes of data. The emergence of distributed computing frameworks such as Hadoop and Spark has played a pivotal role in enabling efficient processing of large-scale datasets. This paper examines the key features, functionalities, and comparative advantages of these frameworks, and explores other relevant tools and technologies in big data analytics. By synthesizing current research findings and industry practices, it aims to offer insights into the landscape of big data analytics tools and technologies and to support informed decision-making for organizations seeking to leverage big data.

Published

18-11-2020

How to Cite

“Advancements in Big Data Analytics: A Comprehensive Review of Tools and Technologies, from Hadoop to Spark”. Journal of Science & Technology, vol. 1, no. 1, Nov. 2020, pp. 91-107, https://www.thesciencebrigade.com/jst/article/view/198.