A DISTRIBUTED OPTIMIZATION FRAMEWORK FOR SCALABLE BIG DATA ANALYTICS USING STOCHASTIC GRADIENT DESCENT

Vijay Vaswani

doi:10.64149/gjaets.10.7.1-7

Keywords:

Big Data; Distributed Optimization; Stochastic Gradient Descent; TPC-DS; Scalable Machine Learning

Abstract

The explosive growth in the volume, velocity, and variety of data in modern applications has rendered traditional data processing paradigms inadequate, making mathematical frameworks for scalable analytics necessary. This paper proposes a distributed optimization algorithm based on adaptive stochastic gradient descent (ASGD) and designed for large-scale machine learning tasks. We provide a mathematical formulation that incorporates a communication-efficient consensus mechanism for optimizing the objective function across distributed worker nodes. The proposed method addresses two key problems of heterogeneous cluster environments: convergence speed and fault tolerance. We validate our approach on the TPC-DS benchmark dataset at the 1 TB scale and report a 23% reduction in training time and a 15% improvement in convergence of the objective function compared with standard synchronous SGD implementations. Our results suggest that adaptive parameter tuning at the node level can significantly improve overall system throughput without reducing model accuracy.
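
The abstract does not spell out the ASGD update rule or the consensus mechanism, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that each worker runs several local SGD steps on its own data shard with its own decaying step size, and that consensus is reached by plain parameter averaging once per round (which is what keeps communication low). The toy regression problem and the names `make_shards`, `sgd_step`, `average`, `train`, `base_lrs`, and `decay` are all hypothetical choices made for illustration.

```python
import random


def make_shards(num_workers, points_per_worker, seed=0):
    """Synthetic noiseless data y = 2x + 1, split into one shard per worker."""
    rng = random.Random(seed)
    return [
        [(x, 2.0 * x + 1.0)
         for x in (rng.uniform(-1.0, 1.0) for _ in range(points_per_worker))]
        for _ in range(num_workers)
    ]


def sgd_step(params, sample, lr):
    """One SGD step on the squared error 0.5 * (w*x + b - y)**2."""
    w, b = params
    x, y = sample
    err = w * x + b - y
    return (w - lr * err * x, b - lr * err)


def average(param_list):
    """Consensus by plain averaging of all workers' parameters (an assumption)."""
    n = len(param_list)
    return (sum(p[0] for p in param_list) / n,
            sum(p[1] for p in param_list) / n)


def train(shards, base_lrs, rounds=50, local_steps=10, decay=0.01, seed=1):
    """Local SGD with per-worker step sizes and one averaging step per round."""
    rng = random.Random(seed)
    consensus = (0.0, 0.0)
    step = 0
    for _ in range(rounds):
        local_params = []
        for shard, base_lr in zip(shards, base_lrs):
            params = consensus  # every worker starts the round from consensus
            for k in range(local_steps):
                # Per-node step size: hypothetical base rate with simple decay.
                lr = base_lr / (1.0 + decay * (step + k))
                params = sgd_step(params, rng.choice(shard), lr)
            local_params.append(params)
        # Workers communicate only here, once per round.
        consensus = average(local_params)
        step += local_steps
    return consensus


shards = make_shards(num_workers=4, points_per_worker=200)
w, b = train(shards, base_lrs=[0.3, 0.2, 0.2, 0.1])
print(f"w={w:.3f}, b={b:.3f}")
```

In this sketch, raising `local_steps` reduces the number of synchronization points for the same amount of computation, which is one common way to trade communication for local work; the paper's actual mechanism and tuning rule may differ.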
