Are Data Structures and Algorithms obsolete today in the age of Machine Learning?

Created Fri, 09 Sep 2022 12:04:06 +0200 Modified Fri, 09 Sep 2022 12:18:41 +0200

Data Structures and Algorithms

In order to solve a computer science problem, we need a set of instructions (an algorithm) that operates on certain input data and produces some output data. Data structures provide efficient storage, access, and modification of data in a computer’s memory [1]. Data Structures and Algorithms are the key foundations required to solve any computer science problem efficiently. Machine Learning, by contrast, is a powerful medium for solving today’s problems using data, and under the hood it is a consumer of data structures and algorithms. Choosing the right data structures and algorithms helps make efficient use of the computer’s memory and time. Modern software applications have an abundance of data and computations, and they generally cannot afford latency in results. Therefore, data structures and algorithms play an important role in modern software. For example, hash tables provide efficient data retrieval, binary search is an important paradigm for searching, dynamic programming ensures that work once done is not repeated, and divide-and-conquer algorithms break big problems into smaller, solvable sub-problems.
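
To make that contrast concrete, here is a minimal, self-contained sketch (hypothetical data, Python standard library only) comparing a linear scan, binary search, and a hash-table lookup:

```python
import bisect

# Hypothetical example: membership lookups in a collection of n items.
# A linear scan inspects O(n) elements; binary search inspects O(log n).
sorted_ids = list(range(0, 1_000_000, 2))  # even numbers, already sorted

def linear_contains(xs, target):
    """O(n): check every element until a match is found."""
    for x in xs:
        if x == target:
            return True
    return False

def binary_contains(xs, target):
    """O(log n): repeatedly halve the search interval (via bisect)."""
    i = bisect.bisect_left(xs, target)
    return i < len(xs) and xs[i] == target

# A hash table (Python's set/dict) gives expected O(1) lookups instead.
id_set = set(sorted_ids)

assert linear_contains(sorted_ids, 123_456)
assert binary_contains(sorted_ids, 123_456)
assert 123_456 in id_set
```

All three calls return the same answer; they differ only in how much of the computer's time and memory they spend, which is exactly the trade-off this section is about.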

Machine Learning

Machine learning algorithms learn a mathematical representation of known data. This allows them to make predictions on unseen data without explicit rules or instructions [2]; a minimal sketch of this learn-then-predict loop follows the list below. Under the hood of machine learning algorithms lie mathematical foundations like linear algebra, probability theory, statistics, and calculus. Applications of these techniques generally face the following challenges:

  1. Volume of data
  2. Variety of data, e.g., speech, video, text, images, etc.
  3. Low-latency service: learning and inference times.
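
As referenced above, here is a minimal sketch of the learn-then-predict loop. The data is synthetic and the linear model is an assumption chosen purely for illustration:

```python
import numpy as np

# Synthetic "known" data: noisy samples of y = 3x + 2.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, size=100)
y_train = 3 * x_train + 2 + rng.normal(0, 0.5, size=100)

# Learning: fit a degree-1 polynomial (a line) by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Inference: predict on unseen inputs without explicit rules.
x_unseen = np.array([11.0, 12.5])
y_pred = slope * x_unseen + intercept
print(y_pred)  # close to [35.0, 39.5]
```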

We can trace the development of machine learning algorithms to at least 1954, when Farley and Clark first used computers to simulate a Hebbian network [3]. Since then, there have been many advancements in machine learning algorithms like k-NN, decision trees, support vector machines, reinforcement learning, neural networks, k-means, DBSCAN, etc. At present, ML algorithms power modern applications like autonomous driving, face recognition, medical diagnosis, text translation, game-playing AI, etc.

Time and space constraints in the age of Machine Learning

Machine learning systems generally have two important phases: learning and inference. Such systems learn a function approximation on training data by minimizing a certain error (the loss function), and then use it to make inferences on incoming unseen data. Running machine learning inference on mobile phones and vehicles poses more memory and computational constraints than usual, since the memory on mobile devices is limited. Moreover, latency in autonomous cars could risk loss of life. The learning phase mostly happens offline, but time and space efficiency is still important there, as it can reduce time-to-market. This is especially crucial for systems that help with pandemic and disaster relief [4].
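
Here is a minimal sketch of the two phases, assuming a toy one-parameter linear model and synthetic data (not any particular production system):

```python
import numpy as np

# Learning phase: gradient descent on the mean squared error (the loss)
# for a 1-D linear model y ~ w * x.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = 4.0 * x + rng.normal(0, 0.1, size=200)  # synthetic training data

w = 0.0                      # model parameter
lr = 0.1                     # learning rate
for _ in range(500):         # typically run offline
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean((w*x - y)^2)
    w -= lr * grad

# Inference phase: a single multiply per input -- cheap enough for
# memory- and latency-constrained devices.
def predict(x_new):
    return w * x_new

print(round(w, 2), predict(0.5))  # w ends up close to 4.0
```

Note the asymmetry: learning loops over the whole training set many times, while inference is a constant-time evaluation of the learned function, which is why on-device constraints bite mostly at inference time.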

The role of Data Structures and Algorithms in today’s ML era

| What | Where | How |
| --- | --- | --- |
| Arrays, multidimensional matrices | Computer vision, principal component analysis | Representing images and numerical data; storing intermediate computations or outputs in matrices for calculations like matrix multiplication (tensors in PyTorch). |
| Trees | High-dimensional point access methods, clustering algorithms, k-nearest neighbour | Data structures such as the KDB-tree [5], X-tree [6], and B+-tree [7] make a compact representation of data in memory and are efficient for nearest-neighbour search (sketched below). |
| Stacks | Domain-specific languages for machine learning | Stacks are crucial for powering recursion in context-free grammars, which are used for parsing a programming language [8], [9], [10]. |
| Graphs | Nearest-neighbour classification algorithms, OptiML (a domain-specific language for machine learning) [11] | Optimize nearest-neighbour classification [12] and clustering [13] algorithms by eliminating data points to reduce space and time complexity [14]. |
| Dynamic programming | Backpropagation in neural networks [15], reinforcement learning [16] | Storing pre-computed results while an ML model is in the learning phase (sketched below). |
| Divide & conquer | Support vector machines | The sequential minimal optimization technique solves the SVM optimization problem by breaking it into smaller, solvable subproblems, such that each reduces to a simple one-dimensional quadratic function [17]. |
| Greedy algorithms | Decision trees, Monte Carlo tree search | Greedy algorithms are employed to split the source set into subsets based on classification features [18], [19]; the exploitation step in Monte Carlo tree search is a greedy paradigm. |
| Recursion | Decision trees, phylogenetic tree counting [20] | A top-down inductive tree is built using recursive calls. |
| Sorting algorithms | k-nearest neighbour | Distances from a query point to other points are sorted to find the nearest neighbours (sketched below). |
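
To ground the Trees and Sorting rows above, the following sketch finds the same nearest neighbours two ways: by sorting all pairwise distances, and with a kd-tree. It assumes NumPy and SciPy are available, and the points are synthetic:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(2)
points = rng.uniform(0, 1, size=(10_000, 3))  # synthetic 3-D data set
query = np.array([0.5, 0.5, 0.5])
k = 5

# Sorting row: compute all distances, sort, keep the k smallest -- O(n log n).
dists = np.linalg.norm(points - query, axis=1)
brute_idx = np.argsort(dists)[:k]

# Trees row: a kd-tree answers the same query in roughly O(log n) per lookup
# once built, which pays off when many queries hit the same data set.
tree = KDTree(points)
_, tree_idx = tree.query(query, k=k)

assert set(brute_idx) == set(tree_idx)
```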
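
The Dynamic Programming row boils down to memoization: solve each subproblem once and reuse the stored result. A generic sketch of that idea (deliberately not the actual backpropagation code):

```python
from functools import lru_cache

# Without the cache, this recursion recomputes the same subproblems
# exponentially many times; with it, each subproblem is solved once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(200))  # instant with memoization; infeasible without it

# Backpropagation applies the same principle: activations from the forward
# pass are stored so the gradient computation never recomputes them.
```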

The above-mentioned scientific literature shows that Data Structures and Algorithms have played an important role in the history of Machine Learning. It also shows that Data Structures and Algorithms continue to drive the development of efficient solutions for modern problems. One could also view Machine Learning systems as just Software Systems with more focus on data processing and computations. We observe that Data Structures and Algorithms are ubiquitous in the age of Machine Learning. They may not always be visible, but they power such innovations underneath. As we see in the cases of learning at scale (AlphaGo, BERT: Bidirectional Encoder Representations from Transformers), trained models can contain billions of parameters. One such example is the GPT-2 (Generative Pretrained Transformer 2) language model by OpenAI, which is larger than five GB once loaded in memory [21]. For such models to become mainstream and serve inferences on devices, Data Structures and Algorithms will continue playing an important role. We can conclude that Data Structures and Algorithms lay the foundations of Machine Learning applications, and that they are crucial in making state-of-the-art Machine Learning models accessible to the masses.

References

[1] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, Third Edition. MIT Press, 2009.
[2] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[3] B. Farley and W. Clark, “Simulation of self-organizing systems by digital computer”, Transactions of the IRE Professional Group on Information Theory, vol. 4, no. 4, pp. 76–84, 1954.
[4] C. Bi et al., “Machine learning based fast multi-layer liquefaction disaster assessment”, World Wide Web, vol. 22, no. 5, pp. 1935–1950, Oct. 2018, doi: 10.1007/s11280-018-0632-8.
[5] B. Yu, T. Bailey, R. Orlandic, and J. Somavaram, “KDB_KD-tree: a compact KDB-tree structure for indexing multidimensional data”, in Proceedings ITCC 2003, International Conference on Information Technology: Coding and Computing, 2003, pp. 676–680.
[6] S. Berchtold, D. A. Keim, and H.-P. Kriegel, “The X-Tree: An Index Structure for High-Dimensional Data”, in Proceedings of the 22nd International Conference on Very Large Data Bases, 1996.
[7] H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, “iDistance: An adaptive B+-tree based indexing method for nearest neighbor search”, ACM Transactions on Database Systems, vol. 30, no. 2, pp. 364–397, Jun. 2005, doi: 10.1145/1071610.1071612.
[8] M. Tomita, “An Efficient Context-Free Parsing Algorithm for Natural Languages”, in Proceedings of the 9th International Joint Conference on Artificial Intelligence - Volume 2, 1985.
[9] S. Das, C. L. Giles, and G.-Z. Sun, “Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory”, in Conference of the Cognitive Science Society, 1992, pp. 791–795.
[10] A. Agrawal et al., “TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning”, arXiv preprint arXiv:1903.01855, 2019.
[11] A. K. Sujeeth et al., “OptiML: an implicitly parallel domain-specific language for machine learning”, 2011.
[12] A. Schenker, M. Last, H. Bunke, and A. Kandel, “Classification of web documents using a graph model”, in Seventh International Conference on Document Analysis and Recognition, 2003, pp. 240–244.
[13] P. Fränti, O. Virmajoki, and V. Hautamäki, “Fast agglomerative clustering using a k-nearest neighbor graph”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 11, pp. 1875–1881, 2006.
[14] G. T. Toussaint, “Proximity graphs for nearest neighbor decision rules: recent progress”, Interface, vol. 34, 2002.
[15] A. Gruslys, R. Munos, I. Danihelka, M. Lanctot, and A. Graves, “Memory-efficient backpropagation through time”, in Advances in Neural Information Processing Systems, 2016, pp. 4125–4133.
[16] A. G. Barto, “Reinforcement learning and dynamic programming”, in Analysis, Design and Evaluation of Man–Machine Systems 1995, Elsevier, 1995, pp. 407–412.
[17] L. J. Cao et al., “Parallel sequential minimal optimization for the training of support vector machines”, IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 1039–1049, 2006.
[18] J. R. Quinlan, “Induction of decision trees”, Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[19] F. Cicalese, T. Jacobs, E. Laber, and M. Molinaro, “On greedy algorithms for decision trees”, in International Symposium on Algorithms and Computation, 2010, pp. 206–217.
[20] A. Gavryushkina, D. Welch, and A. J. Drummond, “Recursive algorithms for phylogenetic tree counting”, Algorithms for Molecular Biology, vol. 8, no. 1, p. 26, 2013.
[21] J. Kaplan et al., “Scaling laws for neural language models”, arXiv preprint arXiv:2001.08361, 2020.