Conditions for privatizing the elements of arrays by computing threads
Abstract
The set of operations of the parallel algorithm for implementation on the GPU must be split into computation threads. The threads must be grouped into computation units that run atomically on stream processors, also called multiprocessors. For good GPU performance, it is important that as much data as possible can fit into fast register and shared memory, otherwise slow global and local memory are used. The degree of memory usage with fast access reflects the computational property of the algorithm, called locality. When implementing algorithms on multiprocessor computing devices, the use of locality plays a crucial role in achieving high performance. In this paper, necessary conditions and sufficient conditions have been formulated and proved, the use of which allows receiving threads with privatized data, i. e. it allows to receive such computation threads that the array element is used only by one thread and therefore it is advisable to place it in the register.
References
- Voevodin VlV, Voevodin VadV. [The fortunate locality of supercomputers]. Otkrytye sistemy [Open systems]. 2013;9:12–15. Russian.
- Buluc A, Gilberta JR, Budak C. Solving path problems on the GPU. Parallel Computing. 2010;36(5– 6):241–253.
- Likhoded NA, Paliashchuk MA. [Estimate of locality of parallel algorithms implemented on GPUs]. Vestnik Yuzhno-Ural’skogo gosudarstvennogo universiteta. Seriya: Vychislitel’naya matematika i informatika [Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering]. 2016:5(3):96 –111. Russian. DOI: 10.14529/cmse160307.
- Likhoded NA, Poleshchuk MA. Method of ranking tiles size parameters of parallel algorithm. Doklady of the National Academy of Sciences of Belarus. 2015;59(4):25–33. Russian.
- Kandemir M, Ramanujam J, Irwin M, Narayanan V, Kadayif I, Parikh A. A compiler based approach for dynamically managing scratch-pad memories in embedded systems. IEEE Transactions on Computer-Aided Design. 2004;23(2):243–260.
- Baskaran M, Bondhugula U, Krishnamoorthy S, Ramanujam J, Rountev A, Sadayappan P. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming; 2008 February 20 –23; Salt Lake City, USA. New York: ACM; 2008. p. 1–10. DOI: 10.1145/1345206.1345210.
- Poleshchuk MA, Likhoded NA. [Array privatization by computing threads]. In: CSIST’2016. Mezhdunarodnyi kongress po informatike: informatsionnye sistemy i tekhnologii; 24 –27 oktyabrya 2016 g.; Minsk, Belarus’ [CSIST’2016. International Congress on Computer Science: Information Systems and Technologies; 2016 October 24 –27; Minsk, Belarus]. Minsk: Belarusian State University; 2016. p. 883–888. Russian.
- Baskaran M, Ramanujam J, Sadayappan P. Automatic C-to-CUDA code generation for affine programs. In: Compiler Construction. 19th International Conference. Part of the Joint European Conferences on Theory and Practice of Software; 2010 March 20 –28; Paphos, Cyprus. Berlin, Heidelberg: Springer-Verlag; 2010. DOI: 10.1007/978-3-642-11970-5_14.
- Katz GJ, Kider J. All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware; 2008 June 20–21; Sarajevo, Bosnia and Herzegovina. Geneve: Eurographics Association; 2008. p. 47–55.
- Lund BD, Smith JW. A Multi-Stage {CUDA} Kernel for Floyd-Warshall. CoRR [Internet]. 2010;abs/1001.4108. Available from: http://arxiv.org/abs/1001.4108.
- Adutskevich EV, Likhoded NA, Sikorsky AO. [Parallelization of sequential programs: distribution of arrays among processors and structurization of communications]. Kibernetika i sistemnyi analiz [Cybernetics and System Analysis]. 2012;48(1):144 –163. Russian.
- Voevodin VV. Informatsionnaya struktura algoritmov [Informational structure of algorithms]. Moscow: MGU; 1997. Russian.
- Likhoded NA, Tolstikov AA. Formalization of communication operations of multidimensional loops. Proceedings of the National Academy of Sciences of Belarus. Physics and Mathematics Series. 2010;3:109 –114. Russian.
Copyright (c) 2018 Journal of the Belarusian State University. Mathematics and Informatics

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The authors who are published in this journal agree to the following:
- The authors retain copyright on the work and provide the journal with the right of first publication of the work on condition of license Creative Commons Attribution-NonCommercial. 4.0 International (CC BY-NC 4.0).
- The authors retain the right to enter into certain contractual agreements relating to the non-exclusive distribution of the published version of the work (e.g. post it on the institutional repository, publication in the book), with the reference to its original publication in this journal.
- The authors have the right to post their work on the Internet (e.g. on the institutional store or personal website) prior to and during the review process, conducted by the journal, as this may lead to a productive discussion and a large number of references to this work. (See The Effect of Open Access.)