Chapter 1: FDTD and graphics cards
The finite-difference time-domain (FDTD) method is a well-known numerical method in computational electromagnetics. It is applicable over a broad range of frequencies and to problems from nearly every area of industry and science. FDTD is based on an iterative numerical solution of Maxwell's equations, simulating wave propagation as a sequence of very short time steps. The method started with radio-frequency calculations many years ago. After decades of development it can handle nearly any material properties and sample and source geometries, and it is probably the most universal method for electromagnetic field modeling. On the other hand, it is also one of the most computationally demanding methods for the numerical analysis of electromagnetic waves, even today, when computational resources are developing extremely fast.
In principle, the FDTD method is computationally limited by the dense discretization it needs to work properly. For the standard Yee algorithm (the core of FDTD), accurate results require a spatial step of at most about a tenth of the wavelength, Δx ≤ λ/10, where λ is the wavelength of the propagating electromagnetic wave in the simulated material. The time step is in turn bounded by the Courant stability condition, Δt ≤ 1/(c·√(1/Δx² + 1/Δy² + 1/Δz²)), so a finer grid also means more time steps. Several approaches in the literature overcome these limitations, but they lead to much more complicated algorithms that cannot easily be adapted to the whole broad spectrum of FDTD calculations.
As a result, both memory capacity and processing speed limit large and long FDTD calculations, which reduces the applicability of FDTD for many purposes. An alternative approach, used in Gsvit, is to move the calculation from the computer processor (CPU) to the graphics processing unit (GPU) of a graphics card, which speeds it up significantly, as shown, for example, in the following table comparing the CPU and the GPU for the same computation on computational volumes of different sizes [1,2]:
The capabilities of graphics cards for numerical modeling have increased dramatically over the last few years. The two main manufacturers, NVIDIA and ATI, have developed drivers for running general calculations on their cards, and both the memory and the processing power of commonly used graphics cards have increased by a large factor. Gsvit focuses on NVIDIA CUDA products for historical reasons: when this choice was made, these were the only graphics cards that could be programmed in C. In general, the available memory and computing power of GPUs grow much faster than those of personal computers themselves, which is also promising. Moreover, dedicated GPU-based computing systems, such as NVIDIA Tesla systems, can now be purchased. These systems contain multiple graphics cards, so we also aim to support computation on multiple cards.
Using a GPU for a calculation is, unfortunately, not straightforward. A conventional PC executable cannot simply be run on a GPU. Both data processing and the memory model differ completely between the GPU and the CPU, and the part of the code that runs on the GPU (called a kernel) must be written to respect these differences. A GPU contains several multiprocessors, each consisting of a large number of processors. Many hundreds of threads, each executing the kernel and grouped into thread blocks, can be processed simultaneously on the GPU, which is the basis of the tremendous speedup that can be achieved. The memory available on a GPU can be divided into global memory, accessible by all the multiprocessors; shared memory, accessible by the threads of one block running on a single multiprocessor; and local memory, private to a single thread. All of these memories are limited in size by the hardware, differently for each type of GPU. We refer to the NVIDIA CUDA developer zone for further details.
[1] P. Klapetek, M. Valtr: Near-field optical microscopy simulations using graphics processing units, Surface and Interface Analysis, 2010, 42, pp. 1109-1113.
[2] P. Klapetek et al.: Rough surface scattering simulations using graphics cards, Applied Surface Science, 2010, 256, pp. 5640-5643.
(c) Petr Klapetek, 2013