---
title: GPU
---
# GPU
## Table of Contents

* [GPU](#gpu)
* [Origin of GPU](#origin-of-gpu)
* [GPU vs CPU](#gpu-vs-cpu)
* [Evolution of GPU Architecture](#evolution-of-gpu-architecture)
* [Basic Unified GPU Architecture Components](#basic-unified-gpu-architecture-components)

GPU stands for Graphics Processing Unit. Although a GPU is not necessary for a computer to function, many computers include a dedicated graphics card for better performance when rendering videos or playing video games.

A GPU is like a CPU but with different strengths and weaknesses. CPUs are very good at running a few tasks very quickly. GPUs are much better at running many tasks at the same time, although each individual task runs more slowly. A typical GPU can have more than 10,000 tasks in flight, but to run so many tasks at once they must share memory and other resources. GPUs usually take over highly repetitive work so the CPU does not waste time on it. Some CPUs have built-in GPUs, but a dedicated GPU is almost always more powerful.
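
The contrast can be sketched with a short CUDA example (illustrative only; the `scale_cpu` and `scale_gpu` names and the array size are made up for the demo): the CPU routine walks the array one element at a time, while the GPU launches one lightweight thread per element.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU version: each thread scales exactly one element, so a launch of
// roughly one million threads performs the whole loop "at the same time".
__global__ void scale_gpu(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// CPU version: a single core walks the array one element after another.
void scale_cpu(float *data, float factor, int n) {
    for (int i = 0; i < n; ++i) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                       // one million elements
    float *d;
    cudaMallocManaged(&d, n * sizeof(float));    // memory visible to CPU and GPU
    for (int i = 0; i < n; ++i) d[i] = 1.0f;

    scale_cpu(d, 2.0f, n);                       // serial pass on the CPU

    int threads = 256;
    int blocks = (n + threads - 1) / threads;    // ~4096 blocks of 256 threads
    scale_gpu<<<blocks, threads>>>(d, 2.0f, n);  // parallel pass on the GPU
    cudaDeviceSynchronize();

    printf("first element: %f\n", d[0]);         // expect 4.0
    cudaFree(d);
    return 0;
}
```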
There are two major brands producing GPUs: NVIDIA and AMD. They are often referred to as the "green team" and the "red team", after the dominant colors of their logos.
## Origin of GPU
The most primitive ancestor of the GPU can be traced back to the era of VGA (Video Graphics Array) controllers. These were not full processing units but supporting units for display functions. A VGA controller is a simple memory controller connected to dynamic RAM and a display generator. Its main job is to receive image data, arrange it properly, and send it to a video device for display, typically a computer monitor or a TV screen connected to a gaming console.

The first full-fledged processing unit for graphics acceleration, the "GeForce 256", was developed and marketed by NVIDIA in 1999. Older 3D accelerators had to rely on the CPU to execute graphics calculations. With the GeForce 256 acting as a co-processor to the CPU, frame rates improved by more than 50% while total system cost dropped, which helped it spread through the consumer market.
## GPU vs CPU
A CPU is optimized for minimum latency, i.e., "to be able to execute as many instructions as possible belonging to a single serial thread, in a given window of time". The processor must be able to switch quickly between operations. To keep latency low, a CPU devotes much of its silicon to supporting infrastructure: large caches so data is readily available for execution, many control units for out-of-order execution, and relatively few ALU cores. The CPU's ISA is designed in a generalized manner and can perform a wide range of operations.

While the CPU was designed for general-purpose computations and instructions, the GPU evolved for graphics computations, where the same computation needs to be performed on hundreds of thousands of pixels for 2D/3D rendering. GPUs are therefore optimized for maximum throughput, which is implemented with a large number of ALUs in a single architecture. The L2 cache can be kept small because, while data is being fetched from DRAM, GPU cores have plenty of other computations to work on, overlapping the memory stall time with massive parallelism. This is known as latency hiding.
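
As a concrete illustration of this throughput-oriented model, the hedged CUDA sketch below runs the same tiny computation on every pixel of a 1080p frame, one thread per pixel (the `to_grayscale` kernel and the luminance weights are just an example, not any particular driver's code).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles exactly one pixel, so "the same computation on
// hundreds of thousands of pixels" becomes a single kernel launch.
__global__ void to_grayscale(const uchar4 *rgba, unsigned char *gray,
                             int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    uchar4 p = rgba[y * width + x];
    // Standard luminance weights; the work per pixel is tiny, and the GPU
    // gets its throughput by running huge numbers of pixels concurrently.
    gray[y * width + x] =
        (unsigned char)(0.299f * p.x + 0.587f * p.y + 0.114f * p.z);
}

int main() {
    const int w = 1920, h = 1080;
    uchar4 *rgba;
    unsigned char *gray;
    cudaMallocManaged(&rgba, w * h * sizeof(uchar4));
    cudaMallocManaged(&gray, w * h);
    for (int i = 0; i < w * h; ++i) rgba[i] = make_uchar4(200, 100, 50, 255);

    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    to_grayscale<<<grid, block>>>(rgba, gray, w, h);   // ~2 million threads
    cudaDeviceSynchronize();

    printf("gray value of first pixel: %d\n", gray[0]);
    cudaFree(rgba);
    cudaFree(gray);
    return 0;
}
```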
## Evolution of GPU Architecture
GPUs were originally modeled on the concept of the graphics pipeline. The graphics pipeline is a conceptual model describing the stages that graphics data passes through as it is processed by the GPU and graphics software (such as OpenGL or DirectX). In essence, the pipeline converts 3D spatial coordinates into 2D pixel data for the display device. The "traditional fixed-function graphics pipeline" remains the commonly accepted model today.
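
The heart of that conversion, projecting a 3D point onto a 2D pixel, can be sketched in a few lines of code (a deliberately simplified example; real pipelines also apply model/view transforms, clipping, and rasterization, and the `project` helper here is hypothetical).

```cuda
#include <cstdio>

struct Vec3 { float x, y, z; };
struct Pixel { int x, y; };

// Minimal perspective projection: a point in camera space is divided by
// its depth, then mapped from normalized coordinates to screen pixels.
Pixel project(Vec3 p, float focal, int width, int height) {
    float ndc_x = focal * p.x / p.z;                // perspective divide
    float ndc_y = focal * p.y / p.z;
    Pixel out;
    out.x = (int)((ndc_x + 1.0f) * 0.5f * width);   // [-1, 1] -> [0, width)
    out.y = (int)((1.0f - ndc_y) * 0.5f * height);  // flip y for screen space
    return out;
}

int main() {
    Vec3 corner = {1.0f, 1.0f, 5.0f};               // a vertex 5 units in front of the camera
    Pixel px = project(corner, 1.0f, 1920, 1080);
    printf("vertex lands on pixel (%d, %d)\n", px.x, px.y);
    return 0;
}
```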
### 0th Generation
Since the release of NVIDIA's 9XX-series GPUs, the performance increase between generations has kept growing. From the 980 Ti to the 1080 Ti and the newly launched 2080 Ti, performance has more than doubled. AMD has also started to produce stronger GPUs such as the RX 580 and Vega 64, although these are still not at NVIDIA's level.

Recently, NVIDIA launched a new line of GPUs titled RTX, which includes higher-end cards such as the 2080 Ti, 2080, and 2070. RTX stands for ray tracing, a rendering technique that generates images by tracing the path of light through a scene. The more rays of light that are simulated, the more accurate the rendered image becomes, particularly for lighting effects and shadows.
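
The idea behind ray tracing can be boiled down to a toy intersection test, shown below as a hedged sketch (the `hits_sphere` helper, the hard-coded sphere, and the two sample rays are all illustrative; this is not how RTX hardware or any real renderer is implemented): each pixel gets a ray from the camera, and the renderer asks what that ray hits.

```cuda
#include <cstdio>

struct Vec3 { float x, y, z; };

// Toy ray tracing step: does a ray starting at the origin and pointing
// along `dir` hit a unit sphere centered at (0, 0, 3)?  Real ray tracers
// trace many such rays per pixel and add bounces, shadows and reflections.
bool hits_sphere(Vec3 dir) {
    Vec3 center = {0.0f, 0.0f, 3.0f};
    float radius = 1.0f;
    // Solve |t*dir - center|^2 = r^2 for t via the quadratic discriminant.
    float a = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
    float b = -2.0f * (dir.x * center.x + dir.y * center.y + dir.z * center.z);
    float c = center.x * center.x + center.y * center.y + center.z * center.z
              - radius * radius;
    return b * b - 4.0f * a * c >= 0.0f;
}

int main() {
    // Cast one ray through the middle of the image and one toward a corner.
    printf("center ray hit: %d\n", hits_sphere({0.0f, 0.0f, 1.0f}));  // 1: hit
    printf("corner ray hit: %d\n", hits_sphere({1.0f, 1.0f, 1.0f}));  // 0: miss
    return 0;
}
```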
## Basic Unified GPU Architecture Components
Unified GPU architectures are based on a parallel array of many programmable processors, wherein all stages of the graphics pipeline (vertex, geometry, rasterization, and pixel shader processing) as well as general parallel computations run on the same cores, in contrast with earlier GPUs. The processor array is tightly integrated with fixed-function processors for compression and decompression, rasterization, raster operations, texture filtering, anti-aliasing, video decoding, and HD video processing.

The architecture discussed below is focused on executing many parallel threads efficiently on many processor cores.
### Processor Array
A processor array consists of many processing cores. A unified GPU processor array is organized as a set of multi-threaded multiprocessors. Each thread executes on one of these multiprocessors, known as Streaming Multiprocessors (SMs), and each SM contains numerous streaming processors arranged in an array. All the processors connect to the DRAM partitions via an interconnection network.
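
You can ask the CUDA runtime how many SMs a particular card exposes; the short sketch below (assuming a CUDA-capable device 0 is present) prints the SM count that thread blocks get scheduled across.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Query how many streaming multiprocessors (SMs) the first GPU exposes.
// Thread blocks from a kernel launch are distributed across these SMs
// by the hardware scheduler.
int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("GPU: %s\n", prop.name);
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Max threads per SM: %d\n", prop.maxThreadsPerMultiProcessor);
    return 0;
}
```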
### Multi-Threading
As discussed earlier, the GPU is optimized for high throughput and latency hiding. Large-scale multi-threading hides the latency of memory loads from DRAM: while one thread is stalled waiting for a load or fetch instruction to complete, the processor can execute another thread. This large-scale multi-threading is also what lets the GPU support fine-grained parallel graphics shader programming models and fine-grained parallel computing programming models.
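
In CUDA this shows up as deliberately launching far more threads than the hardware has cores; the grid-stride SAXPY sketch below (kernel name and launch configuration are illustrative) gives the scheduler spare warps to run whenever others are waiting on DRAM.

```cuda
#include <cuda_runtime.h>

// Grid-stride loop: far more logical threads than physical cores stay
// resident, so whenever one warp stalls on a DRAM load the SM scheduler
// switches to another warp that is ready to run (latency hiding).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = a * x[i] + y[i];   // loads here overlap with other warps' math
    }
}

int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch many more threads than any GPU has cores; the surplus is what
    // gives the scheduler something to run while other warps wait on memory.
    saxpy<<<1024, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```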
### Multi-Processor Architecture
Besides multiple processor cores, an SM contains Special Functional Units, a multi-threaded instruction unit, instruction and constant caches, and a shared memory. Each core also includes a large multi-threaded register file (RF). Each streaming processor core contains both integer and floating-point arithmetic units, which together can handle most operations.
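
The shared memory mentioned here is directly programmable in CUDA; the block-level reduction below is a minimal sketch (the `block_sum` kernel and the 256-thread block size are illustrative) of threads in one SM cooperating through it instead of going back to DRAM for every partial result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread block gets a slice of the SM's on-chip shared memory.
// Here a block cooperatively sums 256 values in shared memory.
__global__ void block_sum(const float *in, float *out) {
    __shared__ float buf[256];            // lives in the SM's shared memory
    int tid = threadIdx.x;
    buf[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

int main() {
    const int blocks = 4, threads = 256, n = blocks * threads;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    block_sum<<<blocks, threads>>>(in, out);
    cudaDeviceSynchronize();
    printf("sum of block 0: %f\n", out[0]);   // expect 256.0

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```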
### SIMT
The streaming multiprocessors use a "single-instruction, multiple-thread" (SIMT) architecture. Instructions are executed in groups of parallel threads known as warps. The threads of a warp are of the same type and all start together at the same program address. The SIMT architecture is quite similar to SIMD: in SIMT, a particular instruction is executed by multiple parallel threads that can otherwise behave independently, while in SIMD the same instruction is executed across multiple data lanes in synchronous groups.
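
The warp is visible to the programmer through warp-level primitives; the hedged sketch below uses `__shfl_down_sync` to let the 32 lanes of a warp sum their values without touching shared or global memory (the kernel name and launch shape are made up for the demo).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Threads are issued in warps of 32 that share one instruction stream.
// A warp-level shuffle lets the 32 lanes of a warp combine values without
// touching shared or global memory.
__global__ void warp_sum() {
    int lane = threadIdx.x % warpSize;        // position within the warp
    int value = lane;                         // each lane contributes its index

    // Shift-down reduction across the warp: after the loop, lane 0 holds
    // the sum 0 + 1 + ... + 31 = 496.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        value += __shfl_down_sync(0xffffffff, value, offset);

    if (lane == 0)
        printf("warp %d total: %d\n", threadIdx.x / warpSize, value);
}

int main() {
    warp_sum<<<1, 64>>>();                    // one block holding two warps
    cudaDeviceSynchronize();
    return 0;
}
```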
### Streaming Processor
A streaming processor executes all the fundamental floating-point operations, as well as arithmetic, comparison, conversion, and logical PTX instructions.
### Special Functional Unit
Some thread instructions are executed on the SFUs at the same time as other thread instructions are being executed on the SPs.
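
On NVIDIA hardware the SFUs back the fast transcendental intrinsics; the small sketch below contrasts the standard `sinf` call with the SFU-accelerated `__sinf` and `__expf` approximations (a demo only, with an arbitrary input value).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The special function units handle transcendental operations. CUDA exposes
// them through fast intrinsics such as __sinf and __expf, which trade a
// little accuracy for speed compared with the standard sinf/expf calls.
__global__ void sfu_demo(float x) {
    printf("sinf(x)   = %f\n", sinf(x));     // full-precision path
    printf("__sinf(x) = %f\n", __sinf(x));   // SFU-backed fast approximation
    printf("__expf(x) = %f\n", __expf(x));
}

int main() {
    sfu_demo<<<1, 1>>>(0.5f);
    cudaDeviceSynchronize();
    return 0;
}
```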
## More Information:
- <a href='https://en.wikipedia.org/wiki/Graphics_processing_unit' target='_blank' rel='nofollow'>Wikipedia</a>
- <a href='https://www.openacc.org/' target='_blank' rel='nofollow'>OpenACC</a>
- <a href='https://developer.nvidia.com/cuda-zone' target='_blank' rel='nofollow'>CUDA</a>