CPUs and GPUs can get extremely hot, and exceeding their thermal design point can cause irreversible damage. To keep these devices from overheating, cooling systems such as heat sinks, fans, and sometimes expensive liquid cooling systems are integrated into the system. Along with these cooling systems, thermal safety margins are implemented to ensure the processing units do not exceed their limits. When the temperature of a CPU or GPU approaches the thermal limit, the clock speed is throttled to protect the device, which degrades its performance. The less accurate the temperature measurement, the larger the safety margin must be, so the processor starts to slow down earlier than necessary. Because of this, more accurate temperature data is crucial to optimizing a processor's performance and, in turn, the overall user experience.
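The relationship between sensor accuracy and early throttling can be illustrated with a minimal sketch. It assumes a simple model in which the throttle threshold is the thermal limit minus the worst-case sensor error; the function names and the temperature values are hypothetical and are not taken from any particular processor or sensor.

```python
def throttle_threshold_c(thermal_limit_c: float, sensor_error_c: float) -> float:
    """Highest sensor reading that still guarantees the true temperature
    is at or below the thermal limit (assumed model: limit minus
    worst-case sensor error)."""
    return thermal_limit_c - sensor_error_c


def should_throttle(reading_c: float, thermal_limit_c: float,
                    sensor_error_c: float) -> bool:
    """Throttle once the reading reaches the margin-adjusted threshold."""
    return reading_c >= throttle_threshold_c(thermal_limit_c, sensor_error_c)


# Hypothetical values: a 100 degC limit read by two different sensors.
coarse = throttle_threshold_c(100.0, 5.0)  # +/-5 degC sensor
fine = throttle_threshold_c(100.0, 1.0)    # +/-1 degC sensor
print(coarse, fine)
```

Under these assumptions, a reading of 96 °C forces throttling with the less accurate sensor but not with the more accurate one, which is the headroom that better temperature data recovers.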
A CPU is the main chip on the motherboard or main board of a PC, smartphone, or other electronic device. Each CPU has three main components: an Arithmetic Logic Unit (ALU), a Control Unit (CU), and a memory unit. Today, it is common to find CPUs with multiple cores, typically between 2 and 16, each with its own ALU, CU, and memory. The CPU receives instructions from a program or application and executes them. These instructions can be tasks such as basic arithmetic computations, numeric comparisons, or memory movement and storage. How fast these instructions can be processed is limited by the number of cores and the clock rate of the CPU. The clock rate is measured in GHz and is the number of clock cycles the CPU completes per second, which sets the pace at which instructions are executed. Therefore, all else being equal, the faster the clock rate, the faster the CPU can process instructions and the greater its performance.
A GPU is a processing unit that is specifically designed to handle graphics. Similar to a CPU, a GPU consists of multiple cores that have ALUs, CUs, and memory units. The main difference is that GPUs are architected to have hundreds to thousands of cores, which makes them well suited to parallel, throughput-oriented computing. How fast a GPU can process data is also limited by its clock rate.
A system on chip (SoC) is an integrated circuit that consists of multiple electrical components on a single platform. The CPU and GPU are usually integrated into an SoC. SoCs are much smaller than the multi-chip designs found in PCs or notebooks and consume less power, but are typically much slower. Because of their smaller size and lower power consumption, they are generally used in smartphones, tablets, and other mobile devices.
To understand why CPUs and GPUs overheat, we have to look at these devices at the transistor level. There can be up to 100 million transistors per square millimeter on a processing unit. Each transistor acts as a switch that allows or stops current from flowing through it. When transistors switch, they pass through a more resistive state. Any current passing through a resistance produces I²R losses, which generate heat. At higher clock rates, the transistors pass through the more resistive state more often, which creates more heat. This self-heating can damage the processor if it is not monitored accurately.
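The link between clock rate and self-heating can be sketched numerically. The example below is a simplified illustration, not a device model: it assumes each transistor dissipates I²R power only for a brief transition time and switches once per clock cycle. The function names and input values are hypothetical.

```python
def switching_energy_j(current_a: float, resistance_ohm: float,
                       transition_time_s: float) -> float:
    """Energy lost in one switching event: I^2 * R power sustained
    for the duration of the resistive transition."""
    return current_a ** 2 * resistance_ohm * transition_time_s


def heating_power_w(energy_per_switch_j: float, clock_hz: float,
                    switching_transistors: int = 1) -> float:
    """Average heat generated, assuming one switching event per
    transistor per clock cycle (an illustrative simplification)."""
    return energy_per_switch_j * clock_hz * switching_transistors


# Hypothetical round numbers chosen only to make the arithmetic easy to follow.
energy = switching_energy_j(current_a=2.0, resistance_ohm=3.0,
                            transition_time_s=0.5)
print(heating_power_w(energy, clock_hz=10.0))
print(heating_power_w(energy, clock_hz=20.0))
```

In this simplified model, doubling the clock rate doubles the number of resistive transitions per second and therefore doubles the heat generated.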