Acoustic echo cancellation: basics, principles, and implementation, presented by Arash Loloee. In this video, we will go over the basics of acoustic echo cancellation, its operation, and the needed components for its implementation. Before doing so, we need to understand what acoustic echo is. Let's take a look at a typical conference call, where we have a microphone and a speaker on both ends of the call.

The speech from the far end caller is broadcast by the speakerphone or the hands-free cellular phone to the near end callers, and then bounces off the inside surfaces of the room, which we refer to as echo. This echo is picked up by the near end microphone, creating a feedback loop where the far end caller hears the echo of his or her own voice.

To solve this problem, we usually use digital signal processing to stop the feedback and allow full duplex communication. This process is referred to as acoustic echo cancellation, or simply AEC. Many people use AEC systems every day without realizing it. When we use the speakerphone on our home telephone, use Zoom with our computer, or participate in a conference call at our office, we are using an AEC system in some way.

There are three main technologies used in an AEC: the normalized least mean squared (NLMS) algorithm, nonlinear processing, and a center clipper. The normalized least mean squared algorithm is an adaptive filtering technique designed to perform echo cancellation and is the heart of the operation. We also have nonlinear processing, which is applied to provide residual echo suppression where normalized least mean squared alone is not sufficient. The center clipper is used to make the low level residual echo completely inaudible.

There are other components that are used in AEC, such as the comfort noise generator, which is implemented to mask the effects of nonlinear processing mode changes and to eliminate any dead air effect. This effect, if not addressed by the comfort noise generator, can confuse the listener into thinking the line has temporarily been dropped. We will go over these components, and for this purpose, we will use the Texas Instruments AIC3254.

The AEC algorithm is implemented in the AIC3254 dual-core processor. The chip is a flexible, low-power, low-voltage stereo audio codec with programmable inputs and outputs, fixed and parameterizable signal processing blocks, integrated PLL, integrated LDOs, and flexible digital interfaces. Let's take a look at the various components of an AEC system in more detail. To start, we will look at the normalized least mean squared filter.

In this figure, the far end signal is designated by Y of n, and its reflection in the near end is represented by R of n. X of n is the near end signal, which is superimposed with the undesired echo picked up by the microphone in the room. The received far end signal is available as a reference signal for the echo canceler and is used by the canceler to generate a replica of the echo, or an estimated echo, represented by R hat of n.

The echo canceler generates the echo replica by applying the reference signal to an adaptive transversal filter. This replica is subtracted from the near end signal plus echo to yield the transmitted near end signal. Ideally, the residual echo error, E of n, which is the difference between R of n and R hat of n, will be very small after echo cancellation.
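
To make the relationship concrete, using the same symbols in conventional notation (and not anything specific to the AIC3254), the echo replica and the residual error can be written roughly as:

  r_hat(n) = sum over k of w_k * y(n - k)      (echo replica produced by the adaptive FIR filter)
  e(n) = [x(n) + r(n)] - r_hat(n)              (signal transmitted to the far end)

so e(n) approaches the clean near end signal x(n) as r_hat(n) approaches the true echo r(n).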

If the transversal filter's transfer function is identical to that of the echo path, then the echo replica will be identical to the echo, thus achieving total cancellation. Ideally, the echo is completely removed and only the near end transmit signal is passed to the far end. However, due to dynamic changes in the uplink signal and nonlinear distortion caused by the speaker amplifier and loudspeaker, along with various changes in the acoustic environment at the near end, the normalized least mean squared estimate may not exactly match the echo.

The normalized least mean squared filter adapts its coefficients to converge the estimated echo toward the real echo. In general, AEC technology minimizes the effects of echo by using cancellation, suppression, and masking techniques. The AEC adaptive filter implements a least mean squared algorithm and an adaptive finite impulse response filter.

The algorithm uses the previous sample values and error to update the FIR filter's coefficients. It then uses the updated coefficients and the latest sample values to calculate the FIR filter's output. This output is used to calculate the next error. The basic operations in a least mean squared adaptive filter are multiplication, accumulation, and a memory shift.

Since the algorithm requires old memory data to update the coefficients before doing the filtering, always make the data buffer one tap longer than the coefficients buffer to retain one old data set after the memory shift. This allows you to use old memory values to calculate new coefficient values. The rate at which the filter convergence occurs is controlled by two configurable parameters.
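
As a rough illustration of these operations, here is a minimal C sketch of one sample period of such an adaptive filter, including the extra history sample retained after the memory shift. The buffer sizes, variable names, and step size are illustrative assumptions, not the AIC3254 implementation.

#define NUM_TAPS 256                    /* assumed adaptive filter length          */

static float w[NUM_TAPS];               /* adaptive FIR coefficients               */
static float ybuf[NUM_TAPS + 1];        /* far end history, one tap longer than w  */

/* Process one far end sample y and one microphone sample mic; returns e(n). */
float nlms_step(float y, float mic, float mu)
{
    static float prev_err = 0.0f;       /* error from the previous sample          */

    /* Memory shift: push in the newest far end sample. Because ybuf is one tap
       longer than w, the previous data set ybuf[1..NUM_TAPS] is still intact.  */
    for (int k = NUM_TAPS; k > 0; k--)
        ybuf[k] = ybuf[k - 1];
    ybuf[0] = y;

    /* Update the coefficients with the old data set and the previous error.    */
    float power = 1e-6f;                /* small bias avoids division by zero     */
    for (int k = 0; k < NUM_TAPS; k++)
        power += ybuf[k + 1] * ybuf[k + 1];
    float g = mu * prev_err / power;    /* normalized step                        */
    for (int k = 0; k < NUM_TAPS; k++)
        w[k] += g * ybuf[k + 1];

    /* Filter with the updated coefficients and the latest samples (multiply and
       accumulate) to get the echo estimate, then compute the new error.        */
    float r_hat = 0.0f;
    for (int k = 0; k < NUM_TAPS; k++)
        r_hat += w[k] * ybuf[k];

    float e = mic - r_hat;              /* residual echo error e(n)               */
    prev_err = e;
    return e;
}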

There is a faster step size parameter that is automatically applied if the near end microphone signal has only acoustic echo, and a slower step size parameter that is automatically applied if the near end microphone signal has both acoustic echo and near end speech. If acoustic echo is not detected in the microphone input at all, then no filter adaptation will be made.

The convergence properties of the AEC algorithms are largely determined by the step size parameter and the power of the far end signal. In general, making the step size larger speeds the convergence, while a smaller step size reduces the asymptotic cancellation error. The convergence time constant is inversely proportional to the power of Y of n. So the algorithms converge very slowly for lower power signals.
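
As a minimal sketch of how these two configurable parameters might be applied, assuming simple boolean detector outputs and made-up values (not AIC3254 register settings):

/* Pick the NLMS step size based on what the detectors report in the
   near end microphone signal; the values are illustrative assumptions. */
float select_step_size(int echo_present, int near_end_speech_present)
{
    const float MU_FAST = 0.5f;     /* echo only: adapt quickly        */
    const float MU_SLOW = 0.05f;    /* echo plus near end speech       */

    if (!echo_present)
        return 0.0f;                /* no echo detected: no adaptation */
    return near_end_speech_present ? MU_SLOW : MU_FAST;
}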

The next important component is the nonlinear processor. When the normalized least mean squared algorithm cannot completely remove echo, the nonlinear processor controller will suppress any remaining residual echo. The nonlinear processor controller initially classifies the communication scenario into one of the following four modes and then applies an appropriate amount of attenuation to the transmitter and receiver signals to make the residual echo less audible.

The four modes are idle, where neither the far end nor the near end is talking; double talk, where the far end and near end are talking at the same time; far end, where only the far end is talking; and near end, where only the near end is talking. The aggressiveness of the NLP algorithm can be controlled. If the NLP algorithm is too aggressive, the terminal will provide half duplex communication and will not allow a near end talker to break in during the conversation while a far end user is talking.

On the other hand, an under-aggressive setting may not provide enough suppression. Therefore, the receiver and transmitter attenuation levels for each mode are configurable. The NLP controller has three tunable parameters: receiver attenuation during double talk, transmitter attenuation during double talk, and transmitter attenuation during far end single talk. The recommended NLP controller values for different scenarios are shown in this table.
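
A minimal sketch of such a controller is shown below, assuming simple boolean activity flags. The mode names follow the video, but the gain values are placeholders for the configurable parameters, not the recommended table values.

typedef enum { MODE_IDLE, MODE_DOUBLE_TALK, MODE_FAR_END, MODE_NEAR_END } nlp_mode_t;

typedef struct {
    float rx_gain;    /* gain applied to the receive (loudspeaker) path */
    float tx_gain;    /* gain applied to the transmit (microphone) path */
} nlp_gains_t;

/* Classify the call into one of the four modes and return the per-mode
   attenuation; all gain values here are illustrative placeholders.     */
nlp_gains_t nlp_control(int far_active, int near_active)
{
    nlp_mode_t mode;
    if (far_active && near_active)  mode = MODE_DOUBLE_TALK;
    else if (far_active)            mode = MODE_FAR_END;
    else if (near_active)           mode = MODE_NEAR_END;
    else                            mode = MODE_IDLE;

    switch (mode) {
    case MODE_FAR_END:     return (nlp_gains_t){ 1.0f, 0.05f };  /* strongly attenuate residual echo */
    case MODE_DOUBLE_TALK: return (nlp_gains_t){ 0.7f, 0.7f  };  /* milder attenuation both ways     */
    case MODE_NEAR_END:    return (nlp_gains_t){ 0.5f, 1.0f  };
    default:               return (nlp_gains_t){ 1.0f, 1.0f  };  /* idle                             */
    }
}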

A center clipper module is applied to replace small amplitude audio samples with zeros to make low level residual echo completely inaudible. The NLP block applies attenuation to the error signal during far end single talk mode, so its output does not contain echo components with audible energy. The application of the large attenuation is typically controlled by the double talk detector only. The center clipper attenuation is configurable.
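
A center clipper reduces to a few lines of code; this sketch assumes a configurable threshold passed in by the caller.

#include <math.h>

/* Replace samples whose magnitude falls below the configurable threshold with
   zero so that low level residual echo becomes completely inaudible.          */
float center_clip(float sample, float threshold)
{
    return (fabsf(sample) < threshold) ? 0.0f : sample;
}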

A Comfort Noise Generator, or CNG, is implemented to mask the effects of NLP mode changes, as well as to eliminate any dead air effect. This effect, if not addressed by the CNG, can confuse a listener into thinking the line has temporarily been dropped. The CNG inserts comfort noise during periods when there is no signal received. The CNG estimates the near end background noise level and generates pink noise that matches the level of the estimated background noise.

The main purpose of CNG is to make background noise sound smoother in the transmitter. There are coefficients in the CNG that define the shape of the noise generated to mask NLP transients and eliminate the dead air effect. If the comfort noise is well matched to the transmitted background noise during speech periods, then the near end party does not notice the switching during the conversation. Since the noise changes constantly, the CNG should be updated regularly.
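
As an illustration only, a CNG could be sketched as white noise shaped toward a pink spectrum and scaled to the estimated background level; the Kellet style filter constants and the scaling used here are assumptions, not the codec's actual generator.

#include <stdlib.h>

/* Generate one comfort noise sample at roughly the estimated background level. */
float cng_sample(float noise_level_est)
{
    static float b0, b1, b2;                          /* pinking filter state      */
    float white = 2.0f * ((float)rand() / (float)RAND_MAX) - 1.0f;

    /* 3-pole pink noise approximation (Paul Kellet's economy filter). */
    b0 = 0.99765f * b0 + white * 0.0990460f;
    b1 = 0.96300f * b1 + white * 0.2965164f;
    b2 = 0.57000f * b2 + white * 1.0526913f;
    float pink = b0 + b1 + b2 + white * 0.1848f;

    return noise_level_est * 0.1f * pink;             /* scale toward the estimate */
}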

Speech detection is a very important part of AEC. It must be done before the software can determine whether to filter, update, both filter and update, or simply freeze the coefficients of the adaptive filter. There are three speech detectors: the far end speech detector, the double talk detector, and the near end speech detector.

Near end speech detection: this portion of the algorithm acts as a control input to the coefficient adaptation algorithm. It basically decides when the coefficient update process should be frozen. Far end speech detection: far end speech detection means that only the far end speaker is active. This is the only time the AEC program performs both filtering and updating. Very short-term power estimates of the far end and near end speech signals are used to determine if far end speech is present.

Double talk detection: in AEC algorithms, the presence of both far end speech and near end speech is known as double talk. After double talk is detected, the program freezes the FIR filter's coefficient updates. However, filtering is still done. The speech detection software always checks for the presence of far end speech first, then moves on to double talk detection. It performs double talk detection even if it does not detect far end speech; this avoids false detection due to the small signal level of far end speech.
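
A minimal sketch of how these detectors could work from short term power estimates is shown below; the smoothing factor, thresholds, and the Geigel style double talk test are assumptions for illustration, not the product's tuning values.

typedef struct {
    float far_pow;     /* smoothed far end signal power       */
    float near_pow;    /* smoothed near end microphone power  */
} detect_state_t;

/* Update the power estimates with one sample pair and set the activity flags. */
void detect_speech(detect_state_t *s, float y, float mic,
                   int *far_active, int *double_talk)
{
    const float ALPHA      = 0.01f;    /* smoothing factor for power estimates */
    const float FAR_THRESH = 1e-4f;    /* far end activity threshold           */

    s->far_pow  += ALPHA * (y * y     - s->far_pow);
    s->near_pow += ALPHA * (mic * mic - s->near_pow);

    /* Check for far end speech first. */
    *far_active = (s->far_pow > FAR_THRESH);

    /* Geigel style test: if the microphone power is larger than echo alone
       could explain, assume the near end talker is also active.            */
    *double_talk = (s->near_pow > 0.5f * s->far_pow) && (s->near_pow > FAR_THRESH);
}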

The realization of the AEC algorithm on the AIC3254, which is implemented in the PurePath Studio Portable Audio Environment, is shown here. This figure shows the process flow for implementation of AEC in a typical cell phone application. There are several components in PurePath Studio, and each has a specific role in the AEC algorithm.

For example, AEC Framework determines the register settings for the AIC3254 necessary for proper AEC operation, such as the registers that control clocking, input/output, and power supply operation. Another component in the flow is AEC 32, which implements AEC using a 32 millisecond, 256 tap adaptive FIR normalized least mean squared filter. For more information about the process flow and components, you can visit the AIC3254 product page.
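
The 32 millisecond figure follows directly from the tap count: assuming the filter runs at an 8 kHz sampling rate, 256 taps span 256 / 8000 = 0.032 seconds of far end history, which sets the longest echo path delay the adaptive filter can model.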

The single most important feature when evaluating AEC is full duplex performance. One needs to characterize the speaker and microphone enclosure performance prior to AEC tuning. This is to set the bounds for the best achievable double talk performance. There are several steps that need to be performed for characterizing the speaker and microphone enclosure performance prior to AEC tuning. Let's go over these steps.

The speaker loudness test obtains the received loudness rating. Received loudness rating is the loudness loss in the receive direction from the digital reference point to the ear reference point. Usually, the end customer provides a spectral mask of how loud they want the speaker to be. This test makes sure the speaker conforms to the loudness specifications.

The microphone path performance assessment obtains the send loudness rating. This characterizes the loudness of the transmitted audio sent through the microphone by comparing the amplitude of the sound waves entering the microphone to the amplitude of the resulting audio signal at the digital reference point. Send loudness rating is measured in dBV per pascal.

Echo return loss is the reduction in echo. The higher the echo return loss, the better, since a higher ratio corresponds to a smaller amount of echo. Note that the echo return loss is the attenuation of echo resulting from the enclosure design and not the AEC algorithm. Echo return loss is measured in decibels.
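
As a standard reference formula (not specific to this video), echo return loss can be computed from the power of the received far end signal and the power of the echo picked up by the microphone: ERL = 10 * log10(P_received / P_echo) dB. For example, if the echo power is one hundredth of the received power, the enclosure provides 20 dB of echo return loss.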

Speaker harmonic distortion affects the ability of the device to achieve full duplex operation. Duplex operation refers to the ability of the system to simultaneously transmit and receive audio. Echo nonlinearity measures the nonlinear coupling between speaker and microphone that sets the bound on duplex performance.

After the above five steps are performed, the AEC tuning is done to ensure the best system performance can be obtained. The results help to narrow down the performance limitations in the mechanical design, as well as help to understand the performance boundaries after AEC tuning. To find more audio resources and products, visit ti.com.
