VIDEO SERIES

OpenVX framework for heterogeneous compute on Jacinto™ processors

Learn how to use the OpenVX framework for heterogeneous computing


      Presenter(s)

      Hello, welcome to this presentation on the OpenVX framework. This is the first module in our series on OpenVX. This first video of the module provides an overview of the OpenVX framework, introducing the OpenVX graph and showing the efficiency that can be achieved by utilizing the OpenVX framework. Later modules will focus on TI's implementation of the OpenVX framework.

      OpenVX is an API for vision acceleration, although it can also be used for general compute and deep learning. This API is defined by Khronos, which is a consortium of different industry leaders, including semiconductor and software companies.

      OpenVX provides a high-level abstraction API that is targeted at real-time mobile and embedded platforms. The key advantage of OpenVX is that it performs algorithm acceleration across different compute engines via a unified graph framework API.

      Its goal is to enable performance portability across different architectures, and it extends portable vision acceleration to very low-power domains. By low power, what we mean is that an OpenVX implementation typically does not require a high-powered CPU or GPU architecture. The precision requirements, with respect to floating point, are typically lower than in OpenCL. All of this makes OpenVX friendly for embedded platforms.

      How is the OpenVX API defined? The basic concept in OpenVX is that of the OpenVX graph. OpenVX developers express a graph of various compute operations with each operation designated by a node. A node can run on any hardware or processor and can be coded in any language, such as C, C++, or OpenCL.
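      As a rough sketch of what this looks like in the OpenVX C API, the following builds a context, a graph, and a single node (a 3x3 Gaussian filter); the image size, format, and filter choice are illustrative placeholders, not part of the video's example.

```c
#include <VX/vx.h>

/* Minimal sketch: one graph containing a single node (3x3 Gaussian filter).
 * Image size and format are illustrative placeholders. */
int build_simple_graph(void)
{
    vx_context context = vxCreateContext();
    vx_graph   graph   = vxCreateGraph(context);

    /* Data objects that the node consumes and produces */
    vx_image input  = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
    vx_image output = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);

    /* Each operation in the graph is a node; the implementation maps it to
     * whichever compute engine supports that kernel. */
    vx_node node = vxGaussian3x3Node(graph, input, output);

    /* Check that the node was created successfully */
    if (vxGetStatus((vx_reference)node) != VX_SUCCESS)
        return -1;

    /* ... verify and process the graph here, then release objects ... */
    vxReleaseNode(&node);
    vxReleaseGraph(&graph);
    vxReleaseImage(&input);
    vxReleaseImage(&output);
    vxReleaseContext(&context);
    return 0;
}
```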

      The main advantage of specifying a graph is that it minimizes host interaction during graph execution. The host can set up the graph once, then execute it multiple times almost autonomously.
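      A minimal sketch of that set-up-once, execute-many pattern, assuming the graph has already been built and that the per-frame input update happens elsewhere:

```c
#include <VX/vx.h>

/* Sketch: verify the graph once at setup time, then execute it repeatedly.
 * The per-frame input update is omitted here. */
vx_status run_graph_loop(vx_graph graph, int num_frames)
{
    /* One-time verification: the implementation checks the graph and can
     * plan its optimizations (scheduling, memory reuse, fusion) up front. */
    vx_status status = vxVerifyGraph(graph);

    for (int i = 0; (status == VX_SUCCESS) && (i < num_frames); i++)
    {
        /* ... refresh the graph's input image with the next frame ... */

        /* Synchronous execution; vxScheduleGraph()/vxWaitGraph() could be
         * used instead for asynchronous execution. */
        status = vxProcessGraph(graph);
    }
    return status;
}
```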

      An example shown here is that of an optical flow graph, wherein the different boxes denote the different operations, and the arrows indicate the data that flows between them. So in this example, a color conversion is first done on the camera image, followed by extracting the Y channel of the YUV data, then generating an image pyramid and running optical flow, followed by corner tracking on the output.
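      One possible way to express such a pipeline with standard OpenVX kernels is sketched below; the image size, pyramid depth, keypoint arrays, and scalar parameters are illustrative assumptions, the previous frame's pyramid and points are assumed to be managed elsewhere (for example with vx_delay objects), and the corner tracking here is carried out by the pyramidal Lucas-Kanade optical flow node.

```c
#include <VX/vx.h>

/* Sketch of an optical-flow style pipeline built from standard OpenVX
 * kernels. Parameters are illustrative placeholders; releases of the
 * local handles are omitted for brevity. */
vx_graph build_optical_flow_graph(vx_context context,
                                  vx_image   rgb_in,      /* camera frame    */
                                  vx_pyramid prev_pyr,    /* previous frame  */
                                  vx_array   prev_points, /* tracked points  */
                                  vx_array   new_points)
{
    vx_uint32 width = 640, height = 480;
    vx_graph  graph = vxCreateGraph(context);

    /* Intermediate data as virtual objects, visible only inside the graph */
    vx_image   yuv  = vxCreateVirtualImage(graph, width, height, VX_DF_IMAGE_IYUV);
    vx_image   gray = vxCreateVirtualImage(graph, width, height, VX_DF_IMAGE_U8);
    vx_pyramid pyr  = vxCreateVirtualPyramid(graph, 4, VX_SCALE_PYRAMID_HALF,
                                             width, height, VX_DF_IMAGE_U8);

    /* Scalars parameterizing the Lucas-Kanade node */
    vx_float32 eps_val = 0.01f; vx_uint32 iters_val = 5; vx_bool use_est = vx_false_e;
    vx_scalar epsilon      = vxCreateScalar(context, VX_TYPE_FLOAT32, &eps_val);
    vx_scalar num_iter     = vxCreateScalar(context, VX_TYPE_UINT32,  &iters_val);
    vx_scalar use_estimate = vxCreateScalar(context, VX_TYPE_BOOL,    &use_est);

    /* The operations from the example, each added as one node */
    vxColorConvertNode(graph, rgb_in, yuv);                      /* color convert   */
    vxChannelExtractNode(graph, yuv, VX_CHANNEL_Y, gray);        /* extract Y       */
    vxGaussianPyramidNode(graph, gray, pyr);                     /* image pyramid   */
    vxOpticalFlowPyrLKNode(graph, prev_pyr, pyr,                 /* optical flow /  */
                           prev_points, prev_points, new_points, /* corner tracking */
                           VX_TERM_CRITERIA_BOTH, epsilon, num_iter,
                           use_estimate, 9);

    return graph;
}
```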

      This whole graph of five nodes represents a sample vision processing application. There are several inherent attributes of the OpenVX framework that enable efficient processing. While these optimizations are allowed for by the OpenVX specification, it is up to the vendor to implement each of them.

      The key to many of these optimizations is OpenVX's graph-based model. First, the framework provides the ability to schedule graph executions, allowing the system to split the execution of different parts of the graph across different processing elements. This ultimately enables faster execution of the entire application, or lower power consumption by utilizing the most efficient hardware for a given task.
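      As an illustration, OpenVX 1.2 adds vxSetNodeTarget() for hinting which processing element should run a given node; the target strings below are hypothetical placeholders, since the valid names are defined by each vendor's implementation.

```c
#include <VX/vx.h>

/* Sketch: hint that two nodes of the same graph should run on different
 * compute engines. The target strings are hypothetical placeholders;
 * valid names are defined by the vendor's OpenVX implementation. */
void assign_targets(vx_node preprocessing_node, vx_node vision_node)
{
    vxSetNodeTarget(preprocessing_node, VX_TARGET_STRING, "DSP-1");
    vxSetNodeTarget(vision_node,        VX_TARGET_STRING, "VISION-ACCEL");
}
```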

      Next, the standard OpenVX data objects are opaque, meaning that their data can only be accessed after explicitly asking for such access. OpenVX provides a standard API for accessing this memory which abstracts the cache maintenance operations that are required.
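      A rough sketch of that explicit-access pattern using vxMapImagePatch() and vxUnmapImagePatch(); the single-plane U8 image and read-only access are assumptions made for illustration.

```c
#include <VX/vx.h>

/* Sketch: explicitly request host access to an opaque vx_image, read pixels,
 * then release the mapping so the framework can manage caches and ownership. */
vx_status read_image_pixels(vx_image image)
{
    vx_rectangle_t rect;
    vxGetValidRegionImage(image, &rect);

    vx_map_id map_id;
    vx_imagepatch_addressing_t addr;
    void *base = NULL;

    /* Ask for read-only host access to plane 0 */
    vx_status status = vxMapImagePatch(image, &rect, 0, &map_id, &addr, &base,
                                       VX_READ_ONLY, VX_MEMORY_TYPE_HOST, 0);
    if (status != VX_SUCCESS)
        return status;

    /* Example: read the first pixel (assumes a U8 image) */
    vx_uint8 *pixel = (vx_uint8 *)vxFormatImagePatchAddress2d(base, 0, 0, &addr);
    (void)pixel;

    /* Give the data back; cache maintenance is handled by the framework */
    return vxUnmapImagePatch(image, map_id);
}
```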

      Additionally, intermediate data objects can be created as virtual objects and can therefore even be optimized out. The system has more possibilities to cleverly reuse memory, yielding both faster and lower-power execution.
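      A small sketch of declaring an intermediate result as a virtual image; the two filter nodes here are arbitrary placeholder choices.

```c
#include <VX/vx.h>

/* Sketch: an intermediate image created as a virtual object. It cannot be
 * mapped by the host, so the implementation is free to reuse its memory or
 * optimize it away entirely (for example when fusing the two filter nodes). */
void build_two_stage_filter(vx_graph graph, vx_image input, vx_image output)
{
    /* Width/height of 0 and VX_DF_IMAGE_VIRT let the framework infer them */
    vx_image tmp = vxCreateVirtualImage(graph, 0, 0, VX_DF_IMAGE_VIRT);

    vxGaussian3x3Node(graph, input, tmp);  /* stage 1 writes the virtual image */
    vxMedian3x3Node(graph, tmp, output);   /* stage 2 reads it                 */

    vxReleaseImage(&tmp);                  /* the graph keeps its own reference */
}
```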

      Next, by defining a graph up front, the framework can see which operations are going to follow which other operations. This allows it to merge the processing functions, or kernels. Instead of processing every pixel of an image with kernel A, storing intermediate results, and then processing them with kernel B, the system may apply A and B at once as it iterates over the pixels.

      This can yield huge savings: far fewer memory accesses, better memory locality, and lower kernel overhead, as fewer kernels need to be launched.
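      To make the idea concrete, here is a simplified plain-C illustration, not OpenVX API, of what such a fusion amounts to for two hypothetical per-pixel kernels a() and b():

```c
#include <stddef.h>

/* Hypothetical per-pixel kernels, purely for illustration */
static unsigned char a(unsigned char p) { return (unsigned char)(p / 2); }
static unsigned char b(unsigned char p) { return (unsigned char)(p + 10); }

/* Unfused: two full passes and an intermediate buffer in memory */
void run_unfused(const unsigned char *in, unsigned char *tmp,
                 unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = a(in[i]);
    for (size_t i = 0; i < n; i++) out[i] = b(tmp[i]);
}

/* Fused: one pass, no intermediate buffer, better locality */
void run_fused(const unsigned char *in, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = b(a(in[i]));
}
```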

      Finally, by tiling the image, one can further increase data locality and benefit from better caching behavior. For instance, imagine an accelerator that has some local memory, but not enough for an entire image.

      By defining a graph up front, the framework might be able to process kernel after kernel over a single tile, keeping all the data in a local buffer and avoiding the expensive trips back and forth to main memory for the intermediate data.

      After one tile has been processed, the set of kernels can be applied to the next tile, until the whole image has been processed. While all these optimizations are possible within OpenVX, it is up to the vendor to implement such optimizations.
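      In the same spirit, a plain-C sketch of the tile-by-tile execution pattern, with a small buffer standing in for the accelerator's local memory and the same hypothetical kernels as above; this illustrates the concept only, not TI's implementation.

```c
#include <stddef.h>
#include <string.h>

#define TILE_SIZE 4096  /* illustrative local-memory budget, in pixels */

/* Hypothetical per-pixel kernels, as in the fusion sketch above */
static unsigned char a(unsigned char p) { return (unsigned char)(p / 2); }
static unsigned char b(unsigned char p) { return (unsigned char)(p + 10); }

/* Process the image tile by tile: both kernels run over one tile while it
 * sits in the local buffer, so intermediate data never touches main memory. */
void run_tiled(const unsigned char *in, unsigned char *out, size_t n)
{
    unsigned char local[TILE_SIZE];  /* stands in for accelerator-local memory */

    for (size_t start = 0; start < n; start += TILE_SIZE)
    {
        size_t len = (n - start < TILE_SIZE) ? (n - start) : TILE_SIZE;

        memcpy(local, in + start, len);           /* bring one tile in    */
        for (size_t i = 0; i < len; i++)
            local[i] = b(a(local[i]));            /* kernel after kernel  */
        memcpy(out + start, local, len);          /* write the result back */
    }
}
```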

      This concludes our overview of the OpenVX framework. The next video in this module provides a walk-through of a sample OpenVX application. The source code for TI's OpenVX framework is included within the Processor SDK Automotive.

      To download, please refer to the link shown. Thank you for taking the time to watch this presentation. If you have any questions, please post them on the Texas Instruments E2E forum at e2e.ti.com.
