The USPTO has granted Apple a patent for "Multi-dimensional thread grouping for multiple processors." The patent describes a way for an application to perform a task using any available processing resources, such as CPUs (Central Processing Units) and one or more GPUs (Graphics Processing Units), that are capable of performing the task.
Apple’s Patent Background:
As GPUs continue to evolve into high performance parallel computing devices, more and more applications are written to perform data parallel computations in GPUs similar to general purpose computing devices. Today, these applications are designed to run on specific GPUs using vendor specific interfaces. Thus, these applications are not able to leverage processing resources of CPUs even when both GPUs and CPUs are available in a data processing system. Nor can processing resources be leveraged across GPUs from different vendors where such an application is running.
However, as more and more CPUs embrace multiple cores to perform data parallel computations, more and more processing tasks can be supported by CPUs, GPUs, or both, whichever are available. Traditionally, GPUs and CPUs are configured through separate programming environments that are not compatible with each other. Most GPUs require dedicated programs that are vendor specific. As a result, it is very difficult for an application to leverage the processing resources of both CPUs and GPUs, for example, by combining GPUs that have data parallel computing capabilities with multi-core CPUs.
Therefore, there is a need in modern data processing systems to overcome the above problems to allow an application to perform a task using any available processing resources, such as CPUs and one or more GPUs, capable of performing the task.
An embodiment of the present invention includes methods and apparatuses that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit. The target processing units include GPUs and CPUs. Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
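To make the partitioning step concrete, here is a minimal Python sketch. It is not taken from the patent: it assumes the multi-dimensional global thread number is a tuple of per-dimension counts, and that a per-dimension thread group size divides it evenly. The names `total_threads` and `group_counts` are hypothetical.

```python
from math import prod


def total_threads(global_size):
    """Total thread count implied by a multi-dimensional global thread number."""
    return prod(global_size)


def group_counts(global_size, group_size):
    """Number of thread groups along each dimension.

    Illustrative assumption: each group dimension must evenly divide the
    matching global dimension; otherwise a ValueError is raised.
    """
    if len(global_size) != len(group_size):
        raise ValueError("global and group sizes must have the same rank")
    counts = []
    for g, s in zip(global_size, group_size):
        if s <= 0 or g % s != 0:
            raise ValueError(f"group dimension {s} does not divide {g}")
        counts.append(g // s)
    return tuple(counts)


# Example values chosen for illustration only.
global_size = (1024, 768)   # two-dimensional global thread number
group_size = (16, 8)        # threads per group in each dimension

print(total_threads(global_size))             # 786432
print(group_counts(global_size, group_size))  # (64, 96)
```

In this example, 64 × 96 = 6144 groups of 128 threads each cover all 786432 threads exactly once.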
In an alternative embodiment, thread group sizes for one or more target processing units for executing executable codes compiled from a single source are determined in response to an API request from an application running in a host processor. The one or more target processing units include GPUs and CPUs coupled to the host processor to execute the executable codes in parallel. The one or more executable codes are loaded into the one or more target processing units according to the thread group sizes for concurrent execution to optimize runtime resource usage.
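One way to picture how a thread group size might be determined per target processing unit is the Python sketch below. The patent summary does not say how sizes are chosen, so the greedy heuristic, the function name `pick_group_size`, and the per-device limits are all assumptions made for illustration.

```python
def pick_group_size(global_size, max_threads_per_group):
    """Greedy heuristic (an assumption, not the patent's method).

    For each dimension in order, take the largest divisor of the global
    dimension that still fits within the remaining per-group thread budget.
    """
    if max_threads_per_group < 1:
        raise ValueError("max_threads_per_group must be at least 1")
    remaining = max_threads_per_group
    chosen = []
    for g in global_size:
        best = 1
        for d in range(1, min(g, remaining) + 1):
            if g % d == 0:
                best = d  # keep the largest divisor seen so far
        chosen.append(best)
        remaining //= best  # shrink the budget for later dimensions
    return tuple(chosen)


# Hypothetical per-device limits on threads per group.
device_limits = {"cpu": 1, "gpu": 128}

global_size = (64, 48)
for name, limit in device_limits.items():
    print(name, pick_group_size(global_size, limit))
# cpu (1, 1)
# gpu (64, 2)
```

The same global thread number yields a different grouping for each device, which is the general idea behind determining thread group sizes separately for each target processing unit.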
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description that follows.