Classes

struct Array
    Represents a buffer of values on the GPU.
struct Bindings
    Represents an ordered collection of WGPUBuffers (wrapped as tensors, non-overlapping views, or arrays) for the purpose of binding them to a kernel operation to make them accessible to the GPU kernel.
struct CallbackData
    Used for on-done callback data for asynchronous operations such as kernel launching.
struct Context
    Represents a GPU context; aggregates the WebGPU API handles to interact with the GPU, including the instance, adapter, device, and queue.
struct CopyData
    Staging buffer and callback data for copying data between the GPU and CPU.
struct Kernel
    Represents handles + metadata for a reusable kernel on the GPU. The struct members can be divided into "consumed upon dispatch" (commandBuffer) and reusable ahead-of-time setup (all other members).
struct KernelCode
    KernelCode is the representation of WGSL GPU code with template substitutions applied. It is a type around the code string with additional metadata for workgroup size and precision, since these are specified in the WGSL code. Additionally, label and entryPoint are used by createKernel() to specify the label and entry point of the kernel.
struct KernelPool
    A pool of kernels to manage GPU resources. For simple use cases this is instantiated as a member in the Context struct, although it is possible to have multiple resource pools of kernels in more complex scenarios.
struct Logger
    Logger struct for logging messages. stream: the stream to log to. buffer: a buffer to store the formatted message. level: the log level to log messages at.
struct NoParam
    NoParam is a no-op type used to indicate that a kernel does not have any parameters.
struct Shape
    Represents the shape of a tensor.
struct Tensor
    Represents a tensor on the GPU, which is a buffer of values with a shape.
struct TensorPool
    Represents a pool of tensors to manage GPU resources. The pool is responsible for managing the lifetime of the tensors and freeing them when the pool is destroyed.
struct TensorView
    Represents a non-owning view into a tensor, specifying an offset and a subspan. This is useful for specifying a slice of a tensor on the GPU without copying the data.
Enumerations

enum NumType { kf16, kf32 }
enum LogLevel { kError = 0, kWarn = 1, kInfo = 2, kTrace = 3 }
Functions

size_t size(const Shape &shape)
    Returns the number of elements in a tensor with the given shape, which is equal to the product of the dimensions.
template<std::size_t N>
Bindings(std::array<Tensor, N>) -> Bindings<N>
    Deduction guide for Bindings.
template<typename... Args>
Bindings(Args...) -> Bindings<sizeof...(Args)>
size_t sizeBytes(const NumType &type)
    Returns the number of bytes of a number type.
std::string toString(NumType type)
    Converts NumType to string.
std::string toString(const Shape &shape)
    Converts Shape to string. The string formatting is meant to be slotted into WGSL code (hence no additional parentheses or brackets).
std::string toString(size_t value)
    Converts size_t to string. Wraps std::to_string for consistency, instead of having to remember to switch between std::to_string and toString depending on the type.
void replaceAll(std::string &str, const std::string &from, const std::string &to)
    Simple in-place string replacement helper function for substituting placeholders in a WGSL string template.
void replaceAll(std::string &str, const std::vector<std::pair<std::string, std::string>> &reps)
    Overload of the string replacement helper function to replace multiple substrings in a string with multiple replacements.
bool operator<(const Kernel &lhs, const Kernel &rhs)
    Operator implementation to make the Kernel type hashable.
void processEvents(const WGPUInstance &instance)
Tensor createTensor(TensorPool &pool, WGPUDevice &device, const Shape &shape, NumType dtype, WGPUBufferUsageFlags usage = WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst | WGPUBufferUsage_CopySrc)
    Tensor factory function to create a tensor (a Tensor type is simply an Array with an N-dimensional Shape specification) on the GPU. The tensor is created with the given shape, data type, and usage flags, added to the TensorPool, and returned.
Tensor createTensor(Context &ctx, const Shape &shape, NumType dtype)
    Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type.
Tensor createTensor(Context &ctx, const Shape &shape, NumType dtype, float *data)
    Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial float* data to populate the tensor with.
Tensor createTensor(Context &ctx, const Shape &shape, NumType dtype, half *data)
    Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial half* data to populate the tensor with.
void FreeTensor(TensorPool &pool, Tensor tensor)
    Frees a tensor resource and updates the tensor pool.
void check(bool condition, const char *message, const char *file="unkown", int line=-1)
    Checks a condition and logs an error message if the condition is false. In debug mode, it will also exit the program with an error code.
Context createContext(const WGPUInstanceDescriptor &desc = {}, const WGPURequestAdapterOptions &adapterOpts = {}, const WGPUDeviceDescriptor &devDescriptor = {})
    Factory function to create a GPU context, which aggregates the WebGPU API handles to interact with the GPU, including the instance, adapter, device, and queue.
void wait(Context &ctx, std::future<void> &future)
void toCPU(Context &ctx, Tensor &tensor, void *data, size_t bufferSize, CopyData &op)
    Copies data from a GPU buffer to CPU memory.
void toCPU(Context &ctx, Tensor &tensor, void *data, size_t bufferSize)
    Overload of the toCPU function that copies data from a GPU buffer to the CPU, initializing a staging buffer and promise/future for the operation for you.
template<size_t N>
void toCPU(Context &ctx, Tensor &tensor, std::array<float, N> &data)
    Overload of the toCPU function to copy data from a GPU buffer to CPU memory for an array of floats instead of a pointer to a float buffer.
void toGPU(Context &ctx, const void *data, WGPUBuffer buffer, size_t size)
    Copies data from CPU memory to a GPU buffer. The toGPU overloads are effectively a convenience wrapper around the WebGPU API call wgpuQueueWriteBuffer.
void toGPU(Context &ctx, const float *data, Tensor &tensor)
    Overload of the toGPU function to copy data from CPU memory to the GPU, taking a Tensor instance instead of a WGPUBuffer instance.
void toGPU(Context &ctx, const half *data, Tensor &tensor)
template<typename Params>
void toGPU(Context &ctx, Params &params, Kernel &op)
void resetCommandBuffer(WGPUDevice &device, Kernel &op)
    Resets the command buffer in preparation for a kernel dispatch. Since command buffers are consumed upon submission, this function is used both in the initial kernel creation and every time the kernel is to be reused for a dispatch.
size_t cdiv(size_t n, size_t d)
    Ceiling division.
Shape cdiv(Shape total, Shape group)
    cdiv for shape specification. Mostly useful for evenly dividing the total number of threads by workgroup size dimensions.
Kernel createKernel(Context &ctx, const KernelCode &code, const Tensor *dataBindings, size_t numTensors, const size_t *viewOffsets, const Shape &nWorkgroups, const void *params = nullptr, size_t paramsSize = 0)
    A factory function to create a kernel on the GPU. The kernel is created with the given WGSL code, input tensors, output tensor, and optional parameters.
template<typename ParamsType = NoParam, size_t numInputs>
Kernel createKernel(Context &ctx, const KernelCode &code, const Bindings<numInputs> &dataBindings, const Shape &nWorkgroups, const ParamsType &params = ParamsType{})
    Overload which wraps the createKernel factory function to create a kernel on the GPU. This overload takes a static collection of input tensors instead of a pointer, and a statically determined ParamsType instead of casting params to a void pointer.
void dispatchKernel(Context &ctx, Kernel &kernel, std::promise<void> &promise)
    Asynchronously submits a kernel to the GPU queue for execution. It also sets up a callback to notify when the kernel has finished executing by setting the value of the promise in the kernel instance argument.
template<typename numtype>
std::string show(const numtype *a, size_t rows, size_t cols, const std::string &name = "")
    Show a 2D array as a string, base implementation.
template<typename numtype, size_t rows, size_t cols>
std::string show(const std::array<numtype, rows * cols> &a, const std::string &name = "")
    Overload of show() for std::array.
template<size_t rows, size_t cols>
std::string show(const std::array<float, rows * cols> &a, const std::string &name = "")
    Overload of show() for float std::array.
void range(float *input, size_t N, float start = 0.0, float step = 1.0)
    Populate the array with a range of values. This is mostly for testing purposes.
template<size_t N>
void range(std::array<float, N> &input, float start = 0.0, float step = 1.0)
    Overload of range() for std::array.
void randint(float *a, size_t N, std::mt19937 &gen, int min = -1, int max = 1)
    Populate the array with random integers.
template<typename numtype, size_t size>
void randint(std::array<numtype, size> &a, std::mt19937 &gen, int min = -1, int max = 1)
    Overload of randint() for std::array.
void randn(float *a, size_t N, std::mt19937 &gen, float mean = 0.0, float std = 1.0)
    Populate the array with random floats, generated from a Gaussian distribution.
template<size_t size>
void randn(std::array<float, size> &a, std::mt19937 &gen, float mean = 0.0, float std = 1.0)
    Overload of randn() for std::array.
void eye(float *a, size_t N)
    Populate a square matrix with the identity matrix.
void transpose(float *input, float *output, size_t M, size_t N)
    Transpose a matrix.
void flip(float *a, size_t R, size_t C, bool horizontal = true)
    Flip a matrix horizontally or vertically.
bool isclose(float *a, float *b, size_t n, float tol = 1e-3)
    Determine if the values of two arrays are close to each other.
void LOG(Logger &logger, int level, const char *message, ...)
    Log a message to the logger. If NDEBUG is defined in a source file or as a compiler flag, this is a no-op.
void setLogLevel(int level)
    Set the log level of the default logger.

Variables

template<typename T>
constexpr bool IsNoParam = std::is_same_v<T, NoParam>
static constexpr int kShowMaxRows = 8
static constexpr int kShowMaxCols = 8
static const char *kLevelStr[] = {"error", "warn", "info", "trace"}
static Logger kDefLog = {stdout, "", kInfo}
    Default logger for logging messages to stdout at the info level. The output stream and logging level for the default logger can be globally changed on a per-program basis.
enum gpu::LogLevel |
enum gpu::NumType |
gpu::Bindings(Args...) -> Bindings<sizeof...(Args)>
gpu::Bindings(std::array<Tensor, N>) -> Bindings<N>
Deduction guide for Bindings.
Checks a condition and logs an error message if the condition is false. In debug mode, it will also exit the program with an error code.
[in] condition: The condition to check.
[in] message: The error message to log if the condition is false.
[in] file: The source file where the check is performed.
[in] line: The line number in the source file where the check is performed.
Definition at line 646 of file gpu.h.
Factory function to create a GPU context, which aggregates WebGPU API handles to interact with the GPU including the instance, adapter, device, and queue.
The function takes optional descriptor parameters for the instance descriptor, adapter request options, and device descriptor, which are passed through to the WebGPU API calls to create the instance, adapter, and device.
If Dawn is used, it also sets up an error callback for device loss.
[in] desc: Instance descriptor for the WebGPU instance (optional)
[in] adapterOpts: Adapter request options for the WebGPU adapter (optional)
[in] devDescriptor: Device descriptor for the WebGPU device (optional)
Definition at line 678 of file gpu.h.
Kernel gpu::createKernel(Context &ctx, const KernelCode &code, const Bindings<numInputs> &dataBindings, const Shape &nWorkgroups, const ParamsType &params = ParamsType{})
Overload which wraps the createKernel factory function to create a kernel on the GPU. This overload takes a static collection of input tensors instead of a pointer, and a statically determined ParamsType instead of casting params to a void pointer.
[in] ctx: Context instance to manage the kernel
[in] code: WGSL code for the kernel
[in] dataBindings: A Bindings of tensors whose GPU buffers are bound to the kernel as inputs and outputs.
[in] nWorkgroups: Number of workgroups in the x, y, z grid; must be a Shape of rank == 3.
[in] params: Optional parameters for the kernel. If the kernel does not have any parameters, use NoParam.
Definition at line 1163 of file gpu.h.
A factory function to create a kernel on the GPU. The kernel is created with the given WGSL code, input tensors, output tensor, and optional parameters.
Note that the values of the input tensors are not used here, only the reference handles to the underlying buffers as well as the size of the buffers.
[in] ctx: Context instance to manage the kernel
[in] code: WGSL code for the kernel
[in] dataBindings: Pointer to a span of tensors bound to the kernel
[in] numTensors: Number of tensors in the dataBindings span
[in] viewOffsets: Pointer to an array of view offsets for the input tensors
[in] nWorkgroups: Shape of the workgroup
[in] params: Optional parameters for the kernel. If the kernel does not have any parameters, use NoParam. This is cast as void* to allow arbitrary types to be passed as parameters.
[in] paramsSize: Size of the parameters buffer in bytes.
Definition at line 1007 of file gpu.h.
Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type.
Instead of taking the TensorPool and raw WebGPU API WGPUDevice and WGPUBufferUsageFlags arguments, this is a convenience wrapper around the core createTensor function which has default usage flags for a storage buffer, and takes in the broader Context instance instead of the narrower TensorPool object.
[in] ctx: Context instance to manage the tensor
[in] shape: Shape of the tensor
[in] dtype: Data type of the tensor (e.g. kf32)
Definition at line 530 of file gpu.h.
Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial float* data to populate the tensor with.
The data is assumed to be of size equal to the product of the dimensions in the shape, and is copied to the GPU buffer.
[in] ctx: Context instance to manage the tensor
[in] shape: Shape of the tensor
[in] dtype: Data type of the tensor (e.g. kf32)
[in] data: Initial data to populate the tensor with
Definition at line 552 of file gpu.h.
Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial half* data to populate the tensor with.
The data is assumed to be of size equal to the product of the dimensions in the shape, and is copied to the GPU buffer.
[in] ctx: Context instance to manage the tensor
[in] shape: Shape of the tensor
[in] dtype: Data type of the tensor (e.g. kf32)
[in] data: Initial data to populate the tensor with
Definition at line 582 of file gpu.h.
Tensor factory function to create a tensor (a Tensor type is simply an Array with an N-dimensional Shape specification) on the GPU. The tensor is created with the given shape, data type, and usage flags, added to the TensorPool, and returned.
This is the core implementation which takes the minimal set of parameters in terms of the raw WebGPU API, and is used by the other createTensor overloads which provide more ergonomic interfaces.
[in] pool: TensorPool instance to manage the tensor
[in] device: WGPUDevice instance to create the tensor on
[in] shape: Shape of the tensor
[in] dtype: Data type of the tensor (e.g. kf32)
[in] usage: Usage flags for the tensor buffer
Definition at line 491 of file gpu.h.
Asynchronously submits a kernel to the GPU queue for execution. It also sets up a callback to notify when the kernel has finished executing by setting the value of the promise in the kernel instance argument.
dispatchKernel does not wait for the kernel to finish executing and returns immediately. The caller can wait for the kernel to finish executing by calling wait() on the future in the kernel instance.
[in] ctx: Context instance to manage the kernel, from which the queue for the GPU is obtained
[in] kernel: Kernel instance to dispatch
[in] promise: Promise to set when the kernel has finished executing
Definition at line 1200 of file gpu.h.
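Taken together, createContext, createTensor, createKernel, dispatchKernel, wait, and toCPU form the typical round trip for running a kernel. The sketch below is illustrative only, composed from the signatures documented on this page; the WGSL source, the {{workgroupSize}} placeholder, and the choice of 256 threads per workgroup are assumptions for illustration, and the snippet requires the gpu.h header and a WebGPU runtime to build.

```cpp
#include <array>
#include <future>
#include "gpu.h"  // gpu.cpp header providing the API documented on this page

using namespace gpu;

// Hypothetical element-wise WGSL kernel; binding layout and the
// {{workgroupSize}} placeholder are assumptions, not part of the documented API.
static const char *kDouble = R"(
@group(0) @binding(0) var<storage, read_write> inp: array<f32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;
@compute @workgroup_size({{workgroupSize}})
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i: u32 = gid.x;
    if (i < arrayLength(&inp)) {
        out[i] = 2.0 * inp[i];
    }
}
)";

int main() {
  static constexpr size_t N = 1024;
  Context ctx = createContext();
  std::array<float, N> inputArr, outputArr;
  range(inputArr);  // fill with 0, 1, 2, ... for testing
  Tensor input = createTensor(ctx, Shape{N}, kf32, inputArr.data());
  Tensor output = createTensor(ctx, Shape{N}, kf32);
  std::promise<void> promise;
  std::future<void> future = promise.get_future();
  Kernel op = createKernel(ctx, {kDouble, /* workgroup size */ 256, kf32},
                           Bindings{input, output},
                           /* nWorkgroups, rank 3 */ {cdiv(N, 256), 1, 1});
  dispatchKernel(ctx, op, promise);  // returns immediately
  wait(ctx, future);                 // block until the promise is fulfilled
  toCPU(ctx, output, outputArr);     // copy result back via a staging buffer
  return 0;
}
```

To reuse `op` for another dispatch, call resetCommandBuffer first, since the command buffer is consumed on submission.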
Populate a square matrix with the identity matrix.
a: The array to populate.
N: The number of rows and columns in the square matrix.
Definition at line 231 of file array_utils.h.
Flip a matrix horizontally or vertically.
a: The matrix to flip.
R: The number of rows in the matrix.
C: The number of columns in the matrix.
horizontal: Whether to flip horizontally (true) or vertically (false).
Definition at line 264 of file array_utils.h.
Frees a tensor resource and updates the tensor pool.
Only needed if the use case requires manually managing resource lifetimes of GPU tensors. For simple use cases, the TensorPool destructor will automatically free all tensors.
[in] pool: TensorPool instance to manage the tensor
[in] tensor: Tensor instance to free
Definition at line 608 of file gpu.h.
bool gpu::isclose(float *a, float *b, size_t n, float tol = 1e-3)
Determine if the values of two arrays are close to each other.
a: The first array.
b: The second array.
n: The number of elements in the arrays.
tol: The tolerance for closeness.
Definition at line 288 of file array_utils.h.
Log a message to the logger. If NDEBUG is defined in a source file or as a compiler flag, this is a no-op.
logger: The logger to log to.
level: The log level of the message.
message: The message to log.
Definition at line 34 of file logging.h.
Operator implementation to make the Kernel type hashable.
void gpu::randint(float *a, size_t N, std::mt19937 &gen, int min = -1, int max = 1)
Populate the array with random integers.
a: The array to populate.
N: The number of elements in the array.
gen: The random number generator.
min: The minimum value for the random integers.
max: The maximum value for the random integers.
Definition at line 171 of file array_utils.h.
void gpu::randint(std::array<numtype, size> &a, std::mt19937 &gen, int min = -1, int max = 1)
Overload of randint() for std::array.
a: The array to populate.
gen: The random number generator.
min: The minimum value for the random integers.
max: The maximum value for the random integers.
Definition at line 186 of file array_utils.h.
void gpu::randn(float *a, size_t N, std::mt19937 &gen, float mean = 0.0, float std = 1.0)
Populate the array with random floats, generated from a Gaussian distribution.
a: The array to populate.
N: The number of elements in the array.
gen: The random number generator.
mean: The mean of the Gaussian distribution.
std: The standard deviation of the Gaussian distribution.
Definition at line 202 of file array_utils.h.
void gpu::randn(std::array<float, size> &a, std::mt19937 &gen, float mean = 0.0, float std = 1.0)
Overload of randn() for std::array.
a: The array to populate.
gen: The random number generator.
mean: The mean of the Gaussian distribution.
std: The standard deviation of the Gaussian distribution.
Definition at line 218 of file array_utils.h.
void gpu::range(float *input, size_t N, float start = 0.0, float step = 1.0)
Populate the array with a range of values. This is mostly for testing purposes.
input: The array to populate.
N: The number of elements in the array.
start: The starting value.
step: The step size.
Definition at line 139 of file array_utils.h.
void gpu::range(std::array<float, N> &input, float start = 0.0, float step = 1.0)
Overload of range() for std::array.
input: The array to populate.
start: The starting value.
step: The step size.
Definition at line 155 of file array_utils.h.
Simple in-place string replacement helper function for substituting placeholders in a WGSL string template.
Note this is not meant to be used in performance-critical code paths; it should be used ahead of time, before any performance-critical codepath, to preprocess WGSL code strings.
[in] str: String to mutate with substitution replacements.
[in] from: Substring to replace
[in] to: Substring to replace with
Definition at line 256 of file gpu.h.
Overload of the string replacement helper function to replace multiple substrings in a string with multiple replacements.
[in] str: String to mutate with substitution replacements.
[in] reps: Vector of pairs of substrings to replace and their replacements.
Resets the command buffer in preparation for a kernel dispatch. Since command buffers are consumed upon submission, this function is used both in the initial kernel creation and every time the kernel is to be reused for a dispatch.
[in] device: WGPUDevice instance to manage the operation
[in] op: Kernel instance representing the kernel to reset
Definition at line 937 of file gpu.h.
void gpu::setLogLevel(int level)
Set the log level of the default logger.
level: The log level to set.
Definition at line 70 of file logging.h.
std::string gpu::show(const numtype *a, size_t rows, size_t cols, const std::string &name = "")
Show a 2D array as a string, base implementation.
a: The array to show.
rows: The number of rows in the array.
cols: The number of columns in the array.
name: The name of the array to show.
Definition at line 43 of file array_utils.h.
std::string gpu::show(const std::array<float, rows * cols> &a, const std::string &name = "")
Overload of show() for float std::array.
a: The array to show.
name: The name of the array to show.
Definition at line 126 of file array_utils.h.
std::string gpu::show(const std::array<numtype, rows * cols> &a, const std::string &name = "")
Overload of show() for std::array.
a: The array to show.
name: The name of the array to show.
Definition at line 108 of file array_utils.h.
Returns the number of elements in a tensor with the given shape, which is equal to the product of the dimensions.
[in] shape: Shape of the tensor
void gpu::toCPU(Context &ctx, Tensor &tensor, std::array<float, N> &data)
Overload of the toCPU function to copy data from a GPU buffer to CPU memory for an array of floats instead of a pointer to a float buffer.
Overload of the toCPU function to copy data from a GPU buffer to CPU but initializes a staging buffer and promise/future for the operation for you.
For simple use cases, this overload is recommended as it abstracts away the staging buffer and promise/future management. For more custom use cases where the staging buffer is initialized ahead of time, use the other overload.
[in] ctx: Context instance to manage the operation
[in] tensor: Tensor instance representing the GPU buffer to copy from
[in] bufferSize: Size of the data buffer in bytes
[out] data: Pointer to the CPU memory to copy the data to
Definition at line 834 of file gpu.h.
Copies data from a GPU buffer to CPU memory.
[in] ctx: Context instance to manage the operation
[in] tensor: Tensor instance representing the GPU buffer to copy from
[out] data: Pointer to the CPU memory to copy the data to
[in] bufferSize: Size of the data buffer in bytes
[in] op: CopyData instance (staging buffer and callback data) to manage the operation
Definition at line 789 of file gpu.h.
Overload of the toGPU function to copy data from CPU memory to the GPU, taking a Tensor instance instead of a WGPUBuffer instance.
Copies data from CPU memory to a GPU buffer. The toGPU overloads are effectively a convenience wrapper around the WebGPU API call wgpuQueueWriteBuffer.
[in] ctx: Context instance to manage the operation
[in] data: Pointer to the CPU memory to copy from
[in] buffer: WGPUBuffer instance representing the GPU buffer to copy to
[in] size: Size of the data buffer in bytes
Definition at line 915 of file gpu.h.
Converts Shape to string. The string formatting is meant to be slotted into WGSL code (hence no additional parentheses or brackets).
Transpose a matrix.
input: The input matrix.
output: The output matrix.
M: The number of rows in the input matrix.
N: The number of columns in the input matrix.
Definition at line 249 of file array_utils.h.
kShowMaxRows and kShowMaxCols: defined at lines 26 and 27 of file array_utils.h.