gpu.cpp 0.1.0
 
gpu Namespace Reference

Classes

struct  Array
 Represents a buffer of values on the GPU. More...
 
struct  Bindings
 Represents an ordered collection of WGPUBuffers (wrapped as tensors, non-overlapping views, or arrays) for the purpose of binding them to a kernel operation to make them accessible to the GPU kernel. More...
 
struct  CallbackData
 Used for on-done callback data for asynchronous operations such as kernel launching. More...
 
struct  Context
 Represents a GPU context, aggregating the WebGPU API handles used to interact with the GPU: the instance, adapter, device, and queue. More...
 
struct  CopyData
 Staging buffer and callback data for copying data between the GPU and CPU. More...
 
struct  Kernel
 Represents handles + metadata for a reusable kernel on the GPU. The struct members can be divided into "consumed upon dispatch" (commandBuffer) and reusable ahead-of-time setup (all other members). More...
 
struct  KernelCode
 KernelCode is the representation of WGSL GPU code with template substitutions applied. It is a wrapper around the code string with additional metadata for workgroup size and precision, since these are specified in the WGSL code. Additionally, label and entryPoint are used by createKernel() to specify the label and entry point of the kernel. More...
 
struct  KernelPool
 A pool of kernels to manage GPU resources. For simple use cases this is instantiated as a member in the Context struct although it's possible to have multiple resource pools of kernels in more complex scenarios. More...
 
struct  Logger
 Logger struct for logging messages. stream: The stream to log to. buffer: A buffer to store the formatted message. level: The log level to log messages at. More...
 
struct  NoParam
 NoParam is a no-op type used to indicate that a kernel does not have any parameters. More...
 
struct  Shape
 Represents the shape of a tensor. More...
 
struct  Tensor
 Represents a tensor on the GPU, which is a buffer of values with a shape. More...
 
struct  TensorPool
 Represents a pool of tensors to manage GPU resources. The pool is responsible for managing the lifetime of the tensors and freeing them when the pool is destroyed. More...
 
struct  TensorView
 Represents a non-owning view into a tensor specifying an offset and a subspan. This is useful for specifying a slice of a tensor on the GPU without copying the data. More...
 

Enumerations

enum  NumType { kf16 , kf32 }
 
enum  LogLevel { kError = 0 , kWarn = 1 , kInfo = 2 , kTrace = 3 }
 

Functions

size_t size (const Shape &shape)
 Returns the number of elements in a tensor with the given shape, which is equal to the product of the dimensions.
 
template<std::size_t N>
 Bindings (std::array< Tensor, N >) -> Bindings< N >
 Deduction guide for Bindings.
 
template<typename... Args>
 Bindings (Args...) -> Bindings< sizeof...(Args)>
 
size_t sizeBytes (const NumType &type)
 Returns the number of bytes of a number type.
 
std::string toString (NumType type)
 Converts NumType to string.
 
std::string toString (const Shape &shape)
 Converts Shape to string. The string formatting is meant to be slotted into WGSL code (hence no additional parentheses or brackets).
 
std::string toString (size_t value)
 Converts size_t to string. Wraps std::to_string for consistency, instead of having to remember to switch between std::to_string and toString depending on the type.
 
void replaceAll (std::string &str, const std::string &from, const std::string &to)
 Simple in-place string replacement helper function for substituting placeholders in a WGSL string template.
 
void replaceAll (std::string &str, const std::vector< std::pair< std::string, std::string > > &reps)
 Overload of the string replacement helper function to replace multiple substrings in a string with multiple replacements.
 
bool operator< (const Kernel &lhs, const Kernel &rhs)
 Comparison operator implementation to make the Kernel type usable as a key in ordered containers (e.g. std::set).
 
void processEvents (const WGPUInstance &instance)
 
Tensor createTensor (TensorPool &pool, WGPUDevice &device, const Shape &shape, NumType dtype, WGPUBufferUsageFlags usage=WGPUBufferUsage_Storage|WGPUBufferUsage_CopyDst|WGPUBufferUsage_CopySrc)
 Tensor factory function to create a tensor (a Tensor type is simply an Array with an N-dimensional Shape specification) on the GPU. The tensor is created with the given shape, data type, and usage flags, added to the TensorPool, and returned.
 
Tensor createTensor (Context &ctx, const Shape &shape, NumType dtype)
 Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type.
 
Tensor createTensor (Context &ctx, const Shape &shape, NumType dtype, float *data)
 Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial float* data to populate the tensor.
 
Tensor createTensor (Context &ctx, const Shape &shape, NumType dtype, half *data)
 Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial half* data to populate the tensor.
 
void FreeTensor (TensorPool &pool, Tensor tensor)
 Frees a tensor resource and updates the tensor pool.
 
void check (bool condition, const char *message, const char *file="unkown", int line=-1)
 Checks a condition and logs an error message if the condition is false. In debug mode, it will also exit the program with an error code.
 
Context createContext (const WGPUInstanceDescriptor &desc={}, const WGPURequestAdapterOptions &adapterOpts={}, const WGPUDeviceDescriptor &devDescriptor={})
 Factory function to create a GPU context, which aggregates WebGPU API handles to interact with the GPU including the instance, adapter, device, and queue.
 
void wait (Context &ctx, std::future< void > &future)
 
void toCPU (Context &ctx, Tensor &tensor, void *data, size_t bufferSize, CopyData &op)
 Copies data from a GPU buffer to CPU memory.
 
void toCPU (Context &ctx, Tensor &tensor, void *data, size_t bufferSize)
 Overload of the toCPU function to copy data from a GPU buffer to CPU memory, initializing a staging buffer and promise/future for the operation for you.
 
template<size_t N>
void toCPU (Context &ctx, Tensor &tensor, std::array< float, N > &data)
 Overload of the toCPU function to copy data from a GPU buffer to CPU memory for an array of floats instead of a pointer to a float buffer.
 
void toGPU (Context &ctx, const void *data, WGPUBuffer buffer, size_t size)
 Copies data from CPU memory to a GPU buffer. The toGPU overloads are effectively a convenience wrapper around the WebGPU API call wgpuQueueWriteBuffer.
 
void toGPU (Context &ctx, const float *data, Tensor &tensor)
 Overload of the toGPU function to copy data from CPU memory to a GPU taking a Tensor instance instead of a WGPUBuffer instance.
 
void toGPU (Context &ctx, const half *data, Tensor &tensor)
 
template<typename Params >
void toGPU (Context &ctx, Params &params, Kernel &op)
 
void resetCommandBuffer (WGPUDevice &device, Kernel &op)
 Resets the command buffer in preparation for a kernel dispatch. Since command buffers are consumed upon submission, this function is used both in the initial kernel creation and every time the kernel is to be reused for a dispatch.
 
size_t cdiv (size_t n, size_t d)
 Ceiling division.
 
Shape cdiv (Shape total, Shape group)
 cdiv for shape specifications. Mostly useful for evenly dividing the total number of threads by the workgroup size in each dimension.
 
Kernel createKernel (Context &ctx, const KernelCode &code, const Tensor *dataBindings, size_t numTensors, const size_t *viewOffsets, const Shape &nWorkgroups, const void *params=nullptr, size_t paramsSize=0)
 A factory function to create a kernel on the GPU. The kernel is created with the given WGSL code, input tensors, output tensor, and optional parameters.
 
template<typename ParamsType = NoParam, size_t numInputs>
Kernel createKernel (Context &ctx, const KernelCode &code, const Bindings< numInputs > &dataBindings, const Shape &nWorkgroups, const ParamsType &params=ParamsType{})
 Overload which wraps the createKernel factory function to create a kernel on the GPU. This overload takes a static collection of input tensors instead of a pointer, and a statically determined ParamsType instead of casting params to a void pointer.
 
void dispatchKernel (Context &ctx, Kernel &kernel, std::promise< void > &promise)
 Asynchronously submits a kernel to the GPU queue for execution. It also sets up a callback to notify when the kernel has finished executing by setting the value of the promise in the kernel instance argument.
 
template<typename numtype >
std::string show (const numtype *a, size_t rows, size_t cols, const std::string &name="")
 Show a 2D array as a string, base implementation.
 
template<typename numtype , size_t rows, size_t cols>
std::string show (const std::array< numtype, rows *cols > &a, const std::string &name="")
 Overload of show() for std::array.
 
template<size_t rows, size_t cols>
std::string show (const std::array< float, rows *cols > &a, const std::string &name="")
 Overload of show() for float std::array.
 
void range (float *input, size_t N, float start=0.0, float step=1.0)
 Populate the array with a range of values. This is mostly for testing purposes.
 
template<size_t N>
void range (std::array< float, N > &input, float start=0.0, float step=1.0)
 Overload of range() for std::array.
 
void randint (float *a, size_t N, std::mt19937 &gen, int min=-1, int max=1)
 Populate the array with random integers.
 
template<typename numtype , size_t size>
void randint (std::array< numtype, size > &a, std::mt19937 &gen, int min=-1, int max=1)
 Overload of randint() for std::array.
 
void randn (float *a, size_t N, std::mt19937 &gen, float mean=0.0, float std=1.0)
 Populate the array with random floats, generated from a Gaussian distribution.
 
template<size_t size>
void randn (std::array< float, size > &a, std::mt19937 &gen, float mean=0.0, float std=1.0)
 Overload of randn() for std::array.
 
void eye (float *a, size_t N)
 Populate a square matrix with the identity matrix.
 
void transpose (float *input, float *output, size_t M, size_t N)
 Transpose a matrix.
 
void flip (float *a, size_t R, size_t C, bool horizontal=true)
 Flip a matrix horizontally or vertically.
 
bool isclose (float *a, float *b, size_t n, float tol=1e-3)
 Determine if the values of two arrays are close to each other.
 
void LOG (Logger &logger, int level, const char *message,...)
 Log a message to the logger. If NDEBUG is defined in a source or as a compiler flag, this is a no-op.
 
void setLogLevel (int level)
 Set the log level of the default logger.
 

Variables

template<typename T >
constexpr bool IsNoParam = std::is_same_v<T, NoParam>
 
static constexpr int kShowMaxRows = 8
 
static constexpr int kShowMaxCols = 8
 
static const char * kLevelStr [] = {"error", "warn", "info", "trace"}
 
static Logger kDefLog = {stdout, "", kInfo}
 Default logger for logging messages to stdout at the info level. Output stream and logging level for the default logger can be globally changed on a per-program basis.
 

Enumeration Type Documentation

◆ LogLevel

Enumerator
kError 
kWarn 
kInfo 
kTrace 

Definition at line 9 of file logging.h.

9{ kError = 0, kWarn = 1, kInfo = 2, kTrace = 3 };

◆ NumType

Enumerator
kf16 
kf32 

Definition at line 183 of file gpu.h.

183 {
184 kf16, // (experimental)
185 kf32
186};

Function Documentation

◆ Bindings() [1/2]

template<typename... Args>
gpu::Bindings ( Args... ) -> Bindings< sizeof...(Args)>

◆ Bindings() [2/2]

template<std::size_t N>
gpu::Bindings ( std::array< Tensor, N > ) -> Bindings< N >

Deduction guide for Bindings.

◆ cdiv() [1/2]

Shape gpu::cdiv ( Shape total,
Shape group )
inline

cdiv for shape specifications. Mostly useful for evenly dividing the total number of threads by the workgroup size in each dimension.

Definition at line 970 of file gpu.h.

970 {
971 assert(total.rank == group.rank);
972 Shape result;
973 result.rank = total.rank;
974 for (size_t dim = 0; dim < total.rank; ++dim) {
975 result[dim] = cdiv(total[dim], group[dim]);
976 }
977 return result;
978}

◆ cdiv() [2/2]

size_t gpu::cdiv ( size_t n,
size_t d )
inline

Ceiling division.

Definition at line 964 of file gpu.h.

964{ return (n + d - 1) / d; }

◆ check()

void gpu::check ( bool condition,
const char * message,
const char * file = "unkown",
int line = -1 )
inline

Checks a condition and logs an error message if the condition is false. In debug mode, it will also exit the program with an error code.

Parameters
[in]conditionThe condition to check.
[in]messageThe error message to log if the condition is false.
[in]fileThe source file where the check is performed.
[in]lineThe line number in the source file where the check is performed.

Definition at line 646 of file gpu.h.

647 {
648 if (!condition) {
649 LOG(kDefLog, kError, "Error in file %s line %d:\n%s", file, line, message);
650 exit(1);
651 } else {
652 LOG(kDefLog, kTrace, "Success in file %s line %d:\n%s", file, line,
653 message);
654 }
655}

◆ createContext()

Context gpu::createContext ( const WGPUInstanceDescriptor & desc = {},
const WGPURequestAdapterOptions & adapterOpts = {},
const WGPUDeviceDescriptor & devDescriptor = {} )
inline

Factory function to create a GPU context, which aggregates WebGPU API handles to interact with the GPU including the instance, adapter, device, and queue.

The function takes optional descriptor parameters for the instance descriptor, adapter request options, and device descriptor, which are passed through to the WebGPU API calls to create the instance, adapter, and device.

If dawn is used, it also sets up an error callback for device loss.

Parameters
[in]descInstance descriptor for the WebGPU instance (optional)
[in]adapterOptsAdapter request options for the WebGPU adapter (optional)
[in]devDescriptorDevice descriptor for the WebGPU device (optional)
Returns
Context instance representing the created GPU context

Definition at line 678 of file gpu.h.

678 {},
679 const WGPURequestAdapterOptions &adapterOpts = {},
680 const WGPUDeviceDescriptor &devDescriptor = {}) {
681 Context context;
682 {
683#ifndef __EMSCRIPTEN__
684 context.instance = wgpuCreateInstance(&desc);
685#else
686 // Emscripten does not support the instance descriptor
687 // and throws an assertion error if it is not nullptr.
688 context.instance = wgpuCreateInstance(nullptr);
689#endif
690 check(context.instance, "Initialize WebGPU", __FILE__, __LINE__);
691 }
692 LOG(kDefLog, kInfo, "Requesting adapter");
693 {
694 struct AdapterData {
695 WGPUAdapter adapter = nullptr;
696 bool requestEnded = false;
697 };
698 AdapterData adapterData;
699 auto onAdapterRequestEnded = [](WGPURequestAdapterStatus status,
700 WGPUAdapter adapter, char const *message,
701 void *pUserData) {
702 AdapterData &adapterData = *reinterpret_cast<AdapterData *>(pUserData);
703 check(status == WGPURequestAdapterStatus_Success,
704 "Request WebGPU adapter", __FILE__, __LINE__);
705 adapterData.adapter = adapter;
706 adapterData.requestEnded = true;
707 };
708 wgpuInstanceRequestAdapter(context.instance, &adapterOpts,
709 onAdapterRequestEnded, (void *)&adapterData);
710 while (!adapterData.requestEnded) {
711 processEvents(context.instance);
712 }
713 assert(adapterData.requestEnded);
714 context.adapter = adapterData.adapter;
715 }
716 LOG(kDefLog, kInfo, "Requesting device");
717 {
718 struct DeviceData {
719 WGPUDevice device = nullptr;
720 bool requestEnded = false;
721 };
722 DeviceData devData;
723 auto onDeviceRequestEnded = [](WGPURequestDeviceStatus status,
724 WGPUDevice device, char const *message,
725 void *pUserData) {
726 DeviceData &devData = *reinterpret_cast<DeviceData *>(pUserData);
727 check(status == WGPURequestDeviceStatus_Success,
728 "Could not get WebGPU device.", __FILE__, __LINE__);
729 LOG(kDefLog, kTrace, "Device Request succeeded %x",
730 static_cast<void *>(device));
731 devData.device = device;
732 devData.requestEnded = true;
733 };
734#ifdef WEBGPU_BACKEND_DAWN
735 devDescriptor.deviceLostCallbackInfo = {
736 .callback =
737 [](WGPUDevice const *device, WGPUDeviceLostReason reason,
738 char const *message, void *userdata) {
739 if (reason != WGPUDeviceLostReason_Destroyed) {
740 LOG(kDefLog, kError, "Device lost (code %d):\n%s", reason,
741 message);
742 } else {
743 LOG(kDefLog, kInfo, "Device destroyed: %s", message);
744 }
745 },
746 };
747#endif
748 LOG(kDefLog, kInfo, "Requesting device");
749 wgpuAdapterRequestDevice(context.adapter, &devDescriptor,
750 onDeviceRequestEnded, (void *)&devData);
751 LOG(kDefLog, kInfo, "Waiting for device request to end");
752 while (!devData.requestEnded) {
753 processEvents(context.instance);
754 }
755 LOG(kDefLog, kInfo, "Device request ended");
756 assert(devData.requestEnded);
757 context.device = devData.device;
758 wgpuDeviceSetUncapturedErrorCallback(
759 context.device,
760 [](WGPUErrorType type, char const *message, void *devData) {
761 LOG(kDefLog, kError, "Device uncaptured error: %s", message);
762 throw std::runtime_error("Device uncaptured exception.");
763 },
764 nullptr);
765 }
766 context.queue = wgpuDeviceGetQueue(context.device);
767 return context;
768}

◆ createKernel() [1/2]

template<typename ParamsType = NoParam, size_t numInputs>
Kernel gpu::createKernel ( Context & ctx,
const KernelCode & code,
const Bindings< numInputs > & dataBindings,
const Shape & nWorkgroups,
const ParamsType & params = ParamsType{} )

Overload which wraps the createKernel factory function to create a kernel on the GPU. This overload takes a static collection of input tensors instead of a pointer, and a statically determined ParamsType instead of casting params to a void pointer.

Parameters
[in]ctxContext instance to manage the kernel
[in]codeWGSL code for the kernel
[in]dataBindingsA Bindings of tensors whose GPU buffers are bound to the kernel as inputs and outputs.
[in]nWorkgroupsNumber of workgroups in the x, y, z grid, must be a Shape of rank == 3.
[in]paramsOptional parameters for the kernel. If the kernel does not have any parameters, use NoParam.
Returns
Kernel instance representing the created kernel
Kernel kernel = createKernel(ctx, code, tensorData, output, nWorkgroups, params);

Definition at line 1163 of file gpu.h.

1166 {}) {
1167 if constexpr (!IsNoParam<ParamsType>) {
1168 // LOG(kDefLog, kTrace, "Using params of size %d bytes",
1169 // sizeof(ParamsType));
1170 return createKernel(ctx, code, dataBindings.data.data(), numInputs,
1171 dataBindings.viewOffsets.data(), nWorkgroups,
1172 reinterpret_cast<const void *>(&params),
1173 sizeof(ParamsType));
1174 } else {
1175 // LOG(kDefLog, kTrace , "No params");
1176 return createKernel(ctx, code, dataBindings.data.data(), numInputs,
1177 dataBindings.viewOffsets.data(), nWorkgroups, nullptr,
1178 0);
1179 }
1180}

◆ createKernel() [2/2]

Kernel gpu::createKernel ( Context & ctx,
const KernelCode & code,
const Tensor * dataBindings,
size_t numTensors,
const size_t * viewOffsets,
const Shape & nWorkgroups,
const void * params = nullptr,
size_t paramsSize = 0 )
inline

A factory function to create a kernel on the GPU. The kernel is created with the given WGSL code, input tensors, output tensor, and optional parameters.

Note that the values of the input tensors are not used here, only the reference handles to the underlying buffers as well as the size of the buffers.

Parameters
[in]ctxContext instance to manage the kernel
[in]codeWGSL code for the kernel
[in]dataBindingsPointer to a span of tensors bound to the kernel
[in]numTensorsNumber of tensors in the dataBindings span
[in]viewOffsetsPointer to an array of view offsets for the input tensors
[in]nWorkgroupsShape of the workgroup
[in]paramsOptional parameters for the kernel. If the kernel does not have any parameters, use NoParam. This is cast as void* to allow for arbitrary types to be passed as parameters.
[in]paramsSizeSize of the parameters buffer in bytes.
Returns
Kernel instance representing the created kernel
Kernel kernel = createKernel(ctx, code, dataBindings, numInputs, output, nThreads, params, paramsSize);

Definition at line 1007 of file gpu.h.

1011 {
1012 assert(nWorkgroups.rank == 3);
1013 WGPUDevice device = ctx.device;
1014 WGPUQueue queue = ctx.queue;
1015 Kernel op;
1016 // paramIndex is the index into bgLayoutEntries for the parameters buffer If
1017 // there are no parameters for the kernel, paramsSize == 0 and paramIndex is
1018 // effectively undefined (== -1)
1019 size_t paramIndex = -1;
1020 // Note: paramIndex is undefined unless paramsSize > 0
1021 size_t numBindings = numTensors;
1022 if (paramsSize > 0) {
1023 numBindings++; // parameters buffer
1024 paramIndex = numBindings - 1; // index of the parameters buffer within
1025 // op.buffers, op.bufferSizes and
1026 // bgLayoutEntries
1027 }
1028 op.buffers = std::make_unique<WGPUBuffer[]>(numBindings);
1029 op.bufferSizes = std::make_unique<size_t[]>(numBindings);
1030 op.numBindings = numBindings;
1031 std::vector<WGPUBindGroupLayoutEntry> bgLayoutEntries(numBindings);
1032 // Create layout entries for input buffers
1033 for (size_t i = 0; i < numTensors; ++i) {
1034 bgLayoutEntries[i] = WGPUBindGroupLayoutEntry{
1035 .binding = static_cast<uint32_t>(i),
1036 .visibility = WGPUShaderStage_Compute,
1037 .buffer =
1038 WGPUBufferBindingLayout{
1039 .type = WGPUBufferBindingType_Storage,
1040 .minBindingSize = dataBindings[i].data.size,
1041 },
1042 };
1043 }
1044 if (paramsSize > 0) {
1045 LOG(kDefLog, kInfo, "Create layout entry for the params buffer");
1046 // Create layout entry for the params buffer
1047 bgLayoutEntries[paramIndex] = WGPUBindGroupLayoutEntry{
1048 .binding = static_cast<uint32_t>(paramIndex),
1049 .visibility = WGPUShaderStage_Compute,
1050 .buffer =
1051 WGPUBufferBindingLayout{
1052 .type = WGPUBufferBindingType_Uniform,
1053 .minBindingSize = paramsSize,
1054 },
1055 };
1056 }
1057 WGPUBindGroupLayoutDescriptor bgLayoutDesc = {
1058 .entryCount = static_cast<uint32_t>(bgLayoutEntries.size()),
1059 .entries = bgLayoutEntries.data(),
1060 };
1061 WGPUBindGroupLayout bgLayout =
1062 wgpuDeviceCreateBindGroupLayout(device, &bgLayoutDesc);
1063 for (size_t i = 0; i < numTensors; ++i) {
1064 op.buffers[i] = dataBindings[i].data.buffer;
1065 op.bufferSizes[i] = dataBindings[i].data.size;
1066 }
1067 // Create a buffer for the Params struct
1068 if (paramsSize > 0) {
1069 WGPUBufferDescriptor paramsBufferDesc = {
1070 .usage = WGPUBufferUsage_Uniform | WGPUBufferUsage_CopyDst,
1071 .size = paramsSize,
1072 .mappedAtCreation = false,
1073 };
1074 op.buffers[paramIndex] = wgpuDeviceCreateBuffer(device, &paramsBufferDesc);
1075 op.bufferSizes[paramIndex] = paramsSize;
1076 wgpuQueueWriteBuffer(queue, op.buffers[paramIndex], 0, params, paramsSize);
1077 LOG(kDefLog, kTrace, "Params buffer written");
1078 } else {
1079 LOG(kDefLog, kTrace, "No params buffer needed");
1080 }
1081 std::vector<WGPUBindGroupEntry> bindGroupEntries(numBindings);
1082 for (size_t i = 0; i < numTensors; ++i) {
1083 bindGroupEntries[i] = WGPUBindGroupEntry{
1084 .binding = static_cast<uint32_t>(i),
1085 .buffer = op.buffers[i],
1086 .offset = viewOffsets[i],
1087 .size = op.bufferSizes[i],
1088 };
1089 }
1090 if (paramsSize > 0) {
1091 LOG(kDefLog, kInfo, "Create bind group entry for the params buffer");
1092 LOG(kDefLog, kInfo, "paramIndex: %d", paramIndex);
1093 bindGroupEntries[paramIndex] = WGPUBindGroupEntry{
1094 .binding = static_cast<uint32_t>(paramIndex),
1095 .buffer = op.buffers[paramIndex],
1096 .offset = 0,
1097 .size = paramsSize,
1098 };
1099 }
1100 LOG(kDefLog, kTrace, "BG Entries Size: %d", numBindings);
1101 WGPUBindGroupDescriptor bindGroupDesc = {
1102 .layout = bgLayout,
1103 .entryCount = static_cast<uint32_t>(numBindings),
1104 .entries = bindGroupEntries.data(),
1105 };
1106 op.bindGroup = wgpuDeviceCreateBindGroup(device, &bindGroupDesc);
1107 {
1108 WGPUPipelineLayoutDescriptor pipelineLayoutDesc = {
1109 .bindGroupLayoutCount = 1,
1110 .bindGroupLayouts = &bgLayout,
1111 };
1112 WGPUPipelineLayout pipelineLayout =
1113 wgpuDeviceCreatePipelineLayout(device, &pipelineLayoutDesc);
1114 WGPUShaderModuleWGSLDescriptor wgslDesc = {
1115 .code = code.data.c_str(),
1116 };
1117 wgslDesc.chain.sType = WGPUSType_ShaderModuleWGSLDescriptor;
1118 WGPUShaderModuleDescriptor shaderModuleDesc = {};
1119 shaderModuleDesc.nextInChain = &wgslDesc.chain;
1120 shaderModuleDesc.label = code.label.c_str();
1121 WGPUComputePipelineDescriptor computePipelineDesc = {};
1122 computePipelineDesc.layout = pipelineLayout;
1123 computePipelineDesc.compute.module =
1124 wgpuDeviceCreateShaderModule(device, &shaderModuleDesc);
1125 computePipelineDesc.compute.entryPoint = code.entryPoint.c_str();
1126 computePipelineDesc.label = code.label.c_str();
1127 op.computePipeline =
1128 wgpuDeviceCreateComputePipeline(device, &computePipelineDesc);
1129 }
1130 /*
1131 op.nWorkgroups = {cdiv(nThreads[0], code.workgroupSize[0]),
1132 cdiv(nThreads[1], code.workgroupSize[1]),
1133 cdiv(nThreads[2], code.workgroupSize[2])};
1134 */
1135 op.nWorkgroups = {nWorkgroups[0], nWorkgroups[1], nWorkgroups[2]};
1136 resetCommandBuffer(device, op);
1137 ctx.kernelPool.data.insert(&op);
1138 return op;
1139}

◆ createTensor() [1/4]

Tensor gpu::createTensor ( Context & ctx,
const Shape & shape,
NumType dtype )
inline

Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type.

Instead of taking the TensorPool and raw WebGPU API WGPUDevice and WGPUBufferUsageFlags arguments, this is a convenience wrapper around the core createTensor function. It uses default usage flags for a storage buffer and takes in the Context instance instead of the narrower TensorPool object.

Parameters
[in]ctxContext instance to manage the tensor
[in]shapeShape of the tensor
[in]dtypeData type of the tensor (e.g. kf32)
Returns
Tensor instance representing the created tensor
Tensor tensor = createTensor(ctx, {256, 256}, kf32);

Definition at line 530 of file gpu.h.

530 {
531 return createTensor(ctx.pool, ctx.device, shape, dtype);
532}

◆ createTensor() [2/4]

Tensor gpu::createTensor ( Context & ctx,
const Shape & shape,
NumType dtype,
float * data )
inline

Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial float* data to populate the tensor.

The data is assumed to be of size equal to the product of the dimensions in the shape, and is copied to the GPU buffer.

Parameters
[in]ctxContext instance to manage the tensor
[in]shapeShape of the tensor
[in]dtypeData type of the tensor (e.g. kf32)
[in]dataInitial data to populate the tensor with
Returns
Tensor instance representing the created tensor
Tensor tensor = createTensor(ctx, {256, 256}, kf32, data);

Definition at line 552 of file gpu.h.

553 {
554 assert(dtype == kf32);
555 Tensor tensor =
556 createTensor(ctx.pool, ctx.device, shape, dtype,
557 WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst |
558 WGPUBufferUsage_CopySrc);
559 wgpuQueueWriteBuffer(ctx.queue, tensor.data.buffer, 0, data,
560 tensor.data.size);
561 return tensor;
562}

◆ createTensor() [3/4]

Tensor gpu::createTensor ( Context & ctx,
const Shape & shape,
NumType dtype,
half * data )
inline

Overload of the tensor factory function to instantiate a tensor on the GPU with a given shape and data type. This overload also takes initial half* data to populate the tensor.

The data is assumed to be of size equal to the product of the dimensions in the shape, and is copied to the GPU buffer.

Parameters
[in]ctxContext instance to manage the tensor
[in]shapeShape of the tensor
[in]dtypeData type of the tensor (e.g. kf32)
[in]dataInitial data to populate the tensor with
Returns
Tensor instance representing the created tensor
Tensor tensor = createTensor(ctx, {256, 256}, kf32, data);

Definition at line 582 of file gpu.h.

583 {
584 assert(dtype == kf16);
585 Tensor tensor =
586 createTensor(ctx.pool, ctx.device, shape, dtype,
587 WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst |
588 WGPUBufferUsage_CopySrc);
589 wgpuQueueWriteBuffer(ctx.queue, tensor.data.buffer, 0, data,
590 tensor.data.size);
591 return tensor;
592}

◆ createTensor() [4/4]

Tensor gpu::createTensor ( TensorPool & pool,
WGPUDevice & device,
const Shape & shape,
NumType dtype,
WGPUBufferUsageFlags usage = WGPUBufferUsage_Storage | WGPUBufferUsage_CopyDst | WGPUBufferUsage_CopySrc )
inline

Tensor factory function to create a tensor (a Tensor type is simply an Array with an N-dimensional Shape specification) on the GPU. The tensor is created with the given shape, data type, and usage flags, added to the TensorPool, and returned.

This is the core implementation which takes the minimal set of parameters in terms of the raw WebGPU API, and is used by the other createTensor overloads which provide more ergonomic interfaces.

Parameters
[in]  pool    TensorPool instance to manage the tensor
[in]  device  WGPUDevice instance to create the tensor on
[in]  shape   Shape of the tensor
[in]  dtype   Data type of the tensor (e.g. kf32)
[in]  usage   Usage flags for the tensor buffer
Returns
Tensor instance representing the created tensor
Tensor tensor = createTensor(pool, device, {256, 256}, kf32);

Definition at line 491 of file gpu.h.

495 {
496 LOG(kDefLog, kTrace, "Creating tensor");
497 size_t numElements = size(shape);
498 size_t size = sizeBytes(dtype) * numElements;
499 WGPUBufferDescriptor bufferDesc = {
500 .usage = usage,
501 .size = size,
502 };
503 WGPUBuffer buffer = wgpuDeviceCreateBuffer(device, &bufferDesc);
504 pool.data[buffer] = Tensor{
505 .data = Array{.buffer = buffer, .usage = usage, .size = size},
506 .shape = shape,
507 };
508 return pool.data[buffer];
509}

◆ dispatchKernel()

void gpu::dispatchKernel ( Context & ctx,
Kernel & kernel,
std::promise< void > & promise )
inline

Asynchronously submits a kernel to the GPU queue for execution. It also sets up a callback to notify when the kernel has finished executing by setting the value of the promise in the kernel instance argument.

dispatchKernel does not wait for the kernel to finish executing and returns immediately. The caller can wait for the kernel to finish executing by calling wait() on the future in the kernel instance.

Parameters
[in]  ctx      Context instance to manage the kernel, from which the queue for the GPU is obtained
[in]  kernel   Kernel instance to dispatch
[in]  promise  Promise to set when the kernel has finished executing
dispatchKernel(ctx, kernel, promise);

Definition at line 1200 of file gpu.h.

1201 {
1202 // Submit the command buffer
1203 wgpuQueueSubmit(ctx.queue, 1, &kernel.commandBuffer);
1204 wgpuQueueOnSubmittedWorkDone(
1205 ctx.queue,
1206 [](WGPUQueueWorkDoneStatus status, void *data) {
1207 check(status == WGPUQueueWorkDoneStatus_Success, "Queue work done",
1208 __FILE__, __LINE__);
1209 auto *promise = static_cast<std::promise<void> *>(data);
1210 promise->set_value();
1211 },
1212 &promise);
1213}
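The promise/future handshake used here can be seen in isolation with plain C++ (no WebGPU required): the worker thread below stands in for the GPU queue's work-done callback. This is an illustrative sketch, not library code.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Mimics the dispatchKernel() synchronization pattern: the caller owns a
// promise, hands it to an asynchronous worker (here a thread; in gpu.cpp,
// the wgpuQueueOnSubmittedWorkDone callback), and blocks on the matching
// future until the worker signals completion.
inline int runAsyncAndWait() {
  std::promise<void> promise;
  std::future<void> future = promise.get_future();
  int result = 0;
  std::thread worker([&promise, &result] {
    result = 42;         // stand-in for kernel execution on the GPU
    promise.set_value(); // what the work-done callback does
  });
  future.wait(); // caller-side wait, as after dispatchKernel()
  worker.join();
  return result;
}
```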

◆ eye()

void gpu::eye ( float * a,
size_t N )
inline

Populate a square matrix with the identity matrix.

Parameters
a  The array to populate.
N  The number of rows and columns in the square matrix.

Definition at line 231 of file array_utils.h.

231 {
232 for (size_t i = 0; i < N; i++) {
233 for (size_t j = 0; j < N; j++) {
234 a[i * N + j] = (i == j) ? 1.0 : 0.0;
235 }
236 }
237}
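The listing above can be exercised with a small standalone sketch (the loop is reproduced here so the example compiles on its own; the demo helper is illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Standalone copy of the eye() loop above: fills a row-major N x N matrix
// with the identity.
inline void eye(float *a, size_t N) {
  for (size_t i = 0; i < N; i++) {
    for (size_t j = 0; j < N; j++) {
      a[i * N + j] = (i == j) ? 1.0f : 0.0f;
    }
  }
}

// Demo: for a 3x3 identity, the trace is 3 and every off-diagonal entry is 0.
inline float eyeTrace3() {
  float m[9];
  eye(m, 3);
  float trace = m[0] + m[4] + m[8];
  float off = m[1] + m[2] + m[3] + m[5] + m[6] + m[7];
  return trace - off; // 3.0f
}
```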

◆ flip()

void gpu::flip ( float * a,
size_t R,
size_t C,
bool horizontal = true )
inline

Flip a matrix horizontally or vertically.

Parameters
a           The matrix to flip.
R           The number of rows in the matrix.
C           The number of columns in the matrix.
horizontal  Whether to flip horizontally (true) or vertically (false).

Definition at line 264 of file array_utils.h.

264 {
265 if (horizontal) {
266 for (size_t i = 0; i < R; i++) {
267 for (size_t j = 0; j < C / 2; j++) {
268 std::swap(a[i * C + j], a[i * C + C - j - 1]);
269 }
270 }
271 } else {
272 for (size_t i = 0; i < R / 2; i++) {
273 for (size_t j = 0; j < C; j++) {
274 std::swap(a[i * C + j], a[(R - i - 1) * C + j]);
275 }
276 }
277 }
278}
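A standalone sketch of the same swap-based flip (reproduced so it compiles on its own; the demo helper is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Standalone copy of flip(): reverses each row (horizontal) or each column
// (vertical) of an R x C row-major matrix in place.
inline void flip(float *a, size_t R, size_t C, bool horizontal = true) {
  if (horizontal) {
    for (size_t i = 0; i < R; i++)
      for (size_t j = 0; j < C / 2; j++)
        std::swap(a[i * C + j], a[i * C + C - j - 1]);
  } else {
    for (size_t i = 0; i < R / 2; i++)
      for (size_t j = 0; j < C; j++)
        std::swap(a[i * C + j], a[(R - i - 1) * C + j]);
  }
}

// Demo: horizontally flipping the single row [1 2 3] yields [3 2 1].
inline bool flipRowDemo() {
  float a[3] = {1.0f, 2.0f, 3.0f};
  flip(a, 1, 3, /*horizontal=*/true);
  return a[0] == 3.0f && a[1] == 2.0f && a[2] == 1.0f;
}
```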

◆ FreeTensor()

void gpu::FreeTensor ( TensorPool & pool,
Tensor tensor )
inline

Frees a tensor resource and updates the tensor pool.

Only needed if the use case requires manually managing resource lifetimes of GPU tensors. For simple use cases, the TensorPool destructor will automatically free all tensors.

Parameters
[in]  pool    TensorPool instance to manage the tensor
[in]  tensor  Tensor instance to free
FreeTensor(pool, tensor);

Definition at line 608 of file gpu.h.

608 {
609 if (tensor.data.buffer) {
610 wgpuBufferRelease(tensor.data.buffer);
611 } else {
612 LOG(kDefLog, kWarn, "Tried to free tensor with null buffer");
613 }
614 if (pool.data.find(tensor.data.buffer) != pool.data.end()) {
615 pool.data.erase(tensor.data.buffer);
616 } else {
617 LOG(kDefLog, kWarn, "Tried to free tensor that was not in pool");
618 }
619}

◆ isclose()

bool gpu::isclose ( float * a,
float * b,
size_t n,
float tol = 1e-3 )

Determine if the values of two arrays are close to each other.

Parameters
a    The first array.
b    The second array.
n    The number of elements in the arrays.
tol  The tolerance for closeness.
Returns
bool True if the arrays are close, false otherwise.

Definition at line 288 of file array_utils.h.

288 {
289 for (size_t i = 0; i < n; i++) {
290 if (std::abs(a[i] - b[i]) > tol || std::isnan(a[i]) || std::isnan(b[i])) {
290 LOG(kDefLog, kError, "Mismatch at index %zu: %f != %f", i, a[i], b[i]);
292 return false;
293 }
294 }
295 return true;
296}

◆ LOG()

void gpu::LOG ( Logger & logger,
int level,
const char * message,
... )
inline

Log a message to the logger. If NDEBUG is defined in a source file or as a compiler flag, this is a no-op.

Parameters
logger   The logger to log to.
level    The log level of the message.
message  The message to log.

Definition at line 34 of file logging.h.

34 {
35 static const char *orange = "\033[0;33m";
36 static const char *red = "\033[0;31m";
37 static const char *white = "\033[0;37m";
38 static const char *gray = "\033[0;90m";
39 static const char *reset = "\033[0m";
40 static const char *logColors[] = {red, red, orange, gray};
41 if (level <= logger.level) {
42 va_list args;
43 va_start(args, message);
44 vsnprintf(logger.buffer, sizeof(logger.buffer), message, args);
45 // Brackets and messages are white.
46 // Log levels are red for error and warning, orange for info, and grey for trace.
47 // Then the color is reset.
48 fprintf(logger.stream, "%s[%s%s%s] ", white, logColors[level], kLevelStr[level],
49 white);
50 fprintf(logger.stream, "%s", logger.buffer);
51 fprintf(logger.stream, "%s\n", reset);
52 va_end(args);
53 }
54}

◆ operator<()

bool gpu::operator< ( const Kernel & lhs,
const Kernel & rhs )
inline

Operator implementation to make the Kernel type comparable, so it can be used as a key in ordered containers such as std::set.

Parameters
[in]  lhs  First Kernel instance to compare
[in]  rhs  Second Kernel instance to compare
Returns
True if lhs < rhs, false otherwise

Definition at line 398 of file gpu.h.

398 {
399 return lhs.commandBuffer < rhs.commandBuffer;
400}

◆ processEvents()

void gpu::processEvents ( const WGPUInstance & instance)
inline

Processes pending events on the WebGPU instance. Under Emscripten this yields to the browser event loop (emscripten_sleep(0)); otherwise it calls wgpuInstanceProcessEvents(). Typically called inside polling loops while waiting for asynchronous GPU work to complete.

Definition at line 419 of file gpu.h.

419 {
420#ifdef __EMSCRIPTEN__
421 emscripten_sleep(0);
422#else
423 wgpuInstanceProcessEvents(instance);
424#endif
425}

◆ randint() [1/2]

void gpu::randint ( float * a,
size_t N,
std::mt19937 & gen,
int min = -1,
int max = 1 )

Populate the array with random integers.

Parameters
a    The array to populate.
N    The number of elements in the array.
gen  The random number generator.
min  The minimum value for the random integers.
max  The maximum value for the random integers.

Definition at line 171 of file array_utils.h.

171 {
172 std::uniform_int_distribution<> dist(min, max);
173 for (int i = 0; i < N; i++) {
174 a[i] = static_cast<float>(dist(gen));
175 }
176}

◆ randint() [2/2]

template<typename numtype , size_t size>
void gpu::randint ( std::array< numtype, size > & a,
std::mt19937 & gen,
int min = -1,
int max = 1 )

Overload of randint() for std::array.

Parameters
a    The array to populate.
gen  The random number generator.
min  The minimum value for the random integers.
max  The maximum value for the random integers.

Definition at line 186 of file array_utils.h.

187 {
188 std::uniform_int_distribution<> dist(min, max);
189 for (int i = 0; i < size; i++) {
190 a[i] = static_cast<numtype>(dist(gen));
191 }
192}
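The std::array overload can be exercised standalone (the template is reproduced here so the sketch compiles on its own; note that the exact values drawn from std::uniform_int_distribution are implementation-defined, so the demo only checks the [min, max] bounds):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <random>

// Standalone sketch of the randint() std::array overload: draws uniformly
// distributed integers in [min, max] and stores them as numtype.
template <typename numtype, size_t size>
void randint(std::array<numtype, size> &a, std::mt19937 &gen, int min = -1,
             int max = 1) {
  std::uniform_int_distribution<> dist(min, max);
  for (size_t i = 0; i < size; i++)
    a[i] = static_cast<numtype>(dist(gen));
}

// Demo: with a fixed seed, all 16 drawn values stay within [-3, 3].
inline bool randintInRange() {
  std::mt19937 gen(42); // fixed seed for reproducibility
  std::array<float, 16> a;
  randint(a, gen, -3, 3);
  for (float v : a)
    if (v < -3.0f || v > 3.0f)
      return false;
  return true;
}
```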

◆ randn() [1/2]

void gpu::randn ( float * a,
size_t N,
std::mt19937 & gen,
float mean = 0.0,
float std = 1.0 )

Populate the array with random floats, generated from a Gaussian distribution.

Parameters
a     The array to populate.
N     The number of elements in the array.
gen   The random number generator.
mean  The mean of the Gaussian distribution.
std   The standard deviation of the Gaussian distribution.

Definition at line 202 of file array_utils.h.

203 {
204 std::normal_distribution<float> dist(mean, std);
205 for (int i = 0; i < N; i++) {
206 a[i] = static_cast<float>(dist(gen));
207 }
208}

◆ randn() [2/2]

template<size_t size>
void gpu::randn ( std::array< float, size > & a,
std::mt19937 & gen,
float mean = 0.0,
float std = 1.0 )

Overload of randn() for std::array.

Parameters
a     The array to populate.
gen   The random number generator.
mean  The mean of the Gaussian distribution.
std   The standard deviation of the Gaussian distribution.

Definition at line 218 of file array_utils.h.

219 {
220 std::normal_distribution<float> dist(mean, std);
221 for (int i = 0; i < size; i++) {
222 a[i] = static_cast<float>(dist(gen));
223 }
224}

◆ range() [1/2]

void gpu::range ( float * input,
size_t N,
float start = 0.0,
float step = 1.0 )

Populate the array with a range of values. This is mostly for testing purposes.

Parameters
input  The array to populate.
N      The number of elements in the array.
start  The starting value.
step   The step size.

Definition at line 139 of file array_utils.h.

139 {
140 // TODO(avh): currently unused - check
141 float curr = start;
142 for (size_t i = 0; i < N; i++) {
143 input[i] = curr;
144 curr += step;
145 }
146}

◆ range() [2/2]

template<size_t N>
void gpu::range ( std::array< float, N > & input,
float start = 0.0,
float step = 1.0 )

Overload of range() for std::array.

Parameters
input  The array to populate.
start  The starting value.
step   The step size.

Definition at line 155 of file array_utils.h.

155 {
156 float curr = start;
157 for (size_t i = 0; i < N; i++) {
158 input[i] = curr;
159 curr += step;
160 }
161}
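The std::array overload can be sketched standalone (the loop index here starts at 0, which is the intended behavior; the demo helper is illustrative):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Standalone sketch of the range() std::array overload: fills the array
// with start, start + step, start + 2*step, ...
template <size_t N>
void range(std::array<float, N> &input, float start = 0.0f,
           float step = 1.0f) {
  float curr = start;
  for (size_t i = 0; i < N; i++) {
    input[i] = curr;
    curr += step;
  }
}

// Demo: four elements starting at 1.0 with step 0.5 (all values exactly
// representable in float, so exact comparison is safe here).
inline bool rangeDemo() {
  std::array<float, 4> a;
  range(a, 1.0f, 0.5f);
  return a[0] == 1.0f && a[1] == 1.5f && a[2] == 2.0f && a[3] == 2.5f;
}
```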

◆ replaceAll() [1/2]

void gpu::replaceAll ( std::string & str,
const std::string & from,
const std::string & to )
inline

Simple in-place string replacement helper for substituting placeholders in a WGSL string template.

Note this is not meant for performance-critical code paths; use it ahead of time to preprocess WGSL code strings.

Parameters
[in]  str   String to mutate with substitution replacements.
[in]  from  Substring to replace
[in]  to    Substring to replace with
replaceAll(str, "{{workgroupSize}}", "256");

Definition at line 256 of file gpu.h.

257 {
258 size_t start_pos = 0;
259 while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
260 str.replace(start_pos, from.length(), to);
261 start_pos += to.length();
262 }
263}
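The substitution loop can be exercised standalone (the function is reproduced verbatim so the sketch compiles on its own; the placeholder string is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standalone copy of replaceAll(): substitutes every occurrence of `from`
// with `to`, advancing past each replacement so the inserted text is never
// re-scanned (which also makes overlapping-pattern loops impossible).
inline void replaceAll(std::string &str, const std::string &from,
                       const std::string &to) {
  size_t start_pos = 0;
  while ((start_pos = str.find(from, start_pos)) != std::string::npos) {
    str.replace(start_pos, from.length(), to);
    start_pos += to.length();
  }
}

// Demo: substituting a WGSL-style placeholder.
inline std::string replaceDemo() {
  std::string code = "@workgroup_size({{workgroupSize}})";
  replaceAll(code, "{{workgroupSize}}", "256");
  return code;
}
```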

◆ replaceAll() [2/2]

void gpu::replaceAll ( std::string & str,
const std::vector< std::pair< std::string, std::string > > & reps )
inline

Overload of the string replacement helper function to replace multiple substrings in a string with multiple replacements.

Parameters
[in]  str   String to mutate with substitution replacements.
[in]  reps  Vector of pairs of substrings to replace and their replacements.
replaceAll(str, {{"{{workgroupSize}}", "256"}, {"{{precision}}", "f32"}});

Definition at line 346 of file gpu.h.

347 {
348 for (const auto &rep : reps) {
349 replaceAll(str, rep.first, rep.second);
350 }
351}

◆ resetCommandBuffer()

void gpu::resetCommandBuffer ( WGPUDevice & device,
Kernel & op )
inline

Resets the command buffer in preparation for a kernel dispatch. Since command buffers are consumed upon submission, this function is used both in the initial kernel creation and every time the kernel is to be reused for a dispatch.

Parameters
[in]  device  WGPUDevice instance to manage the operation
[in]  op      Kernel instance representing the kernel to reset
resetCommandBuffer(device, op);

Definition at line 937 of file gpu.h.

937 {
938 {
939 WGPUCommandEncoder commandEncoder =
940 wgpuDeviceCreateCommandEncoder(device, nullptr);
941 WGPUComputePassEncoder computePassEncoder =
942 wgpuCommandEncoderBeginComputePass(commandEncoder, nullptr);
943 wgpuComputePassEncoderSetPipeline(computePassEncoder, op.computePipeline);
944 wgpuComputePassEncoderSetBindGroup(computePassEncoder, 0, op.bindGroup, 0,
945 nullptr);
946 wgpuComputePassEncoderDispatchWorkgroups(
947 computePassEncoder, op.nWorkgroups[0], op.nWorkgroups[1],
948 op.nWorkgroups[2]);
949 wgpuComputePassEncoderEnd(computePassEncoder);
950 op.commandBuffer = wgpuCommandEncoderFinish(commandEncoder, nullptr);
951 }
952}

◆ setLogLevel()

void gpu::setLogLevel ( int level)

Set the log level of the default logger.

Parameters
level  The log level to set.

Definition at line 70 of file logging.h.

70 {
71 kDefLog.level = level;
72}

◆ show() [1/3]

template<typename numtype >
std::string gpu::show ( const numtype * a,
size_t rows,
size_t cols,
const std::string & name = "" )

Show a 2D array as a string, base implementation.

Parameters
a     The array to show.
rows  The number of rows in the array.
cols  The number of columns in the array.
name  The name of the array to show.
Returns
std::string The string representation of the array.
std::array<float, 4> a = {1.0, 2.0, 3.0, 4.0};
printf("%s", show<float>(a.data(), 2, 2, "a").c_str());

Definition at line 43 of file array_utils.h.

44 {
45 std::string output = "\n";
46 if (name != "") {
47 output += "\n" + name + " (" + std::to_string(rows) + ", " +
48 std::to_string(cols) + ")\n\n";
49 } else {
50 output +=
51 "\n(" + std::to_string(rows) + ", " + std::to_string(cols) + ")\n\n";
52 }
53 // spacing as log10 of max value
54 int spacing = 1;
55 numtype max = *std::max_element(a, a + rows * cols);
56 if constexpr (std::is_same<numtype, int>::value) {
57 spacing = std::max(0, (int)log10(max + .01)) + 2;
58 } else if constexpr (std::is_same<numtype, float>::value) {
59 // spacing = std::max(0, (int)log10(max + .01)) + 1;
60 spacing = 8; // scientific notation
61 } else {
62 throw std::runtime_error("Unsupported number type for show()");
63 }
64 // print to stdout line break for each row
65 for (size_t i = 0; i < rows; i++) {
66 if (i == kShowMaxRows / 2 && rows > kShowMaxRows) {
67 output += "...\n";
68 i = rows - kShowMaxRows / 2;
69 }
70 for (size_t j = 0; j < cols; j++) {
71 if (j == kShowMaxCols / 2 && cols > kShowMaxCols) {
72 output += " ..";
73 j = cols - kShowMaxCols / 2;
74 }
75 char buffer[50];
76 if constexpr (std::is_same<numtype, int>::value) {
77 snprintf(buffer, sizeof(buffer), "%*d", spacing, a[i * cols + j]);
78 } else if constexpr (std::is_same<numtype, float>::value) {
79 if ((std::abs(a[i * cols + j]) < 1000 &&
80      std::abs(a[i * cols + j]) > 0.01) ||
81     a[i * cols + j] == 0.0) {
82 snprintf(buffer, 16, "%9.2f", a[i * cols + j]);
83 } else
84 snprintf(buffer, 16, "%10.2e", a[i * cols + j]);
85 } else {
86 throw std::runtime_error("Unsupported number type for show()");
87 }
88 output += buffer;
89 }
90 output += "\n";
91 }
92 output += "\n";
93 return output;
94}

◆ show() [2/3]

template<size_t rows, size_t cols>
std::string gpu::show ( const std::array< float, rows *cols > & a,
const std::string & name = "" )

Overload of show() for float std::array.

Parameters
a     The array to show.
name  The name of the array to show.
Returns
std::string The string representation of the array.
std::array<float, 4> a = {1.0, 2.0, 3.0, 4.0};
printf("%s", show(a, "a").c_str());


Definition at line 126 of file array_utils.h.

127 {
128 return show<float, rows, cols>(a, name);
129}

◆ show() [3/3]

template<typename numtype , size_t rows, size_t cols>
std::string gpu::show ( const std::array< numtype, rows *cols > & a,
const std::string & name = "" )

Overload of show() for std::array.

Parameters
a     The array to show.
name  The name of the array to show.
Returns
std::string The string representation of the array.
std::array<float, 4> a = {1.0, 2.0, 3.0, 4.0};
printf("%s", show<float>(a, "a").c_str());

Definition at line 108 of file array_utils.h.

109 {
110 return show<numtype>(a.data(), rows, cols, name);
111}

◆ size()

size_t gpu::size ( const Shape & shape)
inline

Returns the number of elements in a tensor with the given shape, which is equal to the product of the dimensions.

Parameters
[in]  shape  Shape of the tensor
Returns
Number of elements in the tensor
size({256, 256}) -> 65536

Definition at line 80 of file gpu.h.

80 {
81 size_t numels = 1;
82 for (size_t i = 0; i < shape.rank; i++) {
83 numels *= shape.data[i];
84 }
85 return numels;
86}
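The element-count computation can be sketched standalone. The Shape struct below is a minimal stand-in for the real gpu::Shape from gpu.h (the fixed rank-capacity of 8 stands in for kMaxRank and is an assumption of this sketch):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Minimal stand-in for gpu::Shape (illustrative only).
struct Shape {
  std::array<size_t, 8> data; // 8 stands in for kMaxRank
  size_t rank;
};

// Copy of size(): product of the dimensions.
inline size_t size(const Shape &shape) {
  size_t numels = 1;
  for (size_t i = 0; i < shape.rank; i++)
    numels *= shape.data[i];
  return numels;
}

// Demo: size({256, 256}) -> 65536, matching the example above.
inline size_t size256x256() {
  Shape s{};
  s.data[0] = 256;
  s.data[1] = 256;
  s.rank = 2;
  return size(s);
}
```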

◆ sizeBytes()

size_t gpu::sizeBytes ( const NumType & type)
inline

Returns the number of bytes of a number type.

Definition at line 191 of file gpu.h.

191 {
192 switch (type) {
193 case kf16:
194 return sizeof(uint16_t);
195 case kf32:
196 return sizeof(float);
197 default:
198 LOG(kDefLog, kError, "Invalid NumType in size calculation.");
199 return 0;
200 }
201}

◆ toCPU() [1/3]

template<size_t N>
void gpu::toCPU ( Context & ctx,
Tensor & tensor,
std::array< float, N > & data )

Overload of the toCPU function to copy data from a GPU buffer to CPU memory, taking a std::array of floats instead of a raw float pointer.

Parameters
[in]   ctx     Context instance to manage the operation
[in]   tensor  Tensor instance representing the GPU buffer to copy from
[out]  data    Array of floats to copy the data to
toCPU(ctx, tensor, data);

Definition at line 869 of file gpu.h.

869 {
870 toCPU(ctx, tensor, data.data(), sizeof(data));
871}

◆ toCPU() [2/3]

void gpu::toCPU ( Context & ctx,
Tensor & tensor,
void * data,
size_t bufferSize )
inline

Overload of the toCPU function that copies data from a GPU buffer to CPU memory, creating the staging buffer and promise/future for the operation internally.

For simple use cases, this overload is recommended as it abstracts away the staging buffer and promise/future management. For more custom use cases where the staging buffer is initialized ahead of time, use the other overload.

Parameters
[in]   ctx         Context instance to manage the operation
[in]   tensor      Tensor instance representing the GPU buffer to copy from
[out]  data        Pointer to the CPU memory to copy the data to
[in]   bufferSize  Size of the data buffer in bytes

Definition at line 834 of file gpu.h.

834 {
835 CopyData op;
836 op.future = op.promise.get_future();
837 {
838 WGPUBufferDescriptor readbackBufferDescriptor = {
839 .usage = WGPUBufferUsage_CopyDst | WGPUBufferUsage_MapRead,
840 .size = bufferSize,
841 };
842 op.readbackBuffer =
843 wgpuDeviceCreateBuffer(ctx.device, &readbackBufferDescriptor);
844 }
845 {
846 WGPUCommandEncoder commandEncoder;
847 WGPUComputePassEncoder computePassEncoder;
848 commandEncoder = wgpuDeviceCreateCommandEncoder(ctx.device, nullptr);
849 wgpuCommandEncoderCopyBufferToBuffer(commandEncoder, tensor.data.buffer, 0,
850 op.readbackBuffer, 0, bufferSize);
851 op.commandBuffer = wgpuCommandEncoderFinish(commandEncoder, nullptr);
852 check(op.commandBuffer, "Create command buffer", __FILE__, __LINE__);
853 }
854 toCPU(ctx, tensor, data, bufferSize, op);
855}

◆ toCPU() [3/3]

void gpu::toCPU ( Context & ctx,
Tensor & tensor,
void * data,
size_t bufferSize,
CopyData & op )
inline

Copies data from a GPU buffer to CPU memory.

Parameters
[in]   ctx         Context instance to manage the operation
[in]   tensor      Tensor instance representing the GPU buffer to copy from
[out]  data        Pointer to the CPU memory to copy the data to
[in]   bufferSize  Size of the data buffer in bytes
[in]   op          CopyData instance holding the staging buffer and synchronization state for the operation
toCPU(ctx, tensor, data, bufferSize);

Definition at line 789 of file gpu.h.

790 {
791 wgpuQueueSubmit(ctx.queue, 1, &op.commandBuffer);
792 CallbackData callbackData = {op.readbackBuffer, bufferSize, data, &op.promise,
793 &op.future};
794 wgpuQueueOnSubmittedWorkDone(
795 ctx.queue,
796 [](WGPUQueueWorkDoneStatus status, void *callbackData) {
797 check(status == WGPUQueueWorkDoneStatus_Success, "Queue work done",
798 __FILE__, __LINE__);
799 const auto *data = static_cast<CallbackData *>(callbackData);
800 wgpuBufferMapAsync(
801 data->buffer, WGPUMapMode_Read, 0, data->bufferSize,
802 [](WGPUBufferMapAsyncStatus status, void *captureData) {
803 const auto *data = static_cast<CallbackData *>(captureData);
804 check(status == WGPUBufferMapAsyncStatus_Success,
805 "Map readbackBuffer", __FILE__, __LINE__);
806 const void *mappedData = wgpuBufferGetConstMappedRange(
807 data->buffer, /*offset=*/0, data->bufferSize);
808 check(mappedData, "Get mapped range", __FILE__, __LINE__);
809 memcpy(data->output, mappedData, data->bufferSize);
810 wgpuBufferUnmap(data->buffer);
811 data->promise->set_value();
812 },
813 callbackData);
814 },
815 &callbackData);
816 wait(ctx, op.future);
817}

◆ toGPU() [1/4]

void gpu::toGPU ( Context & ctx,
const float * data,
Tensor & tensor )
inline

Overload of the toGPU function to copy data from CPU memory to the GPU, taking a Tensor instance instead of a WGPUBuffer instance.

Parameters
[in]  ctx     Context instance to manage the operation
[in]  data    Pointer to the CPU memory to copy from
[in]  tensor  Tensor instance representing the GPU buffer to copy to
toGPU(ctx, data, tensor);

Definition at line 904 of file gpu.h.

904 {
905 wgpuQueueWriteBuffer(ctx.queue, tensor.data.buffer, 0, data,
906 tensor.data.size);
907}

◆ toGPU() [2/4]

void gpu::toGPU ( Context & ctx,
const half * data,
Tensor & tensor )
inline

Definition at line 909 of file gpu.h.

909 {
910 wgpuQueueWriteBuffer(ctx.queue, tensor.data.buffer, 0, data,
911 tensor.data.size);
912}

◆ toGPU() [3/4]

void gpu::toGPU ( Context & ctx,
const void * data,
WGPUBuffer buffer,
size_t size )
inline

Copies data from CPU memory to a GPU buffer. The toGPU overloads are effectively a convenience wrapper around the WebGPU API call wgpuQueueWriteBuffer.

Parameters
[in]  ctx     Context instance to manage the operation
[in]  data    Pointer to the CPU memory to copy from
[in]  buffer  WGPUBuffer instance representing the GPU buffer to copy to
[in]  size    Size of the data buffer in bytes
toGPU(ctx, data, buffer, size);

Definition at line 888 of file gpu.h.

889 {
890 wgpuQueueWriteBuffer(ctx.queue, buffer, 0, data, size);
891}

◆ toGPU() [4/4]

template<typename Params >
void gpu::toGPU ( Context & ctx,
Params & params,
Kernel & op )
inline

Overload of toGPU() that writes a parameters struct to the kernel's last bound buffer (the parameters buffer). If the kernel has no bindings, this is a no-op.

Definition at line 915 of file gpu.h.

915 {
916 // TODO(avh): Maintain params metadata in Kernel and check for consistency.
917 // If a kernel does not have parameters this will quietly overwrite
918 // the last buffer in the bind group with the parameters buffer.
919 if (op.numBindings > 0) {
920 wgpuQueueWriteBuffer(ctx.queue, op.buffers[op.numBindings - 1], 0,
921 static_cast<void *>(&params), sizeof(params));
922 }
923}

◆ toString() [1/3]

std::string gpu::toString ( const Shape & shape)
inline

Converts Shape to string. The string formatting is meant to be slotted into WGSL code (hence no additional parentheses or brackets).

Definition at line 222 of file gpu.h.

222 {
223 std::string str;
224 for (size_t i = 0; i < shape.rank; i++) {
225 str += std::to_string(shape.data[i]);
226 if (i < shape.rank - 1) {
227 str += ", ";
228 }
229 }
230 return str;
231}
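The rendering can be exercised standalone. As above, the Shape struct is a minimal stand-in for the real gpu::Shape (the rank-capacity of 8 stands in for kMaxRank and is an assumption of this sketch):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Minimal stand-in for gpu::Shape (illustrative only).
struct Shape {
  std::array<size_t, 8> data; // 8 stands in for kMaxRank
  size_t rank;
};

// Copy of toString(Shape): comma-separated dimensions with no brackets,
// ready to be spliced into a WGSL template.
inline std::string toString(const Shape &shape) {
  std::string str;
  for (size_t i = 0; i < shape.rank; i++) {
    str += std::to_string(shape.data[i]);
    if (i < shape.rank - 1)
      str += ", ";
  }
  return str;
}

// Demo: a rank-2 shape {256, 256} renders as "256, 256".
inline std::string shapeDemo() {
  Shape s{};
  s.data[0] = 256;
  s.data[1] = 256;
  s.rank = 2;
  return toString(s);
}
```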

◆ toString() [2/3]

std::string gpu::toString ( NumType type)
inline

Converts NumType to string.

Definition at line 206 of file gpu.h.

206 {
207 switch (type) {
208 case kf16:
209 return "f16";
210 case kf32:
211 return "f32";
212 default:
213 LOG(kDefLog, kError, "Invalid NumType in string conversion.");
214 return "unknown";
215 }
216}

◆ toString() [3/3]

std::string gpu::toString ( size_t value)
inline

Converts size_t to string. Wraps std::to_string for consistency, instead of having to remember to switch between std::to_string and toString depending on the type.

Definition at line 238 of file gpu.h.

238{ return std::to_string(value); }

◆ transpose()

void gpu::transpose ( float * input,
float * output,
size_t M,
size_t N )
inline

Transpose a matrix.

Parameters
input   The input matrix.
output  The output matrix.
M       The number of rows in the input matrix.
N       The number of columns in the input matrix.

Definition at line 249 of file array_utils.h.

249 {
250 for (size_t i = 0; i < M; i++) {
251 for (size_t j = 0; j < N; j++) {
252 output[j * M + i] = input[i * N + j];
253 }
254 }
255}
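The index mapping can be checked with a standalone sketch (the function is reproduced with const-qualified input; the demo matrix is illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Standalone copy of transpose(): writes the M x N row-major input into
// output as its N x M transpose, i.e. output[j][i] = input[i][j].
inline void transpose(const float *input, float *output, size_t M, size_t N) {
  for (size_t i = 0; i < M; i++)
    for (size_t j = 0; j < N; j++)
      output[j * M + i] = input[i * N + j];
}

// Demo: transposing the 2x3 matrix [[1 2 3], [4 5 6]] yields the 3x2
// matrix [[1 4], [2 5], [3 6]].
inline bool transposeDemo() {
  float in[6] = {1, 2, 3, 4, 5, 6};
  float out[6];
  transpose(in, out, 2, 3);
  return out[0] == 1 && out[1] == 4 && out[2] == 2 && out[3] == 5 &&
         out[4] == 3 && out[5] == 6;
}
```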

◆ wait()

void gpu::wait ( Context & ctx,
std::future< void > & future )
inline

Blocks the calling thread until the given future is ready, processing WebGPU instance events on each polling iteration so that pending callbacks can fire.

Definition at line 770 of file gpu.h.

770 {
771 while (future.wait_for(std::chrono::seconds(0)) !=
772 std::future_status::ready) {
773 processEvents(ctx.instance);
774 }
775}

Variable Documentation

◆ IsNoParam

template<typename T >
bool gpu::IsNoParam = std::is_same_v<T, NoParam>
constexpr

Definition at line 959 of file gpu.h.

◆ kDefLog

Logger gpu::kDefLog = {stdout, "", kInfo}
static

Default logger for logging messages to stdout at the info level. Output stream and logging level for the default logger can be globally changed on a per-program basis.

Definition at line 64 of file logging.h.

64{stdout, "", kInfo};

◆ kLevelStr

const char* gpu::kLevelStr[] = {"error", "warn", "info", "trace"}
static

Definition at line 11 of file logging.h.

11{"error", "warn", "info", "trace"};

◆ kShowMaxCols

int gpu::kShowMaxCols = 8
staticconstexpr

Definition at line 27 of file array_utils.h.

◆ kShowMaxRows

int gpu::kShowMaxRows = 8
staticconstexpr

Definition at line 26 of file array_utils.h.