
r/webgpu


I built a Rust LLM inference engine with custom WGSL GPU kernels, here's what I learned!

I've been working on a side project called aether, a Rust LLM inference engine that loads GGUF models and runs them with WGPU GPU acceleration.

It started as a way to understand how LLMs actually work under the hood. One thing led to another, and now it has:

- GGUF model loading (Llama/Mistral/Phi/Qwen)

- WGPU GPU backend (Metal/Vulkan/DX12)

- Custom fused WGSL compute shaders for Q8_0 and Q4_K quantized matmul (dequantize inline instead of a separate pass)

- Concurrent request pool for serving multiple users

- OpenAI-compatible API server (axum)

- Pure Rust, no Python dependencies in the hot path

The GPU path is still experimental (CPU mode is the safe default), but the dequant shaders and the fused matmul kernels were honestly the most fun part to write.
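
For anyone wondering what "dequantize inline" means in practice, here is a minimal CPU-side sketch in Rust of a fused Q8_0 matrix-vector product. It is not aether's actual kernel and not WGSL; the names `Q8Block`, `quantize_block`, `dot_row_q8_0`, and `matvec_q8_0` are hypothetical, and the scale is kept as `f32` to stay std-only (GGUF stores it as f16). The one thing it takes from the Q8_0 format is the layout: blocks of 32 signed 8-bit quants sharing one scale, with each weight reconstructed as `scale * quant`.

```rust
/// Number of weights per Q8_0 block.
const QK8_0: usize = 32;

/// Simplified Q8_0 block: one shared scale plus 32 signed 8-bit quants.
/// Assumption: the scale is f32 here; the on-disk format uses f16.
#[derive(Clone, Copy)]
pub struct Q8Block {
    pub scale: f32,
    pub quants: [i8; QK8_0],
}

/// Quantize 32 floats into one block (symmetric, round to nearest).
pub fn quantize_block(values: &[f32; QK8_0]) -> Q8Block {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = max_abs / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0i8; QK8_0];
    for (q, v) in quants.iter_mut().zip(values.iter()) {
        *q = (v * inv).round() as i8;
    }
    Q8Block { scale, quants }
}

/// Fused dequantize + dot product for one weight row.
/// The weights are never expanded into an f32 buffer: each block's
/// integer quants are multiplied against the activations, and the
/// block scale is applied once per block.
pub fn dot_row_q8_0(row: &[Q8Block], x: &[f32]) -> f32 {
    assert_eq!(x.len(), row.len() * QK8_0);
    let mut acc = 0.0f32;
    for (block, xs) in row.iter().zip(x.chunks_exact(QK8_0)) {
        let mut partial = 0.0f32;
        for (q, xv) in block.quants.iter().zip(xs) {
            partial += (*q as f32) * xv;
        }
        acc += block.scale * partial;
    }
    acc
}

/// y = W x, with W stored row-major as rows of `blocks_per_row` blocks.
pub fn matvec_q8_0(weights: &[Q8Block], blocks_per_row: usize, x: &[f32]) -> Vec<f32> {
    assert!(blocks_per_row > 0);
    weights
        .chunks_exact(blocks_per_row)
        .map(|row| dot_row_q8_0(row, x))
        .collect()
}
```

A separate-pass design would first write the whole weight matrix out as `f32` and then run an ordinary matmul over it. The fused version reads roughly one byte per weight instead and hoists the scale out of the inner loop, which is the same idea a GPU kernel would apply per workgroup.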

I'm not trying to compete with llama.cpp or MLX; this was primarily a learning project that grew into something actually useful. Happy to answer questions or take feedback.

Stack: Rust, WGPU, WGSL, GGUF, axum, Tokio

https://github.com/theoxfaber/aether

(Full transparency: the majority of this code and post were written with AI assistance. I drove the design decisions, architecture, and testing; AI handled a lot of the implementation. Treat it accordingly.)





I built a Vite plugin to obfuscate and minify WGSL shaders

Hey all,

I built vite-plugin-wgsl-obfuscate, a small Vite plugin for WebGPU projects:

npm: https://www.npmjs.com/package/vite-plugin-wgsl-obfuscate
GitHub: https://github.com/soaringred/vite-plugin-wgsl-obfuscate

It obfuscates WGSL shader source files during production builds, while leaving dev mode untouched.

The goal is to make shipped shader code harder to inspect, copy, or reuse. It also reduces bundle size through identifier renaming, comment stripping, whitespace collapse, and const inlining.
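
To make two of those steps concrete, here is a rough sketch of comment stripping and whitespace collapse for WGSL. It is written in Rust only to keep one language across the thread's examples; it is not the plugin's code, and the function names `strip_comments`, `collapse_whitespace`, and `minify_wgsl` are hypothetical. It relies on two properties of WGSL: block comments can nest, and there are no string literals, so a `//` or `/*` always starts a comment. Identifier renaming and const inlining need a real parser and are left out.

```rust
/// Remove `//` line comments and nestable `/* ... */` block comments.
pub fn strip_comments(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    let mut depth = 0usize; // current block-comment nesting level
    while i < chars.len() {
        let pair = (chars[i], chars.get(i + 1).copied());
        if depth > 0 {
            match pair {
                ('/', Some('*')) => {
                    depth += 1;
                    i += 2;
                }
                ('*', Some('/')) => {
                    depth -= 1;
                    i += 2;
                    if depth == 0 {
                        // Keep a separator so `a/*x*/b` does not become `ab`.
                        out.push(' ');
                    }
                }
                _ => i += 1,
            }
        } else {
            match pair {
                ('/', Some('*')) => {
                    depth = 1;
                    i += 2;
                }
                ('/', Some('/')) => {
                    // Skip to end of line; the newline itself is kept.
                    while i < chars.len() && chars[i] != '\n' {
                        i += 1;
                    }
                }
                (c, _) => {
                    out.push(c);
                    i += 1;
                }
            }
        }
    }
    out
}

/// Collapse every run of whitespace (including newlines) to one space.
pub fn collapse_whitespace(src: &str) -> String {
    src.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Comment stripping followed by whitespace collapse.
pub fn minify_wgsl(src: &str) -> String {
    collapse_whitespace(&strip_comments(src))
}
```

The sketch deliberately leaves single spaces between tokens rather than deleting them around punctuation, since removing a space can change how adjacent operators tokenize. A production minifier would work on a token stream to squeeze those out safely.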

Obviously it is not magic DRM, but it raises the bar from 'open DevTools and copy the clean WGSL' to reverse engineering the obfuscated output.

I’m using it in my own WebGPU projects, including some public ones linked from my profile/site.

Always down for feedback, especially from anyone shipping WGSL with Vite.