Performance
Tuning CUDA with the GPU Memory Hierarchy
2024-11-27
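Global, shared, and register memory differ in latency and bandwidth, so performance depends on matching each kernel's access pattern to the level it touches: coalesced accesses to global memory, bank-conflict-free accesses to shared memory, and frequently reused values kept in registers.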
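A minimal sketch of what the right access pattern can look like, using the well-known shared-memory tiled matrix transpose as the example. It assumes a row-major `float` matrix; the kernel names, the `TILE`/`BLOCK_ROWS` constants, and the `transpose_matches_reference` helper are illustrative choices, not code from this post. The naive kernel reads global memory contiguously but writes with a stride of `rows` elements. The tiled kernel stages a 32×32 tile in shared memory so that both the global load and the global store are coalesced, and pads each tile row by one element so the column-wise shared-memory reads do not all hit the same bank.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Illustrative tile geometry (assumption): a 32-wide tile so one warp spans
// one tile row, and 8 thread rows per block so each thread moves 4 elements.
constexpr int TILE = 32;
constexpr int BLOCK_ROWS = 8;

// Naive transpose of a row-major rows x cols matrix into a cols x rows matrix.
// Loads from `in` are coalesced (consecutive threadIdx.x -> consecutive addresses),
// but stores to `out` stride by `rows` floats, so each warp's store is scattered.
__global__ void transpose_naive(const float* in, float* out, int rows, int cols) {
    int c = blockIdx.x * TILE + threadIdx.x;
    int r = blockIdx.y * TILE + threadIdx.y;
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (c < cols && r + j < rows)
            out[static_cast<std::size_t>(c) * rows + (r + j)] =
                in[static_cast<std::size_t>(r + j) * cols + c];
}

// Tiled transpose: stage a TILE x TILE block in shared memory so the global
// load and the global store both walk contiguous addresses. The +1 padding
// column shifts each tile row by one bank, avoiding 32-way bank conflicts
// when a warp reads a tile column.
__global__ void transpose_tiled(const float* in, float* out, int rows, int cols) {
    __shared__ float tile[TILE][TILE + 1];

    int c = blockIdx.x * TILE + threadIdx.x;  // column in `in`
    int r = blockIdx.y * TILE + threadIdx.y;  // row in `in`
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (c < cols && r + j < rows)
            tile[threadIdx.y + j][threadIdx.x] = in[static_cast<std::size_t>(r + j) * cols + c];
    __syncthreads();  // the whole tile must be loaded before any thread reads it

    // Swap the block coordinates: this block writes the transposed tile.
    c = blockIdx.y * TILE + threadIdx.x;  // column in `out` (out has `rows` columns)
    r = blockIdx.x * TILE + threadIdx.y;  // row in `out` (out has `cols` rows)
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (c < rows && r + j < cols)
            out[static_cast<std::size_t>(r + j) * rows + c] = tile[threadIdx.x][threadIdx.y + j];
}

// Hypothetical host helper: runs one kernel and compares with a CPU transpose.
bool transpose_matches_reference(int rows, int cols, bool tiled) {
    assert(rows > 0 && cols > 0);
    std::size_t n = static_cast<std::size_t>(rows) * cols;
    std::vector<float> h_in(n), h_out(n), ref(n);
    for (std::size_t i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            ref[static_cast<std::size_t>(x) * rows + y] = h_in[static_cast<std::size_t>(y) * cols + x];

    float* d_in = nullptr;
    float* d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, BLOCK_ROWS);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    if (tiled) transpose_tiled<<<grid, block>>>(d_in, d_out, rows, cols);
    else       transpose_naive<<<grid, block>>>(d_in, d_out, rows, cols);
    cudaError_t launch = cudaGetLastError();

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t copy = cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return launch == cudaSuccess && copy == cudaSuccess && h_out == ref;
}
```

Both kernels use the same launch shape: a `TILE × BLOCK_ROWS` thread block and a grid of `ceil(cols / TILE) × ceil(rows / TILE)` blocks, so each thread handles `TILE / BLOCK_ROWS` = 4 elements. The one-element padding is the conventional choice for 32 banks of 4-byte words; timing the two kernels with CUDA events on the target GPU is the way to confirm the gain rather than assuming it.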