Performance | The Supercomputing Blog

Posts tagged ‘Performance’

Ordered map vs. Unordered map – A Performance Study

There comes a time in most complex programs when you want to ask a simple question like, ‘have I already processed a string with this id?’ Linear searches through an array are easy to write and work well enough for small arrays. They also need essentially no memory beyond the array itself. But when your arrays can contain many elements, it is time to ditch linear searches and go with an ordered map (std::map) or an unordered map (std::unordered_map).
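As a rough illustration of the three options, here is a minimal C++ sketch. The function names (seen_linear, seen_ordered, seen_unordered) and the choice of bool as the mapped value are assumptions made for this example and are not taken from the post.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Linear search: O(n) per query, no memory beyond the array itself.
bool seen_linear(const std::vector<std::string>& ids, const std::string& id) {
    for (const std::string& s : ids) {
        if (s == id) return true;
    }
    return false;
}

// Ordered map (typically a balanced tree): O(log n) per query.
bool seen_ordered(const std::map<std::string, bool>& ids, const std::string& id) {
    return ids.find(id) != ids.end();
}

// Unordered map (hash table): O(1) per query on average.
bool seen_unordered(const std::unordered_map<std::string, bool>& ids,
                    const std::string& id) {
    return ids.find(id) != ids.end();
}
```

All three answer the same question; they differ in lookup cost and in how much extra memory the container needs per element.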

Posted by admin on January 24, 2014 at 9:35 pm under C++, Optimization, Windows.
Tags: C++, Insertion, Lookup, map, Memory, ordered map, Performance, Speed, STL, unordered_map

CUDA Tutorial – 3d vertex transformations

Vertex transformations are an extremely common operation in both 2d and 3d programs. A transformation can include translation, rotation, scaling, or any combination of the three. While it is beyond the scope of this article to elaborate on the fine details of vertex transformations, it all boils down to a matrix multiplication. A 3d vertex can be represented as a 1×4 matrix, [x, y, z, w], where w is usually 1, and the transformation is represented as a 4×4 matrix. To get the transformed vertex, you simply multiply the vertex by the transformation matrix; the result is also a convenient 1×4 matrix. For a more detailed explanation, you can read about the transformation matrix here.
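The multiplication described above can be sketched on the CPU in a few lines of C++. This is only an illustration, not the CUDA kernel from the tutorial; the type aliases Vec4 and Mat4 and the helper names transform and translation are assumptions for this example. It uses the row-vector convention from the text (a 1×4 vertex times a 4×4 matrix), so the translation offsets sit in the last row.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Row-vector convention: result = v * m, so out[col] = sum over row of v[row] * m[row][col].
Vec4 transform(const Vec4& v, const Mat4& m) {
    Vec4 out{};  // zero-initialized accumulator
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            out[col] += v[row] * m[row][col];
        }
    }
    return out;
}

// Translation matrix for row vectors: identity with the offsets in the last row.
Mat4 translation(float tx, float ty, float tz) {
    Mat4 m{};  // all zeros
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[3][0] = tx;
    m[3][1] = ty;
    m[3][2] = tz;
    return m;
}
```

With w set to 1, multiplying [x, y, z, 1] by translation(tx, ty, tz) yields [x + tx, y + ty, z + tz, 1].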

Posted by admin on March 2, 2012 at 11:15 pm under CUDA.
Tags: 3D, CUDA, Matrix Multiply, Performance, Transformation, Tutorial, Vertex

Advanced Image Processing with SSE

In a previous article about image processing with SSE, we used some basic SSE intrinsics to perform a very simple image manipulation routine: removing all blue from an image. This task was easy because each pixel had 4 components (ARGB) at 8 bits per component. However, for more advanced image processing functions such as 2D convolution, it is preferable to work with each color component as a 32-bit floating point number rather than an 8-bit unsigned integer.
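To make the 8-bit versus 32-bit float distinction concrete, here is a scalar C++ sketch of the conversion in both directions. It deliberately uses no SSE intrinsics, so it is not the vectorized code from the article. The PixelF struct, the function names, the assumption that alpha occupies the top byte of a packed 32-bit pixel, and the choice to scale to the range [0, 1] are all made up for this example.

```cpp
#include <cstdint>

// One pixel with each component held as a 32-bit float in [0, 1].
struct PixelF {
    float a, r, g, b;
};

// Unpack a 32-bit ARGB pixel (alpha assumed to be in the top byte).
PixelF unpack_argb(std::uint32_t p) {
    return {((p >> 24) & 0xFFu) / 255.0f,
            ((p >> 16) & 0xFFu) / 255.0f,
            ((p >> 8) & 0xFFu) / 255.0f,
            (p & 0xFFu) / 255.0f};
}

// Clamp a float component to [0, 1] and round it to the nearest 8-bit value.
std::uint32_t to_byte(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<std::uint32_t>(v * 255.0f + 0.5f);
}

// Pack the float components back into a 32-bit ARGB pixel.
std::uint32_t pack_argb(const PixelF& p) {
    return (to_byte(p.a) << 24) | (to_byte(p.r) << 16) |
           (to_byte(p.g) << 8) | to_byte(p.b);
}
```

Working in floats means intermediate results of a filter such as a convolution can leave the 0 to 255 range without overflowing; the clamp is applied only once, when packing back to 8 bits.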

Posted by admin on September 27, 2011 at 9:29 pm under C++, Graphics, Optimization, Windows.
Tags: 128-bit, Algorithm, C++, Code, Example, Floating Point, Guide, Image manipulation, Image processing, Intrinsic, Optimization, Performance, SSE, SSE2, Tutorial, Vector

Advanced Image Processing with CUDA

In the previous tutorial, intro to image processing with CUDA, we examined how easy it is to port simple image processing functions over to CUDA. In this tutorial, we’ll go over a substantially more complex algorithm and show how to port it to CUDA with very little effort.

Posted by admin on September 21, 2011 at 12:17 am under CUDA, Graphics.
Tags: Algorithm, Benchmark, Cache, CUDA, Image, Image processing, Local memory, Oil Painting, Paint, Performance

Image twist and swirl algorithm

Image warps and other distortions are significantly more complicated than simple image processing techniques such as convolution. This tutorial covers how to twist an image about its center. The same code can be modified to produce swirls or other types of image warps.
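One common way to build such a twist is inverse mapping: for every destination pixel, rotate its offset from the center by an angle that depends on its distance from the center, then sample the source image at the rotated position. The C++ sketch below follows that idea and is not necessarily the method used in the tutorial. The function name twist, the linear falloff of the angle, the single-channel 8-bit image layout, and nearest-neighbor sampling are all assumptions for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Twist a single-channel image about its center.
// max_angle is the rotation (in radians) applied at the center; the rotation
// fades linearly to zero at a distance of `radius` from the center.
std::vector<std::uint8_t> twist(const std::vector<std::uint8_t>& src,
                                int width, int height, double max_angle) {
    std::vector<std::uint8_t> dst(src.size());
    const double cx = (width - 1) / 2.0;
    const double cy = (height - 1) / 2.0;
    const double radius = std::min(cx, cy);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double dx = x - cx;
            const double dy = y - cy;
            const double r = std::sqrt(dx * dx + dy * dy);

            // Falloff factor: 1 at the center, 0 at or beyond the radius.
            const double t = radius > 0.0 ? std::max(0.0, 1.0 - r / radius) : 0.0;
            const double angle = max_angle * t;
            const double c = std::cos(angle);
            const double s = std::sin(angle);

            // Inverse mapping: rotate the destination offset to find the source pixel.
            int sx = static_cast<int>(std::lround(cx + dx * c - dy * s));
            int sy = static_cast<int>(std::lround(cy + dx * s + dy * c));
            sx = std::min(std::max(sx, 0), width - 1);
            sy = std::min(std::max(sy, 0), height - 1);

            dst[y * width + x] = src[sy * width + sx];
        }
    }
    return dst;
}
```

Changing how the angle depends on the distance r is what turns the same loop into a different kind of warp.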

Posted by admin on September 12, 2011 at 10:37 pm under C++, Graphics, OpenMP.
Tags: Algorithm, Double Precision, Image Algorithm, Image processing, Multisampling, OpenMP, Performance, Rotate, Scale, Swirl, Twist

Performance of sqrt in CUDA

Taking the square root of a floating point number is essential in many engineering applications. Whether you are running n-body simulations, simulating molecules, or doing linear algebra, you need to perform thousands or even millions of square root operations accurately and quickly. Unfortunately, the square root functions on most CPUs are very time consuming, even with specialized SSE instructions. Fortunately, GPUs have specialized hardware that performs square root operations extremely fast. CUDA, NVIDIA’s platform for high performance parallel computing, puts that specialized hardware to full use, and easily outperforms modern Intel or AMD CPUs by a factor of over a hundred.

Posted by admin on January 19, 2010 at 11:17 pm under CUDA.
Tags: CUDA, Experiment, Optimization, Performance, Sqrt

CUDA – Tutorial 5 – Performance of atomics

Atomic operations are often essential for multithreaded programs, especially when different threads need to access or modify the same data. Conventional multicore CPUs generally use a test-and-set instruction to manage which thread controls which data. CUDA has a much more expansive set of atomic operations. With CUDA, you can effectively perform a test-and-set using the atomicInc() function. However, you can also use atomic operations to manipulate the data itself, without the need for a lock variable.
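The difference between guarding data with a lock variable and operating on the data atomically can be sketched on the CPU side with standard C++ atomics. This is an analogue of the idea, not CUDA code; the SpinLock struct and the add_locked and add_atomic names are assumptions for this example.

```cpp
#include <atomic>

// Approach 1: a lock variable acquired with test-and-set.
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

    void lock() {
        // test_and_set returns the previous value; spin until it was clear.
        while (flag.test_and_set(std::memory_order_acquire)) {
        }
    }

    void unlock() { flag.clear(std::memory_order_release); }
};

// The shared total is a plain int, so every update must hold the lock.
void add_locked(SpinLock& lock, int& total, int value) {
    lock.lock();
    total += value;
    lock.unlock();
}

// Approach 2: update the data itself atomically, with no lock variable.
void add_atomic(std::atomic<int>& total, int value) {
    total.fetch_add(value, std::memory_order_relaxed);
}
```

The second form needs one atomic read-modify-write per update, while the first needs at least an acquire and a release around every access to the shared value.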

Posted by admin on December 4, 2009 at 8:38 pm under CUDA.
Tags: Atomic, Atomic Function, Atomic operation, CUDA, global memory, GPGPU, memory access, nVidia, Performance, shared memory, Tutorial

CUDA – Tutorial 4 – Atomic Operations

This tutorial discusses how to perform atomic operations in CUDA, which are essential for many algorithms. Atomic operations are easy to use, help avoid race conditions, and can make code simpler to write.
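As a small CPU-side illustration of how an atomic operation avoids a race condition, the C++ sketch below has several threads increment one shared counter. It uses std::atomic and std::thread rather than CUDA; in a CUDA kernel, atomicAdd() plays a similar role. The function name count_with_atomics and its parameters are assumptions for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Launch num_threads workers that each increment a shared counter per_thread times.
// With a plain int, concurrent increments could be lost; fetch_add makes each
// read-modify-write indivisible, so the final count is always exact.
int count_with_atomics(int num_threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling count_with_atomics(4, 10000) returns 40000 on every run, which is the guarantee a non-atomic counter cannot give.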

Posted by admin on July 24, 2009 at 12:16 am under CUDA.
Tags: Algorithm, Atomic, Coherency, CUDA, Parallel, Performance, Tutorial
