CUDA, OpenCL and Smalltalk

Posted: Aug 30, 2008 6:06 PM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: CUDA, OpenCL and Smalltalk
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life

It's the long weekend, so I thought I'd try something different, something I've never done before. I looked at CUDA and all the documentation I could find on what OpenCL will do. The crux of it is this: Send a bunch of vectors to the GPU and transform them with a vertex shader, then read the values back from the GPU.

CUDA and OpenCL then go on to give you a bunch of cool controls to specify how many parallel pipelines you're going to use at once, letting you do multiple calculations simultaneously on lots of large data sets. How large? Well, a modern graphics card usually has well over 256 MB of RAM, so you can send roughly 22 million 3-dimensional vectors through the graphics card in a single hit. That's a lot.
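
As a rough sanity check on that number (a quick workspace check, assuming 4-byte floats and ignoring whatever the driver keeps for itself):

    256 * 1024 * 1024 // (3 * 4)   "bytes of VRAM over bytes per x,y,z vector => 22369621"

which is where the ~22 million figure comes from.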

So given that the fine controls for pipelines don't exist in OpenGL yet, CUDA is not exactly portable from Smalltalk, and OpenCL doesn't exist yet, how much of this can we do with OpenGL alone? The answer: quite a bit - at least the interesting bit, the math.

I've just completed Lesson #7 in my new OpenGL packages (OpenGL-Lessons is where you'll find it), which is a pure math example of using OpenGL. It doesn't render a scene; it doesn't even open a window. It does, however, demonstrate the speed of pushing 1,000,000 3d vectors (3 million floats) through a rotation matrix. On my MacBook Pro, here were the results:

Test scenario: 1,000,000 vectors and a 3x3 45-degree rotation matrix. The time to create the Smalltalk array of floats is excluded (~16.4s), as is the time to transform the Smalltalk array of floats into a C array of floats (~0.5s) and the time to copy the floats from the heap back into the Smalltalk array (~9.6s).

Using the Smalltalk VM running on one CPU: ~77.1 seconds

Using the RadeonX1600 GPU: ~2.3 seconds

It's easy to see why the 'cheap' supercomputer is literally 8 GPUs strung together inside a regular computer for physics simulations and deep number crunching. GPUs are built for this sort of work. You can try the experiment out for yourself by looking at Lesson #7 - Math.
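
If you just want a feel for the CPU side of the measurement, something like the following sketch will do. It is not the actual Lesson #7 code - the flat x, y, z Array layout, the Z-axis rotation and the use of Time millisecondsToRun: are only assumptions for illustration:

    "Rotate n 3d vectors, stored as a flat Array of x, y, z floats,
     through a 3x3 rotation matrix, timing just the transform loop."
    | n theta c s m vectors rotated millis |
    n := 1000000.
    theta := Float pi / 4.0.          "45 degrees"
    c := theta cos.
    s := theta sin.
    "Row-major 3x3 rotation about the Z axis."
    m := Array new: 9.
    m at: 1 put: c.    m at: 2 put: s negated.  m at: 3 put: 0.0.
    m at: 4 put: s.    m at: 5 put: c.          m at: 6 put: 0.0.
    m at: 7 put: 0.0.  m at: 8 put: 0.0.        m at: 9 put: 1.0.
    "A million vectors of throwaway data."
    vectors := Array new: n * 3.
    1 to: vectors size do: [:i | vectors at: i put: i asFloat].
    rotated := Array new: n * 3.
    millis := Time millisecondsToRun:
        [1 to: n do: [:i |
            | base x y z |
            base := (i - 1) * 3.
            x := vectors at: base + 1.
            y := vectors at: base + 2.
            z := vectors at: base + 3.
            rotated at: base + 1 put: ((m at: 1) * x) + ((m at: 2) * y) + ((m at: 3) * z).
            rotated at: base + 2 put: ((m at: 4) * x) + ((m at: 5) * y) + ((m at: 6) * z).
            rotated at: base + 3 put: ((m at: 7) * x) + ((m at: 8) * y) + ((m at: 9) * z)]].
    Transcript show: millis printString , ' ms on the CPU'; cr.

The GPU path in the lesson does the same transform in a vertex shader and reads the results back, which is where the ~2.3 second figure comes from.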

The big bottleneck, as you can see from the numbers, is getting our Smalltalk floats created and copied in and out of the heap for OpenGL to use. This is a real shame; because of the way we treat floats and arrays in Smalltalk, there probably isn't going to be an easy fix for this in the near future.

First off, we treat floats as objects, so each float carries an object header alongside the actual float data. This is fixed in the 64-bit VMs, but most of us use the 32-bit VM. Even with that fixed in the 64-bit VM, the floats will be 'tagged', so the VM will still be untagging the large data set before making the C call - only seeing this in action would tell us what that really costs. We simply don't know at this point.

There is an upside: if you're doing serious work with OpenGL, you're unlikely to be transforming arrays of floats back into Smalltalk objects constantly - in fact, you'd probably create an object that acts as a facade over the heap data. I've been tempted to head in this direction with the OpenGL Matrix implementation, but I've yet to take that dive. So the good news is, if you're being smart about how you use these facilities, the costs won't bite and you'll get all of the benefits.
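
To make the facade idea concrete, here's a minimal sketch in classic Smalltalk-80 style. Everything in it is an assumption for illustration - the class name, the flat x, y, z layout, and the #floatAt: message, which stands in for whatever primitive your FFI / OpenGL binding actually provides for reading a float out of external memory:

    "Sketch only: a read-only facade over an external buffer of packed
     x, y, z floats, so results can be read lazily instead of being
     copied wholesale back into Smalltalk arrays."
    Object subclass: #GLVectorFacade
        instanceVariableNames: 'buffer count'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'OpenGL-Lessons-Sketches'

    GLVectorFacade >> setBuffer: anExternalBuffer count: anInteger
        buffer := anExternalBuffer.
        count := anInteger

    GLVectorFacade >> size
        ^count

    GLVectorFacade >> vectorAt: index
        "Answer the index-th vector as a 3-element Array, reading the
         floats from the heap only when they're asked for."
        | base |
        base := (index - 1) * 3.
        ^Array
            with: (buffer floatAt: base + 1)
            with: (buffer floatAt: base + 2)
            with: (buffer floatAt: base + 3)

With something like this, the ~9.6 seconds of copy-back cost only gets paid for the vectors you actually look at.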

When OpenCL finally comes out, we'll be able to add those APIs to our entourage too. They will make the code required to do the 'feedback' mode simpler, as well as giving us fine-grained control over the pipelines. I'm looking forward to it.

Read: CUDA, OpenCL and Smalltalk
