Ruby C Extension Development

by Claudio Fiorini

published in December 2009

Claudio Fiorini is a web application developer who works as a consultant for various companies based in Rome (Italy). He is a fan of GNU/Linux, Postgres and Ruby, with the sporadic Arduino hack.

Some of his open source work can be seen at www.claudiofiorini.it

Why extend?

Often, while developing a web application or writing a console script for a routine system administration task, we realize that we need a special feature, or a method that behaves differently to suit our needs. In Ruby we can simply reopen the class, add a method under a new name, and call it to get the result we want. Here is an example, where we add a custom method to the Fixnum class:

class Fixnum
  def my_s
    puts self.to_s
  end
end

k = 12
k.my_s

We reopen the Fixnum class and add a method my_s that prints the string conversion of the number, so the caller does not have to call puts explicitly. Trivial, but you get the point. But what do we do if we have something more complicated and we need execution speed? We write the method in C, the language the Ruby interpreter itself is implemented in, either by extending an existing extension or by creating a brand new one that satisfies our needs.

Before we start developing a new extension, it is a good idea to search the web and see whether something suitable already exists (we don't want to reinvent the wheel); if we find something useful, we can extend that extension. Often, though, it is hard to find exactly what we need, so we have to write a new extension ourselves.

Why pick Ruby or C?

Ruby is a completely object-oriented scripting language; its latest version, 1.9, includes many new features. The Ruby interpreter is written in C, and extensions can be written either in Ruby itself (commonly called pure Ruby) or in C. When we use the gem1.8 or gem1.9 command to install an extension and the first line of output reads “Building native...”, the gem contains code written in C.

But why create a Ruby extension in the interpreter's native language? One reason is performance: Hpricot, an HTML parser, is a very good example. Another reason is the need to use a library written in C, such as 'libpq', the interface to the PostgreSQL database, which is wrapped by a gem named 'postgres'. I suggest that you take a look at the 'postgres' extension once you have got your feet wet with this article. It uses many concepts that are very useful for creating complex extensions, and I will show some of them in Rmag, the extension that we build in this article. So sit back and enjoy!

Writing our first extension

The very first thing to do is to download the Ruby source code from its official repository, either via svn or as a tarball, and get familiar with the contents of its directories. For the rest of the article, I will assume that you are familiar with the Ruby source code.

Setting up the development environment

Writing an extension for Ruby is something that can be achieved by anyone who has some basic knowledge of the C language and some time to dig into the source code. In this article, we will create a basic Ruby extension that shows concepts most of which you can also find in the Ruby source code. We will work on a machine running Ubuntu 8.04 (Hardy Heron) and need the following packages to start with:

ruby1.8
ruby1.8-dev
build-essential

Building the sample Rmag extension

Now, let's start with the skeleton files of our extension:

rmag/
    rmag.c      # source code
    extconf.rb  # ruby file to create makefile
    example.rb  # a simple script to show our ext
    LICENSE     # license
    README      # some info

rmag.c is the core file of our extension. Here we have a very basic starting point:

1.1 #include "ruby.h"
1.2 #include <stdio.h>

1.3 VALUE rb_cRmag;

1.4 void Init_Rmag()
    {
1.5     rb_cRmag = rb_define_class("Rmag", rb_cObject);
    }

1.3 In Ruby everything is an object, so any data passed to a Ruby method is referenced in C through the type VALUE; macros included in ruby.h let us find out the actual Ruby type. On this line we declare the global variable that will hold our class, so that it is accessible from our extension and from other parts of the Ruby code as well.

1.4 Init_Rmag is a global function that the interpreter calls when our extension is loaded. Its name must be Init_ followed by the name of the extension, here Rmag.

1.5 rb_define_class function accepts 2 params: a string for the name of the class and its parent class object.

rb_cObject is a C global variable that holds the Ruby Object class. Its name follows a convention: 'rb' indicates that it belongs to the Ruby environment, 'c' indicates that it is a class, and Object is the name of the class itself. Looking in ruby.h we find other examples, like rb_mKernel, where 'm' stands for module, or rb_eFatal, where 'e' stands for exception.

These few lines are already enough to compile. To build them we use the Ruby library 'mkmf', which generates a makefile from extconf.rb:
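For orientation, the Ruby-level counterparts of these C globals can be inspected from a plain script. The sketch below is only an illustration: RmagSketch is a hypothetical name, chosen so that it does not clash with the real extension, and the pairing of C variables with Ruby constants in the comments simply follows the naming convention.

```ruby
# Pure-Ruby counterpart of rb_define_class("Rmag", rb_cObject).
# RmagSketch is a hypothetical name used only for this illustration.
RmagSketch = Class.new(Object)

# rb_cObject    <-> Object     ('c': a class)
# rb_mKernel    <-> Kernel     ('m': a module)
# rb_eTypeError <-> TypeError  ('e': an exception class)
puts Object.class                             # Class
puts Kernel.class                             # Module
puts TypeError.ancestors.include?(Exception)  # true
puts RmagSketch.superclass                    # Object
```

The second argument of rb_define_class plays the same role as the argument of Class.new: it names the parent class.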

require 'mkmf'

ext_name = 'Rmag'

create_makefile(ext_name)

Running this script creates the makefile, which we then use to compile rmag.c:

# ruby extconf.rb
# make

Now, to test our new extension, we can write a short test script, example.rb, as follows:

require 'Rmag'

cl = Rmag.new

puts cl

and the output is:

#<Rmag:0xb7c75ad0>

This shows that our extension works!

Introducing advanced functionality

Now, it is time to create our first method for the Rmag class:

void Init_Rmag()
{
    rb_cRmag = rb_define_class("Rmag", rb_cObject);
    rb_define_method(rb_cRmag, "strlen", rmag_strlen, 1);
}

where:

the first parameter of rb_define_method is the class we have just defined,

“strlen” is the Ruby method name,

rmag_strlen is the name of the C function,

and 1 is the number of parameters that strlen accepts.

In short, rb_define_method binds a Ruby method to a C function. Now we need that function, which must appear in rmag.c before Init_Rmag:

VALUE rmag_strlen(VALUE klass, VALUE var)
{
    return INT2FIX(RSTRING(var)->len);
}

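Every function bound to a Ruby method must return something, even if it is only Qnil (the C representation of Ruby's nil), and as said before the type is always VALUE. The first argument of the C function is the receiver of the call, that is, the Rmag object on which strlen is invoked (despite the name klass), and the second is the first parameter passed to the Ruby method, as in strlen(“hello”).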
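As a rough analogy, the binding performed by rb_define_method can be mimicked in pure Ruby with define_method. This is only a sketch: RmagBinding is a hypothetical class name, and the block stands in for the C function rmag_strlen.

```ruby
# Hypothetical pure-Ruby stand-in for the C binding
#   rb_define_method(rb_cRmag, "strlen", rmag_strlen, 1);
class RmagBinding
  # The block plays the role of the C function rmag_strlen; its single
  # parameter corresponds to the trailing 1 (the number of parameters).
  define_method(:strlen) { |var| var.length }
end

obj = RmagBinding.new
puts obj.strlen("hello")                         # 5
puts RmagBinding.instance_method(:strlen).arity  # 1
```

In the block, self is the receiver of the call, just as the first argument of the C function is.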

What are INT2FIX and RSTRING?

These are macros from the Ruby source code that make development easier (for the full list, look inside ruby.h). INT2FIX takes a C integer and converts it to a Ruby Fixnum, while RSTRING gives access to the C pointer to the string's bytes (->ptr) and to the length of the string (->len). Note that direct access through RSTRING(var)->len works with Ruby 1.8; from Ruby 1.9 on, the macros RSTRING_PTR(var) and RSTRING_LEN(var) must be used instead.

Our strlen method takes just one parameter, and that parameter must be a string. This is how we check the type of the parameter in C:
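To give an idea of what INT2FIX does under the hood, here is a small model of the conversion as the standard C interpreter (MRI) defines it in ruby.h: the integer is shifted left by one bit and the lowest bit is set as the Fixnum flag. The method names int2fix_sketch and fix2long_sketch are hypothetical and exist only for this illustration.

```ruby
# Model of the tagging that INT2FIX performs in MRI: shift the integer
# left by one bit and set the lowest bit as the Fixnum flag.
FIXNUM_FLAG = 0x01

def int2fix_sketch(n)
  (n << 1) | FIXNUM_FLAG
end

# Reverse direction (the job of FIX2LONG): drop the flag bit again.
def fix2long_sketch(value)
  value >> 1
end

tagged = int2fix_sketch(5)
puts tagged                               # 11
puts fix2long_sketch(tagged)              # 5
puts fix2long_sketch(int2fix_sketch(-3))  # -3
```

Because the number lives inside the VALUE itself, no memory is allocated for a Fixnum, which is why the conversion is so cheap.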

VALUE rmag_strlen(VALUE klass, VALUE var)
{
    if (TYPE(var) == T_STRING) {
        return INT2FIX(RSTRING(var)->len);
    } else {
        rb_raise(rb_eTypeError, "invalid type");
        return Qnil;
    }
}

TYPE returns the type of the parameter that we passed to strlen, and comparing it with T_STRING tells us whether it is a string. The complete list of type constants can be found in ruby.h in the Ruby source code.

This approach is still not satisfying: we want an extension that takes a parameter during initialization, keeps it in memory, and can work with it later. Let's do that next.
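For comparison, the same check written in pure Ruby might look like the sketch below. The method name strlen_checked is hypothetical; is_a?(String) stands in for the TYPE(var) == T_STRING test and raise for rb_raise.

```ruby
# Hypothetical pure-Ruby version of the type check in rmag_strlen.
def strlen_checked(var)
  # Counterpart of: if (TYPE(var) == T_STRING) ... else rb_raise(...)
  raise TypeError, "invalid type" unless var.is_a?(String)
  var.length
end

puts strlen_checked("hello")  # 5

begin
  strlen_checked(42)
rescue TypeError => e
  puts "TypeError: #{e.message}"  # TypeError: invalid type
end
```

From the caller's point of view, the exception raised by the C version can be rescued in exactly the same way.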

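Let's give our class its own new method, defined on the class itself, which wraps the parameter inside the object it creates. (A C function registered as initialize could not do this: by the time initialize runs the object has already been allocated, and its return value is ignored.)

rb_cRmag = rb_define_class("Rmag", rb_cObject);
rb_define_singleton_method(rb_cRmag, "new", rmag_init, 1);

And rmag_init will be:

VALUE rmag_init(VALUE klass, VALUE var)
{
    /* we take everything */
    VALUE *val = malloc(sizeof(VALUE));
    *val = var;
    /* klass is the Rmag class itself, because new is a class method */
    return Data_Wrap_Struct(klass, 0, 0, val);
}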

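Our value is now kept in memory alongside the object, although without garbage collector support: because we pass 0 for both the mark and the free function, the collector neither knows about the stored value nor releases the allocated memory. We can now add functions that do whatever we want with it. Let's modify strlen to work with the stored value; it no longer takes an argument, so it must be registered with rb_define_method using 0 as the number of parameters: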

VALUE rmag_strlen(VALUE klass)
{
    VALUE *val;

    Data_Get_Struct(klass, VALUE, val);

    if (TYPE(*val) == T_STRING)
        return INT2FIX(RSTRING(*val)->len);
    else
        return Qnil;
}

Data_Wrap_Struct takes four parameters: the first is the class of the object to create, the second is the GC mark function, the third is the GC free function and the fourth is the pointer to the data. For Data_Get_Struct, the first parameter is the object, the second is the type of the data, and the third is the pointer variable that receives the address of the data.
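Seen from the Ruby side, the behaviour we are building is that of an object that remembers the value it was created with. A pure-Ruby model of it, with the hypothetical class name RmagState and an instance variable in place of the wrapped C pointer, could look like this:

```ruby
# Hypothetical pure-Ruby model of the wrapped-data version of Rmag.
class RmagState
  def initialize(var)
    # Plays the role of the malloc'ed VALUE handed to Data_Wrap_Struct.
    @val = var
  end

  def strlen
    # Plays the role of Data_Get_Struct followed by the T_STRING check.
    @val.is_a?(String) ? @val.length : nil
  end
end

puts RmagState.new("hello").strlen     # 5
puts RmagState.new(12).strlen.inspect  # nil
```

The difference is that an instance variable is tracked by the garbage collector automatically, while memory wrapped with Data_Wrap_Struct is only looked after through the mark and free functions we supply.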

In the rmag_strlen function we did something that already exists in Ruby, String#length, so why don't we use it? We just need to replace the return statement that computes the length with a call to that method:

return rb_funcall(*val, rb_intern("length"), 0);

Can you imagine how powerful this is for extension development? It means that a value we hold in C can be treated as a Ruby object, in this case a String, and that we can call any method available in the Ruby core library on it.
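The closest pure-Ruby relative of rb_funcall is send, which also looks a method up by name at run time. The sketch below is an analogy rather than the actual mechanism; call_by_name is a hypothetical helper, and the conversion to a Symbol stands in for rb_intern.

```ruby
# Hypothetical helper mirroring rb_funcall(obj, rb_intern(name), argc, ...).
def call_by_name(obj, name, *args)
  method_id = name.to_sym      # rough counterpart of rb_intern(name)
  obj.send(method_id, *args)   # rough counterpart of rb_funcall
end

puts call_by_name("hello", "length")           # 5
puts call_by_name("hello", "center", 9, "*")   # **hello**
```

The third argument of rb_funcall is the number of arguments that follow it, which is why the call in rmag_strlen passes 0.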

Where to go from here?

With this simple extension, you now have an overview of how to write a basic Ruby extension in C. I would also suggest that you go through the code of some simple Ruby extensions written in C, which will make things clearer. For those who want to keep track of the development of Ruby and interact with the community, I would suggest joining the mailing list or reading about it at http://redmine.ruby-lang.org

Resources

New Features in Ruby 1.9
Hpricot
Postgres
Ruby Issue Tracking System