derkarl.org

Exposing C++ classes to C

2016-10-01 15:53:39 UTC

Let’s say you have some legacy software in C. You want to start enhancing it by writing new code in C++, or maybe you already have some C++ API that you want to access from C. You can’t just instantiate the classes and call the functions from C because you can’t even #include the C++ header files. What’s a coder to do?

This article shows you how to do this both easily and elegantly.

Our C++ API

Let’s define our example C++ API. In this case, it’s a class that can serialize values:

class Serializer
{
public:
    // keep our serialized data in an internal buffer, accessible by asString()
    Serializer();

    // this object has private fields (not shown here) that we need to destroy
    ~Serializer();

    // access the serialized stream
    std::string asString() const;

    // serialize a string
    void write(const std::string &);

    // serialize an int
    void write(int);
};

Rewrite our C++ API in C

Abstractly speaking, our C++ class can be instantiated, we can call a few functions on it, and finally destroy it. Let’s write the same API in C using conventional patterns.

First thing, we need a type representing the object. This could be a struct, but we can’t store anything in our struct, so why bother? We could use just a void*, but since that type can be automatically converted to any other pointer, we would lose type safety in C. Instead, we will create an opaque data type:

typedef struct serializer serializer;

What we’re doing is declaring a structure called serializer (differs by capitalization from the C++ version for no good reason), but since we’re not actually defining it, a C coder cannot instantiate it, do sizeof on it, nor dereference it. That’s fine, because we’ll only be using pointers to this type.
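To make the type-safety argument concrete, here is a minimal sketch, checked from the C++ side with type traits. The second handle type, parser, is hypothetical and exists only for contrast. C++ is stricter than C about converting from void*, so the sketch only checks the conversions that behave the same way in both languages.

```cpp
#include <type_traits>

typedef struct serializer serializer; // the article's opaque handle
typedef struct parser parser;         // hypothetical second opaque handle

// Any object pointer converts silently to void*, so an API that took void*
// would happily accept a parser* where a serializer was meant.
constexpr bool void_api_accepts_wrong_handle =
    std::is_convertible<parser*, void*>::value;

// Pointers to two distinct incomplete structs do not convert to each other,
// so an API that takes serializer* rejects a parser* at compile time.
constexpr bool typed_api_accepts_wrong_handle =
    std::is_convertible<parser*, serializer*>::value;

static_assert(void_api_accepts_wrong_handle,
              "void* accepts any object pointer");
static_assert(!typed_api_accepts_wrong_handle,
              "distinct opaque handles are not interchangeable");
```

Both struct types stay incomplete throughout, which is all the calling code ever needs.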

Let’s use this datatype and declare the rest of our API: constructor, destructor and accessors included. We put this all in its own C header file so that a C compiler can #include it:

#ifndef SERIALIZER_C_H
#define SERIALIZER_C_H

#ifdef __cplusplus
// We want our C API to be accessible by C++ as well!
extern "C"
{
#endif

typedef struct serializer serializer;

serializer* create_serializer(void); // (void), because empty parentheses in C mean "unspecified arguments"
void destroy_serializer(serializer*);

// access the serialized stream, as a null-terminated string
// (you must free the return value once you are done with it)
char *serializer_as_string(const serializer*);

// serialize a string
void serializer_write_string(serializer*, const char *);

// serialize an int
void serializer_write_int(serializer*, int);

#ifdef __cplusplus
}
#endif

#endif

Implementing our C API

Now that we have a C API, we need to implement it. But we implement our C API in C++!

Object construction

The C code only ever sees a pointer to the opaque datatype. Since a data pointer in C++ has the same size and representation as a data pointer in C on the same platform, we can take a pointer from C++ and hand it to C as a pointer to the opaque type. Let’s implement our C API constructor knowing this fact (and note that we’re implementing it in C++!):

extern "C" serializer* create_serializer()
{
    // For added exception safety, we could use unique_ptr with its release() function here
    Serializer *const s = new Serializer;
    return reinterpret_cast<serializer*>(s);
}

Since serializer* is a pointer, and new Serializer is a pointer, and since we can guarantee that the C code won’t try to dereference the pointer, it doesn’t matter what the serializer* actually points to from the point-of-view of the C code. That’s why the reinterpret_cast is allowed, and it’s also why we’ll be able to reinterpret_cast back to our C++ Serializer* again later.
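The comment in the constructor mentions unique_ptr and release(). Here is a minimal sketch of what that could look like. The Serializer body is a hypothetical stand-in (the real class has private fields that aren’t shown), and the extra write() call is an invented setup step, there only to give the function something that might throw between the new and the return.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the article's Serializer.
class Serializer
{
public:
    std::string asString() const { return buffer_; }
    void write(const std::string &s) { buffer_ += s; }
private:
    std::string buffer_;
};

extern "C"
{
typedef struct serializer serializer;
serializer* create_serializer();
void destroy_serializer(serializer*);
}

extern "C" serializer* create_serializer()
{
    // unique_ptr owns the object while the rest of the setup runs; if
    // anything throws before release(), the Serializer is deleted for us.
    std::unique_ptr<Serializer> s(new Serializer);
    s->write("header"); // hypothetical setup step that might throw
    // release() gives up ownership without deleting; the C caller owns it now.
    return reinterpret_cast<serializer*>(s.release());
}

extern "C" void destroy_serializer(serializer* _self)
{
    delete reinterpret_cast<Serializer*>(_self);
}

// Usage: every create_serializer() is paired with one destroy_serializer().
static void usage_example()
{
    serializer* s = create_serializer();
    assert(s != nullptr);
    destroy_serializer(s);
}
```

This only prevents the leak. The exception itself still has to be caught before it reaches C code, which the Exceptions section deals with.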

Object methods

Since we know serializer* is a pointer, and since we know that it actually points to a C++ Serializer object, we can cast it right back in the rest of our functions:

extern "C" void destroy_serializer(serializer* _self)
{
    Serializer *const self = reinterpret_cast<Serializer*>(_self);
    delete self;
}

The _self parameter behaves like the this parameter in C++, but since this is a keyword, we instead use a “synonym” of this (stolen shamelessly from Python).

The serializer_as_string function is a little bit more complex because we need to convert our C++ type (std::string) to a C type (char*). Furthermore, since Serializer::asString was declared const, we can make that part of our C API as well by having the “object parameter” be const:

extern "C" char *serializer_as_string(const serializer* _self)
{
    const Serializer *const self = reinterpret_cast<const Serializer*>(_self);

    const std::string s = self->asString();
    // we documented this function to require its return value to be freed,
    // here's the corresponding malloc:
    char *const result = static_cast<char*>(malloc(s.length()+1)); // one more for null termination
    memcpy(result, s.c_str(), s.length()+1);
    return result;
}

The above function is the most complex of the remaining three, so I’ll only bother to implement one more:

// serialize a string
extern "C" void serializer_write_string(serializer* _self, const char *str)
{
    Serializer *const self = reinterpret_cast<Serializer*>(_self);

    // we can trivially construct an std::string from str.
    // in this case, there's even an implicit constructor, but I'm being explicit for clarity
    self->write( std::string(str) );
}

You get to do the last function:

extern "C" void serializer_write_int(serializer* _self, int v)
...  // It's almost exactly like the above function!

Using our new C API:

Now that we have a C API, how do you use it? Well, as it turns out, you use it just like any other C API:

serializer* s = create_serializer();
serializer_write_string(s, "Hello World");
serializer_write_int(s, 42);
destroy_serializer(s);
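For a version you can compile in one file, here is a minimal end-to-end sketch. It also reads the result back and frees it, as the header comment requires. The Serializer body is a hypothetical stand-in: the article never specifies the output format, so the ";" separator is invented for illustration. The serializer_write_int shown is just one possible answer to the exercise above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the article's Serializer; the format is invented.
class Serializer
{
public:
    std::string asString() const { return buffer_; }
    void write(const std::string &s) { buffer_ += s + ";"; }
    void write(int v) { buffer_ += std::to_string(v) + ";"; }
private:
    std::string buffer_;
};

extern "C"
{
typedef struct serializer serializer;
serializer* create_serializer(void);
void destroy_serializer(serializer*);
char *serializer_as_string(const serializer*);
void serializer_write_string(serializer*, const char *);
void serializer_write_int(serializer*, int);
}

extern "C" serializer* create_serializer(void)
{
    return reinterpret_cast<serializer*>(new Serializer);
}

extern "C" void destroy_serializer(serializer* _self)
{
    delete reinterpret_cast<Serializer*>(_self);
}

extern "C" char *serializer_as_string(const serializer* _self)
{
    const std::string s = reinterpret_cast<const Serializer*>(_self)->asString();
    char *const result = static_cast<char*>(std::malloc(s.length() + 1));
    if (result) // a null return tells the caller that malloc failed
        std::memcpy(result, s.c_str(), s.length() + 1);
    return result;
}

extern "C" void serializer_write_string(serializer* _self, const char *str)
{
    reinterpret_cast<Serializer*>(_self)->write(std::string(str));
}

extern "C" void serializer_write_int(serializer* _self, int v)
{
    reinterpret_cast<Serializer*>(_self)->write(v);
}

// Apart from the std::string used to hand back the result, these are the
// same calls a C caller would make.
static std::string usage_example()
{
    serializer* s = create_serializer();
    serializer_write_string(s, "Hello World");
    serializer_write_int(s, 42);
    char *text = serializer_as_string(s);
    assert(text != nullptr);
    const std::string copy(text);
    std::free(text);        // the caller owns the returned buffer
    destroy_serializer(s);
    return copy;
}
```

With this stand-in format, usage_example() returns "Hello World;42;".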

Linking your C program

A program that contains any C++ must be linked against the C++ runtime library. In the Unix world, the simplest way to do that is to use g++ or clang++ to drive the link step, instead of gcc, clang, or the low-level linker, ld.

Exceptions

Error handling isn’t easy, but it has to be done. I like C++ exceptions a lot, but obviously that’s not an option in C. If your C++ API throws exceptions, you must catch them in the implementation of your C API before they can get to C code. Note that a C++ exception must never “fall through” C code: neither language defines what happens in that case, so weird things might happen. Upon catching an exception, you have to translate it to something C can understand.

Let’s imagine the entire C++ Serializer API uses exceptions. In this case, it seems that the only reason the constructor could fail is if new fails due to a memory shortage. We can be a little lazy here and just have the C constructor function return a null pointer if that occurs:

extern "C" serializer* create_serializer()
try // use the oft-forgotten "function-try block" syntax
{
    Serializer *const s = new Serializer;
    return reinterpret_cast<serializer*>(s);
}
catch (...)
{
    return nullptr;
}

Destructors aren’t allowed to throw, so we can do nothing there.

The other functions might report errors in various ways, and you’ll have to decide what convention is appropriate. Using Linux-kernel-style error handling, let’s create a new serializer_as_string: this time the result is passed back through a parameter and the function returns an error code (zero on success, a negative errno value on failure). That’s an API style I highly advise against in C++, but it is conventional in C:

// Note that we're accepting a pointer to a pointer to the "return value"
extern "C" int serializer_as_string(const serializer* _self, char **result)
try
{
    const Serializer *const self = reinterpret_cast<const Serializer*>(_self);

    const std::string s = self->asString();

    try
    {
        *result = static_cast<char*>(malloc(s.length()+1)); // one more for null termination
        if (not *result)
        { // malloc failed
            throw std::bad_alloc();
        }
        memcpy(*result, s.c_str(), s.length()+1);
    }
    catch (...)
    {
        // cleanup upon any error
        free(*result);
        *result = nullptr;
        throw; // let our outer catches report the error correctly
    }
    return 0; // no error
}
catch (std::bad_alloc &)
{
    return -ENOMEM;
}
// (any other necessary catch types go here)
catch (...)
{
    return -EINVAL; // generic error
}

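Here we wrapped an extra try-catch block around the allocation in order to be fully exception safe. In this specific case, it was unnecessary: the only thing that can throw inside it is our own bad_alloc, and that happens only when malloc has returned a null pointer, so there is nothing to free.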
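The same ladder of catch clauses would have to be repeated in every function of the C API. One way to avoid that is to put it in a helper, sketched below. The names translate_exceptions and serializer_write_int_checked are hypothetical, and the stand-in Serializer that rejects negative values is invented so that there’s something to catch.

```cpp
#include <cassert>
#include <cerrno>
#include <new>
#include <stdexcept>

// Hypothetical helper: runs body() and turns any exception into a negative
// errno-style code, following the convention used above.
template <typename Body>
int translate_exceptions(Body body) noexcept
{
    try
    {
        body();
        return 0; // no error
    }
    catch (const std::bad_alloc &)
    {
        return -ENOMEM;
    }
    // (any other necessary catch types go here)
    catch (...)
    {
        return -EINVAL; // generic error
    }
}

// Hypothetical stand-in Serializer that throws on negative input.
class Serializer
{
public:
    void write(int v)
    {
        if (v < 0)
            throw std::invalid_argument("negative value");
        last_ = v;
    }
    int last() const { return last_; }
private:
    int last_ = 0;
};

typedef struct serializer serializer;

// Hypothetical error-reporting variant of serializer_write_int.
extern "C" int serializer_write_int_checked(serializer* _self, int v)
{
    Serializer *const self = reinterpret_cast<Serializer*>(_self);
    return translate_exceptions([&] { self->write(v); });
}
```

Each wrapper then shrinks to a cast and a single call, and the mapping from exception types to error codes lives in one place.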

Conclusion

We use the fact that pointers between C and C++ are interchangeable as long as you don’t try to dereference them as the wrong type to create a nice clean C API from a C++ API.

Addenda

Here’s some changes I made based on reader feedback:
