Python and C interaction: Part II - Ctypes
We have discussed previously about the ubiquity of ctypes to speed up Python code. A word of caution is that we do not really speed up Python code: we are calling C code from Python. This difference that looks almost trivial is paramount to understand this approach. What do we do every time we need to call C code? We link to a library. That is what we are going to do in this case, instead of running a routine in Python, we will offload it to a C library. Remember that there are two types of libraries: static and dynamic. To actually offload through a Python call it is evident we will need a dynamic library, since the callings will be solved in runtime (there is no compilation time in Python).
We will build a very small library that does some math operation in scalar and vectors, split in two files called add_two.c
and arrays.c
.
Scalars
/* file: add_two.c */
float add_float(float a, float b) {
return a + b;
}
int add_int(int a, int b) {
return a + b;
}
int add_float_ref(float *a, float *b, float *c) {
*c = *a + *b;
return 0;
}
int add_int_ref(int *a, int *b, int *c) {
*c = *a + *b;
return 0;
}
Arrays
/* file: arrays.c */
int add_int_array(int *a, int *b, int *c, int n) {
int i;
for (i = 0; i < n; i++) {
c[i] = a[i] + b[i];
}
return 0;
}
float dot_product(float *a, float *b, int n) {
float res;
int i;
res = 0;
for (i = 0; i < n; i++) {
res = res + a[i] * b[i];
}
return res;
}
We build the dynamic library with
$ gcc -c -fPIC arrays.c
$ gcc -c -fPIC add_two.c
$ gcc -shared arrays.o add_two.o -o libmymath.so
And check that the library has all the symbols defined
$ nm -n libmymath.so
...
0000000000000730 T add_float_array
00000000000008a0 T dot_product
00000000000008d0 T add_float
00000000000008e0 T add_int
00000000000008f0 T add_float_ref
0000000000000900 T add_int_ref
...
Working with Scalars
Integers
With the dynamic library built, we now proceed to communicate it with Python. That is the work of ctypes. The ctypes library is just the definition of the usual types in C (int, float, double…) and a dynamic loader. How do we use these functions in Python? Let’s take, for example, add_int
. We just need to load with ctypes the library we built and call the function:
>>> import ctypes as C
>>> math = C.CDLL('./libmymath.so')
>>> math.add_int(3, 4)
7
And there we are! We added succesfully two integers.
Floats
What if we try to add two floats?
>>> math.add_float(3, 4)
0
What happened here? The function from the library is interpreting the inputs as floats, yet we did never tell python that we are passing floats to it. A naïve solution would be to just pass 3.0
and 4.0
and parameters, but it fails horribly:
>>> math.add_float(3.0, 4.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
We cannot pass lightly any other parameter that Python itself could have worked out. Remember that now we are calling a C function, so all the duck typing magic that Python does cannot help us now. We need to verbosely say we are giving C floats. We need to use the types defined by C.
>>> math.add_float(C.c_float(3.0), C.c_float(4.0))
2
Almost there. The only missing thing is that we need Python to interpret the result as a float (how would Python know better otherwise?)
>>> math.add_float.restype = C.c_float
>>> math.add_float(C.c_float(3.0), C.c_float(4.0))
7.0
Of course, writing C.c_float
every time we want to call this function isn’t the cleanest solution. There is actually a much cleaner way to let Python know that the function will always take C.c_float
as arguments:
>>> math.add_float.restype = C.c_float
>>> math.add_float.argtypes = [C.c_float, C.c_float]
>>> math.add_float(3, 4)
7.0
And now the function can be called just as we would call any other Python function.
By reference
When calling a function as a reference (although might seem odd for scalars at first) the notation is much more cumbersome, since we need to effectively pass a memory position. We can ask for the memory position of a variable with the function byref
:
>>> three = C.c_int(3)
>>> four = C.c_int(4)
>>> res = C.c_int()
>>> math.add_int_ref(C.byref(three),
C.byref(four),
C.byref(res))
0
>>> res.value
7
However, there is an advantage: since the arguments are always memory positions (and therefore integers1), it works immediately for any type of pointer
>>> three = C.c_float(3)
>>> four = C.c_float(4)
>>> res = C.c_float()
>>> math.add_int_ref(C.byref(three),
C.byref(four),
C.byref(res))
0
>>> res.value
7.0
Now the notation cannot be cleaned easily, but we could write a wrapper function:
def add_int_ref_python(a, b):
a_c = C.c_float(a)
b_c = C.c_float(b)
res_c = C.c_float()
math.add_int_ref(C.byref(a_c), C.byref(b_c), C.byref(res_c))
return res.value
And there we have it, a C call completely transparent for the end user.
Arrays
With the experience of handling by-ref calls to C in scalars, handling arrays should not be particularly difficult: it is simply a by-ref call pointing to the first element of the array. The problem is, as usual, the memory management. It is good practice to manage the memory in Python, but this raises the question: how do we allocate an array in Python?
>>> in1 = (C.c_int * 3) (1, 2, -5)
>>> in2 = (C.c_int * 3) (-1, 3, 3)
>>> out = (C.c_int * 3) (0, 0, 0)
>>> math.add_int_ref(C.byref(in1),
C.byref(in2),
C.byref(out),
C.c_int(3))
>>> out[0], out[1], out[2]
(0, 5, -2)
Numpy Arrays
There is another approach. We already have a whole ecosystem to work with arrays: NumPy. A NumPy array is a lot of metadata (like size, shape, type) and a pointer to the first position on memory. We can access that memory position of a NumPy array:
>>> import numpy as np
>>> intp = C.POINTER(C.c_int)
>>> in1 = np.array([1, 2, -5], dtype=C.c_int)
>>> in2 = np.array([-1, 3, 3], dtype=C.c_int)
>>> out = np.zeros(3, dtype=C.c_int)
>>> math.add_int_ref(in1.ctypes.data_as(intp),
in2.ctypes.data_as(intp),
out.ctypes.data_as(intp),
C.c_int(3))
>>> out
array([ 0, 5, -2], dtype=int32)
Two things to be told here: a) we don’t need to actually define the type as a pointer to integer, we can directly use c_void_p
; b) the output is a NumPy array, which we can readily use in other NumPy functions. We can also wrap this function to use transparently as a Python function.
Conclusions
This covers most of the communication between C and Python through functions, which would be suitable if we programmed in Python in a C-like style. The next part of this series will be dedicated to an object oriented (therefore more pythonic) way to use ctypes.
There is however a pointer to void type,
c_void_p
, and we can create a pointer to any other type with the functionPOINTER(type)
. ↩
Leave a Comment