November 11, 2022 by Tobias Hunger

Rust and C++ Interoperability

In this blog post, I want to explore both directions of integration between Rust and C++ and present some tools we use in Slint.


This blog post is based on a presentation I gave at EuroRust 2022 in Berlin. Slides are available, as is the video recording.

Here at Slint we work on a UI toolkit written in Rust. A UI toolkit is useful for languages and ecosystems beyond the one it was written in, so Slint comes with C++ and even JavaScript APIs. Those APIs must of course feel fully native to developers using those languages. For this reason we have a strong interest in how to provide native-feeling APIs for Rust code to users in the C++ world.

Slint can (optionally) make use of existing C++ code to integrate into the different operating system environments. This includes topics like widget styling, accessibility, and more. This is why we also care about exposing existing C++ code to the Rust world.

If you need an open source C or C++ library in your Rust project, have a look at crates.io or lib.rs first: maybe somebody else has already done the work for you.

For readers with a C++ background

As a Rustacean I use "safe" in the Rust sense: Code is safe if the Rust compiler has made sure all the properties needed to enforce memory safety are met. As the Rust compiler can not parse C++ code and check those properties there, all C++ code is unsafe by definition. This doesn't mean that "unsafe" C++ code triggers undefined behavior or does invalid memory accesses, just that it could.

You don't need to know Rust for this post, but one concept you will run into is Rust macros. They are different from C macros. A Rust macro is a function written in Rust that accepts a stream of tokens as input and produces a stream of tokens as output. The compiler runs this function at compile time whenever it encounters the macro in code, passing in the current stream of tokens and replacing it with the generated stream. This mechanism makes for powerful macros that are still "hygienic": They won't change the meaning of code around them.
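
As a rough illustration (this is not code from Slint, and the shout macro is made up): a procedural macro is just a Rust function from tokens to tokens, living in a crate that sets proc-macro = true in its Cargo.toml.

    use proc_macro::TokenStream;
    
    // A function-like procedural macro: the compiler calls this function at
    // compile time and replaces every `shout!(...)` invocation with its output.
    #[proc_macro]
    pub fn shout(input: TokenStream) -> TokenStream {
        // Render the incoming tokens as text, upper-case them, and hand back
        // a single string literal token in their place.
        let text = input.to_string().to_uppercase();
        format!("{:?}", text).parse().unwrap()
    }

At a call site, shout!(hello world) then compiles as if the programmer had written the string literal "HELLO WORLD".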

Language level integration

Let's first look at language level integration: How to make Rust call code written in C++ and the other way around.

The Rust compiler can not understand C++ code. This makes it necessary to tell the Rust compiler about code you want to use on the C++ side. A bit of glue code is needed: Language bindings. Bindings define functions and data types available on the C++ side in a way that the Rust compiler can understand. Once a binding is available, Rust code can use those bindings to call code on the C++ side. The same is of course also true in the other direction: The C++ compiler also needs language bindings to tell it about code available on the Rust side.

This means you can not mix and match C++ and Rust code, but need defined interfaces to cross from one language into the other.

Challenges

All we need to do is to generate some bindings and everything is smooth sailing from there on out. How hard can that be?

There are a number of challenges:

  • The two languages we want to map to each other have very different concepts. Rust has a different macro system than C++. C++ has inheritance, while Rust uses a system of traits instead (and these two concepts do not map directly to each other). Rust has lifetimes, something foreign to C++. C++ templates and Rust generics address similar problems, but approach them differently. All these mismatches make it hard to map between the two languages.
  • Rust does not have a defined Application Binary Interface (ABI): This means the Rust compiler is free to change how it represents data types or function calls in the binary output it generates. Of course that makes it challenging to exchange data in binary form. The situation on the C++ side isn't too different: The ABI is compiler defined. This is why you can not mix libraries generated with MSVC and GCC. The least common denominator is the C foreign function interface (FFI). It provides a stable binary interface, but it also limits the interface to what can be expressed in the C programming language. Despite this limitation, the C FFI is the backbone most inter-language communication (not only between Rust and C++) is built upon.
  • Both languages have data types to express concepts like strings of text, but the internal representation of these data types differs. For example, both languages offer a way to represent a dynamic sequence of elements of the same type stored next to each other: std::vector in C++ or Vec in Rust. Both define a vector as a pointer to some memory, a capacity, and a length. But what type does the pointer have? How does the data pointed to need to be aligned in memory? What type represents capacity and length? In which sequence are pointer, capacity, and length stored? Any mismatch in these or other details makes it impossible to map one language's type to the other language's conceptually similar type.
  • Even if the data structures happen to match: Different languages may have different requirements on the data stored in them. For example, a string needs to be valid UTF-8 in Rust, while to C++ it's just a sequence of bytes - the programmer surely knows which encoding to use. This means it's always safe to pass a string from Rust to C++ (assuming all the little details about the string type in the standard libraries happen to match), but passing a string from C++ to Rust might trigger a panic (see the sketch below).
  • Another problem comes in the form of inline code. Such code isn't directly callable with just the binary. Instead it's inserted wherever it is used. This requires the compiler to be able to compile the code in question: The Rust compiler can obviously not inline C++ code, and neither can the C++ compiler inline Rust code. This is a widely used technique: In C++, all templates are effectively inline code.

All this makes it hard to generate bindings to mediate between Rust and C++.
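
To make the string example concrete, here is a small Rust sketch (independent of any particular binding tool) of what happens when bytes that C++ happily stores in a std::string reach the Rust side:

    fn main() {
        // Bytes a C++ std::string will happily hold, but which are not valid UTF-8.
        let bytes: &[u8] = &[b'f', b'o', b'o', 0xff];
    
        // The checked conversion reports the problem as an error value;
        // blindly calling unwrap() on it is exactly the panic mentioned above.
        match std::str::from_utf8(bytes) {
            Ok(text) => println!("valid UTF-8: {}", text),
            Err(error) => println!("invalid UTF-8: {}", error),
        }
    }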

Automatic binding generation

In an ideal world no bindings are needed. This is not possible for the combination of Rust and C++, so let's look at the next best thing: generating bindings automatically from existing Rust source files or C++ header files. This is what automatic binding generation is about.

Even though it's hard to create good language bindings automatically, it's still valuable to have generators. They get you started. There are options for both directions: Making Rust code available to C++ as well as the other way around.

The most widely used binding generators are bindgen and cbindgen.

bindgen

Bindgen parses header files and generates Rust bindings. This works well for C code, but is not perfect for C++ code. By default bindgen skips any construct it can not generate bindings for. This way it produces as many bindings as it can.

In practice bindgen needs configuration to work for any real world C++ project. You will include and exclude types as needed, or mark types as opaque: This means they can be passed from C++ to Rust and back, but the Rust side can not interact with those types in any way. You might also need to add C(++) helper functions that enable access to functionality not visible to bindgen by default.
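
As a sketch of what that configuration can look like in a build.rs (the header path, the mylib_ prefix, and the opaque type are made-up examples):

    // build.rs of a hypothetical mylib-sys crate
    fn main() {
        let bindings = bindgen::Builder::default()
            .header("vendor/mylib.h")             // C/C++ header to parse
            .allowlist_function("mylib_.*")       // only expose this API surface
            .opaque_type("mylib_internal_state")  // pass around, never inspect
            .generate()
            .expect("failed to generate bindings");
    
        let out_dir = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_dir.join("bindings.rs"))
            .expect("failed to write bindings");
    }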

Typically bindgen is used to generate a low level crate (for C++ users: A library in a package manager) with a name ending in -sys. -sys crates tend to be full of unsafe calls into the C or C++ library they wrap.

Since Rust is all about building safe wrappers around unsafe code, you typically write another crate with safe wrappers around the -sys crate, which then drops the -sys suffix from its name.

Note that the process isn't unlike how C++ developers provide safe wrappers around C libraries. Of course the -sys-level is not needed there as C++ can just consume the C headers directly.
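
As a sketch of that layering, assuming a made-up mylib-sys crate whose generated mylib_version() returns a NUL-terminated C string:

    // In the safe "mylib" crate, wrapping the generated "mylib-sys" bindings.
    pub fn version() -> String {
        // SAFETY: in this hypothetical library, mylib_version() is documented
        // to return a valid, NUL-terminated string that lives forever.
        unsafe {
            std::ffi::CStr::from_ptr(mylib_sys::mylib_version())
                .to_string_lossy()
                .into_owned()
        }
    }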

cbindgen

Cbindgen covers the other direction: It parses Rust code and generates C or C++ headers from it.

Cbindgen looks at code a developer has specifically marked as compatible with the C FFI, for example types carrying the #[repr(C)] attribute and extern "C" functions.

Typically developers create a module (often called ffi) in their Rust project and collect all the #[repr(C)] types and extern "C" functions they want to expose in this module. This process isn't unlike how C++ developers write a C-level interface to their C++ code.
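
A minimal sketch of such an ffi module (the names are made up), together with roughly what cbindgen emits for it:

    // src/ffi.rs
    #[repr(C)]
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }
    
    #[no_mangle]
    pub extern "C" fn point_length(p: Point) -> f64 {
        (p.x * p.x + p.y * p.y).sqrt()
    }
    
    // cbindgen generates a header along the lines of:
    //   struct Point { double x; double y; };
    //   double point_length(Point p);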

When to use binding generators

Binding generators work best when you have code with a stable interface in one language and want to make that code available to the other language. Typically the code exists in the form of a library.

This is how we use binding generation in Slint: We generate bindings from our stable Rust API. We then extend the generated code on the C++ side to make it nicer to interact with from C++, (partially) hiding the generated code behind a hand-crafted facade.

How to use binding generators

Binding generators can be run once and have the generated bindings put under version control. This only works reliably though for code with very stable interfaces.

Preferably, though, binding generators run at build time as part of the normal build. This does of course require integration into the build system of choice.
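
With cbindgen, for example, such a build-time run can be a few lines of build.rs (the output path and the C++ language choice are just examples):

    // build.rs of the Rust library whose API is exposed to C++
    fn main() {
        let crate_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
    
        cbindgen::Builder::new()
            .with_crate(crate_dir)
            .with_language(cbindgen::Language::Cxx)
            .generate()
            .expect("failed to generate C++ header")
            .write_to_file("include/mylib.h");
    }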

Semi-automatic binding generation

Semi-automatic binding generation works by having one custom piece of code or configuration define the interface between the two languages. This is then turned into bindings for both Rust and C++, on top of an automatically generated C FFI interface hidden between them.

The advantage is that more abstraction on top of the C FFI interface is possible, making the generated bindings more comfortable to use.

The cxx crate

A popular option is the cxx crate. Other options exist and either build on top of cxx or offer similar functionality.

cxx promises safe and fast bindings.

The safety is limited to the bindings themselves: The code called through those bindings is of course still unsafe. This is a nice property nonetheless, as you can be sure that the generated code isn't introducing problems of its own. You can concentrate on debugging the "other side" of the bindings instead of looking into the generated code.

To ensure the bindings' safety, cxx generates static assertions and checks function and type signatures.

To keep the bindings fast, cxx makes sure no data is copied in the binding - nor is there any conversion. This leads to types from one language bleeding into the other. For example, a std::string on the C++ side turns into a CxxString in Rust. This makes the generated bindings feel somewhat foreign to developers.

What does this look like? You need a module in your Rust code that defines both sides of the interface. Here is an example taken from the documentation of cxx:

          
          
    #[cxx::bridge]
    mod ffi {
        struct Metadata {
            size: usize,
            tags: Vec<String>,
        }
    
        extern "Rust" {
            type MultiBuf;
    
            fn next_chunk(buf: &mut MultiBuf) -> &[u8];
        }
    
        unsafe extern "C++" {
            include!("demo/include/blob_store.h");
    
            type Client;
    
            fn new_client() -> UniquePtr<Client>;
            fn put(&self, parts: &mut MultiBuf) -> u64;
        }
    }
          
        
  1. You need to mark the module with #[cxx::bridge]. This triggers a Rust macro to process this code. Inside the module (called ffi in this case), data types available to both C++ and Rust get defined.
  2. An extern "Rust" section is next. This lists types and functions defined on the Rust side that should be exposed to C++. cxx notices that the first argument to next_chunk is a mutable reference to the MultiBuf data type. It models MultiBuf as a class on the C++ side and makes next_chunk a member function of that class.
  3. An unsafe extern "C++" section defines data types and functions available on the C++ side, which should be usable from Rust. cxx looks for information relevant to Rust here: You need to express lifetime information as well as whether a function is safe to call or not. In this case both new_client and put are declared safe. This information is relevant for the Rust side but has no effect on the C++ code that gets wrapped. A small usage sketch follows right after this list.
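
On the Rust side, the generated items are then used like ordinary Rust code. A minimal sketch (the upload function is made up; MultiBuf is the Rust type behind the bridge's type MultiBuf declaration):

    fn upload(buf: &mut MultiBuf) -> u64 {
        // new_client() returns a cxx::UniquePtr<Client> wrapping the C++
        // std::unique_ptr; put() is the C++ member function declared above.
        let client = ffi::new_client();
        client.put(buf)
    }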

When to use cxx?

It works best when you control both sides of the API, for example when you want to factor out some code from an existing C++ implementation into a new library written in Rust. cxx is ideal here since it defines a matching set of bindings and the C FFI interface between them in one go.

Don't generate bindings

A third option is to use the cpp crate in Rust to write C++ code inline. Let's look at a (shortened) Rust member function notify, taken from the Slint source code:

          
          
    fn notify(&self) {
      let obj = self.obj;
      cpp!(unsafe [obj as "Object*"] {
        auto data = queryInterface(obj)->data();
        rust!(rearm [data: Pin<&A11yItemData> as "void*"] {
          data.arm_state_tracker();
        });
        updateA11y(Event(obj));
      });
    }
          
        

When I first saw this in Rust code, it blew my mind. What does this piece of code do?

  1. A local variable obj is created, holding a copy of the member variable obj (of type &c_void).
  2. The cpp! macro (all callable macros in Rust end in `!`) processes all the code till the closing parenthesis at the end of the notify function.

    This macro implicitly declares an unsafe C++ function returning void, which takes one argument called obj of type Object*. The macro expects obj to be defined in the surrounding Rust code. The body of this C++ function is the code between the curly braces.

  3. While in the C++ world, we interact with obj to extract some information, which we then store into a local variable data. This data is of course only visible inside the C++ function we have just defined implicitly. The surrounding Rust code can not see it.
  4. In the next line we use the rust! (pseudo-)macro. This switches back into the Rust language.
  5. This rust! macro creates another (Rust) function called rearm, which takes an argument data of type Pin<&A11yItemData>. This argument must exist in the surrounding C++ code, and we expect it to have the type void* there. We need to give type definitions for both C++ and Rust here, as the cpp crate can unfortunately not find the type on the C++ side. The body of that Rust function will contain data.arm_state_tracker(); and will return nothing. The macro also creates the necessary bindings to call the new rearm function from C++. Once the rust! pseudo-macro has generated this code, it replaces itself with C++ code calling the rearm function through the generated C++ bindings.
  6. Back in the C++ function created by the cpp! macro, we have some more C++ code, updateA11y(Event(obj));, and reach the end of the body of the implicitly created C++ function. Once the cpp macro has generated all its code, it replaces itself with a call to the C++ function it generated, via the Rust binding it created for it.

After all the macros are expanded, we have two new functions generated, including the necessary bindings to call them. The final notify function seen by the Rust compiler is just the definition of the obj variable followed by a call to some binding taking this obj as argument.

This approach doesn't avoid the generation of bindings, so the title of this section is misleading. It handles a big part of the binding generation implicitly. Of course you still need to generate bindings for data types you want to access in both Rust and C++. The cpp crate has more macros to help with that.

How does this work?

The macros shipped by the cpp crate do generate all the code. You do need build system integration to build and link the generated C++ code.
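
With the cpp crate, that integration is a small build.rs using the companion cpp_build crate (the path is the crate's root source file):

    // build.rs
    fn main() {
        // Extracts the C++ snippets from the cpp! invocations in src/lib.rs,
        // compiles them, and tells cargo to link the result in.
        cpp_build::build("src/lib.rs");
    }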

When to use the cpp crate?

In Slint we use the cpp crate to interact with C++ GUI toolkits that have a stable API. It works great for this use case.

Summary

You have a wide range of options to integrate C++ and Rust code, but you always need to generate language bindings. This indirection avoids a tight coupling between the languages and opens up more design space for Rust to explore, but it also makes a seamless integration of Rust and C++ impossible.

Build system integration

Once you have a project that combines Rust and C++ code, you need to build both the Rust and the C++ parts, and merge both together into one consistent binary. Let's take a short look at what's necessary to build a cross-language project.

cargo, the official Rust build system, is the only supported way to build Rust code. You already have a build system for your C++ code. Typically that build system isn't trivial, so don't try to reimplement it in cargo. Integrate the two build systems with each other instead.

Let's start by looking at Cargo.

Cargo

Having Cargo as the main build tool driving your project build is great if you have a little C++ code in a bigger Rust context. The typical use case is generating bindings around C and C++ code.

Cargo can run arbitrary code at build time. It looks for a file called build.rs next to the Cargo.toml file. If a build.rs file exists, cargo builds and executes this file in the build process. The build.rs file can inform the rest of the build process by printing instructions to cargo on stdout. Check the cargo documentation for details.

build.rs is normal Rust code and may use any crate specified as a build dependency in the Cargo.toml file!

When working with C and C++ code, the cc crate is interesting. It allows you to drive a C or C++ compiler from within build.rs. This is ideal for building a few simple files. For bigger C or C++ projects you probably want to run the project's build system directly. The cmake crate comes in handy here. It drives the typical CMake configure, build, install workflow and reports the results back so that build.rs can expose them to cargo afterwards.
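
A sketch of both approaches in one build.rs (the file name and the biglib project are placeholders):

    // build.rs
    fn main() {
        // Compile a couple of C++ helper files directly with the cc crate...
        cc::Build::new()
            .cpp(true)
            .file("src/helpers.cpp")
            .compile("helpers");
    
        // ...or configure, build, and install a whole CMake project. The cmake
        // crate returns the install directory so we can emit link instructions.
        let dst = cmake::build("vendor/biglib");
        println!("cargo:rustc-link-search=native={}", dst.join("lib").display());
        println!("cargo:rustc-link-lib=static=biglib");
    }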

Other build systems have similar support crates, or can be driven via lower level crates that run arbitrary commands, like xshell.

CMake

I use CMake as one example of a build system widely used for C and C++ projects. Similar support is available for other build tools; some even claim to support Rust natively -- often by running the Rust compiler directly, which is not supported by the Rust project!

Using the existing C++ build system to drive the entire build is ideal when you have a little Rust code in a bigger C++ project. A typical use case is replacing some small part of a project with code written in Rust or using a Rust library.

The Corrosion project provides cargo integration for CMake. A simple CMakeLists.txt file building an example Rust library and linking to it would look like this:

          
          
    cmake_minimum_required(VERSION 3.15)
    project(MyCoolProject LANGUAGES CXX)
    
    find_package(Corrosion REQUIRED)
    
    corrosion_import_crate(MANIFEST_PATH rust-lib/Cargo.toml)
    
    add_executable(cpp-exe main.cpp)
    target_link_libraries(cpp-exe PUBLIC rust-lib)
          
        
  1. You start out with the usual two lines of any CMake project, defining the minimum CMake version required to build the project, followed by the project name and the programming languages CMake needs to set up. Note that you don't mention Rust there.
  2. The find_package(Corrosion REQUIRED) line asks CMake to include the Corrosion support and fail if it isn't found. You could also use FetchContent to download Corrosion as part of your build instead.
  3. Now that corrosion is available, you can ask it to build Rust code using corrosion_import_crate, pointing it to an existing Cargo.toml file. Corrosion builds this Rust project and exposes all build targets to CMake.
  4. The last two lines in the example build a C++ executable and link it against the Rust library (see the Cargo.toml excerpt right after this list).
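
One detail worth calling out (an assumption about the example's rust-lib crate, which is not shown above): for CMake to link the Rust code into a C++ executable, the crate needs to be built as a static or shared library rather than as a plain Rust library. That is set in its Cargo.toml:

    # rust-lib/Cargo.toml (excerpt)
    [lib]
    crate-type = ["staticlib"]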

Slint uses the Corrosion project to enable C++ developers to use the Slint library in C++ code without having to bother with Rust too much.

I hope this gives you a good starting place for your project integrating C++ and Rust code - or at least that you found some option you weren't aware of before. Please feel free to reach out with questions in the discussion on GitHub.

Slint is a declarative GUI toolkit to build native user interfaces for desktop and embedded applications written in Rust, C++, or JavaScript. Find more information at https://slint.dev/ or check out the source code at https://github.com/slint-ui/slint