Today I learned — episode 3 (strong typing in C++ vs. Rust)

November 4, 2016

This blog series is supposed to cover topics in software development, learnings from working in software companies, tooling, but also private matters (family, baby, hobbies).

Strong typing in Rust and comparison to C++

C++ enthusiast Arne Mertz recently wrote a post titled Use Stronger Types!, which immediately sounded like an appealing idea to me. Take a look at his article; for a tl;dr, I recommend looking at its suggestion of strong typedefs and the links to libraries implementing such constructs/macros.

My inclination towards the Rust programming language, along with my own experience with related C++ constructs (and attempts to use stronger typing in work projects), prompted me to research how the two languages compare and what other simple options exist. As a matter of fact, some of the findings below come from a draft presentation I had already prepared, which compares C++ with Rust (with the goal of finding out where C++ could improve). This article explains possible alternatives in C++, a suggested solution that is very explicit, and how one can achieve something similar in Rust.

Terminology for code samples

As I work for the payment service provider PPRO, my examples come from the financial sector. Let me quickly introduce a few relevant terms.

The term PAN (primary account number) essentially means a credit card number. The full PAN may never be stored on disk ("at rest") without encryption, for instance in a log file, nor leave the protected environment, and it is subject to many more security restrictions demanded by the PCI-DSS data-security standard (PCI = Payment Card Industry). Masked PANs are the ones that can be displayed outside of a PCI environment. For example, if you purchase a product with your credit card (number 1234569988771234), that number may be stored in encrypted form within a PCI-compliant environment. However, it may only leave that environment as a masked PAN, that is, showing at most the first six digits (BIN = bank identification number) and the last four digits (a non-identifying fraction of the customer’s card number). At your next purchase, the merchant can offer to let you pay with the same card again, displaying the masked number 123456XXXX1234.

C++ typedef is not strong typing

We want to ensure that full PANs cannot be converted directly to masked PANs (among other reasonable restrictions). Look at this beginner’s attempt to define separate types:

#include <cstdint>
#include <iostream>
#include <string>

typedef std::string MaskedPan;
typedef std::string FullPan;
// Assuming PANs never start with a zero, we can stuff them into an integer type
typedef uint64_t MaskedPanU;
typedef uint64_t FullPanU;

int main()
{
    {
        FullPan full = "1234569988771234";
        MaskedPan masked = full;
        std::cout << "Masked (string): " << masked << std::endl;
    }

    {
        FullPanU full = 1234569988771234;
        MaskedPanU masked = full;
        masked += full; // even this works
        std::cout << "Masked (integer): " << masked << std::endl;
    }
}

Well, that compiled just fine and led to a fatal bug — we just logged a full PAN to stdout. That action shouldn’t have been possible. You don’t want to sit through and pay for those two extra weeks in the next credit card audit, not to mention the cleanup to get the sensitive data out of the way!

Wrapping the value in an enum (covered below) does create distinct types, but it is ugly and does not read like English prose, so it’s probably not a good idea in general.

The C++ standard gives a simple explanation:

A typedef-name is thus a synonym for another type. A typedef-name does not introduce a new type the way a class declaration (9.1) or enum declaration does.

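Or in other words, typedef A B; is no different in this use case from using B = A;. In fact, the standard specifies that an alias-declaration has the same semantics as a typedef; the only practical difference is that using can also declare alias templates. Fortunately, right in that quotation we have a proposed solution: declare a new type with struct/class/enum.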
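To see the quoted rule in action, the following minimal sketch asks the compiler itself. The type names mirror the example above, and leakFullPan is a hypothetical helper: static_assert with std::is_same confirms that a typedef and an alias-declaration both name the very same type, which is why the silent copy at the end compiles.

```cpp
#include <string>
#include <type_traits>

typedef std::string FullPan;    // classic typedef
using MaskedPan = std::string;  // C++11 alias-declaration

// Both names denote exactly std::string, hence each other
static_assert(std::is_same<FullPan, std::string>::value, "typedef declares a synonym");
static_assert(std::is_same<MaskedPan, std::string>::value, "using declares a synonym");
static_assert(std::is_same<FullPan, MaskedPan>::value, "hence no protection whatsoever");

// Consequently, "masking" a full PAN compiles as a plain copy (hypothetical helper)
inline MaskedPan leakFullPan(const FullPan& full) { return full; }
```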

C++ strong typing with enum

While I wouldn’t recommend using an enum for this scenario, it apparently has its strong typing benefits:

#include <cstdint>
#include <iostream>
#include <string>

enum class MaskedPanE : uint64_t {};
enum class FullPanE : uint64_t {};

auto maskPan(FullPanE full) -> MaskedPanE
{
    // Take first six and last four digits of full (optionally test that full
    // is at least 10 digits if not guaranteed by other code)
    const auto fullPan = std::to_string(static_cast<uint64_t>(full));
    const auto maskedPan = std::stoull(fullPan.substr(0, 6) + fullPan.substr(fullPan.size() - 4));
    return static_cast<MaskedPanE>(maskedPan);
}

int main()
{
    FullPanE full = static_cast<FullPanE>(1234569988771234);

    // Now we have strong typing :) This gives
    //   error: cannot convert 'FullPanE' to 'MaskedPanE' in initialization
    // MaskedPanE masked = full;

    MaskedPanE masked = maskPan(full);

    // Additional benefit: outputting only possible with explicit cast
    std::cout << "Masked (enum): " << static_cast<uint64_t>(masked) << std::endl;
}
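The compile errors mentioned in the comments can also be written down as compile-time checks. This sketch reuses the enum names from the example and relies only on standard type traits: it asserts that the two scoped enums convert neither into each other nor implicitly to or from uint64_t, while an explicit static_cast still round-trips the value.

```cpp
#include <cstdint>
#include <type_traits>

enum class MaskedPanE : std::uint64_t {};
enum class FullPanE : std::uint64_t {};

// Scoped enums are distinct types: no implicit conversion between them...
static_assert(!std::is_convertible<FullPanE, MaskedPanE>::value,
              "a full PAN must not silently become a masked PAN");
// ...nor to or from their underlying integer type
static_assert(!std::is_convertible<FullPanE, std::uint64_t>::value,
              "no implicit conversion to the underlying type");
static_assert(!std::is_convertible<std::uint64_t, FullPanE>::value,
              "no implicit conversion from the underlying type");
static_assert(std::is_same<std::underlying_type<FullPanE>::type, std::uint64_t>::value,
              "the underlying type is still uint64_t");

// Explicit casts keep working, so every escape hatch is visible in code review
constexpr FullPanE examplePan = static_cast<FullPanE>(1234569988771234ULL);
static_assert(static_cast<std::uint64_t>(examplePan) == 1234569988771234ULL,
              "explicit round trip preserves the value");
```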

C++ strong typing of strings

The std::string case is actually a no-brainer: strings are ubiquitous in the business logic of most companies. They are used for

  • money amounts (differing formats, decimal and thousands separators, rounding, precision)

  • file paths, filenames

  • numbers

  • binary data, but also text of varying encodings (C++ and Unicode is a different story altogether 😜)

  • data types which are incompatible or should be semantically distinct (such as full and masked PANs in our scenario)

  • maaaaaany more use and abuse cases all around the globe

We have to create a new struct or class to have disjoint string-based data types.

#include <cstdint>
#include <stdexcept>
#include <iostream>
#include <string>

class StringBasedType
{
    const std::string _s;
public:
    explicit StringBasedType(const std::string& s): _s(s) {}
    auto str() const -> const std::string& { return _s; }
};

// Randomly using `struct` keyword here, could as well be `class X: public StringBasedType`
struct MaskedPan: StringBasedType
{
    explicit MaskedPan(const std::string& s): StringBasedType(s)
    {
        // Check input value: require 123456XXXX1234 format
        if (s.size() != 14 ||
            s.substr(0, 6).find_first_not_of("0123456789") != std::string::npos ||
            s.substr(6, 4) != "XXXX" ||
            s.substr(10).find_first_not_of("0123456789") != std::string::npos)
        {
            throw std::invalid_argument{"Invalid masked PAN"};
        }
    }
};

struct FullPan: StringBasedType
{
    explicit FullPan(const std::string& s): StringBasedType(s)
    {
        // Check input value based on assumptions
        if (s.size() < 13 || s.find_first_not_of("0123456789") != std::string::npos)
            throw std::invalid_argument{"Invalid full PAN"};
    }

    auto getMasked() const -> MaskedPan
    {
        const auto& s = str();
        // Use assumptions about string size and content from `MaskedPan` constructor
        return MaskedPan{s.substr(0, 6) + "XXXX" + s.substr(s.size() - 4)};
    }
};

int main()
{
    try
    {
        FullPan full = FullPan("1234569988771234");

        // This fails to compile because no such converting constructor exists
        //   error: conversion from 'FullPan' to non-scalar type 'MaskedPan' requested
        // MaskedPan masked = full;

        MaskedPan masked = full.getMasked();

        // Outputting only possible with explicit `str()` - more visible in a code review!
        // If you're calling `str()` all the time, you're probably misusing strong typing.
        std::cout << "Masked (string-based): " << masked.str() << std::endl;

        return 0;
    }
    catch (const std::exception& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
        return 1;
    }
}

Summarizing this solution (one of many):

  • Everything is explicit:

    • conversion to raw string value (str()) and thus the ability to output or compare the value

    • conversion to disjoint type (must add constructor or method like getMasked)

    • construction of specific type (use explicit constructors)

  • Exactly one place for input validation/assertion

  • Can be adapted to other base types as well, not only strings

  • Operators must be defined manually. This can be an advantage, for instance, if the base type (here: string) can be compared for equality/order, but ordering does not make sense for the specific type (here: PAN). BOOST_STRONG_TYPEDEF(BaseType, SpecificType) is an example implementation which defines operators for you.
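To illustrate the last two points, here is a minimal sketch of a generic wrapper. It is my own illustration, not BOOST_STRONG_TYPEDEF and not the approach from Arne Mertz’s article: the hypothetical StrongType template takes an empty tag type so that each instantiation is a distinct type, works for any base type, and deliberately defines only equality. The tag names are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical generic wrapper: `Tag` exists only to make each instantiation a distinct type
template <typename T, typename Tag>
class StrongType
{
    T _value;
public:
    explicit StrongType(T value): _value(std::move(value)) {}
    auto get() const -> const T& { return _value; }

    // Only equality is defined on purpose; `<`, `>` etc. simply don't exist
    friend bool operator==(const StrongType& a, const StrongType& b) { return a._value == b._value; }
    friend bool operator!=(const StrongType& a, const StrongType& b) { return !(a == b); }
};

// Empty tag types; the names are illustrative only
struct MaskedPanTag {};
struct CustomerIdTag {};

using MaskedPan = StrongType<std::string, MaskedPanTag>;
using CustomerId = StrongType<unsigned long long, CustomerIdTag>; // non-string base type

// Same wrapper, different tags: distinct types; construction must be explicit
static_assert(!std::is_same<MaskedPan, StrongType<std::string, CustomerIdTag>>::value,
              "tags distinguish otherwise identical wrappers");
static_assert(!std::is_convertible<std::string, MaskedPan>::value,
              "no implicit construction from a raw string");
```

With this, MaskedPan{"123456XXXX1234"} == MaskedPan{"123456XXXX1234"} compiles, whereas comparing two masked PANs with < is a compile error, exactly because no ordering was defined.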

Strong typing wrappers are not (and will presumably never be) an inherent part of C++. Still, the solutions above proved simple enough for the mentioned use cases. It’s up to developers to decide whether they write a few lines of wrapper code to be very explicit, or choose a library which does the same thing.

Comparison with Rust

Rust has the same notion as C++'s aliasing typedef:

type Num = i32;

which has the same problem (a Num is freely interchangeable with any other i32), so there’s no need to repeat that topic.

Syntactically, Rust provides a very lightweight way of creating new types in order to achieve strong typing — tuple structs:

struct FullPan(String);
struct MaskedPan(String);

fn main() {
    let full = FullPan("1234569988771234".to_string());

    // Fails to build with
    //   error[E0308]: mismatched types
    //   expected struct `MaskedPan`, found struct `FullPan`
    // let masked: MaskedPan = full;

    let masked = MaskedPan("123456XXXX1234".to_string());
    println!("Masked (tuple struct): {}", masked.0);

    // Oops, no input validation: we can pass a full PAN value without getting an error
    let masked2 = MaskedPan("1234569988771234".to_string());
    println!("Masked2 (tuple struct): {}", masked2.0);
}

Well, that didn’t help much… Admittedly, tuple structs are more helpful for other use cases, such as struct Point(f32, f32), where it’s clear that X and Y coordinates are meant. A rule of thumb: if you have to give the tuple fields a name to understand them, or you require input validation at construction time, don’t use a tuple struct. Remember that Rust doesn’t throw exceptions; fallible construction returns a Result (or panics). In the above example, there’s not even a constructor involved that could return an error on invalid input.

Let’s replicate what we did in C++:

#[derive(Debug)]
struct FormatError { /* ... */ }

// Rust has no implementation inheritance, i.e. we cannot "derive" from a base type
struct MaskedPan {
    value: String,
}

impl MaskedPan {
    pub fn new(value: &str) -> Result<Self, FormatError> {
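        // Check for ASCII first so that the byte-index slicing below cannot
        // panic on multi-byte characters
        if value.len() != 14 || !value.is_ascii() ||
           value[..6].find(|c: char| !c.is_digit(10)).is_some() ||
           &value[6..10] != "XXXX" ||
           value[10..].find(|c: char| !c.is_digit(10)).is_some() {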
            Err(FormatError {})
        } else {
            Ok(MaskedPan { value: value.to_string() })
        }
    }

    pub fn as_str(&self) -> &str {
        &self.value
    }
}

struct FullPan {
    value: String,
}

impl FullPan {
    pub fn new(value: &str) -> Result<Self, FormatError> {
        if value.len() < 13 || value.find(|c: char| !c.is_digit(10)).is_some() {
            Err(FormatError {})
        } else {
            Ok(FullPan { value: value.to_string() })
        }
    }

    pub fn get_masked(&self) -> MaskedPan {
        // Since we already checked the `FullPan` value assumptions, we can call
        // `unwrap` here because, knowing the `MaskedPan` implementation, we can
        // be sure `new` will not fail.
        MaskedPan::new(&format!("{}XXXX{}",
                                &self.value[..6],
                                &self.value[self.value.len() - 4..]))
            .unwrap()
    }
}

fn main() {
    match FullPan::new("1234569988771234") {
        Ok(full) => {
            let masked = full.get_masked();
            println!("Masked (string-based): {}", masked.as_str())
        }
        Err(_) => println!("Invalid full PAN"),
    }
}

Should I use strong typing everywhere?

This question seems to be mostly language-independent, and to some extent a matter of taste. In my experience, there are ups and downs:

Yay:

  • Safety from mistakes, especially if they can lead to horrific problems like in the credit card scenario, where full PANs could be leaked to the outside or written to disk if types are confused.

  • Code using the strong types may become more readable (as in: reading English prose) as things get spelled out explicitly

  • User-defined literals can make code even more concise, but that only applies to code which uses a lot of constants. To be honest, I’ve never had a project where those literals would be worthwhile.
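As an illustration of the literal point, a minimal sketch of a user-defined literal (available since C++11) that produces a strong type. SpeedKmh and the _kmh suffix are hypothetical names chosen for this example.

```cpp
#include <type_traits>

// Hypothetical strong type for a speed in kilometers per hour
struct SpeedKmh
{
    double value;
};

// Literal operators for integer (220_kmh) and floating-point (220.5_kmh) literals
constexpr SpeedKmh operator""_kmh(unsigned long long v) { return SpeedKmh{static_cast<double>(v)}; }
constexpr SpeedKmh operator""_kmh(long double v) { return SpeedKmh{static_cast<double>(v)}; }

// The unit is now part of the type, not of a variable name or a comment
constexpr SpeedKmh topSpeed = 220_kmh;
static_assert(topSpeed.value == 220.0, "integer literal form");
static_assert(std::is_same<decltype(180.5_kmh), SpeedKmh>::value, "floating-point literal form");
```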

Nay:

  • Much extra typing and explicit definition of operators/actions

  • Avoid using strings all over the place by picking existing dedicated types, and you will have fewer problems from the start. For example, there’s boost::filesystem::path for file paths.

  • No real benefit for structures which probably never change and have well-named fields. To prevent mistakes in the order of constructor arguments, use POD structs and C++ designated initialization (a compiler syntax extension, borrowed from C99). Rust also has such a syntax, and additionally gives build errors if you forget to initialize a field. The builder pattern is a similar alternative (though not really beautiful). Stupid example:

// C++
struct CarAttribs
{
    float maxSpeedKmh; // kilometers per hour
    float powerHp; // horsepower
};

class Car
{
public:
    explicit Car(const CarAttribs& a) { /* ... */ }
};

int main()
{
    auto car = Car{{.maxSpeedKmh = 220, .powerHp = 180}};

    // Unfortunately that syntax doesn't prevent unspecified fields (no compiler warning):
    // powerHp is silently zero-initialized
    auto car2 = Car{{.maxSpeedKmh = 220}};
}
// Rust
struct CarAttribs {
    max_speed_kmh: f32, // kilometers per hour
    power_hp: f32, // horsepower
}

struct Car { /* ... */ }
impl Car {
    fn new(attribs: &CarAttribs) -> Self {
        Car{ /* ... */ }
    }
}

fn main() {
    let car = Car::new(&CarAttribs {
        max_speed_kmh: 220.0,
        power_hp: 180.0,
    });

    // This fails to build with
    //   error[E0063]: missing field `power_hp` in initializer of `CarAttribs`
    // let car2 = Car::new(&CarAttribs { max_speed_kmh: 220.0 });
}
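For comparison, the builder pattern mentioned in the list above could be sketched in C++ as follows. CarAttribsBuilder and its setter names are hypothetical. Note the trade-off: a forgotten field only surfaces at run time when build() throws, whereas Rust reports E0063 at compile time.

```cpp
#include <cassert>
#include <stdexcept>

struct CarAttribs
{
    float maxSpeedKmh; // kilometers per hour
    float powerHp;     // horsepower
};

// Hypothetical builder: each setter names its field explicitly, and build()
// refuses to produce attributes with a field that was never set
class CarAttribsBuilder
{
    CarAttribs _attribs{0.0f, 0.0f};
    bool _hasMaxSpeed = false;
    bool _hasPower = false;
public:
    CarAttribsBuilder& maxSpeedKmh(float kmh)
    {
        _attribs.maxSpeedKmh = kmh;
        _hasMaxSpeed = true;
        return *this;
    }

    CarAttribsBuilder& powerHp(float hp)
    {
        _attribs.powerHp = hp;
        _hasPower = true;
        return *this;
    }

    // Unlike Rust's E0063, a missing field is only detected at run time
    CarAttribs build() const
    {
        if (!_hasMaxSpeed || !_hasPower)
            throw std::logic_error{"CarAttribs: not all fields were set"};
        return _attribs;
    }
};
```

A call then reads CarAttribsBuilder{}.maxSpeedKmh(220).powerHp(180).build(), which is explicit but noticeably more verbose than a designated initializer.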

In the end, you must decide case by case. Often, the declaration of functions or types invites human error, so before switching to strong typing, you should first consider whether the order of parameters, names of fields, choice of constructor(s), et cetera are sane, consistent in their meaning (a money amount shouldn’t be 123 cents in one place but the decimal string 1.23 elsewhere), and follow the principle of least surprise and smallest risk of mistakes.
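To make the money example concrete, a minimal sketch of one consistent representation: a hypothetical Cents type stores amounts as integer cents everywhere, and conversion to a decimal string happens in exactly one function. It assumes a currency with exactly two decimal places, which does not hold for every currency.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical strong type: money amounts are always integer cents internally
struct Cents
{
    std::int64_t value;
};

// The only place that knows how to render cents as a decimal string.
// Assumes a currency with exactly two decimal places.
std::string toDecimalString(Cents amount)
{
    // Negating the minimum value would overflow
    assert(amount.value != std::numeric_limits<std::int64_t>::min());
    const bool negative = amount.value < 0;
    const std::int64_t magnitude = negative ? -amount.value : amount.value;

    std::string fraction = std::to_string(magnitude % 100);
    if (fraction.size() < 2)
        fraction.insert(0, "0"); // 5 cents -> "05"

    return (negative ? "-" : "") + std::to_string(magnitude / 100) + "." + fraction;
}
```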

There’s also no clear winner between the languages: since strong typing is not a built-in feature in either of them, you must roll your own or use a library. That isn’t exactly elegant, but it is still readable.