Safety, guaranteesLast week: Out of Python
Our goal last week was to get more into the C++
and CMake world because we wanted stronger guarantees to reduce risk of programming bugs. Python has dynamic typing, duck typing, and overall lack of first-class support for strict typing. But C++
has strict typing, and you certainly can't make a variable without defining its type, so that should help us prevent lots of would-be bugs.
Developing a game will get complex very quickly, and I believe that managing complexity to maintain target growth is one of the largest barriers to success in a software project. Many games that fail to scale while keeping bugs at a minimum end up with poor reviews on Steam. The goal is to reduce mental burden at every step of the way. In terms of project management, this includes:
- Reduce branching factor of directories (smaller number of files and folders within any given folder) so that you can quickly navigate and find what you're looking for. In contrast, Cataclysm has 819 files in a single folder (this has absolutely nothing to do with the fact that it's
C++
; I was simply looking for inspiration in an open-source C++
-based project; C++
is certainly capable of having different organizational structure than this)
- Smaller scope for variables so it's easier to reason about what they do. In
C++
, when you make protected data within a class, you're implicitly making a kind of "global" scope since any subclass can modify that data, potentially leading to bugs
- Minimize or eliminate side-effects (if you have to make a side-effect as a hack to fix something, you're already beyond screwed and it's time to rearchitect and refactor major portions of your codebase)
- None of those pesky old pointer bugs or memory leaks
- Provable safety through mathematically oriented checks
This week: Into C++
Searching for Safety in C++
Past exposure
Aside from the promises of high performance and a lot of support in the game dev industry, my exposure to C and C++
has been through a cybersecurity lens about the horrendous security vulnerabilities due to dangling pointers, use-after-free, and all kinds of exciting pointer- and memory-related issues. Let's make sure we follow best practices so we don't fall prey to those same issues. Let's turn our safety and security scanners up to 11 so we can focus on developing a game; not chasing bugs.
Expectation
Considering that C++
has been out for 39 years, they must have figured out most of the problems by now. My compiler and toolchain will protect me from bugs, right? And considering that I've been on a FOSS binge lately, there must be a community-driven, single-source-of-truth resource out there that tells us how to code C++
correctly today.
Search
I looked for a solution in the open source community. In particular, something like an online book that says "This is the correct way to solve this common programming problem today, given the features now available in C++
, and how to enforce it in your compiler." Something that is actively maintained. The more contributors and activity, the better. Hopefully 100+ contributors, 5,000 commits, 10,000 stars; something like that.
But the only resources I could find that seemed trustworthy and updated were books from publishers, which are cost prohibitive for me right now, and https://www.learncpp.com/ which is excellent and teaches you how to write safer code, but still certainly can't offer you any guarantees. With a language as big, seemingly healthy, and beloved as C++
, shouldn't there be a big red button somewhere that's easy for newcomers to find and use and says "enable safe C++ only"? There are plenty of resources, but it seems overly complicated for someone who's just beginning; I'm hoping to start writing safe code sooner rather than later. I'm experienced and can learn all that stuff, but my goal is to reduce mental burden so we can focus on building a game.
I found myself on what seemed to be a side path reading a very academic discussion spanning multiple Github repositories about the language used to describe ""
-based includes and <>
-based includes, a fundamental concept in software development that allows us to split our code into smaller, more understandable chunks and share our code with others. Different compilers have different behavior for ""
and <>
in terms of how and what you can do with them and the order in which they are searched for within your machine. The result is watery, imprecise language surrounding these topics that leads to somewhat comical side-effects such as naming it "the thing manipulated with -I
" or "the <>
search" or "relative file versus angle-bracket include search path". If we can't even figure out what something as simple as this is called, how can we expect to navigate the ocean filled with unclear language?
Someone there mentioned something called the C++
Core Guidelines.. written by the man himself, Bjarne Stroustrup!
C++ Core Guielines
C++ Core Guidelines
It seems we may have found the answer; the secret to getting our safe code! Let's begin reading.
Following the rules will lead to code that is statically type safe, has no resource leaks, and catches many more programming logic errors than is common in code today.
Excellent! We're off to a great start.
The rules emphasize static type safety and resource safety. For that reason, they emphasize possibilities for range checking, for avoiding dereferencing nullptr
, for avoiding dangling pointers, and the systematic use of exceptions (via RAII). Partly to achieve that and partly to minimize obscure code as a source of errors, the rules also emphasize simplicity and the hiding of necessary complexity behind well-specified interfaces.
These are precisely all the things we want! Where do we sign up?
We are uncomfortable with rules that simply state “don’t do that!” without offering an alternative. One consequence of that is that some rules can be supported only by heuristics, rather than precise and mechanically verifiable checks.
No mechanically verifiable checks.. only heuristics.. Okay. So clearly, there's going to be some burden on the developer to ensure they follow these rules.. Just make sure you learn and follow the rules?? Am I getting this right?
Some rules aim to increase various forms of safety while others aim to reduce the likelihood of accidents, many do both. The guidelines aimed at preventing accidents often ban perfectly legal C++.
Yes, I'm perfectly happy to ban legal C++
if it improves safety. I want to get started with writing C++
already! But.. "aim to reduce the likelihood"?.. This language is a bit too open for me..
However, when there are two ways of expressing an idea and one has shown itself a common source of errors and the other has not, we try to guide programmers towards the latter.
Okay.. we try to guide the programmers.. as in those little toddler fences? Why can't I just disable those features with the click of a button or the installation of a single "The Correct Compiler" version 2024?
These rules are not meant to be read serially, like a book. You can browse through them using the links. However, their main intended use is to be targets for tools.
Okay.. these guidelines are meant to be implemented as tools.. and yet they can't verifiably provide safety.. and this guide isn't meant to be read by mere mortals.
What did we find?
C++
is a behemoth of backwards compatibility. It was designed from the ground up to be backwards compatible with C. This is part of its core philosophy from the beginning (so says Wikipedia).
When you first start using compilers to compile and run your code, you probably don't realize that what you're getting by default is a monster. It will let you use features from C++98 (as in 1998). As a newbie navigating various online resources, it's not immediately clear that the advice you're following might be using those same features. The same features that lead to memory bugs and pointer bugs and all those "fun" things we're trying to avoid. If you're fortunate enough to gain awareness of these issues, and are willing to do something about it, you must begin a personal journey to figure out how to disable them. Searching around the internet, you can find safer compiler settings to use, but it's not exactly reassuring that you have to search for this resource and explicitly tell your compiler every which way to not allow unsafe code (literally "your compiler", since every compiler will have a different set of safety settings).
One of the major purposes of getting off of Python was to get safer code; why do we need to tune and churn the compiler into submission? Why can't I just type gcc --safe main.cpp
? Better yet, why is the default not simply gcc main.cpp
(safe by default) and when you really need it, gcc --unsafe main.cpp
?
C++
has made significant strides towards safety each version since and including C++11 (2011). But compilers don't enforce it by default and it's not easy to find a recent up-to-date resource that isn't currently cost prohibitive which clearly guides you on setting all the flags and then how to use the safe features. Most resources explicitly state they will not prevent you from using old constructs. You have to piecemeal your own learning guide, while all the other newbies piecemeal theirs.
Failure to satisfy the constraints
So, although I found this great effort by the designer of C++
himself, there's no guarantees, there's no direct ability to turn off unsafe features, and you're not even supposed to read the guidelines directly. Although some brave souls might read this document and try to abide by it.. I even started reading a bit of it and learned quite a bit.. You can still make plenty of critical bugs because that's the way it's always been.
As a side note, the White House (of Washington D.C., US) issued a statement earlier this year essentially begging developers (or their employers) to stop using C++
in favor of memory-safe languages such as Rust. I'm not automatically saying that I agree with them; it's just an interesting side point that highlights how far up the food chain these problems have reached that the President of the United States has to say something about it. There's also frequently quoted research by Microsoft and Google that 70% of all security bugs are related to memory safety issues, particularly in C and C++
. If Microsoft and Google are still using toddler fences to try to "guide" their $100,000-per-year developers to avoid those types of bugs, I don't trust that there's any safe configuration in C++
.
And into Rust
My girlfriend has already started learning Rust; multiple friends of mine have shown interest in Rust with some of them actively learning it; it's been out for 9 years; it's provably safer than C and C++
and there are whole classes of common bugs in C++
that you can't even make in Rust if you tried. You don't have to tell the compiler not to let those bugs happen; it simply won't let you compile that code!
- How to start a new project?
cargo new my_project; cd my_project
- How to run the code?
cargo run
- How to add a dependency?
cargo add rand
- How to add a dependency directly from Github if it's not on https://crates.io?
cargo add --git https://github.com/not-fl3/macroquad.git macroquad
# this is one possible media-/game-oriented library
- How to compile and run with the new dependencies?
cargo run
- How to build a release-ready compiled binary?
cargo build --release
The main concern raised by newcomers to Rust is the steep learning curve of the borrowing system. However, they have an amazing free online book with quizzes that carefully introduces how to use it, with visualizations of the stack and heap, variables, and pointers. It's still challenging, but isn't any new programming concept challenging? After all, once you figure this out, you'll never make another dangling pointer bug...
After spending about 1.5 weeks down the C++
rabbit hole and finding the associated communities, it was a tough decision to switch off, but not as tough of a decision if you consider the tens, hundreds, or even thousands of hours of my life that could be wasted chasing down memory issues in C++
. I'm excited to grow something big and complex, fun and engaging, without worrying about pointer problems and unnecessary memory issues. (I'm excited to tackle genuinely interesting memory-related challenges such as optimizing memory usage and minimizing defragmentation with strategies such as Object Pool.)
I only started working through the Rust book today, but I'm already just past the first section of ownership, 4.1 What is Ownership? Tl;dr:
- An object on the heap can only be owned by one variable at a time
- The lifetime of the object on the heap is tied to the lifetime of the variable that owns it
- When the owning variable goes out of scope, so does the object on the heap, and it gets cleaned up and freed back to the memory pool
There's also references and borrowing, which I'll most likely talk about (at least briefly) next week. I expect to be able to get through most or all of the 20 chapters by the end of next Friday.
I haven't completely written off C++
forever. The world is built on C and C++
. The majority of my tech stack from the Linux kernel to the windowing system are mostly in C and C++
. There's also an initiative to make C++
safer, which is aiming to do something like I was previously searching for: a compiler that simply won't compile a variety of unsafe constructs. However, despite C++
approaching 40 years of age, it's only an initiative and it's not out yet (it's dated this month, 2024-09-11).
The Future
I'm excited to begin with a memory-safe, non-garbage-collected, high performance language that will scale with my game. I think there's a lot of bright years ahead of us!
Thank you for tuning in this week!
Side Notes
This is extra junk I wanted to talk about but didn't want to clutter up the main content too much.
I wrote an entire section on CMake that didn't make it into this post. Tl;dr: CMake isn't just a text-based configuration; it's an entire LANGUAGE. You don't just have to learn C++
, you have to learn CMake too. And the CMake world is as confusing and messy as C++
.
C++
ians seem to believe that struggling through the challenges with memory bugs and learning how to prevent them is a rite of passage, not something to be completely avoided through early training, rigorous practices, and compiler settings. But I don't want to be unknowingly planting landmines all over my codebase that I have to struggle with later while I wade through the piles of resources just to learn the basics that I wish the community would present at the forefront. 40 years is a long time and the community is fragmented; any conversation over ""
- versus <>
-style includes is likely to spark debate that never concludes satisfactorily as there are many compilers with different behaviors.
A lot of the work-work available to me is in C++
. I haven't seen a single Rust-related task yet. As a result, I'll probably still be learning and using C++
; I just won't be trying to scale a codebase with it.
One thing that I found that I was really excited to share this week had I continued down the C++
path is Cling. Cling is essentially a REPL, which allows you to quickly type C++
code on the command line and test it out, reducing the time required to see the results of your code as you try, for example, static casts of data types to see the results. For fun, try:
#include <cmath>
// fits uint, sets every bit to 1
static_cast<uint>(pow(2, 8*sizeof(uint))-1)
// overflows uint, result is every bit set to 0
static_cast<uint>(pow(2, 8*sizeof(uint)))
Another super cool thing you can do is make a C++
script similar to how you can make a Shell, Bash, or Python script. If you have Cling installed, you can save the following to a file, such as unique_ptr.cling
, enable the executable bit using chmod u+x unique_ptr.cling
, and then run it from the command line using ./unique_ptr.cling
. This makes it easy to, for example, create lots of little samples and demos of C++
functionality.
#!/usr/bin/cling -std=c++17
// when you use "./unique_ptr.cling" on the command line, this line
// tells your shell to start this script with the /usr/bin/cling interpreter
// using the C++17 standard features
#include <memory>
#include <iostream>
class Texture {
public:
Texture(int val) : value(val) { /* Load texture here */ }
~Texture() { /* Free texture resources here */ }
int value = 5;
};
int main() {
std::unique_ptr<Texture> texture = std::make_unique<Texture>(5);
std::cout << "our texture has a value of: " << texture->value <<std::endl;
// No need to delete texture, it will be automatically destroyed when out of scope
return 0;
}
main();
You can still have memory issues in Rust, but they will more likely be associated with memory fragmentation as you request and release large portions of memory without carefully managing how they are laid out on the heap. If you're not careful about how you design your software, a user/attacker may be able to trigger it to use up memory more quickly than anticipated and result in a crash, but at least it will only lead to a crash. This is still an attack vector on availability that can lead to an opening for DoS attacks, but at least it won't be opening a hole that allows attackers to run arbitrary code and steal information from your computer such as banking credentials. In C++
, triggering a crash is the first hint to an attacker that they can probably exploit the vulnerability into something much scarier; in Rust, I hope (and will research) that a crash typically leads to just that: only a crash.
I didn't read a ton of the C++ Core Guidelines, but my favorite part that really suck out to me was Avoid protected
data and Minimize exposure of members. (Note "Avoid" protected data and "minimize exposure" of members, which continues my concerns; why can't I tell my compiler to entirely disallow protected data?..) This offers an excellent example showing the danger of protected data that implicitly becomes a sort of "global" variable that becomes modifiable by any subclass, making it difficult to reason about the scope and behavior of these variables. The guidelines have plenty to teach, whether you use C++
or not.