RVO. What You Always Wanted to Know But Were Afraid to Ask

Alina Stepina · September 17, 2023

Introduction

Return Value Optimization (RVO)! It’s one of those topics in the world of C++ that everyone nods knowingly about. We’ve all seen those examples where you count constructor calls. We have grown used to returning strings and vectors by value, soothing our conscience with this term. But is the copy the only thing that makes returning by value less efficient than passing by reference? In this blog post, I’m about to pull back the curtain and reveal the secret assembler magic behind returning values from functions.

Disclaimer:

In this blog post, our focus will be squarely on GCC (GNU Compiler Collection) version 11.4, run without any specific optimization flags. While other compilers and optimization settings certainly exist, delving into them all would be an expedition of colossal proportions.

Return Values: The Hidden Dance of Registers and Stacks

In the fascinating realm of assembly language, there are two methods for returning values from functions: the use of registers and the utilization of memory.

Return by register

One common way to return a value from a function in assembly is to use a register to hold the return value. The choice of register may depend on the calling convention for the architecture you’re working with. For example, in the x86 architecture, the EAX register is often used for returning values from functions.

mov eax, 42    ; Set the return value to 42 in EAX
ret            ; Return from the function
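
Which C++ types can travel back in registers depends on the ABI. As a rough illustration, here is a minimal sketch that assumes the System V x86-64 calling convention used by GCC on Linux, where a class type is a candidate for register return only if it is trivially copyable and no larger than 16 bytes. The helper `likely_returned_in_registers` and the `Point` struct are hypothetical names, and the check is a heuristic, not the full ABI classification.

```cpp
#include <array>
#include <string>
#include <type_traits>

// Hypothetical helper: a rough heuristic, assuming the System V x86-64 ABI.
// The real classification rules have more cases (e.g. long double).
template <typename T>
constexpr bool likely_returned_in_registers() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= 16;
}

// Hypothetical small aggregate: two ints, trivially copyable.
struct Point { int x; int y; };

static_assert(likely_returned_in_registers<int>(),
              "a plain int comes back in a register");
static_assert(likely_returned_in_registers<Point>(),
              "a small trivially copyable struct is a register candidate");
static_assert(!likely_returned_in_registers<std::array<double, 4>>(),
              "32 bytes exceed the 16-byte limit assumed here");
static_assert(!likely_returned_in_registers<std::string>(),
              "a non-trivial copy constructor rules out register return");
```

The checks run at compile time, so the file simply fails to build if one of the assumptions does not hold on the target platform.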

Return through memory

In some cases, you may need to return more complex data types or structures that cannot fit into a single register. In such cases, you can allocate memory for the return value, store the result in that memory, and pass a pointer to that memory location to the caller. The caller can then read the result from the memory location.

; Allocate memory for the return value
; This can vary depending on the platform and OS
push dword 4   ; Example: request 4 bytes for an integer

; Call a memory allocation function; the address comes back in EAX
call malloc
add esp, 4     ; cdecl: the caller removes the pushed argument

; Store the result in the allocated memory
mov dword [eax], 42   ; Store 42 at the memory location pointed to by EAX

; EAX already holds the pointer to the allocated memory,
; so it is returned to the caller as-is
ret
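
For C++ objects on x86-64, GCC does not call malloc for this. The caller reserves the space and passes its address to the callee as a hidden argument. The sketch below imitates that lowering by hand. `Big`, `make_big`, and `make_big_lowered` are hypothetical names, and the lowered form only approximates what the compiler generates.

```cpp
#include <cstddef>
#include <new>

// Hypothetical type: well over 16 bytes, so too large for register return.
struct Big {
    long data[8];
};

// What the programmer writes: return by value.
Big make_big(long seed) {
    Big b{};
    for (std::size_t i = 0; i < 8; ++i) {
        b.data[i] = seed + static_cast<long>(i);
    }
    return b;
}

// Roughly what the call is lowered to: the caller supplies the storage,
// the callee constructs the result there and hands the same address back.
Big* make_big_lowered(void* result_storage, long seed) {
    Big* b = new (result_storage) Big{};  // placement new into the caller's buffer
    for (std::size_t i = 0; i < 8; ++i) {
        b->data[i] = seed + static_cast<long>(i);
    }
    return b;
}
```

A caller of the lowered form would declare a suitably aligned buffer, for example `alignas(Big) unsigned char buf[sizeof(Big)];`, and pass `buf` as the first argument.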

Summary

It’s hardly a revelation that when it comes to these two functions

void int_byref(int& i) {
    i = 42;
}

int int_byval() {
    return 42;
}

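returning by value outperforms writing the result through a reference parameter: the value comes back in a register, while the reference forces a store to memory.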

full example

Back to practice

Returning to the main point: in the ongoing battle between returning by value and filling an argument passed by reference, it is worth pointing out that the creation of temporary objects isn’t the only factor at play. Another critical factor is where the memory is allocated. When the object is first created on the callee’s stack, space shared with the caller has to be set up and the object copied into it. This operation can be quite costly and is generally best avoided when possible.

The C++ standard allows compilers, in certain cases, to optimize the code so that direct-initialization is applied instead of copy-initialization (copy elision).

As in the example below:

#include <iostream>

class MyObject {
public:
    MyObject(int val) : value(val) {}
    int getValue() const { return value; }
private:
    int value;
};

MyObject createObject(int val) {
    MyObject obj(val);
    return obj; // The object is returned
}

int main() {
    MyObject newObj = createObject(42);
    std::cout << "Value: " << newObj.getValue() << std::endl;
    return 0;
}

In assembly:

_Z12createObjecti:
.LFB1735:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$32, %rsp
	movl	%edi, -20(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movl	-20(%rbp), %edx
	leaq	-12(%rbp), %rax          ; Compute the address where the MyObject will be constructed
	movl	%edx, %esi
	movq	%rax, %rdi               ; Pass the address to the MyObject constructor
	call	_ZN8MyObjectC1Ei         ; Call MyObject constructor
	movl	-12(%rbp), %eax          ; Load the result into %eax (return value)
	movq	-8(%rbp), %rdx
	subq	%fs:40, %rdx
	je	.L6
	call	__stack_chk_fail@PLT
.L6:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc

main:
.LFB1736:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	pushq	%rbx
	subq	$24, %rsp
	.cfi_offset 3, -24
	movq	%fs:40, %rax
	movq	%rax, -24(%rbp)
	xorl	%eax, %eax
	movl	$42, %edi
	call	_Z12createObjecti       ; Call createObject function
	movl	%eax, -28(%rbp)         ; Store the returned value in a local variable
	leaq	.LC0(%rip), %rax
	movq	%rax, %rsi

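Note that no copy constructor is called anywhere: the object is constructed once, and because MyObject is small and trivially copyable, GCC passes it back in the %eax register and stores it straight into the stack slot of newObj in main. For types that do not fit into registers, the caller instead passes the address of the memory reserved for the result, and the object is constructed directly there.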
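
Copy elision can also be observed from C++ rather than from the assembly, using the constructor-counting approach mentioned in the introduction. This is a minimal sketch. `Tracer`, `make_temporary`, and `make_named` are hypothetical names that do not appear in the example above.

```cpp
// Hypothetical instrumented type that counts copy and move constructions.
struct Tracer {
    static int copies;
    static int moves;
    int value;

    explicit Tracer(int v) : value(v) {}
    Tracer(const Tracer& other) : value(other.value) { ++copies; }
    Tracer(Tracer&& other) noexcept : value(other.value) { ++moves; }
};

int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a temporary: since C++17 the elision here is mandatory.
Tracer make_temporary(int v) {
    return Tracer(v);
}

// Returns a named local (NRVO): elision is permitted but not required.
// If the compiler skips it, the move constructor is used, not the copy.
Tracer make_named(int v) {
    Tracer t(v);
    t.value += 1;
    return t;
}
```

After `Tracer a = make_temporary(42);` and `Tracer b = make_named(42);`, `Tracer::copies` stays at zero in either case. With GCC's default settings `Tracer::moves` typically stays at zero as well, and the `-fno-elide-constructors` flag disables the optional elisions, which should make the move in `make_named` visible.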

To make this more practical: the compiler’s ability to optimize the code depends a lot on the type of object you are going to return. Here are several examples with popular types.

std::array

With std::array, returning by value with RVO outperforms passing by reference, but only by a very small margin.

std::array full example

std::vector

std::vector shows the opposite result. Pay attention to the difference in the order of magnitude of the numbers between the vector and the array.

std::vector full example
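
One plausible reason for the vector result is offered here as an assumption, since the linked benchmark is not reproduced in this text. A std::vector owns heap memory, so a function that fills a caller-provided vector can reuse its existing capacity across calls, while a function that returns a fresh vector has to allocate every time. A minimal sketch with the hypothetical functions `fill_byref` and `make_byval`:

```cpp
#include <cstddef>
#include <vector>

// By reference: reuses whatever capacity `out` already has.
void fill_byref(std::vector<int>& out, std::size_t n) {
    out.clear();  // drops the elements but keeps the capacity
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(i));
    }
}

// By value: builds a new vector, which needs a fresh heap allocation.
std::vector<int> make_byval(std::size_t n) {
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(static_cast<int>(i));
    }
    return result;
}
```

Calling `fill_byref` twice on the same vector with the same `n` leaves `data()` and `capacity()` unchanged on the second call, which is the saving a by-value version cannot get even when RVO removes the copy.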

Eigen

Eigen demonstrates behaviour that is close to that of std::array.

Eigen::VectorXd full example
