Introduction
Return Value Optimization (RVO)! It’s one of those topics in the world of C++ that everyone nods knowingly about. We’ve all seen the examples where you count constructor calls, and we’ve grown used to returning strings and vectors by value, soothing our conscience with this term. But is the copy the only thing that makes returning by value less efficient than passing by reference? In this blog post, I’m going to pull back the curtain and reveal the assembly magic behind returning values from functions.
Disclaimer:
In this blog post, our focus will be squarely on the GCC (GNU Compiler Collection) version 11.4, operating without any specific optimization flags. While other compilers and optimization settings certainly exist, delving into them all would be an expedition of colossal proportions.
Return Values: The Hidden Dance of Registers and Stacks
In the fascinating realm of assembly language, there are two methods for returning values from functions: the use of registers and the utilization of memory.
Return by register
One common way to return a value from a function in assembly is to use a register to hold the return value. The choice of register may depend on the calling convention for the architecture you’re working with. For example, in the x86 architecture, the EAX register is often used for returning values from functions.
mov eax, 42 ; Set the return value to 42 in EAX
ret ; Return from the function
Return through memory
In some cases, you may need to return more complex data types or structures that cannot fit into a single register. In such cases, you can allocate memory for the return value, store the result in that memory, and pass a pointer to that memory location to the caller. The caller can then read the result from the memory location.
; Allocate memory for the return value
; This can vary depending on the platform and OS
push dword 4 ; Example: request 4 bytes for an integer (cdecl argument)
call malloc ; Call a memory allocation function; the address comes back in EAX
add esp, 4 ; Remove the argument from the stack so ret finds the return address
; Store the result in the allocated memory
mov dword [eax], 42 ; Store 42 at the memory location pointed to by EAX
; The pointer is already in EAX, so it is returned as-is
ret
Summary
It’s hardly a revelation that when it comes to these two functions
void int_byref(int& i) {
i = 42;
}
int int_byval() {
return 42;
}
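returning by value outperforms writing through a reference parameter: the first puts 42 straight into a register, while the second has to store it to memory through the address it was given.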

Back to practice
Returning to the main point: in the ongoing battle between returning by value and writing through pass-by-reference arguments, it’s important to point out that the creation of temporary objects isn’t the only factor at play. Another critical factor is where the memory is allocated. When the object is first created in the callee’s stack frame, it has to be copied into storage the caller can access before the callee returns. This copy can be quite costly and is generally best avoided when possible.
The C++ standard allows the compiler, in certain cases, to optimise the code so that direct-initialisation is applied instead of copy-initialisation.
As in the example below:
#include <iostream>

class MyObject {
public:
    MyObject(int val) : value(val) {}
    int getValue() const { return value; }
private:
    int value;
};

MyObject createObject(int val) {
    MyObject obj(val);
    return obj; // The object is returned
}

int main() {
    MyObject newObj = createObject(42);
    std::cout << "Value: " << newObj.getValue() << std::endl;
    return 0;
}
In assembly:
_Z12createObjecti:
.LFB1735:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl -20(%rbp), %edx
leaq -12(%rbp), %rax ; Compute the address where the MyObject will be constructed
movl %edx, %esi
movq %rax, %rdi ; Pass the address to the MyObject constructor
call _ZN8MyObjectC1Ei ; Call MyObject constructor
movl -12(%rbp), %eax ; Load the result into %eax (return value)
movq -8(%rbp), %rdx
subq %fs:40, %rdx
je .L6
call __stack_chk_fail@PLT
.L6:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
main:
.LFB1736:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %rbx
subq $24, %rsp
.cfi_offset 3, -24
movq %fs:40, %rax
movq %rax, -24(%rbp)
xorl %eax, %eax
movl $42, %edi
call _Z12createObjecti ; Call createObject function
movl %eax, -28(%rbp) ; Store the returned value in a local variable
leaq .LC0(%rip), %rax
movq %rax, %rsi
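No copy constructor is called anywhere in this listing. MyObject is small and trivially copyable, so its value travels back in %eax and main stores it directly into the stack slot of newObj. For types that are too large for registers or have a non-trivial copy constructor, the caller instead passes the address of newObj as a hidden argument, and the object is constructed directly in the memory allocated for it in the main function.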
To make this more practical: the compiler’s ability to optimise the code depends a lot on the type of object you return. Here are several examples with popular types.
std::array

With std::array, RVO outperforms passing by reference, but only by a very small margin.
std::vector

std::vector shows the opposite result. Pay attention to the difference in order of magnitude between the numbers for the vector and for the array.
Eigen

Eigen demonstrates behaviour close to that of the array.
Eigen::VectorXd full example