Consider the following code.
#include <utility>
#include <vector>

std::vector<double> createSuperExpensiveData() {
    // result is constructed locally
    std::vector<double> result(10000, 10.0);
    // ....
    return std::move(result); // breaks RVO
}
C++ Optimization: Don’t std::move Your Returns
The Common Misconception
It looks intuitive: “I’m returning a local variable that is expensive to copy, so I should use std::move to ensure it’s efficient!” The Reality: You are likely making your code slower.
Modern Compilers & NRVO
Modern C++ compilers apply NRVO (Named Return Value Optimization), which is a form of copy elision. Instead of a “Create -> Copy -> Destroy” cycle, the compiler builds the object where the caller needs it:
- Standard Logic:
  - Create result in the function’s stack frame.
  - Copy or move result to the caller’s context.
  - Destroy result in the function’s stack frame.
- NRVO Logic:
  - Construct result directly in the caller’s memory.
The copy or move and the extra destructor call disappear, so returning the object costs nothing beyond constructing it.
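To make “constructed directly in the caller’s memory” observable, here is a minimal sketch. Probe, makeNamed, makeMoved, and reportPlacement are hypothetical names invented for this illustration. Probe remembers the address it was first constructed at, and its copy and move constructors carry that address along, so an object that was relocated no longer matches its own address.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <utility>

// Hypothetical probe type: remembers where it was first constructed.
struct Probe {
    std::uintptr_t born_at;

    Probe() : born_at(reinterpret_cast<std::uintptr_t>(this)) {}
    // Copies and moves keep the original birth address, so a relocated
    // object no longer matches its own address.
    Probe(const Probe& other) : born_at(other.born_at) {}
    Probe(Probe&& other) noexcept : born_at(other.born_at) {}

    bool constructedInPlace() const {
        return born_at == reinterpret_cast<std::uintptr_t>(this);
    }
};

Probe makeNamed() {
    Probe result;
    return result;             // eligible for NRVO
}

Probe makeMoved() {
    Probe result;
    return std::move(result);  // not eligible: the return object is move-constructed
}

// Prints whether each caller-side object is the very object the function built.
void reportPlacement() {
    Probe named = makeNamed();
    Probe moved = makeMoved();
    std::cout << std::boolalpha
              << "return result;            in place: " << named.constructedInPlace() << '\n'
              << "return std::move(result); in place: " << moved.constructedInPlace() << '\n';
    // The moved version always relocates: result and the return object
    // are two distinct objects that are alive at the same time.
    assert(!moved.constructedInPlace());
}
```

Calling reportPlacement() from main reports false for the std::move version. For the plain return it typically reports true with GCC and Clang, although the standard permits NRVO rather than requiring it, so the sketch does not assert that case.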
Pessimization
NRVO has strict rules. For it to work, the operand of the return statement must be the name of a local variable whose type matches the function’s return type.
When you write return std::move(result);:
- The compiler sees a function call expression, not the name of a local variable.
- NRVO is disabled.
- You force a Move Operation.
You have traded a zero-cost operation (NRVO) for a move operation. This is called pessimization. While moving is usually cheaper than copying, it is still more expensive than doing nothing.
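The trade described above can be counted. The following is a small sketch, not a benchmark: Counted, copyCount, moveCount, movesTriggeredBy, and reportMoveCounts are hypothetical names made up for this example, and the counters simply record how often the copy and move constructors run for each return style.

```cpp
#include <cassert>
#include <iostream>
#include <utility>

// Global counters, used only for this illustration.
int copyCount = 0;
int moveCount = 0;

// Hypothetical type that counts its copy and move constructions.
struct Counted {
    Counted() = default;
    Counted(const Counted&) { ++copyCount; }
    Counted(Counted&&) noexcept { ++moveCount; }
};

Counted makeNamed() {
    Counted result;
    return result;             // NRVO candidate: usually no move at all
}

Counted makeMoved() {
    Counted result;
    return std::move(result);  // forces a move construction
}

// Calls the factory once and returns how many move constructions it caused.
template <typename Factory>
int movesTriggeredBy(Factory make) {
    const int before = moveCount;
    Counted value = make();
    (void)value;  // silence unused-variable warnings
    return moveCount - before;
}

void reportMoveCounts() {
    const int named = movesTriggeredBy(makeNamed);
    const int moved = movesTriggeredBy(makeMoved);
    std::cout << "return result;            moves: " << named << '\n'
              << "return std::move(result); moves: " << moved << '\n';
    assert(moved >= 1);      // std::move always costs at least one move
    assert(named <= moved);  // the plain return is never worse
}
```

With NRVO applied, the plain return is expected to report zero moves, while the std::move version reports at least one. Neither version copies. GCC and Clang can also flag this pattern at compile time through the -Wpessimizing-move warning.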
Let’s have a look at the assembly
Here is the code generated when we compile with the -O2 flag.
; Without RVO - std::move
createSuperExpensiveData():
push rbx
mov rbx, rdi
mov edi, 80000
call operator new(unsigned long)
movsd xmm0, QWORD PTR .LC0[rip]
mov rcx, rax
lea rdx, [rax+80000]
.L2:
movsd QWORD PTR [rax], xmm0
add rax, 16
movsd QWORD PTR [rax-8], xmm0
cmp rax, rdx
jne .L2
mov QWORD PTR [rbx+8], rax
mov QWORD PTR [rbx+16], rax
mov rax, rbx
mov QWORD PTR [rbx], rcx
pop rbx
ret
.LC0:
.long 0
.long 1076101120
And this is the generated code when NRVO applies, that is, with a plain return result;.
; With RVO - just return result
createSuperExpensiveData():
push rbx
mov rbx, rdi
mov edi, 80000
call operator new(unsigned long)
movsd xmm0, QWORD PTR .LC0[rip]
lea rdx, [rax+80000]
mov QWORD PTR [rbx], rax
mov QWORD PTR [rbx+16], rdx
.L2:
movsd QWORD PTR [rax], xmm0
add rax, 16
movsd QWORD PTR [rax-8], xmm0
cmp rax, rdx
jne .L2
mov QWORD PTR [rbx+8], rax
mov rax, rbx
pop rbx
ret
.LC0:
.long 0
.long 1076101120
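Let’s not worry about every instruction and focus on the .L2 label, which is the initialization loop of the vector. Without NRVO, the function keeps the vector’s pointers in registers (note the extra mov rcx, rax) and only after the loop stores all three of them (begin, end, and end of capacity) into the caller’s object at [rbx], [rbx+8], and [rbx+16].
With std::move, the compiler is not allowed to elide the local variable, so it has to move-construct the return value from it. There is still no copy of the 80000 bytes of data, but as you can see above there is some extra work.
In the RVO version, result is the caller’s object, so two of the three pointers are stored before the loop even starts and only the end pointer is written afterwards.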
The Simple Fix
Keep it simple. Just return the variable by name:
#include <vector>

std::vector<double> createSuperExpensiveData() {
    // result is constructed locally
    std::vector<double> result(10000, 10.0);
    // ....
    return result; // This is simple and more efficient
}