Nanosecond-Level Timing: Performance Profiling and Hotspot Optimization for Rust Quant Systems

不吃草的牛德

Published 2026-04-23 12:54:25


This article is part of the column: Rust

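Opening: Finding the Real Bottleneck

Many people optimize code by feel: "This part seems slow, so I'll optimize it."

Real optimization should be data-driven: find the hotspot first, then optimize that specific spot.

Today we look at how to do nanosecond-level timing and performance profiling in Rust.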

Nanosecond-Level Timing

Using std::time::Instant




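use std::time::Instant;

// Placeholder workload; substitute the code you want to time
fn process_data() {
    let sum: u64 = (0..1_000_000u64).sum();
    std::hint::black_box(sum);
}

fn main() {
    let start = Instant::now();

    // Run the code under measurement
    process_data();

    let duration = start.elapsed();
    println!("Elapsed: {:?}", duration);
}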
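A single measurement is noisy, so in practice it helps to time the same code many times and look at the distribution. The following is a minimal sketch of that idea; the helper name `sample_nanos` and the sum-of-squares workload are illustrative assumptions, not part of the original.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` `iters` times and returns the per-call durations in
/// nanoseconds, sorted in ascending order.
fn sample_nanos<T>(iters: usize, mut f: impl FnMut() -> T) -> Vec<u128> {
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
        samples.push(start.elapsed().as_nanos());
    }
    samples.sort_unstable();
    samples
}

fn main() {
    // Hypothetical workload: sum of squares over a small vector.
    let data: Vec<f64> = (0..1_000).map(|i| i as f64).collect();
    let samples = sample_nanos(1_000, || data.iter().map(|x| x * x).sum::<f64>());

    let min = samples[0];
    let median = samples[samples.len() / 2];
    println!("min: {min} ns, median: {median} ns");
}
```

The minimum and median are usually more stable than the mean, because scheduler preemption and cache misses produce a long tail of slow samples.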

Nanosecond Precision




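use std::time::{SystemTime, UNIX_EPOCH};

/// Wall-clock nanoseconds since the Unix epoch.
/// SystemTime is not monotonic: use it for timestamps, and Instant for measuring intervals.
#[inline]
pub fn nanos_since_epoch() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        // Returns 0 if the system clock is set before the epoch
        .map_or(0, |d| d.as_nanos())
}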
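Because the wall clock can be adjusted (for example by NTP), subtracting two epoch timestamps is not a reliable way to measure latency. When only relative nanosecond timestamps are needed, one option is a monotonic counter anchored at process start. This is a sketch under that assumption; the name `monotonic_nanos` is hypothetical.

```rust
use std::sync::OnceLock;
use std::time::Instant;

/// Nanoseconds elapsed since the first call in this process.
/// Based on Instant, so the value never goes backwards.
pub fn monotonic_nanos() -> u128 {
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos()
}

fn main() {
    let t0 = monotonic_nanos();
    let t1 = monotonic_nanos();
    println!("delta: {} ns", t1 - t0);
}
```

These values are only comparable within one process; for timestamps that must be matched against exchange or log times, the epoch-based function is still the right tool.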

Performance Profiling

Using perf




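# Build in release mode with debug info so perf can unwind stacks and attribute inlined frames
# (equivalent to setting debug = true under [profile.release] in Cargo.toml)
$ CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release

# Run and sample, recording call stacks (a flame graph needs them)
$ perf record --call-graph dwarf ./target/release/my_program

# Generate the flame graph (both .pl scripts come from Brendan Gregg's FlameGraph repository)
$ perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg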
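When trying out this workflow, it can help to start with a small program whose hot spots are known in advance. The sketch below is a hypothetical workload (the stage names are invented for illustration) in which `#[inline(never)]` keeps each stage as a separate frame, so the two stages should show up as distinct towers in the flame graph.

```rust
use std::hint::black_box;

// #[inline(never)] keeps each stage as its own frame in the perf output.
#[inline(never)]
fn parse_stage(n: u64) -> Vec<u64> {
    (0..n).map(|i| i.wrapping_mul(2_654_435_761) % 1000).collect()
}

#[inline(never)]
fn compute_stage(values: &[u64]) -> u64 {
    values.iter().map(|v| v * v).sum()
}

fn main() {
    let mut total = 0u64;
    for _ in 0..2_000 {
        // black_box stops the compiler from folding the input to a constant.
        let values = parse_stage(black_box(10_000));
        total = total.wrapping_add(compute_stage(&values));
    }
    println!("{total}");
}
```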

Benchmarking with criterion




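// Lives under benches/; its [[bench]] entry in Cargo.toml needs harness = false
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Placeholder tick type; the original does not show the definition of TickData
pub struct TickData {
    pub price: f64,
}

// Placeholder data generator; the original does not show generate_test_data
fn generate_test_data(n: usize) -> Vec<TickData> {
    (0..n).map(|i| TickData { price: 100.0 + (i % 7) as f64 }).collect()
}

// Stand-in for the RSI calculation. It returns a value so the optimizer
// cannot discard the work being measured.
fn calculate_rsi(data: &[TickData]) -> f64 {
    // Compute RSI here
    data.iter().map(|t| t.price).sum()
}

fn criterion_benchmark(c: &mut Criterion) {
    let data = generate_test_data(10000);

    c.bench_function("rsi_calculation", |b| {
        b.iter(|| calculate_rsi(black_box(&data)))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);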
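The benchmark above leaves the RSI body open. For readers who want something concrete to measure, here is one possible sketch: a simple-average RSI over the last `period` price changes, computed as 100 * gain / (gain + loss). It is an assumption for illustration, not the author's implementation; it takes plain `f64` prices rather than `TickData`, does not use Wilder's smoothing, and returns 50 for a flat window by convention.

```rust
/// Simple-average RSI over the last `period` price changes.
/// Returns None when fewer than `period + 1` prices are available.
pub fn simple_rsi(prices: &[f64], period: usize) -> Option<f64> {
    if period == 0 || prices.len() < period + 1 {
        return None;
    }
    // The last `period + 1` prices yield exactly `period` changes.
    let window = &prices[prices.len() - period - 1..];
    let (mut gain, mut loss) = (0.0_f64, 0.0_f64);
    for pair in window.windows(2) {
        let change = pair[1] - pair[0];
        if change > 0.0 {
            gain += change;
        } else {
            loss -= change;
        }
    }
    if gain + loss == 0.0 {
        // Flat window: report a neutral value (a convention chosen here).
        return Some(50.0);
    }
    // Equivalent to 100 - 100 / (1 + gain / loss), without dividing by zero.
    Some(100.0 * gain / (gain + loss))
}

fn main() {
    let prices = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0];
    println!("{:?}", simple_rsi(&prices, 5));
}
```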

Hotspot Optimization

1. Inline Functions




#[inline(always)]
fn fast_calculation(x: f64) -> f64 {
    x * x
}
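Inlining hints work in both directions. As a sketch (the function names are hypothetical), a small hot function can carry `#[inline]`, while a rarely taken error path can be marked `#[cold]` and `#[inline(never)]` so it stays out of the hot loop's instruction stream. These attributes are hints to the compiler, so any gain should be confirmed with a benchmark.

```rust
/// Hot path: small enough that an inline hint is reasonable.
#[inline]
fn square(x: f64) -> f64 {
    x * x
}

/// Cold path: kept out of line so it does not bloat the hot loop.
#[cold]
#[inline(never)]
fn report_invalid(x: f64) -> f64 {
    eprintln!("invalid input: {x}");
    0.0
}

/// Sums the squares of all finite values; non-finite values contribute 0.
fn sum_of_squares(values: &[f64]) -> f64 {
    values
        .iter()
        .map(|&x| if x.is_finite() { square(x) } else { report_invalid(x) })
        .sum()
}

fn main() {
    println!("{}", sum_of_squares(&[1.0, 2.0, 3.0]));
}
```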

2. Loop Unrolling




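// Manually unrolled loop: process four elements per iteration
let chunks = n / 4 * 4;
for i in (0..chunks).step_by(4) {
    result[i] = a[i] + b[i];
    result[i+1] = a[i+1] + b[i+1];
    result[i+2] = a[i+2] + b[i+2];
    result[i+3] = a[i+3] + b[i+3];
}
// Handle the remaining 0 to 3 elements when n is not a multiple of 4
for i in chunks..n {
    result[i] = a[i] + b[i];
}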
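Indexed unrolling still pays for a bounds check on every access unless the compiler can prove the indices are in range. A more idiomatic way to express the same unrolling is `chunks_exact`, which hands out fixed-size slices and exposes the leftover tail separately. The sketch below assumes all three slices have equal length; the function name `add_slices` is illustrative.

```rust
/// Element-wise sum, four elements per iteration, with a scalar tail.
fn add_slices(a: &[f64], b: &[f64], result: &mut [f64]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), result.len());

    let mut out = result.chunks_exact_mut(4);
    let mut xs = a.chunks_exact(4);
    let mut ys = b.chunks_exact(4);

    // Each chunk has exactly four elements, so the indices below are in range.
    for ((r, x), y) in (&mut out).zip(&mut xs).zip(&mut ys) {
        r[0] = x[0] + y[0];
        r[1] = x[1] + y[1];
        r[2] = x[2] + y[2];
        r[3] = x[3] + y[3];
    }

    // Remaining 0 to 3 elements that did not fill a whole chunk.
    let tail = out.into_remainder();
    for ((r, x), y) in tail.iter_mut().zip(xs.remainder()).zip(ys.remainder()) {
        *r = x + y;
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    let mut result = [0.0; 6];
    add_slices(&a, &b, &mut result);
    println!("{result:?}");
}
```

In release builds the compiler often unrolls and vectorizes a plain iterator loop on its own, so it is worth benchmarking the simple version before keeping a hand-unrolled one.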

3. SIMD Vectorization




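use std::arch::x86_64::*;

// Requires a CPU with AVX, and i + 4 <= len for all three slices
unsafe {
    // Take pointers from the whole slice so each 4-lane access stays inside it
    let a = _mm256_loadu_pd(data.as_ptr().add(i));
    let b = _mm256_loadu_pd(other.as_ptr().add(i));
    let c = _mm256_add_pd(a, b);
    _mm256_storeu_pd(result.as_mut_ptr().add(i), c);
}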
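The fragment above only shows the body of one iteration. A fuller sketch, under the assumption that the goal is an element-wise sum of two `f64` slices, wraps the intrinsics in a safe function that checks lengths, detects AVX at runtime, and falls back to scalar code on other CPUs or architectures. The function names here are hypothetical.

```rust
/// Scalar fallback, also used for the tail of the AVX path.
fn add_scalar(a: &[f64], b: &[f64], out: &mut [f64]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// AVX path: four f64 lanes per iteration.
/// Caller must ensure AVX is available and all slices have equal length.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f64], b: &[f64], out: &mut [f64]) {
    use std::arch::x86_64::*;
    let n = out.len();
    let main = n - n % 4;
    let mut i = 0;
    while i < main {
        // SAFETY: i + 4 <= main <= len of all three slices.
        unsafe {
            let va = _mm256_loadu_pd(a.as_ptr().add(i));
            let vb = _mm256_loadu_pd(b.as_ptr().add(i));
            _mm256_storeu_pd(out.as_mut_ptr().add(i), _mm256_add_pd(va, vb));
        }
        i += 4;
    }
    add_scalar(&a[main..], &b[main..], &mut out[main..]);
}

/// Safe entry point: picks the AVX path when the CPU supports it.
pub fn add_slices_simd(a: &[f64], b: &[f64], out: &mut [f64]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX was detected at runtime; lengths were checked above.
            unsafe { add_avx(a, b, out) };
            return;
        }
    }
    add_scalar(a, b, out);
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0];
    let b = [0.5; 7];
    let mut out = [0.0; 7];
    add_slices_simd(&a, &b, &mut out);
    println!("{out:?}");
}
```

Keeping the `unsafe` code behind a safe wrapper means callers cannot trigger out-of-bounds loads or run AVX instructions on a CPU that lacks them.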