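Many people optimize code by gut feeling: "This part seems slow, let me optimize it."
But real optimization should be data-driven: find the hot spots first, then target them specifically.
Today we'll look at how to do nanosecond-level timing and performance profiling in Rust.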
use std::time::Instant;

fn main() {
    let start = Instant::now();
    // Run the code being measured (process_data is your own workload)
    process_data();
    let duration = start.elapsed();
    println!("Elapsed: {:?}", duration);
}
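A single measurement is noisy, so it usually helps to repeat the call and look at the minimum and median. The following is a minimal sketch of that idea, not a replacement for a real benchmark harness. The `time_runs` helper and the body of `process_data` are hypothetical stand-ins introduced for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the per-call durations, sorted ascending.
/// (Hypothetical helper, not part of the standard library.)
fn time_runs<F: FnMut()>(iters: usize, mut f: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        f();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Hypothetical stand-in for the workload being measured.
fn process_data() -> u64 {
    (0..10_000u64).map(|x| black_box(x).wrapping_mul(x)).sum()
}

fn main() {
    // black_box keeps the compiler from optimizing the call away.
    let samples = time_runs(100, || {
        black_box(process_data());
    });
    let min = samples[0];
    let median = samples[samples.len() / 2];
    println!("min: {:?}, median: {:?}", min, median);
}
```

The minimum approximates the best case with warm caches, while the median is less sensitive to outliers caused by scheduling or interrupts.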
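use std::time::{SystemTime, UNIX_EPOCH};

/// Wall-clock nanoseconds since the Unix epoch.
/// Returns 0 if the system clock is set before the epoch.
#[inline]
pub fn nanos_since_epoch() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_or(0, |d| d.as_nanos())
}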
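One caveat: `SystemTime` is a wall clock and can jump when the system time is adjusted, so it suits timestamps for logs, while `Instant` is monotonic and suits intervals. A minimal sketch that records both, assuming a hypothetical `Stamp` type introduced here for illustration:

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Wall-clock nanoseconds since the Unix epoch (0 if the clock is before the epoch).
fn nanos_since_epoch() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map_or(0, |d| d.as_nanos())
}

/// Hypothetical event record: a wall-clock stamp for logs
/// plus a monotonic stamp for measuring intervals.
struct Stamp {
    wall_ns: u128,
    mono: Instant,
}

impl Stamp {
    fn now() -> Self {
        Stamp {
            wall_ns: nanos_since_epoch(),
            mono: Instant::now(),
        }
    }

    /// Time elapsed since this stamp, measured on the monotonic clock.
    fn elapsed(&self) -> Duration {
        self.mono.elapsed()
    }
}

fn main() {
    let received = Stamp::now();
    // ... handle the event here ...
    println!(
        "received at {} ns since epoch, handled in {:?}",
        received.wall_ns,
        received.elapsed()
    );
}
```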
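# Build in release mode with debug symbols kept
$ CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
# Run and sample, recording call stacks (-g)
$ perf record -g ./target/release/my_program
# Generate the flame graph
$ perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg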
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// TickData and generate_test_data are assumed to be defined elsewhere in the crate.
fn calculate_rsi(data: &[TickData]) {
    // Compute the RSI
}

fn criterion_benchmark(c: &mut Criterion) {
    let data = generate_test_data(10000);
    c.bench_function("rsi_calculation", |b| {
        b.iter(|| calculate_rsi(black_box(&data)))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
#[inline(always)]
fn fast_calculation(x: f64) -> f64 {
    x * x
}
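// Manually unrolled loop: four elements per iteration
let main_len = n - n % 4;
for i in (0..main_len).step_by(4) {
    result[i] = a[i] + b[i];
    result[i + 1] = a[i + 1] + b[i + 1];
    result[i + 2] = a[i + 2] + b[i + 2];
    result[i + 3] = a[i + 3] + b[i + 3];
}
// Handle the remaining 0 to 3 elements when n is not a multiple of 4
for i in main_len..n {
    result[i] = a[i] + b[i];
}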
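Index-based unrolling pays for a bounds check on every access unless the compiler can prove the indices are in range. One possible alternative, sketched here under the assumption that all three slices have the same length, uses `chunks_exact` so each four-element chunk has a known size. The function name `add_unrolled` is hypothetical.

```rust
/// Element-wise `result[i] = a[i] + b[i]`, processed four elements at a time.
/// (Hypothetical helper for illustration.)
fn add_unrolled(a: &[f64], b: &[f64], result: &mut [f64]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), result.len());

    let mut r_chunks = result.chunks_exact_mut(4);
    let mut a_chunks = a.chunks_exact(4);
    let mut b_chunks = b.chunks_exact(4);

    // Main body: each chunk holds exactly four elements.
    for ((r, x), y) in r_chunks.by_ref().zip(a_chunks.by_ref()).zip(b_chunks.by_ref()) {
        r[0] = x[0] + y[0];
        r[1] = x[1] + y[1];
        r[2] = x[2] + y[2];
        r[3] = x[3] + y[3];
    }

    // Tail: the 0 to 3 elements left over when the length is not a multiple of 4.
    let r_tail = r_chunks.into_remainder();
    for ((r, x), y) in r_tail
        .iter_mut()
        .zip(a_chunks.remainder())
        .zip(b_chunks.remainder())
    {
        *r = x + y;
    }
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0; 6];
    let mut result = [0.0; 6];
    add_unrolled(&a, &b, &mut result);
    println!("{:?}", result);
}
```

Whether this beats a plain `for` loop depends on the compiler and target, since LLVM often unrolls and vectorizes simple loops on its own. Benchmark both before keeping the manual version.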
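use std::arch::x86_64::*;

// Requires a CPU with AVX, and i + 4 <= len for data, other, and result
unsafe {
    let a = _mm256_loadu_pd(data.as_ptr().add(i));
    let b = _mm256_loadu_pd(other.as_ptr().add(i));
    let c = _mm256_add_pd(a, b);
    _mm256_storeu_pd(result.as_mut_ptr().add(i), c);
}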
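The intrinsics are only sound when the CPU actually supports AVX and every four-element load and store stays in bounds. A minimal sketch of how those conditions might be wrapped, with runtime feature detection and a scalar fallback; the function names `add_slices`, `add_avx`, and `add_scalar` are hypothetical:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback, also used for the tail that does not fill a full vector.
fn add_scalar(a: &[f64], b: &[f64], out: &mut [f64]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// AVX path: four f64 values per iteration.
/// Caller must ensure AVX is available and all slices have the same length.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add_avx(a: &[f64], b: &[f64], out: &mut [f64]) {
    let n = out.len();
    let simd_len = n - n % 4;
    let mut i = 0;
    while i < simd_len {
        // SAFETY: i + 4 <= simd_len <= n, and all slices have length n.
        unsafe {
            let va = _mm256_loadu_pd(a.as_ptr().add(i));
            let vb = _mm256_loadu_pd(b.as_ptr().add(i));
            _mm256_storeu_pd(out.as_mut_ptr().add(i), _mm256_add_pd(va, vb));
        }
        i += 4;
    }
    add_scalar(&a[simd_len..], &b[simd_len..], &mut out[simd_len..]);
}

/// Safe entry point: checks lengths, then picks the AVX or scalar path.
pub fn add_slices(a: &[f64], b: &[f64], out: &mut [f64]) {
    assert_eq!(a.len(), out.len());
    assert_eq!(b.len(), out.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was verified at runtime, lengths checked above.
            unsafe { add_avx(a, b, out) };
            return;
        }
    }
    add_scalar(a, b, out);
}

fn main() {
    let a = vec![1.0; 10];
    let b: Vec<f64> = (0..10).map(|i| i as f64).collect();
    let mut out = vec![0.0; 10];
    add_slices(&a, &b, &mut out);
    println!("{:?}", out);
}
```

Keeping the `unsafe` code behind a safe function that validates lengths means callers cannot trigger out-of-bounds loads, and the same binary still runs on machines without AVX.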
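Performance optimization is a science: measure first, then optimize, then verify.
Next up: FPGA market-data decoding + a Rust strategy engine: ultra-low-latency system architecture.
Stay tuned.
(End)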