在互联网应用快速发展的今天,数据量呈现爆炸式增长。作为后端开发者,我们常常会遇到单表数据量过亿导致的性能瓶颈问题。本文将以一个真实的4亿数据表分表案例为基础,详细介绍如何在不影响线上业务的情况下,完成按时间维度分表的完整过程,包含架构设计、具体实施方案、Java代码适配以及注意事项等全方位内容。
当MySQL单表数据量达到4亿级别时,会面临诸多挑战:
常见的分表策略有:
本文选择 按时间水平分表 ,因为:
-- 分析数据时间分布
SELECT
DATE_FORMAT(create_time, '%Y-%m') AS month,
COUNT(*) AS count
FROM original_table
GROUP BY month
ORDER BY month;制定明确的分表命名规则:
original_tableoriginal_table_202301original_table_2023archive_table_2022检查所有涉及该表的SQL:
-- 创建2023年1月的分表(结构完全相同)
CREATE TABLE original_table_202301 LIKE original_table;
-- 为分表添加同样的索引
ALTER TABLE original_table_202301 ADD INDEX idx_user_id(user_id);使用Java编写迁移工具:
public class DataMigrator {
private static final int BATCH_SIZE = 5000;
public void migrateByMonth(String month) throws SQLException {
String sourceTable = "original_table";
String targetTable = "original_table_" + month;
try (Connection conn = dataSource.getConnection()) {
long maxId = getMaxId(conn, sourceTable);
long currentId = 0;
while (currentId < maxId) {
String sql = String.format(
"INSERT INTO %s SELECT * FROM %s " +
"WHERE create_time BETWEEN '%s-01' AND '%s-31' " +
"AND id > %d ORDER BY id LIMIT %d",
targetTable, sourceTable, month, month, currentId, BATCH_SIZE);
try (Statement stmt = conn.createStatement()) {
stmt.executeUpdate(sql);
currentId = getLastInsertedId(conn, targetTable);
}
Thread.sleep(100); // 控制迁移速度
}
}
}
}CREATE VIEW original_table_unified AS
SELECT * FROM original_table_202301 UNION ALL
SELECT * FROM original_table_202302 UNION ALL
...
SELECT * FROM original_table; -- 当前表作为最新数据对于不能停机的关键业务表:
-- 创建分表
CREATE TABLE original_table_new LIKE original_table;
-- 创建触发器
DELIMITER //
CREATE TRIGGER tri_original_table_insert
AFTER INSERT ON original_table
FOR EACH ROW
BEGIN
IF NEW.create_time >= '2023-01-01' THEN
INSERT INTO original_table_new VALUES (NEW.*);
END IF;
END//
DELIMITER ;实现一个简单的表名路由器:
public class TableRouter {
private static final DateTimeFormatter MONTH_FORMAT =
DateTimeFormatter.ofPattern("yyyyMM");
public static String routeTable(LocalDateTime createTime) {
String month = createTime.format(MONTH_FORMAT);
return "original_table_" + month;
}
}<select id="queryByTime" resultType="com.example.Entity">
SELECT * FROM ${tableName}
WHERE user_id = #{userId}
AND create_time BETWEEN #{start} AND #{end}
</select>public List<Entity> queryByTime(Long userId, LocalDate start, LocalDate end) {
List<String> tableNames = getTableNamesBetween(start, end);
return tableNames.stream()
.flatMap(table -> mapper.queryByTime(table, userId, start, end).stream())
.collect(Collectors.toList());
}实现MyBatis的Interceptor接口:
@Intercepts(@Signature(type= StatementHandler.class,
method="prepare", args={Connection.class, Integer.class}))
public class TableShardInterceptor implements Interceptor {
@Override
public Object intercept(Invocation invocation) throws Throwable {
BoundSql boundSql = ((StatementHandler)invocation.getTarget()).getBoundSql();
String originalSql = boundSql.getSql();
if (originalSql.contains("original_table")) {
Object param = boundSql.getParameterObject();
LocalDateTime createTime = getCreateTime(param);
String newSql = originalSql.replace("original_table",
"original_table_" + createTime.format(MONTH_FORMAT));
resetSql(invocation, newSql);
}
return invocation.proceed();
}
}使用Spring Scheduler实现每月自动建表:
@Scheduled(cron = "0 0 0 1 * ?") // 每月1号执行
public void autoCreateNextMonthTable() {
LocalDate nextMonth = LocalDate.now().plusMonths(1);
String tableName = "original_table_" + nextMonth.format(MONTH_FORMAT);
jdbcTemplate.execute("CREATE TABLE IF NOT EXISTS " + tableName +
" LIKE original_table_template");
}public void archiveOldData(int keepMonths) {
LocalDate archivePoint = LocalDate.now().minusMonths(keepMonths);
String archiveTable = "archive_table_" + archivePoint.getYear();
// 创建归档表
jdbcTemplate.execute("CREATE TABLE IF NOT EXISTS " + archiveTable +
" LIKE original_table_template");
// 迁移数据
jdbcTemplate.update("INSERT INTO " + archiveTable +
" SELECT * FROM original_table WHERE create_time < ?",
archivePoint.atStartOfDay());
// 删除原数据
jdbcTemplate.update("DELETE FROM original_table WHERE create_time < ?",
archivePoint.atStartOfDay());
}指标 | 分表前 | 分表后 |
|---|---|---|
单条查询平均耗时 | 320ms | 45ms |
批量写入QPS | 1,200 | 3,500 |
备份时间 | 6小时 | 30分钟 |
MySQL分表是一个系统工程,需要结合业务特点选择合适的分片策略。本文介绍的按时间分表方案,在保证业务连续性的前提下,成功将4亿数据表的查询性能提升了7倍。希望这篇实践总结能为面临类似问题的开发者提供有价值的参考。
作者提示:任何架构改造都要先在测试环境充分验证,并准备好回滚方案!