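The slope of a regression line is the average change in the dependent variable (Y) for each one-unit increase in the independent variable (X). In a linear regression model the line is usually written as \( Y = \beta_0 + \beta_1 X \), where \( \beta_0 \) is the intercept and \( \beta_1 \) is the slope.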
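For a sample of \( n \) points \( (x_i, y_i) \), the ordinary least-squares estimates of the slope and intercept can be written as follows. The sample notation \( x_i, y_i, \bar{x}, \bar{y} \) is introduced here for illustration, with \( \bar{x} \) and \( \bar{y} \) denoting the sample means:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]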
The following example computes the slope of the regression line with numpy and pandas:
import numpy as np
import pandas as pd
# Create sample data
data = {
    'X': [1, 2, 3, 4, 5],
    'Y': [2, 3, 5, 6, 8]
}
df = pd.DataFrame(data)
# Compute the slope of the regression line
X = df['X'].values.reshape(-1, 1)
Y = df['Y'].values.reshape(-1, 1)
# Least-squares slope: sum of cross-deviations divided by the sum of squared X deviations
X_mean = np.mean(X)
Y_mean = np.mean(Y)
numerator = np.sum((X - X_mean) * (Y - Y_mean))
denominator = np.sum((X - X_mean) ** 2)
slope = numerator / denominator
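# np.sum returns a scalar, so slope is printed directly (indexing it would raise an IndexError)
print(f"Slope of the regression line: {slope}")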
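As a cross-check, the same slope can be obtained from `np.polyfit` with degree 1. The sketch below is a minimal comparison on the same five sample points; the helper name `slope_by_formula` is hypothetical and used only for illustration.

```python
import numpy as np

# Same sample points as in the example above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3, 5, 6, 8], dtype=float)

def slope_by_formula(x, y):
    """Least-squares slope from deviations about the means (hypothetical helper)."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sum(dx ** 2))

slope_formula = slope_by_formula(x, y)

# np.polyfit returns coefficients from the highest degree down: [slope, intercept]
slope_polyfit, intercept_polyfit = np.polyfit(x, y, deg=1)

print(f"formula: {slope_formula}, polyfit: {slope_polyfit}, intercept: {intercept_polyfit}")
```

Both approaches should agree up to floating-point rounding; `np.polyfit` also returns the intercept, which the manual formula above does not compute.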
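Cause: outliers in the data can distort the least-squares slope, because every point's deviation from the mean enters the sums.
Solution: drop the points whose Y value has an absolute Z-score of 3 or more, then recompute the slope: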
from scipy import stats
# Compute the absolute Z-score of each Y value
z_scores = np.abs(stats.zscore(df['Y']))
# Keep only the rows whose absolute Z-score is below 3
df_cleaned = df[(z_scores < 3)]
# Recompute the slope on the cleaned data
X_cleaned = df_cleaned['X'].values.reshape(-1, 1)
Y_cleaned = df_cleaned['Y'].values.reshape(-1, 1)
numerator_cleaned = np.sum((X_cleaned - np.mean(X_cleaned)) * (Y_cleaned - np.mean(Y_cleaned)))
denominator_cleaned = np.sum((X_cleaned - np.mean(X_cleaned)) ** 2)
slope_cleaned = numerator_cleaned / denominator_cleaned
# slope_cleaned is a scalar, so it is printed directly
print(f"Slope of the regression line after removing outliers: {slope_cleaned}")
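On the five-point sample above no Y value has an absolute Z-score of 3 or more, so the filter removes nothing and the slope is unchanged. The sketch below uses hypothetical data (20 points on the line y = 1.5x + 0.3, with the last Y value replaced by 100) to show the filter actually taking effect. The Z-score is computed with plain numpy using the population standard deviation, which is assumed here to match the default behaviour of `scipy.stats.zscore` (`ddof=0`).

```python
import numpy as np

# Hypothetical data: 20 points exactly on y = 1.5x + 0.3, plus one injected outlier
x = np.arange(20, dtype=float)
y = 1.5 * x + 0.3
y[-1] = 100.0  # artificial outlier, for illustration only

def ols_slope(x, y):
    """Least-squares slope from deviations about the means."""
    dx = x - x.mean()
    return float(np.sum(dx * (y - y.mean())) / np.sum(dx ** 2))

# Absolute Z-scores of Y, using the population standard deviation (ddof=0)
z_scores = np.abs((y - y.mean()) / y.std())

# Keep only the points whose absolute Z-score is below 3
mask = z_scores < 3
x_kept, y_kept = x[mask], y[mask]

slope_before = ols_slope(x, y)
slope_after = ols_slope(x_kept, y_kept)

print(f"points kept: {mask.sum()} of {len(x)}")
print(f"slope before: {slope_before:.4f}, slope after: {slope_after:.4f}")
```

Note that the filter looks only at Y, as in the code above, so a point that is extreme in X but ordinary in Y would not be removed. With very small samples the threshold of 3 may never be reached at all.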
With outliers removed in this way, the slope estimate is less affected by extreme values.