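When building models in the medical domain, how the training and test sets are prepared matters a great deal. The steps below walk through an example case study.
Suppose we want to build a model that predicts whether a patient has heart disease. We can proceed as follows: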
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score
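# 1. Data collection and preparation
data = pd.read_csv("medical_data.csv")  # assumes the data is stored in a file named "medical_data.csv"
# Clean and preprocess the data here, e.g. handle missing values (standardization is applied in step 4)
# 2. Feature engineering
features = data[["age", "sex", "blood_pressure", "cholesterol"]]  # choose features to match the actual data
target = data["heart_disease"]  # assumes the target column is named "heart_disease"
# 3. Split into training and test sets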
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)
# 4. Model selection and training
scaler = StandardScaler()  # standardize the features; fitted on the training set only
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = LogisticRegression()  # use logistic regression as the model
model.fit(X_train_scaled, y_train)
# 5. Model evaluation and tuning
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
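The script above leaves the cleaning step as a comment. One possible way to handle missing values is sketched below, assuming median imputation is acceptable for these columns. The toy values are made up purely for illustration and stand in for the real file; the variable names (`toy`, `X_tr_filled`, `X_te_filled`) are hypothetical. The imputer is fitted on the training rows only, mirroring how the scaler is used above, so that no statistics from the test set leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for medical_data.csv; all values are made up.
toy = pd.DataFrame({
    "age": [63, 45, np.nan, 58, 39, 71, 50, 66],
    "sex": [1, 0, 1, 1, 0, 1, 0, 0],
    "blood_pressure": [145, 120, 130, np.nan, 118, 160, 125, 150],
    "cholesterol": [233, 180, 250, 210, np.nan, 290, 200, 260],
    "heart_disease": [1, 0, 1, 0, 0, 1, 0, 1],
})

X = toy.drop(columns="heart_disease")
y = toy["heart_disease"]

# Split first, so the imputer never sees the test rows while being fitted.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Assumption: replacing missing entries with the training-set median is reasonable here.
imputer = SimpleImputer(strategy="median")
X_tr_filled = pd.DataFrame(
    imputer.fit_transform(X_tr), columns=X.columns, index=X_tr.index
)
X_te_filled = pd.DataFrame(
    imputer.transform(X_te), columns=X.columns, index=X_te.index
)

print("Missing values left in training set:", int(X_tr_filled.isna().sum().sum()))
print("Missing values left in test set:", int(X_te_filled.isna().sum().sum()))
```

Whether median imputation is appropriate depends on why the values are missing; for clinical data that question usually deserves its own analysis.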
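In this case study we built a model for predicting heart disease and used a train/test split to evaluate it. Because the scaler is fitted only on the training set, the test-set metrics reflect performance on data the model has not seen; those metrics can then guide adjustments that improve the model's predictive performance.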
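The tuning step is not spelled out in the script, so the following is only a sketch of one common approach, not the method used in this case study. It assumes synthetic data from `make_classification` in place of the real file, a stratified split to preserve the class ratio, and a small hypothetical grid over the regularization strength `C`. Recall is chosen as the selection metric on the assumption that missing a true heart-disease case is the costlier error. Tuning happens by cross-validation inside the training set, and the test set is used once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four features and binary target (assumption, not real data).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.7, 0.3], random_state=42,
)

# stratify=y keeps the positive/negative ratio similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Putting the scaler in a pipeline means it is refitted on each training fold,
# so validation folds never influence the scaling statistics.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid; the values are illustrative, not recommendations.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=cv)
search.fit(X_train, y_train)

best_C = search.best_params_["logisticregression__C"]
test_recall = recall_score(y_test, search.predict(X_test))

print("Best C:", best_C)
print("Held-out recall:", test_recall)
```

Selecting hyperparameters on the test set itself would make the reported metrics optimistic, which is why the search only ever sees `X_train`.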