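Memory leaks are a major topic in Android performance optimization, and LeakCanary is a widely used tool for finding and analyzing memory problems. This article walks through how LeakCanary works.
Author: FreemanGordon. Original: https://juejin.cn/user/3368559355374285 (published with the author's permission)
A memory leak occurs when a program requests memory from the system and does not release it after use, so the memory stays occupied and can no longer be used by the program. On Android, the term usually means that an object has outlived its own lifecycle and still has not been reclaimed. Leaks fall into several kinds:
Leaks tend to make the app process's memory climb sharply and eventually cause OOM or "Too many open files" crashes. The crash site is usually only the last straw, not the root cause. To see and fix the real problem you need a dump of the heap or of the open file descriptors.
On OOM, or when memory hits its ceiling, an HPROF file is captured with a dump the user does not notice. When the app moves to the background and memory is sufficient, the file is analyzed; the HPROF is trimmed and uploaded, and MAT analyzes it online to generate reference chains and reports. This can report memory pressure caused by large objects or by too many frequently created small objects, which helps prevent OOM.
This approach relies on the kernel's COW (copy-on-write) mechanism. Before each heap dump the VM is suspended, a child process is forked to perform the dump, and the parent resumes the VM as soon as the fork succeeds, so the whole operation costs the parent only a few milliseconds. The heap image is analyzed locally when the device is idle, in a separate process on a single thread, and deleted once the analysis finishes.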
https://github.com/KwaiAppTeam/KOOM
Customize LeakCanary and report the leak traces to your own business server
LeakCanary: It automatically watches destroyed activities and fragments, triggers a heap dump, runs Shark Android and then displays the result.
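LeakCanary's icon is a bird, which is simply the literal meaning of "canary". Canaries are sensitive to harmful gases, so in earlier times they were often used in mines to detect dangerous gas.
LeakCanary is divided into five parts
Adding a recent version of LeakCanary is simple: a single Gradle dependency is enough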
debugImplementation 'com.squareup.leakcanary:leakcanary-android:2.7'
or
debugImplementation 'com.squareup.leakcanary:leakcanary-android-process:2.7'
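What the two have in common:
AppWatcherInstaller$MainProcess
(because it is the main process's heap that has to be dumped), which uses a ContentProvider to initialize automatically when the process starts
How the two differ:
Early versions had to be initialized in the app's Application.onCreate. To lower the integration cost, newer versions move the initialization into AppWatcherInstaller, which is defined by the library. This works because a ContentProvider's onCreate runs before Application.onCreate (note: it runs after Application.attachBaseContext).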
ActivityThread.java
private void handleBindApplication(AppBindData data) {
    // ... code omitted
    try {
        // If the app is being launched for full backup or restore, bring it up in
        // a restricted environment with the base application class.
        app = data.info.makeApplication(data.restrictedBackupMode, null); // Application#attachBaseContext()
        // ... code omitted
        // don't bring up providers in restricted mode; they may depend on the
        // app's custom Application class
        if (!data.restrictedBackupMode) {
            if (!ArrayUtils.isEmpty(data.providers)) {
                installContentProviders(app, data.providers); // creates the ContentProviders
            }
        }
        // ... code omitted
        try {
            mInstrumentation.callApplicationOnCreate(app); // Application#onCreate()
        }
        // ... code omitted
}
AppWatcherInstaller.onCreate triggers LeakCanary's initialization
/**
 * LeakCanary initialization starts when the ContentProvider is created
 */
override fun onCreate(): Boolean {
  val application = context!!.applicationContext as Application
  AppWatcher.manualInstall(application) // initialize
  return true
}
AppWatcher.manualInstall sets a default delay before an object is checked and a default set of object types to watch
@JvmOverloads
fun manualInstall(
  application: Application,
  retainedDelayMillis: Long = TimeUnit.SECONDS.toMillis(5), // the check starts 5 s after the object is added to watchedObjects
  watchersToInstall: List<InstallableWatcher> = appDefaultWatchers(application) // objects to watch
) {
  // ... code omitted
  // Core LeakCanary component that detects leaks and triggers the heap dump. The default implementation is InternalLeakCanary.kt
  LeakCanaryDelegate.loadLeakCanary(application)
  // Install the default watchers
  watchersToInstall.forEach {
    it.install()
  }
}
The objects watched by default are
fun appDefaultWatchers(
  application: Application,
  reachabilityWatcher: ReachabilityWatcher = objectWatcher
): List<InstallableWatcher> {
  return listOf(
    ActivityWatcher(application, reachabilityWatcher), // Activity
    FragmentAndViewModelWatcher(application, reachabilityWatcher), // Fragment and ViewModel
    RootViewWatcher(reachabilityWatcher), // window
    ServiceWatcher(reachabilityWatcher) // service
  )
}
Each type is added to watchedObjects (a map holding weak references to the objects to be checked, defined in ObjectWatcher.kt) at a specific moment and waits there to be checked. This is done mainly through component lifecycle callbacks and a few hook points.
By registering ActivityLifecycleCallbacks, the activity is added to watchedObjects in onActivityDestroyed to await checking
ActivityWatcher.kt
private val lifecycleCallbacks =
  object : Application.ActivityLifecycleCallbacks by noOpDelegate() {
    override fun onActivityDestroyed(activity: Activity) {
      reachabilityWatcher.expectWeaklyReachable(
        activity, "${activity::class.java.name} received Activity#onDestroy() callback"
      )
    }
  }
By registering FragmentLifecycleCallbacks, the View and the Fragment are added to watchedObjects in onFragmentViewDestroyed and onFragmentDestroyed respectively
AndroidXFragmentDestroyWatcher.kt
  override fun onFragmentViewDestroyed(
    fm: FragmentManager,
    fragment: Fragment
  ) {
    val view = fragment.view
    if (view != null) {
      reachabilityWatcher.expectWeaklyReachable(
        view, "${fragment::class.java.name} received Fragment#onDestroyView() callback " +
        "(references to its views should be cleared to prevent leaks)"
      )
    }
  }
  override fun onFragmentDestroyed(
    fm: FragmentManager,
    fragment: Fragment
  ) {
    reachabilityWatcher.expectWeaklyReachable(
      fragment, "${fragment::class.java.name} received Fragment#onDestroy() callback"
    )
  }
}
ViewModel detection is rather clever. In the Fragment's onCreate, LeakCanary adds a ViewModel of its own to the current Fragment. Because that ViewModel follows its host's lifecycle, when its onCleared runs it walks over all of the host's ViewModels, which it obtained earlier through a reflection hook, and adds each of them to watchedObjects
AndroidXFragmentDestroyWatcher.kt
override fun onFragmentCreated(
  fm: FragmentManager,
  fragment: Fragment,
  savedInstanceState: Bundle?
) {
  // When the Fragment runs onCreate, add a ViewModel to it
  ViewModelClearedWatcher.install(fragment, reachabilityWatcher)
}
ViewModelClearedWatcher.kt
fun install(
  storeOwner: ViewModelStoreOwner,
  reachabilityWatcher: ReachabilityWatcher
) {
  val provider = ViewModelProvider(storeOwner, object : Factory {
    @Suppress("UNCHECKED_CAST")
    override fun <T : ViewModel?> create(modelClass: Class<T>): T =
      ViewModelClearedWatcher(storeOwner, reachabilityWatcher) as T
  })
  provider.get(ViewModelClearedWatcher::class.java) // adds it to storeOwner
}
// Hook all ViewModels of the current Fragment
viewModelMap = try {
  val mMapField = ViewModelStore::class.java.getDeclaredField("mMap")
  mMapField.isAccessible = true
  @Suppress("UNCHECKED_CAST")
  mMapField[storeOwner.viewModelStore] as Map<String, ViewModel>
} catch (ignored: Exception) {
  null
}
// When this ViewModel's onCleared runs, add every ViewModel to the watch list
override fun onCleared() {
  viewModelMap?.values?.forEach { viewModel ->
    reachabilityWatcher.expectWeaklyReachable(
      viewModel, "${viewModel::class.java.name} received ViewModel#onCleared() callback"
    )
  }
}
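The reflective read in the excerpt above can be tried outside Android. What follows is a minimal Java sketch, not LeakCanary code: FakeStore is a hypothetical stand-in for ViewModelStore, PrivateMapReader and readPrivateMap are made-up names, and only the private field name mMap mirrors the excerpt. It illustrates that the map obtained through reflection is the live private map, so entries added after the hook are still visible when the map is iterated later.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class PrivateMapReader {
    /** Returns the live private map stored in the given field, or null if it cannot be read. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> readPrivateMap(Object store, String fieldName) {
        try {
            Field field = store.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return (Map<String, Object>) field.get(store);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // like the excerpt, fall back to null when the hook fails
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Map<String, Object> hooked = readPrivateMap(store, "mMap");
        store.put("a", "first"); // added after the hook was taken
        System.out.println(hooked.keySet()); // prints [a]
    }
}

// Hypothetical stand-in for ViewModelStore; it only has a private map and a setter.
class FakeStore {
    private final HashMap<String, Object> mMap = new HashMap<>();

    void put(String key, Object value) {
        mMap.put(key, value);
    }
}
```

Returning null on failure keeps the watcher harmless if a future library version renames the field, at the cost of silently watching nothing.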
Service is similar to Activity: the Service object is also added to watchedObjects on onDestroy. However, Service does not expose lifecycle callbacks, so its lifecycle is also obtained through hooks
ServiceWatcher.kt
override fun install() {
  checkMainThread()
  check(uninstallActivityThreadHandlerCallback == null) {
    "ServiceWatcher already installed"
  }
  check(uninstallActivityManager == null) {
    "ServiceWatcher already installed"
  }
  try {
    // Hook mCallback of mH inside ActivityThread
    swapActivityThreadHandlerCallback { mCallback ->
      uninstallActivityThreadHandlerCallback = {
        swapActivityThreadHandlerCallback {
          mCallback
        }
      }
      // Proxy object
      Handler.Callback { msg ->
        // https://github.com/square/leakcanary/issues/2114
        // On some Motorola devices (Moto E5 and G6), the msg.obj returns an ActivityClientRecord
        // instead of an IBinder. This crashes on a ClassCastException. Adding a type check
        // here to prevent the crash.
        if (msg.obj !is IBinder) {
          return@Callback false
        }
        // Intercept the STOP_SERVICE message; this pre-processing step grabs the service object that is about to be destroyed
        if (msg.what == STOP_SERVICE) {
          val key = msg.obj as IBinder
          activityThreadServices[key]?.let {
            onServicePreDestroy(key, it)
          }
        }
        // Run the original logic
        mCallback?.handleMessage(msg) ?: false
      }
    }
    // Hook the ActivityManager object
    swapActivityManager { activityManagerInterface, activityManagerInstance ->
      uninstallActivityManager = {
        swapActivityManager { _, _ ->
          activityManagerInstance
        }
      }
      // Dynamic proxy object
      Proxy.newProxyInstance(
        activityManagerInterface.classLoader, arrayOf(activityManagerInterface)
      ) { _, method, args ->
        // Hook the moment the service is really destroyed; the service object is not available here, hence the earlier onServicePreDestroy step
        if (METHOD_SERVICE_DONE_EXECUTING == method.name) {
          val token = args!![0] as IBinder
          if (servicesToBeDestroyed.containsKey(token)) {
            // Wrap the service in a weak reference; the retention check fires 5 s later
            onServiceDestroyed(token)
          }
        }
        // Run the original logic
        try {
          if (args == null) {
            method.invoke(activityManagerInstance)
          } else {
            method.invoke(activityManagerInstance, *args)
          }
        } catch (invocationException: InvocationTargetException) {
          throw invocationException.targetException
        }
      }
    }
  } catch (ignored: Throwable) {
    SharkLog.d(ignored) { "Could not watch destroyed services" }
  }
}
Now look at the onServiceDestroyed method
private fun onServiceDestroyed(token: IBinder) {
  // Use the token to find the service object captured during pre-processing
  servicesToBeDestroyed.remove(token)?.also { serviceWeakReference ->
    serviceWeakReference.get()?.let { service ->
      // Add the service object to watchedObjects
      reachabilityWatcher.expectWeaklyReachable(
        service, "${service::class.java.name} received Service#onDestroy() callback"
      )
    }
  }
}
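The dynamic proxy part of this hook is plain JDK machinery and can be illustrated without Android. The sketch below is an assumption-laden simplification, not ServiceWatcher itself: Manager is a hypothetical interface standing in for the ActivityManager binder interface, and destroyedTokens is a made-up list that replaces the call to onServiceDestroyed. It shows the same three steps: observe one method by name, forward every call to the real instance, and unwrap InvocationTargetException so callers see the original exception.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class InterceptingProxyDemo {
    // Hypothetical interface; the real hook wraps the ActivityManager interface.
    interface Manager {
        void serviceDoneExecuting(String token);
        String ping();
    }

    // Tokens seen by the proxy; stands in for onServiceDestroyed(token).
    static final List<String> destroyedTokens = new ArrayList<>();

    static Manager wrap(Manager real) {
        return (Manager) Proxy.newProxyInstance(
            Manager.class.getClassLoader(),
            new Class<?>[] {Manager.class},
            (proxy, method, args) -> {
                // Observe only the method we care about.
                if ("serviceDoneExecuting".equals(method.getName())) {
                    destroyedTokens.add((String) args[0]);
                }
                // Always run the original logic.
                try {
                    return method.invoke(real, args);
                } catch (InvocationTargetException e) {
                    throw e.getTargetException(); // rethrow what the real method threw
                }
            });
    }

    public static void main(String[] args) {
        Manager real = new Manager() {
            @Override public void serviceDoneExecuting(String token) { }
            @Override public String ping() { return "pong"; }
        };
        Manager hooked = wrap(real);
        hooked.serviceDoneExecuting("token-1");
        System.out.println(hooked.ping() + " " + destroyedTokens); // prints pong [token-1]
    }
}
```

In Java, Method.invoke accepts a null argument array for methods without parameters, so the args == null branch of the Kotlin excerpt is not needed here; Kotlin needs it because of the spread operator.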
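We have now walked through the moment at which each default watched object is picked up. From that point on, the object is wrapped in a weak reference that is associated with a reference queue. The principle is:
For example, an Activity A is wrapped in a weak reference weakA, and weakA is added to watchedObjects. After 5 s a GC is triggered. If weakA shows up in the reference queue, A was reclaimable, so weakA is removed from watchedObjects. If weakA does not show up in the queue, some other object still references A; A is then judged to be leaking, which triggers the heap dump and analysis flow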
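The bookkeeping behind this idea can be sketched in plain Java with WeakReference and ReferenceQueue. This is a simplified illustration, not LeakCanary's ObjectWatcher: WeakWatchDemo, KeyedRef, watch, and retainedCount are hypothetical names. Because the timing of a real GC is not deterministic, the sketch calls Reference.enqueue() to stand in for the collector enqueuing a reference whose referent has been reclaimed.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

class WeakWatchDemo {
    /** A weak reference that remembers the key it is stored under. */
    static final class KeyedRef extends WeakReference<Object> {
        final String key;

        KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<String, KeyedRef> watched = new LinkedHashMap<>();

    /** Starts watching target under a fresh random key. */
    KeyedRef watch(Object target) {
        String key = UUID.randomUUID().toString();
        KeyedRef ref = new KeyedRef(target, key, queue);
        watched.put(key, ref);
        return ref;
    }

    /** Drops every entry whose reference has reached the queue. */
    private void removeCollected() {
        KeyedRef ref;
        while ((ref = (KeyedRef) queue.poll()) != null) {
            watched.remove(ref.key);
        }
    }

    /** Entries still present after the cleanup are the leak suspects. */
    int retainedCount() {
        removeCollected();
        return watched.size();
    }

    public static void main(String[] args) {
        WeakWatchDemo watcher = new WeakWatchDemo();
        Object leaked = new Object(); // kept strongly reachable on purpose
        Object gone = new Object();
        watcher.watch(leaked);
        KeyedRef goneRef = watcher.watch(gone);
        goneRef.enqueue(); // simulates the GC reclaiming `gone`
        System.out.println(watcher.retainedCount()); // prints 1
        if (leaked == null || gone == null) throw new IllegalStateException(); // keeps both locals in use
    }
}
```

Keying the map by a random string rather than by the object itself matters: the map must never hold a strong reference to the watched object, otherwise the watcher would cause the leak it is trying to detect.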
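Earlier sections described how target objects are added to watchedObjects at specific moments. Once an object is judged to be leaking, a heap dump is started. This flow is fairly involved, so we use an Activity leak as the example, starting with a UML diagram
First, look again at when the Activity object is picked up
ActivityWatcher.kt
private val lifecycleCallbacks =
  object : Application.ActivityLifecycleCallbacks by noOpDelegate() {
    override fun onActivityDestroyed(activity: Activity) {
      // Added to watchedObjects when the Activity runs onDestroy
      reachabilityWatcher.expectWeaklyReachable(
        activity, "${activity::class.java.name} received Activity#onDestroy() callback"
      )
    }
  }
Tip: noOpDelegate in the code above combines Kotlin's delegation mechanism with a Java dynamic proxy, so only the interface methods of interest have to be implemented and the rest are filled in automatically by the delegate, which keeps the code cleaner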
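The dynamic proxy half of that trick can be shown in Java. Java has no equivalent of Kotlin's `by` delegation, so the sketch below takes a different route to the same effect: a hypothetical helper, handling, builds a proxy that reacts to one named method and ignores the rest. Callbacks is a made-up interface standing in for ActivityLifecycleCallbacks. Treat it as an illustration of the idea under those assumptions, not as how noOpDelegate is written.

```java
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

class NoOpDelegateDemo {
    // Hypothetical multi-method callback interface.
    interface Callbacks {
        void onCreated(String name);
        void onResumed(String name);
        void onDestroyed(String name);
    }

    /**
     * Builds an implementation of type in which only methodName does something.
     * Every other method is a no-op that returns null, so this is only safe for
     * interfaces whose methods return void or a reference type. Calling
     * hashCode() or equals() on the proxy would fail for the same reason.
     */
    static <T> T handling(Class<T> type, String methodName, Consumer<Object[]> action) {
        Object proxy = Proxy.newProxyInstance(
            type.getClassLoader(),
            new Class<?>[] {type},
            (p, method, args) -> {
                if (method.getName().equals(methodName)) {
                    action.accept(args);
                }
                return null;
            });
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        Callbacks callbacks = handling(
            Callbacks.class, "onDestroyed", a -> System.out.println("destroyed: " + a[0]));
        callbacks.onCreated("MainActivity");   // ignored
        callbacks.onResumed("MainActivity");   // ignored
        callbacks.onDestroyed("MainActivity"); // prints destroyed: MainActivity
    }
}
```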
reachabilityWatcher is in fact an ObjectWatcher. Next, look at its expectWeaklyReachable method
@Synchronized override fun expectWeaklyReachable(
  watchedObject: Any,
  description: String
) {
  if (!isEnabled()) {
    return
  }
  // First clear out objects that have already been collected
  removeWeaklyReachableObjects()
  val key = UUID.randomUUID()
    .toString()
  val watchUptimeMillis = clock.uptimeMillis()
  // Wrap the object in a weak reference; if nothing else references it, it is collected on GC and the reference is added to queue
  val reference =
    KeyedWeakReference(watchedObject, key, description, watchUptimeMillis, queue)
  SharkLog.d {
    "Watching " +
      (if (watchedObject is Class<*>) watchedObject.toString() else "instance of ${watchedObject.javaClass.name}") +
      (if (description.isNotEmpty()) " ($description)" else "") +
      " with key $key"
  }
  // Add to the retention map; an object still in this map after GC is judged to be leaking
  watchedObjects[key] = reference
  // Submit a check runnable that runs 5 s later
  checkRetainedExecutor.execute {
    moveToRetained(key)
  }
}
First look at removeWeaklyReachableObjects. It is called from many places so that records of objects that have already been collected are cleared promptly
private fun removeWeaklyReachableObjects() {
  // WeakReferences are enqueued as soon as the object to which they point to becomes weakly
  // reachable. This is before finalization or garbage collection has actually happened.
  var ref: KeyedWeakReference?
  do {
    /**
     * When the object held by a weak reference is collected, the reference is added to its associated queue.
     * In other words: if an object has been collected, its reference shows up in this queue.
     */
    ref = queue.poll() as KeyedWeakReference?
    if (ref != null) {
      // Remove the record of the collected object from watchedObjects
      watchedObjects.remove(ref.key)
    }
  } while (ref != null)
}
After the cleanup, LeakCanary wraps the Activity in a weak reference associated with a UUID-generated key and a reference queue, and stores the key and the weak reference as a key-value pair in watchedObjects. checkRetainedExecutor then posts a runnable to the main thread, delayed by the 5 s configured at initialization. Next, look at moveToRetained(key)
@Synchronized private fun moveToRetained(key: String) {
  removeWeaklyReachableObjects()
  val retainedRef = watchedObjects[key]
  if (retainedRef != null) {
    retainedRef.retainedUptimeMillis = clock.uptimeMillis()
    // onObjectRetainedListeners is in practice the InternalLeakCanary object registered during initialization
    onObjectRetainedListeners.forEach { it.onObjectRetained() }
  }
}
First, go back to what InternalLeakCanary does during initialization
InternalLeakCanary.kt
override fun invoke(application: Application) {
  // Store the application object
  _application = application
  checkRunningInDebuggableBuild()
  // Register the retained-object listener with ObjectWatcher
  AppWatcher.objectWatcher.addOnObjectRetainedListener(this)
  // Heap dumper
  val heapDumper = AndroidHeapDumper(application, createLeakDirectoryProvider(application))
  // Used to trigger a GC
  val gcTrigger = GcTrigger.Default
  val configProvider = { LeakCanary.config }
  val handlerThread = HandlerThread(LEAK_CANARY_THREAD_NAME)
  handlerThread.start()
  val backgroundHandler = Handler(handlerThread.looper)
  // Decides whether the conditions for a heap dump are met
  heapDumpTrigger = HeapDumpTrigger(
    application, backgroundHandler, AppWatcher.objectWatcher, gcTrigger, heapDumper,
    configProvider
  )
  // Listen for foreground/background changes; the two states are handled differently
  application.registerVisibilityListener { applicationVisible ->
    this.applicationVisible = applicationVisible
    heapDumpTrigger.onApplicationVisibilityChanged(applicationVisible)
  }
  registerResumedActivityListener(application)
  // Add a LeakCanary shortcut icon on the launcher
  addDynamicShortcut(application)
  // We post so that the log happens after Application.onCreate()
  mainHandler.post {
    // https://github.com/square/leakcanary/issues/1981
    // We post to a background handler because HeapDumpControl.iCanHasHeap() checks a shared pref
    // which blocks until loaded and that creates a StrictMode violation.
    backgroundHandler.post {
      SharkLog.d {
        when (val iCanHasHeap = HeapDumpControl.iCanHasHeap()) {
          is Yup -> application.getString(R.string.leak_canary_heap_dump_enabled_text)
          is Nope -> application.getString(
            R.string.leak_canary_heap_dump_disabled_text, iCanHasHeap.reason()
          )
        }
      }
    }
  }
}
Back to the onObjectRetained method from earlier: it eventually calls scheduleRetainedObjectCheck() in HeapDumpTrigger.kt
HeapDumpTrigger.kt
fun scheduleRetainedObjectCheck(
  delayMillis: Long = 0L
) {
  val checkCurrentlyScheduledAt = checkScheduledAt
  if (checkCurrentlyScheduledAt > 0) { // greater than 0 means a check is already scheduled; avoids frequent repeated checks
    return
  }
  // Record the time the check is scheduled for
  checkScheduledAt = SystemClock.uptimeMillis() + delayMillis
  backgroundHandler.postDelayed({ // background thread
    checkScheduledAt = 0
    // Check the retained objects
    checkRetainedObjects()
  }, delayMillis)
}
The timestamp prevents frequent repeated checks, and the check itself is posted as a runnable to a background thread
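The timestamp guard can be isolated in a few lines of Java. The sketch below is hypothetical and deliberately simplified: a plain queue stands in for backgroundHandler, the current time is passed in instead of read from SystemClock, and a counter replaces checkRetainedObjects(). It only illustrates that calls arriving while a check is pending are dropped, and that scheduling works again once the pending check has run.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class CheckSchedulerDemo {
    // Stand-in for backgroundHandler: tasks wait here until drain() runs them.
    private final Queue<Runnable> backgroundQueue = new ArrayDeque<>();
    // 0 means "nothing scheduled"; any positive value means a check is pending.
    private long checkScheduledAt = 0L;
    int checksRun = 0;

    /** Schedules a check unless one is already pending. nowMillis must be positive. */
    void scheduleCheck(long nowMillis, long delayMillis) {
        if (checkScheduledAt > 0) {
            return; // a check is already pending, drop this request
        }
        checkScheduledAt = nowMillis + delayMillis;
        backgroundQueue.add(() -> {
            checkScheduledAt = 0; // allow the next request to schedule again
            checksRun++;          // stands in for checkRetainedObjects()
        });
    }

    /** Runs everything that is queued, like the handler thread would. */
    void drain() {
        Runnable task;
        while ((task = backgroundQueue.poll()) != null) {
            task.run();
        }
    }

    public static void main(String[] args) {
        CheckSchedulerDemo scheduler = new CheckSchedulerDemo();
        scheduler.scheduleCheck(1000, 0);
        scheduler.scheduleCheck(1001, 0); // dropped
        scheduler.scheduleCheck(1002, 0); // dropped
        scheduler.drain();
        System.out.println(scheduler.checksRun); // prints 1
    }
}
```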
HeapDumpTrigger.kt
private fun checkRetainedObjects() {
  // Whether a heap dump may be triggered
  val iCanHasHeap = HeapDumpControl.iCanHasHeap()
  val config = configProvider()
  if (iCanHasHeap is Nope) {
    if (iCanHasHeap is NotifyingNope) { // post a notification; tapping it forces a check
      // ... code omitted
    }
    return
  }
  // Number of objects that are currently still retained
  var retainedReferenceCount = objectWatcher.retainedObjectCount
  if (retainedReferenceCount > 0) {
    gcTrigger.runGc() // trigger a GC
    retainedReferenceCount = objectWatcher.retainedObjectCount // count again the objects that were not collected
  }
  /**
   * Decide from the retained count whether a heap dump is needed.
   * To limit the impact, the threshold differs between foreground and background: at least 5 by default
   * in the foreground, at least 1 in the background (once the app has been in the background longer than the watch period)
   */
  if (checkRetainedCount(retainedReferenceCount, config.retainedVisibleThreshold)) return
  val now = SystemClock.uptimeMillis()
  val elapsedSinceLastDumpMillis = now - lastHeapDumpUptimeMillis
  if (elapsedSinceLastDumpMillis < WAIT_BETWEEN_HEAP_DUMPS_MILLIS) { // no repeated heap dump within one minute
    onRetainInstanceListener.onEvent(DumpHappenedRecently)
    showRetainedCountNotification(
      objectCount = retainedReferenceCount,
      contentText = application.getString(R.string.leak_canary_notification_retained_dump_wait)
    )
    scheduleRetainedObjectCheck(
      delayMillis = WAIT_BETWEEN_HEAP_DUMPS_MILLIS - elapsedSinceLastDumpMillis
    )
    return
  }
  dismissRetainedCountNotification()
  val visibility = if (applicationVisible) "visible" else "not visible"
  dumpHeap( // trigger the heap dump
    retainedReferenceCount = retainedReferenceCount,
    retry = true,
    reason = "$retainedReferenceCount retained objects, app is $visibility"
  )
}
After triggering a GC, the number of retained objects still in watchedObjects is checked. By default the threshold is 5 or more in the foreground and 1 or more in the background. Once the condition is met, a heap dump is triggered
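The threshold rule can be written down as a small pure function. The Java sketch below follows the description in this article rather than the actual checkRetainedCount source, so treat the names (DumpThresholdDemo, shouldDump, backgroundedLongEnough) and the exact shape of the conditions as assumptions.

```java
class DumpThresholdDemo {
    /**
     * Decides whether a heap dump should go ahead.
     *
     * @param retained               objects still retained after the GC
     * @param visibleThreshold       count required while the app is visible (5 by default per the article)
     * @param appVisible             whether the app is currently in the foreground
     * @param backgroundedLongEnough whether the app has been in the background longer than the watch period
     */
    static boolean shouldDump(
            int retained, int visibleThreshold, boolean appVisible, boolean backgroundedLongEnough) {
        if (retained == 0) {
            return false; // nothing is retained
        }
        if (retained >= visibleThreshold) {
            return true; // enough suspects, regardless of visibility
        }
        // Below the threshold, a single retained object is enough once the app
        // has stayed in the background for longer than the watch period.
        return !appVisible && backgroundedLongEnough;
    }

    public static void main(String[] args) {
        System.out.println(shouldDump(4, 5, true, false));  // prints false
        System.out.println(shouldDump(5, 5, true, false));  // prints true
        System.out.println(shouldDump(1, 5, false, true));  // prints true
    }
}
```

The asymmetry is a cost trade-off: a heap dump freezes the app, so dumping in the foreground is reserved for cases with several suspects, while in the background a single suspect is worth the pause.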
HeapDumpTrigger.kt
private fun dumpHeap(
  retainedReferenceCount: Int,
  retry: Boolean,
  reason: String
) {
  saveResourceIdNamesToMemory()
  val heapDumpUptimeMillis = SystemClock.uptimeMillis()
  KeyedWeakReference.heapDumpUptimeMillis = heapDumpUptimeMillis
  // Heap dump result
  when (val heapDumpResult = heapDumper.dumpHeap()) {
    is NoHeapDump -> { // failure
      // ... code omitted
    }
    is HeapDump -> { // success
      lastDisplayedRetainedObjectCount = 0
      lastHeapDumpUptimeMillis = SystemClock.uptimeMillis()
      objectWatcher.clearObjectsWatchedBefore(heapDumpUptimeMillis)
      // Start an IntentService that analyzes the hprof on a background thread
      HeapAnalyzerService.runAnalysis(
        context = application,
        heapDumpFile = heapDumpResult.file,
        heapDumpDurationMillis = heapDumpResult.durationMillis,
        heapDumpReason = reason
      )
    }
  }
}
In the end the dumped hprof file is handed to HeapAnalyzerService, which starts Shark to analyze it. I am not yet familiar with Shark, so the walkthrough stops here. Each step of the analysis has a corresponding callback:
OnAnalysisProgressListener.kt
enum class Step {
  PARSING_HEAP_DUMP,
  EXTRACTING_METADATA,
  FINDING_RETAINED_OBJECTS,
  FINDING_PATHS_TO_RETAINED_OBJECTS,
  FINDING_DOMINATORS,
  INSPECTING_OBJECTS,
  COMPUTING_NATIVE_RETAINED_SIZE,
  COMPUTING_RETAINED_SIZE,
  BUILDING_LEAK_TRACES, // build the leak traces
  REPORTING_HEAP_ANALYSIS
}
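It is worth noting that LeakCanary exposes a callback for the analysis result. Users can implement the interface and, once they have the result, upload it to their own server
HeapAnalyzerService.kt#onHandleIntentInForeground()
/**
 * A custom onHeapAnalyzedListener can receive the HeapAnalysis and upload the trace
 * LeakTraceWrapper.wrap(heapAnalysis.toString(), 120) formats the trace
 */
config.onHeapAnalyzedListener.onHeapAnalyzed(fullHeapAnalysis)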
The officially recommended approach is
class LeakUploader : OnHeapAnalyzedListener {
  // LeakCanary's default implementation; it mainly records known leaks so they are not reported again
  val defaultListener = DefaultOnHeapAnalyzedListener.create()
  override fun onHeapAnalyzed(heapAnalysis: HeapAnalysis) {
    TODO("Upload heap analysis to server")
    // Delegate to default behavior (notification and saving result)
    defaultListener.onHeapAnalyzed(heapAnalysis)
  }
}
class DebugExampleApplication : ExampleApplication() {
  override fun onCreate() {
    super.onCreate()
    // Use the custom OnHeapAnalyzedListener; it handles our own logic first and then delegates to the original behavior
    LeakCanary.config = LeakCanary.config.copy(
      onHeapAnalyzedListener = LeakUploader()
    )
  }
}
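Finally, here is LeakCanary's official documentation on fixing leaks, which explains how to locate the source of a leak. https://square.github.io/leakcanary/fundamentals-fixing-a-memory-leak/
Last of all, an excerpt from LeakCanary's note on fixing leaks with weak references: use weak references with caution, and fix the problem at the source of the leak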
Memory leaks cannot be fixed by replacing strong references with weak references. It’s a common solution when attempting to quickly address memory issues, however it never works. The bugs that were causing references to be kept longer than necessary are still there. On top of that, it creates more bugs as some objects will now be garbage collected sooner than they should. It also makes the code much harder to maintain.