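【Background】
Recently a friend reported that the disk holding OGG had filled up, and after he freed space manually, the OGG processes would not start. My first reaction was that this shouldn't happen. I have hit full disks many times before, and once space is freed the processes come straight back, whether Manager is configured to autostart them or they are started by hand. He said that even after stopping MGR and restarting, the processes were still abended, yet their report files contained no output at all.
1. 【OGG processes cannot be started from GGSCI, yet no error is reported】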
GGSCI (TEST) 1> start DUMPTEST
Sending START request to MANAGER ...
EXTRACT DUMPTEST starting
GGSCI (TEST) 2> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     STOPPED     DUMPTEST    00:00:00      00:06:39
EXTRACT     STOPPED     EXTTEST     00:00:02      00:00:08
GGSCI (TEST) 9> view report DUMPTEST
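--No output at all. Likewise, ALTER commands and attempts to add an Extract group do not work either.
2. 【Suspecting a problem with the process files】
An abnormal OS restart or a full disk usually leaves an OGG process in a hung state. When an OGG process starts, it records a file (something like a lock file). Deleting that file manually did not help, so we essentially ruled out a hung process as the cause.
3. 【Yet OGG can be started with OS-level commands (under the hood, GGSCI also ends up invoking OS commands)】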
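Because `start` prints "starting" even when the group never comes up, the only reliable signal in this case was the Status column of `info all`. Below is a minimal Python sketch that picks out every program not in RUNNING state from captured `info all` text. It assumes the output has already been captured as a string (how you capture it is left open), and the function name `non_running_groups` is hypothetical.

```python
from typing import List, Tuple

# Program names that can appear in the first column of `info all`.
PROGRAMS = {"MANAGER", "JAGENT", "EXTRACT", "REPLICAT"}


def non_running_groups(info_all: str) -> List[Tuple[str, str, str]]:
    """Return (program, status, group) for each `info all` row whose status
    is not RUNNING. MANAGER/JAGENT rows carry no group, so "" is used."""
    result = []
    for line in info_all.splitlines():
        fields = line.split()
        # Skip the header line, GGSCI prompts and blank lines.
        if len(fields) < 2 or fields[0] not in PROGRAMS:
            continue
        program, status = fields[0], fields[1]
        group = fields[2] if len(fields) > 2 else ""
        if status != "RUNNING":
            result.append((program, status, group))
    return result


sample = """\
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     STOPPED     DUMPTEST    00:00:00      00:06:39
EXTRACT     STOPPED     EXTTEST     00:00:02      00:00:08
"""

for program, status, group in non_running_groups(sample):
    print(program, status, group)
```

Splitting on whitespace keeps the sketch independent of column widths, which is why the header (whose "Lag at Chkpt" label contains spaces) is skipped by its first word rather than parsed.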
extract PARAMFILE /ogg/dirprm/dumptest.prm REPORTFILE /ogg/dirrpt/DUMPHXTEST.rpt
extract PARAMFILE /ogg/dirprm/exttest.prm REPORTFILE /ogg/dirrpt/EXTTEST.rpt
Verify the OGG status again:
GGSCI (test) 3> info all
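Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DUMPHXTEST  00:00:00      00:00:08
EXTRACT     RUNNING     EXTTEST     00:00:01      00:00:05
Starting OGG processes from the OS layer is only a temporary workaround, because day-to-day management still has to go through the GGSCI interface.
4. 【Analyzing ggserr.log】
An initial check of the process reports and ggserr.log turned up nothing obviously wrong, but a closer look revealed one easily overlooked entry in ggserr.log: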
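The OS-level workaround above is just the Extract executable invoked with a parameter file and a report file. Below is a small Python sketch that builds that command line for a given group. It assumes the OGG home is `/ogg` and that the parameter file is `dirprm/<group>.prm` in lower case and the report is `dirrpt/<GROUP>.rpt` in upper case. The first command above uses a different report name, so adjust to your own naming. `extract_command` is a hypothetical helper, not part of OGG.

```python
import os
import shlex
from typing import List


def extract_command(ogg_home: str, group: str) -> List[str]:
    """Build `extract PARAMFILE <prm> REPORTFILE <rpt>` for one group.

    Assumed layout: <ogg_home>/dirprm/<group>.prm (lower case) and
    <ogg_home>/dirrpt/<GROUP>.rpt (upper case)."""
    prm = os.path.join(ogg_home, "dirprm", group.lower() + ".prm")
    rpt = os.path.join(ogg_home, "dirrpt", group.upper() + ".rpt")
    return [os.path.join(ogg_home, "extract"), "PARAMFILE", prm, "REPORTFILE", rpt]


cmd = extract_command("/ogg", "EXTTEST")
print(shlex.join(cmd))
```

If you launch it from a script (for example with `subprocess.Popen(cmd, cwd="/ogg")`), use the OGG home as the working directory. Parameter files commonly use relative paths such as `./dirdat/...`, as the trail path in ggserr.log shows.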
2018-09-14 23:08:52 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.
2018-09-14 23:09:01 ERROR OGG-01098 Oracle GoldenGate Capture for Oracle, exttest.prm: Could not flush "./dirdat/tt001535" (error 28, No space left on device).
2018-09-14 23:09:02 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.
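Datastore repair failed. Why would a datastore message appear here? On investigation, this OGG installation runs a JAGENT process: the Java agent that is created when OGG is monitored or managed through EM, and the data it collects is stored in the datastore. JAGENT itself looked healthy, so how could it affect the OGG processes? Quite puzzling.
However, running the repair manually from GGSCI reported no error:
GGSCI (test) 4> REPAIR DATASTORE
Datastore repaired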
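The key entry was easy to miss among ordinary messages. Below is a minimal Python sketch that scans ggserr.log lines for the two message codes seen in this incident (OGG-01934 and OGG-01098) and records their timestamps. The regex assumes the single-line layout shown above; continuation lines of wrapped messages are simply ignored. The names `WATCHED` and `scan_ggserr` are hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Codes observed in this case: OGG-01934 "Datastore repair failed",
# OGG-01098 "Could not flush ... (error 28, No space left on device)".
WATCHED = {"OGG-01934", "OGG-01098"}

# "<date> <time> <LEVEL> OGG-nnnnn <message text>"
LINE_RE = re.compile(r"^(\S+ \S+)\s+(INFO|WARNING|ERROR)\s+(OGG-\d+)\s+(.*)$")


def scan_ggserr(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each watched code to the timestamps of its occurrences."""
    hits: Dict[str, List[str]] = {code: [] for code in sorted(WATCHED)}
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) in WATCHED:
            hits[m.group(3)].append(m.group(1))  # keep only the timestamp
    return hits


sample = [
    "2018-09-14 23:08:52 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.",
    '2018-09-14 23:09:01 ERROR OGG-01098 Oracle GoldenGate Capture for Oracle, exttest.prm: Could not flush "./dirdat/tt001535" (error 28, No space left on device).',
    "2018-09-14 23:09:02 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.",
]
hits = scan_ggserr(sample)
for code, stamps in hits.items():
    print(code, len(stamps), stamps)
```

Against a real installation you would pass an open file instead, e.g. `scan_ggserr(open("ggserr.log"))` from the OGG home. A repeated OGG-01934 right after a full-disk error is the pattern worth flagging.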
GGSCI (TEST) 5> start DUMPTEST
Sending START request to MANAGER ...
EXTRACT DUMPTEST starting
GGSCI (TEST) 6> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     STOPPED     DUMPTEST    00:00:00      00:06:39
EXTRACT     STOPPED     EXTTEST     00:00:02      00:00:08
GGSCI (TEST) 7> view report DUMPTEST
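--Still cannot be started. Could the datastore be beyond repair?
5. 【Working on JAGENT to verify whether it is involved】
【Stopping the JAGENT process】--still cannot be started
GGSCI (TEST) 1> stop JAGENT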
GGSCI (TEST) 1> start DUMPTEST
Sending START request to MANAGER ...
EXTRACT DUMPTEST starting
GGSCI (TEST) 2> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      STOPPED
EXTRACT     STOPPED     DUMPTEST    00:00:00      00:06:39
EXTRACT     STOPPED     EXTTEST     00:00:02      00:00:08
GGSCI (TEST) 9> view report DUMPTEST
【Temporarily renaming the directory used by JAGENT】--surprisingly, the processes now start
cd $OGG
mv dirbdb dirbdb.old
GGSCI (TEST) 1> start extract *
GGSCI (TEST) 2> info all
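Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      STOPPED
EXTRACT     RUNNING     DUMPTEST    00:00:00      00:06:39
EXTRACT     RUNNING     EXTTEST     00:00:02      00:00:08
--This confirms that the failure to start the OGG processes is directly related to JAGENT.
【Checking the JAGENT report file】--errors retrieving messages, retried at the next polling interval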
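Renaming dirbdb was the decisive test. One caveat if it is ever repeated: when `dirbdb.old` already exists as a directory, `mv dirbdb dirbdb.old` moves dirbdb *inside* it instead of renaming it. Below is a minimal Python sketch of a safer rename that refuses to clobber an earlier backup. It assumes JAGENT has already been stopped, as it was in this step. `move_aside_datastore` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def move_aside_datastore(ogg_home: str, suffix: str = ".old") -> Path:
    """Rename <ogg_home>/dirbdb to <ogg_home>/dirbdb<suffix>, like
    `mv dirbdb dirbdb.old`, but refuse to overwrite an earlier backup."""
    src = Path(ogg_home) / "dirbdb"
    dst = Path(ogg_home) / ("dirbdb" + suffix)
    if not src.is_dir():
        raise FileNotFoundError(f"no datastore directory at {src}")
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    src.rename(dst)
    return dst


# Demo against a throwaway directory standing in for $OGG.
with tempfile.TemporaryDirectory() as home:
    (Path(home) / "dirbdb").mkdir()
    backup = move_aside_datastore(home)
    print(backup.name)
```

Choosing a different `suffix` (for example a date) keeps every earlier copy of the damaged datastore around for later analysis.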
2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for EXTTEST
2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for DUMTEST
2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for MGR
2018-09-17 12:56:59 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.
2018-09-17 12:57:04 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.
2018-09-17 12:57:09 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.
2018-09-17 12:57:14 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.
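The JAGENT log repeats the same error, and the timestamps show the retry cadence. Below is a short Python sketch that extracts the ERROR timestamps from MessageCollector lines and computes the gaps between them. In the excerpt above the gap is 5 seconds. The regex assumes exactly the line layout shown. The source does not give the log file location, and the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

# "<date> <time> [MessageCollector] <LEVEL> MessageCollector - <text>"
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[MessageCollector\] (\w+) "
)


def error_times(lines: Iterable[str]) -> List[datetime]:
    """Timestamps of MessageCollector lines logged at ERROR level."""
    times = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(2) == "ERROR":
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps_seconds(times: List[datetime]) -> List[int]:
    """Whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = [
    "2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for MGR",
    "2018-09-17 12:56:59 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:04 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:09 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:14 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
]
times = error_times(sample)
print(len(times), gaps_seconds(times))
```

A steady, unbroken run of these errors suggests the agent cannot read its datastore at all. That fits the decision in the next step to recreate the datastore.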
6. 【The datastore is damaged; the only option is to recreate it】
--Stop all processes, including MGR and JAGENT
GGSCI (TEST) 1> stop *
GGSCI (TEST) 2> stop jagent
GGSCI (TEST) 3> stop mgr
--Recreate the JAGENT datastore
GGSCI (TEST) 4> create datastore mmap
Datastore created
GGSCI (TEST) 5> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     STOPPED
JAGENT      STOPPED
EXTRACT     ABENDED     DUMPTEST    00:00:00      00:13:06
EXTRACT     ABENDED     EXTTEST     00:00:03      00:12:57
GGSCI (TEST) 6> start mgr
Manager started.
GGSCI (TEST) 7> start jagent
GGSCI (TEST) 8> start *
GGSCI (TEST) 9> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      RUNNING
EXTRACT     RUNNING     DUMPTEST    00:00:00      00:05:31
EXTRACT     RUNNING     EXTTEST     00:00:03      00:05:22
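The recovery in step 6 is a fixed sequence of GGSCI commands. GGSCI can read commands from a file through its `OBEY` command. Below is a minimal Python sketch that writes the exact commands used above, plus a final `info all`, into such a file so the sequence is repeatable. The file name `recover_datastore.oby` is hypothetical. In this case the damaged dirbdb had already been renamed in step 5 before `create datastore` ran. `stop mgr` normally asks for confirmation, so check how your GGSCI version behaves in an obey file before relying on it unattended.

```python
import tempfile
from pathlib import Path
from typing import Sequence

# Commands copied from the session above, in the order they were run.
RECOVERY_STEPS: Sequence[str] = (
    "stop *",
    "stop jagent",
    "stop mgr",
    "create datastore mmap",
    "start mgr",
    "start jagent",
    "start *",
    "info all",
)


def write_obey_file(path: Path, commands: Sequence[str] = RECOVERY_STEPS) -> str:
    """Write one GGSCI command per line and return the text written."""
    text = "\n".join(commands) + "\n"
    path.write_text(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    obey = Path(tmp) / "recover_datastore.oby"  # hypothetical file name
    text = write_obey_file(obey)
    print(obey.read_text(), end="")
```

After copying the file into the OGG home, it would be run from the GGSCI prompt as `OBEY ./recover_datastore.oby`.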