Fixing an Oracle 12c Cluster Startup Failure Caused by Resizing /dev/shm

Author: 星哥玩云
Published: 2022-08-16 14:47:42
Collected in the column: 开源部署

On Oracle Linux 7, a maintainer changed the size of /dev/shm so that it became smaller than the Oracle instance's MEMORY_TARGET or SGA_TARGET. As a result the cluster could not start, and crsctl reported CRS-4535 and CRS-4000:

    [grid@jtp1 ~]$ crsctl stat res -t
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4000: Command Status failed, or completed with errors.

First, check whether the permissions on the ASM disks are the problem. They turn out to be normal:

    [root@jtp3 ~]# ls -lrt /dev/asm*
    brw-rw----. 1 grid oinstall 8, 128 Apr  3  2018 /dev/asmdisk07
    brw-rw----. 1 grid oinstall 8,  48 Apr  3  2018 /dev/asmdisk02
    brw-rw----. 1 grid oinstall 8,  96 Apr  3  2018 /dev/asmdisk05
    brw-rw----. 1 grid oinstall 8, 112 Apr  3  2018 /dev/asmdisk06
    brw-rw----. 1 grid oinstall 8,  64 Apr  3  2018 /dev/asmdisk03
    brw-rw----. 1 grid oinstall 8,  80 Apr  3  2018 /dev/asmdisk04
    brw-rw----. 1 grid oinstall 8,  32 Apr  3  2018 /dev/asmdisk01
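The permission check above can also be scripted so that only deviations are printed. The following is a minimal sketch, not part of the original procedure: the function name `check_asm_perms` is hypothetical, and it assumes GNU `stat` (as shipped with Oracle Linux 7). The expected owner, group, and mode in the usage comment are simply the values visible in the listing.

```shell
#!/usr/bin/env bash
# check_asm_perms OWNER GROUP MODE PATH...
# Hypothetical helper: prints one line for every path whose owner, group,
# or octal mode differs from the expected values.
# Returns 0 if every path matches, 1 otherwise.
check_asm_perms() {
    local want_owner=$1 want_group=$2 want_mode=$3
    shift 3
    local rc=0 path actual
    for path in "$@"; do
        # GNU stat format: %U = owner name, %G = group name, %a = octal mode
        actual=$(stat -c '%U %G %a' "$path") || { rc=1; continue; }
        if [ "$actual" != "$want_owner $want_group $want_mode" ]; then
            echo "MISMATCH $path: $actual"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, with the values from the listing above (brw-rw---- is mode 660):
#   check_asm_perms grid oinstall 660 /dev/asmdisk*
```

An empty result with exit status 0 corresponds to the "permissions are normal" finding in this case.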

Restart CRS:

    [root@jtp1 bin]# ./crsctl stop crs -f
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'jtp1'
    CRS-2673: Attempting to stop 'ora.mdnsd' on 'jtp1'
    CRS-2673: Attempting to stop 'ora.gpnpd' on 'jtp1'
    CRS-2677: Stop of 'ora.mdnsd' on 'jtp1' succeeded
    CRS-2677: Stop of 'ora.gpnpd' on 'jtp1' succeeded
    CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'jtp1'
    CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'jtp1'
    CRS-2677: Stop of 'ora.drivers.acfs' on 'jtp1' succeeded
    CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'jtp1' succeeded
    CRS-2673: Attempting to stop 'ora.ctssd' on 'jtp1'
    CRS-2673: Attempting to stop 'ora.evmd' on 'jtp1'
    CRS-2677: Stop of 'ora.ctssd' on 'jtp1' succeeded
    CRS-2677: Stop of 'ora.evmd' on 'jtp1' succeeded
    CRS-2673: Attempting to stop 'ora.cssd' on 'jtp1'
    CRS-2677: Stop of 'ora.cssd' on 'jtp1' succeeded
    CRS-2673: Attempting to stop 'ora.gipcd' on 'jtp1'
    CRS-2673: Attempting to stop 'ora.driver.afd' on 'jtp1'
    CRS-2677: Stop of 'ora.driver.afd' on 'jtp1' succeeded
    CRS-2677: Stop of 'ora.gipcd' on 'jtp1' succeeded
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'jtp1' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    [root@jtp1 bin]# ./crsctl start crs
    CRS-4123: Oracle High Availability Services has been started.

The CRS alert.log shows that the disk group cannot be mounted (CRS-5019), and the start of 'ora.storage' times out:

    [root@jtp1 ~]# tail -f /u01/app/grid/diag/crs/jtp1/crs/trace/alert.log
    2018-04-02 18:30:21.227 [OHASD(8143)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 8143
    2018-04-02 18:30:21.230 [OHASD(8143)]CRS-0714: Oracle Clusterware Release 12.2.0.1.0.
    2018-04-02 18:30:21.245 [OHASD(8143)]CRS-2112: The OLR service started on node jtp1.
    2018-04-02 18:30:21.262 [OHASD(8143)]CRS-8017: location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
    2018-04-02 18:30:21.262 [OHASD(8143)]CRS-1301: Oracle High Availability Service started on node jtp1.
    2018-04-02 18:30:21.567 [ORAROOTAGENT(8214)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 8214
    2018-04-02 18:30:21.600 [CSSDAGENT(8231)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8231
    2018-04-02 18:30:21.607 [CSSDMONITOR(8241)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8241
    2018-04-02 18:30:21.620 [ORAAGENT(8225)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8225
    2018-04-02 18:30:22.146 [ORAAGENT(8316)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 8316
    2018-04-02 18:30:22.211 [MDNSD(8335)]CRS-8500: Oracle Clusterware MDNSD process is starting with operating system process ID 8335
    2018-04-02 18:30:22.215 [EVMD(8337)]CRS-8500: Oracle Clusterware EVMD process is starting with operating system process ID 8337
    2018-04-02 18:30:23.259 [GPNPD(8369)]CRS-8500: Oracle Clusterware GPNPD process is starting with operating system process ID 8369
    2018-04-02 18:30:24.275 [GPNPD(8369)]CRS-2328: GPNPD started on node jtp1.
    2018-04-02 18:30:24.283 [GIPCD(8433)]CRS-8500: Oracle Clusterware GIPCD process is starting with operating system process ID 8433
    2018-04-02 18:30:26.296 [CSSDMONITOR(8464)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 8464
    2018-04-02 18:30:28.299 [CSSDAGENT(8482)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 8482
    2018-04-02 18:30:28.496 [OCSSD(8497)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 8497
    2018-04-02 18:30:29.538 [OCSSD(8497)]CRS-1713: CSSD daemon is started in hub mode
    2018-04-02 18:30:36.015 [OCSSD(8497)]CRS-1707: Lease acquisition for node jtp1 number 1 completed
    2018-04-02 18:30:37.087 [OCSSD(8497)]CRS-1605: CSSD voting file is online: AFD:CRS1; details in /u01/app/grid/diag/crs/jtp1/crs/trace/ocssd.trc.
    2018-04-02 18:30:37.103 [OCSSD(8497)]CRS-1672: The number of voting files currently available 1 has fallen to the minimum number of voting files required 1.
    2018-04-02 18:30:46.237 [OCSSD(8497)]CRS-1601: CSSD Reconfiguration complete. Active nodes are jtp1 .
    2018-04-02 18:30:48.514 [OCTSSD(9302)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 9302
    2018-04-02 18:30:48.535 [OCSSD(8497)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
    2018-04-02 18:30:50.626 [OCTSSD(9302)]CRS-2407: The new Cluster Time Synchronization Service reference node is host jtp1.
    2018-04-02 18:30:50.627 [OCTSSD(9302)]CRS-2401: The Cluster Time Synchronization Service started on host jtp1.
    2018-04-02 18:31:04.202 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
    2018-04-02 18:41:00.225 [ORAROOTAGENT(8214)]CRS-5818: Aborted command 'start' for resource 'ora.storage'. Details at (:CRSAGF00113:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc.
    2018-04-02 18:41:03.757 [ORAROOTAGENT(8214)]CRS-5017: The resource action "ora.storage start" encountered the following error:
    2018-04-02 18:41:03.757+Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
    2018-04-02 18:41:03.760 [OHASD(8143)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.storage'. Details at (:CRSPE00221:) {0:9:3} in /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd.trc.
    2018-04-02 18:42:09.921 [ORAROOTAGENT(8214)]CRS-5019: All OCR locations are on ASM disk groups [CRS], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc".
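Most of that log is routine startup noise (CRS-8500), so it can help to filter for the message codes that matter. The sketch below is an illustration only: `crs_errors` is a hypothetical helper name, and the codes in the usage comment are the ones that appear in this particular log, not a general list of failure codes.

```shell
#!/usr/bin/env bash
# crs_errors FILE CODE...
# Hypothetical helper: prints the lines of FILE that carry any of the
# given CRS message codes (matched as "CODE:").
crs_errors() {
    local file=$1
    shift
    local pattern
    # Join the codes into an alternation: CRS-5019|CRS-5818|...
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%|}
    grep -E "(${pattern}):" "$file"
}

# Usage, with the codes seen in the log above:
#   crs_errors /u01/app/grid/diag/crs/jtp1/crs/trace/alert.log \
#       CRS-5019 CRS-5818 CRS-5017 CRS-2757
```

Like `grep`, the function exits with a non-zero status when no line matches.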

The trace file shows errors when querying the ASM_DISCOVERY_ADDRESS attribute:

    [root@jtp1 ~]# more /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
    Trace file /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root.trc
    Oracle Database 12c Clusterware Release 12.2.0.1.0 - Production Copyright 1996, 2016 Oracle. All rights reserved.

    *** TRACE CONTINUED FROM FILE /u01/app/grid/diag/crs/jtp1/crs/trace/ohasd_orarootagent_root_93.trc ***

    2018-04-02 18:42:09.165 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c0229390)
    2018-04-02 18:42:09.165 : default:3554666240: clsCredDomClose: Credctx deleted 0x7f03c0459470
    2018-04-02 18:42:09.166 :    GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
    2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssinit: initialized context: (0x7f03c045c2c0) flags 0x115
    2018-04-02 18:42:09.253 : CSSCLNT:3554666240: clsssterm: terminating context (0x7f03c045c2c0)
    2018-04-02 18:42:09.254 :  CLSNS:3554666240: clsns_SetTraceLevel:trace level set to 1.
    2018-04-02 18:42:09.254 :    GPNP:3554666240: clsgpnp_dbmsGetItem_profile: [at clsgpnp_dbms.c:399] Result: (0) CLSGPNP_OK. (:GPNP00401:)got ASM-Profile.Mode='remote'
    2018-04-02 18:42:09.257 : default:3554666240: Inited LSF context: 0x7f03c04f0420
    2018-04-02 18:42:09.260 : CLSCRED:3554666240: clsCredCommonInit: Inited singleton credctx.
    2018-04-02 18:42:09.260 : CLSCRED:3554666240: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
    2018-04-02 18:42:09.294 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

    2018-04-02 18:42:09.300 : USRTHRD:3554666240: {0:9:3} 8033 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS

    2018-04-02 18:42:09.356 : CLSCRED:3554666240: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.5c82286a084bcf37ffa014144074e5dd.root not found
    2018-04-02 18:42:09.356 : USRTHRD:3554666240: {0:9:3} 7755 Error 4 opening dom root in 0x7f03c064c980

The ASM alert.log reveals the cause: /dev/shm is smaller than MEMORY_TARGET. The warning also states the minimum size with which /dev/shm must be mounted:

    [root@jtp1 ~]# tail -f /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    WARNING: ASM does not support ipclw. Switching to skgxp
    WARNING: ASM does not support ipclw. Switching to skgxp
    WARNING: ASM does not support ipclw. Switching to skgxp
    * instance_number obtained from CSS = 1, checking for the existence of node 0...
    * node 0 does not exist. instance_number = 1
    Starting ORACLE instance (normal) (OS id: 9343)
    2018-04-02T18:31:00.187055+08:00
    CLI notifier numLatches:7 maxDescs:2301
    2018-04-02T18:31:00.193961+08:00
    WARNING: You are trying to use the MEMORY_TARGET feature. This feature requires the /dev/shm file system to be mounted for at least 1140850688 bytes. /dev/shm is either not mounted or is mounted with available space less than this size. Please fix this so that MEMORY_TARGET can work as expected. Current available is 1073573888 and used is 167936 bytes. Ensure that the mount point is /dev/shm for this directory.
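The numbers in the warning make the gap concrete: 1140850688 bytes is exactly 1088 MiB, while the 1073573888 bytes available is just under 1024 MiB, so /dev/shm is short by 67276800 bytes (about 64 MiB). The sketch below shows one way to pull the required size out of the warning and compare it with what is currently available. All three function names are hypothetical, and `shm_available_bytes` assumes GNU `df` with the `--output` option.

```shell
#!/usr/bin/env bash
# required_shm_bytes
# Hypothetical helper: reads alert-log text on stdin and prints the byte
# count from the last "mounted for at least N bytes" warning, if any.
required_shm_bytes() {
    sed -n 's/.*mounted for at least \([0-9][0-9]*\) bytes.*/\1/p' | tail -n 1
}

# shm_is_large_enough REQUIRED AVAILABLE
# Succeeds (exit 0) when AVAILABLE is at least REQUIRED, both in bytes.
shm_is_large_enough() {
    [ "$2" -ge "$1" ]
}

# shm_available_bytes
# Prints the space currently available on /dev/shm in bytes.
# Assumption: GNU df (coreutils) with --output support.
shm_available_bytes() {
    df --output=avail -B1 /dev/shm | tail -n 1 | tr -d ' '
}

# Usage:
#   need=$(required_shm_bytes < /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log)
#   shm_is_large_enough "$need" "$(shm_available_bytes)" || echo "/dev/shm too small"
```

With the values from this incident, `shm_is_large_enough 1140850688 1073573888` fails, which matches the warning.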

The size of /dev/shm can be changed by editing /etc/fstab. Here /dev/shm, currently mounted at 1.0G, is raised to 12G:

    [root@jtp1 bin]# df -h
    Filesystem           Size  Used Avail Use% Mounted on
    /dev/mapper/ol-root   49G   42G  7.9G  85% /
    devtmpfs              12G   28K   12G   1% /dev
    tmpfs                1.0G  164K  1.0G   1% /dev/shm
    tmpfs                1.0G  9.3M 1015M   1% /run
    tmpfs                1.0G     0  1.0G   0% /sys/fs/cgroup
    /dev/sda1           1014M  141M  874M  14% /boot
    [root@jtp1 bin]# vi /etc/fstab

    #
    # /etc/fstab
    # Created by anaconda on Sat Mar 18 15:27:13 2017
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    /dev/mapper/ol-root    /                       xfs     defaults        0 0
    UUID=ca5854cd-0125-4954-a5c4-1ac42c9a0f70 /boot                   xfs     defaults        0 0
    /dev/mapper/ol-swap    swap                    swap    defaults        0 0

    tmpfs                  /dev/shm                tmpfs   defaults,size=12G        0 0
    tmpfs                  /run                    tmpfs   defaults,size=12G        0 0
    tmpfs                  /sys/fs/cgroup          tmpfs   defaults,size=12G        0 0
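Before restarting anything, it can be worth confirming that the edited entry really carries the intended size. The following sketch reads the `size=` option of the /dev/shm line from an fstab-style file; the function name `shm_fstab_size` is hypothetical. The `mount -o remount` line in the comments is a standard way to apply a new tmpfs size to a running system, but the source does not say whether it was used in this incident, so treat it as an optional step.

```shell
#!/usr/bin/env bash
# shm_fstab_size [FILE]
# Hypothetical helper: prints the value of the size= mount option on the
# /dev/shm entry of FILE (default: /etc/fstab). Prints nothing if the
# entry or the option is missing.
shm_fstab_size() {
    awk '$1 !~ /^#/ && $2 == "/dev/shm" {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] ~ /^size=/) {
                     sub(/^size=/, "", opts[i])
                     print opts[i]
                 }
         }' "${1:-/etc/fstab}"
}

# Usage:
#   shm_fstab_size                 # value configured in /etc/fstab
#
# Optional, as root: apply the new size to the mounted file system
# without a reboot, then verify it.
#   mount -o remount,size=12G /dev/shm
#   df -h /dev/shm
```

Whatever size is chosen must be at least the minimum reported in the ASM alert log and should fit within the memory actually available on the host.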

After restarting the cluster, check the cluster resource status again. crsctl can communicate with Cluster Ready Services once more, and the ASM disk groups are online:

    [grid@jtp1 ~]$ crsctl stat res -t
    --------------------------------------------------------------------------------
    Name           Target  State        Server                   State details
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.ASMNET1LSNR_ASM.lsnr
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.CRS.dg
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.DATA.dg
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.FRA.dg
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.TEST.dg
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.chad
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.net1.network
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.ons
                   ONLINE  ONLINE       jtp1                     STABLE
                   ONLINE  ONLINE       jtp2                     STABLE
    ora.proxy_advm
                   OFFLINE OFFLINE      jtp1                     STABLE
                   OFFLINE OFFLINE      jtp2                     STABLE
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.LISTENER_SCAN1.lsnr
          1        ONLINE  ONLINE       jtp1                     STABLE
    ora.LISTENER_SCAN2.lsnr
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.LISTENER_SCAN3.lsnr
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.MGMTLSNR
          1        ONLINE  ONLINE       jtp2                     169.254.237.250 88.8
                                                                 8.88.2,STABLE
    ora.asm
          1        ONLINE  ONLINE       jtp1                     Started,STABLE
          2        ONLINE  ONLINE       jtp2                     Started,STABLE
          3        OFFLINE OFFLINE                               STABLE
    ora.cvu
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.jy.db
          1        ONLINE  OFFLINE                               STABLE
          2        ONLINE  OFFLINE                               STABLE
    ora.jtp1.vip
          1        ONLINE  ONLINE       jtp1                     STABLE
    ora.jtp2.vip
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.mgmtdb
          1        ONLINE  ONLINE       jtp2                     Open,STABLE
    ora.qosmserver
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.scan1.vip
          1        ONLINE  ONLINE       jtp1                     STABLE
    ora.scan2.vip
          1        ONLINE  ONLINE       jtp2                     STABLE
    ora.scan3.vip
          1        ONLINE  ONLINE       jtp2                     STABLE
    --------------------------------------------------------------------------------

At this point the cluster is back to normal.
