4-1-3 MHA 自动Failover

最后更新于:2022-04-02 07:43:52

> MySQL安装和复制请查看7.1/7.2相关章节 > MHA要求MySQL所有节点创建复制账户 > LVS负责读操作的负载均衡 #### 基础环境 操作系统:Centos 7.3 64位; ```shell 数据库写VIP:192.168.0.20 数据库读VIP(LVS):192.168.0.21 ``` 服务器角色和配置信息 |角色|IP地址|主机名|Server-ID|功能类型| | :------------: | :------------: | :------------: | :------------: | :------------: | |Master|192.168.0.230|mdb01.prod.ding|168001200|写| |Candicate Master|192.168.0.236|mdb01-s1.prod.ding|168001201|读| |Slave|192.168.0.235|mdb01-s2.prod.ding|168001202|读| |Monitor host|192.168.0.237|mha.prod.ding||MySQL主库的故障转移| |LVS+Keepalived-A|192.168.0.90|mdb-lvs01.prod.ding||读操作的负载均衡| |LVS+Keepalived-B|192.168.0.91|mdb-lvs02.prod.ding||LVS高可用备机| 各服务器软件部署情况 |角色|Manager|Node|Keepalived|LVS| | :------------: | :------------: | :------------: | :------------: | :------------: | |Master|-|部署|-|-| |Candicate Master|-|部署|-|-| |Slave|-|部署|-|-| |MHA host|部署|部署|-|-| |LVS+Keepalived-A|-|-|部署|部署| |LVS+Keepalived-B|||部署|部署| #### 拓扑图及实现原理 ![MHA拓扑图](index_files/MHA.png "MHA拓扑图") ##### 业务流程 **读操作** 1. LVS实现读操作的负载均衡; 2. Keepalived在上层管理LVS,并对两台从库进行健康检测(通过定义Check脚本); 3. 一台从库出现故障后,Keepalived将其剔除出负载均衡集群; **写操作** 1. 在Master上绑定写VIP(MHA启动后会通过脚本进行操作); 2. MHA监控Master状态,当Master出现故障后(宕机、复制暂停)时; 3. 通过Failover脚本,卸载Master上的WVIP; 4. 通过Failover在CMaster上绑定WVIP,提升其为主库; 5. 同步并应用差异日志,并将从库指向新主库; 问题:当MHA把Master切换到了CMaster上后,LVS如何处理分发在CMaster上的读操作? 解释:由于Keepalived会通过脚本定期监控CMaster的状态,包括同步、SQL线程、I/O线程,所以当CMaster升级为主库后,这些状态都将消失,Keepalived将自动将CMaster剔除出负载均衡集群。 #### 部署MHA MHA使用Perl编写,需要安装Perl依赖,建议使用阿里云的YUM源,YUM安装 Node节点需要perl-DBD-MySQL perl-DBI Manger需要perl-Config-Tiny.noarch perl-Log-Dispatch.noarch perl-Parallel-ForkManager.noarch perl-DBD-MySQL perl-DBI ##### 部署约定 |序号|目录名称|目录位置| | :------------: | :------------: | :------------: | |1|上传文件目录|/tmp/| |2|MHA配置文件目录|/app/mha| |3|MHA日志文件目录|/app/mha/log| |4|MHA管理目录|/app/mha/app1| |5|MHA处理数据目录|/tmp| |6|relay_log_purge脚本目录|/app/scripts| |7|master_ip_failover脚本目录|/app/scripts| ##### 配置阿里云的YUM源 ```shell #clean OS default repo mkdir /etc/yum.repos.d/old && mv /etc/yum.repos.d/C* /etc/yum.repos.d/old/ #add local repo wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo ``` ##### 安装Perl依赖 方法一 使用YUM 安装(推荐) ```shell yum install -y perl-Config-Tiny.noarch perl-Log-Dispatch.noarch perl-Parallel-ForkManager.noarch perl-DBD-MySQL perl-DBI ``` 方法二 使用cpanm安装。注意:时间必须整正确 编写脚本DBD_install.sh ```shell #!/bin/bash wget http://xrl.us/cpanm --no-check-certificate mv cpanm /usr/bin/ chmod 755 /usr/bin/cpanm cat >/root/list<这个账户和复制账户没有关系 ```shell grant all on *.* to 'mha'@'192.168.0.%' identified by 'mha'; flush privileges; ``` ##### 在(MHA/M/S1/S2)服务器上配置主机名(MHA管理脚本通过主机名调用) ```shell cat >>/etc/hosts<< EOF 192.168.0.230 ip230 192.168.0.235 ip235 192.168.0.236 ip236 192.168.0.237 ip237 EOF ``` ##### 所有MySQL节点增加mysqlbinlog环境变量(解析binlog用) ```shell echo "export PATH=$PATH:/usr/local/mysql/bin" >> /etc/bashrc ;source /etc/bashrc ``` > bashrc是每次执行bash脚本时加载的变量 ##### 配置MHA和MySQL主从之间的主机免密钥登录 所有节点执行以下操作 - 部署SSH免密钥登录 - 所有节点可以SSH访问MHA和其他MySQL节点 - 所有节点可以SSH访问自己 ```shell ssh-keygen -t rsa ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.230" ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.235" ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.236" ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.237" ``` ##### 在/app/mha下创建app1.cnf >注意修改相关配置文件master_binlog_dir为MySQl的数据目录 ```shell [server default] manager_log=/etc/mha/log/app1.log manager_workdir=/etc/mha/app1/ master_binlog_dir=/data/mysql3306 master_ip_failover_script=/etc/mha/master_ip_failover ping_interval=1 remote_workdir=/tmp secondary_check_script=/bin/masterha_secondary_check -s ip235 -s ip236 --user=root --master_host=ip230 --master_ip=192.168.0.230 --master_port=3306 ssh_user=root user=mha password=mha repl_password=repl repl_user=repl [server1] hostname=192.168.0.230 port=3306 [server2] candidate_master=1 check_repl_delay=0 hostname=192.168.0.235 port=3306 [server3] hostname=192.168.0.236 port=3306 ``` > 根据实际情况修改masterha_secondary_check位置,rpm安装在bin目录下 #### 上传并修改/etc/mha/master_ip_failover脚本 ```shell chmod 600 master_ip_failover ``` 内容详见结尾附件 ##### 调整参数(网卡名称和IP) ```shell my $vip = '192.168.0.20/24'; # Virtual IP my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; ``` #### 部署从库(S1/S2)的手动回收中继日志脚本 上传relay_purge.sh到/app/scripts下并增加执行权限 ```shell #!/bin/bash /usr/bin/purge_relay_logs --user=root --password=123.com --socket=/tmp/mysql3306.sock --host=localhost -disable_relay_log_purge --port=3306 --workdir=/tmp >> /app/scripts/purge_relay_logs.log 2>&1 ``` 增加执行权限 ```shell chmod 600 /app/scripts/relay_purge.sh ``` 部署定时任务 ```shell echo '0 4 * * * /bin/sh /app/scripts/relay_purge.sh' >/var/spool/cron/root ``` #### 切换测试 ##### 验证SSH配置 ```shell masterha_check_ssh --conf=/etc/mha/app1.cnf ``` ##### 测试复制、应用差异日志、故障转移脚本 ```shell masterha_check_repl --conf=/etc/mha/app1.cnf ``` ##### 启动MHA ```shell nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null >/etc/mha/app1/manager.log 2>&1 & #状态检查 sterha_check_status --conf=/etc/mha/app1.cnf ``` ##### 关闭MHA 正常关闭 ```shell masterha_stop --conf=/etc/mha/app1.cnf ``` ##### 关闭主库,并监控MHA日志 具体日志请查看MHA日志部分 这时,230(down) 235提升为主库,236指向了235 #### 恢复原有环境 1.启动230数据库,并查看position情况 ```shell show master status\G ``` 2.将236重新指向230 ```shell stop slave; change master to master_host='192.168.0.230',master_user='repl',master_password='repl',master_log_file='mysql-bin.000003',master_log_pos=154; start slave; ``` 3.将235重新指向230 ```shell change master to master_host='192.168.0.230',master_user='repl',master_password='repl',master_log_file='mysql-bin.000003',master_log_pos=154; start slave; ``` 4.恢复MHA配置文件,并删除锁文件 ```shell rm -f /etc/mha/app1/app1.failover.complete ``` > 不删除,不会切换,防止反复切换 5.卸载235上的写VIP ```shell /sbin/ifconfig eth0:1 down ``` #### 附件 /etc/mha/master_ip_failover ```shell #!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, ); my $vip = '192.168.0.20/24'; # Virtual IP my $key = "0"; my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; $ssh_user = "root"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, ); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. my $exit_code = 1; #eval { # print "Disabling the VIP on old master: $orig_master_host \n"; # &stop_vip(); # $exit_code = 0; #}; eval { print "Disabling the VIP on old master: $orig_master_host \n"; #my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`; #if ( $ping le "90.0%" && $ping gt "0.0%" ){ #$exit_code = 0; #} #else { &stop_vip(); # updating global catalog, etc $exit_code = 0; #} }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here. my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; `ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`; exit 0; } else { &usage(); exit 1; } } # A simple system call that enable the VIP on the new master sub start_vip() { `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`; } # A simple system call that disable the VIP on the old_master sub stop_vip() { `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; } # the end. ```
';