4-1-3 MHA 自动Failover
最后更新于:2022-04-02 07:43:52
> MySQL安装和复制请查看7.1/7.2相关章节
> MHA要求MySQL所有节点创建复制账户
> LVS负责读操作的负载均衡
#### 基础环境
操作系统:Centos 7.3 64位;
```shell
数据库写VIP:192.168.0.20
数据库读VIP(LVS):192.168.0.21
```
服务器角色和配置信息
|角色|IP地址|主机名|Server-ID|功能类型|
| :------------: | :------------: | :------------: | :------------: | :------------: |
|Master|192.168.0.230|mdb01.prod.ding|168001200|写|
|Candicate Master|192.168.0.236|mdb01-s1.prod.ding|168001201|读|
|Slave|192.168.0.235|mdb01-s2.prod.ding|168001202|读|
|Monitor host|192.168.0.237|mha.prod.ding||MySQL主库的故障转移|
|LVS+Keepalived-A|192.168.0.90|mdb-lvs01.prod.ding||读操作的负载均衡|
|LVS+Keepalived-B|192.168.0.91|mdb-lvs02.prod.ding||LVS高可用备机|
各服务器软件部署情况
|角色|Manager|Node|Keepalived|LVS|
| :------------: | :------------: | :------------: | :------------: | :------------: |
|Master|-|部署|-|-|
|Candicate Master|-|部署|-|-|
|Slave|-|部署|-|-|
|MHA host|部署|部署|-|-|
|LVS+Keepalived-A|-|-|部署|部署|
|LVS+Keepalived-B|||部署|部署|
#### 拓扑图及实现原理
![MHA拓扑图](index_files/MHA.png "MHA拓扑图")
##### 业务流程
**读操作**
1. LVS实现读操作的负载均衡;
2. Keepalived在上层管理LVS,并对两台从库进行健康检测(通过定义Check脚本);
3. 一台从库出现故障后,Keepalived将其剔除出负载均衡集群;
**写操作**
1. 在Master上绑定写VIP(MHA启动后会通过脚本进行操作);
2. MHA监控Master状态,当Master出现故障后(宕机、复制暂停)时;
3. 通过Failover脚本,卸载Master上的WVIP;
4. 通过Failover在CMaster上绑定WVIP,提升其为主库;
5. 同步并应用差异日志,并将从库指向新主库;
问题:当MHA把Master切换到了CMaster上后,LVS如何处理分发在CMaster上的读操作?
解释:由于Keepalived会通过脚本定期监控CMaster的状态,包括同步、SQL线程、I/O线程,所以当CMaster升级为主库后,这些状态都将消失,Keepalived将自动将CMaster剔除出负载均衡集群。
#### 部署MHA
MHA使用Perl编写,需要安装Perl依赖,建议使用阿里云的YUM源,YUM安装
Node节点需要perl-DBD-MySQL perl-DBI
Manger需要perl-Config-Tiny.noarch perl-Log-Dispatch.noarch perl-Parallel-ForkManager.noarch perl-DBD-MySQL perl-DBI
##### 部署约定
|序号|目录名称|目录位置|
| :------------: | :------------: | :------------: |
|1|上传文件目录|/tmp/|
|2|MHA配置文件目录|/app/mha|
|3|MHA日志文件目录|/app/mha/log|
|4|MHA管理目录|/app/mha/app1|
|5|MHA处理数据目录|/tmp|
|6|relay_log_purge脚本目录|/app/scripts|
|7|master_ip_failover脚本目录|/app/scripts|
##### 配置阿里云的YUM源
```shell
#clean OS default repo
mkdir /etc/yum.repos.d/old && mv /etc/yum.repos.d/C* /etc/yum.repos.d/old/
#add local repo
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
```
##### 安装Perl依赖
方法一 使用YUM 安装(推荐)
```shell
yum install -y perl-Config-Tiny.noarch perl-Log-Dispatch.noarch perl-Parallel-ForkManager.noarch perl-DBD-MySQL perl-DBI
```
方法二 使用cpanm安装。注意:时间必须整正确
编写脚本DBD_install.sh
```shell
#!/bin/bash
wget http://xrl.us/cpanm --no-check-certificate
mv cpanm /usr/bin/
chmod 755 /usr/bin/cpanm
cat >/root/list<这个账户和复制账户没有关系
```shell
grant all on *.* to 'mha'@'192.168.0.%' identified by 'mha'; flush privileges;
```
##### 在(MHA/M/S1/S2)服务器上配置主机名(MHA管理脚本通过主机名调用)
```shell
cat >>/etc/hosts<< EOF
192.168.0.230 ip230
192.168.0.235 ip235
192.168.0.236 ip236
192.168.0.237 ip237
EOF
```
##### 所有MySQL节点增加mysqlbinlog环境变量(解析binlog用)
```shell
echo "export PATH=$PATH:/usr/local/mysql/bin" >> /etc/bashrc ;source /etc/bashrc
```
> bashrc是每次执行bash脚本时加载的变量
##### 配置MHA和MySQL主从之间的主机免密钥登录
所有节点执行以下操作
- 部署SSH免密钥登录
- 所有节点可以SSH访问MHA和其他MySQL节点
- 所有节点可以SSH访问自己
```shell
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.230"
ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.235"
ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.236"
ssh-copy-id -i ~/.ssh/id_rsa.pub "root@192.168.0.237"
```
##### 在/app/mha下创建app1.cnf
>注意修改相关配置文件master_binlog_dir为MySQl的数据目录
```shell
[server default]
manager_log=/etc/mha/log/app1.log
manager_workdir=/etc/mha/app1/
master_binlog_dir=/data/mysql3306
master_ip_failover_script=/etc/mha/master_ip_failover
ping_interval=1
remote_workdir=/tmp
secondary_check_script=/bin/masterha_secondary_check -s ip235 -s ip236 --user=root --master_host=ip230 --master_ip=192.168.0.230 --master_port=3306
ssh_user=root
user=mha
password=mha
repl_password=repl
repl_user=repl
[server1]
hostname=192.168.0.230
port=3306
[server2]
candidate_master=1
check_repl_delay=0
hostname=192.168.0.235
port=3306
[server3]
hostname=192.168.0.236
port=3306
```
> 根据实际情况修改masterha_secondary_check位置,rpm安装在bin目录下
#### 上传并修改/etc/mha/master_ip_failover脚本
```shell
chmod 600 master_ip_failover
```
内容详见结尾附件
##### 调整参数(网卡名称和IP)
```shell
my $vip = '192.168.0.20/24'; # Virtual IP
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
```
#### 部署从库(S1/S2)的手动回收中继日志脚本
上传relay_purge.sh到/app/scripts下并增加执行权限
```shell
#!/bin/bash
/usr/bin/purge_relay_logs --user=root --password=123.com --socket=/tmp/mysql3306.sock --host=localhost -disable_relay_log_purge --port=3306 --workdir=/tmp >> /app/scripts/purge_relay_logs.log 2>&1
```
增加执行权限
```shell
chmod 600 /app/scripts/relay_purge.sh
```
部署定时任务
```shell
echo '0 4 * * * /bin/sh /app/scripts/relay_purge.sh' >/var/spool/cron/root
```
#### 切换测试
##### 验证SSH配置
```shell
masterha_check_ssh --conf=/etc/mha/app1.cnf
```
##### 测试复制、应用差异日志、故障转移脚本
```shell
masterha_check_repl --conf=/etc/mha/app1.cnf
```
##### 启动MHA
```shell
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover< /dev/null >/etc/mha/app1/manager.log 2>&1 &
#状态检查
sterha_check_status --conf=/etc/mha/app1.cnf
```
##### 关闭MHA
正常关闭
```shell
masterha_stop --conf=/etc/mha/app1.cnf
```
##### 关闭主库,并监控MHA日志
具体日志请查看MHA日志部分
这时,230(down) 235提升为主库,236指向了235
#### 恢复原有环境
1.启动230数据库,并查看position情况
```shell
show master status\G
```
2.将236重新指向230
```shell
stop slave;
change master to master_host='192.168.0.230',master_user='repl',master_password='repl',master_log_file='mysql-bin.000003',master_log_pos=154;
start slave;
```
3.将235重新指向230
```shell
change master to master_host='192.168.0.230',master_user='repl',master_password='repl',master_log_file='mysql-bin.000003',master_log_pos=154;
start slave;
```
4.恢复MHA配置文件,并删除锁文件
```shell
rm -f /etc/mha/app1/app1.failover.complete
```
> 不删除,不会切换,防止反复切换
5.卸载235上的写VIP
```shell
/sbin/ifconfig eth0:1 down
```
#### 附件
/etc/mha/master_ip_failover
```shell
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port,
);
my $vip = '192.168.0.20/24'; # Virtual IP
my $key = "0";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
#eval {
# print "Disabling the VIP on old master: $orig_master_host \n";
# &stop_vip();
# $exit_code = 0;
#};
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
#my $ping=`ping -c 1 10.0.0.13 | grep "packet loss" | awk -F',' '{print $3}' | awk '{print $1}'`;
#if ( $ping le "90.0%" && $ping gt "0.0%" ){
#$exit_code = 0;
#}
#else {
&stop_vip();
# updating global catalog, etc
$exit_code = 0;
#}
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_ip \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
# the end.
```
';