
[HA] PCS Configuration

twoDeveloper 2024. 7. 3. 01:37

[ Overview ]

* Configure a VIP with PCS to manage resources across multiple servers (ap01, ap02)

* When clickhouse-client queries arrive through the VIP, HAProxy load-balances them across the clickhouse-server instances

 

[ PCS Configuration ]

1. Install the PCS packages (ap01 / ap02)

apt install -y corosync pcs pacemaker

 

2. Start the PCS daemon (ap01 / ap02)

sudo systemctl start pcsd

 

3. Set the password for hacluster, the default PCS daemon account (ap01 / ap02)

sudo passwd hacluster

 

4. Authenticate the cluster account (ap01)

pcs host auth ap01.test.com ap02.test.com -u hacluster
Password: 
ap01.test.com: Authorized
ap02.test.com: Authorized

 

5. cluster setup

pcs cluster setup clickhouse_cluster ap01.test.com ap02.test.com --force
---
If you see "Warning: Unable to read the known-hosts file: No such file or directory: '/var/lib/pcsd/known-hosts'",
some nodes are still being removed, so run the following and then retry the setup:
pcs cluster destroy
---

 

6. pcs start

pcs cluster start --all

 

7. Check the PCS status

sudo pcs status
---
Cluster name: clickhouse_cluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: ap01.test.com (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Tue Jul  2 16:37:13 2024
  * Last change:  Tue Jul  2 16:36:33 2024 by hacluster via crmd on ap01.test.com
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ ap01.test.com ap02.test.com ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
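
The status above carries two hints worth acting on in a lab setup: the stonith warning, and corosync/pacemaker shown as active/disabled (running now, but not started at boot). The later status outputs in this post no longer show the warning, which suggests fencing was switched off at some point, although that step is not recorded here. A minimal sketch, assuming a test cluster with no fencing device; `run` and `lab_cluster_defaults` are hypothetical helper names, and disabling STONITH is generally not something to carry over to production:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumption: lab cluster without a fencing device.
lab_cluster_defaults() {
  # Turn off STONITH so the warning goes away and resources can start
  # without a fence device.
  run sudo pcs property set stonith-enabled=false
  # Start corosync and pacemaker at boot on every node.
  run sudo pcs cluster enable --all
}
```

Calling `DRY_RUN=1 lab_cluster_defaults` only prints the two commands; calling it without DRY_RUN runs them.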

 

[ VIP Setup ]

1. Create the VIP

* ap01.test.com [192.168.56.62]

* ap02.test.com [192.168.56.63]

sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.56.100 cidr_netmask=24 nic=eth0 op monitor interval=30s

 

2. Confirm the new VIP in the PCS status

sudo pcs status
---
Cluster name: clickhouse_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ap01.test.com (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Thu Jul  4 15:27:01 2024
  * Last change:  Thu Jul  4 15:26:54 2024 by root via cibadmin on ap01.test.com
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ap01.test.com ap02.test.com ]

Full List of Resources:
  * VirtualIP   (ocf::heartbeat:IPaddr2):        Started ap01.test.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

* The VIP resource is running on ap01.test.com
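
Reading the owning node off the status by eye works, but for the failover test it can be handy to extract it. A small sketch, assuming the resource line keeps the layout shown above (`* <name> (<agent>): Started <node>`); `resource_node` is a hypothetical helper name:

```shell
# Print the node on which the given resource is reported as Started.
# Reads `pcs status` text on stdin; returns non-zero if no match is found.
resource_node() {
  awk -v name="$1" '
    $1 == "*" && $2 == name && $(NF-1) == "Started" { print $NF; found = 1; exit }
    END { exit !found }
  '
}
```

Usage: `sudo pcs status | resource_node VirtualIP`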

 

2-1) How to delete or update a VIP that was created incorrectly

sudo pcs resource restart VirtualIP

* Fix it by restarting the resource

sudo pcs resource delete VirtualIP

* Delete the resource, then recreate it
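
The heading mentions updating, but only restart and delete are shown. `pcs resource update` changes the parameters of an existing resource in place. A sketch, assuming the resource is named VirtualIP as above; `run` and `update_vip` are hypothetical helpers, and the address in the usage line is only an example:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Change the ip parameter of the existing VirtualIP resource in place.
update_vip() {
  run sudo pcs resource update VirtualIP ip="$1"
}
```

Usage: `update_vip 192.168.56.101` (prefix with `DRY_RUN=1` to preview the command first).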

 

3. Check the IP addresses

ip a
---
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:77:03:29 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 75732sec preferred_lft 75732sec
    inet 192.168.56.100/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe77:329/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:6a:eb:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.62/24 brd 192.168.56.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe6a:ebc2/64 scope link 
       valid_lft forever preferred_lft forever

* The VIP (192.168.56.100) has been added on eth0
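
The same check can be scripted, which helps when repeating it on both nodes during the failover test. A sketch that relies on the one-line-per-address format of `ip -o -4 addr show`; `vip_nic` is a hypothetical helper name:

```shell
# Print the interface that carries the given IPv4 address.
# Reads `ip -o -4 addr show` output on stdin; returns non-zero if absent.
vip_nic() {
  awk -v vip="$1" '
    $3 == "inet" {
      split($4, a, "/")          # strip the /prefix length
      if (a[1] == vip) { print $2; found = 1; exit }
    }
    END { exit !found }
  '
}
```

Usage: `ip -o -4 addr show | vip_nic 192.168.56.100` prints the interface name on the node that holds the VIP; on the other node it prints nothing and returns non-zero.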

 

[ FailOver Test ]

1. Deliberately bring down ap01.test.com, the node that holds the resource

sudo pcs cluster stop ap01.test.com
sudo pcs status
---
Error: error running crm_mon, is pacemaker running?
  Error: cluster is not available on this node

ip a
---
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:77:03:29 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 73872sec preferred_lft 73872sec
    inet6 fe80::a00:27ff:fe77:329/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:6a:eb:c2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.62/24 brd 192.168.56.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe6a:ebc2/64 scope link 
       valid_lft forever preferred_lft forever

* The VIP (192.168.56.100) is gone from eth0, where it had been configured

 

2. Check the resource status on ap02.test.com

sudo pcs status
---
Cluster name: clickhouse_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ap02.test.com (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Thu Jul  4 15:59:44 2024
  * Last change:  Thu Jul  4 15:27:12 2024 by root via cibadmin on ap01.test.com
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ap02.test.com ]
  * OFFLINE: [ ap01.test.com ]

Full List of Resources:
  * VirtualIP   (ocf::heartbeat:IPaddr2):        Started ap02.test.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

* The VIP resource is now assigned to ap02.test.com

ip a
---
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:77:03:29 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
       valid_lft 73674sec preferred_lft 73674sec
    inet 192.168.56.100/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe77:329/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:db:21:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.63/24 brd 192.168.56.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fedb:2111/64 scope link 
       valid_lft forever preferred_lft forever

* The address listing shows the VIP (192.168.56.100) assigned on eth0

ssh vagrant@192.168.56.100 
The authenticity of host '192.168.56.100 (192.168.56.100)' can't be established.
ED25519 key fingerprint is SHA256:vNxcOXmhwLLQIsyN2ZLOOejwhwL4azPKjtxYkjCCxco.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:4: 192.168.56.60
    ~/.ssh/known_hosts:6: 192.168.56.61
    ~/.ssh/known_hosts:7: 192.168.56.62
    ~/.ssh/known_hosts:8: 192.168.56.63
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.56.100' (ED25519) to the list of known hosts.
(vagrant@192.168.56.100) Password: 
Welcome to Ubuntu 20.04.6 LTS (GNU/Linux 5.4.0-176-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information disabled due to load higher than 1.0


This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Thu Jul  4 15:59:37 2024 from 192.168.56.1
vagrant@ap02:~$

* Connecting to the VIP (192.168.56.100) lands on ap02.test.com

 

3. Start the cluster on ap01.test.com again

sudo pcs cluster start ap01.test.com
sudo pcs status
---
Cluster name: clickhouse_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: ap02.test.com (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Thu Jul  4 16:07:14 2024
  * Last change:  Thu Jul  4 15:27:12 2024 by root via cibadmin on ap01.test.com
  * 2 nodes configured
  * 1 resource instance configured

Node List:
  * Online: [ ap01.test.com ap02.test.com ]

Full List of Resources:
  * VirtualIP   (ocf::heartbeat:IPaddr2):        Started ap02.test.com

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

* Both nodes are back online, and the VIP resource stays on ap02.test.com
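
Pacemaker does not move the VIP back on its own here, since nothing in this configuration expresses a preference for ap01. If the VIP should return to ap01, one option is to move it by hand and then clear the constraint that the move leaves behind. A sketch, assuming the pcs 0.10-era behavior in which `pcs resource move` works by adding a location constraint; `run`, `move_vip_to` and `clear_vip_move` are hypothetical helper names:

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ask the cluster to run VirtualIP on the given node.
move_vip_to() {
  run sudo pcs resource move VirtualIP "$1"
}

# Remove the location constraint created by the move, so the cluster is
# free to place the resource elsewhere on the next failure.
clear_vip_move() {
  run sudo pcs resource clear VirtualIP
}
```

Usage: `move_vip_to ap01.test.com`, confirm with `sudo pcs status` that the VIP is Started on ap01.test.com, then `clear_vip_move`.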