Cluster | hmichl

With the basic installation of the cluster and Ceph finished, it’s time to set up the cluster storage.
First I will recap what has been done so far and what the disks look like, then explain how I plan to use them:

We have installed the Ceph software and set up monitors on all of our nodes:

pveceph install -version hammer
pveceph init --network 10.10.67.0/24 # only once
pveceph createmon

The actual disk layout is:

node01, node02:
– 1x 370GB HW-RAID0 SSD (osd.0, osd.3)
– 2x 1TB SATA (osd.1, osd.2, osd.4, osd.5)

node03:
– 2x 2TB HW-RAID0 SATA (osd.6, osd.7)
– 1x 1TB HW-RAID1 (SW-RAID
– 1x 870GB HW-RAID1

Within Ceph we want 3 copies of the data, one on each node. We will be using the SSDs as a writeback cache pool. Ideally a VM should use (only) the cache pool that is “local” to it, because the main bottleneck is network bandwidth (only 1GBit). Additionally the SSDs will hold our (external) journals.
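
The commands for the cache tier itself are not part of these notes. A minimal sketch of how a writeback tier is usually attached to an existing backing pool follows. The cache pool name `cache`, the PG count of 128 and the dry-run wrapper are assumptions for illustration, and the CRUSH rule that would pin the cache pool to the SSD OSDs is left out entirely. It only makes sense once the OSDs and the backing pool exist.

```shell
#!/bin/bash
# Dry-run by default: commands are only printed. Set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_cache_tier BACKING_POOL CACHE_POOL PG_NUM
# BACKING_POOL must already exist; CACHE_POOL is created here (name is an assumption).
setup_cache_tier() {
    local backing=$1 cache=$2 pgs=$3
    run ceph osd pool set "$backing" size 3             # 3 copies of the data
    run ceph osd pool create "$cache" "$pgs" "$pgs"     # pool that will act as cache
    run ceph osd tier add "$backing" "$cache"           # attach cache to backing pool
    run ceph osd tier cache-mode "$cache" writeback     # writeback mode, as planned above
    run ceph osd tier set-overlay "$backing" "$cache"   # route client I/O through the cache
    run ceph osd pool set "$cache" hit_set_type bloom   # hit set tracking for the tier
}

setup_cache_tier rbd cache 128
```

With the default DRY_RUN=1 this only prints the six commands, so the list can be reviewed before anything touches the cluster.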

First create 2 OSDs on the SATA drives with external journals on the SSD. This can easily be accomplished from the web GUI.
Then we need to create the OSD on the remaining space of the SSD, which is not as easy and needs to be done from the command line:

DEVICE=/dev/sdb # the SSD HW-RAID0
PARTITION=3 # the next unused partition
OSD_UUID=$(uuidgen -r) # a unique UUID for the OSD
PTYPE_UUID=4fbd7e29-9d25-41b8-afd0-062c0ceff05d # the default PTYPE UUID ceph uses (from the source)
FSID=345abc67-de89-f012-345a-bc67de89f012 # taken from /etc/ceph/ceph.conf

sgdisk --largest-new=$PARTITION --change-name="$PARTITION:ceph data" --partition-guid=$PARTITION:$OSD_UUID --typecode=$PARTITION:$PTYPE_UUID -- $DEVICE
partprobe # to read new partition table
gdisk -l $DEVICE # verify the rest of the space on the device got allocated to a ceph data partition

ceph-disk prepare --cluster ceph --cluster-uuid $FSID $DEVICE$PARTITION
ceph-disk activate $DEVICE$PARTITION
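
Two of the values above are easy to get wrong when typed by hand: the FSID and the partition path. The sketch below reads the `fsid` line from a ceph.conf-style file and builds the partition path, adding the `p` separator that device names ending in a digit (NVMe devices, for example) need. The function names `read_fsid` and `part_path` are made up for this example.

```shell
#!/bin/bash
# read_fsid FILE: print the value of the "fsid = ..." line of a ceph.conf-style file
read_fsid() {
    awk -F'=' '$1 ~ /^[ \t]*fsid[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

# part_path DEVICE PARTITION
#   /dev/sdb     + 3 -> /dev/sdb3
#   /dev/nvme0n1 + 3 -> /dev/nvme0n1p3  (device name ends in a digit)
part_path() {
    case $1 in
        *[0-9]) echo "${1}p${2}" ;;
        *)      echo "${1}${2}" ;;
    esac
}

# Possible use with the variables from above:
#   FSID=$(read_fsid /etc/ceph/ceph.conf)
#   ceph-disk prepare --cluster ceph --cluster-uuid "$FSID" "$(part_path "$DEVICE" "$PARTITION")"
```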

After creating a pool named rbd, the keyring needs to be copied:

cd /etc/pve/priv/
mkdir ceph
cp /etc/ceph/ceph.client.admin.keyring ceph/rbd.keyring
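
Proxmox looks up the keyring of an RBD storage under /etc/pve/priv/ceph/ by storage ID, so the file name rbd.keyring has to match the ID of the storage entry (assumed here to be `rbd`, like the pool). The same copy as a small repeatable function; this is a sketch, and the name `install_keyring` and its parameters are invented for the example:

```shell
#!/bin/bash
# install_keyring SOURCE_KEYRING PRIV_DIR STORAGE_ID
# Copies the keyring to PRIV_DIR/ceph/STORAGE_ID.keyring, creating the directory if needed.
install_keyring() {
    local src=$1 priv=$2 id=$3
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    mkdir -p "$priv/ceph"
    cp "$src" "$priv/ceph/$id.keyring"
}

# On a node this would be called as:
#   install_keyring /etc/ceph/ceph.client.admin.keyring /etc/pve/priv rbd
```

Unlike the plain `mkdir ceph`, `mkdir -p` does not fail when the directory already exists, so the step can be repeated safely.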

I’ve got 3 physical nodes, two HP ProLiant 320e Gen8v2 and an HP MicroServer Gen8, each with two Gbit NICs and an iLO interface.
One of the NICs is connected to the external network. The other one and the iLO are connected to an internal switch, which uses a NAT gateway to reach the external net (this is necessary for installing updates, syncing time and accessing the management interface from outside at all).

Network configuration (/etc/network/interfaces) looks like this on all nodes:

auto lo
iface lo inet loopback

# the internal interface, used for storage traffic,
# cluster traffic and internal inter-VM communication too.
# this should really be split into VLANs and use a bond
# connected to *at least* two different physical switches;
# an extra (10G) interface for storage would be a plus
auto eth0
iface eth0 inet manual

# bridge to allow VMs to be connected to the physical interface
auto vmbr0
iface vmbr0 inet static
        address 10.10.67.10
        netmask 255.255.255.0
        gateway 10.10.67.1
        bridge_ports eth0
        bridge_stp on
        bridge_fd 0

# the external interface connected to the outer world
# this has no IP here, as the host is not accessible from
# the outside directly, but needs access itself, see gateway above
auto eth1
iface eth1 inet manual

# bridge to allow VMs to be connected to the physical interface
auto vmbr1
iface vmbr1 inet manual
        bridge_ports eth1
        bridge_stp on
        bridge_fd 0
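
Only the `address` line of vmbr0 can differ between nodes. Going by the /etc/hosts entries used for this cluster (node01 = 10.10.67.10, node02 = 10.10.67.20, node03 = 10.10.67.30), the last octet is the node number times ten. The helper below encodes that pattern; the function name `node_address` and the extrapolation beyond three nodes are assumptions.

```shell
#!/bin/bash
# node_address HOSTNAME: node01 -> 10.10.67.10, node02 -> 10.10.67.20, ...
node_address() {
    local num=${1#node}          # strip the "node" prefix, e.g. "02"
    case $num in
        ''|*[!0-9]*)
            echo "unexpected hostname: $1" >&2
            return 1
            ;;
    esac
    # 10# forces base 10, otherwise "08" would be rejected as invalid octal
    echo "10.10.67.$(( 10#$num * 10 ))"
}

node_address node02   # prints 10.10.67.20
```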

Each of the ProLiant servers has two 250GB SSDs and two 1TB SATA drives, configured as follows:

Array 1: 2x250GB SSD
– Volume 1: 2x65GB RAID1 (root filesystem and swap)
– Volume 2: 2x185GB RAID0 (Ceph cache OSD)
Array 2: 1x1TB SATA
– Volume 3: 1x1TB RAID0 (Ceph storage OSD)
Array 3: 1x1TB SATA
– Volume 4: 1x1TB RAID0 (Ceph storage OSD)

The MicroServer has four 2TB SATA drives, configured in a similar fashion:

Array 1: 2x2TB SATA
– Volume 1: 2x130GB RAID1 (root filesystem and swap)
– Volume 2: 2x1TB RAID0 (Ceph storage OSD)
– Volume 3: 2x870GB RAID1 (Backup storage, softRAID0, NFS export)
Array 2: 2x2TB SATA
– Volume 4: 2x1TB RAID0 (Ceph storage OSD)
– Volume 5: 2x1TB RAID1 (Backup storage, softRAID0, NFS export)
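
As a quick cross-check of these numbers: a RAID0 volume adds up its members, so the SSD volume on the ProLiants is 2 x 185GB = 370GB and each Ceph volume on the MicroServer is 2 x 1TB = 2TB. Assuming three replicas with exactly one copy per node, the usable capacity of the SATA OSDs is bounded by the smallest node. The function names are illustrative and all sizes are in GB.

```shell
#!/bin/bash
# raid0_size COUNT SIZE_GB: capacity of a RAID0 volume built from COUNT slices of SIZE_GB each
raid0_size() {
    echo $(( $1 * $2 ))
}

# usable_capacity NODE_GB...: with one replica per node, the smallest node sets the limit
usable_capacity() {
    local min=$1 n
    for n in "$@"; do
        if [ "$n" -lt "$min" ]; then
            min=$n
        fi
    done
    echo "$min"
}

ssd_osd=$(raid0_size 2 185)                 # ProLiant SSD volume
proliant_sata=$(( 2 * 1000 ))               # two 1TB OSDs per ProLiant
micro_sata=$(( 2 * $(raid0_size 2 1000) ))  # two 2TB RAID0 OSDs on the MicroServer

echo "SSD OSD: ${ssd_osd}GB"
echo "usable SATA capacity: $(usable_capacity "$proliant_sata" "$proliant_sata" "$micro_sata")GB"
```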

Now install the three nodes. I chose 32GB root and 32GB swap, then entered the corresponding local IP address and gateway as shown (the network configuration gets overwritten with the file above anyway), the local hostname, password and admin email address, and that’s it!

root@nodeXX:~#
aptitude update
aptitude dist-upgrade
aptitude install mc openntpd ipmitool
pveceph install -version hammer
mcedit /etc/network/interfaces
# copy&paste action here
reboot

After this has been done on all nodes, we create the cluster and the initial Ceph configuration.
On the first node do:

root@node01:~#
pvecm create cluster1
pveceph init --network 10.10.67.0/24
pveceph createmon

On all the other nodes:

root@nodeXX:~#
pvecm add 10.10.67.10 # this is the IP of the first node
pveceph createmon

Check the status of the cluster and of Ceph. Also check on every node that /etc/hosts contains the names of all nodes; if not, add them now:

pvecm status # show all nodes
ceph status # show all monitors
cat /etc/hosts # should show all the nodes names, FQDN too

10.10.67.10 node01.local node01 pvelocalhost
10.10.67.20 node02.local node02
10.10.67.30 node03.local node03
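
The /etc/hosts check is easy to script so it can be repeated on every node. A sketch, with the hypothetical helper `missing_hosts` printing every name that has no entry in the given file:

```shell
#!/bin/bash
# missing_hosts FILE NAME...: print each NAME that does not appear as a host name in FILE
missing_hosts() {
    local file=$1 name
    shift
    for name in "$@"; do
        # skip comment lines; compare against every field after the IP address
        if ! awk -v n="$name" '
                /^[ \t]*#/ { next }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$file"; then
            echo "$name"
        fi
    done
}

# On a node:
#   missing_hosts /etc/hosts node01 node02 node03 node01.local node02.local node03.local
```

No output means every name was found; any name that is printed still needs a line in /etc/hosts on that node.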


Setting up 3 nodes HA cluster using CEPH storage for KVM (and LXC) – Part 2

Setting up 3 nodes HA cluster using CEPH storage for KVM (and LXC) – Part 1
