New HA Filesystem

Overview of new storage system

 
The goal of GlusterFS in the Ringfree Infrastructure is to provide geo-redundant PBX service.  For this to occur, portions of the PBX container filesystems will be replicated across all datacenters, while other portions need to stay site-specific (such as log files).
A previous project still in development, RORW, gives Ringfree granular control over the mounting process for an OpenVZ container using disparate sources for the mount data.  The GlusterFS deployment will make heavy use of these tools.
Old PBX container storage lived solely in a DC-reachable NFS mount, with /rf-images/pbx holding the container filesystems, /rf-images/pbxroot the container mount points, and /rf-images/vzconf the container configurations.  These were mounted on host nodes as /vz/nfsprivate, /vz/nfsroot, and /etc/vz/conf respectively.
(On NFS server) /rf-images/pbx     -> (On PBX Node) /vz/nfsprivate
(On NFS server) /rf-images/pbxroot -> (On PBX Node) /vz/nfsroot
(On NFS server) /rf-images/vzconf  -> (On PBX Node) /etc/vz/conf
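For reference, the legacy layout corresponds roughly to NFS fstab entries of this shape on each PBX node (a sketch only; the server hostname is a placeholder, not taken from production configs):
$nfsserver:/rf-images/pbx      /vz/nfsprivate  nfs  defaults,_netdev  0 0
$nfsserver:/rf-images/pbxroot  /vz/nfsroot     nfs  defaults,_netdev  0 0
$nfsserver:/rf-images/vzconf   /etc/vz/conf    nfs  defaults,_netdev  0 0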
The new clustered storage will differ drastically, and the RORW mount hooks will handle building the container mount before the container starts.
(On GlusterFS server, Fuse) /rf-images/nimbus     -> (On PBX Node) /vz/nimbus
(On GlusterFS server,  NFS) /rf-images/dc-atl     -> (On PBX Node) /vz/dc-atl
(On GlusterFS server,  NFS) /rf-images/dc-dal     -> (On PBX Node) /vz/dc-dal
(On GlusterFS server,  NFS) /rf-images/containers -> (On PBX Node) /vz/containers
 
Container configurations will need to be altered during the transition from the old NFS storage to the new GlusterFS storage.  The mounting process is outlined here:
 
1) The private filesystem for the container is found in /vz/containers/$CTID and mounted on /vz/root/$CTID
2) The site-specific filesystem is selected via a node environment variable and mounted on top of it:
 
If DAL detected
    Mount /vz/dc-dal/$CTID /vz/root/$CTID
If ATL detected
    Mount /vz/dc-atl/$CTID /vz/root/$CTID
 
3) Nimbus storage is then mounted from a GlusterFS fuse mount (all others are GlusterFS-NFS for increased speed)
 
Mount /vz/nimbus/$CTID /vz/root/$CTID
 
4) At this point, a complete filesystem is mounted and the container can be started.  File writes are replicated whether they go through an NFS or a Fuse mount, but if a primary GlusterFS-NFS target fails, every node using it for storage will need to unmount, remount against the alternate target, and restart its containers (see the recovery sketch below).
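A hedged sketch of that recovery procedure on an affected node; the alternate server variable, the mount options, and the use of vzctl restart are assumptions for illustration, and in practice the steps would be driven by Ringfree tooling rather than typed by hand.
# umount -f -l /vz/containers
# mount -t nfs -o vers=3,nolock $altbrickserver:/containers /vz/containers
# vzctl restart $CTID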
 
Note:  $CTID.mount and $CTID.umount scripts are added during the transition phase to perform the additional mounts needed to build a complete container filesystem from the disparate sources.
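For reference, a minimal sketch of what such a $CTID.mount script could look like.  It assumes the standard OpenVZ convention of sourcing the container config via $VE_CONFFILE, plain bind mounts for each stage, and a node-level $RINGFREE_DC variable for datacenter detection; the mount targets follow the CONTAINER LOC columns in the detailed description below.  None of these specifics are confirmed by this document.
#!/bin/bash
# Sketch of /etc/vz/conf/$CTID.mount -- assembles the container filesystem
# from the storage tiers before the container starts.
# ASSUMPTIONS: $RINGFREE_DC ("atl" or "dal") and the bind mounts are placeholders.
set -e
. /etc/vz/vz.conf
. "$VE_CONFFILE"

CTID="$VEID"
ROOT="/vz/root/$CTID"

# Stage 1 (private FS -> container root) is assumed to be handled by vzctl
# itself via VE_PRIVATE/VE_ROOT, so it is not repeated here.

# Stage 2: site-specific DC storage, selected by the node's datacenter
case "$RINGFREE_DC" in
    atl) mount --bind "/vz/dc-atl/$CTID" "$ROOT/ringfree-mnt/dc" ;;
    dal) mount --bind "/vz/dc-dal/$CTID" "$ROOT/ringfree-mnt/dc" ;;
esac

# Stage 3: globally replicated Nimbus storage (GlusterFS Fuse)
mount --bind "/vz/nimbus/$CTID" "$ROOT/ringfree-mnt/nimbus"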
 

Server Configuration Notes

 
Because Atlanta is migrating from vanilla NFS to a mixture of GlusterFS-NFS and GlusterFS Fuse, we run into a small problem: the primary NFS mount cannot be live-transitioned while GlusterFS is being set up in production.  This is fine, because we will initialize GlusterFS on the secondary NFS mount; once a container starts successfully from it on one of the nodes, we can stop vanilla NFS on the primary and begin importing the cluster filesystem from existing storage.
 
(SERVER) Installation
# yum install glusterfs-server
# chkconfig --level 235 glusterd on
# service glusterd start
(SERVER) Create Bricks for volume
# mkdir -p /rf-images/nimbus
# mkdir -p /rf-images/dc-atl
# mkdir -p /rf-images/dc-dal
# mkdir -p /rf-images/containers 
Note: there should NEVER be any direct writes to these brick folders, as direct writes will corrupt the GlusterFS volume.
(SERVER) Create local Volume mounts
# mkdir -p /rf-images/nimbus-localmount
# mkdir -p /rf-images/dc-atl-localmount
# mkdir -p /rf-images/dc-dal-localmount
# mkdir -p /rf-images/containers-localmount
(SERVER) Make Brick nodes peer with each other
# gluster peer probe $otherbricknodes
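Peer membership can then be verified with the standard status command before creating volumes:
# gluster peer status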
(SERVER) Create Volume from bricks
# gluster volume create nimbus replica $noofpeers transport tcp $peer1hostname:/rf-images/nimbus $peer2hostname:/rf-images/nimbus force
 
# gluster volume create dc-atl replica $noofpeers transport tcp $peer1hostname:/rf-images/dc-atl $peer2hostname:/rf-images/dc-atl force
 
# gluster volume create dc-dal replica $noofpeers transport tcp $peer1hostname:/rf-images/dc-dal $peer2hostname:/rf-images/dc-dal force
 
# gluster volume create containers replica $noofpeers transport tcp $peer1hostname:/rf-images/containers $peer2hostname:/rf-images/containers force
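The notes above stop at volume creation; newly created Gluster volumes generally have to be started before clients can mount them, so a follow-up along these lines is assumed:
# gluster volume start nimbus
# gluster volume start dc-atl
# gluster volume start dc-dal
# gluster volume start containers
# gluster volume info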
 

Client Configuration Notes

 
Note:  Client (node) configuration will be handled 99% of the time by the ringfree-cloudstor utility; however, the raw commands are listed here for documentation purposes.  This information is also shown when running ringfree-cloudstor --listrawcmds
 
(CLIENT) mount GlusterFS volumes via Fuse #TODO
 
# mount -t glusterfs -o backupvolfile-server=$altbrickserver,use-readdirp=no,log-level=WARNING $brickserver:nimbus /rf-images/nimbus-localmount
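To make the Fuse mount persistent across reboots, an /etc/fstab entry of roughly this shape could be used (a sketch built from the same placeholder variables; not taken from the production configs):
$brickserver:/nimbus  /rf-images/nimbus-localmount  glusterfs  defaults,_netdev,backupvolfile-server=$altbrickserver,use-readdirp=no,log-level=WARNING  0 0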
 
(CLIENT) mount GlusterFS volumes via GlusterFS-NFS. #TODO
# mount -t nfs
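The GlusterFS-NFS command above is still marked TODO; a minimal sketch, assuming the built-in Gluster NFS server (which speaks NFSv3 only) and the same volume/mount-point pattern as the Fuse example, would look roughly like:
# mount -t nfs -o vers=3,nolock $brickserver:/containers /rf-images/containers-localmount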

Detailed description of new storage:


(Stage 1 Mount) CONTAINER STORAGE
 
TYPE     STORAGE LOC                NODE LOC             CONTAINER LOC 
NFS      /rf-images/containers-atl  /vz/containers       /
NFS      /rf-images/containers-dal  /vz/containers       /
 
This data is present only in a single datacenter and does not need to sync site to site.  It provides a single target to eventually move to RORW storage in a future update.  When provisioning a container for high availability, the container must be provisioned in each datacenter.
 
/rf-images/containers-atl/CTID/
                              /                # Root Container FS

 


 
(Stage 2 Mount) DC STORAGE
 
TYPE      STORAGE LOC         NODE LOC    CONTAINER LOC
NFS       /rf-images/dc-atl   /vz/dc-atl  /ringfree-mnt/dc
NFS       /rf-images/dc-dal   /vz/dc-dal  /ringfree-mnt/dc
 
This data is present only in a single datacenter, does not need to sync, and holds site-specific storage.  When provisioning a container for high availability, the container must be provisioned in each datacenter.  This storage, while similar to generic container storage and offering the same features and speed, is segregated in order to provide HA features.
 
/rf-images/dc-atl/CTID/
                      /dev                     # POSIX Device FS
                      /lib-udev-devices        # UDEV
                      /lib-udev-devices-pseudo # REQUIRED FOR DAHDI
                      /proc                    # LINUX PROC FS
                      /srv                     # LINUX srv dir
                      /sys                     # LINUX sys dir
                      /tmp                     # tmp 777
                      /var-lib-php-session     # PHP session authentications
                      /var-local               # used by apps
                      /var-lock                # used by apps
                      /var-log                 # Application Logs
                      /var-run                 # Service run files
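The flattened directory names above suggest per-path mounts back into the container tree (for example var-log backing /var/log).  A purely illustrative bind-mount mapping, not confirmed by this document and in practice handled by the RORW tooling, might look like:
mount --bind /vz/root/$CTID/ringfree-mnt/dc/var-log /vz/root/$CTID/var/log
mount --bind /vz/root/$CTID/ringfree-mnt/dc/tmp     /vz/root/$CTID/tmp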

 
(Stage 3 Mount) NIMBUS 
TYPE               STORAGE LOC        NODE LOC    CONTAINER LOC
GlusterFS (Fuse)   /rf-images/nimbus  /vz/nimbus  /ringfree-mnt/nimbus
 
This data is ALWAYS in sync across datacenters.  Nimbus storage only needs to be provisioned once, regardless of whether the container is highly available.
 
/rf-images/nimbus/CTID/
                      /etc                    # PBX Configuration
                      /opt                    # For Custom Apps
                      /var-lib-asterisk       # ASTDB, sounds, framework scripts
                      /var-spool-asterisk     # Recordings, Voicemail
                      /var-www                # framework web modules