Maintenance – Policy and Procedures

Policy

All maintenance work, comprising system (OS) and framework updates, is deployed only during maintenance periods. This keeps update-related problems from surfacing at unexpected times, reduces our support burden, and gives us a consistent process in which any employee can quickly rule out maintenance work when troubleshooting.

Ringfree classifies updates into five tiers, distinguished by service interruption, security exposure, and urgency:

Tier 1
General updates with no service interruption

These updates are deployed on Friday nights during maintenance hours. They do not require restarting Asterisk, the host's vz daemons, or the SBCs. Each PBX system includes supporting software for various functions that does not necessarily affect Asterisk, so it can be updated without interrupting service.

Examples of Tier 1 updates:
OS updates to PHP
Framework updates to the CDR module
Global beancounter allocation changes

Tier 2
General updates with service interruption

These updates are deployed on Friday nights during maintenance hours. Due to the nature of the update, service interruption will occur. The length of the interruption should be estimated and communicated to customers in the Monday notification email preceding the Friday maintenance. The Ringfree PBX and node software is built on many software projects, some of which require daemon restarts across the board before an update is fully applied.

Examples of Tier 2 updates:
OS updates to glibc (many applications, including Asterisk, are dynamically linked against it at runtime)
OS updates to Ringfree kernel and/or OpenVZ userland utilities
OS updates to NFS (the shares must be shut down, the NFS updates applied to the SAN servers, the shares remounted, and the containers restarted for the update to take full effect)

Tier 3
Emergency updates with no service interruption

These updates are deployed outside the Friday night maintenance period when the support and/or engineering departments report an impact that cannot wait. They cause no service interruption, but a feature deployment or bug fix may need to go out before the weekly scheduled period.

Examples of Tier 3 updates:
OS update to the SQL service that adds a new feature needed to close a support and/or engineering ticket

Tier 4
Critical security update assuming service interruption

These updates are deployed in the event of an active exploit on any layer of the Ringfree infrastructure. Because of their nature, they cannot wait for the normal Friday night deployment hours: attackers often begin scanning for a remote exploit within hours of its CVE being published upstream. Any stopgap measures needed before the security update is deployed must be discussed by the infrastructure team. All security updates MUST be tested thoroughly in a Ringfree Node+PBX sandbox VM before being promoted to the updates repository. All daemons should restart cleanly, and there should be no loss of service after deployment. Customers with 24/7 service MUST be notified during business hours. Because we expose few services directly, Tier 4 updates should be relatively rare. However, when they do occur, they must be handled carefully and quickly.

Note that many updates fix exploits. Exploits generally fall into local and remote groups. Because we do not allow shell access to our infrastructure, local exploits can and should be deferred to a normal Tier 1 maintenance period. Remote exploits should be patched as quickly as possible.

Examples of Tier 4 updates:
Remote exploit found in a publicly exposed service (Asterisk, Apache, MySQL, haproxy, vsftpd, xinetd, Prosody)
Remote exploit found in Sansay devices

Tier S.H.T.F.
Critical security update assuming service interruption *after the vulnerability has been widely exploited in the wild*

The rarest of updates. Only once in Ringfree’s history have we had to deploy an update during business hours: the Bash vulnerability (Shellshock), which allowed remote command execution through crafted environment variables passed to Bash, reachable in our environment through PHP, which uses Bash to process various cgi-bin scripts. These updates are deployed after an all-hands-on-deck meeting to discuss severity. Unless the vulnerability is being widely and actively exploited, updates of this nature should be handled as Tier 4 updates.

Examples of Tier S.H.T.F. updates:
Heartbleed (OpenSSL) and Shellshock (Bash) remote exploits
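The tier rules above amount to a small decision table. The shell function below is a hypothetical sketch of that table, not an existing Ringfree tool; the argument values (`general`, `emergency`, `local-exploit`, `remote-exploit`, `exploited-in-wild`) are made up for illustration. Treating a local-exploit fix that needs restarts as Tier 2 is an assumption, since the policy only says local exploits wait for the normal maintenance period.

```shell
# Hypothetical helper (not an existing Ringfree tool): map the properties of a
# pending update to a tier from this policy.
#   $1 kind:       general | emergency | local-exploit | remote-exploit | exploited-in-wild
#   $2 interrupts: yes | no  (does applying it restart Asterisk, vz daemons or SBCs?)
classify_update() {
    local kind="$1" interrupts="$2"
    case "$kind" in
        exploited-in-wild)
            echo "S.H.T.F." ;;          # all-hands meeting before deploying
        remote-exploit)
            echo "4" ;;                 # patch as quickly as possible
        emergency)
            # Tier 3 is only defined for updates without service interruption.
            if [ "$interrupts" = "no" ]; then
                echo "3"
            else
                echo "emergency update with interruption: not covered, escalate" >&2
                return 1
            fi ;;
        general|local-exploit)
            # Local exploits wait for the normal Friday maintenance period.
            # Assumption: one that needs restarts is treated as Tier 2.
            if [ "$interrupts" = "yes" ]; then echo "2"; else echo "1"; fi ;;
        *)
            echo "unknown kind: $kind" >&2
            return 1 ;;
    esac
}
```

For example, `classify_update general yes` prints `2`, matching the glibc example under Tier 2.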

Notes
While scheduling updates in tiers might at first seem like needless overhead, it provides a consistent experience for customers and Ringfree staff alike. We take security updates seriously, but we should lean heavily towards weekly or monthly Tier 1/2 updates. A strict maintenance structure lets Ringfree staff brace for the impact of reduced head count during office hours, reduces scheduling conflicts, and gives our update deployments a steady rhythm. When customers see their service provider take updates and service-impact warnings seriously, they know their service is being actively maintained. We want our IT providers and direct customers to see this, and for it to contribute to the Ringfree brand and service being best of breed.

Procedure

OS Updates

All OS updates must go through quality assurance in a promotion process. Preliminary and testing package builds are placed in a staging repository and tested against a sandbox. In the future, live beta servers in the data centers would be more appropriate; for now, sandboxes in a local virtual machine will suffice.

For a package to be promoted from staging to updates (where it becomes available to all production systems), it must meet the following requirements:

  • No package dependency failures
  • No failures with current auto-configuration services
  • No regressions to Asterisk or Apache
  • Signed with our RPM release key

Packages can be promoted to updates at any time during the week once the infrastructure department has approved them. However, OS updates are only ever deployed during a maintenance period.
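Because OS updates only go out during a maintenance period, deployment scripts can refuse to run outside it. The sketch below assumes the window is Friday from 22:00 to midnight local time; the policy only says "Friday nights", so `MAINT_DAY` and `MAINT_START_HOUR` are hypothetical placeholders to adjust. It relies only on `date +%u` and `date +%H`.

```shell
# Hypothetical maintenance-window guard. The window (Friday, 22:00 to
# midnight local time) is an assumption; the policy only says "Friday nights".
MAINT_DAY=5          # ISO weekday as printed by `date +%u` (1 = Monday, 5 = Friday)
MAINT_START_HOUR=22  # first hour of the window on a 24-hour clock

# Succeeds (exit 0) inside the window. Optional arguments: weekday and hour,
# which default to the current local time.
in_maintenance_window() {
    local day="${1:-$(date +%u)}" hour="${2:-$(date +%H)}"
    hour="${hour#0}"   # "08" -> "8", so a leading zero never causes trouble
    [ "$day" -eq "$MAINT_DAY" ] && [ "$hour" -ge "$MAINT_START_HOUR" ]
}

# Example use at the top of a deployment script:
# in_maintenance_window || { echo "Outside the maintenance window; aborting." >&2; exit 1; }
```

For example, `in_maintenance_window 5 23` succeeds and `in_maintenance_window 4 23` (a Thursday) does not. Tier 3, Tier 4 and S.H.T.F. deployments are deliberate exceptions and would skip this check.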

OS Updates on Nodes (restart services where required):

yum -y update

OS Updates inside containers (restart services where required):

onallcts "yum -y update"

Node updates will usually be very quick. Container updates take a while to deploy, even for smaller payloads, because the YUM cache must be refreshed inside each container. In the future, we should find a way to share YUM caches between containers.
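The `onallcts` helper is not documented here. On an OpenVZ node, a helper like it plausibly loops over the running containers and runs the command in each one in turn, which is consistent with container updates being slow. The function below is a hypothetical sketch built on the standard `vzlist` and `vzctl` utilities; it is not the actual `onallcts` script, and it assumes `vzctl exec2` hands the command string to a shell inside the container, as the quoted `onallcts "yum -y update"` usage suggests.

```shell
# Hypothetical sketch of an onallcts-style helper on an OpenVZ node; the real
# onallcts script is not shown in this document and may differ.
onallcts_sketch() {
    local cmd="$1" ctid failed=0
    # -H omits the header line; -o ctid prints only container IDs.
    # Without -a, vzlist lists running containers only.
    for ctid in $(vzlist -H -o ctid); do
        echo "=== CT $ctid: $cmd"
        # exec2 (unlike exec) returns the exit status of the command itself.
        if ! vzctl exec2 "$ctid" "$cmd"; then
            echo "CT $ctid: command failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `onallcts_sketch "yum -y update"` prints a header line per container and returns non-zero if the update failed in any of them.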

OS updates web URL: http://mirror1.ringfree.biz/rel/6.3.1/updates/

Framework Updates

The FreePBX framework we use for PBX configuration uses modules packaged in tgz format and hosted on our framework repository. Framework updates should be tested thoroughly inside sandboxes before being deployed to production. Framework updates are only ever deployed during a maintenance period, unless the affected module is not currently installed. Framework updates must pass the following requirements:

  • No broken modules
  • No regressions
  • No broken call flows

Framework updates inside containers from node:

onallcts "/var/lib/asterisk/bin/module_admin upgradeall" > /tmp/framework.log 2>&1
onallcts "/var/lib/asterisk/bin/module_admin reload" >> /tmp/framework.log 2>&1
onallcts "/var/lib/asterisk/bin/retrieve_conf" >> /tmp/framework.log 2>&1
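The three framework commands above can also be wrapped so that a failure stops the run and error output lands in the same log. This is a sketch under one stated assumption: that `onallcts` returns a non-zero status when a command fails in any container, which this document does not confirm. The function name `framework_update` and the `FRAMEWORK_LOG` variable are hypothetical.

```shell
# Hypothetical wrapper around the three framework-update steps.
# Assumes onallcts returns non-zero if the command fails in any container;
# check the real helper before relying on that.
FRAMEWORK_LOG="${FRAMEWORK_LOG:-/tmp/framework.log}"

framework_update() {
    local step
    : > "$FRAMEWORK_LOG"   # start a fresh log for this run
    for step in \
        "/var/lib/asterisk/bin/module_admin upgradeall" \
        "/var/lib/asterisk/bin/module_admin reload" \
        "/var/lib/asterisk/bin/retrieve_conf"
    do
        echo "### $step" >> "$FRAMEWORK_LOG"
        # 2>&1 keeps error output in the log alongside normal output.
        if ! onallcts "$step" >> "$FRAMEWORK_LOG" 2>&1; then
            echo "Step failed: $step (see $FRAMEWORK_LOG)" >&2
            return 1   # stop before later steps run on a half-upgraded system
        fi
    done
}
```

Stopping at the first failed step keeps a broken `upgradeall` from being followed by a reload, which makes the log easier to read when checking for broken modules or call flows.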

Framework updates URL (2.10): http://freepbx.classiccitytelco.com/modules/release/2.10/