GlusterFS, a next-generation distributed network file system with built-in redundancy and replication features, was released in 2007, but only came into wide use in the last few years. Although we already had a highly available, highly resilient system for storing the hard disk images of our virtual machines (VMs), we knew that Gluster would offer generational improvements in performance, expandability, and flexibility.
We recognized that hosting VM images is a particularly demanding workload: it cannot afford any unexpected downtime or data loss, and it requires consistent performance. Our previous NetApp SAN (storage area network) was approximately seven years old, beyond its manufacturer’s end-of-life period, very expensive to maintain, and lagging behind in performance. Unfortunately, new SANs are (and always have been) very expensive devices, and we had no budget for one.
IT Freedom started using GlusterFS in June 2014. The following is a case study of GlusterFS based on our own experience:
The Gluster software distribution is open source and includes both the client and server for the file system.
Packaged versions of the Gluster client are included in both Red Hat Enterprise Linux and CentOS releases. Packaged versions of the Gluster server are also available for both: with Red Hat, the server is part of a paid add-on called Red Hat Storage Server, and with CentOS, the server is not part of the official repository but is available as a third-party add-on. (There is currently no Windows client for the native Gluster protocol, but there are recommended ways of accessing Gluster volumes from Windows, such as via the built-in NFS server.)
The Gluster server is a service that runs on a regular Linux server. In a typical deployment, there are multiple Gluster servers on a network, and any given file (such as a VM image) is automatically replicated at the block level on at least two of the servers. When a Gluster client, such as a VM host server, mounts a Gluster file system, it first connects to a single Gluster server. From that server, the client learns the topology of the Gluster volume and makes persistent connections to all servers housing the data it needs. The client then automatically load-balances reads across all of the Gluster servers that hold that data, and it sends each write to every server that holds a copy.
This configuration allows any one of the Gluster servers to fail or be shut down without causing downtime for the clients accessing the data. In addition, when a server that has been down comes back online, its data is automatically and efficiently resynced at the block level from the other servers.
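The behavior described above (writes sent to every replica, reads balanced across replicas, and a resync of only the missed blocks after an outage) can be illustrated with a small toy model. This is a sketch for intuition only, not Gluster's actual implementation: the `Brick` and `ReplicaClient` classes, the server names, and the round-robin read policy are all assumptions made for the example, and the list of missed blocks is kept in the client here purely for brevity.

```python
import itertools


class Brick:
    """Toy stand-in for one Gluster server's copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}  # block index -> bytes


class ReplicaClient:
    """Toy client: writes go to all replicas, reads rotate between them."""

    def __init__(self, bricks):
        self.bricks = bricks
        self._next = itertools.cycle(range(len(bricks)))
        # Blocks each brick missed while it was offline (simplified bookkeeping).
        self.dirty = {brick.name: set() for brick in bricks}

    def write(self, index, data):
        if not any(brick.online for brick in self.bricks):
            raise IOError("no replica available")
        for brick in self.bricks:
            if brick.online:
                brick.blocks[index] = data
            else:
                self.dirty[brick.name].add(index)  # remember what to resync

    def read(self, index):
        # Try each replica at most once, starting from the next one in rotation.
        for _ in range(len(self.bricks)):
            brick = self.bricks[next(self._next)]
            if brick.online and index not in self.dirty[brick.name]:
                return brick.name, brick.blocks[index]
        raise IOError("no replica available")

    def heal(self, brick):
        """Bring a brick back online and copy over only the blocks it missed."""
        source = next(b for b in self.bricks if b.online and b is not brick)
        brick.online = True
        healed = sorted(self.dirty[brick.name])
        for index in healed:
            brick.blocks[index] = source.blocks[index]
        self.dirty[brick.name].clear()
        return healed


if __name__ == "__main__":
    s1, s2 = Brick("gluster1"), Brick("gluster2")
    client = ReplicaClient([s1, s2])
    client.write(0, b"boot sector")
    s2.online = False              # one server goes down
    client.write(1, b"new data")   # the write still succeeds on the survivor
    print(client.read(1))          # served by the server that is still up
    print(client.heal(s2))         # only the missed block is copied back
```

In this model the client keeps working while one server is offline, and on recovery only the blocks written during the outage are copied, which is the property that makes the resync efficient.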
GlusterFS is a perfect fit for our environment because:
Our setup is fairly simple. We have two Gluster servers which each have mid-sized RAID storage arrays. The data stored within Gluster exists in identical form on each server.
Our Gluster servers primarily host virtual machine images. Our virtual machine servers store all of their hard disk images on the two Gluster servers.
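For readers who want to picture how a two-server replicated volume like this is typically created, the sketch below assembles the standard `gluster` and `mount` commands as strings without running them. The volume name, host names, brick paths, and mount point are hypothetical placeholders, not our actual configuration, and the sketch assumes the two servers have already been joined into one trusted pool. Recent Gluster releases may also warn about split-brain risk when creating a two-way replica and ask for confirmation.

```python
import shlex


def replica_volume_commands(volume, bricks, mount_host, mount_point):
    """Return shell commands (as strings) for a replicated Gluster volume.

    Nothing is executed; the commands are only assembled for review.
    All argument values are supplied by the caller and are placeholders here.
    """
    # Run on one Gluster server: create a volume with one replica per brick.
    create = ["gluster", "volume", "create", volume,
              "replica", str(len(bricks)), *bricks]
    # Run on one Gluster server: start serving the volume.
    start = ["gluster", "volume", "start", volume]
    # Run on a client (for example a VM host): mount with the native client.
    mount = ["mount", "-t", "glusterfs", f"{mount_host}:/{volume}", mount_point]
    return [shlex.join(cmd) for cmd in (create, start, mount)]


if __name__ == "__main__":
    for line in replica_volume_commands(
        "vmimages",                                            # hypothetical volume name
        ["gluster1:/export/brick1", "gluster2:/export/brick1"],  # hypothetical bricks
        "gluster1",                                            # server contacted at mount time
        "/var/lib/vmimages",                                   # hypothetical mount point
    ):
        print(line)
```

The host named in the mount command is only the first point of contact; as described earlier, the client then learns the volume topology and connects to both servers directly.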
Our Gluster infrastructure has proven to perform much better than our NetApp SAN environment, while at the same time being just as reliable and much more flexible than the SAN.
Gluster also has a number of features that we don’t currently use, but that will be very useful as we expand our environment.
Have questions about our experience with file systems and Gluster? We’re happy to help. Reach out to us.