Using ZFS with AWS for higher performance and reliability

With cloud adoption going mainstream, huge amounts of sensitive data are being moved from on-premise infrastructure to cloud platforms, to reduce hardware and maintenance costs and achieve higher reliability. It is a fact that by leveraging a well-designed cloud platform like AWS (Amazon is known for the reliability of its resources, with almost no data-center downtime), your applications benefit from increased reliability and uptime. However, our engineers learned that this alone is not enough if we want to gain that last 0.1% or 0.2% and get closer to our goal of always-on availability.

Related: For SMEs Cloud Based Services is the default way forward

Here we discuss our journey to achieve that magical 99.9% uptime figure for our Endurance-S solution, served from the AWS infrastructure.

Related: 5 unavoidable reasons to adopt collaboration services on the cloud

Amongst the many components of a collaboration infrastructure, here we discuss our attempt to move the storage infrastructure up a few notches, going beyond using a disk as a plain disk with basic RAID.

We defined the following requirements for our new storage platform (a few illustrative ZFS commands follow the list):

  • A Logical Volume Manager to help us scale the storage on demand, with no downtime
  • Data integrity to ensure the reliability of the data
  • Compression to optimize storage and also achieve higher performance
  • Higher I/O performance to enable better response times for end users
  • Protection against data corruption
  • Snapshots for quick online backups, which don’t load the servers
  • Quota allocations, etc.
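Since ZFS is the candidate we evaluate below, it is worth noting that it covers several of these natively. A minimal sketch (the pool and dataset names tank and tank/mail are hypothetical):

  zfs set compression=lzjb tank/mail     # transparent compression
  zfs snapshot tank/mail@nightly         # instant online snapshot, no server load
  zfs set quota=50G tank/mail            # per-dataset quota allocation
  zpool add tank mirror xvdi xvdj        # grow the pool online, with no downtime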

In our production environment for the collaboration infrastructure, we need to handle millions of small files distributed over thousands of folders (maildir). Our current infrastructure runs on an ext4 file system, and we had to decide whether all of the above requirements could be met with EXT4 or whether we would need another file system.

In this post, we share our observations from performance benchmarks comparing Linux + EXT4 on EBS with Linux + ZFS on EBS.

The Setup:

  • Hardware:
    • AWS EC2 Instance m3.xlarge (14 GB RAM and 4 vCPU Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz)
    • 3 × 250 GB General Purpose SSD EBS volumes to test on.
  • Software:
    • CentOS 6.0
    • Kernel Version: 2.6.32-220.el6.x86_64
    • ZFS on Linux v0.6.3
  • Testing Tools used:
    • IOzone
    • Apache JMeter
  • ZFS has been configured as follows (the command sketch after this list shows the equivalent steps):
    • The zpool is configured as a MIRROR of 2 × 250 GB disks (EBS SSD volumes).
    • A 40 GB L2ARC has been added to the zpool to improve read speed.
    • The ZFS ARC has been limited to a maximum of 8 GB and a minimum of 4 GB of system RAM.
    • Checksums have been turned off, as we noticed faster performance that way.
    • Compression is set to LZJB, as it is faster than the alternative algorithms.
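For reference, a minimal sketch of this configuration as shell commands, assuming the three EBS volumes appear as /dev/xvdf, /dev/xvdg and /dev/xvdh (the device names and the pool name tank are illustrative):

  # Create the pool as a mirror of the two 250 GB volumes
  zpool create tank mirror /dev/xvdf /dev/xvdg

  # Add a 40 GB L2ARC cache device (here, a 40 GB partition of the third volume)
  zpool add tank cache /dev/xvdh1

  # Cap the ARC at 8 GB max / 4 GB min (values in bytes); these module
  # options take effect when the zfs module is next loaded
  echo "options zfs zfs_arc_max=8589934592 zfs_arc_min=4294967296" > /etc/modprobe.d/zfs.conf

  # Disable checksums and enable LZJB compression on the pool
  zfs set checksum=off tank
  zfs set compression=lzjb tank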

Testing IMAP with JMeter

Using Apache JMeter:
We made use of a pre-configured Mithi Connect Xf collaboration server on CentOS 6.0 to stress-test the POP and IMAP protocols for different users.

Apache JMeter configuration screenshots:
In the Thread Group, specify the number of threads (or users) connected concurrently.

Under the Mail Reader Sampler,

  • Specify the protocol to be used (IMAP/POP). We will test IMAP for now.
  • Insert the required details as marked in the screenshots below.
  • The number of messages to retrieve (per thread/user) is set to 10. *For very aggressive testing, one can set this to All. However, it is not recommended for large mailboxes, as the server will tend to hang.
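As a side note, once the plan is saved, it can also be run in JMeter's non-GUI mode, which puts less load on the test client (the file names here are illustrative):

  # Run the saved test plan headless and log results to CSV
  jmeter -n -t imap-test.jmx -l imap-results.csv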

The above setup will perform as follows:
For each thread, 10 messages are retrieved, with all threads running concurrently.

In the above screenshot, we can see the throughput = 279.06/minute. This means that the server can handle ~279 retrieval requests per minute, i.e., on average one request completes roughly every 60 / 279 ≈ 0.215 seconds. Average time is the time taken to fulfill one request.

Output:

The output is shown in CSV format. The fields of interest are elapsed and bytes. Here, elapsed is the time, in milliseconds, taken to fetch 101091 bytes of data from the server.
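To aggregate such a results file quickly, a one-liner along these lines works; it assumes elapsed is the 2nd CSV column and the first row is a header (column positions depend on your jmeter.properties, so check the header row first):

  # Average elapsed time (ms) across all samples in the results CSV
  awk -F',' 'NR > 1 { sum += $2; n++ } END { printf "avg elapsed: %.1f ms over %d samples\n", sum/n, n }' imap-results.csv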

To get different results, we varied the thread/user count in the first screenshot. The output is shown below in tabular form (for both the ZFS and EXT4 file systems):

Results:

P.S.: For EXT4, no optimization or tuning changes were made.

Observation:

  • The EXT4 disk gets strained in tests with thread/user counts above 40.
  • ZFS handles thread/user counts above 40, and up to 60, without significant strain on imapd and/or the server.
  • Faster test results were obtained against small mailboxes (range: 300-700 MB).
  • Slightly slower test results were obtained against very large mailboxes (range: 6-10 GB).
  • No two tests were run against the same user.
  • Each test was conducted against a different user (large mailboxes only).

Conclusion:

  • ZFS looks promising for delivering data during peak-hour usage without hampering other processes. The tests on EXT4, on the other hand, brought idle time down to ~50%.

Testing POP with JMeter:

Using JMeter to test the POP protocol is similar to IMAP. The only changes needed are setting the protocol name to pop3 and the port number to 110.

The same test environment was kept for the POP3 testing as well.

Results:

*For the POP protocol testing, we ran the tests with checksums both ON and OFF.
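Switching between the two modes is a single property change on the pool (same illustrative pool name as earlier; when set to on, ZFS uses its default fletcher4 algorithm):

  zfs set checksum=on tank     # re-enable checksumming for one run
  zfs set checksum=off tank    # disable checksumming for the other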

Observation:

  • The server handles ~60% more requests per minute when checksums are set to OFF.
  • The average time to complete one request is lower when checksums are set to OFF.
  • Between ZFS (checksums OFF) and EXT4, a slightly higher performance gain can be seen with ZFS. However, the average response time for one request is almost the same on both filesystems.

Conclusion:

  • Since both filesystems have almost the same average response time per request but ZFS sustains a higher request rate, ZFS can handle load better than EXT4.

 
