Monday, March 17, 2014

16 TB Data Warehouse Appliance using a Sun E20K and Oracle (circa 2008)

We created a 16 TB Data Warehouse Appliance with Oracle, a Sun E20K server and Sun StorageTek 6540 Arrays circa 2008, pre Oracle Exadata. We wanted to get 6 GB/s out of the appliance, which meant achieving 1 GB/s from each of the arrays. Each array was connected to one dual-ported HBA, where each HBA port delivered 4 Gbps, i.e. 2 x 512 MB/s per HBA.

We tested this by connecting the first storage array and making sure that it delivered 1 GB/s. Next, we connected the second storage array and made sure we got 2 GB/s. We continued this until we got 6 GB/s with all 6 storage arrays connected to the E20K via 6 dual-ported HBAs.
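
As a quick sanity check of that budget, here is a minimal sketch in Python using the per-port figure quoted above (a 4 Gbps port tops out at about 512 MB/s; the conversion itself is worked out in the Storage section):

per_port_mbs  = 512      # one 4 Gbps HBA port, see the Storage section for the math
ports_per_hba = 2        # one dual-ported HBA per storage array
arrays        = 6

per_array_mbs = per_port_mbs * ports_per_hba   # 1024 MB/s, i.e. ~1 GB/s per array
total_mbs     = per_array_mbs * arrays         # 6144 MB/s, i.e. ~6 GB/s for the appliance
print(per_array_mbs, total_mbs)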

I've put the details of how we configured each of the storage arrays to deliver 1 GB/s, and the issues that we hit while trying to scale, under the 'Storage' section below.

CPU
As our requirement was a sustained throughput of 6 GB/s, we planned on using 36 CPUs of the E20K server, where each CPU could deliver 200 MB/s.

The rough estimate formula from Oracle for the number of CPUs is:

<number of CPUs> = <maximum throughput in MB/s> / 200

Memory
We can derive the amount of memory that we need from the number of CPUs that we are using:
<amount of memory in GB> = 2 * <number of CPUs>
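
The two rules of thumb translate directly into a small calculator. A minimal sketch in Python (the 200 MB/s-per-CPU and 2 GB-per-CPU figures are the ones from the formulas above):

import math

def cpus_needed(max_throughput_mbs):
    # Oracle rule of thumb: one CPU per 200 MB/s of sustained throughput
    return math.ceil(max_throughput_mbs / 200.0)

def memory_gb(num_cpus):
    # Rule of thumb: 2 GB of memory per CPU
    return 2 * num_cpus

target_mbs = 6 * 1024                 # 6 GB/s sustained throughput
print(cpus_needed(target_mbs))        # 31 by the raw formula (we planned on 36)
print(memory_gb(36))                  # 72 GB for the 36 CPUs we planned on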

Storage
Calculating the maximum throughput of an HBA port:

Let's say we have a really old HBA on our server, with a 1 Gbps port. Then the maximum throughput that we can get from that port is 128 MB/s:


1 Gbps = 1/8 GB/s = 0.125 GB/s
0.125 x 1024 = 128 MB/s

Now let's say we replace that HBA with a new one that has 4 Gbps ports. The maximum throughput that we can get from each of those ports is 512 MB/s:

4 Gbps = 4/8 GB/s = 0.5 GB/s
0.5 x 1024 = 512 MB/s
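
The same conversion, as a small Python helper rather than working it out by hand (the two port speeds are the ones from the examples above):

def link_speed_to_mbs(gbps):
    # bits -> bytes (divide by 8), then GB -> MB (multiply by 1024)
    return gbps / 8.0 * 1024

print(link_speed_to_mbs(1))    # 128.0 MB/s for the old 1 Gbps port
print(link_speed_to_mbs(4))    # 512.0 MB/s for the 4 Gbps port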

16 LUNs were created in each storage array. Each LUN had 4 disks (3+1 RAID 5), and each disk was 136 GB, so we got roughly half a terabyte (408 GB) of usable capacity from each LUN.
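
The usable capacity per LUN follows directly from the RAID 5 layout; a minimal sketch in Python using the numbers above:

def raid5_usable_gb(disks, disk_gb):
    # RAID 5 keeps (n - 1) disks' worth of data; one disk's worth goes to parity
    return (disks - 1) * disk_gb

print(raid5_usable_gb(4, 136))    # 408 GB per 3+1 LUN built from 136 GB disks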

To create the above configuration we started with one storage array connected to the E20K server and measured the throughput we got from a single drive. Next, we created a RAID 5 LUN, to which we kept adding disks. When the throughput stopped increasing we stopped adding disks, which is how we arrived at our LUN configuration of 3+1 RAID 5.

Issues we hit during the Storage configuration:
  • We used vdbench, Oracle's ORION tool and the native "dd" tool to configure the storage. After configuring the LUNs we handed the system and storage over to the Oracle team to run their ORION tests. After running some of their tests the Oracle team got back to us saying that the storage and I/O sub-system had issues, as it did not deliver the required throughput of 1 GB/s, and that they had double-checked it using the dd command. After getting the hardware back from the Oracle team, we found that the issue actually came from the ORION tool, which back then had a bug (it was single threaded). We also showed that running a single "dd" instance did not prove anything, and convinced the Oracle team by running a number of "dd" instances in parallel (see the sketch after this list): as we increased the number of dd's, the aggregate throughput kept increasing.
  • The second issue we hit was when we tried to scale. After getting 1 GB/s from our first storage array, we connected the second array, and that gave us 2 GB/s as we had expected. On adding the third storage array we saw that the throughput did not increase, but stayed at 2 GB/s. After analysing the storage sub-system thoroughly we learnt that the issue was with the way we had connected the storage arrays to the E20K server. The Sun Fire E25K/E20K hot-swap PCI assembly architecture (hsPCI-X/hsPCI+) has two I/O controllers. Each controller provides one 33 MHz peripheral component interconnect (PCI) bus and three 33/66/90 MHz PCI buses, for a total of four on each I/O assembly; therefore, each I/O assembly has four hot-swap PCI slots. A Sun Fire I/O assembly has a 2.4 GB/s connection to the rest of the system. So our first two storage arrays on the same I/O assembly were fine, since 2 x 1 GB/s stays within the 2.4 GB/s limit, but the third array pushed the demand to 3 GB/s, and that is where the throughput stopped scaling. We resolved the issue by connecting the remaining arrays to a different I/O assembly (the E20K has 9 I/O assemblies in total).
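
The parallel "dd" test described in the first point can be scripted along these lines. This is a minimal sketch, not the exact commands we ran; the raw device paths, block size and read count are hypothetical placeholders, so substitute the LUNs your array actually presents:

import subprocess, time

# Hypothetical raw LUN devices -- replace with the LUNs presented to the host
luns = ["/dev/rdsk/c2t0d%ds2" % i for i in range(8)]

block_size = 1024 * 1024    # 1 MB reads
blocks     = 4096           # 4 GB read per dd instance

start = time.time()
procs = [subprocess.Popen(["dd", "if=" + lun, "of=/dev/null",
                           "bs=%d" % block_size, "count=%d" % blocks])
         for lun in luns]   # one dd per LUN, all running in parallel
for p in procs:
    p.wait()
elapsed = time.time() - start

total_mb = len(luns) * blocks * block_size / (1024.0 * 1024.0)
print("aggregate read throughput: %.0f MB/s" % (total_mb / elapsed))

A single dd streams from one LUN and is limited by that one device queue; it is the aggregate across many parallel dd's that shows what the storage can really deliver, which is the behaviour we demonstrated to the Oracle team.
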
The above idea and method can be used to create any appliance, particularly a Hadoop appliance.
