Starz Animation Network Storage Testing
Written by John Hickson
Posted on December 1, 2009 in Storage, Testing

Introduction

Please note that the following tests and results are for informational purposes only. The systems being compared are not "apples to apples": each configuration tested is quite different from the others. The tests were run over a two-year period, so you should expect better performance from newer hardware and software releases. The systems tested also have different hard drive configurations (drive controllers, drive speeds, storage size, and total number of spindles) and different network connection sizes. In fact, the only things that are consistent are the tests themselves and the render nodes that performed them.

In our case, we consider the following criteria the most important: 1) Does the test complete without errors? 2) Is the performance obtained by the test appropriate for the solution required? 3) Does the total cost of ownership of the solution fit within our budget? I have not included pricing in this document because some of the tests were performed on equipment that is no longer available, and prices may vary in different parts of the world.

Finally, I must again stress that the intention was never to see which system was the fastest or outperformed another. The tests were run either on equipment we currently have (or had) in house, as a comparison for future expansion requirements, or on equipment we planned to purchase for a specific function within our production pipeline. For example, when we tested the Isilon system, it was very important to be able to increase capacity and/or performance quickly in mid-production without interrupting our artists. I was worried that when I added a new node to the cluster, AutoBalancing would decrease overall performance, and I had no idea how long it would take before performance improved. As the tests below show, I was clearly mistaken: I got an increase in performance immediately.

When selecting the right solution for your environment, you will have to take many factors into consideration, especially support, maintenance, and the specific features you require.

Network Storage Performance Testing

To test network storage performance in our environment, we set up the following tests to capture the data below. The first test (multiple small files test) creates three (3) unique files of 5MB, 10MB, and 20MB from each render node (TEST A and TEST B) before continuing through the remaining tests (TEST C, TEST D, and TEST E). Each run of the test reports the timing of every step. The second test (single large file test) does the same as the first, except that it creates only a single 1GB file. Both tests are run three (3) separate times on each of the following node counts: 1, 3, 4, 10, 16, 32, 64, 128, and 256. I know these seem like odd node counts, but they were the best combination for the way our render nodes are connected to the CORE switch, and for testing the Isilon system with 4 nodes plus an Accelerator with an even distribution of network bandwidth.
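The per-node sequence above can be sketched as a small Bash script. This is a hypothetical sketch rather than the harness we actually used: the function names (`now`, `timed_step`, `run_node_pass`), the file naming scheme, and reading `/dev/urandom` instead of `/dev/sda2` are all assumptions, and it relies on GNU coreutils (`dd iflag=fullblock`, `date +%N`). It runs TEST A to TEST E for one file size and prints the elapsed seconds for each step.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one render node's pass through TEST A to TEST E for a
# single file size. Assumes GNU coreutils (dd iflag=fullblock, date +%N).
# Reads /dev/urandom unless SRC is set; the original tests read /dev/sda2.

# Current time in seconds with a fractional part (GNU date).
now() { date +%s.%N; }

# Run a command and print "<label> <elapsed seconds>"; return 1 if the command fails.
timed_step() {
  local label=$1 start end
  shift
  start=$(now)
  "$@" || return 1
  end=$(now)
  awk -v l="$label" -v s="$start" -v e="$end" 'BEGIN { printf "%s %.3f\n", l, e - s }'
}

# Usage: run_node_pass LOCAL_DIR NETWORK_DIR SIZE_MB
run_node_pass() {
  local local_dir=$1 net_dir=$2 size_mb=$3
  local src=${SRC:-/dev/urandom}
  local name="${HOSTNAME:-node}_${size_mb}MB"

  # TEST A: create the file on the local disk (timed, but excluded from the network totals).
  timed_step TEST_A dd if="$src" of="$local_dir/$name" bs=1024k count="$size_mb" iflag=fullblock 2>/dev/null || return 1
  # TEST B: create a separate file on the network disk.
  timed_step TEST_B dd if="$src" of="$net_dir/$name" bs=1024k count="$size_mb" iflag=fullblock 2>/dev/null || return 1
  # TEST C: network disk to network disk (data is read from and written to the network).
  timed_step TEST_C cp "$net_dir/$name" "$net_dir/$name.c" || return 1
  # TEST D: local disk to network disk.
  timed_step TEST_D cp "$local_dir/$name" "$net_dir/$name.d" || return 1
  # TEST E: network disk to local disk.
  timed_step TEST_E cp "$net_dir/$name" "$local_dir/$name.e" || return 1
}
```

A node would call it once per size for the multiple small files test (`for mb in 5 10 20; do run_node_pass /LOCAL_PATH /NETWORK_PATH "$mb"; done`) and once with `1024` for the single large file test; starting it on 1 to 256 nodes at once is left to the farm's job scheduler.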

TEST A

Create file(s) on the local disk to be copied across the network in TEST D.

- The time required to complete TEST A is not included in the calculations below because it uses only the local disk, not the network storage.

5MB, 10MB, and 20MB Files

dd if=/dev/sda2 of=/LOCAL_PATH/FILENAME bs=1024k count=5
dd if=/dev/sda2 of=/LOCAL_PATH/FILENAME bs=1024k count=10
dd if=/dev/sda2 of=/LOCAL_PATH/FILENAME bs=1024k count=20

- if=/dev/sda2 is used to fill the file with the raw (effectively random) contents of a local disk partition rather than zeros; using /dev/zero allows caching to affect the final results

1GB File

dd if=/dev/sda2 of=/LOCAL_PATH/FILENAME bs=1024k count=1024

- if=/dev/sda2 is used to fill the file with the raw (effectively random) contents of a local disk partition rather than zeros; using /dev/zero allows caching to affect the final results

TEST B

Create file(s) on the network disk to be copied across the network in TEST C and TEST E.

- A second file is created to further eliminate the effects of caching on the system

5MB, 10MB, and 20MB Files

dd if=/dev/sda2 of=/NETWORK_PATH/FILENAME bs=1024k count=5
dd if=/dev/sda2 of=/NETWORK_PATH/FILENAME bs=1024k count=10
dd if=/dev/sda2 of=/NETWORK_PATH/FILENAME bs=1024k count=20

- if=/dev/sda2 is used to fill the file with the raw (effectively random) contents of a local disk partition rather than zeros; using /dev/zero allows caching to affect the final results

1GB File

dd if=/dev/sda2 of=/NETWORK_PATH/FILENAME bs=1024k count=1024

- if=/dev/sda2 is used to fill the file with the raw (effectively random) contents of a local disk partition rather than zeros; using /dev/zero allows caching to affect the final results

TEST C

Copy file(s) from the network disk directly to the network disk.

cp /NETWORK_PATH/FILENAME /NETWORK_PATH/NEWFILENAME

TEST D

Copy file(s) from the local disk to the network disk.

cp /LOCAL_PATH/FILENAME /NETWORK_PATH/NEWFILENAME

TEST E

Copy file(s) from the network disk to the local disk.

cp /NETWORK_PATH/FILENAME /LOCAL_PATH/NEWFILENAME

Amount of Network Data Accessed

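During the tests, the following amounts of data are transferred across the network. Combined with the recorded timings, these figures give the actual transfer speeds obtained. TEST C counts twice per node because each file is both read from and written back to the network disk.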

Multiple Small Files Test (5MB, 10MB, and 20MB Files)

Test     1-Node  3-Nodes  4-Nodes  10-Nodes  16-Nodes  32-Nodes  64-Nodes  128-Nodes  256-Nodes
TEST B   35MB    105MB    140MB    350MB     560MB     1120MB    2240MB    4480MB     8960MB
TEST C   70MB    210MB    280MB    700MB     1120MB    2240MB    4480MB    8960MB     17920MB
TEST D   35MB    105MB    140MB    350MB     560MB     1120MB    2240MB    4480MB     8960MB
TEST E   35MB    105MB    140MB    350MB     560MB     1120MB    2240MB    4480MB     8960MB
Totals:  175MB   525MB    700MB    1.75GB    2.8GB     5.6GB     11.2GB    22.4GB     44.8GB

Large File Test (1GB File)

Test     1-Node  3-Nodes  4-Nodes  10-Nodes  16-Nodes  32-Nodes  64-Nodes  128-Nodes  256-Nodes
TEST B   1GB     3GB      4GB      10GB      16GB      32GB      64GB      128GB      256GB
TEST C   2GB     6GB      8GB      20GB      32GB      64GB      128GB     256GB      512GB
TEST D   1GB     3GB      4GB      10GB      16GB      32GB      64GB      128GB      256GB
TEST E   1GB     3GB      4GB      10GB      16GB      32GB      64GB      128GB      256GB
Totals:  5GB     15GB     20GB     50GB      80GB      160GB     320GB     640GB      1280GB
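The tables above follow from a simple rule: per node, TEST B, TEST D, and TEST E each move the node's data across the network once, while TEST C moves it twice (read from and written back to the network disk), so a full pass moves five times the data a node writes in TEST B. A minimal Bash sketch of that arithmetic, plus the transfer-speed calculation mentioned above; the helper names are hypothetical and `awk` is used only for the floating-point division:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the data-volume tables. PER_NODE is the data one node
# writes in TEST B: 35 (MB) for the multiple small files test (5 + 10 + 20 MB)
# or 1 (GB) for the single large file test. Results use the same unit as PER_NODE.

# Usage: test_volume STEP NODES PER_NODE   (STEP is B, C, D or E)
test_volume() {
  local step=$1 nodes=$2 per_node=$3
  case $step in
    B|D|E) echo $(( nodes * per_node )) ;;      # data crosses the network once
    C)     echo $(( 2 * nodes * per_node )) ;;  # read from and written back to the network disk
    *)     echo "unknown step: $step" >&2; return 1 ;;
  esac
}

# Usage: total_volume NODES PER_NODE   (TEST B + C + D + E, i.e. 5 x NODES x PER_NODE)
total_volume() {
  local nodes=$1 per_node=$2 sum=0 step v
  for step in B C D E; do
    v=$(test_volume "$step" "$nodes" "$per_node") || return 1
    sum=$(( sum + v ))
  done
  echo "$sum"
}

# Usage: throughput DATA_MB SECONDS   (average MB/s, two decimals)
throughput() {
  awk -v d="$1" -v s="$2" 'BEGIN { printf "%.2f\n", d / s }'
}
```

For example, `total_volume 3 35` prints `525` (MB for three nodes in the multiple small files test), and `throughput 700 10` prints `70.00` (MB/s) for a hypothetical 700MB moved in 10 seconds.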

Network Access Totals:

The combined data transferred across the network by all tests, per test run (small-file figures are summed from the totals above and converted at 1000MB per GB):

Test                       Test Run 1  Test Run 2  Test Run 3  Total
Multiple Small Files Test  89.95GB     89.95GB     89.95GB     269.85GB
Single Large File Test     2570.00GB   2570.00GB   2570.00GB   7710.00GB
Totals:                    2659.95GB   2659.95GB   2659.95GB   7979.85GB
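Summing the per-node-count totals over all nine node counts gives the per-run figures. A minimal Bash sketch, assuming 1000MB per GB as the small-file totals do (1750MB = 1.75GB); the helper names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: network data moved by one test run, summed over every
# node count. Each node moves 5 x PER_NODE (TEST B, D and E once, TEST C twice).
NODE_COUNTS=(1 3 4 10 16 32 64 128 256)

# Usage: run_total PER_NODE   (result is in the same unit as PER_NODE)
run_total() {
  local per_node=$1 sum=0 n
  for n in "${NODE_COUNTS[@]}"; do
    sum=$(( sum + 5 * n * per_node ))
  done
  echo "$sum"
}

# Usage: all_runs_total_mb RUNS   (both tests, RUNS runs, in MB at 1000MB per GB)
all_runs_total_mb() {
  local small large
  small=$(run_total 35)   # multiple small files test, in MB
  large=$(run_total 1)    # single large file test, in GB
  echo $(( $1 * (small + 1000 * large) ))
}
```

For one run, `run_total 35` prints `89950` (MB) and `run_total 1` prints `2570` (GB).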

Node Result Data Points:

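The number of per-node results collected. In each test run, every node reports one result per file size, so each file size yields 1 + 3 + 4 + 10 + 16 + 32 + 64 + 128 + 256 = 514 data points per run.

File Size  Test Run 1  Test Run 2  Test Run 3  Total
5MB        514         514         514         1542
10MB       514         514         514         1542
20MB       514         514         514         1542
1GB        514         514         514         1542
Totals:    2056        2056        2056        6168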
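The data-point counts can be checked the same way. A minimal Bash sketch with hypothetical helper names, assuming one result per node per file size in each run:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: count the per-node results. Every node in every
# node-count group reports one result per file size in each test run.
NODE_COUNTS=(1 3 4 10 16 32 64 128 256)
FILE_SIZES=(5MB 10MB 20MB 1GB)

# Results per file size in one run = sum of the node counts.
results_per_size() {
  local sum=0 n
  for n in "${NODE_COUNTS[@]}"; do
    sum=$(( sum + n ))
  done
  echo "$sum"
}

# Usage: total_results RUNS   (RUNS x number of file sizes x results per size)
total_results() {
  echo $(( $1 * ${#FILE_SIZES[@]} * $(results_per_size) ))
}
```

With three runs, `total_results 3` matches the 6168 data points in the table.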

 

Tested Configurations

So far we have tested several filesystem solutions with the tests described above (see results below). I would love to test more, but it can be difficult to arrange a testing window that does not compete with our production schedules. Each of the tests above was run three (3) separate times on each of the configurations documented below, and every set of statistics was recorded to help determine average overall performance:

Configuration A
Tested on July 19th, 2007

Standard Isilon 3-Node cluster

  • 3 x Isilon IQ 1920i
  • 3 x 12 SATA Shelves (12x3 = 36 Spindles)
  • 3 x 1GB Network Pipes (Round Robin Access)

Configuration B
Tested on July 23rd, 2007

Standard Isilon 4-Node cluster (with AutoBalancing running in the background)

  • 4 x Isilon IQ 1920i
  • 4 x 12 SATA Shelves (12x4 = 48 Spindles)
  • 4 x 1GB Network Pipes (Round Robin Access)

Configuration C
Tested on August 4th, 2007

Standard Isilon 4-Node cluster

  • 4 x Isilon IQ 1920i
  • 4 x 12 SATA Shelves (12x4 = 48 Spindles)
  • 4 x 1GB Network Pipes (Round Robin Access)

Configuration D
Tested on August 4th, 2007

Standard Isilon 4-Node cluster w/Accelerator Node

  • 4 x Isilon IQ 1920i
  • 1 x Isilon Accelerator Node
  • 4 x 12 SATA Shelves (12x4 = 48 Spindles)
  • 5 x 1GB Network Pipes (Round Robin Access)

Configuration E
Tested on August 6th, 2007

Standard BlueArc Titan 1 System

  • BlueArc Titan 1 Head
  • 4 x 14 SATA Shelves (14x4 = 56 Spindles)
  • 4 x 1GB Trunked Network Pipe

Configuration F
Tested on September 20th, 2007

Standard BlueArc Titan 2 System

  • BlueArc Titan 2 Head
  • 8 x 14 FC Shelves (14x8 = 112 Spindles)
  • 4 x 1GB Trunked Network Pipe

Configuration G
Tested on September 1st, 2008

Standard BlueArc Titan 3 w/Stone FS

  • BlueArc Titan 3 Head
  • 6 x 14 FC Shelves (14x6 = 84 Spindles)
  • 1 x 10GB Network Pipe

Configuration H
Tested on September 1st, 2008

Standard BlueArc Titan 3 w/Stone FS Using File Expansion

  • BlueArc Titan 3 Head w/Stone FS
  • 9 x 14 FC Shelves (14x9 = 126 Spindles)
    • Includes Filesystem in Configuration G
    • Plus expansion 3 x 14 FC Shelves (14x3 = 42 Spindles)
  • 1 x 10GB Network Pipe

Configuration I
Tested on August 23rd, 2009

Standard Sun Storage 7410

  • Sun Storage 7410 Head w/500GB Flash-Readzilla
  • Build 10.3.0.1-1.6 with compression turned OFF on the share
  • 2 x 24 Shelves (22x2 = 44 Spindles + 2x18GB Flash-Logzilla)
  • 1 x 10GB Network Pipe

Configuration J
Tested on August 30th, 2009

Standard Sun Storage 7410

  • Sun Storage 7410 Head w/500GB Flash-Readzilla
  • Build 10.5.0.1-1.20 with compression turned ON on the share
  • 2 x 24 Shelves (22x2 = 44 Spindles + 2x18GB Flash-Logzilla)
  • 1 x 10GB Network Pipe

Render Node Configurations

For these tests we used three (3) different types of render nodes, evenly distributed to ensure consistent network bandwidth to the storage being tested. We now have newer nodes available, and future tests will have to use them as the older nodes are retired and taken off our network.

IRNxxxxx Nodes (168 of 168 Render Nodes used)

  • IBM Blade Center HS20
  • 2 x 3.2GHz Intel Processors
  • 4GB RAM
  • Each set of 14 render nodes connected to the network by a 1GB link.

ORNxxxxx Nodes (14 of 14 Render Nodes used)

  • IBM Blade Center LS20
  • 2 x Dual Core Opteron 2.2GHz Processors
  • 4GB RAM
  • Each set of 14 render nodes connected to the network by a 1GB link.

SRNxxxxx Nodes (74 of 118 Render Nodes used)

  • Sun Microsystems SunFire VX60
  • 2 x 2.8GHz Intel Processors
  • 2GB RAM
  • Each set of 39 render nodes connected to the network by a 2GB trunk.

Network Configuration

We currently use an HP ProCurve 8212 as our CORE switch, which is connected to the render nodes with 2 x 10GB fiber connections (Test Configurations I and J). All of the earlier tests (Test Configurations A to H) were run on our previous network configuration (a Foundry FastIron 1500 with 1 x 10GB fiber to the render farm).

Summary

Every system tested completed 100% of the jobs with no errors on any node, which is great. Several years ago this was definitely not the case. I remember testing old Network Appliance arrays, IBM JFS, and Maximum Throughput solutions that either had a high rate of node failures or locked up completely, so that nothing completed at all. Luckily, we've come a long way since then, and things keep improving. In production, our goal is always to increase capacity, improve performance, reduce costs, and reduce management overhead.

As I mentioned earlier, when testing the Isilon IQ 1920i series nodes, I wanted to see what effect AutoBalancing had on the performance of the system. I was happily surprised to find that performance increased when a new node was added to the cluster, and even happier to see that once AutoBalancing completed, performance increased even more. Although the Accelerator node improved overall performance, I noticed that the nodes connecting to the Accelerator completed much faster than the other nodes, which in turn left the other nodes completing much more slowly overall. For my purposes, I decided that an Accelerator node was not required, because I am looking for even throughput across all of my nodes.

Testing the BlueArc Titan 1, Titan 2, and Titan 3 shows the progression of the updated heads delivering better performance, as advertised. The new version of their filesystem, StoneFS, increased performance even further and gave us the ability to add drives to a storage pool when more performance (not storage) was required. Past versions of their OS did not allow this option. I was very pleased when this new feature was finally added.

The Sun Storage 7410 is the only system I have tested with a large amount of flash memory to help speed up performance. The results were surprisingly good for the amount of hardware being used. That said, 256 nodes hammering a single head all at once may have been a little too much: you can see the performance drop on large files when going from 64 to 128 nodes. I suspect that the model I tested would perform very well with around 80 nodes. One thing you will notice from the second test (Configuration J), with LZJB compression turned on for the export, is that performance improved significantly, which is very impressive. This also allows you to have more nodes accessing the filesystem simultaneously without as much performance degradation as the first test suggests.

Overall, I've been very happy with everything that I've tested. All of the systems were easy to set up, easy to manage, and have great support. In production we currently use both the BlueArc and Isilon systems heavily, and we have never run into issues that affected production or, even more importantly, caused a loss of data.

I'd like to add that one of the goals of these tests was to try to overload each system to simulate a worst-case, real-life production scenario. In reality, we have 300 users and over 512 render nodes (over 2400 processors) hammering the filesystems every day, and the daily load average on each filesystem is much lower than it was during the tests. I am confident that any one of these systems could hold up under normal production at our current size, and run even more effectively in smaller studios.

Not that long ago, storage was a big concern: what if the filesystem fails? Am I safe? For the past 4+ years, the only thing I worry about is users filling it up too fast, and I can sleep at night without worrying about failures.

