Network Storage Overview
The following is a brief summary of how network storage from multiple vendors is currently used at the Starz Animation facility in Toronto. Our infrastructure has grown over the past couple of years from a small 30-60 person facility with 100 render nodes to a much larger facility with approximately 300 people and just over 500 render nodes. In the past, much of the storage for projects was located on locally attached disk (which is probably still true in many smaller shops). As our facility grew and we gained access to bigger budgets, we were able to completely replace our old server-attached storage with a better enterprise solution, as shown in the diagram below. This is by no means the best or only solution, and it may not work at all in your facility; however, in this document I would like to outline some of the benefits and the reasons why we chose to set up our storage in this manner.
By moving away from server attached storage, we were able to protect our data in the following ways:
SNAPSHOTS allow us to take a "picture" of the current filesystem and go back to retrieve files at a specific point in time. We currently have SNAPSHOT rules that take these "pictures" every 3 hours for a full day, every day for a week, and every week for 3 weeks. This allows us to retrieve files very quickly when users remove or overwrite files by mistake (which seems to happen much too often).
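The snapshot rules themselves are vendor features on the filers, but an equivalent retention scheme can be sketched as cron entries; the snap.sh wrapper named here is hypothetical (assume it rotates copy-on-write or rsync --link-dest snapshots, keeping the given number of copies):

```
# Hypothetical crontab mirroring the retention rules above
0 */3 * * *   /opt/scripts/snap.sh 3hourly 8    # every 3 hours, keep one day's worth
15 0  * * *   /opt/scripts/snap.sh daily   7    # every day, keep one week
30 0  * * 0   /opt/scripts/snap.sh weekly  3    # every week, keep 3 weeks
```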
With multiple storage systems, we were able to logically separate the data as it flows through our pipeline, spreading the load of the filesystems across different areas. We also have the extra benefit of using the CLUSTER capabilities of both BlueArc and Isilon, so if an entire head fails, another head will take over and pick up the load automatically.
For disaster recovery purposes (backups and archives), we have added a large FC array of TIER-2 storage directly to our backup server. This gives us the ability to RSYNC all of the data we usually back up to tape and keep it on-line so it can be retrieved quickly without having to go back to tape (which is obviously much slower). This server also has an ADIC i2000 tape library attached with 4 LTO-3 tape drives that run incrementals each night, with fulls once per month.
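As a rough sketch, the disk-to-disk stage could be a nightly job like the following (the paths and schedule are assumptions for illustration, not our actual configuration):

```
# Hypothetical nightly cron job on the backup server: mirror production
# data to the TIER-2 FC array before the tape incrementals run
30 21 * * *   rsync -a --delete /n/production/ /backup/tier2/production/
```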
When projects are complete, we simply make two archives (one for on-site and one for off-site storage) of the full project. Then we run some custom scripts that replace the internal directory pointers in all the assets (mainly Maya files) and move them into our Library (located on the Isilon cluster) so they can be re-used for future projects. Once an archive is complete and has been tested, we delete the project completely from the BlueArc storage arrays.
Cost of Use Benefits
By moving to an enterprise storage solution we were able to reduce our cost of ownership in the following ways:
Evenly distributing the load of file access across multiple filesystems is very tricky, especially with the highs and lows of your renderfarm. Many places will assign a project to a filer so that everything for that specific project is located in one place. We have chosen to split projects across all filers so that no matter what project is being worked on, the filers are always used about the same amount. The number of people and machines accessing the data remains fairly constant no matter which projects we are working on, so we have built our filesystems to handle the worst-case scenario (example: all artists and all render nodes hitting the filesystem simultaneously). Some filers are still worked harder than others, but even at the highest load, we are working the filers well under their tested performance capabilities. In our facility (as seen in the diagram) we have split our data as follows:
ISILON CLUSTER (Library)
This storage library contains both NFS (Unix) and CIFS (Windows) exports for our Library (Software, Media files, Non-production related documents and Assets from non-current productions for reference and possible re-use), User Windows Roaming Profiles, User Home Directories (both Windows & Unix), and Apache Web Services (See Load Balancing Network Services). This filesystem is not accessed by the renderfarm; it is strictly for all non-production data.
BLUEARC TITAN 3-A (Production)
This storage library contains all production-related data - management files, textures, Maya files, etc. (anything a production person or artist touches for a project). All machines read from this filesystem; however, only artists write to it.
BLUEARC TITAN 3-B (Render Cache/Render Layers)
This storage library contains all Mentalray pre-render cache data as well as all rendered layers. Only the renderfarm touches this filesystem, as users rarely need to access these files. By separating these files out from the other filesystems, we greatly decreased the load on other production files. When the renderfarm kicks in at full power in the middle of the day, users are no longer affected when loading and saving their files.
BLUEARC TITAN 3-C (Compositing)
This storage library contains all final composite frames and production-related media files, as well as the logs created by the renderfarm. The renderfarm is the only set of machines that writes to this filesystem. Final composite frames and render logs are only accessed by a small number of users. The media files created are accessed via our web-based production system.
What is a Cross-Platform Link-Based Directory Structure?
A link-based directory structure is created entirely out of links so that both UNIX and WINDOWS systems can access files in a similar manner. Using this method, you can easily maintain tools to automatically create directories and files in a consistent fashion. You also gain the ability to move directories and/or entire projects to new physical filesystems while remaining totally invisible to the users.
What is needed to create a Cross-Platform Link-Based Directory Structure?
Nothing needs to be purchased. All you need is a UNIX server running SAMBA with MSDFS configured.
How to set-up a Cross-Platform Link-Based Directory Structure
First we choose a common drive letter to mount under WINDOWS (in our examples we will use N:). Similarly, for mounting under UNIX, we select the same name as the automount point (/n). By choosing the same path, users can navigate on either WINDOWS or UNIX using the same standard paths. This also makes conversion mapping easy when paths need to be translated between WINDOWS and UNIX.
Configure SAMBA on the LINKSERVER with an smb.conf along these lines:

[global]
netbios name = LINKSERVER
workgroup = YOUR_DOMAIN
host msdfs = yes
security = ads (or whatever you use)
encrypt passwords = Yes
password server = YOUR_DOMAIN_CONTROLLER (or password server)
max log size = 5000
log level = 3
log file = /opt/samba/var/log/%m
use sendfile = no
local master = yes
os level = 3
socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
utmp = yes
utmp directory = /var/log/utmp
wtmp directory = /var/log/wtmp
server string = DFS Server
guest ok = yes

[windows_n]
comment = Windows N: drive
path = /export/windows_n
msdfs root = yes
writeable = Yes
create mask = 0774
directory mask = 0775
guest ok = yes
So on WINDOWS we would mount the SAMBA share \\LINKSERVER\windows_n as N:, and on UNIX we would mount linkserver:/export/unix_n as /n.
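Assuming standard tooling, those mounts might look like this (the N: mapping from a WINDOWS command prompt, and an fstab entry on UNIX; substitute your own options):

```
REM WINDOWS: map the DFS root share as N:
net use N: \\LINKSERVER\windows_n /persistent:yes

# UNIX: /etc/fstab entry (or the equivalent automount map entry)
linkserver:/export/unix_n  /n  nfs  defaults  0 0
```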
Creating a project under the Cross-Platform Link-Based Directory Structure
For this example, let's create two projects (PROJECT1 and PROJECT2) with a simple directory structure. Remember that WINDOWS and UNIX see different links, so we need to create the real directories on the filers where the files will actually be located, as well as two sets of links (one for WINDOWS and one for UNIX).
On the LINKSERVER you have two directories: /export/windows_n for WINDOWS links and /export/unix_n for UNIX links.
Create default project directories
mkdir /export/windows_n/projects
chmod 555 /export/windows_n/projects (So WINDOWS users can't create files at this level - otherwise they won't be seen in UNIX)
mkdir /export/unix_n/projects
chmod 555 /export/unix_n/projects (So UNIX users can't create files at this level - otherwise they won't be seen in WINDOWS)
Note: A physical projects directory doesn't need to be created on the storage filers; this is just a fake level in the link tree.
Create project directories
On your storage, create the real directories (at this point let's pretend we have only one storage filer, called STORAGE1); this should be mounted under /n if you use automounts. For this example we only have 3 top-level directories to keep things simple; in your studio, you can break down the levels as much as you need to. The more breakdown the better if you are forced to move data later in the project due to space constraints. One thing to consider is that users will not be able to create files or directories at the top level, so make sure you create ALL the top-level directories that you are going to use. The reason for this is simple: since the directories are actually links, WINDOWS users and UNIX users are not looking at the same top level, so one or the other will not be able to see anything created at that level.
Directories where files will actually be located (created on STORAGE1):
mkdir -p /n/STORAGE1/PROJECT1/Work /n/STORAGE1/PROJECT1/Render /n/STORAGE1/PROJECT1/Comp
mkdir -p /n/STORAGE1/PROJECT2/Work /n/STORAGE1/PROJECT2/Render /n/STORAGE1/PROJECT2/Comp
WINDOWS Links: (no files ever located here) (pay attention to the double backslashes - the shell eats one, so the link targets end up with single backslashes)
mkdir /export/windows_n/projects/PROJECT1
cd /export/windows_n/projects/PROJECT1
ln -s msdfs:STORAGE1\\PROJECT1\\Work Work
ln -s msdfs:STORAGE1\\PROJECT1\\Render Render
ln -s msdfs:STORAGE1\\PROJECT1\\Comp Comp
chmod 555 /export/windows_n/projects/PROJECT1 (So WINDOWS users can't create files at this level - otherwise they won't be seen in UNIX)
mkdir /export/windows_n/projects/PROJECT2
cd /export/windows_n/projects/PROJECT2
ln -s msdfs:STORAGE1\\PROJECT2\\Work Work
ln -s msdfs:STORAGE1\\PROJECT2\\Render Render
ln -s msdfs:STORAGE1\\PROJECT2\\Comp Comp
chmod 555 /export/windows_n/projects/PROJECT2 (So WINDOWS users can't create files at this level - otherwise they won't be seen in UNIX)
UNIX Links: (no files ever located here)
mkdir /export/unix_n/projects/PROJECT1
cd /export/unix_n/projects/PROJECT1
ln -s /n/STORAGE1/PROJECT1/Work Work
ln -s /n/STORAGE1/PROJECT1/Render Render
ln -s /n/STORAGE1/PROJECT1/Comp Comp
chmod 555 /export/unix_n/projects/PROJECT1 (So UNIX users can't create files at this level - otherwise they won't be seen in WINDOWS)
mkdir /export/unix_n/projects/PROJECT2
cd /export/unix_n/projects/PROJECT2
ln -s /n/STORAGE1/PROJECT2/Work Work
ln -s /n/STORAGE1/PROJECT2/Render Render
ln -s /n/STORAGE1/PROJECT2/Comp Comp
chmod 555 /export/unix_n/projects/PROJECT2 (So UNIX users can't create files at this level - otherwise they won't be seen in WINDOWS)
That's all you need to get started. On WINDOWS you can navigate to N:\projects and see the two projects, whereas on UNIX you can navigate to /n/projects and see the same directories.
Moving Your Data
Now that your projects are created in the Cross-Platform Link-Based Directory Structure, you are able to move your data whenever you like while remaining totally invisible to the users.
Above we created two projects (PROJECT1 and PROJECT2) on our single storage filer, STORAGE1. Now the projects are getting too big and STORAGE1 is running out of room, so we need to move the data.
In this example, let's say we have a second storage filer available (STORAGE2).
You do have some decisions to make: do you want to move an entire project to the new filer, or do you want to move data from both projects? Either way you are covered.
Let's say we decided to move just "Render" and "Comp" directories for all projects to the new storage and keep the "Work" directories where they are.
No matter what your choice is, you will have to sync the data to the new storage filer and make sure the directories are set up the same as the original.
Note: Don't touch the links until the sync is complete! Also, you will need to take the directories off-line for a bit to do a final sync and make sure you don't lose any data!
Updating WINDOWS Links:
cd /export/windows_n/projects/PROJECT1
rm -Rf Render
ln -s msdfs:STORAGE2\\PROJECT1\\Render Render
rm -Rf Comp
ln -s msdfs:STORAGE2\\PROJECT1\\Comp Comp
cd /export/windows_n/projects/PROJECT2
rm -Rf Render
ln -s msdfs:STORAGE2\\PROJECT2\\Render Render
rm -Rf Comp
ln -s msdfs:STORAGE2\\PROJECT2\\Comp Comp
Updating UNIX Links:
cd /export/unix_n/projects/PROJECT1
rm -Rf Render
ln -s /n/STORAGE2/PROJECT1/Render Render
rm -Rf Comp
ln -s /n/STORAGE2/PROJECT1/Comp Comp
cd /export/unix_n/projects/PROJECT2
rm -Rf Render
ln -s /n/STORAGE2/PROJECT2/Render Render
rm -Rf Comp
ln -s /n/STORAGE2/PROJECT2/Comp Comp
That's all you need to do. WINDOWS and UNIX users will both see the projects the same way even though the data has been physically moved.
All the above can be scripted so that a single command will create a new project, all the physical directories, and both sets of links to your default storage locations. Also, you should customize your pipeline tools so that the link-based system is used, and stop any tools from looking at specific mounts and/or filers.
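As a sketch of such a script (the function name is made up, and the roots and filer names are just the examples from this document), project creation might look like:

```shell
# mkproject NAME: create the real directories on the filer plus both
# sets of links. Roots default to this document's example paths and can
# be overridden via environment variables for other layouts.
mkproject() {
    proj=$1
    win_root=${WIN_ROOT:-/export/windows_n/projects}
    unix_root=${UNIX_ROOT:-/export/unix_n/projects}
    storage_root=${STORAGE_ROOT:-/n/STORAGE1}
    msdfs_host=${MSDFS_HOST:-STORAGE1}

    mkdir -p "$win_root/$proj" "$unix_root/$proj"
    for d in Work Render Comp; do
        # real directory on the storage filer
        mkdir -p "$storage_root/$proj/$d"
        # WINDOWS link (msdfs targets contain backslashes, escaped here)
        ln -s "msdfs:$msdfs_host\\$proj\\$d" "$win_root/$proj/$d"
        # UNIX link
        ln -s "$storage_root/$proj/$d" "$unix_root/$proj/$d"
    done
    # lock the project level so users can't create stray files there
    chmod 555 "$win_root/$proj" "$unix_root/$proj"
}
```

Usage would simply be `mkproject PROJECT3`.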
Why Load Balance Network Services?
Most network installations have their network services such as DNS, NIS, NTPD, CUPSD, and multiple APACHE servers all running on separate physical servers. Natural balancing of all of these services is difficult, if not impossible, due to how often each service is accessed and the resources each service requires to perform its function. You quickly notice that while some servers may be overloaded at certain points of each day, most of your infrastructure servers are hardly utilized at all. It is nice to see that the load on any specific server is low (around 0.01), but what that actually means is that the server is not doing anything most of the time. This means your servers are wasting resources like electricity and cooling, as well as valuable space in your server room. You need all of these services available all the time so your facility can function. But how can we even out the load across servers to maintain a consistent load and reduce slowdowns during peak periods?
Many people have been using or investigating a VMWare solution to consolidate these services on a single machine; however, you usually need sufficient funds to purchase a large server with more power than you will ever need, to compensate for the overhead of all the individual VMWare instances you will need to load. Then, to be safe, you will need a second one of these overpowered machines for protection in the event your main VMWare server fails. This secondary machine needs to be just as powerful to load all the instances you had running on your primary server. So, you end up having a very powerful and expensive machine sitting 100% idle all the time, which again is a total waste (most of the time).
In our case, we didn't have the luxury of purchasing new servers for all our aging infrastructure. We did have some old render nodes that (individually) were not powerful enough to run all the services we require with everyone accessing them all day (as well as the additional load required to serve our renderfarm), but we had a lot of them. We could go back and just upgrade each of our existing servers, but we would still be stuck with an unbalanced infrastructure. Load Balancing Network Services is what we decided to do.
Benefits of Load Balancing
How To Set Up Load Balanced Network Services
In our studio, we simply created a master server for services like DNS and NIS that only talks to the cluster of servers behind the Load Balancer. This server is only accessed by a couple of people on our Systems team and the slave servers in the Load Balanced cluster. All the other machines and render nodes in our facility access the Load Balanced cluster machines to talk to the services. We also created an installation script for all the Load Balanced servers using XCAT (eXtreme Cluster Administration Tools), the same way we do for all of our render nodes. The Load Balanced servers are set up as slaves for both DNS and NIS, so they stay up to date automatically whenever the master is changed.

Configuring the Load Balancer itself was quite simple: we didn't have to change any IPs for the servers in the cluster or create additional VLANs; we practically just plugged the Load Balancer hardware into our CORE switch and configured it to point to the machines we wanted in the cluster (this took about 15 minutes).

For APACHE services, instead of running all the web code in /srv/www/ on the local machines as we have always done in the past, we created soft links to a network share. This way all code can be updated in one location (our network share also gives us the benefits of snapshots and being backed up), and all servers in the cluster are automatically updated. These features of the installation setup also add to the simplicity of adding or rebuilding servers in the cluster, as most of the configuration is static. Just make sure that whenever you do make a change to any of the services, you also update your installation script.
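The shared-code setup for APACHE can be sketched as follows (the function name and paths here are assumptions for illustration, not our actual site layout):

```shell
# link_webroot SHARED DOCROOT: replace a node's local document root with
# a link to the copy on the network share, so one update reaches every
# server in the cluster.
link_webroot() {
    shared=$1      # e.g. the web code on the network share
    docroot=$2     # e.g. /srv/www on each cluster node
    # move any existing local copy out of the way, then link to the share
    if [ -e "$docroot" ]; then
        mv "$docroot" "$docroot.local"
    fi
    ln -s "$shared" "$docroot"
}
```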
Basically, all the servers are set up from the same script and installed exactly the same way. We use DHCP on all of these machines so they get different names and IPs (hardcoded in DHCP so they always get the same IP). The Load Balancer then checks which nodes are available, and all services are evenly distributed among them. The clients all point to the Load Balancer for their requests, and it just forwards each one to the machine it decides they should use. This way all servers are working at roughly the same load as one another whenever they are being used.
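The "hardcoded in DHCP" part might look like this in an ISC dhcpd.conf (the host names, MAC addresses, and IPs below are made up):

```
# Pin each cluster node to a fixed address so the Load Balancer's
# node list never changes across re-installs
host lbnode01 { hardware ethernet 00:11:22:33:44:01; fixed-address 10.0.0.101; }
host lbnode02 { hardware ethernet 00:11:22:33:44:02; fixed-address 10.0.0.102; }
```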
If one of your servers is having a problem, just re-install it from the scripts, or take it off-line. Your services will still be running, just on fewer machines. As a bonus, since all servers are identical and installed from a script, there is no need to back up any of the individual servers in the cluster.
If your servers are getting overloaded, just add more machines to the cluster. Your Load Balancer will let you control the percentage of the load a certain server receives, so if you have more powerful machines, you can keep the load evenly distributed as you grow.
One last thing to note: this will only work for services that are capable of being load balanced. Your database server and your render manager, for instance, won't work in this situation; these services carry enough load individually to justify keeping them on their own servers, as we have always done in the past. Some of your APACHE services might have to be changed a little depending on how you control authentication and session management.
Happy Load Balancing.