Using Azure to store natural history data

Rob Zschernitz, CTO, The Field Museum

Working with a small but high-performing IT team at an active scientific research institution is massively rewarding. The Field Museum keeps more than 30 million specimens and artifacts and is constantly collecting more, and that scale can make any IT operation shudder. Our colleagues also use these 30 million-plus objects, one of the best collections in the world, in exhibitions that attract 1.5 million visitors to the museum each year. Over the past decade, file storage needs have grown explosively as these vital objects have been digitized so that researchers worldwide can access them online 24/7/365. We needed to be able to scale storage “up and out” on demand, in massive amounts, while keeping it protected and reliable. To meet that urgent need, The Field Museum moved its digitized collections data to Nasuni Enterprise File Services with Azure.

Unpredictable needs and growth

Like most organizations, we had been using traditional SAN technologies to store most of our data. Over the past two to three years, we noticed a very distinct change in our storage environment. For years, our largest storage needs had been structured data, primarily databases and virtual machine images. While we still have every bit as much of that data, we started seeing explosive growth in file-based data: high-resolution imagery, DNA sequence data, scan data, and more. Most research projects are grant funded, so although we could forecast storage growth project by project, we could never grow our available storage proactively, because we don’t know which grants will and will not be funded. Deploying Azure as the backend of our Nasuni NAS means we should never really ‘run out’ of storage, without committing ourselves to significant financial and support investments. Azure gives us the technology we need to scale our storage on demand and in real time.

We needed a solution that would not disrupt the ongoing digitization projects, the work of the museum’s scientists, or the work of others around the world who are on-premises or access the museum’s data via the web. Researchers are in the building or tapping into our data remotely at all hours, so we don’t have traditional maintenance windows for a forklift upgrade. That is why the hybrid approach with Nasuni and Azure appealed to us: it let us scale up our storage quickly and protect all of it, while still getting value from the large investment we had made in our existing SAN.

Nasuni Enterprise File Services with Azure

Backing our Nasuni NAS with Azure has worked even better than we anticipated. When we went live, just over two years ago, we almost immediately pushed approximately seven TB of collections image data up to Azure. We had planned for an additional two to three TB within three to six months; instead, we ended up pushing just under five TB, for a total of about 13 TB. Only 14 months later, we are at just under 22 TB of collections data. We didn’t know the exact amounts, but this was exactly the case we were solving for, and we knew that Azure Blob storage could handle our potentially massive scalability needs. We have also started migrating all business files from our shared file servers to Azure. To date, we have 6.3 million files totaling 31.4 TB stored on Azure.
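The growth figures above can be put in rough perspective with a little arithmetic. The sketch below uses only the approximate figures quoted in this article (about 13 TB after the first wave, just under 22 TB fourteen months later, and just under five TB pushed against a two-to-three TB forecast). It assumes linear growth and assumes the 14 months are counted from the roughly 13 TB point; the helper names are hypothetical, chosen for illustration.

```python
def average_monthly_growth(start_tb: float, end_tb: float, months: int) -> float:
    """Linear average growth in TB per month between two observations."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (end_tb - start_tb) / months


def forecast_overshoot(actual_tb: float, forecast_low_tb: float,
                       forecast_high_tb: float) -> tuple[float, float]:
    """Return how many times larger actual growth was than the high and low forecast."""
    return actual_tb / forecast_high_tb, actual_tb / forecast_low_tb


# Approximate figures from the article ("just under" values rounded up).
# Assumption: the 14 months are measured from the ~13 TB point.
monthly_tb = average_monthly_growth(13.0, 22.0, 14)
vs_high, vs_low = forecast_overshoot(5.0, 2.0, 3.0)

print(f"~{monthly_tb:.2f} TB/month")                    # ~0.64 TB/month
print(f"{vs_high:.1f}x to {vs_low:.1f}x the forecast")  # 1.7x to 2.5x the forecast
```

Even under these rough assumptions, the first wave came in at well over the planned amount, which is the kind of unpredictability that makes on-demand cloud capacity attractive compared with buying SAN capacity in advance.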

Nasuni’s technology (UniFS) caches the most frequently and recently accessed files on local hardware in our datacenter for quick access. Even when files must be retrieved from the Azure cloud, however, we’ve observed that our end users do not notice any discernible latency. Another great benefit is that we can now offer users much less cumbersome ways to access their files remotely. Our researchers spend significant periods of time in the field, so quick, easy, secure access to their files and data helps them work more efficiently. With a file-focused hybrid cloud storage solution that delivers highly scalable, unlimited capacity, automatic backups, instant disaster recovery, fast remote access, and simple, unified management through a single console, we’re able to take advantage of everything the Azure cloud offers.
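To illustrate the general idea of keeping hot files on local hardware while the cloud holds the full data set, here is a minimal least-recently-used (LRU) cache sketch in Python. It is not Nasuni’s UniFS implementation, whose caching policy this article does not describe. The class name, the byte-capacity limit, and the `fetch_remote` callback standing in for a read from Azure are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class LocalFileCache:
    """Illustrative byte-capacity LRU cache in front of a slower remote store.

    Hypothetical sketch only; not how Nasuni UniFS is actually implemented.
    """

    def __init__(self, capacity_bytes: int, fetch_remote: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch_remote = fetch_remote  # hypothetical stand-in for a cloud read
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._entries:
            self._entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._entries[path]
        self.misses += 1
        data = self.fetch_remote(path)  # cache miss: read the cloud copy
        self._insert(path, data)
        return data

    def _insert(self, path: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # too large to cache locally; serve it uncached
        self._entries[path] = data
        self._used += len(data)
        # Evict least recently used files until we fit the local capacity.
        while self._used > self.capacity_bytes:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)


# Usage: a dict plays the role of the remote (cloud) store.
remote = {"a.tif": b"x" * 40, "b.tif": b"y" * 40, "c.tif": b"z" * 40}
cache = LocalFileCache(capacity_bytes=100, fetch_remote=remote.__getitem__)
cache.read("a.tif")
cache.read("b.tif")
cache.read("a.tif")  # hit; a.tif becomes the most recently used file
cache.read("c.tif")  # 120 bytes would exceed 100, so b.tif is evicted
print(cache.hits, cache.misses)  # 1 3
```

A miss is slower because it goes to the remote store, but the caller sees the same data either way, which mirrors the article’s point that users do not need to know whether a file came from the local cache or from Azure.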

Because our team is fairly small, we can take advantage of Microsoft’s huge datacenters and expertise, which lets us spend more of our time supporting non-storage operations at The Field Museum. A big part of those time savings comes from no longer having to handle data protection ourselves, since the Azure-backed Nasuni solution performs that function better, faster, and at lower cost. Copies of frequently accessed files reside in the local cache for fast access. To back up those files and to provide cold storage for archival files, Nasuni maintains a “gold copy” of every file in the Azure cloud, along with multiple geo-redundant copies in Microsoft’s many Azure datacenters around the world. The gold copies are updated as often as every minute, which gives the museum recovery time objectives and recovery point objectives that simply were not possible before. Thankfully, we have not needed to perform any extensive restores, but we have tested the scenarios extensively. We estimate that, although there would be some bandwidth and latency issues to wrangle, we could do full restores in under an hour with nearly instantaneous access to all our file data. We no longer need to overprovision storage capacity by buying excess hardware, which has helped shrink our datacenter footprint, yet the team can still spin up storage as fast as needed.

Advice to others

Constantly talk about and demo emerging technologies you are planning to deploy, to accelerate and encourage change.
Closely and continuously monitor and understand your storage environment.
Be very transparent about changes to your storage infrastructure.
Communicate clearly with stakeholders about the impact of changes.
