Can Warehouses in the Cloud Stop Data Siloing?
The McKinsey Global Institute estimates that the application of big data could generate $100 billion annually across the US healthcare system alone; information literacy is now a requirement for success in this field. But despite the value of information to so many companies across the biomedical sector, there has been a reluctance to store that data properly, in a place where it can be put to use.
Matillion, which has dual headquarters in Denver and Manchester, has been quietly developing a solution. Initially, Matillion sought to develop cloud data warehouses (CDWs) that could function as handy repositories for valuable data. But in the years since, the company has begun to focus more on how to process that data to maximize its value. Its proprietary software, Matillion ETL and Matillion Data Loader, aims to show that warehouses can do a lot more than provide space to overwhelmed data companies. We talked to Arawan Gajajiva, a principal solution architect at Matillion, to find out more.
Ruairi Mackenzie (RM): Why do companies want to load their data into Cloud Data Warehouses?
Arawan Gajajiva (AG): Businesses use cloud data warehouses (CDWs) for a few reasons. First, CDWs help them cope with ever-increasing volumes of data. Cloud-based data warehouses enable flexibility and scale, letting a business elastically grow storage and compute resources independently as its data needs grow over time. In a legacy on-premises data warehouse architecture, businesses need to estimate their future compute and storage needs and procure appropriate hardware to meet those needs, even if that capacity is not required today. At a minimum, this results in unused capacity in the near term. If business needs change (as they always do), it can also leave the business with insufficient storage or compute resources in the future, which means either failing to meet business needs or making an additional investment to procure and install more resources.
As data increases, managing it becomes more complex. CDWs reduce that complexity while maintaining agility and performance. In addition to coping with copious amounts of data, enterprises are also creating data in multiple formats. CDWs support various data structure types and formats.
Lastly, CDWs are the best option for improving disaster recovery. Automatic backups ensure business continuity. In a disaster, processing capacity can be spun up to leverage cross-region replicated data.
RM: Scientific data is often heterogeneous and complex. What challenges does this pose for Matillion’s Data Loader software?
AG: Disparate data silos exist in all businesses, with different data formatted in different ways inside of different systems. Matillion Data Loader supports a variety of data sources and brings their data together in a central location, unifying it within a cloud data warehouse. For more complex use cases, data transformation software like Matillion ETL can join data together to make it usable for reporting and analytics.
RM: How is Matillion “undermining” data pipeline providers?
AG: The launch of Matillion Data Loader changes the industry in that Matillion is providing for free a product that is a flagship product for some of its competitors. By offering this ingestion tool at no cost, Matillion democratizes data for enterprises, helping them get started on their data journey. As the sophistication of enterprises’ needs grows, Matillion is able to help them scale to a fuller solution.
Companies undergo a maturation process, from extracting and loading data so they can analyze it, to full transformation. Matillion helps them understand what it looks like to pull all of their data sources together and supports them as they become a more data-literate organization that is able to get much more from their data.
RM: How does loading data into CDWs affect data analysis processes?
AG: Cloud data warehouses are a powerful way to handle growing volumes of data because they allow a company to scale its compute power up or down, depending on business needs. For data processing workloads that have historically been bottlenecked on legacy tools, there is now a way to get faster time to value by using solutions that were purpose-built for the cloud. This not only saves hardware and overhead costs but also gives developers and data teams much more time back in their day to work on other business-critical projects.
Arawan Gajajiva was speaking to Ruairi J Mackenzie, Science Writer for Technology Networks.