Big data analytics has always been evolving to become more powerful, faster, and more efficient, but the last 18+ months of COVID-19 have accelerated that evolution dramatically. The pandemic dispersed knowledge workers to remote work while sharpening their need for data-driven answers to situations no one had planned for.
Gartner notes that “When COVID-19 hit, organizations using traditional analytics techniques that rely heavily on large amounts of historical data realized one important thing: Many of these models are no longer relevant.”
As a result, 2021 has seen more changes to data analytics practices and tactics than any previous year, and those changes are likely to keep evolving into 2022.
1. The rise of the data lakehouse
Until recently, enterprise data and analytics ecosystems included both data lakes, which store data at low cost, and data warehouses, which structure data for swift responses to queries. Data pipelines drew data from the lake for batch loading into the warehouse.
But a new trend is appearing of using a single unit for both purposes. Some data warehouses have begun separating costs for storage and computing, so storing data in a warehouse is no longer so expensive. At the same time, some data lakes started adding support for SQL analytics.
The new “data lakehouse” offers inexpensive data storage together with data structures and management features, so when choosing between data warehouses like BigQuery vs. Redshift, for example, you’ll also want to think about how much you want to separate the twin tasks of data storage and data management.
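To make the storage/compute split concrete, here is a deliberately tiny sketch: cheap file storage stands in for the “lake,” and an ephemeral in-memory SQL engine stands in for the “compute” layer that is attached only when a query arrives. The file name, schema, and figures are illustrative assumptions, not any vendor’s actual architecture.

```python
import csv
import os
import sqlite3
import tempfile

# "Lake" layer: data sits cheaply as plain files (stand-in for object storage).
lake_dir = tempfile.mkdtemp()
path = os.path.join(lake_dir, "orders.csv")  # hypothetical dataset
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order_id", "amount"])
    writer.writerows([[1, 120], [2, 80], [3, 200]])

# "Compute" layer: spin up an in-memory SQL engine only when a query arrives,
# so storage costs and compute costs stay decoupled.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, amount INTEGER)")
with open(path, newline="") as f:
    rows = [(int(r["order_id"]), int(r["amount"])) for r in csv.DictReader(f)]
con.executemany("INSERT INTO orders VALUES (?, ?)", rows)

total = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 400
```

Real lakehouse engines query columnar formats like Parquet in place rather than copying rows, but the cost model is the same: the files are always cheap to keep, and you pay for compute only while a query runs.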
2. Metadata enters the spotlight
As datasets become larger, and the questions that data is expected to answer become more complex and more urgent, context becomes even more important. Enterprise stakeholders need to know where data is coming from, what column names mean, and how data interrelates with other data points, increasing the demand for accurate and up-to-date metadata.
While metadata is crucial for accurate analytics, the challenge of keeping it organized is real. Data science teams need to curate metadata without slowing down operations, so users can still submit queries and receive answers at high speed. As a result, new solutions are focusing on bringing automation and smart processes to metadata as well.
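A minimal sketch of what automated metadata capture can look like: each pipeline step registers its output’s column meanings and upstream lineage as a side effect of running, so the catalog stays current without manual curation. The table names, columns, and registry shape here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TableMetadata:
    name: str
    columns: dict                                  # column name -> meaning
    upstream: list = field(default_factory=list)   # lineage: source tables
    updated_at: str = ""

registry = {}  # stand-in for a metadata catalog

def register(name, columns, upstream=()):
    """Called by each pipeline step, so metadata updates automatically."""
    registry[name] = TableMetadata(
        name=name,
        columns=columns,
        upstream=list(upstream),
        updated_at=datetime.now(timezone.utc).isoformat(),
    )

# Hypothetical pipeline steps registering their outputs as they run.
register("raw_orders",
         {"order_id": "unique order key", "amount": "order value in USD"})
register("daily_revenue",
         {"day": "calendar date", "revenue": "sum of amount per day"},
         upstream=["raw_orders"])

print(registry["daily_revenue"].upstream)  # ['raw_orders']
```

Because registration happens inside the pipeline itself, lineage and column descriptions can never drift out of date the way hand-maintained documentation does.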
3. Data quality takes a step forward
As datasets grow bigger and data sources become more disparate, the challenge of ensuring data quality looms larger. Businesses are discovering new ways to improve the quality of their data so as to ensure that end users can trust the insights they generate.
New data quality steps include incorporating data profiling into data catalogs, instead of using it as a standalone component. Companies are also adopting data quality checks that pick up on anomalies in datasets before they continue through the data pipeline and skew the reports. At the moment these are mostly manual, but automated data quality rules based on trends in the data are on their way.
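An anomaly check of the kind described above can be sketched in a few lines: compare today’s batch against the recent trend and flag it before it flows downstream. The row counts and the z-score threshold are illustrative assumptions; production rules would be learned from the data itself.

```python
import statistics

def is_anomalous(history, today, z_threshold=3.0):
    """Flag a batch whose size deviates sharply from the recent trend."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

# Hypothetical daily row counts from recent pipeline runs.
daily_row_counts = [10_120, 9_980, 10_050, 10_200, 9_900]

print(is_anomalous(daily_row_counts, 10_100))  # False: normal batch
print(is_anomalous(daily_row_counts, 1_000))   # True: likely truncated load
```

The point is where the check runs: at ingestion, before the suspect batch can skew any report built on top of it.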
Finally, organizations are borrowing a trick from the unit testing frameworks used by software engineers and writing data quality tests into the data pipeline. These tests, too, can detect data quality issues before they undermine downstream workflows.
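In the spirit of those testing frameworks, a pipeline step can assert expectations about a batch and refuse to pass bad rows along. The rules below are assumptions for illustration; in practice tools like Great Expectations or dbt tests play this role.

```python
def check_batch(rows):
    """Return a list of data quality violations for a batch of records."""
    errors = []
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        errors.append("duplicate order_id values")
    if any(r["amount"] < 0 for r in rows):
        errors.append("negative amount")
    if any(r.get("customer") in (None, "") for r in rows):
        errors.append("missing customer")
    return errors

# Hypothetical batch with one bad record.
batch = [
    {"order_id": 1, "amount": 120, "customer": "acme"},
    {"order_id": 2, "amount": -5, "customer": "globex"},  # violates a rule
]

print(check_batch(batch))  # ['negative amount']
```

Just as a failing unit test blocks a deploy, a failing data test can halt the pipeline run before the flawed batch reaches dashboards and reports.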
4. Data and analytics systems embrace mix and match
Until recently, companies tended to rely on a single vendor’s cohesive solution for all their data analytics needs. But now they are breaking free from vendor lock-in and composing their own data analytics applications from components of different solutions.
Composable data and analytics promote collaboration between departments, open up access to data insights, and help organizations become more productive and agile.
5. Data draws closer to the edge
Stakeholders are constantly ramping up their demands for speed and accuracy from data analytics, pushing more data ecosystems towards edge computing. Enabled by 5G networks, data analysis tools are increasingly shifting away from traditional data centers and even from cloud environments, relocating to edge networks.
As analysis moves closer to physical assets, latency drops dramatically and analytics edges ever closer to real time. Edge data analytics can also open up access to data for different parts of the business, including situations where data privacy regulations prevent datasets from leaving a specific geography.
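The pattern can be sketched as local aggregation: raw readings stay on-site at the edge, and only a compact summary crosses the network. The sensor readings and summary fields are illustrative assumptions; real deployments would run code like this on edge nodes near the equipment.

```python
def summarize_locally(readings):
    """Reduce raw sensor events to a small payload before it leaves the site."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

# Hypothetical temperature readings that never leave the edge node.
raw_readings = [21.0, 21.4, 20.9, 22.1, 21.6]

payload = summarize_locally(raw_readings)  # only this crosses the network
print(payload)
```

This shrinks what travels upstream, cuts round-trip latency for local decisions, and can keep raw records inside a regulated geography while still sharing aggregate insight.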
6. Big data goes small and wide
According to Gartner, the upheaval of the pandemic forced companies to abandon their carefully nurtured massive datasets and start again from scratch. Data from the pre-COVID world could not resolve post-COVID questions, so in place of long-gathered big data, enterprises switched focus to “small and wide” data.
Wide data combines small but highly varied (wide) structured and unstructured sources into a highly contextualized picture of business situations, sharpening awareness of emerging problems. While small data is naturally more restricted in scope than big datasets, today’s analytics can still mine it for useful insights that are more accurate than those drawn from outdated big data.
Big data analytics is growing up fast
Thanks partly to the pandemic, and partly to the natural evolution of new technology, data analytics is changing rapidly. Expect the ongoing shift from big to small and wide data, the adoption of edge data processing, composable data systems, richer metadata, and hybrid data storage and processing operations, along with new tactics for data quality, to carry on into 2022.