Google hosted the Google Data Cloud Summit 2022, sharing product announcements, data product strategy and roadmaps, product insights, and customer success stories built on Google Cloud data products. The conference focused primarily on solutions for Artificial Intelligence (AI), Machine Learning, Data Analytics, and Cloud Databases. This article summarizes the key takeaways based on the trends and themes observed during the conference.

#1 — Multi-cloud is the new reality for modern data architecture

“Google is making a lot of progress in multicloud, which allows you to not have to think about the vendor and just adopt what you need to do the job well.”
— Dave Johnson, VP of Informatics, Data Science, and AI at Moderna

  • Google acknowledged that customers increasingly use multi-cloud products and keep data distributed across AWS, Azure, and other cloud providers.
  • The launch of BigLake as a multi-cloud data lake is a step toward helping customers use the cloud products and data they have spread across multiple providers.
  • BigQuery Omni has supported federated multi-cloud data analytics since 2020.
  • Unified governance across a multi-cloud solution, covering data movement, discovery, lifecycle management, data quality, and more, is a pressing need, and BigLake supports it.
Google’s New Offering — BigLake (Image Source: Google Cloud Summit)

#2 — Data is limitless with Cloud as the backbone

Data has moved beyond the analyst and now impacts every employee, every customer, and every partner.
 — Gerrit Kazmaier, VP and GM of Database, Data Analytics, and Looker

  • Unifying data lakes and warehouses to provide limitless access to data is the key trend, and the product innovations announced support it.
  • Democratizing machine learning solutions by lowering the cost of Google Cloud products is the way forward. Granular instance sizing for Spanner (preview), which supports instances at 1/10th the size and cost, is a good example.
  • BigLake (a data lake storage engine) and Spanner Change Streams (to track changes in real time) are product examples of this push to remove data limits.
  • Google has published more detail on its limitless data approach.

#3 — Open source and open data formats on the rise

Our commitment to open source and open data has led us to share datasets, services and software with everyone. 78% of Global IT leaders stated that multi/hybrid cloud support is a major consideration when selecting a cloud provider and 74% preferred open source cloud solutions.
 — Sources: Google, Google-commissioned IDG research

  • Open source solutions such as Apache Spark (a unified engine for large-scale data analytics), Presto (a distributed SQL query engine), Apache Hadoop, and others continue to be industry standards in enterprise data lakes and big data architectures. Google’s support for open source has been strong and is expected to keep evolving, bringing innovative solutions to the community.
  • Google BigQuery, a data warehouse solution launched almost 10 years ago, now supports multiple open source software (OSS) tools, and BigLake extends BigQuery capabilities to other object stores that hold open data formats such as CSV, JSON, Avro, Parquet, and ORC.
BigLake + BigQuery Expanding Data Capabilities (Image Source: Google Cloud Summit)
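
To illustrate why open data formats matter, here is a minimal sketch in Python that round-trips the same records through CSV and newline-delimited JSON using only the standard library. The record fields and helper names are hypothetical and chosen for illustration; this does not use BigLake or BigQuery, and it leaves out Avro, Parquet, and ORC because they need third-party libraries.

```python
import csv
import io
import json

# Hypothetical sample records; all values are strings so that both
# formats return exactly the same data after a round trip.
records = [
    {"order_id": "1001", "region": "EMEA"},
    {"order_id": "1002", "region": "APAC"},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text with a header row back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_json_lines(rows):
    """Serialize a list of dicts to newline-delimited JSON."""
    return "\n".join(json.dumps(row) for row in rows)


def from_json_lines(text):
    """Parse newline-delimited JSON back into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line]


csv_text = to_csv(records)
json_text = to_json_lines(records)
```

Because both formats are openly specified, any engine that understands them can read the same files, which is the property that lets one storage layer serve several analytics tools.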

#4 — Innovation through new products, features, and serverless offerings is the way forward

We remain committed to continued innovation with the leading data and analytics companies where our customers are investing. 
 — Gerrit Kazmaier, VP and GM of Database, Data Analytics, and Looker

Continuing its culture of innovation, Google announced preview launches of the following products:

  • BigLake, a new data lake storage engine that makes it easier for enterprises to analyze the data in their data warehouses and data lakes across multiple clouds.
  • Spanner Change Streams, which allows customers to track changes within their Spanner database and easily access and integrate this data with other systems to unlock new value from data.
  • Analytics Hub, a data exchange platform for sharing data, ML models, or other analytics assets to increase the ROI of data initiatives.
  • Data Lineage, which records, visualizes, and helps users understand the relationships between data assets based on the flow of data.
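
The idea behind data lineage, recording how data flows between assets and then tracing those relationships, can be sketched with a small directed graph. This is only an illustration in plain Python: the `LineageGraph` class and the asset names are hypothetical, and it is not the Google Cloud Data Lineage API.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage store: remembers which assets feed which other assets."""

    def __init__(self):
        # Maps each asset to the set of assets it was derived from.
        self._parents = defaultdict(set)

    def record(self, source, target):
        """Record that data flowed from `source` into `target`."""
        self._parents[target].add(source)

    def upstream(self, asset):
        """Return every asset that `asset` depends on, directly or indirectly."""
        seen = set()
        queue = deque([asset])
        while queue:
            current = queue.popleft()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical pipeline: two raw tables are cleaned, then joined into a report.
graph = LineageGraph()
graph.record("raw_orders", "clean_orders")
graph.record("raw_customers", "clean_customers")
graph.record("clean_orders", "sales_report")
graph.record("clean_customers", "sales_report")

print(sorted(graph.upstream("sales_report")))
```

Tracing upstream from a report in this way is what lets a team answer questions such as which source tables would affect a dashboard if they changed.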

Additionally, the following products are now generally available:

  • Vertex AI Workbench provides a one-stop environment for data science professionals, with visual and code-based integration across the entire workflow. Additionally, Vertex AI supports the end-to-end ML journey:
Vertex AI (Source: Google Cloud Summit)
  • Live Migrations from Apache HBase to Cloud Bigtable with features such as Schema Translation, HBase Bigtable Replication, and Migration Validation.
  • Spark on Google Cloud now has a serverless offering, presented as the industry’s first serverless Spark for all workloads.
  • Dataflow Prime, a serverless data processing platform that uses an architecture separating compute from state, is now generally available.
  • Dataplex, which is an intelligent data fabric to unify distributed data and automate data management and governance, is now generally available.
  • The Looker + Data Studio integration is now available, bringing together ease of use and flexibility for self-service business intelligence.

#5 — Partnership goes a long way toward a sustainable ecosystem

Closing the data-to-value gap with data innovations would not be possible without Google’s partner ecosystem. More than 700 software partners power their applications using Google’s data cloud.
 — Gerrit Kazmaier, VP and GM of Database, Data Analytics, and Looker

  • Databricks, MongoDB, Fivetran, Neo4j, Elastic, ThoughtSpot, and Redis are launching significant new capabilities for their customers using Google Cloud.
Google’s Partner integration ecosystem is evolving
  • Customer success stories covered during the summit included Walmart, Wayfair, and Vodafone’s use of Google Cloud’s AI & ML capabilities, C3 AI’s use of personalized AI for financial applications, and Forbes’ use of Google Cloud and MongoDB as a data platform.

In addition to the above, the following Google Cloud products were also mentioned across sessions:

  • Dataproc — A fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks.
  • Vertex AI — A unified artificial intelligence platform to build, deploy, and scale ML models faster, with pre-trained and custom tooling.
  • Cloud Spanner — A fully managed relational database with unlimited scale, strong consistency, and up to 99.999% availability.
  • Cloud SQL — A fully managed relational database service for MySQL, PostgreSQL, and SQL Server. Cloud SQL Insights is an observability solution augmenting the service, which now supports MySQL.
  • BigQuery — A serverless, highly scalable, and cost-effective multi-cloud data warehouse designed for business agility.
  • Looker — A modern business intelligence, embedded analytics, and data application platform.
  • Google Data Studio — A business intelligence tool for analyzing data together, creating compelling visualizations, and sharing insights.
  • Spark on Google Cloud — Presented as the industry’s first autoscaling serverless Spark, integrated with the best of Google-native and open source tools and supporting GKE and serverless deployment options.
  • Cloud Data Fusion — A fully managed, cloud-native data integration service at any scale.
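
To put the “up to 99.999% availability” figure quoted for Cloud Spanner in perspective, the short calculation below converts an availability percentage into the maximum downtime it allows per year. It is a back-of-the-envelope sketch: the function name is hypothetical, a 365-day year is assumed, and it says nothing about how any provider actually measures or guarantees availability.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(availability_percent):
    """Maximum yearly downtime (in minutes) allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {max_downtime_minutes(pct):.2f} minutes/year")
```

Under these assumptions, “five nines” leaves room for only a little over five minutes of downtime in a year, compared with almost nine hours at 99.9%.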

Additionally, the Google Cloud Cheatsheet lets you explore the catalog of the entire product suite interactively, either online or through its GitHub repository.

To conclude, Google Cloud is a key competitor to its peers (Azure and AWS) in the data, analytics, artificial intelligence, and machine learning space. The product announcements, roadmaps, insights, and customer success stories strengthen Google Cloud’s positioning around cloud data capabilities.


All data and information provided on this blog are for informational purposes only. All images used are sourced from the Google Cloud Summit for reference. The author makes no representations as to the accuracy, completeness, correctness, suitability, or validity of any information on this blog and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. This is a personal view, and the opinions expressed here are my own and not those of my employer or any other organization.

