Amazon Redshift Federated Queries: Rise Of Query Engines
Here come the SQL query engines!
AWS first added query services to Redshift with Spectrum, which enabled users to query an S3 data lake. With the latest federated query updates, AWS is bringing Amazon Redshift in line with competing query service offerings, not only from Google and Microsoft but from other AWS services too.
What are federated queries?
Facebook PrestoDB popularized the concept of distributed SQL query engines when it open-sourced the project back in 2013.
Over the past couple of years, AWS, Google, Microsoft, and many others in the industry have accelerated the adoption of a distributed query engine model within their products. For example, AWS developed Amazon Athena on top of the Presto code base.
Here is how PrestoDB describes what it allows users to do:
Presto allows querying data where it lives, including Hive, Cassandra, relational databases or even proprietary data stores. A single Presto query can combine data from multiple sources, allowing for analytics across your entire organization.
Like PrestoDB and other query engine services, Amazon Redshift now supports federated queries that let customers query data across different databases, data warehouses, or data lakes. This follows previous support for federated queries in Amazon Athena:
AWS Data Lake And Amazon Athena Federated Queries
AWS Redshift Federated Query Use Cases
The use cases that applied to Redshift Spectrum still apply today; the primary difference is the expansion of sources you can query. For example, you can run a single query across data in Amazon RDS for PostgreSQL, Amazon Redshift, and an Amazon S3 data lake. This lets Redshift customers incorporate live data from remote systems into their existing Redshift data stack.
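As a rough illustration of a query that spans all three sources, here is a minimal sketch. It assumes the remote sources are already exposed to Redshift as external schemas. Every schema, table, and column name below (`public.customers`, `postgres_live.orders`, `lake.order_history`) is hypothetical and exists only for this example.

```python
def build_federated_join(local_table, federated_table, lake_table):
    """Return one SQL statement that joins a local Redshift table, a table
    exposed through a federated PostgreSQL schema, and a data lake table.

    All table and column names are illustrative assumptions.
    """
    return (
        "SELECT c.customer_id, c.segment, o.order_id, o.status, h.lifetime_orders\n"
        f"FROM {local_table} AS c\n"
        f"JOIN {federated_table} AS o ON o.customer_id = c.customer_id\n"
        f"LEFT JOIN {lake_table} AS h ON h.customer_id = c.customer_id\n"
        "WHERE o.status = 'open';"
    )


sql = build_federated_join(
    local_table="public.customers",        # table stored in Redshift (hypothetical)
    federated_table="postgres_live.orders",  # live RDS PostgreSQL data (hypothetical)
    lake_table="lake.order_history",       # S3 data lake table (hypothetical)
)
print(sql)
```

From the point of view of the person writing the query, the three sources look like ordinary schemas; only the schema prefix hints at where the data actually lives.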
Federated querying also lets you apply lightweight transformations on the fly and load the results into target tables.
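A transform-and-load step of this kind could look something like the sketch below. It builds a single `INSERT INTO ... SELECT` statement that cleans up a few columns while reading from a federated schema. The target table `analytics.daily_orders`, the source `postgres_live.orders`, and the column names are assumptions made for illustration, not part of any real setup.

```python
def build_load_sql(target_table, source_table, lookback_days):
    """Build an INSERT ... SELECT that reads from a federated table, applies
    light transformations, and writes to a local Redshift table.

    Table and column names are hypothetical.
    """
    days = int(lookback_days)  # reject non-numeric input before formatting
    return (
        f"INSERT INTO {target_table} (order_id, customer_email, amount_usd, order_date)\n"
        "SELECT o.order_id,\n"
        "       LOWER(TRIM(o.customer_email)),  -- normalize email text\n"
        "       o.amount_cents / 100.0,         -- cents to dollars\n"
        "       CAST(o.created_at AS DATE)      -- timestamp to date\n"
        f"FROM {source_table} AS o\n"
        f"WHERE o.created_at >= DATEADD(day, -{days}, GETDATE());"
    )


load_sql = build_load_sql("analytics.daily_orders", "postgres_live.orders", 1)
print(load_sql)
```

Keeping the transformations this small is deliberate: the statement does the cleanup in the same pass that copies the rows, so no separate staging job is needed for simple cases.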
This is good news for current Redshift users, as it adds new features that keep the service competitive with other AWS offerings, PrestoDB, Google BigQuery Omni, and other SQL query engine services.
How do Amazon Redshift Federated Queries work?
First, you will need to do some setup to configure the service. AWS offers a tutorial that shows you how to get started with Redshift federated queries using AWS CloudFormation.
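However the infrastructure is provisioned, the Redshift side of the setup generally involves mapping the remote PostgreSQL database to an external schema with `CREATE EXTERNAL SCHEMA ... FROM POSTGRES`. The sketch below only assembles that statement as a string; the host name, account number, role, and secret ARN are placeholder values invented for this example and need to be replaced with your own.

```python
def build_external_schema_sql(local_schema, database, remote_schema,
                              host, port, iam_role_arn, secret_arn):
    """Assemble a CREATE EXTERNAL SCHEMA statement for a PostgreSQL source.

    The credentials for the remote database are expected to live in
    AWS Secrets Manager and are referenced by ARN, not embedded here.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {local_schema}\n"
        "FROM POSTGRES\n"
        f"DATABASE '{database}' SCHEMA '{remote_schema}'\n"
        f"URI '{host}' PORT {int(port)}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# All values below are placeholders, not real resources.
schema_sql = build_external_schema_sql(
    local_schema="postgres_live",
    database="appdb",
    remote_schema="public",
    host="example-db.abc123.us-east-1.rds.amazonaws.com",
    port=5432,
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-federated",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example-pg-credentials",
)
print(schema_sql)
```

Once a schema like this exists, tables in the remote database can be referenced in Redshift queries with the local schema name as a prefix.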
From a technical perspective, Redshift includes a query optimizer that determines the most efficient way to execute a federated query. Redshift pushes a portion of the query down into the remote database to speed up query performance. This approach reduces the volume of data that has to move over the network.
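To make the benefit of pushing work to the remote database concrete, here is a small, self-contained simulation. It is not Redshift's optimizer: an in-memory SQLite database stands in for the remote system, and the `orders` table with its 1,000 rows is made up for the example. The point is only to compare how many rows cross the "network" when the filter runs locally versus remotely.

```python
import sqlite3


def make_remote():
    """Create an in-memory database that plays the role of the remote system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (order_id, status) VALUES (?, ?)",
        [(i, "open" if i % 10 == 0 else "closed") for i in range(1000)],
    )
    return conn


def without_pushdown(remote):
    """Fetch every row, then filter on the local side."""
    rows = remote.execute("SELECT order_id, status FROM orders").fetchall()
    matched = [row for row in rows if row[1] == "open"]
    return len(rows), matched


def with_pushdown(remote):
    """Send the filter to the remote side so only matching rows come back."""
    rows = remote.execute(
        "SELECT order_id, status FROM orders WHERE status = ?", ("open",)
    ).fetchall()
    return len(rows), rows


remote = make_remote()
moved_all, result_a = without_pushdown(remote)
moved_filtered, result_b = with_pushdown(remote)

print("rows moved without pushdown:", moved_all)
print("rows moved with pushdown:", moved_filtered)
print("same answer:", sorted(result_a) == sorted(result_b))
```

Both approaches return the same matching rows, but the pushed-down version transfers only the rows that satisfy the filter. With large remote tables, that difference is where most of the time goes.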
Reducing network overhead is an important strategy given the performance constraints associated with large data sets. This is why Google BigQuery Omni actually runs part of the query engine directly within AWS or Azure.
If you are planning to query the contents of an AWS data lake, we suggest making sure you follow the best practices we detailed for Athena, which apply to Redshift as well:
How To Create A Serverless, Zero Infrastructure, Zero Administration Data Lake With Amazon S3…
Amazon Redshift Federated Queries Vs
Amazon Redshift Spectrum already allowed you to query your AWS data lake. In a sense, Redshift has had a form of federated queries for some time. However, the scope was limited to an AWS data lake.
The new capabilities follow an industry trend toward query engines supporting diverse data stores for data ingestion. For example, Amazon Athena, which is based on PrestoDB, has supported the concept of a federated query engine for some time. PrestoDB was conceived by Facebook as a federated SQL query engine.
Support for a federated query engine model is a must-have, not a nice-to-have, feature for Redshift to remain relevant as a service.
Who should use the Redshift Federated Query Service?
The value proposition is targeted at existing Redshift users. If you are using a different federated query engine service, such as Amazon Athena, there is no compelling reason to switch.
In a previous post, we discussed these use cases in detail:
How is AWS Redshift Spectrum different than AWS Athena?
On the plus side, Amazon Redshift and Amazon Athena can access the same AWS data lake. This means you can pilot Redshift by running queries against the same data lake used by Athena. Of course, this type of flexibility and efficiency assumes a properly architected data lake.
If you are a Redshift user, Amazon Redshift Federated Queries offer flexibility, especially when deciding if you need to scale or add capacity to the system. For example, you can save significantly by adding a lifecycle process that moves data out of Redshift to a data lake, or by leaving data in place within RDS.
Why pay to store that data in Redshift when storing data in a lake or querying data in place is possible? As a result, these new Redshift query capabilities can give users more technical options and cost optimization opportunities. For example, you can minimize the need to scale Redshift with a new node, which can be an expensive proposition.
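One possible building block for a lifecycle process like the one mentioned above is Redshift's `UNLOAD` command, which writes the result of a query to S3. The sketch below only assembles the statement; the table `public.orders`, the cutoff date, the bucket path, and the IAM role ARN are hypothetical placeholders. Note that `UNLOAD` takes the query as a quoted string, so single quotes inside the query have to be doubled.

```python
def build_unload_sql(select_sql, s3_prefix, iam_role_arn):
    """Assemble an UNLOAD statement that exports query results to S3 as Parquet.

    Single quotes inside the SELECT are doubled because UNLOAD expects the
    query itself to be wrapped in single quotes.
    """
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Placeholder values for illustration only.
unload_sql = build_unload_sql(
    select_sql="SELECT * FROM public.orders WHERE order_date < '2019-01-01'",
    s3_prefix="s3://example-data-lake/orders/archive_",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-unload",
)
print(unload_sql)
```

Removing the exported rows from Redshift would be a separate step, best done only after the files in S3 have been verified.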
Getting Started With Amazon Redshift Federated Queries
A well-architected data lake will ensure your Redshift federated queries run quickly and incur minimal costs. The Openbridge zero administration data lake service is a perfect pairing for Redshift Federated Queries. Push data from supported data sources, and our service automatically handles the data ingestion to a Redshift-supported AWS data lake.
Want to discuss Redshift federated querying or data lakes for your organization? Need a platform and team of experts to kickstart your data and analytics efforts? We can help! Adopting new technologies can be a roadblock to success, especially when it means your team is working in different and unfamiliar ways. This is especially true in a self-service only world. If you want to discuss a proof-of-concept, pilot, project, or any other effort, the Openbridge platform and team of data experts are ready to help.
Reach out to us at hello@openbridge.com. Prefer to talk to someone? Set up a call with our team of data experts.
References
- What is a data lake? Strategy & success depend on practical data lake solutions
- Data Lakes? Big Myths About Architecture, Strategy, and Analytics
- AWS Lake Formation: Accelerating Data Lake Adoption
- Adobe Data Feeds: How to use a data lake and Amazon Athena for analytic insights
- Data lake vs data warehouse? Modern data management strategies
- Best practices for Amazon Redshift Federated Query | Amazon Web Services
Amazon Redshift Federated Queries: Rise Of Query Engines was originally published in Openbridge — All things data on Medium.
source https://blog.openbridge.com/amazon-redshift-federated-queries-rise-of-query-engines-26c6b51b0db1?source=rss----4c5221789b3---4