Running analytics on data spread across applications can be complex and time-consuming. Data required for analytics is often spread across relational, key-value, document, in-memory, search, graph, object, time-series and ledger data stores. To analyze data across these sources, analysts build complex pipelines to extract, transform and load it into a data warehouse so that it can be queried. Accessing data from various sources also requires learning new programming languages and data access constructs. Federated SQL queries in Athena eliminate this complexity by letting users query data in place, wherever it resides. Analysts can use familiar SQL constructs to JOIN data across multiple data sources for quick analysis and store the results in Amazon S3 for subsequent use.
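As a rough illustration of "JOIN across sources, results in S3", the sketch below builds a federated query and submits it through an Athena client. It assumes a DynamoDB connector registered under the hypothetical catalog name `ddb_catalog` with a hypothetical `shop.orders` table, a Glue table `sales.customers` in the default `awsdatacatalog` catalog, and a placeholder results bucket. The client methods used (`start_query_execution`, `get_query_execution`) are the ones boto3's Athena client exposes. Catalog naming for federated sources differed during the Preview, so check the documentation before relying on the exact syntax.

```python
import time
from typing import Any, Dict

# Hypothetical names for illustration only: "ddb_catalog" stands in for a
# DynamoDB connector registered with Athena; "sales.customers" is assumed to
# live in the default Glue catalog ("awsdatacatalog").
FEDERATED_JOIN_SQL = """
SELECT c.customer_id, c.name, o.order_id, o.total
FROM awsdatacatalog.sales.customers AS c
JOIN ddb_catalog.shop.orders AS o
  ON c.customer_id = o.customer_id
"""

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def build_query_request(sql: str, output_location: str,
                        workgroup: str = "primary") -> Dict[str, Any]:
    """Build keyword arguments for an Athena start_query_execution call.

    Query results are written as files under output_location in Amazon S3.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def run_query(client: Any, request: Dict[str, Any],
              poll_seconds: float = 1.0, max_polls: int = 300) -> str:
    """Start a query, poll until it reaches a terminal state, return that state."""
    query_id = client.start_query_execution(**request)["QueryExecutionId"]
    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With boto3 installed, `run_query(boto3.client("athena", region_name="us-east-1"), build_query_request(FEDERATED_JOIN_SQL, "s3://example-bucket/athena-results/"))` would submit the join. The client is passed in rather than created inside `run_query`, which keeps the polling logic independent of credentials and easy to exercise with a stub.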
Athena executes federated queries using Athena Data Source Connectors that run on AWS Lambda. AWS has open sourced Data Source Connectors for Amazon DynamoDB, Apache HBase, Amazon DocumentDB, Amazon Redshift, Amazon CloudWatch Logs, Amazon CloudWatch Metrics and JDBC-compliant relational databases such as MySQL and PostgreSQL under the Apache 2.0 license. Customers can use these connectors to run federated SQL queries in Athena across these data sources. Additionally, using the Athena Query Federation SDK, developers can build connectors to any data source so that Athena can run SQL queries against it. Connectors built with the SDK extend the benefits of federated querying beyond the AWS-provided connectors. Because connectors run on AWS Lambda, customers do not have to manage infrastructure or plan for scaling to peak demands.
Athena federated query is available in Preview in the us-east-1 (N. Virginia) region. Begin your Preview now by following these steps.
To learn more about the feature, please see the documentation here.
To get started with using an existing connector, please follow this guide.
To learn how to build your own data source connector using the Athena Query Federation SDK, please visit this link.