What is Amazon Athena?
Amazon Athena is a serverless, interactive analytics service that makes it easy to analyze data at petabyte scale. It supports open table and file formats and is built on open-source engines and frameworks such as Trino, Presto, and Apache Spark.
You can analyze data or build applications from an Amazon S3 data lake and numerous other data sources, including on-premises systems and other clouds, using languages such as SQL and Python.
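As a rough sketch of the SQL-plus-Python workflow described above, the snippet below assembles the arguments for the `start_query_execution` call on the boto3 Athena client and, optionally, submits them. The database, table, and S3 output location are hypothetical placeholders, and actually submitting the query assumes that boto3 is installed and AWS credentials are configured.

```python
def build_query_request(sql, database, output_location, workgroup="primary"):
    """Assemble keyword arguments for the Athena client's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def submit_query(request, region_name="us-east-1"):
    """Submit the request to Athena and return the query execution ID.

    Needs boto3 and valid AWS credentials; boto3 is imported lazily so the
    request builder above can be used (and tested) without it.
    """
    import boto3

    client = boto3.client("athena", region_name=region_name)
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names: replace with your own database, table, and bucket.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
print(request["QueryExecutionContext"])
```

Keeping the request builder separate from the network call makes the query parameters easy to inspect before anything is sent to AWS; calling `submit_query(request)` is the step that actually starts the query.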
Data can come from operational databases, data warehouses, big data frameworks, transactional systems, S3 data lakes or data meshes, and many other platform services. Athena brings these sources together so that you can integrate ML tools for business purposes and build applications that analyze and visualize the data.
Use cases for Amazon Athena include running federated queries, preparing data for machine learning models, building distributed big data reconciliation engines, and analyzing Google Analytics data. The last of these is done in concert with Amazon AppFlow, a fully managed integration service that helps transfer software as a service (SaaS) data to a data lake securely. Together, AppFlow and Athena help you analyze not only Google Analytics data but also data from other SaaS sources such as Salesforce, SAP, and Zendesk.
Athena integrates with many other AWS services, which lets you model and visualize your data, for example by building Amazon QuickSight dashboards and other analytics models. To analyze data in S3, Athena relies on the AWS Glue Data Catalog to store and retrieve table metadata; that metadata tells Athena where the data is located and how to read it.
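To make the role of the catalog metadata more concrete, here is a minimal sketch that pulls the read-relevant fields out of a response shaped like the one returned by the boto3 Glue client's `get_table` call. The sample response is hand-written for illustration (the table, bucket, and column names are hypothetical), and `fetch_table` assumes boto3 and AWS credentials are available.

```python
def summarize_table(get_table_response):
    """Extract the fields of a Glue get_table response that describe how to read a table."""
    table = get_table_response["Table"]
    storage = table["StorageDescriptor"]
    return {
        "name": table["Name"],
        "location": storage["Location"],
        "serde": storage["SerdeInfo"]["SerializationLibrary"],
        "columns": [(col["Name"], col["Type"]) for col in storage["Columns"]],
        "partition_keys": [key["Name"] for key in table.get("PartitionKeys", [])],
    }


def fetch_table(database, name, region_name="us-east-1"):
    """Fetch table metadata from the Glue Data Catalog (needs boto3 and credentials)."""
    import boto3

    glue = boto3.client("glue", region_name=region_name)
    return glue.get_table(DatabaseName=database, Name=name)


# Hand-written, hypothetical response used only to illustrate the shape.
sample_response = {
    "Table": {
        "Name": "web_logs",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "request_id", "Type": "string"},
                {"Name": "status", "Type": "int"},
            ],
            "Location": "s3://example-bucket/web_logs/",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
        "PartitionKeys": [{"Name": "dt", "Type": "string"}],
    }
}

summary = summarize_table(sample_response)
print(summary["columns"])
```

With real credentials, `summarize_table(fetch_table("example_db", "web_logs"))` would produce the same kind of summary for an actual catalog entry.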
Athena can read data in multiple formats, including CSV, TSV, JSON, Apache ORC, Apache Parquet, Apache Avro, and plain text files. It supports simple data types such as integers, doubles, and varchars, as well as complex data types such as maps, arrays, and structs.
For DDL, that is, creating, modifying, and deleting tables and partitions, Athena uses Apache Hive, a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. It relies on SerDe (serializer/deserializer) libraries to tell Hive how to interpret each data format.
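The sketch below shows roughly what such a Hive-style DDL statement can look like, generated from Python so the pieces are easy to see: simple and complex column types, a partition column, a SerDe class, and an S3 location. The helper `create_table_ddl`, the table and column names, and the bucket are hypothetical; the SerDe shown is the OpenX JSON SerDe, one of several options for JSON data.

```python
def create_table_ddl(table, columns, location, serde, partitions=None):
    """Build a Hive-style CREATE EXTERNAL TABLE statement as a string."""
    column_sql = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {column_sql}\n)\n"
    if partitions:
        partition_sql = ", ".join(f"`{name}` {dtype}" for name, dtype in partitions)
        ddl += f"PARTITIONED BY ({partition_sql})\n"
    # The SerDe class tells Hive how to parse each record in the files.
    ddl += f"ROW FORMAT SERDE '{serde}'\n"
    ddl += f"LOCATION '{location}'"
    return ddl


# Hypothetical table mixing simple types with array, map, and struct columns.
ddl = create_table_ddl(
    table="web_logs",
    columns=[
        ("request_id", "string"),
        ("status", "int"),
        ("tags", "array<string>"),
        ("headers", "map<string,string>"),
        ("client", "struct<ip:string,port:int>"),
    ],
    partitions=[("dt", "string")],
    serde="org.openx.data.jsonserde.JsonSerDe",
    location="s3://example-bucket/web_logs/",
)
print(ddl)
```

The resulting statement can be run in the Athena query editor or submitted like any other query; switching formats is mostly a matter of changing the SerDe (or using a `STORED AS` clause for formats such as Parquet and ORC).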