AWS Athena

Data analysis is a complex process, and there have always been efforts to make it easier. Many analytics tools exist, and Amazon offers one as an AWS service called Amazon Athena. This Amazon Athena tutorial will guide you through basic and advanced use of Amazon Athena. Amazon Athena is an interactive query service used to process complex queries in a relatively short amount of time. It is serverless, so there is no setup hassle and no infrastructure to manage. It is not a database service; you pay only for the queries you run. You simply point Athena at your data in S3, define the necessary schema, and query it with standard SQL.

Introduction to Amazon Athena

Amazon launched Athena as one of its services in November 2016. As mentioned earlier, Amazon Athena is a serverless query service that analyzes data stored in Amazon S3 using standard SQL. With a few clicks in the AWS Management Console, customers can point Amazon Athena at their data in Amazon S3 and run standard SQL queries to retrieve results in seconds. With Amazon Athena, there is no infrastructure to set up or manage, and customers pay only for the queries they run. Amazon Athena scales automatically, executing queries in parallel, so results come back quickly even with large datasets and complex queries.
Creating a Table in Athena

This tutorial uses live resources, so you are charged only for the queries you run, not for the datasets you use; however, standard storage charges apply if you upload your data files to Amazon S3. To query data in an S3 file, you must have an external table associated with the file's structure. We can create external tables in two ways: manually, or by using an AWS Glue Crawler.
To create an external table manually, write a CREATE EXTERNAL TABLE statement, specifying the correct file format and the exact S3 location (the screenshot "Creating an external table manually" shows an example). The created external tables are stored in the AWS Glue Data Catalog. Alternatively, a Glue Crawler parses the input file structure and creates a metadata table definition in the Glue Data Catalog. The Crawler uses an AWS IAM (Identity and Access Management) role to access the archived data and the Data Catalog; the role you pass to the Crawler must have permission to access the crawled Amazon S3 paths. To create the table with a Crawler:

1. Go to AWS Glue, select "Add Table," and choose the option "Add Table Using Crawler."
2. Give the Crawler a name, for example, car-crawler.
3. Choose the path in Amazon S3 where the file is saved. If you plan to query only one file, you can choose the S3 file path; choose the S3 folder path to query all files in that folder with the same structure.
4. Create an IAM role with permissions on the target S3 object you want to query, or select an existing IAM role that has sufficient privileges to access the S3 object.
5. Choose a database to contain the external tables and, optionally, a prefix to be added to the external table name.
6. Click Finish to create the Glue Crawler.
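The manual approach described above boils down to submitting one DDL statement. Here is a minimal sketch of the kind of CREATE EXTERNAL TABLE statement Athena expects for JSON data; the table name, columns, and bucket path are hypothetical examples, not values from this tutorial:

```python
# Build an Athena CREATE EXTERNAL TABLE statement for JSON files in S3.
# Table name, columns, and S3 location below are made-up examples.
def build_create_table_ddl(table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for JSON data in S3."""
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {cols}\n"
        f")\n"
        f"ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )

ddl = build_create_table_ddl(
    "json_files",
    [("id", "int"), ("name", "string")],
    "s3://my-example-bucket/json-files/",
)
print(ddl)

# The statement can then be pasted into the Athena console, or run
# programmatically, e.g. with boto3 (not executed here):
#   boto3.client("athena").start_query_execution(
#       QueryString=ddl,
#       ResultConfiguration={"OutputLocation": "s3://my-example-bucket/results/"})
```

The OpenX JSON SerDe shown is one of the SerDes Athena supports for JSON; a different file format would use a different ROW FORMAT clause.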
The external table has been created under the specified database. Now you can query the S3 data through it.
Since we uploaded one file, the query "select * from json_files" returns the single record in that file. Let's put another file with the same structure in the same S3 folder and query the external table again. This time, two rows are returned instead of one, because there are now two files with the desired structure in the S3 folder. You can perform many operations on the data; for example, a query can UNNEST an array in the result set.

Accessing Amazon Athena

Athena is very easy to reach: you can access it through the AWS Management Console, through a JDBC or ODBC connection, with the Athena API or AWS SDKs, or from the AWS CLI. By now, you know everything important about Amazon Athena, so let me tell you about its different features.

Features of Athena

Among the many services provided by Amazon, Athena has several features that make it well suited for data analysis. Let's take a look at them one by one.
By now, you should be impressed with AWS Athena, and you know a lot about it. Let's roll up our sleeves and understand how Athena works by doing a small demo.

Demo (Comparison between Amazon Athena and MySQL)

In this part of the Amazon Athena tutorial, we will compare MySQL and Athena and see how even simple queries take less time to execute in Athena.
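A comparison like the one in this demo needs a consistent way to time each query round-trip. The sketch below is one minimal harness, assuming you wrap each engine's call (a MySQL `cursor.execute`, or an Athena `start_query_execution` plus result polling) in a callable; the `fake_query` stand-in is hypothetical:

```python
import time

def time_query(run_query, *args):
    """Return (result, elapsed_seconds) for any callable that runs a query."""
    start = time.perf_counter()
    result = run_query(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for a real client call; replace with the MySQL or Athena call
# you want to measure.
def fake_query():
    return [("row", 1)]

result, elapsed = time_query(fake_query)
print(f"returned {len(result)} row(s) in {elapsed:.6f}s")
```

Timing both engines with the same wrapper keeps the comparison fair, since client-side overhead is measured identically in each case.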
AWS Athena vs. AWS Glue

Since its initial release in August 2017, AWS Glue has operated as a fully managed Extract, Transform, and Load (ETL) service. It comes with three primary components: the AWS Glue Data Catalog (a central metadata repository), an ETL engine that generates code to transform data, and a scheduler that handles job runs.
AWS Glue helps you find and transform data sets and prepare them for discovery and querying, so it pairs naturally with AWS Athena: the Glue Data Catalog creates, stores, and retrieves the table metadata (schemas) that Athena uses when it runs queries.

What are the advantages and disadvantages of using AWS Athena?

AWS Athena, as it turns out, is a double-edged sword: the features that make it cheap and accessible are the same ones that may limit you somewhat.

Pros of AWS Athena

- Serverless: there is no infrastructure to provision or manage.
- Pay-per-query pricing: you are charged only for the data your queries scan.
- Standard SQL: Athena is built on Presto and supports ANSI SQL.
- Automatic scaling: queries run in parallel, so even large datasets return results quickly.
- Tight integration with the AWS Glue Data Catalog.
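The Catalog integration just described means a table's schema lives in Glue, not in Athena itself. The sketch below shows how that metadata might be read; the response shape mirrors boto3's `glue.get_table()`, but the sample payload is made up for illustration:

```python
# Extract column metadata from an AWS Glue Data Catalog GetTable
# response, the same metadata Athena resolves when it plans a query.
# The sample_response dict below is a fabricated example payload.
def extract_columns(get_table_response):
    """Return [(name, type), ...] from a Glue GetTable response dict."""
    descriptor = get_table_response["Table"]["StorageDescriptor"]
    return [(c["Name"], c["Type"]) for c in descriptor["Columns"]]

sample_response = {
    "Table": {
        "Name": "json_files",
        "StorageDescriptor": {
            "Columns": [
                {"Name": "id", "Type": "int"},
                {"Name": "name", "Type": "string"},
            ],
            "Location": "s3://my-example-bucket/json-files/",
        },
    }
}

print(extract_columns(sample_response))
# [('id', 'int'), ('name', 'string')]
```

In real use, `sample_response` would come from `boto3.client("glue").get_table(DatabaseName=..., Name=...)`.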
Cons of AWS Athena

- No control over the underlying compute resources, so performance can vary from run to run.
- Costs grow with the amount of data scanned unless you partition, compress, or convert data to columnar formats.
- It queries data in Amazon S3 only and is not a substitute for a transactional database.
- Shared, multi-tenant resources mean limits apply to the number of concurrent queries you can run.
How is AWS Athena priced?

As we've already said, AWS Athena follows a pricing schedule that charges you based on the queries you run in your data analysis. Amazon measures the bytes scanned by each query and rounds the amount up to the nearest megabyte, with 10 MB being the minimum per query. You should expect to pay $5 for every terabyte (TB) of data scanned. Meanwhile, you are not charged for failed queries, statements for managing partitions, or Data Definition Language (DDL) statements. But that's not all: Amazon further makes it possible to reduce per-query costs by as much as 30% to 90%. You just need to partition, compress, or convert your data to a columnar format.
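The pricing rule above ($5 per TB scanned, rounded up to the nearest megabyte, 10 MB minimum per query) can be sketched as a small cost estimator; the rounding-up behavior is taken from the text and the function name is our own:

```python
import math

PRICE_PER_TB = 5.00    # USD per terabyte scanned (rate quoted in the text)
MB = 1024 ** 2
TB = 1024 ** 4
MIN_BILLED_MB = 10     # 10 MB minimum billed per query

def query_cost(bytes_scanned):
    """Estimate the cost of one Athena query in USD."""
    billed_mb = max(math.ceil(bytes_scanned / MB), MIN_BILLED_MB)
    return billed_mb * MB / TB * PRICE_PER_TB

# A query scanning a full terabyte costs the full $5:
print(round(query_cost(1 * TB), 2))   # 5.0
# A tiny query is still billed for the 10 MB minimum:
print(query_cost(1024))
```

This also makes the savings claim concrete: halving the bytes scanned (say, by converting to a columnar format that lets Athena read only the needed columns) halves the query cost.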