Download files from Athena

For some use cases you can do the work where the data lives using SQL or Spark, but sometimes it's more convenient to load the data into a language like Python or R, which have a wider range of tools. Presto, and Amazon's managed version Athena, are very powerful tools for preparing and exporting data. They can query data files directly in S3 (and, for Presto, in HDFS), as well as many common databases via Presto connectors or Athena's federated queries.

They have a powerful query language and can process large volumes of data quickly, in memory, across a cluster of commodity machines.

This approach is very robust and, for large data files, a very quick way to export the data. I will focus on Athena, but most of it applies to Presto via presto-python-client with some minor changes to DDLs and authentication. There is another way, directly reading the output of a query as a CSV from S3, but it has some limitations.
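
For reference, a minimal presto-python-client connection looks something like the sketch below; the coordinator host, catalog, and schema are placeholder values, not part of the original setup.

import prestodb

# Connect to a Presto coordinator (host, catalog and schema are placeholders).
conn = prestodb.dbapi.connect(
    host="presto.example.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cursor = conn.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())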

PyAthena is a good library for accessing Amazon Athena, and it works seamlessly once you've configured the credentials. However, the fetch method of the default database cursor is very slow for large result sets, from around 10MB upwards.
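
As a rough sketch of that slow path, this is what the default cursor usage looks like with PyAthena; the staging bucket, region, and table names are placeholders.

from pyathena import connect

conn = connect(
    s3_staging_dir="s3://example-bucket/athena-results/",  # placeholder bucket
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM example_db.example_table")
rows = cursor.fetchall()  # row-by-row fetching gets slow beyond roughly 10MB of results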

Instead, it's much faster to export the data to S3 and then download it into Python directly. Reading data out of Athena this way comes down to a few steps: run the query, wait for it to finish, then fetch the result file it wrote to S3. This avoids complex looping logic over fetched rows.
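
Here is a minimal sketch of that approach, assuming placeholder bucket and table names: run the query through PyAthena, then download the CSV that Athena wrote to S3 and load it with pandas.

import boto3
import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir="s3://example-bucket/athena-results/",  # placeholder bucket
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM example_db.example_table")

# The cursor exposes the S3 location of the query's CSV output; download it
# in one go with boto3 instead of paging rows through the cursor.
bucket, key = cursor.output_location.replace("s3://", "").split("/", 1)
boto3.client("s3").download_file(bucket, key, "result.csv")
df = pd.read_csv("result.csv")

Downloading the whole result file is a single S3 GET, which is why it is so much faster than paging rows through the Athena API.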

Here is how to check the job status and wait until query execution is finished. When you create an Athena table you have to specify the query output folder, the data input location, and the file format.
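
A sketch of both steps with boto3, assuming placeholder bucket, database, and table names: the DDL declares the input location and format, and a simple polling loop waits for the execution to finish.

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# A DDL that declares the input location and file format (CSV here) for the table.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS example_db.example_table (
    id bigint,
    name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://example-bucket/input-data/'
"""

execution = athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Wait until the query execution is finished before moving on.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)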

Once you execute a query it generates a CSV file. To catalogue the input data, start by creating a Glue Crawler from the AWS Glue console. For our example, all the defaults in the add-crawler wizard are fine; the only thing you need to do is select the S3 location where you uploaded the CSV. After the crawler is ready, run it from the console. Since ours is a small file, it finishes in a couple of minutes. We are almost ready to start querying; there is just one last thing to set up. Athena needs an S3 location to save query results to.
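
The same crawler setup can be scripted with boto3 if you prefer; the crawler name, IAM role, database, and S3 path below are placeholder values.

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Point a crawler at the S3 prefix holding the uploaded CSV (placeholder values).
glue.create_crawler(
    Name="example-csv-crawler",
    Role="arn:aws:iam::123456789012:role/ExampleGlueRole",
    DatabaseName="example_db",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/input-data/"}]},
)

# Run it; for a small file it finishes in a couple of minutes.
glue.start_crawler(Name="example-csv-crawler")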

This is unlike traditional database clients like MySQL Workbench, where you send a query, get a result, and discard it. All Athena results are saved to S3 as well as being shown on the console.
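
To illustrate, the saved result files can be listed straight from the configured output location; the bucket and prefix here are placeholders.

import boto3

s3 = boto3.client("s3")

# Each query leaves a <QueryExecutionId>.csv (plus a .metadata file) under the
# result location, so earlier results can be re-downloaded at any time.
response = s3.list_objects_v2(Bucket="example-bucket", Prefix="athena-results/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])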

Select AwsDataCatalog as the data source, choose the database where your crawler created the table, and then preview the table data.
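
The preview is just a limited select under the hood; a sketch like this, with placeholder database and table names, shows the first few rows.

from pyathena import connect

conn = connect(
    s3_staging_dir="s3://example-bucket/athena-results/",  # placeholder bucket
    region_name="us-east-1",
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM example_db.example_table LIMIT 10")
for row in cursor.fetchall():
    print(row)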
