Below you will find pages that use the taxonomy term “presto”
Posts
Querying S3 with Presto
This post assumes you have an AWS account and a running Presto instance (standalone or cluster). We’ll use the Presto CLI to run queries against the Yelp dataset, a JSON dump of a subset of Yelp’s data covering businesses, reviews, check-ins, users and tips.
Configure Hive metastore
Configure the Hive metastore to point at our data in S3. We are using the Docker container inmobi/docker-hive
Posts
Creating a Presto Cluster
I first came across Presto when researching data virtualization: the idea that all of your data can be integrated regardless of its format or storage location. One can use scripts or periodic jobs to mash up data or create regular reports from several independent sources. However, these methods don’t scale well, especially when the queries change frequently or the data is ingested in real time. Presto lets one query a variety of data sources using SQL and presents the data in a standard table format, where it can be manipulated and JOINed like traditional relational data.