Creating a Go module

We’re going to create a CLI tool for sending a message to a Slack channel from the command line. This post is similar to my earlier post, Creating an Elixir Module. We’ll be using the chat.postMessage Slack API endpoint, so make sure you have a Slack API token. Our CLI syntax will be:

$ ./slack -message 'hello world!' -channel @slackbot

First, make sure you have your $GOPATH set properly.
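Before porting it to Go, the endpoint call the tool wraps can be sketched in Python; the token and channel values below are placeholders, not details from the post.

```python
# Sketch of the chat.postMessage call the CLI wraps (the post implements
# this in Go). Token and channel values here are placeholders.
import json
import urllib.parse
import urllib.request

SLACK_API_URL = "https://slack.com/api/chat.postMessage"

def build_payload(token, channel, message):
    """Form-encoded parameters expected by chat.postMessage."""
    return {"token": token, "channel": channel, "text": message}

def post_message(token, channel, message):
    """Send the message; requires a valid Slack API token."""
    data = urllib.parse.urlencode(build_payload(token, channel, message)).encode()
    req = urllib.request.Request(SLACK_API_URL, data=data)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(build_payload("xoxb-...", "@slackbot", "hello world!"))
```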
supervisor is a UNIX utility for managing and respawning long-running Python processes to ensure they are always running. Or, according to its website: Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.

Installation

supervisor can be installed with pip:

$ pip install supervisor

Given a script test_proc.py, start the process under supervisor with:

$ sudo supervisorctl start test_proc

Now it will run forever and you can see the process running with
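For supervisorctl to know about test_proc, supervisord needs a program stanza for it first; a minimal example (the paths below are illustrative) would look something like this in supervisord.conf:

```ini
[program:test_proc]
command=python /path/to/test_proc.py   ; illustrative path
autostart=true
autorestart=true
stdout_logfile=/var/log/test_proc.out.log
stderr_logfile=/var/log/test_proc.err.log
```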
Querying S3 with Presto

This post assumes you have an AWS account and a Presto instance (standalone or cluster) running. We’ll use the Presto CLI to run the queries against the Yelp dataset. The dataset is a JSON dump of a subset of Yelp’s data for businesses, reviews, checkins, users and tips.

Configure Hive metastore

Configure the Hive metastore to point at our data in S3. We are using the docker container inmobi/docker-hive
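On the Presto side, the Hive connector is pointed at that metastore through a catalog properties file, typically etc/catalog/hive.properties; the hostname and credentials below are placeholders:

```ini
# etc/catalog/hive.properties -- hostname and keys are placeholders
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=YOUR_ACCESS_KEY
hive.s3.aws-secret-key=YOUR_SECRET_KEY
```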
Creating a Presto Cluster

I first came across Presto when researching data virtualization: the idea that all of your data can be integrated regardless of its format or storage location. One can use scripts or periodic jobs to mash up data or create regular reports from several independent sources. However, these methods don’t scale well, especially when the queries change frequently or the data is ingested in real time. Presto allows one to query a variety of data sources using SQL and presents the data in a standard table format, where it can be manipulated and JOINed like traditional relational data.
Creating an Elixir module

To get a better handle on Elixir, I developed a simple CLI tool for sending files in Slack. To create a new project, run

$ mix new slack_bot

This creates a new Elixir project which looks like this:

├── README.md
├── config
│   └── config.exs
├── lib
│   └── slack_bot.ex
├── mix.exs
└── test
    ├── slack_bot_test.exs
    └── test_helper.exs

Navigate to the lib folder and create a folder inside it called slack_bot.

Git aliases

Here’s a quick post for managing your git shortcuts. If you use git regularly, you should have a .gitconfig file in your home directory that looks something like this:

[user]
    email = your@email.com
    name = Your name

You can add an alias section like so:

[user]
    email = your@email.com
    name = Your name
[alias]
    ls = log --oneline
    uom = push -u origin master

These aliases can be used like so:
Recently, I have been working with the Python API for Spark to use distributed computing techniques to perform analytics at scale. When you write Spark code in Scala or Java, you can bundle your dependencies in the jar file that you submit to Spark. However, when writing Spark code in Python, dependency management becomes more difficult because each of the Spark executor nodes performing computations needs to have all of the Python dependencies installed locally.
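One common workaround (not necessarily the approach this post settles on) is to zip pure-Python dependencies and ship the archive to the executors via spark-submit --py-files or SparkContext.addPyFile; building such an archive needs only the standard library:

```python
# Bundle a directory of pure-Python dependencies into a zip that can be
# shipped to Spark executors with `spark-submit --py-files deps.zip`.
# A generic workaround, not necessarily the post's final approach.
import os
import zipfile

def bundle_dependencies(src_dir, archive_path):
    """Zip every .py file under src_dir, preserving package layout."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    # Store paths relative to src_dir so imports resolve
                    # once the zip is on the executor's sys.path.
                    zf.write(full, os.path.relpath(full, src_dir))
    return archive_path
```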
To help facilitate my blogging workflow, I wanted to go from written to published post quickly. My general workflow for writing a post for this blog looks like this:

1. Create a post in _posts
2. Write the post
3. Run fab sync

Here is the repo. fab sync is a custom command that uses the magic of Fabric to stage, commit and push changes in my blog repo to Github. Next, Fabric uses an SSH session in the Python process to connect to the server on which my blog is hosted, pull down the newest changes from the blog repo and, finally, build the Jekyll blog so that the changes are immediately reflected on this site.
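Without the actual fabfile in front of us, the local half of that flow can be sketched as plain commands; the branch name is an assumption, and the real task additionally opens an SSH session to the server to pull and rebuild the Jekyll site.

```python
# Sketch of the local half of `fab sync`: stage, commit, and push the blog
# repo. The branch name is an assumption; the real Fabric task then SSHes
# to the server to pull the changes and run the Jekyll build.
import subprocess

def sync_commands(commit_message, branch="master"):
    """The git commands `fab sync` would run locally, in order."""
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", commit_message],
        ["git", "push", "origin", branch],
    ]

def sync(commit_message):
    for cmd in sync_commands(commit_message):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```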
If you have a lot of servers to which you frequently connect, keeping track of IP addresses, pem files, and credentials can be tedious. SSH config files are great for this problem, but they don’t play well with bash. I wanted to store all of my hosts’ info in a config file but still have access to the HostNames since sometimes I just need the IP address of a server to use elsewhere.
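As a taste of the lookup problem, here is one way (not necessarily the post's) to pull the HostName for a given Host alias back out of an ssh config; the sample host entry is made up.

```python
# Look up the HostName for a Host alias in an ssh config. The parsing is
# deliberately naive (no Match blocks or wildcard patterns); one possible
# approach, not necessarily the post's.
def hostname_for(alias, config_text):
    current = None
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() == "host":
            current = parts[1]
        elif len(parts) >= 2 and parts[0].lower() == "hostname" and current == alias:
            return parts[1]
    return None

# Made-up example entry for illustration.
SAMPLE = """\
Host web
    HostName 203.0.113.10
    User ubuntu
    IdentityFile ~/keys/web.pem
"""

print(hostname_for("web", SAMPLE))  # -> 203.0.113.10
```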
Bash aliases are great. Whether you use them to quickly connect to servers or just soup up the standard bash commands, they are a useful tool for eliminating repetitive tasks. I’m always adding new ones to optimize my workflow which, of course, leads me to create aliases to optimize that workflow. While there are more complete CLI alternatives for alias management like aka, I prefer two simple commands for managing my aliases, which I keep in ~/.