---
date: "2024-01-28T19:20:12.000Z"
title: "2024-01-28"
draft: false
---

I'm currently working on building a language model based chatbot that can answer questions about the contents of a database.
There are a lot of products and libraries making efforts at this problem.
To start, I tried out the [Vanna.ai](https://github.com/vanna-ai/vanna) open source library.
I followed [this guide](https://vanna.ai/docs/postgres-openai-standard-chromadb.html) to get started with ChromaDB for the indices and OpenAI as the language model to query a Postgres database.
I also set up a Postgres database with Docker and the Chinook dataset.
I downloaded the Chinook dataset for Postgres from this [repo](https://github.com/xivSolutions/ChinookDb_Pg_Modified/tree/master).
The dataset is described in detail [here](https://docs.yugabyte.com/preview/sample-data/chinook/).
To start up Docker and load the data, I ran the following from my host machine (not inside a Docker container)

```sh
# start postgres
docker run --name vanna-postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres

# create the database
createdb -h localhost -p 5432 -U postgres chinook

# load the data
psql -h localhost -p 5432 -U postgres chinook -1 -f ~/path.to/chinook_pg_serial_pk_proper_naming.sql
```

Vanna indexes the database content into, then can take a question and convert it into a SQL query.
Finally, it executes a SQL query and returns the results as a dataframe.