---
date: "2024-12-17T21:29:34.000Z"
title: "2024-12-17"
tags: ["delta"]
draft: false
---

I've been setting up the foundations to add node summaries to Delta.
Ideally, I'll use the same model to create the node summaries as I use to generate the responses, since this keeps model dependencies minimal.
However, my early experiments show that a shared prompt behaves inconsistently across models. To understand this and smooth it out as much as possible, I plan to set up evals to ensure each summary

- is within a certain length
- doesn't include direct references to the user or assistant
- is a single sentence or fragment
- includes the relevant topic(s) discussed
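
As a rough sketch, the checks above could be written as plain Python predicates. The 80-character limit, the `user`/`assistant` word list, the punctuation-based sentence heuristic, and the keyword-based topic match are all assumptions for illustration, not settled choices.

```python
import re

MAX_CHARS = 80  # assumed limit; no specific length has been chosen yet

# Assumed definition of a "direct reference" to a speaker
SPEAKER_WORDS = re.compile(r"\b(user|assistant)\b", re.IGNORECASE)


def within_length(summary: str, max_chars: int = MAX_CHARS) -> bool:
    """Non-empty and no longer than max_chars characters."""
    return 0 < len(summary.strip()) <= max_chars


def no_speaker_reference(summary: str) -> bool:
    """True when the summary never mentions the user or the assistant."""
    return SPEAKER_WORDS.search(summary) is None


def single_sentence(summary: str) -> bool:
    """Heuristic: one line, with no sentence-ending punctuation followed by more text."""
    text = summary.strip()
    return "\n" not in text and re.search(r"[.!?]\s+\S", text) is None


def mentions_topics(summary: str, topics: list[str]) -> bool:
    """Naive keyword match: every expected topic appears in the summary."""
    lowered = summary.lower()
    return all(topic.lower() in lowered for topic in topics)


def run_checks(summary: str, topics: list[str]) -> dict[str, bool]:
    """Run all four checks and return a name -> pass/fail mapping."""
    return {
        "length": within_length(summary),
        "no_speaker_reference": no_speaker_reference(summary),
        "single_sentence": single_sentence(summary),
        "topics": mentions_topics(summary, topics),
    }
```

The topic check is the weakest of the four: a keyword match will miss paraphrases, so it may need a model-graded check instead.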

I am currently looking for a straightforward, local option for running evals.
I'd like to avoid cloud products, and testing across multiple models needs to be easy.
My ideal output would be a matrix of models against eval checks, showing the result of each check on each model's output for the same system and user prompts.
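
To make the matrix idea concrete, here is a minimal sketch of the shape I have in mind. The model names, the two toy checks, and the output strings are hypothetical placeholders; in practice the outputs would come from running each model with the same system and user prompts.

```python
from typing import Callable

Check = Callable[[str], bool]


def build_matrix(outputs: dict[str, str], checks: dict[str, Check]) -> dict[str, dict[str, bool]]:
    """Map each model name to the pass/fail result of every check on its output."""
    return {
        model: {name: check(text) for name, check in checks.items()}
        for model, text in outputs.items()
    }


def format_matrix(matrix: dict[str, dict[str, bool]]) -> str:
    """Render the matrix as a plain-text table: one row per model, one column per check."""
    if not matrix:
        return ""
    check_names = list(next(iter(matrix.values())))
    widths = {name: max(len(name), 4) for name in check_names}
    model_width = max(len("model"), *(len(model) for model in matrix))

    header = [name.ljust(widths[name]) for name in check_names]
    rows = ["  ".join(["model".ljust(model_width), *header]).rstrip()]
    for model, results in matrix.items():
        cells = [
            ("pass" if results[name] else "FAIL").ljust(widths[name])
            for name in check_names
        ]
        rows.append("  ".join([model.ljust(model_width), *cells]).rstrip())
    return "\n".join(rows)


# Toy checks standing in for the real eval checks (assumed thresholds)
checks: dict[str, Check] = {
    "length": lambda s: len(s) <= 80,
    "one_line": lambda s: "\n" not in s.strip(),
}

# Placeholder strings, not real model output
outputs = {
    "model-a": "Local eval tooling for node summaries",
    "model-b": "The user asks about evals.\nThe assistant answers.",
}

matrix = build_matrix(outputs, checks)
print(format_matrix(matrix))
```

Keeping the matrix as a nested dict means it could just as easily be dumped to JSON or CSV as printed.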