---
date: "2024-03-23T08:47:09.000Z"
title: "2024-03-23"
tags: ["language_models","claude3","gpt4","connections"]
draft: false
---

> One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to.

The [whole "LLMs are useful" section](https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/#llms-are-useful) hits for me.
My experience is similar to Simon's, and I also wouldn't claim LLMs are without issue or controversy.
For me they are unquestionably useful.
They help me more quickly and effectively get ideas out of my head and into the real world.
They help me learn more quickly and solve problems and answer questions as I'm learning.
They increase my capabilities as an individual in the same sort of way getting access to Google for the first time did.
These days, _not_ having access to a language model makes me feel like I've had an essential tool taken away from me, like not having a calculator or documentation.

<https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/>

---

The models `claude-3-opus`, `claude-3-sonnet` and `gpt-4` solved today's Connections on their first tries.
All but `opus` failed on subsequent tries.

`gpt-3.5-turbo` doesn't seem to understand that it is supposed to be playing the game, not me.
`groq-llama-2` got just 1/4 correct.
`mistral-large` got 2/4.

As an intermediate player of this game, I would say today's puzzle was on the easier side.
The models don't usually do so well.
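
For reference, here is a minimal sketch of how a score like "2/4" could be computed, assuming a model's answer has already been parsed into four groups of four words.
The function name `score_groups` and the sample words are hypothetical and are not taken from my actual script or from today's puzzle.

```python
def score_groups(guess, solution):
    """Count how many guessed groups exactly match a solution group.

    Both arguments are lists of groups, each group a list of words.
    Word order within a group and letter case are ignored.
    """
    def normalize(groups):
        return {frozenset(word.strip().lower() for word in group) for group in groups}

    # A group only counts if all four of its words belong together.
    return len(normalize(guess) & normalize(solution))


# Made-up puzzle for illustration only.
solution = [
    ["red", "blue", "green", "pink"],
    ["oak", "elm", "pine", "fir"],
    ["cat", "dog", "cow", "pig"],
    ["one", "two", "six", "ten"],
]
guess = [
    ["Blue", "red", "pink", "green"],  # correct, different order and case
    ["oak", "elm", "pine", "cat"],     # one word off
    ["fir", "dog", "cow", "pig"],      # one word off
    ["ten", "six", "two", "one"],      # correct
]
print(f"{score_groups(guess, solution)}/4")
```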

---

[BOOX Palma](https://shop.boox.com/products/palma) is a mobile phone that uses an E Ink display.
I immediately wanted this when I saw it.

---

While working on a script that evaluates language models' ability to play Connections, I was using Cursor to make some improvements to the code.
On a whim, I decided to have it swap all my convenience print statements to log statements.
I've done this on a smaller scale, but with a ~200-line file I knew it would probably take the model longer than doing it myself.
As I watched Cursor regenerate my file via inference and show a diff, I was struck by the idea that running complex, multi-pass LLM jobs over code bases with targeted instructions may become the new "[my code is compiling](https://xkcd.com/303/)".
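
The print-to-log swap itself looks roughly like the sketch below, assuming the script is Python and uses the standard `logging` module.
The logger name `connections_eval` and the `report_result` helper are hypothetical, not the names from my file.

```python
import logging

# Configure logging once, near the script's entry point.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("connections_eval")  # hypothetical logger name


def report_result(model: str, correct: int, total: int = 4) -> str:
    """Log one model's score and return the same message as a string."""
    # Before: print(f"{model} got {correct}/{total} correct")
    message = f"{model} got {correct}/{total} correct"
    # After: pass the values as arguments so formatting is skipped
    # when the INFO level is disabled.
    logger.info("%s got %d/%d correct", model, correct, total)
    return message


report_result("mistral-large", 2)
```

Unlike `print`, the log calls can be silenced or redirected to a file by changing the `basicConfig` call, without touching each call site.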

---

An infinitely zoomable Game of Life, simulating itself.
The simulation adjusts speed as you zoom so that each level moves at a perceptible speed, allowing you to really appreciate it.
Extremely well executed.

<https://oimo.io/works/life/>