Logs

I wrote this post almost two years ago, after the step change in LLM coding capabilities that came with Claude 3.5 Sonnet. In some ways it feels like we've come a long way. In others, it feels like this...

In addition to the emergence of writing that reads like LLM slop, or is just clearly LLM-written even when it isn't laden with the tropes that make it unbearable to read, I've noticed the adoption of...

LLMs love to generate websites with grids of cards that rise on mouseover. I am not a fan of this animation effect at all.

Experimenting with chainlink today and surprised how easy it makes it to ping pong review, testing and implementation between agents. The sqlite db is effectively the agent to agent communication...
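
To make the "database as message channel" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not chainlink's actual schema or API: the `messages` table and the `open_board`, `post`, and `claim_next` helpers are hypothetical names invented for illustration.

```python
import sqlite3


def open_board(path=":memory:"):
    """Open (or create) a shared message board. Hypothetical schema, not chainlink's."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sender TEXT NOT NULL,
               recipient TEXT NOT NULL,
               body TEXT NOT NULL,
               claimed INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn


def post(conn, sender, recipient, body):
    """One agent leaves a message for another; returns the new row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )
    return cur.lastrowid


def claim_next(conn, recipient):
    """Return the oldest unclaimed message for `recipient` and mark it claimed."""
    with conn:
        row = conn.execute(
            "SELECT id, sender, body FROM messages "
            "WHERE recipient = ? AND claimed = 0 ORDER BY id LIMIT 1",
            (recipient,),
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE messages SET claimed = 1 WHERE id = ?", (row[0],))
    return {"id": row[0], "sender": row[1], "body": row[2]}
```

In this sketch an implementer agent would call `post(conn, "implementer", "reviewer", "patch ready")` and a reviewer agent would poll `claim_next(conn, "reviewer")`. With several processes sharing one database file, the select-then-update pair would need an explicit write transaction to stay race-free.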

Your /dev directory looks like a small civilization of agents, apps, old experiments, forks, and active services

I keep hearing and reading the phrase "load bearing," and even though I suspect it isn't always LLM-generated, I can only read it that way.

For those with access to Mythos, it does not appear to be some kind of step function change in vulnerability detection. Rather, it seems to show code agents are good at finding vulnerabilities and...

There's a lot of chatter about models getting worse after launch, during peak hours, or on subscription plans compared with pay-per-use APIs.

I've started using Codex more consistently. It took me entirely too long to locate the plan usage page: https://chatgpt.com/codex/cloud/settings/analytics, roughly equivalent to Claude Code's...