
Written by Florin · Tech snippets for everybody
Latest articles
Online Evaluation Guardrails: Catching LLM Drift Before Users Do
Offline evals are necessary, but not enough. Once a system is in production, behavior drifts. Online evals are how you catch that drift…
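The teaser cuts off here, but the core idea lends itself to a sketch: score a small random sample of live traffic with a cheap check and alert when the pass rate drops. Everything below (the check, the sample rate, the 0.95 threshold) is an illustrative assumption, not the post's actual setup; a real system might use a judge model instead of a string heuristic.

```python
import random

def passes_guardrail(response: str) -> bool:
    # Cheap online check; a production system might call a judge model here.
    return bool(response.strip()) and "as an AI" not in response

def rolling_pass_rate(responses: list[str], sample_rate: float = 0.1) -> float:
    # Score a random sample of live traffic instead of every request.
    sampled = [r for r in responses if random.random() < sample_rate] or responses
    if not sampled:
        return 1.0
    return sum(passes_guardrail(r) for r in sampled) / len(sampled)

recent = ["Here is the summary you asked for.", "", "The refund policy is 30 days."]
if rolling_pass_rate(recent) < 0.95:  # threshold is an assumption
    print("ALERT: online eval pass rate dropped, investigate drift")
```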
Treating Prompts Like Code: A Production Prompt Lifecycle
Prompt work feels like magic until it breaks. The day I started treating prompts like code, everything became easier to debug: versions…
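One way to picture "prompts like code" is a versioned prompt object that lives in the repo and changes only through review. This is a minimal sketch of that idea; the names and version scheme are mine, not necessarily the post's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str      # bump on every change, like a package release
    template: str

# Prompts are checked into the repo, so every change gets a diff and a review.
SUMMARIZE_V2 = PromptVersion(
    name="summarize",
    version="2.1.0",
    template="Summarize the following text in {n_sentences} sentences:\n{text}",
)

def render(prompt: PromptVersion, **kwargs) -> str:
    return prompt.template.format(**kwargs)

print(render(SUMMARIZE_V2, n_sentences=2, text="LLMs drift in production..."))
```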
Building Golden Evaluation Datasets from Production Traces
I used to run evals on whatever dataset I had around. It was noisy, and I kept chasing false regressions that were just bad data. The fix…
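The gist, promoting only human-verified production traces into the eval set, fits in a few lines. The trace fields below are assumptions for illustration, not a specific tracing schema.

```python
# Sketch: promote reviewed production traces into a golden eval set.
traces = [
    {"input": "cancel my order", "output": "Order cancelled.", "reviewed": True,  "correct": True},
    {"input": "asdf??",          "output": "Sorry?",           "reviewed": False, "correct": None},
]

golden = [
    {"input": t["input"], "expected": t["output"]}
    for t in traces
    if t["reviewed"] and t["correct"]  # only human-verified traces make the cut
]
print(f"{len(golden)} golden cases")
```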
Algorithms · 3 posts

Valid Parentheses
Problem: Given a string containing just the characters '(', ')', '{', '}', '[', and ']', determine if the input string is valid. Description…
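The standard approach is a stack: push openers, pop and match on closers. A minimal sketch (the function name is mine):

```python
def is_valid(s: str) -> bool:
    # Map each closer to the opener it must match.
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif not stack or stack.pop() != pairs[ch]:
            return False  # closer with no matching opener
    return not stack  # valid only if every opener was matched

print(is_valid("({[]})"))  # True
print(is_valid("(]"))      # False
```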

Climbing Stairs
Problem: You are climbing a staircase with n steps. At each step, you can climb either 1 or 2 steps. Determine the number of distinct ways to…
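The count follows the recurrence ways(n) = ways(n-1) + ways(n-2), so a bottom-up loop with two variables suffices. A minimal sketch:

```python
def climb_stairs(n: int) -> int:
    # ways(n) = ways(n-1) + ways(n-2): a Fibonacci-style walk.
    prev, curr = 1, 1  # ways to reach step 0 and step 1
    for _ in range(2, n + 1):
        prev, curr = curr, prev + curr
    return curr

print(climb_stairs(3))  # 3: (1+1+1), (1+2), (2+1)
```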

Anagram
Problem: Given two strings, a and b, write a function to determine if a is an anagram of b. Description: This is a classic algorithm problem…
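Two strings are anagrams exactly when their character counts match, which makes the classic solution a one-liner. A minimal sketch:

```python
from collections import Counter

def is_anagram(a: str, b: str) -> bool:
    # Anagrams have identical character frequency counts.
    return Counter(a) == Counter(b)

print(is_anagram("listen", "silent"))  # True
print(is_anagram("rat", "car"))        # False
```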
LangChain · 6 posts
Schema-Driven Validation for Stable LLM Evaluations
My earliest eval runs looked fine until they didn't: a single malformed row could flip a whole report. That was on me. The fix was simple…
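The fix the teaser alludes to, validating every row against a schema before scoring, might look like this with pydantic. The field names here are assumptions for illustration.

```python
from pydantic import BaseModel, ValidationError

class EvalRow(BaseModel):
    question: str
    expected: str
    model_output: str
    score: float

def load_rows(raw_rows: list[dict]) -> list[EvalRow]:
    # Reject malformed rows up front instead of letting them skew the report.
    valid, rejected = [], []
    for row in raw_rows:
        try:
            valid.append(EvalRow(**row))
        except ValidationError as err:
            rejected.append((row, err))
    if rejected:
        print(f"Dropped {len(rejected)} malformed rows before scoring")
    return valid
```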
Why Averages Lie: Using Tags to Expose Hidden LLM Regressions
Averages hide real problems. I learned that the hard way: a bundle recommender looked stable overall, but a single region-specific slice was…
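The tag trick is easy to demonstrate with toy numbers: the overall mean looks healthy while one slice has collapsed. The data below is fabricated purely to illustrate the mechanism.

```python
import pandas as pd

# One row per eval case, tagged by region.
df = pd.DataFrame({
    "tag":   ["us", "us", "eu", "eu", "apac"],
    "score": [0.92, 0.90, 0.88, 0.45, 0.91],
})

print("overall mean:", df["score"].mean())  # looks fine in aggregate
print(df.groupby("tag")["score"].mean())    # the 'eu' slice exposes the regression
```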
Tracing LLM Pipelines: From Bug Report to Root Cause
The fastest way I've found to debug an LLM app isn't logs. It's a trace. A good trace shows exactly where the chain broke, which inputs…
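A hand-rolled span decorator shows what a trace captures that logs don't: per-step inputs, output, error, and latency, tied together. This is a sketch of the idea only; a real setup would ship spans to a tracing backend such as LangSmith or OpenTelemetry rather than printing them.

```python
import functools
import time
import uuid

def traced(step_name: str):
    # Minimal span: record inputs, output or error, and latency per step.
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            span = {"id": uuid.uuid4().hex[:8], "step": step_name,
                    "inputs": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                span["output"] = fn(*args, **kwargs)
                return span["output"]
            except Exception as err:
                span["error"] = repr(err)
                raise
            finally:
                span["latency_s"] = round(time.perf_counter() - start, 4)
                print(span)  # in practice, ship this to your tracing backend

        return inner
    return wrap

@traced("retrieve")
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]

retrieve("refund policy")
```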
