It feels like we suddenly woke up from a pleasant dream of eternal IT prosperity, and now find ourselves packed into a train of new magical technology, with no clear idea where it's heading. Whether it delivers us to an even brighter future or derails spectacularly around the next bend doesn't really matter at this point. All that's left is to look for the positives — and, all things considered, there seem to be quite a few.
The beginnings
Like many others in IT, my journey started with curiosity about new toys. First came Amazon CodeWhisperer and GitHub Copilot, circa 2023. Smarter code completion, then help with Bash and Python scripts, copy/paste into ChatGPT (with manual redaction for anonymization). Later came Cursor, and building larger chunks of Terraform and Pulumi.
It quickly became clear that LLMs can meaningfully ease the daily grind of a DevOps engineer — reviewing and analyzing logs, investigating post-mortem root causes, aggregating and analyzing multiple data sources, and so on. It also became possible to step back from implementation details and redirect the saved time and energy toward finding solutions rather than patching holes.
Documentation
"Is there documentation for this?" — anyone who has worked in IT long enough has heard that question. Who's got the bandwidth to write a thorough summary after implementing a change? Before moving on to the next task, you might jot down a few lines, though sometimes it's better not to bother at all — the only thing worse than no documentation is outdated documentation. Here, too, LLMs come to the rescue: a single sentence at the end of a chat thread, and you get a detailed write-up complete with diagrams and code references. The sheer volume and wordiness can quickly become a problem in itself, but new tools help here as well — wrap it all up as a RAG pipeline and use an LLM for semantic search.
Per aspera ad astra
The first serious attempt at Cursor rules: organizing and systematizing ad-hoc workflows, providing basic environment data — endpoints, snippets, output formats, and so on. Then CLAUDE.md, slash commands, agents, and eventually skills. With the right integrations in place — AWS, Kubernetes, Prometheus, Cloudflare, Grafana, databases (most already have MCP servers) — the previously chaotic DevOps workflow, plagued by frequent and costly context switching, starts to actually feel like coherent system management.
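For a concrete (and entirely illustrative) picture, such a rules file might look something like this — the endpoints and conventions are placeholders, not a prescribed format:

```markdown
# CLAUDE.md — environment conventions (illustrative example)
- Default Kubernetes context: staging; never switch to prod without asking.
- Prometheus endpoint: http://prometheus.staging.internal:9090 (read-only queries).
- Output format: summarize findings first, raw command output last.
- Terraform: always run `terraform plan` and show the diff; never `apply` autonomously.
```

The value is less in any single rule than in no longer re-explaining the environment at the start of every session.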
Security, of course
This all sounds wonderful, and one could stop here — but security always finds a way to crash the party. You could simply hand over all admin credentials and just "be careful." With that approach, though, a serious incident is not a matter of "if" but "when." For example, you might mix up environment contexts and restart a service in production instead of staging. The model can also get creative — even without a direct database password, it might pull one from a Kubernetes secret and connect to the production database, printing the password to the console in the process (more on privacy in a moment). After all, its job is to help by any means available, and it's all too easy to click "yes" without reading the suggested command first.
The safe approach is to significantly restrict production permissions for all components — not just to read-only, but also blocking access to sensitive data: Kubernetes secrets, AWS Secrets Manager or Parameter Store, production database tables, and so on. This needs to be done carefully, because a model asked to solve a problem may decide it needs production data after all, and can get remarkably creative about finding it — even when you explicitly tell it not to. I found it useful to have the model itself generate "penetration test" scenarios and then test them against my permission setup.
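One way to structure those model-generated "penetration test" scenarios is a small harness that runs each probe and asserts it gets denied. The probes below are hypothetical stand-ins — a real version would shell out to kubectl, the AWS CLI, or boto3 and map their permission errors onto `AccessDenied`:

```python
from typing import Callable

class AccessDenied(Exception):
    """Raised by a probe when the platform correctly refuses access."""

def probe_k8s_secret() -> None:
    # Stand-in for: kubectl get secret db-credentials -n prod -o json
    raise AccessDenied("secrets 'db-credentials' is forbidden")

def probe_secrets_manager() -> None:
    # Stand-in for: aws secretsmanager get-secret-value --secret-id prod/db
    raise AccessDenied("AccessDeniedException: not authorized")

def run_pentest(probes: dict[str, Callable[[], None]]) -> dict[str, str]:
    results = {}
    for name, probe in probes.items():
        try:
            probe()
            results[name] = "LEAK"    # probe succeeded: permissions are too broad
        except AccessDenied:
            results[name] = "denied"  # expected outcome
    return results

report = run_pentest({
    "k8s-secret": probe_k8s_secret,
    "secrets-manager": probe_secrets_manager,
})
print(report)  # → {'k8s-secret': 'denied', 'secrets-manager': 'denied'}
```

Any `LEAK` entry in the report is a gap in the permission setup — exactly the kind the model tends to find on its own when you least want it to.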
Privacy
The second concern is privacy. Either we accept that a large portion of our infrastructure details flow freely to the provider's servers and trust their privacy assurances, or we look for alternatives. We can use private models through AWS Bedrock, which gives us more control over data and potential compliance with GDPR, SOC 2, HIPAA, or ISO. Another option is running models locally. I'm currently testing what local models can do on a generously specced Mac Studio, but they still fall far short of the analytical capabilities and context window of Opus or even Sonnet, so sometimes you need to reach for something more powerful. Ultimately, a hybrid of Bedrock and local models seems optimal for my use case. The downside of Bedrock is cost: compared to an Anthropic subscription, pay-per-token API calls can add up surprisingly fast.
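The hybrid setup boils down to a simple routing rule: anything touching sensitive data stays on the local model, heavy analysis escalates to Bedrock, everything else defaults to local. A minimal sketch — the model identifiers and token threshold are placeholders, not real endpoints:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    sensitive: bool       # touches secrets, customer data, etc.
    context_tokens: int   # rough size of the context to analyze

# Placeholder identifiers for illustration only.
LOCAL_MODEL = "local/llama-70b"
BEDROCK_MODEL = "bedrock/claude-sonnet"

def route(task: Task, local_limit: int = 32_000) -> str:
    if task.sensitive:
        return LOCAL_MODEL    # sensitive data never leaves the machine
    if task.context_tokens > local_limit:
        return BEDROCK_MODEL  # too big for the local context window
    return LOCAL_MODEL        # default to the cheap local model

print(route(Task("summarize prod incident", sensitive=True, context_tokens=80_000)))
# → local/llama-70b
```

Note the ordering: sensitivity wins over context size, so an oversized sensitive task stays local (and gets chunked) rather than leaking to the cloud — that priority is the whole point of the hybrid.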
Agents
The entire world (at least the IT part of it) has recently gone wild over agents, so there was little choice but to go with the flow. From a security perspective, though, things look even more serious — especially since major incidents are not that rare, and happen even to the biggest players:
- https://ucstrategies.com/news/claude-code-wiped-out-2-5-years-of-production-data-in-minutes-the-post-mortem-every-developer-should-read/
- https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/
- https://particula.tech/blog/ai-agent-production-safety-kiro-incident
More on a proposed safe approach to autonomous DevOps agents in a separate note.
The bright future
Setting aside the extreme predictions about machines replacing humans any day now, the fact is that LLMs have irreversibly changed the nature of cloud infrastructure work, virtually eliminating the slog through logs, uncaught exceptions, and exotic metrics. In theory, the time and energy saved could be used to reduce technical debt and achieve the mythical goal of automating everything — only to nobly automate yourself out of a job. All while acquiring a hard-to-cure atrophy of real skills along the way.