I’ve been skeptical about the whole “AI in DevOps” thing for a while. You know the drill. You ask an LLM to write a playbook to update Nginx, and it hallucinates a module that hasn’t existed since Ansible 2.4. Or worse, it gives you YAML that looks perfect but fails silently because it invented a variable name.
So when I saw the tech preview for the Ansible Automation Platform (AAP) MCP server drop recently, I almost scrolled past it. Another “AI integration.” Great.
But then I actually looked at what MCP (Model Context Protocol) does. And I realized this isn’t just a chatbot writing code. This is a chatbot that has hands.
It changes the dynamic completely. Instead of pasting error logs into a chat window and hoping for a fix, the LLM can reach into your AAP instance, pull the logs itself, analyze the inventory, and—if you’re brave enough—trigger the remediation job. I spent the last weekend playing with this setup, and honestly? It’s the first time I’ve felt like AI is actually saving me time rather than just generating more text for me to debug.
Wait, What is MCP?
If you haven’t been paying attention to the open standards war lately (and I don’t blame you, it’s exhausting), MCP is basically a standardized way for AI models to talk to external tools. Think of it like a USB-C port for LLMs.
Before this, if you wanted an AI to interact with Ansible, you had to write a custom wrapper, maybe some janky Python script hitting the AAP API, and glue it all together with a specific model’s function-calling schema. Brittle. Annoying.
With the MCP server for AAP, you just spin up the server, point your MCP-compliant client (like Claude Desktop or a custom IDE extension) at it, and suddenly the model “knows” how to talk to Ansible. It exposes tools like list_jobs, get_job_events, and launch_job_template directly to the model context.
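To make that concrete, here’s a toy sketch of the kind of tool manifest an MCP server advertises to a client during the handshake. The tool names match the ones mentioned above; the schema fields are simplified and illustrative, not copied from the official AAP server.

```python
# Hypothetical, simplified tool manifest an MCP server might advertise.
# Field names loosely follow the MCP convention of a name, description,
# and JSON-schema input spec per tool; the details here are assumptions.
TOOLS = [
    {
        "name": "list_jobs",
        "description": "List recent AAP jobs, optionally filtered by status.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "limit": {"type": "integer"},
                "status": {"type": "string"},
            },
        },
    },
    {
        "name": "get_job_events",
        "description": "Fetch the event stream for one job.",
        "inputSchema": {
            "type": "object",
            "properties": {"job_id": {"type": "integer"}},
            "required": ["job_id"],
        },
    },
    {
        "name": "launch_job_template",
        "description": "Launch a job template by name.",
        "inputSchema": {
            "type": "object",
            "properties": {"template": {"type": "string"}},
            "required": ["template"],
        },
    },
]

def find_tool(name):
    """Resolve a model's tool call against the manifest, client-style."""
    return next((t for t in TOOLS if t["name"] == name), None)
```

The point is that the model never sees your API; it sees this menu of typed tools, which is why it stops hallucinating endpoints.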
The “Oh No, It’s Broken” Scenario
Let me walk you through the exact moment this clicked for me. It was Tuesday night. I was messing around with a new deployment pipeline for a web app I manage. Naturally, the deployment failed. Red text everywhere.
Usually, my workflow is:
- Log into the AAP dashboard.
- Find the failed job ID.
- Scroll through 4,000 lines of standard out.
- Find the one line that says “Permission denied.”
- Facepalm.
- Fix it.
This time, I had the MCP server running locally, connected to my AAP controller. I just typed into my chat interface:
“Why did the last job on the ‘Web Servers’ inventory fail?”
Here’s what happened in the background (I watched the debug logs because I don’t trust magic):
- The model called list_jobs(limit=1, status='failed').
- It got the JSON response with Job ID 442.
- It immediately called get_job_events(job_id=442) and filtered for “failed”.
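That filtering step is trivial to reproduce. Here’s a toy version: take the kind of JSON that get_job_events would return and pull out the failing task. The event shape is heavily simplified; real AAP job events carry many more fields.

```python
# Toy reproduction of the model's filtering step over job events.
# Event dicts are simplified stand-ins for the real AAP API payload.
def failed_events(events):
    """Return only the events where a task actually failed."""
    return [e for e in events if e.get("failed")]

sample_events = [
    {"task": "Install nginx", "host": "web-01", "failed": False},
    {"task": "Copy config file", "host": "web-01", "failed": True,
     "stdout": "Destination /etc/nginx/sites-available is not writable"},
]

failures = failed_events(sample_events)
```

The model then just summarized the surviving event in plain English, which is where the useful answer came from.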
The response I got back wasn’t a generic “Check your permissions.” It was specific:
“Job 442 failed on host ‘web-01’ during the ‘Copy config file’ task. The error was ‘Destination /etc/nginx/sites-available is not writable’. It looks like the become_user didn’t escalate privileges correctly.”
That is useful. That saves me five minutes of clicking.
Setting It Up (Without Losing Your Mind)
The setup is surprisingly straightforward if you’re comfortable with JSON config files. You aren’t installing a plugin inside AAP; you’re running a lightweight server (usually a container or a local Python process) that bridges the gap.
Here is a rough example of how I configured my local client to talk to the Ansible MCP server. You’ll need your AAP controller URL and a token (please, for the love of security, create a dedicated token with limited scopes, don’t use your admin creds).
```json
{
  "mcpServers": {
    "ansible-platform": {
      "command": "uv",
      "args": [
        "run",
        "mcp-server-ansible",
        "--controller-url", "https://aap.internal.example.com",
        "--api-token", "YOUR_TOKEN_HERE",
        "--ssl-verify", "false"
      ]
    }
  }
}
```
(Side note: Yes, I set SSL verify to false because it’s my lab environment. Don’t do that in prod. I know you will, but I have to say you shouldn’t.)
The Scary Part: Letting AI Drive
Reading logs is one thing. That’s passive. It’s safe. The real power—and the real danger—comes when you let the model take action.
After finding the permission error, I tried something risky. I asked:
“Run the ‘Remediate Permissions’ job template on web-01.”
The model recognized the tool launch_job_template. It asked me to confirm the parameters (which is a nice safety rail built into the protocol). I clicked “Approve,” and boom—job launched. I watched the status update in real-time in the chat window.
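The approve-before-launch rail is worth internalizing, because it’s the whole safety model here. A minimal sketch, with approve() and launch() as injected callables (both hypothetical, for illustration): nothing fires until the human says yes.

```python
# Minimal sketch of a human-in-the-loop launch gate. approve() and
# launch() are injected so the gate itself stays AAP-agnostic and testable.
def guarded_launch(template, params, approve, launch):
    """Launch a job template only after explicit human approval."""
    if not approve(template, params):
        return {"status": "rejected", "template": template}
    return launch(template, params)

# Example: the human clicks "deny", so the launch callable never runs.
rejected = guarded_launch(
    "Remediate Permissions", {"limit": "web-01"},
    approve=lambda t, p: False,
    launch=lambda t, p: {"status": "launched", "template": t},
)
```

In the real client, approve() is that confirmation dialog; the crucial property is that the model can only propose a launch, never perform one directly.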
This is where “ChatOps” finally feels real. We’ve been trying to do this with regex-based Slackbots for a decade, and it always sucked because you had to remember the exact syntax: !deploy app=web env=prod branch=main. If you missed a space, nothing happened.
With the MCP server, I can be sloppy. I can say “Fix the web permissions” or “Rerun that permission fix thing,” and because the LLM understands the context of available job templates, it figures it out.
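The real resolution happens inside the model, of course, but the effect is similar to fuzzy-matching the sloppy request against the template names it saw via MCP. A rough approximation with the standard library:

```python
import difflib

# Sketch: map a free-form request onto known job template names.
# Template names are made up for illustration; the matching heuristic
# approximates what the LLM does with far more context.
TEMPLATES = ["Remediate Permissions", "Database Patch", "Deploy Web App"]

def resolve_template(request):
    """Pick the template name closest to a free-form request."""
    scored = [
        (difflib.SequenceMatcher(None, request.lower(), t.lower()).ratio(), t)
        for t in TEMPLATES
    ]
    return max(scored)[1]
```

It’s crude next to an LLM, but it shows why exact `!deploy`-style syntax stops mattering once the tooling layer can tolerate ambiguity.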
Use Cases That Actually Make Sense
After a few days of tinkering, here are the patterns I think are actually viable for production (once this tech preview stabilizes):
1. The “What Changed?” Detective
Connecting the MCP server allows the AI to query audit logs. You can ask, “Who modified the firewall job template yesterday?” and get an instant answer. This is huge for post-incident reviews.
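Under the hood that question reduces to a filter over the activity stream. A sketch, assuming entries shaped roughly like AAP’s audit records (field names here are simplified and illustrative, not the exact API schema):

```python
# Sketch of the "who changed it?" query over audit-style entries.
# The entry shape is an assumption, simplified from what an activity
# stream typically records: actor, object, operation, timestamp.
def who_changed(entries, object_name, day):
    """Return the actors who updated object_name on a given ISO day."""
    return [
        e["actor"] for e in entries
        if e["object"] == object_name
        and e["operation"] == "update"
        and e["timestamp"].startswith(day)
    ]

stream = [
    {"actor": "alice", "object": "Firewall", "operation": "update",
     "timestamp": "2024-05-14T09:12:00Z"},
    {"actor": "bob", "object": "Firewall", "operation": "launch",
     "timestamp": "2024-05-14T10:00:00Z"},
]
```

The AI just composes this query for you from a plain-English question, which is exactly the drudgery you don’t want to do mid-incident.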
2. Parameter Validation
Before launching a big job, I had the AI check the inventory variables. “Check the variables for the ‘Database Patch’ job. Are we targeting the right version?” The model pulled the inventory data via MCP, read the target_version variable, and confirmed it was set to 14.2. Much faster than navigating the UI.
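Reduced to code, the check the model performed is tiny: read the inventory variables it fetched over MCP and compare against the expected value. The variable name matches the one above; everything else is illustrative.

```python
# The model's pre-flight check, reduced to a pure function over the
# inventory variables it pulled via MCP.
def check_target_version(inventory_vars, expected):
    """Return (ok, actual) so the caller can report a mismatch."""
    actual = inventory_vars.get("target_version")
    return actual == expected, actual
```

Trivial logic, but the win is that the model assembles the lookup, the comparison, and the plain-English verdict from one sentence of intent.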
3. Generating Playbooks with Context
This is the big one. Instead of asking ChatGPT to write a generic playbook, you can ask your local MCP-connected model to “Write a playbook to install Redis that matches the style and tagging conventions of my existing ‘Apache’ playbook.” The model can read your existing templates (if you give it access) and mimic your style. No more fighting over indentation or variable naming conventions.
The Elephant in the Room: Security
Look, hooking an LLM directly into your automation platform is basically giving a robot the keys to your data center. If the model gets confused, or if someone prompt-injects it, could it launch a destructive job?
Theoretically, yes. That’s why the MCP implementation for Ansible is crucial. It respects the RBAC (Role-Based Access Control) of the token you provide. If the token can’t launch the “Destroy Cluster” job, neither can the AI.
My advice? Create a specific “AI User” in AAP. Give it Read Only access to most things, and Execute access only to specific, safe remediation job templates. Don’t give it System Admin privileges. Just don’t.
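Belt and braces: even with a scoped token, you can put a client-side allowlist in front of the model’s tool calls, so launches are gated twice. A sketch, with assumed template names:

```python
# Client-side allowlist gate in front of the model's tool calls.
# Reads pass freely; launches must name an allowlisted template.
# Template names here are assumptions for illustration.
SAFE_TEMPLATES = {"Remediate Permissions", "Restart Web Service"}

def tool_call_allowed(tool, args):
    """Permit read-only tools; gate launch_job_template to the allowlist."""
    if tool != "launch_job_template":
        return True
    return args.get("template") in SAFE_TEMPLATES
```

RBAC on the token remains the real enforcement boundary; this is just a cheap second tripwire that fails closed before a request ever leaves your machine.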
Is It Ready?
It’s a tech preview, so expect bugs. I had the connection drop twice, and once the model got stuck in a loop trying to fetch job events for a job that didn’t exist. But the potential is undeniable.
For the first time, I’m not just pasting code snippets. I’m having a conversation with my infrastructure. And it’s talking back.