saahityaedams

03 Sep 2025

Thoughts on Infra Automation, Test Automation

I recently worked on some infrastructure automation and wanted to jot down my thoughts on it, and also compare it to test automation.

This gig was mostly managing AWS infra with Terraform (stuff like alarms and logging for compliance, setting up ECS services, setting up CI/CD pipelines, trying to make a module to automate setting up a client env). I had some prior experience writing AWS CDK to set up infra to run a service on ECS Fargate (mostly straightforward migration work using Amazon-internal golden path docs as guidance).

The primary thing that stood out to me was that large parts of infra automation code can be reliably generated by LLMs, since it’s mostly straightforward and standard. The other thing was that there are still small nuances and stupid idiosyncrasies when trying to set up AWS and GitHub infra in Terraform (I gave up on trying to manage GitHub Actions runners and secrets) that get in your way and never really let you automate 100% of your infra E2E.

Personally I think choosing to do infra automation is more nah-than-yeah, at least at small firms. I would say it is worth investing effort for specific scenarios, like when you are running multiple pilot deployments of a single-tenant system for different clients (but these are mostly enterprise problems). You also have to be pragmatic about it, prioritizing effort on the aspects with the most bang-for-the-buck and remembering that 100% automation is just not feasible. One good aspect of infra automation is that you have documented your infra by the very nature of writing code. My main blocker with infra automation is that it seems to put you into a very system-builder mindset (as opposed to a product/solutions-builder one) and catalyses accidental complexity in your infrastructure. Your job at a small firm should be solving customers’ problems with a great product, not overcomplicating system architecture by writing multiple microservices and using a million infrastructure components.

Given all this, I think test automation should generally be prioritized much more (at every level: unit test coverage, integration tests, UI automation tests). We should just make pragmatic decisions to use the highest-level compute platform abstraction that is appropriate and keep infra automation minimal. Writing tests has a direct, positive effect on the quality of the product and release velocity, especially when they act like a harness confirming that the system works as expected at all levels. I’ve found test automation to have a higher payoff for the team, since it’s a faster feedback cycle and it directly improves developer productivity.
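
To make the "harness at every level" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: apply_discount, InMemoryOrderStore and checkout are hypothetical names, not code from the project described above. It only shows the shape of a unit-level check next to an integration-style check that needs no real infra.

```python
def apply_discount(total_cents, percent):
    """Hypothetical business rule: take `percent` off an order total (in cents)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


class InMemoryOrderStore:
    """Hypothetical stand-in for a real database, so the test needs no infra."""

    def __init__(self):
        self._orders = {}

    def save(self, order_id, total_cents):
        self._orders[order_id] = total_cents

    def get(self, order_id):
        return self._orders[order_id]


def checkout(store, order_id, total_cents, percent):
    """Hypothetical flow that wires the business rule to the storage layer."""
    final_total = apply_discount(total_cents, percent)
    store.save(order_id, final_total)
    return final_total


# Unit level: one rule, no collaborators.
def test_discount_is_applied():
    assert apply_discount(10_000, 15) == 8_500


def test_invalid_percent_is_rejected():
    try:
        apply_discount(10_000, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")


# Integration level: the rule and the store working together.
def test_checkout_persists_discounted_total():
    store = InMemoryOrderStore()
    checkout(store, "order-1", 10_000, 15)
    assert store.get("order-1") == 8_500
```

The functions follow the test_* naming convention, so a runner like pytest should pick them up as-is. They run in milliseconds and need nothing provisioned, which is the fast feedback cycle I mean.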