ranjan@ranjan.info:~$ man services/infrastructure-audit

Infrastructure Audit

Know exactly what you're running — and what it would take to lose it

Most infrastructure grows by accretion: a server here, a panel there, backups someone configured years ago and nobody has tested since. An audit replaces assumptions with a document — what you have, where it's weak, what it costs, and what happens on your worst day. It's the cheapest engagement I offer and the one that prevents the expensive ones.

What is an infrastructure audit?

An infrastructure audit is a structured review of everything your systems run on — servers, cloud resources, security posture, performance headroom, backups, and disaster recovery — producing a written report of findings ranked by business risk, each with a concrete fix and effort estimate. Its defining test is simple: after the audit, you can answer "what happens if this fails?" for every component with a document instead of a guess.

Written by Ranjan Chatterjee, Infrastructure Consultant · Linux Server Specialist · 15+ years in production Linux · Last reviewed

ranjan@ranjan.info:~$ dmesg | tail

Signs you need this now

Audits pay for themselves in the questions you can't currently answer. If any of these apply, that's the sign.

  • You inherited infrastructure the last person built — and left undocumented
  • Nobody has actually restored a backup in the past year
  • The cloud bill grows every month and no one can say exactly why
  • One specific person is the only one who knows how production works
  • You're about to scale, hire, or hand off — and need the map first
  • An incident happened and the fixes were all patches, never structure
  • A client, insurer, or auditor asked security questions you answered from memory
  • "It works, don't touch it" is the operating policy for a revenue system

ranjan@ranjan.info:~$ cat scope.txt

What this covers

  • Security audit: exposure, patching, access, and configuration review
  • Performance audit: bottlenecks, capacity, and headroom
  • Cost optimization: oversized instances, unused resources, better-fit providers
  • Cloud architecture review (AWS, DigitalOcean, Hetzner, hybrid)
  • Backup strategy review — including an actual restore test
  • Disaster recovery planning: RTO/RPO you can defend
  • Architecture review with scaling recommendations

ranjan@ranjan.info:~$ grep -i "oops" ~/incidents.log

Mistakes that audits keep finding

Every one of these comes from a real engagement — usually from before I was called.

Trusting backup job status instead of restores

The job says "success" while backing up the wrong paths, or to a disk that died quietly months ago. The only backup metric that matters is a timed, tested restore — which is why this audit performs one.
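
To make "timed, tested restore" concrete, here is a minimal sketch of such a check, not the tooling used in any particular audit. It assumes the restore lands in a scratch directory and that a trusted reference copy of the data is available to compare against; `restore_fn`, `verify_restore`, and `timed_restore_test` are hypothetical names chosen for illustration.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source: Path, restored: Path) -> dict:
    """Compare every file under `source` with its counterpart under `restored`."""
    missing, mismatched, checked = [], [], 0
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        candidate = restored / relative
        checked += 1
        if not candidate.is_file():
            missing.append(relative.as_posix())
        elif sha256_of(src_file) != sha256_of(candidate):
            mismatched.append(relative.as_posix())
    return {"checked": checked, "missing": missing, "mismatched": mismatched}


def timed_restore_test(restore_fn, source: Path, restored: Path) -> dict:
    """Run the restore, time it, then verify what actually came back."""
    started = time.monotonic()
    restore_fn()  # assumption: a callable wrapping whatever backup tool is in use
    elapsed = time.monotonic() - started
    report = verify_restore(source, restored)
    report["restore_seconds"] = elapsed
    report["ok"] = not report["missing"] and not report["mismatched"]
    return report
```

The point of the design is that the job's own exit status never enters the result: only the elapsed time and the file-by-file comparison do. On a live system the reference would more likely be a checksum manifest taken at backup time, since the source keeps changing afterwards.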

Auditing security and ignoring recovery

A hardened server with untested backups is one disk failure from disaster. Risk lives in the whole chain — exposure, redundancy, backups, and the human runbook — not in any single layer.

Paying for peak capacity year-round

Instances sized for a traffic spike two years ago, storage nobody reclaimed, and duplicate services across providers — cost review routinely finds 30–50% of cloud spend doing nothing.
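
As a rough illustration of how oversized instances can be shortlisted, the sketch below flags machines whose peak usage stays low across a review window. The `Instance` fields, the 30% and 40% thresholds, and both function names are assumptions for the example; real figures would come from your monitoring and billing exports, and a flagged instance is a candidate for review, not an automatic downgrade.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    monthly_cost: float      # in whatever currency the bill uses
    peak_cpu_percent: float  # highest CPU utilisation seen in the review window
    peak_mem_percent: float  # highest memory utilisation seen in the same window


def oversized(instances, cpu_threshold=30.0, mem_threshold=40.0):
    """Return instances whose peak CPU and peak memory both stay under the thresholds."""
    return [
        inst for inst in instances
        if inst.peak_cpu_percent < cpu_threshold
        and inst.peak_mem_percent < mem_threshold
    ]


def review_spend_share(instances, flagged):
    """Fraction of total monthly spend that sits on the flagged instances."""
    total = sum(inst.monthly_cost for inst in instances)
    if not total:
        return 0.0
    return sum(inst.monthly_cost for inst in flagged) / total
```

Using peaks instead of averages is deliberate: an instance that is idle on average but saturated once a day is not oversized.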

Having no owner per system

When everything is everyone's job, patching and monitoring quietly become no one's. The audit maps each component to an accountable owner — often the cheapest fix on the list.

Filing the report and changing nothing

An audit is a to-do list with prices, not an achievement. The roadmap section exists so the top three risks get dates and owners the same week the report lands.

ranjan@ranjan.info:~$ diff --options

DIY, provider support, or a specialist?

An honest comparison — each option is right in some situations, including the free ones.

Internal self-audit
The right choice when: you have a capable team with time, using a good framework (CIS, provider well-architected reviews). It is free, and builds internal knowledge of your own systems.
Limits & risks: the blind spots that caused the risks also grade the homework. Teams consistently under-rate what they built and skip what they fear, restore tests above all.

Provider health checks
The right choice when: you want free or cheap reviews from your cloud or host. They are decent at flagging obvious misconfigurations inside their own platform.
Limits & risks: scope stops at their product line, findings funnel toward their upsells, and nobody tests your backups or reads your architecture as a business.

Independent audit
The right choice when: you have revenue-bearing infrastructure, an upcoming decision (scale, hire, migrate), a need for post-incident clarity, or security questionnaires to answer with evidence. Vendor-neutral by design.
Limits & risks: costs real money and needs read access or a walkthrough. The report is only worth what you execute, so pair it with owners and dates, whoever implements it.

What you get

  • A written audit: findings ranked by risk, each with a concrete fix and effort estimate
  • A tested answer to "can we actually restore?" — not a checkbox
  • A 6–12 month roadmap: what to fix now, what to plan, what to stop paying for
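
For readers who want to picture "findings ranked by risk", here is one possible shape for that list. The likelihood-times-impact score, the 1 to 5 scales, and the `Finding` fields are assumptions made for illustration; the report's actual ranking method is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    likelihood: int     # 1 (rare) to 5 (expected); hypothetical scale
    impact: int         # 1 (nuisance) to 5 (business-stopping); hypothetical scale
    fix: str
    effort_days: float  # rough effort estimate for the fix

    @property
    def risk(self) -> int:
        """Simple likelihood x impact score, assumed for this sketch."""
        return self.likelihood * self.impact


def rank(findings):
    """Highest risk first; among equal risk, the cheapest fix first."""
    return sorted(findings, key=lambda f: (-f.risk, f.effort_days))
```

Sorting ties by effort puts quick wins ahead of equally risky but slower fixes, which is one reasonable way to decide what gets a date first.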

Why work with me on this

  • 15+ years inside production Linux — this exact work, done at fleet scale
  • Founder-operator of two hosting platforms: I've owned the uptime, not just the ticket
  • Every change documented and reversible — you keep a written trail, not a mystery
  • Plain-language updates and honest timelines you can plan a business around

ranjan@ranjan.info:~$ ./engage --how

How it runs

The same disciplined path on every engagement — scoped, planned, executed with checkpoints, handed off clean.

  1. Scope

    A short brief or call to understand your stack, the real problem, and what a good outcome looks like.

  2. Plan

    A clear architecture plan covering steps, risks, rollback, and timeline, agreed before anything touches production.

  3. Execute

    Hands-on work with checkpoints. You see progress; nothing changes on your servers silently.

  4. Handoff

    Documentation, access cleanup and a clear path for what comes next. No lock-in, no mystery.

ranjan@ranjan.info:~$ faq --service infrastructure-audit

Common questions

We're a small operation — is an audit overkill?

Small operations have the least slack when something fails, which makes the audit more valuable, not less. A two-server audit is quick, cheap, and usually finds at least one silent risk worth its whole price.

What exactly do I get at the end?

A written report with findings ranked by business risk — each with a concrete fix and an effort estimate — a tested answer to "can we actually restore?", and a 6–12 month roadmap separating fix-now from plan-for from stop-paying-for. It's written to be executable by any competent engineer, not just me.

How long does an audit take?

Most audits are delivered within one week: a few days of review and testing, then the report. Larger or multi-provider estates take proportionally longer, and the timeline is quoted up front. The cost in your team's time is small: access provisioning and about an hour of walkthrough questions.

What does it cost?

A fixed price by environment size — servers, providers, and complexity — quoted before work starts. It's deliberately the least expensive engagement I offer, because it's the one that prevents the expensive ones.

Will you try to sell us a rebuild?

No — the report ranks findings by risk to your business, and "leave it alone, it's fine" is a finding I write regularly. The audit is deliberately decoupled from implementation so the advice stays honest.

What do you need access to?

Read-level access to servers and the provider console covers most of it. Where access is sensitive, I work from configuration exports and an engineer walkthrough instead.

Do you actually test our backups?

Yes: an actual timed restore of real data, not a glance at job logs. It's the single most valuable step in the audit. In roughly half of first-time audits, the backups turn out to be incomplete, unrestorable, or slower to restore than the business could survive.

Can you audit cloud and on-premises together?

Yes — most real estates are hybrid: some AWS or DigitalOcean, a dedicated box at Hetzner, an office NAS someone forgot. The audit treats it as one system, because that's how it fails.

How often should we repeat an audit?

Annually as a baseline, or after any structural change — a migration, a major growth step, a new compliance requirement, an incident. Repeat audits are faster and cheaper than the first, since the map already exists and only the diff needs review.

Is our information confidential?

Completely. An audit sees credentials, architecture, and costs — it's treated like the inside of your business, because it is. NDAs are welcome, findings are shared with no one, and case-study references are anonymized beyond recognition.

ranjan@ranjan.info:~$ man glossary

Terms you'll meet in the report

Plain-language definitions — so the report reads like information, not incantation.

RTO
Recovery Time Objective — how long the business can afford to be down. Every architecture decision is quietly a bet on this number.
RPO
Recovery Point Objective — how much data you can afford to lose, measured in time. Your real RPO equals the age of your last tested backup.
Single point of failure
Any component whose failure alone takes the service down — a server, a person, a provider, a DNS account with one password.
DR plan
Disaster recovery: the written, tested path from "everything is gone" back to serving customers. Untested plans are hypotheses.
Right-sizing
Matching resources to measured need — the polite name for "you're paying triple for idle capacity".
Attack surface
Everything externally reachable: open ports, exposed panels, forgotten subdomains. The audit maps it before someone else does.
Runbook
The document that lets a competent stranger operate your systems — the antidote to key-person risk.
Restore test
Actually recovering data from a backup, timed and verified. The single highest-value item in this audit, and the most commonly skipped everywhere else.
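
The RTO and RPO definitions above can be turned into a simple comparison of objectives against measured reality. This is a sketch under stated assumptions: `recovery_gaps` and its parameter names are hypothetical, the measured restore time is taken from a restore test, and the achievable recovery point is treated as the age of the last tested backup.

```python
from datetime import datetime, timedelta


def recovery_gaps(rto, rpo, measured_restore, last_tested_backup, now):
    """Compare measured recovery figures with the stated objectives.

    rto, rpo and measured_restore are timedeltas;
    last_tested_backup and now are datetimes.
    """
    backup_age = now - last_tested_backup
    return {
        "rto_met": measured_restore <= rto,
        "rpo_met": backup_age <= rpo,
        # How far the measured restore runs past the objective (zero if within it).
        "restore_overrun": max(measured_restore - rto, timedelta(0)),
        # Data written after the last tested backup is what a failure would lose.
        "data_loss_window": backup_age,
    }
```

For example, with a 4-hour RTO, a 24-hour RPO, a restore measured at 6 hours, and a last tested backup 2 days old, both objectives are missed: the restore overruns by 2 hours and the data-loss window is 2 days.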

ranjan@ranjan.info:~$ ssh [email protected]

Ready when you are

One paragraph is enough: your stack, the symptom, and when you need it solved. Emergencies are answered first.
