Infrastructure Audit
Most infrastructure grows by accretion: a server here, a panel there, backups someone configured years ago and nobody has tested since. An audit replaces assumptions with a document — what you have, where it's weak, what it costs, and what happens on your worst day. It's the cheapest engagement I offer and the one that prevents the expensive ones.
An infrastructure audit is a structured review of everything your systems run on — servers, cloud resources, security posture, performance headroom, backups, and disaster recovery — producing a written report of findings ranked by business risk, each with a concrete fix and effort estimate. Its defining test is simple: after the audit, you can answer "what happens if this fails?" for every component with a document instead of a guess.
Written by Ranjan Chatterjee, Infrastructure Consultant · Linux Server Specialist · 15+ years in production Linux
Audits pay for themselves in the questions you can't currently answer. If any of these apply, that's the sign.
Every one of these comes from a real engagement — usually from before I was called.
The job says "success" while backing up the wrong paths, or to a disk that died quietly months ago. The only backup metric that matters is a timed, tested restore — which is why this audit performs one.
A hardened server with untested backups is one disk failure from disaster. Risk lives in the whole chain — exposure, redundancy, backups, and the human runbook — not in any single layer.
Instances sized for a traffic spike two years ago, storage nobody reclaimed, and duplicate services across providers — cost review routinely finds 30–50% of cloud spend doing nothing.
When everything is everyone's job, patching and monitoring quietly become no one's. The audit maps each component to an accountable owner — often the cheapest fix on the list.
An audit is a to-do list with prices, not an achievement. The roadmap section exists so the top three risks get dates and owners the same week the report lands.
An honest comparison — each option is right in some situations, including the free ones.
| Option | The right choice when… | Limits & risks |
|---|---|---|
| Internal self-audit | A capable team with time, using a good framework (CIS, provider well-architected reviews). Free, and builds internal knowledge of your own systems. | The blind spots that caused the risks also grade the homework. Teams consistently under-rate what they built and skip what they fear — restore tests above all. |
| Provider health checks | Free or cheap reviews from your cloud or host — decent at flagging obvious misconfigurations inside their own platform. | Scope stops at their product line, findings funnel toward their upsells, and nobody tests your backups or reads your architecture as a business. |
| Independent audit | Revenue-bearing infrastructure, an upcoming decision (scale, hire, migrate), post-incident clarity, or answering security questionnaires with evidence. Vendor-neutral by design. | Costs real money and needs read access or a walkthrough. The report is only worth what you execute — pair it with owners and dates, whoever implements it. |
The same disciplined path on every engagement — scoped, planned, executed with checkpoints, handed off clean.
A short brief or call to understand your stack, the real problem, and what a good outcome looks like.
A clear architecture plan — steps, risks, rollback and timeline — agreed before anything touches production.
Hands-on work with checkpoints. You see progress; nothing changes on your servers silently.
Documentation, access cleanup and a clear path for what comes next. No lock-in, no mystery.
Small operations have the least slack when something fails, which makes the audit more valuable, not less. A two-server audit is quick, cheap, and usually finds at least one silent risk worth its whole price.
A written report with findings ranked by business risk — each with a concrete fix and an effort estimate — a tested answer to "can we actually restore?", and a 6–12 month roadmap separating fix-now from plan-for from stop-paying-for. It's written to be executable by any competent engineer, not just me.
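As a rough illustration of what "ranked by business risk" and the fix-now / plan-for / stop-paying-for split can look like in practice, here is a minimal Python sketch. The 1-to-5 likelihood and impact scales, the `fix_now_threshold` value, and the `Finding` fields are all assumptions made for this example, not the scoring method actually used in the audit.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    likelihood: int        # hypothetical scale: 1 (rare) to 5 (expected)
    impact: int            # hypothetical scale: 1 (nuisance) to 5 (business-stopping)
    effort_days: float     # rough effort estimate for the fix
    fix: str               # the concrete fix proposed in the report
    wasted_spend: bool = False  # True when the finding is cost with no benefit

    @property
    def risk(self) -> int:
        # Simple likelihood x impact score; an assumption for this sketch.
        return self.likelihood * self.impact


def bucket(finding: Finding, fix_now_threshold: int = 15) -> str:
    """Assign a finding to one of the three roadmap buckets."""
    if finding.wasted_spend:
        return "stop-paying-for"
    return "fix-now" if finding.risk >= fix_now_threshold else "plan-for"


def roadmap(findings: list) -> dict:
    """Order findings by risk (highest first, cheaper fixes breaking ties) and bucket them."""
    ordered = sorted(findings, key=lambda f: (-f.risk, f.effort_days))
    plan = {"fix-now": [], "plan-for": [], "stop-paying-for": []}
    for finding in ordered:
        plan[bucket(finding)].append(finding)
    return plan


if __name__ == "__main__":
    findings = [
        Finding("No accountable owner for patching", 3, 4, 0.5, "Assign an owner"),
        Finding("Backups never restore-tested", 5, 5, 2.0, "Schedule timed restores"),
        Finding("Oversized instances", 5, 1, 1.0, "Downsize", wasted_spend=True),
    ]
    for name, items in roadmap(findings).items():
        print(name, [f.title for f in items])
```

Keeping the effort estimate next to the risk score is the point of the structure: it lets the top of the list be assigned dates and owners without a second round of analysis.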
Most environments deliver within one week: a few days of review and testing, then the report. Larger or multi-provider estates take proportionally longer, quoted up front. Your team's time cost is small — access provisioning and an hour of walkthrough questions.
A fixed price by environment size — servers, providers, and complexity — quoted before work starts. It's deliberately the least expensive engagement I offer, because it's the one that prevents the expensive ones.
No — the report ranks findings by risk to your business, and "leave it alone, it's fine" is a finding I write regularly. The audit is deliberately decoupled from implementation so the advice stays honest.
Read-level access to servers and the provider console covers most of it. Where access is sensitive, I work from configuration exports and an engineer walkthrough instead.
Yes — an actual timed restore of real data, not a glance at job logs. It's the single most valuable step in the audit: roughly half of first-time audits discover their backups were incomplete, unrestorable, or slower to restore than the business could survive.
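The difference between a timed restore and a glance at job logs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the audit's actual tooling: a zip archive stands in for whatever backup tool is really in use, the function names (`build_manifest`, `timed_restore`) are hypothetical, and `rto_seconds` is a stand-in for however long the business can tolerate waiting. The timer here covers extraction only; checksum verification runs afterwards.

```python
import hashlib
import tempfile
import time
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large restores do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file path (relative to root, POSIX style) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def timed_restore(archive: Path, expected: dict, rto_seconds: float) -> dict:
    """Restore into a scratch directory, time it, and compare against the manifest."""
    with tempfile.TemporaryDirectory() as scratch:
        started = time.monotonic()
        with zipfile.ZipFile(archive) as backup:
            backup.extractall(scratch)
        elapsed = time.monotonic() - started
        restored = build_manifest(Path(scratch))

    missing = sorted(set(expected) - set(restored))
    corrupt = sorted(
        name for name, digest in expected.items()
        if name in restored and restored[name] != digest
    )
    return {
        "elapsed_seconds": elapsed,
        "within_rto": elapsed <= rto_seconds,
        "missing": missing,
        "corrupt": corrupt,
        "passed": not missing and not corrupt and elapsed <= rto_seconds,
    }


if __name__ == "__main__":
    # Demo with throwaway data: one file is deliberately left out of the backup.
    with tempfile.TemporaryDirectory() as work:
        source = Path(work) / "source"
        (source / "db").mkdir(parents=True)
        (source / "db" / "dump.sql").write_text("CREATE TABLE t (id INT);\n")
        (source / "uploads.txt").write_text("user files\n")
        expected = build_manifest(source)

        archive = Path(work) / "backup.zip"
        with zipfile.ZipFile(archive, "w") as backup:
            backup.write(source / "db" / "dump.sql", "db/dump.sql")

        print(timed_restore(archive, expected, rto_seconds=60.0))
```

In the demo the backup job itself would have reported success, yet the result lists `uploads.txt` as missing. That is the class of failure a job status cannot reveal and a restore test can.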
Yes — most real estates are hybrid: some AWS or DigitalOcean, a dedicated box at Hetzner, an office NAS someone forgot. The audit treats it as one system, because that's how it fails.
Annually as a baseline, or after any structural change — a migration, a major growth step, a new compliance requirement, an incident. Repeat audits are faster and cheaper than the first, since the map already exists and only the diff needs review.
Completely. An audit sees credentials, architecture, and costs — it's treated like the inside of your business, because it is. NDAs are welcome, findings are shared with no one, and case-study references are anonymized beyond recognition.
Plain-language definitions — so the report reads like information, not incantation.
Engagements that commonly pair with this one.
SSH, firewall, kernel, PHP, MySQL — locked down in layers, documented, auditable.
Web server, PHP, MySQL, cache layers: tuned from measurements, not folklore.
24×7 monitoring, patching, backups, and incident response on a flat monthly retainer.
One paragraph is enough: your stack, the symptom, and when you need it solved. Emergencies are answered first.