The Canon Laws of IT

A practical field guide to the rules, razors, laws, and patterns that show up again and again in software, infrastructure, security, operations, and product delivery.

How to Use This Guide

These “laws” are not laws of physics. They are durable heuristics: useful shortcuts for spotting risk, diagnosing dysfunction, and making better technical decisions.

Use them when you are:

Planning projects and estimating delivery
Designing systems and services
Reviewing incidents and outages
Defining metrics and incentives
Managing technical debt
Improving reliability, security, and operations

The Core Canon

1. Pareto Principle, or the 80/20 Rule

Roughly 80% of outcomes come from about 20% of causes.

In plain English	In IT
A minority of inputs usually drives the majority of results.	A small number of bugs, users, systems, vendors, or workflows often accounts for most incidents, load, revenue, or pain.

Examples

20% of services generate 80% of production incidents.
20% of customers create 80% of support volume.
20% of queries consume 80% of database resources.

How to apply it

Focus first on the few things that create disproportionate impact. Before spreading effort evenly, identify the hotspots.
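
Finding the hotspots can be as simple as counting. The sketch below assumes a flat list of incident records tagged by service; the `hotspots` helper and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def hotspots(events, share=0.8):
    """Return the top sources that together cover at least `share` of all events."""
    counts = Counter(events)
    total = sum(counts.values())
    covered = 0
    result = []
    # Walk sources from most to least frequent until the target share is covered.
    for source, n in counts.most_common():
        if covered / total >= share:
            break
        result.append((source, n))
        covered += n
    return result


# Hypothetical incident log: one entry per incident, tagged by service name.
incidents = (
    ["billing"] * 45 + ["auth"] * 35 + ["search"] * 8 + ["email"] * 7 + ["reports"] * 5
)
top = hotspots(incidents)
print(top)
```

In this made-up log, two of five services account for 80 of 100 incidents, so they are the first place to spend effort.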

2. Parkinson’s Law

Work expands to fill the time available for its completion.

In plain English	In IT
The more time allocated, the more work seems to appear.	A migration, refactor, or implementation often grows to match the schedule rather than the actual minimum scope.

Examples

A task estimated at two weeks consumes exactly two weeks, even though the core work took two days.
A “quick cleanup” becomes a platform redesign because the deadline allowed it.

How to apply it

Use timeboxes, crisp acceptance criteria, and staged delivery. Avoid giving vague work unlimited runway.

3. Hofstadter’s Law

It always takes longer than you expect, even when you take Hofstadter’s Law into account.

In plain English	In IT
Complexity hides in the details.	Integrations, migrations, security reviews, edge cases, and production hardening usually take longer than expected.

Examples

“Just add SSO” becomes weeks of identity-provider quirks, claims mapping, testing, and rollout issues.
“Simple API integration” turns into data model mismatches, rate limits, retries, and authentication problems.

How to apply it

Pad estimates for unknowns. Treat “simple” work as suspicious until interfaces, dependencies, and edge cases are understood.

4. Murphy’s Law

Anything that can go wrong eventually will.

In plain English	In IT
Fragile assumptions will eventually break.	Untested backups, single points of failure, weak alerts, and manual procedures fail at the worst time.

Examples

The only server without monitoring is the one that fills its disk.
The disaster recovery process fails because nobody has tested restore.
The one missing DNS record blocks the launch.

How to apply it

Design for failure. Test recovery paths. Assume humans make mistakes and systems fail under stress.

5. Conway’s Law

Organizations design systems that mirror their communication structures.

In plain English	In IT
The org chart leaks into the architecture.	Separate teams often produce separate services, interfaces, queues, and ownership boundaries.

Examples

A siloed company builds a siloed platform.
A fragmented ownership model creates fragmented user experiences.
A backend team, frontend team, and data team may produce systems with awkward seams between each layer.

How to apply it

If the architecture is painful, inspect the team boundaries. Sometimes the fix is organizational, not technical.

6. Brooks’s Law

Adding people to a late software project makes it later.

In plain English	In IT
More people do not automatically mean faster delivery.	New people require onboarding, coordination, meetings, code review, and context transfer.

Examples

A late project gets five new engineers and slows down further.
Senior engineers stop building because they are now onboarding everyone else.

How to apply it

When a project is late, reduce scope, remove blockers, clarify priorities, or split work cleanly. Do not assume staffing alone solves lateness.

7. Goodhart’s Law

When a measure becomes a target, it ceases to be a good measure.

In plain English	In IT
People optimize what they are measured on, even if that harms the real goal.	Bad metrics create bad behavior.

Examples

Measuring developers by tickets closed encourages shallow work.
Measuring support by handle time discourages deep problem-solving.
Measuring uptime alone can discourage necessary maintenance.

How to apply it

Use balanced metrics. Pair speed with quality, volume with outcomes, and reliability with customer impact.

8. Campbell’s Law

The more a metric is used for important decisions, the more likely it is to be corrupted.

In plain English	In IT
High-stakes metrics get gamed.	Teams may reclassify incidents, manipulate scope, or optimize reporting rather than reality.

Examples

Incidents are downgraded to protect SLA numbers.
Teams avoid logging defects because defect counts are punished.
“Velocity” is inflated by slicing work into artificial tickets.

How to apply it

Audit metrics. Watch for gaming. Reward transparency, not just good-looking dashboards.

9. Occam’s Razor

The simplest explanation that fits the facts is usually the best starting point.

In plain English	In IT
Do not overcomplicate the diagnosis.	The outage is more likely caused by a recent deploy than by an exotic hardware or kernel issue.

Examples

The app broke right after a config change; check the config first.
Latency spiked after a new query shipped; inspect the query before blaming the network.

How to apply it

Start with the obvious and recent. Escalate to exotic explanations only when evidence demands it.

10. Hanlon’s Razor

Never attribute to malice what can be adequately explained by mistake, confusion, or poor process.

In plain English	In IT
Most bad outcomes are not caused by bad people.	Incidents often come from unclear ownership, weak documentation, hidden dependencies, or bad incentives.

Examples

Security was bypassed because the process was unclear, not because the team did not care.
A production change was risky because the release checklist was incomplete.

How to apply it

Blame systems before blaming people. Improve process, tooling, documentation, and incentives.

11. Gall’s Law

A complex system that works has almost always evolved from a simple system that worked.

In plain English	In IT
Working complexity usually grows from working simplicity.	Big-bang platforms often fail; small reliable systems can evolve into larger ones.

Examples

Start with one reliable workflow before building an orchestration platform.
Build a simple API before designing a full event-driven ecosystem.

How to apply it

Prove the simple version first. Let architecture earn complexity over time.

12. The Ninety-Ninety Rule

The first 90% of the code takes 90% of the time. The remaining 10% takes the other 90%.

In plain English	In IT
The final stretch is usually bigger than it looks.	Production readiness, edge cases, documentation, security, observability, and rollout take significant time.

Examples

The prototype works in a day; the production version takes a month.
The feature is “done” except for permissions, tests, logging, migrations, and support docs.

How to apply it

Do not confuse demo-ready with production-ready. Track the last-mile work explicitly.

13. Knuth’s Optimization Warning

Premature optimization is the root of all evil, or at least of much unnecessary complexity.

In plain English	In IT
Do not optimize before you know what matters.	Scaling, caching, sharding, and abstraction can make systems worse if applied too early.

Examples

Building microservices before there is enough load or team maturity.
Adding caching before measuring the bottleneck.
Designing for imaginary scale while ignoring current reliability problems.

How to apply it

Measure first. Optimize the bottleneck, not the architecture diagram.

14. Postel’s Law

Be conservative in what you send and liberal in what you accept.

In plain English	In IT
Emit clean, predictable output; tolerate reasonable input variation.	APIs should produce strict, documented responses while handling minor client differences safely.

Examples

Return consistent JSON schemas.
Accept harmless differences in casing or optional fields where appropriate.
Avoid breaking clients with needless response changes.

How to apply it

Use with care. Tolerance helps compatibility, but excessive tolerance can hide bugs and security issues.
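
One way to balance tolerance and strictness is to normalize harmless variation on the way in, reject anything unknown, and emit one fixed shape on the way out. This is a minimal sketch; the `email` and `name` fields and both function names are assumptions made for illustration.

```python
import json

ALLOWED_FIELDS = {"email", "name"}  # hypothetical schema for this example


def parse_request(raw):
    """Accept harmless variation (key casing, stray whitespace); reject the rest."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Tolerate differences in key casing and padding.
    normalized = {key.strip().lower(): value for key, value in data.items()}
    # Tolerance stops here: unknown fields are rejected, not silently ignored.
    unknown = set(normalized) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    email = normalized.get("email")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email is required")
    return {
        "email": email.strip().lower(),
        "name": str(normalized.get("name", "")).strip(),  # optional field
    }


def render_response(user):
    """Emit one predictable shape: fixed keys, stable order, no extras."""
    return json.dumps({"email": user["email"], "name": user["name"]}, sort_keys=True)


user = parse_request('{"Email": " Ada@Example.com ", "NAME": "Ada"}')
print(render_response(user))
```

Rejecting unknown fields is a deliberate choice here: it keeps the tolerant parser from quietly accepting input that a client may believe has an effect.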

15. Linus’s Law

Given enough eyeballs, all bugs are shallow.

In plain English	In IT
More review and visibility can expose problems faster.	Open code review, logging, monitoring, and shared ownership make defects easier to detect.

Examples

A peer review catches a security flaw before release.
Public dashboards reveal an issue that one team missed.

How to apply it

Make systems observable. Make changes reviewable. Encourage broad ownership without creating chaos.

16. Metcalfe’s Law

The value of a network grows roughly in proportion to the square of the number of connected users or nodes.

In plain English	In IT
Networks become more useful as more people or systems join.	Collaboration platforms, marketplaces, identity systems, and internal developer platforms gain value with adoption.

Examples

A shared CI/CD platform becomes more valuable as more teams standardize on it.
A company directory, chat system, or knowledge base improves as participation grows.

How to apply it

For platform work, adoption is part of the product. Invest in onboarding, documentation, and trust.

17. Amdahl’s Law

The speedup of a system is limited by the portion that cannot be parallelized.

In plain English	In IT
You cannot scale past the serial bottleneck.	More workers do not help if every request waits on one locked database row or one single-threaded process.

Examples

Adding servers does not fix a global database lock.
More build agents do not help if all builds wait on one shared dependency step.

How to apply it

Find the serial bottleneck before adding capacity. Parallelism only helps the parallelizable parts.
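
The limit can be computed directly. With a parallelizable fraction p and n workers, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below uses a hypothetical job in which 90% of the runtime parallelizes.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical job: 90% of the runtime parallelizes, 10% is serial.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} workers -> {amdahl_speedup(0.9, n):.2f}x")
```

With a 10% serial portion, the speedup approaches 10x but never reaches it, no matter how many workers are added.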

18. Little’s Law

Work in progress equals throughput multiplied by cycle time.

In plain English	In IT
At a fixed throughput, more work in progress means longer cycle times.	A team with too many open tickets, projects, or incidents will move slowly even if everyone is busy.

Examples

Ten half-finished projects create less value than three completed ones.
Large queues in support or engineering mean longer cycle times.

How to apply it

Limit work in progress. Finish more, start less.
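
Rearranged, the law gives average cycle time as work in progress divided by throughput. The numbers below are hypothetical and assume a stable system where throughput stays constant.

```python
def average_cycle_time(wip: float, throughput: float) -> float:
    """Little's Law rearranged: cycle time = work in progress / throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical team: finishes 5 tickets per week.
crowded = average_cycle_time(wip=30, throughput=5)  # 30 tickets open at once
focused = average_cycle_time(wip=15, throughput=5)  # same pace, half the WIP
print(f"30 in progress: {crowded:.0f} weeks per ticket on average")
print(f"15 in progress: {focused:.0f} weeks per ticket on average")
```

At the same pace of completion, halving the work in progress halves the average wait.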

19. CAP Theorem

In a distributed system, during a network partition, you must choose between consistency and availability.

In plain English	In IT
Distributed systems require tradeoffs.	When nodes cannot communicate, a system must either keep serving possibly stale data or reject some requests to preserve correctness.

Examples

A shopping cart may remain available with eventual consistency.
A banking ledger may reject writes rather than risk inconsistent balances.

How to apply it

Know which failure mode is acceptable for each system. Not every service needs the same consistency model.

20. The Fallacies of Distributed Computing

Common assumptions about networks are usually false.

Fallacy	Reality
The network is reliable.	It fails.
Latency is zero.	It is not.
Bandwidth is infinite.	It is limited.
The network is secure.	It must be protected.
Topology does not change.	It changes constantly.
There is one administrator.	There are many owners and policies.
Transport cost is zero.	Calls have cost.
The network is homogeneous.	Environments differ.

Examples

A service call times out halfway through a workflow.
A region has higher latency than expected.
A retry storm makes an outage worse.

How to apply it

Design for timeouts, retries, idempotency, backoff, partial failure, observability, and graceful degradation.
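
Retries with capped exponential backoff and jitter are one common way to handle transient failures without feeding a retry storm. This is a minimal sketch, not a production client; `call_with_retries` and its parameters are hypothetical, and it assumes the wrapped operation is idempotent.

```python
import random
import time


def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call `operation`, retrying transient network errors with backoff and jitter.

    Only safe when `operation` is idempotent: a request that timed out may
    still have been applied on the other side.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff, capped so waits never grow without bound.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))


# Usage with a simulated flaky dependency that fails twice, then succeeds.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


print(call_with_retries(flaky))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; real callers would leave it at the default.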

Operational Laws

21. You Build It, You Run It

Teams that build systems should share responsibility for operating them.

In plain English	In IT
Ownership should include production consequences.	Developers who get paged tend to build more observable, reliable, and maintainable systems.

How to apply it

Align development and operations. Make production feedback visible to the people who can improve the system.

22. The Toil Rule

Repetitive manual operational work should be reduced or automated.

In plain English	In IT
If humans repeat the same runbook often, the system is asking for automation.	Manual deploys, recurring restarts, repeated access fixes, and hand-built reports are toil.

How to apply it

Automate high-frequency, low-judgment work. Save human attention for exceptions and design decisions.

23. The Error Budget Principle

Reliability targets should guide release velocity.

In plain English	In IT
Reliability is a budget, not an infinite demand.	If a service is healthier than its target, teams can ship faster. If it burns too much error budget, reliability work takes priority.

How to apply it

Define service-level objectives. Let reliability data drive tradeoffs between speed and stability.
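
An availability objective translates directly into a budget of allowed downtime. The sketch below assumes a 30-day window and a hypothetical 99.9% target; both function names are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; negative means overspent."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# Hypothetical service: 99.9% target, 10.8 minutes of downtime so far this window.
budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=10.8)
print(f"budget: {budget:.1f} min, remaining: {remaining:.0%}")
```

A team might agree in advance that releases slow down once the remaining budget drops below some threshold; the threshold itself is a policy choice, not something the arithmetic decides.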

24. Mean Time to Recovery Beats Mean Time Between Failures

In complex systems, fast recovery is often more realistic than perfect prevention.

In plain English	In IT
Failures are inevitable; recovery speed matters.	Good rollback, alerting, runbooks, and ownership reduce customer impact.

How to apply it

Invest in detection, rollback, restore, failover, and incident response. Prevention matters, but recovery is the safety net.
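
The usual steady-state approximation makes the tradeoff concrete: availability = MTBF / (MTBF + MTTR). The figures below are hypothetical and assume failures and recoveries behave like long-run averages.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to recovery."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service: fails about every 500 hours, takes 4 hours to recover.
baseline = availability(500, 4)
faster_recovery = availability(500, 2)   # halve the recovery time
fewer_failures = availability(1000, 4)   # double the time between failures

for label, value in [("baseline", baseline),
                     ("faster recovery", faster_recovery),
                     ("fewer failures", fewer_failures)]:
    print(f"{label:>16}: {value:.4%}")
```

In this scenario, halving recovery time buys the same availability as doubling the time between failures, and the former is often the cheaper change to make.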

25. The Single Point of Failure Rule

Anything critical with no redundancy will eventually become a problem.

In plain English	In IT
One fragile dependency can bring down the whole system.	A single server, person, vendor, admin account, DNS provider, or undocumented script can become a major risk.

How to apply it

Identify and remove single points of failure across technology, people, vendors, and process.

Security and Risk Laws

26. Principle of Least Privilege

Give users, systems, and processes only the access they need.

In plain English	In IT
Access should be minimal by default.	A read-only service should not have admin database credentials.

How to apply it

Use role-based access, scoped tokens, short-lived credentials, and regular access reviews.

27. Defense in Depth

Use multiple layers of protection.

In plain English	In IT
No single control is enough.	MFA, endpoint protection, segmentation, patching, backups, logging, and least privilege work together.

How to apply it

Assume one layer can fail. Design additional layers that limit blast radius and improve detection.

28. Zero Trust Principle

Never trust automatically; continuously verify.

In plain English	In IT
Network location alone should not imply trust.	Internal systems should still authenticate, authorize, log, and validate requests.

How to apply it

Verify identity, device posture, access scope, and context. Treat internal networks as potentially hostile.

29. Schneier’s Law

Anyone can invent a security system they personally cannot break.

In plain English	In IT
Security designs need external scrutiny.	Homegrown crypto, custom auth, and clever access schemes are usually dangerous.

How to apply it

Use proven standards, peer review, threat modeling, and independent security testing.

30. Chesterton’s Fence

Do not remove a rule, process, or system until you understand why it exists.

In plain English	In IT
Some ugly legacy things exist for real reasons.	That strange firewall rule, cron job, or manual approval may be protecting against a forgotten failure mode.

How to apply it

Investigate before deleting. Replace legacy controls intentionally, not casually.

Data and Product Laws

31. Garbage In, Garbage Out

Bad input produces bad output.

In plain English	In IT
Poor data quality ruins downstream systems.	Analytics, reports, automation, machine learning, and AI tools all fail when source data is wrong.

How to apply it

Validate inputs, define ownership, monitor quality, and fix data at the source where possible.

32. The Map Is Not the Territory

Models, dashboards, and diagrams are simplifications of reality.

In plain English	In IT
Representations are useful but incomplete.	Architecture diagrams, monitoring dashboards, roadmaps, and metrics can omit critical reality.

How to apply it

Use dashboards and models as aids, not truth itself. Validate them against production behavior and user experience.

33. Technical Debt Accrues Interest

Shortcuts create future costs.

In plain English	In IT
A quick fix today can slow every future change.	Missing tests, poor naming, brittle deployments, weak documentation, and rushed schemas compound over time.

How to apply it

Track debt explicitly. Pay down debt when it slows delivery, increases risk, or blocks important work.

34. The Law of Leaky Abstractions

All non-trivial abstractions eventually expose details they were meant to hide.

In plain English	In IT
Abstractions simplify work until they leak.	ORMs expose SQL performance issues; cloud services expose networking limits; APIs expose data model assumptions.

How to apply it

Learn the layer beneath your abstraction. Debugging often requires understanding what the abstraction hides.

35. The Principle of Locality

Related things should be close together.

In plain English	In IT
Systems are easier to understand when related logic, ownership, and data are near each other.	Scattered config, split ownership, and hidden dependencies increase cognitive load.

How to apply it

Group related code, documentation, alerts, runbooks, and ownership. Reduce unnecessary distance between cause and effect.

Decision-Making Razors

36. The Reversibility Principle

Make reversible decisions quickly and irreversible decisions carefully.

In plain English	In IT
Not every decision deserves the same weight.	A UI copy change and a database partitioning strategy should not go through the same process.

How to apply it

Classify decisions by reversibility. Move fast on reversible choices; slow down for one-way doors.

37. The Blast Radius Principle

Design changes so failure affects the smallest reasonable area.

In plain English	In IT
Fail small instead of failing everywhere.	Use feature flags, canaries, phased rollouts, tenant isolation, and scoped permissions.

How to apply it

Before shipping, ask: “If this fails, who is affected, how badly, and how quickly can we stop it?”
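
A percentage rollout is one way to keep the blast radius small. The sketch below buckets users with a stable hash so the same user always gets the same answer; the flag name, the user IDs, and the `in_rollout` helper are hypothetical.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Decide deterministically whether a user falls inside a percentage rollout."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the flag together with the user so each flag gets its own bucketing.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percent


# Hypothetical rollout: expose a new checkout flow to 10% of users first.
users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if in_rollout("new-checkout", u, 10)]
print(f"{len(exposed)} of {len(users)} users see the change")
```

Because a user's bucket never changes, raising the percentage only adds users, and setting it to 0 acts as a stop switch.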

38. The Second System Effect

The second version of a system is often overdesigned.

In plain English	In IT
After living with a simple first system, teams may overcorrect and build an overly ambitious replacement.	A modest internal tool gets replaced by a grand platform that tries to solve every possible future problem.

How to apply it

Be careful with rewrites. Replace systems incrementally where possible and keep scope grounded in actual needs.

39. The Bus Factor

A project's bus factor is the number of people who would have to disappear before it is in trouble. The lower the number, the higher the risk.

In plain English	In IT
Critical knowledge held by one person is a risk.	One engineer knows the deployment process, one admin owns DNS, or one analyst understands billing data.

How to apply it

Document critical systems, cross-train, rotate support, and avoid single-person ownership of essential knowledge.

40. The Principle of Boring Technology

Prefer boring, proven technology for critical systems unless novelty creates clear value.

In plain English	In IT
New technology has hidden costs.	Hiring, debugging, security, operations, and vendor maturity matter as much as features.

How to apply it

Spend innovation tokens carefully. Use novelty where it matters; use boring tools where reliability matters more.

Quick Reference Table

Law / Principle	Main Lesson	Watch For
Pareto Principle	Focus on the vital few.	Hotspots, noisy systems, high-impact users
Parkinson’s Law	Timeboxes shape work.	Vague deadlines, scope creep
Hofstadter’s Law	Work takes longer than expected.	Hidden complexity
Murphy’s Law	Fragile assumptions fail.	Untested recovery paths
Conway’s Law	Architecture mirrors organization.	Team boundaries causing system seams
Brooks’s Law	More people can slow late projects.	Onboarding and coordination costs
Goodhart’s Law	Metrics distort behavior.	Ticket-count theater
Campbell’s Law	High-stakes metrics get gamed.	Reclassification and reporting games
Occam’s Razor	Start with the simplest explanation.	Overcomplicated diagnosis
Hanlon’s Razor	Blame systems before people.	Process gaps disguised as negligence
Gall’s Law	Working complexity evolves from simplicity.	Big-bang platform efforts
Ninety-Ninety Rule	Last-mile work is large.	Demo-ready mistaken for done
Knuth’s Warning	Optimize after measuring.	Premature scaling and abstraction
Postel’s Law	Send strict, receive tolerant.	Compatibility vs. hidden bugs
Linus’s Law	Visibility helps find defects.	Closed systems and weak review
Metcalfe’s Law	Networks gain value with adoption.	Platform adoption barriers
Amdahl’s Law	Bottlenecks limit speedup.	Serial constraints
Little’s Law	Too much WIP slows flow.	Long queues and multitasking
CAP Theorem	Distributed systems trade off guarantees.	Consistency vs. availability choices
Distributed Computing Fallacies	Networks fail in many ways.	Timeouts, retries, partitions
Least Privilege	Minimize access.	Overbroad permissions
Defense in Depth	Layer protections.	Single-control security models
Technical Debt	Shortcuts accrue interest.	Slow changes and fragile code
Chesterton’s Fence	Understand before removing.	Deleting legacy safeguards blindly
Garbage In, Garbage Out	Data quality matters.	Bad source data
Bus Factor	Shared knowledge reduces risk.	One-person dependencies
Boring Technology	Proven tools reduce operational risk.	Trend-driven architecture

Practical Prompts for Reviews

Use these questions during planning, architecture reviews, incident reviews, and retrospectives.

Project Planning

What is the 20% of work likely to produce 80% of the value?
Where are we assuming something is simple without evidence?
What hidden last-mile work is not in the estimate?
Are we adding people where we actually need scope reduction or better sequencing?

System Design

What is the single point of failure?
What happens during partial failure or network partition?
What is the serial bottleneck?
Is this architecture solving today’s problem or an imagined future problem?
Are we choosing boring technology where reliability matters?

Operations

How fast can we detect, stop, roll back, or recover from failure?
Which recurring manual tasks are toil?
Is ownership clear for alerts, dashboards, runbooks, and escalation?
Are we testing the recovery path or just assuming it works?

Security

Does every identity have only the access it needs?
What happens if one layer of defense fails?
Are we using proven security patterns or inventing our own?
Are we removing a control before understanding why it exists?

Metrics and Management

Could this metric be gamed?
What behavior will this incentive create?
Are we measuring activity or outcomes?
What important reality is missing from the dashboard?

The Short Canon

When in doubt, remember these:

80/20: Find the vital few.
Parkinson: Work fills the container you give it.
Hofstadter: It will take longer than expected.
Murphy: Untested assumptions fail.
Conway: Architecture follows communication.
Brooks: More people can make late work later.
Goodhart: Metrics become games.
Occam: Start with the simple explanation.
Gall: Working complexity grows from working simplicity.
Amdahl: Bottlenecks cap speed.
Little: Too much WIP slows everything.
CAP: Distributed systems force tradeoffs.
Least Privilege: Give only needed access.
Defense in Depth: No single control is enough.
Technical Debt: Shortcuts charge interest.
Chesterton’s Fence: Understand before removing.
Garbage In, Garbage Out: Bad data poisons everything.
Bus Factor: One-person knowledge is risk.
Boring Technology: Reliability loves boring tools.
Blast Radius: Make failures small.

Final Thought

The best engineers, operators, architects, and technology leaders do not memorize these as trivia. They use them as pattern detectors.

When a project is late, a system is fragile, a dashboard looks suspicious, or an architecture feels overcomplicated, one of these laws is usually whispering the answer.