How to do a content inventory (and why it’s the first step to growth)
I’ve seen it happen in almost every marketing team I’ve worked with: we push hard to publish weekly, hitting our deadlines perfectly, while hundreds of older pages quietly decay in the background. It’s a silent performance killer. You end up with a site where 20% of the content drives 90% of the value, and the rest is just “unknown content debt”—pages that might be cannibalizing your keywords, confusing your users, or wasting your crawl budget.
Most businesses honestly don’t know exactly what they have published over the last five years. They have a vague idea, but no definitive list. That’s where a content inventory comes in.
For small teams, marketers, and founders, a content inventory isn’t just a cleanup task; it is the operational baseline for growth. It stops you from flying blind. By the end of this guide, you will have a clear, repeatable workflow to turn a messy website into a structured dataset. We will cover the specific template fields you need, the tools that actually work (without breaking the bank), and a prioritization method to decide what to fix first.
Here is the reality: you cannot audit what you haven’t inventoried. Before you commit to “publishing more,” you need to see the map.
Quick answer: what I mean by a “content inventory”
Let’s get the definitions straight so you can explain this to your stakeholders clearly.
- What it is: A comprehensive, quantitative catalog of every asset on your website (URLs, PDFs, images) alongside their technical attributes.
- What it isn’t: It is not a quality assessment. That comes later, during the audit phase.
- The outcome: A single, filterable spreadsheet or database that acts as your source of truth for every decision you make next.
Content inventory vs. content audit: what’s the difference (and why it matters)
When I run inventories, I treat the process as data collection, not judgment. Confusion between “inventory” and “audit” is the number one reason these projects stall. Teams try to rewrite headlines while they are still trying to find all their URLs. That is a recipe for burnout.
The Content Inventory is quantitative. It asks: What exists? Where is it? When was it published?
The Content Audit is qualitative. It asks: Is this good? Is it accurate? Does it convert?
If you skip the inventory and jump straight to auditing, you miss the structural problems. I once worked with a business that had a blog of about 300 posts. They were busy rewriting old articles one by one. When we finally ran a proper inventory, we found they had three separate articles targeting “best CRM for small business,” all competing with each other. They didn’t need to rewrite; they needed to delete and merge. Without the inventory, they were just polishing a problem.
What a content inventory captures (minimum viable fields)
If you only track a handful of fields, make sure they are these. This is the data that allows you to make decisions later.
- URL: The exact address of the page.
- Page Title and H1: To check for duplicates or unclear naming.
- Meta Description: To see if it’s missing or automated.
- Content Type: Is it a blog post, landing page, or documentation?
- Author: Crucial for establishing E-E-A-T and updating bios.
- Publication/Update Date: To spot content decay immediately.
- Status Code: Is it live (200), redirected (301), or broken (404)?
Before I start: goals, scope, and a simple inventory template
I’d rather start with a lean template and add fields later than build a perfect “monster spreadsheet” that nobody maintains. Before you crawl a single URL, you need to define your boundaries.
First, ask yourself what the business goal is. Are you trying to drive more leads? Are you cleaning up after a migration? Or are you just trying to stop traffic from sliding backward? If your goal is leads, your inventory must include conversion data. If it’s traffic, you need GSC impressions.
Second, pick your tool. For most intermediate needs, Google Sheets or Airtable is perfect. They allow for easy filtering and collaboration. Below is the column structure I recommend for a standard B2B content inventory.
| Column Name | Why you need it | Source |
|---|---|---|
| URL | The unique identifier for the row. | Crawler |
| Page Title | Quick context on what the page is about. | Crawler |
| Content Type | Categorization (Blog, Product, Support). | Manual / Regex |
| Publication Date | Identifies aging content. | Crawler / CMS |
| Status Code | Filters out broken links or redirects. | Crawler |
| Word Count | Helps identify “thin” content. | Crawler |
| Organic Traffic (L3M) | Performance baseline (Last 3 Months). | GA4 |
| Primary Keyword | What the page should rank for. | Manual / GSC |
| Action | The decision (Keep, Update, Merge, Redirect, Delete). | Manual Decision |
Decide the scope: what to include (and what to skip for now)
If you have thousands of pages and only two hours, don’t try to boil the ocean. Inventory the directories that drive the most revenue first.
- Determine the domain boundaries: Are you including subdomains (e.g., help.yoursite.com) or just the main root domain?
- Handle file types: Do you need to track PDFs? They often rank in search but are hard to update. I usually exclude images/JS files from my main content sheet to keep it clean.
- Manage canonicalization: Only inventory the canonical version of a URL. If you have /product?color=red, exclude it. You only want the master page.
- Staging & Test pages: Exclude these immediately. They are noise.
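If you want to apply these scope rules the same way every time, you can sketch them as a small Python filter over the crawled URLs. This is a minimal sketch and not a feature of any crawler. The host allowlist, staging path markers, and excluded extensions are hypothetical examples. Dropping every URL with a query string is a blunt stand-in for checking each page's canonical tag.

```python
from urllib.parse import urlsplit

# Hypothetical scope settings; replace with your own domains and rules.
ALLOWED_HOSTS = {"yoursite.com", "www.yoursite.com", "help.yoursite.com"}
EXCLUDED_PATH_MARKERS = ("/staging/", "/test/")
EXCLUDED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg", ".js", ".css")


def in_scope(url: str) -> bool:
    """Return True if a crawled URL belongs in the main content sheet."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if parts.hostname not in ALLOWED_HOSTS:
        return False  # outside the chosen domain boundaries
    if parts.query:
        return False  # parameter variants such as /product?color=red (rough canonical proxy)
    if any(marker in path for marker in EXCLUDED_PATH_MARKERS):
        return False  # staging and test pages are noise
    if path.endswith(EXCLUDED_EXTENSIONS):
        return False  # images/JS/CSS stay out; PDFs are kept
    return True


crawled = [
    "https://yoursite.com/blog/crm-guide/",
    "https://yoursite.com/product?color=red",
    "https://yoursite.com/staging/new-home/",
    "https://yoursite.com/assets/logo.png",
    "https://yoursite.com/whitepaper.pdf",
]
print([u for u in crawled if in_scope(u)])
```

If your site serves real pages through query strings, check canonical tags instead of relying on the query-string rule.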
How to do a content inventory: my step-by-step workflow (crawl → enrich → validate)
This is the core workflow I use. It moves from raw data collection to a validated dataset. It prevents the “analysis paralysis” that happens when you stare at a blank sheet.
The Workflow: Crawl → Clean → Categorize → Enrich → QA → Prioritize
Keep in mind: The inventory comes before the audit. We are building the list right now, not fixing the content yet.
Step 1: Crawl your site to collect URLs + metadata
If crawling sounds scary, don’t worry—you are mostly just clicking “Start” and exporting a CSV. I use Screaming Frog SEO Spider for this. It is the industry standard for a reason. You can download it for free (up to 500 URLs), which covers many small business sites.
Configure the crawler to respect your robots.txt but also check the box to crawl “Images” and “PDFs” if those are in your scope. Once it finishes, export the data. You want the “Internal HTML” report. This gives you the Status Code, H1, Title, Meta Description, and Word Count.
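Once you have the export, it helps to cut it down to the minimum viable fields before doing anything else. The sketch below uses Python's built-in csv module. The header names in COLUMN_MAP are my assumption about what a Screaming Frog "Internal" export typically contains, so check them against your own file first.

```python
import csv
import io

# Assumed header names from a Screaming Frog "Internal" export; verify them
# against your own file, since names can differ between versions.
COLUMN_MAP = {
    "Address": "url",
    "Status Code": "status_code",
    "Title 1": "title",
    "H1-1": "h1",
    "Meta Description 1": "meta_description",
    "Word Count": "word_count",
}


def load_crawl(csv_file) -> list[dict]:
    """Keep only the minimum viable fields, renamed to short keys."""
    rows = []
    for record in csv.DictReader(csv_file):
        row = {new: record.get(old, "") for old, new in COLUMN_MAP.items()}
        # Missing or blank numbers become 0 so later filters don't crash.
        row["status_code"] = int(row["status_code"] or 0)
        row["word_count"] = int(row["word_count"] or 0)
        rows.append(row)
    return rows


# Tiny inline sample standing in for the exported file.
sample = io.StringIO(
    "Address,Status Code,Title 1,H1-1,Meta Description 1,Word Count\n"
    "https://yoursite.com/blog/crm-guide/,200,CRM Guide,How to choose a CRM,,1450\n"
    "https://yoursite.com/old-promo/,404,,,,0\n"
)
inventory = load_crawl(sample)
print(inventory[0]["title"], inventory[1]["status_code"])  # CRM Guide 404
```

For a real export, open the file with open(path, newline="", encoding="utf-8-sig") and pass the file object to load_crawl. The utf-8-sig encoding ignores a leading byte-order mark if one is present.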
Real-world tip: Every time I run a crawl for a client, I discover something weird—like an old “Summer 2019” campaign subfolder that is still live. Don’t be surprised if you find pages you thought were deleted years ago.
Step 2: Clean and normalize the dataset (so it doesn’t lie to you)
Messy inputs create messy strategy. Before you analyze, you need to clean the spreadsheet.
- Deduping: Remove duplicate URLs. If you have www.site.com and site.com showing up, standardize them.
- Trailing Slashes: Treat /blog/post and /blog/post/ as the same row for now, or ensure your site forces one version.
- Filter Status Codes: I keep every URL that isn’t a live 200 in a separate tab. You need to know about 404s (broken links) and 301s (redirects), but they shouldn’t clutter your main analysis view.
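You can script these cleanup rules so they run identically every quarter. This is a minimal sketch. It assumes https is your preferred scheme and the non-www host is your preferred host, so flip these if your site forces the opposite. Rows are assumed to carry the url and status_code keys from the crawl export.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse www/non-www, http/https, and trailing-slash variants into one key."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").removeprefix("www.")  # assumption: non-www preferred
    path = parts.path.rstrip("/") or "/"                # /blog/post/ -> /blog/post
    # Assumption: https is the preferred scheme; query strings are kept as-is.
    return urlunsplit(("https", host, path, parts.query, ""))


def clean(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split non-200 rows into their own list, then dedupe the live rows."""
    live, other, seen = [], [], set()
    for row in rows:
        if row["status_code"] != 200:
            other.append(row)  # 301s/404s go to a separate tab, unmodified
            continue
        key = normalize_url(row["url"])
        if key not in seen:    # keep the first live row for each page
            seen.add(key)
            live.append({**row, "url": key})
    return live, other


rows = [
    {"url": "https://www.site.com/blog/post/", "status_code": 200},
    {"url": "http://site.com/blog/post", "status_code": 200},
    {"url": "https://site.com/old-page", "status_code": 404},
]
live, other = clean(rows)
print([r["url"] for r in live], [r["url"] for r in other])
```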
Step 3: Categorize content by type, intent, and topic
This is where manual work usually starts, although you can automate some of it with URL patterns. You need to tag what each row actually is.
I typically start with 6–10 topic tags max and expand only when I see patterns. A B2B list might look like:
- Type: Blog Post, Case Study, Product Page, Pricing, Support Doc.
- Funnel Stage: Awareness, Consideration, Decision.
- Topic Cluster: e.g., “SEO Tools,” “Content Marketing,” “Agency Growth.”
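Content Type is the easiest of these to automate, because it usually follows the URL structure. Here is a sketch of regex-based tagging. The URL patterns are hypothetical and need to match your own site. Funnel stage and topic cluster usually still need a human.

```python
import re

# Hypothetical URL patterns; adjust them to match your own site structure.
TYPE_RULES = [
    (re.compile(r"/blog/"), "Blog Post"),
    (re.compile(r"/case-stud(y|ies)/"), "Case Study"),
    (re.compile(r"/pricing/?$"), "Pricing"),
    (re.compile(r"/(docs|help|support)/"), "Support Doc"),
    (re.compile(r"/products?/"), "Product Page"),
]


def content_type(url: str) -> str:
    """Return the first matching type, or flag the row for manual tagging."""
    for pattern, label in TYPE_RULES:
        if pattern.search(url):
            return label
    return "Needs manual tag"


for url in [
    "https://site.com/blog/crm-guide",
    "https://site.com/case-study/acme",
    "https://site.com/pricing",
    "https://site.com/about",
]:
    print(url, "->", content_type(url))
```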
Step 4: Enrich your inventory with performance data (GSC + GA4)
A list of URLs is static; data makes it dynamic. I rely on two main sources here.
From Google Search Console (GSC), I pull Clicks, Impressions, and Average Position. This tells me if Google likes the content.
From GA4, I pull Sessions and Engagement Rate. This tells me if humans like the content.
I usually pull data for the “Last 12 Months” (to see long-term value) and the “Last 3 Months” (to see recent trends). If GA4 is messy or untracked, which happens often, I stick to GSC data because it’s a direct link between query and page.
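Joining performance data onto the inventory is just a lookup by URL. The sketch below assumes a Search Console Performance report exported by page, with the columns "Top pages", "Clicks", "Impressions", and "Position". Verify those headers in your own export. The match key here only trims trailing slashes, so reuse whatever normalization you applied in Step 2.

```python
import csv
import io


def url_key(url: str) -> str:
    # Simplified match key; swap in your Step 2 normalization if you have one.
    return url.strip().rstrip("/")


def enrich(inventory: list[dict], gsc_csv, window: str) -> None:
    """Attach GSC clicks/impressions/position for one date window, in place.

    Assumes a GSC Performance export by page with the columns
    "Top pages", "Clicks", "Impressions", and "Position".
    """
    metrics = {url_key(r["Top pages"]): r for r in csv.DictReader(gsc_csv)}
    for row in inventory:
        m = metrics.get(url_key(row["url"]))
        # Pages with no GSC row get zero clicks/impressions and no position.
        row[f"clicks_{window}"] = int(m["Clicks"]) if m else 0
        row[f"impressions_{window}"] = int(m["Impressions"]) if m else 0
        row[f"position_{window}"] = float(m["Position"]) if m else None


inventory = [{"url": "https://site.com/blog/crm-guide/"}, {"url": "https://site.com/about/"}]
last_12m = io.StringIO(
    "Top pages,Clicks,Impressions,CTR,Position\n"
    "https://site.com/blog/crm-guide/,840,21000,4%,7.2\n"
)
enrich(inventory, last_12m, "12m")
print(inventory[0]["clicks_12m"], inventory[1]["clicks_12m"])  # 840 0
```

Run it twice: once with the 12-month export (window "12m") and once with the 3-month export (window "3m"). That puts both windows side by side in the same row.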
Step 5: QA and validation checks (fast but high-leverage)
Before presenting this data to a boss or client, I always do a sanity check. These checks keep me from presenting bad data to stakeholders.
- Check Top Performers: Verify that your top 20 traffic pages from GSC are actually in your crawl list. (Sometimes crawlers miss orphaned pages.)
- Spot Check Titles: Are H1s missing? Are meta descriptions duplicated across 50 pages?
- Thin Content: Filter for word count < 300. Is this a valid page or a mistake?
- Indexability: Ensure your priority pages aren’t accidentally tagged “noindex.”
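These checks are quick to script once the data from the earlier steps is in one place. The sketch makes three assumptions:

- Each row has the hypothetical keys url, h1, meta_description, word_count, and a boolean noindex.
- You pass in a list of your top GSC pages.
- You pass in a set of URLs you consider priority pages.

```python
from collections import Counter


def qa_report(inventory: list[dict], top_gsc_urls: list[str], priority_urls: set[str]) -> dict:
    """Run the sanity checks before the sheet goes to stakeholders.

    Row keys (url, h1, meta_description, word_count, noindex) are hypothetical.
    """
    crawled = {row["url"] for row in inventory}
    meta_counts = Counter(r["meta_description"] for r in inventory if r["meta_description"])
    return {
        # Top GSC pages the crawler never found: likely orphaned pages.
        "missing_from_crawl": [u for u in top_gsc_urls if u not in crawled],
        "missing_h1": [r["url"] for r in inventory if not r["h1"]],
        "duplicate_meta": [m for m, n in meta_counts.items() if n > 1],
        "thin_content": [r["url"] for r in inventory if r["word_count"] < 300],
        "priority_noindex": [
            r["url"] for r in inventory if r["url"] in priority_urls and r["noindex"]
        ],
    }


inventory = [
    {"url": "https://site.com/a", "h1": "A", "meta_description": "Same text",
     "word_count": 1200, "noindex": False},
    {"url": "https://site.com/b", "h1": "", "meta_description": "Same text",
     "word_count": 150, "noindex": True},
]
report = qa_report(
    inventory,
    top_gsc_urls=["https://site.com/a", "https://site.com/orphan"],
    priority_urls={"https://site.com/b"},
)
for check, hits in report.items():
    print(check, hits)
```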
Tools that make content inventories faster (and what each one is best for)
Tools support the process; they don’t replace decisions. The “best” tool is honestly the one your team will actually use every quarter. If you have a massive budget but no time to learn complex software, a simple spreadsheet is better than an expensive dashboard you ignore.
However, once you have your inventory, you need to operationalize it. Turning inventory insights into a consistent publishing system is where platforms like Kalema can help, allowing you to move from analysis to production seamlessly. But for the analysis itself, here is my go-to stack.
Tool comparison table: crawler vs optimizer vs strategy platform vs spreadsheet
| Tool | Primary Use | Best For | Limitations |
|---|---|---|---|
| Screaming Frog | Technical Crawling | Getting the raw data (URL, Metadata, Status). | Steep learning curve; data is static CSV. |
| Google Sheets / Airtable | Database & Collaboration | Filtering, sorting, and manual tagging. | Requires manual data entry/import. |
| Surfer / Clearscope | Content Optimization | Scoring quality of individual pages. | Not built for site-wide inventory management. |
| MarketMuse | Strategy & Authority | Identifying gaps and topical authority. | Higher price point; overkill for small sites. |
From inventory to action: ROT analysis + a simple prioritization system
Now that you have a spreadsheet, what do you do with it? This is where we apply ROT Analysis. It stands for Redundant, Outdated, and Trivial, and it is a widely used framework for cleaning up content bloat.
By identifying ROT, you free up crawl budget and improve user experience. Once you’ve marked your content, you can operationalize the updates. For instance, if you identify 50 pages that need refreshing, you can use an automated blog generator to assist in drafting the updates, keeping your editorial calendar moving while you focus on strategy.
ROT definitions (with beginner-friendly examples)
- Redundant: Content that duplicates other pages. Example: You have “How to choose a CRM” and “Guide to selecting a CRM” as two separate posts.
- Outdated: Content that is factually wrong or stale. Example: A post about “Best Marketing Trends of 2018” or a product page for a discontinued item.
- Trivial: Content that offers no business value. Example: A 100-word announcement about an office picnic from three years ago.
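You can automate a first pass at ROT flags and have a person confirm each one. This sketch uses simple proxies that are my assumptions, not part of the ROT framework itself. Pages that share a primary keyword are Redundant candidates. Pages not updated in about two years are Outdated candidates. Short pages with zero clicks are Trivial candidates. The thresholds are arbitrary defaults.

```python
from collections import defaultdict
from datetime import date


def flag_rot(inventory: list[dict], today: date, stale_after_days: int = 730,
             trivial_words: int = 300) -> None:
    """Add a 'rot' list of candidate flags to each row, in place.

    Thresholds are hypothetical defaults; tune them per site.
    """
    by_keyword = defaultdict(list)
    for row in inventory:
        if row.get("primary_keyword"):
            by_keyword[row["primary_keyword"].lower()].append(row)

    for row in inventory:
        flags = []
        kw = (row.get("primary_keyword") or "").lower()
        if kw and len(by_keyword[kw]) > 1:
            flags.append("Redundant")  # several pages target the same keyword
        if (today - row["last_updated"]).days > stale_after_days:
            flags.append("Outdated")   # age is only a proxy; confirm by reading
        if row["word_count"] < trivial_words and row["clicks_12m"] == 0:
            flags.append("Trivial")
        row["rot"] = flags


inventory = [
    {"url": "/how-to-choose-a-crm", "primary_keyword": "best CRM for small business",
     "last_updated": date(2019, 3, 1), "word_count": 1800, "clicks_12m": 120},
    {"url": "/guide-to-selecting-a-crm", "primary_keyword": "best crm for small business",
     "last_updated": date(2024, 1, 10), "word_count": 2100, "clicks_12m": 40},
    {"url": "/office-picnic", "primary_keyword": "",
     "last_updated": date(2021, 6, 1), "word_count": 100, "clicks_12m": 0},
]
flag_rot(inventory, today=date(2025, 1, 1))
for row in inventory:
    print(row["url"], row["rot"])
```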
Decision table: keep, update, merge, redirect, or remove
For every URL in your inventory, assign one of these actions. My rule: if a page gets steady clicks, I try updating before deleting.
| Condition | Recommended Action | Notes |
|---|---|---|
| High Traffic, High Accuracy | Keep | Do nothing. It works. |
| High Traffic, Low Accuracy (Outdated) | Update | Refresh stats, add new sections, keep URL same. |
| Multiple pages on same topic (Redundant) | Merge | Combine best parts into one strong page; 301 redirect the rest. |
| Zero Traffic, Zero Value (Trivial) | Delete (410/404) | Check backlinks first! If no links, delete. |
| Zero Traffic, Has Backlinks | Redirect (301) | Redirect to the most relevant parent category. |
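The decision table translates into a small function. This is a sketch only, under these assumptions:

- The field names (clicks_12m, accurate, redundant, backlinks) are hypothetical.
- The accuracy flag comes from a manual review.
- The 100-click cutoff for “high traffic” is an arbitrary example.

Pages that fall between the table’s rows go back for a human decision.

```python
def recommend_action(row: dict, high_traffic: int = 100) -> str:
    """Map one inventory row to Keep, Update, Merge, Redirect, or Delete.

    Hypothetical fields: clicks_12m, accurate (bool, from a manual review),
    redundant (bool), and backlinks (count from a backlink checker).
    """
    if row["redundant"]:
        return "Merge"  # combine into the strongest page; 301 the rest
    if row["clicks_12m"] >= high_traffic:
        return "Keep" if row["accurate"] else "Update"
    if row["clicks_12m"] == 0:
        # Never delete a page that still earns links; redirect it instead.
        return "Redirect" if row["backlinks"] > 0 else "Delete"
    return "Review manually"  # between the table's cases: needs a judgment call


print(recommend_action({"redundant": False, "clicks_12m": 0, "accurate": True, "backlinks": 4}))
```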
Modern inventory upgrades: structure, schema, and voice/AI-driven search considerations
The SEO landscape is shifting. Voice queries tend to be longer and more conversational than typed queries, which usually run just a few words. One widely cited study also found that the average answer returned for a voice search is about 29 words long. Smart speaker adoption has been widely forecast to keep growing. This means your inventory needs to account for machine readability, not just keywords.
You don’t need to predict the future—just make your content easier to parse. When I audit for modern search, I look for Structured Data. If your content is structured with clear headings, lists, and Schema markup, it has a much better chance of being picked up by voice assistants and AI overviews.
What to track in your inventory for structured content (fields I add)
If I’m short on time, I only add three extra columns, but they are powerful:
- Schema Present (Y/N): Does this page have FAQ, HowTo, or Article schema?
- Content Format: Is this a “Listicle,” “Guide,” or “Comparison”?
- Snippet Opportunity: Does the page answer a direct question (e.g., “What is X?”) in the first 100 words?
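Filling in Schema Present by hand gets slow on large sites. One way to fill that column is to look for JSON-LD blocks in each page’s HTML. This sketch uses only Python’s standard library. It assumes your schema is delivered as JSON-LD, so it ignores microdata and RDFa. It only recognizes the types listed in the hypothetical WANTED_TYPES set.

```python
import json
from html.parser import HTMLParser

# Article subtypes such as BlogPosting are listed explicitly; this sketch does
# not walk the schema.org type hierarchy.
WANTED_TYPES = {"FAQPage", "HowTo", "Article", "BlogPosting"}


class JsonLdTypes(HTMLParser):
    """Collect @type values from <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.buffer = None  # text of the JSON-LD script currently being read
        self.types = set()

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.buffer = []

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.buffer is not None:
            self._collect("".join(self.buffer))
            self.buffer = None

    def _collect(self, text):
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            return  # malformed JSON-LD; worth a separate QA flag
        if isinstance(payload, dict) and "@graph" in payload:
            payload = payload["@graph"]
        for item in payload if isinstance(payload, list) else [payload]:
            kind = item.get("@type") if isinstance(item, dict) else None
            kinds = kind if isinstance(kind, list) else [kind]
            self.types.update(k for k in kinds if isinstance(k, str))


def schema_present(html: str) -> str:
    """Return "Y" if the page declares FAQ, HowTo, or Article-type JSON-LD."""
    parser = JsonLdTypes()
    parser.feed(html)
    parser.close()
    return "Y" if parser.types & WANTED_TYPES else "N"


faq_page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}'
    "</script></head><body><h1>What is X?</h1></body></html>"
)
print(schema_present(faq_page), schema_present("<p>No markup</p>"))  # Y N
```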
Common content inventory mistakes (and how I fix them)
I made plenty of mistakes when I started doing inventories. The most common one was trying to do everything manually and burning out. Another was trusting the data blindly without checking the date ranges.
Once you avoid these pitfalls and have a clean inventory, you can start scaling your content production. Many teams use an AI article generator to help draft the refreshes for the “Update” pile, ensuring they maintain quality while moving fast.
Mistake-to-fix checklist (copy/paste)
- Mistake: Inventorying only blog posts.
  Fix: Include landing pages, product pages, and support docs. They all compete for attention.
- Mistake: Ignoring date ranges.
  Fix: Always pull “Last 12 Months” and “Last 3 Months” to spot trends (spikes vs. steady decline).
- Mistake: Not checking Backlinks before deleting.
  Fix: Always run a URL through a backlink checker before hitting delete. You don’t want to destroy your domain authority.
- Mistake: Inconsistent Tagging.
  Fix: Use a dropdown menu in your spreadsheet for “Topic” so you don’t end up with “SEO,” “S.E.O.,” and “Search Engine Opt” as different tags.
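Tags may have been typed freely before you added the dropdown. If so, a lookup table can fold the existing variants into one spelling. This is a sketch with a hypothetical vocabulary. Anything it doesn’t recognize is flagged rather than guessed.

```python
import re

# Hypothetical controlled vocabulary: canonical tag -> known spelling variants.
TOPIC_TAGS = {
    "SEO": {"seo", "s.e.o.", "search engine opt", "search engine optimization"},
    "Content Marketing": {"content marketing", "content mktg"},
}


def canonical_topic(raw: str) -> str:
    """Map a free-typed tag to its canonical form, or flag it for review."""
    cleaned = re.sub(r"\s+", " ", raw.strip().lower())  # collapse stray spaces
    for canonical, variants in TOPIC_TAGS.items():
        if cleaned == canonical.lower() or cleaned in variants:
            return canonical
    return f"UNMAPPED: {raw.strip()}"


print([canonical_topic(t) for t in ["SEO", "S.E.O.", "Search Engine Opt", "Agency growth"]])
```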
FAQs + my recommended cadence (plus next steps)
What’s the difference between a content inventory and a content audit?
Think of it like a warehouse. The inventory is the list on the clipboard that says, “We have 50 boxes of widgets.” The audit is opening the boxes to see if the widgets are broken, rusty, or ready to sell. Inventory is quantitative (the list); audit is qualitative (the value).
Which tools are most effective for doing a content inventory?
For a small business, I would personally choose Screaming Frog to crawl and Google Sheets to analyze. It’s cheap (or free) and universally understood. If you need advanced scoring for content gaps, tools like MarketMuse or Clearscope are great add-ons, but they don’t replace the fundamental crawl data.
How often should I conduct a content inventory?
A common recommendation is to conduct a full inventory every 3 to 6 months. If you are a high-volume publisher, you might do a rolling inventory (updating the sheet monthly). At a minimum, do it annually. If you wait longer than a year, the “ROT” builds up so much that the cleanup becomes a massive project.
What is ROT and how does it impact content inventory decisions?
ROT stands for Redundant, Outdated, and Trivial. Identifying ROT is the quickest way to improve your site’s health. By removing or merging these pages, you consolidate your authority and make your site easier for Google to crawl.
Why is structuring content important in modern inventories?
Structure is how you make your expertise legible to both humans and machines. With the rise of voice search and generative AI, having clear Schema markup and structured headings (H2s, H3s) is critical. It helps your content surface in direct answers, not just blue links.
Recap and Next Steps:
- Start Small: Don’t try to inventory the whole internet. Start with your main domain and critical content types.
- Crawl & Clean: Use a tool to get the URLs, then clean up the data (dedupe, normalize).
- Prioritize with ROT: Tag every URL as Keep, Update, Merge, Redirect, or Delete.
A content inventory isn’t a one-time punishment; it’s a cycle of improvement. Start with the smallest inventory you can maintain, build your list, and turn that unknown debt into a strategic asset.




