In 2001, Tim Berners-Lee, James Hendler, and Ora Lassila published an article in Scientific American outlining a vision called the Semantic Web - a version of the web where data would not just be readable by humans but understandable by machines. Berners-Lee described it as a web of data that software agents could process directly, draw conclusions from, and act on without requiring a human to interpret each piece. More than two decades later, that vision is still largely unrealized - not because the technology failed, but because most systems never bothered to define what their information actually means.
You are working on information architecture because someone needs things to be findable at scale. The challenge you are going to hit - if you haven't already - is that findability breaks down not when you have too much content, but when your system starts using the same words to mean different things, or different words to mean the same thing. This is the semantic problem, and it is the first thing you need to solve before any navigation or classification decision makes sense.
Ontology Is Not an Academic Exercise
In philosophy, ontology asks what kinds of things exist. In information architecture, it asks a narrower version of the same question: what kinds of things exist in your system, what are their properties, and how do they relate to each other?
An ontology is a formal model of a domain. It specifies the types of entities your system deals with, the attributes that describe those entities, and the relationships between them. If you are designing an enterprise content system, your entities might be Article, Author, Product, Topic, and Customer Segment. If you are designing a healthcare platform, they might be Patient, Condition, Treatment, and Provider.
The distinction that matters here is between an ontology and a taxonomy. A taxonomy is a hierarchy - a filing cabinet. It tells you where things live. An ontology tells you what things are and how they connect. A taxonomy puts "Cardiology" under "Medical Specialties." An ontology says that "Cardiology" is a specialization of "Internal Medicine," treats conditions including "Heart Failure" and "Arrhythmia," requires credentials including "Board Certification," and employs practitioners with the entity type "Cardiologist." That is a fundamentally richer representation, and it is what allows a system to answer questions that a taxonomy cannot.
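The difference can be sketched in code. Here is a minimal, hypothetical Python model of the cardiology example - the field names and relations are illustrative, not any standard ontology format:

```python
# A taxonomy only records placement: where does Cardiology live?
taxonomy = {"Medical Specialties": ["Cardiology", "Oncology", "Neurology"]}

# An ontology records what Cardiology *is* and how it connects to
# other entities. (Keys and values here are illustrative.)
cardiology = {
    "type": "Medical Specialty",
    "specializes": "Internal Medicine",
    "treats": ["Heart Failure", "Arrhythmia"],
    "requires": ["Board Certification"],
    "practitioner_type": "Cardiologist",
}

# A question that crosses category boundaries - answerable from the
# ontology, but not from the filing-cabinet hierarchy above:
def treats(specialty, condition):
    return condition in specialty.get("treats", [])

print(treats(cardiology, "Arrhythmia"))  # True
print(treats(cardiology, "Diabetes"))    # False
```

The taxonomy can only tell you that Cardiology sits under Medical Specialties; the ontology can answer questions about conditions, credentials, and practitioners.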
Key Point: A taxonomy tells you where something lives. An ontology tells you what something is. You cannot build a system that handles complexity using only a taxonomy - the moment you need to answer a question that crosses category boundaries, the filing cabinet metaphor breaks down.
The Three Components You Need to Define
Every ontology is built from the same three elements, and you should not proceed past this stage without having explicit definitions for all three in your domain.
Entities are the core objects - the nouns. Defining them precisely prevents two failure modes that will haunt you at scale. Polysemy is when one label covers multiple distinct things: calling everything a "document" regardless of whether it is a legal contract, a marketing brief, or an internal memo. Synonymy is when multiple labels cover the same thing: referring to the same person as a "user," "customer," "account holder," and "member" in different parts of the system. Both create the sensation users describe as the system not making sense, even when they cannot articulate why.
Attributes are the properties that describe entities - the adjectives. A Product has a price, a category, a publication date, and a manufacturer. These are what faceted search runs on later. Defining attributes explicitly at this stage means you are not retrofitting classification logic onto content that was never structured to support it.
Relationships are how entities connect to each other - the verbs. The critical format here is the triple: Subject - Predicate - Object. "Doctor prescribes Treatment." "Article belongs to Topic." "Product replaces Product." When you express relationships as triples, you are building something a machine can reason over - not just display.
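What "reason over" means becomes concrete once triples are in a queryable form. A minimal sketch in Python - the subjects and objects are invented examples, and a production system would use an RDF store rather than a set of tuples:

```python
# Relationships stored as Subject - Predicate - Object triples.
# (All names here are illustrative.)
triples = {
    ("Dr. Okafor", "prescribes", "Beta Blockers"),
    ("Beta Blockers", "treats", "Arrhythmia"),
    ("Article 12", "belongs_to", "Cardiology"),
}

def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Two triples chained together: the machine infers which conditions
# Dr. Okafor's prescriptions address, even though no single record
# states that fact directly.
conditions = {
    c
    for treatment in objects("Dr. Okafor", "prescribes")
    for c in objects(treatment, "treats")
}
print(conditions)  # {'Arrhythmia'}
```

The chaining step is the payoff: a display-only hierarchy holds each fact in isolation, while triples let the system derive facts nobody entered.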
The Semantic Layer and Why It Matters Now
The semantic layer is the translation mechanism between human ambiguity and machine precision. Without it, a search for "Apple" returns results about fruit and technology companies with no way to disambiguate. With it, the system knows that in your context, "Apple" refers to the entity type "Technology Company" with attributes including "Founded: 1976" and "Founder: Steve Jobs."
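One way to picture the translation mechanism is a label-to-entity lookup that uses context to disambiguate. This is a toy sketch - the entity records and the idea of passing a context type are assumptions for illustration, not a real disambiguation algorithm:

```python
# A toy semantic layer: one ambiguous label, two typed entities.
entities = {
    "apple_fruit": {"label": "Apple", "type": "Fruit"},
    "apple_inc": {
        "label": "Apple",
        "type": "Technology Company",
        "founded": 1976,
        "founder": "Steve Jobs",
    },
}

def resolve(label, context_type):
    """Return the ids of entities whose label and type match the query context."""
    return [
        eid for eid, e in entities.items()
        if e["label"] == label and e["type"] == context_type
    ]

print(resolve("Apple", "Technology Company"))  # ['apple_inc']
print(resolve("Apple", "Fruit"))               # ['apple_fruit']
```

Without the `type` field, both lookups would return both entities - which is exactly the undifferentiated search result the semantic layer exists to prevent.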
This matters more now than it did five years ago because the interfaces consuming your information architecture are no longer just web pages. Voice assistants, AI-powered search, recommendation engines, and large language model integrations all depend on structured semantic data to function. A system designed with a well-defined ontology can serve all of these interfaces from the same underlying representation. A system built as a hierarchy of HTML pages cannot.
How to Actually Start
The temptation when building an ontology is to make it comprehensive before you make it correct. Resist this. Start by identifying the ten most important entities in your system. List how they relate to each other using the triple format. Define the five most important attributes for each entity. Then test the model against a hard question - something like "Which authors have written articles about Topic X that have been shared more than 500 times with users in Customer Segment Y?" If your ontology can answer that question, it is doing its job. If it cannot, you have found a gap.
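The hard-question test itself can be run on toy data. A sketch in Python, with invented authors, topics, and share counts standing in for real content records:

```python
# Toy article records with the attributes the hard question needs.
# (All data is illustrative.)
articles = [
    {"author": "Chen", "topic": "Topic X", "shares": 812,
     "shared_with": {"Segment Y"}},
    {"author": "Ruiz", "topic": "Topic X", "shares": 120,
     "shared_with": {"Segment Y"}},
    {"author": "Chen", "topic": "Topic Z", "shares": 950,
     "shared_with": {"Segment Q"}},
]

def authors_matching(topic, min_shares, segment):
    """Authors of articles on `topic`, shared more than `min_shares`
    times with users in `segment`."""
    return sorted({
        a["author"] for a in articles
        if a["topic"] == topic
        and a["shares"] > min_shares
        and segment in a["shared_with"]
    })

print(authors_matching("Topic X", 500, "Segment Y"))  # ['Chen']
```

If your model cannot support a query like this - because shares were never an attribute, or articles were never linked to segments - the test has found the gap.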
The vocabulary normalization step is where organizations lose the most time and yet invest the least effort. Choose one term for each entity type and enforce it. The choice matters less than the consistency. Your controlled vocabulary is not a style guide - it is a contract that every system touching this data is expected to honor.
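Enforcement, in practice, can be as blunt as a mapping that rejects anything outside the contract. A minimal sketch, assuming "customer" has been chosen as the canonical term for the synonyms named earlier:

```python
# A controlled vocabulary as an enforced mapping: every accepted
# synonym normalizes to one canonical term. (Terms are illustrative.)
CANONICAL = {
    "user": "customer",
    "account holder": "customer",
    "member": "customer",
    "customer": "customer",
}

def normalize(term):
    canonical = CANONICAL.get(term.strip().lower())
    if canonical is None:
        # Failing loudly is the point: the vocabulary is a contract,
        # not a suggestion.
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return canonical

print(normalize("Account Holder"))  # 'customer'
```

Running every ingest path through a gate like this is what turns the vocabulary from a document people ignore into a constraint the system enforces.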