From queries to insights: how Iron Mountain's hybrid knowledge graph querying tool turns questions into answers


We built a querying system in our intelligent content management platform, Iron Mountain InSight® DXP, that transforms natural language queries into deep, multi-dimensional graph searches—bridging human intent with enterprise knowledge.

Sushant Tiwari

Senior Machine Learning Engineer, Digital Business Unit, Iron Mountain

December 16, 2025 · 7 mins


Key breakthroughs include:

Natural language understanding: Query analysis powered by LLMs identifies intent, entities, and relationships
Hybrid search algorithm: Combines keyword graph traversal, vector similarity search, and embedding-enhanced exploration for complete recall
Parallel execution engine: Executes multiple search strategies simultaneously to minimize latency and maximize relevance
Intelligent answer synthesis: Uses LLM-based reasoning to convert retrieved graph data into clear, explainable answers

Bottom line: Ask questions like a human. Get answers like an analyst. Our querying tool turns unstructured questions into structured intelligence—delivering reasoning, not just results.

The usability problem

We built a powerful knowledge graph with a large number of entities and relationships. Complete auditability. It could answer any question about our enterprise data.

Then we asked a customer to use it.

"How do I find contracts governed by regulations updated in the last 90 days?"

They looked at the Cypher query interface. Then looked at us. "I don't know Cypher."

That's when we realized: Building a knowledge graph is one thing. Making it usable by everyone is another.

Traditional graph databases require specialized query languages like Cypher or SPARQL—powerful, but inaccessible for most business users. We had built incredible infrastructure that only data engineers could use.

The entire value of our knowledge graph was locked behind a technical barrier.

The challenge: why traditional approaches fall short

We explored alternatives to make the graph accessible:

SQL querying: Treats data as flat tables. Misses the relationships that tie information together. "Find employees connected to expired contracts" becomes a nightmare of joins that still can't capture multi-hop relationships.

Standard RAG (retrieval-augmented generation): Fetches semantically similar text snippets. But it can't reason about how things connect. It sees "Employee" and "Contract" as separate documents, not as connected entities in a relationship graph.

Direct Cypher access: Incredibly powerful, but also highly specialized. Our customers would need weeks of training, and they would still make mistakes.

We needed something that understood questions like a human but searched like a graph database.

Discovery: the three-strategy problem

Early prototypes used pure vector search. We'd convert the user's question to an embedding, find similar entities, and return results.

It worked—sometimes. When customers asked "Find employees in Boston," vector search performed well.

But when they asked "Which Boston employees report to managers hired from competitors?" vector search fell apart. It couldn't follow the reports_to relationship chain. It just found documents mentioning "Boston," "managers," and "competitors."

Then we tried pure graph traversal. Define a pattern, follow edges, return connected nodes.

This was better for relationship queries—but terrible for semantic matching. When a customer asked about "Chief Executive Officer," it missed documents that said "Head of Corporate Leadership." The terminology varied but the meaning was the same.

The realization: Neither approach was enough on its own. We needed three approaches: graph traversal for structural relationships, vector search for semantic similarity, and text search for document evidence. And we needed them working together, not separately. That's when we built the hybrid approach.

The solution: hybrid querying in four phases

We orchestrate multiple intelligent services through a four-phase pipeline that analyzes, executes, and synthesizes each query.

Phase 1: understanding the question

When a customer submits a question, our query analysis service performs two tasks in parallel:

Entity extraction: Identifies key entities and concepts (e.g., "contract," "regulation," "vendor").

Intent classification: Determines what the customer wants to know—e.g., ENTITY, RELATIONSHIP, PATH, or COMPARISON.

This creates a GraphPattern—a structured plan defining the optimal search strategy and traversal depth.

Example:


Query: "Show me how employees are connected to contracts with expired certifications."

→ Entities: Employee, Contract, Certification
→ Intent: PATH (relationship traversal across entity types)
→ Result: A multi-hop pattern definition to guide the search

Why this matters: We don't just search blindly. We understand what you're asking for, then choose the right strategy to find it.
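
As a rough illustration of what such a plan could look like, here is a minimal Python sketch. The article only names GraphPattern and the four intent types; the field names, the entity vocabulary, and the keyword rules below are assumptions made for this example, and the keyword rules merely stand in for the LLM-based analysis.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    ENTITY = "ENTITY"
    RELATIONSHIP = "RELATIONSHIP"
    PATH = "PATH"
    COMPARISON = "COMPARISON"


@dataclass(frozen=True)
class GraphPattern:
    entities: tuple[str, ...]
    intent: Intent
    max_depth: int  # assumed field: how many hops the traversal may take


# Hypothetical vocabulary; a real system would draw entity types from its ontology.
KNOWN_ENTITY_TYPES = ("Employee", "Contract", "Certification", "Regulation", "Vendor")


def analyze_query(query: str) -> GraphPattern:
    """Toy keyword-based stand-in for an LLM-backed query analysis service."""
    text = query.lower()
    entities = tuple(t for t in KNOWN_ENTITY_TYPES if t.lower() in text)
    if "compare" in text:
        intent = Intent.COMPARISON
    elif "connected" in text:
        intent = Intent.PATH
    elif len(entities) >= 2:
        intent = Intent.RELATIONSHIP
    else:
        intent = Intent.ENTITY
    # Assumed heuristic: one hop between each pair of consecutive entity types.
    max_depth = max(1, len(entities) - 1)
    return GraphPattern(entities, intent, max_depth)


pattern = analyze_query(
    "Show me how employees are connected to contracts with expired certifications."
)
print(pattern)
```

With the example query above, this toy analyzer picks out Employee, Contract, and Certification and labels the intent as PATH, mirroring the example in the text.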

Phase 2: three paths to precision

Our parallel execution engine runs three search strategies simultaneously:

Strategy 1 - graph-based reasoning: Understands and traverses relationships defined in the knowledge graph, retrieving context-rich results aligned with your business ontology.

Example: A query such as "Which employees report to managers in the Boston office?" follows the actual reports_to and works_in relationships to return the precise reporting structure, not just documents mentioning "Boston."

Strategy 2 - semantic search: Captures conceptually similar information even when terminology varies across sources.

Example: Searching for "Chief Executive Officer" also surfaces relevant results labeled "Head of Corporate Leadership," recognizing they represent the same role in context.

Strategy 3 - contextual discovery: Surfaces supporting evidence from the original document text to strengthen confidence and traceability.

Example: A question like "Find contracts requiring cybersecurity training" returns both structured entities and relevant text excerpts, even if "training" appears as "security awareness program."

These complementary approaches run in parallel, blending linguistic understanding with graph logic. Each strategy operates independently, ensuring one approach's limitations don't block the others.
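
One way to get this kind of isolation in Python is asyncio.gather with return_exceptions=True, so a failing branch yields an empty result instead of cancelling the others. The sketch below is only an illustration: the three strategy functions are hypothetical stubs, and the failure in the text branch is simulated.

```python
import asyncio


async def graph_search(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for a graph traversal call
    return ["graph: alice -reports_to-> bob"]


async def vector_search(query: str) -> list[str]:
    await asyncio.sleep(0.01)  # stands in for an embedding similarity lookup
    return ["vector: Head of Corporate Leadership"]


async def text_search(query: str) -> list[str]:
    await asyncio.sleep(0.01)
    raise TimeoutError("text index unavailable")  # simulated failure


async def run_strategies(query: str) -> dict[str, list[str]]:
    strategies = {"graph": graph_search, "vector": vector_search, "text": text_search}
    # return_exceptions=True hands back exceptions as values, so one failing
    # branch does not abort the branches that succeeded.
    outcomes = await asyncio.gather(
        *(fn(query) for fn in strategies.values()), return_exceptions=True
    )
    results: dict[str, list[str]] = {}
    for name, outcome in zip(strategies, outcomes):
        results[name] = [] if isinstance(outcome, BaseException) else outcome
    return results


results = asyncio.run(run_strategies("Which employees report to managers in Boston?"))
print(results)
```

Here the graph and vector branches still return their findings even though the text branch raised an error.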

Phase 3: combining and optimizing results

After all three branches finish, we merge findings into a unified response:

Result combination: All entities, relationships, and supporting text are merged and ranked by relevance. Duplicates are removed.

Intelligent limiting: Ensures clarity by capping the volume of entities and relationships—prioritizing those directly tied to the query.
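
The merge step can be pictured with a short Python sketch. The Hit type, the scores, and the rule of keeping the best-scoring copy of each duplicate are assumptions for illustration; the article does not describe the actual ranking formula, and the sketch assumes scores from different strategies are already comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    item_id: str   # entity or relationship identifier (made-up sample values)
    score: float   # assumed to be normalized across strategies
    source: str    # which strategy produced the hit


def combine(branches: list[list[Hit]], limit: int) -> list[Hit]:
    """Merge hits from all branches, drop duplicates, rank, and cap the result."""
    best: dict[str, Hit] = {}
    for branch in branches:
        for hit in branch:
            current = best.get(hit.item_id)
            if current is None or hit.score > current.score:
                best[hit.item_id] = hit  # keep the highest-scoring duplicate
    # Sort by descending score; ties are broken by id for a stable order.
    ranked = sorted(best.values(), key=lambda h: (-h.score, h.item_id))
    return ranked[:limit]


graph_hits = [Hit("contract-7", 0.9, "graph"), Hit("vendor-2", 0.6, "graph")]
vector_hits = [Hit("contract-7", 0.7, "vector"), Hit("policy-4", 0.8, "vector")]
text_hits = [Hit("vendor-2", 0.4, "text")]

top = combine([graph_hits, vector_hits, text_hits], limit=2)
print([(h.item_id, h.source) for h in top])
```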

Phase 4: from data to answers

The answer generation service builds a structured context from all retrieved evidence and sends it to a Large Language Model, which composes a final, natural language answer backed by provenance.

Example output:


"There are 12 active contracts governed by regulations updated in the past 90 days. 
4 involve vendors with open compliance findings. Primary risk areas: Data Protection 
and Procurement."

The result is a complete, explainable insight—not just a list of nodes or documents.
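
To make the idea of a structured context concrete, here is a small Python sketch that assembles retrieved relationships and numbered text excerpts into a single prompt. The layout, the function name, and the sample data are assumptions for this example; the call to the language model itself is left out.

```python
def build_context(
    question: str,
    relationships: list[tuple[str, str, str]],
    excerpts: list[tuple[str, str]],
) -> str:
    """Assemble retrieved evidence into one prompt string with numbered sources."""
    lines = [f"Question: {question}", "", "Relationships:"]
    for subject, predicate, obj in relationships:
        lines.append(f"- {subject} -[{predicate}]-> {obj}")
    lines.append("")
    lines.append("Evidence:")
    # Numbering each excerpt lets the model cite its sources by index.
    for number, (doc_id, text) in enumerate(excerpts, start=1):
        lines.append(f"[{number}] ({doc_id}) {text}")
    lines.append("")
    lines.append("Answer using only the facts above and cite evidence by number.")
    return "\n".join(lines)


# Made-up sample data for illustration only.
context = build_context(
    "Which contracts are governed by recently updated regulations?",
    relationships=[("contract-7", "governed_by", "regulation-12")],
    excerpts=[("doc-17", "Regulation 12 was revised to extend retention periods.")],
)
print(context)
```

Keeping document identifiers next to each excerpt is one simple way to let the final answer carry provenance back to its sources.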

Why this outperforms traditional search and RAG

Most retrieval systems treat data as disconnected records. They can pull relevant snippets but can't follow relationships that span across documents, policies, or entities.

SQL retrieves rows but loses the deeper meaning that spans documents
RAG fetches semantically similar text but can't reason about how things connect

Our knowledge graph querying tool addresses both gaps. It understands relationships, such as Employee → assigned_to → Project → governed_by → Policy, and can traverse them dynamically.

This means we can answer questions like:


"Which employees are assigned to projects impacted by a new data retention policy?"

with complete reasoning trails, source citations, and entity context.

By combining structured graph logic with semantic vector recall, we deliver relationship-aware, explainable, and AI-ready querying.
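
The Employee → assigned_to → Project → governed_by → Policy traversal can be sketched in plain Python. This is a simplified illustration over an in-memory edge list with made-up names, not the production graph query; each returned path doubles as a reasoning trail.

```python
from collections import defaultdict

# (source, relationship, target) triples; all names are made-up sample data.
EDGES = [
    ("alice", "assigned_to", "project-apollo"),
    ("bob", "assigned_to", "project-zephyr"),
    ("project-apollo", "governed_by", "data-retention-policy"),
    ("project-zephyr", "governed_by", "travel-policy"),
]


def build_index(edges):
    """Index targets by (source, relationship) for fast edge lookups."""
    index = defaultdict(list)
    for source, rel, target in edges:
        index[(source, rel)].append(target)
    return index


def follow(index, start, relations):
    """Return every path from start that follows the relations in order."""
    paths = [[start]]
    for rel in relations:
        paths = [
            path + [nxt] for path in paths for nxt in index.get((path[-1], rel), [])
        ]
    return paths


def employees_affected_by(policy, edges):
    index = build_index(edges)
    employees = sorted({s for s, rel, _ in edges if rel == "assigned_to"})
    trails = []
    for employee in employees:
        for path in follow(index, employee, ["assigned_to", "governed_by"]):
            if path[-1] == policy:
                trails.append(path)  # the path itself is the reasoning trail
    return trails


trails = employees_affected_by("data-retention-policy", EDGES)
print(trails)
```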

Real-world power: from simple to complex

Our querying framework supports the full spectrum of business questions:

Simple lookup:


"List all employees in the Boston office."
→ Returns every employee with Boston-based assignments or reporting lines

Multi-hop reasoning:


"Which Boston employees report to managers hired from competitors in the last two years?"
→ Traverses Employee → reports_to → Employee → previous_employer

Complex compliance analysis:


"Identify employees on expiring work visas assigned to government projects requiring 
security clearance but lacking current certifications."
→ Combines graph traversal and vector search for cross-domain insight

Impact assessment:


"If our data policy changes, what operations are affected?"
→ Traverses Policy → governs → Process → uses → Document → triggers → Workflow, 
mapping every downstream impact in seconds

Each query transforms information archaeology into intelligence delivery—providing immediate, traceable insight.
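
An impact assessment like the one above amounts to collecting everything reachable downstream of a starting node. The following Python sketch does that with a breadth-first search and a configurable depth limit; the node names and the adjacency map are invented for illustration, and a real deployment would run the traversal in the graph backend.

```python
from collections import deque

# Outgoing edges as node -> [(relationship, target)]; made-up sample data.
OUTGOING = {
    "data-policy": [("governs", "records-intake")],
    "records-intake": [("uses", "retention-schedule")],
    "retention-schedule": [("triggers", "disposal-workflow")],
}


def downstream(start, outgoing, max_depth):
    """Breadth-first walk returning (node, via_relationship, depth) per impact."""
    seen = {start}
    queue = deque([(start, 0)])
    impacted = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand this node further
        for rel, target in outgoing.get(node, []):
            if target not in seen:
                seen.add(target)
                impacted.append((target, rel, depth + 1))
                queue.append((target, depth + 1))
    return impacted


full = downstream("data-policy", OUTGOING, max_depth=3)
shallow = downstream("data-policy", OUTGOING, max_depth=1)
print(full)
print(shallow)
```

Lowering max_depth trades completeness for speed, which is one plausible reading of a configurable traversal depth.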

Built for scale and AI integration

Our architecture ensures enterprise robustness:

Backend agnostic: Unified interface for Apache AGE (production) and NetworkX (development/testing)
Parallel execution: Async orchestration for simultaneous search strategies
LLM-powered understanding: Interprets user intent against the graph ontology
Configurable depth and limits: Balances precision and speed
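
A backend-agnostic design usually means query logic depends on a small interface rather than on a specific graph store. The Python sketch below shows the idea with typing.Protocol and two toy in-memory backends. The interface, class names, and data are hypothetical; neither class talks to Apache AGE or NetworkX, they only show that the same query code can run against interchangeable backends.

```python
from typing import Protocol


class GraphBackend(Protocol):
    """Hypothetical minimal interface that query code depends on."""

    def neighbors(self, node: str, relationship: str) -> list[str]: ...


class EdgeListBackend:
    """Toy backend that scans a list of (source, relationship, target) triples."""

    def __init__(self, edges: list[tuple[str, str, str]]) -> None:
        self._edges = list(edges)

    def neighbors(self, node: str, relationship: str) -> list[str]:
        return [t for s, r, t in self._edges if s == node and r == relationship]


class AdjacencyBackend:
    """Toy backend that pre-indexes edges by (source, relationship)."""

    def __init__(self, edges: list[tuple[str, str, str]]) -> None:
        self._index: dict[tuple[str, str], list[str]] = {}
        for s, r, t in edges:
            self._index.setdefault((s, r), []).append(t)

    def neighbors(self, node: str, relationship: str) -> list[str]:
        return list(self._index.get((node, relationship), []))


def reporting_chain(backend: GraphBackend, employee: str) -> list[str]:
    """Follow reports_to edges upward, stopping at the top or on a cycle."""
    chain: list[str] = []
    seen = {employee}
    current = employee
    while True:
        managers = backend.neighbors(current, "reports_to")
        if not managers or managers[0] in seen:
            return chain
        current = managers[0]
        seen.add(current)
        chain.append(current)


EDGES = [("alice", "reports_to", "bob"), ("bob", "reports_to", "carol")]
chains = [
    reporting_chain(backend, "alice")
    for backend in (EdgeListBackend(EDGES), AdjacencyBackend(EDGES))
]
print(chains)
```

Both backends return the same chain, so the query function never needs to know which one it is using.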

And most importantly, our knowledge graph querying tool integrates with Iron Mountain InSight DXP AI Agents, so they can query enterprise knowledge graphs directly. This transforms the DXP Agent from a document retriever into a reasoning assistant, able to answer complex, cross-domain questions grounded in verified relationships, not approximations.

The payoff: what customers can do now

Moving to this hybrid querying approach transformed how customers interact with their enterprise knowledge.

Ask in plain English: No Cypher, no SQL—just natural language. Customers type questions as they think of them.

Get complete answers: Not just document snippets or node lists. Full answers with reasoning, context, and source evidence.

Trace every connection: Every answer shows how entities connect, which documents support the conclusion, and why the system reached that answer.

Faster decisions: Instant, relationship-aware answers to compliance and risk queries. Reduced manual research time from days to seconds.

AI-ready infrastructure: Powers InSight DXP Agents with deep, connected intelligence for knowledge-driven operations.

Key takeaways

Natural language access: No specialized query languages—just plain English questions.

Hybrid precision: Combines graph, vector, and keyword search for full context and complete recall.

Relationship awareness: Understands how entities connect across documents and domains.

Enterprise performance: Parallel querying ensures speed and scale without compromising accuracy.

Explainable results: Every answer is backed by reasoning trails, source citations, and metadata.

AI-ready integration: Powers InSight DXP Agents with deep, connected intelligence.

The bottom line

You shouldn't need a data scientist to understand your data.

Traditional graph databases offer incredible infrastructure that only engineers can use. RAG systems retrieve documents but can't reason about relationships. SQL can query tables but can't follow connections.

Our knowledge graph querying system bridges the gap—turning natural language questions into intelligent graph exploration, delivering answers that are both comprehensive and explainable.

Your information becomes living intelligence—powering AI agents, revealing hidden relationships, and delivering the right answers exactly when you need them.
