Introduction
As AI adoption accelerates, we increasingly face the question of what constitutes optimal website design for AI. As a follow-up to our previous insight article, this piece compiles additional findings, including some technical aspects of web crawling, drawn from evaluation experiments in which multiple AIs assessed the same site.
The result: ChatGPT was able to read page structure and documented information to a reasonable degree, while Claude could only retrieve partial information and appeared unable to grasp the site’s overall content. What causes this significant gap?
Investigation revealed major differences in the search engines each AI uses, technical constraints, and their ability to handle site structures. Particularly interesting was the fact that even with properly placed sitemaps registered with major search engines, some AIs couldn’t discover even basic information.
In business today, we frequently hear “I checked with ChatGPT” or “AI evaluated it.” But if the information and evaluations obtained vary dramatically depending on which AI is used, this is a serious problem. Understanding each AI’s characteristics and using them selectively has become essential for accurate information gathering and appropriate judgment.
Test Overview
Target LLMs: ChatGPT (OpenAI/GPT-4o), Claude (Anthropic/Sonnet 4), Gemini (Google), Perplexity (Perplexity AI), supplementary: GenSpark
Using identical prompts across LLMs against our domain, we observed response content structure and information retrieval tendencies.
Key evaluation criteria:
- Information discovery scope: How much page content and structure could be accessed
- Information consistency: Alignment between response content and actual page descriptions
- Response consistency: Whether outputs were consistent under identical conditions
- Search depth: Ability to reach internal pages and hierarchical structures
Results
ChatGPT: Demonstrated broad information retrieval across the entire site, incorporating content from multiple pages. Showed reasonable capability in structure comprehension and context reflection.
Claude: Limited information retrieval scope; most major content was inaccessible. Initial responses contained almost no site-specific information, revealing challenges in structural recognition depth.
Gemini: Some pages and content were retrieved, but overall cross-page information integration and context comprehension appeared limited. Response to information structures was somewhat restrained.
Perplexity: Retrieved information showed variability, with some responses mixing in external sources. As a multi-source integration model, some challenges remained regarding information consistency and source stability.
GenSpark: Showed strong tendency toward accessing external resources (e.g., official sites and off-site data), with retrieval behavior combining internal site information with external data.
Why Differences Arise Between LLMs
Search engine differences: ChatGPT uses Bing, Gemini uses Google, Claude uses Brave Search, and Perplexity uses a multi-source integrated approach. Significant differences exist in index scope, reflection speed, and real-time capability.
Technical constraints: Some LLMs can only crawl URLs displayed in search results, with limitations on reaching subdirectories, redirect destinations, and specific file formats.
Index reflection speed: Some search engines take time to index new files, and sitemap registration and structured data may not be reflected immediately.
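To make the sitemap point above concrete, the sketch below parses a sitemap and lists the URLs a crawler would discover from it. This is a minimal illustration, not any particular search engine's crawler; the sitemap content and `example.com` URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content for illustration; a real crawler would fetch
# this from a URL such as https://example.com/sitemap.xml
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-07-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/services/</loc>
    <lastmod>2025-07-01</lastmod>
  </url>
</urlset>"""

# The sitemap protocol places all elements in this XML namespace
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def list_sitemap_urls(xml_text: str) -> list[str]:
    """Extract the <loc> entries a crawler would discover in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

print(list_sitemap_urls(SITEMAP_XML))
# → ['https://example.com/', 'https://example.com/services/']
```

If a URL is missing from the sitemap and not linked from indexed pages, an AI that relies on search results alone may never reach it, which matches the discovery gaps observed above.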
Countermeasures for AI-Era Website Design
Multi-LLM Compatibility Strategy
Place important information across multiple paths and pages, actively utilize structured data, and clearly document essential information in sitemaps and top pages to ensure accessibility for each LLM.
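As one way to "actively utilize structured data," essential facts can be embedded as schema.org JSON-LD so they remain machine-readable even when an AI misses the surrounding prose. The snippet below is a hedged sketch; the organization name, URL, and description are placeholders, not values from the tested site.

```html
<!-- Hypothetical JSON-LD block; all values are placeholders -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co.",
  "url": "https://example.com/",
  "description": "A concise, machine-readable summary of what the site offers."
}
</script>
```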
Addressing Search Constraints
Place important information directly under the root, avoid excessive use of subdirectories and redirects, and adopt simple, clear site structures.
Search Engine Optimization
Proper registration with Google Search Console, Bing Webmaster Tools, etc., comprehensive sitemaps and structured data, and clear navigation design improve discoverability across search engines.
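Alongside webmaster-tool registration, a robots.txt file can advertise the sitemap location so that any crawler, not just those of registered search engines, can discover it. The fragment below is a generic sketch; the sitemap URL is a placeholder.

```text
# Hypothetical robots.txt; the sitemap URL is a placeholder
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```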
Information Redundancy
Provide important information across multiple pages in different formats, enabling information access suited to each LLM’s search characteristics.
Conclusion
This investigation revealed clear differences in how each LLM retrieves information and what scope is visible to it. No AI can “accurately retrieve all information” — each has strengths and weaknesses. When researching, comparing results across multiple AIs rather than relying on just one can lead to more reliable understanding.
For information publishers as well, designing with “AI-friendly structure” in mind can help ensure intended content is more accurately retrieved. However, no matter how much optimization is applied, what ultimately matters is that the information is “useful to the reader.” Creating content that communicates clearly to both AI and humans will become increasingly important going forward.
Disclaimer: This investigation is based on a specific site as of July 2025, and results may not be identical for all sites. Each LLM’s capabilities are continuously improving, so current conditions may differ. Results are also expected to vary significantly based on site structure, content, and industry characteristics. When referencing this article, periodic re-verification and comprehensive evaluation across multiple criteria are recommended. Multiple LLMs were used in writing this article.