
JSON and Artificial Intelligence: Complete Guide

How structured formats improve accuracy and reduce AI errors. Based on 30+ academic studies (2023-2025) on ChatGPT, Claude, and Gemini.

Paulo Giavoni

Engineer & BIM Specialist

10 February 2026 · 11 min read

This article translates and explains, in accessible language, the key academic research (2023-2025) on how using JSON improves communication with AIs like ChatGPT, Claude, and Gemini: saving tokens, reducing errors, and obtaining more accurate responses.

Based on 30+ academic papers and technical documentation: OpenAI, Anthropic, Google DeepMind, Microsoft Research, and arXiv.


1. Fundamental Concepts

What is JSON?

JSON (pronounced "jay-son") is a standardized way of organizing information that computers and programs can easily read. Think of it as an "organized form" with well-defined fields.

Example in plain language:

Text
"My name is Paul, I'm 40 years old,
I live in Milan and work as
an electrical engineer."

Same example in JSON:

JSON
{
  "name": "Paul",
  "age": 40,
  "city": "Milan",
  "profession": "electrical engineer"
}

Plain English explanation: JSON is like filling out a form instead of writing an essay. Instead of continuous text, you put each piece of information in its proper "field." This avoids confusion and makes it easier for machines to read.
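The "form" idea maps directly onto code. A minimal sketch using Python's standard json module; the record is the hypothetical example from above:

```python
import json

# Hypothetical record matching the "form" example above.
raw = '{"name": "Paul", "age": 40, "city": "Milan", "profession": "electrical engineer"}'

record = json.loads(raw)   # parse the JSON text into a Python dict
print(record["city"])      # each "field" is addressable by name; prints: Milan
```

Because every piece of information sits in a named field, a program (or an AI) never has to guess which part of a sentence is the city and which is the profession.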

What are LLMs?

LLM stands for Large Language Model. These are the artificial intelligence programs behind tools like ChatGPT, Claude, and Gemini. They work by predicting the next word based on everything they've "read" during training: billions of texts from the internet, books, and documents.

What are Tokens?

A token is the AI's "reading unit." It's not exactly a word; sometimes it's a syllable, a piece of a word, or a symbol. For example, the word "engineering" might be split into 2 or 3 tokens. The more tokens you use, the more the AI API call costs and the longer it takes to respond.

Plain English explanation: Think of tokens as "coins" you spend every time you talk to the AI. Each word, comma, or symbol costs coins. If you can say the same thing while spending fewer coins, you save money and time.
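A quick way to see the "coins" at work is a rough estimate. The ~4 characters-per-token figure is a common rule of thumb for English text, not an exact count; a real tokenizer such as OpenAI's tiktoken gives precise numbers. The helper below is only an illustrative sketch:

```python
def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token for English text.
    # A real tokenizer (e.g. OpenAI's tiktoken) gives exact counts.
    return max(1, len(text) // 4)

print(estimate_tokens("My name is Paul and I live in Milan."))
```

Estimates like this are useful for budgeting a prompt before sending it, even if the real tokenizer splits words slightly differently.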

What are AI Hallucinations?

Hallucination is when the AI invents information that seems true but is false. For example, it might invent a book that doesn't exist, create a wrong date, or generate completely fictional data with full confidence. This happens because the AI doesn't "know" things; it only calculates what the next most likely word is.


2. Token Economy: Does JSON Save or Spend More?

Surprising discovery: Raw JSON actually uses MORE tokens than other formats! But structured prompting approaches (which include JSON) can save between 30% and 87% of tokens.

The common assumption that JSON saves tokens compared to natural language does not hold up. Research shows that JSON is one of the least efficient formats in terms of tokens, consuming roughly twice as many tokens as TSV (tab-separated data) and 30-56% more than YAML.

The technical reason is BPE (Byte Pair Encoding) tokenization: JSON's curly braces { }, quotes, commas, and repeated field names generate separate token fragments.

Format Efficiency Comparison

| Format | Relative Tokens | Savings vs JSON |
|---|---|---|
| Standard JSON | 100% (reference) | – |
| YAML | 44-70% | 30-56% less |
| TSV (tab-separated) | ~50% | ~50% less |
| Function Calling | ~58% | 42% less |
| Compact JSON | ~80% | ~20% less |

Source: Microsoft Data Science / David Gilbertson (2024)

Plain English explanation: Imagine you're sending a text message and each letter costs money. JSON is like writing with lots of quotes, braces, and repetitions: it costs more "characters." However, JSON's VALUE isn't in using fewer letters, but in ORGANIZING information so the AI understands better and makes fewer mistakes.
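The "Compact JSON" row in the comparison above comes down to whitespace. Python's standard json module can show the difference directly; the data is a made-up example:

```python
import json

data = {"name": "Paul", "age": 40, "city": "Milan"}

pretty = json.dumps(data, indent=2)                # readable, more characters
compact = json.dumps(data, separators=(",", ":"))  # no spaces after "," and ":"

print(len(pretty), len(compact))  # compact is noticeably shorter
```

Both strings parse back to the same data, so compacting is a free saving whenever a human doesn't need to read the payload.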

Where Real Savings Happen#

The real token savings come from how you use structured formats, not from JSON itself:

TechniqueSavingsStudy
Structured pseudocode55-87% input, 41-70% outputCodeAgents (Yang et al., 2025)
Code synthesis for extraction110x cost reductionEVAPORATE (Stanford, 2023)
Prompt compressionup to 60% totalCompactPrompt (arXiv, 2025)
JSON Patches (RFC 6902)30%+ reductionJSON Whisperer (arXiv, 2025)

3. Hallucinations: Does JSON Reduce Invented Errors?

Short answer: JSON completely eliminates FORMAT errors (structural), but doesn't eliminate CONTENT errors (factual). To combat both, the best strategy combines JSON with RAG.

Two Types of Hallucination

| Type | Example | Does JSON Fix It? |
|---|---|---|
| Structural hallucination (wrong format) | AI returns loose text instead of valid JSON, or omits required fields | ✅ YES: 100% solved with constrained decoding |
| Factual hallucination (wrong content) | AI returns perfect JSON but the content is invented: {"capital": "Cleveland"} | ❌ NOT directly: requires RAG or validation |

Plain English explanation: Think of it this way: JSON ensures the AI fills out the "form" correctly (all fields, right format). But it doesn't guarantee that the ANSWERS written in the fields are true. It's like someone filling out a perfectly formatted resume but lying about their work experience.

The Best Strategy: JSON + RAG

The study by Béchard and Ayala (NAACL 2024) showed the most convincing result:

  • Without RAG: the AI invented wrong steps in 21% of cases
  • With RAG + structured output: that number dropped to less than 7.5%
  • Reduction: approximately 65-70%

Plain English explanation: RAG (Retrieval-Augmented Generation) is like giving the AI a "cheat sheet." Instead of answering from memory (and inventing), you first search for the right documents and send them along with the question. JSON + RAG = correct format + correct content.
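The "cheat sheet" pattern can be sketched in a few lines of plain Python. Everything here is invented for illustration: the document text, the question, and the exact wording of the instruction; a real system would fetch `retrieved_docs` from a search index or vector store:

```python
# Minimal sketch of the JSON + RAG pattern. `retrieved_docs` stands in for a
# real retrieval step (vector search, keyword lookup, etc.).
retrieved_docs = [
    "Manual section 4.2: switch off the breaker before replacing a fuse.",
]
question = "What is the first step when a fuse blows?"

prompt = (
    "Answer ONLY from the documents below. "
    'Reply as JSON: {"reasoning": "...", "answer": "..."}\n\n'
    "Documents:\n" + "\n".join(retrieved_docs) +
    "\n\nQuestion: " + question
)
print(prompt)
```

The key design choice is that the prompt both grounds the content (documents) and constrains the format (the JSON instruction), attacking factual and structural errors at the same time.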

Constrained Fields Against Hallucination

A partial but effective technique is limiting possible values in JSON fields. For example:

JSON
{
  "category": "electrical | plumbing | mechanical",
  "priority": "low | medium | high | urgent"
}

Combined with low temperature (0.1-0.4), this prevents the AI from inventing categories that don't exist.
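Even with constrained fields, it is worth checking the reply on the client side. A small validator sketch, using only the standard library; the field names and allowed values mirror the hypothetical schema above, and in practice a failed check would trigger a retry:

```python
import json

# Allowed values per field, mirroring the enum-style schema above.
ALLOWED = {
    "category": {"electrical", "plumbing", "mechanical"},
    "priority": {"low", "medium", "high", "urgent"},
}

def validate(reply: str) -> dict:
    # Parse the model's JSON reply and reject any value outside the enums;
    # a real pipeline would retry the request when this raises.
    data = json.loads(reply)
    for field, allowed in ALLOWED.items():
        if data.get(field) not in allowed:
            raise ValueError(f"invalid {field}: {data.get(field)!r}")
    return data

print(validate('{"category": "electrical", "priority": "high"}'))
```

This is the same idea libraries like Instructor automate: validate, and re-ask the model when validation fails.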


4. Reasoning vs. Structure: When Does JSON Hurt?

The big debate: An influential study (EMNLP 2024) claimed that forcing JSON degrades reasoning by up to 38%. However, subsequent responses showed that the problem isn't JSON itself, but rather POOR IMPLEMENTATION.

The paper "Let Me Speak Freely?" by Tam et al. made a major impact by showing that LLaMA-3-8B suffered a 38% performance drop when forced to respond in JSON. The mechanism was revealing: JSON mode placed the answer field BEFORE the reason field, forcing the AI to give the final answer before reasoning.

However, the dottxt team (creators of Outlines) published a detailed rebuttal showing that, with proper prompts, structured generation improved performance on the same tests.

The JSONSchemaBench benchmark (Geng et al., 2025), the most rigorous to date with 10,000 real JSON schemas, confirmed that constrained decoding consistently improves performance by up to 4%, including on reasoning tasks.

Plain English explanation: Imagine asking someone to solve a math problem, but demanding they write the answer BEFORE showing their work. Obviously they'll make more mistakes! The problem isn't using an organized form, but the ORDER of fields.

The Solution: Think First, Structure Later

The research-recommended approach is the two-step pattern:

❌ WRONG (answer before reasoning):

JSON
{
  "answer": "42",
  "reasoning": "..."
}

✅ CORRECT (reasoning before answer):

JSON
{
  "reasoning": "First I calculate X, then Y...",
  "answer": "42"
}

The Instructor library showed that including a chain_of_thought field in the JSON schema increases performance by 60% on math benchmarks.
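Why field order matters can be made concrete with a tiny sketch. The content of the fields is invented; the point is only that serialization preserves the order the model will generate in, so putting reasoning first forces the "work" to be emitted before the answer:

```python
import json

# Generation is left to right, so "reasoning" must appear before "answer"
# in the example/schema you show the model. Python dicts keep insertion order.
schema_example = {
    "reasoning": "First I compute 6 * 7 = 42.",
    "answer": "42",
}

serialized = json.dumps(schema_example)
# The serialized form preserves the field order the model should follow.
print(serialized)
```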


5. Constrained Decoding Tools

Plain English explanation: "Constrained decoding" is like putting "guardrails" on the AI. Instead of letting the AI write anything, the system blocks invalid tokens at each step, ensuring the output is always valid JSON. It's like a digital form that won't let you type letters in the phone number field.

Five Main Technical Approaches

| Tool | Method | Speed |
|---|---|---|
| Outlines (Willard & Louf, 2023) | Finite State Machine (FSM) | Fast, minimal overhead |
| XGrammar (Dong et al., 2025) | Byte-level pushdown automaton | < 40 microsec/token, 100x faster |
| Guidance (Microsoft) | Real-time token masking | ~50 microsec/token, 0 startup |
| Structured Outputs (OpenAI/Anthropic/Google) | Server-side CFG constraint | Built into API, 100% compliance |
| Instructor (open-source library) | Validation + auto-retry | High compliance, not 100% guaranteed |

OpenAI launched Structured Outputs in August 2024, achieving 100% schema compliance. Anthropic (Claude) followed in November 2025. Google Gemini uses controlled decoding based on OpenAPI 3.0 schemas.


6. Benchmark Results

Main finding: No format (JSON, YAML, Markdown) is universally superior. The quality difference between large and small models (21 percentage points) is MUCH greater than any difference between formats.

The most comprehensive study to date (McMillan, 2025) ran 9,649 experiments across 11 models and 4 formats and concluded that format choice does not significantly affect aggregate accuracy.

Plain English explanation: It's like the difference between handwriting and essay content. Changing the "handwriting" (format) makes little difference if the "student" (model) is good. An advanced model (GPT-4, Claude Opus) will be more accurate regardless of format.

Key Benchmark Numbers

| Benchmark | Main Result |
|---|---|
| StructuredRAG (Shorten et al., 2024) | Average success rate: 82.55%. Gemini 1.5 Pro: 93.4% vs LLaMA 8B: 71.7% |
| FOFO (ACL 2024) | Format-following ability is independent of generated content quality |
| Format Bias (Do et al., 2025) | Performance variance between formats reduced from 235 to 0.71 with mitigation |
| StructEval (arXiv, 2025) | Even frontier models have limited scores; generation is harder than conversion |

7. Practical Guide: The 6 Golden Rules

Based on all the research analyzed, these are the practical recommendations:

Rule 1: Always put 'reasoning' before 'answer'

Never ask the AI to give the final answer as the first field. Always include a reasoning field before the answer field. This allows the AI to "think" before responding, increasing accuracy by up to 60%.

Rule 2: Use fields with restricted values (enums)#

Whenever possible, define the allowed values for each field. Instead of letting the AI write any text, limit the options:

JSON
{
  "status": "pending | in_progress | completed | cancelled",
  "type": "residential | commercial | industrial"
}

Rule 3: Combine JSON with RAG for factual content

If factual accuracy is critical, don't rely on JSON alone. Provide reference documents along with the prompt. JSON guarantees format; RAG guarantees content.

Rule 4: Keep schemas simple and flat#

JSON objects with many nesting levels are significantly harder for the AI. Prefer simple, flat structures whenever possible.
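One practical way to flatten a schema is to join nested keys into single-level names. A small sketch using only the standard library; the nested record is a made-up example:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    # Collapse nested objects into one level of underscore-joined keys,
    # e.g. {"a": {"b": 1}} becomes {"a_b": 1}.
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

nested = {"project": {"site": {"city": "Milan"}, "phase": "design"}}
print(flatten(nested))  # {'project_site_city': 'Milan', 'project_phase': 'design'}
```

A flat schema like this gives the model fewer nesting levels to keep consistent, at the cost of longer key names.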

Rule 5: Use API Structured Outputs when available

If you're using the OpenAI, Anthropic, or Google API, enable Structured Outputs mode. This guarantees 100% schema compliance with no additional effort on your part.

Rule 6: For token savings, optimize representation

Remove unnecessary whitespace, use short field names, and consider formats like YAML if your pipeline supports it. For JSON editing, use patches (RFC 6902) instead of rewriting the entire document.
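The patch idea is that instead of asking the model to regenerate a whole document, you apply small edit operations to it. A deliberately minimal sketch that handles only "replace"/"add" on object keys; real implementations (e.g. the jsonpatch package) cover the full RFC 6902 spec, and the document and patch here are invented examples:

```python
import copy

def apply_patch(doc: dict, ops: list) -> dict:
    # Minimal sketch of RFC 6902: only "replace"/"add" on object keys.
    doc = copy.deepcopy(doc)  # leave the original document untouched
    for op in ops:
        # Split a path like "/task/status" into parents and the final key.
        *parents, key = op["path"].lstrip("/").split("/")
        target = doc
        for part in parents:
            target = target[part]
        target[key] = op["value"]
    return doc

doc = {"task": {"status": "pending", "owner": "Paul"}}
patch = [{"op": "replace", "path": "/task/status", "value": "completed"}]
print(apply_patch(doc, patch))
```

Asking the model to emit a short patch like `patch` above, rather than the full updated document, is where the 30%+ token reduction comes from.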


8. Conclusions

After analyzing over 30 academic studies, three conclusions stand out:

Conclusion 1: Token savings come from the APPROACH, not JSON

Raw JSON uses more tokens than other formats. However, structured prompting approaches (compact schemas, code synthesis, patches) deliver real 30-87% token reductions.

Conclusion 2: Format errors and content errors are distinct problems

Constrained decoding eliminates 100% of format errors (invalid JSON, missing fields). However, content errors (invented information) require complementary techniques like RAG.

Conclusion 3: Reasoning degradation is an implementation problem

Quality loss in reasoning doesn't come from JSON itself, but from poor implementations (token misalignment, wrong field order, missing reasoning field). Well-implemented frameworks like Guidance and DOMINO match or exceed unconstrained performance.


9. Academic References

  1. Willard, B. T. & Louf, R. (2023). "Efficient Guided Generation for Large Language Models." arXiv:2307.09702. Outlines library.
  2. Tam, Z. R. et al. (2024). "Let Me Speak Freely? A Study on the Impact of Format Restrictions on LLM Performance." EMNLP 2024. arXiv:2408.02442.
  3. Geng, S. et al. (2025). "JSONSchemaBench: A Rigorous Benchmark of Structured Outputs." arXiv:2501.10868.
  4. Beurer-Kellner, L. et al. (2024). "DOMINO: Guiding LLMs The Right Way." ICML 2024. arXiv:2403.06988.
  5. Dong, Y. et al. (2025). "XGrammar: Flexible and Efficient Structured Generation Engine." MLSys 2025. arXiv:2411.15100.
  6. Yang, Y. et al. (2025). "CodeAgents: Token-Efficient Framework for Multi-Agent Reasoning." arXiv:2507.03254.
  7. Arora, S. et al. (2023). "EVAPORATE: Language Models for Structured Data Lakes." PVLDB.
  8. Béchard, C. & Ayala, O. (2024). "Reducing Hallucination in Structured Outputs via RAG." NAACL Industry. arXiv:2404.08189.
  9. Shorten, C. et al. (2024). "StructuredRAG: JSON Response Formatting with LLMs." arXiv:2408.11061.
  10. McMillan, A. (2025). "Structured Context Engineering for File-Native Agentic Systems." arXiv:2602.05447.

Questions or Feedback?

I'd love to hear your thoughts on this article. Reach out directly and let's start a conversation.

Follow me on LinkedIn for more BIM tips and updates