Data Interchange Formats: Complete Conversion Guide
Master data interchange formats including CSV, JSON, XML, YAML, TOML, and INI. Learn when to use each format, how to convert between them without data loss, and best practices for handling nested structures, arrays, and type preservation.
Overview of Data Formats
Data interchange formats serve as the universal language between different systems, applications, and programming languages. Choosing the right format impacts performance, maintainability, and interoperability.
Why Format Choice Matters
- Performance: Parsing speed and file size affect application responsiveness
- Readability: Human-readable formats simplify debugging and configuration
- Interoperability: Standard formats ensure cross-platform compatibility
- Tooling: Popular formats have rich ecosystems of validators and converters
- Schema support: Some formats enforce structure, others are flexible
CSV: Comma-Separated Values
CSV is the simplest format for representing tabular data. Each line is a row, with values separated by commas (or other delimiters).
Structure
name,age,city,active
John Doe,32,New York,true
Jane Smith,28,Los Angeles,false
Bob Johnson,45,Chicago,true
Key Characteristics
- Best for: Flat, tabular data with uniform structure
- Strengths: Universal support, small file size, Excel/spreadsheet compatible
- Limitations: No nested structures, no data types (all strings), limited metadata
- Common delimiters: Comma (,), Tab (\t), Semicolon (;), Pipe (|)
CSV Gotchas
// Values with commas need quoting
"Johnson, Bob",45,"Chicago, IL",true
// Embedded quotes are doubled
"She said, ""Hello""",greeting
// European Excel uses semicolon delimiter
name;price;quantity
Apple;1,50;10 // 1,50 is decimal (European format)
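The gotchas above can be checked with a standard CSV parser. The following is a minimal sketch using Python's built-in csv module; the helper name parse_decimal_comma is hypothetical and assumes values contain no thousands separators.

```python
import csv
import io

# Quoted fields: the parser handles embedded commas and doubled quotes.
quoted = io.StringIO(
    '"Johnson, Bob",45,"Chicago, IL",true\n'
    '"She said, ""Hello""",greeting\n'
)
rows = list(csv.reader(quoted))

# Semicolon-delimited export with decimal commas.
european = io.StringIO("name;price;quantity\nApple;1,50;10\n")
records = list(csv.DictReader(european, delimiter=";"))


def parse_decimal_comma(text: str) -> float:
    """Hypothetical helper: turn '1,50' into 1.5 (assumes no thousands separators)."""
    return float(text.replace(",", "."))


price = parse_decimal_comma(records[0]["price"])
print(rows[0])  # ['Johnson, Bob', '45', 'Chicago, IL', 'true']
print(price)    # 1.5
```

Every parsed value is still a string until it is converted explicitly, which is why the price needs its own conversion step.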
JSON: JavaScript Object Notation
JSON is the dominant format for web APIs and modern data interchange. Lightweight, human-readable, and natively supported in JavaScript.
Structure
{
  "users": [
    {
      "id": 1,
      "name": "John Doe",
      "active": true,
      "address": {
        "city": "New York",
        "zip": "10001"
      },
      "tags": ["admin", "developer"]
    },
    {
      "id": 2,
      "name": "Jane Smith",
      "active": false,
      "address": null,
      "tags": []
    }
  ]
}
Key Characteristics
- Best for: Web APIs, configuration, data exchange between modern apps
- Strengths: Compact, native JavaScript support, supports nesting, typed values
- Data types: string, number, boolean, null, array, object
- Limitations: No comments, no date type (use ISO 8601 strings), no binary data (use Base64-encoded strings)
JSON Best Practices
// Use camelCase or snake_case consistently
{"firstName": "John"} or {"first_name": "John"}
// ISO 8601 for dates
{"createdAt": "2024-03-15T10:30:00Z"}
// Null for missing values, not empty strings
{"phone": null} // not {"phone": ""}
// Avoid deeply nested structures (>3 levels)
// Consider flattening or using references
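As an illustration of the date and null conventions above, here is a minimal Python sketch using the standard json module. The function encode_datetime is a hypothetical fallback passed to json.dumps, not part of any library.

```python
import json
from datetime import datetime, timezone

record = {
    "firstName": "John",
    "createdAt": datetime(2024, 3, 15, 10, 30, tzinfo=timezone.utc),
    "phone": None,  # serialized as JSON null, not an empty string
}


def encode_datetime(value):
    """Hypothetical fallback for json.dumps: emit datetimes as ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat().replace("+00:00", "Z")
    raise TypeError(f"Cannot serialize {type(value).__name__}")


text = json.dumps(record, default=encode_datetime)
decoded = json.loads(text)

# JSON has no date type, so the consumer parses the string back explicitly.
# Replacing "Z" keeps this working on Python versions before 3.11.
created = datetime.fromisoformat(decoded["createdAt"].replace("Z", "+00:00"))
print(text)
```

The round trip only works because both sides agree that createdAt holds an ISO 8601 string; the format itself carries no such information.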
XML: Extensible Markup Language
XML is a markup language designed for documents and data with complex hierarchies. Widely used in enterprise systems, SOAP services, and configuration files.
Structure
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="1" active="true">
    <name>John Doe</name>
    <address>
      <city>New York</city>
      <zip>10001</zip>
    </address>
    <tags>
      <tag>admin</tag>
      <tag>developer</tag>
    </tags>
  </user>
</users>
Key Characteristics
- Best for: Document-centric data, SOAP APIs, enterprise systems, RSS/Atom feeds
- Strengths: Attributes and elements, namespaces, robust schema validation (XSD), XSLT transformations
- Limitations: Verbose, slower parsing, larger file size, more complex
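- Special characters: < and & must always be escaped (&lt; and &amp;); " and ' must be escaped inside attribute values delimited by the same quote; escaping > is optional in most contexts but common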
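Escaping is easiest to get right by letting a library do it. The sketch below uses Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the element and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# ElementTree escapes text and attribute values when serializing.
user = ET.Element("user", {"note": 'says "hi" & waves'})
name = ET.SubElement(user, "name")
name.text = "Tom & Jerry <cartoon>"
xml_text = ET.tostring(user, encoding="unicode")

# Parsing reverses the escaping, so the original strings come back unchanged.
parsed = ET.fromstring(xml_text)
print(xml_text)
print(parsed.find("name").text)  # Tom & Jerry <cartoon>

# When building XML by hand, escape text content explicitly.
print(escape("a < b & c"))  # a &lt; b &amp; c
```

Building XML through string concatenation without escaping is a common source of malformed output, so a serializer is usually the safer default.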
XML vs JSON Trade-offs
| Feature | XML | JSON |
|---|---|---|
| Verbosity | High (tags everywhere) | Low (minimal syntax) |
| Attributes | Yes (<user id="1">) | No (use object properties) |
| Comments | Yes (<!-- comment -->) | No (workaround: "_comment" field) |
| Schema validation | XSD, DTD, RelaxNG | JSON Schema |
| Parsing speed | Typically slower | Typically faster |
| File size | Usually larger (every element name appears in both the opening and closing tag) | Usually smaller |
YAML: YAML Ain't Markup Language
YAML is a human-friendly data serialization format focused on readability. Popular for configuration files, CI/CD pipelines, and infrastructure-as-code.
Structure
users:
  - id: 1
    name: John Doe
    active: true
    address:
      city: New York
      zip: "10001" # Quoted so it stays a string (ZIP codes can have leading zeros)
    tags:
      - admin
      - developer
  - id: 2
    name: Jane Smith
    active: false
    address: null
    tags: []
# Comments are supported
database:
  host: localhost
  port: 5432
Key Characteristics
- Best for: Configuration files, Docker Compose, Kubernetes, Ansible, CI/CD
- Strengths: Very readable, supports comments, anchors for reuse, multi-line strings
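- Limitations: Whitespace-sensitive (indentation matters), slower parsing, security concerns (some parsers can construct arbitrary objects and execute code when loading untrusted input unless a safe loader is used)
- Superset of JSON: YAML 1.2 is designed so that valid JSON is also valid YAML (but not vice versa); parsers that target YAML 1.1 may differ on edge cases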
YAML Features
# Multi-line strings (preserve newlines)
description: |
  This is a long
  multi-line text
  with preserved newlines.
# Folded style (spaces between lines)
summary: >
  This text will be
  folded into a single
  line with spaces.
# Anchors and aliases (reuse blocks)
defaults: &defaults
  timeout: 30
  retries: 3
production:
  <<: *defaults # Merge defaults
  host: prod.example.com
development:
  <<: *defaults
  host: dev.example.com
Other Common Formats
TOML: Tom's Obvious Minimal Language
Designed for configuration files with a focus on being easy to read due to obvious semantics.
# TOML example
title = "Configuration File"
[database]
host = "localhost"
port = 5432
enabled = true
[database.credentials]
username = "admin"
password = "secret"
[[servers]]
name = "alpha"
ip = "10.0.0.1"
[[servers]]
name = "beta"
ip = "10.0.0.2"
Best for: Configuration files (Cargo, Hugo, Python's pyproject.toml). More explicit than YAML, less verbose than XML.
INI: Initialization Files
Simple key-value format with sections, used for legacy Windows applications and some config files.
; INI example
[database]
host=localhost
port=5432
enabled=true
[server]
name=Production
ip=10.0.0.1
Best for: Simple configurations, legacy systems, Git config (.gitconfig).
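Because INI values carry no type information, the reader has to convert them. A minimal sketch with Python's standard configparser module, reusing the database section from the example above:

```python
import configparser

ini_text = """
; INI example
[database]
host=localhost
port=5432
enabled=true
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Plain lookups always return strings.
raw_port = config["database"]["port"]

# Typed getters perform the conversion explicitly.
port = config.getint("database", "port")
enabled = config.getboolean("database", "enabled")

print(repr(raw_port), port, enabled)  # '5432' 5432 True
```

INI dialects vary between tools (comment characters, duplicate keys, nesting), so a file that one parser accepts may be rejected by another.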
Properties Files (Java)
Key-value pairs used in Java applications, Spring Boot, and Gradle-based Android builds.
# Properties example
database.host=localhost
database.port=5432
database.enabled=true
app.name=MyApplication
Best for: Java applications, Spring configuration, Gradle build settings (gradle.properties).
Format Comparison Table
| Feature | CSV | JSON | XML | YAML | TOML | INI |
|---|---|---|---|---|---|---|
| Human readable | High | High | Medium | Very high | Very high | High |
| Parser availability | Universal | Universal | Universal | Wide | Growing | Limited |
| Schema support | No | JSON Schema | XSD, DTD | Limited | No | No |
| Comments | No | No | Yes | Yes | Yes | Yes |
| Data types | Strings only | 6 types | Text (typed via schema) | 11 types | 9 types | Strings only |
| Nested structures | No | Yes | Yes | Yes | Yes | Limited |
| Arrays | Rows only | Yes | Yes | Yes | Yes | No |
| File size | Smallest | Small | Large | Medium | Medium | Small |
| Parsing speed | Fastest | Fast | Slow | Medium | Medium | Fast |
| Binary data | No | Base64 | Base64 | Base64 | Base64 | No |
When to Use Each Format
Use CSV When
- Data is naturally tabular (rows and columns)
- Exporting/importing spreadsheet data
- Simple data exchange with non-technical users
- Working with data analysis tools (Pandas, R, Excel)
- File size and parsing speed are critical
Examples: Sales reports, contact lists, survey results, database exports
Use JSON When
- Building REST APIs or web services
- Data exchange between JavaScript applications
- Nested or hierarchical data structures
- NoSQL databases (MongoDB, CouchDB)
- Modern application configuration (when comments aren't needed)
Examples: API responses, app configuration, log aggregation, microservices communication
Use XML When
- Working with SOAP web services
- Document-centric data (books, articles, legal documents)
- Enterprise systems with existing XML infrastructure
- Strict schema validation is required
- Need for attributes, namespaces, or XSLT transformations
Examples: RSS/Atom feeds, SVG graphics, SOAP APIs, Microsoft Office formats (.docx and .xlsx are ZIP archives of XML files)
Use YAML When
- Configuration files (Docker, Kubernetes, Ansible)
- CI/CD pipeline definitions (GitHub Actions, GitLab CI)
- Human-edited data files
- Infrastructure as code
- Need for comments and readability
Examples: docker-compose.yml, .gitlab-ci.yml, Kubernetes manifests, Ansible playbooks
Use TOML When
- Application configuration files
- Rust projects (Cargo.toml)
- Python projects (pyproject.toml)
- Want readability with less ambiguity than YAML
Examples: Cargo.toml, pyproject.toml, Hugo config
Use INI When
- Simple configuration with sections
- Legacy Windows applications
- Git configuration
- Flat key-value pairs with grouping
Examples: .gitconfig, php.ini, desktop entries on Linux
Conversion Best Practices
1. Schema Preservation
When converting between formats, maintain data structure and meaning:
// JSON to XML: Decide on attribute vs element strategy
// Option 1: Everything as elements
{"user": {"id": 1, "name": "John"}}
→ <user><id>1</id><name>John</name></user>
// Option 2: Use attributes for metadata
→ <user id="1"><name>John</name></user>
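Both strategies can be expressed with one small function. This is a sketch only: dict_to_xml and its attribute_keys parameter are hypothetical names, and the function handles flat dictionaries, not nested ones.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(tag, data, attribute_keys=()):
    """Build an element from a flat dict (hypothetical helper).

    Keys listed in attribute_keys become attributes; all others become
    child elements.
    """
    element = ET.Element(tag)
    for key, value in data.items():
        if key in attribute_keys:
            element.set(key, str(value))
        else:
            child = ET.SubElement(element, key)
            child.text = str(value)
    return element


user = {"id": 1, "name": "John"}

# Option 1: everything as elements
as_elements = ET.tostring(dict_to_xml("user", user), encoding="unicode")

# Option 2: id as an attribute
as_attributes = ET.tostring(
    dict_to_xml("user", user, attribute_keys=("id",)), encoding="unicode"
)

print(as_elements)    # <user><id>1</id><name>John</name></user>
print(as_attributes)  # <user id="1"><name>John</name></user>
```

Note that both outputs turn the number 1 into text. Converting back to JSON requires a rule, or a schema, for which values are numeric.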
2. Handling Nested Data
Strategies for converting hierarchical data to flat formats:
// JSON with nesting
{
  "user": {
    "name": "John",
    "address": {
      "city": "NYC",
      "zip": "10001"
    }
  }
}
// Flattened CSV approach 1: Dot notation
user.name,user.address.city,user.address.zip
John,NYC,10001
// Flattened CSV approach 2: Separate tables
users.csv: id,name
addresses.csv: user_id,city,zip
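The dot-notation approach can be sketched in Python as a short recursive function. The name flatten is hypothetical, and the sketch assumes keys do not themselves contain dots and that values are not lists.

```python
import csv
import io


def flatten(data, prefix=""):
    """Flatten nested dicts into one level with dot-separated keys (hypothetical helper)."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


doc = {"user": {"name": "John", "address": {"city": "NYC", "zip": "10001"}}}
row = flatten(doc)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row), lineterminator="\n")
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Dot notation keeps everything in one file and is reversible as long as keys contain no dots. Separate tables suit one-to-many relationships better, at the cost of managing join keys.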
3. Array Handling
Converting arrays between formats requires careful consideration:
// JSON array to CSV
{"user": "John", "tags": ["admin", "user"]}
// Option 1: Join with delimiter
user,tags
John,"admin;user"
// Option 2: Multiple rows (normalized)
user,tag
John,admin
John,user
// Option 3: Multiple columns
user,tag1,tag2
John,admin,user
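The three options map to three small transformations. The following Python sketch uses hypothetical function names (join_tags, explode_tags, spread_tags) and assumes every record has a user string and a tags list.

```python
record = {"user": "John", "tags": ["admin", "user"]}


def join_tags(rec, sep=";"):
    """Option 1: one row, tags joined with a delimiter absent from the values."""
    return {"user": rec["user"], "tags": sep.join(rec["tags"])}


def explode_tags(rec):
    """Option 2: one row per tag (normalized)."""
    return [{"user": rec["user"], "tag": tag} for tag in rec["tags"]]


def spread_tags(rec, width):
    """Option 3: fixed number of tag columns, padded with empty strings.

    Tags beyond `width` are dropped, so choose width from the longest list.
    """
    padded = rec["tags"] + [""] * (width - len(rec["tags"]))
    row = {"user": rec["user"]}
    for index, tag in enumerate(padded[:width], start=1):
        row[f"tag{index}"] = tag
    return row


print(join_tags(record))       # {'user': 'John', 'tags': 'admin;user'}
print(explode_tags(record))
print(spread_tags(record, 3))
```

Option 2 is the only one that preserves arrays of any length without picking a delimiter or a column count, but it repeats the parent fields on every row.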
4. Type Coercion
Be explicit about type handling:
// CSV to JSON: Decide on type inference
"123" → 123 (number) or "123" (string)?
"true" → true (boolean) or "true" (string)?
"2024-01-15" → keep as string or parse as date?
// Best practice: Provide options
- Auto-detect types
- Force string mode (preserve original)
- Use schema/type hints
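One way to offer these options is a single conversion function with a mode switch and an optional per-column hint. This is a sketch under stated assumptions: coerce, mode, and hint are hypothetical names, and the auto-detection rules (booleans, then integers, then floats) are one reasonable choice among many.

```python
from datetime import date


def coerce(value, mode="auto", hint=None):
    """Convert one CSV cell (hypothetical helper).

    hint: optional callable applied to the raw string (schema/type hint).
    mode: "auto" to infer booleans and numbers, "string" to keep the original.
    """
    if hint is not None:
        return hint(value)
    if mode == "string":
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # anything else, including dates, stays a string


print(coerce("123"))                               # 123
print(coerce("true"))                              # True
print(coerce("2024-01-15"))                        # 2024-01-15 (still a string)
print(coerce("2024-01-15", hint=date.fromisoformat))
print(coerce("007"), coerce("007", mode="string"))  # 7 007
```

The last line shows why force-string mode matters: auto-detection silently drops leading zeros from identifiers such as ZIP codes.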
5. Encoding and Special Characters
// Always use UTF-8 encoding
// Handle format-specific escaping:
CSV: "She said, ""Hello"""
JSON: "She said, \"Hello\""
XML: <msg>She said, "Hello"</msg>
YAML: msg: 'She said, "Hello"'
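Serializers apply these escaping rules automatically. The sketch below reproduces the CSV, JSON, and XML lines with Python's standard library; YAML is omitted because Python has no built-in YAML module.

```python
import csv
import io
import json
from xml.sax.saxutils import escape

message = 'She said, "Hello"'

# CSV: the writer quotes the field and doubles the embedded quotes.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow([message])
csv_line = buffer.getvalue().rstrip("\n")

# JSON: embedded quotes are backslash-escaped.
json_text = json.dumps(message)

# XML: quotes need no escaping in element text; < and & would be escaped.
xml_text = f"<msg>{escape(message)}</msg>"

print(csv_line)   # "She said, ""Hello"""
print(json_text)  # "She said, \"Hello\""
print(xml_text)   # <msg>She said, "Hello"</msg>
```

When writing the results to disk, pass encoding="utf-8" explicitly, since the platform default encoding is not always UTF-8.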
6. Batch Conversion Workflows
For converting multiple files:
- Validate input files before conversion
- Use consistent naming conventions for output
- Log conversion errors and warnings
- Preserve directory structure when appropriate
- Verify row/record counts match
- Sample-check converted data
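The checklist above can be sketched as a small CSV-to-JSON batch job in Python. The function convert_csv_dir and the logger name are hypothetical, and the sketch assumes UTF-8 input files with a header row.

```python
import csv
import json
import logging
from pathlib import Path

logger = logging.getLogger("convert")


def convert_csv_dir(src_dir, dst_dir):
    """Convert every .csv under src_dir to .json under dst_dir (hypothetical helper).

    Returns a dict mapping each source path to its record count,
    or to None if the file could not be converted.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    results = {}
    for src in sorted(src_dir.rglob("*.csv")):
        # Consistent naming, same relative directory structure.
        dst = dst_dir / src.relative_to(src_dir).with_suffix(".json")
        try:
            with src.open(newline="", encoding="utf-8") as handle:
                records = list(csv.DictReader(handle))
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_text(
                json.dumps(records, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )
            # Verify the record count by reading the output back.
            written = len(json.loads(dst.read_text(encoding="utf-8")))
            if written != len(records):
                raise ValueError(f"expected {len(records)} records, wrote {written}")
            results[src] = written
        except (OSError, csv.Error, ValueError) as exc:
            # Log and continue so one bad file does not stop the batch.
            logger.warning("Skipping %s: %s", src, exc)
            results[src] = None
    return results
```

A call such as convert_csv_dir("exports", "converted") would then return the per-file counts, which can be compared against the source system's totals before sample-checking individual records.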
Common Pitfalls and Solutions
1. Data Type Loss
2. Encoding Issues
3. Nested Structure Flattening
4. XML Attribute vs Element
5. Large Number Precision
6. Date and Time Handling
Quick Decision Guide
Choose CSV if...
- Data is flat/tabular
- Target is Excel/spreadsheet
- Maximum compatibility needed
Choose JSON if...
- Building web APIs
- Data has nested structures
- Using JavaScript/Node.js
Choose XML if...
- Enterprise/legacy integration
- Need strict schema validation
- Document-centric data
Choose YAML if...
- Human-edited config files
- DevOps/infrastructure code
- Need comments and readability