Vector Tile Architecture & Format Fundamentals

Vector tiles transmit raw geometric primitives and semantic attributes to the client rather than pre-rendered pixels, enabling dynamic styling, interactive querying, and resolution-independent rendering. This reference covers the full stack: spatial partitioning math, the Mapbox Vector Tile (MVT) binary encoding, storage containers, production pipeline patterns, and the failure modes that surface in real deployments.

How Vector Tiles Fit the GeoJSON → CDN Pipeline

Every production pipeline moves data through five discrete stages. Understanding where each format and tool sits prevents the category of bugs where data is correct but ends up in the wrong stage.

The critical transition points are Preprocessing → Tile Generation (where coordinate reference system errors cause silent misalignment) and Storage → CDN Delivery (where container format determines whether you need a tile server at all).

Core Architectural Principles

The Global Tiling Scheme

The industry standard combines the Web Mercator projection (EPSG:3857) with quadtree-based spatial partitioning. At zoom level z the world is divided into 2^z × 2^z tiles; each tile is addressed by (z, x, y). The y-axis origin convention differs by scheme:

Scheme	Y-origin	Used by
XYZ (Slippy Map)	North-West	OpenStreetMap, Google Maps, MapLibre GL JS
TMS	South-West	OGC TMS spec, some GDAL drivers

The conversion is tms_y = (2^z - 1) - xyz_y. Misapplying this is the most common cause of tiles appearing reflected vertically. Input source data is almost always in geographic coordinates (EPSG:4326); the tiling engine reprojects to EPSG:3857 internally during generation.
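
The addressing and y-flip above can be sketched in a few lines of Python. This is a minimal illustration using the standard Web Mercator tile formulas; the function names `lonlat_to_xyz` and `xyz_to_tms_y` are hypothetical and do not come from any particular library.

```python
import math

# Web Mercator is only defined up to roughly this latitude.
MAX_LAT = 85.05112878

def lonlat_to_xyz(lon: float, lat: float, z: int) -> tuple[int, int]:
    """Return the XYZ (north-west origin) tile that contains a WGS84 point."""
    n = 2 ** z  # tiles per axis at this zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp points on the eastern or southern edge into the last tile.
    return min(x, n - 1), min(y, n - 1)

def xyz_to_tms_y(y: int, z: int) -> int:
    """Flip a row index between XYZ and TMS; the operation is its own inverse."""
    return (2 ** z - 1) - y

x, y = lonlat_to_xyz(-0.1276, 51.5072, 10)  # central London
print(x, y, xyz_to_tms_y(y, 10))
```

Because the flip is its own inverse, the same function converts TMS rows back to XYZ rows.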

Geometry and Attribute Decoupling

Raster tiles bake styling into pixels. Vector tiles separate raw geometry from visual presentation, which means the same tile data can be restyled at runtime without re-fetching. Each .mvt tile contains:

Geometries: Points, linestrings, and polygons encoded as integer command sequences
Attributes: Key-value pairs referenced by integer index into per-layer dictionaries
Layer groupings: Named collections such as roads, buildings, or water

This shift moves rendering from the server to the client. The tradeoffs between vector and raster tile delivery are significant enough that the decision affects pipeline architecture, not just format choice.

Spatial Indexing and Feature Generalization

Raw geodata is rarely tile-ready. At low zoom levels, dense datasets — urban building footprints, address points — must be simplified, aggregated, or suppressed to stay within the 500 KB per-tile budget and avoid client-side overdraw. Tippecanoe applies geometry simplification automatically at each zoom level; PostGIS ST_SimplifyPreserveTopology handles this when tiles are generated on-the-fly from a live database. The goal is maximum detail at the highest zoom while maintaining topological correctness across all levels.

The Mapbox Vector Tile Specification

The de facto standard for vector tile encoding is the Mapbox Vector Tile specification v2.1, which uses Protocol Buffers for binary serialization.

Key Specification Fields

Field	Location	Type	Notes
`version`	Layer	uint32	Must be `2` for spec v2.1 compliance
`name`	Layer	string	Layer identifier, e.g. `"roads"`
`extent`	Layer	uint32	Tile coordinate grid size; default `4096`
`features`	Layer	repeated Feature	Each feature carries geometry + attribute refs
`keys`	Layer	repeated string	Deduplicated attribute key dictionary
`values`	Layer	repeated Value	Deduplicated attribute value dictionary
`type`	Feature	GeomType	`UNKNOWN=0`, `POINT=1`, `LINESTRING=2`, `POLYGON=3`
`geometry`	Feature	repeated uint32	Encoded command sequence
`tags`	Feature	repeated uint32	Alternating key-index / value-index pairs
`id`	Feature	optional uint64	Feature identifier; not guaranteed unique across layers
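
To make the `tags` indirection concrete, the sketch below resolves a feature's alternating index pairs against its layer's `keys` and `values` dictionaries. It assumes the layer has already been decoded into plain Python lists; `resolve_tags` and the sample data are hypothetical.

```python
def resolve_tags(tags: list[int], keys: list[str], values: list) -> dict:
    """Turn alternating key-index / value-index pairs into an attribute dict."""
    if len(tags) % 2 != 0:
        raise ValueError("tags must hold an even number of integers")
    return {
        keys[tags[i]]: values[tags[i + 1]]
        for i in range(0, len(tags), 2)
    }

# Hypothetical layer dictionaries shared by every feature in the layer.
layer_keys = ["highway", "oneway", "name"]
layer_values = ["residential", True, "Main St", "primary"]

# One feature: highway -> values[3], name -> values[2].
print(resolve_tags([0, 3, 2, 2], layer_keys, layer_values))
```

A parser that skips this lookup and reports the raw index is one way a string attribute ends up displayed as a small integer.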

Protocol Buffers and Binary Encoding

MVT uses protobuf to serialize tile data. Unlike JSON, protobuf encodes data with numeric field tags and variable-length integers (varints), eliminating key repetition across thousands of features. The binary delta encoding for coordinates typically reduces coordinate payload size 40–60% compared to storing absolute integers. The encoding is described fully in the Protocol Buffers Developer Documentation.
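
As an illustration of why small integers are cheap in this format, here is a minimal varint encoder and decoder following the base-128 scheme protobuf uses for unsigned integers. The function names are illustrative; a real pipeline would rely on a protobuf library and not hand-rolled code.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varints in this sketch are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

print(encode_varint(5).hex(), encode_varint(300).hex())
```

Values below 128 fit in a single byte, which is why small coordinate deltas serialize so compactly.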

Geometry Command Sequences

Within each feature, geometry is encoded as a flat array of command-and-parameter integers. Three commands are defined:

Command	ID	Parameters
`MoveTo`	1	`(dx, dy)` — move cursor, start new sub-geometry
`LineTo`	2	`(dx, dy)` repeated — extend current sub-geometry
`ClosePath`	7	none — close a polygon ring

Delta encoding applies: each dx/dy is the difference from the previous cursor position, zigzag-encoded as an unsigned integer. Decoding without applying the cumulative delta is a common source of scrambled geometry when writing custom MVT parsers.
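
A minimal decoder sketch makes the cumulative delta explicit. It relies on the MVT rule that each command integer packs the command ID in its low three bits and a repeat count in the remaining bits; the function names are illustrative, and the input is assumed to be the already-parsed `geometry` array of one feature.

```python
MOVE_TO, LINE_TO, CLOSE_PATH = 1, 2, 7

def zigzag_decode(n: int) -> int:
    """Map 0, 1, 2, 3, ... back to 0, -1, 1, -2, ..."""
    return (n >> 1) ^ -(n & 1)

def decode_geometry(geometry: list[int]) -> list[list[tuple[int, int]]]:
    """Decode a command sequence into parts of absolute tile coordinates."""
    parts: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    x = y = 0  # the cursor persists across commands within one feature
    i = 0
    while i < len(geometry):
        command_id = geometry[i] & 0x7
        count = geometry[i] >> 3
        i += 1
        if command_id in (MOVE_TO, LINE_TO):
            for _ in range(count):
                x += zigzag_decode(geometry[i])      # accumulate the delta
                y += zigzag_decode(geometry[i + 1])
                i += 2
                if command_id == MOVE_TO:            # start a new sub-geometry
                    if current:
                        parts.append(current)
                    current = []
                current.append((x, y))
        elif command_id == CLOSE_PATH:
            if current:
                current.append(current[0])           # cursor itself stays put
        else:
            raise ValueError(f"unknown command id {command_id}")
    if current:
        parts.append(current)
    return parts

print(decode_geometry([9, 6, 12, 18, 10, 12, 24, 44, 15]))
```

Dropping the two `+=` accumulations and appending the raw deltas instead reproduces the scrambled-geometry symptom described above.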

Coordinate Quantization

MVT does not store geographic coordinates. During generation, coordinates are transformed from EPSG:3857 into a local integer grid, 4096 × 4096 units per tile by default (controlled by the extent field). The tile’s north-west corner maps to (0, 0) and its south-east corner to (4096, 4096), with y increasing southward. Features that cross a tile boundary are clipped, but at a buffer beyond the tile edge rather than at the edge itself (typically 64 units, or ~1.6% of tile width), which prevents rendering seams where adjacent tiles meet.
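
The quantization step can be sketched as follows, assuming the standard Web Mercator tile formulas and the default extent. The function name `lonlat_to_tile_coords` is hypothetical, and real generators also clip and simplify, which this sketch omits.

```python
import math

def lonlat_to_tile_coords(
    lon: float, lat: float, z: int, x: int, y: int, extent: int = 4096
) -> tuple[int, int]:
    """Quantize a WGS84 point to the integer grid of XYZ tile (z, x, y)."""
    n = 2 ** z
    # Fractional position of the point in the global tile grid.
    fx = (lon + 180.0) / 360.0 * n
    fy = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    # Offset from the tile's north-west corner, scaled to grid units.
    return round((fx - x) * extent), round((fy - y) * extent)

# The point (0°, 0°) sits in the centre of the single z0 tile.
print(lonlat_to_tile_coords(0.0, 0.0, 0, 0, 0))
```

Points outside the tile come back negative or larger than the extent; values that land within the buffer distance of the edge are the ones a generator keeps to avoid seams.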

Implementation Patterns

Pattern 1: Static File Pipeline with Tippecanoe

The most common production pattern for datasets that change infrequently: generate all tiles once, store in PMTiles, serve from object storage.

bash

# 1. Validate and reproject source data
ogr2ogr \
  -f GeoJSON /tmp/roads_4326.geojson \
  source/roads.geojson \
  -t_srs EPSG:4326 \
  -makevalid

# 2. Generate tiles: z0–z14, roads layer only, drop unused attributes
tippecanoe \
  --output roads.pmtiles \
  --layer roads \
  --minimum-zoom 0 \
  --maximum-zoom 14 \
  --include name \
  --include highway \
  --include oneway \
  --drop-densest-as-needed \
  --extend-zooms-if-still-dropping \
  /tmp/roads_4326.geojson

# 3. Inspect output before deploying
pmtiles show roads.pmtiles
pmtiles verify roads.pmtiles

The --drop-densest-as-needed flag lets Tippecanoe remove the densest features when a tile overflows 500 KB rather than aborting — critical for datasets with uneven spatial density. The essential Tippecanoe flags for production builds covers the full flag taxonomy.

Pattern 2: Python Subprocess Pipeline for Automated Nightly Builds

When the source dataset updates on a schedule, wrap Tippecanoe in a Python subprocess to integrate with data validation and alerting.

python

import subprocess
import sys
from pathlib import Path

def build_tiles(
    input_geojson: Path,
    output_pmtiles: Path,
    layer: str,
    min_zoom: int = 0,
    max_zoom: int = 14,
) -> None:
    cmd = [
        "tippecanoe",
        f"--output={output_pmtiles}",
        f"--layer={layer}",
        f"--minimum-zoom={min_zoom}",
        f"--maximum-zoom={max_zoom}",
        "--drop-densest-as-needed",
        "--extend-zooms-if-still-dropping",
        "--force",           # overwrite existing output
        str(input_geojson),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        raise RuntimeError(f"tippecanoe failed with exit code {result.returncode}")
    print(result.stderr)   # tippecanoe progress goes to stderr

build_tiles(
    Path("/data/roads_4326.geojson"),
    Path("/output/roads_v2.pmtiles"),
    layer="roads",
    max_zoom=14,
)

For large GeoParquet inputs, converting large GeoParquet files to vector tiles covers streaming the conversion via pyarrow → NDJSON → Tippecanoe stdin to avoid materializing the full dataset as a .geojson file.

Pattern 3: On-the-Fly Generation from PostGIS with Martin

For datasets updated continuously — real-time tracking, live incident feeds — static pre-generation is not feasible. Martin serves MVT directly from PostGIS with sub-100 ms tile generation for typical urban extents:

yaml

# martin.yaml (Martin reads its configuration as YAML)
postgres:
  connection_string: "postgresql://user:pass@localhost/gisdb"
  tables:
    incidents:              # source ID; becomes the URL path segment
      schema: public
      table: incidents
      srid: 4326
      geometry_column: geom
      minzoom: 10
      maxzoom: 18

bash

martin --config martin.yaml
# Tiles served at: http://localhost:3000/incidents/{z}/{x}/{y}

Martin does not apply zoom-level generalization by default — the database query returns all features within the tile bounding box. Add a PostGIS ST_Simplify call in a view for lower zooms, or use Martin’s built-in simplify_until_mvt_size option (Martin ≥ 0.13).

Performance and Scale Considerations

Tile Size Budget

The 500 KB uncompressed tile limit is a practical ceiling, not a hard spec constraint. In practice:

Tile size (uncompressed)	Client effect
< 100 KB	Fast parse, no rendering stall
100–500 KB	Acceptable; monitor parse time in DevTools
500 KB–1 MB	Parse stalls main thread unless Web Workers are used
> 1 MB	MapLibre GL JS may drop the tile silently

Gzip reduces wire size by roughly 70–80% for typical MVT payloads. Always serve with Content-Encoding: gzip or Content-Encoding: br (Brotli).

Zoom-Level Data Inclusion Strategy

Zoom-level optimization controls which features appear at which zoom levels. A practical starting matrix:

Zoom	Feature types included
z0–z5	Country/continent boundaries, major ocean labels
z6–z9	State/province boundaries, major roads, rivers
z10–z13	City boundaries, arterial roads, parks, landcover
z14–z16	Building footprints, address points, minor roads
z17–z18	Individual units, detailed POIs

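Tippecanoe supports this per feature through a tippecanoe object on each GeoJSON feature (a sibling of properties, with minzoom and maxzoom members), or per layer by building each layer with its own --minimum-zoom and --maximum-zoom and merging the results with tile-join.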
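
One way to apply such a matrix is to stamp each feature before it reaches Tippecanoe, which reads a per-feature `tippecanoe` object with `minzoom` and `maxzoom` members placed alongside `properties`. The sketch below assumes a hypothetical `class` property and hypothetical thresholds; adapt both to the actual schema.

```python
import json

# Hypothetical mapping from a feature's "class" property to its first zoom.
MIN_ZOOM_BY_CLASS = {
    "country_boundary": 0,
    "major_road": 6,
    "arterial_road": 10,
    "building": 14,
}

def annotate_min_zoom(feature: dict, default_min_zoom: int = 14) -> dict:
    """Return a copy of the feature carrying a per-feature minzoom hint."""
    properties = feature.get("properties") or {}
    min_zoom = MIN_ZOOM_BY_CLASS.get(properties.get("class"), default_min_zoom)
    annotated = dict(feature)  # shallow copy; the input is left untouched
    annotated["tippecanoe"] = {"minzoom": min_zoom}
    return annotated

sample = {
    "type": "Feature",
    "properties": {"class": "major_road", "name": "A1"},
    "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
}
# One feature per line (newline-delimited GeoJSON).
print(json.dumps(annotate_min_zoom(sample)))
```

Unknown classes fall back to the most detailed zoom, which keeps unclassified features out of low-zoom tiles by default.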

Memory and CPU Constraints

Tippecanoe is single-threaded during the initial feature ingestion pass and memory-proportional to the input dataset. For datasets exceeding 10 GB GeoJSON:

Use --read-parallel to exploit multi-core machines during the initial read
Pipe from geopandas or pyogrio rather than materializing a full .geojson file
Monitor RSS; Tippecanoe typically needs 2–4× the uncompressed input size in RAM

Martin’s PostGIS backend is constrained by database connection pool size and query plan efficiency; add spatial indexes (CREATE INDEX ON table USING GIST(geom)) before production traffic.

Storage and Delivery

MBTiles: SQLite-Based Container

MBTiles packages tiles into a single SQLite database with a schema indexed by (zoom_level, tile_column, tile_row). Tiles are stored as BLOBs in the tiles table. The format is universally supported by desktop GIS tools and tile servers (TileServer GL, Martin, pg_tileserv).

sql

-- MBTiles schema (read-only inspection)
SELECT zoom_level, COUNT(*) AS tile_count
FROM tiles
GROUP BY zoom_level
ORDER BY zoom_level;

MBTiles limitations for production CDN delivery:

SQLite write locks prevent concurrent generation workers from writing the same file
No native HTTP range request support — requires a tile server process
File sizes above ~20 GB create operational friction (backup, transfer, atomic swap)

For resolving SQLite lock contention during large MBTiles builds, the standard fix is generating tiles into a directory of files first, then assembling the MBTiles in a single-writer pass.
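
The single-writer assembly pass might look like the sketch below. It assumes workers have written tiles to a hypothetical `z/x/y.mvt` directory layout using XYZ row numbering, and it creates only the minimal `metadata` and `tiles` tables; a production archive would carry more metadata rows (bounds, zoom range, layer JSON).

```python
import sqlite3
from pathlib import Path

def assemble_mbtiles(tile_dir: Path, output: Path, name: str = "tiles") -> int:
    """Load a z/x/y.mvt directory tree into one MBTiles file; return tile count."""
    conn = sqlite3.connect(output)
    try:
        with conn:  # one transaction, one writer: no lock contention
            conn.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tiles ("
                "zoom_level INTEGER, tile_column INTEGER, "
                "tile_row INTEGER, tile_data BLOB)"
            )
            conn.execute(
                "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
                "ON tiles (zoom_level, tile_column, tile_row)"
            )
            conn.executemany(
                "INSERT INTO metadata VALUES (?, ?)",
                [("name", name), ("format", "pbf")],
            )
            count = 0
            for path in sorted(tile_dir.glob("*/*/*.mvt")):
                z, x, y = int(path.parts[-3]), int(path.parts[-2]), int(path.stem)
                tms_row = (2 ** z - 1) - y  # MBTiles stores rows in TMS order
                conn.execute(
                    "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)",
                    (z, x, tms_row, path.read_bytes()),
                )
                count += 1
        return count
    finally:
        conn.close()
```

Writing everything inside one transaction also means a failed run leaves no half-populated tiles table behind.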

PMTiles: Serverless, Range-Request Optimized Archives

PMTiles uses a single sequentially written archive where tile coordinates are mapped to byte offsets via a Hilbert curve–indexed directory. This layout allows CDNs and object storage (S3, Cloudflare R2) to serve individual tiles with a single HTTP range request — no tile server process required.
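
The mapping from a tile coordinate to its position along the Hilbert curve can be sketched as below. This follows the usual Hilbert xy-to-index construction, offset by the number of tiles on all lower zoom levels, which is my understanding of how PMTiles v3 assigns tile IDs; treat it as an illustration and check results against the official PMTiles libraries before relying on it.

```python
def zxy_to_tile_id(z: int, x: int, y: int) -> int:
    """Return the Hilbert-ordered tile ID for an XYZ tile coordinate."""
    if not (0 <= x < (1 << z) and 0 <= y < (1 << z)):
        raise ValueError("x and y must lie in [0, 2^z)")
    # Tiles of all lower zoom levels come first: sum of 4^i for i < z.
    tile_id = sum(4 ** i for i in range(z))
    s = (1 << z) >> 1  # side length of the current quadrant
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        tile_id += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so consecutive IDs stay adjacent.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s >>= 1
    return tile_id

for tile in [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 0), (2, 0, 0)]:
    print(tile, zxy_to_tile_id(*tile))
```

Because neighbouring tiles tend to receive nearby IDs, tiles that are requested together also tend to sit close together in the archive, which is what makes range requests efficient.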

bash

# Inspect PMTiles header and directory
pmtiles show output.pmtiles

# Verify all tiles are readable
pmtiles verify output.pmtiles

# Serve locally for development (serves every archive in the directory as Z/X/Y tiles)
pmtiles serve . --port 8080
# Tiles served at: http://localhost:8080/output/{z}/{x}/{y}.mvt

The PMTiles specification deep dive covers the archive format, directory structure, and how to inspect metadata with CLI tools.

CDN Cache-Control Strategy

Serve tiles with versioned URL paths and immutable cache headers:

text

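# Nginx example (assumes the .mvt files on disk are already gzip-compressed,
# which is why Content-Encoding is set unconditionally)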
location ~ ^/tiles/v(?<version>[0-9]+)/(?<z>[0-9]+)/(?<x>[0-9]+)/(?<y>[0-9]+)\.mvt$ {
    add_header Cache-Control "public, max-age=31536000, immutable";
    add_header Content-Encoding gzip;
    add_header Content-Type "application/vnd.mapbox-vector-tile";
    root /var/tiles;
}

When source data changes, rotate the version prefix (/v2/ → /v3/) rather than invalidating individual tiles. This guarantees instant global consistency without CDN purge API calls. The full header taxonomy, tile-server choices, and range-request hosting live in Tile Serving & CDN Delivery.

For PMTiles on Cloudflare R2 or S3, set the object’s Cache-Control metadata at upload time:

bash

aws s3 cp roads_v2.pmtiles s3://my-tiles-bucket/roads_v2.pmtiles \
  --cache-control "public, max-age=31536000, immutable" \
  --content-type "application/octet-stream"

Failure Modes and Debugging

Tile Overflow: Payload Exceeds 500 KB

Symptom: Tippecanoe prints tile is too large warnings; MapLibre silently drops tiles.

Diagnosis:

bash

# Find oversized tiles in a PMTiles archive
pmtiles show output.pmtiles | grep "max tile size"

# Find oversized tiles in an MBTiles container
# (tile_data is usually stored gzip-compressed, so this measures compressed bytes)
sqlite3 output.mbtiles \
  "SELECT zoom_level, tile_column, tile_row, length(tile_data) AS bytes
   FROM tiles WHERE length(tile_data) > 500000
   ORDER BY bytes DESC LIMIT 20;"

Fix: Add --drop-densest-as-needed to Tippecanoe, or reduce --maximum-zoom to a level where tiles stay under budget. For attribute-heavy datasets, dropping unused attributes reduces tile size substantially before generalization is needed.

Projection Mismatch: Geometry Renders off the Map

Symptom: Features appear at 0°N 0°E, or are displaced thousands of kilometres.

Diagnosis:

bash

ogrinfo -al -so source.geojson | grep -A 3 "SRS WKT"
# Look for PROJCRS or unexpected GEOGCRS authority

Fix: Always reproject to EPSG:4326 before passing to Tippecanoe:

bash

ogr2ogr -t_srs EPSG:4326 -makevalid output_4326.geojson input.geojson

Y-Axis Inversion: TMS vs XYZ Flip

Symptom: Tiles load but land in the wrong rows. Each tile is drawn upright, yet northern content appears in the south and vice versa, so features do not join across tile edges.

Diagnosis: Check whether the tile server is serving TMS-indexed tiles to a client expecting XYZ convention (or vice versa).

Fix: Apply tms_y = (2^z - 1) - xyz_y in the URL template, or configure the tile server’s tms flag. In MapLibre GL JS, set "scheme": "tms" in the source definition if the origin is south-west.

MVT Parse Error: Unexpected End of Buffer

Symptom: Client throws Protobuf error: unexpected end of buffer or renders blank tiles.

Diagnosis: Tile was served with Content-Encoding: gzip but the client received it without decompression (or vice versa).

Fix: Inspect response headers with curl -I:

bash

curl -I "https://tiles.example.com/v2/10/512/384.mvt"
# Confirm Content-Encoding matches Accept-Encoding

Ensure the tile server does not double-compress: if the .mvt file is already gzip-compressed on disk, do not apply gzip compression in the HTTP layer again.
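
A quick way to check for double compression is to look for the gzip magic bytes and count how many layers must be peeled off before the payload stops looking like gzip. The sketch below is illustrative; `unwrap_gzip` is a hypothetical helper, and the sample bytes stand in for a real tile body fetched without automatic decompression.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def unwrap_gzip(payload: bytes, max_layers: int = 3) -> tuple[bytes, int]:
    """Strip gzip layers; return (innermost payload, number of layers removed)."""
    layers = 0
    while payload[:2] == GZIP_MAGIC and layers < max_layers:
        payload = gzip.decompress(payload)
        layers += 1
    return payload, layers

# Stand-in for a raw tile body; not a valid MVT, just placeholder bytes.
raw_tile = b"\x1a\x05hello"
body, layers = unwrap_gzip(gzip.compress(gzip.compress(raw_tile)))
print(layers, body == raw_tile)
```

A result of two layers points at a file that was compressed on disk and then compressed again by the HTTP layer; zero layers on a response that declares Content-Encoding: gzip points at the opposite mismatch.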

Attribute Values Appear as Integers Instead of Strings

Symptom: A field that should be "residential" renders as 3.

Cause: The MVT values dictionary encodes typed values (string_value, int_value, float_value); a mismatch between the encoding type and the client’s expectation causes incorrect deserialization.

Fix: Inspect the raw protobuf:

bash

# Decode a tile's protobuf with vtzero or similar
vtzero-show tile.mvt | head -80

Ensure source data has consistent types per attribute column — mixed int/string in GeoJSON properties will cause Tippecanoe to coerce all values to string.
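
A pre-flight check for mixed attribute types can be sketched as follows. It assumes the features are already loaded as Python dicts (for example via `json.load`); the function name and sample data are hypothetical.

```python
from collections import defaultdict

def find_mixed_type_properties(features: list[dict]) -> dict[str, set[str]]:
    """Report property keys whose non-null values use more than one Python type."""
    seen: dict[str, set[str]] = defaultdict(set)
    for feature in features:
        for key, value in (feature.get("properties") or {}).items():
            if value is not None:  # nulls do not count as a conflicting type
                seen[key].add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}

sample_features = [
    {"type": "Feature", "properties": {"lanes": 2, "highway": "residential"}},
    {"type": "Feature", "properties": {"lanes": "2", "highway": "primary"}},
    {"type": "Feature", "properties": {"lanes": None, "highway": "service"}},
]
print(find_mixed_type_properties(sample_features))
```

The check is deliberately strict: it also flags columns that mix int and float, which may be acceptable for numeric fields but is worth a look before a build.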

Topic Index

This section’s depth is organized across six focused topics:

MBTiles Architecture & Limits — SQLite schema internals, concurrent write constraints, file-size operational limits, and the lock contention patterns that break automated nightly builds.

PMTiles Specification Deep Dive — Archive header layout, Hilbert curve tile addressing, metadata inspection with CLI tools, and the PMTiles-versus-MBTiles decision for S3 and Cloudflare R2 delivery.

MVT Encoding Internals — The protobuf tile structure byte by byte: layers and features, the zigzag geometry command integers, the 4096 tile extent, and the key/value attribute tables that a style reads.

Tile Coordinate Systems & the Slippy Map Grid — EPSG:4326 versus EPSG:3857, the z/x/y grid, the XYZ-versus-TMS y-axis flip, quadkeys, and converting a latitude/longitude to a tile number.

Vector vs Raster Tile Tradeoffs — Where vector tile delivery outperforms raster (dynamic styling, offline caching, interactive querying) and where raster remains the right choice (satellite imagery, complex cartographic renders, legacy client compatibility).

Zoom Level Optimization Strategies — Feature inclusion rules by zoom, calculating the optimal maximum zoom for urban datasets, and choosing between overzoom and a higher generated max-zoom.

Automated Generation Pipelines with Tippecanoe — CLI flag taxonomy, attribute filtering rules, GeoParquet input processing, and geometry simplification algorithms for batch tile generation.
Geometry Simplification Algorithms — How Tippecanoe chooses between Visvalingam–Whyatt and Douglas–Peucker at each zoom level, and when to override the default.
Map Styling and Layer Synchronization — MapLibre GL JS style spec structure, dynamic attribute mapping, and binding data-driven paint properties to vector tile layer attributes.
Attribute Filtering Rules — --include, --exclude, and --exclude-all flag semantics for removing attributes that inflate tile size without adding client-side value.
GeoParquet Input Processing — Reading columnar GeoParquet files efficiently and streaming them to Tippecanoe without materializing intermediate GeoJSON.
Tile Serving & CDN Delivery — How the MBTiles and PMTiles containers described here reach the browser: tile servers, object-storage range requests, cache-control headers, and versioned URLs.
