Engineering Decisions

A public database of Architecture Decision Records (ADRs). Each entry documents a significant technical decision I've made: the context, the options considered, the reasoning, the tradeoffs, and the outcome.

ADR-001

Go for warehouse search service, not Java

Accepted
Context

OZON needed a high-concurrency search microservice for 200M warehouse items. Java/Spring was the team default.

Options considered
Go, Java/Spring Boot, Python/FastAPI
Reasoning

Go's goroutine model handles 10K concurrent connections with ~50MB RAM, vs. ~2GB for Java's thread-per-request model. A single static binary simplifies deployment and avoids JVM startup and memory overhead. The team already knew Go.
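The concurrency argument can be illustrated with a minimal sketch. It is not the service's actual code: handleAll and the trivial handler are hypothetical names, and a real handler would block on network I/O rather than do arithmetic. The point is only that starting one goroutine per request is cheap, since each starts with a stack of a few kilobytes instead of a full OS thread.

```go
package main

import (
	"fmt"
	"sync"
)

// handleAll runs handle once per request ID, each in its own goroutine,
// and collects the results by index. Names are illustrative only.
func handleAll(n int, handle func(id int) int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			results[id] = handle(id)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	// 10,000 concurrent "requests" handled by 10,000 goroutines.
	out := handleAll(10000, func(id int) int { return id * 2 })
	fmt.Println(len(out), out[9999]) // prints "10000 19998"
}
```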

Tradeoffs

Fewer libraries than Java ecosystem. String manipulation is more verbose. No generics at the time (pre-Go 1.18).

Outcome

The service runs on a single 2-core instance, serving 500 RPS at P95 < 80ms. By my estimate, a Java equivalent would have needed about 4x the hardware.

Go, Performance, Architecture
ADR-002

Elasticsearch over PostgreSQL full-text search

Accepted
Context

PostgreSQL full-text search queries were taking 8-10s at 200M documents. A dedicated search engine was clearly needed.

Options considered
Elasticsearch, PostgreSQL full-text, Typesense, Meilisearch
Reasoning

ES supports custom analyzers for Russian transliteration (a hard requirement). Fuzzy matching works out of the box, and phonetic analysis is available through the official phonetic analysis plugin. Horizontal sharding is transparent to clients. The team had prior experience.
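To show what the transliteration requirement means in practice, here is a small sketch of the normalization idea: a query typed in Latin letters should match an item stored in Cyrillic. The mapping table is a deliberately simplified assumption for illustration. It is not the production analyzer, which runs inside ES at index and query time.

```go
package main

import (
	"fmt"
	"strings"
)

// cyrToLat is a simplified Cyrillic-to-Latin table (an assumption for
// this sketch; real transliteration schemes have more rules and variants).
var cyrToLat = map[rune]string{
	'а': "a", 'б': "b", 'в': "v", 'г': "g", 'д': "d", 'е': "e", 'ё': "e",
	'ж': "zh", 'з': "z", 'и': "i", 'й': "y", 'к': "k", 'л': "l", 'м': "m",
	'н': "n", 'о': "o", 'п': "p", 'р': "r", 'с': "s", 'т': "t", 'у': "u",
	'ф': "f", 'х': "kh", 'ц': "ts", 'ч': "ch", 'ш': "sh", 'щ': "shch",
	'ъ': "", 'ы': "y", 'ь': "", 'э': "e", 'ю': "yu", 'я': "ya",
}

// transliterate lowercases s and replaces Cyrillic letters with Latin
// equivalents, leaving every other character unchanged.
func transliterate(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if lat, ok := cyrToLat[r]; ok {
			b.WriteString(lat)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Both spellings normalize to the same token, so they can match.
	fmt.Println(transliterate("Молоко") == transliterate("moloko")) // prints "true"
}
```

Any many-to-one normalization like this makes some unrelated words collide, which is one plausible source of false positives in transliterated search.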

Tradeoffs

Operational complexity vs. Postgres (separate cluster, JVM memory). Eventual consistency between Postgres (source of truth) and ES (search replica). Complex upgrade path.

Outcome

Search latency dropped from 8-10s to <500ms P99. Custom analyzer solved transliteration with ~2% false positive rate.

Elasticsearch, Search, Database
ADR-003

Kafka for search index updates, not polling

Accepted
Context

The ES index needs to stay current with the product catalog. Options: poll the DB, use CDC, or consume an event stream.

Options considered
DB polling, PostgreSQL CDC (Debezium), Kafka events from catalog service
Reasoning

The catalog service already published events to Kafka. Consuming them is simpler than setting up Debezium CDC. Kafka retains the event log and tracks committed offsets per consumer group, so a consumer can replay from its last committed offset after a failure. Sub-2s lag is acceptable.
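The replay property can be sketched without a Kafka client. The model below is an assumption-laden simplification: the log is a plain slice, the committed offset is an integer, and consume and errUnavailable are hypothetical names. It shows the one rule that matters, which is to advance the offset only after the event has been indexed, so a failed event is read again on the next run.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable simulates a failed write to the search index.
var errUnavailable = errors.New("search index unavailable")

// consume processes log[committed:] in order and returns the new
// committed offset. It stops at the first failure, so the failed
// event is re-read on the next call (at-least-once processing).
func consume(log []string, committed int, index func(string) error) int {
	for committed < len(log) {
		if err := index(log[committed]); err != nil {
			return committed // do not advance past an unindexed event
		}
		committed++
	}
	return committed
}

func main() {
	log := []string{"sku-1", "sku-2", "sku-3"}
	failOnce := true
	index := func(ev string) error {
		if ev == "sku-2" && failOnce {
			failOnce = false
			return errUnavailable
		}
		return nil
	}
	off := consume(log, 0, index)
	fmt.Println(off) // prints "1": stopped at the failed event
	off = consume(log, off, index)
	fmt.Println(off) // prints "3": sku-2 was replayed, then sku-3
}
```

Because an event can be processed more than once under this rule, the index write has to be idempotent.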

Tradeoffs

Depends on catalog service publishing correct events (coupling). Requires handling out-of-order events. CDC would be more robust for high-volume bulk updates.
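One common way to handle out-of-order events is to carry a version on each event and ignore anything older than what is already indexed. The sketch below assumes a hypothetical event shape with a per-product Version that only increases, and uses an in-memory map as a stand-in for the search index. It is an illustration of the technique, not the service's actual handling.

```go
package main

import "fmt"

// CatalogEvent is a hypothetical event shape. Version is assumed to
// increase with every change to the same product.
type CatalogEvent struct {
	ProductID string
	Version   int64
	Title     string
}

// Index is an in-memory stand-in for the search index.
type Index struct {
	docs map[string]CatalogEvent
}

func NewIndex() *Index {
	return &Index{docs: make(map[string]CatalogEvent)}
}

// Apply stores the event unless the same or a newer version of the
// product is already indexed. It reports whether the event was applied.
func (ix *Index) Apply(ev CatalogEvent) bool {
	if cur, ok := ix.docs[ev.ProductID]; ok && cur.Version >= ev.Version {
		return false // stale or duplicate event: keep the newer document
	}
	ix.docs[ev.ProductID] = ev
	return true
}

func main() {
	ix := NewIndex()
	ix.Apply(CatalogEvent{ProductID: "p1", Version: 2, Title: "new title"})
	// Version 1 arrives late and must not overwrite version 2.
	applied := ix.Apply(CatalogEvent{ProductID: "p1", Version: 1, Title: "old title"})
	fmt.Println(applied, ix.docs["p1"].Title) // prints "false new title"
}
```

The same check also makes replayed duplicates harmless, since re-applying an already indexed version is a no-op.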

Outcome

Works well for normal traffic. Bulk catalog updates exposed a gap: out-of-order events left temporarily stale data in the index.

Kafka, Architecture, Elasticsearch
ADR-004

MQTT over HTTP for camera-to-backend communication

Accepted
Context

Parking cameras need to push occupancy events to the backend. Cameras are on a campus LAN.

Options considered
HTTP polling, HTTP webhooks from cameras, MQTT pub/subWebSockets
Reasoning

Cameras are resource-constrained (embedded C, limited RAM). MQTT is designed for IoT — small payload overhead, persistent connections, QoS levels for at-least-once delivery. Single broker handles hundreds of cameras.

Tradeoffs

Adds MQTT broker as infrastructure dependency. Topic schema needs discipline or becomes chaotic. Harder to debug than REST.
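Topic discipline is easier to keep when topics are built and parsed in one place rather than formatted ad hoc. The sketch below assumes a hypothetical schema, parking/<lot>/<camera>/occupancy. The actual topic layout is not documented here, and the helper names are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical schema: parking/<lot>/<camera>/occupancy
const topicPrefix = "parking"

// occupancyTopic builds the topic for one camera. It rejects segments
// containing '/', '+' or '#', which are the level separator and the
// wildcard characters in MQTT topic filters.
func occupancyTopic(lot, camera string) (string, error) {
	for _, seg := range []string{lot, camera} {
		if seg == "" || strings.ContainsAny(seg, "/+#") {
			return "", fmt.Errorf("invalid topic segment %q", seg)
		}
	}
	return strings.Join([]string{topicPrefix, lot, camera, "occupancy"}, "/"), nil
}

// parseOccupancyTopic extracts lot and camera from a topic that
// follows the schema, and reports false for anything else.
func parseOccupancyTopic(topic string) (lot, camera string, ok bool) {
	parts := strings.Split(topic, "/")
	if len(parts) != 4 || parts[0] != topicPrefix || parts[3] != "occupancy" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	topic, _ := occupancyTopic("lot-a", "cam-17")
	fmt.Println(topic) // prints "parking/lot-a/cam-17/occupancy"
	lot, cam, ok := parseOccupancyTopic(topic)
	fmt.Println(lot, cam, ok) // prints "lot-a cam-17 true"
}
```

With a fixed number of levels, a backend subscriber can use a single filter such as parking/+/+/occupancy to receive events from every camera.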

Outcome

Event latency <200ms from camera to dashboard. Broker handles 200 cameras with <5% CPU on a modest VM.

MQTT, IoT, Architecture, Embedded