J.M Digital Solutions


Cemomemo

Flagship OCR + AI workflow product

Business problem

High-variance memorial imagery required manual interpretation before records could be used reliably in downstream workflows.

Direction of value

OCR plus AI-assisted extraction and backend validation turn complex images into structured, reviewable records at scale.

Flagship OCR and AI workflow: image processing, structured extraction, backend integration, and automation for real-world memorial registry operations—not a brochure site.

OCR · AI-assisted extraction · Backend services · Workflow automation

https://www.cemomemo.org/home


Live deployment reference.

Project overview

Cemomemo is a production memorial registry workflow. The product is not “a website with a form”—it is applied engineering across uneven visual input, OCR, model-assisted structuring, validation, persistence, and operator review so real records can be trusted downstream.

Memorial imagery varies wildly: lighting, angles, handwriting, stamps, and composite layouts. The system has to accept that messiness without pushing all ambiguity onto staff.

Downstream use depends on predictable fields: search, reporting, and workflow triggers assume the database is the source of truth, not an inbox of attachments.

Public-facing and operator paths share infrastructure so permissions, validation, and audit expectations stay consistent end to end.

Engagement and role

Solo-led engineering across the pipeline, backend services, and product-facing web layer: from intake behavior through persistence, validation, and the states an operator needs before a record is considered “live.”

How the work was phased

Phase names describe sequencing and risk reduction, not fixed week counts for every future project.

  • Discovery and field truth

    Mapped how memorial content enters the organization, what “good” structured output looks like for each downstream step, and where human judgment cannot be automated away.

  • Pipeline MVP

    End-to-end path: capture/preprocess, OCR, model-assisted extraction with schema constraints, persistence, and minimal review UI to prove repeatability on real samples.

  • Review, exceptions, and hardening

    First-class states for uncertain extractions, operational tooling, and tighter validation so staff work through exceptions instead of fighting silent corruption.

  • Production integration

    Live deployment paths aligned with privacy expectations, monitoring hooks, and operator handoff so intake volume does not outpace supportability.

Business problem

Memorial-related imagery arrived in inconsistent formats and quality. Manual interpretation was slow, expensive at volume, and produced uneven structured output—blocking search, operations, and anything that depends on consistent fields.

Every minute spent re-keying or guessing field values is latency for families and operators, and that effort does not compound: the next image presents the same variance problem.

Without a pipeline, “quality” becomes tribal knowledge in chat threads instead of rules the system can enforce.

Technical challenge

Real photos and scans are noisy. A single OCR pass is not enough. The platform must separate ingestion, extraction, validation, review, and downstream contracts so improvements to one layer do not destabilize the rest.

Model output must be bounded: schemas, normalization, and validation are what turn probabilistic guesses into records a database can safely index.

Operators need affordances to correct and audit without turning the UI into a generic spreadsheet that defeats the purpose of automation.

Solution approach

A multi-stage pipeline: disciplined image handling, OCR, AI-assisted field proposals under schema constraints, backend validation, explicit review states, and integration into application workflows—so automation handles the bulk and humans handle judgment calls.

Each stage emits structured artifacts that downstream stages can rely on; failed or low-confidence paths surface clearly instead of failing quietly.

Where the memorial context demands sensitivity, UX and routing favor conservative defaults and clear separation between public narrative and operator-only paths.
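The stage-by-stage flow described above can be sketched as a small orchestrator. The stage implementations here are stand-ins returning canned data, and the 0.8 review threshold is an assumed value, not the product's actual configuration.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    stage: str
    payload: dict
    confidence: float
    notes: list = field(default_factory=list)

def run_pipeline(image_id, stages, review_threshold=0.8):
    """Run stages in order; route low-confidence output to review
    explicitly instead of failing quietly."""
    payload = {"image_id": image_id}
    trail = []
    for name, fn in stages:
        payload, confidence = fn(payload)
        trail.append(Artifact(name, dict(payload), confidence))
        if confidence < review_threshold:
            return "needs_review", trail  # humans handle the judgment call
    return "auto_accepted", trail

# Stand-in stages (assumed shapes, not the production interfaces):
def preprocess(p): return {**p, "deskewed": True}, 0.99
def ocr(p):        return {**p, "text": "IN LOVING MEMORY"}, 0.92
def extract(p):    return {**p, "fields": {"full_name": "?"}}, 0.55

status, trail = run_pipeline(
    "img-001", [("preprocess", preprocess), ("ocr", ocr), ("extract", extract)]
)
# extraction confidence 0.55 < 0.8, so the item is routed to review
# with the full artifact trail preserved for the operator
```

The point is the shape: every stage leaves a structured artifact behind, so automation handles the bulk and the exception path arrives with context instead of a bare failure.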

Architecture and systems thinking

  • Pipeline stages are isolated so tuning OCR or extraction approaches does not ripple unpredictably into storage or API contracts.
  • Structured output schemas are treated as versioned interfaces for any consumer—search, reporting, or future automation.
  • Review and exception queues are first-class data, not an afterthought bolted onto a “happy path only” design.
  • Async-friendly boundaries allow heavier work without collapsing interactive paths for operators.
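Treating schemas as versioned interfaces, as the second bullet describes, can be illustrated like this. The versions and fields are hypothetical; the sketch only shows the discipline of additive changes plus explicit migrations.

```python
# Consumers pin a schema version; producers migrate explicitly
# rather than silently mutating old rows. Fields are illustrative.
SCHEMAS = {
    1: {"full_name", "death_year"},
    2: {"full_name", "birth_year", "death_year"},  # additive change
}

def conforms(record: dict, version: int) -> bool:
    """A record conforms when it carries exactly the versioned fields."""
    return set(record) == SCHEMAS[version]

def migrate_1_to_2(record: dict) -> dict:
    """Explicit, reviewable migration from v1 to v2."""
    return {**record, "birth_year": None}
```

Because the schema is data, a regression (a consumer reading v2 fields from a v1 row, say) is detectable rather than a silent mismatch inside an ad hoc JSON blob.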

Key capabilities shipped

  • OCR tuned for difficult real-world memorial input
  • AI-assisted extraction with normalization aligned to memorial registry fields
  • Backend persistence, validation, and workflow states operators can trust
  • Automation on repetitive paths with explicit checkpoints where discretion is required
  • Privacy-conscious UX patterns appropriate to memorial registry operations

Technical decisions

  • Prefer explicit schemas and validation over ad hoc JSON blobs so regressions are detectable and migrations are possible.
  • Keep extraction as “propose + verify” rather than blind writes—confidence and business rules meet in one place.
  • Separate preprocessing from interpretation so hardware/camera variance is handled before models see pixels.
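The "propose + verify" decision above can be reduced to a single checkpoint where model confidence and business rules meet before anything is written. The rule, the threshold, and the list-backed "database" are all stand-ins for illustration.

```python
def verify_then_write(proposal: dict, confidence: float, db: list,
                      min_confidence: float = 0.9) -> str:
    """One checkpoint: confidence and business rules decide together
    whether a proposal becomes a live record or a review item."""
    # Illustrative business rule: a publishable record needs a name.
    rule_ok = bool(proposal.get("full_name"))
    if confidence >= min_confidence and rule_ok:
        db.append({**proposal, "state": "live"})
        return "written"
    db.append({**proposal, "state": "pending_review"})
    return "queued"
```

Nothing is blindly written: a low-confidence or rule-violating proposal still lands in the database, but in a review state rather than as a trusted record.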

Stack

  • OCR pipeline
  • AI-assisted extraction
  • Backend services
  • PostgreSQL-class persistence
  • Web application layer
  • Workflow and review state machinery
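The "workflow and review state machinery" in the stack can be pictured as a small transition table. These state names are hypothetical; the product's actual states may differ, but the shape (illegal transitions rejected, terminal states explicit) is the point.

```python
# Hypothetical review states and the transitions allowed between them.
TRANSITIONS = {
    "ingested":     {"extracted", "failed"},
    "extracted":    {"needs_review", "live"},
    "needs_review": {"live", "rejected"},
    "failed":       {"ingested"},  # retry after the input is fixed
    "live":         set(),         # terminal
    "rejected":     set(),         # terminal
}

def advance(state: str, target: str) -> str:
    """Move a record to a new state, rejecting illegal transitions
    so corruption is loud instead of silent."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target
```

Encoding the transitions as data keeps "what counts as live" a rule the system enforces rather than tribal knowledge.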

Privacy and safety

  • Memorial data is treated as sensitive by default: conservative handling and clear distinctions between public narrative and operator-only paths.
  • Operational tooling favors audit-friendly patterns: who changed what, and from which review state.
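The audit expectation above ("who changed what, and from which review state") suggests an append-only entry shape. This is a sketch of that shape, not the product's actual log schema.

```python
from datetime import datetime, timezone

def audit_entry(actor: str, record_id: str, from_state: str,
                to_state: str, change: dict) -> dict:
    """Append-only audit record: who changed what, from which
    review state, and when. Field names are illustrative."""
    return {
        "actor": actor,
        "record_id": record_id,
        "from_state": from_state,
        "to_state": to_state,
        "change": change,
        "at": datetime.now(timezone.utc).isoformat(),
    }
```

Writing one of these on every state change makes operator actions reconstructable after the fact, which matters in a domain where quiet corrections erode trust.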

Implementation notes

  • Engineering time prioritized repeatability and maintainability over one-off proofs that cannot survive weekly intake volume.
  • Edge cases were collected from real samples early so the review UX was designed for the actual exception mix, not hypothetical happy paths.
  • Collaboration stayed grounded in business language: what “publishable” means for a record, not only model scores.

Results and outcomes

Outcomes below are qualitative and scoped to the engagement. No fabricated metrics or client quotes.

  • Strong reduction in manual transcription and reinterpretation on scoped intake paths
  • More consistent structured records for search, operations, and downstream triggers
  • A flagship reference for applied OCR and AI in a domain where sloppy automation would be unacceptable

What a roadmap might tackle next

Illustrative engineering direction, not a commitment or public product promise.

  • Broader document families and layout templates as new memorial formats appear
  • Richer reviewer analytics: where the model struggles, and whether rules—not more parameters—fix the pattern
  • Deeper automation once exception rates fall below an agreed threshold per record type

Build something in this space

Describe your project. You will get a concise reply with fit, rough approach, and next steps.