QuantLens

Quality Scorecard

Validation Pass (current)
Rejected During Intake
PII Findings at Release
SHA-256 Coverage
Provenance Coverage
Freshness SLO
Schema Version
Open Drift Alerts

Validation Pipeline

Ingest → Normalize → Validate → Enrich → Index → Package

Each stage includes quality gates: schema/contracts, integrity (SHA‑256), PII safety, coverage/freshness, value ranges, duplicates/joins, and text/NLP checks.

Quality Gates

Gate | Rules | Threshold | Result | Last Run (UTC)
Schema & Contracts | types, nullability, enums, keys | 100% rules pass | PASS | 2025-11-05 00:00
Integrity | SHA-256 per file; Merkle root | 100% covered | PASS | 2025-11-05 00:00
PII & Safety | multi-pass scanners; deny-lists | 0 findings | PASS | 2025-11-05 00:00
Coverage & Freshness | temporal bounds; recency | ≤ 24h lag | PASS | 2025-11-05 00:00
Values & Ranges | ranges, outliers, monotonic | 99.9% valid | PASS | 2025-11-05 00:00
Duplicates & Joins | unique keys; referential integrity | 0 dup; 0 orphan | PASS | 2025-11-05 00:00
NLP/Text | min token length; language | 100% valid | PASS | 2025-11-05 00:00
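
The Integrity gate mentions a Merkle root over the per-file SHA-256 digests, but the page does not describe how the tree is built. The following is a minimal sketch under stated assumptions: leaves are the file digests in sorted file-name order, a parent is SHA-256(left + right), and the last node of an odd-sized level is paired with itself. `sha256_bytes` and `merkle_root` are hypothetical helper names, and the file contents are made-up sample bytes.

```python
import hashlib


def sha256_bytes(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaf_digests: list[bytes]) -> bytes:
    """Fold leaf digests into a single root digest.

    Assumed construction (not confirmed by the source):
    parent = SHA-256(left + right); an odd node is paired with itself.
    """
    if not leaf_digests:
        raise ValueError("at least one leaf digest is required")
    level = list(leaf_digests)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node of an odd level
        level = [
            sha256_bytes(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Sample pack contents (illustrative only).
files = {"a.json": b"alpha", "b.json": b"beta", "c.json": b"gamma"}
leaves = [sha256_bytes(files[name]) for name in sorted(files)]
root = merkle_root(leaves)
print(root.hex())
```

Changing any single file changes its leaf digest and therefore the root, so one published root can stand in for the whole per-file list.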

Artifacts & Verification

Public Catalog: public.catalog.json (metadata and metrics)
Schema (sample): schema.json (types, nullability, enums)
Expectations (sample): expectations.json (rule set per pack)
Hashes (sample): sha256sum.txt (per-file checksums)

Validate a download locally

# Windows (PowerShell)
certutil -hashfile pack.zip SHA256

# macOS / Linux
shasum -a 256 pack.zip
# Linux (GNU coreutils alternative)
sha256sum pack.zip

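# Python (validate data against the schema with jsonschema)
import json

from jsonschema import validate

# validate() raises jsonschema.ValidationError if data.json violates schema.json
with open('schema.json', encoding='utf-8') as s, open('data.json', encoding='utf-8') as d:
    validate(instance=json.load(d), schema=json.load(s))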

Governance & SLOs

  • Versioning: Semantic versioning; schema files are versioned; breaking changes ship only in major releases.
  • Release SLOs: build success ≥ 99.9%; freshness SLO ≤ 24h per pack; incident response P1 < 4h.
  • Error budget: monthly validation failure budget ≤ 0.5% with auto‑rollback.
  • Changelog: UPDATES.md with impact, migration paths, and deprecation windows.
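
The freshness SLO and the error budget above reduce to simple arithmetic. The sketch below shows one way they could be checked. The function names, and the choice to measure the budget as failed runs divided by total runs in the month, are assumptions rather than the documented implementation.

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=24)  # "freshness SLO <= 24h per pack"
FAILURE_BUDGET = 0.005               # "monthly validation failure budget <= 0.5%"


def is_fresh(last_updated: datetime, now: datetime) -> bool:
    """True when the pack was updated within the freshness SLO window."""
    return now - last_updated <= FRESHNESS_SLO


def budget_remaining(failed_runs: int, total_runs: int) -> float:
    """Unused share of the monthly failure budget (negative when overspent).

    Assumption: the budget is measured as failed runs / total runs.
    """
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return FAILURE_BUDGET - failed_runs / total_runs


now = datetime(2025, 11, 5, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(hours=23), now))  # inside the 24h window
print(is_fresh(now - timedelta(hours=25), now))  # outside the 24h window
print(budget_remaining(3, 1000))                 # 0.3% failures, budget left
print(budget_remaining(6, 1000))                 # 0.6% failures, budget overspent
```

A negative result from `budget_remaining` would be the condition under which the auto-rollback described above applies.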

FAQ / Guarantees

What happens if a check fails?

The release is blocked. We either hotfix the pipeline or roll back to the last green release and publish a notice in UPDATES.md.

How do I validate a download?

Use the commands above to verify the SHA‑256 and validate JSON against the schema for your pack.
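
To check every file in a pack instead of one archive, the per-file checksums in sha256sum.txt can be compared against the files on disk. The sketch below assumes the manifest follows the GNU `sha256sum` output layout (`<hex digest>  <file name>` per line) and that paths are relative to the manifest's directory. Neither detail is confirmed by this page, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large packs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: expected hex digest}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum prefixes the name with '*' in binary mode; drop it.
        expected[name.lstrip("*").strip()] = digest.lower()
    return expected


def verify_manifest(manifest_path: Path) -> dict[str, bool]:
    """Return {file name: True if present and digest matches, else False}."""
    base = manifest_path.parent
    expected = parse_manifest(manifest_path.read_text(encoding="utf-8"))
    return {
        name: (base / name).is_file() and file_sha256(base / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify_manifest(Path("sha256sum.txt"))` from the unpacked directory returns one entry per listed file. Any `False` value means the file is missing or its contents differ from the published checksum.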

Our Quality Promise

QuantLens applies a "validate early, validate often" approach. We reject 15-20% of raw data during quality checks to ensure only enterprise-grade data reaches you.

Validation Pass Rate: 100%
Validation Stages: 6
Data Rejected: 15-20%
URL Re-testing: Monthly

The 6-Stage Validation Pipeline

Stage 1: Source Verification

What: Verify data comes from authoritative sources

We only accept data from official government agencies, academic institutions, and verified public datasets. No third-party aggregators.

  • Domain validation (sec.gov, noaa.gov, uspto.gov, etc.)
  • SSL certificate verification
  • HTTP accessibility testing (200 OK status)
  • Rate limiting compliance with source APIs
  • Data freshness checks vs expected update cadence
❌ Rejection Criteria:

Source domain mismatch, broken links (404/403), expired SSL certificates, data staleness
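
The domain-validation step can be illustrated without any network access. This sketch accepts a URL only if it uses HTTPS and its host is an allow-listed domain or a subdomain of one. The allow-list holds just the three domains named above as examples (the full list is not published here), and `is_allowed_source` is a hypothetical name. Certificate and HTTP status checks are out of scope for the sketch.

```python
from urllib.parse import urlsplit

# Example allow-list built from the domains named above; assumed, not exhaustive.
ALLOWED_DOMAINS = {"sec.gov", "noaa.gov", "uspto.gov"}


def is_allowed_source(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """True if url is HTTPS and its host is an allowed domain or subdomain."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    # Match on a label boundary so "notsec.gov" does not pass as "sec.gov".
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


for candidate in (
    "https://www.sec.gov/Archives/edgar/data/",
    "https://sec.gov.example.com/filing",
    "http://www.noaa.gov/",
    "https://notsec.gov/",
):
    print(candidate, is_allowed_source(candidate))
```

Matching on the full host with a leading dot is the important design choice: a plain substring test would accept look-alike hosts such as `sec.gov.example.com`.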

Stage 2: Schema Validation

What: Enforce strict data structure and type compliance

Every field is validated against JSON Schema Draft 2020-12 or CSV schemas. Missing fields or type mismatches trigger automatic rejection.

  • Required fields present (no null/missing critical data)
  • Type validation (string, number, boolean, array, object)
  • Nested object structure compliance
  • Array element type consistency
  • Field name standardization (snake_case)
❌ Rejection Criteria:

Missing required fields, type mismatches, invalid enum values, malformed structures
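
Required fields and types are what a JSON Schema validator covers. Field-name standardization is a convention that is easy to check separately. The sketch below walks a nested record and reports every key that is not snake_case. The regular expression is one reasonable reading of "snake_case", and the sample record is invented for illustration.

```python
import re

# Lowercase letters and digits, words joined by single underscores (assumed definition).
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_keys(value, path=""):
    """Yield the dotted path of every object key that is not snake_case."""
    if isinstance(value, dict):
        for key, child in value.items():
            here = f"{path}.{key}" if path else key
            if not SNAKE_CASE.match(key):
                yield here
            yield from non_snake_case_keys(child, here)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from non_snake_case_keys(child, f"{path}[{index}]")


record = {
    "filing_date": "2025-01-31",
    "companyName": "Example Corp",
    "period": {"end_date": "2024-12-31", "FiscalYear": 2024},
    "items": [{"doc_id": 1}, {"DocId": 2}],
}
bad_keys = list(non_snake_case_keys(record))
print(bad_keys)  # ['companyName', 'period.FiscalYear', 'items[1].DocId']
```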

Stage 3: Provenance Tracking

What: Ensure full audit trail to original source

Every record includes a direct link to the original source document. We test these URLs monthly to catch link rot.

  • Source URL links to official domain
  • Metadata captured (filing_date, accession, document_id)
  • URLs tested for accessibility (not just format validation)
  • Cross-reference validation (e.g., accession matches SEC format)
❌ Rejection Criteria:

Missing source URL, broken links, URLs pointing to aggregators instead of primary sources
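
A format-level provenance check might look like the sketch below. It assumes a record stores its link under a `source_url` field (a hypothetical name; the page only names `filing_date`, `accession`, and `document_id`) and checks the accession against the SEC EDGAR layout of 10 digits, 2 digits, and 6 digits separated by hyphens. The sample accession number is made up. Live URL testing is not shown.

```python
import re

# SEC EDGAR accession numbers look like 0001234567-25-000042.
ACCESSION = re.compile(r"[0-9]{10}-[0-9]{2}-[0-9]{6}")
REQUIRED_FIELDS = ("source_url", "filing_date", "accession")  # assumed field names


def provenance_problems(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    accession = record.get("accession")
    if accession and not ACCESSION.fullmatch(accession):
        problems.append("accession does not match NNNNNNNNNN-YY-NNNNNN")
    return problems


good = {
    "source_url": "https://www.sec.gov/Archives/edgar/data/",
    "filing_date": "2025-01-31",
    "accession": "0001234567-25-000042",
}
bad = {"filing_date": "2025-01-31", "accession": "1234567-25-42"}
print(provenance_problems(good))
print(provenance_problems(bad))
```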

Stage 4: Business Rule Validation

What: Apply domain-specific logic and sanity checks

Beyond schema validation, we enforce business rules specific to each data domain (financial, climate, legal, etc.).

  • Date alignment (e.g., filing_date >= period.end_date for SEC)
  • Value ranges (e.g., hurricane wind 0-200 knots, latitude -90 to +90)
  • Unit consistency (revenue in millions, EPS per share)
  • Logical constraints (termination_date > start_date)
  • Cross-field validation (ticker matches company_name)
❌ Rejection Criteria:

Date misalignment, values outside expected ranges, unit mismatches, logical inconsistencies
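
The rules listed above can be expressed as small named predicates that run only when a record carries the fields they need. This is a sketch of that idea, not the production rule engine. The flat field names (`period_end_date`, `wind_knots`, and so on) are illustrative assumptions.

```python
from datetime import date

# (rule name, fields the rule needs, predicate). Field names are illustrative.
RULES = [
    ("filing_date >= period_end_date", ("filing_date", "period_end_date"),
     lambda r: r["filing_date"] >= r["period_end_date"]),
    ("wind_knots within 0-200", ("wind_knots",),
     lambda r: 0 <= r["wind_knots"] <= 200),
    ("latitude within -90..90", ("latitude",),
     lambda r: -90 <= r["latitude"] <= 90),
    ("termination_date > start_date", ("termination_date", "start_date"),
     lambda r: r["termination_date"] > r["start_date"]),
]


def failed_rules(record: dict) -> list[str]:
    """Return the names of the applicable rules that the record violates."""
    failed = []
    for name, fields, check in RULES:
        if any(field not in record for field in fields):
            continue  # rule belongs to another data domain
        if not check(record):
            failed.append(name)
    return failed


filing = {"filing_date": date(2025, 2, 1), "period_end_date": date(2024, 12, 31)}
storm = {"wind_knots": 250, "latitude": 27.5}
contract = {"start_date": date(2024, 1, 1), "termination_date": date(2023, 6, 30)}
print(failed_rules(filing), failed_rules(storm), failed_rules(contract))
```

Keeping each rule's required fields next to its predicate lets one rule table serve financial, climate, and legal records without per-domain branching.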

Stage 5: Format & Encoding

What: Standardize formatting across all datasets

We convert all dates to ISO-8601, ensure UTF-8 encoding, and standardize numeric precision for consistency.

  • ISO-8601 date format (YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ)
  • UTF-8 encoding verification (no mojibake, control characters)
  • Numeric precision (2 decimals for currency, 6 for coordinates)
  • URL encoding (percent-encoded special characters)
  • JSON/CSV syntax validation (parseable, well-formed)
❌ Rejection Criteria:

Non-ISO dates (ambiguous MM/DD/YYYY), encoding errors, invalid syntax, broken URLs
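
Two of these checks are easy to sketch with the standard library: accepting only the two ISO-8601 shapes listed above, and rejecting bytes that are not clean UTF-8. The function names are hypothetical. Detecting mojibake in general is harder than shown here, so this version only flags invalid byte sequences, the U+FFFD replacement character, and control characters other than tab, newline, and carriage return.

```python
import re
import unicodedata
from datetime import datetime

# Only YYYY-MM-DD and YYYY-MM-DDTHH:MM:SSZ, with zero-padded ASCII digits.
ISO_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}(T[0-9]{2}:[0-9]{2}:[0-9]{2}Z)?")
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def is_iso_date(text: str) -> bool:
    """True for a well-formed, real calendar date in one of the two shapes."""
    if not ISO_SHAPE.fullmatch(text):
        return False
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in text else "%Y-%m-%d"
    try:
        datetime.strptime(text, fmt)  # rejects impossible dates such as Feb 30
    except ValueError:
        return False
    return True


def is_clean_utf8(raw: bytes) -> bool:
    """True if raw decodes as UTF-8 with no replacement or control characters."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not any(
        ch == "\ufffd"
        or (unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS)
        for ch in text
    )


print(is_iso_date("2025-11-05"), is_iso_date("11/05/2025"))
print(is_clean_utf8("café".encode("utf-8")), is_clean_utf8(b"caf\xe9"))
```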

Stage 6: Privacy & Compliance

What: Protect privacy and ensure regulatory compliance

An automated PII scanner that combines regex patterns with ML-based detection screens every field so that no personally identifiable information leaks through.

  • Automated PII scanner (regex + ML-based detection)
  • Email addresses removed/redacted
  • SSNs, phone numbers, credit cards blocked
  • GDPR "right to be forgotten" compliance
  • Medical record identifiers redacted (HIPAA)
❌ Rejection Criteria:

PII detected in any field, GDPR non-compliance, medical identifiers without consent
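
The regex half of such a scanner can be sketched as follows. The ML-based detection is not shown. The patterns are deliberately simple illustrations (email, US SSN layout, and 13 to 19 digit card numbers filtered with the Luhn checksum to cut false positives), not the production rule set, and all sample values are fake.

```python
import re

PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"),
    # 13-19 digits, optionally separated by single spaces or hyphens.
    "card": re.compile(r"\b[0-9](?:[ -]?[0-9]){12,18}\b"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for every suspected PII value."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if kind == "card" and not luhn_ok(re.sub(r"[^0-9]", "", value)):
                continue  # digit run that fails the checksum, e.g. an order id
            findings.append((kind, value))
    return findings


sample = (
    "Contact jane.doe@example.com, SSN 123-45-6789, "
    "card 4111 1111 1111 1111, order 1234 5678 9012 3456."
)
findings = find_pii(sample)
print(findings)
```

Under the rejection criteria above, any non-empty result from `find_pii` would be enough to block a record.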

Quality Metrics We Track

Metric | Target | Achieved
Completeness | 100% | 100%
Provenance Tracking | 100% | 100%
Format Compliance | 100% | 100%
Schema Adherence | 100% | 100%
Sanity Checks Pass | 100% | 100%
Data Rejection Rate | 10-15% | 15-20%

Our 15-20% rejection rate is above our own 10-15% target and above the 5-10% listed for typical data vendors in the comparison below. We maintain strict standards to ensure only high-quality data reaches production.

How We Compare

QuantLens vs typical data vendors and open data portals

QuantLens | Typical Data Vendor | Open Data Portal
Automated SSL + domain verification | Manual spot-check | No verification
100% strict schema enforcement | Best-effort validation | No schema enforcement
Every record provenance-tracked | Some records have provenance | Rare provenance tracking
Domain-specific business rules | Basic sanity checks | No business rules
ISO dates, UTF-8, typed fields | Inconsistent formatting | As-is from source
Automated PII scanner | Manual PII review | User responsibility
Monthly URL re-testing | Testing on complaint only | Never re-tested
15-20% rejection rate (strict) | 5-10% rejection rate (lenient) | 0% rejection (all published)

Continuous Quality Monitoring

Our quality process doesn't stop at publication

Provenance URL Re-testing: Monthly
Full Re-validation: Quarterly
3rd Party Audit: Annual
Monthly Spot-checks: 10%
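
The page does not say how the 10% monthly spot-check sample is drawn. One simple option, sketched here as an assumption, is a uniform random sample seeded with the month so the same selection can be reproduced during an audit. `spot_check_sample` and the `"2025-11"` seed format are hypothetical.

```python
import random


def spot_check_sample(record_ids, percent: int = 10, seed=None) -> list:
    """Pick percent% of the ids (rounded up) uniformly at random.

    A fixed seed makes the selection reproducible for later review.
    """
    ids = sorted(record_ids)
    size = -(-len(ids) * percent // 100)  # integer ceiling division
    return random.Random(seed).sample(ids, size)


sample = spot_check_sample(range(1000), seed="2025-11")
print(len(sample))
```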

See the Quality for Yourself

Contact our team to request our free 25-pack sample bundle with complete validation documentation, SHA-256 manifests, and quality reports.