QuantLens
Every data point undergoes rigorous 6-stage validation before publication. Enterprise-grade quality with full transparency.
Every release must clear seven quality gates: schema/contracts, integrity (SHA‑256), PII safety, coverage/freshness, value ranges, duplicates/joins, and text/NLP checks.
| Gate | Rules | Threshold | Result | Last Run (UTC) |
|---|---|---|---|---|
| Schema & Contracts | types, nullability, enums, keys | 100% rules pass | PASS | 2025‑11‑05 00:00 |
| Integrity | SHA‑256 per file; Merkle root | 100% covered | PASS | 2025‑11‑05 00:00 |
| PII & Safety | multi‑pass scanners; deny‑lists | 0 findings | PASS | 2025‑11‑05 00:00 |
| Coverage & Freshness | temporal bounds; recency | ≤ 24h lag | PASS | 2025‑11‑05 00:00 |
| Values & Ranges | ranges, outliers, monotonic | 99.9% valid | PASS | 2025‑11‑05 00:00 |
| Duplicates & Joins | unique keys; referential integrity | 0 dup; 0 orphan | PASS | 2025‑11‑05 00:00 |
| NLP/Text | min token length; language | 100% valid | PASS | 2025‑11‑05 00:00 |
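As an illustration of the integrity gate, the sketch below hashes every file in a pack with SHA-256 and folds the digests into a Merkle root. This is a minimal sketch under assumptions: the directory layout (`pack/`) and the odd-node duplication rule are illustrative, not a specification of how QuantLens builds its tree.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> bytes:
    """Stream a file through SHA-256 and return the raw digest."""
    h = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Pairwise-hash digests upward until a single root remains."""
    if not leaves:
        return hashlib.sha256(b'').digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:  # assumption: duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Hash every file in a pack, sorted so the root is deterministic.
files = sorted(p for p in Path('pack').rglob('*') if p.is_file())
print(merkle_root([sha256_file(p) for p in files]).hex())
```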
Verify any pack yourself:

```sh
# Windows (PowerShell)
certutil -hashfile pack.zip SHA256

# macOS / Linux
shasum -a 256 pack.zip
```

```python
# Python (validate data against its schema with jsonschema)
import json
from jsonschema import validate

with open('schema.json') as s, open('data.json') as d:
    validate(json.load(d), json.load(s))
```
If any gate fails, the release is blocked: we either hotfix the pipeline or roll back to the last green release and publish a notice in UPDATES.md. Use the commands above to verify your pack's SHA‑256 and validate its JSON against the published schema.
QuantLens applies a "validate early, validate often" approach. We reject 15-20% of raw data during quality checks to ensure only enterprise-grade data reaches you.
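In code, "validate early, validate often" means a short-circuiting pipeline: each record passes through the stages in order and is rejected at the first failing check, so bad data never reaches later, more expensive stages. A minimal sketch, with toy lambdas standing in for the real stage checks described below:

```python
from typing import Callable

Check = Callable[[dict], bool]

def run_pipeline(record: dict, stages: list[tuple[str, Check]]) -> tuple[bool, str]:
    """Apply stages in order; reject at the first failure (validate early)."""
    for name, check in stages:
        if not check(record):
            return False, f'rejected at stage: {name}'
    return True, 'accepted'

# Toy checks standing in for the six stages described below.
stages = [
    ('source', lambda r: str(r.get('source_url', '')).startswith('https://')),
    ('schema', lambda r: isinstance(r.get('value'), (int, float))),
    ('sanity', lambda r: r.get('value', -1) >= 0),
]

print(run_pipeline({'value': 3.14, 'source_url': 'https://example.gov/x'}, stages))
```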
Stage 1: Source Verification
What: Verify that data comes from authoritative sources.
We only accept data from official government agencies, academic institutions, and verified public datasets; no third-party aggregators.
Common failures: source domain mismatch, broken links (404/403), expired SSL certificates, data staleness.
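A minimal sketch of this kind of source check using only the standard library; the domain allow-list, timeout values, and function names are illustrative assumptions, not QuantLens internals:

```python
import socket
import ssl
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Illustrative allow-list; the real set of authoritative domains is broader.
ALLOWED_SUFFIXES = ('.gov', '.edu', '.int')

def domain_allowed(url: str) -> bool:
    """Accept only hosts under allow-listed authoritative suffixes."""
    host = urlparse(url).hostname or ''
    return host.endswith(ALLOWED_SUFFIXES)

def link_ok(url: str) -> bool:
    """Reject sources that answer 404/403 or are unreachable."""
    try:
        with urlopen(Request(url, method='HEAD'), timeout=5) as resp:
            return resp.status < 400
    except (HTTPError, URLError):
        return False

def tls_ok(url: str) -> bool:
    """Full-verification TLS handshake; an expired certificate fails it."""
    host = urlparse(url).hostname
    try:
        with socket.create_connection((host, 443), timeout=5) as sock:
            with ssl.create_default_context().wrap_socket(
                    sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        return False
```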
Stage 2: Schema Validation
What: Enforce strict data structure and type compliance.
Every field is validated against JSON Schema Draft 2020-12 or a CSV schema; missing fields or type mismatches trigger automatic rejection.
Common failures: missing required fields, type mismatches, invalid enum values, malformed structures.
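For example, a Draft 2020-12 contract checked with jsonschema might look like the sketch below; the schema itself is invented for illustration, not an actual QuantLens pack schema:

```python
from jsonschema import Draft202012Validator

# Illustrative contract; real pack schemas ship alongside the data.
schema = {
    '$schema': 'https://json-schema.org/draft/2020-12/schema',
    'type': 'object',
    'required': ['series_id', 'date', 'value'],
    'properties': {
        'series_id': {'type': 'string'},
        'date':      {'type': 'string', 'format': 'date'},
        'value':     {'type': 'number'},
        'unit':      {'enum': ['USD', 'EUR', 'index']},
    },
    'additionalProperties': False,
}

validator = Draft202012Validator(schema)
record = {'series_id': 'GDP.US', 'date': '2025-01-01', 'value': 'oops'}
for err in validator.iter_errors(record):  # type mismatch triggers rejection
    print(f'{list(err.path)}: {err.message}')
```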
Stage 3: Provenance Tracking
What: Ensure a full audit trail to the original source.
Every record includes a direct link to the original source document, and we test these URLs monthly to catch link rot.
Common failures: missing source URL, broken links, URLs pointing to aggregators instead of primary sources.
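A sketch of the per-record provenance check; the field name (`source_url`) and the aggregator deny-list are assumptions for the example. The monthly link-rot test can reuse a reachability check like `link_ok` from the Stage 1 sketch.

```python
from urllib.parse import urlparse

# Illustrative deny-list of hosts treated as aggregators, not primary sources.
AGGREGATOR_HOSTS = {'kaggle.com', 'data.world'}

def provenance_ok(record: dict) -> bool:
    """Every record must link directly to its primary source document."""
    url = record.get('source_url')
    if not url:
        return False                                 # missing source URL
    host = (urlparse(url).hostname or '').removeprefix('www.')
    return host not in AGGREGATOR_HOSTS              # no aggregator links

print(provenance_ok({'source_url': 'https://www.kaggle.com/d/x'}))  # False
```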
Stage 4: Sanity Checks
What: Apply domain-specific logic and sanity checks.
Beyond schema validation, we enforce business rules specific to each data domain (financial, climate, legal, etc.).
Common failures: date misalignment, values outside expected ranges, unit mismatches, logical inconsistencies.
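A sketch of what such business rules can look like for a hypothetical daily financial series; the thresholds and accepted units are invented for illustration:

```python
from datetime import date

def sanity_check(record: dict) -> list[str]:
    """Return rule violations; an empty list means the record passes."""
    problems = []
    if date.fromisoformat(record['date']) > date.today():
        problems.append('date in the future')           # date misalignment
    if not 0 <= record['value'] <= 1e12:                # illustrative bounds
        problems.append('value outside expected range')
    if record.get('unit') not in {'USD', 'EUR'}:        # illustrative units
        problems.append('unit mismatch')
    return problems

print(sanity_check({'date': '2025-01-01', 'value': -5.0, 'unit': 'GBP'}))
```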
Stage 5: Normalization
What: Standardize formatting across all datasets.
We convert all dates to ISO-8601, ensure UTF-8 encoding, and standardize numeric precision for consistency.
Common failures: non-ISO dates (ambiguous MM/DD/YYYY), encoding errors, invalid syntax, broken URLs.
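A minimal normalization sketch along those lines; the accepted date layouts and the four-decimal precision are illustrative choices, not QuantLens's actual rules:

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_EVEN

def normalize_date(raw: str) -> str:
    """Coerce unambiguous layouts to ISO-8601; ambiguous MM/DD/YYYY is
    deliberately absent and should be rejected rather than guessed at."""
    for fmt in ('%Y-%m-%d', '%Y/%m/%d', '%d %B %Y'):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f'unrecognized date format: {raw!r}')

def normalize_value(raw: str, places: int = 4) -> str:
    """Standardize numeric precision with banker's rounding."""
    return str(Decimal(raw).quantize(Decimal(10) ** -places,
                                     rounding=ROUND_HALF_EVEN))

def normalize_text(raw: bytes) -> str:
    """Decode as UTF-8; bad bytes raise instead of being silently replaced."""
    return raw.decode('utf-8')

print(normalize_date('2025/11/05'), normalize_value('3.14159'))
```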
Stage 6: PII & Compliance
What: Protect privacy and ensure regulatory compliance.
An automated PII scanner combined with ML-based detection ensures no personally identifiable information leaks through.
Common failures: PII detected in any field, GDPR non-compliance, medical identifiers without consent.
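A sketch of the kind of regex first pass such a scanner might run; the patterns are simplified illustrations, and the ML-based detection layer described above is not shown:

```python
import re

# Simplified first-pass patterns; production scanners are far stricter.
PII_PATTERNS = {
    'email':  re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+'),
    'us_ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'phone':  re.compile(r'\+?\d(?:[\s().-]?\d){8,}'),
}

def scan_pii(record: dict) -> list[str]:
    """Return the PII categories found in the record; empty means pass."""
    text = ' '.join(str(v) for v in record.values())
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

print(scan_pii({'note': 'contact jane.doe@example.com'}))  # ['email']
```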
| Metric | Target | Achieved |
|---|---|---|
| Completeness | 100% | 100% |
| Provenance Tracking | 100% | 100% |
| Format Compliance | 100% | 100% |
| Schema Adherence | 100% | 100% |
| Sanity Checks Pass | 100% | 100% |
| Data Rejection Rate | 10-15% | 15-20% |
Our 15-20% rejection rate is industry-leading. We maintain strict standards to ensure only high-quality data reaches production.
[Comparison table: QuantLens vs. typical data vendors and open data portals]
Our quality process doesn't stop at publication
Contact our team to request our free 25-pack sample bundle with complete validation documentation, SHA-256 manifests, and quality reports.