QuantLens Get Started

How we validate every release

Every flagship family—BioTrials, FedEventBench, FedFlow Signals, NeuralFlow, Patent Master Pro, and Oncology Patent Intelligence—runs through schema, provenance, PII, and drift gates before being published. Ask for a sample slice and we’ll include the matching validation report and SHA-256 manifest.

Flagship families

BioTrials, FedEventBench, FedFlow Signals, NeuralFlow, Patent Master Pro, Oncology Patent Intelligence.

6-stage validation

Ingest → Normalize → Validate → Enrich → Index → Package, with automated schema, coverage, provenance, and drift checks.

What you receive

Sample Parquet slice, validation QA report, SHA-256 manifest, and provenance summary for the family you pick.

Quality Scorecard

The live scorecard tracks:

  • Validation pass rate (current)
  • Records rejected during intake
  • PII findings at release
  • SHA-256 coverage
  • Provenance coverage
  • Freshness SLO
  • Schema version
  • Open drift alerts

Validation Pipeline

Ingest → Normalize → Validate → Enrich → Index → Package

Each stage includes quality gates: schema/contracts, integrity (SHA‑256), PII safety, coverage/freshness, value ranges, duplicates/joins, and text/NLP checks.

Quality Gates

Gate | Rules | Threshold | Result | Last Run (UTC)
Schema & Contracts | types, nullability, enums, keys | 100% rules pass | PASS | 2025-11-05 00:00
Integrity | SHA-256 per file; Merkle root | 100% covered | PASS | 2025-11-05 00:00
PII & Safety | multi-pass scanners; deny-lists | 0 findings | PASS | 2025-11-05 00:00
Coverage & Freshness | temporal bounds; recency | ≤ 24h lag | PASS | 2025-11-05 00:00
Values & Ranges | ranges, outliers, monotonic | 99.9% valid | PASS | 2025-11-05 00:00
Duplicates & Joins | unique keys; referential integrity | 0 dup; 0 orphan | PASS | 2025-11-05 00:00
NLP/Text | min token length; language | 100% valid | PASS | 2025-11-05 00:00
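
To make the Integrity gate concrete, here is a minimal sketch of per-file SHA-256 hashing rolled up into a Merkle root. The file names and the pairwise-concatenation convention are illustrative assumptions, not a specification of our internal tooling.

# Python (illustrative): per-file SHA-256 plus a simple Merkle root
import hashlib

def sha256_file(path):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

def merkle_root(leaf_hashes):
    # Pairwise-hash hex digests upward until one root remains;
    # an odd leaf is carried up unchanged (one common convention).
    level = list(leaf_hashes)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest())
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

files = ['data.parquet', 'schema.json']  # hypothetical pack contents
print(merkle_root(sorted(sha256_file(p) for p in files)))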

Artifacts & Verification

  • Public Catalog: public.catalog.json (metadata and metrics)
  • Schema (sample): schema.json (types, nullability, enums)
  • Expectations (sample): expectations.json (rule set per pack; illustrated below)
  • Hashes (sample): sha256sum.txt (per-file checksums)
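
The exact layout of expectations.json varies by pack. Purely as an illustration, a rule file with per-column min/max bounds could be applied like this (the rule format and field names here are hypothetical):

# Python (illustrative: apply a simple expectations file to tabular data)
import json

# Hypothetical expectations format: one min/max rule per column.
expectations = json.loads('''[
  {"column": "wind_knots", "min": 0, "max": 200},
  {"column": "latitude", "min": -90, "max": 90}
]''')

rows = [{'wind_knots': 115, 'latitude': 24.5}]
for rule in expectations:
    for row in rows:
        value = row[rule['column']]
        assert rule['min'] <= value <= rule['max'], f"{rule['column']} out of range"
print('all expectations satisfied')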

Validate a download locally

# Windows (PowerShell)
certutil -hashfile pack.zip SHA256

# macOS / Linux
shasum -a 256 pack.zip

# Python (verify schema with jsonschema)
from jsonschema import validate
import json

# Load the published schema and the downloaded data, then validate.
with open('schema.json') as s, open('data.json') as d:
    validate(instance=json.load(d), schema=json.load(s))
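
To check every file in a pack at once, you can replay the sha256sum.txt manifest with a short script along these lines (assuming the standard two-column `<hash>  <filename>` sha256sum format):

# Python (verify all files against sha256sum.txt)
import hashlib

with open('sha256sum.txt') as manifest:
    for line in manifest:
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip('*')  # '*' marks binary mode in sha256sum output
        h = hashlib.sha256()
        with open(name, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                h.update(chunk)
        status = 'OK' if h.hexdigest() == expected else 'MISMATCH'
        print(f'{name}: {status}')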

Governance & SLOs

  • Versioning: semantic versioning; schema files are versioned; breaking changes land only on a major release (see the sketch after this list).
  • Release SLOs: build success ≥ 99.9%; freshness SLO ≤ 24h per pack; incident response P1 < 4h.
  • Error budget: monthly validation failure budget ≤ 0.5% with auto‑rollback.
  • Changelog: UPDATES.md with impact, migration paths, and deprecation windows.
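
A consumer-side sketch of that versioning policy: pin the schema major version and fail fast on breaking changes. It assumes the schema version is exposed as a plain `version` field in schema.json, which you should confirm against your pack's actual layout.

# Python (fail fast if the schema's major version changed)
import json

PINNED_MAJOR = 2  # hypothetical: the major version your code was built against

with open('schema.json') as s:
    version = json.load(s)['version']  # assumed field, e.g. "2.3.1"
major = int(version.split('.')[0])
if major != PINNED_MAJOR:
    raise RuntimeError(f'Breaking schema change: expected {PINNED_MAJOR}.x, got {version}')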

FAQ / Guarantees

What happens if a check fails?

The release is blocked. We either hotfix the pipeline or roll back to the last green release and publish a notice in UPDATES.md.

How do I validate a download?

Use the commands above to verify the SHA‑256 and validate JSON against the schema for your pack.

Our Quality Promise

QuantLens applies a "validate early, validate often" approach. We reject 15-20% of raw data during quality checks to ensure only enterprise-grade data reaches you.

  • 100% validation pass rate
  • 6 validation stages
  • 15-20% of raw data rejected
  • Monthly URL re-testing

The 6-Stage Validation Pipeline

Stage 1: Source Verification

What: Verify data comes from authoritative sources

We only accept data from official government agencies, academic institutions, and verified public datasets. No third-party aggregators.

  • Domain validation (sec.gov, noaa.gov, uspto.gov, etc.)
  • SSL certificate verification
  • HTTP accessibility testing (200 OK status)
  • Rate limiting compliance with source APIs
  • Data freshness checks vs expected update cadence
Rejection Criteria:

Source domain mismatch, broken links (404/403), expired SSL certificates, data staleness
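
For illustration, a minimal sketch of the domain and accessibility checks in this stage; the allow-list and the use of the requests library are assumptions, not our production code:

# Python (illustrative source check: official domain + 200 OK over HTTPS)
from urllib.parse import urlparse
import requests

ALLOWED_DOMAINS = ('sec.gov', 'noaa.gov', 'uspto.gov')  # example allow-list

def source_ok(url):
    parsed = urlparse(url)
    # HTTPS only, and the host must be an allowed domain or a subdomain of one.
    if parsed.scheme != 'https':
        return False
    host = parsed.hostname or ''
    if not any(host == d or host.endswith('.' + d) for d in ALLOWED_DOMAINS):
        return False
    resp = requests.head(url, timeout=10, allow_redirects=True)
    return resp.status_code == 200

print(source_ok('https://www.sec.gov/cgi-bin/browse-edgar'))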

Stage 2: Schema Validation

What: Enforce strict data structure and type compliance

Every field is validated against JSON Schema Draft 2020-12 or CSV schemas. Missing fields or type mismatches trigger automatic rejection.

  • Required fields present (no null/missing critical data)
  • Type validation (string, number, boolean, array, object)
  • Nested object structure compliance
  • Array element type consistency
  • Field name standardization (snake_case)
Rejection Criteria:

Missing required fields, type mismatches, invalid enum values, malformed structures
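
You can reproduce the Draft 2020-12 check client-side with the jsonschema package. Collecting every violation before reporting, as below, is one reasonable choice rather than a description of our pipeline's exact behavior:

# Python (Draft 2020-12 validation, reporting every violation)
from jsonschema import Draft202012Validator
import json

with open('schema.json') as s, open('data.json') as d:
    validator = Draft202012Validator(json.load(s))
    errors = list(validator.iter_errors(json.load(d)))

for err in errors:
    print(f"{'/'.join(map(str, err.path))}: {err.message}")
print('PASS' if not errors else f'FAIL ({len(errors)} violations)')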

Stage 3: Provenance Tracking

What: Ensure full audit trail to original source

Every record includes a direct link to the original source document. We test these URLs monthly to catch link rot.

  • Source URL links to official domain
  • Metadata captured (filing_date, accession, document_id)
  • URLs tested for accessibility (not just format validation)
  • Cross-reference validation (e.g., accession matches SEC format)
Rejection Criteria:

Missing source URL, broken links, URLs pointing to aggregators instead of primary sources
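
As a concrete example of cross-reference validation: SEC accession numbers follow a NNNNNNNNNN-NN-NNNNNN pattern, so a record-level provenance check might look like the sketch below (the record field names are assumptions):

# Python (illustrative provenance check for an SEC-derived record)
import re
from urllib.parse import urlparse

SEC_ACCESSION = re.compile(r'^\d{10}-\d{2}-\d{6}$')  # e.g. 0001193125-15-118890

def provenance_ok(record):
    url = record.get('source_url', '')
    host = urlparse(url).hostname or ''
    if not (host == 'sec.gov' or host.endswith('.sec.gov')):
        return False  # must point at the primary source, not an aggregator
    return bool(SEC_ACCESSION.match(record.get('accession', '')))

print(provenance_ok({'source_url': 'https://www.sec.gov/Archives/edgar/',
                     'accession': '0001193125-15-118890'}))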

Stage 4: Business Rule Validation

What: Apply domain-specific logic and sanity checks

Beyond schema validation, we enforce business rules specific to each data domain (financial, climate, legal, etc.).

  • Date alignment (e.g., filing_date >= period.end_date for SEC)
  • Value ranges (e.g., hurricane wind 0-200 knots, latitude -90 to +90)
  • Unit consistency (revenue in millions, EPS per share)
  • Logical constraints (termination_date > start_date)
  • Cross-field validation (ticker matches company_name)
Rejection Criteria:

Date misalignment, values outside expected ranges, unit mismatches, logical inconsistencies
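
The sketch below expresses two of these rules as plain assertions; the field names and the SEC/climate pairing are hypothetical:

# Python (illustrative business-rule checks)
from datetime import date

def check_sec_filing(rec):
    # A filing cannot predate the period it reports on.
    assert rec['filing_date'] >= rec['period_end_date'], 'date misalignment'

def check_hurricane_obs(rec):
    assert 0 <= rec['wind_knots'] <= 200, 'wind speed out of range'
    assert -90 <= rec['latitude'] <= 90, 'latitude out of range'

check_sec_filing({'filing_date': date(2025, 2, 14), 'period_end_date': date(2024, 12, 31)})
check_hurricane_obs({'wind_knots': 115, 'latitude': 24.5})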

Stage 5: Format & Encoding

What: Standardize formatting across all datasets

We convert all dates to ISO-8601, ensure UTF-8 encoding, and standardize numeric precision for consistency.

  • ISO-8601 date format (YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ)
  • UTF-8 encoding verification (no mojibake, control characters)
  • Numeric precision (2 decimals for currency, 6 for coordinates)
  • URL encoding (percent-encoded special characters)
  • JSON/CSV syntax validation (parseable, well-formed)
Rejection Criteria:

Non-ISO dates (ambiguous MM/DD/YYYY), encoding errors, invalid syntax, broken URLs
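
A minimal sketch of these normalizations (ISO-8601 conversion, UTF-8 verification, fixed numeric precision), assuming US-style input dates:

# Python (illustrative format normalization)
from datetime import datetime

def to_iso_date(value, fmt='%m/%d/%Y'):
    # Convert an ambiguous US-style date to unambiguous ISO-8601.
    return datetime.strptime(value, fmt).date().isoformat()

def assert_utf8(raw_bytes):
    raw_bytes.decode('utf-8')  # raises UnicodeDecodeError on mojibake

print(to_iso_date('11/05/2025'))   # -> 2025-11-05
print(round(1234.56789, 2))        # currency: 2 decimals -> 1234.57
print(round(40.7127837, 6))        # coordinates: 6 decimals -> 40.712784
assert_utf8('résumé'.encode('utf-8'))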

Stage 6: Privacy & Compliance

What: Protect privacy and ensure regulatory compliance

Automated PII scanner combined with ML-based detection ensures no personally identifiable information leaks through.

  • Automated PII scanner (regex + ML-based detection)
  • Email addresses removed/redacted
  • SSNs, phone numbers, credit cards blocked
  • GDPR "right to be forgotten" compliance
  • Medical record identifiers redacted (HIPAA)
Rejection Criteria:

PII detected in any field, GDPR non-compliance, medical identifiers without consent
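
The regex half of such a scanner can be approximated as follows; the ML pass is not shown, and these patterns are simplified examples rather than our production rules:

# Python (illustrative regex-based PII scan)
import re

PII_PATTERNS = {
    'email': re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+'),
    'ssn': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'phone': re.compile(r'\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b'),
}

def scan_pii(text):
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

print(scan_pii('Contact: jane.doe@example.com, 555-867-5309'))  # ['email', 'phone']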

Quality Metrics We Track

Metric | Target | Achieved
Completeness | 100% | 100%
Provenance Tracking | 100% | 100%
Format Compliance | 100% | 100%
Schema Adherence | 100% | 100%
Sanity Checks Pass | 100% | 100%
Data Rejection Rate | 10-15% | 15-20%

Our 15-20% rejection rate is industry-leading. We maintain strict standards to ensure only high-quality data reaches production.

How We Compare

QuantLens vs typical data vendors and open data portals

QuantLens | Typical Data Vendor | Open Data Portal
Automated SSL + domain verification | Manual spot-check | No verification
100% strict schema enforcement | Best-effort validation | No schema enforcement
Every record provenance-tracked | Some records have provenance | Rare provenance tracking
Domain-specific business rules | Basic sanity checks | No business rules
ISO dates, UTF-8, typed fields | Inconsistent formatting | As-is from source
Automated PII scanner | Manual PII review | User responsibility
Monthly URL re-testing | Re-testing only on complaint | Never re-tested
15-20% rejection rate (strict) | 5-10% rejection rate (lenient) | 0% rejection (all published)

Continuous Quality Monitoring

Our quality process doesn't stop at publication

  • Monthly: provenance URL re-testing
  • Quarterly: full re-validation
  • Annual: third-party audit
  • Monthly spot-checks: 10%

Request a flagship dataset sample

Choose BioTrials, FedEventBench, FedFlow Signals, NeuralFlow (Teaser/Pro/Enterprise), Patent Master Pro, or Oncology Patent Intelligence and we’ll send a Parquet slice plus its validation report, SHA-256 manifest, and provenance notes.