Documentation

API Reference

Everything you need to integrate DataScreenIQ into your data pipeline. One endpoint, one POST, instant quality verdicts.

Quickstart

Start screening data in under 60 seconds:

terminal
# 1. Sign up (one time)
curl -X POST https://api.datascreeniq.com/v1/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
# → Returns your API key (save it!)

# 2. Screen your data
curl -X POST https://api.datascreeniq.com/v1/screen \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dsiq_live_your_key_here" \
  -d '{
    "source": "orders_api",
    "rows": [
      {"order_id": "ORD-1001", "amount": 149.99, "email": "alice@corp.com"},
      {"order_id": "ORD-1002", "amount": "broken", "email": null}
    ]
  }'
# → Returns PASS / WARN / BLOCK with full quality report

Authentication

All data-plane endpoints require an API key. Pass it in the X-API-Key header or as a Bearer token:

headers
# Option A: X-API-Key header (recommended)
X-API-Key: dsiq_live_your_key_here

# Option B: Bearer token
Authorization: Bearer dsiq_live_your_key_here

API keys are scoped: ingest (screen data), read (view reports), admin (manage keys). Your default key has all three scopes.
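In a client, both options amount to setting one header on every request. A minimal sketch in Python (the key value is a placeholder, and `headers_a` / `headers_b` are illustrative names, not part of the API):

```python
# Placeholder key; substitute your real dsiq_live_... key.
API_KEY = "dsiq_live_your_key_here"

# Option A: X-API-Key header (recommended)
headers_a = {"Content-Type": "application/json", "X-API-Key": API_KEY}

# Option B: Bearer token
headers_b = {"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"}
```

Either dict can be passed as the headers of an HTTP POST to the data-plane endpoints.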

Base URL

All API requests go to:

https://api.datascreeniq.com

POST /v1/screen

The core endpoint. Send a batch of rows, get a quality verdict back.

POST /v1/screen Scope: ingest

Request body

Field | Type | Required | Description
source | string | Yes | Dataset or pipeline name (e.g. "orders_api")
rows | object[] | Yes | Array of data objects to screen (max 100,000)
batch_id | string | No | Your batch identifier (auto-generated if omitted)
options.full_scan | boolean | No | Analyze all rows instead of sampling (default: false)
options.thresholds | object | No | Custom thresholds (see below)
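Assembling a valid request body is mostly a matter of including the two required fields and attaching the optional ones only when set. A sketch of that logic, assuming the field table above (the helper name `build_screen_request` is ours, not part of any SDK):

```python
import json

def build_screen_request(source, rows, batch_id=None, full_scan=False, thresholds=None):
    """Assemble a /v1/screen request body from the documented fields."""
    if len(rows) > 100_000:
        raise ValueError("max 100,000 rows per request")
    body = {"source": source, "rows": rows}
    if batch_id is not None:
        body["batch_id"] = batch_id
    options = {}
    if full_scan:
        options["full_scan"] = True
    if thresholds:
        options["thresholds"] = thresholds
    if options:
        body["options"] = options
    return json.dumps(body)

payload = build_screen_request(
    "orders_api", [{"order_id": "ORD-1001", "amount": 149.99}]
)
```

The serialized `payload` string is what you would send, keeping in mind the 5 MB body limit below.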

Payload limits

Constraint | Limit
Max request body | 5 MB
Max rows per request | 100,000
Default sample size | 1,000 rows

Response format

response.json
{
  "status": "WARN",
  "health_score": 0.74,
  "schema_fingerprint": "a1b2c3d4...",
  "drift": [
    { "field": "amount", "kind": "type_changed", "severity": "block" }
  ],
  "issues": {
    "null_rates": { "email": 0.5 },
    "type_mismatches": ["amount"],
    "empty_string_rates": {},
    "duplicate_fields": [],
    "outlier_fields": [],
    "row_count_anomaly": false,
    "new_enum_values": {}
  },
  "stats": {
    "rows_received": 2,
    "rows_sampled": 2,
    "sample_ratio": 1.0,
    "sample_version": "v2"
  },
  "latency_ms": 4,
  "batch_id": "batch_xxx"
}

Verdicts

Status | Meaning | Recommended action
PASS | Data quality is within acceptable thresholds | Proceed with ingestion
WARN | Quality issues detected but not critical | Ingest with monitoring / alert
BLOCK | Critical quality issues; data should not be ingested | Quarantine and investigate
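A pipeline consuming this API typically branches on the verdict. A minimal dispatcher, assuming the three statuses above (the action names are illustrative placeholders for your own pipeline steps):

```python
# Map each verdict to a recommended pipeline action (action names are ours).
ACTIONS = {
    "PASS": "ingest",
    "WARN": "ingest_with_alert",
    "BLOCK": "quarantine",
}

def dispatch(report: dict) -> str:
    """Choose an action from a parsed /v1/screen response."""
    status = report["status"]
    if status not in ACTIONS:
        raise ValueError(f"unknown verdict: {status}")
    return ACTIONS[status]

action = dispatch({"status": "WARN", "health_score": 0.74})
```

Failing loudly on an unknown status is deliberate: if the API ever adds a verdict, the pipeline should stop rather than silently ingest.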

18 Quality Checks

All checks run in real time at the edge.

01. Schema fingerprint: Deterministic hash of your schema structure. Compared on every request.
02. Field added: Detects new fields not in the previous schema.
03. Field removed: Detects missing fields that were in the previous schema.
04. Type change: Flags fields whose type changed (e.g. number → string).
05. Null rate: Percentage of null/undefined values per column.
06. Null spike: Detects sudden drops in field completeness vs. historical baseline.
07. Empty string rate: Percentage of "" values, often worse than null (they pass type checks).
08. Row count anomaly: Detects unusual batch sizes compared to historical average.
09. Min / max: Tracks numeric and lexicographic bounds per column.
10. Percentiles: p25, p50, p75, p95 computed for numeric columns.
11. Outlier detection: Flags values outside the expected statistical range.
12. Approx distinct count: Fast approximate count of unique values per column.
13. Duplicate rate: Detects repeated values that suggest data duplication.
14. Type stability: Detects columns with inconsistent value types.
15. Enum tracking: Tracks the value set for low-cardinality fields (distinct < 20).
16. New enum values: Flags new values that have never appeared before.
17. Timestamp recency: Auto-detects timestamp columns, flags stale data (>24h warn, >72h block).
18. Distribution profile: Statistical distribution profile for numeric columns.
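Checks 05 and 07 are simple enough to approximate client-side, which is useful for pre-screening before you spend a request. A sketch, assuming the definitions above (the helper name `column_rates` is ours; the service's actual implementation may differ):

```python
def column_rates(rows, field):
    """Approximate checks 05 and 07: null rate and empty-string rate
    for one field across a batch of row dicts."""
    n = len(rows)
    nulls = sum(1 for r in rows if r.get(field) is None)     # null or missing
    empties = sum(1 for r in rows if r.get(field) == "")     # "" passes type checks
    return nulls / n, empties / n

rows = [
    {"email": "alice@corp.com"},
    {"email": None},
    {"email": ""},
    {"email": "bob@corp.com"},
]
null_rate, empty_rate = column_rates(rows, "email")
```

Note that `r.get(field)` treats a missing key the same as an explicit null, matching the "null/undefined" wording of check 05.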

Custom thresholds

Override default thresholds per request via options.thresholds:

Threshold | Default | Description
null_rate_warn | 0.3 | Null rate above this → WARN
null_rate_block | 0.7 | Null rate above this → BLOCK
type_mismatch_warn | 0.05 | Type mismatch rate above this → WARN
type_mismatch_block | 0.2 | Type mismatch rate above this → BLOCK
health_warn | 0.8 | Health score below this → WARN
health_block | 0.5 | Health score below this → BLOCK
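Overrides are partial: any threshold you omit keeps its default. A sketch of a stricter-than-default request body (the specific values here are illustrative, not recommendations):

```python
# Tighter null-rate and health thresholds for a strict pipeline.
# Omitted thresholds (e.g. type_mismatch_*) keep their documented defaults.
strict_thresholds = {
    "null_rate_warn": 0.1,
    "null_rate_block": 0.3,
    "health_warn": 0.9,
}

request_body = {
    "source": "orders_api",
    "rows": [{"order_id": "ORD-1001", "amount": 149.99}],
    "options": {"thresholds": strict_thresholds},
}
```
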

Sampling

By default, DataScreenIQ samples 1,000 rows from your payload using deterministic hash-based selection. This means the same dataset always produces the same sample. The sample_version and sample_ratio are returned in every response.

To analyze all rows, pass "full_scan": true in the options. For payloads under 1,000 rows, all rows are always analyzed regardless of this flag.
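The key property is that sampling is a pure function of the payload. One way to get that property is to rank rows by a stable content hash and keep the smallest k; this is only a sketch of the idea, as the service's actual algorithm is not published:

```python
import hashlib
import json

def deterministic_sample(rows, k=1000):
    """Hash-based sampling sketch: rank rows by a stable hash of their
    serialized content and keep the k smallest, so the same payload
    always yields the same sample."""
    if len(rows) <= k:
        return list(rows)  # under the sample size, all rows are analyzed
    def rank(row):
        blob = json.dumps(row, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()
    return sorted(rows, key=rank)[:k]

rows = [{"id": i} for i in range(10)]
sample = deterministic_sample(rows, k=3)
```

Rerunning `deterministic_sample` on the same rows returns the same three rows every time, which is the reproducibility guarantee the docs describe.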

Job history

GET /v1/jobs Scope: read

List recent screening jobs. Query params: limit (max 100), offset, source.

GET /v1/jobs/:id Scope: read

Get the full quality report for a specific job.

GET /v1/stats Scope: read

Aggregate statistics for your account. Query param: days (default 30, max 365).

API keys

POST /v1/keys Scope: admin

Create a new API key. Body: {"name": "prod-ingest", "scopes": ["ingest", "read"]}

GET /v1/keys Scope: admin

List all API keys for your account.

DELETE /v1/keys/:id Scope: admin

Revoke an API key immediately.

Billing

GET /v1/billing Scope: read

Returns current billing period counters: request count, rows processed, period start/end.

Signup

POST /v1/auth/signup Public

Create a new account. Body: {"email": "you@company.com"}. Returns your API key (shown once) and emails it to you as a backup.

Login (OTP)

POST /v1/auth/login Public

Send a 6-digit OTP to your email. Body: {"email": "you@company.com"}

POST /v1/auth/verify Public

Verify OTP and get a session token (7-day expiry). Body: {"email": "...", "code": "123456"}. Use the session token as Bearer token for billing endpoints.

Rate limits

Endpoint | Limit
POST /v1/screen | 100 requests/min per IP
POST /v1/auth/signup | 5 per hour per IP
POST /v1/auth/login | 10 per 15 min per IP
POST /v1/auth/verify | 10 per 15 min per IP
All other endpoints | 100 requests/min per IP
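Clients that hit these limits should back off rather than retry immediately. A sketch of a deterministic exponential backoff schedule for HTTP 429 responses (the helper name and defaults are ours; production clients usually also add jitter and honor any Retry-After header):

```python
def backoff_delays(attempts=5, base=1.0, cap=60.0):
    """Exponential backoff schedule, in seconds, for retrying after a 429.
    Doubles the wait on each attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

delays = backoff_delays()
```

With the defaults, a client retries after 1, 2, 4, 8, and 16 seconds before giving up.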

Need higher limits? Growth and Scale plans include priority rate limits. Contact app@datascreeniq.com for enterprise requirements.