Documentation

API Reference

Everything you need to integrate DataScreenIQ into your data pipeline. One endpoint, one POST, instant quality verdicts.

Quickstart

Start screening data in under 60 seconds:

terminal
# 1. Sign up (one time)
curl -X POST https://api.datascreeniq.com/v1/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
# → Returns your API key (save it!)

# 2. Screen your data
curl -X POST https://api.datascreeniq.com/v1/screen \
  -H "Content-Type: application/json" \
  -H "X-API-Key: dsiq_live_your_key_here" \
  -d '{
    "source": "orders_api",
    "rows": [
      {"order_id": "ORD-1001", "amount": 149.99, "email": "alice@corp.com"},
      {"order_id": "ORD-1002", "amount": "broken", "email": null}
    ]
  }'
# → Returns PASS / WARN / BLOCK with full quality report

Authentication

All data-plane endpoints require an API key. Pass it in the X-API-Key header or as a Bearer token:

headers
# Option A: X-API-Key header (recommended)
X-API-Key: dsiq_live_your_key_here

# Option B: Bearer token
Authorization: Bearer dsiq_live_your_key_here

API keys are scoped: ingest (screen data), read (view reports), admin (manage keys). Your default key has all three scopes.
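In a client, both options amount to setting one header on every request. A minimal sketch in Python (the key value is a placeholder, and `headers_a` / `headers_b` are illustrative names, not part of the API):

```python
# Placeholder key; substitute your real dsiq_live_... key.
API_KEY = "dsiq_live_your_key_here"

# Option A: X-API-Key header (recommended)
headers_a = {"Content-Type": "application/json", "X-API-Key": API_KEY}

# Option B: Bearer token
headers_b = {"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"}
```

Either dict can be passed as the headers of an HTTP POST to the data-plane endpoints.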

Base URL

All API requests go to:

https://api.datascreeniq.com

POST /v1/screen

The core endpoint. Send a batch of rows, get a quality verdict back.

POST /v1/screen Scope: ingest

Request body

Field | Type | Required | Description
source | string | Yes | Dataset or pipeline name (e.g. "orders_api")
rows | object[] | Yes | Array of data objects to screen (max 100,000)
batch_id | string | No | Your batch identifier (auto-generated if omitted)
options.full_scan | boolean | No | Analyze all rows instead of sampling (default: false)
options.thresholds | object | No | Custom thresholds (see below)
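Assembling a valid request body is mostly a matter of including the two required fields and attaching the optional ones only when set. A sketch of that logic, assuming the field table above (the helper name `build_screen_request` is ours, not part of any SDK):

```python
import json

def build_screen_request(source, rows, batch_id=None, full_scan=False, thresholds=None):
    """Assemble a /v1/screen request body from the documented fields."""
    if len(rows) > 100_000:
        raise ValueError("max 100,000 rows per request")
    body = {"source": source, "rows": rows}
    if batch_id is not None:
        body["batch_id"] = batch_id
    options = {}
    if full_scan:
        options["full_scan"] = True
    if thresholds:
        options["thresholds"] = thresholds
    if options:
        body["options"] = options
    return json.dumps(body)

payload = build_screen_request(
    "orders_api", [{"order_id": "ORD-1001", "amount": 149.99}]
)
```

The serialized `payload` string is what you would send, keeping in mind the 5 MB body limit below.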

Payload limits

Constraint | Limit
Max request body | 5 MB
Max rows per request | 100,000
Default sample size | 1,000 rows

Response format

response.json
{
  "status": "WARN",
  "health_score": 0.74,
  "schema_fingerprint": "a1b2c3d4...",
  "drift": [
    { "field": "amount", "kind": "type_changed", "severity": "block" }
  ],
  "issues": {
    "null_rates": { "email": 0.5 },
    "type_mismatches": ["amount"],
    "empty_string_rates": {},
    "duplicate_fields": [],
    "outlier_fields": [],
    "row_count_anomaly": false,
    "new_enum_values": {}
  },
  "stats": {
    "rows_received": 2,
    "rows_sampled": 2,
    "sample_ratio": 1.0,
    "sample_version": "v2"
  },
  "latency_ms": 4,
  "batch_id": "batch_xxx"
}

Verdicts

Status | Meaning | Recommended action
PASS | Data quality is within acceptable thresholds | Proceed with ingestion
WARN | Quality issues detected but not critical | Ingest with monitoring / alert
BLOCK | Critical quality issues; data should not be ingested | Quarantine and investigate
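A pipeline consuming this API typically branches on the verdict. A minimal dispatcher, assuming the three statuses above (the action names are illustrative placeholders for your own pipeline steps):

```python
# Map each verdict to a recommended pipeline action (action names are ours).
ACTIONS = {
    "PASS": "ingest",
    "WARN": "ingest_with_alert",
    "BLOCK": "quarantine",
}

def dispatch(report: dict) -> str:
    """Choose an action from a parsed /v1/screen response."""
    status = report["status"]
    if status not in ACTIONS:
        raise ValueError(f"unknown verdict: {status}")
    return ACTIONS[status]

action = dispatch({"status": "WARN", "health_score": 0.74})
```

Failing loudly on an unknown status is deliberate: if the API ever adds a verdict, the pipeline should stop rather than silently ingest.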

18 Quality Checks

All checks run in real time at the edge.

01. Schema fingerprint: Deterministic hash of your schema structure. Compared on every request.
02. Field added: Detects new fields not in the previous schema.
03. Field removed: Detects missing fields that were in the previous schema.
04. Type change: Flags fields whose type changed (e.g. number → string).
05. Null rate: Percentage of null/undefined values per column.
06. Null spike: Detects sudden drops in field completeness vs. historical baseline.
07. Empty string rate: Percentage of "" values, often worse than null (they pass type checks).
08. Row count anomaly: Detects unusual batch sizes compared to historical average.
09. Min / max: Tracks numeric and lexicographic bounds per column.
10. Percentiles: p25, p50, p75, p95 computed for numeric columns.
11. Outlier detection: Flags values outside the expected statistical range.
12. Approx distinct count: Fast approximate count of unique values per column.
13. Duplicate rate: Detects repeated values that suggest data duplication.
14. Type stability: Detects columns with inconsistent value types.
15. Enum tracking: Tracks the value set for low-cardinality fields (distinct < 20).
16. New enum values: Flags new values that have never appeared before.
17. Timestamp recency: Auto-detects timestamp columns, flags stale data (>24h warn, >72h block).
18. Distribution profile: Statistical distribution profile for numeric columns.
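Checks 05 and 07 are simple enough to approximate client-side, which is useful for pre-screening before you spend a request. A sketch, assuming the definitions above (the helper name `column_rates` is ours; the service's actual implementation may differ):

```python
def column_rates(rows, field):
    """Approximate checks 05 and 07: null rate and empty-string rate
    for one field across a batch of row dicts."""
    n = len(rows)
    nulls = sum(1 for r in rows if r.get(field) is None)     # null or missing
    empties = sum(1 for r in rows if r.get(field) == "")     # "" passes type checks
    return nulls / n, empties / n

rows = [
    {"email": "alice@corp.com"},
    {"email": None},
    {"email": ""},
    {"email": "bob@corp.com"},
]
null_rate, empty_rate = column_rates(rows, "email")
```

Note that `r.get(field)` treats a missing key the same as an explicit null, matching the "null/undefined" wording of check 05.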

Custom thresholds

Override default thresholds per request via options.thresholds:

Threshold | Default | Description
null_rate_warn | 0.3 | Null rate above this → WARN
null_rate_block | 0.7 | Null rate above this → BLOCK
type_mismatch_warn | 0.05 | Type mismatch rate above this → WARN
type_mismatch_block | 0.2 | Type mismatch rate above this → BLOCK
health_warn | 0.8 | Health score below this → WARN
health_block | 0.5 | Health score below this → BLOCK
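Overrides are partial: any threshold you omit keeps its default. A sketch of a stricter-than-default request body (the specific values here are illustrative, not recommendations):

```python
# Tighter null-rate and health thresholds for a strict pipeline.
# Omitted thresholds (e.g. type_mismatch_*) keep their documented defaults.
strict_thresholds = {
    "null_rate_warn": 0.1,
    "null_rate_block": 0.3,
    "health_warn": 0.9,
}

request_body = {
    "source": "orders_api",
    "rows": [{"order_id": "ORD-1001", "amount": 149.99}],
    "options": {"thresholds": strict_thresholds},
}
```
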

Sampling

By default, DataScreenIQ samples 1,000 rows from your payload using deterministic hash-based selection. This means the same dataset always produces the same sample. The sample_version and sample_ratio are returned in every response.

To analyze all rows, pass "full_scan": true in the options. For payloads under 1,000 rows, all rows are always analyzed regardless of this flag.
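The key property is that sampling is a pure function of the payload. One way to get that property is to rank rows by a stable content hash and keep the smallest k; this is only a sketch of the idea, as the service's actual algorithm is not published:

```python
import hashlib
import json

def deterministic_sample(rows, k=1000):
    """Hash-based sampling sketch: rank rows by a stable hash of their
    serialized content and keep the k smallest, so the same payload
    always yields the same sample."""
    if len(rows) <= k:
        return list(rows)  # under the sample size, all rows are analyzed
    def rank(row):
        blob = json.dumps(row, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()
    return sorted(rows, key=rank)[:k]

rows = [{"id": i} for i in range(10)]
sample = deterministic_sample(rows, k=3)
```

Rerunning `deterministic_sample` on the same rows returns the same three rows every time, which is the reproducibility guarantee the docs describe.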

Job history

GET /v1/jobs Scope: read

List recent screening jobs. Query params: limit (max 100), offset, source.

GET /v1/jobs/:id Scope: read

Get the full quality report for a specific job.

GET /v1/stats Scope: read

Aggregate statistics for your account. Query param: days (default 30, max 365).

API keys

POST /v1/keys Scope: admin

Create a new API key. Body: {"name": "prod-ingest", "scopes": ["ingest", "read"]}

GET /v1/keys Scope: admin

List all API keys for your account.

DELETE /v1/keys/:id Scope: admin

Revoke an API key immediately.

Billing

GET /v1/billing Scope: read

Returns current billing period counters: request count, rows processed, period start/end.

Signup

POST /v1/auth/signup Public

Create a new account. Body: {"email": "you@company.com"}. Returns your API key (shown once) and emails it to you as a backup.

Login (OTP)

POST /v1/auth/login Public

Send a 6-digit OTP to your email. Body: {"email": "you@company.com"}

POST /v1/auth/verify Public

Verify OTP and get a session token (7-day expiry). Body: {"email": "...", "code": "123456"}. Use the session token as Bearer token for billing endpoints.

Rate limits

Endpoint | Limit
POST /v1/screen | 100 requests/min per IP
POST /v1/auth/signup | 5 per hour per IP
POST /v1/auth/login | 10 per 15 min per IP
POST /v1/auth/verify | 10 per 15 min per IP
All other endpoints | 100 requests/min per IP
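Clients that hit these limits should back off rather than retry immediately. A sketch of a deterministic exponential backoff schedule for HTTP 429 responses (the helper name and defaults are ours; production clients usually also add jitter and honor any Retry-After header):

```python
def backoff_delays(attempts=5, base=1.0, cap=60.0):
    """Exponential backoff schedule, in seconds, for retrying after a 429.
    Doubles the wait on each attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]

delays = backoff_delays()
```

With the defaults, a client retries after 1, 2, 4, 8, and 16 seconds before giving up.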

Need higher limits? Growth and Scale plans include priority rate limits. Contact app@datascreeniq.com for enterprise requirements.