maskify — Deterministic Redaction of High-Risk Identifiers

What This API Does

Detects and redacts a narrow set of high-risk technical identifiers using explicit pattern matching and validation rules (e.g. checksums for credit cards)
Rulesets are versioned and immutable to ensure reproducible output
Single request → response HTTPS API
No payload storage or data retention

Why Deterministic (Not ML)

For data-sanitization boundaries, predictability and auditability matter more than probabilistic recall.

Machine-learning-based detection introduces non-determinism, model drift, and ambiguous guarantees. The same input may not always produce the same output over time.

maskify uses fixed, explicit rules, so the same input and ruleset version always produce the same result. This is a requirement for testing, auditing, and long-term maintenance.

Input Shape

The API accepts arbitrary JSON objects and arrays. The structure is traversed recursively and every string value, at any nesting depth, is sanitized. Non-string values (numbers, booleans, null) are passed through unchanged.

Object keys (field names) are never modified.
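
The traversal described above can be sketched in a few lines of Python. This is an illustration only: sanitize_payload and redact_string are hypothetical names, and the stand-in redactor uses a simplified email pattern rather than maskify's actual rules.

```python
import re
from typing import Any, Callable

# Simplified stand-in pattern for illustration only; not maskify's actual rule.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_string(text: str) -> str:
    """Hypothetical per-string redactor: replaces email-like substrings."""
    return _EMAIL.sub("[EMAIL]", text)


def sanitize_payload(value: Any, redact: Callable[[str], str] = redact_string) -> Any:
    """Return a copy of a JSON-like value with every string value redacted.

    Dict keys are copied untouched; numbers, booleans, and None pass through.
    """
    if isinstance(value, str):
        return redact(value)
    if isinstance(value, list):
        return [sanitize_payload(item, redact) for item in value]
    if isinstance(value, dict):
        return {key: sanitize_payload(item, redact) for key, item in value.items()}
    return value  # int, float, bool, None


payload = {
    "a@b.io": "contact a@b.io",  # the key looks like an email but is left alone
    "items": ["x@y.org", 3, None, {"debug": False}],
}
result = sanitize_payload(payload)
```

Here the key "a@b.io" survives unchanged while the string values under it and inside the nested list are redacted, which mirrors the rule that only values are sanitized.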

Sanitized Output

The response includes a sanitized version of the input payload, preserving the original JSON structure. All string values are returned with identifiers redacted in place.

For simple inputs containing a single top-level string field, a redacted_text field is also returned as a convenience.

Supported Identifier Types (Ruleset-Pinned)

Email addresses
Phone numbers (common international formats)
Social Security numbers (US)
Credit card numbers (format + checksum validated)
IP addresses (IPv4 and IPv6)

Detection is strictly limited to the identifier types and patterns defined in the active ruleset version, which is the authoritative definition of coverage.
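
The checksum conventionally used for payment card numbers is the Luhn algorithm. The sketch below assumes that this is what "checksum validated" refers to. The function name luhn_valid and the 13 to 19 digit length bound are illustrative choices, not maskify's documented rules.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Spaces and hyphens are ignored; any other non-digit rejects the input.
    """
    digits = [c for c in candidate if c not in " -"]
    if not all(c in "0123456789" for c in digits):
        return False
    if not 13 <= len(digits) <= 19:  # assumed length bound for card numbers
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(digits)):
        n = int(char)
        if position % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9  # same as summing the two digits of the product
        total += n
    return total % 10 == 0
```

Combining a format match with a checksum like this is what keeps an arbitrary 16-digit string, such as an order number, from being redacted as a card number unless it also happens to pass the check.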

Explicitly Out of Scope

Personal names
Physical or mailing addresses
Dates of birth
Usernames, user IDs, or account identifiers
Ticket numbers or internal references
Contextual, inferred, or free-text personal data

If an identifier type is not listed under “Supported Identifier Types”, it is intentionally not detected.

Who This Is For

AI pipelines that must exclude commonly restricted technical identifiers
Logging, support, or analytics flows with PII risk
Teams that need reproducible, testable sanitization behavior

Who This Is Not For

Full PII discovery or compliance automation
Name, address, or entity extraction
Probabilistic or “best-effort” redaction

Why Not Just Build This Yourself?

Many teams start with ad-hoc regular expressions embedded in application code or logging pipelines.

Over time, these implementations accumulate edge cases, inconsistent behavior, and undocumented changes that are difficult to test or audit.

maskify centralizes this logic into a single, versioned sanitization boundary with explicit guarantees and stable behavior, so you do not have to maintain or reason about this code indefinitely.

How It Works

Receive a JSON request payload
Recursively scan all string values using a fixed ruleset version
Redact matched identifiers in place
Return sanitized output with detection metadata

The same input and ruleset version will always produce the same output.
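
As a rough sketch of the scan-and-redact steps for a single string value, the Python below applies a fixed list of patterns, replaces each match with a placeholder, and records offsets into the original string. The RULES list, the function name, and the overlap policy are assumptions made for illustration; the patterns are deliberately simplified and are not the service's ruleset.

```python
import re
from typing import Any, Dict, List, Tuple

# Illustrative patterns only; a real ruleset would be stricter and versioned.
RULES = [
    ("email", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ip_address", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def redact_with_metadata(text: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Redact one string and report where each identifier was found.

    Offsets index the original string: start inclusive, end exclusive.
    """
    matches = []
    for category, pattern in RULES:
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), category))
    # Sort so the result never depends on rule order; longer match wins a tie.
    matches.sort(key=lambda item: (item[0], -item[1], item[2]))

    pieces: List[str] = []
    findings: List[Dict[str, Any]] = []
    cursor = 0
    for start, end, category in matches:
        if start < cursor:
            continue  # overlaps an identifier that was already redacted
        pieces.append(text[cursor:start])
        pieces.append("[" + category.upper() + "]")
        findings.append({"category": category, "start_offset": start, "end_offset": end})
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), findings


redacted, found = redact_with_metadata("Mail jane.doe@example.com from 203.0.113.42.")
```

Because the rules are a fixed list and matches are sorted before replacement, running this twice on the same string yields the same redacted text and the same findings, which is the property the section describes.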

API Example

Request:

POST /v1/redact
Content-Type: application/json
X-API-Key: YOUR_API_KEY

{
  "event_type": "llm_prompt",
  "timestamp": "2026-01-08T18:42:11Z",
  "conversation": [
    {
      "role": "user",
      "content": "Hi, my email is jane.doe@example.com and my card 4242 4242 4242 4242 was charged twice."
    },
    {
      "role": "assistant",
      "content": "Can you confirm the phone number on your account?"
    },
    {
      "role": "user",
      "content": "Yes, it's +1 415 555 2671. I'm connecting from 203.0.113.42."
    }
  ],
  "context": {
    "session_id": "abc-92118",
    "retry_count": 0,
    "debug": false
  }
}

Response:

{
  "request_id": "dfe22134-0ead-4ea6-9cd7-66c635664efc",
  "ruleset_version": "pii-detect-v1.0.0",
  "status": "ok",
  "coverage": {
    "guaranteed": ["email", "phone", "credit_card", "ssn_us", "ip_address"],
    "excluded": ["names", "addresses", "dob", "user_ids", "free_text_identifiers"]
  },
  "pii_found": [
    { "category": "email", "start_offset": 16, "end_offset": 36 },
    { "category": "credit_card", "start_offset": 49, "end_offset": 68 },
    { "category": "phone", "start_offset": 10, "end_offset": 25 },
    { "category": "ip_address", "start_offset": 47, "end_offset": 59 }
  ],
  "sanitized_payload": {
    "event_type": "llm_prompt",
    "timestamp": "2026-01-08T18:42:11Z",
    "conversation": [
      {
        "role": "user",
        "content": "Hi, my email is [EMAIL] and my card [CREDIT_CARD] was charged twice."
      },
      {
        "role": "assistant",
        "content": "Can you confirm the phone number on your account?"
      },
      {
        "role": "user",
        "content": "Yes, it's [PHONE]. I'm connecting from [IP_ADDRESS]."
      }
    ],
    "context": {
      "session_id": "abc-92118",
      "retry_count": 0,
      "debug": false
    }
  }
}

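Offsets are relative to the individual string value in which an identifier is detected, not to the request body as a whole, and index the original (unredacted) string. In the example, the email entry spans 16 to 36 for a 20-character address, so start_offset is zero-based and end_offset is exclusive. Offsets should be treated as advisory metadata.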
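
A minimal client sketch for the request shown above, using only the Python standard library. The path /v1/redact and the X-API-Key header come from the example; the base URL is a placeholder because this document does not state the real host, and the function names and timeout are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

# Placeholder host: the real base URL is not given in this document.
BASE_URL = "https://maskify.example.invalid"


def build_redact_request(
    payload: Any, api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/redact request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/v1/redact",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def redact(payload: Any, api_key: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_redact_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the client testable without network access.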

Design Guarantees

Deterministic outputs
Ruleset-pinned behavior
No machine learning or model drift
No payload storage or secondary use
Fails closed on malformed or unsupported input
On error, no partially redacted data is ever returned
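
On the caller's side, failing closed means never falling back to the original payload when redaction does not succeed. The sketch below assumes that a successful response carries "status": "ok" and a "sanitized_payload" field, as in the example response; the shape of error responses is not documented here, so anything else is treated as a failure. RedactionError and extract_sanitized are hypothetical names.

```python
from typing import Any, Dict


class RedactionError(Exception):
    """Raised when a response cannot be trusted as fully sanitized."""


def extract_sanitized(response: Dict[str, Any]) -> Any:
    """Return the sanitized payload, or raise instead of passing data through."""
    status = response.get("status")
    if status != "ok" or "sanitized_payload" not in response:
        raise RedactionError(f"redaction failed: status={status!r}")
    return response["sanitized_payload"]
```

A caller that logs or forwards data would catch RedactionError and drop or quarantine the record rather than emit the unredacted original.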

Ruleset Versions

Detection behavior is defined by fixed, immutable rulesets. Rulesets are never modified in place.

The active ruleset version is always returned in the API response for auditing and reproducibility.

Current ruleset: pii-detect-v1.0.0
Breaking changes are introduced only via new major ruleset versions.
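
Clients that depend on reproducible output may want to check the ruleset_version returned with each response. The sketch below assumes version strings follow the name-vMAJOR.MINOR.PATCH shape of the single example given (pii-detect-v1.0.0); the parsing rule and function names are inferred for illustration, not documented.

```python
import re
from typing import Tuple

# Assumed shape, inferred from "pii-detect-v1.0.0".
_VERSION = re.compile(
    r"^(?P<name>[a-z0-9-]+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)


def parse_ruleset_version(value: str) -> Tuple[str, int, int, int]:
    """Split a ruleset version string into (name, major, minor, patch)."""
    match = _VERSION.match(value)
    if match is None:
        raise ValueError(f"unrecognized ruleset version: {value!r}")
    return (
        match["name"],
        int(match["major"]),
        int(match["minor"]),
        int(match["patch"]),
    )


def same_major(expected: str, actual: str) -> bool:
    """True when both versions share a ruleset name and major version."""
    exp_name, exp_major, _, _ = parse_ruleset_version(expected)
    act_name, act_major, _, _ = parse_ruleset_version(actual)
    return (exp_name, exp_major) == (act_name, act_major)
```

For golden-output tests, comparing the full version string exactly is the stricter choice, since any new ruleset version may change what is detected; a major-version check only guards against breaking changes.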

Limits

Maximum payload size: 100 KB
Payloads under 10 KB typically complete in under 50 ms (95th percentile)
Rate limits apply per API key
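
A client can check the size limit before sending. The sketch below measures the JSON body as this client would serialize it. Whether "100 KB" means 100,000 or 102,400 bytes is not stated, so the stricter value is assumed; the constant and function names are hypothetical.

```python
import json
from typing import Any

# Assumption: the stricter reading of "100 KB" (100,000 bytes, not 102,400).
MAX_PAYLOAD_BYTES = 100_000


def encoded_size(payload: Any) -> int:
    """Size in bytes of the compact UTF-8 JSON encoding of `payload`."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_limit(payload: Any, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """True when the encoded payload is within the assumed size limit."""
    return encoded_size(payload) <= limit
```

The measurement is only meaningful if the request body is serialized the same way, so a real client would encode once and reuse those bytes for both the check and the request.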

Data Handling & Security Posture

Request bodies are processed entirely in-memory
No payloads or derived data are written to disk
Only minimal operational metadata is logged
No training, inspection, or secondary use of customer data
All traffic served over TLS

Join the Private Preview

maskify is intended for developers and small teams integrating deterministic identifier redaction into production systems.

Designed for low-to-moderate volume production workloads
Not intended for demos, experiments, or bulk data processing
Pricing expected in the low double-digit USD/month range

To request preview access, email: api@maskify.dev

Access is granted selectively while the API surface and onboarding are finalized.