Deterministic Redaction of High-Risk Identifiers

maskify is a small, auditable API that removes emails, phone numbers, IP addresses, SSNs, and credit card numbers from arbitrary JSON payloads using deterministic rules.

What This API Does

  • Detects and redacts a narrow set of high-risk technical identifiers using explicit pattern matching and validation rules (e.g. checksums for credit cards)
  • Rulesets are versioned and immutable to ensure reproducible output
  • Synchronous HTTPS API: one request, one response
  • No payload storage or data retention

Why Deterministic (Not ML)

For data-sanitization boundaries, predictability and auditability matter more than probabilistic recall.

Machine-learning-based detection introduces non-determinism, model drift, and ambiguous guarantees. The same input may not always produce the same output over time.

maskify uses fixed, explicit rules so that the same input and ruleset version will always produce the same result. This is a requirement for testing, auditing, and long-term maintenance.

Input Shape

The API accepts arbitrary JSON objects and arrays. All string values are recursively traversed and sanitized. Non-string values are passed through unchanged.

Object keys (field names) are never modified.
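The traversal rule fits in a few lines of Python. This is a minimal illustration, not maskify's implementation. `sanitize_json` and the `redact` callback are hypothetical names, and the sketch assumes the payload has already been decoded with `json.loads`.

```python
from typing import Any, Callable


def sanitize_json(value: Any, redact: Callable[[str], str]) -> Any:
    """Hypothetical helper: apply `redact` to every string value, recursively."""
    if isinstance(value, str):
        # Only string values are candidates for redaction.
        return redact(value)
    if isinstance(value, list):
        # Arrays keep their length and element order.
        return [sanitize_json(item, redact) for item in value]
    if isinstance(value, dict):
        # Keys are copied verbatim; only the values are visited.
        return {key: sanitize_json(item, redact) for key, item in value.items()}
    # Numbers, booleans and None (JSON null) pass through unchanged.
    return value
```

The function builds new containers rather than editing the input, so the caller's original object stays untouched.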

Sanitized Output

The response includes a sanitized copy of the input payload in the sanitized_payload field, preserving the original JSON structure. Within each string value, detected identifiers are replaced in place by a category placeholder such as [EMAIL] or [CREDIT_CARD].

For simple inputs containing a single top-level string field, a redacted_text field is also returned as a convenience.

Supported Identifier Types (Ruleset-Pinned)

  • Email addresses
  • Phone numbers (common international formats)
  • Social Security numbers (US)
  • Credit card numbers (format + checksum validated)
  • IP addresses (IPv4 and IPv6)

Detection is strictly limited to the identifier types and patterns defined by the active ruleset version, which is the authoritative definition of coverage.
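For credit card numbers, the standard checksum is the Luhn (mod 10) algorithm. The sketch below shows how such a check rejects digit strings that merely look like card numbers. It is an illustration, not the pii-detect-v1.0.0 rule. The function name `luhn_valid`, the accepted separators, and the 13 to 19 digit window are all assumptions.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if `candidate` passes the Luhn (mod 10) checksum.

    Assumptions for this sketch: spaces and hyphens are the only separators,
    and a card number has between 13 and 19 digits.
    """
    compact = candidate.replace(" ", "").replace("-", "")
    # Reject anything that is not plain ASCII digits once separators are gone.
    if not (compact.isascii() and compact.isdigit()):
        return False
    if not 13 <= len(compact) <= 19:
        return False
    total = 0
    # Starting from the rightmost (check) digit, double every second digit.
    for position, char in enumerate(reversed(compact)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

For example, the card number from the API example, 4242 4242 4242 4242, passes. Changing its last digit to 1 makes it fail.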

Explicitly Out of Scope

  • Personal names
  • Physical or mailing addresses
  • Dates of birth
  • Usernames, user IDs, or account identifiers
  • Ticket numbers or internal references
  • Contextual, inferred, or free-text personal data

If an identifier type is not listed under “Supported Identifier Types”, it is intentionally not detected.
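Coverage is fixed per ruleset, so a client can check it before relying on a result. The sketch below assumes that every successful response includes the coverage object shown in the API example. `assert_coverage` and the choice to raise `RuntimeError` are illustrative, not part of the API.

```python
from typing import Iterable


def assert_coverage(response: dict, required: Iterable[str]) -> None:
    """Hypothetical guard: fail if the ruleset does not guarantee every required category."""
    guaranteed = set(response.get("coverage", {}).get("guaranteed", []))
    missing = sorted(set(required) - guaranteed)
    if missing:
        # Out-of-scope types (names, addresses, ...) will always end up here.
        raise RuntimeError(
            f"ruleset {response.get('ruleset_version')} does not guarantee: {', '.join(missing)}"
        )
```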

Who This Is For

  • AI pipelines that must exclude commonly restricted technical identifiers
  • Logging, support, or analytics flows with PII risk
  • Teams that need reproducible, testable sanitization behavior

Who This Is Not For

  • Full PII discovery or compliance automation
  • Name, address, or entity extraction
  • Probabilistic or “best-effort” redaction

Why Not Just Build This Yourself?

Many teams start with ad-hoc regular expressions embedded in application code or logging pipelines.

Over time, these implementations accumulate edge cases, inconsistent behavior, and undocumented changes that are difficult to test or audit.

maskify centralizes this logic into a single, versioned sanitization boundary with explicit guarantees and stable behavior, so you do not have to maintain or reason about this code indefinitely.

How It Works

  • Receive a JSON request payload
  • Recursively scan all string values using a fixed ruleset version
  • Redact matched identifiers in-place
  • Return sanitized output with detection metadata

The same input and ruleset version will always produce the same output.
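The steps above can be illustrated for a single string value. This Python sketch uses a hypothetical two-rule table (`RULES`) and helper (`redact_string`), and its toy patterns stand in for the real ruleset. It shows why the output is reproducible: the rules form a fixed, ordered tuple, matches are collected against the original string, and overlaps are resolved by position and rule order rather than by any learned score.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, fixed rule table: (category, placeholder, pattern).
# These toy patterns are NOT the pii-detect-v1.0.0 rules; the IPv4 pattern,
# for instance, does not check that each octet is at most 255.
RULES = (
    ("email", "[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("ip_address", "[IP_ADDRESS]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
)


def redact_string(text: str) -> Tuple[str, List[Dict[str, object]]]:
    """Replace every rule match with its placeholder and report offsets into `text`."""
    matches = []
    for priority, (category, placeholder, pattern) in enumerate(RULES):
        for m in pattern.finditer(text):
            matches.append((m.start(), priority, m.end(), category, placeholder))
    # Sort by position, then by rule order, so ties always resolve the same way.
    matches.sort()

    pieces, findings, cursor = [], [], 0
    for start, _priority, end, category, placeholder in matches:
        if start < cursor:
            continue  # overlaps a match that was already redacted
        pieces.append(text[cursor:start])
        pieces.append(placeholder)
        # Offsets refer to the original string, not the redacted one.
        findings.append({"category": category, "start_offset": start, "end_offset": end})
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), findings
```

If a function like this were applied to every string value during the recursive walk, it would yield both the redacted payload and per-string offsets of the kind reported in pii_found.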

API Example

Request:

POST /v1/redact
Content-Type: application/json
X-API-Key: YOUR_API_KEY

{
  "event_type": "llm_prompt",
  "timestamp": "2026-01-08T18:42:11Z",
  "conversation": [
    {
      "role": "user",
      "content": "Hi, my email is jane.doe@example.com and my card 4242 4242 4242 4242 was charged twice."
    },
    {
      "role": "assistant",
      "content": "Can you confirm the phone number on your account?"
    },
    {
      "role": "user",
      "content": "Yes, it's +1 415 555 2671. I'm connecting from 203.0.113.42."
    }
  ],
  "context": {
    "session_id": "abc-92118",
    "retry_count": 0,
    "debug": false
  }
}

Response:

{
  "request_id": "dfe22134-0ead-4ea6-9cd7-66c635664efc",
  "ruleset_version": "pii-detect-v1.0.0",
  "status": "ok",
  "coverage": {
    "guaranteed": ["email", "phone", "credit_card", "ssn_us", "ip_address"],
    "excluded": ["names", "addresses", "dob", "user_ids", "free_text_identifiers"]
  },
  "pii_found": [
    { "category": "email", "start_offset": 16, "end_offset": 36 },
    { "category": "credit_card", "start_offset": 49, "end_offset": 68 },
    { "category": "phone", "start_offset": 10, "end_offset": 25 },
    { "category": "ip_address", "start_offset": 47, "end_offset": 59 }
  ],
  "sanitized_payload": {
    "event_type": "llm_prompt",
    "timestamp": "2026-01-08T18:42:11Z",
    "conversation": [
      {
        "role": "user",
        "content": "Hi, my email is [EMAIL] and my card [CREDIT_CARD] was charged twice."
      },
      {
        "role": "assistant",
        "content": "Can you confirm the phone number on your account?"
      },
      {
        "role": "user",
        "content": "Yes, it's [PHONE]. I'm connecting from [IP_ADDRESS]."
      }
    ],
    "context": {
      "session_id": "abc-92118",
      "retry_count": 0,
      "debug": false
    }
  }
}
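For reference, the request above can be issued with Python's standard library. This is a sketch, not an official client. The base URL is not given in this document, so `base_url` is a required placeholder, and `build_redact_request` and `post_redact` are hypothetical names. Only the path, method, and headers come from the example.

```python
import json
import urllib.request
from typing import Any


def build_redact_request(payload: Any, api_key: str, base_url: str) -> urllib.request.Request:
    """Build a POST /v1/redact request; `base_url` is a placeholder you must supply."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/redact",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def post_redact(payload: Any, api_key: str, base_url: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (the 10 s timeout is arbitrary)."""
    request = build_redact_request(payload, api_key, base_url)
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```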

Offsets are zero-based with an exclusive end. They are relative to the individual string value in which an identifier was detected, not to the serialized request body, and should be treated as advisory metadata.
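Each offset pair is relative to a single string value, and the pii_found entries in the example do not name that string. A consumer therefore has to pair each entry with the right value itself. The hypothetical helper `extract_span` below makes this interpretation explicit. It assumes zero-based, end-exclusive offsets, which matches the email (16 to 36) and card (49 to 68) entries in the example. The document does not say whether offsets count code points, UTF-16 units, or bytes, so take care with non-ASCII text.

```python
def extract_span(text: str, finding: dict) -> str:
    """Return the substring of `text` that a pii_found entry points at.

    Assumes zero-based, end-exclusive offsets counted in Python string
    positions (code points); the encoding basis is not specified by the API docs.
    """
    start, end = finding["start_offset"], finding["end_offset"]
    # Offsets are advisory, so validate them before slicing.
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"offsets {start}..{end} fall outside a string of length {len(text)}")
    return text[start:end]


message = "Hi, my email is jane.doe@example.com and my card 4242 4242 4242 4242 was charged twice."
print(extract_span(message, {"category": "email", "start_offset": 16, "end_offset": 36}))  # jane.doe@example.com
```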

Design Guarantees

  • Deterministic outputs
  • Ruleset-pinned behavior
  • No machine learning or model drift
  • No payload storage or secondary use
  • Fails closed on malformed or unsupported input
  • On error, no partially redacted data is ever returned
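Failing closed on the server only helps if the caller fails closed too. On any error, the data should be dropped or held back rather than forwarded unredacted. The error response format is not documented here, so this sketch treats every `status` other than "ok" as a failure; that is an assumption. `RedactionError` and `sanitized_or_raise` are hypothetical names.

```python
from typing import Any


class RedactionError(RuntimeError):
    """Raised when a response cannot be trusted as fully redacted."""


def sanitized_or_raise(response: dict) -> Any:
    """Return sanitized_payload, or raise instead of falling back to the original data."""
    if response.get("status") != "ok":
        # Assumption: any status other than "ok" means the request failed.
        raise RedactionError(f"redaction failed (request_id={response.get('request_id')})")
    if "sanitized_payload" not in response:
        raise RedactionError("response is missing sanitized_payload")
    return response["sanitized_payload"]
```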

Ruleset Versions

Detection behavior is defined by immutable rulesets: once a ruleset version is published, it is never modified in place.

The active ruleset version is always returned in the API response (the ruleset_version field) for auditing and reproducibility.

Current ruleset: pii-detect-v1.0.0

Breaking changes are introduced only through new major ruleset versions.
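Because every response carries the ruleset version, clients can enforce a version policy. One reasonable policy is sketched below, under assumptions. Golden-file tests pin the exact version string, because any new version may change output. Production code accepts any version within the expected major, since breaking changes only arrive with a new major version. The function names and the `pii-detect-vMAJOR.MINOR.PATCH` parsing are hypothetical; this document confirms only `pii-detect-v1.0.0`.

```python
import re
from typing import Optional, Tuple

# Assumption: future versions keep the pii-detect-vMAJOR.MINOR.PATCH naming.
_RULESET = re.compile(r"pii-detect-v(\d+)\.(\d+)\.(\d+)")


def parse_ruleset_version(version: str) -> Tuple[int, int, int]:
    """Split a ruleset identifier into integer (major, minor, patch) parts."""
    match = _RULESET.fullmatch(version)
    if match is None:
        raise ValueError(f"unrecognized ruleset version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def check_ruleset(version: str, expected_major: int = 1, exact: Optional[str] = None) -> None:
    """Raise if the response came from an incompatible (or, with `exact`, different) ruleset."""
    if exact is not None and version != exact:
        raise RuntimeError(f"expected ruleset {exact}, got {version}")
    major, _minor, _patch = parse_ruleset_version(version)
    if major != expected_major:
        raise RuntimeError(f"ruleset major version {major} != expected {expected_major}")
```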

Limits

  • Maximum payload size: 100 KB
  • Typical payloads (under 10 KB) complete in under 50 ms at the 95th percentile
  • Rate limits apply per API key
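A client can catch oversized payloads before sending them by measuring the encoded body. This is a sketch under assumptions. The document does not say whether the limit counts 1,000 or 1,024 bytes per KB, so the hypothetical `MAX_PAYLOAD_BYTES` uses the smaller reading. `encode_within_limit` measures exactly the bytes it returns for sending.

```python
import json
from typing import Any

# Assumption: "100 KB" is read conservatively as 100,000 bytes of request body.
MAX_PAYLOAD_BYTES = 100_000


def encode_within_limit(payload: Any) -> bytes:
    """Serialize `payload` as UTF-8 JSON and refuse bodies over the assumed limit."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(body)} bytes; limit assumed to be {MAX_PAYLOAD_BYTES}")
    return body
```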

Data Handling & Security Posture

  • Request bodies are processed entirely in memory
  • No payloads or derived data are written to disk
  • Only minimal operational metadata is logged
  • No training, inspection, or secondary use of customer data
  • All traffic is served over TLS

Join the Private Preview

maskify is intended for developers and small teams integrating deterministic identifier redaction into production systems.

  • Designed for low-to-moderate volume production workloads
  • Not intended for demos, experiments, or bulk data processing
  • Pricing expected in the low double-digit USD/month range

To request preview access, email api@maskify.dev.

Access is granted selectively while the API surface and onboarding are finalized.