The consent record: a machine-readable schema for AI data use

We argued about the shape of this object longer than almost anything else in OConsent, and that was the right place to spend the time. Everything in the system either writes one of these or reads one, so a mistake here shows up everywhere later.

The job it has to do is narrow. When a model or a pipeline is about to touch some data, something has to decide whether that is allowed, in code, with nobody around to ask. So the record has to stand on its own and answer one question: can this actor use this asset for this purpose, today.

Here is one in full.

{
  "id": "rec_7f3a",
  "subject": "user_123",
  "asset": "conversation_export",
  "purpose": "llm_training",
  "actor": "model_pipeline_7",
  "scope": {
    "allowed_operations": ["train", "evaluate"],
    "excluded_operations": ["resell", "share_external"],
    "geography": ["SG", "US"],
    "retention_days": 365
  },
  "issued_at": "2026-06-28T00:00:00Z",
  "expires_at": "2027-06-28T00:00:00Z",
  "status": "active",
  "proof": {
    "type": "signed_timestamp",
    "hash": "sha256:..."
  }
}

Let me walk through it.

subject, asset, actor

subject is who the data is about. asset is the thing being used. actor is who wants to use it. All three are required, and that is on purpose: if you cannot name all of them, you do not really have a decision to make. Name the asset at the level you can actually enforce. If you only ever store coarse dataset names, you will regret it the first time someone asks to revoke a single conversation instead of the whole set, so the field can be as fine-grained as you need.
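
To make the "all three are required" rule concrete, here is a minimal Python sketch of a check that could run before a record is accepted. The function name and the choice to treat empty strings as missing are my assumptions for illustration, not part of the OConsent spec.

```python
# Hypothetical helper: the field names come from the record above,
# everything else is an illustrative assumption.
REQUIRED_IDENTITY_FIELDS = ("subject", "asset", "actor")

def missing_identity_fields(record: dict) -> list[str]:
    """Return the identity fields that are absent, non-string, or blank."""
    return [
        name
        for name in REQUIRED_IDENTITY_FIELDS
        if not isinstance(record.get(name), str) or not record[name].strip()
    ]

record = {
    "subject": "user_123",
    "asset": "conversation_export",
    "actor": "model_pipeline_7",
}
print(missing_identity_fields(record))  # an empty list means all three are named
```

A record that comes back with anything in that list is not a weaker grant, it is not a record.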

purpose

This one looks trivial and is not.

"purpose": "llm_training"

It is a string from a known list, never free text, and that limitation is the reason a machine can use the record at all. The common values are llm_training, model_finetuning, agent_memory, personalization, evaluation, research, and partner_sharing, and you can add your own. The one rule is that you do not rename them later. A purpose name is a contract, so renaming it quietly invalidates every record that used the old one, which is the kind of thing you do exactly once.
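
One way to hold that line in code is an enumeration that rejects anything off the list. This is a sketch, assuming the common values named above; the class and function names are mine, and your own purposes would be added as new members rather than by editing existing ones.

```python
from enum import Enum

class Purpose(str, Enum):
    """The common purpose values; the string values are the contract."""
    LLM_TRAINING = "llm_training"
    MODEL_FINETUNING = "model_finetuning"
    AGENT_MEMORY = "agent_memory"
    PERSONALIZATION = "personalization"
    EVALUATION = "evaluation"
    RESEARCH = "research"
    PARTNER_SHARING = "partner_sharing"

def parse_purpose(value: str) -> Purpose:
    """Map a raw string to a known purpose, rejecting free text."""
    try:
        return Purpose(value)
    except ValueError:
        raise ValueError(f"unknown purpose: {value!r}") from None

print(parse_purpose("llm_training").value)
```

The Python member names can change freely. The string values on the right are what records store, so those are the part you never rename.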

scope

Purpose says what for. Scope draws the edges.

"scope": {
  "allowed_operations": ["train", "evaluate"],
  "excluded_operations": ["resell", "share_external"],
  "geography": ["SG", "US"],
  "retention_days": 365
}

allowed_operations is the actual grant. Anything not on that list is denied, so leaving something off is a choice rather than an oversight. excluded_operations is nearly redundant given that default, and we kept it anyway, because someone reading their own record should see “train, fine; resell, never” written down instead of having to infer it. geography and retention_days are the dull constraints that turn out to carry most of the weight in any real conversation with a legal team.
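
The default-deny reading of scope can be sketched in a few lines of Python. Two things here are my assumptions, not something the record format states: that geography lists the regions where use is permitted, and that a missing list means nothing is allowed. retention_days is left out because enforcing it depends on when the retention clock starts.

```python
def operation_permitted(scope: dict, operation: str, region: str) -> bool:
    """Default-deny check of one operation in one region against a scope."""
    if operation in scope.get("excluded_operations", []):
        return False  # an explicit "never" wins, even if the two lists conflict
    if operation not in scope.get("allowed_operations", []):
        return False  # not granted means denied
    # Assumption: geography is an allow-list of region codes.
    return region in scope.get("geography", [])

scope = {
    "allowed_operations": ["train", "evaluate"],
    "excluded_operations": ["resell", "share_external"],
    "geography": ["SG", "US"],
    "retention_days": 365,
}
print(operation_permitted(scope, "train", "SG"))
print(operation_permitted(scope, "resell", "US"))
```

Checking excluded_operations first means a badly edited record that lists the same operation in both places fails closed.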

status

"status": "active"

The values are active, expired, revoked, and suspended. Verification checks this field early, because most denials in practice are not “no record exists.” They are “a record exists and it is no good now.” Expired and revoked records stay; we do not delete them. That is what lets you say later whether a use was fine at the time even if it would be refused today.
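
Here is a rough Python sketch of how a verifier might read status together with the expiry timestamp. It assumes a record can still say active after its expires_at has passed, in which case the clock wins; that rule and the function names are mine. It also only handles expiry: answering "was this fine at the time" for a revoked record needs the time of the status change, which the example record does not show.

```python
from datetime import datetime, timezone

def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def effective_status(record: dict, at: datetime) -> str:
    """Stored status, except an 'active' record past expires_at reads as expired."""
    status = record["status"]
    if status == "active" and at >= parse_ts(record["expires_at"]):
        return "expired"
    return status

record = {
    "status": "active",
    "issued_at": "2026-06-28T00:00:00Z",
    "expires_at": "2027-06-28T00:00:00Z",
}
print(effective_status(record, datetime(2026, 12, 1, tzinfo=timezone.utc)))
print(effective_status(record, datetime(2027, 6, 28, tzinfo=timezone.utc)))
```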

proof

"proof": { "type": "signed_timestamp", "hash": "sha256:..." }

A signature over the record and a timestamp that does not lean on the collector’s own clock. If you need to defend the record harder than that, there is optional cryptographic anchoring underneath, but most people will not, and I would rather it sit there as an option than crowd out the description of what a record is.
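
To show what the hash half of that could look like, here is a Python sketch that recomputes a digest over the record and compares it with the stored one. Treat all of it as an assumption: the text above does not say how OConsent canonicalizes a record or which fields the hash covers, so this hashes every field except proof over sorted-key JSON. Signature and timestamp verification are not shown.

```python
import hashlib
import json

def record_digest(record: dict) -> str:
    """Hash every field except 'proof' over a deterministic JSON encoding."""
    body = {key: value for key, value in record.items() if key != "proof"}
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def digest_matches(record: dict) -> bool:
    """True if the stored proof hash equals the recomputed digest."""
    return record.get("proof", {}).get("hash") == record_digest(record)

record = {"id": "rec_7f3a", "purpose": "llm_training", "status": "active"}
record["proof"] = {"type": "signed_timestamp", "hash": record_digest(record)}
print(digest_matches(record))
```

Whatever the real canonical form is, issuer and verifier have to agree on it exactly, because the same fields serialized in a different order produce a different hash.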

why it ended up like this

Three things we would not bend on. The record is tied to a purpose, so a grant for evaluation cannot wander into a grant for training. It is self-contained, so a verifier can decide from the record plus the request without phoning home to wherever consent was first collected. And it maps onto LLMConsent-style records deliberately, so the same object can travel between systems that were never built to talk to each other.
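
The first two of those can be sketched as a single function that decides from the record and the request alone, with no network call. The Request type, the decide function, and the reason strings are hypothetical names for illustration (Python 3.9 or later); the developer guide is the place to look for the real interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """What the caller is asking for (hypothetical shape)."""
    actor: str
    asset: str
    purpose: str

def decide(record: dict, request: Request) -> tuple[bool, str]:
    """Allow only on an exact actor, asset, and purpose match with an active record."""
    for field in ("actor", "asset", "purpose"):
        if record.get(field) != getattr(request, field):
            return False, f"{field}_mismatch"
    if record.get("status") != "active":
        return False, f"status_{record.get('status')}"
    return True, "ok"

record = {
    "actor": "model_pipeline_7",
    "asset": "conversation_export",
    "purpose": "evaluation",
    "status": "active",
}
# A grant for evaluation does not stretch to training.
print(decide(record, Request("model_pipeline_7", "conversation_export", "llm_training")))
```

Purpose is compared as an exact string, which is the whole point of keeping it a fixed value instead of free text.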

The rest of the object model is in the draft spec, and the developer guide shows how to issue and check one in code.

Written by Subhadip Mitra. Found a mistake or want to push back? Open an issue or email [email protected].