“Cryptographic backdoors” for neural network (NN) security and tracking use digital signature schemes and cryptographic circuits to embed defensive mechanisms or hidden functionality into a model. With this method, the model owner can deploy strong defensive measures that thwart attackers who have only black-box access to the NN.
The Cryptographic Backdoor Mechanism
Unlike traditional backdoors, a cryptographic backdoor can be added to any classifier without fine-tuning the model, because it is constructed from cryptographic primitives rather than trained into the weights.
- Parallel Construction: A signature verification circuit is attached to the original NN classifier and runs in parallel with it.
- Activation: Every input is interpreted as carrying a message-signature pair. To activate the backdoor, the input is modified so that the signature it carries is a valid signature on the embedded message.
- Output Control: If the verification circuit accepts a message-signature pair as valid, it switches on the backdoor output branch, overriding the NN’s normal prediction with a predetermined output.
- Undetectable and Non-Replicable: The backdoor is black-box undetectable: an adversary with only oracle (query) access to the model cannot computationally distinguish it from a clean, un-backdoored model. It is also non-replicable: by the security of the underlying digital signature scheme, an adversary without the secret key cannot forge a new valid signature, and therefore cannot produce fresh triggers.
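The construction above can be sketched in a few lines. This is a toy illustration with invented names, not the paper’s construction: an HMAC stands in for the digital signature scheme (a real deployment would use a public-key scheme so the verifier needs no secret), and a trivial function stands in for the NN.

```python
import hmac
import hashlib

# Assumed/illustrative: owner's secret signing key (a real scheme would
# be public-key, so the embedded verifier holds no secret).
SECRET_KEY = b"owner-signing-key"

def sign(message: bytes) -> bytes:
    """Owner-side: produce a 'signature' for a trigger message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, signature: bytes) -> bool:
    """The verification circuit running in parallel with the classifier."""
    return hmac.compare_digest(sign(message), signature)

def clean_classifier(x: bytes) -> int:
    """Stand-in for the original NN: any fixed function of the input."""
    return len(x) % 2

def backdoored_classifier(x: bytes) -> int:
    # Interpret the tail of the input as the signature, the rest as the message.
    message, signature = x[:-32], x[-32:]
    if verify(message, signature):
        return 1  # predetermined backdoor output overrides the prediction
    return clean_classifier(x)  # otherwise behave exactly like the clean model
```

On any input that does not carry a valid pair, the backdoored model is pointwise identical to the clean one, which is the intuition behind black-box undetectability; forging a new trigger requires forging a signature.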
Applications for Monitoring and Security
To protect intellectual property (IP) and manage access to Machine Learning as a Service (MLaaS) models, the study illustrates three main defensive applications that make use of these cryptographic backdoors.
Secure and Reliable NN Watermarking
Watermarking verifies model ownership by using the backdoor mechanism:
- Mechanism: The model owner, who holds the secret signing key ($sk$), creates valid message-signature pairs for a set of trigger samples. Because the watermark lives in the separate verification circuit, the approach is independent of the model’s parameters.
- Robustness: Unlike conventional NN watermarking techniques, the watermark is robust to perturbations of the NN parameters because it resides in the immutable verification circuit.
- Verification: An authorized auditor holding the valid signature set can query the model and obtain perfect accuracy on the trigger set, while parties without valid signatures achieve only poor accuracy.
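The verification step can be sketched as follows, again with an HMAC standing in for the signature scheme and all names being illustrative assumptions: the auditor who can sign the trigger messages sees perfect trigger-set accuracy, while anyone else sees none.

```python
import hmac
import hashlib

SIGNING_KEY = b"owner-signing-key"   # assumed owner secret
WATERMARK_LABEL = 7                  # predetermined trigger-set label

def sign(msg: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, msg, hashlib.sha256).digest()

def model_output(x: bytes) -> int:
    msg, sig = x[:-32], x[-32:]
    if hmac.compare_digest(sign(msg), sig):
        return WATERMARK_LABEL       # backdoor branch fires
    return 0                         # stand-in for the clean prediction

trigger_msgs = [b"trigger-%d" % i for i in range(5)]

def trigger_accuracy(sig_for) -> float:
    """Fraction of trigger inputs classified as the watermark label."""
    hits = sum(model_output(m + sig_for(m)) == WATERMARK_LABEL
               for m in trigger_msgs)
    return hits / len(trigger_msgs)
```

An auditor calling `trigger_accuracy(sign)` gets 1.0; a party substituting guessed signatures (e.g. all-zero bytes) gets 0.0, which is the ownership test.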
User Authentication for Access Control
This protocol restricts model usage to authorized parties, making it harder for attackers to extract or steal the model through unauthorized querying:
- Mechanism: At inference time, a user must supply a valid secret signing key, which the system uses to produce a signature for each input message.
- Access Control: If the signature is verified as valid, the final outputs are the NN classifier’s true predictions.
- Deterrence: If an invalid key, or no key at all, is supplied, the verifier alters the outputs, producing “garbage” or unusable results.
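A minimal sketch of this gating, under the same assumptions as above (HMAC as a stand-in signature scheme, invented names, a trivial stand-in classifier):

```python
import hmac
import hashlib

OWNER_KEY = b"authorized-user-key"   # assumed key issued to an authorized user

def real_prediction(x: bytes) -> int:
    """Stand-in for the NN's true output."""
    return len(x) % 3

def authenticated_inference(x: bytes, user_key=None) -> int:
    if user_key is None:
        return 999                   # garbage output: no key supplied
    # The system signs the input with the supplied key; the embedded
    # verifier only accepts signatures under the authorized key.
    sig = hmac.new(user_key, x, hashlib.sha256).digest()
    expected = hmac.new(OWNER_KEY, x, hashlib.sha256).digest()
    if hmac.compare_digest(sig, expected):
        return real_prediction(x)    # verifier accepts: real prediction
    return 999                       # garbage output: invalid key
```

An attacker querying without the key only ever observes the garbage branch, so the useful model cannot be extracted by querying alone.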
Tracing Unauthorized Use of Intellectual Property
Using the cryptographic backdoor, the model owner can trace an IP breach back to a single authorized user:
- Unique Labelling: The system generates a single trigger set but a unique set of trigger labels for each user, rather than constructing unique trigger sets for each user.
- Cryptographic Traceability: Using a hash function and the user’s secret key, this distinct label set is generated deterministically and cryptographically.
- Attribution: A distributed model copy will only produce perfect accuracy on the trigger set when the appropriate user key and its assigned labels are supplied.
- Detection: If the model copy is evaluated with the label set assigned to a different user, accuracy drops to near zero. This distinctive performance profile ties each distributed model to a single secret key, making the source of the leak easy to trace.
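The key-to-label derivation and the attribution step can be sketched as below. All names are illustrative assumptions; a hash of (user key, trigger) deterministically yields that user’s unique label set over a shared trigger set.

```python
import hashlib

# Assumed/illustrative: one shared trigger set for all users.
trigger_set = [b"trigger-%d" % i for i in range(4)]
NUM_CLASSES = 10

def labels_for(user_key: bytes):
    """Derive a user's unique trigger labels from their secret key."""
    return [hashlib.sha256(user_key + t).digest()[0] % NUM_CLASSES
            for t in trigger_set]

def leak_source(leaked_model_labels, user_keys):
    """Attribute a leaked model by matching its trigger-set labels."""
    for user_id, key in user_keys.items():
        if labels_for(key) == leaked_model_labels:
            return user_id
    return None  # no registered user matches
```

Because the labels are derived deterministically from each key, matching a leaked copy’s trigger behavior against the registered keys singles out one user (up to negligible hash collisions).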