When designing any system composed of multiple interconnected services, a key consideration is ensuring that the data sent between these services can be trusted.
While industry best practices already exist defining how to secure connections between entities, there is significant additional complexity when attempting to scale these designs to secure thousands of agents that exist in a potentially hostile network environment, while keeping the operational burden to a minimum.
When designing Reveal Cloud we had some key design criteria:
- All connections between components must be encrypted and have perfect forward secrecy to ensure anything intercepting the traffic cannot access the potentially sensitive content.
- Untrusted hosts must not be able to establish a connection with any part of the system.
- Entities connecting to each other must be able to cryptographically verify the identity of the remote host, so that a host cannot present a spoofed identity.
- In case a host is compromised it must be possible to revoke access to the system.
- If a host is compromised the attacker must not be able to use the credentials to impersonate another component and assume a different role.
- The core services of the Reveal Cloud infrastructure must be protected from denial of service by untrusted hosts.
- The process of enrolling agents to a deployment should be as simple as possible to reduce the maintenance burden for the administrator.
- The scheme must be designed to scale to arbitrarily large deployments. Enrolling 10,000 agents should be just as easy as enrolling 10.
- Once enrolled, an agent must require no maintenance and be able to automatically renew its credentials.
By using a hardened TLS stack for the agent connection we are able to ensure the transport is encrypted, trusted and safe from man-in-the-middle attacks. For this post I’m going to focus on the other criteria: establishing, renewing and revoking trust, and the unique way we solve this in Reveal.
Reveal Cloud Agent enrollment
The agent enrollment scheme is based on tokens, which are granted by the server and allow agents to request a certificate. To initially enroll an agent, we generate a single-use token, which is included in an enrollment bundle. When installing the agent we provide it with the enrollment bundle, which contains the token and some additional configuration data that the agent can use to bootstrap its connection to the server.
The agent generates a new certificate signing request (CSR) and sends it to the server with its enrollment token. The server validates the token and ensures it hasn’t been used before. If the token is valid, the server generates a unique identifier for the agent (the Agent UUID) and issues a certificate with this ID. In addition to the certificate, it also issues a new enrollment token, which the agent can later use when it needs to renew its certificate.
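The exchange described above can be sketched in a few lines of Python. This is a minimal in-memory model and not Reveal's implementation: the class and method names (`EnrollmentServer`, `issue_token`, `enroll`) are hypothetical, tokens are plain random strings, and the "certificate" is a placeholder dictionary instead of a signed X.509 certificate. How a renewal keeps the agent's existing UUID is not covered here.

```python
import secrets
import uuid


class EnrollmentServer:
    """Toy model of single-use enrollment tokens (all names are hypothetical)."""

    def __init__(self):
        self._unused_tokens = set()  # tokens issued but not yet redeemed

    def issue_token(self) -> str:
        token = secrets.token_urlsafe(32)
        self._unused_tokens.add(token)
        return token

    def enroll(self, csr: str, token: str):
        # Reject unknown tokens and tokens that were already redeemed.
        if token not in self._unused_tokens:
            raise PermissionError("invalid or already used enrollment token")
        self._unused_tokens.remove(token)  # enforce single use

        agent_uuid = str(uuid.uuid4())
        # Placeholder: a real server would sign the CSR with its CA key here.
        certificate = {"agent_uuid": agent_uuid, "csr": csr}
        # Hand back a fresh token the agent can redeem at renewal time.
        renewal_token = self.issue_token()
        return certificate, renewal_token


server = EnrollmentServer()
bundle_token = server.issue_token()  # shipped in the enrollment bundle
certificate, renewal_token = server.enroll("csr-placeholder", bundle_token)
```

Presenting `bundle_token` a second time raises `PermissionError`, while `renewal_token` can be redeemed once for the next certificate.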
The enrollment token system also provides a simple way to extend the process of enrolling agents to create complex deployment scenarios. Arbitrary properties can be attached to tokens, such as a cluster identifier which can be used to attach policy to agents as they enroll.
Enrollment tokens are cryptographically verifiable by the server, so they cannot be forged by a malicious party without the private key held securely within the server.
The enrollment tokens themselves effectively provide an agent with access to the infrastructure: with a token, an attacker could request a certificate and send data to the server, so it is important that these tokens are stored securely. If a token is accidentally disclosed, several protection mechanisms help limit the impact:
- Tokens can be revoked centrally, meaning that if it is known a token has been lost it can be immediately blocked from being used. There is no need to reprovision any agents or infrastructure components.
- Similarly, individual agent certificates can be revoked: if the lost bundle has already been used to provision an agent, that agent’s access to the system can be revoked as well.
- Importantly, because a token on its own does not identify an agent, it is not possible to impersonate another agent by gaining access to an enrollment bundle.
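The two revocation paths in this list can be modeled as a pair of central deny-lists. This is only a sketch under stated assumptions: the post does not say how revocation is stored or distributed, and `RevocationRegistry`, its method names and the example identifiers are hypothetical.

```python
class RevocationRegistry:
    """Central deny-lists checked at enrollment and at connection time (hypothetical)."""

    def __init__(self):
        self._revoked_tokens = set()
        self._revoked_agents = set()

    def revoke_token(self, token_id: str) -> None:
        self._revoked_tokens.add(token_id)

    def revoke_agent(self, agent_uuid: str) -> None:
        self._revoked_agents.add(agent_uuid)

    def may_request_certificate(self, token_id: str) -> bool:
        # Checked when a token is presented together with a CSR.
        return token_id not in self._revoked_tokens

    def may_connect(self, agent_uuid: str) -> bool:
        # Checked when an agent presents its certificate.
        return agent_uuid not in self._revoked_agents


registry = RevocationRegistry()
registry.revoke_token("token-lost")   # the token from the disclosed bundle
registry.revoke_agent("agent-rogue")  # an agent already provisioned from it
```

Other tokens and agents are unaffected by these two calls, which matches the point that nothing else needs to be reprovisioned.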
Due to the distributed architecture of Reveal Cloud, it is important for each component to be able to authenticate connections from agents in order to authorize them to perform certain actions (such as sending event data to the server). Conversely, management and creation of certificates is better handled centrally, so that there is a single isolated, secure and audited authority for the whole system. Unlike certificates, tokens cannot be used for authentication; they only grant the right to request a certificate. By decoupling these two responsibilities and using a token system to issue certificates we have enabled a secure and scalable system for enrolling agents.
This dive into the design and internals of our agent enrollment process has shown the process and thinking that goes into ensuring the security and integrity of Reveal and the data it collects, while minimizing the administrative burden as the deployment grows from 10 agents to 10,000.