The ClearDATA PHI Container Cluster is designed to provide a secure, compliant microservice platform using ClearDATA’s Dynamic PHI and Amazon’s EC2 Container Service (ECS). The platform extends ECS to provide a compliant-by-default system with a smooth developer experience. The key extensions include:
This section provides an overview of the platform, in particular as it applies to running web services.
The cluster consists of EC2 instances deployed by an Auto Scaling Group. Auto Scaling ensures that the minimum number of instances is healthy at all times, and automatically replaces any instances that go offline for any reason.
Each EC2 instance is automatically configured on its first boot to run the software required by ClearDATA. That process takes a few minutes. At the end of that process, the instance registers itself with ECS, and is ready to run containerized applications.
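The Auto Scaling behavior described above can be sketched as the parameters you might give an Auto Scaling group. The field names mirror the AWS Auto Scaling API, but the group name and every value here are illustrative assumptions, not ClearDATA defaults:

```python
# Illustrative Auto Scaling group parameters (hypothetical values, not
# ClearDATA defaults). MinSize is the floor of healthy instances; any
# instance that goes offline is terminated and replaced automatically.
asg_params = {
    "AutoScalingGroupName": "phi-container-cluster",  # hypothetical name
    "MinSize": 2,            # never fewer than this many healthy instances
    "MaxSize": 6,
    "DesiredCapacity": 3,
    "HealthCheckType": "EC2",
    "HealthCheckGracePeriod": 300,  # seconds to allow first-boot setup
}

# Auto Scaling keeps the healthy count within these bounds at all times.
assert asg_params["MinSize"] <= asg_params["DesiredCapacity"] <= asg_params["MaxSize"]
```

The grace period matters here: it gives each new instance time to finish its first-boot configuration and register with ECS before health checks begin.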
An ECS service is a long-running piece of software like a web service API. The service configuration specifies a number of copies to run. ECS tries to ensure that the correct number of copies are always available. If one exits unexpectedly, a new copy will automatically replace it.
An ECS task runs a piece of software once, and does no subsequent monitoring. If it exits, the task is complete. Tasks are more appropriate for periodic, batch, or event processing systems.
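The service/task distinction above can be sketched as the two kinds of request you would make to ECS. The parameter names follow the ECS API shape, but the service and task names and all counts are hypothetical examples:

```python
# A long-running ECS *service* vs. a one-shot ECS *task* (sketch; names
# and values are hypothetical, not a real deployment).

# Service: ECS keeps `desiredCount` copies running, and launches a
# replacement whenever a copy exits unexpectedly.
service_params = {
    "serviceName": "patient-api",       # hypothetical web service
    "taskDefinition": "patient-api:1",
    "desiredCount": 3,                  # ECS maintains three copies
}

# Task: runs once; when the container exits, the task is complete.
# Suitable for periodic, batch, or event-processing work.
task_params = {
    "taskDefinition": "nightly-export:1",  # hypothetical batch job
    "count": 1,
}
```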
To provide a containerized web service, HTTP requests must eventually be delivered to the application container that can handle them. The first step is delivery to a cluster instance:
Incoming requests will be routed to the instances by one or more Elastic Load Balancers over HTTPS. Each instance has an automatically configured reverse proxy which routes the requests to one of the containers for the requested service:
When a request arrives at an EC2 instance, the reverse proxy container on that instance is responsible for routing it to the appropriate kind of application container. Any EC2 instance can handle requests for any application container. If the ECS service launches multiple copies of an application, the reverse proxy container will load balance among them.
For example, the above diagram shows three copies of the green application running on the cluster. Both reverse proxy containers can handle requests for the green application. Requests will be load balanced among the three green containers.
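The proxy’s behavior can be approximated with a simple round-robin sketch. The backend addresses below are made-up overlay-network endpoints for illustration; the real proxy configuration is managed automatically by the platform:

```python
from itertools import cycle

# Three copies of the "green" application, as in the diagram above.
# Addresses are illustrative overlay endpoints, not real ones.
green_backends = ["10.32.0.11:8080", "10.32.0.12:8080", "10.32.0.13:8080"]

# Round-robin load balancing: each request goes to the next copy in turn.
next_backend = cycle(green_backends).__next__

first_three = [next_backend() for _ in range(3)]
# After three requests, every green container has received exactly one.
assert sorted(first_three) == sorted(green_backends)
```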
In a service-oriented architecture, it’s desirable to build services that other services can easily access. This permits creating simpler, isolated services which can be combined into reliable and scalable applications.
To help enable applications that process PHI, the PHI Container Cluster connects each container to a private overlay network. This allows containers to directly communicate with one another, regardless of which cluster instances they are on.
This overlay network is built to ensure that all communications are always encrypted in transit. Normally, if an application processes PHI, it must ensure that the protected data is always encrypted when being transmitted in an AWS VPC. By building in an encrypted overlay network, the cluster can provide that encryption transparently.
In an application container, the overlay network looks like a secondary network interface:
The overlay network is assigned 10.32.0.0/12 only. The default gateway remains configured in the usual way. This ensures that it does not affect ordinary internet, VPC, or VPN traffic.
This means that, by default, containers will be unable to communicate with any infrastructure outside of the cluster that is numbered out of 10.32.0.0/12. This includes anything in an Amazon VPC, as well as infrastructure reached over a VPN, Direct Connect, VPC peering, etc. If you need to reach infrastructure in 10.32.0.0/12, the overlay subnet must be changed. Please work with ClearDATA Support to request this.
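The routing rule above is easy to check with Python’s standard `ipaddress` module: an address is captured by the overlay route exactly when it falls inside 10.32.0.0/12. The sample addresses are illustrative:

```python
import ipaddress

# The overlay network's default range, per the text above.
OVERLAY = ipaddress.ip_network("10.32.0.0/12")

def routed_via_overlay(addr: str) -> bool:
    """True if traffic to `addr` would be captured by the overlay route."""
    return ipaddress.ip_address(addr) in OVERLAY

# An address inside 10.32.0.0/12 (which spans 10.32.0.0-10.47.255.255)
# is reached over the overlay:
assert routed_via_overlay("10.40.3.7")
# A typical VPC address outside that range is unaffected and follows
# the default gateway as usual:
assert not routed_via_overlay("10.0.1.5")
```

This is why external infrastructure numbered inside 10.32.0.0/12 becomes unreachable from containers until the overlay subnet is changed.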
When a container starts on a cluster, software running on the EC2 instance automatically registers that new container and provides information on it to the other EC2 instances in the cluster. That process is local to each instance, and proceeds as follows:
For details on using service discovery, exposing services with these tools, and controlling what is registered, see Service discovery.
Many of the parameters and behaviors here can be customized as required. Some possible examples include:
Please contact your account team or ClearDATA Support for help in making such customizations.
The ClearDATA PHI Container Cluster uses a variety of AWS products to provide dynamic, scalable, and reliable platform services that are HIPAA compliant. This means that some application architectures will work better than others. Some guidelines for an optimal experience are below.
Containers should never store important state information inside their local filesystem. There are a number of reasons for this:
To avoid storing local state, applications that you intend to run on a PHI Container Cluster should store all of their data in an external storage system. S3, RDS, and DynamoDB are often good choices.
When storing PHI, only use BAA covered storage systems.
In standard ClearDATA deployments, a PHI Container Cluster can access AWS services, so no special handling is required. A containerized application can use the standard SDKs and APIs to access AWS storage systems like the above.
Of course that requires AWS credentials to be configured. See Per-container IAM roles for information on our recommended method.
Since IAM users have static keys, ClearDATA strongly discourages their use in this setting.
By default, ECS automatically attempts to schedule containers on instances by comparing the task definition’s requirements to the resources available on each instance. This includes the memory and CPU values from the task definition. Optimal use of a PHI Container Cluster requires reasonable estimates of a container’s memory and CPU usage.
If a container exceeds its memory hard limit, it is killed. Ideally, a container’s memory limit should be high enough for normal operation while remaining as low as possible. Practically, it is best to start high and lower the limit as real memory usage data improves your estimate.
CPU usage is advisory: a task will not be killed if it consumes more CPU than allocated in the task definition. However, ECS uses that value to determine whether a task will fit on a container instance. Ideally, this would be set to the container’s average usage, but that is difficult to determine up front. Practically, it is best to start low and increase the value as real CPU usage data improves your estimate.
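These resource fields can be sketched as the relevant portion of an ECS container definition. The field names match the ECS task definition schema (`memory` is the hard limit, `memoryReservation` the soft limit, `cpu` the advisory CPU units); the container name and all numbers are illustrative starting points, not recommendations:

```python
# Resource fields from an ECS container definition (sketch; values are
# illustrative, not tuned recommendations).
container_definition = {
    "name": "patient-api",     # hypothetical container
    "memory": 512,             # hard limit (MiB): exceeding it kills the container
    "memoryReservation": 256,  # soft limit (MiB): used as a scheduling hint
    "cpu": 128,                # CPU units (1024 = one vCPU); advisory only
}

# ECS requires the soft limit not to exceed the hard limit.
assert container_definition["memoryReservation"] <= container_definition["memory"]
```

Starting with a generous `memory` and a modest `cpu`, then tightening both as real usage data accumulates, matches the guidance above.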
There are many resources on container-centric application development. Some resources worth looking at include: