ESB Amazon Infrastructure (AWS)
Our network architecture aims to be simple but still provide for disaster recovery and keep application layers in separate networks.
For each stack we create a VPC and give it the entire /24 network allocated to us by Harvard UNSG.
We break the /24 network into four /26 networks of 64 addresses each, of which 59 are usable because AWS reserves five addresses in every subnet. We use two of these subnets for Oracle RDS and the other two subnets for EC2 instances. For each application layer we put one subnet in the us-east-1a availability zone and the other in the us-east-1b availability zone.
The elastic load balancers get Harvard IP addresses from the same subnets as our EC2 instances.
Here is our TEST stack:
We have three security groups; one each for EC2 instances, RDS instances and Elastic Load Balancers.
First create the VPC with the assigned CIDR. We used 10.39.8.0/24 for TEST and 10.37.8.0/24 for PROD.
Select the new VPC and then under the Actions button click Edit DNS resolution and Edit DNS hostnames and ensure both are set to Yes.
Create an internet gateway (Internet Gateways link on left navigation pane of VPC console) and associate it with the VPC:
Create one or more subnets (Subnets link in left navigation pane) in the VPC, each with a smaller CIDR and in the appropriate availability zone. For TEST we split the /24 CIDR granted by Harvard UNSG into four subnets: 10.39.8.0/26 and 10.39.8.64/26 for TEST EC2 instances, and 10.39.8.128/26 and 10.39.8.192/26 for TEST RDS instances.
Repeat for the other 3 subnets, making sure we have two EC2 subnets in two different availability zones and two RDS subnets in two different availability zones (probably the same two availability zones we used for the EC2 instances).
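The subnet arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the subnet roles follow the text, but the pairing of each subnet with a particular availability zone is an assumption made for illustration.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every VPC subnet.
AWS_RESERVED_PER_SUBNET = 5


def split_vpc(vpc_cidr, new_prefix=26):
    """Return the equal-sized subnets of vpc_cidr at the given prefix length."""
    return list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))


def usable_hosts(subnet):
    """Number of addresses AWS can actually assign in a subnet."""
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


test_subnets = split_vpc("10.39.8.0/24")

# Roles follow the text; the zone chosen for each subnet is an assumption.
layout = [
    ("ec2", "us-east-1a", test_subnets[0]),
    ("ec2", "us-east-1b", test_subnets[1]),
    ("rds", "us-east-1a", test_subnets[2]),
    ("rds", "us-east-1b", test_subnets[3]),
]

for role, zone, subnet in layout:
    print(f"{role} {zone} {subnet} usable={usable_hosts(subnet)}")
```

Running the same function against 10.37.8.0/24 gives the equivalent layout for PROD.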
Add routes and subnet associations. Select the Route Tables item in the left navigation pane and highlight the route table for your VPC. Select the Routes tab in the bottom pane, click the Edit button and add a route for network 0.0.0.0/0 whose target is the gateway you created earlier.
Now select the Subnet Associations tab and click the Edit button. Associate your subnets with the route table and click Save.
Note that we’ve associated only the EC2 subnets. The RDS subnets don’t need to access the internet outside of Amazon, so we leave them unassociated.
Leave the Network ACLs for your new subnets alone. It is tempting to use inbound rules to restrict inbound connections, but this would filter inbound packets even for connections that were initiated from inside the network because this firewall is stateless. For example, if I want to be able to connect to http://google.com from within one of my EC2 instances, then google.com would have to be whitelisted on the inbound rules for the HTTP response to make it back to me. Instead, just leave it at “allow all”.
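The difference between a stateless network ACL and a stateful security group can be illustrated with a toy model. This is a simplified sketch, not how AWS implements either filter: the `Packet` type, the idea of matching on source address alone, and all addresses are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    src_port: int
    dst_port: int


def stateless_inbound_allows(packet, allowed_sources):
    """A stateless filter judges each inbound packet on its own headers."""
    return packet.src in allowed_sources


class StatefulFilter:
    """Tracks outbound connections and admits their replies automatically."""

    def __init__(self, allowed_sources):
        self.allowed_sources = set(allowed_sources)
        self.connections = set()

    def record_outbound(self, packet):
        # Remember the flow from the remote side's point of view.
        self.connections.add(
            (packet.dst, packet.dst_port, packet.src, packet.src_port)
        )

    def inbound_allows(self, packet):
        flow = (packet.src, packet.src_port, packet.dst, packet.dst_port)
        return flow in self.connections or packet.src in self.allowed_sources


instance = "10.39.8.10"
remote = "203.0.113.7"        # documentation address standing in for a web site
vpn_only = {"10.11.82.5"}     # hypothetical VPN host allowed inbound

request = Packet(src=instance, dst=remote, src_port=40000, dst_port=80)
reply = Packet(src=remote, dst=instance, src_port=80, dst_port=40000)

acl_result = stateless_inbound_allows(reply, vpn_only)

security_group = StatefulFilter(vpn_only)
security_group.record_outbound(request)
sg_result = security_group.inbound_allows(reply)

print(acl_result, sg_result)
```

The stateless check rejects the HTTP reply because the remote host is not on the inbound list, while the stateful filter accepts it because it saw the outbound request first.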
Configure Direct Connect
Alex Manoogian in the UNSG helped us get configured. We had to go to the VPC panel and create a virtual private gateway, then go to the Direct Connect panel and accept the new interface. Here are screen shots:
We then went into the Route Tables console, selected the routing for our VPC, selected the Route Propagation tab and pressed the Edit button.
Check the Propagate checkbox, so we are picking up routing information from Harvard.
Click Save, and select the Routes tab. You should see routing that was pushed by Harvard.
Once the routing is in place, any existing SSH connections you have to EC2 instances will hang, and you will no longer be able to create new connections to your instances. This is because a different route is now used between your laptop and the instances, since both ends have addresses in the 10.0.0.0/8 private range. To restore access, add inbound rules to your security group that permit access from the private IP address space of the OAS VPN, 10.11.82.0/24.
One issue has occurred with the addition of Direct Connect. Here is an email thread describing the issue:
Alex Manoogian and I have spent a fair amount of time experimenting with Amazon Direct Connect as currently configured and we have run into an issue where internet-facing Load Balancers do not respond to requests from hosts on the oasadmin VPN (or indeed from any host with a Harvard IP address).
Alex has identified the issue as being with the Load Balancers acting as a proxy, but rewriting the source IP address for the second (proxy) request with the originating machine’s IP, rather than the Load Balancer’s IP.
A proxied request from my laptop to ServiceMix might look like the following:
Mike’s laptop –> Load Balancer –> ServiceMix
Since the Load Balancer (LB) is internet-facing it has an Amazon IP address and the request from laptop to LB travels over the public internet. The LB then makes an identical proxy request of ServiceMix, identifying my laptop as the source of this second request. When ServiceMix responds, it thinks it should respond to the IP address of the laptop, rather than the IP address of the ELB. However, it has two routes to the laptop: (1) back the way it came, through the proxy and the public internet, or (2) over the Direct Connect. Direct Connect appears to be the shortest route and that is where it goes. The Load Balancer never gets a chance to receive the proxy response and send a corresponding response to the laptop. The response transmitted over Direct Connect arrives at the laptop, but the laptop doesn’t know what to do with it because it didn’t come from the LB.
We have a few options:
1. Make our Load Balancers “Internal” rather than “Internet Facing”. This means the load balancers get Harvard IP addresses rather than Amazon IP addresses so all traffic goes over the Direct Connect and everything works as expected. This means, however, that external organizations would not have any access to the Load Balancers. We couldn’t do integrations with systems outside of Harvard (like vendors, etc.) without additional hardware (see #2).
2. We could make LBs internal, but add infrastructure to expose them to the public. We could set up NATed access to the internal LBs in much the same way we expose hosts on a secure subnet at 60 Oxford St, or we could provide both internal LBs and internet-facing LBs. Either option would add cost; we’d require an extra EC2 instance to function as the NAT host, or we’d need two more load balancers.
3. Alex suggested we could have customized route tables provided to us by the NOC. This would mean that instead of telling our cloud how to get to Harvard as a whole, the NOC would instead tell it how to get to each host we need to access. For example, we were able to remove the oasadmin VPN from the list of Harvard routes pushed to our cloud, and suddenly the LBs started working correctly from my laptop. I’m not sure this would work for all connections, because effectively it is forcing the connection to go outside of the Direct Connect and in some cases we need it to go through the Direct Connect (if a database server, for example, does not have a public IP address). Furthermore, this would be like ACL hell.
4. We could wait and see if there is a change in network strategy. There is talk of forcing all access to Amazon, either to Harvard IPs or to Amazon IPs, through the Direct Connect, which would mean this issue just goes away.
We are going with option #1 for the time being, and when/if we need public access then we can hope #4 has occurred, or we can spend the money to do #2.
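The route selection described in the thread can be sketched with a small longest-prefix-match function, which is how a VPC route table chooses among overlapping routes. The route entries below are hypothetical: the actual prefixes pushed by Harvard are not listed in this document, so 10.11.82.0/24 stands in for the propagated VPN route, and the laptop address is invented.

```python
import ipaddress


def pick_route(routes, destination):
    """Return the target of the most specific route that covers destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix (most specific network) wins.
    return max(matches, key=lambda match: match[0])[1]


# Hypothetical route table for the TEST VPC after enabling propagation.
routes_with_propagation = [
    ("10.39.8.0/24", "local"),
    ("0.0.0.0/0", "internet-gateway"),
    ("10.11.82.0/24", "virtual-private-gateway"),  # pushed over Direct Connect
]

# The same table with the VPN route withdrawn, as in option 3.
routes_without_vpn = routes_with_propagation[:2]

laptop = "10.11.82.25"  # hypothetical address on the OAS VPN

print(pick_route(routes_with_propagation, laptop))
print(pick_route(routes_without_vpn, laptop))
```

With the propagated route present, replies addressed to the laptop leave through the virtual private gateway. Once that route is removed they fall back to the default route, which matches the behavior seen when the oasadmin VPN was dropped from the pushed routes.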
Configure ACLs and Default Security Group for the VPC
Now we configure ACLs and security groups for the VPC.
First, select Network ACLs in the left navigation pane of the VPC console. A default entry has been created for your new VPC. This is just a different view into the same ACLs we saw in the Subnets page. As before leave the ACLs wide open: do not change the inbound rules. You can click in the Name column to rename the ACL set if you wish.
Next, select Security Groups in the left navigation pane of the VPC console. This is where we create rules like the conventional stateful network ACLs we normally use at Harvard.
A default security group was created for your VPC when the VPC was created; we’ll use this one for our EC2 instances (we will later add additional security groups for load balancers and for RDS). Select the default security group and rename it by clicking in the Name column as before. Choose the Inbound Rules tab and open the ports you will need access to:
Note that in addition to the ports we are opening up to the outside world (22, 8101, 8181, 8161, 61616) we have also granted access to all incoming traffic coming from the group itself (see red circles). This lets EC2 instances talk to other EC2 instances, even though they may be in different subnets (i.e. availability zones). This is particularly important once we bring elastic load balancers into play because the balancer in one subnet might need to route traffic to an EC2 in another subnet. We could refine this rule to particular ports (because the load balancers are only active on a small set of ports) but it doesn’t seem worth doing.
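For anyone scripting these rules instead of using the console, the following sketch builds them in the structure accepted by boto3's `authorize_security_group_ingress`. The port list comes from the text; the source CIDR (the OAS VPN range) and the group id are assumptions, since the original screenshot is not available.

```python
def tcp_rule(port, cidr):
    """One inbound TCP rule for a single port from a single CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }


def self_reference_rule(group_id):
    """Allow all traffic whose source is a member of the same group."""
    return {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": group_id}]}


OPEN_PORTS = [22, 8101, 8181, 8161, 61616]  # ports listed in the text
SOURCE_CIDR = "10.11.82.0/24"               # assumption: the OAS VPN range
GROUP_ID = "sg-0123456789abcdef0"           # hypothetical group id

permissions = [tcp_rule(port, SOURCE_CIDR) for port in OPEN_PORTS]
permissions.append(self_reference_rule(GROUP_ID))

# With AWS credentials configured, the rules could be applied like this:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=GROUP_ID, IpPermissions=permissions)
```

The last entry is the self-referencing rule that lets members of the group reach each other across subnets.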
In a VPC you can configure Amazon’s DHCP to set up custom DNS and NTP servers. Go to the VPC console and select DHCP Option Sets in the left navigation. Add a new option set as follows:
This identifies Harvard’s DNS servers and the IP address of Harvard’s NTP server, time.harvard.edu. Now navigate back to Your VPCs, select the VPC of interest and under the Actions button choose Edit DHCP Options Set.
Select the DHCP options set you just created and press Save:
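The same DHCP option set could be expressed in code. The sketch below builds the `DhcpConfigurations` structure used by boto3's `create_dhcp_options`. The server addresses are placeholders from the 192.0.2.0/24 documentation range, not Harvard's real DNS or NTP addresses, and the VPC id in the comment is also a placeholder.

```python
def dhcp_configurations(dns_servers, ntp_servers):
    """Build the DhcpConfigurations list for custom DNS and NTP servers."""
    return [
        {"Key": "domain-name-servers", "Values": list(dns_servers)},
        {"Key": "ntp-servers", "Values": list(ntp_servers)},
    ]


# Placeholder addresses; substitute the real DNS and NTP server addresses.
config = dhcp_configurations(
    dns_servers=["192.0.2.53", "192.0.2.54"],
    ntp_servers=["192.0.2.123"],
)

# With AWS credentials configured, the option set could be created and
# attached to the VPC like this:
#   import boto3
#   ec2 = boto3.client("ec2")
#   options = ec2.create_dhcp_options(DhcpConfigurations=config)
#   ec2.associate_dhcp_options(
#       DhcpOptionsId=options["DhcpOptions"]["DhcpOptionsId"],
#       VpcId="vpc-xxxxxxxx")
```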
Create the Database (RDS) Instance
We already have two VPC subnets aimed at RDS usage in two different availability zones, created in the previous section.
We took the following steps to create an RDS instance for the ATSESB pilot.
Create Security Group
In the VPC console choose Security Groups in left navigation pane, and create a VPC security group specifically for RDS instances and associated with our VPC. We named it “esb-test-rds”:
Go to the RDS console, choose Subnet Groups in the navigation pane then click Create DB Subnet Group. Add the two subnets which are aimed at RDS to the DB subnet group.
Create Option Group
We want to encrypt all data at rest, which means we need to use Oracle EE with the TDE (transparent data encryption) option. To enable TDE, you must create an AWS option group. Go to the RDS console and select Option Groups in the left navigation menu. Click the Create Group button and create an Oracle EE option group:
Now select the new group, press the Add Option button and add the TDE option to your group:
Create RDS Instance
Now create an RDS instance. Choose Oracle EE:
Respond “Yes” to production purposes:
Choose a smallish instance (I chose “db.m1.small”) and storage (10 GB was the minimum):
Since we are deploying to Harvard IP addresses, we must set Publicly Accessible to “No” to get access to the database using TOAD (you can’t change this later). Also make sure you select your VPC, Subnet Group, the Security Group we created earlier, and the Option Group we just created:
Press the Create Instance button and after about 15 minutes you will have a database. Note the connection parameters are available on the main RDS instances list:
To get access to the DB from a desktop using TOAD we have to adjust ACLs in the security group we created at the beginning of this section. In the VPC dashboard, under Security Groups, we edited the esb-test-rds group to add inbound access to port 1521 from 10.11.82.0/24 (the oasadmin VPN), and confirmed we could connect with the command “telnet esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com 1521”. We also added a rule permitting the hosts in the esb-test-ec2 subnets (actually the whole 10.39.8.0/24 network) to access the RDS instance:
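On desktops without telnet, the same reachability check can be done with a few lines of Python. This is a minimal sketch using only the standard library; the function name and the default timeout are arbitrary choices.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example, run from a host on the VPN:
#   can_connect("esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com", 1521)
```

A result of False only says the TCP handshake did not complete; it does not distinguish between a missing security group rule, a routing problem, and a database that is down.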
Finally we set up an entry in Oracle’s tnsnames.ora on our desktops:
ATSESBTEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)
      (HOST = esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com)
      (PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ORCL)))
and were able to connect using a TOAD command line like:
"C:\Program Files (x86)\Quest Software\TOAD for Oracle 12\Toad.exe" -c "amq/secret@ATSESBTEST"
Note the database host name “esb-test-amq.c6iqxbiri34d.us-east-1.rds.amazonaws.com” is not random. If you delete the database and recreate with the same DB Instance Identifier, “esb-test-amq”, the resulting DB will have the same host name.
You can confirm TDE is enabled with:
SELECT * FROM v$encryption_wallet;
Now create an encrypted table space (you can’t alter an existing table space and encrypt it) and set it as the default table space for the AMQ user:
CREATE TABLESPACE CRYPT_TS ENCRYPTION USING 'AES256' DEFAULT STORAGE (ENCRYPT);
ALTER USER AMQ default tablespace CRYPT_TS quota unlimited on CRYPT_TS;
You can leave the empty and now-unused USERS table space in place; it is not consuming enough space to worry about.
Once you start ActiveMQ you should see it creates three tables in the CRYPT_TS table space:
Future work: once we have created the RDS database in its final incarnation we will open a ticket with the SOC to create an onames LDAP entry for this database so that modifying our tnsnames.ora files is not necessary.
Create EC2 Instances in the VPC
Now create an EC2 instance in the new VPC.
In the EC2 console choose the Amazon Linux AMI:
I chose m3.medium instances:
In the next screen make sure you select the correct VPC in the Network field, and one of the subnets you created earlier. Also be sure you assign a public IP (without a public IP you can’t make outbound internet requests from the instance). You can also specify the private (Harvard) IP address you want for the instance. This IP will still be allocated via DHCP, but will be reserved for this instance. We’ve specified private IPs so we can create DNS CNAME records. For esbtest1 we’re using 10.39.8.10 and for esbtest2 we’re using 10.39.8.70.
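Before reserving private addresses it helps to confirm that each one falls inside the intended subnet and is not one of the addresses AWS keeps for itself. The sketch below does that with the standard `ipaddress` module; the function name is an illustrative choice.

```python
import ipaddress


def assignable_in(ip, subnet_cidr):
    """True if ip lies in the subnet and is not an AWS-reserved address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    address = ipaddress.ip_address(ip)
    if address not in subnet:
        return False
    # AWS keeps the first four addresses and the last one in each subnet.
    reserved = {subnet[i] for i in range(4)} | {subnet[-1]}
    return address not in reserved


print(assignable_in("10.39.8.10", "10.39.8.0/26"))   # esbtest1 in the first EC2 subnet
print(assignable_in("10.39.8.70", "10.39.8.64/26"))  # esbtest2 in the second EC2 subnet
print(assignable_in("10.39.8.70", "10.39.8.0/26"))   # wrong subnet
print(assignable_in("10.39.8.3", "10.39.8.0/26"))    # reserved by AWS
```

The first two checks pass and the last two fail, confirming that the two instances sit in different EC2 subnets.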
I took the default storage:
Give the instance an appropriate name:
Select the security group you configured earlier:
Repeat the previous steps to create a second EC2 instance, identical to the first except for host name and IP address. Make sure you create it in the other EC2 subnet so it runs in a second availability zone.
Create Load Balancers
Create Security Group
Create a new security group specifically for the load balancers you will be creating, so we can tweak the access controls independently of those for the backing EC2 instances. Name it “esb-test-elb”. Open ports 443 and 61617 to the OAS VPN. I’ve also opened all ports to connections from our VPC (circled in red below) for developer convenience.
Update our existing esb-test-ec2 (default) security group to permit access to the EC2 instances from the ELB:
Create Load Balancers
Create Load Balancers in the EC2 console. We need two load balancers: one for ServiceMix and one for ActiveMQ.
|Load Balancer||Monitors||Load Balance Ports|
|esbtest-smix||8101||443 -> 8181 (SSL); 8181 -> 8181|
|esbtest-amq||61616||443 -> 8161; 61616 -> 61616; 61617 -> 61616 (SSL)|
To create the ActiveMQ load balancer select the Load Balancers link in the left navigation pane of the EC2 console. Press the Create Load Balancer button.
You will select the VPC and the ports you wish to balance. In this case we chose our TEST VPC and ports 8161 and 61616, the ActiveMQ ports:
Note above that we choose to create an “Internal” load balancer. This will get a Harvard IP, so it will not be accessible outside Harvard. We did this because of routing issues with publicly accessible load balancers and Direct Connect.
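For reference, the ActiveMQ listeners could be described in the structure used by boto3's classic ELB client. This is a sketch under several assumptions: the protocol chosen for each listener (HTTPS for the web console, TCP and SSL for the broker) is inferred rather than stated in this document, and the certificate ARN, subnet ids and security group id are placeholders.

```python
def listener(lb_port, instance_port, protocol, instance_protocol,
             certificate_id=None):
    """Build one listener entry for a classic load balancer."""
    entry = {
        "Protocol": protocol,
        "LoadBalancerPort": lb_port,
        "InstanceProtocol": instance_protocol,
        "InstancePort": instance_port,
    }
    if certificate_id is not None:
        entry["SSLCertificateId"] = certificate_id
    return entry


# Hypothetical certificate ARN.
CERT = "arn:aws:iam::123456789012:server-certificate/example"

amq_listeners = [
    listener(443, 8161, "HTTPS", "HTTP", CERT),  # web console (assumed HTTPS)
    listener(61616, 61616, "TCP", "TCP"),        # broker, clear text
    listener(61617, 61616, "SSL", "TCP", CERT),  # broker, SSL ends at the ELB
]

# With AWS credentials configured, an internal balancer could be created
# like this (ids are placeholders):
#   import boto3
#   boto3.client("elb").create_load_balancer(
#       LoadBalancerName="esbtest-amq", Listeners=amq_listeners,
#       Subnets=["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
#       SecurityGroups=["sg-cccccccc"], Scheme="internal")
```

Setting `Scheme` to `internal` corresponds to the “Internal” choice in the console.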
Since one of the ports we’re exposing is SSL, we are prompted for a certificate and cipher. Here we reuse an existing certificate, but we could have chosen to enter a new one:
You will configure how AWS will determine if a node is healthy (here we monitor port 61616):
In the above dialog we also select settings for how often Amazon will confirm the nodes we are load balancing are up. We have set the health check interval to 10 seconds and the two thresholds to 2, meaning it will take Amazon up to 20 seconds to notice a node is down, or that a node has come back up.
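The 20-second figure is simply the check interval multiplied by the threshold. The sketch below shows that arithmetic alongside the health check settings in the structure used by boto3's `configure_health_check`; the `Timeout` value is an assumption, since it is not stated in the text.

```python
def detection_window(interval_seconds, threshold):
    """Approximate seconds of consecutive checks needed to change a node's state."""
    return interval_seconds * threshold


health_check = {
    "Target": "TCP:61616",       # port monitored for ActiveMQ
    "Interval": 10,              # seconds between checks
    "Timeout": 5,                # assumption: not given in the text
    "UnhealthyThreshold": 2,     # failed checks before a node is marked down
    "HealthyThreshold": 2,       # passed checks before a node is marked up
}

down_after = detection_window(
    health_check["Interval"], health_check["UnhealthyThreshold"])
up_after = detection_window(
    health_check["Interval"], health_check["HealthyThreshold"])
print(down_after, up_after)

# With AWS credentials configured, the settings could be applied like this:
#   import boto3
#   boto3.client("elb").configure_health_check(
#       LoadBalancerName="esbtest-amq", HealthCheck=health_check)
```

Lowering the interval or the thresholds shortens failover time at the cost of more frequent probes against the broker port.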
Next, select your subnets. Here we pick the two subnets where ActiveMQ will be running (the two subnets we have targeted for EC2 instances):
Assign a security group. Here we’ve selected the group we created specifically for load balancers:
Finally, add EC2 instances to the load balancer pool:
When you are done, the ELB will look like the following. Here we see one ActiveMQ instance in service and the other instance out of service, which is normal for ActiveMQ configured with a shared DB:
Next, create a second load balancer for ServiceMix. Repeat the above steps to create a second load balancer for the same EC2 instances, but this time balancing port 8181 and monitoring port 8101.
Amazon creates a DNS “A” record for the load balancer IP address. We can create our own DNS “CNAME” record pointing to this “A” record if we want to give the load balancer a Harvard domain name. I’ve requested that the Harvard network group create CNAME records for the following:
When creating an internal load balancer Amazon will randomly select IP addresses from the subnets you provide. There doesn’t appear to be any way to select specific IP addresses.
Terminating SSL on a Load Balancer
As described in the previous section, we’ve terminated SSL on the load balancers. The ELB functions as the SSL termination point and can forward traffic to the EC2 instances in clear text.
Here is the atsesbtest.cadm.harvard.edu ELB with a port 443 listener (and an SSL certificate) and we’re forwarding to port 8181 on the EC2 instances:
Note we have to set up an inbound rule in the Security Group to permit access to port 443.
In a similar manner we set up SSL on the atsesbtestmx.cadm.harvard.edu load balancer:
Here we’ve added two listeners; one on port 443 for the ActiveMQ web-based console, and one on port 61617 for the ActiveMQ broker itself.
If the application running on the EC2 instances does redirects, you have to be careful, because the application has no knowledge it is running on port 443. For both ServiceMix and ActiveMQ this was important. Here are a couple of URLs which did not work and the associated URLs which did: