Configuring Transparent Proxy Server on AWS VPC NAT instance for Controlled Access to the S3 bucket / Specific URL

Goal: We use an S3 bucket to store sensitive data and process it on EC2 instances located in the private subnet of a VPC with public and private subnets. To control access to the account/user S3 bucket we set up S3 bucket policies by IP and IAM user ARNs, so I consider the data in the S3 bucket to be 'on the safe side'. The issue with this approach is that a private subnet VM should be able to access only the specific/user S3 bucket, which is not directly possible with AWS security group and network ACL configuration. This leads to a serious security leak: a user uploads a malware application (or a simple curl command) to the EC2 instance, and while the data is being processed the malware or curl command transfers data to other (unauthorized) S3 buckets under a different AWS account.
To avoid this security leak we need to prevent the EC2 instance from uploading data to any other S3 bucket or HTTPS URL.
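
For reference, a bucket policy of the kind mentioned above (restricting by source IP and IAM user ARN) might look roughly like the following sketch. The function name, the account ID, the user name and the IP address are placeholders that do not come from this post, and the policy assumes S3 sees requests arriving from the NAT instance's public IP.

```shell
# Print a bucket policy that allows one IAM user to read and write objects,
# but only when the request comes from the given source address.
# $1 = bucket name, $2 = IAM user ARN, $3 = source CIDR (all hypothetical)
bucket_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowKnownUserFromKnownIp",
      "Effect": "Allow",
      "Principal": { "AWS": "$2" },
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::$1/*",
      "Condition": { "IpAddress": { "aws:SourceIp": "$3" } }
    }
  ]
}
EOF
}

# Hypothetical usage:
#   bucket_policy mybucket arn:aws:iam::111122223333:user/example-user 203.0.113.10/32
```

A policy like this protects the bucket itself, but it does nothing to stop the instance from writing to somebody else's bucket, which is the gap the rest of this post addresses.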

Problem: Is it possible to restrict access at the VPC firewall so that some specific S3 buckets are reachable while access to any other bucket is denied? Assume that a user might upload a malware application to the EC2 instance and use it to upload data to other buckets (under a third-party AWS account).

“To solve this problem I spent a hell of a lot of time, because only partial information is available on the internet. So I thought of writing a blog post on how to configure it, so that geeks around the world don't face the same problem.”

Possible Solution: We should configure a transparent HTTP/HTTPS proxy server that does URL filtering for all outgoing requests from the private subnet to the internet. The solution seems very easy, right? Before going into it, let's understand what these proxy servers, the AWS private/public VPC and NAT actually are:

VPC with Public and Private Subnets:

This is the configuration for a virtual private cloud (VPC) with a public subnet and a private subnet. It is the recommended scenario if you want to run a public-facing web application while maintaining back-end servers that aren’t publicly accessible. A common example is a multi-tier website, with the web servers in a public subnet and the database servers in a private subnet. You can set up security and routing so that the web servers can communicate with the database servers.

The instances in the public subnet can receive inbound traffic directly from the Internet, whereas the instances in the private subnet can’t. The instances in the public subnet can send outbound traffic directly to the Internet, whereas the instances in the private subnet can’t. Instead, the instances in the private subnet can access the Internet by using a network address translation (NAT) instance that you launch into the public subnet.

More information: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Scenario2.html

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html

What is a Proxy Server?

A Proxy server is an intermediary machine, between a client and the actual server, which is used to filter or cache requests made by the client.

Normal (Regular/Caching) Proxy:

A regular caching proxy server is a server which listens on a separate port (e.g. 3128) and the clients (browsers) are configured to send requests for connectivity to that port. So the proxy server receives the request, fetches the content and stores a copy for future use. So next time when another client requests for the same webpage the proxy server just replies to the request with the content in its cache thus improving the overall request-reply speed.

Transparent Proxy:

A transparent proxy server is also a caching server, but it is configured in such a way that it eliminates the client-side (browser-side) configuration. Typically the proxy server resides on the gateway, intercepts the WWW requests (port 80, 443 etc.) from the clients, fetches the content the first time and subsequently replies from its local cache. The name "transparent" comes from the fact that the client doesn’t know that a proxy server is mediating its requests. Transparent proxy servers are mostly used in big corporate organizations where client-side configuration is not easy (due to the number of clients). This type of server is also used by ISPs to reduce bandwidth usage.

Reverse Proxy:

A reverse proxy is totally different in its usage because it is used for the benefit of the web server rather than its clients. Basically a reverse proxy is on the web server end which will cache all the static answers from the web server and reply to the clients from its cache to reduce the load on the web server. This type of setup is also known as Web Server Acceleration.

References:

http://en.wikipedia.org/wiki/Proxy_server

http://www.webupd8.org/2010/02/differences-between-3-types-of-proxy.html


Solution/Approach:

As we have seen, in a VPC with private and public subnets all outgoing internet requests generated from the private subnet are routed to the NAT instance. So we need to configure our proxy server solution on the NAT instance itself. If you are provisioning the AWS VPC manually, you need to configure the proxy server after the NAT instance is ready. If you want to remove this manual configuration, you can create an AMI for the NAT instance and use that AMI for the NAT instance in a CloudFormation template for the VPC with private and public subnets.

We will use Squid (http://www.squid-cache.org/) as the proxy server for filtering HTTP and HTTPS URLs. We will configure the NAT instance to use Squid as a transparent proxy server for HTTP and HTTPS URLs.

Squid works fine as a transparent proxy server for HTTP URLs, but it does not work for HTTPS, for the following reason.

Why do HTTPS filtering exclusions not work when Squid intercepts HTTPS connections transparently? If your Squid proxy is configured to transparently intercept and decrypt HTTPS connections, then the HTTPS domain name exclusions used in Squid URL filtering cannot be applied. The reason is simple: the domain name is not available at the time when Squid needs to decide whether to decrypt the HTTPS connection or not. Only the IP addresses of the client and server are available. The domain name becomes available only after HTTPS decryption.
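
The point can be illustrated with a toy matcher. The sketch below is not Squid's code. It only mimics how a dstdomain entry is compared against a host name (an exact name, or a leading dot for a domain and its subdomains), and shows that a bare IP address, which is all an intercepting proxy has before decryption, cannot match a bucket name. The function name and the 192.0.2.10 address are illustrative assumptions.

```shell
# Return 0 if the host matches a Squid-style dstdomain pattern, 1 otherwise.
# $1 = pattern ("name" for an exact match, ".name" for name and subdomains)
# $2 = host as seen by the proxy (a DNS name, or just an IP address)
dstdomain_match() {
  case "$1" in
    .*)
      base=${1#.}
      case "$2" in
        "$base" | *".$base") return 0 ;;
      esac
      return 1
      ;;
    *)
      if [ "$2" = "$1" ]; then
        return 0
      fi
      return 1
      ;;
  esac
}

# Explicit proxy: the client names the host, so the ACL can match.
dstdomain_match mybucket.s3.amazonaws.com mybucket.s3.amazonaws.com && echo "name: allowed"
# Intercepted HTTPS: only the destination IP is known, so nothing matches.
dstdomain_match mybucket.s3.amazonaws.com 192.0.2.10 || echo "ip only: no match"
```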

So we configure Squid as a transparent proxy server for HTTP requests and as a normal (explicit) HTTPS proxy server for HTTPS URLs, so that we can support HTTPS URL filtering as well. We will also configure the NAT instance's iptables rules to send all internet requests generated from the private subnet through Squid, so that they get blocked if the proxy settings are not used.

Follow the steps below to configure Squid on the NAT instance:

Steps for configuring a custom NAT instance AMI and installing the Squid proxy server:

1. Launch an EC2 instance using the Amazon Linux AMI (64-bit) from the marketplace. (Note that the custom NAT instance AMI must be based on the Amazon Linux AMI; AMIs of other Linux flavors do not work as a NAT instance AMI.)

2. Log in to the EC2 instance and copy /usr/local/sbin/configure-pat.sh from an existing NAT instance you have, or create the file /usr/local/sbin/configure-pat.sh with the content below.


#!/bin/bash

# Configure the instance to run as a Port Address Translator (PAT) to provide
# Internet connectivity to private instances.
#

set -x
echo "Determining the MAC address on eth0"
ETH0_MAC=`/sbin/ifconfig | /bin/grep eth0 | awk '{print tolower($5)}' | grep '^[0-9a-f]\{2\}\(:[0-9a-f]\{2\}\)\{5\}$'`
if [ $? -ne 0 ] ; then
    echo "Unable to determine MAC address on eth0" | logger -t "ec2"
    exit 1
fi
echo "Found MAC: ${ETH0_MAC} on eth0" | logger -t "ec2"
VPC_CIDR_URI="http://169.254.169.254/latest/meta-data/network/interfaces/macs/${ETH0_MAC}/vpc-ipv4-cidr-block"
echo "Metadata location for vpc ipv4 range: ${VPC_CIDR_URI}" | logger -t "ec2"
VPC_CIDR_RANGE=`curl --retry 3 --retry-delay 0 --silent --fail ${VPC_CIDR_URI}`
if [ $? -ne 0 ] ; then
    echo "Unable to retrieve VPC CIDR range from meta-data. Using 0.0.0.0/0 instead. PAT may not function correctly" | logger -t "ec2"
    VPC_CIDR_RANGE="0.0.0.0/0"
else
    echo "Retrieved the VPC CIDR range: ${VPC_CIDR_RANGE} from meta-data" | logger -t "ec2"
fi

# Enable IP forwarding, disable ICMP redirects and masquerade traffic from the VPC
echo 1 > /proc/sys/net/ipv4/ip_forward && \
echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects && \
/sbin/iptables -t nat -A POSTROUTING -o eth0 -s ${VPC_CIDR_RANGE} -j MASQUERADE

if [ $? -ne 0 ] ; then
    echo "Configuration of PAT failed" | logger -t "ec2"
    exit 0
fi
echo "Configuration of PAT complete" | logger -t "ec2"
exit 0


 

3. Make the script executable and have it run at system boot time:

chmod +x /usr/local/sbin/configure-pat.sh

Add the following entry at the end of /etc/rc.local:

/usr/local/sbin/configure-pat.sh
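
If this step is scripted (for example while baking the AMI), it helps to make the edit repeatable. The helper below is a minimal sketch with a hypothetical name, append_once. It adds a line to a file only when that exact line is not already present, so running it twice does not duplicate the rc.local entry.

```shell
# Append a line to a file unless the file already contains that exact line.
# $1 = line to add, $2 = target file
append_once() {
  if ! grep -qxF -- "$1" "$2" 2>/dev/null; then
    printf '%s\n' "$1" >> "$2"
  fi
}

# Hypothetical usage on the NAT instance (run as root):
#   append_once /usr/local/sbin/configure-pat.sh /etc/rc.local
```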

4. Follow the steps below to install Squid. The Amazon Linux AMI (64-bit) comes with its own OpenSSL package, which is not compatible with Squid.

4.1 Configure and install OpenSSL from source

yum update
yum install wget
wget http://www.openssl.org/source/openssl-1.0.0o.tar.gz
tar -zxvf openssl-1.0.0o.tar.gz
cd openssl-1.0.0o
./config shared --prefix=/opt/squid/openssl --openssldir=/opt/squid/openssl
make
make install
mv /usr/bin/openssl /usr/bin/openssl_back
echo "/opt/squid/openssl/lib" >> /etc/ld.so.conf
ldconfig
cd

Add the following lines to the /etc/profile file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/squid/openssl/lib/
export PATH=$PATH:/opt/squid/openssl/bin/

Then reload the profile:
source /etc/profile

4.2 Build and install Squid

yum install -y perl gcc autoconf automake make sudo wget gcc-c++
yum install -y libxml2-devel libcap-devel
yum install -y libtool-ltdl-devel
yum install -y glibc-static glibc
yum install -y libstdc++-devel* g++
wget http://www.squid-cache.org/Versions/v3/3.4/squid-3.4.9-20141031-r13187.tar.gz
tar -zxvf squid-3.4.9-20141031-r13187.tar.gz
cd squid-3.4.9-20141031-r13187
./configure --enable-ssl-crtd --enable-ssl --prefix=/usr --includedir=/usr/include --datadir=/usr/share --bindir=/usr/sbin --libexecdir=/usr/lib/squid --localstatedir=/var --sysconfdir=/etc/squid --with-openssl=/opt/squid/openssl
make
make install
mkdir -p /var/lib/ssl_db
cd /usr/lib/squid/
./ssl_crtd -c -s /var/lib/ssl_db
squid -z

5. Configure Squid as a transparent HTTP proxy. Copy the following configuration into the Squid configuration file (/etc/squid/squid.conf with the build options above):

# List your bucket DNS names (space separated) to enable only the selected buckets of your environment
acl aws_bucket dstdomain mybucket.s3.amazonaws.com
# VPC CIDR (10.0.0.0/16); change it according to your VPC CIDR
acl localnetsrc src 10.0.0.0/16
# VPC CIDR (10.0.0.0/16); change it according to your VPC CIDR
acl localnetdst dst 10.0.0.0/16
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports

# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
http_access allow localnetsrc localnetdst
http_access allow localnetsrc aws_bucket
#http_access allow localnetsrc
#http_access allow localnetdst
# And finally deny all other access to this proxy
http_access deny all
# ssl-bump settings managed by Diladele Web Safety for Squid Proxy
sslproxy_cert_error allow aws_bucket
sslproxy_cert_error deny all
ssl_bump none localhost
ssl_bump none localnetsrc
ssl_bump none localnetdst
ssl_bump server-first aws_bucket
#ssl_bump none all
# configure ports
http_port 3127
#configured http proxy as transparent
http_port 3128 intercept
#configured https proxy as plain https proxy to support https url filtering.
http_port 3129 ssl-bump generate-host-certificates=on dynamic_cert_mem_cache_size=4MB cert=/etc/squid/ssl/squidCA.pem
# configure path to ssl cache
sslcrtd_program /usr/lib/squid/ssl_crtd -s /var/lib/ssl_db -M 4MB
# Uncomment and adjust the following to add a disk cache directory.
#cache_dir ufs /var/cache/squid 100 16 256
# Leave coredumps in the first cache dir
coredump_dir /var/cache/squid
#
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
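
Since the whole setup hinges on the aws_bucket ACL, it can be worth checking which names it actually contains before starting Squid. The helper below is a small sketch (the function name is an assumption, not part of Squid) that prints the domains listed for a dstdomain ACL in a configuration file.

```shell
# Print every domain listed for a given dstdomain ACL in a Squid config file.
# $1 = config file, $2 = ACL name (aws_bucket in the configuration above)
list_acl_domains() {
  awk -v name="$2" '
    $1 == "acl" && $2 == name && $3 == "dstdomain" {
      for (i = 4; i <= NF; i++) {
        if ($i ~ /^#/) break   # stop at a trailing comment
        print $i
      }
    }' "$1"
}

# Hypothetical usage on the NAT instance:
#   list_acl_domains /etc/squid/squid.conf aws_bucket
#   squid -k parse    # lets Squid itself parse the configuration and report errors
```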

6. Run the following iptables commands on the NAT instance to open the proxy ports and configure the transparent proxy settings:

sudo /sbin/iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 3128
sudo /sbin/iptables -t nat -A PREROUTING -i eth0 -p tcp -m tcp --dport 443 -j REDIRECT --to-ports 3129
sudo /sbin/iptables -I INPUT 1 -p tcp --dport 3127 -j ACCEPT
sudo /sbin/iptables -I INPUT 1 -p tcp --dport 3128 -j ACCEPT
sudo /sbin/iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT
sudo /etc/init.d/iptables save
sudo /etc/init.d/iptables restart
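
After the restart it is worth confirming that the redirect rules survived. The sketch below assumes the rule listing produced by iptables -t nat -S PREROUTING is piped in on standard input. The helper name has_redirect is made up for this example.

```shell
# Read a rule listing on stdin and succeed if it contains a REDIRECT rule
# from the given destination port to the given proxy port.
# $1 = destination port (80 or 443), $2 = proxy port (3128 or 3129)
has_redirect() {
  grep -q -e "--dport $1 .*-j REDIRECT --to-ports $2\$"
}

# Hypothetical usage on the NAT instance:
#   sudo /sbin/iptables -t nat -S PREROUTING | has_redirect 80 3128 && echo "HTTP redirect present"
#   sudo /sbin/iptables -t nat -S PREROUTING | has_redirect 443 3129 && echo "HTTPS redirect present"
```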

7. Configure SSL certificate and run squid

cd /etc/squid/
mkdir ssl
cd ssl/
openssl req -new -newkey rsa:1024 -days 365 -nodes -x509 -keyout squidCA.pem -out squidCA.pem
squid start

Now, on your private subnet VM, the HTTP proxy works automatically without any changes on the client side, i.e. on the private subnet VM. But if you want to use SSL, you need to run export https_proxy="https://squid_ip:3129" on the client; Squid will then filter each request and forward it to its destination if it matches the defined rules. If a user on the client unsets the proxy setting, Squid blocks all HTTPS requests, because Squid is not configured as a transparent SSL proxy. With the above approach you can achieve controlled access for your private subnet VMs.
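
On the client side this could be wrapped in a small helper, sketched below under a few assumptions. The name proxy_exports is made up, squid_ip stands for the NAT instance address, and the bucket names in the usage comments are placeholders. Two caveats that this post did not test: some newer clients treat an https:// proxy URL as a TLS connection to the proxy itself, so an http:// scheme may be needed against the http_port above, and because connections to the allowed bucket are bumped, clients presumably have to trust squidCA.pem.

```shell
# Print the export lines a client shell needs in order to use the proxy.
# $1 = proxy URL (placeholder; substitute your NAT instance address)
proxy_exports() {
  printf 'export https_proxy="%s"\n' "$1"
  printf 'export HTTPS_PROXY="%s"\n' "$1"
}

# Hypothetical usage on a private subnet VM:
#   eval "$(proxy_exports https://squid_ip:3129)"
#   curl -I https://mybucket.s3.amazonaws.com/      # should pass the aws_bucket ACL
#   curl -I https://otherbucket.s3.amazonaws.com/   # should be denied by Squid
```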

To run Squid in debug mode:

squid -NCd9

 


 

About suniluiit

Technical Architect working in Big Data and Cloud technologies for the last 5 years, with overall software industry experience of around 9 years. Architected and working on the Impetus Workload Migration product, which allows organizations to save 50%-80% of manual offloading time and cost. It provides faster, parallel and scalable data migration to Hadoop along with incremental data options. It also maximizes existing investments in code and the reuse of SQL scripts. Architected and developed a cloud-agnostic application for deployment and configuration management of enterprise applications, including technology stacks like CQ5, Cassandra, Solr, application server, web server, HAProxy, F5 and messaging server. Experienced in working with and leading R&D teams building new expertise in fields such as Big Data, ETL offloading to Big Data and cloud computing. Worked on some of the Impetus open source products around Big Data and social media: http://code.google.com/p/hadoop-toolkit/ http://code.google.com/p/zing https://github.com/impetus-opensource Specialties: Big Data, Hadoop, Hive, Sqoop, Spark, J2EE/SOA, NoSQL, Cassandra, HBase, Cloud Computing (private/hybrid/public: AWS, Google, Azure, Rackspace, OpenStack, VMware, Terremark), RabbitMQ, Kafka, Memcached, Puppet, HypericHQ, Splunk etc.

3 Responses to Configuring Transparent Proxy Server on AWS VPC NAT instance for Controlled Access to the S3 bucket / Specific URL

  1. Pawan says:

    Great post Sunil.. Its working like a charm. You Rock.!!

  2. Hi,

    Great write-up. It should be noted that you can now do transparent proxying over SSL without needing to set https_proxy. I found the instructions here, and combined them with yours:

    http://6pmsolutions.com/2013/11/18/squid-transparent-ssl-interception/

  3. rohan says:

    “private subnet VM should be able to access only the specific/user S3 bucket which is not directly possible with AWS Security group and Network ACL configuration”

    Can you not use IAM policies or s3 bucket policies to control access to specific user/s3 bucket?
