
CS122B Project 5: Scaling Fabflix and Performance Tuning

Due: June 9, 2017, Friday, 11:45 pm. Submit on EEE.
Note that the submission cut-off time on EEE is one day after the official deadline, which allows you to use the 24-hour grace period if you choose to do so. After that, EEE will no longer accept submissions.

This project has the following tasks:

  1. JDBC connection pooling and prepared statements;
  2. Scaling Fabflix with a cluster of MySQL/Tomcat servers and a frontend load balancer;
  3. Measuring the performance of the Fabflix search feature.

Task 1: Connection Pooling and Prepared Statements

In this task, we will enable connection pooling and prepared statements in Fabflix.

Step 1: Make the necessary changes so that all JDBC statements involved in search use Prepared Statements. You can use this tutorial on prepared statements.
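
For illustration, here is a minimal sketch of a search query using a PreparedStatement; the table and column names ("movies", "title") are placeholders, so substitute your own schema:

import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class PreparedSearchExample {
    // A minimal sketch: the user's keyword is bound as a parameter
    // instead of being concatenated into the SQL string.
    // "movies" and "title" are placeholder names; use your own schema.
    public static List<String> searchTitles(Connection connection, String keyword)
            throws SQLException {
        List<String> titles = new ArrayList<>();
        String query = "SELECT title FROM movies WHERE title LIKE ?";
        try (PreparedStatement statement = connection.prepareStatement(query)) {
            statement.setString(1, "%" + keyword + "%"); // bound, not concatenated
            try (ResultSet rs = statement.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString("title"));
                }
            }
        }
        return titles;
    }
}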

Step 2: Enable JDBC connection pooling for Fabflix. We will use our running "TomcatTest" application to show how to change it to use connection pooling. Note: these instructions may not work for JSP.

  1. Download and deploy the following TomcatPooling.war file. The file TomcatPooling\META-INF\context.xml includes important information about the database, user name, password, and pooling configuration. If you want to change this file, you need to undeploy the war file, make the changes, build a new war file, and deploy it; otherwise, the Tomcat server may keep using the previous context.xml. Check http://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html for more information about how to configure connection pooling.
  2. Go to http://localhost:8080/TomcatPooling/servlet/TomcatTest to test your application. You should see a list of rows displayed on the webpage.
  3. Look into the following files for the main changes in this war file compared to the previous version without connection pooling:
    • \META-INF\context.xml.
    • \WEB-INF\sources\TomcatTest.java
    • \WEB-INF\web.xml (see the resource-ref tag).

You may also find some useful links about connection pooling in this tutorial.
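
For reference, on the Java side connection pooling amounts to a JNDI lookup of the DataSource declared in context.xml. Below is a minimal sketch, assuming the resource is named "jdbc/TestDB" (match whatever name your context.xml and the resource-ref tag in web.xml actually declare):

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

public class PooledConnectionExample {
    // Looks up the pooled DataSource declared in META-INF/context.xml and
    // referenced in WEB-INF/web.xml (resource-ref). "jdbc/TestDB" is an
    // assumed resource name; use whatever name your context.xml declares.
    public static Connection getConnection() throws NamingException, SQLException {
        Context initCtx = new InitialContext();
        Context envCtx = (Context) initCtx.lookup("java:comp/env");
        DataSource ds = (DataSource) envCtx.lookup("jdbc/TestDB");
        // getConnection() borrows a connection from the pool;
        // close() returns it to the pool instead of physically closing it.
        return ds.getConnection();
    }
}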

When you are done, write an explanation of how and where you use JDBC connection pooling in your code. You should submit this report to EEE.


Task 2: Scaling Fabflix

Step 1: Set up two AWS instances as two backend servers by following this MySQL replication tutorial. We call them the "master instance" and the "slave instance" (in the context of MySQL replication).

Step 2 (master/slave): Create a dummy user for two example Tomcat applications:

shell> mysql -u root -p
mysql> CREATE USER 'mytestuser'@'localhost' IDENTIFIED BY 'mypassword';
mysql> GRANT ALL ON *.* TO 'mytestuser'@'localhost';
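
To sanity-check the new user, you can try connecting over JDBC. The following is a sketch that assumes the MySQL Connector/J driver is on the classpath and that a database named "moviedb" exists; adjust the URL and credentials as needed:

import java.sql.Connection;
import java.sql.DriverManager;

public class UserCheck {
    public static void main(String[] args) throws Exception {
        // "moviedb" is an assumed database name; replace it with your own.
        String url = "jdbc:mysql://localhost:3306/moviedb";
        try (Connection conn = DriverManager.getConnection(url, "mytestuser", "mypassword")) {
            System.out.println("Connected as mytestuser: " + !conn.isClosed());
        }
    }
}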

Step 3 (master/slave): Set up Tomcat on each master/slave instance. (You have done this many times by now.)

Step 4 (master/slave): On each master/slave instance, deploy TomcatTest.war. Make the URL http://PUBLIC_IP:8080/TomcatTest/servlet/TomcatTest work. You may need to modify the username/password and the IP address (use the internal AWS instance address; note that if Tomcat connects via the internal IP rather than localhost, the MySQL user from Step 2 may also need to be created with that host instead of 'localhost'). Also make sure to modify the AWS security group settings for these two instances to allow remote access to their port 8080.

Step 5 (master/slave): On each master/slave instance, deploy Session.war. Make the URL http://PUBLIC_IP:8080/Session/servlet/ShowSession?myname=Michael work.

Step 6 (instance 1): On the instance that runs the original Fabflix instance (called "instance 1"), set up Apache and its proxy by doing the following:

  1. Install Apache2 and related modules:
    instance1-shell> sudo apt-get install apache2
    instance1-shell> sudo a2enmod proxy proxy_balancer proxy_http rewrite headers lbmethod_byrequests
    instance1-shell> sudo service apache2 restart
    
  2. Configure the Apache2 web server to use its proxy_balancer module to distribute (i.e., redirect) requests to the backend instances. To do so, edit the following configuration file:
    instance1-shell> sudo vim /etc/apache2/sites-enabled/000-default.conf
    

Create a load balancer proxy whose members are the backend instances. In particular, define the proxy at the top of the file, before the <VirtualHost *:80> tag.

<Proxy "balancer://TomcatTest_balancer">
    BalancerMember "http://172.2.2.2:8080/TomcatTest/"
    BalancerMember "http://172.3.3.3:8080/TomcatTest/"
</Proxy>

Here we assume '172.2.2.2' and '172.3.3.3' are the private IP addresses of the master and slave instances, respectively.

  3. Add two new rules in the body of the VirtualHost tag:
    ProxyPass /TomcatTest balancer://TomcatTest_balancer
    ProxyPassReverse /TomcatTest balancer://TomcatTest_balancer
    
  4. Restart Apache:
    instance1-shell> sudo service apache2 restart
    
  5. Modify the security group of the two backend instances to allow instance 1 to access their port 8080.

These settings redirect HTTP requests to "instance1_IP/TomcatTest" to one of the two backend instances. To test it, point a browser to http://instance1_IP/TomcatTest/servlet/TomcatTest. Be sure to open port 80 of instance 1 to your IP address. Then check the Tomcat logs of the two backend instances:

instance2-shell> tail -f /var/log/tomcat8/*.log /var/log/tomcat8/*.txt /var/log/tomcat8/*.out

One of them should receive the request. Keep refreshing the page to send multiple requests, and check whether the two backends receive the requests evenly.

Step 7 (instance 1): Configure the proxy on instance 1 to handle sessions properly. Since the current setting distributes requests across the backends without regard to the client, cookies will not be passed properly, causing sessions to fail. We want the session to persist over several requests from the same client, i.e., we want a sticky session. To do so, read the instructions, especially those under "Examples of a balancer configuration." Here is a sample setting in the /etc/apache2/sites-enabled/000-default.conf file for the "Session.war" application:

Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://Session_balancer">
    BalancerMember "http://172.2.2.2:8080/Session" route=1
    BalancerMember "http://172.3.3.3:8080/Session" route=2
ProxySet stickysession=ROUTEID
</Proxy> 

Also do the following:

  1. Add two new rules in the body of the VirtualHost tag:
    ProxyPass /Session balancer://Session_balancer
    ProxyPassReverse /Session balancer://Session_balancer
    
  2. Restart Apache:
    instance1-shell> sudo service apache2 restart
    

Test whether it works by pointing to the URL http://instance1_IP/Session/servlet/ShowSession?myname=Michael of instance 1. All requests from the same client should reach only one of the backend instances.
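
If you want to verify stickiness outside a browser, the following sketch (assuming Java 11's java.net.http API; replace instance1_IP with your actual IP) sends several requests within one cookie session. Since the CookieManager resends the ROUTEID cookie set by the balancer, all requests should reach the same backend, which you can confirm in the Tomcat logs:

import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StickySessionCheck {
    public static void main(String[] args) throws Exception {
        // The CookieManager stores the ROUTEID cookie set by the balancer,
        // so subsequent requests are routed to the same backend.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://instance1_IP/Session/servlet/ShowSession?myname=Michael"))
                .build();
        for (int i = 0; i < 5; i++) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        }
        // Check the Tomcat logs: only one backend should have received all five requests.
    }
}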

Here's a sample Apache configuration file 000-default.conf.

Step 8 (main task): Deploy your Fabflix system to the two backend instances. Set up MySQL master/slave replication. Configure the original instance properly to enable load balancing, connection pooling, and sticky sessions. Also make sure that write requests are sent to the master MySQL instance, while read requests may be sent to either the master or the slave MySQL instance (one possible approach is sketched below). Enabling HTTPS for the scaled version is optional. Note that you are required to add a section to the connection pooling report explaining how to use connection pooling in the case of having two backend servers.
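
One possible (not required) way to do the read/write routing is to declare two pooled DataSources in context.xml, one per MySQL instance, and pick one per query type. The JNDI names "jdbc/masterDB" and "jdbc/slaveDB" below are assumptions for illustration:

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;

public class ReadWriteRouter {
    // "jdbc/masterDB" and "jdbc/slaveDB" are assumed JNDI names; declare one
    // pooled DataSource per MySQL instance in META-INF/context.xml.
    public static Connection getConnection(boolean isWrite)
            throws NamingException, SQLException {
        Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
        // Writes must go to the master; reads may go to either instance.
        String name = isWrite ? "jdbc/masterDB"
                : (Math.random() < 0.5 ? "jdbc/masterDB" : "jdbc/slaveDB");
        DataSource ds = (DataSource) envCtx.lookup(name);
        return ds.getConnection();
    }
}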

The following is the architecture diagram:

Notice that in this architecture, each Tomcat can talk to both MySQL instances. Sometimes we may want to set up a cluster of MySQL instances and let each Tomcat talk to the cluster through another load balancer. If interested, you can read this page on how to set it up.

Step 9 (Google Cloud): After finishing the main task, install and configure the load balancer on one Google Cloud instance. As before, this load balancer needs to redirect requests to one of the Tomcat servers. However, it must use the public IPs of the AWS instances, so make sure port 8080 of the AWS instances is open to the Google Cloud instance.

You can get $300 of free credits for Google Cloud from the free-trial page. After that, start a Compute Engine instance (using Ubuntu 16.04) by following this tutorial, and repeat the exact same steps you used to install the load balancer on AWS.


Task 3: Measuring the performance of the Fabflix search feature

In this part, we will measure the performance of the keyword search feature that you implemented in the past projects. The measurement results described in subtasks 3.1 and 3.2 must be reported for both the single-instance version (i.e., the instance that you prepared in earlier projects and Task 1) and the scaled version of Fabflix. Note: the URL of the single-instance version should be http://INSTANCE1_PUBLIC_IP:8080/fabflix, while the scaled version that you prepared in Task 2 should be at http://INSTANCE1_PUBLIC_IP:80/fabflix.

Task 3.1: Preparing the codebase for time measurement

Here, we are going to prepare for measuring the following two statistics: (1) the average time it takes for the search servlet to run completely for a query (called TS), and (2) the average time spent per query on the parts that use JDBC (called TJ).

Step 1. Use the following sample to insert the necessary timing statements for measuring TS and TJ. You are required to measure and log the values of "search servlet total execution time" and "JDBC execution time" for every request served by the server (i.e., assuming these values are printed one line per query, if a query workload of 1000 queries is fired at the system, the log file must contain 1000 lines, each containing one sample value for calculating TS and TJ).

Particularly for TS samples, it is highly recommended to place these log statements in a filter that wraps the search servlet (a sketch of such a filter follows the sample below).

// Time an event in a program to nanosecond precision
long startTime = System.nanoTime();
/////////////////////////////////
/// ** part to be measured ** ///
/////////////////////////////////
long endTime = System.nanoTime();
long elapsedTime = endTime - startTime; // elapsed time in nanoseconds. Note: print the values in nanoseconds
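
A minimal sketch of such a filter is shown below; map it to the search servlet's URL pattern in web.xml. How you persist each line (the servlet log, log4j, or a plain file) is up to you; here the servlet context log is used purely for illustration:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Logs a TS sample (in nanoseconds) for every request to the search servlet.
public class TimerFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {}

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        long startTime = System.nanoTime();
        chain.doFilter(request, response); // run the wrapped search servlet
        long elapsedTime = System.nanoTime() - startTime;
        // One line per query: a TS sample in nanoseconds.
        request.getServletContext().log("TS," + elapsedTime);
    }

    public void destroy() {}
}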

Step 2. Write a script (in any language you prefer) to process the resulting log file of a query workload and calculate TS and TJ (i.e., parse the log statements and take the average of all the samples).
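
For instance, here is a sketch in Java that assumes each log line ends with "TS,<nanoseconds>" or "TJ,<nanoseconds>" (the format produced by the filter sketch above); adapt the parsing to your own log format:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Reads a log file whose lines end with "TS,<nanos>" or "TJ,<nanos>"
// (an assumed format; adapt to yours) and prints the average of each.
// Usage: java LogAverages <path-to-log-file>
public class LogAverages {
    public static void main(String[] args) throws IOException {
        long tsSum = 0, tjSum = 0;
        int tsCount = 0, tjCount = 0;
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            // A line may carry a logger prefix (e.g., a timestamp),
            // so search for the marker anywhere in the line.
            int ts = line.lastIndexOf("TS,");
            int tj = line.lastIndexOf("TJ,");
            try {
                if (ts >= 0) { tsSum += Long.parseLong(line.substring(ts + 3).trim()); tsCount++; }
                else if (tj >= 0) { tjSum += Long.parseLong(line.substring(tj + 3).trim()); tjCount++; }
            } catch (NumberFormatException e) {
                // skip malformed or unrelated lines
            }
        }
        System.out.println("TS (avg nanoseconds): " + (tsCount == 0 ? 0 : tsSum / tsCount));
        System.out.println("TJ (avg nanoseconds): " + (tjCount == 0 ? 0 : tjSum / tjCount));
    }
}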

Submission-related note: The usage of this script, which is expected to be found at the root of your .war file, must be explained in your README file.

Task 3.2: Preparing the test plan in Apache JMeter

In this part, you will use Apache JMeter to measure the performance of the search feature of the Fabflix website. In particular, you must measure the average query time of the search feature using a set of queries based on the movie titles in this file. Assume the page size is 50, and we only want the first page of results.

The following figure illustrates the round-trip time of a query from a client to the server and back to the client. The query time of a query (Tq) is the total time from the moment the search request is sent by the client (Ts) until the response has been completely received by the client (Te). It includes two major parts: (1) the response time (Tr), the time it takes until the client hears the first bit of the response, and (2) the "payload time" (Tp), the time it takes for the response data to be completely downloaded by the client. In other words, Tq = Te - Ts = Tr + Tp.

Step 1: Read this reference to get an overview of JMeter. Read this page to get familiar with JMeter basics.

Step 2: Download and install JMeter from this link.

Step 3: Use this link to make a test plan for the search feature of your website. You will run the JMeter test from your local development machine against the remote AWS instance. The plan must iteratively generate a proper HTTP or HTTPS search request for every movie title in the provided query file. Here is a useful page about how to use a CSV file with JMeter.

Task 3.3: Collecting the performance results

Run the tests for all of the following settings to collect performance results. For each case, remember to make the necessary changes to the JMeter test plan and/or the codebase. Use the results to fill out this HTML file as your measurement report. For each case, report the requested values in the corresponding columns, and write a short analysis of that case in the last column. This image is an example of what you should report in the second column, called "Graph Results Screenshot".

Notes:

  1. In all cases, if not mentioned otherwise, your Fabflix codebase is assumed to use both the Prepared Statements and Connection Pooling optimization techniques.
  2. If more than one JMeter thread is to be used, each thread should start a new session in Tomcat (i.e., threads should not share a session-id).

Single-instance cases (i.e., accessible via http://INSTANCE1_PUBLIC_IP:8080/fabflix):

  1. Use HTTP, 1 thread in JMeter.
  2. Use HTTP, 10 threads in JMeter.
  3. Use HTTPS, 10 threads in JMeter.
  4. Use HTTP, without using prepared statements, 10 threads in JMeter.
  5. Use HTTP, without using connection pooling, 10 threads in JMeter.

Scaled-version cases (i.e., accessible via http://INSTANCE1_PUBLIC_IP:80/fabflix):

  1. Use HTTP, 1 thread in JMeter.
  2. Use HTTP, 10 threads in JMeter.
  3. Use HTTP, without using prepared statements, 10 threads in JMeter.
  4. Use HTTP, without using connection pooling, 10 threads in JMeter.

Preparing the Package for Submission

  1. There will be no demo for this project.
  2. You should take the following steps to prepare your package for submission. When prepared, submit the package to EEE (one submission per team).
    • Create a directory called "project5_[GROUP ID]" on your local machine. You are required to include all three reports, the README, and the "fabflix_webapp.war".
    • In README, write down the address of your AWS instance and Google Cloud instance.
    • Open ports 80 and 8080 of your AWS instances and your Google Cloud instance to the following addresses:
      • 128.195.0.0/16
