wiki:cs122b-2017-winter-project5
Last modified 8 months ago Last modified on 03/17/17 15:11:40

CS122B Project 5: Scaling Fabflix and Performance Tuning

Due: March 18, 2017, Saturday, 11:45 pm. Submit on EEE.
Notice that we use 1 day after the official deadline as the submission cut-off time on EEE to allow you to use the 24-hour grace period if you chose so. After that, EEE will no longer accept submissions.

This project has the following tasks:

  1. JDBC connection pooling and Prepared statements
  2. Scaling Fabflix with a cluster of MySQL/Tomcat and a frontend load balancer;
  3. Measuring the performance of Fabflix search feature

Task 1: Connection Pooling and Prepared Statements

In this task, we will enable Fabflix with Connection Pooling and Prepared Statements.

Step 1: Make the necessary changes and make sure to use Prepared Statements in all JDBC statements involved in search. You can use this tutorial on prepared statements.

Step 2: Enable JDBC connection pooling for Fabflix. We will use our running application of "TomcatTest" to show how to change it to use connection pooling. Note: We do not know whether the instructions will work for JSP or not.

  1. Download and deploy the following TomcatPooling.war file. The file TomcatPooling\META-INF\context.xml includes important information about the database, user name, password, and pooling configuration. If you want to change this file, you need to undeploy the war file, make those changes, make a new war file, and deploy the new war file. Otherwise, the Tomcat server may still use the previous context.xml. Check http://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html for more information about how to configure connection pooling.
  2. Go to the link http://localhost:8080/TomcatPooling/servlet/TomcatTest to test your application. You must see a list of rows displayed on the webpage.
  3. Look into the following files for the main changes in this war file compared to the previous war file without connection pooling:
    • \META-INF\context.xml.
    • \WEB-INF\sources\TomcatTest.java
    • \WEB-INF\web.xml (see the resource-ref tag).

You may also find some useful links for connection pooling from this tutorial.

When you are done, write an explanation of how/where you use the JDBC connection pooling in your code. You should submit this report to EEE.


Task 2: Scaling Fabflix

Step 1: Setup two AWS instances as two backend servers by following this MySQL replication tutorial. We call them the "master instance" and the "slave instance" (in the context of MySQL replication).

Step 2 (master/slave): Create a dummy user for two example Tomcat applications:

shell> mysql -u root -p
mysql> CREATE USER 'mytestuser'@'localhost' IDENTIFIED BY 'mypassword';
mysql> GRANT ALL ON *.* TO 'mytestuser'@'localhost';

Step 3 (master/slave): Setup Tomcat on each master/slave instance. (You should have done it many times.)

Step 4 (master/slave): On each master/slave instance, deploy TomcatTest.war. Make the URL http://PUBLIC_IP:8080/TomcatTest/servlet/TomcatTest work. You may need to modify the username/password and IP address (use the internal AWS instance address). Also make sure to modify the AWS security group setting for these two instances to allow remote access to their 8080 port.

Step 5 (master/slave): On each master/slave instance, deploy Session.war. Make the URL http://PUBLIC_IP:8080/Session/servlet/ShowSession?myname=Michael work.

Step 6 (instance 1): On the instance that runs the original Fablix instance (called "instance 1"), setup Apache and its proxy by doing the following:

  1. Install Apache2 and related modules:
    instance1-shell> sudo apt-get install apache2
    instance1-shell> sudo a2enmod proxy proxy_balancer proxy_http rewrite headers lbmethod_byrequests
    instance1-shell> sudo service apache2 restart
    
  2. Configure the Apache2 web server to use its proxy_balancer module for sharing (i.e., redirecting) requests to the backend instances. To do it, edit the following configuration file:
    instance1-shell> sudo vim /etc/apache2/sites-enabled/000-default.conf
    

Create a load balancer proxy, whose members are the backend instances. In particular, define a proxy on top of the file, before the <VirtualHost *:80> tag.

<Proxy "balancer://TomcatTest_balancer">
    BalancerMember "http://172.2.2.2:8080/TomcatTest/"
    BalancerMember "http://172.3.3.3:8080/TomcatTest/"
</Proxy>

Here we assume '172.2.2.2' and '172.3.3.3' are the private IP address of the master and slave instances, respectively.

  1. Add two new rules in the body of the VirtualHost tag.
    ProxyPass /TomcatTest balancer://TomcatTest_balancer
    ProxyPassReverse /TomcatTest balancer://TomcatTest_balancer
    
  2. Restart Apache:
    instance1-shell> sudo service apache2 restart
    
  3. Modify the security group of the two backend instances to allow instance 1 to access their 8080 port.

These settings will redirect HTTP requests to "instance1_IP/TomcatTest" to one of the two backend instances. To test it, use a browser to point to http://instance1_IP/TomcatTest/servlet/TomcatTest. Be sure to open port 80 of instance 1 to your IP address. Check the Tomcat log of the two backend instances.

instance2-shell> tail -f /var/log/tomcat7/*.log /var/log/tomcat7/*.txt /var/log/tomcat7/*.out

One of them should receive that request. Keep refreshing the page to send multiple requests, and check if the two backends are receiving the requests evenly.

Step 7 (instance 1): Configure the proxy on instance 1 to handle sessions properly. Since the current setting will send requests randomly to the backend, it will not pass cookies properly, causing sessions to fail. We want to make the session persist over several requests of the same client, i.e., to have a sticky session. To do it, read the instructions, especially those under "Examples of a balancer configuration." Here's a sample setting for the /etc/apache2/sites-enabled/000-default.conf file for the "Session.war" application:

Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://Session_balancer">
    BalancerMember "http://172.2.2.2:8080/Session" route=1
    BalancerMember "http://172.3.3.3:8080/Session" route=2
ProxySet stickysession=ROUTEID
</Proxy> 

Also do the following:

  1. Add two new rules in the body of the VirtualHost tag.
    ProxyPass /Session balancer://Session_balancer
    ProxyPassReverse /Session balancer://Session_balancer
    
  2. Restart Apache:
    instance1-shell> sudo service apache2 restart
    

Test if it works by pointing to the URL http://instance1_IP/Session/servlet/ShowSession?myname=Michael of instance 1. It should access one of the backend instances only.

Step 8 (instance 1): Send write operations to the master instance only. Since eventually we want the master instance to handle write operations, we need to make sure all write requests to instance 1 are redirected to the master only. We can achieve the goal by defining an additional pair of ProxyPass and ProxyPassReverse in the VirtualHost tag using a new URL pattern. In particular, add the following before the old "ProxyPass /Session" proxy:

ProxyPass /Session/write http://172.2.2.2:8080/Session
ProxyPassReverse /Session/write http://172.2.2.2:8080/Session

Notice that the proxy rules in the Apache configuration file are called sequentially. That's why we have to define this rule before the old "ProxyPass /Session" rule.

Reload and restart the Apache2 web server:

sudo service apache2 force-reload
sudo service apache2 restart

Test the URL http://instance1_IP/Session/write/servlet/ShowSession?myname=Michael. It should be sent to the master instance only.

Here's a sample Apache configuration file 000-default.conf.

Step 9 (master/slave): Make the two backend instances form a master/slave cluster based on the instructions you learned in step 1 about MySQL replication tutorial.

Step 10 (main task): Deploy your Fabflix system to the two backend instances. Do MySQL master/slave replication. Configure the original instance properly to enable load balancing, connection pooling, support sessions, and send write requests to the master instance only. Enabling the scaled version with HTTPS is optional. Note that you are required to add a section to the connection pooling report and explain how to use connection pooling in the case of having two backend servers.

The following is the architecture diagram:

No image "load_balancing.png" attached to cs122b-2017-winter-project5

Notice that in this architecture, each Tomcat only talks to its own MySQL. Sometimes we may want to setup a cluster of MySQL, and let each Tomcat talk to the cluster through another load balancer. If interested, you can read this page on how to set it up.


Task 3: Measuring the performance of Fabflix search feature

In this part, we will measure the performance of the keyword search feature that you have implemented in the past projects. The measurement results described in subtasks 3.1 and 3.2 must be reported for both the single-instance (i.e., the version that you prepared in Task 1) and the scaled version of Fabflix. Note: The URL to the single-instance version should be http:///INSTANCE1_PUBLIC_IP:8080/fabflix, while it should be http://INSTANCE1_PUBLIC_IP:80/fabflix for the scaled version that you prepared in Task 2.

Task 3.1: Preparing the codebase for time measurement

Here, we are going to prepare for measuring the following two statistical variables: (1) the average time it takes for the search servlet to run completely for a query (called TS), and (2) the average time spent on the parts that use JDBC, per query (called TJ).

Step 1. Use the following sample to insert the necessary time statements for measuring TS and TJ. You are required to measure and log the value of "search servlet total execution time" and "JDBC execution time" for every request served by the server (i.e., assuming these values are printed in one line per query, if a query workload of 1000 queries is fired to the system, we must have 1000 lines in the log file, each line containing one sample value for calculating TS and TJ).

Particularly for TS samples, it is highly recommended to place these log statements in a filter that wraps the search servlet.

// Time an event in a program to nanosecond precision
long startTime = System.nanoTime();
/////////////////////////////////
/// ** part to be measured ** ///
/////////////////////////////////
long endTime = System.nanoTime();
long elapsedTime = endTime - startTime; // elapsed time in nano seconds. Note: print the values in nano seconds 

Step 2. Write a script (in any language that you prefer) to process the resulting log file of a query workload and calculate TS and TJ (i.e., by parsing the log statements and taking the average of all the samples).

Submission-related note: The usage of this script, which is expected to be found at the root of your .war file, must be explained in your README file.

Task 3.2: Preparing the test plan in Apache JMeter

In this part, you will use Apache JMeter to measure the performance of the search feature of the Fabflix website. In particular, you must measure the average query time of the search feature using a set of queries based on the movie tiles in this file. Assume the page size is 50, and we only want the first page of results.

The following figure illustrates the round-trip time of a query from a client to the server then back to client. The query time of a query (i.e., "Tq") is the total time starting from when the search request is sent from the client (Ts) until the time when the response has completely received by the client (Te). It includes two major parts: (1) response time (Tr) is the time it takes until the client hears the first bit of the response, and (2) "payload time" (Tp) is the time it takes for the response data to be downloaded by the client completely.

Step 1: Read this reference to get an overview of Jmeter. Read this page to get familiar with JMeter basics.

Step 2: Download and install JMeter from this link.

Step 3: Use this link to make a test plan for the search feature of your website. You will run the Jmeter test from your local development against the remote AWS instance. The plan must iteratively generate a proper HTTP or HTTPS search request for every movie title in the provided query file. Here is a useful page about how to use a CSV file for Jmeter. Here are other useful tutorials on how to get request parameter values from an external CSV file: Link 1, Link 2, and Link 3.

Task 3.3: Collecting the performance results

Run the tests for all the following settings to collect performance results. For each case, remember to make the necessary changes to the JMeter test plan and/or the codebase. Use the results to fill out this HTML file as your measurement report. For each case, report the requested values in the corresponding columns, and write a short analysis for that case in the last column. This image is an example of what you should report in the second column called "Graph Results Screenshot".

Notes:

  1. In all cases, if not mentioned otherwise, your Fabflix codebase is assumed to use both the Prepared Statements and Connection Pooling optimization techniques.
  2. If more than one JMeter thread is to be used, each thread should start a new session in Tomcat (i.e., threads should not share a session-id).

Single-instance cases (i.e., that is accessible via http://INSTANCE1_PUBLIC_IP:8080/fabflix):

  1. Use HTTP, 1 thread in JMeter.
  2. Use HTTP, 10 threads in JMeter.
  3. Use HTTPS, 10 threads in JMeter.
  4. Use HTTP, without using prepared statements, 10 threads in JMeter.
  5. Use HTTP, without using connection pooling, 10 threads in JMeter.

Scaled-version cases (i.e., that is accessible via http://INSTANCE1_PUBLIC_IP:80/fabflix):

  1. Use HTTP, 1 thread in JMeter.
  2. Use HTTP, 10 threads in JMeter.
  3. Use HTTP, without using prepared statements, 10 threads in JMeter.
  4. Use HTTP, without using connection pooling, 10 threads in JMeter.

Preparing the Package for Submission

  1. There will be no demo for this project.
  2. You should take the following steps to prepare your package for submission. When prepared, submit the package to EEE (i.e., one submission per team).
    • Create a directory, called "project5_[GROUP ID]" on your local machine. You are required to include all the three reports, README, as well as the "fabflix_webapp.war".
    • In README, write down the address of your AWS instance
    • Open ports 80 and 8080 of your AWS instance to the following addresses:
      • 128.195.52.56
      • 128.195.52.58

Attachments