
CS122B Project 5: Scaling Fabflix and Performance Tuning

Deadline: Monday, 3/18/2019, 11:45pm. Due on Github

This project has the following tasks:

  1. JDBC connection pooling and prepared statements;
  2. Scaling Fabflix with a cluster of MySQL/Tomcat servers and a frontend load balancer;
  3. Measuring the performance of the Fabflix search feature.

Task 1: Connection Pooling and Prepared Statements

In this task, we will add connection pooling and prepared statements to Fabflix.

Step 1: Enable JDBC connection pooling in all Fabflix servlets. We will use our running "TomcatTest" application to show how to change it to use connection pooling.

  1. Copy the JDBC driver's jar into $CATALINA_HOME/lib (the lib folder directly inside the Tomcat directory; create it if it does not exist). Deploy the following TomcatPooling example. The file META-INF/context.xml includes important information about the database, user name, password, and pooling configuration. If you want to change this file, you need to undeploy the war file, make those changes, build a new war file, and deploy it; otherwise, the Tomcat server may still use the previous context.xml. Check http://tomcat.apache.org/tomcat-8.5-doc/jndi-datasource-examples-howto.html for more information about how to configure connection pooling.
  2. Go to http://localhost:8080/project5-TomcatPooling-example/tomcat-pooling to test your application. You should see a list of rows displayed on the webpage.
  3. Look into the following files for the main changes in this war file compared to the previous war file without connection pooling:
    • META-INF/context.xml
    • src/TomcatPoolingServlet.java
    • WEB-INF/web.xml (see the resource-ref tag)
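
For reference, a minimal pooled DataSource definition in META-INF/context.xml looks roughly like the following sketch. The resource name jdbc/moviedb, the credentials, and the pool sizes are placeholders; match them to your own setup, and reference the same resource name in web.xml's resource-ref tag.

```xml
<Context>
    <!-- Pooled DataSource; looked up from a servlet via JNDI
         as java:comp/env/jdbc/moviedb -->
    <Resource name="jdbc/moviedb" auth="Container" type="javax.sql.DataSource"
              maxTotal="100" maxIdle="30" maxWaitMillis="10000"
              username="mytestuser" password="mypassword"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/moviedb?autoReconnect=true&amp;useSSL=false"/>
</Context>
```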

Step 2: Make sure to use prepared statements in all JDBC statements involved in search. You should have already done this in past projects; if you didn't, please do it now, at least for the search functionality, since Task 3 needs it. You can use this tutorial on prepared statements. Note that prepared statements pay off when they contain placeholders (represented by ?), and that is how we expect them to appear in your code: the statement is then compiled just once and can be reused repeatedly with different parameters, which results in better performance.

  • Prepared statements are usually associated with one connection, which becomes tricky if you are using a connection pool. The setting 'cachePrepStmts' can be set to true to cache prepared statements.
  • You can enable this setting by adding the flag to the JDBC connection URL in META-INF/context.xml like this:
    url="jdbc:mysql://<IP>:<PORT>/moviedb?autoReconnect=true&amp;useSSL=false&amp;cachePrepStmts=true"
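
Put together, a servlet can borrow a pooled connection and run a parameterized search, as in the sketch below. The JNDI name jdbc/moviedb and the movies table/columns are assumptions for illustration; adapt them to your own context.xml and schema.

```java
import javax.naming.InitialContext;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class SearchDao {
    // Parameterized query: the single ? placeholder is bound per request,
    // so the statement can be compiled once and reused with different values.
    static final String TITLE_QUERY =
        "SELECT id, title, year FROM movies WHERE title LIKE ? LIMIT 50";

    // Borrow a connection from the Tomcat pool (JNDI name from context.xml);
    // try-with-resources returns the connection to the pool when done.
    public static void search(String keyword) throws Exception {
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/moviedb");
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(TITLE_QUERY)) {
            ps.setString(1, "%" + keyword + "%");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("title"));
                }
            }
        }
    }
}
```

Note that closing the Connection obtained from a pooled DataSource does not close the underlying MySQL connection; it only returns it to the pool.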
    

When you are done, write a brief explanation of how and where you use JDBC connection pooling and prepared statements in your code (include file paths and line numbers as seen on GitHub, plus snapshots of the relevant code). Submit this report to GitHub.


Task 2: Scaling Fabflix

Step 1: Set up two AWS instances as two backend servers by following this MySQL replication tutorial. We call them the "master instance" and the "slave instance" (in the context of MySQL replication).

Step 2 (master/slave): Create a dummy user for two example Tomcat applications:

shell> mysql -u root -p
mysql> CREATE USER 'mytestuser'@'localhost' IDENTIFIED BY 'mypassword';
mysql> GRANT ALL ON *.* TO 'mytestuser'@'localhost';

Step 3 (master/slave): Set up Tomcat on each master/slave instance. (You have done this many times before.)

Step 4 (master/slave): On each master/slave instance, deploy TomcatTest.war and make the URL http://PUBLIC_IP:8080/TomcatTest/servlet/TomcatTest work. You may need to modify the username/password and the IP address (use the internal AWS instance address). Also modify the AWS security group settings for these two instances to allow remote access to their 8080 port.

Step 5 (master/slave): On each master/slave instance, deploy Session.war. Make the URL http://PUBLIC_IP:8080/Session/servlet/ShowSession?myname=Michael work.

Step 6 (instance 1): On the instance that runs the original Fabflix instance (called "instance 1"), set up Apache and its proxy by doing the following:

  1. Install Apache2 and related modules:
    instance1-shell> sudo apt-get install apache2
    instance1-shell> sudo a2enmod proxy proxy_balancer proxy_http rewrite headers lbmethod_byrequests
    instance1-shell> sudo service apache2 restart
    

Modify the security group to open port 80 of instance 1 to your IP address. Use your browser to open a URL using the public IP address of instance 1; you should see an "Apache2 Ubuntu Default Page", which means you have successfully set up Apache2 on this machine.

  2. Configure the Apache2 web server to use its proxy_balancer module for sharing (i.e., redirecting) requests among the backend instances. To do so, edit the following configuration file:
    instance1-shell> sudo vim /etc/apache2/sites-enabled/000-default.conf
    

Create a load balancer proxy whose members are the backend instances. In particular, define a proxy at the top of the file, before the <VirtualHost *:80> tag.

<Proxy "balancer://TomcatTest_balancer">
    BalancerMember "http://172.2.2.2:8080/TomcatTest/"
    BalancerMember "http://172.3.3.3:8080/TomcatTest/"
</Proxy>

Here we assume '172.2.2.2' and '172.3.3.3' are the private IP addresses of the master and slave instances, respectively.

  3. Add two new rules in the body of the VirtualHost tag.
    ProxyPass /TomcatTest balancer://TomcatTest_balancer
    ProxyPassReverse /TomcatTest balancer://TomcatTest_balancer
    
  4. Restart Apache:
    instance1-shell> sudo service apache2 restart
    
  5. Modify the security group of the two backend instances to allow instance 1 to access their 8080 port.

These settings redirect HTTP requests for "instance1_IP/TomcatTest" to one of the two backend instances. To test it, point a browser to http://instance1_IP/TomcatTest/servlet/TomcatTest. Be sure to open port 80 of instance 1 to your IP address. Check the Tomcat logs of the two backend instances:

instance2-shell> tail -f /var/log/tomcat8/*.log /var/log/tomcat8/*.txt /var/log/tomcat8/*.out

One of them should receive that request. Keep refreshing the page to send multiple requests, and check if the two backends are receiving the requests evenly.

Step 7 (instance 1): Configure the proxy on instance 1 to handle sessions properly. Since the current setting sends each request to an arbitrary backend, a client's session state will not be found consistently, causing sessions to fail. We want the session to persist over several requests from the same client, i.e., to have a sticky session. To do so, read the instructions, especially those under "Examples of a balancer configuration." Here is a sample setting for the /etc/apache2/sites-enabled/000-default.conf file for the "Session.war" application:

Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://Session_balancer">
    BalancerMember "http://172.2.2.2:8080/Session" route=1
    BalancerMember "http://172.3.3.3:8080/Session" route=2
    ProxySet stickysession=ROUTEID
</Proxy>

Also do the following:

  1. Add two new rules in the body of the VirtualHost tag.
    ProxyPass /Session balancer://Session_balancer
    ProxyPassReverse /Session balancer://Session_balancer
    
  2. Restart Apache:
    instance1-shell> sudo service apache2 restart
    

Test whether it works by pointing to the URL http://instance1_IP/Session/servlet/ShowSession?myname=Michael of instance 1. All requests from the same client should now reach the same backend instance.

Here's a sample Apache configuration file 000-default.conf.

Step 8 (main task): Deploy your Fabflix system to the two backend instances. Set up MySQL master/slave replication. Configure the original instance properly to enable load balancing, connection pooling, and sticky sessions. Also make sure that write requests are sent to the master MySQL instance, while read requests may be sent to either the master or the slave MySQL instance. Accessing a remote MySQL server (e.g., the master MySQL from the Tomcat on the slave instance) needs some configuration changes, as explained in the last part of setup-scripts. Enabling the scaled version with HTTPS is optional. Note that you are required to add a section to the connection pooling report explaining how to use connection pooling in the case of having two backend servers. Put snapshots of your code and the file and line numbers as seen on GitHub.
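
One simple way to route reads and writes, sketched below, is to define two pooled DataSources in context.xml and pick one per query. The JNDI names jdbc/moviedb-master and jdbc/moviedb-slave and the random read split are assumptions for illustration, not a required design.

```java
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class DataSourceRouter {
    // Assumed JNDI names: define two Resources in META-INF/context.xml,
    // one pointing at the master MySQL instance and one at the slave.
    static final String MASTER = "java:comp/env/jdbc/moviedb-master";
    static final String SLAVE  = "java:comp/env/jdbc/moviedb-slave";

    // Writes must go to the master; reads may go to either backend.
    // 'r' is a random draw in [0,1) used to spread reads across both.
    static String pickName(boolean isWrite, double r) {
        if (isWrite) return MASTER;
        return r < 0.5 ? MASTER : SLAVE;
    }

    // Look up the chosen pooled DataSource for this query.
    public static DataSource forQuery(boolean isWrite) throws Exception {
        String name = pickName(isWrite, Math.random());
        return (DataSource) new InitialContext().lookup(name);
    }
}
```

A servlet would then call forQuery(true) before an INSERT/UPDATE and forQuery(false) before a SELECT.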

The following is the architecture diagram:

Notice that in this architecture, each Tomcat can talk to both MySQL instances. Sometimes we may want to set up a cluster of MySQL servers and let each Tomcat talk to the cluster through another load balancer. If interested, you can read this page on how to set it up.

Step 9: Google Cloud Platform (GCP): After finishing the main task, install and configure the load balancer on one Google Cloud instance. As before, this load balancer needs to redirect requests to one of the Tomcat servers. However, the load balancer must use the public IPs of the AWS instances, so make sure port 8080 is open to the Google Cloud instance.

You can get $300 of free credits for Google Cloud from the free trial. After that, start a Compute Engine instance (using Ubuntu 16.04) by following this tutorial, and repeat the exact same steps as you did for installing the load balancer on AWS.


Task 3: Measuring the performance of Fabflix search feature

In this part, we will measure the performance of the keyword search feature that you have implemented in the past projects. The measurement results described in subtasks 3.1 and 3.2 must be reported for both the single-instance version (i.e., the instance that you prepared in earlier projects and Task 1) and the scaled version of Fabflix. Note: the URL of the single-instance version should be http://INSTANCE1_PUBLIC_IP:8080/fabflix, while it should be http://INSTANCE1_PUBLIC_IP:80/fabflix for the scaled version that you prepared in Task 2.

Task 3.1: Preparing the codebase for time measurement

Here, we are going to prepare for measuring the following two statistical variables: (1) the average time it takes for the search servlet to run completely for a query (called TS), and (2) the average time spent on the parts that use JDBC, per query (called TJ). Note that TS > TJ always, since the JDBC work is a subset of the servlet's total work.

Step 1. Use the following sample to insert the necessary timing statements for measuring TS and TJ. This is a useful link about how to write to a file in a servlet. You are required to measure and log the "search servlet total execution time" and the "JDBC execution time" for every request served by the server (i.e., assuming these values are printed one line per query, if a workload of 1000 queries is fired at the system, the log file must have 1000 lines, each containing one sample value for calculating TS and TJ).

Particularly for TS samples, it is highly recommended to place these log statements in a filter that wraps the search servlet.

// Time an event in a program to nanosecond precision
long startTime = System.nanoTime();
/////////////////////////////////
/// ** part to be measured ** ///
/////////////////////////////////
long endTime = System.nanoTime();
long elapsedTime = endTime - startTime; // elapsed time in nanoseconds. Note: print the values in nanoseconds

Step 2. Write a script (in any language that you prefer) to process the resulting log file of a query workload and calculate TS and TJ (i.e., parse the log statements and take the average of all the samples). This script will be needed to prepare the HTML report asked for in Step 3 below.
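
As a sketch of the averaging step, assume each log line holds one query's TS and TJ samples in nanoseconds separated by a comma; that layout is an assumption, so adapt the parsing to whatever format your own log statements produce.

```java
import java.util.Arrays;
import java.util.List;

public class LogStats {
    // Parse log lines of the assumed form "TS,TJ" (both in nanoseconds,
    // one line per query) and return { average TS, average TJ }.
    public static double[] averages(List<String> lines) {
        double tsSum = 0, tjSum = 0;
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            tsSum += Double.parseDouble(parts[0]);
            tjSum += Double.parseDouble(parts[1]);
        }
        return new double[] { tsSum / lines.size(), tjSum / lines.size() };
    }

    public static void main(String[] args) {
        // Two sample queries; in practice, read the lines from the log file.
        List<String> sample = Arrays.asList("2000000,1500000", "4000000,2500000");
        double[] avg = averages(sample);
        System.out.println("TS avg (ns): " + avg[0] + ", TJ avg (ns): " + avg[1]);
    }
}
```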

Submission-related note: In Step 3 below, you will run queries. Upload the resulting log file to GitHub and mention its location in your report. The usage of this script, which is expected to be found at the root of your .war file, must be explained in your README file.

Task 3.2: Preparing the test plan in Apache JMeter

In this part, you will use Apache JMeter to measure the performance of the search feature of the Fabflix website. In particular, you must measure the average query time of the search feature using a set of queries based on the movie titles in this file. Assume the page size is 50, and we only want the first page of results.

The following figure illustrates the round-trip time of a query from a client to the server and back to the client. The query time of a query (i.e., "Tq") is the total time from when the search request is sent by the client (Ts) until the response has been completely received by the client (Te). It includes two major parts: (1) the response time (Tr), the time it takes until the client hears the first bit of the response, and (2) the "payload time" (Tp), the time it takes for the response data to be downloaded completely by the client.

Step 1: Read this reference to get an overview of JMeter. Read this page to get familiar with JMeter basics.

Step 2: Download and install JMeter from this link.

Step 3: Use this link to make a test plan for the search feature of your website. You will run the JMeter test from your local development machine against the remote AWS instance. The plan must iteratively generate a proper HTTP or HTTPS search request for every movie title in the provided query file. Here is a useful page about how to use a CSV file with JMeter. This is another link that can help you perform load testing using JMeter. This Stack Overflow answer addresses many questions raised on Piazza.

Task 3.3: Collecting the performance results

Run the tests for all the following settings to collect performance results. For each case, remember to make the necessary changes to the JMeter test plan and/or the codebase. Use the results to fill out this HTML file as your measurement report. For each case, report the requested values in the corresponding columns, and write a short analysis for that case in the last column. This image is an example of what you should report in the second column called "Graph Results Screenshot".

Notes:

  1. In all cases, if not mentioned otherwise, your Fabflix codebase is assumed to use both the Prepared Statements and Connection Pooling optimization techniques.
  2. If more than one JMeter thread is to be used, each thread should start a new session in Tomcat (i.e., threads should not share a session-id).

Single-instance cases (accessible via http://INSTANCE1_PUBLIC_IP:8080/fabflix):

  1. Use HTTP, without using prepared statements, 10 threads in JMeter.
  2. Use HTTP, without using connection pooling, 10 threads in JMeter.
  3. Use HTTP, 1 thread in JMeter.
  4. Use HTTP, 10 threads in JMeter.
  5. Use HTTPS, 10 threads in JMeter.

Scaled-version cases (accessible via http://INSTANCE1_PUBLIC_IP:80/fabflix):

  1. Use HTTP, without using prepared statements, 10 threads in JMeter.
  2. Use HTTP, without using connection pooling, 10 threads in JMeter.
  3. Use HTTP, 1 thread in JMeter.
  4. Use HTTP, 10 threads in JMeter.

Deployment On AWS

Use GitHub for your Project 5 code and reports, and deploy the web code on AWS.


Submissions, Demonstration and Grading Criteria

There will be no demo for this project; we will grade based on your submissions, so please be clear and precise in your reports. Put your report on the landing page of your GitHub repository. Summarized below are the submissions we need:

  1. (Task 1) A report section describing how and where you use JDBC connection pooling and prepared statements in your code. Use snapshots of your code and mention the file and line numbers as seen on GitHub.
  2. (Task 2) In the report, write down the addresses of your AWS instances and your Google Cloud instance. (You may plan to shut down your AWS/Google instances due to an insufficient number of hours left, and consequently the IPs may change. Everyone should also update their IPs on this sheet. We will announce through Piazza when we are going to check the services; make sure that at least 30 minutes before we start checking, you have started your services, verified that they work correctly, and put your IPs in the Google sheet.) Open ports 80 and 8080 of your Google and AWS instances to 128.195.0.0/16. Explain how you used connection pooling with two backend SQL servers and how read/write requests were routed. Use snapshots of your code and mention the file and line numbers as seen on GitHub.
  3. (Task 3) Upload the log file created by running the queries, the parsing script, and the HTML file. Mention their locations in your report. Be sure to fill out all sections of the HTML file (including the analysis parts).
  4. Upload your WAR file and the README.

Follow this format for your report: https://docs.google.com/document/d/137v5gPqyN5raX5HXrZB24KeTNWtH9NJ-jMpdreBTwPk/edit?usp=sharing

Last modified on Aug 16, 2019 11:43:21 AM
