CS122B Project 3: reCaptcha, HTTPS, Stored Procedure, XML Parsing

Due: Week 7 Thursday, 2/22/2018 11:45 pm. Due on Github.


  1. (Optional) Register a domain name for Fabflix
  2. Adding reCAPTCHA to Project 2
  3. Adding HTTPS to Project 2
  4. Implementing a Dashboard using Stored Procedure
  5. Importing large XML data files into the Fabflix database

Task 1 (Optional): Register a domain name for Fabflix

If you are tired of using an IP address to visit your AWS instance and want to make it "real," you can register a domain name from a registrar company such as Godaddy. Then use the AWS instance to do the hosting. Now we are open for real business!

Task 2: Adding reCAPTHA to Project 2

Improve the user-authentication process of project 2 by adding a module that verifies real users and blocks bots.

  1. Download and deploy TomcatFormReCaptcha.war.
  2. Get a reCAPTCHA from Google. Use the public IP of your AWS instance as the domain. You will get a secret key and a site key.
  3. Follow the instructions on the page shown in step 2 to modify the file "webapps/TomcatFormReCaptcha/index.html". In particular, use your own "site key" to replace the "data-sitekey" value.
  4. Test the page http://YOUR_IP:8080/TomcatFormReCaptcha to see if reCAPTHA works for the front end (You should get an error saying "Recaptcha WRONG!!!!" if you followed the instructions properly)
  5. Read and understand the file "sources/", especially the logic related to verifying "g-recaptcha-response". Also modify the file "" to use your own "SITE_KEY" and "SECRET_KEY" values.
  6. Recompile the Java files and reload the app using the Tomcat console. Test if the backend of reCAPTHA is working properly.
    shell> sudo javac -classpath .:../lib/mysql-connector-java-5.0.8-bin.jar:../lib/servlet-api.jar:../lib/recaptcha4j-0.0.7.jar:../lib/javax.json-1.0.jar 
  7. Apply the same logic to your project 2 by making the necessary changes on the login frontend and backend logic.

Task 3: Adding HTTPS to Project 2

See this reference on how to configure Tomcat to enable HTTPS connections.

Step 1: On your AWS instance, use the following command to create a keystore to be used by Tomcat. Remember what password you use for the keystore. Our example uses "changeit".

In the following steps, "/home/ubuntu/tomcat" is an example of your tomcat installation directory on AWS. You need to replace it with your own directory.

sudo keytool -genkey -alias fabflix -keyalg RSA -keystore /home/ubuntu/tomcat/keystore

Step 2: Uncomment the Connector tag in /home/ubuntu/tomcat/server.xml, which has port="8443" and defines an SSL HTTP/1.1 connector. Modify it and use the following configuration:

<Connector port="8443" 
 keystoreFile=" /home/ubuntu/tomcat/keystore"
 keystorePass="changeit" />

Step 3: Restart tomcat:


At this point, https://[YOUR_IP]:8443/manager/html should also an entry point to Tomcat. Note that it uses the secure kind of http (https) and a different port, which is 8443 by default. Make sure to open the port 8443 on your AWS instance.

Step 4: Limit a Tomcat application to HTTPS only. Currently, http://YOUR_IP:8080/manager/html entry (which uses HTTP) is still open. In order to disable HTTP and enable HTTPS only, change your TomcatFormReCaptcha application's web.xml by adding the following code before the "</web-app>" tag:


This setting will force all HTTP requests to this application to get redirected to the corresponding HTTPS URL. Reload the application to see if the redirection works for the URL http://YOUR_IP:8080/TomcatFormReCaptcha/. Read this reference on more details about how to configure Tomcat to always require HTTPS.

Step 5: Make necessary changes to your project 2 to use HTTPS always with HTTP disabled.

Task 4: Implementing a Dashboard using Stored Procedure

Step 1: To store information about Fabflix's employees, create a new table called employees with the following attributes:

email varchar(50) primary key
password varchar(20) not null
fullname varchar(100)

Step 2: Insert an employee with, classta, and TA CS122B as its email, password, and fullname respectively.

Step 3: Setup an entry point https://YOUR_IP:8443/fabflix/_dashboard that uses HTTPS. It allows employees to login with a valid email and password to access a dashboard.

Step 4: Allow a logged-in employee to do the following operations using this dashboard. (You can reuse part of your code in Project 1.)

  • Inserting a new star into the database.
  • Providing the metadata of the database; in particular, show the name of each table and, for each table, each attribute and its type.

Step 5: Write a stored procedure called add_movie . Its arguments include all the required fields of the movie, a single star and a single genre. (The reason we don't want to pass multiple stars and genres is that the current MySQL implementation doesn't allow stored procedures to take an array argument.) If a record corresponding to the star or genre exists, it is linked to the movie; if not, it is created and then linked to the movie. Insert a movie only if it does not exist. All necessary stars_in_movies and genres_in_movies records are also created. The procedure should output informative status messages to the user as it performs the task. If a movie's record set cannot be correctly made, the procedure needs to output a corresponding message, and no changes to the database are made.

Step 6: In the dashboard implemented in Step 4, add one more feature that allows an employee to add information related to a movie, including a single star and a single genre. The UI should allow the employee to provide the information, and the backend should call the stored procedure add_movie created above.

Task 5: Importing large XML data files into the Fabflix database

The goal of this task is to use a new source of data to increase the content size of the Fabflix movie database.

You can find the raw version of this new data source in this package. The main source of these files is here. Before starting to use these files, it is highly recommended to check the main source and read its documentation to get a better understanding of this dataset.

Particularly, you are required to parse the files mains243.xml and casts124.xml to add new movies, stars, and genres to the Fabflix database (i.e., on top of what you already have in the database). If needed, the stars_in_movies and genres_in_movies tables should also be updated accordingly. Also, for each XML file in this package, there is a DTD file to be used to understand/validate the structure of that XML file.

Please note that in this project it is expected that, if needed, you make reasonable decisions on how to use this new data source. The files are not clean and you may find inconsistencies in them. In such cases, do not be surprised. Make reasonable decisions. Such issues are very common in many real applications. You can use the following high-level description regarding the mapping of the XML tag attributes to the Fabflix database schema that was given in project 1.

As shown in Figure 1, the root tag in the file mains243.xml is <movies>, which consists of several instances of the tag <directorfilms>. Each <directorfilms> contains the information of a director, along with his/her movies, each movie being an instance of the tag <film>. It is clear where to find the values of all the columns of the movies table. Furthermore, each <film> has a list of assigned categories in a <cats> tag (i.e., each category as a <cat> tag). The value of the <cat> tag corresponds to the genre name in the genres table of the Fabflix database. If a <cat> does not exist in the genres table, a new genre must be added, and the genres_in_movies table must be updated accordingly.

Figure 1.

For updating the stars and stars_in_movies tables, you should use the content of the file casts124.xml, which is shown in a high-level view in Figure 2.

Figure 2.

Similar to the <movies> tag, the content of the <casts> tag, which is the root of the file casts124.xml, is also grouped by the movie directors. You can find the information of the star-movie connections in the instances of the <m> tag. In this tag, the film is identified using the <f> tag, which links to the <fid> tag of the <film> tag in the main file. Tag <m> also gives you the stagename of the actor of this relation in a tag called <a>. The stagename helps you connect this stars_in_movies record to the right star record. The stars to be added to the stars table are parsed out of the file actors63.xml.

Although there are many libraries for parsing XML, there are two main approaches towards this problem: (1) SAX: event-based XML parsing and (2) DOM: using a tree-based object model for parsing XML. In SAX, the library moves through the XML tags and makes calls to event-handler functions that are implemented by the developer. In DOM, the developer is free to move on a tree of objects that is made available by the library, each object corresponding to an entity in the XML tree (e.g., a tag). You are free to choose between either of SAX or DOM approaches to parse the document. For examples, look here for SAX and here for DOM. You can find here, the instructions and files needed to run these examples.


  • XML parsing: You should write a Java program to parse the XML files and populate the database.
  • Performance Tuning: A naive implementation of your program is expected to be slow. Come up with at least two optimization techniques to reduce the running time of your program. Write an itemized, brief report accessible to describe your optimizations, and the corresponding time reductions.


  • In case of inconsistencies between the provided data with respect to the relational schema, make sure your program does not crash. The inconsistent data must be reported to the user (print the element name and node value) and your program should resume processing.
  • Use NULL for missing values when inserting records into the database.
  • For inconsistent values with the schema (like non-integer values for volume or roman numbers ) you can treat them as NULL.
  • For entries with multiple ISBN/publishers/etc, only use one of them.
  • Keep track of duplicate entries like multiple occurrences of same movies/actors and insert them only once.
  • Make sure the encoding of your parser is ISO-8859-1. You can ignore any special character that cannot be handled with this encoding.

Deployment On AWS and Submissions

The same as before, use Github for your project 3 code and deploy on AWS.

Demonstration and Grading Criteria

The same ss before, You are required to demonstrate this project, and time is 15 minutes per team. Please refer to this Grading checklist for detailed criteria.

Last modified 3 years ago Last modified on May 8, 2018 10:51:59 AM

Attachments (8)