wiki:cs122b-2019-winter-project3

CS122B Project 3: reCaptcha, HTTPS, Stored Procedure, XML Parsing

Due: Week 7 Monday, Feb/18th/2019 11:45 pm. Submit project on Github.

Tasks

  1. (Optional) Register a domain name for Fabflix
  2. Adding reCAPTCHA to Project 2
  3. Adding HTTPS to Project 2
  4. Use PreparedStatement in the query
  5. Update Encrypted password in database
  6. Implementing a Dashboard using Stored Procedure
  7. Importing large XML data files into the Fabflix database

Task 1 (Optional): Register a domain name for Fabflix

If you are tired of using an IP address to visit your AWS instance and want to make it "real," you can register a domain name from a registrar company such as Godaddy. Then use the AWS instance to do the hosting. Now we are open for real business!


Improve the security of the Fabflix website

A real business needs to treat security issues seriously. In the following 4 tasks, you will improve the security of your website in project 2 in various ways:

  • reCAPTCHA enhances security by preventing bots and only allowing real human users.
  • HTTPS enhances security by securing the process when the client sends sensitive information to the server.
  • Encrypting the password improves security in the case of your database gets attacked, the hacker won't be able to get the real sensitive information, such as password.
  • Use prepared statements can help prevent a very common kind of attack: SQL injection attack.

Task 2: Adding reCAPTHA to Project 2

Improve the user-authentication process of project 2 by adding a module that verifies real users and blocks bots.

  1. Go to 2019w-project3-recaptcha-example, follow the README instructions on the Github Page the code.
  2. Get a reCAPTCHA from Google. Add public IP of your AWS instance and localhost to the domain. You will get a secret key and a site key.
  3. Test the page http://localhost:8080/2019w-project3-recaptcha-example to see if reCAPTHA works.
  4. Apply the same logic to your project 2 by making the necessary changes on the login frontend and backend logic.

Note that if you AWS instance is restarted, the public IP will change, and you need to add the new IP to Google reCAPTHA console as well.

Task 3: Adding HTTPS to Project 2

HTTPS makes your website substantially more secure from hacking and other security breaches. HTTPS makes sure that the client can safely send sensitive data to your server without the risk of leaking the data in the transfer process.

Step 1: On your AWS instance, use the following command to create a keystore to be used by Tomcat. Remember what password you use for the keystore. Our example uses "changeit".

In the following steps, "/home/ubuntu/tomcat" is an example of your tomcat installation directory on AWS. You need to replace it with your own directory.

sudo keytool -genkey -alias fabflix -keyalg RSA -keystore /home/ubuntu/tomcat/keystore

Step 2: Uncomment the Connector tag in /home/ubuntu/tomcat/conf/server.xml, which has port="8443" and defines an SSL HTTP/1.1 connector. Modify it and use the following configuration:

<Connector port="8443" 
 protocol="HTTP/1.1"
 connectionTimeout="20000"
 redirectPort="8443"
 SSLEnabled="true"
 scheme="https"
 secure="true"
 sslProtocol="TLS"
 keystoreFile="/home/ubuntu/tomcat/keystore"
 keystorePass="changeit" />

Step 3: Restart tomcat:

go to your tomcat directory. Do "./bin/shutdown.sh" then "./bin/startup.sh"

At this point, https://[YOUR_IP]:8443/manager/html should also be an entry point to Tomcat. Note that it uses the secure kind of http (https) and a different port, which is 8443 by default. Make sure to open the port 8443 on your AWS instance.

Step 4: Limit a Tomcat application to HTTPS only. Currently, http://YOUR_IP:8080/manager/html entry (which uses HTTP) is still open. In order to disable HTTP and enable HTTPS only, change your 2019w-project3-recapthca-example application's web.xml by adding the following code before the "</web-app>" tag:

<security-constraint>
 <web-resource-collection>
    <web-resource-name>HTTPSOnly</web-resource-name>
    <url-pattern>/*</url-pattern>
 </web-resource-collection>
 <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
 </user-data-constraint>
</security-constraint>

This setting will force all HTTP requests to this application to get redirected to the corresponding HTTPS URL. Reload the application to see if the redirection works for the URL http://YOUR_IP:8080/2019w-project3-recapthca-example/.

Step 5: Make necessary changes to your project 2 to use HTTPS always with HTTP disabled.

Task 4: Use prepared statements in your server code when sending a query

Previously, we generate the SQL query by concatenating the parameter value with a template string, then it will be passed to the database to be compiled and executed. This string manipulation approach leaves us very vulnerable to SQL Injection Attack. We can prevent this by using PreparedStatement . PreparedStatement pre-compiles the sql template and when database executes the query, it treats the parameters as values, without messing it with the SQL query itself.

You can learn PreparedStatement at this reference.

Change all the queries in your project 2 involving specifying any parameter from user input to use PreparedStatement .

Task 5: Update user password in database from plain text to Encrypted password

This task will show you how to update the plain text password to encrypted password and store it in the database. The client (frontend) will still pass the plain text password to the server, (this is okay because we have HTTPS), the server will then compare the plain text password with the encrypted password stored in the database.

The current moviedb.customers table stores plain text password. We have provided you a Java program you can directly run to update them with the encrypted password. We use Java Simplified Encryption library to do the encryption.

  1. Go to 2019w-project3-encryption-example to download and deploy the project.
  2. On your local machine, import the project into Eclipse, and run UpdateSecurePassword.java This program will read all the passwords from the existing moviedb.customers table, encrypt them, and update the table with the encrypted passwords.
  3. On AWS, also run UpdateSecurePassword.java to update the database on AWS. Note that you should only run this program once. If you need to re-run it, you need to re-populate the customers table with the plain text password.
  4. VerifyPassword.java shows you how to verify the user's email and password against the encrypted password stored in the database.
  5. Change your login modules in project 2 to verify user email/password based on the encrypted password.

Task 6: Implementing a Dashboard using Stored Procedure

Step 1: To store information about Fabflix's employees, create a new table called employees with the following attributes:

email varchar(50) primary key
password varchar(20) not null
fullname varchar(100)

Step 2: Insert an employee with classta@email.edu, classta, and TA CS122B as its email, password, and fullname respectively. Use the same technique in the encrypted password section to encrypt the password here.

Step 3: Setup an entry point https://YOUR_IP:8443/fabflix/_dashboard that uses HTTPS. It allows employees to login with a valid email and password to access a dashboard.

Step 4: Allow a logged-in employee to do the following operations using this dashboard. (You can reuse part of your code in Project 1.)

  • Inserting a new star into the database. Star name is required, birth year is optional.
  • Providing the metadata of the database; in particular, show the name of each table and, for each table, each attribute and its type.

Step 5: Write a stored procedure called add_movie . Its arguments include all the required fields of the movie, a single star (star name) and a single genre (genre name). (The reason we don't want to pass multiple stars and genres is that the current MySQL implementation doesn't allow stored procedures to take an array argument.)

Step 6: In the dashboard implemented in Step 4, add one more feature that allows an employee to add information related to a movie, including a single star and a single genre. The UI should allow the employee to provide the information, and the backend should call the stored procedure add_movie created above.

Remarks:

  1. The added new stars and movies should be accessible and searchable from the project 2 webpage.
  1. When adding a new star, if there's an existing star with the same name and birth year, you can treat them as a new person and create a new record.
  1. When adding a new movie:
    • If the star or genre already exists, you should link them to the movie. In the case of two star records with the same name, you can link any of them.
    • If the star or genre doesn't exist, you should create it and then link it to to the movie. (star birth year is optional and can be leaved to null)
    • All necessary stars_in_movies and genres_in_movies records should also be created. The procedure should output informative status messages to the user as it performs the task
    • A movie is identified by (title, year, director). For example, a movie with same title and year, but not same director is considered as a different movie.
    • If the movie already exists, you should show a corresponding message, and no changes to the database are made.
  1. The user does not provide an ID for star, movie, or genre. Those ids should be managed by your stored procedure. To implement this in your stored procedure, you could:
    • Use "select max(id) from TABLE" to get the "largest id" according to the alphabetic order, then "parse" it and use it to generate a new unique id;
    • Use a new "helper table" to store the next available integer and increment it each time for the next unique id; Changing the our existing table schema to implement this feature is not allowed.

Task 7: Importing large XML data files into the Fabflix database

The goal of this task is to use a new source of data to increase the content size of the Fabflix movie database.

You can find the raw version of this new data source in this attachment. The schemas of these files are explained at this page.

For example a <dirid /> tag will be defined as <!ELEMENT dirid > in the DTD file. The comments along with the DTD declarations are very useful to identify what field is used for. Try to find relationships between the XML files and correlate the entities together with the help of the DTD files.

Particularly, you are required to parse the files mains243.xml and casts124.xml to add new movies, stars, and genres to the Fabflix database (i.e., on top of what you already have in the database). If needed, the stars_in_movies and genres_in_movies tables should also be updated accordingly. Also, for each XML file in this package, there is a DTD file to be used to understand/validate the structure of that XML file.

Please note that in this project it is expected that, if needed, you make reasonable decisions on how to use this new data source. The files are not clean and you may find inconsistencies in them. In such cases, do not be surprised. Make reasonable decisions. Such issues are very common in many real applications. You can use the following high-level description regarding the mapping of the XML tag attributes to the Fabflix database schema that was given in project 1.

As shown in Figure 1, the root tag in the file mains243.xml is <movies>, which consists of several instances of the tag <directorfilms>. Each <directorfilms> contains the information of a director, along with his/her movies, each movie being an instance of the tag <film>. It is clear where to find the values of all the columns of the movies table. Furthermore, each <film> has a list of assigned categories in a <cats> tag (i.e., each category as a <cat> tag). The value of the <cat> tag corresponds to the genre name in the genres table of the Fabflix database. If a <cat> does not exist in the genres table, a new genre must be added, and the genres_in_movies table must be updated accordingly.

Figure 1.

For updating the stars and stars_in_movies tables, you should use the content of the file casts124.xml, which is shown in a high-level view in Figure 2.

Figure 2.

Similar to the <movies> tag, the content of the <casts> tag, which is the root of the file casts124.xml, is also grouped by the movie directors. You can find the information of the star-movie connections in the instances of the <m> tag. In this tag, the film is identified using the <f> tag, which links to the <fid> tag of the <film> tag in the main file. Tag <m> also gives you the stagename of the actor of this relation in a tag called <a>. The stagename helps you connect this stars_in_movies record to the right star record. The stars to be added to the stars table are parsed out of the file actors63.xml.

Although there are many libraries for parsing XML, there are two main approaches towards this problem: (1) SAX: event-based XML parsing and (2) DOM: using a tree-based object model for parsing XML. In SAX, the library moves through the XML tags and makes calls to event-handler functions that are implemented by the developer. In DOM, the developer is free to move on a tree of objects that is made available by the library, each object corresponding to an entity in the XML tree (e.g., a tag). You are free to choose between either of SAX or DOM approaches to parse the document. For examples, look at 2019w-project3-SAXParser-example and 2019w-project3-DomParser-example.

Requirements:

  • XML parsing: You should write a Java program to parse the XML files and insert the new data to the existing moviedb database. The new movie data should be searchable in the project 2 search page.
  • Performance Tuning: A naive implementation of your program is expected to be slow. Come up with at least two optimization techniques to reduce the running time of your program. (Note that set auto-commit off and using PreparedStatement can NOT be counted as optimization approaches for this task.) Write an itemized, brief report accessible to describe your optimizations, and the corresponding time reductions. Put this optimization report in your Github repository in a text format, such as plain text (.txt), markdown (.md), or PDF.


Remarks:

  • In case of inconsistencies between the provided data with respect to the relational schema, make sure your program does not crash. The inconsistent data must be reported to the user (print the element name and node value) and your program should resume processing.
  • Use NULL for missing values when inserting records into the database.
  • For inconsistent values with the schema (like non-integer values for volume or roman numbers ) you can treat them as NULL.
  • For entries with multiple ISBN/publishers/etc, only use one of them.
  • Keep track of duplicate entries like multiple occurrences of same movies/actors and insert them only once.
  • Make sure the encoding of your parser is ISO-8859-1. You can ignore any special character that cannot be handled with this encoding.

Deployment On AWS and Submissions

The same as before, use Github for your project 3 code and deploy on AWS. Review Project1 Submission before you submit project 3 to Github.


Demonstration and Grading Criteria

The same ss before, You are required to demonstrate this project, and time is 15 minutes per team. Please refer to this Grading Rubric for detailed criteria.

Last modified 17 months ago Last modified on Feb 6, 2019 11:15:48 PM

Attachments (1)

Download all attachments as: .zip