| 1 | {"cells":[{"cell_type":"markdown","source":["d\n\n## Overview\n\nThis UCI CS190/CS122D demo notebook shows how to use **SparkSQL** and **Python** to create and query tables or DataFrames from files that have been uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) is a Databricks File System that allows one to store data for querying inside of Databricks. This notebook assumes that we have the desired files already uploaded to DBFS.\n\nThis demo notebook is written in **Python** so its default cell type is Python. However, we can use different languages by using the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported. Here we will also make direct use of SQL."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"96816ed7-b08a-4ca3-abb9-f99880c3535d"}}},{"cell_type":"markdown","source":["## Got Data?\n\nWe will start by looking at how to connect to our data. For now, let's assume that the world is flat.\nThe example data here is for the Hoofers sailing club examples from CS122A and the \"Cow book\"."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"33fc170f-2ec0-4f6e-9d8e-0534979a1522"}}},{"cell_type":"code","source":["# This cell shows how one connects Spark/SparkSQL to the target data (as a dataframe)\n\n# File location and type\nfile_location = \"/FileStore/tables/Sailors.csv\"\nfile_type = \"csv\"\n\n# CSV options\ninfer_schema = \"true\"\nfirst_row_is_header = \"true\"\ndelimiter = \",\"\n\n# The applied options are for CSV files. For other file types, these will be ignored.\nsailors_df = spark.read.format(file_type) \\\n .option(\"inferSchema\", infer_schema) \\\n .option(\"header\", first_row_is_header) \\\n .option(\"sep\", delimiter) \\\n .load(file_location)\n\ndisplay(sailors_df)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"6482be4c-f067-47c9-b0ac-35c938b94601"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[22,"Dustin","7","45","CS"],[29,"Brutus","1","33","EE"],[31,"Lubber","8","55.5","Econ"],[32,"Andy","8","25.5","Math"],[58,"Rusty","10","35","CS"],[64,"Horatio","7","35","CS"],[71,"Zorba","10","16","CS"],[74,"Horatio","9","35","Math"],[85,"Art","4","25.5","Music"],[95,"Bob","3","63.5","Econ"],[101,"Joan","3","NULL","Math"],[107,"Johannes","NULL","35.0","NULL"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"sid","type":"\"integer\"","metadata":"{}"},{"name":"sname","type":"\"string\"","metadata":"{}"},{"name":"rating","type":"\"string\"","metadata":"{}"},{"name":"age","type":"\"string\"","metadata":"{}"},{"name":"major","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>sid</th><th>sname</th><th>rating</th><th>age</th><th>major</th></tr></thead><tbody><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td></tr><tr><td>29</td><td>Brutus</td><td>1</td><td>33</td><td>EE</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td></tr><tr><td>32</td><td>Andy</td><td>8</td><td>25.5</td><td>Math</td></tr><tr><td>58</td><td>Rusty</td><td>10</td><td>35</td><td>CS</td></tr><tr><td>64</td><td>Horatio</td><td>7</td><td>35</td><td>CS</td></tr><tr><td>71</td><td>Zorba</td><td>10</td><td>16</td><td>CS</td></tr><tr><td>74</td><td>Horatio</td><td>9</td><td>35</td><td>Math</td></tr><tr><td>85</td><td>Art</td><td>4</td><td>25.5</td><td>Music</td></tr><tr><td>95</td><td>Bob</td><td>3</td><td>63.5</td><td>Econ</td></tr><tr><td>101</td><td>Joan</td><td>3</td><td>NULL</td><td>Math</td></tr><tr><td>107</td><td>Johannes</td><td>NULL</td><td>35.0</td><td>NULL</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["# Create a view or table that we can use in SQL\n\nsailors_df.createOrReplaceTempView(\"Sailors\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"bd82bb99-1479-4d5c-be10-8c36df0f1d44"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["%sql\n\n/* Query the created temp table in a SQL cell */\n\nselect * from Sailors"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"b5f66379-6f7f-42ec-8e82-d0e0926a1721"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[22,"Dustin","7","45","CS"],[29,"Brutus","1","33","EE"],[31,"Lubber","8","55.5","Econ"],[32,"Andy","8","25.5","Math"],[58,"Rusty","10","35","CS"],[64,"Horatio","7","35","CS"],[71,"Zorba","10","16","CS"],[74,"Horatio","9","35","Math"],[85,"Art","4","25.5","Music"],[95,"Bob","3","63.5","Econ"],[101,"Joan","3","NULL","Math"],[107,"Johannes","NULL","35.0","NULL"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"sid","type":"\"integer\"","metadata":"{}"},{"name":"sname","type":"\"string\"","metadata":"{}"},{"name":"rating","type":"\"string\"","metadata":"{}"},{"name":"age","type":"\"string\"","metadata":"{}"},{"name":"major","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>sid</th><th>sname</th><th>rating</th><th>age</th><th>major</th></tr></thead><tbody><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td></tr><tr><td>29</td><td>Brutus</td><td>1</td><td>33</td><td>EE</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td></tr><tr><td>32</td><td>Andy</td><td>8</td><td>25.5</td><td>Math</td></tr><tr><td>58</td><td>Rusty</td><td>10</td><td>35</td><td>CS</td></tr><tr><td>64</td><td>Horatio</td><td>7</td><td>35</td><td>CS</td></tr><tr><td>71</td><td>Zorba</td><td>10</td><td>16</td><td>CS</td></tr><tr><td>74</td><td>Horatio</td><td>9</td><td>35</td><td>Math</td></tr><tr><td>85</td><td>Art</td><td>4</td><td>25.5</td><td>Music</td></tr><tr><td>95</td><td>Bob</td><td>3</td><td>63.5</td><td>Econ</td></tr><tr><td>101</td><td>Joan</td><td>3</td><td>NULL</td><td>Math</td></tr><tr><td>107</td><td>Johannes</td><td>NULL</td><td>35.0</td><td>NULL</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["# With this registered as a temp view, it will only be available to this particular notebook.\n# If we'd like other users to be able to query this table, we can also create a permanent table from the DataFrame.\n# Once saved, this table will persist across cluster restarts as well as allow various users across different notebooks to query this data.\n# To do this, we would choose our table name and uncomment the bottom line.\n\npermanent_table_name = \"SailorsData\"\n\n#sailors_df.write.format(\"parquet\").saveAsTable(permanent_table_name)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"db9631f6-bb4a-42ca-8a3c-0d48af932331"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["%sql\n\n-- Query the newly created permanent table in a SQL cell\n-- select * from SailorsData;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"ea5810f0-a6f7-4885-82a3-59acb43e2132"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["# File location and type\nfile_location = \"/FileStore/tables/Boats.csv\"\nfile_type = \"csv\"\n\nboats_df = spark.read.format(file_type) \\\n .option(\"inferSchema\", infer_schema) \\\n .option(\"header\", first_row_is_header) \\\n .option(\"sep\", delimiter) \\\n .load(file_location)\n\ndisplay(boats_df)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"97ae22db-6bba-47e6-9651-9d1b218c98f7"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[101,"Interlake","blue"],[102,"Interlake","red"],[103,"Clipper","green"],[104,"Marine","red"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"bid","type":"\"integer\"","metadata":"{}"},{"name":"bname","type":"\"string\"","metadata":"{}"},{"name":"color","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>bid</th><th>bname</th><th>color</th></tr></thead><tbody><tr><td>101</td><td>Interlake</td><td>blue</td></tr><tr><td>102</td><td>Interlake</td><td>red</td></tr><tr><td>103</td><td>Clipper</td><td>green</td></tr><tr><td>104</td><td>Marine</td><td>red</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["# Create a view or table\n\nboats_df.createOrReplaceTempView(\"Boats\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"6b40f8ef-67fe-431f-a29c-9fed4a6d2527"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["# File location and type\nfile_location = \"/FileStore/tables/Reserves.csv\"\nfile_type = \"csv\"\n\n# The applied options are for CSV files. For other file types, these will be ignored.\nreserves_df = spark.read.format(file_type) \\\n .option(\"inferSchema\", infer_schema) \\\n .option(\"header\", first_row_is_header) \\\n .option(\"sep\", delimiter) \\\n .load(file_location)\n\ndisplay(reserves_df)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"6fb2cb86-5a9c-4798-ae42-58740c1a2e2c"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["22","101","1998-10-10T00:00:00.000+0000"],["22","102","1998-10-10T00:00:00.000+0000"],["22","103","1998-10-08T00:00:00.000+0000"],["22","104","1998-10-07T00:00:00.000+0000"],["31","102","1998-10-11T00:00:00.000+0000"],["31","103","1998-11-06T00:00:00.000+0000"],["31","104","1998-11-12T00:00:00.000+0000"],["64","101","1998-09-05T00:00:00.000+0000"],["64","102","1998-09-02T00:00:00.000+0000"],["74","103","1993-09-08T00:00:00.000+0000"],["NULL","103","1998-09-09T00:00:00.000+0000"],["1","NULL","2001-01-11T00:00:00.000+0000"],["1","NULL","2002-02-02T00:00:00.000+0000"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"sid","type":"\"string\"","metadata":"{}"},{"name":"bid","type":"\"string\"","metadata":"{}"},{"name":"date","type":"\"timestamp\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>sid</th><th>bid</th><th>date</th></tr></thead><tbody><tr><td>22</td><td>101</td><td>1998-10-10T00:00:00.000+0000</td></tr><tr><td>22</td><td>102</td><td>1998-10-10T00:00:00.000+0000</td></tr><tr><td>22</td><td>103</td><td>1998-10-08T00:00:00.000+0000</td></tr><tr><td>22</td><td>104</td><td>1998-10-07T00:00:00.000+0000</td></tr><tr><td>31</td><td>102</td><td>1998-10-11T00:00:00.000+0000</td></tr><tr><td>31</td><td>103</td><td>1998-11-06T00:00:00.000+0000</td></tr><tr><td>31</td><td>104</td><td>1998-11-12T00:00:00.000+0000</td></tr><tr><td>64</td><td>101</td><td>1998-09-05T00:00:00.000+0000</td></tr><tr><td>64</td><td>102</td><td>1998-09-02T00:00:00.000+0000</td></tr><tr><td>74</td><td>103</td><td>1993-09-08T00:00:00.000+0000</td></tr><tr><td>NULL</td><td>103</td><td>1998-09-09T00:00:00.000+0000</td></tr><tr><td>1</td><td>NULL</td><td>2001-01-11T00:00:00.000+0000</td></tr><tr><td>1</td><td>NULL</td><td>2002-02-02T00:00:00.000+0000</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["# Create a view or table\n\nreserves_df.createOrReplaceTempView(\"Reserves\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"5fd7853d-d9f5-41ce-aecf-a5f9b1a476fb"}},"outputs":[],"execution_count":0},{"cell_type":"markdown","source":["## It's Query Time!\n\nNow let's put our data to use and explore SQL in the world of Spark. We'll just do a few queries to \"prove\" that it works -- it's just SQL, so if you know SQL, to a first approximation, you know SparkSQL as well. To delve deeper, the online documentation at Databricks is very good - so check out the many built-in functions and features there."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"7427fffb-9ca9-4887-af32-75b264160edc"}}},{"cell_type":"code","source":[""],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"7b6c7920-a08d-4637-90a0-5facfbe99148"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT * FROM Sailors ORDER BY major;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"5936e066-239a-4dea-bc0c-25cf82e2448f"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[71,"Zorba","10","16","CS"],[22,"Dustin","7","45","CS"],[58,"Rusty","10","35","CS"],[64,"Horatio","7","35","CS"],[29,"Brutus","1","33","EE"],[31,"Lubber","8","55.5","Econ"],[95,"Bob","3","63.5","Econ"],[32,"Andy","8","25.5","Math"],[74,"Horatio","9","35","Math"],[101,"Joan","3","NULL","Math"],[85,"Art","4","25.5","Music"],[107,"Johannes","NULL","35.0","NULL"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"sid","type":"\"integer\"","metadata":"{}"},{"name":"sname","type":"\"string\"","metadata":"{}"},{"name":"rating","type":"\"string\"","metadata":"{}"},{"name":"age","type":"\"string\"","metadata":"{}"},{"name":"major","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>sid</th><th>sname</th><th>rating</th><th>age</th><th>major</th></tr></thead><tbody><tr><td>71</td><td>Zorba</td><td>10</td><td>16</td><td>CS</td></tr><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td></tr><tr><td>58</td><td>Rusty</td><td>10</td><td>35</td><td>CS</td></tr><tr><td>64</td><td>Horatio</td><td>7</td><td>35</td><td>CS</td></tr><tr><td>29</td><td>Brutus</td><td>1</td><td>33</td><td>EE</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td></tr><tr><td>95</td><td>Bob</td><td>3</td><td>63.5</td><td>Econ</td></tr><tr><td>32</td><td>Andy</td><td>8</td><td>25.5</td><td>Math</td></tr><tr><td>74</td><td>Horatio</td><td>9</td><td>35</td><td>Math</td></tr><tr><td>101</td><td>Joan</td><td>3</td><td>NULL</td><td>Math</td></tr><tr><td>85</td><td>Art</td><td>4</td><td>25.5</td><td>Music</td></tr><tr><td>107</td><td>Johannes</td><td>NULL</td><td>35.0</td><td>NULL</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT major, count(*), max(rating) AS max_rating, avg(age) AS avg_age\nFROM Sailors\nGROUP BY major;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"cc7d0552-6b05-45dc-89a6-ecaeafd3543e"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["CS",4,"7",32.75],["EE",1,"1",33.0],["Econ",2,"8",59.5],["Math",3,"9",30.25],["Music",1,"4",25.5],["NULL",1,"NULL",35.0]],"plotOptions":{"displayType":"table","customPlotOptions":{"plotlyBar":[{"key":"grouped","value":true},{"key":"stacked","value":false},{"key":"100_stacked","value":false}],"pieChart":[{"key":"donut","value":true}]},"pivotColumns":[],"pivotAggregation":"sum","xColumns":["major"],"yColumns":["max_rating"]},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"major","type":"\"string\"","metadata":"{}"},{"name":"count(1)","type":"\"long\"","metadata":"{\"__autoGeneratedAlias\":\"true\"}"},{"name":"max_rating","type":"\"string\"","metadata":"{}"},{"name":"avg_age","type":"\"double\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>major</th><th>count(1)</th><th>max_rating</th><th>avg_age</th></tr></thead><tbody><tr><td>CS</td><td>4</td><td>7</td><td>32.75</td></tr><tr><td>EE</td><td>1</td><td>1</td><td>33.0</td></tr><tr><td>Econ</td><td>2</td><td>8</td><td>59.5</td></tr><tr><td>Math</td><td>3</td><td>9</td><td>30.25</td></tr><tr><td>Music</td><td>1</td><td>4</td><td>25.5</td></tr><tr><td>NULL</td><td>1</td><td>NULL</td><td>35.0</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT * FROM Sailors s LEFT OUTER JOIN Reserves r ON s.sid = r.sid;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"424cc172-2f7f-482a-8e47-fc2689d70d49"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[22,"Dustin","7","45","CS","22","104","1998-10-07T00:00:00.000+0000"],[22,"Dustin","7","45","CS","22","103","1998-10-08T00:00:00.000+0000"],[22,"Dustin","7","45","CS","22","102","1998-10-10T00:00:00.000+0000"],[22,"Dustin","7","45","CS","22","101","1998-10-10T00:00:00.000+0000"],[29,"Brutus","1","33","EE",null,null,null],[31,"Lubber","8","55.5","Econ","31","104","1998-11-12T00:00:00.000+0000"],[31,"Lubber","8","55.5","Econ","31","103","1998-11-06T00:00:00.000+0000"],[31,"Lubber","8","55.5","Econ","31","102","1998-10-11T00:00:00.000+0000"],[32,"Andy","8","25.5","Math",null,null,null],[58,"Rusty","10","35","CS",null,null,null],[64,"Horatio","7","35","CS","64","102","1998-09-02T00:00:00.000+0000"],[64,"Horatio","7","35","CS","64","101","1998-09-05T00:00:00.000+0000"],[71,"Zorba","10","16","CS",null,null,null],[74,"Horatio","9","35","Math","74","103","1993-09-08T00:00:00.000+0000"],[85,"Art","4","25.5","Music",null,null,null],[95,"Bob","3","63.5","Econ",null,null,null],[101,"Joan","3","NULL","Math",null,null,null],[107,"Johannes","NULL","35.0","NULL",null,null,null]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"sid","type":"\"integer\"","metadata":"{}"},{"name":"sname","type":"\"string\"","metadata":"{}"},{"name":"rating","type":"\"string\"","metadata":"{}"},{"name":"age","type":"\"string\"","metadata":"{}"},{"name":"major","type":"\"string\"","metadata":"{}"},{"name":"sid","type":"\"string\"","metadata":"{}"},{"name":"bid","type":"\"string\"","metadata":"{}"},{"name":"date","type":"\"timestamp\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>sid</th><th>sname</th><th>rating</th><th>age</th><th>major</th><th>sid</th><th>bid</th><th>date</th></tr></thead><tbody><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td><td>22</td><td>104</td><td>1998-10-07T00:00:00.000+0000</td></tr><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td><td>22</td><td>103</td><td>1998-10-08T00:00:00.000+0000</td></tr><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td><td>22</td><td>102</td><td>1998-10-10T00:00:00.000+0000</td></tr><tr><td>22</td><td>Dustin</td><td>7</td><td>45</td><td>CS</td><td>22</td><td>101</td><td>1998-10-10T00:00:00.000+0000</td></tr><tr><td>29</td><td>Brutus</td><td>1</td><td>33</td><td>EE</td><td>null</td><td>null</td><td>null</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td><td>31</td><td>104</td><td>1998-11-12T00:00:00.000+0000</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td><td>31</td><td>103</td><td>1998-11-06T00:00:00.000+0000</td></tr><tr><td>31</td><td>Lubber</td><td>8</td><td>55.5</td><td>Econ</td><td>31</td><td>102</td><td>1998-10-11T00:00:00.000+0000</td></tr><tr><td>32</td><td>Andy</td><td>8</td><td>25.5</td><td>Math</td><td>null</td><td>null</td><td>null</td></tr><tr><td>58</td><td>Rusty</td><td>10</td><td>35</td><td>CS</td><td>null</td><td>null</td><td>null</td></tr><tr><td>64</td><td>Horatio</td><td>7</td><td>35</td><td>CS</td><td>64</td><td>102</td><td>1998-09-02T00:00:00.000+0000</td></tr><tr><td>64</td><td>Horatio</td><td>7</td><td>35</td><td>CS</td><td>64</td><td>101</td><td>1998-09-05T00:00:00.000+0000</td></tr><tr><td>71</td><td>Zorba</td><td>10</td><td>16</td><td>CS</td><td>null</td><td>null</td><td>null</td></tr><tr><td>74</td><td>Horatio</td><td>9</td><td>35</td><td>Math</td><td>74</td><td>103</td><td>1993-09-08T00:00:00.000+0000</td></tr><tr><td>85</td><td>Art</td><td>4</td><td>25.5</td><td>Music</td><td>null</td><td>null</td><td>null</td></tr><tr><td>95</td><td>Bob</td><td>3</td><td>63.5</td><td>Econ</td><td>null</td><td>null</td><td>null</td></tr><tr><td>101</td><td>Joan</td><td>3</td><td>NULL</td><td>Math</td><td>null</td><td>null</td><td>null</td></tr><tr><td>107</td><td>Johannes</td><td>NULL</td><td>35.0</td><td>NULL</td><td>null</td><td>null</td><td>null</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"markdown","source":["## It's a Brave New World!\n\nWe where come from, most people know that the world isn't flat. What about Spark? The example JSON data here is from Don Chamberlin's fabulous book, *SQL++ for SQL Users*. Again, the goal here is to just look at a few of the most important features for digging into nested data. See Databricks' online docs for more!"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"1b0e8a33-c955-4874-afac-df2bd307aa46"}}},{"cell_type":"code","source":["# File location and type\nfile_location = \"/FileStore/tables/customers.jsonl\"\nfile_type = \"json\"\n\ncustomers_df = spark.read.format(file_type) \\\n .option(\"inferSchema\", infer_schema) \\\n .load(file_location)\n\ndisplay(customers_df)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"f96fe36b-c54d-4c6b-8a16-2a6644bc1c94"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[["St. Louis, MO","201 Main St.","63101"],"C13","T. Cruise",750],[["Hanover, MA","690 River St.","02340"],"C25","M. Streep",690],[["St. Louis, MO","360 Mountain Ave.","63101"],"C31","B. Pitt",null],[["Boston, MA","420 Green St.","02115"],"C35","J. Roberts",565],[["Boston, MA","120 Harbor Blvd.","02115"],"C37","T. Hanks",750],[["St. Louis, MO","150 Market St.","63101"],"C41","R. Duvall",640],[["Rome, Italy","Via del Corso",null],"C47","S. Loren",625]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"address","type":"{\"type\":\"struct\",\"fields\":[{\"name\":\"city\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"street\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"zipcode\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}","metadata":"{}"},{"name":"custid","type":"\"string\"","metadata":"{}"},{"name":"name","type":"\"string\"","metadata":"{}"},{"name":"rating","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>address</th><th>custid</th><th>name</th><th>rating</th></tr></thead><tbody><tr><td>List(St. Louis, MO, 201 Main St., 63101)</td><td>C13</td><td>T. Cruise</td><td>750</td></tr><tr><td>List(Hanover, MA, 690 River St., 02340)</td><td>C25</td><td>M. Streep</td><td>690</td></tr><tr><td>List(St. Louis, MO, 360 Mountain Ave., 63101)</td><td>C31</td><td>B. Pitt</td><td>null</td></tr><tr><td>List(Boston, MA, 420 Green St., 02115)</td><td>C35</td><td>J. Roberts</td><td>565</td></tr><tr><td>List(Boston, MA, 120 Harbor Blvd., 02115)</td><td>C37</td><td>T. Hanks</td><td>750</td></tr><tr><td>List(St. Louis, MO, 150 Market St., 63101)</td><td>C41</td><td>R. Duvall</td><td>640</td></tr><tr><td>List(Rome, Italy, Via del Corso, null)</td><td>C47</td><td>S. Loren</td><td>625</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["customers_df.createOrReplaceTempView(\"customers\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"b063e399-32a7-43aa-832d-47e78ae18488"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["%sql\nDESCRIBE customers;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"2022c7fe-5836-487b-94f6-a984aa1ffaf8"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["address","struct<city:string,street:string,zipcode:string>",null],["custid","string",null],["name","string",null],["rating","bigint",null]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"col_name","type":"\"string\"","metadata":"{\"comment\":\"name of the column\"}"},{"name":"data_type","type":"\"string\"","metadata":"{\"comment\":\"data type of the column\"}"},{"name":"comment","type":"\"string\"","metadata":"{\"comment\":\"comment of the column\"}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>col_name</th><th>data_type</th><th>comment</th></tr></thead><tbody><tr><td>address</td><td>struct<city:string,street:string,zipcode:string></td><td>null</td></tr><tr><td>custid</td><td>string</td><td>null</td></tr><tr><td>name</td><td>string</td><td>null</td></tr><tr><td>rating</td><td>bigint</td><td>null</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["# File location and type\nfile_location = \"/FileStore/tables/orders.jsonl\"\nfile_type = \"json\"\n\norders_df = spark.read.format(file_type) \\\n .option(\"inferSchema\", infer_schema) \\\n .load(file_location)\n\ndisplay(orders_df)"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"caefdd3c-ca2b-40fc-9cf9-fe638214be2f"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["C41",[[347,19.99,5],[193,28.89,2]],"2017-04-29",1001,"2017-05-03"],["C13",[[460,100.99,95],[680,8.75,150]],"2017-05-01",1002,"2017-05-03"],["C31",[[120,88.99,2],[460,99.99,3]],"2017-06-15",1003,"2017-06-16"],["C35",[[680,9.99,6],[195,35.0,4]],"2017-07-10",1004,"2017-07-15"],["C37",[[460,99.98,2],[347,22.0,120],[780,1500.0,1],[375,149.98,2]],"2017-08-30",1005,null],["C41",[[680,25.98,51],[120,85.0,65],[460,99.98,120]],"2017-09-02",1006,"2017-09-04"],["C13",[[185,21.99,5],[680,20.5,1]],"2017-09-13",1007,"2017-09-20"],["C13",[[460,99.99,20]],"2017-10-13",1008,null]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"custid","type":"\"string\"","metadata":"{}"},{"name":"items","type":"{\"type\":\"array\",\"elementType\":{\"type\":\"struct\",\"fields\":[{\"name\":\"itemno\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"price\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"qty\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]},\"containsNull\":true}","metadata":"{}"},{"name":"order_date","type":"\"string\"","metadata":"{}"},{"name":"orderno","type":"\"long\"","metadata":"{}"},{"name":"ship_date","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>custid</th><th>items</th><th>order_date</th><th>orderno</th><th>ship_date</th></tr></thead><tbody><tr><td>C41</td><td>List(List(347, 19.99, 5), List(193, 28.89, 2))</td><td>2017-04-29</td><td>1001</td><td>2017-05-03</td></tr><tr><td>C13</td><td>List(List(460, 100.99, 95), List(680, 8.75, 150))</td><td>2017-05-01</td><td>1002</td><td>2017-05-03</td></tr><tr><td>C31</td><td>List(List(120, 88.99, 2), List(460, 99.99, 3))</td><td>2017-06-15</td><td>1003</td><td>2017-06-16</td></tr><tr><td>C35</td><td>List(List(680, 9.99, 6), List(195, 35.0, 4))</td><td>2017-07-10</td><td>1004</td><td>2017-07-15</td></tr><tr><td>C37</td><td>List(List(460, 99.98, 2), List(347, 22.0, 120), List(780, 1500.0, 1), List(375, 149.98, 2))</td><td>2017-08-30</td><td>1005</td><td>null</td></tr><tr><td>C41</td><td>List(List(680, 25.98, 51), List(120, 85.0, 65), List(460, 99.98, 120))</td><td>2017-09-02</td><td>1006</td><td>2017-09-04</td></tr><tr><td>C13</td><td>List(List(185, 21.99, 5), List(680, 20.5, 1))</td><td>2017-09-13</td><td>1007</td><td>2017-09-20</td></tr><tr><td>C13</td><td>List(List(460, 99.99, 20))</td><td>2017-10-13</td><td>1008</td><td>null</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["orders_df.createOrReplaceTempView(\"orders\")"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"f44e7b77-a5ae-4457-9a03-9aff338159cf"}},"outputs":[],"execution_count":0},{"cell_type":"code","source":["%sql\nDESCRIBE orders;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"4bd5a766-d7e7-4d07-a5c9-3a92f5978913"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["custid","string",null],["items","array<struct<itemno:bigint,price:double,qty:bigint>>",null],["order_date","string",null],["orderno","bigint",null],["ship_date","string",null]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"col_name","type":"\"string\"","metadata":"{\"comment\":\"name of the column\"}"},{"name":"data_type","type":"\"string\"","metadata":"{\"comment\":\"data type of the column\"}"},{"name":"comment","type":"\"string\"","metadata":"{\"comment\":\"comment of the column\"}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>col_name</th><th>data_type</th><th>comment</th></tr></thead><tbody><tr><td>custid</td><td>string</td><td>null</td></tr><tr><td>items</td><td>array<struct<itemno:bigint,price:double,qty:bigint>></td><td>null</td></tr><tr><td>order_date</td><td>string</td><td>null</td></tr><tr><td>orderno</td><td>bigint</td><td>null</td></tr><tr><td>ship_date</td><td>string</td><td>null</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT c.custid, c.address.city\nFROM customers c\nWHERE c.name = 'T. Hanks'"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"f8b15c1c-dcae-4d80-b738-2564cd303bad"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["C37","Boston, MA"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"custid","type":"\"string\"","metadata":"{}"},{"name":"city","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>custid</th><th>city</th></tr></thead><tbody><tr><td>C37</td><td>Boston, MA</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT * FROM orders WHERE orderno = 1001;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"6a057055-3f16-4812-816e-c658edd6ddbb"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["C41",[[347,19.99,5],[193,28.89,2]],"2017-04-29",1001,"2017-05-03"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"custid","type":"\"string\"","metadata":"{}"},{"name":"items","type":"{\"type\":\"array\",\"elementType\":{\"type\":\"struct\",\"fields\":[{\"name\":\"itemno\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"price\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"qty\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]},\"containsNull\":true}","metadata":"{}"},{"name":"order_date","type":"\"string\"","metadata":"{}"},{"name":"orderno","type":"\"long\"","metadata":"{}"},{"name":"ship_date","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>custid</th><th>items</th><th>order_date</th><th>orderno</th><th>ship_date</th></tr></thead><tbody><tr><td>C41</td><td>List(List(347, 19.99, 5), List(193, 28.89, 2))</td><td>2017-04-29</td><td>1001</td><td>2017-05-03</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\nSELECT * FROM orders o WHERE size(o.items) > 2;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"83950ab4-c150-494c-9863-9966a3ae8790"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[["C37",[[460,99.98,2],[347,22.0,120],[780,1500.0,1],[375,149.98,2]],"2017-08-30",1005,null],["C41",[[680,25.98,51],[120,85.0,65],[460,99.98,120]],"2017-09-02",1006,"2017-09-04"]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"custid","type":"\"string\"","metadata":"{}"},{"name":"items","type":"{\"type\":\"array\",\"elementType\":{\"type\":\"struct\",\"fields\":[{\"name\":\"itemno\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"price\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"qty\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]},\"containsNull\":true}","metadata":"{}"},{"name":"order_date","type":"\"string\"","metadata":"{}"},{"name":"orderno","type":"\"long\"","metadata":"{}"},{"name":"ship_date","type":"\"string\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>custid</th><th>items</th><th>order_date</th><th>orderno</th><th>ship_date</th></tr></thead><tbody><tr><td>C37</td><td>List(List(460, 99.98, 2), List(347, 22.0, 120), List(780, 1500.0, 1), List(375, 149.98, 2))</td><td>2017-08-30</td><td>1005</td><td>null</td></tr><tr><td>C41</td><td>List(List(680, 25.98, 51), List(120, 85.0, 65), List(460, 99.98, 120))</td><td>2017-09-02</td><td>1006</td><td>2017-09-04</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\n-- Note: This is a semantically *terrible SQL* extension IMO (!!!)\nSELECT o.orderno,\n o.order_date,\n explode(o.items) AS item\nFROM orders AS o\nORDER BY o.orderno;\n"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"c37c41d0-94f3-4282-8261-448a041f7748"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1001,"2017-04-29",[347,19.99,5]],[1001,"2017-04-29",[193,28.89,2]],[1002,"2017-05-01",[680,8.75,150]],[1002,"2017-05-01",[460,100.99,95]],[1003,"2017-06-15",[120,88.99,2]],[1003,"2017-06-15",[460,99.99,3]],[1004,"2017-07-10",[680,9.99,6]],[1004,"2017-07-10",[195,35.0,4]],[1005,"2017-08-30",[780,1500.0,1]],[1005,"2017-08-30",[375,149.98,2]],[1005,"2017-08-30",[347,22.0,120]],[1005,"2017-08-30",[460,99.98,2]],[1006,"2017-09-02",[120,85.0,65]],[1006,"2017-09-02",[460,99.98,120]],[1006,"2017-09-02",[680,25.98,51]],[1007,"2017-09-13",[185,21.99,5]],[1007,"2017-09-13",[680,20.5,1]],[1008,"2017-10-13",[460,99.99,20]]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"orderno","type":"\"long\"","metadata":"{}"},{"name":"order_date","type":"\"string\"","metadata":"{}"},{"name":"item","type":"{\"type\":\"struct\",\"fields\":[{\"name\":\"itemno\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"price\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"qty\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>orderno</th><th>order_date</th><th>item</th></tr></thead><tbody><tr><td>1001</td><td>2017-04-29</td><td>List(347, 19.99, 5)</td></tr><tr><td>1001</td><td>2017-04-29</td><td>List(193, 28.89, 2)</td></tr><tr><td>1002</td><td>2017-05-01</td><td>List(680, 8.75, 150)</td></tr><tr><td>1002</td><td>2017-05-01</td><td>List(460, 100.99, 95)</td></tr><tr><td>1003</td><td>2017-06-15</td><td>List(120, 88.99, 2)</td></tr><tr><td>1003</td><td>2017-06-15</td><td>List(460, 99.99, 3)</td></tr><tr><td>1004</td><td>2017-07-10</td><td>List(680, 9.99, 6)</td></tr><tr><td>1004</td><td>2017-07-10</td><td>List(195, 35.0, 4)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(780, 1500.0, 1)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(375, 149.98, 2)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(347, 22.0, 120)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(460, 99.98, 2)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(120, 85.0, 65)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(460, 99.98, 120)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(680, 25.98, 51)</td></tr><tr><td>1007</td><td>2017-09-13</td><td>List(185, 21.99, 5)</td></tr><tr><td>1007</td><td>2017-09-13</td><td>List(680, 20.5, 1)</td></tr><tr><td>1008</td><td>2017-10-13</td><td>List(460, 99.99, 20)</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\n-- Note: This is the right SQL extension IMO (like UNNEST or UNWIND in other systems)\nSELECT o.orderno, o.order_date, item\nFROM orders AS o LATERAL VIEW explode(o.items) AS item\nORDER BY o.orderno;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"0fadfba5-2736-4877-a474-0f3e1b783897"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[1001,"2017-04-29",[347,19.99,5]],[1001,"2017-04-29",[193,28.89,2]],[1002,"2017-05-01",[680,8.75,150]],[1002,"2017-05-01",[460,100.99,95]],[1003,"2017-06-15",[120,88.99,2]],[1003,"2017-06-15",[460,99.99,3]],[1004,"2017-07-10",[680,9.99,6]],[1004,"2017-07-10",[195,35.0,4]],[1005,"2017-08-30",[780,1500.0,1]],[1005,"2017-08-30",[375,149.98,2]],[1005,"2017-08-30",[347,22.0,120]],[1005,"2017-08-30",[460,99.98,2]],[1006,"2017-09-02",[120,85.0,65]],[1006,"2017-09-02",[460,99.98,120]],[1006,"2017-09-02",[680,25.98,51]],[1007,"2017-09-13",[185,21.99,5]],[1007,"2017-09-13",[680,20.5,1]],[1008,"2017-10-13",[460,99.99,20]]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"orderno","type":"\"long\"","metadata":"{}"},{"name":"order_date","type":"\"string\"","metadata":"{}"},{"name":"item","type":"{\"type\":\"struct\",\"fields\":[{\"name\":\"itemno\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"price\",\"type\":\"double\",\"nullable\":true,\"metadata\":{}},{\"name\":\"qty\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>orderno</th><th>order_date</th><th>item</th></tr></thead><tbody><tr><td>1001</td><td>2017-04-29</td><td>List(347, 19.99, 5)</td></tr><tr><td>1001</td><td>2017-04-29</td><td>List(193, 28.89, 2)</td></tr><tr><td>1002</td><td>2017-05-01</td><td>List(680, 8.75, 150)</td></tr><tr><td>1002</td><td>2017-05-01</td><td>List(460, 100.99, 95)</td></tr><tr><td>1003</td><td>2017-06-15</td><td>List(120, 88.99, 2)</td></tr><tr><td>1003</td><td>2017-06-15</td><td>List(460, 99.99, 3)</td></tr><tr><td>1004</td><td>2017-07-10</td><td>List(680, 9.99, 6)</td></tr><tr><td>1004</td><td>2017-07-10</td><td>List(195, 35.0, 4)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(780, 1500.0, 1)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(375, 149.98, 2)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(347, 22.0, 120)</td></tr><tr><td>1005</td><td>2017-08-30</td><td>List(460, 99.98, 2)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(120, 85.0, 65)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(460, 99.98, 120)</td></tr><tr><td>1006</td><td>2017-09-02</td><td>List(680, 25.98, 51)</td></tr><tr><td>1007</td><td>2017-09-13</td><td>List(185, 21.99, 5)</td></tr><tr><td>1007</td><td>2017-09-13</td><td>List(680, 20.5, 1)</td></tr><tr><td>1008</td><td>2017-10-13</td><td>List(460, 99.99, 20)</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\n-- So now we can query on an item-by-item basis\nSELECT item.*\nFROM orders AS o LATERAL VIEW explode(o.items) AS item\nWHERE o.orderno = 1002 AND item.qty < 100;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"c8a2337f-de5c-4987-8786-b3d9f45604a0"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[460,100.99,95]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"itemno","type":"\"long\"","metadata":"{}"},{"name":"price","type":"\"double\"","metadata":"{}"},{"name":"qty","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>itemno</th><th>price</th><th>qty</th></tr></thead><tbody><tr><td>460</td><td>100.99</td><td>95</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"code","source":["%sql\n-- And we can mix such \"explosions\" with regular SQL joins!\nSELECT SUM(it.qty) AS qty\nFROM customers as c, orders AS o LATERAL VIEW explode(o.items) AS it\nWHERE c.name LIKE 'J%' AND o.custid = c.custid;"],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"e8148802-e4ab-4484-836b-a6bc682ad6ee"}},"outputs":[{"output_type":"display_data","metadata":{"application/vnd.databricks.v1+output":{"overflow":false,"datasetInfos":[],"data":[[10]],"plotOptions":{"displayType":"table","customPlotOptions":{},"pivotColumns":null,"pivotAggregation":null,"xColumns":null,"yColumns":null},"columnCustomDisplayInfos":{},"aggType":"","isJsonSchema":true,"removedWidgets":[],"aggSchema":[],"schema":[{"name":"qty","type":"\"long\"","metadata":"{}"}],"aggError":"","aggData":[],"addedWidgets":{},"metadata":{"isDbfsCommandResult":false},"dbfsResultPath":null,"type":"table","aggOverflow":false,"aggSeriesLimitReached":false,"arguments":{}}},"output_type":"display_data","data":{"text/html":["<style scoped>\n .table-result-container {\n max-height: 300px;\n overflow: auto;\n }\n table, th, td {\n border: 1px solid black;\n border-collapse: collapse;\n }\n th, td {\n padding: 5px;\n }\n th {\n text-align: left;\n }\n</style><div class='table-result-container'><table class='table-result'><thead style='background-color: white'><tr><th>qty</th></tr></thead><tbody><tr><td>10</td></tr></tbody></table></div>"]}}],"execution_count":0},{"cell_type":"markdown","source":["## Keep Calm and Query On!\n\nTime to explore on your own..."],"metadata":{"application/vnd.databricks.v1+cell":{"title":"","showTitle":false,"inputWidgets":{},"nuid":"b0bedc2e-ceca-4826-8bd0-6309f8713c07"}}}],"metadata":{"application/vnd.databricks.v1+notebook":{"notebookName":"CS122DLecture16SparkSQLDemoNotebook","dashboards":[{"elements":[{"elementNUID":"96816ed7-b08a-4ca3-abb9-f99880c3535d","guid":"dc2c7358-289d-49f5-b892-7f8f36adc680","resultIndex":null,"options":null,"position":{"x":0,"y":0,"height":8,"width":12,"z":null},"elementType":"command"}],"guid":"17b25884-fed9-4d10-be92-55708971dd33","layoutOption":{"stack":true,"grid":true},"version":"DashboardViewV1","nuid":"72113aa4-6657-49b9-8f07-64afad8cb689","origId":2163063809556398,"title":"Untitled","width":1024,"globalVars":{}}],"notebookMetadata":{"pythonIndentUnit":2},"language":"python","widgets":{},"notebookOrigID":2163063809556368}},"nbformat":4,"nbformat_minor":0}
|
|---|