The Impala Query UI application enables you to perform queries on Apache Hadoop data stored in HDFS or HBase using Impala. For information about Impala, see Installing and Using Impala. You can create, run, and manage queries, and download the results in a Microsoft Office Excel worksheet file or a comma-separated values file.
Most of Hive SQL is compatible with Impala, so we are going to run the queries from episode one in both the Hive and Impala applications and compare them. Note that this comparison is not rigorous, but it demonstrates what happens in common cases.
Using Impala through the Hue app is easier in many ways than using it through the command-line impala-shell. For example, table names, databases, columns, and built-in functions are auto-completed, and syntax highlighting helps reveal potential typos in your queries. Multiple queries or a selected portion of a query can be executed from the editor. Parameterized queries are supported, and the user is prompted for values at submission time. Impala queries can be saved and shared between users, or deleted and then restored from the trash in case of mistakes.
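To make the idea of a parameterized query concrete, here is a minimal Python sketch of the general technique: find the `$name` placeholders in the query text, then fill them in with user-supplied values. It only mimics the effect of the prompt and is not Hue's implementation. The helper names `placeholder_names` and `fill`, the `min_cool` parameter, and the sample values are assumptions made for illustration.

```python
from string import Template


def placeholder_names(query_text):
    """Return the $name placeholders found in the query, in order of appearance."""
    names = []
    for match in Template.pattern.finditer(query_text):
        # "named" matches $name, "braced" matches ${name}
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def fill(query_text, values):
    """Substitute every placeholder; raises KeyError if a value is missing."""
    return Template(query_text).substitute(values)


# Illustrative query text only; min_cool is a made-up parameter.
text = "SELECT * FROM review WHERE `date` = '$date' AND cool >= $min_cool"

print(placeholder_names(text))
print(fill(text, {"date": "2012-05-01", "min_cool": 3}))
```

Plain text substitution like this is convenient in an interactive editor, but code that builds queries from untrusted input should use bound parameters instead.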
Impala uses the same Metastore as Hive so you can browse tables with the Metastore app. You can also pick a database with a drop-down in the editor. After submission, progress and logs are reported and you can browse the result with infinite scroll or download the data with your browser.
Let’s start with the Hue examples as they are easily accessible. The datasets are very small, but they illustrate how quickly Impala responds compared with the series of MapReduce jobs that Hive creates for the same query.
Make sure the Hive and Impala examples are installed in Hue. Then, in each app, go to ‘Saved Queries’, copy the query ‘Sample: Top salaries’, and submit it.
Then we are back to our Yelp data. Let’s take the query from episode one and execute it in both apps:
SELECT r.business_id, name, SUM(cool) AS coolness
FROM review r
JOIN business b ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
  AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10
Again, you can see the benefits of Impala’s architecture and optimization.
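If you want to check what the Yelp query computes without a cluster, the following sketch runs the same statement against an in-memory SQLite database using Python's standard library. It says nothing about Hive or Impala performance. The table layouts and the handful of rows are invented for illustration, and SQLite stands in for the real engines only because it accepts this particular statement unchanged.

```python
import sqlite3
from string import Template

# The query from the article, with $date left as a placeholder.
QUERY = Template("""
SELECT r.business_id, name, SUM(cool) AS coolness
FROM review r
JOIN business b ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
  AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10
""")


def top_cool_restaurants(conn, date):
    """Fill in the $date parameter and return the result rows."""
    return conn.execute(QUERY.substitute(date=date)).fetchall()


conn = sqlite3.connect(":memory:")
# Toy schema and rows: assumptions, not the real Yelp dataset.
conn.executescript("""
CREATE TABLE business (business_id TEXT, name TEXT, categories TEXT);
CREATE TABLE review (business_id TEXT, cool INTEGER, `date` TEXT);
INSERT INTO business VALUES
  ('b1', 'Taco Stand', 'Mexican, Restaurants'),
  ('b2', 'Book Nook', 'Books'),
  ('b3', 'Noodle Bar', 'Restaurants');
INSERT INTO review VALUES
  ('b1', 2, '2012-05-01'),
  ('b1', 3, '2012-05-01'),
  ('b2', 9, '2012-05-01'),
  ('b3', 1, '2012-05-01'),
  ('b3', 7, '2012-06-01');
""")

rows = top_cool_restaurants(conn, "2012-05-01")
for row in rows:
    print(row)
```

With these toy rows, only restaurants reviewed on the chosen date appear, ranked by the sum of their `cool` votes; the bookshop is filtered out by the `categories` condition.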
The Impala Query UI application is one of the applications installed as part of Hue. For information about installing and configuring Hue, see Hue Installation in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html.
The Impala Query UI assumes an existing Impala installation. The Hue installation instructions include the configuration necessary for Impala. You can view the current configuration from the Settings tab.
Click the Impala Query UI icon in the navigation bar at the top of the Hue browser page.
You can create databases, tables, partitions, and load data by executing Hive data manipulation statements in the Beeswax application.
You can also use the Metastore Manager application to manage the databases, tables, and partitions and load data.
When you change the metastore using one of these applications, you must click the Refresh button under METASTORE CATALOG in the pane to the left of the Query Editor to make the metastore update visible to the Impala server.
The Query Editor view lets you create queries in the Impala Query Language, which is based on the Hive Standard Query Language (HiveQL) and described in the Impala Language Reference topic in Installing and Using Impala.
You can name and save your queries to use later.
When you submit a query, you can either wait for the query to complete, or return later to find the queries in the History view.
In the box to the left of the Query field, you can select a database, override the default Impala settings, and enable users to enter parameters at run time. See Advanced Query Settings for details on using these settings.
The pane to the left of the Query Editor lets you specify the following options:
You can view the history of queries that you have run previously. Results for these queries are available for one week or until Hue is restarted.
You can view a list of saved queries of all users either by clicking My Queries and then selecting the Recent Saved Queries or Recent Run Queries tab to display the respective queries, or by clicking Saved Queries. You can copy any query, but you can edit, delete, and view the history of only your own queries.