Impala Query UI

The Impala Query UI application enables you to perform queries on Apache Hadoop data stored in HDFS or HBase using Impala. For information about Impala, see Installing and Using Impala. You can create, run, and manage queries, and download the results in a Microsoft Office Excel worksheet file or a comma-separated values file.

Hue, the Hadoop UI, has been supporting Impala closely since its first version and brings fast interactive queries within your browser. If not familiar with Impala, we recommend you to check the documentation of the fastest SQL engine for Hadoop.

Impala App

Most of Hive SQL is compatible with Impala and we are going to compare the queries of episode one in both Hive and Impala applications. Notice that this comparison is not 100% scientific but it demonstrates what would happen in common cases.


Using Impala through the Hue app is easier in many ways than using it through the command-line impala-shell. For example, table names, databases, columns, built-in functions are auto-completable and the syntax highlighting shows the potential typos in your queries. Multiple queries or a selected portion of a query can be executed from the editor. Parameterized queries are supported and the user will be prompted for values at submission time. Impala queries can be saved and shared between users or deleted and then restored from trash in case of mistakes.


Impala uses the same Metastore as Hive so you can browse tables with the Metastore app. You can also pick a database with a drop-down in the editor. After submission, progress and logs are reported and you can browse the result with infinite scroll or download the data with your browser.

Query speed comparison

Let’s start with the Hue examples as they are easily accessible. They are very small but show the lightning speed of Impala and the inefficiency of the series of MapReduce jobs created by Hive.

Make sure the Hive and Impala examples are installed in Hue and then in each app, go to ‘Saved Queries’, copy the query ‘Sample: Top salaries’ and submit it.

Then we are back to our Yelp data. Let’s take the query from episode one and execute it in both apps:

SELECT r.business_id, name, SUM(cool) AS coolness
FROM review r JOIN business b
ON (r.business_id = b.business_id)
WHERE categories LIKE '%Restaurants%'
AND `date` = '$date'
GROUP BY r.business_id, name
ORDER BY coolness DESC
LIMIT 10

Again, you can see the benefits of Impala’s architecture and optimization.

Impala Query UI and Installation and Configuration

The Impala Query UI application is one of the applications installed as part of Hue. For information about installing and configuring Hue, see Hue Installation in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/CDH4-Installation-Guide.html.

The Impala Query UI assumes an existing Impala installation. The Hue installation instructions include the configuration necessary for Impala. You can view the current configuration from the Settings tab.

Starting Impala Query UI

Click the Impala Query UI icon () in the navigation bar at the top of the Hue browser page.

Managing Databases, Tables, and Partitions

You can create databases, tables, partitions, and load data by executing Hive data manipulation statements in the Beeswax application.

You can also use the Metastore Manager application to manage the databases, tables, and partitions and load data.

When you change the metastore using one of these applications, you must click the Refresh button under METASTORE CATALOG in the pane to the left of the Query Editor to make the metastore update visible to the Impala server.

Installing Example Queries and Tables

  Note: You must be a superuser to perform this task.
  1. Click . The Quick Start Wizard opens.
  2. Click Step 2: Examples.
  3. Click Impala Query UI.

Queries

The Query Editor view lets you create queries in the Impala Query Language, which is based on the Hive Standard Query Language (HiveQL) and described in the Impala Language Reference topic in Installing and Using Impala.

You can name and save your queries to use later.

When you submit a query, you can either wait for the query to complete, or return later to find the queries in the History view.

In the box to the left of the Query field, you can select a database, override the default Impala settings, enable users to enter parameters at run-time. See Advanced Query Settings for details on using these settings.

Creating Queries

  1. In the Query Editor window, type a query or multiple queries separated by a semicolon ";". To be presented with a drop-down of autocomplete options, type CTRL+spacebar when entering a query.
  2. To save your query and advanced settings to use again later, click Save As, enter a name and description, and then click OK. To save changes to an existing query, click Save.

Loading Queries into the Query Editor

  1. Do one of the following:
    • Click the My Queries tab.
      1. Click the Recent Saved Queries or Recent Run Queries tab to display the respective queries.
    • Click the Saved Queries tab.
  2. Click a query name. The query is loaded into the Query Editor.

Running Queries

  Note: To run a query, you must be logged in to Hue as a user that also has a Unix user account on the remote server.
  1. To execute a portion of the query, highlight one or more query statements.
  2. Click Execute. The Query Results window appears with the results of your query.
    • To view a log of the query execution, click Log at the top of the results display. You can use the information in this tab to debug your query.
    • To view the query that generated these results, click Query at the top of the results display.
    • To view the columns of the query, click Columns.
    • To return to the query in the Query Editor, click Unsaved Query.
  3. If there are multiple statements in the query, click Next in the Multi-statement query pane to execute the remaining statements.

Advanced Query Settings

The pane to the left of the Query Editor lets you specify the following options:

Option

Description

DATABASE

The database containing the table definitions.

SETTINGS

Override Impala default settings. To configure a new setting:
  1. Click Add.
  2. For Key, enter a Impala configuration variable name.
  3. For Value, enter the value you want to use for the variable.

To view the default settings, click the Settings tab at the top of the page.

PARAMETERIZATION

Indicate that a dialog box should display to enter parameter values when a query containing the string $parametername is executed. Enabled by default.

METASTORE CATALOG

Refresh metadata. It is best to  refresh  metadata after making changes to databases such as adding or dropping a table.

Viewing Query History

You can view the history of queries that you have run previously. Results for these queries are available for one week or until Hue is restarted.

  1. Click History. A list of your saved and unsaved queries displays in the Query History window.
  2. To display the queries for all users, click Show everyone's queries. To display your queries only, click Show my queries.
  3. To display the automatically generated actions performed on a user's behalf, click Show auto actions. To display user queries again, click Show user queries.

Viewing, Editing, Copying, and Deleting Saved Queries

You can view a list of saved queries of all users by clicking My Queries and then selecting either Recent Saved Queries or Recent Run Queries tab to display the respective queries or clicking Saved Queries. You can copy any query, but you can edit, delete, and view the history of only your own queries.

Saved Query Procedure

Edit

  1. Click Saved Queries. The Queries window displays.
  2. Check the checkbox next to the query and click Edit. The query displays in the Query Editor window.
  3. Change the query and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

Copy

  1. Click Saved Queries. The Queries window displays.
  2. Check the checkbox next to the query and click Copy. The query displays in the Query Editor window.
  3. Change the query as necessary and then click Save. You can also click Save As, enter a new name, and click OK to save a copy of the query.

Copy in Query History

  1. Click History. The Query History window displays.
  2. To display the queries for all users, click Show everyone's queries. The queries for all users display in the History window.
  3. Click the query you want to copy. A copy of the query displays in the Query Editor window.
  4. Change the query, if necessary, and then click Save As, enter a new name, and click OK to save the query.

Delete

  1. Click Saved Queries. The Queries window displays.
  2. Check the checkbox next to the query and click Delete.
  3. Click Yes to confirm the deletion.