The Beeswax application enables you to perform queries on Apache Hive, a data warehousing system designed to work with Hadoop. For information about Hive, see Hive Documentation. You can create Hive databases, tables, and partitions; load data; create, run, and manage queries; and download the results as a Microsoft Office Excel worksheet or a comma-separated values (CSV) file.
Beeswax is installed and configured as part of Hue. For information about installing and configuring Hue, see the Hue Installation manual.
Beeswax assumes an existing Hive installation. The Hue installation instructions include the configuration necessary for Beeswax to access Hive. You can view the current Hive configuration from the Settings tab in the Beeswax application.
By default, a Beeswax user can see the saved queries of all users, both their own queries and those of other Beeswax users. To restrict viewing of saved queries to the query owner and Hue administrators, set the share_saved_queries property in the [beeswax] section of the Hue configuration file to false.
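The following minimal Python sketch shows what the setting looks like in a simplified fragment of the Hue configuration file. It parses that fragment with the standard library's configparser and falls back to the documented default (true) when the property is absent. The fragment and the helper name saved_queries_shared are hypothetical. A real Hue configuration file (commonly hue.ini) uses nested [[...]] subsections that configparser does not understand, so this sketch is not meant for reading a full file.

```python
import configparser

# Hypothetical, simplified fragment of the Hue configuration file. Only the
# [beeswax] section and the share_saved_queries property come from the docs.
# Lines are unindented because configparser treats indented lines specially.
HUE_CONFIG_EXCERPT = """
[beeswax]
share_saved_queries=false
"""


def saved_queries_shared(config_text: str) -> bool:
    """Return the effective share_saved_queries value from config text.

    Falls back to True (the documented default: every user can see all
    saved queries) when the [beeswax] section or the property is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getboolean("beeswax", "share_saved_queries", fallback=True)


print(saved_queries_shared(HUE_CONFIG_EXCERPT))  # False: owners and admins only
print(saved_queries_shared(""))                  # True: the default applies
```

Using fallback=True keeps the helper aligned with the documented behavior: sharing stays on unless the property is explicitly set to false.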
To open Beeswax, click the Beeswax icon in the navigation bar at the top of the Hue browser page.
You can create databases, tables, and partitions and load data by executing Hive data definition (DDL) and data manipulation (DML) statements in the Beeswax application.
You can also use the Metastore Manager application to manage databases, tables, and partitions and to load data.
Note: You must be a superuser to perform this task.
The Query Editor view lets you create, save, and submit queries in the Hive Query Language (HQL), which is similar to Structured Query Language (SQL). When you submit a query, the Beeswax Server uses Hive to run it. You can either wait for the query to complete or return later and find it in the History view. You can also request an email notification when the query completes.
In the pane to the left of the Query field, you can select a database, override the default Hive and Hadoop settings, specify file resources and user-defined functions, enable users to enter parameter values at run time, and request email notification when the job completes. See Advanced Query Settings for details on using these settings.
Do one of the following:
Click a query name. The query is loaded into the Query Editor.
Note: To run a query, you must be logged in to Hue as a user that also has a Unix user account on the remote server.
If there are multiple statements in the query, click Next in the Multi-statement query pane to execute the remaining statements.
Note: Under MR JOBS, you can view any MapReduce jobs that the query generated.
Important: This is the preferred way to save when the result is large (for example, more than 1 million rows).
Do any of the following to download or save the query results:
The pane to the left of the Query Editor lets you specify the following options:
DATABASE | The database containing the table definitions. |
SETTINGS | Override the Hive and Hadoop default settings. To configure a new setting: |
FILE RESOURCES | Make locally accessible files available on the Hadoop cluster at query execution time. Hive uses the Hadoop Distributed Cache to distribute the added files to all machines in the cluster at query execution time. |
USER-DEFINED FUNCTIONS | Specify user-defined functions. Click Add to configure a new setting. Specify the function name in the Name field and the class name in the Classname field. You *must* specify a JAR file for the user-defined functions in FILE RESOURCES. To include a user-defined function in a query, add a $ (dollar sign) before the function name in the query. For example, if MyTable is a user-defined function name in the query, you would type: SELECT $MyTable |
PARAMETERIZATION | Display a dialog box for entering parameter values when a query containing the string $parametername is executed. Enabled by default. |
EMAIL NOTIFICATION | Send an email message when the query completes. The message is sent to the email address specified in the logged-in user's profile. |
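The PARAMETERIZATION option can be illustrated with a minimal Python sketch. It assumes parameters are written as $name, as the table describes. The sketch finds each $name in a query, in the order in which it first appears (roughly what a parameter dialog would need to list), and then substitutes the values supplied for them. It relies on the standard library's string.Template, whose $identifier syntax happens to match. This is not how Hue implements the feature. The helper names, the query, and the table name sample_07 are hypothetical.

```python
from string import Template
from typing import Dict, List


def find_parameters(query: str) -> List[str]:
    """List the $name parameters in a query, in order of first appearance."""
    names: List[str] = []
    # Template.pattern is the compiled regex string.Template uses to
    # recognise $name, ${name}, and the $$ escape.
    for match in Template.pattern.finditer(query):
        name = match.group("named") or match.group("braced")
        if name and name not in names:
            names.append(name)
    return names


def bind_parameters(query: str, values: Dict[str, str]) -> str:
    """Substitute user-supplied values for each $name in the query.

    Values are inserted verbatim (no quoting or escaping), and
    Template.substitute raises KeyError if any parameter has no value.
    """
    return Template(query).substitute(values)


# Hypothetical query against a hypothetical sample_07 table.
query = "SELECT * FROM sample_07 WHERE salary > $min_salary LIMIT $row_limit"
print(find_parameters(query))  # ['min_salary', 'row_limit']
print(bind_parameters(query, {"min_salary": "50000", "row_limit": "10"}))
# SELECT * FROM sample_07 WHERE salary > 50000 LIMIT 10
```

Because string.Template treats $$ as a literal dollar sign and rejects a $ that is not followed by a valid name, its rules may be stricter than, or otherwise differ from, Hue's own substitution.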
You can view the history of queries that you have run previously. Results for these queries are available for one week or until Hue is restarted, whichever comes first.
You can view a list of the saved queries of all users in either of two ways: click My Queries and then select the Recent Saved Queries or Recent Run Queries tab, or click Saved Queries. You can copy any query, but you can edit, delete, and view the history of only your own queries.
Edit
Copy
Copy in Query History
Delete
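The visibility and ownership rules in this section (the share_saved_queries property, and the rule that anyone may copy a query but only the owner may edit, delete, or view its history) can be summarized in a short Python sketch. The User and SavedQuery types and the helper names are hypothetical and are not part of Hue. The sketch encodes only what the text states. For example, it does not grant administrators edit rights, because the text does not say they have them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    is_admin: bool = False  # True for Hue administrators


@dataclass(frozen=True)
class SavedQuery:
    name: str
    owner: str  # name of the user who saved the query


def can_view(user: User, query: SavedQuery, share_saved_queries: bool = True) -> bool:
    # Sharing on (the default): every user sees every saved query.
    # Sharing off: only the owner and Hue administrators see it.
    return share_saved_queries or user.is_admin or query.owner == user.name


def can_copy(user: User, query: SavedQuery, share_saved_queries: bool = True) -> bool:
    # Any query the user is able to see can be copied.
    return can_view(user, query, share_saved_queries)


def can_edit_delete_or_view_history(user: User, query: SavedQuery) -> bool:
    # Edit, Delete, and viewing a query's history are owner-only.
    return query.owner == user.name


alice, bob, admin = User("alice"), User("bob"), User("hue_admin", is_admin=True)
report = SavedQuery("top_salaries", owner="alice")
print(can_copy(bob, report))                               # True
print(can_edit_delete_or_view_history(bob, report))        # False
print(can_view(bob, report, share_saved_queries=False))    # False
print(can_view(admin, report, share_saved_queries=False))  # True
```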