Apache Kylin is very powerfull OLAP engine. It supports ODBC driver to move the data in excel, however this driver is not user friendly. Users should wright sql queries for this.
Typically business user likes to use “select and click” approach. This functionality is supported in Excel Pivot. Excel Pivot supports XML for Analysis (XMLA) datasources. Open source implementation of this data source is XMLA Connect. XMLA Connect can connect to mondrian. Mondrian is like driver, which converts MDX requests into SQL requests, which understand JDBC driver. Apache Kylin has JDBC driver. Then we will have:
Excel Pivot => XMLA Connect => Mondrian (on tomcat) => Apache Kylin JDBC => Apache Kylin Cube => Hbase => Hive => *.avro files on HDFS
The following steps are performed, when you connect to Apache Kylin from Excel. XMLA Connect sends SOAP MDX query request for data to Web Server, which hosts Mondrian Servlet. Mondrian Servlet calls Mondrian functionlity to convert MDX query into ANSI SQL query. ANSI SQL query goes further to JDBC driver from Kylin, which in its turn connects Apache Kylin OLAP cube.
Let’s put all parts together.
- You need download XMLA Connect and install it in excel.
- Then we need mondrian web application. You can use this one. You should install tomcat and put this application there. In XMLA Connect you need to define path to xmla, for example: http://localhost:8080/xmondrian/xmla. You can test this with sample databases available in xmondrian.
- On the next step we need to connect xmondrian with Kylin. For this step we need Kylin jdbc driver. You can download it here. You need to extrat jar and put it in WEB-INF folder for xmondrian. There is a patch in github to mondrian, which adds support of Kylin to Mondrian. But you can implement your own Kylin dialect for Mondrian how it is described here.
- Here you should define and load Kylin Cubes and define these cubes in Mondrian using mondrian.xml.
It looks complicated, but it works. As a result you will have easy way of quering Kylin Cubes in excel.
You can read other articles from BigData area:
Short note about HDFS or why you need distributed file system
Thoughts about schema-on-write and schema-on-read
Raw, clean, and derived data in data lakes based on HDFS
Improving performance by reading data with Hive for HDFS using subfolders (partitioning)
HBase is the next step in your BigData technology stack
Schema evolution and backward and forward compability for data in data lakes
Apache Kylin OLAP:
How to integrate Apache Kylin OLAP In Excel (pivot) [XMLA Connect and Mondrian]
How to implement Kylin Dialect for Mondrian
Authentication and Authorizaton for XMLA Connect and Mondrian