Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
Version 8.0 has been released. Get it here or with Docker.
This release adds the capability to use pre-trained scikit-learn, Keras or REST API based models with Qlik. More on this here.
- Introduction
- Note on the approach
- Docker Image
- Pre-requisites
- Installation
- Usage
- Qonnections 2019 Workshop
Qlik's advanced analytics integration provides a path to making modern data science algorithms accessible to the wider business audience. This project is an attempt to show what's possible.
This repository provides a server side extension (SSE) for Qlik Sense built using Python. The intention is to provide a set of functions for data science that can be used as expressions in Qlik.
Sample Qlik Sense apps are included and explained so that the techniques shown here can be easily replicated.
The current implementation includes:
- Supervised Machine Learning: Implemented using scikit-learn, the go-to machine learning library for Python. This SSE implements the full machine learning flow from data preparation, model training and evaluation, to making predictions in Qlik.
- Unsupervised Machine Learning: Also implemented using scikit-learn. This provides capabilities for dimensionality reduction and clustering.
- Deep Learning: Implemented using Keras and TensorFlow. This SSE implements the full flow of setting up a neural network, training and evaluating it, and using it to make predictions. Deep Learning models can be used for sequence predictions and complex time series forecasting.
- Use of pre-trained ML models in Qlik: Pre-trained scikit-learn, Keras and REST API based models can be called from this SSE, allowing predictions to be exposed within the broader analysis and business context of a Qlik app. The implementation also allows for what-if analysis using the models.
- Named Entity Recognition: Implemented using spaCy, an excellent Natural Language Processing library that comes with pre-trained neural networks. This SSE allows you to use spaCy's models for Named Entity Recognition or retrain them with your data for even better results.
- Association rules: Implemented using Efficient-Apriori. Association Rules Analysis is a data mining technique to uncover how items are associated with each other. This technique is best known for Market Basket Analysis, but can be used more generally for finding interesting associations between sets of items that occur together, for example, in a transaction, a paragraph, or a diagnosis.
- Clustering: Implemented using HDBSCAN, a high performance algorithm that is great for exploratory data analysis.
- Time series forecasting: Implemented using Facebook Prophet, a modern library for easily generating good quality forecasts. Now with the ability to use multiple regressors as input.
- Seasonality and holiday analysis: Also using Facebook Prophet.
- Linear correlations: Implemented using Pandas.
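As a rough illustration of the support and confidence measures that association rule mining relies on, the sketch below computes both for a single rule over a few made-up baskets. It is plain Python for illustration only and does not use or reproduce the Efficient-Apriori API; the function name `rule_metrics` and the sample data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of transactions containing antecedent and consequent
    confidence = share of antecedent transactions that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    # Transactions that contain every item of the rule.
    both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    # Transactions that contain the antecedent, with or without the consequent.
    with_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    support = both / total if total else 0.0
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Made-up sample data, not taken from the sample apps.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here two of the four baskets contain both items and three contain bread, so the rule "bread -> butter" has a support of 0.5 and a confidence of 2/3.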
Further information on these features is available through the Usage section below.
For more information on Qlik Server Side Extensions see qlik-oss.
Disclaimer: This project has been started by me in a personal capacity and is not supported by Qlik.
Forecasting, Clustering & Supervised Machine Learning:
Deep Learning & Additional Regressors with Prophet:
Clustering COVID-19 Literature:
In this project we have defined functions that expose open source algorithms to Qlik using the gRPC framework. Each function allows the user to define input data and parameters to control the underlying algorithm's output.
While native Python script evaluation is possible in Qlik as demonstrated in the qlik-oss Python examples, I have disabled this functionality in this project.
I prefer this approach for two key reasons:
- Separation of the Python implementation from usage in Qlik: App authors in Qlik just need to be able to use the functions, and understand the algorithms at a high level. Any complexity such as handling missing values or scaling the data is abstracted to simple parameters passed in the Qlik expression.
- Security: This server side extension cannot be used to execute arbitrary code from Qlik. Users are restricted to the algorithms exposed through this SSE. Security can be further enhanced by running the SSE on a separate, sandboxed machine, and securing communication with certificates.
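The idea of exposing a fixed set of functions instead of evaluating scripts can be sketched as a simple allow-list dispatcher. This is a minimal illustration and not this project's actual code: the registry `EXPOSED_FUNCTIONS`, the helper `call_exposed` and the two sample functions are all hypothetical.

```python
from statistics import mean


def _mean(values):
    """Average of the supplied numbers."""
    return mean(values)


def _value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical registry: only the names listed here can be invoked by a caller.
EXPOSED_FUNCTIONS = {
    "Mean": _mean,
    "Range": _value_range,
}


def call_exposed(name, values):
    """Run an exposed function by name; refuse anything outside the registry."""
    try:
        func = EXPOSED_FUNCTIONS[name]
    except KeyError:
        raise PermissionError(f"Function {name!r} is not exposed") from None
    return func(values)


print(call_exposed("Range", [1, 5, 3]))
```

Because a caller can only pick a name from the registry and pass data to it, a request for anything else (for example `call_exposed("exec", [...])`) is rejected rather than evaluated.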
A Docker image for qlik-py-tools is available on Docker Hub. If you are familiar with containerisation this is the simplest way to get this SSE running in your environment.
If you want to install this SSE locally on a Windows machine, you can jump to the Pre-requisites section.
To pull the image from Docker's public registry use the command below:
docker pull nabeeloz/qlik-py-tools
The image uses port 50055 by default. You can add encryption using certificates as explained here.
docker run -p 50055:50055 -it nabeeloz/qlik-py-tools
Containers built with this image only retain data while they are running. This means that to persist trained models or log files you will need to add a volume or bind mount using Docker capabilities for managing data.
# Store predictive models to a Docker volume on the host machine
docker run -p 50055:50055 -it -v pytools-models:/qlik-py-tools/models nabeeloz/qlik-py-tools

# Store log files to a bind mount on the host machine
docker run -p 50055:50055 -it -v ~/Documents/logs:/qlik-py-tools/core/logs nabeeloz/qlik-py-tools

# Run a container in detached mode, storing predictive models on a volume and logs on a bind mount
docker run \
    -p 50055:50055 \
    -d \
    -v pytools-models:/qlik-py-tools/models \
    -v ~/Documents/logs:/qlik-py-tools/core/logs \
    nabeeloz/qlik-py-tools

# Run a container in detached mode, storing predictive models on a volume, logs on a bind mount and restart the container on reboot
docker run \
    -p 50055:50055 \
    -d \
    --restart unless-stopped \
    -v pytools-models:/qlik-py-tools/models \
    -v ~/Documents/logs:/qlik-py-tools/core/logs \
    nabeeloz/qlik-py-tools

# Run a container in detached mode, restart on reboot, store models and logs to bind mounts, and use certificates for secure communication
docker run \
    -p 50055:50055 \
    -d \
    --restart unless-stopped \
    --name qlik-py-tools \
    -v ~/sse_PyTools_generated_certs/sse_PyTools_server_certs:/qlik-py-tools/pem-dir \
    -v ~/Documents/models:/qlik-py-tools/models \
    -v ~/Documents/logs:/qlik-py-tools/core/logs \
    nabeeloz/qlik-py-tools python __main__.py --pem_dir=/qlik-py-tools/pem-dir
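If you start the container from a script rather than typing the command, the same options can be assembled programmatically. The helper below is a hypothetical convenience sketch (`docker_run_args` is not part of this project); it only uses the `docker run` flags already shown above and returns an argument list that could be handed to `subprocess.run`.

```python
def docker_run_args(image, port=50055, volumes=None, detached=False, restart=None):
    """Build a `docker run` argument list from the options used in this README.

    volumes maps a volume name or host path to the path inside the container.
    """
    args = ["docker", "run", "-p", f"{port}:{port}"]
    # -d runs the container in the background, -it keeps it attached to the terminal.
    args.append("-d" if detached else "-it")
    if restart:
        args += ["--restart", restart]
    for source, target in (volumes or {}).items():
        args += ["-v", f"{source}:{target}"]
    args.append(image)
    return args


args = docker_run_args(
    "nabeeloz/qlik-py-tools",
    volumes={"pytools-models": "/qlik-py-tools/models"},
    detached=True,
    restart="unless-stopped",
)
print(" ".join(args))
```

Note that a `~` in a bind mount path is expanded by the shell, not by Docker, so when passing the list to `subprocess.run` you would expand it first, for example with `os.path.expanduser`.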
- Qlik Sense Enterprise or Qlik Sense Desktop
- Python >= 3.4 <= 3.6.9. The recommended version is 3.6.8.
  - Note: The latest stable version of Python for this SSE is 3.6. The `pystan` library, which is required for `fbprophet`, is known to have issues with Python 3.7 on Windows.
- Microsoft Visual C++ Build Tools
This installation requires Internet access. To install this SSE on a machine without Internet access refer to the offline installation guide.
Get Python from here. Make sure you get the 64 bit version. Remember to select the option to add Python to your PATH environment variable.
You'll also need a recent C++ compiler as this is a requirement for the `pystan` library used by `fbprophet`. One option is to use Microsoft Visual C++ Build Tools. If you are having trouble finding the correct installer try this direct link. An alternative is to use the `mingw-w64` compiler as described in the PyStan documentation.
Download the latest release for this SSE and extract it to a location of your choice. The machine where you are placing this repository should have access to a local or remote Qlik Sense instance.
Right-click `Qlik-Py-Init.bat` and choose 'Run as Administrator'. You can open this file in a text editor to review the commands that will be executed. If everything goes smoothly you will see a Python virtual environment being set up, project files being copied, some packages being installed and TCP port `50055` being opened for inbound communication.
- Note that the script always ends with an "All done" message and does not check for errors.
- If you need to change the port you can do so in the file `core\__main__.py` by opening the file with a text editor, changing the value of the `_DEFAULT_PORT` variable, and then saving the file. You will also need to update `Qlik-Py-Init.bat` to use the same port in the `netsh` command. This command will only work if you run the batch file through an elevated command prompt (i.e. with administrator privileges).
- Once the execution completes, do a quick scan of the log to check that everything installed correctly. The libraries imported are: `grpcio`, `grpcio-tools`, `numpy`, `scipy`, `pandas`, `cython`, `joblib`, `pyyaml`, `pystan`, `fbprophet`, `scikit-learn`, `hdbscan`, `spacy`, `efficient-apriori`, `tensorflow`, `keras` and their dependencies. Also, check that the `core` and `generated` directories have been copied successfully to the newly created `qlik-py-env` directory.
- If the initialization fails for any reason, you can simply delete the `qlik-py-env` directory and re-run `Qlik-Py-Init.bat`.
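Since the initialization script does not check for errors, one way to complement a scan of the log is to ask Python which of the expected packages it can locate. The sketch below is an assumption-laden helper, not part of this project: `find_missing` is a hypothetical name, and the module list uses the usual import names for the packages above (for example `grpc` for `grpcio`, `yaml` for `pyyaml`, `sklearn` for `scikit-learn`), which may differ between package versions. Run it with the interpreter of the newly created virtual environment.

```python
import importlib.util

# Usual import names for the packages installed by the init script (assumed).
EXPECTED_MODULES = [
    "grpc", "grpc_tools", "numpy", "scipy", "pandas", "Cython", "joblib",
    "yaml", "pystan", "fbprophet", "sklearn", "hdbscan", "spacy",
    "efficient_apriori", "tensorflow", "keras",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    # find_spec locates a top-level module without importing it.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

A module being found only shows that it is installed; it does not prove that it imports cleanly, so treat this as a first check only.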
Now whenever you want to start this Python service you can run `Qlik-Py-Start.bat`.
Now you need to set up an Analytics Connection in Qlik Sense Enterprise or update the Settings.ini file in Qlik Sense Desktop. If you are using the sample apps make sure you use `PyTools` as the name for the analytics connection, or alternatively, update all of the expressions to use the new name.
- For Qlik Sense Desktop you need to update the `settings.ini` file. There may be two copies of this file; one at `C:/Users/<User ID>/Documents/Qlik/Sense/` and another at `C:/Users/AppData/Local/Programs/Qlik/Sense/Engine`. Add the SSE settings to both files: `SSEPlugin=PyTools,localhost:50055;`
- For Qlik Sense Enterprise you need to create an Analytics Connection through QMC.
- The Analytics Connection can point to a different machine and can be secured with certificates.
Finally restart the Qlik Sense engine service for Qlik Sense Enterprise or close and reopen Qlik Sense Desktop. This step may not be required if you are using Qlik Sense April 2018 and beyond.
If a connection between Python and Qlik is established you should see the capabilities listed in the terminal.
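If nothing shows up in the terminal, a quick first check is whether the SSE port accepts TCP connections at all. The snippet below is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and the host and port assume the default local setup described above. It only tests that something is listening, not that the gRPC service itself is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Default port used by this SSE; change it if you edited _DEFAULT_PORT.
    print("SSE port reachable:", is_port_open("localhost", 50055))
```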
Capabilities may change as this is an ongoing project.
We go into the details of each capability in the sections below.
Sample Qlik Sense apps are provided, and each app demonstrates a range of techniques for using this SSE's capabilities in Qlik.
Most of the sample apps require the Dashboard Extension Bundle which was released with Qlik Sense November 2018.
Documentation | Sample App | Additional App Dependencies |
---|---|---|
Correlations | Correlations | None. |
Clustering | Clustering with HDBSCAN | None. |
Predictions with pretrained models | Predictions with scikit-learn and Keras | Follow the pre-requisites and steps in the documentation. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. |
Machine Learning | Train & Test, Predict, K-fold Cross Validation, Parameter Tuning, K-fold CV & Parameter Tuning, Complex Forecasting with scikit-learn | Make sure you reload the K-fold Cross Validation or Train & Test app before using the Predict app. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. The forecasting app is best understood together with the Deep Learning section below. Here we just use more traditional ML algorithms rather than Deep Learning for producing the forecast. Make sure you reload the app before using the final sheets to make predictions. The data source for this app can be found here. |
Deep Learning | Complex Forecasting with Keras | Make sure you reload the app before using the final two sheets to make predictions. If using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. |
Forecasting | Facebook Prophet (Detailed), Facebook Prophet (Simple), Facebook Prophet (Multiple regressors) | For the detailed app, use the bookmarks to step through the sheets with relevant selections. For calling Prophet through the load script refer to the simple app. If you want to reload the app using Qlik Sense Desktop you will need to download the data source, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source file. For the use of Prophet's additional regressors capability refer to the multiple regressors app. The data for this app is found here. |
Named Entity Recognition | NER and Association Rules | If using Qlik Sense Desktop you will need to download the data sources, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source files. |
Association Rules / Market Basket Analysis | NER and Association Rules, Market Basket Analysis | If using Qlik Sense Desktop you will need to download the data sources, create a data connection named AttachedFiles in the app, and point the connection to the folder containing the source files. |
At Qonnections 2019 we ran hands-on workshops with PyTools and Qlik Sense. The content for these workshops, including the sample apps and exercise instructions, is available here.
The workshop exercises can be used as a tutorial for using this Server Side Extension with Qlik Sense Enterprise or Desktop.