Data preparation
We prepare the data by removing or filling in missing values (e.g. with the mode or with the minimum or maximum value). We then run appropriate statistical tests to select a set of features that lets the machine learning algorithms make reliable predictions.
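As a rough illustration of the two options for missing values, the following sketch uses Pandas on a small invented table. The column names and values are hypothetical, and the choice of fill strategy per column (mode for the categorical column, minimum for the numeric one) is only an assumption for the example.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "region": ["north", "south", None, "south", "south"],
    "revenue": [120.0, None, 95.0, 130.0, 110.0],
})

# Option 1: remove every row that contains a missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill the gaps instead.
filled = df.copy()
# Categorical column: fill with the most frequent value (the mode).
filled["region"] = filled["region"].fillna(filled["region"].mode()[0])
# Numeric column: fill with the column minimum (assumed strategy).
filled["revenue"] = filled["revenue"].fillna(filled["revenue"].min())
```

Dropping rows is the simpler option but discards information, while filling keeps every row at the cost of introducing estimated values. Which one fits depends on how much data is missing.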
Data visualization
We visualize the data in scatter plots or boxplots to reveal insights. We present your data visually in a form that is suitable for your management and that you can use for your business decisions. In addition, a confusion matrix gives you insight into the quality of the machine learning algorithms.
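To show what a confusion matrix contains, here is a minimal sketch using Scikit-Learn. The labels and predictions are invented for the example. Rows correspond to the true classes and columns to the predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions for six customers.
y_true = ["churn", "stay", "stay", "churn", "stay", "stay"]
y_pred = ["churn", "stay", "churn", "churn", "stay", "stay"]

# Fix the class order so the rows and columns are easy to read.
labels = ["churn", "stay"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# cm[i][j] counts samples of true class labels[i] predicted as labels[j].
print(cm)
```

In this toy case both "churn" customers are recognized, and one of the four "stay" customers is wrongly predicted as "churn".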
Modeling
After data preparation and data visualization, we build machine learning models whose predictions can support your future business decisions. We perform hyperparameter tuning to find the best settings for each machine learning model.
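Hyperparameter tuning can be sketched with a cross-validated grid search in Scikit-Learn. The dataset (the bundled Iris data), the decision tree model and the parameter grid below are assumptions chosen for illustration, not a description of a specific project setup.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Example data shipped with Scikit-Learn (stand-in for real project data).
X, y = load_iris(return_X_y=True)

# Hypothetical grid of candidate settings to compare.
param_grid = {"max_depth": [1, 2, 3, 4], "min_samples_leaf": [1, 5]}

# Evaluate every combination with 5-fold cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_  # best combination found in the grid
best_score = search.best_score_    # its mean cross-validated accuracy
print(best_params, round(best_score, 3))
```

A grid search only finds the best setting among the candidates listed, so the grid itself is a design choice. For larger search spaces, a randomized search is a common alternative.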
Python, Scikit-Learn, Pandas, SQL and Scrum
For data analysis we use the Python programming language. We use Pandas for data preparation and data visualization. For machine learning tasks such as clustering, classification and prediction we use the free software library Scikit-Learn. We use SQL to query the data. This allows you to incorporate our machine learning models into your business.
For project execution we can work with the agile project management framework Scrum.
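The way SQL and Pandas fit together can be sketched as follows. The example assumes an in-memory SQLite database with a hypothetical `orders` table. In a real project the connection would point to your own database instead.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10.0), ("b", 25.0), ("a", 5.0)],
)
conn.commit()

# Run the SQL query and load the result directly into a DataFrame.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""
totals = pd.read_sql_query(query, conn)
conn.close()

print(totals)
```

The resulting DataFrame can then be passed on to the preparation, visualization and modeling steps.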
Microsoft Azure and Microsoft Azure HDInsight
We also perform data preparation, data visualization and prediction with Microsoft Azure. We build models and, on request, offer you a web service that lets you use the machine learning algorithms in Microsoft Excel, for example. For Big Data workloads we use Microsoft Azure HDInsight.
Apache Hadoop, Apache Hive, Apache Pig, Apache Spark
We can use Apache Hadoop to store Big Data on clusters and read it back. To access the data, we can use Apache Hive or Apache Pig. If desired, we can use Apache Spark for processing Big Data and for machine learning on it.