Statistics

We examine the characteristics of your data using descriptive and inferential statistics. We run statistical tests on the relationships between dependent and independent variables. In Big Data settings we pay particular attention to appropriate sampling, so that conclusions drawn from the sample hold for the population. Descriptive and inferential statistics make it possible, for example, to identify what the buying behavior of your customers depends on. For this purpose we create a matrix of correlation coefficients and discuss it with you, so that we can work together on optimizing features for Machine Learning. This helps our machine learning models learn as much as possible from the data.
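As a minimal sketch, the coefficient matrix described above can be computed with Pandas as a matrix of pairwise correlation coefficients. Reading it as a Pearson correlation matrix is an assumption, and the column names and the synthetic data are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic, made-up customer data (illustration only, not real results).
rng = np.random.default_rng(0)
n = 200
visits = rng.poisson(5, size=n)
basket_value = 10.0 * visits + rng.normal(0, 5, size=n)  # depends on visits
age = rng.integers(18, 70, size=n)                       # independent of the rest

df = pd.DataFrame({"visits": visits, "basket_value": basket_value, "age": age})

# Pairwise Pearson correlation coefficients between all numeric features.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Features that correlate strongly with the target are candidates to keep, while features that correlate strongly with each other are candidates to merge or drop.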

Clustering

We identify clusters in your data. These clusters can help you define distinct customer groups and target your marketing and product portfolio accordingly. For text data, we use machine learning algorithms to extract topics automatically and to analyze the similarity between texts. To prepare the data, we use techniques such as Optical Character Recognition (OCR) to extract text from images. For large datasets, we use word embeddings or neural networks. For clustering we use the k-means algorithm, and for topic extraction we use LDA (Latent Dirichlet Allocation).
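A minimal sketch of customer clustering with k-means in Scikit-Learn follows. The two features (orders per year, average basket value), the choice of two clusters and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two synthetic customer groups: columns are (orders per year, average basket value).
occasional = rng.normal(loc=[2, 20], scale=[0.5, 3], size=(50, 2))
frequent = rng.normal(loc=[12, 80], scale=[1.0, 5], size=(50, 2))
X = np.vstack([occasional, frequent])

# k-means is distance based, so put both features on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)  # one cluster index per customer

print(np.bincount(labels))  # number of customers in each cluster
```

In practice the number of clusters is not known in advance and would have to be chosen, for example by comparing several values of `n_clusters`.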

Classification

We use Machine Learning to classify your data automatically. This is particularly useful if you want an algorithm to identify the type of a customer and want to base business decisions on that result. The data can be, for example, your customers' purchase data or text data, and a machine learning algorithm assigns each record to a category. We use logistic regression for classification.
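A minimal sketch of such a classification with logistic regression in Scikit-Learn is shown here. The features, the "frequent buyer" label and the synthetic data are hypothetical and only serve as an illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 300

# Synthetic purchase data: number of orders and average basket value.
orders = rng.integers(0, 20, size=n)
avg_basket = rng.normal(50, 15, size=n)
X = np.column_stack([orders, avg_basket])

# Hypothetical category: 1 = "frequent buyer" (more than 10 orders), 0 = other.
y = (orders > 10).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit the logistic regression classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)            # share of correct test predictions
probabilities = clf.predict_proba(X_test[:3])   # class probabilities per customer
print(accuracy, probabilities)
```

The class probabilities can be more useful for business decisions than the hard labels, because they show how certain the model is about each customer.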

Prediction

We use the data and the features derived from it to predict purchases or reviews of your products, based on one or several features. We optimize the features and select the machine learning algorithm that best predicts the buying behavior of your customers. We use simple and multiple linear regression to model linear dependencies in the data. We use Support Vector Machines (SVM), testing different kernels, to separate the data into classes and so support quick business decisions. We use tree-based methods such as decision trees and Random Forest, which learn from previous decisions in order to make or suggest future ones.
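Testing different SVM kernels can be sketched as a small cross-validated comparison. The synthetic dataset, the three kernels and the 5-fold setup are assumptions for illustration, not a fixed procedure.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data with a curved class boundary (illustration only).
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    # Scaling is part of the pipeline so it is refit inside every fold.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

# Kernel with the highest mean cross-validated accuracy.
best_kernel = max(scores, key=scores.get)
print(scores, best_kernel)
```

Which kernel wins depends on the shape of the data, which is why the comparison is run on the actual features rather than decided up front.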

Data preparation

We prepare the data by removing missing values or filling them in (e.g. with the mode, min-max, etc.). We run appropriate statistical tests to arrive at a suitable set of features, so that the machine learning algorithms produce reliable predictions.
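A minimal sketch of this preparation step with Pandas and Scikit-Learn follows. Several points are assumptions: "modal" is read as filling gaps with the most frequent value, "MIN-MAX" is read as min-max scaling to the range 0 to 1, numeric gaps are filled with the median, and the column names and values are invented.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small made-up table with one gap per column.
df = pd.DataFrame({
    "segment": ["A", "B", np.nan, "B", "B"],
    "basket_value": [10.0, np.nan, 30.0, 50.0, 20.0],
})

# Option 1: remove every row that has a missing value.
dropped = df.dropna()

# Option 2: fill the gaps instead of removing rows.
filled = df.copy()
filled["segment"] = filled["segment"].fillna(filled["segment"].mode()[0])  # mode
filled["basket_value"] = filled["basket_value"].fillna(
    filled["basket_value"].median()  # median (assumption, see lead-in)
)

# Min-max scaling: map the numeric column linearly onto [0, 1].
scaler = MinMaxScaler()
filled["basket_value_scaled"] = scaler.fit_transform(filled[["basket_value"]]).ravel()

print(dropped)
print(filled)
```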

Data visualization

We visualize the data in scatter plots or boxplots to reveal insights. We present your data visually in a form that works for your management and that you can use for your business decisions. In addition, a confusion matrix gives you insight into the quality of the machine learning algorithms.
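A minimal sketch of how a confusion matrix summarizes model quality is shown here. The class names and the true and predicted labels are made up for illustration.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up true classes and model predictions for six customers.
y_true = ["buyer", "buyer", "non-buyer", "non-buyer", "buyer", "non-buyer"]
y_pred = ["buyer", "non-buyer", "non-buyer", "non-buyer", "buyer", "buyer"]
labels = ["buyer", "non-buyer"]

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Wrap the counts in a labeled table so they are easier to read.
cm_table = pd.DataFrame(
    cm,
    index=[f"true {name}" for name in labels],
    columns=[f"predicted {name}" for name in labels],
)
print(cm_table)
```

The diagonal holds the correct predictions. The off-diagonal cells show which classes the model confuses with each other.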

Modeling

After data preparation and data visualization, we build machine learning models that can be used to make predictions in support of your future business decisions. We perform hyperparameter tuning to find the best settings for each machine learning model.
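Hyperparameter tuning can be sketched as a cross-validated grid search in Scikit-Learn. The model (a Random Forest classifier), the parameter grid and the synthetic data are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification data (illustration only).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Hypothetical grid: every combination is evaluated with 3-fold cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)  # best combination found in the grid
print(search.best_score_)   # its mean cross-validated accuracy
```

After fitting, `search.best_estimator_` is the model refit on all data with the best settings and can be used directly for prediction.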

Python, Scikit-Learn, Pandas, SQL and Scrum

For data analysis we use the Python programming language. We use Pandas for data preparation and data visualization. For machine learning tasks such as clustering, classification and prediction we use the free software library Scikit-Learn. We use SQL to query the data. This allows you to incorporate our Machine Learning models into your business. For project execution we can work with the agile project management method Scrum.
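A minimal sketch of querying data with SQL and loading the result into Pandas follows. The in-memory SQLite database, the `purchases` table and its columns are hypothetical stand-ins for a real data source.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database as a stand-in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 20.0), (1, 35.0), (2, 10.0)],
)
conn.commit()

# Aggregate in SQL, then continue the analysis in Pandas.
query = """
    SELECT customer_id, SUM(amount) AS total
    FROM purchases
    GROUP BY customer_id
    ORDER BY customer_id
"""
totals = pd.read_sql_query(query, conn)
conn.close()

print(totals)
```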

Microsoft Azure and Microsoft Azure HDInsight

We also perform data preparation, data visualization and prediction with Microsoft Azure. On request, we build models and offer them to you as a web service, so that you can use the machine learning algorithms in Microsoft Excel, for example. We use Microsoft Azure HDInsight for Big Data workloads.

Apache Hadoop, Apache Hive, Apache Pig, Apache Spark

We can use Apache Hadoop for storing Big Data on clusters and reading it back. For accessing the data, we can use Apache Hive or Apache Pig. If desired, we can use Apache Spark for processing Big Data and for machine learning on it.

Data Science Consultant
Consulting in Data Science, Machine Learning and Text Mining

Our services in Data Science, Machine Learning and Text Mining

Our approach in Data Science, Machine Learning and Text Mining

Our technologies in Data Science, Machine Learning and Text Mining
