# Data Science & Machine Learning Newsletter #104

Posted on Fri 02 February 2018 in Data Science & Machine Learning Newsletter

**Do you want to get updates? Please join the Data Science & Machine Learning Newsletter LinkedIn group.**

- Visualizing Incomplete and Missing Data
- "... But a lot of the time (most of the time?), the data you work with is not complete. There is missing data. ... What do you do when this happens? ... Here are some solutions to get you headed in the right direction."

- A Gentle Introduction to Vectors for Machine Learning
- "Vectors are a foundational element of linear algebra. Vectors are used throughout the field of machine learning in the description of algorithms and processes such as the target variable (y) when training an algorithm. In this tutorial, you will discover linear algebra vectors for machine learning."

- Top 4 open source alternatives to Google Analytics
- "Many businesses of all sizes use Google Analytics. But if you want to keep control of your data, you need a tool that you can control. You won’t get that from Google Analytics. Luckily, Google Analytics isn’t the only game on the web."

- Infographic: Poker & AI: The Rise Of Machines Against Humans

- Autoscaling deep learning clusters on AWS with Kubernetes and RiseML
- "A little-known feature of the RiseML installer is its ability to create deep learning clusters with autoscaling support on AWS. Autoscaling can help significantly reduce GPU bills and increase compute capacity during peak times by automatically launching and terminating EC2 instances based on demand."

- CatBoost is an open-source gradient boosting library with categorical features support
- "CatBoost version 0.6 has a lot of speedups and improvements. The most valuable improvement at the moment is the release of the industry's fastest inference implementation."

- The Matrix Calculus You Need For Deep Learning
- "This paper is an attempt to explain all the matrix calculus you need in order to understand the training of deep neural networks. We assume no math knowledge beyond what you learned in calculus 1, and provide links to help you refresh the necessary math where needed."

- https://github.com/eriklindernoren/NapkinML
- A tiny lib with pocket-sized implementations of machine learning models in NumPy.

- https://www.gnu.org/software/datamash/
- GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.