Big Data and Machine Learning: the data scientist's manual

Published by Dunod on 18 February

This book, co-written by SQLI expert Pirmin Lemberger with Marc Batty, Médéric Morel and Jean-Luc Raffaëlli, is a guide to understanding the challenges of a Big Data project, grasping the underlying concepts (particularly Machine Learning) and acquiring the skills required to set up a data lab.


Big Data has established itself as a major innovation for all businesses seeking to develop a competitive advantage from their data on customers, suppliers, products, processes, equipment and so on. But which technical solution should you choose? Which professional skills should be developed in the IT department?

The book aims to give readers the factual background they need to tackle this barrage of questions. The authors' approach is to present Big Data from the specific point of view of predictive analytics (or Machine Learning): how can we create predictive models (typically of human behaviour) using the data itself?

The book therefore sheds light on:

  • theoretical concepts (statistical processing of data, distributed computing etc.);
  • tools (Hadoop framework, Storm etc.);
  • examples of Machine Learning;
  • the typical organisation of a data science project.

"This book is for everyone who thinks about the best possible use of data within a business, whether they are data scientists, CIOs, project leaders or specialists in particular areas of the business," explains Pirmin Lemberger, Head of Technology Intelligence for SQLI and co-author of the book.