So much wasted data

In many organizations, people capture a lot of data and… just ignore it, wasting its potential value.
The latest case, as I write this post, is with an aircraft MRO (maintenance, repair and overhaul) company.

This post echoes a previous one: Trouble with manual data capture

Every aircraft undergoing MRO requires a lot of mandatory paperwork for the sake of traceability. The required information is either captured directly in an IT system or written on paper and later input into the IT systems.

As this company wants to drastically reduce the time aircraft are grounded for MRO and improve the reliability of its planning, the primary source of information for understanding the causes of the problems is the data logbook.

I could easily figure out what kind of analyses to run and which correlations to look for, adherence to the plan for example.
Alas, when I was shown the database, my enthusiasm quickly faded.

Some of the data supposed to be entered into the system simply wasn't. Of course, it happened to be the most interesting data for my analysis.

The work breakdown is not always consistent across the portfolio, which makes comparisons challenging.
Mechanics would not always report their work on the appropriate work order, so any correlation between work order lead time and workload would be flawed.

It didn't seem to worry management as much as it worried me, not so much because it could compromise my analysis as because the clients would not be charged the right amounts (hours spent on an aircraft are billed).

According to the data, some aircraft departed the MRO facility before they flew in, an indication of a lack of rigorous tracking as well as of missing plausibility checks on the software's inputs.
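A plausibility check of this kind costs next to nothing to automate. Below is a minimal sketch in Python (the column names and values are mine, for illustration, not the company's actual schema) that flags records whose departure precedes arrival; a rule like this could be enforced at input time.

```python
# Minimal plausibility-check sketch (illustrative column names and values):
# flag logbook records where the recorded departure precedes the recorded arrival.
import pandas as pd

logbook = pd.DataFrame({
    "aircraft":  ["A1", "A2", "A3"],
    "arrival":   pd.to_datetime(["2016-03-01", "2016-03-05", "2016-03-10"]),
    "departure": pd.to_datetime(["2016-03-20", "2016-03-02", "2016-04-01"]),
})

impossible = logbook[logbook["departure"] < logbook["arrival"]]
print(impossible)  # A2 "leaves" before it arrives: reject or flag at input time
```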

And the list of flaws goes on.

A bit troublesome, by the way, in a business that boasts about safety.

The pity is, as so often, that companies allocate resources to capture data and then just ignore it. It would only take a little extra energy and rigor to exploit the data and use it to monitor, drive and improve the business.

Instead, just accumulating data without exploiting it is nothing more than wasting its value.


Play big on small data


This weird title, "Play big on small data", suggests applying big data principles to small data sets. "Small" is to be understood relative to the huge amounts of data big data can manage; it does not necessarily mean only a handful.


I came across big data through former colleagues who were IT experts, and had a kind of epiphany about it when reading the eponymous book.

Since that reading, I no longer collect, structure and analyze data the same way. I tend to be more tolerant of inaccuracies, mess and missing data, because what I am looking for is insight and the big picture rather than certitude and accuracy.

As poorly tended datasets are the norm rather than the exception, starting an analysis with this mindset saves some stress. The challenge is not to filter out valid data for a statistically significant analysis, but to find a way to paint a truthful, "good enough" picture suitable for decision-making.

Playing big on small data does not mean applying the technical solutions built for handling huge amounts of data or computing on them at speed. It simply means being inspired by an approach that favors understanding the "what" rather than the "why", in other words, favoring correlation over causation.

In many cases, a good enough understanding of the situation is just… good enough. Digging into the finest details or making sure of the accuracy would not change much, but would take time and divert resources for the sake of unnecessary precision.

When planning a 500 km journey, you don't need to know every meter of the road; a few milestones are good enough to describe the way.

Accepting, when it makes sense, to trade causation for correlation helps get around the scarce and messy data usually available. Even when data is plentiful, for a given analysis there is too often little of it that fits the purpose and comes in the right format. It is then smart to look at other data sets, even if they are in the same state, and search for patterns and correlations that can confirm or invalidate the initial assumption.

The conclusion is most of the time trustworthy enough to make a decision.


If you liked this post, share it!

Trouble with manual data capture

Asking people to fill out forms in order to monitor performance, track a phenomenon or gather data for problem solving too often leads to trouble when the data is ultimately collected and analysed.

The case is about manual data capture into paper forms and logbooks on production lines. A precious source of information for a consultant like me. Potentially.

Alas, as I started to transfer the precious bits of information from the paper forms into a spreadsheet, I soon realized how poorly the original data had been recorded:

Most of the forms were not thoroughly filled out: boxes unticked, fields left blank, totals missing or wrong, dates not specified, and a lot of bad handwriting open to misinterpretation, among other liberties taken.

It seems obvious that the production operators understand neither the importance of the data they are supposed to capture nor the reasons accuracy and completeness are required.

To them it is probably a mere chore; not understanding the future use of what they are supposed to write down, they pay minimal attention to it.

It is also obvious that management is complacent about the situation and does not use the data; otherwise somebody would have pointed out the mess before me and, hopefully, acted upon it.

Well, we can't change the past, and lost data is definitely lost. The poorly entered records were all I could get, so I had to make do with what I had.

Thanks to a relatively large (I dare not write big) amount of data, the flaws do not have too much impact and the big picture remains truthful. What matters to me is the big picture, not the accuracy of every single data point. (A takeaway from my exposure to big data!)

I noticed that most of the worst-filled forms related to "special events", when production suffered breakdowns, shortages and the like. These dots on the performance curve would have been regarded as outliers anyway and discarded for the sake of a more significant trend.

So it was not a big deal to disregard them from the beginning.

However, the pity was that no robust, deeper analysis could be conducted on these "special events", which were not that unusual over a six-month period.

Some incomplete data could be restored indirectly, for example by calculating a duration from start and end times or, conversely, restoring a missing timestamp from another date and a duration. Sometimes these kinds of fixes introduced some uncertainty in the values, but again, I was not after accuracy; I was trying to depict and understand the big picture.
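To make this concrete, here is a minimal sketch in Python of that kind of indirect restoration (column names and values are invented for the illustration): a missing duration is derived from start and end times, and a missing end time from the start time plus the duration.

```python
# Illustrative sketch of indirect restoration (invented column names and values):
# fill a missing duration from start/end, or a missing end time from start + duration.
import pandas as pd

forms = pd.DataFrame({
    "start":        pd.to_datetime(["2015-06-01 08:00", "2015-06-01 09:30", "2015-06-01 10:00"]),
    "end":          pd.to_datetime(["2015-06-01 08:45", None, "2015-06-01 11:15"]),
    "duration_min": [None, 60.0, 75.0],
})

# Duration box left blank: derive it from start and end
missing_dur = forms["duration_min"].isna()
forms.loc[missing_dur, "duration_min"] = (
    (forms["end"] - forms["start"]).dt.total_seconds() / 60
)[missing_dur]

# End time left blank: derive it from start and duration
missing_end = forms["end"].isna()
forms.loc[missing_end, "end"] = (
    forms["start"] + pd.to_timedelta(forms["duration_min"], unit="minutes")
)[missing_end]

print(forms)
```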

In order to be fair to the personnel on the lines, I have to admit that some of the forms were poorly designed. A better design could have led to less misunderstanding and confusion. That acknowledged, the data reporting was not left to anyone's discretion, as it is mandated by regulation.

Because, to my great surprise and disappointment, this happened in the pharma industry.



If you liked this post, share it!

Why Big data may supersede Six Sigma


In this post, I assume that in the near future correlation will be more important than causation* for decision-making, and that decisions will have to be made on "incomplete but good enough" information rather than solid analyses, hence big data superseding Six Sigma.

*See my post “my takeaways from Big data” on this subject

In a world of increasing uncertainty, fast-changing businesses and fiercer competition, I assume speed will make the difference between competitors. The winners will be those with:

  • fast development of new offers
  • short time-to-market
  • quick reaction to unpredictable changes and orders
  • fast response to customer requirements and complaints
  • etc.

Frenzy will be the new normal.

I also assume that, for most industries, products will be increasingly customized, fashionable (changing rapidly from one generation to the next, or constantly changing in shapes, colors, materials, etc.) and will have shorter life cycles.

That means production batches will be smaller and repeating an identical production run unlikely.

In such an environment, decisions must be made swiftly, most often based on partial, incomplete information, with "messy" data flowing in large volumes from various sources (customer service, social media, real-time sales data, sales reps' reports, automated surveys, benchmarking…).

Furthermore, decisions have to be made as close as possible to customers, or wherever the decision matters, by empowered people. There is no longer time to report to a higher authority and wait for an answer; decisions must be made almost at once.

There will be fewer opportunities to step back, collect relevant data, analyze it and find the root cause of a problem, to say nothing of designing experiments and testing several possible solutions.

Decision-making is going to be more and more stochastic: given the number and urgency of decisions to make, what matters is making significantly more good decisions than bad ones, the latter being inevitable.

What is coming is what big data is good at: rapidly handling lots of messy bits of information and revealing correlations and/or patterns to help make decisions. Hence, decision-making will rely more on correlation than causation.

Six Sigma aficionados will probably argue that no problem can be sustainably solved if the root cause is not addressed.

Agreed, but who will care about eradicating a problem that may be a one-off and whose solving time will probably exceed the problem's duration?

In a world of growing interactions and transactions, in constant acceleration, the time to get to the root cause may not often be granted. Furthermore, even when the root cause is known, it may lie outside the decision maker's or the company's span of control.

Let’s take an example:

The final assembly of a widget requires several subsystems supplied by different suppliers. The production batches are small, as the widgets are highly customized and have a short life cycle (about a year).

The data survey, using big data techniques, predicts a high likelihood of trouble with the next production run, because of correlations between previously experienced issues and certain combinations of supplies.

Given the short notice, relative to the lengthy lead time for getting alternate supplies, and the short production run, it is more efficient to prepare to overcome or bypass the possible problems than to try to solve them. Especially if the likelihood of ever assembling these very same widgets again is (extremely) low.

Issues are not certain, they are likely.

The sound decision is then to mitigate the risk by adding more tests, quality gates, screening procedures and the like, supply the market with flawless widgets, make the profit and move on to the next production run.

Decision is then based on probability, not on profound knowledge.

But even when the causes of issues are well known, the decision must sometimes be the same: avoidance rather than solving.

This is already the case in quieter businesses, when parts, supplies or subsystems come from remote, unreliable suppliers over which there is no leverage.

I remember a major pump maker facing this kind of trouble with cast pig iron parts from India. No Six Sigma technique could help make a decision or solve the problem: it lay beyond the company's span of control.


If you liked this post, share it!


My Takeaways from Big data, the book

I got my first explanations about big data from experts who were my colleagues for a time. These passionate IT guys, surely very knowledgeable in their trade, were not always good at conveying somewhat complex concepts in a simple manner to non-specialists. Yet they did well enough to spark my interest in learning a bit more.

I then did what I usually do: search and learn on my own. That's how I bought "Big Data: A Revolution That Will Transform How We Live, Work and Think" by Viktor Mayer-Schönberger & Kenneth Cukier.

Without becoming an expert, I went further in understanding what lies behind big data and gained a better appreciation of its potential and of the way it surely will "Transform How We Live, Work and Think", as the book cover claims.

My takeaways

Coping with mass and mess

Big data, as a computing technique, is able to cope not only with huge amounts of data, but also with data from various sources and in various formats, revealing order in an incredible mess that traditional approaches could not even start to exploit.

Big data can link together comments on Facebook, Twitter, blogs and websites with companies' databases about a product, for example, even though the data formats are highly different.

In contrast, when using traditional database software, data needs to be neat and comply with a predetermined format. It also requires discipline in the way data is input into each field, as the software would be unable to understand that a mistyped "honey moon" meant "honeymoon" and should be considered, computed and counted as such.
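To make the contrast concrete, here is a toy sketch in Python (my own example values, not from the book) of the kind of tolerance a less rigid approach allows: a free-text entry is normalized and matched against known categories with a closeness check instead of being counted as a separate value.

```python
# Toy sketch (invented values): map a mistyped free-text entry to a known
# category using a closeness check, instead of treating it as a distinct value.
import difflib

known_values = ["honeymoon", "business", "family visit"]

def normalize(raw: str) -> str:
    cleaned = raw.strip().lower().replace(" ", "")
    match = difflib.get_close_matches(cleaned, known_values, n=1, cutoff=0.8)
    return match[0] if match else raw  # keep the original if nothing is close enough

print(normalize("honey moon"))  # -> "honeymoon", counted as such
```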

Switch from causation to correlation

With big data, the obsession with the "why" (causation) will give way to the "what" (correlation), both for understanding something and for making decisions.

Big data can be defined as being about what, not why

This is somewhat puzzling, as we have long been used to searching for causation. It is especially strange with predictive analytics: the system will tell you a problem exists, but not what caused it or why it happens.

But for decision-making, knowing what is often good enough; knowing why is not always mandatory.

Correlation was known and used before big data, but now that computing power is no longer a constraint, analyses are not limited to linear correlations; more complex, non-linear correlations can be surfaced, offering a new point of view and an even bigger picture to look at.
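As a small illustration of that point (with invented numbers, in Python), a relationship that is strongly monotonic but far from linear is understated by the classic linear (Pearson) coefficient, while a rank-based (Spearman) coefficient surfaces it clearly.

```python
# Sketch with invented data: Pearson understates a non-linear but monotonic
# relationship, while the rank-based Spearman coefficient captures it fully.
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(1, 10, 50)
y = np.exp(x)  # strongly related to x, but far from linear

print("Pearson: ", round(pearsonr(x, y)[0], 2))   # noticeably below 1
print("Spearman:", round(spearmanr(x, y)[0], 2))  # 1.0: a perfect monotonic link
```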

I like to imagine it as a huge data cube I can turn at will to look at it from any perspective.

Latent, inexhaustible value

Correlation will free the latent value of data; therefore, the more data, the better.

What does it mean?

Prior to big data, the limitations of data capture, storage and analysis tended to concentrate efforts on the data useful for answering the "why". Now it is possible to ask a huge mass of data many different questions and find patterns, giving answers to (almost any?) "what".

The future usage of data is not known at the moment it is collected, but with low-cost storage this is no longer a concern. Value can be generated over and over in the future, simply by going through the mass of data with a new question or another line of research… Data retains latent value until it is used, and used again, without depleting.

That is why big data is considered the new ore, one that is not even exhausted when used; it allows a kind of infinite reuse. That's why so many companies are eager to collect data, any data, lots of data.

Do not give up exactitude, but the devotion to it

For making decisions, “good enough” information is… good enough.

With massive data, inaccuracies increase, but have little influence on the big picture.

The metaphor of the telescope vs. the microscope is often used in the book: when exploring the cosmos, a big picture is good enough even though many stars will be depicted by only a few pixels.

When looking at the big picture, we don't need accuracy in every detail.

What the authors try to make clear is that we should not give up exactitude itself, but rather the devotion to it. There are cases where exactitude is not required and "good enough" is simply good enough.

Big versus little

Statistics were developed to understand what little available data and/or computing power could tell us. Statistics is basically extrapolating the big picture from (very) few samples. "One aim of statistics is to confirm the richest findings using the smallest amount of data."

Computing power and data techniques are nowadays so powerful that it is no longer necessary to work on samples only; the analysis can be done on the whole population (N = all).

Summing up

I was really drawn into reading "Big data", a well-written book for non-IT specialists. Besides giving me insight into the changes and potential of real big data, it really changed my approach to smaller data: the way I collect and analyse it, how I build my spreadsheets and how I present my findings.

My takeaways are biased, as I consider big data for "industrial", technical data rather than personal data. The book also shares insights about the risks of the uses already made of personal data and what could come next in terms of reduced or threatened privacy.


If you liked this post, share it!