Play big on small data


This odd title, “Play big on small data”, refers to applying big data principles to small data sets. “Small” is relative to the huge volumes big data can handle; it does not necessarily mean only a handful.


I first came across big data through former colleagues who were IT experts, and had a kind of epiphany about it while reading the eponymous book.

Since that reading, I no longer collect, structure and analyze data the same way. I am more tolerant of inaccuracies, mess and missing data, because what I am looking for is insight and the big picture rather than certitude and accuracy.

As poorly tended datasets are the norm rather than the exception, starting an analysis with this mindset saves some stress. The challenge is not to filter out valid data for a statistically significant analysis, but to find a way to depict a truthful, “good enough” picture, suitable for decision-making.

Playing big on small data does not mean applying the technical solutions built for handling huge amounts of data or computing on them at speed. It simply means getting inspired by an approach that favors understanding the “what” rather than the “why”; in other words, favoring correlation over causation.

In many cases, a good enough understanding of the situation is just… good enough. Going down to the very details or verifying the accuracy would not change much, but would take time and divert resources for the sake of unnecessary precision.

When planning a 500 km journey, you don’t need to know every meter of the road; a few milestones are good enough to depict the way.

Accepting, when it makes sense, to trade causation for correlation helps to work around the scarce and messy data usually available. Even when data is plentiful, too little of it fits the purpose of a given analysis or comes in the right format. It is then smart to look at other data sets, even if they are in the same state, and search for patterns and correlations that can validate or invalidate the initial assumption.
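
To make this concrete, here is a minimal sketch, with synthetic data and hypothetical column names, of the kind of quick cross-check I have in mind: correlating two imperfect data sets (say a maintenance log and a quality log) and simply letting the missing values be skipped:

```python
# A minimal sketch with synthetic data and hypothetical column names: correlating
# two imperfect data sets (a maintenance log and a quality log) without cleaning
# everything first; weeks with missing values are simply skipped.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
weeks = np.arange(1, 27)
downtime_h = rng.gamma(2.0, 3.0, weeks.size)                      # from the maintenance log
scrap_rate = 0.01 * downtime_h + rng.normal(0, 0.02, weeks.size)  # from the quality log
scrap_rate[rng.choice(weeks.size, 5, replace=False)] = np.nan     # messy: 5 weeks missing

df = pd.DataFrame({"week": weeks, "downtime_h": downtime_h, "scrap_rate": scrap_rate})
r = df["downtime_h"].corr(df["scrap_rate"])  # pandas ignores the NaN rows
print(f"Correlation on {df['scrap_rate'].notna().sum()} usable weeks: r = {r:.2f}")
```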

The conclusion is most of the time trustworthy enough to make a decision.


Trouble with manual data capture

Asking people to fill out forms in order to monitor performance, track a phenomenon or gather data for problem solving too often leads to trouble when the data is ultimately collected and analysed.

The case is about manual data capture into paper forms and logbooks on production lines. A precious source of information for a consultant like me. Potentially.

Alas, as I started to capture the precious bits of information from the paper forms into a spreadsheet, I soon realized how poorly the initial data were written:

Most of the forms were not thoroughly filled out: boxes and fields left blank, totals missing or wrong, dates not specified, and a lot of bad handwriting leading to possible misinterpretation, among other liberties taken.

It seems obvious that the production operators understand neither the importance of the data they are supposed to capture nor the reasons why accuracy and completeness are desired.

To them it is probably a mere chore; not understanding the future use of what they are supposed to write down, they pay minimal attention to it.

It is also obvious that management is complacent about the situation and does not use the data, otherwise somebody else would have pointed out the mess before me, and hopefully acted upon it.

Well, we can’t change the past, and data that was lost is definitely lost. The poorly entered data was all I could get, so I had to make do with what I had.

Thanks to a relatively large (I dare not write big) amount of data, the flaws do not have too much impact and the big picture remains truthful. What matters to me is the big picture, not the accuracy of each single data point. (A takeaway from my exposure to big data!)

I noticed that most of the worst-filled forms related to “special events”, when production suffered a breakdown, shortages and the like. These dots on the performance curve would anyhow have been regarded as outliers and discarded for the sake of a more significant trend.

So it was not a big deal to disregard them from the beginning.

However, the pity was that no robust, deeper analysis could be conducted on these “special events”, which were not that unusual over a six-month period.

Some incomplete data could be restored indirectly, for example by calculating a duration from start and end times, or conversely by restoring a missing timestamp from another timestamp and a duration. Sometimes these kinds of fixes introduced some uncertainty in the values, but again I was not after accuracy; I was trying to depict and understand the big picture.
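
A minimal sketch of this kind of indirect repair, with hypothetical column names and made-up timestamps, could look like this; rows restored this way can be flagged so the added uncertainty stays visible:

```python
# A minimal sketch (hypothetical column names, made-up timestamps) of the indirect
# restoration described above: fill a missing duration from start and end times,
# and a missing end time from the start time plus the duration.
import pandas as pd

log = pd.DataFrame({
    "start": pd.to_datetime(["2015-03-02 06:00", "2015-03-02 14:10", "2015-03-03 06:05"]),
    "end":   pd.to_datetime(["2015-03-02 13:45", None,               "2015-03-03 13:50"]),
    "duration_min": [None, 455.0, 465.0],
})

log["restored"] = log["end"].isna() | log["duration_min"].isna()  # flag the less certain rows

# Duration from start/end where the duration is missing
missing_dur = log["duration_min"].isna() & log["end"].notna()
log.loc[missing_dur, "duration_min"] = (
    (log.loc[missing_dur, "end"] - log.loc[missing_dur, "start"]).dt.total_seconds() / 60
)

# End time from start + duration where the end time is missing
missing_end = log["end"].isna() & log["duration_min"].notna()
log.loc[missing_end, "end"] = (
    log.loc[missing_end, "start"] + pd.to_timedelta(log.loc[missing_end, "duration_min"], unit="m")
)

print(log)
```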

To be fair to the personnel on the lines, I must admit that some of the forms were poorly designed; a better design could have led to less misunderstanding or confusion. That acknowledged, the data reporting was not optional: it is mandated by regulation.

Because, to my great surprise and disappointment, this happened in the pharmaceutical industry.



Why Big data may supersede Six Sigma


In this post, I assume that in the near future correlation will be more important than causation* for decision-making, and that decisions will have to be made on “incomplete, good enough” information rather than solid analyses; thus big data will supersede Six Sigma.

*See my post “my takeaways from Big data” on this subject

In a world with increasing uncertainty, fast changing businesses and fiercer competition, I assume speed will make the difference between competitors. The winners will be those having:

  • fast development of new offers
  • short time-to-market
  • quick reaction to unpredictable changes and orders
  • fast response to customers’ requirements and complaints
  • etc.

Frenzy will be the new normal.

I also assume that for most industries, products will be increasingly customized, fashionable (changing rapidly from one generation to the next, or constantly changing in shapes, colors, materials, etc.) and with shorter life cycles.

That means production batches will be smaller and repeating an identical production run will be unlikely.

In such an environment, decisions must be made swiftly, most often based on partial, incomplete information, with “messy” data flowing in great numbers from various sources (customer service, social media, real-time sales data, sales reps reports, automated surveys, benchmarking…).

Furthermore, decisions have to be made as close as possible to the customers, or wherever the decision matters, by empowered people. There is no longer time to report to a higher authority and wait for an answer; decisions must be made almost at once.

There will be fewer opportunities to step back, collect relevant data, analyze it and find the root cause of a problem, let alone design experiments and test several possible solutions.

Decision-making is going to be more and more stochastic: given the number and urgency of decisions to make, what matters is making significantly more good decisions than bad ones, the latter being inevitable.

What is coming is what big data is good at: fast handling of lots of messy bits of information and revealing correlations and/or patterns to help make decisions. Hence, decision-making will rely more on correlation than on causation.

Six Sigma aficionados will probably argue that no problem can be sustainably solved if the root cause is not addressed.

Agreed, but who will care about eradicating a problem that may be a one-off and whose solving time will probably exceed its duration?

In a world of growing interactions and transactions, and in constant acceleration, the time to get to the root cause may not often be granted. Furthermore, even when the root cause is known, it may lie outside the decision maker’s or the company’s span of control.

Let’s take an example:

The final assembly of a widget requires several subsystems supplied by different suppliers. The production batches are small, as the widgets are highly customized and have a short life cycle (about a year).

A data survey, using big data techniques, foretells a high likelihood of trouble with the next production run, because of correlations between previously experienced issues and certain combinations of supplies.
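
Just to make the idea tangible, such a correlation-style scan of past runs does not require anything sophisticated. The sketch below uses made-up data and a purely illustrative threshold; it is not the actual technique used in this example:

```python
# A minimal sketch (made-up data, illustrative threshold): scan a log of past
# production runs for supplier pairs that co-occur with issues, then check whether
# the planned combination of supplies contains any of the risky pairs.
from collections import Counter
from itertools import combinations

# (suppliers used in the run, issue observed?)
past_runs = [
    ({"A", "B", "C"}, True),
    ({"A", "C", "D"}, False),
    ({"B", "C", "E"}, True),
    ({"A", "D", "E"}, False),
    ({"B", "D", "C"}, True),
]

pair_runs, pair_issues = Counter(), Counter()
for suppliers, issue in past_runs:
    for pair in combinations(sorted(suppliers), 2):
        pair_runs[pair] += 1
        pair_issues[pair] += issue

planned = {"B", "C", "D"}
for pair in combinations(sorted(planned), 2):
    if pair_runs[pair]:
        rate = pair_issues[pair] / pair_runs[pair]
        if rate >= 0.5:  # purely illustrative threshold
            print(f"Risky combination {pair}: issues in {rate:.0%} of past runs")
```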

Given the short notice, relative to the lengthy lead time to get alternate supplies, and the short production run, it is more efficient to prepare to overcome or bypass the possible problems than to try to solve them. Especially if the likelihood of ever assembling these very same widgets again is (extremely) low.

Issues are not certain, they are likely.

The sound decision is then to mitigate the risk by adding more tests, quality gates, screening procedures and the like, supply the market with flawless widgets, make the profit and head for the next production.

Decision is then based on probability, not on profound knowledge.

But even when the causes of issues are well known, the decision must sometimes be the same: avoidance rather than solving.

This is already the case in quieter businesses, when parts, supplies or subsystems come from remote, unreliable suppliers over which there is no leverage.

I remember a major pump maker facing this kind of trouble with cast iron parts from India. No Six Sigma technique could help make a decision or solve the problem: it lay beyond the company’s span of control.



My Takeaways from Big data, the book

I got my first explanations about big data from experts who were my colleagues for a time. These passionate IT guys, surely very knowledgeable about their trade, were not always good at conveying somewhat complex concepts in a simple manner to non-specialists. Yet they did well enough to raise my interest in learning a bit more.

I then did what I usually do: search and learn on my own. That’s how I bought “Big Data: A Revolution That Will Transform How We Live, Work and Think” by Viktor Mayer-Schönberger and Kenneth Cukier.

Without turning myself into an expert, I gained a better understanding of what lies behind big data and a better appreciation of its potential and of the way it surely will “Transform How We Live, Work and Think”, as the book cover claims.

My takeaways

Coping with mass and mess

Big data, as a computing technique, is able to cope not only with huge amounts of data, but with data from various sources in various formats, and to reveal order in an incredible mess that traditional approaches could not even start to exploit.

Big data can link together comments on Facebook, Twitter, blogs and websites with companies’ databases about a product, for example, even if the data formats are very different.

In contrast, when using traditional database software, data needs to be neat and to comply with a predetermined format. It also requires discipline in how data is entered into each field, as the software would be unable to understand that a mistyped “honey moon” means “honeymoon” and should be considered, computed and counted as such.
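
A tiny illustration of the point (made-up entries): exact matching counts the variants separately, while a simple normalization step lets them be counted together:

```python
# A tiny illustration with made-up entries: exact matching treats "honey moon" and
# "Honeymoon" as different things; a simple normalization step counts them together.
from collections import Counter

entries = ["honeymoon", "Honeymoon", "honey moon", "honey-moon", "business trip"]

naive = Counter(entries)
normalized = Counter(e.lower().replace("-", "").replace(" ", "") for e in entries)

print(naive["honeymoon"])        # 1 -> the variants are counted separately
print(normalized["honeymoon"])   # 4 -> counted as the same thing
```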

Switch from causation to correlation

With big data, the obsession with the “why” (causation) will give way to the “what” (correlation), both for understanding something and for making decisions.

Big data can be defined as being about what, not why

This is somewhat puzzling, as we have long been used to searching for causation. It is especially strange with predictive analytics: the system will tell you a problem exists, but not what caused it or why it happens.

But for decision-making, knowing the “what” is often good enough; knowing the “why” is not always mandatory.

Correlation was known and used before big data, but now that computing power is no longer a constraint, analysis is not limited to linear correlations: more complex, non-linear correlations can be surfaced, offering a new point of view and an even bigger picture to look at.

I like to imagine it as a huge data cube I can turn at will to look at it from any perspective.
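
As a small, purely illustrative example with synthetic data: a rank-based measure such as Spearman’s can reveal a strong non-linear (but monotonic) link that a linear measure such as Pearson’s understates:

```python
# A minimal, purely illustrative sketch with synthetic data (not from the book):
# the relationship y = exp(x) is strongly monotonic but non-linear, so a linear
# measure (Pearson) understates it while a rank-based measure (Spearman) does not.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)
x = rng.uniform(0, 6, size=1_000)
y = np.exp(x) + rng.normal(scale=0.5, size=x.size)  # non-linear, slightly noisy link

pearson_r, _ = pearsonr(x, y)    # linear correlation coefficient
spearman_r, _ = spearmanr(x, y)  # rank correlation, captures any monotonic link

print(f"Pearson r:  {pearson_r:.2f}")   # markedly below 1 despite the strong link
print(f"Spearman r: {spearman_r:.2f}")  # close to 1
```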

Latent, inexhaustible value

Correlation will free the latent value of data; therefore, the more data, the better.

What does it mean?

Prior to big data, the limitations of data capture, storage and analysis tended to concentrate efforts on data useful for answering the “why”. Now it is possible to ask a huge mass of data many different questions and find patterns, giving answers to (almost any?) “what”.

The future usage of data is not known at the moment it is collected, but with low-cost storage this is no longer a concern. Value can be generated over and over in the future, simply by going through the mass of data with a new question, another line of research… Data retains latent value; it can be used and used again without depleting.

That is why data is considered the new ore, one that is not even exhausted when used; its usage is in a sense infinite. That is why so many companies are eager to collect data, any data, lots of data.

Do not give up exactitude, but the devotion to it

For making decisions, “good enough” information is… good enough.

With massive data, inaccuracies increase, but have little influence on the big picture.

The metaphor of the telescope vs. the microscope is often used in the book: when exploring the cosmos, a big picture is good enough even though many stars will be depicted by only a few pixels.

When looking at the big picture, we don’t need the accuracy of every detail.

What the authors try to make clear is that we are not giving up exactitude, only our devotion to it. There are cases where exactitude is not required and “good enough” is simply good enough.

Big versus little

Statistics were developed to understand what little available data and/or computing power could tell. Statistics basically extrapolate the big picture from (very) few samples. “One aim of statistics is to confirm the richest findings using the smallest amount of data”.

Computing power and data techniques are nowadays so powerful that it is no longer necessary to work on samples only; analysis can be done on the whole population (N=all).
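
A toy illustration with synthetic data: a classic small sample gives an estimate with noticeable error, while computing directly on the whole population gives the exact figure, which becomes affordable when storage and computing power are cheap:

```python
# A toy illustration (synthetic data) of the sampling vs. N=all idea: a small
# sample yields an estimate with sampling error, while computing on the whole
# population yields the exact figure.
import numpy as np

rng = np.random.default_rng(0)
population = rng.lognormal(mean=3.0, sigma=1.0, size=1_000_000)  # "all" the records

sample = rng.choice(population, size=100, replace=False)  # classic small sample

print(f"Sample mean (n=100):     {sample.mean():,.1f}")
print(f"Population mean (N=all): {population.mean():,.1f}")
```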

Summing up

I was really drawn into reading “Big Data”, a well-written book for non-IT specialists. Besides giving me insight into the changes and potential of real big data, it changed my approach to smaller data: the way I collect and analyse it, how I build my spreadsheets and how I present my findings.

My takeaways are biased, as I consider big data for “industrial”, technical data rather than personal data. The book also shares insights about the risks of how personal data is already used, and what could come next in terms of reduced privacy.



What is an Executive Summary Tree?

An Executive Summary Tree is not another type of tree in the Thinking Processes tool box, but a concise, condensed form of a Current Reality Tree or Future Reality Tree for presentation to executives.

Bill Dettmer grew used to giving short, concise briefings to general officers during his military career. Their civilian counterparts, high-ranking executives and decision makers, have no more time or patience, and thus also need brief presentations.

Two trees in the Thinking Process* are especially important: the Current Reality Tree and the Future Reality Tree. The first describes the links from Undesirable Effects (UDEs), or actual problems, to their few root causes. The second depicts the Desirable Effects (DEs) and the injections necessary to get there in the future.

*When this term is singular it refers to the process; when plural it refers to the tools (trees and cloud).

If the presentation is about the actual state, the Executive Summary Tree will be a condensed version of the Current Reality Tree. If the presentation is about where to go or what to change to, it is the Future Reality Tree which will be presented in this concise way.

The how-to is described by Bill Dettmer himself in this video.



(At least) three reasons why you should not run your business with superheroes

Since I came across the quote from Fujio Cho* (Toyota chairman) about broken processes requiring extraordinary people, I keep wondering how many of the businesses I see rely on superheroes. Superheroes are the wonderwomen and supermen, those skilled and highly dedicated people who run processes or whole businesses that ordinary people would not be able or willing to run.

*”We get brilliant results from average people managing brilliant processes. We observe that our competition often gets average (or worse) results from brilliant people managing broken processes.”

They cope with situations others would not even attempt, or would quickly give up on, because of broken processes, poor working conditions, workload or any combination of the like.

Read the French version of this article

Instead of fixing the processes or improving working conditions so that they can be run by ordinary people, business owners or management invest tremendous effort in recruiting superheroes.

Here are at least three reasons they should not.

1. Superheroes come in limited number

Superheroes aren’t common, otherwise they’d be ordinary people, not superheroes.

Hence, finding the right fit takes time, money and effort.

The same resources (time, money, effort) could be allocated to fixing the processes so that they can be run by ordinary people.

Yet for some strange reason, management keeps searching for superheroes.

2. Superheroes get tired too

Sooner or later, playing superhero will exhaust them, or they will get bored once the initial fun is gone.

Superheroes age as well; over time they may aspire to something other than running rubbish processes.

Because of 1 and 2, even with some longer-lasting heroes coping with the mess, the organization will always be at least one short.

3. Superheroes have ambition or personal goals

a. Superheroes are likely to get promoted.

They are usually noticed and appreciated and their skills find many other applications elsewhere in or outside of the organization.
The trouble is, once they are promoted, who will take care of the processes, still in the same poor state?

b. Superheroes may leave the organization for personal reasons: getting married, changing their career path, raising a family…

When superheroes leave the organization, for whatever reason, they leave the broken processes behind.

Therefore and again, investing in fixing processes is more sustainable.

But for some strange reason, management keeps searching for superheroes.



Four good reasons to take a break if you are to remain efficient

Deep involvement in a project, problem solving or coaching really drains one’s energy. A periodic break is therefore mandatory in order to remain efficient. Here are four good reasons for it.

1. Recharge

Everyone needs a breather now and then. The tenser the situation, the more the break is needed.

Getting away from a project or an assignment for some time helps you recharge, gather new energy and stay fresh and motivated.

It does not have to be long, but it should be long enough to feel like a real break. An extra day or two right before or immediately after a weekend, for example, can be good.

Taking a break doesn’t mean taking a holiday. Working on something else or seeing something else for a short period is usually enough.

2. Get rid of mental clutter

Taking a break is also an opportunity to get rid of mental clutter accumulated during the deep dive into the project or problem solving.

Often one just gets caught in a vicious circle, spinning around a problem and not finding a way out.

Take a break.

When you come back, the brain feels reset and the mental cache emptied, ready to process new data or analyze things differently.

3. Avoid complacency

Staying too long on the same subject may end in complacency. After a while, abnormal conditions seem less shocking, ways are found to work around blockades rather than remove them, and so on.

A breather helps to stay sharp, critical and to avoid complacency.

4. Look at the broader picture

Finally, stepping back simply helps to look at the broader picture. It’s easy to get drawn down into details and lose sight of the Goal, of what is important.


One hour of Theory of Constraints experience, first hand

I was fortunate to “participate” in Philip Marris’ interview by Clarke Ching for Clarke’s podcast series. While making sure the video and sound were properly recorded, I listened to the talk. Later, while editing the video, I listened some more. Now that Clarke has made the video public, it’s time to share:

Clarke chats with Philip Marris: Advanced thinking & Back to TOC Basics from Clarke Ching