
ML’s elephant in the room: data labeling

13 September 2022 Posted by Niels Verleysen Digital innovation

From healthcare and manufacturing to space and marketing, machine learning is proving to be a great tool to reduce costs, save time, and increase revenue. Managing a machine learning project, however, will be one of the main challenges for businesses in the years to come. Once you’ve identified machine learning as your AI opportunity, there are two primary building blocks for the model: data and, often overlooked, data labels. Labeling those datasets can be a lot trickier than you might expect. Here are some tips to navigate this challenge.

ML and data labeling

Collecting datasets

In our previous blog, we defined the different steps to discover your AI opportunities. Once you’ve identified the process you want to automate and the information you hope to obtain, you’ll need data to feed the model. These are the camera images, audio signals, text messages, or sensor measurements the model will analyze to answer your questions. Whether you are looking to predict the stock market or develop a medical application, low-quality, biased or unreliable data makes the task impossible. Take, for example, a study on blood oxygenation levels that fails to consider that a pulse oximeter responds differently depending on the patient’s skin color. A model built on such data would be significantly less likely to detect occult hypoxemia in Black patients than in white patients.
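
One way to catch this kind of bias early is to score the model separately for each patient group instead of relying on a single overall number. The snippet below is a minimal sketch of that idea, not a validated clinical method. The function name `recall_by_group`, the group names "A" and "B", and the toy data are all made up for illustration.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Return {group: recall} for the positive class (1), computed per group.

    Recall is the share of real positive cases the model actually detects.
    Groups without any positive case are left out of the result.
    """
    detected = defaultdict(int)   # true positives per group
    positives = defaultdict(int)  # actual positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                detected[group] += 1
    return {g: detected[g] / positives[g] for g in positives}


# Hypothetical toy data: 1 = condition present, 0 = condition absent.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

print(recall_by_group(y_true, y_pred, groups))  # {'A': 0.75, 'B': 0.5}
```

A large gap between groups, as in this toy example, is a signal to go back to the data before tuning the model any further.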

Understanding the problem is indispensable to producing a valuable dataset. Your team should understand which sources of variability matter for the problem in practice. Often, people bias the dataset toward the most accessible data. A self-driving car whose algorithms are trained only on roads the developers happen to travel regularly is not robust. Not entirely unlike humans, ML algorithms find it challenging to assess unknown situations. A machine learning model cannot learn anything that is not represented in its data, so its output becomes unpredictable in situations it has never seen. High volumes of information gathered in varied circumstances are therefore crucial to developing a trustworthy algorithm.
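
A simple coverage check can make this kind of bias visible before training starts. The sketch below assumes every sample carries a tag describing the circumstances it was recorded in. The tag names, the function `underrepresented` and the 10% threshold are hypothetical choices for illustration, not a recommended standard.

```python
from collections import Counter


def underrepresented(conditions, expected, min_share=0.10):
    """List the expected conditions whose share of the dataset is below min_share.

    conditions: one tag per sample, e.g. "night_rain".
    expected:   every condition the model must handle in practice.
    """
    counts = Counter(conditions)
    total = len(conditions)
    if total == 0:
        return sorted(expected)
    # Counter returns 0 for conditions that never occur in the data.
    return sorted(c for c in set(expected) if counts[c] / total < min_share)


# Hypothetical driving dataset, heavily biased toward dry daytime roads.
samples = ["day_dry"] * 80 + ["day_rain"] * 15 + ["night_dry"] * 5
needed = ["day_dry", "day_rain", "night_dry", "night_rain"]

print(underrepresented(samples, needed))  # ['night_dry', 'night_rain']
```

The list of expected conditions is the important part. It has to come from people who understand the problem, because the data alone cannot tell you which situations are missing.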

Finetuning the labeling process

The importance of data as a crucial building block in a machine learning project is gaining recognition. However, a machine learning project is built not only on raw, high-quality data but also on the data labels. They’re the ground truth of your model and represent the outcome your model should produce. Think of it like this: a parent won’t just point at items to show their baby, they will also say the name of each item. This way the baby learns to recognize and name the items in its surroundings. A machine learning algorithm learns in the same way.

Obtaining labels can be complex and labor-intensive. Machine vision problems often require manual labeling for specific objects in each image. Depending on the application, the human labelers must have the appropriate qualifications to label medical scans, images of technical defects or any other specific image type.

Some things to consider during the labeling process:

  • Different labeling requirements come at different prices. Only requesting a classification label for the complete image is a tenth of the cost of delineating all instances in the picture. The figure below illustrates different labeling approaches.
  • While developing the model, it pays off to evaluate the current weaknesses so you know which labels you need to improve. Knowing what the model struggles with allows you to maximize the return on new data.
  • When you outsource the labeling task to specialized companies, they become critical suppliers, and your team should treat them as such by monitoring their results closely. Too often, the perceived simplicity of the task makes people forget to define strict and well-thought-out quality metrics for the results.
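
One possible quality metric, both for a labeling supplier and for spotting weak classes, is the agreement rate per class against a small subset that your own experts have verified. The sketch below assumes such a "gold" subset exists. The class names, the function names and the 0.9 threshold are hypothetical.

```python
from collections import defaultdict


def agreement_by_class(gold, supplied):
    """Return {class: share of gold samples of that class labeled identically}."""
    if len(gold) != len(supplied):
        raise ValueError("gold and supplied must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expert_label, supplier_label in zip(gold, supplied):
        totals[expert_label] += 1
        if expert_label == supplier_label:
            hits[expert_label] += 1
    return {c: hits[c] / totals[c] for c in totals}


def classes_below(scores, threshold):
    """List the classes whose agreement rate is below the threshold."""
    return sorted(c for c, score in scores.items() if score < threshold)


# Hypothetical defect-inspection labels on an expert-verified subset.
gold = ["scratch", "scratch", "dent", "dent", "dent", "ok", "ok", "ok", "ok", "ok"]
supplied = ["scratch", "dent", "dent", "dent", "ok", "ok", "ok", "ok", "ok", "ok"]

scores = agreement_by_class(gold, supplied)
print(classes_below(scores, threshold=0.9))  # ['dent', 'scratch']
```

The same per-class breakdown works for model predictions instead of supplier labels, which shows where new labels would pay off most.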

Illustration of different label types. Point annotation (top left) costs less than full mask labeling (bottom right). Squiggles (top right) and bounding box annotation are in between these extremes. (Source)
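
To get a feel for what the cost difference means for a project, here is a small back-of-the-envelope calculation. Only the one-to-ten ratio between an image-level classification label and full delineation comes from the text above. The price per image and the budget are invented for illustration, and amounts are kept in cents to avoid rounding surprises.

```python
def images_within_budget(budget_cents, cost_per_image_cents):
    """Number of images that can be labeled for a given budget (whole images only)."""
    return budget_cents // cost_per_image_cents


# Hypothetical price for delineating all instances in one image.
MASK_COST_CENTS = 100
# Ratio from the text: a classification label costs a tenth of that.
CLASSIFICATION_COST_CENTS = MASK_COST_CENTS // 10

budget_cents = 500_000  # hypothetical labeling budget

print(images_within_budget(budget_cents, MASK_COST_CENTS))            # 5000
print(images_within_budget(budget_cents, CLASSIFICATION_COST_CENTS))  # 50000
```

Under these assumptions the same budget buys ten times as many labeled images, at the price of far less information per image.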

Maximizing the return

In the classical approach, a model requires labels with the same level of information as the desired result, so building a model that delineates objects requires a dataset of delineated images. This is expensive, because a human has to provide the ‘examples’ of this valuable output. Techniques are currently being developed to train models on weakly supervised data instead. They aim to use the latent information in cheaper, low-information labels to prepare models for high-information output.
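
The difference in information content between label types can be made concrete with a small sketch. Starting from a full mask, every cheaper label type can be computed automatically, while the reverse is not possible without extra knowledge. That reverse direction is the gap weakly supervised techniques try to bridge. The helper names and the toy mask below are made up for illustration, and the code only shows the label hierarchy, not a weakly supervised training method.

```python
def foreground_pixels(mask):
    """Return the (row, col) coordinates of all object pixels in a binary mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def bounding_box(mask):
    """Derive (min_row, min_col, max_row, max_col) from a mask, or None if empty."""
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))


def point_annotation(mask):
    """Derive a single point (the centroid) from a mask, or None if empty.

    Note: for non-convex shapes the centroid can fall outside the object.
    """
    pixels = foreground_pixels(mask)
    if not pixels:
        return None
    return (sum(r for r, _ in pixels) / len(pixels),
            sum(c for _, c in pixels) / len(pixels))


def image_label(mask):
    """Derive the cheapest label: does the image contain the object at all?"""
    return bool(foreground_pixels(mask))


# Hypothetical 4x5 mask with one small object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(bounding_box(mask))  # (1, 1, 2, 3)
print(image_label(mask))   # True
```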

Whether you are building an algorithm to read text documents or a self-driving car, the message is the same. You don’t just need data: you need a high volume of high-quality data covering all relevant circumstances, and you should definitely not forget to gather high-quality labels. Do this and you’ll be one step closer to the optimal solution for your next ML project. Interested in learning more? Subscribe to our AI blog mail or visit the AILab page.

This article was co-written by Jan Alexander.

Tags: Artificial intelligence, Machine & deep learning

© 2023 Verhaert New Products & Services NV • BE 0439.039.420 • Privacy policy • Terms of use
