Exhibitor News



27 Jul 2022

Training Data: The Overlooked Problem Of Modern AI

Toloka Stand: Q45

The AI market is booming, with new startups raising millions in AI investment every day. For example, investors poured nearly $18 billion into AI in Q3 2021, three times as much as in Q1 2020. This growth is fueled by the development of cloud solutions and open-source machine learning models that have made AI technologies accessible to many more players in the market; as the Brookings Institution writes, “open source software quietly affects nearly every issue in AI.”

Indeed, AI stands on three key pillars: algorithms, hardware and data. You collect large amounts of data; then, using machine learning methods, algorithms learn to find interdependencies within that data and reproduce the same logic on every new piece of data they meet. This is what is now called artificial intelligence (AI).
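To make the "learn from data, then apply to new data" idea concrete, here is a minimal sketch using an ordinary least-squares line fit. The numbers are made up for illustration and the function names `fit_line` and `predict` are hypothetical; real systems use far richer models, but the two-step shape (fit on collected data, predict on unseen data) is the same.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from paired observations (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Illustrative (invented) data: the collected observations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)      # the "learning" step
print(predict(model, 5.0))    # the learned logic applied to a new data point
```

The quality of the prediction depends entirely on the observations passed to `fit_line`, which is the point the rest of this article develops.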

AI From Nefertiti To Alexa

Learning from data isn’t new. Ancient Egyptians used long-term observations to predict the water level of the Nile River. In other words, they were onto something we would today call statistical predictive modeling.

The era of modern AI started with the rise of big data. Once you have large amounts of logged, structured data (be it clicks on products in an online store, time spent on a certain webpage in a browser, or the percentage of repaid loans at a bank), data science steps in. Building models to predict outcomes like loan repayment rate or the success of an ad campaign becomes a standard task for a data science team.

However, in reality, the data is often either not structured or, even worse, does not exist at all. For example, a self-driving car will only be able to detect pedestrians in the street after the model has been fed with thousands of examples, including images of the streets with every pedestrian carefully highlighted and labeled.

Further, a search engine will only learn how to rank the most relevant sites on top after “seeing” millions of pairs of user queries and web pages, each judged by the relevance of the match.

Meanwhile, a voice assistant will only learn to activate correctly after the model analyses thousands of hours of speech recorded in different voices and accents against background noise.

And a brand new AI-powered app will only be able to recommend the trendiest outfit to you if it is trained on a vast and up-to-date dataset of the trendiest outfits. If the creators of the app fail to update their dataset every season, before long it will be suggesting something that went out of fashion seasons ago.

All the magic and power of artificial intelligence has a natural glass ceiling—and that ceiling is data.


Is It Really Artificial?

The irony is that artificial intelligence is neither truly intelligent nor truly artificial. For one, it is heavily dependent on human efforts. In all the above-mentioned cases, the first thing you need is the effort of a human being. Interestingly, even with the rise of new self-supervised learning approaches, the need for human-powered data labeling only continues to grow: You still need data to fine-tune and validate automatically generated solutions.

It All Starts With A Dataset

With the other components of AI equally available to all players on the market, it is data that really makes your AI solution stand out from the competition. You need to be able to get unique data, label it in the most time- and cost-effective way, and keep monitoring the solution regularly after it is deployed to production. Those who can set up regular processes for validating and updating their solutions based on real-life data therefore end up with a more reliable solution.

Yet, for some reason, the importance of data labeling has been hugely underestimated, and the work has been treated as a nontechnological, ineffective and boring management task. As a result, even the most tech-heavy companies have outsourced data labeling to nontech third-party vendors, according to data from our company’s survey.

Data Labeling Of The New Generation

It is only recently, with the boom of AI in traditionally offline industries (such as retail, agrotech and healthcare) and the increasing need for human-powered training data on a large scale, that the industry has started to seek new ways of solving the old problem. That is why, in recent years, we’ve seen a series of unicorns emerge in the data labeling domain. These solutions treat data production as part of an automated technological process, with the goal of delivering training datasets for AI in the most advanced way possible.

Key Takeaways

Labeling data is an important part of the machine learning production process. It can be treated as an engineering and mathematical task that can be solved through technological means. Automation is an important aspect of data labeling, and it can be accomplished through a combination of human and machine efforts.
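One way to picture the combination of human and machine effort is automated aggregation of several human labels for the same item. The sketch below is an illustration only, not a description of any particular platform: the function name `aggregate_labels` and the 0.7 agreement threshold are assumptions. It takes a majority vote and sends low-agreement items back for further review.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Majority-vote aggregation of labels given by several annotators.

    Returns (label, agreement). The label is None when agreement falls
    below the (assumed) threshold, signalling that the item needs more
    human review before entering the training set.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement


# Invented example: four annotators label the same image.
print(aggregate_labels(["cat", "cat", "cat", "dog"]))  # accepted by majority
print(aggregate_labels(["cat", "dog"]))                # too ambiguous, re-queue
```

Simple majority voting treats every annotator as equally reliable; weighting votes by each annotator's track record is a common refinement.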
