
Learning - 2023-02-21

What's unstructured data and how can I use it?

If you read our previous post on vector databases, we dropped a pretty eye-popping stat that may have stuck with you for a bit:

The IDC estimates that unstructured data will account for 80-90% of all data growth by 2025. Images, tweets, audio files, videos and more are proliferating at an exponential rate. And there are no signs of it stopping.

Yes, you read that right. Of all the data generated over the next three years, 80-90% will be unstructured. Crazier still, the amount of data generated in the next three years will be greater than the amount created in the last thirty.

This is scary fast, even by the technology industry's standards. So it's not surprising that most organizations don't have a good grip on what all this means or how to prepare. But don't worry! Metal is here to help 😈

Thinking outside the box with unstructured data

For decades, structured data has been what businesses use to make decisions. This is data with a well-defined format, such as tables with rows and columns. Product catalogues, financial transactions and inventory records are all good examples of structured data.

Unstructured data is different. Its format varies and it doesn't follow clear patterns, which makes it difficult to make sense of in a spreadsheet. Text documents, audio files, social media posts, videos and even entire websites fall into this category. But who said data needs to be in a spreadsheet to be useful? We just need a different tool and a slight shift in mindset.

Different data requires different tools

Imagine you run a cleaning service at a hotel. One day you're reading through customer issues and notice a handful of complaints that have no text description – only images. To your horror these images show dirty carpets, bathrooms and beds. Not good!

Each image shows an issue that your business needs to address. But putting the images into a spreadsheet would not give you any additional insight. What you really need to know is the issue category for each image (e.g. carpet, bathroom or bed) so you can coordinate a cleaning crew response.

This is where technologies like embeddings and vector databases shine. They're built to make your unstructured data useful.

In the case of the cleaning service, you can use a platform like Metal to build an application to do the following:

  1. Turn every image into an embedding
  2. Classify each embedding into one of the predefined categories: carpet, bathroom or bed
  3. Allow users to search and return issues by image category
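The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Metal's API: `embed_image` is a hypothetical stand-in that returns hard-coded toy vectors (a real application would call an image-embedding model), and classification is done by comparing each embedding to one made-up prototype vector per category using cosine similarity.

```python
import math

# Hypothetical toy embeddings. In practice these would come from an
# image-embedding model; 3 dimensions are used here for readability.
TOY_IMAGE_EMBEDDINGS = {
    "complaint_1.jpg": [0.9, 0.1, 0.0],
    "complaint_2.jpg": [0.1, 0.8, 0.2],
    "complaint_3.jpg": [0.0, 0.2, 0.9],
    "complaint_4.jpg": [0.8, 0.3, 0.1],
}

# Hypothetical prototype vector for each predefined category.
CATEGORY_PROTOTYPES = {
    "carpet": [1.0, 0.0, 0.0],
    "bathroom": [0.0, 1.0, 0.0],
    "bed": [0.0, 0.0, 1.0],
}


def embed_image(path: str) -> list[float]:
    """Step 1 (stand-in): look up a toy embedding instead of running a model."""
    return TOY_IMAGE_EMBEDDINGS[path]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(embedding: list[float]) -> str:
    """Step 2: pick the category whose prototype is most similar."""
    return max(
        CATEGORY_PROTOTYPES,
        key=lambda cat: cosine_similarity(embedding, CATEGORY_PROTOTYPES[cat]),
    )


def search_by_category(paths: list[str], category: str) -> list[str]:
    """Step 3: return the issues whose image falls in the requested category."""
    return [p for p in paths if classify(embed_image(p)) == category]


issues = list(TOY_IMAGE_EMBEDDINGS)
print(search_by_category(issues, "carpet"))  # → ['complaint_1.jpg', 'complaint_4.jpg']
```

Prototype matching keeps the example short; with real data you might instead train a classifier on labelled embeddings.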

Sounds like magic? That's because it is. Well, not literally, but we think it's pretty close.

Let's make things a little more complicated

To review, embeddings are just representations of real-world things turned into computer-readable numbers. Once created, embeddings are stored as vectors in a vector database, where they can be queried for a variety of tasks.
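To make the store-then-query idea concrete, here is a tiny in-memory stand-in for a vector database. It is a sketch, not how Metal or any production vector database is implemented: the `TinyVectorStore` class and its `add`/`query` methods are hypothetical, the vectors are made up, and it does an exhaustive cosine-similarity scan where real systems typically rely on approximate nearest-neighbour indexes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    """Hypothetical in-memory vector store: keeps (id, vector) pairs and
    answers nearest-neighbour queries by brute-force cosine similarity."""

    def __init__(self) -> None:
        self._items: dict[str, list[float]] = {}

    def add(self, item_id: str, vector: list[float]) -> None:
        self._items[item_id] = vector

    def query(self, vector: list[float], top_k: int = 2) -> list[tuple[str, float]]:
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in self._items.items()
        ]
        # Most similar items first.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
# Made-up embeddings for three documents.
store.add("doc_a", [0.9, 0.1, 0.0])
store.add("doc_b", [0.1, 0.9, 0.0])
store.add("doc_c", [0.8, 0.2, 0.1])

results = store.query([1.0, 0.0, 0.0], top_k=2)
print([item_id for item_id, _ in results])  # → ['doc_a', 'doc_c']
```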

Now let's say you're the head of marketing at a fast-growing startup. The day is about to wrap up when your CEO sends you a customer review that is...less than favorable. "Do other customers feel this way?" your CEO asks. There are thousands of reviews to parse through. How can you make sense of them all?

Don't panic! You can answer your CEO's question using embeddings like so:

  1. Create embeddings that capture the semantic meaning of the reviews (reviews with similar meanings end up close to each other in vector space)
  2. Feed these embeddings into another machine learning model that is trained to predict sentiment – e.g. positive, negative, neutral
  3. Cluster and present reviews by sentiment
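The three steps can be sketched end to end in plain Python. Everything here is an assumption for illustration: the review embeddings are made-up two-dimensional vectors rather than the output of a real embedding model, and a simple k-nearest-neighbours vote over a handful of labelled examples stands in for the trained sentiment model in step 2.

```python
import math
from collections import Counter, defaultdict

# Made-up labelled review embeddings, used as "training data" for step 2.
LABELLED = [
    ([0.9, 0.1], "positive"),
    ([0.8, 0.2], "positive"),
    ([0.1, 0.9], "negative"),
    ([0.2, 0.8], "negative"),
    ([0.5, 0.5], "neutral"),
    ([0.55, 0.45], "neutral"),
]

# Step 1 (stand-in): made-up embeddings for new, unlabelled reviews.
NEW_REVIEWS = {
    "Love it, works perfectly": [0.85, 0.15],
    "Broke after two days": [0.15, 0.85],
    "It's fine I guess": [0.52, 0.48],
    "Support never answered": [0.25, 0.75],
}


def predict_sentiment(embedding: list[float], k: int = 3) -> str:
    """Step 2: majority vote among the k closest labelled embeddings."""
    nearest = sorted(LABELLED, key=lambda item: math.dist(embedding, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cluster_by_sentiment(reviews: dict[str, list[float]]) -> dict[str, list[str]]:
    """Step 3: group review texts under their predicted sentiment."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for text, embedding in reviews.items():
        clusters[predict_sentiment(embedding)].append(text)
    return dict(clusters)


for sentiment, texts in cluster_by_sentiment(NEW_REVIEWS).items():
    print(sentiment, texts)
```

Grouping the output this way is what lets you answer "do other customers feel this way?" by looking at the size and contents of the negative cluster.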

These results would look something like this:

[Image: chart of sentiment clusters]

What else can I use my unstructured data for?

In short, a lot. This is why embeddings are such an exciting and useful technology.

You can use embeddings to build recommendation engines, detect anomalies in data, translate between languages and much more. And new machine learning models are being created every day that will unlock more use cases for your data.
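As one illustration of the anomaly-detection use case, a simple approach is to flag items whose embedding sits unusually far from the rest. The sketch below assumes made-up embeddings and a hypothetical `find_anomalies` helper that flags any vector whose distance from the mean embedding exceeds a chosen threshold; real systems often use more robust methods, such as nearest-neighbour distances or dedicated outlier models.

```python
import math


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def find_anomalies(items: dict[str, list[float]], threshold: float) -> list[str]:
    """Return ids whose embedding is farther than `threshold` from the mean."""
    centre = mean_vector(list(items.values()))
    return [
        item_id
        for item_id, vector in items.items()
        if math.dist(vector, centre) > threshold
    ]


# Made-up embeddings: four similar items and one clear outlier.
embeddings = {
    "item_1": [0.10, 0.20],
    "item_2": [0.12, 0.18],
    "item_3": [0.09, 0.22],
    "item_4": [0.11, 0.19],
    "item_5": [0.90, 0.95],
}

print(find_anomalies(embeddings, threshold=0.5))  # → ['item_5']
```

The threshold is a tuning knob: too low and normal items get flagged, too high and real outliers slip through.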

If you want to learn more about what you can do with your unstructured data, then get started with Metal today! 🤘