Data Layering — how great models train themselves

Marcin Rybicki
Sep 17, 2022

Generic datasets can yield unique results and great AI models.

Data layering for smart, unsupervised ML models

If you are looking for tricks to make your unsupervised models smart and robust, don't miss Data Layering, the technique at the core of the recommendation engines we use at MarieAI (marieai.com) to train intelligent models.

Models in action

Sometimes, when you see recommendation engines in action, you probably wonder why they are so dumb. Is it the quality or the quantity of the data? Not enough parameter tweaking? Maybe you need more Mechanical Turk workers and more money to train a good quality, intelligent model.

What if I told you that you can take a small number of samples (let's say up to 5k) based on generic, widely available metadata and still pull off a model with unique features, one that grasps fuzzy logic and relations? And, most importantly, it won't take months to engineer, merely a week or two. How?

Data Layering — make your data smarter

In 2017 I left my game developer career and focused on developing a novel technique for the Machine Learning of the future.

One that does not require a fancy PhD, terabytes of data, or countless hours spent tuning parameters. It's simply unsupervised plug (data) and play (use in production). This is how MarieAI.com was born.

To have that type of AI, you obviously need an unsupervised algorithm, where WHAT you put in (the labels) is crucial, since the algorithm outputs consistent models based on its input. So data representation, i.e. labeling, is key, and I want to teach you the technique I used for a new generation of recommendation engine.

How to teach AI humour: an example

Let's look at the final results of a movie recommendation engine that uses this technique. It can handle various types of humour (romcom, satire, slapstick, dark comedy) and other niche sub-genres, based on widely available data everyone has. While Netflix or Amazon recommendation algorithms can quite often let their users down, with data layering you get a very consistent and intuitive grasp of definitions that humans understand easily but that remain fuzzy. After all, can your algorithm "understand" what a slapstick comedy is?

Recommendation engine prototype when fed with just four slapstick comedy examples.
Recommendation engine prototype when fed with just six RomCom examples.

Provide fuzzy data

Can AI even understand the fuzziness and nuances of data? Let's use actors' ages as an example (although our engine has plenty more inputs).

Let's say you want to teach the algorithm the dynamics between characters' ages, something we humans can see and associate with a certain chemistry on screen. It needs to be provided as general information. It not only creates a better understanding of the romantic chemistry between actors on screen, but also helps the AI understand other types of movies: those where the actors are young (teen adventure), older (e.g. movies where old action stars try to amaze us once more), or a mix (for example, a family movie).
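
For instance, here is a minimal Python sketch of how such age information could be derived from generic metadata. The Movie class, its field names, and the numbers are hypothetical, just to illustrate the idea, not MarieAI's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Movie:
    title: str
    release_year: int
    lead_birth_years: list[int]  # birth years of the lead actors

def ages_at_production(movie: Movie) -> list[int]:
    """Actors' ages when the movie was made, derived from generic metadata."""
    return [movie.release_year - year for year in movie.lead_birth_years]

def lead_age_gap(movie: Movie) -> int:
    """Age gap between the oldest and youngest lead: a simple 'dynamics' signal."""
    ages = ages_at_production(movie)
    return max(ages) - min(ages)

movie = Movie("Some RomCom", 1999, [1964, 1969])
print(ages_at_production(movie))  # [35, 30]
print(lead_age_gap(movie))        # 5
```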

Spin on generic data

Obviously, actors' ages and a movie's release date are not unique types of data, but if you extract each actor's age at the time of production and combine it with the other actors', you have a new category of metadata based on age differences and age brackets. But how do you do Data Layering and avoid maths in the process?

Simple. Below is an example of how you can design abstract metadata based on numerical values, without using numbers, ergo creating a pool of metadata.

Marked labels that are unlocked by actors' ages

As you can see, actors' ages in various movies unlock different ranges of labels, and those ranges create overlaps. These overlaps are Data Layering at its finest: they create a different label pattern for each type of movie, grouping similar ones together.
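
Here is a rough Python sketch of that idea. The bracket names and ranges are made up for illustration (not MarieAI's actual labels): a single numeric age unlocks several overlapping labels, and a movie collects the union of its cast's labels.

```python
# Hypothetical overlapping age brackets; the exact names and ranges are invented.
# One age can unlock several labels, which is what creates the overlaps.
AGE_BRACKETS = {
    "teen":        range(13, 20),
    "young-adult": range(18, 30),   # overlaps with "teen" and "adult"
    "adult":       range(25, 50),
    "mature":      range(45, 70),
    "senior":      range(60, 100),
}

def age_labels(age: int) -> set[str]:
    """All labels unlocked by a single actor's age."""
    return {name for name, bracket in AGE_BRACKETS.items() if age in bracket}

def movie_age_labels(ages: list[int]) -> set[str]:
    """The pool of labels a movie collects from its cast's ages."""
    labels: set[str] = set()
    for age in ages:
        labels |= age_labels(age)
    return labels

print(age_labels(19))                  # unlocks: teen, young-adult
print(movie_age_labels([19, 27, 48]))  # unlocks: teen, young-adult, adult, mature
```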

Patterns for each movie genre

Within the same category, an ML algorithm like marieai.com can weight movies differently based on age dynamics alone. This smart way of Data Layering produces intelligent AI models without extra parameter tuning, or even knowing how to work with AI. It's pure data science!

Example of how various movie genres collect different labels

Another example. When you compare the age-dynamics patterns between movie genres (keep in mind I'm simplifying for the sake of argument), an unsupervised algorithm designed to spot patterns and anomalies (and MarieAI is) renders, in a matter of minutes, a very smart and capable model you can use on the backend or frontend.
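
To make the grouping idea concrete, here is a tiny illustration (not the MarieAI algorithm itself) that compares movies' label pools with plain Jaccard similarity; the movie titles and labels are invented. Any algorithm that spots overlapping patterns would do this same kind of grouping:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two label sets: 1.0 means identical label patterns."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy label pools per movie, standing in for the layered metadata above.
movies = {
    "Teen Adventure A":  {"teen", "young-adult", "ensemble-cast"},
    "Teen Adventure B":  {"teen", "young-adult"},
    "Aging Action Star": {"mature", "senior", "large-age-gap"},
}

query = {"teen", "young-adult"}  # the pattern we are looking for
for title, labels in sorted(movies.items(), key=lambda kv: jaccard(query, kv[1]), reverse=True):
    print(f"{jaccard(query, labels):.2f}  {title}")
# 1.00  Teen Adventure B
# 0.67  Teen Adventure A
# 0.00  Aging Action Star
```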

Not so fast

This is obviously just one example of the many types of labels I have layered, but I hope you are now grasping the concept. Using such techniques and unsupervised algorithms like MarieAI allows you to make great AI models, capable of performing data classification, recommendations, or predictions, without complex maths or even knowing how to build ML models.

Just kidding — it’s fast!

And, in a nutshell, this is my life's mission: to teach people simple tricks so they can build smart, intelligent models for their AI and build a better future. No complex maths, no fortunes spent on PhD staff and cloud computing.
You can test your skills with MarieAI using this tutorial, for example:

https://marcin-rybicki.medium.com/text-classifier-tutorial-for-marieai-com-f5afaec87063

Get hooked on the Digital Hippocampus idea

I believe there is a simple way for our brain to memorise and forget things, and I have invented a way to work with it. I call it the Digital Hippocampus. It is faster and more energy efficient, it works for building general AI, and in the future we could build chips and memory so that digital devices work the same way humans think.

https://marcin-rybicki.medium.com/digital-hippocampus-how-to-build-one-c6f4cbed7ab0

About Me

Hi, I'm Marcin. Former game developer and algorithms designer.
Currently working on a novel, general purpose algorithm you can use offline on any device. My dream is to deliver a solution that can be trained, further trained, and work as a swarm, solving insanely complex problems.

This is how MarieAI (https://marieai.com/) was born, along with the concept of a Digital Hippocampus and the Data Layering technique explained here.

Find me on LinkedIn: https://www.linkedin.com/in/marcin-rybicki-qa/
