My Ideas for Shortlist: A Cloud-Native AI-Driven System for Finding an Ideal Home (or something else)
I've decided to create a Kubernetes-based backend that uses AI to filter real estate ads. The filtering that existing real-estate sites offer helps: you can require a garden, specify the number of bedrooms, set a price range, and so on. But there's still a lot of work involved in reading descriptions and scrutinising photos of the property. The aim of Shortlist is to automate some of that work.
I want to make it flexible enough to filter any stream of text blobs, each with one or more images, such as classified ads or dating profiles.
Motivation
At work, I've been using Kubernetes more closely this year. I've booked a CKAD (Certified Kubernetes Application Developer) exam to certify my skills. I've done some online training and have a moderate amount of practical experience, but I'm procrastinating about taking the exam. As CKAD is very hands-on, I've been looking for an opportunity to build a real project from scratch.
I'm also keen to play with recent advances in AI. When I trained GANs and RNNs around four years ago, I ran the models in docker containers on an RTX 2070 super gaming rig. This time around I'd rather not buy a shiny new GPU (or two!) and would instead like to try out cloud providers' spot GPU instances and an inexpensive pay-as-you-go way of running AI workloads.
I recently saw that Hashnode is running an AI hackathon. If I can get an MVP ready before the deadline, I'll submit it to AI for Tomorrow.
Finally, I intend to dogfood this system next time I'm hunting for a new flat.
Architecture Sketch
Below is a rough sketch of what the Kubernetes cluster will look like.
My aim with this architecture is to keep things cheap and simple.
For instance, instead of using some kind of message broker to queue and run AI workloads, I want to create Kubernetes Job objects and let Kubernetes:
- remember the outstanding jobs,
- schedule pods onto GPU nodes,
- rerun them if they fail or if a spot VM/node is de-provisioned halfway through running them,
- and maybe even provision more spot GPU nodes when there are a lot of outstanding jobs.
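To make that concrete, here is a minimal sketch of the kind of Job manifest the cluster would be asked to run, built as a plain Python dict. The names, labels, and GPU/spot settings are my assumptions, not a final design: `backoffLimit` plus `restartPolicy: OnFailure` give the retry behaviour, and the node selector/toleration let the pod land on tainted spot GPU nodes.

```python
# Sketch of a Job manifest for an AI workload. All names, labels and the
# spot/GPU scheduling details are placeholder assumptions.
def make_ai_job_manifest(job_name: str, image: str) -> dict:
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name},
        "spec": {
            # Kubernetes retries the pod if it fails or its node disappears.
            "backoffLimit": 4,
            "template": {
                "spec": {
                    "restartPolicy": "OnFailure",
                    # Target GKE spot nodes (hypothetical for this sketch).
                    "nodeSelector": {"cloud.google.com/gke-spot": "true"},
                    "tolerations": [{
                        "key": "nvidia.com/gpu",
                        "operator": "Exists",
                        "effect": "NoSchedule",
                    }],
                    "containers": [{
                        "name": "inference",
                        "image": image,
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }],
                }
            },
        },
    }

manifest = make_ai_job_manifest("ai-job-1234", "example.com/shortlist/ai-job:latest")
```

Submitting this dict to the API server (e.g. via the official Python client) is all the "queueing" the system needs; Kubernetes tracks the rest.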
Services
The bold headings in the diagram above correspond to services, which I'll outline here.
INGESTERS
Ingesters will be responsible for gathering property Ads or other things, storing any images in the IMAGE STORE and sending the text and references to the images to the AI RUNNER service.
In practice my ingester is likely to be a Kubernetes CronJob running periodic web scraping, a Deployment that polls 3rd party services, or a Deployment with a Service object so it can accept requests from other services. These will likely be packaged as Helm charts to allow for this variability.
AI RUNNER
This will accept HTTP requests from INGESTERS and be configured to run appropriate AI JOBS based on these requests.
The message schema accepted by the AI runner is likely to be a string for the main text, a list of image paths and some metadata. This data will be passed on as part of the Job manifest sent to the k8s API server.
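A rough sketch of that message shape as a dataclass; the field names are illustrative guesses rather than a settled contract.

```python
from dataclasses import dataclass, field

# Hypothetical INGESTER -> AI RUNNER message: main text, references to
# images held in the IMAGE STORE, and free-form metadata.
@dataclass
class IngestMessage:
    text: str                                             # the ad's body text
    image_paths: list[str] = field(default_factory=list)  # keys into the IMAGE STORE
    metadata: dict[str, str] = field(default_factory=dict)  # source, ad ID, timestamps...

msg = IngestMessage(
    text="Two-bed flat with a sunny garden...",
    image_paths=["ads/1234/front.jpg", "ads/1234/kitchen.jpg"],
    metadata={"source": "example-listings", "ad_id": "1234"},
)
```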
The configuration of the AI jobs will likely be set in a ConfigMap. This will require some careful design because it could become brittle and tightly coupled.
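One hypothetical shape for that configuration: a mapping from source type to the model image and prompt to run, which the AI RUNNER would read from the ConfigMap at startup. Every key and value here is invented for illustration.

```python
# Data section of a hypothetical ConfigMap, expressed as a Python dict.
# Image names and prompts are placeholders, not real artifacts.
AI_JOB_CONFIG = {
    "property-ad": {
        "image": "example.com/shortlist/llm-filter:latest",
        "prompt": "Does this ad describe a flat with a garden and good light?",
    },
    "property-photos": {
        "image": "example.com/shortlist/vision-filter:latest",
        "prompt": "Do these photos show signs of damp or a dark kitchen?",
    },
}

def job_spec_for(source_type: str) -> dict:
    # Fail loudly on unknown source types rather than running the wrong model.
    return AI_JOB_CONFIG[source_type]
```

The brittleness risk mentioned above shows up here: every new ingester type needs a matching entry, so the ConfigMap couples ingesters to models.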
AI JOBS
These are going to be containers that run ML models over images and/or text. Running multiple models may require multi-container pods.
This component is going to require the most experimentation to get working, but the contract is quite simple. It will run AI inference over the input that the AI RUNNER forwarded from an INGESTER request. It will either send a request to the NOTIFIER or simply exit when it's finished.
NOTIFIER
Receives a payload from a positive AI JOB and emails the user.
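A sketch of the NOTIFIER's email assembly using the standard library; the SMTP send itself is omitted, and the addresses and subject wording are placeholders.

```python
from email.message import EmailMessage

# Build the notification email from an AI JOB payload.
# A real NOTIFIER would hand this to smtplib or a mail API to send.
def build_notification(payload: dict, to_addr: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "Shortlist: a property matched your criteria"
    msg["To"] = to_addr
    msg["From"] = "shortlist@example.com"
    msg.set_content(payload.get("text", ""))
    return msg

mail = build_notification({"text": "Bright flat with a large garden"}, "me@example.com")
```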
IMAGE STORE
This service would provide a simple way of storing and retrieving binary data. If every service uses it, then the other services can reference images by path or UUID.
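The put/get contract I have in mind is roughly this, shown with a toy in-memory stand-in; a real version would sit in front of a cloud storage bucket.

```python
import uuid

# Toy in-memory stand-in for the IMAGE STORE's contract: put bytes,
# get back a UUID key that other services pass around instead of the bytes.
class ImageStore:
    def __init__(self):
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = ImageStore()
key = store.put(b"\x89PNG...")
```

The ingesters would call `put` and forward only the key; the AI jobs would call `get` when they need the pixels.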
This feels a bit over-designed, though, and it would be a single point of failure.
I did consider using a shared library to provide an abstraction over cloud storage. The drawback of a shared library is that it would require that all of the services are written in the same language, whereas I'd like to be free to write some in Python and some in Go for example.
The first iteration of this project may not include an image store or image processing at all.
Conclusion
I'd like to say I've got the hard part out of the way, but abstract architecture is the fun part.
I'm planning to use Google Cloud Platform rather than AWS because I use AWS in my job and I'd like to work with another provider. I'm optimistic about Google Kubernetes Engine because Kubernetes is their baby, but I'm sure GCP will put some difficult obstacles in my path.
Writing a web scraper is also an uphill battle. Who in their right mind would volunteer to work against an unstable ever-changing "API"?
The last web scraper project I built was in 2017, before I landed my first engineering job.
That project ran on docker-compose on a single AWS EC2 instance, so I'm glad I've learned somewhat more advanced DevOps since then.