Here I capture some of the side projects I've worked on lately. Currently, these are mostly explorations related to deep learning. The hope is to understand the state of the technology and better appreciate the nuances involved in building production-ready applications. If interested, I've shared the code for some of these projects on GitHub.

Exploring Deep Learning

These are a bunch of projects that I did while I was getting started. It's pretty remarkable what one can do just by leveraging the work others have done!

  1. Pet Breed Classifier - GitHub Code
  2. Script Generator [Paused] - GitHub Code
  3. AskPDF - Search & Retrieval using Llama-2 [Paused]

The scope of the above projects grew organically, leading me to interact with a bunch of tools and technologies that I had not originally anticipated. Initially I was using HuggingFace Spaces to deploy my apps, but then wanted to host them here on my website. For that, I used Streamlit. Once I had a simple front-end and a working app, I configured a server using Nginx and hosted the app on AWS EC2 instances. It was a good learning experience, but I realized these apps start requiring more powerful instances pretty quickly. In fact, the AskPDF project (inspired by PrivateGPT) requires a GPU, or maybe an Apple M1, for any meaningful performance.
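For the Nginx piece, a minimal server block of the kind I mean might look like the sketch below. The domain name is illustrative, and 8501 is simply Streamlit's default port; your setup may differ. The WebSocket headers matter because Streamlit keeps a live connection open to the browser.

```nginx
# Reverse-proxy a Streamlit app running on localhost:8501 (Streamlit's default port).
server {
    listen 80;
    server_name example.com;  # illustrative domain, not my actual one

    location / {
        proxy_pass http://127.0.0.1:8501;
        proxy_http_version 1.1;
        # Streamlit communicates over WebSockets; these headers
        # allow Nginx to upgrade and hold the connection.
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```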

And that's the general observation: while deep learning is surpassing human-level performance on a lot of tasks, it is expensive and constrained by compute. There's one use-case, however, where deep learning models work well on consumer hardware: semantic search. Khoj is one of the interesting open-source projects building a product around it. The space is evolving rapidly, and it's going to be interesting to see how the performance of local apps improves as consumer hardware becomes more powerful and local models become more capable.

Search & Retrieval Use-Case Using LLMs

If you have used LLM applications before, you'll know they can hallucinate. And sometimes, if you don't know the domain well enough, it's difficult to tell that they are hallucinating! Retrieval-augmented generation (RAG) is a technique for improving the accuracy and reliability of LLM applications by grounding their answers in facts fetched from external sources. I wanted a deeper understanding of the complexities involved in building RAG-based LLM applications, and hence decided to try building one from scratch.
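The core retrieval loop of RAG can be sketched in a few lines. The version below is a toy: it uses bag-of-words counts in place of a real embedding model, and the function names and sample documents are my own illustrations, not code from the project. The shape is the same, though: embed the documents, rank them against the query, and stuff the top matches into the prompt so the LLM answers from retrieved facts rather than memory.

```python
import re
from collections import Counter
from math import sqrt

def embed(text):
    # Toy "embedding": bag-of-words token counts. A real RAG app
    # would use a sentence-embedding model here instead.
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    # Ground the LLM's answer in the retrieved passages.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Streamlit builds simple data app front-ends in Python.",
    "Nginx can act as a reverse proxy for web apps.",
    "Llama-2 is an open-weight large language model.",
]
print(build_prompt("What is Llama-2?", docs))
```

The prompt string would then be sent to the model (Llama-2, in the project's case); everything else in a RAG app, such as chunking PDFs and caching embeddings, is engineering layered on top of this loop.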

This is me trying to build something useful for myself. It is still a work in progress, though. The performance isn't the best, and there are some known issues. Some of that is down to the LLM itself, but most of it is me learning how to build such applications. If interested, you can find the code on GitHub and read more about it here.