Rubercubic

Generative AI Engineer

As a Generative AI Engineer at rubercubic.com, I lead the development of full-stack features, including AI-driven capabilities built with LangChain, RAG, and Streamlit that translate ad hoc natural-language queries into code and/or SQL and generate the accompanying analytics charts.

Agentic AI: Developed LLM-powered agents to enable on-demand data analytics and automated code generation, enhancing the platform's core functionalities.
Frontend Development: Built responsive UI features using Next.js/React and TypeScript, ensuring a seamless user experience.
API Development: Created the API endpoints that UI components use to interact with AI agents, enabling efficient communication between the front end and the AI-powered backend features.
Data Infrastructure: Implemented a data warehouse solution using Trino, Parquet files, and S3. Designed efficient database schemas in Postgres and Redis to optimize data storage, retrieval, and processing workflows.
Data Scraping: Architected a highly scalable web scraping system to collect and process large volumes of internet data for analysis.
Data Processing: Developed robust pipelines for post-processing and cleaning of scraped data, ensuring high-quality inputs for downstream tasks.
Data Analysis & Visualization: Used Matplotlib and Seaborn to perform detailed data analysis and create insightful visualizations.
Machine Learning: Fine-tuned LLMs on data collected from eCommerce sites to enhance the querying capabilities of data warehouse systems. This improved the reliability of query and code generation (Python/Polars/Pandas/SQL), leading to more accurate and efficient results.

Results

These efforts reduced the time to generate actionable insights by 10,000x (often to less than 8 seconds). The fine-tuned LLMs enabled a tool for querying eCommerce data in natural language, making it easier for users to gain insights and improve their business. This drove strong user engagement and significantly enhanced customer satisfaction. In some cases, the tool boosted our customers' sales by 3x.

Tech Stack:

  • PgVector
  • LangChain
  • Polars
  • Seaborn
  • Streamlit
  • Next.js
  • Postgres
  • Trino
  • S3