Arvind Rajaraman

I am an engineer on Databricks' Applied AI team, working on LLM post-training, evaluation, and deployment. I am also continuing research at Berkeley Artificial Intelligence Research with Professor Anca Dragan. Broadly, I am interested in building effective learning and reasoning systems, whether that be LLMs, digital agents, or embodied robots.

Industry Experience. I was a Machine Learning Scientist Intern at Atlassian, where I worked on large language model (LLM) infrastructure. In 2022, I interned at Nuro, where I worked on low-latency video streaming and model uncertainty estimation. In 2021, I was at NVIDIA working on vision models for autonomous driving.

Other Experience. I completed my undergrad at UC Berkeley, where I was the Head TA for Berkeley's CS 188 (Artificial Intelligence) and CS 189 (Machine Learning), and on the executive board of Machine Learning at Berkeley (ML@B). I am an Accel Scholar, Conviction Fellow, and part of Berkeley's Management, Entrepreneurship, and Technology (M.E.T.) Program.

Email / CV / Twitter / GitHub / LinkedIn / Devpost

Engineering Experience

My engineering experience is primarily in high-performance systems for machine learning, from autonomous vehicles to, more recently, large language models (LLMs).

	Databricks (Current). Software Engineer, Applied AI Team. Applied AI works on LLM {evaluation, post-training, & deployment} for {search, text-to-SQL, code correction, & code generation}. I am involved in technical efforts across the stack.
	Atlassian. Machine Learning Scientist Intern, Core Machine Learning Team. Worked on a search relevance algorithm, RLAIF (reinforcement learning from AI feedback) infrastructure, text-to-SQL, and chatbots for question answering.
	Nuro. Software Engineer Intern, Fleet Infrastructure Team. Worked on video streaming infrastructure, model uncertainty estimation, and auto-labeling for video classification tasks.
	NVIDIA. Software Engineer Intern, Autonomous Vehicles Division, DriveIX. Worked on AutoML for hyperparameter tuning of vision models, increasing data fidelity of vision data, and ML engineering infrastructure.
	Segmed (YC W20). Software Engineer Intern. Worked on authentication, authorization, and developer productivity tools.

Research Experience

I am excited by the prospect of embodied robots that can generalize easily to unseen tasks and environments, in order to become widely useful to humans. My research interests include deep reinforcement learning, unsupervised learning, language modeling, and human-robot interaction.

More specifically, I am interested in creating embodied agents that model human learning, effectively representing humans' goals, intent, and biases. Because language is inherently information-dense, abstractable, highly available from a data standpoint, and contains knowledge about usefulness to humans, I am interested in building learning systems that use language to interact with humans, represent knowledge, and plan.

Discovering Skills with Language
Arvind Rajaraman, Vivek Myers, Anca Dragan
Project in progress

Using language to scale unsupervised reinforcement learning and learn skills more useful to humans.

Explicit vs. Implicit Modeling of Human Internal State for Robot Planning
Arvind Rajaraman, Ran (Thomas) Tian, Anca Dragan, Andrea Bajcsy
[Presentation]

A new method for robots to collaborate with humans by co-evolving a sequence model that estimates a human's internal state (with a model-based prior) and a robotic influence policy.

Teaching

Instructors of each course are listed in parentheses.

	CS 189: Introduction to Machine Learning. Head Teaching Assistant, Fall 2023 (Jitendra Malik, Jennifer Listgarten); Head Teaching Assistant, Spring 2023 (Jonathan Shewchuk); Teaching Assistant, Fall 2022 (Jitendra Malik, Jennifer Listgarten).
	CS 188: Introduction to Artificial Intelligence. Head Teaching Assistant, Summer 2022 (Yanlai Yang, Angela Liu); Teaching Assistant, Spring 2022 (Stuart Russell, Dawn Song).
	CS 70: Discrete Mathematics and Probability Theory. Academic Intern, Spring 2021 (Shyam Parekh, Satish Rao).

Selected Side Projects and Open-Source Contributions

Below is a set of selected side projects. To see more, visit my GitHub and Devpost.

* Indicates equal contribution and co-authorship.

Origin
Best Frontier Tech Hack, Stanford TreeHacks 2023
[Blog Post] [Devpost] [Code] [Tweet]

Built an LLM-based browser extension that cleans up your tabs and builds context-aware workspaces. Won Best Frontier Tech Hack from Pear VC and received an investment offer at a $2.5 million valuation. Also received interest from Sequoia and a shout-out from Harrison Chase (creator of LangChain). 70+ stars on GitHub.

Verbal Coding
Winner of Education Track and Best Use of Google Cloud, HackNYU 2019
[Devpost]

Developed a verbal code editor that uses NLP to convert spoken pseudocode into well-formed Python code. Continued work and received mentorship from MIT Professor Kyle Keane.

Some other projects I pursued are below. Any awards won are noted in parentheses.

Ephemeral (Best Use of Together.ai, TreeHacks 2024) - agentic meeting assistant that contributes to the meeting conversation and automates mundane tasks.
BiteBuddy (Best Use of Reflex, CalHacks 2023) - meal planner app with social networking integrations.
Unscrambit (First Place, JumpStart Hackathon 2020) - code analysis app that uses NLP to identify common algorithms implemented in one's codebase.
Autodeploy (2023) - developer tool that automatically creates Terraform files from natural language descriptions and an analysis of one's codebase.
Crib (2020) - smart lock that uses real-time crime data to automatically lock your front door.
Disperse (2020) - grocery store search app that ranks places in order of least crowded to most crowded. Built during the COVID-19 pandemic to decrease infection rates.
NextEniac (2018) - grade calculation and insights tool used by 1,000 students at my high school.
Buzz (2016) - social networking app that makes the shopping experience social.
Formulate (2013) - first substantial coding project, which solved my Pre-Algebra homework.

Miscellaneous

I was previously the Vice President of Machine Learning at Berkeley (ML@B), which is Berkeley's undergraduate ML group. I taught introductory ML workshops across the Bay Area, ran an internal new member education program, and managed $100,000 of finances.

Below are some links to content I've developed:

CS 198 (Modern Computer Vision and Deep Learning): a course I co-developed and taught in Fall 2022, with 100 students and 200 auditors. Received faculty sponsorship from Stuart Russell.
CS 189 (Introduction to Machine Learning) recitation slides
CS 188 (Introduction to Artificial Intelligence) recitation slides
Model-agnostic meta learning (MAML) talk I presented to Berkeley students

Website template from Jon Barron