Jason Wang
Harvard '25 CS + Math
AI/ML Researcher + Full-Stack SWE
About
Hi there! I'm Jason Wang, an AI/ML researcher and full-stack SWE. I'm both a scientist and an engineer—I love concocting novel ML algorithms and I'm also zealous about building user-friendly platforms that turn ideas into productive, visible impact. I'm looking to solve challenging interdisciplinary problems where I can have stimulating collaboration in a tight-knit team environment. If that sounds like your team, message me! I'm always seeking to explore new opportunities!
Currently, I'm a senior at Harvard studying CS and math with a concurrent master's in CS. This past summer, I worked at Optiver as a High-Frequency Trading Research Intern in Austin, TX. The summer before that, I was a data science intern at MITRE, where I developed ML algorithms for cybersecurity applications. This position was especially exciting to me because of the interdisciplinary nature of the job: I got to consult with and learn from cybersecurity experts all working on the critically important mission of securing our nation.
I've previously worked with tech startups using machine learning to power various automations, namely in the smart agriculture sector with Orchard Robotics and the education sector with ThinkCERCA. I thrive on building user-oriented platforms, oftentimes from scratch, that are intuitive, deceptively simple, and highly adaptable for the future.
I have a voracious appetite for different research topics in AI. So far, I've had the opportunity to work on data privacy and interpretability of LLMs under Dr. Seth Neel at Harvard's Secure and Fair Machine Learning Lab (SAFR ML), on geometric deep learning under Dr. Melanie Weber at the Geometric Machine Learning Group, and at the intersection of LLMs and geometric deep learning under Dr. Chang-Tien Lu and Dr. Kaiqun Fu at Virginia Tech's Sanghani Center for AI and Data Analytics.
Outside of research, I was the Co-President of Harvard's principal AI/ML club, the Harvard Undergraduate Machine Intelligence Community (HUMIC), which gives over 100 Harvard students an in-depth tour of state-of-the-art deep learning, from natural language processing to computer vision to reinforcement learning and more. In addition, we host a project incubator that has jump-started 6 student-led group projects. Separately, I have an interest in art-tech and generative AI: I was the software lead for a public art installation called Musical Chairs for Harvard's ARTS FIRST Festival and the SEAS Design Fair. You can read more about this generative music installation here.
I live in Northern Virginia where I enjoy long walks into the neighborhood forest, reading on my couch, studying maps, playing violin and piano, vibing to life, and pondering life's deepest questions.
Experience
High-Frequency Trading Research at Optiver
I interned on Optiver's Delta-One high-frequency trading team in Austin, TX! I researched 40 well-motivated hypotheses to identify and trade new profitable products from historical data. In addition to ideating, simulating, and executing my own trading strategies, I was also able to actively click-trade on real markets.
SAFR AI Lab
As an undergraduate researcher at Prof. Seth Neel's SAFR AI Lab at Harvard, I red-team large language models for data privacy vulnerabilities. My co-authors and I created a new adversarial attack that identifies training data membership with twice the true positive rate of baseline methods in low false positive rate regimes. Through this work, I have gained invaluable hands-on experience training and deconstructing 12B+ parameter language models in multi-GPU distributed computing environments. I also learned how to package a large Python library and write Sphinx documentation.
Geometric Machine Learning Group
As an undergraduate researcher at Prof. Melanie Weber's Geometric Machine Learning Group at Harvard, I have been investigating methods to estimate the geometric properties of data manifolds (such as intrinsic dimension, curvature, and other quantities) to better understand the learning-theoretic implications for neural networks. Additionally, I am working on a second research project that seeks to extend algorithmic fairness to graph machine learning and to examine the various ways in which the topology of a graph may induce bias.
MITRE Center for Securing the Homeland
As a data science intern for MITRE's Summer of Discovery program, I was immersed in an intern cohort spanning a range of disciplines from cybersecurity to aerospace to biochemistry, and I got to upskill in the embedded security learning track, which culminated in a Capture The Flag hackathon. My day-to-day work involved developing AI algorithms and data visualizations using Dash for cybersecurity applications.
Harvard Undergraduate Machine Intelligence Community
As the Co-President of Harvard's principal undergraduate AI/ML club, I led the Future of Intelligence Fellowship, an 11-week crash course that gives over 100 Harvard students every semester a rigorous treatment of deep learning. In addition, we foster community through “mini-batches”, small mentor groups led by the board that encourage and facilitate discussion about AI and intelligence in general. Throughout the semester, we invite AI luminaries from both academia and industry to provide valuable advice and networking opportunities for fellow students. I also orchestrated the AI Project Incubator, a semester-long AI project accelerator that saw 6 student-led group projects in its inaugural season.
Conflux Art Tech: Musical Chairs
I'm also very interested in the intersection of AI with art, and I find the promise of generative AI and AI-assisted artistic creation particularly compelling. In the spring of my sophomore year, I was the software lead for Musical Chairs, a public art installation for Harvard's ARTS FIRST Festival consisting of three handcrafted chairs with bone conduction speakers. The artistic vision is that the objects around us are also participants in the conversations we hold, and so these musical chairs respond in kind with music of their own in a culmination of the collective conversation. My job was to create real-time signal processing powered by ML (e.g., OpenAI Whisper) to translate the conversation of those seated in these musical chairs into expression parameters for the algorithmically generated music emanating from these chairs.
Orchard Robotics FruitScope
As the lead software engineer, I planned and built the logic for Orchard Robotics' FruitScope service, which reconstructs a virtual orchard from scans of fruit trees and provides fruit-by-fruit, tree-by-tree data analytics. To do this, I employed the latest technologies in deep computer vision and stereo SLAM combined with AWS cloud services. Furthermore, I developed a full-stack web application (React/Flask/MongoDB) with tools that supplemented the data collection process and served as a valuable geospatial visualization product for customers.
ThinkCERCA Automated Assistance
As a machine learning consultant, I drove the development of automated assistance technology for ThinkCERCA's argumentative writing platform. I researched large language models for semi-supervised clause tagging and rubric scoring, achieving at least 80% accuracy on each rubric category for automated sentence highlighting.
Augmentation of Chinese Character Representations with Compositional Graph Learning
Ever since I was little, I have been fascinated with the compositional nature of Chinese characters. In this AAAI-22 research paper, I investigated how semantic components of these characters could aid BERT language embeddings. Specifically, I implemented a deep graph learning method on a compact, graph-based representation of Chinese characters, allowing us to exploit temporal information within the strict stroke order used in writing characters. We showed that this markedly improved the interpretability of Chinese BERT embeddings.
SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection
Cyberbullying is one of the most prevalent problems plaguing teenagers, in a world where mental health is (and should be) increasingly put under the spotlight. In this IEEE Big Data 2020 paper, I developed a two-stage deep learning model (SOSNet) that detects cyberbullying tweets. First, I improved a social media data mining technique called Dynamic Query Expansion (DQE) to enable semi-supervised online generation of specific types of cyberbullying tweets. Second, I achieved 92.7% accuracy in the fine-grained classification of cyberbullying tweets using a graph convolutional network classifier.