Author: Jason Corso (Professor of Robotics and EECS at University of Michigan | Co-Founder and Chief Science Officer @ Voxel51)
Our software infrastructure has become the place where AI data and models come together; now we’re building a new science team to lead the charge in principled best practices for AI work
I am hiring AI researchers and scientists to join the new Voxel51 science team! Why?
Founding a company is risky business, and it’s risky along many axes. Whether or not your well-defined plan pans out is probably the key axis all founders worry about. But there are other axes as well: How will board dynamics turn out? What gaps in the early team will need to be filled by the founders? How will each team member unlock their superpower to best contribute to the company?
My experience has been much like this. For twenty years, I’ve emphasized scientific and engineering discovery in my work as an academic researcher, publishing these findings at the top conferences in computer vision, AI, and related fields. Yet, at my company, we focus on infrastructure that enables others to unlock scientific discovery. We have built a software framework that enables its users to do better work when training models and curating datasets with large, unstructured visual data; it’s kind of like a PyTorch++ or a Snowflake for unstructured data. This software stack, called FiftyOne in its single-user open source incarnation and FiftyOne Teams in its collaborative enterprise version, has garnered millions of installations and a vibrant user community.
But as a laser-focused startup, we have emphasized the infrastructure work significantly more than any scientific machine learning work. Don’t get me wrong: our years of experience certainly informed the way we built this infrastructure, and the strong foothold we have in the space has demonstrated the value of using FiftyOne. There is indeed a “brain” component of FiftyOne that implements certain capabilities, like finding unique samples in a dataset or ranking samples according to the likelihood that they contain annotation mistakes given a certain model. There are also zoos of models and datasets for easy user access, and a plethora of science-oriented workflows and tutorials. For the most part, however, our priority has been the infrastructure side of FiftyOne: building out scalability and flexibility at the core of the tool to allow our users to mold it, like clay, to their needs. We’ve learned that needs vary quite significantly from team to team.
Still, personally working with many hundreds of users, both in industry and academia, has surfaced significant questions and variations around best practices in real-world AI/ML development. For example, there has been a huge emphasis on annotation as the first key step in the AI/ML random walk. Yet many users tell me that a significant share (as much as 95%) of the dollars invested in annotation ends up being wasted.
How could this be? Perhaps the real struggle is in understanding the distribution of a dataset for a certain problem. Perhaps the challenge is in figuring out how to first bootstrap a model in a self-supervised way to then automatically determine what subset of the data should be labeled. Perhaps the complexity is in effectively sifting through hordes of unlabeled data to find examples of relevant corner cases. Perhaps something else.
What does this really say about the state of practice in AI? It’s still a quagmire of black magic, anecdotal practices, and wishful workflows. Instead, principled inquiries are needed to shine light on the struggles described above. They’re not going to come from academia, where dataset innovations represent a tiny fraction of overall inquiry. They’re not going to come from annotation companies either; they have a conflict of interest, because getting more data annotated isn’t the answer. We’re here to answer this need.
I’m thrilled to announce that Voxel51 is growing our team to tackle new initiatives around the under-investigated scientific aspects of effectively building AI/ML models and the unstructured data that feeds them. This shift comes on the heels of a few years of significant growth in the usage of FiftyOne, giving us conviction that we can invest more in understanding best practices and new capabilities for effective ML work. We’re so serious about this that I transitioned to become the first Chief Science Officer at Voxel51, and we’re hiring our first science team.
Information about the roles is available here. All are encouraged to apply to these exciting, fully remote roles. Your day-to-day work will involve foundational inquiries into AI/ML on unstructured data to better understand how good models are made, how datasets are structured and samples distributed, and what the best practices are in real-world AI/ML teams. Through these efforts, we will bridge the gap between the theoretical underpinnings and the concrete realities of machine learning, with a goal of demystifying the winning ways. Together, we will write impactful research papers and publish them at top conferences; we will give compelling talks; we will write code to make it easy for others to apply our ideas; and we will have a ton of fun! Our efforts will directly impact the AI/ML work of our tens of thousands of open source users and dozens of Fortune 100 companies.
If you’d like to learn more about this journey, or about this position, please add a comment here, or join me at my weekly office hours. I look forward to talking. Even more so, I look forward to the exciting new journey ahead as Voxel51 adds scientific inquiry to our portfolio!
Acknowledgments
Thank you to my colleagues Jimmy Guerrero, Michelle Brinich, Brian Moore, Jacob Marks, Remy Schor, and Dave Mekelburg for reviewing this blog and providing great, early feedback.
Biography
Jason Corso is Professor of Robotics, Electrical Engineering and Computer Science at the University of Michigan and Co-Founder / Chief Science Officer of the AI startup Voxel51. He received his PhD and MSE degrees at Johns Hopkins University in 2005 and 2002, respectively, and a BS Degree with honors from Loyola University Maryland in 2000, all in Computer Science. He is the recipient of the University of Michigan EECS Outstanding Achievement Award 2018, Google Faculty Research Award 2015, Army Research Office Young Investigator Award 2010, National Science Foundation CAREER award 2009, SUNY Buffalo Young Investigator Award 2011, a member of the 2009 DARPA Computer Science Study Group, and a recipient of the Link Foundation Fellowship in Advanced Simulation and Training 2003. Corso has authored more than 150 peer-reviewed papers and hundreds of thousands of lines of open-source code on topics of his interest including computer vision, robotics, data science, and general computing. He is a member of the AAAI, ACM, MAA and a senior member of the IEEE.
Disclaimer
This article is provided for informational purposes only. It is not to be taken as legal or other advice in any way. The views expressed are those of the author only and not his employer or any other institution. The author does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by the content, errors, or omissions, whether such errors or omissions result from accident, negligence, or any other cause.
Copyright 2024 by Jason J. Corso. All Rights Reserved.
No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher via direct message on X/Twitter at JasonCorso.
Top comments (2)
Sounds interesting. 🚀 Are you still hiring?
Best bet is to check out: linkedin.com/company/voxel51/jobs/
…and apply for the position that best aligns with your career goals. If it’s a match the internal recruiter will reach out with next steps.