Client: Stealth startup building a roadmap on machine-learning-based brain emulation as a path to AI safety. The company collects large-scale neuroscientific datasets to train machine-learning-based brain emulations. They believe it is possible to scale this technology in a safe, secure and trustworthy manner in the next decade and empower humanity in unprecedented ways.

Role 1: Machine Learning Research Engineer
Collaborating with a diverse team, including product managers, researchers, and engineering departments, your role involves conducting research on the application of cutting-edge ML technologies to large-scale neuro datasets and transforming these insights into scalable, production-ready solutions.
• Design, train, and fine-tune transformer-based ML models and systems, ensuring their applicability and effectiveness in neuroscience.
• Develop and maintain production-grade ML systems, ensuring their scalability, efficiency, and reliability.
• Implement benchmarks that evaluate quality, safety, security, and trustworthiness in the ML models and systems developed.
• Work in tandem with cross-functional teams, including product development and data infrastructure.
• Engage in collaborative research efforts to explore new ML architectures, including image and video transformer models and multimodal systems.
• Contribute to the creation of state-of-the-art (SOTA) foundation models for both invasive and non-invasive neuroscientific datasets.

Skills they're looking for:
• Demonstrated exceptional ability (3-5+ years) in ML engineering, particularly with PyTorch, including hands-on experience with training and fine-tuning transformer-based machine learning models.
• Demonstrated capability in developing production-level machine learning systems.
• Any of the following:
    • Experience with image and video transformer models.
    • Expertise in training multimodal models and experimenting with novel architectures.
    • Experience with applying machine learning techniques to neuroscientific datasets.
    • Previous work on scaling laws for modalities.

Role 2: Data Acquisition & Infrastructure Engineer
The data infrastructure engineer is responsible for the setup and maintenance of systems capable of handling large amounts of neuroscientific data. You will collaborate closely with the human/animal brain data acquisition and AI engineering teams, building the interface between data acquisition and our machine learning models.
• Develop and deploy highly scalable distributed systems capable of handling petabytes of data.
• Own and lead engineering projects in the area of data acquisition, including initial web crawling and data ingestion from our experiments.
• Ensure smooth data flow and system operability.
• Architect and implement algorithms for data indexing and search capabilities.
• Work with the legal team to handle any compliance or data privacy-related matters.
• Build and maintain backend services for data storage, including work with key-value databases and synchronization.
• Deploy and perform routine system checks.
• Conduct and analyze experiments on data to provide insights into system performance.

Skills they're looking for:
• 5+ years of industry experience in software development and exceptional ability.
• Experience with large web crawlers.
• Strong expertise in large stateful distributed systems and data processing.
• Proficiency in containerized applications across multiple hosts, and Infrastructure-as-Code concepts.
• Demonstrated experience with distributed computing frameworks (e.g., Hadoop, Spark) and cloud services (e.g., AWS, GCP, Azure).
• Strong understanding of database technologies (SQL and NoSQL), data warehousing, and ETL processes.
• Ideally, experience with the different types of neuroscientific datasets.

Technical expertise (see above)
4-12 week project with support starting as soon as possible
Remote, but the client is US (Bay Area) based, so a nearby time zone helps but is not critical
Upper end of their budget is $150/hour. I've informed the client that this is quite low. It is unclear how much they're able to flex since they're at a very early stage