Job Description

Hybrid | Israel

About the Company

DriveNets is a leader in large-scale networking solutions for AI infrastructure and service providers. The company's disaggregated networking architecture transforms the economics of large-scale infrastructures while maximizing performance, utilization, and operational efficiency. Its high-performance AI fabric maximizes GPU utilization and accelerates deployments by optimizing the AI stack end-to-end, resulting in higher tokens-per-second and lower cost-per-token. DriveNets' solutions power production networks for global tier-1 operators like AT&T and Comcast, and scale multi-vendor AI infrastructures at foundation model labs, NeoClouds, and enterprises.

Responsibilities

- Design and develop LLM-powered security features and internal AI tools, including RAG pipelines, multi-agent workflows, and prompt-engineered systems for cybersecurity use cases
- Architect and operate multi-agent systems in production, covering orchestration, inter-agent communication, task delegation, and failure handling at scale
- Build agent monitoring and observability pipelines, including tracing, drift and failure detection, alerting, and reliability SLA management
- Build and maintain scalable MLOps infrastructure: model serving, evaluation frameworks, experiment tracking, and CI/CD for ML
- Fine-tune and adapt foundation models on internal datasets such as network telemetry, security logs, and threat intelligence
- Establish and champion best practices for model observability, safety, and responsible AI deployment
- Stay current with the LLM/GenAI ecosystem and drive continuous improvements to the AI SDLC and AI research cycle

Requirements

Technical Skills

- 5–8 years of software engineering experience, with 2–3 years focused on AI/ML
- Proven experience building and deploying production LLM applications (RAG, agents, tool-use, fine-tuning)
- Hands-on experience designing and operating production multi-agent systems
- Experience building agent observability and monitoring solutions
- Proficiency with LLM orchestration frameworks: LangChain, LangGraph, and/or AWS Bedrock AgentCore
- Strong Python programming skills
- Experience building and maintaining MLOps pipelines (model serving, eval frameworks, experiment tracking)
- Solid understanding of transformers, embeddings, and vector databases
- Experience with cloud infrastructure and Kubernetes

Soft Skills

- Self-driven and proactive, able to establish best practices and drive initiatives independently
- Continuous learner who stays current with a rapidly evolving field and translates new knowledge into practical improvements
- Strong collaborator who works effectively across R&D and product teams

Nice to Have / Advantage

- Cybersecurity background (significant advantage)
- Networking domain knowledge (SDN, BGP)
- Experience with model evaluation methodologies (LLM-as-judge, RAGAS)
- Familiarity with Model Context Protocol (MCP)
- Background in telecom or enterprise SaaS environments
- Publications or open-source contributions in GenAI

If your experience is close but doesn't fulfil all requirements, please submit your application.

DriveNets is on a mission to build a special company composed of individuals with different backgrounds, perspectives, and experiences. DriveNets is an equal opportunity employer. We do not discriminate based on race, religion, national origin, sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with disability, or other applicable legally protected characteristics.

More About DriveNets

Based in Israel with extended teams located in the US, Japan, and Romania, DriveNets' operations cover more than twelve countries globally. Powering production networks for global tier-1 operators, DriveNets is a leader in large-scale networking solutions for AI infrastructure and service providers.

Visit our website to learn more: https://drivenets.com/company/