Careers
Join the Team Shaping What’s Next
Open Positions
Fulltime & Permanent Data Analyst
Environment: Product & R&D
Location: Remote from India
Roles and Responsibilities:
- Design, build, and maintain scalable data pipelines to collect, clean, transform, and validate structured & unstructured data from multiple internal and external sources.
- Collaborate with data engineers, cybersecurity teams, AI/ML & LLM/Agent teams, and business stakeholders to define data needs and ensure data accuracy, security, and reliability.
- Analyze cybersecurity datasets including SIEM logs, CVEs, vulnerabilities, and asset inventories to identify trends, anomalies, and actionable insights that strengthen security analytics.
- Develop and maintain data models, semantic layers, ontology relationships, lineage documentation, and metadata standards to support governance, interoperability, and transparency.
- Automate data workflows and scheduling using Airflow, Prefect, or Temporal; support CI/CD processes for data including validation, testing, and version control.
- Integrate and manage datasets across OCI and hybrid environments, ensuring secure connectivity via VPNs, private endpoints, and cloud storage services.
- Build and maintain dashboards and BI insights using Power BI, Looker, or Grafana to enable data-driven decisions for leadership and technical teams.
- Support GenAI/LLM initiatives by preparing datasets, curating data pipelines, performing feature engineering, and enriching metadata to enable high-quality model training.
- Contribute to AI-driven analytics (AI for BI) by integrating LLMs and Agent-based workflows with BI tools (Power BI, Looker, Tableau) to automate reporting and deliver deeper insights.
- Continuously improve data quality, documentation, and analytical processes through performance optimization, stakeholder feedback, and best-practice implementation.
Technical Requirements:
- 5+ years of experience in SQL & Python (Pandas / PySpark) for data querying, analysis, transformation, and automation.
- Experience working with an AI team and a solid understanding of how data is prepared for GenAI solutions.
- Should be able to perform end-to-end feature engineering, including extraction, selection, transformation, and enrichment of raw data to make datasets suitable for training ML/GenAI models.
- Strong hands-on experience with ETL & orchestration tools such as Airflow, Prefect, or Temporal to build robust data workflows.
- Expertise in data modeling, cleaning, transformation, and handling structured/unstructured formats, including JSON schema understanding.
- Good understanding of cybersecurity datasets such as SIEM logs, CVEs, vulnerability & asset inventory data.
- Experience with OCI or similar cloud platforms, including Object Storage, Autonomous DB, Logging, hybrid integrations via VPN & private endpoints.
- Knowledge of semantic modeling & ontology-based data relationships, metadata management, and lineage tracking.
- Experience working in AI-enabled analytics environments (e.g., AI for BI), integrating GenAI/LLM-based Agents with BI tools such as Power BI, Looker, or Tableau to automate insight generation, reporting workflows, and data-driven decision support.
- Hands-on with visualization tools (Power BI / Looker / Grafana), delivering performance dashboards and insights.
- Experience working on at least one production-grade GenAI/LLM solution (e.g., RAG, AI Agents, automated workflows) and collaboration with AI teams for data readiness.
- Familiarity with CI/CD for data pipelines, including testing, validation, and version control tools (e.g., Git, DVC), and experience with Agentic workflows.
Fulltime & Permanent Senior Data Scientist (Agentic AI & On-Prem Deployment)
Environment: Product & R&D
Location: Remote from India
Roles and Responsibilities:
- Design and implement agentic AI systems using frameworks such as LangChain, CrewAI, Autogen, and LlamaIndex for multi-agent reasoning, task orchestration, and automation.
- Build, optimize, and maintain RAG (Retrieval-Augmented Generation) pipelines for contextual enterprise intelligence and correlation across structured/unstructured data.
- Deploy, manage, and scale LLM/SLM models in enterprise on-prem or hybrid environments using vLLM, Triton, and Kubernetes.
- Perform model training, fine-tuning, and evaluation of Transformer-based models for enterprise & cybersecurity use cases.
- Develop and maintain vector database integrations (FAISS, Milvus, Pinecone, etc.) for embedding-based search and retrieval.
- Build end-to-end agentic orchestration workflows using Temporal or similar orchestration frameworks.
- Integrate GenAI systems with cybersecurity workflows, e.g., threat detection, incident triage, vulnerability scoring, and risk analytics.
- Work with CodeAct/ReAct patterns and MCP-based connectors for contextual agent reasoning.
- Collaborate with MLOps/DevOps teams to build CI/CD workflows for ML using Docker, Helm, ArgoCD, GitHub Actions.
- Implement secure AI pipelines with access control, auditing, and policy enforcement.
- Monitor and optimize production models, including quantization (INT4/8), latency tuning, and hardware utilization.
- Contribute to infra automation using Terraform & Helm charts for reproducible deployments.
- Work closely with product, security, and data engineering teams to convert research prototypes into reliable production-grade AI systems.
- Understand & coordinate data ingestion and ETL workflows; work collaboratively with data engineering teams.
- Build reusable accelerators, POCs, research pipelines, and scale them to production.
Required Skills & Experience:
- 8+ years of overall experience, with a minimum of 2.5 years of hands-on experience building and deploying GenAI solutions in production.
- Strong programming experience in Python, PyTorch, and Transformer architectures; hands-on with HuggingFace libraries.
- Proven experience building multi-agent AI applications using CrewAI, LangChain, LlamaIndex, or Autogen.
- Proven experience in deploying LLMs/SLMs in on-prem / hybrid setups using vLLM, Triton.
- Strong background in RAG pipelines, embeddings, vector DBs (FAISS/Milvus/Pinecone), and context retrieval.
- Experience performing fine-tuning / supervised training on LLMs/SLMs with domain data.
- Familiarity with MCP connectors and ReAct/CodeAct frameworks for agent reasoning.
- Hands-on MLOps/DevOps: Docker, Helm, GitHub Actions, ArgoCD, Terraform.
- Experience implementing CI/CD for ML including validation, deployment & rollback.
- Understanding of ETL concepts & data modeling; ability to collaborate with data engineering teams.
- Experience contributing to production-ready agentic AI workflows using orchestration frameworks (Temporal preferred).
- Exposure to cybersecurity AI workflows (optional but preferred).
- Experience in quantization (INT4/INT8) & performance optimization.
