Hi, I'm Farhan Shaikh

AI/ML | Data Science

Passionate about using AI and machine learning to solve complex problems. I build intelligent, scalable systems by analyzing data and applying deep learning, computer vision, NLP, and predictive analytics.

Education

Bachelor of Technology in Information Technology

NMIMS University

August 2020 - May 2024

CGPA: 3.42/4.0

Certifications & Badges

Google Advanced Data Analytics

Coursera - Google Certification

Jan 2026

HackerRank 5-Star SQL

HackerRank

Nov 2025

McKinsey Forward Program

McKinsey & Company

Dec 2025

Featured Projects

Parameter-Efficient Instruction Tuning of an Open-Source LLM

Designed and implemented a parameter-efficient fine-tuning (PEFT) pipeline for the open-source LLaMA-3.2-1B Instruct model using an Alpaca-format instruction dataset, enabling effective instruction-following adaptation while minimizing GPU memory usage and training cost.

Python Unsloth LLM LoRA

Kidney Tumor Classification System

Built an end-to-end CNN-based classification system for automated kidney tumor detection from medical images, intended to support accurate and timely diagnosis. Achieved 88.26% validation accuracy under hardware constraints.

Python TensorFlow OpenCV Docker Git AWS

Credit Defaulter Identification

Developed a machine-learning credit default prediction system to help financial institutions assess borrower risk. After data cleaning, preprocessing, exploratory data analysis, and hyperparameter tuning, the XGBoost classifier performed best, reaching 93.84% accuracy and an AUC of 0.86.

Python Pandas XGBoost Scikit-learn Plotly

AI Regional Translator

Developed an end-to-end multilingual document translation system using the llama-4-maverick-17b-instruct model via the Groq API and a Streamlit UI, enabling bidirectional translation between Indian regional languages for both PDF and DOCX files.

Python NLP LLM Streamlit

TinyVGG Architecture

Recreated the TinyVGG deep learning architecture using PyTorch and torchvision to perform image classification on the FashionMNIST dataset. Designed and implemented a custom Convolutional Neural Network (CNN) consisting of Conv2d, ReLU, and MaxPool2d layers to extract hierarchical visual features.

Python PyTorch torchvision Matplotlib

New York City Taxi Fare Prediction

Performed extensive data cleaning and preprocessing, including handling missing values, applying Min-Max scaling, and one-hot encoding categorical variables. Engineered time-based and geospatial features, using the Haversine formula for distance. After hyperparameter tuning, the XGBoost Regressor outperformed the other models, achieving an RMSE of 3.25 and placing in the top 30% of the Kaggle leaderboard.

Python XGBoost Scikit-learn Pandas Matplotlib Seaborn
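
As an illustration of the geospatial feature mentioned above, here is a minimal sketch of a Haversine distance function. It is not the project's actual code: the function name `haversine_km` and the Earth-radius constant are assumptions chosen for this example.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine term; clamped to 1.0 to guard against floating-point overshoot.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Applied to each trip's pickup and dropoff coordinates, a function like this would yield a trip-distance feature for a fare model.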

Work Experience

AI/ML Intern

NN & Sons

September 2024 - January 2025

  • Developed an end-to-end multilingual document translation system using the llama-4-maverick-17b-instruct model via the Groq API and a Streamlit UI, enabling bidirectional translation between Indian regional languages for both PDF and DOCX files.
  • Implemented a modular NLP pipeline that extracts text from documents, segments the content into chunks sized to fit the model's token limit, translates each segment using an LLM, and reconstructs the translated output while preserving the original document structure, saving over 500 hours of manual work.
  • Built a RAG chatbot with LangChain and ChromaDB to answer employee queries about printing machine errors, troubleshooting steps, and solutions, using product manuals and technical documentation as the knowledge source. The chatbot retrieves relevant context from a vector database and generates responses with a large language model. Implemented document loading, text chunking, embedding creation, similarity search, and prompt-based response generation to improve factual accuracy and reduce hallucinations.
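
The chunk, translate, and reconstruct flow described in the bullets above can be sketched roughly as follows. This is an illustrative outline, not the production pipeline: whitespace-separated words stand in for model tokens, and `chunk_text`, `translate_document`, and the `translate_fn` callable are hypothetical names introduced for this example (in practice `translate_fn` would wrap the LLM call).

```python
def chunk_text(text, max_tokens=200):
    """Group paragraphs into chunks of at most max_tokens words.

    Words are used as a rough proxy for model tokens (an assumption).
    """
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # A paragraph larger than the budget is split at word boundaries.
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            if current and count + len(piece) > max_tokens:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(" ".join(piece))
            count += len(piece)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def translate_document(text, translate_fn, max_tokens=200):
    """Translate each chunk with translate_fn and rejoin the results in order."""
    return "\n\n".join(translate_fn(chunk) for chunk in chunk_text(text, max_tokens))
```

In this sketch, paragraph breaks are kept as chunk boundaries where possible. A paragraph longer than the budget is split, which adds a paragraph break on reconstruction, so a real pipeline would need extra bookkeeping to preserve the original layout.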

Senior Analyst

Ernst & Young

July 2024 - September 2025

  • Executed Technology Risk and control testing engagements across global clients, focusing on ITGC, IT application controls, and IPE assessment, including work for one of the world’s largest telecommunications organizations.
  • Performed independent analysis and validation of client data, ensuring data integrity and audit readiness prior to control testing and reporting.
  • Conducted in-depth control reviews within SAP GRC, SAP S/4HANA, and custom-built financial systems, identifying control gaps and recommending remediation aligned with regulatory and compliance frameworks.
  • Delivered clear, actionable insights through client presentations and formal documentation, translating technical findings for both technical and business stakeholders.

Technology Consultancy Intern

Ernst & Young

Jan 2024 - July 2024

  • Gained hands-on exposure to global risk methodologies, risk frameworks, and enterprise systems, building a strong foundation in technology risk and compliance.
  • Supported ongoing Technology Risk assessments by assisting engagement teams in control walkthroughs and documentation for telecom and financial services clients.
  • Contributed to ITGC, ITAC, and IPE evaluations by gathering evidence, preparing documentation, and validating control design and implementation.

Skills & Technologies

Machine Learning & AI

TensorFlow PyTorch Scikit-learn Keras XGBoost LightGBM Deep Learning NLP Computer Vision Reinforcement Learning

Frameworks & Tools

Git Docker Kubernetes Flask FastAPI Pandas NumPy Matplotlib Seaborn Jupyter Google-Antigravity

Data & Cloud Platforms

PostgreSQL MongoDB AWS Google Cloud Azure

Programming Languages

Python SQL

Get In Touch

Let's Connect

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.

Location

Dadra & Nagar Haveli, India