Muhammad's Favorite Open-Source LLMs

Muhammad Sohail

Data Science Intern | ML Developer | LLM Enthusiast

About Me

I'm Muhammad Sohail, a Data Scientist and ML Developer with a keen interest in building intelligent solutions. My expertise spans Python, machine learning, and Large Language Models (LLMs). I enjoy data preprocessing, model training, and building chatbots.

I am currently pursuing my MSc M2 in Data Science and Network Intelligence at Institut Polytechnique de Paris, France. My coursework includes Data Science, Data Visualization, Deep Learning, Blockchain, Network Intelligence, Internet of Things (IoT), Distributed Networking, and Business Modelling.

I also hold a Bachelor of Computer Science from the University of Malakand, Pakistan (Aug 2019 - Sep 2023, GPA 3.75), where my coursework covered Programming Languages, Web Development, Database Management, Operating Systems, Networking, Software Engineering, AI, Mathematics, and Statistics.

Languages: English (Fluent), Pashto (Native/Bilingual), Urdu (Native/Bilingual), French (Basic).

Fun Fact: I love turning complex systems into simple, elegant solutions that deliver real-world impact.

Education Timeline

MSc M2 in Data Science and Network Intelligence

Institut Polytechnique de Paris, France | Sep 2024 - Current

GPA: 15.81

Coursework: Data Science, Data Visualization, Deep Learning, Blockchain, Network Intelligence, Internet of Things (IoT), Distributed Networking, Business Modelling.

Bachelor of Computer Science

University of Malakand, Pakistan | Aug 2019 - Sep 2023

GPA: 3.75

Coursework: Programming Languages, Web Development, Database Management, Operating Systems, Networking, Software Engineering, AI, Mathematics, and Statistics.

Experience Timeline

Data Science Intern

Université Paris-Saclay | Mar 2025 - Present

Developing an intelligent HR chatbot using LLMs and Retrieval-Augmented Generation (RAG) to automate staff inquiries on recruitment, leave, and training.

Implementing robust validation mechanisms to ensure accurate and policy-compliant responses.

Designing a user-friendly interface using Streamlit and Gradio, deployed within the university's internal data center.

Leveraging tools like Unstructured.io, Docling, LlamaParse, pdfplumber, and LangChain for document parsing and knowledge extraction.

Using open-source models such as LLaMA, Mistral, and others for generating contextual answers via API integration.

Data Analyst Intern (Remote)

Code Clause Pvt Ltd. | Oct 2023 - Apr 2024

Led data preparation tasks, including handling missing values and standardizing data for accurate analysis.

Applied advanced statistical techniques (e.g., linear regression, hypothesis testing) to uncover insights and support decision-making.

Delivered actionable insights via dashboards and reports, enhancing stakeholder understanding and driving strategic decisions.

Projects Gallery

HR Chatbot

Intelligent HR chatbot for automating staff inquiries using LLMs and RAG.

LangChain, LLaMA, Gradio, RAG
GitHub

MediBot

AI-powered medical chatbot for retrieving trusted medical information.

Python, LangChain, FAISS, Mistral-7B
GitHub

Emotion Detection

System to detect perceived emotions in text using deep learning.

TensorFlow, Keras, PyTorch, NLTK, spaCy
GitHub

Parkinson Predictor

Machine learning system for Parkinson's disease progression prediction.

Python, Scikit-learn, XGBoost, Streamlit
GitHub

Basic Image Classification with TensorFlow

A simple image classifier using TensorFlow and CNNs.

TensorFlow, Keras, CNN, Python
GitHub

Dynamic Cooling System in Data Centers Using IoT & LSTM

IoT-powered dynamic cooling system for data centers.

IoT, LSTM, Predictive Analytics, Energy Efficiency
Drive Link

Licenses & Certifications

Skills & Tools

Python, C++, Java, SQL, TensorFlow, Keras, PyTorch, Scikit-learn, XGBoost, LightGBM, LangChain, FAISS, Hugging Face, LLMs (General), RAG, Chatbot Development, ChromaDB, Unstructured.io, Docling, LlamaParse, pdfplumber, Supervised Learning, Unsupervised Learning, Data Preprocessing, Model Training, Neural Networks, Deep Learning, Data Visualization, A/B Testing, Feature Engineering, Convolutional Neural Networks (CNN), Model Tuning, Natural Language Processing (NLP), IoT Sensor Networks, LSTM Neural Networks, Predictive Analytics, System Architecture, Energy Efficiency, Streamlit, Gradio, Power BI, Tableau, Excel, Pandas, NumPy, Matplotlib, Seaborn, Git, GitHub, VS Code

Let’s build smart things together.