Data Science and Machine Learning with Python 2026
Summary
- 1. What is Data Science?
- 2. Setting Up Your Data Science Environment
- 3. NumPy: Fast Numerical Computing
- 4. Pandas: The Data Science Workhorse
- 5. Data Visualization
- 6. Machine Learning with Scikit-learn
- 7. ML Algorithms Comparison
- 8. Data Science Projects to Build
- Data Science Learning Path
Data Science in 2026: Data science remains one of the most in-demand careers in tech. In 2026, data scientists in the USA earn more than $125,000 on average, and the World Economic Forum ranks data science among the top 10 jobs of the future. Python is the dominant language across the data science ecosystem.
- $125K: Average data scientist salary
- 36%: Job growth (2021–2031)
- 2.5 quintillion bytes: Data created daily
- Python: #1 language for data science
1. What is Data Science?
Data science extracts knowledge and insights from structured and unstructured data using scientific methods, algorithms, and visualization. A data scientist is part statistician, part programmer, and part domain expert: someone who can turn raw data into actionable decisions.
The data science workflow: Data Collection → Cleaning → Exploration → Modeling → Evaluation → Deployment → Monitoring. Python tools handle every step.
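
As a rough illustration of how the first five workflow stages map onto Python tools, here is a minimal sketch. It uses scikit-learn's bundled Iris dataset as a stand-in for collected data and a logistic regression as the model. Both are assumptions made for illustration, not choices the article prescribes. It stops before deployment and monitoring.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Collection: load a small bundled dataset as a DataFrame
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column

# 2. Cleaning: drop duplicate rows and rows with missing values
df = df.drop_duplicates().dropna()

# 3. Exploration: average feature values per class
print(df.groupby("target").mean())

# 4. Modeling: hold out a test set, then fit a simple classifier
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 5. Evaluation: accuracy on data the model has not seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```

In a real project each stage is usually much larger than one line, but the order of operations stays the same.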
2. Setting Up Your Data Science Environment
# Install the data science stack
pip install numpy pandas matplotlib seaborn scikit-learn jupyter
# Start Jupyter Lab (modern interactive notebook)
pip install jupyterlab
jupyter lab
# Or use Google Colab: free hosted Jupyter notebooks in the browser, with GPU access
# colab.research.google.com (no installation needed)
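
Once the packages are installed, a quick check from Python can confirm what is actually available in the active environment. The sketch below uses the standard library's `importlib.metadata`. The package list mirrors the pip commands above, and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used on PyPI (the same names passed to pip install)
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scikit-learn", "jupyterlab"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:<14} {ver if ver else 'NOT INSTALLED'}")
```

Run it with the same interpreter that Jupyter uses. A package reported as missing usually means pip installed into a different environment.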
3. NumPy: Fast Numerical Computing
import numpy as np
# Creating arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.zeros((3, 4)) # 3x4 matrix of zeros
random_data = np.random.randn(1000) # 1000 normally distributed values
# Vectorized operations (no loops needed!)
print(arr * 2) # [ 2  4  6  8 10]
print(arr ** 2) # [ 1  4  9 16 25]
print(np.sqrt(arr)) # Square roots
# Statistics
data = np.random.randn(10000)
print(f"Mean: {data.mean():.4f}")
print(f"Std: {data.std():.4f}")
print(f"Min: {data.min():.4f}")
print(f"Max: {data.max():.4f}")
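
Vectorized operations extend naturally to two-dimensional data through broadcasting and boolean masks. The following sketch standardizes each column of a small made-up array and filters its rows. The values are invented for illustration.

```python
import numpy as np

# Rows are samples, columns are features (values invented for illustration)
data = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
])

# Column-wise statistics: axis=0 collapses the rows
col_mean = data.mean(axis=0)  # shape (2,)
col_std = data.std(axis=0)    # shape (2,)

# Broadcasting: the (2,) arrays are stretched across all 4 rows
standardized = (data - col_mean) / col_std

# Boolean mask: keep only rows whose first feature is above its column mean
above_mean = data[data[:, 0] > col_mean[0]]

print(standardized.mean(axis=0))  # approximately 0 for each column
print(standardized.std(axis=0))   # approximately 1 for each column
print(above_mean)
```

No Python loop touches individual elements here. The subtraction, the division, and the comparison each run over the whole array at once.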
4. Pandas: The Data Science Workhorse
import pandas as pd
# Load data from many sources
df = pd.read_csv("data.csv")
df_excel = pd.read_excel("report.xlsx") # .xlsx files need the openpyxl package
df_json = pd.read_json("data.json")
# Explore the data
print(df.head(10)) # First 10 rows
df.info() # Data types and non-null counts (prints directly; returns None)
print(df.describe()) # Statistical summary
print(df.shape) # (rows, columns)
print(df.isnull().sum()) # Missing values per column
# Data selection and filtering
df["revenue"] # Single column
df[["name", "revenue", "date"]] # Multiple columns
df[df["revenue"] > 10000] # Filter rows
df[(df["category"] == "Tech") & (df["revenue"] > 5000)] # Multiple conditions
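# Data cleaning (each call returns a new DataFrame; assign the result to keep it)
df_no_nulls = df.dropna() # Remove rows with any nulls
df_filled = df.fillna({"revenue": 0}) # Fill nulls in specific columns
df_unique = df.drop_duplicates() # Remove duplicate rows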
# Grouping and aggregation
df.groupby("category")["revenue"].agg(["sum", "mean", "count"])
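
The snippets above depend on a `data.csv` file that is not included here. To try the same cleaning, filtering, and grouping steps without any files, the sketch below builds a tiny DataFrame inline. The rows are made up for illustration.

```python
import pandas as pd

# Small made-up dataset with one missing value and one duplicate row
df = pd.DataFrame({
    "name": ["A", "B", "B", "C", "D"],
    "category": ["Tech", "Tech", "Tech", "Retail", "Retail"],
    "revenue": [12000.0, 8000.0, 8000.0, None, 3000.0],
})

# Clean: remove the exact duplicate, then treat missing revenue as 0
clean = df.drop_duplicates().fillna({"revenue": 0})

# Filter: Tech rows with revenue above 5000
tech_big = clean[(clean["category"] == "Tech") & (clean["revenue"] > 5000)]

# Aggregate: total, average, and row count of revenue per category
summary = clean.groupby("category")["revenue"].agg(["sum", "mean", "count"])

print(tech_big)
print(summary)
```

Filling missing revenue with 0 lowers the Retail average. Whether that is appropriate depends on what a missing value means in the data, so it is a choice to make deliberately.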
5. Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Line chart: trends over time (assumes df["date"] holds parsed dates, e.g. via pd.to_datetime)
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(df["date"], df["revenue"], marker="o", linewidth=2, color="#0ea5e9")
ax.set_title("Monthly Revenue 2026", fontsize=16, fontweight="bold")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($)")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Seaborn: statistical plots with minimal code
# Each call draws on the current axes, so show the figure before starting the next plot
sns.scatterplot(data=df, x="marketing_spend", y="revenue", hue="category")
plt.show()
sns.histplot(df["revenue"], bins=30, kde=True)
plt.show()
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()
sns.boxplot(data=df, x="category", y="revenue")
plt.show()
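
For a version that runs without an existing DataFrame or a display, the sketch below generates synthetic monthly data and writes a two-panel figure to disk. The data, the column relationship, and the output filename `revenue_overview.png` are all assumptions for illustration. It uses Matplotlib alone to keep dependencies minimal.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly data (generated for illustration, not real revenue)
rng = np.random.default_rng(seed=42)
df = pd.DataFrame({
    "date": pd.date_range("2026-01-01", periods=12, freq="MS"),
    "marketing_spend": rng.uniform(1000, 5000, size=12),
})
df["revenue"] = 3 * df["marketing_spend"] + rng.normal(0, 1000, size=12)

# Two panels side by side: a trend line and a scatter plot
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

ax_line.plot(df["date"], df["revenue"], marker="o")
ax_line.set_title("Revenue over time")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Revenue ($)")
ax_line.tick_params(axis="x", rotation=45)

ax_scatter.scatter(df["marketing_spend"], df["revenue"])
ax_scatter.set_title("Revenue vs. marketing spend")
ax_scatter.set_xlabel("Marketing spend ($)")
ax_scatter.set_ylabel("Revenue ($)")

fig.tight_layout()
output_path = Path("revenue_overview.png")
fig.savefig(output_path, dpi=100)
plt.close(fig)
```

In a notebook the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used in place of `savefig`.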
6. Machine Learning with Scikit-learn
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
# Prepare features and target
X = df[["age", "income", "credit_score", "employment_years"]] # Features
y = df["loan_approved"] # Target (0 or 1)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
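# Scale features (important for distance- and gradient-based algorithms;
# tree ensembles such as random forests do not require it, but it does no harm here)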
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test) # Use same scaler!
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
# Evaluate
predictions = model.predict(X_test_scaled)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2%}")
print(classification_report(y_test, predictions))
# Feature importance
for feat, imp in zip(X.columns, model.feature_importances_):
    print(f"  {feat}: {imp:.3f}")
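
A single train/test split gives one accuracy number, which can vary noticeably with the split. Cross-validation with `cross_val_score` (imported above) gives a more stable estimate. The sketch below wraps scaling and the model in a pipeline so the scaler is refit inside each fold. It substitutes synthetic data from `make_classification` for the loan dataset and a logistic regression for the random forest, since scaling matters more for linear models. Both substitutions are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the loan dataset
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=42
)

# The pipeline refits the scaler on each training fold only,
# so statistics from the validation fold never leak into training
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; for classifiers an integer cv is stratified by default
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread across folds is as informative as the mean: a large standard deviation suggests the score depends heavily on which rows land in the validation fold.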
7. ML Algorithms Comparison
| Algorithm | Type | Best Use Case | Complexity |
|---|---|---|---|
| Linear Regression | Supervised | Predicting continuous values | Low |
| Logistic Regression | Supervised | Binary classification | Low |
| Random Forest | Supervised | High-accuracy classification/regression | Medium |
| XGBoost | Supervised | Tabular data (popular in ML competitions) | Medium |
| K-Means | Unsupervised | Customer segmentation | Low |
| Neural Networks | Supervised | Images, text, complex patterns | High |
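
K-Means is the only unsupervised entry in the table, so its workflow differs from the supervised example in section 6: there is no target column and no train/test accuracy. A minimal sketch, assuming synthetic "customer" data generated with `make_blobs` in three well-separated groups:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic "customers": two features, three well-separated groups
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# K-Means is distance-based, so put features on a comparable scale first
X_scaled = StandardScaler().fit_transform(X)

# n_clusters must be chosen up front; 3 matches how the data was generated
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Segment sizes and centers (centers are in scaled units)
print(np.bincount(labels))
print(kmeans.cluster_centers_.round(2))
```

Real customer data rarely separates this cleanly, and the right number of clusters is usually unknown. Trying several values of `n_clusters` and comparing results is a common starting point.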
8. Data Science Projects to Build
- House Price Prediction: The classic first ML project. Use the Ames Housing dataset or a housing dataset from Kaggle. Compare linear regression with a random forest, and practice feature engineering and cross-validation.
- Customer Churn Prediction: Predict which customers will cancel. Covers binary classification, handling imbalanced classes, and interpreting the business impact of predictions.
- Sentiment Analysis: Classify product reviews using NLP. Use Scikit-learn for traditional ML or Hugging Face Transformers for state-of-the-art results.
- Stock Data Exploration: Fetch prices with yfinance, calculate technical indicators, visualize trends, and build a simple prediction model. Demonstrates time series analysis skills.
- Recommendation System: Use collaborative filtering for movie or product recommendations, the family of techniques behind recommendations at Netflix and Amazon. An impressive portfolio piece.
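
To give a feel for the scale of a first version, here is a minimal sketch of the sentiment analysis project using the traditional Scikit-learn route. The eight reviews and their labels are invented for illustration. A real project would train on thousands of labeled reviews and evaluate on held-out data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hand-written dataset, invented for illustration (1 = positive, 0 = negative)
reviews = [
    "great product, works perfectly",
    "excellent quality and fast shipping",
    "love it, highly recommend",
    "very happy with this purchase",
    "terrible, broke after one day",
    "poor quality, waste of money",
    "awful experience, do not buy",
    "very disappointed with this purchase",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each review into a weighted word-frequency vector;
# logistic regression then learns one weight per word
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)

new_reviews = ["excellent product, love it", "terrible quality, do not buy"]
predictions = model.predict(new_reviews)
probabilities = model.predict_proba(new_reviews)

for text, label, proba in zip(new_reviews, predictions, probabilities):
    print(f"{text!r} -> label {label}, P(positive) = {proba[1]:.2f}")
```

With so little training data the probabilities stay close to 0.5, which is a useful reminder that model confidence depends on how much evidence it has seen.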
Data Science Learning Path
- Python fundamentals first: if you don't already know Python, spend 4-6 weeks here
- NumPy and Pandas mastery: 3-4 weeks of daily practice with real datasets
- Matplotlib and Seaborn: 1-2 weeks to create professional visualizations
- Statistics fundamentals: mean, median, standard deviation, probability, distributions
- Scikit-learn ML algorithms: start with linear models, progress to ensemble methods
- Build 5 complete projects, including one with real data from Kaggle
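
The statistics step in the path above can be practiced with nothing but the standard library. The sketch below computes the mean, median, and both flavors of standard deviation on a small sample chosen so the results are easy to check by hand.

```python
import statistics

# Small sample chosen so the results are easy to verify by hand
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)              # sum / count = 40 / 8
median = statistics.median(values)          # average of the two middle sorted values
population_std = statistics.pstdev(values)  # divides by n
sample_std = statistics.stdev(values)       # divides by n - 1

print(f"Mean: {mean}")                      # 5
print(f"Median: {median}")                  # 4.5
print(f"Population std: {population_std}")  # 2.0
print(f"Sample std: {sample_std:.3f}")      # 2.138
```

The population versus sample distinction matters when switching libraries: NumPy's `std()` divides by n by default, while Pandas' `std()` divides by n - 1, so the same data can give slightly different numbers.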
Start Your Data Science Journey!
Begin with our Python guide, then come back for Pandas and ML.