Python - Machine Learning, Data Science, AI | Learn Python In Depth


Python Basics

Why Python?

Python is one of the most widely used languages for AI/ML and data science, thanks to its simple syntax, rich library ecosystem, and large community.

💡 Python is used in:
• Machine Learning & AI (TensorFlow, PyTorch)
• Data Science (Pandas, NumPy, Matplotlib)
• Web Development (Django, FastAPI, Flask)
• Automation & Scripting
• DevOps & Cloud (AWS Lambda, Azure Functions)

Installing Python

# macOS với Homebrew
brew install python@3.12

# Ubuntu/Debian
sudo apt update
sudo apt install python3 python3-pip

# Virtual environment (recommended)
python3 -m venv myenv
source myenv/bin/activate  # Linux/macOS
myenv\Scripts\activate     # Windows

# Check the version
python3 --version
# Example output: Python 3.12.0
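To confirm from inside Python that the virtual environment is active, a common check compares `sys.prefix` with `sys.base_prefix`: inside an environment created with `python3 -m venv` the two differ. A minimal sketch (the function name `in_virtualenv` is just an illustrative choice):

```python
import sys

def in_virtualenv() -> bool:
    # In a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("Inside a virtual environment:", in_virtualenv())
```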

Basic Syntax

# Variables and types
name = "Python"           # str
version = 3.12            # float
year = 2024               # int
is_awesome = True         # bool
languages = ["Go", "JS"]  # list

# Functions
def greet(name: str) -> str:
    """Greet the user."""
    return f"Hello, {name}! 🐍"

print(greet("Developer"))

# Classes
class Developer:
    def __init__(self, name: str, skills: list):
        self.name = name
        self.skills = skills
    
    def introduce(self) -> str:
        skills_str = ", ".join(self.skills)
        return f"I'm {self.name}, I know {skills_str}"

dev = Developer("An", ["Python", "ML", "DevOps"])
print(dev.introduce())
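For classes that mainly hold data, like `Developer` above, the standard-library `dataclasses` module can generate `__init__`, `__repr__`, and `__eq__` automatically. A sketch of an equivalent class, assuming Python 3.9+ for the `list[str]` annotation:

```python
from dataclasses import dataclass

@dataclass
class Developer:
    # Fields become __init__ parameters in declaration order
    name: str
    skills: list[str]

    def introduce(self) -> str:
        skills_str = ", ".join(self.skills)
        return f"I'm {self.name}, I know {skills_str}"

dev = Developer("An", ["Python", "ML", "DevOps"])
print(dev.introduce())  # I'm An, I know Python, ML, DevOps
print(dev)              # Developer(name='An', skills=['Python', 'ML', 'DevOps'])
```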


Machine Learning

📘 Popular ML frameworks

• TensorFlow: Google's framework, production-ready
• PyTorch: Meta's framework, research-friendly
• Scikit-learn: classical ML, easy to learn

Scikit-learn - Classification

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load the data
iris = load_iris()
X, y = iris.data, iris.target

# Split into train/test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model (random_state makes the result reproducible)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2%}")  # typically above 90%; the exact value depends on the split
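A single 80/20 split of the 150 Iris samples leaves only 30 test samples, so the printed accuracy can shift by several points from one split to another. K-fold cross-validation gives a steadier estimate by averaging over several splits. A sketch using scikit-learn's `cross_val_score` (5 folds is an arbitrary choice here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# random_state fixes the forest's randomness so results are repeatable
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validation: each sample is used for testing exactly once
scores = cross_val_score(model, X, y, cv=5)
print(f"Accuracy per fold: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.2%} (+/- {scores.std():.2%})")
```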

TensorFlow/Keras - Deep Learning

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build a neural network model
model = keras.Sequential([
    keras.Input(shape=(784,)),  # 28x28 images flattened to 784 values
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Train
model.fit(x_train, y_train, epochs=5, validation_split=0.2)

# Evaluate
loss, acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {acc:.2%}")
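The final `Dense(10, activation='softmax')` layer turns 10 raw scores into probabilities, and `sparse_categorical_crossentropy` compares them against integer labels (0-9 for MNIST) rather than one-hot vectors. A NumPy sketch of what those two pieces compute, ignoring the extra numerical safeguards Keras applies internally:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the row max avoids overflow in exp and does not change the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels: np.ndarray, probs: np.ndarray) -> float:
    # labels are integer class indices; pick each sample's probability for its true class
    picked = probs[np.arange(len(labels)), labels]
    # Average negative log-likelihood over the batch
    return float(-np.mean(np.log(picked)))

logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])
probs = softmax(logits)
print(probs.sum(axis=1))  # each row sums to 1
print(sparse_categorical_crossentropy(labels, probs))
```

With 10 equally likely classes, the loss is log(10) ≈ 2.30, which is roughly where an untrained MNIST model starts.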

ℹ️ MLOps trend 2024: use MLflow or Weights & Biases to track experiments, version models, and deploy to production.


Data Science

Pandas - Data processing

import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({
    'name': ['An', 'Bình', 'Chi', 'Dũng'],
    'age': [25, 30, 28, 35],
    'salary': [15000000, 25000000, 20000000, 35000000],
    'department': ['Dev', 'DevOps', 'Dev', 'Manager']
})

# Basic statistics
print(df.describe())

# Filter rows
developers = df[df['department'] == 'Dev']
high_salary = df[df['salary'] > 20000000]

# Group by and aggregate
dept_stats = df.groupby('department').agg({
    'salary': ['mean', 'max'],
    'age': 'mean'
})
print(dept_stats)

# Export to CSV
df.to_csv('employees.csv', index=False)
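Passing a dict of lists to `agg`, as above, produces two-level (MultiIndex) column names such as `('salary', 'mean')`. pandas also supports named aggregation (pandas 0.25+), which yields flat, self-describing column names. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['An', 'Bình', 'Chi', 'Dũng'],
    'age': [25, 30, 28, 35],
    'salary': [15000000, 25000000, 20000000, 35000000],
    'department': ['Dev', 'DevOps', 'Dev', 'Manager'],
})

# Named aggregation: output_column=(input_column, aggregation_function)
dept_stats = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    max_salary=('salary', 'max'),
    avg_age=('age', 'mean'),
)
print(dept_stats)
print(dept_stats.loc['Dev', 'avg_salary'])  # 17500000.0
```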

NumPy - Numerical computing

import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4], [5, 6]])

# Vectorized operations (faster than Python loops)
squared = arr ** 2           # [1, 4, 9, 16, 25]
normalized = arr / arr.max() # [0.2, 0.4, 0.6, 0.8, 1.0]

# Matrix operations
A = np.random.rand(3, 3)
B = np.random.rand(3, 3)
C = A @ B  # Matrix multiplication

# Statistical functions
mean = np.mean(arr)
std = np.std(arr)
print(f"Mean: {mean}, Std: {std}")
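Vectorized operations also work between arrays of different shapes through broadcasting. For example, standardizing each column of a 3x2 matrix like `matrix` above (subtract the column mean, divide by the column standard deviation) needs no loop, because the shape-(2,) mean and std are stretched across all rows:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# axis=0 reduces over rows, giving one value per column: shape (2,)
col_mean = matrix.mean(axis=0)
col_std = matrix.std(axis=0)

# (3, 2) - (2,) broadcasts the per-column values to every row
standardized = (matrix - col_mean) / col_std

print(col_mean)                   # [3. 4.]
print(standardized.mean(axis=0))  # ~[0. 0.]
print(standardized.std(axis=0))   # ~[1. 1.]
```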


Web Development

FastAPI - Modern API

FastAPI is a modern, high-performance Python web framework (its documentation calls it one of the fastest Python frameworks available). It generates OpenAPI docs automatically.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional

app = FastAPI(title="My API", version="1.0")

# Pydantic model for request-body validation
class User(BaseModel):
    name: str
    email: str
    age: Optional[int] = None

# In-memory database
users_db = {}

@app.get("/")
async def root():
    return {"message": "Hello from FastAPI! 🐍"}

@app.post("/users/")
async def create_user(user: User):
    user_id = len(users_db) + 1
    # Pydantic v2: model_dump() replaces the deprecated .dict()
    users_db[user_id] = user.model_dump()
    return {"id": user_id, **user.model_dump()}

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    if user_id not in users_db:
        raise HTTPException(status_code=404, detail="User not found")
    return users_db[user_id]

# Run (assuming this file is saved as main.py): uvicorn main:app --reload
# Docs: http://localhost:8000/docs
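The `User` model is what lets FastAPI validate request bodies: when a field cannot be converted to its declared type, Pydantic raises `ValidationError`, and FastAPI answers with HTTP 422 instead of calling the endpoint. A sketch of that behavior with Pydantic alone (the sample names and emails are placeholders):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    email: str
    age: Optional[int] = None

# A numeric string is coerced to int in Pydantic's default (lax) mode
user = User(name="An", email="an@example.com", age="30")
print(user.age)  # 30

# A non-numeric age fails validation; FastAPI would turn this into a 422 response
try:
    User(name="Bình", email="binh@example.com", age="thirty")
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # ('age',)
```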

🔗 Additional resources

• Python Official Docs
• TensorFlow Documentation
• AI & Technology Blog - Không Gian AI
