<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Path To Machine Learning]]></title><description><![CDATA[Path To Machine Learning]]></description><link>https://path2ml.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 02:48:45 GMT</lastBuildDate><atom:link href="https://path2ml.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building an Agentic CI/CD Pipeline with Amazon Bedrock, GitLab CI, and AWS CDK]]></title><description><![CDATA[Overview
What if your CI/CD pipeline could think? Not just run lint and tests, but actually understand your code changes, generate tests for new functions, and write a risk assessment with inline code]]></description><link>https://path2ml.com/building-an-agentic-ci-cd-pipeline-with-amazon-bedrock-gitlab-ci-and-aws-cdk</link><guid isPermaLink="true">https://path2ml.com/building-an-agentic-ci-cd-pipeline-with-amazon-bedrock-gitlab-ci-and-aws-cdk</guid><category><![CDATA[ci-cd]]></category><category><![CDATA[llm]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sat, 21 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/64e1930f-8374-4be8-ada8-df09a4850f75.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Overview</h2>
<p>What if your CI/CD pipeline could think? Not just run lint and tests, but actually <em>understand</em> your code changes, generate tests for new functions, and write a risk assessment with inline code review comments -- all automatically on every merge request?</p>
<p>This blog walks through building exactly that: an <strong>agentic CI/CD pipeline</strong> that integrates Amazon Bedrock's foundation models directly into GitLab CI, turning a traditional pipeline into an intelligent code review and quality assurance system.</p>
<p>The result is a 6-stage pipeline where two stages are AI-powered -- the agent generates tests for changed code and performs a holistic merge request review with risk scoring, all deployed as infrastructure-as-code with AWS CDK.</p>
<h2>How AI Can Transform CI/CD Productivity</h2>
<p>Traditional CI/CD pipelines are deterministic: they run the same checks the same way every time. Lint passes or fails. Tests pass or fail. There's no interpretation, no context, no judgement.</p>
<p>AI agents change this fundamentally:</p>
<ul>
<li><p><strong>Automated test generation</strong> -- When a developer adds a new function, the agent can inspect the code and generate meaningful test cases, not just import checks but actual logic tests with edge cases.</p>
</li>
<li><p><strong>Holistic code review</strong> -- Instead of isolated tool outputs, an AI agent can correlate lint violations, test failures, security findings, and the actual diff to produce a unified risk assessment.</p>
</li>
<li><p><strong>Contextual inline comments</strong> -- The agent can point to specific lines of code with architectural suggestions, anti-pattern warnings, or security concerns that static tools miss entirely.</p>
</li>
<li><p><strong>Risk-based merge gating</strong> -- Rather than binary pass/fail, the pipeline produces a risk score (0-100) with weighted drivers, giving reviewers actionable context to make merge decisions.</p>
</li>
</ul>
<p>The key insight is that AI agents don't replace existing CI tools -- they <em>augment</em> them. Ruff still lints, pytest still tests, bandit still scans. The agent sits on top, consuming all their outputs plus the raw diff to produce something none of them could alone.</p>
<h2>What This Solution Solves</h2>
<p>In a typical development workflow, code review is a bottleneck. Reviewers have to:</p>
<ol>
<li><p>Manually read every diff line</p>
</li>
<li><p>Cross-reference lint, test, and security outputs</p>
</li>
<li><p>Assess overall risk based on what changed</p>
</li>
<li><p>Write comments and suggest improvements</p>
</li>
<li><p>Decide whether to approve</p>
</li>
</ol>
<p>This solution automates steps 1-4 completely and provides structured input for step 5:</p>
<ul>
<li><p><strong>AI-generated tests</strong> catch untested functions before the MR is even reviewed</p>
</li>
<li><p><strong>Automated risk scoring</strong> quantifies merge risk on a 0-100 scale with weighted risk drivers</p>
</li>
<li><p><strong>MR summary notes</strong> are posted directly to GitLab with risk score, review summary, and artifact links</p>
</li>
<li><p><strong>Inline code comments</strong> highlight specific issues on the diff itself</p>
</li>
<li><p><strong>Deterministic fallbacks</strong> ensure the pipeline never breaks even if the AI is unavailable</p>
</li>
</ul>
<h2>Running It Yourself</h2>
<h3>Prerequisites</h3>
<ul>
<li><p>Docker &amp; Docker Compose</p>
</li>
<li><p>AWS CLI configured with Bedrock access</p>
</li>
<li><p>Node.js + AWS CDK CLI</p>
</li>
<li><p>Python 3.12</p>
</li>
</ul>
<p>For a POC, self-hosted GitLab CE via Docker Compose gives full control over CI/CD configuration, runner setup, and API access without SaaS limitations. The Docker executor runs jobs in isolated containers with the Python 3.12 image.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/f1ee0753-db25-4a6c-a742-eea145b6bed4.png" alt="" style="display:block;margin:0 auto" />

<h2>Solution Architecture</h2>
<p>The system has three main components: a self-hosted GitLab instance with CI/CD runners, an AWS backend with API Gateway + Lambda + Bedrock Agent, and CI scripts that orchestrate everything.</p>
<h3>Architecture Overview</h3>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/d6c583c4-421b-4992-b72b-f98cfd96cb0e.svg" alt="" style="display:block;margin:0 auto" />

<h3>Component Breakdown</h3>
<p><strong>GitLab (Docker Compose)</strong></p>
<ul>
<li><p>Self-hosted GitLab CE on port 8080</p>
</li>
<li><p>GitLab Runner with Docker executor</p>
</li>
<li><p>Python 3.12 base image with ruff, pytest, bandit, pip-audit</p>
</li>
</ul>
<p><strong>AWS Cloud (CDK-deployed)</strong></p>
<ul>
<li><p>API Gateway (REST) with API key authentication</p>
</li>
<li><p>Lambda function (Python 3.12, 512MB, 120s timeout)</p>
</li>
<li><p>Amazon Bedrock Agent using Nova Pro v1:0 foundation model</p>
</li>
<li><p>IAM roles for agent invocation and model access</p>
</li>
</ul>
<p><strong>CI Scripts (Python)</strong></p>
<ul>
<li><p><code>detect_changes.py</code> -- git diff analysis</p>
</li>
<li><p><code>generate_tests_from_diff.py</code> -- AI test generation with fallback</p>
</li>
<li><p><code>validate_generated_tests.py</code> -- AST safety validation</p>
</li>
<li><p><code>persist_generated_tests.py</code> -- optional bot commit</p>
</li>
<li><p><code>mr_review_agent.py</code> -- MR review with risk scoring and GitLab API posting</p>
</li>
</ul>
<h2>The 6-Stage Pipeline</h2>
<p>The pipeline runs on every merge request update. Two stages are AI-powered (Prepare and Agent Review), while the middle three are traditional CI tools.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/e6b2940e-ac44-4c8c-ac0b-7cf03436d61b.svg" alt="" style="display:block;margin:0 auto" />

<h3>Stage 1: PREPARE (AI-Powered)</h3>
<p>This stage runs four jobs sequentially:</p>
<p><strong>detect_changes</strong> -- Runs <code>git diff</code> against the target branch to produce three artifacts: the list of changed files, the list of changed Python files, and the full diff text. These artifacts feed every downstream job.</p>
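<p>As a rough illustration of what this diff-parsing step does (the helper below is a sketch, not the actual script's code), the changed Python files can be pulled straight out of the unified diff headers:</p>
<pre><code class="language-python">def changed_python_files(diff_text):
    """Extract paths of modified Python files from git unified diff output."""
    files = []
    for line in diff_text.splitlines():
        # Each changed file shows up as a "+++ b/" header line in the diff
        if line.startswith("+++ b/"):
            path = line[len("+++ b/"):]
            if path.endswith(".py"):
                files.append(path)
    return files
</code></pre>
<p>The real script also emits the full diff text and the complete file list as separate artifacts, but the core extraction is this simple.</p>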
<p><strong>generate_tests</strong> -- Sends the diff and changed file list to the Bedrock Agent via API Gateway. The agent analyzes the code changes and generates pytest test files. If the agent returns malformed JSON, the Lambda retries up to 2 times. If all retries fail, the CI script falls back to AST-based test generation that produces import + function existence tests.</p>
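<p>The deterministic fallback can be sketched with nothing but the standard library's <code>ast</code> module. The helper name and the style of the generated tests below are illustrative assumptions, not the repository's actual implementation:</p>
<pre><code class="language-python">import ast

def fallback_tests(module_name, source):
    """Generate minimal smoke tests: the module imports and each top-level function exists."""
    funcs = [n.name for n in ast.parse(source).body if isinstance(n, ast.FunctionDef)]
    lines = [
        "import importlib",
        "",
        "def test_module_imports():",
        f"    importlib.import_module({module_name!r})",
    ]
    for fn in funcs:
        lines += [
            "",
            f"def test_{fn}_exists():",
            f"    mod = importlib.import_module({module_name!r})",
            f"    assert callable(getattr(mod, {fn!r}, None))",
        ]
    return "\n".join(lines) + "\n"
</code></pre>
<p>These generated tests are far weaker than AI-authored logic tests, but they guarantee the pipeline always produces something runnable.</p>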
<p><strong>validate_generated_tests</strong> -- Parses every generated test file through Python's AST module to ensure they're syntactically valid and safe to run. No <code>os.system</code>, no <code>subprocess</code>, no <code>eval</code>.</p>
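<p>A minimal version of that safety gate looks like the following, using only the standard library (the banned-name lists are illustrative, not the project's exact policy):</p>
<pre><code class="language-python">import ast

BANNED_CALLS = {"eval", "exec", "compile", "__import__"}
BANNED_MODULES = {"os", "subprocess", "socket"}

def is_safe_test(source):
    """Return True only if the generated test parses and avoids dangerous constructs."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Reject imports of modules that can shell out or open sockets
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            names = [alias.name.split(".")[0] for alias in node.names]
            if isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module.split(".")[0])
            if any(name in BANNED_MODULES for name in names):
                return False
        # Reject direct calls to eval/exec and friends
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_CALLS:
                return False
    return True
</code></pre>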
<p><strong>persist_generated_tests</strong> -- Optionally commits the generated tests back to the feature branch (controlled by <code>AUTO_COMMIT_GENERATED_TESTS</code> variable).</p>
<h3>Stage 2: LINT</h3>
<p>Runs <code>ruff check .</code> and saves the output to <code>artifacts/lint.txt</code>. Ruff is a fast Python linter that catches style violations, import errors, and anti-patterns.</p>
<h3>Stage 3: TEST</h3>
<p>Runs <code>pytest -v --tb=short --junitxml=artifacts/junit.xml</code> to execute all tests -- both manually written tests and AI-generated tests from Stage 1. Results are saved as a JUnit XML report that GitLab renders in the MR UI.</p>
<h3>Stage 4: SECURITY</h3>
<p>Runs <code>bandit</code> (static security analysis) and <code>pip-audit</code> (dependency vulnerability scanning). Output is saved to <code>artifacts/security.txt</code>.</p>
<h3>Stage 5: AGENT REVIEW (AI-Powered)</h3>
<p>The most sophisticated stage. The <code>mr_review_agent.py</code> script:</p>
<ol>
<li><p><strong>Collects all artifacts</strong> from previous stages (lint.txt, test output, security.txt, changes.diff)</p>
</li>
<li><p><strong>Calls the Bedrock Agent</strong> via API Gateway with the full context</p>
</li>
<li><p><strong>Computes a deterministic baseline</strong> risk score from the artifacts (test failures, lint violations, security findings)</p>
</li>
<li><p><strong>Merges AI and deterministic scores</strong> -- the final risk score is always &gt;= the deterministic baseline</p>
</li>
<li><p><strong>Posts results to GitLab</strong> as an MR summary note with risk drivers table, review summary, and artifact links</p>
</li>
<li><p><strong>Posts inline comments</strong> on specific lines of the diff for critical findings</p>
</li>
</ol>
<h3>Stage 6: DEPLOY</h3>
<p>Manual deployment to staging, only available on the main branch.</p>
<h2>The AWS Stack</h2>
<p>The entire AWS infrastructure is defined in a single CDK stack (<code>AgentGateStack</code>), making it fully reproducible with <code>cdk deploy</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/0966e989-6bf8-4aca-8dac-9c4a5e1c766d.svg" alt="" style="display:block;margin:0 auto" />

<h2>Data Flow: End to End</h2>
<p>Here's how data flows through the entire system from developer push to MR comment:</p>
<ol>
<li><p>Developer pushes to a feature branch and opens/updates an MR</p>
</li>
<li><p>GitLab triggers the pipeline with <code>merge_request_event</code> source</p>
</li>
<li><p><code>detect_changes</code> produces the diff and file lists as CI artifacts</p>
</li>
<li><p><code>generate_tests</code> sends the diff to Bedrock and writes test files</p>
</li>
<li><p>Lint, test, and security stages run independently, producing their own artifacts</p>
</li>
<li><p><code>mr_review</code> collects ALL artifacts, calls Bedrock for a holistic review</p>
</li>
<li><p>The review result is posted as an MR note with risk score and inline comments</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/ad57f046-740e-46be-af86-ab6626632b4e.svg" alt="" style="display:block;margin:0 auto" />

<h2>Risk Scoring: How It Works</h2>
<p>The risk scoring system combines deterministic analysis with AI judgement:</p>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/428d92e5-be1f-4781-9543-0732ab6dac31.svg" alt="" style="display:block;margin:0 auto" />

<h3>AI Enhancement</h3>
<p>The Bedrock Agent analyzes the actual code changes and produces its own risk score with drivers. The final score is the <strong>maximum</strong> of the AI score and the deterministic baseline -- ensuring that hard signals (failing tests, security vulnerabilities) can never be downplayed by the AI.</p>
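<p>The merge rule itself is a one-liner. The baseline weights below are made-up placeholders to show the shape of the computation, not the project's actual numbers:</p>
<pre><code class="language-python">def deterministic_baseline(failed_tests, lint_errors, security_findings):
    """Floor score from hard CI signals; the weights here are illustrative."""
    score = failed_tests * 25 + security_findings * 15 + lint_errors * 2
    return min(score, 100)

def final_risk_score(ai_score, baseline):
    # The AI may raise the risk, but it can never score below the deterministic floor.
    return max(min(ai_score, 100), baseline)
</code></pre>
<p>Taking the maximum is what makes the system trustworthy: a failing test suite keeps the score high no matter how optimistic the model's own assessment is.</p>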
<h3>Agent response on the GitLab merge request screen</h3>
<img src="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/51898fc7-7140-4797-a53d-59bbec90362e.png" alt="" style="display:block;margin:0 auto" />

<h2>Conclusion</h2>
<p>This project demonstrates that AI agents can be meaningfully integrated into CI/CD pipelines today -- not as replacements for existing tools, but as an intelligent layer that synthesizes their outputs into actionable insights.</p>
<p>The project repository used for this POC is available at <a href="https://github.com/learner14/agentic-cicd-pipeline"><strong>agentic ci/cd pipeline</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Reliable JSON Responses from LLMs]]></title><description><![CDATA[Getting reliable, structured (JSON) responses from Large Language Models is harder than it looks. The magentic library, paired with Pydantic, lets you define the shape of your expected output as a Pyt]]></description><link>https://path2ml.com/reliable-json-responses-from-llms</link><guid isPermaLink="true">https://path2ml.com/reliable-json-responses-from-llms</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sun, 08 Mar 2026 20:38:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/67813846ff24ffd4d2354b38/2ff0e7ae-cb6c-424a-9fcb-86a496158c30.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Getting reliable, structured (JSON) responses from Large Language Models is harder than it looks. The <strong>magentic</strong> library, paired with <strong>Pydantic</strong>, lets you define the shape of your expected output as a Python class and receive it back as a validated object — no manual prompt engineering or fragile JSON parsing required.</p>
<h2>The Problem</h2>
<p>When you call an LLM through its API, the response comes back as free-form text. If your application needs that data in a structured format — say, a JSON object with specific fields — you're left writing brittle prompt instructions like <em>"Please respond in JSON with keys name, age, and summary"</em> and then wrapping everything in <code>try/except json.loads(...)</code>.</p>
<p>This leads to:</p>
<ul>
<li><p><strong>Unreliable outputs</strong> — the model may add commentary, change key names, or break JSON syntax.</p>
</li>
<li><p><strong>Wasted tokens</strong> — lengthy system prompts that explain the desired format eat into your context window.</p>
</li>
<li><p><strong>Messy code</strong> — parsing logic, validation, and retry handling clutter your business logic.</p>
</li>
</ul>
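<p>For contrast, the brittle manual approach usually ends up looking something like this sketch (a hypothetical helper, shown only to illustrate the problem):</p>
<pre><code class="language-python">import json

def parse_llm_reply(raw):
    """The fragile status quo: hope the model returned clean JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Models often wrap JSON in markdown fences or add commentary,
        # so fall back to slicing out the outermost braces and retrying.
        start, end = raw.find("{"), raw.rfind("}")
        candidate = raw[start:end + 1]
        if start == -1 or not candidate.endswith("}"):
            raise
        return json.loads(candidate)
</code></pre>
<p>Every caller then still has to validate key names and types by hand, which is exactly the boilerplate the approach below removes.</p>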
<h2>How Magentic + Pydantic Solve This</h2>
<p>The <strong>magentic</strong> library introduces a simple decorator-based approach:</p>
<ol>
<li><p><strong>Define your output schema</strong> using a Pydantic model — the same way you'd define any data class in modern Python.</p>
</li>
<li><p><strong>Decorate a function</strong> with <code>@prompt</code>, providing your prompt template.</p>
</li>
<li><p><strong>Call the function</strong> — magentic handles the system prompt injection, API call, and Pydantic-led response parsing under the hood.</p>
</li>
</ol>
<p>This means:</p>
<ul>
<li><p>The decorator manages the underlying system prompts that instruct the LLM to return structured data.</p>
</li>
<li><p>Pydantic validates and parses the response automatically.</p>
</li>
<li><p>Token usage is optimised because the formatting instructions are handled efficiently by the library.</p>
</li>
<li><p>Your codebase stays clean — no raw JSON wrangling.</p>
</li>
</ul>
<h2>Quick Example (Python + OpenAI API Key)</h2>
<h3>Install</h3>
<pre><code class="language-bash">pip install magentic pydantic openai
</code></pre>
<h3>Code</h3>
<pre><code class="language-python">"""
Structured Output Binding with Magentic + Pydantic + OpenAI
------------------------------------------------------------
pip install magentic pydantic openai python-dotenv
"""

import os
from dotenv import load_dotenv
from pydantic import BaseModel
from openai import OpenAI
from magentic import prompt

# Load API key from .env file
load_dotenv()


# =============================================
# 1. Define output schemas (shared by both approaches)
# =============================================

class MovieReview(BaseModel):
    title: str
    rating: float
    summary: str


class RecipeSuggestion(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]
    prep_time_minutes: int


# =============================================
# APPROACH 1: Direct OpenAI API call (explicit LLM call)
# =============================================

def review_movie_direct(movie_name: str) -&gt; MovieReview:
    """Calls the OpenAI API directly and parses response into a Pydantic model."""
    client = OpenAI()  # uses OPENAI_API_KEY from env

    # This is the actual LLM call ↓
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a movie critic. Return structured JSON."},
            {"role": "user", "content": f"Give a short review of the movie {movie_name}"},
        ],
        response_format=MovieReview,  # OpenAI parses response into this Pydantic model
    )

    return completion.choices[0].message.parsed


# =============================================
# APPROACH 2: Magentic decorator (hides the LLM call)
# =============================================
# Under the hood, @prompt sends your text to the OpenAI API
# and automatically parses the response into the return type.

@prompt("Give a short review of the movie {movie_name}")
def review_movie_magentic(movie_name: str) -&gt; MovieReview: ...


@prompt("Suggest a simple recipe using {ingredient} as the main ingredient")
def suggest_recipe(ingredient: str) -&gt; RecipeSuggestion: ...


# =============================================
# 3. Run &amp; test
# =============================================

if __name__ == "__main__":
    assert os.environ.get("OPENAI_API_KEY"), "Set OPENAI_API_KEY in your .env file first!"

    # --- Approach 1: Direct OpenAI call ---
    print("=== Movie Review (Direct OpenAI Call) ===")
    review = review_movie_direct("Inception")
    print(f"Title   : {review.title}")
    print(f"Rating  : {review.rating}")
    print(f"Summary : {review.summary}")

    print()

    # --- Approach 2: Magentic decorator ---
    print("=== Movie Review (Magentic Decorator) ===")
    review2 = review_movie_magentic("The Matrix")
    print(f"Title   : {review2.title}")
    print(f"Rating  : {review2.rating}")
    print(f"Summary : {review2.summary}")

    print()

    print("=== Recipe Suggestion (Magentic Decorator) ===")
    recipe = suggest_recipe("chicken")
    print(f"Name        : {recipe.name}")
    print(f"Ingredients : {', '.join(recipe.ingredients)}")
    print(f"Prep Time   : {recipe.prep_time_minutes} min")
    print("Steps:")
    for i, step in enumerate(recipe.steps, 1):
        print(f"  {i}. {step}")
</code></pre>
<h3>Formatted Output</h3>
<pre><code class="language-python">=== Movie Review (Direct OpenAI Call) ===
Title   : Inception
Rating  : 9.0
Summary : Inception is a mind-bending thriller directed by Christopher Nolan that explores the intricacies of dreams and the subconscious. With a stellar cast led by Leonardo DiCaprio, the film masterfully blends action, science fiction, and emotional depth. Its intricate plot and stunning visuals keep audiences engaged, while the haunting score by Hans Zimmer heightens the tension throughout. A thought-provoking narrative that challenges perceptions of reality, Inception remains a landmark achievement in modern cinema.

=== Movie Review (Magentic Decorator) ===
Title   : The Matrix
Rating  : 4.8
Summary : The Matrix is a groundbreaking science fiction film that blends mind-bending action with philosophical depth. Directed by the Wachowskis, it introduces a dystopian future where reality is simulated by AI, and human resistance fights against the machine overlords. The film's innovative special effects, particularly the iconic 'bullet-time' sequences, and its exploration of themes such as freedom, reality, and identity make it a landmark in cinematic history. With a stellar performance by Keanu Reeves as Neo, The Matrix has become a cultural phenomenon and a must-watch for any sci-fi fan.
</code></pre>
<h3>What Happens Behind the Scenes</h3>
<ol>
<li><p><code>@prompt</code> sends your template (plus automatic formatting instructions) to the OpenAI API.</p>
</li>
<li><p>The LLM returns structured data matching the <code>MovieReview</code> schema.</p>
</li>
<li><p>Magentic parses and validates the response through Pydantic before handing it back to you as a Python object.</p>
</li>
</ol>
<p>No manual JSON parsing. No retry loops. No prompt gymnastics.</p>
<h2>Key Takeaways</h2>
<table>
<thead>
<tr>
<th>Benefit</th>
<th>Without Magentic</th>
<th>With Magentic</th>
</tr>
</thead>
<tbody><tr>
<td>Output format</td>
<td>Free text / fragile JSON</td>
<td>Validated Pydantic model</td>
</tr>
<tr>
<td>Prompt overhead</td>
<td>Manual formatting instructions</td>
<td>Handled by decorator</td>
</tr>
<tr>
<td>Parsing code</td>
<td><code>json.loads</code> + validation</td>
<td>Automatic</td>
</tr>
<tr>
<td>Token efficiency</td>
<td>Extra tokens for format prompts</td>
<td>Optimised</td>
</tr>
<tr>
<td>Code cleanliness</td>
<td>Scattered parsing logic</td>
<td>Single decorated function</td>
</tr>
</tbody></table>
]]></content:encoded></item><item><title><![CDATA[Multi-Agent Loan Processing AgenticAI]]></title><description><![CDATA[In this blog post, we will explore the implementation of a multi-agent loan processing system using the crewAI framework. This innovative approach leverages advanced artificial intelligence to streamline and automate the various stages of loan proces...]]></description><link>https://path2ml.com/multi-agent-loan-processing-agenticai</link><guid isPermaLink="true">https://path2ml.com/multi-agent-loan-processing-agenticai</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[CrewAI]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 18 Feb 2026 21:50:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771297607985/8ffb53f1-82cd-49f8-a508-58d9d627fb72.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog post, we will explore the implementation of a multi-agent loan processing system using the <strong>crewAI</strong> framework. This innovative approach leverages advanced artificial intelligence to streamline and automate the various stages of loan processing, ensuring efficiency and accuracy.</p>
<p>We will employ a hierarchical architecture, commonly referred to as the <strong>Supervisor/Orchestrator pattern</strong>. This design effectively organizes agents into a structured framework that mimics a corporate team environment. Within this framework, a manager (the supervisor) oversees a group of specialized agents (the specialists), each with specific expertise.</p>
<h3 id="heading-supervisor-parent-agent">Supervisor (Parent) Agent</h3>
<p>The Supervisor agent serves as the manager of the system. Its primary responsibilities encompass several crucial tasks, including:</p>
<ul>
<li><p><strong>Planning the Workflow:</strong> The Supervisor outlines the entire loan processing cycle, establishing the order and priority of tasks to be completed.</p>
</li>
<li><p><strong>Delegating Sub-Tasks:</strong> Based on the nature of each loan application, the Supervisor assigns specific tasks to the appropriate specialist agents, ensuring that each agent is tasked with functions aligned with their expertise.</p>
</li>
<li><p><strong>Monitoring Progress:</strong> Continuous oversight allows the Supervisor to track the status of each task, providing updates on progress and identifying any potential issues or bottlenecks in the process.</p>
</li>
<li><p><strong>Synthesizing Results:</strong> After all tasks have been executed, the Supervisor consolidates the findings from the specialists to create a comprehensive overview of the loan application status.</p>
</li>
</ul>
<h3 id="heading-specialist-sub-agent">Specialist (Sub-Agent)</h3>
<p>Specialist agents are the individual experts within the multi-agent system. Each specialist is programmed to perform a particular task with a high degree of proficiency. Their roles include:</p>
<ul>
<li><p><strong>Performing Specific Tasks:</strong> Each specialist is designed for tasks such as document validation, credit score retrieval, risk assessment, and compliance checking. This specialization enables them to execute their functions efficiently and effectively.</p>
</li>
<li><p><strong>Executing Under Supervision:</strong> Upon receiving a task from the Supervisor, each specialist utilizes its dedicated tools and algorithms to complete the assigned job, ensuring that the output is accurate and meets the necessary requirements.</p>
</li>
<li><p><strong>Returning Results:</strong> Once a task is completed, the specialist sends the result back to the Supervisor for further processing and analysis.</p>
</li>
</ul>
<h3 id="heading-hierarchical-agent-system-is-shown-below">The hierarchical agent system is shown below</h3>
<ul>
<li><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771295285033/c4f4baed-7525-4831-bbd2-a6f4e81e4580.png" alt /></li>
</ul>
<h3 id="heading-workflow-process">Workflow Process</h3>
<p>The overall workflow within this multi-agent system is divided into several distinct tasks, each mapped to a specific component of the process:</p>
<ol>
<li><p><strong>Document Fetch:</strong> The first step involves retrieving the content of the loan application document. This is done at the preprocessing stage, where the document is fetched and made available for subsequent validation and analysis.</p>
</li>
<li><p><strong>Document Validation:</strong> Once the document content is fetched, the Document Validation Specialist checks its validity and completeness. This step ensures that all required information is present and adheres to the predefined standards.</p>
</li>
<li><p><strong>Credit Check:</strong> Following validation, the Credit Check Agent is tasked with retrieving the borrower’s credit score. This process relies on the borrower’s unique customer ID to access accurate credit information, which is vital for assessing the borrower’s creditworthiness.</p>
</li>
<li><p><strong>Risk Assessment:</strong> The next phase is conducted by the Risk Assessment Analyst, who analyzes the data gathered from the document, credit score, and borrower’s income. This analysis determines the risk level associated with granting the loan.</p>
</li>
<li><p><strong>Compliance Check:</strong> Finally, the Compliance Check Agent evaluates the entire decision-making process to ensure that it aligns with established lending regulations and legal requirements, safeguarding the institution against potential compliance issues.</p>
</li>
</ol>
<p>By structuring the loan processing system in this way, we can ensure a streamlined and effective approach to handling loan applications. Each agent plays a crucial role in contributing to a well-coordinated effort, ultimately enhancing the quality and speed of the loan processing workflow.</p>
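<p>Stripped of any framework, the supervisor/specialist delegation loop boils down to something like the following framework-agnostic sketch (the specialist names and report shape here are illustrative, not CrewAI's API):</p>
<pre><code class="language-python">def supervisor(application, specialists):
    """Minimal orchestrator: run each specialist in order and stop on a hard failure."""
    report = {"application_id": application.get("id")}
    for name in ("validate", "credit_check", "risk", "compliance"):
        # Each specialist sees the application plus everything gathered so far
        result = specialists[name]({**application, **report})
        report[name] = result
        if not result.get("ok", True):
            report["decision"] = "rejected"
            return report
    report["decision"] = "approved"
    return report
</code></pre>
<p>CrewAI's hierarchical process automates this loop: the manager agent plans the order, delegates, and synthesizes, so you never write the orchestration code by hand.</p>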
<p>According to our workflow, the orchestrator agent and the four specialist agents look like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771450504680/6440c21a-7ba4-4c02-b140-a660b37364e2.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-crewai-framework-for-building-agenticai-loan-procesing-application">CrewAI framework for building the AgenticAI loan-processing application</h3>
<p>Key concepts in <strong>CrewAI</strong> encompass several crucial elements that define how the system operates and how agents collaborate effectively:</p>
<p><strong>Agents</strong>: Agents are the core components of CrewAI, each uniquely defined by a specific role, goal, and backstory. They utilize a language model (LLM) tailored to their persona, which allows them to communicate in a way that aligns with their expertise or field of knowledge. Additionally, agents may have access to specialized tools that enhance their capabilities. This role-playing aspect not only helps agents embody their defined personas but also enables nuanced interactions and more effective problem-solving based on their areas of expertise.</p>
<p><strong>Tasks</strong>: Within CrewAI, tasks are well-defined assignments given to agents, each with a clear description and expected outcomes. Tasks vary in complexity and purpose, and they are assigned to specific agents based on their roles and skill sets. Additionally, tasks can be structured in a manner that allows for chaining—where the output of one task can seamlessly serve as the input for another task. This chaining mechanism facilitates efficient workflows and enhances the overall productivity of the agents.</p>
<p><strong>Tools</strong>: Tools serve as essential functions or capabilities that agents can use to interact with external systems or execute specific actions. Examples of these tools include web search capabilities, API access, and data processing functions. In the CrewAI framework, tools often derive from a BaseTool class, ensuring consistency and reliability in their performance. The availability of various tools empowers agents to perform diverse tasks more effectively, expanding their range of capabilities.</p>
<p><strong>Crew</strong>: The crew represents the collaborative assembly of agents working together to accomplish a set of objectives or tasks. The composition of the crew is vital, as it determines how well agents can leverage each other's strengths and skills. Effective collaboration within the crew relies on clear communication and the coordinated execution of tasks, enhancing the overall effectiveness of the team.</p>
<p><strong>Process</strong>: The process describes the systematic workflow or methodology that the crew adheres to in order to carry out tasks. Various processes may be utilized, including sequential execution, where tasks are completed one after the other, or hierarchical execution, where a lead or manager agent is responsible for delegating tasks to ensure an orderly approach. The defined process is critical for maintaining organization and efficiency in task execution, thereby enabling the crew to meet its objectives strategically and systematically.</p>
<ol>
<li><h3 id="heading-pre-requisites">Pre-Requisites</h3>
<p> <strong>VSCode</strong> - <a target="_blank" href="https://code.visualstudio.com/download">https://code.visualstudio.com/download</a></p>
<p> <strong>Install Python</strong> --- <a target="_blank" href="https://www.python.org/downloads/">https://www.python.org/downloads/</a></p>
</li>
<li><h3 id="heading-crewai-installation">CrewAI Installation:</h3>
 <p> <strong>Run the following commands:</strong></p>
 <p> <strong>python -m venv .venv</strong> (to create a virtual environment)</p>
 <p> <strong>source .venv/bin/activate</strong> (to activate the virtual environment)</p>
 <p> <strong>pip install uv</strong> (to install the uv Python package installer)</p>
 <p> <strong>uv --version</strong> (to check the version)</p>
 <p> <strong>uv tool install crewai</strong> (to install CrewAI)</p>
 <p> <strong>uv tool update-shell</strong> (to add the tool directory to your PATH)</p>
</li>
<li><h3 id="heading-create-project">Create Project</h3>
 <p> <strong>crewai create crew loan_processing</strong> (to create the skeleton project loan_processing)</p>
 <p> This will create a project with the structure shown below</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771296793551/f98956ec-ef56-4ed9-aa94-a165b8e93faa.png" alt /></p>
</li>
<li><p>Create a <strong>requirements.txt</strong> file listing all required libraries and save it at the root of the project</p>
<pre><code class="lang-text"> boto3
 crewai[tools]
 crewai-tools[mcp]
 streamlit==1.49.1
 ratelimit
 tenacity
</code></pre>
</li>
<li><p>Install the libraries by running</p>
<p> <strong>pip install -r requirements.txt</strong></p>
</li>
</ol>
<p><strong>Update</strong> the <strong>agents.yaml</strong> file to add the agents for the app, as shown below</p>
<pre><code class="lang-yaml">doc_specialist:
  role: &gt;
    Document Validation Specialist
  goal: &gt;
    Validate the completeness and format of a new loan application provided as a JSON string.
  backstory: &gt;
    You are a meticulous agent responsible for the first step of loan processing.

credit_analyst:
  role: &gt;
    Credit Check Agent
  goal: &gt;
    Query the credit bureau API to retrieve an applicant's credit score.
  backstory: &gt;
    You are a specialized agent that interacts with the Credit Bureau.

risk_assessor:
  role: &gt;
    Risk Assessment Analyst
  goal: &gt;
    Calculate the financial risk score for a loan application.
  backstory: &gt;
    You are a quantitative analyst agent.

compliance_officer:
  role: &gt;
    Compliance Officer
  goal: &gt;
    Check the application against all internal lending policies and compliance rules.
  backstory: &gt;
    You are the final checkpoint for policy and compliance.

manager:
  role: &gt;
    Loan Processing Manager
  goal: &gt;
    Manage the loan application workflow and compile the final report.
  backstory: &gt;
    You are the manager responsible for orchestrating the loan processing pipeline
</code></pre>
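<p>Every agent entry must supply at least <code>role</code>, <code>goal</code>, and <code>backstory</code>. As a quick sanity check of such a config (sketched here as a plain Python dict rather than the YAML file CrewAI actually loads from <code>config/</code>, and with a deliberately incomplete entry to show the check firing):</p>

```python
# Mirror of the agents.yaml structure as a plain dict (illustrative only;
# CrewAI itself loads the YAML file from the project's config/ directory).
agents_config = {
    "doc_specialist": {
        "role": "Document Validation Specialist",
        "goal": "Validate the completeness and format of a new loan application.",
        "backstory": "You are a meticulous agent responsible for the first step of loan processing.",
    },
    "manager": {
        "role": "Loan Processing Manager",
        "goal": "Manage the loan application workflow and compile the final report.",
        # "backstory" intentionally missing so the check below reports it
    },
}

REQUIRED_KEYS = {"role", "goal", "backstory"}

def missing_keys(config: dict) -> dict:
    """Return {agent_name: set_of_missing_keys} for incomplete entries."""
    return {
        name: REQUIRED_KEYS - set(entry)
        for name, entry in config.items()
        if REQUIRED_KEYS - set(entry)
    }

print(missing_keys(agents_config))  # {'manager': {'backstory'}}
```

<p>Running a check like this before <code>crewai run</code> catches a missing field earlier than a mid-pipeline failure would.</p>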
<p>Update the <strong>tasks.yaml</strong> file to add the tasks</p>
<pre><code class="lang-yaml">task_validate:
  description: &gt;
    Validate the loan application provided as a JSON string: '{document_content}'.
    Pass this string to the 'Validate Document Fields' tool.
  expected_output: &gt;
    A JSON string with the validation status
  agent: doc_specialist

task_credit:
  description: &gt;
    Extract customer_id and call Query Credit Bureau API.
  expected_output: &gt;
    A JSON string containing the credit_score.
  agent: credit_analyst
  context:
    - task_validate

task_risk:
  description: &gt;
    Extract loan details and credit score, then Calculate Risk Score.
  expected_output: &gt;
    A JSON string containing the risk_score.
  agent: risk_assessor
  context:
    - task_validate
    - task_credit

task_compliance:
  description: &gt;
    Check Lending Compliance based on history and risk score.
  expected_output: &gt;
    Compliance status JSON.
  agent: compliance_officer
  context:
    - task_validate
    - task_risk

task_report:
  description: &gt;
    Compile a final report with Approve/Deny decision.
  expected_output: &gt;
    Markdown report.
  agent: manager
  context:
    - task_validate
    - task_credit
    - task_risk
    - task_compliance
</code></pre>
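<p>The <code>context</code> lists form a dependency graph: <code>task_credit</code> consumes <code>task_validate</code>'s output, and <code>task_report</code> depends on everything. CrewAI resolves this internally, but the ordering idea can be sketched with a small topological sort over the same task names (the sort itself is generic stdlib Python, not CrewAI code):</p>

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each task maps to the set of tasks whose output it consumes (its `context`).
task_context = {
    "task_validate": set(),
    "task_credit": {"task_validate"},
    "task_risk": {"task_validate", "task_credit"},
    "task_compliance": {"task_validate", "task_risk"},
    "task_report": {"task_validate", "task_credit", "task_risk", "task_compliance"},
}

# static_order() yields tasks so that every dependency comes before its dependents.
order = list(TopologicalSorter(task_context).static_order())
print(order)
# ['task_validate', 'task_credit', 'task_risk', 'task_compliance', 'task_report']
```

<p>Because each task here depends on all of its predecessors, the graph is a chain and the order is fully determined, with the manager's report always last.</p>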
<p>Update the <strong>crew.py</strong> file to include the following configurations:</p>
<ol>
<li><p><strong>Create an LLM</strong>: Implement the code to initialize and configure the LLM (Large Language Model); in our case I have used an OpenAI model.</p>
</li>
<li><p><strong>Configure Agents</strong>: Within the @CrewBase-decorated class, use the @agent decorator to define and configure all agents that will be part of the crew. Ensure that each agent has a clear role and responsibilities.</p>
</li>
<li><p><strong>Define Tasks</strong>: Similarly, utilize the @task decorator to specify all tasks that agents will be responsible for. Make sure each task is well-defined and includes the necessary parameters and execution criteria.</p>
</li>
<li><p><strong>Assemble the Crew</strong>: Use the @crew decorator to bring together the agents, tasks, and a designated manager. The assembly should clearly delineate how each component interacts and their roles within the overall crew structure.</p>
</li>
<li><p><strong>Define Tools</strong>: Finally, specify all the required tools using the @tool decorator. Ensure that each tool is attached to the relevant agent for seamless integration.</p>
</li>
</ol>
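<p>The @agent/@task/@crew decorators are CrewAI's own machinery, but the registration pattern behind them can be mimicked in a few lines of plain Python (hypothetical names, purely illustrative of how decorators collect methods at class-definition time):</p>

```python
def make_registry():
    """Return a decorator plus the list it appends decorated names to."""
    registered = []
    def register(func):
        registered.append(func.__name__)  # record each decorated method
        return func                       # leave the method itself unchanged
    return register, registered

agent, agents = make_registry()
task, tasks = make_registry()

class MiniCrew:
    @agent
    def doc_specialist(self):
        return "doc agent"

    @agent
    def manager(self):
        return "manager agent"

    @task
    def task_validate(self):
        return "validate"

print(agents)  # ['doc_specialist', 'manager']
print(tasks)   # ['task_validate']
```

<p>CrewAI's real decorators do considerably more (caching, config binding), but the core idea, collecting the decorated methods so the crew can assemble them later, is the same.</p>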
<p>With these changes we have created a well-structured and functional crew configuration within the <strong>crew.py</strong> file.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> crewai <span class="hljs-keyword">import</span> LLM, Agent, Crew, Process, Task
<span class="hljs-keyword">from</span> crewai.project <span class="hljs-keyword">import</span> CrewBase, agent, crew, task
<span class="hljs-keyword">from</span> crewai.agents.agent_builder.base_agent <span class="hljs-keyword">import</span> BaseAgent
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> List
<span class="hljs-keyword">from</span> crewai.tools <span class="hljs-keyword">import</span> tool
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> os

<span class="hljs-comment">#load the OPENAI_API_KEY from environment variable or set it directly</span>
OPENAI_API_KEY = os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>)
llm = LLM(
    model=os.getenv(<span class="hljs-string">"MODEL"</span>, <span class="hljs-string">"gpt-4o"</span>),  <span class="hljs-comment"># Default to gpt-4o if MODEL env variable is not set</span>
    api_key=OPENAI_API_KEY,  <span class="hljs-comment"># Or set OPENAI_API_KEY</span>
    temperature=<span class="hljs-number">0.0</span>,
    max_tokens=<span class="hljs-number">1000</span>,
)

<span class="hljs-meta">@CrewBase</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">LoanProcessing</span>():</span>
    <span class="hljs-string">"""LoanProcessing crew"""</span>

    <span class="hljs-comment">#@title Run CrewAI</span>
    agents: List[BaseAgent]
    tasks: List[Task]

<span class="hljs-meta">    @agent</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">doc_specialist</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
        <span class="hljs-keyword">return</span> Agent(
                config=self.agents_config[<span class="hljs-string">'doc_specialist'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                verbose=<span class="hljs-literal">True</span>,
                tools=[self.ValidateDocumentFieldsTool],
                llm=llm
            )

<span class="hljs-meta">    @agent</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">credit_analyst</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
        <span class="hljs-keyword">return</span> Agent(
                config=self.agents_config[<span class="hljs-string">'credit_analyst'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                verbose=<span class="hljs-literal">True</span>,
                tools=[self.QueryCreditBureauAPITool],
                llm=llm
            )       
<span class="hljs-meta">    @agent</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">risk_assessor</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
        <span class="hljs-keyword">return</span> Agent(
                config=self.agents_config[<span class="hljs-string">'risk_assessor'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                verbose=<span class="hljs-literal">True</span>,
                tools=[self.CalculateRiskScoreTool],
                llm=llm
            )
<span class="hljs-meta">    @agent</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compliance_officer</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
        <span class="hljs-keyword">return</span> Agent(
                config=self.agents_config[<span class="hljs-string">'compliance_officer'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                verbose=<span class="hljs-literal">True</span>,
                tools=[self.CheckLendingComplianceTool],
                llm=llm
            )
<span class="hljs-meta">    @agent</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">manager</span>(<span class="hljs-params">self</span>) -&gt; Agent:</span>
        <span class="hljs-keyword">return</span> Agent(
                config=self.agents_config[<span class="hljs-string">'manager'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                verbose=<span class="hljs-literal">True</span>,
                llm=llm,
                allow_delegation=<span class="hljs-literal">True</span>
            )
<span class="hljs-meta">    @task</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">task_validate</span>(<span class="hljs-params">self</span>) -&gt; Task:</span>
            <span class="hljs-keyword">return</span> Task(
                config=self.tasks_config[<span class="hljs-string">'task_validate'</span>], <span class="hljs-comment"># type: ignore[index]</span>
            )

<span class="hljs-meta">    @task</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">task_credit</span>(<span class="hljs-params">self</span>) -&gt; Task:</span>
            <span class="hljs-keyword">return</span> Task(
                config=self.tasks_config[<span class="hljs-string">'task_credit'</span>], <span class="hljs-comment"># type: ignore[index]</span>
            )
<span class="hljs-meta">    @task</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">task_risk</span>(<span class="hljs-params">self</span>) -&gt; Task:</span>
            <span class="hljs-keyword">return</span> Task(
                config=self.tasks_config[<span class="hljs-string">'task_risk'</span>], <span class="hljs-comment"># type: ignore[index]</span>
            )
<span class="hljs-meta">    @task</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">task_compliance</span>(<span class="hljs-params">self</span>) -&gt; Task:</span>
            <span class="hljs-keyword">return</span> Task(
                config=self.tasks_config[<span class="hljs-string">'task_compliance'</span>], <span class="hljs-comment"># type: ignore[index]</span>
            )
<span class="hljs-meta">    @task</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">task_report</span>(<span class="hljs-params">self</span>) -&gt; Task:</span>
            <span class="hljs-keyword">return</span> Task(
                config=self.tasks_config[<span class="hljs-string">'task_report'</span>], <span class="hljs-comment"># type: ignore[index]</span>
                <span class="hljs-comment"># note: allow_delegation is an Agent option, not a Task option</span>
            )

<span class="hljs-meta">    @crew</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">crew</span>(<span class="hljs-params">self</span>) -&gt; Crew:</span>
        <span class="hljs-string">"""Creates the LoanProcessing crew"""</span>
        <span class="hljs-comment"># To learn how to add knowledge sources to your crew, check out the documentation:</span>
        <span class="hljs-comment"># https://docs.crewai.com/concepts/knowledge#what-is-knowledge</span>

        <span class="hljs-keyword">return</span> Crew(
            agents=[self.doc_specialist(), self.credit_analyst(), self.risk_assessor(), self.compliance_officer()], <span class="hljs-comment"># Automatically created by the @agent decorator</span>
            tasks=self.tasks, <span class="hljs-comment"># Automatically created by the @task decorator</span>
            manager_agent=self.manager(), <span class="hljs-comment"># Automatically created by the @agent decorator</span>
            process=Process.hierarchical,
            verbose=<span class="hljs-literal">True</span>
        )
<span class="hljs-meta">    @tool("Validate Document Fields")</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ValidateDocumentFieldsTool</span>(<span class="hljs-params">application_data: str</span>) -&gt; str:</span>
        <span class="hljs-string">"""Validates JSON application data."""</span>
        print(<span class="hljs-string">f"--- TOOL: Validating document fields (Data: <span class="hljs-subst">{application_data}</span>) ---"</span>)
        <span class="hljs-keyword">try</span>:
            data = json.loads(application_data)
            required = [<span class="hljs-string">"customer_id"</span>, <span class="hljs-string">"loan_amount"</span>, <span class="hljs-string">"income"</span>, <span class="hljs-string">"credit_history"</span>]
            missing = [f <span class="hljs-keyword">for</span> f <span class="hljs-keyword">in</span> required <span class="hljs-keyword">if</span> f <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> data]
            <span class="hljs-keyword">if</span> missing:
                <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"error"</span>: <span class="hljs-string">f"Missing fields: <span class="hljs-subst">{<span class="hljs-string">', '</span>.join(missing)}</span>"</span>})
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"status"</span>: <span class="hljs-string">"validated"</span>, <span class="hljs-string">"data"</span>: data})
        <span class="hljs-keyword">except</span> json.JSONDecodeError:
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"error"</span>: <span class="hljs-string">"Invalid JSON"</span>})
<span class="hljs-meta">    @tool("Query Credit Bureau API")</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">QueryCreditBureauAPITool</span>(<span class="hljs-params">customer_id: str</span>) -&gt; str:</span>
        <span class="hljs-string">"""Gets credit score for customer_id."""</span>
        print(<span class="hljs-string">f"--- TOOL: Calling Credit Bureau for <span class="hljs-subst">{customer_id}</span> ---"</span>)
        scores = {
            <span class="hljs-string">"CUST-12345"</span>: <span class="hljs-number">810</span>, <span class="hljs-comment"># Good</span>
            <span class="hljs-string">"CUST-99999"</span>: <span class="hljs-number">550</span>, <span class="hljs-comment"># BAD SCORE (&lt; 600)</span>
            <span class="hljs-string">"CUST-55555"</span>: <span class="hljs-number">620</span>
        }
        score = scores.get(customer_id)
        <span class="hljs-keyword">if</span> score:
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"customer_id"</span>: customer_id, <span class="hljs-string">"credit_score"</span>: score})
        <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"error"</span>: <span class="hljs-string">"Customer not found"</span>})
<span class="hljs-meta">    @tool("Calculate Risk Score")</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">CalculateRiskScoreTool</span>(<span class="hljs-params">loan_amount: int, income: str, credit_score: int</span>) -&gt; str:</span>
        <span class="hljs-string">"""Calculates risk based on financial data."""</span>
        print(<span class="hljs-string">f"--- TOOL: Calculating Risk (Score: <span class="hljs-subst">{credit_score}</span>) ---"</span>)
        <span class="hljs-comment"># Logic: Credit Score &lt; 600 is automatic HIGH risk</span>
        <span class="hljs-keyword">if</span> credit_score &lt; <span class="hljs-number">600</span>:
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"risk_score"</span>: <span class="hljs-number">9</span>, <span class="hljs-string">"reason"</span>: <span class="hljs-string">"Credit score too low"</span>})

        <span class="hljs-comment"># Standard logic</span>
        <span class="hljs-keyword">try</span>:
            inc_val = int(<span class="hljs-string">''</span>.join(filter(str.isdigit, income)))
            ann_inc = inc_val * <span class="hljs-number">12</span> <span class="hljs-keyword">if</span> <span class="hljs-string">"month"</span> <span class="hljs-keyword">in</span> income.lower() <span class="hljs-keyword">else</span> inc_val
        <span class="hljs-keyword">except</span> ValueError: ann_inc = <span class="hljs-number">0</span>

        risk = <span class="hljs-number">1</span>
        <span class="hljs-keyword">if</span> credit_score &lt; <span class="hljs-number">720</span>: risk += <span class="hljs-number">2</span>
        <span class="hljs-keyword">if</span> ann_inc &gt; <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> (loan_amount / ann_inc) &gt; <span class="hljs-number">0.5</span>: risk += <span class="hljs-number">3</span>

        <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"risk_score"</span>: min(risk, <span class="hljs-number">10</span>)})
<span class="hljs-meta">    @tool("Check Lending Compliance")</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">CheckLendingComplianceTool</span>(<span class="hljs-params">loan_amount: int, risk_score: int</span>) -&gt; str:</span>
        <span class="hljs-string">"""Checks if loan complies with lending rules."""</span>
        print(<span class="hljs-string">"--- TOOL: Checking Lending Compliance ---"</span>)
        <span class="hljs-comment"># Simple compliance logic for demo</span>
        <span class="hljs-keyword">if</span> loan_amount &gt; <span class="hljs-number">500000</span>:
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"compliant"</span>: <span class="hljs-literal">False</span>, <span class="hljs-string">"reason"</span>: <span class="hljs-string">"Loan amount exceeds limit"</span>})
        <span class="hljs-keyword">if</span> risk_score &gt;= <span class="hljs-number">7</span>:
            <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"compliant"</span>: <span class="hljs-literal">False</span>, <span class="hljs-string">"reason"</span>: <span class="hljs-string">"Risk score too high"</span>})
        <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"compliant"</span>: <span class="hljs-literal">True</span>})
</code></pre>
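<p>The scoring rules inside <code>CalculateRiskScoreTool</code> are a pure function of their inputs, so they are easy to exercise outside the crew. Here is the same logic reimplemented as a plain function (without the @tool wrapper, returning a dict instead of a JSON string) together with a few spot checks:</p>

```python
def calculate_risk(loan_amount: int, income: str, credit_score: int) -> dict:
    """Same rules as CalculateRiskScoreTool, returned as a dict for easy testing."""
    if credit_score < 600:  # automatic HIGH risk
        return {"risk_score": 9, "reason": "Credit score too low"}
    try:
        # Pull digits out of strings like "USD 120000 a year" / "USD 5000 a month"
        inc_val = int("".join(filter(str.isdigit, income)))
        ann_inc = inc_val * 12 if "month" in income.lower() else inc_val
    except ValueError:
        ann_inc = 0
    risk = 1
    if credit_score < 720:
        risk += 2
    if ann_inc > 0 and (loan_amount / ann_inc) > 0.5:
        risk += 3
    return {"risk_score": min(risk, 10)}

print(calculate_risk(50000, "USD 120000 a year", 810))   # {'risk_score': 1}
print(calculate_risk(50000, "USD 40000 a year", 550))    # automatic high risk (9)
print(calculate_risk(200000, "USD 5000 a month", 620))   # {'risk_score': 6}
```

<p>Checking the branch logic this way, before wiring the tool into an agent, saves a lot of LLM round-trips while debugging.</p>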
<p>Next, update the <strong>main.py</strong> file of the CrewAI application to implement a <code>run</code> method. This method is the entry point of the app, taking in the inputs needed to kick off the crew.</p>
<p>The input is a dictionary with a key named <strong>document_content</strong>, whose value is a JSON string with the following attributes:</p>
<ul>
<li><p><strong>customer_id</strong>: A unique identifier for each customer.</p>
</li>
<li><p><strong>loan_amount</strong>: The total amount of money requested for the loan.</p>
</li>
<li><p><strong>income</strong>: The customer's declared income, which helps assess their ability to repay the loan.</p>
</li>
<li><p><strong>credit_history</strong>: A record of the customer's credit behavior, indicating their reliability in managing debt.</p>
</li>
</ul>
<p>These attributes are fetched using an identifier known as <strong>document_id</strong>, ensuring that we are working with the correct data for each customer. This setup allows the CrewAI app to process loan requests based on the provided information.</p>
<p>The code is shown below:</p>
<pre><code class="lang-python"><span class="hljs-comment">#!/usr/bin/env python</span>
<span class="hljs-keyword">import</span> sys
<span class="hljs-keyword">import</span> warnings

<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> tenacity <span class="hljs-keyword">import</span> retry, stop_after_attempt, wait_exponential, retry_if_exception
<span class="hljs-keyword">from</span> ratelimit <span class="hljs-keyword">import</span> limits, sleep_and_retry
<span class="hljs-keyword">import</span> time

<span class="hljs-keyword">from</span> loan_processing.crew <span class="hljs-keyword">import</span> LoanProcessing


<span class="hljs-comment"># --- CONFIGURATION ---</span>
CALLS = <span class="hljs-number">15</span>  <span class="hljs-comment"># Max calls...</span>
PERIOD = <span class="hljs-number">60</span> <span class="hljs-comment"># ...per minute</span>

loan_application_inputs_valid = {
    <span class="hljs-string">"applicant_id"</span>: <span class="hljs-string">"borrower_good_780"</span>,
    <span class="hljs-string">"document_id"</span>: <span class="hljs-string">"document_valid_123"</span>
}

loan_application_inputs_invalid = {
    <span class="hljs-string">"applicant_id"</span>: <span class="hljs-string">"borrower_bad_620"</span>,
    <span class="hljs-string">"document_id"</span>: <span class="hljs-string">"document_invalid_456"</span>
}    


warnings.filterwarnings(<span class="hljs-string">"ignore"</span>, category=SyntaxWarning, module=<span class="hljs-string">"pysbd"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>():</span>
    <span class="hljs-string">"""
    Run the crew.
    """</span>

    <span class="hljs-keyword">try</span>:
        print(<span class="hljs-string">"--- KICKING OFF CREWAI (VALID INPUTS) ---"</span>)
        valid_json = get_document_content(loan_application_inputs_valid[<span class="hljs-string">'document_id'</span>])
        inputs = {<span class="hljs-string">'document_content'</span>: valid_json}
        robust_execute(LoanProcessing().crew().kickoff, inputs=inputs)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        <span class="hljs-keyword">import</span> traceback
        traceback.print_exc()
        handle_execution_error(e)

<span class="hljs-comment"># --- 1. HELPER: Mock Document Fetcher ---</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_document_content</span>(<span class="hljs-params">document_id: str</span>) -&gt; str:</span>
    print(<span class="hljs-string">f"--- HELPER: Simulating fetch for doc_id: <span class="hljs-subst">{document_id}</span> ---"</span>)

    <span class="hljs-keyword">if</span> document_id == <span class="hljs-string">"document_valid_123"</span>:
        <span class="hljs-comment"># Happy Path: High Income, Good History</span>
        <span class="hljs-keyword">return</span> json.dumps({
            <span class="hljs-string">"customer_id"</span>: <span class="hljs-string">"CUST-12345"</span>,
            <span class="hljs-string">"loan_amount"</span>: <span class="hljs-number">50000</span>,
            <span class="hljs-string">"income"</span>: <span class="hljs-string">"USD 120000 a year"</span>,
            <span class="hljs-string">"credit_history"</span>: <span class="hljs-string">"7 years good standing"</span>
        })

    <span class="hljs-keyword">elif</span> document_id == <span class="hljs-string">"document_risky_789"</span>:
        <span class="hljs-comment"># Unhappy Path: Valid Docs, but LOW CREDIT SCORE</span>
        <span class="hljs-keyword">return</span> json.dumps({
            <span class="hljs-string">"customer_id"</span>: <span class="hljs-string">"CUST-99999"</span>,
            <span class="hljs-string">"loan_amount"</span>: <span class="hljs-number">50000</span>,
            <span class="hljs-string">"income"</span>: <span class="hljs-string">"USD 40000 a year"</span>,
            <span class="hljs-string">"credit_history"</span>: <span class="hljs-string">"Recent Missed Payments"</span>
        })

    <span class="hljs-keyword">elif</span> document_id == <span class="hljs-string">"document_invalid_456"</span>:
        <span class="hljs-comment"># Broken Path: Missing fields (income)</span>
        <span class="hljs-keyword">return</span> json.dumps({
            <span class="hljs-string">"customer_id"</span>: <span class="hljs-string">"CUST-55555"</span>,
            <span class="hljs-string">"loan_amount"</span>: <span class="hljs-number">200000</span>,
            <span class="hljs-string">"credit_history"</span>: <span class="hljs-string">"1 year"</span>
        })
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> json.dumps({<span class="hljs-string">"error"</span>: <span class="hljs-string">"Document ID not found."</span>})

<span class="hljs-comment"># --- HELPER: ERROR FILTER ---</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">is_rate_limit_error</span>(<span class="hljs-params">e</span>):</span>
    msg = str(e).lower()
    <span class="hljs-keyword">return</span> <span class="hljs-string">"429"</span> <span class="hljs-keyword">in</span> msg <span class="hljs-keyword">or</span> <span class="hljs-string">"quota"</span> <span class="hljs-keyword">in</span> msg <span class="hljs-keyword">or</span> <span class="hljs-string">"resource exhausted"</span> <span class="hljs-keyword">in</span> msg <span class="hljs-keyword">or</span> <span class="hljs-string">"serviceunavailable"</span> <span class="hljs-keyword">in</span> msg

<span class="hljs-comment"># --- ROBUST WRAPPER ---</span>
<span class="hljs-meta">@sleep_and_retry</span>
<span class="hljs-meta">@limits(calls=CALLS, period=PERIOD)</span>
<span class="hljs-meta">@retry(</span>
    stop=stop_after_attempt(<span class="hljs-number">5</span>),
    wait=wait_exponential(multiplier=<span class="hljs-number">2</span>, min=<span class="hljs-number">4</span>, max=<span class="hljs-number">30</span>),
    retry=retry_if_exception(is_rate_limit_error),
    reraise=<span class="hljs-literal">True</span>
)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">robust_execute</span>(<span class="hljs-params">func, *args, **kwargs</span>):</span>
    <span class="hljs-string">"""
    Executes any function (CrewAI kickoff, LangGraph invoke) with built-in
    rate limiting and auto-retries for transient API errors.
    """</span>
    print(<span class="hljs-string">f"  &gt;&gt; [Clock <span class="hljs-subst">{time.strftime(<span class="hljs-string">'%X'</span>)}</span>] Executing Agent Action (Safe Mode)..."</span>)
    <span class="hljs-keyword">return</span> func(*args, **kwargs)

<span class="hljs-comment"># --- ERROR HANDLER ---</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handle_execution_error</span>(<span class="hljs-params">e</span>):</span>
    <span class="hljs-string">"""Prints a clean, professional error report."""</span>
    error_msg = str(e)
    is_quota = <span class="hljs-string">"429"</span> <span class="hljs-keyword">in</span> error_msg <span class="hljs-keyword">or</span> <span class="hljs-string">"quota"</span> <span class="hljs-keyword">in</span> error_msg.lower()

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"━"</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"  🛑  MISSION ABORTED: SYSTEM CRITICAL ERROR"</span>)
    print(<span class="hljs-string">"━"</span> * <span class="hljs-number">60</span>)

    <span class="hljs-keyword">if</span> is_quota:
        print(<span class="hljs-string">"  ⚠️   CAUSE:    QUOTA EXCEEDED (API Refusal)"</span>)
        print(<span class="hljs-string">"  🔍   CONTEXT:  The LLM provider rejected the request."</span>)
        print(<span class="hljs-string">"\n  🛠️   ACTION:    [1] Wait before retrying"</span>)
        print(<span class="hljs-string">"                  [2] Check API Limits (Free Tier is ~15 RPM)"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">f"  ⚠️   CAUSE:    UNEXPECTED EXCEPTION"</span>)
        print(<span class="hljs-string">f"  📝   DETAILS:  <span class="hljs-subst">{error_msg}</span>"</span>)

    print(<span class="hljs-string">"━"</span> * <span class="hljs-number">60</span> + <span class="hljs-string">"\n"</span>)
</code></pre>
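<p>tenacity and ratelimit do the heavy lifting in <code>robust_execute</code> above; the underlying retry-with-exponential-backoff idea fits in a few stdlib-only lines (sleep times shortened here so the sketch runs instantly, and the fake "API" is purely illustrative):</p>

```python
import time

def retry_with_backoff(func, max_attempts=5, base_delay=0.01, multiplier=2,
                       is_retryable=lambda e: True):
    """Call func(); on a retryable exception, wait base_delay * multiplier**n, retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as e:
            if attempt == max_attempts - 1 or not is_retryable(e):
                raise  # out of attempts, or a non-transient error: give up
            time.sleep(base_delay * (multiplier ** attempt))  # exponential backoff

# A fake "API" that fails twice with a 429-style error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 rate limited")
    return "ok"

result = retry_with_backoff(flaky, is_retryable=lambda e: "429" in str(e))
print(result, calls["n"])  # ok 3
```

<p>tenacity adds jitter, wait caps, and cleaner composition on top of this, which is why the real code uses it instead.</p>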
<p>Now run the app with the command:</p>
<h3 id="heading-crewai-run"><strong>crewai run</strong></h3>
<h3 id="heading-github-repo">Github Repo</h3>
<p>Code for this project is available at <a target="_blank" href="https://github.com/learner14/multi-agent-crewai">multi-agent-crewai loan processing app</a></p>
]]></content:encoded></item><item><title><![CDATA[Implementing LSTM RNN using Pytorch]]></title><description><![CDATA[Previously, I wrote an article titled "Recurrent Neural Network," where I delved into the inner workings of Recurrent Neural Networks (RNNs) and their significance in the field of machine learning. Subsequently, I provided a tutorial “Implementing LS...]]></description><link>https://path2ml.com/implementing-lstm-rnn-using-pytorch</link><guid isPermaLink="true">https://path2ml.com/implementing-lstm-rnn-using-pytorch</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[RNN]]></category><category><![CDATA[LSTM]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sat, 31 Jan 2026 16:25:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769876556685/64c863fe-0df7-4b07-bc10-8d75f8c7159a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously, I wrote an article titled "<a target="_blank" href="https://path2ml.com/recurrent-neural-network">Recurrent Neural Network</a>," where I delved into the inner workings of <strong>Recurrent Neural Networks (RNNs)</strong> and their significance in the field of machine learning. Subsequently, I provided a tutorial “<a target="_blank" href="https://path2ml.com/implementing-lstm-rnn-using-keras-and-tensorflow"><strong>Implementing LSTM RNN using Keras and TensorFlow</strong></a><strong>”</strong> that illustrated how to implement <strong>RNNs</strong> using popular deep learning libraries, <strong>Keras and TensorFlow</strong>. In this blog post, I am excited to take it a step further by guiding you through the implementation of <strong>Long Short-Term Memory (LSTM)</strong> networks, a specific type of <strong>RNN</strong>, using <strong>PyTorch</strong>. 
LSTMs are particularly effective in handling sequential data, and I look forward to exploring this powerful tool with you.</p>
<p>I am including a diagram from my previous blog post that illustrates the architecture of <strong>Long Short-Term Memory (LSTM) networks</strong>. This will help to refresh your memory and provide a solid foundation as we begin writing the code.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769866012116/f4c454bd-d9dc-46e9-bb33-3943595980d1.png" alt class="image--center mx-auto" /></p>
<p>In general, the current input vector \(( x_t )\) and the previous short-term/hidden state \(( h_{t-1} )\) are fed to four different fully connected layers.</p>
<ul>
<li><p>The main layer is the one responsible for outputting the candidate value \(( g_t )\). It analyzes the current inputs \(( x_t )\) and the previous short-term/hidden state \(( h_{t-1} )\). The most important parts of its output are stored in the long-term state \(( c_t )\), while the rest is discarded.</p>
</li>
<li><p>The three other layers (forget, update, and output) are <strong><em>gate controllers</em></strong>. Since they use the logistic activation function <strong>sigmoid</strong>, their outputs range from 0 to 1. As you can see, the gate controllers’ outputs are fed to <strong>element-wise multiplication</strong> operations: if they output <strong>0s</strong> they close the gate, and if they output <strong>1s</strong> they open it. Specifically:</p>
<ul>
<li><p>The <strong><em>forget gate</em></strong> (controlled by \(( f_t )\)) controls which parts of the long-term state \(( c_{t-1} )\) should be erased.</p>
</li>
<li><p>The <strong><em>update gate</em></strong> (controlled by \(( i_t )\)) controls which parts of \(( g_t )\) should be added to the long-term state.</p>
</li>
<li><p>Finally, the <strong><em>output gate</em></strong> (controlled by \(( o_t )\)) controls which parts of the long-term state should be read and output at this time step, both to \(( h_t )\) and to \(( y_t )\).</p>
</li>
</ul>
</li>
</ul>
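<p>The gate mechanics above can be sketched as a single LSTM step in NumPy. This is an illustrative sketch, not the post's PyTorch code; the names <code>f</code>, <code>i</code>, <code>g</code>, <code>o</code> and the stacked weight layout are assumptions:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4H, D), U (4H, H) and b (4H,) stack the
    weights of the four fully connected layers row-wise."""
    z = W @ x_t + U @ h_prev + b
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])          # forget gate: what to erase from c_prev
    i = sigmoid(z[H:2 * H])      # update gate: what to add to the long-term state
    g = np.tanh(z[2 * H:3 * H])  # candidate values (the "main" layer)
    o = sigmoid(z[3 * H:4 * H])  # output gate: what to expose as h_t
    c = f * c_prev + i * g       # new long-term state
    h = o * np.tanh(c)           # new short-term/hidden state
    return h, c
```

<p>Each gate output lies in (0, 1) because of the sigmoid, so it acts as an element-wise soft switch on the multiplications above.</p>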
<p><strong>Let's try to understand this through the image shown below, which uses a simple RNN cell as an example</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769956870680/7994127a-9377-4307-af44-c6729d2f582a.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Input Data Block (Left)</strong></p>
<ul>
<li><p><strong>Shape:</strong> (Batch Size, Time Steps, Num Features)</p>
</li>
<li><p>This cube represents your multivariate time-series input:</p>
<ul>
<li><p><strong>Batch:</strong> Multiple sequences processed together.</p>
</li>
<li><p><strong>Time Steps (Window Size):</strong> How far back in time you look (T).</p>
</li>
<li><p><strong>Features:</strong> The number of variables at each time step (D).</p>
</li>
</ul>
</li>
<li><p>👉 At each time step ( t ), the model sees all features together, not one at a time.</p>
</li>
</ul>
</li>
<li><p><strong>Feature Vector at Each Time Step</strong></p>
<ul>
<li><p>The feature vectors are represented as follows:</p>
<ul>
<li><p>\(( x_1 = [f_1, f_2, f_3, \ldots, f_D] )\)</p>
</li>
<li><p>\(( x_2 = [f_1, f_2, f_3, \ldots, f_D] )\)</p>
</li>
<li><p>...</p>
</li>
<li><p>\(( x_T = [f_1, f_2, f_3, \ldots, f_D] )\)</p>
</li>
</ul>
</li>
<li><p>This means:</p>
<ul>
<li><p>At time step ( t ), the input consists of one vector of ( D ) features.</p>
</li>
<li><p>There is NOT one RNN per feature; all features are fed simultaneously into the same RNN cell.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>RNN Cells (Middle/Right)</strong></p>
<ul>
<li><p>Each box labeled “RNN Cell (One Shared Network)” represents the same RNN, reused at every time step.</p>
</li>
<li><p>Important points:</p>
<ul>
<li><p>The boxes are drawn multiple times only to illustrate time unfolding.</p>
</li>
<li><p>Weights are shared across all time steps.</p>
</li>
<li><p>This is one RNN, not many.</p>
</li>
</ul>
</li>
<li><p>Mathematically:</p>
<ul>
<li><p>\(( h_t = RNN(x_t, h_{t-1}) )\)</p>
</li>
<li><p>Where:</p>
<ul>
<li><p>\(( x_t )\) = all features at time ( t )</p>
</li>
<li><p>\(( h_t )\) = hidden state at time ( t )</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><p><strong>Hidden States</strong> \(\mathbf{(( h_1, h_2, \ldots, h_T ))}\)</p>
<ul>
<li><p>Each RNN cell outputs:</p>
<ul>
<li>\(( h_1, h_2, h_3, \ldots, h_T )\)</li>
</ul>
</li>
<li><p>These represent:</p>
<ul>
<li><p>The model's memory after processing data up to that time.</p>
</li>
<li><p>Each \(( h_t )\) is typically a vector of size "hidden_size."</p>
</li>
</ul>
</li>
<li><p>Hidden states:</p>
<ul>
<li><p>Change at every time step.</p>
</li>
<li><p>Carry temporal information forward.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>At every time step, one <strong>RNN</strong> processes all features together, updates its memory, and passes that memory to the next time step using the same weights.</p>
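<p>The shapes described above can be checked directly in PyTorch with a plain <code>nn.RNN</code>; the toy sizes here (batch 8, T = 5, D = 3) are arbitrary:</p>

```python
import torch
import torch.nn as nn

# One shared RNN applied across all time steps of a multivariate series.
rnn = nn.RNN(input_size=3, hidden_size=16, batch_first=True)
x = torch.randn(8, 5, 3)   # (batch, time steps, features)
out, h_n = rnn(x)
print(out.shape)   # torch.Size([8, 5, 16]) – one hidden state per time step
print(h_n.shape)   # torch.Size([1, 8, 16]) – final hidden state h_T
```

<p>For a single-layer, unidirectional RNN, the last slice of <code>out</code> equals <code>h_n</code>: the same cell and weights produced every step.</p>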
<h3 id="heading-implementation">Implementation</h3>
<p>I have utilized <strong>Jupyter Notebook</strong>, which is installed on my <strong>Mac</strong>, to run my code. However, an excellent alternative is <strong>Google Colab</strong>, which allows for seamless execution of the code presented in this blog.</p>
<p>In order to work with the code effectively, there are specific packages that I need to install within my virtual environment. These packages include:</p>
<ul>
<li><p><strong>PyTorch</strong>: A powerful deep learning library that provides flexibility and ease of use for building and training neural networks.</p>
</li>
<li><p><strong>TorchMetrics</strong>: A library that offers a wide range of metrics for evaluating the performance of machine learning models.</p>
</li>
</ul>
<h3 id="heading-loading-packages">Loading packages</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torch.optim <span class="hljs-keyword">as</span> optim
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> torch.utils.data <span class="hljs-keyword">import</span> DataLoader
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path
<span class="hljs-keyword">import</span> tarfile
<span class="hljs-keyword">import</span> urllib.request
<span class="hljs-keyword">import</span> torchmetrics
</code></pre>
<h3 id="heading-leveraging-gpumps-to-boost-performance">Leveraging GPU/MPS to boost performance</h3>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> torch.cuda.is_available():
    device = <span class="hljs-string">"cuda"</span>
<span class="hljs-keyword">elif</span> torch.backends.mps.is_available():
    device = <span class="hljs-string">"mps"</span>
<span class="hljs-keyword">else</span>:
    device = <span class="hljs-string">"cpu"</span>
device
</code></pre>
<h3 id="heading-download-the-data">Download the data</h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">download_and_extract_ridership_data</span>():</span>
    tarball_path = Path(<span class="hljs-string">"datasets/ridership.tgz"</span>)
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> tarball_path.is_file():
        Path(<span class="hljs-string">"datasets"</span>).mkdir(parents=<span class="hljs-literal">True</span>, exist_ok=<span class="hljs-literal">True</span>)
        url = <span class="hljs-string">"https://github.com/learner14/data"</span>
        urllib.request.urlretrieve(url, tarball_path)
        <span class="hljs-keyword">with</span> tarfile.open(tarball_path) <span class="hljs-keyword">as</span> ridership_tarball:
            ridership_tarball.extractall(path=<span class="hljs-string">"datasets"</span>, filter=<span class="hljs-string">"data"</span>)

download_and_extract_ridership_data()
</code></pre>
<pre><code class="lang-python">path = Path(<span class="hljs-string">"datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv"</span>)
df = pd.read_csv(path, parse_dates=[<span class="hljs-string">"service_date"</span>])
df.columns = [<span class="hljs-string">"date"</span>, <span class="hljs-string">"day_type"</span>, <span class="hljs-string">"bus"</span>, <span class="hljs-string">"rail"</span>, <span class="hljs-string">"total"</span>]  <span class="hljs-comment"># shorter names</span>
df = df.sort_values(<span class="hljs-string">"date"</span>).set_index(<span class="hljs-string">"date"</span>)
df = df.drop(<span class="hljs-string">"total"</span>, axis=<span class="hljs-number">1</span>)  
df = df.drop_duplicates()
</code></pre>
<p>Let's look at the first 5 rows of data</p>
<pre><code class="lang-python">df.head()
</code></pre>
<pre><code class="lang-python">
day_type    bus    rail
date            
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-01</span>    U    <span class="hljs-number">297192</span>    <span class="hljs-number">126455</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-02</span>    W    <span class="hljs-number">780827</span>    <span class="hljs-number">501952</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-03</span>    W    <span class="hljs-number">824923</span>    <span class="hljs-number">536432</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-04</span>    W    <span class="hljs-number">870021</span>    <span class="hljs-number">550011</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-05</span>    W    <span class="hljs-number">890426</span>    <span class="hljs-number">557917</span>
</code></pre>
<h3 id="heading-plotting-a-sample-to-look-at-data">Plotting a sample to look at data</h3>
<pre><code class="lang-python">plt.rc(<span class="hljs-string">'font'</span>, size=<span class="hljs-number">14</span>)
plt.rc(<span class="hljs-string">'axes'</span>, labelsize=<span class="hljs-number">14</span>, titlesize=<span class="hljs-number">14</span>)
plt.rc(<span class="hljs-string">'legend'</span>, fontsize=<span class="hljs-number">14</span>)
plt.rc(<span class="hljs-string">'xtick'</span>, labelsize=<span class="hljs-number">10</span>)
plt.rc(<span class="hljs-string">'ytick'</span>, labelsize=<span class="hljs-number">10</span>)
</code></pre>
<pre><code class="lang-python">df[<span class="hljs-string">"2021-03"</span>:<span class="hljs-string">"2021-05"</span>].plot(grid=<span class="hljs-literal">True</span>, marker=<span class="hljs-string">"."</span>, figsize=(<span class="hljs-number">8</span>, <span class="hljs-number">3.5</span>))
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769869967769/4e5c0926-0df0-44a1-8698-37e7cb2a3f88.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-data-analysis">Data Analysis</h3>
<p>Here is a comprehensive analysis of the time series data concerning Bus and Rail ridership from approximately March to May 2021.</p>
<p><strong>Daily Time Series:</strong></p>
<ul>
<li><p><strong>Bus Ridership (blue):</strong> Ranges from about 160,000 to 360,000.</p>
</li>
<li><p><strong>Rail Ridership (orange):</strong> Ranges from approximately 95,000 to 225,000.</p>
</li>
</ul>
<p><strong>Key Observations:</strong></p>
<ol>
<li><p><strong>Strong Weekly Seasonality:</strong> Both bus and rail ridership exhibit a pronounced pattern of weekly fluctuations.</p>
</li>
<li><p><strong>Upward Trend:</strong> There is a noticeable increase in ridership over time for both modes of transport.</p>
</li>
<li><p><strong>Highly Synchronized Dips:</strong> Whenever bus ridership declines, rail ridership also tends to decline, and vice versa during increases.</p>
</li>
<li><p><strong>Increasing Variability Over Time:</strong> The variability in ridership data appears to be growing.</p>
</li>
</ol>
<p>Given these characteristics, multivariate forecasting models, such as the Multivariate Long Short-Term Memory (LSTM) model, are likely to perform better than univariate models in predicting future ridership trends.</p>
<h3 id="heading-preparing-data-for-model">Preparing data for model</h3>
<p>Let's create a multivariate dataset using both the rail and bus series</p>
<pre><code class="lang-python">df_mulvar = df[[<span class="hljs-string">"rail"</span>, <span class="hljs-string">"bus"</span>]] / <span class="hljs-number">1e6</span>  <span class="hljs-comment"># use both rail &amp; bus series as input</span>
df_mulvar.head()
</code></pre>
<p>The first 5 rows are shown below</p>
<pre><code class="lang-python">
rail    bus
date        
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-01</span>    <span class="hljs-number">0.126455</span>    <span class="hljs-number">0.297192</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-02</span>    <span class="hljs-number">0.501952</span>    <span class="hljs-number">0.780827</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-03</span>    <span class="hljs-number">0.536432</span>    <span class="hljs-number">0.824923</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-04</span>    <span class="hljs-number">0.550011</span>    <span class="hljs-number">0.870021</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-05</span>    <span class="hljs-number">0.557917</span>    <span class="hljs-number">0.890426</span>
</code></pre>
<p>We add the day type to the data as a one-hot encoding over the three day types:</p>
<p>next_day_type_A: Saturday<br />next_day_type_U: Sunday and holidays<br />next_day_type_W: weekdays</p>
<pre><code class="lang-python">df_mulvar[<span class="hljs-string">"next_day_type"</span>] = df[<span class="hljs-string">"day_type"</span>].shift(<span class="hljs-number">-1</span>)  <span class="hljs-comment"># we know tomorrow's type</span>
df_mulvar = pd.get_dummies(df_mulvar, dtype=float)  <span class="hljs-comment"># one-hot encode day type</span>
df_mulvar.head()
</code></pre>
<pre><code class="lang-python">
rail    bus    next_day_type_A    next_day_type_U    next_day_type_W
date                    
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-01</span>    <span class="hljs-number">0.126455</span>    <span class="hljs-number">0.297192</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">1.0</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-02</span>    <span class="hljs-number">0.501952</span>    <span class="hljs-number">0.780827</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">1.0</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-03</span>    <span class="hljs-number">0.536432</span>    <span class="hljs-number">0.824923</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">1.0</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-04</span>    <span class="hljs-number">0.550011</span>    <span class="hljs-number">0.870021</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">1.0</span>
<span class="hljs-number">2001</span><span class="hljs-number">-01</span><span class="hljs-number">-05</span>    <span class="hljs-number">0.557917</span>    <span class="hljs-number">0.890426</span>    <span class="hljs-number">1.0</span>    <span class="hljs-number">0.0</span>    <span class="hljs-number">0.0</span>
</code></pre>
<p>The following code generates sliding windows from a time series, which can be used for predicting the next value in the sequence. The input parameters consist of the time series data itself and a specified window length, denoted as T. This method enables the extraction of overlapping segments of the series, facilitating the analysis and forecasting of future data points based on the patterns observed in these windows.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MulvarTimeSeriesDataset</span>(<span class="hljs-params">TimeSeriesDataset</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__getitem__</span>(<span class="hljs-params">self, idx</span>):</span>
        window, target = super().__getitem__(idx)
        <span class="hljs-keyword">return</span> window, target[:<span class="hljs-number">2</span>]
</code></pre>
<p>We now create train, validation, and test tensors through date slicing. Using pandas' label-based time slicing, the expression <strong>df_mulvar["2016-01":"2018-12"]</strong> selects all columns of the multivariate frame for the specified date range.</p>
<p>The rail and bus counts were already scaled by dividing by 1 million (1e6) when we built <strong>df_mulvar</strong>. Keeping the values numerically small contributes to training stability.</p>
<p>The splits are as follows:</p>
<ul>
<li><p><strong>rail_train</strong>: January 2016 to December 2018</p>
</li>
<li><p><strong>rail_valid</strong>: January 2019 to May 2019</p>
</li>
<li><p><strong>rail_test</strong>: June 2019 to the end of the data</p>
</li>
</ul>
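<p>One subtlety worth noting: pandas label-based slicing on a DatetimeIndex includes <em>both</em> endpoints, unlike ordinary positional slicing. A tiny demo with made-up data (not the CTA set):</p>

```python
import numpy as np
import pandas as pd

# Toy daily series to show inclusive date slicing.
idx = pd.date_range("2016-01-01", periods=10, freq="D")
df_demo = pd.DataFrame({"rail": np.arange(10)}, index=idx)
sliced = df_demo["2016-01-03":"2016-01-05"]
print(len(sliced))   # 3 – Jan 3, 4 and 5 are all included
```

<p>So the three splits above partition the data cleanly without any overlapping days.</p>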
<p>Typically, these tensors can be fed into a <strong>TimeSeriesDataset</strong> with a specified <strong>window_length</strong> to generate sliding windows for next-day predictions.</p>
<pre><code class="lang-python">mulvar_train = torch.FloatTensor(df_mulvar[<span class="hljs-string">"2016-01"</span>:<span class="hljs-string">"2018-12"</span>].values)
mulvar_valid = torch.FloatTensor(df_mulvar[<span class="hljs-string">"2019-01"</span>:<span class="hljs-string">"2019-05"</span>].values)
mulvar_test = torch.FloatTensor(df_mulvar[<span class="hljs-string">"2019-06"</span>:].values)
</code></pre>
<p>Create a <strong>MulvarTimeSeriesDataset</strong> and <strong>DataLoader</strong> for the <strong>train, validation, and test</strong> splits</p>
<pre><code class="lang-python">window_length = <span class="hljs-number">56</span>
mulvar_train_set = MulvarTimeSeriesDataset(mulvar_train, window_length)
mulvar_train_loader = DataLoader(mulvar_train_set, batch_size=<span class="hljs-number">32</span>, shuffle=<span class="hljs-literal">True</span>)
mulvar_valid_set = MulvarTimeSeriesDataset(mulvar_valid, window_length)
mulvar_valid_loader = DataLoader(mulvar_valid_set, batch_size=<span class="hljs-number">32</span>)
mulvar_test_set = MulvarTimeSeriesDataset(mulvar_test, window_length)
mulvar_test_loader = DataLoader(mulvar_test_set, batch_size=<span class="hljs-number">32</span>)
</code></pre>
<p>Create the following functions to evaluate and train the <strong>LSTM model</strong>:</p>
<ul>
<li><p><strong>evaluate_tm</strong>(<strong>model, data_loader, metric</strong>)</p>
</li>
<li><p><strong>train(model, optimizer, loss_fn, metric, train_loader, valid_loader, n_epochs, patience=10, factor=0.1)</strong></p>
</li>
</ul>
<h3 id="heading-evaltm-functionmodel-dataloader-metric">evaluate_tm function(<strong>model, data_loader, metric</strong>)</h3>
<p>The following process evaluates a trained model on a dataset and returns a performance metric (e.g., Mean Absolute Error (MAE) or Accuracy) without updating the model weights.</p>
<ol>
<li><p><strong>Evaluation Mode:</strong> Use <code>model.eval()</code> to set the model in evaluation mode. This adjusts the behavior of layers like dropout and batch normalization to ensure consistent results during inference.</p>
</li>
<li><p><strong>No Gradients:</strong> Employ <code>torch.no_grad()</code> to disable gradient tracking. This saves memory and speeds up the evaluation process.</p>
</li>
<li><p><strong>Batch Loop:</strong> For each batch of data:</p>
<ul>
<li><p>Move <code>X_batch</code> and <code>y_batch</code> to the appropriate device (e.g., GPU).</p>
</li>
<li><p>Compute predictions: <code>y_pred = model(X_batch)</code>.</p>
</li>
<li><p>Update the performance metric using the predictions and the true targets.</p>
</li>
</ul>
</li>
<li><p><strong>Metric Lifecycle:</strong></p>
<ul>
<li><p>Call <code>metric.reset()</code> to clear any prior state.</p>
</li>
<li><p>Use <code>metric.compute()</code> to return the aggregated score across all batches.</p>
</li>
</ul>
</li>
</ol>
<p><strong>Inputs:</strong></p>
<ul>
<li><p><code>model</code> (an instance of <code>nn.Module</code>)</p>
</li>
<li><p><code>data_loader</code> (provides batches for evaluation)</p>
</li>
<li><p><code>metric</code> (an object from <code>torchmetrics</code>)</p>
</li>
</ul>
<p><strong>Output:</strong> The evaluation process outputs a single scalar metric that summarizes the model's performance over the entire data loader.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">evaluate_tm</span>(<span class="hljs-params">model, data_loader, metric</span>):</span>
    model.eval()
    metric.reset()
    <span class="hljs-keyword">with</span> torch.no_grad():
        <span class="hljs-keyword">for</span> X_batch, y_batch <span class="hljs-keyword">in</span> data_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            y_pred = model(X_batch)
            metric.update(y_pred, y_batch)
    <span class="hljs-keyword">return</span> metric.compute()
</code></pre>
<h3 id="heading-trainmodel-optimizer-lossfn-metric-trainloader-validloader-nepochs-patience10-factor01"><strong>train(model, optimizer, loss_fn, metric, train_loader, valid_loader, n_epochs, patience=10, factor=0.1)</strong></h3>
<p>This function orchestrates the model training process while tracking metrics and adapting the learning rate.</p>
<p><strong>Inputs:</strong> It requires the following parameters: <code>model</code>, <code>optimizer</code>, <code>loss_fn</code>, <code>metric</code>, <code>train_loader</code>, <code>valid_loader</code>, <code>n_epochs</code>, <code>patience</code>, and <code>factor</code>.</p>
<p><strong>Scheduler:</strong> The function employs <code>ReduceLROnPlateau</code> with <code>mode="min"</code> to decrease the learning rate when there is no improvement in the validation metric.</p>
<p><strong>Loop:</strong> For each epoch, the function computes <code>y_pred</code>, calculates the loss, performs backpropagation (<code>loss.backward()</code>), updates the model parameters (<code>optimizer.step()</code>), resets the gradients (<code>optimizer.zero_grad()</code>), and updates the metric.</p>
<p><strong>Logging:</strong> It records the average training loss and training metric, evaluates the validation metric using <code>evaluate_tm()</code>, and prints a concise summary of the epoch.</p>
<p><strong>Output:</strong> The function returns a history object containing <code>train_losses</code>, <code>train_metrics</code>, and <code>valid_metrics</code>, which can be used for plotting and analysis.</p>
<pre><code class="lang-python">
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">model, optimizer, loss_fn, metric, train_loader, valid_loader,
          n_epochs, patience=<span class="hljs-number">10</span>, factor=<span class="hljs-number">0.1</span></span>):</span>
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode=<span class="hljs-string">"min"</span>, patience=patience, factor=factor)
    history = {<span class="hljs-string">"train_losses"</span>: [], <span class="hljs-string">"train_metrics"</span>: [], <span class="hljs-string">"valid_metrics"</span>: []}
    <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(n_epochs):
        total_loss = <span class="hljs-number">0.0</span>
        metric.reset()
        model.train()
        <span class="hljs-keyword">for</span> X_batch, y_batch <span class="hljs-keyword">in</span> train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            total_loss += loss.item()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            metric.update(y_pred, y_batch)
        history[<span class="hljs-string">"train_losses"</span>].append(total_loss / len(train_loader))
        history[<span class="hljs-string">"train_metrics"</span>].append(metric.compute().item())
        val_metric = evaluate_tm(model, valid_loader, metric).item()
        history[<span class="hljs-string">"valid_metrics"</span>].append(val_metric)
        scheduler.step(val_metric)
        print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch + <span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{n_epochs}</span>, "</span>
              <span class="hljs-string">f"train loss: <span class="hljs-subst">{history[<span class="hljs-string">'train_losses'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>, "</span>
              <span class="hljs-string">f"train metric: <span class="hljs-subst">{history[<span class="hljs-string">'train_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>, "</span>
              <span class="hljs-string">f"valid metric: <span class="hljs-subst">{history[<span class="hljs-string">'valid_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>"</span>)
    <span class="hljs-keyword">return</span> history
</code></pre>
<p>The utility function <strong>fit_and_evaluate()</strong> will be called to train the model.</p>
<ul>
<li><p>Trains a model on <code>train_loader</code>, evaluates on <code>valid_loader</code>, and returns the best validation score scaled back to riders.</p>
</li>
<li><p>Loss: Uses <code>nn.HuberLoss()</code> for robustness to outliers compared to MSE.</p>
</li>
<li><p>Optimizer: Applies <code>SGD</code> with momentum <code>0.95</code> and user-provided learning rate <code>lr</code> for stable convergence.</p>
</li>
<li><p>Metric: Tracks <code>torchmetrics.MeanAbsoluteError</code> (<code>MAE</code>) on the active <code>device</code>.</p>
</li>
<li><p>Training: Delegates to <code>train()</code> with <code>n_epochs</code>, <code>patience</code>, and <code>factor</code> (for <code>ReduceLROnPlateau</code> inside <code>train()</code>), recording loss and metrics each epoch.</p>
</li>
<li><p>Returns <code>min(history["valid_metrics"]) * 1e6</code> — the best validation <code>MAE</code> rescaled from “millions” back to riders for an intuitive number.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># extra code – defines a utility function we'll reuse several times</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit_and_evaluate</span>(<span class="hljs-params">model, train_loader, valid_loader, lr, n_epochs=<span class="hljs-number">50</span>,
                     patience=<span class="hljs-number">20</span>, factor=<span class="hljs-number">0.1</span></span>):</span>
    loss_fn = nn.HuberLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=<span class="hljs-number">0.95</span>)
    metric = torchmetrics.MeanAbsoluteError().to(device)
    history = train(model, optimizer, loss_fn, metric,
                    train_loader, valid_loader, n_epochs=n_epochs,
                    patience=patience, factor=factor)
    <span class="hljs-keyword">return</span> min(history[<span class="hljs-string">"valid_metrics"</span>]) * <span class="hljs-number">1e6</span>
</code></pre>
<h3 id="heading-lstm-module">LSTM module</h3>
<p>This model unrolls an <strong>LSTMCell</strong> over the time dimension (many-to-one). It takes the final hidden state and maps it to the output.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">LstmModel</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, input_size, hidden_size, output_size</span>):</span>
        super().__init__()
        self.hidden_size = hidden_size
        self.memory_cell = nn.LSTMCell(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, output_size)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, X</span>):</span>
        batch_size, window_length, dimensionality = X.shape
        X_time_first = X.transpose(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>)
        H = torch.zeros(batch_size, self.hidden_size, device=X.device)
        C = torch.zeros(batch_size, self.hidden_size, device=X.device)
        <span class="hljs-keyword">for</span> X_t <span class="hljs-keyword">in</span> X_time_first:
            H, C = self.memory_cell(X_t, (H, C))
        <span class="hljs-keyword">return</span> self.output(H)
</code></pre>
<pre><code class="lang-plaintext">Input X (batch, T, input_size)
          │
   time-first transpose → X_time_first (T, batch, input_size)
          │
  X_1 ──► [ LSTMCell ] ──► H_1, C_1
          │
  X_2 ──► [ LSTMCell ] ──► H_2, C_2
          │
   …      [ LSTMCell ]     …
          │
  X_T ──► [ LSTMCell ] ──► H_T, C_T
          │
          ▼
   take final hidden H_T (batch, hidden_size)
          │
          ▼
   Linear(hidden_size → output_size)
          │
          ▼
   y_pred (batch, output_size)
</code></pre>
<ul>
<li><p><code>LSTMCell</code> runs one step at a time; <code>nn.LSTM</code> can process all steps at once (faster on GPU).</p>
</li>
<li><p>Many-to-one setup: a single output per sequence, taken from the last hidden state.</p>
</li>
<li><p>For many-to-many tasks, apply a head to each time step (e.g., over all hidden states).</p>
</li>
</ul>
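<p>For comparison, here is a sketch (not from the original post) of the same many-to-one model built on <code>nn.LSTM</code>, which processes the whole window in one call:</p>

```python
import torch
import torch.nn as nn

class LstmModelFast(nn.Module):
    """Same many-to-one architecture as LstmModel, but using nn.LSTM
    so the time loop runs inside the library instead of Python."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, output_size)

    def forward(self, X):                  # X: (batch, T, input_size)
        out, (h_n, c_n) = self.lstm(X)     # out: (batch, T, hidden_size)
        return self.output(out[:, -1])     # use the last hidden state only
```

<p>With <code>batch_first=True</code> no manual transpose is needed, and the initial states default to zeros, matching the explicit loop above.</p>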
<h3 id="heading-train-the-model">Train the Model</h3>
<pre><code class="lang-python">torch.manual_seed(<span class="hljs-number">42</span>)
Lstm_model = LstmModel(
    input_size=<span class="hljs-number">5</span>, hidden_size=<span class="hljs-number">32</span>, output_size=<span class="hljs-number">2</span>).to(device)
fit_and_evaluate(Lstm_model, mulvar_train_loader, mulvar_valid_loader, lr=<span class="hljs-number">0.05</span>, n_epochs=<span class="hljs-number">50</span>)
</code></pre>
<pre><code class="lang-python">Epoch <span class="hljs-number">1</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0675</span>, train metric: <span class="hljs-number">0.3049</span>, valid metric: <span class="hljs-number">0.2044</span>
Epoch <span class="hljs-number">2</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0184</span>, train metric: <span class="hljs-number">0.1556</span>, valid metric: <span class="hljs-number">0.1573</span>
Epoch <span class="hljs-number">3</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0104</span>, train metric: <span class="hljs-number">0.1184</span>, valid metric: <span class="hljs-number">0.0974</span>
...
Epoch <span class="hljs-number">47</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0013</span>, train metric: <span class="hljs-number">0.0364</span>, valid metric: <span class="hljs-number">0.0265</span>
Epoch <span class="hljs-number">48</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0013</span>, train metric: <span class="hljs-number">0.0365</span>, valid metric: <span class="hljs-number">0.0258</span>
Epoch <span class="hljs-number">49</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0013</span>, train metric: <span class="hljs-number">0.0363</span>, valid metric: <span class="hljs-number">0.0305</span>
Epoch <span class="hljs-number">50</span>/<span class="hljs-number">50</span>, train loss: <span class="hljs-number">0.0013</span>, train metric: <span class="hljs-number">0.0368</span>, valid metric: <span class="hljs-number">0.0266</span>
</code></pre>
<h3 id="heading-evaluate-the-model-on-test-dataset">Evaluate the model on test dataset</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Evaluate the trained LSTM model on the test set</span>
metric = torchmetrics.MeanAbsoluteError().to(device)
test_mae = evaluate_tm(Lstm_model, mulvar_test_loader, metric).item()
print(<span class="hljs-string">f"Test MAE: <span class="hljs-subst">{test_mae:<span class="hljs-number">.6</span>f}</span> (<span class="hljs-subst">{test_mae*<span class="hljs-number">1e6</span>:<span class="hljs-number">.2</span>f}</span> riders)"</span>)
</code></pre>
<p><strong>Test MAE: 0.134912 (134911.54 riders)</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769873715148/dacc26b1-4103-4703-a017-789d699cc750.png" alt class="image--center mx-auto" /></p>
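<p>The helper <code>evaluate_tm</code> is defined earlier in this series. If it isn't in scope, a minimal stand-in (an assumption, not necessarily the author's exact definition) that accumulates a torchmetrics-style metric over a DataLoader looks like this; it assumes model, batches, and metric already live on the same device:</p>

```python
import torch

def evaluate_tm(model, loader, metric):
    """Sketch: accumulate a torchmetrics-style metric over a DataLoader."""
    model.eval()
    metric.reset()
    with torch.no_grad():
        for X, y in loader:
            metric.update(model(X), y)  # metric tracks a running aggregate
    return metric.compute()
```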
<h3 id="heading-predicting-future-date">Predicting future date</h3>
<p>Let's print the last 5 rows of our dataset:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Also show the last 5 rows for context</span>
print(<span class="hljs-string">"\nLast 5 rows:"</span>)
print(df_mulvar.tail(<span class="hljs-number">5</span>))
</code></pre>
<pre><code class="lang-python">
Last <span class="hljs-number">5</span> rows:
                rail       bus  next_day_type_A  next_day_type_U  \
date                                                               
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span>  <span class="hljs-number">0.189694</span>  <span class="hljs-number">0.257700</span>              <span class="hljs-number">1.0</span>              <span class="hljs-number">0.0</span>   
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-27</span>  <span class="hljs-number">0.187065</span>  <span class="hljs-number">0.237839</span>              <span class="hljs-number">0.0</span>              <span class="hljs-number">1.0</span>   
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-28</span>  <span class="hljs-number">0.147830</span>  <span class="hljs-number">0.184817</span>              <span class="hljs-number">0.0</span>              <span class="hljs-number">0.0</span>   
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-29</span>  <span class="hljs-number">0.276090</span>  <span class="hljs-number">0.421322</span>              <span class="hljs-number">0.0</span>              <span class="hljs-number">0.0</span>   
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-30</span>  <span class="hljs-number">0.302349</span>  <span class="hljs-number">0.450230</span>              <span class="hljs-number">0.0</span>              <span class="hljs-number">0.0</span>   

            next_day_type_W  
date                         
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-26</span>              <span class="hljs-number">0.0</span>  
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-27</span>              <span class="hljs-number">0.0</span>  
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-28</span>              <span class="hljs-number">1.0</span>  
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-29</span>              <span class="hljs-number">1.0</span>  
<span class="hljs-number">2021</span><span class="hljs-number">-11</span><span class="hljs-number">-30</span>              <span class="hljs-number">0.0</span>
</code></pre>
<p>The day-type one-hot columns encode:</p>
<ul>
<li><p>'<strong>W</strong>' (weekday)</p>
</li>
<li><p>'<strong>A</strong>' (Saturday)</p>
</li>
<li><p>'<strong>U</strong>' (Sunday/holiday)</p>
</li>
</ul>
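<p>Columns like these are what one-hot encoding a day-type string produces. A quick sketch with hypothetical data (assuming pandas' <code>get_dummies</code>, which the article's preprocessing likely used):</p>

```python
import pandas as pd

# Hypothetical day-type series for four consecutive days
s = pd.Series(["W", "A", "U", "W"], name="next_day_type")
one_hot = pd.get_dummies(s, prefix="next_day_type", dtype=float)
print(one_hot.columns.tolist())
# ['next_day_type_A', 'next_day_type_U', 'next_day_type_W']
```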
<p>Let's make a prediction for <strong>W (weekday)</strong>.</p>
<p>We will build a single time-series instance using a sliding window of <strong>56</strong> days to feed into the <strong>RNN</strong> model. The one-hot encoding for "<strong>next_day_type_W</strong>" in the last row is set to "1" (weekday), so the model predicts the next day's output (y) for both rail and bus riders.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Predict next day's ridership using the last window and a chosen day type</span>
<span class="hljs-keyword">try</span>:
    future_day_type = <span class="hljs-string">'W'</span>  <span class="hljs-comment"># options: 'W' (weekday), 'A' (Saturday), 'U' (Sunday/holiday)</span>


    <span class="hljs-comment"># Build the window (use defined window_length or default to 56)</span>
    window_len = window_length <span class="hljs-keyword">if</span> <span class="hljs-string">'window_length'</span> <span class="hljs-keyword">in</span> globals() <span class="hljs-keyword">else</span> <span class="hljs-number">56</span>
    X_window = df_mulvar.tail(window_len).values.copy()

    print(<span class="hljs-string">f"Window shape: <span class="hljs-subst">{X_window.shape}</span>"</span>)

    <span class="hljs-comment"># Set the last row's day-type one-hots to the chosen future type</span>
    <span class="hljs-comment"># (columns: 2 = 'A', 3 = 'U', 4 = 'W')</span>
    X_window[<span class="hljs-number">-1</span>, <span class="hljs-number">2</span>:] = <span class="hljs-number">0.0</span>
    X_window[<span class="hljs-number">-1</span>, <span class="hljs-number">4</span>] = <span class="hljs-number">1.0</span>

    print(<span class="hljs-string">f"Modified last row of window for future day type '<span class="hljs-subst">{future_day_type}</span>':"</span>)
    print(X_window[<span class="hljs-number">-1</span>])

    <span class="hljs-comment"># Predict</span>
    Lstm_model.eval()
    <span class="hljs-keyword">with</span> torch.no_grad():
        X_t = torch.FloatTensor(X_window).unsqueeze(<span class="hljs-number">0</span>).to(device)  <span class="hljs-comment"># (1, T, 5)</span>
        print(<span class="hljs-string">f"Input tensor shape for prediction: <span class="hljs-subst">{X_t.shape}</span>"</span>)
        y_pred = Lstm_model(X_t).squeeze(<span class="hljs-number">0</span>).cpu().numpy() 


    rail_pred_m, bus_pred_m = float(y_pred[<span class="hljs-number">0</span>]), float(y_pred[<span class="hljs-number">1</span>])
    last_date = df_mulvar.index[<span class="hljs-number">-1</span>]
    future_date = pd.to_datetime(last_date) + pd.Timedelta(days=<span class="hljs-number">1</span>)

    print(<span class="hljs-string">f"Future date: <span class="hljs-subst">{future_date.date()}</span> (day type=<span class="hljs-subst">{future_day_type}</span>)"</span>)
    print(<span class="hljs-string">f"Predicted (millions): rail=<span class="hljs-subst">{rail_pred_m:<span class="hljs-number">.6</span>f}</span>, bus=<span class="hljs-subst">{bus_pred_m:<span class="hljs-number">.6</span>f}</span>"</span>)
    print(<span class="hljs-string">f"Predicted (riders):   rail=<span class="hljs-subst">{rail_pred_m*<span class="hljs-number">1e6</span>:<span class="hljs-number">.0</span>f}</span>, bus=<span class="hljs-subst">{bus_pred_m*<span class="hljs-number">1e6</span>:<span class="hljs-number">.0</span>f}</span>"</span>)
<span class="hljs-keyword">except</span> NameError <span class="hljs-keyword">as</span> e:
    print(<span class="hljs-string">"Required variables not defined. Ensure df_mulvar, window_length, device, and Lstm_model exist."</span>)
    print(<span class="hljs-string">"Error:"</span>, e)
<span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
    print(<span class="hljs-string">"Prediction failed:"</span>, e)
</code></pre>
<pre><code class="lang-python">Window shape: (<span class="hljs-number">56</span>, <span class="hljs-number">5</span>)
Modified last row of window <span class="hljs-keyword">for</span> future day type <span class="hljs-string">'W'</span>:
[<span class="hljs-number">0.302349</span> <span class="hljs-number">0.45023</span>  <span class="hljs-number">0.</span>       <span class="hljs-number">0.</span>       <span class="hljs-number">1.</span>      ]
Input tensor shape <span class="hljs-keyword">for</span> prediction: torch.Size([<span class="hljs-number">1</span>, <span class="hljs-number">56</span>, <span class="hljs-number">5</span>])
Future date: <span class="hljs-number">2021</span><span class="hljs-number">-12</span><span class="hljs-number">-01</span> (day type=W)
Predicted (millions): rail=<span class="hljs-number">0.481875</span>, bus=<span class="hljs-number">0.560757</span>
Predicted (riders):   rail=<span class="hljs-number">481875</span>, bus=<span class="hljs-number">560757</span>
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Implementing a ResNet-34 CNN Using PyTorch]]></title><description><![CDATA[A while ago, I authored an article Implementing ResNet CNN that provided a detailed explanation of ResNet Convolutional Neural Networks (CNN) along with an implementation using TensorFlow. In this upcoming article, we will take a closer look at ResNe...]]></description><link>https://path2ml.com/implementing-a-resnet-34-cnn-using-pytorch</link><guid isPermaLink="true">https://path2ml.com/implementing-a-resnet-34-cnn-using-pytorch</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[pytorch]]></category><category><![CDATA[CNN]]></category><category><![CDATA[CNNs (Convolutional Neural Networks)]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Thu, 29 Jan 2026 02:41:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769651820908/f89ef455-104f-4fd0-abca-4c7506cbd2a6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A while ago, I authored an article <a target="_blank" href="https://path2ml.com/implementing-resnet-cnn"><strong>Implementing ResNet CNN</strong></a> that provided a detailed explanation of <strong>ResNet Convolutional Neural Networks (CNN)</strong> along with an implementation using <strong>TensorFlow</strong>. In this article, we take a closer look at <strong>ResNet34</strong>, a specific variant of the <strong>ResNet</strong> architecture, and implement it using <strong>PyTorch</strong>. This lets us explore PyTorch's features while leveraging <strong>ResNet34</strong> for a range of deep learning tasks.</p>
<h2 id="heading-resnet34-architecture">ResNet34 Architecture</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769652364117/dfc28772-4dbf-470b-85aa-72b7a4468d06.png" alt class="image--center mx-auto" /></p>
<p>The ResNet34 class constructs a complete ResNet-34 network consisting of 34 layers. Here’s a breakdown of its structure:</p>
<h3 id="heading-1-stem-initial-layers"><strong>1. Stem (Initial Layers):</strong></h3>
<ul>
<li><p><strong>Conv2d:</strong> Converts 3 input channels to 64 filters with a 7×7 kernel and a stride of 2, which downsamples the image by a factor of 2.</p>
</li>
<li><p><strong>BatchNorm2d + ReLU:</strong> Applies batch normalization followed by the ReLU activation function.</p>
</li>
<li><p><strong>MaxPool2d:</strong> Further downsamples the image with a stride of 2.</p>
</li>
</ul>
<h3 id="heading-2-residual-blocks-core"><strong>2. Residual Blocks (Core):</strong></h3>
<p>The network comprises ResidualUnits grouped into four sections:</p>
<ul>
<li><p><strong>Stage 1:</strong> 3 units with 64 filters (stride = 1, maintaining spatial dimensions).</p>
</li>
<li><p><strong>Stage 2:</strong> 4 units with 128 filters (the first unit has a stride of 2 for downsampling, while the remaining units have a stride of 1).</p>
</li>
<li><p><strong>Stage 3:</strong> 6 units with 256 filters (the first unit has a stride of 2, while the rest have a stride of 1).</p>
</li>
<li><p><strong>Stage 4:</strong> 3 units with 512 filters (the first unit has a stride of 2, while the rest have a stride of 1).</p>
</li>
</ul>
<p>In total, there are 3 + 4 + 6 + 3 = <strong>16 residual blocks</strong>, giving 32 convolutional layers; together with the initial 7×7 convolution and the final fully connected layer, that makes <strong>34 weighted layers</strong> overall.</p>
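<p>The layer count is easy to verify with a couple of lines (a sanity check, not part of the model code):</p>

```python
units_per_stage = [3, 4, 6, 3]             # residual units per stage
residual_convs = sum(units_per_stage) * 2  # two 3x3 convs per unit -> 32
total_layers = residual_convs + 1 + 1      # + initial 7x7 conv + final FC
print(sum(units_per_stage), residual_convs, total_layers)  # 16 32 34
```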
<p><strong>Stride Logic:</strong></p>
<ul>
<li><p>A stride of 2 is used when the number of filters changes, which reduces spatial resolution and increases the number of channels.</p>
</li>
<li><p>A stride of 1 is maintained when the number of filters remains the same.</p>
</li>
</ul>
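<p>The stride logic can be sketched on its own before we see the full model: stride 2 exactly when the filter count changes (variable names here are illustrative):</p>

```python
# Build the (filters, stride) plan for the 16 residual units
prev_filters = 64
plan = []
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    plan.append((filters, 1 if filters == prev_filters else 2))
    prev_filters = filters
print(plan[:5])  # [(64, 1), (64, 1), (64, 1), (128, 2), (128, 1)]
```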
<h3 id="heading-3-classification-head"><strong>3. Classification Head:</strong></h3>
<ul>
<li><p><strong>AdaptiveAvgPool2d:</strong> Performs global average pooling, resulting in an output shape of (batch_size, 512, 1, 1).</p>
</li>
<li><p><strong>Flatten:</strong> Converts the output to a shape of (batch_size, 512).</p>
</li>
<li><p><strong>LazyLinear:</strong> Maps the flattened output from 512 to 10 classes.</p>
</li>
</ul>
<p><strong>Key Design Points:</strong></p>
<ul>
<li><p>Progressively reduces spatial dimensions (56 → 28 → 14 → 7) while increasing channels</p>
</li>
<li><p>Each stage transition uses stride=2 to halve dimensions</p>
</li>
<li><p>Skip connections allow gradients to flow through all 34 layers</p>
</li>
<li><p>Total parameters: ~21.3 million (with the 10-class head used here)</p>
</li>
</ul>
<p>Let's implement this in PyTorch.</p>
<h3 id="heading-import-the-packages">Import the packages</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_sample_images
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">import</span> torchvision
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torchvision.transforms.v2 <span class="hljs-keyword">as</span> T
<span class="hljs-keyword">from</span> functools <span class="hljs-keyword">import</span> partial
<span class="hljs-keyword">import</span> torchmetrics
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F
</code></pre>
<h3 id="heading-residualunit">ResidualUnit</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769653150484/4624e421-22be-46ea-8093-cc8d077abb48.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ResidualUnit</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, in_channels, out_channels, stride=<span class="hljs-number">1</span></span>):</span>
        super().__init__()
        DefaultConv2d = partial(
            nn.Conv2d, kernel_size=<span class="hljs-number">3</span>, stride=<span class="hljs-number">1</span>, padding=<span class="hljs-number">1</span>, bias=<span class="hljs-literal">False</span>)
        self.main_layers = nn.Sequential(
            DefaultConv2d(in_channels, out_channels, stride=stride),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            DefaultConv2d(out_channels, out_channels),
            nn.BatchNorm2d(out_channels),
        )
        <span class="hljs-keyword">if</span> stride &gt; <span class="hljs-number">1</span>:
            self.skip_connection = nn.Sequential(
                DefaultConv2d(in_channels, out_channels, kernel_size=<span class="hljs-number">1</span>,
                              stride=stride, padding=<span class="hljs-number">0</span>),
                nn.BatchNorm2d(out_channels),
            )
        <span class="hljs-keyword">else</span>:
            self.skip_connection = nn.Identity()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, inputs</span>):</span>
        <span class="hljs-keyword">return</span> F.relu(self.main_layers(inputs) + self.skip_connection(inputs))
</code></pre>
<p>The <strong>ResidualUnit</strong> class implements a <strong>residual block</strong>, the core building block of ResNet (Residual Networks). Here's the breakdown:</p>
<p><strong>Key Components</strong></p>
<p><strong>1. Main Path</strong></p>
<ul>
<li><p>Two convolutional blocks in sequence:</p>
<ul>
<li>Conv2d → BatchNorm2d → ReLU → Conv2d → BatchNorm2d</li>
</ul>
</li>
<li><p>The first Conv2d uses the stride parameter (for downsampling if needed)</p>
</li>
<li><p>The second Conv2d always uses stride=1</p>
</li>
</ul>
<p><strong>2. Skip Connection</strong></p>
<ul>
<li><p><strong>If</strong> stride <code>&gt; 1:</code> Creates a 1×1 convolution with the specified stride + batch norm (adjusts dimensions and spatial resolution)</p>
</li>
<li><p><strong>If</strong> stride <code>= 1:</code> Uses nn.Identity() (passes input unchanged)</p>
</li>
<li><p>This ensures the skip connection has the same dimensions as the main path output</p>
</li>
</ul>
<p><strong>3. Forward Pass</strong></p>
<ul>
<li><p>Adds the output of <code>main_layers</code> and <code>skip_connection</code></p>
</li>
<li><p>Applies ReLU activation to the sum</p>
</li>
</ul>
<p>The key innovation is <strong>addition of the skip connection</strong> to the main path. This allows:</p>
<ul>
<li><p>Gradients to bypass layers during backpropagation (easier training)</p>
</li>
<li><p>The network to learn residual mappings (differences) rather than full transformations</p>
</li>
<li><p>Training of very deep networks without degradation</p>
</li>
</ul>
<h2 id="heading-resnet34">ResNet34</h2>
<p>The <strong>ResNet34</strong> class builds the complete <strong>ResNet34</strong> architecture from the <strong>ResidualUnit</strong> class; the architecture diagram is shown above.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ResNet34</span>(<span class="hljs-params">nn.Module</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
        super().__init__()
        layers = [
            nn.Conv2d(in_channels=<span class="hljs-number">3</span>, out_channels=<span class="hljs-number">64</span>, kernel_size=<span class="hljs-number">7</span>, stride=<span class="hljs-number">2</span>,
                      padding=<span class="hljs-number">3</span>, bias=<span class="hljs-literal">False</span>),
            nn.BatchNorm2d(num_features=<span class="hljs-number">64</span>),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=<span class="hljs-number">3</span>, stride=<span class="hljs-number">2</span>, padding=<span class="hljs-number">1</span>),
        ]
        prev_filters = <span class="hljs-number">64</span>
        <span class="hljs-keyword">for</span> filters <span class="hljs-keyword">in</span> [<span class="hljs-number">64</span>] * <span class="hljs-number">3</span> + [<span class="hljs-number">128</span>] * <span class="hljs-number">4</span> + [<span class="hljs-number">256</span>] * <span class="hljs-number">6</span> + [<span class="hljs-number">512</span>] * <span class="hljs-number">3</span>:
            stride = <span class="hljs-number">1</span> <span class="hljs-keyword">if</span> filters == prev_filters <span class="hljs-keyword">else</span> <span class="hljs-number">2</span>
            layers.append(ResidualUnit(prev_filters, filters, stride=stride))
            prev_filters = filters
        layers += [
            nn.AdaptiveAvgPool2d(output_size=<span class="hljs-number">1</span>),
            nn.Flatten(),
            nn.LazyLinear(<span class="hljs-number">10</span>),
        ]
        self.resnet = nn.Sequential(*layers)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, inputs</span>):</span>
        <span class="hljs-keyword">return</span> self.resnet(inputs)
</code></pre>
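<p>A quick sanity check on output shape and parameter count (restating the two classes above so this cell runs standalone). Note that <code>nn.LazyLinear</code> has no parameters until it sees data, so we run one forward pass before counting:</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import partial

class ResidualUnit(nn.Module):  # as defined above
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        DefaultConv2d = partial(nn.Conv2d, kernel_size=3, stride=1, padding=1, bias=False)
        self.main_layers = nn.Sequential(
            DefaultConv2d(in_channels, out_channels, stride=stride),
            nn.BatchNorm2d(out_channels), nn.ReLU(),
            DefaultConv2d(out_channels, out_channels), nn.BatchNorm2d(out_channels))
        self.skip_connection = nn.Sequential(
            DefaultConv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0),
            nn.BatchNorm2d(out_channels)) if stride > 1 else nn.Identity()

    def forward(self, x):
        return F.relu(self.main_layers(x) + self.skip_connection(x))

class ResNet34(nn.Module):  # as defined above
    def __init__(self):
        super().__init__()
        layers = [nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
                  nn.BatchNorm2d(64), nn.ReLU(),
                  nn.MaxPool2d(kernel_size=3, stride=2, padding=1)]
        prev_filters = 64
        for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
            stride = 1 if filters == prev_filters else 2
            layers.append(ResidualUnit(prev_filters, filters, stride=stride))
            prev_filters = filters
        layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.LazyLinear(10)]
        self.resnet = nn.Sequential(*layers)

    def forward(self, x):
        return self.resnet(x)

model = ResNet34().eval()
with torch.no_grad():
    y = model(torch.randn(1, 3, 224, 224))   # materializes the LazyLinear layer
n_params = sum(p.numel() for p in model.parameters())
print(y.shape, f"{n_params:,}")  # torch.Size([1, 10]) 21,289,802
```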
<h3 id="heading-loading-the-cifar-10-dataset">Loading the CIFAR-10 dataset</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Load CIFAR-10 Dataset</span>
transform = T.Compose([
    T.ToImage(),
    T.ToDtype(torch.float32, scale=<span class="hljs-literal">True</span>),
    T.Normalize(mean=[<span class="hljs-number">0.485</span>, <span class="hljs-number">0.456</span>, <span class="hljs-number">0.406</span>], std=[<span class="hljs-number">0.229</span>, <span class="hljs-number">0.224</span>, <span class="hljs-number">0.225</span>])
])

<span class="hljs-comment"># Load full training set (download if not already present)</span>
train_valid_dataset = torchvision.datasets.CIFAR10(root=<span class="hljs-string">'./datasets'</span>, train=<span class="hljs-literal">True</span>, 
                                            download=<span class="hljs-literal">True</span>, transform=transform)
<span class="hljs-comment"># Load the held-out test set</span>
test_dataset = torchvision.datasets.CIFAR10(root=<span class="hljs-string">'./datasets'</span>, train=<span class="hljs-literal">False</span>, 
                                            download=<span class="hljs-literal">True</span>, transform=transform)

torch.manual_seed(<span class="hljs-number">42</span>)
train_dataset, valid_dataset = torch.utils.data.random_split(
    train_valid_dataset, [<span class="hljs-number">45</span>_000, <span class="hljs-number">5</span>_000]
)

<span class="hljs-comment"># Create Data Loaders</span>
batch_size = <span class="hljs-number">128</span>
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset, batch_size=batch_size, shuffle=<span class="hljs-literal">True</span>)
valid_loader = torch.utils.data.DataLoader(
    dataset=valid_dataset, batch_size=batch_size, shuffle=<span class="hljs-literal">False</span>
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset, batch_size=batch_size, shuffle=<span class="hljs-literal">False</span>
)

print(<span class="hljs-string">f"Training samples: <span class="hljs-subst">{len(train_dataset)}</span>"</span>)
print(<span class="hljs-string">f"Validation samples: <span class="hljs-subst">{len(valid_dataset)}</span>"</span>)
print(<span class="hljs-string">f"Testing  samples: <span class="hljs-subst">{len(test_dataset)}</span>"</span>)
</code></pre>
<p>The code below sets up the training environment for the <strong>ResNet34</strong> model. It:</p>
<ul>
<li><p>Selects the compute device (GPU if available, otherwise CPU)</p>
</li>
<li><p>Creates a new ResNet34 instance and moves it to that device</p>
</li>
<li><p>Defines the loss function for multi-class classification and the Adam optimizer</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># Setup for Training</span>
device = torch.device(<span class="hljs-string">"cuda"</span> <span class="hljs-keyword">if</span> torch.cuda.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"mps"</span> <span class="hljs-keyword">if</span> torch.backends.mps.is_available() <span class="hljs-keyword">else</span> <span class="hljs-string">"cpu"</span>)
print(<span class="hljs-string">f"Using device: <span class="hljs-subst">{device}</span>"</span>)

<span class="hljs-comment"># Initialize model</span>
model = ResNet34().to(device)

<span class="hljs-comment"># Loss function and optimizer</span>
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=<span class="hljs-number">0.001</span>)
</code></pre>
<p>The <code>train_epoch</code> function trains the model for one complete pass through the training dataset and returns average loss and accuracy for the entire epoch</p>
<h3 id="heading-step-by-step-breakdown"><strong>Step-by-Step Breakdown</strong></h3>
<p><strong>1.</strong> Set the model to training mode</p>
<p><strong>2.</strong> Initialize tracking variables</p>
<p><strong>3.</strong> Loop through each batch</p>
<p><strong>4.</strong> Forward pass: compute outputs and the loss</p>
<p><strong>5.</strong> Backward pass: compute gradients, then step the optimizer</p>
<ul>
<li><p>Updates all model weights using the computed gradients</p>
</li>
<li><p>Moves weights in the direction that reduces the loss</p>
</li>
</ul>
<p><strong>6.</strong> Track metrics: accumulate loss and accuracy per batch</p>
<pre><code class="lang-python"><span class="hljs-comment"># Training function</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train_epoch</span>(<span class="hljs-params">model, train_loader, criterion, optimizer, device</span>):</span>
    model.train()
    total_loss = <span class="hljs-number">0</span>
    correct = <span class="hljs-number">0</span>
    total = <span class="hljs-number">0</span>

    <span class="hljs-keyword">for</span> images, labels <span class="hljs-keyword">in</span> train_loader:
        images, labels = images.to(device), labels.to(device)

        <span class="hljs-comment"># Forward pass</span>
        outputs = model(images)
        loss = criterion(outputs, labels)

        <span class="hljs-comment"># Backward pass</span>
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _, predicted = torch.max(outputs.data, <span class="hljs-number">1</span>)
        total += labels.size(<span class="hljs-number">0</span>)
        correct += (predicted == labels).sum().item()

    avg_loss = total_loss / len(train_loader)
    accuracy = <span class="hljs-number">100</span> * correct / total
    <span class="hljs-keyword">return</span> avg_loss, accuracy
</code></pre>
<h3 id="heading-train-the-model">Train The Model</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Train for 10 epochs</span>
num_epochs = <span class="hljs-number">10</span>
train_losses = []
train_accs = []
valid_losses = []
valid_accs = []

print(<span class="hljs-string">"Starting training for 10 epochs..."</span>)
print(<span class="hljs-string">"="</span> * <span class="hljs-number">80</span>)

<span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(num_epochs):
    <span class="hljs-comment"># Train for one epoch</span>
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    train_losses.append(train_loss)
    train_accs.append(train_acc)

    <span class="hljs-comment"># Validate for one epoch</span>
    valid_loss, valid_acc = test_epoch(model, valid_loader, criterion, device)
    valid_losses.append(valid_loss)
    valid_accs.append(valid_acc)

    <span class="hljs-comment"># Print results for each epoch</span>
    print(<span class="hljs-string">f"Epoch [<span class="hljs-subst">{epoch+<span class="hljs-number">1</span>:<span class="hljs-number">2</span>d}</span>/<span class="hljs-subst">{num_epochs}</span>] | "</span>
          <span class="hljs-string">f"Train Loss: <span class="hljs-subst">{train_loss:<span class="hljs-number">.4</span>f}</span> | Train Acc: <span class="hljs-subst">{train_acc:<span class="hljs-number">6.2</span>f}</span>% | "</span>
          <span class="hljs-string">f"Valid Loss: <span class="hljs-subst">{valid_loss:<span class="hljs-number">.4</span>f}</span> | Valid Acc: <span class="hljs-subst">{valid_acc:<span class="hljs-number">6.2</span>f}</span>%"</span>)

print(<span class="hljs-string">"="</span> * <span class="hljs-number">80</span>) 
print(<span class="hljs-string">"Training completed!"</span>)

<span class="hljs-comment"># Save the trained model</span>
torch.save(model.state_dict(), <span class="hljs-string">'./my_resnet34_checkpoint.pt'</span>)
print(<span class="hljs-string">"Model saved to './my_resnet34_checkpoint.pt'"</span>)
</code></pre>
<pre><code class="lang-python">Starting training <span class="hljs-keyword">for</span> <span class="hljs-number">10</span> epochs...
================================================================================
Epoch [ <span class="hljs-number">1</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.1853</span> | Train Acc:  <span class="hljs-number">93.58</span>% | Valid Loss: <span class="hljs-number">0.6287</span> | Valid Acc:  <span class="hljs-number">82.22</span>%
Epoch [ <span class="hljs-number">2</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.1498</span> | Train Acc:  <span class="hljs-number">94.80</span>% | Valid Loss: <span class="hljs-number">0.7269</span> | Valid Acc:  <span class="hljs-number">80.48</span>%
Epoch [ <span class="hljs-number">3</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.1282</span> | Train Acc:  <span class="hljs-number">95.52</span>% | Valid Loss: <span class="hljs-number">0.7559</span> | Valid Acc:  <span class="hljs-number">80.24</span>%
Epoch [ <span class="hljs-number">4</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0961</span> | Train Acc:  <span class="hljs-number">96.71</span>% | Valid Loss: <span class="hljs-number">0.8131</span> | Valid Acc:  <span class="hljs-number">80.16</span>%
Epoch [ <span class="hljs-number">5</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0948</span> | Train Acc:  <span class="hljs-number">96.57</span>% | Valid Loss: <span class="hljs-number">0.8196</span> | Valid Acc:  <span class="hljs-number">80.94</span>%
Epoch [ <span class="hljs-number">6</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0853</span> | Train Acc:  <span class="hljs-number">97.06</span>% | Valid Loss: <span class="hljs-number">0.8924</span> | Valid Acc:  <span class="hljs-number">79.26</span>%
Epoch [ <span class="hljs-number">7</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0755</span> | Train Acc:  <span class="hljs-number">97.44</span>% | Valid Loss: <span class="hljs-number">0.8582</span> | Valid Acc:  <span class="hljs-number">80.14</span>%
Epoch [ <span class="hljs-number">8</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0661</span> | Train Acc:  <span class="hljs-number">97.79</span>% | Valid Loss: <span class="hljs-number">0.9182</span> | Valid Acc:  <span class="hljs-number">80.18</span>%
Epoch [ <span class="hljs-number">9</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0653</span> | Train Acc:  <span class="hljs-number">97.71</span>% | Valid Loss: <span class="hljs-number">0.9218</span> | Valid Acc:  <span class="hljs-number">80.42</span>%
Epoch [<span class="hljs-number">10</span>/<span class="hljs-number">10</span>] | Train Loss: <span class="hljs-number">0.0518</span> | Train Acc:  <span class="hljs-number">98.22</span>% | Valid Loss: <span class="hljs-number">0.9642</span> | Valid Acc:  <span class="hljs-number">79.80</span>%
================================================================================
Training completed!
Model saved to <span class="hljs-string">'./my_resnet34_checkpoint.pt'</span>
</code></pre>
<h3 id="heading-chart-of-losses-and-accuracy-for-training-and-validation-data">Chart of losses and accuracy for training and validation data</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769654124233/b14bc28a-3c0c-443c-ab52-96709d9b1168.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-python">======================================================================
TRAINING SUMMARY
======================================================================
Total Epochs Trained: 10

Final Metrics:
Train Loss: 0.0518 | Train Accuracy: 98.22%
Valid Loss: 0.9642 | Valid Accuracy: 79.80%

Best Validation Metrics:
Best Valid Accuracy: 82.22% (Epoch 1)
Best Valid Loss: 0.6287 (Epoch 1)
======================================================================
</code></pre>
<h3 id="heading-evaluating-on-test-data">Evaluating on Test Data</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Evaluate on Test Data</span>
print(<span class="hljs-string">"Evaluating model on test data..."</span>)
test_loss, test_acc = test_epoch(model, test_loader, criterion, device)
print(<span class="hljs-string">f"Test Loss: <span class="hljs-subst">{test_loss:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Test Accuracy: <span class="hljs-subst">{test_acc:<span class="hljs-number">.2</span>f}</span>%"</span>)
</code></pre>
<pre><code class="lang-python">Evaluating model on test data...
Test Loss: <span class="hljs-number">1.2146</span>
Test Accuracy: <span class="hljs-number">76.64</span>%
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Deep learning using Pytorch on Images dataset]]></title><description><![CDATA[The CIFAR-10 dataset is a widely used collection of images in the field of machine learning. It consists of 60,000 32x32 color images categorized into 10 different classes, with each class containing 6,000 images. These classes include airplanes, aut...]]></description><link>https://path2ml.com/deep-learning-using-pytorch-on-images-dataset</link><guid isPermaLink="true">https://path2ml.com/deep-learning-using-pytorch-on-images-dataset</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[pytorch]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 28 Jan 2026 01:18:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769562920414/8dec8c2e-bb22-4e32-9068-efa68886ee53.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <strong>CIFAR-10</strong> dataset is a widely used collection of images in the field of machine learning. It consists of 60,000 32x32 color images categorized into <strong>10</strong> different classes, with each class containing 6,000 images. These classes are airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks.</p>
<p>In the PyTorch vision library, the CIFAR-10 dataset can be easily accessed and utilized. PyTorch provides a convenient way to download the dataset and offers various transformations to preprocess the images for training and testing machine learning models. This makes it a popular choice for training and evaluating image classification algorithms.</p>
<p>In this blog, we will train a deep neural network on the <strong>CIFAR-10</strong> image dataset.</p>
<p>I have utilized Jupyter Notebook, which is installed on my <strong>Mac</strong>, to run my code. However, an excellent alternative is Google Colab, which allows for seamless execution of the code presented in this blog.</p>
<p>In order to work with the code effectively, there are specific packages that I need to install within my virtual environment. These packages include:</p>
<ul>
<li><p><strong>PyTorch</strong>: A powerful deep learning library that provides flexibility and ease of use for building and training neural networks.</p>
</li>
<li><p><strong>TorchMetrics</strong>: A library that offers a wide range of metrics specifically designed for evaluating the performance of machine learning models.</p>
</li>
<li><p><strong>TorchVision</strong>: A package that simplifies the process of loading and preprocessing datasets, including the popular CIFAR-10 dataset, which contains a variety of images used for training machine learning models.</p>
</li>
</ul>
<p>By setting up these packages, I can leverage the capabilities of deep learning and effectively work with image data.</p>
<h2 id="heading-loading-packages-and-cifar-10-dataset">Loading Packages and CIFAR-10 DataSet</h2>
<p>Let's load the packages we need:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> torchmetrics
<span class="hljs-keyword">import</span> torchvision
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn
<span class="hljs-keyword">import</span> torchvision.transforms.v2 <span class="hljs-keyword">as</span> T
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<p>Next, we detect and select the best available device for PyTorch computations:</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> torch.cuda.is_available():
    device = <span class="hljs-string">"cuda"</span>
<span class="hljs-keyword">elif</span> torch.backends.mps.is_available():
    device = <span class="hljs-string">"mps"</span>
<span class="hljs-keyword">else</span>:
    device = <span class="hljs-string">"cpu"</span>
</code></pre>
<p>Using TorchVision, we will load the dataset and split it into training, validation, and test sets.</p>
<p>We create a preprocessing <strong>pipeline</strong> that transforms images as they're loaded:</p>
<ol>
<li><p><strong>T.Compose([...])</strong> - Chains multiple transforms together in sequence. Each transform is applied in order.</p>
</li>
<li><p><strong>T.ToImage()</strong> - Converts images from various formats (PIL images, NumPy arrays, raw tensors) into TorchVision's Image class (a specialized tensor type). Standardizes the image format.</p>
</li>
<li><p><strong>T.ToDtype(torch.float32, scale=True)</strong> - Converts image values to 32-bit floats and scales them:</p>
<ul>
<li><p><strong>torch.float32</strong> - Sets data type to float32</p>
</li>
<li><p><strong>scale=True</strong> - Normalizes pixel values from their original range (0-255 for typical images) to <strong>0.0-1.0</strong></p>
</li>
</ul>
</li>
</ol>
<pre><code class="lang-python">toTensor = T.Compose([T.ToImage(), T.ToDtype(torch.float32, scale=<span class="hljs-literal">True</span>)])
train_and_valid_set = torchvision.datasets.CIFAR10(
    root=<span class="hljs-string">"datasets"</span>, train=<span class="hljs-literal">True</span>, download=<span class="hljs-literal">True</span>, transform=toTensor
)
test_set= torchvision.datasets.CIFAR10(
    root=<span class="hljs-string">"datasets"</span>, train=<span class="hljs-literal">False</span>, download=<span class="hljs-literal">True</span>, transform=toTensor
)
</code></pre>
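<p>To see what <strong>scale=True</strong> does numerically, here is a minimal pure-Python sketch of the rescaling step (the <code>scale_pixel</code> helper is illustrative, not part of TorchVision):</p>

```python
# Sketch of the scale=True step: map 8-bit pixel values (0-255) to [0.0, 1.0].
# scale_pixel is an illustrative helper, not a torchvision function.
def scale_pixel(v: int) -> float:
    return v / 255.0

pixels = [0, 64, 128, 255]
print([scale_pixel(p) for p in pixels])  # [0.0, ~0.251, ~0.502, 1.0]
```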
<ul>
<li><p>Randomly splits the 50,000 CIFAR-10 training images into two subsets:</p>
<ul>
<li><p><strong>train_set</strong> → 45,000 images (for training)</p>
</li>
<li><p><strong>valid_set</strong> → 5,000 images (for validation/testing model performance during training)</p>
</li>
</ul>
</li>
</ul>
<pre><code class="lang-python">torch.manual_seed(<span class="hljs-number">42</span>)
train_set, valid_set = torch.utils.data.random_split(
    train_and_valid_set, [<span class="hljs-number">45</span>_000, <span class="hljs-number">5</span>_000]
)
</code></pre>
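<p>Conceptually, <code>random_split</code> shuffles the 50,000 indices with a seeded generator and carves them into two disjoint subsets. A stdlib-only sketch of the same idea (using <code>random.Random</code> purely for illustration; <code>random_split</code> uses torch's own generator):</p>

```python
import random

# Illustrative stand-in for torch.utils.data.random_split: shuffle the
# indices with a fixed seed, then slice into disjoint 45k/5k subsets.
indices = list(range(50_000))
rng = random.Random(42)  # fixed seed -> reproducible split
rng.shuffle(indices)
train_idx, valid_idx = indices[:45_000], indices[45_000:]
print(len(train_idx), len(valid_idx))  # 45000 5000
```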
<p>Now load the datasets into PyTorch’s DataLoader objects:</p>
<pre><code class="lang-python">batch_size = <span class="hljs-number">128</span>
train_loader = torch.utils.data.DataLoader(
    dataset=train_set, batch_size=batch_size, shuffle=<span class="hljs-literal">True</span>)
valid_loader = torch.utils.data.DataLoader(
    dataset=valid_set, batch_size=batch_size, shuffle=<span class="hljs-literal">False</span>
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set, batch_size=batch_size, shuffle=<span class="hljs-literal">False</span>
)
</code></pre>
<h2 id="heading-build-the-modeldeep-neural-network">Build the Model (Deep Neural Network)</h2>
<p>We will build a deep neural network (<strong>DNN</strong>) with <strong>20</strong> hidden layers, each containing <strong>100</strong> neurons. We will use He initialization for the weights and the Swish activation function (implemented as nn.SiLU). Since this is a classification task, the output layer will have one neuron for each class.</p>
<h3 id="heading-he-initialization"><strong>He Initialization</strong></h3>
<p>We will use <strong>He Initialization</strong> (also known as <strong>Kaiming Initialization</strong>) which is a technique for initializing weights in neural networks that use <strong>ReLU</strong> (Rectified Linear Unit) or <strong>Swish/SiLU</strong> activation functions.</p>
<h3 id="heading-the-problem-it-solves">The Problem It Solves:</h3>
<p>When training deep networks, weights that are initialized randomly can lead to several issues:</p>
<ul>
<li><p><strong>Vanishing Gradients</strong>: Gradients can become very small, causing learning to slow down significantly.</p>
</li>
<li><p><strong>Exploding Gradients</strong>: Gradients can grow too large, resulting in unstable training.</p>
</li>
</ul>
<h3 id="heading-how-it-works">How It Works:</h3>
<p>He initialization scales the weights based on the number of input neurons to the layer:</p>
<p>\(w \sim U\left(-\sqrt{\frac{6}{n_{\text{in}}}}, \sqrt{\frac{6}{n_{\text{in}}}}\right)\)</p>
<p>where \(n_{\text{in}}\) is the number of input neurons for that layer.</p>
<p>Layers with more input neurons will have smaller weight magnitudes. This approach helps maintain consistent signal variance throughout the network.</p>
<h3 id="heading-why-it-matters">Why It Matters:</h3>
<ul>
<li><p><strong>Faster Convergence</strong>: The network trains more efficiently.</p>
</li>
<li><p><strong>Better Performance for Deeper Networks</strong>: It helps prevent gradient-related issues in very deep architectures.</p>
</li>
<li><p><strong>Optimized for ReLU/SiLU</strong>: Specifically tuned for these activation functions.</p>
</li>
</ul>
<p>Without <strong>He initialization</strong>, training a 20-layer network like ours would be challenging. With it, deep networks can learn significantly faster and more stably.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">use_he_init</span>(<span class="hljs-params">module</span>):</span>
    <span class="hljs-keyword">if</span> isinstance(module, torch.nn.Linear):
        torch.nn.init.kaiming_uniform_(module.weight)
        torch.nn.init.zeros_(module.bias)
</code></pre>
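<p>The bound follows the formula above: weights are drawn from \(U(-b, b)\) with \(b = \sqrt{6 / n_{\text{in}}}\). A quick sketch of how the bound shrinks as fan-in grows (<code>he_uniform_bound</code> is an illustrative helper, not a PyTorch API):</p>

```python
import math

# He-uniform bound b = sqrt(6 / n_in): layers with more inputs get
# smaller initial weights, keeping signal variance roughly constant.
def he_uniform_bound(n_in: int) -> float:
    return math.sqrt(6 / n_in)

print(he_uniform_bound(3 * 32 * 32))  # input layer, 3072 inputs  -> ~0.0442
print(he_uniform_bound(100))          # hidden layers, 100 inputs -> ~0.2449
```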
<p>Now we build a deep neural network with <strong>20</strong> hidden layers, each containing <strong>100</strong> neurons, using the <strong>SiLU</strong> activation function:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">build_deep_model</span>(<span class="hljs-params">n_hidden, n_neurons, n_inputs, n_outputs</span>):</span>
    layers = [nn.Flatten(), nn.Linear(n_inputs, n_neurons), nn.SiLU()]
    <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(n_hidden - <span class="hljs-number">1</span>):
        layers += [nn.Linear(n_neurons, n_neurons), nn.SiLU()]

    layers += [nn.Linear(n_neurons, n_outputs)]
    model = torch.nn.Sequential(*layers)
    model.apply(use_he_init)
    <span class="hljs-keyword">return</span> model
</code></pre>
<h3 id="heading-activation-function-silu"><strong>Activation Function SiLU</strong></h3>
<p><strong>SiLU (Sigmoid Linear Unit)</strong>, also known as <strong>Swish</strong>, is a smooth activation function that has gained popularity in modern deep learning.</p>
<p><strong>Mathematical Definition:</strong></p>
<p>\(\mathbf{  \text{SiLU}(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}} }\)</p>
<p>Here, \(\mathbf{ \sigma(x) }\) represents the sigmoid function.</p>
<p><strong>How It Works:</strong></p>
<ul>
<li><p>The function multiplies the input \(x\) by its sigmoid value, which lies between 0 and 1.</p>
</li>
<li><p>When \(x\) is negative, the sigmoid value approaches 0, resulting in a small output.</p>
</li>
<li><p>When \(x\) is positive, the sigmoid value approaches 1, making the output close to \(x\).</p>
</li>
<li><p>This creates a smooth, non-linear curve.</p>
</li>
</ul>
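<p>The formula is easy to check numerically; here is a pure-Python version of \(\text{SiLU}(x) = x \cdot \sigma(x)\), equivalent in spirit to what <code>nn.SiLU</code> computes on tensors:</p>

```python
import math

# SiLU(x) = x * sigmoid(x), written out with the stdlib for illustration.
def silu(x: float) -> float:
    return x * (1.0 / (1.0 + math.exp(-x)))

print(silu(0.0))   # 0.0     (x = 0, so the product is 0)
print(silu(5.0))   # ~4.967  (sigmoid -> 1, output approaches x)
print(silu(-5.0))  # ~-0.033 (sigmoid -> 0, output approaches 0)
```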
<p><strong>Key Advantages:</strong></p>
<ul>
<li><p><strong>Smoothness</strong>: Unlike ReLU, which has a sharp corner at 0, SiLU is smooth everywhere.</p>
</li>
<li><p><strong>Self-gating</strong>: The sigmoid component acts as a "gate," determining which activations can pass through.</p>
</li>
<li><p><strong>Better Gradient Flow</strong>: It helps prevent issues related to vanishing or exploding gradients in deep networks.</p>
</li>
<li><p><strong>Compatibility with He Initialization</strong>: SiLU is designed to work effectively with He weight initialization.</p>
</li>
</ul>
<p><strong>Comparison to ReLU:</strong></p>
<ul>
<li><p><strong>ReLU</strong>: Returns \(\max(0, x)\) – it is fast but has a sharp transition at 0.</p>
</li>
<li><p><strong>SiLU</strong>: Returns \(x \cdot \sigma(x)\) – it is smoother and more expressive.</p>
</li>
</ul>
<p>The chart below shows the <strong>Sigmoid</strong>, <strong>SiLU (Swish)</strong>, and <strong>ReLU</strong> activation functions:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769553680246/f976cdc7-a9d5-4e17-b7bd-0a8281a9ee0b.png" alt class="image--center mx-auto" /></p>
<p>Now we build the model by calling our <strong>build_deep_model</strong> function:</p>
<pre><code class="lang-python">torch.manual_seed(<span class="hljs-number">42</span>)
<span class="hljs-comment"># build the model   </span>
model = build_deep_model(n_hidden=<span class="hljs-number">20</span>, n_neurons=<span class="hljs-number">100</span>, n_inputs=<span class="hljs-number">3</span> * <span class="hljs-number">32</span> * <span class="hljs-number">32</span>, n_outputs=<span class="hljs-number">10</span>)
model.to(device)
</code></pre>
<pre><code class="lang-python">Sequential(
  (<span class="hljs-number">0</span>): Flatten(start_dim=<span class="hljs-number">1</span>, end_dim=<span class="hljs-number">-1</span>)
  (<span class="hljs-number">1</span>): Linear(in_features=<span class="hljs-number">3072</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">2</span>): SiLU()
  (<span class="hljs-number">3</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">4</span>): SiLU()
  (<span class="hljs-number">5</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">6</span>): SiLU()
  (<span class="hljs-number">7</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">8</span>): SiLU()
  (<span class="hljs-number">9</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">10</span>): SiLU()
  (<span class="hljs-number">11</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">12</span>): SiLU()
  (<span class="hljs-number">13</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">14</span>): SiLU()
  (<span class="hljs-number">15</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">16</span>): SiLU()
  (<span class="hljs-number">17</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">18</span>): SiLU()
  (<span class="hljs-number">19</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">20</span>): SiLU()
  (<span class="hljs-number">21</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">22</span>): SiLU()
  (<span class="hljs-number">23</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
...
  (<span class="hljs-number">38</span>): SiLU()
  (<span class="hljs-number">39</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">100</span>, bias=<span class="hljs-literal">True</span>)
  (<span class="hljs-number">40</span>): SiLU()
  (<span class="hljs-number">41</span>): Linear(in_features=<span class="hljs-number">100</span>, out_features=<span class="hljs-number">10</span>, bias=<span class="hljs-literal">True</span>)
)
</code></pre>
<h2 id="heading-train-the-model">Train the model</h2>
<p>Now we will write a function that trains a neural network with early stopping to prevent overfitting. Here's what it does:</p>
<ol>
<li><p><strong>History Tracking:</strong> The function sets up a system to track losses and metrics during training.</p>
</li>
<li><p><strong>Metric Calculation:</strong> It calculates both training and validation metrics at each epoch.</p>
</li>
<li><p><strong>Validation Improvement:</strong></p>
<ul>
<li>If the validation metric improves, the function saves the model weights and resets the patience counter to zero.</li>
</ul>
</li>
<li><p><strong>No Improvement Case:</strong></p>
<ul>
<li><p>If the validation metric does not improve, the function increments the patience counter.</p>
</li>
<li><p>After a specified number of epochs without improvement (the patience threshold), training is stopped early.</p>
</li>
</ul>
</li>
<li><p><strong>Restoring the Best Model:</strong> The function restores the model to the version with the highest validation metric.</p>
</li>
</ol>
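<p>The patience bookkeeping in steps 3–4 can be traced on made-up validation accuracies (this toy <code>early_stop_trace</code> helper is only for illustration; the real training function appears below):</p>

```python
# Toy trace of the early-stopping logic: track the best validation metric,
# reset the patience counter on improvement, stop after `patience` stale epochs.
def early_stop_trace(valid_metrics, patience=3):
    best_metric, best_epoch, counter = 0.0, -1, 0
    for epoch, m in enumerate(valid_metrics):
        if m > best_metric:
            best_metric, best_epoch, counter = m, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch  # (best epoch, epoch we stopped at)
    return best_epoch, len(valid_metrics) - 1

print(early_stop_trace([0.20, 0.31, 0.30, 0.29, 0.28]))  # (1, 4)
```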
<h3 id="heading-why-this-matters">Why This Matters:</h3>
<ul>
<li><p><strong>Prevents Overfitting:</strong> Early stopping helps to stop training before the model memorizes the training data.</p>
</li>
<li><p><strong>Saves the Best Model:</strong> It retains the checkpoint that exhibits the best validation performance.</p>
</li>
<li><p><strong>Efficiency:</strong> This approach avoids unnecessary training once the model's performance plateaus.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-comment"># This function evaluates a trained model on a dataset and computes a metric</span>
<span class="hljs-comment"># (like accuracy)</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">evaluate_tm</span>(<span class="hljs-params">model, data_loader, metric</span>):</span>
    model.eval()
    metric.reset()
    <span class="hljs-keyword">with</span> torch.no_grad():
        <span class="hljs-keyword">for</span> X_batch, y_batch <span class="hljs-keyword">in</span> data_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            y_pred = model(X_batch)
            metric.update(y_pred, y_batch)
    <span class="hljs-keyword">return</span> metric.compute()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train_with_early_stopping</span>(<span class="hljs-params">model, optimizer, loss_fn, metric, train_loader,
                              valid_loader, n_epochs, patience=<span class="hljs-number">10</span>,
                              checkpoint_path=None, scheduler=None</span>):</span>
    checkpoint_path = checkpoint_path <span class="hljs-keyword">or</span> <span class="hljs-string">"my_checkpoint.pt"</span>
    history = {<span class="hljs-string">"train_losses"</span>: [], <span class="hljs-string">"train_metrics"</span>: [], <span class="hljs-string">"valid_metrics"</span>: []}
    best_metric = <span class="hljs-number">0.0</span>
    patience_counter = <span class="hljs-number">0</span>
    <span class="hljs-keyword">for</span> epoch <span class="hljs-keyword">in</span> range(n_epochs):
        total_loss = <span class="hljs-number">0.0</span>
        metric.reset()
        model.train()
        t0 = time.time()
        <span class="hljs-keyword">for</span> X_batch, y_batch <span class="hljs-keyword">in</span> train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            total_loss += loss.item()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            metric.update(y_pred, y_batch)

        train_metric = metric.compute().item()
        valid_metric = evaluate_tm(model, valid_loader, metric).item()
        <span class="hljs-keyword">if</span> valid_metric &gt; best_metric:
            torch.save(model.state_dict(), checkpoint_path)
            best_metric = valid_metric
            best = <span class="hljs-string">" (best)"</span>
            patience_counter = <span class="hljs-number">0</span>
        <span class="hljs-keyword">else</span>:
            patience_counter += <span class="hljs-number">1</span>
            best = <span class="hljs-string">""</span>

        t1 = time.time()
        history[<span class="hljs-string">"train_losses"</span>].append(total_loss / len(train_loader))
        history[<span class="hljs-string">"train_metrics"</span>].append(train_metric)
        history[<span class="hljs-string">"valid_metrics"</span>].append(valid_metric)
        print(<span class="hljs-string">f"Epoch <span class="hljs-subst">{epoch + <span class="hljs-number">1</span>}</span>/<span class="hljs-subst">{n_epochs}</span>, "</span>
              <span class="hljs-string">f"train loss: <span class="hljs-subst">{history[<span class="hljs-string">'train_losses'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>, "</span>
              <span class="hljs-string">f"train metric: <span class="hljs-subst">{history[<span class="hljs-string">'train_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>, "</span>
              <span class="hljs-string">f"valid metric: <span class="hljs-subst">{history[<span class="hljs-string">'valid_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span><span class="hljs-subst">{best}</span>"</span>
              <span class="hljs-string">f" in <span class="hljs-subst">{t1 - t0:<span class="hljs-number">.1</span>f}</span>s"</span>
        )
        <span class="hljs-keyword">if</span> scheduler <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>:
            scheduler.step()
        <span class="hljs-keyword">if</span> patience_counter &gt;= patience:
            print(<span class="hljs-string">"Early stopping!"</span>)
            <span class="hljs-keyword">break</span>

    model.load_state_dict(torch.load(checkpoint_path))
    <span class="hljs-keyword">return</span> history
</code></pre>
<p>Let's use the <strong>NAdam</strong> optimizer with a learning rate of 0.002.</p>
<p><strong>NAdam</strong> (Nesterov-accelerated Adaptive Moment Estimation) is a variant of the <strong>Adam</strong> optimizer that adds <strong>Nesterov momentum</strong>, designed to improve convergence speed and training stability.</p>
<p><strong>Key Advantages Over Adam:</strong></p>
<ol>
<li><p><strong>Faster convergence</strong> - Nesterov momentum helps avoid oscillations</p>
</li>
<li><p><strong>Better final performance</strong> - Often achieves lower final loss</p>
</li>
<li><p><strong>Adaptive learning rates</strong> - Still maintains per-parameter learning rates like Adam</p>
</li>
<li><p><strong>Good for deep networks</strong> - Especially effective with deep architectures</p>
</li>
</ol>
<pre><code class="lang-python">optimizer = torch.optim.NAdam(model.parameters(), lr=<span class="hljs-number">2e-3</span>)
criterion = nn.CrossEntropyLoss()
accuracy = torchmetrics.Accuracy(task=<span class="hljs-string">"multiclass"</span>, num_classes=<span class="hljs-number">10</span>).to(device)
</code></pre>
<p>Now we will call the training function:</p>
<pre><code class="lang-python">n_epochs = <span class="hljs-number">100</span>
<span class="hljs-comment"># now we will call the training function</span>
history = train_with_early_stopping(
    model, optimizer, criterion, accuracy,
    train_loader, valid_loader,
    n_epochs
)
</code></pre>
<pre><code class="lang-python">Epoch <span class="hljs-number">1</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">2.0548</span>, train metric: <span class="hljs-number">0.2134</span>, valid metric: <span class="hljs-number">0.1974</span> (best) <span class="hljs-keyword">in</span> <span class="hljs-number">4.1</span>s
Epoch <span class="hljs-number">2</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.9637</span>, train metric: <span class="hljs-number">0.2550</span>, valid metric: <span class="hljs-number">0.2746</span> (best) <span class="hljs-keyword">in</span> <span class="hljs-number">4.0</span>s
Epoch <span class="hljs-number">3</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.8881</span>, train metric: <span class="hljs-number">0.2879</span>, valid metric: <span class="hljs-number">0.3152</span> (best) <span class="hljs-keyword">in</span> <span class="hljs-number">3.9</span>s
Epoch <span class="hljs-number">4</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.8266</span>, train metric: <span class="hljs-number">0.3196</span>, valid metric: <span class="hljs-number">0.2904</span> <span class="hljs-keyword">in</span> <span class="hljs-number">3.9</span>s
...
Epoch <span class="hljs-number">29</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.3953</span>, train metric: <span class="hljs-number">0.5011</span>, valid metric: <span class="hljs-number">0.4284</span> <span class="hljs-keyword">in</span> <span class="hljs-number">4.0</span>s
Epoch <span class="hljs-number">30</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.3910</span>, train metric: <span class="hljs-number">0.5036</span>, valid metric: <span class="hljs-number">0.4216</span> <span class="hljs-keyword">in</span> <span class="hljs-number">4.1</span>s
Epoch <span class="hljs-number">31</span>/<span class="hljs-number">100</span>, train loss: <span class="hljs-number">1.3841</span>, train metric: <span class="hljs-number">0.5062</span>, valid metric: <span class="hljs-number">0.4260</span> <span class="hljs-keyword">in</span> <span class="hljs-number">4.1</span>s
Early stopping!
</code></pre>
<p>This output shows the <strong>training stopped early</strong> due to no improvement. Here's what it means:</p>
<p><strong>Key Observations:</strong></p>
<ol>
<li><p><strong>Train &gt; Valid accuracy gap</strong> (50.62% vs 42.60%) - Shows some <strong>overfitting</strong>, but this is expected</p>
</li>
<li><p><strong>Stopped at epoch 31</strong> - Saved time and computational resources by not training all 100 epochs</p>
</li>
<li><p><strong>Model restored</strong> - The best model checkpoint (from epoch 21) was automatically loaded</p>
</li>
</ol>
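<p>As an aside on the optimizer: the Nesterov "lookahead" that NAdam adds to Adam can be illustrated in one dimension. The following is a schematic of plain vs Nesterov momentum on \(f(x) = x^2\), not NAdam itself (which also keeps Adam-style adaptive per-parameter learning rates):</p>

```python
# Schematic comparison: classical momentum vs Nesterov momentum minimizing
# f(x) = x^2 (gradient 2x). Nesterov evaluates the gradient at a lookahead
# point x + mu*v, which damps oscillations on this toy problem.
def run(nesterov, steps=200, lr=0.1, mu=0.9):
    x, v = 5.0, 0.0
    for _ in range(steps):
        point = x + mu * v if nesterov else x
        v = mu * v - lr * (2.0 * point)  # gradient of x^2 at the chosen point
        x = x + v
    return x

print(abs(run(False)), abs(run(True)))  # both near 0; Nesterov typically closer
```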
<h2 id="heading-test-the-model">Test the Model</h2>
<p>We will evaluate the model on the test set:</p>
<pre><code class="lang-python">
test_accuracy = evaluate_tm(model, test_loader, accuracy).item()
print(<span class="hljs-string">f"\nTest Accuracy: <span class="hljs-subst">{test_accuracy:<span class="hljs-number">.4</span>f}</span> (<span class="hljs-subst">{test_accuracy*<span class="hljs-number">100</span>:<span class="hljs-number">.2</span>f}</span>%)"</span>)
</code></pre>
<pre><code class="lang-python">
Test Accuracy: <span class="hljs-number">0.4377</span> (<span class="hljs-number">43.77</span>%)
</code></pre>
<h2 id="heading-charts">Charts</h2>
<p>Let's plot the training loss and the accuracy over epochs:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Plot training history</span>
fig, axes = plt.subplots(<span class="hljs-number">1</span>, <span class="hljs-number">2</span>, figsize=(<span class="hljs-number">14</span>, <span class="hljs-number">5</span>))

<span class="hljs-comment"># Plot loss</span>
epochs_range = range(<span class="hljs-number">1</span>, len(history[<span class="hljs-string">'train_losses'</span>]) + <span class="hljs-number">1</span>)
axes[<span class="hljs-number">0</span>].plot(epochs_range, history[<span class="hljs-string">'train_losses'</span>], <span class="hljs-string">'b-'</span>, linewidth=<span class="hljs-number">2</span>, label=<span class="hljs-string">'Train Loss'</span>)
axes[<span class="hljs-number">0</span>].set_xlabel(<span class="hljs-string">'Epoch'</span>, fontsize=<span class="hljs-number">12</span>)
axes[<span class="hljs-number">0</span>].set_ylabel(<span class="hljs-string">'Loss'</span>, fontsize=<span class="hljs-number">12</span>)
axes[<span class="hljs-number">0</span>].set_title(<span class="hljs-string">'Training Loss Over Epochs'</span>, fontsize=<span class="hljs-number">14</span>, fontweight=<span class="hljs-string">'bold'</span>)
axes[<span class="hljs-number">0</span>].grid(<span class="hljs-literal">True</span>, alpha=<span class="hljs-number">0.3</span>)
axes[<span class="hljs-number">0</span>].legend()

<span class="hljs-comment"># Plot accuracy</span>
axes[<span class="hljs-number">1</span>].plot(epochs_range, history[<span class="hljs-string">'train_metrics'</span>], <span class="hljs-string">'g-'</span>, linewidth=<span class="hljs-number">2</span>, label=<span class="hljs-string">'Train Accuracy'</span>)
axes[<span class="hljs-number">1</span>].plot(epochs_range, history[<span class="hljs-string">'valid_metrics'</span>], <span class="hljs-string">'r-'</span>, linewidth=<span class="hljs-number">2</span>, label=<span class="hljs-string">'Valid Accuracy'</span>)
axes[<span class="hljs-number">1</span>].set_xlabel(<span class="hljs-string">'Epoch'</span>, fontsize=<span class="hljs-number">12</span>)
axes[<span class="hljs-number">1</span>].set_ylabel(<span class="hljs-string">'Accuracy'</span>, fontsize=<span class="hljs-number">12</span>)
axes[<span class="hljs-number">1</span>].set_title(<span class="hljs-string">'Accuracy Over Epochs'</span>, fontsize=<span class="hljs-number">14</span>, fontweight=<span class="hljs-string">'bold'</span>)
axes[<span class="hljs-number">1</span>].grid(<span class="hljs-literal">True</span>, alpha=<span class="hljs-number">0.3</span>)
axes[<span class="hljs-number">1</span>].legend()

plt.tight_layout()
plt.show()

<span class="hljs-comment"># Print summary</span>
print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span>*<span class="hljs-number">50</span>)
print(<span class="hljs-string">"TRAINING SUMMARY"</span>)
print(<span class="hljs-string">"="</span>*<span class="hljs-number">50</span>)
print(<span class="hljs-string">f"Total Epochs Trained: <span class="hljs-subst">{len(history[<span class="hljs-string">'train_losses'</span>])}</span>"</span>)
print(<span class="hljs-string">f"Final Train Loss: <span class="hljs-subst">{history[<span class="hljs-string">'train_losses'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"Final Train Accuracy: <span class="hljs-subst">{history[<span class="hljs-string">'train_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span> (<span class="hljs-subst">{history[<span class="hljs-string">'train_metrics'</span>][<span class="hljs-number">-1</span>]*<span class="hljs-number">100</span>:<span class="hljs-number">.2</span>f}</span>%)"</span>)
print(<span class="hljs-string">f"Final Valid Accuracy: <span class="hljs-subst">{history[<span class="hljs-string">'valid_metrics'</span>][<span class="hljs-number">-1</span>]:<span class="hljs-number">.4</span>f}</span> (<span class="hljs-subst">{history[<span class="hljs-string">'valid_metrics'</span>][<span class="hljs-number">-1</span>]*<span class="hljs-number">100</span>:<span class="hljs-number">.2</span>f}</span>%)"</span>)
print(<span class="hljs-string">f"Best Valid Accuracy: <span class="hljs-subst">{max(history[<span class="hljs-string">'valid_metrics'</span>]):<span class="hljs-number">.4</span>f}</span> (<span class="hljs-subst">{max(history[<span class="hljs-string">'valid_metrics'</span>])*<span class="hljs-number">100</span>:<span class="hljs-number">.2</span>f}</span>%)"</span>)
print(<span class="hljs-string">"="</span>*<span class="hljs-number">50</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769562422970/a6ab6c3a-e3fc-4430-ba49-aca4b8c6d026.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Convolutional Neural Network]]></title><description><![CDATA[A convolutional neural network (CNN) is an advanced deep learning architecture designed for the identification and classification of images. In addition to image recognition, CNNs are utilized for object detection within images, audio classification,...]]></description><link>https://path2ml.com/convolutional-neural-network</link><guid isPermaLink="true">https://path2ml.com/convolutional-neural-network</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[convolutional networks]]></category><category><![CDATA[CNNs (Convolutional Neural Networks)]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Tue, 27 Jan 2026 05:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1739674185364/5601406e-b1ab-4ebc-a5e4-7cf4be3d3a4b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A convolutional neural network <strong>(CNN)</strong> is an advanced deep learning architecture designed for the identification and classification of images. In addition to image recognition, CNNs are utilized for object detection within images, audio classification, and the analysis of time-series data. A convolutional layer processes an input volume, transforming it into an output volume that may vary in size.</p>
<p><strong>Convolutional neural networks</strong> (commonly known as <strong>convnets</strong>) are powerful architectures that build upon the foundation of fully connected neural networks. They consist of layers of neurons equipped with learnable weights and biases. Each neuron processes input data by performing a linear transformation followed by a nonlinear activation function, culminating in a unified scoring function that translates raw image pixels at the input layer into definitive class scores at the output layer.</p>
<p>The unique advantage of convnets lies in their deliberate assumptions about the structure of input data, especially images. These assumptions empower the architecture to encode critical properties, leading to remarkable implementation efficiency and a significant reduction in the number of parameters in the network.</p>
<p>Instead of treating input data as simple linear arrays, convnets expertly manage information as three-dimensional volumes defined by width, height, and depth. This allows each layer to accept a 3D volume of numerical data as input and produce another 3D volume as output. By incorporating color depth as the third dimension, a two-dimensional input image is seamlessly transformed into a three-dimensional representation, enhancing the network's ability to interpret and analyze visual information.</p>
<p>A <strong>Convolutional Neural Network (CNN)</strong> is structured with several layers, primarily categorized into three main types: convolutional layers, pooling layers, and fully connected layers. CNNs are often comprised of many layers, particularly a combination of convolutional and pooling layers, which work together to extract and refine features from input data.</p>
<p>In convolutional layers, specialized nodes, or filters, slide over the input data to detect patterns, such as edges or textures, by performing convolution operations. Pooling layers follow, downsampling the feature maps generated by the convolutional layers to reduce their dimensionality, thereby retaining essential information while minimizing computational load.</p>
<p>Finally, fully connected layers integrate the features learned throughout the network, connecting all nodes to produce the final output. Together, these layers enable the CNN to effectively analyze and interpret complex data, making them a powerful tool in various applications, such as image and video recognition.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739739623933/7c638645-2637-413f-8cae-e37fcf5d3283.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-convolution-layer">Convolution Layer</h3>
<p>The convolution process consists of the following steps:</p>
<ol>
<li><p>It commences with an input volume.</p>
</li>
<li><p>A filter is applied at every position throughout the input.</p>
</li>
<li><p>The process yields an output volume, which typically differs in size from the input.</p>
</li>
</ol>
<p>To convolve a 3x3 filter with an image, one multiplies the filter's values element-wise with the corresponding values of the original matrix. The resulting products are then summed, and a bias is added to produce the final output. The filter then slides one step to the right and the same convolution is applied, repeating across the row; the same process continues one step at a time in the vertical direction. Moving one step at a time is called using a stride of one.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739630442421/18a93f56-4b97-455c-b78c-c2a7bf8a6490.jpeg" alt class="image--center mx-auto" /></p>
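<p>The sliding-window computation described above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up input and filter values, not a framework-grade implementation:</p>
<pre><code class="lang-python">import numpy as np

def conv2d(image, kernel, bias=0.0, stride=1):
    """Valid (no padding) 2D convolution of a single-channel image."""
    n, f = image.shape[0], kernel.shape[0]
    out_size = (n - f) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # element-wise product of the filter with the current patch, then sum
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel) + bias
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.ones((3, 3)) / 9.0                      # 3x3 averaging filter
print(conv2d(image, kernel).shape)                  # (3, 3)
</code></pre>
<p>Each output entry is the sum of an element-wise product between the filter and the patch it currently covers, plus the bias.</p>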
<h3 id="heading-zero-padding">Zero-Padding</h3>
<p>Zero-padding adds zeros around the border of an image.</p>
<p>The primary advantages of utilizing padding in convolutional neural networks are as follows:</p>
<ul>
<li><p>Padding enables the application of a convolutional layer without necessarily reducing the height and width of the input volumes. This characteristic is crucial for constructing deeper networks, as it prevents a reduction in height and width as one progresses through subsequent layers. Notably, the "same" convolution is a specific instance where the height and width are accurately maintained after processing through one layer.</p>
</li>
<li><p>Additionally, padding contributes to the retention of information at the periphery of an image. In the absence of padding, the influence of border pixels on subsequent layers would be significantly diminished, thereby compromising the utilization of crucial edge data.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739630463928/0e3ae299-5e8c-4d81-b90b-11e19e976c14.jpeg" alt class="image--center mx-auto" /></p>
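<p>A quick sketch with <code>np.pad</code> (the array size here is made up for illustration) shows how "same" padding preserves the spatial size for a stride of 1, using \(P=(f-1)/2\):</p>
<pre><code class="lang-python">import numpy as np

f = 3                       # filter size
P = (f - 1) // 2            # padding that preserves size for stride 1
image = np.ones((5, 5))     # toy single-channel image

padded = np.pad(image, P, mode="constant", constant_values=0.0)
print(padded.shape)         # (7, 7): a 3x3 valid convolution now returns a 5x5 output
</code></pre>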
<h3 id="heading-stride">Stride</h3>
<p>The stride is the amount by which the window moves each time it slides, shown by the red arrows in the image below, where a convolution with a 3x3 filter and a stride of 2 produces a 3x3 output. In convolutional neural networks (CNNs), the main advantage of using stride is its capacity to efficiently downsample input data. This allows the network to concentrate on more significant features while also reducing computational complexity by processing fewer positions. Increasing the stride decreases the computational load, as the filter covers more pixels with each step and therefore performs fewer operations overall. This can speed up both training and inference.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739641923292/b08dbf40-15c1-43a1-9cbb-df9e90a4596b.jpeg" alt class="image--center mx-auto" /></p>
<p>If n x n is the size of the input image, f x f is the size of the filter, P is the padding, and S is the stride, then the output size will be</p>
<p>\(\frac{n+2P-f}{S} +1\) by \(\frac{n+2P-f}{S} +1\)</p>
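<p>As a sanity check, the formula can be wrapped in a tiny helper (illustrative only; floor division models the case where the filter does not fit an exact number of times):</p>
<pre><code class="lang-python">def conv_output_size(n, f, P=0, S=1):
    """Spatial output size of a convolution: floor((n + 2P - f) / S) + 1."""
    return (n + 2 * P - f) // S + 1

print(conv_output_size(7, 3, P=0, S=2))  # 3: e.g. a 7x7 input, 3x3 filter, stride 2
print(conv_output_size(5, 3, P=1, S=1))  # 5: a "same" convolution
</code></pre>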
<h3 id="heading-pooling-layer"><strong>Pooling Layer</strong></h3>
<p>The pooling (POOL) layer reduces the height and width of the input, which helps decrease computation while also making feature detectors more invariant to their position in the input. There are two main types of pooling layers:</p>
<ul>
<li><p><strong>Max-Pooling Layer:</strong> This layer stores the maximum value within the specified window in the output.</p>
</li>
<li><p><strong>Average-Pooling Layer:</strong> This layer calculates and stores the average value within the specified window in the output.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739670803338/3e243e12-f82b-4949-b14e-9444bb4230c4.jpeg" alt class="image--center mx-auto" /></p>
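<p>Both pooling types can be sketched with a single helper function; the 4x4 input below is made up for illustration:</p>
<pre><code class="lang-python">import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Max- or average-pool a 2D array with a square window (illustrative sketch)."""
    out_size = (x.shape[0] - size) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 2.],
              [7., 2., 9., 3.],
              [4., 8., 6., 5.]])
print(pool2d(x, mode="max"))       # [[6. 4.] [8. 9.]]
print(pool2d(x, mode="average"))   # [[3.75 2.25] [5.25 5.75]]
</code></pre>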
<p>Looking at a <strong>CNN</strong> in 3D, in a <strong>convolutional</strong> layer each pixel in the <strong>feature map</strong> corresponds to a single neuron. All neurons within a specific feature map share the same parameters, meaning they use the same kernel and bias term. However, neurons in different feature maps utilize different parameters. Each neuron's receptive field remains consistent, extending across all the feature maps from the previous layer. In summary, a convolutional layer applies multiple trainable filters simultaneously to its inputs, allowing it to detect various features anywhere within the input data.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769608657284/781bb89d-137e-446c-8f5b-266a7ef5ae53.gif" alt class="image--center mx-auto" /></p>
<p>Having all neurons in a feature map share the same parameters significantly reduces the number of parameters in the model.</p>
]]></content:encoded></item><item><title><![CDATA[Deep Learning Explained]]></title><description><![CDATA[In the realm of popular neural network architectures, a diverse array of layer types is employed, each serving a distinct purpose. In this blog, we will delve into one of the most fundamental components: the linear layer. This layer is characterized ...]]></description><link>https://path2ml.com/deep-learning-explained</link><guid isPermaLink="true">https://path2ml.com/deep-learning-explained</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[Gradient-Descent ]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sat, 17 Jan 2026 21:06:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768663272258/07691eba-2c31-4252-b732-d76826316fc6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the realm of popular neural network architectures, a diverse array of layer types is employed, each serving a distinct purpose. In this blog, we will delve into one of the most fundamental components: the linear layer. This layer is characterized by a structure in which each neuron, also known as a perceptron, in the preceding layer is intricately connected to every neuron in the subsequent layer. This design is commonly referred to as a fully <strong>connected layer</strong>, given that all neurons engage in interactions with one another across layers.</p>
<h2 id="heading-forward-propagation">Forward Propagation</h2>
<p>To better understand this, consider that if the preceding layer consists of \(m\) neurons and the following layer has \(n\) neurons, the network establishes a total of \(m \times n\) individual connections. Each of these connections carries its own unique weight, which plays a critical role in determining how information is processed as it flows through the network. This foundational layer effectively facilitates the transmission of signals, allowing for complex computations and ultimately contributing to the learning capabilities of the neural network.</p>
<p>The weight associated with the connection between the <strong>kth</strong> neuron in the previous layer \(\textbf{l-1}\) and the <strong>jth</strong> neuron in the current layer \(\textbf{l}\) is denoted as \(\mathbf{w_{jk}^l}\). This weight represents the strength of the influence that the <strong>kth</strong> neuron has on the activation of the <strong>jth</strong> neuron, playing a crucial role in the computations performed by the neural network.</p>
<h3 id="heading-single-perceptron-at-layer-l">Single perceptron at layer l</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768618116926/3cb92982-a47e-4344-844e-ffe79943477a.png" alt /></p>
<p>Neurons are depicted in circles and <strong>jth</strong> neuron in layer <strong>l</strong> as <strong>red</strong> circle</p>
<p>A parameterized function is a mathematical construct that processes an input to yield a specific decision or estimate. One of the most basic forms of this function is the <strong>weighted (w) sum of inputs</strong>, augmented by a <strong>bias term (b)</strong>. This approach assigns varying levels of importance to each input before they are combined, enabling a more nuanced evaluation of how each input contributes to the overall outcome. Nonlinearity is introduced through functions such as the sigmoid function. The weights <strong><em>w</em><sub>0</sub></strong>, <strong><em>w</em><sub>1</sub></strong> and the bias <strong><em>b</em></strong> are the parameters of the function</p>
<p>Let \(\mathbf{a_0^{l-1}}\), \(a_1^{l-1}\) …. , \(\mathbf{a_m^{l-1}}\) are the outputs of <strong>m</strong> neurons in layer <strong>l-1. and</strong> \(\mathbf{a_0^{l}}\), \(\mathbf{a_1^{l}}\)… \(\mathbf{a_n^{l}}\) are the outputs of <strong>n neurons</strong> at layer l.</p>
<p>Considering the <strong>jth neuron</strong> in layer l</p>
<p>Parameterized model function in layer l for <strong>jth</strong> neuron is defined as</p>
<p>\(\mathbf{f_{w,b}(a)}=\mathbf{z_j^l= \sum_{k=0}^m w_{jk}^l a_k^{l-1} +b_j^l}\) where the number of neurons in layer <strong>l-1</strong> and <strong>l</strong> are <strong>m</strong> and <strong>n</strong></p>
<p>In dot product this can be represented as</p>
<p>\(\mathbf{z_j^l=[ w_{j0}^l\enspace w_{j1}^l \enspace ... w_{jm}^l] \begin{bmatrix} a_0^{l-1} \\\\ a_1^{l-1}   \\\\..... \\\\ a_m^{l-1} \end{bmatrix}+b_j^l}\)</p>
<p>And the activation vector which is the output at layer <strong>l</strong> for <strong>jth</strong> neuron is derived by applying activation function on the model function \(z_j^l\)</p>
<p>\(\mathbf{a_j^l=\sigma(z_j^l)}\)</p>
<p>For all \(\mathbf{j=0....n}\) (an arrow superscript is used here to denote a vector), this can be written as</p>
<p>\(\mathbf{\vec{z}^{\,l}=W^l  \enspace \vec{a}^{\,l-1}+\vec{b}^{\,l}}\)</p>
<p>\(\mathbf{\vec{a}^{\,l}=\sigma(\vec{z}^{\,l})}\) <strong>………………………………………………</strong> \(\mathbf{(1)}\)</p>
<p>Here \(\mathbf{W^l}\) is an <strong><em>n</em> × <em>m</em></strong> matrix representing the weights of <em>all connections from layer</em> <strong><em>l − 1</em></strong> <em>to layer</em> <strong><em>l</em></strong></p>
<p>\(W^l=\begin{bmatrix} w_{00}^{l}  \enspace w_{01}^{l}   \enspace ....  \enspace w_{0m}^{l}    \\\\ w_{10}^{l}  \enspace w_{11}^{l}   \enspace ....  \enspace w_{1m}^{l}   \\\\..... \\\\ w_{n0}^{l}  \enspace w_{n1}^{l}   \enspace ....  \enspace w_{nm}^{l}\end{bmatrix}\) and \(\vec{a}^{\,l-1}=\begin{bmatrix} a_0^{l-1} \\\\ a_1^{l-1}   \\\\..... \\\\ a_m^{l-1} \end{bmatrix}\) and \(\vec{b}^{\,l}=\begin{bmatrix} b_0^{l} \\\\ b_1^{l}   \\\\..... \\\\ b_n^{l} \end{bmatrix}\) and</p>
<p>\(\sigma(\vec{z}^{\,l})= \begin{bmatrix} \sigma (z_0^{l}) \\\\ \sigma (z_1^{l})   \\\\..... \\\\ \sigma (z_n^{l}) \end{bmatrix}\)</p>
<p><strong>Equation 1</strong> details the process of <strong>forward</strong> <strong>propagation</strong> within a single linear layer of a neural network for example <strong>layer</strong> \(\mathbf{l}\) as shown above. In the context of a multilayer perceptron (MLP), which consists of a series of fully connected layers ranging from layer 0 to layer L, the final output can be achieved by continually applying this equation to the input data. Each application of the equation transforms the input, allowing information to flow through the network and ultimately produce the desired output.</p>
<p>This expression is evaluated incrementally through the repeated application of the linear layer.</p>
<p>\(\mathbf{\vec{a}^{\,0}=\sigma(W^0  \enspace \vec{x}+\vec{b}^{\,0})}\)</p>
<p>\(\mathbf{\vec{a}^{\,1}=\sigma(W^1  \enspace \vec{a}^{\,0}+\vec{b}^{\,1})}\)</p>
<p>……………………</p>
<p>\(\mathbf{\vec{a}^{\,L}=\sigma(W^L  \enspace \vec{a}^{\,L-1}+\vec{b}^{\,L})}\)</p>
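<p>This repeated application of equation \(\mathbf{(1)}\) can be sketched in NumPy. The layer sizes and the random initialization below are arbitrary, chosen only for illustration:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# layer sizes: 3 inputs, one hidden layer of 4, and 2 outputs (arbitrary choice)
sizes = [3, 4, 2]
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

def forward(x):
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # equation (1): a^l = sigma(W^l a^(l-1) + b^l)
    return a

out = forward(np.array([0.5, -1.0, 2.0]))
print(out.shape)  # (2,)
</code></pre>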
<h2 id="heading-backward-propagation">Backward Propagation</h2>
<h3 id="heading-loss-function-and-training">Loss function and training</h3>
<p>In the context of neural networks, let \(y\) represent the output generated by the model, while \( \hat{y} \) signifies the actual or ground truth value that we aim to predict. To quantify the difference between these two values, we utilize a common metric known as the mean squared error <strong>(MSE)</strong>. This loss function is expressed mathematically as \(\mathbf{ (y - \hat{y})^2 }\), which calculates the square of the difference between the predicted output and the true value. For simplicity, we select MSE as our loss function for training the neural network.</p>
<p>\(L= \mathbf{\frac{1}{2} \sum_{i} (y_i-\hat{y}_i)^2}\) where the sum runs over all training instances <strong>…………………………………………………………</strong> \(\mathbf{(2)}\)</p>
<p>We can transform each layer's weight matrix, denoted as \(\mathbf{w^l}\), along with its corresponding bias \(\mathbf{b^l}\), into individual vectors. After this conversion, we concatenate the vectors from all layers in sequence, resulting in a single, extensive vector that encompasses all the weights and biases throughout the multilayer perceptron (MLP). These concatenated vectors, one for the weights and one for the biases, serve as a unified representation of the entire model's parameters, facilitating efficient processing and optimization during training.</p>
<p>\(\mathbf{\vec{w}=[ w_{00}^0\enspace w_{01}^0 \enspace ..... w_{00}^1 w_{01}^1\enspace ... \enspace w_{00}^L  w_{01}^L \enspace ..]}\)</p>
<p>\(\mathbf{\vec{b}=[ b_{0}^0\enspace b_{1}^0 \enspace ..... b_{0}^1 b_{1}^1\enspace ... \enspace b_{0}^L  b_{1}^L \enspace ..]}\)</p>
<p>The primary objective of training is to discover the optimal parameters and configurations that will effectively reduce the loss <strong>(equation 2)</strong> to its lowest possible level.</p>
<p>We begin by calculating the gradients of the loss function in relation to the weights and biases of our model. These gradients indicate the direction and rate at which we should adjust the weights and biases to minimize the loss. To refine our model, we update the weights and biases by a value that is proportional to these computed gradients. By repeatedly performing this update process, we progressively move toward the minimum point of the loss function, ultimately leading to improved model performance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768618009518/c0f66472-783b-449b-bb58-ccaf387150a0.png" alt /></p>
<p>The equations for updating weights and biases in gradient descent are</p>
<p>\(\mathbf{\vec{w}=\vec{w}- \lambda \nabla_{\vec{w}} L  }\) where \(\mathbf{\nabla_{\vec{w}} L}\) stacks \(\mathbf{\frac{\partial L}{\partial w_{jk}^l}}\) for all <strong>l, j, k</strong></p>
<p>\(\mathbf{\vec{b}=\vec{b}- \lambda \nabla_{\vec{b}} L  }\) where \(\mathbf{\nabla_{\vec{b}} L}\) stacks \(\mathbf{\frac{\partial L}{\partial b_{j}^l}}\) for all <strong>l, j</strong> <strong>……..………………………………………</strong> \(\mathbf{(3)}\)</p>
<p>The equations for updating individual weights and biases using their partial derivatives are</p>
<p>\(\mathbf{w_{jk}^l=w_{jk}^l- \lambda \frac{\partial L}{\partial w_{jk}^l}  }\) for all <strong>l, j, k</strong></p>
<p>\(\mathbf{b_j^l=b_j^l- \lambda \frac{\partial L}{\partial b_{j}^l}  }\) <strong>………………………………………………………….</strong> \(\mathbf{(4)}\)</p>
<p>Gradient descent is an iterative optimization algorithm that updates the weights and biases of a model to minimize the loss function. It accomplishes this by applying <strong>equation</strong> \(\mathbf{(3)}\) in each iteration, allowing for systematic adjustment of the model parameters based on the calculated error. This method is equivalent to updating each weight and bias individually using its specific partial derivative. By doing so, gradient descent effectively fine-tunes the model, gradually improving its performance with each update.</p>
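<p>The update rule can be illustrated on a toy one-parameter loss \(L(w)=(w-3)^2\), whose gradient is \(2(w-3)\), a deliberately simple stand-in for the network loss:</p>
<pre><code class="lang-python"># Gradient descent on the toy loss L(w) = (w - 3)^2, so dL/dw = 2(w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial parameter value (made up)
lam = 0.1      # learning rate
for _ in range(100):
    w = w - lam * grad(w)   # equation (3)-style update

print(round(w, 4))  # prints 3.0, the minimizer of the loss
</code></pre>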
<h3 id="heading-back-propagation-with-single-neuron-par-layer">Back propagation with a single neuron per layer</h3>
<p>We will assess <strong>back propagation</strong> on a simple perceptron that consists of only one neuron per layer. This simplification allows us to avoid using subscripts for individual <strong>weights</strong> and <strong>biases</strong>, as there is only one weight and one bias between two consecutive layers. We will use superscripts to indicate the layer. We will employ Mean Squared Error (MSE) as our loss function and will focus on a single input-output pair, denoted as \(\mathbf{x_i}\) and \(\mathbf{y_i}\). The total loss <strong>L</strong>, which represents the summation across all training data instances, can be easily derived by applying the same steps repeatedly.</p>
<p>Forward propagation for an arbitrary layer <strong>l</strong> is defined as</p>
<p>\(\mathbf{z^l=w^l a^{l-1} + b^l}\) and \(\mathbf{a^l=\sigma(z^l)}\) <strong>……………………………………………..</strong> \(\mathbf{(5)}\)</p>
<p>Loss function for a given \(\mathbf{(x_i,y_i)}\) is \(L= \mathbf{\frac{1}{2}  (a^L-\hat{y_i})^2}\) where <strong>L</strong> is last layer <strong>…………………………</strong> \(\mathbf{(6)}\)</p>
<h3 id="heading-partial-derivative-of-loss-with-respect-to-the-weights-w-for-the-last-layer-l">Partial derivative of loss with respect to the weights (w) for the last layer, L</h3>
<p>\(\mathbf{\frac{\partial L}{\partial w^L}= \frac{\partial L}{\partial z^L}\frac{\partial z^L}{\partial w^L}= \frac{\partial L}{\partial z^L}  a^{L-1}}\) ………………………………………. \(\mathbf{(7)}\)</p>
<p>\(\mathbf{\frac{\partial z^L}{\partial w^L}=  a^{L-1}}\) <strong>and …………………………………</strong> \(\mathbf{(8)}\)</p>
<p>\(\mathbf{\frac{\partial L}{\partial z^L}=  \frac{\partial L}{\partial a^L}   \frac{\partial a^L}{\partial z^L}}\) Using the chain rule for partial derivatives ………………………….. \(\mathbf{(9)}\)</p>
<p>\(\mathbf{\frac{\partial L}{\partial a^L}=  (a^L-\hat{y_i}) }\) ………………………………………….. \(\mathbf{(10)}\)</p>
<p>\(\mathbf{\frac{\partial a^L}{\partial z^L}=  \frac{\partial \sigma(z^L)}{\partial z^L}   }\) ………………………………………………..…. \(\mathbf{(11)}\)</p>
<p>The backpropagation algorithm is a powerful tool used in training artificial neural networks, and its effectiveness is not limited to just the <strong>sigmoid</strong> \(\mathbf{\sigma}\) activation function. In fact, back propagation can work well with a variety of activation functions, including <strong>ReLU</strong> <strong>(Rectified Linear Unit)</strong>, <strong>tanh (hyperbolic tangent),</strong> and <strong>softmax</strong>, among others. Each of these functions has unique properties that can lead to improved performance in different contexts. For instance, while the sigmoid function is helpful for binary classification tasks, the ReLU function is often preferred in deeper networks because it mitigates issues related to vanishing gradients, allowing for faster convergence. The flexibility of back propagation in accommodating multiple activation functions enables it to be applied across a wide range of neural network architectures and applications, enhancing its versatility and effectiveness in machine learning.</p>
<p>These functions possess the characteristic of maintaining a nonzero derivative throughout their entire domain, which enables the <strong>gradient descent</strong> algorithm to consistently make progress at each iteration. This ensures that the optimization process can effectively navigate the function landscape without encountering flat regions, allowing for a more fluid and efficient <strong>convergence</strong> toward the optimal solution.</p>
<p>Some popular activation functions and their derivatives are shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768838220580/364e15fe-d540-4296-bd69-5dc645e58318.png" alt class="image--center mx-auto" /></p>
<p>For our current use case we have picked <strong>sigmoid</strong> activation function.</p>
<p>Let's calculate the derivative of the <strong>sigmoid</strong> function for a variable x</p>
<p>\(\mathbf{\frac{d \sigma(x)}{ dx}=\frac{d( (1+e^{-x} )^{-1})}{dx} =-(1+e^{-x} )^{-2}  \frac{d}{dx}(1+e^{-x})=-(1+e^{-x} )^{-2}  (-e^{-x})=\frac{e^{-x}}{ (1+e^{-x})^{2} }=\sigma(x)(1-\sigma(x))}\) now replacing x with \(z^L\)</p>
<p>\(\mathbf{\frac{\partial a^L}{\partial z^L}=  \frac{\partial \sigma(z^L)}{\partial z^L}   =\sigma(z^L)(1-\sigma(z^L))=a^L(1-a^L)}\) …………………………… \(\mathbf{(12)}\)</p>
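<p>The identity \(\sigma'(x)=\sigma(x)(1-\sigma(x))\) is easy to verify numerically against a finite-difference approximation (a quick sanity check, not part of training):</p>
<pre><code class="lang-python">import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-2.0, 0.0, 1.5])   # arbitrary test points
h = 1e-6

# analytic derivative from equation (12) vs. a central finite difference
analytic = sigmoid(xs) * (1.0 - sigmoid(xs))
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2.0 * h)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
</code></pre>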
<p>Now substituting \(\mathbf{(10)}\) and \(\mathbf{(12)}\) in \(\mathbf{(9)}\) we get</p>
<p>\(\mathbf{\frac{\partial L}{\partial z^L}= (a^L-\hat{y_i})( a^L(1-a^L))  }\) ………………………………………………. \(\mathbf{(13)}\)</p>
<p>Now substituting \(\mathbf{(8)}\) and \(\mathbf{(13)}\) in equation \(\mathbf{(7)}\) we get</p>
<p>\(\mathbf{\frac{\partial L}{\partial w^L}= a^{L-1}(a^L-\hat{y_i})(a^L(1-a^L)) }\) ………………………………………. \(\mathbf{(14)}\)</p>
<h3 id="heading-partial-derivative-of-loss-with-respect-to-the-bias-b-for-the-last-layer-l">Partial derivative of loss with respect to the bias (b) for the last layer, L</h3>
<p>\(\mathbf{\frac{\partial L}{\partial b^L}= \frac{\partial L}{\partial z^L}\frac{\partial z^L}{\partial b^L}= \frac{\partial L}{\partial z^L}\cdot 1}\) ………………………………………………………. \(\mathbf{(15)}\)</p>
<p>so \(\mathbf{\frac{\partial L}{\partial b^L}=(a^L-\hat{y_i})(a^L(1-a^L))}\) …………………………………………. \(\mathbf{(16)}\)</p>
<p>Weights and biases in equation \(\mathbf{(4)} \) get adjusted as shown below using equation \(\mathbf{(14)}\) and \(\mathbf{(16)}\)</p>
<p>\(\mathbf{w^L=w^L- \lambda \frac{\partial L}{\partial w^L}  }\)</p>
<p>\(\mathbf{b^L=b^L- \lambda \frac{\partial L}{\partial b^L}  }\)</p>
<p>Where \(\mathbf{\lambda}\) is the learning rate parameter, which decides how big a step is taken in gradient descent.</p>
<p>After adjusting the weights and biases in the last layer <strong>L</strong> of the neural network, we begin the process of back propagation. This involves systematically moving backwards through each preceding layer, making necessary adjustments to both the weights and biases. We continue this process layer by layer until we reach the very first layer of the network, ensuring that each component is fine-tuned to improve the overall performance of the model.</p>
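<p>Putting equations (5), (13), (14), and (16) together, the full loop for a tiny two-layer chain with one neuron per layer fits in a short script. The input, target, initial parameters, and learning rate below are made up for illustration, and \(\hat{y}\) denotes the ground truth, following the convention used above:</p>
<pre><code class="lang-python">import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One neuron per layer: x, then layer 1, then layer 2 (the final layer L)
x, y_hat = 1.0, 0.25             # single training pair (values made up)
w1, b1, w2, b2 = 0.5, 0.0, -0.3, 0.1
lam = 0.5                        # learning rate

for _ in range(2000):
    # forward pass, equation (5)
    a1 = sigmoid(w1 * x + b1)
    a2 = sigmoid(w2 * a1 + b2)
    # backward pass: dL/dz^L, equation (13)
    dz2 = (a2 - y_hat) * a2 * (1.0 - a2)
    # chain one layer back, computed before w2 is updated
    dz1 = dz2 * w2 * a1 * (1.0 - a1)
    # parameter updates, equations (14) and (16) and their layer-1 analogues
    w2 -= lam * dz2 * a1
    b2 -= lam * dz2
    w1 -= lam * dz1 * x
    b1 -= lam * dz1

pred = sigmoid(w2 * sigmoid(w1 * x + b1) + b2)
print(round(pred, 3))  # 0.25, matching the target
</code></pre>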
]]></content:encoded></item><item><title><![CDATA[Importance of dot product in machine learning]]></title><description><![CDATA[In the most basic type of machine learning model, the output is calculated by taking a weighted sum of the input features. Each input is multiplied by a corresponding weight that represents its importance in the model. Once this weighted sum is obtai...]]></description><link>https://path2ml.com/importance-of-dot-product-in-machine-learning</link><guid isPermaLink="true">https://path2ml.com/importance-of-dot-product-in-machine-learning</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[machine-learning-math]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sun, 11 Jan 2026 22:40:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768163651460/23d5de7b-fa03-4188-83bb-92b7aecde0b8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the most basic type of machine learning model, the output is calculated by taking a weighted sum of the input features. Each input is multiplied by a corresponding weight that represents its importance in the model. Once this weighted sum is obtained, a bias term is added to the result. The bias allows the model to adjust the output independently of the input values, helping to improve accuracy and fit the model better to the data. This fundamental approach serves as the foundation for more complex machine learning algorithms.</p>
<p>For a single input instance defined as \(x=[x_0 \enspace x_1]\), where \(x_0 \) and \(x_1\) are features in the data set, the output of the model is defined as</p>
<p>\( y=w_0x_0+w_1x_1+b\) where \(w_0,w_1\) are weights and \(b\) is bias.</p>
<p>In situations where there are multiple features and weights, we utilize dot product notation for representation. The <strong>dot product</strong> of two vectors is the sum of the element-wise products of their components, which facilitates the analysis of their relationships.</p>
<p>Let's say we have two vectors, vector \(x= \\\begin{bmatrix} x_0  \\ x_1  \\ \vdots \\ x_n \end{bmatrix}\\\) and vector \(w= \\\begin{bmatrix} w_0  \\ w_1  \\ \vdots \\w_n \end{bmatrix}\\\); their dot product is then</p>
<p>\(w.x=w_0x_0+w_1x_1+.....+w_nx_n\) In other words, the dot product of two vectors is the sum of the products of their corresponding elements.</p>
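<p>In NumPy this is <code>np.dot</code>; the weights, input, and bias below are made-up values for illustration:</p>
<pre><code class="lang-python">import numpy as np

w = np.array([0.2, -0.5, 1.0])   # weights (made-up values)
x = np.array([3.0, 2.0, 1.0])    # one input instance (made-up values)
b = 0.1                          # bias

# the model output: sum of element-wise products of w and x, plus the bias
y = np.dot(w, x) + b
print(round(float(y), 4))        # 0.7
</code></pre>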
<p>Let's assume a machine learning model that is designed to predict a specific target value, represented as \(y\). However, instead of achieving this target exactly, the model produces an output, denoted as \(\hat{y}\), which may differ from what we expect. To evaluate the model's performance and understand how accurately it is making predictions, we need to calculate the error, defined as the difference between the desired target value \(y\) and the actual output \(\hat{y}\). To measure this discrepancy, we employ a statistical method known as mean squared error, which quantifies the average of the squared differences between the predicted and target values. This allows us to gain insight into the model's accuracy and areas for improvement.</p>
<p>squared error \(e^2=(y-\hat{y})^2\)</p>
<p>The total error across the entire training dataset is determined by calculating the difference between the output vector and the ground truth vector. Each element of this difference is squared, and the resulting squared values are summed to obtain the total error. This procedure is equivalent to computing the dot product of the difference vector with itself. This operation represents the <strong>squared magnitude</strong>, or length, known as the <strong>L2 norm</strong> of a vector, which is defined as the <strong>dot product of the vector with itself</strong>.</p>
<p>\(E^2=(Y-\hat{Y}).(Y-\hat{Y})=(Y-\hat{Y})^T(Y-\hat{Y})\). Writing the difference vector as \(e=Y-\hat{Y}= \\\begin{bmatrix} e_0  \\ e_1  \\ \vdots \\ e_n \end{bmatrix}\\\), this becomes</p>
<p>\(E^2=e.e=e_0^2+e_1^2+\cdots+e_n^2\)</p>
<p>The <strong>L2 norm of a vector</strong>, often referred to as the Euclidean norm, is a mathematical concept that measures the length or magnitude of a vector in a multi-dimensional space. It is calculated as the square root of the sum of the squares of its components.</p>
<p>The L2 norm of a vector \(V\) is denoted \(||V||\) and is defined as \(||V||=\sqrt{V^TV}=\sqrt{v_0^2+v_1^2+\cdots+v_n^2}\)</p>
<p>In a machine learning model with an output vector \(\hat{Y}\) and a target vector \(Y\), the error is defined as the magnitude or L2 norm of the difference between these vectors.</p>
<p>\(E=\sqrt{(Y-\hat{Y}).(Y-\hat{Y})}=\sqrt{(Y-\hat{Y})^T(Y-\hat{Y})}\)</p>
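<p>The same error can be computed numerically; a minimal sketch with made-up target and output vectors:</p>

```python
import numpy as np

# hypothetical target and model-output vectors
Y     = np.array([1.0, 2.0, 3.0])
Y_hat = np.array([1.5, 1.5, 2.0])

diff = Y - Y_hat                 # difference vector
E_sq = np.dot(diff, diff)        # squared L2 norm: sum of squared differences
E    = np.sqrt(E_sq)             # error as the L2 norm of the difference

# np.linalg.norm computes the same L2 norm directly
same = np.isclose(E, np.linalg.norm(diff))
```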
<h3 id="heading-feature-similarity-using-dot-product">Feature similarity using dot product</h3>
<p>Let's take the example below, where each document is a sentence and the words eligible for the feature vector are highlighted in bold. The first element of the feature vector counts occurrences of the word <strong>home</strong>, and the second counts occurrences of <strong>office</strong>.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>id</td><td>document</td><td>feature vector</td></tr>
</thead>
<tbody>
<tr>
<td>\(d_0\)</td><td>I can't wait to go <strong>home</strong> after a long vacation.</td><td>\([1,0]\)</td></tr>
<tr>
<td>\(d_1\)</td><td>I have the flexibility to work from my <strong>home</strong> <strong>office</strong> three days a week, but I still prefer going into the main <strong>office</strong> for meetings</td><td>\([1,2]\)</td></tr>
<tr>
<td>\(d_2\)</td><td>In his new remote setup, his <strong>home</strong> had to function simultaneously as both a quiet <strong>home</strong> environment and a fully operational <strong>home</strong> office, blending the comfort of <strong>home</strong> with the structure of the <strong>office</strong> until he couldn't tell where the <strong>home</strong> ended and the <strong>office</strong> began</td><td>\([5,2]\)</td></tr>
<tr>
<td>\(d_3\)</td><td>I need to stop by the main <strong>office</strong> to pick up my new employee badge before the meeting starts.</td><td>\([0,1]\)</td></tr>
</tbody>
</table>
</div><p>We have a collection of documents, each represented by its own feature vector. To evaluate the similarity between any two documents, we need to assess the similarity between their corresponding <strong>feature vectors</strong>. In this section, we will explore how the dot product of a pair of vectors can serve as a measure of their <strong>similarity</strong>.</p>
<p>Feature vectors corresponding to \(d_0\) and \(d_3\) are \(\\\begin{bmatrix} 1 \\ 0\end{bmatrix}\\\) and \(\\\begin{bmatrix} 0\\ 1\end{bmatrix}\\\); their dot product is \(\\\begin{bmatrix} 1 \\ 0\end{bmatrix}\\.\\\begin{bmatrix} 0 \\ 1\end{bmatrix}\\=1  . 0+0 . 1=0\). This low score aligns with our intuition that there is no common word of interest between the documents, indicating they are very dissimilar.</p>
<p>Feature vectors corresponding to \(d_1\) and \(d_2\) are \(\\\begin{bmatrix} 1 \\ 2\end{bmatrix}\\\) and \(\\\begin{bmatrix} 5 \\ 2\end{bmatrix}\\\) their dot product will be</p>
<p>\(\\\begin{bmatrix} 1 \\ 2\end{bmatrix}\\.\\\begin{bmatrix} 5 \\ 2\end{bmatrix}\\=1  . 5+2 . 2=9\) .</p>
<p>This high score aligns with our intuition that the documents share many common words of interest and exhibit similarities. Therefore, we can conclude that <strong>similar vectors</strong> produce <strong>larger dot products</strong>, while <strong>dissimilar vectors</strong> yield dot products that are close to zero.</p>
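<p>Using the feature vectors from the table above, the similarity scores can be reproduced in a few lines of NumPy:</p>

```python
import numpy as np

# feature vectors [count of "home", count of "office"] from the table
d0 = np.array([1, 0])
d1 = np.array([1, 2])
d2 = np.array([5, 2])
d3 = np.array([0, 1])

s_dissimilar = np.dot(d0, d3)  # 0: no shared words of interest
s_similar    = np.dot(d1, d2)  # 9: many shared words of interest
```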
]]></content:encoded></item><item><title><![CDATA[Algorithms in Amazon SageMaker AI]]></title><description><![CDATA[Amazon SageMaker AI is a fully managed machine learning service provided by AWS that enables developers and data scientists to build, train, and deploy machine learning models at scale.
Amazon SageMaker AI is a cloud-based platform that simplifies th...]]></description><link>https://path2ml.com/algorithms-in-amazon-sagemaker-ai</link><guid isPermaLink="true">https://path2ml.com/algorithms-in-amazon-sagemaker-ai</guid><category><![CDATA[sagemaker ai]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Fri, 10 Oct 2025 22:48:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760135635756/5dfbebcc-79f4-4b7d-ac7e-65d3fcd4630c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Amazon SageMaker AI</strong> is a fully managed machine learning service provided by AWS that enables developers and data scientists to build, train, and deploy machine learning models at scale.</p>
<p>Amazon SageMaker AI is a cloud-based platform that simplifies the machine learning workflow by providing:</p>
<ul>
<li><p><strong>Pre-built algorithms</strong> for various ML tasks.</p>
</li>
<li><p><strong>Managed infrastructure</strong> for training and deployment.</p>
</li>
<li><p><strong>Integrated tools</strong> for data preprocessing, model tuning, and monitoring.</p>
</li>
</ul>
<p>It supports a wide range of <strong>machine learning algorithms</strong> across different categories:</p>
<h3 id="heading-types-of-algorithm-supported-in-sagemaker-ai-in-different-categories-are">Types of algorithms supported in SageMaker AI, by category</h3>
<h2 id="heading-time-series"><strong><em>Time-Series</em></strong></h2>
<p>SageMaker AI provides algorithms that are tailored to the analysis of time-series data for forecasting product demand, server loads, webpage requests, and more.</p>
<ol>
<li><h3 id="heading-deepar">DeepAR</h3>
<p> The Amazon SageMaker AI DeepAR forecasting algorithm is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN). Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS), fit a single model to each individual time series.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Forecast scalar (1D) time-series data using RNNs.</p>
</li>
<li><p><strong>Use Cases</strong>: Demand forecasting, server load prediction, web traffic estimation.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Learns across multiple related time series.</p>
</li>
<li><p>Outperforms classical methods like ARIMA and ETS.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<h2 id="heading-text"><strong><em>Text</em></strong></h2>
<p>SageMaker AI provides algorithms that are tailored to the analysis of textual documents used in natural language processing, document classification or summarization, topic modeling or classification, and language transcription or translation.</p>
<ol>
<li><h3 id="heading-blazingtext">BlazingText</h3>
<p> BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, machine translation, etc. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Word embeddings (Word2Vec) and text classification.</p>
</li>
<li><p><strong>Use Cases</strong>: Sentiment analysis, document classification, search ranking.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Highly optimized for speed and scalability.</p>
</li>
<li><p>Supports multi-threading and GPU acceleration</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="2">
<li><h3 id="heading-latent-dirichlet-allocation-lda">Latent Dirichlet Allocation (LDA)</h3>
<p> Latent Dirichlet Allocation (LDA) algorithm is an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of distinct categories. LDA is most commonly used to discover a user-specified number of topics shared by documents within a text corpus. Here each observation is a document, the features are the presence (or occurrence count) of each word, and the categories are the topics. Since the method is unsupervised, the topics are not specified up front, and are not guaranteed to align with how a human may naturally categorize documents. The topics are learned as a probability distribution over the words that occur in each document. Each document, in turn, is described as a mixture of topics.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Topic modeling.</p>
</li>
<li><p><strong>Use Cases</strong>: Discovering themes in document corpora.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Learns topics as distributions over words.</p>
</li>
<li><p>CPU-only, single-instance training.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="3">
<li><h3 id="heading-ntm">NTM</h3>
<p> NTM is an unsupervised learning algorithm that is used to organize a corpus of documents into <em>topics</em> that contain word groupings based on their statistical distribution. Documents that contain frequent occurrences of words such as "bike", "car", "train", "mileage", and "speed" are likely to share a topic on "transportation", for example. Topic modeling provides a way to visualize the contents of a large document corpus in terms of the learned topics.</p>
<p> Although you can use both the Amazon SageMaker AI <strong>NTM</strong> and <strong>LDA</strong> algorithms for topic modeling, they are distinct algorithms and can be expected to produce different results on the same input data. From a hardware and compute standpoint, <strong>SageMaker NTM is more flexible than LDA and can scale better</strong>: NTM can run on CPU and GPU and can be parallelized across multiple GPU instances, whereas LDA only supports single-instance CPU training.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Topic modeling using neural networks.</p>
</li>
<li><p><strong>Use Cases</strong>: Visualizing document clusters by topic.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Scales better than LDA.</p>
</li>
<li><p>Supports GPU and multi-instance training.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="4">
<li><h3 id="heading-object2vec">Object2Vec</h3>
<p> Object2Vec algorithm is a general-purpose neural embedding algorithm that is highly customizable. It can learn low-dimensional dense embeddings of high-dimensional objects. The embeddings are learned in a way that preserves the semantics of the relationship between pairs of objects in the original space in the embedding space. You can use the learned embeddings to efficiently compute nearest neighbors of objects and to visualize natural clusters of related objects in low-dimensional space, for example. You can also use the embeddings as features of the corresponding objects in downstream supervised tasks, such as classification or regression. Object2Vec generalizes the well-known Word2Vec embedding technique for words that is optimized in the SageMaker AI <a target="_blank" href="https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html">BlazingText algorithm</a>. For a blog post that discusses how to apply Object2Vec to some practical use cases, see <a target="_blank" href="https://aws.amazon.com/blogs/machine-learning/introduction-to-amazon-sagemaker-object2vec/">Introduction to Amazon SageMaker AI Object2Vec</a>.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Learn embeddings for high-dimensional objects.</p>
</li>
<li><p><strong>Use Cases</strong>: Similarity search, clustering, feature engineering.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Generalizes Word2Vec for arbitrary objects.</p>
</li>
<li><p>Useful for downstream classification/regression.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="5">
<li><h3 id="heading-sequence-to-sequence">Sequence to Sequence</h3>
<p> Sequence to Sequence is a supervised learning algorithm where the input is a sequence of tokens (for example, text, audio) and the output generated is another sequence of tokens. Example applications include: machine translation (input a sentence from one language and predict what that sentence would be in another language), text summarization (input a longer string of words and predict a shorter string of words that is a summary), speech-to-text (audio clips converted into output sentences in tokens). Recently, problems in this domain have been successfully modeled with deep neural networks that show a significant performance boost over previous methodologies. Amazon SageMaker AI seq2seq uses Recurrent Neural Networks (RNNs) and Convolutional Neural Network (CNN) models with attention as encoder-decoder architectures.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Map input sequences to output sequences.</p>
</li>
<li><p><strong>Use Cases</strong>: Machine translation, summarization, speech-to-text.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Uses RNNs and CNNs with attention mechanisms.</p>
</li>
<li><p>Encoder-decoder architecture.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="6">
<li><h3 id="heading-text-classification-tensorflow">Text Classification TensorFlow</h3>
<p> Text Classification - TensorFlow algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the <a target="_blank" href="https://tfhub.dev/">TensorFlow Hub</a>. Use transfer learning to fine-tune one of the available pretrained models on your own dataset, even if a large amount of text data is not available. The text classification algorithm takes a text string as input and outputs a probability for each of the class labels. Training datasets must be in CSV format.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Classify text using pretrained models.</p>
</li>
<li><p><strong>Use Cases</strong>: Spam detection, sentiment analysis.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Transfer learning via TensorFlow Hub.</p>
</li>
<li><p>Requires CSV input format.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<h2 id="heading-tabular"><em>Tabular</em></h2>
<ol>
<li><h3 id="heading-autogluon-tabular">AutoGluon-Tabular</h3>
<p> <a target="_blank" href="https://auto.gluon.ai/stable/index.html">AutoGluon-Tabular</a> is a popular open-source AutoML framework that trains highly accurate machine learning models on an unprocessed tabular dataset. Unlike existing AutoML frameworks that primarily focus on model and hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers.</p>
<ul>
<li><p><strong>Type</strong>: AutoML (Supervised)</p>
</li>
<li><p><strong>Purpose</strong>: Automatically train and ensemble models.</p>
</li>
<li><p><strong>Use Cases</strong>: Predictive modeling on structured data.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Stacks multiple models.</p>
</li>
<li><p>Minimal tuning required</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="2">
<li><h3 id="heading-catboost">CatBoost</h3>
<p> <a target="_blank" href="https://catboost.ai/">CatBoost</a> is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models.</p>
<p> CatBoost introduces two critical algorithmic advances to GBDT:</p>
<ol>
<li><p>The implementation of ordered boosting, a permutation-driven alternative to the classic algorithm</p>
</li>
<li><p>An innovative algorithm for processing categorical features</p>
</li>
</ol>
</li>
</ol>
<p>    SageMaker AI CatBoost currently only trains using CPUs. CatBoost is a memory-bound (as opposed to compute-bound) algorithm.</p>
<ul>
<li><p><strong>Type</strong>: Supervised (GBDT)</p>
</li>
<li><p><strong>Purpose</strong>: Classification and regression.</p>
</li>
<li><p><strong>Use Cases</strong>: Credit scoring, churn prediction.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Handles categorical features natively.</p>
</li>
<li><p>CPU-only, memory-bound.</p>
</li>
</ul>
</li>
</ul>
<ol start="3">
<li><h3 id="heading-factorization-machines">Factorization Machines</h3>
<p> The Factorization Machines algorithm is a general-purpose supervised learning algorithm that you can use for both classification and regression tasks. It is an extension of a linear model that is designed to capture interactions between features within high dimensional sparse datasets economically. For example, in a click prediction system, the Factorization Machines model can capture click rate patterns observed when ads from a certain ad-category are placed on pages from a certain page-category. Factorization machines are a good choice for tasks dealing with high dimensional sparse datasets, such as click prediction and item recommendation.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Capture feature interactions in sparse data.</p>
</li>
<li><p><strong>Use Cases</strong>: Click prediction, recommendation systems.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Efficient for high-dimensional sparse datasets.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
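<p>To make the "economical interactions" point concrete, the standard second-order factorization machine model (the general FM formulation, not necessarily SageMaker's exact implementation) can be written as:</p>
<p>\(\hat{y}(x)=w_0+\sum_{i=1}^{n} w_i x_i+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\)</p>
<p>Each feature \(i\) is assigned a low-dimensional latent vector \(v_i\), so the interaction weight for a pair of features is the dot product \(\langle v_i,v_j\rangle\) rather than an independent parameter. Factorizing the pairwise weights this way is what keeps the model tractable on high-dimensional sparse data.</p>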
<ol start="4">
<li><h3 id="heading-k-nearest-neighbors-k-nn">k-nearest neighbors (k-NN)</h3>
<p> k-nearest neighbors (k-NN) algorithm is an index-based algorithm. It uses a non-parametric method for classification or regression. For classification problems, the algorithm queries the <em>k</em> points that are closest to the sample point and returns the most frequently used label of their class as the predicted label. For regression problems, the algorithm queries the <em>k</em> closest points to the sample point and returns the average of their feature values as the predicted value.</p>
<p> Training with the k-NN algorithm has three steps: sampling, dimension reduction, and index building. Sampling reduces the size of the initial dataset so that it fits into memory. For dimension reduction, the algorithm decreases the feature dimension of the data to reduce the footprint of the k-NN model in memory and inference latency. Two dimension reduction methods are provided: random projection and the fast Johnson-Lindenstrauss transform. Typically, you use dimension reduction for high-dimensional (d &gt; 1000) datasets to avoid the “curse of dimensionality” that troubles the statistical analysis of data that becomes sparse as dimensionality increases. The main objective of k-NN's training is to construct the index. The index enables efficient lookups of distances between points whose values or class labels have not yet been determined and the k nearest points to use for inference.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Classification and regression via similarity.</p>
</li>
<li><p><strong>Use Cases</strong>: Recommendation systems, anomaly detection.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Index-based lookup.</p>
</li>
<li><p>Includes sampling and dimensionality reduction.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
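<p>The prediction rule itself is simple to sketch. The following toy NumPy example (brute-force distance search on made-up data, without the sampling, dimension reduction, or index building SageMaker adds) shows the classification case:</p>

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # most frequent label wins

# made-up 2D training data: two well-separated classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([0.95, 1.0]))  # predicts class 1
```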
<ol start="5">
<li><h3 id="heading-lightgbm">LightGBM</h3>
<p> <a target="_blank" href="https://lightgbm.readthedocs.io/en/latest/">LightGBM</a> is a popular and efficient open-source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques to significantly improve the efficiency and scalability of conventional GBDT.</p>
<ul>
<li><p><strong>Type</strong>: Supervised (GBDT)</p>
</li>
<li><p><strong>Purpose</strong>: Classification and regression.</p>
</li>
<li><p><strong>Use Cases</strong>: Tabular modeling, ranking.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Efficient and scalable.</p>
</li>
<li><p>Supports large datasets.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="6">
<li><h3 id="heading-linear-learner-algorithm">Linear learner algorithm</h3>
<p> The Amazon SageMaker AI linear learner algorithm provides a solution for both classification and regression problems. The linear learner algorithm supports both <code>recordIO-wrapped protobuf</code> and <code>CSV</code> formats.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Linear models for classification/regression.</p>
</li>
<li><p><strong>Use Cases</strong>: Binary classification, regression tasks.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Fast training.</p>
</li>
<li><p>Supports CSV and RecordIO formats.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="7">
<li><h3 id="heading-tabtransformer">TabTransformer</h3>
<p> <a target="_blank" href="https://arxiv.org/abs/2012.06678">TabTransformer</a> is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Deep learning for tabular data.</p>
</li>
<li><p><strong>Use Cases</strong>: Predictive modeling with categorical features.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Uses Transformer architecture.</p>
</li>
<li><p>Robust to missing/noisy data.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="8">
<li><h3 id="heading-xgboost">XGBoost</h3>
<p> The <a target="_blank" href="https://github.com/dmlc/xgboost">XGBoost</a> (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that tries to accurately predict a target variable by combining multiple estimates from a set of simpler models. The XGBoost algorithm performs well in machine learning competitions for the following reasons:</p>
<ul>
<li><p>Its robust handling of a variety of data types, relationships, and distributions.</p>
</li>
<li><p>The variety of hyperparameters that you can fine-tune.</p>
</li>
</ul>
</li>
</ol>
<p>    You can use XGBoost for regression, classification (binary and multiclass), and ranking problems.</p>
<ul>
<li><p><strong>Type</strong>: Supervised (GBDT)</p>
</li>
<li><p><strong>Purpose</strong>: Classification, regression, ranking.</p>
</li>
<li><p><strong>Use Cases</strong>: ML competitions, structured data modeling.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Highly tunable.</p>
</li>
<li><p>Handles various data types and distributions.</p>
</li>
</ul>
</li>
</ul>
<h2 id="heading-unsupervised"><em>Unsupervised</em></h2>
<ol>
<li><h3 id="heading-ip-insights">IP insights</h3>
<p> Amazon SageMaker AI IP Insights is an unsupervised learning algorithm that learns the usage patterns for IPv4 addresses. It is designed to capture associations between IPv4 addresses and various entities, such as user IDs or account numbers. You can use it to identify a user attempting to log into a web service from an anomalous IP address, for example. Or you can use it to identify an account that is attempting to create computing resources from an unusual IP address. Trained IP Insight models can be hosted at an endpoint for making real-time predictions or used for processing batch transforms.</p>
<p> SageMaker AI IP Insights ingests historical data as (entity, IPv4 Address) pairs and learns the IP usage patterns of each entity. When queried with an (entity, IPv4 Address) event, a SageMaker AI IP Insights model returns a score that infers how anomalous the pattern of the event is. For example, when a user attempts to log in from an IP address, if the IP Insights score is high enough, a web login server might decide to trigger a multi-factor authentication system. In more advanced solutions, you can feed the IP Insights score into another machine learning model. For example, you can combine the IP Insights score with other features to rank the findings of another security system, such as those from <a target="_blank" href="https://docs.aws.amazon.com/guardduty/latest/ug/what-is-guardduty.html">Amazon GuardDuty</a>.</p>
<p> The SageMaker AI IP Insights algorithm can also learn vector representations of IP addresses, known as <em>embeddings</em>. You can use vector-encoded embeddings as features in downstream machine learning tasks that use the information observed in the IP addresses. For example, you can use them in tasks such as measuring similarities between IP addresses in clustering and visualization tasks.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Detect anomalous IP usage patterns.</p>
</li>
<li><p><strong>Use Cases</strong>: Fraud detection, security monitoring.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Learns entity-IP associations.</p>
</li>
<li><p>Outputs anomaly scores.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="2">
<li><h3 id="heading-k-means">K-means</h3>
<p> K-means is an unsupervised learning algorithm. It attempts to find discrete groupings within data, where members of a group are as similar as possible to one another and as different as possible from members of other groups. You define the attributes that you want the algorithm to use to determine similarity. The k-means algorithm expects tabular data, where rows represent the observations that you want to cluster, and the columns represent attributes of the observations. The <em>n</em> attributes in each row represent a point in <em>n</em>-dimensional space. The Euclidean distance between these points represents the similarity of the corresponding observations.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Clustering.</p>
</li>
<li><p><strong>Use Cases</strong>: Customer segmentation, pattern discovery.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Uses Euclidean distance.</p>
</li>
<li><p>Requires tabular data.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
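<p>The assign-then-update loop at the heart of k-means can be sketched in a few lines of NumPy (Lloyd's algorithm on made-up data; SageMaker's implementation is a scalable streaming variant, not this simple loop):</p>

```python
import numpy as np

def kmeans(X, k=2, iters=10, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each observation to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # move each centroid to the mean of the points assigned to it
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# made-up data with two obvious groups
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels, centers = kmeans(X)
```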
<ol start="3">
<li><h3 id="heading-pca">PCA</h3>
<p> PCA is an unsupervised machine learning algorithm that attempts to reduce the dimensionality (number of features) within a dataset while still retaining as much information as possible. This is done by finding a new set of features called <em>components</em>, which are composites of the original features that are uncorrelated with one another. They are also constrained so that the first component accounts for the largest possible variability in the data, the second component the second most variability, and so on. In Amazon SageMaker AI, PCA operates in two modes, depending on the scenario:</p>
<ul>
<li><p><strong>regular</strong>: For datasets with sparse data and a moderate number of observations and features.</p>
</li>
<li><p><strong>randomized</strong>: For datasets with both a large number of observations and features. This mode uses an approximation algorithm.</p>
</li>
</ul>
</li>
</ol>
<p>    PCA uses tabular data.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Dimensionality reduction.</p>
</li>
<li><p><strong>Use Cases</strong>: Visualization, preprocessing.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Regular and randomized modes.</p>
</li>
<li><p>Works on tabular data.</p>
</li>
</ul>
</li>
</ul>
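<p>The core computation behind exact ("regular"-style) PCA can be sketched with NumPy: center the data, compute the covariance matrix, and take its eigendecomposition. The synthetic data below is made up, with one dominant direction of variance:</p>

```python
import numpy as np

# synthetic data: second feature is ~3x the first, so one direction
# carries almost all of the variance
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 3.0 * t + 0.1 * rng.normal(size=(200, 1))])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # sort components by variance
components = eigvecs[:, order]

# project onto the first component: a 1D representation keeping most variance
X_reduced = Xc @ components[:, :1]
explained = eigvals[order][0] / eigvals.sum()
```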
<ol start="4">
<li><h3 id="heading-random-cut-forest-rcf">Random Cut Forest (RCF)</h3>
<p> Amazon SageMaker AI Random Cut Forest (RCF) is an unsupervised algorithm for detecting anomalous data points within a data set. These are observations which diverge from otherwise well-structured or patterned data. Anomalies can manifest as unexpected spikes in time series data, breaks in periodicity, or unclassifiable data points. With each data point, RCF associates an anomaly score. Low score values indicate that the data point is considered "normal." High values indicate the presence of an anomaly in the data. The definitions of "low" and "high" depend on the application, but common practice suggests that scores beyond three standard deviations from the mean score are considered anomalous.</p>
<ul>
<li><p><strong>Type</strong>: Unsupervised</p>
</li>
<li><p><strong>Purpose</strong>: Anomaly detection.</p>
</li>
<li><p><strong>Use Cases</strong>: Detecting outliers in time-series or structured data.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Assigns anomaly scores.</p>
</li>
<li><p>Suitable for streaming data.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
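<p>RCF itself is a managed algorithm, but the "three standard deviations from the mean score" convention mentioned above is simple to illustrate. The NumPy sketch below thresholds a vector of scores (the scores here are synthetic stand-ins, not real RCF output):</p>

```python
import numpy as np

def flag_anomalies(scores, k=3.0):
    """Flag scores more than k standard deviations above the mean score."""
    scores = np.asarray(scores, dtype=float)
    threshold = scores.mean() + k * scores.std()
    return scores > threshold

# Twenty well-behaved points followed by one obvious spike.
normal = [1.0, 1.1, 0.9, 1.05, 0.95] * 4
scores = normal + [9.0]
flags = flag_anomalies(scores)
print(flags.sum())    # 1: only the spike is flagged
print(flags[-1])      # True
```

<p>In a streaming setting, the mean and standard deviation would be maintained over a sliding window rather than computed over the whole history.</p>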
<h2 id="heading-vision"><em>Vision</em></h2>
<ol>
<li><h3 id="heading-image-classification-mxnet">Image classification-MXNet</h3>
<p> The Amazon SageMaker image classification algorithm is a supervised learning algorithm that supports multi-label classification. It takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network that can be trained from scratch or trained using transfer learning when a large number of training images are not available. Image classification in Amazon SageMaker AI can be run in two modes: full training and transfer learning. In full training mode, the network is initialized with random weights and trained on user data from scratch. In transfer learning mode, the network is initialized with pre-trained weights and just the top fully connected layer is initialized with random weights. Then, the whole network is fine-tuned with new data. In this mode, training can succeed even with a smaller dataset, because the network has already learned general features from its original training data.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Multi-label image classification.</p>
</li>
<li><p><strong>Use Cases</strong>: Object recognition, medical imaging.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Supports full training and transfer learning.</p>
</li>
<li><p>Uses CNNs.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="2">
<li><h3 id="heading-image-classification-tensorflow">Image Classification - TensorFlow</h3>
<p> The Amazon SageMaker Image Classification - TensorFlow algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the <a target="_blank" href="https://tfhub.dev/s?fine-tunable=yes&amp;module-type=image-classification&amp;subtype=module,placeholder&amp;tf-version=tf2">TensorFlow Hub</a>. Use transfer learning to fine-tune one of the available pretrained models on your own dataset, even if a large amount of image data is not available. The image classification algorithm takes an image as input and outputs a probability for each provided class label.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Image classification using pretrained models.</p>
</li>
<li><p><strong>Use Cases</strong>: Visual recognition with limited data.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Transfer learning via TensorFlow Hub.</p>
</li>
<li><p>Outputs class probabilities.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="3">
<li><h3 id="heading-object-detection-mxnet">Object Detection - MXNet</h3>
<p> The Amazon SageMaker AI Object Detection - MXNet algorithm detects and classifies objects in images using a single deep neural network. It is a supervised learning algorithm that takes images as input and identifies all instances of objects within the image scene. Each object is categorized into one of the classes in a specified collection with a confidence score that it belongs to the class. Its location and scale in the image are indicated by a rectangular bounding box. The algorithm uses the <a target="_blank" href="https://arxiv.org/pdf/1512.02325.pdf">Single Shot multibox Detector (SSD)</a> framework and supports two base networks: <a target="_blank" href="https://arxiv.org/pdf/1409.1556.pdf">VGG</a> and <a target="_blank" href="https://arxiv.org/pdf/1603.05027.pdf">ResNet</a>. The network can be trained from scratch, or trained with models that have been pre-trained on the <a target="_blank" href="http://www.image-net.org/">ImageNet</a> dataset.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Detect and classify objects in images.</p>
</li>
<li><p><strong>Use Cases</strong>: Surveillance, autonomous vehicles.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>SSD framework with VGG/ResNet.</p>
</li>
<li><p>Outputs bounding boxes and class scores.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="4">
<li><h3 id="heading-object-detection-tensorflow">Object Detection - TensorFlow</h3>
<p> The Amazon SageMaker AI Object Detection - TensorFlow algorithm is a supervised learning algorithm that supports transfer learning with many pretrained models from the <a target="_blank" href="https://github.com/tensorflow/models">TensorFlow Model Garden</a>. Use transfer learning to fine-tune one of the available pretrained models on your own dataset, even if a large amount of image data is not available. The object detection algorithm takes an image as input and outputs a list of bounding boxes.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Object detection using pretrained models.</p>
</li>
<li><p><strong>Use Cases</strong>: Retail analytics, robotics.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Transfer learning via TensorFlow Model Garden.</p>
</li>
<li><p>Outputs bounding boxes.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<ol start="5">
<li><h3 id="heading-semantic-segmentation">Semantic segmentation</h3>
<p> The SageMaker AI semantic segmentation algorithm provides a fine-grained, pixel-level approach to developing computer vision applications. It tags every pixel in an image with a class label from a predefined set of classes. Tagging is fundamental for understanding scenes, which is critical to an increasing number of computer vision applications, such as self-driving vehicles, medical imaging diagnostics, and robot sensing.</p>
<ul>
<li><p><strong>Type</strong>: Supervised</p>
</li>
<li><p><strong>Purpose</strong>: Pixel-level image classification.</p>
</li>
<li><p><strong>Use Cases</strong>: Medical diagnostics, autonomous driving.</p>
</li>
<li><p><strong>Key Features</strong>:</p>
<ul>
<li><p>Tags each pixel with a class label.</p>
</li>
<li><p>Enables fine-grained scene understanding.</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<h2 id="heading-scenario-to-algorithm-mapping-table"><strong>Scenario-to-Algorithm Mapping Table</strong></h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Scenario</strong></td><td><strong>Algorithm</strong></td><td><strong>Type</strong></td><td><strong>Example Use Case</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Forecasting product demand</td><td>DeepAR</td><td>Supervised (Time-Series)</td><td>Predicting weekly sales</td></tr>
<tr>
<td>Sentiment analysis</td><td>BlazingText</td><td>Supervised (NLP)</td><td>Classifying tweets as positive/negative</td></tr>
<tr>
<td>Topic modeling in documents</td><td>LDA / NTM</td><td>Unsupervised (NLP)</td><td>Discovering themes in news articles</td></tr>
<tr>
<td>Object similarity search</td><td>Object2Vec</td><td>Supervised (Embedding)</td><td>Recommending similar products</td></tr>
<tr>
<td>Machine translation</td><td>Sequence-to-Sequence</td><td>Supervised (NLP)</td><td>Translating English to French</td></tr>
<tr>
<td>Text classification with limited data</td><td>Text Classification (TensorFlow)</td><td>Supervised (Transfer Learning)</td><td>Spam detection in emails</td></tr>
<tr>
<td>Predicting customer churn</td><td>AutoGluon-Tabular</td><td>AutoML (Tabular)</td><td>Churn prediction from customer data</td></tr>
<tr>
<td>Click prediction in sparse data</td><td>Factorization Machines</td><td>Supervised (Tabular)</td><td>Ad click-through rate prediction</td></tr>
<tr>
<td>Fraud detection via IP patterns</td><td>IP Insights</td><td>Unsupervised</td><td>Detecting login anomalies</td></tr>
<tr>
<td>Customer segmentation</td><td>K-Means</td><td>Unsupervised</td><td>Grouping users by behavior</td></tr>
<tr>
<td>Dimensionality reduction</td><td>PCA</td><td>Unsupervised</td><td>Visualizing high-dimensional data</td></tr>
<tr>
<td>Anomaly detection in logs</td><td>Random Cut Forest</td><td>Unsupervised</td><td>Detecting unusual spikes in server logs</td></tr>
<tr>
<td>Image classification</td><td>Image Classification (MXNet / TensorFlow)</td><td>Supervised (Vision)</td><td>Identifying dog breeds</td></tr>
<tr>
<td>Object detection in images</td><td>Object Detection (MXNet / TensorFlow)</td><td>Supervised (Vision)</td><td>Detecting cars in traffic footage</td></tr>
<tr>
<td>Scene understanding</td><td>Semantic Segmentation</td><td>Supervised (Vision)</td><td>Medical image diagnostics</td></tr>
<tr>
<td>Tabular classification with categorical features</td><td>TabTransformer</td><td>Supervised (Tabular)</td><td>Predicting loan defaults</td></tr>
<tr>
<td>Binary classification</td><td>Linear Learner</td><td>Supervised</td><td>Predicting if a transaction is fraudulent</td></tr>
<tr>
<td>High-performance tabular modeling</td><td>CatBoost / LightGBM / XGBoost</td><td>Supervised</td><td>Credit scoring, sales prediction</td></tr>
<tr>
<td>Nearest neighbor search</td><td>k-NN</td><td>Supervised</td><td>Recommending similar users</td></tr>
</tbody>
</table>
</div><p>The content above is derived from <a target="_blank" href="https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html">https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Seq2Seq Encoder-Decoder Model]]></title><description><![CDATA[The Seq2Seq (Sequence to Sequence) architecture is a highly advanced design in neural networks that underpins numerous complex tasks across various fields, particularly in natural language processing. Its significance is especially evident in applica...]]></description><link>https://path2ml.com/seq2seq-encoder-decoder-model</link><guid isPermaLink="true">https://path2ml.com/seq2seq-encoder-decoder-model</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Seq2seq Models]]></category><category><![CDATA[encoder decoder]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Text Translation]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Tue, 05 Aug 2025 22:29:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754354396072/960b1cee-cb81-438b-9d23-bf38e7c2ae03.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <strong>Seq2Seq</strong> (Sequence to Sequence) architecture is a highly advanced design in neural networks that underpins numerous complex tasks across various fields, particularly in natural language processing. Its significance is especially evident in applications such as language translation, text summarization, and conversational AI. The architecture is structured around two fundamental components: the encoder and the decoder. These components are frequently constructed using Long Short-Term Memory (<strong>LSTM</strong>) networks, although alternative structures like Gated Recurrent Units (<strong>GRUs</strong>) may also be employed for specific scenarios.</p>
<p><strong>Encoder:</strong> The encoder acts as the foundational stage of the <strong>Seq2Seq</strong> model, dedicated to the meticulous processing of the input sequence. This input can consist of anything from a simple sentence to expansive blocks of text, and its complexity requires careful handling. The primary role of the encoder is to convert this input into a fixed-size context vector—a compact but rich abstract representation that captures the most imperative information from the entire input sequence. This context vector is meticulously crafted to encapsulate the essential features and semantic nuances of the original input, thus empowering the decoder to interpret the information effectively. Typically, the encoder is organized in layers of <strong>LSTM</strong> cells, which synergistically collaborate to absorb and retain patterns over time. This design adeptly navigates the intricacies of sequential data, enabling it to learn from prior inputs while accounting for their contextual significance.</p>
<p><strong>Decoder:</strong> Following the encoder, the decoder takes on the task of outputting the desired sequence. It begins its operation with the context vector created by the encoder, using it as a springboard to generate outputs one element at a time—this could be a word, token, or any other applicable unit of information, depending on the specific application. At each step of the decoding process, the model integrates not only the context vector but also the elements it has previously produced. This integration is crucial; it enables the decoder to maintain coherence and relevance throughout the generated sequence. Thanks to the sequential nature of this decoding process, the architecture is capable of producing output that feels more natural and contextually appropriate, making it particularly effective for tasks like language translation and chatbot interactions.</p>
<p>The LSTM architecture is particularly well-suited for <strong>Seq2Seq</strong> tasks due to its exceptional ability to learn and remember long-term dependencies in data. Traditional recurrent neural networks (RNNs) often encounter difficulties with the vanishing gradient problem, which can prevent them from effectively capturing information from earlier parts of the sequence. LSTMs circumvent this challenge through the strategic use of gates—specifically, input, output, and forget gates. These gates meticulously regulate the flow of information, allowing <strong>LSTMs</strong> to keep hold of crucial details while discarding less relevant information as the sequence unfolds. This capability is vital for tasks requiring nuanced contextual understanding over extended sequences, such as comprehending lengthy sentences or analyzing multi-sentence paragraphs.</p>
<p><strong>Seq2Seq</strong> architecture, particularly when integrated with <strong>LSTM</strong> networks, presents a robust framework for transforming one sequence into another with remarkable efficiency and accuracy. By encoding input information into a detailed and comprehensive context vector and subsequently decoding it into a desired output format, this architecture facilitates applications that necessitate high levels of precision and contextual sensitivity. As a result, it stands as an indispensable tool in the realm of natural language processing and is invaluable across numerous other domains.</p>
<h2 id="heading-seq2seq-encoder-part">Seq2Seq Encoder part</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754410213631/16b0201b-f862-4ef3-a158-e851e602b8ac.png" alt class="image--center mx-auto" /></p>
<p>We are developing an <strong>Encoder-Decoder</strong> model designed to translate the <strong>English</strong> phrase <strong>"I am"</strong> into its Spanish equivalent, "<strong>soy</strong>." The process begins with the creation of an embedding layer that captures the relationships between different words in our input vocabulary. This layer outputs a dense representation of the phrase, which will serve as the input for our Long Short-Term Memory (<strong>LSTM</strong>) network.</p>
<p>To effectively manage the flow of information, we initialize both the <strong>long-term</strong> and <strong>short-term memory</strong> states of the <strong>LSTM</strong>. As we proceed, we unroll the <strong>LSTM</strong>, which involves unfolding the network across the sequence of inputs. This unrolling process allows us to maintain consistent weights and biases across each time step.</p>
<p>During the operation of the <strong>LSTM</strong>, we perform a series of calculations to determine the cell state, which maintains information over long sequences, and the hidden state, which carries short-term information. These states allow the network to retain relevant context and understand the relationships between different parts of the input.</p>
<p>Ultimately, we generate a <strong>context vector</strong> that encapsulates both the long-term and short-term memories produced by the encoder. This context vector is crucial as it serves as the foundational input for the decoder, guiding it to accurately produce the corresponding translation in <strong>Spanish</strong>.</p>
<p>The following calculations are performed in the encoder and the decoder:</p>
<ol>
<li><p>Forget Gate</p>
</li>
<li><p>Candidate Value</p>
</li>
<li><p>Update gate</p>
</li>
<li><p>Output gate</p>
</li>
<li><p>Cell state</p>
</li>
<li><p>Hidden state</p>
</li>
</ol>
<h3 id="heading-forget-gate">Forget gate</h3>
<p>\(\mathbf{\Gamma}_f^{\langle t \rangle} = \sigma(\mathbf{W}_f[\mathbf{a}^{\langle t-1 \rangle}, \mathbf{x}^{\langle t \rangle}] + \mathbf{b}_f)\tag{1}\)</p>
<p>The previous time step's hidden state \(a^{\langle t-1 \rangle}\) and the current time step's input \(x^{\langle t \rangle}\) are concatenated together and multiplied by \(\mathbf{W_{f}}\).</p>
<h3 id="heading-candidate-value">Candidate value</h3>
<p>\(\mathbf{\tilde{c}}^{\langle t \rangle} = \tanh\left( \mathbf{W}_{c} [\mathbf{a}^{\langle t - 1 \rangle}, \mathbf{x}^{\langle t \rangle}] + \mathbf{b}_{c} \right) \tag{3}\)</p>
<p>The candidate value is a tensor containing information from the current time step that <strong>may</strong> be stored in the current cell state \(\mathbf{c}^{\langle t \rangle}.\)</p>
<p>The parts of the candidate value that get passed on depend on the update gate.</p>
<h3 id="heading-update-gate">Update gate</h3>
<p>\(\mathbf{\Gamma}_i^{\langle t \rangle} = \sigma(\mathbf{W}_i[a^{\langle t-1 \rangle}, \mathbf{x}^{\langle t \rangle}] + \mathbf{b}_i)\tag{2}\)</p>
<p>The update gate \(\mathbf{\Gamma}_i^{\langle t \rangle}\) decides what parts of the candidate value \(\tilde{\mathbf{c}}^{\langle t \rangle}\) are added to the cell state \(\mathbf{c}^{\langle t \rangle}\).</p>
<h3 id="heading-output-gate">Output gate</h3>
<p>\(\mathbf{\Gamma}_o^{\langle t \rangle}=  \sigma(\mathbf{W}_o[\mathbf{a}^{\langle t-1 \rangle}, \mathbf{x}^{\langle t \rangle}] + \mathbf{b}_{o})\tag{5}\)</p>
<p>The output gate decides what gets sent as the prediction (output) of the time step.</p>
<h3 id="heading-cell-state">Cell state</h3>
<p>\(\mathbf{c}^{\langle t \rangle} = \mathbf{\Gamma}_f^{\langle t \rangle}* \mathbf{c}^{\langle t-1 \rangle} + \mathbf{\Gamma}_{i}^{\langle t \rangle} *\mathbf{\tilde{c}}^{\langle t \rangle} \tag{4}\)</p>
<p>The cell state is the "memory" that gets passed on to future time steps.</p>
<p>The previous cell state \(\mathbf{c}^{\langle t-1 \rangle}\) is weighted by the forget gate \(\mathbf{\Gamma}_{f}^{\langle t \rangle}\), and the candidate value \(\tilde{\mathbf{c}}^{\langle t \rangle}\) is weighted by the update gate \(\mathbf{\Gamma}_{i}^{\langle t \rangle}\).</p>
<h3 id="heading-hidden-state">Hidden state</h3>
<p>\(\mathbf{a}^{\langle t \rangle} = \mathbf{\Gamma}_o^{\langle t \rangle} * \tanh(\mathbf{c}^{\langle t \rangle})\tag{6}\)</p>
<p>The hidden state gets passed to the LSTM cell's next time step.</p>
<p>The hidden state \(\mathbf{a}^{\langle t \rangle}\) is determined by the cell state \(\mathbf{c}^{\langle t \rangle}\) in combination with the output gate \(\mathbf{\Gamma}_{o}^{\langle t \rangle}\).</p>
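<p>Equations (1) through (6) can be collected into a single LSTM cell step. The NumPy sketch below shows the data flow only; the weight matrices and biases are randomly initialized stand-ins, whereas a real model would learn \(\mathbf{W}_f, \mathbf{W}_i, \mathbf{W}_c, \mathbf{W}_o\) and the biases during training:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM time step, following equations (1)-(6) above."""
    concat = np.concatenate([a_prev, x_t])                     # [a^{t-1}, x^{t}]
    gamma_f = sigmoid(params["Wf"] @ concat + params["bf"])    # forget gate   (1)
    gamma_i = sigmoid(params["Wi"] @ concat + params["bi"])    # update gate   (2)
    c_tilde = np.tanh(params["Wc"] @ concat + params["bc"])    # candidate     (3)
    c_t = gamma_f * c_prev + gamma_i * c_tilde                 # cell state    (4)
    gamma_o = sigmoid(params["Wo"] @ concat + params["bo"])    # output gate   (5)
    a_t = gamma_o * np.tanh(c_t)                               # hidden state  (6)
    return a_t, c_t

n_a, n_x = 4, 3                    # hidden size, embedding size (arbitrary)
rng = np.random.default_rng(0)
params = {f"W{g}": rng.normal(size=(n_a, n_a + n_x)) for g in "fico"}
params.update({f"b{g}": np.zeros(n_a) for g in "fico"})

a, c = np.zeros(n_a), np.zeros(n_a)
for x_t in rng.normal(size=(2, n_x)):   # stand-ins for the "I", "am" embeddings
    a, c = lstm_step(x_t, a, c, params)
print(a.shape, c.shape)                 # (4,) (4,)
```

<p>After the last input token, the pair \((a, c)\) is the context vector: the short-term and long-term memories that initialize the decoder.</p>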
<h2 id="heading-seq2seq-decoder-part">Seq2Seq Decoder part</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754431297937/4031d521-d38f-436c-9544-b5dff3e39037.png" alt class="image--center mx-auto" /></p>
<p>The process begins with the encoder generating a context vector from the initial phrase <strong>"I am."</strong> This context vector serves as a crucial starting point, as it initializes both the long-term and short-term memory components of the Long Short-Term Memory (LSTM) network in the decoder.</p>
<p>As the decoding process commences, the input to the LSTM is set using the embedding value of the end-of-sequence token, denoted as <strong>&lt;EOS&gt;</strong>. This value is sourced from an embedding layer that has been trained to represent the output vocabulary.</p>
<p>Within the LSTM, the short-term memory undergoes processing and is subsequently fed into a fully connected dense layer. This dense layer applies a <strong>softmax</strong> activation function, which plays a critical role in determining the first word that the decoder will output. In this instance, the generated word is "<strong>soy</strong>."</p>
<p>However, the decoding journey is not complete with the generation of "<strong>soy</strong>." The decoder continues to unroll, using the last generated word as the input for the embedding layer for the subsequent <strong>LSTM</strong> cycle. Once again, the same calculations are performed to update both long-term and short-term memory. This information then flows into the dense layer, followed by the application of the softmax function. This iterative process continues until the decoder finally produces the end-of-sequence <strong>&lt;EOS&gt;</strong> token, signaling that the generation is complete.</p>
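<p>The unrolled decoding loop described above amounts to: embed the previous output, run one recurrent step, project through the dense layer, apply softmax, take the most likely word, and repeat until <strong>&lt;EOS&gt;</strong>. The NumPy sketch below shows only that control flow; the vocabulary is a toy one, the weights are random stand-ins, and a simple tanh recurrence is used in place of the full LSTM, so the "translation" it emits is meaningless:</p>

```python
import numpy as np

vocab = ["<EOS>", "soy", "estoy", "yo"]           # toy output vocabulary
n_a, n_e = 8, 4                                    # hidden size, embedding size
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), n_e))             # output embedding layer
W_dense = rng.normal(size=(len(vocab), n_a))       # dense layer -> vocab logits
W_h = rng.normal(size=(n_a, n_a + n_e)) * 0.1      # stand-in recurrent weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def greedy_decode(context, max_len=10):
    """Greedy decoding: start from <EOS>, stop when <EOS> is produced again."""
    a, out, token = context, [], 0                 # token 0 is <EOS>
    for _ in range(max_len):
        x = E[token]                               # embed the previous output
        a = np.tanh(W_h @ np.concatenate([a, x]))  # simplified recurrent step
        probs = softmax(W_dense @ a)               # dense layer + softmax
        token = int(np.argmax(probs))              # pick the most likely word
        if token == 0:                             # <EOS> ends generation
            break
        out.append(vocab[token])
    return out

print(greedy_decode(np.zeros(n_a)))   # output depends on the random weights
```

<p>In a trained model, the context vector produced by the encoder replaces the zero vector passed in here, and the learned weights make the argmax at each step pick the right word.</p>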
]]></content:encoded></item><item><title><![CDATA[Vanishing/Exploding gradients in RNN]]></title><description><![CDATA[A basic Recurrent Neural Network (RNN) is a specialized type of artificial neural network designed to effectively process sequences of data, which is common in various fields such as natural language processing, time series analysis, and speech recog...]]></description><link>https://path2ml.com/vanishingexploding-gradients-in-rnn</link><guid isPermaLink="true">https://path2ml.com/vanishingexploding-gradients-in-rnn</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[RNN]]></category><category><![CDATA[AI]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Neural Network]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 30 Jul 2025 02:34:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753828231322/d1dec0be-44f4-4a10-9072-59770b536ceb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A basic <strong>Recurrent Neural Network (RNN)</strong> is a specialized type of artificial neural network designed to effectively process sequences of data, which is common in various fields such as natural language processing, time series analysis, and speech recognition. Unlike traditional feedforward neural networks, where information moves in one direction from input to output, <strong>RNNs</strong> incorporate connections that allow certain neurons to loop back onto themselves. This unique architecture enables RNNs to maintain a form of memory, which is crucial for understanding the context and dependencies in sequential data.</p>
<p>In an <strong>RNN</strong>, the input at each time step is not only processed independently but is also combined with the hidden state derived from the previous time step. The hidden state serves as a form of memory that captures relevant information from prior inputs in the sequence. This dynamic updating of the hidden state allows the network to incorporate both the current input and the context of previous inputs, effectively enabling the learning of temporal dependencies. Consequently, <strong>RNNs</strong> can adaptively handle sequences of varying lengths, making them particularly advantageous for tasks where the input size is not fixed, such as in natural language sentences or time-varying signals.</p>
<p>Basic Recurrent Neural Networks (<strong>RNNs</strong>) often encounter significant challenges related to the phenomena of vanishing and exploding gradients. The <strong>vanishing gradient</strong> problem arises when gradients become progressively smaller as they are propagated backward through the network during training, leading to inadequate updates to the weights of earlier layers. This makes it difficult for the network to learn long-term dependencies from the input sequences. Conversely, the <strong>exploding gradient</strong> problem occurs when gradients grow excessively large, causing sudden and erratic changes in the weights, which can destabilize the learning process. Both of these issues can severely hinder the performance of RNNs and limit their ability to effectively model sequential data. In this article, we will explore how backpropagation can lead to vanishing and exploding gradients. We will begin by examining a simple <strong>RNN</strong> architecture, which includes a feedback loop along with its associated weights and biases. Very simple RNN is shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753828796146/2194f3d1-9cdf-451f-9cb6-88907cd30c10.png" alt class="image--center mx-auto" /></p>
<p>\(Image-1\)</p>
<p>To illustrate this, we will start with a basic design of an <strong>RNN</strong>, as shown below, to demonstrate the calculation of back propagation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753833049181/08a028ca-dc98-4db1-bb0b-a08b2762ea3d.png" alt class="image--center mx-auto" /></p>
<p>For this scenario we will use <strong>SSR</strong> ( Sum of Squared Residuals) as cost function. The sum of squared residuals serves as a cost function in various statistical models. It measures the discrepancy between observed values and the values predicted by the model. This cost function is calculated by taking the difference between each observed value and its corresponding predicted value (the residual), squaring each of those differences to eliminate negative values, and then summing all the squared differences together. The goal is to minimize this sum, which indicates that the model's predictions are closely aligned with the actual data. It can be defined as</p>
<p>$$SSR= \sum_i^m (Observed_i-Predicted_i)^2$$</p><p>Let's ignore the feedback loop for now and just calculate the derivative of SSR with respect to W1, considering only Input3.</p>
<p>Applying the chain rule, we can say that</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dW1}= \frac{dSSR}{dPredicted} \cdot \frac{dPredicted}{dW1}\space\space\space\space\cdots\cdots\cdots\cdots\cdots1    &amp;\\ \end{flalign*}\)</p>
<p>First we calculate the derivative of SSR with respect to predicted value(output)</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dpredicted}= \frac{d \sum_i^m (Observed_i-Predicted_i)^2}{dPredicted} &amp;\\ \end{flalign*}\)</p>
<p>Applying the chain rule,</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dpredicted}= {\sum_i^m 2*(Observed_i-Predicted_i)} *  -1   &amp;\\ \end{flalign*}\)</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dpredicted}= {\sum_i^m -2*(Observed_i-Predicted_i)}   &amp;\\ \end{flalign*}\)</p>
<p>Now let's calculate the derivative of the predicted value with respect to W1</p>
<p>\(\begin{flalign*} &amp; \frac{dPredicted}{dW1}= \frac{d(W_1*Input3)}{dW1}=Input3   &amp;\\ \end{flalign*}\)</p>
<p>So now <strong>Equation 1</strong> can be written as</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dW1}= {\sum_i^m -2*(Observed_i-Predicted_i)}  * Input3  &amp;\\ \end{flalign*}\)</p>
<p>Now let's calculate the derivative when we unroll the RNN to include the previous input as feedback, as shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753838582170/a9a0aa09-e801-4116-97c3-ac8438f5f58b.png" alt class="image--center mx-auto" /></p>
<p>When we unroll the RNN, the predicted value is the previous value (Input2) multiplied by <strong>W1</strong> and <strong>W2</strong>, plus Input3 multiplied by <strong>W1</strong>:</p>
<p>\(Predicted=(Input2 * W_1 * W_2)+(W_1*Input3)\)</p>
<p>\(\begin{flalign*} &amp; \frac{dPredicted}{dW1}= \frac{d(Input2 * W_1*W_2)+(Input3*W_1)}{dW1}=(Input2 *W_2)+Input3   &amp;\\ \end{flalign*}\)</p>
<p>If we consider one more previous input (Input1), as in <strong>Image-1</strong>, then the predicted value changes to</p>
<p>\(Predicted=[(Input1 * W_1 * W_2)+(W_1*Input2)]*W_2+(Input3*W_1)\)</p>
<p>Expanding this,</p>
<p>\(Predicted=(Input1 * W_1 * W_2^2)+(Input2*W_1*W_2)+(Input3*W_1)\)</p>
<p>\(\begin{flalign*} &amp; \frac{dPredicted}{dW1}= \frac{d[(Input1 * W_1 * W_2)+(W_1*Input2)]*W_2+(Input3*W_1)}{dW1}  &amp;\\ \end{flalign*}\)</p>
<p>\(=(Input1*W_2^2)+(Input2*W_2)+Input3\)</p>
<p>Now let's substitute this back into the derivative of <strong>SSR</strong> with respect to <strong>W1</strong>:</p>
<p>\(\begin{flalign*} &amp; \frac{dSSR}{dW1}= {\sum_i^m -2*(Observed_i-Predicted_i)}  * ((Input1*W_2^2)+(Input2*W_2)+Input3)  &amp;\\ \end{flalign*}\)</p>
<p>We see a pattern: <strong>the power of</strong> \(W_2\) <strong>grows with the number of times we unroll the RNN to include a previous input</strong>, as in the term \((Input1*W_2^2)\).</p>
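<p>The unrolled expression for \(\frac{dPredicted}{dW1}\) derived above can be checked numerically against a finite-difference approximation. A quick NumPy sanity check, using arbitrary values for the inputs and weights:</p>

```python
import numpy as np

def predicted(w1, w2, i1, i2, i3):
    # Two-step unrolled RNN: ((i1*w1*w2 + i2*w1) * w2) + i3*w1
    return (i1 * w1 * w2 + i2 * w1) * w2 + i3 * w1

w1, w2, i1, i2, i3 = 0.7, 0.9, 2.0, -1.0, 0.5

# Analytic derivative derived above: Input1*W2^2 + Input2*W2 + Input3
analytic = i1 * w2**2 + i2 * w2 + i3

# Central finite difference with a small step h
h = 1e-6
numeric = (predicted(w1 + h, w2, i1, i2, i3) -
           predicted(w1 - h, w2, i1, i2, i3)) / (2 * h)

print(abs(analytic - numeric) < 1e-8)   # True
```

<p>Because the predicted value is linear in \(W_1\), the finite difference matches the analytic derivative up to floating-point rounding.</p>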
<h3 id="heading-vanishing-and-exploding-gradients">Vanishing and Exploding Gradients</h3>
<p>Let's say we unroll the <strong>RNN</strong> many times to include many previous values, far more than shown in Image-1.</p>
<p>If the weight \(W_2\) is between <strong>-1</strong> and <strong>1</strong>, then terms like \((Input1*W_2^2)\) in \(\frac{dSSR}{dW_1}\) become very small; this is the <strong>vanishing gradient</strong> problem. In other words, the contribution of earlier values effectively disappears.</p>
<p>If the weight \(W_2\) is less than -1 or greater than 1, then terms like \((Input1*W_2^2)\) in \(\frac{dSSR}{dW_1}\) grow without bound; this is an <strong>exploding gradient</strong>, meaning the contributions from earlier values become excessively large.</p>
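<p>The effect of repeatedly multiplying by \(W_2\) is easy to see numerically. The contribution of an input \(n\) steps in the past scales as \(W_2^n\) (two arbitrary weight values are used here for illustration):</p>

```python
# |W2| < 1 shrinks toward zero; |W2| > 1 blows up as n grows.
for w2 in (0.5, 1.5):
    for n in (1, 10, 50):
        print(f"W2={w2}, n={n:2d}: W2**n = {w2 ** n:.3e}")
# 0.5**50 is on the order of 1e-15 (vanishes);
# 1.5**50 is on the order of 1e8 (explodes).
```

<p>This is exactly why a basic RNN can only be unrolled a limited number of steps before earlier inputs either stop mattering or dominate the gradient.</p>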
<p>The fundamental concept revolves around the inherent limitations of a basic Recurrent Neural Network (RNN) concerning the temporal dependencies it can effectively manage. Specifically, an RNN can only unroll for a limited number of time steps before the influence of older data points on the training process becomes problematic. When the sequence length exceeds this optimal range, the older inputs may either lose their significance—resulting in diminishing returns on their contribution to learning—or exert an overwhelming influence, thereby skewing the model's predictions and learning dynamics. This imbalance can hinder the model’s ability to retain relevant information over long sequences, ultimately affecting its performance on tasks that involve longer temporal dependencies.</p>
<h3 id="heading-long-short-term-memory-networkslstm"><strong>Long Short-Term Memory networks(LSTM)</strong></h3>
<p>Long Short-Term Memory networks, commonly known as <strong>LSTMs</strong>, are a specialized type of recurrent neural network (<strong>RNN</strong>) designed to overcome the significant challenges of vanishing and exploding gradients that often occur in traditional <strong>RNNs</strong> during training. Vanishing gradients can make it difficult for the network to learn long-range dependencies in sequences, as the gradients used to update the model's weights become excessively small, effectively freezing the learning of earlier layers. Conversely, exploding gradients can lead to numerical instability and erratic updates, causing the model to diverge.</p>
<p><strong>LSTMs</strong> address these issues through a unique architectural design featuring memory cells and three distinct gates: the input gate, the forget gate, and the output gate. The input gate regulates the flow of new information into the memory cell, the forget gate decides what information to discard from the memory cell, and the output gate controls the information that is sent out of the cell. This gating mechanism enables <strong>LSTMs</strong> to maintain information over extended sequences, allowing them to learn complex patterns and relationships in data, making them particularly effective for tasks such as language modeling, speech recognition, and time series forecasting.</p>
<p>We will cover <strong>LSTM</strong> in another article.</p>
]]></content:encoded></item><item><title><![CDATA[Implementing Convolutional Neural Network using PyTorch]]></title><description><![CDATA[My Previous article “ convolutional neural network **“**explained about the architecture of CNN and how it works. Another article “Implementing CNN using TensorFlow” showed how to implement CNN using TensorFlow.
In this article, we will explore the p...]]></description><link>https://path2ml.com/implementing-convolutional-neural-network-using-pytorch</link><guid isPermaLink="true">https://path2ml.com/implementing-convolutional-neural-network-using-pytorch</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[CNNs (Convolutional Neural Networks)]]></category><category><![CDATA[CNN]]></category><category><![CDATA[pytorch]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[MachineLearning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sun, 27 Jul 2025 15:51:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753631365019/14f97c37-ba7b-4de8-baca-16f67c07e93a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My previous article <a target="_blank" href="https://path2ml.com/convolutional-neural-network"><strong>“Convolutional Neural Network”</strong></a> explained the architecture of CNN and how it works. Another article, <a target="_blank" href="https://path2ml.com/implementing-convolutional-neural-network-using-tensorflow"><strong>“Implementing CNN using TensorFlow”</strong></a>, showed how to implement a <strong>CNN</strong> using <strong>TensorFlow</strong>.</p>
<p>In this article, we will explore the process of creating and optimizing a simple Convolutional Neural Network (<strong>CNN</strong>) using <strong>PyTorch</strong> and <strong>Lightning</strong>. A CNN is a specialized type of neural network that excels in processing and classifying images.</p>
<p>We will begin by outlining the fundamental concepts of Convolutional Neural Networks, including their architecture and the role of convolutional layers, pooling layers, and activation functions. Following that, we will walk through the implementation steps in PyTorch, detailing how to set up the environment, load the data, and construct the network.</p>
<p>We will focus on building a CNN that can distinguish between images of Xs and Os. We will also cover the optimization techniques used to improve the model's accuracy and efficiency, making use of Lightning to streamline our training process.</p>
<p>An example of a CNN, comprising Conv2D and MaxPool layers, is shown below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739895921483/ffb5e267-7cf0-40a4-a5a2-fa57933dc849.png?auto=compress,format&amp;format=webp" alt /></p>
<h3 id="heading-we-will-start-with-importing-needed-libraries-first-we-install-lightening-framework">We will start by importing the needed <strong>libraries</strong>. First we install the Lightning framework</h3>
<pre><code class="lang-python">%%capture

!pip install lightning
</code></pre>
<pre><code class="lang-python"><span class="hljs-comment"># torch will allow us to create tensors.</span>
<span class="hljs-keyword">import</span> torch 
<span class="hljs-comment"># torch.nn allows us to create a neural network.</span>
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn 
<span class="hljs-comment"># nn.functional give us access to the activation and loss functions.</span>
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F 
<span class="hljs-comment"># optim contains many optimizers. This time we're using Adam</span>

<span class="hljs-keyword">from</span> torch.optim <span class="hljs-keyword">import</span> Adam 
<span class="hljs-comment"># lightning has tons of cool tools that make neural networks easier</span>
<span class="hljs-keyword">import</span> lightning <span class="hljs-keyword">as</span> L 
<span class="hljs-comment"># these are needed for the training data</span>
<span class="hljs-keyword">from</span> torch.utils.data <span class="hljs-keyword">import</span> TensorDataset, DataLoader
<span class="hljs-comment">## matplotlib allows us to draw the images used for input.</span>
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<p>Once we import the necessary Python modules, our next step will be to create images of the letters O and X. These images are essential for training and testing our neural network's performance. We need to design the images to closely resemble the examples provided below, ensuring they are clear and correctly formatted for optimal neural network processing. This will involve defining the size, resolution, and any specific features that make the letters recognizable. By preparing these images carefully, we can improve the accuracy and reliability of our model in recognizing and interpreting these characters.</p>
<p>We will begin the process by generating a visual representation of the letter "O." To do this, we will construct a 6x6 matrix of numbers. In this matrix, the number 0 will represent the color white, while the number 1 will represent the color black. Each element of the matrix will correspond to a pixel in the image, allowing us to form the distinctive shape of the letter "O" through the arrangement of these values.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Create a 6x6 matrix of numbers where 0 represents white</span>
<span class="hljs-comment">## and 1 represents black.</span>
o_image = [[<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>],
           [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>]]
o_image <span class="hljs-comment"># print out the matrix to verify that it is what we expect</span>
</code></pre>
<p>We will create an image of the letter <strong>X</strong> by creating a similar 6x6 matrix, where the 1s are now in an <strong>X</strong> pattern.</p>
<pre><code class="lang-python">x_image = [[<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
           [<span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>]]
x_image
</code></pre>
<p>To visualize the o_image and x_image with <strong>matplotlib</strong>, we begin by using the subplots() function. This function generates a grid of subplots, returning an array named axarr[]. Each element in this array corresponds to a subplot defined by the parameters nrows (number of rows) and ncols (number of columns) we specify. By organizing the images into this grid, we can easily position and display each image within its respective subplot for clear and effective comparison.</p>
<pre><code class="lang-python"><span class="hljs-comment">## To draw the o_image and x_image, we first call subplots(), which creates </span>
<span class="hljs-comment">## an array, called axarr[], with an entry for each element in a grid</span>
<span class="hljs-comment">## specified by nrows and ncols.</span>
fig, axarr = plt.subplots(nrows=<span class="hljs-number">1</span>, ncols=<span class="hljs-number">2</span>, figsize=(<span class="hljs-number">5</span>, <span class="hljs-number">5</span>))

<span class="hljs-comment">## Now we pass o_image and x_image to .imshow() for each element</span>
<span class="hljs-comment">## in the grid created by plt.subplots()</span>
axarr[<span class="hljs-number">0</span>].imshow(o_image, cmap=<span class="hljs-string">'gray_r'</span>) <span class="hljs-comment">## Setting cmap='gray_r' gives us reverse grayscale.</span>
axarr[<span class="hljs-number">1</span>].imshow(x_image, cmap=<span class="hljs-string">'gray_r'</span>)
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753626723637/3a08122b-b6d6-4314-ae91-afbdd1c9da6e.png" alt class="image--center mx-auto" /></p>
<p>We will begin by loading the training data into a DataLoader, a powerful tool in PyTorch that streamlines the process of feeding data into our neural network for training. DataLoaders are particularly advantageous when working with large datasets for several reasons. First, they enable us to access our data in manageable batches, which helps reduce memory consumption and speeds up the training process. Second, DataLoaders provide an easy way to shuffle our dataset at the beginning of each epoch, ensuring that the model does not learn any unintended patterns from the order of the data. Finally, if we want to quickly test our code or validate our model's functionality without using the entire dataset, DataLoaders allow us to work with a smaller subset of the data.</p>
<p>In order to prepare our training data for the DataLoader, we will convert the images into tensors using the <code>torch.tensor()</code> function. This step is crucial because PyTorch requires inputs to be in tensor format for processing. Once converted, we will save these tensors as <code>input_images</code>, which will then be passed to the DataLoader for efficient batch processing during training. This systematic approach will facilitate a smoother training experience and help us achieve better results with our neural network.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Convert the images into tensors...</span>
input_images = torch.tensor([o_image, x_image]).type(torch.float32)
</code></pre>
<p>In this step, we will create tensors that represent the labels, which are the ideal output values corresponding to each input image in our dataset. Specifically, our convolutional neural network is designed to recognize two distinct letters: O and X.</p>
<p>To achieve this, we will define our output for the letter O as the tensor [1.0, 0.0], indicating that the first output neuron is activated for the letter O while the second one is not. Conversely, the tensor [0.0, 1.0] will be used to represent the ideal output for the letter X, where the second output neuron is activated.</p>
<p>These tensors will be crucial for training the neural network, as they will guide the model in learning to differentiate between the two letters based on the input images. All the generated labels for our training dataset will be saved in a variable named <code>input_labels</code>, which will facilitate easy access and manipulation during the training process.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Create the labels for the input images</span>
input_labels = torch.tensor([[<span class="hljs-number">1.0</span>, <span class="hljs-number">0.0</span>], [<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>]]).type(torch.float32)
</code></pre>
<p>We will combine the input images with the input labels to create a TensorDataset, which we will then use to create a DataLoader.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Now combine input_images and input_labels into a TensorDataset...</span>
dataset = TensorDataset(input_images, input_labels) 
<span class="hljs-comment">## ...and use the TensorDataset to create a DataLoader.</span>
dataloader = DataLoader(dataset)
</code></pre>
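<p>For our tiny two-image dataset the defaults are fine, but the batching and shuffling benefits mentioned earlier come from DataLoader's optional arguments. A small illustrative sketch, using placeholder tensors rather than the article's images:</p>

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Placeholder data: four samples with two features each, plus class labels.
features = torch.arange(8, dtype=torch.float32).reshape(4, 2)
labels = torch.tensor([0, 1, 0, 1])
dataset = TensorDataset(features, labels)

# batch_size controls how many samples each iteration yields;
# shuffle=True reorders the dataset at the start of every epoch.
loader = DataLoader(dataset, batch_size=2, shuffle=True)
for batch_features, batch_labels in loader:
    print(batch_features.shape, batch_labels.shape)
```

<p>With <code>batch_size=2</code> the four samples arrive in two batches per epoch, and shuffling prevents the model from learning anything from the order of the data.</p>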
<h2 id="heading-build-a-convolutional-neural-network-with-pytorch-and-lightning">Build a convolutional neural network with PyTorch and Lightning</h2>
<p>To build a convolutional neural network (CNN) using PyTorch, we will need to define a new class that extends the capabilities of LightningModule. This approach simplifies the training process and enhances model organization. The new class will encompass several key methods, each serving a specific purpose in the model's functionality:</p>
<ul>
<li><p><strong>__init__()</strong>: This method is crucial for initializing the CNN's parameters. Inside this method, you will set up the weights and biases for the network layers. Additionally, you’ll maintain any necessary bookkeeping information, such as the architecture details of the network and configurations for training.</p>
</li>
<li><p><strong>forward()</strong>: In this method, you will define how data flows through the network during a forward pass. This includes the series of operations performed on the input data as it travels through each layer of the CNN, such as convolutional layers, activation functions, and pooling layers.</p>
</li>
<li><p><strong>configure_optimizers()</strong>: This method is used to set up the optimization algorithm that will update the model's weights during training. In this tutorial, we will be using the Adam optimizer, which is well-regarded for its efficiency and effectiveness in optimizing deep learning models.</p>
</li>
<li><p><strong>training_step()</strong>: This method handles the training process for each batch of data. It takes the training data as input and feeds it into the <code>forward()</code> method to obtain predictions. Afterward, it calculates the loss by comparing the predicted values with the actual target values. Additionally, it keeps track of the loss values, allowing for logging and monitoring during training, which is essential for assessing model performance.</p>
</li>
</ul>
<p>By implementing these methods, we will have a well-structured and functional convolutional neural network ready for training with PyTorch.</p>
<h3 id="heading-steps-to-build-cnn-using-pytorch"><strong>Steps to build CNN using PyTorch</strong></h3>
<p>Let's build a simple Convolutional Neural Network (CNN) using the LightningModule. This network will help us extract features from images and classify them accordingly.</p>
<h3 id="heading-step-1-initializing-weights-and-biases">Step 1: Initializing Weights and Biases</h3>
<p>We begin by initializing the weights and biases for our CNN. This step is crucial, as these parameters will be adjusted during training to improve the model's performance.</p>
<h3 id="heading-step-2-setting-up-the-convolutional-layer">Step 2: Setting Up the Convolutional Layer</h3>
<p>The first layer of our CNN is the convolutional layer, which we set up using nn.Conv2d(). This layer applies a filter to our input data to extract features. The parameters needed to configure this layer include:</p>
<ul>
<li><p><strong>in_channels</strong>: This parameter specifies the number of input channels. For instance, a grayscale (black and white) image has one channel, while a color image typically has three (for red, green, and blue).</p>
</li>
<li><p><strong>out_channels</strong>: This parameter determines how many output channels the convolutional layer will produce. If the model receives multiple input channels, we can combine them into fewer output channels, or we can increase the number of output channels to capture more features.</p>
</li>
<li><p><strong>kernel_size</strong>: This refers to the dimensions of the filter (also known as the convolutional kernel). In our implementation, we will use a 3x3 filter, but we have the flexibility to choose other sizes, including rectangular shapes, depending on our specific needs.</p>
</li>
</ul>
<h3 id="heading-step-3-implementing-max-pooling">Step 3: Implementing Max Pooling</h3>
<p>After the convolutional layer, we apply a max pooling operation using nn.MaxPool2d(). This step reduces the dimensionality of the feature maps, helping to extract the most important features and reduce computational load. The parameters for the max pooling layer include:</p>
<ul>
<li><p><strong>kernel_size</strong>: This defines the size of the pooling filter. In our case, we are using a 2x2 filter, which will help summarize the features in each 2x2 section of the input.</p>
</li>
<li><p><strong>stride</strong>: The stride determines how far we move the pooling filter with each operation. In our example, we set the stride to 2, meaning that after applying the filter to one section, it will move 2 units over (or down), ensuring there is no overlap between pooling sections.</p>
</li>
</ul>
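<p>The conv and pooling steps above can be sanity-checked by tracing tensor shapes. A minimal sketch, assuming the article's 6x6 single-channel input: a 3x3 convolution with stride 1 produces (6 - 3) + 1 = 4, and 2x2 max pooling with stride 2 halves that to 2:</p>

```python
import torch
import torch.nn as nn

# Same layer configuration as described above.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 1, 6, 6)       # (batch, channels, height, width)
after_conv = conv(x)
after_pool = pool(after_conv)
flat = torch.flatten(after_pool, 1)  # flatten everything except the batch dim

print(after_conv.shape)   # torch.Size([1, 1, 4, 4])
print(after_pool.shape)   # torch.Size([1, 1, 2, 2])
print(flat.shape)         # torch.Size([1, 4])
```

<p>Flattening the 2x2 feature map yields 4 values, which is why the fully connected layer in the next step uses <code>in_features=4</code>.</p>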
<h3 id="heading-step-4-constructing-the-fully-connected-neural-network">Step 4: Constructing the Fully Connected Neural Network</h3>
<p>Now, we move on to constructing a fully connected neural network (also known as a dense layer). This network will take in the features extracted from the convolutional and pooling layers. The configuration of this layer includes:</p>
<ul>
<li><p><strong>Input features (</strong><code>in_features=4</code>): This specifies the number of features that will be input into the neural network.</p>
</li>
<li><p><strong>Output features (</strong><code>out_features=1</code>): This indicates that this layer produces a single output value, which is then passed through a ReLU activation function before entering the hidden layer.</p>
</li>
</ul>
<p>Additionally, we will implement a hidden layer that has:</p>
<ul>
<li><p><strong>Input features (</strong><code>in_features=1</code>): Here, the output from the previous layer feeds into this hidden layer.</p>
</li>
<li><p><strong>Output features (</strong><code>out_features=2</code>): This layer will produce two outputs, allowing the network to classify the input into two different categories.</p>
</li>
</ul>
<h3 id="heading-step-5-calculating-loss-with-cross-entropy">Step 5: Calculating Loss with Cross Entropy</h3>
<p>To assess how well our neural network is performing, we will use Cross Entropy Loss. This loss function compares the network's predicted classifications against the actual labels in our dataset. The implementation is done using <code>nn.CrossEntropyLoss</code>, which conveniently applies a SoftMax function to the output values. This means we don't need to apply the SoftMax ourselves during training. However, we must remember to apply it during inference after the model has been trained.</p>
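<p>A quick sketch (not from the article) confirming that <code>nn.CrossEntropyLoss</code> applies the SoftMax internally: feeding it raw logits gives the same loss as applying log-softmax followed by negative log-likelihood by hand.</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5]])   # raw network outputs (no softmax applied)
target = torch.tensor([0])            # class index of the true label

# CrossEntropyLoss on raw logits...
loss_ce = nn.CrossEntropyLoss()(logits, target)
# ...equals log-softmax + negative log-likelihood done manually.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(loss_ce.item(), loss_manual.item())
```

<p>This is why the network's <code>forward()</code> can return raw logits during training, with softmax applied only at inference time.</p>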
<h3 id="heading-step-6-applying-the-filter-and-activation-functions">Step 6: Applying the Filter and Activation Functions</h3>
<p>We start the forward pass of our CNN by applying the filter to the input image. After this, the output from the convolution is passed through a ReLU activation function, which introduces non-linearity into the model:</p>
<p>Next, we take the output from the ReLU layer and feed it into the max pooling layer:</p>
<p>At this stage, we have a reduced matrix of feature values. To prepare this for input into our fully connected neural network, we flatten the matrix into a vector format:</p>
<h3 id="heading-step-7-running-the-flattened-values-through-the-neural-network">Step 7: Running the Flattened Values Through the Neural Network</h3>
<p>Once the values are flattened, we can pass them through our fully connected layer, which includes the hidden layer along with the activation function, to obtain the final output for classification.</p>
<h3 id="heading-step-8-configuring-the-optimizer">Step 8: Configuring the Optimizer</h3>
<p>Finally, we need to set up the optimizer that will adjust our model's parameters. We pass the parameters we want to optimize, accessed via <code>self.parameters()</code>, into the optimizer. For this implementation, we’ll use the Adam optimizer with a learning rate (<code>lr</code>) of 0.001.</p>
<p>We have now established a functioning CNN capable of processing images and making predictions. With proper training and validation, this model will learn to classify images effectively based on the features extracted from the data.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Now build a simple CNN...</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">SimpleCNN</span>(<span class="hljs-params">L.LightningModule</span>):</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>

        super().__init__() 
        L.seed_everything(seed=<span class="hljs-number">42</span>)

        self.conv = nn.Conv2d(in_channels=<span class="hljs-number">1</span>, out_channels=<span class="hljs-number">1</span>, kernel_size=<span class="hljs-number">3</span>)

        self.pool = nn.MaxPool2d(kernel_size=<span class="hljs-number">2</span>, stride=<span class="hljs-number">2</span>)

        self.input_to_hidden = nn.Linear(in_features=<span class="hljs-number">4</span>, out_features=<span class="hljs-number">1</span>)
        <span class="hljs-comment">## ..and the single hidden layer, in_features=1, goes to</span>
        <span class="hljs-comment">## two outputs, out_features=2</span>
        self.hidden_to_output = nn.Linear(in_features=<span class="hljs-number">1</span>, out_features=<span class="hljs-number">2</span>)

        self.loss = nn.CrossEntropyLoss()


    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, x</span>):</span>

        <span class="hljs-comment">## First we apply a filter to the input image</span>
        x = self.conv(x)

        <span class="hljs-comment">## Then we run the output from the filter through a ReLU...</span>
        x = F.relu(x)
        <span class="hljs-comment">## Then we run the output from the ReLU through a Max Pooling layer...</span>
        x = self.pool(x)
        x = torch.flatten(x, <span class="hljs-number">1</span>) <span class="hljs-comment"># flatten all dimensions except batch </span>
        x = self.input_to_hidden(x)
        x = F.relu(x)
        x = self.hidden_to_output(x)

        <span class="hljs-keyword">return</span> x


    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">configure_optimizers</span>(<span class="hljs-params">self</span>):</span>

        <span class="hljs-keyword">return</span> Adam(self.parameters(), lr=<span class="hljs-number">0.001</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">training_step</span>(<span class="hljs-params">self, batch, batch_idx</span>):</span>

        inputs, labels = batch 

        outputs = self.forward(inputs)

        <span class="hljs-comment">## Then we calculate the loss.</span>
        loss = self.loss(outputs, labels)


        <span class="hljs-keyword">return</span> loss
</code></pre>
<h2 id="heading-training-our-neural-network">Training our Neural Network</h2>
<p>To train our new convolutional neural network, we create a model from the new class, SimpleCNN, build a Lightning Trainer with <code>L.Trainer()</code>, and use it to optimize the parameters. Note that we will train for 700 epochs, which means we will complete 700 full passes through our training data. This may be sufficient to successfully optimize all of the parameters, but there is a possibility it might not be enough.</p>
<pre><code class="lang-python">model = SimpleCNN()
trainer = L.Trainer(max_epochs=<span class="hljs-number">700</span>)
trainer.fit(model, train_dataloaders=dataloader)
</code></pre>
<pre><code class="lang-python">INFO: 💡 Tip: For seamless cloud uploads <span class="hljs-keyword">and</span> versioning, <span class="hljs-keyword">try</span> installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically <span class="hljs-keyword">with</span> the Lightning model registry.
INFO:lightning.pytorch.utilities.rank_zero:💡 Tip: For seamless cloud uploads <span class="hljs-keyword">and</span> versioning, <span class="hljs-keyword">try</span> installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically <span class="hljs-keyword">with</span> the Lightning model registry.
INFO: GPU available: <span class="hljs-literal">False</span>, used: <span class="hljs-literal">False</span>
INFO:lightning.pytorch.utilities.rank_zero:GPU available: <span class="hljs-literal">False</span>, used: <span class="hljs-literal">False</span>
INFO: TPU available: <span class="hljs-literal">False</span>, using: <span class="hljs-number">0</span> TPU cores
INFO:lightning.pytorch.utilities.rank_zero:TPU available: <span class="hljs-literal">False</span>, using: <span class="hljs-number">0</span> TPU cores
INFO: HPU available: <span class="hljs-literal">False</span>, using: <span class="hljs-number">0</span> HPUs
INFO:lightning.pytorch.utilities.rank_zero:HPU available: <span class="hljs-literal">False</span>, using: <span class="hljs-number">0</span> HPUs
INFO: 
  | Name             | Type             | Params | Mode 
--------------------------------------------------------------
<span class="hljs-number">0</span> | conv             | Conv2d           | <span class="hljs-number">10</span>     | train
<span class="hljs-number">1</span> | pool             | MaxPool2d        | <span class="hljs-number">0</span>      | train
<span class="hljs-number">2</span> | input_to_hidden  | Linear           | <span class="hljs-number">5</span>      | train
<span class="hljs-number">3</span> | hidden_to_output | Linear           | <span class="hljs-number">4</span>      | train
<span class="hljs-number">4</span> | loss             | CrossEntropyLoss | <span class="hljs-number">0</span>      | train
--------------------------------------------------------------
<span class="hljs-number">19</span>        Trainable params
<span class="hljs-number">0</span>         Non-trainable params
<span class="hljs-number">19</span>        Total params
<span class="hljs-number">0.000</span>     Total estimated model params size (MB)
<span class="hljs-number">5</span>         Modules <span class="hljs-keyword">in</span> train mode
<span class="hljs-number">0</span>         Modules <span class="hljs-keyword">in</span> eval mode
INFO:lightning.pytorch.callbacks.model_summary:
  | Name             | Type             | Params | Mode 
--------------------------------------------------------------
<span class="hljs-number">0</span> | conv             | Conv2d           | <span class="hljs-number">10</span>     | train
<span class="hljs-number">1</span> | pool             | MaxPool2d        | <span class="hljs-number">0</span>      | train
<span class="hljs-number">2</span> | input_to_hidden  | Linear           | <span class="hljs-number">5</span>      | train
<span class="hljs-number">3</span> | hidden_to_output | Linear           | <span class="hljs-number">4</span>      | train
<span class="hljs-number">4</span> | loss             | CrossEntropyLoss | <span class="hljs-number">0</span>      | train
--------------------------------------------------------------
<span class="hljs-number">19</span>        Trainable params
<span class="hljs-number">0</span>         Non-trainable params
<span class="hljs-number">19</span>        Total params
<span class="hljs-number">0.000</span>     Total estimated model params size (MB)
<span class="hljs-number">5</span>         Modules <span class="hljs-keyword">in</span> train mode
<span class="hljs-number">0</span>         Modules <span class="hljs-keyword">in</span> eval mode
</code></pre>
<p>Epoch 699: 100% 2/2 [00:00&lt;00:00, 82.10it/s, v_num=2]</p>
<pre><code class="lang-python">INFO: `Trainer.fit` stopped: `max_epochs=<span class="hljs-number">700</span>` reached.
INFO:lightning.pytorch.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=<span class="hljs-number">700</span>` reached.
</code></pre>
<p>Having completed the training of our model, we are now positioned to utilize it for making predictions using new data. In particular, we will evaluate the efficacy of our model in predicting an image of the letter <strong>"X"</strong> that has been shifted one pixel to the right. To initiate this process, we will first generate an image of the letter <strong>"X"</strong> that is displaced by one pixel.</p>
<pre><code class="lang-python">shifted_x_image = [[<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>],
                   [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>],
                   [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
                   [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>],
                   [<span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>],
                   [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>, <span class="hljs-number">0</span>]]
shifted_x_image
</code></pre>
<p>Let's check the image by drawing it with matplotlib.</p>
<pre><code class="lang-python">fig, ax = plt.subplots(figsize=(<span class="hljs-number">2.5</span>, <span class="hljs-number">2.5</span>))
ax.imshow(shifted_x_image, cmap=<span class="hljs-string">'gray_r'</span>) <span class="hljs-comment">## Setting cmap='gray_r' gives us reverse grayscale.</span>
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753630836766/19756479-b163-457f-9fc6-11962d012018.png" alt class="image--center mx-auto" /></p>
<p>Let's see if our trained convolutional neural network can accurately classify it as an X.</p>
<pre><code class="lang-python"><span class="hljs-comment">## First, let's make a prediction with the new image...</span>
prediction = model(torch.tensor([shifted_x_image]).type(torch.float32))

<span class="hljs-comment">## Now make the prediction easy to read and interpret by</span>
<span class="hljs-comment">## running it through torch.softmax() and torch.round()</span>
predicted_label = torch.round(torch.softmax(prediction, dim=<span class="hljs-number">1</span>), decimals=<span class="hljs-number">2</span>) <span class="hljs-comment">## dim=1 applies softmax across each row's columns</span>

predicted_label
</code></pre>
<pre><code class="lang-python">tensor([[<span class="hljs-number">0.0200</span>, <span class="hljs-number">0.9800</span>]], grad_fn=&lt;RoundBackward1&gt;)
</code></pre>
<p>We see that the trained network correctly predicted X, as the second output value, representing X, is larger than the first output value, representing O.</p>
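<p>Instead of reading off the rounded probabilities by eye, we can map the softmax output straight to a class label with <code>torch.argmax()</code>. A minimal sketch with a made-up probability tensor (the class names and values here are illustrative, not the model's actual output):</p>

```python
import torch

# Hypothetical softmax output for one image: [P(O), P(X)] -- illustrative values.
probabilities = torch.tensor([[0.02, 0.98]])

# torch.argmax(dim=1) returns the index of the largest value in each row.
class_names = ["O", "X"]
predicted_index = torch.argmax(probabilities, dim=1).item()
print(class_names[predicted_index])  # X
```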
]]></content:encoded></item><item><title><![CDATA[Single Layer Neural Network Using PyTorch]]></title><description><![CDATA[In this article, we will explore the Iris flower dataset, a well-known and historically significant dataset in the field of machine learning. Originally introduced by the statistician Ronald Fisher in 1936, this dataset has been widely used for class...]]></description><link>https://path2ml.com/single-layer-neural-network-using-pytorch</link><guid isPermaLink="true">https://path2ml.com/single-layer-neural-network-using-pytorch</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[pytorch]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[MachineLearning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 23 Jul 2025 01:29:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753228222806/fbaa6065-0648-425a-8876-66c068599ec4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we will explore the Iris flower dataset, a well-known and historically significant dataset in the field of machine learning. Originally introduced by the statistician Ronald Fisher in 1936, this dataset has been widely used for classification tasks. It consists of 150 samples from three different species of Iris flowers—<strong>Setosa, Versicolor, and Virginica</strong>—each characterized by four features: sepal length, sepal width, petal length, and petal width.</p>
<p>We will utilize the PyTorch framework to develop a classification model that can accurately identify the species of Iris flowers based on these features. Throughout the article, we will walk through the process step by step, from data loading and preprocessing to building, training, and evaluating our model. By the end, you will have a solid understanding of how to apply machine learning techniques to this dataset using PyTorch.</p>
<p>We will utilize the Lightning framework of <strong>PyTorch</strong>, which simplifies the process of building and training deep learning models. This framework provides a high-level interface that promotes best practices, enhances code organization, and facilitates efficient model training and testing. By leveraging Lightning, we can focus on developing our model's architecture and experiment with different training strategies, while the framework handles the boilerplate code and optimization tasks for us.</p>
<p>We will construct the neural network architecture as illustrated below. The model will take two features as input and consist of one hidden layer that contains two neurons. Each of these neurons will utilize the <strong>ReLU</strong> (Rectified Linear Unit) activation function to introduce non-linearity into the model. The final output layer will be designed to classify the input data into one of three distinct categories. This configuration aims to effectively capture the underlying patterns in the data for accurate classification.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753228526407/9f9f549a-7228-436b-aa7d-fe7ce55b83a5.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-start-with-installing-lightening-framework">Start by installing the Lightning framework</h3>
<pre><code class="lang-python">%%capture
!pip install lightning
</code></pre>
<h3 id="heading-next-we-import-all-libraries">Next we import all libraries</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> torch <span class="hljs-comment"># torch will allow us to create tensors.</span>
<span class="hljs-keyword">import</span> torch.nn <span class="hljs-keyword">as</span> nn <span class="hljs-comment"># torch.nn allows us to create a neural network.</span>
<span class="hljs-comment"># nn.functional give us access to the activation and loss functions.</span>
<span class="hljs-keyword">import</span> torch.nn.functional <span class="hljs-keyword">as</span> F 
<span class="hljs-keyword">from</span> torch.optim <span class="hljs-keyword">import</span> Adam <span class="hljs-comment"># optim contains many optimizers. This time we're using Adam</span>

<span class="hljs-keyword">import</span> lightning <span class="hljs-keyword">as</span> L <span class="hljs-comment"># lightning has tons of cool tools that make neural networks easier</span>
<span class="hljs-comment"># these are needed for the training data</span>
<span class="hljs-keyword">from</span> torch.utils.data <span class="hljs-keyword">import</span> TensorDataset, DataLoader 

<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd <span class="hljs-comment"># We'll use pandas to read in the data and normalize it</span>
<span class="hljs-comment"># We'll use this to create training and testing datasets</span>
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split 
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> MinMaxScaler
<span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> load_iris
</code></pre>
<h3 id="heading-we-load-iris-dataset-using-scikit">We load iris dataset using Scikit</h3>
<pre><code class="lang-python">iris = load_iris(as_frame=<span class="hljs-literal">True</span>)
df = iris.data
</code></pre>
<p>The dataset consists of 150 samples total, 50 for each of 3 species of Iris, Setosa, Versicolor, and Virginica.</p>
<pre><code class="lang-python">df.shape
(<span class="hljs-number">150</span>, <span class="hljs-number">4</span>)
</code></pre>
<p>To start our analysis, we need to divide the dataset into training and testing subsets. The first step in this process is to identify and separate the relevant columns into two distinct DataFrames: one for the input values and another for the labels.</p>
<p>The first DataFrame, which we will name "<strong>input_values</strong>," will contain the features that we will use to make our predictions. Specifically, this DataFrame will include the measurements of the petal and sepal widths, which are critical for our predictive model.</p>
<p>The second DataFrame, labeled "<strong>label_values</strong>," will hold the target variable we aim to predict. This DataFrame will consist of the species classifications, which will allow us to assess the accuracy and effectiveness of our predictions once the model is trained.</p>
<p>By clearly defining these two DataFrames, we set the foundation for an organized approach to model training and evaluation.</p>
<p>In this example, we will keep the neural network simple by using only the values for petal width and sepal width as inputs. First, we'll ensure we can correctly isolate the columns we want from those we don't need. To do this, we will pass the DataFrame (df) a list of the column names we want to retrieve values for: <code>['petal width (cm)', 'sepal width (cm)']</code>.</p>
<pre><code class="lang-python">input_values = df[[<span class="hljs-string">'petal width (cm)'</span>, <span class="hljs-string">'sepal width (cm)'</span>]]
label_values = iris.target
</code></pre>
<p>Using the pandas <strong>factorize</strong>() function, you get two outputs: an array of numeric codes with the same shape as your input, and an array of the unique values each code corresponds to.</p>
<pre><code class="lang-python">classes_as_numbers, classes = label_values.factorize()
</code></pre>
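<p>To see what <code>factorize()</code> returns, here is a small sketch on a toy Series standing in for the species labels (the values are made up for illustration, not the real Iris data):</p>

```python
import pandas as pd

# A toy Series standing in for the species labels.
species = pd.Series(["setosa", "versicolor", "setosa", "virginica"])

# Codes are assigned in order of first appearance: setosa=0, versicolor=1, virginica=2.
codes, uniques = species.factorize()
print(list(codes))    # [0, 1, 0, 2]
print(list(uniques))  # ['setosa', 'versicolor', 'virginica']
```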
<p>We will separate the variables, specifically <code>input_values</code> and <code>classes_as_numbers</code>, into distinct training and testing datasets. This process is essential for building a robust machine learning model and helps us evaluate its performance effectively. To accomplish this, we will utilize the <code>train_test_split()</code> function from the <code>sklearn</code> library. This function allows us to randomly partition our data, ensuring that we have a subset for training the model and a separate subset for testing its accuracy and reliability.</p>
<pre><code class="lang-python">input_train, input_test, label_train, label_test = train_test_split(input_values,
                                                                    classes_as_numbers,
                                                                    test_size=<span class="hljs-number">0.25</span>,
                                                                    stratify=classes_as_numbers)
</code></pre>
<pre><code class="lang-python">input_train.shape
(<span class="hljs-number">112</span>, <span class="hljs-number">2</span>)
input_test.shape
(<span class="hljs-number">38</span>, <span class="hljs-number">2</span>)
</code></pre>
<p>Since our neural network has three outputs, one for each species (as illustrated in the drawing of the neural network above), we need to convert the numbers in <code>label_train</code> into arrays with three elements. Each element in the array corresponds to a specific output of the neural network. We will use the following encoding: [1.0, 0.0, 0.0] for Setosa, [0.0, 1.0, 0.0] for Versicolor, and [0.0, 0.0, 1.0] for Virginica. The good news is that we can easily perform this one-hot encoding. Additionally, we'll use <code>type(torch.float32)</code> to ensure that the numbers are stored in the correct format for efficient processing by the neural network.</p>
<pre><code class="lang-python">one_hot_label_train = F.one_hot(torch.tensor(label_train)).type(torch.float32)
</code></pre>
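<p>On a small tensor of class indices, <code>F.one_hot()</code> produces exactly the three-element arrays described above. A quick sketch (the toy indices are ours, not taken from the training split):</p>

```python
import torch
import torch.nn.functional as F

# Class indices 0, 1, 2 correspond to Setosa, Versicolor, and Virginica.
labels = torch.tensor([0, 1, 2])

# one_hot() infers the number of classes from the largest index;
# .type(torch.float32) converts the result for use as network targets.
one_hot = F.one_hot(labels).type(torch.float32)
print(one_hot)
# tensor([[1., 0., 0.],
#         [0., 1., 0.],
#         [0., 0., 1.]])
```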
<p>To enhance the effectiveness of our machine learning models, it is important to normalize the input variables so that their values fall within a range of 0 to 1. Normalization standardizes the data, ensuring that all features contribute equally during the training process. This scaling helps to improve the model's convergence and overall performance. To achieve this, we will utilize the MinMaxScaler, a tool provided by the scikit-learn library, which efficiently transforms the data by adjusting the minimum and maximum values accordingly.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize the scaler</span>
scaler = MinMaxScaler()
input_train_normalized = scaler.fit_transform(input_train)
input_test_normalized = scaler.transform(input_test) <span class="hljs-comment"># use transform(), not fit_transform(), so the test data reuses the training min/max</span>
</code></pre>
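<p>Under the hood, <code>MinMaxScaler</code> applies <code>(x - min) / (max - min)</code> column by column. A minimal sketch on toy one-feature data (the values are made up, not the Iris measurements):</p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy 1-feature data: min=2.0, max=4.0.
data = np.array([[2.0], [3.0], [4.0]])

scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)  # (x - min) / (max - min)
print(scaled.ravel())  # [0.  0.5 1. ]

# transform() reuses the min/max learned during fitting, which is why
# test data should only be transformed, never re-fit.
print(scaler.transform(np.array([[3.5]])).ravel())  # [0.75]
```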
<p>To effectively train our neural network, we need to organize our training data into a DataLoader. DataLoaders are particularly useful for handling large datasets, as they facilitate the processing of data in manageable batches. This approach not only allows us to shuffle the dataset at the beginning of each epoch, enhancing the training process by reducing potential overfitting, but it also lets us work with a smaller subset of the data if we're aiming for a quick, preliminary run—perfect for debugging our code.</p>
<p>To start, we will convert our training inputs, <code>input_train</code>, into PyTorch tensors using the function <code>torch.tensor()</code>. This step is crucial because neural networks in PyTorch operate with tensors.</p>
<p>Once we have our input data in tensor format, we'll combine <code>input_train</code> with our labels, <code>one_hot_label_train</code>, to form a <code>TensorDataset</code>. This dataset acts as a wrapper that pairs our inputs with their corresponding labels, ensuring that during training, the model learns from the correct label for each input.</p>
<p>Finally, we'll use the <code>TensorDataset</code> to create the DataLoader. By doing so, we can specify parameters such as batch size and whether we would like to shuffle the data. With everything set up in this manner, the DataLoader will streamline the process of feeding data to our neural network during training, enhancing both efficiency and ease of use.</p>
<pre><code class="lang-python"><span class="hljs-comment">## Convert the DataFrame input_train into tensors</span>
input_train_tensors = torch.tensor(input_train.values).type(torch.float32)
<span class="hljs-comment"># Convert the DataFrame input_test into tensors</span>
input_test_tensors = torch.tensor(input_test.values).type(torch.float32)
train_dataset = TensorDataset(input_train_tensors, one_hot_label_train)
train_dataloader = DataLoader(train_dataset)
</code></pre>
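<p>Iterating over a DataLoader yields one batch of (inputs, labels) at a time. A small sketch on toy tensors (the shapes mirror our two-feature inputs and three-class one-hot labels, but the values are illustrative):</p>

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy tensors standing in for the training inputs and one-hot labels.
inputs = torch.arange(12, dtype=torch.float32).reshape(6, 2)
labels = torch.eye(3).repeat(2, 1)

dataset = TensorDataset(inputs, labels)
# batch_size and shuffle are the knobs mentioned above; shuffle=False here
# keeps the iteration order deterministic.
loader = DataLoader(dataset, batch_size=2, shuffle=False)

for batch_inputs, batch_labels in loader:
    print(batch_inputs.shape, batch_labels.shape)  # torch.Size([2, 2]) torch.Size([2, 3])
```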
<p>To build a neural network using PyTorch, you need to create a new class that inherits from <code>LightningModule</code>. This approach makes it easier to train the neural network.</p>
<p>Our new class will include the following methods:</p>
<ol>
<li><p><code>__init__()</code>: This method initializes the weights and biases, as well as manages other housekeeping tasks.</p>
</li>
<li><p><code>forward()</code>: This method performs a forward pass through the neural network.</p>
</li>
<li><p><code>configure_optimizers()</code>: This method sets up the optimizer. Although there are many optimizers available, for this tutorial, we will use the Adam optimizer.</p>
</li>
<li><p><code>training_step()</code>: This method takes the training data, passes it to the <code>forward()</code> method, calculates the loss, and logs the loss values.</p>
</li>
</ol>
<p>By implementing these methods, we will create a functional and efficient neural network ready for training.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MultipleInsOuts</span>(<span class="hljs-params">L.LightningModule</span>):</span>

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self</span>):</span>
    super().__init__()

    L.seed_everything(seed=<span class="hljs-number">42</span>)
    self.input_to_hidden=nn.Linear(in_features=<span class="hljs-number">2</span>,out_features=<span class="hljs-number">2</span>,bias=<span class="hljs-literal">True</span>)
    self.hidden_to_output = nn.Linear(in_features=<span class="hljs-number">2</span>, out_features=<span class="hljs-number">3</span>, bias=<span class="hljs-literal">True</span>)
    self.loss = nn.MSELoss(reduction=<span class="hljs-string">'sum'</span>)

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">forward</span>(<span class="hljs-params">self, input</span>):</span>
    <span class="hljs-comment">## First, we run the input values through the linear layer</span>
    <span class="hljs-comment">## feeding the hidden layer...</span>
    hidden = self.input_to_hidden(input)
    <span class="hljs-comment">## ...then through a ReLU activation function,</span>
    <span class="hljs-comment">## and finally through the output layer.</span>
    output_values = self.hidden_to_output(torch.relu(hidden))
    <span class="hljs-keyword">return</span> output_values

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">configure_optimizers</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-comment">## configuring the optimizer</span>
        <span class="hljs-comment">## consists of passing it the weights and biases we want</span>
        <span class="hljs-comment">## to optimize, which are all in self.parameters(),</span>
        <span class="hljs-comment">## and setting the learning rate with lr=0.001.</span>
        <span class="hljs-keyword">return</span> Adam(self.parameters(), lr=<span class="hljs-number">0.001</span>)

  <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">training_step</span>(<span class="hljs-params">self, batch, batch_idx</span>):</span>
        <span class="hljs-comment">## The first thing we do is split 'batch'</span>
        <span class="hljs-comment">## into the input and label values.</span>
        inputs, labels = batch

        <span class="hljs-comment">## Then we run the input through the neural network</span>
        outputs = self.forward(inputs)

        <span class="hljs-comment">## Then we calculate the loss.</span>
        loss = self.loss(outputs, labels)

        <span class="hljs-keyword">return</span> loss
</code></pre>
<p>Training our new neural network involves creating a model from the new class, MultipleInsOuts.</p>
<pre><code class="lang-python">model = MultipleInsOuts()
</code></pre>
<pre><code class="lang-python">INFO: Seed set to <span class="hljs-number">42</span>
INFO:lightning.fabric.utilities.seed:Seed set to <span class="hljs-number">42</span>
</code></pre>
<p>We will develop a Lightning Trainer, referred to as L.Trainer, aimed at optimizing our model parameters. The training process will commence with an initial setting of 100 epochs. This approach allows us to thoroughly evaluate and adjust the model's performance over multiple iterations, ensuring that we can refine our techniques and achieve better accuracy in our results.</p>
<pre><code class="lang-python">trainer = L.Trainer(max_epochs=<span class="hljs-number">100</span>)
trainer.fit(model, train_dataloaders=train_dataloader)
</code></pre>
<h3 id="heading-lets-test-using-test-data">Let's test using the test data</h3>
<pre><code class="lang-python"><span class="hljs-comment"># Run the input_test_tensors through the neural network</span>
predictions = model(input_test_tensors)

<span class="hljs-comment">## Select the output with highest value...</span>
predicted_labels = torch.argmax(predictions, dim=<span class="hljs-number">1</span>) <span class="hljs-comment">## dim=1 takes the argmax across each row's columns</span>

torch.sum(torch.eq(torch.tensor(label_test), predicted_labels)) / len(predicted_labels)
</code></pre>
<pre><code class="lang-python">tensor(<span class="hljs-number">0.8947</span>)
</code></pre>
<p>We get <strong>89% Accuracy</strong></p>
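<p>The manual accuracy computation above can be cross-checked against scikit-learn's <code>accuracy_score</code>. A sketch with toy label tensors (illustrative values, not the actual test-set results):</p>

```python
import torch
from sklearn.metrics import accuracy_score

# Toy true/predicted labels: 4 of 5 predictions match, so accuracy is 0.8.
true_labels = torch.tensor([0, 1, 2, 1, 0])
predicted_labels = torch.tensor([0, 1, 2, 2, 0])

# The same computation as torch.sum(torch.eq(...)) / len(...) above.
manual = torch.sum(torch.eq(true_labels, predicted_labels)) / len(predicted_labels)
print(manual)  # tensor(0.8000)
print(accuracy_score(true_labels.numpy(), predicted_labels.numpy()))  # 0.8
```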
<p>With our model now trained, we can use it to make predictions from new data. This is achieved by passing the model a tensor that includes normalized petal and sepal widths.</p>
<p>The Jupyter notebook is available on GitHub at <a target="_blank" href="https://github.com/learner14/MachineLearning/tree/main/iris_PyTorch">iris_PyTorch</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Linear Regression]]></title><description><![CDATA[Machine learning is an evolving domain within the field of artificial intelligence (AI) that focuses on the development of algorithms capable of learning from data. These algorithms empower systems to improve their performance on tasks over time thro...]]></description><link>https://path2ml.com/linear-regression</link><guid isPermaLink="true">https://path2ml.com/linear-regression</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[linearregression]]></category><category><![CDATA[Gradient-Descent ]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 25 Jun 2025 14:43:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750862485515/47f33022-ab43-4e6a-8653-fd3b47dde67e.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Machine learning is an evolving domain within the field of artificial intelligence (AI) that focuses on the development of algorithms capable of learning from data. These algorithms empower systems to improve their performance on tasks over time through experience, primarily by recognizing patterns and making data-driven predictions. Among the diverse array of techniques employed in machine learning, linear regression emerges as a fundamental statistical approach extensively used for analyzing relationships among variables.</p>
<p>At its core, linear regression examines the connection between a dependent variable—often referred to as the outcome we aim to predict—and one or multiple independent variables, which are the factors believed to influence this outcome. This technique is particularly powerful in situations where the relationship between variables can be approximated by a straight line.</p>
<p>To implement linear regression, a mathematical line is fitted to the data points in such a manner that the distance between the line and the actual data points is minimized. This method, known as the least squares method, optimally determines the line that best represents the data. The equation of this line can be expressed as</p>
<p>\(\textbf {Y=mX+c}\)</p>
<p>Where (Y) is the dependent variable, (X) is the independent variable, (m) denotes the slope of the line indicating how changes in (X) impact (Y), and (c) represents the y-intercept.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750799979935/ba4123a9-c328-4f5d-88cd-60e49802dbb6.gif" alt class="image--center mx-auto" /></p>
<p>By effectively drawing this line, linear regression not only aids in forecasting and predicting outcomes but also facilitates a deeper understanding of trends in data. It can be utilized to draw insights that inform decision-making processes across various fields, such as economics, where it can elucidate market trends; biology, for exploring relationships between biological factors; engineering, for modeling processes and behaviors; and social sciences, for analyzing societal trends and implications.</p>
<p>A linear model generates predictions by first assessing each input feature, which represents a specific characteristic of the data. It calculates a weighted sum, where each feature is multiplied by a corresponding weight that reflects its importance in the prediction process. To refine this computation, a constant value known as the bias term is added. This bias term allows the model to adjust its predictions to better fit the observed data, ensuring more accurate outcomes.</p>
<p>The model function for linear regression is represented as</p>
<p>\(\textbf  f_{w,b}(x) = \textbf {wx + b}\)</p>
<p>Where \(w\) is weights and \(b\) is bias</p>
<p>For a multivariate problem with many features, this can be defined as</p>
<p>\(\hat{y}= w_0+w_1x_1+w_2x_2+\dots+w_nx_n\)</p>
<p>\(\hat{y}\) is the predicted value, \(w_0\) is the bias term, and \(n\) is the number of features.</p>
<p>Or we can say \(\hat{y}= w.x\)</p>
<ul>
<li><p>\(w\) is the model’s parameter vector, containing the bias term \(w_0\) and the feature weights \(w_1\) to \(w_n\) .</p>
</li>
<li><p>x is the instance’s feature vector, containing \(x_1\) to \(x_n\).</p>
</li>
<li><p>\(w.x\) is the dot product of the vectors \(w\) and \(x\).</p>
</li>
</ul>
<p>To evaluate the effectiveness of various pairs of parameters \(\textbf {(w,b)}\) in a linear regression model, we employ a cost function represented as \(\textbf {J(w,b)}\). This function plays a crucial role in measuring the performance of the selected parameters by quantifying the discrepancy between the predicted outcomes generated by the linear model and the actual target values observed in the dataset.</p>
<p>In more detail, the cost function typically calculates <strong>the sum of the squared differences between the predicted values (obtained from the linear equation defined by (w) and (b)) and the true values</strong> from the data. This quantification allows us to assess how accurately the linear model is able to predict outcomes based on the input features.</p>
<p>By systematically varying the parameters (w) (the weights) and (b) (the bias) and computing the corresponding values of the cost function (J(w,b)), we can analyze and compare the performance for different combinations. The ultimate goal of this assessment is to identify the pair of parameters that results in the lowest cost, indicating the best fit for the linear regression model. This process is foundational in optimizing the model to achieve the most accurate predictions possible based on the input data.</p>
<p>The <strong>cost function</strong> for linear regression \(\textbf {J(w, b) }\) is defined as</p>
<p>\( J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \)</p>
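<p>The cost function above translates almost line for line into NumPy. A minimal sketch on made-up data (the helper name <code>cost</code> and the toy values are ours, for illustration only):</p>

```python
import numpy as np

def cost(w, b, x, y):
    """J(w, b) = (1 / 2m) * sum((w*x + b - y)^2), as defined above."""
    m = len(x)
    predictions = w * x + b
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data lying exactly on y = 2x + 1, so the cost at (w=2, b=1) is 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])

print(cost(2.0, 1.0, x, y))  # 0.0
print(cost(1.0, 0.0, x, y))  # ((1-3)^2 + (2-5)^2 + (3-7)^2) / 6 = 29/6
```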
<p>To find the optimal values for the parameters \(\textbf {(w,b)}\) that minimize the cost function \(\textbf {J(w, b) }\), one effective method we can use is <strong>gradient descent</strong>. This powerful iterative optimization technique systematically refines the parameter values over time.</p>
<p>The process begins by calculating the gradient of the cost function, which indicates the direction of the steepest increase in cost. By following the negative gradient, essentially moving in the opposite direction, gradient descent gradually adjusts \(\textbf {(w,b)}\) in small steps. Each update is designed to reduce the cost function \(\textbf {(w,b)}\), guiding the parameters toward those values that achieve the lowest cost.</p>
<p>Through repeated iterations, where each step brings us closer to the optimal solution, gradient descent effectively uncovers the best-fitting parameters for your model. This method is not only fundamental in machine learning but also widely applicable in various optimization problems across different fields.</p>
<p>The gradient descent algorithm is:</p>
<p>\(\begin{align*}&amp; \text{repeat until convergence:} \; \lbrace \newline \; &amp; \phantom {0000} b := b - \alpha \frac{\partial J(w,b)}{\partial b} \newline \; &amp; \phantom {0000} w := w - \alpha \frac{\partial J(w,b)}{\partial w}  \; &amp; \newline &amp; \rbrace\end{align*}\)<br />where the parameters</p>
<p>\(\textbf w\) and <strong>b</strong> are updated simultaneously, and where</p>
<p>\(\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \)</p>
<p>\(\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) -y^{(i)})x^{(i)} \)</p>
<ul>
<li><p><strong>m</strong> is the number of training examples in the dataset</p>
</li>
<li><p>\(f_{w,b}(x^{(i)})\) is the model's prediction, while \(y^{(i)}\) is the target value</p>
</li>
</ul>
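<p>The update rules above can be sketched as a few lines of NumPy. The function name, learning rate, iteration count, and toy data below are all illustrative choices, not prescribed by the algorithm:</p>

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Minimal batch gradient descent for f(x) = w*x + b, following the update rules above."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iterations):
        error = (w * x + b) - y
        dj_dw = np.sum(error * x) / m  # partial derivative of J w.r.t. w
        dj_db = np.sum(error) / m      # partial derivative of J w.r.t. b
        w -= alpha * dj_dw             # simultaneous update of w and b
        b -= alpha * dj_db
    return w, b

# Toy data generated from y = 2x + 1; the fitted parameters should come out close.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
w, b = gradient_descent(x, y)
print(round(w, 2), round(b, 2))  # 2.0 1.0
```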
<h3 id="heading-deriving-partial-derivative-of-cost-function-for-gradient-descent">Deriving the partial derivatives of the cost function for gradient descent</h3>
<p>Let's derive the gradient descent equations above by taking the partial derivative of \(\textbf {J(w, b)}\) with respect to <strong>w</strong> and <strong>b</strong>.</p>
<p>\(\frac{\partial J(w,b)}{\partial b}=\frac{\partial      \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2                }{\partial b}\)</p>
<p>Applying the chain rule of derivative with respect to <strong>b</strong> we get</p>
<p>$$\frac {\partial J(w,b)}{\partial b}= { 2.\frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})}={\frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})}$$</p><p>Applying the chain rule and sum rule of derivative with respect to <strong>w</strong> we get</p>
<p>$$\frac {\partial J(w,b)}{\partial w}= { 2.\frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})}x^{(i)}={\frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})}x^{(i)}$$</p><p>We start the process by assigning random values to the weights, denoted as <strong>w</strong> . This initial step is referred to as random initialization and serves as the starting point for our optimization. Once the weights are initialized, we enter the optimization phase, where we strive to improve these weights iteratively.</p>
<p>During this phase, we take small, measured steps, often referred to as "baby steps", to modify the weights. Each step involves calculating the gradient of the cost function, such as the Mean Squared Error (MSE), with respect to the weights. The gradient tells us the direction in which to adjust <strong>w</strong> in order to minimize the cost function.</p>
<p>We continue this iterative process of adjusting the weights and evaluating the cost function until the algorithm converges, meaning that further adjustments result in negligible changes to the cost function. This convergence indicates that we have reached a minimum point, where the weights are optimized for our model.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750861986002/7bdd74bf-9d32-4a51-8f3d-2fa164b13aaa.gif" alt class="image--center mx-auto" /></p>
<p>When the learning rate \( \alpha\) is set excessively high, it can cause the optimization process to overshoot the minimum point in the loss landscape, akin to a ball that rolls too far down one side of a valley and ends up on the opposite slope. Instead of homing in on the optimal solution, the algorithm may land at a point with a higher error value than it started with. This misstep can lead to divergence, where the values of the solution continue to increase uncontrollably, ultimately preventing the algorithm from converging on a suitable and effective solution.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750862430855/78edb6db-589e-4a8d-90c9-22b3ebc719dd.gif" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Performance Measures for Classification model using Scikit]]></title><description><![CDATA[Performance metrics are essential tools for assessing the effectiveness and reliability of classification machine learning models. These metrics provide a structured and quantitative approach to evaluate how accurately a model can assign data points ...]]></description><link>https://path2ml.com/performance-measures-for-classification-model-using-scikit</link><guid isPermaLink="true">https://path2ml.com/performance-measures-for-classification-model-using-scikit</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sat, 21 Jun 2025 22:21:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750520176307/e9d44a9d-6b32-4348-926d-d0e3d6f25a22.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Performance metrics are essential tools for assessing the effectiveness and reliability of classification machine learning models. These metrics provide a structured and quantitative approach to evaluate how accurately a model can assign data points to specific, predefined categories. A thorough evaluation of a model's performance typically includes a range of measures, each offering unique insights into different aspects of its predictive capabilities. Key metrics include accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC).</p>
<h2 id="heading-confusion-matrix">Confusion Matrix</h2>
<p>A confusion matrix is a tool used to evaluate the performance of a classification model. It is a table that summarizes the results of the model's predictions compared to the actual outcomes. The matrix typically has four components:</p>
<ol>
<li><p>True Positives (TP): The cases in which the model correctly predicted the positive class.</p>
</li>
<li><p>True Negatives (TN): The cases where the model correctly predicted the negative class.</p>
</li>
<li><p>False Positives (FP): The instances in which the model incorrectly predicted the positive class (also known as Type I error).</p>
</li>
<li><p>False Negatives (FN): The cases where the model failed to predict the positive class but should have (also known as Type II error).</p>
</li>
</ol>
<p>From these four values, various performance metrics can be calculated, such as accuracy, precision, recall, and F1-score, which help in understanding how well the model is performing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750521259729/10ad975b-3510-4bd4-ba24-4d4d842e4a0c.png" alt class="image--center mx-auto" /></p>
<p><strong>Accuracy</strong> is often seen as the most straightforward metric. It represents the overall proportion of correct predictions made by the model, combining both true positives (correctly identified positive instances) and true negatives (correctly identified negative instances) relative to the total number of predictions. Although accuracy is a useful starting point, it can be misleading in cases where the dataset is imbalanced — for instance, in scenarios where one class significantly outweighs another. In such cases, a high accuracy rate might mask poor performance in predicting the minority class.</p>
<p><strong>Precision</strong> is another critical metric that specifically focuses on the accuracy of the positive predictions made by the model. It is calculated as the number of true positives divided by the sum of true positives and false positives. High precision is particularly crucial in contexts where the consequences of false positives are high, such as in fraud detection or medical testing, where incorrect positive identifications can lead to unnecessary interventions or alarm.</p>
<p>\(\mathbf {precision= \frac {TP}{TP+FP}}\)</p>
<p><strong>Recall</strong>, also known as sensitivity, measures the model's ability to identify all relevant cases within a dataset. It quantifies this capability by dividing the number of true positives by the sum of true positives and false negatives. High recall values are especially important in areas where the cost of missing a positive case can have severe ramifications, such as in disease screening or safety-critical applications, where overlooking a positive instance could result in dire outcomes.</p>
<p>\(\mathbf {recall= \frac {TP}{TP+FN}}\)</p>
<p>The <strong>F1 score</strong> is a composite measure that serves as the harmonic mean of precision and recall. This metric is particularly beneficial in scenarios where both false positives and false negatives carry significant weight, as it offers a single score that balances both metrics. It becomes particularly important in the context of imbalanced classes, where one class may be much smaller than the other, leading to inflated accuracy metrics that do not faithfully represent model performance.</p>
<p>\(\mathbf {F1= \frac {2 * Precision * Recall}{Precision+Recall}}\)</p>
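<p>To make these definitions concrete, here is a small sketch that computes accuracy, precision, recall, and F1 directly from the four confusion-matrix counts. The counts used below are the ones we will obtain later in this post for the “is it a 5?” classifier.</p>

```python
# Metrics computed by hand from the four confusion-matrix counts
# (these counts are produced later in this post).
TP, TN, FP, FN = 3530, 53892, 687, 1891

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
# → 0.957 0.8371 0.6512 0.7325
```

<p>These hand-computed values match the <code>precision_score</code>, <code>recall_score</code>, and <code>f1_score</code> outputs shown later in this post.</p>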
<p>Finally, the <strong>area under the receiver operating characteristic curve (AUC-ROC)</strong> provides a nuanced perspective on the model's capability to differentiate between classes across various classification thresholds. It plots the true positive rate against the false positive rate, outlining the trade-offs involved in model predictions at differing levels of sensitivity and specificity. A high AUC value indicates that the model is effective at distinguishing between classes, giving practitioners a clear indication of performance across a continuum of potential decision thresholds.</p>
<p>By meticulously analyzing these diverse performance metrics, data scientists and machine learning practitioners can uncover the strengths and weaknesses of their models. This multifaceted evaluation empowers them to make informed adjustments and enhancements to their models, ultimately leading to improved performance and more accurate predictions in real-world applications. This rigorous approach not only enhances model robustness but also fosters a deeper understanding of the models' operational characteristics in various contexts.</p>
<h3 id="heading-lets-analyze-these-metrics-using-an-example-of-a-classification-model">Let’s analyze these metrics using an example of a classification model.</h3>
<p>We will use the MNIST dataset, which is available through scikit-learn. Let’s load the data:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> fetch_openml
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
mnist = fetch_openml(<span class="hljs-string">'mnist_784'</span>, as_frame=<span class="hljs-literal">False</span>)
mnist.keys()
</code></pre>
<pre><code class="lang-python">dict_keys([<span class="hljs-string">'data'</span>, <span class="hljs-string">'target'</span>, <span class="hljs-string">'frame'</span>, <span class="hljs-string">'categories'</span>, <span class="hljs-string">'feature_names'</span>, <span class="hljs-string">'target_names'</span>, <span class="hljs-string">'DESCR'</span>, <span class="hljs-string">'details'</span>, <span class="hljs-string">'url'</span>])
</code></pre>
<p>Create the features and target labels, then check the shape of the data:</p>
<pre><code class="lang-python">X, y = mnist.data, mnist.target
X.shape
</code></pre>
<pre><code class="lang-python">(<span class="hljs-number">70000</span>, <span class="hljs-number">784</span>)
</code></pre>
<p>Let’s check the first data point.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">plot_digit</span>(<span class="hljs-params">image_data</span>):</span>
    image = image_data.reshape(<span class="hljs-number">28</span>, <span class="hljs-number">28</span>)
    plt.imshow(image, cmap=<span class="hljs-string">"binary"</span>)
    plt.axis(<span class="hljs-string">"off"</span>)

some_digit = X[<span class="hljs-number">0</span>]
plot_digit(some_digit)
plt.show()
</code></pre>
<p>It is the digit 5.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750536156385/bdb7d3aa-8ade-43f9-a7a3-ecb48c4f9234.png" alt /></p>
<p>We will divide the data into train and test</p>
<pre><code class="lang-python">X_train, X_test, y_train, y_test = X[:<span class="hljs-number">60000</span>], X[<span class="hljs-number">60000</span>:], y[:<span class="hljs-number">60000</span>], y[<span class="hljs-number">60000</span>:]
</code></pre>
<p>In order to effectively display all performance metrics, we will develop a binary classifier. This classifier will categorize the labels by assigning a value of true when the digit is 5 and a value of false for all other digits. This approach will allow us to analyze the model's ability to correctly identify the presence of the digit 5 compared to other digits.</p>
<pre><code class="lang-python">y_train_5 = (y_train == <span class="hljs-string">'5'</span>)  <span class="hljs-comment"># True for all 5s, False for all other digits</span>
y_test_5 = (y_test == <span class="hljs-string">'5'</span>)
</code></pre>
<p>We will be implementing a stochastic gradient descent (SGD) classifier, which is a powerful and efficient approach for optimizing our machine learning model. This method works by updating the model's parameters incrementally, using randomly selected subsets of the training data known as mini-batches. By doing so, we can navigate the loss function more effectively, allowing us to refine our model's performance while also reducing the computational burden typically associated with processing the entire dataset at once. This iterative process helps us find the optimal weights for our model, ultimately leading to better predictions.</p>
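<p>As a rough illustration of the incremental updates described above, here is a minimal, hypothetical sketch of a single SGD step for a linear model with squared loss. The learning rate and data are made up for illustration; <code>SGDClassifier</code> handles all of this internally.</p>

```python
import numpy as np

# One stochastic-gradient-descent step for a linear model with squared loss.
rng = np.random.default_rng(0)
w = np.zeros(3)                 # model weights
lr = 0.01                       # learning rate (hypothetical value)

x_i = rng.standard_normal(3)    # one randomly drawn training instance
y_i = 1.0                       # its target value

pred = w @ x_i                  # current prediction
grad = (pred - y_i) * x_i       # gradient of 0.5 * (pred - y_i)**2 w.r.t. w
w = w - lr * grad               # nudge the weights toward the target
```

<p>Repeating this step over many randomly drawn instances (or mini-batches) is what lets SGD refine the weights without ever processing the whole dataset at once.</p>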
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> SGDClassifier

sgd_clf = SGDClassifier(random_state=<span class="hljs-number">42</span>)
sgd_clf.fit(X_train, y_train_5)
</code></pre>
<p>After fitting the model, let’s test it on the first digit by calling the <code>predict</code> function.</p>
<pre><code class="lang-python">sgd_clf.predict([some_digit])
array([ <span class="hljs-literal">True</span>])
</code></pre>
<p>In order to assess the accuracy of our model, we will utilize a technique known as cross-validation. Specifically, we will employ the <code>cross_val_score</code> function from the scikit-learn library. This function allows us to evaluate the performance of our model by splitting the dataset into multiple subsets, training the model on some of these subsets, and validating it on the remaining ones. By repeating this process several times, we can obtain a more reliable estimate of the model's accuracy.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=<span class="hljs-number">3</span>, scoring=<span class="hljs-string">"accuracy"</span>)
</code></pre>
<pre><code class="lang-python">array([<span class="hljs-number">0.95035</span>, <span class="hljs-number">0.96035</span>, <span class="hljs-number">0.9604</span> ])
</code></pre>
<p>It gives an accuracy above 95%, which looks good, but accuracy can be misleading on an imbalanced dataset like this one, where only about 10% of the images are 5s. To understand the model better, we will create a confusion matrix and analyze the other performance metrics.</p>
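<p>To see why accuracy alone can mislead here, consider that only about 10% of the training images are 5s, so a classifier that never predicts “5” is still right about 90% of the time. A small sketch, with synthetic labels standing in for <code>y_train_5</code>:</p>

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic stand-in for y_train_5: about 10% positives, like the 5s in MNIST.
y = np.zeros(60_000, dtype=bool)
y[:6_000] = True
X = np.zeros((60_000, 1))            # features are ignored by the dummy

never_5 = DummyClassifier(strategy="most_frequent")  # always predicts False
never_5.fit(X, y)
print(never_5.score(X, y))           # → 0.9  (90% accuracy, yet 0% recall)
```

<p>High accuracy with zero recall is exactly the failure mode the remaining metrics are designed to expose.</p>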
<p>We will utilize the <code>cross_val_predict</code> method from the scikit-learn library to generate predicted values based on our model. This method allows us to perform cross-validation and provides a way to obtain predictions for each data point in our dataset by training the model multiple times on different subsets of the data. This approach helps ensure that we get a more accurate estimate of the model's performance.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_predict

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=<span class="hljs-number">3</span>)
</code></pre>
<p>We will utilize the <code>confusion_matrix</code> function provided by the scikit-learn library, which allows us to evaluate the performance of our classification model by comparing the predicted classifications to the actual outcomes. This function generates a matrix that summarizes the correct and incorrect predictions, offering insights into the model's accuracy and error types.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> confusion_matrix

cm = confusion_matrix(y_train_5, y_train_pred)
cm
</code></pre>
<pre><code class="lang-python">array([[<span class="hljs-number">53892</span>,   <span class="hljs-number">687</span>],
       [ <span class="hljs-number">1891</span>,  <span class="hljs-number">3530</span>]])
</code></pre>
<h3 id="heading-tn53892-fp687-fn1891-and-tp-3530">TN=53892, FP=687, FN=1891, and TP=3530</h3>
<h3 id="heading-lets-calculate-the-precision">Let’s calculate the <strong>precision</strong></h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> precision_score, recall_score

precision_score(y_train_5, y_train_pred)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">0.8370879772350012</span>
</code></pre>
<h3 id="heading-recall">Recall</h3>
<pre><code class="lang-python">recall_score(y_train_5, y_train_pred)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">0.6511713705958311</span>
</code></pre>
<h3 id="heading-f1-score">F1 score</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> f1_score

f1_score(y_train_5, y_train_pred)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">0.7325171197343847</span>
</code></pre>
<p>We will utilize the <code>precision_recall_curve</code> function from the scikit-learn library. First, we call <code>cross_val_predict</code> with <code>method="decision_function"</code> so that, instead of class predictions, we obtain a decision <strong>score</strong> for every instance in the dataset. From these scores we can calculate precision and recall at various <strong>threshold</strong> levels, which is crucial for understanding how well the model distinguishes between the classes and for identifying the threshold that best balances precision and recall.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> precision_recall_curve

y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=<span class="hljs-number">3</span>,
                             method=<span class="hljs-string">"decision_function"</span>)
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)
threshold = <span class="hljs-number">3000</span>
</code></pre>
<h3 id="heading-we-will-plot-the-precision-recall-and-threshold">We will plot the precision and recall against the threshold</h3>
<pre><code class="lang-python">plt.figure(figsize=(<span class="hljs-number">8</span>, <span class="hljs-number">4</span>))  <span class="hljs-comment"># extra code – it's not needed, just formatting</span>
plt.plot(thresholds, precisions[:<span class="hljs-number">-1</span>], <span class="hljs-string">"b--"</span>, label=<span class="hljs-string">"Precision"</span>, linewidth=<span class="hljs-number">2</span>)
plt.plot(thresholds, recalls[:<span class="hljs-number">-1</span>], <span class="hljs-string">"g-"</span>, label=<span class="hljs-string">"Recall"</span>, linewidth=<span class="hljs-number">2</span>)
plt.vlines(threshold, <span class="hljs-number">0</span>, <span class="hljs-number">1.0</span>, <span class="hljs-string">"k"</span>, <span class="hljs-string">"dotted"</span>, label=<span class="hljs-string">"threshold"</span>)

<span class="hljs-comment"># extra code – this section just beautifies and saves Figure 3–5</span>
idx = (thresholds &gt;= threshold).argmax()  <span class="hljs-comment"># first index ≥ threshold</span>
plt.plot(thresholds[idx], precisions[idx], <span class="hljs-string">"bo"</span>)
plt.plot(thresholds[idx], recalls[idx], <span class="hljs-string">"go"</span>)
plt.axis([<span class="hljs-number">-50000</span>, <span class="hljs-number">50000</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
plt.grid()
plt.xlabel(<span class="hljs-string">"Threshold"</span>)
plt.legend(loc=<span class="hljs-string">"center right"</span>)

plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750541781138/88dbbf53-2289-4f20-a533-c47a53857cd7.png" alt /></p>
<p>Precision reaches 90% at roughly 50% recall.</p>
<h3 id="heading-receiver-operating-characteristic-roc">Receiver Operating Characteristic (ROC)</h3>
<p>The Receiver Operating Characteristic (ROC) curve is an important tool used for evaluating the performance of binary classifiers. It visually represents the trade-off between the True Positive Rate (TPR), also known as sensitivity, and the False Positive Rate (FPR). TPR indicates the proportion of actual positive cases that are correctly identified by the model, while FPR reflects the proportion of actual negative cases that are incorrectly classified as positive. Additionally, the True Negative Rate (TNR), which is also called specificity, measures the model’s ability to correctly identify negative cases. The ROC curve essentially plots TPR against 1 minus specificity, providing a graphical representation of the classifier's performance across various threshold settings. This allows for a comprehensive assessment of the model's strengths and weaknesses in distinguishing between the two classes.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve

fpr, tpr, roc_thresholds = roc_curve(y_train_5, y_scores)
</code></pre>
<h3 id="heading-lets-get-a-threshold-value-for-a-precision-of-90-by-using-argmax-function">Let’s get the threshold value for a precision of 90% by using the argmax function.</h3>
<p>Note that we use the <code>precisions</code> and <code>thresholds</code> arrays returned earlier by <code>precision_recall_curve</code>, while the ROC thresholds are kept in a separate variable, <code>roc_thresholds</code>, so the two sets of thresholds don’t overwrite each other.</p>
<pre><code class="lang-python">idx_for_90_precision = (precisions &gt;= <span class="hljs-number">0.90</span>).argmax()
threshold_for_90_precision = thresholds[idx_for_90_precision]
threshold_for_90_precision
</code></pre>
<p>Plot the ROC curve:</p>
<pre><code class="lang-python">idx_for_threshold_at_90 = (roc_thresholds &lt;= threshold_for_90_precision).argmax()
tpr_90, fpr_90 = tpr[idx_for_threshold_at_90], fpr[idx_for_threshold_at_90]

plt.figure(figsize=(<span class="hljs-number">6</span>, <span class="hljs-number">5</span>))  <span class="hljs-comment"># extra code – not needed, just formatting</span>
plt.plot(fpr, tpr, linewidth=<span class="hljs-number">2</span>, label=<span class="hljs-string">"ROC curve"</span>)
plt.plot([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>], [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>], <span class="hljs-string">'k:'</span>, label=<span class="hljs-string">"Random classifier's ROC curve"</span>)
plt.plot([fpr_90], [tpr_90], <span class="hljs-string">"ko"</span>, label=<span class="hljs-string">"Threshold for 90% precision"</span>)

plt.text(<span class="hljs-number">0.12</span>, <span class="hljs-number">0.71</span>, <span class="hljs-string">"Higher\nthreshold"</span>, color=<span class="hljs-string">"#333333"</span>)
plt.xlabel(<span class="hljs-string">'False Positive Rate (Fall-Out)'</span>)
plt.ylabel(<span class="hljs-string">'True Positive Rate (Recall)'</span>)
plt.grid()
plt.axis([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>])
plt.legend(loc=<span class="hljs-string">"lower right"</span>, fontsize=<span class="hljs-number">13</span>)
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750542946343/968d0d1c-a7d6-4a1e-a4a4-2ac0765b65ec.png" alt class="image--center mx-auto" /></p>
<p>The dotted diagonal represents a purely random classifier. A good classifier’s ROC curve stays as far from this diagonal as possible, toward the top-left corner.</p>
<p>To effectively evaluate the performance of a classification model, we can measure the area under the Receiver Operating Characteristic <strong>(ROC)</strong> curve. Scikit-learn conveniently provides a function specifically designed for estimating this area. A perfect <strong>ROC-AUC</strong> score, which indicates flawless model performance, is represented by a value of 1. This score signifies that the model can perfectly distinguish between positive and negative classes.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_auc_score

roc_auc_score(y_train_5, y_scores)
</code></pre>
<pre><code class="lang-python">np.float64(<span class="hljs-number">0.9604938554008616</span>)
</code></pre>
<p>Code for this blog is available at <a target="_blank" href="https://github.com/learner14/MachineLearning/blob/main/performanceMeasures/Performance_Measures.ipynb"><strong>PerformanceMeasures</strong></a></p>
]]></content:encoded></item><item><title><![CDATA[Implementing a neural network using Keras for NCAA college basketball game data.]]></title><description><![CDATA[In this Blog, we embark on an exciting journey as data scientists aiming to predict the outcomes of NCAA college basketball games. Our primary objective is to analyze multiple years' worth of game results, process this data meticulously, and utilize ...]]></description><link>https://path2ml.com/implementing-a-neural-network-using-keras-for-ncaa-college-basketball-game-data</link><guid isPermaLink="true">https://path2ml.com/implementing-a-neural-network-using-keras-for-ncaa-college-basketball-game-data</guid><category><![CDATA[kera]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[keras]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[DeepLearning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Mon, 02 Jun 2025 01:08:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748824333059/adf9738c-75bd-4064-a4a9-2f356146b0b1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this Blog, we embark on an exciting journey as data scientists aiming to predict the outcomes of NCAA college basketball games. Our primary objective is to analyze multiple years' worth of game results, process this data meticulously, and utilize it to train a neural network for accurate predictions.</p>
<p>Our overarching goal is to develop a machine learning model that offers us a competitive advantage in predicting game results. Throughout this project, we will navigate the entire life cycle of a machine learning endeavor, which includes:</p>
<ol>
<li><p><strong>Designing a Neural Network</strong>: We will create a robust neural network architecture using Keras, leveraging its powerful capabilities for building and training models.</p>
</li>
<li><p><strong>Training, Testing, and Validation</strong>: We will initiate a comprehensive training process, followed by rigorous testing and validation phases to ensure our model's accuracy and reliability.</p>
</li>
</ol>
<p>By undertaking these steps, we aim to deliver a cutting-edge machine learning project that not only enhances our understanding of data-driven predictions but also equips us with the tools necessary to make informed decisions in the realm of college basketball.</p>
<p>Data for this project can be grabbed from <a target="_blank" href="https://github.com/learner14/DeepLearning/blob/main/BasketballGame/Games-Calculated.csv">Games_Calculated.csv</a>.</p>
<p><strong>Columns</strong> are:</p>
<ol>
<li><p>Date of the game</p>
</li>
<li><p>Home Team</p>
</li>
<li><p>Home Team’s Score</p>
</li>
<li><p>Away Team</p>
</li>
<li><p>Away Team’s Score</p>
</li>
<li><p>Home Team’s Offensive average (points scored) while at home</p>
</li>
<li><p>Home Team’s Defensive average (points given up) while at home</p>
</li>
<li><p>Away Team’s Offensive average while away</p>
</li>
<li><p>Away Team’s Defensive average while away</p>
</li>
<li><p>Score difference from the home team’s perspective</p>
</li>
</ol>
<p>Our primary objective is to thoroughly clean and preprocess the dataset by identifying and rectifying any inconsistencies or errors. This includes standardizing formats, removing duplicates, and addressing missing values to ensure the data is reliable. Additionally, we will identify and eliminate any unnecessary columns that do not contribute to our analytical goals. By refining the dataset in this way, we aim to create a streamlined, normalized version that enhances the learning process, ensuring that the insights derived from our analysis are meaningful and actionable.</p>
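<p>As a hypothetical sketch (the real dataset is handled below), the cleaning steps described above might look like this in pandas on a toy frame with score columns like those in the games file:</p>

```python
import numpy as np
import pandas as pd

# Toy frame mimicking two of the game-file columns, with one exact
# duplicate row and one row containing a missing value.
df = pd.DataFrame({
    "HomeScore": [87.0, 70.0, 70.0, np.nan],
    "AwayScore": [76.0, 50.0, 50.0, 60.0],
})

df = df.drop_duplicates()   # remove exact duplicate rows
df = df.dropna()            # drop rows with missing values
print(len(df))              # → 2
```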
<p>For this project, I utilized <strong>Google Colab</strong> as my development environment. To begin, I mounted my Google Drive to access the data files stored there. This step is crucial as it allows me to work with the dataset directly from my Drive while taking advantage of Colab's computational resources.</p>
<h3 id="heading-mount-the-drive-to-load-data">Mount the drive to load data</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> google.colab <span class="hljs-keyword">import</span> drive
drive.mount(<span class="hljs-string">'/content/drive'</span>)
</code></pre>
<pre><code class="lang-python">Mounted at /content/drive
</code></pre>
<h3 id="heading-load-the-csv-games-file">Load the CSV games file</h3>
<pre><code class="lang-python">game_file = <span class="hljs-string">'/content/drive/MyDrive/Colab_Notebooks/live_project/game/Games-Calculated.csv'</span>
</code></pre>
<h3 id="heading-import-the-libraries-we-will-use-in-this-project">Import the Libraries we will use in this project</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> tensorflow <span class="hljs-keyword">as</span> tf
<span class="hljs-keyword">from</span> tensorflow <span class="hljs-keyword">import</span> keras
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
</code></pre>
<p>Specify the column names for the dataset and then load the CSV file into a Pandas DataFrame for further analysis.</p>
<pre><code class="lang-python">column_names = [<span class="hljs-string">'Date'</span>,<span class="hljs-string">'HomeTeam'</span>,<span class="hljs-string">'HomeScore'</span>,<span class="hljs-string">'AwayTeam'</span>,<span class="hljs-string">'AwayScore'</span>,
                <span class="hljs-string">'HomeScoreAverage'</span>,<span class="hljs-string">'HomeDefenseAverage'</span>,<span class="hljs-string">'AwayScoreAverage'</span>,<span class="hljs-string">'AwayDefenseAverage'</span>,
                <span class="hljs-string">'Result'</span>]
data = pd.read_csv(game_file,names=column_names)
</code></pre>
<p>Let’s read the first two rows:</p>
<pre><code class="lang-python">data.head(<span class="hljs-number">2</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>Date</strong></td><td><strong>HomeTeam</strong></td><td><strong>HomeScore</strong></td><td><strong>AwayTeam</strong></td><td><strong>AwayScore</strong></td><td><strong>HomeScoreAverage</strong></td><td><strong>HomeDefenseAverage</strong></td><td><strong>AwayScoreAverage</strong></td><td><strong>AwayDefenseAverage</strong></td><td><strong>Result</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>0</strong></td><td>2015-11-13</td><td>Hawaii</td><td>87</td><td>Montana State</td><td>76</td><td>87.0</td><td>76.0</td><td>76.0</td><td>87.0</td><td>11</td></tr>
<tr>
<td><strong>1</strong></td><td>2015-11-13</td><td>Eastern Michigan</td><td>70</td><td>Vermont</td><td>50</td><td>70.0</td><td>50.0</td><td>50.0</td><td>70.0</td><td>20</td></tr>
</tbody>
</table>
</div><p>We will eliminate the columns that are unnecessary for our training process to streamline the dataset and improve the efficiency of our model.</p>
<pre><code class="lang-python">updated_data=data.drop([<span class="hljs-string">'Date'</span>,<span class="hljs-string">'HomeTeam'</span>,<span class="hljs-string">'HomeScore'</span>,<span class="hljs-string">'AwayTeam'</span>,<span class="hljs-string">'AwayScore'</span>], axis=<span class="hljs-number">1</span>)
updated_data.shape
</code></pre>
<pre><code class="lang-python">(<span class="hljs-number">20160</span>, <span class="hljs-number">5</span>)
</code></pre>
<p>So we have 20,160 records and 5 columns: 4 features plus the target label.</p>
<h3 id="heading-splitting-the-train-and-test-data">Splitting the train and test data</h3>
<p>We need to divide the dataset into two parts using an 80:20 ratio. This means that 80% of the data will be allocated for training our model, while the remaining 20% will be reserved for testing its performance. To achieve this, we will utilize the Pandas library, which provides powerful data manipulation tools. We will first load the dataset into a Pandas DataFrame, then use functions to randomly shuffle and split the data accordingly. This ensures that both the training and testing sets are representative samples of the original dataset.</p>
<pre><code class="lang-python">trainX=updated_data.sample(frac=<span class="hljs-number">0.8</span>,random_state=<span class="hljs-number">0</span>)
testX=updated_data.drop(trainX.index)
</code></pre>
<pre><code class="lang-python">trainX.shape
</code></pre>
<pre><code class="lang-python">(<span class="hljs-number">16128</span>, <span class="hljs-number">5</span>)
</code></pre>
<pre><code class="lang-python">testX.shape
</code></pre>
<pre><code class="lang-python">(<span class="hljs-number">4032</span>, <span class="hljs-number">5</span>)
</code></pre>
<p>Currently, we have divided our dataset into two parts: 80% of the data will be used for training our model, while the remaining 20% will serve as the test dataset. Our next step is to create the target variables for both the training and test sets.</p>
<pre><code class="lang-python">trainY=trainX.pop(<span class="hljs-string">'Result'</span>)
testY=testX.pop(<span class="hljs-string">'Result'</span>)
</code></pre>
<h3 id="heading-normalizing-the-data">Normalizing the data</h3>
<p>We will apply data normalization techniques to both the training and testing datasets. Specifically, we will implement <strong>z-score standardization</strong>, which involves converting our data into a standard format. This process will ensure that each feature has a mean of zero and a standard deviation of one, allowing for a more accurate comparison across different scales and distributions. By doing this, we aim to enhance the performance of our machine learning models and improve the overall predictive accuracy.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">z_score_standardization</span>(<span class="hljs-params">df</span>):</span>
    df_scaled = df.copy()
    <span class="hljs-keyword">for</span> column <span class="hljs-keyword">in</span> df.columns:
        df_scaled[column] = (df[column] - df[column].mean()) / df[column].std()
    <span class="hljs-keyword">return</span> df_scaled
</code></pre>
<p>We call the above function for both test and train data to get scaled data.</p>
<pre><code class="lang-python">scaledTrainX=z_score_standardization(trainX)
scaledTestX=z_score_standardization(testX)
</code></pre>
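<p>As a quick sanity check of the standardization (on a toy frame, not the real data), each scaled column should end up with mean roughly 0 and standard deviation roughly 1:</p>

```python
import pandas as pd

# Same formula as z_score_standardization above, applied to a toy frame.
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
scaled = (toy - toy.mean()) / toy.std()

print(scaled.mean().abs().round(10).tolist())  # → [0.0, 0.0]
print(scaled.std().round(10).tolist())         # → [1.0, 1.0]
```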
<h3 id="heading-building-the-model">Building the Model</h3>
<p>We will develop a sequential model using Keras, which is a high-level neural networks API. The model will consist of two hidden layers, each containing 32 neurons and utilizing the ReLU (Rectified Linear Unit) activation function to introduce non-linearity. This choice allows the model to learn complex patterns in the data.</p>
<p>Following the two hidden layers, we will include an output layer with a single neuron. This layer produces the model’s prediction: the score difference from the home team’s perspective, which makes this a regression task.</p>
<p>To optimize the model’s performance, we will compile it with the RMSprop optimizer, which is effective for training deep learning models, and a mean squared error loss suited to this regression task, tracking mean absolute error and mean squared error during training. (The accuracy metric that appears in the logs below is not meaningful here, since a regression prediction rarely matches the integer score difference exactly.)</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">buildModel</span>():</span>
  model=keras.models.Sequential([
      keras.layers.Dense(<span class="hljs-number">32</span>,activation=<span class="hljs-string">'relu'</span>,input_shape=[<span class="hljs-number">4</span>]),
      keras.layers.Dense(<span class="hljs-number">32</span>,activation=<span class="hljs-string">'relu'</span>),
      keras.layers.Dense(<span class="hljs-number">1</span>)
      ])
  model.compile(optimizer=<span class="hljs-string">'rmsprop'</span>,loss=<span class="hljs-string">'mean_squared_error'</span>,metrics=[<span class="hljs-string">'accuracy'</span>,<span class="hljs-string">'MeanAbsoluteError'</span>,<span class="hljs-string">'MeanSquaredError'</span>])

  <span class="hljs-keyword">return</span> model
</code></pre>
<h3 id="heading-create-model-and-call-fit-for-100-epochs">Create the model and call <code>fit</code> for 100 epochs</h3>
<pre><code class="lang-python">model=buildModel()
history = model.fit(scaledTrainX, trainY, epochs=<span class="hljs-number">100</span>, validation_split=<span class="hljs-number">0.2</span>)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">404</span>/<span class="hljs-number">404</span> ━━━━━━━━━━━━━━━━━━━━ <span class="hljs-number">1</span>s <span class="hljs-number">3</span>ms/step - MeanAbsoluteError: <span class="hljs-number">7.9093</span> - MeanSquaredError: <span class="hljs-number">103.3424</span> - accuracy: <span class="hljs-number">0.0156</span> - loss: <span class="hljs-number">103.3424</span> - val_MeanAbsoluteError: <span class="hljs-number">7.8313</span> - val_MeanSquaredError: <span class="hljs-number">102.0163</span> - val_accuracy: <span class="hljs-number">0.0130</span> - val_loss: <span class="hljs-number">102.0163</span>
Epoch <span class="hljs-number">99</span>/<span class="hljs-number">100</span>
<span class="hljs-number">404</span>/<span class="hljs-number">404</span> ━━━━━━━━━━━━━━━━━━━━ <span class="hljs-number">1</span>s <span class="hljs-number">3</span>ms/step - MeanAbsoluteError: <span class="hljs-number">7.9773</span> - MeanSquaredError: <span class="hljs-number">104.4646</span> - accuracy: <span class="hljs-number">0.0166</span> - loss: <span class="hljs-number">104.4646</span> - val_MeanAbsoluteError: <span class="hljs-number">7.8514</span> - val_MeanSquaredError: <span class="hljs-number">102.4821</span> - val_accuracy: <span class="hljs-number">0.0124</span> - val_loss: <span class="hljs-number">102.4821</span>
Epoch <span class="hljs-number">100</span>/<span class="hljs-number">100</span>
<span class="hljs-number">404</span>/<span class="hljs-number">404</span> ━━━━━━━━━━━━━━━━━━━━ <span class="hljs-number">1</span>s <span class="hljs-number">3</span>ms/step - MeanAbsoluteError: <span class="hljs-number">7.8406</span> - MeanSquaredError: <span class="hljs-number">102.1461</span> - accuracy: <span class="hljs-number">0.0168</span> - loss: <span class="hljs-number">102.1461</span> - val_MeanAbsoluteError: <span class="hljs-number">7.8153</span> - val_MeanSquaredError: <span class="hljs-number">101.8889</span> - val_accuracy: <span class="hljs-number">0.0133</span> - val_loss: <span class="hljs-number">101.8889</span>
</code></pre>
<p><strong>MeanAbsoluteError: 7.8406 - MeanSquaredError: 102.1461 - accuracy: 0.0168 - loss: 102.1461 - val_MeanAbsoluteError: 7.8153 - val_MeanSquaredError: 101.8889 - val_accuracy: 0.0133 - val_loss: 101.8889</strong></p>
<p>Next, we evaluate the model on the test dataset to measure how well it generalizes to unseen data.</p>
<pre><code class="lang-python"><span class="hljs-comment"># evaluate() returns values in compile order: loss, accuracy, MAE, MSE</span>
test_loss, test_acc, mae, mse = model.evaluate(scaledTestX, testY)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">126</span>/<span class="hljs-number">126</span> ━━━━━━━━━━━━━━━━━━━━ <span class="hljs-number">0</span>s <span class="hljs-number">2</span>ms/step - MeanAbsoluteError: <span class="hljs-number">7.3928</span> - MeanSquaredError: <span class="hljs-number">93.3554</span> - accuracy: <span class="hljs-number">0.0240</span> - loss: <span class="hljs-number">93.3554</span>
</code></pre>
<h3 id="heading-lets-plot-training-and-validation-loss-using-matplotlib">Let's plot the training and validation loss using Matplotlib</h3>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt

history_dict = history.history
loss_values = history_dict[<span class="hljs-string">"loss"</span>]
val_loss_values = history_dict[<span class="hljs-string">"val_loss"</span>]
epochs = range(<span class="hljs-number">1</span>, len(loss_values) + <span class="hljs-number">1</span>)
plt.plot(epochs, loss_values, <span class="hljs-string">"bo"</span>, label=<span class="hljs-string">"Training loss"</span>)
plt.plot(epochs, val_loss_values, <span class="hljs-string">"r"</span>, label=<span class="hljs-string">"Validation loss"</span>)
plt.title(<span class="hljs-string">"Training and validation loss"</span>)
plt.xlabel(<span class="hljs-string">"Epochs"</span>)
plt.ylabel(<span class="hljs-string">"Loss"</span>)
plt.legend()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748826254151/ddf8a1ee-020b-4c07-84ad-865aa09b2985.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-lets-also-plot-meanabsoluteerror">Let's also plot the mean absolute error</h3>
<pre><code class="lang-python">abs_error = history_dict[<span class="hljs-string">"MeanAbsoluteError"</span>]
val_abs_error = history_dict[<span class="hljs-string">"val_MeanAbsoluteError"</span>]
epochs = range(<span class="hljs-number">1</span>, len(abs_error) + <span class="hljs-number">1</span>)
plt.plot(epochs, abs_error, <span class="hljs-string">"bo"</span>, label=<span class="hljs-string">"Training mean absolute error"</span>)
plt.plot(epochs, val_abs_error, <span class="hljs-string">"r"</span>, label=<span class="hljs-string">"Validation mean absolute error"</span>)
plt.title(<span class="hljs-string">"Training and validation mean absolute error"</span>)
plt.xlabel(<span class="hljs-string">"Epochs"</span>)
plt.ylabel(<span class="hljs-string">"Mean absolute error"</span>)
plt.legend()
plt.show()
</code></pre>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748826333211/6caf1aa3-1ad2-4c3e-be4f-2867d952ba91.png" alt class="image--center mx-auto" /></p>
<p>The Jupyter notebook for this project can be found at <a target="_blank" href="https://github.com/learner14/DeepLearning/tree/main/BasketballGame">Basketball Game Prediction</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Agentic AI Plan and Execute using LangChain]]></title><description><![CDATA[Agentic AI refers to artificial intelligence systems that possess a degree of autonomy and decision-making capability, allowing them to act independently in specific contexts. Unlike traditional AI, which typically follows predetermined rules and alg...]]></description><link>https://path2ml.com/agentic-ai-plan-and-execute-using-langchain</link><guid isPermaLink="true">https://path2ml.com/agentic-ai-plan-and-execute-using-langchain</guid><category><![CDATA[Deep Learning]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[langchain]]></category><category><![CDATA[DeepLearning]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Sat, 31 May 2025 18:40:55 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748712185587/2f26e695-c5ef-4934-8318-d245091e3135.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Agentic AI</strong> refers to artificial intelligence systems that possess a degree of autonomy and decision-making capability, allowing them to act independently in specific contexts. Unlike traditional AI, which typically follows predetermined rules and algorithms, Agentic AI can evaluate situations, assess potential outcomes, and make choices based on its understanding of the environment and objectives. This includes the ability to adapt to new information and improve its performance over time.</p>
<p><strong>Agentic AI</strong> systems are designed to perform tasks that require a level of judgment and reasoning, effectively enabling them to engage in complex interactions or to solve problems that were not explicitly programmed into them. This concept raises important discussions surrounding ethics, accountability, and the implications of delegating decision-making power to machines, as well as the potential impact on industries such as healthcare, transportation, and robotics.</p>
<p><strong>Plan-and-execute</strong> agents utilize a language model <strong>(LLM)</strong> to develop detailed task plans, which are then carried out by a separate execution agent. This collaborative approach allows for more sophisticated task management, where the LLM generates strategies and instructions while the execution agent focuses on implementing the tasks effectively.</p>
<p>The strategy is composed of two key elements. The first element is a planner, which leverages the reasoning capabilities of large language models <strong>(LLMs)</strong> to develop a comprehensive plan by outlining specific steps. The second element is an executor, responsible for interpreting the steps outlined by the planner. This executor identifies the essential tools, resources, or actions required to successfully carry out each step of the plan. Together, these components work in harmony to ensure effective execution of tasks.</p>
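<p>Before bringing in LangChain, the control flow of the pattern can be sketched in a few lines of plain Python. The hard-coded <code>planner</code> and <code>executor</code> below are stand-ins for what are, in the real system, LLM calls:</p>

```python
# Toy illustration of the plan-and-execute pattern (no LLM involved):
# the "planner" breaks a goal into steps, and the "executor" carries
# out each step in order, accumulating results.

def planner(goal: str) -> list[str]:
    # A real planner would ask an LLM to decompose the goal;
    # here we return a hard-coded plan for illustration.
    return [f"research: {goal}", f"summarize: {goal}", f"report: {goal}"]

def executor(step: str) -> str:
    # A real executor would pick a tool (search, retrieval, ...) per step.
    return f"done({step})"

def plan_and_execute(goal: str) -> list[str]:
    results = []
    for step in planner(goal):
        results.append(executor(step))
    return results

print(plan_and_execute("AI regulations"))  # one result per planned step
```

<p>The real planner produces steps dynamically from the goal, and the real executor chooses among registered tools for each step, but the loop structure is exactly this.</p>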
<p>In this post we will implement the plan-and-execute pattern with the <strong>LangChain</strong> framework. This involves defining the agent's objective, selecting the tools and libraries it can use, wiring it up to a language model, and then monitoring its output and adjusting the setup as needed.</p>
<p>The whole work flow is shown below</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748714496376/cb38228b-b7e2-49a0-bdb7-81e68d0df2d2.png" alt class="image--center mx-auto" /></p>
<p>We will develop this application as a RESTful endpoint using FastAPI, a modern web framework for building APIs with Python. In a previous blog post, I outlined the process of creating and deploying a FastAPI endpoint <a target="_blank" href="https://path2ml.com/rag-chatbot-using-langchain-and-openai">RAG_CHATBOT</a>, which I recommend checking out for background information.</p>
<p>For handling HTTP requests, we will utilize the popular <code>requests</code> library, which simplifies the process of sending requests and receiving responses. To extract and parse news articles efficiently, we will employ the <code>newspaper</code> package, a powerful tool designed for web scraping and article extraction.</p>
<p>Once we retrieve the articles, we will store them in DeepLake, a vector database optimized for managing embeddings. This method allows us to organize and access the articles in a format that enhances retrieval and analysis, as demonstrated in the accompanying workflow.</p>
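<p>As a mental model of what the vector database does at query time, here is a toy nearest-neighbour search over made-up 3-dimensional "embeddings". Real embeddings such as <code>text-embedding-ada-002</code> have 1536 dimensions, but the ranking logic is the same:</p>

```python
import math

# Each stored document is represented by an embedding vector; a query is
# answered by returning the k documents whose embeddings have the highest
# cosine similarity to the query embedding. Vectors below are invented.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

store = {
    "article about AI regulation": [0.9, 0.1, 0.0],
    "article about speech models": [0.1, 0.9, 0.1],
    "article about AI scams":      [0.7, 0.2, 0.3],
}

def retrieve(query_vec, k=2):
    ranked = sorted(store, key=lambda doc: cosine(store[doc], query_vec), reverse=True)
    return ranked[:k]

print(retrieve([1.0, 0.0, 0.1]))  # regulation-themed articles rank first
```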
<p>Load the libraries</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI
<span class="hljs-keyword">import</span> os
<span class="hljs-comment">#from langchain.embeddings.openai import OpenAIEmbeddings</span>
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAIEmbeddings
<span class="hljs-keyword">from</span> langchain_deeplake.vectorstores <span class="hljs-keyword">import</span> DeeplakeVectorStore
<span class="hljs-keyword">from</span> langchain.text_splitter <span class="hljs-keyword">import</span> CharacterTextSplitter, RecursiveCharacterTextSplitter

<span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> SeleniumURLLoader
<span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> WebBaseLoader
<span class="hljs-keyword">from</span> langchain_core.prompts <span class="hljs-keyword">import</span> PromptTemplate
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span>  newspaper <span class="hljs-keyword">import</span> Article
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> langchain_core.tools <span class="hljs-keyword">import</span> Tool
<span class="hljs-keyword">from</span> langchain_experimental.plan_and_execute <span class="hljs-keyword">import</span> PlanAndExecute, load_agent_executor, load_chat_planner
</code></pre>
<h3 id="heading-save-the-openai-and-deeplake-keys">Save the OpenAI and DeepLake keys</h3>
<pre><code class="lang-python">
os.environ[<span class="hljs-string">"OPENAI_API_KEY"</span>] = <span class="hljs-string">'Key_Here'</span>
os.environ[<span class="hljs-string">"ACTIVELOOP_TOKEN"</span>] = <span class="hljs-string">'Key_Here'</span>
</code></pre>
<p>After completing the initial setup, the next step is to create the foundational structure for your FastAPI application. This involves establishing the basic framework that will support your API's functionality, including defining the directory structure, setting up configuration files, and initializing the main application instance. This skeleton will serve as the groundwork for implementing endpoints, integrating middleware, and managing dependencies as you develop your project further.</p>
<h3 id="heading-fastapi">FastAPI</h3>
<pre><code class="lang-python">app=FastAPI(
    title=<span class="hljs-string">"Langchain Server"</span>,
    version=<span class="hljs-string">"1.0"</span>,
    description=<span class="hljs-string">"A simple API Server"</span>

)
</code></pre>
<p>Next we will flesh out the function below, in which the agent calls the LLM and runs the plan-and-execute pattern.</p>
<pre><code class="lang-python"><span class="hljs-meta">@app.get("/chat/")</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">root</span>(<span class="hljs-params">query:str</span>):</span>
     <span class="hljs-keyword">return</span> {<span class="hljs-string">"response"</span>: response}
</code></pre>
<h3 id="heading-custom-tool">Custom Tool</h3>
<p>We will develop a function that defines our custom tool within an agentic AI framework, utilizing Langchain. This function will be designed to efficiently retrieve relevant documents from a Deep Lake database. The database will contain previously fetched, parsed, and saved documents, enabling the AI to access and deliver pertinent information based on user queries or tasks. This process will enhance our AI's ability to provide accurate and contextually relevant responses by leveraging the structured data stored in Deep Lake.</p>
<pre><code class="lang-python">
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">retrieve_n_docs_tool</span>(<span class="hljs-params">query: str,</span>) -&gt; str:</span>
    <span class="hljs-string">"""Searches for relevant documents that may contain the answer to the query."""</span>
    embeddings=OpenAIEmbeddings(model=<span class="hljs-string">'text-embedding-ada-002'</span>)
    db = DeeplakeVectorStore(dataset_path=<span class="hljs-string">"./my_deeplake/"</span>, embedding_function=embeddings, overwrite=<span class="hljs-literal">False</span>) <span class="hljs-comment"># overwrite must be False here, otherwise the tool would wipe the stored articles</span>
    <span class="hljs-comment"># Get the retriever object from the deep lake db object and set the number</span>
    <span class="hljs-comment"># of retrieved documents to 3</span>
    retriever = db.as_retriever()
    retriever.search_kwargs[<span class="hljs-string">'k'</span>] = <span class="hljs-number">3</span>
    <span class="hljs-comment"># We define some variables that will be used inside our custom tool</span>
    CUSTOM_TOOL_DOCS_SEPARATOR =<span class="hljs-string">"\n---------------\n"</span> <span class="hljs-comment"># how to join together the retrieved docs to form a single string</span>
    docs = retriever.get_relevant_documents(query)
    texts = [doc.page_content <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> docs]
    texts_merged = <span class="hljs-string">"---------------\n"</span> + CUSTOM_TOOL_DOCS_SEPARATOR.join(texts) + <span class="hljs-string">"\n---------------"</span>
    <span class="hljs-keyword">return</span> texts_merged
</code></pre>
<p>In the function @app.get("/chat/"), we will implement the code necessary to retrieve content from specified URLs and subsequently store this data in the DeepLake database. To do this, we will first fetch the content from each URL. Next, we will utilize the RecursiveCharacterTextSplitter to divide the retrieved text into manageable chunks. These smaller segments will allow for easier processing and analysis. Finally, we will generate embeddings for each chunk and save them securely in the DeepLake database for future retrieval and use.</p>
<pre><code class="lang-python"> headers = {
        <span class="hljs-string">'User-Agent'</span>: <span class="hljs-string">'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'</span>
    }   

    article_urls = [
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/23/meta-open-source-speech-ai-models-support-over-1100-languages/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/18/beijing-launches-campaign-against-ai-generated-misinformation/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/16/openai-ceo-ai-regulation-is-essential/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/15/jay-migliaccio-ibm-watson-on-leveraging-ai-to-improve-productivity/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/15/iurii-milovanov-softserve-how-ai-ml-is-helping-boost-innovation-and-personalisation/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/11/ai-and-big-data-expo-north-america-begins-in-less-than-one-week/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/11/eu-committees-green-light-ai-act/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/09/wozniak-warns-ai-will-power-next-gen-scams/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/09/infocepts-ceo-shashank-garg-on-the-da-market-shifts-and-impact-of-ai-on-data-analytics/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/05/02/ai-godfather-warns-dangers-and-quits-google/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/04/28/palantir-demos-how-ai-can-used-military/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/04/26/ftc-chairwoman-no-ai-exemption-to-existing-laws/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/04/24/bill-gates-ai-teaching-kids-literacy-within-18-months/"</span>,
        <span class="hljs-string">"https://www.artificialintelligence-news.com/2023/04/21/google-creates-new-ai-division-to-challenge-openai/"</span>
    ]

    session=requests.Session()
    pages_content = [] <span class="hljs-comment"># where we save the scraped articles</span>
    <span class="hljs-keyword">for</span> url <span class="hljs-keyword">in</span> article_urls:
        <span class="hljs-keyword">try</span>:
            time.sleep(<span class="hljs-number">2</span>) <span class="hljs-comment"># sleep two seconds for gentle scraping</span>
            response = session.get(url, headers=headers, timeout=<span class="hljs-number">10</span>)

            <span class="hljs-keyword">if</span> response.status_code == <span class="hljs-number">200</span>:
                article = Article(url)
                article.download() <span class="hljs-comment"># download HTML of webpage</span>
                article.parse() <span class="hljs-comment"># parse HTML to extract the article text</span>
                pages_content.append({ <span class="hljs-string">"url"</span>: url, <span class="hljs-string">"text"</span>: article.text })
            <span class="hljs-keyword">else</span>:
                print(<span class="hljs-string">f"Failed to fetch article at <span class="hljs-subst">{url}</span>"</span>)
        <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
            print(<span class="hljs-string">f"Error occurred while fetching article at <span class="hljs-subst">{url}</span>: <span class="hljs-subst">{e}</span>"</span>)

    <span class="hljs-comment">#If an error occurs while fetching an article, we catch the exception and print</span>
    <span class="hljs-comment">#an error message. This ensures that even if one article fails to download,</span>
    <span class="hljs-comment">#the rest of the articles can still be processed.</span>


    embeddings=OpenAIEmbeddings(model=<span class="hljs-string">'text-embedding-ada-002'</span>)

    db = DeeplakeVectorStore(dataset_path=<span class="hljs-string">"./my_deeplake/"</span>, embedding_function=embeddings, overwrite=<span class="hljs-literal">True</span>)

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=<span class="hljs-number">1000</span>, chunk_overlap=<span class="hljs-number">100</span>)
    all_texts = []
    <span class="hljs-keyword">for</span> d <span class="hljs-keyword">in</span> pages_content:
            chunks = text_splitter.split_text(d[<span class="hljs-string">"text"</span>])
            <span class="hljs-keyword">for</span> chunk <span class="hljs-keyword">in</span> chunks:
                all_texts.append(chunk)

    ids = db.add_texts(all_texts)
</code></pre>
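<p>To make the chunking step concrete, here is a minimal pure-Python sketch of fixed-size splitting with overlap. The real <code>RecursiveCharacterTextSplitter</code> is smarter, preferring to break on paragraph and sentence boundaries, but the sliding-window idea is the same:</p>

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters across the text, stepping by
    # (chunk_size - chunk_overlap) so that consecutive chunks share
    # chunk_overlap characters of context. Assumes chunk_overlap < chunk_size.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_text("a" * 2500, chunk_size=1000, chunk_overlap=100)
print(len(chunks))  # 3 chunks, starting at offsets 0, 900, and 1800
```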
<p>We are developing a tool within the Langchain framework that leverages the "retrieve_n_docs_tool" function. This function is designed to access and extract content stored in the DeepLake database, enabling us to efficiently retrieve and utilize relevant information as needed. By implementing this tool, we aim to streamline the process of accessing our data, ensuring that users can quickly obtain the necessary documents for their tasks.</p>
<pre><code class="lang-python">   tools = [
        Tool(
            name=<span class="hljs-string">"Search Private Docs"</span>,
            func=retrieve_n_docs_tool,
            description=<span class="hljs-string">"useful for when you need to answer questions about current events about Artificial Intelligence"</span>
        )
    ]
</code></pre>
<p>We will create a planning agent and an execution agent tailored to our specific dataset. This involves configuring the agents to effectively interpret and process the data, ensuring they can execute tasks efficiently and accurately.</p>
<pre><code class="lang-python">    model = ChatOpenAI(model_name=<span class="hljs-string">"gpt-3.5-turbo"</span>, temperature=<span class="hljs-number">0</span>)

    planner = load_chat_planner(model)
    executor = load_agent_executor(model, tools, verbose=<span class="hljs-literal">True</span>)
    agent = PlanAndExecute(planner=planner, executor=executor, verbose=<span class="hljs-literal">True</span>)
    response = agent.run(query)
</code></pre>
<h3 id="heading-in-browser-type-url-1270018000chatqueryhttp1270018000chatquerywrite-an-overview-of-artificial-intelligence-regulations-by-governments-by-country"><strong>In browser type url</strong> <a target="_blank" href="http://127.0.0.1:8000/chat/?query="><strong>127.0.0.1:8000/chat/?query=</strong></a><strong>”Write an overview of Artificial Intelligence regulations by governments by country”</strong></h3>
<p>Plan gets created with multiple steps by calling LLM</p>
<pre><code class="lang-python">
&gt; Entering new PlanAndExecute chain...
steps=[Step(value=<span class="hljs-string">'Research and gather information on Artificial Intelligence regulations 
by governments in different countries.'</span>), Step(value=<span class="hljs-string">'Organize the information by country, 
including details such as key regulations, policies, and guidelines related to Artificial 
Intelligence.'</span>), Step(value=<span class="hljs-string">'Summarize the regulations for each country in a concise 
manner.'</span>), Step(value=<span class="hljs-string">'Include any recent updates or developments in the field of 
Artificial Intelligence regulations.'</span>), Step(value=<span class="hljs-string">'Provide a comparison of the regulations
 across different countries, highlighting similarities and differences.'</span>), 
Step(value=<span class="hljs-string">'Check for any official government sources or reputable publications to verify
 the accuracy of the information.'</span>), Step(value=<span class="hljs-string">'Compile the overview in a clear and 
structured format for easy understanding.'</span>), Step(value=<span class="hljs-string">"Review the overview to ensure it is
 comprehensive and up-to-date.\nGiven the above steps taken, please respond to the user's 
original question. \n"</span>)]
</code></pre>
<p>Each of these steps is then executed by the executor agent, using the custom tool function we wrote</p>
<pre><code class="lang-python">*****

Step: Research <span class="hljs-keyword">and</span> gather information on Artificial Intelligence regulations by governments <span class="hljs-keyword">in</span> different countries.

Response: I will now search the private documents <span class="hljs-keyword">for</span> information on Artificial Intelligence regulations by governments <span class="hljs-keyword">in</span> different countries to assist the user <span class="hljs-keyword">with</span> their research objective.

&gt; Entering new AgentExecutor chain...
Thought: The user needs assistance <span class="hljs-keyword">in</span> organizing information on Artificial Intelligence regulations by country. I can help by searching the private documents <span class="hljs-keyword">for</span> relevant details on key regulations, policies, <span class="hljs-keyword">and</span> guidelines related to Artificial Intelligence <span class="hljs-keyword">in</span> different countries.

Action:
```
{
  <span class="hljs-string">"action"</span>: <span class="hljs-string">"Search Private Docs"</span>,
  <span class="hljs-string">"action_input"</span>: {<span class="hljs-string">"type"</span>: <span class="hljs-string">"Artificial Intelligence regulations by country"</span>}
}
```
</code></pre>
<p>The agent will systematically compile a comprehensive overview of artificial intelligence regulations by analyzing and synthesizing a variety of documents stored in the DeepLake database. This process will involve multiple iterations to ensure that all relevant information is captured and accurately represented, drawing from diverse sources to provide a well-rounded understanding of the current regulatory landscape surrounding AI technology.</p>
]]></content:encoded></item><item><title><![CDATA[Implementing XGBoost using Scikit]]></title><description><![CDATA[XGBoost, which stands for Extreme Gradient Boosting, is an advanced machine learning algorithm that is widely used for regression, classification, and ranking tasks. It is particularly known for its speed and performance, making it a popular choice i...]]></description><link>https://path2ml.com/implementing-xgboost-using-scikit</link><guid isPermaLink="true">https://path2ml.com/implementing-xgboost-using-scikit</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Xgboost]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[decisiontree]]></category><category><![CDATA[MachineLearning]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Wed, 28 May 2025 21:59:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748391930218/d211ea49-f283-499e-8bc5-b778cc7ea3be.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>XGBoost, which stands for Extreme Gradient Boosting, is an advanced machine learning algorithm that is widely used for regression, classification, and ranking tasks. It is particularly known for its speed and performance, making it a popular choice in data science competitions and practical applications.</p>
<p>At its core, XGBoost is based on the concept of boosting, which is an ensemble learning technique. This approach combines the predictions from multiple weak learners, typically decision trees, to create a strong predictive model. The primary idea behind boosting is to focus on the instances that previous models misclassified, thereby sequentially improving the model's performance.</p>
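<p>The residual-fitting loop at the heart of boosting can be illustrated with a deliberately stripped-down sketch: each round fits a depth-1 "stump" to the residuals of the current ensemble and adds a damped copy of its prediction. XGBoost follows the same loop with full regularized trees, second-order gradients, and heavy engineering on top; the tiny dataset and stump learner below are invented purely for illustration:</p>

```python
# Minimal gradient-boosting sketch for squared-error regression.

def fit_stump(x, residuals):
    # Try every midpoint between sorted x values as a split threshold
    # and keep the one that minimizes the sum of squared errors.
    best = None
    for i in range(1, len(x)):
        t = (x[i - 1] + x[i]) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_rounds=50, lr=0.3):
    # Start from a zero prediction; each round fits a stump to the
    # current residuals and adds a learning-rate-damped correction.
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        pred = [pi + lr * stump(xi) for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.8, 4.1, 4.0]  # a step-shaped target
pred = boost(x, y)
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
print(round(mse, 4))  # small: the ensemble fits the step closely
```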
<h3 id="heading-key-features-of-xgboost">Key Features of XGBoost:</h3>
<ol>
<li><p><strong>Regularization</strong>: XGBoost includes a regularization term in its objective function, which helps to prevent overfitting. This feature differentiates it from many other boosting algorithms, as it adds both L1 (Lasso) and L2 (Ridge) penalties. This allows for more flexibility in managing model complexity and improves generalization on unseen data.</p>
</li>
<li><p><strong>Handling Missing Values</strong>: One of the standout features of XGBoost is its ability to handle missing data internally. It does this by learning the best direction to handle missing values during training, making it robust against incomplete datasets.</p>
</li>
<li><p><strong>Parallel Processing</strong>: Unlike traditional gradient boosting algorithms that build trees sequentially, XGBoost leverages parallel processing to speed up the training process. It does this by building trees one level at a time, allowing the algorithm to construct trees much more quickly than its predecessors.</p>
</li>
<li><p><strong>Tree Pruning</strong>: Instead of the standard pre-pruning method used in decision trees, XGBoost employs maximum depth for tree construction and then prunes the trees backwards (post-pruning). This helps to optimize the tree structure and improve overall performance.</p>
</li>
<li><p><strong>Scalability</strong>: XGBoost is designed to be highly scalable. It can handle large datasets and can be run on distributed systems, making it suitable for modern data processing needs.</p>
</li>
</ol>
<h3 id="heading-usage">Usage:</h3>
<p>To use XGBoost, data scientists typically follow these steps:</p>
<ol>
<li><p><strong>Data Preparation</strong>: Clean and preprocess the dataset, addressing missing values and converting categorical variables as necessary.</p>
</li>
<li><p><strong>Model Configuration</strong>: Set parameters for the XGBoost model. This includes specifying the learning rate, the number of trees to create, maximum depth, regularization parameters, and evaluation metrics.</p>
</li>
<li><p><strong>Training</strong>: Train the model on the training dataset while monitoring performance on a validation dataset to avoid overfitting.</p>
</li>
<li><p><strong>Prediction</strong>: After training, the model is used to make predictions on new data.</p>
</li>
<li><p><strong>Evaluation</strong>: Finally, the model’s predictions are evaluated using appropriate metrics (e.g., accuracy, RMSE, F1 score).</p>
</li>
</ol>
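<p>The five steps above can be sketched end to end. The example below is a stand-in using scikit-learn's <code>GradientBoostingClassifier</code> on a synthetic dataset; with <code>xgboost</code> installed, <code>XGBClassifier</code> exposes the same <code>fit</code>/<code>predict</code> interface, as we will see with the Adult data shortly:</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Data preparation: a synthetic binary-classification dataset
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Model configuration: learning rate, number of trees, tree depth
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100,
                                 max_depth=3, random_state=42)

# 3. Training
clf.fit(X_train, y_train)

# 4. Prediction
y_pred = clf.predict(X_test)

# 5. Evaluation
print(f"accuracy: {accuracy_score(y_test, y_pred):.3f}")
```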
<p>XGBoost has become a go-to algorithm due to its impressive performance across a variety of tasks, and its ability to produce predictive models that are both accurate and efficient. Given its flexibility and robustness, it has gained immense popularity in the machine learning community, making it a critical tool for practitioners.</p>
<h3 id="heading-adult-data-set">Adult Data set</h3>
<p>The Adult dataset, frequently referred to as the Census Income dataset, is a widely recognized collection of data utilized for binary classification tasks in machine learning. This dataset contains information from the U.S. Census and includes various attributes such as age, education level, occupation, and marital status, among others. <strong>The primary objective when working with this dataset is to predict whether an individual's income exceeds $50,000 per year based on these features.</strong> Its rich variety of demographic information and clear binary target variable make it an excellent resource for testing algorithms and exploring concepts in classification and predictive modeling.</p>
<p>Let's start by importing the libraries we will use:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> sklearn.datasets <span class="hljs-keyword">import</span> fetch_openml
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> GridSearchCV, train_test_split
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> OrdinalEncoder, LabelEncoder
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
<span class="hljs-keyword">from</span> xgboost <span class="hljs-keyword">import</span> XGBClassifier
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> Counter
</code></pre>
<p>We will obtain the Adult dataset by utilizing the <code>fetch_openml</code> function from the scikit-learn library. This function allows us to easily download and load the dataset from OpenML, a platform that hosts various machine learning datasets. By using this method, we can access the data in a structured format, making it convenient for further analysis and modeling tasks.</p>
<h2 id="heading-goal">Goal</h2>
<p>Predicting whether an individual’s income exceeds $50,000 per year.</p>
<h2 id="heading-load-the-adult-dataset">Load the Adult dataset</h2>
<pre><code class="lang-python">adult = fetch_openml(<span class="hljs-string">'adult'</span>, as_frame=<span class="hljs-literal">True</span>)
X, y = adult.data, adult.target
</code></pre>
<p>Let's print the shape and contents of the loaded data.</p>
<h3 id="heading-print-key-information-about-the-dataset">Print key information about the dataset</h3>
<pre><code class="lang-python">
print(<span class="hljs-string">f"Dataset shape: <span class="hljs-subst">{X.shape}</span>"</span>)
print(<span class="hljs-string">f"Features: <span class="hljs-subst">{adult.feature_names}</span>"</span>)
print(<span class="hljs-string">f"Target variable: <span class="hljs-subst">{adult.target_names}</span>"</span>)
print(<span class="hljs-string">f"Class distributions: <span class="hljs-subst">{Counter(y)}</span>"</span>)
</code></pre>
<pre><code class="lang-python">Dataset shape: (<span class="hljs-number">48842</span>, <span class="hljs-number">14</span>)
Features: [<span class="hljs-string">'age'</span>, <span class="hljs-string">'workclass'</span>, <span class="hljs-string">'fnlwgt'</span>, <span class="hljs-string">'education'</span>, <span class="hljs-string">'education-num'</span>, 
<span class="hljs-string">'marital-status'</span>, <span class="hljs-string">'occupation'</span>, <span class="hljs-string">'relationship'</span>, <span class="hljs-string">'race'</span>, <span class="hljs-string">'sex'</span>, <span class="hljs-string">'capitalgain'</span>, <span class="hljs-string">'capitalloss'</span>,
 <span class="hljs-string">'hoursperweek'</span>, <span class="hljs-string">'native-country'</span>]
Target variable: [<span class="hljs-string">'class'</span>]
Class distributions: Counter({<span class="hljs-string">'&lt;=50K'</span>: <span class="hljs-number">37155</span>, <span class="hljs-string">'&gt;50K'</span>: <span class="hljs-number">11687</span>})
</code></pre>
<p>Let's look at some of the data.</p>
<pre><code class="lang-python">X.head(<span class="hljs-number">5</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td><strong>age</strong></td><td><strong>workclass</strong></td><td><strong>fnlwgt</strong></td><td><strong>education</strong></td><td><strong>education-num</strong></td><td><strong>marital-status</strong></td><td><strong>occupation</strong></td><td><strong>relationship</strong></td><td><strong>race</strong></td><td><strong>sex</strong></td><td><strong>capitalgain</strong></td><td><strong>capitalloss</strong></td><td><strong>hoursperweek</strong></td><td><strong>native-country</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>0</strong></td><td>2</td><td>State-gov</td><td>77516</td><td>Bachelors</td><td>13</td><td>Never-married</td><td>Adm-clerical</td><td>Not-in-family</td><td>White</td><td>Male</td><td>1</td><td>0</td><td>2</td><td>United-States</td></tr>
<tr>
<td><strong>1</strong></td><td>3</td><td>Self-emp-not-inc</td><td>83311</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Exec-managerial</td><td>Husband</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>0</td><td>United-States</td></tr>
<tr>
<td><strong>2</strong></td><td>2</td><td>Private</td><td>215646</td><td>HS-grad</td><td>9</td><td>Divorced</td><td>Handlers-cleaners</td><td>Not-in-family</td><td>White</td><td>Male</td><td>0</td><td>0</td><td>2</td><td>United-States</td></tr>
<tr>
<td><strong>3</strong></td><td>3</td><td>Private</td><td>234721</td><td>11th</td><td>7</td><td>Married-civ-spouse</td><td>Handlers-cleaners</td><td>Husband</td><td>Black</td><td>Male</td><td>0</td><td>0</td><td>2</td><td>United-States</td></tr>
<tr>
<td><strong>4</strong></td><td>1</td><td>Private</td><td>338409</td><td>Bachelors</td><td>13</td><td>Married-civ-spouse</td><td>Prof-specialty</td><td>Wife</td><td>Black</td><td>Female</td><td>0</td><td>0</td><td>2</td><td>Cuba</td></tr>
</tbody>
</table>
</div><p>We will utilize the Scikit-learn library to transform categorical features into integer codes. This process involves using techniques such as label encoding or one-hot encoding, which allow us to convert string values representing categories into numerical formats. This transformation is crucial for machine learning models, as they typically perform better with numerical data. By encoding these categorical variables, we ensure that our models can effectively interpret and learn from the input data.</p>
<p>From the above data, we can see that the following columns are categorical:</p>
<pre><code class="lang-python">nominal = [<span class="hljs-string">'workclass'</span>, <span class="hljs-string">'education'</span>, <span class="hljs-string">'marital-status'</span>, <span class="hljs-string">'occupation'</span>, <span class="hljs-string">'relationship'</span>,
 <span class="hljs-string">'race'</span>, <span class="hljs-string">'sex'</span>, <span class="hljs-string">'native-country'</span>]
</code></pre>
<p>We will utilize the <code>ColumnTransformer</code> from the scikit-learn library to construct a data transformation pipeline. This pipeline will process the categorical columns with the <code>OrdinalEncoder</code>, which maps each category to an integer code. Note that these features are nominal, so the assigned codes carry no true ordering; tree-based models such as XGBoost are relatively insensitive to this, which makes ordinal encoding a pragmatic choice here. For the remaining columns in the dataset, we will apply the 'passthrough' option, retaining those features without any transformation. This approach tailors the preprocessing to the specific needs of both the categorical and numerical data in our dataset.</p>
<pre><code class="lang-python">transformer = ColumnTransformer(transformers=[(<span class="hljs-string">'ordinal'</span>, OrdinalEncoder(), nominal)],
 remainder=<span class="hljs-string">'passthrough'</span>)
</code></pre>
<h3 id="heading-perform-ordinal-encoding">Perform ordinal encoding</h3>
<pre><code class="lang-python">X = transformer.fit_transform(X)
</code></pre>
<h3 id="heading-labelencoder"><strong>LabelEncoder</strong></h3>
<p>The LabelEncoder is a utility in data preprocessing that transforms categorical target labels into a numerical format suitable for machine learning models. It converts each unique label into an integer value ranging from 0 to n_classes - 1, where n_classes represents the total number of distinct categories present in the target variable. This encoding technique is particularly useful for classification problems, as many machine learning algorithms require numerical input rather than categorical data.</p>
<p>When using the LabelEncoder, it is important to apply it exclusively to the target labels, not to the features. This ensures that the transformation accurately reflects the classes without altering the structure of the input data. The LabelEncoder can also assist in normalizing the labels, making them more manageable for algorithms that rely on numerical computations. By encoding the labels in this way, models can learn the underlying patterns in the data effectively, leading to better performance on predictive tasks.</p>
<pre><code class="lang-python">y = LabelEncoder().fit_transform(y)
</code></pre>
<h3 id="heading-we-will-split-the-data-into-train-and-test-sets">We will split the data into train and test sets</h3>
<pre><code class="lang-python">
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class="hljs-number">0.2</span>, random_state=<span class="hljs-number">42</span>, stratify=y)
</code></pre>
<p>In order to optimize the performance of an XGBoost model, it's essential to establish a detailed parameter grid that encompasses a variety of hyperparameters. This grid will allow for a systematic exploration of different configurations to identify the best combination for our specific dataset.</p>
<ol>
<li><p><strong>Learning Rate (eta)</strong>: This controls the contribution of each tree. Values typically range from 0.01 to 0.3.</p>
</li>
<li><p><strong>Maximum Depth (max_depth)</strong>: Defines the maximum depth of a tree in the ensemble. Common values are between 3 and 10.</p>
</li>
<li><p><strong>Subsample</strong>: This parameter represents the fraction of samples to be used for each tree. It usually takes values between 0.5 and 1.0.</p>
</li>
<li><p><strong>Colsample_bytree</strong>: The fraction of features to consider when building each tree, typically ranging from 0.3 to 1.</p>
</li>
<li><p><strong>Number of Estimators (n_estimators)</strong>: The number of trees to be created in the boosting process, commonly set between 100 and 1000.</p>
</li>
</ol>
<p>By methodically defining this parameter grid, we can employ techniques like grid search or random search to uncover the optimal hyperparameter settings that enhance the model's predictive capabilities.</p>
<pre><code class="lang-python">param_grid = {
    <span class="hljs-string">'max_depth'</span>: [<span class="hljs-number">3</span>, <span class="hljs-number">4</span>, <span class="hljs-number">5</span>],
    <span class="hljs-string">'learning_rate'</span>: [<span class="hljs-number">0.1</span>, <span class="hljs-number">0.01</span>, <span class="hljs-number">0.05</span>],
    <span class="hljs-string">'n_estimators'</span>: [<span class="hljs-number">50</span>, <span class="hljs-number">100</span>, <span class="hljs-number">200</span>],
    <span class="hljs-string">'subsample'</span>: [<span class="hljs-number">0.8</span>, <span class="hljs-number">1.0</span>],
    <span class="hljs-string">'colsample_bytree'</span>: [<span class="hljs-number">0.8</span>, <span class="hljs-number">1.0</span>]
}
</code></pre>
<p>Next, we create the <code>XGBClassifier</code>:</p>
<pre><code class="lang-python">model = XGBClassifier(objective=<span class="hljs-string">'binary:logistic'</span>, random_state=<span class="hljs-number">42</span>, n_jobs=<span class="hljs-number">1</span>)
</code></pre>
<p>The <strong>"binary:logistic"</strong> objective function in <strong>XGBoost</strong> is specifically designed for binary classification tasks, where the target variable consists of two distinct classes or outcomes. In this context, it focuses on predicting which of the two classes a given instance belongs to. The optimization process targets the log loss function, which measures the performance of a classification model whose output is a probability value between 0 and 1.</p>
<p>By utilizing the log loss function, this objective effectively quantifies how far off the predicted probabilities are from the actual class labels. This makes "binary:logistic" particularly suitable for applications where understanding the likelihood of an instance belonging to a specific class is crucial.</p>
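<p>To make the log loss concrete, here is a small hand-rolled version (a sketch for illustration; scikit-learn's <code>log_loss</code> computes the same quantity):</p>

```python
import math

# A hand-rolled binary log loss (cross-entropy); illustrative sketch only.
def log_loss(y_true, y_prob):
    eps = 1e-15  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct probabilities give a low loss...
print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 3))  # → 0.145
# ...while confident but wrong probabilities are punished heavily
print(round(log_loss([1, 0, 1], [0.1, 0.9, 0.2]), 3))  # → 2.072
```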
<h3 id="heading-perform-grid-search">Perform grid search</h3>
<p>To conduct a thorough optimization of our model's hyperparameters, we will utilize the GridSearchCV method from the <strong>scikit-learn</strong> library. This approach involves specifying a range of values for each hyperparameter in the <code>param_grid</code> dictionary. We will create an instance of GridSearchCV, passing in our model as the <code>estimator</code>, alongside the defined parameter grid. Additionally, we will set the <code>cv</code> parameter to 3 to implement three-fold cross-validation during the search process. To leverage all available CPU cores for efficiency, we will assign <code>n_jobs</code> a value of -1. After setting up the grid search configuration, we will fit the model on our training dataset, <code>X_train</code> and <code>y_train</code>, which will allow the algorithm to explore and identify the optimal combination of hyperparameter settings based on cross-validated performance.</p>
<pre><code class="lang-python">grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=<span class="hljs-number">3</span>, n_jobs=<span class="hljs-number">-1</span>)
grid_search.fit(X_train, y_train)
</code></pre>
<h3 id="heading-print-best-score-and-parameters">Print best score and parameters</h3>
<pre><code class="lang-python">
print(<span class="hljs-string">f"Best score: <span class="hljs-subst">{grid_search.best_score_:<span class="hljs-number">.3</span>f}</span>"</span>)
print(<span class="hljs-string">f"Best parameters: <span class="hljs-subst">{grid_search.best_params_}</span>"</span>)
</code></pre>
<pre><code class="lang-python">Best score: <span class="hljs-number">0.859</span>
Best parameters: {<span class="hljs-string">'colsample_bytree'</span>: <span class="hljs-number">1.0</span>, <span class="hljs-string">'learning_rate'</span>: <span class="hljs-number">0.1</span>, <span class="hljs-string">'max_depth'</span>: <span class="hljs-number">5</span>, 
<span class="hljs-string">'n_estimators'</span>: <span class="hljs-number">100</span>, <span class="hljs-string">'subsample'</span>: <span class="hljs-number">0.8</span>}
</code></pre>
<h3 id="heading-access-the-best-model-from-gridsearch">Access the best model from grid_search</h3>
<p>To obtain the optimal model from the grid search results, we can access the best estimator by referencing the <code>best_estimator_</code> attribute of the <code>grid_search</code> object. This attribute contains the model that achieved the highest performance based on the evaluation criteria set during the grid search process.</p>
<pre><code class="lang-python">
best_model = grid_search.best_estimator_
</code></pre>
<h3 id="heading-save-the-best-model">Save the best model</h3>
<pre><code class="lang-python">
best_model.save_model(<span class="hljs-string">'best_model_adult.ubj'</span>)
</code></pre>
<h3 id="heading-now-we-load-the-saved-model">Now we load the saved model</h3>
<pre><code class="lang-python">
loaded_model = XGBClassifier()
loaded_model.load_model(<span class="hljs-string">'best_model_adult.ubj'</span>)
</code></pre>
<p>To generate predictions using the trained model that has been previously loaded into memory, we will apply it to the test dataset. This is done by calling the <code>predict</code> method on the loaded model and passing in the features from the test set, denoted as <code>X_test</code>. The output will be a set of predictions based on the input data.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Use loaded model for predictions</span>
predictions = loaded_model.predict(X_test)
</code></pre>
<h3 id="heading-print-the-accuracy-score">Print the accuracy score</h3>
<p>To evaluate the performance of the model, we will calculate the accuracy score using the test dataset. The accuracy score is determined by comparing the predicted labels generated by the loaded model against the actual labels in the test set. We can achieve this by applying the <code>score</code> method on the <code>loaded_model</code>, passing in <code>X_test</code> as the input features and <code>y_test</code> as the corresponding true labels. This will yield a numerical value representing the proportion of correctly predicted instances in the test data.</p>
<pre><code class="lang-python">
accuracy = loaded_model.score(X_test, y_test)
print(<span class="hljs-string">f"Accuracy: <span class="hljs-subst">{accuracy:<span class="hljs-number">.3</span>f}</span>"</span>)
</code></pre>
<pre><code class="lang-python">Accuracy: <span class="hljs-number">0.862</span>
</code></pre>
<p>Pretty good accuracy.</p>
]]></content:encoded></item><item><title><![CDATA[Implementing Random Forest using Scikit learn]]></title><description><![CDATA[The Random Forest algorithm is an ensemble learning method primarily used for classification and regression tasks. It operates by constructing a multitude of decision trees during training and outputs the mode of the classes (for classification) or t...]]></description><link>https://path2ml.com/implementing-random-forest-using-scikit-learn</link><guid isPermaLink="true">https://path2ml.com/implementing-random-forest-using-scikit-learn</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Random Forest]]></category><category><![CDATA[scikit learn]]></category><category><![CDATA[Decision Tree]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[algorithms]]></category><dc:creator><![CDATA[Nitin Sharma]]></dc:creator><pubDate>Mon, 26 May 2025 13:49:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748218932369/8bed66d6-ece1-49a5-8d4b-1a5e12cf1326.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The <strong>Random Forest algorithm</strong> is an ensemble learning method primarily used for classification and regression tasks. It operates by constructing a multitude of decision trees during training and outputs the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.</p>
<p>The process begins by creating various subsets of the training data through a technique known as bootstrap sampling, where random samples (with replacement) are drawn from the dataset. For each of these subsets, a decision tree is built. Unlike a standard decision tree that considers all features when making splits, Random Forest introduces an additional layer of randomness by only selecting a random subset of features at each split, which helps to enhance the diversity among the trees.</p>
<p>This diversity among the trees reduces the risk of overfitting, which is a common problem in single decision trees. After all trees are constructed, the final output is determined through majority voting for classification tasks or averaging for regression tasks. Random Forest is valued for its robustness, high accuracy, and ability to handle large datasets with higher dimensionality while maintaining computational efficiency. Additionally, it provides insights into feature importance, allowing for better understanding and interpretability of the model's predictions. To better understand Random Forest, we first need to understand the <strong>Decision Tree algorithm</strong>.</p>
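<p>The bootstrap-plus-voting idea described above can be sketched by hand. This is an illustrative toy example (the data and tree count are assumptions); scikit-learn's <code>RandomForestClassifier</code> packages all of this, plus per-split feature sampling, in one estimator:</p>

```python
# Random Forest by hand: many trees, each fit on a bootstrap sample,
# combined by majority vote. Toy data only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

n_trees = 25
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features='sqrt' adds the per-split feature randomness
    tree = DecisionTreeClassifier(max_features='sqrt', random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Majority vote across the ensemble
votes = np.stack([t.predict(X) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Training accuracy:", (ensemble_pred == y).mean())
```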
<h2 id="heading-decision-tree-algorithm-of-machine-learning">Decision Tree algorithm of Machine Learning</h2>
<p>The Decision Tree algorithm is a popular machine learning method used for classification and regression tasks. It models decisions in a tree-like structure, where each internal node represents a feature (or attribute), each branch corresponds to a decision rule, and each leaf node represents an outcome (or class label). The decision tree is constructed through an iterative process where the goal is to partition the input space in a way that maximizes the homogeneity of the resulting subsets.</p>
<p>Key concepts involved in building a decision tree include:</p>
<ol>
<li><p><strong>Entropy</strong>: Entropy is a measure of the disorder or uncertainty in a set of data. In the context of decision trees, it quantifies the impurity or randomness of the class labels in a dataset. The formula for the entropy \(H(S)\) of a set \(S\) with class labels is given by:</p>
</li>
<li><p>\( H(S) = - \sum_{i=1}^{c} p_i \log_2 p_i \)</p>
<p> where \(p_i\) is the proportion of class \(i\) in the dataset and \(c\) is the total number of classes. Lower entropy indicates that the data is more pure and homogeneous, while higher entropy signifies more mixed data.</p>
</li>
<li><p><strong>Information Gain</strong>: Information Gain measures the reduction in entropy after a dataset is split based on a particular feature. It helps to identify which feature best separates the classes. The Information Gain \(IG(S, A)\) for a feature \(A\) is calculated as:</p>
<p> entropy(parent) – [average entropy(children)]</p>
<p> \(    IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)    \)</p>
<p> where \(S_v\) is the subset of \(S\) where feature \(A\) takes on value \(v\). The feature with the highest Information Gain is chosen for the split, as it provides the clearest separation of classes.</p>
</li>
<li><p><strong>Gini Impurity</strong>: Gini Impurity is an alternative metric for measuring the quality of a split in a decision tree. It assesses how often a randomly chosen element would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. The Gini Impurity \(Gini(S)\) for a set \(S\) is calculated as:</p>
<p> \(Gini(S) = 1 - \sum_{i=1}^{c} p_i^2 \)</p>
<p> Like entropy, a lower Gini Impurity indicates a more homogeneous subset. Decision trees can be constructed using Gini Impurity as the criterion for splitting nodes, typically yielding faster results compared to entropy.</p>
</li>
</ol>
<p>Decision Trees utilize Entropy or Gini Impurity as criteria to decide on the best features to split the dataset, aiming to create a model that accurately represents the underlying patterns of the data while fostering interpretability and ease of use.</p>
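<p>In scikit-learn, choosing between the two split criteria is a single parameter. A small sketch on synthetic data (the dataset here is illustrative):</p>

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# The `criterion` parameter selects the split-quality measure
for criterion in ('gini', 'entropy'):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    tree.fit(X, y)
    print(criterion, round(tree.score(X, y), 3))
```

<p>Both criteria usually produce similar trees; Gini is slightly cheaper to compute since it avoids the logarithm.</p>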
<h3 id="heading-step-by-step-calculation-of-information-gain">Step-by-Step Calculation of Information Gain</h3>
<ol>
<li><p><strong>Calculate the Entropy of the Whole Dataset</strong>: Let's say we have a dataset of 10 instances with the following classes: 6 positive instances (Yes) and 4 negative instances (No). The formula for entropy (H) is given by:</p>
<p> \( H(S) = - \sum (p_i \cdot \log_2(p_i)) \)</p>
<p> Here, \( p_i \) is the proportion of each class in the dataset.</p>
<ul>
<li><p>For our dataset:</p>
<ul>
<li><p>Proportion of Yes: \(p_Y = 6/10 = 0.6\)</p>
</li>
<li><p>Proportion of No: \(p_N = 4/10 = 0.4\)</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>    Substituting these values into the entropy formula:</p>
<p>    \(H(S) = - (0.6 \cdot \log_2(0.6) + 0.4 \cdot \log_2(0.4)) \approx 0.971 \)</p>
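<p>This entropy calculation can be checked with a few lines of Python (a quick sketch, not part of the original walkthrough):</p>

```python
import math

def entropy(proportions):
    """Shannon entropy in bits for a list of class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)

# 6 Yes and 4 No instances out of 10
print(round(entropy([0.6, 0.4]), 3))  # → 0.971
```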
<p><strong>Split the Dataset based on a Feature</strong>: Suppose we have a feature called "Weather" with three possible outcomes: Sunny, Rainy, and Overcast.</p>
<ul>
<li><p>Let's say it splits our dataset into:</p>
<ul>
<li><p>Sunny: 3 Yes, 1 No (4 instances)</p>
</li>
<li><p>Rainy: 2 Yes, 2 No (4 instances)</p>
</li>
<li><p>Overcast: 1 Yes, 1 No (2 instances)</p>
</li>
</ul>
</li>
</ul>
<ol>
<li><p><strong>Calculate the Entropy for Each Subset</strong>: For each subset, we calculate the entropy.</p>
<ul>
<li><p><strong>Sunny</strong>:</p>
<ul>
<li><p>Proportion of Yes: \(p_Y = 3/4 = 0.75\)</p>
</li>
<li><p>Proportion of No: \(p_N = 1/4 = 0.25\)</p>
<p>  \(  H(Sunny) \approx - (0.75 \cdot \log_2(0.75) + 0.25 \cdot \log_2(0.25)) \approx 0.811 \)</p>
</li>
</ul>
</li>
<li><p><strong>Rainy</strong>:</p>
<ul>
<li><p>Proportion of Yes: \(p_Y = 2/4 = 0.5\)</p>
</li>
<li><p>Proportion of No: \(p_N = 2/4 = 0.5\)</p>
</li>
</ul>
</li>
</ul>
</li>
</ol>
<p>    \(H(Rainy) \approx - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1.0 \)</p>
<ul>
<li><p><strong>Overcast</strong>:</p>
<ul>
<li><p>Proportion of Yes: \(p_Y = 1/2 = 0.5\)</p>
</li>
<li><p>Proportion of No: \(p_N = 1/2 = 0.5\)</p>
<p>  \( H(Overcast) \approx 1.0 \)</p>
</li>
</ul>
</li>
</ul>
<ol start="2">
<li><p><strong>Calculate the Weighted Average Entropy of Subsets</strong>: Now, we need to find the weighted average entropy based on the size of each subset:</p>
<p> \(H(Feature) = \frac{4}{10} \cdot H(Sunny) + \frac{4}{10} \cdot H(Rainy) + \frac{2}{10} \cdot H(Overcast) = \frac{4}{10} \cdot 0.811 + \frac{4}{10} \cdot 1.0 + \frac{2}{10} \cdot 1.0 \approx 0.925\)</p>
</li>
<li><p><strong>Calculate Information Gain</strong>: Finally, we compute the Information Gain by subtracting the weighted average entropy of the feature from the original entropy:</p>
<p> \(IG = H(S) - H(Feature) = 0.971 - 0.925 \approx 0.046 \)</p>
</li>
</ol>
<p>The Information Gain tells us how much information about the classification of the dataset is provided by the "Weather" feature. In this case, the Information Gain of approximately 0.046 indicates that the "Weather" feature does provide some useful information, making it a candidate for splitting the dataset in decision tree algorithms. The IG is calculated for every feature, and the feature with the highest gain is chosen as the root of the tree.</p>
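<p>The whole calculation can be reproduced in a few lines of Python (a quick sketch to verify the arithmetic):</p>

```python
import math

def entropy(counts):
    """Shannon entropy in bits from raw class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

H_S = entropy([6, 4])                  # whole dataset: 6 Yes, 4 No
subsets = [[3, 1], [2, 2], [1, 1]]     # Sunny, Rainy, Overcast
n = sum(sum(s) for s in subsets)       # 10 instances in total

# Weighted average entropy of the subsets after the split
H_feature = sum(sum(s) / n * entropy(s) for s in subsets)
print(round(H_S, 3))               # → 0.971
print(round(H_feature, 3))         # → 0.925
print(round(H_S - H_feature, 3))   # → 0.046
```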
<p>Now let's implement <strong>Random Forest</strong> on a sample dataset using the scikit-learn library.</p>
<h2 id="heading-problem-domain">Problem Domain</h2>
<p>To address the challenge of predicting whether the price of a New York City Airbnb listing will be above or below the average price, we utilize a tabular dataset containing information about various Airbnb listings in the city. The dataset is available to download from</p>
<p><a target="_blank" href="https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data">https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data</a></p>
<p>Or it can be loaded from</p>
<pre><code class="lang-python">https://raw.githubusercontent.com/lmassaron/tabular_datasets/master/AB_NYC_2019.csv
</code></pre>
<h2 id="heading-lets-load-all-libraries">Let's load all the libraries</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> sklearn.preprocessing <span class="hljs-keyword">import</span> OneHotEncoder, OrdinalEncoder
<span class="hljs-keyword">from</span> sklearn.compose <span class="hljs-keyword">import</span> ColumnTransformer
<span class="hljs-keyword">from</span> sklearn.impute <span class="hljs-keyword">import</span> SimpleImputer
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> BaggingClassifier
<span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
<span class="hljs-keyword">from</span> sklearn.pipeline <span class="hljs-keyword">import</span> Pipeline
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> make_scorer, accuracy_score
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> KFold, cross_validate
<span class="hljs-keyword">from</span> sklearn.ensemble <span class="hljs-keyword">import</span> RandomForestClassifier
</code></pre>
<p>Now let's load the data and take a look at it.</p>
<pre><code class="lang-python">data = pd.read_csv(<span class="hljs-string">"https://raw.githubusercontent.com/lmassaron/tabular_datasets/master/AB_NYC_2019.csv"</span>)
</code></pre>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>index</strong></td><td><strong>id</strong></td><td><strong>name</strong></td><td><strong>host_id</strong></td><td><strong>host_name</strong></td><td><strong>neighbourhood_group</strong></td><td><strong>neighbourhood</strong></td><td><strong>latitude</strong></td><td><strong>longitude</strong></td><td><strong>room_type</strong></td><td><strong>price</strong></td><td><strong>minimum_nights</strong></td><td><strong>number_of_reviews</strong></td><td><strong>last_review</strong></td><td><strong>reviews_per_month</strong></td><td><strong>calculated_host_listings_count</strong></td><td><strong>availability_365</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>0</strong></td><td>2539</td><td>Clean &amp; quiet apt home by the park</td><td>2787</td><td>John</td><td>Brooklyn</td><td>Kensington</td><td>40.64749</td><td>-73.97237</td><td>Private room</td><td>149</td><td>1</td><td>9</td><td>2018-10-19</td><td>0.21</td><td>6</td><td>365</td></tr>
<tr>
<td><strong>1</strong></td><td>2595</td><td>Skylit Midtown Castle</td><td>2845</td><td>Jennifer</td><td>Manhattan</td><td>Midtown</td><td>40.75362</td><td>-73.98377</td><td>Entire home/apt</td><td>225</td><td>1</td><td>45</td><td>2019-05-21</td><td>0.38</td><td>2</td><td>355</td></tr>
<tr>
<td><strong>2</strong></td><td>3647</td><td>THE VILLAGE OF HARLEM....NEW YORK !</td><td>4632</td><td>Elisabeth</td><td>Manhattan</td><td>Harlem</td><td>40.80902</td><td>-73.9419</td><td>Private room</td><td>150</td><td>3</td><td>0</td><td>NaN</td><td>NaN</td><td>1</td><td>365</td></tr>
<tr>
<td><strong>3</strong></td><td>3831</td><td>Cozy Entire Floor of Brownstone</td><td>4869</td><td>LisaRoxanne</td><td>Brooklyn</td><td>Clinton Hill</td><td>40.68514</td><td>-73.95976</td><td>Entire home/apt</td><td>89</td><td>1</td><td>270</td><td>2019-07-05</td><td>4.64</td><td>1</td><td>194</td></tr>
<tr>
<td><strong>4</strong></td><td>5022</td><td>Entire Apt: Spacious Studio/Loft by central park</td><td>7192</td><td>Laura</td><td>Manhattan</td><td>East Harlem</td><td>40.79851</td><td>-73.94399</td><td>Entire home/apt</td><td>80</td><td>10</td><td>9</td><td>2018-11-19</td><td>0.1</td><td>1</td><td>0</td></tr>
</tbody>
</table>
</div><h3 id="heading-list-of-features-to-be-excluded-from-the-analysis">List of features to be excluded from the analysis</h3>
<p><em>A list of features that should be excluded from the analysis, such as unique identifiers and text features</em></p>
<pre><code class="lang-python">excluding_list = [<span class="hljs-string">'price'</span>, <span class="hljs-string">'id'</span>, <span class="hljs-string">'latitude'</span>, <span class="hljs-string">'longitude'</span>, <span class="hljs-string">'host_id'</span>,
                  <span class="hljs-string">'last_review'</span>, <span class="hljs-string">'name'</span>, <span class="hljs-string">'host_name'</span>]
</code></pre>
<p>Let's take a look at the categorical features:</p>
<pre><code class="lang-python">categorical = [<span class="hljs-string">'neighbourhood_group'</span>, <span class="hljs-string">'neighbourhood'</span>, <span class="hljs-string">'room_type'</span>]

data[categorical].nunique()
</code></pre>
<pre><code class="lang-python">neighbourhood_group      <span class="hljs-number">5</span>
neighbourhood          <span class="hljs-number">221</span>
room_type                <span class="hljs-number">3</span>
dtype: int64
</code></pre>
<p>If we one-hot encoded all of the categorical features, we would create many columns filled mostly with zeros, which causes problems during model training. So we divide them into a list of low-cardinality categorical features to be one-hot encoded and a list of high-cardinality categorical features to be ordinally encoded. <code>low_card_categorical</code> is the subset of categorical features with low cardinality (few unique values) that will be one-hot encoded; <code>high_card_categorical</code> is the subset with high cardinality (many unique values) that will be encoded with an ordinal encoding.</p>
<pre><code class="lang-python">low_card_categorical = [<span class="hljs-string">'neighbourhood_group'</span>, <span class="hljs-string">'room_type'</span>] 
high_card_categorical = [<span class="hljs-string">'neighbourhood'</span>]
</code></pre>
<p>All remaining numeric columns go into <code>continuous</code>, a list of continuous numerical features that will be standardized for analysis:</p>
<pre><code class="lang-python">continuous = [<span class="hljs-string">'minimum_nights'</span>, <span class="hljs-string">'number_of_reviews'</span>, <span class="hljs-string">'reviews_per_month'</span>,
              <span class="hljs-string">'calculated_host_listings_count'</span>, <span class="hljs-string">'availability_365'</span>]
</code></pre>
<p>The whole dataset's shape looks like this:</p>
<pre><code class="lang-python">data.shape
(<span class="hljs-number">48895</span>, <span class="hljs-number">16</span>)
</code></pre>
<p>We create a binary target, <code>target_median</code>, by thresholding the price at its median for classification purposes. Because splitting at the median yields a balanced binary target, we can safely use accuracy as an effective performance measure: counting the values shows an almost equal number of cases in the positive and negative classes.</p>
<pre><code class="lang-python">target_median.value_counts()
</code></pre>
<pre><code class="lang-python">price
<span class="hljs-number">0</span>    <span class="hljs-number">24472</span>
<span class="hljs-number">1</span>    <span class="hljs-number">24423</span>
Name: count, dtype: int64
</code></pre>
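<p>The construction of <code>target_median</code> itself is not shown above. A minimal sketch of one way to build such a balanced binary target, assuming the raw <code>price</code> column is available (the thresholding rule and toy values here are an assumption for illustration, not the author's exact code):</p>

```python
import pandas as pd

# Hypothetical reconstruction: label a listing 1 if its price is above the
# median price, else 0. Splitting at the median yields a near-balanced target.
prices = pd.Series([50, 80, 100, 120, 200], name="price")  # toy values
target_median = (prices > prices.median()).astype(int)
print(target_median.value_counts())
```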
<p>In the context of the Scikit-learn library, a transformer object is a component that is designed to perform data transformation tasks as part of a machine learning pipeline. These objects are crucial for preprocessing data, including steps such as normalization, encoding categorical variables, or reducing dimensionality.</p>
<p>In the next step, we will develop a series of transformers designed to preprocess the data, ensuring it is well-prepared for the analysis required for this project. These transformers will help clean, organize, and transform the raw data into a format that facilitates more accurate and insightful analysis.</p>
<ul>
<li><p><strong>categorical_onehot_encoding</strong>: This transformer is designed to perform one-hot encoding on low-cardinality categorical features. It converts categorical variables into a format that can be provided to machine learning algorithms, effectively representing each category as a binary vector.</p>
</li>
<li><p><strong>categorical_ord_encoding</strong>: This transformer is tailored for high-cardinality categorical features and employs ordinal encoding. It assigns integer values to unique categories based on their order, making it suitable for situations where categories have a meaningful sequence.</p>
</li>
<li><p><strong>numeric_passthrough</strong>: This transformer forwards the continuous numerical features to the next stage of the pipeline essentially unchanged. It is implemented as a constant-value imputer that replaces any missing values with 0, so downstream estimators receive a complete numeric matrix.</p>
</li>
</ul>
<pre><code class="lang-python">categorical_onehot_encoding = OneHotEncoder(handle_unknown=<span class="hljs-string">'ignore'</span>)
categorical_ord_encoding = OrdinalEncoder(handle_unknown=<span class="hljs-string">"use_encoded_value"</span>, unknown_value=np.nan)
numeric_passthrough = SimpleImputer(strategy=<span class="hljs-string">"constant"</span>, fill_value=<span class="hljs-number">0</span>)
</code></pre>
<p>The code below creates a ColumnTransformer object that manages the different feature types in the dataset by applying a specific transformation to each subset: one-hot encoding for the low-cardinality categorical features, ordinal encoding for the high-cardinality categorical feature, and the pass-through imputer for the continuous numerical features.</p>
<p>The transformer is configured to drop any features not explicitly included in the defined transformation steps, maintaining a clean and relevant set of output features, and <code>verbose_feature_names_out=False</code> keeps the output feature names concise. Setting <code>sparse_threshold</code> to zero guarantees that the transformer returns dense arrays, regardless of the input data's sparsity.</p>
<pre><code class="lang-python">column_transform = ColumnTransformer(
    [(<span class="hljs-string">'low_card_categories'</span>, categorical_onehot_encoding, low_card_categorical),
     (<span class="hljs-string">'high_card_categories'</span>, categorical_ord_encoding, high_card_categorical),
     (<span class="hljs-string">'numeric'</span>, numeric_passthrough, continuous),
    ],
    remainder=<span class="hljs-string">'drop'</span>,
    verbose_feature_names_out=<span class="hljs-literal">False</span>,
    sparse_threshold=<span class="hljs-number">0.0</span>)
</code></pre>
<p><strong>K-fold cross-validation</strong> is a powerful technique used to evaluate the performance of a machine learning model. It involves dividing the available training dataset into k distinct partitions or "folds." The process begins by training the model k times, where in each iteration, the model is trained on k-1 of these partitions while reserving the one remaining partition as a testing set. This means that each fold gets the opportunity to serve as the validation set once, allowing for a comprehensive assessment of the model’s performance.</p>
<p>Once all k models have been trained and evaluated, we calculate the average of the performance scores obtained from each fold. Additionally, we assess the standard deviation of these scores to gauge the consistency of the model’s performance across the different subsets of data. This statistical approach not only provides a more reliable estimate of how the model is likely to perform on unseen data but also quantifies the uncertainty surrounding this estimate, giving insights into the model's robustness and generalizability.</p>
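<p>The mechanics of the split can be illustrated directly with Scikit-learn's <code>KFold</code> on a toy array; with k=5 every sample appears in a test fold exactly once:</p>

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)  # ten toy samples
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X)):
    # each iteration trains on 4 folds (8 samples), tests on 1 fold (2 samples)
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```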
<p>We are setting up a <strong>RandomForestClassifier</strong>, a popular ensemble learning method for classification tasks. Here we use 300 estimators, meaning the model builds 300 individual decision trees to improve predictive accuracy and robustness, and we require a minimum of 3 samples per leaf node, which helps prevent overfitting by ensuring each leaf has enough samples to generalize. Note that <code>column_transform</code> is redefined below for this model using only the one-hot encoded low-cardinality categories and the numeric features; the ordinally encoded neighbourhood feature is left out.</p>
<pre><code class="lang-python">accuracy = make_scorer(accuracy_score)
cv = KFold(<span class="hljs-number">5</span>, shuffle=<span class="hljs-literal">True</span>, random_state=<span class="hljs-number">0</span>)
model = RandomForestClassifier(n_estimators=<span class="hljs-number">300</span>,
                               min_samples_leaf=<span class="hljs-number">3</span>,
                               random_state=<span class="hljs-number">0</span>)

column_transform = ColumnTransformer(
    [(<span class="hljs-string">'categories'</span>, categorical_onehot_encoding, low_card_categorical),
     (<span class="hljs-string">'numeric'</span>, numeric_passthrough, continuous)],
    remainder=<span class="hljs-string">'drop'</span>,
    verbose_feature_names_out=<span class="hljs-literal">False</span>,
    sparse_threshold=<span class="hljs-number">0.0</span>)
</code></pre>
<p>Next we assemble a Scikit-learn pipeline that first applies the column transformations and then trains the Random Forest classifier on the transformed features. Chaining the preprocessing steps and the model into a single estimator streamlines the machine learning workflow and ensures the same transformations are applied during both training and prediction.</p>
<pre><code class="lang-python">model_pipeline = Pipeline(
    [(<span class="hljs-string">'processing'</span>, column_transform),
     (<span class="hljs-string">'modeling'</span>, model)])
</code></pre>
<p>In our analysis, we utilize Scikit-learn's <code>cross_validate</code> function to perform a comprehensive five-fold cross-validation. This method involves segmenting our dataset into five distinct subsets or "folds." For each iteration, we train the model on four of these folds while using the remaining fold as a validation set. This process is repeated until each fold has been used as the validation set once. Throughout this procedure, we calculate and record the accuracy scores for each fold, allowing us to assess the performance of our defined machine learning pipeline more robustly. By averaging these accuracy scores across all five folds, we can obtain a reliable estimate of the model's overall effectiveness.</p>
<pre><code class="lang-python">cv_scores = cross_validate(estimator=model_pipeline,
                           X=data,
                           y=target_median,
                           scoring=accuracy,
                           cv=cv,
                           return_train_score=<span class="hljs-literal">True</span>,
                           return_estimator=<span class="hljs-literal">True</span>)
</code></pre>
<p>Finally, we retrieve the mean and standard deviation of the accuracy scores from cross-validation, along with the average fit and scoring times:</p>
<pre><code class="lang-python">mean_cv = np.mean(cv_scores[<span class="hljs-string">'test_score'</span>])
std_cv = np.std(cv_scores[<span class="hljs-string">'test_score'</span>])
fit_time = np.mean(cv_scores[<span class="hljs-string">'fit_time'</span>])
score_time = np.mean(cv_scores[<span class="hljs-string">'score_time'</span>])
print(<span class="hljs-string">f"<span class="hljs-subst">{mean_cv:<span class="hljs-number">0.3</span>f}</span> (<span class="hljs-subst">{std_cv:<span class="hljs-number">0.3</span>f}</span>)"</span>,
      <span class="hljs-string">f"fit: <span class="hljs-subst">{fit_time:<span class="hljs-number">0.2</span>f}</span> secs pred: <span class="hljs-subst">{score_time:<span class="hljs-number">0.2</span>f}</span> secs"</span>)
</code></pre>
<pre><code class="lang-python"><span class="hljs-number">0.826</span> (<span class="hljs-number">0.004</span>) fit: <span class="hljs-number">13.86</span> secs pred: <span class="hljs-number">0.58</span> secs
</code></pre>
<p>We have successfully implemented the Random Forest algorithm using Scikit-learn, achieving about 82.6% cross-validated accuracy on the balanced binary target.</p>
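<p>One detail worth noting: because we passed <code>return_estimator=True</code>, <code>cv_scores['estimator']</code> holds the five fitted pipelines. A sketch on toy data of how a fold's fitted model can be inspected afterwards, for example via its feature importances (the toy frame here is an illustration, not the Airbnb data):</p>

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

# Toy data where feature 'a' fully determines the label
rng = np.random.default_rng(0)
X = pd.DataFrame({'a': rng.normal(size=100), 'b': rng.normal(size=100)})
y = (X['a'] > 0).astype(int)

cv_res = cross_validate(
    RandomForestClassifier(n_estimators=20, random_state=0),
    X, y, cv=KFold(5, shuffle=True, random_state=0),
    return_estimator=True)

first_model = cv_res['estimator'][0]  # fitted model from the first fold
print(first_model.feature_importances_)  # importance of 'a' should dominate
```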
]]></content:encoded></item></channel></rss>