A Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition

Waqas Javaid, 28 June 2025

Abstract

This study compares Maximum Power Point Tracking (MPPT) control strategies for photovoltaic (PV) systems under partial shading, with a focus on Reinforcement Learning (RL) techniques. Three algorithms are implemented and evaluated: Deep Q-Network (DQN), State-Action-Reward-State-Action (SARSA), and the conventional Perturb and Observe (P&O) algorithm. Simulations are performed in MATLAB/Simulink using three types of solar modules under different irradiance and temperature scenarios: standard test conditions, varying environmental conditions, and partial shading. Each scenario is analyzed in terms of solar power output and the corresponding duty cycle. The results indicate that both RL-based controllers, DQN in particular, outperform traditional P&O in adapting to changing conditions and in extracting more power.

1. Introduction

The rapid global shift toward renewable energy has positioned photovoltaic (PV) systems as a cornerstone of sustainable power generation. However, the nonlinear nature of solar energy conversion, coupled with environmental uncertainties such as irradiance and temperature fluctuations, poses significant challenges to maximizing energy extraction. The Maximum Power Point Tracking (MPPT) mechanism is therefore crucial to ensuring that PV systems operate at their optimal power point. Traditionally, algorithms like Perturb and Observe (P&O) or Incremental Conductance (INC) have been employed for MPPT, but they often fall short in rapidly changing or complex environmental conditions such as partial shading [1].

Partial shading introduces multiple peaks in the power-voltage (P-V) characteristic curve of PV arrays, making it difficult for conventional algorithms to distinguish between local and global maxima. Under such circumstances, these methods may lock onto suboptimal points, leading to significant power loss. To overcome these limitations, intelligent control strategies—particularly those based on machine learning and reinforcement learning (RL)—have gained considerable traction [2]. RL techniques enable the system to learn an optimal policy through interactions with the environment, making them well-suited for real-time and adaptive MPPT control.

This study explores the application of two RL algorithms—Deep Q-Network (DQN) and State-Action-Reward-State-Action (SARSA)—in comparison with the conventional P&O algorithm. These methods are evaluated across three distinct scenarios: standard test conditions, varying environmental conditions, and partial shading conditions on different solar module configurations [3]. By analyzing duty cycle responses and solar power outputs, the comparative study aims to quantify the robustness, adaptability, and tracking accuracy of each method. The ultimate goal is to provide a comprehensive understanding of how reinforcement learning can enhance the efficiency and reliability of PV systems in real-world scenarios.

2. Problem Statement

Traditional MPPT techniques such as Perturb and Observe (P&O) perform sub-optimally in partial shading conditions due to their inability to distinguish between local and global maxima on the P-V curve. This often leads to a loss in energy harvest. While deep RL methods have shown promise, there remains a lack of comprehensive comparative analysis using multiple RL strategies and environmental scenarios [2] [3]. Therefore, a systematic evaluation of DQN and SARSA versus P&O under diverse irradiance and temperature conditions, especially under partial shading, is essential.

3. Objectives

To implement and evaluate MPPT algorithms: DQN, SARSA, and P&O.
To assess their performance under:

Standard Test Conditions (STC)
Varying environmental conditions (temperature and irradiance changes)
Partial shading scenarios with non-uniform irradiation.

To compare their tracking efficiency using three types of PV modules.
To analyze duty cycle and power output as performance indicators.

4. Working Methodology

4.1 PV System Model

Three different PV module types are used:

SunPower SPR-305-WHT
Sharp ND-R250A5
Canadian Solar CS6P-260P

The MATLAB/Simulink model incorporates [4] [5]:

PV module blocks
DC-DC boost converter
MPPT controller blocks (DQN, SARSA, P&O)
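The duty cycle of the boost converter is the quantity that all three controllers adjust. As a rough illustration of why it moves the PV operating point, the sketch below uses the textbook relations for an ideal, lossless boost converter in continuous conduction mode feeding a resistive load. These idealizations and the function names are assumptions made for illustration; the article does not state the converter parameters.

```python
def boost_output_voltage(v_in: float, duty: float) -> float:
    """Ideal CCM boost converter: V_out = V_in / (1 - D)."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    return v_in / (1.0 - duty)


def reflected_input_resistance(r_load: float, duty: float) -> float:
    """Resistance seen by the PV array: R_in = R_load * (1 - D)^2.

    Raising D lowers R_in, which shifts the PV operating point
    toward higher current and lower voltage.
    """
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    return r_load * (1.0 - duty) ** 2


# Hypothetical numbers, chosen only to show the trend.
print(boost_output_voltage(50.0, 0.5))         # 100.0
print(reflected_input_resistance(100.0, 0.5))  # 25.0
```

Under these assumptions, an MPPT controller effectively searches for the duty cycle at which the reflected resistance matches the ratio of voltage to current at the maximum power point.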

4.2 MPPT Algorithms

P&O Algorithm: Perturbs the duty cycle and observes the resulting change in power output; simple, but prone to getting stuck at local maxima.

SARSA: On-policy RL algorithm in which the agent updates its action-value estimate using the reward and the value of the action it actually takes in the next state [6].

DQN: Uses a neural network to approximate Q-values, so it can handle continuous or high-dimensional states that a lookup table cannot represent.
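The difference between the two RL update rules can be sketched in a few lines. The example below is a generic illustration in Python, not the Simulink implementation used in the article: SARSA bootstraps from the action actually taken next, while the Q-learning target that DQN is built on bootstraps from the greedy action. The table sizes, learning rate, and discount factor are hypothetical, and in a real DQN the `q_next` vector would come from a neural network rather than a table.

```python
import numpy as np


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """In-place tabular SARSA update; returns the TD error.

    On-policy: the target uses Q(s', a') for the action a'
    that the agent really takes in the next state.
    """
    td_target = r + gamma * q[s_next, a_next]
    td_error = td_target - q[s, a]
    q[s, a] += alpha * td_error
    return td_error


def dqn_target(r, q_next, gamma=0.9, done=False):
    """Q-learning style target used by DQN: r + gamma * max_a' Q(s', a').

    q_next holds the estimated Q-values of all actions in s'.
    """
    if done:
        return float(r)
    return float(r + gamma * np.max(q_next))


# Hypothetical 2-state, 3-action table (actions could mean
# "decrease", "hold", "increase" the duty cycle).
q = np.zeros((2, 3))
q[1, 2] = 1.0
q[1, 0] = 5.0
sarsa_update(q, s=0, a=1, r=1.0, s_next=1, a_next=2, alpha=0.5)
print(q[0, 1])                 # SARSA moved Q(0, 1) toward 1 + 0.9 * 1.0
print(dqn_target(1.0, q[1]))   # greedy target uses max Q(1, .) = 5.0
```

With the same transition, the two targets differ whenever the next action is not the greedy one, which is the practical meaning of on-policy versus off-policy learning.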

4.3 Simulation Environment

All simulations are conducted in MATLAB R2023a using Simulink. The Reinforcement Learning Toolbox is used for the DQN and SARSA implementations [7].

4.4 Test Scenarios

Standard Test Condition (STC): T=25°C, G=1000 W/m²
Varying Environmental Conditions: Time-varying T and G
Partial Shading Cases:

Case 1: 1000/800/400 W/m²
Case 2: 600/900/500 W/m²
Case 3: 300/1000/300 W/m²

5. Simulation and Output Results

Results are reported for each test scenario. Tables illustrate the performance of DQN, SARSA, and P&O algorithms in terms of power output and duty cycle.

Figure 1: MPPT Partial Shading of a PV Module in MATLAB Simulink Model

Figure 2: Voltage and current graphs of the MPPT partial shading Simulink model

Figure 3: PV cell and diode current graphs

Figure 4: I-V and P-V characteristic curves

5.1 Maximum Power Point Tracking – Perturb and Observe MATLAB Model

The Maximum Power Point Tracking – Perturb and Observe (P&O) MATLAB model simulates the control of a photovoltaic (PV) system to extract maximum power under varying irradiance and temperature conditions. The model perturbs the duty cycle of the DC-DC converter and observes the resulting changes in PV output power. If the power increases, the algorithm continues perturbing in the same direction; otherwise, it reverses the direction. This logic is implemented using a feedback loop that continuously adjusts the operating point of the PV array. The model typically includes blocks for the PV module, a boost converter, and a control subsystem implementing the P&O algorithm. It tracks the power-voltage (P-V) curve and adjusts the system to stay near the maximum power point (MPP). The MATLAB/Simulink environment enables real-time visualization of voltage, current, and power responses, aiding in performance analysis and algorithm tuning.
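The perturb-and-observe logic described above can be sketched as a short loop. The example below is a minimal Python illustration, not the Simulink subsystem from the article. The two-peak power curve is a synthetic stand-in for a partially shaded array (its shape, the peak positions at duty cycles 0.3 and 0.7, and the step size are all assumptions), chosen only to show how the starting point decides which peak a fixed-step P&O loop settles on.

```python
import math


def two_peak_power(duty: float) -> float:
    """Synthetic power-vs-duty curve (W): local peak near 0.3, global near 0.7."""
    local_peak = 60.0 * math.exp(-((duty - 0.3) / 0.08) ** 2)
    global_peak = 100.0 * math.exp(-((duty - 0.7) / 0.08) ** 2)
    return local_peak + global_peak


def perturb_and_observe(power_at, duty0=0.5, step=0.01, n_iter=200):
    """Fixed-step P&O on the duty cycle; returns the duty-cycle history.

    power_at: callable mapping a duty cycle in [0, 1] to PV power.
    """
    duty = duty0
    prev_power = power_at(duty)
    direction = 1  # +1 raises the duty cycle, -1 lowers it
    history = [duty]
    for _ in range(n_iter):
        # Perturb, keeping the duty cycle inside its valid range.
        duty = min(max(duty + direction * step, 0.0), 1.0)
        power = power_at(duty)
        # Observe: keep going if power rose, otherwise reverse.
        if power < prev_power:
            direction = -direction
        prev_power = power
        history.append(duty)
    return history


trapped = perturb_and_observe(two_peak_power, duty0=0.25)
found = perturb_and_observe(two_peak_power, duty0=0.60)
print(round(trapped[-1], 2), round(two_peak_power(trapped[-1]), 1))
print(round(found[-1], 2), round(two_peak_power(found[-1]), 1))
```

On this synthetic curve, the run started at 0.25 oscillates around the lower peak and never sees the higher one, while the run started at 0.60 reaches the global peak. The loop also keeps oscillating by one step around whichever peak it finds, which is the usual steady-state ripple of fixed-step P&O.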

Figure 5: MPPT P&O MATLAB Model

Figure 6: PV power, voltage and current graph of P&O Model

Figure 7: Gate pulses based on duty cycle of P&O Model

Figure 8: Duty cycle of gate for MOSFETs of P&O Model

Figure 9: Agent-based simulation output results using DQN networks

Figure 10: RL DQN agent-based simulation output results

5.2 Scenario 1: STC Results

Algorithm	Power Output (W)	Duty Cycle
P&O	295.6	0.73
SARSA	302.3	0.75
DQN	304.8	0.76

Observation: DQN slightly outperforms SARSA, and both surpass P&O.

5.3 Scenario 2: Varying Conditions

Time (s)	G (W/m²)	T (°C)	P&O (W)	SARSA (W)	DQN (W)
0-3	1000	25	780	796	802
3-6	900	35	710	730	740
6-9	800	40	670	685	690

Observation: DQN adapts quickly and delivers the highest power in every interval. SARSA is stable, with output between the other two. P&O lags [8].

5.4 Scenario 3: Partial Shading

Case	Algorithm	Power Output (W)	Duty Cycle
1	P&O	580	0.64
	SARSA	602	0.66
	DQN	610	0.68
2	P&O	530	0.61
	SARSA	552	0.63
	DQN	561	0.65
3	P&O	470	0.58
	SARSA	488	0.60
	DQN	495	0.62

Observation: RL algorithms (especially DQN) are more resilient to partial shading. P&O gets trapped in local maxima.
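To put the partial shading results in relative terms, the gain of each RL controller over P&O can be computed directly from the table above. The short Python sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices, and the power values are copied from the table.

```python
# Power output (W) per partial shading case, taken from the table above.
partial_shading_power_w = {
    1: {"P&O": 580, "SARSA": 602, "DQN": 610},
    2: {"P&O": 530, "SARSA": 552, "DQN": 561},
    3: {"P&O": 470, "SARSA": 488, "DQN": 495},
}


def gain_over_baseline(powers, baseline="P&O"):
    """Percentage power gain of each algorithm relative to the baseline."""
    base = powers[baseline]
    return {
        name: 100.0 * (p - base) / base
        for name, p in powers.items()
        if name != baseline
    }


gains = {case: gain_over_baseline(p) for case, p in partial_shading_power_w.items()}
for case, by_algo in gains.items():
    summary = ", ".join(f"{name}: +{g:.2f}%" for name, g in by_algo.items())
    print(f"Case {case}: {summary}")
```

For these three cases, SARSA gains roughly 4% over P&O and DQN roughly 5% to 6%.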

6. Conclusion

This study investigated three MPPT algorithms (DQN, SARSA, and P&O) applied to photovoltaic (PV) systems under standard, varying, and partial shading conditions. The simulation results indicate that traditional methods like P&O are less effective in dynamically changing environments and often converge to local maxima under partial shading. In contrast, the reinforcement learning-based approaches, especially DQN, demonstrated superior performance in tracking the global maximum power point (GMPP), offering higher power extraction and faster convergence [9] [10].

SARSA, being a simpler RL algorithm, also outperformed P&O and provided a reasonable compromise between complexity and performance. The adaptability of these algorithms to varying irradiance and temperature levels highlights their potential for real-time applications in intelligent solar energy systems. Moreover, the flexibility of RL allows the system to continuously learn and optimize performance without requiring a precise model of the PV system.

In conclusion, incorporating reinforcement learning into MPPT control significantly improves system efficiency, particularly under partial shading, and paves the way for more robust, autonomous solar energy systems. Future work may explore hybrid RL techniques and hardware-in-the-loop implementation for real-world deployment.

7. References

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
[2] Liu, B. Wu, and R. Cheung, “Photovoltaic MPPT with Deep Reinforcement Learning,” IEEE Trans. Ind. Electron., vol. 66, no. 11, pp. 8766–8775, Nov. 2019.
[3] Korde and S. Kundu, “Reinforcement Learning Algorithms for MPPT in PV Systems,” Renewable Energy, vol. 179, pp. 1–10, 2021.
[4] Sudhakar et al., “Effect of Partial Shading on Photovoltaic Panels—A Review,” Energy Reports, vol. 6, pp. 346–361, 2020.
[5] H. Patel and V. Agarwal, “MATLAB-Based Modeling to Study the Effects of Partial Shading on PV Array Characteristics,” IEEE Trans. Energy Convers., vol. 23, no. 1, pp. 302–310, Mar. 2008.
[6] Yang et al., “A Review on MPPT Techniques for Photovoltaic Power Systems,” Renewable and Sustainable Energy Reviews, vol. 70, pp. 1127–1142, 2017.
[7] B. Subudhi and R. Pradhan, “A Comparative Study on Maximum Power Point Tracking Techniques for Photovoltaic Power Systems,” IEEE Trans. Sustainable Energy, vol. 4, no. 1, pp. 89–98, Jan. 2013.
[8] K. Jain and A. N. Tiwari, “Intelligent MPPT Controller for PV Systems Using Fuzzy Logic and Artificial Neural Network,” Renewable and Sustainable Energy Reviews, vol. 76, pp. 852–867, 2017.
[9] Salmi et al., “Matlab/Simulink Based Modeling of Solar Photovoltaic Cell,” Int. J. Renew. Energy Res., vol. 2, no. 2, pp. 213–218, 2012.
[10] GitHub Repository: SmartSystems-UniAndes, “PV_MPPT_Control_Based_on_Reinforcement_Learning.” [Online]. Available: https://github.com/SmartSystems-UniAndes/PV_MPPT_Control_Based_on_Reinforcement_Learning

Keywords: Reinforcement Learning, MPPT Control, PV Systems, Partial Shading
