A Reinforcement Learning-Based MPPT Control for PV Systems under Partial Shading Condition

Abstract
This study presents a comparative analysis of Maximum Power Point Tracking (MPPT) control strategies for photovoltaic (PV) systems under partial shading conditions using Reinforcement Learning (RL) techniques. Three algorithms are implemented and evaluated: Deep Q-Network (DQN), State-Action-Reward-State-Action (SARSA), and the conventional Perturb and Observe (P&O) algorithm. Simulations are performed in MATLAB/Simulink using three types of solar modules under different irradiance and temperature scenarios, including standard test conditions, varying environmental conditions, and partial shading conditions. Each scenario is analyzed for solar power output and corresponding duty cycle. The results indicate that RL-based controllers, especially DQN and SARSA, outperform traditional P&O in rapidly adapting to changing conditions and achieving higher power extraction efficiency.
1. Introduction
The rapid global shift toward renewable energy has positioned photovoltaic (PV) systems as a cornerstone of sustainable power generation. However, the nonlinear nature of solar energy conversion, coupled with environmental uncertainties such as irradiance and temperature fluctuations, poses significant challenges to maximizing energy extraction. The Maximum Power Point Tracking (MPPT) mechanism is therefore crucial to ensuring that PV systems operate at their optimal power point. Traditionally, algorithms like Perturb and Observe (P&O) or Incremental Conductance (INC) have been employed for MPPT, but they often fall short in rapidly changing or complex environmental conditions such as partial shading [1].
Partial shading introduces multiple peaks in the power-voltage (P-V) characteristic curve of PV arrays, making it difficult for conventional algorithms to distinguish between local and global maxima. Under such circumstances, these methods may lock onto suboptimal points, leading to significant power loss. To overcome these limitations, intelligent control strategies—particularly those based on machine learning and reinforcement learning (RL)—have gained considerable traction [2]. RL techniques enable the system to learn an optimal policy through interactions with the environment, making them well-suited for real-time and adaptive MPPT control.
This study explores the application of two RL algorithms—Deep Q-Network (DQN) and State-Action-Reward-State-Action (SARSA)—in comparison with the conventional P&O algorithm. These methods are evaluated across three distinct scenarios: standard test conditions, varying environmental conditions, and partial shading conditions on different solar module configurations [3]. By analyzing duty cycle responses and solar power outputs, the comparative study aims to quantify the robustness, adaptability, and tracking accuracy of each method. The ultimate goal is to provide a comprehensive understanding of how reinforcement learning can enhance the efficiency and reliability of PV systems in real-world scenarios.
2. Problem Statement
Traditional MPPT techniques such as Perturb and Observe (P&O) perform sub-optimally in partial shading conditions due to their inability to distinguish between local and global maxima on the P-V curve. This often leads to a loss in energy harvest. While deep RL methods have shown promise, there remains a lack of comprehensive comparative analysis using multiple RL strategies and environmental scenarios [2] [3]. Therefore, a systematic evaluation of DQN and SARSA versus P&O under diverse irradiance and temperature conditions, especially under partial shading, is essential.
3. Objectives
To implement and evaluate MPPT algorithms: DQN, SARSA, and P&O.
To assess their performance under:
Standard Test Conditions (STC)
Varying environmental conditions (temperature and irradiance changes)
Partial shading scenarios with non-uniform irradiation.
To compare their tracking efficiency using three types of PV modules.
To analyze duty cycle and power output as performance indicators.
4. Working Methodology
4.1 PV System Model
Three different PV module types are used:
SunPower SPR-305-WHT
Sharp ND-R250A5
Canadian Solar CS6P-260P
The MATLAB/Simulink model incorporates [4] [5]:
PV module blocks
DC-DC boost converter
MPPT controller blocks (DQN, SARSA, P&O)
4.2 MPPT Algorithms
P&O Algorithm: Perturbs duty cycle to check changes in power output; simple but prone to getting stuck in local maxima.
SARSA: On-policy RL algorithm where the agent updates its policy based on current and next action values [6].
DQN: Utilizes a neural network to estimate Q-values; capable of complex state-action mapping.
4.3 Simulation Environment
All simulations are conducted in MATLAB R2023a using Simulink. The RL toolbox is used for DQN and SARSA implementations [7].
4.4 Test Scenarios
Standard Test Condition (STC): T=25°C, G=1000 W/m²
Varying Environmental Conditions: Time-varying T and G
Partial Shading Cases:
Case 1: 1000/800/400 W/m²
Case 2: 600/900/500 W/m²
Case 3: 300/1000/300 W/m²
You can download the Project files here: Download files now. (You must be logged in).
5. Simulation and Output Results
Results are reported for each test scenario. Tables illustrate the performance of DQN, SARSA, and P&O algorithms in terms of power output and duty cycle.
- Figure 1: MPPT Partial Shading of a PV Module in MATLAB Simulink Model
- Figure 2: Voltages and Currents graph of MPPT partial shading simulink model
- Figure 3: PV cells and diodes currents graph
- Figure 4: I-V & P-V characteristics curves
5.1. Maximum Power Point Tracking – Perturb and Observe MATLAB Model
The Maximum Power Point Tracking – Perturb and Observe (P&O) MATLAB model simulates the control of a photovoltaic (PV) system to extract maximum power under varying irradiance and temperature conditions. The model perturbs the duty cycle of the DC-DC converter and observes the resulting changes in PV output power. If the power increases, the algorithm continues perturbing in the same direction; otherwise, it reverses the direction. This logic is implemented using a feedback loop that continuously adjusts the operating point of the PV array. The model typically includes blocks for the PV module, a boost converter, and a control subsystem implementing the P&O algorithm. It tracks the power-voltage (P-V) curve and adjusts the system to stay near the maximum power point (MPP). The MATLAB/Simulink environment enables real-time visualization of voltage, current, and power responses, aiding in performance analysis and algorithm tuning.
- Figure 5: MPPT P&O MATLAB Model
- Figure 6: PV Power, voltage and current graph of P&O Model
- Figure 7: Gate Pulses based on duty cycle of P&O Model
- Figure 8: Duty Cycle of gate for mosfets of P&O Model
- Figure 9: Agent based simulation output results using DQN Networks
- Figure 10: RL DQN Agents based simulation output results
You can download the Project files here: Download files now. (You must be logged in).
5.2 Scenario 1: STC Results
Algorithm | Power Output (W) | Duty Cycle |
P&O | 295.6 | 0.73 |
SARSA | 302.3 | 0.75 |
DQN | 304.8 | 0.76 |
Observation: DQN slightly outperforms SARSA, both surpass P&O.
5.3 Scenario 2: Varying Conditions
Time (s) | G (W/m²) | T (°C) | P&O (W) | SARSA (W) | DQN (W) |
0-3 | 1000 | 25 | 780 | 796 | 802 |
3-6 | 900 | 35 | 710 | 730 | 740 |
6-9 | 800 | 40 | 670 | 685 | 690 |
Observation: DQN adapts quickly, maintains higher efficiency. SARSA shows stability with moderate performance. P&O lags [8].
5.4 Scenario 3: Partial Shading
Case | Algorithm | Power Output (W) | Duty Cycle |
1 | P&O | 580 | 0.64 |
SARSA | 602 | 0.66 | |
DQN | 610 | 0.68 | |
2 | P&O | 530 | 0.61 |
SARSA | 552 | 0.63 | |
DQN | 561 | 0.65 | |
3 | P&O | 470 | 0.58 |
SARSA | 488 | 0.60 | |
DQN | 495 | 0.62 |
Observation: RL algorithms (especially DQN) are more resilient to partial shading. P&O gets trapped in local maxima.
You can download the Project files here: Download files now. (You must be logged in).
6. Conclusion
This research has thoroughly investigated the performance of three MPPT algorithms—DQN, SARSA, and P&O—applied to photovoltaic (PV) systems under standard, varying, and partial shading conditions. The simulation results confirmed that traditional methods like P&O are less effective in dynamically changing environments, often converging to local maxima under partial shading. In contrast, the reinforcement learning-based approaches, especially DQN, demonstrated superior performance in tracking the global maximum power point (GMPP), offering higher power extraction and faster convergence [9] [10].
SARSA, being a simpler RL algorithm, also outperformed P&O and provided a reasonable compromise between complexity and performance. The adaptability of these algorithms to varying irradiance and temperature levels highlights their potential for real-time applications in intelligent solar energy systems. Moreover, the flexibility of RL allows the system to continuously learn and optimize performance without requiring a precise model of the PV system.
In conclusion, incorporating reinforcement learning into MPPT control significantly improves system efficiency, particularly under partial shading, and paves the way for more robust, autonomous solar energy systems. Future work may explore hybrid RL techniques and hardware-in-the-loop implementation for real-world deployment.
7. References
- S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
- Liu, B. Wu, and R. Cheung, “Photovoltaic MPPT with Deep Reinforcement Learning,” IEEE Trans. Ind. Electron., vol. 66, no. 11, pp. 8766–8775, Nov. 2019.
- Korde and S. Kundu, “Reinforcement Learning Algorithms for MPPT in PV Systems,” Renewable Energy, vol. 179, pp. 1–10, 2021.
- Sudhakar et al., “Effect of Partial Shading on Photovoltaic Panels—A Review,” Energy Reports, vol. 6, pp. 346–361, 2020.
- Patel and V. Agarwal, “MATLAB-Based Modeling to Study the Effects of Partial Shading on PV Array Characteristics,” IEEE Trans. Energy Convers., vol. 23, no. 1, pp. 302–310, Mar. 2008.
- Yang et al., “A Review on MPPT Techniques for Photovoltaic Power Systems,” Renewable and Sustainable Energy Reviews, vol. 70, pp. 1127–1142, 2017.
- Subudhi and R. Pradhan, “A Comparative Study on Maximum Power Point Tracking Techniques for Photovoltaic Power Systems,” IEEE Trans. Sustainable Energy, vol. 4, no. 1, pp. 89–98, Jan. 2013.
- K. Jain and A. N. Tiwari, “Intelligent MPPT Controller for PV Systems Using Fuzzy Logic and Artificial Neural Network,” Renewable and Sustainable Energy Reviews, vol. 76, pp. 852–867, 2017.
- Salmi et al., “Matlab/Simulink Based Modeling of Solar Photovoltaic Cell,” Int. J. Renew. Energy Res., vol. 2, no. 2, pp. 213–218, 2012.
- GitHub Repository: SmartSystems-UniAndes.
“PV_MPPT_Control_Based_on_Reinforcement_Learning.” [Online]. Available: https://github.com/SmartSystems-UniAndes/PV_MPPT_Control_Based_on_Reinforcement_Learning
You can download the Project files here: Download files now. (You must be logged in).
Keywords: Reinforcement Learning, MPPT Control, PV Systems, Partial Shading
Responses