KR102870042B1

Info

Publication number: KR102870042B1
Application number: KR1020220055564A
Authority: KR
Inventors: 김홍석; 정재익
Original assignee: 서강대학교산학협력단
Filing date: 2022-05-04
Publication date: 2025-10-13
Anticipated expiration: 2042-05-04

Abstract

An energy bidding method for maximizing profits using a battery is provided. The energy bidding method may include: determining future energy generation and energy prices using a deep learning algorithm; determining a bid value and a charge/discharge control ratio using a reinforcement learning algorithm; and performing energy bidding based on the determined values.

Description

ENERGY BIDDING METHOD WITH BATTERY FOR MAXIMIZING THE PROFITS

The present invention relates to an energy bidding method for maximizing profits using a battery.

Ways of using a battery to maximize final revenue when bidding generated energy into an electricity market are being studied. Techniques for maximizing final revenue include those based on arbitrage and those based on bid error correction. Arbitrage increases profit by charging when energy prices are low and discharging when they are high. Bid error correction reduces the error penalty by discharging by the amount that the bid value exceeds generation (overbidding) and charging by the amount that the bid value falls short of generation (underbidding).

Most arbitrage-based techniques consider only energy price uncertainty and ignore generation uncertainty. Even those that do consider generation uncertainty assume that bidding accuracy is high and therefore do not consider error correction. In other words, they form a new bid value by adding the discharge amount to, or subtracting the charge amount from, the original bid value, which produces an error (generation + discharge - charge - new bid value) identical to the original error (generation - original bid value). As a result, arbitrage increases profit, but the penalty for the error is exactly the same as before.
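The invariance of the error described above can be checked with a small worked example. The following is a minimal sketch; the function name `bid_error` and all numeric values are hypothetical and chosen only for illustration.

```python
def bid_error(generation, bid, charge=0.0, discharge=0.0):
    """Error between the energy actually delivered and the bid value."""
    delivered = generation + discharge - charge
    return delivered - bid


# Hypothetical values for one time slot (for example, in MWh).
generation, original_bid = 10.0, 8.0
discharge = 3.0  # arbitrage discharge in a high-price slot

original_error = bid_error(generation, original_bid)

# An arbitrage-only scheme shifts the bid by exactly the discharged amount,
# so the extra delivery and the extra bid cancel out in the error.
new_bid = original_bid + discharge
new_error = bid_error(generation, new_bid, discharge=discharge)

print(original_error, new_error)
```

Both errors come out equal, which is why arbitrage alone leaves the penalty unchanged.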

Meanwhile, techniques based on bid error correction consider only generation uncertainty and ignore energy price uncertainty. Where future energy generation is uncertain, as with renewable energy or natural gas, generation forecasting is used. Typically, the forecast generation is taken as the bid value, and the battery compensates (by charging or discharging) for whatever error arises. There are also techniques that, rather than reducing the error itself, aim to produce an error that can be compensated, which sharply reduces the error remaining after correction and thus the error penalty. However, arbitrage based on energy price fluctuations is not considered, and the battery is used only to correct the error.

The problem to be solved by the present invention is to provide an energy bidding method that, when generated energy is bid into an electricity market, uses a battery to achieve both increased profit through arbitrage and reduced penalties through bid error correction, thereby ultimately maximizing the final bidding revenue.

An energy bidding method according to an embodiment of the present invention may include: determining future energy generation and energy prices using a deep learning algorithm; determining a bid value and a charge/discharge control ratio using a reinforcement learning algorithm; and performing energy bidding based on the determined values.

In an embodiment, the method may further include, for bid error correction, charging by the amount of the bid error in the case of under-prediction and discharging by the amount of the bid error in the case of over-prediction.

In an embodiment, the method may further include, for arbitrage, additionally adjusting the charge/discharge amount determined in this way by the charge/discharge control ratio.

In an embodiment, the reinforcement learning algorithm may include a reinforcement learning algorithm based on a Markov Decision Process (MDP).

In an embodiment, the MDP includes a state, an action, and a reward, and under time-series uncertainty the state may include time-series data of observations.

In an embodiment, a Long Short-Term Memory (LSTM) network may be used as the deep learning model to learn the time-series data.

In an embodiment, an estimated value function may be used to update the model.

In an embodiment, the method may further include: determining a reward from the final revenue after the arbitrage and bid error correction effects have been obtained by charging or discharging the battery; and updating the model using the reward and the estimated value function, and determining the next state from the battery value after charging or discharging and the energy generation and energy price observed in the next time slot.

According to embodiments of the present invention, a bidding strategy is provided that simultaneously determines a bid value for reducing the error penalty and a charge/discharge amount for increasing profit through arbitrage. Unlike existing approaches that focus on only one of arbitrage or bid error correction, this strategy exploits the advantages of both and can ultimately maximize the final bidding revenue.

In addition, by using deep reinforcement learning, in which a reinforcement learning algorithm that determines the bid value and the charge/discharge amount for maximizing bidding revenue is applied to a deep learning structure for forecasting generation and energy prices, bidding revenue can be maximized even in environments with uncertain generation and energy prices (renewable energy, natural gas, and the like).

FIG. 1 is a diagram for explaining an energy bidding method according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining an exemplary method of bidding renewable energy with uncertain generation into an energy market.
FIG. 3 is a diagram for explaining a deep reinforcement learning-based bidding algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the results of a bidding simulation performed based on embodiments of the present invention.
FIG. 5 is a block diagram for explaining a computing device for implementing an energy bidding method according to embodiments of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification and claims, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "...part," "...unit," and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software. The battery-based, error-compensable renewable energy prediction method according to the embodiments described below may be implemented as a program or software, and the program or software may be stored on a computer-readable medium.

FIG. 1 is a diagram for explaining an energy bidding method according to an embodiment of the present invention.

Referring to FIG. 1, an energy bidding method according to an embodiment of the present invention may include determining future energy generation and energy prices using a deep learning algorithm (S110), determining a bid value and a charge/discharge control ratio using a reinforcement learning algorithm (S120), and performing energy bidding based on the determined values (S130).

In an environment with uncertain energy generation, arbitrage and bid error correction are both factors that can maximize revenue. Simply combining the two approaches would require roughly doubling the battery capacity, which could greatly increase the battery investment cost. The embodiments provide a method that takes both arbitrage and bid error correction into account.

To this end, a deep learning structure is used to account for the uncertainty of both generation and energy prices, and it is extended to deep reinforcement learning by applying a reinforcement learning algorithm that determines the bid value and the charge/discharge amount for maximizing revenue. The bid value is determined so that error correction is possible, and a charge/discharge amount for arbitrage is added. Accordingly, with the same battery capacity alone, both arbitrage and bid error correction are taken into account, and the final bidding revenue can be maximized without additional investment cost.

Energy Bidding Model

The energy bidding model according to the embodiments and its related parameters are as follows.

The overall problem to be solved can be defined as follows.

Equation (2) gives the maximum power that can be charged into or discharged from the battery in the next time slot, given the battery's state of charge (SoC) in the current time slot. Battery degradation is severe when the SoC is very high or very low, so the SoC must be kept within a range in which degradation is not severe. The charge/discharge efficiency and the maximum charge/discharge power must also be taken into account. Because energy is lost whenever the battery is charged or discharged, less energy accumulates in the battery during charging, so the chargeable amount increases by a factor determined by the efficiency, and during discharging the discharged amount decreases by a factor determined by the efficiency. In addition, owing to the constraints of the power conversion system, the battery cannot be charged or discharged beyond the maximum charge/discharge power even if capacity remains. Accordingly, as in Equation (2), the maximum amount that can be charged or discharged in the next time slot is determined by the SoC of the current time slot.
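The limits described for Equation (2) can be sketched as follows. This is an illustration under stated assumptions, not the equation itself, which does not survive in this text: the names `soc_min`, `soc_max`, `eta`, and `p_max`, the default values, and the use of one-hour time slots (so that power and energy are numerically interchangeable) are all hypothetical.

```python
def max_charge_discharge(soc, capacity, soc_min=0.1, soc_max=0.9,
                         eta=0.9, p_max=2.0):
    """Return (max_charge, max_discharge) for the next time slot.

    soc is a fraction of capacity; eta is the one-way efficiency.
    """
    headroom = max(0.0, (soc_max - soc) * capacity)  # energy that still fits
    stored = max(0.0, (soc - soc_min) * capacity)    # energy that may be drawn

    # Charging loses energy, so more than `headroom` can be taken in (1 / eta).
    max_charge = min(p_max, headroom / eta)
    # Discharging loses energy, so less than `stored` reaches the grid (eta).
    max_discharge = min(p_max, stored * eta)
    return max_charge, max_discharge


limits = max_charge_discharge(soc=0.5, capacity=1.0, p_max=10.0)
print(limits)
```

With a small `p_max`, the power conversion limit binds instead of the SoC range, which matches the remark that the battery cannot exceed the maximum power even if capacity remains.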

Equation (3) gives the power charged into or discharged from the battery. The error is the difference between the measured renewable generation in a time slot and the bid value for renewable generation in that time slot. For bid error correction, the battery is charged by the amount of the error in the case of under-prediction and discharged by the amount of the error in the case of over-prediction. For arbitrage, the charge/discharge amount determined in this way is further adjusted by the charge/discharge control ratio. The agent therefore has two values to decide: the bid value and the charge/discharge control ratio. The charge and discharge amounts are limited, respectively, to the maximum charge and discharge power computed in Equation (2).
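One possible reading of the charge/discharge rule is sketched below. The exact form of Equation (3) is not recoverable from this text, so the way the control ratio enters (as a plain multiplier on the error) is an assumption, as are the function and parameter names.

```python
def charge_discharge(generation, bid, ratio, max_charge, max_discharge):
    """Return (charge, discharge) for one time slot.

    Assumption: the control ratio scales the error-correcting amount.
    """
    error = generation - bid
    if error >= 0:
        # Under-prediction: generation exceeds the bid, so the surplus is charged.
        return min(ratio * error, max_charge), 0.0
    # Over-prediction: generation falls short of the bid, so the gap is discharged.
    return 0.0, min(ratio * -error, max_discharge)


print(charge_discharge(generation=10.0, bid=8.0, ratio=1.0,
                       max_charge=5.0, max_discharge=5.0))
```

A ratio of 1 reproduces pure error correction; other ratios deviate from it, which is where the arbitrage effect would come from under this reading.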

Equation (4) gives the change in the battery SoC. Because of energy loss, less energy than the charged power accumulates during charging, and more must be drawn from the battery during discharging to deliver the required discharge power, in both cases by a factor determined by the efficiency. Unlike the other equations, this one is time-coupled: the SoC of the next time slot is determined by the SoC of the current one.
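The SoC transition can be sketched as a one-line update. This is a hedged illustration: the symbol names, the one-way efficiency `eta`, and the one-hour slot length are assumptions rather than the patent's notation.

```python
def next_soc(soc, charge, discharge, capacity, eta=0.9):
    """SoC after one time slot, as a fraction of capacity."""
    stored = eta * charge      # only a fraction eta of charged power is stored
    drawn = discharge / eta    # more than the delivered power leaves the battery
    return soc + (stored - drawn) / capacity


soc = 0.5
for charge, discharge in [(1.0, 0.0), (0.0, 0.9)]:
    # Each result feeds the next call: this is the time coupling.
    soc = next_soc(soc, charge, discharge, capacity=10.0)
print(soc)
```

Because each slot's result is the next slot's input, a decision made now constrains what the battery can do later.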

Equation (5) gives the amount of power dispatched to the grid: the measured generation minus the amount charged into the battery plus the amount discharged. The dispatched power is the power sent to the energy market, and it determines the energy provider's revenue. Revenue depends on the market rules, but it is higher the more of the generated power is dispatched when energy prices are high and the smaller the bid error, that is, the difference between the bid value and the dispatched value.

Equation (6) shows one example of a market rule, namely the rule used in the PJM real-time energy market in the United States. First, revenue is granted equal to the dispatched energy multiplied by the price per unit of energy. A penalty equal to the bid error multiplied by the per-error penalty is then imposed, where a constant sets the ratio of the penalty to the revenue. In addition, because a battery is used, a battery usage cost is charged. When the SoC is within the range in which degradation is not severe, battery degradation is known to be nearly proportional to the charge/discharge power, so the cost is proportional to the charged and discharged energy, with a constant representing the degradation cost per unit of charge/discharge power. The final revenue is therefore the revenue for the dispatched energy minus the error penalty and the battery degradation cost. Other market rules can be applied in the same way by expressing, as in Equation (6), how revenue depends on the bid value. Energy markets generally have fluctuating energy prices and impose penalties for bid errors or grant incentives for bid accuracy, so under any market rule the bid value and the charge/discharge control ratio can be determined so as to exploit both arbitrage and bid error correction.
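The revenue rule described above can be sketched as follows. The structure (sales minus error penalty minus degradation cost) follows the text; the assumption that the per-unit penalty is a fixed fraction `penalty_ratio` of the energy price, and all names and default values, are hypothetical.

```python
def final_revenue(generation, bid, charge, discharge, price,
                  penalty_ratio=0.5, degradation_cost=1.0):
    """Final revenue for one time slot under the described market rule."""
    # Dispatched power: generation minus charge plus discharge.
    dispatched = generation - charge + discharge
    sales = price * dispatched
    # Assumption: penalty per unit of error is a fixed fraction of the price.
    penalty = penalty_ratio * price * abs(bid - dispatched)
    # Degradation cost is proportional to charged plus discharged energy.
    battery_cost = degradation_cost * (charge + discharge)
    return sales - penalty - battery_cost


# Charging 1 unit turns a bid error of 1 into 0, at a small battery cost.
print(final_revenue(10.0, 9.0, charge=1.0, discharge=0.0, price=50.0,
                    degradation_cost=2.0))
```

Swapping in another market rule would mean replacing only the body of this function, which mirrors the remark that other rules apply once revenue is expressed in terms of the bid value.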

Ultimately, the objective function is given by Equation (1): determine the bid value and the charge/discharge control ratio that maximize the sum of revenues under the distribution of future energy generation and energy prices. The discount factor lowers the weight of revenue the further in the future it occurs. All future revenues must be considered because Equation (4) introduces time coupling.
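For a single realized trajectory, the discounted sum inside the objective can be computed as below. This is a minimal sketch; `gamma` is an assumed name for the discount factor, and the expectation over future generation and prices is not shown.

```python
def discounted_revenue(revenues, gamma=0.99):
    """Sum of per-slot revenues, each weighted by gamma ** k."""
    total = 0.0
    for k, revenue in enumerate(revenues):
        # Revenue k slots ahead counts for less when gamma < 1.
        total += (gamma ** k) * revenue
    return total


print(discounted_revenue([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
```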

Applying Deep Reinforcement Learning

Future energy generation and energy prices are uncertain quantities that are not currently known, so a machine learning algorithm based on past data is required. Deep learning algorithms, which can produce meaningful forecasts even from raw data, are widely used to predict such uncertain quantities, and embodiments of the present invention may likewise be based on deep learning. Furthermore, the time coupling introduced by Equation (4) means that the problem to be solved is a sequential decision problem in which the bid value and the charge/discharge control ratio must be decided one time slot after another. This calls for a reinforcement learning algorithm based on a Markov Decision Process (MDP). Accordingly, deep reinforcement learning, which builds on deep learning and reinforcement learning, may be applied in embodiments of the present invention.

An MDP consists of a state, an action, and a reward. Under time-series uncertainty, the state consists of time-series data of observations. In the present invention, these quantities are defined as follows.

The energy generation and energy price in a given time slot are uncertain quantities that cannot be observed in advance. The observation therefore contains the previously observed generation and price instead. The state consists of the time series of these observations, and the action is determined from it. Because the agent must decide the bid value and the charge/discharge control ratio, the action consists of these two values. Since the reward is determined by the agent's objective, it can be set to the revenue. However, the revenue also depends on the energy generation and the energy price, which can reduce learning efficiency. For example, when generation or price is low, revenue is inevitably low, but the learning algorithm may wrongly attribute the low revenue to a poor decision. Conversely, when generation or price is high, revenue can be high even after a wrong decision, and the learning algorithm may wrongly conclude that it decided correctly. The reward function used for training can therefore be redefined as the difference between the revenue and a baseline revenue. The reward then rises or falls because of the agent's decisions rather than at random.
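The effect of subtracting a baseline can be illustrated with two time slots. The definition of the baseline revenue does not survive in this text, so the one used here (selling exactly the generated energy with no battery use and no bid error) is purely a hypothetical stand-in, as are all the numbers.

```python
def baseline_revenue(price, generation):
    # Hypothetical baseline: sell what is generated, no battery, no bid error.
    return price * generation


def shaped_reward(revenue, price, generation):
    """Reward as revenue relative to the baseline for the same slot."""
    return revenue - baseline_revenue(price, generation)


# Low-price slot with a good decision versus high-price slot with a poor one.
good_low = shaped_reward(revenue=55.0, price=10.0, generation=5.0)
poor_high = shaped_reward(revenue=480.0, price=100.0, generation=5.0)
print(good_low, poor_high)
```

Raw revenue would rank the high-price slot far above the low-price one, while the shaped reward ranks the good decision higher, which is the behavior the paragraph above argues for.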

Because the components of the action, the bid value and the charge/discharge control ratio, are continuous values, policy-based reinforcement learning is used. A policy indicates which action to take in which state; it gives the probability that a given action is taken in a given state. Policy-based reinforcement learning models the policy with parameters and finds the parameters that maximize the objective. Here the parameters are those of the deep learning model.

The input of the deep learning model is the state and the output is the policy. The policy is generally modeled as a Gaussian distribution, and the model outputs the mean vector of that distribution (here a two-dimensional vector, since the action has two components: the bid value and the charge/discharge control ratio). The covariance matrix of the Gaussian distribution is set as a hyperparameter and adjusted during model training. Because there are two components, the 2×2 identity matrix multiplied by the configured variance hyperparameter is used as the covariance matrix. The action is sampled from the policy.
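Sampling an action from such a policy can be sketched with the standard library alone. Because the covariance is a scaled identity matrix, the two components are independent and can be drawn separately; the function name, the example mean, and the variance value are assumptions for illustration.

```python
import random


def sample_action(mean, variance, rng=random):
    """Draw (bid value, control ratio) from N(mean, variance * I)."""
    std = variance ** 0.5
    # A scaled identity covariance means each component is drawn independently.
    return tuple(rng.gauss(m, std) for m in mean)


rng = random.Random(0)                # seeded only to make the sketch repeatable
policy_mean = (8.0, 0.5)              # hypothetical model output
action = sample_action(policy_mean, variance=0.01, rng=rng)
print(action)
```

Lowering the variance hyperparameter during training narrows exploration around the mean; with zero variance the sampled action is the mean itself.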

Rewriting Equation (1), the problem to be solved, in reinforcement learning form gives the following.

This differs from Equation (1) in two respects: the reward is used in the objective function in place of the revenue, and instead of deciding the action for every time slot, a policy is determined that indicates which action to take in each state. The reward is used in place of the revenue to improve learning efficiency, and determining the policy is equivalent to determining the action for every time slot, so the aim of Equation (1) is achieved. Any policy-based reinforcement learning algorithm capable of continuous control can be used to solve Equation (11).

FIG. 2 is a diagram for explaining an exemplary method of bidding renewable energy with uncertain generation into an energy market.

FIG. 2 shows an example (the U.S. PJM market) of the process for bidding renewable energy, which is well known for its generation uncertainty, into an energy market. The renewable energy provider decides the bid value and the charge/discharge control ratio for a time slot in advance, after which the generation of the renewable plant, such as a solar or wind plant, is realized. Once the charge and discharge power are determined from these three values, the amount of power dispatched and the battery usage cost are determined. Revenue is granted according to the amount of power dispatched, and a penalty is imposed for the bid error arising from the difference between the bid value and the dispatched amount.

FIG. 3 is a diagram for explaining a deep reinforcement learning-based bidding algorithm according to an embodiment of the present invention.

FIG. 3 schematically illustrates a deep reinforcement learning-based bidding algorithm according to an embodiment of the present invention. First, a Long Short-Term Memory (LSTM) network is used as the deep learning model to learn the time-series data. Because the LSTM passes two vectors on to the next LSTM cell in time order, all observations up to the current time slot are used, which amounts to feeding in the state. Given the LSTM parameters, the mean of the policy is output. Together with the policy covariance matrix set as a hyperparameter, this defines a Gaussian distribution, and the action is sampled from it. Another output of the model is the estimated value function, which is used to update the model. After the battery has been charged or discharged to obtain the arbitrage and bid error correction effects, the reward is determined from the final revenue. The model is updated using the reward and the estimated value function, and the next state is determined from the battery value after charging or discharging and the energy generation and energy price observed in the next time slot.
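How the next state is assembled from successive observations can be sketched with a simple buffer. This is an assumption-laden illustration: the class name, the fixed window length, and the tuple layout are hypothetical, and a bounded window is used here only for simplicity, whereas the LSTM described above carries information from all earlier observations.

```python
from collections import deque


class ObservationHistory:
    """Keeps recent observations; the buffer contents stand in for the state."""

    def __init__(self, window=24):
        self._buffer = deque(maxlen=window)  # oldest entries drop out first

    def push(self, generation, price, stored_energy):
        # One observation: last observed generation and price, plus the
        # battery value after charging or discharging.
        self._buffer.append((generation, price, stored_energy))

    def state(self):
        return list(self._buffer)


history = ObservationHistory(window=2)
history.push(10.0, 50.0, 5.0)
history.push(12.0, 45.0, 5.9)
history.push(9.0, 60.0, 4.8)
print(history.state())
```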

FIG. 4 is a diagram for explaining the results of a bidding simulation performed based on embodiments of the present invention.

FIG. 4 shows the changes in renewable energy source (RES) generation, energy price, and energy stored in the battery when a bidding simulation was run with embodiments of the present invention. The simulation covered solar and wind power, both of which have uncertain generation. Overall, the method tends to overbid when the energy price is high and much energy is stored in the battery, and to underbid when the energy price is low and little energy is stored.

FIG. 5 is a block diagram for explaining a computing device for implementing an energy bidding method according to embodiments of the present invention.

Referring to FIG. 5, an energy bidding method according to embodiments of the present invention may be implemented using a computing device (500).

The computing device (500) may include at least one of a processor (510), a memory (530), a user interface input device (540), a user interface output device (550), and a storage device (560), which communicate via a bus (520). The computing device (500) may also include a network interface (570) electrically connected to a network (40), for example a wireless network. The network interface (570) may transmit signals to or receive signals from other entities via the network (40).

The processor (510) may be implemented in various forms, such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU), and may be any semiconductor device that executes instructions stored in the memory (530) or the storage device (560). The processor (510) may be configured to implement the functions and methods described with reference to FIGS. 1 to 4.

The memory (530) and the storage device (560) may include various forms of volatile or non-volatile storage media. For example, the memory may include a read-only memory (ROM) (531) and a random access memory (RAM) (532). In embodiments of the present invention, the memory (530) may be located inside or outside the processor (510) and may be connected to the processor (510) by various known means.

In addition, at least part of the energy bidding method according to embodiments of the present invention may be implemented as a program or software executed on the computing device (500), and the program or software may be stored on a computer-readable medium.

In addition, at least part of the energy bidding method according to embodiments of the present invention may be implemented as hardware that can be electrically connected to the computing device (500).

According to the embodiments of the present invention described above, a bidding strategy is provided that simultaneously determines a bid value for reducing the error penalty and a charge/discharge amount for increasing profit through arbitrage. Unlike existing approaches that focus on only one of arbitrage or bidding error correction, this strategy can exploit the advantages of both and thereby ultimately maximize the final bidding profit.
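As a rough illustration of how the two mechanisms could combine in a single time slot, the following Python sketch applies bidding error correction first and then shifts the battery flow by an adjustment ratio. It is only a sketch under stated assumptions: the names `Battery`, `clamp_to_battery`, and `settle_slot`, the rule that the ratio scales the battery's maximum power, and the penalty formula are all hypothetical and are not taken from this document.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    capacity: float   # usable energy capacity (hypothetical unit: kWh)
    max_power: float  # max energy charged or discharged per slot (kWh)
    energy: float     # energy currently stored (kWh)


def clamp_to_battery(request: float, battery: Battery) -> float:
    """Clamp a signed request (+ charge, - discharge) to the battery limits."""
    upper = min(battery.max_power, battery.capacity - battery.energy)
    lower = -min(battery.max_power, battery.energy)
    return max(lower, min(upper, request))


def settle_slot(bid: float, generation: float, ratio: float,
                price: float, penalty_rate: float, battery: Battery):
    """Settle one time slot and return (delivered_energy, profit).

    Error correction: if the bid is below the generation (under-prediction),
    the surplus is charged; if it is above (over-prediction), the shortfall
    is discharged.
    Arbitrage (assumed rule): the flow is then shifted by
    ratio * max_power, with ratio in [-1, 1].
    """
    correction = generation - bid              # > 0 charge, < 0 discharge
    request = correction + ratio * battery.max_power
    flow = clamp_to_battery(request, battery)  # what the battery can really do
    battery.energy += flow

    delivered = generation - flow              # energy actually sent to market
    error = abs(delivered - bid)               # remaining bidding error
    # Assumed settlement: revenue minus a penalty proportional to the error.
    profit = price * delivered - penalty_rate * price * error
    return delivered, profit
```

With `ratio = 0` the function reduces to pure bidding error correction; a negative ratio discharges more when prices are high, and a positive ratio charges more when prices are low, at the cost of a larger bidding error.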

Although embodiments of the present invention have been described in detail above, the scope of rights of the present invention is not limited thereto; various modifications and improvements made by those of ordinary skill in the art to which the present invention pertains, using the basic concept of the present invention defined in the following claims, also fall within the scope of rights of the present invention.

Claims

Translated from Korean

An energy bidding method performed by a computing device including a processor and a memory, the method comprising:
determining, by the processor, a future energy generation amount and an energy price using a deep learning algorithm;
determining, by the processor, a bid value and a charge/discharge adjustment ratio using a reinforcement learning algorithm; and
performing, by the processor, energy bidding based on the determined values.

The energy bidding method of claim 1, further comprising:
for a bidding error correction scheme, charging, by the processor, by the corresponding amount in the case of under-prediction and discharging by the corresponding amount in the case of over-prediction.

The energy bidding method of claim 2, further comprising:
for an arbitrage scheme, additionally adjusting, by the processor, the charge/discharge amount determined therein by the charge/discharge adjustment ratio.

The energy bidding method of claim 1,
wherein the reinforcement learning algorithm includes a reinforcement learning algorithm based on a Markov decision process (MDP).

The energy bidding method of claim 4,
wherein the MDP includes a state, an action, and a reward, and, in a situation with time-series uncertainty, includes time-series data of observations.

The energy bidding method of claim 1,
wherein a long short-term memory (LSTM) model, among deep learning models, is used to learn time-series data.

The energy bidding method of claim 1,
wherein an estimated value function is used for model updates.

The energy bidding method of claim 1, further comprising:
determining, by the processor, a reward using the final profit after obtaining the arbitrage and bidding error correction effects through battery charging and discharging; and
updating, by the processor, the model with the reward and an estimated value function, and determining a next state from the value after battery charging and discharging and from the energy generation amount and the energy price observed in the next time period.
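
The loop recited in the last claim (a reward taken from the final profit, a model update driven by the reward and an estimated value function, and a next state built from the post-charge battery value and the next observations) resembles a temporal-difference update. The Python sketch below shows one such update under simplifying assumptions that are not from this document: a linear value function over a flat three-element state, a discount factor `gamma`, a learning rate `alpha`, and the hypothetical helper names `value`, `next_state`, and `td_update`. The document itself mentions an LSTM over time-series observations, which this sketch does not model.

```python
from typing import List, Sequence, Tuple


def value(weights: Sequence[float], state: Sequence[float]) -> float:
    """Estimated value of a state under a linear value function (assumption)."""
    return sum(w * x for w, x in zip(weights, state))


def next_state(battery_energy_after: float,
               next_generation: float,
               next_price: float) -> List[float]:
    """Build the next state from the battery value after charging or
    discharging and the generation and price observed in the next slot."""
    return [battery_energy_after, next_generation, next_price]


def td_update(weights: Sequence[float],
              state: Sequence[float],
              reward: float,
              state_next: Sequence[float],
              gamma: float = 0.99,
              alpha: float = 0.01) -> Tuple[List[float], float]:
    """One TD(0) step: move the value estimate toward reward + gamma * V(s')."""
    td_error = reward + gamma * value(weights, state_next) - value(weights, state)
    # Gradient of a linear value function with respect to its weights is the state.
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, state)]
    return new_weights, td_error
```

In an actor-critic setup the same `td_error` could also scale the policy update for the bid value and the adjustment ratio; whether the claimed method does so is not stated here.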