Bayesian Optimization Acquisition Function | Bayesian Optimization – Math and Algorithm Explained

Looking for the topic “bayesian optimization acquisition function – Bayesian Optimization – Math and Algorithm Explained”? This post from ppa.diaochoangduong.vn (category: https://ppa.diaochoangduong.vn/blog/) collects answers below. The article, by Machine Learning Mastery, has 16,373 views and 955521 likes.

Details on the topic: Bayesian Optimization – Math and Algorithm Explained

Learn the algorithm behind Bayesian optimization, the surrogate function calculations, and the acquisition function (Upper Confidence Bound). Visualize a from-scratch implementation to see how the approximation works iteratively. Finally, understand how to use the scikit-optimize package to do hyperparameter tuning with Bayesian optimization.

Further reading on the topic bayesian optimization acquisition function:

Bayesian Optimization Acquisition Functions

Bayesian optimization proceeds by maintaining a probabilistic belief about f and designing a so-called acquisition function to determine where to evaluate …

Source: www.cse.wustl.edu

Date Published: 11/16/2022

View: 1118

Maximizing acquisition functions for Bayesian optimization

Bayesian optimization is a sample-efficient approach to global optimization that relies on theoretically motivated value heuristics (acquisition functions) …

Source: papers.neurips.cc

Date Published: 5/28/2021

View: 1775

Acquisition functions – tidymodels/tune

Acquisition functions are mathematical techniques that guide how the parameter space should be explored during Bayesian optimization.

Source: tune.tidymodels.org

Date Published: 7/16/2021

View: 1838

Bayesian Optimization Algorithm – MATLAB & Simulink

The ‘probability-of-improvement’ acquisition function makes a similar, but simpler, calculation as ‘expected-improvement’. In both cases, bayesopt first …

Source: www.mathworks.com

Date Published: 12/22/2022

View: 4737

Why do we use Acquisition Functions? – Cross Validated

The purpose of Bayesian optimization is to find global minima of functions that have many local …

Source: stats.stackexchange.com

Date Published: 11/11/2021

View: 1529

Augmenting Acquisition Functions with User Beliefs for … – arXiv

Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) …

Source: arxiv.org

Date Published: 4/5/2021

View: 5008

A new acquisition function for Bayesian optimization based on …

The global optimum is approached by iteratively maximizing a so-called acquisition function, that balances the exploration and exploitation effect of the search …

Source: ieeexplore.ieee.org

Date Published: 6/27/2022

View: 408

How to Implement Bayesian Optimization from Scratch in Python

The acquisition function is responsible for scoring or estimating the likelihood that a given candidate sample (input) is worth evaluating with …

Source: machinelearningmastery.com

Date Published: 11/14/2021

View: 6648

[Image: Bayesian Optimization – Math and Algorithm Explained]

Article details for the topic bayesian optimization acquisition function

  • Author: Machine Learning Mastery
  • Views: 16,373
  • Likes: 955521
  • Date Published: 2021. 5. 31.
  • Video Url link: https://www.youtube.com/watch?v=ECNU4WIuhSE

Acquisition functions

Source: vignettes/acquisition_functions.Rmd

Acquisition functions are mathematical techniques that guide how the parameter space should be explored during Bayesian optimization. They use the predicted mean and predicted variance generated by the Gaussian process model. For a set of such predictions on a set of candidate parameter sets, an acquisition function combines the means and variances into a criterion that will direct the search.

The variance term that is generated by the Gaussian process model usually reflects the spatial aspects of the data. Candidate sets with high variance are not near any existing parameter values (i.e. those that have observed performance estimates). The predicted variance is very close to zero at or very near to an existing result.

There is usually a trade-off between two strategies:

  • Exploitation focuses on results in the vicinity of the current best results by penalizing for higher variance values.

  • Exploration pushes the search towards unexplored regions.

The acquisition functions themselves have quasi-tuning parameters that are usually trade-offs between exploitation and exploration. For example, if the performance measure being used should be maximized (e.g. accuracy, the area under the ROC curve, etc.), then one acquisition function would be a lower confidence bound \(L = \mu - C \times \sigma\). The multiplier \(C\) would be used to penalize based on the predicted standard error (\(\sigma\)) of different parameter combinations. Note that the acquisition function is not the performance measure, but a function of what metric is used to evaluate the model.
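As a rough illustration of how the multiplier \(C\) shifts the choice, here is a minimal Python sketch. The three candidates and their predicted means and standard errors are made-up numbers, and the helper names `confidence_bound` and `pick` are hypothetical, not part of any library.

```python
def confidence_bound(mu, sigma, c):
    """Lower confidence bound L = mu - c * sigma for one candidate."""
    return mu - c * sigma


# Hypothetical GP predictions: (predicted mean accuracy, predicted standard error).
candidates = {
    "A": (0.90, 0.08),  # best mean, but uncertain
    "B": (0.88, 0.01),  # slightly lower mean, very certain
    "C": (0.80, 0.02),
}


def pick(c):
    """Return the candidate name with the largest bound for multiplier c."""
    return max(candidates, key=lambda name: confidence_bound(*candidates[name], c))


print(pick(0.0))  # no penalty on uncertainty: the highest mean wins
print(pick(1.0))  # uncertainty is penalized: the more certain candidate wins
```

With \(C = 0\) the criterion reduces to the predicted mean; increasing \(C\) favors candidates whose predictions are more certain.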

One of the most common acquisition functions is the expected improvement. Based on basic probability theory, this can be computed relative to the current estimate of the optimal performance. Suppose that our performance metric should be maximized (e.g. accuracy, area under the ROC curve, etc.). For any tuning parameter combination \(\theta\), we have the predicted mean and standard error of that metric (call those \(\mu(\theta)\) and \(\sigma(\theta)\)). From previous data, the best (mean) performance value was \(m_{opt}\). The expected improvement is determined using:

\[ \begin{align}
EI(\theta; m_{opt}) &= \delta(\theta) \Phi\left(\frac{\delta(\theta)}{\sigma(\theta)}\right) + \sigma(\theta) \phi\left(\frac{\delta(\theta)}{\sigma(\theta)}\right) \notag \\
&\text{where} \notag \\
\delta(\theta) &= \mu(\theta) - m_{opt} \notag
\end{align} \]

The function \(\Phi(\cdot)\) is the cumulative standard normal and \(\phi(\cdot)\) is the standard normal density.

The value \(\delta(\theta)\) measures how close we are (on average) to the current best performance value. When new candidate tuning parameters are needed, the space of \(\theta\) is searched for the value that maximizes the expected improvement.
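The expected improvement formula above can be sketched in a few lines of Python using only the standard library. The function names here are illustrative, and the handling of \(\sigma(\theta) = 0\) (falling back to the positive part of \(\delta\)) is an assumption added to avoid division by zero.

```python
import math


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expected_improvement(mu, sigma, m_opt):
    """EI for a metric that should be maximized.

    mu, sigma: predicted mean and standard error at one candidate.
    m_opt: best performance value seen so far.
    """
    delta = mu - m_opt
    if sigma <= 0.0:
        # Assumption: with no predictive uncertainty, EI is just max(delta, 0).
        return max(delta, 0.0)
    z = delta / sigma
    return delta * norm_cdf(z) + sigma * norm_pdf(z)


# Same predicted mean below the current best, different uncertainty:
print(expected_improvement(0.80, 0.10, 0.90))
print(expected_improvement(0.80, 0.01, 0.90))
```

For a candidate whose mean is below the current best, a larger \(\sigma(\theta)\) yields a larger expected improvement, which is how the criterion rewards exploration.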

Suppose a single parameter were being optimized and that parameter was represented using a log10 transformation. Using resampling, suppose the accuracy results for three points were evaluated:

In the first iteration of Bayesian optimization, these three data points are given to the Gaussian process model to produce predictions across a wider range of values. The fitted curve (i.e. \(\mu(\theta)\)) is shown on the top panel below, along with approximate 95% credible intervals \(\mu(\theta) \pm 1.96 \sigma(\theta)\):

Notice that the interval width is large in regions far from observed data points.

The bottom panel shows the expected improvement across the range of candidate values. Of the observed points, the expected improvement is largest near the middle point. This is because the first term in the equation above (with the \(\delta(\theta)\) coefficient) is very large while the second term (with the coefficient \(\sigma(\theta)\)) is virtually zero. This focus on the mean portion will keep the search mostly in the region of the best performance.

Using these results, the parameter value with the largest improvement is then evaluated using cross-validation. The GP model is then updated and a new parameter is chosen and so on.
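The evaluate, update, and choose-again cycle described above can be sketched as a small loop. This is only a toy illustration under several assumptions: a one-dimensional search space on a fixed grid, a hypothetical `objective` standing in for the resampled performance estimate, and a bare-bones Gaussian process with a squared-exponential kernel and fixed length scale (not the model any particular package fits).

```python
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel (an assumed choice for this sketch)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(X, y, x_new, jitter=1e-6):
    """Posterior mean and standard deviation at x_new (constant-mean GP)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k = [rbf(a, x_new) for a in X]
    mean_y = sum(y) / len(y)
    alpha = solve(K, [v - mean_y for v in y])
    v = solve(K, k)
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, m_opt):
    delta = mu - m_opt
    if sigma <= 0.0:
        return max(delta, 0.0)
    z = delta / sigma
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return delta * cdf + sigma * pdf


def objective(x):
    """Hypothetical stand-in for a resampled performance estimate (maximized)."""
    return -(x - 0.3) ** 2


def bayes_opt(n_iter=8):
    grid = [i / 20 - 1.0 for i in range(41)]  # candidate values in [-1, 1]
    X = [grid[0], grid[20], grid[40]]         # three initial results
    y = [objective(x) for x in X]
    for _ in range(n_iter):
        m_opt = max(y)
        candidates = [g for g in grid if g not in X]
        x_next = max(candidates,
                     key=lambda g: expected_improvement(*gp_predict(X, y, g), m_opt))
        X.append(x_next)            # evaluate the most promising candidate
        y.append(objective(x_next)) # then the GP is refit on the next pass
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], y[best], X


best_x, best_y, visited = bayes_opt()
print(best_x, best_y, len(visited))
```

The grid search over candidates is a simplification; real implementations typically use a larger candidate set or a numerical optimizer to maximize the acquisition function.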

The results at iteration 20 were:

The points shown on the graph indicate that there is a region in the neighborhood of 0.01 that appears to produce the best results, and that the expected improvement function has driven the optimization to focus on this region.

When using expected improvement, the primary method for compromising between exploitation and exploration is the use of a “trade-off” value. This value is the amount of performance (in the original units) that can be sacrificed when computing the improvement. This has the effect of down-playing the contribution of the mean effect in the computations. For a trade-off value \(\tau\), the equation above uses:

\[ \delta(\theta) = \mu(\theta) - m_{opt} - \tau \]

Suppose that we were willing to trade off \(\tau = 0.05\)% of the predicted accuracy during the search. Using the same three initial results, the procedure would end up in the same general location but would have explored more values across the total range:
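To see how a trade-off value changes which candidate wins, here is a small Python sketch. The two candidates, the current best of 0.90, and the value \(\tau = 0.05\) (taken here in the metric's own units) are illustrative assumptions, not the numbers behind the figures.

```python
import math


def expected_improvement(mu, sigma, m_opt, tau=0.0):
    """EI with a trade-off value: delta = mu - m_opt - tau."""
    delta = mu - m_opt - tau
    z = delta / sigma
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return delta * cdf + sigma * pdf


m_opt = 0.90  # hypothetical current best accuracy

# Hypothetical predictions: (mean, standard error).
candidates = {
    "near_best": (0.905, 0.005),  # close to the current best, little uncertainty
    "far_away": (0.850, 0.050),   # worse mean, much more uncertainty
}


def pick(tau):
    """Candidate with the largest expected improvement for a given tau."""
    return max(candidates,
               key=lambda name: expected_improvement(*candidates[name], m_opt, tau))


print(pick(0.0))   # without a trade-off, the candidate near the best wins
print(pick(0.05))  # with the trade-off, the uncertain candidate wins
```

Subtracting \(\tau\) shrinks the mean term for every candidate, so the \(\sigma(\theta)\) term carries relatively more weight and the search explores more.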

There are two main strategies for dynamic trade-offs during the optimization:

Bayesian Optimization (Part 1)

This is the first blog post of a series that introduces the concept of Bayesian optimization (BO). BO is part of the analysis and optimization toolkit of JCMsuite and is regularly used to optimize photonic structures. The optimization of photonic structures is in general very challenging, mainly because of two reasons:

  • The behavior of the objective functions is only known implicitly because their evaluations require the full solution of Maxwell’s equations. That is, one has to optimize an expensive black-box function.

  • The optical behavior of small photonic structures (e.g. the scattering in a certain direction) is dominated by diffraction, interference and resonance phenomena. This often leads to a highly oscillatory behavior of the objective function.

It turns out that BO is a very efficient method for the optimization of such expensive black-box functions. That is, compared with many other optimization methods, it requires a much smaller number of function evaluations in order to find the global optimum or a very good local optimum of the objective function. In many scenarios, this can save many hours to days of computation time.

Bayesian Optimization Algorithm

The Bayesian optimization algorithm attempts to minimize a scalar objective function f(x) for x in a bounded domain. The function can be deterministic or stochastic, meaning it can return different results when evaluated at the same point x. The components of x can be continuous reals, integers, or categorical, meaning a discrete set of names.

Throughout this discussion, D represents the number of components of x.

The key elements in the minimization include:

  • A Bayesian update procedure for modifying the Gaussian process model at each new evaluation of f(x).

  • An acquisition function a(x) (based on the Gaussian process model of f) that you maximize to determine the next point x for evaluation. For details, see Acquisition Function Types and Acquisition Function Maximization.

The steps of the algorithm include:

  • Evaluate \(y_i = f(x_i)\) for NumSeedPoints points \(x_i\), taken at random within the variable bounds. NumSeedPoints is a bayesopt setting. If there are evaluation errors, take more random points until there are NumSeedPoints successful evaluations. The probability distribution of each component is either uniform or log-scaled, depending on the Transform value in optimizableVariable.

  • Update the Gaussian process model of f(x) to obtain a posterior distribution over functions \(Q(f \mid x_i, y_i \text{ for } i = 1, \dots, t)\). (Internally, bayesopt uses fitrgp to fit a Gaussian process model to the data.)

The algorithm stops after reaching any of its stopping criteria.

The underlying probabilistic model for the objective function f is a Gaussian process prior with added Gaussian noise in the observations. So the prior distribution on f(x) is a Gaussian process with mean \(\mu(x; \theta)\) and covariance kernel function \(k(x, x'; \theta)\). Here, \(\theta\) is a vector of kernel parameters. For the particular kernel function bayesopt uses, see Kernel Function.

In a bit more detail, denote a set of points \(X = x_i\) with associated objective function values \(F = f_i\). The prior’s joint distribution of the function values F is multivariate normal, with mean \(\mu(X)\) and covariance matrix \(K(X, X)\), where \(K_{ij} = k(x_i, x_j)\).

Without loss of generality, the prior mean is given as 0.

Also, the observations are assumed to have added Gaussian noise with variance \(\sigma^2\). So the prior distribution has covariance \(K(X, X; \theta) + \sigma^2 I\).

Fitting a Gaussian process regression model to observations consists of finding values for the noise variance \(\sigma^2\) and kernel parameters \(\theta\). This fitting is a computationally intensive process performed by fitrgp.

The kernel function \(k(x, x'; \theta)\) can significantly affect the quality of a Gaussian process regression. bayesopt uses the ARD Matérn 5/2 kernel defined in Kernel (Covariance) Function Options.
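For concreteness, here is a small Python sketch of an ARD Matérn 5/2 kernel and of the noisy prior covariance \(K(X, X; \theta) + \sigma^2 I\). It uses the standard textbook form of the kernel with one length scale per input dimension; whether this parametrization matches bayesopt exactly is an assumption, and the function names are hypothetical.

```python
import math


def matern52_ard(x, x2, length_scales, sigma_f=1.0):
    """ARD Matern 5/2 kernel between two points given as sequences.

    length_scales: one positive length scale per input dimension (the ARD part).
    sigma_f: signal standard deviation.
    """
    r2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x, x2, length_scales))
    s = math.sqrt(5.0 * r2)
    return sigma_f ** 2 * (1.0 + s + 5.0 * r2 / 3.0) * math.exp(-s)


def noisy_covariance(X, length_scales, sigma_f, noise_sd):
    """Prior covariance of noisy observations: K(X, X) + noise_sd**2 * I."""
    n = len(X)
    return [
        [
            matern52_ard(X[i], X[j], length_scales, sigma_f)
            + (noise_sd ** 2 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three hypothetical 2-D points; the second dimension has a longer length scale.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
K = noisy_covariance(X, length_scales=(1.0, 5.0), sigma_f=2.0, noise_sd=0.1)
for row in K:
    print([round(v, 4) for v in row])
```

Because the second dimension has the longer length scale, the third point is more strongly correlated with the first than the second point is, which is the sense in which ARD lets each dimension have its own relevance.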

Acquisition Function Types

Six choices of acquisition functions are available for bayesopt. There are three basic types, with expected-improvement also modified by per-second or plus:

  • ‘expected-improvement-per-second-plus’ (default)

  • ‘expected-improvement’

  • ‘expected-improvement-plus’

  • ‘expected-improvement-per-second’

  • ‘lower-confidence-bound’

  • ‘probability-of-improvement’

The acquisition functions evaluate the “goodness” of a point x based on the posterior distribution function Q. When there are coupled constraints, including the Error constraint (see Objective Function Errors), all acquisition functions modify their estimate of “goodness” following a suggestion of Gelbart, Snoek, and Adams [2]. Multiply the “goodness” by an estimate of the probability that the constraints are satisfied, to arrive at the acquisition function.

Expected Improvement

The ‘expected-improvement’ family of acquisition functions evaluates the expected amount of improvement in the objective function, ignoring values that cause an increase in the objective. In other words, define:

  • \(x_{best}\) as the location of the lowest posterior mean.

  • \(\mu_Q(x_{best})\) as the lowest value of the posterior mean.

Then the expected improvement is

\[ EI(x, Q) = E_Q\left[\max\left(0, \mu_Q(x_{best}) - f(x)\right)\right]. \]

Probability of Improvement

The ‘probability-of-improvement’ acquisition function makes a similar, but simpler, calculation as ‘expected-improvement’. In both cases, bayesopt first calculates \(x_{best}\) and \(\mu_Q(x_{best})\). Then for ‘probability-of-improvement’, bayesopt calculates the probability PI that a new point x leads to a better objective function value, modified by a “margin” parameter m:

\[ PI(x, Q) = P_Q\left(f(x) < \mu_Q(x_{best}) - m\right). \]

bayesopt takes m as the estimated noise standard deviation. bayesopt evaluates this probability as

\[ PI = \Phi\left(\nu_Q(x)\right), \quad \text{where} \quad \nu_Q(x) = \frac{\mu_Q(x_{best}) - m - \mu_Q(x)}{\sigma_Q(x)}. \]

Here \(\Phi(\cdot)\) is the unit normal CDF, and \(\sigma_Q\) is the posterior standard deviation of the Gaussian process at x.

Lower Confidence Bound

The ‘lower-confidence-bound’ acquisition function looks at the curve G two standard deviations below the posterior mean at each point:

\[ G(x) = \mu_Q(x) - 2\sigma_Q(x). \]

G(x) is the \(2\sigma_Q\) lower confidence envelope of the objective function model. bayesopt then maximizes the negative of G:

\[ LCB = 2\sigma_Q(x) - \mu_Q(x). \]

Per Second

Sometimes, the time to evaluate the objective function can depend on the region. For example, many Support Vector Machine calculations vary in timing a good deal over certain ranges of points. If so, bayesopt can obtain better improvement per second by using time-weighting in its acquisition function. The cost-weighted acquisition functions have the phrase per-second in their names.

These acquisition functions work as follows. During the objective function evaluations, bayesopt maintains another Bayesian model of objective function evaluation time as a function of position x. The expected improvement per second that the acquisition function uses is

\[ EIpS(x) = \frac{EI_Q(x)}{\mu_S(x)}, \]

where \(\mu_S(x)\) is the posterior mean of the timing Gaussian process model.
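The minimization-oriented formulas in this section can be sketched in Python as follows. This is not bayesopt's implementation: the function names are made up, and the closed form used for expected improvement assumes that the posterior at x is Gaussian with mean `mu` and standard deviation `sigma`.

```python
import math


def norm_cdf(z):
    """Unit normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    """Unit normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probability_of_improvement(mu, sigma, mu_best, margin):
    """PI = Phi(nu), with nu = (mu_best - margin - mu) / sigma."""
    nu = (mu_best - margin - mu) / sigma
    return norm_cdf(nu)


def lower_confidence_bound(mu, sigma):
    """LCB = 2 * sigma - mu, the negative of the 2-sigma lower envelope."""
    return 2.0 * sigma - mu


def expected_improvement_min(mu, sigma, mu_best):
    """E[max(0, mu_best - f(x))] for f(x) ~ Normal(mu, sigma**2) (assumed)."""
    d = mu_best - mu
    z = d / sigma
    return d * norm_cdf(z) + sigma * norm_pdf(z)


def ei_per_second(mu, sigma, mu_best, mean_seconds):
    """Expected improvement divided by the predicted evaluation time."""
    return expected_improvement_min(mu, sigma, mu_best) / mean_seconds


# Hypothetical posterior at one point x, with the best posterior mean at 1.0:
print(probability_of_improvement(mu=1.2, sigma=0.5, mu_best=1.0, margin=0.1))
print(lower_confidence_bound(mu=1.2, sigma=0.5))
print(ei_per_second(mu=1.2, sigma=0.5, mu_best=1.0, mean_seconds=30.0))
```

All three quantities are maximized over x; dividing by the predicted evaluation time makes slow regions less attractive unless their expected improvement is correspondingly larger.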

Why do we use Acquisition Functions?

The purpose of Bayesian optimization is to find global minima of functions that have many local minima. More typical optimizers are “local,” in the sense that they follow some procedure until they find the gradient is zero, and then stop, whether or not there is an even lower value elsewhere.

Almost immediately, this simple statement of the problem exposes the core tension that Bayesian optimization is designed to navigate:

  • We might exploit our current best estimate to find a better function value near our current “best estimate” of the lowest value;

  • alternatively, we might explore a region far away from what we’ve already visited to find a better value, but these locations are the ones where we have the least information.

This is really no different than deciding what to make for dinner. You could make the same meal you made last night, and it would probably be about as enjoyable. Alternatively, you could experiment and make a new meal, but that’s a gamble. It could be better, or it could be worse. If you want to optimize your enjoyment of dinner, you’re immediately confronted with a choice about whether you want to do something reliable or take the chance that you might be able to make something better (but, by the same token, it might be worse).

Gaussian processes are flexible in that they can exactly interpolate the observed data, but they also reflect increasing uncertainty about the function value as you move away from the observed values. (GPs are a prior over functions, so the further you move from observed data, the more the behavior becomes dominated by the prior.) If we don’t ever explore areas that are far away from our current best estimate, then it’s possible that we’re skipping over the optimal portion of the space.

Moreover, the surrogate function we estimate using the GP and the observed data is not going to be a perfect representation of the true function under optimization. Especially early in the optimization procedure, the minimum identified in the surrogate is unlikely to correspond to the minimum of the true function.

The purpose of the acquisition function is to assign a numerical value that will govern the tradeoff between exploration and exploitation. We want that numerical value to incorporate both the local information about our estimates of the function values and our uncertainty about those estimates. The acquisition function tells us which function inputs are the most valuable to visit. Acquisition functions are designed to be cheap to compute and to reflect the uncertainty of the surrogate model’s estimates: they summarize the value and uncertainty estimates from the GP into a single value.

Since your bounty asks for an authoritative source, here’s a quote from a peer-reviewed publication:

Using the Gaussian process model, an acquisition function is constructed to represent the most promising setting for the next experiment. Acquisition functions are mainly derived from the $\mu(x)$ and $\sigma(x)$ of the GP model, and are hence cheap to compute. The acquisition function allows a balance between exploitation (sampling where the objective mean $\mu(\cdot)$ is high) and exploration (sampling where the uncertainty $\sigma(\cdot)$ is high), and its global maximizer is used as the next experimental setting.

from S. Greenhill, S. Rana, S. Gupta, P. Vellanki and S. Venkatesh, “Bayesian Optimization for Adaptive Experimental Design: A Review,” in IEEE Access, vol. 8, pp. 13937-13948, 2020, doi: 10.1109/ACCESS.2020.2966228.

The whole article is very accessible and worth reading if you’re interested in Bayesian optimization.

[2204.11051] $π$BO: Augmenting Acquisition Functions with User Beliefs for Bayesian Optimization

This article was compiled from various sources on the internet. We hope you found it useful. If you did, please share it. Thank you very much!
