Autoencoder Anomaly Detection | 88 – Applications of Autoencoders – Anomaly Detection


Video: 88 – Applications of Autoencoders – Anomaly Detection (by DigitalSreeni)

Autoencoders can be used for anomaly detection by setting limits on the reconstruction error. All ‘good’ data points fall within the acceptable error and any outliers are considered anomalies. This approach can be used for images or other forms of data. This video tutorial explains the process using a synthetic dataset stored in a csv file.
The code from this video is available at: https://github.com/bnsreenu/python_for_microscopists

For more detail on autoencoder anomaly detection, see the following articles:

  • Handbook of Anomaly Detection with Python Outlier Detection – towardsdatascience.com (published 2/16/2021): "An autoencoder is a special type of neural network that copies the input values to the output values as shown in Figure (B). It does not require …"
  • Anomaly Detection with Auto-Encoders – www.kaggle.com (published 6/30/2022): covers training the autoencoder, validation of the neural network's ability to generalize, and testing on a mix of fraud and non-fraud data treated like new data.
  • Anomaly Detection using AutoEncoders | A Walk-Through in Python – www.analyticsvidhya.com (published 6/23/2021): "Anomaly detection is the process of finding abnormalities in data. In this post let us dive deep into anomaly detection using autoencoders."
  • Anomaly Detection in Cardio Dataset Using Deep Learning – medium.com (published 4/21/2021): "Autoencoders use the property of a neural network in a special way to accomplish some efficient methods of training networks to learn normal …"
  • Autoencoders for Anomaly Detection (MNIST digit) – wandb.ai (published 10/28/2021): a small experiment applying autoencoders to anomaly detection on the MNIST digit dataset, by Tanmay Mane using W&B.
  • Anomaly Detection Using Autoencoder Reconstruction – www.mdpi.com (published 9/14/2021): "Extreme learning machine boundary (ELM-B) is used to detect anomalies and utilises traditional autoencoders (AEs) configured with start-up …"


Video details

  • Author: DigitalSreeni
  • Views: 12,663
  • Likes: 272
  • Date published: January 20, 2020
  • Video URL: https://www.youtube.com/watch?v=u1vLJBwOFC8

Anomaly Detection using AutoEncoders – A Walk-Through in Python

This article was published as a part of the Data Science Blogathon

Anomaly Detection

Anomaly detection is the process of finding abnormalities in data. Abnormal data is defined as the ones that deviate significantly from the general behavior of the data. Some of the applications of anomaly detection include fraud detection, fault detection, and intrusion detection. Anomaly Detection is also referred to as outlier detection.

Some of the anomaly detection algorithms are,

Local Outlier Factor

Isolation Forest

Connectivity Based Outlier Factor

KNN based Outlier Detection

One class SVM

AutoEncoders

Outlier Detection vs Novelty Detection

In outlier detection, the training data consists of both anomalies and normal observations whereas in novelty detection the training data consists only of normal observations rather than having both normal and anomalous observations. In this post, we’re gonna see a use case of novelty detection.

AutoEncoder

AutoEncoder is an unsupervised Artificial Neural Network that attempts to encode the data by compressing it into the lower dimensions (bottleneck layer or code) and then decoding the data to reconstruct the original input. The bottleneck layer (or code) holds the compressed representation of the input data. The number of hidden units in the code is called code size.

Applications of AutoEncoders

Dimensionality reduction

Anomaly detection

Image denoising

Image compression

Image generation

In this post let us dive deep into anomaly detection using autoencoders.

Anomaly Detection using AutoEncoders

AutoEncoders are widely used in anomaly detection. The reconstruction errors are used as the anomaly scores. Let us look at how we can use AutoEncoder for anomaly detection using TensorFlow.

Import the required libraries and load the data. Here we are using the ECG data which consists of labels 0 and 1. Label 0 denotes the observation as an anomaly and label 1 denotes the observation as normal.

import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras import Model, Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import train_test_split
from tensorflow.keras.losses import MeanSquaredLogarithmicError

# Download the dataset
PATH_TO_DATA = 'http://storage.googleapis.com/download.tensorflow.org/data/ecg.csv'
data = pd.read_csv(PATH_TO_DATA, header=None)
data.head()
# data shape: (4998, 141)

Output:

# last column is the target
# 0 = anomaly, 1 = normal
TARGET = 140
features = data.drop(TARGET, axis=1)
target = data[TARGET]

x_train, x_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, stratify=target
)

# use case is novelty detection, so use only the normal data for training
train_index = y_train[y_train == 1].index
train_data = x_train.loc[train_index]

# min-max scale the input data
min_max_scaler = MinMaxScaler(feature_range=(0, 1))
x_train_scaled = min_max_scaler.fit_transform(train_data.copy())
x_test_scaled = min_max_scaler.transform(x_test.copy())

The last column in the data is the target ( column name is 140). Split the data for training and testing and scale the data using MinMaxScaler.

# create a model by subclassing the Model class in TensorFlow
class AutoEncoder(Model):
    """
    Parameters
    ----------
    output_units: int
        Number of output units
    code_size: int
        Number of units in the bottleneck
    """
    def __init__(self, output_units, code_size=8):
        super().__init__()
        self.encoder = Sequential([
            Dense(64, activation='relu'),
            Dropout(0.1),
            Dense(32, activation='relu'),
            Dropout(0.1),
            Dense(16, activation='relu'),
            Dropout(0.1),
            Dense(code_size, activation='relu')
        ])
        self.decoder = Sequential([
            Dense(16, activation='relu'),
            Dropout(0.1),
            Dense(32, activation='relu'),
            Dropout(0.1),
            Dense(64, activation='relu'),
            Dropout(0.1),
            Dense(output_units, activation='sigmoid')
        ])

    def call(self, inputs):
        encoded = self.encoder(inputs)
        decoded = self.decoder(encoded)
        return decoded

model = AutoEncoder(output_units=x_train_scaled.shape[1])
# configuration of the model
model.compile(loss='msle', metrics=['mse'], optimizer='adam')

history = model.fit(
    x_train_scaled,
    x_train_scaled,
    epochs=20,
    batch_size=512,
    validation_data=(x_test_scaled, x_test_scaled)
)

The encoder of the model consists of four layers that encode the data into lower dimensions. The decoder of the model consists of four layers that reconstruct the input data.

The model is compiled with Mean Squared Logarithmic loss and Adam optimizer. The model is then trained with 20 epochs with a batch size of 512.

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.xlabel('Epochs')
plt.ylabel('MSLE Loss')
plt.legend(['loss', 'val_loss'])
plt.show()

def find_threshold(model, x_train_scaled):
    reconstructions = model.predict(x_train_scaled)
    # provides losses of individual instances
    reconstruction_errors = tf.keras.losses.msle(reconstructions, x_train_scaled)
    # threshold for anomaly scores
    threshold = np.mean(reconstruction_errors.numpy()) + np.std(reconstruction_errors.numpy())
    return threshold

def get_predictions(model, x_test_scaled, threshold):
    predictions = model.predict(x_test_scaled)
    # provides losses of individual instances
    errors = tf.keras.losses.msle(predictions, x_test_scaled)
    # 0 = anomaly, 1 = normal
    anomaly_mask = pd.Series(errors) > threshold
    preds = anomaly_mask.map(lambda x: 0.0 if x else 1.0)
    return preds

threshold = find_threshold(model, x_train_scaled)
print(f"Threshold: {threshold}")
# Threshold: 0.01001314025746261

predictions = get_predictions(model, x_test_scaled, threshold)
accuracy_score(predictions, y_test)
# 0.944

The reconstruction errors are considered to be anomaly scores. The threshold is then calculated by summing the mean and standard deviation of the reconstruction errors. The reconstruction errors above this threshold are considered to be anomalies. We can further fine-tune the model by leveraging Keras-tuner.
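As a hedged sketch of that tuning step (assuming the keras-tuner package is installed and importable as keras_tuner; the search space and trial count are illustrative, not from the article), one might tune the code size like this:

import keras_tuner as kt  # assumption: keras-tuner is installed

def build_model(hp):
    # reuse the AutoEncoder class defined above, varying only the bottleneck size
    model = AutoEncoder(
        output_units=x_train_scaled.shape[1],
        code_size=hp.Int("code_size", min_value=4, max_value=16, step=4),
    )
    model.compile(loss="msle", metrics=["mse"], optimizer="adam")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=5)
tuner.search(x_train_scaled, x_train_scaled, epochs=20,
             validation_data=(x_test_scaled, x_test_scaled))
best_model = tuner.get_best_models(num_models=1)[0]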

The autoencoder does not have to have a symmetric encoder and decoder, but the code size (the number of units in the bottleneck) has to be smaller than the number of features in the data.
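For instance, here is a sketch of an intentionally asymmetric autoencoder (the layer widths are made up for illustration) that still respects the rule that the code size stays smaller than the number of input features:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

n_features = 140   # as in the ECG data above
code_size = 8      # must stay smaller than n_features

asymmetric_autoencoder = Sequential([
    Dense(64, activation="relu"),            # encoder: two layers
    Dense(code_size, activation="relu"),     # bottleneck
    Dense(32, activation="relu"),            # decoder: three layers, different widths
    Dense(96, activation="relu"),
    Dense(n_features, activation="sigmoid"),
])
asymmetric_autoencoder.compile(optimizer="adam", loss="msle")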

Find the entire code in my Google Colab Notebook.


Happy Deep Learning!

Thank You!

The media shown in this article are not owned by Analytics Vidhya and are used at the author's discretion.

Analytics Vidhya

This article was published as a part of the Data Science Blogathon.

Data compression is critical in many machine learning use cases and is a big topic in computer vision, computer networks, and more. Data compression represents our input in a more compact form from which we can recreate the original with acceptable quality. This smaller representation is what gets passed around, and when the original is required, it is reconstructed from the compact form.

Table of Contents

Brief on AutoEncoders
Why we need AutoEncoders?
Components of AutoEncoders
Properties of AutoEncoders
Architecture of AutoEncoders
Training of AutoEncoders
Applications of AutoEncoders
Hands-on Implementation of Anomaly Detection using AutoEncoders
End Notes

Introduction to AutoEncoders

An autoencoder is a neural network that learns to copy its inputs to its outputs. In simple words, autoencoders are used to learn a compressed representation of raw data. Autoencoders are trained with unsupervised machine learning, applying backpropagation with the target values set equal to the inputs. What an autoencoder does is essentially dimensionality reduction, much like the PCA algorithm, but with the potential benefit of handling non-linearity in the data. This allows the model to learn very powerful generalizations and to reconstruct the input with less loss of information than PCA. This is the advantage of autoencoders over PCA. Let us summarize autoencoders with the three key points below.

It is an unsupervised ML algorithm similar to PCA.

It minimizes the same objective function as PCA.

The neural network's target output is its own input.

Why do we need AutoEncoders?

Many people ask: if we already have PCA, why learn and use autoencoders? Is it only because of their ability to handle non-linear data? The answer is no. Apart from dealing with non-linear data, autoencoders offer several advantages and serve applications from computer vision to time-series forecasting, as listed below.

Non-linear transformations – an autoencoder can learn non-linear activation functions and stack multiple layers.

Convolutional and recurrent layers – it is not restricted to dense layers; the encoder and decoder can use CNN or LSTM layers.

Higher efficiency – it is more efficient, in terms of model parameters, to learn several layers with an autoencoder than to learn one huge transformation with PCA.

Multiple transformations – an autoencoder also gives a representation at the output of each layer, and having multiple representations of different dimensions is always useful. An autoencoder also lets you use pre-trained layers from another model, applying transfer learning to prime the encoder and decoder.

Components of AutoEncoders

An autoencoder is composed of two parts, an encoder and a decoder. In some articles you will also find a third component, a middle part between the two known as the code.

Encoders

The encoder compresses the input into a latent-space representation: it encodes the input (for example, an image) as a compressed representation in a reduced dimension. The compressed version resembles the original input but is not identical to it.

Code

The encoder maps the input space into a lower-dimensional latent space, also known as the bottleneck layer (represented as z in the architecture). At this stage the data is held as a lower-dimensional representation learned without supervision. The code is the part that represents this compressed input and is fed to the decoder.

Decoder

The decoder decodes the encoded representation back to an output of the same dimension as the original input. It takes the data from the lower-dimensional latent space to the reconstruction phase, where the dimensionality of the output x̄ equals that of the input x. Viewed as image compression, this is not lossless: autoencoders perform lossy compression, so the input is compressed and then decompressed, and the reconstruction gets close to the input but is never exactly the same.

Properties of AutoEncoders

Let us look at the important properties of autoencoders.

1) Unsupervised – They do not need labels to train on.

2) Data-specific – they can only compress data similar to what they have been trained on. For example, an autoencoder trained on human faces will not perform well on images of modern buildings. This distinguishes autoencoders from a general-purpose compression algorithm like MP3, which only makes broad assumptions about sound.

3) Lossy – Autoencoders are lossy, meaning the decompressed output will be degraded.

Architecture of AutoEncoder

Now let us understand the architecture of an autoencoder and get a deeper insight into the hidden layers. In an autoencoder we add a couple of layers between the input and the output, and the sizes of these layers are smaller than the input layer.

A critical part of the autoencoder is the bottleneck. The bottleneck is an elegant approach to representation learning, specifically for deciding which aspects of the observed data carry relevant information and which aspects can be thrown away. It does so by balancing two criteria:

the compactness of the representation, measured as the number of bits needed to store it;

the information the representation retains about behaviorally relevant variables.

In this setting, the difference between the input representation and the output representation is known as the reconstruction error (the error between the input vector and the output vector). One of the predominant use cases of autoencoders is anomaly detection. Think of IoT devices, or CPU and memory sensors, that work as intended most of the time: when we collect their fault data, the majority class (normal behavior) dominates and the minority class (faults) is a very small fraction, which is known as imbalanced data. Labeling the data is sometimes difficult or expensive, but we do know the expected behavior of the data.

We train the autoencoder on the majority class (normal data), with the objective of minimizing the reconstruction error. As training progresses, the model weights for the encoder and decoder are updated. The encoder is a downsampler and the decoder is an upsampler; both can be built from dense (ANN), CNN, or LSTM layers.

What does the autoencoder do? It learns a reconstruction function that works well on normal data, and we can then use the model for anomaly detection: we get low reconstruction error for normal data and high reconstruction error for abnormal data (the minority class).

In the referenced plot, the reconstruction error is the gap between the red and blue lines; for anomalous data this error is clearly high, so the model does not fit it well.

Training of AutoEncoders

There are four significant hyperparameters that we need to set before training them.

1) Code Size represents the number of nodes in the middle layer—the smaller size results in more compression.

2) Number of Layers – The Autoencoder can be as deep as we want to be.

3) Loss function – to update the weights, we must calculate a loss that we minimize using the optimizer. The common choices are mean squared error and binary cross-entropy: if the input values lie in the range 0-1, we use binary cross-entropy; otherwise, we use mean squared error.

4) Number of nodes per layer – the number of nodes decreases with each subsequent encoder layer and increases back in the decoder.

These hyperparameters can be set freely to suit the problem at hand.
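As a minimal sketch of how these four hyperparameters show up in code (the layer sizes, input width, and flag name here are illustrative, not taken from the article), a Keras autoencoder with the loss chosen by the input range might look like this:

import tensorflow as tf

code_size = 8          # hyperparameter 1: size of the bottleneck
n_features = 140       # width of the input vectors (illustrative)

# hyperparameters 2 and 4: number of layers and nodes per layer
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(code_size, activation="relu"),   # bottleneck
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(n_features, activation="sigmoid"),
])

# hyperparameter 3: loss function, chosen by the input range
inputs_are_scaled_to_0_1 = True
loss = "binary_crossentropy" if inputs_are_scaled_to_0_1 else "mse"
autoencoder.compile(optimizer="adam", loss=loss)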

Application of AutoEncoder

1) Image Reconstruction – The convolutional Autoencoder learns to remove noise from a picture or reconstruct the missing parts, so the input noisy version becomes the clean output version. The network also fills the gap in the image.

2) Image colourization – autoencoders can map, for example, circles and squares in an image to the same image coloured red and blue, respectively. This is useful for converting black-and-white pictures to colour.

3) Feature variation – It extracts only the required features of an image and generates the output by removing any noise or unnecessary interruption.

4) Image search – deep autoencoders can compress images into short vectors of, say, 30 numbers. Image search then becomes a matter of compressing the query image and comparing its vector against an index to retrieve matching images.

Hands-On Implementation of Anomaly Detection model using Autoencoders

About Dataset

We will be using the ECG dataset throughout this article. ECG stands for electrocardiogram; it checks how your heart is functioning by measuring its electrical activity. In each heartbeat, an electrical pulse travels through the heart, causing the muscles to squeeze and pump blood. The recorded impulses help the doctor tell whether the heart is pumping normally or behaving strangely. We will take this dataset and build an anomaly detector using an autoencoder. Download the dataset zip folder and extract it from here.

Loading Libraries

We import the essential libraries: pandas and numpy for data preprocessing and common mathematical operations, and matplotlib for data visualization. We use the Keras API, built on top of TensorFlow, for model building.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import tensorflow as tf
from tensorflow.keras.models import Model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

mpl.rcParams['figure.figsize'] = (10, 5)
mpl.rcParams['axes.grid'] = False

To load the dataset, we combine the train and test files and save the result in another file; for this we use the cat command in a Jupyter notebook or Google Colab, and then load the data. The data has no header row, so we pass header=None; otherwise pandas would use the first row as a header. The data has 5000 rows and 141 columns, where the first column is the target and the remaining 140 columns are the time-series values.

!cat "/ECG5000_TRAIN.txt" "/ECG5000_TEST.txt" > ecg_final.txt

df = pd.read_csv("ecg_final.txt", sep=' ', header=None)
df.shape

Basic Preprocessing

Pandas assigns numeric names to the columns by default, which makes it awkward to slice and dice by column name, so we add a prefix to each column name.

df = df.add_prefix('c')
df['c0'].value_counts()

Looking at the number of records in each category with value_counts, we see that the first category has by far the most observations, so we will treat the first category as normal data and combine the second through fifth categories as abnormal.

Train-Test Splitting and Scaling the data

Before separating the data into normal and abnormal, we split it into train and test sets. Neural networks perform well when data is scaled to a common range, so we normalize the data.

x_train, x_test, y_train, y_test = train_test_split(
    df.values, df.values[:, 0:1], test_size=0.2, random_state=111
)

scaler = MinMaxScaler()
data_scaled = scaler.fit(x_train)
train_data_scaled = data_scaled.transform(x_train)
test_data_scaled = data_scaled.transform(x_test)

Separate Anomaly and Normal Data

Now we split the training data into normal data and anomaly data based on the target value in the scaled data: the first category goes into the normal data and the remaining categories into the anomaly data. The same applies to the test data.

normal_train_data = pd.DataFrame(train_data_scaled).add_prefix('c').query('c0 == 0').values[:, 1:]
anomaly_train_data = pd.DataFrame(train_data_scaled).add_prefix('c').query('c0 > 0').values[:, 1:]
normal_test_data = pd.DataFrame(test_data_scaled).add_prefix('c').query('c0 == 0').values[:, 1:]
anomaly_test_data = pd.DataFrame(test_data_scaled).add_prefix('c').query('c0 > 0').values[:, 1:]

Now the training set has its own normal and anomaly dataframes; this is an 80-20 split. The dataset is not so highly imbalanced that we need to do anything special about it, but as usual for an autoencoder we will train only on the normal data. The anomaly data will be used only for validation and inference.

Data Visualization

Data visualization is vital to understanding the relationship between two or more variables. So that we can see how the data differs, let us first plot the normal data.

plt.plot(normal_train_data[0])
plt.plot(normal_train_data[1])
plt.plot(normal_train_data[2])
plt.title("Normal Data")
plt.show()

We plot the first three records. There is a dip at the start, because that is where the ECG measurement begins, and after that the signal follows a normal pattern. Now, by contrast, let us plot the anomaly data.

plt.plot(anomaly_train_data[0])
plt.plot(anomaly_train_data[1])
plt.plot(anomaly_train_data[2])
plt.title("Anomaly Data")
plt.show()

From the two plots above, the difference between normal and anomalous data is easy to see. Recall the reconstruction error we discussed earlier; we will use it to identify and differentiate anomalous data from normal data.

Modelling

There are two ways to create an autoencoder model. One is to use the Sequential API provided by Keras on top of TensorFlow: we add the encoding layers, an intermediate bottleneck layer, and the decoding layers. The encoder converts the data from a higher dimension to a lower one, and an upsampling decoder then reconstructs the data. The code snippet below defines this architecture.

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(8, activation="relu"))
model.add(tf.keras.layers.Dense(16, activation="relu"))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(140, activation="sigmoid"))

The difference between the input and the output is the reconstruction error; it will be much higher for anomalous data than for normal data, which is what lets us differentiate between the two.

The other way, which we will use here, is model subclassing. We use it because it makes it easy to use the encoder and decoder separately: if we want the model only for compressing data, we can use just the encoder. Subclassing therefore lets us use the model in several different ways. Below is a code snippet and an explanation of it.

class AutoEncoder(Model):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        self.encoder = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(8, activation="relu")
        ])
        self.decoder = tf.keras.Sequential([
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(140, activation="sigmoid")
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

Explanation – above, we create a class with a constructor, then build the encoder as a stack of layers of decreasing size, with the 8-unit layer as the bottleneck. The decoder then upsamples the data that the encoder downsampled, and the final output has 140 units (the number of units changes with the problem statement). The final activation function is sigmoid. The call method passes the input data to the encoder and the encoded data to the decoder. When we instantiate this AutoEncoder class, we get a model object that contains an encoder, a bottleneck layer, and a decoder.

Compile and train the Model

We add early stopping, which terminates training if the validation loss has not decreased for two epochs. We then compile the model with the Adam optimizer and use MAE (mean absolute error) as the loss function. We fit the model on the training data, passing it twice because an autoencoder's target is its own input. We set 50 epochs, but due to early stopping the run will not last that long. If you are using time-series data where order matters, you should set shuffle to False.

model = AutoEncoder()
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, mode="min")
model.compile(optimizer='adam', loss="mae")

history = model.fit(
    normal_train_data, normal_train_data,
    epochs=50,
    batch_size=120,
    validation_data=(train_data_scaled[:, 1:], train_data_scaled[:, 1:]),
    shuffle=True,
    callbacks=[early_stopping]
)

Model Evaluation

In this setup the gap between training loss and validation loss is large, but that does not mean the model is underfitting. The reason is that the validation set contains both normal and abnormal data, while the training data is normal only, so this behaviour is expected; this is also why we use early stopping. We can obtain the encoder and decoder outputs separately.

encoder_out = model.encoder(normal_test_data).numpy()  # 8-unit representation of the data
decoder_out = model.decoder(encoder_out).numpy()

First, we will plot the performance on Normal data which is first-class data.

plt.plot(normal_test_data[0], 'b')
plt.plot(decoder_out[0], 'r')
plt.title("Model performance on Normal data")
plt.show()

As discussed, there is very little reconstruction error on normal data, which we can see in the plot above: the red and blue lines are very close. What happens if we pass the anomaly test data?

encoder_out_a = model.encoder(anomaly_test_data).numpy()  # 8-unit representation of the data
decoder_out_a = model.decoder(encoder_out_a).numpy()

plt.plot(anomaly_test_data[0], 'b')
plt.plot(decoder_out_a[0], 'r')
plt.title("Model performance on Anomaly Data")
plt.show()

Here the blue curve is the anomalous test record and the red curve is the decoder output; the reconstruction error is clearly high, which is exactly the behaviour we want from the model.

Calculate Loss

Now we define a threshold on the loss for our model, which measures the error between input and output. We compute the Keras mean absolute error loss per record and plot it as a histogram.

reconstruction = model.predict(normal_test_data)
train_loss = tf.keras.losses.mae(reconstruction, normal_test_data)
plt.hist(train_loss, bins=50)

Looking at the x-axis, most of the values lie below 0.5; a few fall above it, because we cannot have a 100% perfect model. This is how the error between normal data and its reconstruction looks. Now we want to set a threshold above which a value is treated as an anomaly and below which it is treated as normal, so we take the mean of the training loss and add two standard deviations. In practice, the threshold should be set according to business requirements.

threshold = np.mean(train_loss) + 2 * np.std(train_loss)

reconstruction_a = model.predict(anomaly_test_data)
train_loss_a = tf.keras.losses.mae(reconstruction_a, anomaly_test_data)

plt.hist(train_loss_a, bins=50)
plt.title("loss on anomaly test data")
plt.show()

In the earlier plot most of the normal-data losses were below 0.5, whereas the histogram of the anomaly test losses lies mostly above 0.5. Observe the separation between the normal-data loss and the anomaly-data loss; this indicates a very good model.

Plot Normal and anomaly Loss together

To get a better idea of how both losses look together, let us plot them along with the threshold. We draw the normal training loss and the anomaly loss as separate histograms on a single graph, with a vertical dashed line marking the threshold for better visualization.

plt.hist(train_loss, bins=50, label='normal')
plt.hist(train_loss_a, bins=50, label='anomaly')
plt.axvline(threshold, color='r', linewidth=3, linestyle='dashed', label='{:0.3f}'.format(threshold))
plt.legend(loc='upper right')
plt.title("Normal and Anomaly Loss")
plt.show()

The mean anomaly loss is far higher than the mean normal loss, while the spread of the two distributions is comparable. Now that we have a good picture of the model's performance, let us count the false positives and false negatives to quantify it.

How well does it predict Normal Class?

Normal data is data whose loss falls below the threshold, so we use TensorFlow's tf.math.less to count the normal test records that the model classifies correctly.

preds = tf.math.less(train_loss, threshold)
tf.math.count_nonzero(preds)

Out of 563 normal test records, 536 are classified correctly, so the model is about 95 percent accurate on the normal class.

How well does it perform on Anomaly data?

Now we use tf.math.greater to count the anomaly test records whose loss is above the threshold, i.e. those correctly flagged as anomalies.

preds_a = tf.math.greater(train_loss_a, threshold)
tf.math.count_nonzero(preds_a)

Out of 437 anomalous records, 431 are predicted correctly, so the final model detects anomalies with roughly 98 percent accuracy on new data points.

End Notes

Hurray! We have built our first autoencoder model from scratch for anomaly detection, and it performs decently on new, unseen data. You can try different architectures such as LSTM or 1-D convolutional layers, but this base model is enough to show how autoencoders work, why they are needed in today's data world, and how they can give better results than alternatives such as PCA or isolation forests.

I hope it was easy to follow each section of this guide. You can post your questions in the comments section below and connect with me.


Thanks for giving your time!

The media shown in this article are not owned by Analytics Vidhya and are used at the author's discretion.


How is Autoencoder different from PCA

In this article, we are going to see how is Autoencoder different from Principal Component Analysis (PCA).

Role of Dimensionality Reduction in ML

We often run into the curse of dimensionality in machine learning projects when the number of data records is not substantially larger than the number of features. This causes problems because it requires training a large number of parameters with a limited data set, which can easily lead to overfitting and poor generalization. High dimensionality also entails long training times. Dimensionality reduction methods are often used to address these challenges: despite living in a high-dimensional space, the feature space often possesses a low-dimensional structure.

PCA and auto-encoders are two popular methods for lowering the dimensionality of the feature space.

Principal Component Analysis (PCA)

PCA simply projects the data into another space by learning a linear transformation with projection vectors specified by the data’s variance. Dimensionality reduction may be achieved by limiting the dimensionality to a small number of components that account for the majority of the variation in the data set.
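As a minimal sketch (the toy data and the 95% variance target are illustrative, not from the article), keeping only the components that explain most of the variance looks like this with scikit-learn:

import numpy as np
from sklearn.decomposition import PCA

# toy data: 200 samples, 10 correlated features (illustrative only)
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_low = pca.fit_transform(X)          # projection into the low-dimensional space
X_rec = pca.inverse_transform(X_low)  # linear reconstruction back in the original space

print(X_low.shape, pca.explained_variance_ratio_)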

Autoencoders

Autoencoders are neural networks that stack numerous non-linear transformations to reduce input into a low-dimensional latent space (layers). They use an encoder-decoder system. The encoder converts the input into latent space, while the decoder reconstructs it. For accurate input reconstruction, they are trained through backpropagation. Autoencoders may be used to reduce dimensionality when the latent space has fewer dimensions than the input. Because they can rebuild the input, these low-dimensional latent variables should store the most relevant properties, according to intuition.

PCA vs Autoencoder

Although PCA is fundamentally a linear transformation, auto-encoders may describe complicated non-linear processes.

Because PCA features are projections onto the orthogonal basis, they are completely linearly uncorrelated. However, since autoencoded features are only trained for correct reconstruction, they may have correlations.

PCA is quicker and less expensive to compute than autoencoders.

PCA is quite similar to a single layered autoencoder with a linear activation function.

Because of the large number of parameters, the autoencoder is prone to overfitting. (However, regularization and proper planning might help to prevent this).

How to select the models?

Aside from the available computational resources, the choice of approach is influenced by the characteristics of the feature space itself. If the features have a non-linear relationship, the autoencoder may compress the data more efficiently into a low-dimensional latent space by utilizing its capacity to represent complicated non-linear processes.

Researchers created a two-dimensional feature space with either a linear or a non-linear relationship between the two features x and y (with some added noise). After projecting the input into latent space, we can compare the ability of autoencoders and PCA to reconstruct the input properly. PCA is a linear transformation with a well-defined inverse transform, and for the autoencoder the reconstructed input comes from the decoder output. For both PCA and the autoencoder we use a one-dimensional latent space.

Autoencoded latent space may be employed for more accurate reconstruction if there is a nonlinear connection (or curvature) in the feature space. PCA, on the other hand, only keeps the projection onto the first principal component and discards any information that is perpendicular to it.
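A minimal sketch of this comparison, assuming a curved (parabolic) relationship between the two features; the layer sizes and training settings are illustrative, and the autoencoder will usually, though not always, reach a lower reconstruction error on such curved data:

import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

# curved 2-D feature space: y is a non-linear function of x, plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 1))
X = np.hstack([x, x ** 2 + 0.05 * rng.normal(size=(1000, 1))]).astype("float32")

# PCA with a one-dimensional latent space
pca = PCA(n_components=1)
X_pca_rec = pca.inverse_transform(pca.fit_transform(X))

# autoencoder with a one-dimensional bottleneck
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),                       # 1-D latent space
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=200, batch_size=64, verbose=0)
X_ae_rec = autoencoder.predict(X, verbose=0)

print("PCA reconstruction MSE:", np.mean((X - X_pca_rec) ** 2))
print("AE  reconstruction MSE:", np.mean((X - X_ae_rec) ** 2))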

Conclusion:

There must be an underlying low-dimensional structure in the feature space for dimensionality reduction to be successful. To put it another way, the features should be related to one another. Autoencoders may encode more information with fewer dimensions if the low-dimensional structure has non-linearity or curvature. As a result, in certain cases, they are a superior dimensionality reduction strategy.

Anomaly Detection in Machine Learning

Anomaly detection is one of the most common use cases of machine learning. Finding and identifying outliers helps to prevent fraud, adversary attacks, and network intrusions that can compromise your company’s future.

In this post, we will talk about how anomaly detection works, what machine learning techniques you can use for it, and what benefits anomaly detection with ML brings to a business.

What is an anomaly?

Before talking about anomaly detection, we need to understand what an anomaly is.

Generally speaking, an anomaly is something that differs from a norm: a deviation, an exception. In software engineering, by anomaly we understand a rare occurrence or event that doesn’t fit into the pattern, and, therefore, seems suspicious. Some examples are:

sudden burst or decrease in activity;

error in the text;

sudden rapid drop or increase in temperature.

Common reasons for outliers are:

data preprocessing errors;

noise;

fraud;

attacks.

Normally, you want to catch them all; a software program must run smoothly and be predictable so every outlier is a potential threat to its robustness and security. Catching and identifying anomalies is what we call anomaly or outlier detection.

For example, if large sums of money are spent one after another within one day and it is not your typical behavior, a bank can block your card. They will see an unusual pattern in your daily transactions. This anomaly can typically be connected to fraud since identity thieves try to steal as much money as they can while they can. Once an anomaly is detected, it needs to be investigated, or problems may follow.

Types of anomalies

Now let’s see what kinds of anomalies or outliers machine learning engineers usually have to face.

Global outliers

When a data point assumes a value that is far outside all the other data point value ranges in the dataset, it can be considered a global anomaly. In other words, it’s a rare event.

For example, if you receive an average American salary to your bank accounts each month but one day get a million dollars, that would look like a global anomaly to the bank’s analytics team.
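As an illustration of this idea (the deposit values and the |z| > 3 cutoff below are made up for the example), a simple z-score rule flags such a rare, extreme value:

import numpy as np

# two years of monthly deposits around a typical salary, plus one extreme value
rng = np.random.default_rng(0)
deposits = np.concatenate([rng.normal(4250, 150, size=24), [1_000_000]])

z_scores = (deposits - deposits.mean()) / deposits.std()
global_outliers = deposits[np.abs(z_scores) > 3]   # classic |z| > 3 rule
print(global_outliers)                              # the million-dollar deposit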

Contextual outliers

When an outlier is called contextual it means that its value doesn’t correspond with what we expect to observe for a similar data point in the same context. Contexts are usually temporal, and the same situation observed at different times can be not an outlier.

For example, for stores it’s quite normal to experience an increase in customers during the holiday season. However, if a sudden boost happens outside of holidays or sales, it can be considered a contextual outlier.

Collective outliers

Collective outliers are represented by a subset of data points that deviate from the normal behavior.

In general, tech companies tend to grow bigger and bigger. Some companies may decay but it’s not a general trend. However, if many companies at once show a decrease in revenue in the same period of time, we can identify a collective outlier.

Why do you need machine learning for anomaly detection?

This is a process that is usually conducted with the help of statistics and machine learning tools.

The reason is that the majority of companies today that require outlier detection work with huge amounts of data: transactions, text, image, and video content, etc. You would have to spend days going through all the transactions that happen inside a bank every hour, and more and more are generated every second. It is simply impossible to derive any meaningful insights from this amount of data manually.

Moreover, another difficulty is that the data is often unstructured, which means that the information wasn’t arranged in any specific way for the data analysis. For example, business documents, emails, or images are examples of unstructured data.

To be able to collect, clean, structure, analyze, and store data, you need to use tools that aren’t scared of big volumes of data. Machine learning techniques, in fact, show the best results when large data sets are involved. Machine learning algorithms are able to process most types of data. Moreover, you can choose the algorithm based on your problem and even combine different techniques for the best results.

Machine learning applied to real-world applications helps to streamline the process of anomaly detection and save resources. It can happen not only post-factum but also in real time. Real-time anomaly detection is applied to improve security and robustness, for instance, in fraud discovery and cybersecurity.

What are anomaly detection methods?

There are different kinds of anomaly detection methods with machine learning.

Supervised

In supervised anomaly detection, an ML engineer needs a training dataset. Items in the dataset are labeled into two categories: normal and abnormal. The model will use these examples to extract patterns and be able to detect abnormal patterns in the previously unseen data.

In supervised learning, the quality of the training dataset is very important. There is a lot of manual work involved since somebody needs to collect and label examples.

Note: While you can label some anomalies and try to classify them (hence it’s a classification task), the underlying goal of anomaly detection is defining “normal data points” rather than “abnormal data points”. So in real world applications with very few anomaly samples labelled, it’s almost never regarded as a supervised task.

Unsupervised

This type of anomaly detection is the most common type, and the most well-known representative of unsupervised algorithms are neural networks.

Artificial neural networks reduce the amount of manual work needed to pre-process examples: no manual labeling is needed. Neural networks can even be applied to unstructured data. NNs can detect anomalies in unlabeled data and use what they have learned when working with new data.

The advantage of this method is that it allows you to decrease the manual work in anomaly detection. Moreover, quite often it’s impossible to predict all the anomalies that can occur in the dataset. Think of self-driving cars, for example. They can face a situation on the road that has never happened before. Putting all road situations into a finite number of classes would be impossible. That is why neural networks are priceless when working with real-life data in real-time.

However, ANNs bring an almost rocket-science level of complexity, so before you try them out you might want to experiment with more conventional algorithms like DBSCAN, especially if your project is not that big.

Moreover, a neural network is a black box: we often don't know what kinds of events it will label as anomalies, and it can easily learn wrong rules that are not easy to fix. That is why unsupervised anomaly detection techniques are often considered less trustworthy than supervised ones.

Semi-supervised

Semi-supervised anomaly detection methods combine the benefits of the previous two methods. Engineers can apply unsupervised learning methods to automate feature learning and work with unstructured data. However, by combining it with human supervision, they have an opportunity to monitor and control what kind of patterns the model learns. This usually helps to make the model’s predictions more accurate.

Machine learning algorithms for anomaly detection

Multiple machine learning algorithms can be used for anomaly detection depending on the dataset size and the type of the problem.

Local outlier factor (LOF)

Local outlier factor is probably the most common technique for anomaly detection. This algorithm is based on the concept of the local density. It compares the local density of an object with that of its neighbouring data points. If a data point has a lower density than its neighbours, then it is considered an outlier.
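A minimal scikit-learn sketch of this idea (the toy data and n_neighbors value are illustrative):

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 1, size=(200, 2)),   # dense cluster of normal points
    [[8.0, 8.0], [9.0, -7.0]],         # two isolated, low-density points
])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)            # -1 = outlier, 1 = inlier
print(X[labels == -1])                 # the low-density points are flagged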

K-nearest neighbors

kNN is a supervised ML algorithm often used for classification. When applied to anomaly detection problems, kNN is a useful tool because it lets you easily visualize the data points on a scatterplot, which makes anomaly detection much more intuitive. Another benefit of kNN is that it works well on both small and large datasets.

Instead of learning 'normal' and 'abnormal' values to solve a classification problem, kNN performs no actual training, so when it comes to anomaly detection it works like an unsupervised algorithm: a machine learning expert manually defines the cutoff that separates normal from abnormal, and the algorithm flags the points by itself, as sketched below.
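One common way to turn kNN into an anomaly score is to use the distance to the k-th nearest neighbour; a sketch with scikit-learn (the toy data, k, and the percentile cutoff are illustrative choices):

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(300, 2)), [[10.0, 10.0]]])

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1 because each point is its own nearest neighbour
distances, _ = nn.kneighbors(X)
scores = distances[:, k]                           # distance to the k-th true neighbour

threshold = np.percentile(scores, 99)              # manually chosen cutoff
print(X[scores > threshold])                       # points far from their neighbours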

Support vector machines

Support vector machine (SVM) is also a supervised machine learning algorithm often used for classification. SVMs use hyperplanes in multi-dimensional space to divide data points into classes. The hyperparameter nu is the threshold (percentage) for outliers which you have to choose manually.

SVM is usually applied when there are more than one classes involved in the problem. However, in anomaly detection it is also used for single class problems. The model is trained to learn the ‘norm’ and can identify whether unfamiliar data belongs to this class or represents an anomaly.
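A minimal one-class SVM sketch with scikit-learn (the nu value and toy data are illustrative); the model is fit on normal data only and then queried on new points:

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X_normal = rng.normal(0, 1, size=(500, 2))            # training data: the "norm"
X_new = np.array([[0.1, -0.2], [6.0, 6.0]])           # one familiar point, one anomaly

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")  # nu ~ expected outlier fraction
ocsvm.fit(X_normal)
print(ocsvm.predict(X_new))                            # 1 = inlier, -1 = anomaly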

DBSCAN

This is an unsupervised ML algorithm based on the principle of density. DBSCAN is able to uncover clusters in large spatial datasets by looking at the local density of the data points and generally shows good results when used for anomaly detection. The points that do not belong to any cluster get their own class: -1 so they are easy to identify. This algorithm handles outliers well when the data is represented by non-discrete data points.
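A minimal DBSCAN sketch (the eps and min_samples values are illustrative and usually need tuning); points labelled -1 are the ones that belong to no cluster:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(0, 0.3, size=(150, 2)),        # cluster 1
    rng.normal(5, 0.3, size=(150, 2)),        # cluster 2
    [[2.5, 10.0], [-4.0, 7.0]],               # isolated points
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(X[labels == -1])                         # the anomalies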

Autoencoders

This algorithm is based on artificial neural networks that encode the data by compressing it into lower dimensions and then decode it to reconstruct the original input. Reducing the dimensionality does not discard the essential information, because the regularities in the data are captured in the compressed representation, and outliers can then be discovered through their high reconstruction error.

Bayesian networks

Bayesian networks enable ML engineers to discover anomalies even in high-dimensional data. This method is used when the anomalies that we’re looking for are more subtle and harder to discover and visualizing them on the plot might not produce the desired results.

What is anomaly detection used for?

Now let’s see how anomaly detection can be used in practice.

Intrusion detection

Cybersecurity is key for many companies that work with confidential information, intellectual property, and private data of their employees and clients. Intrusion detection systems monitor the network to detect and report potentially malicious traffic. IDS software notifies the team if suspicious activity is detected. Some examples are software by Cisco Systems and McAfee.

Fraud detection

Fraud detection with machine learning helps to prevent activities aimed at obtaining money or property unlawfully. Fraud detection software is used by banks, credit organizations, and insurance companies. For example, banks check loan applications before making a decision. If the system detects that some of the documents are fraudulent, for example, that your tax number doesn’t exist in the system, it will notify the bank employer.

Health monitoring

Anomaly detection systems are incredibly helpful in healthcare. They help doctors with diagnosis by detecting unusual patterns in MRI scans and test results. Usually, neural networks that have been trained on thousands of examples are applied here, and sometimes they give a more accurate diagnosis than doctors with 20 years of experience.

Defect detection

Manufacturers can lose millions in lawsuits by supplying their clients with mechanisms or parts that have defects. A single part that does not meet production standards can cause a plane to crash, killing hundreds of people.

Anomaly detection systems that use computer vision can detect whether a part has a defect, even among thousands of other similar parts on the production line. Moreover, anomaly detection systems can be connected to the machinery to monitor internal parameters such as engine temperature and fuel levels.

Conclusion

Anomaly detection is identifying data points that don't fit the normal patterns in the data. It can be useful for solving many problems, including fraud detection and medical diagnosis. Machine learning methods allow us to automate anomaly detection and make it more effective, especially when large datasets are involved. Some of the common ML methods used in anomaly detection include LOF, autoencoders, and Bayesian networks.

If you want to learn more about machine learning, artificial intelligence, and data analysis, continue reading our blog posts.
