This tutorial is part of a growing series on Data Science for Raman Spectroscopy with Python published in Towards Data Science. It is based on this publication in the journal Analytica Chimica Acta. By following along, you'll add a useful tool to your data analysis toolkit: an effective method for cleaning up Raman spectra that is already used in published research.
Spike removal is an essential part of Raman data preprocessing. Spikes, caused by cosmic rays impacting the detector, appear as intense, narrow peaks that can distort the analysis. These bursts of energy hit the charge-coupled device (CCD) camera, creating sharp, high-intensity peaks that, if left uncorrected, can interfere with further processing steps like normalization, spectral search, or multivariate data analysis. Cleaning these artifacts is therefore a priority. This tutorial will cover a practical algorithm for removing spikes from Raman spectra. Using Python, we'll walk through a user-friendly, customizable approach for spike detection and correction to keep Raman data accurate and reliable.
Figure 1 shows an example of a graphene Raman spectrum where a spike is present. Graphene's exceptional physical properties, such as its electrical and thermal conductivity, have made it a highly studied material. Its Raman spectrum contains peaks that reflect structural characteristics, revealing information about doping, strain, and grain boundaries. Therefore, Raman spectroscopy is a widely used technique to characterize graphene. However, to make the most of this tool, spikes must first be removed.
import numpy as np
import matplotlib.pyplot as plt

# Load the data directly into a numpy array
data = np.loadtxt('spiked_spectrum.asc', delimiter=',', skiprows=1)
# Extract the Raman shift from the first column (index 0)
ramanshift = data[:, 0]
# Extract the intensity from the second column (index 1 in Python)
intensity = data[:, 1]

# Plot the data
fig = plt.figure(figsize=(5, 3))
plt.plot(ramanshift, intensity)
plt.xlabel('Raman shift (cm$^{-1}$)')
plt.ylabel('Intensity (a.u.)')
plt.show()
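If you don't have the original .asc data file at hand, a comparable test spectrum can be synthesized. The following is only a minimal sketch: the band positions, widths, and amplitudes are rough, hypothetical stand-ins for graphene's G and 2D bands, not values from the original dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
ramanshift_sim = np.linspace(1100, 3000, 1000)  # Raman shift axis in cm^-1

def lorentzian(x, x0, w, a):
    """A simple Lorentzian band profile centered at x0 with half-width w."""
    return a * w**2 / ((x - x0)**2 + w**2)

# Rough graphene-like G and 2D bands plus detector noise (hypothetical values)
intensity_sim = (lorentzian(ramanshift_sim, 1580, 8, 120)
                 + lorentzian(ramanshift_sim, 2690, 14, 300)
                 + rng.normal(0, 2, ramanshift_sim.size))
intensity_sim[400] += 900  # single-pixel cosmic-ray spike
```

You can then run the rest of the tutorial on `ramanshift_sim` and `intensity_sim` instead of the loaded data.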
The spike removal algorithm presented here consists of four main steps:
1. Peak finding
2. Spike detection
3. Spike flagging
4. Spectrum correction
Let's take a look at the different steps with Python code snippets:
1. Peak finding: First, the algorithm identifies significant peaks by checking for local maxima with a minimum prominence threshold. Adding a prominence threshold helps to exclude small noise-generated peaks, as we don't aim to correct all the noise. See the following figure for comparison.
from scipy.signal import find_peaks

# Find the peaks in the spectrum (with and without a prominence threshold)
peaks_wo_p, _ = find_peaks(intensity)  # Peaks found without a prominence threshold
peaks_w_p, _ = find_peaks(intensity, prominence=20)  # Peaks found with a prominence threshold

fig, ax = plt.subplots(1, 2, figsize=(10, 3))
ax[0].plot(ramanshift, intensity, zorder=0, label='Raw spectrum')
ax[0].scatter(ramanshift[peaks_wo_p], intensity[peaks_wo_p], marker='.', color='red', label='Found peaks')
ax[1].plot(ramanshift, intensity, zorder=0, label='Raw spectrum')
ax[1].scatter(ramanshift[peaks_w_p], intensity[peaks_w_p], marker='.', color='red', label='Found peaks')
plt.show()
2. Spike detection: Then, spikes are flagged based on their characteristic narrow widths. This point can help with the automation of large spectral datasets: if we know the width of the Raman bands present in our spectra, we can choose a threshold below that value. For example, with our system resolution, we don't expect graphene Raman bands with widths below 10 cm⁻¹.
from scipy.signal import peak_widths

widths = peak_widths(intensity, peaks_w_p)[0]

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(ramanshift, intensity, zorder=0, label='Raw spectrum')
ax2 = ax.twinx()
ax2.scatter(ramanshift[peaks_w_p], widths, marker='+', color='red', label='Peak widths')
plt.show()
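One caveat worth keeping in mind: `peak_widths` returns widths in array-index units (points), while the discussion above states the threshold in cm⁻¹. If the Raman shift axis is approximately evenly spaced, converting between the two is a single multiplication by the spectral step. The following toy sketch, with a made-up axis and a single Gaussian band, illustrates the conversion:

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

x = np.linspace(0, 1000, 501)            # toy axis: 2 cm^-1 per point
y = np.exp(-0.5 * ((x - 500) / 6) ** 2)  # one Gaussian band, sigma = 6 cm^-1

peaks, _ = find_peaks(y, prominence=0.5)
widths_pts = peak_widths(y, peaks)[0]  # width in points (at half prominence)
step = np.mean(np.diff(x))             # cm^-1 per point
widths_cm1 = widths_pts * step         # width converted to cm^-1
print(widths_cm1[0])                   # close to the Gaussian FWHM of ~14.1 cm^-1
```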
3. Spike flagging: Next, any data points affected by spikes are flagged using a range calculated from the peak's prominence, effectively isolating the corrupted pixels. In other words, we select the window that must be corrected.
# Let's set the parameters:
width_param_rel = 0.8
width_threshold = 10  # Estimate of the width of the narrowest Raman band

# Calculation of the range where the spectral points are assumed to be corrupted
widths_ext_a = peak_widths(intensity, peaks_w_p, rel_height=width_param_rel)[2]
widths_ext_b = peak_widths(intensity, peaks_w_p, rel_height=width_param_rel)[3]

# Create a vector where spikes will be flagged: no spike = 0, spike = 1
spikes = np.zeros(len(intensity))

# Flag the area previously defined if the peak is considered a spike (width below width_threshold)
for a, width, ext_a, ext_b in zip(range(len(widths)), widths, widths_ext_a, widths_ext_b):
    if width < width_threshold:
        spikes[int(ext_a) - 1: int(ext_b) + 2] = 1

fig = plt.figure(figsize=(5, 3))
plt.plot(ramanshift, intensity, zorder=0, label='Raw spectrum')
a = 1
plt.scatter(ramanshift[int(widths_ext_a[a]) - 1: int(widths_ext_b[a]) + 1],
            intensity[int(widths_ext_a[a]) - 1: int(widths_ext_b[a]) + 1],
            color='red', label='Corrupted points')
plt.axvline(x=ramanshift[int(widths_ext_a[a]) - 1], linestyle='--', color='red')
plt.axvline(x=ramanshift[int(widths_ext_b[a]) + 1], linestyle='--', color='red')
plt.show()
4. Spectrum correction: Finally, these points are corrected by interpolating nearby values, preserving the spectrum's integrity for subsequent analyses.
from scipy import interpolate

# Let's set the parameter:
moving_average_window = 10

intensity_out = intensity.copy()

# Interpolation of corrupted points
for i, spike in enumerate(spikes):
    if spike != 0:  # If we have a spike at position i
        window = np.arange(i - moving_average_window, i + moving_average_window + 1)  # we select 2*ma + 1 points around the spike
        window_exclude_spikes = window[spikes[window] == 0]  # from that interval, we keep the points that are not spikes
        interpolator = interpolate.interp1d(window_exclude_spikes, intensity[window_exclude_spikes], kind='linear')  # we use the uncorrupted points around the spike to build the interpolation
        intensity_out[i] = interpolator(i)  # the corrupted point is replaced by the interpolated value

fig = plt.figure(figsize=(5, 3))
plt.plot(ramanshift, intensity, zorder=0, color='red', label='Raw spectrum')
plt.plot(ramanshift, intensity_out, zorder=0, label='Corrected spectrum')
plt.show()
All these snippets can be summarized in a single function. This function is designed to be customizable based on your specific data needs, with parameters for adjusting prominence and width:
import numpy as np
from scipy.signal import find_peaks, peak_widths
from scipy import interpolate

def spike_removal(y,
                  width_threshold,
                  prominence_threshold=None,
                  moving_average_window=10,
                  width_param_rel=0.8,
                  interp_type='linear'):
    """
    Detects and replaces spikes in the input spectrum with interpolated values. Algorithm first
    published by N. Coca-Lopez in Analytica Chimica Acta. https://doi.org/10.1016/j.aca.2024.342312

    Parameters:
        y (numpy.ndarray): Input spectrum intensity.
        width_threshold (float): Threshold for the peak width.
        prominence_threshold (float): Threshold for the peak prominence.
        moving_average_window (int): Number of points in the moving average window.
        width_param_rel (float): Relative height parameter for the peak width.
        interp_type (str): Type of interpolation ('linear', 'quadratic', 'cubic').

    Returns:
        numpy.ndarray: Signal with spikes replaced by interpolated values.
    """
    # First, we find all peaks showing a prominence above prominence_threshold in the spectrum
    peaks, _ = find_peaks(y, prominence=prominence_threshold)

    # Create a vector where spikes will be flagged: no spike = 0, spike = 1
    spikes = np.zeros(len(y))

    # Calculation of the widths of the found peaks
    widths = peak_widths(y, peaks)[0]

    # Calculation of the range where the spectral points are assumed to be corrupted
    widths_ext_a = peak_widths(y, peaks, rel_height=width_param_rel)[2]
    widths_ext_b = peak_widths(y, peaks, rel_height=width_param_rel)[3]

    # Flag the area previously defined if the peak is considered a spike (width below width_threshold)
    for a, width, ext_a, ext_b in zip(range(len(widths)), widths, widths_ext_a, widths_ext_b):
        if width < width_threshold:
            spikes[int(ext_a) - 1: int(ext_b) + 2] = 1

    y_out = y.copy()

    # Interpolation of corrupted points
    for i, spike in enumerate(spikes):
        if spike != 0:  # If we have a spike at position i
            window = np.arange(i - moving_average_window, i + moving_average_window + 1)  # we select 2*ma + 1 points around the spike
            window_exclude_spikes = window[spikes[window] == 0]  # from that interval, we keep the points that are not spikes
            interpolator = interpolate.interp1d(window_exclude_spikes, y[window_exclude_spikes], kind=interp_type)  # we use the uncorrupted points around the spike to build the interpolation
            y_out[i] = interpolator(i)  # the corrupted point is replaced by the interpolated value

    return y_out
The function with the algorithm can then be applied to the spiked graphene spectrum as follows:
intensity_despiked = spike_removal(intensity,
                                   width_threshold=3,
                                   prominence_threshold=20,
                                   moving_average_window=10,
                                   width_param_rel=0.8,
                                   interp_type='linear')

fig, ax = plt.subplots(1, 2, figsize=(2*5, 3))
ax[0].plot(ramanshift, intensity, label='Spike', color='red', linewidth=0.9)
ax[0].plot(ramanshift, intensity_despiked)
ax[1].plot(ramanshift, intensity_despiked)
plt.show()
With this spike removal approach, you can ensure your Raman spectra are clean and reliable, minimizing artifacts without losing essential spectral details. The method is ideal for automation, especially if the expected minimal peak width is known, making it highly adaptable to large-scale spectral datasets and high-throughput analysis.
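For large datasets, a thin wrapper can apply the despiking spectrum by spectrum, assuming the spectra are stacked as rows of a 2D array. This is only a sketch: to keep it self-contained, a toy clipping function (not the tutorial's algorithm) stands in for the `spike_removal` function defined above, which is what you would pass in practice.

```python
import numpy as np

def despike_batch(spectra, despike_fn, **kwargs):
    """Apply a per-spectrum despiking function to each row of a 2D array."""
    return np.vstack([despike_fn(row, **kwargs) for row in spectra])

# Toy stand-in so the snippet runs standalone: clip values above a threshold.
# In practice, pass spike_removal with its width/prominence parameters instead.
def clip_despike(y, limit=100.0):
    return np.minimum(y, limit)

spectra = np.ones((3, 50))
spectra[1, 25] = 1e4  # artificial spike in the second spectrum
clean = despike_batch(spectra, clip_despike, limit=100.0)
print(clean.max())  # → 100.0, the spike no longer dominates
```

With `spike_removal` the call would look like `despike_batch(spectra, spike_removal, width_threshold=3, prominence_threshold=20)`.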
I hope you enjoyed this tutorial. Feel free to drop any questions or share your own Raman data challenges in the comments. I'd love to hear how this algorithm helps in your projects!
Ready to try it out? Download the Jupyter Notebook here. And if you found this helpful, please remember to cite the original work; that would help me a lot! 🙂