如何使波形呈现更有趣?波形、有趣

2023-09-11 22:57:57 作者:闲客

我写了一个波形渲染器,它接受一个音频文件,并创建是这样的:

I wrote a waveform renderer that takes an audio file and creates something like this:

的逻辑是pretty的简单。余计算所需的各像素的音频采样的数量,读取这些样品,求出它们的平均值和绘制像素的根据所得到的值的列。

The logic is pretty simple. I calculate the number of audio samples required for each pixel, read those samples, average them and draw a column of pixels according to the resulting value.

通常情况下,我将呈现围绕600-800像素的全曲,所以波是pretty的COM pressed。不幸的是这通常会导致因为几乎整首歌曲只是呈现在几乎相同的高度吸引力的视觉效果。没有变化。

Typically, I will render a whole song on around 600-800 pixels, so the wave is pretty compressed. Unfortunately this usually results in unappealing visuals as almost the entire song is just rendered at almost the same heights. There is no variation.

有趣的是,如果你的波形上 SoundCloud 中几乎没有一个是因为无聊我的结果。他们都有一些变化。可能是什么把戏吗?我不认为他们只是增加随机噪声。

Interestingly, if you look at the waveforms on SoundCloud almost none of them are as boring as my results. They all have some variation. What could be the trick here? I don't think they just add random noise.

推荐答案

我不认为SoundCloud是做什么特别特殊。有很多歌曲我对他们的头版看到,很平的。它更多的是与细节感知的方式和什么样的歌曲的整体动态都喜欢。的主要区别在于,SoundCloud是借鉴绝对值。 (图像的负侧只是一个镜子。)

I don't think SoundCloud is doing anything particularly special. There are plenty of songs I see on their front page that are very flat. It has more to do with the way detail is perceived and what the overall dynamics of the song are like. The main difference is that SoundCloud is drawing absolute value. (The negative side of the image is just a mirror.)

有关演示,这里是一个基本的白噪声图用直线:

For demonstration, here is a basic white noise plot with straight lines:

现在,典型的填充用于使整体轮廓更容易看到。这已经做了很多的外观:

Now, typically a fill is used to make the overall outline easier to see. This already does a lot for the appearance:

放大波形(尤其是缩小)通常使用的镜面效果,因为动力更加明显:

Larger waveforms ("zoomed out" in particular) typically use a mirror effect because the dynamics become more pronounced:

酒吧是另一种可视化,并能给予详细的错觉:

Bars are another way to visualize and can give an illusion of detail:

对于一个典型的波形图形(平均ABS和镜子)伪程序可能是这样的:

A pseudo routine for a typical waveform graphic (average of abs and mirror) might look like this:

for (each pixel in width of image) {
    var sum = 0

    for (each sample in subset contained within pixel) {
        sum = sum + abs(sample)
    }

    var avg = sum / length of subset

    draw line(avg to -avg)
}

这是有效像COM pressing时间轴作为窗口的RMS。 (RMS也可以使用,但它们几乎是相同的。)现在的波形示出了整体动力学

This is effectively like compressing the time axis as RMS of the window. (RMS could also be used but they are almost the same.) Now the waveform shows overall dynamics.

这是不是从你已经在做太大的不同,只是ABS,镜和补。对于盒状SoundCloud使用,你会绘制矩形。

That is not too different from what you are already doing, just abs, mirror and fill. For boxes like SoundCloud uses, you would be drawing rectangles.

只是作为奖励,这里是用Java编写的描述,产生的盒子波形的MCVE。 (很抱歉,如果Java是不是你的语言)。实际绘图code是接近顶部。这个方案还正常化,即,波形延伸到图像的高度。

Just as a bonus, here is an MCVE written in Java to generate a waveform with boxes as described. (Sorry if Java is not your language.) The actual drawing code is near the top. This program also normalizes, i.e., the waveform is "stretched" to the height of the image.

此简单的输出是与上述相同的伪程序:

This simple output is the same as the above pseudo routine:

这与盒的输出是非常相似的SoundCloud:

This output with boxes is very similar to SoundCloud:

import javax.swing.*;
import java.awt.*;
import java.awt.event.*;
import java.awt.image.*;
import java.io.*;
import javax.sound.sampled.*;

public class BoxWaveform {
    static int boxWidth = 4;
    static Dimension size = new Dimension(boxWidth == 1 ? 512 : 513, 97);

    static BufferedImage img;
    static JPanel view;

    // draw the image
    static void drawImage(float[] samples) {
        Graphics2D g2d = img.createGraphics();

        int numSubsets = size.width / boxWidth;
        int subsetLength = samples.length / numSubsets;

        float[] subsets = new float[numSubsets];

        // find average(abs) of each box subset
        int s = 0;
        for(int i = 0; i < subsets.length; i++) {

            double sum = 0;
            for(int k = 0; k < subsetLength; k++) {
                sum += Math.abs(samples[s++]);
            }

            subsets[i] = (float)(sum / subsetLength);
        }

        // find the peak so the waveform can be normalized
        // to the height of the image
        float normal = 0;
        for(float sample : subsets) {
            if(sample > normal)
                normal = sample;
        }

        // normalize and scale
        normal = 32768.0f / normal;
        for(int i = 0; i < subsets.length; i++) {
            subsets[i] *= normal;
            subsets[i] = (subsets[i] / 32768.0f) * (size.height / 2);
        }

        g2d.setColor(Color.GRAY);

        // convert to image coords and do actual drawing
        for(int i = 0; i < subsets.length; i++) {
            int sample = (int)subsets[i];

            int posY = (size.height / 2) - sample;
            int negY = (size.height / 2) + sample;

            int x = i * boxWidth;

            if(boxWidth == 1) {
                g2d.drawLine(x, posY, x, negY);
            } else {
                g2d.setColor(Color.GRAY);
                g2d.fillRect(x + 1, posY + 1, boxWidth - 1, negY - posY - 1);
                g2d.setColor(Color.DARK_GRAY);
                g2d.drawRect(x, posY, boxWidth, negY - posY);
            }
        }

        g2d.dispose();
        view.repaint();
        view.requestFocus();
    }

    // handle most WAV and AIFF files
    static void loadImage() {
        JFileChooser chooser = new JFileChooser();
        int val = chooser.showOpenDialog(null);
        if(val != JFileChooser.APPROVE_OPTION) {
            return;
        }

        File file = chooser.getSelectedFile();
        float[] samples;

        try {
            AudioInputStream in = AudioSystem.getAudioInputStream(file);
            AudioFormat fmt = in.getFormat();

            if(fmt.getEncoding() != AudioFormat.Encoding.PCM_SIGNED) {
                throw new UnsupportedAudioFileException("unsigned");
            }

            boolean big = fmt.isBigEndian();
            int chans = fmt.getChannels();
            int bits = fmt.getSampleSizeInBits();
            int bytes = bits + 7 >> 3;

            int frameLength = (int)in.getFrameLength();
            int bufferLength = chans * bytes * 1024;

            samples = new float[frameLength];
            byte[] buf = new byte[bufferLength];

            int i = 0;
            int bRead;
            while((bRead = in.read(buf)) > -1) {

                for(int b = 0; b < bRead;) {
                    double sum = 0;

                    // (sums to mono if multiple channels)
                    for(int c = 0; c < chans; c++) {
                        if(bytes == 1) {
                            sum += buf[b++] << 8;

                        } else {
                            int sample = 0;

                            // (quantizes to 16-bit)
                            if(big) {
                                sample |= (buf[b++] & 0xFF) << 8;
                                sample |= (buf[b++] & 0xFF);
                                b += bytes - 2;
                            } else {
                                b += bytes - 2;
                                sample |= (buf[b++] & 0xFF);
                                sample |= (buf[b++] & 0xFF) << 8;
                            }

                            final int sign = 1 << 15;
                            final int mask = -1 << 16;
                            if((sample & sign) == sign) {
                                sample |= mask;
                            }

                            sum += sample;
                        }
                    }

                    samples[i++] = (float)(sum / chans);
                }
            }

        } catch(Exception e) {
            problem(e);
            return;
        }

        if(img == null) {
            img = new BufferedImage(size.width, size.height, BufferedImage.TYPE_INT_ARGB);
        }

        drawImage(samples);
    }

    static void problem(Object msg) {
        JOptionPane.showMessageDialog(null, String.valueOf(msg));
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(new Runnable() {
            @Override
            public void run() {
                JFrame frame = new JFrame("Box Waveform");
                JPanel content = new JPanel(new BorderLayout());
                frame.setContentPane(content);

                JButton load = new JButton("Load");
                load.addActionListener(new ActionListener() {
                    @Override
                    public void actionPerformed(ActionEvent ae) {
                        loadImage();
                    }
                });

                view = new JPanel() {
                    @Override
                    protected void paintComponent(Graphics g) {
                        super.paintComponent(g);

                        if(img != null) {
                            g.drawImage(img, 1, 1, img.getWidth(), img.getHeight(), null);
                        }
                    }
                };

                view.setBackground(Color.WHITE);
                view.setPreferredSize(new Dimension(size.width + 2, size.height + 2));

                content.add(view, BorderLayout.CENTER);
                content.add(load, BorderLayout.SOUTH);

                frame.pack();
                frame.setResizable(false);
                frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                frame.setLocationRelativeTo(null);
                frame.setVisible(true);
            }
        });
    }
}

注意:为简便起见,该程序加载整个音频文件到存储器。一些JVM可能抛出的OutOfMemoryError 。要解决此问题,运行有如下描述增加堆大小。

Note: for the sake of simplicity, this program loads the entire audio file in to memory. Some JVMs may throw OutOfMemoryError. To correct this, run with increased heap size as described here.