Foundation of ANIMUSIC
In the physical world, sound is produced as a result of some stimulus. A drum sounds when the drum head is struck by some moving object (stick, mallet, hand, ...). A guitar sounds when its string vibrates as the result of some other object that moved (pick, finger, bow, ...).
Skipping the whole subject of sound arising from vibrations traveling through a medium (usually air, unless you're a whale), and bypassing the whole discussion about the music really only existing in one's brain, there's one critical fundamental aspect that I personally have embraced as the fundamental principle of the technique used by ANIMUSIC...
Cause and Effect
The Effect is the sound. What was its Cause? Something moved resulting in something else vibrating. Obviously, the Cause happens before the Effect (and may continue during the Effect, but not always).
Cause: drum stick hits drum
Effect: short burst of sound
Cause: guitar pick plucks string
Effect: string vibrates making a tone
Anticipation and Follow-Through
Anyone who's studied classic animation fundamentals has learned about the two bookends of an animated motion.
- Anticipation (e.g. a character gets ready to throw a ball by contorting its body in whimsical ways, indicating what's about to happen)
- Follow-Through (e.g. a different character catches the ball, stumbling backwards awkwardly, probably making a face).
In the most general sense, I like to think of it as:
- Cause -> Effect
- Anticipation -> Follow-Through
The audio drives the visuals. Thus the visuals "react to" the audio data. This is typically done by temporally filtering discrete buckets along the audible frequency spectrum, which produces a stream of N floating-point variables that can be used to drive any desired graphical parameter.
Typical parameters include object height or other scaling, particle emission parameters, brightness, glow, or other shader parameters, etc.
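A rough sketch of that pipeline, assuming NumPy — the frame size, bucket count, and attack/release constants here are illustrative choices, not ANIMUSIC's actual values:

```python
import numpy as np

def spectrum_buckets(frame, n_buckets=8):
    """Mean FFT magnitude in n_buckets coarse slices of the spectrum."""
    mags = np.abs(np.fft.rfft(frame))
    return np.array([s.mean() for s in np.array_split(mags, n_buckets)])

def smooth(prev, current, attack=0.9, release=0.15):
    """One-pole temporal filter: rise fast on transients, fall slowly.

    Fast attack keeps drum hits sharp; slow release lets the driven
    graphics decay smoothly instead of flickering frame to frame.
    """
    alpha = np.where(current > prev, attack, release)
    return prev + alpha * (current - prev)

# Demo: feed a few video frames' worth of a 440 Hz tone through the chain.
sr, n = 44100, 1024
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 440 * t)
state = np.zeros(8)
for _ in range(10):
    state = smooth(state, spectrum_buckets(frame))
# Each state[i] can now drive bar height, glow, particle emission rate, ...
```

The asymmetric attack/release is the "temporal filtering" part: raw per-frame bucket energies jitter badly, so graphics driven directly by them look noisy.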
- FFT code to magically transform an audio stream into an array of float streams is widely available.
- Plugging it in and using it requires no understanding of the amazingly brilliant mathematics churning away inside
- It's trivial to just drop some audio in and see what happens. With some audio content (e.g. certain EDM subgenres, especially those with spectrally isolated Kick and Snare sounds) one can adjust a couple of buckets to pull out those critical aspects and quickly have a usable video stream perfectly synced in real time to the music.
- Fun stuff. No question. VJs love it. Shader developers have some parameters to drive dynamic aspects. Instant Wow.
- Working with a finished audio mix, one cannot truly isolate even the Kick or Snare, since there are invariably other instruments moving in and out of any given bucket, which smears the data so that only a few musical features (if any) are noticeable.
- If one has access to individual audio tracks, this can be overcome, at the expense of losing the simplicity of using a single array of buckets from a single finished audio mix (usually even combining L and R channels to get one mono audio stream)
- All the buckets really give you is an averaged momentary energy level of a slice of the frequency spectrum. There is no continuity to a given sound or instrument as it, for example, climbs a scale of notes. The resulting data is seen as one frequency bucket diminishing while the next one increases.
- This is especially unsatisfying if the adjacent buckets are used to drive graphically unrelated things (which is often the case)
- Anything this easy invites lots of thoughtless experimentation, producing a massive quantity of kinda interesting wiggly stuff that sorta wiggles with the music sometimes, and doesn't hold anyone's attention too long.
- Thus it's often thrown in with lots of other freaky effects which may be even less related to the music
- Nothing wrong with any of the above; this approach fulfills the purpose of adding a certain ingredient to the atmosphere, one which tickles two senses with just enough correlation for our brains to light up a little brighter sometimes.
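As a concrete illustration of the "adjust a couple of buckets to pull out the Kick and Snare" trick mentioned above, here is a minimal sketch (assuming NumPy; the band edges and thresholds are hypothetical hand-tuned guesses, and a real mix would smear these bands exactly as described):

```python
import numpy as np

def band_energy(frame, sr, lo_hz, hi_hz):
    """Average windowed-FFT magnitude between lo_hz and hi_hz."""
    mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    band = mags[(freqs >= lo_hz) & (freqs < hi_hz)]
    return float(band.mean()) if band.size else 0.0

def kick_snare_triggers(frame, sr, kick_thresh=5.0, snare_thresh=5.0):
    """Crude gate: fire when a band's energy clears a hand-tuned threshold.

    Band edges and thresholds are guesses to be tweaked per track --
    this is the 'adjust a couple of buckets' step.
    """
    kick = band_energy(frame, sr, 40, 120) > kick_thresh          # low thump
    snare = band_energy(frame, sr, 1500, 4000) > snare_thresh     # bright crack
    return kick, snare

# Demo with one frame of a synthetic low thump:
sr, n = 44100, 1024
t = np.arange(n) / sr
kick, snare = kick_snare_triggers(np.sin(2 * np.pi * 60 * t), sr)
```

With spectrally isolated material this works well; with a dense finished mix, other instruments bleed into both bands and the gates fire spuriously, which is the smearing problem described above.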