There’s so much science and nuance behind getting a taxonomy to align and function along with a classification system. The current UI alone tells me that you all deeply understand this and how it relates to the topic of audio.
The following isn’t meant as a soapbox; it’s just meant to “show the math” without assumptions to explain my suggestion and reasoning for it, so anyone can clearly follow along and do gap analysis as well.
Statistically, feedback generally comes from less than 5% of a user base. I firmly believe it is critical to have some method of gathering feedback in context that does not rely on after-action reports (forum posts, email, surveys, etc.), for two reasons:
- This type of feedback often skews negative and/or results in a dog-pile by a vocal minority.
- More importantly, after-action reports are given when the user is on the “cold-state” side of the Hot-Cold Empathy Gap.
When humans are in a hot-state, we sometimes go with initial interpretations or instinctual approaches. Sometimes these are without conscious thought. Many times we think or do things that we would be surprised by or would consider absurd while in a cold-state. This is simply human nature. To achieve your goal of a fully “intuitive interface” that is intended to be used during what is essentially a live performance by a GM, everything a user sees and selects should result in matched expectations while the GM is in a hot-state mindset.
Therefore the best feedback will come from a user when they are in a “hot-state”, i.e. in the middle of using Nova to run a game.
For these reasons, giving users a quick, unambiguous, unobtrusive method in the UI to record contextual feedback on any selectable sound/track, focused on whether the delivered sound matched their expectations from the selection(s) made, is and will continue to be extremely valuable.
Suggestion:
Whenever a clicked box starts new audio that plays for more than 3 seconds, some simple icons appear that can record a yes/no answer to the question “Does what you’ve heard match the expectation you had when you clicked it from the context + picture + description?” Or in other words… did the pictures + text give the user what they thought was going to be delivered to their ear holes?
Critically, the UI needs to make clear that it is not a rating for whether the sound / music is to their personal tastes. It should be focused on feedback about the terminology / taxonomy / classification.
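To make the trigger rule concrete, here is a minimal sketch of how I picture it. Only the 3-second threshold and the yes/no question come from the suggestion itself; the names `Selection`, `should_show_prompt`, and `ExpectationFeedback` are made up for illustration and are not anything from Nova.

```python
from dataclasses import dataclass
from typing import Tuple

# The 3-second threshold comes from the suggestion above.
PROMPT_DELAY_SECONDS = 3.0


@dataclass(frozen=True)
class Selection:
    """A clicked box (hypothetical shape, for illustration only)."""
    path: Tuple[str, ...]      # e.g. ("Location", "Tavern")
    label: str                 # e.g. "A Good Evening"
    starts_new_audio: bool     # clicks that do not start audio never prompt


def should_show_prompt(selection: Selection, seconds_playing: float) -> bool:
    """Show the icons only for new audio that has played for more than 3 seconds."""
    return selection.starts_new_audio and seconds_playing > PROMPT_DELAY_SECONDS


@dataclass(frozen=True)
class ExpectationFeedback:
    """One yes/no answer about expectation match, deliberately not a taste rating."""
    path: Tuple[str, ...]
    label: str
    matched_expectation: bool


def record_feedback(selection: Selection, matched: bool) -> ExpectationFeedback:
    """Tie the answer to the exact selection the user was looking at."""
    return ExpectationFeedback(selection.path, selection.label, matched)
```

Note that the record has no star rating or "liked it" field on purpose: the only thing captured is whether the label and picture set the right expectation.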
I would also recommend focusing on making it an elegant experience for the user rather than adding a “toggle” that universally hides / disables it; to be effective, the ability must be ready at all times with no delay, friction, or forethought to catch the “hot-state” situation. If it’s turned off, then all of those opportunities are lost.
Example of Possible Experience:
I click Location > Tavern > A Good Evening. After 3 seconds of playing, a thumbs up / thumbs down icon pair quickly fades into view above the text “A Good Evening” on the selected box.
Hovering over the thumbs, it shows the question "Did this sound match your expectation?".
Clicking thumbs up shows a tiny “Huzzah! Thank you for the feedback!” and the icons fade out quickly.
Clicking thumbs down presents a third option icon; something like “(optional) tell us why…”. The icons don’t fade again for maybe 10s. Clicking this third option would show the user three points of data on a pop-up, side panel, or some other element:
- The selection “path” (Location > Tavern > )
- Whatever the label of the selection box is (A Good Evening)
- The longer “internal definition” / synopsis of this selection that you used to inspire or match the sounds to.
Essentially, if there were some nuance or details that the designer / Syrinscape team was trying to include and/or assumed, they would be included here. This gives the team a few more words to explain what the current goal is, and lets the user speak to what you are actually basing decisions on rather than their assumption of what you are basing decisions on. Since the UI is going to require constant negotiations with brevity throughout, this balance obviously will need constant attention as things grow.
Then below those is the space for their “why”. A curated list of common answers is best so they can click one (or many) and be done, but this situation may actually benefit from what I call the dreaded “Other” + multi-line text box option to start. They can write how the vibe ain’t right, a commentary on the wording, or whatever nuance instead. Music and sound are highly personal, so giving people a way to express that nuance is nice and could be great, but I can’t say whether it will produce any more actionable insights.
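For anyone who wants to do gap analysis on the flow, the example experience above can be sketched as a small state machine. This is only my reading of it: the 3-second and roughly 10-second timings are from the example, while the state names, `FeedbackPrompt`, and `WhyPanel` are hypothetical and the synopsis text is a placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple

SHOW_AFTER_SECONDS = 3.0   # icons fade in after 3 seconds of playback
LINGER_SECONDS = 10.0      # after a thumbs down, icons stay for "maybe 10s"


class PromptState(Enum):
    HIDDEN = auto()       # nothing shown yet
    ASKING = auto()       # thumbs up / thumbs down visible
    THANKED = auto()      # thumbs up clicked, thank-you shown, icons fade
    WHY_OFFERED = auto()  # thumbs down clicked, optional "tell us why" visible
    WHY_OPEN = auto()     # panel with the three data points and the "why" is open
    DISMISSED = auto()    # user ignored the optional "why", icons faded


@dataclass
class WhyPanel:
    path: Tuple[str, ...]   # the selection "path"
    label: str              # the label of the selection box
    synopsis: str           # the longer internal definition
    reasons: List[str] = field(default_factory=list)  # clicked common answers
    other_text: str = ""    # the dreaded "Other" free-text box


class FeedbackPrompt:
    def __init__(self, path: Tuple[str, ...], label: str, synopsis: str) -> None:
        self.panel = WhyPanel(path, label, synopsis)
        self.state = PromptState.HIDDEN

    def on_playing(self, seconds: float) -> None:
        if self.state is PromptState.HIDDEN and seconds > SHOW_AFTER_SECONDS:
            self.state = PromptState.ASKING

    def thumbs_up(self) -> None:
        if self.state is PromptState.ASKING:
            self.state = PromptState.THANKED

    def thumbs_down(self) -> None:
        if self.state is PromptState.ASKING:
            self.state = PromptState.WHY_OFFERED

    def open_why(self) -> Optional[WhyPanel]:
        """Return the panel only if the optional third icon is currently offered."""
        if self.state is PromptState.WHY_OFFERED:
            self.state = PromptState.WHY_OPEN
            return self.panel
        return None

    def on_idle(self, seconds_since_thumbs_down: float) -> None:
        if (self.state is PromptState.WHY_OFFERED
                and seconds_since_thumbs_down >= LINGER_SECONDS):
            self.state = PromptState.DISMISSED
```

The point of writing it this way is that a thumbs up is always a single click, and the "why" is never forced: ignoring it simply lets the icons fade.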
This information could then be used in reporting, matched up with your analytics, and combined with any additional metrics so that you can clearly surface insights like:
- What was clicked on after the interaction?
- Did the visit result in a thumbs up after a thumbs down was given?
  - How much more was clicked on after the thumbs down?
  - Did the volume get turned down / muted?
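As a rough illustration of how those questions could be answered from a session log, here is a small sketch. The event shape and the event names ("select", "thumbs_up", "thumbs_down", "volume_down", "mute") are assumptions of mine, not anything I know about your analytics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    t: float          # seconds since the session started
    kind: str         # hypothetical: "select", "thumbs_up", "thumbs_down", "volume_down", "mute"
    label: str = ""   # selection label, when relevant


def after_thumbs_down(events: List[Event]) -> Optional[Dict[str, object]]:
    """Summarise what happened after the first thumbs down in a session."""
    ordered = sorted(events, key=lambda e: e.t)
    first_down = next(
        (i for i, e in enumerate(ordered) if e.kind == "thumbs_down"), None
    )
    if first_down is None:
        return None  # no thumbs down in this session

    later = ordered[first_down + 1:]
    selections = [e for e in later if e.kind == "select"]
    return {
        "next_selection": selections[0].label if selections else None,
        "selections_after": len(selections),
        "thumbs_up_after": any(e.kind == "thumbs_up" for e in later),
        "volume_lowered_or_muted": any(
            e.kind in ("volume_down", "mute") for e in later
        ),
    }


session = [
    Event(0.0, "select", "A Good Evening"),
    Event(4.0, "thumbs_down", "A Good Evening"),
    Event(9.0, "select", "Rowdy Crowd"),
    Event(15.0, "thumbs_up", "Rowdy Crowd"),
]
summary = after_thumbs_down(session)
```

Here "Rowdy Crowd" is just an invented label for the example. A thumbs down followed quickly by a different selection and a thumbs up would suggest the sound the user wanted exists but was not where the labels led them to expect it.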