sample rate and bit depth are two different things. 16 bit is what CD quality audio uses. it's possible that SNES samples are 8 bit (I don't actually know), but that wouldn't make a great deal of audible difference by itself, it just means the sounds would have a higher noise floor (less room for dynamics). I think the SNES sample rate is 32kHz, which is only a little bit duller than CD quality (44.1kHz). either way, that alone isn't responsible for the sound quality of the SNES, because we're not dealing with streamed audio here, we're dealing with sequences + samples, and it's the samples that are generally small, since there's only a limited amount of space on the cartridges and in the sound RAM (64KB at any one time/per song, unless samples are swapped in and out of RAM on the fly).
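To put a rough number on the "higher noise floor" point, here's a minimal sketch that quantizes a full-scale sine to 8 and 16 bits and measures the signal-to-noise ratio. The function names, the 997 Hz test tone and the plain rounding quantizer are my own assumptions for illustration; this is not how the SNES actually stores or decodes its samples.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to the nearest step of a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bit, 32767 for 16 bit
    return round(x * levels) / levels

def snr_db(bits, rate=32000, freq=997.0):
    """Signal-to-noise ratio (dB) of one second of full-scale sine after quantizing."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(rate):  # one second of audio at `rate` samples per second
        x = math.sin(2 * math.pi * freq * i / rate)
        err = quantize(x, bits) - x  # rounding error = the added noise
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

snr_8bit = snr_db(8)
snr_16bit = snr_db(16)
print(f"8 bit:  {snr_8bit:.1f} dB")
print(f"16 bit: {snr_16bit:.1f} dB")
```

The gap should come out at roughly 6 dB per extra bit. Note that the sample rate doesn't appear in that relationship at all: bit depth sets the noise floor, sample rate sets how high the frequencies can go.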
for SNES and rompler samples generally, parts of the sounds are recorded from real instruments or synthesizers, sometimes just the attack transients, so that you get the character of a trumpet or a vibraphone or whatever. most of the time the rest of the sound is just a short loop, shaped with ADSR settings. that's what creates the characteristic sound. believe me, if you knock the attack transient off a lot of these sounds, they're just going to sound like chiptunes/simple waveforms.
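Here's a rough sketch of the attack + short loop idea. `render_note` and the stand-in data are hypothetical: the "attack" is a made-up decaying burst rather than a real recording, and the envelope is just a linear release, not a full ADSR.

```python
import math

def render_note(attack, loop, total_len, release_len):
    """Play `attack` once, then repeat `loop`, fading out over the last `release_len` samples."""
    out = []
    for i in range(total_len):
        if i < len(attack):
            s = attack[i]
        else:
            s = loop[(i - len(attack)) % len(loop)]  # wrap around the loop points
        gain = min(1.0, (total_len - i) / release_len)  # 1.0 until the release starts
        out.append(s * gain)
    return out

LOOP_LEN = 64
ATTACK_LEN = 256

# Stand-in loop: a single cycle of a sine wave.
loop = [0.6 * math.sin(2 * math.pi * i / LOOP_LEN) for i in range(LOOP_LEN)]

# Stand-in attack: the same tone plus an extra harmonic that dies away.
attack = [
    0.6 * (math.sin(2 * math.pi * i / LOOP_LEN)
           + (1 - i / ATTACK_LEN) * math.sin(2 * math.pi * 7 * i / LOOP_LEN))
    for i in range(ATTACK_LEN)
]

note = render_note(attack, loop, total_len=32000, release_len=4000)
stored = len(attack) + len(loop)
print(f"{stored} stored samples produce a {len(note)}-sample note")
```

Everything after the first 256 samples is the same 64-sample cycle over and over, which is why the tail of these instruments sounds like a simple waveform.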
I was reading an interview about Super Mario Kart
wouldn't be the one conducted by Dave Harris would it?

Super Bomberman 5's Space Station is actually someone saying "Hit it!", but that instrument doesn't sound muffled at all
It isn't a very big sample. Possibly it was sampled at a low pitch and then upsampled in the song, I don't know. There's nothing particularly special or "16 bit" about it. It probably is a bit muffled compared to where it was sampled from though.
If you want to hear how muffled sounds can be, check out Tales of Phantasia's opening song, the one that can't be dumped to SPC. It has a vocal line. It's seriously muffled because the samples had to be small, likely they were sampled at high pitch and then downsampled for the song.
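A back-of-the-envelope way to see why pitching a small sample down makes it muffled: a sample can't contain anything above half its sample rate, and playing it slower drags that ceiling down with it. `top_frequency_hz` and the 8000 Hz figure are hypothetical numbers for illustration, not values taken from the game.

```python
def top_frequency_hz(stored_rate_hz, pitch_ratio):
    """Highest frequency a sample can contain when played at `pitch_ratio` times its original speed."""
    nyquist = stored_rate_hz / 2  # ceiling at the original pitch
    return nyquist * pitch_ratio  # slower playback scales every frequency down

at_recorded_pitch = top_frequency_hz(8000, 1.0)
octave_down = top_frequency_hz(8000, 0.5)
print(at_recorded_pitch, octave_down)
```

Compare that with the roughly 16 kHz ceiling a 32 kHz output could carry, and the dullness makes sense.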
When you synthesize an instrument, what determines the kind of instrument that it will be (like brass or electric guitar)?
The way it's designed, the kind of harmonics it has. That's kind of a complex question really. Synthesis is a big area and there are lots of ways to get the qualities of certain instruments: additive synthesis, FM, ADSRs, LFOs etc.
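As a tiny illustration of the "kind of harmonics" part, here's an additive synthesis sketch: same pitch, two different harmonic recipes. The `additive` helper and both recipes are my own simplified assumptions (real instruments also change their harmonics over time).

```python
import math

def additive(harmonic_amps, freq, rate, n):
    """Sum sine waves at freq, 2*freq, 3*freq... weighted by `harmonic_amps`."""
    out = []
    for i in range(n):
        t = i / rate
        out.append(sum(
            amp * math.sin(2 * math.pi * (k + 1) * freq * t)
            for k, amp in enumerate(harmonic_amps)
        ))
    return out

RATE = 32000
FREQ = 250.0  # gives exactly 128 samples per cycle at 32 kHz

# Every harmonic at 1/k: a bright, sawtooth-like tone.
bright = additive([1 / k for k in range(1, 9)], FREQ, RATE, 256)

# Odd harmonics only: a hollow, square-like tone.
hollow = additive([1 / k if k % 2 else 0.0 for k in range(1, 9)], FREQ, RATE, 256)
```

Both lists are the same note at the same sample rate; only the mix of harmonics differs, and that is what the ear reads as a different instrument.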
If sample rates don't dictate how many bits an instrument is, then what does, and why do 16-bit samples take up less space than samples of real instruments? If I were trying to synthesize an instrument, how would I make it 8-bit or 16-bit or however many bits real life is?
as I explained, they're not "16 bit samples", but I get what you mean. they take up less space because there's less actual audio data than you'd get from sampling an entire instrument note: just an attack and a short loop between loop points.
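The arithmetic, roughly, assuming plain uncompressed PCM and some made-up lengths (a 2 second note versus a 2000-sample attack plus a 64-sample loop), set against 64 KB (65,536 bytes) of sound RAM:

```python
AUDIO_RAM_BYTES = 64 * 1024  # 65,536 bytes

def pcm_bytes(num_samples, bits):
    """Size in bytes of uncompressed mono PCM."""
    return num_samples * bits // 8

RATE = 32000
full_note = pcm_bytes(2 * RATE, 16)          # a whole 2 second note
attack_plus_loop = pcm_bytes(2000 + 64, 16)  # short attack + one loop

print(full_note, attack_plus_loop, AUDIO_RAM_BYTES)
```

One fully sampled note wouldn't even fit in RAM, while the attack + loop version leaves room for a whole set of instruments.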
And also a non-8-bit chiptune if it isn't too much trouble?
sorry, you're really confusing lots of different concepts. 8-bit and 16-bit, when used to describe the NES and SNES, refer to the computing architecture of the systems themselves. it has nothing to do with the audio quality. a chiptune written at 16 bit 44.1kHz quality but still using a very simple subtractive synth is going to sound pretty similar to one using 8 bit 22kHz.
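A quick sketch of that last point, using the simplest case I can think of: a naive full-scale square wave rendered at both qualities. The helpers are hypothetical and the oscillator is deliberately crude (no filtering or band-limiting).

```python
def square(freq, rate, seconds=1):
    """Naive square wave: +1.0 for the first half of each cycle, -1.0 for the second."""
    return [1.0 if (i * freq / rate) % 1.0 < 0.5 else -1.0
            for i in range(int(rate * seconds))]

def quantize(x, bits):
    """Round a sample in [-1.0, 1.0] to the nearest step of a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1
    return round(x * levels) / levels

lo_fi = [quantize(s, 8) for s in square(440, 22050)]
hi_fi = [quantize(s, 16) for s in square(440, 44100)]

print(sorted(set(lo_fi)), sorted(set(hi_fi)))
```

The waveform only ever uses two levels, so the extra bit depth has nothing to add, and the higher sample rate mostly just places the edges more precisely. The character comes from the synth, not from the bit count.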