Emotion is defined as a response to external stimuli and internal mental representations. It has been characterized as a multidimensional concept with two primary dimensions: valence and arousal. Existing studies have demonstrated that the valence of emotional experience exerts a powerful impact on auditory processing. However, it has also been shown that while negative emotion can improve auditory perception in healthy subjects, patients with depression show deficits in auditory perception. We thus speculated that arousal and valence jointly modulate auditory perception. To examine emotion-driven effects on the auditory response, we induced positive, negative, and neutral emotional states in healthy subjects and recorded auditory steady-state responses (ASSRs) evoked by a 40-Hz chirp sound. We calculated the peak-to-peak amplitude (PPA) and event-related spectral perturbation (ERSP) of the evoked ASSRs and observed that positive emotion significantly enhanced brain responses to auditory stimuli (p < 0.001), whereas ASSRs in the negative state were not significantly enhanced compared with the neutral state. Subsequent regression analysis showed a significant positive multiple linear relationship between the PPA and the ratings of the two emotional dimensions, indicating that arousal and valence jointly, rather than valence in isolation, regulated the synchronous oscillation of the auditory cortex. This finding may help clarify the conflicting results surrounding the role of negative emotions in auditory responses, because depression is generally characterized by low arousal and low valence in daily life, whereas the negative emotion evoked under laboratory conditions is always low in valence but high in arousal.
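
As an illustration only, the two quantities named above can be sketched in Python: the PPA as the maximum minus the minimum of a response waveform, and the multiple linear relationship as an ordinary least squares fit of PPA on valence and arousal ratings. The function names `peak_to_peak` and `fit_ppa_model` are hypothetical, the numbers are synthetic placeholders rather than data from this study, and the sketch is not a description of the authors' actual analysis pipeline.

```python
import numpy as np


def peak_to_peak(waveform):
    """Peak-to-peak amplitude: maximum minus minimum of the waveform."""
    waveform = np.asarray(waveform, dtype=float)
    return float(waveform.max() - waveform.min())


def fit_ppa_model(valence, arousal, ppa):
    """Least squares fit of ppa = b0 + b1 * valence + b2 * arousal.

    Returns the coefficients (b0, b1, b2) and the R^2 of the fit.
    """
    valence = np.asarray(valence, dtype=float)
    arousal = np.asarray(arousal, dtype=float)
    ppa = np.asarray(ppa, dtype=float)

    # Design matrix: intercept column plus the two emotional dimensions.
    X = np.column_stack([np.ones_like(valence), valence, arousal])
    coef, *_ = np.linalg.lstsq(X, ppa, rcond=None)

    residuals = ppa - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((ppa - ppa.mean()) ** 2).sum())
    r_squared = 1.0 - ss_res / ss_tot
    return coef, r_squared


# Synthetic placeholder ratings and amplitudes (NOT data from the study).
valence = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
arousal = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
ppa = 0.5 + 0.2 * valence + 0.3 * arousal  # exactly linear by construction

coef, r_squared = fit_ppa_model(valence, arousal, ppa)
```

Under such a model, a state low in valence but high in arousal can yield a larger predicted PPA than a state low in both, provided both slopes are positive. Testing the significance of the coefficients would need a statistics package and is outside this sketch.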