    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/125641

    Title: 利用卷積式注意力機制語言模型為影片生成鋼琴樂曲
    InverseMV: Composing Piano Scores with a Convolutional Video-Music Transformer
    Authors: 林鑫彤
    Lin, Chin-Tung
    Contributors: 沈錳坤
    Shan, Man-Kwan
    Lin, Chin-Tung
    Keywords: Music generation for video
    Video-Music Transformer
    VMT Model
    Convolutional Video-Music Transformer
    Date: 2019
    Issue Date: 2019-09-05 16:14:39 (UTC+8)
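    Abstract: In recent years, smartphone camera technology has matured, and with the rise of social networking sites such as Facebook and Instagram, users can easily shoot high-quality photos and videos on their phones and share them online. A high-traffic video usually has music to match it, but most people are not professional music editors. Limited in their access to musical material and in their musical sensitivity, they often have difficulty choosing a soundtrack for a video. Using existing music as a soundtrack is also constrained by copyright, so automatic music generation for video soundtracks is set to become a new research trend.
    With the rapid development of neural networks (NN) in recent years, many studies have begun using neural network models to generate symbolic music, but to the best of our knowledge no one has yet attempted to generate music for video. In the absence of an existing dataset, we manually collected and annotated a pop music dataset to serve as training data for our model. Given the success of the attention-based Transformer model on natural language processing (NLP) problems, and given that symbolic music generation has much in common with language generation, this study proposes VMT (Video-Music Transformer), a model that automatically generates a soundtrack for a video: it takes the video's frame sequence as input and generates the corresponding symbolic piano music. Our experimental results show that the VMT model outperforms a sequence-to-sequence model in both musical fluency and how well the music matches the video.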
    With the wide popularity of social media such as Facebook, Twitter, Instagram, and YouTube, and with advances in mobile photography, social media users tend to watch and share videos rather than text. People want their videos to achieve a high click-through rate. However, such videos require strong editing skills and well-matched music, both of which are difficult for most people. On top of that, people creating soundtracks often do not own the rights to the music they use. Using music generated by a model instead of existing music helps avoid copyright infringement.
    The rise of deep learning has produced much work that uses neural network models to generate symbolic music. However, to the best of our knowledge, no prior work composes music for video, and no dataset of paired video and music exists. Therefore, we release a new dataset comprising over 7 hours of piano scores, with fine alignment between pop music videos and MIDI files. We propose VMT (Video-Music Transformer), a model that generates piano scores from video frames. We evaluate it against a seq2seq model and obtain better results in both musical smoothness and relevance to the video.
    Description: Master's thesis
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0105753023
    Data Type: thesis
    DOI: 10.6814/NCCU201901153
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File         Size      Format
    302301.pdf   3086 Kb   Adobe PDF

    All items in 政大典藏 are protected by copyright, with all rights reserved.
