    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/124874

    Title: 以兩層式機器學習進行連網設備識別
    Two-Level Machine Learning for Network Enabled Devices Identification
    Authors: 吳明倫
    Wu, Ming-Lun
    Contributors: 胡毓忠
    Hu, Yuh-Jong
    Wu, Ming-Lun
    Keywords: Internet of Things
    Network Enabled Devices
    Cyber Security
    Two-level Machine Learning
    Semi-supervised Learning
    Network Scan Data
    Support Vector Machine
    Random Forest
    Binary Classifier
    Date: 2019
    Issue Date: 2019-08-07 16:36:36 (UTC+8)
    Abstract: With the rapid development of Internet of Things (IoT) technology, the number of network enabled devices on the Internet has grown explosively and the services they provide have become more diverse, making people's lives more convenient. However, poor product design and weak security protection have led to a steady stream of incidents in which attackers exploit device vulnerabilities, so home and corporate networks that are full of network enabled devices face major security threats. Identifying the devices on a target network, in order to learn how many potentially risky network enabled devices are connected to it, is the first step of security protection. This study explores two-level machine learning for network enabled device data, which is large in volume and hierarchically structured, and compares it with the commonly used single-level machine learning. It also draws on semi-supervised learning to explore whether devices classified as unknown can be processed automatically.

    This study uses the Censys network scan dataset to train binary classifiers with two classification algorithms, Support Vector Machine and Random Forest, and then uses them to classify network enabled device data. Following the semi-supervised learning concept, it attempts to find the best parameters for a density-based clustering algorithm that processes devices classified as unknown. Finally, a number of simulation experiments verify and compare the differences between the two classification algorithms, and between single-level and two-level machine learning, in this application. The study then reports quantitative and qualitative observations on the experimental results.
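The two-level idea in the abstract can be illustrated with a small, self-contained sketch. Everything specific in it is an assumption made for illustration and does not come from the thesis: the hierarchy (device type at level 1, brand at level 2), the two-dimensional feature vectors, the labels, and all class names are hypothetical. The binary classifier is a deliberately trivial centroid-and-radius rule standing in for the Support Vector Machine and Random Forest classifiers the study trains. What the sketch shows is only the routing: one binary classifier per label at each level, with a sample marked "unknown" when no classifier accepts it.

```python
from math import dist
from statistics import fmean


class CentroidBinary:
    """Stand-in binary classifier (NOT an SVM or Random Forest).

    Accepts a sample when it lies within `slack` times the largest
    training distance from the centroid of the positive samples.
    """

    def __init__(self, positives, slack=1.5):
        self.center = [fmean(col) for col in zip(*positives)]
        self.radius = slack * max(dist(p, self.center) for p in positives)

    def score(self, x):
        # Positive score means "accepted"; larger means more confident.
        return self.radius - dist(x, self.center)


class OneVsRest:
    """One binary classifier per label; "unknown" if none accepts."""

    def __init__(self, samples, labels):
        self.models = {}
        for label in sorted(set(labels)):
            positives = [x for x, y in zip(samples, labels) if y == label]
            self.models[label] = CentroidBinary(positives)

    def predict(self, x):
        label, score = max(
            ((name, m.score(x)) for name, m in self.models.items()),
            key=lambda pair: pair[1],
        )
        return label if score > 0 else "unknown"


class TwoLevel:
    """Level 1 picks the coarse label, level 2 the fine label within it."""

    def __init__(self, samples, coarse, fine):
        self.level1 = OneVsRest(samples, coarse)
        self.level2 = {}
        for c in sorted(set(coarse)):
            idx = [i for i, y in enumerate(coarse) if y == c]
            self.level2[c] = OneVsRest(
                [samples[i] for i in idx], [fine[i] for i in idx]
            )

    def predict(self, x):
        c = self.level1.predict(x)
        if c == "unknown":
            return ("unknown", "unknown")
        return (c, self.level2[c].predict(x))


# Hypothetical toy data: (feature vector, device type, brand).
rows = [
    ((1.0, 1.0), "camera", "brand_a"), ((1.2, 0.9), "camera", "brand_a"),
    ((0.9, 1.1), "camera", "brand_a"), ((3.0, 1.0), "camera", "brand_b"),
    ((3.1, 1.2), "camera", "brand_b"), ((2.9, 0.8), "camera", "brand_b"),
    ((10.0, 10.0), "router", "brand_c"), ((10.2, 9.9), "router", "brand_c"),
    ((9.8, 10.1), "router", "brand_c"), ((13.0, 10.0), "router", "brand_d"),
    ((13.1, 10.2), "router", "brand_d"), ((12.9, 9.8), "router", "brand_d"),
]
samples = [r[0] for r in rows]
model = TwoLevel(samples, [r[1] for r in rows], [r[2] for r in rows])

print(model.predict((1.1, 1.0)))    # ('camera', 'brand_a')
print(model.predict((2.0, 1.0)))    # ('camera', 'unknown')
print(model.predict((50.0, 50.0)))  # ('unknown', 'unknown')
```

The second query shows the case the abstract is concerned with: the sample is recognised at the coarse level but no fine-level classifier accepts it. In the study's setup, samples left as unknown are what the density-based clustering step would then try to group.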
    Description: Master's degree
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0106753015
    Data Type: thesis
    DOI: 10.6814/NCCU201900635
    Appears in Collections: [Department of Computer Science] Theses and Dissertations

    Files in This Item:

    File        Size     Format
    301501.pdf  2085 KB  Adobe PDF
