How would you develop a model to identify plagiarism?
Follow the steps below to develop a model that identifies plagiarism:
1. Tokenise the document.
2. Remove stopwords from the tokens, for example with the NLTK library in Python.
3. Build an LDA (Latent Dirichlet Allocation) or LSA (Latent Semantic Analysis) model of the document with the Gensim library, and use it to identify the most relevant words, line by line.
4. Search for those words with the Google Search API.
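The pipeline can be sketched in plain Python without external dependencies. It covers tokenising, stopword removal, per-line keyword ranking and building the search query. Several assumptions apply. A small hand-written stopword set stands in for NLTK's English list (`nltk.corpus.stopwords.words("english")`). TF-IDF weighting, with each line treated as a separate document, is swapped in for the LDA/LSA topic step so that the ranking stays short and deterministic. The search step assumes the Google Custom Search JSON API. The helper names, `k`, and the `YOUR_API_KEY`/`YOUR_ENGINE_ID` placeholders are hypothetical, and the request URL is only built, not sent.

```python
import math
import re
from collections import Counter
from urllib.parse import urlencode

# Assumption: a tiny hand-picked stopword set standing in for
# nltk.corpus.stopwords.words("english"), so no corpus download is needed.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def tokenise(line: str) -> list[str]:
    """Lower-case the line and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", line.lower())


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def top_words_per_line(document: str, k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring words of each non-blank line.

    TF-IDF (treating every line as one 'document') is used here in place of
    an LDA/LSA model; ties are broken alphabetically for determinism.
    """
    lines = [remove_stopwords(tokenise(raw))
             for raw in document.splitlines() if raw.strip()]
    n = len(lines)
    # Document frequency: the number of lines each word appears in.
    df = Counter(word for line in lines for word in set(line))
    result = []
    for line in lines:
        tf = Counter(line)
        # Term frequency times smoothed inverse document frequency.
        scores = {
            w: (count / len(line)) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, count in tf.items()
        }
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:k])
    return result


def build_search_url(words: list[str], api_key: str, engine_id: str) -> str:
    """Build (but do not send) a Custom Search JSON API request URL."""
    params = {"key": api_key, "cx": engine_id, "q": " ".join(words)}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)


if __name__ == "__main__":
    text = (
        "The mitochondria is the powerhouse of the cell.\n"
        "Plagiarism detection compares a document with sources on the web."
    )
    for words in top_words_per_line(text):
        print(words, build_search_url(words, "YOUR_API_KEY", "YOUR_ENGINE_ID"))
```

In a fuller version, a Gensim topic model such as `gensim.models.LdaModel` or `gensim.models.LsiModel` would replace `top_words_per_line`. The returned URLs would then be requested, and the results checked for sources that closely match each line.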