In this stream, we show how to combine web scraping, OCR, and NLP techniques to construct the Matrix interaction network.
- Scrape the Matrix fandom page with Selenium
- Read the Matrix movie script PDF with PyTesseract
- Extract the characters in each scene with spaCy's rule-based matcher
- Construct and analyze the character co-occurrence network in Neo4j
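
The co-occurrence step above can be sketched in plain Python, independent of Neo4j. This is a minimal illustration, not the code from the notebook: the function name `cooccurrence_edges` and the scene contents are hypothetical, and it assumes the characters in each scene have already been extracted. Each pair of characters that appears in the same scene gets an edge whose weight is the number of shared scenes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(scenes):
    """Count how often each pair of characters appears in the same scene."""
    edges = Counter()
    for characters in scenes:
        # Deduplicate and sort so that (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(set(characters)), 2):
            edges[pair] += 1
    return edges


# Hypothetical scene contents, for illustration only (not taken from the script).
example_scenes = [
    ["Neo", "Trinity"],
    ["Neo", "Morpheus", "Trinity"],
    ["Morpheus", "Neo"],
]

edges = cooccurrence_edges(example_scenes)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted pairs are the kind of edge list that could then be loaded into a graph database as relationships between character nodes.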
Blog: https://towardsdatascience.com/construct-the-matrix-interaction-network-based-on-the-movie-script-738b4fa9b46d
Neo4j Sandbox: https://dev.neo4j.com/try
Colab Notebook: https://github.com/tomasonjo/blogs/blob/master/matrix/MatrixNLP.ipynb
Matrix Characters: https://matrix.fandom.com/wiki/Category:Characters_in_The_Matrix
Follow Tomaz: https://twitter.com/tb_tomaz
Graph Algorithms for Data Science: https://www.manning.com/books/graph-algorithms-for-data-science - use code au35bra for a 35% discount