Project Description:
In modern agriculture, pesticides are essential for protecting crops and maximizing yield. More than 7,000 pesticides are registered in the United States, and although the U.S. Environmental Protection Agency (EPA) publishes pesticide information in the form of PDF labels, users have had to extract that information manually. Because these labels vary significantly in structure and formatting, standard extraction approaches have failed to produce accurate, structured information about pesticides. Our group therefore plans to use modern AI technology to build an automated system that extracts pesticide information and stores it in a structured database. The project is a complete data pipeline with four parts: label acquisition, PDF-to-text parsing, data extraction with an AI model, and a database.
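The four-stage pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation. Every function name, record field, and the sample label text are hypothetical. The label download, PDF parser, and AI model are replaced by stubs: the extraction stub uses a simple "KEY: value" line parser rather than a model. SQLite stands in for whatever database the project ultimately uses.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PesticideRecord:
    """Hypothetical structured record; field names are illustrative only."""
    epa_reg_no: str
    product_name: str
    active_ingredient: str


def acquire_labels() -> dict[str, bytes]:
    """Stage 1 (hypothetical stub): raw PDF label bytes keyed by registration number.
    A real system would download labels from the EPA instead."""
    return {"EXAMPLE-001": b"%PDF-stub"}


def pdf_to_text(pdf_bytes: bytes) -> str:
    """Stage 2 (hypothetical stub): a real system would call a PDF parser here."""
    return "PRODUCT NAME: Example Herbicide\nACTIVE INGREDIENT: Example Ingredient"


def extract_fields(reg_no: str, text: str) -> PesticideRecord:
    """Stage 3 (hypothetical stub): stands in for the AI extraction model.
    Here a plain 'KEY: value' line parser is used instead of a model."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only lines that actually contain a colon
            fields[key.strip().upper()] = value.strip()
    return PesticideRecord(
        epa_reg_no=reg_no,
        product_name=fields.get("PRODUCT NAME", ""),
        active_ingredient=fields.get("ACTIVE INGREDIENT", ""),
    )


def store(records: list[PesticideRecord], conn: sqlite3.Connection) -> None:
    """Stage 4: write records into a structured table (SQLite for illustration)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pesticides ("
        "epa_reg_no TEXT PRIMARY KEY, product_name TEXT, active_ingredient TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pesticides VALUES (?, ?, ?)",
        [(r.epa_reg_no, r.product_name, r.active_ingredient) for r in records],
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> list[PesticideRecord]:
    """Chain the four stages: acquire -> parse -> extract -> store."""
    records = [
        extract_fields(reg_no, pdf_to_text(pdf))
        for reg_no, pdf in acquire_labels().items()
    ]
    store(records, conn)
    return records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    print(conn.execute("SELECT * FROM pesticides").fetchall())
```

Keeping each stage behind its own function means a stub can later be swapped for a real PDF parser or model call without changing the rest of the pipeline.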