Accepted to Google Summer of Code with NumFOCUS
Hello, everyone!
I am excited to announce that I have been accepted to Google Summer of Code 2021! This summer I will be contributing to the Data Retriever, under the NumFOCUS umbrella, as a GSoC student.
The Data Retriever is a package manager for publicly accessible data. It automatically finds, downloads, and pre-processes publicly available datasets, and stores them in a ready-to-analyze state. However, a number of data providers require an account with an associated login or API key before their data can be accessed programmatically.
The Data Retriever already supports the Kaggle API, which lets users securely install datasets hosted on Kaggle. The goal of this project is to find sources of public data that require a login or API key and to integrate them into the Data Retriever. Two data sources with Python APIs, Socrata and CKAN, have been thoroughly researched and are ready to be added. Users will place the appropriate credentials in a file in their home directory, and the Data Retriever will automatically identify the required credential file and handle the login or API request needed to download the dataset.
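To make the credential-file idea more concrete, here is a minimal Python sketch of how a lookup in the user's home directory could work. It is an illustration only, not the Data Retriever's implementation: the `find_credentials` function and the `CREDENTIAL_FILES` mapping are hypothetical. Only `~/.kaggle/kaggle.json` follows an existing convention (the one used by the Kaggle API); the Socrata and CKAN file names are placeholders.

```python
import json
from pathlib import Path

# Hypothetical mapping from a data source to the credential file a user would
# place in their home directory. Only ~/.kaggle/kaggle.json matches a real
# convention (the Kaggle API's); the Socrata and CKAN names are assumptions.
CREDENTIAL_FILES = {
    "kaggle": Path(".kaggle") / "kaggle.json",
    "socrata": Path(".socrata.json"),
    "ckan": Path(".ckan.json"),
}


def find_credentials(source, home=None):
    """Return the parsed credentials for `source`, or None if no file exists.

    `home` defaults to the current user's home directory; passing it
    explicitly makes the function easy to test.
    """
    home = Path(home) if home is not None else Path.home()
    relative = CREDENTIAL_FILES.get(source)
    if relative is None:
        raise ValueError(f"Unknown data source: {source}")
    path = home / relative
    if not path.is_file():
        # No credentials yet: the caller can tell the user where to put them.
        return None
    with path.open() as fh:
        return json.load(fh)
```

For example, `find_credentials("kaggle")` would return the parsed contents of `~/.kaggle/kaggle.json` (a dict with `username` and `key` entries), or `None` if the file is missing, in which case the tool could ask the user to create it before retrying the download.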
The community bonding period has already begun. My mentors for this project are Henry Senyondo and Ethan White. I am communicating with them through Gitter as we begin to schedule and plan the best course of action for the project.
Stay tuned for more blog posts!