Adding Name Trends to a Database

Background:

In a previous post, I used Python to read the contents of a large number of TXT documents and store the information in a SQLite database.  To expand on that project, I am now creating a MySQL database instead.  My hosted server has MySQL, so it will be my default format going forward.  Additionally, the MySQL database will be readily available for querying on my hosted server and will be more robust for large tables.

I am using the same data as before.  The data I am accessing is available at https://www.ssa.gov/oact/babynames/limits.html.  I am only interested in the National Data at the moment, but may also grab the state-specific data later.

The data comes as a zip file containing a text file for every year of data since 1880.  There is also a README supplied describing the format of each text file.

The goal of this project is to convert the data from a series of text files into a database.  I want to be able to write Python, PHP, and SQL queries against the data and may attempt to write an API that can be used from a page on this site.

I am using MySQL to store this data as a database.  I do not have remote access configured that would let me run the Python application from my workstation and write the data directly to the MySQL instance on my hosted server.  The hosted server has Python, so scripts can be executed there; however, I don’t have access to install additional packages or the ability to run Python in an editor or shell.  To make writing and testing the application easy, I am going to continue editing from PyCharm on my workstation.  I don’t want to run a MySQL instance on the same host, so I have installed a MySQL server instance on a virtual server running from my workstation in VMware Workstation Player.  I have configured the virtual server with a bridged network connection and a static IP on my network.

Basic Steps:
  1. Import python packages
  2. Create a database connection
  3. Create a table
  4. Read from the table to determine the next ID
  5. Read the files in a given folder and create a list of the file names
  6. Open and read each file, and build a tuple of its contents.  The inner for loop appends rows built from the reconstructed column data.  The file data contains the name, sex, and occurrence information, while the filename provides the year.  The index and year are added to the other data to create the final tuple row.  The rows are written to the database after each file is read.
  7. Commit and Close the database connection
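The steps above might be sketched as follows.  This is a minimal sketch, not my exact script: it assumes the mysql-connector-python package, and the host, credentials, table name, and folder name are placeholder values.

```python
import os


def rows_from_file(text, year, start_id):
    """Turn one yobYYYY.txt file ("Mary,F,7065" per line) into row tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sex, occurrences = line.strip().split(",")
        rows.append((start_id + len(rows), name, sex, int(occurrences), year))
    return rows


def load_all(folder="names"):
    # Assumed driver; imported here so the parsing above works without it.
    import mysql.connector

    conn = mysql.connector.connect(
        host="192.168.1.50",  # hypothetical static IP of the virtual server
        user="names_user", password="secret", database="names")
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS names (
        id INT PRIMARY KEY, name VARCHAR(50), sex CHAR(1),
        occurrences INT, year INT)""")
    # Determine the next ID from whatever is already in the table.
    cur.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM names")
    next_id = cur.fetchone()[0]
    for fname in sorted(os.listdir(folder)):  # files are named yob1880.txt, ...
        if not fname.startswith("yob"):
            continue
        year = int(fname[3:7])  # the filename carries the year
        with open(os.path.join(folder, fname)) as f:
            rows = rows_from_file(f.read(), year, next_id)
        # Write each file's rows right away rather than one giant batch.
        cur.executemany("INSERT INTO names VALUES (%s, %s, %s, %s, %s)", rows)
        next_id += len(rows)
    conn.commit()
    conn.close()
```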
Lessons Learned:
  • In my previous post, I realized that appending to a single, growing tuple was cumbersome on resources, so I revised the for loop for this application to append the results of each file to the database before updating the tuple with the next set of information.  This allows for iterative writing to the database rather than one large “executemany” clause.
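The per-file pattern can be reduced to a small sketch.  Everything here is illustrative: write_batch stands in for the cursor’s executemany call, and the (year, lines) input shape is my own simplification.

```python
def load_iteratively(files_data, write_batch):
    """files_data yields (year, lines) pairs; write_batch is a stand-in
    for e.g. lambda rows: cursor.executemany(INSERT_SQL, rows)."""
    total = 0
    for year, lines in files_data:
        # Build a fresh, small batch of rows for this one file ...
        rows = [tuple(line.strip().split(",")) + (year,)
                for line in lines if line.strip()]
        # ... and flush it to the database immediately, so no single
        # tuple ever has to hold every year's worth of data at once.
        write_batch(rows)
        total += len(rows)
    return total
```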
Future Considerations:

I was able to successfully write the database content in MySQL.  My next task will be to export this database from the virtual server I created to my hosted MySQL instance.  Beyond that, I will decide how I want to utilize the data on my site.

Adding Name Trends to a Database

Background:

In the past, I have worked with databases in order to query information and with data analysis tools in order to explore data relationships.  I want to start collecting my own data of interest so that I can explore analysis in unique ways.  This is my first attempt to collect and store data for such future projects.  I want to start off with something relatively straightforward, so I am choosing a readily available data set to work with.  I am going to be taking name data collected by the US Social Security Administration and adding it to a database.

To begin with, the data I am accessing is available at https://www.ssa.gov/oact/babynames/limits.html.  I am only interested in the National Data at the moment, but may also grab the state-specific data later.

The data comes as a zip file containing a text file for every year of data since 1880.  There is also a README supplied describing the format of each text file.

The goal of this project is to convert the data from a series of text files into a database.  I want to be able to write Python, PHP, and SQL queries against the data and may attempt to write an API that can be used from a page on this site.

I am using SQLite to store this data as a database.

Basic Steps:
  1. Import python packages
  2. Create a database connection
  3. Create a table
  4. Read from the table to determine the next ID
  5. Read the files in a given folder and create a list of the file names
  6. Open and read each file, and add its contents to a tuple.  The inner for loop appends rows built from the reconstructed column data.  The file data contains the name, sex, and occurrence information, while the filename provides the year.  The index and year are added to the other data to create the final tuple row.
  7. Add the tuple to the table
  8. Commit and Close the database connection
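Steps 1–8 might be sketched as follows using Python’s built-in sqlite3 module.  This is a simplified sketch of the approach described above (one growing list of rows, then a single executemany); the folder, file, and table names are placeholders.

```python
import os
import sqlite3


def build_database(folder, db_path="names.db"):
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS names (
        id INTEGER PRIMARY KEY, name TEXT, sex TEXT,
        occurrences INTEGER, year INTEGER)""")
    # Read from the table to determine the next ID.
    cur.execute("SELECT COALESCE(MAX(id), 0) + 1 FROM names")
    next_id = cur.fetchone()[0]
    all_rows = []  # one growing collection of rows, per the steps above
    for fname in sorted(os.listdir(folder)):  # e.g. yob1880.txt, yob1881.txt
        if not fname.startswith("yob"):
            continue
        year = int(fname[3:7])  # the filename carries the year
        with open(os.path.join(folder, fname)) as f:
            for line in f:  # each line looks like "Mary,F,7065"
                if not line.strip():
                    continue
                name, sex, occurrences = line.strip().split(",")
                all_rows.append((next_id, name, sex, int(occurrences), year))
                next_id += 1
    # Add everything to the table in one large executemany, then commit.
    cur.executemany("INSERT INTO names VALUES (?, ?, ?, ?, ?)", all_rows)
    conn.commit()
    conn.close()
```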
Lessons Learned:
  • SQLite was a good choice to test the script functionality.  It is lightweight and doesn’t require an installed service to run.
  • SQLite was not a good choice for the real database storage.  The number of entries ended up being more than 9 million.  This made the table too large to work with.  When attempting to read the table back with python, the connector was never actually able to return any results.  In the future, I will only utilize the SQLite functionality to test the parsing and formatting of my data collection on a subset of the intended data.
  • After some thought, I realized that appending to a single, growing tuple was cumbersome on resources, so I will revise the for loop to append the results of each file to the database before updating the tuple with the next set of information.  This will allow for iterative writing to the database rather than one large “executemany” clause.
Future Considerations:

Ultimately, this was successful; however, the result was not useful for my particular scenario.  The SQLite approach is handy for quick testing, but I am interested in doing something with the data in a web-ready format.  Since I want to move this database to my hosted web server, I need to get the data into a MySQL format.  I did not find a straightforward way to convert between the formats, so I am going to rewrite the script to store the data in MySQL in a future post.
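For reference, one route a conversion could take in principle is streaming rows out of the SQLite file and into MySQL in batches.  This is only a hypothetical sketch, not a tested migration: mysql_cursor would come from a package such as mysql-connector-python, and the table name and columns are placeholders.

```python
import sqlite3


def migrate(sqlite_path, mysql_cursor, batch_size=10000):
    """Copy the names table from a SQLite file into MySQL in batches.
    mysql_cursor is a DB-API cursor, e.g. from mysql.connector.connect(...)."""
    src = sqlite3.connect(sqlite_path)
    cur = src.execute("SELECT id, name, sex, occurrences, year FROM names")
    copied = 0
    while True:
        # Stream a batch at a time instead of loading 9 million rows at once.
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        mysql_cursor.executemany(
            "INSERT INTO names VALUES (%s, %s, %s, %s, %s)", batch)
        copied += len(batch)
    src.close()
    return copied
```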