Adding Name Trends to a Database


In a previous post, I used python to read the contents of a large number of TXT documents in order to store the information in a SQLite database.  To expand upon that project, I am now creating a MySQL database instead.  My hosted server has MySQL so it will be the default format I will use going forward.  Additionally, the MySQL database will be readily available for querying on my hosted server and will be more robust for large tables.

I am using the same data as previously.  The data I am accessing is available at  I am only interested in the National Data at the moment, but may also grab the state specific data later.

The data comes as a zip file containing a text file for every year of data since 1879.  There is also a README supplied describing the format of each text file.

The goal of this project is to convert the data from a series of text files into a database.  I want to be able to write python, php, and sql queries against the data and may attempt to write an API that can be used from a page on this site.

I am using MySQL to store this data as a database.  I do not have access set up so that I can run the python application from my workstation and output the data directly to the MySQL instance running on my hosted server.  The hosted server has python, so that python scripts can be executed; however, I don’t have access to install additional packages or the ability to run python in an editor or shell.  To make writing and testing the application easy, I am going to continue editing from PyCharm on my workstation.  I don’t want to run a MySQL instance on the same host, so I have installed a MySQL server instance on a virtual server running from my workstation in VMware Workstation Player.  I have configured the virtual server to have a bridged network connection and a static IP on my network.

Basic Steps:
      1. Import python packages
      2. Create a database connection
      3. Create a table
      4. Read from the table to determine the next ID
      5. Read the files in a given folder and create a list of the file names
      6. Open, read, and add the contents of each file to a tuple.  The inner for loop appends data to a new tuple from the reconstructed column data.  The file data contains the name, sex, and occurrence information, while the filename provides the year data.  The index and year are added to the other data to create the final tuple row.  The data is written to the database iteratively after each file is written.
      7. Commit and Close the database connection
      Lessons Learned:
      • In my previous post, I realized that appending to a single, growing tuple was cumbersome on resources, so I revised the for loop for this application to append the results of each file to the database before updating the tuple with the next set of information.  This allows for iterative writing to the database rather than one large “executemany” clause.
      Future Considerations:

      I was able to successfully write the database content in MySQL.  My next task will be to export this database from the virtual server I created to my hosted MySQL instance.  Beyond that, I will decide how I want to utilize the data on my site.

Leave a Reply

Your email address will not be published. Required fields are marked *