In a previous post, I used Python to read the contents of a large number of TXT documents and store the information in a SQLite database. To expand on that project, I am now creating a MySQL database instead. My hosted server has MySQL, so it will be my default format going forward. Additionally, the MySQL database will be readily available for querying on my hosted server and will be more robust for large tables.
I am using the same data as before, available at https://www.ssa.gov/oact/babynames/limits.html. I am only interested in the National Data at the moment, but may also grab the state-specific data later.
The data comes as a zip file containing a text file for every year of data since 1879. There is also a README supplied describing the format of each text file.
The goal of this project is to convert the data from a series of text files into a database. I want to be able to write Python, PHP, and SQL queries against the data, and may attempt to write an API that can be used from a page on this site.
I am using MySQL to store this data. I do not have access set up to run the Python application from my workstation and write directly to the MySQL instance on my hosted server. The hosted server has Python, so scripts can be executed there; however, I don't have access to install additional packages or the ability to run Python in an editor or shell. To make writing and testing the application easy, I am going to continue editing from PyCharm on my workstation. I don't want to run a MySQL instance on the same host, so I have installed a MySQL server instance on a virtual server running from my workstation in VMware Workstation Player. I have configured the virtual server with a bridged network connection and a static IP on my network.
- Import Python packages

```python
import mysql.connector
import os
```
- Create a database connection
```python
# Connect to the database and get a cursor
conn = mysql.connector.connect(user='username', password='password',
                               host='192.168.1.5', database='names')
c = conn.cursor()
```
- Create a table
```python
# Create table (wrapped in a docstring so it only runs when uncommented;
# columns match the INSERT statement used below)
"""
c.execute('''CREATE TABLE namestats (
    id int(11) NOT NULL,
    name varchar(128) NOT NULL,
    sex char(1) NOT NULL,
    occurrence int(11) NOT NULL,
    year int(4) NOT NULL,
    PRIMARY KEY (id));''')
"""
```
- Read from the table to determine the next ID
```python
# Find the next available id (0 if the table is empty)
c.execute('''SELECT id FROM namestats''')
ids = c.fetchall()
idlist = []
for item in ids:
    idlist.append(item[0])  # fetchall returns one-element tuples
idlist.sort()
if len(idlist) == 0:
    lastid = 0
else:
    lastid = idlist[-1] + 1
```
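Fetching every id just to find the largest one grows with the table; an alternative is to let the database compute the maximum. A sketch of that approach against a hypothetical in-memory SQLite table (since the MySQL instance isn't reachable from everywhere), with the same empty-table fallback:

```python
import sqlite3

# Hypothetical in-memory stand-in for the MySQL namestats table
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE namestats (id INTEGER PRIMARY KEY)')
c.executemany('INSERT INTO namestats VALUES (?)', [(0,), (1,), (2,)])

# MAX(id) returns NULL (None in Python) for an empty table
c.execute('SELECT MAX(id) FROM namestats')
maxid = c.fetchone()[0]
lastid = 0 if maxid is None else maxid + 1
print(lastid)  # → 3
conn.close()
```

The same `SELECT MAX(id)` statement works in MySQL with `%s`-style placeholders instead of `?`.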
- Read the files in a given folder and create a list of the file names
```python
# Note the r to denote the path as a raw string
files = os.listdir(r"C:\Users\file\path")
filelist = []
for item in files:
    if item[-3:] == 'txt':
        filelist.append(item)
filelist.sort()
```
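The `item[-3:]` slice works for the SSA files, but `os.path.splitext` is a bit more robust (it matches the extension rather than any name that happens to end in "txt"). A self-contained sketch using a temporary directory with made-up file names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as path:
    # Create a few dummy files mimicking the SSA layout (names are made up)
    for fname in ('yob1880.txt', 'yob1881.txt', 'NationalReadMe.pdf'):
        open(os.path.join(path, fname), 'w').close()

    # Keep only the .txt files, sorted by name
    filelist = sorted(f for f in os.listdir(path)
                      if os.path.splitext(f)[1] == '.txt')

print(filelist)  # → ['yob1880.txt', 'yob1881.txt']
```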
- Open, read, and add the contents of each file to a tuple. The inner for loop appends data from the reconstructed column data to a new tuple. The file data contains the name, sex, and occurrence information, while the file name provides the year. The index and year are added to the other data to create the final tuple row. The data is written to the database iteratively after each file is read.
```python
rowinfo = []
for file in filelist:
    entries = tuple(open(r"C:\Users\file\path" + "\\" + file, 'r'))
    filename = file[3:7]  # the year portion of e.g. yob1880.txt
    rowinfo = []
    for index, item in enumerate(entries, start=lastid):
        name, sex, occ = item.split(',')
        rowinfo.append([index, name, sex, int(occ.rstrip()), int(filename)])
    lastid = index + 1
    rowinfo = tuple(rowinfo)
    stmt = "INSERT INTO namestats (id,name,sex,occurrence,year) VALUES (%s, %s, %s, %s, %s)"
    c.executemany(stmt, rowinfo)
```
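The per-line parsing can be exercised without the real files. The README describes each line as name,sex,count, and the year comes from characters 3 through 7 of a file name like yob1880.txt; the sample values below are made up for illustration:

```python
# Sample lines in the name,sex,count format the SSA README describes
# (the values here are made up)
entries = ('Mary,F,7065\n', 'Anna,F,2604\n', 'John,M,9655\n')
file = 'yob1880.txt'
year = int(file[3:7])  # characters 3-7 hold the year

lastid = 0
rowinfo = []
for index, item in enumerate(entries, start=lastid):
    name, sex, occ = item.split(',')
    rowinfo.append([index, name, sex, int(occ.rstrip()), year])

print(rowinfo[0])  # → [0, 'Mary', 'F', 7065, 1880]
```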
- Commit and Close the database connection
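With mysql.connector this step is simply `conn.commit()` followed by `conn.close()`. The same pattern with the standard-library sqlite3 module from the previous post, runnable anywhere:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE namestats '
          '(id INTEGER, name TEXT, sex TEXT, occurrence INTEGER, year INTEGER)')
c.execute("INSERT INTO namestats VALUES (0, 'Mary', 'F', 7065, 1880)")
c.execute('SELECT COUNT(*) FROM namestats')
count = c.fetchone()[0]

conn.commit()  # flush pending INSERTs to the database
conn.close()   # release the connection
print(count)   # → 1
```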
- In my previous post, I realized that appending to a single, growing tuple was taxing on resources, so I revised the for loop in this application to write the results of each file to the database before rebuilding the tuple with the next set of information. This allows for iterative writing to the database rather than one large "executemany" clause.
I was able to successfully write the database content in MySQL. My next task will be to export this database from the virtual server I created to my hosted MySQL instance. Beyond that, I will decide how I want to utilize the data on my site.