Adding a Column to a DataFrame Based on Web Data in Python in Azure ML

Posted in software by Christopher R. Wirz on Fri Oct 13 2017



Microsoft Azure Machine Learning Studio offers a wide array of functionality for manipulating and analyzing data. Sometimes it is necessary to go through a dataset row by row and add values fetched from a remote source. Here is an example of how to do that in an Execute Python Script module.

Note: The script MUST contain a function named azureml_main which is the entry point for this module.

import pandas as pd
# Import the libraries for the web request
import json
import urllib.request

# The entry point function can contain up to two input arguments:
#   Param: a pandas.DataFrame
#   Param: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):

    # Define the columns you are about to add.
    # It helps to initialize them with the right type.
    dataframe1['code'] = ""
    dataframe1['popularity'] = 0
    
    for index, row in dataframe1.iterrows():
        # Since we are performing a web query, surround it with try-except
        try:
            # Spaces in the name must be URL-encoded
            param = row['name'].replace(" ", "%20")
            # Use the current name to look up airport_code_iata
            with urllib.request.urlopen("http://www.prokerala.com/travel/flight-time/search.unity.php?mode=airport&q=" + param) as url:
                data = json.loads(url.read().decode())
                # Set the data based on row, column
                dataframe1.loc[index,'code'] = data[0]['airport_code_iata']
                dataframe1.loc[index,'popularity'] = data[0]['popularity']
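        except Exception as e:
            # Report which row failed and why, then continue with the next row
            print('An error occurred for row', index, ':', e)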
    
    # Return value must be a sequence of pandas.DataFrame
    return dataframe1,
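
The script above replaces only spaces in the name and sets no timeout on the request. The following is a minimal sketch of one way to make the lookup more robust, not a definitive implementation. It assumes the endpoint still returns the same JSON shape as above. The helper names fetch_json, lookup_airport and add_airport_columns, and the 10-second timeout, are hypothetical choices made for illustration. The name is encoded with urllib.parse.quote, and the web request is passed in as a function so the row logic can be tried without network access.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Endpoint copied from the script above; whether it still responds is not verified here.
SEARCH_URL = "http://www.prokerala.com/travel/flight-time/search.unity.php?mode=airport&q="


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode())


def lookup_airport(name, fetch=fetch_json):
    """Return (code, popularity) for a name, or None if the lookup fails."""
    # quote() percent-encodes spaces and other characters that are unsafe in a URL
    url = SEARCH_URL + urllib.parse.quote(name)
    try:
        data = fetch(url)
        return data[0]["airport_code_iata"], data[0]["popularity"]
    except (OSError, ValueError, LookupError, TypeError) as error:
        # OSError covers network failures and timeouts, ValueError covers bad JSON,
        # LookupError and TypeError cover an empty or unexpected response
        print("Lookup failed for {!r}: {}".format(name, error))
        return None


def add_airport_columns(df, fetch=fetch_json):
    """Return a copy of df with 'code' and 'popularity' columns filled in."""
    result = df.copy()
    result["code"] = ""
    result["popularity"] = 0
    for index, name in result["name"].items():
        found = lookup_airport(name, fetch)
        if found is not None:
            result.loc[index, "code"] = found[0]
            result.loc[index, "popularity"] = found[1]
    return result


def azureml_main(dataframe1=None, dataframe2=None):
    # Return value must be a sequence of pandas.DataFrame
    return add_airport_columns(dataframe1),
```

To try this outside Azure ML, call add_airport_columns with a small DataFrame that has a name column and pass a stand-in for fetch that returns a canned list of dictionaries. Rows whose lookup fails keep the default values.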