python - Convert JSON object to Pandas DataFrame ensuring that Key is considered as column label - Stack Overflow

I have a Python script that takes its input from the command line. The command is as follows:

python script.py --input [{\\"A\\":\\"322|985\\",\\"B\\":3}]

The idea is to convert the input to a pandas DataFrame. The code below does produce a DataFrame, but it has only a single column named 0, and the value in that column is [{\A\:\322|985\,\B\:3}].

import json
import pandas as pd
import argparse


def validate_input(input_data):

    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is
    
    json_conv = json.dumps(input_data)
    json_data = json.loads(json_conv)
    
    return pd.DataFrame([json_data])  # Convert JSON serializable to DataFrame

def process_data(input_data):
    """
    Function that processes data, only called if dtype is valid.
    """
    validated_data = validate_input(input_data)
    print(validated_data)
    print("Processing data:\n", validated_data)

def main():
    parser = argparse.ArgumentParser(description="Validate and process JSON or Pandas DataFrame input.")
    parser.add_argument("--input", type=str, help="Input data as a JSON string")
    args = parser.parse_args()
    
    try:
        process_data(args.input)  # Proceed with processing only after validation
    except json.JSONDecodeError:
        raise TypeError("Invalid JSON input. Please provide a valid JSON string.")
    

if __name__ == "__main__":
    main()

Running the code below gives the expected output:

pd.DataFrame([{"A":"322|985","B":3}])

Asked Mar 7 at 21:49 by Lopez; edited Mar 7 at 22:09 by Barmar.

1 Answer

You're escaping the backslashes rather than the double quotes, so the double quotes aren't taken literally. The shell treats them as string delimiters and strips them instead of passing them to Python, which is why the script receives [{\A\:\322|985\,\B\:3}].

The simplest fix would be to put the entire argument in single quotes.

python script.py --input '[{"A":"322|985","B":3}]'
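To see the difference between the two command lines without leaving Python, here is a minimal sketch that uses the standard library's shlex.split as a stand-in for the shell. This is an approximation: shlex follows POSIX quoting and escaping rules only, so a real shell (or a non-POSIX one) may treat the unquoted argument differently in other respects.

```python
import json
import shlex

# The command from the question, exactly as typed (raw string keeps the backslashes).
escaped = r'script.py --input [{\\"A\\":\\"322|985\\",\\"B\\":3}]'
# The same command with the whole argument wrapped in single quotes.
quoted = """script.py --input '[{"A":"322|985","B":3}]'"""

# shlex.split applies POSIX-style quoting rules; the last token is the --input value.
escaped_arg = shlex.split(escaped)[-1]
quoted_arg = shlex.split(quoted)[-1]

print(escaped_arg)             # [{\A\:\322|985\,\B\:3}]
print(quoted_arg)              # [{"A":"322|985","B":3}]
print(json.loads(quoted_arg))  # [{'A': '322|985', 'B': 3}]
```

Only the single-quoted form reaches the script as valid JSON; the double quotes in the escaped form are consumed as delimiters and only the backslashes survive.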

There's no need to call json.dumps(input_data). input_data is a JSON string, not data, so it doesn't need to be converted to JSON.

json_data is already a list because the JSON has []. You don't need to wrap it in another list when calling pd.DataFrame().
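The two points above can be checked with the standard library alone. In this short sketch, raw is an assumed name for the string that argparse hands to the script; it is not taken from the question's code.

```python
import json

raw = '[{"A":"322|985","B":3}]'  # assumed: the value of args.input

# dumps() wraps the string in a JSON string literal, and loads() unwraps it again,
# so the round trip returns the original str and nothing has been parsed.
round_tripped = json.loads(json.dumps(raw))
print(type(round_tripped).__name__, round_tripped == raw)  # str True

# Calling loads() directly on the string parses it into a list of dicts.
parsed = json.loads(raw)
print(type(parsed).__name__, parsed)  # list [{'A': '322|985', 'B': 3}]
```

Because the round trip yields a plain string, wrapping it in a list and passing it to pd.DataFrame() produces one row with one unnamed column, which pandas labels 0.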

So the corrected version of validate_input() is:

def validate_input(input_data):

    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is

    json_data = json.loads(input_data)

    return pd.DataFrame(json_data)  # Convert JSON serializable to DataFrame
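For a quick end-to-end check, the sketch below wires the corrected function to argparse and feeds it an explicit argument list instead of a real command line. The helper name parse_frame and the required=True setting are assumptions made for illustration; they are not part of the original script.

```python
import argparse
import json

import pandas as pd


def validate_input(input_data):
    """Return a DataFrame from either a DataFrame or a JSON string."""
    if isinstance(input_data, pd.DataFrame):
        return input_data  # Already a DataFrame, return as is
    return pd.DataFrame(json.loads(input_data))


def parse_frame(argv):
    """Hypothetical helper: parse --input from argv and build the DataFrame."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", type=str, required=True)
    args = parser.parse_args(argv)
    return validate_input(args.input)


# Equivalent to: python script.py --input '[{"A":"322|985","B":3}]'
df = parse_frame(["--input", '[{"A":"322|985","B":3}]'])
print(list(df.columns))  # ['A', 'B']
print(df.shape)          # (1, 2)
```

The JSON keys become the column labels, and the single object in the list becomes the single row.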
