I have to write a script that parses an XML input file into a JSON file. I did my best, but I would appreciate it if you could review it and help me improve it. One constraint is that I am not allowed to use objectify or other libraries that convert the file directly. The script has to extract at least these properties:

  1. Seat/Element type (Seat, Kitchen, Bathroom, etc.)

  2. Seat id (17A, 18A)

  3. Seat price

  4. Cabin class

  5. Availability

By the way, I couldn't work out how to get the seat/element type for each seat.
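To make the missing piece concrete, here is a minimal sketch of one way the element type might be derived. It assumes (this is a guess from the sample data, not something taken from the OTA specification) that a `ns:Features` child with `extension="Lavatory"` marks a bathroom and that `GalleyInd="true"` marks a kitchen; the helper name `element_type` and the inline `SAMPLE` snippet are made up for illustration.

```python
import xml.dom.minidom

# Trimmed-down copy of two SeatInfo nodes from row 7, for illustration only.
SAMPLE = """<ns:RowInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/"
    CabinType="Economy" RowNumber="7">
  <ns:SeatInfo GalleyInd="false">
    <ns:Summary AvailableInd="false" SeatNumber="7A"/>
    <ns:Features extension="Lavatory">Other_</ns:Features>
  </ns:SeatInfo>
  <ns:SeatInfo GalleyInd="false">
    <ns:Summary AvailableInd="false" SeatNumber="7F"/>
    <ns:Features>Window</ns:Features>
  </ns:SeatInfo>
</ns:RowInfo>"""


def element_type(seat_info):
    """Guess the element type of one ns:SeatInfo node (assumed rule)."""
    # Assumption: GalleyInd="true" means the grid cell is a kitchen.
    if seat_info.getAttribute("GalleyInd") == "true":
        return "Kitchen"
    # Assumption: a Features child with extension="Lavatory" means a bathroom.
    for feature in seat_info.getElementsByTagName("ns:Features"):
        if feature.getAttribute("extension") == "Lavatory":
            return "Bathroom"
    return "Seat"


doc = xml.dom.minidom.parseString(SAMPLE)
types = {}
for seat_info in doc.getElementsByTagName("ns:SeatInfo"):
    summary = seat_info.getElementsByTagName("ns:Summary")[0]
    types[summary.getAttribute("SeatNumber")] = element_type(seat_info)

print(types)
```

If this guess about the data is right, the result could be stored under the `seat` key instead of the `getElementsByTagName('ns:')` call I currently have there.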

    import json
    import xml.dom.minidom
    from collections import OrderedDict

    xmlFile = xml.dom.minidom.parse("seatmap1.xml")


    def set_amount(element_to_analyze, element_to_change):
        if element_to_analyze.getAttribute('AvailableInd') == 'true':
            element_to_change['seat_price'] = seat.getElementsByTagName('ns:Service')[0].getElementsByTagName(
                'ns:Fee')[0].getAttribute('Amount')


    def str_to_bool(s):
        if s == 'true':
            return True
        else:
            return False


    flight_data = OrderedDict()
    if xmlFile.getElementsByTagName('Document').length == 0:
        plane_data = xmlFile.getElementsByTagName('ns:FlightSegmentInfo')[0]
        flight_data['FlightNumber'] = plane_data.getAttribute('FlightNumber')
        flight_data['DepartureDateTime'] = plane_data.getAttribute('DepartureDateTime')
        flight_data['DepartureAirport'] = plane_data.getElementsByTagName('ns:DepartureAirport')[0].getAttribute(
            'LocationCode')
        flight_data['ArrivalAirport'] = plane_data.getElementsByTagName('ns:ArrivalAirport')[0].getAttribute('LocationCode')
        plane = xmlFile.getElementsByTagName('ns:CabinClass')
        cabin_object = OrderedDict()  # NS CABIN CLASS
        for cabin_class in plane:
            cabin = cabin_class.getElementsByTagName('ns:RowInfo')
            cabin_type = cabin[0].getAttribute('CabinType')
            for row_group in cabin:
                row_object = OrderedDict()  # NS ROW INFO
                seat_group = row_group.getElementsByTagName('ns:SeatInfo')
                for seat in seat_group:
                    seat_details = OrderedDict()
                    details = seat.getElementsByTagName('ns:Summary')[0]
                    seat_details['seat'] = seat.getElementsByTagName('ns:')
                    seat_details['seat_id'] = details.getAttribute('SeatNumber')
                    seat_details['cabin_class'] = cabin_type
                    seat_details['availability'] = str_to_bool(details.getAttribute('AvailableInd'))
                    set_amount(details, seat_details)
                    row_object[details.getAttribute('SeatNumber')[-1]] = seat_details
                cabin_object[row_group.getAttribute('RowNumber')] = row_object
        flight_data['Rows'] = cabin_object
        with open('_parsed.json', 'w') as outfile:
            outfile.write(json.dumps(flight_data))
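One thing I am unsure about is `set_amount`, which reads the loop variable `seat` as a global and stores the raw `Amount` string. A possible alternative, sketched below, takes the `ns:SeatInfo` node as a parameter and scales the amount by `DecimalPlaces`. The reading of `Amount="4200"` with `DecimalPlaces="2"` as 42.00 is my assumption, and `seat_price` and `SAMPLE` are hypothetical names used only for this example.

```python
from decimal import Decimal
import xml.dom.minidom

# Trimmed-down copy of seat 12C from the file, for illustration only.
SAMPLE = """<ns:SeatInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">
  <ns:Summary AvailableInd="true" SeatNumber="12C"/>
  <ns:Service CodeContext="Preferred">
    <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2">
      <ns:Taxes Amount="0" CurrencyCode="USD"/>
    </ns:Fee>
  </ns:Service>
</ns:SeatInfo>"""


def seat_price(seat_info):
    """Return the fee of one ns:SeatInfo node, or None when it has no ns:Fee."""
    fees = seat_info.getElementsByTagName("ns:Fee")
    if not fees:
        return None
    fee = fees[0]
    # Assumption: DecimalPlaces says how many trailing digits of Amount are decimals.
    places = int(fee.getAttribute("DecimalPlaces") or 0)
    amount = Decimal(fee.getAttribute("Amount")).scaleb(-places)
    # Keep the amount as a string so json.dumps can serialize it without rounding.
    return {"amount": str(amount), "currency": fee.getAttribute("CurrencyCode")}


seat_node = xml.dom.minidom.parseString(SAMPLE).documentElement
price = seat_price(seat_node)
print(price)
```

Returning `None` for seats without a fee would also give every seat the same set of keys in the JSON output, instead of `seat_price` only appearing on available seats.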

This is my XML file:

<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"    xmlns:xsd="http://www.w3.org/2001/XMLSchema"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">    <soapenv:Body>        <ns:OTA_AirSeatMapRS Version="1"            xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">            <ns:Success/>            <ns:SeatMapResponses>                <ns:SeatMapResponse>                    <ns:FlightSegmentInfo DepartureDateTime="2020-11-22T15:30:00" FlightNumber="1179">                        <ns:DepartureAirport LocationCode="LAS"/>                        <ns:ArrivalAirport LocationCode="IAH"/>                        <ns:Equipment AirEquipType="739"/>                    </ns:FlightSegmentInfo>                    <ns:SeatMapDetails>                        <ns:CabinClass Layout="AB EF" UpperDeckInd="false">                            <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="1">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1A"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1B"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" 
GalleyInd="false" GridNumber="4" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1E"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="2">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2A"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2B"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" GalleyInd="false" GridNumber="4" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2E"/>                 
                   <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                        </ns:CabinClass>                        <ns:CabinClass Layout="ABC DEF" UpperDeckInd="false">                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="7">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7A"/>                                    <ns:Features extension="Lavatory">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7B"/>                                    <ns:Features extension="Lavatory">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7C"/>                                    <ns:Features extension="Lavatory">Other_</ns:Features>                          
      </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7D"/>                                    <ns:Features>BlockedSeat_Permanent</ns:Features>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7E"/>                                    <ns:Features>BlockedSeat_Permanent</ns:Features>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="8">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8A"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features 
extension="Limited Recline">Other_</ns:Features>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8B"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Limited Recline">Other_</ns:Features>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8C"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Limited Recline">Other_</ns:Features>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8D"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                
    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8E"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="9">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="9A"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9B"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9C"/>                                    
<ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9D"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9E"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="10">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10A"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" 
ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10B"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10C"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10D"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10E"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>       
                     </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="11">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11A"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11B"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11C"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11D"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                        
            <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11E"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="12">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12A"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Window</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12B"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Center</ns:Features>                                   
 <ns:Features extension="Chargeable">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12C"/>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Aisle</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Preferred">                                        <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12D"/>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Aisle</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Preferred">                                        <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                
</ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12E"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Center</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12F"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features extension="Preferred">Other_</ns:Features>                                    <ns:Features>Window</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="38">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38A"/>                                    <ns:Features>Window</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service 
CodeContext="Economy">                                        <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38B"/>                                    <ns:Features>Center</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38C"/>                                    <ns:Features>Aisle</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                      
              </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38D"/>                                    <ns:Features>Aisle</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38E"/>                                    <ns:Features>Center</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                             
       <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38F"/>                                    <ns:Features>Window</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                            </ns:RowInfo>                            <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="39">                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="39A"/>                                    <ns:Status>Held</ns:Status>                                    <ns:Features>Window</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center">                                    <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="39B"/>                                    <ns:Features>Center</ns:Features>                                    <ns:Features extension="Chargeable">Other_</ns:Features>                                    <ns:Service CodeContext="Economy">                                        <ns:Fee Amount="1200" CurrencyCode="USD" 
DecimalPlaces="2">                                            <ns:Taxes Amount="0" CurrencyCode="USD"/>                                        </ns:Fee>                                    </ns:Service>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39C"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39D"/>                                    <ns:Features>Aisle</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39E"/>                                    <ns:Features>Center</ns:Features>                                </ns:SeatInfo>                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right">                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39F"/>                                    <ns:Features>Window</ns:Features>                                </ns:SeatInfo>                            </ns:RowInfo>                        
</ns:CabinClass>                    </ns:SeatMapDetails>                </ns:SeatMapResponse>            </ns:SeatMapResponses>            <ns:Warnings>                <ns:Warning Type="11" Code="59">ENSURE PASSENGER MEETS GOVERNMENT DESIGNATED EXIT ROW CRITERIA</ns:Warning>                <ns:Warning Type="11" Code="450">Valid Credit Card Payment Types: ,VI,UP,MPVI,MC,AX,DS,DC,TP,JC</ns:Warning>            </ns:Warnings>        </ns:OTA_AirSeatMapRS>    </soapenv:Body></soapenv:Envelope>
Reinderien
asked May 13, 2021 at 2:18
Patricio Cabo
\$\endgroup\$
  • \$\begingroup\$Can you show an excerpt of the XML you're parsing?\$\endgroup\$ Commented May 13, 2021 at 3:16
  • \$\begingroup\$It's a bit long, but I could try.\$\endgroup\$ Commented May 13, 2021 at 3:31
  • \$\begingroup\$Why are you writing to a JSON file? What's going to consume it?\$\endgroup\$ Commented May 13, 2021 at 16:40
  • \$\begingroup\$And why is this in an XML file on disk? It looks like a SOAP response. Can't it just be parsed in memory?\$\endgroup\$ Commented May 13, 2021 at 16:43
  • \$\begingroup\$It is a test I have to pass for a job. The idea is that they give me this XML and I have to parse it to JSON without libraries such as objectify or xmltodict. I also have to shape the output so it is formatted as I described.\$\endgroup\$ Commented May 13, 2021 at 17:31

1 Answer

2
\$\begingroup\$

There's a lot going on here, and the problem is very ill-specified. I understand that you've been asked to do this (where's the original problem description?) for a job application, so maybe they've left a bunch open to interpretation, but anyway:

The usual claim floating around the internet is that etree is a more Pythonic XML parsing interface than minidom. See https://stackoverflow.com/a/8022507/313768 for example. It's unclear whether you'll have performance constraints pushing you toward lxml. I find the etree interface more natural, so it's what I've shown in my example code, but minidom is "also fine". It's not ideal that you frequently ask for all matching tags only to pay attention to the first. I have shown a fairly strict xpath navigation scheme that does not force the parser to search the entire tree, and that asks for only one element when that's called for.
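As a rough sketch of what strict path navigation looks like, here is ElementTree applied to a tiny hand-made fragment. The XML string and the NS mapping are illustrative assumptions modeled on the excerpt in the question, not the real file.

```python
from xml.etree import ElementTree

# Hypothetical, cut-down fragment modeled on the question's seat map.
XML = """
<ns:RowInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/"
            CabinType="Economy" RowNumber="38">
  <ns:SeatInfo>
    <ns:Summary AvailableInd="true" SeatNumber="38D"/>
    <ns:Service><ns:Fee Amount="1800" CurrencyCode="USD"/></ns:Service>
  </ns:SeatInfo>
</ns:RowInfo>
""".strip()

# Prefix-to-URI mapping so paths can use the short 'ns:' prefix.
NS = {'ns': 'http://www.opentravel.org/OTA/2003/05/common/'}

row = ElementTree.fromstring(XML)

# find() follows an explicit path and returns the first match (or None),
# instead of collecting every matching descendant and indexing [0].
fee = row.find('./ns:SeatInfo/ns:Service/ns:Fee', NS)
amount = fee.attrib['Amount']

# findall() is reserved for cases where several elements are expected.
seat_numbers = [
    summary.attrib['SeatNumber']
    for summary in row.findall('./ns:SeatInfo/ns:Summary', NS)
]
```

Each path step names a direct child, so the search stays on the branch you expect instead of scanning the whole document.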

Your set_amount is a somewhat strange block of code to extract into a function. It has no return value, and mutates element_to_change in place. Functions are overall a better fit when they return values and do not mutate their arguments. Python's approach to this is entirely lax, but if you ever switch to a functional language this becomes more of a factor.
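A minimal sketch of the value-returning alternative follows. The name get_amount and the namespace-free XML are assumptions made to keep the example short; the real document would need the ns prefix handled as well.

```python
from typing import Optional
from xml.etree import ElementTree


def get_amount(seat: ElementTree.Element) -> Optional[str]:
    """Return the fee amount for an available seat, or None otherwise."""
    summary = seat.find('Summary')
    if summary is None or summary.attrib.get('AvailableInd') != 'true':
        return None
    fee = seat.find('./Service/Fee')
    return None if fee is None else fee.attrib['Amount']


# Hypothetical seat element, without namespaces for brevity.
seat = ElementTree.fromstring(
    '<SeatInfo>'
    '<Summary AvailableInd="true" SeatNumber="38E"/>'
    '<Service><Fee Amount="1200"/></Service>'
    '</SeatInfo>'
)

# The caller decides what to do with the result; nothing is mutated
# behind its back.
seat_details = {'seat_id': '38E'}
amount = get_amount(seat)
if amount is not None:
    seat_details['seat_price'] = amount
```

The function now depends only on its argument, which also makes it easy to test in isolation.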

You define str_to_bool but then fail to use it in set_amount. It's such a simple operation that it's probably not worth capturing in a function, and can be done inline with an == 'true' predicate and no if-statements.

Your use of OrderedDict is not strictly necessary for any modern version of Python: a plain dict preserves insertion order as of Python 3.7.
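To illustrate, a plain dict keeps keys in the order they were inserted, and json.dumps writes them in that same order. The flight values below are made up for the example.

```python
import json

# Hypothetical values; only the key order matters here.
flight_data = {}
flight_data['FlightNumber'] = '1179'
flight_data['DepartureAirport'] = 'LAS'
flight_data['ArrivalAirport'] = 'IAH'

# Insertion order is preserved both when iterating and when serializing.
keys = list(flight_data)
serialized = json.dumps(flight_data)
```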

Your Document check is inside-out and backwards - rather than checking for the presence of a totally unrelated element, you should be checking for the absence of an element that you rely on to generate the currently-attempted document type. This can be represented, for example, as an exception thrown from a constructor as I have it. Fancier patterns could use a factory that probes the document on parse and spins up the correct loading class, but your question has insufficient context to justify this.

You've conflated two operations in one: loading from XML into a well-defined in-memory representation, and serialization to JSON-compatible dictionaries. I have shown how these can be separated.

Do not call outfile.write(dumps(...)); simply call dump, which accepts a file-like object.
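A small sketch of the two spellings side by side. An io.StringIO buffer stands in for the open output file so the example has no side effects, and the payload is a made-up placeholder.

```python
import io
import json

data = {'seat_id': '38D', 'availability': True}  # hypothetical payload

# Two steps: build the whole string, then write it.
via_dumps = io.StringIO()
via_dumps.write(json.dumps(data))

# One step: dump() writes straight to any object with a write() method.
via_dump = io.StringIO()
json.dump(data, via_dump)

same_output = via_dumps.getvalue() == via_dump.getvalue()
```

With a real file the second form becomes json.dump(data, outfile) inside the usual with open(...) block.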

Your

            seat_details['seat'] = seat.getElementsByTagName('ns:')

is mysterious and doesn't seem to ever produce anything. Maybe it can just be deleted?

This:

            row_object[details.getAttribute('SeatNumber')[-1]] = seat_details

is more fragile than it needs to be. You've already been given a row name. Assuming that the row name always precedes the column name in the ID, you should not simply be taking the last character for the column - instead, take a substring from the beginning whose length is the row you already have, and validate that to be your row ID; assign the rest to be your column ID. This will support multi-character columns.
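The splitting described above can be sketched as follows. The helper name split_seat_id is hypothetical, and it assumes that the row number always prefixes the seat ID.

```python
from typing import Tuple


def split_seat_id(seat_id: str, row: str) -> Tuple[str, str]:
    """Split a seat ID into (row, column), validating the row prefix."""
    if not seat_id.startswith(row):
        raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
    # Everything after the row prefix is the column, however long it is.
    return row, seat_id[len(row):]


single = split_seat_id('39B', '39')
multi = split_seat_id('12AB', '12')  # a multi-character column survives
```

Taking only the last character would have turned the second ID into column "B" and silently dropped the "A".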

Example Code

This generates output equivalent to yours.

    import json
    from datetime import datetime
    from decimal import Decimal
    from functools import partial
    from typing import Iterable, Tuple, Optional, Dict, Any
    from xml.etree import ElementTree
    from xml.etree.ElementTree import Element

    NAMESPACES = {
        'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
        'ns': 'http://www.opentravel.org/OTA/2003/05/common/',
    }
    ns_find = partial(Element.find, namespaces=NAMESPACES)
    ns_findall = partial(Element.findall, namespaces=NAMESPACES)


    class Seat:
        __slots__ = (
            'available', 'cabin_type', 'seat_id', 'row', 'col', 'seat_price',
        )

        def __init__(self, seat: Element, cabin_type: str, row: str):
            summary = ns_find(seat, './ns:Summary')
            self.available = summary.attrib['AvailableInd'] == 'true'
            self.cabin_type = cabin_type

            seat_id = summary.attrib['SeatNumber']
            row_from_id = seat_id[:len(row)]
            if row != row_from_id:
                raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
            self.seat_id = seat_id
            self.row = row
            self.col = seat_id[len(row):]

            if self.available:
                self.seat_price: Optional[Decimal] = Decimal(
                    ns_find(seat, './ns:Service/ns:Fee').attrib['Amount']
                )
            else:
                self.seat_price = None

        def __str__(self):
            return self.seat_id

        def as_dict(self) -> Dict[str, Any]:
            d = {
                'seat_id': self.seat_id,
                'cabin_class': self.cabin_type,
                'availability': self.available,
            }
            if self.seat_price is not None:
                d['seat_price'] = str(self.seat_price)
            return d

        @classmethod
        def get_row(cls, row: Element, cabin_type: str, row_no: str) -> Iterable[Tuple[str, 'Seat']]:
            for seat_elm in ns_findall(row, './ns:SeatInfo'):
                seat = cls(seat_elm, cabin_type, row_no)
                yield seat.col, seat


    class AirSeatMap:
        __slots__ = ('flight', 'seat_map')

        def __init__(self, filename: str):
            root = ElementTree.parse(filename).getroot()
            response = ns_find(
                root,
                './soapenv:Body/ns:OTA_AirSeatMapRS'
                '/ns:SeatMapResponses/ns:SeatMapResponse'
            )
            if response is None:
                raise ValueError('This is probably not an AirSeatMap')

            self.flight = ns_find(response, './ns:FlightSegmentInfo')
            self.seat_map = ns_find(response, './ns:SeatMapDetails')

        @property
        def flight_number(self) -> str:
            return self.flight.attrib['FlightNumber']

        @property
        def departure_time(self) -> datetime:
            return datetime.fromisoformat(self.flight.attrib['DepartureDateTime'])

        @property
        def departure_airport(self) -> str:
            return ns_find(self.flight, './ns:DepartureAirport').attrib['LocationCode']

        @property
        def arrival_airport(self) -> str:
            return ns_find(self.flight, './ns:ArrivalAirport').attrib['LocationCode']

        @property
        def seats(self) -> Iterable[Tuple[str,
            Iterable[Tuple[str, Seat]]
        ]]:
            for cabin_class in ns_findall(self.seat_map, './ns:CabinClass'):
                for row in ns_findall(cabin_class, './ns:RowInfo'):
                    cabin_type = row.attrib['CabinType']
                    row_no = row.attrib['RowNumber']
                    yield row_no, Seat.get_row(row, cabin_type, row_no)

        def as_dict(self) -> Dict[str, Any]:
            return {
                'FlightNumber': self.flight_number,
                'DepartureDateTime': self.departure_time.isoformat(),
                'DepartureAirport': self.departure_airport,
                'ArrivalAirport': self.arrival_airport,
                'Rows': {
                    row_no: {
                        col_no: seat.as_dict()
                        for col_no, seat in row
                    }
                    for row_no, row in self.seats
                },
            }


    def main():
        map = AirSeatMap("seatmap1.xml")
        with open('_parsed.json', 'w') as outfile:
            json.dump(map.as_dict(), outfile)


    if __name__ == '__main__':
        main()
answered May 14, 2021 at 0:30
Reinderien
\$\endgroup\$
  • \$\begingroup\$This is so awesome Reinderien, thank you so much. Your solution is so clean. My first problem is that the two XML files have different structures and different attributes and elements. That's why my first idea was to put two scripts in one, divided by that if. I know it is an awful solution. I mean, they don't share any property, so how do I write one script that parses two XML files with different structures and different properties? This is my second XML file: GitHubRepo\$\endgroup\$ Commented May 14, 2021 at 5:26
  • \$\begingroup\$These are the repositories with the instructions and both XML files, so you can check what I'm saying: Instructions, XML 1, XML 2\$\endgroup\$ Commented May 14, 2021 at 5:40
  • \$\begingroup\$You are my only hope\$\endgroup\$ Commented May 14, 2021 at 17:48
  • \$\begingroup\$Haha, well I appreciate that, but if you want more feedback about a general solution that encompasses both file formats, you're going to need to write a new question that includes - copied verbatim - the problem statement, all of your own code, and example sections from both XML formats.\$\endgroup\$ Commented May 14, 2021 at 17:50
  • \$\begingroup\$meta.stackexchange.com/questions/364452/python-xml-json-parsing\$\endgroup\$ Commented May 14, 2021 at 21:56
