How are you doing? I have to write a script that parses an XML input file into a JSON file. I did my best, but it would be nice if you could check it and help me improve it. The idea is that I can't use objectify or other libraries that convert files directly. I have to write this script so that the output has at least these properties:
- Seat/Element type (Seat, Kitchen, Bathroom, etc.)
- Seat id (17A, 18A)
- Seat price
- Cabin class
- Availability
By the way I couldn't find the seat/element type for each seat.
    import json
    import xml.dom.minidom
    from collections import OrderedDict

    xmlFile = xml.dom.minidom.parse("seatmap1.xml")


    def set_amount(element_to_analyze, element_to_change):
        if element_to_analyze.getAttribute('AvailableInd') == 'true':
            element_to_change['seat_price'] = seat.getElementsByTagName('ns:Service')[0].getElementsByTagName(
                'ns:Fee')[0].getAttribute('Amount')


    def str_to_bool(s):
        if s == 'true':
            return True
        else:
            return False


    flight_data = OrderedDict()
    if xmlFile.getElementsByTagName('Document').length == 0:
        plane_data = xmlFile.getElementsByTagName('ns:FlightSegmentInfo')[0]
        flight_data['FlightNumber'] = plane_data.getAttribute('FlightNumber')
        flight_data['DepartureDateTime'] = plane_data.getAttribute('DepartureDateTime')
        flight_data['DepartureAirport'] = plane_data.getElementsByTagName('ns:DepartureAirport')[0].getAttribute(
            'LocationCode')
        flight_data['ArrivalAirport'] = plane_data.getElementsByTagName('ns:ArrivalAirport')[0].getAttribute('LocationCode')
        plane = xmlFile.getElementsByTagName('ns:CabinClass')
        cabin_object = OrderedDict()
        # NS CABIN CLASS
        for cabin_class in plane:
            cabin = cabin_class.getElementsByTagName('ns:RowInfo')
            cabin_type = cabin[0].getAttribute('CabinType')
            for row_group in cabin:
                row_object = OrderedDict()
                # NS ROW INFO
                seat_group = row_group.getElementsByTagName('ns:SeatInfo')
                for seat in seat_group:
                    seat_details = OrderedDict()
                    details = seat.getElementsByTagName('ns:Summary')[0]
                    seat_details['seat'] = seat.getElementsByTagName('ns:')
                    seat_details['seat_id'] = details.getAttribute('SeatNumber')
                    seat_details['cabin_class'] = cabin_type
                    seat_details['availability'] = str_to_bool(details.getAttribute('AvailableInd'))
                    set_amount(details, seat_details)
                    row_object[details.getAttribute('SeatNumber')[-1]] = seat_details
                cabin_object[row_group.getAttribute('RowNumber')] = row_object
        flight_data['Rows'] = cabin_object

    with open('_parsed.json', 'w') as outfile:
        outfile.write(json.dumps(flight_data))

This is my XML file:
<?xml version="1.0" encoding="UTF-8"?><soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soapenv:Body> <ns:OTA_AirSeatMapRS Version="1" xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/"> <ns:Success/> <ns:SeatMapResponses> <ns:SeatMapResponse> <ns:FlightSegmentInfo DepartureDateTime="2020-11-22T15:30:00" FlightNumber="1179"> <ns:DepartureAirport LocationCode="LAS"/> <ns:ArrivalAirport LocationCode="IAH"/> <ns:Equipment AirEquipType="739"/> </ns:FlightSegmentInfo> <ns:SeatMapDetails> <ns:CabinClass Layout="AB EF" UpperDeckInd="false"> <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="1"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1B"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" GalleyInd="false" GridNumber="4" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1E"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="First" OperableInd="true" 
RowNumber="2"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2B"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="4" ExitRowInd="false" GalleyInd="false" GridNumber="4" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2E"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="2F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> </ns:CabinClass> <ns:CabinClass Layout="ABC DEF" UpperDeckInd="false"> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="7"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7A"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7B"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3"> 
<ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7C"/> <ns:Features extension="Lavatory">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7D"/> <ns:Features>BlockedSeat_Permanent</ns:Features> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7E"/> <ns:Features>BlockedSeat_Permanent</ns:Features> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="7F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="8"> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8A"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8B"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="true" 
ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8C"/> <ns:Status>Held</ns:Status> <ns:Features extension="Limited Recline">Other_</ns:Features> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="8D"/> <ns:Status>Held</ns:Status> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="8F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="9"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="9A"/> <ns:Status>Held</ns:Status> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" 
GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="9F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="10"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo 
BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="10F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="11"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11A"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11B"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" 
OccupiedInd="false" SeatNumber="11D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="11F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="12"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12A"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12B"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12C"/> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service 
CodeContext="Preferred"> <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="12D"/> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Preferred"> <ns:Fee Amount="4200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12E"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="12F"/> <ns:Status>Held</ns:Status> <ns:Features extension="Preferred">Other_</ns:Features> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="38"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38A"/> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> 
<ns:Service CodeContext="Economy"> <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38B"/> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38C"/> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38D"/> <ns:Features>Aisle</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1800" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38E"/> <ns:Features>Center</ns:Features> <ns:Features 
extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="38F"/> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1300" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> </ns:RowInfo> <ns:RowInfo CabinType="Economy" OperableInd="true" RowNumber="39"> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="true" SeatNumber="39A"/> <ns:Status>Held</ns:Status> <ns:Features>Window</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="2" ExitRowInd="false" GalleyInd="false" GridNumber="2" PlaneSection="Center"> <ns:Summary AvailableInd="true" InoperativeInd="false" OccupiedInd="false" SeatNumber="39B"/> <ns:Features>Center</ns:Features> <ns:Features extension="Chargeable">Other_</ns:Features> <ns:Service CodeContext="Economy"> <ns:Fee Amount="1200" CurrencyCode="USD" DecimalPlaces="2"> <ns:Taxes Amount="0" CurrencyCode="USD"/> </ns:Fee> </ns:Service> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="3" ExitRowInd="false" GalleyInd="false" GridNumber="3" PlaneSection="Left"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39C"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" 
BulkheadInd="false" ColumnNumber="5" ExitRowInd="false" GalleyInd="false" GridNumber="5" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39D"/> <ns:Features>Aisle</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="6" ExitRowInd="false" GalleyInd="false" GridNumber="6" PlaneSection="Center"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39E"/> <ns:Features>Center</ns:Features> </ns:SeatInfo> <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="7" ExitRowInd="false" GalleyInd="false" GridNumber="7" PlaneSection="Right"> <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="39F"/> <ns:Features>Window</ns:Features> </ns:SeatInfo> </ns:RowInfo> </ns:CabinClass> </ns:SeatMapDetails> </ns:SeatMapResponse> </ns:SeatMapResponses> <ns:Warnings> <ns:Warning Type="11" Code="59">ENSURE PASSENGER MEETS GOVERNMENT DESIGNATED EXIT ROW CRITERIA</ns:Warning> <ns:Warning Type="11" Code="450">Valid Credit Card Payment Types: ,VI,UP,MPVI,MC,AX,DS,DC,TP,JC</ns:Warning> </ns:Warnings> </ns:OTA_AirSeatMapRS> </soapenv:Body></soapenv:Envelope>

- Can you show an excerpt of the XML you're parsing? – Reinderien, May 13, 2021 at 3:16
- It's a bit long, but I could try. – Patricio Cabo, May 13, 2021 at 3:31
- Why are you writing to a JSON file? What's going to consume it? – Reinderien, May 13, 2021 at 16:40
- And why is this in an XML file on disk? It looks like a SOAP response. Can't it just be parsed in memory? – Reinderien, May 13, 2021 at 16:43
- It is a test I have to pass for a job. The idea is that they give me this XML and I have to parse it to JSON without libraries such as objectify or xmltodict or something like that. I also have to modify the file so the output is formatted as I said. – Patricio Cabo, May 13, 2021 at 17:31
1 Answer
There's a lot going on here, and the problem is very ill-specified. I understand that you've been asked to do this (where's the original problem description?) for a job application, so maybe they've left a bunch open to interpretation, but anyway:
The usual claim floating around on the internet is that etree is a more Pythonic XML parsing interface than minidom. See https://stackoverflow.com/a/8022507/313768 for example. It's unclear whether you're going to have performance constraints pushing you toward lxml. I find the etree interface to be more natural so it's what I've shown in my example code, but minidom is "also fine".

It's not ideal that you're frequently asking for all matching tags only to pay attention to the first. I have shown a fairly strict xpath navigation scheme that does not force the parser to search the entire tree, and asks for only one element when that's called for.
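As a minimal sketch of that navigation style, the snippet below runs against a hypothetical, trimmed-down row fragment rather than the real file. `SAMPLE`, `NS`, `seat_numbers` and `fees` are names made up for this illustration; the namespace URI is the one from the question's XML. `find` follows one explicit path and returns a single element or `None`, so there is no whole-tree search and no `[0]` indexing.

```python
from xml.etree import ElementTree

# Hypothetical, trimmed-down fragment used only for illustration.
SAMPLE = """\
<ns:RowInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/"
            CabinType="Economy" RowNumber="12">
  <ns:SeatInfo>
    <ns:Summary AvailableInd="true" SeatNumber="12C"/>
    <ns:Service><ns:Fee Amount="4200"/></ns:Service>
  </ns:SeatInfo>
  <ns:SeatInfo>
    <ns:Summary AvailableInd="false" SeatNumber="12E"/>
  </ns:SeatInfo>
</ns:RowInfo>
"""

# Prefix-to-URI mapping passed to find/findall.
NS = {'ns': 'http://www.opentravel.org/OTA/2003/05/common/'}

row = ElementTree.fromstring(SAMPLE)

seat_numbers = []
fees = []
# findall with a relative path only looks at direct children of the row.
for seat in row.findall('./ns:SeatInfo', NS):
    # find returns exactly one element (or None) instead of a list.
    summary = seat.find('./ns:Summary', NS)
    seat_numbers.append(summary.attrib['SeatNumber'])
    fee = seat.find('./ns:Service/ns:Fee', NS)
    fees.append(None if fee is None else fee.attrib['Amount'])

print(seat_numbers, fees)
```

Comparing against `None` explicitly, rather than relying on the truthiness of an element, avoids surprises with elements that have no children.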
Your `set_amount` is a somewhat strange block of code to extract into a function. It has no return value, and mutates `element_to_change` in place. Functions are overall a better fit when they return values and do not mutate their arguments. Python's approach to this is entirely lax, but if you ever switch to a functional language this becomes more of a factor.
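One possible shape for a value-returning version, kept in minidom to stay close to the question's code. This is only a sketch: `get_seat_price`, `SAMPLE` and the single-seat fragment are hypothetical names and data introduced here, not part of the original script.

```python
import xml.dom.minidom

# Hypothetical single-seat fragment used only for illustration.
SAMPLE = (
    '<ns:SeatInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">'
    '<ns:Summary AvailableInd="true" SeatNumber="38A"/>'
    '<ns:Service><ns:Fee Amount="1300"/></ns:Service>'
    '</ns:SeatInfo>'
)


def get_seat_price(seat):
    """Return the fee amount of an available seat, or None otherwise.

    The seat element is passed in explicitly and nothing is mutated.
    """
    summary = seat.getElementsByTagName('ns:Summary')[0]
    if summary.getAttribute('AvailableInd') != 'true':
        return None
    fees = seat.getElementsByTagName('ns:Fee')
    return fees[0].getAttribute('Amount') if fees else None


seat = xml.dom.minidom.parseString(SAMPLE).documentElement
price = get_seat_price(seat)

# The caller decides what to do with the returned value.
seat_details = {'seat_id': '38A'}
if price is not None:
    seat_details['seat_price'] = price
```

With this shape the function depends only on its parameter, so it can be exercised on a small fragment without building the rest of the output.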
You define `str_to_bool` but then fail to use it in `set_amount`. It's such a simple operation that it's probably not worth capturing in a function, and can be done inline with an `== 'true'` predicate and no `if`-statements.
Your use of `OrderedDict` is not strictly necessary for any modern version of Python: since Python 3.7, a plain `dict` is guaranteed to preserve insertion order.
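A quick way to convince yourself, assuming Python 3.7 or later: build the same mapping both ways and compare the serialized output. The keys and values below are taken from the question's sample flight; the variable names are made up for this sketch.

```python
import json
from collections import OrderedDict

# Plain dict, filled in the order the output should have.
plain = {}
plain['FlightNumber'] = '1179'
plain['DepartureAirport'] = 'LAS'
plain['ArrivalAirport'] = 'IAH'

# Same content in an OrderedDict, as in the original script.
ordered = OrderedDict()
ordered['FlightNumber'] = '1179'
ordered['DepartureAirport'] = 'LAS'
ordered['ArrivalAirport'] = 'IAH'

plain_json = json.dumps(plain)
ordered_json = json.dumps(ordered)

# Both serialize with the keys in insertion order.
print(plain_json == ordered_json)
```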
Your `Document` check is inside-out and backwards - rather than checking for the presence of a totally unrelated element, you should be checking for the absence of an element that you rely on to generate the currently-attempted document type. This can be represented, for example, as an exception thrown from a constructor as I have it. Fancier patterns could use a factory that probes the document on parse and spins up the correct loading class but your question has insufficient context to justify this.
You've conflated two operations in one: loading from XML into a well-defined in-memory representation, and serialization to JSON-compatible dictionaries. I have shown how these can be separated.
Do not call `outfile.write(dumps(...))`; simply call `dump`, which accepts a file-like object.
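A small sketch of the two patterns side by side. It uses `io.StringIO` in place of a real file so it can run anywhere; `flight_data` here is a made-up stand-in for the real dictionary.

```python
import io
import json

# Stand-in for the real parsed data.
flight_data = {'FlightNumber': '1179', 'Rows': {}}

# Original pattern: build the whole string in memory, then write it.
via_dumps = io.StringIO()
via_dumps.write(json.dumps(flight_data))

# Suggested pattern: let json.dump write to the file-like object itself.
via_dump = io.StringIO()
json.dump(flight_data, via_dump)

print(via_dump.getvalue() == via_dumps.getvalue())
```

With a real file, the second pattern becomes `json.dump(flight_data, outfile)` inside the existing `with open(...)` block.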
Your

    seat_details['seat'] = seat.getElementsByTagName('ns:')

is mysterious and doesn't seem to ever produce anything. Maybe it can just be deleted?
This:

    row_object[details.getAttribute('SeatNumber')[-1]] = seat_details

is more fragile than it needs to be. You've already been given a row name. Assuming that the row name always precedes the column name in the ID, you should not simply be taking the last character for the column - instead, take a substring from the beginning whose length is the row you already have, and validate that to be your row ID; assign the rest to be your column ID. This will support multi-character columns.
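In isolation, that splitting rule might look like the sketch below. `split_seat_id` is a hypothetical helper name, and `str.startswith` is used as an equivalent of comparing the leading substring against the row.

```python
from typing import Tuple


def split_seat_id(seat_id: str, row: str) -> Tuple[str, str]:
    """Split a seat ID such as '12C' into (row, column), given the known row.

    Raises ValueError if the seat ID does not begin with the row.
    """
    if not seat_id.startswith(row):
        raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
    # Everything after the row prefix is the column, however long it is.
    column = seat_id[len(row):]
    return row, column


print(split_seat_id('12C', '12'))
print(split_seat_id('10AB', '10'))
```

Taking `[-1]` on `'10AB'` would have produced only `'B'`, which is the fragility described above.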
Example Code
This generates output equivalent to yours.
    import json
    from datetime import datetime
    from decimal import Decimal
    from functools import partial
    from typing import Iterable, Tuple, Optional, Dict, Any
    from xml.etree import ElementTree
    from xml.etree.ElementTree import Element

    NAMESPACES = {
        'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
        'ns': 'http://www.opentravel.org/OTA/2003/05/common/',
    }

    ns_find = partial(Element.find, namespaces=NAMESPACES)
    ns_findall = partial(Element.findall, namespaces=NAMESPACES)


    class Seat:
        __slots__ = (
            'available',
            'cabin_type',
            'seat_id',
            'row',
            'col',
            'seat_price',
        )

        def __init__(self, seat: Element, cabin_type: str, row: str):
            summary = ns_find(seat, './ns:Summary')
            self.available = summary.attrib['AvailableInd'] == 'true'
            self.cabin_type = cabin_type

            seat_id = summary.attrib['SeatNumber']
            row_from_id = seat_id[:len(row)]
            if row != row_from_id:
                raise ValueError(f'Row {row} conflicts with seat ID {seat_id}')
            self.seat_id = seat_id
            self.row = row
            self.col = seat_id[len(row):]

            if self.available:
                self.seat_price: Optional[Decimal] = Decimal(
                    ns_find(seat, './ns:Service/ns:Fee').attrib['Amount']
                )
            else:
                self.seat_price = None

        def __str__(self):
            return self.seat_id

        def as_dict(self) -> Dict[str, Any]:
            d = {
                'seat_id': self.seat_id,
                'cabin_class': self.cabin_type,
                'availability': self.available,
            }
            if self.seat_price is not None:
                d['seat_price'] = str(self.seat_price)
            return d

        @classmethod
        def get_row(cls, row: Element, cabin_type: str, row_no: str) -> Iterable[Tuple[str, 'Seat']]:
            for seat_elm in ns_findall(row, './ns:SeatInfo'):
                seat = cls(seat_elm, cabin_type, row_no)
                yield seat.col, seat


    class AirSeatMap:
        __slots__ = ('flight', 'seat_map')

        def __init__(self, filename: str):
            root = ElementTree.parse(filename).getroot()
            response = ns_find(
                root,
                './soapenv:Body/ns:OTA_AirSeatMapRS'
                '/ns:SeatMapResponses/ns:SeatMapResponse'
            )
            if response is None:
                raise ValueError('This is probably not an AirSeatMap')

            self.flight = ns_find(response, './ns:FlightSegmentInfo')
            self.seat_map = ns_find(response, './ns:SeatMapDetails')

        @property
        def flight_number(self) -> str:
            return self.flight.attrib['FlightNumber']

        @property
        def departure_time(self) -> datetime:
            return datetime.fromisoformat(self.flight.attrib['DepartureDateTime'])

        @property
        def departure_airport(self) -> str:
            return ns_find(self.flight, './ns:DepartureAirport').attrib['LocationCode']

        @property
        def arrival_airport(self) -> str:
            return ns_find(self.flight, './ns:ArrivalAirport').attrib['LocationCode']

        @property
        def seats(self) -> Iterable[Tuple[str, Iterable[Tuple[str, Seat]]]]:
            for cabin_class in ns_findall(self.seat_map, './ns:CabinClass'):
                for row in ns_findall(cabin_class, './ns:RowInfo'):
                    cabin_type = row.attrib['CabinType']
                    row_no = row.attrib['RowNumber']
                    yield row_no, Seat.get_row(row, cabin_type, row_no)

        def as_dict(self) -> Dict[str, Any]:
            return {
                'FlightNumber': self.flight_number,
                'DepartureDateTime': self.departure_time.isoformat(),
                'DepartureAirport': self.departure_airport,
                'ArrivalAirport': self.arrival_airport,
                'Rows': {
                    row_no: {
                        col_no: seat.as_dict()
                        for col_no, seat in row
                    }
                    for row_no, row in self.seats
                },
            }


    def main():
        map = AirSeatMap("seatmap1.xml")
        with open('_parsed.json', 'w') as outfile:
            json.dump(map.as_dict(), outfile)


    if __name__ == '__main__':
        main()

- This is so awesome Reinderien, thank you so much. Your solution is so clean. My first problem is that the two xml files have different structure and different attributes and elements. That's why my first idea was oriented to do 2 scripts in one, dividing them with that `if`. I know it is an awful solution. I mean they don't share any property, how do I do 1 script to parse two XML file with different structure and different properties? This is my second XML file: GitHubRepo – Patricio Cabo, May 14, 2021 at 5:26
- These are the repositories with the instructions and both XML files so you check what I'm saying: Instructions, XML 1, XML 2 – Patricio Cabo, May 14, 2021 at 5:40
- You are my only hope – Patricio Cabo, May 14, 2021 at 17:48
- Haha, well I appreciate that, but if you want more feedback about a general solution that encompasses both file formats, you're going to need to write a new question that includes - copied verbatim - the problem statement, all of your own code, and example sections from both XML formats. – Reinderien, May 14, 2021 at 17:50
- meta.stackexchange.com/questions/364452/python-xml-json-parsing – Patricio Cabo, May 14, 2021 at 21:56