Python for Data Analysis, 2nd Edition

      Content preview from Python for Data Analysis, 2nd Edition

      Chapter 6. Data Loading, Storage, and File Formats

      Accessing data is a necessary first step for using most of the tools in this book. I focus on data input and output using pandas, though numerous tools in other libraries can also help with reading and writing data in various formats.

      Input and output typically fall into a few main categories: reading text files and other more efficient on-disk formats, loading data from databases, and interacting with network sources like web APIs.
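      As a rough illustration of two of these categories, the sketch below loads the same small table once from text and once from a database. The sample data, the table name test, and the variable names are made up for this example; an in-memory string and an in-memory SQLite database stand in for a real file and a real database server.

```python
import io
import sqlite3

import pandas as pd

# Text source: an in-memory CSV string standing in for a file on disk
# (hypothetical sample data).
csv_text = "a,b,message\n1,2,hello\n3,4,world\n"
from_text = pd.read_csv(io.StringIO(csv_text))

# Database source: an in-memory SQLite database with a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (a INTEGER, b INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 2, "hello"), (3, 4, "world")],
)
conn.commit()
from_db = pd.read_sql("SELECT * FROM test", conn)
conn.close()

print(from_text)
print(from_db)
```

      Both calls return a DataFrame, so the rest of an analysis does not need to know where the data came from.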

      6.1 Reading and Writing Data in Text Format

      pandas features a number of functions for reading tabular data as a DataFrame object. Table 6-1 summarizes some of them, though read_csv is likely the one you’ll use the most.

      Table 6-1. Parsing functions in pandas

      Function        Description
      read_csv        Load delimited data from a file, URL, or file-like object; use comma as default delimiter
      read_fwf        Read data in fixed-width column format (i.e., no delimiters)
      read_clipboard  Version of read_csv that reads data from the clipboard; useful for converting tables from web pages
      read_excel      Read tabular data from an Excel XLS or XLSX file
      read_hdf        Read HDF5 files written by pandas
      read_html       Read all tables found in the given HTML document
      read_json       Read data from a JSON (JavaScript Object Notation) string representation
      read_msgpack    Read pandas data encoded using the MessagePack binary format
      read_pickle     Read an arbitrary object stored in Python pickle format
      read_sas        Read a SAS dataset stored in one of the SAS system’s custom ...
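      A minimal sketch of a few of the functions in Table 6-1, assuming small in-memory strings in place of real files. The sample data and variable names are invented for illustration; only read_csv, read_fwf, and read_json are exercised here.

```python
import io

import pandas as pd

# Comma-delimited text: read_csv uses a comma as its default delimiter.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Any other single-character delimiter can be passed through sep.
pipe_text = "a|b|c\n1|2|3\n4|5|6\n"
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Fixed-width text: columns are defined by character widths, not delimiters.
fwf_text = "a  b  c\n1  2  3\n4  5  6\n"
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[3, 3, 1])

# JSON text: a list of records becomes one row per record.
json_text = '[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]'
df_json = pd.read_json(io.StringIO(json_text))

for name, frame in [("csv", df_csv), ("pipe", df_pipe),
                    ("fwf", df_fwf), ("json", df_json)]:
    print(name)
    print(frame)
```

      Each call accepts a path or URL in place of the StringIO object, so the same pattern should carry over to files on disk.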